# LLM Anonymizer

A CLI tool that anonymizes code using a local LLM before sending it to Claude Code or other AI services.
## Prerequisites

### Install Ollama

1. **macOS/Linux:**

   ```bash
   curl -fsSL https://ollama.ai/install.sh | sh
   ```

   **Windows:** Download from [ollama.ai](https://ollama.ai)

2. **Start the Ollama service:** run `ollama serve` and leave it running.

3. **Install a model** (in a new terminal):

   ```bash
   # Install Llama 3.2 (recommended)
   ollama pull llama3.2

   # Or install other models
   ollama pull codellama
   ollama pull llama3.1
   ```
## Installation

### Quick Start (Recommended)

1. **Install UV** (if not already installed):

   ```bash
   # macOS/Linux
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```

2. **Install LLM Anonymizer:**

3. **Verify installation:**

That's it! The `llm-anon` command is now available globally.
### Alternative Installation Methods

#### Development Installation

For development or if you want to modify the code:

1. **Clone this repository:**

   ```bash
   git clone https://github.com/ChristianBako/LLM-Anonymizer-.git
   cd LLM-Anonymizer-
   ```

2. **Install dependencies:**

3. **Use with `uv run`:**

   ```bash
   uv run python -m llm_anon.cli --help
   ```
## Usage

### Basic Usage

```bash
# Anonymize a single file and print to stdout
llm-anon example.py

# Anonymize and save to file
llm-anon example.py -o anonymized.py

# Process entire directory
llm-anon src/ -r -o anonymized_output/
```
### Development Usage (if installed from source)

```bash
# Use with uv run for development
uv run python -m llm_anon.cli example.py
```

### Options

- `-o, --output PATH`: Output file or directory
- `-m, --model TEXT`: LLM model to use (default: llama3.2)
- `-t, --temperature FLOAT`: Temperature for generation (default: 0.1)
- `--preserve-comments`: Keep original comments
- `--preserve-strings`: Keep string literals unchanged
- `-r, --recursive`: Process directories recursively
- `-v, --verbose`: Show detailed progress
- `--validation-config PATH`: Path to validation config file with banned strings
- `--max-retries INTEGER`: Maximum retries for validation failures (default: 3)
### Examples

```bash
# Use different model with higher creativity
llm-anon code.py -m codellama -t 0.3

# Preserve important strings and comments
llm-anon api.py --preserve-strings --preserve-comments

# Process entire project with verbose output
llm-anon ./src -r -v -o ./anonymized

# Use validation to ensure sensitive strings are removed
llm-anon examples/lamasoft_example.py --validation-config examples/banned_strings.txt -v

# Process with custom retry limit for stubborn validations
llm-anon examples/lamasoft_example.py --validation-config examples/banned_strings.txt --max-retries 5
```
## Supported Languages
- Python (.py)
- JavaScript (.js, .jsx)
- TypeScript (.ts, .tsx)
- Java (.java)
- C/C++ (.c, .cpp, .cc, .cxx, .h, .hpp)
- Rust (.rs)
- Go (.go)
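For illustration, language detection can be thought of as a lookup from file extension to language. The mapping below is only a sketch assumed from the list above, not llm-anon's actual implementation:

```python
from pathlib import Path

# Illustrative extension-to-language map, assumed from the supported list above.
EXTENSION_LANGUAGES = {
    ".py": "python",
    ".js": "javascript", ".jsx": "javascript",
    ".ts": "typescript", ".tsx": "typescript",
    ".java": "java",
    ".c": "c", ".h": "c",
    ".cpp": "cpp", ".cc": "cpp", ".cxx": "cpp", ".hpp": "cpp",
    ".rs": "rust",
    ".go": "go",
}

def detect_language(path: str) -> str | None:
    """Return the language for a supported extension, or None if unsupported."""
    return EXTENSION_LANGUAGES.get(Path(path).suffix.lower())

print(detect_language("src/server.ts"))  # typescript
```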
## Validation System
The tool includes a powerful validation system to ensure sensitive information is completely removed from anonymized code.
### Creating a Validation Config
Create a text file with banned strings (one per line). Comments start with #:
```bash
# Quick example
echo "MyCompany" > banned_strings.txt
echo "secret_api_key" >> banned_strings.txt
echo "internal.company.com" >> banned_strings.txt

# Or use the provided example
cp examples/banned_strings.txt my_company_secrets.txt
# Edit my_company_secrets.txt with your specific terms
```
Or create a comprehensive config file:
```text
# company_secrets.txt

# Company names and branding
MyCompany
mycompany.com

# API keys and credentials
API_KEY_12345
secret_password_2024

# Contact information
support@mycompany.com
1-800-MYCOMPANY

# Product-specific terms
MyCompanyCustomer
MyCompanyLicenseManager
```
Pro tip: Start with your company name, domain, and any API keys or internal URLs. Check out examples/ for test files and sample configurations.
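As a rough sketch, a config in this format could be read like so, skipping blank lines and `#` comments (this is not necessarily how llm-anon loads it internally):

```python
def load_banned_strings(path: str) -> list[str]:
    """Read a banned-strings file: one term per line, '#' starts a comment."""
    banned = []
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if line and not line.startswith("#"):
                banned.append(line)
    return banned

# e.g. banned = load_banned_strings("my_company_secrets.txt")
```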
### How Validation Works
- Initial Anonymization: LLM processes code normally
- String Detection: Scans output for banned strings using word boundaries
- Re-prompting: If banned strings found, sends explicit removal instructions
- Retry Logic: Repeats up to `--max-retries` times until validation passes
- Failure Handling: Reports specific banned strings if validation ultimately fails
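Conceptually, the flow above can be pictured as the loop below. This is only a sketch: the `llm_anonymize` and `find_banned` callables stand in for the tool's internals and are not its real API.

```python
from typing import Callable, Sequence

def anonymize_with_validation(
    code: str,
    banned: Sequence[str],
    llm_anonymize: Callable[[str, str], str],   # (code, extra_instructions) -> output
    find_banned: Callable[[str, Sequence[str]], list[str]],
    max_retries: int = 3,
) -> str:
    """Illustrative outline of the validate-and-retry flow described above."""
    output = llm_anonymize(code, "")            # initial anonymization
    found = find_banned(output, banned)         # word-boundary scan of the output
    attempt = 0
    while found and attempt < max_retries:
        attempt += 1
        # Re-prompt with explicit removal instructions; the wording gets more
        # forceful with each retry (progressive prompting).
        output = llm_anonymize(
            code,
            f"Retry {attempt}: the output must not contain: " + ", ".join(found),
        )
        found = find_banned(output, banned)
    if found:
        # Failure handling: report exactly which strings are still present.
        raise RuntimeError(f"Validation failed; banned strings remain: {found}")
    return output
```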
### Validation Features
- Case-sensitive matching by default
- Word boundary detection prevents false positives
- Progressive prompting gets more explicit with each retry
- Detailed error reporting shows exactly which strings were found
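The word-boundary behaviour can be pictured with a small regex-based check (again just a sketch, not the tool's actual matcher):

```python
import re

def find_banned_strings(text: str, banned: list[str]) -> list[str]:
    """Case-sensitive search for banned terms, anchored on word boundaries."""
    found = []
    for term in banned:
        # \b on both sides avoids false positives inside longer identifiers
        if re.search(rf"\b{re.escape(term)}\b", text):
            found.append(term)
    return found

print(find_banned_strings('client = "MyCompanyCustomer"', ["MyCo"]))              # []
print(find_banned_strings('client = "MyCompanyCustomer"', ["MyCompanyCustomer"])) # ['MyCompanyCustomer']
```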
## How It Works
- File Detection: Automatically detects programming language from file extension
- LLM Processing: Sends code to local Ollama model with anonymization prompt
- Validation (if enabled): Checks output against banned strings and re-prompts if needed
- Smart Replacement: Replaces variable names, function names, and identifiers while preserving:
  - Code structure and logic
  - Control flow
  - Data types
  - Import statements
  - Syntax and formatting
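For a sense of what that replacement looks like in practice, here is a hypothetical before/after pair (illustrative only; real output depends on the model, prompt, and flags such as `--preserve-strings`):

```python
# Before: identifiers and a string literal leak company-specific details
def get_acme_license_key(acme_customer_id: str) -> str:
    acme_prefix = "ACME-LICENSE"
    return f"{acme_prefix}-{acme_customer_id}"

# After (hypothetical output): names are generic, while the structure,
# control flow, data types, and formatting are unchanged
def get_license_key(customer_id: str) -> str:
    prefix = "PRODUCT-LICENSE"
    return f"{prefix}-{customer_id}"
```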
## Troubleshooting

### Ollama Connection Issues

```bash
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Restart Ollama service
pkill ollama && ollama serve

# List available models
ollama list
```
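If you prefer checking from Python, something along these lines should work against the same `/api/tags` endpoint the curl command above uses (the exact JSON shape noted in the comment is an assumption):

```python
import json
import urllib.request

def list_ollama_models(host: str = "http://localhost:11434") -> list[str]:
    """Return installed model names, or raise if Ollama is unreachable."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
        data = json.load(resp)
    # Assumed response shape: {"models": [{"name": "llama3.2:latest"}, ...]}
    return [model.get("name", "") for model in data.get("models", [])]

if __name__ == "__main__":
    print(list_ollama_models())
```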
### Model Not Found

```bash
# Pull the default model
ollama pull llama3.2

# Or specify a different model
llm-anon code.py -m llama3.1
```
### Performance Tips

- Use smaller models for faster processing: `llama3.2:1b`
- Lower temperature (0.1) for more consistent results
- Process files individually for large codebases to avoid timeouts