LLM Anonymizer

A CLI tool to anonymize code using a local LLM before sending it to Claude Code or other AI services.

Prerequisites

Install Ollama

  1. macOS/Linux:

    curl -fsSL https://ollama.ai/install.sh | sh
  2. Windows: Download from ollama.ai

  3. Start Ollama service:

    ollama serve
  4. Install a model (in a new terminal):

    # Install Llama 3.2 (recommended)
    ollama pull llama3.2
    
    # Or install other models
    ollama pull codellama
    ollama pull llama3.1

Installation

Quick Start (Recommended)

  1. Install UV (if not already installed):

    # macOS/Linux
    curl -LsSf https://astral.sh/uv/install.sh | sh
  2. Install LLM Anonymizer (see the sketch after this list):

  3. Verify installation:
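
If the project installs as a uv tool directly from its Git repository (an assumption; the published install command may differ), steps 2 and 3 look roughly like this:

# Install the CLI globally as a uv tool (assumed install path)
uv tool install git+https://github.com/ChristianBako/LLM-Anonymizer-.git

# Verify the command is on your PATH
llm-anon --help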

That's it! The llm-anon command is now available globally.

Alternative Installation Methods

Development Installation

For development or if you want to modify the code:

  1. Clone this repository:

    git clone https://github.com/ChristianBako/LLM-Anonymizer-.git
    cd LLM-Anonymizer-
  2. Install dependencies (see the note after this list):

  3. Use with uv run:

    uv run python -m llm_anon.cli --help
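
For step 2, a project managed with uv typically installs its dependencies with uv sync (an assumption about this repository's setup):

# Install the project's dependencies into a local virtual environment
uv sync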

Usage

Basic Usage

# Anonymize a single file and print to stdout
llm-anon example.py

# Anonymize and save to file
llm-anon example.py -o anonymized.py

# Process entire directory
llm-anon src/ -r -o anonymized_output/

Development Usage (if installed from source)

# Use with uv run for development
uv run python -m llm_anon.cli example.py

Options

  • -o, --output PATH: Output file or directory
  • -m, --model TEXT: LLM model to use (default: llama3.2)
  • -t, --temperature FLOAT: Temperature for generation (default: 0.1)
  • --preserve-comments: Keep original comments
  • --preserve-strings: Keep string literals unchanged
  • -r, --recursive: Process directories recursively
  • -v, --verbose: Show detailed progress
  • --validation-config PATH: Path to validation config file with banned strings
  • --max-retries INTEGER: Maximum retries for validation failures (default: 3)

Examples

# Use different model with higher creativity
llm-anon code.py -m codellama -t 0.3

# Preserve important strings and comments
llm-anon api.py --preserve-strings --preserve-comments

# Process entire project with verbose output
llm-anon ./src -r -v -o ./anonymized

# Use validation to ensure sensitive strings are removed
llm-anon examples/lamasoft_example.py --validation-config examples/banned_strings.txt -v

# Process with custom retry limit for stubborn validations
llm-anon examples/lamasoft_example.py --validation-config examples/banned_strings.txt --max-retries 5

Supported Languages

  • Python (.py)
  • JavaScript (.js, .jsx)
  • TypeScript (.ts, .tsx)
  • Java (.java)
  • C/C++ (.c, .cpp, .cc, .cxx, .h, .hpp)
  • Rust (.rs)
  • Go (.go)

Validation System

The tool includes a powerful validation system to ensure sensitive information is completely removed from anonymized code.

Creating a Validation Config

Create a text file with banned strings (one per line). Comments start with #:

# Quick example
echo "MyCompany" > banned_strings.txt
echo "secret_api_key" >> banned_strings.txt
echo "internal.company.com" >> banned_strings.txt

# Or use the provided example
cp examples/banned_strings.txt my_company_secrets.txt
# Edit my_company_secrets.txt with your specific terms

Or create a comprehensive config file:

# company_secrets.txt
# Company names and branding
MyCompany
mycompany.com

# API keys and credentials  
API_KEY_12345
secret_password_2024

# Contact information
support@mycompany.com
1-800-MYCOMPANY

# Product-specific terms
MyCompanyCustomer
MyCompanyLicenseManager

Pro tip: Start with your company name, domain, and any API keys or internal URLs. Check out examples/ for test files and sample configurations.

How Validation Works

  1. Initial Anonymization: LLM processes code normally
  2. String Detection: Scans output for banned strings using word boundaries
  3. Re-prompting: If banned strings are found, sends explicit removal instructions
  4. Retry Logic: Repeats up to --max-retries times until validation passes
  5. Failure Handling: Reports specific banned strings if validation ultimately fails
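
The whole loop can be exercised from the shell; the file names below are placeholders, and MyCompany mirrors the sample config above:

# Create a one-line banned list and anonymize with validation enabled
echo "MyCompany" > banned.txt
llm-anon internal_tool.py --validation-config banned.txt --max-retries 5 -o clean.py -v

# Spot-check the result: grep should find nothing
grep "MyCompany" clean.py || echo "clean: no banned strings remain"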

Validation Features

  • Case-sensitive matching by default
  • Word boundary detection prevents false positives
  • Progressive prompting gets more explicit with each retry
  • Detailed error reporting shows exactly which strings were found
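
Word-boundary matching here behaves much like grep -w: MyCompany matches only as a standalone token, which is presumably why compound identifiers such as MyCompanyCustomer get their own entries in the sample config:

# Analogy with grep -w: whole-word matches only
echo "MyCompanyCustomer" | grep -w "MyCompany"   # no match
echo "MyCompany API" | grep -w "MyCompany"       # matches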

How It Works

  1. File Detection: Automatically detects programming language from file extension
  2. LLM Processing: Sends code to local Ollama model with anonymization prompt
  3. Validation (if enabled): Checks output against banned strings and re-prompts if needed
  4. Smart Replacement: Replaces variable names, function names, and identifiers while preserving:
    • Code structure and logic
    • Control flow
    • Data types
    • Import statements
    • Syntax and formatting
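
Because the language is inferred from the extension, the same invocation works across languages (the file names below are just for illustration):

# Language is detected per file; no flag is needed
llm-anon handler.go -o handler_anon.go
llm-anon service.ts -o service_anon.ts
llm-anon parser.rs -o parser_anon.rs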

Troubleshooting

Ollama Connection Issues

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Restart Ollama service
pkill ollama && ollama serve

# List available models
ollama list

Model Not Found

# Pull the default model
ollama pull llama3.2

# Or specify a different model
llm-anon code.py -m llama3.1

Performance Tips

  • Use smaller models for faster processing: llama3.2:1b
  • Lower temperature (0.1) for more consistent results
  • Process files individually for large codebases to avoid timeouts
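
For example, pulling the 1B-parameter variant and pointing the tool at it keeps runs fast and consistent (the file name is a placeholder):

# Fetch the small model once, then run with a low temperature
ollama pull llama3.2:1b
llm-anon big_module.py -m llama3.2:1b -t 0.1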