LLM Anonymizer

A CLI tool to anonymize code using a local LLM before sending it to Claude Code or other AI services.

Prerequisites

Install Ollama

  1. macOS/Linux:

    curl -fsSL https://ollama.ai/install.sh | sh
  2. Windows: Download from ollama.ai

  3. Start Ollama service:

    ollama serve
  4. Install a model (in a new terminal):

    # Install Llama 3.2 (recommended)
    ollama pull llama3.2
    
    # Or install other models
    ollama pull codellama
    ollama pull llama3.1

Installation

Quick Start (Recommended)

  1. Install UV (if not already installed):

    # macOS/Linux
    curl -LsSf https://astral.sh/uv/install.sh | sh
  2. Install LLM Anonymizer (see the sketch after this list):

  3. Verify installation:
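
If the project installs as a uv tool directly from its Git repository (an assumption; the published install command may differ), steps 2 and 3 look roughly like this:

# Install the CLI globally as a uv tool (assumed install path)
uv tool install git+https://github.com/ChristianBako/LLM-Anonymizer-.git

# Verify the command is on your PATH
llm-anon --help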

That's it! The llm-anon command is now available globally.

Alternative Installation Methods

Development Installation

For development or if you want to modify the code:

  1. Clone this repository:

    git clone https://github.com/ChristianBako/LLM-Anonymizer-.git
    cd LLM-Anonymizer-
  2. Install dependencies (see the note after this list):

  3. Use with uv run:

    uv run python -m llm_anon.cli --help
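
For step 2, a project managed with uv typically installs its dependencies with uv sync (an assumption about this repository's setup):

# Install the project's dependencies into a local virtual environment
uv sync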

Usage

Basic Usage

# Anonymize a single file and print to stdout
llm-anon example.py

# Anonymize and save to file
llm-anon example.py -o anonymized.py

# Process entire directory
llm-anon src/ -r -o anonymized_output/

Development Usage (if installed from source)

# Use with uv run for development
uv run python -m llm_anon.cli example.py

Options

  • -o, --output PATH: Output file or directory
  • -m, --model TEXT: LLM model to use (default: llama3.2)
  • -t, --temperature FLOAT: Temperature for generation (default: 0.1)
  • --preserve-comments: Keep original comments
  • --preserve-strings: Keep string literals unchanged
  • -r, --recursive: Process directories recursively
  • -v, --verbose: Show detailed progress
  • --validation-config PATH: Path to validation config file with banned strings
  • --max-retries INTEGER: Maximum retries for validation failures (default: 3)

Examples

# Use different model with higher creativity
llm-anon code.py -m codellama -t 0.3

# Preserve important strings and comments
llm-anon api.py --preserve-strings --preserve-comments

# Process entire project with verbose output
llm-anon ./src -r -v -o ./anonymized

# Use validation to ensure sensitive strings are removed
llm-anon examples/lamasoft_example.py --validation-config examples/banned_strings.txt -v

# Process with custom retry limit for stubborn validations
llm-anon examples/lamasoft_example.py --validation-config examples/banned_strings.txt --max-retries 5

Supported Languages

  • Python (.py)
  • JavaScript (.js, .jsx)
  • TypeScript (.ts, .tsx)
  • Java (.java)
  • C/C++ (.c, .cpp, .cc, .cxx, .h, .hpp)
  • Rust (.rs)
  • Go (.go)

Validation System

The tool includes a powerful validation system to ensure sensitive information is completely removed from anonymized code.

Creating a Validation Config

Create a text file with banned strings (one per line). Comments start with #:

# Quick example
echo "MyCompany" > banned_strings.txt
echo "secret_api_key" >> banned_strings.txt
echo "internal.company.com" >> banned_strings.txt

# Or use the provided example
cp examples/banned_strings.txt my_company_secrets.txt
# Edit my_company_secrets.txt with your specific terms

Or create a comprehensive config file:

# company_secrets.txt
# Company names and branding
MyCompany
mycompany.com

# API keys and credentials  
API_KEY_12345
secret_password_2024

# Contact information
support@mycompany.com
1-800-MYCOMPANY

# Product-specific terms
MyCompanyCustomer
MyCompanyLicenseManager

Pro tip: Start with your company name, domain, and any API keys or internal URLs. Check out examples/ for test files and sample configurations.

How Validation Works

  1. Initial Anonymization: LLM processes code normally
  2. String Detection: Scans output for banned strings using word boundaries
  3. Re-prompting: If banned strings are found, sends explicit removal instructions
  4. Retry Logic: Repeats up to --max-retries times until validation passes
  5. Failure Handling: Reports specific banned strings if validation ultimately fails
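
The whole loop can be exercised from the shell; the file names below are placeholders, and MyCompany mirrors the sample config above:

# Create a one-line banned list and anonymize with validation enabled
echo "MyCompany" > banned.txt
llm-anon internal_tool.py --validation-config banned.txt --max-retries 5 -o clean.py -v

# Spot-check the result: grep should find nothing
grep "MyCompany" clean.py || echo "clean: no banned strings remain"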

Validation Features

  • Case-sensitive matching by default
  • Word boundary detection prevents false positives
  • Progressive prompting gets more explicit with each retry
  • Detailed error reporting shows exactly which strings were found
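
Word-boundary matching here behaves much like grep -w: MyCompany matches only as a standalone token, which is presumably why compound identifiers such as MyCompanyCustomer get their own entries in the sample config:

# Analogy with grep -w: whole-word matches only
echo "MyCompanyCustomer" | grep -w "MyCompany"   # no match
echo "MyCompany API" | grep -w "MyCompany"       # matches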

How It Works

  1. File Detection: Automatically detects programming language from file extension
  2. LLM Processing: Sends code to local Ollama model with anonymization prompt
  3. Validation (if enabled): Checks output against banned strings and re-prompts if needed
  4. Smart Replacement: Replaces variable names, function names, and identifiers while preserving:
    • Code structure and logic
    • Control flow
    • Data types
    • Import statements
    • Syntax and formatting
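
Because the language is inferred from the extension, the same invocation works across languages (the file names below are just for illustration):

# Language is detected per file; no flag is needed
llm-anon handler.go -o handler_anon.go
llm-anon service.ts -o service_anon.ts
llm-anon parser.rs -o parser_anon.rs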

Troubleshooting

Ollama Connection Issues

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Restart Ollama service
pkill ollama && ollama serve

# List available models
ollama list

Model Not Found

# Pull the default model
ollama pull llama3.2

# Or specify a different model
llm-anon code.py -m llama3.1

Performance Tips

  • Use smaller models for faster processing: llama3.2:1b
  • Lower temperature (0.1) for more consistent results
  • Process files individually for large codebases to avoid timeouts
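
For example, pulling the 1B-parameter variant and pointing the tool at it keeps runs fast and consistent (the file name is a placeholder):

# Fetch the small model once, then run with a low temperature
ollama pull llama3.2:1b
llm-anon big_module.py -m llama3.2:1b -t 0.1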