Hush Engine
Open-source Python engine for redacting PII in images, PDFs, and spreadsheets. Wide-ranging detection, 93% F1 on Kaggle, 266 ms per document, and no cloud round-trip.
Supports OpenAI Privacy Filter
Open source under AGPL-3.0. Commercial license available for proprietary use.
Built with Python 3.10+ and Microsoft Presidio, plus custom regex and checksum validators (Luhn, Verhoeff, mod-97, python-stdnum, phonenumbers) and an early-exit NER cascade (LightGBM → NLTagger → name database → optional Flair/Transformers/GLiNER/OpenAI Privacy Filter). Apple Vision handles OCR, faces, and barcodes.
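To illustrate how a regex match can be paired with a checksum validator, here is a minimal standard-library sketch of two of the checks named above: Luhn for card numbers and mod-97 for IBANs. The function names and the `CARD_CANDIDATE` pattern are hypothetical and are not taken from Hush Engine's code; the engine itself may rely on Presidio recognizers or python-stdnum instead.

```python
import re


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """Return True if `iban` passes the mod-97 check (remainder must be 1)."""
    s = re.sub(r"\s+", "", iban).upper()
    # Country code, two check digits, then an 11-30 character account part.
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", s):
        return False
    rearranged = s[4:] + s[:4]  # move country code and check digits to the end
    # Letters map to 10..35 (A=10, ..., Z=35); int(ch, 36) gives exactly that.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


# Hypothetical candidate pattern: 13-19 digits, optionally separated
# by single spaces or hyphens. The checksum then filters false positives.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def find_credit_cards(text: str) -> list[str]:
    """Return regex candidates in `text` that also pass the Luhn check."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]
```

The regex stays deliberately permissive and the checksum rejects digit runs that only look like card numbers, such as order IDs. This is one common way to keep precision high without a model call.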
A wide range of entity types across several categories
| Category | Entity Types |
|---|---|
| Personal Identity | Person, Face, Gender, NRP, Age |
| Contact Info | Email, Phone Number, URL |
| Government IDs | SSN, National ID, Passport, Driver’s License |
| Financial | Credit Card, IBAN, Financial Account, Crypto Wallet |
| Medical | Medical Record, Diagnosis, Medication, ICD-10, Lab Result |
| Credentials | Password, PIN, API Key, Token, AWS Key, Stripe Key |
| Network & Location | IP Address, MAC Address, Coordinates, Address, IMEI |
| Visual & Other | QR Code, Barcode, Vehicle ID, Company, Date/Time, Username |
20× the speed, a fraction of the size
| Model | Local / Cloud | Precision | Recall | F1 Score | Speed (ms/doc) | Parse Failures | Size |
|---|---|---|---|---|---|---|---|
| Hush Engine | Local | 99.6% | 94.5% | 97.0% | 80 | 0.0% | ~15 MB |
| Llama 3.2 (1B) | Local | 94.9% | 97.1% | 96.0% | 680 | 14.5% | 1.3 GB |
| Claude Haiku 4.5 | Cloud | 93.3% | 99.9% | 96.5% | 1,141 | 0.0% | Cloud |
| Qwen 2.5 (7B) | Local | 97.1% | 98.2% | 97.7% | 1,577 | 0.0% | 4.7 GB |
| Gemini 2.5 Flash | Cloud | 97.9% | 100.0% | 98.9% | 1,766 | 0.5% | Cloud |
| Mistral 7B | Local | 99.8% | 96.9% | 98.4% | 2,230 | 0.0% | 7.2 GB |
| Phi-4 (14B) | Local | 96.9% | 98.4% | 97.6% | 2,890 | 0.0% | 9.1 GB |
| Gemma 2 (9B) | Local | 99.8% | 97.9% | 98.8% | 3,419 | 0.0% | 5.4 GB |
Benchmarked on 1,000 synthetic PII samples (10 entity types) using identical zero-shot prompts, with local models running on Apple M2 Max (32 GB) and API latency including network round-trip.
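The headline speed ratio depends on which model is used as the comparison point. As a quick check, the sketch below derives relative latencies from the Speed (ms/doc) column of the table above; the helper name `latency_ratio_vs` is illustrative only.

```python
# Latencies in ms/doc, copied from the benchmark table above.
latency_ms = {
    "Hush Engine": 80,
    "Llama 3.2 (1B)": 680,
    "Claude Haiku 4.5": 1141,
    "Qwen 2.5 (7B)": 1577,
    "Gemini 2.5 Flash": 1766,
    "Mistral 7B": 2230,
    "Phi-4 (14B)": 2890,
    "Gemma 2 (9B)": 3419,
}


def latency_ratio_vs(baseline: str, latencies: dict[str, float]) -> dict[str, float]:
    """Return each model's latency divided by the baseline model's latency."""
    base = latencies[baseline]
    return {model: ms / base for model, ms in latencies.items() if model != baseline}


ratios = latency_ratio_vs("Hush Engine", latency_ms)
for model, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{model}: {ratio:.1f}x the latency of Hush Engine")
```

On these figures the ratio ranges from 8.5× (Llama 3.2 1B) to about 42.7× (Gemma 2 9B), with Qwen 2.5 (7B) closest to 20×.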
Industry-leading accuracy where it matters most
| Entity Type | F1 Score | Precision | Recall |
|---|---|---|---|
| Overall (ai4privacy) | 94.6% | 96.8% | 92.6% |
| Overall (Golden Set) | 99.7% | 99.5% | 100% |
|  | 100% | 100% | 100% |
| Credit Card | 100% | 100% | 100% |
| Date / Time | 99.6% | 100% | 99.2% |
| IP Address | 96.3% | 100% | 92.9% |
| National ID | 94.7% | 100% | 90.0% |
| Phone Number | 94.5% | 100% | 89.5% |
| Address / Location | 93.4% | 97.7% | 89.4% |
| Coordinates | 92.3% | 85.7% | 100% |
| Person Name | 89.4% | 89.6% | 89.2% |
| Company | 88.9% | 80.0% | 100% |
| Gender | 81.7% | 69.1% | 100% |
Benchmark: 1,000 samples, 5,000+ entities from the ai4privacy dataset.
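F1 is the harmonic mean of precision and recall, so the F1 column can be cross-checked against the other two. The sketch below does this for a few rows of the table above. Because the published precision and recall are themselves rounded, a recomputed value can differ from the reported one by about 0.1 point.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the table above.
rows = {
    "IP Address": (100.0, 92.9, 96.3),
    "National ID": (100.0, 90.0, 94.7),
    "Person Name": (89.6, 89.2, 89.4),
    "Gender": (69.1, 100.0, 81.7),
}

for entity, (p, r, reported) in rows.items():
    computed = 100 * f1_score(p / 100, r / 100)
    print(f"{entity}: computed {computed:.1f}%, reported {reported}%")
```

The Gender row shows how the harmonic mean behaves: with 100% recall and 69.1% precision, F1 lands at 81.7%, below the arithmetic mean of roughly 84.6%.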