Hushbee Detection Engine | How It Works

Hush Engine

Open-source Python engine for redacting PII in images, PDFs, and spreadsheets. Wide-ranging detection, 93% F1 on Kaggle, 266 ms per document, and no cloud round-trip.

Supports Privacy Filter

Open source under AGPL-3.0. Commercial license available for proprietary use.

Built on Python 3.10+ and Microsoft Presidio, with custom regex and checksum validators (Luhn, Verhoeff, and mod-97 checksums, plus the python-stdnum and phonenumbers libraries) and an early-exit NER cascade (LightGBM → NLTagger → name database → optional Flair/Transformers/GLiNER/OpenAI Privacy Filter). Apple Vision handles OCR, faces, and barcodes.
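The early-exit idea behind the NER cascade can be illustrated with a minimal sketch. The stage functions below (`fast_classifier`, `name_lookup`, `heavy_model`) are hypothetical stand-ins, not Hush Engine's actual stages or API; the sketch only shows the control flow of trying cheap detectors first and stopping at the first one that answers.

```python
from typing import Callable, Optional

# A detector returns a label, or None when it has no confident answer.
Detector = Callable[[str], Optional[str]]


def run_cascade(
    text: str, stages: list[tuple[str, Detector]]
) -> tuple[Optional[str], Optional[str]]:
    """Try each stage in order; return (label, stage_name) from the first hit."""
    for name, detect in stages:
        label = detect(text)
        if label is not None:
            return label, name  # early exit: later, costlier stages never run
    return None, None


# Hypothetical stand-ins, ordered from cheapest to most expensive.
KNOWN_NAMES = {"alice", "bob"}
calls: list[str] = []  # records how often the expensive stage is reached


def fast_classifier(text: str) -> Optional[str]:
    # Placeholder for a lightweight model: answers only on an obvious pattern.
    return "EMAIL" if "@" in text else None


def name_lookup(text: str) -> Optional[str]:
    # Placeholder for a name-database lookup.
    return "PERSON" if text.lower() in KNOWN_NAMES else None


def heavy_model(text: str) -> Optional[str]:
    # Placeholder for an optional, slower model.
    calls.append(text)
    return "PERSON" if text.istitle() else None


STAGES = [("fast", fast_classifier), ("names", name_lookup), ("heavy", heavy_model)]

print(run_cascade("alice", STAGES))  # prints ('PERSON', 'names')
```

In this sketch the expensive stage is reached only when the cheaper ones return `None`, which is where the latency savings of an early-exit design would come from.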

A wide range of entity types across several categories

| Category | Entity Types |
|---|---|
| Personal Identity | Person, Face, Gender, NRP, Age |
| Contact Info | Email, Phone Number, URL |
| Government IDs | SSN, National ID, Passport, Driver’s License |
| Financial | Credit Card, IBAN, Financial Account, Crypto Wallet |
| Medical | Medical Record, Diagnosis, Medication, ICD-10, Lab Result |
| Credentials | Password, PIN, API Key, Token, AWS Key, Stripe Key |
| Network & Location | IP Address, MAC Address, Coordinates, Address, IMEI |
| Visual & Other | QR Code, Barcode, Vehicle ID, Company, Date/Time, Username |
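For entity types such as Credit Card and IBAN, a regex match alone produces false positives, which is why checksum validators like Luhn and mod-97 are useful. The following is a minimal sketch of those two standard algorithms; the function names are illustrative and are not taken from the Hush Engine codebase, and a production validator (for example via python-stdnum) would also check country-specific lengths and formats.

```python
def luhn_valid(number: str) -> bool:
    """Luhn check, used for credit card numbers (spaces and dashes allowed)."""
    stripped = number.replace(" ", "").replace("-", "")
    if len(stripped) < 2 or not (stripped.isascii() and stripped.isdigit()):
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, ch in enumerate(reversed(stripped)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """ISO 7064 mod-97 check for IBANs (structure only, no per-country length)."""
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]  # move country code and check digits to the end
    # Digits stay as-is; letters map to 10..35 (A=10, ..., Z=35).
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


print(luhn_valid("4111 1111 1111 1111"))          # True
print(iban_valid("GB82 WEST 1234 5698 7654 32"))  # True
```

Both inputs are widely used test values rather than real account numbers.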

20× the speed, a fraction of the size

| Model | Local / Cloud | Precision | Recall | F1 Score | Speed (ms/doc) | Parse Failures | Size |
|---|---|---|---|---|---|---|---|
| Hush Engine | Local | 99.6% | 94.5% | 97.0% | 80 | 0.0% | ~15 MB |
| Llama 3.2 (1B) | Local | 94.9% | 97.1% | 96.0% | 680 | 14.5% | 1.3 GB |
| Claude Haiku 4.5 | Cloud | 93.3% | 99.9% | 96.5% | 1,141 | 0.0% | Cloud |
| Qwen 2.5 (7B) | Local | 97.1% | 98.2% | 97.7% | 1,577 | 0.0% | 4.7 GB |
| Gemini 2.5 Flash | Cloud | 97.9% | 100.0% | 98.9% | 1,766 | 0.5% | Cloud |
| Mistral 7B | Local | 99.8% | 96.9% | 98.4% | 2,230 | 0.0% | 7.2 GB |
| Phi-4 (14B) | Local | 96.9% | 98.4% | 97.6% | 2,890 | 0.0% | 9.1 GB |
| Gemma 2 (9B) | Local | 99.8% | 97.9% | 98.8% | 3,419 | 0.0% | 5.4 GB |

Benchmarked on 1,000 synthetic PII samples (10 entity types) using identical zero-shot prompts. Local models ran on an Apple M2 Max (32 GB); cloud latency includes the network round-trip.
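To put the speed column in relative terms, the sketch below divides each model's reported latency by Hush Engine's 80 ms/doc. The numbers are copied from the table above; the helper name `speedups` is illustrative, and the source does not state how the headline multiple was derived, so this is only one way to read the data.

```python
from statistics import median

HUSH_MS = 80  # Hush Engine, ms per document (from the table)

# Reported latency of the other models, ms per document (from the table).
OTHERS_MS = {
    "Llama 3.2 (1B)": 680,
    "Claude Haiku 4.5": 1141,
    "Qwen 2.5 (7B)": 1577,
    "Gemini 2.5 Flash": 1766,
    "Mistral 7B": 2230,
    "Phi-4 (14B)": 2890,
    "Gemma 2 (9B)": 3419,
}


def speedups(baseline_ms: float, others: dict[str, float]) -> dict[str, float]:
    """Return how many times slower each model is than the baseline."""
    return {name: ms / baseline_ms for name, ms in others.items()}


ratios = speedups(HUSH_MS, OTHERS_MS)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
print(f"median: {median(ratios.values()):.1f}x")
```

On these figures the ratios run from 8.5x (Llama 3.2) to about 42.7x (Gemma 2), with a median of about 22x.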

Industry-leading accuracy where it matters most

| Entity Type | F1 Score | Precision | Recall |
|---|---|---|---|
| Overall (ai4privacy) | 94.6% | 96.8% | 92.6% |
| Overall (Golden Set) | 99.7% | 99.5% | 100% |
| Email | 100% | 100% | 100% |
| Credit Card | 100% | 100% | 100% |
| Date / Time | 99.6% | 100% | 99.2% |
| IP Address | 96.3% | 100% | 92.9% |
| National ID | 94.7% | 100% | 90.0% |
| Phone Number | 94.5% | 100% | 89.5% |
| Address / Location | 93.4% | 97.7% | 89.4% |
| Coordinates | 92.3% | 85.7% | 100% |
| Person Name | 89.4% | 89.6% | 89.2% |
| Company | 88.9% | 80.0% | 100% |
| Gender | 81.7% | 69.1% | 100% |

Benchmark: 1,000 samples and 5,000+ entities from the ai4privacy dataset.
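The F1 scores above are the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick sanity check, the sketch below recomputes F1 for a few rows from their reported precision and recall. Because the published inputs are already rounded to one decimal place, the recomputed value can differ from the reported one by up to about 0.1 percentage points.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the table above.
ROWS = {
    "Overall (ai4privacy)": (96.8, 92.6, 94.6),
    "Phone Number": (100.0, 89.5, 94.5),
    "Coordinates": (85.7, 100.0, 92.3),
    "Gender": (69.1, 100.0, 81.7),
}

for name, (p, r, reported) in ROWS.items():
    computed = f1_score(p, r)
    print(f"{name}: computed {computed:.2f}%, reported {reported}%")
```

The Gender row shows how the harmonic mean penalizes imbalance: perfect recall with 69.1% precision still yields an F1 of only about 81.7%.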