IAB Content Taxonomy Mapper
View iab-mapper on GitHub • Open Mixpeek — IAB Taxonomy Mapper
📺 Watch the 5-minute walkthrough
Map IAB Content Taxonomy 2.x labels/codes to IAB 3.0 locally with a deterministic → fuzzy → (optional) semantic pipeline. Outputs are IAB‑3.0–compatible IDs for OpenRTB/VAST, with optional vector attributes (Channel, Type, Format, Language, Source, Environment) and SCD awareness.
Available in both Python and JavaScript/TypeScript
🎯 What it does
The IAB Mapper helps you migrate from IAB Content Taxonomy 2.x to 3.0 in three stages:
- Input: Your existing 2.x codes/labels (CSV or JSON)
- Process: Deterministic matching → fuzzy matching → optional semantic enhancement
- Output: Valid IAB 3.0 IDs ready for OpenRTB/VAST integration
Example:
# Input: 2.x codes
"1-4","Sports"
"2-12","Food & Drink"
# Output: 3.0 IDs
"483","Sports"
"3-5-2","Food & Drink > Cooking"
Perfect for ad tech teams, content platforms, and anyone migrating to IAB 3.0.
📦 Client Libraries
This repository contains client libraries in multiple languages:
🐍 Python
Location: /python
Package: iab-mapper on PyPI
Features:
- CLI tool (iab-mapper)
- Python API for programmatic use
- Optional embeddings support (Sentence-Transformers)
- Optional LLM re-ranking (Ollama)
- Web demo with FastAPI
📦 JavaScript/TypeScript (Node.js)
Location: /javascript
Package: @mixpeek/iab-mapper on npm
npm install @mixpeek/iab-mapper
Features:
- Full TypeScript support with type definitions
- CommonJS and ES Modules compatible
- Zero dependencies for core functionality
- Lightweight fuzzy matching (fuzzball)
→ View JavaScript Documentation
🚀 Quick Start Examples
Python
from iab_mapper.pipeline import Mapper, MapConfig

mapper = Mapper(MapConfig(fuzzy_cut=0.92, max_topics=3), data_dir="./python/data")
result = mapper.map_record({
    "code": "2-12",
    "label": "Food & Drink",
    "channel": "editorial",
})
print(result["openrtb"])
# {"content": {"cat": ["3-5-2", "1026"], "cattax": "2"}}
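For bulk migration, the single-record call above can be wrapped in a loop over CSV rows. The sketch below is a minimal illustration, assuming a headerless two-column (code, label) CSV like the example earlier in this README. `map_csv`, `RecordMapper`, and `EchoMapper` are hypothetical names introduced here; the only library method relied on is `map_record`.

```python
import csv
import io
from typing import Any, Dict, List, Protocol


class RecordMapper(Protocol):
    """Anything exposing map_record(dict) -> dict, e.g. the Mapper above."""

    def map_record(self, record: Dict[str, Any]) -> Dict[str, Any]: ...


def map_csv(text: str, mapper: RecordMapper) -> List[Dict[str, Any]]:
    """Map each row of a headerless two-column (code, label) CSV."""
    results = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        code, label = row[0].strip(), row[1].strip()
        results.append(mapper.map_record({"code": code, "label": label}))
    return results


class EchoMapper:
    """Stand-in for illustration only; not part of iab-mapper."""

    def map_record(self, record: Dict[str, Any]) -> Dict[str, Any]:
        return {"input": record}


rows = map_csv('"1-4","Sports"\n"2-12","Food & Drink"\n', EchoMapper())
```

In real use, pass the `Mapper` instance from the Quick Start in place of `EchoMapper`.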
JavaScript
const { Mapper } = require('@mixpeek/iab-mapper');

const mapper = new Mapper({ fuzzyCut: 0.92, maxTopics: 3 });
const result = mapper.mapRecord({
  code: '2-12',
  label: 'Food & Drink',
  channel: 'editorial'
});
console.log(result.openrtb);
// { content: { cat: ['3-5-2', '1026'], cattax: '2' } }
TypeScript
import { Mapper, MapConfig } from '@mixpeek/iab-mapper';

const config: MapConfig = { fuzzyCut: 0.92, maxTopics: 3 };
const mapper = new Mapper(config);
const result = mapper.mapRecord({
  label: 'Food & Drink',
  channel: 'editorial'
});
📁 Repository Structure
iab-mapper/
├── python/ # Python implementation
│ ├── iab_mapper/ # Python package source
│ ├── data/ # IAB taxonomy data files
│ ├── pyproject.toml # Python package config
│ ├── README.md # Python documentation
│ └── ...
│
├── javascript/ # JavaScript/TypeScript implementation
│ ├── src/ # TypeScript source
│ ├── data/ # IAB taxonomy data files
│ ├── examples/ # Usage examples
│ ├── package.json # npm package config
│ ├── tsconfig.json # TypeScript config
│ ├── README.md # JavaScript documentation
│ └── ...
│
├── demo/ # Web demo (uses Python backend)
├── scripts/ # Utility scripts
├── tests/ # Python tests
├── assets/ # Documentation assets
└── README.md # This file
✨ Features
Both implementations provide:
- ✅ Deterministic alias/exact matching → fuzzy string matching
- ✅ IAB 3.0 ID emission (not just labels) with configurable cattax for OpenRTB
- ✅ Multi-category output per input
- ✅ Vector attributes support (Channel, Type, Format, Language, Source, Environment)
- ✅ SCD (Sensitive Content) flag visibility and optional exclusion
- ✅ OpenRTB and VAST CONTENTCAT helpers
- ✅ Local-only, reproducible, versioned catalogs
- ✅ Override support for manual mappings
Python-specific features:
- 🐍 Local embeddings (Sentence-Transformers) for semantic matching
- 🐍 Optional LLM re-ranking via Ollama
- 🐍 CLI tool with CSV/JSON I/O
- 🐍 Web UI demo
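To make the OpenRTB output and SCD exclusion concrete, here is a rough sketch of how topic IDs and vector IDs might be combined into a `content.cat` payload. It is not the library's implementation: `build_openrtb`, the `{"id", "scd"}` topic shape, and the ID "9-9-9" are assumptions made up for illustration. Only the output shape follows the Quick Start example.

```python
from typing import Dict, List


def build_openrtb(topics: List[Dict], vector_ids: List[str],
                  cattax: str = "2", drop_scd: bool = False) -> Dict:
    """Assemble an OpenRTB-style content object from mapped topics.

    Each topic is a dict with an "id" and an optional "scd" flag; this
    shape is an assumption for the sketch, not the library's schema.
    """
    cat: List[str] = []
    for topic in topics:
        if drop_scd and topic.get("scd", False):
            continue  # exclude sensitive-content categories on request
        if topic["id"] not in cat:
            cat.append(topic["id"])  # keep first occurrence, preserve order
    for vid in vector_ids:
        if vid not in cat:
            cat.append(vid)  # vector attribute IDs follow the topic IDs
    return {"content": {"cat": cat, "cattax": cattax}}


# "9-9-9" is a made-up ID standing in for an SCD-flagged category.
topics = [{"id": "3-5-2", "scd": False}, {"id": "9-9-9", "scd": True}]
payload = build_openrtb(topics, ["1026"], drop_scd=True)
```

With `drop_scd=False`, the flagged category stays in `cat`, so downstream systems can decide how to handle it.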
🔎 Why migrate to IAB 3.0?
- 3.0 introduces clearer separation of primary topic "aboutness" vs. orthogonal vectors (e.g., news vs. opinion, formats, channels)
- Better support for CTV/video, podcasts, games, and app stores
- Non‑backwards compatible in areas like News/Opinion and entertainment genres; careful migration is required
This tool makes migration practical: it emits valid 3.0 IDs and helps curate edge cases with overrides, synonyms, thresholds, and audit outputs.
🧠 How it works
- Normalize text and apply alias/exact matches via synonyms
- Fuzzy retrieval (rapidfuzz | TF‑IDF | BM25) with configurable thresholds
- Optional semantic augmentation with local embeddings (Python only)
- Optional local LLM re‑ranking (Python only)
- Assemble outputs: topic IDs + vector IDs → OpenRTB content.cat with configurable cattax
- SCD flags are surfaced and can be excluded with --drop-scd
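The first two stages (normalize and alias/exact match, then fuzzy fallback) can be sketched with the standard library alone. This is an illustrative approximation, not the package's code: `difflib.SequenceMatcher` stands in for rapidfuzz, the two-entry catalog reuses the IDs from this README's example rather than the real data files, and `map_label`, `CATALOG`, and `SYNONYMS` are hypothetical names.

```python
import difflib
import re
from typing import Dict, List, Tuple

# Tiny illustrative catalog (normalized label -> 3.0 ID); not the real data files.
CATALOG: Dict[str, str] = {
    "sports": "483",
    "food & drink > cooking": "3-5-2",
}
# Alias table: normalized input label -> normalized catalog label (assumed shape).
SYNONYMS: Dict[str, str] = {"food & drink": "food & drink > cooking"}


def normalize(text: str) -> str:
    """Lowercase, trim, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def map_label(label: str, fuzzy_cut: float = 0.92,
              max_topics: int = 3) -> List[Tuple[str, float, str]]:
    """Return (id, score, method) tuples: exact/alias first, fuzzy as fallback."""
    key = normalize(label)
    key = SYNONYMS.get(key, key)
    if key in CATALOG:
        return [(CATALOG[key], 1.0, "exact")]
    # Fuzzy stage: score every catalog label and keep those above the cut.
    scored = [
        (cat_id, difflib.SequenceMatcher(None, key, name).ratio(), "fuzzy")
        for name, cat_id in CATALOG.items()
    ]
    hits = [s for s in scored if s[1] >= fuzzy_cut]
    hits.sort(key=lambda s: s[1], reverse=True)
    return hits[:max_topics]
```

A misspelled label such as "Sprots" falls through to the fuzzy stage and matches only if its score clears `fuzzy_cut`, which is why lowering the threshold trades precision for recall.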
🖥️ Web Demo
The repository includes a web UI with FastAPI backend:
cd python
python -m venv .venv && source .venv/bin/activate
pip install -e .
pip install -r requirements-dev.txt
uvicorn scripts.web_server:app --port 8000 --reload
Open http://localhost:8000/ in your browser.
📎 Official IAB References
- IAB Tech Lab: https://iabtechlab.com/
- Content Taxonomy: https://iabtechlab.com/standards/content-taxonomy/
- GitHub: https://github.com/mixpeek/iab-mapper
- Web Tool: https://mixpeek.com/tools/iab-taxonomy-mapper
🔐 Security & Operations
- Local-first: Processing happens on your machine; no external APIs needed
- No PII required: CSV/JSON processed in-memory
- Air‑gapped capable: Prebundle models and run fully offline
📜 License
BSD 2-Clause. See LICENSE.
Include IAB attribution in your deployed UI/footer:
"IAB is a registered trademark of the Interactive Advertising Bureau. This tool is an independent utility built by Mixpeek for interoperability with IAB Content Taxonomy standards."
📞 Support & Contact
- Issues: GitHub Issues
- Documentation: Mixpeek IAB Mapper
- Questions: Open an issue
- Enterprise support: Contact Mixpeek
For enterprise support, custom integrations, or questions about multimodal classification extensions, reach out to the Mixpeek team.
🌟 Language-Specific Documentation
- Python: python/README.md
- JavaScript/TypeScript: javascript/README.md
Made with ❤️ by Mixpeek
