GitHub - IABTechLab/iab-mapper: Local IAB Content Taxonomy Mapper (2.x → 3.0). Deterministic → fuzzy → optional embeddings. Exports OpenRTB & VAST-ready category IDs with vector attributes and SCD awareness.

IAB Content Taxonomy Mapper

Map IAB Content Taxonomy 2.x labels and codes to IAB 3.0 locally with a deterministic → fuzzy → (optional) semantic pipeline. Outputs are IAB 3.0 IDs ready for OpenRTB/VAST, with optional vector attributes (Channel, Type, Format, Language, Source, Environment) and SCD awareness.

Available in both Python and JavaScript/TypeScript.

🎯 What it does

The IAB Mapper helps you migrate from IAB Content Taxonomy 2.x to 3.0 by:

  1. Input: Your existing 2.x codes/labels (CSV or JSON)
  2. Process: Deterministic matching → fuzzy matching → optional semantic enhancement
  3. Output: Valid IAB 3.0 IDs ready for OpenRTB/VAST integration

Example:

# Input: 2.x codes
"1-4","Sports"
"2-12","Food & Drink"

# Output: 3.0 IDs
"483","Sports"
"3-5-2","Food & Drink > Cooking"

Perfect for ad tech teams, content platforms, and anyone migrating to IAB 3.0.


📦 Client Libraries

This repository contains client libraries in two languages:

🐍 Python

Location: /python
Package: iab-mapper on PyPI

pip install iab-mapper

Features:

  • CLI tool (iab-mapper)
  • Python API for programmatic use
  • Optional embeddings support (Sentence-Transformers)
  • Optional LLM re-ranking (Ollama)
  • Web demo with FastAPI

→ Python documentation: python/README.md


📦 JavaScript/TypeScript (Node.js)

Location: /javascript
Package: @mixpeek/iab-mapper on npm

npm install @mixpeek/iab-mapper

Features:

  • Full TypeScript support with type definitions
  • CommonJS and ES Modules compatible
  • Zero dependencies for core functionality
  • Lightweight fuzzy matching (fuzzball)

→ JavaScript documentation: javascript/README.md


🚀 Quick Start Examples

Python

from iab_mapper.pipeline import Mapper, MapConfig

mapper = Mapper(MapConfig(fuzzy_cut=0.92, max_topics=3), data_dir="./python/data")

result = mapper.map_record({
    "code": "2-12",
    "label": "Food & Drink",
    "channel": "editorial"
})

print(result["openrtb"])
# {"content": {"cat": ["3-5-2", "1026"], "cattax": "2"}}
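
For batch migrations, the same call can be looped over a CSV export. The sketch below is illustrative rather than part of the package: `map_csv` and `fake_map_record` are hypothetical names, and the stub only mimics the `result["openrtb"]` shape shown above so the example runs without `iab-mapper` installed. In real use you would pass `mapper.map_record` instead of the stub.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List


def map_csv(rows: Iterable[Dict[str, str]],
            map_record: Callable[[Dict[str, str]], dict]) -> List[dict]:
    """Run each CSV row through a map_record-style callable and flatten the result."""
    out = []
    for row in rows:
        result = map_record(row)
        content = result["openrtb"]["content"]
        out.append({
            "code": row.get("code", ""),
            "label": row.get("label", ""),
            # Join multiple 3.0 IDs so they fit in a single CSV cell.
            "cat": ";".join(content["cat"]),
            "cattax": content["cattax"],
        })
    return out


def fake_map_record(record: Dict[str, str]) -> dict:
    """Hypothetical stand-in for mapper.map_record; returns the sample output shape."""
    return {"openrtb": {"content": {"cat": ["3-5-2", "1026"], "cattax": "2"}}}


sample = io.StringIO('code,label\n"2-12","Food & Drink"\n')
mapped = map_csv(csv.DictReader(sample), fake_map_record)
print(mapped[0])
```

Keeping the original 2.x code and label next to the mapped IDs makes the output easy to audit row by row.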

JavaScript

const { Mapper } = require('@mixpeek/iab-mapper');

const mapper = new Mapper({ fuzzyCut: 0.92, maxTopics: 3 });

const result = mapper.mapRecord({
  code: '2-12',
  label: 'Food & Drink',
  channel: 'editorial'
});

console.log(result.openrtb);
// { content: { cat: ['3-5-2', '1026'], cattax: '2' } }

TypeScript

import { Mapper, MapConfig } from '@mixpeek/iab-mapper';

const config: MapConfig = { fuzzyCut: 0.92, maxTopics: 3 };
const mapper = new Mapper(config);

const result = mapper.mapRecord({
  label: 'Food & Drink',
  channel: 'editorial'
});

📁 Repository Structure

iab-mapper/
├── python/                  # Python implementation
│   ├── iab_mapper/         # Python package source
│   ├── data/               # IAB taxonomy data files
│   ├── pyproject.toml      # Python package config
│   ├── README.md           # Python documentation
│   └── ...
│
├── javascript/             # JavaScript/TypeScript implementation
│   ├── src/               # TypeScript source
│   ├── data/              # IAB taxonomy data files
│   ├── examples/          # Usage examples
│   ├── package.json       # npm package config
│   ├── tsconfig.json      # TypeScript config
│   ├── README.md          # JavaScript documentation
│   └── ...
│
├── demo/                  # Web demo (uses Python backend)
├── scripts/               # Utility scripts
├── tests/                 # Python tests
├── assets/                # Documentation assets
└── README.md             # This file

✨ Features

Both implementations provide:

  • Deterministic alias/exact matching → fuzzy string matching
  • IAB 3.0 ID emission (not just labels) with configurable cattax for OpenRTB
  • Multi-category output per input
  • Vector attributes support (Channel, Type, Format, Language, Source, Environment)
  • SCD (Sensitive Content) flag visibility and optional exclusion
  • OpenRTB and VAST CONTENTCAT helpers
  • Local-only, reproducible, versioned catalogs
  • Override support for manual mappings

Python-specific features:

  • 🐍 Local embeddings (Sentence-Transformers) for semantic matching
  • 🐍 Optional LLM re-ranking via Ollama
  • 🐍 CLI tool with CSV/JSON I/O
  • 🐍 Web UI demo
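
Several of the shared features above (overrides, SCD exclusion, and the OpenRTB/VAST helpers) amount to small post-processing steps on a list of 3.0 IDs. The following is a minimal sketch of that idea, not the library's implementation: the catalog rows, the `9001` ID, and every function name are hypothetical, and the comma-separated CONTENTCAT format is an assumption you should check against the VAST version you target.

```python
from typing import Dict, List

# Hypothetical catalog rows: 3.0 ID -> label and SCD flag (illustrative values only).
CATALOG: Dict[str, dict] = {
    "483": {"label": "Sports", "scd": False},
    "3-5-2": {"label": "Food & Drink > Cooking", "scd": False},
    "9001": {"label": "Example Sensitive Topic", "scd": True},
}

# Hypothetical manual overrides: a 2.x code mapped straight to 3.0 IDs.
OVERRIDES: Dict[str, List[str]] = {"1-4": ["483"]}


def resolve(code: str, matched_ids: List[str], drop_scd: bool = False) -> List[str]:
    """Prefer a manual override, then optionally drop SCD-flagged IDs."""
    ids = OVERRIDES.get(code, matched_ids)
    if drop_scd:
        ids = [i for i in ids if not CATALOG.get(i, {}).get("scd", False)]
    return ids


def to_openrtb(ids: List[str], cattax: str) -> dict:
    """Build an OpenRTB-style content object; the caller chooses cattax."""
    return {"content": {"cat": list(ids), "cattax": cattax}}


def to_contentcat(ids: List[str]) -> str:
    """Assumed VAST CONTENTCAT value: comma-separated IDs, not yet URL-encoded."""
    return ",".join(ids)


ids = resolve("2-12", ["3-5-2", "9001"], drop_scd=True)
print(to_openrtb(ids, cattax="2"))
print(to_contentcat(ids))
```

The `cattax` value here simply mirrors the sample output in the Quick Start; confirm the value your exchange expects for Content Taxonomy 3.0 before shipping.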

🔎 Why migrate to IAB 3.0?

  • 3.0 introduces clearer separation of primary topic "aboutness" vs. orthogonal vectors (e.g., news vs. opinion, formats, channels)
  • Better support for CTV/video, podcasts, games, and app stores
  • Not backwards compatible in areas such as News/Opinion and entertainment genres, so careful migration is required

This tool makes migration practical: it emits valid 3.0 IDs and helps curate edge cases with overrides, synonyms, thresholds, and audit outputs.


🧠 How it works

  1. Normalize text and apply alias/exact matches via synonyms
  2. Fuzzy retrieval (rapidfuzz | TF‑IDF | BM25) with configurable thresholds
  3. Optional semantic augmentation with local embeddings (Python only)
  4. Optional local LLM re‑ranking (Python only)
  5. Assemble outputs: topic IDs + vector IDs → OpenRTB content.cat with configurable cattax
  6. SCD flags are surfaced and can be excluded with --drop-scd
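
Steps 1 and 2 can be sketched as a short cascade: normalize, try a synonym or exact hit, and only then fall back to fuzzy scoring. This is an illustration under stated assumptions, not the project's code. It swaps in the standard library's `difflib.SequenceMatcher` for rapidfuzz/TF-IDF/BM25, and the two-row catalog, the synonym table, and the `map_label` function are hypothetical.

```python
import difflib
import re
from typing import Dict, List, Tuple

# Hypothetical miniature 3.0 catalog (normalized label -> ID) and synonym table.
CATALOG_3_0: Dict[str, str] = {
    "sports": "483",
    "food & drink > cooking": "3-5-2",
}
SYNONYMS: Dict[str, str] = {"food & drink": "food & drink > cooking"}


def normalize(text: str) -> str:
    """Lowercase, trim, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def map_label(label: str, fuzzy_cut: float = 0.92) -> List[Tuple[str, str, float]]:
    """Return (id, method, score) tuples, trying deterministic matches first."""
    key = normalize(label)
    key = SYNONYMS.get(key, key)  # alias step
    if key in CATALOG_3_0:        # exact step
        return [(CATALOG_3_0[key], "exact", 1.0)]
    hits = []
    for candidate, cat_id in CATALOG_3_0.items():
        # Similarity ratio in [0, 1]; stands in for the real fuzzy retrievers.
        score = difflib.SequenceMatcher(None, key, candidate).ratio()
        if score >= fuzzy_cut:
            hits.append((cat_id, "fuzzy", round(score, 3)))
    return sorted(hits, key=lambda h: h[2], reverse=True)


print(map_label("  Food &  Drink "))
print(map_label("Sprots", fuzzy_cut=0.8))
print(map_label("Sprots"))
```

Running the deterministic steps first keeps known mappings reproducible; the threshold only decides what happens to labels that no alias or exact match covers.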

🖥️ Web Demo

The repository includes a web UI with a FastAPI backend:

cd python
python -m venv .venv && source .venv/bin/activate
pip install -e .
pip install -r requirements-dev.txt
uvicorn scripts.web_server:app --port 8000 --reload

Open http://localhost:8000/ in your browser.


🔐 Security & Operations

  • Local-first: Processing happens on your machine; no external APIs needed
  • No PII required: CSV/JSON processed in-memory
  • Air‑gapped capable: Prebundle models and run fully offline

📜 License

BSD 2-Clause. See LICENSE.

Include IAB attribution in your deployed UI/footer:

"IAB is a registered trademark of the Interactive Advertising Bureau. This tool is an independent utility built by Mixpeek for interoperability with IAB Content Taxonomy standards."


📞 Support & Contact

For enterprise support, custom integrations, or questions about multimodal classification extensions, reach out to the Mixpeek team.


🌟 Language-Specific Documentation

  • Python: python/README.md
  • JavaScript/TypeScript: javascript/README.md

Made with ❤️ by Mixpeek