GitHub - DebarghaG/forte: Official Implementation for Forte: Finding Outliers using Representation Typicality Estimation (ICLR 2025)


Forte: Finding Outliers Using Representation Typicality Estimation

License: MIT

Why OOD?

Out-of-Distribution (OOD) detection is one of the most important problems for safe, deployable ML:

  1. Provides the first line of defense by preventing silent failures in critical ML systems
  2. Bounds AI capabilities by recognizing the limits of a model's knowledge
  3. Allows safe fallback and enables human oversight when needed

Why Forte?

Forte takes a novel approach to OOD detection with several key advantages:

  1. Utilizes self-supervised representations to capture semantic features
  2. Incorporates manifold estimation to account for local topology
  3. Minimizes deployment overhead; eliminates additional model training requirements
  4. Requires no class labels, no exposure to OOD data during training, and no restrictions on the architecture of predictive or generative models
  5. Strong domain generalization – evaluated on tasks such as synthetic-image detection and medical (MRI) imaging

Key Innovation

Forte treats OOD detection as middleware in deployments. The approach is designed to be plug-and-play, requiring minimal setup and configuration.
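To make the "middleware" idea concrete, here is a minimal, hypothetical sketch of gating a model's predictions behind an OOD score. The `ood_score` function, the threshold, and the toy model are illustrative stand-ins, not the repo's actual API:

```python
# Hypothetical sketch: OOD detection as middleware that wraps a model
# and abstains on atypical inputs, enabling a safe fallback path.
from typing import Any, Callable


def ood_middleware(model: Callable[[Any], Any],
                   ood_score: Callable[[Any], float],
                   threshold: float) -> Callable[[Any], dict]:
    """Return a wrapped model that abstains when the input looks atypical."""
    def guarded(x):
        if ood_score(x) > threshold:
            # Atypical input: abstain and defer to a fallback / human review.
            return {"prediction": None, "abstained": True}
        return {"prediction": model(x), "abstained": False}
    return guarded


# Toy usage: the "score" is distance from the training region around 0.
classify = ood_middleware(model=lambda x: "cat" if x < 0.5 else "dog",
                          ood_score=lambda x: abs(x),
                          threshold=2.0)

print(classify(0.2))  # in-distribution input -> normal prediction
print(classify(9.0))  # atypical input -> abstains
```

Because the wrapper only needs a scoring function, the underlying predictive model never has to be retrained, which matches the plug-and-play framing above.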

Quick Start

# Clone the repository
git clone https://github.com/DebarghaG/forte.git
cd forte

# Create and activate a virtual environment
python3 -m venv env
source env/bin/activate

# Install dependencies
pip install scikit-learn numpy scipy transformers torch torchvision pillow tqdm

Basic Usage

Simply provide your data folders:

python main.py --id_images_directories '../data/imagenet_1k' \
    --id_images_names imagenet1k \
    --ood_images_directories '../data/inaturalist_images' \
    --ood_images_names inaturalist_images \
    --batch_size 512 \
    --device cuda:0 \
    --embedding_dir ../embeddings/ \
    --num_seeds 5 \
    --run_baselines False

Technical Approach

Forte combines representation learning with statistical estimation:

  1. Uses self-supervised models to extract semantic features from images
  2. Estimates typical sets using nearest neighbor statistics
  3. Fits a density estimator (KDE, OCSVM, or GMM) to the statistics of in-distribution data
  4. Evaluates samples using precision, recall, density, and coverage metrics
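The steps above can be sketched end-to-end on synthetic data. This is an illustrative approximation, not the repository's implementation: random vectors stand in for self-supervised embeddings, the mean k-nearest-neighbor distance is used as a simple typicality statistic, and a GMM (one of the estimators named above) is fit to those statistics:

```python
# Illustrative Forte-style pipeline on synthetic "embeddings" (not the repo's code).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Step 1 stand-in: ID embeddings form a tight cluster; OOD embeddings are shifted.
id_embeddings = rng.normal(0.0, 1.0, size=(500, 16))
ood_embeddings = rng.normal(4.0, 1.0, size=(100, 16))

# Step 2: capture local topology via k-nearest-neighbor statistics
# computed against the in-distribution reference set.
knn = NearestNeighbors(n_neighbors=5).fit(id_embeddings)

def knn_features(x):
    """Mean distance to the k nearest ID neighbors, a simple typicality proxy."""
    dists, _ = knn.kneighbors(x)
    return dists.mean(axis=1, keepdims=True)

# Step 3: fit a density estimator (here a GMM) on the ID typicality statistics.
gmm = GaussianMixture(n_components=2, random_state=0).fit(knn_features(id_embeddings))

# Step 4: score held-out samples; lower log-likelihood means more OOD-like.
id_scores = gmm.score_samples(knn_features(id_embeddings))
ood_scores = gmm.score_samples(knn_features(ood_embeddings))
print(id_scores.mean() > ood_scores.mean())  # ID samples look more typical
```

In this sketch the separation comes entirely from the kNN statistics: OOD points sit far from every ID neighbor, so their typicality features fall outside the density fitted on ID data.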

The method achieves state-of-the-art (SoTA) performance across various benchmarks and real-world applications.

Figures: Unsupervised Methods Comparison; Supervised Methods Comparison.

Citation

@inproceedings{ganguly2025forte,
  title={Forte : Finding Outliers with Representation Typicality Estimation},
  author={Debargha Ganguly and Warren Richard Morningstar and Andrew Seohwan Yu and Vipin Chaudhary},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=7XNgVPxCiA}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Research supported by ICICLE AI Institute.