KARMA-OpenMedEvalKit

KARMA is designed for researchers, developers, and healthcare organizations who need reliable evaluation of medical AI systems.

Extensible

Bring your own model, dataset, or even metric. Integrated with Hugging Face, with support for local evaluation as well.


Fast & Efficient

Process thousands of medical examples efficiently with intelligent caching and batch processing.
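Caching and batching work well together: only inputs without a stored result need a model call, and those calls can be grouped into batches. The sketch below illustrates that idea under stated assumptions. `PredictionCache` and `evaluate_with_cache` are hypothetical names, not KARMA's API, and `sqlite3` stands in for the DuckDB/DynamoDB backends KARMA mentions only because it ships with Python.

```python
import hashlib
import json
import sqlite3
from typing import Callable, List, Optional, Sequence


class PredictionCache:
    """Hypothetical cache keyed on (model, prompt).

    sqlite3 is a stand-in for a DuckDB/DynamoDB backend, chosen only because
    it is part of the Python standard library.
    """

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, output TEXT)"
        )

    @staticmethod
    def make_key(model: str, prompt: str) -> str:
        # Hash the model name together with the input so models never share entries.
        return hashlib.sha256(json.dumps([model, prompt]).encode("utf-8")).hexdigest()

    def get(self, key: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT output FROM cache WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else row[0]

    def put(self, key: str, output: str) -> None:
        self.conn.execute("INSERT OR REPLACE INTO cache VALUES (?, ?)", (key, output))
        self.conn.commit()


def evaluate_with_cache(
    model: str,
    prompts: Sequence[str],
    generate_batch: Callable[[List[str]], List[str]],
    cache: PredictionCache,
    batch_size: int = 8,
) -> List[Optional[str]]:
    keys = [cache.make_key(model, p) for p in prompts]
    outputs = [cache.get(k) for k in keys]
    # Only cache misses are sent to the model, in fixed-size batches.
    missing = [i for i, out in enumerate(outputs) if out is None]
    for start in range(0, len(missing), batch_size):
        batch_idx = missing[start:start + batch_size]
        batch_out = generate_batch([prompts[i] for i in batch_idx])
        for i, out in zip(batch_idx, batch_out):
            outputs[i] = out
            cache.put(keys[i], out)
    return outputs
```

Calling `evaluate_with_cache` a second time with the same model and prompts returns the stored outputs without invoking `generate_batch` at all, which is where re-evaluation speedups come from.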


Multi-Modal Ready

Supports text, image, and audio evaluation across multiple datasets.


Model Agnostic

Works with any model through a unified interface: Qwen, MedGemma, models accessed via the Bedrock SDK or OpenAI SDK, or your own custom architecture.
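A unified interface means the evaluation loop only needs to know how to call a model, not what kind of backend it is. The sketch below shows that idea with a `typing.Protocol`. `TextModel`, `EchoModel`, `YesNoBaseline`, and `run` are hypothetical illustrations, not KARMA's actual base classes, and the two toy backends stand in for a local checkpoint or an SDK-backed client.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class TextModel(Protocol):
    """Hypothetical unified interface (not KARMA's actual base class)."""

    name: str

    def generate(self, prompts: List[str]) -> List[str]:
        ...


class EchoModel:
    """Toy backend; a real one might wrap a local checkpoint or an SDK client."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate(self, prompts: List[str]) -> List[str]:
        return [f"{self.name}: {p}" for p in prompts]


class YesNoBaseline:
    """Toy rule-based backend: answers 'yes' to anything phrased as a question."""

    name = "yes-no-baseline"

    def generate(self, prompts: List[str]) -> List[str]:
        return ["yes" if p.rstrip().endswith("?") else "no" for p in prompts]


def run(model: TextModel, prompts: List[str]) -> List[str]:
    # The evaluation loop depends only on the interface, never on the backend type.
    if not isinstance(model, TextModel):
        raise TypeError(f"{type(model).__name__} does not implement TextModel")
    return model.generate(prompts)
```

Neither toy class inherits from `TextModel`; structural typing is enough, which is one way a "bring your own model" design can avoid forcing a shared base class.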


Get started with KARMA in minutes:

# Install KARMA
pip install karma-medeval
# Run your first evaluation
karma eval --model "Qwen/Qwen3-0.6B" --datasets openlifescienceai/pubmedqa --max-samples 3

The command prints a JSON report like the following:
{
  "openlifescienceai/pubmedqa": {
    "metrics": {
      "exact_match": {
        "score": 0.3333333333333333,
        "evaluation_time": 0.9702351093292236,
        "num_samples": 3
      }
    },
    "task_type": "mcqa",
    "status": "completed",
    "dataset_args": {},
    "evaluation_time": 7.378399848937988
  },
  "_summary": {
    "model": "Qwen/Qwen3-0.6B",
    "model_path": "Qwen/Qwen3-0.6B",
    "total_datasets": 1,
    "successful_datasets": 1,
    "total_evaluation_time": 7.380354166030884,
    "timestamp": "2025-07-22 18:43:07"
  }
}
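Because the report is plain JSON, post-processing it needs nothing beyond the standard library. The sketch below assumes you already have the JSON text on its own (how you capture it is up to you; no particular KARMA flag is assumed). It uses an abbreviated copy of the report above, and `collect_scores` is a hypothetical helper. The correct-count line further assumes `score` is the fraction of matching samples.

```python
import json

# Abbreviated copy of the example report above (only the fields this sketch reads).
REPORT = """
{
  "openlifescienceai/pubmedqa": {
    "metrics": {
      "exact_match": {"score": 0.3333333333333333, "num_samples": 3}
    },
    "status": "completed"
  },
  "_summary": {"model": "Qwen/Qwen3-0.6B", "total_datasets": 1, "successful_datasets": 1}
}
"""


def collect_scores(report: dict) -> dict:
    """Return {dataset: {metric: score}} for completed datasets, skipping `_summary`."""
    scores = {}
    for dataset, result in report.items():
        if dataset.startswith("_") or result.get("status") != "completed":
            continue
        scores[dataset] = {
            name: metric["score"] for name, metric in result["metrics"].items()
        }
    return scores


report = json.loads(REPORT)
scores = collect_scores(report)
em = report["openlifescienceai/pubmedqa"]["metrics"]["exact_match"]
# Assuming score = matches / num_samples, this recovers 1 match out of 3.
correct = round(em["score"] * em["num_samples"])
print(scores, correct)
```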

Registry-Based Architecture: auto-discovery of models, datasets, and metrics
Smart Caching: DuckDB and DynamoDB backends for faster re-evaluation
Extensible Design: easy integration of custom models, datasets, and metrics
Rich CLI: progress bars, formatted output, and built-in help
Standards-Based: built on PyTorch and Hugging Face Transformers
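A registry maps the names you pass on the command line (such as a dataset ID) to the classes that implement them. The sketch below illustrates the general pattern with a decorator. `Registry`, `DATASETS`, `ToyQA`, and the name `my-org/toy-qa` are hypothetical, not KARMA's API. In a real package, auto-discovery would typically import every module (for example with `pkgutil.iter_modules` and `importlib.import_module`) so the decorators run; that step is omitted here.

```python
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T", bound=type)


class Registry:
    """Hypothetical name -> class registry illustrating the pattern (not KARMA's API)."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._items: Dict[str, type] = {}

    def register(self, name: str) -> Callable[[T], T]:
        def decorator(cls: T) -> T:
            if name in self._items:
                raise ValueError(f"{self.kind} {name!r} is already registered")
            self._items[name] = cls
            return cls  # return the class unchanged so it stays usable directly
        return decorator

    def get(self, name: str) -> type:
        if name not in self._items:
            raise KeyError(f"unknown {self.kind} {name!r}; available: {self.names()}")
        return self._items[name]

    def names(self) -> List[str]:
        return sorted(self._items)


DATASETS = Registry("dataset")


@DATASETS.register("my-org/toy-qa")  # hypothetical dataset name
class ToyQA:
    def __iter__(self):
        yield {"question": "Is aspirin an NSAID?", "answer": "yes"}


# Look the class up by name, as a CLI would after parsing its arguments.
dataset = DATASETS.get("my-org/toy-qa")()
print(list(dataset))
```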

Installation

Install with uv or pip, or set up a development environment.


Basic Usage

Learn the CLI commands and start evaluating your first model.


Add Your Own

Extend KARMA with custom models, datasets, and evaluation metrics.
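A custom metric is, at its core, a function from predictions and references to a score. The sketch below shows a normalized exact-match metric as one example. `normalize` and `exact_match` are hypothetical helpers, and the normalization rules (lowercasing, stripping ASCII punctuation, collapsing whitespace) are assumptions; KARMA's built-in `exact_match` may normalize differently or not at all.

```python
import re
import string
from typing import Sequence

_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower().translate(_PUNCT)
    return re.sub(r"\s+", " ", text).strip()


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)


print(exact_match(["Yes.", "no", "no"], ["yes", "yes", "maybe"]))  # 0.3333333333333333
```

Only the first pair matches after normalization, so the score is 1/3.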


Supported Resources

Complete list of available models, datasets, and metrics.


Release resources


Ready to evaluate your medical AI models? Get started with installation →