KARMA-OpenMedEvalKit


KARMA is designed for researchers, developers, and healthcare organizations who need reliable evaluation of medical AI systems.

Extensible

Bring your own model, dataset, or even metric. KARMA integrates with Hugging Face and also supports local evaluation.

Add your own →

Fast & Efficient

Process thousands of medical examples efficiently with intelligent caching and batch processing.

See caching →

Model Agnostic

Works with any model (Qwen, MedGemma, Bedrock-SDK, OpenAI-SDK, or your own custom architecture) through a unified interface.

See available models →

Get started with KARMA in minutes:

# Install KARMA

pip install karma-medeval

# Run your first evaluation

karma eval --model "Qwen/Qwen3-0.6B" --datasets openlifescienceai/pubmedqa --max-samples 3

$ karma eval --model "Qwen/Qwen3-0.6B" --datasets openlifescienceai/pubmedqa --max-samples 3

{
  "openlifescienceai/pubmedqa": {
    "metrics": {
      "exact_match": {
        "score": 0.3333333333333333,
        "evaluation_time": 0.9702351093292236,
        "num_samples": 3
      }
    },
    "task_type": "mcqa",
    "status": "completed",
    "dataset_args": {},
    "evaluation_time": 7.378399848937988
  },
  "_summary": {
    "model": "Qwen/Qwen3-0.6B",
    "model_path": "Qwen/Qwen3-0.6B",
    "total_datasets": 1,
    "successful_datasets": 1,
    "total_evaluation_time": 7.380354166030884,
    "timestamp": "2025-07-22 18:43:07"
  }
}
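The JSON report above is easy to post-process. Here is a minimal sketch, assuming the report has been captured on its own as a string (how you capture it is up to you). `summarize_report` is a hypothetical helper written for this example and is not part of KARMA.

```python
import json

# Abridged copy of the report shown above; in practice, load your captured output instead.
REPORT = """
{
  "openlifescienceai/pubmedqa": {
    "metrics": {
      "exact_match": {"score": 0.3333333333333333, "num_samples": 3}
    },
    "task_type": "mcqa",
    "status": "completed"
  },
  "_summary": {"model": "Qwen/Qwen3-0.6B", "total_datasets": 1, "successful_datasets": 1}
}
"""


def summarize_report(report: dict) -> list:
    """Return (dataset, metric, score) rows, skipping the "_summary" entry."""
    rows = []
    for dataset, result in report.items():
        if dataset == "_summary":
            continue  # run-level metadata, not a dataset result
        for metric, values in result.get("metrics", {}).items():
            rows.append((dataset, metric, values["score"]))
    return rows


rows = summarize_report(json.loads(REPORT))
for dataset, metric, score in rows:
    print(f"{dataset}  {metric}  {score:.3f}")
```

With `--max-samples 3`, an `exact_match` score of 0.333 corresponds to one matching answer out of three, so treat such a run as a smoke test rather than a benchmark result.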

  • Registry-Based Architecture: Auto-discovery of models, datasets, and metrics
  • Smart Caching: DuckDB and DynamoDB backends for faster re-evaluations
  • Extensible Design: Easy integration of custom models, datasets, and metrics
  • Rich CLI: Progress bars, formatted output, and built-in help
  • Standards-Based: Built on PyTorch and HuggingFace Transformers
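To illustrate what a registry-based design generally looks like, here is a generic sketch of the pattern. It is not KARMA's actual API: `METRICS`, `register_metric`, and this `exact_match` implementation are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps a metric name to the function that computes it.
METRICS: Dict[str, Callable[[List[str], List[str]], float]] = {}


def register_metric(name: str):
    """Decorator that records a metric function under `name` when its module is imported."""
    def decorator(fn):
        if name in METRICS:
            raise ValueError(f"metric {name!r} is already registered")
        METRICS[name] = fn
        return fn
    return decorator


@register_metric("exact_match")
def exact_match(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions equal to their reference after trimming and lowercasing."""
    if not references:
        return 0.0
    hits = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return hits / len(references)


# A caller looks the metric up by name instead of importing it directly.
score = METRICS["exact_match"](["yes", "no", "maybe"], ["yes", "yes", "no"])
print(sorted(METRICS), score)
```

The appeal of this pattern is that adding a component only requires defining and decorating it; the code that looks components up by name does not change.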

Installation

Install with uv or pip, or set up a development environment.

Install KARMA →

Basic Usage

Learn the CLI commands and start evaluating your first model.

Learn CLI →

Add Your Own

Extend KARMA with custom models, datasets, and evaluation metrics.

Customize →

Supported Resources

Complete list of available models, datasets, and metrics.

View Resources →

Release resources


Ready to evaluate your medical AI models? Get started with installation →