ToolMisuseBench: A deterministic benchmark for tool-augmented agents

huggingface.co

1 point by akgitrepos a day ago · 2 comments

akgitrepos (OP) a day ago

ToolMisuseBench is a deterministic, offline benchmark dataset for evaluating tool-using agents under realistic failure conditions (schema misuse, execution failures, and interface drift) and their ability to recover from those failures under budget constraints.

This dataset is intended for reproducible evaluation of agent tool-use behavior, not for training a general-purpose language model.

akgitrepos (OP) a day ago

GitHub Repo: https://github.com/akgitrepos/toolmisusebench