FRAMES: Factuality, Retrieval, and Reasoning MEasurement Set

huggingface.co

3 points by adg29 a year ago · 1 comment

adg29OP a year ago

FRAMES is an evaluation dataset designed to test the capabilities of Retrieval-Augmented Generation (RAG) systems. The paper, with details and experiments, is available on arXiv: https://arxiv.org/abs/2409.12941

Dataset Overview

- 824 challenging multi-hop questions requiring information from 2-15 Wikipedia articles
- Questions span diverse topics including history, sports, science, animals, health, etc.
- Each question is labeled with reasoning types: numerical, tabular, multiple constraints, temporal, and post-processing
- Gold answers and relevant Wikipedia articles provided for each question

Key Features

- Tests end-to-end RAG capabilities in a unified framework
- Requires integration of information from multiple sources
- Incorporates complex reasoning and temporal disambiguation
- Designed to be challenging for state-of-the-art language models

Usage

This dataset can be used to:

- Evaluate RAG system performance
- Benchmark language model factuality and reasoning
- Develop and test multi-hop retrieval strategies
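
For anyone wanting to try the first and third uses, here is a rough sketch of how scoring might look. It assumes each question has been loaded into a dict with the hypothetical keys "answer", "reasoning_types", and "articles" (the real column names in the dataset may differ), and it uses plain normalized exact match as a stand-in for whatever grading the paper actually applies. The toy records are invented for illustration and are not taken from the dataset.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def accuracy_by_reasoning_type(records, predictions):
    """Return {reasoning_type: fraction of questions answered correctly}.

    records: list of dicts with the hypothetical keys "answer" and "reasoning_types".
    predictions: list of model answers, aligned with records by position.
    A question labeled with several reasoning types counts toward each of them.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record, predicted in zip(records, predictions):
        hit = normalize(predicted) == normalize(record["answer"])
        for rtype in record["reasoning_types"]:
            total[rtype] += 1
            correct[rtype] += int(hit)
    return {rtype: correct[rtype] / total[rtype] for rtype in total}


def retrieval_recall(gold_articles, retrieved_articles):
    """Fraction of the gold articles that the retriever actually returned."""
    gold = set(gold_articles)
    if not gold:
        return 0.0
    return len(gold & set(retrieved_articles)) / len(gold)


# Toy records, invented for illustration only.
records = [
    {"answer": "42", "reasoning_types": ["numerical"], "articles": ["A", "B"]},
    {
        "answer": "Paris",
        "reasoning_types": ["temporal", "multiple constraints"],
        "articles": ["C", "D", "E"],
    },
]
predictions = ["42", "London"]

scores = accuracy_by_reasoning_type(records, predictions)
recall = retrieval_recall(records[1]["articles"], ["C", "E", "X"])
print(scores)
print(round(recall, 3))
```

Breaking accuracy down by reasoning type shows where a system struggles (say, temporal questions) rather than hiding it in one overall number, and recall against the gold articles separates retrieval failures from reasoning failures. Exact match is strict for free-form answers, so a more lenient grader may be a better fit in practice.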
