Show HN: Benchmarking Tangible Interface Understanding in Long-Horizon Tasks

huggingface.co

1 points by tellarin 8 days ago · 1 comment

tellarin (OP) 8 days ago

Some collaborators and I recently released the first version of a benchmark that we think highlights a critical gap in recent AI models: understanding real-world causality beyond a focus on physics.

Everyday environments are rich in tangible control interfaces (TCIs), such as light switches, appliance panels, and embedded GUIs. They are designed for humans and demand not only commonsense and physics reasoning but also causal prediction and outcome verification across time and space (e.g., delayed heating, remote lights).

Paper: SWITCH: Benchmarking Modeling and Handling of Tangible Interfaces in Long-horizon Embodied Scenarios

Data and leaderboard are on Hugging Face.

Feedback, suggestions, and collaborators are very welcome!
