Show HN: Detecting hallucinations in LLM function calling with entropy

archgw.com

4 points by honorable_coder 5 months ago · 0 comments

We use this for tool calling in https://github.com/katanemo/archgw, which uses a 3B function-calling LLM to map a user's ask to one of many tools for routine agentic operations in an application. A rough sketch of that mapping follows below.
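
To make the mapping concrete, here is an illustrative sketch of what routing a user's ask to one of several tools looks like at the interface level. The tool schemas, names, and output shape below are hypothetical, chosen for the example; they are not archgw's wire format.

    # Hypothetical tool schemas the function-calling LLM can choose from.
    tools = [
        {"name": "get_weather", "parameters": {"city": "string"}},
        {"name": "create_ticket", "parameters": {"title": "string", "priority": "string"}},
    ]

    user_ask = "what's the weather like in Seattle right now?"

    # The 3B function-calling LLM is prompted with the tool schemas and the
    # ask, and is expected to emit a single structured call such as:
    expected_call = {"name": "get_weather", "arguments": {"city": "Seattle"}}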

Why we do this: latency. A 3B-parameter model, especially when quantized, can deliver sub-100ms time-to-first-token and generate a complete function call in under 100ms. That makes the LLM “disappear” as a bottleneck, so the only real waiting time is the external tool or API call itself, plus the time it takes to synthesize a human-readable response.
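
The post body doesn't spell out the entropy-based hallucination check from the title, but a common way to implement the idea is to score the model's own uncertainty over the tokens of the generated function call and flag high-entropy calls for clarification instead of executing them. Below is a minimal sketch of that approach, assuming the serving stack exposes per-token top-k log-probabilities (as OpenAI-style logprobs/top_logprobs fields do); the threshold value is illustrative, not archgw's.

    import math

    def token_entropy(top_logprobs):
        """Shannon entropy (nats) over the top-k candidates returned for one
        generated token, renormalized over those candidates."""
        probs = [math.exp(lp) for lp in top_logprobs.values()]
        total = sum(probs)
        probs = [p / total for p in probs]
        return -sum(p * math.log(p) for p in probs if p > 0)

    def looks_hallucinated(per_token_top_logprobs, threshold=0.5):
        """Flag a generated function call as uncertain when the mean
        per-token entropy exceeds a threshold. The 0.5-nat threshold is an
        illustrative value, not one taken from archgw."""
        if not per_token_top_logprobs:
            return True  # nothing to score; treat as uncertain
        mean_h = sum(token_entropy(t) for t in per_token_top_logprobs)
        mean_h /= len(per_token_top_logprobs)
        return mean_h > threshold

    # Example: each dict maps candidate token -> logprob for one position
    # of the generated call, e.g. from an OpenAI-style top_logprobs field.
    call_tokens = [
        {"get": -0.05, "create": -3.2, "list": -4.0},
        {"_weather": -0.01, "_ticket": -5.1},
    ]
    print(looks_hallucinated(call_tokens))  # low entropy -> False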
