Show HN: Detecting hallucinations in LLM function calling with entropy

archgw.com

4 points by honorable_coder 5 months ago · 0 comments

We use this for tool calling in https://github.com/katanemo/archgw, which uses a 3B function-calling LLM to map a user's ask to one of many tools for routine agentic operations in an application. A rough sketch of that mapping follows below.
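
To make the mapping concrete, here is an illustrative sketch of what routing a user's ask to one of several tools looks like at the interface level. The tool schemas, names, and output shape below are hypothetical, chosen for the example; they are not archgw's wire format.

    # Hypothetical tool schemas the function-calling LLM can choose from.
    tools = [
        {"name": "get_weather", "parameters": {"city": "string"}},
        {"name": "create_ticket", "parameters": {"title": "string", "priority": "string"}},
    ]

    user_ask = "what's the weather like in Seattle right now?"

    # The 3B function-calling LLM is prompted with the tool schemas and the
    # ask, and is expected to emit a single structured call such as:
    expected_call = {"name": "get_weather", "arguments": {"city": "Seattle"}}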

Why we do this: latency. A 3B-parameter model, especially when quantized, can deliver sub-100ms time-to-first-token and generate a complete function call in under 100ms. That makes the LLM “disappear” as a bottleneck, so the only real waiting time is the external tool or API call itself, plus the time it takes to synthesize a human-readable response.
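
The post body doesn't spell out the entropy-based hallucination check from the title, but a common way to implement the idea is to score the model's own uncertainty over the tokens of the generated function call and flag high-entropy calls for clarification instead of executing them. Below is a minimal sketch of that approach, assuming the serving stack exposes per-token top-k log-probabilities (as OpenAI-style logprobs/top_logprobs fields do); the threshold value is illustrative, not archgw's.

    import math

    def token_entropy(top_logprobs):
        """Shannon entropy (nats) over the top-k candidates returned for one
        generated token, renormalized over those candidates."""
        probs = [math.exp(lp) for lp in top_logprobs.values()]
        total = sum(probs)
        probs = [p / total for p in probs]
        return -sum(p * math.log(p) for p in probs if p > 0)

    def looks_hallucinated(per_token_top_logprobs, threshold=0.5):
        """Flag a generated function call as uncertain when the mean
        per-token entropy exceeds a threshold. The 0.5-nat threshold is an
        illustrative value, not one taken from archgw."""
        if not per_token_top_logprobs:
            return True  # nothing to score; treat as uncertain
        mean_h = sum(token_entropy(t) for t in per_token_top_logprobs)
        mean_h /= len(per_token_top_logprobs)
        return mean_h > threshold

    # Example: each dict maps candidate token -> logprob for one position
    # of the generated call, e.g. from an OpenAI-style top_logprobs field.
    call_tokens = [
        {"get": -0.05, "create": -3.2, "list": -4.0},
        {"_weather": -0.01, "_ticket": -5.1},
    ]
    print(looks_hallucinated(call_tokens))  # low entropy -> False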
