Evolving Instruction Following Beyond IFEval and "Avoid the Letter C"

surgehq.ai

1 point by gk1 6 days ago · 1 comment

storystarling 6 days ago

Matches my experience trying to stabilize long LangGraph workflows. The regex checks are fine for formatting but miss the semantic drift that happens when you're actually injecting context. The rubric-based approach makes sense, but I'm not sure how a bootstrapped team implements this without the human labeling budget. I've tried using a stronger model to grade the outputs, but the latency overhead is brutal.
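To make the contrast in the comment concrete, here is a minimal sketch, not anything taken from the linked article: a regex check that only verifies shape, and a rubric check that asks a grader one yes/no question per rubric item. The bullet-list rule, the function names, and the `grade` callable are all hypothetical; in practice `grade` would wrap a call to a stronger model, and the keyword stub below only stands in for it so the example runs.

```python
import re
from typing import Callable, Dict, List

# Hypothetical format rule: the output must be a bulleted list with N items.
BULLET = re.compile(r"^- .+$", re.MULTILINE)


def format_check(output: str, expected_bullets: int = 3) -> bool:
    """Cheap regex check: verifies shape only, says nothing about meaning."""
    return len(BULLET.findall(output)) == expected_bullets


def rubric_check(
    output: str,
    context: str,
    rubric: List[str],
    grade: Callable[[str, str, str], bool],
) -> Dict[str, bool]:
    """Ask the grader one yes/no question per rubric item."""
    return {item: grade(output, context, item) for item in rubric}


def evaluate(
    output: str,
    context: str,
    rubric: List[str],
    grade: Callable[[str, str, str], bool],
) -> Dict[str, bool]:
    """Run the cheap check first; only call the slow grader if it passes."""
    if not format_check(output):
        return {"format": False}
    result = {"format": True}
    result.update(rubric_check(output, context, rubric, grade))
    return result


def keyword_grader(output: str, context: str, item: str) -> bool:
    """Placeholder grader (assumption): a real one would call a stronger model."""
    return item.lower() in output.lower()
```

Gating the grader behind the regex check does not remove the latency the comment complains about, but it avoids paying it for outputs that already fail on formatting.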
