From vibes to data: measuring how LLMs attend to your prompt, layer by layer

github.com

2 points by taylorsatula 12 days ago · 2 comments

taylorsatulaOP 12 days ago

Most prompt engineering is done by changing words, rerunning the model, and squinting at the output. Over the past few weeks I've built this toolkit, which lets you measure what's actually happening inside the model instead.

You define regions of your prompt (instructions, examples, constraints, whatever), run the pipeline on any HuggingFace model, and get back per-layer attention heatmaps, "cooking curves" showing how attention to each region evolves through the network, and logit lens snapshots. Llama, Qwen, Mistral, and Gemma are supported out of the box. The engine is a self-contained script you can scp to a GPU box and run with no dependencies beyond transformers. The repo is designed so that Claude can handle the whole pipeline end to end, including interpreting results in a grounded, domain-specific way.
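To make the idea concrete, here's a minimal sketch of the aggregation that a per-region "cooking curve" implies: given one attention tensor per layer, average the attention the final token pays to each labeled token span. The function name, region labels, and synthetic tensors are all illustrative assumptions, not the repo's actual API.

```python
import torch

def region_curves(attentions, regions):
    """attentions: list of [heads, seq, seq] tensors, one per layer.
    regions: {name: (start, end)} token spans.
    Returns {name: [per-layer mean attention from the final token]}."""
    curves = {name: [] for name in regions}
    for layer_attn in attentions:
        # Attention row for the final query position, averaged over heads.
        last_row = layer_attn[:, -1, :].mean(dim=0)
        for name, (s, e) in regions.items():
            curves[name].append(last_row[s:e].mean().item())
    return curves

# Toy demo with synthetic, softmax-normalized attention (no model download).
torch.manual_seed(0)
heads, seq, layers = 4, 12, 3
attns = [torch.softmax(torch.randn(heads, seq, seq), dim=-1) for _ in range(layers)]
regions = {"instructions": (0, 5), "question": (5, 12)}
curves = region_curves(attns, regions)
print({k: [round(v, 3) for v in vs] for k, vs in curves.items()})
```

With a real model you'd get the `attentions` list from a forward pass with `output_attentions=True` (and `attn_implementation="eager"` on recent transformers versions, since fused attention kernels don't return weights).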

I built it to tune system prompts for another project and realized the general approach was useful enough to extract. The "before and after" comparison tooling ended up being the part I use most.
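The before/after comparison reduces to something like the following: given two cooking curves for the same region, one from each prompt variant, report the per-layer shift. This is a hypothetical sketch with made-up numbers, not the repo's actual comparison tooling.

```python
def curve_delta(before, after):
    """Per-layer change in a region's attention share after a prompt edit.
    Positive values mean the region gained attention at that layer."""
    assert len(before) == len(after), "curves must cover the same layers"
    return [round(a - b, 4) for b, a in zip(before, after)]

# Illustrative curves for a 4-layer toy model.
before = [0.10, 0.18, 0.25, 0.22]
after = [0.14, 0.24, 0.31, 0.30]
print(curve_delta(before, after))
```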
