# trunk
A simple visualisation style for model architectures (implemented for HF models; tested on LLMs across text, audio, and multimodal).
- general
- component-wise color coding
- relative parameter size comparison within a model at log scale
- represents model layers/operations in order visually
## method
1. **Clean Layer Names**
   Example: convert `encoder.layer.0.attention.self.query.weight` to `encoder.layer.attention.self.query.weight`.
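A minimal sketch of this cleaning step, using a regex that strips numeric block indices (the function name and pattern are my own, not necessarily the repo's implementation):

```python
import re

def clean_layer_name(name: str) -> str:
    # Drop numeric block indices such as ".0." so repeated
    # transformer blocks collapse to one canonical layer name.
    # Hypothetical helper; the repo may clean names differently.
    return re.sub(r"\.\d+(?=\.)", "", name)

print(clean_layer_name("encoder.layer.0.attention.self.query.weight"))
# encoder.layer.attention.self.query.weight
```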
2. **Compute Parameter Counts**
   Example: `query.weight` may have 768 × 768 = 589,824 parameters.
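The count is simply the product of the tensor's dimensions; a minimal sketch, assuming shapes come from something like `model.named_parameters()`:

```python
import math

def param_count(shape) -> int:
    # Total scalars in a tensor = product of its dimensions.
    return math.prod(shape)

print(param_count((768, 768)))    # 589824 -- the query.weight example above
print(param_count((50257, 768)))  # e.g. a vocab x hidden embedding matrix
```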
3. **Normalize Sizes Logarithmically**
   Scale parameter counts using logarithms to fit visualization constraints. Example: if the smallest layer has 100K params and the largest has 10M, map their cube sizes proportionally.
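One way to express that mapping, interpolating in log space; the `min_size`/`max_size` bounds here are illustrative, not values from the repo:

```python
import math

def log_scale(count, lo, hi, min_size=0.5, max_size=3.0):
    # Linearly interpolate in log space between the smallest (lo)
    # and largest (hi) parameter counts in the model; the result is
    # an illustrative cube edge length in [min_size, max_size].
    t = (math.log(count) - math.log(lo)) / (math.log(hi) - math.log(lo))
    return min_size + t * (max_size - min_size)

# 100K-param layer -> smallest cube, 10M-param layer -> largest cube
print(log_scale(100_000, 100_000, 10_000_000))     # 0.5
print(log_scale(10_000_000, 100_000, 10_000_000))  # 3.0
```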
4. **Assign Unique Colors**
   Use a colormap to distinguish layer types. Example: attention layers in blue, feed-forward layers in red, embedding layers in green (not hardcoded right now).
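A minimal sketch of type-based coloring using evenly spaced hues; the type keywords and grey fallback are my assumptions, since the actual mapping is not hardcoded:

```python
import colorsys

# Illustrative type keywords; the repo's actual set may differ.
LAYER_TYPES = ["attention", "intermediate", "embeddings", "norm"]

def layer_color(name: str):
    # Assign each recognized layer type an evenly spaced hue around
    # the color wheel; unrecognized layers fall back to grey.
    for i, t in enumerate(LAYER_TYPES):
        if t in name:
            return colorsys.hsv_to_rgb(i / len(LAYER_TYPES), 0.6, 0.9)
    return (0.5, 0.5, 0.5)

print(layer_color("encoder.layer.attention.self.query.weight"))
```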
## Application
### Initial Rough Versions
gpt2-small
### Refined Version After a Few Iterations
### Method Applied to Popular Models

Note: more diagrams are in the `diagrams/` folder of this repo.
## how to run

```shell
uv run https://raw.githubusercontent.com/attentionmech/trunk/refs/heads/main/main.py <model_name>
```

examples:

```shell
uv run https://raw.githubusercontent.com/attentionmech/trunk/refs/heads/main/main.py gpt2
```
If you have checked out the repo, you can run `main.py` locally instead.
## Citation

```bibtex
@article{attentionmech2025trunk,
  title={trunk: a simple visualisation style for model architectures},
  author={attentionmech},
  year={2025}
}
```





