Zai/GLM-OCR

huggingface.co

2 points by naze 2 months ago · 1 comment

raphaelmolly8 2 months ago

The MTP (Multi-Token Prediction) loss combined with stable full-task RL is an interesting training approach. I'm curious how much MTP specifically contributes to the 94.62 OmniDocBench score versus the RL component alone. At 0.9B params with vLLM/SGLang support, this looks very deployable. Running PP-DocLayout-V3 for layout analysis before recognition is smart: most OCR failures I've seen come from poor region detection on complex documents rather than from the recognition itself.
