mezark

Karma: 27
Created: 3 years ago

Recent Submissions

  1. Moe inference optimizations: 15% lower expert load by request reordering (blog.doubleword.ai)
  2. Tensor Network Attention (mainlymatmul.com)
  3. Redundant Information in LLM Weights (fergusfinn.com)
  4. Tans: Precomputing RANS (fergusfinn.com)
  5. Also-RANS: Asymmetric Numeral Systems for Entropy Coding (fergusfinn.com)
  6. 70x faster cold(ish) starts for SGLang (fergusfinn.com)
  7. QueueSpec – drafting speculation tokens while a request queues (blog.doubleword.ai)
  8. ZeroDP: Just-in-Time Weight Offloading over NVLink for Data Parallelism (mainlymatmul.com)
  9. Parallel Primitives for Multi-Agent Workflows (fergusfinn.com)
  10. New fastest AI Model Gateway – 450x less overhead than LiteLLM (github.com)
