mezark

Karma: 126
Created: 3 years ago

Recent Submissions

1. What happens when you run a CUDA kernel? (fergusfinn.com) 294 points · 29 days ago · 32 comments
2. A running list of reasons to move to open source (whyopensource.ai) 6 points · 1 month ago · 0 comments
3. MoE inference optimizations: 15% lower expert load by request reordering (blog.doubleword.ai) 3 points · 2 months ago · 0 comments
4. Tensor Network Attention (mainlymatmul.com) 2 points · 2 months ago · 0 comments
5. Redundant Information in LLM Weights (fergusfinn.com) 5 points · 2 months ago · 0 comments
6. Tans: Precomputing RANS (fergusfinn.com) 3 points · 2 months ago · 0 comments
7. Also-RANS: Asymmetric Numeral Systems for Entropy Coding (fergusfinn.com) 25 points · 2 months ago · 0 comments
8. 70x faster cold(ish) starts for SGLang (fergusfinn.com) 4 points · 3 months ago · 0 comments
9. QueueSpec – drafting speculation tokens while a request queues (blog.doubleword.ai) 1 point · 6 months ago · 0 comments
10. ZeroDP: Just-in-Time Weight Offloading over NVLink for Data Parallelism (mainlymatmul.com) 1 point · 6 months ago · 0 comments
