OsamaJaber
- Karma: 226
- Created: 7 months ago
Recent Submissions
- 1. AutoMegaKernel: Compiling a LLM into a single CUDA kernel (arxiv.org)
- 2. AutoMegaKernel: Compile an LLM into one provably-correct CUDA megakernel (github.com)
- 3. StreamIndex: Memory-bounded compressed sparse attention via streaming top-k (arxiv.org)
- 4. Show HN: AutoKernel, Auto GPU Kernel Optimization (arxiv.org)
- 5. DeepSeek V4's indexer dies at 65K. We got it to 1M on 6GB (arxiv.org)
- 6. AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search (arxiv.org)
- 7. DeepSeek V4's indexer OOMs at 65K context. We got it to 1M in 6G (arxiv.org)
- 8. Ouroboros: Dynamic Weight Generation for Recursive Transformers (arxiv.org)
- 9. Tide: Token-Informed Depth Execution for Per-Token Early Exit in LLM Inference (arxiv.org)
- 10. Own your AI. Optimized down to the kernel (runinfra.ai)