Serving AI from the Basement Part II: SWE Agents, MoEs, Batch Inference, and More

ahmadosman.com

2 points by XMasterrrr a year ago · 1 comment

XMasterrrr (OP) a year ago

Hey guys, if you remember, I shared my first blog post here a couple of weeks ago, Serving AI From The Basement (https://ahmadosman.com/blog/basement-ai-resident/serving-ai-...), and there was a great, lively discussion thread (https://news.ycombinator.com/item?id=41481852) with a lot of good input and questions.

This is my second blog post, and in this one I cover:

- SWE Agentic Framework – think of it as the puppet master for coders, plus Replit's next nemesis.
- MoEs – imagine a team of AI experts, each shouting answers when it's their topic.
- Quantization & Mixed Precision – turning AI from gourmet to fast food without losing the flavor.
- Batch Inference – AKA AI's quiz night, answering all questions at once.
- LLM Architectures – blueprints for our chatty AI friends.
- vLLM and Tensor Parallelism – or the thing that makes big AI models run lean (a quick sketch follows below).
- DeepSeek v2.5 – our open-weights savior.
- Embedding Models – translating human words into AI-understandable numbers.
- Speculative Decoding – or AI's attempt at mind-reading, guessing your sentences before you finish them.
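To make the vLLM, tensor parallelism, and batch inference points concrete, here is a minimal sketch of offline batched generation with vLLM's Python API. The model name and tensor_parallel_size are placeholder assumptions for illustration, not the setup from the post:

    # Minimal sketch: batched generation with vLLM, sharded across 2 GPUs.
    # Model name and tensor_parallel_size are placeholders, not my actual config.
    from vllm import LLM, SamplingParams

    prompts = [
        "Explain mixture-of-experts routing in one sentence.",
        "What does tensor parallelism split across GPUs?",
        "Why does batching improve GPU utilization?",
    ]

    sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

    # tensor_parallel_size shards each layer's weights across the GPUs;
    # passing the whole prompt list to generate() lets vLLM schedule it as one batch.
    llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", tensor_parallel_size=2)

    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        print(output.outputs[0].text.strip())

The point of the batch is throughput: instead of answering each prompt serially, vLLM keeps the GPUs saturated by processing all of the requests together.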

In the next blog post, I plan on addressing the main pain points of the hardware build and following up on the most-asked questions I received on the first one. I apologize for taking so long to get that out, but it is taking me longer than I expected to properly cover everything I want.

Please let me know if you have any comments or questions, and always feel free to reach out either here or via the social links on my website.
