Serving AI from the Basement Part II: SWE Agents, MoEs, Batch Inference, and More

ahmadosman.com

2 points by XMasterrrr a year ago · 1 comment

XMasterrrr (OP) a year ago

Hey guys, if you remember, I shared my first blog post here a couple of weeks ago, Serving AI From The Basement (https://ahmadosman.com/blog/basement-ai-resident/serving-ai-...), and there was a great, lively discussion thread (https://news.ycombinator.com/item?id=41481852) with a lot of good input and questions.

This is my second blog post, and in this one I cover:

- SWE Agentic Framework – think of it as the puppet master for coders, plus Replit's next nemesis.
- MoEs – imagine a team of AI experts, each shouting answers when it's their topic.
- Quantization & Mixed Precision – turning AI from gourmet to fast food without losing the flavor.
- Batch Inference – AKA AI's quiz night, answering all questions at once.
- LLM Architectures – blueprints for our chatty AI friends.
- vLLM and Tensor Parallelism – or the thing that makes big AI models run lean (a quick sketch follows below).
- DeepSeek v2.5 – our open-weights savior.
- Embedding Models – translating human words into AI-understandable numbers.
- Speculative Decoding – or AI's attempt at mind-reading, guessing your sentences before you finish them.
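To make the vLLM, tensor parallelism, and batch inference points concrete, here is a minimal sketch of offline batched generation with vLLM's Python API. The model name and tensor_parallel_size are placeholder assumptions for illustration, not the setup from the post:

    # Minimal sketch: batched generation with vLLM, sharded across 2 GPUs.
    # Model name and tensor_parallel_size are placeholders, not my actual config.
    from vllm import LLM, SamplingParams

    prompts = [
        "Explain mixture-of-experts routing in one sentence.",
        "What does tensor parallelism split across GPUs?",
        "Why does batching improve GPU utilization?",
    ]

    sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

    # tensor_parallel_size shards each layer's weights across the GPUs;
    # passing the whole prompt list to generate() lets vLLM schedule it as one batch.
    llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", tensor_parallel_size=2)

    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        print(output.outputs[0].text.strip())

The point of the batch is throughput: instead of answering each prompt serially, vLLM keeps the GPUs saturated by processing all of the requests together.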

In the next blog post, I plan on addressing the main pain points of the hardware build and following up on the most-asked questions I received on the first one. I apologize for taking so long to get that out, but it is taking me longer than I expected to properly cover everything I want.

Please let me know if you have any comments or questions, and always feel free to reach out either here or via the social links on my website.
