How to Improve Code Completion LLMs with Repo-Specific Finetuning
cgft.io
Hey everyone! We've been helping engineering teams fine-tune custom code LLMs on their internal repos for different tasks across the SDLC (software development life cycle).
We wrote a blog post about how we're doing this for code completion. In essence, we fine-tune the model to act like a developer building the repo from a blank slate up to its current state, one diff at a time. Instead of treating a codebase as a static, raw list of files, we treat it as a time series of diffs over a graph of code objects (functions, classes, etc.).
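To make the "time series of diffs over a graph of code objects" idea concrete, here's a rough sketch. It is not the pipeline from the blog post. It assumes a Python repo whose history is available as in-memory snapshots (file path → source). It treats only top-level functions and classes as graph nodes, with plain name references as edges, and assumes top-level names are unique across the repo. `extract_objects`, `diff_to_examples` and `replay_history` are hypothetical names. Each commit turns every added or changed object into a completion target, and the pre-commit source of the objects it references serves as context. Requires Python 3.9+.

```python
import ast
from dataclasses import dataclass

# Hypothetical representation: a repo snapshot maps file path -> source text.
Snapshot = dict[str, str]


@dataclass
class CodeObject:
    name: str       # top-level function or class name
    path: str       # file it lives in
    source: str     # exact source text of the definition
    refs: set[str]  # names it references (outgoing edges in the code graph)


def extract_objects(snapshot: Snapshot) -> dict[str, CodeObject]:
    """Parse each file and return its top-level functions/classes as graph nodes.

    Simplification: objects are keyed by bare name, so names are assumed unique.
    """
    objects: dict[str, CodeObject] = {}
    for path, text in snapshot.items():
        tree = ast.parse(text)
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                # Every plain Name used inside the definition is a potential edge.
                refs = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
                objects[node.name] = CodeObject(
                    name=node.name,
                    path=path,
                    source=ast.get_source_segment(text, node),
                    refs=refs - {node.name},
                )
    return objects


def diff_to_examples(before: Snapshot, after: Snapshot) -> list[dict]:
    """Turn one commit (before -> after) into code-completion examples."""
    old, new = extract_objects(before), extract_objects(after)
    examples = []
    for name, obj in new.items():
        if name in old and old[name].source == obj.source:
            continue  # unchanged object: nothing new to learn at this step
        # Context = pre-commit source of the objects this one references.
        context = [old[r].source for r in sorted(obj.refs) if r in old]
        examples.append({"target": name, "context": context, "completion": obj.source})
    return examples


def replay_history(history: list[Snapshot]) -> list[dict]:
    """Walk the repo from an empty state to its latest snapshot, one diff at a time."""
    states: list[Snapshot] = [{}] + history  # start from a blank slate
    examples = []
    for before, after in zip(states, states[1:]):
        examples.extend(diff_to_examples(before, after))
    return examples


# Usage: a toy two-commit history.
history = [
    {"math_utils.py": "def add(a, b):\n    return a + b\n"},
    {"math_utils.py": "def add(a, b):\n    return a + b\n\n\n"
                      "def total(xs):\n    return sum(add(0, x) for x in xs)\n"},
]
for ex in replay_history(history):
    print(ex["target"], len(ex["context"]))  # prints "add 0", then "total 1"
```

In the second commit, `add` is unchanged and gets skipped. `total` becomes the target, with the already-existing `add` as its graph context. A real setup would also need to handle methods, imports, attribute calls and renames, and would probably pull richer context (e.g. callers and neighbouring files).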
The results are very encouraging.
Would love to answer questions and hear any cool ideas y'all might have!