Open Sourcing Cody – Sourcegraph's AI-enabled editor assistant
about.sourcegraph.com
I am confused about what is being open sourced here. From what I've managed to read in the announcements and README, I find myself being funnelled to Discord and Sourcegraph cloud services. I'm trying to understand what is what here.
I think there are three components needed for the best (admittedly enticing) experience: Cody itself, Sourcegraph in the background (optionally), and an LLM called Claude from Anthropic. Claude is very much proprietary. Sourcegraph is open core, but to use it as Cody's "helper", do I need those proprietary features? Without Claude and Sourcegraph Enterprise/Cloud, what can Cody do, say with a LLaMA-based LLM, should this integration happen?
Again, what I've read, taken at face value, seems really promising. I've used Sourcegraph a few times in the past and sometimes wondered how it would benefit my commercial work. Having an LLM could make this a next-level tool, possibly something that regular chat-type LLM-based services don't currently do.
Cody is being open sourced under Apache 2. The source code is here: https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-.... The analog would be if GitHub open-sourced Copilot but didn't open source GitHub (Sourcegraph is open core, similar to GitLab, with all the code publicly available and the enterprise-licensed code under "enterprise" directories).
The network dependencies are Cody --> Sourcegraph --> Anthropic. Cody does need to talk to a chat-based LLM to generate responses. (It hits other APIs specific to Sourcegraph that are optional.)
We are working on making the chat-based LLM swappable. Anthropic has been a great partner so far and they are stellar to work with. But our customers have asked for the ability to use GPT-4 as well as the ability to self-host, which means we are exploring open source models. Actively working on that at the moment.
Sorry for any lack of clarity here. We would like to have Cody (the 100% open source editor plugin) talk to a whole bunch of dev tools (OSS and proprietary). We think it's totally fine to have proprietary tools in your stack, but would prefer to live in a world where the thing that integrates all that info in your editor using the magic of AI and LLMs is open source. This fits into our broader principle of selling to companies/teams, and making tools free and open for individual devs.
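To make the "swappable chat-based LLM" idea above concrete, here is a minimal sketch of what a pluggable backend layer could look like. It is not Cody's actual code: the names `ChatBackend`, `register_backend`, `chat`, and `echo_backend` are all hypothetical, and the only "model" registered is a local stand-in that makes no network calls.

```python
from typing import Callable, Dict, List

# Hypothetical types: a message is a {"role": ..., "content": ...} dict, and a
# backend is any function that turns a list of messages into a reply string.
Message = Dict[str, str]
ChatBackend = Callable[[List[Message]], str]

_BACKENDS: Dict[str, ChatBackend] = {}


def register_backend(name: str, backend: ChatBackend) -> None:
    """Make a backend selectable by name (e.g. from a user setting)."""
    _BACKENDS[name] = backend


def chat(backend_name: str, messages: List[Message]) -> str:
    """Send the conversation to whichever backend the caller selected."""
    try:
        backend = _BACKENDS[backend_name]
    except KeyError:
        raise ValueError(f"unknown backend: {backend_name!r}") from None
    return backend(messages)


def echo_backend(messages: List[Message]) -> str:
    """Stand-in for a real model: repeats the last message back."""
    return "echo: " + messages[-1]["content"]


register_backend("echo", echo_backend)
```

Under this assumption, a hosted model or a self-hosted open source model would each be one more function passed to `register_backend`, and the rest of the plugin would only ever call `chat`.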
Thank you for your response. I think I get it now. When I first read the announcement I jumped at the thought that the entirety was being open sourced. Perhaps I got a bit clouded through that. I understand your needs as a company with a business plan and expect nothing is free. When the option to ask here in this discussion arose I took it, since I couldn't figure this out elsewhere.
The ability to use other LLMs, especially open ones, is promising. I guess it's mostly a matter of how APIs are standardised across these products. I mostly use Copilot and truly hope things can get better than that. The lack of control is especially infuriating, as is its tendency to go off on repeats for no discernible reason. On paper Cody looks to do better here.
Hopefully it's not just on paper :) There are a lot of rough edges still, but we hope to iron them out as quickly as we can.
One of our core design principles for Cody is to make it "unmagic". Like, the AI is magic enough, but the rest of what we're doing in terms of orchestrating the LLMs in combination with various other data sources and backends should be clear and transparent to the user. This allows for greater understandability and steerability (e.g., if Cody infers the wrong context, maybe you can just tell it the file it should be reading and then regenerate the answer).
Copilot is a great tool, and Oege de Moor, Alex Graveley, and the whole GitHub Next team deserve huge credit for shipping it. That being said, I really want the standard AI coding assistant to be open, and there's been a ton of innovation in LLMs since Copilot's initial launch that doesn't seem to have been rolled in yet. I think this is a case where being open means we can accelerate the pace of innovation.
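The "unmagic" principle described above (show which context was used, and let the user override it before regenerating) can be sketched roughly as follows. This is an illustration under stated assumptions, not how Cody is implemented: `build_prompt`, `inferred`, and `pinned` are hypothetical names, and the prompt layout is made up for the example.

```python
from typing import Dict, List, Optional


def build_prompt(
    question: str,
    files: Dict[str, str],
    inferred: List[str],
    pinned: Optional[List[str]] = None,
) -> str:
    """Assemble a prompt and state up front which files were used as context.

    `inferred` is the context the tool guessed; `pinned` is what the user
    explicitly asked for. When `pinned` is given, it replaces the guess.
    """
    chosen = pinned if pinned else inferred
    missing = [path for path in chosen if path not in files]
    if missing:
        raise KeyError(f"no such file(s): {missing}")

    # The first section lists the context files so the user can see the choice.
    parts = [f"Context files: {', '.join(chosen)}"]
    for path in chosen:
        parts.append(f"--- {path} ---\n{files[path]}")
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)
```

If the inferred context is wrong, calling `build_prompt` again with `pinned=["the/right/file.py"]` and resending the result is the "tell it the file and regenerate" step.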
I'll add if folks want to submit a PR to turn on other LLMs (or have Cody talk to the base LLM provider directly, sans Sourcegraph), we're happy to accept those. Literally the only thing preventing us from doing that right now is prioritization (our team is 4 people and we're scrambling to improve context fetching and implement autocomplete rn :sweat-laugh-emoji:)
> There's actually a growing body of evidence that shows the emergent ability of LLMs to reason (the so-called "chain of thought" ability) arises only when LLMs are trained on huge amounts of code, not just natural language. Natural language training data provides the ability to sound human, but it is the programming language training data that provides LLMs with the ability to be logical.
Good on Sourcegraph to contribute back to the open source community. LLMs rely more on open source code than meets the eye.
The decision to open source is surprising at first glance, but it makes a lot of sense. With greatly enhanced coding productivity, the ROI for tweaking open source tools is quite high. This puts closed source tools at a significant disadvantage in terms of customization. They are a little late to the AI coding assistant market, and this move enables them to capture market share rapidly and slow down competitors in building up a data moat.
This is probably one of the most sincere and personal product announcements from a corporation I’ve read.
Okay I wanna know if Steve Yegge named this after Cody, one of the best people on the Google-internal grok/kythe team that Yegge also started.
How does it compare to GitHub Copilot?
The real value is in their data. The code is cool but without data to create contexts it’s whatever.
I wonder if there is anything like this for a notebook-style UI? I’m a fan of how Observable does it.
You're a mind-reader!
We currently have a notebooks UI (e.g., https://sourcegraph.com/notebooks/Tm90ZWJvb2s6MTg1NA==), but the plan is to roll Cody into this and make it a super rich UI for learning/understanding anything and everything about code
Sounds promising, but that looks to me like a different kind of notebook? It seems to be a way to write documentation with some fancy features.
An Observable notebook is a standalone program, sort of a cross between a makefile and a spreadsheet. You write code snippets in JavaScript to calculate things and it automatically recalculates them when dependencies change. It's pretty powerful since you can import JavaScript libraries (to draw graphs, for example) and call APIs.
Examples: https://observablehq.com/collection/@skybrian/digital-signal...
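For readers who haven't used that style of notebook, the "automatically recalculates when dependencies change" behavior can be approximated in a few lines. This is a toy sketch in Python rather than JavaScript, and it is not Observable's runtime: the `Notebook` class and its methods are hypothetical, and it assumes the cells form an acyclic dependency graph.

```python
from typing import Callable, Dict, List, Tuple


class Notebook:
    """Toy reactive notebook: a cell is recomputed when one of its inputs changes."""

    def __init__(self) -> None:
        self.values: Dict[str, object] = {}
        # cell name -> (names of its inputs, function of those inputs)
        self.cells: Dict[str, Tuple[List[str], Callable[..., object]]] = {}

    def set_value(self, name: str, value: object) -> None:
        """Set a plain input value and refresh every cell that depends on it."""
        self.values[name] = value
        self._recompute_dependents(name)

    def define(self, name: str, inputs: List[str], fn: Callable[..., object]) -> None:
        """Define a computed cell; it is evaluated once all inputs exist."""
        self.cells[name] = (inputs, fn)
        self._evaluate(name)

    def _evaluate(self, name: str) -> None:
        inputs, fn = self.cells[name]
        if all(i in self.values for i in inputs):
            self.values[name] = fn(*(self.values[i] for i in inputs))
            self._recompute_dependents(name)

    def _recompute_dependents(self, changed: str) -> None:
        # Assumes no cycles; a cyclic graph would recurse without terminating.
        for name, (inputs, _) in self.cells.items():
            if changed in inputs:
                self._evaluate(name)
```

A short usage example: after `nb.set_value("x", 2)` and `nb.define("double", ["x"], lambda x: x * 2)`, a later `nb.set_value("x", 10)` updates `nb.values["double"]` without any explicit re-run, which is the spreadsheet-like part of the comparison.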
How much would it cost to use Cody with Sourcegraph? I don’t see a price in the article
Cody is free to use and doesn't strictly require Sourcegraph. It can make use of Sourcegraph APIs (e.g., code search, soon find refs) to improve its context fetching. We hope to integrate other dev tool APIs (e.g., monitoring, tracing, etc.) as well.
Sourcegraph is also free to use and downloadable as a local app (https://docs.sourcegraph.com/app) or you can use sourcegraph.com for open source. Our intention is to sell to teams/companies, while making tools for individual devs free to use. There have been a few cases in the past where we've misstepped and come across as selling to individual devs. If this ever happens, please flag to me (https://twitter.com/beyang) or sqs (https://twitter.com/sqs) directly and we'll correct it.
Does Cody work with App (local Sourcegraph)?
Thanks beyang, that’s a clear answer, I appreciate it.
Edit: It's free
This is not a clear answer to a question about the cost.
Don't be a car salesman.
Sorry wasn't my intention. It's free.
Wrestling has more than one AI Assistant.
Is it an answer to GitHub Copilot?
Cody is more like a wrapper on top of the GPT API (gpt-3.5-turbo) endpoint, with related code snippets embedded in the prompt. It’s not competing in the low-latency scenario that GitHub Copilot sits in.
Recent work shows the potential of beating Copilot’s performance (e.g., https://arxiv.org/abs/2303.12570) with much smaller models (500M vs 10B+).
Inspired by this work, I’m building Tabby (https://github.com/TabbyML/tabby), an OSS GitHub Copilot alternative. Hopefully it can make low-cost AI coding accessible to everyone.
FYI, Tabby is the name of an existing open-source product: https://github.com/Eugeny/tabby/
We don't have autocomplete yet, which is what Copilot is mainly used for. But we're working on it, with the same context-aware mechanisms that provide a quality lift on the Q&A side.
Any plans for a neovim plugin?
The best way to accelerate this is to bug https://twitter.com/teej_dv on Twitter. Tell him Beyang sent you and that he thinks he can have an Emacs plugin out faster :P
On the roadmap, no ETA.