Quantifying GitHub Copilot’s impact on developer productivity and happiness (github.blog)

The place where Copilot shines most for me is boilerplate like:
    up = foo.north()
    right =

and then it will correctly suggest:

    up = foo.north()
    right = foo.east()
    down = foo.south()
    left = foo.west()
This requires a kind of linguistic understanding that is beyond basic intellisense. If you just want it for basic intellisense-style autocomplete, it's not much better than the JetBrains stuff. But for things that require linguistic comprehension, it's incredibly useful.

For example, if I have a `Point` object with `x` and `y` fields, and I make a function called `lowest_point`, it knows I want the point with the minimum `y` value, because it knows the name `y` is most typically used for the vertical direction, and the word `lowest` also implies vertical direction. This is a kind of linguistic/semantic knowledge that is not found in the type system alone.
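A sketch of that pattern in TypeScript (the `Point` shape and the reduce-based body are illustrative, not Copilot's literal output):

    interface Point {
      x: number;
      y: number;
    }

    // Given just the name and signature, Copilot will typically complete a
    // min-by-y body like this (assuming y increases upward and a non-empty array):
    function lowest_point(points: Point[]): Point {
      return points.reduce((lowest, p) => (p.y < lowest.y ? p : lowest));
    }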
What's funny is that I often see Copilot give you code that Intellisense or the type system will tell you is obviously wrong. I wish Copilot were more integrated with type systems.
Right now Copilot is just given the surrounding code, which is obviously limited. I suspect the next big jump will be a model that is given:
1. The surrounding code
2. The type/intellisense information of all the identifiers in the surrounding code
3. The class/function definition of any identifiers in the surrounding code
I think #3 will be huge. There is a huge difference in Copilot's performance when dealing with types defined right above where you are writing code, and types defined in some far-away module or library. Right now if you use a class you defined in another file or adjacent project, Copilot can't see it. But your IDE has jump-to-definition functionality that could trivially be wired in to something like Copilot.
The team behind Stable Diffusion (the image synthesis model that's been all over HN recently) have said they are working on stuff like Copilot as well, so I expect we will see a huge transformation in this space in the next year or two.
This.
It feels like something that is obvious and should go in the copilot plugin rather than copilot itself.
Given N suggestions, validate the code + snippet on them using normal error checking.
Promote suggestions with no errors.
The intellij plugin, for example, will often suggest 'default' suggestions that are entirely wrong. Given that `foo.bar.foobar()` exists, it'll suggest `foo.something.SomethingClever()` when those functions don't exist, or exist on types other than foo.
However, if you look at all the suggestions on the side panel, you'll see several of them are actually valid. It's just that the default completion isn't.
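A minimal sketch of the validate-and-promote idea above, assuming TypeScript and using the public `ts.transpileModule` API (which only reports syntactic diagnostics; a full version would build a `ts.Program` and check semantic/type errors too):

    import * as ts from "typescript";

    // Reorder Copilot candidates so that ones which at least parse when
    // spliced after the code before the cursor come first.
    function promoteValidCandidates(prefix: string, candidates: string[]): string[] {
      const valid: string[] = [];
      const invalid: string[] = [];
      for (const candidate of candidates) {
        const result = ts.transpileModule(prefix + candidate, {
          reportDiagnostics: true,
        });
        ((result.diagnostics ?? []).length === 0 ? valid : invalid).push(candidate);
      }
      return [...valid, ...invalid]; // error-free suggestions promoted
    }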
It produces very good results with TypeScript. When there is an interface and I'm creating an object that implements it, it produces a result that fits the type.
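For example (hypothetical interface and values, sketching the pattern):

    interface User {
      id: number;
      name: string;
      email: string;
    }

    // Type `const newUser: User = {` and Copilot usually fills in every
    // required field with plausible values:
    const newUser: User = {
      id: 1,
      name: "Jane Doe",
      email: "jane@example.com",
    };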
This has not been my experience with Copilot and Typescript.
Same here.
Same with Rust. Sometimes I just have to write a comment saying "display implementation for X" and it will autocomplete with the trait implementation. This is extra magical because I don't need to remember the exact detail of the trait.
What's worse is that if you make a mistake in the code leading up to its completion (even one picked up by Intellisense), Copilot will now assume that you actually wanted the mistakes, and its autocompletion will give you even more of them.
It is definitely hit or miss, but I don't find it too annoying to just clean up the small mismatches that crop up sometimes.
Yes, but no.
It feels like magic because it really does get some things right.
But I find that in most cases, because it's often wrong, you have to deal with wrong suggestions, and be very careful about what you accept.
I feel that in most coding scenarios it doesn't actually do much more than save a few keystrokes.
I'd suggest that if there is a lot of boilerplate, then maybe there's upside.
But for most things, it really doesn't help, despite how in some cases it's really magical.
I turned it off and don't miss it.
What I would prefer, however, is merely 'much better autocomplete'.
Somewhere between 'ok autocomplete' and 'code suggestions' is the sweet spot for dev work that's not boilerplate.
Yep, for the boilerplate alone it's worth the cost for me. Oh, you need to create a DTO from an internal object/class? Define the fields, write your `populateFromX()` function signature, maybe type `$this->id = $input->id`, and then CoPilot does the rest. And that's only one feature that I love; it does an excellent job most of the time writing the boilerplate for loops and the like based on my function names. Often I have to make few or no changes to its code.
At worst it provides a great starting point to jump off from; at best it writes it for me just like I would have (matching style, variable naming, etc.).
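The `populateFromX()` pattern above, sketched in TypeScript rather than the commenter's PHP (all names hypothetical):

    // Hypothetical internal model and DTO.
    class Item {
      id = 0;
      name = "";
      internalNotes = ""; // not exposed in the DTO
    }

    class ItemDTO {
      id = 0;
      name = "";

      // Write the signature and perhaps the first assignment; Copilot
      // usually completes the remaining field-by-field copies.
      populateFromItem(input: Item): void {
        this.id = input.id;
        this.name = input.name;
      }
    }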
One of the biggest boilerplate time savers for me is I have a service that maps HTTP requests to a specific class. I can paste an example HTTP request into a comment like:

    /* <http request here>
    The well-typed class representation of this is: */

And then it spits out a totally sane and reasonable class. In this file I already have 10+ classes defined and manually touched up by me to rename some things, remove some fields, etc. So every time I add a new one, Copilot takes into account the previous examples and each addition requires less or no manual tweaking.

100%. Most of my calls to the backend literally only need me to type the function name `fetchItems()` and CoPilot correctly uses my axios instance, sets the return type to be my base API response wrapper, sets the generic in my base wrapper to be `IItemDTO[]` (it guesses this correctly probably 90%+ of the time), and even shoves the response into a Vuex store just like I would have done (same deal, guesses right about 90%+).
It's not like it's a ton of work to do that myself, but normally the way I would accomplish it is to copy an existing function, change the return type, change the URL, and change the Vuex store it puts the data into; CoPilot handles that perfectly for me the vast majority of the time. And if I'm doing a CRUD endpoint I don't even have to write the other function names: it will autocomplete the "RUD" part after I write the "C" (and by write I mean I put `createItem(payload: ICreateItemDTO)` or similar, and it writes the body of the first function as well).
For me it's the perfect blend of "generated code" and "human tending" to get exactly what I need without having to do the repetitive/boring parts. It's been a joy to work with.
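A sketch of the `fetchItems()` pattern described above (`api`, `ApiResponse`, and `IItemDTO` are hypothetical stand-ins for the commenter's axios instance and wrapper types; the Vuex part is omitted):

    import axios from "axios";

    interface ApiResponse<T> {
      data: T;
      status: string;
    }

    interface IItemDTO {
      id: number;
      name: string;
    }

    const api = axios.create({ baseURL: "/api" });

    // Typing just the function name is often enough for Copilot to produce
    // a body in this shape, copied from the conventions of sibling functions:
    async function fetchItems(): Promise<ApiResponse<IItemDTO[]>> {
      const response = await api.get<ApiResponse<IItemDTO[]>>("/items");
      return response.data;
    }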
If you find yourself generating so much boilerplate, it's probably a better design and more enjoyable to craft a single reusable abstraction than it is to rely on a glorified copy and paste tool to create a mess of redundancy.
Is it the case that the code you wrote with Copilot is much more prone to being refactored? The usage you describe sounds like a lot of repetition, and hard to update if any of those requirements (the Vuex store, for example) change.
That’s not been my experience. I mean the code is no harder to refactor than anything else, the code follows the style I prefer (and my linters enforce), if it doesn’t I change it as needed.
If I were to switch from Vuex to Pinia it would be the same amount of work with/without copilot. I don’t let copilot generate code without me reading/reviewing it and sometimes making tweaks. CoPilot literally (and I mean that) generates exactly what I was planning to write, if it doesn’t I start typing it my preferred way and often CoPilot then offers a completion in that style.
Without CoPilot the code would be virtually the same, so refactoring isn’t impacted one way or the other.
> For example, if I have a `Point` object with `x` and `y` fields, and I make a function called `lowest_point`, it knows I want the point with the minimum `y` value, because it knows the name `y` is most typically used for the vertical direction, and the word `lowest` also implies vertical direction.
Unless, as in many systems, y=0 is the top and y increases downward.
The magical thing is once shown examples of this in your code, or a comment saying as much, Copilot will adapt.
For example, I just gave Copilot:
    # The screen is inverted, so bigger y values are lower
    def lowest_point(points):

and it completed:

    # The screen is inverted, so bigger y values are lower
    def lowest_point(points):
        lowest = points[0]
        for point in points:
            if point.y > lowest.y:
                lowest = point
        return lowest

Even if you write some code that implies that bigger y is lower, Copilot will pick up on it.

Notably, I think writing with Copilot encourages you to write comments that provide necessary context.
Naturally, this makes it easier for other programmers to understand the code.
> This requires a kind of linguistic understanding that is beyond basic intellisense

Are you sure? If you look at the statistics of what comes after

    up = foo.north()
    right =

I bet most of the time it's

    up = foo.north()
    right = foo.east()
    down = foo.south()
    left = foo.west()

Yes, ultimately it's just a really good statistical model.
But it does a really good job, and it's not just matching code it's seen before. You can use any variable name. You can do this in a weird programming language which uses weird symbols for assignment and calls; Copilot will probably see through it and give you a good prediction.
You can probably even swap west for not_east in code elsewhere in the file, and it will manage to predict that you also want not_east here. And I guarantee nobody has written code like that before.
But it contextualizes it with adapted variable names, case and everything. That's very hard to do with only statistical analysis.
Exactly, for example if I give it:
    hn_example_up = foo.north_garbage_jlakd()
    hn_example_right =

it correctly spits out

    hn_example_up = foo.north_garbage_jlakd()
    hn_example_right = foo.east_garbage_jlakd()
    hn_example_down = foo.south_garbage_jlakd()
    hn_example_left = foo.west_garbage_jlakd()

Even though those identifiers, including the random letters, are certainly not in the training data. It can do surprisingly good basic reasoning.

Here is another example. I give it:

    hn_example_up = foo.x_0_y_1()
    hn_example_left = foo.x_neg_1_y_0()

So my function names are referring to unit distances on the x/y axes. Copilot correctly figures out what I'm doing and completes:

    hn_example_up = foo.x_0_y_1()
    hn_example_left = foo.x_neg_1_y_0()
    hn_example_down = foo.x_0_y_neg_1()
    hn_example_right = foo.x_1_y_0()

A few weeks ago I coded a Rubik's cube solver that involved some very nuanced and weird spatial reasoning, and was blown away by Copilot's ability to demonstrate it "understood" concepts such as "turning the right side down is identical to turning the left side up when viewed from the back of the cube", and the like.

No it isn't. Variable names are literally fungible in this context.
Especially boilerplate in a language or framework you don't usually use. It's such a time saver.
I've never used Copilot, but are you worried it will only spit out:
    up = foo.north()
    right = foo.east()
    down = foo.south()

And you won't notice?

This is why I stopped using Copilot. It was very good at spitting out code that looked correct enough that I didn’t spot the mistakes. And they were mistakes I’d never make myself if I wrote it manually.
Maybe don't use it for huge chunks (> 10 lines)? The point of Copilot, to me: my average typing speed is 40 wpm and my reading speed is 250 wpm, roughly a 6x difference, and even after I type code I still read it over. There will be errors in Copilot's code sometimes, so even if you said it's only 50% accurate, that's still roughly a 3x speedup on the boilerplate it's used for.
No more worried than in the past if I made the same mistake myself by manually typing it or by copying / pasting and leaving something off.
As with everything I always do visual sanity checks.
You have to think about it like auto complete. You don't blindly accept what auto complete tells you, your code would never compile otherwise, you read what it proposes to you and you modify what you don't like.
I get that, but I meant how often does it introduce bugs in the situation the OP mentioned, using it for "a language or framework you don't usually use", where I won't know if it looks right or not because I'm unfamiliar. Are they going back to fix the code often?
Personally, I don't even rely on normal autocomplete very much when I code because I feel like if I don't know what I'm trying to call, then I don't know the language or library well enough, and it's best to go look at the documentation.
I recently used Copilot while learning Rust and it was immensely helpful for the learning process. There were a lot of language features that I learned about by seeing Copilot use them.
As for mistakes, I've got a good enough sense for reading code in any language to know if something is obviously wrong with an algorithm. The mistakes Copilot makes aren't usually to do with language-specific syntax, they're typically algorithmic and therefore easy to spot in any language. This makes sense if you think about GPT-3's output for English: it tends to be syntactically correct, but often completely wrong about facts.
It helps that Rust's compiler is so picky and so helpful, because on the rare occasions when Copilot was completely wrong about the language the compiler could set me straight pretty quickly. Writing automated tests also helped and was made easy by Copilot. I could write a single test case showing how to use the API, and then just write function names to generate the rest of the tests (obviously I would sanity-check the assertions Copilot produced).
I'm not sure, I use it A LOT for cryptographic code in Rust and I don't blindly rely on its output. I believe if there's a bug that was introduced it would be my fault with high probability, not Copilot's.
Auto-complete generally completes a single identifier, and the possible completion list is usually restricted by the types involved. So the likelihood of it getting a wrong one is lower, but even if it does, it's easy to spot.
Not really no
Why not?
From what I've seen, GPT-3 works exceptionally well until it doesn't, and that can be easily missed.
I've simply never had an issue with this sort of thing in months of using co-pilot to be my fancy autocomplete. And if there was an issue I'd notice quickly enough, because I am still the primary pilot.
It is also great as a starting point for error messages, comments, etc.
I think the key is to still give a shit about the code you're writing.
My rule using copilot is to use it as a fancy typing assistant which will frequently make mistakes - so you need to read everything it writes for you, and have solid test coverage too in order to avoid accidentally accepting a dumb mistake.
This made it useless for me because writing code is easier and faster than reviewing someone else’s code. And less error prone.
And much more enjoyable. I haven’t tried Copilot yet, but the process of letting it generate something and then checking it for accuracy sounds mind-numbing and stressful. Kind of like supervising a car that’s on assisted cruise: I don’t enjoy that either.
But I read the code, and I run tests.
does the bot write most of the tests too?
You can use it to write anything you want - it even works on READMEs and comments! But like any good test: test it in a different way than you implement it.
I've been writing a compiler, and I guess enough people have done the same, because it always knows exactly what my enums should be, what the names of my types should be, what I should parse next, etc.
Writing a simple compiler is a fairly common CS exercise.
This is as credible as Ford doing a survey and finding out people got way more done with Ford trucks.
I found Copilot to be nothing but a hindrance and additional cognitive load. Not only do you need to think about the code you want, you need to review the (usually wrong) code Copilot produces and either reject it or edit it to make it right. This is a lot more mental overhead than just writing the code.
I don't like Copilot. I don't like sending MS my code, I don't trust it not to violate copyright, and from what I've used of it, it feels like using DALL-E: trying to phrase what I expect.
Moreover, writing boilerplate is rarely my daily activity, and when it is, it's faster to copy-paste from a similar file.
On the other hand, a few exceptional devs I know (the kind of rockstar 10x engineer: insanely productive, very smart about product and business, not just great coders, people who were already leads in FAANG 15 years ago and are responsible for high-impact, quality software) speak very highly of it and how much better it makes them, which crushes my doubts about Copilot.
As a 10x engineer with 24 years of experience, I stand by my opinion that it is meh at best.
Are you actually saying you are a 10x developer? Like you are way better than average? Because that pretty much kills your credibility right there. "Oh I'm right because I'm so great".
If you are indeed a 10x developer I'd love to see some of the great code you've written.
Yeah, I kind of assumed they were writing this facetiously, because otherwise that kind of type-a prima donna Wagnerian complex is like shorthand for: "hey let's never be friends".
I hope so. One look at the gh handle and you can see this person has less commits in ten years than your average developer in a single day.
10x developers are not modest people
They usually aren't modest but they also aren't insecure enough to have to say they are 10x developers. If you are good other people will generally be telling you how good you are.
Who, 2xers or 5xers? Or do you need to be voted in by other 10xers?
Without Copilot, I still end up editing a large percentage of the code I write. It feels like a net win.
Agreed. I found that its suggestions constantly break flow as you’re now evaluating something that was pushed on you.
Beyond basic algorithms and boilerplate code completion (which would arguably be better implemented via better intellisense than ML), it was basically zero help.
Even in the areas it helped, time to eval mostly equaled the time I would have taken to write the same thing manually.
I think that's a Bruce Tognazzini thing where you feel like it's a cognitive load, but you are actually way more productive with it than without. I feel the same way about regular autocomplete, but that's a must for productivity.
I spent most of the trial hating Copilot and the way it takes over my Tab key in JetBrains software. It got in the way a bit too much. Every time I tried to use emmet shortcuts, live template triggers, or anything muscle-memory related, it would insert something else, so I had to undo, press Esc a couple of times, and then continue with whatever I actually wanted.
However, every now and then there came a typed object or something mundane that Copilot excelled at, so I decided I wanted to find a way for it to coexist, one as easy as pressing Tab to expand a suggestion. Lo and behold, I found an unused key below Tab (AKA Caps Lock), remapped it, and now it doesn't get in the way and I can use Copilot's help only when I want.
In PyCharm, where indentation is key, when you want to bring the current line into the current scope it will just autofill the Copilot suggestion, which you then have to remove again. So annoying.
I mostly enjoyed my Copilot trial but one thing that really annoyed me was that it would often make suggestions for strings/keys that would be decent guesses by the AI but would be obvious type errors in TypeScript. Then it seemed a bit tricky to fight through the incorrect suggestion and get the old TypeScript Intellisense which was of course correct and what I wanted. Other suggestions by Copilot were awesome but when there was an obvious job for TypeScript it would be nice if Copilot would step back. For example I would often type:
    if (myTypedVar === "" <--
Back off Copilot, let TypeScript Intellisense give me the correct suggestions. Or at least run the Copilot suggestions through TS first and make sure it doesn't suggest type errors.
Indeed this is one of the more annoying features of the integrations I’ve seen.
Running through the type-checker is an interesting idea - even if you didn’t reject suggestions outright (which seems like it could frequently reject suggestions you actually wanted) it may be a useful weighting function.
Similarly, running through eslint (or whatever linting system is appropriate) to avoid introducing obvious fixable formatting errors, like incorrect quotes, would be wonderful.
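The linting half already maps onto ESLint's Node API; a sketch using `Linter#verifyAndFix` (ESLint 8-style eslintrc config, with a hypothetical minimal rule set):

    import { Linter } from "eslint";

    const linter = new Linter();

    // Auto-fix an accepted suggestion before inserting it, e.g. normalizing
    // quote style, so Copilot output matches the project's formatting rules.
    function fixSuggestion(code: string): string {
      const result = linter.verifyAndFix(code, {
        rules: { quotes: ["error", "single"] },
      });
      return result.output;
    }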
Sounds like it was your Copilot integration with whatever editor you used that was bad, not Copilot itself.
The neovim integration worked very well, at least when I used it with Clojure and associated tooling. Could move seamlessly between what my Clojure tooling provides me with, and what Copilot did.
I'll tell you right now, having my GPL-licensed code stolen has a pretty negative impact on my happiness.
What snippets were stolen from your code and how has it affected your happiness?
What if I read your code? Now I have memory of it in my brain and it will influence my next projects. Is it stealing?
Actually it could be! If you read GPL code and remember it well, then write a near exact copy of it into your proprietary software project, that's a violation of the GPL.
I have no opinion on whether there's infringement or not, as I am not a lawyer, but I found the argument pretty convincing that the terms of use specify you allow them to analyze your code:
> We need the legal right to do things like host Your Content, publish it, and share it. You grant us and our legal successors the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time. This license includes the right to do things like copy it to our database and make backups; show it to you and other users; parse it into a search index or otherwise analyze it on our servers; share it with other users; and perform it, in case Your Content is something like music or video.
Copilot seems to go further than all that. It can suggest verbatim, nontrivial, copyrighted work with no means of attribution. And the sources are so broad no human alive could say with certainty any given output is not infringing, unless it's so trivial it cannot be copyrighted.
I've always had trouble understanding intellectual property when it comes to code, because if I read open source code, remember everything, and write my own version (even if slightly different) 10 years later, I am not 100% sure whether it's copyright infringement or not.
I see it as something akin to a painter studying someone else's work, then reproducing some of the techniques invented by the original artist... except that their techniques didn't have a license I guess?
I think I am not equipped to fully understand the legal boundaries between inspiration and theft, and I think most programmers are in the same situation.
The painter analogy falls down when you consider that Copilot can output verbatim chunks. So it's more like a photo of a painting, or a stroke-for-stroke copy, even if portions are changed or only parts are taken.
Now if the changes are so significant that it becomes impossible to recognize the reference, then that could be legit. Though based on how Copilot works, I don't think that can be assumed, or even proven. When I studied art, the teachers usually taught us to begin with our own original photograph for reference, ideally without trademarked items.
And even in literature or journalism one must quote sources, even if paraphrased. I was taught to put down any inspiring work and only begin my own work after taking a break, to reduce the likelihood of unintentionally copying the original.
Whether or not I agree with your interpretation of that clause (I am choosing not to analyze that too deeply), the vast majority of my code isn't on GitHub because I uploaded it there... it is on GitHub because it was open source and someone else--someone who uses GitHub to manage their projects--uploaded it, whether as a mere fork or as a legitimately derived work. If you were right, and this did matter to their legality, this would thereby mean 1) that Copilot's database is already tainted and 2) that I guess GitHub isn't actually capable of being used to host GPL projects at all (which I doubt is the intention).
Yeah this case is interesting actually, I had never thought of a legitimate derivative landing on github, not something the original author intended to happen...
My guess is that you're right and it does mean 1) and 2), but again, it's probably a matter of interpretation and actual ruling on the matter.
Maybe you should switch to the Unlicense. I'm so much happier not giving a damn or punishing my users with pettiness.
Licenses don't matter when Microsoft decides that AI training is fair use.
I'll take "I didn't read the GitHub TOS" for 500, Alex.
Well not everyone is in contractual privity with GitHub — like the poster a few up whose code was included in someone else’s repo. GitHub’s TOS have no effect in that circumstance.
"wants to participate in and use open source, but is not happy when others use their code"
*Use their code without complying with the license
Copilot is transformative work.
Objection, legal conclusion.
I couldn't care less about whatever restrictive license people choose and decide to call open source
Well then don't use their code; you legally have to follow the code's license.
I wonder if you could just ruin these systems by having bots constantly upload bugged code to GitHub.
I think my biggest worry about AI is that we hit a big wall right about now because training data is now contaminated with AI outputs. That's why OpenAI puts watermarks on images, I'm guessing, but Stable Diffusion doesn't, and the free one is probably going to get a lot more usage than the paid-for one.
But, maybe not, because humans liked the outputs enough to share them and put them on the Internet, so maybe they don't have much of an effect quite yet.
I think the training data limits other applications, like self-driving cars, a lot too. Training is frequently done in contrived scenarios. Now, imagine if GM or Ford had put sensors and telemetry for future self-driving use on their models decades ago. You'd have practically every road mapped, with ten thousand observations apiece, by now. It wasn't for lack of foresight that these data weren't collected; it was for lack of profit. Sensors and other data-collecting systems are not new technology. Neither is machine learning; it's decades old. The only thing different these days is that the compute required to run these models on these data has gotten cheaper, but you could have been collecting data the whole time in anticipation of cheap compute to run the models in the future. Then you'd have a leg up on everyone in the business with your massive dataset of decades of real-world driving, not test tracks and similar setups with an employed driver, in all likelihood.
"The GPL is a free software license, and therefore it permits people to use and even redistribute the software without being required to pay anyone a fee for doing so."
How is GitHub Copilot breaking this license?
GPL requires that derivative works - which, I argue, Copilot is - also be distributed under the GPL. Microsoft disagrees with my interpretation, and wins by default because I can't afford to sue them.
The GPL has requirements, and Copilot fails to adhere to those requirements. Here's an important one:
>You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice
Say I drew a lovely picture and hung it outside my house for passers by to look at. I say “sure you can take a copy, no problem”, they then sell a billion prints, and don’t credit me at all.
Legally, not stealing. Ethically? I’d call that stealing.
Except that's not exactly what they are doing, is it? Following your example, it'd be like selling the advice of someone who has seen a lot of these pictures while passing windows, and who has a very good memory and understanding of pictures.
Github Copilot is not spitting out code verbatim, it's learning what code is, what shape it usually has, and trying to retrofit your own code onto the shape it thinks code should have.
It's not like it's doing a query like `SELECT * FROM COPIED_CODE WHERE CODE STARTS_WITH "def my_func("`.
Legally, that actually is stealing. Reproduction rights must be documented; a misunderstood verbal comment will not fly in court.
Any thread on copilot always seems to get a lot of comments about how it makes dumb mistakes. I guess I'm used to Intellisense etc making dumb suggestions all the time? I don't dwell on it for a second, either the computer makes a good suggestion that was what I was about to do and I accept the suggestion, or it doesn't and I don't accept it. I'm certainly not going to sit there and stew over how dumb it was to suggest such a thing, it's a damn computer!
Cognitive load is massively decreased, I don't have to push much on the working memory stack to do small utility functions, they just write themselves now (especially if I write the type signature first). I can spend a lot more of my time at the middle level of abstraction, and keep moving.
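For instance, with a signature-first habit (a hypothetical utility; the point is that a descriptive name and type signature alone usually elicit a correct body):

    // Write only the signature line, and Copilot tends to fill in the rest.
    function chunk<T>(items: T[], size: number): T[][] {
      const result: T[][] = [];
      for (let i = 0; i < items.length; i += size) { // assumes size > 0
        result.push(items.slice(i, i + size));
      }
      return result;
    }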
Earlier in my career--back when I still relied on Intellisense--I never really thought of it as a "suggestion" that could be "wrong"?... it is a tab completion mechanism, similar to how you can tab complete filenames at the shell, and that is extremely useful if you don't exactly remember the name of the function/file you are looking for--hell: I use tab completion in bash as a replacement for ls in most cases!--but it isn't something where the thing it generates somehow looks right but is actually a security vulnerability.
Yeah, I mean if you don't remember whether it's `strlen` or `strnlen` when autocompleting, you can introduce a security vulnerability with intellisense just "tab completing".
This is contrived, but in general (programmer + fuzzy memory + computer suggestion) means there's a possibility for it to be wrong, and you have to rely on the type system and reviewing it carefully to make sure there wasn't a mistake.
This seems like something obvious, but I didn't really appreciate it until I was actually using copilot day to day.
When you're using it on a codebase it's familiar with, it'll often make quality-of-life suggestions that you may not know about.
For example, using Unity, did you know

    Mathf.RoundToInt

was a thing? I didn't, until I was using the normal way of doing it (e.g. `(int) Math.Round(v);`) and Copilot popped this suggestion up.
I had similar experiences using opencv in python.
Maybe the suggestions aren't always perfect, but what it does do is show you how other people have written code using the same engine/plugin/library you're using; and when you're learning a new API, that's extremely valuable.
The alternative (finding GitHub projects using library X and browsing through the code) lets you do the same thing, but it's much, much slower to pick up the same 'tips' about an API that way.
Specifically with regard to "give people a new task they haven't done before in language X with/without Copilot", this kind of 'tip' for an unfamiliar API seems highly plausible as a way of getting developers up to speed.
I wish it had support for doing this in a 'favour suggestions that look like the existing code base' mode, to help onboard new developers.
That would be really really useful.
I ended up disabling it in the beta. It ended up getting in the way more often than not.
Same. Felt like pairing with a drunk programmer.
Did it ever repeat "developers!" loudly?
I always figured those guys were on something else other than booze.
Out of curiosity, what programming language you use, and what type of code you write?
Not OP but I did the same thing. I build web apps using some combination of TypeScript, C#, F#, HTML, and CSS (or CSS-style languages). I spent about as much time correcting what it wrote as if I had written it myself. As someone else mentioned, it also interfered with proper inference given by the type system. Any time I wrote a more complicated algorithm (I work with a lot of graphs) I couldn't trust what it gave me. It's far easier to just gloss over some incorrect detail when the "shape" of the code is correct.
I turned it off after it kept outputting Go that looked feasible but wasn't valid because of the libraries I was or wasn't using. I only turn it on if I'm doing something mundane that it might be able to handle. The fact that it overrides IntelliJ's autocompletion was too annoying.
To me it's an added level of attention required, having to be extra careful (as I'd be when reviewing a junior's code) with any code suggestion Copilot makes... even if it's wrong only 10% of the time, I need to be on the lookout for lines that would cause a bug down the line. Ended up disabling it.
I don't understand all the negative comments, my experience with Copilot is just insane, it really is autocomplete on steroids. It's hard to fathom that some people have a bad experience with it.
I'm doubtful that the results from the experimental process can be used to draw such generalizing conclusions about the utility of Copilot. The task (implementing an HTTP web server in JS) is something that would already be over-represented in public codebases; this is like asking participants to implement two-sum or a sudoku solver, and of course the language model would have essentially memorized these types of problems.
"and timed how long it took them to write an HTTP server in JavaScript"
OK ... interesting choice. This could mean almost anything. I can "write a web server" in a Python one-liner in about 2 seconds (ctrl-r SimpleHttpServer, enter).
We'll be sharing some more about how we conducted this in an upcoming post! But no, the task was actually to write a little engine to process a limited subset of HTTP requests and return a valid response or relevant error.
Would you pit it against my non-ai solution for the same problem? As a baseline to measure against? It would seem prudent to compare against other tools trying to alleviate the same problem, rather than just humans writing the code by hand
https://github.com/hofstadter-io/demos/tree/main/full-stack-...
I've since shortened that demo to be even faster for a user to complete
There was a test suite to work against, so some standard notion of things that needed to be in place to be 'working'.
Based on this task, I could write a snippet that would beat Copilot every single time.
it's a profoundly stupid task for measuring developer productivity, too
the hard bit of being a developer is deciding in precise, pedantic, utterly exact detail on what needs building to solve the business problem
not bashing out the code in the editor
now if they had given out a vague problem that ended up with the dev eventually producing a webserver in Javascript, then fine
but then the difference copilot makes would be a much smaller percentage of the development effort
I swear we're going back to the age of "lines of code written per hour", just now with arTiFiCiAL InTellIgEncE
> the hard bit of being a developer is deciding in precise, pedantic, utterly exact detail on what needs building to solve the business problem
The process seems to have involved a test suite to write against. So, essentially, you had some or most of your "business problem" defined by acceptance tests (produce these outputs given these inputs). Some of that process might normally stretch out over days or weeks of meetings; condensing it down and providing it ahead of time takes some of that variability out.
I was possibly a bit more surprised that less than 80% finished. Did they not commit code? Did they submit code but it didn't pass? Did they take too long? Was the 'did not finish' an explicit call on the dev side ("I'm done") or did they just stop taking input after X days?
There is a real need for an offline Copilot alternative.
Privacy concerns make Copilot unusable for companies with closed-source code.
I’m hoping that eventually there will be a Stable Diffusion-level solution in this field.
Maybe some companies have a large enough codebase of boilerplate functions that they can train their own models on it. Oracle's own private Copilot instance, or something like that, just pumping out more and more corporate code.
Maybe I’m weird, but I kind of like programming. Even my own “boiler plate”. And good coding practices, combined with well-factored code design and planning (even in the micro, moment-to-moment sense of design and planning), usually avoid most “boiler plate”. (“Boiler plate” is usually a misnomer in modern well-designed languages. If the language or system makes you write something out, it is usually because you have the opportunity/obligation to think about it, and usually to tweak it. If it were actually always the same, it would have good defaults and wouldn’t be there at all.)
I use Copilot constantly and it's one of my favorite developer tools ever.
Same here. It's such a great tool and productivity boost. Don't want to miss it anymore.
I wasn't that bothered about it until I used it. The biggest thing for me was working with Terraform: with some really badly documented parts of it, where I really couldn't work out the syntax required, the suggestions helped expedite what would otherwise be a lot of iteration.
What would be great would be enterprise pricing; we would love to purchase this on behalf of our employees via a single interface.
Has anyone considered creating an extension like co-pilot that's trained on your own reddit/hn/twitter comments? Basically like an autocomplete for your own self.
I've not used it, and probably never will, but why do people want non-deterministic snippets again?
It seems like popularizing snippets would've gone further, but then you'd miss out on the buzzwords.
Huh? Maybe you should try it out and see. It's nothing like snippets.
I really love Copilot. I had a trial, let it expire because I thought I wouldn't really need it, and then ended up paying for it. It's apparently got even better. For me, at least, it gets most things very right and takes a lot of mundanity out of day to day development!
I work in finance; if I type a certain client name out in public, I can be fired for it. Can Copilot work without the internet, within only a corporate network?
You might be interested in IntelliCode from Visual Studio. It's an AI-guided autocompletion tool, like Copilot, although dumber and less powerful. It can run in local mode [0], looking at your existing codebase (opened solution) to feed future suggestions.
[0]: https://docs.microsoft.com/en-us/visualstudio/intellicode/ov...
Not presently, per their FAQ.
> [...] The GitHub Copilot extension sends your comments and code to the GitHub Copilot service, and it relies on context, as described in Privacy below - i.e., file content both in the file you are editing, as well as neighboring or related files within a project. It may also collect the URLs of repositories or file paths to identify relevant context. The comments and code along with context are then used by OpenAI Codex to synthesize and suggest individual lines and whole functions.
- from https://github.com/features/copilot/#faq - see "How does GitHub Copilot work?"
No. You can already use offline alternatives like Tabnine, but it's not as good. I guess better open-source models and tools will be available eventually.
I read the article and I have some skepticism. I think my skepticism is well-founded but it may well be the case that a machine will one day do my job. I don't believe that time is at hand, though.
First off, I don't see a link to the "HTTP server in JavaScript" task. It's really hard for me to place much faith in their conclusions when it's not even clear what the problem definition was.
Second, I believe that a lot of more senior developers and development managers who take secure development practices somewhat seriously will not be able or willing to use Copilot in any sort of proprietary setting. Here is a quote from the Copilot FAQ:
> [...] The GitHub Copilot extension sends your comments and code to the GitHub Copilot service, and it relies on context, as described in Privacy below - i.e., file content both in the file you are editing, as well as neighboring or related files within a project. It may also collect the URLs of repositories or file paths to identify relevant context. The comments and code along with context are then used by OpenAI Codex to synthesize and suggest individual lines and whole functions.
- from https://github.com/features/copilot/#faq - see "How does GitHub Copilot work?"
I believe this makes it simply a nonstarter in a lot of environments. I am wondering if there are a number of places that have restrictions on sharing their code with a third-party but don't know or don't care and so end up using Copilot anyway. I believe that short-sighted thinking like this is more prevalent in shops that have low-quality code, and I believe that the higher-quality the code, the less likely someone is to use Copilot, simply for the "I can't share my code, even if I use the most restrictive no-telemetry settings" reason. Give me a self-hosted Copilot, and I may try it out in anger.
Finally, I based some of my thinking on a recent Reddit /r/programming discussion of Copilot: https://old.reddit.com/r/programming/comments/wsnend/since_g...
After reading those posts, and internalizing them with my own view of coding, I believe Copilot is not ready for my personal use. Again: licensing considerations aside (if you actually can feel comfortable putting them aside, see NoraCodes comment in this HN thread e.g.), it is simply a non-starter for anything proprietary in nature. I am also of the mind that any code that is of necessity very tedious to write is in dire need of real attention, most likely in the form of tests and quite possibly refactoring to reduce the boilerplate if at all possible. I believe in the value of linters and automated code analysis tools and in continuous integration that runs after every commit. Give me a self-hosted Copilot, and we'll have a real chance to see how it works out - until then it's not going to be a boon to programmers.
Replying to myself to add some things I think are real boons to development:
- https://pvs-studio.com/en/blog/posts/ - I don't code in C or C++ but I love to read these posts with the limited understanding that I possess
- GitHub has another product which is good https://github.com/dependabot - similar in nature to Snyk, Renovate, etc.
- there was an HN thread yesterday about govulncheck which looks pretty nice as a Go-known-vulnerability-in-your-dependencies checker
- code review is invaluable
- continuous integration and a build that passes and expands to cover new bug cases is really helpful
ITT:
A bunch of insecure clowns that somehow manage to feel personally threatened by a productivity tool. It's hilarious. Get over yourselves.
Copilot is just an intelligent, context-aware search engine that is plugged directly into your code editing workflow. It's nothing more. It's the sci-fi version of having a hardcoded `site:stackoverflow.com` google search box right next to your editor.
It's easily the nicest thing I've installed for productivity in a while, and it regularly amazes me what it can pick up from context. I can switch between code written in two different languages that is related, and copilot will often pick up enough context to give me correct suggestions in one language based on the code I just wrote in the other.
How's Copilot with really good statically typed languages like Haskell? I haven't tried it out because most comments say it doesn't even work well with TypeScript yet...
It is not even remotely usable with languages like Haskell, the code it generates almost never typechecks, IME. I think the primary reasons being lack of training data (there is far more Python code on GitHub than the entirety of Hackage) and that Copilot is essentially a language model that does not consider AST and type information at all.
I use it almost daily with TypeScript and it works well. It will guess some types wrong once in a while, but it's still a very nice help.
It should be free since it was trained using free software.
Anecdata: Copilot has been net-negative for me when writing Python, but the Rust completions are fantastic.
I liked using it. Though, most of the time, it was limited, in the sense that it could only offer completions using the code before the cursor as the context.
I found it most helpful in setting up boilerplate when writing tests. It would suggest tests for edge cases before I could think of them.
Wish the auth was better