GitHub Copilot for Business is now available
github.blog

Sadly, "simple license management" here just refers to "who in your organization has a license to use this tool", rather than "where did this code come from and what license is it under".
This tool remains the equivalent of money laundering for violation of Open Source licenses (or software licenses in general).
Where's the part of the Copilot EULA that indemnifies users against copyright infringement for the generated code?
If the model was trained entirely using code that Microsoft has copyright over (for example: the MS Windows codebase, the MS Office codebase, etc.) then they could offer legal assurances that they have usage rights to the generated code as derivative works.
Without such assurances, how do you know that the generated code is not subject to copyright and what license the generated code is under?
Are you comfortable risking your company's IP by unknowingly using AGPLv3-licensed code [1] that Copilot "generated" in your company's products?
[1] https://en.wikipedia.org/wiki/GNU_Affero_General_Public_Lice...
Ultimately I think MS is right to bet that this will not matter in the long run. All AI tools are trained on copyrighted data. The tools are too useful to cripple with restrictive laws written for a previous era.
The pirate movement might have been a bit early for its time, but if AI happens to be the technology advancement needed for society to abolish copyright law, then let's get it done.
With everything from Sci-Hub getting blocked to people getting subscription fatigue, it is an excellent time for these restrictive laws to be replaced.
> The pirate movement might have been a bit early for its time, but if AI happens to be the technology advancement needed for society to abolish copyright law, then let's get it done.
No argument there; that'd be a huge victory.
But until that happens, tools for "smart completions" need to provide appropriate licensing and attribution metadata.
Please by all means have those laws repealed. Until then, a license violation remains a license violation no matter how much AI you run the code through.
Napster was right to bet on music being accessible online but oh wait
It’s not AI if it’s just copying and pasting someone else’s code
>> All AI tools are trained on copyrighted data. The tools are too useful to cripple with restrictive laws written for a previous era.
Are you sure that Disney and Games Workshop will not mind if you make your own Mickey Mouse / Warhammer 40K crossover movie for profit? (I would watch it!)
AI can help generate it, so it must be okay.
Those are protected by trademark law which will still be strong in a post AI world. I'm sure those companies will be making massive use of AI to speed up content production.
You previously stated "The tools are too useful to cripple with restrictive laws written for a previous era." and also "Those are protected by trademark law which will still be strong in a post AI world."
So do you believe in legal protections for trademarked and copyrighted works or not?
If an AI assistant generates content that includes copyrighted or trademarked elements is the content "too useful to be crippled by restrictive laws" or "protected by laws which will still be strong" ?
I agree. AI is happening. Don’t swim upstream.
Copilot for Business includes a $500k indemnity if you turn on excluding suggestions that match open source code: https://github.com/customer-terms/github-copilot-product-spe...
Good to know, but there are caveats:
"GitHub’s defense obligations do not apply if (i) the claim is based on Code that differs from a Suggestion provided by GitHub Copilot, (ii) you fail to follow reasonable software development review practices designed to prevent the intentional or inadvertent use of Code in a way that may violate the intellectual property or other rights of a third party, or (iii) you have not enabled all filtering features available in GitHub Copilot."
Do they define what constitutes "reasonable software development review practices designed to prevent the intentional or inadvertent use of Code in a way that may violate the intellectual property or other rights of a third party" ?
>(iii) you have not enabled all filtering features available in GitHub Copilot
I was trying to find out what these exact filtering settings are but I land at this page [1].
The other links just download the Product Specific terms PDF again.
Not sure if this is an oversight or intentionally circular.
[1] https://docs.github.com/en/copilot/configuring-github-copilo...
I wonder if the procurement departments get to negotiate that higher for the bigger contracts, because that's low imo.
this seems like it covers non-business copilot as well
>Are you comfortable risking your company's IP by unknowingly using AGPLv3-licensed code that Copilot "generated" in your company's products?
This would not risk your company's IP.
>> This would not risk your company's IP.
"The GNU Affero General Public License is a modified version of the ordinary GNU GPL version 3. It has one added requirement: if you run a modified program on a server and let other users communicate with it there, your server must also allow them to download the source code corresponding to the modified version running there."
So your company is okay providing all their source code to users?
I know that some companies do this, but most do not.
>So your company is okay providing all their source code to users?
Nothing is forcing them to do that. If they are infringing they simply need to delete the infringing code and rewrite it by hand.
That's not how copyright infringement works. Great, you've stopped committing new infringements, but there's still a legal case over the previous infringements, for which you still need to provide appropriate remedy.
If the remedy for copyright infringement were just "oh, we got caught, guess we'll stop now", that would provide substantial incentive for people to violate licenses as long as they hoped they wouldn't get caught. The remedy for such violations needs to be substantial enough that it's not profitable to temporarily get away with.
The company would have to pay a fee to the copyright owner plus some extra.
I am jaded from practice around GDPR, where the thinking of companies goes along the lines of: if we are caught, we will pay extra, but for now we make big cash. And who knows, maybe we won't get caught.
>> Nothing is forcing them to do that.
If you use AGPLv3-licensed code in your codebase, you are agreeing to the terms of the license.
Practically all corporate legal teams at companies that create software will strictly prohibit the use of AGPLv3-licensed software, if not all GPL-based licenses.
Using AGPL code doesn't mean you agree to the terms. It means if you don't agree to and obey the terms, you don't have a license, which is copyright infringement.
>> if you don't agree to and obey the terms, you don't have a license, which is copyright infringement
So you either agree to the license terms or you are breaking the law.
If your company violates copyright law, it can be taken to court and sued for damages or forced compliance to the license terms:
https://www.mend.io/resources/blog/the-100-million-case-for-...
https://www.natlawreview.com/article/gpl-open-source-litigat...
Being taken to court for infringement of free software's copyright is rare. On top of that the copied code only being a snippet from copilot makes it even less likely to happen. The snippet alone may not be considered copyrightable.
It's an unpopular opinion which is why I'll cowardly write it under a throwaway account. Josh, I have a ton of respect for your work just btw. I can't help but see a headline like this and think "Okay the license argument has to be the top comment already".
To me this whole thing is like Pandora's box, and it will not in any way be put back into the box. In the long run, isn't arguing about the code it generates and how it generates it mostly tilting at windmills? I've already met new / junior programmers that have used Copilot and ChatGPT to help them see how to approach certain problems, or to get better framing for what they couldn't quite put into the most accurate words to google.
I too would prefer these tools embody the ideal: no license violation, perfect citation of where the archetypes of the code came from. I've commented here today (amongst some great FOSS software engineers) to see if a genuine, respectful conversation can be had about how, just like torrents, this one isn't going to be put back in the box no matter how many legal precedents attempt (or succeed) in cutting off heads of the hydra. Its utility seems like it will steamroll any attempts to stop or slow it down.
Am I wrong? Is it a fool's errand to ask?
> Its utility seems like it will steamroll any attempts to stop or slow it down.
What? I don't see any utility outside of education and even there it's pretty sketchy.
For business, legal compliance is not a joke and instantly shuts it down. The only businesses willing to use ChatGPT for generating code would be naive young startups who don't realize some assembly is still required and the instructions are missing no matter how much they query the bot. That's called expertise (which they don't yet have). It's not good enough to just write the code. Someone has to comprehend it so they can tweak it as needed. At some point the tweaks will become unwieldy and require actual software engineering that the bot doesn't know how to do (transform from one design pattern to another and know which to use). More power to them if they can cobble something together and then succeed at maintaining it. By the time they're through they'll have pulled off so many miracles that they won't need the bot anymore and become experts. That's quite the trial by fire, but hey everyone has to find their way!
I'm not saying "put it back in the box", I'm saying fix it to actually track Open Source licenses and provide attributions.
I'd have no objections to a tool that generated suggestions that came with attributions and license metadata, ready to insert into your project's file for third-party licenses. AI code suggestions are impressive.
I have objections to a tool that generates derived works from code without respecting the licenses of that code. For permissively licensed Open Source code, including that code without attribution deprives authors of their due credit (said credit often being how people get employment or funding). For copyleft Open Source code, including that code without using a compatible license violates the conditions upon which people made that code available for others to build upon and share. For proprietary code, including that code at all incurs legal risks.
I can understand if people don't agree that it's copyright infringement, and will use these tools on that basis. However, rolling over on issues you're passionate about because they're difficult to address? Well, if everyone was like that, nothing would ever change.
Agreed. Copyright is not a fundamental law of physics; it's something we invented to help incentivize creation. The moment AI tools are shown to spur creation, they become more useful than copyright, so society will simply rewrite the laws to adapt.
As an individual, do you think you'll come out the other end of this with more freedom than under current copyright restrictions? And up against bigger players than yourself?
OpenAI/GPT products at MS are making the exact same bet.
Yeah we were told not to use it by the lawyers at work and have an official policy against using it. Not having that would open us up for liability if we’re sued as there’s no defence that what we did was clean room if we admitted using it.
We’ll hang back until other companies have litigated their way to some legislation around it.
Same here. Told to steer clear of this (in fairness, not only Copilot, also ChatGPT and the like) until somebody else pipe-cleans this in the courts.
No matter what is said, there are no license guarantees on the generated code, as you don’t know the exact provenance, so it seems only sensible to be on the safe side.
The lawyers on my last job were terrified to know that we store tracking information on website visitors computers that is used to track them by third party corporations.
Yes, I can imagine. It’s all fun and games until you get something that sticks on generally available media/press for example Cambridge Analytica sort of stuff.
I don't understand why they aren't tagging data with license information and allowing users to use models that don't include certain licenses - seems like it would be the middle ground given the stance they've taken; like, "we don't think it's a problem, but if this makes you feel better you can use these other models that specifically don't train on gpl code, or whatever"
I would prefer to see full license attributions included in generated responses, though. Something that then also wouldn't be that difficult to generate a licenses file from?
Amazon's CodeWhisperer has a "reference tracker" that tells you the license of training data code if the generated response is within some similarity threshold, but that's still not good enough imo.
> I would prefer to see full license attributions included in generated responses, though. Something that then also wouldn't be that difficult to generate a licenses file from?
Exactly. By all means build tools like this, but build them to actually comply with Open Source licenses. Provide a list of the licenses you don't mind copying from, and get back attributions with your suggestions.
Suppose Copilot offered some pure-MIT licensed flavor.
Copilot could comply with MIT licenses by just outputting an MIT license with ALL the authors of code used in training.
That'd be a valid solution, if impractical. I doubt that people would be willing to copy hundreds of thousands of license notices into their project.
One perspective is that those authors actually contributed to the end result.
But sure, disk size could be a problem.
> One perspective is that those authors actually contributed to the end result.
They absolutely did, yes. The approach you're suggesting would work from a legal perspective, but the size might pose practical problems.
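To make the practical problem concrete, here is a minimal sketch of how such a bulk attribution file could be generated. The manifest format and names are hypothetical; the point is that identical MIT boilerplate can at least be deduplicated, emitting each license text once per group of copyright holders:

```python
from collections import defaultdict

def build_attribution_file(entries):
    """entries: iterable of (copyright_line, license_text) pairs,
    one per training repository a suggestion could derive from.

    Groups all copyright holders that share byte-identical license
    text, so the MIT boilerplate is emitted once per group rather
    than once per author. The file is still huge, just less so.
    """
    grouped = defaultdict(set)
    for copyright_line, license_text in entries:
        grouped[license_text.strip()].add(copyright_line)

    sections = []
    for license_text, holders in grouped.items():
        sections.append("\n".join(sorted(holders)) + "\n\n" + license_text)
    return "\n\n---\n\n".join(sections)

# Hypothetical entries; a real manifest would have hundreds of thousands.
entries = [
    ("Copyright (c) 2019 Alice Example", "MIT License: Permission is hereby granted..."),
    ("Copyright (c) 2021 Bob Example", "MIT License: Permission is hereby granted..."),
]
print(build_attribution_file(entries))
```

Even deduplicated, a file listing every author whose code was in the training set would run to hundreds of thousands of lines, which is the size problem noted above.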
> Amazon's CodeWhisperer has a "reference tracker" that tells you the license of training data code if the generated response is within some similarity threshold, but that's still not good enough imo.
I don't think it's possible to do better than that with this technology.
Like, I probably don't understand this in the right way, but I could have sworn we had the ability to probe the latent space on models like these and make mappings based on it? Or was that only for diffusers?
Sure, you could build an index of attribution <> latent space coords, but it would not be clear whether a generated document near several index entries would require compliance.
I guess this is where the threshold comes from. Choose a generous margin and over-attribute rather than under-attribute.
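Purely as an illustration of that over-attribution idea (the embeddings, margin, and attribution strings here are all made up; real latent spaces are far higher-dimensional, and choosing the threshold is the hard part):

```python
import numpy as np

def attributions_within_margin(generated_vec, index, margin=0.5):
    """Return the attribution of every indexed training snippet whose
    embedding lies within `margin` of the generated snippet's embedding.
    A generous margin errs toward over-attribution."""
    return [attribution
            for emb, attribution in index
            if np.linalg.norm(generated_vec - emb) <= margin]

# Hypothetical 2-D "latent space" index of training snippets.
index = [
    (np.array([0.1, 0.2]), "repoA (MIT, Alice)"),
    (np.array([0.9, 0.9]), "repoB (GPLv3, Bob)"),
]
print(attributions_within_margin(np.array([0.15, 0.25]), index))
# -> ['repoA (MIT, Alice)']
```

Whether a court would accept "nothing within the margin" as evidence of non-derivation is, of course, an open question.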
You make it sound like Copilot is just copy-pasting something from a single repo. The code Copilot generates for me is extremely specific to my application. It understands the context, my code style and what I'm trying to do.
The result looks like my own code and is utilizing the already existing parts of my application. The code it writes for me solves problems that you cannot find a standard solution for anywhere and is definitely not something that could be attributed.
How Copilot is trained is an issue but answering the question "where did this code come from and what license is it under" would be impossible.
> You make it sound like Copilot is just copy-pasting something from a single repo.
Not at all. I'm saying it's derived from large amounts of code, without respecting the licenses on that code.
> How Copilot is trained is an issue but answering the question "where did this code come from and what license is it under" would be impossible.
Then it shouldn't exist outside of demos of what could exist in the future if the showstopper legal problem gets solved. Let's get people treating that constraint as business-critical and start coming up with clever solutions, and see how long "impossible" lasts.
How is training on public source code a legal problem? Can you provide some links for that claim?
What a bunch of pure fear mongering nonsense. The code that is produced by Copilot is indistinguishable from the rest of the code in the code base. Get over yourself.
I'd like to see people try to cite the sources for the code they wrote. It's highly improbable that without looking at anyone else's work in their lives, they would have created anything remotely similar to what they produced.
I open source my code specifically so that it can be re-used, ideally even for cases like this. To me, software freedom is the ability for it to be used for effectively any useful purpose, so that others don't have to do the same work again.
I understand that some folks don't believe the same thing, and use copyleft licenses so that their code can't be re-used in a closed way, and that's fair. Github shouldn't be training their product on copyleft licenses.
It's fair to call out its misuse of certain licenses, but "the equivalent of money laundering for violation of Open Source licenses" is simply inaccurate, as many licenses allow this type of re-use explicitly.
Do you license your code under a license that doesn't require any form of attribution or preservation of copyright notices? (Even permissive Open Source licenses typically do require that.)
If you do, then by all means they're welcome to use it without attribution or preservation of copyright notices, per the terms of the license you used.
But for all the Open Source code, even permissive Open Source code, that does require attribution or preservation of copyright notices, that's still a license violation. People don't often think of permissive Open Source licenses as something that can be violated, but they absolutely can be.
Indeed, that is why I don't use it either.
Double-checking whether the generated part is a verbatim copy negates the speed advantage.
Possible infringements from mere similarity are even harder to search for.
Yes, thank you this is exactly what I was looking out for in the announcement.
Was looking for a way to instruct Copilot to abide by the following rules:
- Only use Apache v2, MIT or BSD licensed work for its recommendations. (Or a specific license set)
- Only use code trained on public repositories.
- Provide code attributions of the source code where the recommendations originate from.
I'm not sure if the last point is possible given these GPT type architectures but it would really help during code reviews.
> Only use Apache v2, MIT or BSD licensed
Even if you use code under these licenses, you are still supposed to credit the authors by reproducing the license. So you need to know where it came from. Do you credit all the software the model was trained on?
> Do you credit all the software the model was trained on?
That's a brilliant idea: everyone can just copy and paste the exact same attribution file and be done with it.
"Just" copy paste a multi GB file with endless lines of authors? Brilliant? I might be missing something here.
We're not using any ML based machine generated source code.
The remaining use of OSS code has attributions.
> This tool remains the equivalent of money laundering for violation of Open Source licenses
That's what a good chunk of people do anyway at work. No one really cares nor will care. We were already moving in that direction anyway this will just accelerate it.
if you want an actual enterprise solution with in-customer-tenant/on-prem hosting, check out Codeium (https://www.codeium.com/enterprise)
disclaimer: i'm from the Codeium team. but really, we will even ship you a physical box if that level of data security is important to you
I tried using the playground code completion to ask it to write a script for pyautocad that colors in a grid, but it completed with a different library, pyautogui. Even after writing "import pyautocad" myself, the function it completed was for pyautogui.
I'm sure it's prob still very useful for people who care about the privacy tradeoff, but I've had more success with ChatGPT
Thanks for the feedback! Yeah the playground is a bit limited as it is for demo purposes.
We are fans of ChatGPT and think that ChatGPT is pretty complementary to tools like Copilot and Codeium. ChatGPT is helpful for longer form exploratory questions from natural language while Codeium in its current form is great to accelerate your coding.
No worries haha, not at all easy nor cheap to build these things. I'm sure it's useful in most cases.
you’re going to need to ship that emacs extension if you want to keep advertising on HN :-)
haha we'll have it out polished in the next week or so :P
That plus lisp / clojure on the roadmap is exciting :) I'll definitely give it a try when it comes out. The thing that slows me down is that I actually don't like to get visual feedback while I'm typing. I'm curious if codeium has a good way to compromise there
with copilot.el, i flip it on only when i’m doing something repetitious where i know it’s likely to give me a useful suggestion. then i flip it off. works well for me
I looked around your web site and I thought about trying out your product, but one feeling never stopped nagging me: even though I'm not in a large organization, I need absolute assurance that the AI is trained only on our code and permissively licensed open source software (like MIT or BSD.) Also, whenever it uses permissively licensed code, I need a complete list of everything it based its work upon so I can declare the relevant licenses.
Without that, I can't even entertain the idea of using an AI code tool for anything but private projects that I don't share with anyone.
Exactly this. Also, even the "our code" case if not done carefully may copy code from one or more internal projects that had in turn copied with attribution from an Open Source project, and fail to propagate the attribution.
What model do you use? CodeGen?
our own!
Are there plans to support the full Visual Studio IDE?
Edit: Also, Notepad++ support would be awesome
VS is on the roadmap, Notepad++ isn't something currently on the roadmap, but we'd be totally open if someone wants to write an open source plugin for it, just like we did for Vim/Neovim (https://github.com/Exafunction/codeium.vim) and are planning on doing with Emacs!
An extra $9/mo for:
* Simple license management
* Organization-wide policy management
* Industry-leading privacy
* Corporate proxy support
Wow. Who’s going to pay a 90% premium for these features?
Edit: OK seems like different marketing pages have different features. The list above comes from https://github.com/features/copilot/. Still seems like a very steep increase over the base. And I cannot believe there are only 400ish companies using copilot.
The price difference is mostly irrelevant to large corporations. They just need that license management.
9 bucks per developer isn't that much. Getting a developer to be 5 percent faster is a huge gain for just 9 dollars.
Sure, but the harder part is measuring whether the developer is actually 5% faster. Otherwise you can make the same case for every $10/mo subscription service in the world, and so we should all be operating at infinite efficiency.
> Sure, but the harder part is measuring whether the developer is actually 5% faster. Otherwise you can make the same case for every $10/mo subscription service in the world, and so we should all be operating at infinite efficiency.
After n such iterations, the developer gets 100*(1-0.95^n)% faster. So, after some such $10/month purchases, the developer gets so fast that buying another improvement yields diminishing returns.
Certainly for SV salaries. But $19/month per developer in addition to already existing cost can make a difference for regions with lower salaries.
I agree. I can see these AI assistants becoming a game changer and eventually a requirement to keeping up, but the costs will be prohibitive for engineers in many regions.
If you want to make a conversation awkward, ask your account team about the indemnification for the AI’s potential copyright violations.
"Notwithstanding any other language in your Agreement, GitHub will defend you against any claim by an unaffiliated third-party that your use of GitHub Copilot misappropriated a trade secret or directly infringes a patent, copyright, trademark, or other intellectual property right of a third party, up to the greater of $500,000.00 USD or the total amount paid to GitHub for the use of GitHub Copilot during the 12 months preceding the claim."
From here: https://github.com/customer-terms/github-copilot-product-spe...
the full segment also includes disclaimers on this so it isn't quite as cut and dry:
> 4. Defense of Third Party Claims.
and as I understand it:

"Notwithstanding any other language in your Agreement, GitHub will defend you against any claim by an unaffiliated third-party that your use of GitHub Copilot misappropriated a trade secret or directly infringes a patent, copyright, trademark, or other intellectual property right of a third party, up to the greater of $500,000.00 USD or the total amount paid to GitHub for the use of GitHub Copilot during the 12 months preceding the claim. GitHub's defense obligations do not apply if (i) the claim is based on Code that differs from a Suggestion provided by GitHub Copilot, (ii) you fail to follow reasonable software development review practices designed to prevent the intentional or inadvertent use of Code in a way that may violate the intellectual property or other rights of a third party, or (iii) you have not enabled all filtering features available in GitHub Copilot."

- (i) means you can't modify Copilot suggestions without losing this protection,
- (ii) isn't actually defined; this might mean they can use it as a crutch to avoid providing protection,
- (iii) means you aren't protected if you fail to enable settings that are not on by default but in theory allow you to avoid generations that reference "radioactive" licenses like GPL.

Still sounds pretty friendly to me. Obviously once you modify the code GitHub cannot be liable anymore for the lines edited. Otherwise you could push all kinds of infringement on them just because Copilot was active.
Reasonable development practices certainly don't include scanning every line for copyright, but should catch the most obvious cases.
Overall I'm surprised they're willing to bet $500k per client on the legality of Copilot.
The word “patent” is missing and $500k is nothing. And how are you going to prove that you didn’t modify whatever was suggested by the AI?
1. The word "patent" is there
2. By rerunning the network for the same input
People with corporate compliance departments.
I think the privacy part would be a big draw for some organizations, even if I do not know what this really means or what it implies for the other plans.
Will Github indemnify users against potential copyright lawsuits related to the code it regurgitates?
I came here to post exactly this. My team has budgeted for CoPilot, but we’ll only pull the trigger if copyright liability is resolved. I think it’s hilarious Microsoft is doing this, of all companies; it’s like they’ve forgotten that big businesses are run by risk-averse general counsel offices (often for good reason).
I can’t even imagine the hilarity that would ensue if I went to my GC right now to ask permission with so much in limbo; it’d be suicide by conference call.
It doesn't regurgitate any code unless you bait it really hard. That's not how you use it. The only code it normally regurgitates is your own when it tries to autocomplete your boilerplate.
Is Copilot HIPAA compliant? It sends data to the cloud, so if you paste PHI…
Not an answer to your overall question of compliance, but to the specific point:
> Copilot for Business does not retain any telemetry or Code Snippets Data.
https://docs.github.com/en/copilot/configuring-github-copilo...
The key word being "retain".
It's probably still sent to their servers.
And if that page changes without notifying you?
... Unless they are selling the compliance as a feature, be careful.
As long as it's part of the agreement, they would need to at least notify you if they're planning to change that.
Source: Ex Hospital IT.
I wouldn't risk it. It is too easy to write the wrong prompts and leak PHI.
ChatGPT:
"Write me a parser for this HL7 message..."
Copilot:
"Using this example message please write a parser for it..."
Yeah... If it was compliant, people would write those in a heartbeat.
Unless sold as HIPAA compliant, and the conditions of use for that compliance are known... don't trust it, for SAAS.
This is stuff covered in your yearly HIPAA briefing folks.
Realizing that many people still fundamentally misunderstand HIPAA and PHI:
1. First, you really only need to worry about the specifics of HIPAA if you are a "covered entity" under the law (primarily a hospital or other healthcare provider, or a health insurer), or if you have signed a BAA with another company (more on that below). There are all sorts of misunderstandings that you can't, for example, say something like "Jane couldn't make the meeting because she's out with the flu" at a company - that's not how it works. Unless you're a covered entity, you're under no obligation to keep PHI private under HIPAA.
2. If you do work at a HIPAA covered entity, it usually is made explicitly clear where patient data is or is not allowed. Even if GitHub Copilot were "HIPAA Compliant", unless they signed a HIPAA BAA (business associate agreement) with your company, it's still not OK to send them any PHI.
Point being, there are plenty of reasons to be worried about customer privacy and data security, but people like to bring up HIPAA rules in lots of situations where they simply don't apply.
Why on earth would you ever put any PHI information in source control?
I would not. I avoid pasting PHI into source controlled files altogether.
My prediction that they'd offer on-prem hosting of the models (for businesses with IP / secrecy concerns) turns out to be wrong! Seems like a weird choice, but maybe their hands are tied by OpenAI not wanting to lose control over the models?
More likely their hands are tied by not many businesses wanting to pay for a DGX A100 to run the models!
So the code suggestion comes from the same data trained for the public version, which could include GPL code or have other issues?
I doubt any company would use this in their production code. Internal tools, maybe.
Microsoft itself uses and suggests this internally for production code.
This version specifically removes snippets that appear in open source code.
So instead of Open Source license violations for which it'd theoretically be possible to provide information to comply with those licenses, it instead only violates proprietary software licenses for which you shouldn't be using the code at all?
was it trained on proprietary software?
So is the new Codex model not available for individuals? That's what they seem to imply in the blog announcement, but that's not a difference they highlight on their landing page between the two plans.
What's the benefit of using copilot over a package manager? Both help you reuse code that's already been written, but using packages gives you updates, explicit dependency tracking, documentation, etc.
I don't think you've ever tried copilot.
It's amazing at specific things, like being a context-aware boilerplate generator or doing the scaffolding of an algorithm from a comment describing it.
That's really different than using libraries.
it's not about reusing code, it's about generating code, i.e., autocomplete, except the generator tries to recognize what you are trying to write (or are most probably writing) and suggests it. Great for boilerplate-type code.
Currently using Codeium after they had an HN post not so long ago. Seems not too bad, though its C# code generation is pretty poor; apparently there are supposed to be improvements to the model soon.
There was also Kite, which was shut down. Microsoft can maybe keep this as a vanity product, but seriously: I'm curious what kinds of teams or developers are using Copilot, and how much more productive do you feel?
I love it. I love that it learns my codebase, my style, etc... and truly does feel like a second brain at times.
I am using it within JS/Vue and Python and really enjoy it a lot. You can write a simple comment like "upsert the object in the store, and if the item is incomplete recursively call a refresh fn until it is complete" and it will "magically" do the rest - down to understanding where the objects I am referring to are in my store, the attribute used to decide this ('completed_at'), down to the correct syntax for updating the data in a way that plays nice with vue reactivity.
It's also a stellar autocompleter in Python-land. I have been using more and more type annotations in my codebases, but even without that, it will usually guess the right attribute or function name.
I also dig the way it will automatically write a docstring for a function. It can sometimes be a great debugging method since I can just have it comment an undocumented fn to quickly glean what it is doing. I'll circle back to enhance the docstring usually too so it is less cookie cutter.
For writing blog posts it is really neat too, because it will help me write functions to illustrate certain points based on what I am trying to teach in the post. Most of the time I can come up with the most succinct and relevant example, but sometimes I cannot and this tool does a good job helping there.
It's not perfect, but the fact that I can type a few words, hit tab, and then correct any mistakes is really a magic experience sometimes.
When it seems to KNOW where you're going and automatically generates a surprise suggestion that's exactly what I intended, it gets real freaky and fun.
I’m curious if enterprise customers would have license concerns about the code produced from using this.
Have any big companies set policies on employees using these kind of tools? Do they allow them?
One data point for you.
My company (big UK-based tech company) had an "all employees" sort of e-mail saying the use of Copilot, ChatGPT et al was not allowed for anything work-related or on company equipment, due to the unclear licensing of the generated code.
I find many of these blanket rules silly, but in this case I find it is sensible to wait before polluting our products with this autogenerated code.
Fwiw Microsoft itself is betting on and suggesting devs use it for Microsoft production code.
Yeah I'll pay for something that is just slightly better/faster than writing it myself but will practically speaking steal my code and give it to others that pay for the same product. /s
I'd say no thanks. I think programmers using Copilot is paying for something that'll hurt them in the long run for a tiny benefit in the short run.
I don't trust Microsoft, and neither should you.
Honestly, I don't think the code I write is so special that no one should ever see it. The whole product might be worth protecting, that's why the codebase isn't public. But the individual files/functions are not special in any way. And that'll be the case for 99%+ of all code.
Their training data comes from repos, not from you using the tool.
Well yes, but I use repos just as any other dev does. Most of my own projects I have moved to GitLab, but some projects still exist on GitHub.
How much better is the new model?