Hello GitHub Community 👋
We’re sharing an update to our Privacy Statement and Terms of Service about how we use personal data to develop, improve, and secure GitHub products and services, including training AI and machine learning models that power GitHub Copilot.
For the full announcement and complete details, please visit the blog post: Updates to GitHub Copilot interaction data usage policy.
Frequently Asked Questions
Below is an FAQ covering what’s changing, who’s affected, what data may be used (when enabled), safeguards, and opt-out instructions.
Why is GitHub making this change and when will it go into effect?
- We’re making this change to improve model performance and deliver a better user experience. As Copilot usage continues to increase dramatically, real-world data will help our models cover the increasing number of scenarios they’re now being used for. The change goes into effect on April 24. Users received 30 days advance notice and can opt out at any time.
Why are you only using data from individuals while excluding businesses and enterprises?
- Our agreements with Business and Enterprise customers prohibit using their Copilot interaction data for model training, and we honor those commitments. Individual users on Free, Pro, and Pro+ plans have control over their data and can opt out at any time.
Are students and teachers who access Copilot Pro for free affected by this update?
- No. Under our policy, students and teachers who access Copilot Pro for free are not affected.
What data are you collecting?
- When an individual user has this setting enabled, the interaction data we may collect includes:
- Outputs accepted or modified by the user
- Inputs sent to GitHub Copilot, including code snippets shown to the model
- Code context surrounding the user’s cursor position
- Comments and documentation that the user wrote
- File names, repository structure, and navigation patterns
- Interactions with Copilot features including Chat and inline suggestions
What can individuals do if they don’t want their inputs, outputs, or code snippets used for model training?
- They can opt out via their GitHub Copilot account settings at any time.
Who will have access to this data outside of GitHub?
- GitHub and Microsoft personnel working on AI model development may access interaction data collected for training. We may also engage service providers to assist with model training on our behalf, subject to contractual obligations to use the data only for providing services to GitHub. Third-party model providers do not receive this data for their own model training. We do not sell user data. For more information on how we handle your data, see our Privacy Statement.
What is a "GitHub affiliate" and who does that include?
- A GitHub affiliate is a company that is part of the same corporate family as GitHub — meaning any entity that controls, is controlled by, or is under common control with GitHub. Today, that primarily means Microsoft Corporation and its subsidiaries. You can review Microsoft's privacy practices in the Microsoft Privacy Statement.
Companies that provide AI models or other services to GitHub — such as model providers, cloud hosting vendors, and other service providers — are not affiliates. They are service providers or subprocessors, and they are bound by contractual obligations that restrict how they can use your data. Specifically, service providers and subprocessors may only process your data on GitHub's behalf and at GitHub's direction — they do not receive your data for their own independent purposes, including their own model training. You can see GitHub's current list of subprocessors here.
In short: affiliates are part of our corporate family. Service providers work for us under contract. These are distinct relationships with different rights and obligations.
How do you protect sensitive data?
- We’ve implemented multiple layers of protection for sensitive data including automated filtering designed to detect and remove API keys, passwords, tokens, and personally identifiable information. We also provide user controls for which repositories Copilot can access. We limit data access to authorized personnel working directly on model improvement and safety, and log and audit access.
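To make the idea of automated filtering concrete, here is a minimal, purely illustrative sketch of pattern-based secret redaction. The patterns, function name, and placeholder below are assumptions for the example only — GitHub's actual filters are not public and are far more extensive than a few regexes:

```python
import re

# Hypothetical patterns illustrating pattern-based secret detection.
# These are examples only, not GitHub's actual filtering rules.
SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),   # shape of a GitHub personal access token
    re.compile(r"AKIA[0-9A-Z]{16}"),      # shape of an AWS access key ID
    # "password = ...", "api_key: ...", "token=..." style assignments
    re.compile(r"(?i)(password|api[_-]?key|token)\s*[:=]\s*\S+"),
]

def redact_secrets(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace any span matching a known secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

In a real pipeline, a filter like this would run on collected snippets before they ever reach a training set, alongside dedicated PII detectors and human-review controls.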
Do I need to do anything if I previously disabled the setting titled "Enabling or disabling prompt and suggestion collection"?
- No, if you previously disabled that setting your preference will carry over and your interaction data will not be used for model training.
Will code stored in private repositories be used for model training?
- If a Copilot user has enabled model training on their interaction data, code snippets from private repositories can be collected and used for model training while the user is actively using Copilot in that repository. This applies only to code snippets and outputs that are actively sent to Copilot during a user's session — we do not access or pull code from private repositories at rest. To prevent code snippets shared with Copilot during active sessions from being used for model training, users can disable model training in their GitHub account settings.
What safeguards prevent enterprise code from being used for model training when an individual uses a personal Copilot license while working in their employer’s codebase?
- We do not train on the contents of any paid organization’s repos, regardless of whether a user is working in that repo with a Copilot Free, Pro, or Pro+ subscription. If a user’s GitHub account is a member of, or an outside collaborator on, a paid organization, we exclude their interaction data from model training.
Other companies aren’t using user data to train models. Why is GitHub?
- We’re giving users the option of allowing us to train AI models with their interaction data because internal testing shows it will improve the accuracy of Copilot’s suggestions for all users. We can only speak on behalf of GitHub, but it’s worth noting this is similar to the approach that Microsoft, Anthropic, and JetBrains are taking.
You’re collecting code snippets, prompt text, AI responses, and detailed interaction patterns. How is this not giving you my entire codebase?
- We’re collecting data only from your interactions with Copilot—we are not pulling any data from your codebase at rest.
Security researchers found that Copilot Chat could expose private code from repositories that were temporarily public then set back to private. Why should we trust your guarantees about protecting our data?
- We take the security of your data seriously and continue to invest in safeguards. The scenario described involved third-party collection of temporarily public data, which is outside GitHub's control. For data collected under this new program, we apply access controls, audit logging, and automated filtering to protect your code.
I selected Copilot because GitHub said it didn’t train on user data. This feels like a bait and switch.
- As Copilot usage continues to increase dramatically, we’ve identified a need for real-world data to help our models cover the increasing number of scenarios they’re now being used for. We are committed to giving developers control over whether their interaction data is used for training and will always be transparent about our use of this data.
If this data collection is truly safe, why don’t you enable it by default for enterprise customers as well?
- Our agreements with Business and Enterprise customers prohibit using their Copilot interaction data for model training, and we honor those commitments. That said, we use Microsoft interaction data for model training, with plans to incorporate GitHub interaction data, and are very confident in our ability to protect data used for model training.
If your AI needs real user code to be competitive, isn’t that an admission that your advantage comes from exploiting your existing user base rather than better research?
- 26 million developers are now using GitHub Copilot, representing a huge range of use cases and needs. We want to deliver an exceptional experience for every user, which is why we’re giving users the option of allowing us to use their interaction data to improve Copilot’s ability to perform a more diverse range of coding tasks.
If training on user code is so obviously beneficial, why does every company try to hide it behind toggles, footnotes, or tiered pricing instead of proudly asking for consent?
- We can only speak for GitHub, and we are being as transparent as we can be about this change. We are notifying affected users directly, we published a blog post, and we have in-product notifications alerting users of the upcoming changes and directing them to update their user settings per their preferences.
I noticed the private repository access language was removed from the Privacy Statement. Where did it go?
- We moved the description of when GitHub personnel may access private repository content from the Privacy Statement to Section E (Private Repositories) of the Terms of Service. This is a consolidation, not a change. The same access limitations that have always applied continue to apply. See the changelog for more details: Updates to our Privacy Statement and Terms of Service: How we use your data.
Join the discussion
Have additional questions or feedback? Please share them in the comments below.