UK Biobank holds genetic, health, and lifestyle data on half a million British volunteers. It has given 20,000 researchers around the world access under strict agreements that prohibit sharing data further. And yet, researchers are repeatedly uploading participant data by mistake to public GitHub repositories.
According to The Guardian, UK Biobank has been closely monitoring the situation, first contacting researchers directly and then issuing takedown notices when repositories are not deleted. Some of the repositories belong to researchers and students to whom Biobank never gave data in the first place.
This tracker monitors the 110 notices filed so far, targeting 197 code repositories by 170 developers across the world, using public data from GitHub's DMCA archive.
From only two pieces of information (approximate date of birth and date of a single major surgery), The Guardian was able to re-identify a volunteer in one of the exposed datasets. For BMJ, Jess Morley and I argue that UK Biobank is harming participants by dismissing re-identification risks while now advising them to limit what they share online. Institutions like Biobank must demonstrate humility, a commitment to listening to privacy experts, and a willingness to learn.
Built by Luc Rocher, Oxford Internet Institute, University of Oxford
What is UK Biobank trying to take down
UK Biobank uses copyright takedown notices, a mechanism often associated with removing pirated software and stolen code, to remove health data from GitHub. The UK has no equivalent of the DMCA for privacy breaches that would compel a platform to act so quickly.
Looking at the takedown notices, we often see specific files being targeted rather than entire repositories, possibly to substantiate the copyright infringement claim that a takedown notice requires. Nearly half of the targeted files are Jupyter or R notebooks, which can contain a few rows of data. A quarter are genetic and genomic data files (PLINK, BOLT-LMM, BGEN) that directly encode participant genotypes or association results. Tabular datasets (CSV, TSV, Excel, and serialised R objects) account for another large share and could contain phenotype or health records. The remainder includes analysis scripts, documentation, and compressed archives.
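A breakdown like the one above can be approximated by grouping targeted file paths by extension. The sketch below is only an illustration: the extension-to-category mapping and the function names are assumptions, not the classification actually used for this page, and some formats (BOLT-LMM output, for example) have no single standard extension and would need manual review.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to category (an assumption for
# illustration; the real classification may differ or be done by hand).
CATEGORY_BY_EXTENSION = {
    ".ipynb": "notebook",
    ".rmd": "notebook",
    ".bed": "genetic",
    ".bim": "genetic",
    ".fam": "genetic",
    ".bgen": "genetic",
    ".csv": "tabular",
    ".tsv": "tabular",
    ".xlsx": "tabular",
    ".rds": "tabular",
    ".rdata": "tabular",
    ".zip": "archive",
    ".gz": "archive",
    ".tar": "archive",
}


def categorise(path):
    """Return a coarse category for a file path, based on its last extension."""
    suffix = PurePosixPath(path).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(suffix, "other")


def category_counts(paths):
    """Count how many paths fall into each category."""
    return Counter(categorise(p) for p in paths)
```

Only the final extension is inspected, so a file such as data.csv.gz is counted as an archive rather than as a tabular dataset.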
Timeline of takedown notices
The first takedown notice was filed in July 2025. Since then, the pace has been steady, with a total of 110 requests to GitHub. Interestingly, the requests stopped in January, February, and most of March 2026. It is hard to believe that no researcher mistakenly uploaded UK Biobank data during these months. The notices resumed at the end of March, just after The Guardian's investigations revealed the ongoing data exposure and the ineffectiveness of takedowns.
Where in the world
Developers targeted by UK Biobank's takedown notices are based in at least 14 countries. The true number is likely higher: of the 170 developers identified in the notices, only 76 list a location on their GitHub profile. Most appear to be from the United States and China.
- 24 United States
- 22 China
- 7 United Kingdom
- 5 Germany
- 4 Hong Kong
- 4 Australia
- 3 Spain
- 1 South Korea
- 1 Greece
- 1 Qatar
- 1 United Arab Emirates
- 1 Switzerland
- 1 India
- 1 Netherlands
Methodology
To build this webpage, I used data from the github/dmca repository, where GitHub publishes the full text of every DMCA takedown notice it receives. When a rights holder asks GitHub to remove content that infringes their copyright, the notice is posted publicly as a Markdown file in this repository. According to The Guardian, UK Biobank has used this process to request the removal of files or repositories that contain (or that it believes contain) participant data covered by its data access agreements.
To identify UK Biobank-related notices, I match filenames containing the slug "uk-biobank" (the convention GitHub uses when naming notice files). Just in case, I also search the full text of every other notice file for the phrases "UK Biobank" or "UKBiobank" (case-insensitive) to catch notices filed under different slugs, such as those submitted on behalf of UK Biobank. From each matching notice, I extract the filing date (parsed from the filename, which follows GitHub's YYYY-MM-DD-slug.md convention) and all GitHub repository URLs mentioned in the notice body. URLs pointing to GitHub's own infrastructure (e.g. github.com/contact or github.com/site) are excluded.
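The matching and extraction steps described above could look roughly like the following. This is a minimal sketch rather than the actual collection script: the regular expressions, the function names, and the exclusion set (which lists only the two examples given above) are assumptions.

```python
import re
from datetime import date

# Notice filenames follow the YYYY-MM-DD-slug.md convention.
FILENAME_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)\.md$")
# Matches "UK Biobank" and "UKBiobank", ignoring case.
PHRASE_RE = re.compile(r"uk ?biobank", re.IGNORECASE)
# Captures the owner and, when present, the repository of a github.com URL.
REPO_URL_RE = re.compile(
    r"https?://github\.com/([A-Za-z0-9-]+)(?:/([A-Za-z0-9._-]+))?"
)
# First path segments that point to GitHub's own pages (assumed, partial list).
EXCLUDED_OWNERS = {"contact", "site"}


def is_biobank_notice(filename, text):
    """True if the slug or the notice body mentions UK Biobank."""
    return "uk-biobank" in filename.lower() or PHRASE_RE.search(text) is not None


def filing_date(filename):
    """Parse the filing date from the filename, or return None if it does not fit."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups()[:3])
    return date(year, month, day)


def extract_repos(text):
    """Return the sorted, de-duplicated owner/repo pairs mentioned in a notice."""
    repos = set()
    for owner, repo in REPO_URL_RE.findall(text):
        repo = repo.rstrip(".")  # drop a sentence-ending full stop
        if not repo or owner.lower() in EXCLUDED_OWNERS:
            continue
        repos.add(f"{owner}/{repo}")
    return sorted(repos)
```

URLs that point to a single file inside a repository are reduced to their owner/repo pair, so a notice listing several files from one repository counts that repository once.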
For each unique GitHub username found in the notices, I query the GitHub REST API (GET /users/{username}) to retrieve the user's public profile, specifically the self-reported location field. This is a free-text string that users enter voluntarily. It may be a city, a country, a university name, or left blank entirely. Deleted accounts return a 404 and are excluded from further analysis.
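The profile lookup can be sketched with the Python standard library alone. The function names, the User-Agent string, and the injectable opener parameter are hypothetical choices made for this illustration. Unauthenticated requests to the GitHub API are rate-limited, so a real script would probably send a token as well.

```python
import json
import urllib.error
import urllib.request

API_ROOT = "https://api.github.com"


def fetch_profile(username, opener=urllib.request.urlopen):
    """Return the public profile as a dict, or None if the account returns 404.

    The opener argument is a hypothetical hook that lets tests replace the
    network call.
    """
    request = urllib.request.Request(
        f"{API_ROOT}/users/{username}",
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "dmca-tracker-sketch",  # placeholder name
        },
    )
    try:
        with opener(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # deleted or renamed account
            return None
        raise


def profile_location(profile):
    """Return the self-reported location string, or None if missing or blank."""
    if profile is None:
        return None
    location = (profile.get("location") or "").strip()
    return location or None
```

The location is returned as raw free text. Turning it into a country is a separate, manual step.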
I derive countries from the raw location strings by hand. When a user's GitHub profile does not include a location, I also determine their country by inspecting their GitHub profile and associated email address domains. This process is inherently imperfect: some locations are ambiguous (e.g. "Cambridge" could refer to the UK or the US), and many users do not provide any location at all. Of the 170 unique developers in the dataset, only 76 have a location that could be resolved to a country.
The data is regularly refreshed by re-running the collection script against the latest state of the github/dmca repository. This page does not make any claims about the content of the targeted repositories, including whether they contained actual participant data, derived datasets, analysis code, or just documentation. It reports only what is visible in the public DMCA notices filed by UK Biobank.
Further reading
The exposure of Biobank data on GitHub is the latest in a series of governance challenges for UK Biobank.
Mar 2026
Confidential health records exposed online — The Guardian
Investigation revealing that UK Biobank participant data had been uploaded to public GitHub repositories by researchers sharing their code. With a volunteer's consent, journalists successfully matched their record in an exposed dataset using only their month and year of birth and the date of a single major surgery.