IBM's Best Customer


IBM still makes magnetic tape. Not as a legacy product — as a growth business. Their tape storage division is the dominant player in a market growing at roughly 8% annually.1 The biggest customers are the hyperscalers: Microsoft confirmed tape use through their Azure CTO in 2018,2 Google's VP of Engineering called them "the world's biggest single consumer of magnetic tape cartridges" in 2011,3 and Meta confirmed tape infrastructure in an SEC filing in 2022.4 The LTO Consortium shipped 152.9 exabytes of compressed capacity in 2023, the highest ever.5

The reason is economics. For data you rarely access, tape costs one-quarter to one-eighth as much per terabyte as disk.1 It draws zero power when idle. IBM's current enterprise cartridge holds 50TB natively, 150TB compressed.6 Their Diamondback library fits nearly 62 petabytes in a standard rack. In the lab, IBM Research has demonstrated densities that would put 580TB on a single cartridge.7

This is why services like AWS Glacier exist. Cold data — old emails, closed accounts, years of transaction logs — sinks to tape and stays there effectively forever at near-zero marginal cost.

Follow that logic far enough and you get to a facility in Utah.

We built an archive we couldn't read. That just changed.


The Utah Data Center in Bluffdale was completed in 2014 at a cost of $1.5 billion and is operated by the NSA.8 The storage capacity is classified, but the building specs aren't. Unclassified Army Corps of Engineers documents describe 100,000 square feet of data hall across four rooms.9 Forbes analyzed the blueprints in 2013 and estimated 3-12 exabytes — at 2013 tape density.10 Current cartridges hold roughly 10x what was available then. The facility was built for 65 megawatts but initially loaded at 30, designed from the start to expand. In February 2025, the Army Corps published a draft environmental assessment for a campus expansion.11

Bluffdale isn't the only one. The NSA operates data centers at Fort Meade, in Texas, Georgia, Hawaii, and Colorado — millions of square feet of classified technical space collectively.12 And the footprint is still growing. In December 2024, the NSA's CIO said publicly: "We are about three, maybe four years into a journey of moving our capabilities into the Amazon cloud. We are in the process of bringing online new data centers for our work."13

That's a lot of storage. The natural question is whether it's enough to hold everything worth collecting, and it is. About 70-80% of global internet traffic is streaming video (Netflix, YouTube, TikTok), the same content served to millions of viewers.14 No intelligence value. Strip that out, plus software updates, gaming, and ad delivery, and the intelligence-relevant fraction (emails, metadata, messages, search queries, financial transactions, compressed voice) amounts to tens of exabytes per year. Over two decades, that adds up to somewhere between a few hundred exabytes and roughly two zettabytes. The IC's combined storage footprint, based on the facility specs above, appears to comfortably exceed that.
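The arithmetic behind those figures can be checked with a short back-of-envelope sketch. It uses only numbers quoted above, plus one labeled assumption: "tens of exabytes per year" is read here as a range of 10 to 90 EB per year. The variable names are illustrative.

```python
# Back-of-envelope check using figures quoted in the text.
EB_PER_ZB = 1000  # 1 zettabyte = 1000 exabytes

# Forbes's 2013 estimate for Bluffdale, in exabytes (low, high).
bluffdale_2013_eb = (3, 12)
density_gain = 10  # "roughly 10x" cartridge capacity since 2013, per the text

# Same floor space at current tape density.
bluffdale_today_eb = tuple(x * density_gain for x in bluffdale_2013_eb)

# ASSUMPTION: "tens of exabytes per year" is read as 10 to 90 EB/year.
annual_eb = (10, 90)
years = 20

cumulative_eb = tuple(x * years for x in annual_eb)
cumulative_zb = tuple(x / EB_PER_ZB for x in cumulative_eb)

print("Bluffdale at current density (EB):", bluffdale_today_eb)
print("Two-decade total (EB):", cumulative_eb)
print("Two-decade total (ZB):", cumulative_zb)
```

On these assumptions, Bluffdale alone works out to 30 to 120 EB against a two-decade total of 200 to 1,800 EB, so the comparison depends on the combined capacity of the other facilities, which is not public.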

So the infrastructure exists to store it. The legal framework exists to collect it. And the contents of the archive are, at this point, reasonably well documented.


In 2021, an FBI analyst ran a batch query on the communications of more than 19,000 donors to a congressional campaign. The analyst claimed foreign influence. DOJ reviewers found that 8 of the 19,000 had any connection to foreign intelligence.15 The campaign's identity is still classified.16 The other 18,992 people had their communications searched without a warrant, without probable cause, and without any way to find out it happened.

That search was legal. For five decades, one legal doctrine has made all of this possible. The third-party doctrine — from Smith v. Maryland (1979)17 and United States v. Miller (1976)18 — says that data you share with a company loses Fourth Amendment protection. Your carrier, your email provider, your bank: hand them your data and you've "assumed the risk" it gets disclosed to the government.

Section 702 of the Foreign Intelligence Surveillance Act lets the government collect communications of foreigners abroad, but inevitably sweeps in Americans' calls, texts, and emails in the process. The third-party doctrine is why warrantless searches of that data are legal. It's why the data broker market exists. It's why the FBI can query the archive without a warrant.

The Supreme Court started narrowing this in Carpenter v. United States (2018), ruling that historical cell-site location data is constitutionally protected even though a carrier holds it.19 The government's response wasn't to stop collecting. It was to buy from commercial data brokers what it could no longer compel.

In September 2025, FBI Director Kash Patel stood in the Oval Office as President Trump signed a national security presidential memorandum directing the FBI to investigate civil society organizations and their donors.20 Six months later, at a Senate Intelligence Committee hearing on March 18, 2026, Patel confirmed: "We do purchase commercially available information."21 The FBI disputed the characterization but didn't deny the practice. The political directive and the surveillance capability, announced by the same official, six months apart.

For scale: the data broker Gravy Analytics and its subsidiary Venntel, before an FTC enforcement action in late 2024, had been collecting over 17 billion location signals from approximately one billion mobile devices daily, with precision to one meter.22 The IC's own partially declassified report acknowledged it "acquires a significant amount of commercially available information" but couldn't determine how much or what it was doing with it.23

All of this has been reported. All of it has been debated. And it has been treated as largely theoretical, because of a practical constraint that no longer applies.


The NSA employs 30,000-40,000 people.24 Even if every one of them did nothing but read intercepted communications, they couldn't process a meaningful fraction of what's been collected. The volume of data exceeds human reading capacity by many orders of magnitude.
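A rough sketch of that mismatch, in Python. The headcount is the upper figure quoted above, and the 10 EB per year is the low end of the "tens of exabytes per year" estimate given earlier. The reading speed, working hours, and bytes-per-word values are illustrative assumptions, not sourced numbers.

```python
# How much text could the whole workforce read in a year?
ANALYSTS = 40_000            # upper headcount quoted in the text

# ASSUMPTIONS (illustrative, not sourced):
WORDS_PER_MINUTE = 250       # a brisk adult reading speed
HOURS_PER_DAY = 8
WORK_DAYS_PER_YEAR = 250
BYTES_PER_WORD = 6           # about five letters plus a space

words_per_analyst_year = WORDS_PER_MINUTE * 60 * HOURS_PER_DAY * WORK_DAYS_PER_YEAR
bytes_per_analyst_year = words_per_analyst_year * BYTES_PER_WORD
workforce_bytes_per_year = bytes_per_analyst_year * ANALYSTS

# Low end of "tens of exabytes per year": 10 EB = 10 * 10**18 bytes.
collected_bytes_per_year = 10 * 10**18

fraction_readable = workforce_bytes_per_year / collected_bytes_per_year

print("Per analyst (MB/year):", bytes_per_analyst_year / 10**6)
print("Whole workforce (TB/year):", workforce_bytes_per_year / 10**12)
print("Fraction of one year's intake:", fraction_readable)
```

Under these assumptions the entire workforce reads about 7 terabytes of text a year, well under a millionth of a single year's intake. Changing any of the assumed values by a factor of ten does not change the conclusion.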

This created something that functioned like privacy. Not actual privacy — practical obscurity. Your data existed somewhere between Utah and Fort Meade, but the probability of anyone looking at it was effectively zero unless you specifically attracted attention.

You were never private. You were just too unimportant to spend resources on.

That's what's changing. Large language models process, summarize, cross-reference, and pattern-match unstructured text at scales no human workforce can approach. The same technology that summarizes a PDF can, in principle, be pointed at an archive: find all references to a person across a time period, reconstruct the social graph of an organization from metadata, surface behavioral patterns in a dataset.

The NSA's CIO said this publicly in 2013: "We're learning, just like industry is, that in big data, machine analytics are critical to success. So we'll have machine analytics running on everything."25 The agency established an AI Security Center in September 2023.26 Palantir's AIP platform already integrates LLMs into classified air-gapped deployments.27 ICE uses Palantir's ImmigrationOS, a $30M system combining passport data, Social Security records, IRS data, and license plate readers.28 (End-to-end encryption protects some content, but metadata — who you contacted, when, from where — isn't encrypted, and the data broker market bypasses encryption entirely.)

It's worth being precise about the inference gap here: the archive exists, and AI processing capability exists, but that does not prove they've been combined. What is documented is that the government has actively sought to combine them. The New York Times reported in early 2026 that the Pentagon wanted AI company Anthropic to help "collect and analyze unclassified, commercial bulk data on Americans, such as geolocation and web browsing data."29 When Anthropic refused, the Department of War threatened to designate it a supply chain risk and invoke the Defense Production Act.30 In an internal memo seen by the Financial Times, Anthropic's CEO described the government offering to accept the company's terms if they deleted a specific phrase about "analysis of bulk acquired data."31 A federal judge issued a preliminary injunction, calling the government's retaliation "classic illegal First Amendment retaliation."32

The archive didn't get bigger. It got legible.

A legible archive is a different thing than a large one. A government with a large archive has a haystack. A government with a legible archive has a searchable database of its population's behavior, associations, movements, and communications — queryable retroactively, at scale, without human analysts. Everyone who attended a protest. Everyone who donated to a campaign. Everyone who contacted a journalist, visited a clinic, or searched for a lawyer. Not as a hypothetical — as a query.


Most discussion about surveillance focuses on whether you're being watched now. The bigger risk may be retroactive: data collected legally under yesterday's rules, stored indefinitely at near-zero cost, queryable with today's tools, against tomorrow's enforcement priorities.

On September 25, 2025, President Trump signed NSPM-7, directing roughly 200 FBI Joint Terrorism Task Forces to investigate organizations and their funders — under categories that include constitutionally protected viewpoints.20 The Brennan Center concluded the memo effectively commands law enforcement to investigate anyone who "writes, organizes, advocates, litigates, or otherwise speaks out" against the administration's interests, "as well as anyone who donates to them."33

This has happened before. In 1942, the Second War Powers Act repealed census confidentiality. The Census Bureau — an agency whose entire credibility depends on the promise that individual data will never be used against respondents — provided the War Department with block-by-block tabulations showing where Japanese Americans lived. In 1943, it went further: individual names, addresses, citizenship status. 120,000 people were interned. Census Director Kenneth Prewitt issued a formal apology in 2000, acknowledging that "senior Census Bureau staff proactively cooperated with the internment."34 The data had been collected for one purpose under one set of norms. It was weaponized under another. The archive doesn't update to reflect changing standards. It holds everything.

Then there's the VPN paradox. On March 26, 2026, six members of Congress revealed that under the NSA's own targeting procedures, if your location is unknown — because you're using a VPN — you're presumed to be non-US and lose warrant protections under Section 702 and EO 12333.35 Federal agencies recommend VPNs. Using one may strip you of the constitutional protections those agencies are supposed to provide. DNI Gabbard hasn't responded.

The pattern is the same across all three: the government knows more about its citizens than citizens know about their government, and the gap is widening. That asymmetry isn't a policy choice. It's an infrastructure outcome. Changing administrations doesn't change the infrastructure.


But changing the law can.

The oversight system as it currently exists has documented its own failure. In one reporting period, the FBI violated its own querying rules 278,000 times.36 The FISC called it "persistent and widespread." This month, the FISC found that the compliance problems the government claimed to have fixed are still ongoing and extend beyond the FBI to the intelligence community at large.37

The abuses are specific. FBI agents searched 702 data for an analyst's dating prospects.38 A state court judge who contacted the FBI to report a police chief's civil rights violations had his own communications searched. Rep. Darin LaHood (R-IL), a member of the House Intelligence Committee — the body that oversees the FBI — publicly stated his own communications were improperly queried by an agent who used only his last name.39

The intelligence community argues these tools are essential and that recent reforms are working.40 Whether an oversight system that documented 278,000 of its own violations — and that a court found still isn't fixed in 2026 — can be described as "working" is the question Congress is answering this week. But the answer doesn't have to be the one we've been getting.

Section 702 expires April 20. Congress returns from recess on April 14.41 House leadership is trying to bring a clean extension to the floor this week — days before expiration. Carriers have privately warned the administration they'll stop collecting data when the law lapses.42 The Government Surveillance Reform Act — bipartisan, co-sponsored by Wyden and Lee in the Senate, Davidson and Lofgren in the House — would require a warrant for FBI searches of Americans' 702 data, ban purchasing data from brokers without a warrant, and impose a five-year retention limit.43 It's been endorsed by the ACLU, EFF, Americans for Prosperity, and former House Judiciary Chairman Bob Goodlatte (R). A coalition of 133 organizations has urged Congress to block a clean extension.44


The Fourth Amendment exists because the founders lived under general warrants — instruments that let the government search first and develop suspicions later — and wrote a constitution to prevent their return.

The vote on whether to extend the architecture that has functionally replaced them is happening this week. And one week after that, on April 27, the Supreme Court hears oral argument in Chatrie v. United States.45

The facts are concrete. After a bank robbery investigation went cold, police drew a geofence around the crime scene: a circle with a 150-meter radius, 300 meters across and wider than three football fields laid end to end.46 The circle encompassed not just the bank but a nearby church. Google searched the location records of over 500 million users to identify 19 devices in the area during a one-hour window. Everyone who prayed at that church that afternoon had their location data searched, not because they were suspected of anything, but because they worshipped within 150 meters of a crime. The detective requested expanded data on nine users and de-anonymized three, without returning to a judge at any step.
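To make the mechanics concrete, here is a minimal sketch of what a geofence filter does, in Python. It is not Google's implementation, which is not public. The `Ping` record, the coordinates, the timestamps, and the function names are all hypothetical. The point it illustrates is structural: the filter has no notion of suspicion, so it returns every device that was inside the circle during the window.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass
class Ping:
    """One hypothetical location record."""
    device_id: str
    lat: float
    lon: float
    unix_time: int


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geofence_hits(pings, center, radius_m, start, end):
    """Return sorted IDs of devices seen inside the circle during [start, end]."""
    clat, clon = center
    return sorted({
        p.device_id
        for p in pings
        if start <= p.unix_time <= end
        and haversine_m(clat, clon, p.lat, p.lon) <= radius_m
    })


# Hypothetical data: made-up coordinates and timestamps.
center = (37.0, -77.0)
pings = [
    Ping("a", 37.0000, -77.0, 2000),  # at the center, inside the window
    Ping("b", 37.0009, -77.0, 3000),  # about 100 m north, inside the window
    Ping("c", 37.0020, -77.0, 3000),  # about 220 m north, outside the circle
    Ping("d", 37.0000, -77.0, 9000),  # at the center, outside the window
]

hits = geofence_hits(pings, center, radius_m=150, start=1000, end=4600)
print(hits)
```

Here devices `a` and `b` are returned. Nothing in the query distinguishes a robber from a churchgoer; that distinction only enters later, when someone decides which of the returned devices to unmask.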

The government's argument is that no Fourth Amendment search occurred — Chatrie voluntarily shared his location with Google, so he has no constitutional protection.47 That argument has a name. It's the third-party doctrine — the same doctrine that makes warrantless 702 searches legal, that sustains the data broker market, that lets the FBI purchase your location without a warrant. Whether it survives this case may determine whether the archive remains open.

The Fourth Circuit split 7-7, producing nine opinions across 126 pages.48 The Fifth Circuit already ruled geofence warrants categorically unconstitutional.49 Over 25 amicus briefs were filed by organizations that agree on almost nothing else: the ACLU and the Cato Institute, Americans for Prosperity and the Innocence Project, Google and the Reporters Committee for Freedom of the Press.

Whether the archive can be queried without a warrant is being decided by Congress this week. Whether the Fourth Amendment protects you from dragnet location searches is being decided by the Supreme Court this spring.

IBM's tape division will keep growing. The cartridges will keep stacking. Without a legal framework, the archive will simply become the way things are — your entire digital history searchable, sellable, and queryable by agencies you've never interacted with, for reasons you may never learn.


Every factual claim above is sourced to public record. This piece was written with the assistance of AI in order to publish before the Section 702 vote. If anything is wrong, let me know.
