The banality of surveillance

For a while, I worked at a company that branded itself as the “enterprise social network,” though for all intents and purposes, it was the enterprise Facebook.1 Facebook was moving all of our personal communication out of emails and into a shared feed of posts and replies; our product was designed to do the same thing for our professional communication.

That meant our product was also designed to look like Facebook.2 There was a newsfeed; there were messages and threads; there were users; there were user profiles. There was a like button. It was Facebook, in a small corporate sandbox.

A couple of months after I joined the company as a data analyst, the product and engineering department held one of its regular hack days.3 Everyone had 24 hours to work on anything they wanted, and then a strict three minutes to present their project to the entire department. It was judged; there was a stage and an emcee; there was a soundboard full of jeers; there were trophies for winners; there was an open bar. There was pride in it, and everyone wanted to put on a good show. For this particular hack day, the data team was participating for the first time, and in the days leading up to the event, we talked about our ideas. What do you want to do, we asked each other. What are you going to build?

My idea felt obvious. If you had access to data on how people were using Facebook—which is the data we had, in a bizarro bureaucratic sort of way—what would be the first thing you’d look up? If you knew someone else had that data, what would be the last thing you’d want them to look up?

Profile views. It’s clearly profile views. It’s who’s looking at your profile; it’s the profiles that you’re looking at. That was the holy grail; the third rail. If you wanted to put on a good show—if you wanted to make a drunk audience look up from their laptops—that’s the data that would make them pay attention.

And it was, of course, data that we already had. Like any responsible SaaS product, our app was thoroughly “instrumented”—it recorded every click; every page view; every mobile interaction. We tracked the user who did it; the device that they did it from; their browser; their IP address; the sequence of clicks that came before; the sequence that came after. This type of logging was all generic, mundane, the “industry standard.” We used the same tracking libraries that everyone else used. We recorded the same events that everyone else did. It was mindless and mechanical—years before I joined, an engineer had stuck a few lines of code in our app’s codebase; it captured millions of events an hour, and everything was dumped into a huge table called “event properties.” Because, as the legal documents all say, some piece of it might one day be useful to “improve our Services.”4

Though all this data was carefully protected in an encrypted database behind several firewalls and one very long password, that was not what made it secure. It was secure because it was a pain to use. You had to come up with interesting—or, you know, indelicate—questions to ask of it. You had to figure out how to answer those questions using a sprawling array of machine-generated event logs. And you had to write 595-line SQL queries to do it all.5 But any employee—at our company, or at the hundreds of other SaaS startups that were functionally identical to us, and who all logged identical streams of data—could write that query, combine those logs, and answer those questions.
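To sketch how little machinery this actually takes—using a toy in-memory database and hypothetical table and column names (`event_properties`, `event_type`, `page_url`), not the company’s real schema—a handful of SQL lines over a generic clickstream table are enough to answer “who is looking at whose profile”:

```python
import sqlite3

# A toy version of the generic event log described above. The schema is
# hypothetical, but its shape is the industry standard: one row per event,
# with the user, the event type, the page, and a timestamp.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE event_properties (
        user_id     TEXT,   -- who performed the action
        event_type  TEXT,   -- e.g. 'page_view', 'click'
        page_url    TEXT,   -- where the event happened
        occurred_at TEXT    -- when it happened
    )
""")
conn.executemany(
    "INSERT INTO event_properties VALUES (?, ?, ?, ?)",
    [
        ("alice", "page_view", "/profile/bob",   "2024-01-02T09:15:00"),
        ("alice", "page_view", "/profile/bob",   "2024-01-03T11:40:00"),
        ("carol", "page_view", "/profile/bob",   "2024-01-03T12:05:00"),
        ("alice", "page_view", "/feed",          "2024-01-03T12:06:00"),
        ("bob",   "page_view", "/profile/alice", "2024-01-04T08:30:00"),
    ],
)

# "Who is viewing bob's profile?" No special tracking is required --
# just a filter and a GROUP BY over the page-view stream that was
# already being collected to "improve our Services."
rows = conn.execute("""
    SELECT user_id, COUNT(*) AS views
    FROM event_properties
    WHERE event_type = 'page_view'
      AND page_url = '/profile/bob'
    GROUP BY user_id
    ORDER BY views DESC
""").fetchall()

print(rows)  # [('alice', 2), ('carol', 1)]
```

The 595 lines come from scale, not sophistication: the real query has to stitch page URLs, sessions, and device IDs back into people across millions of machine-generated rows, but each individual step is no harder than this one.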

Or, more generally: Prior to working in Silicon Valley, I assumed that data was secure because it was obfuscated by impressive cryptography and stored in buildings that were guarded by tall fences. And I assumed that what we did on the internet was private—and that drawing inferences from what we did was difficult—because “surveillance” required complex technologies that could detect faint patterns in millions of disparate signals. Yes, Target might be able to figure out that someone was pregnant before their father could, but that took years of careful observation and sophisticated science. It took well-trained humans working with well-trained models, years in the making.

If only. On an internet where everything is tracked—and man, everything is tracked—surveillance does not require a Ph.D., or even any particularly advanced math. It just requires a junior analyst with 24 hours of free time.6 Because the real fences around the data we all leave behind—and the real protections of our privacy—are neither tall nor covered in barbed wire. They are simply fences that are annoying to climb.7 We are not hidden, on the internet; mostly, people are just too uninterested to bother looking for us.

Everyone already knows what happened: The United States Department of War wanted to use Claude.8 Anthropic wanted them to use Claude, but with restrictions. The two sides could not agree; the negotiations broke down; the negotiations turned into outright hostilities; the hostilities became very public. The Atlantic reports on part of what went wrong:

Anthropic learned that the Pentagon still wanted to use the company’s AI to analyze bulk data collected from Americans. That could include information such as the questions you ask your favorite chatbot, your Google search history, your GPS-tracked movements, and your credit-card transactions, all of which could be cross-referenced with other details about your life.

When we hear stories about “mass surveillance” and “artificial intelligence” and the “CIA,” it is tempting to imagine systems of unfathomable reach and sophistication. It is tempting to worry about shadowy government agencies using AI to hack into our phones and turn them into sonar transmitters.9 It is tempting to see the Greco—a million sensors and cameras feeding into a machine that “doesn’t think, but reasons”:

It reads every permutation in every wager in every seat in the entire casino, hand by hand. It’s wired into floor security cameras that measure pupil dilation, and determine if a win is legitimate or expected. It gathers bio feedback—players’ heart rates, body temperatures. It measures, on a second-by-second basis, whether the standard variations of gaming algorithms are holding or are being manipulated. The data is analyzed in real time, in a field of exabytes.

For better or for worse, reality is almost certainly much more mundane. Nobody wants to use AI to bug our phones, or to build a sprawling nervous system to track our vitals, because our phones are already bugged. Everything we do on them is recorded a dozen times over: by our wireless carriers, by the websites we visit and the apps we use, by the vendors and ad networks those companies send their data to, and by the marketplaces that sell that data. We built the eyes of the Greco decades ago.

But that data has remained relatively secure—or maybe more precisely, its potential energy has remained relatively buried—largely because it’s tedious to work with. It’s messy; it’s scattered across different sources and in different formats; combining it is a pain, and most of us are simply not interesting enough to investigate. Data analysts who work at shadowy government agencies have lives too, and they do not want to write 595-line SQL queries either.

But AI doesn’t mind. And that’s the boring danger of what happens next: Not of AI becoming a superintelligent Sherlock Holmes finding impossible patterns in its enormous mind palace,10 but of it being a million monkeys at a million typewriters, doing the grunt work no person wanted to do. Because when prying questions are a prompt away—rather than 24 hours of work away—who wouldn’t be tempted to pry?

It does make you wonder, though: While defense and intelligence agencies are unique in the legal and extralegal alleys in which they operate, they are not unique in their ability to warehouse massive amounts of data. In fact, as The Atlantic pointed out, these agencies aren’t collecting this data themselves; they are buying it from other people, in open markets:

The government can purchase detailed records of Americans’ movements, web browsing, and associations from public sources without obtaining a warrant, a practice the Intelligence Community has acknowledged raises privacy concerns and that has generated bipartisan opposition in Congress. Powerful AI makes it possible to assemble this scattered, individually innocuous data into a comprehensive picture of any person’s life—automatically and at massive scale.11

But if those agencies can buy that data, so can other people. If they can use AI to trawl through it “at massive scale,” so can other companies—especially if those companies are already collecting those events and messages themselves.12

People often talk about how AI breaks many of the foundational floorboards of our society. Our formal and informal senses of truth are built on the assumption that realistic photos and videos cannot be faked; that is breaking down. Our ambitions and careers are built on the assumption that intelligence and expertise are scarce; that is breaking down. Our sense of how the world works is often defined by what is possible for other people to do and what is worthwhile for them to do. Sure, we know it is possible for us to be monitored, but why would anyone bother watching the tapes? Everyone must have more important things to do with their time.

Banality is a sturdy armor. Or was, anyway.