Warning: LLMs Able to De-Anonymize User Accounts on Reddit, Hacker News & Other "Pseudonymous" Platforms; Report Co-Author Expands, Advises


Originally published on my Patreon — join to read more content like this first!

Figure above: Real identities of Reddit users exposed by their discussions of favorite movies; from "Large-scale online deanonymization with LLMs" paper available at Cornell University's arXiv

Pseudonyms are crucial to virtual worlds, gamer hangouts on Reddit and Discord, and other online communities; they enable us to be more expressive and experimental with our activities online, without fear of personal judgement, or having our real life identities exposed.

Maybe not anymore.

At least that’s according to a new study that should probably be labeled, Warning: Hazard Ahead.

A team of AI academics working with a researcher at Anthropic, a leading LLM company, recently demonstrated that Large Language Models like Claude and ChatGPT can de-anonymize pseudonymous user accounts on Reddit and Hacker News:

Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to:

(1) extract identity-relevant features,

(2) search for candidate matches via semantic embeddings, and

(3) reason over top candidates to verify matches and reduce false positives.
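To make the shape of that three-stage pipeline concrete, here is a minimal toy sketch. The actual paper prompts LLMs for feature extraction and verification and uses neural semantic embeddings for search; this sketch swaps in crude stand-ins (word counting, bag-of-words cosine similarity, a score threshold) purely for illustration, and all names and texts in it are invented:

```python
# Toy sketch of the three-stage de-anonymization pipeline.
# Each stage is a simplified stand-in for what the paper describes.
import math
from collections import Counter

def extract_features(text: str) -> Counter:
    # Stage 1 stand-in: the paper prompts an LLM to extract
    # identity-relevant details; here we just count words.
    return Counter(w.lower().strip(".,!?") for w in text.split())

def cosine(a: Counter, b: Counter) -> float:
    # Stage 2 stand-in: the paper embeds features with a semantic
    # embedding model; here we compare bag-of-words vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def match_accounts(queries: dict, candidates: dict,
                   top_k: int = 3, threshold: float = 0.3) -> dict:
    # Stage 3 stand-in: the paper has an LLM reason over the top
    # candidates to confirm a match and cut false positives;
    # here we simply accept the best candidate above a threshold.
    cand_feats = {name: extract_features(t) for name, t in candidates.items()}
    results = {}
    for qname, qtext in queries.items():
        qf = extract_features(qtext)
        ranked = sorted(cand_feats,
                        key=lambda n: cosine(qf, cand_feats[n]),
                        reverse=True)[:top_k]
        best = ranked[0]
        results[qname] = best if cosine(qf, cand_feats[best]) >= threshold else None
    return results

if __name__ == "__main__":
    pseudonymous = {"movie_fan_42": "I love Blade Runner and noir sci-fi films"}
    real_profiles = {
        "Alice Smith": "Film critic writing about Blade Runner and classic noir sci-fi",
        "Bob Jones": "Gardening enthusiast and amateur chef",
    }
    print(match_accounts(pseudonymous, real_profiles))
```

The unsettling point the authors make is that each stage, taken alone, is an ordinary summarization or search/ranking task; the privacy risk emerges from chaining them at scale.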

So for example, the researchers were able to use this approach to connect pseudonymous users on Hacker News with their real life profiles on LinkedIn ("at 99% precision").

In another test, they were able to connect different Reddit accounts with the same real life owner, simply through the two accounts’ disparate discussions of favorite movies.

More concerning, the authors add, it would be difficult for AI companies to block people from using their LLM services for deanonymization. For one thing, the techniques involved are built from common, non-malicious tasks: content summarization and search/ranking.

For another, they note, "Not revealing any data on online platforms is difficult, as the data we use is the very data that makes online communities worthwhile." In other words, what's the point of using Reddit if you're not there to talk about your favorite movies, games, etc.?

Many or most of us use pseudonyms for pretty casual reasons. But for some of us, it’s a literal matter of life and death.

This concern first hit my radar when Linden Lab briefly considered making it possible for users to link their Second Life avatar with their Facebook account. Fifteen years ago, this seemed like a good idea. Until:

A woman who read my blog quietly messaged me and torched that notion to the ground:

“You don’t understand,” she explained. “I have a friend in Second Life who uses it as her social escape valve, because she’s hiding from her abusive husband. He’s trying to find her. She can’t share anything on Facebook, because he’ll use it to track her down. She once mentioned playing an MMO on Facebook. Her husband figured out her username and started stalking her in the game.”

[From Making a Metaverse That Matters]

This dedicated husband discovered her username (I assume) by doing a lot of time-consuming detective work on Facebook, blogs, and other Internet sites.

But LLMs make this stalking process as quick and easy as entering a detailed prompt.

Many players of Second Life, VRChat, and other virtual worlds also use their in-game avatar name on Reddit and Hacker News. Could the methods outlined in this paper be used to de-anonymize people who are using game avatar names on Discord and forums?

While declining to discuss specific platforms in the interest of academic ethics, co-author Joshua Swanson tells me this:

“In general, any online forum where people write enough text is potentially vulnerable, and platforms where posts are publicly searchable are more exposed.”

He adds that online identities are already vulnerable to exposure via AI search:

“The basic capabilities already exist in current models, and it’s not straightforward for LLM providers to prevent misuse,” as he puts it. “Raising awareness of this is a major reason we published the paper!”

How can owners of online accounts better protect their real life identity in the LLM era?

Here’s some advice from the paper’s lead author, Simon Lermen:

Individuals may adopt a stronger security mindset regarding privacy. Each piece of specific information you share – your city, your job, a conference you attended, a niche hobby – narrows down who you could be. The combination is often a unique fingerprint. Ask yourself: could a team of smart investigators figure out who you are from your posts? If yes, LLM agents can likely do the same, and the cost of doing so is only going down.

Emphasis mine. More here.
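Lermen's "unique fingerprint" point is easy to see with a toy example: each detail you share is common on its own, but intersecting several can shrink a large pool to a single person. The population and attributes below are entirely made up for illustration:

```python
# Illustration of how combined quasi-identifiers narrow a candidate pool.
# All names and details are fictional.
population = [
    {"name": "user_a", "city": "Austin", "job": "nurse",   "hobby": "birding"},
    {"name": "user_b", "city": "Austin", "job": "nurse",   "hobby": "chess"},
    {"name": "user_c", "city": "Austin", "job": "teacher", "hobby": "birding"},
    {"name": "user_d", "city": "Boston", "job": "nurse",   "hobby": "birding"},
]

def narrow(pool, **revealed):
    # Keep only people consistent with every detail revealed so far.
    return [p for p in pool if all(p[k] == v for k, v in revealed.items())]

print(len(narrow(population, city="Austin")))               # 3 still match
print(len(narrow(population, city="Austin", job="nurse")))  # 2 still match
print(narrow(population, city="Austin", job="nurse", hobby="birding"))  # one left
```

With real populations the pools start in the millions, but the dynamic is the same: a city, a job, and a niche hobby mentioned across scattered posts can be enough to single someone out.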

Co-author Joshua Swanson offers additional guidance going forward:

“Use new accounts to post truly sensitive questions or information. Be aware that it’s not any single post that identifies you, but the combination of small details across many posts. And consider never posting anything you truly don’t want shared with the world.”

Read their whole report here.
