Tools for detecting AI generated content


Can you tell if a piece of content was created by AI?

For the keen observer, the answer might be yes if we’re talking about a weirdly formal email cluttered with em dashes, or a repetitive playlist titled ~coffee shop study vibes~, or an image of a barista with an eerily shaped hand.

But more than likely, you’ve recently encountered AI generated images, audio, video, and text, and had no idea. And as AI models get better at generating convincingly human-like outputs, it will become increasingly difficult to verify the provenance of a piece of content.

The ability to tell if something is AI generated is immensely valuable. Knowing the origins of an artifact can help to prevent deep fakes, misinformation, and misattribution, and is critical to maintaining the hygiene of our information ecosystem. The fact that we can create, share, and discover so much information and content on the Internet is an absolute marvel. Let’s be good stewards in protecting and maintaining the digital world we are all so reliant on.

Watermarking is one of several techniques for verifying if a piece of content was created by AI. When you watermark AI generated content, you add a digital signature that can later be detected to verify the content's origins.

For example, this image generated by the Google AI model Imagen has been watermarked with Google DeepMind's watermarking tool SynthID. The watermark in this image is imperceptible to you and me, but if you passed the image to the SynthID detector, it would be able to find the watermark and verify that the image was created by Google AI.

[Image: SynthID detection in action]

The techniques SynthID uses to watermark content are robust to small perturbations in the data (image cropping, filters, word deletion, etc.), and don't dramatically impact the quality of generated output. However, SynthID can only tell you "this content was made with Google AI." It can't tell you if a sequence of text or audio clip was created with another (non-Google) AI tool. That’s because watermarking has to be implemented by the model provider. So in reality, while critical, watermarking is only one component of building a reliable and responsible AI content ecosystem.

Google has implemented watermarking with SynthID in every one of its consumer products. That means if you're creating images with Nano Banana, podcasts in NotebookLM, or blog posts with Gemini 2.5 Pro, the generated content can all be traced back to Google's AI tools.

But how does watermarking actually work?

While Google DeepMind’s SynthID technology works for all data modalities, this article focuses specifically on the methodology used for text watermarking.

I’m guessing you’ve used an LLM before, but you might not know how they produce that eloquent and nuanced text in response to your queries about travel, recipes, personal finance, and whatever else you’re asking these days.

LLMs take in a sequence of text and generate an array of probabilities over what words could come next.

In the example below, the model processes the text my favourite tropical fruit is and assigns a probability of 0.50 to the word mango, 0.30 to lychee, 0.15 to papaya, and 0.05 to durian.

[Image: the model's next-word probability distribution]
Anyone who knows me well is going to think I made up this example as a tropical fruit enthusiast. But I swear this is the actual example from the original paper.

Next, a word is sampled from this distribution, and that word is returned to the user. Sampling means that words with higher probabilities are more likely to be selected. So if you were to randomly select a word from this list, mango would be the most likely selected word, and durian the least.
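This sampling step can be sketched in a few lines of Python using only the standard library. The distribution below is the article's example; in a real LLM it would come from the model's output layer:

```python
import random

# The article's example next-word distribution for
# "my favourite tropical fruit is"
vocab = ["mango", "lychee", "papaya", "durian"]
probs = [0.50, 0.30, 0.15, 0.05]

# random.choices draws a word in proportion to its probability,
# so mango is the most likely pick and durian the least likely.
word = random.choices(vocab, weights=probs, k=1)[0]
print(word)
```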


To watermark text, we insert a bias into this word sampling process.

Watermarking text

First, we create a watermarking key. This is a unique string of characters that is known only by the creators of the LLM. So in the context of Gemini, it’s known only by the DeepMind team that operates the model (in case you’re wondering, I am not one of those people!).

It’s important to keep your watermarking key secret, because if someone gets access, they might be able to reverse engineer the process and game the system.


Next, you take a small context window of text preceding the generated word. In the example here, the window length = 3, corresponding to the text tropical fruit is.

The last thing we need is a hash function, known as the watermarking function. A hash function is a mathematical algorithm that takes an input of any size and converts it into a fixed-size string of characters. A given input will always produce the same hash value. And, critically, it's computationally difficult to reverse the process—you can't easily recreate the original input data from the hash value.
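Python's built-in hashlib shows both properties, determinism and fixed output size. SHA-256 here is just a familiar example of a hash function, not SynthID's actual watermarking function:

```python
import hashlib

# The same input always produces the same digest...
h1 = hashlib.sha256(b"my favourite tropical fruit is").hexdigest()
h2 = hashlib.sha256(b"my favourite tropical fruit is").hexdigest()

# ...and the digest has the same fixed size however long the input is.
h3 = hashlib.sha256(b"a much longer input " * 1000).hexdigest()

print(h1 == h2, len(h1), len(h3))  # → True 64 64
```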

In the SynthID paper, the hash function produces either a 0 or 1, which is called the g-value. More specifically, this function takes as input:

  • INPUT: watermarking key, recent context, a word from the vocabulary

and returns the following as output:

  • OUTPUT: 0 or 1

For each of the words in your vocabulary you compute a g-value, which you can see in the blue array on the right side of this image.

[Image: g-values computed for each word (mango = 1, lychee = 0, papaya = 0, durian = 1)]

The hash function is conditioned on the previous 3 words, the watermarking key, and the generated word. That means if you used a different watermarking key or a different context length (like favourite tropical fruit is instead of tropical fruit is), then your g-value for a given word would not be the same.
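Here is a minimal sketch of such a g-value function. SynthID's actual keyed hash is not public, so HMAC-SHA256 reduced to a single bit is an assumed stand-in, and the key and context values are made up for illustration:

```python
import hashlib
import hmac

def g_value(key: bytes, context: tuple, word: str) -> int:
    """Keyed hash of (recent context, candidate word) reduced to one bit.
    A stand-in for SynthID's watermarking function, not the real thing."""
    message = " ".join([*context, word]).encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return digest[0] & 1  # keep one bit: the g-value is 0 or 1

key = b"secret-watermarking-key"       # known only to the model provider
context = ("tropical", "fruit", "is")  # sliding window of length 3

for word in ["mango", "lychee", "papaya", "durian"]:
    print(word, g_value(key, context, word))
```

A different key or a different context window changes the inputs to the hash, and therefore the g-values, which is exactly why the key has to stay secret.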

Now that we've computed a g-value for our words, it's time to start sampling.

In the vanilla no-watermarking case, we sampled one word from our array of probabilities.

This time we’ll sample two.

Let’s say the two words we sampled are lychee and mango.


We compare the g-values for these two tokens. The token we return is the one with the larger g-value.

Since mango has a g-value of 1, and lychee has a g-value of 0, we return mango (1 > 0). Note that in the case of a tie, we select a token at random.

Because we always choose the token with the higher g-value, we're biasing the generated text towards tokens that have a g-value of 1. 🌟 And this is the key insight that helps us to later detect if text was generated by our AI model 🌟
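Putting the pieces together, the biased sampling step looks like this sketch, with the article's example g-values hard-coded for clarity (in practice they would come from the hash function):

```python
import random

# The model's next-word distribution from the article
vocab = ["mango", "lychee", "papaya", "durian"]
probs = [0.50, 0.30, 0.15, 0.05]
# g-values from the article's example
g = {"mango": 1, "lychee": 0, "papaya": 0, "durian": 1}

def watermarked_sample() -> str:
    """Sample two candidate words; return the one with the larger g-value."""
    a, b = random.choices(vocab, weights=probs, k=2)
    if g[a] != g[b]:
        return a if g[a] > g[b] else b
    return random.choice([a, b])  # tie: pick at random

print(watermarked_sample())
```

Without the watermark, a sampled word here has a g-value of 1 with probability 0.55 (mango plus durian); picking the better of two candidates raises that to 1 − 0.45² ≈ 0.80.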

Detecting watermarks

Now that we've generated text with a watermark, how do we actually detect the watermark?

Let’s take this sequence of text.

My favourite tropical fruit is mango. Its sweet, juicy flesh and vibrant aroma instantly transport me to a sunny beach. I love eating it fresh, but it's also incredible in smoothies, salsas, and even on grilled chicken. There's something about its rich, golden color and irresistible flavor that makes it a true taste of summer.

For each word in this ode to mangos, we'll compute the g-value, given the recent context and our watermarking key.

We take My favourite tropical as the context, fruit as the generated word, pass those four words along with our secret key to the hash function and compute a g-value. In this case, the hash function returns a g-value of 1.

We proceed to compute the g-values for each sequence of 4 words.

Etc etc etc…

After computing the g-value for each 4-gram, we end up with a sequence of g-values.

g-values: 1, 0, 1, 1, ..., 1, 0, 1

If this sequence of g-values contains significantly more 1s than 0s, then it is very likely that the text is watermarked.

If the sequence has a roughly equal number of 0s and 1s, or more 0s than 1s, then it's probably not watermarked, meaning it was not created by our model.
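The detection side can be sketched the same way. Below, a keyed hash (HMAC-SHA256 reduced to one bit) stands in for the watermarking function described above, and watermark_score is a hypothetical helper name; the real detector uses a proper statistical test rather than a raw fraction of 1s:

```python
import hashlib
import hmac

def g_value(key: bytes, context: tuple, word: str) -> int:
    # Stand-in keyed hash reduced to one bit (not SynthID's real function)
    message = " ".join([*context, word]).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).digest()[0] & 1

def watermark_score(text: str, key: bytes, window: int = 3) -> float:
    """Fraction of 1s among the g-values of every (context, word) 4-gram.
    Scores well above 0.5 suggest the text carries our watermark."""
    words = text.split()
    gs = [g_value(key, tuple(words[i - window:i]), words[i])
          for i in range(window, len(words))]
    return sum(gs) / len(gs)

text = ("My favourite tropical fruit is mango. Its sweet, juicy flesh "
        "and vibrant aroma instantly transport me to a sunny beach.")
print(watermark_score(text, b"secret-watermarking-key"))
```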

Tournament sampling

There are lots of ways to modify this basic watermarking methodology. In addition to changing hyperparameters like the context length or trying a different hash function, you can also increase the number of rounds in your word sampling tournament.

Previously, we sampled two words, lychee and mango, and we selected mango because it had the larger g-value.

In practice, watermarking involves many rounds of sampling and word selection.

Returning to our example from before, we can implement three different watermarking hash functions and compute three different g-values for each of the words in our vocabulary.


Now in addition to having a probability score for each word, we have these 3 different g-values represented in the blue, pink, and green arrays g1, g2, g3 in the image below.

[Image: each token receives 3 different g-values]

This time, we will sample four pairs of words, which you can see on the left side of the image below.

[Image: the four sampled pairs of words]

In round 1 of our lineup we have:

  • durian vs mango

  • lychee vs mango

  • papaya vs lychee

  • mango vs mango

Then within each of these pairs we select the word with the higher g1-value.

The winners of the tournament proceed to the next round, where we select the words with the highest g2-values.

And finally we end up with mango vs lychee. Since mango has a g3-value of 1 and lychee a g3-value of 0, mango is our winner! 🥭🥭🥭
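The tournament above generalizes to any number of rounds, starting from 2^rounds sampled candidates. Here is a sketch; note that only a few of these g-values appear in the article's figures, so the triples below are invented for illustration:

```python
import random

vocab = ["mango", "lychee", "papaya", "durian"]
probs = [0.50, 0.30, 0.15, 0.05]
# Illustrative (g1, g2, g3) triples; mostly made up for this sketch
g = {"mango": (1, 1, 1), "lychee": (0, 1, 0),
     "papaya": (0, 0, 1), "durian": (1, 0, 0)}

def tournament_sample(rounds: int = 3) -> str:
    """Knockout tournament over 2**rounds sampled candidate words.
    Round r keeps, from each pair, the word with the larger r-th g-value."""
    players = random.choices(vocab, weights=probs, k=2 ** rounds)
    for r in range(rounds):
        winners = []
        for a, b in zip(players[::2], players[1::2]):
            if g[a][r] != g[b][r]:
                winners.append(a if g[a][r] > g[b][r] else b)
            else:
                winners.append(random.choice([a, b]))  # tie: pick at random
        players = winners
    return players[0]

print(tournament_sample())
```

With three rounds we sample 2³ = 8 candidates and halve the field each round, biasing every layer of g-values at once.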


Intuitively you can think of tournament sampling as increasing the watermarking signal. For example, imagine you had a biased coin that instead of being 50/50 heads tails was 60/40 heads tails. If you performed 10 coin tosses, it would be pretty difficult to tell it was biased. But if you performed 1000 tosses, it would be a lot more obvious that something was up with this coin. Tournament sampling is intuitively like having multiple biased coin tosses for every single word generated.
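The coin intuition is easy to check with a quick simulation (heads_fraction is just a throwaway helper for this sketch):

```python
import random

def heads_fraction(bias: float, tosses: int) -> float:
    """Fraction of heads observed over `tosses` flips of a biased coin."""
    return sum(random.random() < bias for _ in range(tosses)) / tosses

random.seed(42)
# With 10 tosses the observed fraction swings wildly around 0.6, so the
# bias is hard to spot; with 1000 tosses it concentrates close to 0.6.
print(heads_fraction(0.6, 10), heads_fraction(0.6, 1000))
```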

Generally speaking, increasing the number of tournament layers will make the watermark more detectable and thus the content more easily verifiable (though at some point you will hit diminishing returns).

Watermarking as standard practice

Being able to verify the provenance of any piece of content is crucial. Watermarking is one way to help protect against a future of unverifiable AI slop and misinformation. Let’s make it a norm.
