Everything runs client-side – there is no server! Try the demo:
[Interactive demo: pick a model, then compare the original image and the adversarial image side by side, along with the model's prediction for each. Generating the adversarial image takes a few seconds.]
What is the demo doing?
Neural networks achieve superhuman performance in many areas, but they are easily fooled.
In the demo above, we can force neural networks to predict anything we want. By adding nearly-invisible noise to an image, we turn "1"s into "9"s, "Stop" signs into "120 km/hr" signs, and dogs into hot dogs.
These noisy images are called adversarial examples. They break the integrity of machine learning systems, and the illusion of their superhuman performance.
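For intuition, here is a minimal sketch of one common attack, the fast gradient sign method (FGSM), written in Python/PyTorch rather than the browser-side code the demo actually runs. The `model`, `image`, and `target_label` names are placeholders for any differentiable classifier and input:

```python
# Minimal targeted-FGSM sketch (PyTorch). Not the demo's code -- an illustration only.
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, target_label, epsilon=0.1):
    """Craft an adversarial example that pushes `image` toward `target_label`.

    `image` is a (batch, channels, height, width) tensor in [0, 1];
    `target_label` is a (batch,) tensor of class indices.
    """
    image = image.clone().detach().requires_grad_(True)

    # Loss with respect to the class we *want* the model to predict.
    loss = F.cross_entropy(model(image), target_label)
    loss.backward()

    # Step against the gradient to make the target class more likely,
    # then clip so the perturbation stays nearly invisible.
    adversarial = image - epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()
```

Even with a very small `epsilon` (a few pixel-intensity levels), a single gradient step like this is often enough to flip the prediction to the chosen target; iterative variants such as PGD are stronger still.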
Why does this matter?
Our world is becoming increasingly automated, yet these systems have strange failure modes.
If machine learning systems are not properly defended, attackers could:
- Impersonate others in facial recognition systems
- Force autonomous vehicles to misrecognize street signs & obstacles
- Bypass content moderation and spam filters in social networks
- Inject adversarial bytes into malware to bypass antivirus systems
- Digitally alter numbers on a check in a mobile banking app
- (and more)
Is this limited to image classification with neural networks?
No. Adversarial examples exist for almost every machine learning task: speech recognition, text classification, fraud detection, machine translation, reinforcement learning, and more.
Moreover, all machine learning models (not just neural networks) are vulnerable. In fact, simpler models such as logistic regression are even more easily attacked.
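As a concrete illustration (a sketch, not part of the demo): for a binary logistic regression the decision function is linear, so a bounded adversarial perturbation can simply follow the sign of the weight vector – no gradient machinery required. The scikit-learn model and the `epsilon` budget below are illustrative assumptions:

```python
# Sketch: adversarial example against a binary logistic regression (scikit-learn).
import numpy as np
from sklearn.linear_model import LogisticRegression

def attack_logistic_regression(clf: LogisticRegression, x: np.ndarray, epsilon: float) -> np.ndarray:
    """Perturb x (L-infinity budget `epsilon`) to push it across the decision boundary."""
    w = clf.coef_[0]                       # weights of the linear decision function w.x + b
    y = clf.predict(x.reshape(1, -1))[0]   # current predicted class (0 or 1)
    # Move each feature by +/- epsilon in the direction that lowers (or raises)
    # the decision function, depending on which class we are trying to escape.
    direction = -np.sign(w) if y == 1 else np.sign(w)
    return x + epsilon * direction
```

This is the same one-step logic FGSM uses for neural networks, except that a linear model's gradient never changes with the input – part of why such models can be even easier to fool.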
Finally – beyond adversarial examples – there are many more adversarial attack vectors, including data poisoning, model backdooring, data extraction, and model stealing.
How do I defend against adversarial examples?
There are several proposed defenses, including adversarial training and admission control.
However, no defense is universal and many have proven ineffective, so work with an expert to quantify your risks and invest in defenses appropriately.
(What happens if someone can make your system predict anything they want?)
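To make "adversarial training" concrete: the basic idea is to generate adversarial examples on the fly during training and fit the model to those instead of (or alongside) the clean inputs. Below is a minimal PyTorch sketch; `model`, `train_loader`, `optimizer`, and `epsilon` are placeholders, not a vetted recipe:

```python
# Minimal adversarial-training sketch (PyTorch). Illustration only, not a recommended recipe.
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, train_loader, optimizer, epsilon=0.1):
    """One pass over the data, training on FGSM-perturbed inputs instead of clean ones."""
    model.train()
    for images, labels in train_loader:
        # 1. Build untargeted FGSM examples against the current model.
        images = images.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(images), labels)
        loss.backward()
        adv_images = (images + epsilon * images.grad.sign()).clamp(0, 1).detach()

        # 2. Standard training step, but on the adversarial batch.
        optimizer.zero_grad()
        adv_loss = F.cross_entropy(model(adv_images), labels)
        adv_loss.backward()
        optimizer.step()
```

Stronger variants run an iterative inner attack such as PGD; either way, expect extra compute and usually some loss of clean accuracy, which is another reason to weigh the risk before investing.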
Where can I learn more?
Here's a list of good resources, in rough order of approachability:
- The full FAQ
- The directory of attacks (try running locally and playing with various settings)
- [Blog] CleverHans – start here
- [Blog] Gradient Science – start here
- [Tutorial] Adversarial Robustness - Theory and Practice
- [Paper] SoK: Towards the Science of Security and Privacy in Machine Learning
- [Paper] Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning
Lastly – feel free to email me questions.