Hacking as a pathway to building better Products

Most security products are terrible. For years our industry has managed to get by because our products were mandated by someone or some regulation, and users were trained to accept that security and usability were necessary trade-offs. This was just the prevailing truth.

One of the reasons we always promote hacker-led companies is that hackers delight in challenging accepted truths. We think this applies as much to product design as it does to smashing the stack.¹

In a few months, Thinkst will celebrate Canary’s ten-year anniversary. A decade of building the world’s best honeypot, and one of the world’s most loved security tools. Within the company, several people count themselves as current or ex-hackers. And, according to us, our kids, and our love page, Canary doesn’t suck.

In a recent Hacker News thread discussing one design choice we made, a commenter picked up that our devices use similar tunnelling techniques to nefarious software. Well spotted! In fact, there are a bunch of design and technology choices we’ve made on Canary, influenced heavily by our offensive security experience.

We wanted to explore the way hacking influenced our design of Canary. We took ideas from exploitation, C2 tunnelling, web attacks, and more.

CSRF-style device pairings

Early hardware Canaries needed a setup step to register the bird to its Console. Since we run a Console per customer, birds needed a way to discover their Console. A very common approach (even today) is to rely on a mobile app which connects to the hardware via Bluetooth, and configures the device. But that means you need to maintain a separate mobile app and, worse, customers have to install another app just to perform an initial setup. The mobile app route was out; we needed something else.

At its core, the mobile app is a relay between the device and the cloud server. The mobile app connects to the deployed hardware and to the cloud server, and relays traffic between them. Hmmm… we know lots about relaying traffic via confused deputy attacks. In either a CSRF or SSRF attack, a third party relays traffic (unexpectedly) between an attacker and their target. For CSRF, the third party is a client browser; in SSRF it’s a remote server (typically running a web app).

Leaning on this, we built a configuration flow which didn’t require additional apps, whose only requirements were Bluetooth and a browser, both readily found in any modern user device.

In this flow, the customer toggles the Canary’s configuration mode via its single physical button. This launches a webserver on the Canary that’s only accessible via Bluetooth. The user pairs to the Canary, and configures it via their web browser. When they save the settings, the Canary will have the user’s browser make cross-site requests to their Console, doing a key-exchange and registering the bird on the Console.

Canary configuration is relayed from the device to the Console, via the browser

We use the browser as the relay between the hardware and the cloud server, and it works beautifully smoothly. The inspiration for the flow was CSRF.
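To make the relay idea concrete, here is a minimal sketch of the classic CSRF building block: a page that makes the visiting browser submit a cross-site form. It is not Canary's actual code. The function name `build_relay_page`, the Console URL, and the field names are all hypothetical, and a real flow would carry key-exchange material and handle the Console's response.

```python
import html


def build_relay_page(console_url: str, fields: dict[str, str]) -> str:
    """Return an HTML page that auto-submits `fields` to `console_url`.

    The device serves this page; the user's browser then performs the
    cross-site POST, acting as the relay between device and Console.
    All names and URLs here are illustrative assumptions.
    """
    inputs = "\n".join(
        '<input type="hidden" name="{}" value="{}">'.format(
            html.escape(name, quote=True), html.escape(value, quote=True)
        )
        for name, value in fields.items()
    )
    return (
        "<!doctype html>\n"
        '<form id="relay" method="post" action="{}">\n'.format(
            html.escape(console_url, quote=True)
        )
        + inputs
        + "\n</form>\n"
        # Submitting on load is what removes the extra click for the user.
        '<script>document.getElementById("relay").submit();</script>\n'
    )


if __name__ == "__main__":
    print(build_relay_page(
        "https://console.example/register",  # hypothetical Console endpoint
        {"device_id": "bird-01", "public_key": "BASE64-KEY-GOES-HERE"},
    ))
```

The values are HTML-escaped because the device is generating markup from settings the user typed in; the same care an attacker skips is care a product has to take.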

But we still wanted to make it slicker, and client-side exploitation gave us a goal to aim for: easier installations that didn’t require Bluetooth at all.

0-click installations

The rise of client-side attacks exposed the popularity of 0-click exploitation. Attackers wanted to target victims and achieve persistence, code execution, or some other goal, without the victim having to do anything. This goes back decades. Look at IRC DCC bugs, mail client attachment auto execution, drive-by browser exploits, mobile phone messaging exploits; all of these had examples where an attacker could send something to the client, and compromise the endpoint without the client doing anything. Every additional step that malware needs from a user to actually run sees a dramatic drop-off in the success rate of attacks. Attackers will push really hard to remove user interaction in the exploitation stage.

To make 0-click exploitation work in malware, successful attackers cannot make many assumptions about the environment in which their attack runs. The attacks have to be reliable in uncertain environments, possibly by bootstrapping a minimal stub to explore the local environment (maybe leak an address or two) before getting to the meat of the attack. They cannot rely on users to set up or configure the malware (of course, because who’d willingly agree?!?) Lastly, the attacks should ideally fail in controlled ways when they don’t work out.

There’s a great lesson here for product designers. Friction hinders deployment; reduce the friction to install and you dramatically increase the odds of your product being actually useful when the customer needs it…

We took inspiration from 0-click exploitation, to aim for 0-click installation.

In building Canary, we’ve focused on making the installation simpler and simpler. Hardware Canaries initially required users to perform a Bluetooth setup step (described above), where they Bluetooth-paired to the Canary and registered the bird to their Console. However, Bluetooth can be finicky; several years ago we switched to a model where we pre-register Canaries before shipping, so customers really do get a 0-click installation. They plug in power… and the Canary boots up online and ready to go. 0 clicks involved.

Customers simply power up the Canary when it arrives, and the status LED turns Green indicating it’s online and working

I can hear you say, “But that doesn’t scale!”. Ahem.

Aside from the hardware birds, we also ship cloud (AWS, Azure, GCP, Tailscale) and virtual (VMware, Hyper-V) options.² These birds have to know their home Console when they boot. A standard approach in this situation might be to supply a single machine image or OVA, and have customers perform configuration locally inside each running instance. But this is painful for customers (and we can hack better!).

Instead we automatically build a unique VM / machine image per Customer per platform per version, with all their Console details already baked in. Customers simply download the VM from their Console, boot it, and it’s online and ready to go.
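The "per Customer per platform per version" build matrix can be sketched as a simple Cartesian product. This is only an illustration of why the image count grows quickly. The `ImageSpec` type, the Console hostname scheme, and the object key layout are assumptions made for the example, not details of the real build pipeline.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ImageSpec:
    """One image to build: a unique (customer, platform, version) triple."""
    customer: str
    platform: str
    version: str

    @property
    def console_host(self) -> str:
        # Hypothetical naming scheme for the Console baked into the image.
        return f"{self.customer}.console.example"

    @property
    def object_key(self) -> str:
        # Hypothetical storage layout; one object per image.
        return f"{self.customer}/{self.platform}/{self.version}/canary-image"


def build_matrix(customers, platforms, versions):
    """Enumerate every image that has to exist for 1-click deployment."""
    return [ImageSpec(c, p, v) for c, p, v in product(customers, platforms, versions)]


if __name__ == "__main__":
    specs = build_matrix(["acme", "globex"], ["vmware", "hyperv", "aws"], ["1.0", "1.1"])
    for spec in specs:
        print(spec.object_key, "->", spec.console_host)
```

Two customers, three platforms, and two versions already mean twelve images, which is how a real customer base ends up with tens of thousands of them.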

# Number of managed VM images
$ (for bucket in $BUCKET_AP_SOUTHEAST_2 $BUCKET_CA_CENTRAL_1 $BUCKET_EU_WEST_1 $BUCKET_EU_WEST_2 $BUCKET_US_EAST_1 ; do aws s3api list-objects --bucket "$bucket" --output json --query "[length(Contents[])]" | grep '[0-9]'; done) | awk '{ s += $1 } END { print s }'
46374

# Size of managed VM images
$ (for bucket in $BUCKET_AP_SOUTHEAST_2 $BUCKET_CA_CENTRAL_1 $BUCKET_EU_WEST_1 $BUCKET_EU_WEST_2 $BUCKET_US_EAST_1 ; do aws s3api list-objects --bucket "$bucket" --output json --query "[sum(Contents[].Size)]" | grep '[0-9]'; done) | awk '{ s += $1 } END { print s }'
18646793380101

46k managed VM images, amounting to 18 TB. Storage is cheap; easy virtual deployments are priceless.
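For anyone who wants to check the arithmetic, the two numbers printed above convert as follows. The only inputs are the image count and byte total from the shell output; the variable names are ours.

```python
image_count = 46_374              # from the first command's output
total_bytes = 18_646_793_380_101  # from the second command's output

total_tb = total_bytes / 10**12   # decimal terabytes
total_tib = total_bytes / 2**40   # binary tebibytes
avg_mb = total_bytes / image_count / 10**6  # average image size in MB

print(f"{total_tb:.2f} TB ({total_tib:.2f} TiB), "
      f"about {avg_mb:.0f} MB per image on average")
```

That works out to roughly 18.65 TB in decimal units, with an average image of about 402 MB.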

0-click (or 1-click in the case of virtual birds) installations make Canary low friction to deploy, and that’s great because deployment is most of the battle in getting a product in use.

Encrypted DNS

“Not so fast!”, you say, “it’s one thing to boot the Canary, but what about allowing the Canary to connect to its cloud Console? Firewall rules take days to get authorised.” Similar products rely on HTTP, or MQTT, or Syslog to push data to their dashboards. They need firewall exceptions on your corporate firewall, or HTTP proxy credentials, or …

No, not Canary.

Again, we lean on our previous experience in offensive work. Tunnelling data in and out of networks is routine stuff for attackers. Defenders don’t want attack traffic to leave their networks, but attackers are incentivised to make it happen. So they get creative. We’ve spoken several times on tunnelling data out of networks.

DNS traffic is allowed out of networks without restriction in most contemporary networks. Like good hackers, we use that to our advantage. Canaries rely on DNS traffic for all their communications with their Console. We put in significant work to make the channel reliable and secure even though DNS is neither of those things natively (go away DNSSEC). All configurations and software updates are carried via DNS, and we’ve been doing this on all seven continents for years. Without our previous tunnelling work, we’d never have built this.
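The core trick in any DNS tunnel is squeezing arbitrary bytes into hostnames. The sketch below shows one common way to do that: Base32-encode each chunk, split it into labels of at most 63 characters, and prefix a sequence number so the receiver can reassemble out-of-order queries. It is a generic illustration, not Canary's wire format. The zone name, chunk size, and function names are assumptions, and the payload is assumed to be encrypted and authenticated before it reaches this layer.

```python
import base64

MAX_LABEL = 63   # maximum length of a single DNS label (RFC 1035)
MAX_NAME = 253   # maximum length of a full name in dotted text form


def encode_queries(payload: bytes, zone: str, chunk_size: int = 60) -> list[str]:
    """Split payload into query names shaped like <seq>.<data>[.<data>].<zone>."""
    names = []
    for seq, start in enumerate(range(0, len(payload), chunk_size)):
        chunk = payload[start:start + chunk_size]
        # Base32 is case-insensitive, so it survives resolvers that alter case.
        text = base64.b32encode(chunk).decode("ascii").rstrip("=").lower()
        labels = [text[i:i + MAX_LABEL] for i in range(0, len(text), MAX_LABEL)]
        name = ".".join([str(seq), *labels, zone])
        if len(name) > MAX_NAME:
            raise ValueError("chunk_size is too large for this zone")
        names.append(name)
    return names


def decode_queries(names: list[str], zone: str) -> bytes:
    """Reassemble the payload from query names received in any order."""
    chunks = {}
    for name in names:
        prefix = name[: -(len(zone) + 1)]   # strip ".<zone>"
        seq, *labels = prefix.split(".")
        text = "".join(labels).upper()
        text += "=" * (-len(text) % 8)      # restore the stripped padding
        chunks[int(seq)] = base64.b32decode(text)
    return b"".join(chunks[i] for i in sorted(chunks))


if __name__ == "__main__":
    zone = "tunnel.example.com"             # hypothetical zone
    queries = encode_queries(b"alert: someone touched the honeypot", zone)
    print(queries)
    print(decode_queries(list(reversed(queries)), zone))
```

A real channel also needs retries, acknowledgements, and a way to carry data back in responses, which is where most of the reliability work goes.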

The alternative route (poking firewall holes or proxies) is how everyone else does it… and it’s painful as a user.

Vulnerabilities. Vulnerabilities… everywhere

Two weeks ago at Black Hat 2024, Natalie Silvanovich made the point that vendors are the future of reducing zero days, not security research.

It’s becoming increasingly apparent that security research is not enough to end the era of zero days.
@natashenka

Picture: Decipher

Engineers don’t want to write vulnerable code.³ On the other hand, finding and exploiting vulnerabilities is an everyday activity for offensive security consultants; they find bugs everywhere. So what gives? I’m not going to delve into why bugs occur; we take their presence as axiomatic. Instead, hacking teaches us to be proactive, expect bugs, and build defensively.

Architectural choices matter when building defensively. Choose a memory-unsafe language, and you have to work very hard to prevent memory corruption vulnerabilities. Choose a less popular language, and you have fewer options for frameworks or modules. Choose to build everything yourself, and you will need to learn how to defend against many types of attacks (shout out to our SAML login flow which has to handle eight different exception types, and that’s with libraries). Use a multi-tenant architecture, and you’ll find yourself patching cross-tenant bugs.

Our primary goal is to build a great honeypot. For us, “Great” includes “secure” long before it includes “performant”.⁴ Canaries are not typical production servers; they don’t need to be able to handle thousands of concurrent requests quickly. They need to send one alert. So we can choose languages (specifically Python) that have large ecosystems, are quick to write in, and won’t expose memory corruption vulnerabilities to the network.

Recently we described in laborious detail why we separate customers into isolated VMs; this simply removes the possibility of cross-tenant bugs.

Finding tons of vulnerabilities in other people’s systems showed us that defensive design could go a long way in mitigating future bugs in our product.

Living off the land

Hackers know the feeling of landing on some remote box for the first time, and wondering “what is this thing, and what can it do?” It’s a special feeling, full of unknowns and anticipation. Sometimes you get lucky and your preferred tools are already on the compromised host (or you can upload them). Often though, that’s not possible or advisable. Instead, you need to utilise whatever you can find that’s already installed, aka living off the land (LOL). Hackers get good at living off the land; you’re more likely to fly under the radar if you use tools that are already present. A lighter dependency footprint helps get things done remotely, whether for products at customers or for rootkits.

As our Canaries live year-round at remote sites across the planet, everything they need has to be packed up, signed, and shipped to them over the encrypted DNS tunnel we mentioned earlier. We’ve become used to the idea of finding equivalent living off the land techniques with our network framework. When the framework, for example, didn’t natively support a particular cryptographic suite we needed, we saw we had another good quality implementation already on the Canary (Thanks djb!). With some LOL thinking, we hooked the framework to substitute in the alternative implementation and saved on a huge framework upgrade just for a new cryptographic suite.
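The general shape of that kind of hook is sketched below. Everything in it is a stand-in: `Framework` is a toy class rather than our network framework, and keyed BLAKE2b from Python's `hashlib` plays the role of "an implementation that is already on the box". It is not the suite or the library we actually swapped in. The point is only the pattern of wrapping a lookup so an unsupported name resolves to code you already ship.

```python
import hashlib
import hmac


class Framework:
    """Toy stand-in for a framework with a fixed list of supported suites."""

    def get_mac(self, name):
        if name == "hmac-sha256":
            return lambda key, msg: hmac.new(key, msg, hashlib.sha256).digest()
        raise LookupError(f"unsupported suite: {name}")


def keyed_blake2b(key: bytes, msg: bytes) -> bytes:
    """Stand-in for the good implementation that is already installed."""
    return hashlib.blake2b(msg, key=key, digest_size=32).digest()


# The hook: keep a reference to the original lookup, then wrap it so the
# missing suite resolves to the implementation we already have.
_original_get_mac = Framework.get_mac


def get_mac_with_fallback(self, name):
    if name == "blake2b-256":  # hypothetical suite name
        return keyed_blake2b
    return _original_get_mac(self, name)


Framework.get_mac = get_mac_with_fallback

if __name__ == "__main__":
    fw = Framework()
    print(fw.get_mac("blake2b-256")(b"key", b"message").hex())
```

Existing suites keep working through the original lookup, so the hook adds one capability without touching the rest of the framework.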

A hacker’s philosophy

The last influence I want to cover is simply one of persistence. As hackers we found there was almost always a way to achieve your goal, if you were persistent enough. For example, in trying to compromise an ATM, if remote attacks were proving difficult, picking locks to gain physical access worked (more than once!). That persistence carried over into our hardware design. One of the enclosure elements is a button with a built-in LED, so the button lights up. We asked our hardware partner for an RGB LED button so the Canary could display multiple status colours, but they could not source any and said we needed to be happy with a 2-colour LED button (which was available). A non-RGB solution sucked, and we were noodling on options.

Haroon sketched out a hardware hack on a napkin that required a custom “light pipe” plastic piece that sat as a bridge over two actual buttons, and directly under the light pipe was a standard RGB LED. It means manufacturing a custom light pipe, but that’s very doable and cheap. We got a button with a full RGB capability because we weren’t happy with the standard solution, and this design is still present in Canaries today.

Takeaways

We’re not done building Canary, and we don’t think it’s the best version of Canary we can build. Constant iteration is a given at Thinkst, and we’re still striving to be better. But without our hacking background, it surely would be a worse product.

The first time I ran afl-fuzz I was amazed at the TUI that lcamtuf had built. As TUIs went, it was remarkably different from the tools that came before (it’s extremely information dense but somehow still manageable). Security tools don’t have to have terrible interfaces (and they don’t even need to be GUIs).

Hacking requires peeling back layers and understanding a little more about the foundations than even the developers or designers. This cross-stack knowledge is a treasure trove for building better products. Hackers can build better products (but you have to have a super clear idea of who your user is, and deeply empathise with them).

Only call them Hacks if they’re from the Hacking region in France

Treating product design like a hacking challenge isn’t the same as “growth hacks” or other short-term tricks. Craft and experience are key parts of hacking, and when combined with a hacker’s dogged persistence… well, we think they are secret unlocks for great products too.

Footnotes

¹ As we were writing this, Phrack 71 was published. cts makes a related argument about Hackers building and running companies. The edition also features a prophile on the inimitable BSDaemon.
² I’ve left out our Docker birds, because they rely on a different mechanism to figure out their home Console.
³ Unless something very strange is happening.
⁴ When we started building Canary, “memory-safety” and “performance” were trade-offs against each other; that’s no longer true with languages like Rust.