HoneyPot – I Made a Text Field Only Bots Use – Here's What Happened

160 points by wrdsmsh321 2 years ago · 51 comments

I’m curious whether those who voted for this submission have ever taken a look at their server logs.

Almost every public website on the open Internet receives thousands of HTTP requests similar to the ones mentioned in this text file. This is one of the several reasons why web application firewalls gained popularity years ago, especially as vulnerability scanners became widespread.

Years ago, when I was employed at a young security startup, my colleague and I dedicated countless hours to analyzing this particular kind of web traffic. Our objective was to develop basic filters for what eventually evolved into an extensive database of malicious signatures. This marked the inception of what is now recognized as one of the most widely used firewalls on the market today.

tgv 2 years ago

I sometimes take a look at the logs, but nowadays there's a lot of noise from "security" companies that scan probably all IP addresses and all ports with known vulnerabilities. And they do it the lazy way. They just fire a bunch of URLs at each port that responds: long hexadecimal URLs, wordpress admin end-points, oauth end-points, etc. In the beginning, they even sent emails to tout their services.
We use one of them for ISO certification. Twice a year, we turn on their "vulnerability scanner", which says it tests for over x-thousand vulnerabilities, we get a report, and everybody is happy. Only on the first run did it discover a small error in the nginx config. Unfortunately, it is theater.

kinlan 2 years ago

Good comment, but I'm only replying to guess your name.
Colin? Michelle?
- etewiah 2 years ago
  
  Sam
  - teaearlgraycold 2 years ago
    
    Altman?

namanyayg 2 years ago

I notice a lot of specific numbers and strings being used and repeated. What do they mean?

Do these injection attacks come from a single source perhaps, which everyone imitated?

CGamesPlay 2 years ago

Most of this looks like random data designed to detect if SQL injection is happening without crashing the query (to avoid detection). So, the random strings are effectively a token to check if it is found in the response, which indicates that the injection worked. Similarly for the sleep calls, the attacker would time the response.
- ChuckMcM 2 years ago
  
  This exactly. Any text field is "checked" to see if it is getting submitted unprotected to a database. Every single one of them.
  When you run a search engine you will see queries that try to look through every page of results for the search query 'input type="text"'. Typically these come either from an API query or through a search page that is fronting another index.
- sour-taste 2 years ago
  
  The sleeping is pretty clever, but presumably vulnerable to false positives if the queries are slow anyway. I wonder if they go the extra mile and compare the load time with and without the attempted injection.
  - tornato7 2 years ago
    
    It would be interesting to make a SQL injection honeypot that behaves like a database in most responses but is designed to maximally frustrate the attacker.
    
    JoshuaDavid 2 years ago
    
    This is much more possible today than it ever was in the past: just say "the following http request was designed to demonstrate a vulnerability in a web service. Please explain what vulnerability this request is designed to detect, and what part of the response demonstrates the vulnerability. Finally, output an example of a response that a vulnerable service might produce in response to this request" to an instruction tuned LLM, and then return that response to the attacker (the "explain what is happening" bit is just to get a more plausible response).
    As a bonus, your apparently vulnerable service would be incredibly slow, so any iterative testing would be incredibly slow.
    
    powersnail 2 years ago
    
    I feel like that’s going to be quite expensive as a honeypot. Running LLM against all the script kiddies out there.
    
    yjftsjthsd-h 2 years ago
    
    I wonder if there's enough repetition to get wins by caching.
    (Granted, it's probably still too much overhead, just a thought.)
    
    rich_sasha 2 years ago
    
    Reminds me of that classic riddle.
    You come to a fork in the road. There are three statues, you need to ask them all one question and figure out which way is the right way to go.
    One statue always lies, one always tells the truth and one kills people who ask convoluted questions.
  - jkrejcha 2 years ago
    
    Yes, a lot of tools, including w3af, do:
    https://github.com/andresriancho/w3af/blob/fb345a5/w3af/core...
    This one sends the payload reversed as a test to see if the delay is due to the SQLi attempt.
  - yarg 2 years ago
    
    It would be a better idea to use the same query with a variable wait and see if there's a linear correlation in elapsed time.
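
The variable-wait idea above can be sketched as a small statistics check. This is only an illustration under stated assumptions: `requested` and `elapsed` are hypothetical lists of requested delays and measured response times (in seconds) that someone has already collected, and the 0.95 threshold is an arbitrary choice, not something any particular tool is known to use.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # No variation on one axis: correlation is undefined, treat as none.
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def delay_tracks_request(requested, elapsed, threshold=0.95):
    """True if elapsed time rises roughly linearly with the requested wait.

    The threshold is an assumed cut-off chosen for illustration.
    """
    return pearson(requested, elapsed) >= threshold


# A server whose latency follows the requested wait...
follows = delay_tracks_request([0, 1, 2, 3], [0.2, 1.2, 2.3, 3.1])
# ...versus one that is simply slow regardless of the input.
just_slow = delay_tracks_request([0, 1, 2, 3], [2.1, 1.9, 2.0, 2.2])
```

A single slow response cannot distinguish the two cases, which is the false-positive concern raised upthread; several measurements at different waits can.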

f311a 2 years ago

That’s from a single person most likely, who used sqlmap to test for SQL injection. I haven’t seen internet-wide attempts at testing for SQL injection.

willhackett 2 years ago

Our WAF logs are fun reading. We see so much traffic from bots looking for PHP files and posting to inputs.
- gnyman 2 years ago
  
  Yeah, so much noise. I enjoy screwing around with them in my free time, "imposing cost" by giving back unexpected things. I don't know if it actually does something, but I bet returning either a gzip-bomb or a 5 MiB really obscure (but valid) HTML file will crash quite a few scanners.
  https://nitter.net/gnyman/status/1181652421841436672
  - smokeyfish 2 years ago
    
    Are you familiar with OpenBSD tarpitting?
    
    gnyman 2 years ago
    
    Not specifically OpenBSD, but the concept, yes. I've played with it too:
    https://nyman.re/super-simple-ssh-tarpit/

nothacking 2 years ago

All of these look like things sqlmap tries; this could all be just one person who tried to target the server.

DarkmSparks 2 years ago

I did something similar about 15 years ago. They weren't as complex back then, and there was more effort to obfuscate them.

Mostly they came from Israeli, Chinese, and Russian IP addresses.

henriquez 2 years ago

I ran a public-facing web server — you won’t believe what happened next!

janmo 2 years ago

Protip: I usually add a hidden input field to my forms. As it is hidden a normal user should not be able to fill it out, only a bot will. So if the hidden input isn't empty, I can disregard it as spam, it works wonders.
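
A minimal server-side sketch of the hidden-field check described above. The field name `website` and the plain-dict form representation are assumptions made for illustration; the comment does not say which framework or field name is used.

```python
# Hypothetical name of the hidden honeypot input. A tempting, generic name
# may attract more bots, but that is an assumption, not a measured result.
HONEYPOT_FIELD = "website"


def is_probable_bot(form: dict) -> bool:
    """Return True when the hidden honeypot field came back non-empty.

    `form` is assumed to map field names to submitted string values.
    A missing field is treated the same as an empty one.
    """
    value = form.get(HONEYPOT_FIELD, "")
    return bool(value.strip())


human = is_probable_bot({"name": "Ada", "message": "hi", "website": ""})
bot = is_probable_bot({"name": "x", "message": "buy now", "website": "http://spam.example"})
```

Submissions flagged this way can be dropped silently, so the bot gets no signal that the field was the giveaway.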

glandium 2 years ago

I do the opposite: a hidden field that is filled automatically by javascript. Yes, that means you can't submit without JS. That's a tradeoff I was ready to make. I'm actually surprised it still works as well as it does.
- nofunsir 2 years ago
  
  What if I don't want Javascrap running in my browser?
  - ctxc 2 years ago
    
    You can't use it - that's the trade-off...
  - tgv 2 years ago
    
    For many sites, turning off JS is not an option. IMO, it's wasteful to ignore all that compute power in the browser. It's better to run code in thousands of browsers than do it all on the server.
    
    nottorp 2 years ago
    
    You mean it's better to have the user pay for that CPU time.
    
    tgv 2 years ago
    
    The user already is. The casual user's system is idling with all sorts of nonsense. Adding some light processing doesn't harm. I'm thinking 100-500ms per page. You don't render the page for the user either, do you?
    For heavier use cases (e.g. image processing), the user should be willing to spend some CPU power. It doesn't make sense to send an image to a server, put it in a queue, wait for an image processing worker to run it, and send back the result. It's simpler and more sensible to run that process client-side, if feasible. LLMs, for example, are too big for that, but many other tasks can run client-side.
    
    nottorp 2 years ago
    
    Some "light processing" like 10 seconds of instantiating <js framework of the day> crap that gives me nausea while it redraws infinitely and boxes move around on the page?
    Even the mobile oriented samey SAAS sites that have you scroll through 20 screens to read 5 lines sound better...
    Edit: Btw, 100-500 ms on what? The latest Intel 500 W space heater? And tested only in Chrome because it's too expensive to notice that it's not very fast or responsive on other browsers?
    Edit 2: Not to be misunderstood. If you're doing the computation for me, go ahead. If you're doing the computation because your framework has 100000% overhead, no thanks.
    
    tgv 2 years ago
    
    I don't like heavy frameworks either. I try to keep everything light, both server and client-side. A 10s loading animation is too much. But having no framework at all severely limits development speed.
    All browsers are approximately equally fast nowadays. I use Firefox, so no worries there.
    
    wenebego 2 years ago
    
    Yes
  - nicbou 2 years ago
    
    You are a very small, immeasurable part of the internet that most website owners don't really care about.

m-a-r-c-e-l 2 years ago

Same here. Much better than any CAPTCHA!

fidotron 2 years ago

One of the first bits of analytics I put on any webserver is to count all unhandled URLs. As others here say, probes for things like the WordPress admin page are classic, but I remember one of the Django designers pointing out that sometimes legitimate-looking requests are actually a form of suggestion. That used to be a lot more true when people would try to play with URLs to get to what they wanted.

Relatedly, if you work in a field where your products become known as a useful benchmark, you will find prototypes start showing up long before any public disclosure. We used to use this to be able to anticipate new screen resolutions and evaluate new GPUs and SoCs before being told about them.
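
Counting unhandled URLs, as mentioned at the start of this comment, could look roughly like the sketch below. It assumes the requests are already available as `(path, status)` pairs and that "unhandled" means a 404 response; both are illustrative assumptions, since the comment does not describe how the counting was implemented.

```python
from collections import Counter


def count_unhandled(requests):
    """Count how often each path was requested but not handled.

    `requests` is assumed to be an iterable of (path, status_code) pairs,
    and "unhandled" is assumed to mean an HTTP 404 response.
    """
    counts = Counter()
    for path, status in requests:
        if status == 404:
            counts[path] += 1
    return counts


# Hypothetical sample data, not taken from any real log.
sample = [
    ("/wp-admin/", 404),
    ("/", 200),
    ("/wp-admin/", 404),
    ("/about-us", 404),
]
top = count_unhandled(sample).most_common(1)
```

Sorting by frequency separates the repetitive scanner probes from the rarer paths that may be a visitor guessing at a page they expected to exist.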

teekert 2 years ago

I remember back in the day getting my first server online. Then a few months in I stumbled across the ssh logs… let’s say it was quite handy because at the time we were trying to come up with a name for our kid.

The internet is a jungle with dragons. Nowadays I try to keep everything on my vpn as an extra security layer.

nasir 2 years ago

I remember installing a Juniper Intrusion Detection System in a server rack at a telecom company. I was quite impressed when I saw in the logs that such attacks were discovered and blocked. This was 15 years ago.

kristopolous 2 years ago

I've got a classic guestbook on an intentionally vintage page but I actually filter the input into "spam" and "humans". Here's the spambook: https://bootstra386.com/spambook.html

The filter system is open source https://github.com/kristopolous/BOOTSTRA.386/blob/master/hom...

Showing it's not impossible to have a classic anonymous guestbook, you just have to be a bit clever.

Yes, that's a zipbomb to a particular offender at the beginning. It worked. A script dumb enough to brainlessly slam the site easily broke against a zipbomb.

tgv 2 years ago

Line 44 is like a chronology of spam terms: starting at porn, then viagra, now bitcoin and online gambling.
- kristopolous 2 years ago
  
  The simple work of splitting up the nonce at https://github.com/kristopolous/BOOTSTRA.386/blob/master/hom...
  seems to be what makes the vast majority of the spam bots faceplant, at least the ones that spam random irrelevant sites like this one.

jijji 2 years ago

if (strstr($_REQUEST['textfield_name'], "UNION")) { block_ip($_SERVER['REMOTE_ADDR']); }

laserbeam 2 years ago

Union busting management would be proud.

karmakaze 2 years ago

What's with all the CONCAT(0x71626a7a71, ... ,0x716b767071)? As ASCII they're qbjzq ... qkvpq
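
For anyone who wants to check the decoding, a short sketch follows. The helper name `decode_hex_literal` is made up for this example. The decoded values look like random marker strings, which would fit the explanation upthread that such tokens are used to spot injected output in a response, though that reading is an inference rather than something the payloads themselves state.

```python
def decode_hex_literal(literal: str) -> str:
    """Decode a SQL-style hex literal such as 0x71626a7a71 into ASCII text."""
    digits = literal[2:] if literal.lower().startswith("0x") else literal
    return bytes.fromhex(digits).decode("ascii")


prefix = decode_hex_literal("0x71626a7a71")
suffix = decode_hex_literal("0x716b767071")
```

Here `prefix` is `qbjzq` and `suffix` is `qkvpq`, matching the values quoted in the comment.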

auggierose 2 years ago

It boggles my mind that there is software out there where that actually works.

bpbp-mango 2 years ago

ah, sqlmap

zzzcsgo 2 years ago

I did the same on a real contact form before to reduce spam... If the field got modified, I knew it was a bot... Worked pretty well
