Show HN: SQRL – Anti-spam rules language Twitter acquired in 2018, open-sourced
https://websqrl.vercel.app

Hey all, author here! This is a demo of the rules engine we built to fight spam/abuse on the internet. It was built on learnings from Facebook & Google, with the goal of making a language that lets anti-spam analysts quickly (and safely) deploy rules to production.
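As a rough illustration of what a rules engine like this does (not SQRL's actual API; all names here are hypothetical), the core idea is a set of named, analyst-authored predicates evaluated against every incoming event:

```python
# Hypothetical sketch, NOT the SQRL implementation: a tiny rule engine
# where analysts register named predicates that are evaluated per event.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class RuleEngine:
    rules: dict = field(default_factory=dict)

    def rule(self, name: str):
        """Decorator registering a named rule predicate."""
        def register(fn: Callable):
            self.rules[name] = fn
            return fn
        return register

    def evaluate(self, event: dict):
        """Return the names of all rules that fire for this event."""
        return [name for name, fn in self.rules.items() if fn(event)]

engine = RuleEngine()

@engine.rule("TooManyLinks")
def too_many_links(event: dict) -> bool:
    # Toy heuristic: three or more links in one message looks spammy.
    return event.get("text", "").count("http") >= 3

fired = engine.evaluate({"text": "buy now http://a http://b http://c"})
# fired == ["TooManyLinks"]
```

The point of the real language is that analysts write only the predicate part, and deployment/safety is handled for them.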
Unfortunately it looks like the Twitter event feed is temporarily down (they're blocking it, possibly as part of shutting down the API on Thursday). I have a cache of events from a little earlier that I'm going to try to play through the stream. I've tried to release this a couple of times in the past (you'll notice the git repo has a four-year history), and I'm happy it's finally out the door. If anyone is interested in using this, or wants to talk about the strategies, I'm happy to help in any way I can. Proud of what we built at Smyte, and hoping it can find another life outside of Twitter </3. I know there are already a couple of implementations based on SQRL, one at Discord and another at Sumatra.ai[1]

Thanks for the Sumatra shout-out, Josh. We definitely got a lot of ideas from SQRL. Congrats on the OS release.

We used to use Smyte at my place of work prior to the Twitter acquisition. Very cool stuff. Ended up rewriting it in house. We ended up with a SQRL variant that is a restricted, type-checked subset of Python. It is not Python, however, just syntactically similar.

That's super cool. I worked on Smyte and have wanted to replace the language with something very similar to what you're describing. I'd love to learn more about it.

I open-sourced a very minimal proof of concept for this here: https://github.com/jhgg/hyrule which shows the syntax.

Not to be confused with SQRL (Secure Quick Reliable Login) [0]

Please don't use unrelated posts to help peddle snake oil.
https://attrition.org/errata/charlatan/steve_gibson/

Disgusting comment. Steve Gibson is a true and independent professional with hundreds upon hundreds of hours of great and accurate video podcast content. Even when he makes a tiny mistake he comes back the next week and thoroughly corrects himself. That post is ancient and largely invalid. On top of that, SQRL is in the public domain with a published spec/algorithm. Stop punching the old man. There's nothing snake oil about SQRL.

Can you back up your claims? Literally none of those links mention SQRL.

They don't need to. He has been thoroughly and completely debunked for decades. Legitimate security researchers pay no attention to him, and neither should you.

Would you talk to someone like this in real life? If you jumped into a conversation being this needlessly on the attack against someone who just posted a link to a project with the same name, I can't imagine wanting to converse with you at all. How wrong or right you are about Steve Gibson is irrelevant. The fact that after a couple of replies you can't point to why this snake oil is snake oil is just the icing on the cake.

None of those links do that either.

Link to Github: https://github.com/sqrl-lang/sqrl

It says "Compiled! Running", and shows some code in their language, but the other half of the screen is just black for me. I assume from the text on the page that it is supposed to be showing tweets and some kind of spam rule classification. The Wikipedia Recent Changes demo https://websqrl.vercel.app/wikipedia however did show one element on the right side, with an article title, IPv6 address, timestamp, a piece of quoted text, and "Rules fired". The rule was "FirstEventSeen". Looking at the code shown for the Wikipedia example, the demo rules are:

- Simple rule to make sure at least one event shows up in the UI
- Flag any users using profanity (not a great spam rule!
but easy)

So I suppose that with a bit more time some events might show up matching the second rule as well. Meanwhile though, https://www.mediawiki.org/wiki/API:Recent_changes_stream links to https://codepen.io/ottomata/pen/VKNyEw/ which has events flying across the screen in the hundreds. It would be neat with a demo of SQRL also having many events fly by like that, and perhaps with a pause button in case one wanted to stop it and have a look at some of the events. Perhaps the Twitter demo works like that when it works. I see also that the SQRL code in the Twitter demo has a rule that is meant to ensure that at least one tweet shows up. Definitely either something is currently broken, or it is connecting directly to Twitter from the browser, and Twitter is not letting my browser get any data from their API?

Ah! Yeah, extremely poor timing. The Twitter API has stopped letting me fetch the events. Going to try to see if I can fix it quickly. In the meantime the Wikipedia demo should be working, although it is far less interesting (much less data, so not much spam popping up).

> Ah! Yeah extremely poor timing. The Twitter API has stopped letting me fetch the events

I thought that was a "known issue," in pursuit of repaying a $44B loan

All I want is this:

Literally. You forgot the "if there are more than 8 digits in username, purge immediately"

But https://twitter.com/greg16676935420 is quite funny!

Ever try calling that number?

I just did, haha; seems like it's a wireless provider.

This may work for individuals, but if Twitter were to implement this, it would work for MAYBE 15 minutes before the bots get around it.

I'm ok with that. The next set of bots will reveal themselves, and then we whack-a-mole on them too.
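The two Wikipedia-demo rules described earlier (fire once on the first event so the UI shows something, plus a crude profanity flag) could be sketched, purely as an illustration, like this (word list and field names invented, not the demo's):

```python
# Hypothetical Python sketch of the two Wikipedia-demo rules: a
# first-event rule and a naive profanity flag. Toy word list and
# made-up event fields, for illustration only.
PROFANITY = {"damn", "hell"}  # toy list, not the demo's

seen_any_event = False

def first_event_seen(event: dict) -> bool:
    """Fires exactly once, on the first event processed."""
    global seen_any_event
    if not seen_any_event:
        seen_any_event = True
        return True
    return False

def uses_profanity(event: dict) -> bool:
    words = event.get("comment", "").lower().split()
    return any(w in PROFANITY for w in words)

def rules_fired(event: dict):
    fired = []
    if first_event_seen(event):
        fired.append("FirstEventSeen")
    if uses_profanity(event):
        fired.append("UsesProfanity")
    return fired
```

With the low volume of the Wikipedia stream, the first-event rule is what guarantees the demo UI isn't empty while you wait for a profanity hit.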
This is just a thought experiment, but instead of a "no obscurity open rules engine" for blocking spam, which can be used to help create ever more creative $pAm, how about a "no obscurity open rules engine" that all acceptable posts must meet, a sort of Turing test? Then the stuff that was allowed through would at least be readable.

Remember that post a month back (and of course all the other recently discussed ones) on stylometry, using writing style to identify authorship? Would it work to require posters to expose a unique "voice" in their posts, where getting your style banned would definitely be something you'd want to avoid?

This is a thought experiment to talk about. Yes, there are obvious criticisms, but would it also produce something "we can have nice things" useful?

Obvious criticisms are sort of like "what about the children!?" concern trolls (this is meant to be hyperbolic and biased): "what about PopulationX, the half-literate know-nothing n00bs who are barely surviving under repressive regimes? They may all sound alike, but we can't silence their voices!" Which I agree with; we don't want to silence their voices. But again, this is just a thought experiment; I bet all the super high quality, well informed, thoughtful posters would in fact sail through the system. So the question is, does PopulationX actually all style-sound alike? (Where PopulationX is the concern you raise, not the fake one I just made up.) Maybe they are just as distinctive and the system would actually work for them too.

And the criticism "this is not going to work because it's just spam filters with more steps and a minus sign" is not an adequate criticism because (a) spam filters do exist, and are deployed, and do take care of a bunch of spam,
and (b) yes, what I'm trial-ballooning here is similar; it would just distribute some of the work of maintaining the tasty spam rules to good posters, who might get some sort of "you can't post hot grits here, naked or petrified" warning more frequently than they currently do, hopefully in some way that would be useful for the community at large.

> would it work to require posters to expose a unique "voice" in their posts, and where banning of your style would definitely be something you'd want to avoid.

"Hey, GPT! Can you please take the following ad for eyeglasses and rewrite it in the style of fsckboy so we can assign any blame in the spam filters to his account?"

"Hey, GPT! You helped some spammer get the unique way I talk banned by spam filters; can you rewrite everything I say from now on to sound like a British chef?"

You're saying AI can never win in a game-theoretic or adversarial situation because AI can always nullify AI. But if AI is better than the alternatives to AI, you must use AI, because it's the only thing that can break even. And since AI is so good (even if only evenly matched against anti-AI), there is no point in people participating on HN. I don't think that's going to happen, but if it will, it hasn't yet. But please do me the favor of responding only in my style so I'll know it's really you.

Don't open this in any public setting; there are some extremely NSFW images in the stream. If you're just replaying events, maybe select a few tamer examples to replay...

Sorry, based on some other feedback I was just hacking away on turning off the profanity filter. Didn't think about the profile images (which we're not scanning, but which are typically associated with them). The version I'm pushing up right now has a

On Android Chrome mobile I only see the text explanation on the left side of the screen; the right is completely empty.

This was temporary due to the Twitter API being turned off; it should be working now!
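The stylometry idea being debated above has a well-known naive baseline: compare authors by the cosine similarity of their character-trigram frequency profiles. A toy sketch (threshold and function names invented; this is not what any platform actually runs):

```python
# Hypothetical stylometry baseline: character-trigram profiles compared
# by cosine similarity. Purely illustrative; real authorship attribution
# uses far richer features and classifiers.
from collections import Counter
from math import sqrt

def trigram_profile(text: str) -> Counter:
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine(a: Counter, b: Counter) -> float:
    shared = set(a) & set(b)
    dot = sum(a[g] * b[g] for g in shared)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def same_voice(known_text: str, candidate_text: str,
               threshold: float = 0.5) -> bool:
    """Crude check: does the candidate 'sound like' the known author?"""
    return cosine(trigram_profile(known_text),
                  trigram_profile(candidate_text)) >= threshold
```

The GPT-rewriting attacks described above are exactly why a baseline like this fails adversarially: the feature (surface style) is cheap for an attacker to imitate or shed.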
(With slightly older data)

I built something similar internally; we just use Python expressions directly.

That's a really neat demo!

Super neat demo! Nomad is really what Docker should become since day 1.

I wonder, is Twitter even using this any more? I report spam a lot, and it's common to find 50+ identical tweets being pumped out at a rate of 10/minute, or tweets that include variation but link to the same piece of media or URL hundreds or even thousands of times. Given the quite fine-grained control in the filter stream API, it's very easy to get streams of spam, so I wonder what their excuse is for letting it proliferate.

Allowing spammers to run wild artificially boosts activity statistics.

But they make the ad performance look worse, because they're active users who don't click ads. I seem to recall that this was supposed to be a reason Elon didn't want Twitter.

At Twitter scale, arbitrary hard-coded rules are almost always useless; they make life more difficult for authentic posters, while nefarious actors can work around them with varying degrees of triviality.

Honestly, our findings from working at scale (Facebook, Google, Twitter, Discord, Reddit, ...) were actually the opposite. With hand-written (not arbitrary) rules it's easier to understand the intent of the attacker and build a system that they can't work around, because we're blocking them at their source of income. Sure, they can figure out how to post messages, but unless they can include their link/payload/etc. it's not worth their time. Machine learning defences are definitely a part of what we did, but they're slower to respond to attacks and generally easier to work around.

As someone who has personally battled such adversaries, I call bullshit on that. People with a financial incentive to spam in a user discussion environment are able to change pretty much every letter of their message if necessary.
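The author's point about "blocking them at their source of income" can be sketched, hypothetically, as a counter keyed by the normalized payload link rather than the message text (thresholds, normalization, and names are made up for illustration):

```python
# Hypothetical sketch of "block the payload, not the wording": count
# distinct accounts posting each (normalized) link, and flag the link
# itself once it crosses a threshold. Invented names and thresholds.
from collections import defaultdict
from urllib.parse import urlparse

def normalize(url: str) -> str:
    p = urlparse(url)
    return (p.hostname or "").lower() + p.path.rstrip("/")

class PayloadCounter:
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.accounts_by_link = defaultdict(set)

    def observe(self, account_id: str, url: str) -> bool:
        """Record a posting; return True if the link is now flagged."""
        key = normalize(url)
        self.accounts_by_link[key].add(account_id)
        return len(self.accounts_by_link[key]) >= self.threshold

pc = PayloadCounter(threshold=3)
pc.observe("a1", "https://scam.example/offer")
pc.observe("a2", "http://SCAM.example/offer/")
flagged = pc.observe("a3", "https://scam.example/offer")
# flagged: three distinct accounts pushed the same payload, even though
# the surrounding message text can vary freely.
```

This is why changing every letter of the message doesn't help the spammer: the invariant being counted is the thing they are paid to deliver.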
> arbitrary hard-coded rules are almost always useless

I disagree; I just pointed out how it's not hard to get pure spam by using the filtered stream rules. If I can reliably identify & filter for spam on my creaking desktop with limited compute power and technical/coding skills, I would be happy to operate a silicon backhoe for a modest fee.

I'm talking specifically about systems the size of Twitter. Arbitrary hard-coded rules are absolutely useful for smaller systems.

I run a smaller system, and such rules are useful and effective.

These are Twitter's filtered stream rules. They're accessible via the API to select from the global feed in real time. I don't have access to the firehose, of course, but my understanding is that it's an outgrowth of their internal systems. They have their own query language to filter tweet and user parameters, semantic entity recognition, URLs, etc. https://developer.twitter.com/en/docs/twitter-api/tweets/fil...
    if new follower has <=4 tweets.total and all(tweets) === type=image
    block
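The pseudocode rule above (purge a new follower with at most 4 tweets, all of them images) could be rendered, purely as an illustration, in plain Python (field names invented):

```python
# Hypothetical Python rendering of the commenter's pseudocode: block a
# new follower whose account has at most 4 tweets, all of which are
# images. Event shape is invented for illustration.
def should_block(new_follower: dict) -> bool:
    tweets = new_follower.get("tweets", [])
    return (len(tweets) <= 4
            and all(t.get("type") == "image" for t in tweets))

bot = {"tweets": [{"type": "image"}, {"type": "image"}]}
human = {"tweets": [{"type": "text"}, {"type": "image"}]}
# should_block(bot) -> True, should_block(human) -> False
```

Note one sharp edge: an account with zero tweets also matches, since `all()` over an empty list is true; a real rule would want to decide that case explicitly.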
    LET ProfanityFilterEnabled := false;

that you'll need to tweak to turn it on.