AI Startup Says It Has Defeated Captchas

73 points by Larx-3 13 years ago · 81 comments

From the article:

    "Captcha" stands for "completely automated
    Turing test to tell computers and humans apart."

No it doesn't - where's the word that starts with "P"?

According to Wikipedia, "CAPTCHA" is an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart" (and is apparently a trademark of Carnegie Mellon University, which I didn't know). Even that's a really bad backronym.

Google's reCAPTCHA page[0] says the same thing, and even attributes it:

    The term CAPTCHA (for Completely Automated Public Turing Test
    To Tell Computers and Humans Apart) was coined in 2000 by Luis
    von Ahn, Manuel Blum, Nicholas Hopper and John Langford of
    Carnegie Mellon University.

That should then be CAPTTTTCAHA. Doesn't quite have the same ring to it.

Given the term CAPTCHA it seems that a "better" (by some definition) backronym would be:

    Completely Automated Process to Tell Computers and Humans Apart.

At least with that one only the particles "to" and "and" are left out of the abbreviation.

More on topic though, I long for the day when off-the-shelf systems that even script-kiddies can use are able to break CAPTCHAs with ease - that's when they'll finally die. TicketMaster gave up on using the standard CAPTCHAs earlier this year[1], and I'd love to see something better replace it.

[0] http://www.google.com/recaptcha/captcha

[1] http://www.bbc.co.uk/news/technology-21260007

mcherm 13 years ago

> I'd love to see something better replace it.
So long as the "better" solution is something OTHER than "Please log in to your Facebook account."

benhamner 13 years ago

This is cool, but there's no indication from the article that it's novel, or that it's better than existing methods.

The article linked to 28 other different systems that have claimed to beat / demonstrated beating captchas at some point: http://www.karlgroves.com/2013/02/09/list-of-resources-break....

Without a performance comparison to existing methodologies on a benchmark dataset and precise details on the models, this is a neat marketing demo and nothing more.

gkoberger 13 years ago

They literally have a better hit rate than I do when it comes to captchas.

girvo 13 years ago

My first thought was that I'd love to run this myself. Not to spam, but to be able to skip Captchas.
That's a really sad thought.
- cclogg 13 years ago
  
  Yeah totally. I swear there's been captchas that were unsolvable (to my eye) lol.
lstamour 13 years ago

Yep. My first thought was, what, not 100%? Then I realized that I missed 3 out of 5 myself the other day.
- mathgladiator 13 years ago
  
  Perhaps the next level is to realize that getting 3/5 means you are human. ;)
  - zmk_ 13 years ago
    
    Maybe that's what Google switched to now. Since their captchas are supposed to be easier to read now.

bjterry 13 years ago

> or let you know how many calories you’re about to eat by looking at your lunch.

I have previously told some people that this is the holy grail of dieting apps. The difference in ease of use between entering all of the items in your meal, one by one, and just taking a photo, would be a game changer. Of course, this is just a throwaway example in the article so they probably haven't done any of the work that would be required to make this a reality (aside from the vision processing, of course). I think it could be done with thousands of human raters estimating for you instead of a machine learning system, but I was skeptical of whether it could be profitable enough to justify it as a startup on a risk/reward basis. One day, maybe we'll see it though.

sjtrny 13 years ago

The hard part about this problem is getting the scale and thus volume of the objects right. How big is that bowl holding your cereal?
- millstone 13 years ago
  
  And is that Coke or Diet Coke?
  - vidarh 13 years ago
    
    And have you "hidden" a dash of syrup or 100g of nuts under the milk?
    But it'd be useful for rough estimates for things like restaurant meals, with the aid of a menu.
- rtpg 13 years ago
  
  panorama-like shot + obligatory 'token' in the shot should solve that, right?
- waps 13 years ago
  
  Do the same as humans do : it's 25cl ... point. You won't be far off the mark.
  - vidarh 13 years ago
    
    You probably will be far off the mark. I realized once I started tracking my food seriously that 1) those glasses I have that I thought were substantially bigger than those other glasses? They held the exact same amount; 2) the amount of calories I consume could easily vary by 30% or more depending on how full I'd fill my various glasses, plates or food containers.
    
    waps 13 years ago
    
    I don't think getting a 30% margin of error would be all that disastrous. How big is the error margin on people measuring ?
    
    ehsanu1 13 years ago
    
    A 30% margin of error would seem to make calorie counting practically useless.
    
    waps 13 years ago
    
    I seriously doubt anyone doing calorie tracking without scales achieves less than a 30% error rate.
MysticFear 13 years ago

Meal Snap already tries doing this:
https://itunes.apple.com/us/app/meal-snap-calorie-counting/i...
- bjterry 13 years ago
  
  Wow, that's crazy. I actually USED DailyBurn at one point and I didn't even know this existed. Their strategy of separating everything into a bunch of superfluous apps seems to have failed pretty hard in this instance (they have a barcode reading calorie tracking app separate from their food entering calorie tracking app with neither as a superset of functionality, even though these should clearly be a single app if you want a good user experience).
  Apparently the quality of the estimates is very poor for the app. The problem is that DailyBurn LITERALLY uses Mechanical Turk[1], which I don't think is what you would need to do to get accurate estimates. You would have to actually have in-house talent with training and feedback on estimating the size and calories of food. I'm pretty good at estimating the calorie content of common foods just by eyeballing because I've done it for a long time and know generally what goes into them, but this is obviously an acquired skill.
  1: http://dashdingo.org/post/4391031302/how-mealsnap-works

ChrisNorstrom 13 years ago

Computers can simulate everything from visual character recognition to mouse movements and everything else we do online.

My prediction is that Google will one day enter the "Bot Recognition Market". They've got so much data on everyone and their browsing habits from Gmail, to Adsense, to Search results, to Google Analytics. Their cookies, browser, javascript, and ads follow you around all over the internet. They're the only company capable of putting all that data together and returning via API: "This user is a real person, we've analyzed 3 years of data from them, go ahead and let them sign up"

Or returning: "User is a bot, their IP has no purchases, no google account, no search requests, no adsense impressions, etc..."

babby 13 years ago

The part about Google offering a sort of vetting service sounds quite plausible, in an "Oh, shit" kind of way. Those of us who try to be anonymous could face a further encumbrance from not being "Google Verified™".

TomGullen 13 years ago

CAPTCHAs did nothing to prevent spam for us.

However, simply preventing users with fewer than 3 posts from posting hyperlinks got rid of 99% of spam.

We initially allowed plain text links but spammers seemed happy to just post them.

Spammers want to post hyperlinks, without that ability they can't do anything.
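
A minimal sketch of the rule described above, assuming a forum that knows each author's post count. The function name `may_post`, the constant `MIN_POSTS_FOR_LINKS`, and the URL pattern are hypothetical, and the pattern is deliberately rough: it also catches plain-text links, since the comment reports that those had to be blocked too.

```python
import re

# Threshold taken from the comment above: fewer than 3 posts means no links.
MIN_POSTS_FOR_LINKS = 3

# Rough, hypothetical pattern for anything that looks like a URL,
# whether or not the forum would render it as a clickable link.
URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


def may_post(body: str, author_post_count: int) -> bool:
    """Return False when a new author tries to include any kind of link."""
    if author_post_count >= MIN_POSTS_FOR_LINKS:
        return True
    return URL_PATTERN.search(body) is None


if __name__ == "__main__":
    print(may_post("Nice shot, what lens was that?", 0))
    print(may_post("Cheap watches at http://spam.example", 0))
```

A real forum would probably want a stricter detector (bare domains, obfuscated links), but the gate itself stays this simple.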

borplk 13 years ago

Just an idea:
Make links remain as plain text (so they are plain text as far as google is concerned)
then use javascript to turn them into clickable links when the user's mouse is over or near the link, so the user can still click on them.
- TomGullen 13 years ago
  
  Spammers still relentlessly spammed us when we turned them into plain text links. Removing the ability to post plain text links was the only thing that stopped them.
  No idea why they seemed happy to post plaintext links but there you go!
prawn 13 years ago

I do something similar which has significantly cut down on my spam too.

muglug 13 years ago

Here's a video demonstration: https://vimeo.com/77431982.

Apart from one blip (scepticism vs sccpticism) it outperforms me.

ChrisNorstrom 13 years ago

As far as I know bots, even advanced ones, have an extremely hard time with dotted fonts. Just change the Captchas to use some really dotted or unique fonts where the letters are made up of smaller elements that don't connect. You can extend the life of Captchas for a few more years.

http://fontspace.com/malwin-b%C3%A9la-h%C3%BCrkey/merkur

http://fontspace.com/honey-and-death/dotline/8617.charmap

http://fontspace.com/bythebutterfly/bubble-bath

http://fontspace.com/jecko-development/jd-lcd-rounded

Eventually, the spammers will make a bot to analyze the distance between dots, group them into letters, and the race will be on to use other methods. I see this as a never ending virus/immunity battle. We're pretty much at the end of Captchas. Other methods like mouse movement, surfing time, scrolling, etc... can all be mimicked as well. Computers can or will be able to simulate humans very well, even our imperfections.

turing 13 years ago

You might find these examples from LeNet interesting. They are examples of unusual styles of digits that the system correctly recognized, made of dashed lines, bubbles, and dots. Granted this system only recognized digits, but it's not exactly a stretch to jump to the character set typical of Captchas.
http://yann.lecun.com/exdb/lenet/weirdos.html
waps 13 years ago

> Eventually, the spammers will make a bot to analyze the distance between dots, group them into letters, and the race will be on to use other methods. I see this as a never ending virus/immunity battle. We're pretty much at the end of Captchas. Other methods like mouse movement, surfing time, scrolling, etc... can all be mimicked as well. Computers can or will be able to simulate humans very well, even our imperfections.
Love your optimism. I'd say that captchas are very limited: they have to be solvable by idiot humans. Captcha-solving algorithms have no such limit to abide by. Since the anti-spam side of things is capped at a certain point in the arms race, the other side is bound to win.
Why not just require, say, a Google or Facebook login and transfer the "eliminate spammers" problem onto them?

thrush 13 years ago

This was literally a homework assignment in a security course I took during undergrad, so excuse me if I'm not impressed. Also, from the comments it's clear that the benefits of Captchas are being overlooked. Google purchased reCAPTCHA some time ago and uses it to solve difficult OCR problems with human input. In reCAPTCHA you'll notice that there are always two words, and one tends to be easier than the other. The easier one has a known answer and is used for security; the other is unknown, and Google uses people's guesses to find its true value. Even if this startup were to consistently solve the hardest Captchas on the internet, it would actually be a good thing, because we would have better OCR. Realistically this won't happen, and reCAPTCHA will just use harder images all the time if it needs to.
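
The two-word scheme described above can be sketched roughly as follows. This is only an illustration of the idea in the comment, not reCAPTCHA's actual implementation: the class name `TwoWordChallenge` and the thresholds `MIN_VOTES` and `MIN_AGREEMENT` are hypothetical.

```python
from collections import Counter
from typing import Optional

MIN_VOTES = 3        # hypothetical: votes needed before trusting a consensus
MIN_AGREEMENT = 0.6  # hypothetical: share of votes the top guess must reach


class TwoWordChallenge:
    """One known control word plus one unknown word to be transcribed."""

    def __init__(self, control_answer: str) -> None:
        self.control_answer = control_answer.strip().lower()
        self.votes: Counter = Counter()

    def submit(self, control_guess: str, unknown_guess: str) -> bool:
        """Pass or fail on the control word; count the other guess only on a pass."""
        passed = control_guess.strip().lower() == self.control_answer
        if passed:
            self.votes[unknown_guess.strip().lower()] += 1
        return passed

    def consensus(self) -> Optional[str]:
        """Return the agreed transcription of the unknown word, if any yet."""
        total = sum(self.votes.values())
        if total < MIN_VOTES:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count / total >= MIN_AGREEMENT else None
```

Only users who get the control word right contribute a vote, which is what lets the unknown word be trusted once enough of them agree.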

lucisferre 13 years ago

Good, Captchas should have stopped being a spam prevention measure years ago.

olalonde 13 years ago

Do you have an alternative to suggest?
- klearvue 13 years ago
  
  We use honeypot field and timegate trap on our site forms - so far very successfully.
  - bbrizzi 13 years ago
    
    I know what a honeypot is (a fake field that only bots fill out), but what's a timegate trap? Googling didn't bring up anything relevant.
    Does it check how long it took the user/bot to fill out the form and dismiss it if it was too fast?
    
    borplk 13 years ago
    
    Yes, as far as I know you got it right: you somehow keep track of when the form was presented to the user and compare that with the current time when a response comes in.
  - olalonde 13 years ago
    
    While this may defeat naive bots, it certainly won't defeat bots that target your site specifically, so I wouldn't say it qualifies as an alternative to CAPTCHAs.
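
The honeypot and timegate checks discussed in this sub-thread could be sketched like this. It is a minimal illustration under stated assumptions, not klearvue's actual setup: the field name `website`, the 3-second threshold, and the function `looks_like_bot` are all hypothetical, and `rendered_at` is assumed to be stored server-side (or signed) so a bot cannot forge it.

```python
import time
from typing import Mapping, Optional

HONEYPOT_FIELD = "website"  # hypothetical field, hidden from humans via CSS
MIN_FILL_SECONDS = 3.0      # hypothetical threshold; tune per form


def looks_like_bot(
    form: Mapping[str, str],
    rendered_at: float,
    now: Optional[float] = None,
) -> bool:
    """Flag a submission that fills the hidden field or arrives too quickly."""
    now = time.time() if now is None else now
    # Honeypot: a human never sees this field, so any value is suspicious.
    if form.get(HONEYPOT_FIELD, "").strip():
        return True
    # Timegate: compare submission time with the time the form was served.
    return (now - rendered_at) < MIN_FILL_SECONDS
```

As the reply above points out, a bot written for one specific site can leave the hidden field empty and wait a few seconds, so this only filters generic bots.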

sigmike 13 years ago

"It's a textbook example of AI hype of the worst kind

Hype is dangerous to AI. Hype killed AI four times in the last five decades. AI Hype must be stopped."

Yann LeCun, https://plus.google.com/104362980539466846301/posts/Qwj9EEkU...

perfmode 13 years ago

Advances in storage and compute have led to a disturbing fetishization of machine learning.

While the modern Machine Learning Movement makes sense in a historical context and is a reasonable reaction to the disappointing returns from symbolic inference during the early days of AI research, it is terrifying that the research community is satisfied to rely on big data and statistical methods to carry us forward.

Few among us recognize the need to prioritize the study of the human brain. Even fewer are placing their bets on intelligent computer systems seeded with neurologically-inspired designs.

Vicarious gets it.

How long before others see the writing on the wall?

Now is the time to stop reacting. Now is the time to consider the field in a broad context and develop a balanced, holistic approach.

Consider this a wake-up call.

http://blog.perfmode.com/the-noml-movement/

unlikelymordant 13 years ago

I'm not sure I understand. You say "Few among us recognize the need to prioritize the study of the human brain", but a great deal of the state-of-the-art machine learning results are based on deep learning, a family of algorithms that are "neurologically inspired", as you would put it. You seem to have a problem with big data and statistical methods, but one of the main deep learning algorithms, the RBM, is itself a statistical method.
Also, could you expand on what a "balanced, holistic approach" to machine learning is?
- perfmode 13 years ago
  
  "Even though an amplifier and a computer are both made of transistors, they have almost nothing else in common. In the same way, a real brain and a three-row neural network are built with neurons, but have almost nothing else in common." -- Jeff Hawkins (On Intelligence)
  One example is that most NNs neglect the time domain.
  A balanced approach recognizes the importance of learning from data, but does not _rely_ on big data. A holistic approach entails a close examination of biological learning systems.
  - varjag 13 years ago
    
    As a counterpoint, nearly all human technical advances (flight, propulsion, energy, computation) did not entail examination of biological systems. And ANNs in particular share a host of problems with statistical techniques: black-boxish behavior with lack of human-accessible state introspection and poor tractability.
    
    perfmode 13 years ago
    
    "On the basis of observation, Wilbur concluded that birds changed the angle of the ends of their wings to make their bodies roll right or left.[30] The brothers decided this would also be a good way for a flying machine to turn—to "bank" or "lean" into the turn just like a bird—and just like a person riding a bicycle, an experience with which they were thoroughly familiar."
    https://en.wikipedia.org/wiki/Wright_brothers
    
    varjag 13 years ago
    
    This doesn't however make airplanes aerodynamically anywhere similar to bird flight. Besides, Wright's method of wing warping was soon discarded as structurally unsound and is seldom used now.
    
    chongli 13 years ago
    
    > black-boxish behavior with lack of human-accessible state introspection and poor tractability.
    I have a philosophical question in response to this. Is it even possible to have intelligence without it being a black box? Are people willing to call something intelligent if they completely understand how it works?
    
    varjag 13 years ago
    
    Well, my point was practical rather than philosophical: if you take a human at, say, a classification task, they'd be able to explain why they identify a bicycle as a bicycle and not, say, a bulldozer. You can't readily have this with ANNs; all you have is a bunch of weight coefficients and feedback loops.
    
    chongli 13 years ago
    
    > if you take a human at, say, a classification task, they'd be able to explain why they identify a bicycle as a bicycle and not, say, a bulldozer.
    I don't think this is such an easy task. In the professional world of scientific taxonomy there are many problems with classification. The problem seems to stem from the tension between intensional and extensional definitions.
    
    varjag 13 years ago
    
    Yes, but you are still able to verbalize these problems, quite unlike when you are stuck with a misfiring ANN. In symbolic approaches, e.g. inference engines, you have comparable explanatory facilities.
    
    eli_gottlieb 13 years ago
    
    > Is it even possible to have intelligence without it being a black box?
    That depends. Do you believe in Cartesian dualism, or atomic monism? If you believe in a monist universe where minds don't reside on some other plane of existence, then plainly a mind must be explicable to a sufficiently smart other mind, because after all so is everything else.
    
    chongli 13 years ago
    
    > then plainly a mind must be explicable to a sufficiently smart other mind, because after all so is everything else
    Right, but is that intelligence? My main argument is basically an attack on the word intelligence. I believe people use it far too frequently and they allow its meaning to change whenever it comes close to being pinned down. In a strange way, intelligence is a tricky refuge for dualism in an otherwise monist world.
eli_gottlieb 13 years ago

Oh, you just walked into a bloody minefield, mate.
Numenta has pulled crap like this before. We know patents may be pending, but you don't have the epistemological right to go blowing the Great Shofar for the invention of True AI with a link to your company's website and a fancy buzzword about neural or cortical this-and-that on the front page. We need to see some published research, or you need to take over the world. Preferably the former.
Until then, stop making claims unless you want the rest of us to consider you a crackpot and a braggart.
- aufreak3 13 years ago
  
  Wasn't Dileep George part of Numenta?
  - eli_gottlieb 13 years ago
    
    I checked and yes. Which bugs me even more.
    Come on, guys, put up or shut up. If you've made the kind of advance in machine learning that entitles you to talk about human-level cognition, take out a patent and then publish some freaking papers. Or take over the world.
    There are accepted ways of proving claims like this, and founding company after company without releasing a product or publishing research isn't one of them.
    
    aufreak3 13 years ago
    
    While that criticism is valid, it is also possible that they think the details are better kept as a trade secret than revealed to the public through either a patent document or a detailed enough paper. (Just giving them the benefit of the doubt.)
    If you look at Gary Drescher's work published in "Made up minds" [1], it seems possible that an AI that can influence the world can more efficiently arrive at intelligence than one that simply observes it - i.e. one that can learn by performing experiments rather than only looking at data coming out of everywhere. So there does seem to be scope for approaches to AI that aren't in the "data trumps everything" gang.
    [1] http://books.google.co.in/books/about/Made_up_Minds.html?id=...
    
    eli_gottlieb 13 years ago
    
    Sorry, I didn't mean to disagree with the underlying message that AI/ML should be getting away from the "just eat huge amounts of data and process it" paradigm.
    What I more meant is: why on Earth should we accept that whenever someone says the magic words "intelligence", "cognition", "mind", or "consciousness", we switch our scientific brains off and start openly espousing blatant woo? Any real advancement in high-level AI not only should but must involve a scientific theory of intelligence: what is it, how does it operate, how can we measure it? Is it made up of smaller component parts or is it a unified "thing"? How can we detect it if shown a non-human intelligence?
    If someone has such a theory, it should be entirely possible to publish the theory without revealing details of their proprietary algorithms. If, in fact, they believe that there is only one algorithm that gives rise to intelligence in the entire universe, then they might want to keep a trade secret, but they should have to justify to a Senate subcommittee or something why the hell they're trying to keep one of the deepest, most fundamental secrets of Nature a secret.
    A true science of AI should do for intelligence what the Wright Brothers did for human flight: stop the cargo-cult and find the underlying principles. In fact, a true science of AI should split the field into three branches: theory of intelligence, taxonomy of naturally-occurring agents, and engineering of artificial agents.
    Given all that, anyone and everyone who takes the "I'VE FOUND THE SECRET BUT I'M NOT TELLING YOU PATENTS PENDING NEENER NEENER NEENER BUT TOTALLY INVEST IN MY COMPANY" approach... comes off like a Renaissance gentleman scientist suddenly claiming to have discovered the Philosopher's Stone.

ColinWright 13 years ago

Video of the system in action submitted here:

https://news.ycombinator.com/item?id=6626405

bparsons 13 years ago

So they have invented the world's best OCR software?

vidarh 13 years ago

Not really, unless your corpus consists mainly of hopelessly distorted characters.
They state a captcha solving rate of around 90%.
For OCR to be cost-competitive, you typically need it to be correct on about 98% of characters or more; below that, it is typically cheaper to have a human type in the text than to have a human correct OCR'd text.
Modern OCR engines typically do better than 99% on text that isn't really badly damaged (my MSc. dissertation was on error correction in OCR, and as part of that I tested some engines with pages that had been crumpled, intentionally damaged with sand and liquids, and even then many of the engines managed more than 99%).
- gondo 13 years ago
  
  hi, would it be possible to see your dissertation somewhere? thx
yen223 13 years ago

Sorta relevant xkcd: http://xkcd.com/810/
memracom 13 years ago

Actually they have invented a supplement to OCR software that will work for the characters that OCR is not certain about. The world's best OCR software would be software that recognizes when it should pass off a patch of text to this new AI engine to decode.

porker 13 years ago

It'd be interesting to know more about this approach. In particular:

> One big difference in Vicarious’s approach, says cofounder Dileep George, is that its system can be trained with moving images rather than only static ones.

Does this imply they teach it how the shapes of numbers change, for easier detection?

waps 13 years ago

It generally means you teach it to recognize 3d shapes (a 2d image that moves = a 3d image, more or less. Yes there's a good reason why you might want to call it 2.5d, but the easy way to model a 2.5d object is in 3d). Think of it as the difference between recognizing 2 points and recognizing a Feynman diagram.
This is one of the things people don't often realize you can do with algorithms. You don't need to look at the world the way it actually really exists, and there may be very good reasons not to. Training algorithms to actually recognize moving images is incredibly hard, because it requires things like memory, fade-outs, recurrent networks, all that very advanced stuff. Obviously time exists as a continuum in the "real" world. But that's bloody inconvenient. So just look at big "quanta" of time, collecting all data points during the quantum, analyse it, then shift the quanta/window ahead 0.1s and do the exercise again. This is so much easier you wouldn't believe it.
Take teaching an algorithm to recognize, say, a car collision, given 100 frames. It doesn't require any change to the algorithm (just a change in training data). And obviously your backend system needs to be aware that, over time, the "isColliding" output will look like ......1.....11.....1111...1111.1.1.111.11..11...11.11...11...1.....1...1...... when a collision occurs, and this of course doesn't mean you've had 20 collisions.
It does mean a bigger network, slower training, and more resources needed. But not as much as you'd think. Keep in mind that a "temporal" network will need more hidden layers. Also please consider building "redundant" networks for temporal data. When people ask why, I have no better answer than that it's the same technique our brain uses, so frankly if it's good enough for God, it's good enough for me.
Doing the temporal thing means you're back to using trivially simple algorithms, running on more data.
Cracking captchas is not very impressive. I've done it as a weekend project, and exceeding "average" human captcha'ing ability is easy. I actually got it to the point where my algorithm was slightly better at captchas than me, even when I was allowed to take 2 minutes for difficult captchas. If I wasn't allowed to take more than 10 seconds, my algorithm easily beat me by over 10% (my captcha performance, when measured, shockingly is only ~83%). I didn't cheat: I used an external site's captchas (from dns.be).
The algorithm used was dead simple backpropagation.
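
The windowing idea in this comment, and the need to collapse a flickering "isColliding" output into one event, can be sketched as below. This is an illustrative assumption about how one might do it, not waps's code: `sliding_windows`, `merge_events`, and the `max_gap` parameter are hypothetical names, and the classifier that produces the per-window flags is left out.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sliding_windows(frames: Sequence[T], size: int, step: int) -> Iterator[Sequence[T]]:
    """Yield fixed-size chunks of frames, advancing by `step` each time."""
    for start in range(0, len(frames) - size + 1, step):
        yield frames[start:start + size]


def merge_events(flags: str, max_gap: int) -> List[Tuple[int, int]]:
    """Group '1' outputs separated by at most `max_gap` other symbols.

    Returns (first_index, last_index) pairs, one per merged event.
    """
    events: List[Tuple[int, int]] = []
    start = last = -1
    for i, ch in enumerate(flags):
        if ch != "1":
            continue
        if start < 0:
            start = last = i           # first detection opens an event
        elif i - last - 1 <= max_gap:
            last = i                   # close enough: extend current event
        else:
            events.append((start, last))
            start = last = i           # too far apart: start a new event
    if start >= 0:
        events.append((start, last))
    return events
```

Each window would be fed to the same static classifier, and `merge_events` then turns the noisy per-window output into a count of distinct collisions.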
- alphydan 13 years ago
  
  > Cracking captchas is not very impressive. I've done it as a weekend project, and exceeding "average" human captcha'ing ability is easy
  If it's so easy, could you share it on github?

theboss 13 years ago

I used a toolkit about 3 years ago that completely beat the captchas. Recaptchas are harder though. Which did they beat?

welly 13 years ago

Both Captcha and Recaptcha failed terribly on a forum that I run. I run a photography forum and what worked great for me was asking the user signing up to answer a pretty simple question related to this fairly specific (large format) kind of photography. I get no spammers or spambots at all now.
Turning on recaptcha only briefly to test resulted in a mass of spammers.
- theboss 13 years ago
  
  This is because you can pay to have humans resolve captchas. It is really sad, but worth it I guess to the spammers....

gprasanth 13 years ago

Spam = Problem -> Captcha = Solution -> Captcha Solver = Another Problem. Sigh.

svantana 13 years ago

Well, if you read the article you'll see that it's only a technology demo. They're not releasing it as a product, and their goal is not this but computer vision / brain mimicking in general.
Even CAPTCHA inventor Luis von Ahn has talked about CAPTCHAs being a practical way of generating test data for computer vision systems.

acchow 13 years ago

I wonder how it does on reCAPTCHA

ozh 13 years ago

Probably better than me anyway: I usually refresh the reCaptcha image at least 10 times before I get one that 1) I might decipher and 2) won't take 15 letters to input

EGreg 13 years ago

What if we will reach a time when robots have sex better than people... then what will happen?

freehunter 13 years ago

The world will keep spinning?
I think the assumption you're making is that humans only have sex because it feels good, and not for the explicit purpose of the creation of life. If people want children, they'll procreate regardless, even if it doesn't feel as good as robot sex.

b0z0 13 years ago

Not surprised. Not to be cavalier or anything, but I mean, even I was about to start a side-project to solve those things.

GuiA 13 years ago

Awesome! It would be amazing if you open sourced your own solutions for it so we could learn from you- I find the topic fascinating! :-)
- boyter 13 years ago
  
  Shameless plug http://www.boyter.org/decoding-captchas/ I wrote this a few years ago. Has full source code too.
thejosh 13 years ago

Check out captchasniper; it was the best one a while ago.
