Reverse OCR
reverseocr.tumblr.com

I was thinking about this: http://www.cs.toronto.edu/~graves/handwriting.html
Very impressive!
The author published an open-source library, RNNLIB [1], used for his neural network research, but is the actual code for this handwriting demo published somewhere?
Unfortunately it does not handle acute accents, as in the sentence: Ceci n'est pas écriture humaine.
Nor umlauts, like åäö.
I wonder if it also works for voice synthesis.
Not yet, I think, but people are seriously trying. http://research.google.com/pubs/HeigaZen.html
that is so impressive, wow. thank you for the link
That's nice, thank you for the link!
This is similar to the project where images of clouds were fed to face recognition software: http://ssbkyh.com/works/cloud_face/
there's also http://iobound.com/pareidoloop/, a project that uses a genetic algorithm for breeding (random) polygons into a shape with a face detection algorithm as the fitness function.
That is totally cool. It feels like watching an oil painter slowly work from something very abstract (layers of brushstrokes) to something very recognizable (a human face).
I'm also intrigued by the cat vs human face recognition results!
Not strictly related, but reminded me of the exercise in genetic programming by Roger Alsing: http://rogeralsing.com/2008/12/07/genetic-programming-evolut...
It's a rather cool attempt to draw the Mona Lisa using random, semi-transparent polygons
I did this recently, the results were surprisingly good! https://github.com/darkFunction/PolygonPainter
Edit: Roger Alsing's implementation was a single-entity population (mutated, then reverted if the mutation was no good). I copied this approach in my first implementation, but found that much better results could be achieved with a breeding population of genes.
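For concreteness, here's a minimal sketch of both strategies, using a toy numeric genome (the target vector and mutation scheme below are illustrative stand-ins for polygons approximating an image, not anyone's actual implementation):

```python
import random

# Toy genome: a list of floats; fitness = negative squared distance
# to a target vector (a stand-in for image similarity).
TARGET = [0.2, 0.8, 0.5, 0.1]

def fitness(genome):
    return -sum((g - t) ** 2 for g, t in zip(genome, TARGET))

def mutate(genome, rate=0.3, scale=0.1):
    return [g + random.gauss(0, scale) if random.random() < rate else g
            for g in genome]

def hill_climb(steps=2000):
    """Single-entity approach: mutate one individual, revert if worse."""
    best = [random.random() for _ in TARGET]
    for _ in range(steps):
        cand = mutate(best)
        if fitness(cand) > fitness(best):
            best = cand
    return best

def crossover(a, b):
    """Uniform crossover: each gene comes from either parent."""
    return [random.choice(pair) for pair in zip(a, b)]

def breed(pop_size=30, gens=150):
    """Breeding population: truncation selection, then cross and mutate."""
    pop = [[random.random() for _ in TARGET] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 3]  # survivors carry over unchanged
        pop = parents + [mutate(crossover(random.choice(parents),
                                          random.choice(parents)))
                         for _ in range(pop_size - len(parents))]
    return max(pop, key=fitness)
```

Which strategy wins depends heavily on the mutation operators and population parameters, so treat the numbers above as knobs, not recommendations.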
> I copied this approach in my first implementation, but found that much better results could be achieved with a breeding population of genes.
Ohhh, interesting, thanks for posting this! I just started playing around with this myself a few days ago in JavaScript (it has no UI so no link yet, but I uploaded some samples [0]), and it also uses the original simple approach. I wondered about an "actual" gene pool and cross-breeding, but shied away from the additional effort for uncertain benefit... so this helps, greatly :)
One thing I intend to try is to get the fittest (in terms of likeness to the target image), and then calculate the fitness of the other genomes as the difference (in terms of the variables that determine the shapes) to that "champion". I see you take the two fittest as-is; maybe this could be useful for picking the second one?
Also, when the Mona Lisa thing was posted on HN, someone suggested marking areas of the target image as "more important", to maybe make facial features etc. more recognizable. I'll also see if making such a mask automatically, e.g. influenced by contrast, helps any.
> I see you take the two fittest as-is; maybe this could be useful for picking the second one?
Yep, we take the two fittest unchanged, and breed the rest randomly along a non-uniform distribution tending towards the top.
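A sketch of that selection scheme in Python (the `fitness`, `mutate`, and `crossover` callbacks are hypothetical placeholders for whatever the real implementation uses):

```python
import random

def next_generation(pop, fitness, mutate, crossover):
    """Keep the two fittest unchanged, then breed the rest with parents
    drawn from a rank-weighted distribution that favours the top."""
    ranked = sorted(pop, key=fitness, reverse=True)
    n = len(ranked)
    # Rank i gets weight n - i, so the best individual is n times more
    # likely to be picked as a parent than the worst.
    weights = [n - i for i in range(n)]

    def pick():
        return random.choices(ranked, weights=weights, k=1)[0]

    children = [mutate(crossover(pick(), pick())) for _ in range(n - 2)]
    return ranked[:2] + children
```

Keeping the top two verbatim (elitism) guarantees the best fitness never decreases from one generation to the next.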
> Also, when the Mona Lisa thing was posted on HN, someone suggested marking areas of the target image as "more important", to maybe make facial features etc. more recognizable.
This is a really good idea and would definitely make a difference. In my penguin example, with too few polys, we sometimes reach a local maximum before the eyes (small details) look any good. I combated this somewhat by encouraging new polys to be (a) small and (b) regular. But I like the idea of a more guided approach.
There's an awesome javascript example you might find helpful: http://alteredqualia.com/visualization/evolve/
Edit: Your images look amazing. How do you make the vector shapes?
I already knew the altered qualia link, and actually meant that one when I talked about the post on HN (the link to which I got from the bottom of that page) xD
Thanks! I just use what the HTML canvas has to offer: circles, n-gons, and n-gons made of bezier curves. Those can get drawn both filled and as outlines, which in turn aren't drawn with plain colors but with gradients made of 3 HSLA colors (alpha ranges from 0.05 to 0.85, to make sure everything matters at least a bit and doesn't just get covered up or disappear), with the position of the middle color stop, as well as the coordinates that define the gradient direction, being variable.
The line width is variable, and it can also use dashed lines with a randomized pattern, which looks funky, but I haven't gotten a good result with it yet. Another thing I'll add is the option to pick a random compositing mode (and, in the spirit of the expedition, do that for both fill and outline), but then I expect it will become too unpredictable to still look like it's made of shapes.
The background also has a gradient consisting of no less than 6 colors, with 4 variable color stops... what can I say, I like enums and copy and paste, haha. Though of course adding more degrees of freedom willy-nilly might not be the best idea... I have to improve the whole evolution/mutation stuff a lot first, and then I intend to throw just about anything I can find at it, at least as an option. Right now I've added words, though that doesn't seem very promising. But having one basic common function to set up fill and outline gradients for any and all shapes I might come up with makes experimenting with this a breeze.
I kind of feel I should be doing this with WebGL, but then I wouldn't have all those convenient drawing functions... but still, whenever WebGL 2.0 comes (which will make a lot of compute-ish things a lot easier, AFAIK), I want to do at least a polygon/circle/bitmap thing with it, because I feel a speedup factor of a gazillion might just make up for that :)
Do you have a Github link? Or can you post back here when you're finished? I'd like to see the final result :)
Yeah, really good results, conceivably useful for compression. It'd be good to know the vertex count in your final penguin images.
100 six-sided polygons :) Though it looks pretty good with as few as 50.
This could be a cool way to visually "encrypt" messages. They're readable, but only by the correct tool. I wonder how these squiggles might be creatively arranged steganographically in an image and still be "read" by the OCR tool.
Correct me if I'm wrong here, but that just seems like reinventing crypto with a large key, and requires you to implement a counterparty-provided algorithm, which could be malicious.
Believe it or not, there's still a lot of utility in putting encrypted messages on physical things (pieces of paper). One-time pads work well for this, but imagine if the recognition algorithm were altered in different ways, effectively acting as a key, such that you had to have a similarly altered recognizer on your side to see it. Yeah, it's symmetric crypto in that sense, but you can physically hide all kinds of stuff, or divide up the message between different couriers, or do other things in ways that are a bit unlike digital crypto. The simple fact of the message being a physical object might be enough to confuse an eavesdropper.
Or the message itself could also be encrypted with a more secure system, but then physically presented in an open area so that somebody with a tuned recognizer can get the encrypted data to later decrypt digitally.
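The one-time-pad step mentioned above is just XOR with a shared random pad; a minimal sketch (the altered-recognizer idea would then be a second layer on top of something like this):

```python
import secrets

def otp_encrypt(message: bytes, pad: bytes) -> bytes:
    """XOR one-time pad: information-theoretically secure, provided the
    pad is truly random, at least as long as the message, and never reused."""
    assert len(pad) >= len(message), "pad must cover the whole message"
    return bytes(m ^ p for m, p in zip(message, pad))

# Decryption is the same XOR with the same pad.
otp_decrypt = otp_encrypt

pad = secrets.token_bytes(32)  # shared in advance, e.g. on paper
ciphertext = otp_encrypt(b"meet at dawn", pad)
```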
The point of steganography is that no one realizes there's a message.
And? It still has key sizes, the enemy still knows the system.
Steganography is solidly a "security through obscurity" thing. Sure, we comp-sci people don't care about that, but spies do.
There was that Russian spy who transmitted data for years through steganographic pictures posted to her Facebook account.
http://www.technologyreview.com/view/419833/russian-spies-us...
The FBI didn't know about it until after she was caught. So believe it or not, Steganography _works_. If you're trying to hide the fact that you're a spy, encrypting all of your messages over TOR is a bad idea.
On the other hand, if you pretend to be a normal person and embed secret messages in your Facebook posts, you can be a spy for years and not get caught.
"believe it or not, Steganography _works_"
I think one of the reasons stego works is because of the sheer amount of data being generated and shared in the modern world.
It's kind of a blessing and a curse for spy agencies. On the one hand, they love to collect data, and the more the better, since with more data to analyze, they can potentially learn more things. But the more data there is, the more computing power they have to throw at it to make sense of it.
So it's really not surprising that data can be hidden from spy agencies (possibly by relatively primitive means even), because they probably don't have the computing power (vast as their computing power is) to effectively run every possible detection algorithm and all their highly sophisticated (and probably computationally expensive) steganalysis software on so much data.
Videos, since they are so huge compared to other media files like text or audio, have always seemed like an ideal medium for stego to me. Of course, it's more difficult to preserve one's hidden data on sites like youtube that re-compress the videos that get uploaded to them, but any site that hosts original videos unmolested should be ripe for stego.
>Steganography is solidly a "security through obscurity" thing.
Right, but that means it inherits all the problems of security by obscurity, like it breaking as soon as the public knows the technique, which they do now.
My other point was that this seems to be equivalent to traditional stego solutions but with a key size equal to the algorithm size.
(And I'm not sure why merely asking about the key size problem and the obscurity problem hurt the discussion enough to get hammered so hard...)
> Steganography is solidly a "security through obscurity" thing.
I never really understood why is it so.
Encrypted data must be indistinguishable from random, thus, if you replace any random projection of a file with your data, the result should be completely unrecognizable. It shouldn't really matter if your algorithms are public.
Is the problem that it's hard to get random projections from modern data? If so, why not use older formats?
People don't typically exchange randomised versions of their data.
I think "random projection", as used by the parent, can be things like "low bits of the pixels in this image". If the color depth provides greater resolution than the sensors, then you can expect to have some random data implicit in the image that it would be possible to change in ways that could be provably undetectable.
A tremendous caveat is that when we find ourselves shipping around lots of meaningless random bits, we often quickly reach for lossy compression that doesn't faithfully reproduce those bits, and that can break the scheme.
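A minimal sketch of that low-bit idea in pure Python, treating one image channel as a flat list of 0-255 values (a real implementation would work on decoded image data, and, as noted, lossy recompression destroys the hidden bits):

```python
def embed_lsb(pixels, payload_bits):
    """Hide one payload bit in the least-significant bit of each pixel."""
    assert len(payload_bits) <= len(pixels), "payload too large for cover"
    out = list(pixels)
    for i, bit in enumerate(payload_bits):
        out[i] = (out[i] & ~1) | bit  # clear bit 0, then set it to the payload bit
    return out

def extract_lsb(pixels, n_bits):
    """Read the first n_bits least-significant bits back out."""
    return [p & 1 for p in pixels[:n_bits]]
```

Each cover pixel changes by at most 1, which is why this can hide in sensor noise when the color depth exceeds the sensor's real resolution.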
Wow, that article is pretty bad.
Yeah... it was...
It was the first link on Google that seemed to mention the FBI / Russian Spy case. So take it as "proof the thing happened", but ignore the article.
Disclaimer: [1]
From what I understand, ideally stego would be used in conjunction with encryption.
First, you would encrypt your message, then you would use stego to hide it.
If the stego is good, it would be a computationally intractable problem[2] for your adversary to determine whether there was indeed a message hidden within the data they were analyzing, with greater than 50% accuracy.
That said, I'm not sure how practical using an application like this would be for stego. It does not "whiten" the data it tries to hide, so unless the data's already whitened, it could potentially stand out like a sore thumb when subjected to steganalysis. And how would you propose actually using this?
This does present some intriguing possibilities, however, like maybe having Alice and Bob share a tweaked version of an OCR library and having Alice generate random images until her encrypted message has been "encoded" in such a way as to be recognizable by the tweaked OCR library that she shares with Bob. The tweaking of the library's character recognition parameters could be a sort of pre-shared key, and would not be available to Eve (the adversary).
[1] - this post comes from a hobbyist, not from any kind of security researcher, steganalyst, cryptanalyst, etc. So please take what I say with a grain of salt, and please correct me if I'm wrong.
[2] - "computationally intractable" being different for different adversaries, of course, which is one reason you need a good threat model.
This is a good idea! The problem with these squiggles is that they look abnormal and would draw attention. It would be interesting if these can be tweaked somehow so that they are still bot readable but can also be interpreted as patterns by humans.
...Here's a crazy thought.
Take handwriting, the more illegible the better. Then use a genetic algorithm where the fitness function is trying to find as small a perturbation as possible to the input such that the output is recognized as the letters you want.
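A hedged sketch of that search loop; `recognize` and `distance` are hypothetical placeholders here (a real version would call an actual OCR engine and use, say, per-pixel difference), and this is a single-individual hill climb rather than a full genetic algorithm with a population:

```python
import random

def adversarial_perturb(image, target, recognize, distance, steps=5000):
    """Search for the smallest perturbation of `image` (a flat list of
    0-255 pixel values) that `recognize` reads as `target`.
    Random-walks until the target is first recognized, then keeps only
    recognized candidates that are closer to the original image."""
    best, best_cost = None, float("inf")
    current = list(image)
    for _ in range(steps):
        cand = list(current)
        i = random.randrange(len(cand))
        cand[i] = min(255, max(0, cand[i] + random.choice((-16, 16))))
        if recognize(cand) == target and distance(cand, image) < best_cost:
            best, best_cost = cand, distance(cand, image)
            current = cand
        elif best is None:
            current = cand  # keep exploring until the target is first hit
    return best
```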
What if... we combined them with normal-looking letters to make a captcha? Humans see one thing, bots see another?
QR code?
Could be used for automated printing of doctors' prescriptions ;)
Perhaps this could lead to a new kind of captcha that only bots can solve. I doubt it would be efficient, though.
Startup idea: create a CAPTCHA that humans recognize as one word, but OCR recognizes as a different word.
This HN item: https://news.ycombinator.com/item?id=8544911 was about how to do this for neural nets.
Nice, a captcha to block colorblind users!
That's a subset of the objective. The objective is to detect all humans because they wouldn't spot the 'face' in the coloured circles. Whether or not it's coloured doesn't change that.
Unless you're concerned about the rights of colour-blind computers for some reason.
Ha, oops, I guess I didn't read it closely enough. As someone who's partially colorblind, just seeing those dot patterns pisses me off... :P
How do you know if the individual being tested doesn't see a face? You have to trust them when they say "no face here"... May as well just ask "are you human?"
What if the computer just chooses a random rectangular region, or it always answers 'no face here'?
If this bot can generate scribbles from words. Theoretically, couldn't this bot work to teach an OCR-bot to effectively recognize scribbles as letters?
Somebody just modded you down instead of answering...
No. It couldn't be used to teach an OCR. Well, technically, it could, but all the OCR will learn is how to read text from this bot, not how to read text written by people.
If a correct answer is given, presume it's a bot.
...simply solved by making bots answer randomly the first time
... that's a pretty interesting idea!
Bot Honeypot
HoneyBot
Looks like he has written tons of very creative bots. They are all very interesting ideas (e.g. http://randomshopper.tumblr.com)
It would be pretty interesting to see one degree of abstraction up from this - what sets of lines are close enough to match a certain word?
If you averaged over all those sets, would the resulting blobby heatmap resemble the original word in a legible form? Or something else?
I can imagine generating a few pages or even an entire book of this, and some future generations attempting to figure out what sort of language it was written in... reminds me of this:
I couldn't get that OCR to read my mouse-written E. It's a nice experiment nevertheless.
Indeed, the underlying OCR seems to need lots of work still. It's no wonder that the "reverse" operation results in such messy line art.
The demo is nice, though.
I highly recommend watching the talk Darius Kazemi (author of Reverse OCR) gave at this year's XOXO: http://www.youtube.com/watch?v=l_F9jxsfGCw
It has been fantastic watching Darius' myriad experiments over the past few years. His work always has a great mixture of whimsy and serious experimentation.
Nice. Finally, computers have reached the age of writing. :)
I can already imagine the innovation:
> Type over this text to prove that you are a computer.
> Human detected. Shoo, shoo!
Looks like my handwriting
I can't believe OCR has not been solved yet. The only one even close is OmniPage.
Isn't OCR pretty good these days?
Here's the source code on github: https://github.com/dariusk/reverseocr
A generative model, although computationally expensive, would not suffer from this problem. Essentially, a generative model can run in reverse, which means that if you feed values into the output, you get inputs that could explain that output. Check out "Boltzmann machines" for an example. There are plenty of examples for the MNIST dataset of handwritten digits.
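As a toy illustration of the sampling mechanics, here is a randomly initialized restricted Boltzmann machine in pure Python (untrained, so its samples are meaningless; with a trained model, you would clamp part of the visible layer and Gibbs-sample the rest to get inputs that explain it):

```python
import math
import random

random.seed(0)

N_VIS, N_HID = 6, 4
# Random, untrained weights (biases omitted for brevity) -- a real RBM
# would learn these from data, e.g. MNIST digits, via contrastive divergence.
W = [[random.gauss(0, 0.1) for _ in range(N_HID)] for _ in range(N_VIS)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(v):
    """Sample binary hidden units given the visible units."""
    return [int(random.random() < sigmoid(sum(v[i] * W[i][j] for i in range(N_VIS))))
            for j in range(N_HID)]

def sample_visible(h):
    """Sample binary visible units given the hidden units -- the 'reverse' direction."""
    return [int(random.random() < sigmoid(sum(W[i][j] * h[j] for j in range(N_HID))))
            for i in range(N_VIS)]

def gibbs(v, steps=10):
    """Alternate h|v and v|h sampling to walk the model's distribution."""
    for _ in range(steps):
        v = sample_visible(sample_hidden(v))
    return v
```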
I think one of the problems is that the OCR assumes the images to be (English) letters.
To be really, really useful, the OCR would need to consider at least all characters in the Unicode Basic Multilingual Plane. And then it needs to be able to reject an image as not containing any word at all, and then it needs to solve the halting problem.
This reminds me of an experiment I played with using random search to "teach" the browser how to draw characters: http://zwass.github.io/Learn2Write/
This actually seems like a great program for automatically generating adversarial examples to improve OCR. A human could rate this text as being illegible or legible. Each example can then be added to the training data to improve its quality.
The Letter Spirit project (a Douglas Hofstadter thing) is sort of in that vein, but less prosaic in its objectives.
It would be neat to see the same thing, except using two OCR libraries instead of just one, and requiring both libraries to be able to read the message. I imagine the letters would start to look a bit less insane.
This is pretty cool, although it makes me wonder what the real world applications could be. It does, at the very least, tantalise my curiosity and gets me thinking.
Could this be used in a pseudo reverse CAPTCHA by showing a series of words, and asking the user to say which is not human readable?
I wonder what would happen if you run this program letter-by-letter, possibly the readability could increase.
I love algorithmic art.
very cool idea.
What (if anything) is this saying about the quality of the OCR process? Especially since none of these seem human readable.
Not much, probably. It would be largely wasted effort to tune OCR algorithms to avoid (falsely) recognizing letters in artificially synthesized datasets that don't occur in practice.
Why's it so highly upvoted then? I was expecting something moderately legible.
> I was expecting something moderately legible.
Is human handwriting anything more than repeated patterns of lines and shapes on a 2D plane?
A computer program designed by humans to assist with human constructs can infer meaning from what appears to be mostly noise to humans. It hits us hard because communication through complex language is the defining trait of our species. It manages to do what we do, and many people are intrigued by the results.
Are we just observing the results of our actions or did we just take one more small step towards the singularity?
I'm not bothered by it any more than I'm bothered by our brains' propensity for seeing faces where there aren't any.
That's a pretty interesting way of looking at it, machine pareidolia.
Even computers see faces where there aren't any. I've had Facebook's tagging functionality sometimes pick a non-human part of an image as a face to be tagged.
Only a reminder that the OCR process looks nothing like what humans do.
Several of them did seem readable to me.