Computer Eyesight Gets a Lot More Accurate
bits.blogs.nytimes.com
This year's results: http://image-net.org/challenges/LSVRC/2014/results
Computer vision is one of those odd areas where I can't see a nice gentle slope to adoption; instead it's a step change. NLP, by contrast, gives us all sorts of add-ons to our current interactions with computers (hey, let's do sentiment analysis of customer reviews / emails / etc.).
But there is no obvious slope for computer vision - we need an infrastructure of cameras and bandwidth before it becomes ubiquitous.
So I struggle to see the profitable intermediate businesses between here and there - and that troubles me.
There are many intermediate applications. From my own computer vision classes at university I remember examples of jobs where someone sits watching a factory line or a machine for 8 hours a day in order to press a button if something goes wrong. This kind of work bores humans out of their minds (making them extremely fallible), and it can be done much better with a few cameras and a computer running not-very-complicated computer vision algorithms.
I don't think that's entirely true.
There are many tasks where the current levels of accuracy are sufficient (eg, registration plate recognition), and as recognition slowly improves more and more tasks become possible.
Pete Warden has written extensively on this topic[1]. His "hipster detection" algorithm is quite inaccurate by any conventional measurement, but is accurate enough to be useful.
[1] http://petewarden.com/2014/07/31/how-to-get-computer-vision-...
Every smartphone has a camera... and if self-driving cars become a reality there will be a lot of cameras on the road as well.
Who needs bandwidth when you can push your models to the local device with a small update? They can just send back batched statistics when a high bandwidth network is available. After all, cars need gas or a charge sometime.
It is just a binary patch that changes some weights or an architecture layout - not so different from updating any other application.
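As a toy illustration of the "binary patch" idea above - assuming (my assumption, not the commenter's) that the deployed model's parameters live in a flat array and the update ships as a sparse list of changed entries rather than a whole new model:

```python
def apply_weight_patch(weights, patch):
    """Return a new weight vector with the patched entries swapped in.

    `patch` is a list of (index, new_value) pairs - a hypothetical
    sparse-update format, much smaller than resending every weight.
    """
    updated = list(weights)
    for index, new_value in patch:
        updated[index] = new_value
    return updated

# A device ships with some weights...
deployed = [0.10, -0.32, 0.07, 0.91]
# ...and later receives only the entries that changed.
patch = [(1, -0.30), (3, 0.88)]
deployed = apply_weight_patch(deployed, patch)
```

Real frameworks serialize this differently (and would diff at the binary level), but the point stands: the update payload scales with what changed, not with model size.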
Most businesses are covered with cameras and hire people whose only job is to watch those cameras for anomalous activity - I think there are more opportunities than you realize. Farming is another industry where this technology could be useful.
Science is but a perversion of itself unless it has as its ultimate goal the betterment of humanity. - Nikola Tesla
Does this not nearly amount to "population-scale mass surveillance algorithms"? Do people not feel this is accelerating negative social impacts of technology?
Is it merely a coincidence that winning teams include many from countries criticized for their totalitarian social contracts: Hong Kong University of Science and Technology, National University of Singapore, Microsoft Research China, Southeast University (China), Chinese Academy of Sciences? There's also a presence from Holland.
Oh, and guess who won the category "with additional training data"? Google.
Come on people, we can do better than this! SHAME SHAME SHAME.
This can already be done to a large degree... see [1]. That said, this contest is about recognition of items and localization, both of which are key for the future of robotics and have little do with your surveillance state fears.
Ultimately, the thing stopping mass surveillance is not a limitation of technology, but of policy. For better or worse, the days of "they don't have the resources to do that" have been replaced by "they aren't allowed to do that".
If you have access to the raw packets going to and from every device, and the accelerometer in almost everyone's pocket, identification can be much simpler than doing full face recognition all the time.
I seriously doubt the dawn of the surveillance state will be heralded by deep neural networks recognizing faces in the streets - hardware and software backdoors on phones are cheaper and more effective.
The stated scope is object detection and image classification at large scale, so I would be interested to hear your reasoning as to why that is not applicable to mass surveillance.
It's not a stretch of the imagination to see these things being sold to airports, seaports, mass transit stations, and storefronts as a security feature. Next, your physical mail could be scanned. None of that seems politically unlikely in the current climate.
To be perfectly honest, the fact that the NSA already has access to every single packet into or out of the US (and probably most inside the states as well...) for much cheaper, with much less rollout overhead, points me away from these types of algorithms as a "tool of mass surveillance". Think Occam's razor: you would need massive political pull to put this in every tiny jurisdiction, not to mention equipment maintenance and the massive attack surface exposed by hundreds of "internet of things" devices piping data to some endpoint. The recognition results would need to be geolocated, time-tagged, and encrypted to NSA specs. To access the data it would have to go through some kind of unclass->classified firewall and get decrypted, AND they would have to keep the public in the dark, blah blah blah.
The tools already revealed for large scale surveillance are cheaper, more effective, and more robust to outside attack than the mentioned ideas. More importantly, they are already there - there is no rollout cost at all! And up until recently it was also easier to keep the public in the dark...
I do see applications at the places you mention, but for a very different reason - border inspections (coupled with human oversight) are an excellent place for automation where a small amount of effort could lead to a massive increase in throughput per person.
The only downside is that officials who deploy these things will want guarantees on effectiveness, which you can never truly give for a statistical system. Couple this with the fact that neural networks are very difficult to tune for specific false-negative and false-positive rates, and it would be a difficult sell.
One alternative would be to use these types of networks as black-box preprocessing, followed by a "tunable" algorithm like logistic regression where you could effectively control the ratio of false positives - a high rate of false positives coupled with human oversight could still lead to a large boost in human performance if most of the border inspection process is uninteresting.
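A minimal sketch of that "tunable" final stage - treating the network's outputs as opaque scores and choosing a decision threshold that guarantees a target recall (catch nearly everything suspicious), while accepting the high false-positive rate that human inspectors then filter. The function name and the data are made up for illustration:

```python
import math

def threshold_for_recall(scores, labels, target_recall):
    """Highest score threshold whose recall on `labels` meets `target_recall`.

    `scores` are opaque confidences from the black-box network;
    `labels` mark which items were genuinely positive (1) or not (0).
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1),
                       reverse=True)
    # How many true positives we must catch to hit the target recall.
    needed = math.ceil(target_recall * len(positives))
    # Flag everything scoring at or above the needed-th positive's score.
    return positives[needed - 1]

scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1,    0,    1,    0,    1,    0]   # 1 = genuinely suspicious
t = threshold_for_recall(scores, labels, target_recall=1.0)
flagged = [s >= t for s in scores]  # high recall, many false positives
```

In practice you would calibrate the threshold on a held-out set rather than the data it will run on, but the operational knob is the same: slide the threshold until the miss rate is acceptable, and let humans absorb the extra flags.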
But still there are unions... which is a whole separate issue to itself.
I remain unconvinced. Fundamentally, packets and physicalia are apples and oranges.
China, Hong Kong, Singapore, the UK, and an increasing number of cities worldwide already have massive video surveillance networks allied to local law enforcement, traffic management, and other functions. Adding another stage of image processing would help to leverage and extract actionable data from those (the classic problem of CCTV is that nobody watches it) ... with probably very little additional outlay compared to existing investment.
I can't see anything stopping this commercial progression, in fact I see it as inevitable unless the citizenry can somehow curb their politicians. Good luck with that...
It's a good point. You should get involved in the industry so you have an opportunity to shape the conversation towards better policy outcomes.
Google is hardly a monopoly here and "open data initiatives" will expand in scope soon.
It's not technology that sucks, it's how people use it. Technology is just a multiplier on people's ability to do things. It's too bad that so many people are up to no good in the first place. No need to become a Luddite though.
You misunderstand the type of algorithms being developed and tested for this challenge.
History shows us that virtually all imaging-related research is rapidly applied to military and government surveillance efforts. However general the algorithms, the direction these technologies are helping to take society seems fairly clear at this point. I do not argue that there are no good, peaceful uses, merely that the major uses are oppressive, and that the current actors in this space have neither good records on morality nor any shortage of zero-public-oversight opportunities to abuse this research to negative social ends.
Tesla was wrong. We don't always know the future applications of research.
Just FYI, the only additional data used by the GoogLeNet entry was from the classification challenge (i.e., provided by the organizers) - hardly something that would make you lose sleep at night.
I was not suggesting Google was pulling data from other sources in some sort of conspiratorial way, but rather pointing out for interest that its algorithmic superiority was weighted toward large data sets. Given the volume of data they see and store in their existing operations, I saw that as a potentially interesting correlation.
Minor editing nitpicks:
GPU, not G.P.U.; OpenCV, not Open CV.
c'mon NYT, act like you know.
"Open CV" is a mistake, but the style guide in force at the Times dictates the use of dots in abbreviations, so G.P.U. appears as their style guide requires. The Times inserts dots into C.I.A. and F.B.I. for the same reason.