Google Brain's Magenta: Multi-Style Image Transfer with Code

magenta.tensorflow.org

73 points by cinjon 9 years ago · 30 comments

salik_syed 9 years ago

As an artist I find it very frustrating when people try to apply style-transfer techniques in an attempt to emulate an artist like Picasso. It kinda works and generates a bunch of hype, but it isn't even close. The reason it's frustrating is that I think deep learning is actually capable of doing this stuff, but the people implementing it need to understand how Picasso actually did his work.

If you look at cubism, the whole idea is to capture multiple sides of a 3-dimensional object at once. A lot of art is not a "style" but rather a projection from 3D (or 4D) space to 2D space.

If you wanted to paint a "dog" in the style of Picasso your network would need to understand the geometry of a dog.

Training on a bunch of 2D before-and-after examples is underspecified.

It's important to understand that it is a mapping from 3D -> 2D ... NOT 2D -> 2D.

Another example is "Nude Descending a Staircase" by Duchamp: https://en.wikipedia.org/wiki/Nude_Descending_a_Staircase,_N...

It is a painting describing motion. To apply style transfer would be completely stupid because the point of the image is to project 4D->2D ... not to have wavy black and brown lines.

  • dmreedy 9 years ago

    Unfortunately, I feel like much current DNN work is predicated specifically on not understanding the problem, in a kind of Skinnerian rejection of GOFAI; the hope is that the signal in the data is strong enough that the statistical learning will 'understand' it for you, and all you need to worry about is tweaking hyperparameters until it clicks.

    To the point of your concern: for various, and likely numerous, reasons, this does not always seem to occur.

  • pavlov 9 years ago

    To translate that kind of conceptual aesthetic logic into an algorithm, the programmer essentially needs to become the artist: make subjective creative decisions about the style to achieve, and enshrine those into code. And (as dmreedy wrote in a sibling comment) that's specifically the kind of "old-school" AI approach the current DNN-based work is trying to avoid.

    I'm not as optimistic as you that the current statistics-driven approaches could ever reach the kind of deep analytic modeling that would be required for a style transfer system to be able to look at a Picasso and infer that there's a 3D->2D mapping at play... And it's a very interesting thought because (to me) it seems to demonstrate how far we are from actual AI that could make that kind of inventive conceptual leap.

    • salik_syed 9 years ago

      What data does an artist consider when he paints? He does a sort of optimization procedure very similar to what something like Deep Dream does. But rather than optimizing a response to make random noise more "dog-like" or "cat-like" or "human-like" (as Deep Dream does), the optimization is done to evoke a certain feeling within the artist himself, creating more extreme feelings than just a photo-realistic rendering would.
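
      (A minimal sketch of that response optimization in TensorFlow/Keras, for the curious. The model choice, class index, and hyperparameters here are my own illustrative assumptions, not anything from the Magenta post.)

        import tensorflow as tf

        # Start from random noise and ascend the gradient of one class's
        # score -- the Deep-Dream-style "make it more dog-like" loop.
        model = tf.keras.applications.VGG16(weights="imagenet")
        img = tf.Variable(tf.random.normal([1, 224, 224, 3]))
        opt = tf.keras.optimizers.Adam(learning_rate=0.05)
        target = 207  # ImageNet class 207 ("golden retriever")

        for step in range(200):
            with tf.GradientTape() as tape:
                # negative class score: minimizing it maximizes "dog-ness"
                loss = -model(img)[0, target]
            grads = tape.gradient(loss, img)
            opt.apply_gradients([(grads, img)])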

      The mapping between feelings and images is built up through experience. Certain images are fundamental to human experience and the human brain through evolution (a mother smiling, scary monsters). Others are learned (ever been hit by a car? I bet that every time you see that exact model and color of car you'll feel an emotion).

      Here's a thought experiment:

      What if we fed the deep learning "painter" tons of 3D animation? Each point in time would be a full 3D scene, labelled with emotions: "scary", "happy", "angry".

      I bet the algorithm could generate original art and learn new artistic styles by maximizing response to certain permutations of feelings.

      • visarga 9 years ago

        Learning from video has been researched in many papers. The way video data is structured allows new objects to be identified by comparing consecutive frames. This builds a "model of the physical world" that can predict the future a few time steps ahead, and it is being used to recognize activities and to help plan robotic movements.
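
        (A hedged toy version of that next-frame prediction setup in Keras; the shapes, architecture, and the frames array are illustrative assumptions.)

          import tensorflow as tf
          from tensorflow.keras import layers

          # Predict frame t+1 from frame t; the prediction error forces
          # the net to pick up object boundaries and motion.
          model = tf.keras.Sequential([
              layers.Conv2D(32, 3, padding="same", activation="relu",
                            input_shape=(64, 64, 3)),
              layers.Conv2D(32, 3, padding="same", activation="relu"),
              layers.Conv2D(3, 3, padding="same"),
          ])
          model.compile(optimizer="adam", loss="mse")

          # frames: float array of shape (n, 64, 64, 3) scaled to [0, 1]
          # model.fit(frames[:-1], frames[1:], epochs=10)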

  • lqdc13 9 years ago

    I think for cubism, the easiest way would be to create the "original" painting/image from before the cubist transformation and learn the transformation from such pairs (sketched in code below).

    Ideally you'd use several such images so as not to overfit on the specific details of one transformation.

    The other way is to get a huge dataset of cubist still-life paintings plus a dataset of still-life photographs, and learn the "average" transformation from those. Such a transformation may not generalize to other subjects, though, and might only work well with flowers/food on a table.

    Same thing with the other styles. For example (NSFW), photographs of naked women such as https://www.daniel-bauer.com/images/art_nudes/15_artistic_nu... transformed into classical paintings like https://www.google.com/culturalinstitute/beta/asset/the-birt... Here you would first identify the people as objects and then learn the transformation of both the people and the backgrounds.

    Still, the current approach works fine for things like Starry Night because of the nature of the painting.
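
    (A sketch of the paired "learn the transformation" setup from the first paragraph: an encoder-decoder fit on aligned photo/painting pairs with a pixel-wise L1 loss. All names, shapes, and hyperparameters are illustrative assumptions; contemporaneous work like pix2pix adds an adversarial loss on top.)

      import tensorflow as tf
      from tensorflow.keras import layers

      # Toy encoder-decoder mapping a photo to its "painted" version.
      net = tf.keras.Sequential([
          layers.Conv2D(64, 4, strides=2, padding="same", activation="relu",
                        input_shape=(256, 256, 3)),
          layers.Conv2D(128, 4, strides=2, padding="same", activation="relu"),
          layers.Conv2DTranspose(128, 4, strides=2, padding="same",
                                 activation="relu"),
          layers.Conv2DTranspose(64, 4, strides=2, padding="same",
                                 activation="relu"),
          layers.Conv2D(3, 3, padding="same", activation="sigmoid"),
      ])
      net.compile(optimizer="adam", loss="mae")  # L1 between output and target

      # photos, paintings: aligned arrays of shape (n, 256, 256, 3) in [0, 1]
      # net.fit(photos, paintings, epochs=50)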

  • paulsutter 9 years ago

    So how about showing us a demo of what you mean? (from the playbook wherein "he who criticizes, volunteers")

    Yes, most people working on deep learning are making small incremental improvements, and yes it's a little tiresome to see each one trumpeted as some big advance.

    But it's really hard to make fundamental advances. Which shouldn't stop you from working on it.

    • salik_syed 9 years ago

      Agree that it is easy to be a critic ... I am indeed working on it :)

      That being said -- my goal is to inform of a better approach rather than criticize.

gabipurcaru 9 years ago

Why is everyone working on style transfer? It doesn't seem like such an interesting problem in the field, compared to things like speech recognition for example. Is it just because it's a "cracked" problem and it looks nice? I'm just genuinely curious here, not trying to bash the amazing work these people do.

  • visarga 9 years ago

    Style transfer is part of a new trend concerned with generating content. It is very difficult to generate images or text because the space of possible shapes/messages is vast and high-dimensional. We know how to classify into 1000 categories (which corresponds to generating tags from a set of 1000 choices), but painting requires selecting a combination of pixels from a much, much higher-dimensional space. Hence the difficulty.
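
    (To put rough numbers on that gap -- my own back-of-envelope, not figures from the post:)

      import math

      # A 1000-way classifier picks from 10^3 labels, while even a small
      # 64x64 RGB image has 256^(64*64*3) possible pixel settings.
      digits = 64 * 64 * 3 * math.log10(256)
      print(f"~10^{digits:.0f} possible 64x64 images")  # ~10^29592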

    But I think that generating in high-dimensional spaces, as in translation, style transfer, gameplay, and robotics, is the most interesting part of AI. It is what makes AI appear more intelligent and creative to us. AlphaGo was impressive because it could select move sequences from a space of 10^120 possible combinations (compare that with an ImageNet classifier that outputs from a space of 10^3 labels).

    So, in conclusion, it is essential to learn to generate images, text, sounds, and behavior or movement that are just as complex and coherent as those created by humans. Being able to do that would mean we're halfway to AGI; we could have talking, moving robots that are not lame. Remember the latest text-to-speech engine from DeepMind: that's speech generation from a high-dimensional space, and it shows the difference compared to regular TTS.

  • joefkelley 9 years ago

    I don't think anybody is taking it extremely seriously. For Google, it's PR. For individuals working on it, it's fun, interesting, and accessible.

  • feelix 9 years ago

    Simply put, it's because apps like Prisma have demonstrated that there are hundreds of millions of people who want this. So developers are following the market demand.

  • dorianm 9 years ago

    If you can change the style of an image to anybody's style I guess you could:

    - take photos and apply the styles of famous photographers

    - take your writing and apply the styles of famous writers

    - take your code and apply the style of famous coders

    etc.

    • xamuel 9 years ago

      I'd like you to be right, but I don't think you are.

      There's a big difference between style transfer in art vs. literature or code. In art, it's OK to get close enough; laymen will forgive a lot of noise. A lossy painting is still a painting.

      With great literature, every word is carefully chosen. You can't take something like Franz Kafka and randomly fuzz it; you'll destroy the hidden features that differentiate it from the mediocre.

      With code it's even harder. There's almost zero room for noise; a stray period throws it completely off.

      • Houshalter 9 years ago

        There's some recent work on style transfer for sentences. The way someone says something can vary hugely between individuals, even when the meaning of the sentence is the same. The hard part is separating meaning from style, which requires a dataset of different sentences with the same meaning. Translation datasets are one possible solution: different translators have unique styles and word choices that an NN could learn to separate.
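
        (A hedged sketch of one way to factor meaning from style: a seq2seq model whose decoder is conditioned on a learned per-translator style embedding. Every name, shape, and design choice below is an illustrative assumption.)

          import tensorflow as tf
          from tensorflow.keras import layers

          vocab, dim, n_styles = 5000, 128, 4

          src = layers.Input(shape=(None,), dtype="int32")     # source sentence
          style_id = layers.Input(shape=(), dtype="int32")     # which translator
          tgt_in = layers.Input(shape=(None,), dtype="int32")  # shifted target

          enc = layers.Embedding(vocab, dim)(src)
          _, h, c = layers.LSTM(dim, return_state=True)(enc)   # "meaning" state

          style = layers.Embedding(n_styles, dim)(style_id)    # "style" vector
          h = layers.Add()([h, style])                         # inject style

          dec = layers.Embedding(vocab, dim)(tgt_in)
          out = layers.LSTM(dim, return_sequences=True)(dec, initial_state=[h, c])
          logits = layers.Dense(vocab)(out)

          model = tf.keras.Model([src, style_id, tgt_in], logits)
          model.compile("adam",
                        tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))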

  • bertiewhykovich 9 years ago

    Because it's a way to avoid confronting the increasingly unavoidable fact that the AI renaissance DNNs were supposed to usher in is looking less and less impressive. Unsurprising, given that throwing more computing power at neural networks doesn't constitute a fundamental leap forward -- but disconcerting to a community that expected, and promised, far more than is being delivered.

    • Florin_Andrei 9 years ago

      Hold on a second. We're still in the very, very early stages here. We haven't even started to connect those networks together to make hierarchies.

      You're speaking like someone watching the Wright brothers testing some of their earliest models, and going "supersonic flight my ass, you guys can't even fly across this football field".

      • therein 9 years ago

        > We haven't even started to connect those networks together to make hierarchies.

        What's stopping us at this point?

        • zardo 9 years ago

          Nothing, but it might not be the right approach.

          We didn't progress from the Wright Flyer by stacking more and more wings on. (Although that path was explored for a couple of decades.)

    • Houshalter 9 years ago

      What exactly do you think was 'promised and expected'? Because from here it looks like deep learning has delivered an awful lot more than anyone expected. No one expected it to beat the best humans at Go. No one expected it to achieve human-level results on problems like image recognition. And no one expected all this to happen in just a few years.

      NNs have made measurable and enormous progress in many different AI domains in a very short space of time. There are awesome new applications and improvements coming out every day.

      It's easy to say, from the vantage point of hindsight bias, that everything that's happened was predictable. So what exactly do you expect from NNs and AI in the near future? Make some testable predictions.

      • espadrine 9 years ago

        I actually agree with you, as I feel that deep neural networks have exceeded expectations, but I like the guessing game, so I'll do a few predictions that, who knows, might be exceeded.

        Fully autonomous vehicles (as in, all passengers can sleep) with fewer deaths than human drivers, in 2020.

        Realtime text-to-speech matching top humans, including proper intonation, in 2025.

        Fully autonomous computer factories (as in, trucks deliver raw materials in containers at one location, and fetch the computers in containers at another) in 2035.

    • AlexCoventry 9 years ago

      Right, making the best Go player in the world and cutting Google's data-center cooling bill by 40% were huge yawns.

      • bertiewhykovich 9 years ago

        Optimization problems -- the bread and butter of machine learning for years. DNNs are certainly more powerful than many earlier-generation systems, but it's a quantitative difference, not a qualitative one. A DNN may have more neurons, more synapses, and access to more data, but it's not doing anything genuinely new.

        A lot of hopes seem (to me) to have been pinned on the notion that neural nets (as we currently understand them) are the one true algorithm. This notion seems to have been fueled by the significant success of DNNs for certain (highly specific) problems, and by a (shallow) analogy with the human brain. However, it's becoming increasingly clear that this is not the case -- that an artificial neural net is an artificial neural net, no matter how many GPUs you throw at it.

        • choxi 9 years ago

          From what I understand, the current bottlenecks for machine learning are:

          - The lack of good data. Machine learning, and DNNs specifically, performs best with large labeled datasets. Google has open-sourced some, but they (supposedly) keep the vast majority of their training data private.

          - Compute resources. Training on these datasets (which can be terabytes in size) takes a lot of computational power, and only the largest tech companies (e.g. Google, Facebook, Amazon) have the capital to invest in it. Training a neural net can take a solo developer weeks or months, while Google can afford to do it in a day.

          There are actually a lot of advances being made in the algorithms, but iteration cycles are long because of these two bottlenecks, and only large tech companies and research institutions have the resources to overcome them. Web development didn't go through a renaissance until web technology became affordable and accessible to startups and hobbyists through reduced server costs (via EC2 and PaaS's like Heroku).

          By that analogy, I think we're still in the early days of machine learning and better developer tools and resources could spur more innovation.

        • AlexCoventry 9 years ago

          I don't have the impression that serious researchers regard them as a One True Algorithm, or as sufficient in their own right for development of human-level AI. Why do you believe that?

          • bertiewhykovich 9 years ago

            I'm not claiming that they do, although AI researchers who focus on DNNs certainly have a vested interest in accentuating their capabilities -- particularly when they have industry ties. I'm referring more to intellectual trends in Silicon Valley at large.

    • Teodolfo 9 years ago

      Who promised the renaissance?! We should put them in the stocks and throw overripe fruit at them!

bcheung 9 years ago

Not sure how related this is but seems like the right crowd to ask...

As a photographer who also programs full time, I've been wondering what it would take to synthesize skin texture to remove imperfections: small scars, wrinkles, etc. Currently I just use the healing brush in Photoshop, but I'm wondering whether ML could do it automatically.

Does anyone have any recommendations on what sub-fields or papers I could read to get a better idea of what would be involved to create a solution like that?

  • whataretensors 9 years ago

    Sounds like a problem for inpainting.

    Here's a paper on synthesizing human faces; it includes inpainting: http://www.faculty.idc.ac.il/arik/seminar2009/papers/VisioFa...

    https://arxiv.org/pdf/1604.07379.pdf uses a GAN to inpaint with arbitrary data. This is probably a couple of iterations away from being easy to implement, as training GANs efficiently and accurately is still a technical challenge.
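
    (Before reaching for GANs, classical inpainting already handles small blemishes well. A minimal OpenCV sketch -- the file names and hard-coded blemish location are illustrative assumptions:)

      import cv2
      import numpy as np

      img = cv2.imread("portrait.jpg")
      mask = np.zeros(img.shape[:2], dtype=np.uint8)
      cv2.circle(mask, (120, 80), 6, 255, -1)  # mark a small scar to remove

      # Telea's fast-marching method fills the masked region inward
      # from its border, using the surrounding skin
      result = cv2.inpaint(img, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
      cv2.imwrite("retouched.jpg", result)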

  • shostack 9 years ago

    Not fully automated with ML, but as a fellow photographer I think you'll find tutorials on frequency separation in Photoshop helpful and relevant.

    Now if only that existed in Lightroom, so I wouldn't need a massive PSD and could keep my nice and tiny .dng files.
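
    (For the curious, the frequency-separation idea maps to a few lines of NumPy/OpenCV: blur for the low frequencies (tone/color), subtract for the highs (texture), retouch the low layer, recombine. This is my rough reading of the Photoshop technique, not Adobe's actual algorithm.)

      import cv2
      import numpy as np

      img = cv2.imread("portrait.jpg").astype(np.float32)
      low = cv2.GaussianBlur(img, (0, 0), sigmaX=8)  # tone and color
      high = img - low                               # texture and detail

      # Retouch only the low-frequency layer (e.g. even out blotchy
      # color), then add the untouched texture back on top.
      low_smoothed = cv2.GaussianBlur(low, (0, 0), sigmaX=4)
      result = np.clip(low_smoothed + high, 0, 255).astype(np.uint8)
      cv2.imwrite("freq_sep.jpg", result)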
