Deep Learning Is Easy – Learn Something Harder (inference.vc)
IMO this 2016 article really hasn't aged well. It turned out that architectural improvements really did matter, and there was still plenty of low-hanging fruit. His prediction that Bayesian approaches (the topic of his PhD) would prove fundamental has not been borne out so far (although they do have their place).
(I think in general when people say that their special area of study is particularly important, it should be taken with a grain of salt!)
> It turned out that architectural improvements really did matter
Transformer (2017). Diffusion via non-equilibrium thermodynamics (2015). Score-based diffusion (2019). DDPM (2020) uses U-Net (2015). CLIP (2021) uses ResNet (2015) + ViT (2020). Stable Diffusion also uses U-Net.
Yes, deep learning is easy. Throw compute at it and you get the answer.
Those with insights into old problems are now churning out papers, because they can throw compute and deep learning at the problems they already understood well.
You can look into almost any paper and see inspiration from old methods. Applying deep learning is hard; deep learning itself is quite easy. If it were really hard, it wouldn't have become so popular! Nowadays most people don't even bother chasing architectural gains.
Sure, if a better architecture eventually comes along, people will throw that into the compute too. I don't believe in the Bayesian stuff either, but it is worth learning; otherwise you will miss the insights behind a lot of papers!
Whatever you do, having the ideas to understand the problem matters most. The deep learning and the compute-throwing come later.
I'm sure there's an argument to be made that all architectural improvements to basic feed-forward neural nets are essentially optimizations for the amount of compute available anyway.
If we had unlimited compute and time for training, I don't think we would have really moved on from dense feed-forward nets.
This is terrible and, frankly, self-serving advice.
"Don't build on top of deep learning. Build on top of MCMC-like methods"
I used to do research into such methods. That game is over; it's a massive waste of time at the moment. The whole question now is how to import what was good about those methods into the modern deep learning toolkit: how do I sample from a distribution with DL? How do I get uncertainty estimates? How do I compose models, get disentangled representations, get few-shot learning, and so on?
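To make the uncertainty question concrete: here is a minimal sketch of MC dropout, one standard DL-native way to get (approximate) uncertainty estimates. It assumes PyTorch, and the model, shapes, and sample count are all illustrative.

    import torch
    import torch.nn as nn

    # Toy regressor with dropout; any dropout-bearing net works the same way.
    model = nn.Sequential(
        nn.Linear(16, 64), nn.ReLU(), nn.Dropout(p=0.1),
        nn.Linear(64, 1),
    )

    def mc_dropout_predict(model, x, n_samples=100):
        model.train()  # keep dropout ACTIVE at inference time
        with torch.no_grad():
            preds = torch.stack([model(x) for _ in range(n_samples)])
        # Spread across stochastic forward passes ~ predictive uncertainty.
        return preds.mean(dim=0), preds.std(dim=0)

    mean, std = mc_dropout_predict(model, torch.randn(8, 16))

It's a crude approximation to a Bayesian posterior, but it bolts onto an existing deep net almost for free, which is exactly the point.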
The idea that people should go back to tools like MCMC today is pretty absurd. That entire research program was a failure and never scaled to anything. I say this about my own many dozens of papers in the area, too.
I would never give this advice to my PhD students.
Maybe in a decade or two someone will rescue MCMC-like methods. In the meantime your PhD students will suffer by being irrelevant and having skills that no one needs.
You dismissed most of modern computational finance and a lot of the cross-valuation-adjustment mechanisms that are behind the modern financial system and its risk estimates.
Edit: not to mention modern statistical mechanics…
Which then cycles back to ML methods. Diffusion models are non-equilibrium stat mech.
Sure, building a better HMC algorithm might not be a good use of time, but the spirit of MCMC is alive.
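For reference, the core of classic MCMC is tiny. A toy random-walk Metropolis sampler, where the target density and step size are made up purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    def log_target(x):
        # Unnormalised log-density of a toy bimodal target.
        return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

    def metropolis(n_steps=10_000, step=1.0):
        x, samples = 0.0, []
        for _ in range(n_steps):
            proposal = x + step * rng.normal()
            # Accept with probability min(1, p(proposal) / p(x)).
            if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
                x = proposal
            samples.append(x)
        return np.array(samples)

    samples = metropolis()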
Yep. Capture the distribution you want to sample from by training a DNN on the data, then sample from it via a generative/diffusion model. Basically, the DNN is a learnt parametrised distribution and replaces the hand-crafted Markov chain. The adversarial training approach seems to me like a forward-backward algorithm over the dataset anyway.
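For instance, here is a toy sketch of DDPM-style ancestral sampling (the schedule, shapes, and the stand-in eps_model are illustrative; PyTorch assumed). The learnt noise predictor defines the transition kernel, so the reverse process is still literally a Markov chain, just a learnt one:

    import torch

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)   # noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    def eps_model(x, t):
        # Stand-in for a trained noise-prediction network (e.g. a U-Net).
        return torch.zeros_like(x)

    def sample(shape):
        x = torch.randn(shape)               # start the chain from pure noise
        for t in reversed(range(T)):
            eps = eps_model(x, t)            # predicted noise at step t
            mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) \
                   / torch.sqrt(alphas[t])
            noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
            x = mean + torch.sqrt(betas[t]) * noise  # one learnt Markov transition
        return x

    x0 = sample((4, 2))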
But does this validate OP's claim?
My general belief is that the best way to learn at the frontier of something is to pick a problem or a goal and try to solve it. Then you will learn what is in the way of getting it done.
Unless you already have deep expertise, I think it's a bad idea to pick a research area and just go and research that. You won't have intuition about why it's a good thing to research. However, you can have intuition about real-world problems and the solutions you want to see, and then work backwards to what you need to research.
Sometimes, when I try to solve a really hard problem, rather than solving it directly I come to the conclusion that it's better to do it a different way, one where I don't run into the hard problem at all.
The art of a scientist is knowing which problems to pick.
Too easy and you won’t find anything new, too hard and you won’t make any progress.
What's the difference between a "problem" and a "research area"?
Research areas are made of problems to solve.
Agree, the distinction is fuzzy. I meant, instead of picking a research area because it seems cool/interesting/etc., either:
1. pick a product or user problem and try to solve that problem, then work back into the research you need to do so, or
2. pick a research problem and try to solve that (again, focusing on the problem, not just an area of study).
The blog post seemed to be more for people doing applied things, so I was speaking to those doing option 1.
> focusing on the problem, not just an area of study
How exactly do you plan on picking a problem that is both unsolved and interesting without "picking a research area"? Solving an already-solved problem is a waste of your time (reviewer #2 will just point out the relevant missing citation). Solving an uninteresting problem is a waste of everyone's time: it will take someone a while to figure out what you've actually done and that it's useless.
This is why generic platitudes like this are worse than useless: you're giving the appearance of high wisdom that's sure to lead some naive kid astray. For example, the bulk of a PhD is not actually solving some problem but finding the right problem to solve (one that is interesting and unsolved).
I probably should not have mentioned the academic use case. My advice is better suited to people trying to solve real-world problems, as opposed to improving the theory. Not that improving the theory doesn't later lead to massive real-world solutions!
I feel very strongly that if someone wants to apply ML to accomplish something outside of academia, they should think about an applied use case and then work backwards to what they need to learn. Otherwise, you will have a solution in search of a problem.
If someone wants to go into academia or theory, then yes, I think you're right: they need to pick a research area. But then I think the goal should be to get to the problem space as soon as possible. I think it would be suboptimal to decide "I want to improve Bayesian ML" without first deciding the "why", such as: "I want to make ML models more understandable." And yeah, maybe you need to do a little research first to know what the problems-to-solve are.
Per all the above, I never went into academia, so take my opinion there with a grain of salt. I have worked exclusively at startups and co-founded one, so take my opinion there with two grains of salt :)
It is interesting to look back and evaluate the preferences/intuitions prominent researchers had in the field (most of whom started their careers experimenting with MNIST-scale data, at best).
With access to unfathomable amounts of data, especially over the last couple of years, the game has changed entirely, and it doesn't seem to be cooling down anytime soon.
The field certainly values engineering a lot more than it used to, and it is exciting to see where major advances, together with open-source contributions, are going to take us.
> It is interesting to look back and evaluate the preferences/intuitions prominent researchers had in the field (most of whom started their careers experimenting with MNIST-scale data, at best)
Look back to how HN greeted the victory of the SuperVision team (Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton) in the 2012 ImageNet Large Scale Visual Recognition Challenge - https://news.ycombinator.com/item?id=4611830
We used to do feature engineering, then we moved to architecture engineering, and the future is dataset & prompt engineering. All models learn more or less the same things, given the same training budget and dataset; better data makes a better model.
Same for humans: the better the education and the more advanced the science, the more we achieve with the same brains. The key ingredient is ideas, both for humans and for AI.
Maybe if you stick to purely theoretical stuff, it's easy. Actually building real systems that work and add value using deep learning isn't easy. There are so many gotchas.
https://www.inference.vc/we-may-be-surprised-again/
The author wrote this article a month ago, and in it he discusses where his thinking about DL was wrong.
[2016]
Thanks, I didn't notice the year until your comment. I wonder what his thoughts on it are now.
Pretty straightforward case of the curse of knowledge (https://en.wikipedia.org/wiki/Curse_of_knowledge), in my opinion.