Deep Reinforcement Learning is a waste of time
jtoy.net

In a lot of ways, the field has already come to this conclusion. At NeurIPS this year some of the biggest topics in Deep RL were model-based RL and meta-learning for RL, both of which aim to learn a generalized representation of an environment that can be used in a variety of downstream tasks.
If you are not familiar with RL, I recommend first reading the two articles that the author links to:
- https://www.alexirpan.com/2018/02/14/rl-hard.html
- https://himanshusahni.github.io/2018/02/23/reinforcement-lea...
They are not so recent anymore, but they still capture the problem well.
Long story short: RL doesn't work yet. We're not sure it'll ever work. Some big companies are betting that it will.
> My own hypothesis is that the reward function for learning organisms is really driven from maintaining homeostasis and minimizing surprise.
Both directions are actively researched: maximizing surprise (to improve exploration), and minimizing surprise (to improve exploitation).
See e.g. "Exploration by Random Network Distillation" for the first and "Surprise Minimizing RL in Dynamic Environments" for the second.
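To make the exploration variant concrete, here is a minimal sketch of the Random Network Distillation idea, assuming PyTorch; the network sizes, names, and training loop are illustrative placeholders, not taken from the paper:

```python
# Sketch of RND-style intrinsic reward: a fixed random "target" network and a
# "predictor" trained to imitate it. States the predictor handles poorly are
# novel, so the prediction error serves as an exploration bonus.
import torch
import torch.nn as nn

obs_dim, feat_dim = 8, 32  # illustrative dimensions

def make_net():
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))

target = make_net()      # fixed, randomly initialized
predictor = make_net()   # trained to match the target's outputs
for p in target.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs):
    # Per-state prediction error; detach before adding to the extrinsic reward.
    return (predictor(obs) - target(obs)).pow(2).mean(dim=-1)

# One training step on a batch of observations (random stand-ins here):
obs = torch.randn(16, obs_dim)
loss = intrinsic_reward(obs).mean()  # minimizing this makes visited states "boring"
opt.zero_grad(); loss.backward(); opt.step()
```

The exploitation-oriented direction flips the sign: the agent is rewarded for keeping its observations predictable, i.e. for minimizing the same kind of surprise.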
Sometimes, sending a letter is the best way to do it.
Some systems fail to even implement the concept of reward (and punishment): the agent is not 'aware' of what a reward (or a 'punishment') is, and so it doesn't even know it is being rewarded (or 'punished') in the first place. The system then has to be redesigned before there is anything to optimize.
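For what it's worth, in standard RL the agent never has an explicit concept of reward at all; the scalar only enters through the update rule. A toy tabular Q-learning sketch (the environment and hyperparameters here are placeholders) makes that concrete:

```python
# Toy tabular Q-learning: the reward r exists for the agent only as a term in
# the TD update below. There is no separate "awareness" of being rewarded.
import random

n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, eps = 0.1, 0.99, 0.1  # learning rate, discount, exploration rate

def step(s, a):
    """Stand-in environment: random transitions, reward 1 only in the last state."""
    s2 = random.randrange(n_states)
    return s2, 1.0 if s2 == n_states - 1 else 0.0

s = 0
for _ in range(1000):
    if random.random() < eps:
        a = random.randrange(n_actions)           # explore
    else:
        a = max(range(n_actions), key=lambda i: Q[s][i])  # exploit
    s2, r = step(s, a)
    # The only place the reward touches the agent: the TD target.
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
    s = s2
```

If the reward signal is miswired or never reaches this update, nothing in the agent will ever reflect it, which is exactly the redesign problem described above.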
Sometimes AI is the least straightforward solution, the most expensive one, and the least efficient in terms of results.