Deep Reinforcement Learning is a waste of time

jtoy.net

21 points by snaky 6 years ago · 3 comments

beisner 6 years ago

In a lot of ways, the field has already come to this conclusion. At NeurIPS this year some of the biggest topics in Deep RL were model-based RL and meta-learning for RL, both of which aim to learn a generalized representation of an environment that can be used in a variety of downstream tasks.

MasterScrat 6 years ago

If you are not familiar with RL, I recommend first reading the two articles that the author links to:

- https://www.alexirpan.com/2018/02/14/rl-hard.html

- https://himanshusahni.github.io/2018/02/23/reinforcement-lea...

They are not so recent anymore, but they still capture the problem well.

Long story short: RL doesn't work yet. We're not sure it'll ever work. Some big companies are betting that it will.

> My own hypothesis is that the reward function for learning organisms is really driven from maintaining homeostasis and minimizing surprise.

Both directions are actively researched: maximizing surprise (to improve exploration), and minimizing surprise (to improve exploitation).

See e.g. "Exploration by Random Network Distillation" for the first and "Surprise Minimizing RL in Dynamic Environments" for the second.
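To make the "maximizing surprise" direction concrete, here is a minimal sketch of the Random Network Distillation idea mentioned above: the exploration bonus is the prediction error between a fixed, randomly initialized target network and a predictor network trained on visited states. The network sizes, observation dimension, and learning rate below are illustrative assumptions, not values from the paper.

    import torch
    import torch.nn as nn

    # RND-style intrinsic reward (sketch). Novel states are poorly
    # predicted by the predictor, so they yield a larger bonus.
    OBS_DIM, FEAT_DIM = 8, 32  # assumed dimensions for illustration

    def make_net():
        return nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                             nn.Linear(64, FEAT_DIM))

    target = make_net()            # fixed random network, never trained
    for p in target.parameters():
        p.requires_grad_(False)

    predictor = make_net()         # trained to imitate the target
    opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)

    def intrinsic_reward(obs):
        # Per-state exploration bonus: squared error between the two nets.
        with torch.no_grad():
            tgt = target(obs)
        return (predictor(obs) - tgt).pow(2).mean(dim=-1)

    def update_predictor(obs):
        # Train the predictor on visited states, shrinking their bonus
        # over time so only unfamiliar states stay "surprising".
        loss = intrinsic_reward(obs).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Usage: add the bonus to the extrinsic reward during rollout collection.
    obs_batch = torch.randn(16, OBS_DIM)          # stand-in for real observations
    bonus = intrinsic_reward(obs_batch).detach()  # exploration bonus per state
    update_predictor(obs_batch)

The surprise-minimizing direction inverts the sign of this kind of bonus: the agent is rewarded for steering toward states its model predicts well.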

w1nst0nsm1th 6 years ago

Sometimes, sending a letter is the best way to do it.

Some systems fail to even implement the concept of reward (and punishment): the agent is not 'aware' of what a reward (or 'punishment') is, and so it doesn't even know it is being rewarded (or 'punished') in the first place. Then the system has to be redesigned to optimize the code.

Sometimes AI is the least straightforward solution, the most expensive, and the least efficient in terms of results.
