Outcome-Based Reinforcement Learning to Predict the Future

arxiv.org

99 points by bturtel 10 months ago · 16 comments

ctoth 10 months ago

Do you want paperclips? Because this is how you get paperclips!

Eliminate all agents, all sources of change, all complexity - anything that could introduce unpredictability, and it suddenly becomes far easier to predict the future, no?

valine 10 months ago

So instead of next token prediction it's next event prediction. At some point this just loops around and we're back to teaching models to predict the next token in the sequence.

  • lumost 10 months ago

    Tokens are an awfully convenient way to describe an event.

  • ww520 10 months ago

    It’s the next state. So instead of spitting out words, it will spit out a whole movie, or a sequence of world states in a game or simulation.
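
A minimal sketch of lumost's point that an "event" or world state still reduces to a token sequence once serialized (the event schema here is made up for illustration):

    import json

    # Hypothetical structured events; once flattened to text, "next event
    # prediction" is again next-token prediction over the serialization.
    events = [
        {"t": 0, "actor": "fed", "action": "hold_rates", "value": 5.25},
        {"t": 1, "actor": "market", "action": "sp500_close", "value": 5321.4},
    ]

    def serialize(event: dict) -> str:
        # Sort keys so the same event always maps to the same text/tokens.
        return json.dumps(event, sort_keys=True)

    corpus = "\n".join(serialize(e) for e in events)
    print(corpus)  # ready for any tokenizer; the event boundary is just a newline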

jldugger 10 months ago

From the abstract

> A simple trading rule turns this calibration edge into $127 of hypothetical profit versus $92 for o1 (p = 0.037).

I'm lazy: is this hypothetical shooting fish in a barrel, or is it a real edge?

  • nyrikki 10 months ago

    Note the 'hypothetical profit' part. I know of several groups looking for opportunities to skim off LLM traders by exploiting their limited sensitivity, limited expressiveness, and loss of tail data.

    Predictive AI is problematic no matter what tool you use. Great at demoware that doesn't deliver.

    I am sure there are use cases, but it would be augmentation, not a reliable approach by itself.
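
For context on the trading rule quoted above: one plausible reading (a hypothetical sketch, not necessarily the paper's rule) is to buy a binary contract when the model's probability exceeds the market price and sell when it is below, which is how a calibration edge becomes hypothetical profit:

    # Hypothetical calibration-edge rule on binary contracts that pay $1 if the
    # event resolves yes, $0 otherwise. Numbers below are made up; the paper's
    # actual rule and figures may differ.

    def pnl(model_p: float, market_price: float, outcome: int, stake: float = 1.0) -> float:
        if model_p > market_price:        # contract looks underpriced: buy
            return stake * (outcome - market_price)
        if model_p < market_price:        # contract looks overpriced: sell short
            return stake * (market_price - outcome)
        return 0.0

    # (model probability, market price, resolved outcome) -- illustrative only
    trades = [(0.72, 0.60, 1), (0.30, 0.45, 0), (0.55, 0.58, 1)]
    total = sum(pnl(p, price, y) for p, price, y in trades)
    print(f"hypothetical P&L per $1 staked: {total:.2f}")

The "hypothetical" qualifier matters: a backtest like this ignores spread, slippage, and liquidity, which is one reason the caution above applies.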

amelius 10 months ago

Why would you use RL if you're not going to control the environment, but just predict it?
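
Going only off the title, one reading is that the resolved outcome itself is the reward, so the "environment" is just question, forecast, resolution. A minimal sketch of such an outcome-based reward (a Brier-style proper scoring rule; an assumption here, not confirmed from the paper):

    # Hypothetical outcome-based reward: the model emits a probability, the
    # question later resolves, and the reward is a proper scoring rule on that
    # resolution (negative Brier score, by assumption).

    def reward(predicted_p: float, resolved_outcome: int) -> float:
        return -(predicted_p - resolved_outcome) ** 2

    print(reward(0.9, 1))  # -0.01: confident and right
    print(reward(0.9, 0))  # -0.81: confident and wrong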

garbagecoder 10 months ago

"a couple of wavy lines"

bzzzzz "sorry this isn't your lucky day"
