Pearl: A Production-Ready Reinforcement Learning Agent

github.com

73 points by da4id 2 years ago · 9 comments

DennisP 2 years ago

> prioritize cumulative long-term feedback over immediate feedback and can adapt to environments with limited observability, sparse feedback, and high stochasticity

Sounds like something that could learn to play decent poker.
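To make the quoted "cumulative long-term feedback over immediate feedback" concrete, here is a minimal sketch of a discounted return, the quantity an RL agent typically tries to maximize. The function name, the reward sequences, and the discount factor are illustrative assumptions and are not taken from Pearl.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards, discounting each one by gamma per step of delay."""
    total = 0.0
    # Work backwards so each step folds in the discounted future value.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequences: a small payoff now vs. a larger payoff later.
greedy = [1.0, 0.0, 0.0, 0.0]
patient = [0.0, 0.0, 0.0, 5.0]

greedy_value = discounted_return(greedy)
patient_value = discounted_return(patient)
print(greedy_value, patient_value)
```

With these made-up numbers the delayed payoff still has the higher return, so an agent that optimizes this quantity would prefer the patient sequence even though it earns nothing at first.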

catlover76 2 years ago

Sorry for the dumb question, but can someone ELI5 what one is supposed to do with this? How does it fit into the world of fine-tuning, function calling, etc?

  • adastra22 2 years ago

    This is not an LLM.

    This is an AI in maybe the more traditional/popsci sense. A digital robot. It not only understands its perceptions (like an LLM), but also acts on that understanding to achieve its goal(s).

    The reinforcement learning aspect is simply how it learns its goals. It takes a database of "good bot" / "bad bot" feedback and associated context, and implicitly learns what it should do.
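As a rough illustration of learning from "good bot" / "bad bot" feedback, the sketch below averages logged rewards per (context, action) pair and picks the best-scoring action for each context. This is a deliberately simplified, bandit-style toy, not Pearl's API; the function name, the log format, and the example contexts and actions are all assumptions made up for illustration.

```python
from collections import defaultdict


def learn_policy(feedback_log):
    """Return the action with the highest mean reward for each context."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for context, action, reward in feedback_log:
        totals[(context, action)] += reward
        counts[(context, action)] += 1

    best = {}  # context -> (action, mean reward)
    for (context, action), total in totals.items():
        mean = total / counts[(context, action)]
        if context not in best or mean > best[context][1]:
            best[context] = (action, mean)
    return {context: action for context, (action, _) in best.items()}


# Hypothetical log of (context, action, reward): +1 is "good bot", -1 is "bad bot".
log = [
    ("user_greets", "greet_back", 1.0),
    ("user_greets", "ignore", -1.0),
    ("user_asks", "answer", 1.0),
    ("user_asks", "greet_back", -1.0),
    ("user_asks", "answer", 1.0),
]
policy = learn_policy(log)
print(policy)
```

A real agent would also have to explore untried actions and account for delayed rewards; this toy only shows the "implicitly learns what it should do" step from logged feedback.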

  • bwanab 2 years ago

    Reinforcement learning is not an LLM. It's the technology used in industrial applications (walking and acting robots, etc.) and game playing (e.g., AlphaGo for Go, AlphaZero for chess, etc.).

syngrog66 2 years ago

unwise name

B1FF_PSUVM 2 years ago

They missed spelling it 'perla' on purpose?
