Reinforcement Learning Playground

Maze Environment

The environment has 54 states, one for each cell of the grid, and 4 actions (up, down, left, right). The agent receives a reward of -0.01 if it tries to step outside the grid or onto a wall, and a reward of +10 if it reaches the goal.
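The reward rule can be sketched as a single transition function. This is a minimal illustration, not the playground's actual implementation. Only the state count, the four actions, and the two reward values come from the description above. Everything else is an assumption made for the example: the 6 x 9 layout (one way to get 54 cells), the wall and goal positions, the agent staying in place after a blocked move, a reward of 0 for ordinary moves, and the episode ending at the goal.

```python
# Assumed layout: 6 rows x 9 columns = 54 cells. Only the total (54) is given.
N_ROWS, N_COLS = 6, 9

# Action index -> (row delta, column delta): up, down, left, right.
ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

# Hypothetical wall and goal cells, chosen only for illustration.
WALLS = {(1, 2), (2, 2), (3, 2)}
GOAL = (0, 8)


def step(state, action):
    """Return (next_state, reward, done) for one move from a state index."""
    row, col = divmod(state, N_COLS)
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col

    off_grid = not (0 <= new_row < N_ROWS and 0 <= new_col < N_COLS)
    if off_grid or (new_row, new_col) in WALLS:
        # Penalty from the description; staying in place is an assumption.
        return state, -0.01, False

    next_state = new_row * N_COLS + new_col
    if (new_row, new_col) == GOAL:
        # Goal reward from the description; ending the episode is an assumption.
        return next_state, 10.0, True

    # Assumption: ordinary moves give a reward of 0.
    return next_state, 0.0, False
```

For example, step(0, 0) tries to move up from the top-left cell, so it leaves the agent in state 0 with a reward of -0.01.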
