The Maze environment is a 2-D grid world embedded in a 3-D simulation. Grid locations correspond to potential intersections in the maze. Each maze is randomly generated and contains no loops, so a single shortest path leads to the goal. Various search and learning agents are implemented in the Maze environment, as well as a first-person control mode.
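To illustrate what "randomly generated, no loops" can mean in practice, here is a minimal Python sketch that carves a loop-free maze with a randomized depth-first search. This is not necessarily how OpenNERO builds its mazes; the function name `generate_maze`, the passage representation, and the 8x8 default are assumptions made for the example.

```python
import random

def generate_maze(size=8, seed=None):
    """Return the set of open passages between adjacent grid cells.

    Each passage is a frozenset of two (row, col) cells. A randomized
    depth-first search carves a spanning tree, so any two cells are
    joined by exactly one path and the maze contains no loops.
    (Hypothetical helper, not part of OpenNERO.)
    """
    rng = random.Random(seed)
    start = (0, 0)
    visited = {start}
    passages = set()
    stack = [start]
    while stack:
        r, c = stack[-1]
        # Unvisited neighbours inside the grid, in N, S, W, E order.
        options = [
            (r + dr, c + dc)
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
            if 0 <= r + dr < size and 0 <= c + dc < size
            and (r + dr, c + dc) not in visited
        ]
        if not options:
            stack.pop()  # dead end: backtrack to the previous cell
            continue
        nxt = rng.choice(options)
        passages.add(frozenset(((r, c), nxt)))  # knock down the wall
        visited.add(nxt)
        stack.append(nxt)
    return passages

maze = generate_maze(size=8, seed=0)
```

A spanning tree over 64 cells always has 63 passages, which is a quick way to check that the result is connected and loop-free.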
What the display means
In the search demo:
- Red cube - goal position (opposite of the starting position)
- Yellow marker - next location the agent is going to
- Blue marker - past locations the agent has already expanded, i.e. whose successors have already been generated
- Green markers - generated (but not yet expanded) locations the agent may return to
- White markers - found path
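The marker colors above map onto the bookkeeping of a typical graph search. The following sketch, which assumes A* with a Manhattan-distance heuristic and a hand-written 3x3 maze, returns the three sets the display distinguishes: expanded locations (blue), generated but not yet expanded locations (green), and the found path (white). The function `a_star` and the `passages` representation are illustrative and are not taken from the OpenNERO code.

```python
import heapq

def a_star(passages, start, goal):
    """Return (expanded, frontier, path) for a grid maze.

    `passages` is a set of frozensets, each holding two adjacent cells
    that are connected (an assumed representation for this sketch).
    """
    def neighbours(cell):
        r, c = cell
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (r + dr, c + dc)
            if frozenset((cell, nxt)) in passages:
                yield nxt

    def heuristic(cell):
        # Manhattan distance never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    parent = {start: None}
    cost = {start: 0}
    open_heap = [(heuristic(start), start)]
    expanded = set()
    found = False
    while open_heap:
        _, current = heapq.heappop(open_heap)  # "yellow": next location
        if current == goal:
            found = True
            break
        if current in expanded:
            continue  # stale heap entry
        expanded.add(current)  # "blue": successors are generated below
        for nxt in neighbours(current):
            new_cost = cost[current] + 1
            if nxt not in cost or new_cost < cost[nxt]:
                cost[nxt] = new_cost
                parent[nxt] = current
                heapq.heappush(open_heap, (new_cost + heuristic(nxt), nxt))

    path = []
    if found:
        cell = goal
        while cell is not None:  # "white": walk parents back to start
            path.append(cell)
            cell = parent[cell]
        path.reverse()
    frontier = set(cost) - expanded - {goal}  # "green"
    return expanded, frontier, path

# A small loop-free 3x3 maze written out by hand for the example.
edges = [((0, 0), (0, 1)), ((0, 1), (0, 2)), ((0, 2), (1, 2)),
         ((0, 0), (1, 0)), ((1, 0), (2, 0)), ((2, 0), (2, 1)),
         ((2, 1), (2, 2)), ((2, 1), (1, 1))]
passages = {frozenset(e) for e in edges}
expanded, frontier, path = a_star(passages, (0, 0), (2, 2))
```

Depth-first and breadth-first search keep the same three sets; they differ only in which generated location is picked next (a stack or a queue instead of the priority queue).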
In the Q-learning demo:
- Yellow cube - location of a (discrete) state
- Blue cube - Q-value of an action in that state. The direction from the yellow cube identifies the direction of the action (NSWE), and the distance from the yellow cube corresponds to the Q-value: the further away the blue cube is, the higher the corresponding Q-value.
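As a rough illustration of this encoding, the sketch below places one marker per action, offset from the state position along the action's direction by a distance that grows with the Q-value. The scaling used by OpenNERO is not documented here, so the min-max normalization, the `max_offset` parameter, and the function name `q_marker_positions` are all assumptions.

```python
# Unit vectors for the four actions in a 2-D top-down view (assumed axes).
DIRECTIONS = {"N": (0.0, 1.0), "S": (0.0, -1.0),
              "W": (-1.0, 0.0), "E": (1.0, 0.0)}

def q_marker_positions(state_xy, q_values, max_offset=1.0):
    """Map each action's Q-value to a marker position near the state.

    The highest Q-value is drawn `max_offset` away from the state and
    the lowest is drawn on top of it (an assumed scaling).
    """
    lo = min(q_values.values())
    hi = max(q_values.values())
    span = hi - lo
    positions = {}
    for action, q in q_values.items():
        # Normalize to [0, 1]; if all values are equal, use the midpoint.
        scale = (q - lo) / span if span > 0 else 0.5
        dx, dy = DIRECTIONS[action]
        positions[action] = (state_xy[0] + dx * scale * max_offset,
                             state_xy[1] + dy * scale * max_offset)
    return positions

markers = q_marker_positions(
    (2.0, 3.0), {"N": 0.0, "S": 0.5, "W": 0.25, "E": 1.0})
```

In this example the "E" marker ends up furthest from the state, which is how the display would signal that moving east currently has the highest Q-value.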
The controls can be easily redefined, but in general, the following keys should work:
- F1 - help (opens the browser to show this page)
- A - move camera left
- D - move camera right
- W - move camera forward
- S - move camera back
- Q - pan camera left
- E - pan camera right
- R - tilt camera up
- F - tilt camera down
- space bar - recenter camera to origin
- ESC - exit the currently running mod
- Mouse Scroll - zoom in or zoom out
- Z - zoom in
- C - zoom out
The pull-down menu lists the different types of agents available for the Maze:
- Depth First Search - starts the depth first search agent
- Breadth First Search - starts the breadth first search agent
- A* Search, available with three different types of visualization:
- Single Agent A* Search - the agent has to walk through the maze itself, both to make progress and to backtrack before moving on to another node
- Teleporting A* Search - the agent can search for solutions faster by teleporting to the next open node instead of having to backtrack
- Front A* Search - the agent can spawn several new agents when faced with different alternatives. These agents mark the front of the search.
- Q-Learning, coarse and fine - the agent learns from a reinforcement signal using the off-policy Q-learning algorithm (tabular, no function approximation; the coarse version is based on an 8x8 location table and the fine version on a 64x64 location table).
- First Person Control, coarse and fine - use the arrow keys to try to solve the maze yourself! The coarse version corresponds to the search agents and the coarse Q-learning agent, and the fine version corresponds to the fine Q-learning agent.
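For readers unfamiliar with the tabular, off-policy update that the Q-Learning agents rely on, here is a minimal sketch of a single update step. The learning rate `alpha`, discount `gamma`, reward values, and the dictionary-based table are assumptions chosen for the example and do not reflect OpenNERO's actual settings.

```python
from collections import defaultdict

ACTIONS = ("N", "S", "W", "E")

def q_update(q_table, state, action, reward, next_state,
             alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new value.

    Off-policy: the target uses the best action in `next_state`,
    regardless of which action the agent actually takes next.
    """
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]

# One table entry per (location, action); unseen entries default to 0.
q = defaultdict(float)
first = q_update(q, (0, 0), "E", reward=1.0, next_state=(0, 1))
second = q_update(q, (0, 0), "E", reward=1.0, next_state=(0, 1))
```

With an 8x8 location table and four actions the coarse agent would have at most 256 such entries, while a 64x64 table has 16,384, which is one reason the fine version needs far more experience. Handling of the terminal (goal) state is omitted here for brevity.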
The control panel includes additional controls:
- The Exploit/Explore Slider - this applies only to learning methods such as Sarsa and Q-Learning. Because these methods start out knowing nothing about the best actions and learn from experience, they face an exploration-exploitation trade-off during learning: they have to decide how much of the time to do the best thing they know how to do (exploit) and how much of the time to seek new experience (explore). The exploit/explore slider lets you make this decision for them: slide it to the right to encourage exploration, and slide it to the left to see what the best policy learned so far looks like. The slider appears as soon as you start running Q-learning.
- The Speedup Slider - this slider controls another trade-off: between displaying the simulation slowly enough to see robot animations and movements from cell to cell, and running it as quickly as the computer running OpenNERO can handle. The speedup slider is particularly useful when running the learning agents, because they may require a large amount of experience before finding the optimal path through the maze. To progress through the learning faster, slide the Speedup slider to the right.
- Generate New Maze Button - this button allows you to mix things up by generating a new random maze. Some mazes take longer to solve than others, and some are better suited to particular search techniques.
- Pause/Continue - Pause will temporarily suspend the execution of the algorithm; the button changes to Continue and hitting it will resume execution.
- Start/Reset - Start will begin running the selected algorithm; the button changes to Reset and hitting it will terminate the algorithm.
- Help - opens this page.
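The trade-off controlled by the Exploit/Explore Slider is commonly implemented as epsilon-greedy action selection. The sketch below assumes the slider position maps directly to an exploration probability `epsilon` between 0 (far left, always exploit) and 1 (far right, always explore); that mapping and the function name `choose_action` are assumptions, not a description of OpenNERO's internals.

```python
import random

ACTIONS = ("N", "S", "W", "E")

def choose_action(q_values, epsilon, rng=random):
    """Pick an action with epsilon-greedy selection.

    With probability `epsilon` a random action is tried (explore);
    otherwise the action with the highest known Q-value is taken
    (exploit). Missing Q-values are treated as 0.
    """
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_values.get(a, 0.0))

q_here = {"N": 0.1, "S": -0.2, "W": 0.0, "E": 0.7}
greedy = choose_action(q_here, epsilon=0.0)  # slider fully left
explore = choose_action(q_here, epsilon=1.0, rng=random.Random(0))
```

With `epsilon=0.0` the agent always follows its current best estimate, which corresponds to sliding fully left to inspect the learned policy.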