# QLearningExercise
## Extending Q-learning with Function Approximation

In this exercise, you will implement a function approximator for the Python Q-learning agent. It will abstract a more fine-grained version of the problem space, making the problem tractable for the Q-learner.

## The OpenNERO Platform

The following steps should help get you up and running in OpenNERO. Either:

- install OpenNERO on your own machine by downloading one of the prebuilt binaries, or
- build OpenNERO on your own machine from the source code.
## Creating your agent

To create your RL agent, open the agent.py file and define a new class for your agent in it.

## The fine-grained maze and function approximators

Next, try running your agent on the fine-grained version of the maze. This environment divides the world into much smaller steps (64x64), resulting in a substantially larger state space. You will likely notice that your agent does not perform as well with the larger state space. In such cases, it is helpful to implement a function approximator that abstracts the large state space into something more manageable. You will implement two such approximators.

## Tiling Approximator

Begin by implementing a simple tiling approximator. This function should simply map the current state back to the standard 8x8 space. Think of it this way: suppose you have only 8x8 memory, i.e., you can store just 64 value-function entries. Because of this limited memory, you divide the 64x64 world grid into 64 tiles of size 8x8 (just as in the figure below, where each of the four quadrants encapsulates 8x8 states). With such a coarse tiling, all of the 8x8 states falling into a single tile share the same value-function entry. Of course, this does not help much in terms of learning (your agent might wander around aimlessly), because nearby states lying in a single tile are not distinguishable, but the coarse tiling does save space. Accordingly, do not spend too much time training your agent. For further details, refer to the FAQs page.

As a next step, you will implement a nearest-neighbors approximator in order to distinguish the states falling within a single tile.

## Nearest-Neighbors Approximator

Once the tiling approximator is completed and verified, implement a nearest-neighbors approximator. This function should interpolate between the values of the three nearest 8x8 locations. Note that each location should only be used in the interpolation if it is reachable; don't interpolate across walls!
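One way to picture the coarse tiling described above is as integer division of the fine-grained coordinates. The sketch below is illustrative only: it assumes states are (row, col) tuples, and `tile_state` is a hypothetical helper name, not part of the OpenNERO API.

```python
def tile_state(row, col, fine=64, coarse=8):
    """Map a fine-grained (64x64) cell to its coarse (8x8) tile.

    All fine-grained cells that fall inside the same 8x8 tile share
    one value-function entry, so only coarse*coarse entries are stored.
    """
    scale = fine // coarse          # 8 fine cells per tile side
    return (row // scale, col // scale)

# Every cell in the top-left 8x8 block maps to tile (0, 0):
tile_state(0, 0)    # (0, 0)
tile_state(7, 7)    # (0, 0)
tile_state(8, 0)    # (1, 0)
```

Because many fine-grained states collapse to the same tile, a lookup table indexed by `tile_state(row, col)` needs only 64 entries, which is exactly the space saving the tiling approximator provides.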
Below is an example of how the nearest-neighbors approximator should work.
Fig 1. Nearest-Neighbors Approximator. The value of a given state/location is computed by taking a weighted sum of its 3 nearest neighbors. The equations for the tiling approximator can be derived by considering only the single nearest neighbor. Note that since the OpenNERO environment is deterministic, Value(Y, Left) = Value(X).

In the figure, your agent is currently at state Y and is considering taking the action to move left to X. The function approximator would calculate the Q-value at X by interpolating the values of A, B, and C based on their Euclidean distances to X. For instance, X is 3.5 rows up and 1.5 columns to the left of A, so its distance is sqrt(3.5^2 + 1.5^2) ≈ 3.81. If X has a larger value than the tiles below or to the right of Y, (Y, Left) will be chosen as the epsilon-greedy (state, action) pair, and that is the value we must use to update the approximator tiles (A, B, and C). To do this, we simply take the reward received at X and perform the standard Q-learning update on all of the approximator tiles.

In order to detect walls, your approximator can call:

```python
if ((3,4), (3,5)) in get_environment().maze.walls:
    print "There is a wall between (3,4) and (3,5)!"
```

The walls set uses the coarse-grained (8x8) mapping for rows and columns, which should work out well for your approximator. For further details, refer to the FAQs page.

## Switching between approaches

All three approaches (pure tabular, tiling, and nearest neighbors) should be implemented so that they can easily be switched between (e.g., by commenting out a piece of Python code or by passing a different parameter), making it easy to check that each one works.

## Debugging

If you run into any bugs in your program, you can find the OpenNERO error log file at one of the following locations:

**Linux or Mac:** `~/.opennero/nero_log.txt`

**Windows:** `AppData\Local\OpenNERO\nero_log.txt` or `Local Settings\Application Data\OpenNERO\nero_log.txt`, depending on the version of Windows you have.
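To make the weighted-sum idea from the Nearest-Neighbors Approximator section concrete, here is a minimal sketch of inverse-distance interpolation. The function name, argument layout, and the exact weighting scheme are illustrative assumptions (walls are assumed to be filtered out by the caller), not the required solution or part of the OpenNERO API.

```python
import math

def interpolate_value(x, neighbors, values):
    """Inverse-distance-weighted average of reachable neighbor values.

    x         -- (row, col) of the queried fine-grained location
    neighbors -- (row, col) centers of the (up to) 3 nearest reachable
                 8x8 tiles; unreachable tiles (behind walls) are assumed
                 to have been excluded already
    values    -- current value estimate for each neighbor tile
    """
    weights = []
    for n in neighbors:
        d = math.hypot(x[0] - n[0], x[1] - n[1])   # Euclidean distance
        if d == 0:
            # Exactly on a tile center: use that tile's value directly.
            return values[neighbors.index(n)]
        weights.append(1.0 / d)
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

# Example from Fig. 1: X is 3.5 rows and 1.5 columns from A,
# so A's distance is sqrt(3.5**2 + 1.5**2) ~= 3.81 and its
# weight is the reciprocal of that distance.
```

Closer tiles thus dominate the estimate, which is what lets the agent distinguish nearby fine-grained states that a plain tiling would lump together.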