|
NeroMod
The NERO game description
The NERO Machine Learning Game
In the NERO game, the user trains intelligent agents to perform well in battle. It is a machine learning game, i.e. the focus is on designing a set of challenges that allow agents to learn the necessary skills step by step. Learning takes place in real time, as the user observes the game and changes the environment and behavioral objectives on the fly. The challenge for the player is to develop as proficient a team as possible. The NERO game in OpenNERO is a simpler research and education version of the original NERO game, focusing on demonstrating learning algorithms interactively in order to make it clear how they work. The game environment is described first below, followed by the two methods for training the agents (neuroevolution and reinforcement learning), how a team can be put together for battle, and the battle mode itself. Ways of extending the learning methods and hand-coding the teams, as well as differences from the original NERO, are described at the end. To get a quick introduction to NERO, watch the video below.
NERO Environment
The player first enters the NERO-Training environment, where s/he develops a team and saves it. The player then enters the NERO-Battle environment, where s/he loads two competing teams that then battle each other. The agents are simulated "Steve" robots, also seen in the Maze and BlocksWorld environments. In NERO they have egocentric sensors:
Their effectors are
Note that there is no output for taking a shot. Instead, the agents shoot probabilistically. First, they have to be oriented within 2 degrees of the target; outside of that angle they never shoot. Similarly, if they are further than 600 length units away (roughly the width of the standard field), they never shoot. Between 600 and 300 units, their likelihood of shooting increases linearly as they get closer, and within 300 units they always shoot. Within the 2 degrees, their accuracy increases linearly, so that at 2 degrees they have a 50% chance of hitting, and when they are facing the center of the target exactly, they always hit. It is therefore possible to train agents to become better at shooting by getting closer and orienting more accurately towards the enemy.
Their weapon is a laser gun that shoots a single instantaneous ray; it is blocked by walls and trees, but it has no effect on teammates. The red team shoots red rays and the blue team blue rays; if either ray hits a wall or a tree, it turns green. Each shot that hits an agent decreases the agent's hitpoints by one; once the hitpoints run out, the agent dies and is removed from the field. In battle, it is gone for good; during training, it is respawned with hitpoints and lifetime reset.
The standard NERO environment consists of an enclosed field with a wall in the middle and a couple of trees around, behind which the robots can take cover. Most of the environment can be manipulated during training, and the same tools can be used to create interesting new battlefields if so desired. The player can look around the environment as usual using the keyboard and mouse controls.
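The shooting rule described above can be written down as two small functions. This is only a sketch of the rule as stated in the text, not code taken from OpenNERO: the function and constant names are hypothetical, and the linear interpolation between the stated end points is an assumption about how "increases linearly" is applied.

```python
# Illustrative sketch of the shooting rule described in the text.
# All names below are hypothetical; they do not come from the OpenNERO source.

MAX_ANGLE_DEG = 2.0   # agents never shoot when further off-target than this
MAX_RANGE = 600.0     # agents never shoot beyond this distance
SURE_RANGE = 300.0    # agents always shoot within this distance


def shoot_probability(distance, angle_deg):
    """Probability that an agent fires, given distance and aiming error."""
    if abs(angle_deg) > MAX_ANGLE_DEG or distance > MAX_RANGE:
        return 0.0
    if distance <= SURE_RANGE:
        return 1.0
    # Linear ramp: 0 at MAX_RANGE, 1 at SURE_RANGE.
    return (MAX_RANGE - distance) / (MAX_RANGE - SURE_RANGE)


def hit_probability(angle_deg):
    """Probability that a fired shot hits, given the aiming error."""
    error = abs(angle_deg)
    if error > MAX_ANGLE_DEG:
        return 0.0  # no shot is taken at this angle, so nothing can hit
    # Linear ramp: 1.0 when dead on target, 0.5 at MAX_ANGLE_DEG.
    return 1.0 - 0.5 * error / MAX_ANGLE_DEG
```

Under these assumptions, an agent 450 units away that is perfectly aligned fires half of the time and always hits when it does, while an agent at close range that is 1 degree off always fires but hits only 75% of the time. Both getting closer and aiming better therefore pay off.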
Near the top left of the screen there is a single number that indicates the current frame rate of the display. It should be 24 or higher for a visually appealing animation, but may fall to 12 or lower on a slow machine (and currently also on Mac OS X, due to an Irrlicht rendering issue).

NERO Training
In the training mode, the user selects one of the two training methods (neuroevolution or reinforcement learning) and manipulates the environment and the behavioral goals in order to train the agents to do what s/he wants. Typically the training starts by deploying either an rtNEAT team or a Q-learning team, and then setting some of the goals (or fitness coefficients) in the parameter window (the sliders become active after a Deploy button is pressed). They are:
There are also a number of parameters that affect learning and should be set appropriately (the default values are usually a good starting point):
The second part of the initialization is to set up the environment. An initial environment is already provided, and it is the same as the battle environment. The user can, however, add objects to it to design the training curriculum by right-clicking with the mouse. When right-clicking on empty space:
When right clicking on an object (i.e. a wall or a turret) that you placed:
The trees are sensed as small walls; in the current version, however, they cannot be created or modified.

Over-head display
By hitting the F2 key, you can cycle through additional information about each agent that may be useful during training. This "over-head" display shows up as a bit of text above each agent on the field. When an over-head display is active, the window title changes to say what is being displayed. Some of the information is specific to neuroevolution, and some is specific to RL.
Neuroevolution (rtNEAT)
The rtNEAT neuroevolution algorithm is a method for evolving (through genetic algorithms) a population of neural networks to control the agents. See the paper on rtNEAT for more details. When you press the "Deploy rtNEAT" button, a population of 50 agents is created and spawned on the field. Each agent is controlled by a simple neural network connecting the input sensors directly to the outputs, with random weights. Over their lifetime, fitness is accumulated based on the behavior objectives specified with the sliders: if e.g. approaching the enemy is rewarded, the time they spend near the enemy is multiplied by a large constant and added to the fitness. After their lifetime expires, they are removed from the field one at a time. If their fitness was low, they are simply discarded. If their fitness was high, they are put back into the field, and in addition, a new agent is generated by mutating the neural network (i.e. adding nodes and connections and/or changing connection weights) and crossing over its representation with another network with a high fitness. A balance of about 50% new individuals and 50% repeats is maintained in the field in the steady state (the explore/exploit slider has no effect on evolution). In this manner, evolution runs incrementally in the background, constantly evaluating and reproducing individuals. Over time, evolution is thus likely to come up with more complex networks, including those with recurrent connections. Recurrence is useful e.g. when an agent needs to pursue an enemy around a corner (i.e. even though the enemy has disappeared from view, activation in a recurrent network will retain that information). In other words, it allows disambiguating the state in a POMDP problem (where the state is partially observable). When the population is saved, the genome of each agent is written into a text file. That file can be edited to form composite teams, reloaded for further training, or loaded into battle.
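The replacement cycle described above can be illustrated with a toy sketch. It is not the rtNEAT implementation: genomes are reduced to fixed-length weight lists (so there is no structural mutation, speciation, or historical marking), the fitness threshold that separates "low" from "high" is an assumed parameter, and every name in the block is hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    weights: list          # stand-in for a genome: a flat list of weights
    fitness: float = 0.0


def accumulate_fitness(agent, measurements, coefficients):
    """Add slider-weighted behavior measurements to the agent's fitness.

    `measurements` maps an objective name to what was observed this step;
    `coefficients` maps an objective name to its slider value.
    """
    for name, value in measurements.items():
        agent.fitness += coefficients.get(name, 0.0) * value


def make_offspring(parent, mate, rng, sigma=0.1):
    """Cross over two weight lists gene by gene, then perturb each weight."""
    return Agent([rng.choice(pair) + rng.gauss(0.0, sigma)
                  for pair in zip(parent.weights, mate.weights)])


def on_lifetime_expired(agent, population, threshold, rng):
    """Return the agents that enter the field once `agent` is removed."""
    if agent.fitness < threshold:
        return []                                 # low fitness: discarded
    others = [a for a in population if a is not agent]
    mate = max(others, key=lambda a: a.fitness)   # another fit network
    # High fitness: the agent itself returns, together with one new offspring.
    return [agent, make_offspring(agent, mate, rng)]
```

Because agents are handled one at a time as their lifetimes expire, evaluation and reproduction interleave continuously instead of proceeding in discrete generations, which is the point of the real-time variant.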
The rtNEAT algorithm is parameterized using the file neat-params.dat; you can edit it in order to experiment with different versions of the method (such as mutation and speciation rates, balance of old and new agents, etc.).

Reinforcement Learning (Q-learning)
The reinforcement learning method in NERO is a version of Q-learning (familiar from the Q-learning demo), using either a static, linear discretization or a tile-coding function approximator. The agents learn during their lifetime to optimize the behavioral objectives. When you press the "Deploy Q-learning" button, a Q-learning agent is created according to the specs in the file mods/_NERO/data/shapes/character/steve_blue_qlearning.xml. The <Python agent="NERO.agent.QLearningAgent()"> XML element can be changed to include keyword arguments that will be passed to the QLearningAgent constructor. These parameters are:
The last four parameters specify the discretization of the state and action dimensions so that the agent's state can be represented as a discretized table of Q-values, one for each state/action pair (these values are initialized to zero). If you choose to use the tile-coding approximator, be sure to set action_bins and state_bins to 0; conversely, if you wish to use the static bins, be sure to set num_tiles and num_weights to 0. The default Q-learning agents are created with action_bins set to 3 and state_bins set to 5. The population for the game is generated by cloning this agent 50 times; each agent gets its own Q-table to update, so different agents can learn different Q-values depending on their experiences. Q-learning progresses as usual during the lifetime of these individuals, modifying the values in the table. Using the Exploit-Explore slider you can adjust the fraction of actions taken greedily (i.e. those with the best Q-values) vs. actions taken to explore the environment (i.e. randomly selected actions). When the lifetime of an agent expires, it is respawned, and continues from the spawn location with its current Q-table. When the population is saved, the Q-table of each individual is saved together with its parameters and the function approximation parameters, so that they can be loaded for further training and battle.

Training strategy
The game consists of trying to come up with a sequence of increasingly demanding goals, so that the agents will perform well in the end. It is a good idea to start with something simple, such as approaching the enemy. Once the agents learn that, place the enemy behind a wall so they learn to go around it. Then reward the agents for hitting the enemy as well. Then start penalizing them for getting hit. Introduce more enemies, and walls behind which the agents can take cover. You can also explore the effects of staying close to or apart from teammates, and of standing ground or moving a lot.
In this manner, you can create agents with distinctly different personalities, which can thus possibly serve different roles in battle. Achieving each objective will take some time. Within a couple of minutes you should see some of the agents perform the task sometimes; within 10-15 minutes, almost the entire team may converge. Using the F2 displays you can follow the behavior of the current champion, see which agents are drawing fire and which are avoiding it, and with rtNEAT, observe which agents are new and which are old, and how speciation is progressing. Note that it is not always good to converge completely, because it may then be difficult to learn new skills. The trick is to discover a sequence where later skills build on earlier ones so that little has to be unlearned between them. It is a good idea to train several teams, and then test them in the battle mode. In this manner, you can develop an understanding of what works and why, and can focus your training better. Based on that knowledge you can also decide how to put a good team together from several different trained teams, as will be described next.

Composing a Team for Battle
Note that you can train several different teams to perform different behaviors, for instance a team of attackers, defenders, snipers, etc. It may then be useful to combine agents with such different behaviors into a single team. Because the save files are simply text, you can form such composite teams by editing them by hand. You can also "clone" agents by copying them multiple times. You can even combine agents created by neuroevolution and reinforcement learning into a single team. The first 50 agents in the save file will be used in the battle; if there are fewer than 50 agents in the file, they will be copied until 50 are created in battle. The basic structure of the file is like this for rtNEAT teams:

    genomestart 120
    trait 1 0.112808 0.000000 0.305447 0.263380 0.991934 0.000000 0.306283 0.855288
    ...
    node 1 1 1 1 FriendRadarSensor 3 90 -90 15 0
    ...
    node 21 1 1 3
    ...
    gene 1 1 22 0.041885 0 1.000000 0.041885 1
    ...
    genomeend 120

In words, a population consists of one or more genomes. Each genome starts with a genomestart line (followed by its ID) and ends with a genomeend line. Between these lines, there are one or more trait lines followed by one or more input (sensor) lines, followed by some other node lines, followed by the gene lines. For RL teams, the file looks like this:

    22 serialization::archive 5 0 0 0.8 0.8 0.1 3 3 ... 1 7 27 OpenNero::TableApproximator 1 0 0 0 0 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 18 0 ...

    22 serialization::archive 5 0 0 0.8 0.8 0.1 3 3 ... 1 7 27 OpenNero::TableApproximator 1 0 0 0 0 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 18 0 ...

    ...

Each team member is represented by a series of numbers that encode the stored Q-table for the agent. Unlike rtNEAT teams, RL agents in this file are separated by one blank line. Either way, you will probably want to pick and choose the individual agents from your training episodes that perform the best for the tasks you anticipate. You should assemble these agents into one file for the battle. (Note: if you include reinforcement learning agents, you need to separate all agents in your submission file with one blank line. Also note: if you form a team by combining individuals from different rtNEAT runs, you currently cannot train such a combo team further, because rtNEAT training depends on historical markings that would then not match.) Before you submit to the tournament, you should test your file by loading it into NERO_Battle and making sure it runs correctly. If you want, you can test your team e.g. against this sample team.

NERO Battle
In the NERO-Battle environment the user first loads the two teams: one is identified as Red and the other as Blue based on how the top of the head of the robots is painted.
By default they spawn on opposite sides of the central wall in the standard environment (the environment and the spawn locations can be changed as in training mode). The Hitpoints slider specifies how many times each agent can be hit before it dies and is removed from the battle. The game ends when one team is completely eliminated or when the time runs out, in which case the team that has more hits on the opponent wins. The current hitpoints are displayed in the title bar of the NERO window; the agent that delivered the winning shot will jump up and down in jubilation :-). The game starts when the user presses the Continue button. The agents are spawned only once, and they then have to move around in the environment and engage the other team. This is where the training pays off: the agents need to respond appropriately to the opponents' actions, employing different skills in different situations, such as attacking, retreating, sniping, ambushing, sometimes perhaps working together with teammates and sometimes independently of them. There is no a priori winning strategy; the performance of the team depends on the ingenuity of its creator! To see how the battle mode works, or to see how well your team is doing, you can use this sample team.

NERO Tournament
A fun event in e.g. AI or machine learning courses is to organize a NERO tournament. The students develop teams, and the teams are then played against each other in a round-robin or a double-elimination tournament. One such tournament was held in Fall 2011 for the Stanford Online AI course; the tournament assignment is here.

Extending NERO Methods
The ingenuity is not limited to simply training the agents with the methods that have been implemented in OpenNERO. The game is open source, and you can modify all aspects of it by changing the Python code (and in some cases, the C++ code). The main files are...
For instance, you can implement more sophisticated versions of the sensors and effectors, or entirely new ones such as line-of-fire sensors, or sending and receiving signals between the agents. You can implement more sophisticated function approximators for reinforcement learning, and even other neuroevolution and reinforcement learning algorithms. If you so desire, you can also program the agent behaviors entirely by hand. Note that many such changes will require making corresponding changes to the battle mode as well, and therefore it will not be possible to use them in the NERO Tournament. However, as long as your team is represented in terms of genomes and Q-tables, it doesn't matter how that representation is created. That is, if your changes apply to training only, and your team can still be saved in the existing format, the team can be entered into the tournament. For instance, you can express behaviors in terms of rules and finite state automata based on the sensors and effectors in NERO, and then mechanically translate them into neural networks (see e.g. this paper). Those networks can then be represented as genomes and entered into the tournament.

Differences between OpenNERO and Original NERO
The NERO game in OpenNERO differs from the original NERO game in several important ways. First of all, whereas the original NERO was based on the Torque game engine, OpenNERO is entirely open source (based on the Irrlicht game engine and many other open-source components). This design makes it a good platform for research and education, i.e. it is possible for the users to extend it and to understand it fully. Second, the original NERO was designed to demonstrate that machine learning games can be viable. It therefore aimed to be a more substantial game, and included many features such as more advanced graphics, sound, and user interface, as well as more detailed environments that made gameplay more enjoyable.
The 2.0 version of NERO also included interactive battle where the human players specified targets and composed teams dynamically. Third, OpenNERO includes reinforcement learning as an alternative method of learning for NERO agents. The idea is to demonstrate more broadly how learning can take place in intelligent agents, both for research and education. Fourth, the original NERO included several features that have not yet been implemented in OpenNERO, but could be in the future. They include a sensor for line-of-fire (which may help develop more sophisticated behaviors); taking friendly fire into account; collisions among NERO agents; different types of turrets in training; a button that converges a population to a single individual; and a button that removes an individual from the population. We invite the users to implement such features, and perhaps others, in the game, and contribute them to OpenNERO! Fifth, much of OpenNERO is written in Python (instead of C++), making it easier to understand and modify, again supporting research and education. Unfortunately, this slows down the simulation by an order of magnitude. However, we believe that researchers and students have the patience it takes to "play" OpenNERO in order to gain better insight into the learning in it.

Software Issues
OpenNERO is academic software and under (hyper)active development. It is possible that you will come across a bug in it, or a feature that should be implemented. If so, please report it here, so that everyone can see it and track it (please first check whether it has already been reported). | |