The bt-RecordBook is intended to simplify the task of comparing experimental results in reinforcement learning. The idea is simple (details follow):
1. Create a set of open source agents and environments from the reinforcement learning literature.
2. Create a set of experimental specifications (events).
3. Record the results (records) of each agent with a variety of parameters for every event.
Once these records have been created, they never need to be duplicated again, by anyone. At the same time, because all of the source is available, they can be duplicated at any time, by anyone. When testing a new algorithm, a researcher will not need to re-implement all of the alternative algorithms. People evaluating the results won't have to worry about whether the experiment was conducted fairly, either.
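To make the "record once, reuse forever" idea concrete, here is a minimal sketch of a record book keyed by event, agent, and parameter settings. All names in it (RecordKey, RecordBook, lookup_or_run, placeholder_run) are hypothetical and chosen only for illustration, and the result value is a made-up placeholder rather than a real measurement.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Mapping, Tuple


@dataclass(frozen=True)
class RecordKey:
    """Identifies one record: which event, which agent, which parameters."""
    event: str
    agent: str
    params: Tuple[Tuple[str, float], ...]  # sorted (name, value) pairs


@dataclass
class RecordBook:
    """Hypothetical in-memory record book; real storage is not specified."""
    records: Dict[RecordKey, float] = field(default_factory=dict)
    runs_executed: int = 0

    def lookup_or_run(self, event: str, agent: str,
                      params: Mapping[str, float],
                      run: Callable[[], float]) -> float:
        # Sorting makes the key independent of dictionary ordering.
        key = RecordKey(event, agent, tuple(sorted(params.items())))
        if key not in self.records:
            # Only run the experiment when no record exists yet.
            self.records[key] = run()
            self.runs_executed += 1
        return self.records[key]


def placeholder_run() -> float:
    """Stand-in for an expensive experiment; the value is made up."""
    return 1.0


book = RecordBook()
settings = {"alpha": 0.1, "lambda": 0.9}
first = book.lookup_or_run("mountain-car-15000-steps", "sarsa-lambda",
                           settings, placeholder_run)
second = book.lookup_or_run("mountain-car-15000-steps", "sarsa-lambda",
                            settings, placeholder_run)
print(first == second, book.runs_executed)  # prints: True 1
```

The second lookup returns the stored record instead of running the experiment again, which is the saving the record book is meant to provide.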
Let me give an example.
Agent: Sarsa Lambda with Tile Coding Function Approximation and Epsilon Greedy Action Selection
Environment: Mountain Car
Event: How many episodes can the agent complete within 15 000 time steps in Mountain Car, with a fixed starting state?
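As a rough illustration of what such an event measures, the sketch below counts completed episodes under a fixed step budget. It uses the classic Mountain Car dynamics, but several things are assumptions: the start state of (-0.5, 0.0) is not given above, all function names are hypothetical, and a hand-written heuristic policy stands in for the Sarsa Lambda agent to keep the example short.

```python
import math

STEP_BUDGET = 15_000
START_STATE = (-0.5, 0.0)   # assumed fixed start (position, velocity)
GOAL_POSITION = 0.5


def mountain_car_step(position, velocity, action):
    """Advance one time step; action is -1 (reverse), 0 (coast), +1 (forward)."""
    velocity += 0.001 * action - 0.0025 * math.cos(3 * position)
    velocity = max(-0.07, min(0.07, velocity))
    position += velocity
    if position < -1.2:      # inelastic wall at the left boundary
        position, velocity = -1.2, 0.0
    return position, velocity


def stand_in_policy(position, velocity):
    """Push in the direction of motion; a stand-in, NOT Sarsa Lambda."""
    return 1 if velocity >= 0 else -1


def run_event(policy, step_budget=STEP_BUDGET):
    """Return (episodes completed, steps used) for the given step budget."""
    episodes_completed = 0
    steps_used = 0
    position, velocity = START_STATE
    while steps_used < step_budget:
        action = policy(position, velocity)
        position, velocity = mountain_car_step(position, velocity, action)
        steps_used += 1
        if position >= GOAL_POSITION:
            episodes_completed += 1
            position, velocity = START_STATE   # every episode restarts here
    return episodes_completed, steps_used


episodes, steps = run_event(stand_in_policy)
print(f"{episodes} episodes completed in {steps} time steps")
```

An episode that is still in progress when the budget runs out is not counted, which is one reasonable reading of "complete"; an actual event specification would need to state this explicitly.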
Desiderata
Save Time / Effort / Frustration
- No need to re-implement the environments that are used in the record book
- Library of events makes it easier to find environments particularly applicable to your new algorithm
- No need to re-run hundreds of parameter permutations of your favorite competitor algorithms
o Reduces up-front development time
o Lower probability of needing to re-run results because of bugs
o Computation savings
o Stronger comparisons
Increase Comparability
- Re-use of experimental design and environments means no idiosyncrasies in setup that change results
Accelerate Advances
- Open Source / Public agents mean that people with new ideas can stand on the shoulders of giants
- Trends and Macro behavior can be seen when larger amounts of data are pooled (same agents on many environments and experiments can be enlightening)
Decrease Experimenter Bias
- "Good" experimental designs will become popular events (natural vetting)
- Intuition is that events with less bias are less likely to favor the creator and therefore will be used by more people
- No concern that the experimenter is withholding advantages from the comparison algorithms, because those algorithms have been created and tested by others
Proposal for version 1.0
The first iteration of the record book will be called the bt-RecordBook.
It will be created by Brian Tanner (and company) at the University of Alberta (please tell us if you want to help).
The interface for RL communication will be based on RL-Glue.
Agents and environments will come from the RL-Library, with usability extensions (for exposing parameters) from RL-VizLib.
The initial version of the record book will be based on Java environments and agents in order to get us moving quickly.
The experimental data will be stored (in a format yet to be discussed) in Amazon's Simple Storage Service (S3).
Experiments can be verified and validated on any computer, but all official runs will be done remotely on virtualized hardware provided by Amazon's Elastic Compute Cloud. For now, this will be paid for by Brian Tanner's personal research budget.
Future Versions
- Support for "private" results that are only accessible to the experimenter
- Multi-language support
- Google Web Toolkit based web application to create new experiments and schedule trials