Reinforcement learning is a fascinating family of algorithms that closely match our intuitions about the way humans learn. Perhaps the two most famous examples come from DeepMind. In December 2013, they announced a deep reinforcement learning algorithm that surpassed human expert performance on a number of Atari games. Then in March 2016, another algorithm, AlphaGo, defeated Lee Sedol, one of the world’s top players, at Go, a game orders of magnitude more complex than chess. More recently, reinforcement learning has made its way into Dota, a fiendishly fast-paced multiplayer battlefield game.
In reinforcement learning, algorithms are agents, which act in environments. Actions change the environment and generate rewards (or punishments) for the agent; rewards reinforce good behavior and punishments discourage bad behavior. Agents learn through trial and error, and eventually figure out which actions are good and which are bad in different settings.
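To make this loop concrete, here is a minimal sketch in plain Python. The corridor environment, its reward values, and the function names are hypothetical illustrations, not part of any library: the agent picks an action, the environment returns a new state and a reward, and the episode ends when the goal is reached.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk right along a 1-D corridor.

    Reaching the right end earns +1 and ends the episode; every other
    step costs -0.01 (a small punishment for wasting time).
    """

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # Action 1 moves right, anything else moves left; stay inside the corridor.
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.01
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """The agent acts, the environment responds with a state and a reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


random.seed(0)
env = CorridorEnv()
random_return = run_episode(env, lambda s: random.choice([0, 1]))
always_right_return = run_episode(env, lambda s: 1)
print(round(always_right_return, 2))  # 0.97
```

Always moving right earns the best possible return in this toy setting (0.97), while random actions never do better; a learning algorithm's job is to discover such a policy through trial and error rather than having it hand-coded.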
Working with these algorithms is fun because you get to reason about a broad range of issues that have natural counterparts in human learning. For example,
- How should the world (environment) be described? What information does an agent need to learn the problem we are asking it to solve? How granular should this information be, i.e., at what level does the agent perceive events?
- How should your algorithm remember its experiences? Should it forget things? If so, how?
- How should your algorithm balance repeating things it knows are good vs. trying out new things in order to learn more?
- How complex is the problem, and therefore how much capacity (think compute power or flexibility) does your algorithm need? Can too much complexity inhibit learning?
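Two of these questions have standard answers in deep RL methods such as DQN: a bounded replay memory that forgets its oldest experiences first, and epsilon-greedy action selection, which explores at random with probability epsilon and otherwise exploits the best-known action. The sketch below illustrates both under those assumptions; the class and function names and the linear decay schedule are hypothetical choices, not OpenAI Lab's implementation.

```python
import random
from collections import deque


class ReplayMemory:
    """Bounded experience memory: when full, the oldest transitions are forgotten."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling mixes old and new experiences and breaks up
        # correlations between consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon):
    """Explore with probability epsilon; otherwise exploit the best-valued action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def linear_epsilon(step, start=1.0, end=0.05, decay_steps=1000):
    """Anneal epsilon linearly: explore heavily early on, exploit more later."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)


memory = ReplayMemory(capacity=3)
for t in range(5):
    memory.add(t, 0, 0.0, t + 1, False)
print([transition[0] for transition in memory.buffer])  # [2, 3, 4]
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))     # 1
```

A fixed capacity is the simplest forgetting rule; prioritized sampling is a common alternative when some experiences are more informative than others.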
Many of the classic reinforcement learning problems are computer games or simulations with a visual component. For example, below are a few snapshots of an agent at different stages of the learning process in the Lunar Lander environment from the OpenAI Gym. This is another reason why reinforcement learning is interesting: you get to watch your algorithm learn in real time. Sometimes this can be painful, but often it is delightful and surprising, and it offers insights into an algorithmic learning process that are not readily available for other problems.
If this introduction has piqued your interest, then here are a few links to get you started.
- Reinforcement Learning: An Introduction, Sutton and Barto: This is the classic reinforcement learning textbook. The link is to the in-progress 2nd edition; the first edition, published in 1998, is available here.
- Deep Reinforcement Learning: A good 90-minute video overview by John Schulman. Assumes some knowledge of neural networks.
- David Silver’s Reinforcement Learning course: An excellent series of 10 lectures covering the main topics in reinforcement learning: Markov decision processes, model-free control, value functions, policy gradients, and exploration vs. exploitation.
Work out at the OpenAI Lab: This is an experimentation framework for deep reinforcement learning (RL) using the OpenAI Gym, TensorFlow, and Keras. OpenAI Lab was created to do RL like science (theorize, then experiment), so it has an automated experimentation and evaluation framework. It is also intended to reduce the time it takes to get started learning RL and implementing algorithms. To do this, it provides a modular set of components that correspond to the main RL concepts, as well as a number of customizable algorithms available out of the box.
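OpenAI Lab's own experiment code is not shown here. As a hypothetical sketch of what automated experimentation can look like, the snippet below expands ranges of hyperparameters into a grid of trials, runs each one, and ranks the results. The function names, parameter names, and the toy scoring function (a stand-in for training an agent and measuring its reward) are all assumptions.

```python
import itertools


def expand_param_grid(param_ranges):
    """Turn {"name": [values, ...]} into one parameter dict per combination."""
    names = sorted(param_ranges)
    value_lists = [param_ranges[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


def run_experiment(param_ranges, run_trial):
    """Run one trial per grid point and return (score, params) pairs, best first."""
    results = [(run_trial(params), params) for params in expand_param_grid(param_ranges)]
    results.sort(key=lambda item: item[0], reverse=True)
    return results


def toy_trial(params):
    """Hypothetical stand-in for 'train an agent, report its mean reward'."""
    return -abs(params["lr"] - 0.001) - abs(params["gamma"] - 0.99)


grid = {"lr": [0.01, 0.001], "gamma": [0.95, 0.99]}
ranked = run_experiment(grid, toy_trial)
print(len(ranked))   # 4
print(ranked[0][1])  # {'gamma': 0.99, 'lr': 0.001}
```

A full grid grows multiplicatively with every parameter added, so random search is often preferred once more than a few hyperparameters are being tuned.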
Loading a supported environment takes one line of code, allowing users to focus on designing agents to operate in it. The currently supported environments are a selection of those provided by the OpenAI Gym.
Deep neural networks for RL algorithms are equally easy to implement through the agent classes. The following algorithms are available:
- Double DQN
- Actor Critic
- Deep Deterministic Policy Gradients
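As an illustration of the first algorithm, Double DQN changes how the Q-learning target is computed: the online network selects the next action and a separate target network evaluates it, which reduces the overestimation bias that comes from taking a max over noisy estimates. Below is a minimal sketch of the target computation for a batch of transitions in plain Python; the function and argument names are hypothetical, and the networks' outputs are assumed to be precomputed lists of Q-values.

```python
def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Return Double DQN regression targets for a batch of transitions.

    next_q_online[i] and next_q_target[i] hold the online and target
    networks' Q-values for every action in next state i.
    """
    targets = []
    for reward, q_online, q_target, done in zip(rewards, next_q_online, next_q_target, dones):
        if done:
            # Terminal transition: no future value to bootstrap from.
            targets.append(reward)
            continue
        # The online network selects the action; the target network evaluates it.
        best_action = max(range(len(q_online)), key=lambda a: q_online[a])
        targets.append(reward + gamma * q_target[best_action])
    return targets


targets = double_dqn_targets(
    rewards=[1.0, 0.0],
    next_q_online=[[0.2, 0.8], [0.5, 0.1]],
    next_q_target=[[0.7, 0.6], [0.4, 0.9]],
    dones=[False, True],
    gamma=0.9,
)
print([round(t, 2) for t in targets])  # [1.54, 0.0]
```

For the first transition, standard DQN would take the maximum of the target row instead, giving 1 + 0.9 × 0.7 = 1.63 rather than 1.54.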
All that users need to specify is the architecture (number and size of hidden layers) and a few hyperparameters. Alternatively, for finer-grained control over the network architecture, you can inherit from the relevant agent class and define the network yourself; training is still handled for you.
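A minimal sketch of both routes, assuming NumPy and a plain multilayer perceptron: a hidden-layer spec such as (64, 64) is turned into a network, and a subclass overrides only the model-building method while everything else is inherited. The class and function names are hypothetical and do not reflect OpenAI Lab's actual agent classes; the state size of 8 and the 4 actions match LunarLander's observation and action spaces.

```python
import numpy as np


def build_mlp(input_dim, hidden_layers, output_dim, seed=0):
    """Create (weights, bias) pairs for each layer from a hidden-layer spec."""
    rng = np.random.default_rng(seed)
    sizes = [input_dim, *hidden_layers, output_dim]
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # He-style initialization suits ReLU hidden layers.
        weights = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        params.append((weights, np.zeros(fan_out)))
    return params


def forward(params, x):
    """Compute Q-values: ReLU on hidden layers, linear output layer."""
    for i, (weights, bias) in enumerate(params):
        x = x @ weights + bias
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


class MLPQAgent:
    """Hypothetical agent: users pass only the architecture spec. Training
    code (not shown) would call q_values, so it never depends on how the
    network was built."""

    def __init__(self, state_dim, n_actions, hidden_layers=(32, 32)):
        self.params = self.build_model(state_dim, n_actions, hidden_layers)

    def build_model(self, state_dim, n_actions, hidden_layers):
        return build_mlp(state_dim, hidden_layers, n_actions)

    def q_values(self, states):
        return forward(self.params, states)


class WideInputAgent(MLPQAgent):
    """Finer-grained control: override only build_model, inherit the rest."""

    def build_model(self, state_dim, n_actions, hidden_layers):
        return build_mlp(state_dim, (256, *hidden_layers), n_actions)


agent = MLPQAgent(state_dim=8, n_actions=4, hidden_layers=(64, 64))
print(agent.q_values(np.zeros((1, 8))).shape)  # (1, 4)
print([w.shape for w, _ in WideInputAgent(8, 4).params])
# [(8, 256), (256, 32), (32, 32), (32, 4)]
```

Keeping network construction behind a single overridable method is what lets a base class own the training loop while subclasses vary the architecture freely.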
The rl/spec/*_experiment_specs.json files control the parameter settings for an experiment. Either tweak existing parameter settings or design your own agent. Below is an example set of specs for the LunarLander environment with a deep Q-network.