Getting started with Reinforcement Learning

Reinforcement learning is a fascinating family of algorithms that closely match our intuitions about the way humans learn. Perhaps the two most famous examples come from DeepMind. In December 2013, they announced a deep reinforcement learning algorithm that surpassed human expert performance on a number of Atari games. Then in March 2016, another algorithm, AlphaGo, defeated Lee Sedol, one of the world’s best Go players, at a game orders of magnitude more complex than chess. More recently, reinforcement learning has made its way into Dota 2, a fiendishly fast-paced multiplayer online battle arena game.

In reinforcement learning, algorithms are agents that act in environments. Actions change the environment and generate rewards (or punishments) for the agent, which reinforce good behavior and discourage bad behavior. Agents learn through trial and error, and eventually figure out which actions are good and which are bad in different settings.
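
To make the loop concrete, here is a minimal sketch of an agent learning by trial and error. Everything in it is an assumption made for illustration: the `Corridor` environment is a made-up toy, and the agent uses tabular Q-learning, which is just one simple way to turn rewards into behavior.

```python
import random


class Corridor:
    """Hypothetical toy environment: walk right along a corridor to reach the goal."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1  # small punishment for every wasted step
        return self.position, reward, done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    env = Corridor()
    # Q-table: estimated long-term reward for each (state, action) pair.
    q = [[0.0, 0.0] for _ in range(env.length)]
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Mostly pick the best known action, occasionally a random one.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Nudge the estimate towards reward + discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Preferred action in each non-goal state (1 means "move right").
greedy_policy = [0 if left > right else 1 for left, right in q_table[:-1]]
print(greedy_policy)
```

The agent is never told that moving right is correct. It only sees rewards, and the preference for moving right emerges from the updates.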

Working with these algorithms is fun because you get to reason about a broad range of issues that have natural counterparts in human learning. For example:

  • How should the world (environment) be described? What information does an agent need to learn the problem we are asking it to solve? How granular should this information be, i.e. on what level does the agent perceive events?
  • How should your algorithm remember its experiences? Should it forget things? If so, how?
  • How should your algorithm balance repeating things it knows are good vs. trying out new things in order to learn more?
  • How complex is the problem, and how much capacity (think compute power or flexibility) does your algorithm need? Can too much complexity inhibit learning?
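
The question of repeating known-good actions versus trying new ones is often handled with an epsilon-greedy rule whose exploration rate shrinks over time. The sketch below is one possible version. The function names, the linear schedule, and the start and end values are all assumptions chosen for illustration.

```python
import random


def annealed_epsilon(episode, anneal_episodes=150, start=1.0, end=0.1):
    """Linearly decay the exploration rate from start to end, then hold it."""
    fraction = min(episode / anneal_episodes, 1.0)
    return start + fraction * (end - start)


def choose_action(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Early episodes explore almost all the time; later ones mostly exploit.
for episode in (0, 75, 150, 300):
    print(episode, round(annealed_epsilon(episode), 3))
```

Early on the agent knows nothing, so random actions cost little and reveal a lot. Once its estimates improve, it makes sense to trust them more.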

Many of the classic reinforcement learning problems are simulations with a visual component or computer games. For example, below are a few snapshots of an agent at different stages of the learning process in the Lunar Lander environment from the OpenAI Gym. This is another reason why reinforcement learning is interesting: you get to watch your algorithm learn in real time. Sometimes this can be painful, but often it is delightful and surprising. It offers insights into an algorithmic learning process that are not readily available for other problems.


If this introduction has piqued your interest, then here are a few links to get you started.



Work out at the OpenAI Lab: This is an experimentation framework for deep reinforcement learning (RL) using the OpenAI Gym, TensorFlow, and Keras. OpenAI Lab was created to do RL like science (theorize, then experiment), so it has an automated experimentation and evaluation framework. It is also intended to reduce the time it takes to get started with RL and to implement algorithms. To do this, it provides a modular set of components that correspond to the main RL concepts, as well as a number of customizable algorithms available out of the box.

Supported environments load in one line of code, allowing users to focus on designing agents to operate in them. The currently supported environments are a selection of those provided by the OpenAI Gym.

Deep neural networks for RL algorithms are equally easy to implement through the agent classes. The following algorithms are available:

  • DQN
  • Double DQN
  • Actor Critic
  • Deep Deterministic Policy Gradients

All users need to specify is the architecture (number and size of hidden layers) and a few hyper-parameters. Alternatively, if you want finer-grained control over the network architecture, you can inherit from the relevant agent class and the training is still handled for you.
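
To illustrate what "number and size of hidden layers" pins down, here is a rough pure-Python sketch of a fully connected network built from a list of hidden layer sizes. It is not how OpenAI Lab builds its networks (that goes through Keras). The helper names, the weight initialization, and the choice of sigmoid hidden units with a linear output are assumptions. The 8 inputs and 4 outputs match LunarLander's state and action sizes, assuming no state stacking.

```python
import math
import random


def build_network(input_size, hidden_layers, output_size, seed=0):
    """Create one (weights, biases) pair per layer from a list of layer sizes."""
    rng = random.Random(seed)
    sizes = [input_size] + list(hidden_layers) + [output_size]
    network = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        network.append((weights, biases))
    return network


def forward(network, inputs):
    """Sigmoid on hidden layers, linear output (one value per action)."""
    activations = inputs
    for index, (weights, biases) in enumerate(network):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        is_output = index == len(network) - 1
        activations = pre if is_output else [1.0 / (1.0 + math.exp(-z)) for z in pre]
    return activations


network = build_network(input_size=8, hidden_layers=[400, 200], output_size=4)
q_values = forward(network, [0.0] * 8)
print(len(q_values))  # one estimated value per action
```

Changing the `hidden_layers` list is all it takes to try a deeper or wider network, which is the kind of knob the framework exposes.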

rl/spec/*_experiment_specs.json controls the parameter settings for an experiment. Either tweak existing parameter settings, or design your own agent. Below is an example set of specs for the LunarLander environment with a deep Q-network.

"lunar_dqn": {
"problem": "LunarLander-v2",
"Agent": "DQN",
"HyperOptimizer": "GridSearch",
"Memory": "LinearMemoryWithForgetting",
"Optimizer": "AdamOptimizer",
"Policy": "EpsilonGreedyPolicy",
"PreProcessor": "StackStates",
"param": {
"train_per_n_new_exp": 5,
"batch_size": 32,
"lr": 0.005,
"gamma": 0.99,
"hidden_layers": [400, 200],
"hidden_layers_activation": "sigmoid",
"output_layer_activation": "linear",
"exploration_anneal_episodes": 150,
"epi_change_lr": 200

To help you understand how your agent is performing, there is live plotting of three key metrics as your agent trains.

  • Total rewards per episode
  • Mean rewards per last 100 episodes
  • Loss per batch
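
The first two metrics are simple to compute yourself. Here is a minimal sketch, assuming you have the list of rewards collected in each episode. The `RewardTracker` class is a hypothetical helper, not part of OpenAI Lab, and the rewards fed to it are stand-in numbers. Loss per batch is left out because it depends on the network being trained.

```python
from collections import deque


class RewardTracker:
    """Track total reward per episode and the mean over a sliding window."""

    def __init__(self, window=100):
        self.episode_totals = []
        self.recent = deque(maxlen=window)  # only the last `window` episodes

    def record_episode(self, step_rewards):
        total = sum(step_rewards)
        self.episode_totals.append(total)
        self.recent.append(total)
        return total

    def mean_recent(self):
        return sum(self.recent) / len(self.recent) if self.recent else 0.0


tracker = RewardTracker(window=100)
for episode in range(150):
    # Stand-in rewards: episode k collects k rewards of 1.0 each.
    tracker.record_episode([1.0] * episode)

print(tracker.episode_totals[-1], tracker.mean_recent())
```

The per-episode total is noisy, so the windowed mean is usually the easier curve to read when judging whether the agent is improving.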

Here's an example of these plots for an agent that had been training in the LunarLander environment for 500+ episodes.


Reinforcement learning is a hugely fun and interesting field to start learning about, and I hope you find these resources useful. For more information about the OpenAI Lab, check out the docs.

