# Reinforcement Learning

## Introduction

Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto and Algorithms for Reinforcement Learning by Csaba Szepesvári.

## Multi-armed Bandits

On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples by William R. Thompson, On the Theory of Apportionment by William R. Thompson, Some Aspects of the Sequential Design of Experiments by Herbert Robbins, A Problem in the Sequential Design of Experiments by Richard Bellman and Bandit Problems: Sequential Allocation of Experiments by Donald A. Berry and Bert Fristedt.

## Markov Decision Processes

A Markovian Decision Process by Richard Bellman, Dynamic Programming and Markov Processes by Ronald A. Howard, Learning Machines: A Unified View by John H. Andreae, A Set of Successive Approximation Methods for Discounted Markovian Decision Problems by Johannes A. E. E. van Nunen and Modified Policy Iteration Algorithms for Discounted Markov Decision Problems by Martin L. Puterman and Moon C. Shin.

## Learning Automata

Learning Automata - A Survey by Kumpati S. Narendra and Mandayam A. L. Thathachar and Learning Automata: An Introduction by Kumpati S. Narendra and Mandayam A. L. Thathachar.

## Dynamic Programming

Dynamic Programming by Richard Bellman, Dynamic Programming by Ronald A. Howard and Learning from Delayed Rewards by Christopher J. C. H. Watkins.

## Monte Carlo Methods

Monte Carlo Methods by Malvin H. Kalos and Paula A. Whitlock; Monte Carlo Methods by John Hammersley; Simulation and the Monte Carlo Method by Reuven Y. Rubinstein and Dirk P. Kroese; Monte Carlo: Concepts, Algorithms, and Applications by George Fishman; and Monte Carlo Statistical Methods by Christian Robert and George Casella.

## Temporal-difference Learning

An Adaptive Optimal Controller for Discrete-time Markov Environments by Ian H. Witten, Temporal Credit Assignment in Reinforcement Learning by Richard S. Sutton, Learning to Predict by the Methods of Temporal Differences by Richard S. Sutton and Analysis of Temporal-difference Learning with Function Approximation by John N. Tsitsiklis and Benjamin Van Roy.

## Multi-step Bootstrapping

Learning from Delayed Rewards by Christopher J. C. H. Watkins, Truncating Temporal Differences: On the Efficient Implementation of TD (lambda) for Reinforcement Learning by Pawel Cichosz and Effective Multi-step Temporal-difference Learning for Non-linear Function Approximation by Harm van Seijen.

## Exploration versus Exploitation

The Apparent Conflict between Estimation and Control - A Survey of the Two-armed Bandit Problem by Ian H. Witten; Optimal Control Systems by Aleksandr A. Feldbaum; and Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence by John H. Holland.

## Function Approximation

Issues in Using Function Approximation for Reinforcement Learning by Sebastian Thrun and Anton Schwartz; Residual Algorithms: Reinforcement Learning with Function Approximation by Leemon Baird; Generalization in Reinforcement Learning: Safely Approximating the Value Function by Justin A. Boyan and Andrew W. Moore; An Analysis of Linear Models, Linear Value-function Approximation, and Feature Selection for Reinforcement Learning by Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield and Michael L. Littman; and An Analysis of Reinforcement Learning with Function Approximation by Francisco S. Melo, Sean P. Meyn and M. Isabel Ribeiro.

## Policy Optimization

Comparing Policy-gradient Algorithms by Richard S. Sutton, Satinder P. Singh and David A. McAllester; Policy Gradient Methods for Reinforcement Learning with Function Approximation by Richard S. Sutton, David A. McAllester, Satinder P. Singh and Yishay Mansour; and A Class of Gradient-estimating Algorithms for Reinforcement Learning in Neural Networks by Ronald J. Williams.

## Simulation

Metacontrol for Adaptive Imagination-based Optimization by Jessica B. Hamrick, Andrew J. Ballard, Razvan Pascanu, Oriol Vinyals, Nicolas Heess and Peter W. Battaglia and Imagination-augmented Agents for Deep Reinforcement Learning by Théophane Weber, Sébastien Racanière, David P. Reichert, Lars Buesing, Arthur Guez, Danilo J. Rezende, Adrià P. Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter W. Battaglia, David Silver and Daan Wierstra.

## Planning

Machine Learning Methods for Planning edited by Steven Minton; Reinforcement Learning and Automated Planning: A Survey by Ioannis Partalas, Dimitris Vrakas and Ioannis Vlahavas; Combining Reinforcement Learning with Symbolic Planning by Matthew Grounds and Daniel Kudenko; Learning Model-based Planning from Scratch by Razvan Pascanu, Yujia Li, Oriol Vinyals, Nicolas Heess, Lars Buesing, Sébastien Racanière, David Reichert, Théophane Weber, Daan Wierstra and Peter Battaglia; The Predictron: End-to-end Learning and Planning by David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto and Thomas Degris; and Model-based Planning with Discrete and Continuous Actions by Mikael Henaff, William F. Whitney and Yann LeCun.

## Multi-objective

Multiobjective Reinforcement Learning: A Comprehensive Overview by Chunming Liu, Xin Xu and Dewen Hu; Empirical Evaluation Methods for Multiobjective Reinforcement Learning Algorithms by Peter Vamplew, Richard Dazeley, Adam Berry, Rustam Issabekov and Evan Dekker; and Multi-objective Deep Reinforcement Learning by Hossam Mossalam, Yannis M. Assael, Diederik M. Roijers and Shimon Whiteson.

## Multi-agent

Multi-agent Reinforcement Learning: A Critical Survey by Yoav Shoham, Rob Powers and Trond Grenager and A Unified Game-theoretic Approach to Multiagent Reinforcement Learning by Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Perolat, David Silver and Thore Graepel.

## Deep Reinforcement Learning

A Brief Survey of Deep Reinforcement Learning by Kai Arulkumaran, Marc P. Deisenroth, Miles Brundage and Anil A. Bharath; Deep Reinforcement Learning by Yuxi Li; Human-level Control through Deep Reinforcement Learning by Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg and Demis Hassabis; Massively Parallel Methods for Deep Reinforcement Learning by Arun Nair, Praveen Srinivasan, Sam Blackwell, Cagdas Alcicek, Rory Fearon, Alessandro De Maria, Vedavyas Panneershelvam, Mustafa Suleyman, Charles Beattie, Stig Petersen, Shane Legg, Volodymyr Mnih, Koray Kavukcuoglu and David Silver; and IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-learner Architectures by Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymyr Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg and Koray Kavukcuoglu.