
Last updated Aug 10, 2022

# Reinforcement Learning: An Introduction


  • CiteKey:: suttonReinforcementLearningIntroduction2018
  • Type:: book
  • Author:: Richard S. Sutton, Andrew G. Barto
  • Publisher:: MIT Press
  • Year:: 2018
  • ISBN:: 978-0-262-03924-6
  • Tags:: #Source/Zotero
  • Format:: PDF


The significantly expanded and updated new edition of a widely used text on reinforcement learning, one of the most active research areas in artificial intelligence.

Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives while interacting with a complex, uncertain environment. In *Reinforcement Learning*, Richard Sutton and Andrew Barto provide a clear and simple account of the field's key ideas and algorithms. This second edition has been significantly expanded and updated, presenting new topics and updating coverage of other topics.

Like the first edition, this second edition focuses on core online learning algorithms, with the more mathematical material set off in shaded boxes. Part I covers as much of reinforcement learning as possible without going beyond the tabular case for which exact solutions can be found. Many algorithms presented in this part are new to the second edition, including UCB, Expected Sarsa, and Double Learning. Part II extends these ideas to function approximation, with new sections on such topics as artificial neural networks and the Fourier basis, and offers expanded treatment of off-policy learning and policy-gradient methods. Part III has new chapters on reinforcement learning's relationships to psychology and neuroscience, as well as an updated case-studies chapter including AlphaGo and AlphaGo Zero, Atari game playing, and IBM Watson's wagering strategy. The final chapter discusses the future societal impacts of reinforcement learning.
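The tabular setting of Part I can be illustrated with the book's opening example, the k-armed bandit: ε-greedy action selection with sample-average value estimates updated incrementally (Q ← Q + (R − Q)/N, the incremental-mean rule from Chapter 2). A minimal sketch — the arm means, step count, and function name here are illustrative choices, not from the book:

```python
import random

def epsilon_greedy_bandit(arm_means, steps=10000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a k-armed Gaussian bandit.

    Returns the final action-value estimates Q and the average reward.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    q = [0.0] * k  # action-value estimates Q(a)
    n = [0] * k    # pull counts N(a)
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore: random arm
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit: greedy arm
        r = rng.gauss(arm_means[a], 1.0)           # noisy reward, unit variance
        n[a] += 1
        q[a] += (r - q[a]) / n[a]  # incremental sample-average update
        total_reward += r
    return q, total_reward / steps

q, avg_reward = epsilon_greedy_bandit([0.2, 0.5, 1.0])
```

With enough steps the estimates converge toward the true arm means and the greedy choice settles on the best arm, while the ε fraction of random pulls keeps the other estimates from going stale — the exploration–exploitation trade-off Chapter 2 develops.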


# Tags and Collections

  • Keywords:: Intelligence (AI) & Semantics, RL, 📙, 📥

# Table of Contents

  1. Introduction
    1. Reinforcement Learning
    2. Examples
    3. Elements of Reinforcement Learning
    4. Limitations and Scope
    5. An Extended Example: Tic-Tac-Toe
    6. Summary
    7. Early History of Reinforcement Learning
  2. Multi-armed Bandits
    1. A k-armed Bandit Problem
    2. Action-value Methods
    3. The 10-armed Testbed
    4. Incremental Implementation
    5. Tracking a Nonstationary Problem
    6. Optimistic Initial Values
    7. Upper-Confidence-Bound Action Selection
    8. Gradient Bandit Algorithms
    9. Associative Search (Contextual Bandits)
    10. Summary
  3. Finite Markov Decision Processes
    1. The Agent–Environment Interface
    2. Goals and Rewards
    3. Returns and Episodes
    4. Unified Notation for Episodic and Continuing Tasks
    5. Policies and Value Functions
    6. Optimal Policies and Optimal Value Functions
    7. Optimality and Approximation
    8. Summary
  4. Dynamic Programming
    1. Policy Evaluation (Prediction)
    2. Policy Improvement
    3. Policy Iteration
    4. Value Iteration
    5. Asynchronous Dynamic Programming
    6. Generalized Policy Iteration
    7. Efficiency of Dynamic Programming
    8. Summary
  5. Monte Carlo Methods
    1. Monte Carlo Prediction
    2. Monte Carlo Estimation of Action Values
    3. Monte Carlo Control
    4. Monte Carlo Control without Exploring Starts
    5. Off-policy Prediction via Importance Sampling
    6. Incremental Implementation
    7. Off-policy Monte Carlo Control
    8. *Discounting-aware Importance Sampling
    9. *Per-decision Importance Sampling
    10. Summary
  6. Temporal-Difference Learning
    1. TD Prediction
    2. Advantages of TD Prediction Methods
    3. Optimality of TD(0)
    4. Sarsa: On-policy TD Control
    5. Q-learning: Off-policy TD Control
    6. Expected Sarsa
    7. Maximization Bias and Double Learning
    8. Games, Afterstates, and Other Special Cases
    9. Summary
  7. n-step Bootstrapping
    1. n-step TD Prediction
    2. n-step Sarsa
    3. n-step Off-policy Learning
    4. *Per-decision Methods with Control Variates
    5. Off-policy Learning Without Importance Sampling: The n-step Tree Backup Algorithm
    6. *A Unifying Algorithm: n-step Q(σ)
    7. Summary
  8. Planning and Learning with Tabular Methods
    1. Models and Planning
    2. Dyna: Integrated Planning, Acting, and Learning
    3. When the Model Is Wrong
    4. Prioritized Sweeping
    5. Expected vs Sample Updates
    6. Trajectory Sampling
    7. Real-time Dynamic Programming
    8. Planning at Decision Time
    9. Heuristic Search
    10. Rollout Algorithms
    11. Monte Carlo Tree Search
    12. Summary of the Chapter
    13. Summary of Part I: Dimensions
  9. On-policy Prediction with Approximation
    1. Value-function Approximation
    2. The Prediction Objective (VE)
    3. Stochastic-gradient and Semi-gradient Methods
    4. Linear Methods
    5. Feature Construction for Linear Methods
      1. Polynomials
      2. Fourier Basis
      3. Coarse Coding
      4. Tile Coding
      5. Radial Basis Functions
    6. Selecting Step-Size Parameters Manually
    7. Nonlinear Function Approximation: Artificial Neural Networks
    8. Least-Squares TD
    9. Memory-based Function Approximation
    10. Kernel-based Function Approximation
    11. Looking Deeper at On-policy Learning: Interest and Emphasis
    12. Summary

# Notes

# By Chapter

  1. Chapter 5 - Monte Carlo Methods
  2. Chapter 6 - Temporal-Difference Learning
  3. Chapter 7 - n-step Bootstrapping
  4. Chapter 8 - Planning and Learning with Tabular Methods
  5. Chapter 9 - On-policy Prediction with Approximation

# Extracted Annotations