Implementations of various RL Algorithms
Contents:
Implementations of selected algorithms from the Sutton and Barto book. I tried to keep the code snippets minimal and faithful to the book.
| Part I: Tabular Solution Methods | |
|---|---|
| Chapter 2: Multi-armed Bandits | |
| Chapter 4: Dynamic Programming | |
| Chapter 5: Monte Carlo Methods | |
| Chapter 6: Temporal-Difference Learning | |

| Part II: Approximate Solution Methods | |
|---|---|
| Chapter 9: On-Policy Prediction with Approximation | |
| Chapter 10: On-Policy Control with Approximation | |
A more in-depth explanation of selected concepts from David Silver's lectures and the Sutton and Barto book.