# Reinforcement Learning: An Introduction

Python code for Sutton & Barto's book.

If you have any confusion about the code or want to report a bug, please open an issue instead of emailing me directly.

# Contents

- Chapter 1
  - Tic-Tac-Toe
- Chapter 2
- Chapter 3
- Chapter 4
- Chapter 5
- Chapter 6
- Chapter 7
- Chapter 8
- Chapter 9
- Chapter 10
- Chapter 11
- Chapter 12

# Environment

- Python 2 or Python 3
- NumPy
- Matplotlib
- Six
- Seaborn

# Usage

    git clone
    cd reinforcement-learning-an-introduction/chapterXX
    python XXX.py

# Contribution

This project contains almost all the programmable figures in the book. However, when I completed this project, the book was still in draft and some chapters were still incomplete. Furthermore, due to the limited computational capacity of my machine, I could only use a limited number of runs and episodes for some experiments, so the sample output is much less smooth than the figures in the book.

If you want to contribute some exercises from the book or some missing examples, fix bugs in the existing code, provide higher-quality sample outputs, or add new interesting experiments related to RL, feel free to open an issue or make a pull request. I will appreciate it very much.

Also, feel free to comment on the sample outputs; some curves are really interesting.

The following figures/examples are known to be missing:

- Example 3.4: Pole-Balancing
- Example 3.6: Draw Poker
- Example 5.2: Soap Bubble
- Example 8.5: Rod Maneuvering
- Figure 12.14: The effect of λ (I don't have time to replicate it for now)
- Chapter 14 & 15 are about psychology and neuroscience
- Chapter 16: Backgammon, The Acrobot, Go

A Jupyter Notebook version is under development; completed chapters are available in the branch.
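The Contribution note about limited runs producing rougher curves can be illustrated with a minimal sketch. This is not code from this repository: the function names `average_reward_curve` and `roughness`, the ε-greedy bandit setup, and all parameter values are assumptions chosen for illustration, and it relies on `np.random.default_rng`, which needs Python 3 and NumPy 1.17 or later.

```python
import numpy as np


def average_reward_curve(runs, steps=200, arms=10, epsilon=0.1, seed=0):
    """Per-step reward of an epsilon-greedy agent, averaged over `runs` bandit problems.

    Hypothetical helper for illustration; not part of the repository.
    """
    rng = np.random.default_rng(seed)
    rewards = np.zeros((runs, steps))
    for r in range(runs):
        q_true = rng.normal(size=arms)   # true action values for this run
        q_est = np.zeros(arms)           # sample-average estimates
        counts = np.zeros(arms)          # how often each arm was pulled
        for t in range(steps):
            if rng.random() < epsilon:
                a = int(rng.integers(arms))   # explore: random arm
            else:
                a = int(np.argmax(q_est))     # exploit: greedy arm
            reward = rng.normal(q_true[a], 1.0)
            counts[a] += 1
            q_est[a] += (reward - q_est[a]) / counts[a]  # incremental mean
            rewards[r, t] = reward
    return rewards.mean(axis=0)


def roughness(curve):
    """Standard deviation of step-to-step changes: a crude smoothness measure."""
    return float(np.std(np.diff(curve)))


few = average_reward_curve(runs=10)
many = average_reward_curve(runs=200)
print("roughness with 10 runs: ", roughness(few))
print("roughness with 200 runs:", roughness(many))
```

Averaging over more independent runs reduces the noise in each point of the curve, so the second value should come out noticeably smaller than the first. The same trade-off between compute time and smoothness applies to the sample outputs here.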