Reinforcement Learning: An Introduction

Richard S. Sutton and Andrew G. Barto

MIT Press, Cambridge, MA, 1998

A Bradford Book

This introductory textbook on reinforcement learning is targeted toward engineers and scientists in artificial intelligence, operations research, neural networks, and control systems, and we hope it will also be of interest to psychologists and neuroscientists.

If you would like to order a copy of the book, or if you are a qualified instructor and would like to see an examination copy, please see the MIT Press home page for this book. Or you might be interested in the reviews at amazon.com. There is also a Japanese translation available.

The table of contents of the book is given below, with associated HTML. The HTML version has a number of presentation problems, and its text is slightly different from the real book, but it may be useful for some purposes.

Preface

Part I: The Problem

Introduction

1.1 Reinforcement Learning

1.2 Examples

1.3 Elements of Reinforcement Learning

1.4 An Extended Example: Tic-Tac-Toe

1.5 Summary

1.6 History of Reinforcement Learning

1.7 Bibliographical Remarks

Evaluative Feedback

2.1 An n-armed Bandit Problem

2.2 Action-Value Methods

2.3 Softmax Action Selection

2.4 Evaluation versus Instruction

2.5 Incremental Implementation

2.6 Tracking a Nonstationary Problem

2.7 Optimistic Initial Values

2.8 Reinforcement Comparison

2.9 Pursuit Methods

2.10 Associative Search

2.11 Conclusion

2.12 Bibliographical and Historical Remarks

The Reinforcement Learning Problem

3.1 The Agent-Environment Interface

3.2 Goals and Rewards

3.3 Returns

3.4 A Unified Notation for Episodic and Continual Tasks

3.5 The Markov Property

3.6 Markov Decision Processes

3.7 Value Functions

3.8 Optimal Value Functions

3.9 Optimality and Approximation

3.10 Summary

3.11 Bibliographical and Historical Remarks

Part II: Elementary Methods

Dynamic Programming

4.1 Policy Evaluation

4.2 Policy Improvement

4.3 Policy Iteration

4.4 Value Iteration

4.5 Asynchronous Dynamic Programming

4.6 Generalized Policy Iteration

4.7 Efficiency of Dynamic Programming

4.8 Summary

4.9 Historical and Bibliographical Remarks

Monte Carlo Methods

5.1 Monte Carlo Policy Evaluation

5.2 Monte Carlo Estimation of Action Values

5.3 Monte Carlo Control

5.4 On-Policy Monte Carlo Control

5.5 Evaluating One Policy While Following Another

5.6 Off-Policy Monte Carlo Control

5.7 Incremental Implementation

5.8 Summary

5.9 Historical and Bibliographical Remarks

Temporal Difference Learning

6.1 TD Prediction

6.2 Advantages of TD Prediction Methods

6.3 Optimality of TD(0)

6.4 Sarsa: On-Policy TD Control

6.5 Q-learning: Off-Policy TD Control

6.6 Actor-Critic Methods (*)

6.7 R-Learning for Undiscounted Continual Tasks (*)

6.8 Games, After States, and other Special Cases

6.9 Conclusions

6.10 Historical and Bibliographical Remarks

Part III: A Unified View

Eligibility Traces

7.1 n-step TD Prediction

7.2 The Forward View of TD(λ)

7.3 The Backward View of TD(λ)

7.4 Equivalence of the Forward and Backward Views

7.5 Sarsa(λ)

7.6 Q(λ)

7.7 Eligibility Traces for Actor-Critic Methods (*)

7.8 Replacing Traces

7.9 Implementation Issues

7.10 Variable λ (*)

7.11 Conclusions

7.12 Bibliographical and Historical Remarks

Generalization and Function Approximation

8.1 Value Prediction with Function Approximation

8.2 Gradient-Descent Methods

8.3 Linear Methods

8.3.1 Coarse Coding

8.3.2 Tile Coding

8.3.3 Radial Basis Functions

8.3.4 Kanerva Coding

8.4 Control with Function Approximation

8.5 Off-Policy Bootstrapping

8.6 Should We Bootstrap?

8.7 Summary

8.8 Bibliographical and Historical Remarks

Planning and Learning

9.1 Models and Planning

9.2 Integrating Planning, Acting, and Learning

9.3 When the Model is Wrong

9.4 Prioritized Sweeping

9.5 Full vs. Sample Backups

9.6 Trajectory Sampling

9.7 Heuristic Search

9.8 Summary

9.9 Historical and Bibliographical Remarks

Dimensions

10.1 The Unified View

10.2 Other Frontier Dimensions

Case Studies

11.1 TD-Gammon

11.2 Samuel's Checkers Player

11.3 The Acrobot

11.4 Elevator Dispatching

11.5 Dynamic Channel Allocation

11.6 Job-Shop Scheduling

References

Summary of Notation
