The course introduces students to the fundamental concepts of reinforcement learning (RL). Students will learn to develop RL models and to understand the main families of methods in the field, from dynamic programming to hierarchical RL.
- The reinforcement learning problem : evaluative feedback, non-associative learning, rewards and returns, Markov decision processes, value functions, optimality and approximation.
- Dynamic programming : value iteration, policy iteration, asynchronous DP, generalized policy iteration.
- Monte Carlo methods : policy evaluation, rollouts, on-policy and off-policy learning, importance sampling.
- Temporal Difference learning : TD prediction, optimality of TD(0), SARSA, Q-learning, R-learning, games and afterstates.
- Eligibility traces : n-step TD prediction, TD(λ), forward and backward views, Q(λ), SARSA(λ), replacing traces and accumulating traces.
- Function Approximation : value prediction, gradient descent methods, linear function approximation, ANN-based function approximation, lazy learning, instability issues.
- Policy Gradient methods : non-associative learning, REINFORCE algorithm, exact gradient methods, estimating gradients, approximate policy gradient algorithms, actor-critic methods.
- Planning and Learning : model-based learning and planning, prioritized sweeping, Dyna, heuristic search, trajectory sampling, E³ algorithm.
- Hierarchical RL : MAXQ framework, Options framework, HAM framework, airport algorithm, hierarchical policy gradient.
- Case studies : elevator dispatching, Samuel's checkers player, TD-Gammon, Acrobot, helicopter piloting.