
Deep Reinforcement Learning, part 1 - Doina Precup - MLSS 2020, Tübingen

Written by virtual mlss2020 on Thursday, Jul 09, 2020 | 01:44 AM

 
Table of Contents (powered by https://videoken.com)

0:00:00 Speaker Introduction
0:01:22 Introduction to Reinforcement Learning: Part 1 - Prediction, Value-Based, Model-Free, Control (including DQN)
0:05:18 Reinforcement Learning
0:07:01 Example: AlphaGo & AlphaZero
0:12:39 Key Features of RL
0:14:03 Reinforcement Learning
0:14:36 Example: TD-Gammon
0:16:49 Some RL Successes
0:24:44 Computational framework
0:25:37 The Agent-Environment Interface
0:27:21 Supervised vs Reinforcement Learning
0:28:40 Agent's learning task
0:29:34 Return
0:30:59 Episodic Tasks
0:31:19 Example: Mountain Car
0:35:40 Continuing Tasks
0:40:58 4 value functions
0:44:24 Value function approximation
0:45:00 A natural objective in VFA is to minimize the Mean Square Value Error
0:46:02 Simple Monte Carlo
0:48:55 Gradient MC works well on the 1000-state random walk using state aggregation
0:51:09 Markov Decision Processes
0:53:24 Optimal Value Functions
0:54:40 What About Optimal Action-Value Functions?
0:55:20 Bellman Equation for a Policy
0:57:37 cf. Dynamic Programming
0:58:56 Recall: Monte Carlo
0:59:29 Simplest TD Method
1:01:34 TD Prediction
1:03:24 You are the Predictor
1:06:03 TD vs MC
1:07:22 Semi-gradient TD is less accurate than MC on the 1000-state random walk using state aggregation
1:09:00 n-step TD Prediction
1:11:13 Mathematics of n-step TD Targets
1:12:13 The λ-return is a compound update target
1:12:45 Unified View
1:22:48 Value function approximation (VFA) replaces the table with a general parameterized form
1:23:03 Stochastic Gradient Descent (SGD) is the idea behind most approximate learning
1:25:07 Geometric intuition
1:29:30 TD converges to the TD fixed point, a biased but interesting answer
1:32:35 Summing up policy evaluation
1:33:56 TD(λ) performance with a
1:34:31 Q&A
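Several entries in the outline (value function approximation, SGD, state aggregation, and semi-gradient TD) refer to the standard 1000-state random walk example. The snippet below is a minimal Python sketch of semi-gradient TD(0) with state aggregation in that setting; it is not taken from the lecture, and the step size, episode count, and grouping are illustrative choices.

```python
# Minimal sketch (not from the lecture): semi-gradient TD(0) with state
# aggregation on the 1000-state random walk referenced in the outline above.
# The environment follows the usual formulation of this example; the
# parameter values below are illustrative.

import random

N_STATES = 1000      # non-terminal states 1..1000
START = 500          # every episode starts in the middle
MAX_JUMP = 100       # each step jumps 1..100 states left or right
N_GROUPS = 10        # state aggregation: 10 groups of 100 states each
ALPHA = 1e-4         # step size (illustrative)
GAMMA = 1.0          # undiscounted episodic task
N_EPISODES = 10_000

# One weight per group; the approximate value is v_hat(s, w) = w[group(s)].
w = [0.0] * N_GROUPS

def group(s):
    return (s - 1) * N_GROUPS // N_STATES

def step(s):
    """One transition: returns (reward, next_state); next_state is None if terminal."""
    jump = random.randint(1, MAX_JUMP) * random.choice((-1, 1))
    s_next = s + jump
    if s_next < 1:
        return -1.0, None       # walked off the left edge
    if s_next > N_STATES:
        return 1.0, None        # walked off the right edge
    return 0.0, s_next

for _ in range(N_EPISODES):
    s = START
    while s is not None:
        r, s_next = step(s)
        target = r + (GAMMA * w[group(s_next)] if s_next is not None else 0.0)
        # Semi-gradient TD(0): w <- w + alpha * [R + gamma*v(S') - v(S)] * grad v(S).
        # With state aggregation the gradient is 1 for the group containing S, 0 elsewhere.
        w[group(s)] += ALPHA * (target - w[group(s)])
        s = s_next

# The learned weights form an increasing staircase between roughly -1 and +1,
# approximating the near-linear true value function of this walk.
print([round(v, 2) for v in w])
```

Replacing the TD target with the complete episode return would give the gradient Monte Carlo variant that the outline compares against semi-gradient TD.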