Seminar Details
2024-06-11 (14h) : New results for old MDP algorithms
At Euler building (room A.002)
Organized by Mathematical Engineering
Speaker:
Arsenii Mustafin (Boston University)
Abstract:
The Markov Decision Process (MDP) is a fundamental mathematical model for sequential decision problems. While the basic analysis of MDPs was carried out back in the 1960s, recent successes in reinforcement learning have sparked a new wave of interest and led to significant results. In my talk, I will cover our recent findings in the analysis of classical RL algorithms. The first part of the talk is dedicated to the application of variance reduction techniques to TD-learning (TD-SVRG). We will discuss how a recently introduced gradient-splitting interpretation aids the analysis of the TD-SVRG algorithm's convergence. The second half of the talk will focus on the convergence of the value iteration algorithm. I will demonstrate how an assumption on the connectivity of an optimal policy yields an improved convergence rate for the algorithm.
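For context, the classical value iteration algorithm discussed in the second half of the talk can be sketched as follows. The MDP below (transition kernel P and rewards R) is made-up toy data for illustration, not from the talk; the stopping rule reflects the standard geometric convergence at rate gamma per iteration, while the talk's improved rate under the connectivity assumption is not reproduced here.

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP: illustration data only.
rng = np.random.default_rng(0)
nS, nA, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] is a distribution over next states
R = rng.random((nS, nA))                       # R[s, a] is the expected reward

def value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality operator until the sup-norm update is below tol."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * P @ V   # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        V_new = Q.max(axis=1)   # greedy backup over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

V_star = value_iteration(P, R, gamma)
```

Since the Bellman operator is a gamma-contraction in the sup-norm, the returned V_star is an approximate fixed point: re-applying the backup changes it by at most roughly tol.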
