Seminar Details
2024-06-11 (14h) : New results for old MDP algorithms
At Euler building (room A.002)
Organized by Mathematical Engineering
Speaker:
Arsenii Mustafin (Boston University)
Abstract:
The Markov Decision Process (MDP) is a fundamental mathematical model for sequential decision problems. While the basic analysis of MDPs was carried out back in the 1960s, recent successes in reinforcement learning have sparked a new wave of interest and led to significant results. In my talk, I will cover our recent findings in the analysis of classical RL algorithms. The first part of the talk is dedicated to the application of variance reduction techniques to TD-learning (TD-SVRG). We will discuss how a recently introduced gradient-splitting interpretation aids the analysis of the TD-SVRG algorithm's convergence. The second half of the talk will focus on the convergence of the value iteration algorithm. I will demonstrate how an assumption on the connectivity of an optimal policy yields an improved convergence rate for the algorithm.
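For context, the classical value iteration algorithm discussed in the second half of the talk can be sketched as follows. The MDP below (transition kernel P and rewards R) is made-up toy data for illustration, not from the talk; the stopping rule reflects the standard geometric convergence at rate gamma per iteration, while the talk's improved rate under the connectivity assumption is not reproduced here.

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP: illustration data only.
rng = np.random.default_rng(0)
nS, nA, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] is a distribution over next states
R = rng.random((nS, nA))                       # R[s, a] is the expected reward

def value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality operator until the sup-norm update is below tol."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * P @ V   # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        V_new = Q.max(axis=1)   # greedy backup over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

V_star = value_iteration(P, R, gamma)
```

Since the Bellman operator is a gamma-contraction in the sup-norm, the returned V_star is an approximate fixed point: re-applying the backup changes it by at most roughly tol.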
