
Convergence and Inference of Distributional Reinforcement Learning

Statistical Seminar

Organizer:

Yunan Wu 吴宇楠 (YMSC)


Speaker:

Yang Peng 彭洋 (School of Mathematical Sciences, Peking University)

Time:

Fri., 09:00-10:00 am, Dec. 26, 2025

Venue:

B725, Shuangqing Complex Building A

Title:

Convergence and Inference of Distributional Reinforcement Learning

Abstract:

Distributional reinforcement learning (RL) has achieved remarkable success in various domains by modeling the full distribution of returns rather than just the expectation. Despite the rapid development of algorithms in recent years, the fundamental statistical properties underlying these methods remain largely underexplored.
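
As background, distributional RL targets the law of the random return rather than its mean; for a fixed policy $\pi$, a standard way to express this object (using generic notation, not taken from the talk) is the distributional Bellman equation

\[
Z^{\pi}(s) \;\overset{d}{=}\; R(s, A) + \gamma\, Z^{\pi}(S'), \qquad A \sim \pi(\cdot \mid s), \quad S' \sim P(\cdot \mid s, A),
\]

where $\overset{d}{=}$ denotes equality in distribution, $\gamma \in (0,1)$ is the discount factor, and $Z^{\pi}(S')$ is an independent copy of the return from the next state. Taking expectations on both sides recovers the classical Bellman equation for the value function $V^{\pi}(s) = \mathbb{E}[Z^{\pi}(s)]$, the object estimated in classic RL.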


In this talk, I will present a rigorous statistical framework for distributional RL. First, I will establish that the sample complexity of the distributional temporal difference (TD) learning algorithm is minimax optimal (up to logarithmic factors) under the 1-Wasserstein distance. A surprising implication of this result is that estimating the infinite-dimensional return distribution does not require more samples than estimating the expected return in classic RL. Second, I will introduce a model-based variant of the algorithm and demonstrate the asymptotic normality of the resulting estimators, thereby facilitating valid statistical inference for the return distribution.
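
For reference, the 1-Wasserstein distance in which the first result is stated admits a simple closed form for distributions on the real line: if $\mu$ and $\nu$ are two return distributions with cumulative distribution functions $F_{\mu}$ and $F_{\nu}$, then

\[
W_1(\mu, \nu) \;=\; \int_{\mathbb{R}} \bigl| F_{\mu}(x) - F_{\nu}(x) \bigr| \, dx .
\]

The minimax-optimality claim above measures the error of the estimated return distribution in this metric.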


To derive these results, we develop novel theoretical tools, including a Freedman-type inequality in Hilbert spaces and sharp matrix concentration inequalities for Markovian data. These tools are of independent interest and apply broadly to other statistical problems. This talk is based on joint work published in the Annals of Statistics and at NeurIPS, as well as several recent working papers.
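
For orientation, the classical scalar form of Freedman's inequality, which the Hilbert-space extension mentioned above generalizes, can be stated as follows (generic notation): if $(M_n)$ is a martingale with differences $\xi_i$ bounded by $R$ and predictable quadratic variation $V_n = \sum_{i \le n} \mathbb{E}[\xi_i^2 \mid \mathcal{F}_{i-1}]$, then for any $t > 0$ and $\sigma^2 > 0$,

\[
\mathbb{P}\bigl( \exists\, n:\ M_n \ge t \ \text{and}\ V_n \le \sigma^2 \bigr) \;\le\; \exp\!\left( -\frac{t^2}{2\bigl(\sigma^2 + R t / 3\bigr)} \right).
\]

Roughly speaking, the Hilbert-space version controls the norm of a Hilbert-space-valued martingale in the same spirit.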
