
Convergence and Inference of Distributional Reinforcement Learning


Statistical Seminar

Organizer:

Yunan Wu 吴宇楠 (YMSC)


Speaker:

Yang Peng 彭洋 (School of Mathematical Sciences, Peking University)

Time:

Fri., 09:00-10:00 am, Dec. 26, 2025

Venue:

B725, Shuangqing Complex Building A

Title:

Convergence and Inference of Distributional Reinforcement Learning

Abstract:

Distributional reinforcement learning (RL) has achieved remarkable success in various domains by modeling the full distribution of returns rather than just the expectation. Despite the rapid development of algorithms in recent years, the fundamental statistical properties underlying these methods remain largely underexplored.
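For readers new to the area, the following standard identity (a textbook fact, not specific to this talk) makes "modeling the full distribution" precise: the random return of a policy satisfies a distributional Bellman equation, and taking expectations recovers the classical one.

```latex
% Random return of a policy \pi from state s (standard definition):
%   Z^\pi(s) = \sum_{t \ge 0} \gamma^t R_t .
% It satisfies the distributional Bellman equation (equality in distribution):
\[
  Z^\pi(s) \;\stackrel{d}{=}\; R(s, A) + \gamma\, Z^\pi(S'),
  \qquad A \sim \pi(\cdot \mid s), \quad S' \sim P(\cdot \mid s, A),
\]
% and taking expectations on both sides recovers the classical Bellman
% equation for V^\pi(s) = \mathbb{E}[Z^\pi(s)].
```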


In this talk, I will present a rigorous statistical framework for distributional RL. First, I will establish that the sample complexity of the distributional temporal difference (TD) learning algorithm is minimax optimal (up to logarithmic factors) under the 1-Wasserstein distance. A surprising implication of this result is that estimating the infinite-dimensional return distribution does not require more samples than estimating the expected return in classic RL. Second, I will introduce a model-based variant of the algorithm and demonstrate the asymptotic normality of the resulting estimators, thereby facilitating valid statistical inference for the return distribution.
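As a concrete illustration, below is a minimal sketch of tabular distributional TD learning with a categorical (fixed-atom) representation. The toy chain MDP, atom grid, and step size are hypothetical choices made for illustration; this is one common instantiation of distributional TD, not necessarily the exact algorithm analyzed in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states = 5                                   # toy chain MDP (assumption)
gamma = 0.9                                    # discount factor
atoms = np.linspace(0.0, 10.0, 51)             # fixed support of 51 atoms
probs = np.full((n_states, atoms.size), 1.0 / atoms.size)  # uniform init

def project(tz, p):
    """Project mass p located at shifted atoms tz back onto the grid `atoms`."""
    dz = atoms[1] - atoms[0]
    tz = np.clip(tz, atoms[0], atoms[-1])
    b = (tz - atoms[0]) / dz                   # fractional grid positions
    lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
    out = np.zeros_like(atoms)
    # split each mass between its two neighbouring atoms; exact hits keep full mass
    np.add.at(out, lo, p * np.where(lo == hi, 1.0, hi - b))
    np.add.at(out, hi, p * (b - lo))
    return out

def td_update(s, r, s_next, alpha=0.05):
    """One distributional TD step: mix in the projected Bellman target."""
    target = project(r + gamma * atoms, probs[s_next])
    probs[s] = (1.0 - alpha) * probs[s] + alpha * target

# drive the update with transitions from a deterministic chain:
# each state steps right; entering the absorbing last state pays reward 1
for _ in range(20000):
    s = rng.integers(n_states)
    s_next = min(s + 1, n_states - 1)
    td_update(s, 1.0 if s_next == n_states - 1 else 0.0, s_next)

print("estimated mean returns:", np.round(probs @ atoms, 2))
```

The projection step keeps every iterate inside a fixed finite-dimensional family of distributions, which is what makes the update implementable even though the return distribution itself is an infinite-dimensional object.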


To derive these results, we develop novel theoretical tools, including Freedman's inequality in Hilbert spaces and sharp matrix concentration inequalities for Markovian data. These tools are of independent interest and apply broadly to other statistical problems. This talk is based on joint work published in the Annals of Statistics and at NeurIPS, as well as several recent working papers.
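For reference, the classical scalar form of Freedman's inequality reads as follows; the Hilbert-space extension and the matrix concentration bounds mentioned above are the speaker's contributions and are not reproduced here.

```latex
% Classical Freedman inequality (Freedman, 1975), scalar case.
% Let (X_i) be a martingale difference sequence with X_i <= R a.s., and let
% W_n = \sum_{i \le n} \mathbb{E}[X_i^2 \mid \mathcal{F}_{i-1}] denote the
% predictable quadratic variation. Then for all t, \sigma^2 > 0:
\[
  \mathbb{P}\Big(\exists\, n:\ \textstyle\sum_{i=1}^{n} X_i \ge t
      \ \text{and}\ W_n \le \sigma^2 \Big)
  \;\le\; \exp\!\Big( -\frac{t^2}{2\,(\sigma^2 + R t / 3)} \Big).
\]
```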
