Optimization, Generalization and Implicit bias of Gradient Methods in Deep Learning

Time: 16:30-17:30, Thursday, September 22, 2022

Venue: Lecture Hall, 3rd Floor, Jin Chun Yuan West Building

Speaker: Jian Li (李建), Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University

Abstract: 

Deep learning has enjoyed huge empirical success in recent years. Although training a deep neural network is a highly nonconvex optimization problem, simple (stochastic) gradient methods are able to produce good solutions that minimize the training error and, more surprisingly, generalize well to out-of-sample data, even when the number of parameters is significantly larger than the amount of training data. It is known that the optimization algorithms (various gradient-based methods) contribute greatly to the generalization properties of deep learning. However, researchers have recently found that gradient methods (even gradient descent) may not converge to a stationary point: the loss gradually decreases but not necessarily monotonically, and the sharpness of the loss landscape (i.e., the maximum eigenvalue of the Hessian) may oscillate, entering a regime called the edge of stability. These behaviors are inconsistent with several classical assumptions widely studied in the field of optimization. Moreover, what bias is introduced by gradient-based algorithms in neural network training? What characteristics of the training ensure good generalization in deep learning? In this talk, we investigate these questions from the perspective of gradient-based optimization methods. In particular, we attempt to explain some of the behaviors of the optimization trajectory (e.g., the edge of stability), prove new generalization bounds, and investigate the implicit bias of various gradient methods.
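
To make the "sharpness" and "edge of stability" terminology above concrete, here is a minimal, self-contained sketch (not taken from the talk; the model, data, and hyperparameters are illustrative assumptions) that runs full-batch gradient descent on a toy two-layer network in JAX and tracks the largest Hessian eigenvalue via power iteration. In the edge-of-stability regime, this sharpness is observed to rise and then hover around 2/(learning rate) rather than settling monotonically.

```python
# Minimal sketch (assumed setup, not code from the talk): gradient descent on a
# toy two-layer network while tracking the "sharpness" (top Hessian eigenvalue).
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (64, 4))          # toy inputs
y = jnp.sin(X @ jnp.ones(4))                 # toy regression targets

k1, k2 = jax.random.split(key)
params = {"W1": jax.random.normal(k1, (4, 16)) * 0.5,
          "W2": jax.random.normal(k2, (16,)) * 0.5}

def loss(params):
    h = jnp.tanh(X @ params["W1"])
    return jnp.mean((h @ params["W2"] - y) ** 2)

def sharpness(params, iters=30):
    """Estimate the largest Hessian eigenvalue by power iteration on HVPs."""
    flat, unravel = ravel_pytree(params)
    flat_loss = lambda v: loss(unravel(v))
    hvp = lambda v: jax.jvp(jax.grad(flat_loss), (flat,), (v,))[1]
    v = jnp.ones_like(flat) / jnp.sqrt(flat.size)
    for _ in range(iters):
        hv = hvp(v)
        v = hv / (jnp.linalg.norm(hv) + 1e-12)
    return jnp.vdot(v, hvp(v))               # Rayleigh quotient estimate

lr = 0.05                                    # edge of stability: sharpness hovers near 2/lr
for step in range(201):
    grads = jax.grad(loss)(params)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    if step % 50 == 0:
        print(f"step {step:3d}  loss {float(loss(params)):.4f}  "
              f"sharpness {float(sharpness(params)):.2f}  (2/lr = {2.0 / lr:.1f})")
```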


Bio:

Jian Li is currently a tenured associate professor at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, headed by Prof. Andrew Yao. He received his BSc degree from Sun Yat-sen (Zhongshan) University, China, his MSc degree in computer science from Fudan University, China, and his PhD degree from the University of Maryland, USA. His major research interests lie in theoretical computer science, machine learning, databases, and finance. He has co-authored numerous research papers published in major computer science conferences and journals. He received the best paper awards at VLDB 2009 and ESA 2010, and the best newcomer award at ICDT 2017.

