Optimization, Generalization and Implicit bias of Gradient Methods in Deep Learning

Time: 16:30-17:30, Thursday, September 22, 2022

Venue: Lecture Hall, 3rd Floor, Jin Chun Yuan West Building

Speaker: Jian Li (李建), Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University

Abstract: 

Deep learning has enjoyed huge empirical success in recent years. Although training a deep neural network is a highly nonconvex optimization problem, simple (stochastic) gradient methods are able to produce good solutions that minimize the training error and, more surprisingly, generalize well to out-of-sample data, even when the number of parameters is significantly larger than the amount of training data. It is known that the optimization algorithms (various gradient-based methods) contribute greatly to the generalization properties of deep learning. However, researchers have recently found that gradient methods (even gradient descent) may not converge to a stationary point: the loss gradually decreases but not necessarily monotonically, and the sharpness of the loss landscape (i.e., the maximum eigenvalue of the Hessian) may oscillate, entering a regime called the edge of stability. These behaviors are inconsistent with several classical assumptions widely studied in the field of optimization. Moreover, what bias is introduced by gradient-based algorithms in neural network training? What characteristics of the training ensure good generalization in deep learning? In this talk, we investigate these questions from the perspective of gradient-based optimization methods. In particular, we attempt to explain some of the behaviors of the optimization trajectory (e.g., the edge of stability), prove new generalization bounds, and investigate the implicit bias of various gradient methods.
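
To make the "sharpness" and "edge of stability" terminology above concrete, here is a minimal, self-contained sketch (not taken from the talk; the model, data, and hyperparameters are illustrative assumptions) that runs full-batch gradient descent on a toy two-layer network in JAX and tracks the largest Hessian eigenvalue via power iteration. In the edge-of-stability regime, this sharpness is observed to rise and then hover around 2/(learning rate) rather than settling monotonically.

```python
# Minimal sketch (assumed setup, not code from the talk): gradient descent on a
# toy two-layer network while tracking the "sharpness" (top Hessian eigenvalue).
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (64, 4))          # toy inputs
y = jnp.sin(X @ jnp.ones(4))                 # toy regression targets

k1, k2 = jax.random.split(key)
params = {"W1": jax.random.normal(k1, (4, 16)) * 0.5,
          "W2": jax.random.normal(k2, (16,)) * 0.5}

def loss(params):
    h = jnp.tanh(X @ params["W1"])
    return jnp.mean((h @ params["W2"] - y) ** 2)

def sharpness(params, iters=30):
    """Estimate the largest Hessian eigenvalue by power iteration on HVPs."""
    flat, unravel = ravel_pytree(params)
    flat_loss = lambda v: loss(unravel(v))
    hvp = lambda v: jax.jvp(jax.grad(flat_loss), (flat,), (v,))[1]
    v = jnp.ones_like(flat) / jnp.sqrt(flat.size)
    for _ in range(iters):
        hv = hvp(v)
        v = hv / (jnp.linalg.norm(hv) + 1e-12)
    return jnp.vdot(v, hvp(v))               # Rayleigh quotient estimate

lr = 0.05                                    # edge of stability: sharpness hovers near 2/lr
for step in range(201):
    grads = jax.grad(loss)(params)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    if step % 50 == 0:
        print(f"step {step:3d}  loss {float(loss(params)):.4f}  "
              f"sharpness {float(sharpness(params)):.2f}  (2/lr = {2.0 / lr:.1f})")
```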


Bio:

Jian Li is currently a tenured associate professor at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, headed by Prof. Andrew Yao. He received his BSc degree from Sun Yat-sen (Zhongshan) University, China, his MSc degree in computer science from Fudan University, China, and his PhD degree from the University of Maryland, USA. His major research interests lie in theoretical computer science, machine learning, databases, and finance. He has co-authored numerous research papers published in major computer science conferences and journals. He received the best paper awards at VLDB 2009 and ESA 2010, and the best newcomer award at ICDT 2017.

