
Optimization, Generalization and Implicit Bias of Gradient Methods in Deep Learning

Time: 16:30-17:30, Sep. 22nd (Thursday), 2022

Venue: Lecture hall, 3rd floor of Jin Chun Yuan West Building

Speaker: Jian Li (李建), Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University.

Abstract: 

Deep learning has enjoyed huge empirical success in recent years. Although training a deep neural network is a highly nonconvex optimization problem, simple (stochastic) gradient methods are able to produce good solutions that minimize the training error and, more surprisingly, can generalize well to out-of-sample data, even when the number of parameters is significantly larger than the amount of training data. It is known that the optimization algorithms (various gradient-based methods) contribute greatly to the generalization properties of deep learning. Recently, however, researchers have found that gradient methods (even gradient descent) may not converge to a stationary point: the loss gradually decreases, though not necessarily monotonically, and the sharpness of the loss landscape (i.e., the maximum eigenvalue of the Hessian) may oscillate, entering a regime called the edge of stability. These behaviors are inconsistent with several classical assumptions widely studied in the field of optimization. Moreover, what bias is introduced by gradient-based algorithms in neural network training? What characteristics of the training process ensure good generalization in deep learning? In this talk, we investigate these questions from the perspective of gradient-based optimization methods. In particular, we attempt to explain some of the behaviors of the optimization trajectory (e.g., the edge of stability), prove new generalization bounds, and investigate the implicit bias of various gradient methods.
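
To make the edge-of-stability phenomenon concrete, here is a minimal JAX sketch (illustrative only, not code from the talk; the toy network, synthetic data, and step size eta are assumptions). It runs full-batch gradient descent and tracks the sharpness, i.e., the largest Hessian eigenvalue, estimated by power iteration on Hessian-vector products. Classical smooth-optimization analysis guarantees monotone descent only while the sharpness stays below 2/eta; in the edge-of-stability regime the sharpness instead rises toward and hovers near 2/eta while the loss keeps decreasing, non-monotonically.

    import jax
    import jax.numpy as jnp
    from jax.flatten_util import ravel_pytree

    def loss_fn(w, X, y):
        # tiny two-layer tanh network on synthetic data (illustrative choice)
        return jnp.mean((jnp.tanh(X @ w["W1"]) @ w["W2"] - y) ** 2)

    def hvp(w, v, X, y):
        # Hessian-vector product via forward-over-reverse autodiff
        return jax.jvp(lambda p: jax.grad(loss_fn)(p, X, y), (w,), (v,))[1]

    def sharpness(w, X, y, iters=30):
        # power iteration for the dominant Hessian eigenvalue
        # (assumes the top eigenvalue dominates in magnitude)
        flat, unravel = ravel_pytree(w)
        v = jax.random.normal(jax.random.PRNGKey(0), flat.shape)
        for _ in range(iters):
            hv, _ = ravel_pytree(hvp(w, unravel(v), X, y))
            v = hv / (jnp.linalg.norm(hv) + 1e-12)
        hv, _ = ravel_pytree(hvp(w, unravel(v), X, y))
        return v @ hv  # Rayleigh quotient, approximately lambda_max

    # synthetic regression problem and a small network (assumptions)
    k = jax.random.PRNGKey
    X = jax.random.normal(k(1), (64, 10))
    y = jax.random.normal(k(2), (64, 1))
    w = {"W1": 0.5 * jax.random.normal(k(3), (10, 32)),
         "W2": 0.5 * jax.random.normal(k(4), (32, 1))}

    eta = 0.05  # classical descent analysis requires lambda_max < 2/eta = 40
    grad_fn = jax.jit(jax.grad(loss_fn))
    for t in range(2000):
        w = jax.tree_util.tree_map(lambda p, g: p - eta * g, w, grad_fn(w, X, y))
        if t % 400 == 0:
            # track loss and sharpness; in the edge-of-stability regime the
            # sharpness hovers near 2/eta = 40 while the loss still decreases
            print(t, float(loss_fn(w, X, y)), float(sharpness(w, X, y)))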
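
For the implicit-bias question, one classical concrete instance (again a sketch under assumed toy data, not a result from the talk): on linearly separable data, gradient descent on the unregularized logistic loss diverges in norm, but its direction converges to the max-margin (hard-SVM) separator (Soudry et al., JMLR 2018). In the sketch below, the weight norm keeps growing while the normalized margin stabilizes.

    import jax
    import jax.numpy as jnp

    key = jax.random.PRNGKey(0)
    Xp = jax.random.normal(key, (40, 2)) + 3.0  # positive class near (3, 3)
    X = jnp.concatenate([Xp, -Xp])              # mirrored negatives -> separable
    y = jnp.concatenate([jnp.ones(40), -jnp.ones(40)])

    def logistic_loss(w):
        # unregularized logistic loss; logaddexp for numerical stability
        return jnp.mean(jnp.logaddexp(0.0, -y * (X @ w)))

    grad_fn = jax.jit(jax.grad(logistic_loss))
    w = jnp.zeros(2)
    for t in range(1, 20001):
        w = w - 0.1 * grad_fn(w)
        if t % 5000 == 0:
            margin = jnp.min(y * (X @ w)) / jnp.linalg.norm(w)
            # ||w|| grows without bound, while the normalized margin
            # approaches that of the max-margin separator
            print(t, float(jnp.linalg.norm(w)), float(margin))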


Bio:

Jian Li is currently a tenured associate professor at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, headed by Prof. Andrew Yao. He received his BSc degree from Sun Yat-sen (Zhongshan) University, China, his MSc degree in computer science from Fudan University, China, and his PhD degree from the University of Maryland, USA. His major research interests lie in theoretical computer science, machine learning, databases, and finance. He has co-authored several research papers published in major computer science conferences and journals. He received the best paper awards at VLDB 2009 and ESA 2010, and the best newcomer award at ICDT 2017.

