Abstract:
Deep learning has enjoyed huge empirical success in recent years. Although training a deep neural network is a highly nonconvex optimization problem,
simple (stochastic) gradient methods are able to produce good solutions that minimize the training error and, more surprisingly, generalize well to out-of-sample data, even when the number of parameters is significantly larger than the amount of training data. It is known that the optimization algorithms (various gradient-based methods) contribute greatly to the generalization properties of deep learning. Recently, however, researchers have found that gradient methods (even gradient descent) may not converge to a stationary point: the loss decreases gradually but not necessarily monotonically, and the sharpness of the loss landscape (i.e., the maximum eigenvalue of the Hessian) may oscillate, entering a regime called the edge of stability. These behaviors are inconsistent with several classical assumptions widely studied in the field of optimization. Moreover, what bias is introduced by gradient-based algorithms in neural network training? What characteristics of the training process ensure good generalization in deep learning? In this talk, we investigate these questions from the perspective of gradient-based optimization methods. In particular, we attempt to explain some of the behaviors of the optimization trajectory (e.g., the edge of stability), prove new generalization bounds, and investigate the implicit bias of various gradient methods.
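
For readers unfamiliar with the "sharpness" quantity mentioned above, here is a minimal sketch (not from the talk itself) of how it is commonly estimated in practice: power iteration with Hessian-vector products, so the Hessian is never formed explicitly. In classical analyses, gradient descent with step size eta is expected to decrease the loss only while the sharpness stays below 2/eta; the edge of stability refers to the sharpness hovering around that threshold. All names and the toy loss below are illustrative placeholders, not the speaker's code.

    # Estimate lambda_max of the training-loss Hessian via power iteration
    # using Hessian-vector products (assumes the top eigenvalue dominates in
    # magnitude, as is typical for training-loss Hessians).
    import torch

    def top_hessian_eigenvalue(loss_fn, w, n_iters=50):
        """Power-iteration estimate of lambda_max of the Hessian of loss_fn at w."""
        v = torch.randn_like(w)
        v /= v.norm()
        lam = torch.tensor(0.0)
        for _ in range(n_iters):
            loss = loss_fn(w)
            (grad,) = torch.autograd.grad(loss, w, create_graph=True)
            (hv,) = torch.autograd.grad(grad @ v, w)   # Hessian-vector product H v
            lam = v @ hv                               # Rayleigh quotient v^T H v
            v = hv / (hv.norm() + 1e-12)
        return lam.item()

    # Toy check: for loss(w) = 0.5 * w^T A w the Hessian is A, so the estimate
    # should approach A's largest eigenvalue (3.0 here).
    A = torch.diag(torch.tensor([3.0, 1.0, 0.5]))
    w = torch.randn(3, requires_grad=True)
    print(top_hessian_eigenvalue(lambda w: 0.5 * w @ (A @ w), w))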
Bio:
Jian Li is currently a tenured associate professor at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, headed by Prof. Andrew Yao. He received his BSc degree from Sun Yat-sen (Zhongshan) University, China, his MSc degree in computer science from Fudan University, China, and his PhD degree from the University of Maryland, USA. His major research interests lie in theoretical computer science, machine learning, databases, and finance. He has co-authored several research papers published in major computer science conferences and journals. He received the best paper awards at VLDB 2009 and ESA 2010, and the best newcomer award at ICDT 2017.