Introduction
Stochastic Gradient Descent (SGD), in one form or another, serves as the workhorse method for training modern machine learning models. Its many variants form a field that is both extensive and rapidly growing, making it challenging for practitioners and even experts to keep track of its landscape. This course offers a mathematically rigorous and comprehensive introduction to the field, drawing on the most recent advances and insights. It carefully builds a theory of convergence and complexity for serial, parallel, and distributed variants of SGD in the strongly convex, convex, and nonconvex settings, covering randomness arising from subsampling, compression, and other sources.
The curriculum also covers advanced techniques such as acceleration via Polyak momentum or Nesterov extrapolation. A substantial portion of the course is devoted to a unified analysis of a large family of SGD variants that have historically required distinct intuitions, convergence analyses, and applications, evolving separately across different communities. This framework includes, but is not limited to, variance reduction, data sampling, coordinate sampling, arbitrary sampling, importance sampling, mini-batching, quantization, sketching, dithering, and sparsification, as well as their combinations. This comprehensive treatment aims to give learners a deep understanding of SGD's intricate landscape and the ability to apply and build upon these methods in their own work.
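To make the basic object of study concrete, below is a minimal sketch (not taken from the course material) of the plain SGD update with optional Polyak (heavy-ball) momentum applied to a synthetic least-squares problem; the function name, step size, momentum parameter, and problem sizes are illustrative assumptions, not course-prescribed values.

```python
import numpy as np

def sgd_heavy_ball(A, b, x0, stepsize=0.01, momentum=0.9, epochs=30, seed=0):
    """Minimal SGD with optional Polyak (heavy-ball) momentum for min_x 0.5*||Ax - b||^2.

    Each step samples one row (a_i, b_i) uniformly and uses the stochastic
    gradient a_i * (a_i^T x - b_i); setting momentum=0 recovers plain SGD.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x, x_prev = x0.copy(), x0.copy()
    for _ in range(epochs * n):
        i = rng.integers(n)                                 # uniform single-element sampling
        g = A[i] * (A[i] @ x - b[i])                        # stochastic gradient at x
        x_new = x - stepsize * g + momentum * (x - x_prev)  # heavy-ball update
        x_prev, x = x, x_new
    return x

# Tiny synthetic example: recover x_star from noisy linear measurements.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
x_star = rng.standard_normal(5)
b = A @ x_star + 0.01 * rng.standard_normal(200)
x_hat = sgd_heavy_ball(A, b, np.zeros(5))
print(np.linalg.norm(x_hat - x_star))
```

The same loop structure underlies the variants studied in the course: mini-batching changes how the index set is sampled, variance reduction and compression change how the stochastic gradient is formed, and Nesterov extrapolation changes the point at which the gradient is evaluated.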
Lecturer Intro
Yi-Shuai Niu is a tenured Associate Professor of Mathematics at the Beijing Institute of Mathematical Sciences and Applications (BIMSA), specializing in Optimization, Scientific Computing, Machine Learning, and Computer Science. Before joining BIMSA in October 2023, he was a research fellow at the Hong Kong Polytechnic University (2021-2022) and an associate professor at Shanghai Jiao Tong University (2014-2021), where he led the “Optimization and Interdisciplinary Research Group” and held a joint appointment at the ParisTech Elite Institute of Technology and the School of Mathematical Sciences. His earlier positions include a postdoc at the University of Paris 6 (2013-2014) and junior researcher posts at both the French National Center for Scientific Research (CNRS) and Stanford University (2010-2012). He was also a lecturer at the National Institute of Applied Sciences (INSA) of Rouen, France (2007-2010), where he earned a Ph.D. in Mathematics-Optimization in 2010 and two Master's degrees, in Pure and Applied Mathematics and in Engineering Mathematics (Génie Mathématique), in 2006. His research covers a wide range of applied mathematics, with an emphasis on optimization theory, machine learning, high-performance computing, and software development. His work spans various interdisciplinary applications, including machine learning, natural language processing, self-driving cars, finance, image processing, turbulent combustion, polymer science, quantum chemistry and computing, and plasma physics. His contributions encompass fundamental research, emphasizing novel algorithms for large-scale nonconvex and nonsmooth problems, and practical implementations, focusing on efficient optimization solvers and scientific computing packages built with high-performance computing techniques. He has developed more than 33 pieces of software and published about 30 articles in prestigious journals and conferences (including the Journal of Scientific Computing, Combustion and Flame, and Applied Mathematics and Computation). He has been the PI of 5 research grants and a member of 5 joint international research projects. He received the Shanghai Teaching Achievement Award (First Prize) in 2017, two Outstanding Teaching Awards (First Prize) at Shanghai Jiao Tong University in 2016 and 2017, and 17 awards in the international mathematical contests in modeling (MCM/ICM), including the INFORMS best paper award in 2017.