Towards modern datasets: laying mathematical foundations to streamline machine learning - Qiuzhen College, Tsinghua University

Seminar

Source: 12-16

Time: Tues., 16:00-17:00, Dec. 17, 2024

Venue: C548, Shuangqing Complex Building A

Speaker: Chen Cheng

Statistical Seminar

Organizer：

吴宇楠

Speaker：

Chen Cheng

Statistics Department in Stanford University

Time：

Tues., 16:00-17:00, Dec. 17, 2024

Venue：

C548, Shuangqing Complex Building A

Online：

Zoom Meeting ID: 271 534 5558

Passcode: YMSC

Title：

Towards modern datasets: laying mathematical foundations to streamline machine learning

Abstract：

Datasets are central to the development of statistical learning theory and to the evolution of models. The burgeoning success of modern machine learning on sophisticated tasks relies crucially on the rapid growth of massive datasets (cf. Donoho) such as ImageNet, SuperGLUE and LAION-5B. However, this evolution breaks standard statistical learning assumptions and tools.

In this talk, I will present two stories that tackle the challenges modern datasets present, and leverage statistical theory to shed light on how we should streamline modern machine learning.

In the first part, we study multilabeling, a curious aspect of modern human-labeled datasets that is often missing from the statistical machine learning literature. We develop a stylized theoretical model to capture uncertainties in the labeling process, allowing us to understand the contrasts, limitations and possible improvements of using aggregated or non-aggregated data in a statistical learning pipeline.

In the second part, I will present novel theoretical tools that go beyond what is readily available in the classical literature, such as random matrix theory in the proportional regime. Theoretical tools for the proportional regime are crucially helpful in understanding “benign overfitting” and “memorization”. However, the proportional regime is not always the most natural setting in statistics, where columns correspond to covariates and rows to samples. With the objective of moving beyond proportional asymptotics, we revisit ridge regression (ℓ₂-penalized least squares) on i.i.d. data X ∈ ℝ^{n×d}, y ∈ ℝ^n. We allow the feature vector to be infinite-dimensional (d = ∞), in which case it belongs to a separable Hilbert space.
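
For readers unfamiliar with the estimator named in the abstract, the finite-dimensional version of ridge regression can be sketched as below. This is a minimal NumPy illustration and not the speaker's method: the helper name `ridge_fit`, the penalty scaling (λ multiplies the squared norm with no 1/n factor), and the synthetic Gaussian data are all assumptions made here, and the infinite-dimensional case (d = ∞) discussed in the talk is not covered.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Return argmin_b ||y - X b||^2 + lam * ||b||^2 (hypothetical helper).

    Solves the normal equations (X^T X + lam * I) b = X^T y.
    """
    d = X.shape[1]
    gram = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(gram, X.T @ y)


# Synthetic i.i.d. data, purely for illustration (not from the talk).
rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.standard_normal((n, d))
beta_true = rng.standard_normal(d) / np.sqrt(d)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = ridge_fit(X, y, lam=1.0)
```

For any λ > 0 the matrix XᵀX + λI is positive definite, so the linear system above has a unique solution even when d exceeds n.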
