
Model Selection for Optimal Regression Learning

Time: Fri., 4:00-5:00 pm, Sept. 23, 2022

Venue: Lecture Hall, Floor 3, Jin Chun Yuan West Bldg.; Zoom ID: 271 534 5558; PW: YMSC

Speaker: Prof. Yuhong Yang (University of Minnesota)

In statistical learning, various notions of mathematical optimality are used to characterize the performance of different learning methods. They include minimax optimality from a worst-case standpoint and asymptotic efficiency from a rosy view that the regression function to be learned sits there to be discovered. When multiple models, e.g., trees, neural networks, and support vector machines, are considered as possible candidates to describe the unknown regression function behind the data at hand, one hopes to develop a model selection method that automatically achieves the optimal performance offered by the candidate models, as if one knew the best model to begin with. Fundamental questions include: 1. How should one conduct model selection to achieve such adaptive optimality? 2. Can different optimalities be attained simultaneously by a powerful learning procedure?


In this talk, I will give a glimpse of some foundational theories on model selection for optimal regression learning. First, we will understand why AIC-type model selection criteria lead to adaptive minimax optimal estimation. Second, we will provide insights on whether the hallmark theoretical properties of different model selection methods, guided by different principles, can be integrated into a new “super” selection criterion. Third, we will examine arguably the most widely used model selection method in statistical and machine learning applications, namely, cross-validation (CV). In particular, we will illustrate the puzzling cross-validation paradox, address a couple of widespread misconceptions, and present a new electoral college cross-validation approach for more reliable and trustworthy learning.



Date: September 22, 2022