In statistical learning, various notions of mathematical optimality are used to characterize the performance of different learning methods. They include minimax optimality, which takes a worst-case standpoint, and asymptotic efficiency, which takes the rosier view that the regression function to be learned sits there waiting to be discovered. When multiple models, e.g., trees, neural networks, and support vector machines, are considered as candidates to describe the unknown regression function behind the data at hand, one hopes to develop a model selection method that automatically achieves the optimal performance offered by the candidate models, as if one knew the best model to begin with. Fundamental questions include: 1. How should one conduct model selection to achieve such adaptive optimality? 2. Can different optimalities be attained simultaneously by a powerful learning procedure?
In this talk, I will give a glimpse of some foundational theories of model selection for optimal regression learning. First, we will see why AIC-type model selection criteria lead to adaptive minimax optimal estimation. Second, we will provide insight into whether the hallmark theoretical properties of model selection methods guided by different principles can be integrated in a new “super” selection criterion. Third, we will examine arguably the most widely used model selection method in statistical and machine learning applications, namely cross-validation (CV). In particular, we will illustrate the puzzling cross-validation paradox, address a couple of widespread misconceptions, and present a new electoral college cross-validation approach for more reliable and trustworthy learning.