Repro Samples Method and Principled Random Forests

Time：Thursday, June 20, 2024, 10:00-11:00 am

Venue：C548 Shuangqing Complex Building A

Organizer：Yuhong Yang, Fan Yang

Speaker：Min-ge Xie, Rutgers University

Abstract

The repro samples method introduces a fundamentally new inferential framework that can effectively address frequently encountered, yet highly non-trivial and complex, inference problems involving discrete or non-numerical unknown parameters and/or non-numerical data. In this talk, we present a set of key developments in the repro samples method and use them to develop a novel machine learning ensemble tree model, termed principled random forests. Specifically, repro samples are artificial samples that are reproduced by mimicking the genesis of the observed data. Using the repro samples and inversion techniques stemming from fiducial inference, we can establish a confidence set for the underlying (‘true’) tree model that generated, or approximately generated, the observed data. We then obtain a tree ensemble model using the confidence set, from which we derive our inference. Our development is principled and interpretable: first, it is fully theoretically supported and provides frequentist performance guarantees on both inference and predictions; second, the approach assembles only the small set of trees in the confidence set, so the resulting model is interpretable. The development is further extended to handle tree-structured conditional average treatment effects in a causal inference setting. Numerical results demonstrate that the proposed approach outperforms existing single-tree and ensemble tree methods.

The repro samples method provides a new toolset for developing interpretable AI and for helping to address the black-box issues in complex machine learning models. The development of principled random forests is our first attempt in this direction.

Speaker

Min-ge Xie, PhD, is a Distinguished Professor at Rutgers, The State University of New Jersey. Dr. Xie received his PhD in Statistics from the University of Illinois at Urbana-Champaign and his BS in Mathematics from the University of Science and Technology of China. He is the current Editor of The American Statistician and a co-founding Editor-in-Chief of The New England Journal of Statistics in Data Science. He is a fellow of the ASA and the IMS, and an elected member of the ISI. His research interests include the theoretical foundations of statistical inference and data science, fusion learning, finite and large sample theories, and parametric and nonparametric methods. He is the Director of the Rutgers Office of Statistical Consulting and has rich interdisciplinary research experience collaborating with computer scientists, engineers, biomedical researchers, and scientists in other fields.

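Min-ge Xie is a Distinguished Professor in the Department of Statistics at Rutgers, The State University of New Jersey, and a renowned expert in the foundations of statistics and fusion learning. His pioneering research on confidence distributions has been described as "an influential and insightful foundational process." His other research areas include the foundations of data science, conformal prediction, big data, estimating equations, robust statistics, hierarchical models, and asymptotics.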

Date：June 19, 2024
