Statistical methods for compositional data analysis

Time: Friday, March 1, 2024, 14:00

Venue: Lecture Hall B725, Shuangqing Complex Building A, Tsinghua University; Zoom: 4552601552, YMSC

Speaker: Xiang Zhan (占翔), Peking University

Xiang Zhan is an Associate Professor in the Department of Biostatistics and the Beijing International Center for Mathematical Research at Peking University. He obtained his BS degree from Peking University in 2010 and his PhD degree from Penn State in 2015. Before joining Peking University, he was an Assistant Professor of Biostatistics at Penn State. His research interests include biostatistics, high-dimensional statistics, compositional data analysis, kernel methods, and next-generation sequencing data analysis.

Abstract

Compositional data are common in many disciplines of modern data science (e.g., sequence count data in biological and biomedical research). Unfortunately, traditional statistical methods that do not address compositionality can lead to suboptimal or even misleading results.

In this talk, we first discuss measurement error issues in compositional data. Covariate measurement errors pose major challenges for existing errors-in-variables regression methods, since measurement error in one component affects the other components of the composition. To simultaneously address the compositional nature of, and the measurement errors in, high-dimensional compositional covariates, we propose a new method named ERror-In-Composition (Eric) Lasso for regression analysis with corrupted compositional predictors. Estimation error bounds for Eric Lasso and its asymptotic sign-consistent selection properties are established.
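
As a small numerical illustration of why an error in one component propagates to the others, the sketch below renormalizes a hypothetical three-component count vector after corrupting a single count. The counts and the `closure` helper are made up for illustration and are not part of the Eric Lasso method.

```python
def closure(counts):
    """Rescale non-negative counts so they sum to 1 (a composition)."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical counts: only the first component is mis-measured.
true_counts = [10, 30, 60]
observed_counts = [20, 30, 60]

true_comp = closure(true_counts)
observed_comp = closure(observed_counts)

# Change in each relative abundance caused by the single corrupted count.
shifts = [obs - true for obs, true in zip(observed_comp, true_comp)]

for name, shift in zip(["component 1", "component 2", "component 3"], shifts):
    print(f"{name}: shift in proportion = {shift:+.4f}")
```

Although only the first count was changed, all three proportions move, because the components are tied together by the unit-sum constraint.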

The second part of this talk is about composition-on-composition regression. When both responses and predictors are compositional, the inventory of statistical analysis tools is surprisingly limited. To fill this gap, we propose a high-dimensional Composition-On-Composition (COC) regression analysis, which does not require log-ratio transformations and hence can handle the excessive zeros found in sequence count data. We first introduce a penalized estimating equation approach in COC to improve estimation accuracy in high-dimensional settings, and then establish inference procedures to quantify uncertainty in COC model estimation and prediction. The proposed methods are evaluated using both numerical simulations and real data applications to demonstrate their validity and advantages.
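
To see why zeros are a problem for log-ratio methods, here is a minimal sketch of the centered log-ratio (clr) transform, one common log-ratio transformation. The helper names and example compositions are illustrative assumptions; the COC approach itself is not shown.

```python
import math


def clr(composition):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def try_clr(composition):
    """Return the clr transform, or None if any part is zero (log undefined)."""
    try:
        return clr(composition)
    except ValueError:
        # math.log raises ValueError for an input of 0.
        return None


positive_result = try_clr([0.2, 0.3, 0.5])   # all parts strictly positive
zero_result = try_clr([0.0, 0.4, 0.6])       # contains a zero part

print("clr without zeros:", positive_result)
print("clr with a zero:", zero_result)
```

The transform is well defined only when every part is strictly positive, so data with many zeros need either zero replacement or a method that avoids log-ratios altogether.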

Date: March 1, 2024
