
Statistical methods for compositional data analysis

Source: 03-01

Time: Friday, March 1st, 2024, 14:00

Venue: Lecture Hall B725, Building A, Shuangqing Complex, Tsinghua University

Zoom: 4552601552, YMSC

Speaker: Xiang Zhan (占翔), Peking University


Xiang Zhan is an Associate Professor in the Department of Biostatistics and at the Beijing International Center for Mathematical Research, Peking University. He received his BS degree from Peking University in 2010 and his PhD from Penn State in 2015. Before joining Peking University, he was an Assistant Professor of Biostatistics at Penn State. His research interests include biostatistics, high-dimensional statistics, compositional data analysis, kernel methods, and next-generation sequencing data analysis.


Abstract

Compositional data are common across many disciplines in modern data science (e.g., sequence count data in biological and biomedical research). Unfortunately, traditional statistical methods that do not account for compositionality can lead to suboptimal or even misleading results.
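One concrete reason standard methods can mislead: once counts are rescaled to proportions that sum to 1, the components are no longer free to vary independently. For any composition, the covariances of one part with all parts (itself included) sum to zero, so a part's covariances with the other parts must add up to minus its variance and cannot all be positive. The minimal sketch below demonstrates this with simulated, mutually independent counts. The data, the function names `closure` and `covariance`, and the choice of three parts are illustrative assumptions, not material from the talk.

```python
import random

def closure(counts):
    """Rescale nonnegative counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

random.seed(0)
# Simulated raw counts: three mutually independent parts per sample.
samples = [[random.randint(1, 100) for _ in range(3)] for _ in range(500)]
props = [closure(s) for s in samples]
cols = list(zip(*props))  # one tuple of proportions per part

# Row sums of the covariance matrix of the proportions:
# sum_j Cov(x_i, x_j) = Cov(x_i, 1) = 0, up to floating-point error.
row_sums = [sum(covariance(cols[i], cols[j]) for j in range(3)) for i in range(3)]
print(row_sums)

# Consequently part 1's covariances with the other parts equal -Var(part 1) < 0,
# even though the underlying counts were generated independently.
print(covariance(cols[0], cols[1]) + covariance(cols[0], cols[2]))
```

The negative total covariance is an artifact of the sum-to-one constraint, not of any real dependence in the simulated counts.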

In this talk, we first discuss measurement error issues in compositional data. Covariate measurement errors pose great challenges for existing errors-in-variables regression methods, because measurement error in one component affects the other components of the composition. To address both the compositional nature of the data and the measurement errors in high-dimensional compositional covariates, we propose a new method, ERror-In-Composition (Eric) Lasso, for regression analysis with corrupted compositional predictors. We establish estimation error bounds for Eric Lasso, as well as its asymptotic sign-consistent selection properties.
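Why does an error in one component affect the others? Proportions are obtained by dividing each count by the sample total, so mismeasuring a single count changes the total and therefore every proportion. The short sketch below illustrates this with a hypothetical three-part sample in which only the first count is corrupted. It shows the propagation effect only and is not an implementation of Eric Lasso.

```python
def closure(counts):
    """Rescale nonnegative counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

true_counts = [10, 20, 70]
observed_counts = [30, 20, 70]  # hypothetical error: only part 1 is mismeasured

true_p = closure(true_counts)
obs_p = closure(observed_counts)

for j, (t, o) in enumerate(zip(true_p, obs_p), start=1):
    print(f"part {j}: true {t:.3f}, observed {o:.3f}")
# part 1: true 0.100, observed 0.250
# part 2: true 0.200, observed 0.167
# part 3: true 0.700, observed 0.583

# The ratio between the two uncorrupted parts is unchanged (70 / 20 = 3.5),
# but each of their proportions has shifted.
print(obs_p[2] / obs_p[1], true_p[2] / true_p[1])
```

Parts 2 and 3 were measured without error, yet their observed proportions are biased. This is the coupling that a standard errors-in-variables model, which treats each covariate's error separately, does not capture.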

The second part of this talk concerns composition-on-composition regression. When both the responses and the predictors are compositional, the available statistical tools are surprisingly limited. To fill this gap, we propose a high-dimensional Composition-On-Composition (COC) regression analysis that does not require log-ratio transformations and can therefore handle the excess zeros common in sequence count data. We first introduce a penalized estimating equation approach in COC to improve estimation accuracy in high-dimensional settings, and then establish inference procedures to quantify uncertainty in COC model estimation and prediction. The proposed methods are evaluated through numerical simulations and real data applications, which demonstrate their validity and superiority.
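The link between log-ratio transformations and zeros can be seen directly: log-ratios take logarithms of the parts, and the logarithm of zero is undefined. The sketch below uses the centered log-ratio (clr) transform as a representative log-ratio method. This choice is an assumption for illustration. The clr transform is not part of the COC method, and the function name `clr` is introduced here for the example only.

```python
import math

def clr(composition):
    """Centered log-ratio: log of each part minus the mean log of all parts."""
    logs = [math.log(p) for p in composition]  # raises ValueError if any p == 0
    mean_log = sum(logs) / len(logs)
    return [lg - mean_log for lg in logs]

# A strictly positive composition transforms without trouble;
# clr coordinates always sum to zero.
print(clr([0.2, 0.3, 0.5]))

# A single zero part (e.g., a taxon absent from a sample) breaks the transform.
try:
    clr([0.0, 0.4, 0.6])
    zero_handled = True
except ValueError as err:  # math.log(0.0) raises "math domain error"
    zero_handled = False
    print("clr is undefined for zero parts:", err)
```

In practice, log-ratio analyses work around this by replacing zeros with small pseudo-counts, a choice that can affect the results when zeros are abundant. A method that avoids log-ratios altogether sidesteps that step.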

