Speaker
Dr. Hongzhe Li is Perelman Professor of Biostatistics, Epidemiology and Informatics at the Perelman School of Medicine at the University of Pennsylvania. He is Vice Chair of Research Integration, Director of the Center of Statistics in Big Data and former Chair of the Graduate Program in Biostatistics at Penn. He is also a Professor of Statistics and Data Science at the Wharton School. Dr. Li has been elected a Fellow of the American Statistical Association (ASA), a Fellow of the Institute of Mathematical Statistics (IMS) and a Fellow of the American Association for the Advancement of Science (AAAS). Dr. Li served on the Board of Scientific Counselors of the National Cancer Institute of NIH and regularly serves on various NIH study sections. He served as Chair of the Section on Statistics in Genomics and Genetics of the ASA and as Co-Editor-in-Chief of Statistics in Biosciences. Dr. Li’s research focuses on developing statistical and computational methods for the analysis of large-scale genetic, genomic and metagenomic data and on the theory of high-dimensional statistics. He has published over 240 papers, including papers in Science, Nature, Nature Genetics, Nature Methods, Nature Microbiology, Science Translational Medicine, Cell Host & Microbe, JASA, JRSS, Biometrika, Biometrics and Annals of Applied Statistics. He has trained over 50 PhD students and postdoctoral fellows.
Abstract
The gut microbiome plays an important role in the maintenance of human health. High-throughput shotgun metagenomic sequencing of a large set of samples provides an important tool to interrogate the gut microbiome. Besides providing footprints of taxonomic community composition and genes, these data can be further explored to study bacterial growth rates and metabolic potential via the generation of small molecules and secondary metabolites. Everything from microbiome diagnosis to microbiome-based therapy will rely on the analysis of vast amounts of data. In this talk, I will present several computational and statistical methods for the analysis of data measured on a phylogenetic tree and methods for estimating bacterial growth rates for metagenome-assembled genomes (MAGs). I will also present a deep learning algorithm for predicting all biosynthetic gene clusters (BGCs) in bacterial genomes. The key statistical and computational tools used include Wasserstein distance estimation, optimal permutation recovery based on low-rank matrix projection and an LSTM deep learning method to improve prediction of BGCs. I will demonstrate the application of these methods using several ongoing microbiome studies of inflammatory bowel disease at the University of Pennsylvania.