Statistics Seminar | Learning Topic Models: Identifiability and Finite-Sample Analysis

Time: Monday, June 10, 2024, 10:00-11:00 am

Venue: Lecture Hall C546, Shuangqing Complex Building A, Tsinghua University

Organizers: Yuhong Yang, Fan Yang

Speaker: Feng Liang (梁枫), University of Illinois Urbana-Champaign (UIUC)


Topic models provide a useful text-mining tool for learning, extracting, and discovering latent structures in large text corpora. Although a plethora of methods have been proposed for topic modeling, a formal theoretical investigation of the statistical identifiability and accuracy of latent topic estimation is lacking in the literature. In this work, we propose a maximum likelihood estimator (MLE) of latent topics based on a specific integrated likelihood, which is naturally connected to the concept of volume minimization in computational geometry. On the theoretical side, we introduce a new set of geometric conditions for topic model identifiability, which are weaker than conventional separability conditions that rely on the existence of anchor words or pure-topic documents. We conduct a finite-sample error analysis for the proposed estimator and discuss how our results connect with existing ones. We conclude with empirical studies on both simulated and real datasets. This talk is based on joint work with Yinyin Chen, Shishuang He, and Yun Yang.

Date: June 9, 2024