
Statistics Seminar | Learning Topic Models: Identifiability and Finite-Sample Analysis

Source: 06-09

Time: Monday, June 10, 2024, 10:00-11:00 am

Venue: Lecture Hall C546, Shuangqing Complex Building A, Tsinghua University

Organizers: Yuhong Yang, Fan Yang

Speaker: Feng Liang (梁枫), University of Illinois Urbana-Champaign (UIUC)

Abstract:

Topic models provide a useful text-mining tool for learning, extracting, and discovering latent structures in large text corpora. Although a plethora of methods have been proposed for topic modeling, a formal theoretical investigation of the statistical identifiability and accuracy of latent topic estimation is lacking in the literature. In this work, we propose a maximum likelihood estimator (MLE) of latent topics based on a specific integrated likelihood, which is naturally connected to the concept of volume minimization in computational geometry. Theoretically, we introduce a new set of geometric conditions for topic model identifiability, which are weaker than conventional separability conditions that rely on the existence of anchor words or pure topic documents. We conduct a finite-sample error analysis for the proposed estimator and discuss the connection between our results and existing ones. We conclude with empirical studies on both simulated and real datasets. This talk is based on joint work with Yinyin Chen, Shishuang He, and Yun Yang.
