
Deep Clustering Evaluation: How to Validate Internal Clustering Validation Measures

Time: 14:00-15:00, Friday, Dec. 26, 2025

Venue: C548, Shuangqing Complex Building A

Organizer: Yunan Wu (吴宇楠)

Speaker: Chenglong Ye (叶成龙), Assistant Professor, University of Kentucky


Abstract

Deep clustering uses deep neural networks to partition complex, high-dimensional data. It projects the data into lower-dimensional embeddings before partitioning, which poses unique evaluation challenges. Traditional clustering validation measures, designed for low-dimensional spaces, are problematic for deep clustering for two reasons: (1) the curse of dimensionality when they are applied to the high-dimensional input data, and (2) unreliable comparisons of clustering results when they are applied to embedded data from different embedding spaces, which vary with training procedures and model parameter settings. This paper addresses these unresolved and often overlooked challenges in evaluating clustering within deep learning. We propose a systematic evaluation framework for internal clustering validation measures that: (1) theoretically establishes why traditional measures are ineffective when applied to the input data or across disparate embedding spaces paired with their partitioning outcomes; (2) identifies embedding spaces that support reliable evaluation by detecting groups with high agreement in how they rank partitioning outcomes; and (3) develops a stable and robust scoring scheme by weighting the index values computed across these identified embedding spaces. Experiments show that the new framework aligns better with external measures, reducing the misleading conclusions that arise from improper use of internal validation measures in deep clustering evaluation.
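The abstract does not specify which internal index, agreement statistic, or weighting rule the framework uses, so the following is only a minimal sketch of the general idea (steps 2 and 3) under stated assumptions. The silhouette coefficient stands in for an internal validation measure, Spearman rank correlation measures how consistently two embedding spaces rank the same partitions, and the greedy grouping rule, the `threshold=0.5` value, and the agreement-based weights are hypothetical choices. The function names `silhouette`, `spearman`, and `consensus_scores` are illustrative, not the speaker's implementation.

```python
import numpy as np


def silhouette(X, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 clusters)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() <= 1:
            continue  # singleton clusters score 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance, excluding the point itself
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(s.mean())


def spearman(u, v):
    """Spearman rank correlation, using average ranks for ties."""
    def ranks(x):
        x = np.asarray(x, dtype=float)
        r = np.empty(len(x))
        r[np.argsort(x, kind="mergesort")] = np.arange(len(x), dtype=float)
        for val in np.unique(x):
            r[x == val] = r[x == val].mean()
        return r
    ru, rv = ranks(u), ranks(v)
    ru, rv = ru - ru.mean(), rv - rv.mean()
    denom = np.sqrt((ru ** 2).sum() * (rv ** 2).sum())
    return 0.0 if denom == 0 else float((ru * rv).sum() / denom)


def consensus_scores(embeddings, partitions, threshold=0.5):
    """Hypothetical sketch: score every partition in every embedding space,
    keep a group of spaces that agree on the ranking, and weight their scores."""
    # S[k, j] = internal index of partition j computed in embedding space k
    S = np.array([[silhouette(E, p) for p in partitions] for E in embeddings])
    K = len(embeddings)
    # A[k, l] = agreement between spaces k and l in ranking all partitions
    A = np.array([[spearman(S[k], S[l]) for l in range(K)] for k in range(K)])
    # Assumed grouping rule: seed with the most-agreeing pair, then add every
    # space whose agreement with all current members reaches the threshold.
    off = A.copy()
    np.fill_diagonal(off, -np.inf)
    i, j = np.unravel_index(np.argmax(off), off.shape)
    group = [int(i), int(j)]
    for k in range(K):
        if k not in group and all(A[k, g] >= threshold for g in group):
            group.append(k)
    # Assumed weighting: mean agreement with the other group members (>= 0).
    w = np.zeros(K)
    for k in group:
        others = [g for g in group if g != k]
        w[k] = max(np.mean(A[k, others]), 0.0)
    w = w / w.sum() if w.sum() > 0 else np.isin(np.arange(K), group) / len(group)
    return w @ S, w, sorted(group)


# Toy usage: three well-separated blobs and three candidate partitions.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
true = np.repeat([0, 1, 2], 20)
X = centers[true] + 0.5 * rng.standard_normal((60, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
rand = rng.integers(0, 3, size=60)
embeddings = [
    X,                                                   # space 1
    3.0 * X @ R + 5.0,                                   # space 2: rotated, rescaled copy of space 1
    centers[rand] + 0.5 * rng.standard_normal((60, 2)),  # space 3: encodes the random labels
]
partitions = [true, np.repeat([0, 0, 1], 20), rand]
scores, weights, group = consensus_scores(embeddings, partitions)
print(group, weights.round(2), scores.round(2))
```

In this toy setup, spaces 1 and 2 rank the partitions identically because the silhouette coefficient is invariant to rotation, translation, and uniform scaling, while space 3 favors the random partition. Space 3 therefore falls outside the high-agreement group and receives zero weight, and the combined score ranks the true partition first. Weighting only the spaces that agree with each other, rather than comparing raw index values across arbitrary spaces, is the design choice this sketch tries to illustrate.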

About the Speaker

Chenglong Ye is currently an assistant professor in the Dr. Bing Zhang Department of Statistics, University of Kentucky.

He received his Ph.D. in Statistics from the University of Minnesota in June 2019 under the supervision of Professor Yuhong Yang. Before his Ph.D. studies, he received his B.S. in Statistics from the University of Science and Technology of China (USTC) in 2014.

Date: December 24, 2025