Consensus Similarity Measure for Short Text Clustering
Information
Title
Consensus Similarity Measure for Short Text Clustering
Authors
Youhyun Shin, Yeonchan Ahn, Heesik Jeon, Sang-goo Lee
Year
2015 / 09
Keywords
short text, clustering, semantic similarity
Acknowledgement
Samsung
Publication Type
International Workshop
Publication
12th International Workshop on Text-based Information Retrieval (TIR 2015), in conjunction with DEXA 2015
Abstract
Measuring semantic similarity between short texts is challenging because, given their limited length, changing even a few words can alter their meaning dramatically. In this paper, we propose a novel similarity measure for terms that yields better clustering performance than a state-of-the-art method. To achieve this, we combine knowledge-based and corpus-based term similarity measures in order to exploit the advantages of both approaches. We apply our method to a dialog-utterance dataset, which consists of short dialog texts. An empirical study shows that the proposed method outperforms one of the state-of-the-art clustering algorithms for short text clustering.
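The abstract does not specify how the knowledge-based and corpus-based term similarities are combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple weighted average of the two term scores (the weight `alpha` is hypothetical), uses toy lookup tables in place of a real thesaurus and corpus, and lifts term similarity to short texts with a symmetric best-match average, which is also an assumption.

```python
import math

# Hypothetical knowledge-based scores, standing in for values that might be
# derived from a thesaurus or ontology. Toy values for illustration only.
KNOWLEDGE_SIM = {
    frozenset(("movie", "film")): 0.9,
    frozenset(("play", "start")): 0.6,
}

# Hypothetical corpus-based term vectors (toy co-occurrence counts).
CORPUS_VECTORS = {
    "movie": [3.0, 1.0, 0.0],
    "film": [2.0, 1.0, 0.0],
    "play": [0.0, 2.0, 1.0],
    "start": [0.0, 1.0, 2.0],
}


def knowledge_sim(a, b):
    """Look up a knowledge-based score; unknown pairs score 0."""
    if a == b:
        return 1.0
    return KNOWLEDGE_SIM.get(frozenset((a, b)), 0.0)


def corpus_sim(a, b):
    """Cosine similarity between toy corpus vectors; unknown terms score 0."""
    if a == b:
        return 1.0
    u, v = CORPUS_VECTORS.get(a), CORPUS_VECTORS.get(b)
    if u is None or v is None:
        return 0.0
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def combined_term_sim(a, b, alpha=0.5):
    """Assumed combination: weighted average of the two term similarities."""
    return alpha * knowledge_sim(a, b) + (1.0 - alpha) * corpus_sim(a, b)


def text_sim(s, t, alpha=0.5):
    """Assumed text-level measure: symmetric average of best term matches."""
    s_terms, t_terms = s.lower().split(), t.lower().split()
    if not s_terms or not t_terms:
        return 0.0

    def directed(src, dst):
        # For each source term, keep its best match among the target terms.
        best = (max(combined_term_sim(a, b, alpha) for b in dst) for a in src)
        return sum(best) / len(src)

    return 0.5 * (directed(s_terms, t_terms) + directed(t_terms, s_terms))
```

A pairwise matrix of `text_sim` values over a set of utterances could then be passed to any clustering algorithm that accepts precomputed similarities. The intent of mixing the two sources is that a pair such as "movie" and "film" can still score highly when one source lacks evidence for it.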