Consensus Similarity Measure for Short Text Clustering

Information

Title Consensus Similarity Measure for Short Text Clustering
Authors Youhyun Shin, Yeonchan Ahn, Heesik Jeon, Sang-goo Lee
Year 2015 / 09
Keywords short text, clustering, semantic similarity
Acknowledgement Samsung
Publication Type International Workshop
Publication 12th International Workshop on Text-based Information Retrieval (TIR 2015), in conjunction with DEXA 2015

Abstract

Measuring semantic similarity between short texts is challenging because, given their limited length, changing even a few words can alter their meaning dramatically. In this paper, we propose a novel term similarity measure that yields better clustering performance than a state-of-the-art method. To achieve this, we combine knowledge-based and corpus-based term similarity measures to exploit the advantages of both approaches. We apply our method to a dialog-utterance dataset consisting of short dialog texts. An empirical study shows that the proposed method outperforms one of the state-of-the-art algorithms for short text clustering.