Consensus Similarity Measure for Short Text Clustering
Measuring semantic similarity between short texts is challenging: because short texts contain so few words, changing even a few of them can alter their meaning dramatically. In this paper, we propose a novel similarity measure for terms that yields better clustering performance than a state-of-the-art method. To achieve this, we incorporate both knowledge-based and corpus-based term similarity measures in order to exploit the advantages of each approach. We apply our method to a dialog-utterance dataset consisting of short dialog texts. An empirical study shows that the proposed method outperforms one of the state-of-the-art clustering algorithms for short text clustering.
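The abstract does not specify how the knowledge-based and corpus-based measures are combined. As a rough illustration only, the sketch below assumes three things: a tiny hypothetical taxonomy `TAXONOMY` that stands in for a lexical knowledge base such as WordNet, co-occurrence vectors built from a hypothetical toy corpus `CORPUS`, and a plain weighted average with a hypothetical weight `alpha`. It is not the paper's consensus measure. It only shows how a path-based score and a distributional score for the same term pair can be merged into one value.

```python
import math
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); a stand-in for a knowledge base like WordNet.
TAXONOMY = {
    "dog": "animal", "cat": "animal", "animal": "entity",
    "car": "vehicle", "bus": "vehicle", "vehicle": "entity",
}

# Hypothetical toy corpus used for the corpus-based (distributional) measure.
CORPUS = ["dog chased cat", "cat chased dog", "car passed bus", "bus passed car"]


def ancestors(term):
    """Return the path from `term` up to the taxonomy root, starting with `term` itself."""
    path = [term]
    while path[-1] in TAXONOMY:
        path.append(TAXONOMY[path[-1]])
    return path


def knowledge_sim(a, b):
    """Knowledge-based score: 1 / (1 + path length) through the lowest common ancestor."""
    pa, pb = ancestors(a), ancestors(b)
    common = [node for node in pa if node in pb]
    if not common:
        return 0.0  # no shared ancestor, e.g. a term missing from the taxonomy
    lca = common[0]
    return 1.0 / (1.0 + pa.index(lca) + pb.index(lca))


def cooccurrence_vectors(corpus):
    """For each term, count the other terms that appear in the same sentence."""
    vecs = {}
    for sentence in corpus:
        words = sentence.split()
        for w in words:
            vec = vecs.setdefault(w, Counter())
            for c in words:
                if c != w:
                    vec[c] += 1
    return vecs


def corpus_sim(a, b, vecs):
    """Corpus-based score: cosine similarity of the two terms' co-occurrence vectors."""
    va, vb = vecs.get(a, Counter()), vecs.get(b, Counter())
    dot = sum(va[k] * vb[k] for k in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def combined_sim(a, b, vecs, alpha=0.5):
    """Hypothetical combination: weighted average of the two scores (not the paper's formula)."""
    return alpha * knowledge_sim(a, b) + (1 - alpha) * corpus_sim(a, b, vecs)


VECS = cooccurrence_vectors(CORPUS)
```

With these toy inputs, `combined_sim("dog", "cat", VECS)` scores higher than `combined_sim("dog", "car", VECS)`, because both component measures agree that the first pair is closer. A plain average is just the simplest choice; a real system might tune `alpha` or use a different aggregation altogether.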