Exploiting Synonymy To Measure Semantic Similarity Of Sentences
Semantic similarity measures between sentences are increasingly important in text mining, text clustering, and question answering. Many studies have relied on exact term matching to predict sentence similarity. In this paper, we present a method for measuring the semantic similarity of sentences based on a constructed synonymy graph, so that matching is not limited to identical terms. We construct a graph with terms as nodes and synonymy relations as edges, using WordNet and part-of-speech information to identify synonyms. We assume that a synonym of a synonym is also similar, by analogy with the real-world observation that a friend of a friend is likely to be a friend as well. Following this idea, the similarity between two words is estimated from the minimum number of synonym links (the shortest synonym chain) connecting their nodes. The proposed algorithm computes the similarity of two sentences by summing the similarities between selected words in the two sentences. Evaluation is conducted on two datasets: the Microsoft Research Paraphrase Corpus and the Yelp review dataset. Experimental results show that 1) the proposed method is more accurate than existing sentence similarity measures, and 2) results on a real-world dataset such as Yelp suggest that the proposed method could be applied to recommendation.
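The following is a minimal sketch of the idea, assuming several details the abstract does not specify. A small hand-written list of synonym sets (`SYNSETS`) stands in for WordNet, and part-of-speech filtering is omitted. Word similarity decays as 1 / (1 + chain length), which is an assumed formula. The "selected words" are taken to be, for each word of the first sentence, its best-matching word in the second sentence, which is also an assumed rule. All function names are hypothetical illustrations, not the authors' implementation.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy synonym sets standing in for WordNet synsets (assumption).
SYNSETS = [
    {"car", "automobile", "auto"},
    {"auto", "motorcar"},
    {"big", "large"},
    {"large", "huge"},
    {"fast", "quick"},
]


def build_graph(synsets):
    """Undirected graph: words are nodes, a synonymy relation is an edge."""
    graph = {}
    for synset in synsets:
        for word in synset:
            graph.setdefault(word, set())
        # Every pair of words in the same synset is linked by an edge.
        for a, b in combinations(sorted(synset), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def chain_length(graph, src, dst):
    """Minimum number of synonym links between two words (BFS), or None if unreachable."""
    if src == dst:
        return 0
    if src not in graph or dst not in graph:
        return None
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph[node]:
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def word_similarity(graph, a, b):
    """Assumed decay: 1 / (1 + chain length); 0.0 when no synonym chain exists."""
    d = chain_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1 + d)


def sentence_similarity(graph, s1, s2):
    """Sum over words of s1 of their best word similarity in s2 (assumed selection rule)."""
    words1, words2 = s1.lower().split(), s2.lower().split()
    return sum(
        max((word_similarity(graph, w, v) for v in words2), default=0.0)
        for w in words1
    )
```

For example, with `g = build_graph(SYNSETS)`, "car" reaches "motorcar" through "auto" in two links. `sentence_similarity(g, "big car", "huge automobile")` therefore sums 1/3 (big to huge, two links) and 1/2 (car to automobile, one link). Breadth-first search is used because every synonym edge has equal weight, so the first time the target is reached gives the minimum chain length.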