Greedy Filtering: A Scalable Algorithm For K-Nearest Neighbor Graph Construction

Information

Title Greedy Filtering: A Scalable Algorithm For K-Nearest Neighbor Graph Construction
Authors
Youngki Park, Sungchan Park, Sang-goo Lee, Woosung Jung
Year 2014 / 4
Keywords k-nearest neighbor graph, similarity join, similarity search
Acknowledgement NRF
Publication Type International Conference
Publication Proceedings of the 19th International Conference on Database Systems for Advanced Applications (DASFAA 2014), pp. 327-341
Link url

Abstract

Finding the k-nearest neighbors for every node is one of the most important data mining tasks as a primitive operation in the field of Information Retrieval and Recommender Systems. However, existing approaches to this problem do not perform as well when the number of nodes or dimensions is scaled up. In this paper, we present greedy filtering, an efficient and scalable algorithm for finding an approximate k-nearest neighbor graph by filtering node pairs whose large value dimensions do not match at all. In order to avoid skewness in the results and guarantee a time complexity of O(n), our algorithm chooses essentially a fixed number of node pairs as candidates for every node. We also present a faster version of greedy filtering based on the use of inverted indices for the node prefixes. We conduct extensive experiments in which we (i) compare our approaches to the state-of-the-art algorithms in seven different types of datasets, and (ii) adopt other algorithms in related fields (similarity join, top-k similarity join and similarity search fields) to solve this problem and evaluate them. The experimental results show that greedy filtering guarantees a high level of accuracy while also being much faster than other algorithms for large amounts of high-dimensional data.