This site is deprecated and no longer maintained. Please visit the new site for up-to-date information.

This site is deprecated and no longer maintained. Please visit the new site for up-to-date information.

A novel algorithm for scalable k-nearest neighbour graph construction

From IDSlab

Jump to: navigation, search
Recent | By Year | By Topic | Only SCI or SCIE | In Journal | In Conference | Tabular Form | Search
Title A Novel Algorithm for Scalable k-Nearest Neighbour Graph Construction
Authors

Youngki Park, Heasoo Hwang, Sang-goo Lee

Date 2016-04
Keywords collaborative filtering, k-nearest neighbour search, k-nearest neighbour, graph construction
Acknowledgement NRF(20110030812)
Publication Type International Journal
Publication Info Journal of Information Science , Volume 42(2) , Page 274-288
Conference Info
Publisher SAGE journals
SCIE SCIE (IF:1.158)
Other Information ISBN:
ISSN:
Link URL
Download
Related Research
Related Project


Abstract (Korean)



Abstract (English)
Finding the k-nearest neighbours of every node in a dataset is one of the most important data operations with wide application in various areas such as recommendation and information retrieval. However, a major challenge is that the execution time of existing approaches grows rapidly as the number of nodes or dimensions increases. In this paper, we present greedy filtering, an efficient and scalable algorithm for finding an approximate k-nearest neighbour graph. It selects a fixed number of nodes as candidates for every node by filtering out node pairs that do not have any matching dimensions with large values. Greedy filtering achieves consistent approximation accuracy across nodes in linear execution time. We also present a faster version of greedy filtering that uses inverted indices on the node prefixes. Through theoretical analysis, we show that greedy filtering is effective for datasets whose features have Zipfian distribution, a characteristic observed in majority of large datasets. We also conduct extensive comparative experiments against (a) three state-of-the-art algorithms, and (b) three algorithms in related research domains. Our experimental results show that greedy filtering consistently outperforms other algorithms in various types of high-dimensional datasets.