A Fast K-Nearest Neighbor Search Using Query-Specific Signature Selection
k-nearest neighbor (k-NN) search aims at finding k points nearest to a query point in a given dataset. k-NN search is important in various applications, but it becomes extremely expensive in a high-dimensional large dataset. To address this performance issue, locality-sensitive hashing (LSH) is suggested as a method of probabilistic dimension reduction while preserving the relative distances between points. However, the performance of existing LSH schemes is still inconsistent, requiring a large amount of search time in some datasets while the k-NN approximation accuracy is low. In this paper, we target on improving the performance of k-NN search and achieving a consistent k-NN search that performs well in various datasets. In this regard, we propose a novel LSH scheme called Signature Selection LSH (S2LSH). First, we generate a highly diversified signature pool containing signature regions of various sizes and shapes. Then, for a given query point, we rank signature regions of the query and select points in the highly ranked signature regions as k-NN candidates of the query. Extensive experiments show that our approach consistently outperforms the state-of-the-art LSH schemes.