Partition-Based Clustering with Sliding Windows for Data Streams
Information
Abstract
Data stream clustering with sliding windows generates clusters for every window movement. Because repeated clustering on all changed windows is highly inefficient in terms of memory and computation time, a clustering algorithm should be designed with considering only inserted and deleted tuples of windows. In this paper, we address this problem by sliding window aggregation technique and cluster modification strategy. We propose a novel data structure for construction and maintenance of 2-level synopses. This data structure enables to update synopses efficiently and support precise sliding window operations. We also suggest a modification strategy to decide whether to append new synopses to pre-existing clusters or perform clustering on whole synopses according to the difference between probability distributions of the original and updated clusters. Experimental results show that proposed method outperforms state-of-the-art methods.