Partition-Based Clustering with Sliding Windows for Data Streams

Information

Title Partition-Based Clustering with Sliding Windows for Data Streams
Authors
Jonghem Youn, Jihun Choi, Junho Shim, Sang-goo Lee
Year 2017 / 3
Keywords data streams, clustering, sliding windows, approximation algorithms
Acknowledgement ITRC
Publication Type International Conference
Publication Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, Volume 10178, pp. 289-303
Link doi

Abstract

Data stream clustering with sliding windows generates clusters for every window movement. Because repeated clustering on all changed windows is highly inefficient in terms of memory and computation time, a clustering algorithm should be designed with considering only inserted and deleted tuples of windows. In this paper, we address this problem by sliding window aggregation technique and cluster modification strategy. We propose a novel data structure for construction and maintenance of 2-level synopses. This data structure enables to update synopses efficiently and support precise sliding window operations. We also suggest a modification strategy to decide whether to append new synopses to pre-existing clusters or perform clustering on whole synopses according to the difference between probability distributions of the original and updated clusters. Experimental results show that proposed method outperforms state-of-the-art methods.