Synchronization-based scalable subspace clustering of high-dimensional data

Abstract

How can the challenges of the "curse of dimensionality" and "scalability" in clustering be addressed simultaneously? In this paper, we propose arbitrarily oriented synchronized clusters (ORSC), a novel, effective, and efficient method for subspace clustering inspired by synchronization. Synchronization is a basic phenomenon prevalent in nature, capable of controlling even highly complex processes such as opinion formation in a group. This control is achieved by simple operations based on interactions between objects. Relying on a weighted interaction model and iterative dynamic clustering, ORSC (a) naturally detects correlation clusters in arbitrarily oriented subspaces, including arbitrarily shaped nonlinear correlation clusters, and (b) is robust against noise and outliers. In contrast to previous methods, ORSC is (c) easy to parameterize, since there is no need to specify the subspace dimensionality or other difficult parameters; instead, all interesting subspaces are detected fully automatically. Finally, (d) ORSC outperforms most comparison methods in terms of runtime efficiency and is highly scalable to large and high-dimensional data sets. Extensive experiments demonstrate the effectiveness and efficiency of our approach.

Authors
  • Shao, Junming
  • Wang, Xinzuo
  • Yang, Qinli
  • Plant, Claudia
  • Böhm, Christian
Shortfacts
Category: Journal Paper
Divisions: Data Mining and Machine Learning
Journal or Publication Title: Knowledge and Information Systems
ISSN: 0219-1377
Page Range: pp. 1-29
Date: 2016
Official URL: http://dx.doi.org/10.1007/s10115-016-1013-1