Synchronization-based scalable subspace clustering of high-dimensional data
How can the challenges of the "curse of dimensionality" and "scalability" in clustering be addressed simultaneously? In this paper, we propose arbitrarily oriented synchronized clusters (ORSC), a novel, effective, and efficient method for subspace clustering inspired by synchronization. Synchronization is a basic phenomenon prevalent in nature, capable of controlling even highly complex processes such as opinion formation in a group. Control of complex processes is achieved by simple operations based on interactions between objects. Relying on the weighted interaction model and iterative dynamic clustering, our approach ORSC (a) naturally detects correlation clusters in arbitrarily oriented subspaces, including arbitrarily shaped nonlinear correlation clusters. Our approach is (b) robust against noise and outliers. In contrast to previous methods, ORSC is (c) easy to parameterize, since there is no need to specify the subspace dimensionality or other difficult parameters. Instead, all interesting subspaces are detected in a fully automatic way. Finally, (d) ORSC outperforms most comparison methods in terms of runtime efficiency and is highly scalable to large and high-dimensional data sets. Extensive experiments have demonstrated the effectiveness and efficiency of our approach.
- Shao, Junming
- Wang, Xinzuo
- Yang, Qinli
- Plant, Claudia
- Böhm, Christian
Category: Journal Paper
Divisions: Data Mining and Machine Learning
Journal or Publication Title: Knowledge and Information Systems
ISSN: 0219-1377
Page Range: pp. 1-29
Date: 2016
Official URL: http://dx.doi.org/10.1007/s10115-016-1013-1