A new online clustering approach for data in arbitrary shaped clusters

Richard Hyde, Plamen Angelov

Research output: Chapter in Book/Report/Conference proceeding (Conference proceeding)

3 Citations (Scopus)

Abstract

In this paper we demonstrate a new density-based clustering technique, CODAS, for online clustering of streaming data into arbitrarily shaped clusters. CODAS is a two-stage process that uses a simple local density to initiate micro-clusters, which are then combined into clusters. Memory efficiency is gained by not storing or re-using any data. Computational efficiency is gained by using hyper-spherical micro-clusters, which allow a micro-cluster joining technique that is independent of dimensionality and therefore fast. The micro-clusters divide the data space into sub-spaces, each with a core region and a non-core region. Core regions that intersect define the clusters. A threshold value is used to distinguish outlier micro-clusters from small clusters of unusual data. The cluster information is fully maintained online. In this paper we compare CODAS with ELM, DEC, Chameleon, DBSCAN and DenStream and demonstrate that CODAS achieves comparable results while operating fully online and scaling with data dimensionality.
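The two-stage idea in the abstract (local density seeds hyper-spherical micro-clusters, and micro-clusters whose regions intersect are joined into clusters) can be illustrated with a minimal Python sketch. It is not the authors' CODAS implementation. The class and method names, the fixed radius `radius`, the count threshold `min_count`, the running-mean centre update, and treating each whole radius-r hypersphere as the core region (so the separate non-core region is not modelled) are all assumptions made for illustration. Cluster labels are also recomputed on demand here, whereas the paper describes maintaining them online.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    centre: list[float]  # running mean of the samples absorbed so far
    count: int = 1       # local density: number of samples absorbed


class OnlineMicroClusterer:
    """Hypothetical CODAS-style sketch, not the authors' implementation."""

    def __init__(self, radius: float, min_count: int) -> None:
        self.radius = radius        # assumed fixed micro-cluster radius r
        self.min_count = min_count  # assumed outlier threshold on count
        self.micro: list[MicroCluster] = []

    def update(self, x) -> None:
        """Absorb one streaming sample; the sample itself is not stored."""
        x = [float(v) for v in x]
        inside = [m for m in self.micro if math.dist(m.centre, x) <= self.radius]
        if not inside:
            # No micro-cluster covers x: start a new one centred on it.
            self.micro.append(MicroCluster(centre=x))
            return
        m = min(inside, key=lambda mc: math.dist(mc.centre, x))
        m.count += 1
        # Incremental mean update, so past samples never need revisiting.
        m.centre = [c + (v - c) / m.count for c, v in zip(m.centre, x)]

    def labels(self) -> list[int]:
        """Cluster label per micro-cluster; -1 marks outlier micro-clusters."""
        dense = [i for i, m in enumerate(self.micro) if m.count >= self.min_count]
        parent = {i: i for i in dense}

        def find(i: int) -> int:
            # Union-find root lookup with path halving.
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for a, i in enumerate(dense):
            for j in dense[a + 1:]:
                # Two radius-r hyperspheres intersect iff their centres are
                # closer than 2r: a single distance test in any dimension.
                gap = math.dist(self.micro[i].centre, self.micro[j].centre)
                if gap < 2 * self.radius:
                    parent[find(i)] = find(j)

        roots: dict[int, int] = {}
        return [roots.setdefault(find(i), len(roots)) if i in parent else -1
                for i in range(len(self.micro))]

    def predict(self, x) -> int:
        """Label of the nearest micro-cluster covering x, else -1."""
        x = [float(v) for v in x]
        labels = self.labels()
        best = min(range(len(self.micro)),
                   key=lambda i: math.dist(self.micro[i].centre, x), default=None)
        if best is None or math.dist(self.micro[best].centre, x) > self.radius:
            return -1
        return labels[best]


clusterer = OnlineMicroClusterer(radius=1.0, min_count=2)
stream = [(0, 0), (0, 0), (1.5, 0), (1.5, 0), (3, 0), (3, 0),
          (20, 20), (20, 20), (40, 0)]
for point in stream:
    clusterer.update(point)
print(clusterer.labels())  # [0, 0, 0, 1, -1]
```

In this run the three micro-clusters along the x axis are chained into one elongated cluster even though no single micro-cluster covers it, the pair at (20, 20) forms a second cluster, and the lone sample at (40, 0) stays below `min_count` and is labelled -1. The join test is one Euclidean distance compared with 2r whatever the number of dimensions, although each distance still costs time proportional to the dimension, and the pairwise loop in `labels()` would need an incremental replacement in a genuinely online setting.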

Original language: English
Title of host publication: Proceedings - 2015 IEEE 2nd International Conference on Cybernetics, CYBCONF 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 228-233
Number of pages: 6
ISBN (Electronic): 9781479983223
DOI: https://doi.org/10.1109/CYBConf.2015.7175937
Publication status: Published - 6 Aug 2015
Externally published: Yes
Event: 2nd IEEE International Conference on Cybernetics, CYBCONF 2015 - Gdynia, Poland
Duration: 24 Jun 2015 to 26 Jun 2015

Conference

Conference: 2nd IEEE International Conference on Cybernetics, CYBCONF 2015
Country: Poland
City: Gdynia
Period: 24/06/15 to 26/06/15

Keywords

  • arbitrary shape clusters
  • big data
  • clustering
  • data streams
  • online clustering

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Electrical and Electronic Engineering

Cite this

Hyde, R., & Angelov, P. (2015). A new online clustering approach for data in arbitrary shaped clusters. In Proceedings - 2015 IEEE 2nd International Conference on Cybernetics, CYBCONF 2015 (pp. 228-233). [7175937] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CYBConf.2015.7175937

A new online clustering approach for data in arbitrary shaped clusters. / Hyde, Richard; Angelov, Plamen.

Proceedings - 2015 IEEE 2nd International Conference on Cybernetics, CYBCONF 2015. Institute of Electrical and Electronics Engineers Inc., 2015. p. 228-233. 7175937.

Hyde, R & Angelov, P 2015, A new online clustering approach for data in arbitrary shaped clusters. in Proceedings - 2015 IEEE 2nd International Conference on Cybernetics, CYBCONF 2015, 7175937, Institute of Electrical and Electronics Engineers Inc., pp. 228-233, 2nd IEEE International Conference on Cybernetics, CYBCONF 2015, Gdynia, Poland, 24/06/15. https://doi.org/10.1109/CYBConf.2015.7175937
Hyde R, Angelov P. A new online clustering approach for data in arbitrary shaped clusters. In Proceedings - 2015 IEEE 2nd International Conference on Cybernetics, CYBCONF 2015. Institute of Electrical and Electronics Engineers Inc. 2015. p. 228-233. 7175937 https://doi.org/10.1109/CYBConf.2015.7175937
Hyde, Richard ; Angelov, Plamen. / A new online clustering approach for data in arbitrary shaped clusters. Proceedings - 2015 IEEE 2nd International Conference on Cybernetics, CYBCONF 2015. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 228-233
@inproceedings{67591d8c1db14e3896f4febe170d926a,
title = "A new online clustering approach for data in arbitrary shaped clusters",
abstract = "In this paper we demonstrate a new density-based clustering technique, CODAS, for online clustering of streaming data into arbitrarily shaped clusters. CODAS is a two-stage process that uses a simple local density to initiate micro-clusters, which are then combined into clusters. Memory efficiency is gained by not storing or re-using any data. Computational efficiency is gained by using hyper-spherical micro-clusters, which allow a micro-cluster joining technique that is independent of dimensionality and therefore fast. The micro-clusters divide the data space into sub-spaces, each with a core region and a non-core region. Core regions that intersect define the clusters. A threshold value is used to distinguish outlier micro-clusters from small clusters of unusual data. The cluster information is fully maintained online. In this paper we compare CODAS with ELM, DEC, Chameleon, DBSCAN and DenStream and demonstrate that CODAS achieves comparable results while operating fully online and scaling with data dimensionality.",
keywords = "arbitrary shape clusters, big data, clustering, data streams, online clustering",
author = "Richard Hyde and Plamen Angelov",
year = "2015",
month = "8",
day = "6",
doi = "10.1109/CYBConf.2015.7175937",
language = "English",
pages = "228--233",
booktitle = "Proceedings - 2015 IEEE 2nd International Conference on Cybernetics, CYBCONF 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - A new online clustering approach for data in arbitrary shaped clusters

AU - Hyde, Richard

AU - Angelov, Plamen

PY - 2015/8/6

Y1 - 2015/8/6

N2 - In this paper we demonstrate a new density-based clustering technique, CODAS, for online clustering of streaming data into arbitrarily shaped clusters. CODAS is a two-stage process that uses a simple local density to initiate micro-clusters, which are then combined into clusters. Memory efficiency is gained by not storing or re-using any data. Computational efficiency is gained by using hyper-spherical micro-clusters, which allow a micro-cluster joining technique that is independent of dimensionality and therefore fast. The micro-clusters divide the data space into sub-spaces, each with a core region and a non-core region. Core regions that intersect define the clusters. A threshold value is used to distinguish outlier micro-clusters from small clusters of unusual data. The cluster information is fully maintained online. In this paper we compare CODAS with ELM, DEC, Chameleon, DBSCAN and DenStream and demonstrate that CODAS achieves comparable results while operating fully online and scaling with data dimensionality.

AB - In this paper we demonstrate a new density-based clustering technique, CODAS, for online clustering of streaming data into arbitrarily shaped clusters. CODAS is a two-stage process that uses a simple local density to initiate micro-clusters, which are then combined into clusters. Memory efficiency is gained by not storing or re-using any data. Computational efficiency is gained by using hyper-spherical micro-clusters, which allow a micro-cluster joining technique that is independent of dimensionality and therefore fast. The micro-clusters divide the data space into sub-spaces, each with a core region and a non-core region. Core regions that intersect define the clusters. A threshold value is used to distinguish outlier micro-clusters from small clusters of unusual data. The cluster information is fully maintained online. In this paper we compare CODAS with ELM, DEC, Chameleon, DBSCAN and DenStream and demonstrate that CODAS achieves comparable results while operating fully online and scaling with data dimensionality.

KW - arbitrary shape clusters

KW - big data

KW - clustering

KW - data streams

KW - online clustering

UR - http://www.scopus.com/inward/record.url?scp=84947967804&partnerID=8YFLogxK

U2 - 10.1109/CYBConf.2015.7175937

DO - 10.1109/CYBConf.2015.7175937

M3 - Conference proceeding

SP - 228

EP - 233

BT - Proceedings - 2015 IEEE 2nd International Conference on Cybernetics, CYBCONF 2015

PB - Institute of Electrical and Electronics Engineers Inc.

ER -