AutoSCAN: automatic detection of DBSCAN parameters and efficient clustering of data in overlapping density regions

Abstract

Density-based clustering is considered a robust approach to unsupervised clustering because it can identify outliers, form clusters of irregular shapes, and automatically determine the number of clusters. These properties have made its pioneering algorithm, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), applicable to datasets in which a varying number of clusters of different shapes and sizes can be detected without much intervention from the user. However, the original algorithm has limitations, notably its sensitivity to the user input parameters minPts and ɛ. Additionally, the algorithm assigns inconsistent cluster labels to data objects found in overlapping density regions of separate clusters, which lowers its accuracy. To alleviate these specific problems and increase clustering accuracy, we propose two methods that use statistics from a given dataset’s k-nearest neighbor density distribution to determine the optimal ɛ values. Our approach removes this burden from users and automatically detects the clusters of a given dataset. Furthermore, a method to accurately identify the border objects of separate clusters is proposed and implemented to resolve the unpredictability of the original algorithm. Finally, our experiments show that our efficient re-implementation of the original algorithm, which automatically clusters datasets and improves the clustering quality of adjoining cluster members, provides an increase in clustering accuracy and faster running times compared with earlier approaches.
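To make the idea of deriving ɛ from a k-nearest neighbor distance distribution concrete, the sketch below computes each point's distance to its k-th nearest neighbor and picks a candidate ɛ at the largest jump in the sorted distances. This is a generic illustration only and not the method proposed in the article: the function names `k_distances` and `estimate_eps`, the largest-gap rule, and the toy data are all assumptions made for this example.

```python
import math


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor (itself excluded)."""
    if not 1 <= k <= len(points) - 1:
        raise ValueError("k must be between 1 and len(points) - 1")
    result = []
    for i, p in enumerate(points):
        # Brute-force O(n^2) search; fine for a small illustrative dataset.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def estimate_eps(points, k):
    """Hypothetical heuristic: take the k-distance just before the largest jump.

    Points in dense regions have small k-distances, while outliers have large
    ones, so a large gap in the sorted values is a crude stand-in for the
    'elbow' of a k-distance plot.
    """
    kd = sorted(k_distances(points, k))
    if len(kd) < 2:
        raise ValueError("need at least two points")
    gaps = [kd[i + 1] - kd[i] for i in range(len(kd) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    return kd[cut]


# Toy data (assumed for illustration): two tight groups and one distant point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (50, 50),
]
eps = estimate_eps(points, k=2)
noise_candidates = [p for p, d in zip(points, k_distances(points, 2)) if d > eps]
```

On this toy input the eight grouped points share the same small k-distance, while the isolated point has a much larger one, so the chosen cut separates them. Real datasets rarely show such a clean gap, which is why a more careful statistical treatment of the distribution is needed in practice.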

Author and article information

Contributors

Gangman Yi

Journal

Journal ID (nlm-ta): PeerJ Comput Sci

Journal ID (iso-abbrev): PeerJ Comput Sci

Journal ID (publisher-id): peerj-cs

Title: PeerJ Computer Science

Publisher: PeerJ Inc. (San Diego, USA)

ISSN (Electronic): 2376-5992

Publication date (Electronic): 14 March 2024

Publication date Collection: 2024

Volume: 10

Electronic Location Identifier: e1921

Affiliations

[1] Department of Multimedia Engineering, Dongguk University, Seoul, South Korea

[2] Department of Artificial Intelligence, Dongguk University, Seoul, South Korea

[3] Division of AI Software Convergence, Dongguk University, Seoul, South Korea

Article

Publisher ID: cs-1921

DOI: 10.7717/peerj-cs.1921

PMC ID: 11042006

PubMed ID: 38660211

SO-VID: f3d19a27-eb5a-4f57-9410-10cbb6408255

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.

History

Date received: 24 October 2023

Date accepted: 12 February 2024

Funding

Funded by: The National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT)

Award ID: NRF-2022R1F1A1074228

Funded by: Institute of Information & communications Technology Planning & Evaluation (IITP) under the Artificial Intelligence Convergence Innovation Human Resources Development

Award ID: IITP-2023-RS-2023-00254592

Funded by: The Korean government (MSIT) and the Dongguk University Research Fund of 2023

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. NRF-2022R1F1A1074228), and was also supported by Institute of Information & communications Technology Planning & Evaluation (IITP) under the Artificial Intelligence Convergence Innovation Human Resources Development (IITP-2023-RS-2023-00254592) grant funded by the Korean government (MSIT) and the Dongguk University Research Fund of 2023. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
