Clustering for Unsupervised Learning

Clustering algorithms have been widely studied in many scientific areas, such as data mining, knowledge discovery, bioinformatics and machine learning. Cluster analysis, an essential technique for unsupervised learning, aims to find the underlying structure of a dataset according to given clustering criteria and the specific properties of the input data. We propose the seed-and-extension-based density peaks (SDP) algorithm, which incorporates a new center selection strategy into the well-known density peaks (DP) algorithm [Rodriguez and Laio, Science, 344(6191): 1492-1496, 2014]. SDP selects centers that hold the features of their clusters while building a spanning forest, and at the same time constructs the output clusters in a seed-and-extension manner. SDP is more accurate than existing clustering approaches on a variety of dataset types, including time-series data. We also build an Automated Machine Learning (AutoML) framework for clustering, which automatically chooses a suitable algorithm and feature preprocessing steps for a new dataset and sets their respective hyperparameters. We believe that SDP and the AutoML framework will be helpful to unsupervised learning as well as to many real applications.
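For readers unfamiliar with the baseline, here is a minimal sketch of the original DP algorithm of Rodriguez and Laio that SDP builds on. It is not SDP itself: the center selection and seed-and-extension steps described above are not reproduced. The function name `density_peaks`, the parameters `d_c` and `n_clusters`, and the choice to pick centers by the largest product of density and distance (rather than by inspecting a decision graph) are assumptions made for illustration.

```python
import math

def density_peaks(points, d_c, n_clusters):
    """Baseline density peaks clustering (illustrative sketch, not SDP).

    points: list of coordinate tuples; d_c: cutoff distance (assumed given);
    n_clusters: number of centers to keep (assumed given).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank = {i: r for r, i in enumerate(order)}

    # delta_i: distance to the nearest point that comes earlier in `order`
    # (i.e. has higher density); parent_i remembers which point that is.
    delta = [0.0] * n
    parent = [-1] * n
    for r, i in enumerate(order):
        if r == 0:
            # The densest point has no denser neighbor; use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:r], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centers: points with the largest rho * delta. The densest point always
    # ranks first here, so every parent chain ends at a labeled center.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], rank[i]))
    centers = centers[:n_clusters]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Every other point inherits the label of its denser nearest neighbor,
    # which has already been labeled because we follow `order`.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers

# Toy usage: two well-separated groups of five points each.
toy = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
toy_labels, toy_centers = density_peaks(toy, d_c=1.0, n_clusters=2)
```

On the toy data, the middle point of each group is selected as a center and the remaining points follow their denser neighbors. The `parent` links form a tree over the points, which is the structure the baseline uses to propagate labels; how SDP builds and uses its spanning forest is described in the full work, not in this sketch.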
