Abstract
This paper describes a clustering method for unsupervised classification of objects in large data sets. The new methodology combines the mixture likelihood approach with a sampling and subsampling strategy in order to cluster large data sets efficiently. This sampling strategy can be applied to a large variety of data mining methods to allow them to be used on very large data sets. The method is applied to the problem of automated star/galaxy classification for digital sky data and is tested using a sample from the Digitized Palomar Sky Survey (DPOSS) data. The method is quick and reliable and produces classifications comparable to previous work on these data using supervised clustering.
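As a rough illustration of the general idea described above (fit a mixture model on a random sample, then classify every object in the full data set), the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian mixture with diagonal covariances fitted by plain EM, a farthest-point initialization, and hypothetical helper names (`fit_mixture_em`, `classify`, `cluster_via_sample`). It does not attempt to reproduce the paper's subsampling step, whose details are not given in the abstract.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    # Log density of each point under each diagonal-covariance Gaussian, shape (n, k).
    diff = X[:, None, :] - means[None, :, :]
    quad = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)
    return -0.5 * (quad + log_norm[None, :])

def fit_mixture_em(X, k, n_iter=100, seed=0):
    # Assumption: plain EM with a farthest-point initialization (not taken from the paper).
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    centers = [X[rng.integers(n)]]
    for _ in range(1, k):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    means = np.array(centers, dtype=float)
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        log_r = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means, and per-dimension variances.
        nk = r.sum(axis=0) + 1e-12
        weights = nk / n
        means = (r.T @ X) / nk[:, None]
        variances = np.maximum((r.T @ X ** 2) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, variances

def classify(X, weights, means, variances):
    # Assign each point to the component with the highest posterior probability.
    scores = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    return np.argmax(scores, axis=1)

def cluster_via_sample(X, k, sample_size, seed=0):
    # Fit on a random sample only, then label the full data set with the fitted model.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    params = fit_mixture_em(X[idx], k, seed=seed)
    return classify(X, *params)
```

In this sketch the expensive EM iterations touch only `sample_size` rows, while the full data set is visited once in the final classification pass, which is the reason a sampling strategy can make mixture-based clustering practical for very large data sets.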
Field | Value |
---|---|
Original language | English (US) |
Pages (from-to) | 215-232 |
Number of pages | 18 |
Journal | Data Mining and Knowledge Discovery |
Volume | 7 |
Issue number | 2 |
State | Published - Apr 2003 |
Keywords
- Clustering algorithm
- Mixture likelihood
- Sampling
- Star/galaxy classification
ASJC Scopus subject areas
- Control and Systems Engineering
- Artificial Intelligence
- Information Systems