Clustering Models

Grouping arbitrary data into clusters of similar items raises the question of what constitutes a good clustering. It can be shown that there is no absolute "best" criterion that is independent of the final aim of the clustering. Consequently, it is the user who must supply this criterion, so that the result of the clustering suits their needs.

As of Version 2.0, ArrayMiner uses two intuitively appealing criteria of clustering quality:

    The first clustering criterion, available since the introduction of ArrayMiner, is also used in other clustering tools. ArrayMiner partitions the expression profiles into a user-supplied number of clusters such that the profiles within each cluster are as similar to one another as possible. More precisely, ArrayMiner finds a clustering of the expression profiles with minimal total variance of the clusters. The performance of ArrayMiner on this clustering criterion is described in the paper "Using k-Means? Consider ArrayMiner", available in the ArrayMiner installation package and on Optimal Design's web site at www.optimaldesign.com.

     The second clustering criterion, to the best of our knowledge not available in other clustering tools, was introduced with Version 2.0 of ArrayMiner. It seeks a set of a given number of Gaussian distributions that best explain the data being clustered, i.e., it finds a set of Gaussians yielding the highest probability for the data. This probabilistic model solves certain problems inherent to the minimal variance criterion, and has been found to yield clusters that are significantly more stable when the requested number of clusters changes. As a result, the difficult problem of deciding how many clusters there are in the data becomes far less critical. In addition, when clustering according to the Gaussian criterion, ArrayMiner is capable of detecting outliers, i.e., expression profiles that do not match any other profiles in the data. The details of this clustering approach, as well as a discussion of its advantages, are given in the white paper available in the ArrayMiner installation package and on Optimal Design's web site.
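To make the two criteria concrete, the following Python sketch scores a given clustering under each of them. It is an illustration only, not ArrayMiner's implementation: the function names are hypothetical, "total variance" is assumed to mean the sum of squared distances of the profiles to their cluster centroids (the usual k-means objective), and the Gaussian criterion is assumed to be a mixture of spherical Gaussians with explicit mixing weights. The search for the best clustering, which is the difficult part, is not shown.

```python
import math


def total_within_cluster_variance(profiles, labels):
    """Minimal variance criterion (assumed form): sum of squared
    distances from each profile to the centroid of its cluster.
    Lower is better."""
    clusters = {}
    for profile, label in zip(profiles, labels):
        clusters.setdefault(label, []).append(profile)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members)
                    for d in range(dim)]
        for p in members:
            total += sum((p[d] - centroid[d]) ** 2 for d in range(dim))
    return total


def gaussian_log_density(x, mean, variance):
    """Log density of a spherical Gaussian (assumption: one shared
    variance for all dimensions of a cluster)."""
    dim = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (dim * math.log(2.0 * math.pi * variance)
                   + sq_dist / variance)


def mixture_log_likelihood(profiles, weights, means, variances):
    """Gaussian criterion (assumed form): log probability of the data
    under a mixture of Gaussians. Higher is better."""
    total = 0.0
    for x in profiles:
        terms = [math.log(w) + gaussian_log_density(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        # Log-sum-exp keeps the sum stable when densities are tiny.
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


# Four toy "expression profiles" forming two obvious groups.
profiles = [[0, 0], [0, 2], [10, 10], [10, 12]]

good_labels = [0, 0, 1, 1]   # groups nearby profiles together
bad_labels = [0, 1, 0, 1]    # mixes the two groups

print(total_within_cluster_variance(profiles, good_labels))
print(total_within_cluster_variance(profiles, bad_labels))

weights = [0.5, 0.5]
variances = [1.0, 1.0]
good_means = [[0, 1], [10, 11]]   # centred on the two groups
bad_means = [[5, 5], [5, 7]]      # centred between the groups

print(mixture_log_likelihood(profiles, weights, good_means, variances))
print(mixture_log_likelihood(profiles, weights, bad_means, variances))
```

Under these assumptions, the natural grouping scores a lower total variance and a higher log-likelihood than the mixed one, which is the sense in which each criterion prefers it.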

     The user selects the desired clustering criterion at the beginning of an ArrayMiner run, in the Analysis Selection Window depicted below. There are two tabs in the window.

As of Version 3 of ArrayMiner, a third tab, Class Marking and Prediction, is available in the Analysis Selection Window (not shown in the above figure). Selecting the third tab activates the optional ClassMarker functionality.