Cluster Analysis

Cluster analysis is the grouping of a set of objects in such a way that objects in the same cluster are more similar to each other than to those in other clusters.

Types of Cluster Analysis

There are a number of approaches to cluster analysis, including:

  • Centroid - clusters are represented by a central vector, which need not be a member of the data set; an example is k-means clustering

  • Density - clusters are defined as areas of higher density than the remainder of the data set

  • Distribution - based on probability distributions; a cluster is defined as the set of objects most likely to belong to the same distribution

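  • Grid - the data space is divided into a finite number of grid cells, and the analysis is performed on the cells rather than on individual objects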

  • Hierarchical - based on the core idea of objects being more related to nearby objects than to objects farther away
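
As an illustration of the centroid approach, the following is a minimal sketch of k-means clustering (Lloyd's algorithm) in plain Python. The function name kmeans, the sample points, and the starting centroids are assumptions chosen for this example; practical implementations usually pick the starting centroids at random and stop once the assignments no longer change.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Sketch of Lloyd's algorithm; points and centroids are equal-length tuples."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels

# Hypothetical sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

With this data the first three points fall into one cluster and the last three into the other. Neither final centroid coincides with any input point, which shows why the central vector need not be a member of the data set.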

Cluster Algorithm Analysis

There are a number of methods for evaluating how well a clustering has been performed. One example is the Davies-Bouldin Index:
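
For each cluster, the index takes the average distance of its members to the cluster centroid (the scatter), compares every pair of clusters by dividing the sum of their scatters by the distance between their centroids, keeps the worst (largest) ratio for each cluster, and averages those ratios. Lower values indicate more compact, better separated clusters. The following is a minimal sketch in plain Python, assuming Euclidean distance, at least two clusters, and distinct centroids; the function name davies_bouldin and the sample data are illustrative only.

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin index for labelled points (lower is better)."""
    # Group the points by cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    # Centroid of each cluster: the coordinate-wise mean of its members.
    centroids = {
        k: tuple(sum(xs) / len(m) for xs in zip(*m)) for k, m in clusters.items()
    }
    # Scatter of each cluster: mean distance from its members to its centroid.
    scatter = {
        k: sum(math.dist(p, centroids[k]) for p in m) / len(m)
        for k, m in clusters.items()
    }

    # For each cluster, keep the worst similarity ratio against any other cluster.
    keys = list(clusters)
    total = 0.0
    for i in keys:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys
            if j != i
        )
    return total / len(keys)

# Hypothetical sample data: the same points under a sensible and a poor labelling.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
good = davies_bouldin(points, [0, 0, 0, 1, 1, 1])
poor = davies_bouldin(points, [0, 1, 0, 1, 0, 1])
```

Here the labelling that follows the two visible groups scores lower than the one that mixes them, which is the behaviour the index is meant to capture.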