8.5 Cluster Evaluation

In supervised classification, the evaluation of the resulting classification model is an integral part of the process of developing a classification model, and there are well-accepted evaluation measures and procedures, €.g., accuracy and cross-validation, respectively. However, because of its very nature, cluster evaluation is not a well-developed or commonly used part of cluster analysis. Nonetheless, cluster evaluation, or cluster validation as it is more tradition- ally called, is important, and this section will review some of the most common and easily applied approaches.

There might be some confusion as to why cluster evaluation is necessary. Many times, cluster analysis is conducted as a part of an exploratory data analysis. Hence, evaluation seems like an unnecessarily complicated addition to what is supposed to be an informal process. Furthermore, since there are a number of different types of clusters-in some sense, each clustering algorithm defines its own type of cluster it may seem that each situation might require a different evaluation measure. For instance, K-means clusters might be evaluated in terms of the SSE, but for density-based clusters, which need not be globular, SSE would not work well at all.

Nonetheless, cluster evaluation should be a part of any cluster analysis. A key motivation is that almost every clustering algorithm will find clusters in a data set, even if that data set has no natural cluster structure. For instance, consider Figure 8.26, which shows the result of clustering 100 points that are randomly (uniformly) distributed on the unit square. The original points are shown in Figure 8.26(a), while the clusters found by DBSCAN, K- means, and complete link are shown in Figures 8.20(b), 8.26(c), and 8.26(d), respectively. Since DBSCAN found three clusters (after we set Eps by looking at the distances of the fourth nearest neighbors), we set K-means and complete link to find three clusters as well. (In Figure 8.26(b) the noise is shown by the small markers.) However, the clusters do not look compelling for any of

8.5 Cluster Evaluation 533

the three methods. In higher dimensions, such problems cannot be so easily detected.

8.5.1 Overview

Being able to distinguish whether there is non-random structure in the data is just one important aspect of cluster validation. The following is a list of several important issues for cluster validation.

1. Determining the clustering tendency of a set of data, i.e., distinguish- ing whether non-random structure actually exists in the data.

2. Determining the correct number of clusters.

3. Evaluating how well the results of a cluster analysis fit the data wi,thout reference to external information.

4. Comparing the results of a cluster analysis to externally known results, such as externally provided class labels.

5. Comparing two sets of clusters to determine which is better.

Notice that items L, 2, and 3 do not make use of any external information- they are unsupervised techniques while item 4 requires external information. Item 5 can be performed in either a supervised or an unsupervised manner. A further distinction can be made with respect to items 3, 4, and 5: Do we want to evaluate the entire clustering or just individual clusters?

While it is possible to deverlop various numerical measures to assess the different aspects of cluster validity mentioned above, there are a number of challenges. First, a measure of cluster validity may be quite limited in the scope of its applicability. For example, most work on measures of clustering tendency has been done for two- or three-dimensional spatial data. Second, we need a framework to interprr:t any measure. If we obtain a value of 10 for a measure that evaluates how well cluster labels match externally provided class labels, does this value represent a good, fair, or poor match? The goodness of a match often can be measured by looking at the statistical distribution of this value, i.e., how likely it is that such a value occurs by chance. Finally, if a measure is too complicated to apply or to understand, then few will use it.

The evaluation measures, or indices, that are applied to judge various aspects of cluster validity are traditionally classified into the following three types.

534 Chapter 8 Cluster Analysis: Basic Concepts and Algorithms

V V Y V

V V v Yv

v Y Y’

t t y V V

t t g v

v w v v

. a

x

*

* 4 *

* *

*

* * *

O

t( * *

x’

* #

* ) ( *

* x *

i . . . * o o

o o

*

)(

* * *

* *

I

* )'(

)K

*

‘K )K. *

** *

x * * .

x * g *

V Y

o

a O

oo

o a

v * Y

y V

Y V’ Y

V Y

v Y

V

V

V

V V V v v V

y Y

I I

t a

(b) Three clusters found by DBSCAN.

V V V

V

a

o

O O O

a

a

,

o a

a a

o

o

o

o

o

o o a o

O a a o

a o t o

r O -‘ t i .

o

1 O

o o

3 t t .

o o O

o o ‘ a 1 O

a o o t 1 0 a o o o r

! . t ;o o o o

o

a o o o

o o o l l o

1 O

a

3 t t . a

o o o

v !

o

(a) Original points.

I

I

V V V V Y Y V V

v v v

v v t t

V V V

* x xtur * * x * * * *

* * * y * * *

* * *

o

a a

o

* *

*

*

,(* * * . * x

4 *

* x * * * * .

* * *

*

oo

V

VV

* * *

* ;{<.

)(o

a o o

Y V v v

Y . Y Y V ‘

v v v

v $ o

r. ‘:. o o o

o o y V

V Y V V

(c) Three clusters found by K-means. (d) Three clusters found by complete

Figure 8.26. Clustering of 100 uniformly distributed points.

Cluster Evaluation 535

IJnsupervised. Measures the goodness of a clustering structure without re- spect to external informal;ion. An example of this is the SSE. Unsu- pervised measures of cluster validity are often further divided into two classes: measures of cluster cohesion (compactness, tightness), which determine how closely relabed the objects in a cluster are, and measures of cluster separation (isolation), which determine how distinct or well- separated a cluster is from other clusters. Unsupervised measures are often called internal indices because thev use only information present

in the data set.

Supervised. Measures the extent to which the clustering structure discovered by a clustering algorithm rnatches some external structure. An example of a supervised index is entropy, which measures how well cluster labels match externally supplied class labels. Supervised measures are often called external indices Lrecause they use information not present in the data set.

Relative. Compares different clusterings or clusters. A relative cluster eval-

uation measure is a supervised or unsupervised evaluation measure that is used for the purpose of comparison. Thus, relative measures are not

actually a separate type of cluster evaluation measure, but are instead a

specific use of such measures. As an example, two K-means clusterings can be compared using either the SSE or entropy.

In the remainder of this section, we provide specific details concerning clus- ter validity. We first describe topics related to unsupervised cluster evaluation, beginning with (1) measures ba,sed on cohesion and separation, and (2) two techniques based on the proximity matrix. Since these approaches are useful only for partitional sets of clusters, we also describe the popular cophenetic correlation coefficient, which can be used for the unsupervised evaluation of

a hierarchical clustering. We end our discussion of unsupervised evaluation with brief discussions about fincling the correct number of clusters and evalu- ating clustering tendency. We then consider supervised approaches to cluster validity, such as entropy, purity, and the Jaccard measure. We conclude this

section with a short discussion of how to interpret the values of (unsupervised

or supervised) validity measures.

8 .5

536 Chapter 8 Cluster Analysis: Basic Concepts and Algorithms

8.5.2 lJnsupervised Cluster Evaluation Using Cohesion and Separation

Many internal measures of cluster validity for partitional clustering schemes are based on the notions of cohesion or separation. In this section, we use cluster validity measures for prototype- and graph-based clustering techniques to explore these notions in some detail. In the process, we will also see some interesting relationships between prototype- and graph-based clustering.

In general, we can consider expressing overall cluster validity for a set of 1{ clusters as a weighted sum of the validity of individual clusters,

t : l

The uali,di,ty function can be cohesion, separation, or some combination of these quantities. The weights will vary depending on the cluster validity measure. In some cases, the weights are simply 1 or the size of the cluster, while in other cases they reflect a more complicated property, such as the square root of the cohesion. See Table 8.6. If the validity function is cohesion, then higher values are better. If it is separation, then lower values are better.

Graph-Based View of Cohesion and Separation

For graph-based clusters, the cohesion of a cluster can be defined as the sum of the weights of the links in the proximity graph that connect points within the cluster. See Figure 8.27(a). (Recall that the proximity graph has data objects as nodes, a link between each pair of data objects, and a weight assigned to each link that is the proximity between the two data objects connected by the link.) Likewise, the separation between two clusters can be measured by the sum of the weights of the links from points in one cluster to points in the other cluster. This is il lustrated in Figure 8.27(b).

Mathematically, cohesion and separation for a graph-based cluster can be expressed using Equations 8.9 and 8.10, respectively. The prori,mi,ty function can be a similarity, a dissimilarity, or a simple function of these quantities.

K

ouerall uatidi,ty: t wi uali,d,i,ty(C.i).

cohesi,on(C6) : I pro*i*i.ty(x,y)

i,2Z: separat’ion(Ct,C) : I pro”i^i.tg(x,y)

x e L i v€ci

(8.8)

(8.e)

(8 .10)

8 .5 Cluster Evaluation 537

(a) Cohesion. (b) Separation.

Figure 8.27. Graph-based view of cluster cohesion and separation.

Prototype-Based View of Cohesion and Separation

For prototype-based clusters, the cohesion of a cluster can be defined as the sum of the proximities with respect to the prototype (centroid or medoid) of the cluster. Similarly, the separation between two clusters can be measured by the proximity of the two cluster prototypes. This is illustrated in Figure

8.28, where the centroid of a cluster is indicated by a “+”. Cohesion for a prototype-based cluster is given in Equation 8.11, while

two measures for separation are given in Equations 8.12 and 8.13, respec- tively, where ca is the prototype (centroid) of cluster C; and c is the overall prototype (centroid). There are two measures for separation because, as we will see shortly, the separation of cluster prototypes from an overall prototype

is sometimes directly related to the separation of cluster prototypes from one another. Note that Equation 8.11 is the cluster SSE if we let proximity be the squared Euclidean distance.

cohesion(Ci) : \ pro. r imi lg(x,c i )

xeL t

prori,mity\ct, c j)

prori,mi.ty(c6, c)

Overall Measures of Cohesion and Separation

The previous definitions of cluster cohesion and separation gave us some sim- ple and well-defined measures of cluster validity that can be combined into

an overall measure of cluster validity by using a weighted sum, as indicated

separation(Ct, C i) separati.on(Ci)

(8 .11 )

(8.12)

(8.13)

538 Chapter 8 Cluster Analysis: Basic Concepts and Algorithms

(a) Cohesion. (b) Separation

Figure 8.28, Prototype-based view of cluster cohesion and separation.

in Equation 8.8. However, we need to decide what weights to use. Not sur- prisingly, the weights used can vary widely, although typically they are some measure of cluster size.

Table 8.6 provides examples of validity measures based on cohesion and separation. Iy is a measure of cohesion in terms of the pairwise proximity of objects in the cluster divided by the cluster size. 12 is a measure of cohesion based on the sum of the proximities of objects in the cluster to the cluster centroid. t1 is a measure of separation defined as the proximity of a cluster centroid to the overall centroid multiplied by the number of objects in the cluster. 9t, which is a measure based on both cohesion and separation, is the sum of the pairwise proximity of all objects in the cluster with all objects outside the cluster-the total weight of the edges of the proximity graph that must be cut to separate the cluster from all other clusters-divided bv the sum of the pairwise proximity of objects in the cluster.

Table 8.6. Table of graph-based cluster evaluation measures.

Name Cluster Measure Cluster Weisht Type

a L 7

I m i

graph-based cohesion

,7. L2 D*ect Pr ori’mi’tY (x, c 1) I

prototype-based cohesion

c u’l prorzmity(ci,c) mi

prototype-based separation

9t s k I i . l D l : : , Prot imitY(x.Y

J c ” J