We consider three specific clustering algorithms to illustrate these extensions of prototype-based clustering. Fuzzy c-means uses concepts from the field of fuzzy logic and fuzzy set theory to propose a clustering scheme, which is much like K-means, but which does not require a hard assignment of a point to only one cluster. Mixture model clustering takes the approach that a set of clusters can be modeled as a mixture of distributions, one for each cluster. The clustering scheme based on Self-Organizing Maps (SOM) performs clustering within a framework that requires clusters to have a prespecified relationship to one another, e.g., a two-dimensional grid structure.
9.2.1 Fuzzy Clustering
If data objects are distributed in well-separated groups, then a crisp classification of the objects into disjoint clusters seems like an ideal approach. However, in most cases, the objects in a data set cannot be partitioned into
well-separated clusters, and there will be a certain arbitrariness in assigning an object to a particular cluster. Consider an object that lies near the boundary of two clusters, but is slightly closer to one of them. In many such cases, it might be more appropriate to assign a weight to each object and each cluster that indicates the degree to which the object belongs to the cluster. Mathematically, $w_{ij}$ is the weight with which object $\mathbf{x}_i$ belongs to cluster $C_j$.
As shown in the next section, probabilistic approaches can also provide such weights. While probabilistic approaches are useful in many situations, there are times when it is difficult to determine an appropriate statistical model. In such cases, non-probabilistic clustering techniques are needed to provide similar capabilities. Fuzzy clustering techniques are based on fuzzy set theory and provide a natural technique for producing a clustering in which membership weights (the $w_{ij}$) have a natural (but not probabilistic) interpretation. This section describes the general approach of fuzzy clustering and provides a specific example in terms of fuzzy c-means (fuzzy K-means).
Fuzzy Sets
Lotfi Zadeh introduced fuzzy set theory and fuzzy logic in 1965 as a way of dealing with imprecision and uncertainty. Briefly, fuzzy set theory allows an object to belong to a set with a degree of membership between 0 and 1, while fuzzy logic allows a statement to be true with a degree of certainty between 0 and 1. Traditional set theory and logic are special cases of their fuzzy counterparts that restrict the degree of set membership or the degree of certainty to be either 0 or 1. Fuzzy concepts have been applied to many different areas, including control systems, pattern recognition, and data analysis (classification and clustering).
Consider the following example of fuzzy logic. The degree of truth of the statement "It is cloudy" can be defined to be the percentage of cloud cover in the sky, e.g., if the sky is 50% covered by clouds, then we would assign "It is cloudy" a degree of truth of 0.5. If we have two sets, "cloudy days" and "non-cloudy days," then we can similarly assign each day a degree of membership in the two sets. Thus, if a day were 25% cloudy, it would have a 25% degree of membership in "cloudy days" and a 75% degree of membership in "non-cloudy days."
Fuzzy Clusters
Assume that we have a set of data points $\mathcal{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_m\}$, where each point, $\mathbf{x}_i$, is an $n$-dimensional point, i.e., $\mathbf{x}_i = (x_{i1}, \ldots, x_{in})$. A collection of fuzzy
clusters, $C_1, C_2, \ldots, C_k$, is a subset of all possible fuzzy subsets of $\mathcal{X}$. (This
simply means that the membership weights (degrees), $w_{ij}$, have been assigned values between 0 and 1 for each point, $\mathbf{x}_i$, and each cluster, $C_j$.) However, we also want to impose the following reasonable conditions on the clusters in order to ensure that the clusters form what is called a fuzzy pseudo-partition.
1. All the weights for a given point, $\mathbf{x}_i$, add up to 1:

$$\sum_{j=1}^{k} w_{ij} = 1$$
2. Each cluster, $C_j$, contains, with non-zero weight, at least one point, but does not contain, with a weight of one, all of the points:

$$0 < \sum_{i=1}^{m} w_{ij} < m$$
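To make these two conditions concrete, here is a minimal sketch (our own illustration, not part of the original text); representing the weights $w_{ij}$ as an m x k NumPy array `W` is an assumption of this sketch.

```python
import numpy as np

def is_fuzzy_pseudo_partition(W, tol=1e-9):
    """Check the two pseudo-partition conditions for an m x k weight matrix W."""
    m = W.shape[0]
    in_range = np.all((W >= 0) & (W <= 1))
    rows_sum_to_one = np.allclose(W.sum(axis=1), 1.0, atol=tol)        # condition 1
    col_sums = W.sum(axis=0)
    clusters_nontrivial = np.all((col_sums > 0) & (col_sums < m))      # condition 2
    return bool(in_range and rows_sum_to_one and clusters_nontrivial)

W = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5]])
print(is_fuzzy_pseudo_partition(W))  # True
```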
Fuzzy c-means
While there are many types of fuzzy clustering (indeed, many data analysis algorithms can be "fuzzified"), we only consider the fuzzy version of K-means, which is called fuzzy c-means. In the clustering literature, the version of K-means that does not use incremental updates of cluster centroids is sometimes referred to as c-means, and this was the term adopted by the fuzzy community for the fuzzy version of K-means. The fuzzy c-means algorithm, also sometimes known as FCM, is given by Algorithm 9.1.
Algorithm 9.1 Basic fuzzy c-means algorithm.
1: Select an initial fuzzy pseudo-partition, i.e., assign values to all the $w_{ij}$.
2: repeat
3:   Compute the centroid of each cluster using the fuzzy pseudo-partition.
4:   Recompute the fuzzy pseudo-partition, i.e., the $w_{ij}$.
5: until The centroids don't change.
(Alternative stopping conditions are "if the change in the error is below a specified threshold" or "if the absolute change in any $w_{ij}$ is below a given threshold.")
After initialization, FCM repeatedly computes the centroids of each cluster and the fuzzy pseudo-partition until the partition does not change. FCM is similar in structure to the K-means algorithm, which, after initialization, alternates between a step that updates the centroids and a step that assigns each object to the closest centroid. Specifically, computing a fuzzy pseudo-partition is equivalent to the assignment step. As with K-means, FCM can
be interpreted as attempting to minimize the sum of the squared error (SSE), although FCM is based on a fuzzy version of SSE. Indeed, K-means can be regarded as a special case of FCM and the behavior of the two algorithms is quite similar. The details of FCM are described below.
Computing SSE The definition of the sum of the squared error (SSE) is modified as follows:

$$SSE(C_1, C_2, \ldots, C_k) = \sum_{j=1}^{k} \sum_{i=1}^{m} w_{ij}^p \, dist(\mathbf{x}_i, \mathbf{c}_j)^2 \qquad (9.1)$$

where $\mathbf{c}_j$ is the centroid of the $j^{th}$ cluster and $p$, which is the exponent that determines the influence of the weights, has a value between 1 and $\infty$. Note that this SSE is just a weighted version of the traditional K-means SSE given in Equation 8.1.
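As a brief illustration (our own sketch, assuming the data, weights, and centroids are stored as NumPy arrays `X` of shape (m, n), `W` of shape (m, k), and `C` of shape (k, n)), Equation 9.1 can be computed as follows:

```python
import numpy as np

def fuzzy_sse(X, W, C, p=2):
    """Fuzzy SSE (Eq. 9.1): sum over clusters j and points i of w_ij^p * dist(x_i, c_j)^2."""
    # Squared Euclidean distances between every point and every centroid, shape (m, k).
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return np.sum((W ** p) * d2)
```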
Initialization Random initialization is often used. In particular, weights are chosen randomly, subject to the constraint that the weights associated with any object must sum to 1. As with K-means, random initialization is simple, but often results in a clustering that represents a local minimum in terms of the SSE. Section 8.2.1, which contains a discussion on choosing initial centroids for K-means, has considerable relevance for FCM as well.
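One simple way to realize such a random initialization, sketched below under the same array conventions (the draw-and-normalize trick is one common choice, not mandated by the text):

```python
import numpy as np

def init_pseudo_partition(m, k, rng=None):
    """Random fuzzy pseudo-partition: m points, k clusters, each row summing to 1."""
    rng = np.random.default_rng(rng)
    W = rng.random((m, k)) + 1e-12      # strictly positive, so no row is all zeros
    return W / W.sum(axis=1, keepdims=True)
```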
Computing Centroids The definition of the centroid given in Equation 9.2 can be derived by finding the centroid that minimizes the fuzzy SSE as given by Equation 9.1. (See the approach in Section 8.2.6.) For a cluster, $C_j$, the corresponding centroid, $\mathbf{c}_j$, is defined by the following equation:

$$\mathbf{c}_j = \sum_{i=1}^{m} w_{ij}^p \, \mathbf{x}_i \Big/ \sum_{i=1}^{m} w_{ij}^p \qquad (9.2)$$
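Equation 9.2 amounts to a weighted average of all the points; here is a short sketch (our own) under the same array conventions used above:

```python
import numpy as np

def fuzzy_centroids(X, W, p=2):
    """Fuzzy centroids (Eq. 9.2): weighted mean of all points, with weights w_ij^p."""
    Wp = W ** p                                     # shape (m, k)
    return (Wp.T @ X) / Wp.sum(axis=0)[:, None]     # shape (k, n)
```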
The fuzzy centroid definition is similar to the traditional definition except that all points are considered (any point can belong to any cluster, at least somewhat) and the contribution of each point to the centroid is weighted by its membership degree. In the case of traditional crisp sets, where all $w_{ij}$ are either 0 or 1, this definition reduces to the traditional definition of a centroid.
There are a few considerations when choosing the value of $p$. Choosing $p = 2$ simplifies the weight update formula; see Equation 9.4. However, if $p$
is chosen to be near 1, then fuzzy c-means behaves like traditional K-means. Going in the other direction, as $p$ gets larger, all the cluster centroids approach the global centroid of all the data points. In other words, the partition becomes fuzzier as $p$ increases.
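A small numeric illustration (our own) of this behavior evaluates the membership weights of Equation 9.3 (given below) for a point whose squared distances to two centroids are 1 and 4:

```python
import numpy as np

d2 = np.array([1.0, 4.0])                 # squared distances to the two centroids
for p in [1.1, 2.0, 10.0]:
    u = (1.0 / d2) ** (1.0 / (p - 1.0))   # unnormalized weights from Eq. 9.3
    print(p, np.round(u / u.sum(), 3))
# p near 1 -> nearly crisp assignment ([1.0, 0.0]);
# large p  -> weights approach 0.5 each, i.e., a fuzzier partition
```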
Updating the Fuzzy Pseudo-partition Since the fuzzy pseudo-partition is defined by the weights, this step involves updating the weights $w_{ij}$ associated with the $i^{th}$ point and $j^{th}$ cluster. The weight update formula given in Equation 9.3 can be derived by minimizing the SSE of Equation 9.1 subject to the constraint that the weights sum to 1.
This formula may appear a bit mysterious. However, note that if $p = 2$, then we obtain Equation 9.4, which is somewhat simpler. We provide an intuitive explanation of Equation 9.4, which, with a slight modification, also applies to Equation 9.3.

$$w_{ij} = \left( 1/dist(\mathbf{x}_i, \mathbf{c}_j)^2 \right)^{\frac{1}{p-1}} \Bigg/ \sum_{q=1}^{k} \left( 1/dist(\mathbf{x}_i, \mathbf{c}_q)^2 \right)^{\frac{1}{p-1}} \qquad (9.3)$$

$$w_{ij} = \left( 1/dist(\mathbf{x}_i, \mathbf{c}_j)^2 \right) \Bigg/ \sum_{q=1}^{k} \left( 1/dist(\mathbf{x}_i, \mathbf{c}_q)^2 \right) \qquad (9.4)$$
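Putting the pieces together, the following sketch (our own illustration; it reuses the `init_pseudo_partition` and `fuzzy_centroids` helpers sketched earlier in this section, and the convergence tolerance is our choice) implements the weight update of Equation 9.3 and alternates it with the centroid step, as in Algorithm 9.1:

```python
import numpy as np

def update_weights(X, C, p=2):
    """Fuzzy pseudo-partition update (Eq. 9.3)."""
    # Squared distances from every point to every centroid, shape (m, k).
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    u = (1.0 / np.maximum(d2, 1e-12)) ** (1.0 / (p - 1.0))  # guard against d2 == 0
    return u / u.sum(axis=1, keepdims=True)

def fuzzy_c_means(X, k, p=2, tol=1e-6, max_iter=300, rng=None):
    """Basic fuzzy c-means (Algorithm 9.1), built from the sketches above."""
    W = init_pseudo_partition(len(X), k, rng)
    C = fuzzy_centroids(X, W, p)
    for _ in range(max_iter):
        W = update_weights(X, C, p)          # recompute the pseudo-partition
        C_new = fuzzy_centroids(X, W, p)     # recompute the centroids
        if np.allclose(C, C_new, atol=tol):  # "the centroids don't change"
            break
        C = C_new
    return C, W
```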