A good source of references to clustering outside of the data mining field is the article by Arabie and Hubert [376]. A paper by Kleinberg [401] provides a discussion of some of the trade-offs that clustering algorithms make and proves that it is impossible for a clustering algorithm to simultaneously possess three simple properties.

The K-means algorithm has a long history, but is still the subject of current research. The original K-means algorithm was proposed by MacQueen [403]. The ISODATA algorithm by Ball and Hall [377] was an early, but sophisticated, version of K-means that employed various pre- and postprocessing techniques to improve on the basic algorithm. The K-means algorithm and many of its variations are described in detail in the books by Anderberg [374] and Jain and Dubes [396]. The bisecting K-means algorithm discussed in this chapter was described in a paper by Steinbach et al. [414], and an implementation of this and other clustering approaches is freely available for academic use in the CLUTO (CLUstering TOolkit) package created by Karypis [382]. Boley [380] has created a divisive partitioning clustering algorithm (PDDP) based on finding the first principal direction (component) of the data, and Savaresi and Boley [411] have explored its relationship to bisecting K-means. Recent variations of K-means are a new incremental version of K-means (Dhillon et al. [383]), X-means (Pelleg and Moore [408]), and K-harmonic means (Zhang et al. [416]). Hamerly and Elkan [392] discuss some clustering algorithms that produce better results than K-means. While some of the previously mentioned approaches address the initialization problem of K-means in some manner,


556 Chapter 8 Cluster Analysis: Basic Concepts and Algorithms

other approaches to improving K-means initialization can also be found in the work of Bradley and Fayyad [381]. Dhillon and Modha [384] present a generalization of K-means, called spherical K-means, that works with commonly used similarity functions. A general framework for K-means clustering that uses dissimilarity functions based on Bregman divergences was constructed by Banerjee et al. [378].

Hierarchical clustering techniques also have a long history. Much of the

initial activity was in the area of taxonomy and is covered in books by Jardine and Sibson [398] and Sneath and Sokal [412]. General-purpose discussions of hierarchical clustering are also available in most of the clustering books mentioned above. Agglomerative hierarchical clustering is the focus of most work in the area of hierarchical clustering, but divisive approaches have also received some attention. For example, Zahn [415] describes a divisive hierarchical technique that uses the minimum spanning tree of a graph. While both divisive and agglomerative approaches typically take the view that merging (splitting) decisions are final, there has been some work by Fisher [389] and Karypis et al. [399] to overcome these limitations.

Ester et al. proposed DBSCAN [387], which was later generalized to the

GDBSCAN algorithm by Sander et al. [410] in order to handle more general types of data and distance measures, such as polygons whose closeness is measured by the degree of intersection. An incremental version of DBSCAN was developed by Kriegel et al. [386]. One interesting outgrowth of DBSCAN is OPTICS (Ordering Points To Identify the Clustering Structure) (Ankerst et al. [375]), which allows the visualization of cluster structure and can also be used for hierarchical clustering.
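To make the density-based idea behind DBSCAN concrete, the following is a minimal sketch (the function name `dbscan` and parameters `eps` and `min_pts` are illustrative, not the authors' code): points with at least `min_pts` neighbors within radius `eps` are core points, clusters are grown outward from core points, and points reachable from no core point are labeled noise.

```python
# Illustrative sketch of the DBSCAN idea from Ester et al. [387].
# Not the original implementation; names and structure are assumptions.
from math import dist

def dbscan(points, eps, min_pts):
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        labels[i] = cluster            # i is a core point: start a new cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point absorbed into the cluster
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)   # j is also core: keep expanding
        cluster += 1
    return labels
```

The quadratic neighborhood search is the simplest possible choice; practical implementations use spatial indexes, which is part of what GDBSCAN generalizes.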

An authoritative discussion of cluster validity, which strongly influenced the discussion in this chapter, is provided in Chapter 4 of Jain and Dubes' clustering book [396]. More recent reviews of cluster validity are those of Halkidi et al. [390, 391] and Milligan [404]. Silhouette coefficients are described in Kaufman and Rousseeuw's clustering book [400]. The source of the cohesion and separation measures in Table 8.6 is a paper by Zhao and Karypis [417], which also contains a discussion of entropy, purity, and the hierarchical F-measure. The original source of the hierarchical F-measure is an article by Larsen and Aone [402].
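For concreteness, the silhouette coefficient can be sketched as follows (a small illustrative implementation, not Kaufman and Rousseeuw's code): for each point, cohesion a is the mean distance to the other points of its own cluster, separation b is the smallest mean distance to the points of any other cluster, and the coefficient is (b - a) / max(a, b), averaged here over all points.

```python
# Illustrative sketch of the silhouette coefficient; assumes at least
# two clusters and Euclidean distance.
from math import dist

def silhouette(points, labels):
    """Return the mean silhouette coefficient over all points."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        # Mean distance from p to the points of each cluster (excluding p).
        mean_d = {}
        for c in clusters:
            others = [q for j, q in enumerate(points)
                      if labels[j] == c and j != i]
            if others:
                mean_d[c] = sum(dist(p, q) for q in others) / len(others)
        a = mean_d.get(labels[i], 0.0)                           # cohesion
        b = min(v for c, v in mean_d.items() if c != labels[i])  # separation
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Values near 1 indicate tight, well-separated clusters; negative values indicate points that are, on average, closer to another cluster than to their own.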

Bibliography

[373] M. S. Aldenderfer and R. K. Blashfield. Cluster Analysis. Sage Publications, Los Angeles, 1985.

[374] M. R. Anderberg. Cluster Analysis for Applications. Academic Press, New York, December 1973.


[375] M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander. OPTICS: Ordering Points To Identify the Clustering Structure. In Proc. of 1999 ACM-SIGMOD Intl. Conf. on Management of Data, pages 49-60, Philadelphia, Pennsylvania, June 1999. ACM Press.

[376] P. Arabie, L. Hubert, and G. D. Soete. An overview of combinatorial data analysis. In P. Arabie, L. Hubert, and G. D. Soete, editors, Clustering and Classification, pages 188-217. World Scientific, Singapore, January 1996.

[377] G. Ball and D. Hall. A Clustering Technique for Summarizing Multivariate Data. Behavior Science, 12:153-155, March 1967.

[378] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with Bregman Divergences. In Proc. of the 2004 SIAM Intl. Conf. on Data Mining, pages 234-245, Lake Buena Vista, FL, April 2004.

[379] P. Berkhin. Survey of Clustering Data Mining Techniques. Technical report, Accrue Software, San Jose, CA, 2002.

[380] D. Boley. Principal Direction Divisive Partitioning. Data Mining and Knowledge Discovery, 2(4):325-344, 1998.

[381] P. S. Bradley and U. M. Fayyad. Refining Initial Points for K-Means Clustering. In Proc. of the 15th Intl. Conf. on Machine Learning, pages 91-99, Madison, WI, July 1998. Morgan Kaufmann Publishers Inc.

[382] G. Karypis. CLUTO 2.1.1: Software for Clustering High-Dimensional Datasets. www.cs.umn.edu/~karypis, November 2003.

[383] I. S. Dhillon, Y. Guan, and J. Kogan. Iterative Clustering of High Dimensional Text Data Augmented by Local Search. In Proc. of the 2002 IEEE Intl. Conf. on Data Mining, pages 131-138. IEEE Computer Society, 2002.

[384] I. S. Dhillon and D. S. Modha. Concept Decompositions for Large Sparse Text Data Using Clustering. Machine Learning, 42(1/2):143-175, 2001.

[385] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley & Sons, Inc., New York, second edition, 2001.

[386] M. Ester, H.-P. Kriegel, J. Sander, M. Wimmer, and X. Xu. Incremental Clustering for Mining in a Data Warehousing Environment. In Proc. of the 24th VLDB Conf., pages 323-333, New York City, August 1998. Morgan Kaufmann.

[387] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proc. of the 2nd Intl. Conf. on Knowledge Discovery and Data Mining, pages 226-231, Portland, Oregon, August 1996. AAAI Press.

[388] B. S. Everitt, S. Landau, and M. Leese. Cluster Analysis. Arnold Publishers, London, fourth edition, May 2001.

[389] D. Fisher. Iterative Optimization and Simplification of Hierarchical Clusterings. Journal of Artificial Intelligence Research, 4:147-179, 1996.

[390] M. Halkidi, Y. Batistakis, and M. Vazirgiannis. Cluster validity methods: part I. SIGMOD Record (ACM Special Interest Group on Management of Data), 31(2):40-45, June 2002.

[391] M. Halkidi, Y. Batistakis, and M. Vazirgiannis. Clustering validity checking methods: part II. SIGMOD Record (ACM Special Interest Group on Management of Data), 31(3):19-27, Sept. 2002.

[392] G. Hamerly and C. Elkan. Alternatives to the k-means algorithm that find better clusterings. In Proc. of the 11th Intl. Conf. on Information and Knowledge Management, pages 600-607, McLean, Virginia, 2002. ACM Press.


[393] J. Han, M. Kamber, and A. Tung. Spatial Clustering Methods in Data Mining: A review. In H. J. Miller and J. Han, editors, Geographic Data Mining and Knowledge Discovery, pages 188-217. Taylor and Francis, London, December 2001.

[394] J. Hartigan. Clustering Algorithms. Wiley, New York, 1975.

[395] T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning: Data Mining, Inference, Prediction. Springer, New York, 2001.

[396] A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall Advanced Reference Series. Prentice Hall, March 1988. Book available online at http://www.cse.msu.edu/~jain/Clustering-Jain-Dubes.pdf.

[397] A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: A review. ACM Computing Surveys, 31(3):264-323, September 1999.

[398] N. Jardine and R. Sibson. Mathematical Taxonomy. Wiley, New York, 1971.

[399] G. Karypis, E.-H. Han, and V. Kumar. Multilevel Refinement for Hierarchical Clustering. Technical Report TR 99-020, University of Minnesota, Minneapolis, MN, 1999.

[400] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley Series in Probability and Statistics. John Wiley and Sons, New York, November 1990.

[401] J. M. Kleinberg. An Impossibility Theorem for Clustering. In Proc. of the 16th Annual Conf. on Neural Information Processing Systems, December 9-14, 2002.

[402] B. Larsen and C. Aone. Fast and Effective Text Mining Using Linear-Time Document Clustering. In Proc. of the 5th Intl. Conf. on Knowledge Discovery and Data Mining, pages 16-22, San Diego, California, 1999. ACM Press.

[403] J. MacQueen. Some methods for classification and analysis of multivariate observations. In Proc. of the 5th Berkeley Symp. on Mathematical Statistics and Probability, pages 281-297. University of California Press, 1967.

[404] G. W. Milligan. Clustering Validation: Results and Implications for Applied Analyses. In P. Arabie, L. Hubert, and G. D. Soete, editors, Clustering and Classification, pages 345-375. World Scientific, Singapore, January 1996.

[405] B. Mirkin. Mathematical Classification and Clustering, volume 11 of Nonconvex Optimization and Its Applications. Kluwer Academic Publishers, August 1996.

[406] T. Mitchell. Machine Learning. McGraw-Hill, Boston, MA, 1997.

[407] F. Murtagh. Multidimensional Clustering Algorithms. Physica-Verlag, Heidelberg and Vienna, 1985.

[408] D. Pelleg and A. W. Moore. X-means: Extending K-means with Efficient Estimation of the Number of Clusters. In Proc. of the 17th Intl. Conf. on Machine Learning, pages 727-734. Morgan Kaufmann, San Francisco, CA, 2000.

[409] C. Romesburg. Cluster Analysis for Researchers. Life Time Learning, Belmont, CA, 1984.

[410] J. Sander, M. Ester, H.-P. Kriegel, and X. Xu. Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and its Applications. Data Mining and Knowledge Discovery, 2(2):169-194, 1998.