Anomaly detection has a long history, particularly in statistics, where it is
known as outlier detection. Relevant books on the topic are those of Barnett
and Lewis , Hawkins , and Rousseeuw and Leroy . The article
by Beckman and Cook  provides a general overview of how statisticians
look at the subject of outlier detection and provides a history of the subject
dating back to comments by Bernoulli rn 1777. Also see the related articles
1467, 4841. Another general article on outlier detection is the one by Barnett
[+0f1. erti”les on finding outliers in multivariate data include those by Davies
and Gather 1474), Gnanadesikan and Kettenring 1480], Rocke and woodruff
[tr11], Rousr””n* and van Zomerenand 1515], and Scott . Rosner  provides a discussion of finding multiple outliers at the same time.
An extensive survey of outlier detection methods is provided by Hodge and
Austin . Markou and singh [506, 507] give a two-part review of techniques
for novelty detection that covers statistical and neural network techniques,
respectively. Grubbs’ procedure for detecting outliers was originally described
in la81]. Thg mixture model outlier approach discussed in section 10-2’3 is
from Eskin . The notion of a distance-based outlier and the fact that this
definition can include many statistical definitions of an outlier was described
by Knorr et al. 1496-498]. The LoF technique (Breunig et al. [468, 469])
grew out of DBSCAN. Ramaswamy et al. 1510] propose a distance-based
outlier detection procedure that gives each object an outlier score based on
the distance of its k-nearest neighbor. Efficiency is achieved by partitioning
the data using the first phase of BIRCH (Section 9.5.2). Chaudhary et al.
 use k-d trees to improve the efficiency of outlier detection, while Bay and
schwabacher  use randomization and pruning to improve performance.
Aggarwal and Yu  use projection to address outlier detection for high-
676 Chapter 1O Anomaly Detection
dimensional data, while Shyu et al.  use an approach based on principal components. A theoretical discussion of outlier removal in high-dimensional space can be found in the paper by Dunagan and Vempala . The use of information measures in anomaly detection is described by Lee and Xiang
1504], while an approach based on the 12 measure is given by Ye and Chen
. Many different types of classification techniques can be used for anomaly
detection. A discussion of approaches in the area of neural networks can be found in papers by Hawkins et al. , Ghosh and Schwartzbard 1479], and Sykacek . Recent work on rare class detection includes the work of Joshi et al. [490-494]. The rare class problem is also sometimes referred to as the imbalanced data set problem. Of relevance are an AAAI workshop (Japkowicz
1488]), an ICML workshop (Chawla et al. [a71]), and a special issue of SIGKDD Explorations (Chawla et al. la72l).
Clustering and anomaly detection have a long relationship. In Chapters 8 and 9, we considered techniques, such as BIRCH, CURE, DENCLUE, DB- SCAN, and SNN density-based clustering, which specifically include tech- niques for handling anomalies. Statistical approaches that discuss this re- lationship are described in papers by Scott  and Hardin and Rocke .
In this chapter, we have focused on basic anomaly detection schemes. We have not considered schemes that take into account the spatial or temporal nature of the data. Shekhar et al.  provide a detailed discussion of the problem of spatial outliers and present a unified approach to spatial outlier detection. The issue of outliers in time series was first considered in a sta- tistically rigorous way by Fox . Muirhead  provides a discussion of different types of outliers in time series. Abraham and Chuang [ 6t] propose a Bayesian approach to outliers in time series, while Chen and Liu  consider different types of outliers in time series and propose a technique to detect them and obtain good estimates of time series parameters. Work on finding deviant or surprising patterns in time series databases has been performed by Jagadish et aI.  and Keogh et al. . Outlier detection based on geometric ideas, such as the depth of convex hulls, has been explored in papers by Johnson et al. , Liu et al. , and Rousseeuw et al. [51a].
An important application area for anomaly detection is intrusion detection. Surveys of the applications of data mining to intrusion detection are given by Lee and Stolfo  and Lazarevic et al. . In a different paper, Lazarevic et al.  provide a comparison of anomaly detection routines specific to network intrusion. A framework for using data mining techniques for intrusion detection is provided by Lee et al. . Clustering-based approaches in the
area of intrusion detection include work by Eskin et al. 14771, Lane and Brodley
, and Portnoy et al. .
Bibliography f461] B. Abraham and A. Chuang. Outlier Detection and Time Series Modeling. Techno-
metrics, Sl(\:2aI 248, May 1989.
14621 C. C. Aggarwal and P. S. Yu. Outlier detection for high dimensional data. In Proc. of 2001 ACM-SIGMOD Intl. Conf. on Managernent of Data, pages 37-46. ACM Press, 2001.
f463] V. Barnett. The Study of Outliers: Purpose and Model. Applied Stati.stics, 27(3): 242 250, 1978.
 V. Barnett and T. Lewis. Outliers ‘in Statistical Data. Wiley Series in Probability and Statistics. John Wiley & Sons, 3rd edition, April 1994.
 S. D. Bay and M. Schwabacher. Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In Proc. of the 9th Intl. Conf. on Knowledge Discouerg and Data M’ining, pages 29 38. ACM Press, 2003.
f466] R.J .BeckmanandR.D.Cook. ‘Out l ie r . . . . . . . . . . s ‘ . Technornet r i cs ,2S(2) :119 149,May 1983.
 R. J. Beckman and R. D. Cook. f ‘Outl ier.. . . . . . . . .s ‘ ] : Response. Technometrics,25(2): 161-163, May 1983.
 M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. OPTICS-OF: Identifying Local Outliers. In Proceedings of the Thi,rd European Conference on Princ’iples of Data Mining and Knowledge D’iscouery, pages 262-270. Springer-Verlag, 1999.
 M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. LOF: Identifying density-based local outliers. In Proc. of 2000 ACM-SIGMOD Intl. Conf. on Management of Data, pages 93-104. ACM Press, 2000.
14701 A. Chaudhary A. S. Szalay, and A. W. Moore. Very fast outlier detection in large
multidimensional data sets. In Proc. ACM SIGMOD Workshop on Research Issues ‘in
Data Min’ing and Knowledge D’iscouery @MKD),2002.
f471] N. V. Chawla, N. Japkowicz, and A. Kolcz, editors. Workshop on Learni’ng from Imbalanced Data Sets II, 20th IntI. Conf. on Machine Leamuing,2000. AAAI Press.
 N. V. Chawla, N. Japkowicz, and A. Kolcz, editors. SIGKDD Erplorations Newsletter,
Speci,al’issue on learn’ing from imbalanced datasets, volume 6(1), June 2004. ACM Press.
1473] C. Chen and L.-M. Liu. Joint Estimation of Model Parameters and Outlier Effects in
Time Series. Jout’nal of the American Stati.stical Assoc’iation, 88(421):284-297, Marc}l
1474] L. Davies and U. Gather. The Identification of Multiple Outliers. Journal of the Am eri, can S tati,sti, cal A s s oc’iatio n, 88 ( 423) :7 82-792, September I 993.
14751 J. Dunagan and S. Vempala. Optimal outlier removal in high-dimensional spaces.
Journal of Computer and System Sciences, Special Issue on STOC 2001, 68(2):335- 373, March 2004.
 E. Eskin. Anomaly Detection over Noisy Data using Learned Probability Distributions. In Proc. of the 17th Intl. Conf. on Machine Learning, pages 255 262,2000.
 E Eskin, A. Arnold, M. Prera,u, L. Portnoy, and S. J. Stolfo. A geometric framework for
unsupervised anomaly detection. In Applications of Data Mini,ng in Computer Securitg, pages 78-100. Kluwer Academics, 2002.
678 Chapter 10 Anomaly Detection
 A. J. Fox. Outliers in Time Series. Journal of the Royal Statisti,cal Society. Series B ( M ethodological), 34(3) :350-363, 1972.
14791 A. Ghosh and A. Schwartzbard. A Study in Using Neural Networks for Anomaly and Misuse Detection. In 8th USENIX Security Symposiurn, August 1gg9.
1480] R. Gnanadesikan and J. R. Kettenring. Robust Estimates, Residuals, and Outlier Detection with Multiresponse Data. B’iometrics, 28(I):8I-I2 , March Ig72.
 F. Grubbs. Procedures for Testing Outlying Observations. Annal o.f Mathemati.cal Stat ist ics, 2I( l) :27 58, March 1950.
 J. Hardin and D. M. Rocke. Outlier Detection in the Multiple Cluster Setting using the Minimum Covariance Determinant Estimator. Computational Statisti,cs and Data Analg sis, 44:625-638, 2004.
f483] D. M. Hawkins. Identi,fication of Outliers. Monographs on Applied Probability and Statistics. Chapman & Hall, May 1980.
f484] D. M. Hawkins. ‘ fOutl ier.. . . . . . . . .s] ‘ : Discussion. Technometrics,2b(2):Ib5_156, May 1983.
1485] S. Hawkins, H. He, G. J. Williams, and R. A. Baxter. Outlier Detection Using Repli- cator Neural Networks. rn DawaK 2000: Proc. of the lth Intnl. conf. on Data Ware- hous’ing and Knowledge Discouerg, pages 170 180. Springer-Yerlag,2OO2.
f486] V. J. Hodge and J. Austin. A Survey of Outlier Detection Methodologies. Arti,ficial Intelligence Reui,ew, 22:85 126, 2004.
 H. V. Jagadish, N. Koudas, and S. Muthukrishnan. Mining Deviants in a Time Series Database. In Proc. of the 25th VLDB Conf., pages 102-113, 1999.
f4881 N. Japkowicz, editor. Workshop on Learn’ing from Imbalanced Data Sets I, Seuenteenth National Conference on Artificial Intelligence, Published, as Techn’ical Report WS-00-05, 2000. AAAI Press.
 T. Johnson, I. Kwok, and R. T. Ng. Fast Computation of 2-Dimensional Depth Con- tours. In KDD98, pages 224-228, 1998.
f490] M. V. Joshi. On Evaluating Performance of Classifiers for Rare Classes. In Proc. of the 2002 IEEE IntI. Conf. on Data Mining, pages 641 044,2002.
 M. V. Joshi, R. C. Agarwal, and V. Kumar. Mining needle in a haystack: Classifying rare classes via two-phase rule induction. In Proc. of 2001 ACM-SIGM)D Intt. Conf. on Managernent of Data, pages 91-102. ACM Press, 2001.
1492] M’ V. Joshi, R. C. Agarwal, and V. Kumar. Predicting rare classes: can boosting make any weak learner strong? rn Proc. of 2002 ACM-SIGM)D IntI. Conf. on Management of Data, pages 297-306. ACM Press, 2002.
 M. V. Joshi, R. c. Agarwal, and v. Kumar. Predicting Rare classes: comparing Two-Phase Rule Induction to Cost-Sensitive Boosting. In Proc. of the 6th European conf. of Princi’ples and Practice of Knowledge Di,scoueryin Databases, pages 237 24g. Springer-Verl ag, 2002.
1494] M. V. Joshi, V. Kumar, and R. C. Agarwal. Evaluating Boosting Algorithms to Classify Rare Classes: Comparison and Improvements. In Proc. of the 2001 IEEE Intl. Conf. on Data M,ining, pages 257 264,200I.
 E. Keogh, S. Lonardi, and B. Chiu. Finding Surprising Patterns in a Time Series Database in Linear Time and Space. rn Proc. of the 8th IntI. conf. on Knowledge Discouery and Data Mining, Edmonton, Alberta, Canada, JuJy 2002.
 E. M. Knorr and R. T. Ng. A Unified Notion of Outliers: Properties and Computation. rn Proc. of the Srd IntI. conf . on Knowledge D’iscouerg and Data Mining, pages 2lg-222, 1997.
1497 E. M. Knorr and R. T. Ng. Algorithms for Mining Distance-Based Outliers in Large Datasets. In Proc. of the Zlth VLDB Conf., pages 392 403,24-27 1998.
 E. M. Knorr, R. T. Ng, and V. Tucakov. Distance-based outliers: algorithms and applications. The V L D B J ournal, 8(3-4):237 -253, 2000.