4.L.2 Vector Addition and Multiplication b;y a Scalar

Various operations can be performed on vectors. (In what follows, we assume that the vectors are all from the same space space, i.e., have the same di- mensionality.) For instance, vectors can be added and subtracted. This is

686 Appendix A Linear Algebra

(a) Two vectors. (b) Their difference. (c) Their sum.

Figure A.1. Two vectors and their sum and difference.

best illustrated graphically, and vector subtraction and addition are shown in Figures A.1(b) and A.1(c), respectively. Like the addition of numbers, vector addition has some familiar properties. If u, v, and w are three vectors, then these properties can be described as follows:

o Commutativity of vector addition. The order of addition does not m a t t e r . u * v : v + u .

o Associativity of vector addition. The grouping of vectors during add i t ion does no t mat te r . (u+v) *w : u f (v+* )

o Existence of an identity element for vector addition. There exists a zero vector, simply denoted as 0, which is the identitv element. For a n y v e c t o r u , u + O : u .

o Existence of additive inverses for vector addition. For every vec- tor u, there is an inverse vector -u such that u * (-u) : g.

Another important operation is the multiplication of a vector by a number, which, in the context of linear algebra, is typically called a scalar. Scalar multiplication changes the magnitude of the vector; the direction is unchanged if the scalar is positive and is reversed if the scalar is negative. If u and v are vectors and a and B are scalars (numbers), then the properties of the scalar multiplication of vectors can be described as follows:

,\.1 Vectors 687

Associativity of scalar rnultiplication. The ortler of multiplication by two scalars does not matter. a(/u) : (aB)u.

Distributivity of scalar addition over multiplication of a scalar by a vector. Adding two scalars and then multiplying the resulting sum by a vector is the same as multiplying each scalar by the vector and then adding the two resultant vectors. (a + B)u : ztt -l 0,,r .

Distributivity of scalar multiplication over vet:tor addition. Add- ing two vectors and then multiplying the sum by a scalar is the same as multiplying each vector by the scalar and then adding. a(u + v) :

au + 0v.

Existence of scalar identity. If a : 1, then for arry vector u, atl : u.

A.1.3 Vector Spaces

A vector space is a set of vectors, along with an associated set of scalars (e.g., the real numbers) that satisfies the properties given above and that is

closed under vector addition and multiplication by a scillar. (BV closed, we mean that every result of vector addition and/or scalar rnultiplication results in a vector in the original set.) Vector spaces have the property that any vector can be represented as a linear combination of a small set of vectors, which are known as a basis. More specifically, if u1 ,…,un are the basis vectors, then we can find a set of n scalars {*r,…,an} for any vector v, so

that v : lf,-ra,;q. We say that the basis vectors span the vector space. The dimension of a vector space is the minimum numbr:r of vectors that are necessary to form a basis. Typically, the basis vectors at’e taken to have unit Iength.

The basis vectors are usually orthogonal. The orthogonality of vectors is

an extension of the two-dimensional notion of perpendicrrlar lines and will be

defined more precisely later on. Conceptually orthogonal vectors are unrelated or independent. If basis vectors are mutually orthogonill, then expressing a

vector as a linear combination of basis vectors effectively decomposes the vector

into a number of independent components. Thus, a vector in an n-dimensional space can be considered to be an n-

tuple of scalars (numbers). To provide a concrete illustration, consider two-

dimensional Euclidean space, where each point is associated with a vector that represents the displacement of the point from the origin. The displacement

vector to any point can be written as the sum of a displacement in the r

688 Appendix A Linear Algebra

direction and a displacement in the gr direction, which are, respectively, the r and g coordinates of the point.

We will refer to the components of a vector v by using the notation v : (u t , rz , . . . ,un- r ,un) . (Wi th re fe rence to the equat ion , v : DLr a i r t i , n , i : ai.) Note that ui is a component of v, while vi is one of a set of vectors.

With a component view of vectors, the addition of vectors becomes simple to understand; to add two vectors, we simply add corresponding components. For example, (2,3) + (4,2) : (6,5). To multiply a vector by a scalar, we multiply each component by the scalar, e.g., 3 * (2,3) : (6,9).

A.1.4 The Dot Product, Orthogonality, and Orthogonal Projections

We now define what it means for two vectors to be orthogonal. For simplicity, we restrict ourselves to Euclidean vector spaces, although the definitions and results are easily generalized. We begin by defining the dot product of two vectors.

Definition A.1 (Dot Product). The dot product u .v of two vectors, u and v, is given by the following equation:

rL \-

l l ‘ V : ) _ r U l u i . t–1

In words, the dot product of two vectors is computed responding components of a vector and then adding the For ins tance, (2 ,3 ) – (4 ,7 ) : 2 * 4* 3 * 1 : 11 .

In Euclidean space it can be shown that the dot product of two (non-zero) vectors is 0 if and only if they are perpendicular. Geometrically, two vectors define a plane, and their dot product is 0 if and only if the angle (in the plane) between the two vectors is 90o. We say that such vectors are orthogonal.

The dot product can also be used to compute the length of a vector in Euclidean space, namely, length(u) : Vffi. The length of a vector is also known as its 12 norm and is written as llull. Given a vector u, we can find a vector that is pointing in the same direction as u, but is of unit length, by dividing each component of u by its length; i.e., by computing r/llrl l. We say that we have normalized the vector to have an L2 norm of 1.

Given the notation for the norm of a vector, the dot product of a vector can be written as

(A.1)

by multiplying cor- resulting products.

u.v : l l r . l l l l ‘ l l cos(O) , (A.2)

A.1 Vectors 689

v r4

Figure A.2, Orthogonal proiection of vector v in the direction of vector u.

where 0 is the angle between the two vectors. By grouping terms and reorder-

ing, this can be rewritten as

u.v : ( l l v l l cos(d) ) l l ” l l : v ” l lu l l , (A.3)

where v, : llvll cos(d) represents the Iength of v in the direction of u as

illustrated in Figure A.2. If u is a unit vector, then the dot product is the

component of v in the direction of u. We refer to this as the orthogonal projection of v onto u. Of course, it is also true that if v is a unit vector,

then the dot product is the projection of u in the direction of v. An important consequence of this is that, given a set of orthogonal vectors

of norm 1 that form a basis of a vector space, we can find the components of

any vector with respect to that basis by taking the dot product of the vector

with each basis vector. A concept that is closely related to that of orthogonality is the notion of

linear independence.

Definition A.2 (Linear Independence). A set of vectors is linearly inde- pendent if no vector in the set can be written as a linear combination of the

other vectors in another set.

If a set of vectors is not linearly independent, then they are linearly de- pendent. Note that we want our basis to consist of a set of vectors such that

no vector is linearly dependent with respect to the remaining basis vectors,

because if this were so, then we could eliminate that vector and still have a

690 Appendix A Linear Algebra

set of vectors that span the entire vector space. If we choose our basis vectors to be mutually orthogonal (independent), then we automatically obtain a lin- early independent set since any two vectors that are orthogonal are linearly independent.

A.1.5 Vectors and Data Analysis

Although vectors were originally introduced to deal with quantities such as force, velocity, and acceleration, they have proven useful for representing and understanding many other kinds of data. In particular, we can often regard a data object or an attribute as a vector. For example, Chapter 2 described a data set that consisted of 150 Iris flowers that were characterized by four attributes: sepal length, sepal width, petal length, and petal width. Each flower can be regarded as a four dimensional vector, and each attribute can be regarded as a 150 dimensional vector. As another example, a document can be represented as a vector, where each component corresponds to a term (word) and the value of each component is the number of times the term appears in the document. This yields a very sparse, high-dimensional vector, where by sparse, we mean that most of the entries of the vector are 0.

Once we have represented our data objects as vectors, we can perform var- ious operations on the data that derive from a vector viewpoint. For example, using various vector operations, we can compute the similarity or distance of two vectors. In particular, the cosine similarity of two vectors is defined as

/ \ u v C O S ( t l . \ / / : –

‘ ,’

l l , ‘ l l l lu l l ‘ (A.4)

This similarity measure does not take into account the magnitude (length) of the vectors, but is only concerned with the degree to which two vectors point in the same direction. In terms of documents, this means that two documents are the same if they contain the same terms in the same proportion. Terms that do not appear in both documents play no role in computing similarity.

We can also simply define the distance between two vectors (points). If u and v are vectors, then the Euclidean distance between the two vectors (points) is simply

di ,st(r ,v) : ( t – . r ) . ( t – . r ) . (A.5)

This type of measure is more appropriate for the Iris data, since the magnitude of the various components of the vectors does make a difference in whether they are considered to be similar.

4.2 Matrices 691

Also, for vector data, it is meaningful to compute the mean of the set of vectors, which is accomplished by computing the mean of each component. Indeed, some clustering approaches, such as K-means (Chapter 8) work by dividing the data objects into groups (clusters) and characterizing each cluster by the mean of the data objects (data vectors). The idea is that a good cluster is one in which the data objects in the cluster are close to the mean, where closeness is measured by Euclidean distance for data like the Iris data and by cosine similarity for data like document data.

Other common operations that are performed on data can also be thought of as operations on vectors. Consider dimensionality reduction. In the sim- plest approach, some of the components of the data vector are eliminated, while leaving the others unchanged. Other dimensionality reduction techniques pro-

duce a new set of components (attributes) for the data vector that are linear combinations of the previous components. Still other methods change the vec- tors in more complicated ways. Dimensionality reduction is discussed further in Appendix B.

For certain areas ofdata analysis, such as statistics, the analysis techniques

are expressed mathematically in terms of operations on data vectors and the data matrices that contain these data vectors. Thus, a vector representation brings with it powerful mathematical tools that can be used to represent, transform, and analyze the data.

In the remainder of this appendix, we will complete the story, by discussing matrices.

4.2 Matrices

4.2.L Matrices: Definitions

A matrix is a tabular representation of a set of numbers as a collection of rows and columns. We will use uppercase bold letters, such as A, to represent matrices. (Uppercase italic letters, such as A, are also used.) The term “mby n matrix” is commonly used to refer to a matrix with rn rows and n columns. For example, the matrix A, shown below, is a 2 by 3 matrix. If m : ??, we say that the matrix is a square rnatrix. The transpose of A is written as A” and is produced by interchanging the rows and columns of A.

“:l? g N A 7 : 12 71

Ll ,)

692 Appendix A Linear Algebra

The matrix entries are represented by subscripted, lowercase letters. For matrix A, for example, a,ii rs the entry in the ith row and jth column. Rows are numbered from top to bottom and columns from left to right. As a specific illustration, &2r :7 is the entry in the second row and first column of A.

Each row or column of a matrix defines a vector. For a matrix A, the zrh row vector can be represented using the notation aia and the jth column vector using the notation a*3. Using the previous example, a.2*: 17 5 2], while a*3 : II 2]7. Notice that row and column vectors are matrices and must be distinguished; i.e., a row vector and column vector that have the same number of entries and the same values reoresent different matrices.

4.2.2 Matrices: Addition and Multiplication by a Scalar

Like vectors, matrices can be added by adding their corresponding entries (components). (Here we are assuming that the matrices have the same number of rows and columsn.) More specifically, if A and B are two matrices having dimensions m by n, then the sum of A and B is defined as follows:

Definition A.3 (Matrix Addition). The sum of two m by n matrices, A and B, is an m by n matrix C, whose entries are given by the following equation: