In this section we address the issue of describing data by considering what types of attributes are used to describe data objects. We first define an attribute, then consider what we mean by the type of an attribute, and finally describe the types of attributes that are commonly encountered.

What Is an Attribute?

We start with a more detailed definition of an attribute.

Definition 2.1. An attribute is a property or characteristic of an object that may vary, either from one object to another or from one time to another.

For example, eye color varies from person to person, while the temperature of an object varies over time. Note that eye color is a symbolic attribute with a small number of possible values {brown, black, blue, green, hazel, etc.}, while temperature is a numerical attribute with a potentially unlimited number of values.

At the most basic level, attributes are not about numbers or symbols. However, to discuss and more precisely analyze the characteristics of objects, we assign numbers or symbols to them. To do this in a well-defined way, we need a measurement scale.

Definition 2.2. A measurement scale is a rule (function) that associates a numerical or symbolic value with an attribute of an object.

Formally, the process of measurement is the application of a measurement scale to associate a value with a particular attribute of a specific object. While this may seem a bit abstract, we engage in the process of measurement all the time. For instance, we step on a bathroom scale to determine our weight, we classify someone as male or female, or we count the number of chairs in a room to see if there will be enough to seat all the people coming to a meeting. In all these cases, the “physical value” of an attribute of an object is mapped to a numerical or symbolic value.
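Definition 2.2 treats a measurement scale as a function. As a minimal sketch (the `Room` class and both functions are hypothetical, introduced only for illustration), counting chairs can be written as a function that maps an attribute of an object to a number; the practical question is then answered from the measured value rather than from the object itself.

```python
from dataclasses import dataclass


@dataclass
class Room:
    """Hypothetical object: a meeting room holding some chairs."""
    name: str
    chairs: list[str]  # one entry per physical chair (e.g., an asset tag)


def chair_count(room: Room) -> int:
    """Measurement scale: maps the 'number of chairs' attribute of a room to an integer."""
    return len(room.chairs)


def enough_seats(room: Room, attendees: int) -> bool:
    """Decides using the measured value, not the physical room."""
    return chair_count(room) >= attendees


room = Room("B12", ["c1", "c2", "c3", "c4"])
print(chair_count(room))      # 4
print(enough_seats(room, 6))  # False
```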

With this background, we can now discuss the type of an attribute, a concept that is important in determining if a particular data analysis technique is consistent with a specific type of attribute.

The Type of an Attribute

It should be apparent from the previous discussion that the properties of an attribute need not be the same as the properties of the values used to measure it. In other words, the values used to represent an attribute may have properties that are not properties of the attribute itself, and vice versa. This is illustrated with two examples.

24 Chapter 2 Data

Example 2.3 (Employee Age and ID Number). Two attributes that might be associated with an employee are ID and age (in years). Both of these attributes can be represented as integers. However, while it is reasonable to talk about the average age of employees, it makes no sense to talk about the average employee ID. Indeed, the only aspect of employees that we want to capture with the ID attribute is that they are distinct. Consequently, the only valid operation for employee IDs is to test whether they are equal. There is no hint of this limitation, however, when integers are used to represent the employee ID attribute. For the age attribute, the properties of the integers used to represent age are very much the properties of the attribute. Even so, the correspondence is not complete since, for example, ages have a maximum, while integers do not.
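One way to make the restriction in Example 2.3 explicit is to give IDs a type that supports only equality. The sketch below uses Python's standard `dataclasses` and `statistics` modules; the `EmployeeID` wrapper and the sample values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class EmployeeID:
    """Hypothetical wrapper so that only equality testing is available.

    A frozen dataclass generates __eq__ and __hash__ but no arithmetic or
    ordering methods, so id_a + id_b or id_a < id_b raises TypeError
    instead of silently producing a meaningless number.
    """
    value: int


ages = [34, 29, 51]  # integers that represent age: averaging is meaningful
ids = [EmployeeID(1047), EmployeeID(2210), EmployeeID(1047)]

print(mean(ages))          # 38
print(ids[0] == ids[2])    # True: equality is the only meaningful question
try:
    ids[0] + ids[1]        # an "average ID" would need addition, which is refused
except TypeError as err:
    print("no arithmetic on IDs:", err)
```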

Example 2.4 (Length of Line Segments). Consider Figure 2.1, which shows some objects (line segments) and how the length attribute of these objects can be mapped to numbers in two different ways. Each successive line segment, going from the top to the bottom, is formed by appending one more copy of the topmost line segment. Thus, the second line segment from the top consists of two copies of the topmost line segment, the third consists of three copies, and so forth. In a very real (physical) sense, all the line segments are multiples of the first. This fact is captured by the measurements on the right-hand side of the figure, but not by those on the left-hand side. More specifically, the measurement scale on the left-hand side captures only the ordering of the length attribute, while the scale on the right-hand side captures both the ordering and additivity properties. Thus, an attribute can be measured in a way that does not capture all the properties of the attribute.
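The difference between the two scales in Example 2.4 can be checked mechanically. In this sketch each segment is represented by its length in units of the topmost segment, and the `order_only` values are made up for illustration (they are not read from Figure 2.1). Any strictly increasing assignment preserves order, but only one proportional to length also preserves additivity.

```python
from itertools import combinations_with_replacement

SEGMENTS = [1, 2, 3, 4, 5]  # each segment's length in units of the topmost segment

# Two hypothetical measurement scales; the numbers are illustrative only.
order_only = {1: 1, 2: 3, 3: 7, 4: 8, 5: 10}  # strictly increasing, nothing more
additive = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}     # proportional to physical length


def preserves_order(scale: dict[int, int]) -> bool:
    """True if longer segments always receive larger numbers."""
    return all(scale[a] < scale[b] for a, b in zip(SEGMENTS, SEGMENTS[1:]))


def preserves_additivity(scale: dict[int, int]) -> bool:
    """True if joining two segments end to end adds their measured values."""
    return all(
        scale[a] + scale[b] == scale[a + b]
        for a, b in combinations_with_replacement(SEGMENTS, 2)
        if a + b in scale
    )


for name, scale in [("order_only", order_only), ("additive", additive)]:
    print(name, preserves_order(scale), preserves_additivity(scale))
# order_only True False
# additive True True
```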

The type of an attribute should tell us what properties of the attribute are reflected in the values used to measure it. Knowing the type of an attribute is important because it tells us which properties of the measured values are consistent with the underlying properties of the attribute, and therefore, it allows us to avoid foolish actions, such as computing the average employee ID. Note that it is common to refer to the type of an attribute as the type of a measurement scale.

2.1 Types of Data 25

Figure 2.1. The measurement of the length of line segments on two different scales of measurement. (Left: a mapping of lengths to numbers that captures only the order properties of length. Right: a mapping of lengths to numbers that captures both the order and additivity properties of length.)

The Different Types of Attributes

A useful (and simple) way to specify the type of an attribute is to identify the properties of numbers that correspond to underlying properties of the attribute. For example, an attribute such as length has many of the properties of numbers. It makes sense to compare and order objects by length, as well as to talk about the differences and ratios of length. The following properties (operations) of numbers are typically used to describe attributes.

1. Distinctness: = and ≠
2. Order: <, ≤, >, and ≥
3. Addition: + and −
4. Multiplication: × and /

Given these properties, we can define four types of attributes: nominal, ordinal, interval, and ratio. Table 2.2 gives the definitions of these types, along with information about the statistical operations that are valid for each type. Each attribute type possesses all of the properties and operations of the attribute types above it. Consequently, any property or operation that is valid for nominal, ordinal, and interval attributes is also valid for ratio attributes. In other words, the definition of the attribute types is cumulative. However, this does not mean that the operations appropriate for one attribute type are appropriate for the attribute types above it.

Nominal and ordinal attributes are collectively referred to as categorical or qualitative attributes. As the name suggests, qualitative attributes, such as employee ID, lack most of the properties of numbers. Even if they are represented by numbers, i.e., integers, they should be treated more like symbols. The remaining two types of attributes, interval and ratio, are collectively referred to as quantitative or numeric attributes. Quantitative attributes are represented by numbers and have most of the properties of numbers. Note that quantitative attributes can be integer-valued or continuous.
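The cumulative structure of the four types can be sketched as an ordered enumeration. The pairing of each type with the one property it adds (nominal: distinctness, ordinal: order, interval: addition, ratio: multiplication) is an assumption based on Stevens' scheme and the order of the properties listed earlier; Table 2.2 is the authoritative statement. The names `AttributeType`, `ADDED_PROPERTY`, `properties`, and `is_categorical` are hypothetical.

```python
from enum import IntEnum


class AttributeType(IntEnum):
    """The four attribute types, ordered so that later types inherit earlier properties."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Assumed pairing: the property each type adds on top of the types before it.
ADDED_PROPERTY = {
    AttributeType.NOMINAL: "distinctness",    # = and ≠
    AttributeType.ORDINAL: "order",           # <, ≤, >, ≥
    AttributeType.INTERVAL: "addition",       # + and −
    AttributeType.RATIO: "multiplication",    # × and /
}


def properties(t: AttributeType) -> set[str]:
    """All properties of type t: its own plus those of every type before it (cumulative)."""
    return {ADDED_PROPERTY[u] for u in AttributeType if u <= t}


def is_categorical(t: AttributeType) -> bool:
    """Nominal and ordinal are categorical (qualitative); interval and ratio are quantitative."""
    return t <= AttributeType.ORDINAL


print(sorted(properties(AttributeType.ORDINAL)))  # ['distinctness', 'order']
print(is_categorical(AttributeType.INTERVAL))     # False
```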

The types of attributes can also be described in terms of transformations that do not change the meaning of an attribute. Indeed, S. Smith Stevens, the psychologist who originally defined the types of attributes shown in Table 2.2, defined them in terms of these permissible transformations. For example,

Table 2.2. Different attribute types.