+1 (208) 254-6996 [email protected]
  

Figure 7.3. Example of a sequence database,

7.4 Sequential Patterns

Don't use plagiarized sources. Get Your Custom Essay on
Figure 7.3. Example of a sequence database,
Just from $13/Page
Order Essay

Market basket data often contains temporal information about when an item was purchased by customers. Such information can be used to piece together the sequence of transactions made by a customer over a certain period of time. Similarly, event-based data collected from scientific experiments or the mon- itoring of physical systems such as telecommunications networks, computer networks, and wireless sensor networks, have an inherent sequential nature to them. This means that an ordinal relation, usually based on temporal or spatial precedence, exists among events occurring in such data. However, the concepts of association patterns discussed so far emphasize only co-occurrence relationships and disregard the sequential information of the data. The latter information may be valuable for identifying recurring features of a dynamic system or predicting future occurrences of certain events. This section presents the basic concept of sequential patterns and the algorithms developed to dis- cover them.

7.4.L Problem Formulation

The input to the problem of discovering sequential patterns is a sequence data set, which is shown on the left-hand side of Figure 7.3. Each row records the occurrences of events associated with a particular object at a given time. For example, the first row contains the set of events occurring at timestamp t : 10

 

 

43O Chapter 7 Association Analysis: Advanced Concepts

for object A. By sorting all the events associated with object A in increasing order of their timestamps, a sequence for object A is obtained, as shown on

the right-hand side of Figure 7.3. Generally speaking, a sequence is an ordered list of elements. A sequence

can be denoted as s : (ep2es . . .en),, where each element e3 is a collection of one or more events, i.e., ej : {h,’i2, . . . ,26}. The following is a list of examples of sequences:

o Sequence of Web pages viewed by a Web site visitor:

( {Homepage} {Electronics} {Cameras and Camcorders} {Digital Cam- eras) {Shopping Cart} {Order Confirmation} {Return to Shoppi”e} )

o Sequence of events leading to the nuclear accident at Three-Mile Island:

( {clogged resin} {outlet valve closure} {loss of feedwater} {condenser polisher outlet valve shut) {booster pumps trip} {main waterpump trips}

{main turbine trips} {reactor pressure increases} )

o Sequence of classes taken by a computer science major:

( {Algorithms and Data Structures, Introduction to Operating Systems}

{Database Systems, Computer Architecture} {Computer Networks, Sofb- ware Engineering) {Computer Graphics, Parallel Programming} )

A sequence can be characterized by its length and the number of occur- ring events. The length of a sequence corresponds to the number of elements present in the sequence, while a k-sequence is a sequence that contains k events. The Web sequence in the previous example contains 7 elements and 7 eventsl the event sequence at Three-Mile Island contains 8 elements and 8 events; and the class sequence contains 4 elements and 8 events.

Figure 7.4 provides examples of sequences, elements, and events defined for a variety of application domains. Except for the last row, the ordinal attribute associated with each of the first three domains corresponds to calendar time. For the last row, the ordinal attribute corresponds to the location of the bases (A, C, G, T) in the gene sequence. Although the discussion on sequential patterns is primarily focused on temporal events, it can be extended to the case where the events have spatial ordering.

Subsequences

A sequence f is a subsequence of another sequence s if each ordered element in f is a subset of an ordered element in s. Formally, the sequence 1 : (tfi2 . . .t^)

 

 

Sequence Database

Sequence Element (Transaction)

Event (ltem)

Customer Purchase history of a given customer

A set of items bought by a customer at time t

Books, diary products, CDs. etc

Web Data Browsing activity of a particular Web visitor

The collection of files viewed by a Web visitor after a single mouse click

Home page, index page, contact info, etc

Event data History of events generated by a given sensor

Events triggered by a sensor at time t

Types of alarms generated by sensors

Genome sequences

DNA sequence of a particular species

An element of the DNA sequence

Bases A,T,G,C

7.4 Sequential Patterns 431

Ordinal Attribute

Figure 7.4. Examples of elements and events in sequence data sets,

is a subsequence of s : (“r”2.. .s,) i f there exist integers L a j t < jz <.. . a jrn 1n such that h e sj , tz e sj” , . . . , t^ e sj* . I f t is a subsequence of s, then we say that t is contained in s. The following table gives examples illustrating the idea of subsequences for various sequences.

Sequence, s Sequence, f Is f a subsequence of s? <12,413,5,6 8 2 | { 3 ,61 {8 } > Yes <12,413’5,6 8 2 [ . { 8 Yes < { 1 , 2 } 3,41 > t t 12 No <{2 ,412,412,5 2 1 1 4 Yes

7.4.2 Sequential Pattern Discovery

Let D be a data set that contains one or more data sequences. The term data sequence refers to an ordered list of events associated with a single data object. For example, the data set shown in Figure 7.3 contains three data sequences, one for each object A, B, and C.

The support of a sequence s is the fraction of all data sequences that contain s. If the support for s is greater than or equal to a user-specified

 

 

432 Chapter 7 Association Analysis: Advanced Concepts

Obiect Timestamp Events A 1 1 . 2 , 4 A 2 2 , 3 A 3 5 B 1 1 , 2 B 2 2 , 3 , 4 c 1 1 , 2 c 2 2 , 3 , 4 c 3 2 , 4 , 5 D 1 2 D 2 3 , 4 D 3 4 , 5 E 1 1 , 3 E 2 2 , 4 , 5

Minsup = 50o/o

Examples of Sequential Patterns:

<{1,2}> s=607o <{2,3}> s=607o <{2,4\> s=807o <{3}{5}> s=807″ <{1}{2}> s=807″ <{2){2}> s=607″ <{1i{2,3}> s=607o <{2) {2,3)> s=607o <{1,2} {2,3}> s=607o

Figure 7.5. Sequential pafterns derived from a data set that contains five data sequences.

threshold m’insup, then s is declared to be a sequential pattern (or frequent

sequence).

Order your essay today and save 10% with the discount code ESSAYHELP