+1 (208) 254-6996 [email protected]
Select Page

A rule-based classifier is a technique for classifying records using a collection of “if . . .then. . .” rules. Table 5.1 shows an example of a model generated by a rule-based classifier for the vertebrate classification problem. The rules for the model are represented in a disjunctive normal form, .R : (r1Yr2V…rn), where R is known as the rule set and r;’s are the classification rules or disjuncts.

Table 5.1. Example of a rule set for the vertebrate classification problem.

Don't use plagiarized sources. Get Your Custom Essay on
A rule-based classifier is a technique for classifying records using a collection of “if . . .then
Just from \$13/Page

rri (Gives Birth : no) n (Aerial Creature : yes) ——+ Birds r2i (Gives Birth : no) n (Aquatic Creature : yes) ——+ Fishes 13: (Gives Birth : yes) n (Body Temperature : warm-blooded) ——+ Mammals 14: (Gives Birth : no) n (Aerial Creature : no) —– Reptiles rbi (Aquatic Creature : semi) ——+ Amphibians

208 Chapter 5 Classification: Alternative Techniques

Each classification rule can be expressed in the following way:

ri: (Condi,ti,ont) – gr. (5 .1 )

The left-hand side of the rule is called the rule antecedent or precondition.

It contains a conjunction of attribute tests:

Condi,tion,i : (At op ur) A (Az op u2) A . . .(An oP u*), (5 .2)

where (Ai,ui) is an attribute-value pair and op is a logical operator chosen from the set {:, 1,1,},<,>}. Each attribute test (Ai op u7) is known as a conjunct. The right-hand side of the rule is called the rule consequent, which contains the predicted class g;.

A rule r covers a record r if the precondition of r matches the attributes of r. r is also said to be fired or triggered whenever it covers a given record.

For an illustration, consider the rule 11 given in Table 5.1 and the following attributes for two vertebrates: hawk and grizzly bear.

Name Body Temperature

SK1n

Cover Glves

Birth Aquatic Creature

Ae r1aI Creature

Has Legs

Hiber- nates

hawk qrizzly bear

warm-blooded warm-blooded

feather fur

no yes

no no

yes no

yes yes

no ves

11 covers the first vertebrate because its precondition is satisfied by the hawk’s attributes. The rule does not cover the second vertebrate because grizzly bears give birth to their young and cannot fly, thus violating the precondition of 11.

The quality of a classification rule can be evaluated using measures such as coverage and accuracy. Given a data set D and a classification rule r : A -+ At the coverage of the rule is defined as the fraction of records in D that trigger the rule r. On the other hand, its accuracy or confidence factor is defined as the fraction of records triggered by r whose class labels are equal to gr. The formal definitions of these measures are

Coverage(r ) !!l tD l

. lAna lAccuracy(r) : T

(5 .3)

where lAl is the number of records that satisfy the rule antecedent, lA n gl is the number of records that satisfy both the antecedent and consequent, and lDl is the total number of records.

5.1 Rule-Based Classifier 2Og

Table 5,2. The vertebrate data set.

Example 5.1. Consider the data set shown in Table 5.2. The rule

(G:.ves Birth : yes) A (Body Temperature : warn-blooded) ——+ Mammals

has a coverage of 33% since five of the fifteen records support the rule an- tecedent. The rule accuracy is 100% because all five vertebrates covered by the rule are mammals. r

5.1.1 How a Rule-Based Classifier Works

A rule-based classifier classifies a test record based on the rule triggered by the record. To illustrate how a rule-based classifier works, consider the rule set shown in Table 5.1 and the followins vertebrates:

o The first vertebrate, which is a lemur, is warm-blooded and gives birth to its young. It triggers the rule 13, and thus, is classified as a mammal.

l\ame lJody

Temperature

Skin Cover

Gives Birth

Aquatic Creature

Aerial Creature

-nas Legs

lllDer-

nates

C’lass Label

numan python salmon whale froo

komodo dragon bat

guppv alligator penguln porcuplne eel salamander

prgeon cat

warm-blooded cold-blooded cold-blooded

warm-blooded cold-blooded cold-blooded

warm-blooded warm-blooded warm-blooded cold-blooded cold-blooded

warm-blooded warm-blooded cold-blooded cold-blooded

scales

hair feathers

fur scales scales

feathers quills scales none

nalr scales scales hair none

yes no no yes no no

yes no yes yes no no yes no no

no no yes yes

semr no

no no no yes

seml semr no yes

semr

no no no no no no

yes yes no no no no no no no

yes no no no yes yes

yes yes yes no yes yes yes no yes

no yes no no yes no

yes no no no no no yes

no yes

IVlammals Reptiles Fishes

Mammals Amphibians

Reptiles

Mammals Birds

Mammals Fishes

Reptiles Birds

Mammals Fishes

Amphibians

Name lJody

Temperature

n Cover Skt Glves

Birth Aquatic Creature

Aerial Creature

Has Legs

Hiber- nates

lemur turtle dogfish shark

warm-blooded cold-blooded cold-blooded

fur scales scales

yes no yes

no semr yes

no no no

yes yes no

yes no no

zLO Chapter 5 Classification: Alternative Techniques

o The second vertebrate, which is a turtle, triggers the rules 14 and rs. Since the classes predicted by the rules are contradictory (reptiles versus amphibians), their conflicting classes must be resolved.

o None of the rules are applicable to a dogfish shark. In this case, we need to ensure that the classifier can still make a reliable prediction even though a test record is not covered by any rule.

The previous example illustrates two important properties of the rule set gen-

erated by a rule-based classifier.

Mutually Exclusive Rules The rules in a rule set .R are mutually exclusive if no two rules in .R are triggered by the same record. This property ensures that every record is covered by at most one rule in R. An example of a mutually exclusive rule set is shown in Table 5.3.

Exhaustive Rules A rule set -R has exhaustive coverage if there is a rule for each combination of attribute values. This property ensures that every record is covered by at least one rule in -R. Assuming that Body Tenperature and Gives Birth are binarv variables. the rule set shown in Table 5.3 has exhaustive coverage.

Table 5.3. Example of a mutually exclusive and exhaustive rule set.

11: (Body Temperature : cold-blooded) —- Non-mammals 12: (Body Temperature : warm-blooded) A (Gives Birth : yes) .* Mammals 13: (Body Temperature : warm-blooded) A (Gives Birth : no) –‘ Non-mammals

Together, these properties ensure that every record is covered by exactly one rule. Unfortunately, many rule-based classifiers, including the one shown in Table 5.1, do not have such properties. If the rule set is not exhaustive, then a default rt7e, r4: 0 .—-+ !4, must be added to cover the remaining cases. A default rule has an empty antecedent and is triggered when all other rules have failed. gr4 is known as the default class and is typically assigned to the majority class of training records not covered by the existing rules.

If the rule set is not mutually exclusive, then a record can be covered by several rules, some of which may predict conflicting classes. There are two ways to overcome this problem.

5 .1 Rule-BasedClassifier 2LI

Ordered Rules In this approach, the rules in a rule set are ordered in decreasing order of their priority, which can be defined in many ways (e.g., based on accuracy, coverage, total description length, or the order in which the rules are generated). An ordered rule set is also known as a decision list. When a test record is presented, it is classified by the highest-ranked rule that covers the record. This avoids the problem of having conflicting classes predicted by multiple classification rules.

IJnordered Rules This approach allows a test record to trigger multiple classification rules and considers the consequent of each rule as a vote for a particular class. The votes are then tallied to determine the class label of the test record. The record is usually assigned to the class that receives the highest number of votes. In some cases, the vote may be weighted by the rule’s accuracy. Using unordered rules to build a rule-based classifier has both advantages and disadvantages. Unordered rules are less susceptible to errors caused by the wrong rule being selected to classify a test record (unlike classifiers based on ordered rules, which are sensitive to the choice of rule- ordering criteria). Model building is also less expensive because the rules do not have to be kept in sorted order. Nevertheless, classifying a test record can be quite an expensive task because the attributes of the test record must be compared against the precondition of every rule in the rule set.

In the remainder of this section, we will focus on rule-based classifiers that use ordered rules.

5.L.2 Rule-Ordering Schemes

Order your essay today and save 10% with the discount code ESSAYHELP