Need a discussion post with one response for the following prompt:

Kirk (2016) tells us that all requirements and restrictions of a project must be identified. Select 1 key factor below and discuss why Kirk (2016) states it will impact your critical thinking and shape your ambitions:

People: stakeholders, audience.

Constraints: pressures, rules.

Consumption: frequency, setting.

Deliverables: quantity, format.

Resources: skills, technology.

Please make sure you have an initial post (about 200 words) and a comment/post to one of your friends’ posts.

Answer the following questions: (10 points each)

1. The following table summarizes a data set with three attributes A, B, C and two class labels +, –. Build a two-level decision tree.

A | B | C | Number of Instances (–) | Number of Instances (+) |

T | T | T | 5 | 0 |

F | T | T | 0 | 10 |

T | F | T | 10 | 0 |

F | F | T | 0 | 5 |

T | T | F | 0 | 10 |

F | T | F | 25 | 0 |

T | F | F | 10 | 0 |

F | F | F | 0 | 25 |

a. According to the classification error rate, which attribute would be chosen as the first splitting attribute? For each attribute, show the contingency table and the gains in classification error rate.

b. Repeat for the two children of the root node.

c. How many instances are misclassified by the resulting decision tree?

d. Repeat parts (a), (b), and (c) using C as the splitting attribute.

e. Use the results in parts (c) and (d) to conclude about the greedy nature of the decision tree induction algorithm.
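One way to check the arithmetic in parts (a) through (d) is a short script. The sketch below assumes Python, encodes the eight rows of the table as tuples, and takes the first count column as the – class and the second as the + class, following the header as printed. The names ROWS, misclassified, split_errors, and error_gain are illustrative and do not come from the assignment.

```python
# Each row: (A, B, C, count_minus, count_plus), copied from the table in question 1.
ROWS = [
    ("T", "T", "T", 5, 0),
    ("F", "T", "T", 0, 10),
    ("T", "F", "T", 10, 0),
    ("F", "F", "T", 0, 5),
    ("T", "T", "F", 0, 10),
    ("F", "T", "F", 25, 0),
    ("T", "F", "F", 10, 0),
    ("F", "F", "F", 0, 25),
]


def misclassified(rows):
    """Errors made when every instance in `rows` gets the majority class."""
    minus = sum(r[3] for r in rows)
    plus = sum(r[4] for r in rows)
    return min(minus, plus)


def split_errors(rows, attr_index):
    """Total errors after splitting on one attribute (0 = A, 1 = B, 2 = C)."""
    values = sorted({r[attr_index] for r in rows})
    return sum(
        misclassified([r for r in rows if r[attr_index] == v]) for v in values
    )


def error_gain(rows, attr_index):
    """Drop in classification error rate produced by the split."""
    total = sum(r[3] + r[4] for r in rows)
    return (misclassified(rows) - split_errors(rows, attr_index)) / total


gains = {name: error_gain(ROWS, i) for i, name in enumerate("ABC")}
print(gains)
```

Calling split_errors on a filtered subset of ROWS (for example, only the rows with A = T) repeats the same calculation for a child node, which is what part (b) asks for.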

2. Classify the following attributes as binary, discrete, or continuous. Also classify them as qualitative (nominal or ordinal) or quantitative (interval or ratio). Some cases may have more than one interpretation, so briefly indicate your reasoning if you think there may be some ambiguity. Example: Age in years. Answer: Discrete, quantitative, ratio

a. Gender in terms of M or F.

b. Temperature as measured by people’s judgments.

c. Height as measured by people’s height.

d. Body Mass Index (BMI) as an index of weight-for-height that is commonly used to classify underweight, overweight and obesity in adults.

e. States of matter are solid, liquid, and gas.

3. For the following vectors, x and y, calculate the indicated similarity or distance measures.

a. x = (1, 0, 0, 1), y = (2, 1, 1, 2): cosine, correlation, Euclidean

b. x = (1, 1, 0, 0), y = (1, 1, 1, 0): cosine, correlation, Euclidean, Jaccard
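The four measures named in question 3 can be sketched in plain Python as follows. This is a minimal sketch under the usual textbook definitions: correlation is taken as the Pearson coefficient (the cosine of the mean-centered vectors), and Jaccard is applied to binary vectors as the number of positions where both are 1 divided by the number where at least one is 1. The function names are illustrative.

```python
import math


def cosine(x, y):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def correlation(x, y):
    """Pearson correlation, computed as the cosine of the centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


def euclidean(x, y):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def jaccard(x, y):
    """Jaccard coefficient for binary (0/1) vectors."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both / either


pairs = [((1, 0, 0, 1), (2, 1, 1, 2)), ((1, 1, 0, 0), (1, 1, 1, 0))]
for x, y in pairs:
    print(x, y, cosine(x, y), correlation(x, y), euclidean(x, y))
print("Jaccard for the second pair:", jaccard(*pairs[1]))
```

Jaccard is only called on the second pair because the first pair contains the value 2 and is therefore not a binary vector.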

4. Construct a data cube from Fact Table. Is this a dense or sparse data cube? If it is sparse, identify the cells that are empty. The data cube is shown in Data Cube Table.

Fact Table

Product ID | Location ID | Number Sold |

1 | 1 | 10 |

1 | 3 | 6 |

2 | 1 | 5 |

2 | 2 | 22 |

3 | 2 | 2 |

Data Cube Table (cell values are Number Sold)

Product ID | Location ID 1 | Location ID 2 | Location ID 3 | Total |

1 | 10 | 0 | 6 | 16 |

2 | 5 | 22 | 0 | 27 |

3 | 0 | 2 | 0 | 2 |

Total | 15 | 24 | 6 | 45 |
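The construction in question 4 can be sketched in Python as below. The fact table rows used here, (1, 1, 10), (1, 3, 6), (2, 1, 5), (2, 2, 22), and (3, 2, 2), are an assumption about how the flattened digits decode; they agree with the totals 15, 24, 6, and 45 in the Data Cube Table. FACT_ROWS and build_cube are illustrative names.

```python
from collections import defaultdict

# (product_id, location_id, number_sold); assumed decoding of the fact table.
FACT_ROWS = [(1, 1, 10), (1, 3, 6), (2, 1, 5), (2, 2, 22), (3, 2, 2)]


def build_cube(fact_rows):
    """Return the product-by-location cube and the list of empty cells."""
    products = sorted({p for p, _, _ in fact_rows})
    locations = sorted({loc for _, loc, _ in fact_rows})

    # Sum the number sold for every (product, location) pair that occurs.
    sold = defaultdict(int)
    for p, loc, n in fact_rows:
        sold[(p, loc)] += n

    # Fill every combination; pairs absent from the fact table get 0.
    cube = {(p, loc): sold.get((p, loc), 0) for p in products for loc in locations}
    present = {(p, loc) for p, loc, _ in fact_rows}
    empty = [cell for cell in cube if cell not in present]
    return cube, empty


cube, empty = build_cube(FACT_ROWS)
print("grand total:", sum(cube.values()))
print("empty cells (product, location):", empty)
```

A cube counts as sparse when a noticeable share of its cells have no backing row in the fact table, so the length of the empty list relative to the number of cells is the quantity to look at.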

5. Consider the decision tree shown below:

a. Compute the generalization error rate of the tree using the optimistic approach.

b. Compute the generalization error rate of the tree using the pessimistic approach. (For simplicity, use the strategy of adding a factor of 0.5 to each leaf node.)

c. Compute the generalization error rate of the tree using the validation set shown below. This approach is known as reduced error pruning.

[Figure: decision tree with internal nodes A, B, and C; branches labeled 0 and 1; leaves labeled + and –.]

Training:

Instance | A | B | C | Class |

1 | 0 | 0 | 0 | + |

2 | 0 | 0 | 1 | + |

3 | 0 | 1 | 0 | + |

4 | 0 | 1 | 1 | – |

5 | 1 | 0 | 0 | + |

6 | 1 | 0 | 0 | + |

7 | 1 | 1 | 0 | – |

8 | 1 | 0 | 1 | + |

9 | 1 | 1 | 0 | + |

10 | 1 | 1 | 0 | + |

Validation:

Instance | A | B | C | Class |

11 | 0 | 0 | 0 | + |

12 | 0 | 1 | 1 | + |

13 | 1 | 1 | 0 | + |

14 | 1 | 0 | 1 | – |

15 | 1 | 0 | 0 | + |
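The three estimates in question 5 follow simple formulas: the optimistic estimate is training errors divided by training instances, the pessimistic estimate adds 0.5 per leaf to the training errors before dividing, and the validation estimate is validation errors divided by validation instances. The Python sketch below applies them to the two tables above. The tree itself is not recoverable from the text, so HYPOTHETICAL_TREE is purely an assumption for illustration (root A, with A = 0 testing B and A = 1 testing C); substitute the tree from the original figure before relying on any number it prints.

```python
# ASSUMPTION: the figure is missing, so this tree is hypothetical.
# An internal node is a dict; a leaf is the class label string.
HYPOTHETICAL_TREE = {
    "attr": "A",
    "children": {
        0: {"attr": "B", "children": {0: "+", 1: "-"}},
        1: {"attr": "C", "children": {0: "+", 1: "-"}},
    },
}


def rows(raw):
    """Turn (A, B, C, label) tuples into (instance dict, label) pairs."""
    return [({"A": a, "B": b, "C": c}, label) for a, b, c, label in raw]


# Instances 1-10 and 11-15, copied from the training and validation tables.
TRAIN = rows([(0, 0, 0, "+"), (0, 0, 1, "+"), (0, 1, 0, "+"), (0, 1, 1, "-"),
              (1, 0, 0, "+"), (1, 0, 0, "+"), (1, 1, 0, "-"), (1, 0, 1, "+"),
              (1, 1, 0, "+"), (1, 1, 0, "+")])
VALID = rows([(0, 0, 0, "+"), (0, 1, 1, "+"), (1, 1, 0, "+"),
              (1, 0, 1, "-"), (1, 0, 0, "+")])


def predict(tree, instance):
    """Follow branches until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["children"][instance[tree["attr"]]]
    return tree


def count_leaves(tree):
    if not isinstance(tree, dict):
        return 1
    return sum(count_leaves(child) for child in tree["children"].values())


def count_errors(tree, data):
    return sum(1 for instance, label in data if predict(tree, instance) != label)


def optimistic(tree, train):
    return count_errors(tree, train) / len(train)


def pessimistic(tree, train, penalty=0.5):
    return (count_errors(tree, train) + penalty * count_leaves(tree)) / len(train)


def validation_error(tree, valid):
    return count_errors(tree, valid) / len(valid)


print(optimistic(HYPOTHETICAL_TREE, TRAIN),
      pessimistic(HYPOTHETICAL_TREE, TRAIN),
      validation_error(HYPOTHETICAL_TREE, VALID))
```

Keeping the tree as a plain nested dict means the figure's actual structure can be swapped in by editing only HYPOTHETICAL_TREE; the three estimate functions stay unchanged.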
