
Question-1

1. How is data mining different from a database?

2. Present an example where data mining is crucial to the success of a business.

3. Explain the differences and similarities between discrimination and classification, between characterization and clustering, and between classification and regression.

4. Describe three challenges to data mining regarding data mining methodology and user interaction issues.

5. Outline the major research challenge of data mining in one specific application domain, such as stream/sensor data analysis, spatiotemporal data analysis, or bioinformatics.

6. Briefly outline how to compute the dissimilarity between objects described by the following: (a) Nominal attributes (b) Binary attributes (c) Numeric attributes.

7. Briefly outline how each of the following visualization techniques works: (a) Pixel-oriented (b) Geometric-based (c) Parallel coordinates.

8. Define data preprocessing and explain its four steps.

9. In real-world data, tuples with missing values for some attributes are a common occurrence. Describe various methods for handling this problem.

10. Briefly outline the normalization methods and show an example of each.
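As a rough illustration for item 10, the sketch below shows three commonly taught normalization methods: min-max scaling, z-score standardization, and decimal scaling. The function names and the sample values are assumptions made for illustration only; they do not come from the assignment.

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    # Linearly map [min(values), max(values)] onto [new_min, new_max]
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    # Center on the mean and divide by the population standard deviation
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def decimal_scaling(values):
    # Divide by 10**j, with j the smallest integer making every |v| < 1
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

data = [200, 300, 400, 600, 1000]  # hypothetical values, not from the assignment
print(min_max(data))
print(z_score(data))
print(decimal_scaling(data))
```

For these sample values, min-max scaling maps 200 to 0 and 1000 to 1, and decimal scaling divides every value by 10,000.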

————————————-*—————————————————————–

Question-2

1. Consider the data set shown in the table (Table 6.15, page 515), with an item taxonomy given in Figure 6.25. Table: Example of market basket transactions.

– https://www.geeksforgeeks.org/apriori-algorithm/
– https://www.softwaretestinghelp.com/apriori-algorithm/
– https://www.geeksforgeeks.org/association-rule/

Answer the following questions.

(a) Consider the approach where each transaction t is replaced by an extended transaction t1 that contains all the items in t as well as their respective ancestors.

For example, the transaction t = {Chips, Cookies} will be replaced by t1 = {Chips, Cookies, Snack Food, Food}. Use this approach to derive all frequent itemsets (up to size 4) with support ≥ 70%.

(b) Consider an alternative approach where the frequent itemsets are generated one level at a time. Initially, all the frequent itemsets involving items at the highest level of the hierarchy are generated. Next, we use the frequent itemsets discovered at the higher level of the hierarchy to generate candidate itemsets involving items at the lower levels of the hierarchy.

For example, we generate the candidate itemset {Chips, Diet Soda} only if {Snack Food, Soda} is frequent. Use this approach to derive all frequent itemsets (up to size 4) with support ≥ 70%.
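The extended-transaction idea in part (a) can be sketched as follows. This is a minimal illustration, assuming a child-to-parent dictionary as the taxonomy representation; the only parent links used are the ones implied by the example in the text (Chips and Cookies under Snack Food, Snack Food under Food). The rest of Figure 6.25 is not reproduced, and the names `PARENT`, `ancestors`, and `extend` are hypothetical.

```python
# Child -> parent links taken only from the example in the text;
# the full taxonomy of Figure 6.25 is NOT reproduced here.
PARENT = {"Chips": "Snack Food", "Cookies": "Snack Food", "Snack Food": "Food"}

def ancestors(item, parent=PARENT):
    # Walk up the taxonomy until an item with no recorded parent is reached
    found = []
    while item in parent:
        item = parent[item]
        found.append(item)
    return found

def extend(transaction, parent=PARENT):
    # Extended transaction: the original items plus all of their ancestors
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, parent))
    return extended

print(sorted(extend({"Chips", "Cookies"})))
# ['Chips', 'Cookies', 'Food', 'Snack Food']
```

Once every transaction is extended this way, ordinary frequent-itemset counting can be run on the extended transactions.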

2. Consider the data set shown in the table. Table: Example of market basket transactions.

Answer the following questions.

(a) Compute the support for itemsets {e}, {b, d}, and {b, d, e} by treating each transaction ID as a market basket.

(b) Use the results in part (a) to compute the confidence for the association rules {b, d} → {e} and {e} → {b, d}. Is confidence a symmetric measure?

(c) Repeat part (a) by treating each customer ID as a market basket. Each item should be treated as a binary variable (1 if an item appears in at least one transaction bought by the customer, and 0 otherwise).

(d) Use the result in part (c) to compute the confidence for the association rules {b, d} → {e} and {e} → {b, d}.
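The calculations asked for in parts (a) through (d) follow the usual definitions: support is the fraction of baskets containing an itemset, and the confidence of X → Y is support(X ∪ Y) / support(X). The sketch below applies them to a small hypothetical data set, since the assignment's table is not reproduced here. The rows, IDs, and function names are all assumptions for illustration.

```python
from collections import defaultdict

def support(itemset, baskets):
    # Fraction of baskets that contain every item in itemset
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    # conf(X -> Y) = support(X union Y) / support(X)
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def baskets_by_customer(rows):
    # Merge all transactions of a customer into one basket (binary presence)
    merged = defaultdict(set)
    for customer, _tid, items in rows:
        merged[customer] |= items
    return list(merged.values())

# Hypothetical (customer ID, transaction ID, items) rows; NOT the table's data
rows = [
    (1, 10, {"a", "b"}),
    (1, 11, {"b", "d", "e"}),
    (2, 12, {"e"}),
    (2, 13, {"b", "d"}),
    (3, 14, {"a", "e"}),
]

by_transaction = [items for _cid, _tid, items in rows]
by_customer = baskets_by_customer(rows)

for name, baskets in [("transaction", by_transaction), ("customer", by_customer)]:
    print(name,
          support({"e"}, baskets),
          support({"b", "d"}, baskets),
          confidence({"b", "d"}, {"e"}, baskets),
          confidence({"e"}, {"b", "d"}, baskets))
```

On this toy data the two rule directions give different confidence values, and grouping by customer changes both the supports and the confidences.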

Question-3

Draw the full decision tree for the parity function of four Boolean attributes, A, B, C, and D.
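As a way to check a hand-drawn tree, the sketch below builds the full tree as nested dictionaries and counts its leaves. It assumes the convention that the class is 1 when an odd number of attributes are 1 (the opposite convention simply flips every leaf), and it splits on the attributes in the fixed order A, B, C, D. The helper names are hypothetical.

```python
ATTRS = ["A", "B", "C", "D"]

def parity(bits):
    # Assumed convention: class 1 when an odd number of attributes equal 1
    return sum(bits) % 2

def build_tree(attrs, assigned=()):
    # Split on the attributes in the given order; a leaf stores the class label
    if not attrs:
        return parity(assigned)
    head, rest = attrs[0], attrs[1:]
    return {f"{head}={v}": build_tree(rest, assigned + (v,)) for v in (0, 1)}

def count_leaves(tree):
    # A non-dict node is a leaf
    if not isinstance(tree, dict):
        return 1
    return sum(count_leaves(child) for child in tree.values())

tree = build_tree(ATTRS)
print(count_leaves(tree))  # 16
print(tree["A=1"]["B=0"]["C=0"]["D=0"])  # 1
```

Because flipping any single attribute flips the parity, no subtree can be collapsed, so the full tree tests all four attributes on every path and has 16 leaves.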

1. Consider the training examples shown in the table for a binary classification problem.

1) Compute the Gini index for the overall collection of training examples.

2) Compute the Gini index for the Customer ID attribute.

3) Compute the Gini index for the Gender attribute.

4) Compute the Gini index for the Car Type attribute using multiway split.

5) Compute the Gini index for the Shirt Size attribute using multiway split.

6) Which attribute is better: Gender, Car Type, or Shirt Size?
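The Gini computations above can be sketched as follows. The Gini index of a node is 1 minus the sum of squared class proportions, and the Gini index of a multiway split is the size-weighted average over the child nodes. The records below are hypothetical stand-ins, since the assignment's table is not reproduced here, and the function names are assumptions.

```python
from collections import Counter

def gini(labels):
    # Gini index of one node: 1 - sum over classes of p(class)^2
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_split(records, attribute, target="Class"):
    # Multiway split: one child per distinct attribute value,
    # weighted by the fraction of records falling into each child
    groups = {}
    for record in records:
        groups.setdefault(record[attribute], []).append(record[target])
    n = len(records)
    return sum(len(g) / n * gini(g) for g in groups.values())

# Hypothetical records; NOT the assignment's table
records = [
    {"ID": 1, "Gender": "M", "Class": "C0"},
    {"ID": 2, "Gender": "M", "Class": "C0"},
    {"ID": 3, "Gender": "M", "Class": "C1"},
    {"ID": 4, "Gender": "F", "Class": "C1"},
]

print(gini([r["Class"] for r in records]))  # overall Gini: 0.5
print(gini_split(records, "Gender"))        # weighted Gini for Gender
print(gini_split(records, "ID"))            # unique IDs give 0.0
```

A lower weighted Gini indicates a purer split. An attribute with a unique value per record, such as an ID, always reaches 0, which is one reason such attributes are not useful for prediction despite the low score.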
