

QUESTION

Jun 05, 2020


1. Consider the data set shown in screenshot 1.

(a) Compute the support for item sets {e}, {b, d}, and {b, d, e} by treating each transaction ID as a market basket.

(b) Use the results in part (a) to compute the confidence for the association rules {b, d} → {e} and {e} → {b, d}. Is confidence a symmetric measure?
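
For reference, support and confidence can be computed as in the minimal sketch below. The baskets are hypothetical stand-ins, not the data from screenshot 1, and the helper names `support` and `confidence` are chosen here for illustration only.

```python
# Hypothetical market baskets, one per transaction ID.
# These are NOT the transactions from screenshot 1.
baskets = [
    {"a", "b", "d", "e"},
    {"b", "c", "d"},
    {"a", "b", "d", "e"},
    {"a", "c", "d", "e"},
    {"b", "c", "d"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


s_e = support({"e"}, baskets)
s_bd = support({"b", "d"}, baskets)
s_bde = support({"b", "d", "e"}, baskets)

c_bd_to_e = confidence({"b", "d"}, {"e"}, baskets)
c_e_to_bd = confidence({"e"}, {"b", "d"}, baskets)
```

Both rules share the numerator support({b, d, e}) but divide by different antecedent supports, so the two confidences agree only when support({b, d}) equals support({e}).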

2. Consider the transactions shown in screenshot 2, with an item taxonomy given in screenshot 3.

(a) What are the main challenges of mining association rules with item taxonomy?

(b) Consider the approach where each transaction t is replaced by an extended transaction t′ that contains all the items in t as well as their respective ancestors. For example, the transaction t = {Chips, Cookies} will be replaced by t′ = {Chips, Cookies, Snack Food, Food}. Use this approach to derive all frequent item sets (up to size 4) with support ≥ 70%.
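
The extension step in part (b) might be sketched as follows. The `parent` map covers only the branch named in the example (Chips and Cookies under Snack Food, Snack Food under Food); the full taxonomy in screenshot 3 is not reproduced here, and the function name `extend` is an illustrative choice.

```python
# Child -> parent links for the one branch mentioned in the example.
# The complete taxonomy from screenshot 3 is assumed to have the same shape.
parent = {
    "Chips": "Snack Food",
    "Cookies": "Snack Food",
    "Snack Food": "Food",
}


def extend(transaction, parent):
    """Return the transaction plus every ancestor of each of its items."""
    extended = set(transaction)
    for item in transaction:
        node = item
        # Walk up the hierarchy until a root (an item with no parent) is reached.
        while node in parent:
            node = parent[node]
            extended.add(node)
    return extended


t_extended = extend({"Chips", "Cookies"}, parent)
```

Frequent item sets are then mined on the extended transactions in the usual way, keeping in mind that an item and its own ancestor always co-occur in an extended transaction.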

(c) Consider an alternative approach where the frequent item sets are generated one level at a time. Initially, all the frequent item sets involving items at the highest level of the hierarchy are generated. Next, we use the frequent item sets discovered at the higher level of the hierarchy to generate candidate item sets involving items at the lower levels of the hierarchy. For example, we generate the candidate item set {Chips, Diet Soda} only if {Snack Food, Soda} is frequent. Use this approach to derive all frequent item sets (up to size 4) with support ≥ 70%.
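
The candidate filter described in part (c) could look roughly like the sketch below. The `parent` links follow the example in the text, while the contents of `frequent_parents` are assumed for illustration rather than derived from screenshot 2; `is_candidate` is a hypothetical helper name.

```python
# Child -> parent links implied by the example in the text.
parent = {
    "Chips": "Snack Food",
    "Cookies": "Snack Food",
    "Diet Soda": "Soda",
}

# Item sets assumed (for illustration only) to be frequent at the higher level.
frequent_parents = {frozenset({"Snack Food", "Soda"})}


def is_candidate(itemset, parent, frequent_parents):
    """Keep a lower-level item set only if its parent item set is frequent."""
    lifted = frozenset(parent[item] for item in itemset)
    return lifted in frequent_parents


keep_chips_diet = is_candidate({"Chips", "Diet Soda"}, parent, frequent_parents)
keep_chips_cookies = is_candidate({"Chips", "Cookies"}, parent, frequent_parents)
```

Here {Chips, Cookies} lifts to the single item {Snack Food}, which is not in the assumed frequent set, so it is not generated as a candidate.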

3. Consider a data set consisting of 2^20 data vectors, where each vector has 32 components and each component is a 4-byte value. Suppose that vector quantization is used for compression and that 2^16 prototype vectors are used. How many bytes of storage does that data set take before and after compression, and what is the compression ratio?
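
The storage arithmetic can be laid out as in the sketch below. It reads the sizes as 2^20 data vectors and 2^16 prototype vectors (an assumption about lost superscripts in the posted text) and assumes each data vector is replaced by a fixed-width index into the prototype table, with the table itself stored alongside the indices.

```python
n_vectors = 2**20          # assumed reading of the number of data vectors
n_components = 32
bytes_per_component = 4
n_prototypes = 2**16       # assumed reading of the number of prototype vectors

# Uncompressed: every vector stores all of its components.
bytes_before = n_vectors * n_components * bytes_per_component

# Compressed: each vector becomes an index wide enough to name any prototype.
index_bits = (n_prototypes - 1).bit_length()
index_bytes = (index_bits + 7) // 8
bytes_indices = n_vectors * index_bytes

# The prototype table (codebook) must be stored as well.
bytes_codebook = n_prototypes * n_components * bytes_per_component
bytes_after = bytes_indices + bytes_codebook

ratio_with_codebook = bytes_before / bytes_after
ratio_indices_only = bytes_before / bytes_indices
```

Under these assumptions the data set takes 134,217,728 bytes before compression and 10,485,760 bytes after (2,097,152 for the indices plus 8,388,608 for the prototypes), a ratio of 12.8. If the prototype table is not counted, the ratio is 64.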