Homework 6
Answer the following questions: (10 points each)
1. Consider the traffic accident data set shown in the table below.
Traffic accident data set.
Weather Condition | Driver’s Condition | Traffic Violation | Seat Belt | Crash Severity |
Good | Alcohol-impaired | Exceed speed limit | No | Major |
Bad | Sober | None | Yes | Minor |
Good | Sober | Disobey stop sign | No | Minor |
Bad | Alcohol-impaired | Exceed speed limit | Yes | Major |
Bad | Alcohol-impaired | Disobey traffic signal | No | Major |
Bad | Alcohol-impaired | Disobey stop sign | Yes | Minor |
Bad | Alcohol-impaired | None | Yes | Major |
Good | Sober | Disobey traffic signal | Yes | Minor |
Good | Alcohol-impaired | None | No | Minor |
Bad | Sober | None | Yes | Major |
Good | Alcohol-impaired | Exceed speed limit | Yes | Major |
Bad | Sober | Disobey stop sign | Yes | Minor |
Show a binarized version of the data set.
Answer:
What is the maximum width of each transaction in the binarized data?
Answer:
Assuming that the support threshold is 30%, how many candidate and frequent itemsets will be generated?
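As an illustration of what binarizing a categorical data set means, here is a minimal Python sketch. The helper name `binarize` and the shortened attribute names are assumptions made for this example, and only three attributes of the first two table rows are used, so it is not a complete answer.

```python
def binarize(records):
    """One-hot encode categorical records (hypothetical helper for illustration)."""
    # Every distinct (attribute, value) pair becomes one asymmetric binary item.
    items = sorted({(attr, val) for rec in records for attr, val in rec.items()})
    # A row has a 1 where the record holds that value for that attribute, else 0.
    rows = [[int(rec.get(attr) == val) for attr, val in items] for rec in records]
    return items, rows


# First two rows of the table, restricted to three attributes for brevity.
records = [
    {"Weather": "Good", "Driver": "Alcohol-impaired", "SeatBelt": "No"},
    {"Weather": "Bad", "Driver": "Sober", "SeatBelt": "Yes"},
]
items, rows = binarize(records)
for (attr, val), column in zip(items, zip(*rows)):
    print(f"{attr}={val}: {list(column)}")
```

Each distinct attribute value becomes its own 0/1 column, which is the representation that itemset mining operates on.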
2. Consider the data set shown in the table below. The first attribute is continuous, while the remaining two attributes are asymmetric binary. A rule is considered strong if its support exceeds 15% and its confidence exceeds 60%. The data given in the table supports the following two strong rules:
(i) {(1 ≤ A ≤ 2), B = 1} → {C = 1}
(ii) {(5 ≤ A ≤ 8), B = 1} → {C = 1}
[Table not recovered in this copy; only the fragment "10 11 12" survives.]
Compute the support and confidence for both rules.
Answer:
S ({(1 ≤ A ≤ 2), B = 1} → {C = 1}) =
C ({(1 ≤ A ≤ 2), B = 1} → {C = 1}) =
S ({(5 ≤ A ≤ 8), B = 1} → {C = 1}) =
C ({(5 ≤ A ≤ 8), B = 1} → {C = 1}) =
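For reference, the support of a rule X → Y is the fraction of all records that satisfy both X and Y, and its confidence is the fraction of records satisfying X that also satisfy Y. The sketch below shows the arithmetic in Python. The function name `support_confidence` and the four rows are made up for illustration and are not the data from the table.

```python
def support_confidence(rows, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    matching_antecedent = [r for r in rows if antecedent(r)]
    matching_both = [r for r in matching_antecedent if consequent(r)]
    support = len(matching_both) / len(rows)
    # Confidence is undefined when nothing matches the antecedent; report 0.0 here.
    confidence = (
        len(matching_both) / len(matching_antecedent) if matching_antecedent else 0.0
    )
    return support, confidence


# Made-up (A, B, C) rows, NOT the homework table.
rows = [(1, 1, 1), (2, 1, 1), (3, 1, 0), (6, 0, 1)]
s, c = support_confidence(
    rows,
    antecedent=lambda r: 1 <= r[0] <= 2 and r[1] == 1,
    consequent=lambda r: r[2] == 1,
)
print(s, c)
```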
3. Consider the data set shown in the table below. Suppose we are interested in extracting the following association rule:
{α1 ≤ Age ≤ α2, Play Piano = Yes} → {Enjoy Classical Music = Yes}
Age | Play Piano | Enjoy Classical Music |
11 14 17 19 21 25 29 33 39 41 47 | Yes Yes Yes Yes Yes No No Yes Yes Yes No No | Yes Yes No No Yes No No No No Yes Yes Yes |
To handle the continuous attribute, we apply the equal-frequency approach with 3, 4, and 6 intervals. Categorical attributes are handled by introducing as many new asymmetric binary attributes as the number of categorical values. Assume that the support threshold is 10% and the confidence threshold is 70%.
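Equal-frequency discretization sorts the values and cuts them into intervals that each hold the same number of records. A minimal Python sketch follows. The function name `equal_frequency_bins` and the sample values are assumptions for illustration, and the sketch only handles the case where the number of values divides evenly by the number of intervals.

```python
def equal_frequency_bins(values, k):
    """Split values into k intervals with equally many values in each.

    Returns a list of (lowest, highest) pairs, one per interval.
    Simplifying assumption: len(values) must be divisible by k.
    """
    ordered = sorted(values)
    n = len(ordered)
    if k <= 0 or n % k != 0:
        raise ValueError("number of values must be a positive multiple of k")
    size = n // k
    return [(ordered[i], ordered[i + size - 1]) for i in range(0, n, size)]


# Made-up values, NOT the Age column from the table.
sample = [50, 10, 40, 20, 60, 30]
print(equal_frequency_bins(sample, 3))
print(equal_frequency_bins(sample, 2))
```

Using more intervals gives narrower ranges, so each range covers fewer records and any rule built on it has lower support.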
(a) Suppose we discretize the Age attribute into 3 equal-frequency intervals. Find a pair of values for α1 and α2 that satisfy the minimum support and minimum confidence requirements.
Answer:
(b) Repeat part (a) by discretizing the Age attribute into 4 equal-frequency intervals. Compare the extracted rules against the ones you obtained in part (a).
Answer:
(c) Repeat part (a) by discretizing the Age attribute into 6 equal-frequency intervals. Compare the extracted rules against the ones you obtained in part (a).
Answer:
4. For each of the sequences w = <e1, ..., elast> below, determine whether it is a subsequence of the following data sequence:
<{A, B}{C, D}{A, B}{C, D}{A, B}{C, D}>
subject to the following timing constraints:
mingap = 0 (interval between last event in ei and first event in ei+1 is > 0)
maxgap = 2 (interval between first event in ei and last event in ei+1 is ≤ 2)
maxspan = 6 (interval between first event in e1 and last event in elast is ≤ 6)
ws = 1 (time between first and last events in ei is ≤ 1)
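The four constraints above can be checked mechanically once each element ei of w has been matched to a time window in the data sequence. The Python sketch below assumes that the elements of the data sequence occur at timestamps 1, 2, ..., 6 and that each matched element is described by a pair (first, last) giving the earliest and latest timestamps of its events. The function name `satisfies_constraints` and this representation are assumptions for illustration. The sketch checks one given matching and does not search for a matching.

```python
def satisfies_constraints(windows, mingap=0, maxgap=2, maxspan=6, ws=1):
    """Check one candidate matching against the four timing constraints.

    windows[i] = (first, last): earliest and latest timestamps of the
    events matched to element e_i (hypothetical representation).
    """
    if not windows:
        return True
    # ws: events inside one element must lie within the window size.
    for first, last in windows:
        if last - first > ws:
            return False
    # mingap and maxgap: compare each element with the next one.
    for (first_i, last_i), (first_next, last_next) in zip(windows, windows[1:]):
        if not first_next - last_i > mingap:
            return False
        if not last_next - first_i <= maxgap:
            return False
    # maxspan: the whole pattern must fit inside the allowed span.
    return windows[-1][1] - windows[0][0] <= maxspan


print(satisfies_constraints([(1, 1), (2, 2), (3, 3)]))  # consecutive elements
print(satisfies_constraints([(1, 1), (4, 4)]))          # gap of 3 exceeds maxgap
print(satisfies_constraints([(1, 3), (4, 4)]))          # first window wider than ws
```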
w = < {A}{B}{C}{D}> Answer:
w = < {A} {B, C, D} {A}> Answer:
w = < {A} {B, C, D} {A}> Answer:
w = < {B, C} {A, D} {B, C}> Answer:
w = < {A, B, C, D} {A, B, C, D}> Answer:
5. Draw all candidate subgraphs obtained from joining the pair of graphs shown in the figure below. Assume the edge-growing method is used to expand the subgraphs.
Answer: