You are considering a possible model tree for a dataset with 30,000 records. The overall standard deviation of the target is 239. Three features, F1, F2, and F3, are candidates for the root node. F1 is discrete with three possible values {H, M, L}; the other two are numeric.

Based on this information, answer the following questions and explain your answers using the relevant concepts.
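For reference, the standard deviation reduction (SDR) used below is assumed to follow the usual definition from M5-style model trees: the standard deviation of the target over the records at a node, minus the size-weighted average of the standard deviations in the branches produced by a split. Here T is the set of records at the node and T_i is the subset sent down branch i; a larger SDR indicates a more useful split.

$$
\mathrm{SDR} = \mathrm{sd}(T) - \sum_{i} \frac{|T_i|}{|T|}\,\mathrm{sd}(T_i)
$$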

F1 splits the records equally into three branches (i.e., 10,000 records per branch/partition), with standard deviations of 243, 229, and 260 for the H, M, and L partitions, respectively. What would the SDR (standard deviation reduction) be if F1 were chosen as the root node of the model tree?
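The F1 arithmetic can be checked with a small sketch. It assumes the M5-style definition SDR = sd(T) minus the sum of (|T_i| / |T|) * sd(T_i); the helper name `sdr` and the (count, standard deviation) pair format are illustrative choices, not part of the exercise.

```python
def sdr(parent_sd, branches):
    """Standard deviation reduction, assuming the M5-style definition:
    SDR = sd(T) - sum(|T_i| / |T| * sd(T_i)).
    `branches` is a list of (record_count, standard_deviation) pairs."""
    total = sum(n for n, _ in branches)
    weighted = sum(n / total * sd for n, sd in branches)
    return parent_sd - weighted

# F1: three equal partitions (H, M, L) of 10,000 records each
sdr_f1 = sdr(239, [(10_000, 243), (10_000, 229), (10_000, 260)])
print(round(sdr_f1, 2))
```

Because the branches are equal in size, the weighted average is simply (243 + 229 + 260) / 3 = 244, so the SDR is 239 - 244 = -5. A negative value means the partitions are, on average, more spread out than the unsplit data.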

Is there any point in considering F1 any further?

Is it possible for F1 to be a node at a subsequent level of the model tree?

Because the other two features are numeric, only binary splits are used for them. The "best" split for F2 produces two equal branches, each with a standard deviation of 220. What is the SDR for this feature?
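The same calculation for F2, again as a sketch assuming the M5-style SDR definition. Two equal branches of a 30,000-record dataset means 15,000 records each; the helper `sdr` is illustrative.

```python
def sdr(parent_sd, branches):
    """SDR = sd(T) - sum(|T_i| / |T| * sd(T_i)); branches are (count, sd) pairs."""
    total = sum(n for n, _ in branches)
    return parent_sd - sum(n / total * sd for n, sd in branches)

# F2: binary split into two equal halves, each with standard deviation 220
sdr_f2 = sdr(239, [(15_000, 220), (15_000, 220)])
print(round(sdr_f2, 2))
```

With both branches at 220, the weighted average is 220 regardless of the weights, so the SDR is 239 - 220 = 19.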

F3, on the other hand, splits the records into two unequal branches: the first branch gets one-third of the records, with a standard deviation of 270. If the standard deviation of the second branch is 165, what is the SDR for this feature?
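For F3 the branch weights are unequal (one-third and two-thirds of the records), which is where the size weighting in SDR matters. This sketch assumes the M5-style definition; the helper name is illustrative.

```python
def sdr(parent_sd, branches):
    """SDR = sd(T) - sum(|T_i| / |T| * sd(T_i)); branches are (count, sd) pairs."""
    total = sum(n for n, _ in branches)
    return parent_sd - sum(n / total * sd for n, sd in branches)

# F3: 10,000 records (one-third) with sd 270, 20,000 records (two-thirds) with sd 165
sdr_f3 = sdr(239, [(10_000, 270), (20_000, 165)])
print(round(sdr_f3, 2))
```

The weighted average is (1/3)(270) + (2/3)(165) = 90 + 110 = 200, so the SDR is 239 - 200 = 39. The high-variance branch holds only a third of the records, so it is outweighed by the tighter, larger branch.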

Which feature would you choose as the root node?
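A minimal sketch that ranks all three candidates, assuming (as in M5-style greedy split selection) that the split with the largest SDR is preferred. The `candidates` dictionary simply restates the branch sizes and standard deviations given in the questions.

```python
def sdr(parent_sd, branches):
    """SDR = sd(T) - sum(|T_i| / |T| * sd(T_i)); branches are (count, sd) pairs."""
    total = sum(n for n, _ in branches)
    return parent_sd - sum(n / total * sd for n, sd in branches)

PARENT_SD = 239  # standard deviation of the target over all 30,000 records

# (record_count, standard_deviation) for each branch of each candidate split
candidates = {
    "F1": [(10_000, 243), (10_000, 229), (10_000, 260)],
    "F2": [(15_000, 220), (15_000, 220)],
    "F3": [(10_000, 270), (20_000, 165)],
}

scores = {name: sdr(PARENT_SD, branches) for name, branches in candidates.items()}
best = max(scores, key=scores.get)  # feature with the largest SDR

for name, score in scores.items():
    print(f"{name}: SDR = {score:.1f}")
print("Root node:", best)
```

This prints SDR values of -5.0, 19.0, and 39.0 for F1, F2, and F3, and selects F3 as the root node.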