I need answers to the questions in the following uploaded documents.
Laboratory I:
To download additional .arff data sets, go to:
http://www.hakank.org/weka/
or search the Internet for the .arff files you need.
What's the difference between a "training set" and a "test set"?
Why might a pruned decision tree that doesn't fit the data so well be better than an un-pruned one?
What's the first thing that 1R does when making a rule based on a numeric attribute?
How does 1R avoid overfitting when making a rule based on an enumerated and/or numeric attribute?
What is the difference between an attribute, an instance, and a training set?
What is the difference between ID3 and C4.5?
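For the two 1R questions above, a minimal sketch may help. It assumes the usual textbook description of 1R on a numeric attribute: the instances are first sorted by the attribute value, then cut into buckets, and a bucket is only closed once its majority class has at least a minimum number of instances (the guard against overfitting). The function name, the `min_bucket` parameter, and the sample values are illustrative assumptions, and this is not Weka's OneR code.

```python
from collections import Counter

def one_r_numeric_buckets(values, labels, min_bucket=3):
    """Simplified 1R-style discretization of one numeric attribute (a sketch)."""
    # Step 1: sort the instances by the numeric attribute value.
    pairs = sorted(zip(values, labels))
    buckets, current = [], []
    for i, (value, label) in enumerate(pairs):
        current.append((value, label))
        if i == len(pairs) - 1:
            break  # the last instance always ends the final bucket
        majority_label, majority_count = Counter(
            lbl for _, lbl in current).most_common(1)[0]
        next_value, next_label = pairs[i + 1]
        # Step 2: close the bucket only when the majority class is large
        # enough, the next instance has a different class, and the attribute
        # value actually changes (equal values cannot be separated).
        if (majority_count >= min_bucket
                and next_label != majority_label
                and next_value != value):
            buckets.append(current)
            current = []
    buckets.append(current)
    return buckets

# Illustrative values only (assumed for this sketch).
temperatures = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

for bucket in one_r_numeric_buckets(temperatures, play):
    print([v for v, _ in bucket], [c for _, c in bucket])
```

With `min_bucket=1` every class change would start a new bucket, which is exactly the overfitting that the minimum bucket size is meant to prevent.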
Use the following learning schemes to analyze the iris data (in iris.arff):
OneR: weka.classifiers.OneR
Decision table: weka.classifiers.DecisionTable -R
C4.5: weka.classifiers.j48.J48
Do the decisions made by the classifiers make sense to you? Why?
What can you say about the accuracy of these classifiers when classifying iris instances that have not been used for training?
How did each one of the methods perform?
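The accuracy questions above come down to measuring a classifier on instances it has not seen. As a rough sketch of that idea (not of how Weka evaluates internally), the code below holds out part of a data set and computes accuracy on each part. The helper names `holdout_split` and `accuracy`, the 30% test fraction, and the tiny synthetic data set are all assumptions made for illustration.

```python
import random

def holdout_split(instances, test_fraction=0.3, seed=1):
    """Shuffle a copy of the data and hold out a fraction as the test set."""
    rng = random.Random(seed)
    shuffled = list(instances)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

def accuracy(classify, instances):
    """Fraction of (features, label) pairs that `classify` labels correctly."""
    correct = sum(1 for features, label in instances
                  if classify(features) == label)
    return correct / len(instances)

# Synthetic one-attribute data (assumed): class "a" below 5, class "b" otherwise.
data = [((x,), "a" if x < 5 else "b") for x in range(10)]
train, test = holdout_split(data)

# A hypothetical one-rule classifier with a threshold on the single attribute.
def rule(features):
    return "a" if features[0] < 5 else "b"

print("training accuracy:", accuracy(rule, train))
print("test accuracy:", accuracy(rule, test))
```

Comparing the two numbers for each scheme is one way to see whether a model that looks good on the training set still holds up on unseen instances.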
Use the following learning schemes to analyze the bolts data (bolts.arff without the TIME attribute):
Decision tree: weka.classifiers.j48.J48
Decision table: weka.classifiers.DecisionTable -R
Linear regression: weka.classifiers.LinearRegression
M5': weka.classifiers.M5'
The dataset describes the time needed by a machine to produce and count 20 bolts. (More details can be found in the file containing the dataset.)
Analyze the data. What adjustments have the greatest effect on the time to count 20 bolts?
According to each classifier, how would you adjust the machine to get the shortest time to count 20 bolts?
Produce a model for both the Weather and Weather.nominal data sets. Which method(s) did you use? What did the tree(s) look like?
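One way to sanity-check the root of a tree for the nominal weather data is to compute information gain by hand, in the style of ID3. The sketch below does that. The rows are assumed to match the 14-instance weather.nominal.arff commonly distributed with Weka, so check them against your own copy. Note that J48 (C4.5) ranks splits by gain ratio and also prunes, so this is only an approximation of what Weka does.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute_index, class_index=-1):
    """Entropy of the class minus the weighted entropy after splitting."""
    labels = [row[class_index] for row in rows]
    remainder = 0.0
    for value in set(row[attribute_index] for row in rows):
        subset = [row[class_index] for row in rows
                  if row[attribute_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

# Columns: outlook, temperature, humidity, windy, play (assumed data, see above).
attributes = ["outlook", "temperature", "humidity", "windy"]
weather = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]

gains = {name: information_gain(weather, i) for i, name in enumerate(attributes)}
for name, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{name}: {gain:.3f}")
```

The attribute with the largest gain is the one an ID3-style learner would place at the root, which gives something to compare against the tree Weka prints.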