Need the answers for the following uploaded documents



Laboratory I:


To download additional .arff data sets go to:

http://www.hakank.org/weka/

or search the Internet for the required .arff files.
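A downloaded .arff file is plain text with this layout:

- an @relation line;
- one @attribute line per attribute;
- an @data line, followed by one comma-separated instance per line.

Lines starting with % are comments. The minimal Python reader below is a sketch for inspecting such files outside Weka. The function name read_arff and the sample text are illustrative assumptions. The reader deliberately ignores quoted names, sparse instances and other ARFF features that Weka itself supports.

```python
def read_arff(text):
    """Minimal ARFF reader (sketch): returns attribute names and raw data rows.

    Assumes unquoted attribute names and dense, comma-separated data lines.
    """
    attributes, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):      # skip blanks and comments
            continue
        lower = line.lower()
        if lower.startswith("@attribute"):
            # "@attribute name type" -> keep only the name (second token)
            attributes.append(line.split(None, 2)[1])
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            rows.append([v.strip() for v in line.split(",")])
        # @relation and anything else before @data is ignored here
    return attributes, rows

# Illustrative fragment in ARFF layout (not a complete Weka data set).
SAMPLE = """% toy example
@relation weather
@attribute outlook {sunny, rainy}
@attribute play {yes, no}
@data
sunny,no
rainy,yes
"""
attributes, rows = read_arff(SAMPLE)
print(attributes, rows)
```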


  • What's the difference between a "training set" and a "test set"?

  • Why might a pruned decision tree that doesn't fit the training data as well be better than an unpruned one?

  • What's the first thing that 1R does when making a rule based on a numeric attribute?

  • How does 1R avoid overfitting when making a rule based on an enumerated and/or numeric attribute?

  • What is the difference between an attribute, an instance and a training set?

  • What is the difference between ID3 and C4.5?
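The 1R questions above concern a numeric attribute. 1R works through four steps:

1. Sort the instances by the attribute's value.
2. Place breakpoints where the class changes.
3. To avoid overfitting, close a partition only once its majority class has at least a minimum number of instances. Never split between equal values.
4. Merge adjacent partitions that predict the same class.

The Python sketch below follows this procedure as described in the Witten & Frank textbook; it is not Weka's OneR source. The function names are hypothetical, and ties are broken by whichever class was seen first. The minimum of 3 is the textbook's example value; Weka's OneR exposes the same setting as its minimum bucket size, which defaults to 6. The example data is assumed to be the sorted temperature column of the numeric weather data.

```python
from collections import Counter

def one_r_numeric(values, labels, min_bucket=3):
    """Sketch of 1R discretization for one numeric attribute.

    Returns (upper_bound, class) pairs; the last bound is +inf.
    """
    pairs = sorted(zip(values, labels))          # step 1: sort by value
    buckets, vals, counts = [], [], Counter()
    for i, (v, c) in enumerate(pairs):
        vals.append(v)
        counts[c] += 1
        majority, n_majority = counts.most_common(1)[0]
        nxt = pairs[i + 1] if i + 1 < len(pairs) else None
        # Close the partition only when its majority class has at least
        # min_bucket members, the next instance has another class, and the
        # next value differs (equal values are never separated).
        if (n_majority >= min_bucket and nxt is not None
                and nxt[1] != majority and nxt[0] != v):
            buckets.append((vals, counts))
            vals, counts = [], Counter()
    if vals:
        buckets.append((vals, counts))           # leftover final partition
    # Merge adjacent partitions whose majority class is the same.
    merged = []
    for part_vals, part_counts in buckets:
        label = part_counts.most_common(1)[0][0]
        if merged and merged[-1][2] == label:
            merged[-1][1] = part_vals[-1]        # extend the upper end
        else:
            merged.append([part_vals[0], part_vals[-1], label])
    # Breakpoints lie halfway between neighbouring partitions.
    rules = []
    for j, (_, high, label) in enumerate(merged):
        if j + 1 < len(merged):
            bound = (high + merged[j + 1][0]) / 2
        else:
            bound = float("inf")
        rules.append((bound, label))
    return rules

def predict(rules, x):
    """Return the class of the first interval whose upper bound covers x."""
    for bound, label in rules:
        if x <= bound:
            return label

temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
play = ["yes", "no", "yes", "yes", "yes", "no", "no",
        "yes", "yes", "yes", "no", "yes", "yes", "no"]
rules = one_r_numeric(temps, play)
print(rules)
```

With these inputs the sketch forms three partitions: 64 to 70, 71 to 75, and 80 to 85. It then merges the first two, which gives the rule: temperature <= 77.5 predicts yes, otherwise no. Without the minimum-count requirement, nearly every class change would become a breakpoint, which is exactly the overfitting the rule guards against.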

  1. Use the following learning schemes to analyze the iris data (in iris.arff):

OneR

- weka.classifiers.OneR

Decision table

- weka.classifiers.DecisionTable -R

C4.5

- weka.classifiers.j48.J48

  • Do the decisions made by the classifiers make sense to you? Why?


  • What can you say about the accuracy of these classifiers when classifying iris instances that were not used for training?


  • How did each method perform?
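Accuracy measured on the same iris instances a classifier was trained on is optimistic. The real question is how the classifier does on instances it has not seen. Weka's test options keep the evaluated instances out of training: a supplied test set, a percentage split, or cross-validation.

The sketch below shows the idea behind k-fold cross-validation in plain Python. The helper names and the majority-class baseline learner are hypothetical stand-ins, not Weka classes. Weka additionally stratifies the folds by class.

```python
import random
from collections import Counter

def k_fold_indices(n, k=10, seed=1):
    """Shuffle instance indices once, then deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, train_fn, k=10, seed=1):
    """Fraction of instances classified correctly when each one is predicted
    by a model that was trained without it."""
    correct = 0
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [c for i, c in enumerate(y) if i not in held_out]
        model = train_fn(train_X, train_y)       # never sees the test fold
        correct += sum(model(X[i]) == y[i] for i in test_idx)
    return correct / len(y)

def train_majority(train_X, train_y):
    """Baseline learner: always predict the most common training class."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label
```

The majority-class baseline is useful as a reference point. Any of the classifiers above should beat it on held-out iris instances. On a perfectly balanced two-class set with one instance per fold, the baseline scores 0: removing the test instance always tips the training majority toward the other class.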


  2. Use the following learning schemes to analyze the bolts data (bolts.arff without the TIME attribute):

Decision Tree

- weka.classifiers.j48.J48

Decision table

- weka.classifiers.DecisionTable -R

Linear regression

- weka.classifiers.LinearRegression

M5'

- weka.classifiers.m5.M5Prime

  • The dataset describes the time needed by a machine to produce and count 20 bolts. (More details can be found in the file containing the dataset.)

  • Analyze the data. What adjustments have the greatest effect on the time to count 20 bolts?

  • According to each classifier, how would you adjust the machine to get the shortest time to count 20 bolts?
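Linear regression answers "which adjustments matter most" through its fitted coefficients. Each coefficient gives the change in predicted time per unit change of that attribute, with the other attributes held fixed.

The pure-Python sketch below fits ordinary least squares by solving the normal equations (X^T X) w = X^T y with Gaussian elimination. It is a minimal illustration with a hypothetical function name, not Weka's LinearRegression implementation. The example data is synthetic, chosen to satisfy time = 2 + 3a - b exactly. Because the bolts attributes use different units, compare coefficient magnitudes only after accounting for each attribute's range.

```python
def fit_linear_regression(rows, targets):
    """Least-squares weights [intercept, w1, w2, ...] via the normal equations."""
    X = [[1.0] + [float(v) for v in r] for r in rows]   # prepend intercept column
    p = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]  # X^T X
    b = [sum(x[i] * t for x, t in zip(X, targets)) for i in range(p)]        # X^T y
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * p
    for i in reversed(range(p)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, p))) / A[i][i]
    return w

# Synthetic data generated from time = 2 + 3*a - 1*b (no noise).
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
times = [2, 5, 1, 4, 5]
weights = fit_linear_regression(rows, times)
print(weights)
```

On the synthetic data the fit recovers the generating weights [2, 3, -1] up to floating-point rounding. A negative coefficient marks an adjustment that lowers the predicted time as it increases.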


  3. Produce a model for both Weather and Weather.nominal data sets. Which method(s) did you use? What did the tree(s) look like?
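J48 is Weka's implementation of C4.5. One difference from ID3 is the splitting criterion. ID3 picks the attribute with the highest information gain. C4.5 uses the gain ratio: information gain divided by the split information, which is the entropy of the branch sizes themselves. This penalises attributes with many values.

The sketch below computes both measures from per-branch class counts; the function names are hypothetical. The counts [2, 3], [4, 0], [3, 2] (yes, no) are taken to be the outlook split of the standard 14-instance weather data.

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def information_gain(branches):
    """ID3's criterion: entropy before the split minus weighted entropy after."""
    parent = [sum(col) for col in zip(*branches)]   # class totals over all branches
    n = sum(parent)
    after = sum(sum(b) / n * entropy(b) for b in branches)
    return entropy(parent) - after

def gain_ratio(branches):
    """C4.5's criterion: information gain divided by split information."""
    split_info = entropy([sum(b) for b in branches])  # entropy of branch sizes
    return information_gain(branches) / split_info

outlook = [[2, 3], [4, 0], [3, 2]]   # (yes, no) for sunny, overcast, rainy
print(information_gain(outlook), gain_ratio(outlook))
```

For these counts the information gain is about 0.247 bits and the gain ratio about 0.156. Running the same functions on each candidate attribute shows why a given attribute ends up at the root of the tree.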