Need the answers for the following uploaded documents

Laboratory III:


To download additional .arff data sets go to:

http://www.hakank.org/weka/

zoo.arff, wine.arff, soybean.arff, zoo2_x.arff,

sunburn.arff, disease.arff



  1. Use the following learning schemes to compare the training set and 10-fold stratified cross-validation scores of the disease data (in disease.arff):

Decision table

- weka.classifiers.DecisionTable -R

C4.5

- weka.classifiers.j48.J48

Id3

- weka.classifiers.Id3

A) What does the training set evaluation score tell you?

B) What does the cross-validation score evaluate?

C) Which one of these models would you say is the best? Why?
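To make the difference between the two scores concrete, here is a minimal, self-contained sketch in plain Python. It is not Weka code: the `Memorizer` learner, the helper names and the 20-row dataset are all hypothetical and chosen only to exaggerate the effect. The learner memorizes its training rows, so it scores perfectly on the training set but falls back to guessing on held-out folds.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, k):
    """Deal instance indices into k folds, class by class, so each fold
    keeps roughly the same class proportions as the full dataset."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for y in sorted(by_class):
        for i in by_class[y]:
            folds[pos % k].append(i)
            pos += 1
    return folds

class Memorizer:
    """Toy learner (hypothetical): remembers every training row and
    predicts the majority class for rows it has never seen."""
    def fit(self, X, y):
        self.table = {tuple(row): label for row, label in zip(X, y)}
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, row):
        return self.table.get(tuple(row), self.default)

def training_accuracy(X, y):
    """Train and test on the same data (the training set score)."""
    model = Memorizer().fit(X, y)
    return sum(model.predict(r) == t for r, t in zip(X, y)) / len(y)

def cross_val_accuracy(X, y, k):
    """Train on k-1 folds, test on the held-out fold, pool the results."""
    correct = 0
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        model = Memorizer().fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(model.predict(X[i]) == y[i] for i in test_idx)
    return correct / len(y)

# Hypothetical data: 20 distinct rows, two balanced classes.
X = [(i,) for i in range(20)]
y = ["yes" if i % 2 == 0 else "no" for i in range(20)]

train_score = training_accuracy(X, y)    # 1.0: every row was memorized
cv_score = cross_val_accuracy(X, y, 10)  # 0.5: unseen rows are guessed
print(train_score, cv_score)
```

The gap between the two numbers is the point: the training set score measures how well the model fits data it has already seen, while the cross-validation score estimates how it behaves on unseen data.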


  2. Use the following learning schemes to analyze the wine data (in wine.arff).

C4.5

- weka.classifiers.j48.J48

Decision List

- weka.classifiers.PART


A) What is the most important descriptor (attribute) in wine.arff?

B) How well were these two schemes able to learn the patterns in the dataset? How would you quantify your answer?

C) Compare the training set and 10-fold cross-validation scores of the two schemes.

D) Would you trust these two models? Did they really learn what is important for proper classification of wine?

E) Which one would you trust more, even if just very slightly?
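One common way to judge which attribute matters most is to look at what a decision tree tests at its root, which is the attribute that best separates the classes. The sketch below ranks attributes by information gain, the criterion used by Id3 (C4.5 uses the related gain ratio). The rows, attribute names and labels are hypothetical toy data, not taken from wine.arff, and numeric attributes would first need a threshold split that this sketch does not implement.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on a nominal attribute."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: "color" determines the class, "sweet" does not.
rows = [
    {"color": "red", "sweet": "yes"},
    {"color": "red", "sweet": "no"},
    {"color": "white", "sweet": "yes"},
    {"color": "white", "sweet": "no"},
]
labels = ["A", "A", "B", "B"]

gains = {a: information_gain(rows, labels, a) for a in ("color", "sweet")}
best = max(gains, key=gains.get)
print(gains, best)
```

In this toy case "color" has a gain of 1 bit and "sweet" a gain of 0, so a tree would split on "color" first.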


  3. Perform the same analysis of sunburn.arff as in 2, but use 5-fold cross-validation instead of 10-fold.

A)-E) Same as in 2.

F) Why could we not use 10-fold cross-validation in this example?
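A quick way to reason about the fold count is to look at how many instances each fold would receive. The sketch below assumes a dataset of 8 instances. That count is a hypothetical value used for illustration, so check the number of instances in your copy of sunburn.arff. The helper names are also hypothetical.

```python
def fold_sizes(n_instances, k):
    """Sizes of k folds when n_instances are dealt out as evenly as possible."""
    base, extra = divmod(n_instances, k)
    return [base + 1 if i < extra else base for i in range(k)]

def usable_folds(n_instances, k):
    """A k-fold split is only meaningful if every fold gets at least one instance."""
    return all(size > 0 for size in fold_sizes(n_instances, k))

n = 8  # assumed instance count, for illustration only

print(fold_sizes(n, 10), usable_folds(n, 10))
print(fold_sizes(n, 5), usable_folds(n, 5))
```

With 8 instances, 10 folds would leave two folds empty, whereas 5 folds each hold one or two instances. Stratification tightens the constraint further, since a class with fewer instances than folds cannot be represented in every fold.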

  4. Choose one of the following three files: soybean.arff, zoo.arff, or zoo2_x.arff, and use any two schemes of your choice to build and compare the models.