I need answers to the questions in the following uploaded document.



Laboratory II:


To download additional .arff data sets go to:

the Weka data folder for

BreastTumor.arff

http://www.hakank.org/weka/ for

zoo.arff, wine.arff, soybean.arff, zoo2_x.arff, sunburn.arff, disease.arff,

bodyfat.arff, sleep.arff, pollution.arff


  1. Produce a hierarchical clustering (COBWEB) model for iris data. How many clusters did it produce? Why? Does it make sense? What did you expect?

Change the acuity and cutoff parameters to produce a model similar to the one obtained in the book. Use the "Classes to clusters evaluation" mode. What does that tell you?
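As a rough illustration of what a classes-to-clusters evaluation measures, the sketch below labels each cluster with its majority class and counts the instances that disagree with that label. This is a simplification written for this note, not Weka's own implementation, which may map classes to clusters differently. The function name `classes_to_clusters_error` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def classes_to_clusters_error(cluster_ids, class_labels):
    """Label each cluster with its majority class; return (mapping, error_rate)."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster].append(label)
    # Majority class per cluster (a simplification of what a tool like Weka does).
    mapping = {c: Counter(labels).most_common(1)[0][0] for c, labels in members.items()}
    wrong = sum(1 for c, y in zip(cluster_ids, class_labels) if mapping[c] != y)
    return mapping, wrong / len(class_labels)


# Hypothetical toy assignment: 6 instances, 2 clusters, 2 classes.
clusters = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", "b", "b", "b"]
mapping, error = classes_to_clusters_error(clusters, classes)
print(mapping, round(error, 3))
```

A low error suggests the clusters line up with the known classes; a high error, or many more clusters than classes, suggests the clustering found a different structure.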



  2. Use the following learning schemes to analyze the wine data (in wine.arff).

C4.5

- weka.classifiers.j48.J48

Decision List

- weka.classifiers.PART


A) What is the most important descriptor (attribute) in wine.arff?

B) How well were these two schemes able to learn the patterns in the data set? How would you quantify your answer?

C) Compare the training-set and 10-fold cross-validation scores of the two schemes.

D) Would you trust these two models? Did they really learn what is important for proper classification of wine?

E) Which one would you trust more, even if only very slightly?
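The comparison between training-set and cross-validation scores can be sketched without Weka. The example below is only an illustration: it swaps in a one-nearest-neighbour classifier instead of J48 or PART so that it needs no libraries, uses plain (unstratified) folds, and runs on a hypothetical one-attribute data set. All function names are made up for this sketch.

```python
def nearest_label(train, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    return min(train, key=lambda item: abs(item[0] - x))[1]


def accuracy(train, test):
    """Fraction of test instances whose predicted label matches the true one."""
    hits = sum(1 for x, y in test if nearest_label(train, x) == y)
    return hits / len(test)


def cross_validation_accuracy(data, k):
    """Plain k-fold cross-validation; fold i holds out every k-th instance."""
    hits = 0
    for i in range(k):
        test = data[i::k]
        train = [item for j, item in enumerate(data) if j % k != i]
        hits += sum(1 for x, y in test if nearest_label(train, x) == y)
    return hits / len(data)


# Hypothetical data set: (attribute value, class label).
data = [(1.0, "low"), (1.2, "low"), (1.9, "high"),
        (2.1, "low"), (3.0, "high"), (3.3, "high")]

train_acc = accuracy(data, data)            # evaluated on the data it was built from
cv_acc = cross_validation_accuracy(data, 3)  # evaluated on held-out folds
print(train_acc, round(cv_acc, 3))
```

The gap between the two numbers is the point: the training-set score is optimistic because the model has already seen every instance, while the cross-validation score estimates performance on unseen data.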


  3. Perform the same analysis of sunburn.arff as in 2, but use 5-fold cross-validation instead of 10-fold.

A)-E) Same as in 2.

F) Why could we not use 10-fold cross-validation in this example?
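The arithmetic behind fold sizes can be sketched as follows. The instance count of 8 is a hypothetical value chosen for illustration; check the actual number of instances in sunburn.arff before drawing conclusions. The helper `fold_sizes` is not part of Weka.

```python
def fold_sizes(n_instances, k):
    """Split n_instances into k folds as evenly as possible; return the fold sizes."""
    base, extra = divmod(n_instances, k)
    return [base + 1 if i < extra else base for i in range(k)]


# Hypothetical data set with 8 instances.
five_fold = fold_sizes(8, 5)
ten_fold = fold_sizes(8, 10)
print(five_fold)  # every fold has at least one test instance
print(ten_fold)   # some folds are empty, so they cannot be evaluated
```

Whenever the number of folds exceeds the number of instances, at least one fold has nothing to test on, which is why a very small data set forces a smaller number of folds.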

  4. Choose one of the following three files: soybean.arff, zoo.arff, or zoo2_x.arff, and use any two schemes of your choice to build and compare the models.

  5. Produce a model for both the Weather and Weather.nominal data sets. Which method(s) did you use? What did the tree(s) look like?