
Individual Predictive Model – BUSI 650

Dr. Said Baadel

Student Name: Student ID:

Answer all questions in the spaces provided. This test must be your own individual work. Once completed, save the file as a PDF and submit it.

** I agree that the work in this assignment is my own. I acknowledge that I am expected to exercise the utmost academic integrity in all work submitted for this course. I also acknowledge that I have read the FAQ posted on Moodle and understand the consequences of plagiarism. **

One of the main concerns of the United Nations' Sustainable Development Goals is maternal mortality around the world, especially in less developed countries. A study in 2020¹ analyzed the risk factors affecting women during pregnancy. The attached data is presented to you for analysis. Upon graduating from UCW, you have been tasked with using Machine Learning (ML) to predict whether women with certain health attributes are at high risk of maternal mortality or not.

Data Set Information:

The dataset provided contains 1020 instances of pregnant women, each with 7 attributes, including the risk level (high-risk 276, mid-risk 336, and low-risk 408). The 7 attributes recorded are:

Age (in years),
Systolic Blood Pressure (maximum pressure the heart exerts while beating) in mmHg as SystolicBP,
Diastolic BP (amount of pressure in the arteries between beats) in mmHg as DiastolicBP,
Blood Sugar (blood glucose level, expressed as a molar concentration) in mmol/L as BS,
Body Temperature as BodyTemp,
Heart Rate (resting heart rate) in beats per minute as HeartRate, and
RiskLevel (whether high/medium/low risk of mortality).

1. Draw a simple confusion matrix for the maternal mortality risk test (2 Marks).

2. Draw one table highlighting the performance of the following classifiers: RIPPER, PART, Decision Table, Random Forest, J48, Random Tree, Artificial Neural Network, Simple Logistic, and Naïve Bayes.

In this table, highlight and group the classifiers by type (i.e., bayes, functions, trees, and rule-based). Report the following performance measures for each classifier: Accuracy, Sensitivity/Recall, Specificity, Precision, F-Measure, and ROC Area. (4 Marks)
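As a reminder of how most of the measures listed above relate to a confusion matrix, the following is a minimal Python sketch, not a prescribed method. It treats each risk level in turn as the "positive" class (one-vs-rest). The counts in `toy_matrix` are invented purely for illustration and are not taken from the dataset; the function and variable names are likewise assumptions. ROC Area is not included because it needs the classifier's predicted scores, not only the matrix counts.

```python
def accuracy(matrix):
    """Fraction of all instances that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_metrics(matrix, labels):
    """One-vs-rest measures for each class.

    matrix[i][j] = number of instances whose actual class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in matrix)
    results = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]                         # actual i, predicted i
        fn = sum(matrix[i]) - tp                  # actual i, predicted other
        fp = sum(row[i] for row in matrix) - tp   # actual other, predicted i
        tn = total - tp - fn - fp                 # everything else
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        results[label] = {
            "recall": recall,
            "specificity": specificity,
            "precision": precision,
            "f_measure": f_measure,
        }
    return results


# Hypothetical counts for illustration only (rows = actual, columns = predicted).
toy_labels = ["high", "mid", "low"]
toy_matrix = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 2, 8],
]

toy_accuracy = accuracy(toy_matrix)
toy_metrics = per_class_metrics(toy_matrix, toy_labels)

if __name__ == "__main__":
    print(f"accuracy = {toy_accuracy:.3f}")
    for label, scores in toy_metrics.items():
        print(label, {name: round(value, 3) for name, value in scores.items()})
```

Replacing `toy_matrix` with the confusion matrix reported by a classifier should give values that can be checked against the tool's own summary output.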

3. Analyze the results from question 2 above. Explain in detail the performance of the classifiers by comparing the classifier types (3 Marks). Select the one classifier you would recommend as best suited to the dataset, and state which measure(s) your recommendation is based on and why. (2 Marks)

4. Run a cluster analysis algorithm (simple k-means) on the dataset. Did the algorithm do a good job of clustering the dataset, given that we know the class attribute (i.e., high/mid/low risk)? (2 Marks)
Explain your answer based on the results. (2 Marks)
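One way to compare clusters against known risk levels is to cross-tabulate the cluster assignments with the class labels, assign each cluster its majority class, and count how many instances agree. The Python sketch below illustrates that idea under simplifying assumptions: a plain majority-vote mapping, which is not necessarily what any particular tool computes. The cluster ids and labels are invented toy values, not results from the dataset, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def majority_mapping(cluster_ids, true_labels):
    """Map each cluster to its most frequent class and measure agreement.

    Returns (mapping, agreement), where agreement is the fraction of
    instances whose class equals the majority class of their cluster.
    """
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        by_cluster[cluster][label] += 1

    mapping = {
        cluster: counts.most_common(1)[0][0]
        for cluster, counts in by_cluster.items()
    }
    agreeing = sum(counts[mapping[cluster]] for cluster, counts in by_cluster.items())
    return mapping, agreeing / len(true_labels)


# Hypothetical assignments for illustration only.
toy_clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
toy_classes = ["high", "high", "mid", "mid", "mid", "low", "low", "low", "low", "high"]

toy_mapping, toy_agreement = majority_mapping(toy_clusters, toy_classes)

if __name__ == "__main__":
    print(toy_mapping)
    print(f"agreement = {toy_agreement:.2f}")
```

An agreement close to 1 would suggest the clusters line up with the risk levels; a value close to the share of the largest class would suggest they do not.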

¹ Ahmed M., Kashem M.A., Rahman M., Khatun S. (2020) Review and Analysis of Risk Factor of Maternal Health in Remote Area Using the Internet of Things (IoT).