Using WEKA https://waikato.github.io/weka-wiki/downloading_weka/

Dr. Said Baadel

Phishing is an attempt to gain sensitive personal and financial information (such as usernames and passwords, account details, and social security numbers) with malicious intent via online deception. A study by Abdelhamid et al. (2014) experimented on over 1350 websites collected from different phishing data archives (752 phishing sites and 601 legitimate ones). The authors identified 16 common features that can help assess and predict any website type using common Machine Learning classification algorithms. The table below shows the features selected for the dataset provided. The authors categorized the values of the collected features as Legitimate (1), Suspicious (0) and Phishy (-1).

Simple rule-based algorithms can be used to predict whether a website is legitimate or phishy. According to the authors, the following three rules demonstrate this.

1. Phishers hide the suspicious part of the URL to redirect information submitted by users or to redirect the uploaded page to a suspicious domain. Some researchers have suggested that when the URL length is greater than 54 characters, the URL can be considered phishy.
   Rule: If URL length < 54 -> Legit
         else if URL length ≥ 54 and ≤ 75 -> Suspicious
         else -> Phishy

2. The ‘@’ symbol leads the browser to ignore everything before it and redirects the user to the link typed after it.
   Rule: If URL has ‘@’ -> Phishy
         else -> Legit

3. Another technique used by phishers to scam users is adding a subdomain to the URL so that users may believe they are dealing with an authentic website.
   Rule: If dots in domain < 3 -> Legit
         else if dots in domain = 3 -> Suspicious
         else -> Phishy

Using WEKA software, answer the following questions based on the Phishing dataset provided.

a) Draw a simple confusion matrix (a general one, not from WEKA) of the possible data scenarios for this Phishing dataset.
b) Draw a table that outlines the Accuracy, Precision, Recall, F-Measure, and ROC Area of the following rule-based algorithms: RIPPER (JRip), PART, and Decision Table.

c) Use decision tree algorithms (Random Forest, Random Tree) and an Artificial Neural Network (Multilayer Perceptron) to compare with the results in part b) above. Do these give better prediction accuracy on this dataset?

d) What is your conclusion from these experiments regarding the ML algorithms used?

Save your work as PDF and submit on Moodle.

Instructions: Download WEKA using the link provided. After successful installation, run the app. In the Applications column, click on “Explorer”. In the following dialog box, click “Open file…” and select the Phishing dataset given. WEKA loads the dataset and summarizes its attributes. Select the “Classify” tab and pick the Machine Learning classification algorithm needed to answer the questions above.

Reference:
Abdelhamid et al. (2014). Phishing Detection based Associative Classification Data Mining. Expert Systems With Applications (ESWA), Vol. 41, pp. 5948-5959.
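
As an illustration of how the three example rules (URL length, ‘@’ symbol, and dots in the domain) map onto the Legitimate (1), Suspicious (0), Phishy (-1) coding, here is a minimal Python sketch. It is not part of WEKA and not the authors' implementation. The function names are hypothetical. The sketch assumes the Suspicious band for URL length is 54 to 75 characters inclusive, that URLs include a scheme such as http://, and that "dots in domain" means dots counted in the full hostname (the paper may preprocess the domain differently).

```python
from urllib.parse import urlparse

# Feature value coding used in the dataset description.
LEGITIMATE, SUSPICIOUS, PHISHY = 1, 0, -1


def url_length_rule(url: str) -> int:
    """Rule 1: classify by total URL length (thresholds 54 and 75 assumed inclusive band)."""
    length = len(url)
    if length < 54:
        return LEGITIMATE
    if length <= 75:
        return SUSPICIOUS
    return PHISHY


def at_symbol_rule(url: str) -> int:
    """Rule 2: any '@' in the URL is treated as phishy."""
    return PHISHY if "@" in url else LEGITIMATE


def subdomain_rule(url: str) -> int:
    """Rule 3: classify by the number of dots in the hostname.

    Assumption: dots are counted in the whole hostname returned by
    urlparse, with no stripping of 'www.' or country-code suffixes.
    """
    hostname = urlparse(url).hostname or ""
    dots = hostname.count(".")
    if dots < 3:
        return LEGITIMATE
    if dots == 3:
        return SUSPICIOUS
    return PHISHY


if __name__ == "__main__":
    sample = "http://login.secure.paypal.example.com/update"
    print(url_length_rule(sample), at_symbol_rule(sample), subdomain_rule(sample))
```

Each function returns a single feature value, which is the same kind of value the dataset stores per website; the classifiers in WEKA then learn from all 16 such features together.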