QUESTION

Please answer this homework related to Data Science and Big Data Analysis in APA format with References and Citations

Q1 Perform:

a. Text extraction & creating a corpus

b. Text pre-processing

c. Create the DTM & TDM from the corpus

d. Exploratory text analysis

e. Feature extraction by removing sparsity

f. Build the classification models and compare Logistic Regression to Random Forest regression

https://medium.com/analytics-vidhya/customer-review-analytics-using-text-mining-cd1e17d6ee4e

Q2 – Analyze the customer reviews in the file Restaurant_Reviews.tsv

a. Explain each step for the following text clean-up commands

corpus = VCorpus(VectorSource(dataset_original$Review))

corpus = tm_map(corpus, content_transformer(tolower))

corpus = tm_map(corpus, removeNumbers)

corpus = tm_map(corpus, removePunctuation)

corpus = tm_map(corpus, removeWords, stopwords())

corpus = tm_map(corpus, stemDocument)

corpus = tm_map(corpus, stripWhitespace)

Uncomment in order to see the impact:

#as.character(corpus[[841]])

#as.character(corpus[[1]])

b. What is the classification question?

c. Use the confusion matrix (CM) for the Random Forest classifier to calculate:

TP = # True Positives,

TN = # True Negatives,

FP = # False Positives,

FN = # False Negatives,

Accuracy = (TP + TN) / (TP + TN + FP + FN)

d. Apply the logistic regression classifier to the problem and recalculate "Q2c", i.e. TP, TN, FP, FN, Accuracy

e. Apply the SVM classifier to the same question and recalculate "Q2c"
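
For reference, the TP, TN, FP, FN and Accuracy quantities asked for in Q2c can be read off a confusion matrix as in the minimal base R sketch below. The `actual` and `predicted` vectors are made-up illustration values, not results from Restaurant_Reviews.tsv, and coding 1 as a positive review and 0 as a negative review is an assumption.

```r
# Hypothetical labels for illustration only (assumed coding: 1 = positive, 0 = negative).
actual    <- c(1, 0, 1, 1, 0, 0, 1, 0)
predicted <- c(1, 0, 0, 1, 0, 1, 1, 0)

# Confusion matrix: rows are the actual classes, columns are the predicted classes.
# Fixing the factor levels keeps the table 2 x 2 even if a class is never predicted.
cm <- table(Actual    = factor(actual,    levels = c(0, 1)),
            Predicted = factor(predicted, levels = c(0, 1)))

TP <- cm["1", "1"]  # actual positive, predicted positive
TN <- cm["0", "0"]  # actual negative, predicted negative
FP <- cm["0", "1"]  # actual negative, predicted positive
FN <- cm["1", "0"]  # actual positive, predicted negative

# Accuracy as defined in Q2c.
accuracy <- (TP + TN) / (TP + TN + FP + FN)

print(cm)
print(accuracy)
```

The same four cells and the same formula apply to whichever classifier produced `predicted`, so the calculation can be repeated unchanged for the logistic regression and SVM parts of the question.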

Q3: Study the quanteda toolkit for R

Q3a: Compare quanteda to alternative R packages for quantitative text analysis (tm, tidytext, corpus, and koRpus)

Q3b: Run install.packages("quanteda") and then library(quanteda), and explain the different features of the quanteda package for text analysis

Q4 Spam Text Message Classification - Use the quanteda package to perform “spam” classification on the text message file in Q4

The file name: Q4.spam-text-message-classification.zip

a. Create the "word" cloud for spam and ham messages

b. Apply a Naïve Bayes Classifier and compute TP, TN, FP, FN, Accuracy

c. Use a Logistic Regression Classifier and compute TP, TN, FP, FN, Accuracy

d. Use a Random Forest Classifier and compute TP, TN, FP, FN, Accuracy

Q5. The State of the Union is an annual address by the President of the United States before a joint session of Congress. In it, the President reviews the previous year and lays out his legislative agenda for the coming year.

This dataset contains the full text of the State of the Union address from 1989 (Reagan) to 2017 (Trump).

a.Topic modelling: Which topics have become more popular over time? Which have become less popular?

b.Sentiment analysis: Are there differences in tone between different Presidents? Presidents from different parties?
