How to answer this question.
Answer in essay format. Your answer should include:
1) How you would answer the question. This should consider the following, but your answer should
use sentences, not bullet points:
a) What data are you using (files and fields or calculated fields)?
b) What records are you including or excluding from your calculation? This should be data driven,
so what profiling would you do to decide what records to use?
c) What transformations would you need to do?
d) What DataFrames are you going to create to answer the question (what they
contain, not their names)? What joins, aggregations, etc. are required?
If it's easier to describe any of the above using Spark or SQL terminology, that's OK, but do not
write code. You need to be able to describe to a manager what you are doing and why.
e) Say how you would show your result. Draw a graph to illustrate a possible answer. This should be
neatly drawn and described - what is it telling the reader?
f) Identify any issues specific to your question/answer and possible solutions to those issues (that
you could implement). Discuss the limitations of your analysis. For this part, do NOT list issues
such as: some reviews could be fake, we don't know why users write reviews, businesses could be
paying for reviews, people may not write what they think, etc. Those are generic issues about
Yelp's business model and you could not address them.
Question One:
There is research to suggest that reviewers (not just on Yelp) are influenced by the reviews posted by
other reviewers. This suggests that, as a business accumulates reviews, later reviewers may
rate closer to the average for that business (e.g., a Yelp reviewer was going to rate a business
a "2", but saw that its average after 50 reviews is 4.5 and decides to rate closer to that
average). In this question you will focus on restaurants. As restaurants accumulate reviews, does the
range of those reviews change? For example, are the first 10 reviews at a restaurant widely dispersed
across the ratings 1‐5, the next 10 narrower (e.g., 2‐3 stars), and the 10 after that narrower still?
Your answer should consider as many restaurants as you can (if you are eliminating some restaurants, say
why). However, it is the range at each business that matters, not its level: a bad restaurant may have
a range of 1‐2 stars and a great restaurant a range of 4‐5 stars, yet both would have a 2‐star range.
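
As a rough illustration of the calculation Question One describes, the sketch below groups each restaurant's reviews into consecutive blocks of 10 in date order and averages the star range per block across restaurants. The function name `range_by_block`, the tuple layout (business id, date, stars), and the choice to drop incomplete blocks are all assumptions made for this example, not part of the assignment or the data files.

```python
from collections import defaultdict


def range_by_block(reviews, block_size=10):
    """Return {block_index: mean star range across restaurants}.

    `reviews` is an iterable of (business_id, date, stars) tuples.
    This layout is hypothetical; adapt it to the real fields.
    """
    by_business = defaultdict(list)
    for business_id, date, stars in reviews:
        by_business[business_id].append((date, stars))

    ranges = defaultdict(list)
    for rows in by_business.values():
        # ISO-formatted date strings sort chronologically.
        rows.sort(key=lambda row: row[0])
        stars = [s for _, s in rows]
        # Keep only complete blocks so every range rests on block_size reviews.
        for i in range(len(stars) // block_size):
            block = stars[i * block_size:(i + 1) * block_size]
            # Count star levels covered, so ratings of 1 and 2 give a range of 2,
            # matching the "2-star range" wording in the question.
            ranges[i].append(max(block) - min(block) + 1)

    return {i: sum(v) / len(v) for i, v in sorted(ranges.items())}
```

Restaurants with fewer than `block_size` reviews contribute nothing here, and later blocks are averaged over fewer restaurants than earlier ones, which is one limitation a written answer would need to discuss.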
Question Two:
Any review can be voted on by other users as being funny, useful, or cool. Assume your teammate has a
hypothesis that people find negative reviews funnier, so if a Yelper was going to review a business where
they had a bad experience, they may try to be funnier than the prior negative reviews for that business. In
this question you are going to focus on businesses in the shopping category, and in particular, you are
exploring whether businesses that are rated lower have a higher share of reviews that are funny than
businesses that are highly rated.
Your answer should consider as many businesses in the shopping category as you can (if you are
eliminating some businesses, say why). Keep in mind that it's the number of funny reviews or votes
compared to total reviews for a business that matters. If one business had 10 reviews and another
business had 100 reviews, there would likely be more funny reviews in total for the business with 100
reviews.
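
The normalisation described above can be sketched as follows. This is a minimal example that treats a review as "funny" when it has at least one funny vote; that threshold, the function name `funny_share`, and the (business id, funny votes) tuple layout are assumptions for illustration only.

```python
from collections import defaultdict


def funny_share(reviews):
    """Return {business_id: share of its reviews with at least one funny vote}.

    `reviews` is an iterable of (business_id, funny_votes) tuples
    (a hypothetical layout, not the actual file schema).
    """
    total = defaultdict(int)
    funny = defaultdict(int)
    for business_id, funny_votes in reviews:
        total[business_id] += 1
        if funny_votes > 0:  # assumed definition of a "funny" review
            funny[business_id] += 1
    return {b: funny[b] / total[b] for b in total}
```

Dividing by each business's own review count puts a business with 10 reviews and one with 100 on the same scale; the shares could then be compared against each business's average rating.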
Question Three:
Some users have lots of fans (followers), and others do not. Possibly this could be related to the number
of "useful" votes a Yelper's reviews have earned. Any Yelper can vote on whether a review by another
Yelper was useful. In this question you are looking at whether the number of fans a user has is related
more to (a) the number of useful votes on their most useful review (the one with the most useful votes),
or (b) the average number of useful votes across their reviews.
Your answer needs to keep in mind what data we have. We have all of the reviews for 10 metro areas and
all of the users who wrote those reviews. For many users we do NOT have every review that user wrote.
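
One possible way to compare options (a) and (b) is to correlate fan counts with each measure. The sketch below uses a hand-written Pearson correlation; the names `pearson` and `fans_vs_useful` and the input layouts are hypothetical, and Pearson correlation is only one assumed choice of measure. Both per-user statistics are computed only from the reviews present in the data, so they are incomplete for users with reviews outside the 10 metro areas.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fans_vs_useful(fans_by_user, reviews):
    """Return (r_max, r_mean): correlation of fans with each user's
    maximum and mean useful votes.

    `fans_by_user` maps user_id -> fans; `reviews` is an iterable of
    (user_id, useful_votes) tuples. Both layouts are assumptions.
    """
    useful = defaultdict(list)
    for user_id, votes in reviews:
        useful[user_id].append(votes)

    # Keep only users who have at least one review in the data.
    users = [u for u in fans_by_user if u in useful]
    fans = [fans_by_user[u] for u in users]
    max_useful = [max(useful[u]) for u in users]
    mean_useful = [sum(useful[u]) / len(useful[u]) for u in users]
    return pearson(fans, max_useful), pearson(fans, mean_useful)
```

Whichever of the two returned values is larger in magnitude indicates the measure more closely related to fans under this linear measure.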