(Statistics and probability) Use Excel to complete the questions.
The ATP Tour is the top men’s professional tennis circuit. The data set 'tennis.csv' contains match statistics on ATP Tour matches played in 2020 (before tennis tournaments were suspended due to COVID-19).
The definitions of the variables in the data set are as follows:
tourney_id – a unique identifier for each tournament
tourney_name – tournament name
surface – type of court surface (hard, clay, grass)
tourney_date – date the tournament began; eight digits, YYYYMMDD
winner_name – name of the winning player
winner_hand – dominant hand of winner (left or right)
winner_ht – height in centimetres where available
winner_ioc – country represented by the winner
winner_age – age of winner, in years, as of the tourney_date
loser_name – name of losing player
loser_hand – dominant hand of loser (left or right)
loser_ht – height in centimetres where available
loser_ioc – country represented by losing player
loser_age – age of loser, in years, as of the tourney_date
score – match score
round – stage of the tournament at which the match was played (RR – round robin; R32 – round of 32 (meaning 32 players left in the draw); R16 – round of 16; QF – quarter-final; SF – semi-final; F – final)
minutes – match length (in minutes), where available
w_ace - winner's number of aces
w_df - winner's number of double faults
w_svpt – winner’s number of serve points
w_1stIn - winner's number of first serves made
w_1stWon - winner's number of first-serve points won
w_bpSaved - winner's number of break points saved
w_bpFaced - winner's number of break points faced
l_ace – loser’s number of aces
l_df – loser’s number of double faults
l_svpt – loser’s number of serve points
l_1stIn - loser's number of first serves made
l_1stWon - loser's number of first-serve points won
l_bpSaved - loser's number of break points saved
l_bpFaced - loser's number of break points faced
winner_rank – ATP Tour ranking of the winner as of the tourney_date
loser_rank – ATP Tour ranking of the loser as of the tourney_date
All matches in the data set were decided over the best of three sets.
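The variable definitions above have to be turned into a few derived quantities before either test can be run: a readable date from the eight-digit tourney_date, an early/late stage label from round (using the grouping given in Question 2), and flags that leave blank cells out rather than treating them as zero. The sketch below shows one way to do this in Python, purely to make the manipulation concrete; the helper names, the "later"/"earlier" labels and the sample rows are hypothetical, and reading "longer than 2 hours" as strictly more than 120 minutes is an assumption.

```python
import csv
import io
from datetime import date, datetime

# Stage groupings as defined in Question 2 (labels "later"/"earlier" are hypothetical).
LATER_ROUNDS = {"QF", "SF", "F"}
EARLIER_ROUNDS = {"RR", "R32", "R16"}


def to_float(value):
    """Return a float, or None for a blank cell."""
    value = value.strip()
    return float(value) if value else None


def parse_tourney_date(raw):
    """Convert an eight-digit YYYYMMDD string such as '20200106' to a date."""
    return datetime.strptime(raw.strip(), "%Y%m%d").date()


def stage(round_code):
    """Map a round code to 'later', 'earlier', or None if it is in neither group."""
    if round_code in LATER_ROUNDS:
        return "later"
    if round_code in EARLIER_ROUNDS:
        return "earlier"
    return None


def derive_fields(row):
    """Build the derived variables from one csv.DictReader row."""
    winner_age = to_float(row["winner_age"])
    loser_age = to_float(row["loser_age"])
    minutes = to_float(row["minutes"])
    return {
        "date": parse_tourney_date(row["tourney_date"]),
        # None when either age is blank, so the match can be excluded later.
        "age_diff": (None if winner_age is None or loser_age is None
                     else winner_age - loser_age),
        "stage": stage(row["round"].strip()),
        # Assumption: "longer than 2 hours" means strictly more than 120 minutes.
        "long_match": None if minutes is None else minutes > 120,
    }


# Hypothetical rows for illustration only; not taken from tennis.csv.
SAMPLE = """tourney_date,round,winner_age,loser_age,minutes
20200106,R32,24.5,28.1,95
20200106,F,32.7,23.6,131
20200203,QF,21.0,,140
"""

rows = [derive_fields(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

In Excel the same steps would be new columns (for example, winner_age minus loser_age, and an IF on minutes > 120), with rows containing blank cells filtered out.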
Use the data set to answer the following questions:
Question 1 [15 marks]
Does age matter in determining who wins a match? For example, do older players have the advantage because they have more years of experience? Or do younger players have the advantage because they are fitter? Perform a hypothesis test to answer this question.
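Because every match pairs one winner with one loser, one possible approach (not the only valid one) is a paired comparison: define d = winner_age − loser_age for each match and test H0: μ_d = 0 against HA: μ_d ≠ 0 with a one-sample t statistic t = d̄ / (s_d / √n) on n − 1 degrees of freedom. The sketch below assumes the differences have already been computed, drops blanks, and uses a large-sample normal approximation for the two-sided p-value; in Excel the exact t-based value would come from T.DIST.2T(ABS(t), n − 1). The function name and the sample differences are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_age_test(age_diffs):
    """One-sample t statistic for H0: mu_d = 0, where d = winner_age - loser_age.

    Blank (None) differences are dropped before computing anything.
    Returns (n, d_bar, s_d, t_stat, two-sided p-value via a normal approximation).
    """
    d = [x for x in age_diffs if x is not None]
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)  # sample standard deviation (divides by n - 1)
    t_stat = d_bar / (s_d / math.sqrt(n))
    # Normal approximation to the t distribution with n - 1 df (reasonable for large n).
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return n, d_bar, s_d, t_stat, p_value


# Hypothetical age differences (years); None stands for a match with a missing age.
HYPOTHETICAL_DIFFS = [2.0, -1.0, None, 3.0, 0.0, -2.0, 4.0]
n, d_bar, s_d, t_stat, p_value = paired_age_test(HYPOTHETICAL_DIFFS)
```

With only six values the normal approximation is poor; it is used here only to keep the sketch dependency-free, and with the full data set the t and normal p-values should be very close.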
Question 2 [15 marks]
Are later-stage matches (that is, round = quarter-final (QF), semi-final (SF) or final (F)) more likely to last longer than 2 hours than earlier-stage matches (that is, round = RR, R32 or R16)? Tighter battles might be fought in the later rounds, for example because the two players' rankings are closer or because of the adrenaline of a final-round match. Perform a hypothesis test to answer this question.
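Reading "more likely" as a one-sided comparison, this becomes a Z test for the difference between two proportions: with p_L and p_E the population proportions of later- and earlier-stage matches lasting more than 120 minutes, test H0: p_L − p_E = 0 against HA: p_L − p_E > 0, using the pooled proportion under H0 in the standard error. The sketch below is a minimal illustration under that reading; the function name and the counts are hypothetical, and the Excel equivalent of the p-value would be 1 − NORM.S.DIST(z, TRUE).

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x_later, n_later, x_earlier, n_earlier):
    """Z test for H0: p_L - p_E = 0 vs HA: p_L - p_E > 0 (one-sided, upper tail).

    x_* = matches lasting more than 120 minutes, n_* = matches with a recorded length.
    """
    p_l = x_later / n_later
    p_e = x_earlier / n_earlier
    # Pooled proportion: under H0 both groups share one proportion.
    p_pool = (x_later + x_earlier) / (n_later + n_earlier)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_later + 1 / n_earlier))
    z = (p_l - p_e) / se
    p_value = 1 - NormalDist().cdf(z)  # P(Z >= z) for the upper-tail test
    return p_l, p_e, z, p_value


# Hypothetical counts for illustration only; not taken from tennis.csv.
p_l, p_e, z, p_value = two_proportion_z_test(30, 100, 20, 100)
```

The normal approximation behind this test is usually considered adequate when each group has at least about five "long" and five "not long" matches.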
For each of Question 1 and Question 2, please provide:
[2 marks] Please briefly describe the data manipulation you had to perform to carry out the test (e.g., ignored any blank cells; created a new variable defined as …..)
[1 mark] The null (H0) and alternative (HA) hypotheses. In doing so, please define the population parameter you are testing (e.g., let μ be the population mean age of ATP tennis players; let p be the population proportion of first serves in).
[1 mark] The type of test you are going to carry out (e.g., one-sample test for the population mean; Z test for the difference between two proportions).
[2 marks] The summary statistics you are going to need to carry out the test (e.g., sample means; sample proportions; sample standard deviations).
[2 marks] The value of the test statistic and the formula you used to calculate the test statistic.
[1 mark] The critical value(s).
[1 mark] The p-value of the test and the mathematical expression you used to calculate the p-value.
[1 mark] Your decision whether to reject H0 or not and why.
[1 mark] State the conclusion of your test in the context of the question (so in relation to the applied research question).
[3 marks] Discuss any limitations of your analysis. For example: the assumptions required by your hypothesis test were not met by the data; the method used to handle missing data; bias due to the sampling design. Be specific about the limitations of the 'tennis.xls' data set, and also discuss how the analysis could be improved to overcome these limitations.
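The critical-value and decision items in the list above can be illustrated with the standard normal distribution. The sketch below assumes a significance level of α = 0.05 (the question does not fix one) and covers two-sided, upper-tail and lower-tail tests; for a t test with few degrees of freedom the critical value should instead come from the t distribution (in Excel, T.INV.2T(α, n − 1)), while the Excel equivalent for the z case is NORM.S.INV. The function names are hypothetical.

```python
from statistics import NormalDist


def z_critical_values(alpha=0.05, tail="two-sided"):
    """Critical value(s) of the standard normal for significance level alpha."""
    z = NormalDist()
    if tail == "two-sided":
        c = z.inv_cdf(1 - alpha / 2)
        return (-c, c)
    if tail == "upper":
        return (z.inv_cdf(1 - alpha),)
    if tail == "lower":
        return (z.inv_cdf(alpha),)
    raise ValueError("tail must be 'two-sided', 'upper' or 'lower'")


def decide(p_value, alpha=0.05):
    """Reject H0 when the p-value is below the significance level."""
    return "reject H0" if p_value < alpha else "do not reject H0"
```

Comparing the test statistic with the critical value(s) and comparing the p-value with α lead to the same decision; reporting both, as the marking scheme asks, is a useful consistency check.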
STAT1008 – Quantitative Research Methods Final Examination – Part 2
Semester 1 2020