(Statistics and probability) Use Excel to complete the questions.

The ATP Tour is the top men’s professional tennis circuit. The data set 'tennis.csv' contains match statistics on ATP Tour matches played in 2020 (before tennis tournaments were suspended due to COVID-19).

The definitions of the variables in the data set are as follows:

  • tourney_id – a unique identifier for each tournament

  • tourney_name – tournament name

  • surface – type of court surface (hard, clay, grass)

  • tourney_date – date the tournament began; eight digits, YYYYMMDD


  • winner_name – name of the winning player

  • winner_hand – dominant hand of winner (left or right)

  • winner_ht – height in centimetres where available

  • winner_ioc – country represented by the winner

  • winner_age – age of winner, in years, as of the tourney_date

  • loser_name – name of losing player

  • loser_hand – dominant hand of loser (left or right)

  • loser_ht – height in centimetres where available

  • loser_ioc – country represented by losing player

  • loser_age – age of loser, in years, as of the tourney_date


  • score – match score

  • round – stage of tournament match was played (RR – round robin; R32 – round of 32 (meaning 32 players left in the draw); R16 – round of 16; QF-quarter final; SF-semifinal; F – final)

  • minutes – match length (in minutes), where available


  • w_ace – winner’s number of aces

  • w_df – winner’s number of double faults

  • w_svpt – winner’s number of serve points

  • w_1stIn – winner’s number of first serves made

  • w_1stWon – winner’s number of first-serve points won

  • w_bpSaved – winner’s number of break points saved

  • w_bpFaced – winner’s number of break points faced


  • l_ace – loser’s number of aces

  • l_df – loser’s number of double faults

  • l_svpt – loser’s number of serve points

  • l_1stIn – loser’s number of first serves made

  • l_1stWon – loser’s number of first-serve points won

  • l_bpSaved – loser’s number of break points saved

  • l_bpFaced – loser’s number of break points faced


  • winner_rank – ATP Tour ranking of the winner as of the tourney_date

  • loser_rank – ATP Tour ranking of the loser as of the tourney_date


All matches in the data set were decided over the best of three sets.

Use the data set to answer the following questions:

Question 1 [15 marks]


Does age matter in determining who wins a match? For example, do older players have the advantage because they have more years of experience? Or do younger players have the advantage because they are fitter? Perform a hypothesis test to answer this question.
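One way to set up Question 1 is sketched below, assuming a paired design: each match pairs one winner with one loser, so the test works on the per-match difference d = winner_age - loser_age, with H0: μ_d = 0 against the two-sided HA: μ_d ≠ 0 (μ_d being the population mean age difference between the winner and loser of a match). The function names are hypothetical, rows are assumed to be dictionaries keyed by the column names listed above, and blank age cells are simply skipped. Because Python's standard library has no Student t distribution, the critical value and p-value use the standard normal as a large-sample approximation; in Excel the exact versions would be T.INV.2T(0.05, n-1) and T.DIST.2T(ABS(t), n-1).

```python
import math
from statistics import NormalDist, mean, stdev


def age_differences(rows):
    """Return winner_age - loser_age for each match where both ages are present.

    Rows with a blank age cell are skipped (one possible way to handle missing data).
    """
    diffs = []
    for row in rows:
        w_age = (row.get("winner_age") or "").strip()
        l_age = (row.get("loser_age") or "").strip()
        if w_age and l_age:
            diffs.append(float(w_age) - float(l_age))
    return diffs


def paired_age_test(diffs, alpha=0.05):
    """Two-sided paired test of H0: mu_d = 0 against HA: mu_d != 0.

    Test statistic: t = d_bar / (s_d / sqrt(n)). The critical value and p-value
    use the standard normal, a large-sample approximation to Student's t with
    n - 1 degrees of freedom (assumption: n is in the hundreds).
    """
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)  # sample standard deviation, n - 1 denominator
    t_stat = d_bar / (s_d / math.sqrt(n))
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    p_value = 2 * (1 - std_normal.cdf(abs(t_stat)))  # 2 * P(Z >= |t|)
    return {
        "n": n,
        "mean_diff": d_bar,
        "sd_diff": s_d,
        "t": t_stat,
        "critical": critical,
        "p_value": p_value,
        "reject_H0": abs(t_stat) > critical,
    }
```

To run it, read the file with `csv.DictReader` (for example `with open("tennis.csv", newline="") as f: rows = list(csv.DictReader(f))`) and pass `age_differences(rows)` to `paired_age_test`. A two-sample comparison of all winner ages against all loser ages would ignore the pairing within each match; the paired form is one defensible choice rather than the only one.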


Question 2 [15 marks]


Are later-stage matches (that is, round = Quarter Final (QF), Semi Final (SF) or Final (F)) more likely to last longer than 2 hours than earlier-stage matches (that is, round = RR, R32 or R16)? Tighter battles might be fought in the later rounds, for example because of the closer rankings of the two players or the adrenaline of a final-round match. Perform a hypothesis test to answer this question.
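A minimal sketch of a two-proportion Z test for Question 2 follows, assuming p1 is the population proportion of later-stage matches (QF, SF, F) lasting more than 120 minutes and p2 the same proportion for earlier-stage matches (RR, R32, R16), with H0: p1 = p2 against the one-sided HA: p1 > p2. Matches with a blank `minutes` cell, and any round outside these two groups, are skipped. The function names and the group labels are hypothetical; the Excel counterparts for the critical value and p-value would be NORM.S.INV(1 - alpha) and 1 - NORM.S.DIST(z, TRUE).

```python
import math
from statistics import NormalDist

LATE_ROUNDS = {"QF", "SF", "F"}
EARLY_ROUNDS = {"RR", "R32", "R16"}


def long_match_counts(rows, threshold=120):
    """Count matches longer than `threshold` minutes in each stage group.

    Returns {"late": [long, total], "early": [long, total]}. Rows with a blank
    `minutes` cell, or a round outside both groups, are skipped.
    """
    counts = {"late": [0, 0], "early": [0, 0]}
    for row in rows:
        minutes = (row.get("minutes") or "").strip()
        rnd = (row.get("round") or "").strip()
        if not minutes:
            continue
        if rnd in LATE_ROUNDS:
            group = "late"
        elif rnd in EARLY_ROUNDS:
            group = "early"
        else:
            continue
        counts[group][1] += 1
        if float(minutes) > threshold:  # strictly longer than 2 hours
            counts[group][0] += 1
    return counts


def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05):
    """One-sided Z test of H0: p1 = p2 against HA: p1 > p2, pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # estimate of the common proportion under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z_stat = (p1 - p2) / se
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha)  # upper-tail critical value
    p_value = 1 - std_normal.cdf(z_stat)  # P(Z >= z)
    return {
        "p_late": p1,
        "p_early": p2,
        "p_pooled": p_pool,
        "z": z_stat,
        "critical": critical,
        "p_value": p_value,
        "reject_H0": z_stat > critical,
    }
```

With `rows` read from tennis.csv via `csv.DictReader`, `c = long_match_counts(rows)` followed by `two_proportion_z_test(*c["late"], *c["early"])` runs the test. The pooled proportion appears in the standard error because the test statistic is computed as if H0 were true.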



For each of Question 1 and Question 2, please provide:

  1. [2 marks] Please briefly describe the data manipulation you had to perform to carry out the test (e.g., ignored any blank cells; created a new variable defined as …..)

  2. [1 mark] The null (H0) and alternative (HA) hypotheses. In doing so, please define the population parameter you are testing (e.g., let μ be the population mean age of ATP tennis players; let p be the population proportion of first serves in).

  3. [1 mark] The type of test you are going to carry out (e.g., one-sample test for the population mean; Z test for the difference between two proportions).

  4. [2 marks] The summary statistics you are going to need to carry out the test (e.g., sample means; sample proportions; sample standard deviations).

  5. [2 marks] The value of the test statistic and the formula you used to calculate the test statistic.

  6. [1 mark] The critical value(s).

  7. [1 mark] The p-value of the test and the mathematical expression you used to calculate the p-value.

  8. [1 mark] Your decision whether to reject H0 or not and why.

  9. [1 mark] State the conclusion of your test in the context of the question (so in relation to the applied research question).

  10. [3 marks] Discuss any limitations of your analysis. For example, the assumptions required by your hypothesis test were not met by the data; the method used to handle missing data; bias due to the sampling design. Be specific to the 'tennis.csv' data set in your discussion of limitations, and also discuss how the analysis could be improved to overcome these limitations.
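For items 5 to 7, the formulas below are one possible write-up, assuming Question 1 is tested as a paired t test on the per-match age difference d_i = winner_age_i - loser_age_i, and Question 2 as a pooled two-proportion Z test where group 1 is the later-stage matches and group 2 the earlier-stage matches (x = number of matches longer than 120 minutes, n = number of matches with a recorded length). A different choice of test would need different formulas. In Excel, the p-values and critical values correspond to T.DIST.2T(ABS(t), n-1) and T.INV.2T(α, n-1) for Question 1, and 1 - NORM.S.DIST(z, TRUE) and NORM.S.INV(1 - α) for Question 2.

```latex
% Question 1: paired t test, H0: \mu_d = 0 vs HA: \mu_d \neq 0
t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}, \qquad
\text{critical values } \pm t_{\alpha/2,\, n-1}, \qquad
p\text{-value} = 2\, P\left(T_{n-1} \ge |t|\right)

% Question 2: pooled two-proportion Z test, H0: p_1 = p_2 vs HA: p_1 > p_2
\hat{p}_1 = \frac{x_1}{n_1}, \qquad
\hat{p}_2 = \frac{x_2}{n_2}, \qquad
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}

z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad
\text{critical value } z_{\alpha}, \qquad
p\text{-value} = P(Z \ge z)
```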




STAT1008 – Quantitative Research Methods Final Examination – Part 2

Semester 1 2020
