Business Statistics Midterm Due TONIGHT JS

Name: ________________________________________________________________________

PLEASE READ THIS PART CAREFULLY BEFORE STARTING THE EXAM.

1. Your submitted exam answers should be in one document, as a Microsoft Word document. Spreadsheet answers are not permitted.

2. Please show all of your work. Do not expect the grader to guess your reasoning. Your grade on the exam will depend on the clarity of your answers, the reasoning you have used, and the correctness of your answers.

3. There are 6 problems in the exam and 100 points in total.

GOOD LUCK!!

Problem 1 (20 points)

Part 1 (12 points)

Commonwealth Health Insurance has become interested in a new type of cancer screening. Early screening of cancer reduces the risk that cancers can develop, leading to greatly improved patient health and less costly procedures. Implementing this new screening procedure costs $40 million every year. Staying with the current screening procedure is far less costly, only costing $5 million every year. Hence, Commonwealth Health Insurance wants to be careful and conduct a rigorous analysis of this important decision. Regardless, this new type of cancer screening has not been put in practice before, and hence, the final cost reduction is uncertain. Commonwealth Health Insurance has assessed that there is an 80% chance that the screening procedure is very successful, and a 20% chance that the screening is not very successful. In the case that the screening procedure is very successful, it is estimated that the yearly costs related to cancer are decreased by $100 million. On the other hand, when it is not a very successful case, yearly costs are reduced by only $30 million, which is the same as under the current procedure. The decision tree associated with this problem is shown in Figure 1 below.

Figure 1

(a) (4 points) Fill in the decision tree, i.e., where needed fill in the probabilities, the end point values, and calculate the Expected Monetary Value (EMV). Please explain what is the optimal decision based on EMV.

(b) (4 points) Calculate the probability of the screening being very successful for which the decisions to implement and not implement yield the same EMV.
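The arithmetic behind parts (a) and (b) can be checked with a short Python sketch. It rests on an interpretation that the exam does not state explicitly: each end point value is taken to be the yearly cost reduction minus the yearly screening cost (in $ millions), and the current procedure is assumed to yield the $30 million reduction with certainty. All function and constant names are illustrative.

```python
# Sketch under stated assumptions: end point value = yearly cost reduction
# minus yearly screening cost, all in $ millions. This is one reading of the
# problem text, not an official solution.

COST_NEW, COST_CURRENT = 40.0, 5.0   # yearly screening costs
SAVE_HIGH, SAVE_LOW = 100.0, 30.0    # yearly cost reductions


def emv_implement(p_success: float) -> float:
    """EMV of implementing the new screening for a given success probability."""
    expected_saving = p_success * SAVE_HIGH + (1.0 - p_success) * SAVE_LOW
    return expected_saving - COST_NEW


def emv_keep_current() -> float:
    """EMV of staying with the current procedure (assumed certain outcome)."""
    return SAVE_LOW - COST_CURRENT


def break_even_probability() -> float:
    """Success probability at which both decisions have the same EMV."""
    # Solve p*SAVE_HIGH + (1 - p)*SAVE_LOW - COST_NEW = SAVE_LOW - COST_CURRENT.
    return (COST_NEW - COST_CURRENT) / (SAVE_HIGH - SAVE_LOW)


if __name__ == "__main__":
    print("EMV implement (p = 0.8):", emv_implement(0.8))
    print("EMV keep current:", emv_keep_current())
    print("Break-even probability:", break_even_probability())
```

Under these assumptions, comparing the break-even probability with the assessed 80% gives a starting point for a sensitivity discussion.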

(c) (4 points) Discuss the sensitivity of the decision from (a) using the outcome of (b).

Part 2 (8 points)

For each of the following statements determine if it is true or false. Please offer a one-sentence explanation of your answer.

(d) (2 points) The table below describes the probability distribution of a random variable X. The values of X are given in the first row. The probabilities p1, p2, p3, p4, p5, and p6 are in the second row. For example, P(X = 6) = p4.

Value         7    3    13   6    20   21
Probability   p1   p2   p3   p4   p5   p6

i. It is possible that the expected value of X is 13.

ii. It is possible that the expected value of X is 23.

(e) (2 points) A resident of Boston is chosen completely at random. Consider the following two events:

i. The person selected is a teacher
ii. The person selected is a teacher and is a vegetarian

The probability of event (ii) can never exceed that of event (i).

(f) (2 points) In a plot of a regression output, corresponding to a simple linear regression (OLS) model with one explanatory variable, it is possible that all of the training data points are above the regression line.

(g) (2 points) In an optimization problem, the optimal solution is always on the boundary.

Problem 2 (15 Points)

Jacob has recently opened a new apparel store close to the towns of Bern and Oulli. Bern and Oulli together have a total population of 10,000 out of which 4,000 are from Bern and 6,000 are from Oulli. Every day multiple customers enter the store, but Jacob is interested in counting the number of times the first customer comes from Bern. Each person is equally likely to stop at the store on any given day. Moreover, this likelihood is independent and identical for different days.

a) (2 Points) What is the probability that the first person of the day comes from Bern?

For the first 10 days, Jacob wants to know how many times the first arrival will come from Bern.

Let Y denote the number of days on which the first arrival comes from Bern. For parts (b)-(e) you can use the answer that you calculated for part (a). (You will not be penalized in case your answers to later parts change due to a calculation error in part (a).)

b) (3 Points) What is the distribution of Y? What is the mean of Y and standard deviation of Y?

c) (3 Points) What is the probability that none of the first arrivals happened from Bern?

d) (3 Points) What is the probability that Jacob saw an equal number of first arrivals from both Bern and Oulli in the first 10 days?

e) (4 Points) Write Y as a sum of random variables and use the Central Limit Theorem to calculate the probability that Jacob saw at least 3 and at most 5 days on which the first arrival was from Bern.
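The quantities in parts (a)-(e) can be sketched in Python with only the standard library. The sketch assumes that Y is binomial with n = 10 days and success probability equal to Bern's share of the population. For part (e) it uses a normal approximation with a continuity correction, which is a modeling choice made here rather than something the exam prescribes; the exact binomial value is computed alongside for comparison. Names such as `binom_pmf` and `normal_cdf` are illustrative helpers.

```python
import math

N_DAYS = 10
P_BERN = 4000 / 10000  # part (a): assumed equal to Bern's population share


def binom_pmf(k: int, n: int = N_DAYS, p: float = P_BERN) -> float:
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


# part (b): mean and standard deviation of a binomial variable
mean_y = N_DAYS * P_BERN
sd_y = math.sqrt(N_DAYS * P_BERN * (1 - P_BERN))

p_none = binom_pmf(0)    # part (c): no first arrival from Bern in 10 days
p_equal = binom_pmf(5)   # part (d): 5 days Bern, 5 days Oulli

# part (e): exact binomial value and the CLT approximation
# (continuity correction: 3 <= Y <= 5 becomes 2.5 <= Y <= 5.5)
p_exact = sum(binom_pmf(k) for k in range(3, 6))
p_clt = normal_cdf(5.5, mean_y, sd_y) - normal_cdf(2.5, mean_y, sd_y)

if __name__ == "__main__":
    print(mean_y, sd_y, p_none, p_equal, p_exact, p_clt)
```

Without the continuity correction the bounds would be 3 and 5 instead of 2.5 and 5.5, which gives a noticeably different approximation for n as small as 10.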

(PLEASE SHOW US THE FORMULAS YOU USED AS WELL AS THE FINAL NUMBERS FROM YOUR CALCULATIONS)

Problem 3 (15 Points)

In the past year, Winter Parks has opened a new recreation park on the shores of Lake Summer. Entrance to the park is free, but there are several paid attractions as well as a membership option with additional benefits. Winter Parks wants to offer coupons to their customers to encourage them to use the attractions in the new park. Steven has been tasked with analyzing which customers should be targeted. To do this, Steven wants to predict how much each customer would spend at Lake Summer. The following is a list of the variables that Steven gathered about several households that have visited before:

- expsum: expenditures when visiting Lake Summer
- visits: number of visits to Lake Summer
- ski: indicator of whether the customer water skied, 0 if not waterskiing and 1 if waterskiing
- income: annual household income
- feesum: annual member fee for Lake Summer, 0 if not paid and 1 if paid

The initial model regresses expsum on all the available independent variables.

Regression Statistics (Model 1)
Multiple R          0.420358565
R Square            0.176701323
Adjusted R Square   0.162805987
Standard Error      36.45922964
Observations        242

            Coefficients   Standard Error   t Stat         Lower 95%      Upper 95%
Intercept   37.22597296    6.318887743      5.891222391    24.77761238    49.67433353
visits      -0.968530065   0.265269904      -3.651111762   -1.491118144   -0.445941986
ski         (missing)      4.920334635      3.55243513     7.785992236    27.17234698
income      4.965190085    1.448287511      3.428317961    2.112028916    7.818351255
feesum      0.05710018     10.66093741      0.005356019    -20.94520248   21.05940284

First, Steven wants to analyze the regression output to see which variables are useful predictors.

(a) (2 Points) Calculate the missing coefficient of the variable ski in Model 1; use the output of Model 1.
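Two standard properties of regression output make it possible to recover a blanked-out coefficient: the t Stat is the coefficient divided by its standard error, and the 95% confidence interval is symmetric around the coefficient. The sketch below applies both to the ski row of Model 1, using only numbers copied from that row; the constant and variable names are illustrative.

```python
# Numbers copied from the ski row of the Model 1 output.
SKI_STD_ERROR = 4.920334635
SKI_T_STAT = 3.55243513
SKI_LOWER_95 = 7.785992236
SKI_UPPER_95 = 27.17234698

# Route 1: t Stat = coefficient / standard error, so
# coefficient = t Stat * standard error.
coef_from_t = SKI_T_STAT * SKI_STD_ERROR

# Route 2: the confidence interval is centered on the coefficient.
coef_from_ci = (SKI_LOWER_95 + SKI_UPPER_95) / 2.0

if __name__ == "__main__":
    print("coefficient from t Stat:", round(coef_from_t, 4))
    print("coefficient from interval midpoint:", round(coef_from_ci, 4))
```

The two routes should agree up to rounding in the printed output, which serves as a consistency check on the table.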

(b) (2 Points) For each of the variables visits and feesum, state whether it is insignificant, and if it is insignificant, explain why; use the output of Model 1. Discuss whether the coefficients for the independent variables visits and feesum make sense in Model 1.

Years ago, Steven took a class and he remembers that his professor told him that it is a good custom to go back to the data and plot it. Figure 1 shows the dependent variable expsum and the independent variable visits for each data point.

Figure 1

(c) (2 Points) Do you suspect that a linear equation describes the relationship between visits and expsum? Given your answer above, how would you improve the model, if at all?

Before Steven could make these changes, he was interrupted by his colleague who handed him last year’s customer expenditure data for the park at Lake Weather. He decided to include the variable expwea: expenditures when visiting Lake Weather.

Regression Statistics (Model 2)
Multiple R          0.972935477
R Square            0.946603442
Adjusted R Square   0.945472159
Standard Error      9.304725896
Observations        242

            Coefficients   Standard Error   t Stat         Lower 95%      Upper 95%
Intercept   -0.534059455   1.737704045      -0.307336256   -3.957452594   2.889333683
visits      -0.346415744   0.068534161      -5.054643404   -0.481432622   -0.211398865
ski         2.331074143    1.282283939      1.817907932    -0.195110941   4.857259227
income      0.582727703    0.377174043      1.544983578    -0.160330377   1.325785783
feesum      -6.084416517   2.722804123      -2.234614112   -11.44852267   -0.720310367
expwea      0.979709191    0.016795002      58.3333779     0.946621914    1.012796467

Additionally, Steven calculates the correlation.

Table 1
          expsum    visits    ski      income    feesum    expwea
expsum    1
visits    -0.2431   1
ski       0.2664    0.0427    1
income    0.3024    -0.1375   0.2713   1
feesum    -0.0575   0.2210    0.0093   -0.0496   1
expwea    0.9676    -0.1632   0.2491   0.2732    -0.0047   1

Steven looks at both the model and the correlation table and finds ways for improvement. In particular, he wants to remove a variable from the model.

(d) (2 Points) Explain which variable could be removed from the model first; use the output of Model 2 and Table 1.

(e) (2 Points) Discuss whether there is multicollinearity between the independent variables.

(f) (2 Points) Write explicitly the multiple linear regression equation describing expenditures at Lake Summer corresponding to Model 2.

(g) (3 Points) Explain which variable is the most useful in predicting expenditures at Lake Summer; use specific numbers from the output of both Model 1 and Model 2.

Problem 4 (15 Points)

Datatronics is a consumer analytics firm that offers licenses for two different types of software packages: DAP (Data Analytics Package) and DMP (Data Modeling Package). While the first package provides data analytics tools for clients, the second focuses on modeling support and aids clients’ decision making process. Licensing agreements involve setup and initial support that Datatronics must provide to its customers. The company has two different customer support centers for providing assistance to its customers: one in the Philippines (P) and another in the United States (U). The company is considering offering at most 750 licenses of DAP (which is the demand for DAP in the next quarter), and at most 950 licenses of DMP (which is the demand for DMP in the next quarter). Because of the limited customer support personnel available, U can only support up to 800 licenses (of either kind), and P is limited to supporting 1,000 licenses (of either kind). The two facilities employ different workforces, which translates into different customer hour requirements, as well as earnings per serviced customer. The relevant information is summarized in the table below. Labor is measured in hours. For example, customer support of 1 DAP client from the center at P requires 30 hours, and the total amount of labor available at P over the next quarter will be 17,500 hours.

Service Center (Country)   Software   Earnings (per license)   Labor in hours (per license)   Available labor
P                          DAP        $1000                    30                             17,500
P                          DMP        $1350                    40
U                          DAP        $600                     20                             15,000
U                          DMP        $800                     20

Datatronics has recently hired Anne, an undergrad, for its internship program. Anne has formulated a linear program and solved it using Excel’s solver tool, which gave the following output. (Some of the output is intentionally left blank)

Note that the variables UDAP and UDMP stand for the DAP and DMP licenses that should be serviced from the US, and PDAP and PDMP are the licenses that should be serviced from the Philippines.

a) (3 Points) Write the linear constraint corresponding to the DAP demand and the linear constraint corresponding to the capacity constraint in the US.

b) (3 Points) What is the optimal number of DAP licenses that should be assigned to the servicing center in the US?

c) (3 Points) Is the demand constraint for DAP binding? Is the demand constraint for DMP binding?

d) (3 Points) What is the shadow price associated with the capacity constraint at the support center in the Philippines?

e) (3 Points) Datatronics can contract for 1,000 hours of additional labor in the Philippines, at a cost of $29 per hour, including benefits, overhead, etc. Is this worth doing?

Problem 5 (20 Points)

Donatello is graduating and his friends, Leonardo, Rafael, Michelangelo, and Splinter are coming for his graduation ceremony. They will arrive a day early, and will have time to tour the city. Donatello decided to plan a fun day in the city for them. He began by composing a list of attractions around the city:

Attraction                                      Approx. time   Cost                                            Category           Level of fun
Freedom Trail                                   180 minutes    free of charge                                  Outdoor activity   7
Boston Public Garden                            80 minutes     free of charge                                  Outdoor activity   10
Charles River Esplanade                         40 minutes     free of charge                                  Outdoor activity   6
Boston Tea Party Ships & Museum                 150 minutes    $28 per ticket ($140 for the whole gang)        Museum             11
Back Bay                                        60 minutes     free of charge                                  Outdoor activity   6
Old North Church                                30 minutes     free of charge                                  Tour               4
Boston Duck Tours                               80 minutes     $39.5 per ticket ($197.5 for the whole gang)    Tour               8
Samuel Adams Brewery                            60 minutes     free of charge                                  Tour               12
Museum of Fine Arts                             150 minutes    $25 per ticket ($125 for the whole gang)        Museum             9
John F. Kennedy Presidential Museum & Library   150 minutes    $14 per ticket ($70 for the whole gang)         Museum             7
Museum of Science                               150 minutes    $25 per ticket ($125 for the whole gang)        Museum             9
Harvard University                              60 minutes     free of charge                                  University         5

Unfortunately, they will not have time to visit all of the attractions, as they only have 10 hours. Since Donatello never took this class, he came to you for help.

(a) (4 points) Formulate the problem as a discrete linear optimization problem to maximize the total fun during the limited time available. What are the decision variables? What is the range for each variable? What is the objective function? What are the constraints?
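One common way to read part (a) is as a 0/1 knapsack: a binary variable per attraction (1 if visited, 0 otherwise), an objective that sums the fun levels of the visited attractions, and a single constraint that the visited times sum to at most 600 minutes. The Python sketch below is an illustration of that reading, not the required formulation. It enumerates all 2^12 selections by brute force instead of calling a solver, and it ignores travel time, cost, and the friends' requests. The times and fun levels are copied from the attractions table; the names `ATTRACTIONS`, `TIME_LIMIT`, and `best_itinerary` are hypothetical.

```python
from itertools import product

# (name, approximate time in minutes, level of fun), from the attractions table
ATTRACTIONS = [
    ("Freedom Trail", 180, 7),
    ("Boston Public Garden", 80, 10),
    ("Charles River Esplanade", 40, 6),
    ("Boston Tea Party Ships & Museum", 150, 11),
    ("Back Bay", 60, 6),
    ("Old North Church", 30, 4),
    ("Boston Duck Tours", 80, 8),
    ("Samuel Adams Brewery", 60, 12),
    ("Museum of Fine Arts", 150, 9),
    ("John F. Kennedy Presidential Museum & Library", 150, 7),
    ("Museum of Science", 150, 9),
    ("Harvard University", 60, 5),
]
TIME_LIMIT = 10 * 60  # 10 hours, in minutes


def best_itinerary(attractions=ATTRACTIONS, time_limit=TIME_LIMIT):
    """Return (total fun, 0/1 choice tuple) of a best feasible selection."""
    best_fun, best_choice = -1, ()
    # Each choice is one assignment of the binary decision variables x_i.
    for choice in product((0, 1), repeat=len(attractions)):
        total_time = sum(x * a[1] for x, a in zip(choice, attractions))
        if total_time > time_limit:
            continue  # violates the time constraint
        total_fun = sum(x * a[2] for x, a in zip(choice, attractions))
        if total_fun > best_fun:
            best_fun, best_choice = total_fun, choice
    return best_fun, best_choice


if __name__ == "__main__":
    fun, choice = best_itinerary()
    print("total fun:", fun)
    print([a[0] for x, a in zip(choice, ATTRACTIONS) if x])
```

Brute force is only practical because there are 12 attractions; with more attractions, or once routing variables are added, an integer programming solver would be the usual tool.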

(b) (8 points) Donatello sent the itinerary to his friends, and received a list of requests. Model each of the requests listed below as linear constraints.

• (2 points) Rafael said that if they go to more than 3 museums, then they have to go to the Samuel Adams Brewery.

• (2 points) Michelangelo loves to play outside, and therefore, asked that in total they will spend at least 2 hours in outdoor activities.

• (2 points) Splinter realized that the costs are getting high. He asked you to make sure that they are not spending more than $300 (in total for Donatello and his friends).

• (2 points) Leonardo asked to visit at least one university or at least two museums.

You successfully modeled all of the requests, and shared the itinerary with your classmate. He mentioned a website that had a better estimation of the time each attraction takes. In particular, this website modeled the time at each attraction as a normal random variable, and provides the mean and standard deviation of each attraction, as well as the correlation between the different attractions. The data from the website is provided below.

Attraction                                      Mean          Standard deviation
Freedom Trail                                   180 minutes   20
Boston Public Garden                            80 minutes    30
Charles River Esplanade                         40 minutes    30
Boston Tea Party Ships & Museum                 150 minutes   15
Back Bay                                        60 minutes    60
Old North Church                                30 minutes    5
Boston Duck Tours                               80 minutes    5
Samuel Adams Brewery                            60 minutes    5
Museum of Fine Arts                             150 minutes   100
John F. Kennedy Presidential Museum & Library   150 minutes   50
Museum of Science                               150 minutes   50
Harvard University                              80 minutes    20

(The correlation table is at the end of the question)

(c) (4 points) If the itinerary includes the Freedom Trail, Boston Public Garden and Charles River Esplanade, what is the standard deviation of the length of the trip? Add a (maybe non-linear) constraint that assures that the standard deviation of the total trip is at most 60 minutes.
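For part (c), the variance of a sum of correlated random variables is the sum of the individual variances plus twice the sum of the pairwise covariances, where each covariance is the correlation times the two standard deviations. The Python sketch below applies this to the three attractions named in the question, with standard deviations and correlations copied from the tables; the dictionary names and the function `trip_std` are illustrative.

```python
import math
from itertools import combinations

# Standard deviations in minutes, from the website data table.
SD = {
    "Freedom Trail": 20.0,
    "Boston Public Garden": 30.0,
    "Charles River Esplanade": 30.0,
}

# Pairwise correlations, from the correlation table.
CORR = {
    ("Freedom Trail", "Boston Public Garden"): -0.2,
    ("Freedom Trail", "Charles River Esplanade"): 0.0,
    ("Boston Public Garden", "Charles River Esplanade"): -0.1,
}


def trip_std(selected):
    """Standard deviation of the total time over the selected attractions."""
    variance = sum(SD[a] ** 2 for a in selected)
    for a, b in combinations(selected, 2):
        # Look the pair up in either order; a missing pair raises KeyError.
        rho = CORR[(a, b)] if (a, b) in CORR else CORR[(b, a)]
        variance += 2.0 * rho * SD[a] * SD[b]
    return math.sqrt(variance)


if __name__ == "__main__":
    print("std of trip length (minutes):", trip_std(list(SD)))
```

The same expression, written with binary selection variables multiplying each variance and covariance term, is one possible shape for the non-linear constraint that the question asks for.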

You used your favorite solver, and solved the problem. After looking at your suggested schedule, Donatello realized that he forgot to account for the distance between the different attractions. He gave you the distances table below. (Note that the distances table can be found at the end of the question)

(d) (4 points) Rewrite your constraint from part (a) to account for the traveling time. Make sure to start your day and end it at the Marriott Cambridge hotel. What are the decision variables? What is the range for each variable? What are the new constraints?

To save some time, please consider only the first 5 attractions (Freedom Trail, Boston Public Garden, Charles River Esplanade, Boston Tea Party Ships & Museum, and Back Bay) when you write your answer to part (d).

Hints:

• The variables should be in the form of x_ij, which would indicate that they went from attraction i to attraction j.

• Make sure that you are leaving the hotel, and that you are going back to the hotel.

• Make sure that if you go to an attraction, you also leave it.

• Don’t forget to link the new variables to the old ones.

Correlation table (lower triangle; the columns follow the same order as the rows)

Freedom Trail                     1
Boston Public Garden              -0.2   1
Charles River Esplanade           0      -0.1   1
Boston Tea Party Ships & Museum   -0.2   0      0      1
Back Bay                          0      0      0      0      1
Old North Church                  -0.2   0      0      0      0      1
Boston Duck Tours                 0      -0.2   0      -0.2   0      -0.2   1
Samuel Adams Brewery              0.2    0.2    0      0.2    0      0      0.2    1
Museum of Fine Arts               -0.2   0      0      -0.1   0      0      0      0.2    1
J. F. K. Museum & Library         0      0      0      -0.1   0      0      -0.2   0      -0.1   1
Museum of Science                 -0.2   0      0      -0.1   0      0      0      0      -0.1   0      1
Harvard University                -0.2   0      0      0      0      0      -0.2   0      -0.1   0      0      1

Distances table (lower triangle; the columns follow the same order as the rows)

Marriott Cambridge hotel          0
Freedom Trail                     8    0
Boston Public Garden              7    6    0
Charles River Esplanade           6    17   10   0
Boston Tea Party Ships & Museum   11   20   20   25   0
Back Bay                          10   20   25   17   13   0
Old North Church                  15   23   25   27   10   17   0
Boston Duck Tours                 13   20   15   14   14   5    17   0
Samuel Adams Brewery              20   20   20   20   20   20   20   20   0
Museum of Fine Arts               12   40   8    30   14   24   18   21   20   0
J. F. K. Museum & Library         20   20   20   20   20   20   20   20   20   20   0
Museum of Science                 23   25   25   30   10   11   24   12   20   11   20   0
Harvard University                12   15   12   12   19   13   21   17   20   13   20   19   0

Problem 6 (15 points)

The primary goal of the course has been to teach you some important analytics tools that we believe can make a difference in making decisions based on data. We would like to ask you to think back to your current or last job before coming to this class. Please identify a project, activity, task, or assignment that you worked on at that job where you would now have analyzed the problem differently given the knowledge you acquired from this semester.

a) (4 points) Describe the project/activity/assignment.

b) (4 points) Describe either the data that you had available for the project or that you now wish you had developed in order to complete the project.

c) (7 points) What modeling tool(s) from this class would you have used on this project, and why do you think these tools would have been effective?

Try to be as concise as possible; we strongly suggest that you limit your answer to each of these questions to approximately one paragraph. Do not spend way too much time on this problem.

(Note: If your job involved tasks for which the use of data and quantitative analysis was not relevant at all, then answer this question by instead discussing one of your possible job opportunities for an upcoming engagement.)