QNT/351 Descriptive Statistics – Real Estate Data Part 1

Page 101 Page 102 PRINTED BY: [email protected]. Printing is for personal, private use only. No part of this book may be reproduced or transmitted without publisher's prior permission. Violators will be prosecuted.

d. List the actual values in the fourth row.

e. List the actual values in the next to the last row.

f. On how many days were less than 160 movies rented?

g. On how many days were 220 or more movies rented?

h. What is the middle value?

i. On how many days were between 170 and 210 movies rented?

9. A survey of the number of phone calls made by a sample of 16 Verizon subscribers last week revealed the following information. Develop a stem-and-leaf chart. How many calls did a typical subscriber make? What were the maximum and the minimum number of calls made?

10. Aloha Banking Co. is studying ATM use in suburban Honolulu. A sample of 30 ATMs showed they were used the following number of times yesterday. Develop a stem-and-leaf chart. Summarize the number of times each ATM was used. What were the typical, minimum, and maximum number of times each ATM was used?

LO4-3 Identify and compute measures of position.

MEASURES OF POSITION

The standard deviation is the most widely used measure of dispersion. However, there are other ways of describing the variation or spread in a set of data. One method is to determine the location of values that divide a set of observations into equal parts. These measures include quartiles, deciles, and percentiles.

Quartiles divide a set of observations into four equal parts. To explain further, think of any set of values arranged from the minimum to the maximum. In Chapter 3, we called the middle value of a set of data arranged from the minimum to the maximum the median. That is, 50% of the observations are larger than the median and 50% are smaller. The median is a measure of location because it pinpoints the center of the data.

In a similar fashion, quartiles divide a set of observations into four equal parts. The first quartile, usually labeled Q1, is the value below which 25% of the observations occur, and the third quartile, usually labeled Q3, is the value below which 75% of the observations occur.

Similarly, deciles divide a set of observations into 10 equal parts and percentiles into 100 equal parts. So if you found that your GPA was in the 8th decile at your university, you could conclude that 80% of the students had a GPA lower than yours and 20% had a higher GPA. If your GPA was in the 92nd percentile, then 92% of students had a GPA less than your GPA and only 8% of students had a GPA greater than your GPA. Percentile scores are frequently used to report results on such national standardized tests as the SAT, ACT, GMAT (used to judge entry into many master of business administration programs), and LSAT (used to judge entry into law school).

https://jigsaw.vitalsource.com/api/v0/books/1259698 246/print?from=10...
1 of 6 4/9/2016 9:21 AM Page 103

Quartiles, Deciles, and Percentiles

To formalize the computational procedure, let L_p refer to the location of a desired percentile. So if we want to find the 92nd percentile we would use L_92, and if we wanted the median, the 50th percentile, then L_50. For a number of observations, n, the location of the Pth percentile can be found using the formula:

$$L_p = (n + 1)\frac{P}{100} \qquad (4\text{–}1)$$
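As a quick check on formula (4–1), the short Python sketch below computes the location L_p for the sample sizes used in this section. The function name `percentile_location` is our own, not from any library, and it returns only the position in the ordered data, not the percentile value itself.

```python
def percentile_location(n: int, p: float) -> float:
    """Location of the Pth percentile among n ordered observations, formula (4-1)."""
    return (n + 1) * p / 100

# Positions that appear in this section's examples
print(percentile_location(15, 50))  # 8.0   -> median of 15 commissions
print(percentile_location(15, 25))  # 4.0   -> first quartile, n = 15
print(percentile_location(20, 25))  # 5.25  -> first quartile, n = 20
print(percentile_location(80, 23))  # 18.63 -> 23rd percentile, n = 80
```

A whole-number location points directly at one ordered value; a fractional location (such as 5.25) means interpolating between two neighboring values.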

An example will help to explain further.

EXAMPLE

Listed below are the commissions earned last month by a sample of 15 brokers at Morgan Stanley Smith Barney’s Oakland, California, office. Morgan Stanley Smith Barney is an investment company with offices located throughout the United States.

Locate the median, the first quartile, and the third quartile for the commissions earned.

SOLUTION

The first step is to sort the data from the smallest commission to the largest.

The median value is the observation in the center and is the same as the 50th percentile, so P equals 50. So the median, or L_50, is located at (n + 1)(50/100), where n is the number of observations. In this case, that is position number 8, found by (15 + 1)(50/100). The eighth value in the ordered list is $2,038. So we conclude this is the median and that half the brokers earned commissions more than $2,038 and half earned less than $2,038. The result using formula (4–1) to find the median is the same as the method presented in Chapter 3.


Recall the definition of a quartile. Quartiles divide a set of observations into four equal parts. Hence 25% of the observations will be less than the first quartile, and 75% of the observations will be less than the third quartile. To locate the first quartile, we use formula (4–1), where n = 15 and P = 25:

$$L_{25} = (15 + 1)\frac{25}{100} = 4$$

and to locate the third quartile, n = 15 and P = 75:

$$L_{75} = (15 + 1)\frac{75}{100} = 12$$

Therefore, the first and third quartile values are located at positions 4 and 12, respectively. The fourth value in the ordered array is $1,721 and the twelfth is $2,205. These are the first and third quartiles.

In the above example, the location formula yielded a whole number. That is, we wanted to find the first quartile and there were 15 observations, so the location formula indicated we should find the fourth ordered value. What if there were 20 observations in the sample, that is n = 20, and we wanted to locate the first quartile? From the location formula (4–1):

$$L_{25} = (20 + 1)\frac{25}{100} = 5.25$$

We would locate the fifth value in the ordered array and then move .25 of the distance between the fifth and sixth values and report that as the first quartile. Like the median, the quartile does not need to be one of the actual values in the data set.

To explain further, suppose a data set contained the six values 91, 75, 61, 101, 43, and 104. We want to locate the first quartile. We order the values from the minimum to the maximum: 43, 61, 75, 91, 101, and 104. The first quartile is located at

$$L_{25} = (6 + 1)\frac{25}{100} = 1.75$$

The position formula tells us that the first quartile is located between the first and the second values and it is .75 of the distance between the first and the second values. The first value is 43 and the second is 61. So the distance between these two values is 18. To locate the first quartile, we need to move .75 of the distance between the first and second values, so .75(18) = 13.5. To complete the procedure, we add 13.5 to the first value, 43, and report that the first quartile is 56.5.
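The ordering and interpolation steps just described can be sketched in Python. This is a minimal illustration of formula (4–1), not the code any particular statistics package uses. `percentile` is a hypothetical helper name, and clamping locations that fall below the first or above the last observation is our own assumption, since the text does not cover that case.

```python
def percentile(values, p):
    """Pth percentile via formula (4-1), interpolating between neighboring ordered values."""
    data = sorted(values)                  # order from minimum to maximum
    n = len(data)
    location = (n + 1) * p / 100           # 1-based position, formula (4-1)
    location = min(max(location, 1), n)    # assumption: keep the position inside the data
    whole = int(location)                  # e.g. 1 when the location is 1.75
    fraction = location - whole            # e.g. 0.75 when the location is 1.75
    if fraction == 0:
        return data[whole - 1]             # location lands exactly on an observation
    lower, upper = data[whole - 1], data[whole]
    return lower + fraction * (upper - lower)

six_values = [91, 75, 61, 101, 43, 104]
print(percentile(six_values, 25))   # 56.5, matching the hand calculation
```

With the same six values, `percentile(six_values, 50)` has location 3.5, interpolates halfway between 75 and 91, and returns 83.0.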

John W. Tukey (1915–2000) received a PhD in mathematics from Princeton in 1939. However, when he joined the Fire Control Research Office during World War II, his interest in abstract mathematics shifted to applied statistics. He developed effective numerical and graphical methods for studying patterns in data.

Among the graphics he developed are the stem-and-leaf diagram and the box-and-whisker plot or box plot.

From 1960 to 1980, Tukey headed the statistical division of NBC’s election night vote projection team. He became renowned in 1960 for preventing an early call of victory for Richard Nixon in the presidential election won by John F. Kennedy.

We can extend the idea to include both deciles and percentiles. To locate the 23rd percentile in a sample of 80 observations, we would look for the 18.63 position.

To find the value corresponding to the 23rd percentile, we would locate the 18th value and the 19th value and determine the distance between the two values. Next, we would multiply this difference by 0.63 and add the result to the smaller value. The result would be the 23rd percentile.

Statistical software is very helpful when describing and summarizing data. Excel, Minitab, and MegaStat, a statistical analysis Excel add-in, all provide summary statistics that include quartiles. For example, Minitab’s summary of the Smith Barney commission data, shown below, includes the first and third quartiles, and other statistics. Based on the reported quartiles, 25% of the commissions earned were less than $1,721 and 75% were less than $2,205. These are the same values we calculated using formula (4–1).

There are ways other than formula (4–1) to locate quartile values. For example, another method uses 0.25n + 0.75 to locate the position of the first quartile and 0.75n + 0.25 to locate the position of the third quartile. We will call this the Excel Method. In the Smith Barney data, this method would place the first quartile at position 4.5 (.25 × 15 + .75) and the third quartile at position 11.5 (.75 × 15 + .25). The first quartile would be interpolated as 0.5, or one-half the difference between the fourth- and the fifth-ranked values. Based on this method, the first quartile is $1,739.50, found by $1,721 + 0.5($1,758 − $1,721). The third quartile, at position 11.5, would be $2,151, or one-half the distance between the eleventh- and the twelfth-ranked values, found by $2,097 + 0.5($2,205 − $2,097). Excel, as shown in the Smith Barney and Applewood examples, can compute quartiles using either of the two methods. Please note the text uses formula (4–1) to calculate quartiles.
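Python's standard library also offers both conventions. `statistics.quantiles(..., method="exclusive")` (the default) places the kth quartile at (n + 1)k/4, matching formula (4–1), while `method="inclusive"` places Q1 at 0.25n + 0.75 and Q3 at 0.75n + 0.25, matching what the text calls the Excel Method (in Excel 2010 these correspond to QUARTILE.EXC and QUARTILE.INC). The sketch below reuses the six-value example from earlier in this section because the Smith Barney data are not reproduced here.

```python
import statistics  # quantiles() requires Python 3.8 or later

values = [91, 75, 61, 101, 43, 104]

# Formula (4-1): positions (n + 1)P/100
by_formula = statistics.quantiles(values, n=4, method="exclusive")
# "Excel Method": positions 0.25n + 0.75 and 0.75n + 0.25
by_excel_method = statistics.quantiles(values, n=4, method="inclusive")

print("Formula (4-1): Q1, median, Q3 =", by_formula)       # [56.5, 83.0, 101.75]
print("Excel Method:  Q1, median, Q3 =", by_excel_method)  # [64.5, 83.0, 98.5]
```

With only six observations the two conventions disagree noticeably for Q1 (56.5 versus 64.5); the gap typically shrinks as the sample grows.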

Is the difference between the two methods important? No. Usually it is just a nuisance. In general, both methods calculate values that will support the statement that approximately 25% of the values are less than the value of the first quartile, and approximately 75% of the data values are less than the value of the third quartile. When the sample is large, the difference in the results from the two methods is small. For example, in the Applewood Auto Group data there are 180 vehicles. The quartiles computed using both methods are shown to the left. Based on the variable profit, 45 of the 180 values (25%) are less than both values of the first quartile, and 135 of the 180 values (75%) are less than both values of the third quartile.

When using Excel, be careful to understand the method used to calculate quartiles. In Excel 2007, quartiles are calculated using the Excel Method. Excel 2010 provides both methods to calculate quartiles. The Excel 2010 commands to compute the quartiles are shown in Appendix C.

4–2 The Quality Control department of Plainsville Peanut Company is responsible for checking the weight of the 8-ounce jar of peanut butter. The weights of a sample of nine jars produced last hour are:

(a) What is the median weight?

(b) Determine the weights corresponding to the first and third quartiles.

EXERCISES

For DATA FILE, please visit www.mhhe.com/lind16e

11. Determine the median and the first and third quartiles in the following data.

12. Determine the median and the first and third quartiles in the following data.

13. The Thomas Supply Company Inc. is a distributor of gas-powered generators. As with any business, the length of time customers take to pay their invoices is important. Listed below, arranged from smallest to largest, is the time, in days, for a sample of The Thomas Supply Company Inc. invoices.

a. Determine the first and third quartiles.

b. Determine the second decile and the eighth decile.

c. Determine the 67th percentile.


14. Kevin Horn is the national sales manager for National Textbooks Inc. He has a sales staff of 40 who visit college professors all over the United States. Each Saturday morning he requires his sales staff to send him a report. This report includes, among other things, the number of professors visited during the previous week. Listed below, ordered from smallest to largest, are the number of visits last week.

a. Determine the median number of calls.

b. Determine the first and third quartiles.

c. Determine the first decile and the ninth decile.

d. Determine the 33rd percentile.

LO4-4 Construct and analyze a box plot.

BOX PLOTS

A box plot is a graphical display, based on quartiles, that helps us picture a set of data. To construct a box plot, we need only five statistics: the minimum value, Q1 (the first quartile), the median, Q3 (the third quartile), and the maximum value. An example will help to explain.
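Because a box plot needs only these five statistics, they are easy to compute directly. The sketch below is one way to do it in Python, assuming quartiles follow formula (4–1), which is what `statistics.quantiles` computes with `method="exclusive"`; `five_number_summary` is a hypothetical helper name.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum), the inputs for a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")  # formula (4-1)
    return min(values), q1, median, q3, max(values)

print(five_number_summary([91, 75, 61, 101, 43, 104]))
# (43, 56.5, 83.0, 101.75, 104)
```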

EXAMPLE

Alexander’s Pizza offers free delivery of its pizza within 15 miles. Alex, the owner, wants some information on the time it takes for delivery. How long does a typical delivery take? Within what range of times will most deliveries be completed? For a sample of 20 deliveries, he determined the following information:

Develop a box plot for the delivery times. What conclusions can you make about the delivery times?

SOLUTION

The first step in drawing a box plot is to create an appropriate scale along the horizontal axis. Next, we draw a box that starts at Q1 (15 minutes) and ends at Q3 (22 minutes). Inside the box we place a vertical line to represent the median (18 minutes). Finally, we extend horizontal lines from the box out to the minimum value (13 minutes) and the maximum value (30 minutes). These horizontal lines outside of the box are sometimes called “whiskers” because they look a bit like a cat’s whiskers.

The box plot also shows the interquartile range of delivery times between Q1 and Q3.

The interquartile range is 7 minutes, found by 22 − 15, and indicates that 50% of the deliveries are between 15 and 22 minutes.

The box plot also reveals that the distribution of delivery times is positively skewed. In Chapter 3, we defined skewness as the lack of symmetry in a set of data. How do we know this distribution is positively skewed? In this case, there are actually two pieces of information that suggest this. First, the dashed line to the right of the box, from 22 minutes (Q3) to the maximum time of 30 minutes, is longer than the dashed line to the left of the box, from 15 minutes (Q1) to the minimum value of 13 minutes. To put it another way, the 25% of the data larger than the third quartile is more spread out than the 25% less than the first quartile. A second indication of positive skewness is that the median is not in the center of the box. The distance from the first quartile to the median is smaller than the distance from the median to the third quartile. We know that the number of delivery times between 15 minutes and 18 minutes is the same as the number of delivery times between 18 minutes and 22 minutes (each interval holds about 25% of the deliveries), so the times just above the median are spread over a wider range than those just below it.
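The two visual checks in this argument (comparing the whiskers, and comparing the two halves of the box) can be written as a rough rule of thumb. The Python below is only a heuristic sketch of that reasoning, not a formal measure of skewness; the function name and the wording of its results are our own.

```python
def box_plot_shape(minimum, q1, median, q3, maximum):
    """Rough reading of a box plot's shape from its five statistics."""
    lower_whisker, upper_whisker = q1 - minimum, maximum - q3
    lower_half, upper_half = median - q1, q3 - median
    if upper_whisker > lower_whisker and upper_half > lower_half:
        return "positively skewed"
    if upper_whisker < lower_whisker and upper_half < lower_half:
        return "negatively skewed"
    return "no clear skew from these two checks"

# Alexander's Pizza delivery times, in minutes
print(box_plot_shape(13, 15, 18, 22, 30))   # positively skewed
```

When the two checks disagree, the function deliberately declines to call a direction; a numerical coefficient of skewness is the better tool in that case.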

EXAMPLE

Refer to the Applewood Auto Group data. Develop a box plot for the variable age of the buyer. What can we conclude about the distribution of the age of the buyer?

SOLUTION

Minitab was used to develop the following chart and summary statistics.

The median age of the purchaser is 46 years, 25% of the purchasers are less than 40 years of age, and 25% are more than 52.75 years of age. Based on the summary information and the box plot, we conclude:

Fifty percent of the purchasers are between the ages of 40 and 52.75 years.

The distribution of ages is symmetric. There are two reasons for this conclusion. The length of the whisker above 52.75 years (Q3) is about the same length as the whisker below 40 years (Q1). Also, the area in the box between 40 years and the median of 46 years is about the same as the area between the median and 52.75.

There are three asterisks (*) above 70 years. What do they indicate? In a box plot, an asterisk identifies an outlier. An outlier is a value that is inconsistent with the rest of the data. It is defined as a value that is more than 1.5 times the interquartile range smaller than Q1 or larger than Q3. In this example, an outlier would be a value larger than 71.875 years, found by:

$$\text{Outlier} > Q_3 + 1.5(Q_3 - Q_1) = 52.75 + 1.5(52.75 - 40) = 71.875$$

An outlier would also be a value less than 20.875 years:

$$\text{Outlier} < Q_1 - 1.5(Q_3 - Q_1) = 40 - 1.5(52.75 - 40) = 20.875$$

From the box plot, we conclude there are three purchasers 72 years of age or older and none less than 21 years of age. Technical note: In some cases, a single asterisk may represent more than one observation because of the limitations of the software and space available. It is a good idea to check the actual data. In this instance, there are three purchasers 72 years old or older; two are 72 and one is 73.

4–3 The following box plot shows the assets in millions of dollars for credit unions in Seattle, Washington.

What are the smallest and largest values, the first and third quartiles, and the median? Would you agree that the distribution is symmetrical? Are there any outliers?
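The 1.5 × IQR rule from the Applewood buyer-age example is straightforward to apply in code. Here is a minimal Python sketch using the quartiles reported for those ages; `outlier_fences` is a hypothetical name, and the short list of ages to screen is illustrative rather than the full Applewood data.

```python
def outlier_fences(q1, q3, k=1.5):
    """Limits beyond which a value is flagged as an outlier: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1                       # interquartile range
    return q1 - k * iqr, q3 + k * iqr

lower, upper = outlier_fences(40, 52.75)   # Applewood buyer ages: Q1 = 40, Q3 = 52.75
print(lower, upper)                        # 20.875 71.875

ages_to_screen = [21, 46, 72, 72, 73]      # illustrative values, not the full data set
print([age for age in ages_to_screen if age < lower or age > upper])   # [72, 72, 73]
```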

EXERCISES

For DATA FILE, please visit www.mhhe.com/lind16e

15. The box plot below shows the amount spent for books and supplies per year by students at four-year public colleges.

a. Estimate the median amount spent.

b. Estimate the first and third quartiles for the amount spent.

c. Estimate the interquartile range for the amount spent.

d. Beyond what point is a value considered an outlier?

e. Identify any outliers and estimate their value.

f. Is the distribution symmetrical or positively or negatively skewed?

16. The box plot shows the undergraduate in-state charge per credit hour at four-year public colleges.

a. Estimate the median.

b. Estimate the first and third quartiles.

c. Determine the interquartile range.

d. Beyond what point is a value considered an outlier?

e. Identify any outliers and estimate their value.

f. Is the distribution symmetrical or positively or negatively skewed?

17. In a study of the gasoline mileage of model year 2013 automobiles, the mean miles per gallon was 27.5 and the median was 26.8. The smallest value in the study was 12.70 miles per gallon, and the largest was 50.20. The first and third quartiles were 17.95 and 35.45 miles per gallon, respectively. Develop a box plot and comment on the distribution. Is it a symmetric distribution?

18. A sample of 28 time shares in the Orlando, Florida, area revealed the following daily charges for a one-bedroom suite. For convenience, the data are ordered from smallest to largest. Construct a box plot to represent the data. Comment on the distribution. Be sure to identify the first and third quartiles and the median.

SKEWNESS

In Chapter 3, we described measures of central location for a set of observations by reporting the mean, median, and mode. We also described measures that show the amount of spread or variation in a set of data, such as the range and the standard deviation.

LO4-5 Compute and interpret the coefficient of skewness.

Another characteristic of a set of data is the shape. There are four shapes commonly observed: symmetric, positively skewed, negatively skewed, and bimodal. In a symmetric set of observations the mean and median are equal and the data values are evenly spread around these values. The data values below the mean and median are a mirror image of those above. A set of values is skewed to the right or positively skewed if there is a single peak, but the values extend much farther to the right of the peak than to the left of the peak.

In this case, the mean is larger than the median. In a negatively skewed distribution there is a single peak, but the observations extend farther to the left, in the negative direction, than to the right. In a negatively skewed distribution, the mean is smaller than the median. Positively skewed distributions are more common. Salaries often follow this pattern. Think of the salaries of those employed in a small company of about 100 people. The president and a few top executives would have very large salaries relative to the other workers and hence the distribution of salaries would exhibit positive skewness. A bimodal distribution will have two or more peaks. This is often the case when the values are from two or more populations. This information is summarized in Chart 4–1.

CHART 4–1 Shapes of Frequency Polygons

There are several formulas in the statistical literature used to calculate skewness. The simplest, developed by Professor Karl Pearson (1857–1936), is based on the difference between the mean and the median:

$$sk = \frac{3(\bar{x} - \text{Median})}{s} \qquad (4\text{–}2)$$

The late Stephen Jay Gould (1941–2002) was a professor of zoology and professor of geology at Harvard University. In 1982, he was diagnosed with cancer and had an expected survival time of 8 months.

However, never to be discouraged, his research showed that the distribution of survival time is dramatically skewed to the right and showed that not only do 50% of similar cancer patients survive more than 8 months, but that the survival time could be years rather than months! In fact, Dr. Gould lived another 20 years. Based on his experience, he wrote a widely published essay titled “The Median Is Not the Message.”

Using Pearson’s relationship, the coefficient of skewness can range from −3 up to 3. A value near −3, such as −2.57, indicates considerable negative skewness. A value such as 1.63 indicates moderate positive skewness.

A value of 0, which will occur when the mean and median are equal, indicates the distribution is symmetrical and there is no skewness present.
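Pearson’s coefficient needs only the mean, the median, and the sample standard deviation. The Python sketch below computes it with the standard `statistics` module; the five-value sample is hypothetical, chosen so that one large value pulls the mean above the median.

```python
import statistics

def pearson_skewness(values):
    """Pearson's coefficient of skewness: 3(mean - median) / s."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)      # sample standard deviation (n - 1 in the denominator)
    return 3 * (mean - median) / s

sample = [1, 2, 3, 4, 10]             # hypothetical data: mean 4, median 3
print(round(pearson_skewness(sample), 3))   # 0.849
```

A symmetric sample such as [1, 2, 3, 4, 5] gives 0, because its mean and median are both 3.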

In this text, we present output from Minitab and Excel. Both of these software packages compute a value for the coefficient of skewness based on the cubed deviations from the mean. The formula is:

$$sk = \frac{n}{(n - 1)(n - 2)}\left[\sum\left(\frac{x - \bar{x}}{s}\right)^3\right] \qquad (4\text{–}3)$$

Formula (4–3) offers an insight into skewness. The right-hand side of the formula is the difference between each value and the mean, divided by the standard deviation. That is the (x − x̄)/s portion of the formula.

This idea is called standardizing. We will discuss the idea of standardizing a value in more detail in Chapter 7 when we describe the normal probability distribution. At this point, observe that the result is to report the difference between each value and the mean in units of the standard deviation. If this difference is positive, the particular value is larger than the mean; if it is negative, the value is smaller than the mean. When we cube these values, we retain the information on the direction of the difference. Recall that in the formula for the standard deviation [see formula (3–10)] we squared the difference between each value and the mean, so that the result was all nonnegative values.

If the set of data values under consideration is symmetric, when we cube the standardized values and sum over all the values, the result would be near zero. If there are several large values, clearly separate from the others, the sum of the cubed differences would be a large positive value. If there are several small values clearly separate from the others, the sum of the cubed differences will be negative.
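The three steps of formula (4–3), standardize, cube, and sum, translate directly into Python. This is a sketch of the formula as stated in the text, not Minitab’s or Excel’s own code; the sample data are hypothetical. The last line repeats the Table 4–2 arithmetic starting from the summed value 11.8274 reported for the earnings-per-share example.

```python
import statistics

def software_skewness(values):
    """Coefficient of skewness from cubed standardized deviations, formula (4-3).
    Needs at least three values."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)                              # sample standard deviation
    cubed_sum = sum(((x - mean) / s) ** 3 for x in values)    # standardize, cube, sum
    return n / ((n - 1) * (n - 2)) * cubed_sum

print(round(software_skewness([1, 2, 3, 4, 10]), 3))   # 1.697, hypothetical right-skewed data
print(round(software_skewness([1, 2, 3, 4, 5]), 3))    # symmetric data: essentially zero

# Earnings-per-share example: the 15 cubed values sum to 11.8274
print(round(15 / ((15 - 1) * (15 - 2)) * 11.8274, 3))  # 0.975
```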

An example will illustrate the idea of skewness.

EXAMPLE

Following are the earnings per share for a sample of 15 software companies for the year 2013. The earnings per share are arranged from smallest to largest. Compute the mean, median, and standard deviation. Find the coefficient of skewness using Pearson’s estimate and the software methods. What is your conclusion regarding the shape of the distribution?

SOLUTION

These are sample data, so we use formula (3–2) to determine the mean. The median is the middle value in a set of data, arranged from smallest to largest. In this case, the middle value is $3.18, so the median earnings per share is $3.18.

We use formula (3–10) on page 77 to determine the sample standard deviation.

Pearson’s coefficient of skewness is 1.017, found by substituting the mean, the median, and the standard deviation into Pearson’s formula. This indicates there is moderate positive skewness in the earnings per share data. We obtain a similar, but not exactly the same, value from the software method. The details of the calculations are shown in Table 4–2. To begin, we find the difference between each earnings per share value and the mean and divide this result by the standard deviation. We have referred to this as standardizing.

Next, we cube, that is, raise to the third power, the result of the first step. Finally, we sum the cubed values.

The details for the first company, that is, the company with an earnings per share of $0.09, are:

TABLE 4–2 Calculation of the Coefficient of Skewness

When we sum the 15 cubed values, the result is 11.8274. That is, the term $\sum\left(\frac{x - \bar{x}}{s}\right)^3 = 11.8274$. To find the coefficient of skewness, we use formula (4–3), with n = 15:

$$sk = \frac{15}{(15 - 1)(15 - 2)}(11.8274) = 0.975$$

We conclude that the earnings per share values are somewhat positively skewed. The following Minitab summary reports the descriptive measures, such as the mean, median, and standard deviation of the earnings per share data. Also included are the coefficient of skewness and a histogram with a bell-shaped curve superimposed.

4–4 A sample of five data entry clerks employed in the Horry County Tax Office revised the following number of tax records last hour: 73, 98, 60, 92, and 84.

(a) Find the mean, median, and the standard deviation.

(b) Compute the coefficient of skewness using Pearson’s method.

(c) Calculate the coefficient of skewness using the software method.

(d) What is your conclusion regarding the skewness of the data?

EXERCISES

For DATA FILE, please visit www.mhhe.com/lind16e

For Exercises 19–22:

a. Determine the mean, median, and the standard deviation.

b. Determine the coefficient of skewness using Pearson’s method.

c. Determine the coefficient of skewness using the software method.

19. The following values are the starting salaries, in $000, for a sample of five accounting graduates who accepted positions in public accounting last year.

20. Listed below are the salaries, in $000, for a sample of 15 chief financial officers in the electronics industry.

21. Listed below are the commissions earned ($000) last year by the 15 sales representatives at Furniture Patch Inc.

22. Listed below are the salaries for the 2012 New York Yankees Major League Baseball team.

LO4-6 Create and interpret a scatter diagram.

DESCRIBING THE RELATIONSHIP BETWEEN TWO VARIABLES

In Chapter 2 and the first section of this chapter, we presented graphical techniques to summarize the distribution of a single variable. We used a histogram in Chapter 2 to summarize the profit on vehicles sold by the Applewood Auto Group. Earlier in this chapter, we used dot plots and stem-and-leaf displays to visually summarize a set of data. Because we are studying a single variable, we refer to this as univariate data. There are situations where we wish to study and visually portray the relationship between two variables.

When we study the relationship between two variables, we refer to the data as bivariate. Data analysts frequently wish to understand the relationship between two variables. Here are some examples:

Tybo and Associates is a law firm that advertises extensively on local TV. The partners are considering increasing their advertising budget. Before doing so, they would like to know the relationship between the amount spent per month on advertising and the total amount of billings for that month. To put it another way, will increasing the amount spent on advertising result in an increase in billings?

Coastal Realty is studying the selling prices of homes. What variables seem to be related to the selling price of homes? For example, do larger homes sell for more than smaller ones? Probably. So Coastal might study the relationship between the area in square feet and the selling price.

Dr. Stephen Givens is an expert in human development. He is studying the relationship between the height of fathers and the height of their sons. That is, do tall fathers tend to have tall children? Would you expect Dwight Howard, the 6′11″, 265-pound professional basketball player, to have relatively tall sons?

One graphical technique we use to show the relationship between variables is called a scatter diagram.

To draw a scatter diagram, we need two variables. We scale one variable along the horizontal axis (X-axis) of a graph and the other variable along the vertical axis (Y-axis). Usually one variable depends to some degree on the other. In the third example above, the height of the son depends on the height of the father. So we scale the height of the father on the horizontal axis and that of the son on the vertical axis.

We can use statistical software, such as Excel, to perform the plotting function for us. Caution: You should always be careful of the scale. By changing the scale of either the vertical or the horizontal axis, you can affect the apparent visual strength of the relationship.
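Software will draw a scatter diagram for you, but the scaling step is simple enough to sketch by hand. The Python below prints a crude character-based scatter diagram with the first variable on the horizontal axis and the second on the vertical axis. The bus ages and costs are hypothetical numbers, not the Cleveland data in Chart 4–2, and `text_scatter` is our own illustrative helper. Changing `width` or `height` stretches the picture, a small-scale version of the caution about axis scales.

```python
def text_scatter(xs, ys, width=20, height=8):
    """Crude character scatter diagram: xs across the horizontal axis, ys up the vertical axis."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Assumes each variable takes at least two distinct values (otherwise the scaling divides by zero)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"          # row 0 is printed at the top
    lines = ["|" + "".join(r) for r in grid]       # "|" marks the vertical (Y) axis
    lines.append("+" + "-" * width)                # the horizontal (X) axis
    return "\n".join(lines)

# Hypothetical bus ages (years) and yearly maintenance costs ($)
ages = [1, 2, 3, 4, 5, 6]
costs = [350, 370, 480, 510, 620, 700]
print(text_scatter(ages, costs))
```

The stars climb from lower left to upper right, the visual signature of a positive relationship.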

Following are three scatter diagrams (Chart 4–2). The one on the left shows a rather strong positive relationship between the age in years and the maintenance cost last year for a sample of 10 buses owned by the city of Cleveland, Ohio. Note that as the age of the bus increases, the yearly maintenance cost also increases. The example in the center, for a sample of 20 vehicles, shows a rather strong indirect relationship between the odometer reading and the auction price. That is, as the number of miles driven increases, the auction price decreases. The example on the right depicts the relationship between the height and yearly salary for a sample of 15 shift supervisors. This graph indicates there is little relationship between their height and yearly salary.

CHART 4–2 Three Examples of Scatter Diagrams

EXAMPLE

In the introduction to Chapter 2, we presented data from the Applewood Auto Group. We gathered information concerning several variables, including the profit earned from the sale of 180 vehicles sold last month. In addition to the amount of profit on each sale, one of the other variables is the age of the purchaser. Is there a relationship between the profit earned on a vehicle sale and the age of the purchaser?

Would it be reasonable to conclude that more profit is made on vehicles purchased by older buyers?

SOLUTION

We can investigate the relationship between vehicle profit and the age of the buyer with a scatter diagram.

We scale age on the horizontal, or X-axis, and the profit on the vertical, or Y-axis. We assume profit depends on the age of the purchaser. As people age, they earn more income and purchase more expensive cars which, in turn, produce higher profits. We use Excel to develop the scatter diagram. The Excel commands are in Appendix C. The scatter diagram shows a rather weak positive relationship between the two variables. It does not appear there is much relationship between the vehicle profit and the age of the buyer. In Chapter 13, we will study the relationship between variables more extensively, even calculating several numerical measures to express the relationship between variables.

In the preceding example, there is a weak positive, or direct, relationship between the variables. There are, however, many instances where there is a relationship between the variables, but that relationship is inverse or negative. For example:

The value of a vehicle and the number of miles driven. As the number of miles increases, the value of the vehicle decreases.

The premium for auto insurance and the age of the driver. Auto rates tend to be highest for younger drivers and lower for older drivers.

For many law enforcement personnel, as the number of years on the job increases, the number of traffic citations decreases. This may be because personnel become more liberal in their interpretations, or they may be in supervisory positions and not in a position to issue as many citations. But in any event, as years on the job increase, the number of citations decreases.
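Judging direction from a scatter diagram is a visual call. As a rough numerical cross-check (a sketch of our own, not a method from this chapter; Chapter 13 develops proper measures), the sign of the sum of cross-products of deviations from the means indicates whether the points tend to run from lower left to upper right (direct) or from upper left to lower right (inverse). The function name `relationship_direction` and the mileage/value figures are hypothetical, chosen only for illustration.

```python
from statistics import mean

def relationship_direction(x, y):
    """Classify paired data as 'direct', 'inverse', or 'none' using the sign of
    the sum of (x - x_bar) * (y - y_bar). Hypothetical helper for illustration."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    x_bar, y_bar = mean(x), mean(y)
    # Points in the lower-left and upper-right quadrants (around the means)
    # add positive terms; upper-left and lower-right points add negative terms.
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if cross > 0:
        return "direct"
    if cross < 0:
        return "inverse"
    return "none"

# Hypothetical data: miles driven (000) and vehicle value ($000).
miles = [20, 45, 60, 80, 110]
value = [18.5, 15.0, 13.2, 11.0, 8.4]
print(relationship_direction(miles, value))  # inverse
```

The sign alone says nothing about strength: a weak direct relationship, like the one in the Applewood example, would still be reported as "direct".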

LO4-7 Develop and explain a contingency table.

CONTINGENCY TABLES

A scatter diagram requires that both of the variables be at least interval scale. In the Applewood Auto Group example, both age and vehicle profit are ratio scale variables. Height is also ratio scale as used in the discussion of the relationship between the height of fathers and the height of their sons. What if we wish to study the relationship between two variables when one or both are nominal or ordinal scale? In this case, we tally the results in a contingency table.

CONTINGENCY TABLE A table used to classify observations according to two identifiable characteristics.

A contingency table is a cross-tabulation that simultaneously summarizes two variables of interest. For example:

Students at a university are classified by gender and class (freshman, sophomore, junior, or senior).

A product is classified as acceptable or unacceptable and by the shift (day, afternoon, or night) on which it is manufactured.

A voter in a school bond referendum is classified as to party affiliation (Democrat, Republican, other) and the number of children that voter has attending school in the district (0, 1, 2, etc.).

EXAMPLE

There are four dealerships in the Applewood Auto Group. Suppose we want to compare the profit earned on vehicles sold at each dealership. To put it another way, is there a relationship between the amount of profit earned and the dealership?

SOLUTION

In a contingency table, both variables only need to be nominal or ordinal. In this example, the variable dealership is a nominal variable and the variable profit is a ratio variable. To convert profit to an ordinal variable, we classify the variable profit into two categories: those cases where the profit earned is more than the median and those cases where it is less. On page 63 we calculated the median profit for all sales last month at Applewood Auto Group to be $1,882.50. By organizing the information into a contingency table, we can compare the profit at the four dealerships. We observe the following:

From the Total column on the right, 90 of the 180 cars sold had a profit above the median and the other 90 had a profit below it. From the definition of the median, this is expected.

For the Kane dealership, 25 of the 52 cars sold, or 48%, were sold for a profit above the median.

The percentages of cars sold for a profit above the median at the other dealerships are 50% for Olean, 42% for Sheffield, and 60% for Tionesta.
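The median-split procedure in this solution can be sketched in Python. The sales records below are hypothetical (the 180 Applewood sales are not listed here), the function names are illustrative, and counting a profit exactly equal to the median as "below" is an assumption of the sketch. The final line only checks the Kane arithmetic quoted above.

```python
from collections import Counter
from statistics import median

def above_median_table(sales):
    """Cross-tabulate (dealer, profit) records by dealer and by whether the
    profit is above or below the overall median profit."""
    cut = median(profit for _, profit in sales)
    table = Counter()
    for dealer, profit in sales:
        # Ties with the median are counted as "below" (assumed rule).
        side = "above" if profit > cut else "below"
        table[(dealer, side)] += 1
    return cut, table

def percent_above(table, dealer):
    """Percent of a dealer's sales whose profit exceeds the median."""
    above = table[(dealer, "above")]
    total = above + table[(dealer, "below")]
    return 100 * above / total

# Hypothetical records: (dealership, profit in dollars).
sales = [("Kane", 1500), ("Kane", 2100), ("Olean", 1700),
         ("Olean", 2600), ("Sheffield", 900), ("Tionesta", 3000)]
cut, table = above_median_table(sales)
print(cut)  # 1900.0
for dealer in ["Kane", "Olean", "Sheffield", "Tionesta"]:
    print(dealer, percent_above(table, dealer))

# Arithmetic check for Kane in the text: 25 of 52 sales above the median.
print(round(100 * 25 / 52))  # 48
```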

We will return to the study of contingency tables in Chapter 5 during the study of probability and in Chapter 15 during the study of nonparametric methods of analysis.

4–5 The rock group Blue String Beans is touring the United States. The following chart shows the relationship between concert seating capacity and revenue in $000 for a sample of concerts.

(a) What is the diagram called?

(b) How many concerts were studied?

(c) Estimate the revenue for the concert with the largest seating capacity.

(d) How would you characterize the relationship between revenue and seating capacity? Is it strong or weak, direct or inverse?

EXERCISES

For DATA FILE, please visit www.mhhe.com/lind16e

23. Develop a scatter diagram for the following sample data. How would you describe the relationship between the values?

24. Silver Springs Moving and Storage Inc. is studying the relationship between the number of rooms in a move and the number of labor hours required for the move. As part of the analysis, the CFO of Silver Springs developed the following scatter diagram.

a. How many moves are in the sample?

b. Does it appear that more labor hours are required as the number of rooms increases, or do labor hours decrease as the number of rooms increases?

25. The Director of Planning for Devine Dining Inc. wishes to study the relationship between the gender of a guest and whether the guest orders dessert. To investigate the relationship, the manager collected the following information on 200 recent customers.

a. What is the level of measurement of the two variables?

b. What is the above table called?

c. Does the evidence in the table suggest men are more likely to order dessert than women? Explain why.

26. Ski Resorts of Vermont Inc. is considering a merger with Gulf Shores Beach Resorts Inc. of Alabama.

The board of directors surveyed 50 stockholders concerning their position on the merger. The results are reported below.

a. What level of measurement is used in this table?

b. What is this table called?

c. What group seems most strongly opposed to the merger?

CHAPTER SUMMARY

I. A dot plot shows the range of values on the horizontal axis and the number of observations for each value on the vertical axis.

A. Dot plots report the details of each observation.

B. They are useful for comparing two or more data sets.

II. A stem-and-leaf display is an alternative to a histogram.

A. The leading digit is the stem and the trailing digit the leaf.

B. The advantages of a stem-and-leaf display over a histogram include:

1. The identity of each observation is not lost.

2. The digits themselves give a picture of the distribution.

3. The cumulative frequencies are also shown.

III. Measures of location also describe the shape of a set of observations.

A. Quartiles divide a set of observations into four equal parts.

1. Twenty-five percent of the observations are less than the first quartile, 50% are less than the second quartile, and 75% are less than the third quartile.

2. The interquartile range is the difference between the third quartile and the first quartile.

B. Deciles divide a set of observations into 10 equal parts and percentiles into 100 equal parts.

IV. A box plot is a graphic display of a set of data.

A. A box is drawn enclosing the region between the first quartile and the third quartile.

1. A line is drawn inside the box at the median value.

2. Dotted line segments are drawn from the third quartile to the largest value to show the highest 25% of the values and from the first quartile to the smallest value to show the lowest 25% of the values.

B. A box plot is based on five statistics: the maximum and minimum values, the first and third quartiles, and the median.

V. The coefficient of skewness is a measure of the symmetry of a distribution.

A. There are two formulas for the coefficient of skewness.

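1. The formula developed by Pearson is:

$$sk = \frac{3(\bar{x} - \text{Median})}{s}$$

2. The coefficient of skewness computed by statistical software is:

$$sk = \frac{n}{(n-1)(n-2)}\left[\sum\left(\frac{x - \bar{x}}{s}\right)^{3}\right]$$

VI. A scatter diagram is a graphic tool to portray the relationship between two variables.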

A. Both variables are measured with interval or ratio scales.

B. If the scatter of points moves from the lower left to the upper right, the variables under consideration are directly or positively related.

C. If the scatter of points moves from the upper left to the lower right, the variables are inversely or negatively related.

VII. A contingency table is used to classify nominal-scale observations according to two characteristics.

PRONUNCIATION KEY

SYMBOL   MEANING                  PRONUNCIATION
L_p      Location of percentile   L sub p
Q_1      First quartile           Q sub 1
Q_3      Third quartile           Q sub 3

CHAPTER EXERCISES

For DATA FILE, please visit www.mhhe.com/lind16e

27. A sample of students attending Southeast Florida University is asked the number of social activities in which they participated last week. The chart below was prepared from the sample data.

a. What is the name given to this chart?

b. How many students were in the study?

c. How many students reported attending no social activities?

28. Doctor’s Care is a walk-in clinic, with locations in Georgetown, Moncks Corner, and Aynor, at which patients may receive treatment for minor injuries, colds, and flu, as well as physical examinations. The following charts report the number of patients treated in each of the three locations last month. Describe the number of patients served at the three locations each day. What are the maximum and minimum numbers of patients served at each of the locations?

29. The screen size for 23 LCD televisions is given below. Make a stem-and-leaf display of this variable.

30. The top 25 companies (by market capitalization) operating in the Washington, DC, area along with the year they were founded and the number of employees are given below. Make a stem-and-leaf display of each of these variables and write a short description of your findings.


31. In recent years, due to low interest rates, many homeowners refinanced their home mortgages. Linda Lahey is a mortgage officer at Down River Federal Savings and Loan.

Below is the amount refinanced for 20 loans she processed last week. The data are reported in thousands of dollars and arranged from smallest to largest.

a. Find the median, first quartile, and third quartile.

b. Find the 26th and 83rd percentiles.

c. Draw a box plot of the data.

32. A study is made by the recording industry in the United States of the number of music CDs owned by 25 senior citizens and 30 young adults. The information is reported below.

a. Find the median and the first and third quartiles for the number of CDs owned by senior citizens.

Develop a box plot for the information.

b. Find the median and the first and third quartiles for the number of CDs owned by young adults.

Develop a box plot for the information.

c. Compare the number of CDs owned by the two groups.

33. The corporate headquarters of Bank.com, a new Internet company that performs all banking transactions via the Internet, is located in downtown Philadelphia. The director of human resources is making a study of the time it takes employees to get to work. The city is planning to offer incentives to each downtown employer if they will encourage their employees to use public transportation. Below is a listing of the time to get to work this morning according to whether the employee used public transportation or drove a car.

a. Find the median and the first and third quartiles for the time it took employees using public transportation. Develop a box plot for the information.

b. Find the median and the first and third quartiles for the time it took employees who drove their own vehicle. Develop a box plot for the information.

c. Compare the times of the two groups.

34. The following box plot shows the number of daily newspapers published in each state and the District of Columbia. Write a brief report summarizing the number published. Be sure to include information on the values of the first and third quartiles, the median, and whether there is any skewness. If there are any outliers, estimate their value.

35. Walter Gogel Company is an industrial supplier of fasteners, tools, and springs. The amounts of its invoices vary widely, from less than $20.00 to more than $400.00. During the month of January the company sent out 80 invoices. Here is a box plot of these invoices. Write a brief report summarizing the invoice amounts. Be sure to include information on the values of the first and third quartiles, the median, and whether there is any skewness. If there are any outliers, approximate the value of these invoices.

36. The American Society of PeriAnesthesia Nurses (ASPAN; www.aspan.org) is a national organization serving nurses practicing in ambulatory surgery, preanesthesia, and postanesthesia care. The organization consists of the 40 components listed below.

Use statistical software to answer the following questions.

a. Find the mean, median, and standard deviation of the number of members per component.

b. Find the coefficient of skewness, using the software. What do you conclude about the shape of the distribution of component size?

c. Compute the first and third quartiles using formula (4–1).

d. Develop a box plot. Are there any outliers? Which components are out liers? What are the limits for outliers?
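Parts (c) and (d) call for quartiles from formula (4–1) and for outlier limits. The sketch below assumes formula (4–1) is the percentile-location rule L_p = (n + 1)P/100, with linear interpolation when L_p falls between two ordered observations, and it assumes the common convention that an outlier lies more than 1.5 times the interquartile range below Q1 or above Q3. The function names and sample data are hypothetical. Statistical software may use a different quartile rule, so its results can differ slightly.

```python
import math

def percentile_location(n, p):
    # Assumed formula (4-1): L_p = (n + 1) * P / 100, a 1-based position.
    return (n + 1) * p / 100

def percentile(values, p):
    """Value at the p-th percentile, interpolating between the two ordered
    observations on either side of L_p."""
    data = sorted(values)
    n = len(data)
    loc = percentile_location(n, p)
    # Clamp positions that fall outside the data (assumption for extreme p).
    if loc <= 1:
        return data[0]
    if loc >= n:
        return data[-1]
    lower = math.floor(loc)  # 1-based position at or just below L_p
    frac = loc - lower       # how far L_p lies toward the next observation
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

def box_plot_summary(values):
    """Five-number summary plus IQR and 1.5 x IQR outlier limits (assumed rule)."""
    data = sorted(values)
    q1, q3 = percentile(data, 25), percentile(data, 75)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": data[0], "q1": q1, "median": percentile(data, 50),
        "q3": q3, "max": data[-1], "iqr": iqr,
        "outlier_limits": (low, high),
        "outliers": [x for x in data if x < low or x > high],
    }

# Hypothetical data: the values 1 through 14 plus one unusually large value.
summary = box_plot_summary(list(range(1, 15)) + [40])
print(summary["q1"], summary["median"], summary["q3"])  # 4.0 8.0 12.0
print(summary["outliers"])                              # [40]
```

The same `percentile` function would answer requests such as the 26th and 83rd percentiles in Exercise 31(b), given that exercise's 20 loan amounts.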

37. McGivern Jewelers is located in the Levis Square Mall just south of Toledo, Ohio. Recently it ran an advertisement in the local newspaper reporting the shape, size, price, and cut grade for 33 of its diamonds currently in stock. The information is reported below.

a. Develop a box plot of the variable price and comment on the result. Are there any outliers? What is the median price? What are the values of the first and the third quartiles?

b. Develop a box plot of the variable size and comment on the result. Are there any outliers? What is the median size? What are the values of the first and the third quartiles?

c. Develop a scatter diagram between the variables price and size. Be sure to put price on the vertical axis and size on the horizontal axis. Does there seem to be an association between the two variables? Is the association direct or indirect? Does any point seem to be different from the others?

d. Develop a contingency table for the variables shape and cut grade. What is the most common cut grade? What is the most common shape? What is the most common combination of cut grade and shape?

38. Listed below is the amount of commissions earned last month for the eight members of the sales staff at Best Electronics. Calculate the coefficient of skewness using both methods.

Hint: Use of a spreadsheet will expedite the calculations.

39. Listed below is the number of car thefts in a large city over the last week. Calculate the coefficient of skewness using both methods. Hint: Use of a spreadsheet will expedite the calculations.
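In place of a spreadsheet, both coefficients for Exercises 38 and 39 can be computed with Python's standard `statistics` module. This sketch assumes Pearson's coefficient sk = 3(x̄ − Median)/s and the software coefficient sk = n/((n − 1)(n − 2)) Σ[(x − x̄)/s]³, where s is the sample standard deviation (`statistics.stdev`). The function names are hypothetical, and the data in the usage lines are made up because the exercise data are not reproduced here.

```python
from statistics import mean, median, stdev

def pearson_skewness(values):
    # sk = 3 * (mean - median) / s, with s the sample standard deviation
    return 3 * (mean(values) - median(values)) / stdev(values)

def software_skewness(values):
    # sk = n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3)
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    x_bar, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in values)

# Hypothetical data with one large value; both coefficients come out positive.
sample = [1, 1, 2, 10]
print(round(pearson_skewness(sample), 3), round(software_skewness(sample), 3))
```

For symmetric data such as 1, 2, 3, 4, 5, both coefficients are zero (up to floating-point rounding for the software formula).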

40. The manager of Information Services at Wilkin Investigations, a private investigation firm, is studying the relationship between the age (in months) of a combination printer, copier, and fax machine and its monthly maintenance cost. For a sample of 15 machines, the manager developed the following chart.

What can the manager conclude about the relationship between the variables?

41. An auto insurance company reported the following information regarding the age of a driver and the number of accidents reported last year. Develop a scatter diagram for the data and write a brief summary.

42. Wendy’s offers eight different condiments (mustard, catsup, onion, mayonnaise, pickle, lettuce, tomato, and relish) on hamburgers. A store manager collected the following information on the number of condiments ordered and the age group of the customer. What can you conclude regarding the information? Who tends to order the most or the fewest condiments?

43. Here is a table showing the number of employed and unemployed workers 20 years or older by gender in the United States.

a. How many workers were studied?

b. What percent of the workers were unemployed?

c. Compare the percent unemployed for the men and the women.

DATA SET EXERCISES

(The data for these exercises are available at the text website: www.mhhe.com/lind16e.)

44. Refer to the Real Estate data, which report information on homes sold in the Goodyear, Arizona, area during the last year. Prepare a report on the selling prices of the homes. Be sure to answer the following questions in your report.

a. Develop a box plot. Estimate the first and the third quartiles. Are there any outliers?

b. Develop a scatter diagram with price on the vertical axis and the size of the home on the horizontal axis. Does there seem to be a relationship between these variables? Is the relationship direct or inverse?

c. Develop a scatter diagram with price on the vertical axis and distance from the center of the city on the horizontal axis. Does there seem to be a relationship between these variables? Is the relationship direct or inverse?

45. Refer to the Baseball 2012 data that report information on the 30 Major League Baseball teams for the 2012 season.

a. In the data set, the variable, built, is the year that a stadium was constructed. Using this variable, create a new variable, age, by subtracting the value of the variable, built, from the current year for each team. Develop a box plot with the new variable, age. Are there any outliers? If so, which of the stadiums are outliers?

b. Using the variable, payroll, create a box plot. Are there any outliers? Compute the quartiles using formula (4–1). How does the New York Yankees’ payroll compare to other team payrolls? Write a brief summary of your analysis.

c. Draw a scatter diagram with the variable, wins, on the vertical axis and payroll on the horizontal axis.

What are your conclusions?

d. Using the variable, wins, draw a dot plot. What can you conclude from this plot?

46. Refer to the Buena School District bus data.

a. Refer to the maintenance cost variable. Develop a box plot. What are the first and third quartiles? Are there any outliers?

b. Determine the median maintenance cost. Based on the median, develop a contingency table with bus manufacturer as one variable and whether the maintenance cost was above or below the median as the other variable. What are your conclusions?

A REVIEW OF CHAPTERS 1–4

This section is a review of the major concepts and terms introduced in Chapters 1–4. Chapter 1 began by describing the meaning and purpose of statistics. Next we described the different types of variables and the four levels of measurement. Chapter 2 was concerned with describing a set of observations by organizing it into a frequency distribution and then portraying the frequency distribution as a histogram or a frequency polygon. Chapter 3 began by describing measures of location, such as the mean, weighted mean, median, geometric mean, and mode. This chapter also included measures of dispersion, or spread. Discussed in this section were the range, variance, and standard deviation. Chapter 4 included several graphing techniques such as dot plots, box plots, and scatter diagrams. We also discussed the coefficient of skewness, which reports the lack of symmetry in a set of data.

MCGIVERN JEWELERS recently ran an advertisement in the local newspaper reporting the shape, size, price, and cut grade for 33 of its diamonds in stock. Develop a box plot of the variable price and comment on the result. (See Exercise 37 and LO4-4.)

LEARNING OBJECTIVES When you have completed this chapter, you will be able to:

LO4-1 Construct and interpret a dot plot.
LO4-2 Construct and describe a stem-and-leaf display.
LO4-3 Identify and compute measures of position.
LO4-4 Construct and analyze a box plot.
LO4-5 Compute and interpret the coefficient of skewness.
LO4-6 Create and interpret a scatter diagram.
LO4-7 Develop and explain a contingency table.

INTRODUCTION

Chapter 2 began our study of descriptive statistics. In order to transform raw or ungrouped data into a meaningful form, we organize the data into a frequency distribution. We present the frequency distribution in graphic form as a histogram or a frequency polygon. This allows us to visualize where the data tend to cluster, the largest and the smallest values, and the general shape of the data.

In Chapter 3, we first computed several measures of location, such as the mean, median, and mode. These measures of location allow us to report a typical value in the set of observations. We also computed several measures of dispersion, such as the range, variance, and standard deviation. These measures of dispersion allow us to describe the variation or the spread in a set of observations.

We continue our study of descriptive statistics in this chapter. We study (1) dot plots, (2) stem-and-leaf displays, (3) percentiles, and (4) box plots. These charts and statistics give us additional insight into where the values are concentrated as well as the general shape of the data. Then we consider bivariate data. In bivariate data, we observe two variables for each individual or observation. Examples include the number of hours a student studied and the points earned on an examination; whether a sampled product is acceptable or not and the shift on which it is manufactured; and the amount of electricity used in a month by a homeowner and the mean daily high temperature in the region for the month.

LO4-1 Construct and interpret a dot plot.

DOT PLOTS

Recall that for the Applewood Auto Group data, we summarized the profit earned on the 180 vehicles sold into eight classes. When we organized the data into the eight classes, we lost the exact value of the observations. A dot plot, on the other hand, groups the data as little as possible, and we do not lose the identity of an individual observation. To develop a dot plot, we display a dot for each observation along a horizontal number line indicating the possible values of the data. If there are identical observations or the observations are too close to be shown individually, the dots are “piled” on top of each other. This allows us to see the shape of the distribution, the value about which the data tend to cluster, and the largest and smallest observations. Dot plots are most useful for smaller data sets, whereas histograms tend to be most useful for large data sets. An example will show how to construct and interpret dot plots.
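A minimal text-mode dot plot can be sketched in Python. It assumes whole-number observations (such as cars serviced per day) and, to keep the output printable, turns the chart on its side: each possible value gets its own row, and each observation adds an `o` to that row, so identical values pile up just as the dots do. The function name and data are hypothetical.

```python
from collections import Counter

def text_dot_plot(values):
    """Sideways dot plot for whole numbers: one row per possible value from
    the minimum to the maximum, one 'o' per observation with that value."""
    counts = Counter(values)
    rows = []
    for v in range(min(values), max(values) + 1):
        # Values with no observations still get a row, so gaps stay visible.
        rows.append(f"{v:>4} |" + " o" * counts[v])
    return "\n".join(rows)

# Hypothetical daily counts of vehicles serviced.
print(text_dot_plot([23, 25, 25, 26]))
```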

EXAMPLE

The service departments at Tionesta Ford Lincoln Mercury and Sheffield Motors Inc., two of the four Applewood Auto Group dealerships, were both open 24 days last month. Listed below is the number of vehicles serviced last month at the two dealerships. Construct dot plots and report summary statistics to compare the two dealerships.

SOLUTION

The Minitab system provides a dot plot and outputs the mean, median, maximum, and minimum values, and the standard deviation for the number of cars serviced at each dealership over the last 24 working days.

The dot plots, shown in the center of the output, graphically illustrate the distributions for each dealership. The plots show the difference in the location and dispersion of the observations. By looking at the dot plots, we can see that the number of vehicles serviced at the Sheffield dealership is more widely dispersed and has a larger mean than at the Tionesta dealership. Several other features of the number of vehicles serviced are:

Tionesta serviced the fewest cars on any single day, 23.

Sheffield serviced 26 cars on its slowest day, 4 fewer than on its next-slowest day.

Tionesta serviced exactly 32 cars on four different days.

The numbers of cars serviced cluster around 36 for Sheffield and 32 for Tionesta.

From the descriptive statistics, we see Sheffield serviced a mean of 35.83 vehicles per day. Tionesta serviced a mean of 31.292 vehicles per day during the same period. So Sheffield typically services about 4.54 more vehicles per day (35.83 − 31.292 = 4.538). There is also more dispersion, or variation, in the daily number of vehicles serviced at Sheffield than at Tionesta. How do we know this? The standard deviation is larger at Sheffield (4.96 vehicles per day) than at Tionesta (4.112 cars per day).

LO4-2 Construct and describe a stem-and-leaf display.

STEM-AND-LEAF DISPLAYS

In Chapter 2, we showed how to organize data into a frequency distribution so we could summarize the raw data into a meaningful form. The major advantage of organizing the data into a frequency distribution is that we get a quick visual picture of the shape of the distribution without doing any further calculation. To put it another way, we can see where the data are concentrated and also determine whether there are any extremely large or small values. There are two disadvantages, however, to organizing the data into a frequency distribution: (1) we lose the exact identity of each value and (2) we are not sure how the values within each class are distributed. To explain, the following frequency distribution shows the number of advertising spots purchased by the 45 members of the Greater Buffalo Automobile Dealers Association in 2010. We observe that 7 of the 45 dealers purchased at least 90 but less than 100 spots. However, are the spots purchased within this class clustered about 90, spread evenly throughout the class, or clustered near 99? We cannot tell.

One technique used to display quantitative information in a condensed form is the stem-and-leaf display.

An advantage of the stem-and-leaf display over a frequency distribution is we do not lose the identity of each observation. In the above example, we would not know the identity of the values in the 90 up to 100 class. To illustrate the construction of a stem-and-leaf display using the number of advertising spots purchased, suppose the seven observations in the 90 up to 100 class are 96, 94, 93, 94, 95, 96, and 97. The stem value is the leading digit or digits, in this case 9. The leaves are the trailing digits. The stem is placed to the left of a vertical line and the leaf values to the right.

The values in the 90 up to 100 class would appear as follows:

9 | 6 4 3 4 5 6 7

It is also customary to sort the values within each stem from smallest to largest. Thus, the second row of the stem-and-leaf display would appear as follows:

9 | 3 4 4 5 6 6 7

With the stem-and-leaf display, we can quickly observe that two dealers purchased 94 spots and that, within this class, the number of spots purchased ranged from 93 to 97. A stem-and-leaf display is similar to a frequency distribution with more information; that is, the identity of the observations is preserved.
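The construction just described (stem = leading digit or digits, leaf = trailing digit, leaves sorted within each stem) can be sketched in Python for non-negative whole numbers. The function name is hypothetical, and keeping empty stems so that gaps in the data remain visible is a layout assumption.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative whole numbers, with the
    leaves in each row sorted from smallest to largest."""
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)  # e.g. 96 -> stem 9, leaf 6
        rows[stem].append(leaf)
    lines = []
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in sorted(rows.get(stem, [])))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# The seven values from the 90 up to 100 class.
print("\n".join(stem_and_leaf([96, 94, 93, 94, 95, 96, 97])))  # 9 | 3 4 4 5 6 6 7
```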

STEM-AND-LEAF DISPLAY A statistical technique to present a set of data. Each numerical value is divided into two parts. The leading digit(s) becomes the stem and the trailing digit the leaf. The stems are located along the vertical axis, and the leaf values are stacked against each other along the horizontal axis.

The following example explains the details of developing a stem-and-leaf display.

EXAMPLE

Listed in Table 4–1 is the number of 30-second radio advertising spots purchased by each of the 45 members of the Greater Buffalo Automobile Dealers Association last year. Organize the data into a stem-and-leaf display. Around what values do the numbers of advertising spots tend to cluster? What is the smallest number of spots purchased by a dealer? The largest number purchased?

TABLE 4–1 Number of Advertising Spots Purchased by Members of the Greater Buffalo Automobile Dealers Association

SOLUTION

From the data in Table 4–1, we note that the smallest number of spots purchased is 88. So we will make the first stem value 8. The largest number is 156, so we will have the stem values begin at 8 and continue to 15.

The first number in Table 4–1 is 96, which has a stem value of 9 and a leaf value of 6. Moving across the top row, the second value is 93 and the third is 88. After the first 3 data values are considered, your chart is as follows.

8 | 8
9 | 6 3
10 |
11 |
12 |
13 |
14 |
15 |

Organizing all the data, the stem-and-leaf chart looks as follows.

The usual procedure is to sort the leaf values from the smallest to largest. The last line, the row referring to the values in the 150s, would appear as:

15 | 5 5 6

The final table would appear as follows, where we have sorted all of the leaf values. You can draw several conclusions from the stem-and-leaf display. First, the minimum number of spots purchased is 88 and the maximum is 156. Two dealers purchased fewer than 90 spots, and three purchased 150 or more. You can observe, for example, that the three dealers who purchased 150 or more spots actually purchased 155, 155, and 156 spots. The concentration of the number of spots is between 110 and 130. There were nine dealers who purchased between 110 and 119 spots and eight who purchased between 120 and 129 spots. We can also tell that within the 120 to 129 group the actual number of spots purchased was spread evenly throughout. That is, two dealers purchased 120 spots, one dealer purchased 124 spots, three dealers purchased 125 spots, and two purchased 127 spots.

We can also generate this information on the Minitab software system. We have named the variable Spots. The Minitab output is below. You can find the Minitab commands that will produce this output in Appendix C.

The Minitab solution provides some additional information regarding cumulative totals. In the column to the left of the stem values are numbers such as 2, 9, 15, and so on. The number 9 indicates that 9 observations are less than 100. The number 15 indicates that 15 observations are less than 110. About halfway down the column, the number 9 appears in parentheses. The parentheses indicate that the middle value, or median, appears in that row and that there are nine values in this row. In this case, we describe the middle value as the value below which half of the observations occur.

There are a total of 45 observations, so the middle value, if the data were arranged from smallest to largest, would be the 23rd observation; its value is 118. After the median row, the numbers in the column begin to decline. These values represent the “more than” cumulative totals: there are 21 observations of 120 or more, 13 of 130 or more, and so on.
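The left-hand column of the Minitab output can be rebuilt from the number of leaves in each row. The sketch below follows the convention as the text describes it. It is not Minitab's own code, and the function name `depth_column` is hypothetical. It takes the median position as (n + 1)/2, which gives the 23rd observation for n = 45, and it handles only an odd number of observations. The row counts for the 80s, 90s, 100s, 110s, 120s, and 150s follow from the totals quoted above. The text does not say how the remaining 10 observations split between the 130s and 140s, so the 5 and 5 used here are hypothetical.

```python
def depth_column(row_counts):
    """Cumulative 'depth' labels for a stem-and-leaf display.

    Rows before the median row show running totals from the top (the
    "less than" totals); the median row shows its own count in parentheses;
    rows after it show running totals from the bottom (the "more than" totals).
    Assumes an odd number of observations, so the median is one observation.
    """
    n = sum(row_counts)
    median_position = (n + 1) / 2   # 23rd observation when n = 45
    labels = []
    before = 0                      # observations in all earlier rows
    for count in row_counts:
        through = before + count    # observations up to and including this row
        if before < median_position <= through:
            labels.append(f"({count})")
        elif through < median_position:
            labels.append(str(through))
        else:
            labels.append(str(n - before))
        before = through
    return labels


# Rows for the 80s, 90s, ..., 150s. The 5 and 5 for the 130s and 140s are a
# hypothetical split of the 10 observations between 130 and 149.
spot_row_counts = [2, 7, 6, 9, 8, 5, 5, 3]

if __name__ == "__main__":
    print(depth_column(spot_row_counts))
```

With these counts the function returns `['2', '9', '15', '(9)', '21', '13', '8', '3']`. Every label except the 8 for the 140s row matches, or follows directly from, the figures given in the text.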

Which is the better choice, a dot plot or a stem-and-leaf chart? This is really a matter of personal choice and convenience. For presenting data, especially with a large number of observations, you will find dot plots are more frequently used. You will see dot plots in analytical literature, marketing reports, and occasionally in annual reports. If you are doing a quick analysis for yourself, stem-and-leaf tallies are handy and easy, particularly on a smaller set of data.

4–1

1. The number of employees at each of the 142 Home Depot stores in the Southeast region is shown in the following dot plot.

(a) What are the maximum and minimum numbers of employees per store?

(b) How many stores employ 91 people?

(c) Around what values does the number of employees per store tend to cluster?

2. The rate of return for 21 stocks is: Organize this information into a stem-and-leaf display.

(a) How many rates are less than 9.0?

(b) List the rates in the 10.0 up to 11.0 category.

(c) What is the median?

(d) What are the maximum and the minimum rates of return?

EXERCISES

For DATA FILE, please visit www.mhhe.com/lind16e

1. Describe the differences between a histogram and a dot plot. When might a dot plot be better than a histogram?

2. Describe the differences between a histogram and a stem-and-leaf display.

3. Consider the following chart.

a. What is this chart called?

b. How many observations are in the study?

c. What are the maximum and the minimum values?

d. Around what values do the observations tend to cluster?

4. The following chart reports the number of cell phones sold at Radio Shack for the last 26 days.

a. What are the maximum and the minimum numbers of cell phones sold in a day?

b. What is a typical number of cell phones sold?

5. The first row of a stem-and-leaf chart appears as follows: 62 | 1 3 3 7 9. Assume whole number values.

a. What is the “possible range” of the values in this row?

b. How many data values are in this row?

c. List the actual values in this row of data.

6. The third row of a stem-and-leaf chart appears as follows: 21 | 0 1 3 5 7 9. Assume whole number values.

a. What is the “possible range” of the values in this row?

b. How many data values are in this row?

c. List the actual values in this row of data.

7. The following stem-and-leaf chart from the Minitab software shows the number of units produced per day in a factory.

a. How many days were studied?

b. How many observations are in the first class?

c. What are the minimum value and the maximum value?

d. List the actual values in the fourth row.

e. List the actual values in the second row.

f. How many values are less than 70?

g. How many values are 80 or more?

h. What is the median?

i. How many values are between 60 and 89, inclusive?

8. The following stem-and-leaf chart reports the number of movies rented per day at Video Connection on the corner of Fourth and Main Streets.

a. How many days were studied?

b. How many observations are in the last class?

c. What are the maximum and the minimum values in the entire set of data?
