A Mathematical View of Our World 1st ed. Parks, Musser, Trimpe, Maurer, and Maurer Chapter 9 Collecting and Interpreting Data Section 9.1 Populations, Samples, and Data • Goals • Study populations and samples • Study data • Quantitative data • Qualitative data • Study bias • Study simple random sampling 9.1 Initial Problem • How can a professor choose 5 students from among 25 volunteers in a fair way? • The solution will be given at the end of the section. Populations and Samples • The entire set of objects being studied is called the population. • A population can consist of: • People or animals • Plants • Inanimate objects • Events • The members of a population are called elements. Populations and Samples, cont’d • Any characteristic of elements of the population is called a variable. • When we collect information from a population element, we say that we measure the variable being studied. • A variable that is naturally numerical is called quantitative. • A variable that is not numerical is called qualitative. Populations and Samples, cont’d • A census measures the variable for every element of the population. • A census is time-consuming and expensive, unless the population is very small. • Instead of dealing with the entire population, a subset, called a sample, is usually selected for study. Example 1 • Suppose you want to determine voter opinion on a ballot measure. You survey potential voters among pedestrians on Main Street during lunch. a) What is the population? b) What is the sample? c) What is the variable being measured? Example 1, cont’d a) Solution: The population consists of all the people who intend to vote on the ballot measure. Example 1, cont’d b) Solution: The sample consists of all the people you interviewed on Main Street who intend to vote on the ballot measure. Example 1, cont’d c) Solution: The variable being measured is the voter’s intent to vote “yes” or “no” on the ballot measure. 
Data • The measurement information recorded from a sample is called data. • Quantitative data is measurements for a quantitative variable. • Qualitative data is measurements for a qualitative variable. Data, cont’d • Qualitative data with a natural ordering is called ordinal. • For example, a ranking of a pizza on a scale of “Excellent” to “Poor” is ordinal. • Qualitative data without a natural ordering is called nominal. • For example, eye color is nominal. Data, cont’d • The types of data are illustrated below. Example 2 • Suppose you survey potential voters among the people on Main Street during lunch to determine their political affiliation and age, as well as their opinion on the ballot measure. • Classify the variables as quantitative or qualitative. Example 2, cont’d • Solution: • Political affiliation is a qualitative variable. • Age is a quantitative variable. • Opinion on the ballot measure is a qualitative variable. Question: Suppose you survey potential voters among the people on Main Street during lunch to determine their political affiliation and age, as well as their opinion on the ballot measure. Classify the qualitative variables political affiliation and opinion on the ballot measure as ordinal or nominal. a. Both are ordinal. b. Both are nominal. c. Political affiliation is ordinal and opinion is nominal. d. Political affiliation is nominal and opinion is ordinal. Samples, cont’d • Statistical inference is used to make an estimation or prediction for the entire population based on data collected from the sample. • If a sample has characteristics that are typical of the population as a whole, we say it is a representative sample. • A bias is a flaw in the sampling that makes it more likely the sample will not be representative. Common Sources of Bias • Faulty sampling: The sample is not representative. • Faulty questions: The questions are worded to influence the answers. 
• Faulty interviewing: Interviewers fail to survey the entire sample, misread questions, and/or misinterpret answers. Common Sources of Bias, cont’d • Lack of understanding or knowledge: The person being interviewed does not understand the question or needs more information. • False answers: The person being interviewed intentionally gives incorrect information. Example 3 • Suppose you wish to determine voter opinion regarding eliminating the capital gains tax. You survey potential voters on a street corner near Wall Street in New York City. • Identify a source of bias in this poll. Example 3, cont’d • Solution: One source of bias in choosing the sample is that people who work on Wall Street would benefit from the elimination of the tax and are more likely to favor the elimination than the average voter may be. • This is faulty sampling. Example 4 • Suppose a car manufacturer wants to test the reliability of 1000 alternators. They will test the first 30 from the lot for defects. • Identify any potential sources of bias. Example 4, cont’d • Solution: One source of bias could be that the first 30 alternators are chosen for the sample. It may be that defects are either much more likely at the beginning of a production run or much less likely at the beginning. In either case, the sample would not be representative. • This is potentially faulty sampling. Simple Random Samples • Representative samples are usually chosen randomly. • Given a population and a desired sample size, a simple random sample is any sample chosen in such a way that all samples of the same size are equally likely to be chosen. Simple Random Samples, cont’d • One way to choose a simple random sample is to use a random number generator or table. • A random number generator is a computer or calculator program designed to produce numbers with no apparent pattern. • A random number table is a table produced with a random number generator. 
• An example of the first few rows of a random number table is shown on the next slide. Random Number Table Example 5 • Choose a simple random sample of size 5 from 12 semifinalists: Astoria, Beatrix, Charles, Delila, Elsie, Frank, Gaston, Heidi, Ian, Jose, Kirsten, and Lex. Example 5, cont’d • Solution: Assign numerical labels to the population elements, in any order, as shown below: Example 5, cont’d • Solution, cont’d: Choose a random spot in the table to begin. • In this case, we could choose to start at the top of the third column and to read down, looking at the last 2 digits in each number. This choice is arbitrary. • Numbers that correspond to population labels are recorded, ignoring duplicates, until 5 such numbers have been found. Example 5, cont’d Example 5, cont’d • Solution, cont’d: The numbers located are 01, 06, 10, 11, and 07. • The simple random sample consists of Beatrix, Gaston, Heidi, Kirsten, and Lex. Question: Choose a different simple random sample of size 5 from the 12 semifinalists: Astoria, Beatrix, Charles, Delila, Elsie, Frank, Gaston, Heidi, Ian, Jose, Kirsten, and Lex. Question, cont’d Use the first 2 digits of each number, reading across the row starting in row 128 of the random number table. a. Delila, Beatrix, Lex, Kirsten, Jose b. Frank, Jose, Elsie, Delila, Ian c. Charles, Ian, Frank, Beatrix, Gaston d. Jose, Beatrix, Ian, Heidi, Lex Example 6 • Choose a simple random sample of size 8 from the states of the United States of America. Example 6, cont’d • Solution: Numerical labels can be assigned to the population elements in any order. • In this example we choose to order the states by area. • The labels are shown on the next slide. Example 6, cont’d Example 6, cont’d • Solution, cont’d: We randomly choose to start at the top row, left column of the number table and read the last 2 digits of each entry across the row. 
• The entries are 03918 77195 47772 21870 87122 99445 10041 31795 63857 64569 34893 20429 43537 25368 95237 17707 34280 04755 64301 66836 12201… Example 6, cont’d • Solution, cont’d: • The numbers obtained from the table are 18, 22, 45, 41, 29, 37, 07, 01. • The states selected for the sample are Washington, Florida, Vermont, West Virginia, Arkansas, Kentucky, Nevada, and Alaska. 9.1 Initial Problem Solution • To fairly select 5 students from 25 volunteers, a professor could choose a simple random sample. • Solution: Assign the students labels of 00 through 24 according to some ordering. • Pick a starting place in a random number table and read until 5 students have been selected. Initial Problem Solution, cont’d • Suppose the first 2 digits of each entry in the last column are used. • The first 5 numbers that are 24 or less are 20, 04, 16, 07, and 06. • The students that were assigned these labels are fairly chosen from the 25 volunteers. Section 9.2 Survey Sampling Methods • Goals • Study sampling methods • Independent sampling • Systematic sampling • Quota sampling • Stratified sampling • Cluster sampling 9.2 Initial Problem • You need to interview at least 800 people nationwide. • You need a different interviewer for each county. • Each interviewer costs $50 plus $10 per interview. • Your budget is $15,000. • Which is better, a simple random sample of all adults in the U.S. or a simple random sample of adults in randomly-selected counties? • The solution will be given at the end of the section. Sample Survey Design • Simple random sampling can be expensive and time-consuming in practice. • Statisticians have developed sample survey design to provide less expensive alternatives to simple random sampling. Independent Sampling • In independent sampling, each member of the population has the same fixed chance of being selected for the sample. • The size of the sample is not fixed ahead of time. 
• For example, in a 50% independent sample, each element of the population has a 50% chance of being selected. Example 1 • Find a 50% independent sample of the 12 semifinalists: Astoria, Beatrix, Charles, Delila, Elsie, Frank, Gaston, Heidi, Ian, Jose, Kirsten, and Lex. Example 1, cont’d • Solution: Because a random number table contains 10 digits, there is a 50% chance that one of the five digits 0, 1, 2, 3, or 4 will occur. • Let the digits 0, 1, 2, 3, or 4 represent “select this contestant” and let the remaining digits represent “do not select this contestant”. Example 1, cont’d • Solution, cont’d: We randomly choose column 6 in the random number table and look at the first 12 digits: 99445 20429 04. • The first 9 indicates that Astoria is not selected. • The second 9 indicates that Beatrix is not selected. • The 4 represents that Charles is selected, and so on… • The 50% independent sample is Charles, Delila, Frank, Gaston, Heidi, Ian, Kirsten, and Lex. Question: Choose a 40% independent sample from the 12 semifinalists: Astoria, Beatrix, Charles, Delila, Elsie, Frank, Gaston, Heidi, Ian, Jose, Kirsten, and Lex. Use the first 12 digits of row 145 of the random number table and use digits 0, 1, 2, 3 for selection. Question, cont’d Use the first 12 digits of row 145 of the random number table and use digits 0, 1, 2, 3 for selection. a. Astoria, Beatrix, Charles, Delila b. Charles, Elsie, Frank, Gaston c. Charles, Elsie, Frank, Gaston, Heidi, Jose, Kirsten, Lex. d. Beatrix, Charles, Delila, Frank, Heidi, Ian, Lex. Example 2 • Find a 10% independent sample of the 100 automobiles produced in one day at a factory. Example 2, cont’d • Solution: Choose some ordering for the 100 automobiles. • There is a 10% chance that the digit 0 will occur, so let the digit 0 represent “select this automobile” and let the other 9 digits represent “do not select this automobile”. 
Example 2, cont’d • Solution, cont’d: We randomly start in the first column, first row of the random number table and read from left to right. • In the first 100 digits we read in the table, a 0 occurs in the positions 1, 7, 8, 19, 33, 39, 62, 70, 73, 81, 88, 93, 95, 98, and 100. Example 2, cont’d • Solution, cont’d: The automobiles that are selected are highlighted. Systematic Sampling • In systematic sampling, we decide ahead of time what proportion of the population we wish to sample. • For a 1-in-k systematic sample: • List the population elements in some order. • Randomly choose a number, r, from 1 to k. • The elements selected are those labeled r, r + k, r + 2k, r + 3k, … Example 3 • Use systematic sampling to select a 1-in-10 systematic sample of the 100 automobiles produced in one day at a factory. Example 3, cont’d • Solution: List the automobiles in some order. • Suppose we randomly choose r = 5. • Since r = 5 and k = 10, the automobiles selected for the sample are those labeled 5, 15, 25, 35, 45, 55, 65, 75, 85, and 95. Example 3, cont’d • Solution, cont’d: The automobiles that are selected are highlighted. Example 3, cont’d • A systematic sample is easier to choose than an independent sample. • However, the regularity in the selection of a systematic sample can sometimes be a source of bias. Question: Choose a 1-in-3 systematic sample from the 12 semifinalists: Astoria, Beatrix, Charles, Delila, Elsie, Frank, Gaston, Heidi, Ian, Jose, Kirsten, and Lex. Use the randomly chosen value of r = 2. a. Beatrix, Elsie, Heidi, Kirsten b. Astoria, Delila, Gaston, Jose c. Charles, Frank, Ian, Lex d. Astoria, Charles, Elsie, Gaston, Ian, Kirsten Quota Sampling • In quota sampling, the sample is chosen to be representative for known important variables. • Quotas may be set for age groups, genders, ethnicities, occupations, and so on. • There is no way to know ahead of time which variables are important enough to require quotas. • Quota sampling is not always reliable.
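The independent and systematic sampling procedures described earlier in this section can be sketched in a few lines of Python. This is only an illustration: the function names `systematic_sample` and `independent_sample` are hypothetical, and the independent sample uses Python's pseudorandom number generator in place of the random number table used in the slides, so its output will not reproduce the table-based answers.

```python
import random


def systematic_sample(population, k, r):
    """Return a 1-in-k systematic sample.

    Elements are assumed to be labeled 1, 2, 3, ... in list order, and the
    sample consists of the elements labeled r, r + k, r + 2k, ...
    """
    if not 1 <= r <= k:
        raise ValueError("r must be between 1 and k")
    # Label r corresponds to list index r - 1.
    return population[r - 1::k]


def independent_sample(population, p, rng=None):
    """Return an independent sample in which each element has chance p.

    The sample size is not fixed ahead of time. A pseudorandom generator
    stands in for the random number table (an assumption of this sketch).
    """
    rng = rng or random.Random()
    return [element for element in population if rng.random() < p]


semifinalists = [
    "Astoria", "Beatrix", "Charles", "Delila", "Elsie", "Frank",
    "Gaston", "Heidi", "Ian", "Jose", "Kirsten", "Lex",
]

# 1-in-3 systematic sample with r = 2 (labels 2, 5, 8, 11).
print(systematic_sample(semifinalists, k=3, r=2))

# 1-in-10 systematic sample of 100 automobiles with r = 5.
automobiles = list(range(1, 101))
print(systematic_sample(automobiles, k=10, r=5))

# A 50% independent sample; the result varies from run to run.
print(independent_sample(semifinalists, p=0.5))
```

Because the systematic sample depends only on k and r, it is fully determined once r has been chosen, whereas the independent sample can be any size from 0 to the whole population.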
Stratified Sampling • In stratified sampling, the population is subdivided into 2 or more nonoverlapping subsets, each of which is called a stratum. Stratified Sampling, cont’d • A stratified random sample is obtained by selecting a simple random sample from each stratum. • A stratified sample can be less costly because the strata allow a smaller sample to be used. Example 4 • Select a stratified random sample of 10 men and 10 women from a population of 200. • Suppose there are equal numbers of men and women in the population. • Use the first 2 digits of the 2nd and 3rd columns of the random number table for selecting men and women, respectively. Example 4, cont’d • Solution: The 2 strata are men and women. • Choose a simple random sample from the men. • Number the 100 men with labels 00 through 99. • The 10 men chosen from the random number table are those with labels 77, 31, 25, 66, 49, 38, 00, 95, 24, and 57. Example 4, cont’d • Solution, cont’d: Choose a simple random sample from the women. • Number the 100 women with labels 00 through 99. • The 10 women chosen from the random number table are those with labels 47, 63, 95, 12, 49, 37, 48, 94, 35, and 78. Example 4, cont’d • Solution, cont’d: The stratified random sample is represented below. Question: Suppose the 12 semifinalists can be divided into 2 strata as follows. Junior division: Astoria, Charles, Delila, Gaston, Heidi, Lex Senior division: Beatrix, Elsie, Frank, Ian, Jose, Kirsten Choose a stratified random sample so the sample contains 2 members of each stratum. Label the members of each stratum 01 through 06. For the junior division use the first 2 digits of column 3, starting at the top and reading down. For the senior division use the last 2 digits of column 3, starting at the top and reading down. Question, cont’d Choose a stratified random sample so the sample contains 2 members of each stratum. a. Frank, Beatrix, Astoria, Charles b. Gaston, Lex, Ian, Elsie c. Heidi, Astoria, Jose, Beatrix d. 
Lex, Charles, Beatrix, Frank Cluster Sampling • In cluster sampling, the population is divided into nonoverlapping subsets called sampling units or clusters. • Clusters may vary in size. • A frame is a complete list of the sampling units. • A sample is a collection of sampling units selected from the frame. Cluster Sampling, cont’d • In cluster sampling, a simple random sample determines the clusters to be included in the sample. Example 5 • Select a cluster sample of 12 individuals from a population of 96 people who all live in four-person suites. • Use the first 2 digits of the 4th column of the random number table. Example 5, cont’d • Solution: The clusters will be the 24 suites. • Label the suites 01 through 24. • We need a simple random sample of 3 of these suites to obtain a cluster sample of 12 people. Example 5, cont’d • Solution, cont’d: The people in suites 21, 17, and 10 are selected. Sampling Summary 9.2 Initial Problem Solution • You need to interview at least 800 people nationwide. • You need a different interviewer for each county. • Each interviewer costs $50 plus $10 per interview. • Your budget is $15,000. • Which is better, a simple random sample of all adults in the U.S. or a simple random sample of adults in randomly-selected counties? Initial Problem Solution, cont’d • A simple random sample is unbiased, so this might seem to be the best choice. • However, there are 3130 counties in the U.S. • If, for example, you get people in your sample from only 400 of the counties, it would cost you 400($50) + 800($10) = $28,000. • You cannot afford to choose a simple random sample. Initial Problem Solution, cont’d • The second type of sample is a much less expensive choice. • You must pay 800($10) = $8000 for the interviews, which leaves $7000 for hiring interviewers. • You can select a simple random sample of up to 140 counties. • Then select a simple random sample of people from each selected county, for a total of 800 people. 
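The budget arithmetic in this solution can be checked with a short Python sketch. The function names `survey_cost` and `max_counties` are hypothetical; the $50 and $10 rates, the 800 interviews, and the $15,000 budget come from the problem statement.

```python
def survey_cost(num_counties, num_interviews, per_interviewer=50, per_interview=10):
    """Total cost: one interviewer per county plus a fee for each interview."""
    return num_counties * per_interviewer + num_interviews * per_interview


def max_counties(budget, num_interviews, per_interviewer=50, per_interview=10):
    """Largest number of counties affordable after paying for the interviews."""
    remaining = budget - num_interviews * per_interview
    if remaining < 0:
        return 0
    return remaining // per_interviewer


# Simple random sample of all adults, spread over 400 counties.
print(survey_cost(400, 800))      # 400($50) + 800($10)

# Sampling within randomly selected counties on a $15,000 budget.
print(max_counties(15000, 800))   # counties affordable with the remaining $7000
```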
Section 9.3 Central Tendency and Variability • Goals • Study measures of central tendency • Mean • Median • Mode • Study measures of dispersion • Range • Quartiles • Standard deviation 9.3 Initial Problem • Which stockbroker should you choose if you want to minimize risk while maintaining a steady rate of growth? • One stockbroker’s recommendations had percentage gains of 21%, -3%, 16%, 27%, 9%, 11%, 13%, 6%, and 17%. • The other’s recommendations had percentage gains of 11%, 13%, 16%, 8%, 5%, 14%, 15%, 17%, and 18%. • The solution will be given at the end of the section. Measures of Central Tendency • Statistics that tell us about the location of values in a data set are called measures of location. • The most important measures of location, called measures of central tendency, tell us where the center of the data set lies. • The most important measures of central tendency are mean, median, and mode. The Mean • The mean is the most common type of average. • This is an arithmetic mean. • If there are N numbers in a data set, the mean is: \( \frac{x_1 + x_2 + \cdots + x_N}{N} \) The Mean, cont’d • The mean of a sample is denoted by \( \bar{x} \), which is read “x-bar”. • The mean of a population is denoted by μ, the Greek letter pronounced “mew”. Example 1 • Find the mean of each data set. a) 1, 1, 2, 2, 3 b) 1, 1, 2, 2, 11 c) 1, 1, 2, 2, 47 • Solution: a) The mean is \( \frac{1 + 1 + 2 + 2 + 3}{5} = \frac{9}{5} = 1\tfrac{4}{5} \) Example 1, cont’d • Solution, cont’d: b) The mean is \( \frac{1 + 1 + 2 + 2 + 11}{5} = \frac{17}{5} = 3\tfrac{2}{5} \) c) The mean is \( \frac{1 + 1 + 2 + 2 + 47}{5} = \frac{53}{5} = 10\tfrac{3}{5} \) Example 2 • A college graduate reads that a company with 5 employees has a mean salary of $48,000. • How might this be misleading? Example 2, cont’d • Solution: One possibility is that every employee earns a salary of $48,000. • \( \frac{48000 + 48000 + 48000 + 48000 + 48000}{5} = \frac{240000}{5} = \$48{,}000 \) • Another possibility is that the owner makes $120,000, while the other 4 employees each earn $30,000.
• \( \frac{120000 + 30000 + 30000 + 30000 + 30000}{5} = \frac{240000}{5} = \$48{,}000 \) Example 2, cont’d • There are also other possible situations, but these two are enough to show that the salary the graduate could expect to earn can vary widely based only on knowing the mean salary. Question: Find the mean of the data set: 19, 27, 83, 94. Round to 2 decimal places. a. 54.33 b. 55.75 c. 44.60 d. 56.50 The Median • The median is the “middle number” of a data set when the values are arranged from smallest to largest. • If there are an odd number of data points, the data point exactly in the middle of the list is the median. • If there are an even number of data points, the mean of the two data points in the middle of the list is the median. Example 3 • Find the mean and median of each data set. a) 0, 2, 4 b) 0, 2, 4, 10 c) 0, 2, 4, 10, 1000 Example 3, cont’d a) Solution for 0, 2, 4 • The median is 2. • The mean is: \( \frac{0 + 2 + 4}{3} = \frac{6}{3} = 2 \) Example 3, cont’d b) Solution for 0, 2, 4, 10 • The median is: \( \frac{2 + 4}{2} = \frac{6}{2} = 3 \) • The mean is: \( \frac{0 + 2 + 4 + 10}{4} = \frac{16}{4} = 4 \) Example 3, cont’d c) Solution for 0, 2, 4, 10, 1000 • The median is 4. • The mean is: \( \frac{0 + 2 + 4 + 10 + 1000}{5} = \frac{1016}{5} = 203.2 \) Example 3, cont’d • One very large or very small data value can change the mean dramatically. • Large or small data values do not have much of an effect on the median. Example 4 • Find the median salary for the 2 situations. a) Five employees each earn $48,000. b) Four employees earn $30,000 and one earns $120,000. Example 4, cont’d • Solution: a) The median salary is $48,000. • The median is the same as the mean. b) The median salary is $30,000. • In this case the median more accurately shows the typical salary than does the mean of $48,000. Question: Find the median of the data set: 19, 27, 83, 94. Round to 2 decimal places. a. 27.00 b. 83.00 c. 56.50 d. 55.00 Symmetric Distributions • If the mean and median of a data set are equal, the data distribution is called symmetric. • An example of a symmetric data set is shown below.
Skewed Distributions • A distribution is skewed left if the mean is less than the median. • A distribution is skewed right if the mean is greater than the median. The Mode • The mode is the most commonly occurring value in a data set. • A data set may have: • No mode. • One mode. • Multiple modes. Example 5 • Find the mode(s) of the following set of test scores: 26, 32, 54, 62, 67, 70, 71, 71, 74, 76, 80, 81, 84, 87, 87, 87, 89, 93, 95, 96. • Solution: The value 87 occurs more times than any other score. The mode is 87. Example 5, cont’d The Weighted Mean • A weighted mean is calculated when different data points have different levels of importance, called weights. • If the numbers in a data set, \( x_1, x_2, \ldots, x_N \), have weights \( w_1, w_2, \ldots, w_N \), then the weighted mean is: \( \frac{w_1 x_1 + w_2 x_2 + \cdots + w_N x_N}{w_1 + w_2 + \cdots + w_N} \) Example 6 • Suppose your grades one semester are: • An A in a 5-credit course • A B in a 4-credit course • A C in two 3-credit courses • What is your GPA that semester? Example 6, cont’d • Solution: A grade of A is worth 4 points, a B 3 points, and a C 2 points. • The weights are the number of credits. • Your GPA is the weighted mean of your grades: \( \frac{4(5) + 3(4) + 2(3) + 2(3)}{5 + 4 + 3 + 3} = \frac{44}{15} \approx 2.93 \) Example 7 • Determine the per capita income for the group of nations listed in the table. Example 7, cont’d • Solution: The populations of the countries are the weights. • The per capita income of the entire group is the weighted mean, which is approximately 24.2 (in thousands of dollars). • The per capita income for the group of countries in 2002 was about $24,200. Measures of Variability • The measures of central tendency describe only part of the behavior of a data set. • Statistics that tell us how the data varies from its center are called measures of variability or measures of spread. • The measures of variability studied here are: • Range • Quartiles • Standard deviation The Range • The range of a data set is the difference between the largest data value and the smallest data value. Example 8 • Compute the mean and the range for each data set.
a) 3, 4, 5, 6, 7, 8 b) 0, 2, 5, 7, 8, 11 Example 8, cont’d • Solution: a) 3, 4, 5, 6, 7, 8 • The mean is 5.5. • The range is 8 – 3 = 5. b) 0, 2, 5, 7, 8, 11 • The mean is 5.5. • The range is 11 – 0 = 11. • The two data sets have the same mean, but the difference in ranges shows that the second data set is more spread out. Example 9 • Compute the range for each data set. a) 0, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 9 b) 0, 8, 9, 6, 1, 4, 6, 0, 1, 5, 3, 0, 9, 8, 0, 5, 6, 9, 5, 0 Example 9, cont’d • Solution: a) 0, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 9 • The range is 9 – 0 = 9. b) 0, 8, 9, 6, 1, 4, 6, 0, 1, 5, 3, 0, 9, 8, 0, 5, 6, 9, 5, 0 • The range is 9 – 0 = 9. Example 9, cont’d • Solution, cont’d: The two data sets have the same range, but the graphs show that one data set varies more than the other. Quartiles • Quartiles are measures of location that divide a data set approximately into fourths. • The quartiles are labeled as the • first quartile, q1 • second quartile, q2 • The second quartile is the same as the median. • third quartile, q3 Quartiles, cont’d • To find the quartiles, arrange the data values in order from smallest to largest. 1) Find the median. This is also the second quartile. 2) If the number of data points is even, go to Step 3. If the number of data points is odd, remove the median from the list before going to Step 3. Quartiles, cont’d 3) Divide the remaining data points into a lower half and an upper half. 4) The first quartile, q1, is the median of the lower half of the data. 5) The third quartile, q3, is the median of the upper half of the data. Quartiles, cont’d • The interquartile range, IQR, is the difference between the first and third quartiles. • IQR = q3 - q1 • The IQR is a measure of variability.
• About half of the data points lie between the first and third quartiles. Example 10 • Find the median, the first and third quartiles, and the interquartile range for the test scores: 26, 32, 54, 62, 67, 70, 71, 71, 74, 76, 80, 81, 84, 87, 87, 87, 89, 93, 95, 96. Example 10, cont’d • Solution: • The median is \( m = \frac{76 + 80}{2} = 78 \) • Since there is an even number of data points, we do not remove the median from the list. • The first quartile is the median of the lower half of the list: 26, 32, 54, 62, 67, 70, 71, 71, 74, 76. • The first quartile is \( q_1 = \frac{67 + 70}{2} = 68.5 \) Example 10, cont’d • Solution, cont’d: • The third quartile is the median of the upper half of the list: 80, 81, 84, 87, 87, 87, 89, 93, 95, 96. • The third quartile is \( q_3 = \frac{87 + 87}{2} = 87 \) • The IQR is 87 – 68.5 = 18.5. The Five-Number Summary • The five-number summary of a data set is a list of 5 informative numbers related to that set: • The smallest value, s • The first quartile, q1 • The median, m • The third quartile, q3 • The largest value, L • The numbers are always written in this order. Example 11 • Consider the set of test scores from the previous example: 26, 32, 54, 62, 67, 70, 71, 71, 74, 76, 80, 81, 84, 87, 87, 87, 89, 93, 95, 96. • The five-number summary for this data set is 26, 68.5, 78, 87, 96. Question: Find the five-number summary of the data set: 19, 27, 83, 94. a. 19, 27, 55, 83, 94 b. 19, 23, 55, 85.5, 94 c. 19, 23, 55, 88.5, 94 d. 19, 27, 55, 85.5, 94 Box-and-Whisker Plot • The box-and-whisker plot, also called a box plot, is a graphical representation of the five-number summary of a data set. • The box (rectangle) represents the IQR. • The location of the median is marked within the box. • The whiskers (lines) represent the lower and upper 25% of the data. Box-and-Whisker Plot, cont’d Example 12 • The list of test scores from the previous example had a five-number summary of 26, 68.5, 78, 87, 96. • The box-and-whisker plot for this data set is shown below. Example 13 • The monthly rainfall for 2 cities is shown below.
• Use box-and-whisker plots to compare the rainfall amounts. Example 13, cont’d • Solution: In St. Louis, MO, the rainfalls were: 2.21, 2.23, 2.31, 2.64, 2.96, 3.20, 3.26, 3.29, 3.74, 4.10, 4.12. • The median is 3.08. • The first quartile is 2.475. • The third quartile is 3.515. • The five-number summary for St. Louis is 2.21, 2.475, 3.08, 3.515, 4.12. Example 13, cont’d • Solution, cont’d: In Portland, OR, the rainfalls were: 0.46, 1.13, 1.47, 1.61, 2.08, 2.31, 3.05, 3.61, 3.93, 5.17, 6.14, 6.16. • The median is 2.68. • The first quartile is 1.54. • The third quartile is 4.55. • The five-number summary for Portland is 0.46, 1.54, 2.68, 4.55, 6.16. Example 13, cont’d • Solution, cont’d: The 2 box-and-whisker plots are shown above. • Note that the amount of rainfall in Portland, OR, varies much more from month to month than it does in St. Louis, MO. Standard Deviation • The standard deviation is a widely used measure of variability. • Calculating the standard deviation requires several intermediate steps, which will be illustrated using the data set of incomes shown below. Deviation From The Mean • The difference between a data point and the mean of the data set is called the deviation from the mean of that data point. Deviation From The Mean, cont’d • The mean income is $35,800. Variance • The variance is a type of average of the squared deviations from the mean. • Variance is calculated differently for data from a sample than for data from the entire population. • Sample variance, \( s^2 \): Divide the sum of all the squared deviations from the mean by n – 1. • Population variance, \( \sigma^2 \): Divide the sum of all the squared deviations from the mean by n. Sample Variance • The variance of the incomes is calculated by first squaring all the deviations. Sample Variance, cont’d • The squared deviations are added and then divided by n – 1 = 9 – 1 = 8. • \( s^2 = \frac{2{,}465{,}560{,}000}{8} = 308{,}195{,}000 \) Standard Deviation • Standard deviation is the square root of the variance.
• Taking the square root allows the standard deviation to have the same units as the original data values. • Because it is related to variance, the standard deviation formula also distinguishes between samples and the population. Standard Deviation, cont’d • Sample standard deviation is \( s = \sqrt{s^2} \) • Population standard deviation is \( \sigma = \sqrt{\sigma^2} \) • The standard deviation of the incomes is: \( s = \sqrt{s^2} = \sqrt{308{,}195{,}000} \approx \$17{,}555 \) Example 14 • Find the sample standard deviation of the weights (in pounds) in the 2 data sets. • Turkeys: 17, 18, 19, 20, 21 • Dogs: 13, 16, 19, 22, 25 Example 14, cont’d • Solution: • The sample mean for the turkeys is 19 pounds. • The sample mean for the dogs is also 19 pounds. • We note that although the means are the same, the standard deviations should reflect the amount of variability in the data values. Example 14, cont’d • Solution, cont’d: The deviations from the mean for the turkey weights are found. Example 14, cont’d • Solution, cont’d: • The sample variance of the turkey weights is \( s^2 = \frac{10}{5 - 1} = \frac{10}{4} = 2.5 \) square pounds. • The sample standard deviation of the turkey weights is \( s = \sqrt{s^2} = \sqrt{2.5} \approx 1.58 \) pounds. Example 14, cont’d • Solution, cont’d: The deviations from the mean for the dog weights are found. Example 14, cont’d • Solution, cont’d: • The sample variance of the dog weights is \( s^2 = \frac{90}{5 - 1} = \frac{90}{4} = 22.5 \) square pounds. • The sample standard deviation of the dog weights is \( s = \sqrt{s^2} = \sqrt{22.5} \approx 4.74 \) pounds. Example 14, cont’d • Solution, cont’d: The standard deviation of the sample of dog weights is larger than the standard deviation of the sample of turkey weights because there was a much wider spread among the dog weights. Question: Find the sample standard deviation of the data set: 19, 27, 83, 94. Round to the nearest hundredth. a. 4382.75 b. 38.22 c. 66.20 d. 1460.92 9.3 Initial Problem Solution • Which stockbroker should you choose if you want to minimize risk while maintaining a steady rate of growth?
• One stockbroker’s recommendations had percentage gains of 21%, -3%, 16%, 27%, 9%, 11%, 13%, 6%, and 17%. • The other’s recommendations had percentage gains of 11%, 13%, 16%, 8%, 5%, 14%, 15%, 17%, and 18%. Initial Problem Solution, cont’d • First you could calculate the mean rate of return for each stockbroker. • Both stockbrokers have a mean rate of return of 13%. • Since the average growth rates are the same, you can measure the variability to determine which stockbroker’s recommendations have the least variability. Initial Problem Solution, cont’d • First stockbroker: Initial Problem Solution, cont’d • Second stockbroker: Initial Problem Solution, cont’d • The standard deviation of the second portfolio is much smaller than the standard deviation of the first stock portfolio. • Since the growth rates were the same, the second stockbroker should be chosen in order to minimize risk.
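The comparison of the two stockbrokers can be reproduced with Python's standard `statistics` module. This is a sketch for checking the arithmetic, not the slides' own worked tables: it treats each list of gains as a sample, so `statistics.stdev` divides the sum of squared deviations by n – 1, matching the sample variance defined in this section. The variable names are illustrative.

```python
import statistics

# Percentage gains of each stockbroker's recommendations (from the problem).
first_broker = [21, -3, 16, 27, 9, 11, 13, 6, 17]
second_broker = [11, 13, 16, 8, 5, 14, 15, 17, 18]

for name, gains in [("First", first_broker), ("Second", second_broker)]:
    mean = statistics.mean(gains)
    # Sample variance: sum of squared deviations divided by n - 1.
    variance = statistics.variance(gains)
    # Sample standard deviation: square root of the sample variance.
    std_dev = statistics.stdev(gains)
    print(f"{name} stockbroker: mean = {mean}%, "
          f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}%")
```

Both means come out to 13%, while the sample standard deviations are roughly 8.73 and 4.30 percentage points, which agrees with the conclusion that the second stockbroker's recommendations vary less.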