Copyright © 2005 Pearson Education, Inc.
SEVENTH EDITION and EXPANDED SEVENTH EDITION

Chapter 13 Statistics

13.1 Sampling Techniques

Statistics
Statistics is the art and science of gathering, analyzing, and making inferences from numerical information (data) obtained in an experiment. Statistics is divided into two main branches.
Descriptive statistics is concerned with the collection, organization, and analysis of data.
Inferential statistics is concerned with making generalizations or predictions from the data collected.

Statisticians
A statistician's interest lies in drawing conclusions about possible outcomes through observations of only a few particular events.
The population consists of all items or people of interest.
The sample includes some of the items in the population.
When a statistician draws a conclusion from a sample, there is always the possibility that the conclusion is incorrect.

Types of Sampling
A random sample occurs if a sample is drawn in such a way that each time an item is selected, each item has an equal chance of being drawn.
When a sample is obtained by drawing every nth item on a list or production line, the sample is a systematic sample.
A cluster sample is also referred to as an area sample because it is applied on a geographical basis.
Stratified sampling involves dividing the population by characteristics such as gender, race, religion, or income.
Convenience sampling uses data that are easily obtained and can be extremely biased.

Example: Identifying Sampling Techniques
a) A raffle ticket is drawn by a blindfolded person at a festival to win a grand prize.
b) Students at an elementary school are classified according to their present grade level. Then a random sample of three students from each grade is chosen to represent their class.
c) Every sixth car on a highway is stopped for a vehicle inspection.
d) Voters are classified based on their polling location. A random sample of four polling locations is selected. All the voters from the selected precincts are included in the sample.
e) The first 20 people entering a water park are asked if they are wearing sunscreen.

Solution:
a) Random
b) Stratified
c) Systematic
d) Cluster
e) Convenience

13.2 The Misuses of Statistics

Misuses of Statistics
Many individuals, businesses, and advertising firms misuse statistics to their own advantage. When examining statistical information, consider the following:
Was the sample used to gather the statistical data unbiased and of sufficient size?
Is the statistical statement ambiguous; that is, could it be interpreted in more than one way?

Example: Misleading Statistics
An advertisement says, "Fly Speedway Airlines and Save 20%." There is not enough information given here. The "Save 20%" could be off the original ticket price, off the ticket price when you buy two tickets, or off another airline's ticket price.
A help wanted ad reads, "Salesperson wanted for Ryan's Furniture Store. Average Salary: $32,000." The word "average" can be very misleading. If most of the salespeople earn $20,000 to $25,000 and the owner earns $76,000, this "average salary" is not a fair representation.

Charts and Graphs
Charts and graphs can also be misleading. Even though the data are displayed correctly, adjusting the vertical scale of a graph can give a different impression.
A circle graph can be misleading if the parts of the graph do not add up to 100%.

Example: Misleading Graphs
While each graph presents identical information, the vertical scales have been altered.
[Figure: two graphs titled "Sales", each with Years (99 through 04) on the horizontal axis and Dollars (in thousands) on the vertical axis. One vertical scale runs from 0 to 175 in steps of 25; the other runs from 100 to 500 in steps of 100.]

13.3 Frequency Distributions

Example
The number of pets per family is recorded for 30 families surveyed. Construct a frequency distribution of the following data:
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 4 4

Solution
Number of Pets    Frequency
0                 6
1                 10
2                 8
3                 4
4                 2

Rules for Data Grouped by Classes
The classes should be of the same "width."
The classes should not overlap.
Each piece of data should belong to only one class.

Definitions
Classes: 0-4, 5-9, 10-14, 15-19, 20-24, 25-29
Lower class limits: 0, 5, 10, 15, 20, 25
Upper class limits: 4, 9, 14, 19, 24, 29
The midpoint of a class is found by adding the lower and upper class limits and dividing the sum by 2. For example, the midpoint of the class 0-4 is (0 + 4)/2 = 2.

Example
The following set of data represents the distance, in miles, that 15 randomly selected second grade students live from school.
6.8 5.3 9.7 3.8 8.7 0.5 5.9 0.8 5.7 1.3 4.8 9.6 1.5 7.4 0.2
Construct a frequency distribution with the first class 0-2.

Solution
First, rearrange the data from lowest to highest.
0.2 0.5 0.8 1.3 1.5 3.8 4.8 5.3 5.7 5.9 6.8 7.4 8.7 9.6 9.7

# of miles from school    Frequency
0 - 2                     5
2.1 - 4.1                 1
4.2 - 6.2                 4
6.3 - 8.3                 2
8.4 - 10.4                3
Total                     15

13.4 Statistical Graphs

Circle Graphs
Circle graphs (also known as pie charts) are often used to compare parts of one or more components of the whole to the whole.

Example
According to a recent hospital survey of 200 patients, the following table indicates how often hospitals used four different kinds of painkillers. Use the information to construct a circle graph illustrating the percent each painkiller was used.

Painkiller       Number of Patients
Aspirin          56
Ibuprofen        104
Acetaminophen    16
Other            24
Total            200

Solution
Determine the measure of the corresponding central angle.

Painkiller       Number of Patients    Percent of Total          Measure of Central Angle
Aspirin          56                    56/200 x 100 = 28%        0.28 x 360 = 100.8 degrees
Ibuprofen        104                   104/200 x 100 = 52%       0.52 x 360 = 187.2 degrees
Acetaminophen    16                    16/200 x 100 = 8%         0.08 x 360 = 28.8 degrees
Other            24                    24/200 x 100 = 12%        0.12 x 360 = 43.2 degrees
Total            200                   100%                      360 degrees

Solution continued
Use a protractor to construct a circle graph and label it properly.
[Figure: circle graph titled "Hospital Painkiller Use" with sectors Ibuprofen 52%, Aspirin 28%, Other 12%, Acetaminophen 8%.]

Histogram
A histogram is a graph with observed values on its horizontal scale and frequencies on its vertical scale.
Example: Construct a histogram of the frequency distribution.

# of pets    Frequency
0            6
1            10
2            8
3            4
4            2

Solution
[Figure: histogram titled "Number of Pets per Family" with Number of Pets (0 through 4) on the horizontal axis and Frequency (0 to 12) on the vertical axis.]

Frequency Polygon
[Figure: frequency polygon titled "Number of Pets per Family" with Number of Pets (0 through 4) on the horizontal axis and Frequency (0 to 12) on the vertical axis.]
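The frequency distribution and the central-angle table from Sections 13.3 and 13.4 can be reproduced with a short Python sketch. This is only an illustration, not part of the original slides: the function names `frequency_distribution` and `central_angles` are hypothetical, and the data are copied from the pets and painkiller examples.

```python
from collections import Counter

# Pets-per-family data from the Section 13.3 example (30 families).
pets = [0] * 6 + [1] * 10 + [2] * 8 + [3] * 4 + [4] * 2

# Painkiller counts from the Section 13.4 example (200 patients).
painkillers = {"Aspirin": 56, "Ibuprofen": 104, "Acetaminophen": 16, "Other": 24}


def frequency_distribution(data):
    """Return {value: frequency}, ordered by value (hypothetical helper)."""
    counts = Counter(data)
    return {value: counts[value] for value in sorted(counts)}


def central_angles(counts):
    """Map each category to (percent of total, central angle in degrees)."""
    total = sum(counts.values())
    return {
        name: (100 * n / total, 360 * n / total)
        for name, n in counts.items()
    }


pet_freq = frequency_distribution(pets)
angles = central_angles(painkillers)

for value, freq in pet_freq.items():
    print(value, freq)

for name, (percent, angle) in angles.items():
    print(f"{name}: {percent:.0f}% of total, central angle {angle:.1f} degrees")
```

The central angle is simply the category's share of the total multiplied by 360, so the angles always sum to 360 degrees when every observation falls in exactly one category.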
Stem-and-Leaf Display
A stem-and-leaf display is a tool that organizes and groups the data while allowing us to see the actual values that make up the data.
The left group of digits is called the stem.
The right group of digits is called the leaf.

Example
The table below indicates the number of miles 20 workers have to drive to work. Construct a stem-and-leaf display.
12 18 3 8 12 25 21 3 15 4 17 27 43 21 16 12 26 35 14 9

Solution
Stem    Leaves
0       3 3 4 8 9
1       2 2 2 4 5 6 7 8
2       1 1 5 6 7
3       5
4       3

13.5 Measures of Central Tendency

Definitions
An average is a number that is representative of a group of data.
The arithmetic mean, or simply the mean, is symbolized by \(\bar{x}\) or by the Greek letter mu, \(\mu\).

Mean
The mean, \(\bar{x}\), is the sum of the data divided by the number of pieces of data. The formula for calculating the mean is
\[ \bar{x} = \frac{\sum x}{n} \]
where \(\sum x\) represents the sum of all the data and \(n\) represents the number of pieces of data.

Example: Find the mean
Find the mean amount of money parents spent on new school supplies and clothes if 5 parents randomly surveyed replied as follows: $327 $465 $672 $150 $230
\[ \bar{x} = \frac{\sum x}{n} = \frac{\$327 + \$465 + \$672 + \$150 + \$230}{5} = \frac{\$1844}{5} = \$368.80 \]

Median
The median is the value in the middle of a set of ranked data.
Example: Determine the median of $327 $465 $672 $150 $230.
Rank the data from smallest to largest: $150 $230 $327 $465 $672
The middle value, $327, is the median.

Example: Median (even data)
Determine the median of the following set of data: 8, 15, 9, 3, 4, 7, 11, 12, 6, 4.
Rank the data: 3 4 4 6 7 8 9 11 12 15
There are 10 pieces of data, so the median lies halfway between the two middle pieces, 7 and 8. The median is (7 + 8)/2 = 7.5.

Mode
The mode is the piece of data that occurs most frequently.
Example: Determine the mode of the data set: 3, 4, 4, 6, 7, 8, 9, 11, 12, 15.
The mode is 4 since it occurs twice and the other values occur only once.

Midrange
The midrange is the value halfway between the lowest (L) and highest (H) values in a set of data.
\[ \text{Midrange} = \frac{\text{lowest value} + \text{highest value}}{2} \]
Example: Find the midrange of the data set $327, $465, $672, $150, $230.
\[ \text{Midrange} = \frac{\$150 + \$672}{2} = \$411 \]

Example
The weights of eight Labrador retrievers, rounded to the nearest pound, are 85, 92, 88, 75, 94, 88, 84, and 101. Determine the
a) mean
b) median
c) mode
d) midrange
e) rank the measures of central tendency from lowest to highest.

Example: dog weights 85, 92, 88, 75, 94, 88, 84, 101
Mean:
\[ \bar{x} = \frac{85 + 92 + 88 + 75 + 94 + 88 + 84 + 101}{8} = \frac{707}{8} = 88.375 \]
Median: rank the data: 75, 84, 85, 88, 88, 92, 94, 101. The two middle values are both 88, so the median is 88.
Mode: the number that occurs most frequently. The mode is 88.
Midrange = (L + H)/2 = (75 + 101)/2 = 88
Rank the measures: the median, mode, and midrange are all 88, and the mean is 88.375, so from lowest to highest the measures are 88, 88, 88, 88.375.

Measures of Position
Measures of position are often used to make comparisons.
Two measures of position are percentiles and quartiles.

To Find the Quartiles of a Set of Data
Order the data from smallest to largest.
Find the median, or 2nd quartile, of the set of data. If there is an odd number of pieces of data, the median is the middle value.
If there is an even number of pieces of data, the median is halfway between the two middle pieces of data.
The first quartile, Q1, is the median of the lower half of the data; that is, Q1 is the median of the data less than Q2.
The third quartile, Q3, is the median of the upper half of the data; that is, Q3 is the median of the data greater than Q2.

Example: Quartiles
The weekly grocery bills for 23 families are as follows. Determine Q1, Q2, and Q3.
170 330 225 75 95 210 80 225 160 172 270 170 215 130 190 270 240 310 74 280 270 50 81

Example: Quartiles continued
Order the data:
50 74 75 80 81 95 130 160 170 170 172 190 210 215 225 225 240 270 270 270 280 310 330
Q2 is the median of the entire data set, which is 190.
Q1 is the median of the numbers from 50 to 172, which is 95.
Q3 is the median of the numbers from 210 to 330, which is 270.

13.6 Measures of Dispersion

Measures of Dispersion
Measures of dispersion are used to indicate the spread of the data.
The range is the difference between the highest and lowest values; it indicates the total spread of the data.

Example: Range
Nine different employees were selected and the amount of their salary was recorded. Find the range of the salaries.
$24,000 $32,000 $26,500 $56,000 $48,000 $27,000 $28,500 $34,500 $56,750
Range = $56,750 - $24,000 = $32,750

Standard Deviation
The standard deviation measures how much the data differ from the mean.
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]

To Find the Standard Deviation of a Set of Data
1. Find the mean of the set of data.
2. Make a chart having three columns: Data, Data - Mean, and (Data - Mean)^2.
3. List the data vertically under the column marked Data.
4. Subtract the mean from each piece of data and place the difference in the Data - Mean column.
5. Square the values obtained in the Data - Mean column and record these values in the (Data - Mean)^2 column.
6. Determine the sum of the values in the (Data - Mean)^2 column.
7. Divide the sum obtained in step 6 by n - 1, where n is the number of pieces of data.
8. Determine the square root of the number obtained in step 7. This number is the standard deviation of the set of data.

Example
Find the standard deviation of the following prices of selected washing machines: $280, $217, $665, $684, $939, $299
Find the mean.
\[ \bar{x} = \frac{\sum x}{n} = \frac{665 + 217 + 684 + 280 + 939 + 299}{6} = \frac{3084}{6} = 514 \]

Example continued, mean = 514

Data     Data - Mean    (Data - Mean)^2
217      -297           (-297)^2 = 88,209
280      -234           54,756
299      -215           46,225
665      151            22,801
684      170            28,900
939      425            180,625
Sum      0              421,516

\[ s = \sqrt{\frac{421{,}516}{6 - 1}} = \sqrt{\frac{421{,}516}{5}} \approx 290.35 \]
The standard deviation is $290.35.

13.7 The Normal Curve

Types of Distributions
[Figures: sketches of distribution shapes labeled rectangular, J-shaped, bimodal, skewed to right, skewed to left, and normal. The first sketch plots Frequency against Values.]

Normal Distribution
In a normal distribution, the mean, median, and mode all have the same value.
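As a check on Section 13.6, the eight-step standard deviation procedure can be sketched in Python. This is a minimal illustration rather than anything from the original slides: the function name `sample_standard_deviation` is hypothetical, and the data are the washing machine prices from the example. The standard library's `statistics.stdev` also divides by n - 1, so it is used here only as a cross-check.

```python
import math
import statistics


def sample_standard_deviation(data):
    """Follow the chart procedure: deviations, squares, sum, divide by n - 1, root."""
    n = len(data)
    mean = sum(data) / n                                  # step 1
    deviations = [x - mean for x in data]                 # steps 2-4
    squared_deviations = [d ** 2 for d in deviations]     # step 5
    total = sum(squared_deviations)                       # step 6
    return math.sqrt(total / (n - 1))                     # steps 7-8


# Washing machine prices from the Section 13.6 example.
prices = [280, 217, 665, 684, 939, 299]

s = sample_standard_deviation(prices)
print(f"standard deviation: {s:.2f}")
print(f"statistics.stdev:   {statistics.stdev(prices):.2f}")
```

Dividing by n - 1 rather than n matches the formula for s given in Section 13.6; `statistics.pstdev` would divide by n instead and give a smaller value.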
Z-scores determine how far, in terms of standard deviations, a given score is from the mean of the distribution.
\[ z = \frac{\text{value of piece of data} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma} \]
Here \(x\) is the value of the piece of data, \(\mu\) is the mean, and \(\sigma\) is the standard deviation.

Example: z-scores
A normal distribution has a mean of 50 and a standard deviation of 5. Find z-scores for the following values.
a) 55
b) 60
c) 43

a)
\[ z_{55} = \frac{55 - 50}{5} = \frac{5}{5} = 1 \]
A score of 55 is one standard deviation above the mean.
b)
\[ z_{60} = \frac{60 - 50}{5} = \frac{10}{5} = 2 \]
A score of 60 is 2 standard deviations above the mean.
c)
\[ z_{43} = \frac{43 - 50}{5} = \frac{-7}{5} = -1.4 \]
A score of 43 is 1.4 standard deviations below the mean.

To Find the Percent of Data Between any Two Values
1. Draw a diagram of the normal curve, indicating the area or percent to be determined.
2. Use the formula to convert the given values to z-scores. Indicate these z-scores on the diagram.
3. Look up the percent that corresponds to each z-score in Table 13.
4. a) When finding the percent of data between two z-scores on opposite sides of the mean (when one z-score is positive and the other is negative), find the sum of the individual percents.
b) When finding the percent of data between two z-scores on the same side of the mean (when both z-scores are positive or both are negative), subtract the smaller percent from the larger percent.
c) When finding the percent of data to the right of a positive z-score or to the left of a negative z-score, subtract the percent of data between 0 and z from 50%.
d) When finding the percent of data to the left of a positive z-score or to the right of a negative z-score, add the percent of data between 0 and z to 50%.

Example
Assume that the waiting times for customers at a popular restaurant before being seated for lunch are normally distributed with a mean of 12 minutes and a standard deviation of 3 minutes.
a) Find the percent of customers who wait for at least 12 minutes before being seated.
b) Find the percent of customers who wait between 9 and 18 minutes before being seated.
c) Find the percent of customers who wait at least 17 minutes before being seated.
d) Find the percent of customers who wait less than 8 minutes before being seated.

Solution
a) Wait for at least 12 minutes
Since 12 minutes is the mean, half, or 50%, of customers wait at least 12 minutes before being seated.
b) Between 9 and 18 minutes
\[ z_{9} = \frac{9 - 12}{3} = -1.00 \qquad z_{18} = \frac{18 - 12}{3} = 2.00 \]
Use Table 13.7, page 801. The two z-scores are on opposite sides of the mean, so add the percents: 34.1% + 47.7% = 81.8%

Solution continued
c) At least 17 minutes
\[ z_{17} = \frac{17 - 12}{3} = 1.67 \]
Use Table 13.7, page 801. 45.3% is between the mean and 1.67.
50% - 45.3% = 4.7%
Thus, 4.7% of customers wait at least 17 minutes.
d) Less than 8 minutes
\[ z_{8} = \frac{8 - 12}{3} = -1.33 \]
Use Table 13.7, page 801. 40.8% is between the mean and -1.33.
50% - 40.8% = 9.2%
Thus, 9.2% of customers wait less than 8 minutes.

13.8 Linear Correlation and Regression

Linear Correlation
Linear correlation is used to determine whether there is a relationship between two quantities and, if so, how strong the relationship is.
The linear correlation coefficient, r, is a unitless measure that describes the strength of the linear relationship between two variables.
If the value is positive, as one variable increases, the other increases.
If the value is negative, as one variable increases, the other decreases.
The value of r will always be between -1 and 1 inclusive.

Scatter Diagrams
A visual aid used with correlation is the scatter diagram, a plot of points (bivariate data).
The independent variable, x, generally is a quantity that can be controlled.
The dependent variable, y, is the other variable.
The value of r is a measure of how far a set of points varies from a straight line.
The greater the spread, the weaker the correlation and the closer the r value is to 0.

Correlation
[Figures: two slides titled "Correlation".]

Linear Correlation Coefficient
The formula to calculate the correlation coefficient (r) is as follows:
\[ r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\left(\sum x^2\right) - \left(\sum x\right)^2}\,\sqrt{n\left(\sum y^2\right) - \left(\sum y\right)^2}} \]

Example: Words Per Minute versus Mistakes
There are five applicants applying for a job as a medical transcriptionist. The following shows the results of the applicants when asked to type a chart. Determine the correlation coefficient between the words per minute typed and the number of mistakes.

Applicant    Words per Minute    Mistakes
Ellen        24                  8
George       67                  11
Phillip      53                  12
Kendra       41                  10
Nancy        34                  9

Solution
We will call the words typed per minute x and the mistakes y.
List the values of x and y and calculate the necessary sums.

WPM (x)    Mistakes (y)    x^2       y^2    xy
24         8               576       64     192
67         11              4489      121    737
53         12              2809      144    636
41         10              1681      100    410
34         9               1156      81     306

\[ \sum x = 219 \qquad \sum y = 50 \qquad \sum x^2 = 10{,}711 \qquad \sum y^2 = 510 \qquad \sum xy = 2{,}281 \]

Solution continued
The n in the formula represents the number of pieces of data. Here n = 5.
\[ r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\left(\sum x^2\right) - \left(\sum x\right)^2}\,\sqrt{n\left(\sum y^2\right) - \left(\sum y\right)^2}} \]
\[ r = \frac{5(2281) - (219)(50)}{\sqrt{5(10{,}711) - (219)^2}\,\sqrt{5(510) - (50)^2}} \]
\[ r = \frac{11{,}405 - 10{,}950}{\sqrt{5(10{,}711) - 47{,}961}\,\sqrt{5(510) - 2500}} = \frac{455}{\sqrt{53{,}555 - 47{,}961}\,\sqrt{2550 - 2500}} \]
\[ r = \frac{455}{\sqrt{5594}\,\sqrt{50}} \approx 0.86 \]

Solution continued
Since 0.86 is fairly close to 1, there is a fairly strong positive correlation.
This result suggests that the more words typed per minute, the more mistakes made.

Linear Regression
Linear regression is the process of determining the linear relationship between two variables.
The line of best fit (line of regression or the least squares line) is the line such that the sum of the squares of the vertical distances from the line to the data points is a minimum.

The Line of Best Fit
Equation:
\[ y = mx + b, \quad \text{where} \quad m = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} \quad \text{and} \quad b = \frac{\sum y - m\left(\sum x\right)}{n} \]

Example
Use the data in the previous example to find the equation of the line that relates the number of words per minute and the number of mistakes made while typing a chart.
Graph the equation of the line of best fit on a scatter diagram that illustrates the set of bivariate points.

Solution
From the previous results, we know that
\[ m = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{5(2{,}281) - (219)(50)}{5(10{,}711) - 219^2} = \frac{455}{5594} \approx 0.081 \]
Now we find the y-intercept, b.
\[ b = \frac{\sum y - m\left(\sum x\right)}{n} = \frac{50 - 0.081(219)}{5} = \frac{32.261}{5} \approx 6.452 \]
Therefore the line of best fit is y = 0.081x + 6.452.

Solution continued
To graph y = 0.081x + 6.452, plot at least two points and draw the graph.

x     y
10    7.262
20    8.072
30    8.882

Solution continued
[Figure: scatter diagram of the data with the line of best fit.]
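The correlation coefficient and the line of best fit from Section 13.8 share the same sums, so both can be computed in one pass. The Python sketch below is an illustration under stated assumptions, not part of the original slides: the function name `linear_fit` is hypothetical, and the data are the five applicants' words per minute and mistakes.

```python
import math

# Words per minute (x) and mistakes (y) for the five applicants.
wpm = [24, 67, 53, 41, 34]
mistakes = [8, 11, 12, 10, 9]


def linear_fit(xs, ys):
    """Return (r, m, b) using the summation formulas from Section 13.8."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y   # shared by r and m
    x_term = n * sum_x2 - sum_x ** 2
    y_term = n * sum_y2 - sum_y ** 2

    r = numerator / (math.sqrt(x_term) * math.sqrt(y_term))
    m = numerator / x_term
    b = (sum_y - m * sum_x) / n
    return r, m, b


r, m, b = linear_fit(wpm, mistakes)
print(f"r = {r:.2f}, m = {m:.3f}, b = {b:.3f}")
```

Because this sketch keeps full precision for m when computing b, its intercept comes out near 6.437 rather than the 6.452 shown above, which uses m rounded to 0.081. The difference is a rounding effect, not a different method.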