Chapter 10 Describing Data Distributions
McGraw-Hill/Irwin © 2004 by The McGraw-Hill Companies, Inc. All rights reserved.

Categorical Data Print Output
Frequency Table: Occupational Status

Category          Code  Freq.   Pct.   Adj.    Cum.
Professional         1     37   13.8   14.1    14.1
Mgr., Executive      2     62   23.1   23.6    37.6
Admin., Clerical     3     69   25.7   26.2    63.9
Engr., Technical     4     16    6.0    6.1    70.0
Sales, Marketing     5     30   11.2   11.4    81.4
Craft, Trade         6     22    8.2    8.4    89.7
Semi-Skilled         7     27   10.1   10.3   100.0
Missing Data         0      5    1.9   Missing
Total                     268  100.0  100.0

Modal and median category: Admin., Clerical (code 3)

Frequency and Percentage Distributions
Report Format

Age        Number  Percent
Over 50        94     22.7
36 to 50      188     45.4
18 to 35      132     31.9

Bar Chart With Frequency Labels
[Figure: bar chart of Number (axis 0 to 200) by age group. Over 50: 94; 36 to 50: 188; 18 to 35: 132.]

Vertical Bar Chart With Percentage Labels
[Figure: vertical bar chart of percentage (axis 0% to 60%) by age group. Over 50: 22.7%; 36 to 50: 45.4%; 18 to 35: 31.9%.]

Pie Chart With Percentage Labels
[Figure: pie chart by age group. 18 to 35: 31.9%; Over 50: 22.7%; 36 to 50: 45.4%.]

Descriptive Statistical Tools
• Nominal scale
  • Average: Mode
• Ordinal scale
  • Average: Mode, Median
  • Spread: Interquartile Range, Data Range, Minimum, Maximum
• Interval & Ratio scales
  • Average: Mean, Mode, Median
  • Spread: Standard Deviation, Interquartile Range, Data Range, Maximum & Minimum
  • Shape: Skewness, Kurtosis

Choosing an Average
• Mean
  • The sum of the values divided by the number of values
  • Inappropriate for highly skewed distributions
  • Overly sensitive to extreme values
• Median
  • Middle value when the data are arrayed from low to high
  • Unaffected by asymmetry or extreme values
• Mode
  • Peak of a continuous distribution
  • Category with the highest frequency
  • Only legitimate average for nominal data

Measures of Central Tendency
[Figure: distribution curve with the mode, median, and mean marked.]

Spread and Standard Deviation
• Standard Deviation
  • Root mean squared deviation from the mean
  • Special properties that make it very useful
• Normal Distributions
  • About 68% of data are within ± 1 S.D. of the mean
  • About 95% of data are within ± 2 S.D. of the mean
  • More than 99% of data are within ± 3 S.D. of the mean

Spread and Standard Deviation
[Figure: normal curve centered on the mean, with nested bands marking ± 1 S.D. (68%), ± 2 S.D. (95%), and ± 3 S.D. (99%).]

Zero Skewness Indicates Symmetry
[Figure: symmetrical curve with the mean, median, and mode marked.]

Positive Skewness Leans Left
[Figure: curve with the mode, median, and mean marked.]

Negative Skewness Leans Right
[Figure: curve with the mean, median, and mode marked.]

Zero Kurtosis Indicates Normality
[Figure: curve with the mean, median, and mode marked.]

Negative Kurtosis: A Low Peak and High Tails
[Figure: curve with the mean, median, and mode marked.]

Positive Kurtosis: A High Peak and Low Tails
[Figure: curve with the mean, median, and mode marked.]

Mean, Median, and Mode
Frequency and Percentage Table

Code   Freq.   Pct.   Adj.    Cum.
1          5    5.0    5.0     5.0
2         10   10.0   10.0    15.0
3         15   15.0   15.0    30.0
4         40   40.0   40.0    70.0
5         30   30.0   30.0   100.0
Total    100  100.0  100.0

Statistics

Mean     3.800    Skewness   -0.887
Median   4.000    Kurtosis    0.092
Mode     4.000    Std. dev.   1.128
Number     100    Std. err.   0.113

[Figure: bar chart of percentage (axis 0% to 50%) by code, labeled 5: 26%; 4: 42%; 3: 16%; 2: 11%; 1: 5%.]
[Figure: line plot of frequency (axis 0 to 50) by code (1 to 5).]

Averages and Outliers
• This bar chart appears at a glance to show a symmetrical distribution. In fact, there is radical asymmetry resulting from 5 outliers with values of 50.

Mean 5.66, Median 4, Mode 4
[Figure: bar chart of frequency (axis 0 to 30) for the categories One, Two, Three, Four, Five, Six, and Fifty.]

Outliers Correctly Shown
• This clearer representation of the distribution makes the radical asymmetry obvious.
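The statistics reported on the "Mean, Median, and Mode" slide can be reproduced from its frequency table. The following is a minimal sketch in Python, not the software that produced the slide. It assumes the sample (n - 1) standard deviation and the sample-size-adjusted skewness coefficient used by many statistical packages; those formula choices are assumptions, chosen because they match the slide's reported values. The function name `describe` is hypothetical.

```python
import math

# Frequency table from the "Mean, Median, and Mode" slide: code -> frequency.
freq = {1: 5, 2: 10, 3: 15, 4: 40, 5: 30}


def describe(freq):
    """Return descriptive statistics for a {value: frequency} table."""
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n

    # Mode: the value with the highest frequency.
    mode = max(freq, key=freq.get)

    # Median: middle value when the data are arrayed from low to high.
    values = [x for x in sorted(freq) for _ in range(freq[x])]
    mid = n // 2
    median = values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2

    # Sum of squared deviations from the mean.
    ss = sum(f * (x - mean) ** 2 for x, f in freq.items())
    std_dev = math.sqrt(ss / (n - 1))   # assumption: sample (n - 1) form
    std_err = std_dev / math.sqrt(n)    # standard error of the mean

    # Assumption: adjusted skewness, g1 * sqrt(n(n-1)) / (n-2).
    m2 = ss / n
    m3 = sum(f * (x - mean) ** 3 for x, f in freq.items()) / n
    g1 = m3 / m2 ** 1.5
    skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)

    return {
        "mean": mean,
        "median": median,
        "mode": mode,
        "std_dev": std_dev,
        "std_err": std_err,
        "skewness": skewness,
    }


stats = describe(freq)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the output rounds to the slide's figures: mean 3.800, median 4.000, mode 4.000, std. dev. 1.128, std. err. 0.113, and skewness -0.887. The negative skewness reflects the longer tail toward the low codes.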
[Figure: frequency (axis 0 to 30) plotted against value (axis 1 to 51).]

Standard Normal Distribution
Share of data in six bands, left to right, with the mean at the center:
2.5%   13.5%   34%   |   34%   13.5%   2.5%
Normal Amount of Data to the Left and Right of the Mean

Positively Skewed Distribution
Share of data in six bands, left to right, with the mean at the center:
0.0%   9.5%   47%   |   33%   7.5%   0.3%
More Data to the Left than to the Right of the Mean

Distribution with Positive Kurtosis
Share of data in six bands, left to right, with the mean at the center:
1.5%   8.0%   40.5%   |   40.5%   8.0%   1.5%
More Toward the Center than in the Tails of the Distribution

Distribution with Negative Kurtosis
Share of data in six bands, left to right, with the mean at the center:
4%   17%   29%   |   29%   17%   4%
More in the Tails than Toward the Center of the Distribution

Statistical Inference and Confidence Intervals
• Objective
  • To make inferences about the population based on the sample
• Sample Statistics
  • Used as estimates of the population parameters
• Estimates Are Imperfect
  • The probability of error can be determined
• Confidence Interval
  • The range around the sample mean within which the parameter is likely to lie at a given probability

Statistical Inference and Confidence Intervals
• Sampling Distribution of Means
  • The distribution that would result if samples of a given size were taken again and again and the mean of each sample were plotted
• Standard Error of the Estimate
  • The standard deviation of the theoretical sampling distribution of means
• Confidence Interval Probabilities
  • About 68% chance the parameter is within ± 1 S.E.
  • About 95% chance the parameter is within ± 2 S.E.
  • More than 99% chance the parameter is within ± 3 S.E.

Confidence Interval Diagram
• Mean = 50
• Standard Error = 5
[Figure: sampling distribution on an axis from 20 to 80, centered on 50, with the 95% C.I. and 99% C.I. marked.]

End of Chapter 10
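The confidence interval diagram (mean = 50, standard error = 5) can be worked through numerically. The sketch below is illustrative only: the function names `standard_error` and `confidence_interval` are hypothetical, and the multipliers 2 and 3 are the slides' rounded rule of thumb. Under a normal sampling distribution, the more exact multipliers are about 1.96 for 95% and 2.576 for 99%.

```python
import math


def standard_error(std_dev, n):
    """Standard error of the mean: sample S.D. divided by the square root of n."""
    return std_dev / math.sqrt(n)


def confidence_interval(mean, std_err, multiplier):
    """Return (lower, upper) bounds: mean plus or minus multiplier * S.E."""
    margin = multiplier * std_err
    return (mean - margin, mean + margin)


# Values from the "Confidence Interval Diagram" slide.
mean, std_err = 50, 5

# Rule-of-thumb multipliers from the slides (assumption: 2 for 95%, 3 for 99%).
ci_95 = confidence_interval(mean, std_err, 2)
ci_99 = confidence_interval(mean, std_err, 3)

print("95% C.I.:", ci_95)  # (40, 60)
print("99% C.I.:", ci_99)  # (35, 65)

# Standard error for the earlier frequency table (S.D. = 1.128, n = 100).
print(f"S.E. = {standard_error(1.128, 100):.3f}")  # S.E. = 0.113
```

With the rule-of-thumb multipliers, the 95% interval runs from 40 to 60 and the 99% interval from 35 to 65, both inside the diagram's 20 to 80 axis. Swapping in 1.96 or 2.576 narrows the intervals slightly.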