Lesson Objectives
• Learn when each measure of a “typical value” is appropriate. Also called “central tendency” or “location.”
• Learn when each measure of “variation” is appropriate. Also called “scatter” or “dispersion.”
• See how these measures relate to statistical inference, which will be covered later in the course.

Department of ISM, University of Alabama, 1995-2003. M07-Numerical Summaries 1

Statistics is the science of
• collecting
• organizing
• summarizing
• interpreting
DATA for making decisions.

Organize / Summarize Data
• Graphical
• Numerical

Key Features of Data Distributions
• Shape
• Typical Value
• Spread
• Outliers
This section covers two of these: Typical Value and Spread.

Measures of Location
Give “middle” or “typical” values or “central tendency.”

Measures of Variation
Describe “spread” or “scatter” or “dispersion” in the data.

Measures of Location
1. Mean: the “center of gravity” of the data (histogram).

Formula for mean
Sample mean = sum of observations divided by sample size:
$\bar{X} = \frac{\sum X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$

The mean is ________________ to extreme values (outliers).

2. Median: midpoint of the distribution
At least half of the observations are at or less than the median, and at least half are at or greater than the median.

Note: For n observations, the median is located at the $\frac{n+1}{2}$-th observation in the ordered sample.
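The mean formula and the (n+1)/2 location rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the helper names `median_location` and `median_by_location` are hypothetical, and the data are the seven values from Example 1.

```python
from statistics import median

def median_location(n: int) -> float:
    """1-based position of the median in an ordered sample of size n."""
    return (n + 1) / 2

def median_by_location(values):
    """Find the median with the (n+1)/2 rule from the note above."""
    ordered = sorted(values)
    loc = median_location(len(ordered))
    lower = ordered[int(loc) - 1]      # value at position floor(loc)
    if loc == int(loc):                # odd n: a single middle value
        return lower
    upper = ordered[int(loc)]          # even n: average the two middle values
    return (lower + upper) / 2

data = [14, 18, 20, 12, 24, 15, 14]    # Example 1 data, n = 7

sample_mean = sum(data) / len(data)    # sum of observations / sample size
sample_median = median_by_location(data)

print("location of median:", median_location(len(data)))
print("mean:", round(sample_mean, 3))
print("median:", sample_median)
print("agrees with statistics.median:", sample_median == median(data))
```

For these seven values the location is the 4th ordered observation, so the median is 15.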
Example 1
Data: 14, 18, 20, 12, 24, 15, 14 (n = 7, “odd”)
Median is the middle value of the “ordered” data. At least half the values are at or greater; at least half are at or lower.
Location of median: $\frac{7+1}{2} = 4$, the 4th ordered value.

Example 2 (median example)
Data: 14, 18, 20, 12, 94 (outlier, replacing 24), 15, 14 (n = 7, “odd”)
Median is still the middle value. Median is resistant to outliers.
Median is ____
Original: $\bar{X}$ = ____; with outlier: $\bar{X}$ = ____

Example 3
Data: 14, 18, 20, 12, 24, 15, 14, 214 (n = 8, “even,” outlier)
Median is the average of the two middle values. Exactly half the values are greater, half lower.
Location of median: $\frac{8+1}{2} = 4.5$, halfway between the 4th and 5th ordered values.

Summary for finding Median
1. Order the data.
2. For odd n, the median is the center observation.
3. For even n, the median is the average of the two center observations.

3. Mode: most frequently occurring number
In a histogram, the modal class is the one having the largest frequency, i.e., the highest bar.

When should each estimator be used?
What type of variable is it?
• If categorical, use the mode. “Average” is meaningless; look at “percentages” of occurrences.
• If the variable is quantitative, first look at a graph:
  Skewed or outliers? Use median.
  More or less symmetric? Use mean.

Numerical Summary
Location: Mean, Median, Mode
Variation: Range, Std. Deviation, IQR

Why does variation matter?
Mountain Climbing Rope.
Two suppliers; sample and test three ropes from each. “Snap Breaking Strength”

Measures of Variation
1. Range
2. Variance & Standard Deviation
3. Mean Absolute Deviation (MAD)
4. Interquartile Range (IQR)

1. Range
Highest minus lowest value in the sample.

Example 4: 3, 4, 1, 7, 4, 5    Range = ____
Example 5: 1, 1, 1, 7, 7, 7    Range = ____
[Dot plots of both data sets on an axis from 1 to 7]

Range
Advantage: _________ _________________.
Disadvantage: _______ most of the data. ______________ to outliers.

2. Variance & Standard Deviation
How far are the data from the middle, on average?
Notation:
Sample Variance = $s^2$; Sample Std. Dev. = $s$
Population Variance = $\sigma^2$; Population Std. Dev. = $\sigma$

Example 4: 3, 4, 1, 7, 4, 5
[Dot plot on an axis from 1 to 7]

Note: The average of the deviations from the mean will always be zero. We need to keep the negatives from canceling the positives. We can do this by
1. _____________,
2. _____________, ______ _____

Equation for Variance:
For a population: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
For a sample: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$ (see page 88)

Equation for Variance, Example 4 data:
$s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$ (see page 88)
$= \frac{(3-4)^2 + (4-4)^2 + (1-4)^2 + (7-4)^2 + (4-4)^2 + (5-4)^2}{6-1}$
= ____ = ____ (units?)

Equations for Variance:
1. $s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$ (see page 88)
2. $s^2 = \frac{\sum X_i^2 - n\bar{X}^2}{n-1}$
3. $s^2 = \frac{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}{n-1}$ (see page 90)

Example 4: 3, 4, 1, 7, 4, 5

  X     X²
  3      9
  4     16
  1      1
  7     49
  4     16
  5     25
 24    116   (column totals)

$\sum X$ = ____    $\sum X^2$ = ____

$s^2 = \frac{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}{n-1}$
$s^2$ = ( ____ ) / (6 − 1) = 4.0

Comments
• Both equations should give the same answer.
• First is easier when data and the mean are integers.
• Second is easier for larger data sets, or data not integer.
• More chance of round-off error with first equation.

Variance
Advantage: ________________; ________________.
Disadvantages: Units are _________. ____ resistant to outliers.

Standard Deviation
$s = \sqrt{s^2} = \sqrt{4.0} = 2.0$
“The square root of the variance.”
Advantage: Easier to interpret than variance; units are the same as the data.

3. Mean Absolute Deviation, MAD
$\text{MAD} = \frac{\sum |x_i - \mu|}{N}$ for population data
$\text{MAD} = \frac{\sum |x_i - \bar{x}|}{n}$ for sample data
(see page 87)
This will be used extensively in OM 300.

4. Interquartile Range (IQR)
$\text{IQR} = Q_3 - Q_1$
IQR is the range of the middle 50% of the data. Observations more than 1.5 IQRs beyond the quartiles are considered outliers.

Statistical Inference
Generalizing from a sample to a population, by using a statistic to estimate a parameter.
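The measures of variation above can be checked on the Example 4 data with a short Python sketch. It is an illustration under stated assumptions rather than the course's own procedure: the variable names are made up for this example, and the quartiles come from Python's `statistics.quantiles` with its default "exclusive" method, which is only one of several quartile conventions and may differ from the textbook's.

```python
import math
from statistics import quantiles

data = [3, 4, 1, 7, 4, 5]              # Example 4 data
n = len(data)
x_bar = sum(data) / n                  # sample mean

# Definitional formula: sum of squared deviations over n - 1
var_definition = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Shortcut formula: (sum of X^2 - (sum of X)^2 / n) over n - 1
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
var_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

std_dev = math.sqrt(var_definition)    # same units as the data

# Mean absolute deviation for sample data (divides by n)
mad = sum(abs(x - x_bar) for x in data) / n

data_range = max(data) - min(data)

# Assumption: quartiles by the "exclusive" method (Python's default)
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1

# 1.5 * IQR rule for flagging outliers
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print("variance (definition):", var_definition)
print("variance (shortcut):  ", var_shortcut)
print("standard deviation:   ", std_dev)
print("MAD:", round(mad, 3), " range:", data_range, " IQR:", iqr)
print("outliers by 1.5 IQR rule:", outliers)
```

Both variance formulas give 4.0 here, matching the slide, and the standard deviation is 2.0. The IQR value depends on the quartile convention chosen.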
Statistic (from sample) and Parameter (from entire population)
Mean: $\bar{X}$ estimates ____
Standard deviation: $s$ estimates ____
Proportion: $p$ estimates ____

Statistics
Descriptive: Graphical, Numerical

Example 5:
Estimate the true mean net weight of 16 oz. bags of Golden Flake Potato Chips with a 95% confidence interval.
Measured weights in ounces:
16.05 16.01 15.92 15.68 16.10
16.01 15.72 15.80 16.21 15.70
15.95 16.24 16.02 15.90 16.07
16.05 16.18 15.45 16.04 16.05
Is the filling machine doing what it should be doing?

[Screenshot: most commonly used features. Session window, Data window, name of worksheet file]

Menu path: “Stat”, then “Basic Statistics”, then “Display descriptive statistics”

“Session Window” results
Results for: c07 Weight of chips.MTW
Descriptive Statistics: Weights

Variable    N    Mean     Median   TrMean   StDev   SE Mean
Weights     20   15.958   16.015   15.970   0.199   0.045

Variable    Minimum   Maximum   Q1       Q3
Weights     15.450    16.240    15.825   16.065

“Five number” summary: Minimum, Q1, Median, Q3, Maximum

Executing from file: C:\Program Files\MTBWIN\MACROS\Describe.MAC
Descriptive Statistics Graph: Weights

[Figure: Histogram with Normal distribution curve superimposed; Box plot; “95% Confidence Interval” for the population mean.]

A confidence interval gives the limits of the plausible values of the true population mean, μ.
Our sample mean was 15.958 oz. This is less than 16.000. Should we be concerned?
____, because 16.000 is a plausible value for the true population mean.
[Figure: “95% Confidence Interval” for the population mean.]
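The reasoning in Example 5 can be reproduced numerically. The sketch below assumes a standard t-based interval, $\bar{X} \pm t \cdot s / \sqrt{n}$. The slides do not spell out the formula, so this choice is an assumption, and the critical value 2.093 (two-sided 95%, 19 degrees of freedom) is taken from a standard t table rather than from the slides.

```python
import math
from statistics import mean, stdev

# Measured weights in ounces from Example 5
weights = [
    16.05, 16.01, 15.92, 15.68, 16.10,
    16.01, 15.72, 15.80, 16.21, 15.70,
    15.95, 16.24, 16.02, 15.90, 16.07,
    16.05, 16.18, 15.45, 16.04, 16.05,
]

# Assumption: two-sided 95% critical value of Student's t with
# n - 1 = 19 degrees of freedom, read from a t table.
T_CRIT_95_DF19 = 2.093

n = len(weights)
x_bar = mean(weights)                  # sample mean
s = stdev(weights)                     # sample standard deviation (n - 1)
se_mean = s / math.sqrt(n)             # standard error of the mean

margin = T_CRIT_95_DF19 * se_mean
lower, upper = x_bar - margin, x_bar + margin

target = 16.0                          # labeled net weight
print(f"n = {n}, mean = {x_bar:.3f}, StDev = {s:.3f}, SE Mean = {se_mean:.3f}")
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
print("16.000 is inside the interval:", lower <= target <= upper)
```

The mean, standard deviation, and standard error printed here agree with the Session Window output (15.958, 0.199, 0.045) after rounding.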