Descriptive Statistics: Overview
• Measures of center*: mode, median, mean
• Measures of symmetry: skewness
• Measures of spread: range, inter-quartile range, variance*, standard deviation*
• Measures of position: percentile, deviation score*, z-score*

Central tendency
• Seeks to provide a single value that best represents a distribution
• Typical measures are:
  – mode
  – median
  – mean
[Figure: example distributions of nightly hours of sleep (no. of people), number of wheels (no. of vehicles), and income in $1,000s (no. of people)]

Mode
• The most frequently occurring score value
• Corresponds to the highest point on the frequency distribution
For a given sample (N = 16): 33 35 36 37 38 38 38 39 39 39 39 40 40 41 41 45
The mode = 39.
[Figure: frequency histogram of this sample, scores 33 to 45]

Mode
• The mode is not sensitive to extreme scores.
For the same sample with the highest score changed from 45 to 50 (N = 16): 33 35 36 37 38 38 38 39 39 39 39 40 40 41 41 50
The mode is still 39.
[Figure: frequency histogram of this sample, scores 33 to 50]

Mode
• A distribution may have more than one mode.
For a given sample (N = 16): 34 34 35 35 35 35 36 37 38 38 39 39 39 39 40 40
The modes = 35 and 39.
• There may be no unique mode, as in the case of a rectangular distribution.
For a given sample (N = 16): 33 33 34 34 35 35 36 36 37 37 38 38 39 39 40 40
No unique mode: every value occurs exactly twice.

Median
• The score value that cuts the distribution in half (the "middle" score)
• The 50th percentile
For N = 15, the median is the eighth score = 37.
For N = 16, the median is the average of the eighth and ninth scores = 37.5.
[Figure: frequency histograms of scores from 33 to 40]

Mean
• This is what people usually have in mind when they say "average".
• The sum of the scores divided by the number of scores:
For a sample: $\bar{X} = \frac{\sum X}{n}$
For a population: $\mu = \frac{\sum X}{N}$

Changing the value of a single score may not affect the mode or median, but it will affect the mean.

In many cases the mean is the preferred measure of central tendency, both as a description of the data and as an estimate of the parameter (the population mean $\mu$).

For the mean to be meaningful, the variable of interest must be measured on an interval (or ratio) scale.

The mean is sensitive to extreme scores and is most appropriate for fairly symmetrical distributions.
[Figure: histograms with their means marked: nightly hours of sleep, $\bar{X} = 7.07$; number of wheels, $\bar{X} = 2.4$; two score distributions from 33 to 40, $\bar{X} = 36.8$ and $\bar{X} = 36.5$; income in $1,000s, $\bar{X} = 93.2$]
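The mode, median, and mean examples above can be checked with Python's standard `statistics` module. This is a minimal sketch, assuming Python 3.8 or later (for `multimode`); the helper name `center` is hypothetical. It reuses the four N = 16 samples from the slides.

```python
from statistics import mean, median, multimode


def center(data):
    """Return (modes, median, mean) for a list of scores."""
    # multimode lists every value tied for the highest frequency
    return multimode(data), median(data), mean(data)


# N = 16 sample from the first mode example
sample = [33, 35, 36, 37, 38, 38, 38, 39, 39, 39, 39, 40, 40, 41, 41, 45]
# The same sample with the extreme score 45 changed to 50
sample_50 = sample[:-1] + [50]
# Bimodal sample: 35 and 39 each occur four times
bimodal = [34, 34, 35, 35, 35, 35, 36, 37, 38, 38, 39, 39, 39, 39, 40, 40]
# Rectangular sample: 33 through 40, each occurring twice
rectangular = [v for v in range(33, 41) for _ in range(2)]

for name, data in [("sample", sample), ("45 -> 50", sample_50),
                   ("bimodal", bimodal), ("rectangular", rectangular)]:
    modes, med, avg = center(data)
    print(f"{name:12} modes={modes} median={med} mean={avg:.4f}")
```

Changing 45 to 50 leaves the mode ([39]) and the median (39.0) unchanged but moves the mean from 38.625 to 38.9375. For the rectangular sample, `multimode` returns all eight values, which is the "no unique mode" case.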
Symmetry
• A symmetrical distribution exhibits no skewness.
• In a symmetrical distribution, mean = median = mode.
[Figure: symmetrical histogram of nightly hours of sleep]

Skewed distributions
• Skewness refers to the asymmetry of a distribution.
• A positively skewed distribution is asymmetrical, with its tail pointing in the positive direction: mode < median < mean.
[Figure: positively skewed histogram of income in $1,000s; mode = $70,000, median = $88,700, mean = $93,600]
• A negatively skewed distribution has its tail pointing in the negative direction: mode > median > mean.
[Figure: negatively skewed histogram of test scores from 0 to 100]

Measures of central tendency (+ advantages, - disadvantages)
Mode
  + quick and easy to compute
  + useful for nominal data
  - poor sampling stability
Median
  + not affected by extreme scores
  - somewhat poor sampling stability
Mean
  + good sampling stability
  + related to variance
  - inappropriate for discrete data
  - affected by skewed distributions

Distributions
• Center: mode, median, mean
• Shape: symmetrical, skewed
• Spread
[Figure: histogram of scores from 0 to 100]

Measures of spread
• The dispersion of scores from the center
• A distribution of scores is highly variable if the scores differ widely from one another.
• Three statistics measure variability:
  – range
  – interquartile range
  – variance

Range
• Largest score minus the smallest score
• The two distributions in the figure have the same range (80), but their spreads look different.
• Says nothing about how scores vary around the center
• Greatly affected by extreme scores (it is defined by them)
[Figure: two histograms of scores from 0 to 100 with the same range]

Interquartile range
• The distance between the 25th percentile (Q1) and the 75th percentile (Q3)
• More spread-out distribution: Q3 - Q1 = 70 - 30 = 40
• More concentrated distribution: Q3 - Q1 = 52.5 - 47.5 = 5
[Figure: two histograms of scores from 0 to 100]
• Effectively ignores the top and bottom quarters, so extreme scores are not influential
• Dismisses 50% of the distribution

Deviation measures
• It might be better to see how much scores differ from the center of the distribution, using distance.
• Scores further from the mean have larger deviation scores (in absolute value).
• To see how "deviant" one distribution is relative to another, we could sum these deviation scores.
• But this would leave us with a big fat zero: deviations from the mean always sum to 0.

So we use squared deviations from the mean. Their sum is the sum of squares (SS):
$SS = \sum (X - \bar{X})^2$

Student     Score   Deviation   Sq. deviation
Amy           10       -40          1600
Theo          20       -30           900
Max           30       -20           400
Henry         40       -10           100
Leticia       50         0             0
Charlotte     60        10           100
Pedro         70        20           400
Tricia        80        30           900
Lulu          90        40          1600
Mean          50
Sum                      0          6000

Variance
We take the "average" squared deviation from the mean and call it the VARIANCE.
For a population: $\sigma^2 = \frac{SS}{N}$
For a sample: $s^2 = \frac{SS}{n - 1}$
(Dividing by n - 1 corrects for the fact that a sample variance computed with n tends to underestimate the population variance.)

Computing the variance:
1. Find the mean.
2. Subtract the mean from every score.
3. Square the deviations.
4. Sum the squared deviations.
5. Divide the SS by N (population) or n - 1 (sample).
For the nine scores above, treated as a sample: $s^2 = 6000 / 8 = 750$.

Standard deviation
The standard deviation is the square root of the variance:
$s = \sqrt{s^2} = \sqrt{\frac{SS}{n - 1}}$
For the nine scores above, $s = \sqrt{750} \approx 27.39$.
The standard deviation measures spread in the original units of measurement, while the variance does so in squared units. Variance is useful for inferential statistics; the standard deviation is convenient for descriptive statistics.
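The range, interquartile range, and five-step variance procedure above can be sketched in Python for the nine students' scores. This is a minimal sketch assuming Python 3.8 or later; the variable names are illustrative only. Quartile conventions vary between textbooks and software, so the choice of `method="inclusive"` in `statistics.quantiles` is an assumption, not the slides' stated method.

```python
import math
from statistics import pstdev, quantiles, stdev

# The nine scores used in the deviation, SS, and variance examples
scores = {"Amy": 10, "Theo": 20, "Max": 30, "Henry": 40, "Leticia": 50,
          "Charlotte": 60, "Pedro": 70, "Tricia": 80, "Lulu": 90}
x = list(scores.values())
n = len(x)

# Range: largest score minus smallest score
data_range = max(x) - min(x)

# Interquartile range; quartile conventions differ, and "inclusive" is one
# of the two methods statistics.quantiles supports
q1, _, q3 = quantiles(x, n=4, method="inclusive")
iqr = q3 - q1

# Variance in five steps
x_bar = sum(x) / n                          # 1. find the mean
deviations = [xi - x_bar for xi in x]       # 2. subtract the mean from every score
squared = [d ** 2 for d in deviations]      # 3. square the deviations
ss = sum(squared)                           # 4. sum them: SS
sample_var = ss / (n - 1)                   # 5a. sample: divide by n - 1
pop_var = ss / n                            # 5b. population: divide by N

# Standard deviation: square root of the variance
s = math.sqrt(sample_var)
sigma = math.sqrt(pop_var)

print(f"range={data_range} IQR={iqr} mean={x_bar} sum of deviations={sum(deviations)}")
print(f"SS={ss} s^2={sample_var} s={s:.2f} sigma^2={pop_var:.2f} sigma={sigma:.2f}")

# Cross-check against the standard library (stdev divides by n - 1, pstdev by N)
assert math.isclose(s, stdev(x)) and math.isclose(sigma, pstdev(x))
```

With the default `method="exclusive"`, the same call returns Q1 = 25.0 and Q3 = 75.0 (IQR = 50), so an IQR is best reported together with the quartile convention used. The population value `sigma` (about 25.82) is the 25.8 that appears as the Test 1 standard deviation in the z-score table, while `s` (about 27.39) is the sample value.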
Example
[Figure: two histograms of scores from 0 to 100]
Both distributions: N = 28, $\bar{X} = 50$
Narrower distribution: $s^2 = 140.74$, $s = 11.86$
Wider distribution: $s^2 = 555.55$, $s = 23.57$

Descriptive Statistics: Quick Review
• Measures of center: mode, median, mean
• Measures of symmetry: skewness
• Measures of spread: range, inter-quartile range, variance, standard deviation

Descriptive Statistics: Quick Review
For a population:
• Mean: $\mu = \frac{\sum X}{N}$
• Variance: $\sigma^2 = \frac{SS}{N}$
• Standard deviation: $\sigma = \sqrt{\sigma^2}$
For a sample:
• Mean: $\bar{X} = \frac{\sum X}{n}$
• Variance: $s^2 = \frac{SS}{n - 1}$
• Standard deviation: $s = \sqrt{s^2}$

Exercise
[Figure: a small distribution of scores on a scale from 1 to 5]
• Treat this little distribution as a sample and calculate:
  – mode, median, mean
  – range, variance, standard deviation

Measures of Position
How to describe a data point in relation to its distribution:
• Quantile
• Deviation score
• Z-score

Quantiles
Quartile: divides ranked scores into four equal parts of 25% each, running from the minimum to the maximum, with the median at the middle boundary.
Decile: divides ranked scores into ten equal parts of 10% each.
Percentile rank: divides ranked scores into 100 equal parts.
$\text{Percentile rank of score } x = \frac{\text{number of scores less than } x}{\text{total number of scores}} \times 100$

Deviation scores
For a population: $\text{deviation} = X - \mu$
For a sample: $\text{deviation} = X - \bar{X}$
For the nine students above (mean 50), Amy's deviation score is 10 - 50 = -40 and Lulu's is 90 - 50 = 40.
• What if we want to compare scores from distributions that have different means and standard deviations?
• Example:
  – Nine students' scores on two different tests
  – Tests scored on different scales

Nine students on two tests
Student     Test 1   Test 2   Deviation 1   Deviation 2
Amy           10        1         -40            -4
Theo          20        2         -30            -3
Max           30        3         -20            -2
Henry         40        4         -10            -1
Leticia       50        5           0             0
Charlotte     60        6          10             1
Pedro         70        7          20             2
Tricia        80        8          30             3
Lulu          90        9          40             4
Average       50        5

Z-scores
• Z-scores rescale a distribution so that it is centered on 0 with a standard deviation of 1.
• Subtract the mean from a score, then divide by the standard deviation:
For a population: $z = \frac{X - \mu}{\sigma}$
For a sample: $z = \frac{X - \bar{X}}{s}$

Student     Test 1   Test 2   Z-score 1   Z-score 2
Amy           10        1       -1.55       -1.55
Theo          20        2       -1.16       -1.16
Max           30        3       -0.77       -0.77
Henry         40        4       -0.39       -0.39
Leticia       50        5        0           0
Charlotte     60        6        0.39        0.39
Pedro         70        7        0.77        0.77
Tricia        80        8        1.16        1.16
Lulu          90        9        1.55        1.55
Average       50        5        0           0
St dev        25.8      2.58     1           1
The standard deviations in this table treat the nine scores as a population (dividing SS by N = 9): $\sqrt{6000/9} \approx 25.8$ for Test 1 and $\sqrt{60/9} \approx 2.58$ for Test 2.

A distribution of z-scores:
• Always has a mean of 0
• Always has a standard deviation of 1
• Has the same shape as the original distribution: converting to standard (z) scores cannot normalize a non-normal distribution
A z-score is interpreted as the number of standard deviations a score lies above (positive) or below (negative) the mean.

Exercise
On their third test, the class average was 45 and the standard deviation was 6. Fill in the rest.
Student     Test 3   Z-score
Amy           52       ___
Theo          39       ___
Max          ___      -1.5
Henry        ___       1.3

Descriptive Statistics: Quick Review
For a population: mean $\mu = \frac{\sum X}{N}$, variance $\sigma^2 = \frac{SS}{N}$, standard deviation $\sigma = \sqrt{\sigma^2}$, z-score $z = \frac{X - \mu}{\sigma}$
For a sample: mean $\bar{X} = \frac{\sum X}{n}$, variance $s^2 = \frac{SS}{n - 1}$, standard deviation $s = \sqrt{s^2}$, z-score $z = \frac{X - \bar{X}}{s}$

Messing with Units
If you add a constant to (or subtract a constant from) each value in a distribution, then:
• the mean is increased/decreased by that amount
• the standard deviation is unchanged
• the z-scores are unchanged
If you multiply or divide each value in a distribution by a positive constant, then:
• the mean is multiplied/divided by that constant
• the standard deviation is multiplied/divided by that constant
• the z-scores are unchanged
(With a negative constant, the standard deviation is scaled by the constant's absolute value and every z-score changes sign.)

Example: adding 1 and multiplying by 10
Student     Score   Score + 1   Score x 10   Z-score (all three)
Theo           5         6           50           -0.5
Max            3         4           30           -1.5
Henry          5         6           50           -0.5
Leticia        7         8           70            0.5
Charlotte      7         8           70            0.5
Pedro          8         9           80            1.0
Tricia         4         5           40           -1.0
Lulu           9        10           90            1.5
Mean           6         7           60
St dev       1.94      1.94         19.4
Adding 1 leaves every deviation (-1, -3, -1, 1, 1, 2, -2, 3) and squared deviation (1, 9, 1, 1, 1, 4, 4, 9) unchanged. Multiplying by 10 multiplies each deviation by 10 (-10, -30, -10, 10, 10, 20, -20, 30) and each squared deviation by 100 (100, 900, 100, 100, 100, 400, 400, 900). The standard deviations here divide SS by N = 8 (for the original scores, $\sqrt{30/8} \approx 1.94$).

Other Standardized Distributions
The z distribution is not the only standardized distribution. You can easily create others (it's really just messing with units).
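The z-score formula and the "messing with units" rules above can be checked numerically. This is a minimal sketch; like the tables above, it assumes the population standard deviation (`pstdev`, dividing by N), and the helper names `z_scores` and `rounded` are hypothetical. Swap in `statistics.stdev` for the sample version.

```python
from statistics import mean, pstdev


def z_scores(data):
    """z = (X - mu) / sigma, using the population standard deviation (divide by N)."""
    mu, sigma = mean(data), pstdev(data)
    return [(x - mu) / sigma for x in data]


def rounded(values, digits=2):
    """Round a list of numbers for display."""
    return [round(v, digits) for v in values]


# Nine students on two tests scored on different scales
test1 = [10, 20, 30, 40, 50, 60, 70, 80, 90]
test2 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print("Test 1 z:", rounded(z_scores(test1)))
print("Test 2 z:", rounded(z_scores(test2)))

# Units example, in order: Theo, Max, Henry, Leticia, Charlotte, Pedro, Tricia, Lulu
example = [5, 3, 5, 7, 7, 8, 4, 9]
plus_one = [x + 1 for x in example]    # adding a constant
times_ten = [x * 10 for x in example]  # multiplying by a positive constant
negated = [-x for x in example]        # multiplying by -1: z-scores flip sign

for label, data in [("original", example), ("+1", plus_one),
                    ("x10", times_ten), ("x(-1)", negated)]:
    print(f"{label:9} mean={mean(data)} sd={pstdev(data):.2f} "
          f"z={rounded(z_scores(data), 1)}")
```

Test 1 and Test 2 produce the same z-scores even though they are on different scales, and adding 1 or multiplying by 10 leaves the z-scores of the example unchanged while the mean and standard deviation shift as the rules describe.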
Other Standardized Distributions
Example: let's convert these test scores into ETS-type scores (mean 500, standard deviation 100).

Here's how:
1. Convert to z-scores.
2. Multiply by 100 to set the standard deviation to 100.
3. Add 500 to set the mean to 500.

Student     Score   Z-score   ETS-type score
Theo           5      -0.5          450
Max            3      -1.5          350
Henry          5      -0.5          450
Leticia        7       0.5          550
Charlotte      7       0.5          550
Pedro          8       1.0          600
Tricia         4      -1.0          400
Lulu           9       1.5          650
Average        6       0            500
St dev       1.94      1            100

Exercise
Fill in each student's percentile rank, deviation score, z-score, and IQ-type score (mean 100, standard deviation 10).
Student     Score   Percentile   Deviation score   Z-score   IQ-type score
Theo          20
Max           18
Henry         13
Leticia       17
Charlotte     19
Pedro         16
Tricia        11
Lulu           9
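The percentile-rank formula and the three-step conversion to ETS-type scores can be sketched in Python using the worked example's scores. This is a minimal sketch; `percentile_rank` and `rescale` are hypothetical helper names, and the population standard deviation (dividing by N) is assumed, matching the 1.94 in the table.

```python
from statistics import mean, pstdev


def percentile_rank(x, data):
    """Percent of scores strictly less than x: (count below x / total) * 100."""
    return 100 * sum(1 for v in data if v < x) / len(data)


def rescale(data, new_mean, new_sd):
    """Convert to z-scores (population SD), multiply by new_sd, then add new_mean."""
    mu, sigma = mean(data), pstdev(data)
    return [new_mean + new_sd * (x - mu) / sigma for x in data]


names = ["Theo", "Max", "Henry", "Leticia", "Charlotte", "Pedro", "Tricia", "Lulu"]
scores = [5, 3, 5, 7, 7, 8, 4, 9]
ets = rescale(scores, 500, 100)  # ETS-type scale: mean 500, SD 100

for name, x, e in zip(names, scores, ets):
    print(f"{name:10} score={x} percentile rank={percentile_rank(x, scores):5.1f} "
          f"ETS-type={e:6.1f}")
print(f"ETS-type mean={mean(ets):.1f} sd={pstdev(ets):.1f}")
```

Because this sketch uses unrounded z-scores, its ETS-type values differ slightly from the table, which rounds z first (about 448 rather than 450 for Theo, and about 345 rather than 350 for Max). Calling `rescale(data, 100, 10)` would give scores on the IQ-type scale used in the exercise.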