Slide 1
Descriptive Statistics (Summarizing Data), Chapter 3
Three characteristics of data:
1. Representative score, such as the average
2. Measure of scattering or variation
3. Nature of the distribution, such as bell-shaped

Slide 2
Do women really talk more than men?

Slide 3
Measures of Center
1. Arithmetic Mean (Mean)
2. Median
3. Mode
4. Midrange
5. Weighted Mean
6. Symmetry

Slide 4
1. Arithmetic Mean (Mean)
$\bar{x} = \frac{\sum x}{n}$ : the mean of a set of sample values
$\mu = \frac{\sum x}{N}$ : the mean of all values in a population

Slide 5
2. Median
The middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.

Slide 6
Odd number of values:
5.40 1.10 0.42 0.73 0.48 1.10 0.66
0.42 0.48 0.66 0.73 1.10 1.10 5.40 (in order)
The median is the exact middle value, 0.73.
Even number of values:
5.40 1.10 0.42 0.73 0.48 1.10
0.42 0.48 0.73 1.10 1.10 5.40 (in order)
There is no exact middle value; the middle is shared by two numbers, so the median is their mean: $(0.73 + 1.10) / 2 = 0.915$.

Slide 7
3. Mode
The value that occurs most frequently. A data set may be bimodal, multimodal, or have no mode.

Slide 8
Mode - Examples
a. 5.40 1.10 0.42 0.73 0.48 1.10 (mode is 1.10)
b. 27 27 27 55 55 55 88 88 99 (bimodal: 27 and 55)
c. 1 2 3 6 7 8 9 10 (no mode)

Slide 9
4. Midrange
The value midway between the maximum and minimum values in the original data set.
$\text{Midrange} = \frac{\text{maximum value} + \text{minimum value}}{2}$

Slide 10
Weighted Mean
In some cases, values vary in their degree of importance, so they are weighted accordingly.
$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$

Slide 11
Best Measure of Center

Slide 12
Example
Find the (1) mean, (2) median, (3) mode, and (4) midrange for the following values:
3 3 3 3 5 2 3 3 3 2 (3) 4 2 2 3 2 3 5 3 4 4
(1) Mean:
(2) Median:
(3) Mode:
(4) Midrange:

Slide 13
Symmetric: a distribution of data is symmetric if the left half of its histogram is roughly a mirror image of its right half.
Skewed: a distribution of data is skewed if it is not symmetric and if it extends more to one side than the other.

Slide 14
Skewness

Slide 15
Measures of Variation
1. Range
2. Standard Deviation (SD)
3. Variance
4. Estimating SD: Range Rule of Thumb
5. Empirical Rule
6. Coefficient of Variation (CV)

Slide 16
Dispersion Statistics
Group A   Group B
65        42
66        54
67        58
68        62
71        67
73        77
74        77
77        85
77        93
77        100

            Group A   Group B
Mean        71.5      71.5
Median      72.0      72.0
Mode        77        77
Midrange    71.0      71.0

Slide 17
Dispersion Statistics
The measures of center show no difference between the two groups, but the values in Group B are much more widely scattered than those in Group A. This variability among data is one characteristic to which averages are not sensitive.

Slide 18
Dispersion Statistics
Three basic measures of dispersion: (a) Range, (b) Standard Deviation, (c) Variance

Slide 19
1. Range
The difference between the maximum value and the minimum value.
Range = (maximum value) - (minimum value)

Slide 20
Example
The range is simply the difference between the highest value and the lowest value.
For Group A, the range is 12 (77 - 65).
For Group B, the range is 58 (100 - 42).
Do not confuse the midrange (a measure of center) with the range (a measure of dispersion).

Slide 21
2. Standard Deviation and Variance
SD and variance measure the dispersion or variation of values about the mean.
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Slide 22
Population standard deviation: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Sample standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Slide 23
$x$       $x - \bar{x}$   $(x - \bar{x})^2$
2         -5              25
3         -4              16
5         -2              4
6         -1              1
9         2               4
17        10              100
Total: 42                 150

$\bar{x} = \frac{42}{6} = 7.0$
$s = \sqrt{\frac{150}{6 - 1}} = \sqrt{30} \approx 5.5$

Slide 24
Variance - Notation
The variance is the standard deviation squared.
$s^2$ : sample variance
$\sigma^2$ : population variance

Slide 25
Comparison of Word Counts of Men & Women

Slide 26
Interpreting and understanding SD
1. Range Rule
2. Empirical Rule

Slide 27
1. Range Rule of Thumb (Estimation of Standard Deviation)
To estimate a value of the standard deviation $s$, use
$s \approx \frac{\text{range}}{4}$
where range = (maximum value) - (minimum value).

Slide 28
Rough Estimates of the Minimum and Maximum "Usual" Sample Values
Minimum "usual" value = (mean) - 2 × (standard deviation)
Maximum "usual" value = (mean) + 2 × (standard deviation)

Slide 29
2. The Empirical Rule
For data sets having a distribution that is approximately bell-shaped:
About 68% of all values fall within 1 standard deviation of the mean.
About 95% of all values fall within 2 standard deviations of the mean.
About 99.7% of all values fall within 3 standard deviations of the mean.

Slide 30
FIGURE 2-13

Slide 31
Example: p. 106, IQ Scores
Empirical (68-95-99.7) Rule with a bell-shaped distribution
Mean = 100, SD = 15
What percentage of adults have IQ scores between 70 and 130?

Slide 32
Coefficient of Variation
The coefficient of variation (or CV) for a set of sample or population data, expressed as a percent, describes the standard deviation relative to the mean.
Sample: $CV = \frac{s}{\bar{x}} \cdot 100\%$
Population: $CV = \frac{\sigma}{\mu} \cdot 100\%$

Slide 33
Note: Coefficient of Variation (CV)
The coefficient of variation, expressed in percent, is used to describe the standard deviation relative to the mean.
Find the CV for the following sample scores: 2, 2, 2, 3, 5, 8, 12, 19, 22, 30
$CV = \frac{s}{\bar{x}} \cdot 100 \approx 95\%$

Slide 34
Example: p. 109, Heights and Weights
          Mean        SD
Height    68.34 in    3.02 in
Weight    172.55 lb   26.33 lb
CV for Heights =
CV for Weights =

Slide 35
Measures of Relative Standing
1. Z Score
2. Quartiles and Percentiles

Slide 36
1. Z Score (or standardized value)
The number of standard deviations that a given value $x$ is above or below the mean.
Sample: $z = \frac{x - \bar{x}}{s}$
Population: $z = \frac{x - \mu}{\sigma}$

Slide 37
Interpreting Z Scores
Whenever a value is less than the mean, its corresponding z score is negative.
Ordinary values: z score between -2 and 2
Unusual values: z score < -2 or z score > 2

Slide 38
Which measure of center is the only one that can be used with data at the nominal level of measurement?
A. Mean
B. Median
C. Mode

Slide 39
Which of the following measures of center is not affected by outliers?
A. Mean
B. Median
C. Mode

Slide 40
Find the mode(s) for the given sample data.
79, 25, 79, 13, 25, 29, 56, 79
A. 79
B. 48.1
C. 42.5
D. 25

Slide 41
Which is not true about the variance?
A. It is the square of the standard deviation.
B. It is a measure of the spread of data.
C. The units of the variance are different from the units of the original data set.
D. It is not affected by outliers.

Slide 42
Weekly sales for a company average $10,000 with a standard deviation of $450. Sales for the past week were $9050. This is
A. Unusually high.
B. Unusually low.
C. About right.
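
The measures of center defined in the slides (mean, median, mode, midrange, weighted mean) can be checked numerically. The following is a minimal Python sketch, not part of the original deck. The function names are illustrative, the "no mode" convention (no value repeats) is an assumption taken from the mode examples, and the weighted-mean scores and weights are hypothetical values chosen only to exercise the formula. The Group A and Group B values are the ones listed in the Dispersion Statistics slide.

```python
from collections import Counter


def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """Most frequent value(s); an empty list means no value repeats (no mode)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


def midrange(values):
    """Value midway between the maximum and the minimum."""
    return (max(values) + min(values)) / 2


def weighted_mean(values, weights):
    """Sum of w * x divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Data from the Dispersion Statistics slide.
group_a = [65, 66, 67, 68, 71, 73, 74, 77, 77, 77]
group_b = [42, 54, 58, 62, 67, 77, 77, 85, 93, 100]

for name, group in (("Group A", group_a), ("Group B", group_b)):
    print(name, mean(group), median(group), modes(group), midrange(group))

# Hypothetical scores and weights, only to illustrate the weighted mean.
print(weighted_mean([80, 90], [1, 3]))
```

Both groups produce the same four measures of center, which is the point the slides make before introducing measures of variation.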
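
The measures of variation and relative standing from the slides (range, sample standard deviation, range rule of thumb, coefficient of variation, z score) can be sketched the same way. This is an illustrative Python sketch under the formulas stated in the slides, not code from the original deck, and all function names are assumptions. It reuses the worked standard deviation data (2, 3, 5, 6, 9, 17), the CV sample scores, the height and weight summary values, and the weekly sales figures from the slides.

```python
import math


def value_range(values):
    """Range: maximum value minus minimum value."""
    return max(values) - min(values)


def sample_sd(values):
    """Sample standard deviation with the n - 1 denominator."""
    m = sum(values) / len(values)
    squared_deviations = sum((x - m) ** 2 for x in values)
    return math.sqrt(squared_deviations / (len(values) - 1))


def range_rule_sd(values):
    """Range rule of thumb: rough estimate s ~ range / 4."""
    return value_range(values) / 4


def cv_percent(sd, mean):
    """Coefficient of variation: standard deviation relative to the mean, in percent."""
    return sd / mean * 100


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above or below the mean."""
    return (x - mean) / sd


def is_unusual(z):
    """Slide convention: a value is unusual if z < -2 or z > 2."""
    return abs(z) > 2


# Worked standard deviation example: sum of squared deviations is 150, n = 6.
worked = [2, 3, 5, 6, 9, 17]
print(round(sample_sd(worked), 1))

# CV for the sample scores given in the slides.
scores = [2, 2, 2, 3, 5, 8, 12, 19, 22, 30]
scores_mean = sum(scores) / len(scores)
print(round(cv_percent(sample_sd(scores), scores_mean)))

# CV from the height and weight summary statistics.
print(round(cv_percent(3.02, 68.34), 1), round(cv_percent(26.33, 172.55), 1))

# Weekly sales: mean 10,000, SD 450, observed 9050.
z_sales = z_score(9050, 10000, 450)
print(round(z_sales, 2), is_unusual(z_sales))
```

Comparing the two CV values shows why the coefficient of variation is useful: it puts the spread of heights (inches) and weights (pounds) on a common, unit-free scale. The IQ bounds 70 and 130 give z scores of -2 and 2 with mean 100 and SD 15, which is the 2-standard-deviation band of the empirical rule.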