Elementary Statistics and Inference
22S:025 or 7P:025
Lecture 4
Chapter 4

5.) Chapter Four

A. Introduction

The histogram provides a general description of where the scores are located and of the “shape” of the distribution, but it does not give a precise description of the “spread/variation” of the scores or of their location/concentration.

The “center/location” of the scores is often described by the average or the median. The standard deviation describes the “spread” around the average score. A second index of the spread of scores in a histogram is the interquartile range.

The mean is the arithmetic average of the scores. The median is the point on the score scale below which 50% of the scores fall.

Examples:

[Histogram: percent of scores (vertical axis) at each score from 5 to 11]
Average = 8.00   Median = 8.00   Standard deviation = 1.66

[Histogram: percent of scores (vertical axis) at each score from 5 to 15]
Average = 8.19   Median = 6.8   Standard deviation = 3.14

The average (mean) is affected by every score and is pulled in the direction of the extreme scores. When the distribution is symmetric, the mean and median are the same. When the distribution is skewed, the mean differs from the median.

  If the distribution is skewed right: the median is less than the mean.
  If the distribution is skewed left: the median is larger than the mean.
  If the distribution is symmetrical: the median and mean are the same.

B. The Average or Mean

Computation of the mean: find the sum of the scores, then divide by the number of scores.

Example: 9, 1, 2, 2, and 0

  mean = (9 + 1 + 2 + 2 + 0) / 5 = 14 / 5 = 2.8

On pages 58-60, the text provides data from the 1976-80 Health and Nutrition Examination Survey (HANES), a representative cross section of 20,322 Americans aged 1-74.
Data were collected on:

  Demographics: age, education, income
  Physiological variables: height, weight, blood pressure, etc.
  Dietary habits
  Prevalence of disease
  Levels of pesticides in blood

The plots of the average heights and weights by age in years for the 2003-04 survey are shown in Figure 3 (page 59).

The average or mean is commonly written as x̄, and the sum of scores is represented by ΣX. So,

  x̄ = ΣX / n

which means to find the sum of all the scores, and then divide the sum by the total number of scores.

Exercise Set A (pp. 60-61): assign 1, 3, 8

#4: N = 10 and the average is 5 feet 6 inches, or 66 inches. This means the sum of their heights is 10 × 66 = 660 inches. The 11th person is 6 feet 5 inches, or 77 inches. The new sum would be 660 + 77 = 737 inches. The new mean would be 737 / 11 = 67 inches, or 5 feet 7 inches.

C. Average and Histogram

See diagrams on pages 62-63 of text. Figure 4 (page 62) shows the histogram for 2,696 women ages 18-74 from the HANES (Health and Nutrition Examination Survey) of 2003-2004. The data for weights are skewed to the right, i.e., the tail is to the right.

The average (mean) is pulled in the direction of the skew, because the extreme scores pull the average in that direction. For example:

  Mean for 1, 3, 5, 7, 9 = 5
  Mean for 1, 3, 5, 7, 90 = 21.2 (skewed right)

If the data for a histogram are skewed to the left (the tail is at the lower end of the histogram), the mean is pulled to the left.

  Mean for 10, 12, 14, 16, 18 = 14
  Mean for 1, 5, 14, 16, 18 = 10.8

The mean is the point of balance in a distribution of scores: the total distance from the mean of the scores above it equals the total distance from the mean of the scores below it. In this sense the mean is a centroid.

  Σ(x − x̄) = 0
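The pull of an extreme score on the mean, and the balance property Σ(x − x̄) = 0, can be checked numerically. The following is a minimal Python sketch, not part of the lecture; the helper name `summarize` is hypothetical, and the two score lists are the ones from the skewed-right example above.

```python
from statistics import mean, median


def summarize(scores):
    """Return the mean, the median, and the sum of deviations from the mean.

    Hypothetical helper for illustration only.
    """
    avg = mean(scores)
    deviations = [x - avg for x in scores]
    return {
        "mean": avg,
        "median": median(scores),
        # Rounded to hide floating-point noise; the exact value is zero.
        "sum_of_deviations": round(sum(deviations), 10),
    }


symmetric = [1, 3, 5, 7, 9]
skewed_right = [1, 3, 5, 7, 90]  # one extreme score in the right tail

for scores in (symmetric, skewed_right):
    print(scores, summarize(scores))
```

Replacing 9 with 90 leaves the median unchanged but moves the mean toward the extreme score, while the deviations from the mean still sum to zero in both lists.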
The sum of the differences between each score and the mean is always equal to zero.

Example: 1, 3, 5, 7, 9

  ΣX = 25
  mean = x̄ = ΣX / n = 25 / 5 = 5

  Σ(x − x̄) = (1 − 5) + (3 − 5) + (5 − 5) + (7 − 5) + (9 − 5)
  Σ(x − x̄) = (−4) + (−2) + (0) + (2) + (4)
  Σ(x − x̄) = 0

The centroid (mean) of a seesaw (page 64) is the point about which two persons of different weights can balance by adjusting their distances from it.

Median: as shown in the histograms on page 64, 50% of the scores (area) lie below the median and 50% of the scores (area) lie above it.

  Symmetrical histogram: the mean is the same as the median
  Skewed right histogram: the median is less than the mean
  Skewed left histogram: the median is greater than the mean

See Exercise Set B (page 65): 1, 2, 3, 4

Computing the mean in a histogram

   x     f    f·x
  19     1     19
  18     2     36
  17     3     51
  16     4     64
  15     5     75
  14     6     84
  13     4     52
  12     3     36
  11     2     22

  N = 30   Sum = 439   x̄ = 439 / 30 ≈ 14.63

Note: Again we assume each score has the value of the midpoint of its score interval. For example, the 3 scores of 17 would be evenly distributed across the interval between 16.5 and 17.5.

Computing the mean in a grouped distribution (histogram)

  interval   midpoint    f    f·midpoint
   18-20        19       2        38
   15-17        16       3        48
   12-14        13       4        52
    9-11        10       6        60
    6-8          7       3        21
    3-5          4       2         8

  N = 20   Sum ≈ 227

  x̄ = sum of scores / number of scores ≈ 227 / 20 = 11.35

Assume: the scores in an interval are evenly distributed throughout the score interval, so that f · midpoint ≈ sum of the scores in the interval.

D. Root-Mean-Square (RMS)

The RMS is an index in score scale units that describes the spread or variation of the scores in a histogram.

  RMS = square root of the average of the squares of the scores = √(ΣX² / n)

Example: 3, 5, 7, 9, 11

  average = x̄ = mean = 7
  squares: 9, 25, 49, 81, 121

  RMS = √((9 + 25 + 49 + 81 + 121) / 5) = √(285 / 5) = √57 ≈ 7.55

In this example the RMS (7.55) is slightly larger than the mean (7). The RMS of a list is never smaller than the absolute value of its mean, and the two are equal only when all the scores are the same.
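The RMS example above follows the name read backwards: square each score, take the mean of the squares, then take the square root. A minimal Python sketch under that reading is given below; the function name `rms` is an illustrative choice, not something defined in the text.

```python
import math


def rms(scores):
    """Root-mean-square: sqrt of the average of the squared scores."""
    squares = [x * x for x in scores]        # square
    mean_square = sum(squares) / len(scores)  # mean
    return math.sqrt(mean_square)            # root


scores = [3, 5, 7, 9, 11]
average = sum(scores) / len(scores)

print("mean =", average)                # 7.0
print("RMS  =", round(rms(scores), 3))  # about 7.55
```

For a list in which every score is the same, such as 4, 4, 4, the function returns the common value, so the RMS and the mean coincide in that case.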