MEASURES OF THE LOCATION OR CENTRE OF A SAMPLE

Sample Mean: Suppose a sample has n observations $x_1, x_2, \ldots, x_n$. The sample mean is denoted by the symbol $\bar{x}$ and is defined as follows:

$\bar{x} = \sum x_i / n$

$\Sigma$ is the upper case Greek letter "sigma". In statistics it means to sum. Thus $\sum x_i$ means the sum of the sample values $x_i$.

(23)

Example: Stopwatches are tested for reliability by counting the number of cycles (on-off-restart) until some part of the mechanism fails. The data below give the failure times (in thousands of cycles) for a random sample of stopwatches of a certain type.

22, 2, 12, 18, 16

$\bar{x} = \sum x_i / n = 70/5 = 14$

[Number line marked 0, 5, 10, 15, 20, 25]

Sample Median: The value above and below which approximately 50% of the sample falls is called the median. It is denoted by the symbol m.

Steps in the calculation of the sample median
(1) Order the sample values from smallest to largest.
(2) Calculate the approximate position of m: AP(m) = (50/100)n, where n is the sample size.
(3) From (2) calculate the exact position of m as follows:
    (i) If AP(m) is not a whole number: Pos(m) = AP(m) rounded up to the next whole number.
    (ii) If AP(m) is a whole number: Pos(m) = AP(m) + .5
(4) RESULT: m is the value at Pos(m). If Pos(m) is not a whole number, there is no observation in that position. In that case m is the average of the values in the two positions on either side of Pos(m).

(24)

Example: Consider the stopwatch data of the previous example. For this sample,
(1) Ordered Sample: 2 12 16 18 22
(2) AP(m) = (.50)n = (.50)5 = 2.5
(3) Pos(m) = 3
(4) m = 16
Interpretation: Approximately 50% of the failure times in the sample fall below m = 16 thousand cycles.

Example: Incomes (in $1000) of a sample of 8 residents of a hypothetical town are:

29 16 17 19 964 26 10 29

$\bar{x} = \sum x_i / n = 1110/8 = 138.75$

Now we calculate the sample median for this sample:
(1) Ordered Sample: 10 16 17 19 26 29 29 964
(2) AP(m) = (.50)n = (.50)8 = 4
(3) Pos(m) = 4 + .5 = 4.5
(4) m = (19 + 26)/2 = 22.5
Interpretation: Approximately 50% of the incomes in the sample are below 22.5 thousand dollars.
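The mean formula and the four median steps above can be sketched in Python as a quick check on the two worked examples. This is only an illustrative sketch: the function names `sample_mean` and `sample_median` are made up for this note, and the median follows the position rule of these notes rather than any particular library routine.

```python
import math


def sample_mean(values):
    """Sample mean: the sum of the values divided by the sample size n."""
    return sum(values) / len(values)


def sample_median(values):
    """Sample median using the AP(m) / Pos(m) steps from these notes."""
    ordered = sorted(values)               # step (1): order the sample
    n = len(ordered)
    approx_pos = 0.5 * n                   # step (2): AP(m) = (50/100) n
    if approx_pos != int(approx_pos):      # step (3)(i): not a whole number
        pos = math.ceil(approx_pos)        # round up to the next whole number
        return ordered[pos - 1]            # positions are 1-based, lists 0-based
    lower = int(approx_pos)                # step (3)(ii): Pos(m) = AP(m) + .5
    # step (4): average the values on either side of Pos(m)
    return (ordered[lower - 1] + ordered[lower]) / 2


stopwatch = [22, 2, 12, 18, 16]
incomes = [29, 16, 17, 19, 964, 26, 10, 29]

print(sample_mean(stopwatch), sample_median(stopwatch))  # 14.0 16
print(sample_mean(incomes), sample_median(incomes))      # 138.75 22.5
```

For the income data the mean is pulled far above the median by the single value 964, which is the point of the note on outliers that follows.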
Note: Data values like "964" in the above example are called OUTLIERS. These are extreme data values which are either
(a) infrequently occurring members of the population, or
(b) sample values that do not belong to the population, such as measurement errors, recording errors, or measurements from the wrong population.
In case (a) outliers should not be removed from the sample. In case (b) they should be removed.

(25)

Questions:
1. How sensitive to outliers is
   (a) the sample mean $\bar{x}$?
   (b) the sample median m?
2. In general, which is a better measure of the centre (or location) of a sample when the sample is
   (a) skewed or has outliers? ______________________
   (b) symmetric? __________________________
3. What does each of (a)-(c) below indicate about the shape of the sample (left skewed, right skewed or symmetric)?
   (a) $\bar{x}$ is well above m? ________________
   (b) $\bar{x}$ is well below m? ___________________
   (c) $\bar{x}$ is close to m? ___________________

(26)

The Sample Mean and the Sample Median on Minitab

Consider the data from our previous example: incomes (in $1000) of a sample of 8 residents of a hypothetical town.

C1: 10 16 17 19 26 29 29 964

    MTB> mean C1
    Mean = 138.75
    MTB> medi C1
    Median = 22.5
    MTB> desc C1
              N     MEAN   MEDIAN   TRMEAN    STDEV   SEMEAN
    C1        8      139       22      139      334      118
            MIN      MAX       Q1       Q3
    C1       10      964       16       29

Note: To see how the outlier "964" affects the sample mean $\bar{x}$, consider the sample without the outlier.

C2: 10 16 17 19 26 29 29

    MTB> mean c2
    Mean = 20.857
    MTB> medi c2
    Median = 19.00

Notice that the removal of the outlier has changed the sample mean from 138.75 to 20.857, while the sample median has changed little (from 22.5 to 19).

(27)

MEASURES OF VARIATION (SPREAD) OF A SAMPLE

VARIATION: a measure of how far the sample values are from their central value. There are many such measures. One convenient measure is as follows:

Total Variation: $SSTO = \sum (x_i - \bar{x})^2$
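As a small illustration of the total variation formula, the following Python sketch lists the deviations from the mean for the stopwatch sample and sums their squares. The function name `total_variation` is hypothetical and chosen only for this note.

```python
def total_variation(values):
    """SSTO: the sum of squared deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


# Stopwatch failure times (thousands of cycles) from the earlier example.
stopwatch = [22, 2, 12, 18, 16]
mean = sum(stopwatch) / len(stopwatch)

# Deviation of each observation from the sample mean of 14.
deviations = [x - mean for x in stopwatch]
print(deviations)                  # [8.0, -12.0, -2.0, 4.0, 2.0]
print(total_variation(stopwatch))  # 232.0
```

The deviations themselves always sum to zero, which is why they are squared before being added.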
The average value of the total variation for a sample is known as the Sample Variance:

$s^2 = SSTO/(n-1)$

Note that in calculating the average here we divide by n-1 rather than n. It turns out that a sample variance defined this way has better statistical properties.

Since the total variation and the sample variance are calculated from squared deviations of the observations, their unit of measurement is the square of the unit of measurement of the data (for example, if the data are measured in centimeters (cm), then SSTO and $s^2$ are measured in square centimeters (cm^2)). To obtain a measure of variation which has the same units of measurement as the data, we take the square root of $s^2$ to get the Sample Standard Deviation:

$s = \sqrt{s^2}$

(28)

Example: The stopwatch data from a previous example are given below:

22 2 12 18 16

We have calculated $\bar{x} = 14$. We now calculate SSTO, $s^2$ and s.

SSTO = ______________ = _____________ , the total squared deviation of the sample values from the sample mean.
$s^2$ = SSTO/(n-1) = _______ = _______ , roughly the average squared distance of the sample values from the sample mean.
s = _______ = _______ , some sort of "average distance" of the sample values from the sample mean.

(29)

Computational Formula for Total Variation

The computation of total variation can actually be done without first calculating $\bar{x}$. An alternative formula is

$SSTO = \sum x_i^2 - (\sum x_i)^2 / n$

For the stopwatch data:
SSTO = __________ = _______________
$s^2$ = SSTO/(n-1) = _______
s = _______

Example: Below is a simple random sample (SRS) of 36 grades of students in a mathematics course.

57 83 75 79 60 75 60 51 56 65 52 78
57 47 89 50 62 54 96 66 73 62 68 64
60 55 75 78 59 62 57 77 64 68 57 61

For this sample n = 36, $\sum x_i = 2352$, $\sum x_i^2 = 158{,}210$.

Sample mean: $\bar{x}$ = _______
SSTO = _______ = _______
$s^2$ = SSTO/(n-1) = _______ ; s = _______

(30)

SAMPLE PERCENTILES

The median splits the sample evenly into two halves. We can define measures that divide the sample into parts of different sizes. The rth Percentile of a sample is a value $P_r$ such that (approximately) r% of the sample falls below $P_r$.
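As a brief check of the computational formula for total variation given earlier, the Python sketch below works only from the summary values n, the sum of the x values and the sum of their squares, using the figures quoted for the 36 mathematics grades. The helper name `ssto_from_sums` is an assumption made for this illustration.

```python
import math


def ssto_from_sums(n, sum_x, sum_x2):
    """Computational formula: SSTO = (sum of x^2) - (sum of x)^2 / n."""
    return sum_x2 - sum_x ** 2 / n


# Summary values quoted in the notes for the 36 mathematics grades.
n, sum_x, sum_x2 = 36, 2352, 158210

ssto = ssto_from_sums(n, sum_x, sum_x2)
variance = ssto / (n - 1)       # sample variance: divide by n - 1, not n
std_dev = math.sqrt(variance)   # back in the original units (grade points)

print(ssto)                # 4546.0
print(round(variance, 2))  # 129.89
print(round(std_dev, 2))   # 11.4

# The same formula on the stopwatch data agrees with the definition of SSTO.
stopwatch = [22, 2, 12, 18, 16]
stopwatch_ssto = ssto_from_sums(
    len(stopwatch), sum(stopwatch), sum(x * x for x in stopwatch)
)
print(stopwatch_ssto)      # 232.0
```

Working from the two sums avoids a second pass over the data to subtract the mean, which is the practical appeal of the computational formula for hand calculation.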
Example: If a student's score on a university entrance exam is the 84th percentile, this means that approximately 84% of all the scores in the exam are below this student's score.

STEPS IN THE CALCULATION OF $P_r$ FOR A SAMPLE OF SIZE n
(1) Order the sample values from smallest to largest.
(2) Calculate the approximate position of $P_r$: AP($P_r$) = (r/100)n
(3) From (2) calculate the exact position of $P_r$ as follows:
    If AP($P_r$) is not a whole number: Pos($P_r$) = AP($P_r$) rounded up to the next whole number.
    If AP($P_r$) is a whole number: Pos($P_r$) = AP($P_r$) + .5
(4) RESULT: $P_r$ is the value at Pos($P_r$). If Pos($P_r$) is not a whole number, there is no observation in that position. In that case, $P_r$ is the average of the values in the two positions on either side of Pos($P_r$).

Example: Below are SAT (Scholastic Aptitude Test) mathematics scores for a simple random sample (SRS) of 32 university applicants. [The sample has been ordered for convenience.]

484 490 506 509 523 532 539 539
544 545 550 558 578 580 591 593
610 610 630 634 641 647 648 655
662 673 682 688 693 726 745 780

(i) 60th Percentile: AP($P_{60}$) = _______ ; Pos($P_{60}$) = _______ ; $P_{60}$ = _______
    Interpretation:
(ii) 25th Percentile: AP($P_{25}$) = _______ ; Pos($P_{25}$) = _______ ; $P_{25}$ = _______
    Interpretation:

(31)

SPECIAL PERCENTILES

First Quartile: $Q_1 = P_{25}$
Second Quartile: $Q_2 = P_{50} = m$
Third Quartile: $Q_3 = P_{75}$

Example: Below are the scores of a profile test given by a psychologist to a random sample of 30 convicted felons. The scores have been ordered from smallest to largest.

146 165 171 179 181 184 190 191 192 192
192 193 195 196 196 197 198 199 200 200
201 203 204 205 206 213 215 221 232 247

First Quartile:
Second Quartile:
Third Quartile:

(32)
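The percentile steps above can be sketched in Python and applied to both data sets. This is a minimal sketch under stated assumptions: the function name `sample_percentile` is hypothetical, r is assumed to be a whole number strictly between 0 and 100, and the position rule is the one from these notes, which can differ from the interpolation rules used by statistical software.

```python
def sample_percentile(values, r):
    """r-th sample percentile using the AP(Pr) / Pos(Pr) steps from these notes."""
    if not values:
        raise ValueError("the sample must not be empty")
    if not 0 < r < 100:
        raise ValueError("r is assumed to be a whole number between 1 and 99")
    ordered = sorted(values)                # step (1): order the sample
    n = len(ordered)
    # step (2): AP(Pr) = (r/100) n, kept as whole part and remainder so the
    # whole-number test in step (3) is not affected by floating-point rounding
    whole, remainder = divmod(r * n, 100)
    if remainder != 0:
        # AP is not whole: Pos = AP rounded up, i.e. 1-based position whole + 1
        return ordered[whole]
    # AP is whole: Pos = AP + .5, so average positions AP and AP + 1 (1-based)
    return (ordered[whole - 1] + ordered[whole]) / 2


sat = [484, 490, 506, 509, 523, 532, 539, 539, 544, 545, 550, 558, 578, 580,
       591, 593, 610, 610, 630, 634, 641, 647, 648, 655, 662, 673, 682, 688,
       693, 726, 745, 780]
profile = [146, 165, 171, 179, 181, 184, 190, 191, 192, 192, 192, 193, 195,
           196, 196, 197, 198, 199, 200, 200, 201, 203, 204, 205, 206, 213,
           215, 221, 232, 247]

print(sample_percentile(sat, 60))   # 634
print(sample_percentile(sat, 25))   # 541.5
# Quartiles are the 25th, 50th and 75th percentiles.
print([sample_percentile(profile, r) for r in (25, 50, 75)])  # [191, 196.5, 204]
```

The integer `divmod` is used instead of computing (r/100)n as a decimal because products such as 0.7 * 10 are not always stored as exact whole numbers, which would send the calculation down the wrong branch of step (3).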