tom.h.wilson [email protected] Department of Geology and Geography West Virginia University Morgantown, WV

Back to statistics: remember the pebble mass distribution?

[Figure: Pebble masses collected from beach A - histogram of probability (0 to 0.40) versus mass, 150 to 550 grams]

The 100 pebble masses (grams):
224 242 256 256 265 269 277 283 283 283 284 287 290 294 301 301 302 303 307 307
311 314 317 318 318 322 324 324 326 327 329 330 331 331 331 334 335 338 338 338
340 340 341 342 342 343 346 346 350 352 353 355 355 355 357 358 359 359 364 366
367 368 369 370 370 371 373 374 374 375 379 380 383 384 384 384 386 389 389 393
394 394 395 397 400 401 403 403 403 407 408 409 420 422 423 432 433 435 450 454

[Figure: Mt. Aso, Japan (see Davis, pages 179-181) - histogram of number of eruptions versus years between successive eruptions]

The probability of occurrence of specific values in a sample often takes on that bell-shaped, Gaussian-like curve, as illustrated by the pebble mass data.

[Figure: Probability distribution of pebble masses - measured probabilities and the fitted Gaussian curve, probability versus pebble mass (grams)]

The Gaussian (normal) distribution of pebble masses looked a bit different from the probability distribution we derived directly from the sample, but it provided a close approximation of the sample probabilities.
[Figure: Equivalent Gaussian distribution of pebble masses - measured probabilities and the equivalent normal curve, probability versus pebble mass (grams)]

Range (g)   Measured probability   Range (multiples of s)   Gaussian (normal) probability
201-250           0.02                -3.10 to -2.06              0.019
251-300           0.12                -2.06 to -1.02              0.134
301-350           0.35                -1.02 to  0.02              0.354
351-400           0.36                 0.02 to  1.06              0.347
401-450           0.14                 1.06 to  2.10              0.127
451-500           0.01                 2.10 to  3.13              0.017

The pebble mass data represent just one of a nearly infinite number of possible samples that could be drawn from the parent population of pebble masses. We obtained one estimate of the population mean, and this estimate is almost certainly incorrect. What might additional pebble mass samples look like?

[Figure: Histograms of five additional samples (N versus mass in grams), with means <x> = 348.84, 350.6, 356.43, 354.5, and 348.42]

These samples were drawn at random from a parent population having a mean of 350.18 and a variance of 2273 (standard deviation of 47.68 g). Note that each of the sample means differs from the population mean:

Sample                  1          2          3          4          5
Mean                 348.84     350.6      356.43     354.5      348.42
Variance            2827.5     2192.59    2124.63    1977.63    2611.3
Standard deviation    53.17      46.82      46.09      44.47      51.1

Next we consider the distribution of 35 means calculated from 35 samples drawn at random from a parent population with an assumed mean of 350.18 and variance of 2273.
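The Gaussian column in the table above can be reproduced from the cumulative normal distribution. A minimal sketch in Python (standard library only); the bin edges in multiples of s are those listed in the table:

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Bin edges expressed as multiples of the standard deviation (from the table)
edges = [-3.10, -2.06, -1.02, 0.02, 1.06, 2.10, 3.13]

# Gaussian probability of each range = difference of CDF values at its edges
gaussian_probs = [normal_cdf(hi) - normal_cdf(lo)
                  for lo, hi in zip(edges[:-1], edges[1:])]
```

To rounding, these values match the table's Gaussian column (0.019, 0.134, 0.354, 0.347, 0.127, 0.017).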
[Figure: Distribution of means - histogram of the 35 sample means, N versus mass (grams), 330 to 365]

The mean of the above distribution of means is 350.45, and its variance is 21.51 (i.e., a standard deviation of 4.64). Statistics of the distribution of sample means tell us something different from the statistics of the individual samples: they give us information about the variability we can anticipate in the mean of a 100-specimen sample. Just as with the individual pebble mass values observed in the sample, probabilities can also be associated with the possibility of drawing a sample with a certain mean and standard deviation.

This is how it works. You go out to your beach and collect a bucket full of pebbles in one area, then go to another part of the beach and collect another bucket full. You now have two samples, each with its own mean and standard deviation. You ask the question: is the mean determined for the one sample different from that determined for the second sample? To answer this question you use probabilities determined from the distribution of means (inferred indirectly from those of an individual sample). The means of the samples may differ by only 20 grams. If you look at the range of individual masses, which is around 225 grams, you might conclude that these two samples are not really different. However, you are dealing with means derived from samples each consisting of 100 specimens. The distribution of means is different from the distribution of specimens: the range of possible means is much smaller.
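The narrowing from specimens to means is easy to simulate. A sketch under the text's assumptions (parent population mean 350.18 and standard deviation 47.68; the random seed is an arbitrary choice for reproducibility):

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the run is reproducible

POP_MEAN, POP_SD = 350.18, 47.68   # parent population from the text
N_SPECIMENS, N_SAMPLES = 100, 35   # 35 samples of 100 pebbles each

# Draw 35 random samples from the parent population; record each sample's mean
sample_means = [
    statistics.mean(random.gauss(POP_MEAN, POP_SD) for _ in range(N_SPECIMENS))
    for _ in range(N_SAMPLES)
]

# The spread of the means is roughly one tenth the spread of the specimens,
# in line with the standard deviation of 4.64 observed for the 35 class means
sd_of_means = statistics.stdev(sample_means)
```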
[Figure: Histogram of pebble masses alongside the distribution of means (number of occurrences versus mass, 200 to 500 grams) - the distribution of means is much narrower]

Thus, when trying to estimate the possibility that two means come from the same parent population, you need to examine probabilities based on the standard deviation of the means, not that of the specimens.

Area under the normal curve within a given number of standard deviations of the mean:

Std devs   Area      Std devs   Area      Std devs   Area
  0.0      0.000       1.1      0.729       2.1      0.964
  0.1      0.080       1.2      0.770       2.2      0.972
  0.2      0.159       1.3      0.806       2.3      0.979
  0.3      0.236       1.4      0.838       2.4      0.984
  0.4      0.311       1.5      0.866       2.5      0.988
  0.5      0.383       1.6      0.890       2.6      0.991
  0.6      0.451       1.7      0.911       2.7      0.993
  0.7      0.516       1.8      0.928       2.8      0.995
  0.8      0.576       1.9      0.943       2.9      0.996
  0.9      0.632       2.0      0.954       3.0      0.997
  1.0      0.683

In the class example just presented we derived the mean and standard deviation of 35 samples drawn at random from a parent population having a standard deviation of 47.7. Recall that the standard deviation of the means was only 4.64, just about 1/10th the standard deviation of the sample. This is the standard deviation of the sample mean from the true mean and is referred to as the standard error. The standard error, se, is estimated from the standard deviation of the sample as

    se = ŝ / √N

where ŝ is the unbiased estimate of the standard deviation.

What is a significant difference? To estimate the likelihood that a sample having a specific calculated mean and standard deviation comes from a parent population with a given mean and standard deviation, one has to define some limiting probabilities. There is some probability, for example, that you could draw a sample whose mean is 10 standard deviations from the parent mean. It's really small, but still possible. What chance of being wrong will you accept? The decision about how different the mean has to be in order to be considered statistically different is actually somewhat arbitrary.
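The formula explains the factor-of-ten narrowing seen in the class example. A quick check, using the population standard deviation and sample size from the text:

```python
import math

s_hat = 47.68   # standard deviation of the parent population (from the text)
N = 100         # specimens per sample

# Standard error of the mean: se = s-hat / sqrt(N)
se = s_hat / math.sqrt(N)
# se is about 4.77 g, close to the 4.64 g observed for the 35 sample means
```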
In most cases we are willing to accept a one-in-20 or a one-in-100 chance of being wrong. The corresponding chance of being right, 19 out of 20 or 99 out of 100, is referred to as our "confidence limit." The confidence limits used most often are 95% and 99%. The 95% confidence limit gives us a one-in-20 chance of being wrong; the 99% confidence limit gives us a one-in-100 chance of being wrong. The risk that we take (the chance of being wrong) is referred to as the alpha level. If our confidence limit is 95%, our alpha level is 5%, or 0.05. If our confidence limit is 99%, our alpha level is 1%, or 0.01. Whatever your bias may be, whatever your desired result, you can't go wrong in your presentation by clearly stating the confidence limit you used.

From the table of areas under the normal curve shown earlier, you can see that the 95% confidence limit extends out to 1.96 standard deviations from the mean. The standard deviation we are interested in using when comparing means is the standard error, se. Assuming that our standard error is 4.8 grams, 1.96 se corresponds to 9.41 grams. Notice that the 5% probability of being wrong is equally divided: 2.5% of the area lies more than 9.41 grams above the mean and 2.5% more than 9.41 grams below it. You probably remember the discussions of one- and two-tailed tests. The 95% probability is a two-tailed probability. So if your interest is only to make the general statement that a particular mean lies outside 1.96 standard deviations from the assumed population mean, your test is a two-tailed test.
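The 9.41-gram figure above is just 1.96 times the standard error:

```python
se = 4.8                 # standard error in grams (from the text)

# Half-width of the 95% (two-tailed) confidence interval about the mean
half_width = 1.96 * se
# half_width is about 9.41 g; 2.5% of the area lies beyond it on each side
```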
If you wish to be more specific in your conclusion and say that the estimate is significantly greater or less than the population mean, then your test is a one-tailed test. The probability of error in your one-tailed test is 2.5% rather than 5%, i.e., alpha is 0.025 rather than 0.05.

Using our example dataset, we have assumed that the parent population has a mean of 350.18 grams; thus all means greater than 359.6 grams or less than 340.8 grams (i.e., more than 9.41 grams from the mean) are considered to come from a different parent population at the 95% confidence level. Note that the samples we drew at random from the parent population have means which lie inside this range and are therefore not statistically different from the parent population:

Sample                   1               2               3               4               5
Mean                  348.84          350.6           356.43          354.5           348.42
Variance             2827.5          2192.59         2124.63         1977.63         2611.3
Standard deviation     53.17           46.82           46.09           44.47           51.1
95% C.L.          338.29-359.39   341.31-359.89   347.28-365.57   345.68-363.33   338.28-358.56

It is worth noting that we could very easily have obtained a different "first sample," which would have had a different mean and standard deviation. Remember that we designed our statistical test assuming that the sample mean and standard deviation correspond to those of the population. A different first sample would give us different confidence limits and slightly different answers. Even so, the method provides a fairly objective, quantitative basis for assessing statistical differences between samples.

The method of testing we have just summarized is known as the z-test, because we use the z statistic to estimate probabilities, where

    z = (m2 - m1) / se

(not to be confused with the standardized variable z). Remember the t-test?
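The z-test just described can be sketched in a few lines, using the assumed population mean and the standard error from the text:

```python
POP_MEAN = 350.18   # assumed parent-population mean (grams), from the text
SE = 4.8            # standard error of the mean (grams), from the text

def z_statistic(sample_mean):
    """z = (sample mean - population mean) / standard error."""
    return (sample_mean - POP_MEAN) / SE

# Sample 3 had the most extreme mean of the five random samples, 356.43 g
z3 = z_statistic(356.43)
# |z3| is about 1.30, inside 1.96, so even sample 3 is not statistically
# different from the parent population at the 95% confidence level
```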
Tests for significance can be improved if we account for the fact that estimates of the mean derived from small samples are inherently sloppy. The t-test acknowledges this sloppiness and compensates for it by making the criterion for significant difference more stringent when the sample size is smaller. The z-test and t-test yield similar results for relatively large samples, larger than 100 or so. The 95% confidence limit, for example, diverges considerably from 1.96 s for smaller sample size, rising to about 2.6 s. The effect of sample size (N) is expressed in terms of degrees of freedom, which is N - 1.

[Figure: 95% critical value versus degrees of freedom, rising from 1.96 s at large N toward ~2.6 s at small N]

We make the t-test using tables of the critical values of t for various levels of confidence. How do we compute t, the test statistic? For two samples with means X1 and X2,

    t = (X1 - X2) / se

where se is the standard error. But it is computed differently from the single-sample standard error, which is just

    se = ŝ / √N

(ŝ being the unbiased estimate of the standard deviation). In the two-sample case

    se = sp √(1/n1 + 1/n2)

where sp is the pooled estimate of the standard deviation, derived as follows:

    sp² = [(n1 - 1) s1² + (n2 - 1) s2²] / (n1 + n2 - 2)

Going through this computation for the samples of strike at locations A and B yields t ~ 5.2.

Strike, Location A: 11 26 29 30 34 36 37 42 44 45 47 48 48 50 51 54 54 55 61 61
  (mean 43.15, standard deviation 12.6959, variance 161.19)
Strike, Location B: 60 54 69 59 58 59 66 62 48 62 53 72 62 69 70 41 59 76 54 64
  (mean 60.85, standard deviation 8.41224, variance 70.766)

Is bedding strike measured at these two locations statistically different?
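Plugging the summary statistics for the two strike samples into these formulas reproduces the quoted t ~ 5.2. A sketch (means, variances, and sample sizes from the text):

```python
import math

# Summary statistics for strike at locations A and B (from the text)
mean_a, var_a, n_a = 43.15, 161.19, 20
mean_b, var_b, n_b = 60.85, 70.766, 20

# Pooled estimate of the variance and standard deviation
sp2 = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
sp = math.sqrt(sp2)

# Standard error of the difference in means
se = sp * math.sqrt(1 / n_a + 1 / n_b)

# Test statistic, with n_a + n_b - 2 = 38 degrees of freedom
t = abs(mean_a - mean_b) / se
```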
Evaluating the statistical significance of differences in strike at locations A and B using the t-test. The strike and dip measurements at the two locations:

Strike A   Dip A   Strike B   Dip B
   11       22        60        25
   26       23        54        22
   29       21        69        22
   30       24        59        19
   34       21        58        27
   36       23        59        16
   37       23        66        18
   42       21        62        22
   44       22        48        23
   45       22        62        16
   47       22        53        16
   48       18        72        19
   48       26        62        21
   50       24        69        21
   51       26        70        26
   54       30        41        20
   54       21        59        16
   55       25        76        22
   61       21        54        22
   61       21        64        12

Mean:      43.15     22.8      60.85     20.25
Std dev:   12.69594   2.566997  8.41224   3.795773

[Figure: Probability distribution of strikes at location A - measured probabilities and the fitted normal curve, probability versus strike (degrees)]

We can ask the probability that the average strike at locations A and B is different, and likewise for the average dip. Explicit computation of the t statistic (see handout) gives a pooled variance of 115.9763, a standard error of 3.40553, and t = 5.19743, with an associated probability of about 3.58E-06.

The value of our test statistic is t ~ 5.2, with degrees of freedom = n1 + n2 - 2 = 38; the closest value in the table is 40. Note the similarity of the critical-value table to Davis's Table A.2. At alpha = 0.1% = 0.001 as a one-tailed probability, there is 1 chance in 1000 that these two means are actually equal, i.e., were drawn from the same parent population.

[Figure: Critical value of t (alpha = 0.001) versus degrees of freedom, 0 to 120]

The variation of the critical value from 30 to 40 degrees of freedom is small, and we could assume it to be linear.
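The same t value falls out of the raw strike measurements, using only the Python standard library (the twenty values per location are those listed in the table):

```python
import math
import statistics

strike_a = [11, 26, 29, 30, 34, 36, 37, 42, 44, 45,
            47, 48, 48, 50, 51, 54, 54, 55, 61, 61]
strike_b = [60, 54, 69, 59, 58, 59, 66, 62, 48, 62,
            53, 72, 62, 69, 70, 41, 59, 76, 54, 64]

mean_a, mean_b = statistics.mean(strike_a), statistics.mean(strike_b)
# statistics.variance gives the unbiased (n - 1) sample variance
var_a, var_b = statistics.variance(strike_a), statistics.variance(strike_b)
n_a, n_b = len(strike_a), len(strike_b)

# Pooled variance, standard error of the difference, and the t statistic
sp2 = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
se = math.sqrt(sp2) * math.sqrt(1 / n_a + 1 / n_b)
t = abs(mean_a - mean_b) / se
# t is about 5.2, matching the handout's explicit computation
```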
[Figure: Critical value of t (alpha = 0.001) versus degrees of freedom, zoomed in to values between about 2.5 and 5.0]

Since t ~ 5.2, we know that the actual probability of this happening is much less than 0.001.