Geology 399 - Quantitative Methods in Geosciences
Geology 659 - Quantitative Methods in Geosciences

t-test

THE TTEST

Evaluate the significance of differences in mean using the t-test. In your help topics list (see right), note the function TTEST. Click on TTEST to read about its use.

In cell G24 type in: Probability that average strike at locations A and B is different. Widen column G (drag the boundary on the right-hand side of its column header) so that the entire statement fits in this cell. Format cells G24 and H24 (follow in class). In cell H24 enter =TTEST(A2:A21,C2:C21,1,2)

In cell G25 type in: Probability that average dip at locations A and B is different. It should not be necessary to adjust the width of cell G25. Format cells G25 and H25 (as above). Then in cell H25 enter =TTEST(B2:B21,D2:D21,1,2)

In these formulas the third argument (1) requests a one-tailed test and the fourth argument (2) requests a two-sample test that assumes equal variances. The number TTEST returns is the probability of obtaining a difference in means at least this large if the two populations actually had the same mean, so a small value indicates a significant difference.

T-Tests the old-fashioned way

It would also be a good idea to learn how to do the t-test using t-statistic tables. When you do it by hand you have to think about what you are doing, and you come away with a better understanding of what's going on.

In the preceding example, we undertook the t-test using the expression =TTEST(A2:A21,C2:C21,1,2). A2:A21 and C2:C21 define the series of numbers we wish to compare. This is a two-sample t-test, as distinguished from the one-sample t-test, in which you attempt to assess whether a particular mean is different from what you think the overall population mean is. The two-sample t-test evaluates the significance of the difference in the means of two samples. Our test statistic has the form

$$t = \frac{\bar{D}_1 - \bar{D}_2}{s_e}, \qquad s_e = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where $\bar{D}_1$ and $\bar{D}_2$ are the two sample means, $n_1$ and $n_2$ are the sample sizes, $s_e$ is the standard error of the difference between the means, and $s_p$ is the pooled estimate of the standard deviation, found by combining the sample variances $s_1^2$ and $s_2^2$ of the two data sets as follows:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

You could return to your EXCEL spreadsheet and make this computation.
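The pooled-variance calculation above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration of the formulas, not how EXCEL implements TTEST. Like TTEST with type = 2, it assumes the two populations have equal variances. The function name `pooled_two_sample_t` is hypothetical, and the two short lists in the usage line are made-up numbers chosen so the arithmetic is easy to check by hand. They are not course data.

```python
import math
import statistics


def pooled_two_sample_t(x, y):
    """Two-sample t statistic built on the pooled estimate of the standard deviation.

    Returns (pooled variance, pooled SD, standard error, t, degrees of freedom).
    Assumes equal population variances (the same assumption as TTEST type 2).
    """
    n1, n2 = len(x), len(y)
    # statistics.variance divides by n - 1, i.e. it is the sample variance s^2
    s1_sq = statistics.variance(x)
    s2_sq = statistics.variance(y)
    # Pooled variance: each sample variance weighted by its degrees of freedom
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    sp = math.sqrt(sp_sq)
    # Standard error of the difference between the two means
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    # t expresses the difference in means in multiples of the standard error
    t = (statistics.mean(x) - statistics.mean(y)) / se
    return sp_sq, sp, se, t, n1 + n2 - 2


# Hypothetical toy data, not from the course
sp_sq, sp, se, t, df = pooled_two_sample_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(f"pooled variance = {sp_sq:.2f}, s_p = {sp:.3f}, s_e = {se:.3f}, t = {t:.2f}, df = {df}")
```

For the toy lists, both sample variances are 2.5, so the pooled variance is also 2.5. The standard error works out to 1, and t is -2 with 8 degrees of freedom. The sign only records which mean is larger.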
If you do, you can verify that the pooled variance is 115.98, that the pooled estimate of the standard deviation is about 10.77 (the square root of 115.98), that the standard error $s_e$ is 3.4, and that the t-statistic is 5.2. The t-statistic characterizes the difference between the two means in multiples of the standard error, so these two means differ by 5.2 standard errors. Remember that a z-statistic of 1.96 would be significant at the 95% confidence level (5% two-tailed, or 2.5% one-tailed). So we suspect that such a high t-statistic indicates a significant difference between the two means.

We can consult the t-tables for specific levels of significance. Note that the number of degrees of freedom in this case is $n_1 + n_2 - 2$, or 38. The table doesn't have a listing for 38 degrees of freedom, but 40 is close. For 40 degrees of freedom, t = 1.303 marks the 10% level for a one-tailed test. In other words, if the means were actually the same, there would be a 10% probability of obtaining a t greater than or equal to 1.303 (20% for a two-tailed test). Looking out at the rightmost column, α of 0.1% corresponds to t greater than or equal to 3.307. That is a one-tailed probability of 1 in 1000, or a two-tailed probability of 1 in 500. The critical values for 38 degrees of freedom are slightly larger than those for 40, but our value of 5.2 exceeds them by a wide margin. Notice, though, that we cannot assign an exact probability to our value of t. EXCEL does, but the tables are usually not detailed enough for you to do that. It is good enough to be able to say that there is less than 1 chance in 1000 of drawing two samples this different from populations with a common mean.

In-Class Example

Example: The following example uses samples drawn at random from two Gaussian populations with means of 10 and 15 and variances of 9 and 16, respectively. Using a two-sample t-test, evaluate whether the difference between the two sample means is significant.
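The tabulated critical values quoted above can be checked by computing tail probabilities of Student's t distribution directly. The sketch below is a minimal standard-library version. It integrates the t density numerically with Simpson's rule, which is a technique chosen here for simplicity and is not necessarily the method used by EXCEL or by the printed tables. The helper names `t_pdf` and `one_tailed_p` are hypothetical. The one-tailed probability it returns is the kind of value TTEST reports with tails = 1.

```python
import math


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))


def one_tailed_p(t, df, steps=2000):
    """P(T >= |t|), computed as 0.5 minus the area under the density on [0, |t|].

    Uses composite Simpson's rule; steps must be even.
    """
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        # Simpson weights alternate 4, 2, 4, ... for interior points
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= t)
    return 0.5 - area  # the distribution is symmetric about zero


for t_value in (1.303, 3.307):
    p = one_tailed_p(t_value, 40)
    print(f"df = 40, t = {t_value}: one-tailed p = {p:.4f}, two-tailed p = {2 * p:.4f}")

print(f"df = 38, t = 5.2: one-tailed p = {one_tailed_p(5.2, 38):.2e}")
```

The first two lines should come out close to 0.10 (0.20 two-tailed) and 0.001 (0.002 two-tailed), matching the table entries for 40 degrees of freedom. The third line gives an actual probability for t = 5.2, which the printed tables cannot provide.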
Sample A (Population 1: mean 10, variance 9)    Sample B (Population 2: mean 15, variance 16)
19.047814817567                                 14.808192413246
14.339398312365                                 8.4390653500278
10.882831179374                                 16.761766602591
11.118035806765                                 12.281997096758
12.407715713480                                 18.888119899041
5.4705207881495                                 14.671814910813
7.9806988037671                                 11.234751798483
13.100560177358                                 10.121603579427
12.276228253195                                 14.251247578907
10.235127116348                                 17.554814925236
11.786092151873                                 11.855818335879
5.3180390706098                                 15.322010580447
10.498646148543                                 20.006402776244
10.447622062696                                 19.724962966533
7.0047774076432                                 2.9002522996093
7.7911350722552                                 23.253655135496
9.3845561985864                                 21.535460113509
4.8024526731576                                 10.241238094704
7.0741130430246                                 17.352836865527
6.3445457034960                                 15.043251358677

Sample A: mean = 9.87, variance = 12.44
Sample B: mean = 14.81, variance = 24

[Figure: histograms of Sample A (Population 1, mean 10, variance 9) and Sample B (Population 2, mean 15, variance 16); x-axis: Value (0 to 25), y-axis: Frequency]

Using the formula

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

we obtain $s_p^2 = 18.22$ and $s_p = 4.269$. From

$$s_e = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

we obtain $s_e = 1.35$, so that

$$t = \frac{\bar{X}_2 - \bar{X}_1}{s_e} = \frac{4.95}{1.35} \approx 3.66$$

with $n_1 + n_2 - 2 = 38$ degrees of freedom. Refer to the table of critical values.

Conducting the t-test using PsiPlot

If you want to try this out, you can copy the columns of strike and dip data from EXCEL into PsiPlot. Your spreadsheet will look something like that shown below. To conduct the t-test, click on Data and select t-test as shown below. In the t-test window, select the two datasets you want to compare, then click OK. The following summary window appears. That's all there is to it. Note that the t-values obtained by PsiPlot and EXCEL are in agreement, and also note that PsiPlot explicitly states the probability of such an occurrence.

Homework Assignment

Do problem 7.13 on page 134.
Evaluate the significance of the differences in mean using both the EXCEL TTEST function and the "old-fashioned" approach discussed above (direct calculation of the pooled variance and standard error, and the use of t-statistic tables to evaluate the alpha level).

1. Report your work in standard form, including a statement of the problem and a summary of results.
2. Show the computation of the pooled variance and standard error.
3. Present histograms of the Mount Monger and Emu data sets.
4. On these histograms, note the mean, standard deviation and standard error of each individual sample.

Due Next Tuesday (April )
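The quantities the assignment asks for can be sketched in Python and checked against the in-class example. The homework data are not reproduced here, so this minimal sketch uses the two in-class samples instead. Sample A is the set whose mean (about 9.87) matches Population 1, and Sample B is the set whose mean (about 14.81) matches Population 2. The helper names are hypothetical. The histogram bins (width 2.5 from 0 to 25) are an assumption and need not match the bins of the original figures. The standard error of each individual sample is taken as $s/\sqrt{n}$.

```python
import math
import statistics

# In-class example data (Sample A: Population 1; Sample B: Population 2)
SAMPLE_A = [
    19.047814817567, 14.339398312365, 10.882831179374, 11.118035806765,
    12.407715713480, 5.4705207881495, 7.9806988037671, 13.100560177358,
    12.276228253195, 10.235127116348, 11.786092151873, 5.3180390706098,
    10.498646148543, 10.447622062696, 7.0047774076432, 7.7911350722552,
    9.3845561985864, 4.8024526731576, 7.0741130430246, 6.3445457034960,
]
SAMPLE_B = [
    14.808192413246, 8.4390653500278, 16.761766602591, 12.281997096758,
    18.888119899041, 14.671814910813, 11.234751798483, 10.121603579427,
    14.251247578907, 17.554814925236, 11.855818335879, 15.322010580447,
    20.006402776244, 19.724962966533, 2.9002522996093, 23.253655135496,
    21.535460113509, 10.241238094704, 17.352836865527, 15.043251358677,
]


def summarize(sample):
    """Return (mean, standard deviation, standard error s / sqrt(n)) of one sample."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # n - 1 divisor, consistent with the sample variance
    return mean, sd, sd / math.sqrt(len(sample))


def histogram_counts(sample, start, width, nbins):
    """Count values in bins [start + k*width, start + (k+1)*width); others are ignored."""
    counts = [0] * nbins
    for value in sample:
        k = math.floor((value - start) / width)
        if 0 <= k < nbins:
            counts[k] += 1
    return counts


def pooled_t(x, y):
    """Return (pooled variance, standard error of the difference, t = (mean(y) - mean(x)) / s_e)."""
    n1, n2 = len(x), len(y)
    sp_sq = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(sp_sq) * math.sqrt(1 / n1 + 1 / n2)
    return sp_sq, se, (statistics.mean(y) - statistics.mean(x)) / se


for name, sample in (("A", SAMPLE_A), ("B", SAMPLE_B)):
    mean, sd, se = summarize(sample)
    print(f"Sample {name}: mean = {mean:.2f}, SD = {sd:.2f}, SE = {se:.2f}")
    print("  counts per 2.5-unit bin from 0 to 25:", histogram_counts(sample, 0.0, 2.5, 10))

sp_sq, se_diff, t = pooled_t(SAMPLE_A, SAMPLE_B)
print(f"s_p^2 = {sp_sq:.2f}, s_e = {se_diff:.2f}, t = {t:.2f}, df = {len(SAMPLE_A) + len(SAMPLE_B) - 2}")
```

Replacing the two lists with the Mount Monger and Emu values would give the per-sample means, standard deviations and standard errors to annotate on the histograms, along with the pooled variance and standard error to report.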