Geology 659 - Quantitative Methods in Geosciences
t-test
THE TTEST
Evaluate the significance of differences in means using the t-test.
In your help topics list (see right), note the function TTEST.
Click on TTEST to read about its use.
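In cell G24 type in - Probability that the difference in average strike at locations A and B arose by chance.
Widen column G (drag the boundary on the right side of the column G heading to the right) so that the entire statement fits in this cell.
Format cells G24 and H24 (follow in class).
In cell H24, enter =TTEST(A2:A21,C2:C21,1,2). The third argument (1) requests a one-tailed test and the fourth (2) a two-sample test that assumes equal variances. The value returned is a probability: the chance of obtaining a difference in means at least this large if the two populations actually had the same mean.
In cell G25 type in - Probability that the difference in average dip at locations A and B arose by chance.
It should not be necessary to make any adjustments to the width of column G.
Format cells G25 and H25 (as above).
Then, in cell H25, enter =TTEST(B2:B21,D2:D21,1,2).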
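This is a minimal Python sketch of what the formula in cell H24 computes. It uses only the standard library. It assumes the type-2 test (two samples, equal variances, pooled standard deviation) and obtains the probability by integrating the Student t density numerically with Simpson's rule. The helper names `t_pdf`, `t_cdf` and `ttest_pooled` are hypothetical; they are not part of EXCEL or any library. The sketch takes the probability from |t|, so listing the two ranges in either order gives the same answer.

```python
import math

# Hypothetical helpers intended to mirror EXCEL's TTEST(array1, array2, tails, 2).


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t), using Simpson's rule on the density from 0 to |t| (steps is even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3  # area under the density between 0 and |t|
    return 0.5 + half_area if t >= 0 else 0.5 - half_area


def ttest_pooled(a: list[float], b: list[float], tails: int) -> tuple[float, float]:
    """Two-sample, equal-variance t-test; returns (t, probability)."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)  # sample variances (n - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df     # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))        # standard error of the difference
    t = (m1 - m2) / se
    p_one = 1.0 - t_cdf(abs(t), df)                # one-tailed probability from |t|
    return t, (p_one if tails == 1 else 2 * p_one)
```

Applied to the two 20-value strike columns, `ttest_pooled(strike_a, strike_c, 1)` should give essentially the probability shown in H24. Here `strike_a` and `strike_c` are placeholder names for those columns.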
T-Tests the old-fashioned way
It is also a good idea to learn how to do the t-test using tables of the t-statistic.
When you do it by hand you have to think about what you are doing, and you come away with a
better understanding of what is going on. In the preceding example, we carried out
the t-test using the expression =TTEST(A2:A21,C2:C21,1,2), where A2:A21 and C2:C21
define the two series of numbers we wish to compare. This is a two-sample t-test, as
distinguished from the one-sample t-test, in which you attempt to assess whether a
particular mean is different from what you think the overall population mean is.
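The two-sample t-test evaluates the significance of the difference in the means of two
samples. Our test statistic has the form
$$t = \frac{\bar{D}_1 - \bar{D}_2}{s_e}$$
where $\bar{D}_1$ and $\bar{D}_2$ are the two sample means,
$$s_e = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
is the standard error of the difference, and $s_p$ is the pooled estimate of the standard deviation. It is found by combining the sample variances $s_1^2$ and $s_2^2$ of the two data sets (of sizes $n_1$ and $n_2$) as follows:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$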
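Both examples in these notes use two samples of the same size ($n_1 = n_2 = 20$). In that case the formulas above simplify, which makes hand calculation quicker. The following short derivation assumes equal sample sizes $n$:

```latex
\text{If } n_1 = n_2 = n:\quad
s_p^2 = \frac{(n-1)s_1^2 + (n-1)s_2^2}{2n-2} = \frac{s_1^2 + s_2^2}{2},
\qquad
s_e = s_p\sqrt{\frac{2}{n}} = \sqrt{\frac{s_1^2 + s_2^2}{n}}
```

For instance, the in-class example below has variances of 12.44 and 24 with $n = 20$. This gives $s_e = \sqrt{36.44/20} \approx 1.35$.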
You could return to your EXCEL spreadsheet and make this computation. If you do, you
can verify three results. The pooled variance is 115.98, so the pooled estimate of the standard
deviation is about 10.8. The standard error $s_e$ is 3.4. The t-statistic, which expresses the
difference between the two means as a multiple of the standard error, is 5.2.
Thus these two means differ by 5.2 times the standard error.
Remember that a z-statistic of 1.96 would be significant at the 95% confidence level (5% two-tailed, or 2.5% one-tailed). So we suspect that such a high t-statistic indicates a
significant difference between the two means.
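The figures quoted in this paragraph can be checked with a little arithmetic. This sketch takes two values as given: the pooled variance of 115.98 and the sample sizes $n_1 = n_2 = 20$ (rows 2 through 21). It then applies $s_p = \sqrt{s_p^2}$ and $s_e = s_p\sqrt{1/n_1 + 1/n_2}$. The name `pooled_sd_and_se` is a hypothetical helper.

```python
import math


def pooled_sd_and_se(pooled_variance: float, n1: int, n2: int) -> tuple[float, float]:
    """Hypothetical helper: return (s_p, s_e) from the pooled variance and sample sizes."""
    sp = math.sqrt(pooled_variance)        # pooled standard deviation
    se = sp * math.sqrt(1 / n1 + 1 / n2)   # standard error of the difference in means
    return sp, se


sp, se = pooled_sd_and_se(115.98, 20, 20)
print(f"s_p = {sp:.2f}, s_e = {se:.2f}")
```

The second value, $s_e$, is the 3.4 quoted above. The t-statistic of 5.2 is a multiple of it.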
We can consult the t-tables for specific levels of significance. Note that the number of degrees of
freedom in this case is n1 + n2 - 2, or 38. The table doesn't have a listing for 38 degrees of
freedom, but 40 is close. For 40 degrees of freedom, the value listed in the 10% column is 1.303.
If the two means are actually the same, a t-value greater than or equal to 1.303 occurs by chance
with a one-tailed probability of 10% (a two-tailed probability of 20%).
Looking out at the rightmost column, α = 0.1% corresponds to t = 3.307. This value is equaled or
exceeded with a one-tailed probability of 1 in 1000 (a two-tailed probability of 1 in 500). Our
value of 5.2 is larger than that, but notice that we cannot assign an exact probability to our
value of t. EXCEL does, but the tables are usually not detailed enough for you to do that.
It is good enough to be able to say that there is less than 1 chance in 1000 of obtaining a t this
large if the two samples came from populations with a common mean.
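Critical values of t can also be computed directly, either when no table is at hand or to check one. This minimal sketch uses only the standard library. `upper_tail` integrates the Student t density with Simpson's rule, and `t_critical` inverts it by bisection; both are hypothetical helper names, not library functions. For 40 degrees of freedom it should reproduce the tabulated 1.303 and 3.307, the values exceeded with one-tailed probabilities of 0.10 and 0.001. It also evaluates 38 degrees of freedom directly, and computes the normal-curve tail beyond z = 1.96 for comparison.

```python
import math

# Hypothetical helpers (not from any library): Student t tail areas and critical values.


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """One-tailed probability P(T >= t) for t >= 0 (Simpson's rule, steps even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # half the area lies above 0; remove the part in [0, t]


def t_critical(df: int, alpha: float) -> float:
    """Value of t exceeded with one-tailed probability alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > alpha:
            lo = mid  # tail still too large: the critical value lies further out
        else:
            hi = mid
    return (lo + hi) / 2


for alpha in (0.10, 0.001):
    print(f"df = 40, one-tailed alpha = {alpha}: t = {t_critical(40, alpha):.3f}")
print(f"df = 38, one-tailed alpha = 0.001: t = {t_critical(38, 0.001):.3f}")
# Normal-curve comparison: one-tailed probability beyond z = 1.96
print(f"P(Z >= 1.96) = {0.5 * math.erfc(1.96 / math.sqrt(2)):.4f}")
```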
In-Class Example
Example: The following example uses samples drawn at random from two Gaussian populations with
means of 10 and 15 and variances of 9 and 16, respectively. Using a two-sample t-test, evaluate the
significance of the difference between the two sample means.
Sample A was drawn from Population 1 (Av = 10, variance = 9) and Sample B from Population 2 (Av = 15, variance = 16):

Sample A (Population 1)   Sample B (Population 2)
19.047814817567           14.808192413246
14.339398312365           8.4390653500278
10.882831179374           16.761766602591
11.118035806765           12.281997096758
12.407715713480           18.888119899041
5.4705207881495           14.671814910813
7.9806988037671           11.234751798483
13.100560177358           10.121603579427
12.276228253195           14.251247578907
10.235127116348           17.554814925236
11.786092151873           11.855818335879
5.3180390706098           15.322010580447
10.498646148543           20.006402776244
10.447622062696           19.724962966533
7.0047774076432           2.9002522996093
7.7911350722552           23.253655135496
9.3845561985864           21.535460113509
4.8024526731576           10.241238094704
7.0741130430246           17.352836865527
6.3445457034960           15.043251358677

[Figure: histograms of Sample A (Population 1) and Sample B (Population 2), Frequency vs. Value]
Sample A: mean = 9.87, variance = 12.44
Sample B: mean = 14.81, variance = 24
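Using the formula
$$S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
we obtain $S_p^2 = 18.23$ and $S_p = 4.269$. From
$$S_e = S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
we obtain $S_e = 1.35$, so that
$$t = \frac{\bar{X}_2 - \bar{X}_1}{S_e} = \frac{4.95}{1.35} \approx 3.66$$
Refer to the table of critical values, using $n_1 + n_2 - 2 = 38$ degrees of freedom.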
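The whole in-class calculation can be reproduced from the listed data. This sketch uses Python's `statistics.mean` and `statistics.variance` (which divides by n - 1) together with the pooled formulas. It compares t with 3.307, the one-tailed 0.1% critical value for 40 degrees of freedom quoted in the t-table discussion.

```python
import math
from statistics import mean, variance

# Sample A, drawn from Population 1 (mean 10, variance 9)
sample_a = [19.047814817567, 14.339398312365, 10.882831179374, 11.118035806765,
            12.407715713480, 5.4705207881495, 7.9806988037671, 13.100560177358,
            12.276228253195, 10.235127116348, 11.786092151873, 5.3180390706098,
            10.498646148543, 10.447622062696, 7.0047774076432, 7.7911350722552,
            9.3845561985864, 4.8024526731576, 7.0741130430246, 6.3445457034960]
# Sample B, drawn from Population 2 (mean 15, variance 16)
sample_b = [14.808192413246, 8.4390653500278, 16.761766602591, 12.281997096758,
            18.888119899041, 14.671814910813, 11.234751798483, 10.121603579427,
            14.251247578907, 17.554814925236, 11.855818335879, 15.322010580447,
            20.006402776244, 19.724962966533, 2.9002522996093, 23.253655135496,
            21.535460113509, 10.241238094704, 17.352836865527, 15.043251358677]

n1, n2 = len(sample_a), len(sample_b)
x1, x2 = mean(sample_a), mean(sample_b)
s1_sq, s2_sq = variance(sample_a), variance(sample_b)  # sample variances (n - 1)

sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)  # pooled variance
sp = math.sqrt(sp_sq)                  # pooled standard deviation
se = sp * math.sqrt(1 / n1 + 1 / n2)   # standard error of the difference
t = (x2 - x1) / se

print(f"means: {x1:.2f}, {x2:.2f}; variances: {s1_sq:.2f}, {s2_sq:.2f}")
print(f"Sp^2 = {sp_sq:.2f}, Sp = {sp:.3f}, Se = {se:.2f}, t = {t:.3f}")
print("t exceeds 3.307 (df = 40, one-tailed 0.1%):", t > 3.307)
```

The sketch works from unrounded values, so its t agrees with the hand calculation only to within rounding.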
Conducting the t-test using PsiPlot
If you want to try this out, you can copy the columns of strike and dip data from EXCEL
into PsiPlot. Your spreadsheet will look something like that shown below.
To conduct the t-test, click on Data and select t-test as shown below.
In the t-test window, select the two datasets you want to compare, then click OK.
The following summary window appears:
That's all there is to it. Note that the t-values obtained by PsiPlot and EXCEL are in
agreement, and also note that PsiPlot explicitly states the probability of such an
occurrence.
Homework Assignment
Do problem 7.13 on page 134. Evaluate the significance of the difference in means using
both the EXCEL TTEST function and the "old-fashioned" approach discussed above
(direct calculation of the pooled variance and standard error, and the use of t-statistic
tables to evaluate the alpha level).
1. Report your work in standard form including a statement of the problem and a
summary of results.
2. Show the computation of the pooled variance and standard error.
3. Present histograms of the Mount Monger and Emu data sets.
4. On these histograms note the mean, standard deviation and standard error of each
individual sample.
Due next Tuesday (April )