252onesx0 10/20/06
(Open this document in 'Page Layout' view!)
Examples of Hypothesis tests for the Mean with Unknown Population Variance
First Example: A left-sided Problem
Statement of problem: The problem statement is either “Test at the 5% significance level to see if the mean income is at least 20000,” or “Test at the 5% significance level to see if the mean income is less than 20000.” These statements are opposites. Since the first statement contains an implicit equality, it must be a null hypothesis. The second statement does not contain an equality, so it must be an alternate hypothesis.
Statement of problem as a hypothesis pair: $H_0: \mu \ge 20000$ and $H_1: \mu < 20000$, with $\alpha = .05$.
The test statistic will be the sample mean $\bar{x}$ taken from a sample of $n = 64$, and since the population variance is unknown we must use the t distribution with 64 – 1 = 63 degrees of freedom. Assume that we found $s = 1600$, which means that $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{1600}{\sqrt{64}} = 200$.
First Example Using Traditional hypothesis testing.
Assume that when we take our survey we find $\bar{x} = 19800$.
Test Ratio Method:
$t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$, $\mu_0 = 20000$. If the sample mean is below 20000, the t statistic will be negative. Make a diagram showing a ‘Normal’ curve with a mean at zero and a value of t cutting off a 5% tail on the left side of zero. According to the t table $t_{\alpha}^{n-1} = t_{.05}^{63} = 1.669$, so the ‘reject’ region is below -1.669. $t = \frac{19800 - 20000}{200} = -1$ is not in the ‘reject’ region, so do not reject the null hypothesis.
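The test ratio arithmetic above is easy to check in code. A minimal Python sketch, assuming the table value $t_{.05}^{63} = 1.669$ quoted above (it is hard-coded, not computed); the name `left_sided_test_ratio` is only illustrative:

```python
import math

T_05_63 = 1.669  # t table value for 63 degrees of freedom, as quoted in the text

def left_sided_test_ratio(xbar, mu0, s, n, t_crit=T_05_63):
    """Test ratio for H0: mu >= mu0 against H1: mu < mu0.

    Returns the t ratio and whether it falls in the 'reject' region below -t_crit.
    """
    se = s / math.sqrt(n)      # standard error of the mean, s / sqrt(n)
    t = (xbar - mu0) / se      # test ratio
    return t, t < -t_crit

for xbar in (19800, 19600):
    t, reject = left_sided_test_ratio(xbar, 20000, 1600, 64)
    print(f"xbar = {xbar}: t = {t:.2f}, reject H0: {reject}")
```

For $\bar{x} = 19800$ it returns $t = -1$ (do not reject), and for $\bar{x} = 19600$, the case worked next, it returns $t = -2$ (reject).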
Critical Value Method for $\bar{x}$:
The formula table says $\bar{x}_{cv} = \mu_0 \pm t_{\alpha/2} s_{\bar{x}}$, but this is a formula for a 2-sided interval. We want a critical value below 20000, since it should be obvious that if our sample mean is above 20000, we would have no reason to reject the null hypothesis. We use $\bar{x}_{cv} = \mu_0 - t_{\alpha} s_{\bar{x}} = 20000 - 1.669(200) = 19666.2$. Make a diagram with 20000 in the middle showing a 95% ‘accept’ region above 19666.2 and a 5% ‘reject’ region below 19666.2. Since $\bar{x} = 19800$ does not fall in the ‘reject’ region, do not reject the null hypothesis.
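The critical value is one line of arithmetic. A sketch, again hard-coding the table value 1.669 from the text; `lower_critical_value` is a hypothetical helper name:

```python
def lower_critical_value(mu0, t_crit, se):
    """Critical value below mu0 for a left-sided test: mu0 - t_crit * se."""
    return mu0 - t_crit * se

x_cv = lower_critical_value(20000, 1.669, 200)   # 1.669 = t table value, 63 df
print(f"critical value = {x_cv:.1f}")
for xbar in (19800, 19600):
    # reject only when the sample mean falls in the lower 'reject' region
    print(xbar, "reject" if xbar < x_cv else "do not reject")
```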
Confidence Interval Method:
The formula for a confidence interval for the mean is $\mu = \bar{x} \pm t_{\alpha/2} s_{\bar{x}}$, but a one-sided hypothesis requires a one-sided confidence interval, and to have any value, the confidence interval must be in the same direction as the alternate hypothesis. The interval becomes $\mu \le \bar{x} + t_{\alpha}^{n-1} s_{\bar{x}} = 19800 + 1.669(200)$, or $\mu \le 20133.8$. Make a diagram – you should use 19800 as the middle. To represent the null hypothesis shade the area above 20000. To represent the confidence interval shade the area below 20133.8. Since these areas overlap, the confidence interval and the null hypothesis do not contradict one another, so do not reject the null hypothesis.
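The one-sided interval can be checked the same way. This sketch (hypothetical helper name, table value 1.669 assumed as above) computes the upper bound for both sample means used in this example and applies the overlap test described in the text:

```python
def upper_confidence_bound(xbar, t_crit, se):
    """One-sided bound mu <= xbar + t_crit * se, in the direction of H1: mu < mu0."""
    return xbar + t_crit * se

MU0 = 20000
for xbar in (19800, 19600):
    bound = upper_confidence_bound(xbar, 1.669, 200)
    # H0 (mu >= MU0) and the interval (mu <= bound) overlap only if bound >= MU0
    verdict = "do not reject" if bound >= MU0 else "reject"
    print(f"xbar = {xbar}: mu <= {bound:.1f}, {verdict}")
```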
Assume that when we take our survey we find $\bar{x} = 19600$.
Test Ratio Method:
$t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$, $\mu_0 = 20000$. Use the same ‘Normal’ curve as before with a mean at zero and a value of $-t_{\alpha}^{n-1} = -t_{.05}^{63} = -1.669$ cutting off a 5% tail on the left side of zero. $t = \frac{19600 - 20000}{200} = -2$ is in the ‘reject’ region, so reject the null hypothesis.
Critical Value Method for $\bar{x}$:
The formula table says $\bar{x}_{cv} = \mu_0 \pm t_{\alpha/2} s_{\bar{x}}$, but this is a formula for a 2-sided interval. Because the alternate hypothesis says that the mean is below 20000, we want a critical value below 20000. It should be obvious that if our sample mean is above 20000, we would have no reason to reject the null hypothesis. We use $\bar{x}_{cv} = \mu_0 - t_{\alpha} s_{\bar{x}} = 20000 - 1.669(200) = 19666.2$. Make the same diagram with 20000 in the middle showing a 95% ‘accept’ region above 19666.2 and a 5% ‘reject’ region below 19666.2. Since $\bar{x} = 19600$ falls in the ‘reject’ region, reject the null hypothesis.
Confidence Interval Method:
The confidence interval becomes $\mu \le \bar{x} + t_{\alpha}^{n-1} s_{\bar{x}} = 19600 + 1.669(200)$, or $\mu \le 19933.8$. Make a diagram – you may use 19600 as the middle. To represent the null hypothesis shade the area above 20000. To represent the confidence interval shade the area below 19933.8. Since these areas do not overlap, the confidence interval and the null hypothesis contradict one another, so reject the null hypothesis.
First Example using p-values.
A p-value is a measure of the credibility of the null hypothesis and is defined as the probability that a test statistic or ratio as extreme as (as low as or as high as) or more extreme than (lower than or higher than) the observed statistic or ratio could occur, assuming that the null hypothesis is true. In this case, values are extreme, relative to the null hypothesis $H_0: \mu \ge 20000$, if the sample mean is way below 20000. The easiest way to measure the probability is with the test ratio $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$. We find the value of $t$, call it $t_1$, and then find $P(t \le t_1)$.
Assume that when we take our survey we find $\bar{x} = 19800$. $t = \frac{19800 - 20000}{200} = -1$, and we want $P(t \le -1)$.
The easy way is to check the results on the computer.
[Figure: t curve with 63 degrees of freedom and standard deviation 1.01626; the area to the left of -1 is 0.1606.]
The hard way, unfortunately, is the only way if a computer is not available. Look at your t table on the line with 63 degrees of freedom. The two nearest values to one are $t_{.20}^{63} = 0.847$ and $t_{.15}^{63} = 1.045$. Since $P(t \ge t_{\alpha}) = \alpha$ and the t distribution is symmetrical, we conclude $.15 < p\text{-value} < .20$.
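When no t table is at hand, the area can also be approximated numerically. The Python sketch below is a stand-in for the computer output, not the method Minitab or the tareaA macro uses: it integrates the Student t density from 0 to $|t|$ with Simpson's rule and uses the symmetry of the curve. The names `t_pdf` and `t_cdf` and the choice of 2000 subintervals are assumptions made for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Area to the left of x: 0.5 plus or minus the area between 0 and |x|.

    The area between 0 and |x| is found with Simpson's rule (steps must be even).
    """
    b = abs(x)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    p = 0.5 + area if x >= 0 else 0.5 - area
    return min(1.0, max(0.0, p))   # clamp tiny rounding errors into [0, 1]

for t in (-1, -2, -10, 1):
    print(f"area to the left of {t} with 63 degrees of freedom: {t_cdf(t, 63):.4f}")
```

For 63 degrees of freedom it should agree closely with the computer areas quoted in this section (.1606, .0249, .0000 and .8394).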
Assume that when we take our survey we find $\bar{x} = 19600$. $t = \frac{19600 - 20000}{200} = -2$, and we want $P(t \le -2)$.
The easy way is to check the results on the computer.
[Figure: t curve with 63 degrees of freedom and standard deviation 1.01626; the area to the left of -2 is 0.0249.]
If we have to use the t table, look at the line with 63 degrees of freedom. The two nearest values to two are $t_{.025}^{63} = 1.998$ and $t_{.01}^{63} = 2.387$. Since $P(t \ge t_{\alpha}) = \alpha$ and the t distribution is symmetrical, we conclude $.01 < p\text{-value} < .025$.
Assume that when we take our survey we find $\bar{x} = 18000$. $t = \frac{18000 - 20000}{200} = -10$, and we want $P(t \le -10)$.
The easy way is to check the results on the computer.
[Figure: t curve with 63 degrees of freedom and standard deviation 1.01626; the area to the left of -10 is 0.0000.]
If we have to use the t table, look at the line with 63 degrees of freedom. The nearest value to ten is $t_{.001}^{63} = 3.225$. Since $P(t \ge t_{\alpha}) = \alpha$, the t distribution is symmetrical, and the values of t get larger as the significance level gets smaller, we conclude $p\text{-value} < .001$.
Assume that when we take our survey we find $\bar{x} = 20200$. $t = \frac{20200 - 20000}{200} = 1$, and we want $P(t \le 1)$.
The easy way is to check the results on the computer.
[Figure: t curve with 63 degrees of freedom and standard deviation 1.01626; the area to the left of 1 is 0.8394.]
If we have to use the t table, look at the line with 63 degrees of freedom. The two nearest values to one are $t_{.20}^{63} = 0.847$ and $t_{.15}^{63} = 1.045$. Since $P(t \ge t_{\alpha}) = \alpha$, we conclude that the area above 1 is between .15 and .20. But we want the area below 1, so we subtract these probabilities from 1 and get $.80 < p\text{-value} < .85$.
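All of the ‘hard way’ lookups in this section follow one recipe: find the two entries on the 63-degree-of-freedom line that bracket $|t|$, read off their tail areas, and use symmetry (and, for a positive $t$, subtraction from 1). A Python sketch of that recipe; `TABLE_63` holds only the entries quoted in this document, so a full printed table could give equal or narrower brackets:

```python
# Upper-tail areas alpha and table values t_alpha on the 63 degrees-of-freedom
# line, limited to the entries quoted in this document.
TABLE_63 = [(0.20, 0.847), (0.15, 1.045), (0.05, 1.669),
            (0.025, 1.998), (0.01, 2.387), (0.001, 3.225)]

def upper_tail_bracket(a):
    """Bracket P(t >= a) for a >= 0 between neighbouring table entries."""
    upper = 0.5                          # P(t >= 0) = .5 by symmetry
    for alpha, t_alpha in TABLE_63:
        if a < t_alpha:
            return alpha, upper          # alpha < P(t >= a) < upper
        upper = alpha
    return 0.0, upper                    # beyond the last entry: P < .001

def left_tail_bracket(t1):
    """Bracket P(t <= t1), using P(t <= -a) = P(t >= a)."""
    low, high = upper_tail_bracket(abs(t1))
    if t1 < 0:
        return low, high
    return 1 - high, 1 - low             # complement for a positive t1

for t1 in (-1, -2, -10, 1):
    low, high = left_tail_bracket(t1)
    print(f"t = {t1}: {low:.3f} < p-value < {high:.3f}")
```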
Interpretation:
If we want to return to traditional hypothesis testing, remember the following:
The rule on p-value:
If the p-value is less than the significance level (alpha), reject the null hypothesis; if the p-value is greater than or equal to the significance level, do not reject the null hypothesis.
So, if $\alpha = .05$, we reject the null hypothesis if the sample mean is 19600 or 18000, but not if it is 19800 or 20200.
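The rule on p-value can be applied mechanically. A sketch that uses the left-tail p-values found on the computer in this example (`decide` is a hypothetical name):

```python
def decide(p_value, alpha=0.05):
    """The rule on p-value: reject H0 only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

# Left-tail p-values found on the computer for the First Example
p_values = {18000: 0.0000, 19600: 0.0249, 19800: 0.1606, 20200: 0.8394}
for xbar, p in p_values.items():
    print(f"xbar = {xbar}: p-value = {p:.4f} -> {decide(p)}")
```

Only the sample means 18000 and 19600 give ‘reject H0’ at $\alpha = .05$.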
Second example: A two-sided hypothesis.
Assume that we are using the original two-sided problem. The problem statement is now “Test at the 5% significance
level to see if the mean income is 20000.”
Statement of problem as a hypothesis pair: $H_0: \mu = 20000$ and $H_1: \mu \ne 20000$, with $\alpha = .05$.
The test statistic will be the sample mean $\bar{x}$ taken from a sample of $n = 64$, and since the population variance is unknown we must use the t distribution with 64 – 1 = 63 degrees of freedom. Assume that we found $s = 1600$, which means that $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{1600}{\sqrt{64}} = 200$.
The Second Example Using Traditional hypothesis testing.
Assume that when we take our survey we find $\bar{x} = 19600$.
Test Ratio Method:
$t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$, $\mu_0 = 20000$. Use a ‘Normal’ curve with a mean at zero, a value of $-t_{\alpha/2}^{n-1} = -t_{.025}^{63} = -1.998$ cutting off a 2.5% tail on the left side of zero and a value of $t_{\alpha/2}^{n-1} = t_{.025}^{63} = 1.998$ cutting off a 2.5% tail on the right side of zero. $t = \frac{19600 - 20000}{200} = -2$ is in the lower ‘reject’ region, so reject the null hypothesis.
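Because the two-sided ‘reject’ region has two parts, the comparison is made with $|t|$. A sketch hard-coding the table value $t_{.025}^{63} = 1.998$ quoted above (function name illustrative); note how narrowly $t = -2$ clears it:

```python
import math

T_025_63 = 1.998  # t table value for 63 degrees of freedom, as quoted in the text

def two_sided_test_ratio(xbar, mu0, s, n, t_half=T_025_63):
    """Test ratio for H0: mu = mu0; reject when t falls in either tail, |t| > t_half."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, abs(t) > t_half

print(two_sided_test_ratio(19600, 20000, 1600, 64))   # (-2.0, True)
```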
Critical Value Method for $\bar{x}$:
The formula table says $\bar{x}_{cv} = \mu_0 \pm t_{\alpha/2} s_{\bar{x}}$. We use $\bar{x}_{cv} = \mu_0 \pm t_{.025}^{63} s_{\bar{x}} = 20000 \pm 1.998(200) = 20000 \pm 399.6$. Make a diagram with 20000 in the middle showing a 95% ‘accept’ region above 19600.4 and below 20399.6. Show one 2.5% ‘reject’ region below 19600.4 and another above 20399.6. Since $\bar{x} = 19600$ falls in the lower ‘reject’ region, reject the null hypothesis.
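Both critical values come from one margin, $t_{.025}^{63} s_{\bar{x}} = 1.998(200) = 399.6$, subtracted from and added to $\mu_0$. A sketch with a hypothetical helper name:

```python
def two_sided_critical_values(mu0, t_half, se):
    """Lower and upper critical values mu0 -/+ t_half * se."""
    margin = t_half * se
    return mu0 - margin, mu0 + margin

low, high = two_sided_critical_values(20000, 1.998, 200)
print(f"accept region: {low:.1f} to {high:.1f}")
xbar = 19600
print("reject" if xbar < low or xbar > high else "do not reject")
```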
Confidence Interval Method:
The confidence interval becomes $\mu = \bar{x} \pm t_{\alpha/2}^{n-1} s_{\bar{x}} = 19600 \pm 1.998(200) = 19600 \pm 399.6$, or $19200.4 \le \mu \le 19999.6$. Make a diagram – you may use either 20000 or 19600 as the middle, but you are much better off using 19600. To represent the confidence interval shade the area between 19200.4 and 19999.6. Since 20000 is not in this interval, the confidence interval and the null hypothesis contradict one another, so reject the null hypothesis.
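The confidence interval uses the same margin centered on $\bar{x}$ instead of $\mu_0$. A sketch that checks whether 20000 lies inside the interval (helper name hypothetical):

```python
def two_sided_interval(xbar, t_half, se):
    """Confidence interval xbar -/+ t_half * se."""
    margin = t_half * se
    return xbar - margin, xbar + margin

lo, hi = two_sided_interval(19600, 1.998, 200)
mu0 = 20000
print(f"{lo:.1f} <= mu <= {hi:.1f}")
# The null hypothesis is contradicted when mu0 lies outside the interval
print("do not reject" if lo <= mu0 <= hi else "reject")
```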
The Second Example Using p-values.
In the case of a 2-sided test ‘extreme’ means either way above or way below 20000. Assume that when we take our survey we find $\bar{x} = 19600$. 20400 is just as far from 20000 as 19600, so we want the probability that the sample mean is above 20400 or below 19600. In practice, we just double a p-value in the case of a 2-sided hypothesis. The p-value is $2P(\bar{x} \le 19600)$. But we have seen in the previous problem that a sample mean of 19600 is the same as $t = \frac{19600 - 20000}{200} = -2$. So the p-value is $2P(t \le -2)$. We have already found, using the computer, that $P(t \le -2) = .0249$. Using the t table we found that $.01 < P(t \le -2) < .025$. If we use the computer results, $p\text{-value} = .0498$, and, if we use the table results, $.02 < p\text{-value} < .05$.
If we want to return to traditional hypothesis testing, remember the following:
The rule on p-value:
If the p-value is less than the significance level (alpha), reject the null hypothesis; if the p-value is greater than or equal to the significance level, do not reject the null hypothesis.
So, if $\alpha = .05$, we (barely) reject the null hypothesis.
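Doubling works because the t curve is symmetrical: the two-sided p-value is twice the smaller tail area. A sketch using the computer values quoted in this document (function name illustrative); taking the smaller tail also covers a sample mean above 20000, such as 20200, whose left-tail area .8394 is the larger one:

```python
def two_sided_p(p_left):
    """Two-sided p-value from the left-tail area P(t <= t1) of a symmetric statistic."""
    return 2 * min(p_left, 1 - p_left)

# Left-tail areas from the computer output for 63 degrees of freedom
for xbar, p_left in [(19600, 0.0249), (20200, 0.8394)]:
    print(f"xbar = {xbar}: two-sided p-value = {two_sided_p(p_left):.4f}")

# The table bracket .01 < P(t <= -2) < .025 doubles to .02 < p-value < .05
print(2 * 0.01, 2 * 0.025)
```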
Third Example: A right-sided Problem (The reverse of the First Example)
Statement of problem: The problem statement is either “Test at the 5% significance level to see if the mean income is at most 20000,” or “Test at the 5% significance level to see if the mean income is above 20000.” These statements are opposites. Since the first statement contains an implicit equality, it must be a null hypothesis. The second statement does not contain an equality, so it must be an alternate hypothesis.
Statement of problem as a hypothesis pair: $H_0: \mu \le 20000$ and $H_1: \mu > 20000$, with $\alpha = .05$.
The test statistic will be the sample mean $\bar{x}$ taken from a sample of $n = 64$, and since the population variance is unknown we must use the t distribution with 64 – 1 = 63 degrees of freedom. Assume that we found $s = 1600$, which means that $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{1600}{\sqrt{64}} = 200$. Common sense states that if the sample mean is below 20000, we will not reject the null hypothesis. Since I will use the same sample means that I used before (19800 and 19600, both below 20000), we will not reject the null hypothesis.
Third Example Using Traditional hypothesis testing.
Assume that when we take our survey we find $\bar{x} = 19800$.
Test Ratio Method:
$t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$, $\mu_0 = 20000$. If the sample mean is above 20000, the t statistic will be positive. Make a diagram showing a ‘Normal’ curve with a mean at zero and a value of t cutting off a 5% tail on the right side of zero. According to the t table $t_{\alpha}^{n-1} = t_{.05}^{63} = 1.669$, so the ‘reject’ region is above 1.669. $t = \frac{19800 - 20000}{200} = -1$ is not in the ‘reject’ region, so do not reject the null hypothesis.
Critical Value Method for $\bar{x}$:
The formula table says $\bar{x}_{cv} = \mu_0 \pm t_{\alpha/2} s_{\bar{x}}$, but this is a formula for a 2-sided interval. Because the alternate hypothesis says that the mean is above 20000, we want a critical value above 20000. It should be obvious that if our sample mean is below 20000, we would have no reason to reject the null hypothesis. We use $\bar{x}_{cv} = \mu_0 + t_{\alpha} s_{\bar{x}} = 20000 + 1.669(200) = 20333.8$. Make a diagram with 20000 in the middle showing a 95% ‘accept’ region below 20333.8 and a 5% ‘reject’ region above 20333.8. Since $\bar{x} = 19800$ does not fall in the ‘reject’ region, do not reject the null hypothesis.
Confidence Interval Method:
The formula for a confidence interval for the mean is $\mu = \bar{x} \pm t_{\alpha/2} s_{\bar{x}}$, but a one-sided hypothesis requires a one-sided confidence interval, and to have any value, the confidence interval must be in the same direction as the alternate hypothesis. The interval becomes $\mu \ge \bar{x} - t_{\alpha}^{n-1} s_{\bar{x}} = 19800 - 1.669(200)$, or $\mu \ge 19466.2$. Make a diagram – you may use 19800 as the middle. To represent the null hypothesis shade the area below 20000. To represent the confidence interval shade the area above 19466.2. Since these areas overlap, the confidence interval and the null hypothesis do not contradict one another, so do not reject the null hypothesis.
Assume that when we take our survey we find $\bar{x} = 19600$.
Test Ratio Method:
$t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$, $\mu_0 = 20000$. Use the same ‘Normal’ curve as before with a mean at zero and a value of $t_{\alpha}^{n-1} = t_{.05}^{63} = 1.669$ cutting off a 5% tail on the right side of zero. $t = \frac{19600 - 20000}{200} = -2$ is not in the ‘reject’ region, so do not reject the null hypothesis.
Critical Value Method for $\bar{x}$:
The formula table says $\bar{x}_{cv} = \mu_0 \pm t_{\alpha/2} s_{\bar{x}}$, but this is a formula for a 2-sided interval. We want a critical value above 20000, since it should be obvious that if our sample mean is below 20000, we would have no reason to reject the null hypothesis. We use $\bar{x}_{cv} = \mu_0 + t_{\alpha} s_{\bar{x}} = 20000 + 1.669(200) = 20333.8$. Make the same diagram with 20000 in the middle showing a 95% ‘accept’ region below 20333.8 and a 5% ‘reject’ region above 20333.8. Since $\bar{x} = 19600$ does not fall in the ‘reject’ region, do not reject the null hypothesis.
Confidence Interval Method:
The confidence interval becomes $\mu \ge \bar{x} - t_{\alpha}^{n-1} s_{\bar{x}} = 19600 - 1.669(200)$, or $\mu \ge 19266.2$. Make a diagram – you may use 19600 as the middle. To represent the null hypothesis shade the area below 20000. To represent the confidence interval shade the area above 19266.2. Since these areas overlap, the confidence interval and the null hypothesis do not contradict one another, so do not reject the null hypothesis.
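The three traditional methods for this right-sided test mirror those of the First Example with the inequalities reversed. One Python sketch, assuming the table value 1.669 from the text (the function and dictionary key names are illustrative only), covers both sample means:

```python
import math

T_05_63 = 1.669            # t table value for 63 degrees of freedom, from the text
MU0, S, N = 20000, 1600, 64
SE = S / math.sqrt(N)      # 200.0

def right_sided_checks(xbar):
    """All three traditional methods for H0: mu <= MU0 against H1: mu > MU0."""
    t = (xbar - MU0) / SE
    x_cv = MU0 + T_05_63 * SE           # critical value above MU0
    lower = xbar - T_05_63 * SE         # one-sided interval mu >= lower
    return {
        "t": t,
        "x_cv": round(x_cv, 1),
        "lower": round(lower, 1),
        "test ratio rejects": t > T_05_63,
        "critical value rejects": xbar > x_cv,
        # the interval mu >= lower contradicts H0 only if it lies entirely above MU0
        "interval rejects": lower > MU0,
    }

for xbar in (19800, 19600):
    print(xbar, right_sided_checks(xbar))
```

It reproduces the critical value 20333.8 and the bounds 19466.2 and 19266.2, and none of the three methods rejects for either sample mean.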
Third Example Using p-values.
A p-value is a measure of the credibility of the null hypothesis and is defined as the probability that a test statistic or ratio as extreme as (as low as or as high as) or more extreme than (lower than or higher than) the observed statistic or ratio could occur, assuming that the null hypothesis is true. In this case, values are extreme, relative to the null hypothesis $H_0: \mu \le 20000$, if the sample mean is way above 20000. The easiest way to measure the probability is with the test ratio $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$. We find the value of $t$, call it $t_1$, and then find $P(t \ge t_1)$.
Assume that when we take our survey we find $\bar{x} = 19800$. $t = \frac{19800 - 20000}{200} = -1$, and we want $P(t \ge -1)$. We have already found $P(t \le -1) = .1606$ on the computer, or $.15 < P(t \le -1) < .20$ using the t table. We should thus realize that if $p\text{-value} = P(t \ge -1)$, we can subtract the above probabilities from 1 to get $p\text{-value} = 1 - .1606 = .8394$, or $1 - .20 < p\text{-value} < 1 - .15$, or $.80 < p\text{-value} < .85$.
Assume that when we take our survey we find $\bar{x} = 18000$. $t = \frac{18000 - 20000}{200} = -10$, and we want $P(t \ge -10)$. We have already found, using the computer, that $P(t \le -10) \approx 0$. This implies that $p\text{-value} = P(t \ge -10) \approx 1$. If we want to use the t table, recall that in the first example we found $P(t \le -10) < .001$, so $p\text{-value} = P(t \ge -10) > 1 - .001 = .999$. These high p-values indicate that, no matter what the significance level is, we will not reject the null hypothesis.
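A right-sided p-value is the complement of the left-tail area already found in the First Example. A sketch; `LEFT_TAIL` simply stores the quoted computer values, with $P(t \le -10)$ entered as 0 because the computer printed 0.0000:

```python
# Left-tail areas P(t <= t1) for 63 degrees of freedom, from the computer output above
LEFT_TAIL = {-1: 0.1606, -10: 0.0}

def right_tail_p(t1):
    """P(t >= t1) = 1 - P(t <= t1), since the t distribution is continuous."""
    return 1 - LEFT_TAIL[t1]

for t1 in (-1, -10):
    print(f"t = {t1}: p-value = {right_tail_p(t1):.4f}")

# Table version: .15 < P(t <= -1) < .20 becomes .80 < p-value < .85
print(f"{1 - 0.20:.2f} < p-value < {1 - 0.15:.2f}")
```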
The above tests using Minitab
Minitab Run: The following run involves the creation of a data set called x19800.mtw that consists of four samples
that cover all the tests shown in the above examples. The data set will consist of four columns, each of which represents
a sample of 64. These columns have the means given in their names. I will then take all four data sets and put them
through the tests given in the First through Third examples above. Explanations of commands and results are given in
red.
————— 10/12/2006 8:41:43 PM ————————————————————
Welcome to Minitab, press F1 for help.
As soon as Minitab opened, I used the pull down menu Editor < Enable commands to enable me to use the Session
window as well as the pulldown menu to initiate commands.
MTB > Random 64 c1;
SUBC>   Normal 19800 1600.
The above command was initiated through the pulldown menu Calc < Random data < Normal and then setting the
options as a mean of 19800 and a standard deviation of 1600. It can also be entered directly into the ‘Session’ window.
The created sample was put into c1 (column 1).
MTB > #Roger Even Bove
The pound sign introduces a remark – in this case, my name.
MTB > round c1 c1
The above command was typed into the Session window after a MTB> prompt. It rounds the numbers to the nearest whole number. Combined with multiplying and dividing by powers of ten, it can be used to limit the number of places to the right of the decimal point.
MTB > describe c1
Descriptive Statistics: C1
Variable    N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
C1         64   0  19756      206   1648    16083  18735   19749  21003    23166
The above command was used to check on the mean and standard deviation of c1. Since the standard deviation was too large, I multiplied the column by $\frac{1600}{1648} = 0.9709$. The ‘describe’ command prints out as N the sample size, as N* the number of missing values, the sample mean, the standard error of the mean, the standard deviation, the lowest number in the data, the first quartile, the median, the third quartile, and the largest number in the data.
MTB > let c2=44+.9709*c1
The above command multiplied the first column by .9709, added 44 to raise the mean, and put the results in C2. I should have done the addition later.
MTB > describe c2
Descriptive Statistics: C2
Variable    N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
C2         64   0  19225      200   1600    15659  18234   19218  20436    22536
The above command was used to check on the mean and standard deviation of c2. The standard deviation is now right,
but the mean is too small.
MTB > let c3=c2+575
The above command added 19800 – 19225 = 575 to c2 and put the result in c3.
MTB > describe c3
Descriptive Statistics: C3
Variable    N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
C3         64   0  19800      200   1600    16234  18809   19793  21011    23111
The above command was used to check on the mean and standard deviation of c3. The mean and standard deviation are
now the same as in the p-value part of the First example. The data set was saved as x19800.mtw using the File
pulldown menu.
Results for: x19800.MTW
MTB > WSave "C:\Documents and Settings\75RBOVE\My Documents\Bove'sMinitab\x19800.MTW";
SUBC>   Replace.
Saving file as: 'C:\Documents and Settings\75RBOVE\My Documents\Bove'sMinitab\x19800.MTW'
MTB > let c1=c3
MTB > let c1 = c1-1800
MTB > let c2=c3-200
MTB > let c4 = c3+400
The above commands were used to adjust the means in c1, c2 and c4 by adding the desired differences in sample means to the data in c3. The columns were named for their means.
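The Minitab steps above rescale one random column to the target standard deviation and then shift it to each target mean. The same idea is sketched below in Python, assuming the standard library's `random.gauss` as a stand-in for Minitab's `Random ... Normal` command; the seed and the name `rescale` are arbitrary. Unlike the session, the sketch skips the rounding step and uses the exact scale factor instead of .9709, so every column has a standard deviation of 1600 up to floating-point rounding.

```python
import random
import statistics

def rescale(sample, target_mean, target_sd):
    """Linear transformation giving the sample exactly the target mean and SD."""
    m = statistics.mean(sample)
    s = statistics.stdev(sample)            # n - 1 divisor, like Minitab's StDev
    return [target_mean + (x - m) * target_sd / s for x in sample]

random.seed(2006)                           # arbitrary seed so the run can be repeated
raw = [random.gauss(19800, 1600) for _ in range(64)]
x19800 = rescale(raw, 19800, 1600)

# Shifting a column moves its mean but leaves its SD alone, like the 'let' commands
columns = {f"x{19800 + d}": [x + d for x in x19800] for d in (-1800, -200, 0, 400)}
for name, col in columns.items():
    print(name, round(statistics.mean(col), 1), round(statistics.stdev(col), 1))
```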
MTB > describe c1-c4
Descriptive Statistics: x18000, x19600, x19800, x20200
Variable    N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
x18000     64   0  18000      200   1600    14434  17009   17993  19211    21311
x19600     64   0  19600      200   1600    16034  18609   19593  20811    22911
x19800     64   0  19800      200   1600    16234  18809   19793  21011    23111
x20200     64   0  20200      200   1600    16634  19209   20193  21411    23511
Now I had four samples with the sample means used in the examples and identical sample standard deviations.
MTB > Save "C:\Documents and Settings\75RBOVE\My Documents\Bove'sMinitab\x19800.MTW";
SUBC>   Replace.
Saving file as: 'C:\Documents and Settings\75RBOVE\My Documents\Bove'sMinitab\x19800.MTW'
Existing file replaced.
First Example – a left sided test.
MTB > Onet c1;
SUBC>   Test 20000;
SUBC>   Alternative -1.
One-Sample T: x18000
Test of mu = 20000 vs < 20000
Variable    N     Mean   StDev  SE Mean  95% Upper Bound       T      P
x18000     64  18000.0  1600.4    200.0          18333.9  -10.00  0.000
The Onet command stands for one-sample t and is accessed by using the pulldown menu Stat < Basic Statistics < 1-sample t, or by using Session commands as above. By itself, Onet will produce a 95% confidence interval. Using the pulldown menu after 1-sample t is chosen, ‘perform hypothesis test’ can be checked and a mean supplied, and then under ‘options’ the confidence level and the direction of the alternative hypothesis can be selected. In the command form above, ‘Test 20000’ sets up a hypothesis test with a null hypothesis mean of 20000 and ‘Alter(native) -1’ makes the alternative hypothesis ‘less than.’ Our hypotheses for this entire first example are thus $H_0: \mu \ge 20000$ and $H_1: \mu < 20000$.
The results printed out are the sample size ($n = 64$), the sample mean ($\bar{x} = 18000$), the standard deviation ($s = 1600.4$), the standard error ($s_{\bar{x}} = \frac{s}{\sqrt{n}} \approx 200$), a one-sided 95% confidence interval in the same direction as the alternate hypothesis ($\mu \le \bar{x} + t_{.05}^{n-1} s_{\bar{x}} = 18333.9$), a t ratio ($t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{18000 - 20000}{200} = -10$) and a p-value ($P(\bar{x} \le 18000) = P(t \le -10) \approx 0$). The p-value is the area under the zero-centered t curve to the left of -10.
MTB > Onet c2;
SUBC>   Test 20000;
SUBC>   Alternative -1.
One-Sample T: x19600
Test of mu = 20000 vs < 20000
Variable    N     Mean   StDev  SE Mean  95% Upper Bound       T      P
x19600     64  19600.0  1600.4    200.0          19933.9   -2.00  0.025
This is, of course, exactly the same command and the same test, but now the sample mean is closer to 20000. The p-value ($P(\bar{x} \le 19600) = P(t \le -2) = .025$), the area to the left of -2 under the t curve, is considerably larger.
MTB > Onet c3;
SUBC>   Test 20000;
SUBC>   Alternative -1.
One-Sample T: x19800
Test of mu = 20000 vs < 20000
Variable    N     Mean   StDev  SE Mean  95% Upper Bound       T      P
x19800     64  19800.0  1600.4    200.0          20133.9   -1.00  0.161
The sample mean has moved even closer to 20000, and the area to the left of -1.00 is a p-value $P(t \le -1) = .161$, so large that we cannot reject the null hypothesis even at the 10% significance level.
MTB > Onet c4;
SUBC>   Test 20000;
SUBC>   Alternative -1.
One-Sample T: x20200
Test of mu = 20000 vs < 20000
Variable    N     Mean   StDev  SE Mean  95% Upper Bound       T      P
x20200     64  20200.0  1600.4    200.0          20533.9    1.00  0.839
The sample mean has moved to the right of 20000, and the p-value ($P(\bar{x} \le 20200) = P(t \le 1) = .839$), the area to the left of 1.00, would be represented as an area under a zero-centered curve that covers the entire area to the left of zero plus a large part of the area to the right of zero. The high p-value should be no surprise, since we found $P(t \le -1) = .161$ in the last test. By symmetry, $P(t \ge 1) = P(t \le -1) = .161$. So $P(t \le 1) = 1 - P(t \ge 1) = 1 - .161 = .839$.
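Onet works from the raw column, but its printed lines can be approximately reproduced from summary statistics alone. The sketch below is a stand-in, not Minitab's algorithm: `one_sample_t` is a hypothetical name, tail areas come from Simpson's rule applied to the t density, and percentage points come from bisection. Because the inputs are the rounded printed values ($s = 1600.4$, means to one decimal), expect agreement only to about the precision Minitab displays.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(t <= x): Simpson's rule on [0, |x|] plus symmetry (steps must be even)."""
    b = abs(x)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return min(1.0, max(0.0, 0.5 + area if x >= 0 else 0.5 - area))

def t_quantile(p, df):
    """Point q with P(t <= q) = p for 0.5 <= p < 1, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def one_sample_t(xbar, s, n, mu0, alternative="two-sided", conf=0.95):
    """One-sample t test from summary statistics, in the spirit of Onet with Test/Alternative."""
    df = n - 1
    se = s / math.sqrt(n)
    t = (xbar - mu0) / se
    p_left = t_cdf(t, df)
    if alternative == "less":                  # like 'Alternative -1'
        q = t_quantile(conf, df)
        return {"SE": se, "bound": xbar + q * se, "T": t, "P": p_left}
    if alternative == "greater":               # like 'Alternative 1'
        q = t_quantile(conf, df)
        return {"SE": se, "bound": xbar - q * se, "T": t, "P": 1 - p_left}
    q = t_quantile(0.5 + conf / 2, df)         # two-sided (the default, or any other value)
    ci = (xbar - q * se, xbar + q * se)
    return {"SE": se, "CI": ci, "T": t, "P": 2 * min(p_left, 1 - p_left)}

result = one_sample_t(18000, 1600.4, 64, 20000, alternative="less")
print({key: round(value, 3) for key, value in result.items()})
```

Passing `alternative="greater"`, or leaving the default two-sided form, mirrors the ‘alter 1’ and plain forms of the command used later in this run.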
Second Example – a two-sided test
MTB > Onet c1;
SUBC>   Test 20000.
One-Sample T: x18000
Test of mu = 20000 vs not = 20000
Variable    N     Mean   StDev  SE Mean              95% CI       T      P
x18000     64  18000.0  1600.4    200.0  (17600.2, 18399.7)  -10.00  0.000
The Onet command has been modified by removing the ‘alter’ part of the command so that the test becomes 2-sided. Our hypotheses for this entire second example are thus $H_0: \mu = 20000$ and $H_1: \mu \ne 20000$.
The results printed out are the sample size ($n = 64$), the sample mean ($\bar{x} = 18000$), the standard deviation ($s = 1600.4$), the standard error ($s_{\bar{x}} = \frac{s}{\sqrt{n}} \approx 200$), a two-sided 95% confidence interval ($\mu = \bar{x} \pm t_{.025}^{n-1} s_{\bar{x}}$), a t ratio ($t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{18000 - 20000}{200} = -10$) and a p-value ($2P(\bar{x} \le 18000) = P(t \le -10) + P(t \ge 10) \approx 2(0) = 0$). The p-value is the sum of the areas under the zero-centered t curve to the left of -10 and to the right of 10. It is thus double the area we got for a sample mean of 18000 in the first example.
MTB > Onet c2;
SUBC>   Test 20000.
One-Sample T: x19600
Test of mu = 20000 vs not = 20000
Variable    N     Mean   StDev  SE Mean              95% CI       T      P
x19600     64  19600.0  1600.4    200.0  (19200.2, 19999.7)   -2.00  0.050
The p-value is the sum of the areas under the zero-centered t curve to the left of -2 and to the right of 2. It is thus double the area we got for a sample mean of 19600 in the first example.
MTB > Onet c3;
SUBC>   Test 20000.
One-Sample T: x19800
Test of mu = 20000 vs not = 20000
Variable    N     Mean   StDev  SE Mean              95% CI       T      P
x19800     64  19800.0  1600.4    200.0  (19400.2, 20199.7)   -1.00  0.321
MTB > Onet c4;
SUBC>   Test 20000.
One-Sample T: x20200
Test of mu = 20000 vs not = 20000
Variable    N     Mean   StDev  SE Mean              95% CI       T      P
x20200     64  20200.0  1600.4    200.0  (19800.2, 20599.7)    1.00  0.321
The p-value is the sum of the area under the zero-centered t curve to the left of -1 and the area to the right of 1. It is thus not double the area we got for a sample mean of 20200 in the first example. The reason is that the sample mean of 20200 is now above 20000. If we use the ‘as extreme or more extreme’ definition, the sample mean is extreme because it is above 20000. We can say $p\text{-value} = 2P(\bar{x} \ge 20200) = P(t \le -1) + P(t \ge 1) = 2(.1606) \approx .321$.
Third Example – a right-sided test.
MTB > Onet c1;
SUBC> test 20000;
SUBC> alter 1.
One-Sample T: x18000
Test of mu = 20000 vs > 20000
Variable    N     Mean   StDev  SE Mean  95% Lower Bound       T      P
x18000     64  18000.0  1600.4    200.0          17666.0  -10.00  1.000
The Onet command has been modified again by use of the ‘alter(native) 1’ subcommand, which makes the alternative hypothesis ‘greater than.’ Our hypotheses for this entire third example are thus $H_0: \mu \le 20000$ and $H_1: \mu > 20000$.
The results printed out are the sample size ($n = 64$), the sample mean ($\bar{x} = 18000$), the standard deviation ($s = 1600.4$), the standard error ($s_{\bar{x}} = \frac{s}{\sqrt{n}} \approx 200$), a one-sided 95% confidence interval in the same direction as the alternate hypothesis ($\mu \ge \bar{x} - t_{.05}^{n-1} s_{\bar{x}} = 17666.0$), a t ratio ($t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{18000 - 20000}{200} = -10$) and a p-value ($P(\bar{x} \ge 18000) = P(t \ge -10) \approx 1$). The p-value is the area under the zero-centered t curve to the right of -10. In the first part of the first example, we found $P(\bar{x} \le 18000) = P(t \le -10) \approx 0$. So it should be no surprise that $P(\bar{x} \ge 18000) = 1 - P(\bar{x} \le 18000) = 1 - P(t \le -10) \approx 1 - 0 = 1$.
MTB > Onet c2;
SUBC> test 20000;
SUBC> alter 1.
One-Sample T: x19600
Test of mu = 20000 vs > 20000
Variable    N     Mean   StDev  SE Mean  95% Lower Bound       T      P
x19600     64  19600.0  1600.4    200.0          19266.0   -2.00  0.975
MTB > Onet c3;
SUBC> test 20000;
SUBC> alter 1.
One-Sample T: x19800
Test of mu = 20000 vs > 20000
Variable    N     Mean   StDev  SE Mean  95% Lower Bound       T      P
x19800     64  19800.0  1600.4    200.0          19466.0   -1.00  0.839
In the two tests above, we could have predicted the p-value from the results in the First example. The values of the sample mean are obviously in accord with the null hypothesis, $\mu \le 20000$, so a p-value above .5 can be expected.
MTB > Onet c4;
SUBC> test 20000;
SUBC> alter 1.
One-Sample T: x20200
Test of mu = 20000 vs > 20000
Variable    N     Mean   StDev  SE Mean  95% Lower Bound       T      P
x20200     64  20200.0  1600.4    200.0          19866.0    1.00  0.161
MTB > Save "C:\Documents and Settings\75RBOVE\My Documents\Bove'sMinitab\x19800.MTW";
SUBC>   Replace.
Saving file as: 'C:\Documents and Settings\75RBOVE\My Documents\Bove'sMinitab\x19800.MTW'
Existing file replaced.
Unfortunately, the data set used in the run above could not be saved. For a previous version of this paper, a similar data set with a sample mean of 19800 was created and stored as 2b.mtb. It appears below.
x
21600.7
21632.5
17681.6
20698.0
22062.0
20893.8
20158.2
18283.7
19239.5
22797.4
21192.6
17492.7
21207.4
19583.5
19505.5
23181.6
19500.2
17459.4
16739.8
23308.3
19274.1
17918.9
22850.6
20699.6
19042.8
20001.0
19101.5
19322.1
19524.5
19406.7
19254.7
22663.6
20375.1
18117.0
18189.4
21869.6
20280.6
18819.2
18284.8
19875.8
18986.3
18200.6
19122.4
19061.2
16098.3
19441.1
20992.1
20375.9
20695.9
17350.3
21222.1
20476.2
17722.9
20010.4
20566.7
21815.2
20515.5
19953.3
20956.3
19732.8
19374.0
19002.4
18422.6
18028.2
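For readers without Minitab, the listed column can be summarized directly. A Python sketch (variable names arbitrary) computing the statistics the tests above rely on; by hand, the mean of these 64 values works out to about 19800.2, consistent with the stated 19800.

```python
import math
import statistics

# The 64 values of the stored 2b.mtb column listed above
x = [
    21600.7, 21632.5, 17681.6, 20698.0, 22062.0, 20893.8, 20158.2, 18283.7,
    19239.5, 22797.4, 21192.6, 17492.7, 21207.4, 19583.5, 19505.5, 23181.6,
    19500.2, 17459.4, 16739.8, 23308.3, 19274.1, 17918.9, 22850.6, 20699.6,
    19042.8, 20001.0, 19101.5, 19322.1, 19524.5, 19406.7, 19254.7, 22663.6,
    20375.1, 18117.0, 18189.4, 21869.6, 20280.6, 18819.2, 18284.8, 19875.8,
    18986.3, 18200.6, 19122.4, 19061.2, 16098.3, 19441.1, 20992.1, 20375.9,
    20695.9, 17350.3, 21222.1, 20476.2, 17722.9, 20010.4, 20566.7, 21815.2,
    20515.5, 19953.3, 20956.3, 19732.8, 19374.0, 19002.4, 18422.6, 18028.2,
]

n = len(x)
xbar = statistics.mean(x)
s = statistics.stdev(x)            # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)              # standard error of the mean
t = (xbar - 20000) / se            # test ratio against mu0 = 20000
print(f"n = {n}, mean = {xbar:.1f}, s = {s:.1f}, SE = {se:.1f}, t = {t:.2f}")
```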
How the p-values were found using Minitab
You already have an example above of how to do the entire problem using Minitab. To just check the values of t , I
used my tareaA macro after first using the ‘file’ pulldown menu to retrieve the nonsense worksheet ‘notmuch,’ which is
in the same file as the macro. As soon as Minitab opened, I used the pull down menu Editor < Enable commands to
enable me to use the Session window as well as the pulldown menu to initiate commands.
————— 2/8/2005 7:01:16 PM ————————————————————
Welcome to Minitab, press F1 for help.
Results for: notmuch.MTW
MTB > WOpen "C:\Documents and Settings\rbove\My Documents\Minitab\notmuch.MTW".
Retrieving worksheet from file: 'C:\Documents and Settings\rbove\My Documents\Minitab\notmuch.MTW'
Worksheet was saved on Fri Jan 21 2005
MTB > %tareaA
Executing from file: tareaA.MAC
Graphic display of t curve areas
Finds and displays areas to the left or right of a given value
or between two values. (This macro uses C100-C116 and K100-K120)
Enter the degrees of freedom.
DATA> 63
Do you want the area to the left of a value? (Y or N)
y
Enter the value for which you want the area to the left.
DATA> -1
...working...
[Graph: t Curve Area]
Data Display
mode    0
median  0
MTB > %TareaA
Executing from file: TareaA.MAC
Graphic display of t curve areas
Finds and displays areas to the left or right of a given value
or between two values. (This macro uses C100-C116 and K100-K120)
Enter the degrees of freedom.
DATA> 63
Do you want the area to the left of a value? (Y or N)
y
Enter the value for which you want the area to the left.
DATA> -2
...working...
[Graph: t Curve Area]
Data Display
mode    0
median  0
MTB > %tareaA
Executing from file: tareaA.MAC
Graphic display of t curve areas
Finds and displays areas to the left or right of a given value
or between two values. (This macro uses C100-C116 and K100-K120)
Enter the degrees of freedom.
DATA> 63
Do you want the area to the left of a value? (Y or N)
y
Enter the value for which you want the area to the left.
DATA> -10
...working...
[Graph: t Curve Area]
Data Display
mode    0
median  0
MTB > %tareaA
Executing from file: tareaA.MAC
Graphic display of t curve areas
Finds and displays areas to the left or right of a given value
or between two values. (This macro uses C100-C116 and K100-K120)
Enter the degrees of freedom.
DATA> 63
Do you want the area to the left of a value? (Y or N)
y
Enter the value for which you want the area to the left.
DATA> 1
...working...
[Graph: t Curve Area]
Data Display
mode    0
median  0
© 2006 R. E. Bove