Statistics
MINITAB - Lab 12
1. Comparing Two Population Means - Large Samples
When dealing with large samples we can use the Central Limit Theorem to test
hypotheses about the difference between two population means.
Summary From Lecture Notes
1. The mean of the sampling distribution of $(\bar{x}_1 - \bar{x}_2)$ is $(\mu_1 - \mu_2)$.
2. If the two samples are independent, the standard deviation of the sampling distribution is
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where $\sigma_1^2$, $\sigma_2^2$ are the variances of the two populations, and $n_1$ and $n_2$ are the sample sizes of the two groups respectively. $\sigma_{\bar{x}_1 - \bar{x}_2}$ is also known as the standard error of the statistic $(\bar{x}_1 - \bar{x}_2)$.
3. The sampling distribution of $(\bar{x}_1 - \bar{x}_2)$ is approximately normal for large samples (i.e. where $n_1$ and $n_2$ are both > 30) by the central limit theorem.
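4. A large sample confidence interval for $(\mu_1 - \mu_2)$ may be calculated using the following formula
$$(\bar{x}_1 - \bar{x}_2) \pm z_{crit}\,\sigma_{\bar{x}_1 - \bar{x}_2} = (\bar{x}_1 - \bar{x}_2) \pm z_{crit}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx (\bar{x}_1 - \bar{x}_2) \pm z_{crit}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$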
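As a rough illustration of items 2 and 4 above, the following Python sketch computes the standard error and a large-sample confidence interval from summary statistics. The function name `large_sample_ci` and the numbers in the usage lines are hypothetical (they are not taken from IQ.mtw), and the sketch assumes both samples are large enough for the normal approximation to hold.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(mean1, var1, n1, mean2, var2, n2, confidence=0.90):
    """Large-sample CI for mu1 - mu2 using sample variances s1^2 and s2^2."""
    # Standard error of (x1bar - x2bar): sqrt(s1^2/n1 + s2^2/n2)
    se = sqrt(var1 / n1 + var2 / n2)
    # Two-sided critical value, e.g. the 0.95 quantile for a 90% interval
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = mean1 - mean2
    return diff - z_crit * se, diff + z_crit * se


# Hypothetical summary statistics, for illustration only
lower, upper = large_sample_ci(100.0, 225.0, 50, 104.0, 196.0, 60)
print(f"90% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```

The sample variances stand in for the unknown population variances, which is the approximation in the right-hand form of the formula.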
On the onlineclass you will find a data set called IQ.mtw. This data set contains two
variables: IQ Group1 holds the IQ scores for children in their first year of school who had not
attended any pre-school, and IQ Group2 holds the IQ scores for children in their first year of
school who had attended pre-school for 1 year. An educational psychologist wished to
investigate whether the data provide any evidence that children who attend pre-school
display higher IQ scores.
Here are the familiar steps in hypothesis testing - amended to reflect comparing two
population means from independent large samples.
Step 1. Choose the population characteristic of interest. D0 - the difference
between the population means (i.e. $\mu_1 - \mu_2$)
Step 2. Choose the significance level. $\alpha = 0.10$ (or 10% level).
Step 3. State null hypothesis. Ho: D0 = 0 (i.e. $\mu_1 - \mu_2 = 0$)
Step 4. State alternative hypothesis. Ha: D0 < 0.
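Step 5. Choose a test statistic. In this case the test statistic chosen is
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \approx \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$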
Step 6. Choose a rejection region
Since the alternative hypothesis includes only differences of less than 0, this is a
one-tailed test. The rejection region will be in the lower tail of the standard
normal distribution. First get the z value such that 10% of the standard
normal distribution is to the right, and therefore 90% is to the left, with the
following command. Then take the negative of the answer, since this is a
lower-tailed test.
MTB > INVCDF .90;
SUBC> NORMAL 0 1.
What is the answer ? _____________
Verify this in the Cambridge tables.
Reject if z is ______________________
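For readers without MINITAB to hand, the same inverse-CDF lookup can be sketched with Python's standard library. Here `NormalDist().inv_cdf` plays the role of `INVCDF` with `NORMAL 0 1`; the variable names are illustrative only.

```python
from statistics import NormalDist

alpha = 0.10  # significance level from Step 2

# Value with 90% of the standard normal distribution to its left
# (the counterpart of INVCDF .90 with NORMAL 0 1)
z_upper = NormalDist(mu=0, sigma=1).inv_cdf(1 - alpha)

# Lower-tailed test: the critical value is the negative of that quantile
z_crit = -z_upper
print(f"Reject H0 if z < {z_crit:.3f}")
```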
Step 7. Calculate the test statistic
Fill in the following equation.
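$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \approx \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
= ______________________________ = ______________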
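The calculation in Step 7 can be sketched in Python as below. The function name `two_sample_z` and the summary statistics are hypothetical placeholders, not the values from IQ.mtw; substitute the means, variances, and sample sizes reported by MINITAB.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, var1, n1, mean2, var2, n2, d0=0.0):
    """z statistic for H0: mu1 - mu2 = d0, using sample variances."""
    se = sqrt(var1 / n1 + var2 / n2)
    return ((mean1 - mean2) - d0) / se


# Hypothetical summary statistics, for illustration only
z = two_sample_z(100.0, 225.0, 50, 104.0, 196.0, 60)
# Lower-tailed p-value: P(Z <= z) under the standard normal distribution
p_value = NormalDist().cdf(z)
print(f"z = {z:.3f}, lower-tail p-value = {p_value:.4f}")
```

For a lower-tailed test, H0 is rejected when z falls below the negative critical value, or equivalently when the lower-tail p-value is smaller than the significance level.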
Step 8. State Conclusion in the context of the question
Reject / Fail To Reject the Ho: at $\alpha$ = _______, that
______________________________________________________
______________________________________________________
Now calculate a two sided 90% confidence interval for the difference between the 2
population means:
90% Confidence Interval = (____________________, to _____________________)
2. Comparing Two Population Means - Small Samples
We can also compare two population means for small samples. As in the one-sample case,
we can no longer assume that the sample standard deviation gives a good estimate of the
population standard deviation $\sigma$. Therefore the test statistic is no longer normally distributed
by the central limit theorem. However, a two-sample t-test for independent samples is
available in these cases.
Summary From Lecture Notes
1. If the two samples are random and independently sampled, the mean of the sampling distribution
of $(\bar{x}_1 - \bar{x}_2)$ is $(\mu_1 - \mu_2)$.
2. If the two samples are random and independently sampled, and both populations have the same
variance, then a pooled estimate of variance may be obtained from the following weighted average
of variances formula
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
3. The test statistic t is distributed as t with $n_1 + n_2 - 2$ degrees of freedom.
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
4. A $(1-\alpha)100\%$ confidence interval for $(\mu_1 - \mu_2)$ is given by
$$(\bar{x}_1 - \bar{x}_2) \pm t_{crit}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where $t_{crit}$ is the appropriate quantile from the Student
t distribution with $(n_1 + n_2 - 2)$ degrees of freedom.
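A minimal Python sketch of the pooled-variance formulas above. The two lists are made-up scores (not the lab data), the function name `pooled_t` is hypothetical, and `t_crit` is assumed to be read from t tables because Python's standard library has no t-distribution quantile function.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2, d0=0.0):
    """Return (t statistic, pooled variance, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance is the sample variance (divisor n - 1)
    s1_sq, s2_sq = variance(sample1), variance(sample2)
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    se = sqrt(sp_sq * (1 / n1 + 1 / n2))
    t = ((mean(sample1) - mean(sample2)) - d0) / se
    return t, sp_sq, df


# Made-up data, for illustration only
group1 = [12.0, 15.0, 11.0, 14.0, 13.0]
group2 = [10.0, 9.0, 12.0, 11.0]
t, sp_sq, df = pooled_t(group1, group2)

# Confidence interval: t_crit is looked up in tables for df degrees of freedom
t_crit = 2.365  # 0.975 quantile of t with 7 df, taken from standard tables
half_width = t_crit * sqrt(sp_sq * (1 / len(group1) + 1 / len(group2)))
diff = mean(group1) - mean(group2)
print(f"t = {t:.3f} on {df} df, pooled variance = {sp_sq:.3f}")
print(f"95% CI: ({diff - half_width:.2f}, {diff + half_width:.2f})")
```

The pooled variance weights each sample variance by its degrees of freedom, so the larger sample contributes more to the estimate.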
The same educational psychologist who tested for the effects of pre-school education on
IQ scores above implemented a new technique of teaching reading to 'slow learners'. This
new technique was implemented with 8 children from a class group, and the remaining 12
members of the class were given the standard teaching method. After 6 months a
standardised reading test was given to all the students and the results below were
obtained. The psychologist wanted to report two things: the result of a two-sample
independent t-test on the data to detect any difference between the two teaching
methods (with $\alpha = 0.10$), and a two-sided 95% confidence interval for the mean
difference between the two teaching methods.
Reading Test Scores for Slow Learners
New Method
Standard Method
80
77
79
86
80
66
62
73
79
79
70
72
81
76
68
68
73
75
76
66
Since a standardised reading test is administered, it is reasonable to assume that the
reading scores in the two groups are normally distributed with equal variance.
Enter this data into two columns in MINITAB and name these columns appropriately.
Conduct this test using MINITAB’s two sample t-test for independent sample function. Go
to STAT > BASIC STATISTICS > 2 SAMPLE T…
1. Specify the columns where the data is stored
2. Click on assume equal variances
3. Click on options
4. Specify the appropriate confidence level, i.e. $(1-\alpha)100\%$
5. Specify the hypothesised D0 under the null hypothesis
6. Select the appropriate alternative hypothesis
Fill in the following and state your conclusion fully.
1. What is the statistic of interest? _________________________________
2. What is the significance level? ______________________
3. State H0:
_________________________________________________________
4. State HA:
_________________________________________________________
5. What is the appropriate test statistic formula?
6. What is the rejection region? (Use MINITAB or Cambridge tables)
_____________________________________________________________
7. What is the result of calculating the test statistic? ____________________
What is the pooled estimate of variance? ____________________
8. What is the conclusion from your hypothesis test?
______________________________________________________________
______________________________________________________________
9. What is the 95% confidence interval for the mean difference between the two teaching
methods?
From (_______________, to _______________).
10. There is an apparent contradiction between the results of the hypothesis test and the
range of the confidence interval. What is the apparent contradiction, and why is it only
apparent? That is, explain why it is not a real contradiction in statistical terms.
___________________________________________________________________
___________________________________________________________________
Assignment:
Open the dataset called Data_lab12.mtw on onlineclasses and answer the following two
questions:
(a) There are two types of trees in a forest. A random sample of each type is taken
and the heights (in metres) are recorded (in column 1 and column 2). Is there evidence, at the
99% confidence level, that tree 1 is taller than tree 2?
[Hint: Notice the sample size by generating the summary statistics. There is no inbuilt
function in MINITAB for this, so the test statistic must be worked out by hand.]
Answer:
H0:
HA:
:
Test Statistic:
Critical Value:
P-Value:
Conclusion:
(b) In the same forest there are two types of flowers, a and b. A random sample of each is
taken and their heights (in centimetres) are recorded (in column 3, with column 4 indicating
which type each flower is). Is there any evidence of a difference between the heights of the
two flowers?
[Hint: To familiarise yourself with the data, first display the summary statistics by using the
flower type variable as the 'by variable'.]
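The 'by variable' idea in the hint amounts to splitting the height column according to the type column before summarising. A small Python sketch of that grouping, using invented heights and labels rather than the contents of Data_lab12.mtw:

```python
from collections import defaultdict
from statistics import mean, stdev

# Invented values standing in for column 3 (height) and column 4 (type)
heights = [21.5, 19.0, 23.2, 18.4, 20.1, 22.7]
types = ["a", "b", "a", "b", "b", "a"]

# Group the heights by flower type
groups = defaultdict(list)
for height, flower_type in zip(heights, types):
    groups[flower_type].append(height)

# Summary statistics for each type: n, mean, sample standard deviation
for flower_type, values in sorted(groups.items()):
    print(f"type {flower_type}: n = {len(values)}, "
          f"mean = {mean(values):.2f}, sd = {stdev(values):.2f}")
```

The group sizes shown by such a summary indicate whether the large-sample z procedure or the small-sample t procedure is the appropriate one.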
Answer:
H0:
HA:
:
Test Statistic:
Critical Value:
P-Value:
Conclusion:
REVISION SUMMARY
After this lab you should be able to:
- perform a hypothesis test for two large independent samples
- perform a hypothesis test for two small independent samples
- determine when to use normal distribution tables and when to use t
distribution tables when dealing with two-sample tests
END