Confidence Interval and
Hypothesis Testing for:
Population Mean (μ)
1
Assumptions & Conditions
Random sample
Independent observations
Nearly normal distribution
y ~ N (, / n )
2
Student’s t-Model for
decisions about the mean, μ
$t = \dfrac{\bar{y} - \mu}{s/\sqrt{n}}$, with $df = n - 1$
3
One-Sample t-Interval
When the conditions are met, the confidence
interval for the mean of one population is:
$\bar{y} \pm t^*_{df} \times SE(\bar{y})$
where the standard error of the mean is:
$SE(\bar{y}) = \dfrac{s}{\sqrt{n}}$
The critical value (t*) depends on the particular
confidence level, C, and the degrees of freedom, df.
4
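For readers following along in software rather than on a calculator, here is a minimal Python sketch of the one-sample t-interval. The summary values (ybar, s, n, and the confidence level) are assumed for illustration, not taken from these slides.

```python
# Minimal sketch (assumed summary values): a one-sample t-interval for mu.
from math import sqrt
from scipy import stats

ybar, s, n, conf = 0.52, 0.05, 40, 0.95    # hypothetical sample summary
se = s / sqrt(n)                            # SE(ybar) = s / sqrt(n)
lo, hi = stats.t.interval(conf, n - 1, loc=ybar, scale=se)
print(f"{conf:.0%} CI for mu: ({lo:.4f}, {hi:.4f})")
```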
CI for the mean, μ:
Stat, Tests, 8: T-Interval
HT for the mean, μ:
Stat, Tests, 2: T-Test
5
Terms
Significance Level (α)
P-value (P in TI)
Null Hypothesis (H0)
Alternative Hypothesis (HA)
6
Decisions
Reject the null hypothesis if the
P-value is less than or equal to
the significance level α.
Reject H0 if P-value ≤ α.
Fail to reject H0 if P-value > α.
7
Procedures
1. Hypotheses:
Ho
HA
2. Assumptions and Conditions:
3. Mechanics
t =
P-value =
Is the P-value ≤ the significance level (α)?
4. Conclusion:
Answer the original question.
8
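As a companion to the four-step procedure above, here is a minimal Python sketch that maps each step to a line of code. The hypotheses, summary statistics, and significance level are all made-up assumptions for illustration.

```python
# Minimal sketch (hypothetical numbers): the four-step one-sample t-test.
from math import sqrt
from scipy import stats

# 1. Hypotheses: H0: mu = 0.50  vs.  HA: mu != 0.50  (assumed example)
mu0, alpha = 0.50, 0.05
# 2. Assumptions and conditions: random sample, independence, nearly normal.
ybar, s, n = 0.52, 0.05, 40               # hypothetical sample summary
# 3. Mechanics: t statistic, df = n - 1, and two-sided P-value.
t_stat = (ybar - mu0) / (s / sqrt(n))
p_value = 2 * stats.t.sf(abs(t_stat), n - 1)
# 4. Conclusion: compare the P-value with the significance level.
decision = "Reject H0" if p_value <= alpha else "Fail to reject H0"
print(f"t = {t_stat:.2f}, P-value = {p_value:.4f}: {decision}")
```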
Confidence Interval and
Hypothesis Testing
Comparing Two Population Means:
Finding and Testing their difference
(μ1 - μ2)
9
Assumptions and
Conditions for t-model
• Independence Assumption (Each condition
needs to be checked for both groups.):
– Randomization Condition: Were the data collected
with suitable randomization (representative
random samples or a randomized experiment)?
– 10% Condition: Is the sample size (n) less than
10% of the population size (N)? We don’t usually
check this condition for differences of means. We
will check it for means only if we have a very small
population or an extremely large sample.
10
Assumptions and
Conditions (cont.)
• Normal Population Assumption:
– Nearly Normal Condition: This must be checked
for both groups. A violation by either one violates
the condition.
• Independent Groups Assumption: The two
groups we are comparing must be
independent of each other.
11
Two-Sample t-Interval
When the conditions are met, the confidence interval
for the difference between the means of two
independent groups is:
$(\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE(\bar{y}_1 - \bar{y}_2)$
where the standard error of the difference of the
means is:
$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
The critical value (t*) depends on the particular confidence level,
C, and the degrees of freedom, df, derived from the sample
sizes and a special formula.
12
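This interval can be computed directly from summary statistics. Here is a minimal Python sketch with hypothetical group summaries; the degrees of freedom come from the special formula shown on the next slide.

```python
# Minimal sketch (hypothetical summaries): a two-sample t-interval for mu1 - mu2.
from math import sqrt
from scipy import stats

y1, s1, n1 = 25.3, 4.1, 30     # group 1 summary (assumed values)
y2, s2, n2 = 22.8, 3.6, 28     # group 2 summary (assumed values)
conf = 0.95

v1, v2 = s1**2 / n1, s2**2 / n2
se = sqrt(v1 + v2)                                          # SE(ybar1 - ybar2)
df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))   # "special formula" df
t_star = stats.t.ppf(1 - (1 - conf) / 2, df)                # critical value t*
diff = y1 - y2
print(f"{conf:.0%} CI: ({diff - t_star * se:.3f}, {diff + t_star * se:.3f})")
```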
Degrees of Freedom (df)
• The special formula for the degrees of
freedom for our t critical value is a bear:
$df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$
• Because of this, we will let technology
calculate degrees of freedom for us!
(or pursue a stat major or minor)
13
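This is the calculation the technology performs. A minimal helper, with made-up example numbers, just to show the formula in working code:

```python
# Minimal sketch: the "special formula" (Welch-Satterthwaite) degrees of freedom.
def approx_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Approximate df for the two-sample t statistic with unpooled variances."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Hypothetical summaries; the result lands between min(n1, n2) - 1 and n1 + n2 - 2.
print(approx_df(4.1, 30, 3.6, 28))
```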
Testing the Difference
Between Two Means
• The hypothesis test we use is the
two-sample t-test for means.
• The conditions for the two-sample t-test
for the difference between the means of
two independent groups are the same
as for the two-sample t-interval.
14
Testing the Difference
Between Two Means (cont.)
We test the hypothesis H0: μ1 - μ2 = Δ0, where the
hypothesized difference, Δ0, is almost always 0, using
the statistic
$t = \dfrac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{SE(\bar{y}_1 - \bar{y}_2)}$
The standard error is
$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
When the conditions are met and the null hypothesis is true, this
statistic can be closely modeled by a Student’s t-model with a
number of degrees of freedom given by a special formula. We
use that model to obtain a P-value.
15
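SciPy's two-sample test from summary statistics computes the same unpooled statistic when equal_var=False. A minimal sketch with made-up group summaries:

```python
# Minimal sketch (made-up summaries): two-sample t-test from summary statistics.
from scipy import stats

t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=25.3, std1=4.1, nobs1=30,   # hypothetical group 1
    mean2=22.8, std2=3.6, nobs2=28,   # hypothetical group 2
    equal_var=False,                   # do NOT pool the variances
)
print(t_stat, p_value)
```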
Back Into the Pool
• Remember that when we know a
proportion, we know its standard
deviation.
– Thus, when testing the null hypothesis that
two proportions were equal, we could
assume their variances were equal as well.
– This led us to pool our data for the
hypothesis test for p1-p2.
16
Back Into the Pool (cont.)
• For means, there is also a pooled t-test.
– Like the two-proportions z-test, this test
assumes that the variances in the two
groups are equal.
– But, be careful, there is no link between a
mean and its standard deviation…
17
Back Into the Pool (cont.)
• If we are willing to assume that the variances
of the two groups are equal, we can pool the data
from the two groups to estimate the common
variance and make the degrees of freedom
formula much simpler.
• We are still estimating the pooled standard
deviation from the data, so we use Student’s
t-model, and the test is called a pooled t-test.
18
*The Pooled t-Test
• If we assume that the variances are equal, we
can estimate the common variance from the
numbers we already have:
$s_{pooled}^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$
• Substituting into our standard error formula,
we get:
$SE_{pooled}(\bar{y}_1 - \bar{y}_2) = s_{pooled}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
19
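A minimal sketch of these two pooled formulas, with the same assumed summaries used in the earlier examples:

```python
# Minimal sketch (assumed summaries): pooled variance and pooled SE.
from math import sqrt

s1, n1 = 4.1, 30   # hypothetical group 1 SD and size
s2, n2 = 3.6, 28   # hypothetical group 2 SD and size

s2_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))
se_pooled = sqrt(s2_pooled) * sqrt(1 / n1 + 1 / n2)
print(s2_pooled, se_pooled)
```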
*The Pooled t-Test and
Confidence Interval
• The conditions are the same, plus the
assumption that the variances of the two
groups are the same.
• For the hypothesis test, our test statistic is
$t = \dfrac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{SE_{pooled}(\bar{y}_1 - \bar{y}_2)}$
which has df = n1 + n2 - 2.
• Our confidence interval is
$(\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE_{pooled}(\bar{y}_1 - \bar{y}_2)$
20
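For comparison, SciPy's equal_var=True option is exactly this pooled version. A minimal sketch contrasting the pooled and unpooled tests on the same made-up summaries as before:

```python
# Minimal sketch: pooled (equal_var=True) vs. unpooled (equal_var=False) tests.
from scipy import stats

pooled = stats.ttest_ind_from_stats(25.3, 4.1, 30, 22.8, 3.6, 28, equal_var=True)
welch = stats.ttest_ind_from_stats(25.3, 4.1, 30, 22.8, 3.6, 28, equal_var=False)
print("pooled:  ", pooled)   # uses df = n1 + n2 - 2
print("unpooled:", welch)    # uses the special df formula
```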
Is the Pool All Wet?
• So, when should you use pooled-t
methods rather than two-sample t
methods? Well, hardly ever.
• Because the advantages of pooling are
small, and you are allowed to pool only
rarely (when the equal variance
assumption is met).
• Don’t pool.
21
Can We Test Whether
the Variances Are Equal?
• There is a test for equal variances, but it is
very sensitive to non-normal data and works
poorly for small sample sizes.
• So, the test does not work when we
need it to.
22
What Can Go Wrong?
• Watch out for paired data.
– The Independent Groups Assumption
deserves special attention.
– If the samples are not independent, you
can’t use two-sample methods.
• Look at the plots.
– Check for outliers and non-normal
distributions by making and examining
boxplots or normal probability plots.
23
What have we learned?
To use statistical inference to
compare the means of two
independent groups.
– We use t-models for CI and HT.
– It is important to check conditions to see if
the assumptions for the t-model are met.
– Don’t pool the standard errors.
24
Coke Versus Pepsi
Independent random samples of 36 cans of Coke
and Pepsi are weighed and summarized below.
Use the 0.01 significance level to test the claim
that the mean weight of regular Coke is different
from the mean weight of regular Pepsi.

       Regular Coke    Regular Pepsi
n          36              36
ȳ          0.817           0.824
s          0.0076          0.0057
25
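A minimal Python sketch of this test, using the summary statistics from the table above and the unpooled (Welch) method recommended earlier; the software computes t, the df, and the P-value:

```python
# Minimal sketch: Coke vs. Pepsi two-sample t-test from the table's summaries.
from scipy import stats

alpha = 0.01
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=0.817, std1=0.0076, nobs1=36,   # Regular Coke
    mean2=0.824, std2=0.0057, nobs2=36,   # Regular Pepsi
    equal_var=False,                       # do not pool the variances
)
print(f"t = {t_stat:.2f}, P-value = {p_value:.4g}")
if p_value <= alpha:
    print("Reject H0: the mean weights appear to differ.")
else:
    print("Fail to reject H0.")
```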
Coke Versus Pepsi
26
Coke vs. Pepsi
H0: μ1 = μ2
HA: μ1 ≠ μ2
α = 0.01
[Sketch of the t-model: "Reject H0" regions in both tails beyond -t* and t*, "Fail to reject H0" between them, centered at t = 0; the critical values ±t* are left blank to be filled in.]
27