Lecture 8

Today’s lesson (Chapter 13)
• variance(W-Y)
• Two sample (W, Y) standard normal test
• Confidence intervals for E(W-Y)
• Determining the sample size needed to have
a specified probability of a Type II error and
probability of a Type I error in a two sample
test.
Var(W-Y)
• Definition: $\operatorname{var}(Y) = E[(Y - EY)^2]$
• Fact: Expectation is a linear operator.
• $\operatorname{var}(W-Y) = E[(W - Y - E(W-Y))^2]$
• By linearity, $E(W-Y) = EW - EY$, so
$\operatorname{var}(W-Y) = E[(W - EW - (Y - EY))^2]$
• Expand the right-hand side using high school
algebra.
Var(W-Y)
• $E[(W - EW - (Y - EY))^2]$
• $= E[(W - EW)^2 + (Y - EY)^2 - 2(W - EW)(Y - EY)]$
• $= E[(W - EW)^2] + E[(Y - EY)^2] - 2E[(W - EW)(Y - EY)]$
• $= \operatorname{var}(W) + \operatorname{var}(Y) - 2\operatorname{cov}(W,Y)$.
New Facts
• New Definition
– cov(W,Y)=cov(Y,W)=E[(W-EW)(Y-EY)]
– cov(W,Y) is the numerator of the correlation
coefficient of W and Y
• New Fact
– var(W-Y)=var(W)+var(Y)-2cov(W,Y)
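The new fact can be checked numerically. The sketch below is only an illustration: the two short data lists are made up (they are not from the lecture), and the helper names `mean`, `cov`, and `var` are hypothetical. The identity holds exactly for sample variances and covariances as long as all of them use the same divisor.

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance with divisor n-1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def var(xs):
    """Sample variance, written as the covariance of a list with itself."""
    return cov(xs, xs)

# Made-up paired observations, for illustration only.
w = [2.0, 4.0, 4.0, 7.0, 9.0]
y = [1.0, 3.0, 6.0, 6.0, 8.0]
diff = [a - b for a, b in zip(w, y)]

lhs = var(diff)                        # var(W-Y) computed directly
rhs = var(w) + var(y) - 2 * cov(w, y)  # var(W)+var(Y)-2cov(W,Y)
print(lhs, rhs)
```

The two printed numbers agree up to floating-point rounding.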
Two Sample Testing Problem
• Research team has two samples, n
observations from W and m observations
from Y.
• ASS-U-ME
– W sample and Y sample are independent
– W is normally distributed with mean E(W) and
variance $\sigma_W^2$
– Y is normally distributed with mean E(Y) and
variance $\sigma_Y^2$
Two Sample Testing Problem
• Null hypothesis: E(W)=E(Y)
– New parameter: E(W)-E(Y)
• Alternative hypothesis may be left-sided,
right-sided, or two-sided.
• Test statistic:
– W mean (of n observations) - Y mean (of m
observations)
Distribution of Test Statistic
• Distribution of W mean - Y mean
– either normal or approximately normal
– expected value is E(W-Y)
– variance is $\sigma_W^2/n + \sigma_Y^2/m$
• Under null hypothesis, expected value of
difference of means is 0.
Deriving standard error of
difference of two means
• var(W mean - Y mean) =
var(W mean) + var(Y mean) - 2cov(W mean, Y mean)
• W sample is assumed to be independent of
Y sample, so covariance of means is 0.
• var(W mean) = $\sigma_W^2/n$, where n is the number in the W
sample.
• var(Y mean) = $\sigma_Y^2/m$, where m is the number in the Y
sample.
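Putting these pieces together, the standard error of the difference of means is the square root of the sum of the two variance terms. A minimal sketch follows; the function name `se_diff_means` and the numbers passed to it are assumptions chosen only to make the arithmetic easy to follow.

```python
import math

def se_diff_means(sigma_w, n, sigma_y, m):
    """Standard error of (W mean - Y mean) for independent samples.

    The covariance term is dropped because the samples are assumed independent.
    """
    return math.sqrt(sigma_w**2 / n + sigma_y**2 / m)

# Hypothetical values: 9/9 + 16/16 = 2, so the result is sqrt(2).
se = se_diff_means(sigma_w=3.0, n=9, sigma_y=4.0, m=16)
print(se)
```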
Variances Known
• Find null distribution of test statistic using
known variances and sample sizes.
• Standardize the test statistic, which is
always the difference of the two sample
means.
• Follow standard decision sequence.
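The known-variance procedure can be sketched as follows. This is an illustration under stated assumptions, not the lecture's own code: the function names and all input numbers are hypothetical, and the standard normal distribution comes from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def two_sample_z(w_mean, y_mean, sigma_w, sigma_y, n, m, null_diff=0.0):
    """Standardized difference of two sample means with known variances."""
    se = math.sqrt(sigma_w**2 / n + sigma_y**2 / m)
    return (w_mean - y_mean - null_diff) / se

def p_value(z, side="two"):
    """Observed significance level for a left-, right-, or two-sided test."""
    cdf = NormalDist().cdf
    if side == "left":
        return cdf(z)
    if side == "right":
        return 1.0 - cdf(z)
    return 2.0 * (1.0 - cdf(abs(z)))

# Hypothetical inputs, chosen only for illustration.
z = two_sample_z(w_mean=10.0, y_mean=12.0, sigma_w=3.0, sigma_y=4.0, n=9, m=16)
print(z, p_value(z), p_value(z, side="left"))
```

The decision then follows the usual sequence: reject the null hypothesis at a given level when the p-value is smaller than that level.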
Variances Unknown
• This is a Student’s t problem.
• Two possibilities
– ASS-U-ME var(W)=var(Y) is reasonable; this
is the classic two independent sample t-test that
is usually covered in the prerequisite class.
– Assumption var(W)=var(Y) is not reasonable;
use unequal variance t-test.
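For the second possibility, the slides do not write out the formulas. The sketch below assumes the usual textbook form of the unequal variance (Welch) statistic, with degrees of freedom from the Welch-Satterthwaite approximation; the function name `welch_t` and the summary statistics are hypothetical.

```python
import math

def welch_t(mean_w, sd_w, n, mean_y, sd_y, m):
    """Unequal variance t statistic and its approximate degrees of freedom."""
    vw = sd_w**2 / n  # estimated variance of the W mean
    vy = sd_y**2 / m  # estimated variance of the Y mean
    t = (mean_w - mean_y) / math.sqrt(vw + vy)
    # Welch-Satterthwaite approximation (assumed standard form).
    df = (vw + vy) ** 2 / (vw**2 / (n - 1) + vy**2 / (m - 1))
    return t, df

# Hypothetical summary statistics with very different spreads.
t, df = welch_t(mean_w=20.0, sd_w=2.0, n=10, mean_y=18.0, sd_y=6.0, m=10)
print(t, df)
```

With these inputs the approximate degrees of freedom fall well below the n+m-2=18 used by the equal variance test, which is how the unequal variance test pays for dropping the assumption.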
Checking Assumption of Equal
Variances
• Use SPSS
– statistics, compare means, independent sample
t-test.
• SPSS uses Levene’s test for equality of
variances.
– Sig. means p-value of Levene’s test
– Use it as you would any observed significance
level.
Choosing Equal or Unequal
Variance t-test
• Some statistics professors always want
equal variance t-test. Answer their questions
with the equal variance t-test. This typically
includes Actuary Society questions.
• In AMS315 life, I will tell you which test to
use (use the equal variance t-test if there is
no specification).
Choosing Equal or Unequal
Variance t-test
• In real life, I ALWAYS use the unequal
variance t-test.
• Some people choose the unequal variance t-test
if the p-value for Levene’s test of the
equality of variances is very small.
Example Problem Group I
• I present you with a computer output on the
comparison of average irresponsibility at
time 5 for subjects who did not use
marijuana at time 3 to the average
irresponsibility at time 5 for subjects who
did use marijuana at time 3.
Example Problem Group I
• You register that this is an A vs. B
comparison, with the A group being those
who did not use marijuana at time 3 and the
B group is those who did use marijuana at
time 3. The dependent variable is
irresponsibility at time 5.
Example Problem Group I
• Reading the output, you learn that there
were 215 subjects who did not use
marijuana at time 3 and that their average
irresponsibility was 10.7860, with a
standard deviation of 2.5779. There were
151 subjects who did use marijuana at time
3, and their average irresponsibility was
10.8411. The standard deviation was
2.3526.
Example Problem Group I
• Levene’s test for the equality of variances
had sig.=0.206.
• The 2-tailed sig for the equal variance test
was 0.835, for the unequal variance test was
0.833.
First Problem
• Which of the following conclusions is
correct about the test of the null hypothesis
that expected irresponsibility at time 5 for a
subject who did not use marijuana at time
3=expected irresponsibility at time 5 for a
subject who did use marijuana at time 3
First Problem Continued
• against the alternative hypothesis that
expected irresponsibility at time 5 for a
subject who did not use marijuana at time 3
was not equal to expected irresponsibility at
time 5 for a subject who did use marijuana
at time 3?
• Usual options.
Solution
• Both p-values were approximately equal
and were large (0.8).
• Hence, the correct decision is to accept at
the 0.10 level of significance (last option).
Second Problem
• What is the correct decision in the
following? The null hypothesis is: Expected
irresponsibility at time 5 for a subject who
did not use marijuana at time 3 - expected
irresponsibility at time 5 for a subject who
did use marijuana at time 3 = 0, alpha=0.05;
Second Problem Continued
• the alternative is expected irresponsibility at
time 5 for a subject who did not use
marijuana at time 3 - expected
irresponsibility at time 5 for a subject who
did use marijuana at time 3 is not equal to 0.
Solution to Second Problem
• Read your output to find that the mean
difference in the two groups was -5.50E-02.
– -5.50E-02 is a notation for $-5.50 \times 10^{-2} = -0.0550$.
• Check means of groups to confirm that the
mean difference is that of irresponsibility
for subjects who did not use marijuana at
time 3 minus irresponsibility for subjects who did
use (10.7860-10.8411=-0.0551, which agrees with
the reported -0.0550 up to rounding).
Solution to Second Problem
• Check that this order is the same as the
order that I asked you about.
• Read your computer output to find that the
95 percent confidence interval for the mean
difference is -0.5744 to 0.4644.
• Check whether or not the value 0 specified
in the problem is in the confidence interval.
• It is, so accept the null hypothesis.
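The reported interval can be reproduced from the mean difference and its standard error (0.2641 on the same output). In the sketch below, the critical value 1.9665 is an assumption: it approximates the two-sided 95 percent Student's t value for 364 degrees of freedom. The function name is hypothetical.

```python
def ci_for_difference(mean_diff, se, crit):
    """Confidence interval: estimate plus or minus critical value times SE."""
    half_width = crit * se
    return mean_diff - half_width, mean_diff + half_width

# Mean difference and standard error as read from the lecture's output.
# Assumption: 1.9665 approximates the 95 percent t critical value for 364 df.
lower, upper = ci_for_difference(mean_diff=-0.0550, se=0.2641, crit=1.9665)

# Accept the null hypothesis when the null value 0 lies inside the interval.
contains_null = lower <= 0.0 <= upper
print(round(lower, 4), round(upper, 4), contains_null)
```

The result agrees with the interval -0.5744 to 0.4644 on the output to about three decimal places.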
Third Problem
• What is the value of the t-test assuming
equal variances?
Solution to the Third Problem
• The computer output has the mean
difference (-5.50E-02).
• The computer output has the standard error
of the mean difference (0.2641, on the line
labeled “Equal variances assumed”)
• t-test is the standard score value of the test
statistic (-0.0550-0)/0.2641=-0.21.
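The standard error on the "Equal variances assumed" line can also be recomputed from the group sizes, means, and standard deviations. This sketch assumes the usual pooled variance formula, which the slides do not write out; the function name `pooled_t` is hypothetical.

```python
import math

def pooled_t(mean_a, sd_a, n, mean_b, sd_b, m):
    """Equal variance t statistic and its standard error."""
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n - 1) * sd_a**2 + (m - 1) * sd_b**2) / (n + m - 2)
    se = math.sqrt(sp2 * (1 / n + 1 / m))
    return (mean_a - mean_b) / se, se

# Summary statistics from the lecture's output (non-users first, then users).
t, se = pooled_t(10.7860, 2.5779, 215, 10.8411, 2.3526, 151)
print(round(se, 4), round(t, 2))
```

Under that assumption the recomputed values match the 0.2641 and -0.21 read from the output.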
Fourth Problem
• How many degrees of freedom does the
equal variance independent sample t-test
have in this problem?
Solution
• Read the computer output to find that there
were 215 subjects who did not use
marijuana at time 3.
• There were 151 subjects who did use
marijuana at time 3.
• The number of degrees of freedom is n+m-2
in general.
• Here, 215+151-2=364.
Fifth Problem
• Which of the following is a correct decision
about the test of the null hypothesis that
variance of irresponsibility at time 5 for a
subject who did not use marijuana at time 3
is equal to the variance of irresponsibility at
time 5 for a subject who did use marijuana
against the alternative that these two
variances are not equal? Usual options.
Solution
• Read the computer output to find that the
sig of Levene’s test is 0.206.
• This is larger than 0.10, the level of
significance in the last option.
• The answer is to accept at the 0.10 level
(choose the last option).
Example Problem Group II
• Each patient in a study will take a specified
medicine, and the patient’s response to that
medicine will be measured. Twenty patients
will be randomly assigned to two groups of
ten each.
Example Problem Group II
• Group 1 will receive an experimental
medicine. The random variable X denotes a
patient’s response to the experimental
medicine and is normally distributed with
unknown expected value E(X) and
unknown standard deviation σ.
Example Problem Group II
• Group 2 will receive the best available
medicine. The random variable B denotes a
patient’s response to the best available
medicine and is normally distributed with
unknown expected value E(B) and unknown
standard deviation σ. The null hypothesis in
this experiment is that E(X-B)=0, and the
alternative is that E(X-B)<0.
Example Problem Group II
• The experiment was run. The observed x
sample average was 274.9; and the observed
b sample average was 473.7. The observed
X group standard deviation was 233.7, and
the B group standard deviation was 348.0.
The resulting pooled estimate of the
standard deviation was 296.5.
Group II First Problem
• What is the standard deviation of the
random variable X average - B average?
Solution
• Var(X average) = $\sigma^2/10$.
• Var(B average) = $\sigma^2/10$.
• Two averages are from independent
samples, and so the covariance is zero.
• Var(X average - B average) = $\sigma^2/10 + \sigma^2/10 = 0.2\sigma^2$
• sd(X average - B average) = $(0.2)^{0.5}\sigma = 0.447\sigma$.
• The answer is 0.447σ. NOT 0.447(296.5)!
Group II, Second Problem
• Which of the following is a correct decision
for accepting or rejecting the null
hypothesis based on the sample averages
and standard deviations given in the
common paragraph?
• Usual options: reject at 0.01, accept at 0.01
and reject at 0.05, accept at 0.05 and reject
at 0.10, and accept at 0.10.
Solution
• Calculate the t-statistic (standard score form
of the test statistic).
– Difference of means is 274.9-473.7=-198.8
– Estimated standard error of test statistic is
0.447(296.5)=132.5.
– Standard units value=(-198.8-0)/132.5=-1.50.
• Find degrees of freedom.
– 10+10-2=18
Solution
• Determine side of test.
– Left sided test.
• Stretch normal distribution critical values to
values appropriate for 18 degrees of
freedom.
– Stretch -2.326 (0.01 level) to -2.552, -1.645 (0.05 level) to -1.734, and -1.282 (0.10 level) to -1.330
• Decide: Accept at 0.01; accept at 0.05;
reject at 0.10. Option C is correct.
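The whole decision sequence for this problem can be sketched in a few lines. The sample averages, pooled standard deviation, and critical values are those given in the lecture; the variable names are hypothetical, and the code simply compares the t statistic with each left-sided critical value.

```python
import math

n = m = 10
pooled_sd = 296.5

# Estimated standard error: sqrt(1/10 + 1/10) times the pooled estimate.
se = math.sqrt(1 / n + 1 / m) * pooled_sd
t = (274.9 - 473.7 - 0) / se
df = n + m - 2

# Left-sided critical values for 18 degrees of freedom, as given in the lecture.
critical = {0.01: -2.552, 0.05: -1.734, 0.10: -1.330}

# Left-sided test: reject when t falls below the critical value.
decisions = {alpha: ("reject" if t < c else "accept") for alpha, c in critical.items()}
print(round(t, 2), df, decisions)
```

The statistic lies between -1.734 and -1.330, which is why the null hypothesis is accepted at 0.05 but rejected at 0.10.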
Today’s Class
• New fact about var(W-Y)
• Application to testing two independent
samples.
• Making Student’s corrections.
Next Class
• Paired t-test.
• Finding smarter ways of making an A vs. B
comparison.