Two-Sample Inference Procedures with Means
Two-Sample Procedures with Means
• Goal: Compare two different populations/treatments
• INDEPENDENT samples from each population/treatment
Remember:
When combining two random variables X and Y,
$$\mu_{X \pm Y} = \mu_X \pm \mu_Y$$
$$\sigma_{X \pm Y} = \sqrt{\sigma_X^2 + \sigma_Y^2}$$
The standard deviation formula only works if X and Y are independent.
Suppose we have a population of adult
men with a mean height of 71 inches
and standard deviation 2.6 inches.
We also have a population of adult women with a
mean height of 65 inches and standard deviation
2.3 inches. Heights are normally distributed.
Describe the distribution of the difference in
heights between males and females (male –
female).
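Normal distribution with $\mu_{M-F} = 71 - 65 = 6$ inches and $\sigma_{M-F} = \sqrt{2.6^2 + 2.3^2} \approx 3.471$ inches
[Figure: normal curves for Female heights (centered at 65), Male heights (centered at 71), and the Difference (male – female), centered at 6 with σ = 3.471]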
a) What is the probability that a randomly
selected man is at most 5 inches taller than
a randomly selected woman?
P(xM – xF ≤ 5) = normalcdf(-1E99, 5, 6, 3.471) = .3866
b) What is the 70th percentile for the difference
(male – female) in heights of a randomly
selected man and woman?
(xM – xF) = invNorm(.7, 6, 3.471) = 7.82
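The calculator steps above can be checked with a short sketch in Python. It assumes only the standard library; `NormalDist` plays the role of `normalcdf` and `invNorm`, and the variable names are illustrative rather than part of the original slides.

```python
from math import sqrt
from statistics import NormalDist

# Population parameters quoted in the example (inches).
MU_M, SIGMA_M = 71.0, 2.6
MU_F, SIGMA_F = 65.0, 2.3

# Difference of two independent normal variables:
# the means subtract and the variances add.
mu_diff = MU_M - MU_F
sigma_diff = sqrt(SIGMA_M**2 + SIGMA_F**2)
diff = NormalDist(mu_diff, sigma_diff)

p_at_most_5 = diff.cdf(5)            # same role as normalcdf(-1E99, 5, 6, 3.471)
percentile_70 = diff.inv_cdf(0.70)   # same role as invNorm(.7, 6, 3.471)

print(round(sigma_diff, 3), round(p_at_most_5, 4), round(percentile_70, 2))
```

The three printed values should agree with the slide's 3.471, .3866, and 7.82 up to rounding.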
Calculator Simulation!
a) What is the probability that the mean
height of 30 men is at most 5 inches taller
than the mean height of 30 women?
P(x̄M – x̄F ≤ 5) = .057
b) What is the 70th percentile for the
difference (male – female) in mean heights
of 30 men and 30 women?
6.332 inches
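The slide runs this as a calculator simulation. A rough Python equivalent is sketched below, under the assumption that heights are drawn independently from the two normal populations given earlier. The trial count, seed, and variable names are arbitrary choices made here, and the exact values are included alongside for comparison.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

MU_M, SIGMA_M = 71.0, 2.6
MU_F, SIGMA_F = 65.0, 2.3
N = 30  # sample size for each group

# Exact answer: the difference of two independent sample means is normal,
# with variance sigma_M^2 / N + sigma_F^2 / N.
se_diff = sqrt(SIGMA_M**2 / N + SIGMA_F**2 / N)
mean_diff = NormalDist(MU_M - MU_F, se_diff)
exact_p = mean_diff.cdf(5)
exact_pct70 = mean_diff.inv_cdf(0.70)

# Simulation: repeatedly draw 30 men and 30 women and compare the sample means.
random.seed(1)   # arbitrary seed, for repeatability only
TRIALS = 5000    # arbitrary number of repetitions
hits = 0
for _ in range(TRIALS):
    men = fmean(random.gauss(MU_M, SIGMA_M) for _ in range(N))
    women = fmean(random.gauss(MU_F, SIGMA_F) for _ in range(N))
    if men - women <= 5:
        hits += 1
sim_p = hits / TRIALS

print(round(exact_p, 3), round(exact_pct70, 3), sim_p)
```

The simulated proportion varies from run to run, which is why it only approximates the exact probability.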
Conditions for Two Means
• Two independent SRS's (or randomly
assigned treatments)
• Both sampling distributions are approximately normal, justified by one of:
– Both populations are normal
– Both n's > 30
– Both normal probability plots are linear
Degrees of Freedom
Option 1: Use the smaller df: n1 – 1 or n2 – 1
• Using the larger one overestimates the collective sample sizes
Option 2: Welch-Satterthwaite approximation
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$
Calculator does this automatically!
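For anyone working without the calculator, the two degrees-of-freedom options can be sketched in Python as follows. The function name `welch_df` is made up for this illustration, and the inputs are the standard deviations and sample sizes from the headache-remedy example later in the slides.

```python
def welch_df(sd1, n1, sd2, n2):
    """Welch-Satterthwaite approximate degrees of freedom."""
    v1 = sd1**2 / n1  # estimated variance of the first sample mean
    v2 = sd2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Option 2: Welch-Satterthwaite approximation.
df_welch = welch_df(8.7, 12, 7.5, 12)

# Option 1: the smaller of n1 - 1 and n2 - 1.
df_conservative = min(12, 12) - 1

print(round(df_welch, 2), df_conservative)
```

The Welch value lies between the conservative choice and n1 + n2 – 2, so Option 1 always gives the wider interval of the two.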



Confidence Interval for the
Difference of Two Means
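CI = statistic ± (critical value)(SD of statistic)
$$(\bar{x}_1 - \bar{x}_2) \pm t^*_{df} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
The square-root term is the standard error (the estimated standard deviation) of $\bar{x}_1 - \bar{x}_2$.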
Two competing headache remedies claim to give fast-acting
relief. An experiment was performed to compare the mean
lengths of time required for bodily absorption of brand A and
brand B. Absorption time is normally distributed. Twelve
people were randomly selected and given a dosage of brand A.
Another 12 were randomly selected and given an equal dosage
of brand B. The length of time in minutes for the drugs to
reach a specified level in the blood was recorded. The results
follow:
          mean (min)   SD (min)   n
Brand A   20.1         8.7        12
Brand B   18.9         7.5        12
a) Describe the sampling distribution of the
differences in the mean speed of absorption (A – B).
Normal, with standard error $s = \sqrt{8.7^2/12 + 7.5^2/12} \approx 3.316$ minutes
b) Construct a 95% confidence interval for the
difference in mean lengths of time (A – B) required
for bodily absorption of each brand.
Conditions:
• 2 independent randomly assigned treatments
• Populations are normal
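$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
From calculator: df = 21.53. With a t table, think "Price is Right": use the closest df without going over (df = 21, t* = 2.080).
$$(20.1 - 18.9) \pm 2.080 \sqrt{\frac{8.7^2}{12} + \frac{7.5^2}{12}} \approx (-5.697, 8.097)$$
The calculator, using df = 21.53, gives the slightly narrower interval (-5.685, 8.085).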
We are 95% confident that the true difference in
mean absorption time (A minus B) is between
-5.685 minutes and 8.085 minutes.
If we made lots of intervals this way, 95% of them
would contain the true difference in means.
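The interval arithmetic can be reproduced with a small helper, sketched here under the assumption that the critical value t* is looked up separately (from a table or calculator) and passed in. The function name `two_sample_t_interval` is hypothetical.

```python
from math import sqrt

def two_sample_t_interval(mean1, sd1, n1, mean2, sd2, n2, t_star):
    """Confidence interval for mu1 - mu2, given a critical value t_star."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    margin = t_star * standard_error
    diff = mean1 - mean2
    return diff - margin, diff + margin

# Brand A vs. Brand B, with the table value t* = 2.080 (df = 21).
lower, upper = two_sample_t_interval(20.1, 8.7, 12, 18.9, 7.5, 12, t_star=2.080)
print(f"({lower:.3f}, {upper:.3f})")
```

With the table value t* = 2.080 the endpoints come out slightly wider than the calculator interval quoted above, because the calculator uses df = 21.53 and therefore a slightly smaller critical value.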
A Subtle Distinction
• Matched pairs: “mean difference”
• Two-sample inference:
“difference in means”
Hypothesis Statements
H0: μ1 – μ2 = 0 (equivalently, μ1 = μ2)
Ha: μ1 – μ2 < 0 (μ1 < μ2)
Ha: μ1 – μ2 > 0 (μ1 > μ2)
Ha: μ1 – μ2 ≠ 0 (μ1 ≠ μ2)
Be sure to define BOTH μ1 and μ2!
Test Statistic
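$$\text{Test statistic} = \frac{\text{statistic} - \text{parameter}}{\text{SD of statistic}}$$
$$t_{df} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Since we assume H0 is true, the $(\mu_1 - \mu_2)$ part equals 0, so we can leave it out.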
c) Is there sufficient evidence that the two
brands differ in the speed at which they enter
the bloodstream?
Conditions:
• 2 independent randomly assigned treatments
• Populations are normal
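H0: μA = μB
Ha: μA ≠ μB
Where μA and μB are the true mean absorption times for brand A and brand B
$$t_{21.53} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{20.1 - 18.9}{\sqrt{\dfrac{8.7^2}{12} + \dfrac{7.5^2}{12}}} \approx .362$$
p-value = .7210, α = .05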
Since p-value > α, we fail to reject H0. There
is not sufficient evidence to suggest these
drugs differ in their absorption time.
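The test above can be reproduced from the summary statistics alone. The sketch below uses only the Python standard library; since that library has no t distribution, the tail area is approximated by Simpson's rule on the t density, which is a choice made for this illustration and not how the calculator necessarily works. All function names are hypothetical.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2)
    return exp(log_c) / sqrt(df * pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def welch_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided, unpooled two-sample t test from summary statistics."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df, 2 * t_upper_tail(abs(t), df)

t_stat, df, p_value = welch_t_test(20.1, 8.7, 12, 18.9, 7.5, 12)
print(round(t_stat, 3), round(df, 2), round(p_value, 3))
```

For a one-sided alternative, the p-value would be the single tail area in the direction of Ha, not twice that area.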
Pooling
• Used for two populations with the same variance (σ²)
• Pooling = averaging the two s² to estimate σ²
• We almost never pool for means, since we don't know σ
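For reference, "averaging" here usually means a weighted average of the two sample variances, with each weighted by its degrees of freedom. The slides do not write this out, so the symbol $s_p^2$ below is notation introduced here, shown as a sketch of the standard pooled estimate:

```latex
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}
```

With equal sample sizes this reduces to the simple average of the two sample variances.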
Robustness
• Two-sample procedures: more robust
than one-sample procedures
• Most robust with equal sample sizes
(but not necessary!)
A modification has been made to the process for producing a
certain type of film. Since the modification costs extra, it will
be incorporated only if sample data indicate that the
modification decreases the true average development time. At
a significance level of 10%, should the company incorporate
the modification?
Original: 8.6 5.1 4.5 5.4 6.3 6.6 5.7 8.5
Modified: 5.5 4.0 3.8 6.0 5.8 4.9 7.0 5.7
Conditions:
• 2 independent SRS's of film
• Normal probability plots are linear → approximately normal sampling distributions
H0: μO = μM
Ha: μO > μM
Where μO and μM are the true mean developing times for original and modified film
$$t_{12.55} = \frac{\bar{x}_O - \bar{x}_M}{\sqrt{\dfrac{s_O^2}{n_O} + \dfrac{s_M^2}{n_M}}} = \frac{6.3375 - 5.3375}{\sqrt{\dfrac{1.5146^2}{8} + \dfrac{1.0636^2}{8}}} = 1.53$$
p-value = .076, α = .1
Since p-value < α, we reject H0. There is
sufficient evidence that the modification decreases the true mean
development time, so the company should incorporate the modification.
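The summary statistics in this example come straight from the raw development times, so they can be checked with a short Python sketch using only the standard library. The helper name `welch_summary` is made up here; the one-sided p-value still comes from the calculator or a t table.

```python
from math import sqrt
from statistics import mean, stdev

original = [8.6, 5.1, 4.5, 5.4, 6.3, 6.6, 5.7, 8.5]
modified = [5.5, 4.0, 3.8, 6.0, 5.8, 4.9, 7.0, 5.7]

def welch_summary(a, b):
    """Return (t, df) for the unpooled comparison of mean(a) - mean(b)."""
    va = stdev(a) ** 2 / len(a)  # estimated variance of the first sample mean
    vb = stdev(b) ** 2 / len(b)  # estimated variance of the second sample mean
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

t_stat, df = welch_summary(original, modified)
print(round(mean(original), 4), round(stdev(original), 4))
print(round(mean(modified), 4), round(stdev(modified), 4))
print(round(t_stat, 2), round(df, 2))
```

Because Ha is μO > μM, the p-value is the area to the right of the computed t under the t distribution with about 12.55 degrees of freedom.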