Chapter 20 Testing Hypotheses about Proportions

Example: Metal Manufacturer

Ingots: 20% defective (cracks)
After changes in the casting process: a sample of 400 ingots, only 17% defective
Is this a result of natural sampling variability, or has the cracking rate actually been reduced?
Hypotheses

We begin by assuming that a hypothesis is true (as in a jury trial).
Data consistent with the hypothesis:
  Retain the hypothesis.
Data inconsistent with the hypothesis:
  We ask whether they are unlikely beyond reasonable doubt.
If the results seem consistent with what we would expect from natural sampling variability, we retain the hypothesis. But if the probability of seeing results like our data is really low, we reject the hypothesis.
Testing Hypotheses

Null hypothesis H0
  Specifies a population model parameter of interest and proposes a value for this parameter.
  Usually:
    No change from the traditional value
    No effect
    No difference
In our example, H0: p = 0.20.
How likely is it to get p̂ = 0.17 from sampling variability alone?
Testing Hypotheses (cont.)

Under H0 (p = 0.20), the sampling distribution of p̂ is approximately Normal with
$$SD(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.20)(0.80)}{400}} = 0.02$$
The observed proportion is p̂ = 0.17, so
$$z = \frac{\hat{p} - p}{SD(\hat{p})} = \frac{0.17 - 0.20}{0.02} = -1.5$$
How likely is it to observe a value at least 1.5 standard deviations below the mean of a Normal model?
$$P(z < -1.5) \approx 0.067$$

Management must decide whether an event that would happen 6.7% of the time by chance is strong enough evidence to conclude that the true cracking proportion has decreased.
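The ingot calculation above can be checked with a short Python sketch. It uses only the standard library; `one_proportion_z` is a hypothetical helper introduced here, and `statistics.NormalDist` supplies the standard Normal model.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(p_hat, p0, n):
    """z-statistic for H0: p = p0, using SD(p-hat) = sqrt(p0 * q0 / n) under H0."""
    sd = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / sd

# Ingot example: H0: p = 0.20, a sample of 400 ingots with 17% cracked
z = one_proportion_z(0.17, 0.20, 400)
p_value = NormalDist().cdf(z)  # lower-tail probability P(Z < z)

print(f"z = {z:.2f}, P-value = {p_value:.3f}")  # prints: z = -1.50, P-value = 0.067
```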
A Trial as a Hypothesis Test

The jury’s null hypothesis is
  H0: innocent
If the evidence is too unlikely given this assumption, the jury rejects the null hypothesis and finds the defendant guilty. But if there is insufficient evidence to convict the defendant, the jury does not decide that H0 is true and declare the defendant innocent. Juries can only fail to reject the null hypothesis and declare the defendant “not guilty”.
The Reasoning of Hypothesis Testing

Hypotheses
  To perform a hypothesis test, we must also specify an alternative hypothesis, HA.
  Remember that we can never prove a null hypothesis; we can only reject it or retain it. If we reject it, we then accept the alternative.

Example: Pepsi or Coke
  p: proportion preferring Coke
  H0: p = 0.50
  HA: p ≠ 0.50
The Reasoning of Hypothesis Testing (cont.)

Plan
  Specify the model and test you will use (proportions, means).
  We call this test about the value of a proportion a one-proportion z-test.
Mechanics
  The actual calculation of a test statistic from the data.
  P-value: the probability of observing a statistic value at least as extreme as the one observed if the null model were correct. If the P-value is small enough, we reject the null hypothesis.
The Reasoning of Hypothesis Testing (cont.)

Conclusion
  The conclusion in a hypothesis test is always a statement about the null hypothesis. The conclusion must state either that we reject or that we fail to reject the null hypothesis.
Alternatives

Two-sided alternative
  HA: p ≠ 0.50 (Pepsi vs. Coke)
  The P-value is the probability of deviating in either direction from the null hypothesis value.
One-sided alternative
  H0: p = 0.20
  HA: p < 0.20 (Ingots)
  The P-value is the probability of deviating only in the direction of the alternative, away from the null hypothesis value.
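To see how the choice of alternative changes the P-value, here is a minimal sketch that turns a z-statistic into a one-sided or two-sided P-value under the standard Normal model. The `alternative` labels ("less", "greater", "two-sided") are names chosen for this sketch, not notation from the slides.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """P-value for a z-statistic under a standard Normal null model.

    alternative: "less" (HA: p < p0), "greater" (HA: p > p0),
    or "two-sided" (HA: p != p0).
    """
    std_normal = NormalDist()
    if alternative == "less":
        return std_normal.cdf(z)                 # lower tail only
    if alternative == "greater":
        return 1 - std_normal.cdf(z)             # upper tail only
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))  # both tails
    raise ValueError(f"unknown alternative: {alternative!r}")

# Ingot z-statistic from the example above
z = -1.5
one_sided = p_value(z, "less")
two_sided = p_value(z, "two-sided")
```

For z = −1.5 the two-sided P-value is twice the one-sided one (about 0.134 versus 0.067), so the same data give weaker evidence against H0 under a two-sided alternative.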
Exercises

Page 467: #1, #3, #20
Chapter 21 More About Tests

Example: Therapeutic Touch (TT)

One-proportion z-test
  15 TT practitioners, 10 trials each
  H0: p = 0.50
  HA: p > 0.50
  Conditions: Random Sampling, Independence, 10% Condition, Success/Failure Condition
  Observed proportion: 0.467
  Find the P-value…
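A sketch of the P-value calculation the slide asks for, assuming the 15 × 10 = 150 trials can be treated as one sample of independent trials (the conditions listed above are what would have to justify this). Variable names are chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist

# Assumption: 15 practitioners x 10 trials each, treated as n = 150 independent trials
n = 15 * 10
p0 = 0.50      # H0: p = 0.50 (no better than guessing)
p_hat = 0.467  # observed proportion of correct identifications

sd = sqrt(p0 * (1 - p0) / n)       # SD(p-hat) under H0
z = (p_hat - p0) / sd
p_value = 1 - NormalDist().cdf(z)  # HA: p > 0.50, so use the upper tail
```

Because the observed proportion falls below 0.50 while HA points upward, the upper-tail P-value is large (roughly 0.79), so these data give no evidence that practitioners do better than guessing.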
How to Think About P-values

A P-value is a conditional probability. It is the probability of the observed statistic value (or an even more extreme one) given that the null hypothesis is true.
P-value = P(observed statistic value or more extreme | H0)
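Because a P-value is a probability computed under H0, it can also be found without the Normal model when the null distribution of the count is known exactly. The sketch below swaps in an exact binomial tail (not the method used on these slides) for the ingot example and compares it with the Normal value; `binomial_lower_tail` is a hypothetical helper.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): a count this low or lower, given H0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Ingot example under H0: p = 0.20, n = 400, observed 17% = 68 cracked ingots
n, p0 = 400, 0.20
observed = round(0.17 * n)

exact = binomial_lower_tail(observed, n, p0)
# Normal model for the count: mean n*p0, SD sqrt(n*p0*q0)
normal_approx = NormalDist(mu=n * p0, sigma=sqrt(n * p0 * (1 - p0))).cdf(observed)
```

The two values differ slightly because the Normal model only approximates the discrete count of cracked ingots; both estimate P(count this low or lower | H0).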
Alpha Levels

When the P-value is small, it tells us that our data are rare given the null hypothesis.
We can define a “rare event” arbitrarily by setting a threshold for our P-value.
If our P-value falls below that point, we’ll reject the null hypothesis. We call such results “statistically significant”. The threshold is called an alpha level or significance level.
Alpha Levels (cont.)

Common alpha levels: α = 0.10, α = 0.05, α = 0.01
[Figure: rejection regions for a one-sided and a two-sided test]
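Each alpha level corresponds to a critical value z* that marks the edge of the rejection region. A short sketch using Python's `statistics.NormalDist.inv_cdf`: in a one-sided test all of α sits in one tail, while in a two-sided test it is split, α/2 in each tail.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard Normal model: mean 0, SD 1

critical = {}
for alpha in (0.10, 0.05, 0.01):
    one_sided = std_normal.inv_cdf(1 - alpha)      # all of alpha in one tail
    two_sided = std_normal.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    critical[alpha] = (one_sided, two_sided)
    print(f"alpha = {alpha:.2f}: one-sided z* = {one_sided:.3f}, "
          f"two-sided z* = {two_sided:.3f}")
```

A one-sided test rejects H0 when z falls beyond z* in the direction of HA; a two-sided test rejects when |z| exceeds its z*. For α = 0.05 these cutoffs are about 1.645 and 1.96.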
Making Errors

Type I error
  The null hypothesis is true, but we mistakenly reject it.
Type II error
  The null hypothesis is false, but we fail to reject it.
Types of Errors

Examples
  Medical disease testing
    Type I: false positive
    Type II: false negative
  Jury trial
    Type I: convicting an innocent defendant
    Type II: acquitting a guilty defendant
Probabilities of Errors

To reject H0, the P-value must fall below α. When H0 is true, that happens with probability exactly α, so when you choose the level α, you are setting the probability of a Type I error to α.
When H0 is false and we fail to reject it, we have made a Type II error. We assign the letter β to the probability of this mistake.
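The claim that choosing α sets the Type I error rate can be checked by simulation: draw many samples from a population where H0 is true and count how often a two-sided one-proportion z-test rejects. This is a sketch; the function name, seed, and number of repetitions are arbitrary choices, and p0 = 0.20, n = 400 are borrowed from the ingot example.

```python
import random
from math import sqrt
from statistics import NormalDist

def simulate_type_i_rate(p0, n, alpha, reps, seed=1):
    """Estimate P(reject H0 | H0 true) for a two-sided one-proportion z-test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    sd = sqrt(p0 * (1 - p0) / n)  # SD(p-hat) under H0
    rejections = 0
    for _ in range(reps):
        # Sample n trials from a population in which H0 (p = p0) really holds
        successes = sum(rng.random() < p0 for _ in range(n))
        z = (successes / n - p0) / sd
        if abs(z) >= z_crit:
            rejections += 1  # a Type I error, since H0 is true here
    return rejections / reps

rate = simulate_type_i_rate(p0=0.20, n=400, alpha=0.05, reps=4000)
```

The estimated rate comes out close to α = 0.05 rather than exactly equal to it, because the simulation has random error and the Normal model only approximates the discrete count.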
Power

The power of a test is the probability that it correctly rejects a false null hypothesis. When the power is high, we can be confident that we’ve looked hard enough.
We know that β is the probability that a test fails to reject a false null hypothesis, so the power of the test is the complement, 1 − β.
When we calculate power, we have to imagine that the null hypothesis is false. The value of the power depends on how far the truth lies from the null hypothesis value. We call this distance between the null hypothesis value p0 and the true value p the effect size.
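Power can be computed once an effect size is assumed. The sketch below does this for the one-sided ingot test (H0: p = 0.20, HA: p < 0.20, n = 400, α = 0.05), assuming hypothetical true cracking rates of 0.17 and 0.15; `power_lower_tail` is a helper introduced here. It finds the cutoff for p̂ under H0, then asks how often p̂ falls below that cutoff when the hypothetical true p holds.

```python
from math import sqrt
from statistics import NormalDist

def power_lower_tail(p0, p_true, n, alpha):
    """Power of the one-sided test H0: p = p0 vs HA: p < p0 when the truth is p_true."""
    std_normal = NormalDist()
    # Rejection cutoff for p-hat, computed under H0
    cutoff = p0 - std_normal.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    # Probability that p-hat lands below the cutoff when p_true is the truth
    sd_true = sqrt(p_true * (1 - p_true) / n)
    return std_normal.cdf((cutoff - p_true) / sd_true)

# Hypothetical true cracking rates (assumptions, not data from the slides)
power_small_effect = power_lower_tail(0.20, 0.17, 400, 0.05)  # effect size 0.03
power_large_effect = power_lower_tail(0.20, 0.15, 400, 0.05)  # effect size 0.05
```

Under these assumptions the power is roughly 0.44 for a true rate of 0.17 and roughly 0.83 for 0.15 (so β is about 0.56 and 0.17), illustrating that a larger effect size gives higher power.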
Chapter 22 Comparing Two Proportions

Recall (Ch. 16)

The variance of the sum or difference of two independent random quantities is the sum of their individual variances:
$$Var(X \pm Y) = Var(X) + Var(Y)$$
$$SD(X \pm Y) = \sqrt{Var(X) + Var(Y)}$$

Example of the cereals
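Comparing Two Proportions (cont.)

The standard deviation of the difference between two proportions:
$$SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
For proportions estimated from the data, the standard error is
$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$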
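The formula for SD(p̂1 − p̂2) follows from adding the variances of two independent quantities. A quick simulation sketch makes this concrete; the population proportions 0.40 and 0.30 and sample sizes 100 and 150 are hypothetical values chosen only for illustration, and the seed and repetition count are arbitrary.

```python
import random
from math import sqrt
from statistics import pstdev

def sd_difference(p1, n1, p2, n2):
    """SD(p1_hat - p2_hat) for independent samples: sqrt(p1 q1 / n1 + p2 q2 / n2)."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def simulated_sd_difference(p1, n1, p2, n2, reps=5000, seed=2):
    """Empirical SD of p1_hat - p2_hat over many pairs of independent samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        p1_hat = sum(rng.random() < p1 for _ in range(n1)) / n1
        p2_hat = sum(rng.random() < p2 for _ in range(n2)) / n2
        diffs.append(p1_hat - p2_hat)
    return pstdev(diffs)

# Hypothetical populations and sample sizes
theory = sd_difference(0.40, 100, 0.30, 150)
simulated = simulated_sd_difference(0.40, 100, 0.30, 150)
```

The simulated standard deviation should land within a few percent of the formula value (about 0.062 here).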
Assumptions and Conditions

Random Sampling
10% Condition
Independent Samples Condition
  The two groups we are comparing must also be independent of each other (usually evident from the way the data are collected).
  Example: the same group of people before and after a treatment are not independent.
Success/Failure Condition in each sample
The Sampling Distribution

The sampling distribution for a difference between two independent proportions:
Provided that the assumptions and conditions are met, the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is modeled by a Normal model with mean $\mu = p_1 - p_2$ and standard deviation
$$\sigma = SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
A two-proportion z-interval

When the conditions are met, we are ready to find the confidence interval for the difference of two proportions, $p_1 - p_2$, using the standard error of the difference
$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$
The interval is
$$(\hat{p}_1 - \hat{p}_2) \pm z^* \times SE(\hat{p}_1 - \hat{p}_2)$$
The critical value z* depends on the particular confidence level.
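A sketch of the two-proportion z-interval, applied to the snoring figures from the example later in this chapter (over 30: 39.2% of 811; under 30: 26.1% of 184) at an assumed 95% confidence level. `two_proportion_z_interval` is a hypothetical helper; z* comes from `NormalDist.inv_cdf`.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_interval(p1_hat, n1, p2_hat, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value z*
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se

# Snoring figures: group 1 = over 30, group 2 = under 30
low, high = two_proportion_z_interval(0.392, 811, 0.261, 184)
```

Under these assumptions the interval runs from about 0.059 to 0.203.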
Exercises

Two-proportion z-interval (pages 493, 496)
Example

Snoring
  Random sample of 1010 adults
  Of the 995 respondents, 37% snored at least a few nights a week.
  Splitting into two age categories:
    Under 30: 26.1% of 184
    Over 30: 39.2% of 811
Is the difference of 13.1 percentage points real, or is it due only to sampling variability?
Example (cont. snoring)

H0: $p_1 - p_2 = 0$
$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$
But p1 and p2 are linked by H0: $p_1 = p_2$.
Pooling: combining the counts to get an overall proportion
$$\text{Success}_1 = n_1 \hat{p}_1, \qquad \text{Success}_2 = n_2 \hat{p}_2$$
$$\hat{p}_{pooled} = \frac{\text{Success}_1 + \text{Success}_2}{n_1 + n_2} = \frac{48 + 318}{184 + 811} = \frac{366}{995} \approx 0.3678$$
Two-Proportion z-test

The conditions for the two-proportion z-test are the same as for the two-proportion z-interval. We are testing the hypothesis
  H0: $p_1 = p_2$
Because we hypothesize that the proportions are equal, we pool them to find
$$\hat{p}_{pooled} = \frac{\text{Success}_1 + \text{Success}_2}{n_1 + n_2}$$
and we use the pooled value to estimate the standard error:
$$SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_{pooled}\hat{q}_{pooled}}{n_1} + \frac{\hat{p}_{pooled}\hat{q}_{pooled}}{n_2}}$$
Two-Proportion z-test (cont.)

Now we find the test statistic
$$z = \frac{\hat{p}_1 - \hat{p}_2}{SE_{pooled}(\hat{p}_1 - \hat{p}_2)}$$
When the conditions are met and the null hypothesis is true, this statistic follows the standard Normal model, so we can use that model to obtain a P-value.
Example (cont. snoring)

Conditions: Randomization, Independent Samples Condition, 10% Condition, Success/Failure
$$SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{(0.3678)(0.6322)}{811} + \frac{(0.3678)(0.6322)}{184}} \approx 0.0394$$
$$\hat{p}_1 - \hat{p}_2 = 0.392 - 0.261 = 0.131$$
The P-value is the probability of observing a difference at least as large as 0.131 in either direction if H0 is true.
$$z = \frac{0.131 - 0}{0.0394} \approx 3.33$$
The two-sided P-value is about 0.0009. This is rare enough, so we reject the null hypothesis and conclude that there is a difference in the snoring rate between these two age groups.
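The full snoring test can be reproduced in a few lines. This sketch works from the counts in the pooling step (318 of 811 over 30, 48 of 184 under 30) rather than from rounded percentages; `two_proportion_z_test` is a hypothetical helper and the test is two-sided.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(successes1, n1, successes2, n2):
    """Two-sided two-proportion z-test of H0: p1 = p2 using the pooled proportion."""
    p1_hat, p2_hat = successes1 / n1, successes2 / n2
    p_pooled = (successes1 + successes2) / (n1 + n2)
    q_pooled = 1 - p_pooled
    se_pooled = sqrt(p_pooled * q_pooled / n1 + p_pooled * q_pooled / n2)
    z = (p1_hat - p2_hat) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # both tails
    return p_pooled, se_pooled, z, p_value

# Snoring: group 1 = over 30 (318 of 811), group 2 = under 30 (48 of 184)
p_pooled, se, z, p_value = two_proportion_z_test(318, 811, 48, 184)
```

With the counts, z comes out near 3.33 and the two-sided P-value is below 0.001.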
Exercise

Page 508 #16

Homework #5

Page 423: #8, 16
Page 443: #12, 18
Page 467: #2, 4, 6, 12
Page 491: #20