Confidence Intervals &
Hypothesis Testing
-3 -2 -
 + +2 +3
Lecture 7-8
© 2010, All Rights Reserved, Robi Polikar.
No part of this presentation may be used without explicit
written permission. Such permission will be given – upon
request – for noncommercial educational purposes only.
Limited permission is hereby granted, however, to post or
distribute this presentation if you agree to all of the
following:
1. you do so for noncommercial educational purposes;
2. the entire presentation is kept together as a whole,
including this entire notice.
3. you include the following link/reference on your site:
Robi Polikar, http://engineering.rowan.edu/~polikar.
ECE 09.360
Dr. P.’s Clinic Consultant Module in
Probability & Statistics in Engineering
Unless indicated otherwise, all cartoons are from
The Cartoon Guide to Statistics by L. Gonick and W. Smith,
Harper Resource, 1993.
Today in P&S
-3 -2 -
 + +2 +3
 Confidence Intervals
 Confidence intervals for population proportions
 Confidence intervals for population means
 Hypothesis testing
 Null hypothesis vs. alternative hypothesis
 A statistician’s cherished values: The -value, the β value, the p-value and all that
jazz…
 We find the defendant guilty of committing a type II error…, your honor!
• Type I and type II error in hypothesis testing
 Next week: Tests of Hypotheses
 Large sample significance tests for proportions
 Large sample tests for population mean
 Small sample tests for population mean
© 2010 All Rights Reserved, Robi Polikar, Rowan University
-3 -2 -
1.
2.
Confidence Intervals
for Population Proportions
 + +2 +3
Compute the probability of success p̂ as the sample proportion (an estimate of the
population proportion) that satisfy certain criteria
•
For example, the rate of defective chips, the percentage of people voting, etc.
Determine the confidence level, α, and the corresponding critical value zα/2.
•
This is the value, to the right of which there is α/2 probability , with another α/2
probability lying on the left of - zα/2 for a total probability of α.
•
Recall: prob. of success p from n trials has binomial dist. whcih can be approximated with
Gaussian distribution with mean np and variance p(1-p). Our estimate p̂, on the other
hand, also has a normal distribution with mean p and variance  p2ˆ  p 1  p  n
3. Compute the 100(1- α )% confidence interval as

p   pˆ  z /2 pˆ    pˆ  z /2


pˆ 1  pˆ  


n

The prob. that p̂ will
exceed p by more than
zα/2 is at most α/2.
The prob. that p̂ will be
short p by more than
zα/2 is at most α/2.
Conf. Level (%)
99
98
95
90
80
Critical Value z/2
2.58
2.33
1.96
1.65
1.28
1-
/2
-z
/2
0
+z
/2All Rights Reserved, Robi Polikar,
/2 Rowan University
© 2010
An Example
• A manufacturer tests 70 items for defects and finds that 52 of them meet specifications. What are the 95% and 99% confidence intervals for the proportion of his entire inventory meeting the specs?
$$\hat p = 52/70 = 0.743$$
$$\sigma_{\hat p}^2 = \frac{\hat p(1-\hat p)}{n} = \frac{0.743 \times 0.257}{70} = 0.00273 \;\Rightarrow\; \sigma_{\hat p} = \sqrt{0.00273} = 0.052$$
$$\alpha = 0.05 \Rightarrow z_{\alpha/2} = 1.96: \quad p = \hat p \pm z_{\alpha/2}\sigma_{\hat p} = 0.743 \pm 1.96 \times 0.052 = 0.743 \pm 0.102 \;\Rightarrow\; p_{95\%} \in [0.641,\ 0.845]$$
$$\alpha = 0.01 \Rightarrow z_{\alpha/2} = 2.58: \quad p = \hat p \pm z_{\alpha/2}\sigma_{\hat p} = 0.743 \pm 2.58 \times 0.052 = 0.743 \pm 0.134 \;\Rightarrow\; p_{99\%} \in [0.609,\ 0.877]$$
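The three-step recipe can be sketched in a few lines of Python. This is only an illustration: the function name `proportion_ci` is our own, and the critical value comes from the standard library's `statistics.NormalDist` instead of the rounded table, so the last digit may differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for a proportion."""
    p_hat = successes / n
    alpha = 1.0 - confidence
    # Critical value z_{alpha/2}: the point with alpha/2 probability to its right.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Estimated std. dev. of p_hat, with p_hat standing in for the unknown p.
    sigma_p_hat = sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * sigma_p_hat, p_hat + z * sigma_p_hat


if __name__ == "__main__":
    # The manufacturer example: 52 of 70 items meet the specs.
    for level in (0.95, 0.99):
        low, high = proportion_ci(52, 70, level)
        print(f"{level:.0%} CI: [{low:.3f}, {high:.3f}]")
```

Raising the confidence level only changes the critical value, so the interval widens around the same p̂.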
A correction
• Recall that our original formulation actually required us to compute the confidence interval as $p = \hat p \pm z_{\alpha/2}\,\sigma_{\hat p}$ with $\sigma_{\hat p} = \sqrt{p(1-p)/n}$, and that we fudged a little and replaced the population standard deviation with the sample standard deviation.
• Because of this fudge, the true coverage of the interval may fall short of the nominal level (say, we in fact get 93% confidence worth of coverage when we think we have 95% confidence).
• So to fix this problem created with our fudge, we fudge again, this time in the opposite direction to be more conservative:
  – Replace the sample size n with ñ = n + 4 and add two successes to the number of successes x, such that the new probability of success is computed as p̃ = (x + 2)/(n + 4). This correction is known as the Agresti-Coull interval.
  – The previous example would then yield:
$$\tilde p = (52+2)/(70+4) = 0.730$$
$$\sigma_{\tilde p}^2 = \frac{0.73 \times 0.27}{74} = 0.00266 \;\Rightarrow\; \sigma_{\tilde p} = \sqrt{0.00266} = 0.052$$
$$p_{95\%} = 0.73 \pm 1.96 \times 0.052 = 0.73 \pm 0.102 \;\Rightarrow\; p_{95\%} \in [0.628,\ 0.832]$$
$$p_{99\%} = 0.73 \pm 2.58 \times 0.052 = 0.73 \pm 0.134 \;\Rightarrow\; p_{99\%} \in [0.596,\ 0.864]$$
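A minimal sketch of the corrected interval, assuming the same "add 2 successes and 4 trials" rule as stated on the slide. The name `adjusted_proportion_ci` is hypothetical, and the critical value is computed rather than read from the table, so results can differ from the hand calculation in the third decimal.

```python
from math import sqrt
from statistics import NormalDist


def adjusted_proportion_ci(successes, n, confidence=0.95):
    """Proportion interval with the slide's correction: x + 2 successes, n + 4 trials."""
    n_adj = n + 4
    p_adj = (successes + 2) / n_adj
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sqrt(p_adj * (1.0 - p_adj) / n_adj)
    return p_adj - half_width, p_adj + half_width


if __name__ == "__main__":
    for level in (0.95, 0.99):
        low, high = adjusted_proportion_ci(52, 70, level)
        print(f"{level:.0%} corrected CI: [{low:.3f}, {high:.3f}]")
```

The correction pulls the center of the interval toward 0.5, which is what makes it the more conservative choice for proportions near 0 or 1.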
-3 -2 -
 + +2 +3
Confidence Interval
for Population Mean: μ
 So far we have looked at the confidence intervals for the proportion of successes,
that is there is a random population distributed binomially, from which we took a
sample of size n. Each r.v. in the experiment has two possible outcomes: success or
failure, with a probability of success p which we tried to estimate.
 Polling: The polled person vote for a particular candidate or not
 Quality control: The product has a defect or not
 Medicine: A treatment plan is successful or not
 How about population means, where the outcome of an experiment has many
potential outcomes, more specifically, a numerical outcome:




The average speed of a chip
The weight/height/BP/HR of a group of people/students/patients
5 year survival rates of patients treated with a certain cancer drug
Average bit error rates in telecommunications
 How can we infer confidence intervals about such quantities…?
© 2010 All Rights Reserved, Robi Polikar, Rowan University
CLT to the rescue…
-3 -2 -
 + +2 +3
 Our calculations of confidence intervals for proportion of success were based on the
assumption that the binomial distribution can be approximated by the Gaussian.
 But, according to CLT, the mean of a random sample from any population can be
approximated with a Gaussian, as long as we have a sufficient sample size
 So the expressions we have derived are pretty much all valid for confidence of means as
well…
 In particular, recall that if X1,…, Xn is a random sample from a distribution with mean value μ and
standard deviation σ. Then , the average of all Xi, that is the sample mean X is normally distributed with
E X    X  
 Then for sufficiently large n ,
s X2   X2   2 n


X 
P(1.96  Z  1.96)  0.95  P  1.96 
 1.96   0.95
 n


 Then again, since we actually do not know σ, we replace that with our best estimate s, the std. dev. of
the sample, and we obtain
s/√n: Sample standard error,


X 
also denoted as SE( X )


P(1.96  Z  1.96)  0.95  P  1.96 
s

n
 1.96   0.95

© 2010 All Rights Reserved, Robi Polikar, Rowan University
Confidence Interval
-3 -2 -
 + +2 +3
 As before, provided that we have a sufficiently large sample size (typically
n>40) for an arbitrary level of confidence (1-)100%, we have

   x  z / 2

s 

n
1-
/2
-z/2
0
/2
+z/2
Confidence Level (%)
99.73
99
98
96
95.45
95
90
80
68.27
50
Critical Value z/2
3.00
2.58
2.33
2.05
2.00
1.96
1.645
1.28
1.00
0.675
Solving for n that would give a specified confidence level and interval is difficult, however, since we have no
idea on what s will be before collecting the data. A conservative guess is usually made, but that requires some
prior knowledge about the data. For example, if we have reason to believe from previous experience that the
data values never fall outside the [LL UL] boundary, and that the data is not too skewed, then a reasonable
estimate for s is ¼ of this (UL-LL) range.
© 2010 All Rights Reserved, Robi Polikar, Rowan University
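As a rough planning sketch: setting the half-width z_{α/2}·s/√n equal to a desired margin and solving for n gives n = (z_{α/2}·s/margin)², with s guessed as (UL - LL)/4. The function name, the `margin` parameter and the numbers in the usage lines are illustrative assumptions, not part of the lecture.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(lower_limit, upper_limit, margin, confidence=0.95):
    """Smallest n whose interval half-width is at most `margin`.

    Uses the rule of thumb s ~ (UL - LL) / 4 as the guess for the
    sample standard deviation before any data are collected.
    """
    s_guess = (upper_limit - lower_limit) / 4.0
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return ceil((z * s_guess / margin) ** 2)


if __name__ == "__main__":
    # Hypothetical planning question: values believed to lie in [100, 220],
    # and we want the mean pinned down to within +/- 3 at 95% confidence.
    print(required_sample_size(100, 220, margin=3.0))
```

Halving the margin roughly quadruples the required n, since n grows with the inverse square of the half-width.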
-3 -2 -
 + +2 +3
Confidence Interval
(Re-explained)
We obtain a sample, compute its mean X and its
std. dev. s X (shown as  X in the figures). We draw a 95%
confidence interval around this mean, as X  1.96sX .
Due to CLT, we know that this sample comes from a dist.
with mean µ and σ=s/√n. Then, we know that 95 out of
100 times, our CI around the sample mean will include
the true mean µ.
One of those 95 cases where CI around the
sample mean indeed includes µ.
One of those 5 cases where CI around the
sample mean does not include µ.
© 2010 All Rights Reserved, Robi Polikar, Rowan University
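The "95 out of 100 times" statement can be checked with a small simulation. The sketch below is an illustration under assumed settings: it draws repeated samples from a normal population (mean 160, std. dev. 20, sample size 100, the values used in the weight example of these slides) and counts how often the interval X̄ ± 1.96·s/√n contains the true mean. The function name, seed and number of trials are our own choices.

```python
import random
from math import sqrt


def coverage_rate(mu=160.0, sigma=20.0, n=100, trials=2000, z=1.96, seed=0):
    """Fraction of simulated samples whose interval contains the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # Sample standard deviation s (n - 1 in the denominator).
        s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        half_width = z * s / sqrt(n)
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"observed coverage: {coverage_rate():.3f}")
```

The observed fraction should land close to 0.95; it will not be exactly 0.95 in any single run, because the count itself is a random quantity.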
Example
-3 -2 -
 + +2 +3
 Here is one sample of size 100 from a group of students’ weights. Unbeknownst
to us, the population is normal with mean weight of 160 lbs and a standard
deviation of 20. These are the parameters we wish to estimate.
From sample data we
compute:
n=100
Sample mean X =157.46
Sample std.dev. s=18.89
Sample Size
We want to find the 90%,
95% and 99% confidence
intervals (=0.1, 0.05 and
0.01, respectively) for the
students’ weight.
s 

   x  z / 2

n

136
136
162
176
153
157
169
180
150
191
115
138
173
164
143
158
141
128
167
174
179
189
136
140
169
169
160
158
174
199
149
161
150
186
189
148
147
146
202
170
166
135
154
165
149
157
170
139
132
197
164
159
135
189
143
151
171
135
139
153
160
137
133
182
155
158
155
165
180
137
139
133
178
146
173
190
118
151
152
155
141
154
160
134
142
147
162
161
132
183
152
179
147
158
135
133
191
152
166
137
z0.1 2  z0.05  1.645    157.46  1.645 * 18.89 10  [154.4 160.6]
z0.05 2  z0.025  1.96    157.46  1.96 * 18.89 10  [153.8 161.2]
z0.01 2  z0.005  2.58    157.46  2.58 * 18.89 10  [152.6 162.3]
© 2010 All Rights Reserved, Robi Polikar, Rowan University
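The same three intervals can be reproduced from the summary statistics alone. A minimal sketch, assuming the quoted values n = 100, x̄ = 157.46 and s = 18.89; `mean_ci` is a hypothetical helper name, and the critical values are computed with `statistics.NormalDist` rather than read from the table.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(x_bar, s, n, confidence=0.95):
    """Large-sample z interval for a population mean from summary statistics."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * s / sqrt(n)  # z_{alpha/2} times the standard error
    return x_bar - half_width, x_bar + half_width


if __name__ == "__main__":
    for level in (0.90, 0.95, 0.99):
        low, high = mean_ci(157.46, 18.89, 100, level)
        print(f"{level:.0%} CI: [{low:.1f}, {high:.1f}]")
```

All three intervals contain the true mean of 160 lbs here, although nothing guarantees that for any one particular sample.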
Small Sample Size
-3 -2 -
 + +2 +3
 So far we have secretly and inconspicuously introduced the phrase “for
sufficiently large sample sizes” into our calculations
 Exactly what is sufficiently large? Depends on the problem, but usually n>40
 What happens if n is not sufficiently large?
 Recall that in calculating the confidence interval we needed to compute,
X 
which included the term σ, a term whose value is unknown to us. So
 n
we replaced it with the standard error, s, the variance of the sample mean.
 While
 In fact,
X 
 n
X 
s n
is indeed normal,
X 
s n
is only approximately normal for large n.
is said to have a student’s t-distribution
© 2010 All Rights Reserved, Robi Polikar, Rowan University
T-Distribution
-3 -2 -
 + +2 +3
 When X is the mean of a random sample of size n from a normal distribution with
mean μ the random variable
X 
T
S n
has a probability distribution called (student’s) t-distribution with n – 1 degrees of
freedom (df).
 For large n the r.v. S will have a value s close to the true σ, however, for small n this
is not the case. Therefore, the t-distribution resembles the normal distribution for
large n but deviates from it for smaller n
std. normal
t-dist., large n
t-dist., small n
© 2010 All Rights Reserved, Robi Polikar, Rowan University
Properties of T-Distribution
-3 -2 -
 + +2 +3
 Let tv denote the density function curve for v degrees of freedom.
1. Each tν curve is bell-shaped and centered at 0.
2. Each tν curve is spread out more than the standard normal-z curve.
3. As ν increases, the spread of the corresponding tν curve decreases.
4. As ν→∞ , the sequence of tν curves approaches the standard normal curve (the z
curve is called a t curve with df =∞)
5. Let t,ν= the number on the measurement axis for which the area under the t
curve with ν df to the right of t,ν is . Then, t,ν is called a t-critical value (which
is the counterpart of the z critical value in normal distribution). For brevity, when
the meaning is obvious, we will drop ν and simply use t just like z
tν curve
/2
-t/2,ν
1-
0
/2
t/2,ν
© 2010 All Rights Reserved, Robi Polikar, Rowan University
Confidence Intervals Using
T-Distribution
-3 -2 -
 + +2 +3
 Then, for smaller sample sizes (where the original distribution is normal),
we can write the confidence interval expression as follows:
 Let x and s be the sample mean and standard deviation computed from the results
of a random sample from a normal population with mean μ. The 100(1-)%
confidence interval is:
s
s 

   x  t 2,n 1 
, x  t 2,n 1 

n
n

s
 x  t 2,n 1 
n
Strictly speaking, the t-distribution applies if and only if the population parameter being
estimated is normally distributed. However, in practice, t-distribution works well, if the
population distribution is only approximately mound-shaped.
© 2010 All Rights Reserved, Robi Polikar, Rowan University
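A small-sample interval can be sketched as follows. Python's standard library has no t quantile function, so this illustration hard-codes a few two-sided 95% critical values t_{0.025,df} copied from a standard t-table; the table excerpt, the function name and the ten measurements are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values t_{0.025, df} from a standard t-table
# (only a handful of df values are included in this sketch).
T_CRITICAL_95 = {4: 2.776, 9: 2.262, 14: 2.145, 19: 2.093, 29: 2.045}


def t_interval_95(sample):
    """95% confidence interval for the mean of a small, roughly normal sample."""
    n = len(sample)
    df = n - 1
    if df not in T_CRITICAL_95:
        raise ValueError(f"no tabulated critical value for df={df}")
    half_width = T_CRITICAL_95[df] * stdev(sample) / sqrt(n)
    x_bar = mean(sample)
    return x_bar - half_width, x_bar + half_width


if __name__ == "__main__":
    # Ten hypothetical measurements (df = 9).
    data = [9.8, 10.2, 10.4, 9.9, 10.0, 10.1, 9.7, 10.3, 10.0, 9.6]
    low, high = t_interval_95(data)
    print(f"95% t interval: [{low:.3f}, {high:.3f}]")
```

With df = 9 the critical value is 2.262 instead of 1.96, so the t interval is about 15% wider than the z interval built from the same s and n.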
Caution!
When not to use z- or t- distributions
-3 -2 -
 + +2 +3
 For confidence interval calculations, the data must be truly random, that is independent and
identically distributed. The following data, which shows some yield strength over time, is
clearly not i.i.d.(why not?). Therefore, neither the normal nor t- dist. approximation is valid.
 If the data are i.i.d., then the sample size must be sufficiently large to justify Gaussian approx.
(say n>40). If that is the case, we are not too concerned about the shape of the underlying
distribution, as CLT says that sample mean will be approximately normal.
 If the sample size is small, then the Gaussian approximation is not valid. In that case, you can
use the t-dist. approximation, but that requires the underlying data to be normal (or at least
approximately normal, say a near bell shape with a single modal). Any data that has an outlier
is unlikely to be (near) normal; t-dist. should not be used with data including outliers.
 Finally, if the true population σ is known, use z-values, not t.
© 2010 All Rights Reserved, Robi Polikar, Rowan University
Homework
-3 -2 -
 + +2 +3
 Read Chapter 5, Sections 5.1 ~ 5.8.
 Problems from Chapter 5
 Section 5.1: 4, 10, 16,
 Section 5.2: 4. Bonus Question: 9 (if you can solve part c)
MIDTERM: Friday, Oct 22, 9 AM
© 2010 All Rights Reserved, Robi Polikar, Rowan University
Hypothesis Testing
-3 -2 -
 + +2 +3
 Estimating the value of a parameter, even along with its confidence interval, has
little meaning, unless we use that information to make a decision.
 The probability that a randomly selected processor from a specific manufacturer will be
flawed is 0.24%±0.01% with a confidence level of 95%...So what…?
 Shall we decide that this is a reliable processor?
 Confidence intervals are most useful in making decisions based on statistical tests
 Given an observation based on a finite random sample, can this observation be
entirely due to chance?
 In HT, we compare two hypotheses against each other and determine whether
we have enough statistical evidence to reject the hypothesis that the observation
is entirely due to chance.
© 2010 All Rights Reserved, Robi Polikar, Rowan University
Hypothesis Testing:
Setting the stage
-3 -2 -
 + +2 +3
 We will start with an example to familiarize our self with the terminology. Note that
any given application can easily be substituted into a number of engineering or nonengineering scenarios:
 As the CEO of Owl Superior Chip Co., you hear the announcement of your competitor
Lentil’s new chips: snor e i7 , and its low cost version cr apler on.
 Lentil declares that their chips, even the low cost versions, are 99.99% defect free (that is,
only 0.01% of their chips are flawed).
 Since you are in this business for quite some time (2 ½ months), you think this is pretty
impressive, if not too good to be true…You are suspicious.
 You know that snor e i7 is pretty reliable, but 99.99% on cr apler on…?
 You suspect that Lentil is cheating in its figures…that the 99.99% is primarily for the snor e
i7 chips, not for the cr apler ons… How to prove?
 You later learn that in estimating the 99.99% figure, they have taken a sample of 80 chips, of
which only 4 were cr apler on…You consider going to court, stating that this is false
advertising!...to which they reply with “…well, we randomly picked 80 chips from a
production run that manufactures equal number of chips of each kind. The fact that there
were only 4 of cr aplons in the sample is purely coincidental. There is no foul play!”
 You say… “well that is crap!”
© 2010 All Rights Reserved, Robi Polikar, Rowan University
By chance…?
-3 -2 -
 + +2 +3
 Other versions of the same scenario:
 A company whose workforce of 80 employees consists of 76 males and 4 females. The
company claims that they do not favor males, and the fact that there are only 4 females is
purely by chance. On the days they were hiring, only men happened to apply – although men
and women are equally likely to apply and be successful in such a position
 Court Drama: Out of a panel of 80 potential jurors, only 4 were African –American, in a
district where 50% of all eligible citizens were AAs.
50% of all eligible employees/jurors/chips are
women / African American / Crapleron
On a random sample of 80 employees/jurors/chips,
only 4 are women / African American / Crapleron !
Could this be the result of pure chance?
© 2010 All Rights Reserved, Robi Polikar, Rowan University
What are the odds?
-3 -2 -
 + +2 +3
 If the selection is really random, and that each group is 50% of the
total population, then the number of women / AAs / Craplerons in the
sample would be the binomial random variable X with p=0.5, and
n=80.
 Thus the chances of getting only 4 women/AAs/Craplerons is P(X=4),
which is
 80  4
 0.5 1  0.576 
4
0.0000000000000000014 !
 You think you have enough statistical evidence to reject Lentil’s claim
that having only 4 craplons in their sample was random or pure
chance. You go to court!
© 2010 All Rights Reserved, Robi Polikar, Rowan University
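The binomial expression is easy to evaluate exactly with the standard library. In this sketch, `binomial_pmf` is our own helper name; in exact arithmetic the single term P(X = 4) comes out near 1.3×10⁻¹⁸, the same order of magnitude as the figure quoted on the slide.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


if __name__ == "__main__":
    # Only 4 Craplerons in a supposedly random sample of 80, with p = 0.5.
    print(f"P(X = 4)  = {binomial_pmf(4, 80, 0.5):.3e}")
    # For comparison: an even 40/40 split is the single most likely outcome.
    print(f"P(X = 40) = {binomial_pmf(40, 80, 0.5):.3e}")
```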
What are the odds?
-3 -2 -
 + +2 +3
 To drive the point home, you argue that this probability is less than the
chances of getting three consecutive royal flushes in poker, or almost the
same as hitting the big jackpot twice in a row!
 Remember? Picking 6 numbers out of 52 in order: 0.000000000068
 Getting 4 Craplerons in sample of 80: 0.0000000000000000014 !
 So the judge rejects Lentil’s claim (hypothesis) of random selection!
© 2010 All Rights Reserved, Robi Polikar, Rowan University
Formal Definitions
-3 -2 -
 + +2 +3
 A statistical hypothesis is a claim about the value or values of one or more
parameters.
 Proportion of defective chips is p<0.01%
 Average SAT math score in NJ is s>500
 Average wattage of a 60W bulb is w=60W
 In any hypothesis testing problem, there are two competing hypothesis
 H0 – Null hypothesis: the protected hypothesis that is initially assumed to be true,
that is, the observations are the result of pure chance
 Ha – Alternative hypothesis: the claim that the null hypothesis is false, that is, the
observations are not by chance, but are the result of a real effect, plus variation.
 The test is to analyze observed data to determine whether there is enough
evidence to reject the null hypothesis in favor of the alternative hypothesis.
 The burden of proof is with the alternative hypothesis. If the data does not
strongly support the Ha claim, then the test fails to reject H0.
© 2010 All Rights Reserved, Robi Polikar, Rowan University
H0 vs. Ha
-3 -2 -
 + +2 +3
 Often we wish to find out whether a new value / a new theory / a new treatment plan
is better then the previous / existing one.
 H0: The claim that the new value/theory/plan is no better than the current one.
 Ha: The alternative claim that the new value/theory/plan is better.
 We only replace the current with the new if there is enough, convincing and
compelling evidence to do so.
 Ex: If in the defective chips example, if we develop a new procedure to fabricate the chips,
we would use it if and only if it produces fewer defects. If the current procedure has
proportion of defective chips as p=0.01
• Ha , on which the burden of proof is placed, is the assertion that the new procedure has
p<0.01. H0 is then the initial and prior claim that p=0.01
 The null hypothesis is always in the form of Ho: θ=θ0 (the null value)
 The alternative hypothesis can be in any of the following three forms:
• Ha: θ > θ0 (which implicitly assumes that Ho: θ≤θ0)
• Ha: θ < θ0 (which implicitly assumes that Ho: θ≥θ0)
• Ha: θ ≠ θ0 (which implicitly assumes that Ho: θ=θ0)
© 2010 All Rights Reserved, Robi Polikar, Rowan University
Choosing an
Appropriate test
-3 -2 -


 + +2 +3
Suppose that a 9V battery – when fresh – is required to provide 9.1 V. As the quality control engineer,
you draw a random sample of size n to determine whether you are in compliance.
You design an experiment where H0 μ = 9.1 V and
a. Ha>9.1V
b. Ha<9.1V
c. Ha≠9.1V
You would choose (a) because, in this formulation H0 indicates non-compliance. As a quality
control engineering, you put the burden of proof on asserting that the specs are satisfied.
If we were to choose the other options, then H0 would indicate compliance, and Ha would then
put the burden of proof on asserting that the batteries are in non-compliance. If you were
challenged in a legal proceeding, however, the alleger would have to choose test (b).

Suppose 5pCi/L is the borderline for radioactivity in water. Which test would you choose?
Choose H0: μ=5pCi/L vs. Ha: μ<5pCi/L  Then the water is believed unsafe unless proven
otherwise, that is the burden of proof is on showing that the water is indeed safe, that is μ<5pCi/L.
Choosing Ha: μ>5pCi/L would mean that the water is safe, unless proven otherwise.

Suppose you manufacture 20 A fuses for home use. If the fuse burns out at < 20 A, then users would
complain fuse burning out prematurely. If fuse burns out at >20 A, then fire may occur due to
malfunctioning fuse. What test should you choose?
Choose H0: μ=20 A vs. Ha: μ≠20 A. Because this time the burden of proof is on showing that fuse
blows out at exactly at 20A. Departure in either way from 20A is equally costly.
© 2010 All Rights Reserved, Robi Polikar, Rowan University
Testing Procedure
-3 -2 -
 + +2 +3
 Step 1: Formulate the hypotheses and determine the null value
The null hypothesis asserting that current / status quo situation is preferred




H0:Lentil’s sample was purely random – there was 50% chance to pick either chip
H0: The new drug will not lower the cholesterol by (more then) 20%
H0: The new engine technology will allow gas mileage of (no more then) 30mph
H0: The defective component ratio of our product is the same as the competitor’s
The alternative hypothesis claiming that the null hypothesis should be
rejected in preference of the new procedure
 Ha: Lentil’s sample was not purely random, but rather it was biased: there was
>50% chance to pick Pantsium in the sample.
 Ha: The new drug will lower the cholesterol by > 20%
 Ha: The new engine technology will allow gas mileage > 30 mph
 Ha: The defective component ratio of our product is < that of the competitor’s.
© 2010 All Rights Reserved, Robi Polikar, Rowan University
Testing Procedure
-3 -2 -
 + +2 +3
 Step 2: Choose a test statistic and the formula for computing it
 A test statistic is a function of the sample data on which the decision – reject H0 or do not
reject H0 – will be based
 This is the statistical value that will asses your evidence against the null hypothesis
• For the random sampling of chips example, the test statistic would be the binomial random
variable with probability of success p=0.5, and the number of trials n=80.
– For applications of the form „proportion of successes‟, the test statistic is
generally the mean of the observed binomial random variable probability of success,
compared with the presumed probability of success (p0, the null value) Note that
for a large enough sample size this random variable is approximately normal
z
pˆ  p0

/ n
pˆ  p0
p0 1  p0  n
• For the gas mileage problem, the test statistic would be the sample mean of the gas mileage
obtained from a normally distributed gas mileages of the cars with the new technology: H0:
μnew_tech=30mpg vs. Ha: μnew_tech> 30mpg.
– For all applications of the form “average value”, the test statistic is generally the
mean of the random sample (sample mean) compared to presumed average (μ0, the
null value). Note that from CLT, for a sufficiently large sample size, this statistic
x  0
will also be approximately normal.
z
/ n
© 2010 All Rights Reserved, Robi Polikar, Rowan University
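Both test statistics are one-liners in code. A minimal sketch; the function names are our own. The usage lines plug in numbers that appear elsewhere in these slides (4 Craplerons out of 80 with p0 = 0.5, and a sample mean of 115.2 ft against μ0 = 120 ft with σ = 10 ft, n = 36).

```python
from math import sqrt


def z_proportion(p_hat, p0, n):
    """z statistic for an observed proportion against the null value p0."""
    return (p_hat - p0) / sqrt(p0 * (1.0 - p0) / n)


def z_mean(x_bar, mu0, sigma, n):
    """z statistic for a sample mean against the null value mu0."""
    return (x_bar - mu0) / (sigma / sqrt(n))


if __name__ == "__main__":
    print(f"chip sample:      z = {z_proportion(4 / 80, 0.5, 80):.2f}")
    print(f"braking distance: z = {z_mean(115.2, 120.0, 10.0, 36):.2f}")
```

A z value of about -8 for the chip sample says the observed proportion sits eight standard errors below what random selection would predict.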
Testing Procedure
-3 -2 -
 + +2 +3
 Step 3: State the rejection region for a selected significance level 
 The rejection region is the set of all test statistic values for which H0 will be
rejected.
• For the Lentil’s random sampling, we may want to reject their hypothesis if the
probability of selecting 4 Craplons at random is less then a specific value. In the
previous example, the de-facto rejection (for the judge) was the probability of three
royal flushes in a row or hitting the jackpot twice in a row.
• For the gas mileage example, we may choose the rejection as average gas mileage being
greater then 35 mph.
– Note that since the H0 is the default hypothesis, we need convincing and
compelling argument to reject it. Therefore, the rejection region usually
picked in such a manner to give H0 plenty of “benefit of the doubt”
• The  value is the confidence we wish to have in our rejection region. For example a
95% confidence for the car example, would mean that after observing a large number of
cars with the new technology, on average, 95% will have a gas mileage 35 (or higher).
© 2010 All Rights Reserved, Robi Polikar, Rowan University
Testing Procedure
-3 -2 -
 + +2 +3
 Step 4:Compute the sample quantities and decide whether H0
should be rejected
 For the random sampling example, we compute the probability
P(X≤4|p0=0.5, n=80)
 For the car example, we compute P( ≥35 |μ0=30, σ=…, n=…)
x
• We then compare these values to rejection region at the specified confidence level.
 A commonly used figure of merit is the p-value,
which answers the following question:
 If the null hypothesis were true, then what is the probability of observing a test
statistic as extreme as the one we observed ?
 The smaller the p-value, the stronger the evidence against the null hypothesis.
 If the p-value is less then a threshold, corresponding to the rejection region,
then we agree that there is statistically compelling evidence against H0.
 For the random sampling example, p=1.4x10-18, we have enough evidence to rule
out Lentil’s claim that having only 4 Craplerons in their sample was purely
coincidental!
© 2010 All Rights Reserved, Robi Polikar, Rowan University
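For the random sampling example the p-value is a lower-tail binomial probability, which can be summed exactly. The sketch below is illustrative: the helper names and the threshold α = 0.001 used in the usage lines are assumptions, not values fixed by the lecture.

```python
from math import comb


def binomial_lower_tail(k, n, p0):
    """P(X <= k) for X ~ Binomial(n, p0): the p-value of a lower-tailed test."""
    return sum(comb(n, i) * p0**i * (1.0 - p0) ** (n - i) for i in range(k + 1))


def reject_null(p_value, alpha):
    """Reject H0 when the p-value falls below the chosen significance level."""
    return p_value < alpha


if __name__ == "__main__":
    p_value = binomial_lower_tail(4, 80, 0.5)
    print(f"p-value = {p_value:.2e}")
    # alpha = 0.001 is a hypothetical, deliberately strict threshold.
    print("reject H0" if reject_null(p_value, 0.001) else "do not reject H0")
```

Summing the whole tail rather than the single term P(X = 4) is what "at least as extreme" means in the definition of the p-value.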
Errors in
Hypothesis Testing
-3 -2 -
 + +2 +3
 Can we make errors despite being over cautious and giving H0 plenty of
benefit of the doubt…?
 Of course, in fact, there are two types of errors we can make. To make the point,
think of the fire detector in your house, and how often it goes off if you make the
toast little too dark!
 Well, this is called Type I error: An alarm without a fire (false alarm)!
 Every cook knows how to avoid a type I error: Just remove the batteries!
 But then this can cause a fire going undetected – and this is called Type II error :
A fire without an alarm (missed alarm) !
 Similarly, we can reduce the chance of Type II error by increasing the
sensitivity of the sensor, but then again, that increases the probability of
Type I error.
© 2010 All Rights Reserved, Robi Polikar, Rowan University
Errors in
Hypothesis Testing
-3 -2 -
 + +2 +3
 We can put these observations in a table, called the decision table.
 Now consider the null hypothesis that there is no fire, and Ha: FIRE!. The
alarm, then corresponds to rejection of the null hypothesis.
 Statistically speaking:
 A type I error is committed if we reject the null hypothesis when in fact it was true
 A type II error is committed if we fail to reject the null hypothesis, when in fact it
was false.
α: Type I Error
• Examples:
  – For the car example, let’s suppose we observed 50 cars and checked their gas mileage. It is possible that the average gas mileage of those 50 cars was, say, 35.7 mpg, when in fact the true average is below 35. Then by rejecting H0, a type I error is made.
  – On the other hand, it is also possible that the average gas mileage of those 50 cars was, say, 34.6 mpg, when in fact the true average was above 35. Then by not rejecting H0, a type II error is made.
• Note that the significance level α we mentioned earlier is the probability of committing a type I error:
  – P(rejecting H0 | H0 is true) = P(type I error | H0) = α
  – Then, with 100(1-α)% confidence, we claim that the observed outcome is statistically very unlikely under H0, and hence reject H0. The lower the α, the higher the confidence we have in rejecting H0, hence the lower the probability of committing a type I error.
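That α really is the type I error rate can be illustrated by simulation: generate many samples for which H0 is true and count how often a lower-tailed z-test rejects anyway. The settings below (μ0 = 120, σ = 10, n = 36, taken from the braking example in these slides, plus α = 0.05, 5000 trials and a fixed seed) are illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def type_one_error_rate(mu0=120.0, sigma=10.0, n=36, alpha=0.05,
                        trials=5000, seed=1):
    """Fraction of samples drawn under H0 that a lower-tailed z-test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(alpha)  # lower-tail critical value (negative)
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: the data really have mean mu0.
        x_bar = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (x_bar - mu0) / (sigma / sqrt(n))
        if z < z_crit:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print(f"observed type I error rate: {type_one_error_rate():.3f}")
```

The observed rate should hover around the chosen α; every one of those rejections is a false alarm.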
Type II error
• But sometimes we are interested in the type II error: is our alarm sensitive enough?
  – In the past, factories discharging chemicals into waterways were required to show that the discharge had no effect on the downstream wildlife. This is H0. The factory could continue as long as H0 was not rejected at the 0.05 significance level.
  – So a polluter, suspecting that he is in violation of EPA standards, could devise an ineffective pollution monitoring program, such as “let’s ask the ducks!”
• Type I error: Reject H0 when it is true (shut down the factory, when in fact its discharge really has no effect on wildlife).
• Type II error: Accept H0 when it is false (the factory continues, when in fact it is decimating the wildlife).
  – Such a test, say “interviewing the ducks”, is equivalent to removing the batteries from the fire detector. Both are designed to reduce (remove) type I error.
  – Of course, such a test greatly increases the probability of committing a type II error, that is, accepting the H0 that the factory discharge is harmless when in fact it is not.
β: Type II error
• Just like we limit our probability of committing a type I error using a significance level of α, we can also limit our probability of making a type II error.
  – We define: β = P(accepting H0 | Ha is true) = P(type II error | Ha)
  – Thus β is the probability of making a type II error. The lower the β, the more confident we are of not committing a type II error.
  – Again, just like our confidence in not making a type I error is 1-α, our confidence in not making a type II error is 1-β, which is called the power of a hypothesis test.
  – Note that the two types of error, type I and type II, are always in competition. For a fixed sample size, reducing one increases the other.
• Of course, we’re happy to report that the environmental regulations have changed since then, requiring pollution monitoring programs to show that they have a high probability of detecting serious pollution events, that is, a very small β, which reveals any hidden flaws in the monitoring program.
A Complete Example
• A new design for braking systems is proposed. For the current system, the true average braking distance at 40 mph is 120 ft. The new system is to be implemented if there is substantial evidence that it will reduce the braking distance significantly.
  a. What is the parameter of interest, and what are the appropriate hypotheses to test the new system?
  b. Suppose the new system’s braking distance has σ = 10 ft. Let X̄ be the sample average braking distance of the new system for 36 observations. Which rejection region is most appropriate?
     R1: x̄ > 124.80,  R2: x̄ < 115.20,  R3: {x̄ > 125.13 or x̄ < 114.87}
  c. What is the significance level for the appropriate region selected above? How would we change the region to obtain a 99.9% confidence level?
  d. What is the probability that the new design is NOT implemented when its true average braking distance is actually 115 ft and the appropriate region from above is used?
  e. Let $Z = \frac{\bar X - 120}{\sigma/\sqrt{n}}$. What is the significance level for the rejection region z < -2.33? How about z < -2.88?
Solution
a. Let μ = true average braking distance for the new design at 40 mph. We want the burden of proof to be on showing that the new braking distance is lower. Then H0: μ = 120 vs. Ha: μ < 120.
b. We want to give the null hypothesis the benefit of the doubt. Therefore, we need significant evidence that the new average distance is substantially less than that of the existing one. Therefore, we should choose R2: reject H0 if x̄ < 115.2 (< 120).
c. Recall that the significance level is the probability of a type I error, that is, rejecting H0 when in fact we shouldn’t. We will reject H0 if the observed average is < 115.2. The significance level α is then the area to the left of 115.2 under the normal curve with mean 120 (the assumed average for the existing system, hence H0):
$$\alpha = P(\bar x < 115.2 \mid \mu = 120) = P\left(z < \frac{\bar x - \mu}{\sigma/\sqrt{n}}\right) = P\left(z < \frac{115.2 - 120}{10/\sqrt{36}}\right) = P(z < -2.88) = 0.002 \;\Rightarrow\; 99.8\%\ \text{confidence}$$
[Figure: normal curve centered at 120; the shaded area α lies to the left of 115.2, the remaining area is 1-α.]
Now, if we want α = 0.001 (that is, confidence increased to 99.9%), then we should expect a smaller rejection region. From the Gaussian tables, the z-value that gives a shaded area of 0.001 is -3.08. Then the new rejection region threshold c is:
$$\frac{c - 120}{10/6} = -3.08 \;\Rightarrow\; c = 114.87, \qquad P\left(z < \frac{114.87 - 120}{1.667}\right) = 0.001$$
[Figure: the same curve with the threshold moved from 115.2 to 114.87, leaving a shaded area of 0.001 and a remaining area of 1-α.]
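The two calculations in part (c) can be cross-checked numerically. A minimal sketch, assuming the problem's values μ0 = 120, σ = 10 and n = 36; the function names are our own. Because the critical value is computed rather than read from a rounded table, the threshold comes out near 114.85 instead of the table-based 114.87.

```python
from math import sqrt
from statistics import NormalDist

MU0, SIGMA, N = 120.0, 10.0, 36
STD_ERR = SIGMA / sqrt(N)  # 10 / 6, about 1.667


def significance_level(threshold):
    """alpha = P(x_bar < threshold | mu = MU0) for the lower-tailed region."""
    return NormalDist(MU0, STD_ERR).cdf(threshold)


def threshold_for(alpha):
    """Rejection threshold c such that P(x_bar < c | mu = MU0) = alpha."""
    return NormalDist(MU0, STD_ERR).inv_cdf(alpha)


if __name__ == "__main__":
    print(f"alpha for x_bar < 115.2: {significance_level(115.2):.4f}")
    print(f"threshold for alpha = 0.001: {threshold_for(0.001):.2f}")
```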
Solution – Cont.
d. What is the probability that the new design is NOT implemented when its true average braking distance is actually 115 ft and the appropriate region from above is used?
  ▪ Now, if we are not implementing the new design, then we must have failed to reject H0 (presumably because we think we do not have enough evidence). But in fact the true average distance for the new design is 115, which is less than 120, so H0 is false. Clearly, we are committing a type II error (failing to reject H0 when it should have been rejected).
  ▪ According to our rejection region, we will not reject H0 if the observed average braking distance is > 115.2. Therefore, we are looking at the probability of the observed average braking distance being greater than 115.2, when in fact the observed sample is drawn from a population that has a mean of 115:
$$\beta(115) = P(\bar x > 115.20 \mid \text{true } \mu = 115) = P\left(z > \frac{115.2 - 115}{1.6667}\right) = P(z > 0.12) = 0.4522$$
[Figure: normal curve centered at 115; β is the area to the right of 115.2.]
e. For $Z = \frac{\bar X - 120}{\sigma/\sqrt{n}}$, z is normal, therefore
$$\alpha = P(z < -2.33) = 0.01$$
$$\alpha = P(z < -2.88) = 0.002$$
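The type II error probability in part (d) follows the same pattern, except that the curve is now centered at the true mean instead of the null value. A sketch under the problem's assumptions (threshold 115.2, σ = 10, n = 36); the function name `type_two_error` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_two_error(true_mean, threshold=115.2, sigma=10.0, n=36):
    """beta = P(x_bar > threshold | mu = true_mean): H0 is not rejected."""
    return 1.0 - NormalDist(true_mean, sigma / sqrt(n)).cdf(threshold)


if __name__ == "__main__":
    beta = type_two_error(115.0)
    print(f"beta(115) = {beta:.4f}, power = {1.0 - beta:.4f}")
    # The further the true mean lies below the threshold, the smaller beta gets.
    print(f"beta(110) = {type_two_error(110.0):.4f}")
```

A β of about 0.45 means this test would miss a real 5 ft improvement almost half the time, which is the price paid for such a small α.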
No new Homework
(for week of 10/14)
-3 -2 -
 + +2 +3
© 2010 All Rights Reserved, Robi Polikar, Rowan University