Confidence Intervals & Hypothesis Testing
Lecture 7–8
ECE 09.360 – Dr. P.'s Clinic Consultant Module in Probability & Statistics in Engineering

© 2010, All Rights Reserved, Robi Polikar. No part of this presentation may be used without explicit written permission. Such permission will be given – upon request – for noncommercial educational purposes only. Limited permission is hereby granted, however, to post or distribute this presentation if you agree to all of the following: 1. you do so for noncommercial educational purposes; 2. the entire presentation is kept together as a whole, including this entire notice; 3. you include the following link/reference on your site: Robi Polikar, http://engineering.rowan.edu/~polikar.
Unless indicated otherwise, all cartoons are from The Cartoon Guide to Statistics by L. Gonick and W. Smith, 1993, Harper Resource.

Today in P&S

Confidence intervals
 • Confidence intervals for population proportions
 • Confidence intervals for population means
Hypothesis testing
 • Null hypothesis vs. alternative hypothesis
 • A statistician's cherished values: the α value, the β value, the p-value, and all that jazz…
 • Type I and type II errors in hypothesis testing ("We find the defendant guilty of committing a type II error, your honor!")
Next week: Tests of hypotheses
 • Large sample significance tests for proportions
 • Large sample tests for the population mean
 • Small sample tests for the population mean

Confidence Intervals for Population Proportions

1. Compute the probability of success p̂ as the sample proportion (an estimate of the population proportion) satisfying the criterion of interest.
 • For example, the rate of defective chips, the percentage of people voting for a candidate, etc.
2. Choose the significance level α, and find the corresponding critical value z_{α/2}.
 • z_{α/2} is the value to the right of which lies α/2 probability, with another α/2 probability lying to the left of −z_{α/2}, for a total tail probability of α (picture the standard normal curve with area α/2 in each tail beyond ±z_{α/2}).
 • Recall: the number of successes in n trials has a binomial distribution, which can be approximated by a Gaussian with mean np and variance np(1−p). Our estimate p̂ therefore also has an (approximately) normal distribution, with mean p and variance σ²_p̂ = p(1−p)/n, estimated by p̂(1−p̂)/n.
3. Compute the 100(1−α)% confidence interval as

   p̂ − z_{α/2}·√(p̂(1−p̂)/n)  ≤  p  ≤  p̂ + z_{α/2}·√(p̂(1−p̂)/n)

The probability that p̂ exceeds p by more than z_{α/2}·σ_p̂ is at most α/2, and the probability that p̂ falls short of p by more than z_{α/2}·σ_p̂ is at most α/2.

   Conf. level (%)         99     98     95     90     80
   Critical value z_{α/2}  2.58   2.33   1.96   1.645  1.28

An Example

A manufacturer tests 70 items for defects and finds that 52 of them meet specifications. What are the 95% and 99% confidence intervals for the proportion of the entire inventory meeting the specs?

   p̂ = 52/70 = 0.743
   σ²_p̂ = p̂(1−p̂)/n = 0.743·0.257/70 = 0.00273  →  σ_p̂ = √0.00273 = 0.052

   α = 0.05: z_{α/2} = 1.96  →  p = 0.743 ± 1.96·0.052 = 0.743 ± 0.102  →  p₉₅% ∈ [0.641, 0.845]
   α = 0.01: z_{α/2} = 2.58  →  p = 0.743 ± 2.58·0.052 = 0.743 ± 0.134  →  p₉₉% ∈ [0.609, 0.877]

A Correction

Recall that our original formulation actually required us to compute the confidence interval using the true standard deviation, p = p̂ ± z_{α/2}·σ_p̂, and that we fudged a little and replaced the population standard deviation with the sample-based estimate. This fudge may cause the interval to undercover (say, we in fact get 93% worth of coverage when we think we have 95% confidence). To fix the problem created by the first fudge, we fudge again, this time in the opposite direction, to be more conservative:
 • Replace the sample size n with n + 4, and add two to the number of successes x, so that the new probability of success is computed as p̃ = (x + 2)/(n + 4).
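Both the plain step-3 interval and this corrected version can be sketched in Python. This is a minimal sketch, not part of the lecture: the function names are mine, and `NormalDist.inv_cdf` supplies z_{α/2} in place of the table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, conf=0.95):
    # plain large-sample interval: p_hat +/- z_{alpha/2} * sqrt(p_hat(1-p_hat)/n)
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def agresti_coull_ci(x, n, conf=0.95):
    # the correction: two extra successes, four extra trials
    return proportion_ci(x + 2, n + 4, conf)

print(proportion_ci(52, 70))      # ~ (0.640, 0.845); slide gets 0.641 from rounded intermediates
print(agresti_coull_ci(52, 70))   # ~ (0.629, 0.831); slide: [0.628, 0.832] via rounded sigma
```

The tiny differences from the slide's numbers come only from the slide rounding σ_p̂ to 0.052 before multiplying.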
This correction is known as the Agresti–Coull interval.
 • The previous example would then yield:

   p̃ = (52 + 2)/(70 + 4) = 0.730,  σ²_p̃ = 0.73·0.27/74 = 0.00266  →  σ_p̃ = 0.052

   p₉₅% = 0.73 ± 1.96·0.052 = 0.73 ± 0.102  →  [0.628, 0.832]
   p₉₉% = 0.73 ± 2.58·0.052 = 0.73 ± 0.134  →  [0.596, 0.864]

Confidence Interval for the Population Mean μ

So far we have looked at confidence intervals for the proportion of successes: there is a binomially distributed population, from which we took a sample of size n. Each r.v. in the experiment has two possible outcomes, success or failure, with a probability of success p that we tried to estimate.
 • Polling: the polled person votes for a particular candidate or not
 • Quality control: the product has a defect or not
 • Medicine: a treatment plan is successful or not
How about population means, where an experiment has many potential outcomes, more specifically a numerical outcome?
 • The average speed of a chip
 • The weight/height/BP/HR of a group of people/students/patients
 • 5-year survival rates of patients treated with a certain cancer drug
 • Average bit error rates in telecommunications
How can we infer confidence intervals for such quantities?

CLT to the Rescue…

Our calculation of confidence intervals for the proportion of successes was based on the assumption that the binomial distribution can be approximated by the Gaussian. But according to the CLT, the mean of a random sample from any population can be approximated by a Gaussian, as long as we have a sufficient sample size. So the expressions we have derived are essentially valid for confidence intervals of means as well. In particular, recall that if X₁,…,Xₙ is a random sample from a distribution with mean value μ and standard deviation σ:
Then the average of all Xᵢ, i.e., the sample mean X̄, is (approximately) normally distributed with

   E[X̄] = μ_X̄ = μ,   and, for sufficiently large n,   σ²_X̄ = σ²/n

   P(−1.96 ≤ Z ≤ 1.96) = 0.95  ⟺  P(X̄ − 1.96·σ/√n ≤ μ ≤ X̄ + 1.96·σ/√n) = 0.95

Then again, since we do not actually know σ, we replace it with our best estimate s, the standard deviation of the sample, obtaining the sample standard error s/√n, also denoted SE(X̄):

   P(X̄ − 1.96·s/√n ≤ μ ≤ X̄ + 1.96·s/√n) ≈ 0.95

Confidence Interval

As before, provided that we have a sufficiently large sample size (typically n > 40), for an arbitrary confidence level 100(1−α)%, we have

   x̄ ± z_{α/2}·s/√n

   Conf. level (%)         99.73  99    98    96    95.45  95    90     80    68.27  50
   Critical value z_{α/2}  3.00   2.58  2.33  2.05  2.00   1.96  1.645  1.28  1.00   0.675

Solving for the n that would give a specified confidence level and interval width is difficult, however, since we have no idea what s will be before collecting the data. A conservative guess is usually made, but that requires some prior knowledge about the data. For example, if we have reason to believe from previous experience that the data values never fall outside some [LL, UL] boundary, and that the data are not too skewed, then a reasonable estimate for s is ¼ of the (UL − LL) range.

Confidence Interval (Re-explained)

We obtain a sample, compute its mean X̄ and its standard error s_X̄ = s/√n, and draw a 95% confidence interval around this mean as X̄ ± 1.96·s_X̄. By the CLT, this sample mean comes from a distribution with mean μ and standard deviation σ/√n. Then, 95 out of 100 times, the CI drawn around the sample mean will include the true mean μ; in the remaining 5 cases it will not.
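This "95 out of 100 intervals cover μ" claim is easy to check by simulation. A small illustrative sketch (not from the lecture; the population values 160 and 20 are borrowed from the weight example below, and the seed and counts are arbitrary):

```python
import random
from statistics import mean, stdev

random.seed(7)                             # reproducible illustration
mu, sigma, n, trials = 160, 20, 100, 1000  # assumed population and sample sizes
covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    half = 1.96 * stdev(sample) / n ** 0.5   # 95% CI half-width
    if abs(mean(sample) - mu) <= half:       # did the CI capture mu?
        covered += 1
print(covered / trials)                      # close to 0.95
```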
Example

Here is one sample of size 100 from a group of students' weights. Unbeknownst to us, the population is normal with a mean weight of 160 lbs and a standard deviation of 20; these are the parameters we wish to estimate. From the sample data we compute: n = 100, sample mean x̄ = 157.46, sample std. dev. s = 18.89.

   136 136 162 176 153 157 169 180 150 191 115 138 173 164 143 158 141 128 167 174
   179 189 136 140 169 169 160 158 174 199 149 161 150 186 189 148 147 146 202 170
   166 135 154 165 149 157 170 139 132 197 164 159 135 189 143 151 171 135 139 153
   160 137 133 182 155 158 155 165 180 137 139 133 178 146 173 190 118 151 152 155
   141 154 160 134 142 147 162 161 132 183 152 179 147 158 135 133 191 152 166 137

We want the 90%, 95% and 99% confidence intervals (α = 0.1, 0.05 and 0.01, respectively) for the students' weight, using x̄ ± z_{α/2}·s/√n:

   α = 0.10: z_{0.05}  = 1.645  →  157.46 ± 1.645·18.89/10  →  [154.4, 160.6]
   α = 0.05: z_{0.025} = 1.96   →  157.46 ± 1.96·18.89/10   →  [153.8, 161.2]
   α = 0.01: z_{0.005} = 2.58   →  157.46 ± 2.58·18.89/10   →  [152.6, 162.3]

Small Sample Size

So far we have secretly and inconspicuously introduced the phrase "for sufficiently large sample sizes" into our calculations. Exactly what is sufficiently large? It depends on the problem, but usually n > 40. What happens if n is not sufficiently large? Recall that calculating the confidence interval required the quantity (X̄ − μ)/(σ/√n), which includes the term σ, whose value is unknown to us; so we replaced it with the sample standard deviation s. While (X̄ − μ)/(σ/√n) is indeed normal, (X̄ − μ)/(s/√n) is only approximately normal, and only for large n.
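For the large-sample case, the three weight-data intervals above can be reproduced directly. A short sketch (`mean_ci_z` is my own name for the slide's x̄ ± z_{α/2}·s/√n):

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_z(xbar, s, n, conf=0.95):
    # large-sample CI for the mean: xbar +/- z_{alpha/2} * s / sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * s / sqrt(n)
    return xbar - half, xbar + half

for conf in (0.90, 0.95, 0.99):
    lo, hi = mean_ci_z(157.46, 18.89, 100, conf)
    print(conf, round(lo, 1), round(hi, 1))
    # matches the slide: [154.4, 160.6], [153.8, 161.2], [152.6, 162.3]
```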
For small n, the quantity (X̄ − μ)/(S/√n) is said to have a Student's t-distribution.

T-Distribution

When X̄ is the mean of a random sample of size n from a normal distribution with mean μ, the random variable

   T = (X̄ − μ)/(S/√n)

has a probability distribution called (Student's) t-distribution with n − 1 degrees of freedom (df). For large n, the r.v. S will have a value s close to the true σ; for small n, this is not the case. Therefore the t-distribution resembles the normal distribution for large n but deviates from it (heavier tails) for smaller n.

Properties of the T-Distribution

Let t_ν denote the density function curve for ν degrees of freedom.
1. Each t_ν curve is bell-shaped and centered at 0.
2. Each t_ν curve is spread out more than the standard normal (z) curve.
3. As ν increases, the spread of the corresponding t_ν curve decreases.
4. As ν → ∞, the sequence of t_ν curves approaches the standard normal curve (the z curve is called a t curve with df = ∞).
5. Let t_{α,ν} be the number on the measurement axis for which the area under the t curve with ν df to the right of t_{α,ν} is α. Then t_{α,ν} is called a t critical value (the counterpart of the z critical value of the normal distribution). For brevity, when the meaning is obvious, we will drop ν and simply use t_α, just like z_α.

Confidence Intervals Using the T-Distribution

Then, for smaller sample sizes (where the original distribution is normal), we can write the confidence interval expression as follows. Let x̄ and s be the sample mean and standard deviation computed from a random sample from a normal population with mean μ.
The 100(1−α)% confidence interval is

   ( x̄ − t_{α/2,n−1}·s/√n ,  x̄ + t_{α/2,n−1}·s/√n ),   i.e.,   x̄ ± t_{α/2,n−1}·s/√n

Strictly speaking, the t-distribution applies if and only if the population being sampled is normally distributed. In practice, however, the t-distribution works well if the population distribution is only approximately mound-shaped.

Caution! When Not to Use z- or t-Distributions

For confidence interval calculations, the data must be truly random, that is, independent and identically distributed. Consider, for example, yield-strength measurements drifting over time: such data are clearly not i.i.d. (why not?), so neither the normal nor the t-distribution approximation is valid. If the data are i.i.d., then the sample size must be sufficiently large to justify the Gaussian approximation (say n > 40). If that is the case, we are not too concerned about the shape of the underlying distribution, as the CLT says the sample mean will be approximately normal. If the sample size is small, the Gaussian approximation is not valid; you can use the t-distribution approximation instead, but that requires the underlying data to be normal (or at least approximately normal, say a near bell shape with a single mode). Any data set containing an outlier is unlikely to be (near) normal; the t-distribution should not be used with data including outliers. Finally, if the true population σ is known, use z-values, not t.

Homework

Read Chapter 5, Sections 5.1–5.8. Problems from Chapter 5: Section 5.1: 4, 10, 16; Section 5.2: 4. Bonus question: 9 (if you can solve part c). MIDTERM: Friday, Oct 22, 9 AM.

Hypothesis Testing

Estimating the value of a parameter, even along with its confidence interval, has little meaning unless we use that information to make a decision.
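An aside closing out the small-sample material above: the critical value t_{α/2,ν} is normally read from a t table (or from a library call such as `scipy.stats.t.ppf`). As a sketch of what such a table encodes, it can also be approximated with only the standard library; the integration-plus-bisection scheme below is my own illustrative choice, not part of the lecture.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    # Student's t density with df degrees of freedom
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=4000):
    # trapezoidal integration of the density from 0 to |x|, then symmetry
    h = abs(x) / steps
    area = 0.5 * (t_pdf(0, df) + t_pdf(abs(x), df)) * h
    area += sum(t_pdf(i * h, df) for i in range(1, steps)) * h
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(tail, df):
    # bisection for the value t with P(T > t) = tail
    lo, hi = 0.0, 60.0
    for _ in range(50):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if 1 - t_cdf(mid, df) > tail else (lo, mid)
    return (lo + hi) / 2

def mean_ci_t(xbar, s, n, conf=0.95):
    # small-sample CI: xbar +/- t_{alpha/2, n-1} * s / sqrt(n)
    half = t_critical((1 - conf) / 2, n - 1) * s / sqrt(n)
    return xbar - half, xbar + half

print(round(t_critical(0.025, 9), 3))   # ~2.262, the tabulated t_{0.025, 9}
```

Applied to the weight example, `mean_ci_t(157.46, 18.89, 100)` gives an interval slightly wider than the z-based [153.8, 161.2], as properties 2–4 predict.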
 • "The probability that a randomly selected processor from a specific manufacturer will be flawed is 0.24% ± 0.01%, with a confidence level of 95%." So what? Shall we decide that this is a reliable processor?
Confidence intervals are most useful in making decisions based on statistical tests: given an observation based on a finite random sample, can this observation be entirely due to chance? In hypothesis testing (HT), we compare two hypotheses against each other and determine whether we have enough statistical evidence to reject the hypothesis that the observation is entirely due to chance.

Hypothesis Testing: Setting the Stage

We will start with an example to familiarize ourselves with the terminology. Note that this scenario can easily be substituted with any number of engineering or non-engineering scenarios. As the CEO of Owl Superior Chip Co., you hear the announcement of your competitor Lentil's new chips: the snore i7, and its low-cost version, the crapleron. Lentil declares that their chips, even the low-cost versions, are 99.99% defect-free (that is, only 0.01% of their chips are flawed). Since you have been in this business for quite some time (2½ months), you think this is pretty impressive, if not too good to be true. You are suspicious. You know that the snore i7 is pretty reliable, but 99.99% on the crapleron? You suspect that Lentil is cheating in its figures, and that the 99.99% holds primarily for the snore i7 chips, not for the craplerons. How to prove it? You later learn that in estimating the 99.99% figure, they took a sample of 80 chips, of which only 4 were craplerons. You consider going to court, alleging false advertising, to which they reply: "Well, we randomly picked 80 chips from a production run that manufactures an equal number of chips of each kind. The fact that there were only 4 craplerons in the sample is purely coincidental.
There is no foul play!" You say: "Well, that is crap!"

By Chance…?

Other versions of the same scenario:
 • A company's workforce of 80 employees consists of 76 males and 4 females. The company claims that it does not favor males, and that having only 4 females is purely by chance: on the days they were hiring, only men happened to apply, although men and women are equally likely to apply and to be successful in such a position.
 • Court drama: out of a panel of 80 potential jurors, only 4 were African-American, in a district where 50% of all eligible citizens are African-American.
In each case: 50% of all eligible employees/jurors/chips are women / African-American / craplerons, yet in a random sample of 80, only 4 are. Could this be the result of pure chance?

What Are the Odds?

If the selection is really random, and each group is 50% of the total population, then the number of women / African-Americans / craplerons in the sample is a binomial random variable X with p = 0.5 and n = 80. The chance of getting only 4 (i.e., at most 4) is

   P(X ≤ 4) = Σ_{k=0}^{4} C(80, k) · 0.5⁸⁰ ≈ 0.0000000000000000014 = 1.4 × 10⁻¹⁸ !

You think you have enough statistical evidence to reject Lentil's claim that having only 4 craplerons in their sample was random, pure chance. You go to court! To drive the point home, you argue that this probability is smaller than the chance of getting three consecutive royal flushes in poker, and almost the same as hitting the big jackpot twice in a row. Remember?
 • Picking 6 numbers out of 52 in order: 0.000000000068 (6.8 × 10⁻¹¹)
 • Getting only 4 craplerons in a sample of 80: 1.4 × 10⁻¹⁸ !
So the judge rejects Lentil's claim (hypothesis) of random selection!
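The probability above is a one-line computation (`math.comb` is the binomial coefficient C(n, k)):

```python
from math import comb

n, p = 80, 0.5
# probability of at most 4 "successes" when each chip type is equally likely
p_at_most_4 = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(5))
print(p_at_most_4)   # ~1.4e-18, matching the slide
```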
Formal Definitions

A statistical hypothesis is a claim about the value or values of one or more parameters.
 • The proportion of defective chips is p < 0.01%
 • The average SAT math score in NJ is s > 500
 • The average wattage of a 60 W bulb is w = 60 W
In any hypothesis testing problem, there are two competing hypotheses:
 • H0 – the null hypothesis: the protected hypothesis that is initially assumed to be true, i.e., the observations are the result of pure chance.
 • Ha – the alternative hypothesis: the claim that the null hypothesis is false, i.e., the observations are not by chance, but the result of a real effect, plus variation.
The test analyzes the observed data to determine whether there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis. The burden of proof is on the alternative hypothesis: if the data do not strongly support the Ha claim, then the test fails to reject H0.

H0 vs. Ha

Often we wish to find out whether a new value / new theory / new treatment plan is better than the previous / existing one.
 • H0: the claim that the new value/theory/plan is no better than the current one.
 • Ha: the alternative claim that the new value/theory/plan is better.
We only replace the current with the new if there is convincing and compelling evidence to do so. Ex: in the defective chips example, if we develop a new procedure to fabricate the chips, we would use it if and only if it produces fewer defects. If the current procedure's proportion of defective chips is p = 0.01:
 • Ha, on which the burden of proof is placed, is the assertion that the new procedure has p < 0.01.
 • H0 is then the initial, prior claim that p = 0.01.
The null hypothesis is always of the form H0: θ = θ0 (the null value). The alternative hypothesis can take any of the following three forms:
 • Ha: θ > θ0 (which implicitly assumes H0: θ ≤ θ0)
 • Ha: θ < θ0 (which implicitly assumes H0: θ ≥ θ0)
 • Ha: θ ≠ θ0 (which implicitly assumes H0: θ = θ0)

Choosing an Appropriate Test

Suppose that a 9 V battery, when fresh, is required to provide 9.1 V. As the quality control engineer, you draw a random sample of size n to determine whether you are in compliance. You design an experiment where H0: μ = 9.1 V, and
 a. Ha: μ > 9.1 V   b. Ha: μ < 9.1 V   c. Ha: μ ≠ 9.1 V
You would choose (a), because in this formulation H0 indicates non-compliance. As the quality control engineer, you put the burden of proof on asserting that the specs are satisfied. If we were to choose the other options, H0 would indicate compliance, and Ha would put the burden of proof on asserting that the batteries are in non-compliance. If you were challenged in a legal proceeding, however, the alleger would have to choose test (b).

Suppose 5 pCi/L is the borderline for radioactivity in water. Which test would you choose? Choose H0: μ = 5 pCi/L vs. Ha: μ < 5 pCi/L. Then the water is believed unsafe unless proven otherwise; the burden of proof is on showing that the water is indeed safe, i.e., μ < 5 pCi/L. Choosing Ha: μ > 5 pCi/L would instead mean that the water is presumed safe unless proven otherwise.

Suppose you manufacture 20 A fuses for home use. If a fuse burns out at < 20 A, users complain that fuses burn out prematurely. If a fuse burns out at > 20 A, a fire may occur due to the malfunctioning fuse. What test should you choose? Choose H0: μ = 20 A vs. Ha: μ ≠ 20 A, because this time the burden of proof is on showing that the fuse blows at exactly 20 A; departure in either direction from 20 A is equally costly.
Testing Procedure

Step 1: Formulate the hypotheses and determine the null value.
The null hypothesis asserts that the current / status quo situation is preferred:
 • H0: Lentil's sample was purely random – there was a 50% chance of picking either chip.
 • H0: The new drug will not lower cholesterol by (more than) 20%.
 • H0: The new engine technology will allow gas mileage of no more than 30 mpg.
 • H0: The defective component ratio of our product is the same as the competitor's.
The alternative hypothesis claims that the null hypothesis should be rejected in favor of the new finding or procedure:
 • Ha: Lentil's sample was not purely random but biased: there was a > 50% chance of picking a snore i7 for the sample.
 • Ha: The new drug will lower cholesterol by more than 20%.
 • Ha: The new engine technology will allow gas mileage greater than 30 mpg.
 • Ha: The defective component ratio of our product is lower than the competitor's.

Testing Procedure

Step 2: Choose a test statistic and the formula for computing it.
A test statistic is a function of the sample data on which the decision – reject H0 or do not reject H0 – will be based. This is the statistical value that will assess your evidence against the null hypothesis.
 • For the random sampling of chips example, the test statistic would be the binomial random variable with probability of success p = 0.5 and number of trials n = 80.
   – For applications of the form "proportion of successes," the test statistic generally compares the observed sample proportion p̂ with the presumed probability of success (p0, the null value). Note that for a large enough sample size this statistic is approximately normal:

       z = (p̂ − p0) / √( p0(1 − p0) / n )

 • For the gas mileage problem, the test statistic would be the sample mean of the (normally distributed) gas mileages of the cars with the new technology: H0: μ_new_tech = 30 mpg vs. Ha: μ_new_tech > 30 mpg.
   – For all applications of the form "average value," the test statistic generally compares the mean of the random sample (the sample mean) to the presumed average (μ0, the null value). Note that, by the CLT, for a sufficiently large sample size this statistic is also approximately normal:

       z = (x̄ − μ0) / (σ/√n)

Testing Procedure

Step 3: State the rejection region for a selected significance level.
The rejection region is the set of all test statistic values for which H0 will be rejected.
 • For Lentil's random sampling, we may want to reject their hypothesis if the probability of selecting only 4 craplerons at random is less than a specific value. In the previous example, the de facto rejection threshold (for the judge) was the probability of three royal flushes in a row, or of hitting the jackpot twice in a row.
 • For the gas mileage example, we may choose to reject when the average gas mileage is greater than, say, 35 mpg.
   – Note that since H0 is the default hypothesis, we need a convincing and compelling argument to reject it; the rejection region is therefore usually picked so as to give H0 plenty of "benefit of the doubt."
 • The significance level α determines the confidence 1 − α we wish to have in our rejection region.
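Backing up to Step 2 for a moment, the two standardized statistics just defined, plus the lower-tail p-value that Step 4 will introduce, can be sketched directly (function names are mine). Applied to the chip example, the observed proportion 4/80 against p0 = 0.5 lands far in the left tail; the normal approximation is crude that far out, so its tail probability differs from the exact binomial 1.4 × 10⁻¹⁸, but the conclusion is the same.

```python
from math import sqrt
from statistics import NormalDist

def z_proportion(p_hat, p0, n):
    # z = (p_hat - p0) / sqrt(p0 (1 - p0) / n)
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

def z_mean(xbar, mu0, sigma, n):
    # z = (xbar - mu0) / (sigma / sqrt(n))
    return (xbar - mu0) / (sigma / sqrt(n))

def p_value_lower(z):
    # one-sided (lower-tail) p-value: P(Z <= z) under H0
    return NormalDist().cdf(z)

z = z_proportion(4 / 80, 0.5, 80)
print(round(z, 2))        # ~ -8.05
print(p_value_lower(z))   # vanishingly small: overwhelming evidence against H0
```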
For example, with 95% confidence (α = 0.05) in the car example, the rejection threshold is chosen so that, if the new technology really made no difference, an average mileage beyond the threshold would be observed at most 5% of the time.

Testing Procedure

Step 4: Compute the sample quantities and decide whether H0 should be rejected.
 • For the random sampling example, we compute the probability P(X ≤ 4 | p0 = 0.5, n = 80).
 • For the car example, we compute P(x̄ ≥ 35 | μ0 = 30, σ = …, n = …).
 • We then compare these values to the rejection region at the specified confidence level.
A commonly used figure of merit is the p-value, which answers the following question: if the null hypothesis were true, what would be the probability of observing a test statistic as extreme as the one we observed? The smaller the p-value, the stronger the evidence against the null hypothesis. If the p-value is less than the threshold α corresponding to the rejection region, we agree that there is statistically compelling evidence against H0. For the random sampling example, p = 1.4 × 10⁻¹⁸: we have more than enough evidence to rule out Lentil's claim that having only 4 craplerons in their sample was purely coincidental!

Errors in Hypothesis Testing

Can we make errors despite being overly cautious and giving H0 plenty of benefit of the doubt? Of course; in fact, there are two types of error we can make. To make the point, think of the fire detector in your house, and how often it goes off when you make your toast a little too dark! This is called a Type I error: an alarm without a fire (a false alarm)! Every cook knows how to avoid a Type I error: just remove the batteries! But then a fire can go undetected – and this is called a Type II error: a fire without an alarm (a missed alarm)!
Similarly, we can reduce the chance of a Type II error by increasing the sensitivity of the sensor, but that in turn increases the probability of a Type I error.

Errors in Hypothesis Testing

We can put these observations in a table, called the decision table. Consider the null hypothesis that there is no fire, with Ha: FIRE! The alarm then corresponds to rejection of the null hypothesis. Statistically speaking:
 • A Type I error is committed if we reject the null hypothesis when in fact it was true.
 • A Type II error is committed if we fail to reject the null hypothesis when in fact it was false.

α: Type I Error

Examples: for the car example, suppose we observed 50 cars and checked their gas mileage. It is possible that the average gas mileage of those 50 cars was, say, 35.7 mpg, when in fact the true average is below 35. Then by rejecting H0, a Type I error is made. On the other hand, it is also possible that the average gas mileage of those 50 cars was, say, 34.6 mpg, when in fact the true average was above 35. Then by not rejecting H0, a Type II error is made. Note that the significance level α we mentioned earlier is precisely the probability of committing a Type I error:

   P(reject H0 | H0 is true) = P(Type I error | H0) = α

Then, with 100(1 − α)% confidence, we claim that the observation under H0 is statistically very unlikely, and hence reject H0. The lower the α, the higher the confidence we have in rejecting H0, and hence the lower the probability of committing a Type I error.

Type II Error

Sometimes, however, we are interested in the Type II error: is our alarm too sensitive? In the past, factories discharging chemicals into waterways were required to show that the discharge had no effect on the downstream wildlife. This is H0.
The factory could continue as long as H0 was not rejected at the 0.05 significance level. So a polluter, suspecting that he was in violation of EPA standards, could devise an ineffective pollution monitoring program – such as "let's ask the ducks!"
 • Type I error: reject H0 when it is true (shut down the factory when in fact its discharge really has no effect on the wildlife).
 • Type II error: accept H0 when it is false (the factory continues when in fact it is decimating the wildlife).
Such a test – "interviewing the ducks" – is equivalent to removing the batteries from the fire detector: both are designed to reduce (remove) Type I error. Of course, such a test greatly increases the probability of committing a Type II error, that is, accepting the H0 that the factory discharge is harmless when in fact it is harmful.

β: Type II Error

Just as we limit the probability of committing a Type I error using a significance level α, we can also limit the probability of making a Type II error. We define

   β = P(accept H0 | Ha is true) = P(Type II error | Ha)

Thus β is the probability of making a Type II error; the lower the β, the more confident we are of not committing one. Again, just as our confidence in not making a Type I error is 1 − α, our confidence in not making a Type II error is 1 − β, which is called the power of the hypothesis test. Note that the two types of error are always in competition: reducing one increases the other. We are happy to report that environmental regulations have since changed, requiring pollution monitoring programs to show that they have a high probability of detecting serious pollution events – that is, a very small β – revealing any hidden flaws in the monitoring program.

A Complete Example

A new design for braking systems is proposed.
For the current system, the true average braking distance at 40 mph is 120 ft. The new system is to be implemented if there is substantial evidence that it reduces the braking distance significantly.
 a. State the parameter of interest and the appropriate hypotheses for testing the new system.
 b. Suppose the new system's braking distance has σ = 10 ft, and let x̄ be the sample average braking distance of the new system over n = 36 observations. Which rejection region is most appropriate?
    R1: x̄ > 124.80,  R2: x̄ < 115.20,  R3: {x̄ > 125.13 or x̄ < 114.87}
 c. What is the significance level for the region selected above? How would we change the region to obtain a higher, 99.9% confidence level?
 d. What is the probability that the new design is NOT implemented when its true average braking distance is actually 115 ft and the region selected above is used?
 e. Let Z = (X̄ − 120)/(σ/√n). What is the significance level for the rejection region z < −2.33? How about z < −2.88?

Solution

a. Let μ = the true average braking distance for the new design at 40 mph. We want the burden of proof to be on the new braking distance being lower, so: H0: μ = 120 vs. Ha: μ < 120.
b. We want to give the null hypothesis the benefit of the doubt, so we need significant evidence that the new average distance is substantially less than that of the existing system. Therefore we choose R2: reject H0 if x̄ < 115.2 (< 120).
c. Recall that the significance level is the probability of a Type I error, i.e., of rejecting H0 when in fact we shouldn't. We will reject H0 if the observed average is < 115.2.
Under H0 (the normal curve with mean 120, the assumed average for the existing system, and σ_x̄ = σ/√n = 10/6 = 1.667), the tail area below 115.2 is

   α = P(x̄ < 115.2 | μ = 120) = P(z < (115.2 − 120)/1.667) = P(z < −2.88) = 0.002  →  99.8% confidence

   Now, if we want α = 0.001 (i.e., increased to 99.9% confidence), we should expect a smaller rejection region. From the Gaussian tables, the z-value leaving a lower-tail area of 0.001 is −3.08, so the new rejection threshold c satisfies

   (c − 120)/1.667 = −3.08  →  c = 120 − 3.08·1.667 = 114.87

Solution – Cont.

d. What is the probability that the new design is NOT implemented when its true average braking distance is actually 115 ft and the region above is used?
   If we do not implement the new design, we must have failed to reject H0 (presumably because we think we do not have enough evidence). But the true average distance for the new design is 115, which is less than 115.20: we would be committing a Type II error (failing to reject H0 when it should have been rejected). By our rule, we do not reject H0 if the average braking distance is > 115.2, so we need the probability that the observed average exceeds 115.2 when the sample is actually drawn from a population with mean 115:

   β(115) = P(x̄ > 115.20 | μ_true = 115) = P(z > (115.2 − 115)/1.667) = P(z > 0.12) = 0.4522

e. Z = (X̄ − 120)/(σ/√n) is standard normal under H0; therefore P(z < −2.33) = 0.01 and P(z < −2.88) = 0.002.

No New Homework (for week of 10/14)
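The numbers in parts (c)–(e) can be double-checked with the standard library; `NormalDist` stands in for the Gaussian tables, so the threshold c differs slightly from the tabulated 114.87 (which used the rounded z = −3.08).

```python
from statistics import NormalDist

nd = NormalDist()                      # standard normal
se = 10 / 36 ** 0.5                    # sigma / sqrt(n) = 1.667

alpha = nd.cdf((115.2 - 120) / se)     # part (c): P(xbar < 115.2 | mu = 120)
c = 120 + nd.inv_cdf(0.001) * se       # part (c): threshold for alpha = 0.001
beta = 1 - nd.cdf((115.2 - 115) / se)  # part (d): P(xbar > 115.2 | mu = 115)

print(round(alpha, 4))   # ~0.002
print(round(c, 2))       # ~114.85 (slide: 114.87 via the rounded z = -3.08)
print(round(beta, 4))    # ~0.4522
```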