Hypothesis Testing - Rutgers Statistics (transcript)
Chapter 8
Tests of Hypotheses Based
on a Single Sample
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Nonstatistical Hypothesis Testing…
A criminal trial is an example of hypothesis testing without
the statistics.
In a trial a jury must decide between two hypotheses. The
null hypothesis is
H0: The defendant is innocent
The alternative hypothesis or research hypothesis is
H1: The defendant is guilty
The jury does not know which hypothesis is true. They must
make a decision on the basis of evidence presented.
Nonstatistical Hypothesis Testing…
In the language of statistics convicting the defendant is
called rejecting the null hypothesis in favor of the alternative
hypothesis. That is, the jury is saying that there is enough
evidence to conclude that the defendant is guilty (i.e., there
is enough evidence to support the alternative hypothesis).
If the jury acquits, it is stating that there is not enough
evidence to support the alternative hypothesis. Notice that
the jury is not saying that the defendant is innocent, only that
there is not enough evidence to support the alternative
hypothesis. That is why we never say that we accept the null
hypothesis.
Nonstatistical Hypothesis Testing…
There are two possible errors.
A Type I error occurs when we reject a true null hypothesis.
That is, a Type I error occurs when the jury convicts an
innocent person.
A Type II error occurs when we don’t reject a false null
hypothesis. That occurs when a guilty defendant is acquitted.
Nonstatistical Hypothesis Testing…
The probability of a Type I error is denoted as α (Greek
letter alpha). The probability of a Type II error is β (Greek
letter beta).
The two probabilities are inversely related. Decreasing one
increases the other.
Nonstatistical Hypothesis Testing…
In our judicial system Type I errors are regarded as more
serious. We try to avoid convicting innocent people. We are
more willing to acquit guilty people.
We arrange to make α small by requiring the prosecution to
prove its case and instructing the jury to find the defendant
guilty only if there is “evidence beyond a reasonable doubt.”
Hypothesis Testing…
The critical concepts are these:
1. There are two hypotheses, the null and the alternative
hypotheses.
2. The procedure begins with the assumption that the null
hypothesis is true.
3. The goal is to determine whether there is enough evidence
to reject the truth of the null hypothesis.
4. There are two possible decisions:
Conclude there is enough evidence to reject the null
hypothesis. [Conclude alternative hypothesis is true]
Conclude there is not enough evidence to reject the null
hypothesis. [Conclude null hypothesis is true]
Nonstatistical Hypothesis Testing…
5. Two possible errors can be made.
Type I error: Reject a true null hypothesis
Type II error: Do not reject a false null hypothesis.
P(Type I error) = α
P(Type II error) = β
In practice, Type II errors are by far the more serious mistake,
because you go away believing the null hypothesis is true when
it is not. A Type I error would at least allow you to conduct
additional testing to verify the conclusion.
Introduction…
In addition to estimation, hypothesis testing is a procedure
for making inferences about a population.
[Diagram: a statistic computed from a sample is used to make an inference about a population parameter.]
Hypothesis testing allows us to determine whether enough
statistical evidence exists to conclude that a belief (i.e.
hypothesis) about a parameter is supported by the data.
Statistical Hypotheses Testing…
A statistical hypothesis, or just hypothesis, is a claim either
about the value of a single parameter, about the values of several
parameters, or about the form of an entire probability distribution.
Example:
Test if population mean µ = 350
Test if population mean µ is between 340 and 360
Test if the samples come from normal distribution
etc……
Exercise 1…
For each of the following assertions, state whether it is a
legitimate statistical hypothesis and why:
1. H: σ > 100
2. H: x̃ = 45 (x̃ is the sample median)
3. H: s ≤ .20
4. H: σ1/σ2 < 1
5. H: X̄ − Ȳ = 5
6. H: λ ≤ .01, where λ is the parameter of an exponential
distribution used to model component lifetime.
Concepts of Hypothesis Testing …
There are two hypotheses. One is called the null hypothesis
and the other the alternative or research hypothesis. The
usual notation is:
H0: — the ‘null’ hypothesis (pronounced “H-nought”)
H1: — the ‘alternative’ or ‘research’ hypothesis
In this course, the null hypothesis (H0) will always include
the equality: “=” [two-tail test], “≤” [one-tail test], or “≥”
[one-tail test].
Concepts of Hypothesis Testing…
Consider a problem dealing with the mean demand for
computers during assembly lead time. Rather than estimate
the mean demand, our operations manager wants to know
whether the mean is different from 350 units. We can
rephrase this request into a test of the hypothesis:
H0: µ = 350
Thus, our research hypothesis becomes:
H1: µ ≠ 350
This is what we are interested
in determining…
Concepts of Hypothesis Testing …
The testing procedure begins with the assumption that the
null hypothesis is true.
Thus, until we have further statistical evidence, we will
assume:
H0: µ = 350 (assumed to be TRUE)
Concepts of Hypothesis Testing …
The goal of the process is to determine whether there is
enough evidence to infer that the null hypothesis is most
likely not true and consequently the alternative hypothesis
then must be true.
That is, is there sufficient statistical information to determine
if this statement:
H1: µ ≠ 350, is true?
Concepts of Hypothesis Testing …
There are two possible decisions that can be made:
Conclude that there is enough evidence to reject the null
hypothesis
Conclude that there is not enough evidence to reject the
null hypothesis
NOTE: we do not say that we accept the null hypothesis,
although most people in industry will say this.
Concepts of Hypothesis Testing…
Once the null and alternative hypotheses are stated, the next
step is to randomly sample the population and calculate a test
statistic (in this example, the sample mean).
If the test statistic’s value is inconsistent with the null
hypothesis we reject the null hypothesis and infer that the
alternative hypothesis is true.
For example, if we’re trying to decide whether the mean is
not equal to 350, a large value of x̄ (say, 600) would provide
enough evidence. If x̄ is close to 350 (say, 355) we could
not say that this provides a great deal of evidence to infer
that the population mean is different than 350. Is 355 really close??
Concepts of Hypothesis Testing …
Two possible errors can be made in any test:
A Type I error occurs when we reject a true null hypothesis
and
A Type II error occurs when we don’t reject a false null
hypothesis.
There are probabilities associated with each type of error:
P(Type I error) = α
P(Type II error) = β
The maximum value of α is called the significance level.
Types of Errors…
A Type I error occurs when we reject a true null hypothesis
(i.e. Reject H0 when it is TRUE).
A Type II error occurs when we don’t reject a false null
hypothesis (i.e. Do NOT reject H0 when it is FALSE).

                   H0 is TRUE        H0 is FALSE
Reject H0          Type I error      Correct decision
Do not reject H0   Correct decision  Type II error
Types of Errors…
Back to our example, we would commit a Type I error if:
Reject H0 when it is TRUE
We reject H0 (µ = 350) in favor of H1 (µ ≠ 350) when in
fact the real value of µ is 350.
We would commit a Type II error in the case where:
Do NOT reject H0 when it is FALSE
We believe H0 is correct (µ = 350), when in fact the real
value of µ is something other than 350.
Exercise 2…
For the following pairs of assertions, indicate which do not
comply with our rules for setting up hypotheses and why
(the subscripts 1 and 2 differentiate between quantities for
two different populations or samples).
1. H0: µ = 100, Ha: µ > 100
2. H0: σ = 20, Ha: σ ≤ 20
3. H0: p ≠ .25, Ha: p = .25
4. H0: µ1 – µ2 = 25, Ha: µ1 – µ2 > 100
5. H0: s1² = s2², Ha: s1² ≠ s2²
6. H0: µ = 120, Ha: µ = 150
7. H0: σ1/σ2 = 1, Ha: σ1/σ2 ≠ 1
8. H0: p1 – p2 = –.1, Ha: p1 – p2 < –.1
Exercise 3…
To determine whether the pipe welds in a nuclear power
plant meet specifications, a random sample of welds is
selected, and tests are conducted on each weld in the sample.
Weld strength is measured as the force required to break the
weld. Suppose the specifications state that the mean strength
of welds should exceed 100 lb/in²; the inspection team
decides to test H0: µ = 100 versus Ha: µ > 100. Explain why
it might be preferable to use this Ha rather than µ < 100.
Exercise 4…
Let µ denote the true average radioactivity level (picocuries
per liter). The value 5 pCi/L is considered the dividing line
between safe and unsafe water. Would you recommend
testing H0: µ = 5 vs. Ha: µ > 5, or H0: µ = 5 vs. Ha: µ < 5?
Explain your reasoning. [Hint: Think about the consequences
of a Type I and a Type II error for each possibility.]
Recap I…
1) Two hypotheses: H0 & H1
2) ASSUME H0 is TRUE
3) GOAL: determine if there is enough evidence to infer that
H1 is TRUE
4) Two possible decisions:
Reject H0 in favor of H1
Do NOT reject H0
5) Two possible types of errors:
Type I: reject a true H0 [P(Type I) = α]
Type II: do not reject a false H0 [P(Type II) = β]
Recap II…
The null hypothesis must specify a single value of the
parameter (e.g. µ = ___)
Assume the null hypothesis is TRUE.
Sample from the population, and build a statistic related to
the parameter hypothesized (e.g. the sample mean, x̄).
Compare the statistic with the value specified in the first step
Example …
A department store manager determines that a new billing
system will be cost-effective only if the mean monthly
account is more than $170. [One tail test]
A random sample of 400 [n] monthly accounts is drawn, for
which the sample mean is $178 [X-bar]. The accounts are
approximately normally distributed with a standard deviation
of $65 [Sigma].
Can we conclude that the new system will be cost-effective?
Example …
The system will be cost effective if the mean account
balance for all customers is greater than $170.
We express this belief as our research hypothesis, that is:
H1: µ > 170 (this is what we want to determine)
Thus, our null hypothesis becomes:
H0: µ ≤ 170 (we use the “=“ part to conduct the test)
Some people will leave the less-than part off of the null hypothesis.
Example …
What we want to show:
H1: µ > 170
H0: µ ≤ 170 (we’ll assume µ = 170 is true)
We know:
n = 400, x̄ = 178, and σ = 65
What to do next?! Determine sampling distribution of X-bar.
Example …
To test our hypotheses, we can use two different approaches:
The rejection region approach based on sampling
distribution of X-bar (typically used when computing
statistics manually), and
The p-value approach (which is generally used with a
computer and statistical software and also based on the
sampling distribution of X-bar).
We will explore both …
Example … Rejection Region…[Based on sampling distribution
of X-bar]
The rejection region is a range of values such that if the test
statistic falls into that range, we decide to reject the null
hypothesis in favor of the alternative hypothesis.
x̄L is the critical value of x̄ needed to reject H0.
Example …
It seems reasonable to reject the null hypothesis in favor of
the alternative if the value of the sample mean is large
relative to 170, that is, if x̄ > x̄L. This is normally stated as
“if x̄ is greater than the critical value”.
α = P(x̄ > x̄L)
which is also…
α = P(rejecting H0 given that H0 is true)
= P(Type I error)
Example …
All that’s left to do is calculate x̄L
and compare it to 170.
We can calculate this based on any level of
significance (α) we want…
Example …
At a 5% significance level (i.e. α = 0.05), we get
x̄L = 170 + z.05 · σ/√n = 170 + 1.645(65/√400)
Solving, we compute x̄L = 175.34
Since our sample mean (178) is greater than the critical value we
calculated (175.34), we reject the null hypothesis in favor of H1, i.e.
that µ > 170 and that it is cost-effective to install the new billing
system.
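The rejection-region arithmetic above can be sketched in Python; this is a minimal sketch of the slide’s numbers, with the critical point z.05 = 1.645 hardcoded from the normal table rather than looked up:

```python
import math

# Department store example: H0: mu <= 170 vs H1: mu > 170,
# with n = 400, sample mean 178, sigma = 65, alpha = .05
mu0, xbar, sigma, n = 170, 178, 65, 400
z_alpha = 1.645                 # z.05 from the standard normal table

se = sigma / math.sqrt(n)       # sigma/sqrt(n) = 65/20 = 3.25
xbar_L = mu0 + z_alpha * se     # critical value, about 175.35 (the slide rounds to 175.34)

reject = xbar > xbar_L          # True: 178 falls in the rejection region
```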
Example: The Big Picture…
H1: µ > 170
H0: µ = 170
x̄L = 175.34, x̄ = 178
Since x̄ = 178 > x̄L = 175.34, we reject H0 in favor of H1.
Standardized Test Statistic…
An easier method is to use the standardized test statistic:
z = (x̄ − µ)/(σ/√n) = (178 − 170)/(65/√400) = 2.46
and compare its result to zα (rejection region: z > zα).
Since z = 2.46 > 1.645 (z.05), we reject H0 in favor of H1…
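The standardized form makes the same decision in one line; a quick sketch with the same numbers as the slide:

```python
import math

# z = (xbar - mu0) / (sigma / sqrt(n)) = (178 - 170) / (65 / 20)
mu0, xbar, sigma, n = 170, 178, 65, 400
z = (xbar - mu0) / (sigma / math.sqrt(n))   # 8 / 3.25, about 2.46
reject = z > 1.645                          # compare against z.05; True here
```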
Conclusions of a Test of Hypothesis…
If we reject the null hypothesis, we conclude that there is
enough evidence to infer that the alternative hypothesis is
true.
If we do not reject the null hypothesis, we conclude that
there is not enough statistical evidence to infer that the
alternative hypothesis is true.
Remember: The alternative hypothesis is the more
important one. It represents what we are investigating.
One– and Two–Tail Testing…
The department store example was a one tail test, because
the rejection region is located in only one tail of the
sampling distribution:
More correctly, this was an example of a right tail test.
Right-Tail Testing…
Calculate the critical value of the mean (x̄L) and compare it
against the observed value of the sample mean (x̄)…
Left-Tail Testing…
Calculate the critical value of the mean (x̄L) and compare it
against the observed value of the sample mean (x̄)…
Two–Tail Testing…
Two tail testing is used when we want to test a research
hypothesis that a parameter is not equal (≠) to some value
Example …
AT&T argues that its rates are such that customers won’t see a
difference in their phone bills between them and their competitors.
They calculate the mean and standard deviation for all their customers
at $17.09 and $3.87 (respectively).
These are assumed to be the values of the population parameters [mean
and standard deviation]
You don’t believe them, so you sample 100 customers at random and
calculate a monthly phone bill based on competitor’s rates.
What we want to show is whether or not:
H1: µ ≠ 17.09. We do this by assuming that:
H0: µ = 17.09
Example …
At a 5% significance level (i.e. α = .05), we have
α/2 = .025. Thus, z.025 = 1.96 and our rejection region is:
z < −1.96 or z > 1.96
Example …
From the data, we calculate x̄ = 17.55.
Using our standardized test statistic:
z = (x̄ − µ)/(σ/√n) = (17.55 − 17.09)/(3.87/√100)
We find that z = 1.19.
Since z = 1.19 is not greater than 1.96, nor less than –1.96
we cannot reject the null hypothesis in favor of H1. That is
“there is insufficient evidence to infer that there is a
difference between the bills of AT&T and the competitor.”
This does not mean that you have proven that “the true mean is $17.09”.
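The two-tail decision can be sketched the same way as the one-tail example, using the slide’s numbers:

```python
import math

# AT&T example: H0: mu = 17.09 vs H1: mu != 17.09,
# with n = 100, sample mean 17.55, sigma = 3.87
mu0, xbar, sigma, n = 17.09, 17.55, 3.87, 100
z = (xbar - mu0) / (sigma / math.sqrt(n))   # 0.46 / 0.387, about 1.19
reject = abs(z) > 1.96                      # two-tail region at alpha = .05; False here
```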
Probability of a Type II Error (β)…
It is important that we understand the relationship
between Type I and Type II errors; that is, how the
probability of a Type II error is calculated and how it is
interpreted.
Recall example about department store …
H0: µ = 170
H1: µ > 170
At a significance level of 5% we rejected H0 in favor of H1
since our sample mean (178) was greater than the critical
value x̄L (175.34).
Probability of a Type II Error (β)…
A Type II error occurs when a false null hypothesis is not
rejected.
In our example, this means that if x̄ is less than 175.34 (our
critical value) we will not reject our null hypothesis, which
means that we will not install the new billing system.
Thus, we can see that:
β = P(x̄ < 175.34, given that the null hypothesis is false)
Example - Calculating P(Type II errors)…
[Figure: the sampling distribution of x̄ under the original hypothesis (µ = 170), and under a new assumed value of the true mean.]
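β only becomes a number once a specific alternative mean is assumed; the slide leaves that value in its figure, so the 180 below is a hypothetical choice for illustration only. A sketch using the standard normal CDF built from math.erf:

```python
import math

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

se = 65 / math.sqrt(400)       # 3.25, as in the department store example
xbar_L = 170 + 1.645 * se      # critical value, about 175.35

mu_true = 180                  # hypothetical true mean (not stated on the slide)
beta = phi((xbar_L - mu_true) / se)   # P(xbar < xbar_L | mu = 180), about .076
```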
Effects on β of Changing α…
Decreasing the significance level α increases the value of β,
and vice versa.
Consider this diagram again. Shifting the critical value line
to the right (to decrease α) will mean a larger area under the
lower curve for β… (and vice versa)
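The trade-off can be shown numerically. As before, the true mean of 180 is a hypothetical choice for illustration; 1.645 and 2.326 are the standard normal points for α = .05 and α = .01:

```python
import math

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def type2_prob(z_alpha, mu_true, mu0=170, sigma=65, n=400):
    """beta for the right-tail test with critical value mu0 + z_alpha*sigma/sqrt(n)."""
    se = sigma / math.sqrt(n)
    xbar_L = mu0 + z_alpha * se
    return phi((xbar_L - mu_true) / se)

beta_05 = type2_prob(1.645, 180)   # alpha = .05 -> beta about .076
beta_01 = type2_prob(2.326, 180)   # alpha = .01 -> beta about .226
# decreasing alpha moved the critical value right and increased beta
```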
Another example…
A certain type of automobile is known to sustain no visible
damage 25% of the time in 10-mph crash tests. A modified
bumper design has been proposed in an effort to increase this
percentage. The experiment involves n = 20 independent
crashes with prototypes of the new design.
Let p denote the proportion of all 10-mph crashes with new
bumper that result in no visible damage.
Hypothesis:
H0: p = .25 (no improvement)
Ha: p > .25
Another example…
Intuitively, H0 should be rejected if a substantial number of
crashes show no damage. But how large a number is large
enough?
Test statistic: X = the number of crashes with no visible
damage.
Rejection region: R8 = {8, 9, 10, … , 19, 20}
i.e., we will reject H0 if x ≥ 8.
Calculating type I error…
When H0 is true, X has a binomial probability distribution
with n = 20 and p = .25
α = P(type I error ) = P( H 0 is rejected when it is true)
= P( X ≥ 8 when X ~ Bin(20,.25)) = 1 − B(7;20,.25)
= 1 − .898 = .102
So, when H0 is actually true, roughly 10% of all experiments
consisting of 20 crashes would result in H0 being incorrectly
rejected.
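The α computation is just a binomial tail sum, which can be reproduced with nothing but math.comb; a sketch of the slide’s arithmetic:

```python
from math import comb

def binom_cdf(k, n, p):
    """B(k; n, p) = P(X <= k) for X ~ Bin(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Reject H0: p = .25 when x >= 8 out of n = 20 crashes
alpha = 1 - binom_cdf(7, 20, 0.25)   # 1 - B(7; 20, .25), about .102
```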
Calculating type II error…
In contrast to α, there is no single β. Instead, there is a
different β for each different p that exceeds .25.
Let’s pick p = .3 (i.e., we assume that the true p = .3)
β (.3) = P(type II error when p = .3)
= P ( H 0 is not rejected when it is false because p = .3)
= P ( X ≤ 7 when X ~ Bin(20,.3)) = B (7;20,.3) = .772
When p is actually .3 rather than .25 (a “small” departure
from H0), roughly 77% of all experiments of this type would
result in H0 being incorrectly not rejected!
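The β(.3) computation follows the same pattern as the α calculation; a short sketch:

```python
from math import comb

def binom_cdf(k, n, p):
    """B(k; n, p) = P(X <= k) for X ~ Bin(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# H0 survives whenever x <= 7, so beta(.3) = B(7; 20, .3)
beta_3 = binom_cdf(7, 20, 0.3)   # about .772
```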
Calculating type II error…
Picking different values of p results in different Type II error
probabilities:

p      .3     .4     .5     .6     .7     .8
β(p)   .772   .416   .132   .021   .001   .000

Clearly, β decreases as the value of p moves farther to the
right of the null value .25.
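The whole table can be regenerated the same way, one binomial CDF evaluation per alternative value of p:

```python
from math import comb

def binom_cdf(k, n, p):
    """B(k; n, p) = P(X <= k) for X ~ Bin(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# beta(p) = B(7; 20, p) for each alternative p in the table
betas = {p: round(binom_cdf(7, 20, p), 3) for p in (0.3, 0.4, 0.5, 0.6, 0.7, 0.8)}
```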
Example…
What if we want a smaller α ?
Reduce the rejection region to R9={9,10,…,19,20}
α = P(type I error ) = P( H 0 is rejected when it is true)
= P( X ≥ 9 when X ~ Bin(20,.25)) = 1 − B(8;20,.25)
= .041
β (.3) = P( H 0 is not rejected when it is false because p = .3)
= P ( X ≤ 8 when X ~ Bin(20,.3)) = B(8;20,.3) = .887
β (.5) = B(8;20,.5) = .252
Compare these with the previous β’s for R8: .772 and
.132, respectively.
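The R9 numbers can be checked the same way, shifting the cutoff from 7 to 8:

```python
from math import comb

def binom_cdf(k, n, p):
    """B(k; n, p) = P(X <= k) for X ~ Bin(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Rejection region R9 = {9, ..., 20}: reject H0 only when x >= 9
alpha = 1 - binom_cdf(8, 20, 0.25)   # about .041 (down from .102 with R8)
beta_3 = binom_cdf(8, 20, 0.3)       # about .887 (up from .772)
beta_5 = binom_cdf(8, 20, 0.5)       # about .252 (up from .132)
```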
Judging the Test…
A statistical test of hypothesis is effectively defined by the
significance level (α) and the sample size (n), both of
which are selected by the statistics practitioner.
Therefore, if the probability of a Type II error (β) is judged
to be too large, we can reduce it by
increasing α, and/or
increasing the sample size, n.
Judging the Test…
The power of a test is defined as 1 − β.
It represents the probability of rejecting the null hypothesis
when it is false [the true mean is some value other than that
specified by the null hypothesis].
I.e., when more than one test can be performed in a given
situation, it is preferable to use the test that is correct more
often. If one test has a higher power than a second test, the
first test is said to be more powerful and is the preferred test.
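For the crash-test example, power = 1 − β can be tabulated directly; a sketch using the R8 rejection region (reject when x ≥ 8):

```python
from math import comb

def binom_cdf(k, n, p):
    """B(k; n, p) = P(X <= k) for X ~ Bin(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def power(p, cutoff=7, n=20):
    """P(reject H0 | true proportion p) = 1 - B(cutoff; n, p)."""
    return 1 - binom_cdf(cutoff, n, p)

p30 = power(0.3)   # 1 - .772, about .228
p50 = power(0.5)   # 1 - .132, about .868
# the farther p is from the null value .25, the higher the power
```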