Hypothesis Testing - Introduction
Six Sigma Greenbelt Training
Dave Merritt, 12-5-16

Hypothesis Testing

As we perform tests, we are attempting to support a conclusion of equivalence or difference between distributions of data. Hypothesis testing is the process of taking a practical problem and translating it into a statistical problem:
- Are the results from machine 1 the same as from machine 2?
- Is there variation between the outputs of shifts 1, 2, and 3?
- Did the process change we made make a difference?

Sampling, although practical, can limit our ability to draw accurate conclusions. Because we are using samples (and relatively small ones at that), there is always a chance our sample will not accurately represent the distribution of the population. Hypothesis testing is used to determine the probability that those samples actually indicate equivalence or difference.

The Reason for Hypothesis Testing
1) If you start with a given distribution…
2) and take an adequate random sample from that distribution…
3) the sample will represent the distribution. It will have a similar central tendency and variation.

The Reason for Hypothesis Testing
1) If you start with a given distribution…
2) and take multiple samples…
3) the samples may not be equal to one another, but each represents the distribution. They will each have a similar central tendency and variation.

The Reason for Hypothesis Testing
1) If you make a process change resulting in a new distribution that does not overlap the original distribution…
2) and you take an adequate random sample from that new distribution…
3) the sample will represent the new distribution. Its central tendency and variation will represent the new distribution, not the original.

The Reason for Hypothesis Testing
1) If you make a process change resulting in a new distribution that does overlap the original distribution…
2) and you take a new sample…
3) can you determine if your process change made a difference?

The Reason for Hypothesis Testing
1) Did the process change shift the distribution?
2) Or did the distribution remain the same?
3) Can you tell from just this sample?

Hypothesis Testing

The process of taking a practical problem and translating it into a statistical problem.

What is a hypothesis? A tentative assumption that is made in order to draw out and test its logical or empirical consequences.

All hypothesis testing is based on the idea that there is no difference between distributions until adequate evidence is observed to prove otherwise. So the initial hypothesis is that there is no difference. This is called the Null Hypothesis (Ho). If adequate evidence of change is observed, then a different hypothesis is supported. This is called the Alternative Hypothesis (Ha).

The Null Hypothesis

The Null Hypothesis (Ho) is assumed to be true. This is like a defendant being assumed innocent. You are the prosecuting attorney: you must provide evidence that this assumption is probably not true.

Hypothesis Testing - Determining Probability to Make the Right Decision

Probability can be expressed numerically. An event that cannot occur has a probability of 0. An event that will certainly occur has a probability of 1. The probability of any event between these two extremes is expressed as a decimal between 0 and 1.

What is the probability of a fair coin flip resulting in a head? The probability of this event can be expressed as the decimal 0.5 (1/2); there is equal probability that either result will occur. The results of flipping a coin fit a binomial distribution because there are only two possible outcomes of each flip: heads or tails.

Based on the conventions of hypothesis testing, if we were to evaluate the fairness of a series of coin flips, our hypotheses would be:
Ho: The coin is fair
Ha: The coin is not fair

What is the probability of 10 fair coin flips resulting in 10 heads? The answer can be calculated by multiplying the probability of a single head by itself 10 times:

0.5 x 0.5 x 0.5 x 0.5 x 0.5 x 0.5 x 0.5 x 0.5 x 0.5 x 0.5 = 0.000977

You may see this expressed as a p-value (p < 0.001).

What is the meaning of a p-value? The "p" in p-value stands for probability. The p-value is calculated (generally using Minitab) from a test statistic such as chi-square, t, Z, etc.
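The binomial arithmetic on this slide is easy to verify with a few lines of code. As a minimal illustrative sketch (not part of the original training material), the following Python snippet computes the probability of 10 heads in 10 fair flips, both directly and via the general binomial probability formula used throughout the next slides:

```python
from math import comb

# Probability of exactly k heads in n flips of a coin with P(head) = p:
# C(n, k) * p^k * (1 - p)^(n - k)
def binom_pmf(k, n, p=0.5):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 10 heads in 10 flips: 0.5 multiplied by itself 10 times
print(f"P(10 heads) = {0.5 ** 10:.6f}")          # 0.000977
print(f"P(10 heads) = {binom_pmf(10, 10):.6f}")  # same value via the pmf
```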
Hypothesis Testing - Determining Probability to Make the Right Decision

We repeat the experiment. This time the results are 9 tails and 1 head. The p-value for this occurrence is 0.009766. In other words, if the coin is fair, these results will occur less than once in 100 trials. Rounding, you may see this expressed as a p-value (p < 0.01).

What is the probability of getting 9 or more of either heads or tails? The p-value for this is the sum of the probabilities of 9 heads, 9 tails, 10 heads, and 10 tails:

0.009766 + 0.009766 + 0.000977 + 0.000977 = 0.021484

Rounding, you may see this expressed as a p-value (p < 0.02).

How unlikely would an outcome have to be before we are willing to say the coin is not fair (i.e., before we should reject the Null Hypothesis)? Is there a threshold that marks a boundary, on one side of which we accept the Null Hypothesis and on the other side reject it, believing the outcome is influenced by something other than chance? Clearly most people would reject the Null Hypothesis (Ho: the coin is fair) when the observed results occur less than once in 1,000 times (p = 0.000977).

By statistical convention, the boundary that separates the probable from the improbable is 5 times in 100, or 0.05. Results with a p-value less than 0.05 are considered "statistically significant." Statistical significance means that a result is unlikely to be caused only by chance. In this case we are willing to reject the Null Hypothesis.

If p < 0.05 – Reject the Null Hypothesis

What is the probability of getting 7 or more of either heads or tails? Although these results are not the most common, they don't seem unusual, even for a fair coin. The calculated p-value confirms our intuition:

Possible Outcome     Probability
7 Heads (3 Tails)    0.117188
8 Heads (2 Tails)    0.043945
9 Heads (1 Tail)     0.009766
10 Heads             0.000977
10 Tails             0.000977
9 Tails (1 Head)     0.009766
8 Tails (2 Heads)    0.043945
7 Tails (3 Heads)    0.117188
Total                0.343752

The result is 34 times in 100, or p = 0.34. Will you reject the Null Hypothesis? Is the coin fair? (i.e., is the calculated p-value < 0.05?)
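To double-check the table above, here is a short Python sketch (an illustration added here, not from the original deck) that sums the binomial tail probabilities for "k or more of either face" at the thresholds discussed:

```python
from math import comb

n = 10  # flips of a fair coin

def binom_pmf(k):
    # Probability of exactly k heads in n fair flips
    return comb(n, k) * 0.5**n

def two_sided_p(k):
    # Probability of k or more heads, or k or more tails
    # (the two tails are symmetric, so double the upper tail)
    return 2 * sum(binom_pmf(i) for i in range(k, n + 1))

print(f"p(>= 9 of either face) = {two_sided_p(9):.6f}")  # 0.021484
print(f"p(>= 7 of either face) = {two_sided_p(7):.6f}")  # 0.343750
# Note: the slide's total of 0.343752 comes from summing rounded
# per-outcome values; the exact probability is 0.343750.
```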
Hypothesis Testing - Determining Probability to Make the Right Decision

Let's review the experiments:
1) 10 heads: p = 0.000977. The result is below the threshold of 0.05; it's improbable this could happen only by chance. Reject the Null Hypothesis – the coin is not fair!
2) 9 tails: p = 0.009766. The result is below the threshold of 0.05; it's improbable this could happen only by chance. Reject the Null Hypothesis – the coin is not fair!
3) 9 or more of either heads or tails: p = 0.021484. The result is below the threshold of 0.05; it's improbable this could happen only by chance. Reject the Null Hypothesis – the coin is not fair!
4) 7 or more of either heads or tails: p = 0.343752. The result is above the threshold of 0.05; it's probable this could happen only by chance. Accept the Null Hypothesis – the coin is fair!

Hypothesis Testing - Evaluating the Errors of Hypothesis Testing

We perform hypothesis tests to draw conclusions of change or no change. In drawing those conclusions we can commit two types of errors:
- We can reject the Null Hypothesis, saying there is a difference when in fact there isn't. This is called a Type I Error.
- We can accept the Null Hypothesis, saying there is no difference when in fact there is. This is called a Type II Error.

Hypothesis Error Types
1) If you conclude that the sample represents a new distribution…
2) but there was no change and it actually represents the original distribution…
3) you've made a Type I Error – you have rejected Ho when it is in fact true. The distribution did not change.

Hypothesis Error Types
1) If you conclude there was no change and that the sample represents the original distribution…
2) but the distribution actually changed and the sample represents the new distribution…
3) you've made a Type II Error – you have accepted Ho when it is in fact false. The distribution did change.

Hypothesis Error Types
What's the danger of making a Type I Error? You make a change thinking you've solved the problem when you haven't.
What's the danger of making a Type II Error? You don't make a change that would actually have helped.

Hypothesis Testing - The Risks of Hypothesis Errors

A type of risk has been assigned to each type of error:
- The maximum risk or probability of making a Type I Error is called Alpha Risk (α). This probability is always greater than zero and is usually established at 5% (2.5% per side for a two-sided test). The researcher decides the greatest level of risk that is acceptable for a rejection of Ho. Alpha is also known as the significance level.
- The maximum risk or probability of making a Type II Error is called Beta Risk (β). While alpha remains constant at the chosen level (standard 5%, or 2.5% per side), beta varies as the distributions overlap; we choose our comfort level, with 10% to 20% generally considered acceptable.

Hypothesis Testing - Example 1

Let's take a look at a manufacturing example. Suppose we have modified one of two reactors. We want to see if we have "significantly" improved the yield with these modifications before we modify all reactors. Let's look at the resulting data. In this case, Reactor 2 is the newly modified reactor.

Reactor 1: 89.7 81.4 84.5 84.8 87.3 79.7 85.1 81.7 83.7 84.5
Reactor 2: 84.7 86.1 83.2 91.9 86.3 79.3 82.6 89.1 83.7 88.5
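Before walking through the formal test procedure and the Minitab analysis below, the reactor comparison can be cross-checked with a quick Python sketch using SciPy (an illustration added here, not part of the original training material):

```python
from scipy import stats

reactor1 = [89.7, 81.4, 84.5, 84.8, 87.3, 79.7, 85.1, 81.7, 83.7, 84.5]
reactor2 = [84.7, 86.1, 83.2, 91.9, 86.3, 79.3, 82.6, 89.1, 83.7, 88.5]

# Pooled (equal-variance) two-sample t-test, matching the Minitab
# output shown later in this section
t_stat, p_value = stats.ttest_ind(reactor1, reactor2, equal_var=True)
print(f"T = {t_stat:.2f}, P = {p_value:.2f}")  # T = -0.88, P = 0.39

alpha = 0.05
if p_value < alpha:
    print("Reject Ho: the yields appear to differ.")
else:
    print("Do not reject Ho: no evidence of a yield difference.")
```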
Hypothesis Testing - Procedure
1. Define the problem.
2. State the objectives.
3. Establish the hypotheses: state the Null Hypothesis (Ho) and the Alternative Hypothesis (Ha).
4. Decide on the appropriate statistical test (assumed probability distribution: z, t, or F).
5. State the alpha level (usually 5%).
6. State the beta level (usually 10-20%).
7. Establish the sample size.
8. Develop the sampling plan.
9. Select samples.
10. Conduct the test and collect data.
11. Calculate the test statistic (z, t, or F) from the data.
12. Determine the probability of that calculated test statistic occurring by chance.
13. If that probability is less than alpha, reject Ho and accept Ha. If that probability is greater than alpha, do not reject Ho.
14. Replicate results and translate the statistical conclusion into a practical solution.

Two-Sample t Test, Unstacked – Example 1

Open HYPTEST.MTW.
Ho: The yield from Reactor 1 equals the yield from Reactor 2
Ha: The yield from Reactor 1 does not equal the yield from Reactor 2

Minitab: Stat > Basic Statistics > 2-Sample t. Samples in different columns. First: Reactor 1. Second: Reactor 2. OK.

Two-Sample T-Test and Confidence Interval

Two-sample T for Reactor1 vs Reactor2

          N   Mean   StDev  SE Mean
Reactor1  10  84.24  2.90   0.92
Reactor2  10  85.54  3.65   1.2

95% CI for mu Reactor1 - mu Reactor2: (-4.40, 1.8)
T-Test mu Reactor1 = mu Reactor2 (vs not =): T = -0.88  P = 0.39  DF = 18
Both use Pooled StDev = 3.30

p-value > 0.05: Accept the Null Hypothesis. The reactors appear to have the same yield.

Two-Sample t Test, Unstacked – Example 2

Open HYPTEST.MTW.
Ho: The heights from Trimmer 1 are equal to the heights from Trimmer 2
Ha: The heights from Trimmer 1 are not equal to the heights from Trimmer 2

Minitab: Stat > Basic Statistics > 2-Sample t. Samples in different columns. First: Trimmer 1. Second: Trimmer 2. OK.

Two-sample T for Trimmer 1 vs Trimmer 2

           N   Mean     StDev    SE Mean
Trimmer 1  20  1.00578  0.00999  0.0022
Trimmer 2  20  1.0337   0.0842   0.019

Difference = μ (Trimmer 1) - μ (Trimmer 2)
Estimate for difference: -0.0279
95% CI for difference: (-0.0676, 0.0118)
T-Test of difference = 0 (vs ≠): T-Value = -1.47  P-Value = 0.157

p-value > 0.05: Accept the Null Hypothesis. The trimmers appear to produce the same heights.

Summary of Key Points
- Introduction to hypothesis testing: terms and definitions
  - Null and Alternative Hypothesis
  - Probability and p-value
  - Type I and Type II Errors
  - Alpha and Beta Risk
- Translation of a practical problem into a statistical problem: express real situations in statistical language
- Hypothesis test procedure: step by step
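As a closing illustration (again a sketch, not part of the original deck), the trimmer comparison from Example 2 can be reproduced directly from its summary statistics; SciPy's ttest_ind_from_stats accepts means, standard deviations, and sample sizes, so the raw worksheet is not required:

```python
from scipy import stats

# Summary statistics transcribed from the Minitab session in Example 2
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=1.00578, std1=0.00999, nobs1=20,  # Trimmer 1
    mean2=1.0337,  std2=0.0842,  nobs2=20,  # Trimmer 2
    equal_var=False,  # unpooled (Welch) test, as in the Minitab run
)
# Close to the Minitab values: T = -1.47, P = 0.157
print(f"T = {t_stat:.2f}, P = {p_value:.3f}")

# p > 0.05, so we do not reject Ho: no evidence the trimmer
# heights differ.
```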