Hypothesis Testing - Introduction
Six Sigma Greenbelt Training
Dave Merritt
12-5-16
Hypothesis Testing -
The process of taking a practical problem and translating it to a statistical problem.
As we perform tests, we are attempting to support a conclusion of equivalence or difference between distributions of data:
 Are the results from machine 1 the same as those from machine 2?
 Is there variation between the outputs of shifts 1, 2, and 3?
 Did the process change we made make a difference?
Sampling, although practical, can limit our ability to draw accurate conclusions. Because we are using samples (and relatively small ones at that), there is always a chance our sample will not accurately represent the distribution of the population.
Hypothesis Testing is used to determine the probability that those samples actually indicate equivalence or difference.
The Reason for Hypothesis Testing
1) If you start with a given distribution…
2) and take an adequate random sample from that distribution…
3) the sample will represent the distribution. It will have a similar central tendency and variation.
The Reason for Hypothesis Testing
1) If you start with a given distribution…
2) and take multiple samples…
3) the samples may not be equal to one another, but each represents the distribution. They will each have a similar central tendency and variation.
The Reason for Hypothesis Testing
1) If you make a process change resulting in a new distribution that does not overlap the original distribution…
2) and you take an adequate random sample from that distribution…
3) the sample will represent the new distribution. Its central tendency and variation will represent the new distribution, not the original.
The Reason for Hypothesis Testing
1) If you make a process change resulting in a new distribution that does overlap the original distribution…
2) and you take a new sample…
3) can you determine if your process change made a difference?
The Reason for Hypothesis Testing
1) Did the process change shift the distribution?
2) Or did the distribution remain the same?
3) Can you tell from just this sample?
Hypothesis Testing
The process of taking a practical problem and translating it to a statistical problem.
What is a Hypothesis?
A tentative assumption that is made in order to draw out and test its logical or empirical consequences.
All Hypothesis Testing is based on the idea that there is no difference between distributions until adequate evidence is observed to prove otherwise.
So the initial hypothesis that is made is that there is no difference.
 This is called the Null Hypothesis (Ho)
If adequate evidence of change is observed, then a different hypothesis is supported.
 This is called the Alternative Hypothesis (Ha)
The Null Hypothesis
The Null Hypothesis (Ho) is assumed to be true. This is like the defendant being assumed innocent.
You are the prosecuting attorney. You must provide evidence that this assumption is probably not true.
Hypothesis Testing -
Determining Probability to Make the Right Decision
Probability can be expressed numerically. An event that cannot occur has a probability of 0. An event that will certainly occur has a probability of 1. Events between impossible and certain have probabilities expressed as a decimal between 0 and 1.
What is the probability of a fair coin flip resulting in a head? The probability of this event can be expressed as the decimal 0.5 (1/2). There is equal probability that either result will occur.
The results of flipping a coin fit a Binomial Distribution because there are only two possible outcomes of each flip: heads or tails.
Hypothesis Testing -
Determining Probability to Make the Right Decision
Based on the conventions of Hypothesis Testing, if we were to evaluate the fairness of a series of coin flips, our Null Hypothesis would be:
Ho: The coin is fair
The Alternative Hypothesis would be:
Ha: The coin is not fair
Hypothesis Testing -
Determining Probability to Make the Right Decision
What is the probability of 10 fair coin flips all resulting in heads?
The answer can be calculated by multiplying the probability of a single head by itself 10 times:
0.5 × 0.5 × 0.5 × 0.5 × 0.5 × 0.5 × 0.5 × 0.5 × 0.5 × 0.5 = 0.000977
You may see this expressed as a p-value (p < 0.001).
What is the meaning of a p-value?
The "p" in p-value stands for probability.
The p-value is calculated (generally by using Minitab) using a test statistic such as Chi-Square, t, Z, etc.
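The coin-flip arithmetic above is easy to verify with a couple of lines of Python. This is an illustrative sketch added for the reader; the training itself uses Minitab.

```python
# Probability that 10 flips of a fair coin all come up heads:
# 0.5 multiplied by itself 10 times.
p_ten_heads = 0.5 ** 10

print(round(p_ten_heads, 6))  # 0.000977, i.e. p < 0.001
```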
Hypothesis Testing -
Determining Probability to Make the Right Decision
We repeat the experiment. This time the results are 9 tails and 1 head.
The p-value for this occurrence is 0.009766. In other words, if the coin is fair, these results will occur less than once in 100 trials.
Rounding, you may see this expressed as a p-value (p < 0.01).
What is the probability of getting 9 or more of either heads or tails?
The p-value for this is the sum over 9 heads, 9 tails, 10 heads, and 10 tails:
0.009766 + 0.009766 + 0.000977 + 0.000977 = 0.021484
Rounding, you may see this expressed as a p-value of about 0.02 (p ≈ 0.02).
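The two-tailed sum above can be checked with the binomial formula, C(n, k) · 0.5ⁿ. The helper below is an added sketch, not part of the original Minitab workflow.

```python
from math import comb

# P(exactly k heads in n flips of a fair coin) = C(n, k) * 0.5**n
def p_exact(k, n=10):
    return comb(n, k) * 0.5 ** n

# 9 heads + 10 heads, doubled to cover the mirror-image tail counts
p_value = 2 * (p_exact(9) + p_exact(10))
print(round(p_value, 6))  # 0.021484
```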
Hypothesis Testing -
Determining Probability to Make the Right Decision
How unlikely would an outcome have to be before we are willing to say the coin is not fair (i.e., before we should reject the Null Hypothesis)?
Is there a threshold that marks a boundary, on one side of which we accept the Null Hypothesis and on the other side of which we reject it, believing the outcome is influenced by something other than chance?
Clearly, most people would reject the null hypothesis (Ho: the coin is fair) when the observed results occur less than once in 1000 times (p = 0.000977).
Hypothesis Testing -
Determining Probability to Make the Right Decision
By statistical convention, the boundary that separates the probable from the improbable is 5 times in 100, or 0.05.
Results with a p-value less than 0.05 are considered "statistically significant."
Statistical significance means that a result is unlikely to be caused by chance alone. In this case we are willing to reject the null hypothesis.
If p < 0.05 – Reject the Null Hypothesis
Hypothesis Testing -
Determining Probability to Make the Right Decision
What is the probability of getting 7 or more of either heads or tails?
Although these results are not the most common, they don't seem unusual, even for a fair coin. The calculated p-value confirms our intuition.

Possible Outcome     Probability
7 Heads (3 Tails)    0.117188
8 Heads (2 Tails)    0.043945
9 Heads (1 Tail)     0.009766
10 Heads             0.000977
10 Tails             0.000977
9 Tails (1 Head)     0.009766
8 Tails (2 Heads)    0.043945
7 Tails (3 Heads)    0.117188
Total                0.343752

The result is 34 times in 100, or p = 0.34.
Will you reject the Null Hypothesis? Is the coin fair? (i.e., is the calculated p-value < 0.05?)
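The same binomial helper verifies the table: summing the eight extreme outcomes (7 or more of either face) gives the total shown. A sketch assuming a fair coin and 10 flips; note the exact sum is 0.34375 (the table's 0.343752 comes from adding the rounded entries).

```python
from math import comb

# P(exactly k heads in n flips of a fair coin)
def p_exact(k, n=10):
    return comb(n, k) * 0.5 ** n

# 7, 8, 9, or 10 heads, doubled to include the matching tail counts
p_value = 2 * sum(p_exact(k) for k in (7, 8, 9, 10))
print(round(p_value, 5))  # 0.34375 -- well above the 0.05 threshold
```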
Hypothesis Testing -
Determining Probability to Make the Right Decision
Let's review the experiments:
1) 10 heads: p = 0.000977
The result is below the threshold of 0.05. It's improbable this could happen only by chance.
Reject the Null Hypothesis – The coin is not fair!
2) 9 tails: p = 0.009766
The result is below the threshold of 0.05. It's improbable this could happen only by chance.
Reject the Null Hypothesis – The coin is not fair!
3) 9 or more of either heads or tails: p = 0.021484
The result is below the threshold of 0.05. It's improbable this could happen only by chance.
Reject the Null Hypothesis – The coin is not fair!
4) 7 or more of either heads or tails: p = 0.343752
The result is above the threshold of 0.05. It's probable this could happen only by chance.
Accept the Null Hypothesis – The coin is fair!
Hypothesis Testing
Evaluating the Errors of Hypothesis Testing
We perform Hypothesis Tests to draw conclusions of change or no change. In drawing those conclusions we can commit two types of errors:
We can reject the Null Hypothesis, saying there is a difference when in fact there isn't.
 This is called a Type I Error
We can accept the Null Hypothesis, saying there is no difference when in fact there is.
 This is called a Type II Error
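A Type I Error rate can be seen directly by simulation: flip a genuinely fair coin many times and apply the rejection rule from the coin-flip example (9 or more of either face in 10 flips, p ≈ 0.0215 < 0.05). This Monte Carlo sketch is an added illustration; the 100,000-trial count is an arbitrary choice for demonstration.

```python
import random

random.seed(0)

def reject_ho(flips):
    """Reject Ho ('the coin is fair') when 9+ of either face appear."""
    heads = sum(flips)
    return heads >= 9 or heads <= 1

trials = 100_000
false_rejections = 0
for _ in range(trials):
    flips = [random.random() < 0.5 for _ in range(10)]  # fair coin: Ho is true
    if reject_ho(flips):
        false_rejections += 1  # rejecting a true Ho is a Type I Error

print(false_rejections / trials)  # close to the theoretical 0.0215
```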
Hypothesis Error Types
1) If you conclude that the sample represents a new distribution…
2) but there was no change and it actually represents the original distribution…
3) you've made a…
Type I Error – You have rejected Ho when it is in fact true. The distribution did not change.
Hypothesis Error Types
1) If you conclude there was no change and that the sample represents the original distribution…
2) but the distribution actually changed and the sample represents the new distribution…
3) you've made a…
Type II Error – You have accepted Ho when it is in fact false. The distribution did change.
Hypothesis Error Types
What's the danger of making a Type I Error? You make a change thinking you've solved the problem when you haven't.
What's the danger of making a Type II Error? You just don't make the change.
Hypothesis Testing -
The Risks of Hypothesis Errors
A type of risk has been assigned to each type of error.
The maximum risk or probability of making a Type I Error is called Alpha (α) Risk. This probability is always greater than zero and is usually established at 5%. The researcher decides the greatest level of risk that is acceptable for a rejection of Ho. It is also known as the significance level.
The maximum risk or probability of making a Type II Error is called Beta (β) Risk.
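Beta Risk can be illustrated the same way as Alpha Risk: make Ho false and count how often the rule from the coin-flip example fails to detect the change. In this added sketch the coin is assumed biased at P(heads) = 0.8; both the bias and the trial count are chosen purely for illustration.

```python
import random

random.seed(1)

def reject_ho(flips):
    """Reject Ho ('the coin is fair') when 9+ of either face appear."""
    heads = sum(flips)
    return heads >= 9 or heads <= 1

trials = 100_000
misses = 0
for _ in range(trials):
    flips = [random.random() < 0.8 for _ in range(10)]  # biased coin: Ho is false
    if not reject_ho(flips):
        misses += 1  # failing to reject a false Ho is a Type II Error

print(misses / trials)  # roughly 0.62: with only 10 flips, Beta Risk is large
```

The large result is the point: with a small sample, even a strongly biased coin usually escapes detection, which is why sample size (step 7 of the procedure below) matters.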
Hypothesis Risk Types
α Risk = the risk that we will make a Type I Error. The standard is 5%, or 2.5% per side.
Type I Error – Rejecting the Null Hypothesis, saying there is a change when there really is none.
Hypothesis Risk Types
β Risk = the risk that we will make a Type II Error.
[Figure: overlapping distributions with the α/2 rejection region and the β region marked]
Type II Error – Accepting the Null Hypothesis, saying there is no change when there actually is.
Hypothesis Risk Types
α Risk remains constant: the standard is 5%, or 2.5% per side.
β Risk varies as the distributions overlap. We choose our comfort level; 10% to 20% is acceptable.
[Figure: overlapping distributions with the α/2 rejection region and the β region marked]
Hypothesis Testing - Example 1
Let's take a look at a manufacturing example. Suppose we have modified one of two reactors. We want to see if we have "significantly" improved the yield with these modifications before we modify all reactors.
Let's look at the resulting data. In this case, Reactor 2 is the newly modified reactor.
Reactor 1: 89.7  81.4  84.5  84.8  87.3  79.7  85.1  81.7  83.7  84.5
Reactor 2: 84.7  86.1  83.2  91.9  86.3  79.3  82.6  89.1  83.7  88.5
Hypothesis Testing - Procedure
1. Define the Problem
2. State the Objectives
3. Establish the Hypotheses
- State the Null Hypothesis (Ho)
- State the Alternative Hypothesis (Ha).
4. Decide on the appropriate statistical test (assumed probability distribution: z, t, or F).
5. State the Alpha level (usually 5%)
6. State the Beta level (usually 10-20%)
Hypothesis Testing - Procedure
7. Establish the Sample Size
8. Develop the Sampling Plan
9. Select Samples
10. Conduct the test and collect data
11. Calculate the test statistic (z, t, or F) from the data.
12. Determine the probability of that calculated test statistic occurring by chance.
13. If that probability is less than alpha, reject Ho and accept Ha. If that probability is greater than alpha, do not reject Ho.
14. Replicate results and translate the statistical conclusion into a practical solution.
Two Sample t Test, Unstacked – Example 1
Open up HYPTEST.MPJ.
Ho: The yield from Reactor 1 equals the yield from Reactor 2
Ha: The yield from Reactor 1 does not equal the yield from Reactor 2
Minitab: Stat > Basic Statistics > 2-Sample t
Samples in different columns
First: Reactor 1
Second: Reactor 2
OK

Two-Sample T-Test and Confidence Interval
Two-sample T for Reactor1 vs Reactor2

           N   Mean  StDev  SE Mean
Reactor1  10  84.24   2.90     0.92
Reactor2  10  85.54   3.65      1.2

95% CI for mu Reactor1 - mu Reactor2: (-4.40, 1.8)
T-Test mu Reactor1 = mu Reactor2 (vs not =): T = -0.88  P = 0.39  DF = 18
Both use Pooled StDev = 3.30

p-value > 0.05 – Accept the Null Hypothesis. The reactors appear to have the same yield.
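The same pooled two-sample t-test can be reproduced outside Minitab. This sketch assumes SciPy is available; `equal_var=True` matches the pooled standard deviation shown in the session output above.

```python
from scipy import stats

reactor1 = [89.7, 81.4, 84.5, 84.8, 87.3, 79.7, 85.1, 81.7, 83.7, 84.5]
reactor2 = [84.7, 86.1, 83.2, 91.9, 86.3, 79.3, 82.6, 89.1, 83.7, 88.5]

# Pooled (equal-variance) two-sample t-test, as in the Minitab output
t_stat, p_value = stats.ttest_ind(reactor1, reactor2, equal_var=True)
print(round(t_stat, 2), round(p_value, 2))  # -0.88 0.39
```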
Two Sample t Test, Unstacked – Example 2
Open up HYPTEST.MPJ.
Ho: The heights from Trimmer 1 are equal to the heights from Trimmer 2
Ha: The heights from Trimmer 1 are not equal to the heights from Trimmer 2
Minitab: Stat > Basic Statistics > 2-Sample t
Samples in different columns
First: Trimmer 1
Second: Trimmer 2
OK

Two-sample T for Trimmer 1 vs Trimmer 2

            N     Mean    StDev  SE Mean
Trimmer 1  20  1.00578  0.00999   0.0022
Trimmer 2  20   1.0337   0.0842    0.019

Difference = μ (Trimmer 1) - μ (Trimmer 2)
Estimate for difference: -0.0279
95% CI for difference: (-0.0676, 0.0118)
T-Test of difference = 0 (vs ≠): T-Value = -1.47  P-Value = 0.157

p-value > 0.05 – Accept the Null Hypothesis. The trimmers appear to produce the same heights.
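Only summary statistics appear in this output, but the result can still be reproduced: SciPy's `ttest_ind_from_stats` accepts them directly. An added sketch; `equal_var=False` (the Welch test, which does not pool the very different standard deviations) matches the reported T and P values.

```python
from scipy import stats

# Reproduce the trimmer comparison from the reported summary statistics
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=1.00578, std1=0.00999, nobs1=20,
    mean2=1.0337,  std2=0.0842,  nobs2=20,
    equal_var=False,  # unpooled (Welch) test
)
print(round(t_stat, 2), round(p_value, 3))  # about -1.47 and 0.157
```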
Summary of Key Points
 Introduction to Hypothesis Testing—Terms and definitions
- Null and Alternative Hypothesis
- Probability and p-value
- Type I and Type II Errors
- Alpha and Beta Risk
 Translation of a practical problem into a statistical problem
- Express real situations in statistical language
 Hypothesis Test Procedure—Step-by-Step