Hypothesis
Testing
• A decision-making process for evaluating claims
about a population
• Based on information obtained from samples
• Hypotheses are made a priori,
before the experiment is conducted
• Usually tests a prediction that some kind of effect
exists in the population
• A method for scientists to test the scientific
questions they generate
• Usually conjecture about a population parameter
• A numerical value that characterizes some aspect of a
population
• Can never be completely sure that a hypothesis is
true; instead, we work with probabilities
• Generally, we will calculate the probability that
results we have obtained in an experiment occurred
by chance
• We are studying whether the results we see are a
random occurrence or due to some mechanism
Testing a Statistical Hypothesis
Steps:
1. Define the population to study
2. State the hypothesis that will be investigated
3. Set the significance level (α)
4. Select a sample from the population
5. Collect the data
6. Calculate the test statistic based on sample data
7. Draw a conclusion
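The seven steps above can be sketched end to end. This is only an illustration under stated assumptions: the ages are made-up data, the population standard deviation is assumed known (so a z-test applies), and the names `mu0`, `sigma`, and `ages` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

# Steps 1-2: define the population and state the hypotheses.
# H0: mu = 21 versus HA: mu != 21 (two-tailed).
mu0 = 21.0
sigma = 3.0   # assumption: population standard deviation is known

# Step 3: set the significance level before looking at the data.
alpha = 0.05

# Steps 4-5: select a sample and collect the data (made-up ages).
ages = [22, 25, 19, 24, 23, 26, 21, 27, 24, 22, 25, 23]

# Step 6: calculate the test statistic and its p-value.
n = len(ages)
z = (mean(ages) - mu0) / (sigma / sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 7: draw a conclusion.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f}: {decision}")
```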
• Null Hypothesis
• Usually written as H0 (H-naught)
• A statement that there is no difference between a
population parameter and a specific value (or no
difference between two parameters)
• Alternative Hypothesis (Research Hypothesis)
• Usually written as HA
• A statement that there is a difference between a
population parameter and a specific value (or there is
a difference between two parameters)
Two types of hypothesis
• A hypothesis test that is directional is called a one-tailed test
• Examples:
• Older than age 21
• H0: μ = 21 versus HA: μ > 21
• Treatment A will do better than treatment B
• H0: μA = μB versus HA: μA > μB
One and Two Tailed Tests
• A hypothesis test that is non-directional is called a
two-tailed test
• Examples:
• The population mean age is 21
• H0: μ = 21 versus HA: μ ≠ 21
• Treatment A has the same effect as treatment B
• H0: μA = μB versus HA: μA ≠ μB
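As a rough sketch of how the choice between a one-tailed and a two-tailed test affects the result, the same test statistic can be converted to a p-value both ways. The z value of 1.8 is hypothetical, and a standard normal sampling distribution is assumed.

```python
from statistics import NormalDist

def p_values(z):
    """Return (upper-tailed, two-tailed) p-values for a z statistic."""
    std_normal = NormalDist()                      # sampling distribution under H0
    upper = 1 - std_normal.cdf(z)                  # HA: mu > mu0 (one-tailed)
    two_sided = 2 * (1 - std_normal.cdf(abs(z)))   # HA: mu != mu0 (two-tailed)
    return upper, two_sided

upper, two_sided = p_values(1.8)   # hypothetical test statistic
print(f"one-tailed p = {upper:.4f}, two-tailed p = {two_sided:.4f}")
```

For a positive z, the two-tailed p-value is twice the one-tailed value, so with α = 0.05 this example would be significant under the directional hypothesis but not under the non-directional one.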
One and Two Tailed Tests
• Notice the statement of equality – a statement of no
effect – is written as the null hypothesis
• Preview: If we reject the null hypothesis, this is the
same thing as saying we have detected an effect
One and Two Tailed Tests
• Significance level is the probability we use as our
criterion for an unlikely outcome, supposing that H0
is true
• Denoted by α (alpha)
• Must be decided a priori; that is, α must be
determined before conducting a statistical test
• Choosing α after the data has been analyzed lacks
objectivity
Significance Level of a Statistical Test
• α is our criterion for rejecting the null hypothesis
• Reflects how careful the researcher wishes to be
• The smaller the α that is specified, the stronger the
evidence needed to reject H0
• Scientific and medical literature commonly test
hypotheses with α levels of 0.05 or 0.01
Significance Level of a Statistical Test
• Remember, statistical inference uses sample statistics to
make inferences about population parameters
• We use sample data to calculate a test statistic
• A test statistic is a univariate statistic (a number)
• Sample data is used to calculate the test statistic
• Test statistic is used in hypothesis testing to make decisions
• To use the test statistic, we have to determine its
sampling distribution under the null hypothesis
(assuming the null hypothesis is true)
Calculating a Statistical Test
• Once we know the distribution of the test statistic,
we use this to calculate its p-value
• Each time we calculate a test statistic, we will
couple it with a corresponding p-value
• In the literature, statistical test results are often
presented in the form of (test statistic, p-value)
• Examples: (t=-1.98, p=0.04)
(χ(2)=12.32, p<0.001)
(t=2.5, p<0.013)
(F=6.17, p=0.0001)
Calculating a Statistical Test
• Examples: (t=-1.98, p=0.04) (t (28)=2.5, p<0.013)
(χ(2)=12.32, p<0.001)
(F=6.17, p=0.0001)
• In these examples, the letter denotes the sampling
distribution of the test statistic
• For example:
• t denotes a t-distribution
• t(28) denotes a t-distribution with 28 degrees of freedom
• χ(2) denotes a Chi-squared distribution with 2 degrees of
freedom
• F denotes an F-distribution
Calculating a Statistical Test
• The p-value is calculated using the test statistic and
its sampling distribution
• To calculate the p-value, we calculate the area
under the curve for specific values of the test
statistic and its sampling distribution
• We will focus on how to use a p-value to make
decisions about statistical significance
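To make "area under the curve" concrete, here is a minimal sketch that integrates the tail of a standard normal density numerically and compares it with the closed-form tail probability. The z value is hypothetical, and the helper names `normal_pdf` and `upper_tail_area` are made up for this illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def upper_tail_area(z, upper=10.0, steps=100_000):
    """Trapezoid-rule area under the curve from z to `upper`."""
    width = (upper - z) / steps
    total = (normal_pdf(z) + normal_pdf(upper)) / 2
    for i in range(1, steps):
        total += normal_pdf(z + i * width)
    return total * width

z = 1.98                            # hypothetical test statistic
area = upper_tail_area(z)           # numerical integration of the tail
exact = 1 - NormalDist().cdf(z)     # closed-form one-tailed p-value
print(f"area = {area:.5f}, exact = {exact:.5f}")
```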
Calculating a Statistical Test
• On the slides that follow, we:
• Define a p-value
• Describe its properties
• Clarify what it does and does not tell us
• Following this, we will then go on to Step 7 of
Hypothesis Testing
• In Step 7, we will talk about Significance Testing
• Significance Testing describes how the p-value is
used to make a decision
p - values
• Let’s not confuse
• The definition of a p-value
• The process of using a p-value in making a decision
p-values
• A p-value is the probability of obtaining a sample
statistic (e.g., a mean value) as extreme as or more
extreme than the one observed, if the null hypothesis is true
• The p-value is the most commonly reported result of
a significance test
• It enables us to judge the extent of the evidence
against H0
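The definition can be illustrated by simulation: generate many samples in a world where H0 is true and count how often the result is at least as extreme as the one observed. This is a sketch only; the values of `mu0`, `sigma`, `n`, and `observed_mean` are hypothetical, and a normal population with known standard deviation is assumed.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(0)                      # fixed seed so the sketch is repeatable
mu0, sigma, n = 21.0, 3.0, 12       # hypothetical H0 mean, known SD, sample size
observed_mean = 23.0                # hypothetical observed sample mean

# Simulate many experiments in a world where H0 is true.
trials = 20_000
extreme = 0
for _ in range(trials):
    sample_mean = sum(random.gauss(mu0, sigma) for _ in range(n)) / n
    # "As extreme or more extreme" in either direction (two-tailed).
    if abs(sample_mean - mu0) >= abs(observed_mean - mu0):
        extreme += 1

simulated_p = extreme / trials
z = (observed_mean - mu0) / (sigma / sqrt(n))
exact_p = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"simulated p = {simulated_p:.4f}, exact p = {exact_p:.4f}")
```

The simulated proportion should land close to the p-value computed from the sampling distribution, which is the same quantity obtained analytically.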
Definition of p-value
• Ranges from 0 to 1
• Summarizes the evidence in the data about H0
• A large p-value (e.g., 0.58) indicates the observed
data would not be unusual if H0 were true
• A small p-value (e.g., 0.0003) indicates the
observed data would be very unlikely if H0 were true
Properties of a p-value
P(A | B) = “the probability of A given B”
Is P(A|B) = P(B|A)? No
• Example:
A = dramatic overdose of medication
B = death
• P(A|B): Given a person dies, what is the probability that the death
was due to overdose? (this probability will likely be very low)
• P(B|A): Given a person overdoses, what is the probability of
this person dying? (this probability will likely be very high)
• Is P(A|B) = P(B|A) here?
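A small numerical sketch shows how far apart the two conditional probabilities can be. The counts are invented purely for illustration and are not real data. The same distinction applies to a p-value, which is a probability of the data given H0, not the probability of H0 given the data.

```python
# Hypothetical counts for one population (not real data).
overdose_and_death = 90        # people in both A and B
overdose_total = 100           # people in A (dramatic overdose)
death_total = 9_000            # people in B (death from any cause)

p_a_given_b = overdose_and_death / death_total      # P(A|B)
p_b_given_a = overdose_and_death / overdose_total   # P(B|A)

print(f"P(A|B) = {p_a_given_b:.2f}, P(B|A) = {p_b_given_a:.2f}")
```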
p-value
• A p-value does not give information about trend,
direction, strength of an association, size of an effect, or
magnitude of a difference
• p=0.04 is not less significant than p=0.0000000001
• Statistical significance is influenced by the sample size
• With a large enough sample size, even a very small effect can be
statistically significant
• When reporting p-values, footnotes such as
*p<.05, **p<0.01
suggest a trend and are misleading
Properties of p-values
• A smaller p-value does not indicate a more important result
• Magnitude of the p-value is not a guide to clinical significance
• p-values do not take into account the size of the effect
• A small effect in a large study can have the same p-value as a
large effect in a small one
• Conclusions should not be based only on p-values
• p=0.06 is not evidence of 'marginal significance' or proof of a
'trend towards significance'; these types of conclusions are
not justified
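The point that a small effect in a large study can have the same p-value as a large effect in a small one can be checked with a short sketch. It assumes a two-tailed z-test with a known standard deviation; the effect sizes and sample sizes are hypothetical, and `two_sided_p` is a helper written for this example.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(effect, sigma, n):
    """Two-tailed z-test p-value for an observed mean difference."""
    z = effect / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

sigma = 1.0                                        # assumed known SD
p_small_effect = two_sided_p(0.02, sigma, 10_000)  # tiny effect, large study
p_large_effect = two_sided_p(0.20, sigma, 100)     # larger effect, small study
p_tiny_study = two_sided_p(0.02, sigma, 100)       # tiny effect, small study
print(p_small_effect, p_large_effect, p_tiny_study)
```

The first two p-values coincide even though the effects differ tenfold, while the same tiny effect in the smaller study is nowhere near significance.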
Properties of p-values
• The p-value does not give the probability that the result
observed is due to chance
• The p-value does give the chance of obtaining the effect
we have observed (or a more extreme one), assuming the null
hypothesis is true (in other words, assuming there is no real effect)
• Whenever a p-value is reported, it is recommended to
also report a measure of effect size
• Confidence Interval
• Correlation coefficient
• Regression parameter
• Etc.
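As one possible way to report an effect alongside a p-value, the sketch below computes a z-based confidence interval for a mean. The summary statistics are hypothetical, a known standard deviation is assumed, and `mean_with_ci` is a helper invented for this example.

```python
from math import sqrt
from statistics import NormalDist

def mean_with_ci(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) z-based confidence limits for a mean."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical summary statistics, with a known SD assumed.
lower, upper = mean_with_ci(sample_mean=23.0, sigma=3.0, n=36)
print(f"mean = 23.0, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Unlike the p-value alone, the interval shows both the size of the estimate and how precisely it is known.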
Properties of p-values
• We have not discussed how the p-value is used in
making a decision
• Significance Testing describes the process for using
the p-value to make a decision
• Next, we talk about Step 7; drawing conclusions and
making inferences
• We need to discuss the decision we are making, and
possible errors in making that decision
p-values
• In hypothesis testing, we make a decision
• Only two possible outcomes
• We decide to either
1) Reject H0
2) Fail to reject H0
• Errors of inference describe the two ways these
decisions can be incorrect
Errors of Inference
"Accepting the null hypothesis"
is NOT the same conclusion as
"Failing to reject the null hypothesis"
Hypothesis Testing Language
• If we do not reject the null hypothesis
• We do not conclude it is true
• We can only recognize the null hypothesis is a
possibility
• Showing the null hypothesis is true is not the same
thing as failing to reject it
• Failing to find an effect is not the same thing as
showing there is no effect
Hypothesis Testing Language
• These two errors are given specific names:
• Type I Error
• We draw a conclusion that H0 is false when it is in fact
true
• Occurs when we conclude there is an effect in our
population, when in fact there is not
• Type II Error
• We fail to reject H0 when it is in fact
false
• The probability of a type II error is denoted by the
Greek letter β ('beta')
• In other words, P(Type II Error)= β
Errors of Inference
• We draw a conclusion that H0 is false when it is in fact
true
• Occurs when we conclude there is an effect in our
population, when in fact there is not
• Example 1: Incorrectly conclude an HIV prevention
intervention is effective (Reject H0, conclude HA is true) in
preventing HIV infections through behavioral changes
when, in fact, it is not (H0 is actually true)
• Example 2: Incorrectly conclude a new chemo agent is
effective (Reject H0, conclude HA is true) in reducing
tumor size when, in fact, it is not (H0 is actually true)
Type I Error
• We fail to reject H0 when it is in fact
false
• The probability of a type II error is denoted by the Greek
letter β ('beta')
• Example 1: Incorrectly conclude an HIV prevention
intervention is not effective (Fail to reject H0, conclude HA
is false) in preventing HIV infections through behavioral
changes when, in fact, it is (HA is actually true)
• Example 2: Incorrectly conclude a new chemo agent is
not effective (Fail to reject H0, conclude HA is false) in
reducing tumor size when, in fact, it is (HA is actually
true)
Type II Error
Decision | Null Hypothesis True | Null Hypothesis False
Fail to Reject H0 | Correct | Type II Error (False Negative)
Reject H0 | Type I Error (False Positive) | Correct
• False Positive: We reject H0 and conclude there is an
effect (Positive; Alternative Hypothesis) when, in fact,
there is not an effect (False Conclusion)
• False Negative: We fail to reject H0 and conclude there is
not an effect (Negative; Null Hypothesis) when, in fact,
there is an effect (False Conclusion)
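Both error rates can be estimated by simulation. The sketch below is illustrative only: it assumes a two-tailed z-test with a known standard deviation, and the settings (`mu0`, `sigma`, `n`, `alpha`, and the true mean of 22.5 used for the Type II case) are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)                               # fixed seed so the sketch is repeatable
mu0, sigma, n, alpha = 21.0, 3.0, 25, 0.05   # hypothetical settings

def rejects_h0(true_mean):
    """Draw one sample and run a two-tailed z-test of H0: mu = mu0."""
    sample_mean = sum(random.gauss(true_mean, sigma) for _ in range(n)) / n
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

trials = 4_000
# Type I error: H0 is true (the mean really is mu0) but we reject it.
type_1_rate = sum(rejects_h0(mu0) for _ in range(trials)) / trials
# Type II error: H0 is false (the mean is 22.5) but we fail to reject it.
type_2_rate = sum(not rejects_h0(22.5) for _ in range(trials)) / trials
print(f"Type I rate = {type_1_rate:.3f}, Type II rate = {type_2_rate:.3f}")
```

The Type I rate should sit near the chosen α, while the Type II rate (β) depends on the true effect and the sample size.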
Type I and Type II Errors
• This describes the process of using the p-value to
make a decision
• We compare the calculated p-value from Step 6 to
the predetermined significance level (α) from Step 3
• If p < α, we reject H0 and conclude the result is
statistically significant
• If p ≥ α, we fail to reject H0, and conclude we have
failed to find statistical significance
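The decision rule can be written as a small function. This is a minimal sketch; `significance_test` is a name made up for illustration, and the default α of 0.05 is just one of the commonly used levels.

```python
def significance_test(p_value, alpha=0.05):
    """Compare a p-value with a pre-specified significance level."""
    if not 0 <= p_value <= 1 or not 0 < alpha < 1:
        raise ValueError("p_value must be in [0, 1] and alpha in (0, 1)")
    if p_value < alpha:
        return "reject H0 (statistically significant)"
    return "fail to reject H0 (not statistically significant)"

print(significance_test(0.04))          # compared with alpha = 0.05
print(significance_test(0.04, 0.01))    # stricter alpha chosen a priori
```

The same p-value leads to different decisions under different α levels, which is why α has to be fixed before the data are analyzed.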
Significance Testing
• Essential to understand the difference between
statistical and clinical significance
• Statistical Significance - the observed difference would be
unlikely to occur by chance alone if H0 were true (p < α)
• Clinical Significance - the smallest clinically
beneficial and harmful values of the effect; in other
words, the smallest values that matter to the patient
Conclusion: Significance
• A research article gives a p-value of .001 in the analysis
section. Which definition of a p-value is the most
accurate?
a. the probability that the observed outcome will occur again.
b. the probability of observing an outcome as extreme or
more extreme than the one observed if the null hypothesis
is true.
c. the value that an observed outcome must reach in order to
be considered significant under the null hypothesis.
d. the probability that the null hypothesis is true.
Question
Reference:
Dr. Matt Hayat
Statistics Lecture – Rutgers University 2013
Plichta SB, Kelvin E. (2012) Munro’s Statistical
Methods for Healthcare Research, 6th Edition