SUMMARY
Hypothesis testing
Self-engagement assessment
μ = 7.8
σ = 0.76
Null hypothesis
[Figure: engagement-score distributions labelled "song" and "no song"]
Null hypothesis: I assume that the populations without and with the song are the same.
At the beginning of our calculations, we assume the null hypothesis is true.
Hypothesis testing song
• population: μ = 7.8, σ = 0.76
• sample: n = 30, x̄ = 8.2
$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{8.2 - 7.8}{0.76 / \sqrt{30}} = 2.85$
The corresponding probability is 0.0022.
Because of such a low probability, we interpret 8.2 as a significant increase over 7.8, caused by the undeniable pedagogical qualities of the 'Hypothesis testing song'.
[Figure: sampling distribution with 7.8 and 8.2 marked on the horizontal axis]
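The Z calculation above can be reproduced with a minimal Python sketch. The function names are illustrative and not part of the original slides, and only the standard library is assumed. With the rounded inputs shown on the slide, direct evaluation gives a Z of about 2.88 and a tail probability of about 0.002, so the slide's 2.85 and 0.0022 were presumably computed from unrounded values.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """Z = (sample mean - population mean) / (population s.d. / sqrt(n))."""
    standard_error = pop_sd / sqrt(n)
    return (sample_mean - pop_mean) / standard_error


def upper_tail_probability(z):
    """P(Z >= z) under the standard normal distribution."""
    return 1.0 - NormalDist().cdf(z)


# Values taken from the slide: population mu = 7.8, sigma = 0.76; sample n = 30, mean = 8.2
z = z_statistic(sample_mean=8.2, pop_mean=7.8, pop_sd=0.76, n=30)
p = upper_tail_probability(z)
print(f"Z = {z:.2f}, one-tailed p = {p:.4f}")
```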
Four steps of hypothesis testing
1. Formulate the null and the alternative hypothesis (this includes choosing a one- or two-directional test).
2. Select the significance level α, the criterion by which we decide whether or not to reject the null hypothesis.
--- COLLECT DATA ---
3. Compute the p-value. The p-value is the probability that the data would be at least as extreme as those observed, if the null hypothesis were true.
4. Compare the p-value to the α-level. If p ≤ α, the observed effect is statistically significant and the null is rejected in favor of the alternative hypothesis.
One-tailed and two-tailed
one-tailed (directional) test
two-tailed (non-directional) test
Z-critical value,
what is it?
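As a rough illustration of the question above: the Z-critical value is the point on the standard normal curve that cuts off a tail area of α (one-tailed) or α/2 in each tail (two-tailed). The sketch below is an assumption-laden example, not part of the original slides; `z_critical` is a hypothetical helper built on the standard library's `NormalDist`.

```python
from statistics import NormalDist


def z_critical(alpha, two_tailed):
    """Return the positive critical Z value for significance level alpha."""
    # A two-tailed test splits alpha evenly between both tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1.0 - tail_area)


one_tailed = z_critical(0.05, two_tailed=False)
two_tailed = z_critical(0.05, two_tailed=True)
print(f"one-tailed, alpha = 0.05: {one_tailed:.3f}")
print(f"two-tailed, alpha = 0.05: +/-{two_tailed:.3f}")
```

An observed Z beyond the critical value falls in the rejection region, which is equivalent to p ≤ α.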
NEW STUFF
Decision errors
• Hypothesis testing is prone to misinterpretation.
• It's possible that the students selected for the musical lesson were already more engaged,
• and that we wrongly attributed the high engagement score to the song.
• Of course, a sample with a mean engagement of 8.2 is unlikely to be selected just by chance. The probability of doing so is 0.0022, which is pretty low. That is why we concluded that chance is an unlikely explanation.
• But it's still possible to have randomly obtained a sample with such a mean.
Four possible things can happen
                   State of the world
Decision           H0 true              H0 false
Reject H0          1 (Type I error)     2 (correct)
Retain H0          3 (correct)          4 (Type II error)
In which cases did we make a wrong decision? In cases 1 and 4.
Type I error
• When there really is no difference between the populations, random sampling can lead to a difference large enough to be statistically significant.
• You reject the null, but you shouldn't.
• False positive: the person doesn't have the disease, but the test says they do.
Type II error
• When there really is a difference between the populations, random sampling can lead to a difference small enough to be not statistically significant.
• You do not reject the null, but you should.
• False negative: the person has the disease, but the test doesn't pick it up.
• Type I and II errors are theoretical concepts. When you analyze your data, you don't know if the populations are identical. You only know the data in your particular samples. You will never know whether you made one of these errors.
The trade-off
• If you set the α level to a very low value, you will make few Type I errors.
• But by reducing the α level you also increase the chance of a Type II error.
• Your decision whether to allow more false positives (Type I errors) or more false negatives (Type II errors) is based on the practical consequences of these errors.
• See next example
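The trade-off can be made concrete with a small sketch for a one-tailed Z-test. The population values μ0 = 7.8, σ = 0.76 and n = 30 come from the song example. The "true" mean of 8.0 is purely a hypothetical assumption chosen for illustration, and `type_ii_error_rate` is an illustrative name rather than anything from the slides.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_rate(alpha, mu0, true_mu, sigma, n):
    """Probability of retaining H0 in an upper-tailed Z-test when the true mean is true_mu."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)
    # Under the true mean, the Z statistic is normal with this mean and s.d. 1.
    shift = (true_mu - mu0) / (sigma / sqrt(n))
    # H0 is retained whenever Z falls below the critical value.
    return std_normal.cdf(z_crit - shift)


for alpha in (0.10, 0.05, 0.01, 0.001):
    beta = type_ii_error_rate(alpha, mu0=7.8, true_mu=8.0, sigma=0.76, n=30)
    print(f"alpha = {alpha:<5}  beta = {beta:.3f}")
```

Under these assumptions, each reduction in α (fewer false positives) is paid for with a larger β (more false negatives).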
Clinical trial for a novel drug
• A drug that should treat a disease for which there exists no therapy.
• If the result is statistically significant, the drug will be marketed.
• If the result is not statistically significant, work on the drug will cease.
• Type I error: treat future patients with an ineffective drug.
• Type II error: cancel the development of a functional drug for a condition that is currently not treatable.
• Which error is worse?
• I would say Type II error. To reduce its risk, it makes sense to set α = 0.10 or even higher.
Harvey Motulsky, Intuitive Biostatistics
Clinical trial for a me-too drug
• Similar to the previous example, but now our prospective drug should treat a disease for which there already exists another therapy.
• Type I error: treat future patients with an ineffective drug.
• Type II error: cancel the development of a functional drug for a condition that can be treated adequately with existing drugs.
• Thinking scientifically (not commercially), I would minimize the risk of a Type I error (set α to a very low value).
Harvey Motulsky, Intuitive Biostatistics
[Diagram: population of students that did not attend the musical lesson, whose parameters μ0 and σ0 are known; population of students that did attend the musical lesson, whose parameters μ and σ are unknown; the sample statistic x̄ is known]
Test statistic
$Z = \frac{\bar{x} - \mu_0}{\sigma_0 / \sqrt{n}}$
Z-test
We use the Z-test if we know the population mean μ0 and the population s.d. σ0.
New situation
• The average engagement score in a population of 100 students is 7.5.
• A sample of 50 students was exposed to the musical lesson. Their mean engagement score was 7.72, with an s.d. of 0.6.
• DECISION: Does a musical performance lead to a change in the students' engagement? Answer YES/NO.
• Set up a hypothesis test, please.
Hypothesis test
• H0: μ0 = μ
• H1: μ0 ≠ μ
• In this case a two-sided test is the only appropriate way to test the null. You compare the sample mean of 7.72 with the population mean of 7.5. The sample mean appears larger than the population mean (7.72 > 7.5), but the sample s.d. is 0.6. You can't set up a one-tailed test, because you can't guess the correct direction of the relationship in advance. Actually, you could very easily miss the correct direction.
• α = 0.05
Formulate the test statistic
$Z = \frac{\bar{x} - \mu_0}{\sigma_0 / \sqrt{n}}$, but σ0 is unknown!
[Diagram: population of students that did not attend the musical lesson (μ0 known, σ0 unknown); population of students that did attend the musical lesson (μ and σ unknown), with the sample statistics x̄ and s]
• Instead of σ0, we only know the sample s.d.
• We can use it as the point estimate of the population s.d.
• However, this estimates the s.d. of the population exposed to the musical lesson, whereas σ0 in the above formula is for the "unperturbed" population.
• In this case, it is common to assume that both populations have the same standard deviation.
t-statistic
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
one-sample t-test
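Applied to the "New situation" numbers (μ0 = 7.5, x̄ = 7.72, s = 0.6, n = 50), the t-statistic can be sketched as below. The helper name is illustrative. The critical value of about 2.01 for 49 degrees of freedom at two-tailed α = 0.05 is an assumption taken from a standard t-table, not from the slides.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, sample_sd, n):
    """t = (sample mean - hypothesized mean) / (sample s.d. / sqrt(n))."""
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - mu0) / standard_error


t = t_statistic(sample_mean=7.72, mu0=7.5, sample_sd=0.6, n=50)
degrees_of_freedom = 50 - 1

# Assumed value from a t-table (two-tailed alpha = 0.05, d.f. = 49); not given in the slides.
T_CRITICAL = 2.01
reject_null = abs(t) > T_CRITICAL

print(f"t = {t:.2f}, d.f. = {degrees_of_freedom}, reject H0: {reject_null}")
```

Under these assumptions the statistic exceeds the critical value, which would point to answering YES to the decision question.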
Choose the correct alternative in the following statements:
1. The larger/smaller the value of x̄, the stronger the evidence that μ > μ0.
2. The larger/smaller the value of x̄, the stronger the evidence that μ < μ0.
3. The further the value of x̄ from μ0 in either direction, the stronger/weaker the evidence that μ ≠ μ0.
One-sample t-test
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
H0: μ = μ0
HA: μ < μ0, or μ > μ0, or μ ≠ μ0
α level
Quiz
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
• What will increase the t-statistic? Check all that apply.
1. A larger difference between x̄ and μ0.
2. Larger s.
3. Larger n.
4. Larger standard error.
Z-test vs. t-test
• Use the Z-test if
  • you know the standard deviation of the population, or
  • you know the sample s.d. s AND you have a large sample size (traditionally over 30). In addition, you assume that the population standard deviation is the same as the sample standard deviation.
• Use the t-test if
  • you don't know the population standard deviation (you know only the sample standard deviation s) and have a relatively small sample size.
• Tip: If you know only the sample standard deviation, always use the t-test.
Typical example of a one-sample t-test
• You have to prepare 20 tubes with a 30% solution of NaCl. When you're finished, you measure the strength of the 20 solutions. The mean strength is 31.5%, with an s.d. of 1.15%.
• Decide whether or not you have a 30% solution.
• μ0 = 30%
• H0: μ = 30%, H1: μ ≠ 30%
• You use the t-test in such a situation: t-value = 0.58, so we fail to reject the null.
• Often, you use the one-sample t-test if you have a specific value you want to compare against.
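The NaCl numbers can be checked with a short sketch (the helper name is illustrative). With the figures as quoted (mean 31.5, s.d. 1.15, n = 20), the formula evaluates to roughly 5.8 rather than 0.58. A value of 0.58 would follow from an s.d. of 11.5, so one of the quoted figures may contain a typo. The critical value 2.093 for 19 degrees of freedom at two-tailed α = 0.05 is an assumption from a standard t-table.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, sample_sd, n):
    """One-sample t-statistic: (mean - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (sample_sd / sqrt(n))


# Figures exactly as quoted in the example.
t_as_quoted = t_statistic(sample_mean=31.5, mu0=30.0, sample_sd=1.15, n=20)

# Hypothetical variant: the s.d. that would reproduce the quoted t-value of 0.58.
t_if_sd_is_11_5 = t_statistic(sample_mean=31.5, mu0=30.0, sample_sd=11.5, n=20)

# Assumed t-table value (two-tailed alpha = 0.05, d.f. = 19); not given in the slides.
T_CRITICAL = 2.093

for label, t in (("s = 1.15", t_as_quoted), ("s = 11.5", t_if_sd_is_11_5)):
    decision = "reject H0" if abs(t) > T_CRITICAL else "fail to reject H0"
    print(f"{label}: t = {t:.2f} -> {decision}")
```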
Two-sample t-test
• So far, we have been working with just one sample.
• Now, we want to compare the sample of students without the song with the sample of students with the song.
[Diagram: population of students that did not attend the musical lesson, with unknown μ0 and σ0 and known sample statistics x̄0 and s0; population of students that did attend the musical lesson, with unknown μ and σ and known sample statistics x̄ and s]
Two-sample t-test
• There exist several t-tests, depending on the setup of your experiment:
  • dependent samples
    • paired t-test
  • independent samples
    • equal variances, σ0 ≈ σ
    • unequal variances, σ0 ≠ σ
• They share the same logic, which will be explained using the paired t-test.
• However, they differ in how the standard error is calculated.
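To illustrate how the standard error differs between the variants, here is a minimal sketch using the textbook formulas: pooled variance for equal variances, the Welch form for unequal variances, and the s.d. of the differences for paired samples. The slides do not spell these formulas out, so treat the function names and the example numbers as assumptions.

```python
from math import sqrt
from statistics import stdev


def se_paired(differences):
    """Paired samples: s.d. of the per-subject differences divided by sqrt(n)."""
    return stdev(differences) / sqrt(len(differences))


def se_pooled(s0, n0, s1, n1):
    """Independent samples, equal variances: pooled-variance standard error."""
    pooled_var = ((n0 - 1) * s0**2 + (n1 - 1) * s1**2) / (n0 + n1 - 2)
    return sqrt(pooled_var * (1 / n0 + 1 / n1))


def se_unequal(s0, n0, s1, n1):
    """Independent samples, unequal variances (Welch form)."""
    return sqrt(s0**2 / n0 + s1**2 / n1)


# Hypothetical sample summaries, chosen only for illustration.
print(f"pooled:  {se_pooled(s0=0.8, n0=20, s1=1.4, n1=35):.4f}")
print(f"unequal: {se_unequal(s0=0.8, n0=20, s1=1.4, n1=35):.4f}")
print(f"paired:  {se_paired([2, 1, 2, 1]):.4f}")
```

In every variant the t-statistic is then the difference of the means divided by the corresponding standard error.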
Dependent t-test for paired samples
• Two samples are dependent when the same subject takes the test twice. For example, expose one person to two different conditions to see how he/she reacts.
• paired t-test
• Examples:
  • Each subject is assigned to two different conditions.
    • Give a person two types of treatment.
  • Growth over time.
    • Compare the effect of the treatment (drug) after 12 hours and after 24 hours.
Engagement example
• 25 students attended the normal lesson. Their mean engagement is x̄0 = 5.08.
• The same 25 students then heard the "Hypothesis testing song". Their mean engagement score is x̄ = 7.80.
• H0: μ0 = μ
But this is equivalent to stating H0: μ − μ0 = 0.
All dependent t-tests articulate H0 in this form.
• Ha: μ0 ≠ μ
Test statistic
• H0: μ0 − μ = 0
• Now, we will formulate the test statistic.
• Test statistic of the one-sample t-test:
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
• The paired t-test is actually a one-sample t-test in which we test x̄_D, the mean of the differences D, against zero:
$t = \frac{\bar{x}_D - \mu_0}{s / \sqrt{n}} = \frac{\bar{x}_D - 0}{s / \sqrt{n}} = \frac{\bar{x}_D}{s / \sqrt{n}} = \frac{\bar{x} - \bar{x}_0}{s / \sqrt{n}}$
Here s is the standard deviation of the differences D.
Paired t-test
• Because we have a table of paired samples, we can easily calculate s.
• Let's say that s = 3.69.
• The t-statistic: $t = \frac{\bar{x} - \bar{x}_0}{s / \sqrt{n}} = \frac{7.8 - 5.08}{3.69 / \sqrt{25}} \approx 3.69$
• Critical values for d.f. = n − 1 = 24 and two-tailed α = 0.05 are ±2.064.
• We reject the null.
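The paired calculation can be sketched in Python as follows. The summary numbers (7.80, 5.08, s = 3.69, n = 25, critical value 2.064) come from the example. The helper `paired_t_statistic` and the four pairs of raw scores are hypothetical, included only to show that the paired test is a one-sample test on the differences.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Paired t: one-sample t-statistic of the differences tested against zero."""
    differences = [a - b for a, b in zip(after, before)]
    return mean(differences) / (stdev(differences) / sqrt(len(differences)))


# Summary statistics from the engagement example.
mean_difference = 7.80 - 5.08
sd_of_differences = 3.69
n = 25
t = mean_difference / (sd_of_differences / sqrt(n))

T_CRITICAL = 2.064  # two-tailed alpha = 0.05, d.f. = 24, as quoted in the example
reject_null = abs(t) > T_CRITICAL
print(f"t = {t:.4f}, reject H0: {reject_null}")

# Hypothetical raw scores for four students, only to exercise the helper.
t_raw = paired_t_statistic(before=[5, 6, 4, 7], after=[7, 7, 6, 8])
print(f"t from raw pairs = {t_raw:.3f}")
```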
Dependent samples
• Advantages
  • we can use fewer subjects
  • cost-effective
  • less time-consuming
• Disadvantages
  • carry-over effects
  • order may influence results