Hypothesis Testing
Week 9
Objectives
On completion of this module you should be able to:
• understand and demonstrate the process required for
hypothesis testing,
• explain the difference between one- and two-tailed
tests,
• perform a hypothesis test for the mean and proportion
and
• consider ethical issues relating to hypothesis testing
Hypothesis testing methodology
• We test a hypothesis (a theory, claim, assertion
etc) about a parameter of a population (the
mean or proportion).
• The null hypothesis (denoted H0) is used to
indicate the status quo.
• It always contains the equals sign.
• For example, if we assume that the mean
income of accountants is $75,000 the null
hypothesis is
H0: μ = 75000
• The alternative hypothesis (denoted H1 or Ha)
must cover all cases where the null hypothesis
is false.
• It represents the conclusion reached if the null
hypothesis is found to be false (based on
sample information).
• In our mean income of accountants example
the alternative hypothesis is
H1: μ ≠ 75000
• We reject the null hypothesis when there is
evidence in sample data that the alternative is
far more likely to be true.
• Failing to reject the null hypothesis does not
prove that it is true, but rather that there is
insufficient evidence in the sample to prove
that it is not true.
• Because the conclusion is based only on a
sample, we never prove that the null
hypothesis is true.
• We reject (or fail to reject) the null hypothesis
based on a test statistic (found using sample
data) and on rejection regions.

[Figure: sampling distribution showing a central region of nonrejection bounded by two critical values, with a region of rejection in each tail]
Risks in decision making
• Type I error – rejecting the null hypothesis when
it should not be rejected. The probability of a
type I error is α.
• Type II error – not rejecting the null hypothesis
when it should be rejected. The probability of a
type II error is β.
• Confidence coefficient – probability of not
rejecting the null hypothesis when it should not
be rejected: 1 – α.
• Power of a test – probability of rejecting null
hypothesis when it should be rejected: 1 – β.
Example 9-1
In the Australian legal system, the accused is
considered innocent until proven guilty.
Using this information, state the null and
alternative hypotheses and discuss the type I and
II errors (which should be the larger value, which
should be the smaller value and why?).
Solution
• We assume innocence so this forms the null
hypothesis.
• We try to prove guilt, so this forms the
alternative hypothesis.
• So the hypotheses are:
H0: the accused is innocent
H1 : the accused is guilty
• Type I and type II errors can best be understood
with a picture…
                      Verdict: Guilty       Verdict: Not guilty
Truth: Guilty         Acceptable outcome    Type II error
Truth: Not guilty     Type I error          Acceptable outcome
• We want to minimise all errors and so keep both
α and β small.
• But, for a fixed sample size, they are inversely related – as one
increases, the other decreases.
• Australian society normally demands minimising
type I errors since we are less tolerant of
convicting innocent people.
• Consequence: a (hopefully small) portion of
accused will be found not guilty when they are
guilty.
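The trade-off between α and β can be illustrated numerically. The sketch below is an illustration only, not part of the legal example: it assumes an upper-tail Z test of H0: μ ≤ 24 with σ = 2, n = 30 and a hypothetical true mean of 25, and the function name `type_ii_error` is invented for this sketch. It shows β growing as α is made smaller while the sample size stays fixed.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution


def type_ii_error(alpha, mu0, mu_true, sigma, n):
    """Beta for an upper-tail Z test of H0: mu <= mu0 when the true mean is mu_true."""
    std_error = sigma / sqrt(n)
    # H0 is rejected when the sample mean exceeds this cut-off.
    cutoff = mu0 + std_normal.inv_cdf(1 - alpha) * std_error
    # Beta = probability the sample mean stays below the cut-off although H0 is false.
    return NormalDist(mu_true, std_error).cdf(cutoff)


# Hypothetical values chosen only for illustration.
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, mu0=24, mu_true=25, sigma=2, n=30)
    print(f"alpha = {alpha:.2f}  beta = {beta:.3f}  power = {1 - beta:.3f}")
```

Each printed row has a smaller α and a larger β than the row before it, which is the inverse relationship described above.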
Z test of hypothesis for the mean (σ known)
• If the population standard deviation, σ, is
known, for large enough samples, the sampling
distribution of the mean follows the normal
distribution.
• Then, the test statistic is given by:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
Example 9-2
A recent graduate from a business degree is
considering the benefits of working for various
companies.
A particular company, Touccancy Inc., has a
reputation for treating its employees well.
In fact, the company’s website gives some
statistics about starting salaries for recent
graduates.
It claims (in large print) that the mean starting
salary for recent graduates is $45,000.
In much smaller print, buried in a report on
statistics, the website states that the known
standard deviation of starting salaries is $5000
and that in a random sample of fifty recently
employed graduates, the mean salary was
$39,000.
(a) At the 5% level of significance, determine if
there is any evidence that the claim given in
large print is valid, based on the sample data.
Use both the critical value and p-value
approaches as part of your answer.
Solution 9-2
Following Exhibit 8.2 from the text (p. 339):
1. The null hypothesis is:
H0: μ = 45000
2. and the alternative hypothesis is:
H1: μ ≠ 45000
3. Level of significance is 5%, so α = 0.05.
4. n = 50 (fifty recently employed graduates
were sampled).
5. Because σ = 5000 is known, we can use the Z
test.
The critical value approach
• The hypothesis test in Example 9-2 is a two-tailed test.
• We will reject the null hypothesis if the sample
mean is significantly different from $45000.
• This could be if it is too big, or too small (hence
the two tails).
• To find the critical regions for a two-tailed test,
we divide the level of significance into two
equal parts.
16
The critical value approach

2
= 0.025
– 1.96
Rejection
region
Critical
value
0.95

2
0
Acceptance
region
45000
= 0.025
+ 1.96
Z
Rejection
region
Critical
value
X
17
Solution 9-2
6. From this graph we can see that the decision
rule is:
Reject H0 if Z > +1.96 or if Z < –1.96.
Z is the test statistic obtained from sample
data.
7. Given that $\bar{X} = 39000$, the test statistic is:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{39000 - 45000}{5000 / \sqrt{50}} = -8.49$$
8. Since –8.49 < –1.96, the test statistic falls in
the rejection region.
9. We reject the null hypothesis.
10. We can conclude that the mean starting salary
is significantly different from $45,000.
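The critical value approach for Example 9-2 can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `z_statistic` and `two_tailed_decision` are invented for illustration, and the critical value is taken from the exact normal quantile rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, mu0, sigma, n):
    """Z test statistic for the mean when the population sigma is known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


def two_tailed_decision(z, alpha):
    """Return the upper critical value and whether H0 is rejected (two-tailed test)."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return critical, abs(z) > critical


# Figures from Example 9-2.
z = z_statistic(sample_mean=39000, mu0=45000, sigma=5000, n=50)
critical, reject = two_tailed_decision(z, alpha=0.05)
print(f"Z = {z:.2f}, critical values = ±{critical:.2f}, reject H0: {reject}")
# prints: Z = -8.49, critical values = ±1.96, reject H0: True
```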
10 steps of hypothesis testing
1. State the null hypothesis.
2. State the alternative hypothesis.
3. Choose the level of significance.
4. Find the sample size.
5. Determine the appropriate test statistic.
6. Set up critical values and define rejection
region.
7. Compute test statistic.
8. Compare test statistic to rejection region.
9. Make statistical decision.
10. Express statistical decision in the context of
the problem.
Example 9-2
• Using the p-value approach, steps one to five
are the same as for the critical value
approach.
6. The test statistic is computed as before: Z = –8.49.
7. The p-value is the probability of being more
extreme than the test statistic:
$$P(Z < -8.49) + P(Z > 8.49) \approx 0$$
8. Determine if the p-value is less than α (in this
case 0 < 0.05).
9. Since the p-value is less than α, reject the null
hypothesis.
10. Conclusion is as before…
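The two-tailed p-value for Example 9-2 can be computed directly. The sketch below assumes the standard library's `NormalDist` as a stand-in for normal tables; `two_tailed_p_value` is a name made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_p_value(z):
    """P(Z < -|z|) + P(Z > |z|) under the standard normal distribution."""
    return 2 * NormalDist().cdf(-abs(z))


z = (39000 - 45000) / (5000 / sqrt(50))  # test statistic from Example 9-2
p_value = two_tailed_p_value(z)
alpha = 0.05
print(f"p-value = {p_value:.3g}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

The p-value is so small that it is effectively zero, which matches the slide.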
10 steps of hypothesis testing: p-value approach
1. State the null hypothesis.
2. State the alternative hypothesis.
3. Choose the level of significance.
4. Find the sample size.
5. Determine the appropriate test statistic.
6. Compute sample value of test statistic.
7. Compute the p-value based on the test statistic.
8. Compare p-value to α.
9. Make statistical decision.
10. Express statistical decision in the context of the
problem.
Example 9-2
(b) Determine the 95% confidence interval for the
population mean starting salary and compare
this to your answer to (a).
Solution
• The confidence interval is given by:
$$\bar{X} \pm Z \frac{\sigma}{\sqrt{n}} = 39000 \pm 1.96 \times \frac{5000}{\sqrt{50}} = 39000 \pm 1385.93$$
$$\$37{,}614.07 \le \mu \le \$40{,}385.93$$
• The confidence interval does not contain the
company’s claimed mean starting salary.
• It provides further evidence that the company’s
claim is incorrect based on this sample.
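The interval in part (b) can be reproduced as follows. This is a sketch under the same assumptions as the example (σ known, Z-based interval); `z_confidence_interval` is a hypothetical helper name. Because it uses the exact quantile 1.95996 rather than the rounded 1.96, the limits differ from the slide by a few cents.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Figures from Example 9-2.
lower, upper = z_confidence_interval(sample_mean=39000, sigma=5000, n=50)
claimed_mean = 45000
print(f"95% CI: ${lower:,.2f} to ${upper:,.2f}")
print("Claim inside interval:", lower <= claimed_mean <= upper)
```

A claimed mean that falls outside the 95% interval corresponds to rejecting H0 in the two-tailed test at α = 0.05.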
One-tail tests
• With one-tail tests, the hypotheses focus on a
particular direction.
• The null hypothesis is rejected only if the test
statistic is significantly large or significantly
small (depending on the hypothesis).
• The rejection region is only in one tail of the
distribution.
Example 9-3
An investment advisory company has been having
problems with the printery which is responsible
for the printing of the company’s weekly stock
reports.
The company requires that the printing of the
report be completed within twenty-four hours of
receipt of the necessary files and documentation
and specifies a standard deviation of two hours.
They are prepared to employ a different printery
if they can establish statistically that the printery
is not meeting their requirements.
A sample of thirty recent printing times for
reports has a mean printing time of twenty-five
hours.
(a) If the company tests the hypothesis at the 1%
level of significance, what decision would be
made using the p-value approach to hypothesis
testing?
Interpret the meaning of the p-value in this
problem.
Solution 9-3
Following the ten-step procedure…
1. H0: μ ≤ 24
We assume the printery is doing okay
and try to prove otherwise.
2. H1: μ > 24
Note: this is what we are trying to prove.
3. α = 0.01
4. n = 30
5. σ = 2 is known so use Z-test.
6. Given $\bar{X} = 25$, the test statistic is:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{25 - 24}{2 / \sqrt{30}} = 2.74$$
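7. The p-value is:
$$P(Z > 2.74) = 1 - 0.9969 = 0.0031$$
8. 0.0031 < 0.01
9. Since the p-value is less
than α, we reject the null
hypothesis.
[Figure: standard normal curve for the upper-tail test. The acceptance (nonrejection) region has area 0.99; the rejection region in the upper tail has area α = 0.01.]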
10. We conclude that there is evidence that the
printery is taking more than the company’s
required 24 hours.
Based on this sample data, the company
appears to be justified in seeking another
printery for their reports.
(b) How would your answer in (a) change if the
standard deviation had been three hours?
Solution 9-3
• If σ = 3, the test statistic would be:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{25 - 24}{3 / \sqrt{30}} = 1.83$$
• and the p-value would be:
$$P(Z > 1.83) = 1 - 0.9664 = 0.0336$$
• Then, since 0.0336 > 0.01 we would not reject the
null hypothesis at the 1% level of significance.
• In this case, the company would not be justified
in seeking another printery.
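Parts (a) and (b) of Example 9-3 differ only in σ, so both can be run through the same upper-tail calculation. The sketch below uses the standard library; `upper_tail_p_value` is a name invented for this illustration. It works with the unrounded Z, so the p-value for σ = 3 comes out slightly above the table-based 0.0336, without changing the decision.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_p_value(sample_mean, mu0, sigma, n):
    """Z statistic and p-value for H1: mu > mu0 when sigma is known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return z, 1 - NormalDist().cdf(z)


alpha = 0.01
for sigma in (2, 3):  # parts (a) and (b) of Example 9-3
    z, p = upper_tail_p_value(sample_mean=25, mu0=24, sigma=sigma, n=30)
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"sigma = {sigma}: Z = {z:.2f}, p-value = {p:.4f}, {decision}")
```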
• Important note: we could still have rejected
this null hypothesis at the 5% level of
significance (since 0.0336 < 0.05).
• When a test is significant at the 1% level it is
described as highly significant.
• When a test is significant at the 5% level it is
described as significant.
• A decision such as this one can therefore be
based on how confident the company wanted to
be that they were making the right decision
(based on this sample data).
t test of hypothesis for the mean (σ unknown)
• If the population standard deviation, σ, is
unknown, we estimate it using S, the sample
standard deviation.
• If the population is assumed to be normally
distributed, the sampling distribution of the
mean follows a t distribution with n-1 degrees
of freedom.
• Then, the test statistic is given by:
$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
Example 9-4
A firm of accountants has been established in a
small regional town for twenty years.
Part of their service includes regular visits to
the firms they service to ensure they keep a
customer service focus.
The accountants must still be contactable while
they are away from their desks, so mobile
phones are provided.
They have discovered, however, that the useful
life of the phones is very much reduced by the
shortness of the phone’s battery life.
In the past, the batteries gave an average talk
time of twenty hours, after which time the
phone needed recharging.
A sample of forty batteries recently revealed an
average talk time of eighteen hours, with a
standard deviation of four hours.
(a) At the 0.05 significance level, is there
evidence that the mean talk time of the
batteries has changed from twenty hours?
(b) What assumptions did you make regarding the
population distribution in answering (a)?
Explain how you would test these assumptions
if you had the talk time data.
Solution 9-4
Following the 10 steps of hypothesis testing:
1. and 2. H0: μ ≥ 20
H1: μ < 20
3. Given α = 0.05.
4. A random sample of n = 40 batteries has been
drawn.
5. If we assume the population of talk times of the
batteries is normally distributed, then the t test
is appropriate since the population standard
deviation of the talk times is unknown (we’re
given the sample value).
6. Given a sample size of n, the test statistic follows
a t distribution with n-1 degrees of freedom.
At α = 0.05, the critical value will be
t39,0.05 = –1.6849 and so the rejection rule is:
Reject H0 if t < –1.6849, otherwise do not reject
H0.
[Figure: t distribution for the lower-tail test. The region of rejection (area 0.05) lies to the left of the critical value –1.6849; the region of non-rejection (area 0.95) lies to the right.]
7. The test statistic is
$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}} = \frac{18 - 20}{4 / \sqrt{40}} = -3.16$$
Note that we are using the critical value
approach for this hypothesis test. Check the
p-value approach for yourself.
8. Since -3.16 < -1.6849, the test statistic is in
the rejection region.
9. Reject the null hypothesis.
10. The data provides sufficient evidence to
conclude that the mean talk time provided by
the batteries is significantly less than 20
hours.
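The t test in Example 9-4 can be sketched in the same way. Python's standard library has no t distribution, so this illustration simply reuses the critical value –1.6849 quoted in the solution; `t_statistic` is a hypothetical helper name.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, sample_sd, n):
    """t test statistic for the mean when sigma is estimated by the sample S."""
    return (sample_mean - mu0) / (sample_sd / sqrt(n))


# Figures from Example 9-4.
n = 40
t = t_statistic(sample_mean=18, mu0=20, sample_sd=4, n=n)
degrees_of_freedom = n - 1
critical_value = -1.6849  # lower-tail value for 39 df at alpha = 0.05, as quoted in the solution
reject = t < critical_value
print(f"t = {t:.2f} with {degrees_of_freedom} df, reject H0: {reject}")
# prints: t = -3.16 with 39 df, reject H0: True
```

If SciPy happens to be installed, `scipy.stats.t.ppf(0.05, 39)` should give the same critical value and `scipy.stats.t.cdf(t, 39)` the lower-tail p-value; that is an optional extra, not something the slides rely on.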
(b) Assumed – random sample of talk times of
batteries comes from a normally distributed
population.
If sample size is reasonably large and population
is not too skewed, then (for unknown σ) the t
distribution is a good approximation to the
sampling distribution of the mean.
We test the assumption of normality by
examining the sample data. If it appears
normal we infer that the population is likely to
be normally distributed.
We can test the assumption of normality by
producing:
• a histogram or stem-and-leaf plot and checking
for the bell shape
• a normal probability plot and checking for
departures from the straight line
• a box-and-whisker plot and checking for
symmetry
Z test of hypothesis for the proportion
• We often want to test hypotheses regarding the
population proportion using the sample
proportion ps=X/n.
• If the number of successes (X) and the number
of failures (n-X) are each at least five, the
sampling distribution of the proportion
approximately follows the standardised normal
distribution.
• The test statistic is
$$Z = \frac{p_S - p}{\sqrt{\frac{p(1 - p)}{n}}}$$
• If we were to conduct a Z test for the number
of successes, the test statistic is
$$Z = \frac{X - np}{\sqrt{np(1 - p)}}$$
Example 9-5
For many years, women have been under-represented
in management positions.
A twenty-year-old study revealed that only 18%
of companies had at least one woman in their
management team.
A recent study was conducted to discover
whether this situation had changed.
A survey was sent out to 500 of the largest
companies across Australia.
Of these, 23% (115) indicated that they had at
least one woman in their management team.
At the 5% level of significance, can you state
that there has been an increase in the
proportion of companies who include women in
their management teams?
Solution
H0: p ≤ 0.18
H1: p > 0.18
At α = 0.05, the critical value will be Z = 1.64 and
so the rejection rule is:
Reject H0 if Z > 1.64 otherwise do not reject H0.
The test statistic is
$$Z = \frac{p_S - p}{\sqrt{\frac{p(1 - p)}{n}}} = \frac{0.23 - 0.18}{\sqrt{\frac{0.18 \times 0.82}{500}}} = 2.91$$
• Note we have used the critical value approach.
• Check for yourself that the p-value approach
gives the same conclusion.
• Since 2.91 > 1.64, the test statistic is in the
rejection region and so we reject the null
hypothesis.
• Therefore, the data provides sufficient
evidence to conclude that there has been an
increase in the proportion of companies who
include women in their management teams.
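The proportion test in Example 9-5 can be checked with the sketch below, which also confirms that the proportion form and the number-of-successes form of the statistic give the same Z, and carries out the p-value approach left as an exercise. The helper names `proportion_z` and `count_z` are invented for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(successes, n, p0):
    """Z statistic for a proportion, computed from the sample proportion."""
    p_s = successes / n
    return (p_s - p0) / sqrt(p0 * (1 - p0) / n)


def count_z(successes, n, p0):
    """The same Z statistic, computed from the number of successes."""
    return (successes - n * p0) / sqrt(n * p0 * (1 - p0))


# Figures from Example 9-5.
x, n, p0, alpha = 115, 500, 0.18, 0.05
assert x >= 5 and n - x >= 5  # rule of thumb for the normal approximation
z = proportion_z(x, n, p0)
p_value = 1 - NormalDist().cdf(z)  # upper-tail test, H1: p > 0.18
print(f"Z = {z:.2f} (count form: {count_z(x, n, p0):.2f}), p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```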
Pitfalls and ethical issues
• It is always advisable to consult an experienced
statistician in the planning stage of a study.
This assists in avoiding biased results (due to
poor planning, faulty sampling frame etc).
• Poor research methods, however, are not
necessarily an indication of unethical
behaviour.
• Unethical behaviour involves intentional
manipulation of analysis and results.
To avoid ethical problems consider:
• Data collection method – ensure appropriate
randomisation techniques are used in selecting
a sample.
• Informed consent from human respondents –
individuals who are subjected to a treatment
should be made aware of the research purpose,
and any potential side-effects and give consent
to their involvement.
• Choosing a one- or two-tailed test:
– If you are only interested in differences (and
not whether result is smaller or larger) then
use two-tailed test.
– If you are interested in showing a result is too
large or too small then use one-tailed test.
• Choice of significance level (α) – select before
data is collected.
• It is good practice to include the p-value in
results.
• Data snooping
– Don’t perform a test and then change
one/two-tailed or significance level to get a
desired result!
– Don’t discard outliers to change the test
result!
– Always decide on hypotheses, whether a test is
one- or two-tailed and the significance level,
before collecting data.
• Data cleansing
– Prior to analysis, check unusual observations
for validity or special causes.
– Only remove data where you can prove there
is an error or unusual behaviour unrelated to
the study (this is rare).
– Decide on rules for data cleansing prior to
data collection.
• Reporting findings:
– Report good and bad results.
– Note that if a null hypothesis is not rejected,
this does not prove it is true, just that there is
insufficient evidence to prove that it is not
true.
• Statistical significance does not imply practical
significance in the field of application – discuss
results with the experts in the field!!!
After the lecture each week…
• Review the lecture material
• Complete all readings
• Complete all of recommended problems
(listed in SG) from the textbook
• Complete at least some of additional
problems
• Consider (briefly) the discussion points prior
to tutorials