Download File - Jenne Meyer PhD

Week 5
Dr. Jenne Meyer
 Article review
5-Step Hypothesis Testing Procedure
Step 1: Set up the null and alternative hypotheses.
Step 2: Pick the level of significance (value of α) and find the
rejection region.
Step 3: Calculate the test statistic.
Step 4: Decide whether or not to reject the null hypothesis.
Step 5: Interpret the statistical decision in terms of the stated
problem.
Two-tail test
H0: μ = 750
HA: μ ≠ 750
Lower-tail test
H0: μ ≥ 700
HA: μ < 700
Upper-tail test
H0: μ ≤ 800
HA: μ > 800
The Rejection Region is the range of values of the test
statistic that will lead you to reject the null hypothesis.

[Figure: rejection regions. One-tailed test: the whole area α lies in one tail. Two-tailed test: an area of α/2 lies in each tail.]
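As a minimal sketch of how the rejection region depends on the type of test, the following Python snippet (standard library only) computes critical z-values for a large-sample test and checks whether a statistic falls in the rejection region. The function names `critical_z` and `in_rejection_region` and the statistic 2.10 are illustrative assumptions, not part of the course material.

```python
from statistics import NormalDist

def critical_z(alpha, tail):
    """Return (lower, upper) critical z-values; None marks an unused side."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
        return (-z, z)
    if tail == "upper":
        return (None, std_normal.inv_cdf(1 - alpha))  # all of alpha in the upper tail
    if tail == "lower":
        return (std_normal.inv_cdf(alpha), None)  # all of alpha in the lower tail
    raise ValueError("tail must be 'two', 'upper', or 'lower'")

def in_rejection_region(z_stat, alpha, tail):
    """True when the test statistic falls in the rejection region."""
    lower, upper = critical_z(alpha, tail)
    return (lower is not None and z_stat < lower) or (upper is not None and z_stat > upper)

lower, upper = critical_z(0.05, "two")
print(round(lower, 2), round(upper, 2))          # -1.96 1.96
print(in_rejection_region(2.10, 0.05, "upper"))  # hypothetical statistic: True
```

For small samples the same decision rule applies, but the cutoffs come from the t-table rather than the normal distribution.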
 For a large sample (at least 30 units):
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
 For a small sample:
$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$
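The two test statistics above can be written as small Python functions. This is only a sketch: the function names and the numbers in the demonstration are hypothetical, chosen so the arithmetic is easy to follow.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """Large-sample statistic: (x-bar - mu) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def t_statistic(sample_mean, mu0, s, n):
    """Small-sample statistic: same form with s in place of sigma; df = n - 1."""
    return (sample_mean - mu0) / (s / math.sqrt(n)), n - 1

# Hypothetical numbers, chosen only to illustrate the arithmetic.
z = z_statistic(sample_mean=760, mu0=750, sigma=40, n=64)
t, df = t_statistic(sample_mean=760, mu0=750, s=40, n=16)
print(z)      # 10 / (40 / 8) = 2.0
print(t, df)  # 10 / (40 / 4) = 1.0 with 15 degrees of freedom
```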
Apply hypothesis testing to different populations
and samples in business research situations
 Test of single population, small sample size
 Test of a single proportion
 Test of two populations, large sample size
 Test of two populations, small sample size
 Test for difference in two population proportions
The 5-Step Hypothesis Testing
Procedure is the same for all
these processes.

For a small sample:
 Small sample and unknown σ
 Calculations are identical to those for z, with s in place of σ
 Becomes nearly identical to z for n > 30
 Uses degrees of freedom: df = n - 1
$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$
Review t-table
Common Values of α and df and the Corresponding t-Values
(The tabulated values correspond to α as the total area in both tails; the area in the upper tail alone is α/2. For a one-tailed test at level α, use the column headed 2α.)

Degrees of              Area in Both Tails (α)
Freedom (df)    0.25     0.10     0.05     0.01     0.005
20              1.185    1.725    2.086    2.845    3.153
21              1.183    1.721    2.080    2.831    3.135
22              1.182    1.717    2.074    2.819    3.119
23              1.180    1.714    2.069    2.807    3.104
24              1.179    1.711    2.064    2.797    3.091
25              1.178    1.708    2.060    2.787    3.078
26              1.177    1.706    2.056    2.779    3.067
1000000         1.150    1.645    1.960    2.576    2.807
Example
A State Highway Patrol periodically samples vehicle speeds at
various locations on a particular roadway. The sample of
vehicle speeds is used to test the hypothesis
H0: μ ≤ 65 mph
The locations where H0 is rejected (average speed exceeds
65 mph) are the best for radar traps.
At Location X, a sample of 16 vehicles shows a mean speed
of 68.2 mph with a standard deviation of 3.8 mph. Use a
level of significance of α = .05 to test the hypothesis.
 Example, cont.
H0: μ ≤ 65 mph
HA: μ > 65 mph
α = .05
d.f. = 16 - 1 = 15, tα = 1.753
Rejection region: t > 1.753
n = 16
x̄ = 68.2 mph
s = 3.8 mph
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{68.2 - 65}{3.8 / \sqrt{16}} = 3.37$$
Since 3.37 > 1.753, we reject H0.
Conclusion: At the .05 significance level, there is sufficient
evidence that the mean speed of
vehicles at Location X is greater
than 65 mph. Location X is a
good candidate for a radar trap.
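The radar-trap example can be checked with a few lines of Python. This is a sketch using the numbers given in the example; the critical value 1.753 is copied from the example rather than computed, since the standard library has no t-distribution.

```python
import math

n = 16              # sample size
x_bar = 68.2        # sample mean speed (mph)
s = 3.8             # sample standard deviation (mph)
mu0 = 65            # hypothesized mean under H0
t_critical = 1.753  # from the example: upper tail .05, df = 15

df = n - 1
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
reject_h0 = t_stat > t_critical  # upper-tail test

print(df, round(t_stat, 2), reject_h0)  # 15 3.37 True
```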
 Use if concerned with a proportion of the population,
p, that has a particular characteristic
 Can be used with nominal data
 Use the same 5-Step Hypothesis Testing Procedure
 Test statistic:
$$Z = \frac{\hat{p} - p}{\sqrt{p(1 - p) / n}}$$
where p̂ is the sample proportion and p is the hypothesized population proportion.
Example
For a Christmas and New Year’s week, the
National Safety Council estimated that 500 people
would be killed and 25,000 injured on the nation’s
roads. The NSC claimed that 50% of the accidents
would be caused by drunk driving.

A sample of 120 accidents showed that 67 were
caused by drunk driving. Use these data to test the
NSC’s claim with α = 0.05.


 Example, cont.
H0: p = .5
HA: p ≠ .5
α = .05
Rejection region: Z < -1.96 or Z > 1.96
p̂ = 67/120
n = 120
$$Z = \frac{(67/120) - .5}{\sqrt{.5(1 - .5) / 120}} = 1.278$$
Since -1.96 < 1.278 < 1.96, we
do not reject H0.
Conclusion: There is insufficient
evidence to suggest that the
population proportion of accidents
caused by drunk driving is
different from 50%.
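A minimal Python sketch of the same single-proportion test, using only the standard library. The variable names are illustrative; the two-tailed cutoff is computed from the standard normal distribution.

```python
import math
from statistics import NormalDist

successes, n = 67, 120   # accidents caused by drunk driving, sample size
p0 = 0.5                 # proportion claimed under H0
alpha = 0.05

p_hat = successes / n
z_stat = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
reject_h0 = abs(z_stat) > z_critical

print(round(z_stat, 3), round(z_critical, 2), reject_h0)  # 1.278 1.96 False
```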
 Example, cont. (two-proportion test: unmarried vs. married workers)
$$z = \frac{p_1 - p_2}{\sqrt{\frac{p_c(1 - p_c)}{n_1} + \frac{p_c(1 - p_c)}{n_2}}} = \frac{.1167 - .0880}{\sqrt{\frac{.1036(1 - .1036)}{300} + \frac{.1036(1 - .1036)}{250}}} = 1.099$$
Rejection region: Z > 1.645
Since 1.099 < 1.645, we do not reject H0.
Conclusion: There is insufficient evidence to suggest that the
proportion of unmarried workers missing more than 5 days of work
is greater than the proportion of married workers doing so.
 Often we are interested in comparing two different,
independent populations
[Figure 13.1: Two Populations and Two Samples (Population 1 with Sample 1; Population 2 with Sample 2)]
 When comparing two different, independent
populations the Null Hypothesis takes on the form:
H0: μs - μp = 0
H0: μs ≤ μp
H0: μs = μp
 When comparing two different, independent
populations with large n, the test statistic looks like
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
If the population standard deviations are unknown, use s1 and s2 instead of the σ’s.
 Example
A study was conducted to compare the mean years of service for
those retiring in 1979 with those retiring last year at Acme
Manufacturing Co. At the .01 significance level can we conclude
that the workers retiring last year gave more service based on the
following sample data? Note: Let pop #1= “last year”
                             Population #1    Population #2
                             "Last Year"      1979
Sample Mean                  30.4             25.6
Sample Standard Deviation    3.6              2.9
Sample Size                  45               40
 Example, cont.
H0: LY < 1979
HA: LY > 1979
=.01
Rejection Region
Z
2.326
 Example, cont.
$$z = \frac{30.4 - 25.6}{\sqrt{\frac{3.6^2}{45} + \frac{2.9^2}{40}}} = 6.80$$
Since 6.80 > 2.326, we reject H0.
Conclusion: There is sufficient
evidence at the .01 significance
level to suggest that the mean
years of service of those retiring
last year is greater than the mean
years of service of those retiring
in 1979.
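The years-of-service comparison can be reproduced with a short Python sketch. It uses the sample figures from the example, with the sample standard deviations standing in for the unknown population values; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Population #1 = "last year", Population #2 = 1979
mean_1, s_1, n_1 = 30.4, 3.6, 45
mean_2, s_2, n_2 = 25.6, 2.9, 40
alpha = 0.01

standard_error = math.sqrt(s_1**2 / n_1 + s_2**2 / n_2)
z_stat = (mean_1 - mean_2) / standard_error
z_critical = NormalDist().inv_cdf(1 - alpha)  # upper-tail test
reject_h0 = z_stat > z_critical

print(round(z_stat, 2), round(z_critical, 3), reject_h0)  # 6.8 2.326 True
```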
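 When comparing two different, independent populations
(with unknown variances that are assumed equal) with
small n, the test statistic looks like
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - D}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}, \qquad df = n_1 + n_2 - 2$$
where D is the hypothesized difference between the population means (D = 0 when testing for equality) and
$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$
(Pooled Sample Variance)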
 Example
To determine whether there is a difference in the time involved in
using two versions of software, the new version of the software is
compared to the original. Samples are taken from two independent
groups using the software (data below). At the .01 significance
level, is there a difference in the mean amount of time required to
use two versions of software?
Version 1: 6, 8, 6, 9, 10
Version 2: 5, 9, 8, 7, 7, 6
 Example, cont.
H0 : 1- 2 = 0
HA : 1- 2 =/ 0
 =.01 Because we
have a two tailed test,
there is /2 = .005 in
each tail
df = n1 + n2 – 2
=5+6–2
=9
Version 1
6
8
6
9
10
Version 2
5
9
8
7
7
6
S12 = variance 1
S22 = variance 2
3.2
X1 bar = mean 1
2.0
X2 bar = mean 2
7.8
7.0
From t-table, critical cutoffs for two-tail, alpha/2=.005, df=9 is 3.25
 Example, cont.
$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = \frac{(4)(3.2) + (5)(2.0)}{5 + 6 - 2} = 2.53$$
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2 (1/n_1 + 1/n_2)}} = \frac{7.8 - 7.0}{\sqrt{2.53 (1/5 + 1/6)}} = \frac{.80}{.963} = .83$$
Since .83 < 3.25, we do
not reject H0.
Conclusion: There is
insufficient evidence to
suggest that there is a
difference between the
mean time to use the two
versions of software.
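The pooled-variance calculation can be checked from the raw data with Python's standard library. This is a sketch: the variable names are illustrative, and the critical value 3.25 is taken from the example rather than computed.

```python
import math
from statistics import mean, variance

version_1 = [6, 8, 6, 9, 10]
version_2 = [5, 9, 8, 7, 7, 6]
t_critical = 3.25  # from the example: two-tailed, alpha/2 = .005, df = 9

n1, n2 = len(version_1), len(version_2)
x1_bar, x2_bar = mean(version_1), mean(version_2)
s1_sq, s2_sq = variance(version_1), variance(version_2)  # sample variances (divide by n - 1)

df = n1 + n2 - 2
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df  # pooled sample variance
t_stat = (x1_bar - x2_bar) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
reject_h0 = abs(t_stat) > t_critical  # two-tailed test

print(df, round(sp_sq, 2), round(t_stat, 2), reject_h0)  # 9 2.53 0.83 False
```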
 When comparing two different population proportions,
the Null Hypothesis takes on the form:
H0: p1 - p2 = 0
H0: p1 = p2
 The test statistic looks like:
$$z = \frac{p_1 - p_2}{\sqrt{\frac{p_c(1 - p_c)}{n_1} + \frac{p_c(1 - p_c)}{n_2}}}$$
where
$$p_c = \frac{\text{Total number of successes}}{\text{Total number in samples}} = \frac{X_1 + X_2}{n_1 + n_2}$$
(the weighted mean of the two sample proportions)

Are unmarried workers more likely to be
absent from work than married workers? A
sample of 250 married workers showed 22
missed more than 5 days last year, while a
sample of 300 unmarried workers showed 35
missed more than five days. Use a .05
significance level. Note: let pop #1=
unmarried workers.
 Example, cont.
H0: pu = pm
HA: pu > pm
α = .05
Rejection region: Z > 1.645
pu = Unmarried Workers = X1/n1 = 35/300 = .1167
pm = Married Workers = X2/n2 = 22/250 = .0880
$$p_c = \frac{35 + 22}{300 + 250} = .1036$$
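A Python sketch of the two-proportion test for the unmarried vs. married workers example, standard library only. Because it carries full precision instead of the rounded values .1167, .0880, and .1036, the statistic comes out near 1.098 rather than the 1.099 obtained from rounded intermediates; the decision is the same. Variable names are illustrative.

```python
import math
from statistics import NormalDist

x_u, n_u = 35, 300   # unmarried workers missing more than 5 days, sample size
x_m, n_m = 22, 250   # married workers missing more than 5 days, sample size
alpha = 0.05

p_u = x_u / n_u
p_m = x_m / n_m
p_c = (x_u + x_m) / (n_u + n_m)  # pooled (weighted mean) proportion

standard_error = math.sqrt(p_c * (1 - p_c) / n_u + p_c * (1 - p_c) / n_m)
z_stat = (p_u - p_m) / standard_error
z_critical = NormalDist().inv_cdf(1 - alpha)  # upper-tail test
reject_h0 = z_stat > z_critical

print(round(p_c, 4), round(z_critical, 3), reject_h0)  # 0.1036 1.645 False
print(round(z_stat, 2))                                # 1.1
```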


Chapter 9: problems 10, 15, 17, 18, 20, 40
Chapter 10: problems 12, 14 (two sample), 32,
33 (proportions)