Download Powerpoint Slides for Unit 7

Hypothesis Testing
• Make a tentative assumption about a parameter
• Evaluate how likely it is that this assumption is true
• Null Hypothesis
  • Default possibility
  • H0: μ = 13
  • H0: β = 0
• Alternative (or Research) Hypothesis
  • Values of a parameter if your theory is correct
  • HA: μ > 13
  • HA: β ≠ 0
Hypothesis Testing
• Test Statistic
  • Measure used to assess the validity of the null hypothesis
• Rejection Region
  • A range of values such that if our test statistic falls into this range, we reject the null hypothesis
• H0: μ = 13
  • If x̄ is close to 13, can’t reject H0. But if x̄ is far away, then reject. But what’s “far away”?
Hypothesis Testing
Errors
                      State of Nature (Truth)
Action                H0 True      H0 False
Reject H0
Fail to Reject H0
Hypothesis Testing
Errors: Drug Testing Example
H0: Not using drugs
                                          State of Nature (Truth)
Action                                    H0 True      H0 False
Reject H0 (conclude “a drug user”)
Fail to Reject H0 (conclude “clean”)
Testing μ
• A human resources executive for a huge company wants to set up a self-insured workers’ compensation plan based on a company-wide average of 2,000 person-days lost per plant. A survey of 51 plants in the company reveals that x̄ = 1,800 and s = 500. Is there sufficient evidence to conclude that company-wide days lost differs from 2,000? (Use α = 0.05)
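The workers' compensation question can be checked numerically. The sketch below is one possible way to do it, not the method prescribed in class: it computes the t statistic from the summary numbers on the slide and a two-tailed p-value. The helper names `t_pdf` and `t_cdf` are hypothetical, and the p-value comes from a simple numerical integration of the t density (Simpson's rule) rather than from a printed t table.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T < x), using Simpson's rule on [0, |x|] plus symmetry around zero."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # P(0 < T < |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

# Values from the slide: H0: mu = 2,000 versus HA: mu != 2,000
mu_0, x_bar, s, n, alpha = 2000, 1800, 500, 51, 0.05

std_error = s / math.sqrt(n)           # estimated standard error of x-bar
t_stat = (x_bar - mu_0) / std_error
df = n - 1
p_value = 2 * (1 - t_cdf(abs(t_stat), df))   # two-tailed, because HA is "differs"

print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The p-value is doubled because the alternative is two-sided; for a one-sided alternative only one tail would be used.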
If H0 is true…
• x̄ has a t distribution with 50 degrees of freedom
[Figure: t distribution of x̄ centered at μx̄ = 2,000]
When to Reject H0?
• x̄ has a t distribution with 50 degrees of freedom
[Figure: t distribution of x̄ centered at μx̄ = 2,000, with the rejection region in the tails below x̄L and above x̄UP; P(rejection region) = α]
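The cutoffs x̄L and x̄UP in the figure can be sketched in code. This is an illustration under one stated assumption: the critical value 2.009 (two-tailed 5%, 50 d.f.) is taken from a standard t table and does not appear on the slides. The variable names are hypothetical.

```python
import math

# Values from the workers' compensation example
mu_0, s, n = 2000, 500, 51

# Assumption: 2.009 is the two-tailed 5% critical value for 50 d.f.,
# read from a standard t table rather than from the slides.
t_crit = 2.009

std_error = s / math.sqrt(n)
x_low = mu_0 - t_crit * std_error      # lower edge of the "fail to reject" range
x_up = mu_0 + t_crit * std_error       # upper edge of the "fail to reject" range

x_bar = 1800                           # observed sample mean
in_rejection_region = x_bar < x_low or x_bar > x_up

print(f"Reject H0 if x-bar < {x_low:.1f} or x-bar > {x_up:.1f}")
print("Reject H0" if in_rejection_region else "Fail to reject H0")
```

Stating the rejection region in units of x̄ answers the earlier question of what counts as “far away” from the hypothesized mean.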
Testing μ
• Suppose you are a human resources manager and are investigating health insurance costs for your employees. You know that five years ago, the average weekly premium was $30.00. You take a random sample of 40 of your employees and calculate that x̄ = $31.25 and s = 5.
• Have health care costs increased (use a 5% significance level)?
If H0 is true…
• x̄ has a t distribution with 39 degrees of freedom
[Figure: t distribution of x̄ centered at μx̄ = 30]
When to Reject H0?
• x̄ has a t distribution with 39 degrees of freedom
[Figure: t distribution of x̄ centered at μx̄ = 30, with the rejection region in the upper tail above x̄UP; P(rejection region) = α]
t Values for 39 d.f.
x       P(t<x)
1.55    0.9354
1.56    0.9366
1.57    0.9378
1.58    0.9389
1.59    0.9400
1.60    0.9412
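The table of t values can be used directly for the health insurance example. The following is a minimal sketch of that lookup, assuming the test statistic is rounded to two decimals to match the table; the dictionary name `t_table_39` is hypothetical.

```python
import math

# P(t < x) for 39 d.f., copied from the table on the slide
t_table_39 = {1.55: 0.9354, 1.56: 0.9366, 1.57: 0.9378,
              1.58: 0.9389, 1.59: 0.9400, 1.60: 0.9412}

# Values from the health insurance example: H0: mu = 30 versus HA: mu > 30
mu_0, x_bar, s, n, alpha = 30.00, 31.25, 5.0, 40, 0.05

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
t_rounded = round(t_stat, 2)            # the table is tabulated to two decimals
p_value = 1 - t_table_39[t_rounded]     # upper tail only, since HA is "increased"

print(f"t = {t_stat:.3f}, table entry used: {t_rounded}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Because the alternative is one-sided, only the upper-tail probability is compared with the 5% significance level.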
Important Note
• Siegel emphasizes confidence intervals to do hypothesis tests
• I do NOT want you to do it this way
  • It does not fit the logic that I will emphasize
  • It doesn’t fit with p-values
  • It’s too easy to get confused between one-tailed and two-tailed tests
• So don’t follow Siegel, follow Budd
Testing p
• An HR manager of a large corporation surveys 1,000 workers and asks “Are you satisfied with your job?” The results are

Responses       Percentage
Satisfied       77%
Not Satisfied   23%

• You want to examine whether dissatisfaction is increasing. You know that the fraction of workers who were dissatisfied with their job five years ago was 20%. Has the fraction increased (at the 5% significance level)?
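For the job satisfaction question, a possible sketch of the one-sided z test for a proportion follows. It assumes the standard error is computed under H0 (using p = 0.20) and that the sample of 1,000 is large enough for the normal approximation; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Values from the slide: H0: p = 0.20 versus HA: p > 0.20
p_0, p_hat, n, alpha = 0.20, 0.23, 1000, 0.05

# Standard error of the sample proportion, evaluated at the null value
std_error = math.sqrt(p_0 * (1 - p_0) / n)
z_stat = (p_hat - p_0) / std_error
p_value = 1 - NormalDist().cdf(z_stat)   # upper tail, since HA is "increased"

print(f"z = {z_stat:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```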
Regression
• Recall Coal Mining Safety Problem
• Dependent Variable: annual fatal injuries

injury = -168.51  + 1.224 hours + 0.048 tons
         (258.82)   (0.186)       (0.403)
         + 19.618 unemp + 159.851 WWII
           (5.660)        (78.218)
         - 9.839 Act1952 - 203.010 Act1969
           (100.045)       (111.535)
(R2 = 0.9553, n = 47; standard errors in parentheses)

• Test the hypothesis that the unemployment rate is not related to the injury rate (use α = 0.01)
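The test on the unemployment coefficient can be sketched as follows. The coefficient and standard error come from the slide. The critical value 2.704 (two-tailed 1%, 40 d.f.) is an assumption taken from a standard t table, and the degrees of freedom assume six predictors plus an intercept.

```python
# Values from the slide: H0: beta_unemp = 0 versus HA: beta_unemp != 0
coef, std_err = 19.618, 5.660
n, k, alpha = 47, 6, 0.01              # k = number of predictors in the model

df = n - k - 1                         # residual degrees of freedom
t_stat = (coef - 0) / std_err          # null value is zero ("not related")

# Assumption: 2.704 is the two-tailed 1% critical value for 40 d.f.,
# read from a standard t table rather than from the slides.
t_crit = 2.704
reject = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f} with {df} d.f., critical value = {t_crit}")
print("Reject H0" if reject else "Fail to reject H0")
```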
Excel Output

Regression Statistics
R Squared        0.955
Adj. R Squared   0.949
Standard Error   108.052
Obs.             47

ANOVA
             df   SS             MS            F         Significance
Regression   6    9975694.933    1662615.822   142.406   0.000
Residual     40   467007.875     11675.197
Total        46   10442702.809

            Coeff.     Std. Error   t stat   p value   Lower 95%   Upper 95%
Intercept   -168.510   258.819      -0.651   0.519     -691.603    354.583
hours       1.224      0.186        6.565    0.000     0.001       0.002
tons        0.048      0.403        0.119    0.906     -0.001      0.001
unemp       19.618     5.660        3.466    0.001     8.178       31.058
WWII        159.851    78.218       2.044    0.048     1.766       317.935
Act1952     -9.839     100.045      -0.098   0.922     -212.038    192.360
Act1969     -203.010   111.535      -1.820   0.076     -428.431    22.411
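The summary numbers in the Excel output are related to one another, and it can help to see how. The sketch below recomputes F, R Squared, Adj. R Squared and the Standard Error from the ANOVA sums of squares. The variable names are hypothetical; the formulas are the usual textbook definitions, not something stated on the slide.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table
ss_regression, ss_residual = 9975694.933, 467007.875
df_regression, df_residual = 6, 40
n = 47

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual                   # overall F statistic
ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
std_error_regression = math.sqrt(ms_residual)          # "Standard Error" line

print(f"F = {f_stat:.3f}")
print(f"R Squared = {r_squared:.3f}, Adj. R Squared = {adj_r_squared:.3f}")
print(f"Standard Error = {std_error_regression:.3f}")
```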
Minitab Output

Predictor   Coef      StDev   T       P
Constant    -168.5    258.8   -0.65   0.519
hours       1.2235    0.186   6.56    0.000
tons        0.0478    0.403   0.12    0.906
unemp       19.618    5.660   3.47    0.001
WWII        159.85    78.22   2.04    0.048
1952Act     -9.8      100.0   -0.10   0.922
1969Act     -203.0    111.5   -1.82   0.076

S = 108.1   R-Sq = 95.5%   R-Sq(adj) = 94.9%
Testing μ1 − μ2
• To compare wages in two large industries, we draw a random sample of 46 hourly wage earners from each industry and find x̄1 = $7.50 and x̄2 = $7.90 with s1 = 2.00 and s2 = 1.80.
• Is there sufficient evidence to conclude (using α = 0.01) that the average hourly wage in industry 2 is greater than the average in industry 1?
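One way to sketch the wage comparison is shown below. It assumes the standard error of the difference uses separate sample variances (no pooling) and, because both samples have 46 observations, approximates the t distribution with the standard normal. Both choices are assumptions for illustration and may differ from the approach used in class.

```python
import math
from statistics import NormalDist

# Values from the slide: H0: mu2 - mu1 = 0 versus HA: mu2 - mu1 > 0
x1, s1, n1 = 7.50, 2.00, 46
x2, s2, n2 = 7.90, 1.80, 46
alpha = 0.01

# Standard error of the difference in sample means (unpooled variances)
std_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
test_stat = ((x2 - x1) - 0) / std_error

# Assumption: normal approximation to the t distribution for the p-value
p_value = 1 - NormalDist().cdf(test_stat)   # upper tail, since HA is "greater"

print(f"test statistic = {test_stat:.3f}, approximate p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```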
Testing p1 − p2
• In a random survey of 850 companies in 1995, 73% of the companies reported that there were no difficulties with employee acceptance of job transfers. In a random survey of 850 companies in 1990, the analogous proportion was 67%. Do these data provide sufficient evidence to conclude that the proportion of companies with no difficulties with employee acceptance of job transfers has changed between 1990 and 1995? (Use α = 0.05)
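The job transfer comparison can be sketched as a two-sided z test for two proportions. The version below pools the two samples to estimate the common proportion under H0: p1 = p2. Pooling is a common textbook choice and is an assumption here; the slide does not say which standard error to use.

```python
import math
from statistics import NormalDist

# Values from the slide: H0: p_1995 - p_1990 = 0 versus HA: p_1995 - p_1990 != 0
p_1995, n_1995 = 0.73, 850
p_1990, n_1990 = 0.67, 850
alpha = 0.05

# Assumption: pooled proportion, since H0 says both years share one proportion
pooled = (p_1995 * n_1995 + p_1990 * n_1990) / (n_1995 + n_1990)
std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_1995 + 1 / n_1990))

z_stat = (p_1995 - p_1990) / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))   # two-tailed: "has changed"

print(f"pooled proportion = {pooled:.2f}, z = {z_stat:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```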
Many Cases, Same Logic

t or z = [statistic − (parameter | H0 true)] / (std. error of the statistic)

• If you get a “small” test statistic, then there is a decent probability that you could have drawn this sample with H0 true, so there is not enough evidence to reject H0
• If you get a “large” test statistic, then there is a low probability that you could have drawn this sample with H0 true, so the safe bet is that H0 is false
• Need the t or z distribution to distinguish “small” from “large” via probability of occurrence
More Exercises
• A personnel department has developed an aptitude test for a type of semiskilled worker. The test scores are normally distributed. The developer of the test claims that the mean score is 100. You give the test to 36 semiskilled workers and find that x̄ = 98 and s = 5. Do you agree that μ = 100 at the 5% level?
• Have 50% of all Cyberland Enterprises employees completed a training program? Recall that for the Cyberland Enterprises sample, 29 of the 50 employees sampled completed a training program. (Use α = 0.05)
More Exercises
Dep. Var: Job Performance; n = 3525; use α = 0.01

Predictor   Coef     StDev   T       P
Constant    6.010    0.235   25.6    0.000
age         -0.006   0.003   -1.71   0.088
seniorty    0.011    0.003   3.56    0.000
cognitve    -0.005   0.032   -0.17   0.867
strucint    2.129    0.894   2.38    0.017
manual      -1.513   0.239   -6.33   0.000
Manl*age    -0.042   0.004   -10.4   0.000

• On average, is performance related to seniority?
• Do those with structured interviews have higher average performance levels than those without?
• Do those with structured interviews have average performance levels at least two units higher than those without?
• Does the relationship between age and performance differ between manual and non-manual jobs?
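The two structured interview questions differ only in the null value, which the sketch below illustrates. It assumes one-sided alternatives (HA: β > 0 and HA: β > 2) and, because n = 3525 leaves thousands of degrees of freedom, uses the standard normal in place of the t distribution. These are assumptions for illustration, not the official solution.

```python
from statistics import NormalDist

# strucint row of the Minitab output
coef, std_err = 2.129, 0.894
n, k, alpha = 3525, 6, 0.01

df = n - k - 1   # so large that t is treated as standard normal (assumption)

# "Higher than those without": H0: beta = 0 versus HA: beta > 0
t_vs_zero = (coef - 0) / std_err
p_vs_zero = 1 - NormalDist().cdf(t_vs_zero)

# "At least two units higher": H0: beta = 2 versus HA: beta > 2
t_vs_two = (coef - 2) / std_err
p_vs_two = 1 - NormalDist().cdf(t_vs_two)

print(f"H0: beta = 0 -> t = {t_vs_zero:.2f}, one-sided p = {p_vs_zero:.4f}")
print(f"H0: beta = 2 -> t = {t_vs_two:.2f}, one-sided p = {p_vs_two:.4f}")
```

The T and P columns that Minitab prints test a null value of zero with a two-sided alternative, so a different null value or a one-sided alternative has to be worked out by hand from Coef and StDev.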
More Exercises
• A large company is analyzing the use of its Employee Assistance Program (EAP). In a random sample of 500 employees, it finds:

                       Single Employees   Married Employees
number of employees    200                300
number using the EAP   75                 90

• Using α = 0.01, is there sufficient evidence to conclude that single and married employees differ in the usage rate of the EAP?
More Exercises
• Independent random samples of male and female hourly wage employees yield the following summary statistics:

Male Employees   Female Employees
n1 = 45          n2 = 32
x̄1 = 9.25        x̄2 = 8.70
s1 = 1.00        s2 = 0.80

• Is there sufficient evidence to conclude that, on average, women earn less than men? (Use α = 0.10)