Hypothesis Testing
ESM 206
6 Feb. 2002
Example: Gas Mileage
Do “Small” cars have a
different average gas mileage
than “Compact” cars?
Data on mileage of 13 small
and 15 compact cars.
[Figure: Mileage (axis roughly 20 to 35) by car Type, Small vs. Compact]
Small: Eagle Summit, Ford Escort, Ford Festiva, Honda Civic, Mazda Protégé, Mercury Tracer, Nissan Sentra, Pontiac LeMans, Subaru Loyale, Subaru Justy, Toyota Corolla, Toyota Tercel, Volkswagen Jetta
Compact: Audi 80, Buick Skylark, Chrysler LeBaron, Ford Tempo, Honda Accord, Mazda 626, Mitsubishi Galant, Mitsubishi Sigma, Nissan Stanza, Oldsmobile Calais, Peugeot 405, Subaru Legacy, Toyota Camry
Example: gas consumption
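$G = \beta_0 + \beta_1 P + \beta_2 I + \beta_3 N + \beta_4 U$, where G is gas consumption, P is gas price, I is income, N is new-car price, and U is used-car price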
Which coefficients are different from
zero?
Data from 36 years in US.
Hypothesis testing
Define the null hypothesis (H0)
Does direction matter?
Choose a test statistic, T
Find the distribution of T under H0
Calculate the observed value of the test statistic, S
Compute P, the probability under H0 of obtaining a value at least as extreme as S
If P is small, reject H0
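The steps above can be sketched for a case where the distribution of T under H0 is known exactly. This minimal Python sketch uses a hypothetical coin-tossing example that is not data from these notes. H0 is that the coin is fair. T is the distance between the number of heads and its expected value. The Binomial(20, 0.5) distribution gives the null distribution of T, and P sums the null probabilities of all outcomes at least as extreme as the observed S. The function name is illustrative.

```python
from math import comb


def binomial_two_sided_p(heads: int, n: int) -> float:
    """Exact two-sided P-value for H0: the coin is fair (P(heads) = 0.5).

    Test statistic T = |number of heads - n/2|. Under H0 the number of
    heads is Binomial(n, 0.5), so the null distribution of T is exact.
    """
    expected = n / 2
    s = abs(heads - expected)  # observed value S of the test statistic
    # Count every outcome whose T is at least as extreme as S ...
    extreme = sum(comb(n, k) for k in range(n + 1) if abs(k - expected) >= s)
    # ... and convert counts to probabilities (each outcome has prob. 1/2**n).
    return extreme / 2**n


# Hypothetical data: 16 heads in 20 tosses.
p = binomial_two_sided_p(16, 20)
alpha = 0.05
print(f"P = {p:.4f}; reject H0 at alpha = {alpha}: {p < alpha}")
```

With 16 heads in 20 tosses, P is about 0.012, so H0 would be rejected at the 0.05 level.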
The null hypothesis
Statement about underlying parameters of the population
We will either reject or fail to reject H0
Usually a statement of no pattern or of not exceeding some criterion
Examples
The alternative hypothesis
Written HA
Is the logical complement of H0
Examples
One- and two-sided tests
One-sided test: direction matters
 Pick a direction based on regulatory criteria or knowledge of processes
 Direction must be chosen a priori
Two-sided test: all that matters is a difference
A one-sided test has greater power to detect a difference in the chosen direction, but cannot detect one in the opposite direction
Must make the decision before analyzing the data
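The gain in power comes from putting the whole rejection region in one tail. This Python sketch assumes a normally distributed test statistic z, which is a simplification; the same logic applies to t. It also assumes a hypothetical observed value of z = 1.8.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def p_two_sided(z: float) -> float:
    """P(|Z| >= |z|) under H0: a difference in either direction counts."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def p_one_sided_upper(z: float) -> float:
    """P(Z >= z) under H0, for an alternative chosen a priori as 'greater'."""
    return 1 - STD_NORMAL.cdf(z)


z = 1.8  # hypothetical observed test statistic
print(f"one-sided P = {p_one_sided_upper(z):.4f}")
print(f"two-sided P = {p_two_sided(z):.4f}")
```

Here the one-sided P is about 0.036 and the two-sided P is about 0.072, so only the one-sided test rejects at 0.05. Had the difference gone the other way (z = -1.8), the one-sided P would be about 0.96. The one-sided test cannot detect an effect in the direction it did not anticipate.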
Comparing means: the t-test
Compare a sample mean to a fixed value (eqs. 1-4)
Compare a regression coefficient to a fixed value (eq. 5)
Compare the difference between two sample means to a fixed value (usually 0) (eqs. 6-7)
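For reference, the standard textbook forms of these three t statistics are given below. The notation and numbering in the course's eqs. 1-7 may differ. Here $\bar{x}$ is a sample mean, $s$ a sample standard deviation, $n$ a sample size, $b$ an estimated regression coefficient with standard error $\mathrm{SE}(b)$, and $k$ the number of estimated regression coefficients including the intercept. The symbols $\mu_0$, $\beta_{H_0}$ and $\Delta_0$ are the fixed values being tested against.

```latex
\begin{aligned}
\text{one sample:} \quad & t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, && \mathrm{df} = n - 1 \\
\text{regression coefficient:} \quad & t = \frac{b - \beta_{H_0}}{\mathrm{SE}(b)}, && \mathrm{df} = n - k \\
\text{two samples (pooled):} \quad & t = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta_0}{s_p \sqrt{1/n_1 + 1/n_2}}, && \mathrm{df} = n_1 + n_2 - 2 \\
& s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}
\end{aligned}
```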
Assumptions of the t-test
The data in each sample are normally distributed
The populations have the same variance
 Violations of equal variance can be corrected with the Welch modification of the degrees of freedom (df)
 Test for a difference between the two variances with an F-test
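Both quantities can be computed from summary statistics alone. This Python sketch uses the sample variances and sizes from the gas-mileage summary in these notes (Small: variance 14.5, n = 13; Compact: variance 3.552381, n = 15). The function names are illustrative. The sketch computes the F ratio of the two variances and compares the pooled df with the Welch-Satterthwaite approximate df.

```python
def variance_ratio(var1: float, var2: float) -> float:
    """F statistic for equal variances: larger sample variance over smaller."""
    return max(var1, var2) / min(var1, var2)


def welch_df(var1: float, n1: int, var2: float, n2: int) -> float:
    """Welch-Satterthwaite approximate degrees of freedom."""
    a, b = var1 / n1, var2 / n2  # squared standard errors of each mean
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


# Summary statistics from the gas-mileage example (Small vs. Compact).
var_small, n_small = 14.5, 13
var_compact, n_compact = 3.552381, 15

f = variance_ratio(var_small, var_compact)
# Small has the larger variance, so it supplies the numerator df.
print(f"F = {f:.2f} on ({n_small - 1}, {n_compact - 1}) df")
print(f"pooled df = {n_small + n_compact - 2}")
print(f"Welch df  = {welch_df(var_small, n_small, var_compact, n_compact):.2f}")
```

F is about 4.1 on (12, 14) df. The Welch correction reduces the df from 26 (pooled) to about 17.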

The P-value
P is the probability, if the null hypothesis is true, of observing data at least as extreme as yours
If H0 is true and you reject it whenever the data are at least this extreme, P is the probability that you will be in error
P is not the probability that the null hypothesis is true
Critical values of P
Reject H0 if P is less than a chosen threshold
P < 0.05 is commonly used
 This is an arbitrary choice
 Other values: 0.1, 0.01, 0.001
Always report P, so that others can draw their own conclusions
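The threshold is also the rate at which a true H0 gets rejected. If H0 is true and you reject whenever P < 0.05, you will be wrong about 5% of the time. This Python sketch checks that by simulation, under assumptions chosen for simplicity: a one-sample z-test with known σ = 1 on normally distributed data. The sample size, number of trials, seed and function names are arbitrary choices.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_p(sample: list[float], mu0: float, sigma: float) -> float:
    """Two-sided P-value of a one-sample z-test with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / n**0.5)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def rejection_rate(alpha: float, n: int = 10, trials: int = 20_000, seed: int = 1) -> float:
    """Fraction of simulated data sets in which a TRUE H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: data come from N(mu0 = 0, sigma = 1).
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p(sample, mu0=0.0, sigma=1.0) < alpha:
            rejections += 1
    return rejections / trials


for alpha in (0.1, 0.05, 0.01):
    print(f"threshold {alpha}: true H0 rejected in {rejection_rate(alpha):.3f} of trials")
```

Each printed rate should be close to its threshold, apart from simulation noise.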
Gas mileage: variances are unequal
              Small        Compact
Min:          25.000000    21.000000
1st Qu.:      28.000000    23.000000
Mean:         31.000000    24.133333
Median:       32.000000    24.000000
3rd Qu.:      33.000000    25.500000
Max:          37.000000    27.000000
Total N:      13.000000    15.000000
NA's:         0.000000     0.000000
Variance:     14.500000    3.552381
Std Dev.:     3.807887     1.884776
Gas mileage
Test Name: Welch Modified Two-Sample t-Test
Estimated Parameter(s): mean of x = 31, mean of y = 24.13333
Data: DS2
 x: Small in DS2, and y: Compact in
Test Statistic: t = 5.905054
Test Statistic Parameter: df = 16.98065
P-value: 0.00001738092
95 % Confidence Interval: LCL = 4.413064, UCL = 9.32027
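The Welch output above can be reproduced from the summary statistics alone. This Python sketch is illustrative and is not the S-PLUS routine that produced the output. It computes the Welch t statistic and df directly. The P-value and confidence interval come from Student's t distribution, evaluated through the regularized incomplete beta function with a standard continued-fraction algorithm, so only the standard library is needed. All function names are hypothetical.

```python
from math import exp, lgamma, log, sqrt

TINY = 1e-300  # guards against division by zero in the continued fraction


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        step = d * c
        h *= step
        if abs(step - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: float) -> float:
    """P(|T| >= |t|) for Student's t with df (possibly non-integer) df."""
    return reg_inc_beta(df / 2, 0.5, df / (df + t * t))


def t_critical(alpha: float, df: float) -> float:
    """Positive t whose two-sided tail probability is alpha (bisection)."""
    lo, hi = 0.0, 1000.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if t_two_sided_p(mid, df) > alpha:  # P falls as t grows
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def welch_t_test(mean1, var1, n1, mean2, var2, n2, conf=0.95):
    """Welch two-sample t-test from summary statistics."""
    a, b = var1 / n1, var2 / n2  # squared standard errors of each mean
    se = sqrt(a + b)
    diff = mean1 - mean2
    t = diff / se
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    half_width = t_critical(1 - conf, df) * se
    return t, df, t_two_sided_p(t, df), (diff - half_width, diff + half_width)


# Small (x) and Compact (y) summary statistics from the notes.
t, df, p, (lcl, ucl) = welch_t_test(31.0, 14.5, 13, 24.133333, 3.552381, 15)
print(f"t = {t:.6f}, df = {df:.5f}, P = {p:.6g}")
print(f"95% CI for the difference in means: ({lcl:.6f}, {ucl:.6f})")
```

With SciPy available, `scipy.stats.ttest_ind_from_stats` with `equal_var=False` should give the same t and P from these summaries. It takes standard deviations rather than variances.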
Gas consumption
                  Value     Std. Error   t value    Pr(>|t|)
(Intercept)      -0.0898    0.0508       -1.7687    0.0868
GasPrice         -0.0424    0.0098       -4.3058    0.0002
Income            0.0002    0.0000       23.4189    0.0000
New.Car.Price    -0.1014    0.0617       -1.6429    0.1105
Used.Car.Price   -0.0432    0.0241       -1.7913    0.0830
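Reading the table against a 5% threshold answers the question of which coefficients differ from zero. This Python sketch copies the printed values and flags each coefficient. It assumes 31 residual degrees of freedom (36 years minus 5 estimated coefficients, with no missing years). For 31 df, the two-sided 5% critical value of t is about 2.04. The sketch uses the printed t values rather than recomputing Value / Std. Error, because Income's standard error rounds to 0.0000 in the display.

```python
# Printed regression output: (Value, Std. Error, t value, Pr(>|t|)).
COEFFICIENTS = {
    "(Intercept)": (-0.0898, 0.0508, -1.7687, 0.0868),
    "GasPrice": (-0.0424, 0.0098, -4.3058, 0.0002),
    "Income": (0.0002, 0.0000, 23.4189, 0.0000),
    "New.Car.Price": (-0.1014, 0.0617, -1.6429, 0.1105),
    "Used.Car.Price": (-0.0432, 0.0241, -1.7913, 0.0830),
}

N_YEARS, N_COEFS = 36, 5
DF = N_YEARS - N_COEFS  # 31 residual df, ASSUMING no missing years
T_CRIT = 2.04  # approximate two-sided 5% critical value of t on 31 df
ALPHA = 0.05


def differs_from_zero(t_value: float, p_value: float) -> bool:
    """Two-sided test of H0: coefficient = 0 at the 5% level."""
    by_t = abs(t_value) > T_CRIT
    by_p = p_value < ALPHA
    if by_t != by_p:
        # Up to rounding of T_CRIT the two rules agree, so a mismatch
        # suggests a row was copied incorrectly.
        raise ValueError("t value and P-value disagree")
    return by_p


print(f"residual df = {DF}")
for name, (_value, _se, t, p) in COEFFICIENTS.items():
    verdict = "different from zero" if differs_from_zero(t, p) else "not significant"
    print(f"{name:15s} t = {t:8.4f}  P = {p:.4f}  {verdict}")
```

At the 5% level only GasPrice and Income differ from zero. The intercept and the two car-price coefficients have P between 0.05 and 0.12.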
Interpreting model coefficients
Is there statistical evidence that the independent variable has an effect?
 Is the parameter estimate significantly different from zero?
Is the coefficient large enough that the effect is important?
 Must take into account the variation in the independent variable
 Use a linear measure of variation: SD, interquartile range, etc.
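A coefficient is the change in the response per unit of the predictor, so its practical size depends on how much that predictor actually varies. The Python sketch below multiplies a coefficient by the predictor's SD and by its interquartile range. The function name and the numbers in the usage example are hypothetical.

```python
from statistics import quantiles, stdev


def scaled_effects(coefficient: float, x: list[float]) -> tuple[float, float]:
    """Change in the response over a typical spread of the predictor x.

    Returns (coefficient * SD of x, coefficient * interquartile range of x).
    """
    q1, _median, q3 = quantiles(x, n=4)  # quartiles ('exclusive' method by default)
    return coefficient * stdev(x), coefficient * (q3 - q1)


# Hypothetical coefficient and predictor values, for illustration only.
x_hypothetical = [1.0, 2.0, 3.0, 4.0, 5.0]
effect_sd, effect_iqr = scaled_effects(2.0, x_hypothetical)
print(f"effect of a 1-SD change: {effect_sd:.3f}; of an IQR change: {effect_iqr:.3f}")
```

A coefficient that looks tiny, such as Income's 0.0002, can still correspond to a substantial effect if the predictor varies over a large range.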
Types of error
Type I: reject the null hypothesis when it is really true
 Desired level: α
Type II: fail to reject the null hypothesis when it is really false
 Desired level: β
 Is associated with a given effect size
 E.g., want a probability of 0.1 of failing to reject H0 when the true difference between means is 0.35.
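The example's requirement can be turned into a sample size. This Python sketch uses the usual normal approximation for a two-sided, two-sample comparison of means with a known common SD σ. The notes do not give σ, so the sketch assumes σ = 1, which means the difference of 0.35 is measured in SD units. The function names are illustrative.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist().inv_cdf  # standard normal quantile function


def n_per_group(delta: float, sigma: float, alpha: float = 0.05, beta: float = 0.10) -> int:
    """Approximate n per group for a two-sided, two-sample test of means.

    Normal approximation with known common SD sigma:
    n = 2 * sigma**2 * (z_{1-alpha/2} + z_{1-beta})**2 / delta**2
    """
    z_sum = Z(1 - alpha / 2) + Z(1 - beta)
    return ceil(2 * (sigma * z_sum / delta) ** 2)


def power(n: int, delta: float, sigma: float, alpha: float = 0.05) -> float:
    """Approximate power (1 - beta), ignoring the negligible opposite tail."""
    se = sigma * (2 / n) ** 0.5  # SE of the difference between two means
    return NormalDist().cdf(delta / se - Z(1 - alpha / 2))


# Example from the notes: beta = 0.1 when the true difference is 0.35.
# ASSUMPTION: sigma = 1, i.e. the difference is in SD units.
n = n_per_group(delta=0.35, sigma=1.0)
print(f"n per group = {n}, power = {power(n, 0.35, 1.0):.3f}")
```

Under these assumptions, about 172 observations per group are needed. A larger σ would require proportionally more, because n scales with σ².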
Types of error
                                In reality, H0 is
Your test says H0 should be:    True                  False
  Accepted                      Correct conclusion    Type II error
  Rejected                      Type I error          Correct conclusion
Controlling error levels
α is controlled by setting the critical P-value
β is controlled by α, sample size, sample variance, and effect size
Tradeoff between α and β
 Need to balance the costs associated with Type I and Type II errors
Power is 1 - β
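The tradeoff shows up when the design is held fixed and α is varied. This Python sketch uses a normal approximation for a two-sided, two-sample test with a known common σ, which is an assumption. The sample size of 50 per group and the effect size of 0.35 SD are hypothetical, and the function name is illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def type_ii_rate(alpha: float, n: int, delta: float, sigma: float) -> float:
    """Approximate beta for a two-sided, two-sample z-test of means."""
    se = sigma * (2 / n) ** 0.5  # SE of the difference between two means
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta / se  # mean of the test statistic when the true difference is delta
    # Beta = probability the statistic stays inside the acceptance region.
    return STD_NORMAL.cdf(z_crit - shift) - STD_NORMAL.cdf(-z_crit - shift)


# Hypothetical design: 50 per group, true difference of 0.35 SD.
for alpha in (0.1, 0.05, 0.01, 0.001):
    beta = type_ii_rate(alpha, n=50, delta=0.35, sigma=1.0)
    print(f"alpha = {alpha:<6} beta = {beta:.3f}  power = {1 - beta:.3f}")
```

For the same design, tightening α raises β. The ways to lower both are a larger sample, less variable measurements, or a larger true effect.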