Download STP 226 - Arizona State University

Brief Lecture notes STP231 Instructor: Ela Jackiewicz
Chapter 7 Testing Hypotheses: summary
The Nature of Hypothesis Testing
A hypothesis is a statement about population parameters, such as a mean, a proportion, two means
or two proportions.
For example:
1. We may wonder if the mean age (μ) of people in juvenile detention in Arizona in 2008
was less than 16, the mean age in 2005.
2. Is the proportion of people who approve of the job our governor is doing (p) greater than
60%?
3. Is the mean age of cars driven in Arizona, μ1, different from the mean age of cars driven in
California, μ2?
A hypothesis test allows us to make a decision or judgment about the value(s) of a
parameter (or parameters).
A hypothesis test involves two hypotheses: the null hypothesis and the alternative
(research) hypothesis.
• Null hypothesis (H0): A hypothesis to be tested. For each of the questions above,
the null hypothesis will be, respectively:
ex1 H0: μ = 16
ex2 H0: p = 0.6
ex3 H0: μ1 = μ2
The null hypothesis will specify a value of the parameter of one population or will state that
two parameters from two populations are equal. Notice the = sign in each statement.
• Alternative hypothesis (Ha): A hypothesis to be considered as an alternative to the
null hypothesis.
- The choice of the alternative hypothesis should reflect the
purpose of the hypothesis test and depends on the question asked in the problem.
ex1 Ha: μ < 16 - left-tailed test
ex2 Ha: p > 0.6 - right-tailed test
ex3 Ha: μ1 ≠ μ2 - two-tailed test
The hypotheses in ex1 and ex2 are called directional and the one in ex3 nondirectional.
The logic of hypothesis testing
1. Take a random sample from one population (or two samples from 2 populations)
2. If the data do not provide enough evidence in favor of the alternative hypothesis,
do not reject the null hypothesis.
3. If the data provide enough evidence in favor of the alternative, reject the null
hypothesis.
Terms, Errors, and Hypotheses
• Test statistic: A value computed from the sample(s) and used as a basis
for deciding whether the null hypothesis should be rejected.
The following terminology is used for the rejection region method of testing hypotheses; this
method is not presented in our book.
• Rejection region: The set of values for the test statistic that leads to rejection of
the null hypothesis.
• Non-rejection region: The set of values for the test statistic that leads to non-rejection of the null hypothesis.
• Critical values: The values of the test statistic that separate the rejection and non-rejection regions. Critical values are part of the rejection region.
Type I Error & Type II Errors
                    H0 is True          H0 is False
Do not reject H0    Correct decision    Type II error
Reject H0           Type I error        Correct decision
• Type I error: Rejecting the null hypothesis when it is in fact true.
• Type II error: Not rejecting the null hypothesis when it is in fact false.
Probabilities of Type I and Type II Errors
• Significance level α: The probability of making a Type I error (rejecting a
true null hypothesis), always selected in advance of seeing the data.
• Power of a hypothesis test: Power = 1 - P(Type II error) = 1 - β =
the probability of rejecting a false null hypothesis.
Power near 0: the hypothesis test is not good at detecting a false null hypothesis.
Power near 1: the hypothesis test is extremely good at detecting a false null
hypothesis.
Relation between Type I and Type II error probabilities
For a fixed sample size, the smaller we specify the significance level, α, the larger
will be the probability, β, of not rejecting a false null hypothesis.
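The trade-off between α and β can be sketched numerically. The example below is only an illustration under assumptions that are not part of these notes: it uses a right-tailed z-test with known σ (simpler than the t-tests covered here), and the values mu0 = 16, mu_true = 17, sigma = 3 and n = 25 are hypothetical.

```python
from statistics import NormalDist

def type2_error(alpha, mu0, mu_true, sigma, n):
    """beta for a right-tailed z-test of H0: mu = mu0 when the true mean is mu_true."""
    se = sigma / n ** 0.5
    # Reject H0 when the sample mean exceeds this cutoff
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # beta = P(sample mean stays below the cutoff | true mean is mu_true)
    return NormalDist(mu_true, se).cdf(cutoff)

# Hypothetical setting: same sample size, three significance levels
betas = {a: type2_error(a, mu0=16, mu_true=17, sigma=3, n=25)
         for a in (0.10, 0.05, 0.01)}
for a, b in betas.items():
    print(f"alpha = {a:.2f}   beta = {b:.3f}   power = {1 - b:.3f}")
```

With the sample size held fixed, each decrease in α should print a larger β and a smaller power.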
Possible conclusions for a hypothesis test
Suppose a hypothesis test is conducted at a small significance level α.
1. If the null hypothesis is rejected, we conclude that there is evidence for the
alternative hypothesis and the results are statistically significant at the α level.
2. If the null hypothesis is not rejected, we conclude that the data do not provide
sufficient evidence to support the alternative hypothesis and the results are not statistically
significant at the α level.
P-value: To obtain the P-value (P) of a hypothesis test, we compute, assuming the null
hypothesis is true, the probability of observing a value of the test statistic as extreme as or
more extreme than the one observed. By extreme we mean far from what we would expect to
observe if the null hypothesis were true.
If the test is left-tailed, extreme means equal to or smaller than the observed test statistic;
if the test is right-tailed, extreme means equal to or larger than the observed test statistic;
if the test is two-tailed, extreme means either equal to or larger than the absolute value of the
observed test statistic, or equal to or smaller than the negative of its absolute value.
The P-value is also referred to as the observed significance level or probability value.
Two ways to decide if H0 should be rejected or not:
1. If the test statistic falls into the rejection region, H0 should be rejected (again, this
method is not presented in our book).
2. If p-value ≤ α, H0 should be rejected.
Guidelines for using the P-value to assess the evidence against H0
P-value             Evidence against H0
P > 0.10            Weak or none
0.05 < P ≤ 0.10     Moderate
0.01 < P ≤ 0.05     Strong
P ≤ 0.01            Very strong
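The guideline table can be written as a small lookup. This is a minimal sketch; the function name evidence_against_h0 is hypothetical and the cut-offs are exactly those in the table.

```python
def evidence_against_h0(p):
    """Map a P-value to the verbal guideline from the table above."""
    if not 0 <= p <= 1:
        raise ValueError("a P-value must lie between 0 and 1")
    if p > 0.10:
        return "weak or none"
    if p > 0.05:
        return "moderate"
    if p > 0.01:
        return "strong"
    return "very strong"

for p in (0.30, 0.08, 0.03, 0.004):
    print(p, "->", evidence_against_h0(p))
```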
Steps in hypothesis testing:
P-VALUE APPROACH
1. State H0 and Ha.
2. Decide on α.
3. Compute the test statistic.
4. Determine the P-value.
5. If P ≤ α, reject H0; otherwise, do not reject H0.
6. Interpret the result of the hypothesis test.
T-tests and Z-tests:
If we test a hypothesis about one population mean or two population means, we have a z-test
if the population standard deviation(s) σ are known, and a t-test if they are unknown and
replaced by the sample standard deviation(s) s.
We concentrate only on the t-tests, since population standard deviations are usually
unknown.
We have a t-test because the test statistic has exactly or approximately a t distribution if
H0 is true.
The type of test (z or t) and the type of alternative hypothesis determine the way we
compute p-values.
Computing the p-value for a t-test: if the observed test statistic is t_s, use the t-curve with
the appropriate degrees of freedom.
If Ha is
a) two-tailed (≠), compute the area of the two tails: t ≤ −|t_s| and t ≥ |t_s|
b) right-tailed (>), compute the area of the right tail: t ≥ t_s
c) left-tailed (<), compute the area of the left tail: t ≤ t_s
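The three cases above can be sketched in code. This is an illustration only: the helper t_right_tail is a hypothetical stand-in for the calculator's tcdf, built by numerically integrating the t density with Simpson's rule, and the values t_s = 2.0 and df = 10 are made up.

```python
import math

def t_right_tail(t, df, steps=4000):
    """Area under the t-curve with df degrees of freedom to the right of t."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    pdf = lambda x: math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))
    a, h = abs(t), abs(t) / steps
    # Simpson's rule for the area between 0 and |t| (steps must be even)
    area = pdf(0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h)
                                 for i in range(1, steps))
    area *= h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

def p_value(ts, df, tail):
    """tail is 'two', 'right' or 'left', matching cases a), b) and c)."""
    if tail == "two":
        return 2 * t_right_tail(abs(ts), df)
    if tail == "right":
        return t_right_tail(ts, df)
    if tail == "left":
        return t_right_tail(-ts, df)  # by symmetry, the area to the left of ts
    raise ValueError("tail must be 'two', 'right' or 'left'")

ts, df = 2.0, 10  # hypothetical observed statistic and degrees of freedom
for tail in ("two", "right", "left"):
    print(f"{tail:>5}-tailed p-value: {p_value(ts, df, tail):.4f}")
```

Note that for the same t_s the left-tail and right-tail areas add up to 1, which is why a directional test in the wrong direction gives a p-value above 0.50.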
Assumptions for all tests are: the samples are simple random samples of independent
observations, and the populations are normal or the samples are large (n ≥ 30). If two parameters
are compared, the two samples must also be independent of each other.
Tests for Two Population Means
H0: μ1 = μ2 vs Ha: μ1 ≠ μ2 or Ha: μ1 > μ2 or Ha: μ1 < μ2
(independent samples, populations normal or large samples)
Non-pooled t-test under the assumption that σ1 ≠ σ2
Standard error of ȳ1 − ȳ2:
\[ SE_{\bar{y}_1-\bar{y}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{SE_1^2 + SE_2^2} \]
Test statistic:
\[ t_s = \frac{\bar{y}_1-\bar{y}_2}{SE_{\bar{y}_1-\bar{y}_2}} \]
has approximately a t distribution (if the null hypothesis is true).
Degrees of freedom are estimated by software:
\[ df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1} + \dfrac{(s_2^2/n_2)^2}{n_2-1}} \]
rounded down to the nearest integer.
If not given, the liberal estimate df = n1 + n2 − 2 or the conservative estimate
df = min(n1 − 1, n2 − 1) is used.
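The standard error, test statistic and the three choices of degrees of freedom can be sketched as follows. The function name welch_summary and the summary statistics passed to it are hypothetical, chosen only to show the arithmetic.

```python
import math

def welch_summary(mean1, s1, n1, mean2, s2, n2):
    """Return (SE, t_s, df) for the non-pooled two-sample t-test."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2      # SE1^2 and SE2^2
    se = math.sqrt(v1 + v2)
    ts = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, ts, df

# Hypothetical summary statistics for two independent samples
n1, n2 = 20, 15
se, ts, df = welch_summary(50.0, 8.0, n1, 45.0, 12.0, n2)
print(f"SE = {se:.3f}, t_s = {ts:.3f}")
print(f"software df = {df:.2f}, rounded down to {math.floor(df)}")
print(f"liberal df = {n1 + n2 - 2}, conservative df = {min(n1 - 1, n2 - 1)}")
```

The software estimate always falls between the conservative and the liberal value, so the conservative choice gives the largest p-value of the three.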
The following example illustrates a nondirectional hypothesis test:
EX1. Is autism marked by different brain growth patterns in early life? Studies have linked
brain size in infants and toddlers to a number of future ailments, including autism. One
study looked at the brain size of 30 autistic boys and 12 nonautistic boys (control) who had
received an MRI scan as toddlers. The average brain volumes and standard deviations in
milliliters are given below. The two samples may be regarded as SRSs from two populations.
Is there a difference between the means? Test the appropriate hypothesis, using α = 0.05.
Group   Condition   n    ȳ        s      SE_ȳ    True mean
_____________________________________________________________________
1       Autistic    30   1297.6   88.4   16.14   μ1
2       Control     12   1179.3   70.7   20.4    μ2
(non-pooled t-test)
We test H0: μ1 = μ2 (no difference in mean brain volumes for the 2 populations) vs
Ha: μ1 ≠ μ2 (there is a difference in mean brain volumes for the 2 populations).
SE = 26, t_s = 4.55, df = 25.3, use 25.
P-value method: the P-value is the area under the t-curve with 25 df, left of -4.55 and right of
4.55.
From the tables: 4.55 > 3.725 = t_.0005, so (1/2)·pv < .0005 and pv < .001.
Using a calculator: 2*tcdf(4.55, 10^6, 25) = 1.2*10^-4 < .05, so we reject H0. There is evidence for the
alternative hypothesis: there is a difference between the mean brain volumes of the two
populations.
The Relation Between Hypothesis Tests and Confidence Intervals
If H0: μ1 = μ2 is not rejected against the two-tailed (nondirectional) alternative at a given α
level, then
the (1 − α)·100% CI for the difference between the two means will contain 0;
otherwise it will not contain 0. If the CI contains 0, there is no evidence that the means are
different.
In EX1: the 95% CI for μ1 − μ2 is (64.7, 171.9), clearly with no 0 inside.
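The EX1 interval can be reproduced from the summary statistics as a rough check. In this sketch the helpers t_right_tail and t_critical are hypothetical stand-ins for a t-table: the tail area comes from Simpson's rule and the critical value from bisection. The df of 25 is taken from the notes.

```python
import math

def t_right_tail(t, df, steps=4000):
    """Area under the t-curve with df degrees of freedom to the right of t."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    pdf = lambda x: math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))
    a, h = abs(t), abs(t) / steps
    area = pdf(0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h)
                                 for i in range(1, steps))
    area *= h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

def t_critical(upper_area, df):
    """Value with the given area to its right, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_right_tail(mid, df) > upper_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def ci_difference(mean1, s1, n1, mean2, s2, n2, df, level=0.95):
    """Non-pooled confidence interval for mu1 - mu2."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    margin = t_critical((1 - level) / 2, df) * se
    diff = mean1 - mean2
    return diff - margin, diff + margin

low, high = ci_difference(1297.6, 88.4, 30, 1179.3, 70.7, 12, df=25)
print(f"95% CI for mu1 - mu2: ({low:.1f}, {high:.1f})")
print("contains 0:", low <= 0 <= high)
```

An interval that excludes 0 agrees with rejecting H0 in the two-tailed test at α = 0.05.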
The previous example considered only a nondirectional hypothesis, where Ha: μ1 ≠ μ2.
Next we will examine tests with directional alternatives, Ha: μ1 > μ2 or
Ha: μ1 < μ2.
We use a directional alternative if we have reason to believe that the difference between
the means goes in a particular direction.
We add the following step to the test:
Check if the directionality is correct:
if we have Ha: μ1 > μ2, then we must have ȳ1 > ȳ2
if we have Ha: μ1 < μ2, then we must have ȳ1 < ȳ2
If the directionality is incorrect, then p-value > 0.50 and the null hypothesis is not rejected.
If the directionality is correct, we proceed with the test as in the case of a nondirectional
alternative, but compute the p-value as a left-tail area or a right-tail area, depending on the
alternative hypothesis.
The following example illustrates a directional hypothesis test:
EX2. Researchers want to know if a niacin supplement in a diet for young lambs is
effective in increasing weight (over a standard diet). Let μ1 = mean weight gain of all
lambs fed the standard diet plus the niacin supplement and μ2 = mean weight gain of all
lambs fed the standard diet only.
We test H0: μ1 = μ2 (niacin not effective) vs
Ha: μ1 > μ2 (niacin effective).
Let us use α = 0.05.
Let us consider the following two situations:
1) Suppose that ȳ1 = 10 lb and ȳ2 = 13 lb.
Since ȳ1 < ȳ2, the directionality is incorrect, so p-value > 0.50 and we do not reject the null
hypothesis. There is no evidence for the alternative hypothesis, that is, no evidence that niacin
is effective.
2) Suppose that ȳ1 = 15 lb and ȳ2 = 10 lb. In that case ȳ1 > ȳ2, so we have the
correct directionality and we proceed with the test. Suppose our data also give
SE(ȳ1 − ȳ2) = 2.2 lb and 9 degrees of freedom. Then t_s = (15 − 10)/2.2 = 2.27 and
p-value = tcdf(2.27, 10^6, 9) = 0.025 < .05, so the null hypothesis is rejected. We have evidence for the
alternative hypothesis: niacin is effective.
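Both situations in EX2 can be checked with a short sketch. The helper t_right_tail is a hypothetical stand-in for tcdf, using Simpson's rule. The notes give no SE or df for situation 1, so the code assumes the same SE = 2.2 lb and 9 df as in situation 2.

```python
import math

def t_right_tail(t, df, steps=4000):
    """Area under the t-curve with df degrees of freedom to the right of t."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    pdf = lambda x: math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))
    a, h = abs(t), abs(t) / steps
    area = pdf(0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h)
                                 for i in range(1, steps))
    area *= h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

def directional_test(mean1, mean2, se, df, alpha, alternative):
    """alternative is 'greater' (Ha: mu1 > mu2) or 'less' (Ha: mu1 < mu2)."""
    ts = (mean1 - mean2) / se
    if alternative == "greater":
        direction_ok, p = mean1 > mean2, t_right_tail(ts, df)
    else:
        direction_ok, p = mean1 < mean2, t_right_tail(-ts, df)
    return direction_ok, ts, p, p <= alpha

# Situation 1: SE and df are ASSUMED equal to those of situation 2
ok1, ts1, p1, reject1 = directional_test(10, 13, 2.2, 9, 0.05, "greater")
# Situation 2: values as given in the notes
ok2, ts2, p2, reject2 = directional_test(15, 10, 2.2, 9, 0.05, "greater")
for label, ok, ts, p, rej in (("1)", ok1, ts1, p1, reject1),
                              ("2)", ok2, ts2, p2, reject2)):
    print(f"{label} direction correct: {ok}, t_s = {ts:.2f}, "
          f"p-value = {p:.3f}, reject H0: {rej}")
```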
EX3. An experiment was conducted to see if wounding a tomato plant would
improve its defense against insects. Researchers grew larvae of the tobacco hornworm on
wounded and unwounded plants; the weight in mg after 7 days of growth was recorded.
A summary of the results is given below:
        Wounded (1)   Control (2)
n       16            18
ȳ       28.66         37.96
s       9.02          11.14
df = 31.8 (use 31)
We test H0: μ1 = μ2 (wounding not effective) vs Ha: μ1 < μ2 (wounding effective).
Let us use α = 0.01.
SE(ȳ1 − ȳ2) = 3.46 mg, t_s = (28.66 − 37.96)/3.46 = −2.69,
p-value = tcdf(-10^6, -2.69, 31) = .006 < .01, so we reject the null hypothesis; there is evidence for the alternative hypothesis at the
1% significance level. Yes, wounding appears to increase the plant's defense against insects.
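The EX3 numbers can be recomputed from the summary table as a check. This is a sketch only: t_right_tail is a hypothetical replacement for tcdf based on Simpson's rule, and the degrees of freedom are rounded down as described earlier in these notes.

```python
import math

def t_right_tail(t, df, steps=4000):
    """Area under the t-curve with df degrees of freedom to the right of t."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    pdf = lambda x: math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))
    a, h = abs(t), abs(t) / steps
    area = pdf(0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h)
                                 for i in range(1, steps))
    area *= h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

# Summary statistics from the table: wounded (1) and control (2)
n1, mean1, s1 = 16, 28.66, 9.02
n2, mean2, s2 = 18, 37.96, 11.14
alpha = 0.01

v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
se = math.sqrt(v1 + v2)
ts = (mean1 - mean2) / se
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Ha: mu1 < mu2, so the p-value is the area to the left of t_s
p = t_right_tail(-ts, math.floor(df))
print(f"SE = {se:.2f} mg, t_s = {ts:.2f}, df = {df:.1f}")
print(f"p-value = {p:.4f}, reject H0 at alpha = {alpha}: {p <= alpha}")
```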
A few remarks about statistical versus practical significance.
By declaring a difference between the means "statistically significant" we mean that the
difference is too large to be attributed to sampling error alone.
If a difference is "not statistically significant", it is small enough that it could plausibly be
caused by sampling error.
If we find our results statistically significant (the null hypothesis was rejected), this may be
only because we had very large samples, and the difference may have no practical significance.
For example: the mean test scores for two large schools are 73 and 74, practically identical,
but if our statistics are based on 2 samples of size 1000 and the sample SDs are 5 and 6, then
our test statistic is
\[ t_s = \frac{73-74}{\sqrt{5^2/1000 + 6^2/1000}} = -4.05 \]
and for a two-tailed test our p-value ≈ 0.0001 (df = 999), so we reject the null hypothesis. The results are statistically
significant, but have no practical significance.
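The effect of sample size on this example can be sketched as follows. The n = 1000 case uses the numbers from the notes; the n = 10 case is a hypothetical comparison that keeps the same means and SDs. The helper t_right_tail is an assumed stand-in for tcdf, and df = n − 1 follows the conservative rule.

```python
import math

def t_right_tail(t, df, steps=4000):
    """Area under the t-curve with df degrees of freedom to the right of t."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    pdf = lambda x: math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))
    a, h = abs(t), abs(t) / steps
    area = pdf(0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h)
                                 for i in range(1, steps))
    area *= h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

def two_tailed(mean1, s1, mean2, s2, n):
    """t_s and two-tailed p-value for two samples of equal size n (df = n - 1)."""
    ts = (mean1 - mean2) / math.sqrt(s1 ** 2 / n + s2 ** 2 / n)
    return ts, 2 * t_right_tail(abs(ts), n - 1)

ts_large, p_large = two_tailed(73, 5, 74, 6, n=1000)  # from the notes
ts_small, p_small = two_tailed(73, 5, 74, 6, n=10)    # hypothetical small samples
print(f"n = 1000: t_s = {ts_large:.2f}, p-value = {p_large:.5f}")
print(f"n = 10:   t_s = {ts_small:.2f}, p-value = {p_small:.3f}")
```

The same one-point difference is statistically significant with the large samples and not with the small ones, while its practical size is unchanged.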
Using CIs to assess importance.
Consider the following example:
A study records a blood pressure change (mmHg) for two types of medications. Suppose
we consider the difference between two population means important if it exceeds
9 mmHg.
For each 95% CI for μ1 − μ2, indicate if the difference is statistically significant and/or
important.
a) (4.5, 6.7)    Significant (zero not inside)    not important (all numbers < 9)
b) (9.7, 10.2)   Significant (zero not inside)    important (all numbers > 9)
c) (-4.6, 6.8)   Not significant (zero inside)    not important (all numbers < 9)
d) (6.8, 9.5)    Significant (zero not inside)    can't tell if important (some numbers < 9, some > 9)
e) (-6.8, 9.5)   Not significant (zero inside)    can't tell if important (some numbers < 9, some > 9)
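The classification rule used in a) through e) can be written out as a small function. This is a sketch with a hypothetical name, assess_ci. It follows the notes in comparing the interval endpoints directly to the 9 mmHg threshold, so it assumes the difference of interest is in the positive direction.

```python
def assess_ci(ci, threshold):
    """Return (significant, importance) for a CI for mu1 - mu2."""
    low, high = ci
    significant = not (low <= 0 <= high)   # zero outside the interval
    if low > threshold:
        importance = "important"           # all values above the threshold
    elif high < threshold:
        importance = "not important"       # all values below the threshold
    else:
        importance = "can't tell"          # interval straddles the threshold
    return significant, importance

intervals = {"a": (4.5, 6.7), "b": (9.7, 10.2), "c": (-4.6, 6.8),
             "d": (6.8, 9.5), "e": (-6.8, 9.5)}
for label, ci in intervals.items():
    sig, imp = assess_ci(ci, threshold=9)
    print(f"{label}) {ci}: {'significant' if sig else 'not significant'}, {imp}")
```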
Optional:
Pooled t-test under the assumption that σ1 = σ2
The pooled sample standard deviation estimates the common standard deviation:
\[ s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} \]
Test statistic:
\[ t_s = \frac{\bar{y}_1-\bar{y}_2}{s_p\sqrt{1/n_1 + 1/n_2}} \]
has a t distribution with df = n1 + n2 − 2
(if the null hypothesis is true).
EX4. The number of friends consulted for advice before purchasing a car or a
computer was examined in a certain consumer research paper. Two independent
samples of consumers were selected. The summary statistics consistent with the
information in the paper are given in the following table:
Type of purchase   Number of purchases (n)   Mean number of friends consulted (x̄)   Standard deviation (s)
(1) car            12                        3.65                                    0.42
(2) computer       15                        4.26                                    0.46
Is the mean number of friends consulted before a purchase greater for people
purchasing computers? Test the appropriate hypothesis at the 5% significance level.
Assume that the populations are normal and have equal standard deviations.
H0: μ1 = μ2 vs Ha: μ1 < μ2
\[ t_s = \frac{3.65-4.26}{0.443\sqrt{1/12 + 1/15}} = -3.56 \]
df = 25, pooled t-test
p-value: the area left of -3.56 under the t-curve with 25 df is computed as follows:
tcdf(-10^6, -3.56, 25) = 7.7*10^-4 < .05, so we reject H0. There is evidence that more friends are
consulted before a computer purchase.
p-value from tables:
3.56 > 2.787 = t_.005, so p-value < .005
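The pooled computation in EX4 can be checked from the summary table. As before, this is only a sketch: t_right_tail is a hypothetical substitute for tcdf that integrates the t density with Simpson's rule.

```python
import math

def t_right_tail(t, df, steps=4000):
    """Area under the t-curve with df degrees of freedom to the right of t."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    pdf = lambda x: math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))
    a, h = abs(t), abs(t) / steps
    area = pdf(0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h)
                                 for i in range(1, steps))
    area *= h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

# Summary statistics from the table: car (1) and computer (2)
n1, mean1, s1 = 12, 3.65, 0.42
n2, mean2, s2 = 15, 4.26, 0.46

df = n1 + n2 - 2
sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)  # pooled SD
ts = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# Ha: mu1 < mu2, so the p-value is the area to the left of t_s
p = t_right_tail(-ts, df)
print(f"s_p = {sp:.3f}, t_s = {ts:.2f}, df = {df}")
print(f"p-value = {p:.5f}, reject H0 at alpha = 0.05: {p <= 0.05}")
```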
Calculator (TI 83, 84)
All tests and CI-s are available in STATS TESTS option:
CI for 2 population means: 2-SampTInt (select POOLED YES or NO)
Tests for 2 populations means: 2-SampTTest (select POOLED YES or NO)
(we only covered non-pooled t-tests and CI-s, so select NO)