Transcript
```Hypothesis testing
Say not, "I have found the truth," but
rather, "I have found a truth."
Kahlil Gibran, "The Prophet"
What is a hypothesis?
A statement about a population developed
for the purpose of testing
•Population is so large that it is not
feasible to study all the objects
•Alternative to measuring the entire
population is to take a sample from the
population
•Then we can test a statement to
determine whether the sample does or
does not support the statement
concerning the population
Examples:
•Eighty percent of those who play the state
lotteries regularly never win more than 100€ in
any one play
•The mean starting salary for graduates of four-year business schools is 3200€ per month
•Thirty-five percent of retirees in the upper
Midwest sell their home and move to a warm
climate within 1 year of their retirement
What is hypothesis testing?
A procedure based on sample evidence and
probability theory to determine whether the
hypothesis is a reasonable statement
about a population parameter, e.g. the mean
•We can also verify assumptions about
the shape of a statistical distribution
Example:
Hypothesis: Mean monthly commission of sales
associates in retail electronics stores is in fact
2000€
•Select a sample from the population to test the
assumption μ=2000
•A sample mean of 1000€ would certainly cause
rejection of the hypothesis
•What about a sample mean of 1995€?
Is the difference of 5€:
•a sampling error?
•or a statistically significant difference?
Five-step procedure for testing
a hypothesis
Step 1
• State null and alternate hypotheses
Step 2
• Select a level of significance
Step 3
• Identify the test statistic
Step 4
• Formulate a decision rule
Step 5
• Take a sample, arrive at a decision:
do not reject H0, or reject H0 and accept H1
Step 1: State the Null Hypothesis (H0)
Null hypothesis: A statement about the value of
a population parameter
•The hypothesis being tested
•Designated H0 and read "H sub zero"
•H stands for hypothesis
•Subscript zero implies "no difference"
•Often begins by stating: "There is no
significant difference between ..."
•Will always contain the equal sign,
e.g. H0: μ = 2000
Step 1: Alternate Hypothesis (H1)
Alternate hypothesis: A statement that is
accepted if the sample data provide sufficient
evidence that the null hypothesis is false
•It is written H1 and is read "H sub one"
•Often called the research hypothesis
•Never contains the equal sign
•e.g. H1: μ ≠ 2000
•We turn to the alternate hypothesis only if the
data suggest the null hypothesis is untrue
Step 2: Level of significance
The probability of rejecting the null
hypothesis when it is true
•Designated α (alpha)
•Sometimes called the level of risk
•A decision is made to use:
•the 0.05 level (5% level), traditionally
selected for consumer research projects
•the 0.01 level, for quality assurance
•the 0.1 level, for political polling
•or any other level between 0 and 1
Possibility of two types of errors:
Type I error: Rejecting the null hypothesis, H0,
when it is true
•Probability of committing a type I error is α
•1 − α: probability of accepting H0 when it is
true (accepting a correct hypothesis)
Type II error: Accepting the null hypothesis
when it is false
•Probability of committing a type II error is β
•1 − β: power of the test
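Type I and type II errors
[Figure: densities f(H0) and f(H1) with the regions
α, 1 − α, β and 1 − β marked]
α = P(H1 | H0): probability of accepting H1 when H0 is true
β = P(H0 | H1): probability of accepting H0 when H1 is true
1 − α: probability of accepting H0 when H0 is true
1 − β: probability of accepting H1 when H1 is true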
Type I and type II errors
•Type I error α and type II error β are closely
connected
•Reducing one type of error increases the other type of
error
•A compromise is necessary
•For this reason α = 0.05 is usually selected

Null hypothesis   Researcher accepts H0   Researcher rejects H0
H0 is true        Correct decision        Type I error
H0 is false       Type II error           Correct decision
Step 3: Select the test statistic
A value determined from the sample information,
used to determine whether to reject the null
hypothesis
For example: in hypothesis testing for the mean,
when σ is known or the sample size is large, the
test statistic is computed by:
u = (x̄ − μ0) / (σ / √n)
The formula depends on the test used
Step 4: Formulate the decision rule
Decision rule – Statement of the specific
conditions under which the null hypothesis is
rejected and the conditions under which it is not
rejected
Critical value – the dividing point between the
region where the null hypothesis is rejected and
the region where it is not rejected
=> Compute the test statistic, compare it to the
critical value and decide whether to reject or
not to reject the null hypothesis.
Two-tailed test
No direction is specified in the alternate hypothesis
H0: μ = μ0
H1: μ ≠ μ0
If |u_cal| ≤ u_{1−α/2} => do not reject H0
If |u_cal| > u_{1−α/2} => reject H0
One-tailed test
Alternate hypothesis states direction
e.g:
Null hypothesis includes equal sign
One way to determine the location of the
rejection region is to look at the direction in which
the inequality sign in the alternate hypothesis is
pointing (< or >). In this case < (to the left)
Notice
•The critical values for a one-tailed test are
different from a two-tailed test at the same
significance level.
•In a two-tailed test we split the significance level in
half and put half in the lower tail and half in the upper
tail.
•In a one-tailed test we put all the rejection region
in one tail
Differences between one- and two-tailed
tests
p-value in hypothesis testing
The probability of observing a sample value as
extreme as, or more extreme than, the value
observed, given that the null hypothesis is true.
•If p-value < significance level => H0 is rejected
•If p-value > significance level => H0 is not rejected
•Also gives additional insight into the strength of
the decision
•A very small p-value, e.g. 0.0001, indicates that the observed
sample would be very unlikely if H0 were true (strong evidence against H0)
•On the other hand, a p-value of 0.2033 means that H0 is
not rejected: the sample gives little evidence that H0 is false
Testing for a population mean
Let X have a normally distributed
population N(μ, σ²)
H0: μ = μ0
H1: μ ≠ μ0
est μ = x̄, and x̄ ~ N(μ, σ²/n)
a) Variance of the population is
known, then the test statistic is:
u = (x̄ − μ0) / (σ / √n), with u ~ N(0, 1)
if |u| ≤ u_{1−α/2} => do not reject H0
if |u| > u_{1−α/2} => reject H0
b) Variance of the population is unknown,
est σ² = s1², large sample (n > 30)
N(0, 1) can be used:
u = (x̄ − μ0) / (s1 / √n)
if |u| ≤ u_{1−α/2} => do not reject H0
if |u| > u_{1−α/2} => reject H0
c) Variance of the population is unknown,
est σ² = s1², small sample (n ≤ 30)
Test statistic:
t = (x̄ − μ0) / (s1 / √n)
Critical value: t_α(n − 1)
Two-sample test of hypothesis about
the mean, independent samples
Let variable X1 be normally distributed: N(μ1, σ1²)
Let variable X2 be normally distributed: N(μ2, σ2²)
Assume the means μ1 and μ2 are equal
=> H0: μ1 = μ2
H1: μ1 ≠ μ2 (two-tailed test)
est μ1 = x̄1, with x̄1 ~ N(μ1, σ1²/n1)
est μ2 = x̄2, with x̄2 ~ N(μ2, σ2²/n2)
a) Variances of the populations σ1², σ2² are known,
then
(x̄1 − x̄2) ~ N(μ1 − μ2; σ1²/n1 + σ2²/n2)
Test statistic:
u = [(x̄1 − x̄2) − (μ1 − μ2)] / √[(n2·σ1² + n1·σ2²) / (n1·n2)]
  = (x̄1 − x̄2) / √[(n2·σ1² + n1·σ2²) / (n1·n2)]
(under H0 the difference μ1 − μ2 equals 0)
b) Variances of the populations σ1², σ2² are
unknown and both samples are large: n1 > 30,
n2 > 30
We can use the same test statistic as in a).
The variances of the populations are replaced by
their point estimates:
est σ1² = s_{11}²
est σ2² = s_{12}²
c) Variances of the populations are unknown, at least
one sample is small (n1 ≤ 30 or n2 ≤ 30)
=> If we can assume equality of variances σ1² = σ2² = σ²,
then we can use the t-test statistic with Student's t distribution:
t = (x̄1 − x̄2) / √{[(n1 − 1)·s_{11}² + (n2 − 1)·s_{12}²] / (n1 + n2 − 2)} · √[n1·n2 / (n1 + n2)]
Compared with the critical value t_α for (n1 + n2 − 2) degrees
of freedom
d) Variances of the populations are unknown, at least
one sample is small (n1 ≤ 30 or n2 ≤ 30), and
we cannot assume equality of variances (σ1² ≠ σ2²)
(verified by the F test)
=> We can use the Behrens-Fisher test for unequal variances
Two-sample tests of hypothesis:
Dependent samples
Samples are dependent, or related
Two types of dependent samples:
1. Those characterized by a measurement, an
intervention of some type, and then another
measurement,
2. Matching or pairing of observations – paired
samples
We make several measurements on the same statistical
units and get:
x11, x12, …, x1j, …, x1n
x21, x22, …, x2j, …, x2n
In x_ij:
i = 1, 2 is the index distinguishing the set of measurements in time
j = 1, 2, …, n is the index of measurement order
We can calculate the difference for each pair:
d_j = x1j − x2j
est μ_d = d̄
est σ_d² = s_d² = [1 / (n − 1)] · Σ_{j=1}^{n} (d_j − d̄)²
H0: μ1 = μ2 or H0: μ_d = 0
Against the alternate hypothesis
H1: μ_d ≠ 0
The test statistic has Student's t distribution with (n − 1)
degrees of freedom:
t = d̄ / (s_d / √n) = d̄ / √{Σ_{j=1}^{n} (d_j − d̄)² / [n·(n − 1)]}
What will be the possible results?
Hypothesis testing of variance
A) Test of equality of a variance with a constant
H0: σ² = σ0², est σ² = s1²
H1: σ² ≠ σ0²
Test statistic:
χ² = (n − 1)·s1² / σ0²
χ² distribution with (n − 1) degrees of freedom
[Figure: χ² density with rejection regions in both tails]
χ² < χ²_{1−α/2}: rejection region
χ²_{1−α/2} ≤ χ² ≤ χ²_{α/2}: do not reject H0
χ² > χ²_{α/2}: rejection region
B) Test for equality of variances in two samples
H0: σ1² = σ2², est σ1² = s_{11}², est σ2² = s_{12}²
H1: σ1² > σ2², one-tailed test
Test statistic:
F = s_{11}² / s_{12}²
Fisher's F distribution
with degrees of freedom:
ν1 = n1 − 1, ν2 = n2 − 1
Note: the higher variance goes in the numerator => F > 1
F < F_α(ν1, ν2) => do not reject H0, the variances of the two populations can
be considered equal
F ≥ F_α(ν1, ν2) => H0 is rejected, the variance of the first population
(numerator) is significantly greater
References:
•Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc.,
Chapters 10 and 11 => Recommended reading (do as I did ;-)
•Slovak lectures by prof. Ing. Zlata Sojková, CSc.
•Other recommended study materials:
http://moodle.uniag.sk/fem/course/view.php?id=211
=> Moodle course of statistics
That's all folks
Don't worry, be happy ;-)
```
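The five-step procedure for a population mean with known σ can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `one_sample_z_test` is made up here, and the values σ = 50 and n = 100 are assumptions added to the commission example (H0: μ = 2000, sample mean 1995€), which gives neither.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed test of H0: mu = mu0 when sigma is known (or n is large)."""
    # Step 3: test statistic u = (x_bar - mu0) / (sigma / sqrt(n))
    u = (sample_mean - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()  # N(0, 1)
    # Step 4: critical value u_{1-alpha/2} of the standard normal distribution
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Two-sided p-value: probability of a value at least as extreme as |u|
    p_value = 2 * (1 - std_normal.cdf(abs(u)))
    # Step 5: reject H0 when |u| exceeds the critical value
    reject = abs(u) > critical
    return u, critical, p_value, reject


# Assumed inputs: sigma = 50 and n = 100 are hypothetical, not from the slides
u, critical, p_value, reject = one_sample_z_test(1995, 2000, sigma=50, n=100)
print(f"u = {u:.3f}, critical = {critical:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```

Under these assumed inputs the 5€ difference is within sampling error and H0 is not rejected, while a sample mean of 1000€ with the same σ and n falls far into the rejection region.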
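The two-sample statistics for small samples (pooled-variance t for independent samples, and the paired t for dependent samples) can be computed directly from the formulas in the slides. The sketch below is illustrative only: the function names and the data are hypothetical. The Python standard library has no t distribution, so the critical value t_α still has to come from a table.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(x1, x2):
    """t statistic for independent samples, assuming equal population variances."""
    n1, n2 = len(x1), len(x2)
    # statistics.variance is the unbiased estimate (divides by n - 1)
    s1_sq, s2_sq = variance(x1), variance(x2)
    pooled = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t = (mean(x1) - mean(x2)) / math.sqrt(pooled) * math.sqrt(n1 * n2 / (n1 + n2))
    return t, n1 + n2 - 2  # statistic and degrees of freedom


def paired_t_statistic(x1, x2):
    """t statistic for dependent (paired) samples, based on differences d_j."""
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    t = mean(d) / math.sqrt(variance(d) / n)
    return t, n - 1  # statistic and degrees of freedom


# Hypothetical measurements on five units, before and after an intervention
before = [10, 12, 9, 11, 13]
after = [8, 11, 9, 9, 10]

t_ind, df_ind = pooled_t_statistic(before, after)
t_pair, df_pair = paired_t_statistic(before, after)
print(f"independent: t = {t_ind:.3f} with {df_ind} degrees of freedom")
print(f"paired:      t = {t_pair:.3f} with {df_pair} degrees of freedom")
```

The same numbers are fed to both functions purely to contrast the two formulas; in practice the study design decides which test applies. Each statistic is then compared with the tabulated critical value for its degrees of freedom.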
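The two variance tests reduce to simple ratios, sketched below under stated assumptions. The function names, the data and the hypothesised variance σ0² = 2 are all hypothetical, and the critical values of the χ² and F distributions are not computed here (the standard library does not provide them), so they would be read from tables.

```python
from statistics import variance


def chi_square_variance_statistic(sample, sigma0_sq):
    """chi^2 = (n - 1) * s^2 / sigma0^2 for H0: sigma^2 = sigma0^2."""
    n = len(sample)
    return (n - 1) * variance(sample) / sigma0_sq, n - 1


def f_ratio_statistic(x1, x2):
    """F = larger sample variance / smaller sample variance, so F >= 1."""
    v1, v2 = variance(x1), variance(x2)
    n1, n2 = len(x1), len(x2)
    if v1 >= v2:
        return v1 / v2, n1 - 1, n2 - 1
    # Swap the samples so the higher variance is in the numerator
    return v2 / v1, n2 - 1, n1 - 1


# Hypothetical samples
sample_a = [10, 12, 9, 11, 13]
sample_b = [8, 11, 9, 9, 10]

chi2, df = chi_square_variance_statistic(sample_a, sigma0_sq=2.0)
f, df_num, df_den = f_ratio_statistic(sample_a, sample_b)
print(f"chi^2 = {chi2:.3f} with {df} degrees of freedom")
print(f"F = {f:.3f} with ({df_num}, {df_den}) degrees of freedom")
```

Returning the degrees of freedom alongside each statistic keeps the table lookup unambiguous, especially when `f_ratio_statistic` swaps the samples.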