Independent-Samples t-test
10/8
Comparing Two Groups
• Often interested in whether two groups have the same mean
– Experimental vs. control conditions
– Comparing learning procedures, with vs. without drug, lesions, etc.
– Men vs. women, depressed vs. not
• Comparison of two separate populations
– Population A: sample A of size nA, mean MA estimates μA
– Population B: sample B of size nB, mean MB estimates μB
– Is μA = μB?
• Example: maze times
– Rats with hippocampus: Sample A = [43, 26, 35, 31, 28]
– Without hippocampus: Sample B = [37, 31, 27, 46, 33]
– MA = 32.6, MB = 34.8
– Is the difference reliable? Is μA < μB?
• Null hypothesis: μA = μB
– No assumption of what each is (e.g., μA = 10, μB = 10)
• Alternative hypothesis: μA ≠ μB
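To make the example concrete, here is a minimal R sketch (R is the language used for the pt() calls later in this lecture) that stores the two maze-time samples and computes their means; the variable names A and B are my own:

    # Maze-time samples from the example (rats with vs. without hippocampus)
    A <- c(43, 26, 35, 31, 28)   # with hippocampus
    B <- c(37, 31, 27, 46, 33)   # without hippocampus

    mean(A)            # MA = 32.6
    mean(B)            # MB = 34.8
    mean(A) - mean(B)  # -2.2; is this difference reliable?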
Finding a Test Statistic
• Goal: Define a test statistic for deciding μA = μB vs. μA ≠ μB
• Constraints (apply to all hypothesis testing):
– Must be function of data (both samples)
– Sampling distribution must be fully determined by H0
• Can only assume μA = μB
• Can't depend on μA or μB separately, or on σ
– Alternative hypothesis should predict extreme values
• Statistic should measure deviation from μA = μB
• so that if μA ≠ μB, we'll be able to reject H0
• Answer (preview):
– Based on MA – MB (just like M – μ0 for the one-sample t-test)
– t = (MA – MB) / "Standard Error"
– MA – MB has a Normal distribution
– Standard error has a (modified) chi-square distribution
– The ratio has a t distribution
Likelihood Function for MA – MB
• Central Limit Theorem
– MA ~ Normal(μ, σ/√nA)
– MB ~ Normal(μ, σ/√nB)
• Distribution of MA – MB
– Subtract the means: E(MA – MB) = E(MA) – E(MB) = μ – μ = 0
– Add the variances: SE² = σ²/nA + σ²/nB, so SE = σ·√(1/nA + 1/nB)
– MA – MB ~ Normal(0, σ·√(1/nA + 1/nB))
• Just divide by standard error?
– (MA – MB) / (σ·√(1/nA + 1/nB)) ~ Normal(0, 1)
– but we don't know σ
– Need to estimate σ from the data
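Before moving on, a quick simulation check of this sampling distribution in R; the values of μ, σ, nA, and nB below are made up for illustration:

    # Simulated sampling distribution of MA - MB under H0 (illustrative values)
    set.seed(1)
    mu <- 30; sigma <- 6; nA <- 5; nB <- 5

    diffs <- replicate(10000, mean(rnorm(nA, mu, sigma)) - mean(rnorm(nB, mu, sigma)))

    mean(diffs)                # near 0, matching E(MA - MB) = 0
    sd(diffs)                  # near the theoretical standard error below
    sigma * sqrt(1/nA + 1/nB)  # theoretical SE of MA - MB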

Estimating σ
• Already know best estimator for one sample
– s = √( Σ(X – M)² / (n – 1) )
• Could just use one sample or the other
– sA or sB
– Works, but not best use of the data
• Combining sA and sB
– Both come from averages of (X – M)²
– (X – M)² for each individual score is an estimate of σ²
– Average them all together:
– MSE = ( ΣA(X – MA)² + ΣB(X – MB)² ) / (nA + nB – 2)
• Degrees of freedom
– (nA – 1) + (nB – 1) = nA + nB – 2
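Here is how the pooled estimate looks in R for the maze-time samples, a sketch using my own variable names (SS_A, SS_B, MSE):

    # Pooled variance estimate from both samples
    A <- c(43, 26, 35, 31, 28)
    B <- c(37, 31, 27, 46, 33)

    SS_A <- sum((A - mean(A))^2)              # sum of squared deviations in A: 181.2
    SS_B <- sum((B - mean(B))^2)              # sum of squared deviations in B: 208.8
    df   <- (length(A) - 1) + (length(B) - 1) # nA + nB - 2 = 8

    MSE <- (SS_A + SS_B) / df                 # 48.75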
Independent-Samples t Statistic
• t = (MA – MB) / Standard Error
– Numerator: difference between sample means
– Denominator: typical difference expected by chance
• Standard Error = √( MSE · (1/nA + 1/nB) )
– MSE·(1/nA) + MSE·(1/nB): variance of MA – MB (variance from MA plus variance from MB)
– MSE: estimate of σ²
• MSE = ( ΣA(X – MA)² + ΣB(X – MB)² ) / (nA + nB – 2)
– Numerator: sum of squared deviations
– Denominator: degrees of freedom
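Putting the pieces together in R (a sketch; the variable names are mine):

    # Independent-samples t statistic computed from its parts
    A <- c(43, 26, 35, 31, 28)
    B <- c(37, 31, 27, 46, 33)
    nA <- length(A); nB <- length(B)

    MSE <- (sum((A - mean(A))^2) + sum((B - mean(B))^2)) / (nA + nB - 2)
    SE  <- sqrt(MSE * (1/nA + 1/nB))   # standard error of MA - MB
    t   <- (mean(A) - mean(B)) / SE    # about -0.498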
Steps of I.S. t-test
1. State clearly the two hypotheses
2. Determine null and alternative hypotheses
• H0: μA = μB
• H1: μA ≠ μB
3. Compute the test statistic t from the data
• t = (MA – MB) / √( MSE · (1/nA + 1/nB) )
4. Determine likelihood function for test statistic according to H0
• t distribution with nA + nB – 2 degrees of freedom
5. Get p-value
• p = P(t(df) ≥ t), with df = nA + nB – 2; in R: 1 - pt(t, df)
• or p = 2·P(t(df) ≥ |t|); in R: 2*(1 - pt(abs(t), df))
6. Choose alpha level
7a. p > α: Retain null hypothesis, μA = μB
7b. p < α: Reject null hypothesis, μA ≠ μB
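One way to collect steps 3–5 into a reusable R function; is_t_test is my own helper name, not a built-in:

    # Steps 3-5: test statistic, its df, and the two-tailed p-value
    is_t_test <- function(a, b) {
      nA <- length(a); nB <- length(b)
      df  <- nA + nB - 2
      MSE <- (sum((a - mean(a))^2) + sum((b - mean(b))^2)) / df
      t   <- (mean(a) - mean(b)) / sqrt(MSE * (1/nA + 1/nB))
      p   <- 2 * (1 - pt(abs(t), df))   # step 5, two-tailed
      c(t = t, df = df, p = p)
    }

Steps 6 and 7 (choosing α and retaining or rejecting H0) then come down to comparing the returned p against α.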
Example
– Rats with hippocampus: Sample A = [43, 26, 35, 31, 28]
– Without hippocampus: Sample B = [37, 31, 27, 46, 33]
– MA = 32.6, MB = 34.8, MA – MB = –2.2
– df = nA + nB – 2 = 5 + 5 – 2 = 8

Sample A:
X     X – MA   (X – MA)²
43    10.4     108.16
26    –6.6     43.56
35    2.4      5.76
31    –1.6     2.56
28    –4.6     21.16
ΣA(X – MA)² = 181.20

Sample B:
X     X – MB   (X – MB)²
37    2.2      4.84
31    –3.8     14.44
27    –7.8     60.84
46    11.2     125.44
33    –1.8     3.24
ΣB(X – MB)² = 208.80

MSE = ( ΣA(X – MA)² + ΣB(X – MB)² ) / df = (181.2 + 208.8) / 8 = 48.75

t = (MA – MB) / √( MSE · (1/nA + 1/nB) ) = –2.2 / √( 48.75 · (1/5 + 1/5) ) = –.498

p = 2·P(t(8) ≥ .498) ≈ .63 (checked in the R sketch below)
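The whole example can be checked in R; t.test() with var.equal = TRUE is base R's pooled-variance independent-samples test and should reproduce the same t, df, and p:

    # Reproducing the example by hand and with t.test()
    A <- c(43, 26, 35, 31, 28)
    B <- c(37, 31, 27, 46, 33)

    MSE <- (sum((A - mean(A))^2) + sum((B - mean(B))^2)) / 8  # (181.2 + 208.8) / 8 = 48.75
    t   <- (mean(A) - mean(B)) / sqrt(MSE * (1/5 + 1/5))      # -0.498
    p   <- 2 * (1 - pt(abs(t), 8))                            # about 0.63

    t.test(A, B, var.equal = TRUE)   # same t, df = 8, and p-value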
Homogeneity of Variance
 
~ Normalm, 
MA ~ Normal m,
MB
s
nA
s

M A  M B ~ Normal 0, s
1
nA

1
nB

nB


• t-test only works if s
A = sB
– Variance is homogenous
• Not assumption of H0, but of whole procedure
– H0: mA = mB & sA = sB
– H1: mA ≠ mB & sA = sB
• If variance is heterogeneous
– Standard procedure doesn’t work
– Trick for estimating standard error and reducing degrees of freedom
• What to remember
– Independent-samples t-test assumes homogenous variance
– If not true, you have to use alternative formulas for SE and df
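In base R, the "trick" for heterogeneous variance is the Welch correction, which t.test() applies by default; var.equal = TRUE requests the pooled test described in this lecture:

    # Pooled (homogeneous-variance) test vs. Welch-corrected test
    A <- c(43, 26, 35, 31, 28)
    B <- c(37, 31, 27, 46, 33)

    t.test(A, B, var.equal = TRUE)   # pooled SE, df = nA + nB - 2 = 8
    t.test(A, B)                     # Welch: adjusted SE and (usually fractional) df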
Mean Squared Error
• General recipe: MSE = Σ(X – X̂)² / df
• Population
– Choosing X̂ = μ gives the population variance: σ² = Σ(X – μ)² / N
• Sample
– Choosing X̂ = M gives the sample variance, an estimate of the population variance: s² = Σ(X – M)² / (n – 1)
• I.S. t-test
– X̂ = MA for sample A, X̂ = MB for sample B
– MSE = ( ΣA(X – MA)² + ΣB(X – MB)² ) / df
– Gives an estimate of the population variance
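A short R illustration of the same recipe with two choices of X̂, using the maze-time samples (a sketch):

    # Same sum-of-squares recipe, different X-hat
    A <- c(43, 26, 35, 31, 28)
    B <- c(37, 31, 27, 46, 33)

    sum((A - mean(A))^2) / (length(A) - 1)   # X-hat = M: sample variance, equals var(A)

    df <- length(A) + length(B) - 2
    (sum((A - mean(A))^2) + sum((B - mean(B))^2)) / df   # X-hat = MA or MB: pooled MSE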
Degrees of Freedom
• Applies to any sum-of-squares type formula
– Σ(X – M)²
– Σ(X – X̂)²
– ΣA(XA – MA)² + ΣB(XB – MB)²
• Tells how many numbers are really being added
– Example: X = [3, 7], M = 5

X     X – M   (X – M)²
3     –2      4
7     2       4

– n = 2, but only one free number: the deviations sum to 0
– In general: one number is determined by the rest
• Every statistic in the formula that's based on X removes 1 df
– M, MA, MB
– Fancy algebra to rewrite the formula in terms of only X results in fewer summands
• I will always tell you how to find df for each formula
• To get an average, divide by df
– s² = Σ(X – M)² / (n – 1)
– MSE = ( ΣA(XA – MA)² + ΣB(XB – MB)² ) / (nA + nB – 2)
• Distribution of a statistic depends on its df
– χ², t, F
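A quick R check of the X = [3, 7] example: the deviations from M always sum to 0, so only n – 1 of them are free:

    # Degrees of freedom with two scores
    X <- c(3, 7)
    d <- X - mean(X)             # -2, 2
    sum(d)                       # exactly 0: the last deviation is determined by the rest
    sum(d^2) / (length(X) - 1)   # divide by df = n - 1 = 1, giving s^2 = 8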