Independent-Samples t-test
10/8

Comparing Two Groups
• Often interested in whether two groups have the same mean
  – Experimental vs. control conditions
  – Comparing learning procedures, with vs. without drug, lesions, etc.
  – Men vs. women, depressed vs. not
• Comparison of two separate populations
  – Population A, sample A of size $n_A$, mean $M_A$ estimates $\mu_A$
  – Population B, sample B of size $n_B$, mean $M_B$ estimates $\mu_B$
  – $\mu_A = \mu_B$?
• Example: maze times
  – Rats with hippocampus: Sample A = [43, 26, 35, 31, 28]
  – Without hippocampus: Sample B = [37, 31, 27, 46, 33]
  – $M_A = 32.6$, $M_B = 34.8$
  – Is the difference reliable? $\mu_A < \mu_B$?
• Null hypothesis: $\mu_A = \mu_B$
  – No assumptions about what each is (e.g., $\mu_A = 10$, $\mu_B = 10$)
• Alternative hypothesis: $\mu_A \neq \mu_B$

Finding a Test Statistic
• Goal: Define a test statistic for deciding $\mu_A = \mu_B$ vs. $\mu_A \neq \mu_B$
• Constraints (apply to all hypothesis testing):
  – Must be a function of the data (both samples)
  – Sampling distribution must be fully determined by $H_0$
    • Can only assume $\mu_A = \mu_B$
    • Can't depend on $\mu_A$ or $\mu_B$ separately, or on $\sigma$
  – Alternative hypothesis should predict extreme values
    • Statistic should measure deviation from $\mu_A = \mu_B$
    • so that if $\mu_A \neq \mu_B$, we'll be able to reject $H_0$
• Answer (preview):
  – Based on $M_A - M_B$ (just like $M - \mu_0$ for the one-sample t-test)
  – $t = \dfrac{M_A - M_B}{\text{Standard Error}}$
  – $M_A - M_B$ has a Normal distribution
  – Standard error has a (modified) chi-square distribution
  – Ratio has a t distribution

Likelihood Function for $M_A - M_B$
• Central Limit Theorem
  $$M_A \sim \text{Normal}\left(\mu, \frac{\sigma}{\sqrt{n_A}}\right), \qquad M_B \sim \text{Normal}\left(\mu, \frac{\sigma}{\sqrt{n_B}}\right)$$
• Distribution of $M_A - M_B$
  – Subtract the means: $E(M_A - M_B) = E(M_A) - E(M_B) = \mu - \mu = 0$
  – Add the variances: $\dfrac{\sigma^2}{n_A} + \dfrac{\sigma^2}{n_B}$, so $SE = \sigma\sqrt{\dfrac{1}{n_A} + \dfrac{1}{n_B}}$
  – $M_A - M_B \sim \text{Normal}\left(0, \sigma\sqrt{\dfrac{1}{n_A} + \dfrac{1}{n_B}}\right)$
• Just divide by the standard error?
  – $\dfrac{M_A - M_B}{\sigma\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}} \sim \text{Normal}(0, 1)$
  – but we don't know $\sigma$
  – Need to estimate it from the data

Estimating $\sigma$
• Already know the best estimator for one sample
  $$s = \sqrt{\frac{\sum (X - M)^2}{n - 1}}$$
• Could just use one sample or the other
  – $s_A$ or $s_B$
  – Works, but not the best use of the data
• Combining $s_A$ and $s_B$
  – Both come from averages of $(X - M)^2$
  – $(X - M)^2$ for each individual score is an estimate of $\sigma^2$
  – Average them all together:
    $$\frac{\sum_A (X_A - M_A)^2 + \sum_B (X_B - M_B)^2}{n_A + n_B - 2}$$
• Degrees of freedom
  – $(n_A - 1) + (n_B - 1) = n_A + n_B - 2$

Independent-Samples t Statistic
• $t = \dfrac{M_A - M_B}{\text{Standard Error}}$
  – Numerator: difference between sample means
  – Denominator: typical difference expected by chance
• $\text{Standard Error} = \sqrt{MSE\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}$
  – The quantity under the root is the variance of $M_A - M_B$
  – $MSE/n_A$ is the variance from $M_A$; $MSE/n_B$ is the variance from $M_B$
• $MSE = \dfrac{\sum_A (X_A - M_A)^2 + \sum_B (X_B - M_B)^2}{n_A + n_B - 2}$
  – Estimate of $\sigma^2$
  – Numerator: sum of squared deviations
  – Denominator: degrees of freedom

Steps of I.S. t-test
1. State clearly the two hypotheses
2. Determine null and alternative hypotheses
  • $H_0$: $\mu_A = \mu_B$
  • $H_1$: $\mu_A \neq \mu_B$
3. Compute the test statistic t from the data
  • $t = \dfrac{M_A - M_B}{\sqrt{MSE\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}$
4. Determine likelihood function for test statistic according to $H_0$
  • t distribution with $n_A + n_B - 2$ degrees of freedom
5. Get p-value
  • One-tailed: $p = P(t_{n_A + n_B - 2} > t)$, i.e. 1-pt(t,df)
  • or two-tailed: $p = 2\,P(t_{n_A + n_B - 2} > |t|)$, i.e. 2*(1-pt(|t|,df))
6. Choose alpha level
7a. $p > \alpha$: Retain null hypothesis, $\mu_A = \mu_B$
7b. $p < \alpha$: Reject null hypothesis, $\mu_A \neq \mu_B$

Example
– Rats with hippocampus: Sample A = [43, 26, 35, 31, 28]
– Without hippocampus: Sample B = [37, 31, 27, 46, 33]
– $M_A = 32.6$, $M_B = 34.8$, $M_A - M_B = -2.2$
– $df = n_A + n_B - 2 = 5 + 5 - 2 = 8$

Sample A
  X     X – MA    (X – MA)²
  43     10.4      108.16
  26     -6.6       43.56
  35      2.4        5.76
  31     -1.6        2.56
  28     -4.6       21.16
  Sum of (X – MA)²: 181.20

Sample B
  X     X – MB    (X – MB)²
  37      2.2        4.84
  31     -3.8       14.44
  27     -7.8       60.84
  46     11.2      125.44
  33     -1.8        3.24
  Sum of (X – MB)²: 208.80

$$MSE = \frac{\sum_A (X_A - M_A)^2 + \sum_B (X_B - M_B)^2}{df} = \frac{181.2 + 208.8}{8} = 48.75$$

$$t_8 = \frac{M_A - M_B}{\sqrt{MSE\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}} = \frac{-2.2}{\sqrt{48.75\left(\frac{1}{5} + \frac{1}{5}\right)}} = -.498$$

– $p = .32$ (one-tailed), $p = .63$ (two-tailed)

Homogeneity of Variance
$$M_A \sim \text{Normal}\left(\mu, \frac{\sigma}{\sqrt{n_A}}\right), \qquad M_B \sim \text{Normal}\left(\mu, \frac{\sigma}{\sqrt{n_B}}\right), \qquad M_A - M_B \sim \text{Normal}\left(0, \sigma\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}\right)$$
• t-test only works if $\sigma_A = \sigma_B$
  – Variance is homogeneous
• Not an assumption of $H_0$, but of the whole procedure
  – $H_0$: $\mu_A = \mu_B$ & $\sigma_A = \sigma_B$
  – $H_1$: $\mu_A \neq \mu_B$ & $\sigma_A = \sigma_B$
• If variance is heterogeneous
  – Standard procedure doesn't work
  – Trick for estimating standard error and reducing degrees of freedom
• What to remember
  – Independent-samples t-test assumes homogeneous variance
  – If not true, you have to use alternative formulas for SE and df

Mean Squared Error
$$MSE = \frac{\sum (X - \hat{X})^2}{df}$$
• Population
  – Choosing $\hat{X} = \mu$ gives the population variance: $\dfrac{\sum (X - \mu)^2}{N}$
• Sample (sample variance)
  – $\hat{X} = M$
  – Gives estimate of population variance
• I.S. t-test (MSE)
  – $\hat{X} = M_A$ for sample A, $\hat{X} = M_B$ for sample B
  – $MSE = \dfrac{\sum_A (X - M_A)^2 + \sum_B (X - M_B)^2}{df}$
  – Gives estimate of population variance

Degrees of Freedom
• Applies to any sum-of-squares type formula
  – $\sum (X - M)^2$, $\sum (X - \hat{X})^2$, $\sum_A (X_A - M_A)^2 + \sum_B (X_B - M_B)^2$
• Tells how many numbers are really being added
  – n = 2: only one number

    X    X – M    (X – M)²
    3     -2         4
    7      2         4

  – In general: one number is determined by the rest
• Every statistic in the formula that's based on X removes 1 df
  – $M$, $M_A$, $M_B$
  – Fancy algebra to rewrite the formula in terms of only X results in fewer summands
• I will always tell you how to find df for each formula
• To get the average, divide by df
  – $s^2 = \dfrac{\sum (X - M)^2}{n - 1}$
  – $MSE = \dfrac{\sum_A (X_A - M_A)^2 + \sum_B (X_B - M_B)^2}{n_A + n_B - 2}$
• Distribution of a statistic depends on its df
  – $\chi^2$, t, F
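
As a check on the maze-time example, the steps of the independent-samples t-test can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original slides: the function names `pooled_t`, `t_pdf`, and `two_tailed_p` are made up for this example, and the p-value is obtained by numerically integrating the t density with Simpson's rule rather than by calling `pt` as the slides do. It assumes homogeneous variance, as the procedure requires.

```python
import math

def pooled_t(sample_a, sample_b):
    """Independent-samples t statistic using the pooled variance estimate (MSE)."""
    n_a, n_b = len(sample_a), len(sample_b)
    m_a = sum(sample_a) / n_a
    m_b = sum(sample_b) / n_b
    # Sums of squared deviations from each sample's own mean
    ss_a = sum((x - m_a) ** 2 for x in sample_a)
    ss_b = sum((x - m_b) ** 2 for x in sample_b)
    df = n_a + n_b - 2                 # one df lost for each estimated mean
    mse = (ss_a + ss_b) / df           # pooled estimate of sigma^2
    se = math.sqrt(mse * (1 / n_a + 1 / n_b))
    return (m_a - m_b) / se, df, mse

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the area under the t density on [-|t|, |t|].

    The area on [0, |t|] is approximated with Simpson's rule (steps must be even)
    and doubled, because the density is symmetric about zero.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (h / 3) * total

# Data from the example: maze times with and without hippocampus
sample_a = [43, 26, 35, 31, 28]
sample_b = [37, 31, 27, 46, 33]

t, df, mse = pooled_t(sample_a, sample_b)
p = two_tailed_p(t, df)
print(f"MSE = {mse:.2f}, t({df}) = {t:.3f}, two-tailed p = {p:.2f}")
```

The printed MSE, df, and t should match the hand computation in the example. In practice a statistics library would normally be used instead of hand-rolled integration; for instance, `scipy.stats.ttest_ind(sample_a, sample_b)` performs the same pooled-variance test by default.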