Basics of Statistics - University of Delaware
Student’s t statistic: Use

Test for equality of two means. E.g., compare two groups of subjects given different treatments.
Test for the value of a single mean. E.g., test whether a single group of subjects differs from a known value.
Also the ‘matched sample’ test, where a single group is compared before and after treatment (test for zero treatment effect).
Advanced: tests of significance of correlation/regression coefficients.

Student’s t statistic: Assumptions

The parent population is normal.
Sample observations (subjects) are independent.

Robustness to normality: departures from normality affect the Type I error rate and the power, and may lead to inappropriate interpretation. In real life we cannot expect exactly normal data, but the data should not be strongly skewed.

Student’s t statistic: Formula (single group)

Let $x_1, x_2, \ldots, x_n$ be a random sample from a normal population with mean $\mu$ and variance $\sigma^2$. Then the following statistic is distributed as Student’s t with $(n - 1)$ degrees of freedom:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation.

Student’s t statistic: Formula (two groups)

Case 1: Two matched samples
The following statistic follows a t distribution with $(n - 1)$ d.f.:

$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$

where $d$ is the difference between the two matched observations, $\bar{d}$ is the mean of these differences, and $s_d$ is the standard deviation of the variable $d$.

Case 2: Equal population standard deviations
The following statistic is distributed as t with $(n_1 + n_2 - 2)$ d.f.:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

The pooled standard deviation is

$$S_p = \sqrt{\frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}}$$

where $n_1$ and $n_2$ are the sample sizes and $S_1$ and $S_2$ are the sample standard deviations of the two groups.

Case 3: Unequal population standard deviations
The following statistic follows a t distribution:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Under the null hypothesis $\mu_1 = \mu_2$, the term $(\mu_1 - \mu_2)$ is zero. The d.f. of this statistic is

$$v = \frac{\left( s_1^2 / n_1 + s_2^2 / n_2 \right)^2}{\dfrac{(s_1^2 / n_1)^2}{n_1 - 1} + \dfrac{(s_2^2 / n_2)^2}{n_2 - 1}}$$

Student’s t statistic: One-sided and two-sided tests

One-sided
There can only be one direction of effect.
The investigator is only interested in one direction of effect.
Greater power to detect a difference in the expected direction.

Two-sided
The difference could go in either direction.
More conservative.

Student’s t statistic: Hypotheses

| | One group | Two groups |
|---|---|---|
| One-sided | A single mean differs from a known value in a specific direction, e.g., mean > 0 | Two means differ from one another in a specific direction, e.g., mean2 < mean1 |
| Two-sided | A single mean differs from a known value in either direction, e.g., mean ≠ 0 | Two means are not equal, that is, mean1 ≠ mean2 |

Student’s t statistic: SPSS

One Group: Analyze > Compare Means > One-Sample T Test
Two Groups (Matched Samples): Analyze > Compare Means > Paired-Samples T Test
Two Groups: Analyze > Compare Means > Independent-Samples T Test

Student’s t statistic: R

The default t-test is

    t.test(x, y = NULL, alternative = "two.sided", mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95)

where x and y are numeric vectors holding the data for the two variables. We only need to change the default settings that differ from the case we want to perform. For example:

One Group:

    t.test(x, alternative = "greater", mu = 30)

Two Groups (Matched Samples):

    t.test(x, y, alternative = "less", mu = 0, paired = TRUE)

Two Groups:

    t.test(x, y, alternative = "greater", mu = 0, var.equal = TRUE)

Student’s t statistic: MS Excel (in Tools -> Data Analysis…)

One Group: Not available
Two Groups (Matched Samples): t-Test: Paired Two Sample for Means
Two Groups (Independent Samples): t-Test: Two-Sample Assuming Equal Variances, or t-Test: Two-Sample Assuming Unequal Variances

Example 1

Consider the heights of children 4 to 12 years old in dataset 1 of our course website (variable ‘hgt’). Suppose we want to test whether the average height (µ) for this age group in the population is 50 inches, using our sample of 60 children. We will use a 5% level of significance. This is a one-sample, two-sided test.

Hypotheses:
H0: µ = 50
Ha: µ ≠ 50

Computation in Excel: Excel does not have a one-sample test, but we can fool it.
Create a dummy column parallel to the hgt column with an equal number of cells, all set to 0.0.
Run the matched-sample test using hgt and the dummy column, with 50 as the hypothesized mean difference.
The p-value for the two-tailed test is 0.0092.

Example 1 (continued)

Using SPSS: Analyze > Compare Means > One-Sample T Test > select hgt > Test Value: 50 > OK. The p-value is .009.

Using R:

    t.test(df1$hgt, mu = 50)

The two-tailed p-value is .0092.

Example 2

Suppose we want to compare the heights of two groups (hgt in each sex from the dataset).
H0: Mean heights are equal for the two sexes.
Ha: Mean heights are not equal.

Using MS Excel:
Sort the data by sex (Data > Sort > by: sex).
In Data Analysis…, choose t-Test: Two-Sample Assuming Equal Variances.
Select the range of hgt for all sex = f as Variable 1 Range.
Select the range of hgt for all sex = m as Variable 2 Range.
The p-value for the two-sided test is 0.205.

Using SPSS:
Analyze > Compare Means > Independent-Samples T Test
Select hgt as the Test Variable.
Select sex as the Grouping Variable.
In Define Groups, type f for Group 1 and m for Group 2.
Click Continue, then OK.
This gives the p-value 0.205. We can assume equal variances because the p-value of the F statistic for testing equality of variances is 0.845.

Sign Test (Nonparametric)

Use:
(1) Compare the median of a single group with a specified value (instead of the single-sample t-test).
(2) Compare the medians of two matched groups (instead of the two matched samples t-test).

Test Statistic: the number of positive differences $(x_i - c)$, where $c$ is the hypothesized median (for matched groups, the number of positive paired differences). Under the null hypothesis the number of positive differences follows a Binomial distribution with success probability 0.5.

Sign Test (Nonparametric): Software

SPSS: Analyze > Nonparametric Tests > Binomial

R:

    SIGN.test(x, y = NULL, md = 0, alternative = "two.sided", conf.level = 0.95)

To test the median (md) of a single sample, supply data for one variable only. To compare paired data, supply the two paired variables.
NB: This test requires the BSDA package.

Wilcoxon Signed-Rank Test

Use: Compares the medians of two paired samples.
Test Statistic: Consider n pairs of data on two variables X and Y. The following statistic is known as the Wilcoxon signed-rank statistic:
WS = sum of the ranks of the positive differences, after assigning ranks to the absolute values of the differences.

Wilcoxon Rank-Sum Test

Use: Compares the medians of two independent groups.
Test Statistic: Let X and Y be two samples of sizes m and n, and let N = m + n. Compute the ranks of all N observations. Then the statistic is
Wm = sum of the ranks of all observations of variable X.

Wilcoxon Signed-Rank Test & Wilcoxon Rank-Sum Test: SPSS

Two Matched Groups: Analyze > Nonparametric Tests > 2 Related Samples
Two Groups: Analyze > Nonparametric Tests > 2 Independent Samples

Wilcoxon Signed-Rank Test & Wilcoxon Rank-Sum Test: R

The default test is

    wilcox.test(x, y = NULL, alternative = "two.sided", mu = 0, paired = FALSE, exact = NULL, correct = TRUE, conf.int = FALSE, conf.level = 0.95)

Two Matched Groups:

    wilcox.test(x, y, alternative = "less", paired = TRUE)

Two Groups:

    wilcox.test(x, y, alternative = "greater")

Example 3 (two matched samples)

| Subject | Hours of Sleep (Drug) | Hours of Sleep (Placebo) | Difference | Rank Ignoring Sign |
|---|---|---|---|---|
| 1 | 6.1 | 5.2 | 0.9 | 3.5 |
| 2 | 7.0 | 7.9 | -0.9 | 3.5 |
| 3 | 8.2 | 3.9 | 4.3 | 10 |
| 4 | 7.6 | 4.7 | 2.9 | 7 |
| 5 | 6.5 | 5.3 | 1.2 | 5 |
| 6 | 8.4 | 5.4 | 3.0 | 8 |
| 7 | 6.9 | 4.2 | 2.7 | 6 |
| 8 | 6.7 | 6.1 | 0.6 | 2 |
| 9 | 7.4 | 3.8 | 3.6 | 9 |
| 10 | 5.8 | 6.3 | -0.5 | 1 |

The 3rd and 4th ranks are tied and are therefore averaged. The p-value of this test is 0.02. Hence the test is significant at any level above 2%, indicating that the drug is more effective than the placebo.

Proportion Tests: Use

Test for equality of two proportions. E.g., the proportions of subjects in two treatment groups who benefited from treatment.
Test for the value of a single proportion. E.g., test whether the proportion of smokers in a population is some specified value (less than 1).

Proportion Tests: Formula

One Group:

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}}$$

Two Groups:

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p} (1 - \hat{p}) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}, \quad \text{where } \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}.$$

Here $x_1$ and $x_2$ are the numbers of successes in the two groups and $n_1$ and $n_2$ are the sample sizes.

Proportion Test: Software

SPSS:
One Group: Analyze > Nonparametric Tests > Binomial
Two Groups?
R: The default tests are:

One Group:

    binom.test(x, n, p = 0.5, alternative = "two.sided", conf.level = 0.95)

Two Groups:

    prop.test(c(x, y), c(m, n), p = NULL, alternative = "two.sided", conf.level = 0.95, correct = TRUE)

Here x and y are the numbers of successes and m and n are the sample sizes.

Example 4: Proportion of males in Dataset 1

R: n = 60 and there are 30 males.

    binom.test(30, 60)

returns a p-value of 1.0.

SPSS: First recode sex as numeric: Transform > Recode > Into Different Variables > make all selections there and click Change after recoding the character variable into a numeric one. Then: Analyze > Nonparametric Tests > Binomial > select the Test Variable > set Test Proportion (the null hypothesis value) to 0.5. The p-value is 1.0.

Chi-square statistic: Use

Testing the population variance, H0: $\sigma^2 = \sigma_0^2$.
Testing goodness of fit.
Testing the independence/association of attributes.

Assumptions
Sample observations should be independent.
Expected cell frequencies should be >= 5.
Total observed and expected frequencies are equal.

Chi-square statistic: Formula

If $x_i$ $(i = 1, 2, \ldots, n)$ are independent and normally distributed with mean $\mu$ and standard deviation $\sigma$, then

$$\sum_{i=1}^{n} \left( \frac{x_i - \mu}{\sigma} \right)^2 \text{ is a } \chi^2 \text{ distribution with } n \text{ d.f.}$$

If we do not know $\mu$, we estimate it using the sample mean $\bar{x}$, and then

$$\sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{\sigma} \right)^2 \text{ is a } \chi^2 \text{ distribution with } (n - 1) \text{ d.f.}$$

Chi-square statistic: Contingency tables

For a contingency table we use the following chi-square test statistic:

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed frequency and $E_i$ is the expected frequency in cell $i$, and the sum runs over all $n$ cells. For a goodness-of-fit test with $n$ categories the statistic is distributed as $\chi^2$ with $(n - 1)$ d.f.; for a contingency table with $r$ rows and $c$ columns it is distributed as $\chi^2$ with $(r - 1)(c - 1)$ d.f.

Chi-square statistic: SPSS

Analyze > Descriptive Statistics > Crosstabs > Statistics > Chi-square
Select variables. Click the Cells button to select the items you want in cells, rows, and columns.

Example 5 (class demonstration)

Make a contingency table using the two variables sex and grp from our dataset.
Analyze > Descriptive Statistics > Crosstabs > select variables for rows and columns > Statistics > Chi-square > Continue > Cells > make selections > OK.
This gives a contingency table and the p-value of the Pearson chi-square test.
For this particular case, the p-value of the Pearson chi-square test is 0.549 and the d.f. is 2.

F-statistic: Use

Testing the equality of population variances.
Testing the significance of the difference among several means in analysis of variance.

F-statistic: Formula

Let X and Y be two independent chi-square variables with $n_1$ and $n_2$ d.f. respectively. Then the following statistic follows an F distribution with $n_1$ and $n_2$ d.f.:

$$F_{n_1, n_2} = \frac{X / n_1}{Y / n_2}$$

Let X and Y be two independent normal variables, sampled with sizes $n_1$ and $n_2$. If the two population variances are equal, the following statistic follows an F distribution with $(n_1 - 1)$ and $(n_2 - 1)$ d.f.:

$$F_{n_1 - 1,\, n_2 - 1} = \frac{s_x^2}{s_y^2}$$

where $s_x^2$ and $s_y^2$ are the sample variances of X and Y.

F-statistic: Comparing several means

Hypotheses (for k groups):
H0: µ1 = µ2 = … = µk
Ha: at least one of the means differs from the others.

The comparison is done using the analysis of variance (ANOVA) technique, which uses the F statistic. The ANOVA technique will be covered in another class session.
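
The variance-ratio use of the F statistic described above can be sketched in R. This is a minimal illustration, not part of the course material: the two samples are simulated (hypothetical data, not the course dataset), and the variable names are made up for the example. It computes F = s_x^2 / s_y^2 by hand with (n1 - 1) and (n2 - 1) d.f., and compares the result with R's built-in var.test().

```r
# Hypothetical data: two independent, simulated normal samples
# (NOT taken from the course dataset).
set.seed(1)                          # make the simulation reproducible
x <- rnorm(12, mean = 50, sd = 4)    # sample 1, n1 = 12
y <- rnorm(15, mean = 50, sd = 4)    # sample 2, n2 = 15

n1 <- length(x)
n2 <- length(y)

# F statistic: ratio of the two sample variances
f_stat <- var(x) / var(y)
df1 <- n1 - 1                        # numerator d.f.
df2 <- n2 - 1                        # denominator d.f.

# Two-sided p-value: twice the smaller tail area of the F distribution
p_lower <- pf(f_stat, df1, df2)
p_value <- 2 * min(p_lower, 1 - p_lower)

# Built-in test of equal variances, for comparison
vt <- var.test(x, y)

print(c(F = f_stat, df1 = df1, df2 = df2, p = p_value))
print(vt)
```

The hand computation and var.test() should report the same F statistic, degrees of freedom, and two-sided p-value. A large p-value here would only mean the data give no evidence against equal variances, which is how the F test was used in Example 2 to justify the equal-variance t-test.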