Unit IV: t, F and χ² Tests
Making inferences from an experimental sample to a population using statistical tests

Sample to Population: Testing Hypotheses
• t, F and χ² tests mathematically compare the distribution of an experimental sample – i.e. the mean and standard deviation of your results – to a normal distribution whose parameters represent some hypothesised feature of the population, which you think your results support
• How does this work? (without going through the derivation of the equations…!)
• …the CENTRAL LIMIT THEOREM

t-tests: Testing Hypotheses About Means
• Formula:
  t = (x̄ − μ) · √n / s
  where x̄ = sample mean, μ = population mean, s = sample standard deviation, n = sample size
• For a z-test you need to know the population mean and standard deviation. Often you don't know the standard deviation of the hypothesised or comparison population, and so you use a t-test, which uses the sample standard deviation instead.
• This introduces a source of error, which decreases as your sample size increases.
• Therefore, the t statistic is distributed differently depending on the size of the sample, like a family of normal curves. The degrees of freedom (d.f. = sample size − 1) determine which of these curves you relate your t-value to; there are different tables of p-values for different degrees of freedom. Larger sample = more 'squashed' t-statistic distribution = easier to reach significance.

Kinds of t-tests (the formula differs slightly for each kind):
• Single-sample: tests whether a sample mean differs significantly from a hypothesised population mean (often 0)
• Independent-samples: tests the difference between the means of two independent groups
• Paired-samples: tests the difference between two linked sets of scores, for example means obtained in two conditions by a single group of participants

t-tests: Worked Example of a Single-Sample t-test
• We know that finger tapping speed in the normal population has mean = 100 ms per tap
• Finger tapping speed in 8 subjects with caffeine addiction:
  – Mean = 89.4 ms
  – Standard deviation = 20 ms
• Does this prove that caffeine addiction has an effect on tapping speed?
• Null hypothesis H0: tapping speed is not faster after caffeine
• Preselected significance level: 0.05
• Calculate the t value, e.g. t(7) = (89.4 − 100) · √8 / 20 = −1.5
• Find the area below t(7) = −1.5: approximately 0.09, i.e. about 9% of the time we would expect a score as low as this
• This value is above 0.05 => we could NOT reject H0!
• We can't conclude that caffeine addiction has an effect on tapping speed
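The worked example can be reproduced from the summary statistics alone; here is a minimal Python sketch using scipy.stats (the numbers are the ones from the example above):

```python
import math
from scipy import stats

# Summary statistics from the worked example above
mu = 100.0     # hypothesised population mean (ms per tap)
x_bar = 89.4   # sample mean (ms per tap)
s = 20.0       # sample standard deviation
n = 8          # sample size

# Single-sample t statistic: t = (x_bar - mu) * sqrt(n) / s
t_value = (x_bar - mu) * math.sqrt(n) / s

# One-tailed p-value: area below t under a t distribution with n - 1 d.f.
df = n - 1
p_one_tailed = stats.t.cdf(t_value, df)

print(f"t({df}) = {t_value:.2f}")            # ~ -1.50
print(f"one-tailed p = {p_one_tailed:.3f}")  # ~ 0.089 > 0.05, so H0 stands
```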
F-test: ANOVA
• ANOVA (analysis of variance) involves calculating an F value whose significance is tested, similarly to a z or t value
• Like t-tests, F-tests deal with differences between or among sample means, but with any number of means (each mean corresponding to a 'factor')
• Q/ Do the k means differ? A/ Yes, if the F value is significant
• Q/ How do the k factors influence each other? A/ Look at the interaction effects
• ANOVA calculates F values by comparing the variability between conditions with the variability within each condition (this is what the formula does)
  – e.g. we give a drug that we believe will improve memory to one group of people and a placebo to another group. We then take dependent measures of their memory performance, e.g. mean number of words recalled from memorised lists.
  – An ANOVA compares the variability that we observe between the two conditions to the variability observed within each condition. Variability is measured as the sum of the squared deviations of each score from the mean.
  – Thus, when the variability that we predict (between the two groups) is much greater than the variability we don't predict (within each group), we conclude that our treatments produce different results.

F-tests / ANOVAs: What are they?
• ANOVA calculates an F value, which has a distribution related to the sample size and the number of conditions (degrees of freedom):
  F = MS_factors / MS_error
• The formula compares the variance between and within conditions or 'factors' as discussed above – we won't worry about the derivation! (n.b. MS = mean squares)
• If the F statistic is significant, this tells us that the means of the factors differ significantly => they are not likely to have come from the same 'population' = our variable is having an effect

When can we use ANOVAs?
• The formula is based on a model of what contributes to the value of any particular data point, and of how the variance in the data is composed. This model makes a number of assumptions that must be met in order for us to use ANOVA:
  – homogeneity of variance
  – normality
  – independence of observations
• Remember: a significant F value only tells you that there is a significant difference somewhere among the means of the factors in the ANOVA. Therefore, you often need planned or post-hoc comparisons to test more specific hypotheses and to probe interaction effects

ANOVAs: Worked Example
• Testing differences between independent sample means: following rTMS over the right parietal cortex, are incorrectly cued trials in a cued RT task slowed down compared to correctly cued trials?
• "Repeated measures" ANOVA:
  – 1 group of 14 healthy volunteers
  – Each performs 100 trials pre- and 100 trials post-stimulation
  – Real vs sham rTMS on two separate days
• Within-session factors:
  – Correct vs incorrect trials
  – Pre vs post
• Between-session factor:
  – Real vs sham rTMS
• Null hypothesis H0: there is no difference in the RTs of incorrectly cued trials
• Many possibilities if H0 is rejected:
  – All means are different from each other: meanICpreR vs. meanICpostR vs. meanICpreS vs. meanICpostS
  – Means in the real condition are different from means in the sham condition
  – Interactions might differ (pre vs post in the real condition differs from pre vs post in sham)
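To make the F formula concrete, here is a minimal one-way ANOVA sketch in Python for the simpler drug vs placebo memory example described earlier; the recall scores are invented for illustration:

```python
from scipy import stats

# Hypothetical word-recall scores (words recalled per participant);
# the drug/placebo design is from the text, the numbers are invented.
drug    = [14, 17, 15, 16, 18, 13, 17, 16]
placebo = [12, 11, 14, 10, 13, 12, 11, 13]

# One-way ANOVA: F = MS_between / MS_within
f_value, p_value = stats.f_oneway(drug, placebo)

print(f"F(1, {len(drug) + len(placebo) - 2}) = {f_value:.2f}, p = {p_value:.4f}")
# A significant F means the group means are unlikely to come from the
# same 'population', i.e. the drug is having an effect.
```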
References
• http://obelia.jde.aca.mmu.ac.uk/rd/arsham/opre330.htm#ranova
• Howell, D. (2001). Statistical Methods for Psychology.
• SPM website: http://www.fil.ion.ucl.ac.uk/spm/

Tutorial: Chi-Square Distribution
Presented by: Nikki Natividad
Course: BIOL 5081 – Biostatistics

Purpose
• To analyse discontinuous categorical/binned data in which a number of subjects fall into categories
• We want to compare our observed data to what we expect to see: is any difference due to chance, or due to association?
• When can we use the Chi-Square Test?
  – Testing the outcome of Mendelian crosses
  – Testing independence: is one factor associated with another?
  – Testing a population for expected proportions

Assumptions
• 1 or more categories
• Independent observations
• A sample size of at least 10
• Random sampling
• All observations must be used
• For the test to be accurate, the expected frequency in each category should be at least 5

Conducting Chi-Square Analysis
1) Make a hypothesis based on your basic biological question
2) Determine the expected frequencies
3) Create a table with observed frequencies, expected frequencies, and the chi-square contribution (O − E)² / E for each category
4) Find the degrees of freedom: (c − 1)(r − 1) for a contingency table, or c − 1 for a one-way (goodness-of-fit) test
5) Look up the critical chi-square value in a chi-square distribution table
6) If the critical value from the table > your calculated chi-square value, you do not reject your null hypothesis, and vice versa

Example 1: Testing for Proportions
HO: Horned lizards eat equal amounts of leaf cutter, carpenter and black ants.
HA: Horned lizards eat more of one species of ant than the others.

              Leaf Cutter Ants  Carpenter Ants  Black Ants  Total
Observed            25               18             17        60
Expected            20               20             20        60
O − E                5               −2             −3         0
(O − E)²/E           1.25             0.20           0.45     χ² = 1.90

χ² = sum of all (O − E)² / E = 1.90
Degrees of freedom (one-way test): c − 1 = 3 − 1 = 2
Under a significance level of your choice (e.g. α = 0.05, i.e. 95% confidence), look up the critical value in a chi-square distribution table: χ²(α = 0.05, d.f. = 2) = 5.991
If the critical value > your calculated value, you do not reject your null hypothesis: any deviation from the expected proportions is likely due to chance.
5.991 > 1.90 ∴ We do not reject our null hypothesis.

SAS: Example 1
[SAS code screenshot: a DATA step defines the observed data, PROC FREQ options indicate what you want in the output, and formatting statements lay out the table.]

SAS: What does the p-value mean?
"The exact p-value for a nondirectional test is the sum of probabilities for the table having a test statistic greater than or equal to the value of the observed test statistic."
• High p-value: high probability of a test statistic ≥ the observed one by chance. Do not reject the null hypothesis.
• Low p-value: low probability of a test statistic ≥ the observed one by chance. Reject the null hypothesis.
For Example 1 the p-value is high: there is a high probability of a chi-square statistic ≥ our calculated one, so we do not reject our null hypothesis.

Example 2: Testing Association
HO: Gender and eye colour are not associated with each other.
HA: Gender and eye colour are associated with each other.
PROC FREQ options used in this example:
• cellchi2 = display how much each cell contributes to the overall chi-square value
• nocol = do not display column totals
• norow = do not display row totals
• chisq = display chi-square statistics
Degrees of freedom: (2 − 1)(3 − 1) = 1 × 2 = 2
There is a high probability (78.25%) of a chi-square statistic ≥ our calculated one, so we do not reject our null hypothesis.
If there were an association, we could check which interactions describe the association by looking at how much each cell contributes to the overall chi-square value.
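Both examples can be sketched in Python with scipy.stats. Example 1 uses the ant counts above; the gender-by-eye-colour counts below are invented for illustration, since the original table is not reproduced in these notes:

```python
from scipy import stats

# Example 1: goodness-of-fit test on the horned-lizard ant counts above
observed = [25, 18, 17]
expected = [20, 20, 20]
chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # chi2 = 1.90, p ~ 0.39: do not reject H0

# Example 2: test of association on a 2 x 3 contingency table
# (gender x eye colour; these counts are invented for illustration)
table = [[20, 15, 10],   # e.g. male: brown, blue, green
         [18, 16, 12]]   # e.g. female: brown, blue, green
chi2, p, dof, expected_counts = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, d.f. = {dof}, p = {p:.3f}")
# d.f. = (2 - 1)(3 - 1) = 2; a low p-value would indicate an association
```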
Limitations
• No expected category should be less than 1
• No more than 1/5 of the expected categories should be less than 5
  – To correct for this, you can collect larger samples or combine your data for the smaller expected categories until their combined expected value is 5 or more
• Yates Correction
  – When there is only 1 degree of freedom, the regular chi-square test should not be used
  – Apply the Yates correction by subtracting 0.5 from the absolute value of each calculated O − E term, then continue as usual with the new corrected values

What do these mean?
• Likelihood Ratio Chi-Square
• Continuity-Adjusted Chi-Square Test
• Mantel-Haenszel Chi-Square Test:
  Q_MH = (n − 1)r²
  where r is the Pearson correlation coefficient (which also measures the linear association between row and column). This tests the alternative hypothesis that there is a linear association between the row and column variables, and follows a chi-square distribution with 1 degree of freedom.
  http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000659.htm
• Phi Coefficient
• Contingency Coefficient
• Cramer's V

Yates & 2 × 2 Contingency Tables
HO: Heart disease is not associated with cholesterol levels.
HA: Heart disease is more likely in patients with a high-cholesterol diet.

Uncorrected:
                      High Cholesterol  Low Cholesterol  Total
Heart Disease               15                 7           22
  Expected                  12.65              9.35        22
  (O − E)²/E                 0.44              0.59         1.03
No Heart Disease             8                10           18
  Expected                  10.35              7.65        18
  (O − E)²/E                 0.53              0.72         1.25
Total                       23                17           40
Chi-square total: 2.28

Degrees of freedom: (c − 1)(r − 1) = 1 × 1 = 1, so we need to use the YATES CORRECTION.

With the Yates correction, e.g. (|15 − 12.65| − 0.5)² / 12.65 = 0.27:
                      High Cholesterol  Low Cholesterol  Total
Heart Disease               15                 7           22
  Expected                  12.65              9.35        22
  Corrected (O − E)²/E       0.27              0.37         0.64
No Heart Disease             8                10           18
  Expected                  10.35              7.65        18
  Corrected (O − E)²/E       0.33              0.45         0.78
Total                       23                17           40
Corrected chi-square total: 1.42

Critical value: χ²(α = 0.05, d.f. = 1) = 3.841
3.841 > 1.42 ∴ We do not reject our null hypothesis.

Fisher's Exact Test
• Left: use when the alternative to independence is a negative association between the variables. These observations tend to lie in the lower-left and upper-right cells of the table. Small p-value = likely negative association.
• Right: use this one-sided test when the alternative to independence is a positive association between the variables. These observations tend to lie in the upper-left and lower-right cells of the table. Small p-value = likely positive association.
• Two-tail: use this when there is no prior alternative.
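A minimal Python sketch of the Yates-corrected chi-square test and Fisher's exact test on the heart-disease table above:

```python
from scipy import stats

# 2 x 2 heart-disease / cholesterol table from the example above
table = [[15, 7],    # heart disease:    high, low cholesterol
         [8, 10]]    # no heart disease: high, low cholesterol

# chi2_contingency applies the Yates continuity correction by default
# for 2 x 2 tables (correction=True)
chi2, p, dof, expected = stats.chi2_contingency(table, correction=True)
print(f"Yates-corrected chi2({dof}) = {chi2:.2f}, p = {p:.3f}")  # chi2 ~ 1.42

# Fisher's exact test; 'greater' matches the one-sided alternative that
# heart disease is positively associated with high cholesterol
odds_ratio, p_fisher = stats.fisher_exact(table, alternative="greater")
print(f"Fisher's exact one-sided p = {p_fisher:.3f}")
```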
Conclusion
• The chi-square test is important for testing the association between variables and/or checking whether one's expected proportions match the reality of one's experiment
• There are multiple chi-square tests, each catered to a specific sample size, degrees of freedom, and number of categories
• We can use SAS to conduct chi-square tests on our data using the PROC FREQ procedure

References
• Chi-Square Test Descriptions: http://www.enviroliteracy.org/pdf/materials/1210.pdf
• http://129.123.92.202/biol1020/Statistics/Appendix%206%20%20The%20Chi-Square%20TEst.pdf
• Ozdemir, T. and Eyduran, E. (2005). Comparison of chi-square and likelihood ratio chi-square tests: power of test. Journal of Applied Sciences Research, 1(2):242–244.
• SAS Support website: http://www.sas.com/index.html, "FREQ procedure"
• YouTube chi-square SAS tutorial (user: mbate001): http://www.youtube.com/watch?v=ACbQ8FJTq7k