Final Review Session

Exam details
• Short answer, similar to book problems
• Formulae and tables will be given
• You CAN use a calculator
• Date and time: Dec. 7, 2006, 12-1:30 pm
• Location: Osborne Centre, Unit 1 ("A")

Things to Review
• Concepts
• Basic formulae
• Statistical tests

Concepts: First Half
• Populations; samples; random sample
• Parameters; estimates
• Null hypothesis; alternative hypothesis; P-value
• Mean; median; mode
• Type I error; Type II error
• Variance; standard deviation
• Categorical variables: nominal, ordinal
• Numerical variables: discrete, continuous
• Sampling distribution; standard error; central limit theorem

Concepts: Second Half
• Normal distribution; quantile plot; Shapiro-Wilk test
• Data transformations
• Nonparametric tests
• Independent contrasts
• Observations vs. experiments
• Confounding variables
• Control group
• Replication and pseudoreplication
• Blocking
• Factorial design
• Power analysis
• Simulation; randomization; bootstrap
• Likelihood

Example Conceptual Questions
(You've just done a two-sample t-test comparing body size of lizards on islands and the mainland.)
• What is the probability of committing a type I error with this test?
• State an example of a confounding variable that may have affected this result.
• State one alternative statistical technique that you could have used to test the null hypothesis, and describe briefly how you would have carried it out.

Randomization test (flow diagram)
• Sample → test statistic
• Null hypothesis → randomized data → calculate the same test statistic on the randomized data → null distribution
• Compare the test statistic with the null distribution: how unusual is this test statistic?
• P < 0.05: reject H0. P > 0.05: fail to reject H0.

Hypothesis testing in general (flow diagram)
• Sample → test statistic
• Null hypothesis → null distribution
• Compare the test statistic with the null distribution: how unusual is this test statistic?
• P < 0.05: reject H0. P > 0.05: fail to reject H0.

Statistical tests
• Binomial test
• Chi-squared goodness-of-fit
  – Proportional, binomial, Poisson
• Chi-squared contingency test
• t-tests
  – One-sample t-test
  – Paired t-test
  – Two-sample t-test
• F-test for comparing variances
• Welch's t-test
• Sign test
• Mann-Whitney U
• Correlation
• Spearman's r
• Regression
• ANOVA

Quick reference summary: Binomial test
• What is it for? Compares the proportion of successes in a sample to a hypothesized value, p0
• What does it assume? Individual trials are randomly sampled and independent
• Test statistic: X, the number of successes
• Distribution under H0: binomial with parameters n and p0
• Formula: $P(x) = \binom{n}{x} p^{x} (1-p)^{n-x}$, where P(x) = probability of a total of x successes, p = probability of success in each trial, n = total number of trials
• Two-tailed P-value: $P = 2 \times \Pr[x \ge X]$

Binomial test (flow diagram)
• Sample → test statistic: x = number of successes
• Null hypothesis: Pr[success] = p0 → null distribution: binomial with n, p0
• Compare: how unusual is this test statistic? P < 0.05: reject H0. P > 0.05: fail to reject H0.

Binomial test
• H0: The relative frequency of successes in the population is p0
• HA: The relative frequency of successes in the population is not p0

Quick reference summary: χ² Goodness-of-Fit test
• What is it for?
Compares observed frequencies in categories of a single variable to the expected frequencies under a random model
• What does it assume? Random samples; no expected values < 1; no more than 20% of expected values < 5
• Test statistic: χ²
• Distribution under H0: χ² with df = # categories - # parameters - 1
• Formula: $\chi^2 = \sum_{\text{all classes}} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}$

χ² goodness-of-fit test (flow diagram)
• Sample → calculate expected values → test statistic: $\chi^2 = \sum_{\text{all classes}} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}$
• Null hypothesis: data fit a particular discrete distribution → null distribution: χ² with N - 1 - (number of parameters) d.f., where N is the number of categories
• Compare: how unusual is this test statistic? P < 0.05: reject H0. P > 0.05: fail to reject H0.

χ² Goodness-of-Fit test
• H0: The data come from a certain distribution
• HA: The data do not come from that distribution

Possible distributions
• Proportional: expected count of x = n × frequency of occurrence
  – Given a number of categories
  – Probability proportional to number of opportunities
  – Examples: days of the week, months of the year
• Binomial: $\Pr[x] = \binom{n}{x} p^{x} (1-p)^{n-x}$
  – Number of successes in n trials
  – Have to know n and p under the null hypothesis
  – Examples: Punnett square, many p = 0.5 examples
• Poisson: $\Pr[X] = \frac{e^{-\mu} \mu^{X}}{X!}$
  – Number of events in an interval of space or time
  – n not fixed, p not given
  – Examples: car wrecks, flowers in a field

Quick reference summary: χ² Contingency Test
• What is it for? Tests the null hypothesis of no association between two categorical variables
• What does it assume?
Random samples; no expected values < 1; no more than 20% of expected values < 5
• Test statistic: χ²
• Distribution under H0: χ² with df = (r-1)(c-1), where r = # rows, c = # columns
• Formulae: $\text{Expected} = \frac{\text{RowTotal} \times \text{ColTotal}}{\text{GrandTotal}}$ and $\chi^2 = \sum_{\text{all classes}} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}$

χ² Contingency Test (flow diagram)
• Sample → calculate expected values → test statistic: $\chi^2 = \sum_{\text{all classes}} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}$
• Null hypothesis: no association between variables → null distribution: χ² with (r-1)(c-1) d.f.
• Compare: how unusual is this test statistic? P < 0.05: reject H0. P > 0.05: fail to reject H0.

χ² Contingency test
• H0: There is no association between these two variables
• HA: There is an association between these two variables

Quick reference summary: One-sample t-test
• What is it for? Compares the mean of a numerical variable to a hypothesized value, μ0
• What does it assume? Individuals are randomly sampled from a population that is normally distributed
• Test statistic: t
• Distribution under H0: t-distribution with n-1 degrees of freedom
• Formula: $t = \frac{\bar{Y} - \mu_0}{SE_{\bar{Y}}}$

One-sample t-test (flow diagram)
• Sample → test statistic: $t = \frac{\bar{Y} - \mu_0}{s / \sqrt{n}}$
• Null hypothesis: the population mean is equal to μ0 → null distribution: t with n-1 df
• Compare: how unusual is this test statistic? P < 0.05: reject H0. P > 0.05: fail to reject H0.

One-sample t-test
• H0: The population mean is equal to μ0
• HA: The population mean is not equal to μ0

Paired vs. 2 sample comparisons

Quick reference summary: Paired t-test
• What is it for? To test whether the mean difference in a population equals a null hypothesized value, μd0
• What does it assume? Pairs are randomly sampled from a population.
The differences are normally distributed
• Test statistic: t
• Distribution under H0: t-distribution with n-1 degrees of freedom, where n is the number of pairs
• Formula: $t = \frac{\bar{d} - \mu_{d0}}{SE_{\bar{d}}}$

Paired t-test (flow diagram)
• Sample → test statistic: $t = \frac{\bar{d} - \mu_{d0}}{SE_{\bar{d}}}$
• Null hypothesis: the mean difference is equal to μd0 → null distribution: t with n-1 df (n is the number of pairs)
• Compare: how unusual is this test statistic? P < 0.05: reject H0. P > 0.05: fail to reject H0.

Paired t-test
• H0: The mean difference is equal to 0
• HA: The mean difference is not equal to 0

Quick reference summary: Two-sample t-test
• What is it for? Tests whether two groups have the same mean
• What does it assume? Both samples are random samples. The numerical variable is normally distributed within both populations. The variance of the distribution is the same in the two populations
• Test statistic: t
• Distribution under H0: t-distribution with n1+n2-2 degrees of freedom
• Formulae: $t = \frac{\bar{Y}_1 - \bar{Y}_2}{SE_{\bar{Y}_1 - \bar{Y}_2}}$, with $SE_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$ and $s_p^2 = \frac{df_1 s_1^2 + df_2 s_2^2}{df_1 + df_2}$

Two-sample t-test (flow diagram)
• Sample → test statistic: $t = \frac{\bar{Y}_1 - \bar{Y}_2}{SE_{\bar{Y}_1 - \bar{Y}_2}}$
• Null hypothesis: the two populations have the same mean, $\mu_1 = \mu_2$ → null distribution: t with n1+n2-2 df
• Compare: how unusual is this test statistic? P < 0.05: reject H0. P > 0.05: fail to reject H0.

Two-sample t-test
• H0: The means of the two populations are equal
• HA: The means of the two populations are not equal

F-test for comparing the variance of two groups
• H0: $\sigma_1^2 = \sigma_2^2$
• HA: $\sigma_1^2 \ne \sigma_2^2$

F-test (flow diagram)
• Sample → test statistic: $F = \frac{s_1^2}{s_2^2}$
• Null hypothesis: the two populations have the same variance, $\sigma_1^2 = \sigma_2^2$ → null distribution: F with n1-1, n2-1 df
• Compare: how unusual is this test statistic?
• P < 0.05: reject H0. P > 0.05: fail to reject H0.

Welch's t-test (flow diagram)
• Sample → test statistic: $t = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
• Null hypothesis: the two populations have the same mean, $\mu_1 = \mu_2$ → null distribution: t with df from formula
• Compare: how unusual is this test statistic? P < 0.05: reject H0. P > 0.05: fail to reject H0.

Parametric tests and their nonparametric counterparts
Parametric | Nonparametric
One-sample and paired t-test | Sign test
Two-sample t-test | Mann-Whitney U-test

Quick Reference Summary: Sign Test
• What is it for? A non-parametric test to compare the median of a group to some constant
• What does it assume? Random samples
• Formula: Identical to a binomial test with p0 = 0.5. Uses the number of subjects with values greater than and less than a hypothesized median as the test statistic. $P(x) = \binom{n}{x} p^{x} (1-p)^{n-x}$ and $P = 2 \times \Pr[x \ge X]$, where P(x) = probability of a total of x successes, p = probability of success in each trial, n = total number of trials

Sign test (flow diagram)
• Sample → test statistic: x = number of values greater than m0
• Null hypothesis: median = m0 → null distribution: binomial with n, 0.5
• Compare: how unusual is this test statistic? P < 0.05: reject H0. P > 0.05: fail to reject H0.

Sign Test
• H0: The median is equal to some value m0
• HA: The median is not equal to m0

Quick Reference Summary: Mann-Whitney U Test
• What is it for? A non-parametric test to compare the central tendencies of two groups
• What does it assume?
Random samples
• Test statistic: U
• Distribution under H0: U distribution, with sample sizes n1 and n2
• Formulae: $U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1$ and $U_2 = n_1 n_2 - U_1$, where n1 = sample size of group 1, n2 = sample size of group 2, R1 = sum of ranks of group 1
• Use the larger of U1 or U2 for a two-tailed test

Mann-Whitney U test (flow diagram)
• Sample → test statistic: U1 or U2 (use the largest)
• Null hypothesis: the two groups have the same median → null distribution: U with n1, n2
• Compare: how unusual is this test statistic? P < 0.05: reject H0. P > 0.05: fail to reject H0.

Quick Reference Guide: Correlation Coefficient
• What is it for? Measuring the strength of a linear association between two numerical variables
• What does it assume? Bivariate normality and random sampling
• Parameter: ρ
• Estimate: r
• Formulae: $r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$ and $SE_r = \sqrt{\frac{1 - r^2}{n - 2}}$

Quick Reference Guide: t-test for zero linear correlation
• What is it for? To test the null hypothesis that the population parameter, ρ, is zero
• What does it assume? Bivariate normality and random sampling
• Test statistic: t
• Null distribution: t with n-2 degrees of freedom
• Formula: $t = \frac{r}{SE_r}$

t-test for correlation (flow diagram)
• Sample → test statistic: $t = \frac{r}{SE_r}$
• Null hypothesis: ρ = 0 → null distribution: t with n-2 d.f.
• Compare: how unusual is this test statistic?
• P < 0.05: reject H0. P > 0.05: fail to reject H0.

Quick Reference Guide: Spearman's Rank Correlation
• What is it for? To test zero correlation between the ranks of two variables
• What does it assume? Linear relationship between ranks and random sampling
• Test statistic: rs
• Null distribution: See table; if n > 100, use t-distribution
• Formulae: Same as linear correlation but based on ranks

Spearman's rank correlation (flow diagram)
• Sample → test statistic: rs
• Null hypothesis: ρs = 0 → null distribution: Spearman's rank, Table H
• Compare: how unusual is this test statistic? P < 0.05: reject H0. P > 0.05: fail to reject H0.

Assumptions of Regression
• At each value of X, there is a population of Y values whose mean lies on the "true" regression line
• At each value of X, the distribution of Y values is normal
• The variance of Y values is the same at all values of X
• At each value of X, the Y measurements represent a random sample from the population of Y values

[Figure: four example plots labelled OK, Non-linear, Non-normal, Unequal variance]

Quick Reference Summary: Confidence Interval for Regression Slope
• What is it for? Estimating the slope of the linear equation Y = α + βX between an explanatory variable X and a response variable Y
• What does it assume?
Relationship between X and Y is linear; each Y at a given X is a random sample from a normal distribution with equal variance
• Parameter: β
• Estimate: b
• Degrees of freedom: n-2
• Formulae: $b - t_{\alpha(2),df} SE_b < \beta < b + t_{\alpha(2),df} SE_b$, with $SE_b = \sqrt{\frac{MS_{residual}}{\sum (X_i - \bar{X})^2}}$ and $MS_{residual} = \frac{\sum (Y_i - \bar{Y})^2 - b \sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 2}$

Quick Reference Summary: t-test for Regression Slope
• What is it for? To test the null hypothesis that the population parameter β equals a null hypothesized value β0, usually 0
• What does it assume? Same as regression slope C.I.
• Test statistic: t
• Null distribution: t with n-2 d.f.
• Formula: $t = \frac{b - \beta_0}{SE_b}$

t-test for Regression Slope (flow diagram)
• Sample → test statistic: $t = \frac{b}{SE_b}$ (the formula above with β0 = 0)
• Null hypothesis: β = 0 → null distribution: t with n-2 df
• Compare: how unusual is this test statistic? P < 0.05: reject H0. P > 0.05: fail to reject H0.

Quick Reference Summary: ANOVA (analysis of variance)
• What is it for? Testing the difference among k means simultaneously
• What does it assume? The variable is normally distributed with equal standard deviations (and variances) in all k populations; each sample is a random sample
• Test statistic: F
• Distribution under H0: F distribution with k-1 and N-k degrees of freedom
• Formulae:
  – $F = \frac{MS_{group}}{MS_{error}}$
  – $MS_{group} = \frac{SS_{group}}{df_{group}}$, with $df_{group} = k - 1$ and $SS_{group} = \sum n_i (\bar{Y}_i - \bar{Y})^2$
  – $MS_{error} = \frac{SS_{error}}{df_{error}}$, with $df_{error} = N - k$ and $SS_{error} = \sum s_i^2 (n_i - 1)$
  – $\bar{Y}_i$ = mean of group i, $\bar{Y}$ = overall mean, $n_i$ = size of sample i, N = total sample size

ANOVA (flow diagram)
• k samples → test statistic: $F = \frac{MS_{group}}{MS_{error}}$
• Null hypothesis: all groups have the same mean → null distribution: F with k-1, N-k df
• Compare: how unusual is this test statistic?
• P < 0.05: reject H0. P > 0.05: fail to reject H0.

ANOVA
• H0: All of the groups have the same mean
• HA: At least one of the groups has a mean that differs from the others

ANOVA Tables
Source of variation | Sum of squares | df | Mean squares | F ratio | P
Treatment | $SS_{group} = \sum n_i (\bar{Y}_i - \bar{Y})^2$ | k-1 | $MS_{group} = SS_{group} / df_{group}$ | |
Error | $SS_{error} = \sum s_i^2 (n_i - 1)$ | N-k | $MS_{error} = SS_{error} / df_{error}$ | |
Total | $SS_{group} + SS_{error}$ | N-1 | | |

[Figure: Picture of ANOVA Terms, showing SSTotal/MSTotal, SSGroup/MSGroup and SSError/MSError]

Two-factor ANOVA Table
Source of variation | Sum of Squares | df | Mean Square | F ratio | P
Treatment 1 | SS1 | k1 - 1 | SS1 / (k1 - 1) | MS1 / MSE |
Treatment 2 | SS2 | k2 - 1 | SS2 / (k2 - 1) | MS2 / MSE |
Treatment 1 * Treatment 2 | SS1*2 | (k1 - 1)*(k2 - 1) | SS1*2 / ((k1 - 1)*(k2 - 1)) | MS1*2 / MSE |
Error | SSerror | XXX | SSerror / XXX | |
Total | SStotal | N-1 | | |

Interpretations of 2-way ANOVA Terms
[Figures: five plots of Growth Rate against Temperature (25 to 40), each with separate lines for pH 5.5, pH 6.5 and pH 7.5. The first panel is untitled; the others are titled "Effect of Temperature, Not pH", "Effect of pH, Not Temperature", "Effect of pH and Temperature, No interaction" and "Effect of pH and Temperature, with interaction".]

Quick Reference Summary: 2-Way ANOVA
• What is it for? Testing the difference among means from a 2-way factorial experiment
• What does it assume?
The variable is normally distributed with equal standard deviations (and variances) in all populations; each sample is a random sample
• Test statistic: F (for three different hypotheses)
• Distribution under H0: F distribution
• Formulae: Just need to know how to fill in the table

2-way ANOVA (flow diagram)
• Samples → test statistic: $F = \frac{MS_{group}}{MS_{error}}$
• Null hypotheses (three of them: Treatment 1, Treatment 2, Interaction) → null distribution: F
• Compare, separately for each of the three hypotheses: how unusual is this test statistic? P < 0.05: reject H0. P > 0.05: fail to reject H0.

General Linear Models
• First step: formulate a model statement
  – Example: Y = μ + TREATMENT
• Second step: make an ANOVA table
  – Example:

Source of variation | Sum of squares | df | Mean squares | F ratio | P
Treatment | $SS_{group} = \sum n_i (\bar{Y}_i - \bar{Y})^2$ | k-1 | $MS_{group} = SS_{group} / df_{group}$ | $F = MS_{group} / MS_{error}$ | *
Error | $SS_{error} = \sum s_i^2 (n_i - 1)$ | N-k | $MS_{error} = SS_{error} / df_{error}$ | |
Total | $SS_{group} + SS_{error}$ | N-1 | | |

Randomization test (flow diagram)
• Sample → test statistic
• Null hypothesis → randomized data → calculate the same test statistic on the randomized data → null distribution
• Compare: how unusual is this test statistic? P < 0.05: reject H0. P > 0.05: fail to reject H0.

Which test do I use?
How many variables am I comparing?
• 1: Methods for a single variable
• 2: Methods for comparing two variables
• 3: Methods for comparing three or more variables

Methods for one variable
Is the variable categorical or numerical?
• Categorical: comparing to a single proportion p0 or to a distribution?
  – p0: Binomial test
  – Distribution: χ² Goodness-of-fit test
• Numerical: One-sample t-test

Methods for two variables (X = explanatory variable, Y = response variable)
Response variable | Explanatory variable: Categorical | Explanatory variable: Numerical
Categorical | Contingency analysis (graphs: contingency table, grouped bar graph, mosaic plot) | Logistic regression
Numerical | t-test, ANOVA (graphs: multiple histograms, cumulative frequency distributions) | Correlation, Regression (graph: scatter plot)

[Figure: combined decision chart. "How many variables am I comparing?" leads either to the one-variable chart (1) or to the two-variable table (2) above.]
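
As a rough illustration, the "Which test do I use?" chart can be written as a pair of lookups in Python. This is only a sketch: the names `ONE_VARIABLE`, `TWO_VARIABLES`, `one_variable_test` and `two_variable_tests` are hypothetical and not part of the course material; only the entries themselves are taken from the chart.

```python
# Hypothetical helper (not from the course): the decision chart as lookups.

# One variable: (variable type, what it is compared to) -> test.
ONE_VARIABLE = {
    ("categorical", "proportion"): "Binomial test",
    ("categorical", "distribution"): "Chi-squared goodness-of-fit test",
    ("numerical", None): "One-sample t-test",
}

# Two variables: (explanatory type, response type) -> candidate tests.
TWO_VARIABLES = {
    ("categorical", "categorical"): ["Contingency analysis"],
    ("numerical", "categorical"): ["Logistic regression"],
    ("categorical", "numerical"): ["t-test", "ANOVA"],
    ("numerical", "numerical"): ["Correlation", "Regression"],
}


def one_variable_test(var_type, compare_to=None):
    """Return the test the chart lists for a single variable.

    compare_to ("proportion" or "distribution") only matters for a
    categorical variable; it is ignored for a numerical one.
    """
    key = (var_type, compare_to if var_type == "categorical" else None)
    if key not in ONE_VARIABLE:
        raise ValueError(f"no entry in the chart for {key}")
    return ONE_VARIABLE[key]


def two_variable_tests(explanatory, response):
    """Return the tests the chart lists for an explanatory/response pair."""
    key = (explanatory, response)
    if key not in TWO_VARIABLES:
        raise ValueError(f"no entry in the chart for {key}")
    return TWO_VARIABLES[key]


if __name__ == "__main__":
    print(one_variable_test("categorical", "proportion"))
    print(two_variable_tests("categorical", "numerical"))
```

The chart still leaves a choice within a cell (for example, t-test versus ANOVA, or correlation versus regression), so the two-variable lookup returns every candidate rather than picking one.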