Introduction to Statistics
Class Overheads for APA 3381 - part 2
"Measurement and Data Analysis in Human Kinetics"
by D. Gordon E. Robertson, PhD, FCSB
School of Human Kinetics, University of Ottawa
Copyright © D.G.E. Robertson, October 2015

Hypothesis Testing

Hypothesis: a conjecture, proposition or statement based on published literature, data or a theory, which may or may not be true.

Statistical Hypothesis: a conjecture about a population parameter.
• usually stated in mathematical terms
• two types: null and alternate

Null Hypothesis (H0): states that there is NO difference between a parameter and a specific value or among several different parameters.

Alternate Hypothesis (H1): states that there is a "significant" difference between a parameter and a specific value or among several different parameters.

Examples:
• H0: μ = 82 kg      H1: μ ≠ 82 kg*
• H0: μ ≤ 150 cm     H1: μ > 150 cm
• H0: μ ≥ 65.0 s     H1: μ < 65.0 s
• H0: μ0 = μ1        H1: μ0 ≠ μ1*
• H0: μ0 ≥ μ1        H1: μ0 < μ1

Notice that the equality symbols always appear in the null hypotheses.
* These are called two-tailed tests; the others are all "directional" or one-tailed tests.

Two-tailed vs One-tailed Tests

Two-tailed: also called a non-directional test.
• the null hypothesis is "disproved" if the sample mean falls in either tail
• the most appropriate test, especially when there is no previous experimentation
• less powerful than one-tailed

One-tailed: also called a directional test.
• the researcher must have a reason that permits selecting in which tail the test will be done, i.e., will the experimental protocol increase or decrease the sample statistic?
• more powerful than two-tailed, since it is easier to achieve a significant difference
• fails to handle the situation when the sample mean falls in the "wrong" tail

(Figures: critical regions for the one-tailed left and one-tailed right tests.)

Statistical Testing

To determine the veracity (truth) of a hypothesis, a statistical test must be undertaken that yields a test value. This value is then evaluated to determine whether it falls in the critical region of an appropriate probability distribution for a given significance or alpha (α) level.

The critical region is the region of the probability distribution that rejects the null hypothesis. Its limit(s), called the critical value(s), are defined by the specified confidence level.

The confidence level must be selected in advance of computing the test value. To do otherwise is statistical dishonesty. When in doubt, always use a two-tailed test.

Truth table:

                                H0 is true (H1 is false)        H0 is false (H1 is true)
Test rejects H0 (accepts H1)    Error (α): Type I error         Correct (1 − β) (experiment succeeded)
Test does not reject H0         Correct (1 − α)                 Error (β): Type II error
(accepts H0)                    (experiment failed)

z-Test and t-Test

Test for a Single Mean:
• used to test a single sample mean (X̄) when the population mean (μ) is known
• Is the sample taken from the population, or is it different (greater, lesser, or either)?

z-Test:
• used when the population standard deviation (σ) is known
• Test value: z = (X̄ − μ) / (σ / √n)
• if z is in the critical region defined by the critical value(s), the sample mean is "significantly different" from the population mean, μ
• if the sample size is less than 30, the data must be normally or approximately normally distributed

t-Test:
• if σ is unknown, use the sample standard deviation s with the t-test and the t-distribution with d.f. = n − 1
• Test value: t = (X̄ − μ) / (s / √n)
• if the sample size is less than 30, the data must be normally or approximately normally distributed

Flow Diagram for Choosing the Correct Statistical Test

Same as the flow diagram used for confidence intervals. Generally the sample's mean and standard deviation are used with the t-distribution. The t-distribution becomes indistinguishable from the z-distribution (normal distribution) when n is 30 or greater. A code sketch of both single-mean tests follows.
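As a concrete illustration of the two single-mean tests above, here is a minimal Python sketch using scipy. The sample values, the hypothesized μ = 82 kg and the "known" σ = 2.0 kg are invented for the example.

```python
import math
from scipy import stats

# Hypothetical sample: body masses (kg) of n = 25 athletes.
sample = [81.2, 84.5, 79.8, 83.1, 80.4, 82.9, 85.0, 78.6, 81.7, 83.8,
          80.9, 82.2, 84.1, 79.3, 81.5, 83.4, 80.1, 82.7, 84.8, 79.9,
          81.0, 83.6, 80.6, 82.4, 84.3]
n = len(sample)
xbar = sum(sample) / n
mu0 = 82.0                      # H0: mu = 82 kg (two-tailed)

# z-test: population s.d. assumed known (sigma = 2.0, for illustration only).
sigma = 2.0
z = (xbar - mu0) / (sigma / math.sqrt(n))
p_z = 2 * stats.norm.sf(abs(z))            # two-tailed P-value

# t-test: sigma unknown, use sample s.d. with d.f. = n - 1.
t_stat, p_t = stats.ttest_1samp(sample, mu0)   # two-tailed by default

alpha = 0.05
print(f"z = {z:.3f}, P = {p_z:.4f}, reject H0: {p_z < alpha}")
print(f"t = {t_stat:.3f}, P = {p_t:.4f}, reject H0: {p_t < alpha}")
```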
P-values of a Statistical Test

Instead of reporting significance levels (α = 0.05) or equivalent probabilities (P < 0.05), many researchers report the test values as probabilities or P-values (e.g., P = 0.0455, P = 0.253, P < 0.001). Advanced statistical programs report P-values; if not, use P < 0.05 or P < 0.01. If a test shows P = 0.000, report P < 0.0005.

Power of a Statistical Test

Power: the ability of a statistical test to detect a real difference.
• the probability of rejecting the null hypothesis when it is false (i.e., there is a real difference)
• equal to 1 − β (1 minus the probability of a Type II error)

Ways of increasing power:
• Increasing α will increase power, but it also increases the chance of a Type I error (usually not > 0.10).
• Increasing sample size is always a good choice, but costs increase.
• Using ratio or interval data versus nominal or ordinal. Tests involving ratio/interval data are called parametric tests; tests involving nominal and ordinal data are called nonparametric tests. Parametric tests are more powerful; use them when you can.
• Using repeated-measures tests, such as the repeated-measures t-test or repeated-measures ANOVA. By using the same subjects repeatedly, variability is reduced. But subjects could improve because of practice, or become worse because of fatigue or boredom.
• If variances are equal, using pooled estimates and the appropriate test.
• Using samples that represent extremes. Reduces the generalizability of the experiment's results.
• Standardizing testing procedures, which reduces variability.
• Using one-tailed vs. two-tailed tests. A serious problem occurs if the results are in the wrong tail. Not recommended.

Testing Differences between Two Means

Large Independent Sample Means: used to test whether the data from two samples come from the same population or from two different populations.

Assumptions:
• data were randomly sampled
• samples are independent, i.e., there can be no relationship between the two samples
• standard deviations are known and, if sample size < 30, the population(s) is/are normally distributed
• if more than two sample means are tested, adjustments must be made to the significance level (e.g., Bonferroni correction, α_Bonferroni = α / number of tests)

z-test:
Test value: z = (X̄1 − X̄2) / √( σ1²/n1 + σ2²/n2 )
The critical value comes from the standard normal (z) distribution. Use a one- or two-tailed test; conservatively, choose the two-tailed test. Values are also available at the bottom of the t-distribution table.

The Step-by-Step Approach

Step 1: State hypotheses.
Two-tailed: H0: μ1 = μ2; H1: μ1 ≠ μ2
One-tailed: H0: μ1 ≤ μ2 and H1: μ1 > μ2, or H0: μ1 ≥ μ2 and H1: μ1 < μ2

Step 2: Find the critical value.
Look up the z-score for the specified significance (α) level and for a one- or two-tailed test (selected in advance). Usually use α = 0.05 and a two-tailed test, i.e., z_critical = ±1.960. For one-tailed tests use z_critical = −1.645 or +1.645.

Step 3: Compute the test value (formula above).

Step 4: Make the decision.
Draw a diagram of the normal distribution and its critical regions. If the test value is in the critical region, reject the null hypothesis; otherwise do not reject.

Step 5: Summarize the results.
Restate the hypothesis (null or alternate) accepted in step 4. If the null is rejected: there is enough evidence to reject the null hypothesis. If the null is not rejected: there is not enough evidence to reject the null hypothesis. Optionally, reword the hypothesis in "lay" terms, e.g., there is or is not a difference between the two populations, or one population is greater/lesser than the other for the independent variable. (A sketch of these five steps in code follows.)
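Below is a minimal Python sketch of the five-step approach for the large-sample z-test; the group means, standard deviations and sample sizes are hypothetical.

```python
import math
from scipy import stats

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, alpha=0.05, tails=2):
    """Large-sample z-test for the difference between two independent means.
    Population standard deviations are assumed known."""
    # Step 2: critical value (two-tailed by default; right tail when tails=1).
    z_crit = stats.norm.ppf(1 - alpha / tails)
    # Step 3: test value.
    z = (xbar1 - xbar2) / math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    # Step 4: decision.
    reject = abs(z) > z_crit if tails == 2 else z > z_crit
    return z, z_crit, reject

# Hypothetical data: 60-m sprint times (s) for two training groups.
z, z_crit, reject = two_sample_z(xbar1=8.42, xbar2=8.71,
                                 sigma1=0.35, sigma2=0.40,
                                 n1=36, n2=40)
print(f"z = {z:.3f}, critical = ±{z_crit:.3f}")
print("Reject H0" if reject else "Do not reject H0")   # Step 5
```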
Testing Differences between Two Means

Small Independent Sample Means: when the population standard deviations are unknown and the sample size is < 30, use the t-distribution for critical values and a t-test for test values. Use the F-ratio to determine whether the sample variances are equal or unequal, then choose the appropriate t-test.

Assumptions:
• samples are random samples
• the two samples must be independent, i.e., different subjects; if not, use the "dependent groups t-test"
• populations must be normally distributed

If the sample variances are NOT equal:
Test value: t = (X̄1 − X̄2) / √( s1²/n1 + s2²/n2 )
For the degrees of freedom (d.f.) use the smaller of n1 − 1 and n2 − 1 (the conservative choice: a higher critical value).

If the sample variances are equal:
Test value: t = (X̄1 − X̄2) / [ sp √( 1/n1 + 1/n2 ) ], where sp² = [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2)
and d.f. = n1 + n2 − 2. This uses a "pooled" estimate of variance that, combined with the increased degrees of freedom, increases the test's power.

Test for Equal Variances

Also called Homogeneity of Variance.
• used primarily to determine which t-test to use
• uses the F-distribution and F-test (later used for ANOVA)
• assume the variances are equal and test whether they are unequal
• SPSS uses "Levene's Test for Equality of Variances": if P (Sig.) < α, the variances are NOT equal

Step 1: Always a two-tailed test. H0: σ1² = σ2²; H1: σ1² ≠ σ2²
Step 2: Find the critical value (F_CV) from the F-distribution. Use the degrees of freedom of the larger variance as the numerator (df_N = n_larger − 1) and the degrees of freedom of the smaller variance as the denominator (df_D = n_smaller − 1).
Step 3: Compute the test value: F_TV = s_larger² / s_smaller². Note, F_TV will always be ≥ 1.
Steps 4 and 5: If F_TV > F_CV, reject H0 and conclude the variances are unequal. If F_TV ≤ F_CV, do NOT reject H0 and conclude the variances are equal, i.e., you have homogeneity of variances. You can now select the appropriate "Independent Groups t-test".

Flow Diagram for Choosing the Correct Independent Samples t-Test

Similar to the flow diagram used for single sample means, but requires a test for equality of variances (homogeneity of variance). Generally the sample's mean and standard deviation are used with the t-distribution. The t-distribution becomes indistinguishable from the z-distribution (i.e., the normal distribution) when n is 30 or greater. Samples must be random and independent. (See the code sketch below.)
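A sketch of that flow diagram in code, assuming scipy: an F-test (larger variance over smaller) decides between the pooled and separate-variance t-tests. Note that scipy's separate-variance test uses the Welch approximation for degrees of freedom rather than the conservative smaller of n1 − 1 and n2 − 1; the grip-strength values are made up.

```python
import statistics
from scipy import stats

def independent_t(sample1, sample2, alpha=0.05):
    """Pick the pooled or separate-variance t-test after an F-test for
    homogeneity of variance (larger variance in the numerator, so F >= 1)."""
    v1 = statistics.variance(sample1)
    v2 = statistics.variance(sample2)
    f_tv = max(v1, v2) / min(v1, v2)
    df_n = (len(sample1) if v1 >= v2 else len(sample2)) - 1   # larger variance
    df_d = (len(sample2) if v1 >= v2 else len(sample1)) - 1   # smaller variance
    f_cv = stats.f.ppf(1 - alpha / 2, df_n, df_d)             # two-tailed test
    equal_var = f_tv <= f_cv
    # Pooled t-test (d.f. = n1 + n2 - 2) if variances are equal; otherwise
    # scipy's Welch test, which approximates the degrees of freedom.
    t, p = stats.ttest_ind(sample1, sample2, equal_var=equal_var)
    return f_tv, f_cv, equal_var, t, p

# Hypothetical grip-strength data (kg) for two independent groups.
g1 = [42.1, 44.3, 43.5, 45.0, 42.8, 44.6, 43.2]
g2 = [40.9, 42.0, 41.4, 43.1, 40.5, 42.7, 41.8]
f_tv, f_cv, eq, t, p = independent_t(g1, g2)
print(f"F = {f_tv:.2f} vs critical {f_cv:.2f} -> equal variances: {eq}")
print(f"t = {t:.3f}, P = {p:.4f}")
```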
Testing Differences between Two Means

Dependent Sample Means: used when the two samples are not independent. More powerful than the independent groups t-test and easier to perform (no variance test required). Simplifies the research protocol (i.e., fewer subjects), but the dependence may limit generalizability.

Examples:
• repeated measures (test/retest, before/after)
• matched pairs (subjects matched by a relevant variable: height, weight, shoe size, IQ score, age)
• twin studies (identical, heterozygotic, living apart)

Step 1: State hypotheses.
Two-tailed: H0: μD = 0; H1: μD ≠ 0
One-tailed: H0: μD ≤ 0 and H1: μD > 0, or H0: μD ≥ 0 and H1: μD < 0
Step 2: Critical value from the t-distribution with degrees of freedom equal to the number of data pairs minus one (d.f. = n − 1).
Step 3: Compute the differences between pairs (D), then the mean difference (D̄) and sD.
Test value: t = D̄ / (sD / √n)
Steps 4 and 5: If |test value| > critical value, reject H0; otherwise there is no difference between the two trials/groups.

Correlation and Regression

Linear Correlation:
• Does one variable increase or decrease linearly with another?
• Is there a linear relationship between two or more variables?

Scattergrams

(Figures: scattergrams of types of linear relationships: positive linear, negative linear, none or weak, weak linear, strong linear; and other relationships: nonlinear or curvilinear, linear and exponential.)

Correlation

Pearson Product Moment Correlation Coefficient:
• simply called the correlation coefficient, PPMC, or r-value
• linear correlation between two variables
Examples: Weight increases with height. IQ with brain size?! Used for the calibration of instruments: force transducers, spring scales, electrogoniometers (measure joint angles).

Multiple Correlation:
• used when several independent variables influence a dependent variable
• R-value (capital R vs. r)
Defined as: Y = A + B1·X1 + B2·X2 + B3·X3 + ... + Bn·Xn
Examples:
• Heart disease is affected by family history, obesity, smoking, diet, etc.
• Academic performance is affected by intelligence, economics, experience, memory, etc.
• Lean body mass is predicted by a combination of body mass, thigh, triceps and abdominal skinfold measures.

Significance of Correlation Coefficient

Method 1
Step 1: H0: ρ = 0; H1: ρ ≠ 0
Step 2: Look up r_crit for n − 2 degrees of freedom (Table I).
Step 3: Compute the sample r.
Step 4: The sample r is significant if its absolute value is greater than r_crit.
Step 5: If significance occurs, the data are linearly correlated; otherwise they are not.

If a table of significant correlation coefficients is not available, or the significance level (α) is not 0.05 or 0.01, use Method 2.

Method 2
Step 1: H0: ρ = 0; H1: ρ ≠ 0
Step 2: Look up t_crit for n − 2 degrees of freedom.
Step 3: Compute the sample r, then t = r √[ (n − 2) / (1 − r²) ].
Step 4: The sample t is significant if its absolute value is greater than t_crit.
Step 5: If significance occurs, the data are linearly correlated; otherwise they are not.

Regression

Regression: can only be done if a significant correlation exists.
• The equation of the line or curve that defines the relationship between the variables.
• The "line of best fit".
• The mathematical technique is called the "least squares" method. This technique computes the line that minimizes the squares of the deviations of the data from the line.

Coefficient of Determination and Standard Error of Estimate

Coefficient of Determination:
• Measures the strength of the relationship between the two variables.
• Equal to the explained variation divided by the total variation = r².
• Usually given as a percentage, i.e., coefficient of determination = r² × 100%.
For example, an r of 0.90 has 81% of the total variation explained, but an r of 0.60 has only 36%. A correlation may be significant yet explain very little.

Standard Error of Estimate:
• A measure of the variability of the observed values about the regression line.
• Can be used to compute a confidence interval for a predicted value.
Standard error of estimate: s_est = √[ Σ(y − y′)² / (n − 2) ], where y′ is the value predicted by the regression line.
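The following sketch ties the last few slides together in Python: test the significance of r (Method 2), then, only if it is significant, fit the least-squares line and report r² and the standard error of estimate. The calibration data are invented.

```python
import math
from scipy import stats

# Hypothetical calibration data: load (N) vs force-transducer output (V).
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
y = [0.1, 5.2, 9.8, 15.3, 19.9, 25.4, 29.7, 35.2]
n = len(x)

# Method 2: convert r to a t value with n - 2 degrees of freedom.
r, p = stats.pearsonr(x, y)            # pearsonr also returns a two-tailed P
t = r * math.sqrt((n - 2) / (1 - r**2))
print(f"r = {r:.4f}, t = {t:.2f}, P = {p:.2e}")

if p < 0.05:                           # regression only if r is significant
    res = stats.linregress(x, y)       # least-squares line y' = a + b x
    a, b = res.intercept, res.slope
    y_pred = [a + b * xi for xi in x]
    r2 = r**2                          # coefficient of determination
    s_est = math.sqrt(sum((yi - ypi)**2 for yi, ypi in zip(y, y_pred)) / (n - 2))
    print(f"y' = {a:.3f} + {b:.3f}x, r^2 = {r2 * 100:.1f}%, s_est = {s_est:.3f}")
```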
Possible Reasons for a Significant Correlation

1. There is a direct cause-and-effect relationship between the variables. That is, x causes y. For example, positive reinforcement improves learning, smoking causes lung cancer and heat causes ice to melt.

2. There is a reverse cause-and-effect relationship between the variables. That is, y causes x. For example, suppose a researcher believes that excessive coffee consumption causes nervousness, but fails to consider that the reverse situation may occur: it may be that nervous people crave coffee.

3. The relationship between the variables may be caused by a third variable. For example, if a statistician correlated the number of deaths due to drowning and the number of cans of soft drinks consumed during the summer, he or she would probably find a significant relationship. However, the soft drink is not necessarily responsible for the deaths, since both variables may be related to heat and humidity.

4. There may be a complexity of interrelationships among many variables. For example, a researcher may find a significant relationship between students' high school grades and college grades. But there probably are many other variables involved, such as IQ, hours of study, influence of parents, motivation, age and instructors.

5. The relationship may be coincidental. For example, a researcher may be able to find a significant relationship between the increase in the number of people who are exercising and the increase in the number of people who are committing crimes. But common sense dictates that any relationship between these two variables must be due to coincidence.

Comparing Frequencies using Chi-square

Chi-square or χ²: pronounced "ki squared".
• Used to test whether the frequencies of nominal data fit a certain pattern (goodness of fit) or whether two variables have a dependency (test for independence).
• Can be used to test whether data are normally distributed, and for homogeneity of proportions.
• The frequency of each nominal category is computed and compared to an expected frequency.

Goodness of Fit:
• Need to know the expected pattern of frequencies.
• If not known, assume an equal distribution among all categories.

Assumptions:
• data are from a random sample
• the expected frequency for each category must be 5 or more

Examples:
• test for product/procedure preference (each is assumed equally likely to be selected)
• test for "fairness" of a coin, die or roulette wheel (expect each outcome equally)
• test for an expected frequency distribution (need a theoretically expected pattern)

Goodness of Fit Test

Step 1: H0: the data fit the expected pattern; H1: the data do not fit the expected pattern.
Step 2: Find the critical value from the χ² table. The test is always a one-tailed right test with n − 1 degrees of freedom, where n is the number of categories.
Step 3: Compute the test value from: χ² = Σ (O − E)² / E, where O = observed frequency and E = expected frequency.
Step 4: Make the decision. If χ² > critical value, reject H0.
Step 5: Summarize the results. E.g., there is (not) enough evidence to accept/reject the claim that there is a preference for ________. E.g., the coin is fair/unfair; the die is fair/"loaded"; the wheel is fair/flawed.
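A minimal goodness-of-fit sketch in Python for the die-fairness example; the 60 observed rolls are invented.

```python
from scipy import stats

# Hypothetical die-fairness check: 60 rolls, observed count per face.
observed = [12, 8, 9, 13, 7, 11]
n = sum(observed)                    # 60 rolls
expected = [n / 6] * 6               # fair die: each face expected 10 times

# Step 3: chi-square test value, chi2 = sum((O - E)^2 / E).
chi2, p = stats.chisquare(observed, f_exp=expected)

# Steps 4-5: compare with the critical value (one-tailed right, d.f. = 5).
alpha = 0.05
chi2_crit = stats.chi2.ppf(1 - alpha, df=len(observed) - 1)
print(f"chi2 = {chi2:.2f}, critical = {chi2_crit:.2f}, P = {p:.3f}")
print("Die appears loaded" if chi2 > chi2_crit else "No evidence die is unfair")
```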
Test for Independence

Step 1: H0: the two variables are independent; H1: the two variables are dependent.
Step 2: Find the critical value from the χ² table. The test is always one-tailed right with (n_rows − 1)(n_cols − 1) degrees of freedom, where n_rows and n_cols are the numbers of categories of each variable. These correspond to the numbers of rows and columns in the contingency table.
Step 3: Create the contingency table to derive the expected values (see below). Compute the test value from: χ² = Σ (O − E)² / E, where O = observed frequency and E = expected frequency.
Step 4: Make the decision. If χ² > critical value, reject H0.
Step 5: Summarize the results. E.g., getting a cold is dependent upon whether you took a cold vaccine; smoking and lung disease are dependent; is a cure dependent upon placebo vs. drug?

Contingency Table

First, enter the observed (O) scores and compute the row and column totals:

              Col. 1   Col. 2   Col. 3   Row totals
Row 1           25       10        5         40
Row 2           10       20        5         35
Row 3            5       30       15         50
Col. totals     40       60       25        125*

* Notice that the sums of the row and column totals must be equal.

Second, compute the expected (E) values based on the row and column totals, E = (row total × column total) / grand total:

        Col. 1             Col. 2             Col. 3
Row 1   40×40/125 = E11    40×60/125 = E12    40×25/125 = E13
Row 2   35×40/125 = E21    35×60/125 = E22    35×25/125 = E23
Row 3   50×40/125 = E31    50×60/125 = E32    50×25/125 = E33

Finally, compute the test value: χ² = Σ (O − E)² / E.
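The same contingency table worked in Python: scipy's chi2_contingency derives the expected values from the row and column totals and computes the test value.

```python
from scipy import stats

# The observed contingency table from the slide above (3 rows x 3 columns).
observed = [[25, 10,  5],
            [10, 20,  5],
            [ 5, 30, 15]]

# chi2_contingency computes E = row total x column total / grand total
# for every cell, then the test value chi2 = sum((O - E)^2 / E).
chi2, p, df, expected = stats.chi2_contingency(observed)

alpha = 0.05
chi2_crit = stats.chi2.ppf(1 - alpha, df)   # df = (3 - 1)(3 - 1) = 4
print(f"chi2 = {chi2:.2f}, critical = {chi2_crit:.2f}, P = {p:.4f}")
print("Variables are dependent" if chi2 > chi2_crit
      else "Variables are independent")
print(expected)   # matches the E values computed by hand above
```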
Analysis of Variance (ANOVA)

One-way ANOVA:
• used to test for significant differences among sample means
• differs from the t-test since more than 2 groups are tested, simultaneously
• one factor (independent variable) is analyzed, also called the "grouping" variable
• the dependent variable should be interval or ratio, but the factor is nominal

Factorial Design:
• groups must be independent (i.e., the subjects in each group are different and unrelated)

Assumptions:
• data must be normally distributed, or nearly so
• variances must be equal (i.e., homogeneity of variance)

Examples:
• Does fitness level (VO2max) depend on province of residence? Fitness level is a ratio variable; residence is a nominal variable.
• Does statistics grade depend on the highest level of mathematics course taken?
• Does hand grip strength vary with gender? (Can be done with a t-test; the t-test can handle equal or unequal variances.)

One-way ANOVA

An ANOVA tests whether one or more sample means are significantly different from each other. To determine which or how many sample means are different requires post hoc testing.

(Figures: two samples whose means are significantly different; two sample means that are NOT significantly different because of a smaller difference and high variability; with the same difference between means, reduced variances can make the means significantly different.)

One-way ANOVA

Step 1: H0: all sample means are equal; H1: at least one mean is different.
Step 2: Find the critical value from the F table (Table H). The tables are for a one-tailed test; ANOVA is always one-tailed.
Step 3: Compute the test value from: F = sB² / sW², the between-groups mean square divided by the within-groups mean square (see the summary table below).
Step 4: Make the decision. If F > critical value, reject H0.
Step 5: Summarize the results with an ANOVA table. All the means are the same, i.e., come from the same population, or at least one mean is significantly different.
Step 6: If a significant difference is found, perform post hoc testing to determine which mean(s) is/are different.

ANOVA Summary Table

Source                 Sums of squares   d.f.                Mean square          F          P
Between                SSB               k − 1               SSB/(k−1) = sB²      sB²/sW²
(also called
 Main effect)
Within                 SSW               N − k               SSW/(N−k) = sW²
(also called
 Error term)
Total                  SSB + SSW         (k−1)+(N−k) = N−1

Examples:

One-way Factorial
Source    Sums of squares   d.f.   Mean square   F      P
Between   160.13            2      80.07         9.17   <0.01
Within    104.80            12     8.73
Total     264.93            14

Two-way Factorial
Source     Sums of squares   d.f.   Mean square   F        P
Factor A   3.920             1      3.920         4.752    NS
Factor B   9.690             1      9.690         11.733   <0.025
A×B        54.080            1      54.080        65.552   <0.005
Within     3.300             4      0.825
Total      70.980            7

Post Hoc Testing

Post hoc testing:
• used to determine which mean or group of means is/are significantly different from the others
• many different choices depending upon the research design and research question (Duncan's, Scheffé's, Tukey's HSD, ...)
• only done when the ANOVA yields a significant F

Scheffé test:
• when sample sizes are unequal
• when a conservative test is desirable
• when all comparisons are to be tested
Critical value: use the critical value from the ANOVA and multiply by k − 1, where k = number of groups (means): F′critical = (k − 1) Fcritical
Test value: Fs = (X̄i − X̄j)² / [ sW² (1/ni + 1/nj) ]
Decision: if Fs > F′critical, then the two means are significantly different.
Summary: graph the sample means.

Post Hoc Testing 2

Tukey HSD test:
• sample sizes must be equal
• used when a less conservative test, i.e., a more powerful one, is desirable
• when only some comparisons are to be tested
Critical value: use Table N, where k = number of groups and v = degrees of freedom of sW².
Test value: q = (X̄i − X̄j) / √( sW² / n )
Decision: if q > critical value, then the means are significantly different.
Summary: graph the results and summarize. (A one-way ANOVA sketch with a Tukey post hoc follows.)
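A brief one-way ANOVA sketch in Python with Tukey's HSD as the post hoc test (equal group sizes). The VO2max scores are invented, and stats.tukey_hsd requires scipy 1.8 or newer.

```python
from scipy import stats

# Hypothetical VO2max scores (ml/kg/min) for three training groups (equal n).
g1 = [45.2, 48.1, 44.7, 47.3, 46.0]
g2 = [50.4, 52.8, 49.9, 51.6, 53.0]
g3 = [46.5, 45.1, 47.8, 44.9, 46.2]

# One-way ANOVA: F = between-groups mean square / within-groups mean square.
f_val, p_val = stats.f_oneway(g1, g2, g3)
print(f"F = {f_val:.2f}, P = {p_val:.4f}")

# Post hoc testing only if the ANOVA F is significant.
if p_val < 0.05:
    result = stats.tukey_hsd(g1, g2, g3)   # pairwise comparisons of all means
    print(result)                          # table of differences and P-values
```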
Nonparametric Statistics

Nonparametric or distribution-free statistics:
• used when data are ordinal (i.e., rankings)
• used when ratio/interval data are not normally distributed (data are converted to ranks)
• for studies not involving population parameters

Advantages:
• no assumptions about the distribution of the data
• suitable for ordinal data
• can test hypotheses that do not involve population parameters
• computations are easier to perform
• results are easier to understand

Disadvantages:
• less powerful (less sensitive) than parametric tests
• use less information than parametric tests (ratio data are reduced to an ordinal scale)
• less efficient, therefore larger sample sizes are needed to detect significance

Examples:
• Is there a bias in the rankings of judges from different countries?
• Is there a correlation between the rankings of two judges?
• Do different groups rank a professor's teaching differently?

Paired Sign Test
• used for repeated-measures tests
Step 1: H0: no change/increase/decrease between the before and after tests; H1: there was a change/increase/decrease.
Step 2: Find the critical value from the table (Table J) for the given α level, sample size (> 7) and whether the hypothesis is one- or two-tailed.
Step 3: Subtract the "after" from the "before" scores, then count the number of positive (+) AND negative (−) differences. Zeros (pairs that are equal) do not count.
Step 4: Make the decision. If the smaller count (+ or −) is less than the critical value, reject H0.
Step 5: Summarize the result, i.e., there was/was not a change, or there was/was not an increase/decrease in the dependent variable.

Wilcoxon Rank Sum Test
• also called the Mann-Whitney U test
• used to compare two independent groups
• replacement for the independent groups t-test
Step 1: H0: no difference/increase/decrease between the groups; H1: there was a difference/increase/decrease between the groups.
Step 2: Find the critical value from the z-table (Table E) for the given α level and whether the hypothesis is one- or two-tailed.
Step 3:
• Rank all the data together.
• Sum the ranks of the group with the smaller size (n1); call this R.
• Compute the test value: z = (R − μR) / σR, where μR = n1(n1 + n2 + 1)/2 and σR = √[ n1 n2 (n1 + n2 + 1) / 12 ]. (Note, n1 and n2 must be 10 or greater, and n2 is the larger of the two sample sizes or equal to n1.)
Step 4: If the test value |z| > critical value, reject H0.
Step 5: Summarize the result.

Wilcoxon Signed-Rank Test
• more powerful than the Paired Sign Test
• used to compare two dependent samples (e.g., repeated measures)
• replaces the dependent groups t-test
Step 1: H0: no change/increase/decrease between the groups; H1: there is a change/increase/decrease.
Step 2: Find the critical value from the table (Table K) for the given α level, sample size (5 or greater) and whether the hypothesis is one- or two-tailed. Use the z-table (Table E) and the z test value if n > 30.
Step 3:
• compute the differences
• find the absolute values of the differences
• rank the differences
• sum the positive and negative ranks separately, and call the smaller absolute value the test value, ws
• if n > 30, use the test value z = [ ws − n(n + 1)/4 ] / √[ n(n + 1)(2n + 1) / 24 ]
Step 4: If ws is less than the critical value, or if |z| > the critical z value, reject H0.
Step 5: Summarize the result.

Kruskal-Wallis Test
• similar to the Wilcoxon Rank Sum test but for more than 2 groups
• replacement for the one-way ANOVA
Step 1: H0: there is no difference among the groups; H1: at least one group is different.
Step 2: Find the critical value from the χ²-table (Table G) for the given α level and degrees of freedom (k − 1). The test is always one-tailed (right-tailed).
Step 3:
• Rank all the data together.
• Sum the ranks within each group; call them R1, R2, R3, ..., Rk.
• Compute the test value, H, from: H = [ 12 / (N(N + 1)) ] Σ (Ri² / ni) − 3(N + 1), where N = n1 + n2 + n3 + ... + nk.
Step 4: If the test value > critical value, reject H0.
Step 5: Summarize the result, i.e., there is a difference in at least one sample.

Spearman Correlation
• similar to the Pearson correlation except the data are ordinal vs. ratio/interval
• data are ranked then correlated
Step 1: H0: ρ = 0; H1: ρ ≠ 0
Step 2: Find the critical value from the table (Table L) for the given α level and sample size, n (number of pairs); n must be greater than 5 for α = 0.05.
Step 3:
• Rank the data within each group.
• Compute the differences between pairs, Di.
• Compute the correlation coefficient from: rs = 1 − 6 Σ Di² / [ n(n² − 1) ], where n is the number of pairs (5 or more).
Step 4: If the absolute value of the test value > critical value, reject H0.
Step 5: Summarize the result, i.e., the data are correlated or uncorrelated. Note, no regression line is possible, since the data were converted to ranks.

Comparison of Statistical Tests

Data type                 Parametric           Nonparametric            Frequency
                          (Ratio, Interval)    (Ordinal)                (Nominal)
Single sample             z-test, t-test       Sign test*, K-S test*    χ² Goodness-of-fit
Two independent samples   z-test, t-test       Wilcoxon Rank Sum        χ² Test of Independence
                          (2 types)            (Mann-Whitney U)
Two dependent samples     Paired t-test        Paired Sign*,
                                               Wilcoxon signed-rank
More than two             One-way ANOVA        Kruskal-Wallis
independent samples
Two factors               Two-way ANOVA*
Correlation               Pearson              Spearman                 Phi*

* not studied in this course
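Finally, a sketch of three of the nonparametric tests above in Python; all of the scores and rankings are invented.

```python
from scipy import stats

# Hypothetical judging scores from two independent groups of judges.
group1 = [7.5, 8.0, 6.5, 9.0, 7.0, 8.5, 6.0, 7.8, 8.2, 6.8]
group2 = [5.5, 6.0, 7.2, 5.8, 6.4, 5.2, 6.9, 5.0, 6.1, 7.1]

# Wilcoxon Rank Sum / Mann-Whitney U: two independent groups.
u, p_u = stats.mannwhitneyu(group1, group2, alternative='two-sided')
print(f"Mann-Whitney U = {u:.1f}, P = {p_u:.4f}")

# Kruskal-Wallis: more than two independent groups.
group3 = [6.2, 7.4, 5.9, 6.6, 7.7, 6.3, 5.7, 7.9, 6.0, 7.3]
h, p_h = stats.kruskal(group1, group2, group3)
print(f"Kruskal-Wallis H = {h:.2f}, P = {p_h:.4f}")

# Spearman correlation: rankings of two judges over the same performances.
judge_a = [1, 2, 3, 4, 5, 6, 7, 8]
judge_b = [2, 1, 4, 3, 6, 5, 8, 7]
rs, p_s = stats.spearmanr(judge_a, judge_b)
print(f"Spearman r_s = {rs:.3f}, P = {p_s:.4f}")
```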