Kin 304
Inferential Statistics

“Statistics means never having to say you're certain”

As the name suggests, inferential statistics allow us to make inferences about the population, based upon the sample, with a specified degree of confidence.

The Scientific Method
– Select a sample representative of the population. The method of sample selection is crucial to this process, as is a sample size large enough to allow appropriate probability testing.
– Calculate the appropriate test statistic. The test statistic used is determined by the hypothesis being tested and the research design as a whole.
– Test the null hypothesis. Compare the calculated test statistic to its critical value at the predetermined level of acceptance.

Setting a Probability Level for Acceptance
Prior to analysis, the researcher must decide upon a level of acceptance. Tests of significance are conducted at pre-selected probability levels, symbolized by p or α. The vast majority of the time, a probability level of 0.05 is used.
– A p of .05 means that, if the null hypothesis is true, a result of this magnitude would be expected by chance only 5 times in 100. Rejecting the null hypothesis at this level is conventionally described as having 95% confidence in the result.
– A more stringent test would be one where p = 0.01, which translates to 99% confidence in the result.

No Rubber Yard Sticks
Either the researcher should pre-select one level of acceptance and stick to it, or do away with a set level of acceptance altogether and simply report the exact probability of each test statistic.
If, for instance, you had calculated a t statistic with an associated probability of p = 0.032, you could either say that the probability is lower than the pre-set acceptance level of 0.05, and therefore a significant difference at the 95% level of confidence, or simply report 0.032 as a percentage confidence (96.8%).

Significance of Statistical Tests
– The test statistic is calculated.
– The critical value of the test statistic is determined, based upon sample size and probability acceptance level (found in a table at the back of a stats book, or as part of the EXCEL stats report or SPSS output).
– The calculated test statistic must be greater than the critical value of the test statistic to accept a significant difference or relationship.

Degrees of    Probability         Degrees of    Probability
Freedom       0.05     0.01       Freedom       0.05     0.01
1             .997     1.000      24            .388     .496
2             .950     .990       25            .381     .487
3             .878     .959       26            .374     .478
4             .811     .917       27            .367     .470
5             .754     .874       28            .361     .463
6             .707     .834       29            .355     .456
7             .666     .798       30            .349     .449
8             .632     .765       35            .325     .418
9             .602     .735       40            .304     .393
10            .576     .708       45            .288     .372
11            .553     .684       50            .273     .354
12            .532     .661       60            .250     .325
13            .514     .641       70            .232     .302
14            .497     .623       80            .217     .283
15            .482     .606       90            .205     .267
16            .468     .590       100           .195     .254
17            .456     .575       125           .174     .228
18            .444     .561       150           .159     .208
19            .433     .549       200           .138     .181
20            .423     .537       300           .113     .148
21            .413     .526       400           .098     .128
22            .404     .515       500           .088     .115
23            .396     .505       1,000         .062     .081

Kin 304
Tests of Differences between Means: t-tests
– SEM
– Visual test of differences
– Independent t-test
– Paired t-test

Comparison
Is there a difference between two or more groups?
Test of difference between means
– t-test (two means only, small samples)
– ANOVA (Analysis of Variance): multiple means
– ANCOVA: covariates

t Tests

Standard Error of the Mean
\[ SEM = \frac{SD}{\sqrt{n}} \]
Describes how confident you are that the mean of the sample is the mean of the population.

Visual Test of Significant Difference between Means
[Figure: Mean A and Mean B, each shown with error bars of 1 Standard Error of the Mean]
– Overlapping standard error bars: no significant difference between the means of A and B.
– No overlap of standard error bars: a significant difference between the means of A and B at about 95% confidence.

Independent t-test
– Two independent groups are compared using an independent t-test (assuming equal variances), e.g. height difference between men and women.
– The t statistic is calculated using the difference between the means in relation to the variance in the two samples.
– A critical value of the t statistic is based upon sample size and probability acceptance level (found in a table at the back of a stats book, or as part of the EXCEL t-test report or SPSS output).
– The calculated t based upon your data must be greater than the critical value of t to accept a significant difference between means at the chosen level of probability.

t statistic
The t statistic quantifies the degree of overlap of the distributions.

Standard error of the difference between means
\[ s^2_{\bar{X}_1 - \bar{X}_2} = s^2_{\bar{X}_1} + s^2_{\bar{X}_2} \]
The variance of the difference between means is the sum of the squared standard errors of the two means. The standard error (S.E.) of the difference is then estimated by adding the squares of the two sample standard deviations, dividing by the sample size n of each group, and taking the square root.
\[ S.E. = \sqrt{(s_1^2 + s_2^2)/n} \]

t statistic
The t statistic is then calculated as the ratio of the difference between sample means to the standard error of the difference, with the degrees of freedom being equal to 2n - 2, where n is the size of each group.
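The calculation described above can be sketched in Python. This is a minimal illustration under the slide's assumptions (two independent groups of equal size n, equal variances). The function name `independent_t` and the height values are hypothetical, made up for the example rather than taken from the course data.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Return (t, df) for two independent groups of equal size n.

    Hypothetical helper: t = (mean_a - mean_b) / sqrt((s_a^2 + s_b^2) / n),
    with df = 2n - 2, following the description in the text.
    """
    n = len(group_a)
    if n != len(group_b) or n < 2:
        raise ValueError("this sketch assumes two groups of equal size n >= 2")
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    std_error = math.sqrt((var_a + var_b) / n)  # S.E. of the difference
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_error
    return t, 2 * n - 2


# Made-up heights in cm, for illustration only.
men = [178, 182, 175, 180, 185]
women = [165, 170, 160, 168, 162]

t_stat, df = independent_t(men, women)
print(f"t = {t_stat:.3f}, df = {df}")
```

With these made-up values, t is about 5.98 on 8 degrees of freedom, which would be compared against the tabled critical value of t for 8 degrees of freedom at the chosen probability level.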
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{(s_1^2 + s_2^2)/n}} \]

Critical values of t
Hypothesis:
– There is a difference between means
Degrees of Freedom = 2n – 2
tcalc > tcrit = significant difference

Table 2-7.1: Critical values of the t statistic

Degrees of    Probability
Freedom       0.050     0.025     0.010
1             12.706    25.452    63.657
2             4.303     6.205     9.925
3             3.182     4.176     5.841
4             2.776     3.495     4.604
5             2.571     3.163     4.032
6             2.447     2.969     3.707
7             2.365     2.841     3.499
8             2.306     2.752     3.355
9             2.262     2.685     3.250
10            2.228     2.634     3.169
11            2.201     2.593     3.106
12            2.179     2.560     3.055
13            2.160     2.533     3.012
14            2.145     2.510     2.977
15            2.131     2.490     2.947
16            2.120     2.473     2.921
17            2.110     2.458     2.898
18            2.101     2.445     2.878
19            2.093     2.433     2.861
20            2.086     2.423     2.845
21            2.080     2.414     2.831
22            2.074     2.406     2.819
23            2.069     2.398     2.807
24            2.064     2.391     2.797
25            2.060     2.385     2.787
26            2.056     2.379     2.779
27            2.052     2.373     2.771
28            2.048     2.368     2.763
29            2.045     2.364     2.756
30            2.042     2.360     2.750
35            2.030     2.342     2.724
40            2.021     2.329     2.704
45            2.014     2.319     2.690
50            2.008     2.310     2.678
55            2.004     2.304     2.669
60            2.000     2.299     2.660
70            1.994     2.290     2.648
80            1.989     2.284     2.638
90            1.986     2.279     2.631
100           1.982     2.276     2.625
120           1.980     2.270     2.617
∞             1.9600    2.2414    2.5758

Paired Comparison
Paired t Test, sometimes called t-test for correlated data
– “Before and After” Experiments
– Bilateral Symmetry
– Matched-pairs data

Paired t-test
Hypothesis:
– Is the mean of the differences between paired observations significantly different from zero?
The calculated t statistic is evaluated in the same way as for the independent test.
\[ t = \frac{\sum (X_1 - X_2)/n}{s_D/\sqrt{n}} \]
where s_D is the standard deviation of the paired differences X_1 - X_2 and the degrees of freedom are n - 1.

9 Subjects All Lose Weight
Paired Weight Loss Data, n = 9

Weight Before (kg)    Weight After (kg)    Weight Loss (kg)
89.0                  87.5                 1.5
67.0                  65.8                 1.2
112.0                 111.0                1.0
109.0                 108.5                0.5
56.0                  55.5                 0.5
123.5                 122.0                1.5
108.0                 106.5                1.5
73.0                  72.5                 0.5
83.0                  81.0                 2.0

Mean of differences = +1.13

MS EXCEL t-Test: Independent (WRONG ANALYSIS)

                                Before         After
Mean                            91.16666667    90.03333333
Variance                        537.875        531.11
Observations                    9              9
Pooled Variance                 534.4925
Hypothesized Mean Difference    0
df                              16
t Stat                          0.103990367
P(T<=t) one-tail                0.459234679
t Critical one-tail             1.745884219
P(T<=t) two-tail                0.918469359
t Critical two-tail             2.119904821

MS EXCEL t-Test: Paired (CORRECT ANALYSIS)

                                Before         After
Mean                            91.16666667    90.03333333
Variance                        537.875        531.11
Observations                    9              9
Pearson Correlation             0.999741718
Hypothesized Mean Difference    0
df                              8
t Stat                          6.23354978
P(T<=t) one-tail                0.000125066
t Critical one-tail             1.85954832
P(T<=t) two-tail                0.000250133
t Critical two-tail             2.306005626

Kin 304
Tests of Differences between Means: ANOVA – Analysis of Variance
– One-way ANOVA

ANOVA – Analysis of Variance
– Used for analysis of multiple group means.
– Similar to the independent t-test, in that the difference between means is evaluated based upon the variance about the means. However, multiple t-tests result in an increased chance of Type I error.
– The F (ratio) statistic is calculated and evaluated in comparison to the critical value of the F (ratio) statistic.

Tests of Difference – ANOVA

One-way ANOVA
– One grouping factor
  – HO: The population means are equal
  – HA: At least one group mean is different
– Two or more levels of grouping factor
  – Exposure = low, medium or high
  – Age Groups = 7-8, 9-10, 11-12, 13-14

F (ratio) Statistic
The F ratio compares two sources of variability in the scores. The variability among the sample means, called Between Group Variance, is compared with the variability among individual scores within each of the samples, called Within Group Variance.
[Figure: TOTAL variability divided into Between Groups and Within Groups]

Formula for sources of variation
\[ SS(Total) = SS(Between) + SS(Within) \]

ANOVA Summary Table

                  SS                          df     MS                   F
Between Groups    SS(Between)                 k-1    SS(Between)/(k-1)    MS(Between)/MS(Within)
Within Groups     SS(Within)                  N-k    SS(Within)/(N-k)
Total             SS(Within) + SS(Between)    N-1

Assumptions for ANOVA
– The populations from which the samples were obtained are approximately normally distributed.
– The samples are independent.
– The population value for the standard deviation between individuals is the same in each group. If standard deviations are unequal, transformation of values may be needed.

CFS Kids 17 – 19 years (Boys)

Descriptives (a): VO2MAX

                                                      95% Confidence Interval for Mean
AGE      N      Mean      Std. Deviation  Std. Error  Lower Bound    Upper Bound    Minimum  Maximum
17.00    198    5.1586    .75824          .05389      5.0523         5.2649         3.70     6.20
18.00    154    4.9896    .76877          .06195      4.8672         5.1120         3.70     6.20
19.00    121    5.0314    .79604          .07237      4.8881         5.1747         3.50     6.00
Total    473    5.0710    .77357          .03557      5.0011         5.1409         3.50     6.20
a. SEX = 1.00

ANOVA (a): VO2MAX

                  Sum of Squares    df     Mean Square    F        Sig.
Between Groups    2.729             2      1.364          2.292    .102
Within Groups     279.724           470    .595
Total             282.453           472
a. SEX = 1.00

ANOVA
– Dependent: VO2max
– Grouping Factor: Age (17, 18, 19)
– No significant difference between means for VO2max (p>0.05)

CFS Kids 17 – 19 years (Girls)

Descriptives (a): VO2MAX

                                                      95% Confidence Interval for Mean
AGE      N      Mean      Std. Deviation  Std. Error  Lower Bound    Upper Bound    Minimum  Maximum
17.00    146    3.7671    .38812          .03212      3.7036         3.8306         3.00     5.00
18.00    132    3.7174    .37610          .03274      3.6527         3.7822         3.00     5.20
19.00    152    3.6349    .33578          .02724      3.5811         3.6887         2.90     4.50
Total    430    3.7051    .37000          .01784      3.6700         3.7402         2.90     5.20
a. SEX = 2.00

ANOVA (a): VO2MAX

                  Sum of Squares    df     Mean Square    F        Sig.
Between Groups    1.331             2      .666           4.953    .007
Within Groups     57.397            427    .134
Total             58.729            429
a. SEX = 2.00

ANOVA
– Dependent: VO2max
– Grouping Factor: Age (17, 18, 19)
– Significant difference between means for VO2max (p<0.05)

Post Hoc tests
– Post hoc simply means that the test is a follow-up test done after the original ANOVA is found to be significant.
– One can do a series of comparisons, one for each two-way comparison of interest. E.g.
Scheffe or Tukey’s tests
– The Scheffe test is very conservative

Scheffe’s Post Hoc Test: Boys
Multiple Comparisons (a), Dependent Variable: VO2MAX, Scheffe

                      Mean Difference                        95% Confidence Interval
(I) AGE    (J) AGE    (I-J)              Std. Error   Sig.   Lower Bound    Upper Bound
17.00      18.00      .1690              .08289       .126   -.0346         .3725
           19.00      .1272              .08902       .361   -.0914         .3458
18.00      17.00      -.1690             .08289       .126   -.3725         .0346
           19.00      -.0418             .09372       .905   -.2719         .1883
19.00      17.00      -.1272             .08902       .361   -.3458         .0914
           18.00      .0418              .09372       .905   -.1883         .2719
a. SEX = 1.00

Scheffe’s Post Hoc Test: Girls
Multiple Comparisons (a), Dependent Variable: VO2MAX, Scheffe

                      Mean Difference                        95% Confidence Interval
(I) AGE    (J) AGE    (I-J)              Std. Error   Sig.   Lower Bound    Upper Bound
17.00      18.00      .0497              .04403       .529   -.0585         .1579
           19.00      .1323*             .04249       .008   .0279          .2366
18.00      17.00      -.0497             .04403       .529   -.1579         .0585
           19.00      .0826              .04362       .168   -.0246         .1897
19.00      17.00      -.1323*            .04249       .008   -.2366         -.0279
           18.00      -.0826             .04362       .168   -.1897         .0246
*. The mean difference is significant at the .05 level.
a. SEX = 2.00

– Boys: no significant differences, would not run post hoc tests
– Girls: VO2max for age 19 is significantly different than at age 17

ANOVA – Factorial design
Multiple factors
– Test of differences between means with two or more grouping factors, such that each factor is adjusted for the effect of the other
– Can evaluate significance of factor effects and interactions between them
– 2-way ANOVA: two factors considered simultaneously

Between-Subjects Factors

                N
AGE    17.00    344
       18.00    286
       19.00    273
SEX    1.00     473
       2.00     430

Tests of Between-Subjects Effects, Dependent Variable: VO2MAX

Source             Type III Sum of Squares    df     Mean Square    F            Sig.
Corrected Model    424.295 (a)                5      84.859         225.789      .000
Intercept          16946.730                  1      16946.730      45091.176    .000
AGE                3.032                      2      1.516          4.034        .018
SEX                403.923                    1      403.923        1074.742     .000
AGE * SEX          .715                       2      .358           .952         .386
Error              337.122                    897    .376
Total              18407.560                  903
Corrected Total    761.417                    902
a.
R Squared = .557 (Adjusted R Squared = .555)

Example: 2-way ANOVA
– Dependent: VO2max
– Grouping Factors
  – AGE (17, 18, 19)
  – SEX (1, 2)
– Significant difference in VO2max (p<0.05) by SEX = main effect
– Significant difference in VO2max (p<0.05) by AGE = main effect
– No significant interaction (p>0.05): AGE * SEX

Analysis of Covariance (ANCOVA)
– Taking into account a relationship of the dependent variable with another continuous variable (covariate) in testing the difference between means of one or more factors
– Tests significance of difference between regression lines

[Figure: Scatterplot showing correlations between skinfold-adjusted forearm girth (cm) and maximum grip strength (lbs) for men and women. Correlations shown: r = +0.78 and r = +0.75 for the separate sexes; r = +0.91 for ♂+♀ combined]

Use of t-tests for difference between means?

Group Statistics

          SEX    N     Mean      Std. Deviation    Std. Error Mean
SAFAGR    1.0    20    25.801    1.9882            .4446
          2.0    23    21.355    1.4569            .3038
GRIPR     1.0    20    52.310    7.8432            1.7538
          2.0    23    35.304    6.8536            1.4291

Independent Samples Test

Levene's Test for Equality of Variances
                                     F        Sig.
SAFAGR  Equal variances assumed      1.713    .198
GRIPR   Equal variances assumed      .525     .473

t-test for Equality of Means
                                                                       Mean         Std. Error    95% Confidence Interval of the Difference
                                       t        df        Sig. (2-tailed)  Difference   Difference    Lower       Upper
SAFAGR  Equal variances assumed        8.437    41        .000             4.446        .5270         3.3816      5.5101
        Equal variances not assumed    8.257    34.408    .000             4.446        .5385         3.3521      5.5397
GRIPR   Equal variances assumed        7.589    41        .000             17.006       2.2407        12.4804     21.5309
        Equal variances not assumed    7.517    38.101    .000             17.006       2.2623        12.4263     21.5850

Men are significantly (p<0.05) bigger than women in skinfold-adjusted forearm girth and grip strength.

ANCOVA
– Dependent: Maximum Grip Strength (GRIPR)
– Grouping Factor: Sex
– Covariate: Skinfold-adjusted Forearm Girth (SAFAGR)

Tests of Between-Subjects Effects, Dependent Variable: GRIPR

Source             Type III Sum of Squares    df    Mean Square    F         Sig.
Corrected Model    4378.670 (a)               2     2189.335       95.481    .000
Intercept          234.150                    1     234.150        10.212    .003
SAFAGR             1284.985                   1     1284.985       56.041    .000
SEX                25.731                     1     25.731         1.122     .296
Error              917.182                    40    22.930
Total              85596.020                  43
Corrected Total    5295.852                   42
a. R Squared = .827 (Adjusted R Squared = .818)

– SAFAGR is a significant covariate
– No significant difference between sexes in grip strength when adjusted for the covariate (representing muscle size)
– Therefore one regression line (not two, one for each sex) fits the relationship

3-way ANOVA
For 3-way ANOVA, there will be:
– three 2-way interactions (AxB, AxC, BxC)
– one 3-way interaction (AxBxC)
If p > 0.05 for each interaction, then use the main effects results.
Typically ANOVA is used only for 3 or fewer grouping factors.

Repeated Measures ANOVA
– Repeated measures design: the same variable is measured several times over a period of time for each subject
– Pre- and post-test scores are the simplest design: use paired t-test
– Advantage: uses fewer experimental units (subjects) and provides a control for differences (the effect of variability due to differences between subjects can be eliminated)
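The paired t-test recommended for pre- and post-test designs can be sketched in Python using the nine-subject weight loss data from the paired comparison example. This is an illustrative sketch using only the standard library, and the function names `paired_t` and `pooled_independent_t` are hypothetical. The paired statistic is computed as the mean of the differences divided by the standard error of the differences; the pooled independent statistic is computed alongside it for contrast.

```python
import math
import statistics

# Weight before and after (kg) for the nine subjects in the weight loss example.
before = [89.0, 67.0, 112.0, 109.0, 56.0, 123.5, 108.0, 73.0, 83.0]
after = [87.5, 65.8, 111.0, 108.5, 55.5, 122.0, 106.5, 72.5, 81.0]


def paired_t(x1, x2):
    """Return (t, df): mean paired difference over its standard error."""
    diffs = [a - b for a, b in zip(x1, x2)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_error, n - 1


def pooled_independent_t(x1, x2):
    """Return (t, df) treating the samples as independent, equal variances."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * statistics.variance(x1)
                  + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(x1) - statistics.mean(x2)) / std_error, n1 + n2 - 2


t_paired, df_paired = paired_t(before, after)
t_indep, df_indep = pooled_independent_t(before, after)
print(f"paired:      t = {t_paired:.3f}, df = {df_paired}")
print(f"independent: t = {t_indep:.3f}, df = {df_indep}")
```

The paired result agrees with the EXCEL paired output (t Stat of about 6.23 on 8 degrees of freedom), while the independent calculation gives roughly 0.10 on 16 degrees of freedom, which illustrates why the independent analysis is the wrong one for these data.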