INFOWO Statistics lecture S3: Hypothesis testing
Peter de Waal
Department of Information and Computing Sciences
Faculty of Science, Universiteit Utrecht

Overall overview
  1. Descriptive statistics
  2. Scores and probability distributions
  3. Hypothesis testing and one-sample t-test
  4. More on t-tests
  5. Homogeneity and reliability
  6. Correlation and prediction
  7. Analysis of variance
  8. Chi²-test
  9. Q&A lecture

Today's overview
  - Recap
  - Hypothesis testing (Chapter 8)
    - Test procedure
    - Hypotheses H0 and H1
  - t-distribution (Chapter 9)
  - One-sample t-test

Today
  - Does the Facebook diet really work?
  - Or is the Breezer diet better?

Recap
Recall from Lecture M1:
  - Normal distribution:
    - Shape
    - Parameters µ and σ
    - Calculations
  - Sample and sampling distribution:
    - Sample distribution = (theoretical) distribution of data when one person/item is measured
    - Sampling distribution = distribution of data when a sample of n items is measured and the average is taken
  - Central Limit Theorem:
    - Sampling distribution is approximately normal
    - Confidence intervals

Hypothesis testing
  - Hypothesis: empirical formulation of a proposition, stated as a relationship between variables.
  - Examples:
    - There is a relation between a student's IQ score and their grade point average.
    - Students who live at home with their parents spend more time on Facebook.
    - There is a positive correlation between Facebook use and narcissism.

Hypothesis testing
THE BRAND NEW FACEBOOK DIET (AS SEEN ON THE WEB)!
Spend time on Facebook and lose weight... without excessive exercise!

Hypothesis testing
  - Randomly select test group
  - Put test group on Facebook regime (8 hours per day) for 4 weeks
  - Weigh test persons
  - Compare with known mean from reference population
  - Results?
    - People may gain weight due to inactivity
    - People may lose weight due to lack of time to eat
    - Weight may not change at all
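The recap's statement that the sampling distribution of the mean is approximately normal can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the skewed population (exponential with mean 1 and standard deviation 1), the sample size n = 25, the number of replications, and the function name `simulate_sample_means` are all choices made here, not part of the lecture.

```python
import random
import statistics


def simulate_sample_means(n, replications, seed=0):
    """Return the means of `replications` samples of size n.

    Assumption for illustration: the population is exponential with
    mean 1 and standard deviation 1, i.e. clearly not normal.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(replications)
    ]


n = 25
means = simulate_sample_means(n, replications=5000)

# Theory: the sample mean has mean mu = 1 and standard error sigma / sqrt(n).
standard_error = 1.0 / n ** 0.5
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

# If the sampling distribution is roughly normal, about 95% of the sample
# means should lie within 1.96 standard errors of the population mean.
inside = sum(abs(m - 1.0) <= 1.96 * standard_error for m in means)
coverage = inside / len(means)

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"sd of sample means:   {sd_of_means:.3f} (theory: {standard_error:.3f})")
print(f"share within 1.96 SE: {coverage:.3f}")
```

Even though single observations are strongly skewed here, the averages of 25 observations should cluster around the population mean with a spread close to σ/√n, which is the property the hypothesis tests in this lecture rely on.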
Some data
  - Mean weight of reference population (before diet) is µ = 80, σ = 20
  - After the test trial: mean weight in test group is $\bar{X} = 76$.
  - Does this mean the diet works? Or is it random fluctuation?
  - What if $\bar{X} = 86$? What if $\bar{X} = 70$?

[Figure: Basic experimental situation]

Hypothesis testing
  - Hypothesis testing: a statistical method that uses sample data to evaluate a hypothesis about a population.
  - Basic steps of hypothesis testing:
    1. Formulate hypotheses
    2. Set criteria for decision
    3. Collect data and compute sample statistic
    4. Make decision

1. Formulation of the hypotheses
Pose two possible, mutually exclusive hypotheses about the world or about a population:
  - Null hypothesis (H0): states that, in the general population, there is no change, no difference, or no relationship.
  - Alternative hypothesis (H1): states that there is a change, a difference, or a relationship in the general population.
  - Diet example:
    - $H_0: \mu_{\text{after}} = 80$ (Facebook diet has no effect)
    - $H_1: \mu_{\text{after}} \neq 80$ (Facebook diet has an effect)

2. Set criteria for decision
If H0 is true, which values for sample means are likely?
  - Significance level or alpha level
    - Defines the boundary between likely and unlikely
    - Denoted by the symbol α
    - Value is determined beforehand (i.e. before you take a sample!)
    - Typical values are α = 0.05 or α = 0.01.
  - Critical region
    - The extreme sample values that are very unlikely if H0 is true
    - Boundaries of the critical region are determined by α.

2. Set the criteria for the decision
[Figure: Critical region of $Z = \frac{\bar{X} - 80}{\sigma_{\bar{X}}}$ for α = 0.05]

Check!
True or false?
  - The critical region defines unlikely values if the null hypothesis is true. (True)
  - If the alpha level is decreased, the critical region becomes smaller. (True)

[Figure: Critical region boundaries]

3. Collect data and compute sample statistic
  - Data is collected after hypotheses are formulated.
  - Data is collected after criteria for decision are set.
  - This sequence assures objectivity.
  - Compute a sample statistic (in this case the Z-score) to show the exact position of the sample.

4. Make decision
After calculation of the sample statistic:
  - If the sample data are in the critical region: the null hypothesis is rejected.
  - If the sample data are not in the critical region: the researcher fails to reject the null hypothesis.

Examples
  - Example outcome experiment A:
    - Sample size n = 16
    - Observed sample mean is $\bar{X} = 75$
    - Z-score for observed sample mean is $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{75 - 80}{20/4} = -1$
    - Decision: not in critical region, so retain H0
  - Example outcome experiment B:
    - Sample size n = 25
    - Observed sample mean is $\bar{X} = 88$
    - Z-score for observed sample mean is $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{88 - 80}{20/5} = +2$
    - Decision: in critical region, so reject H0

Check!
True or false?
  - When the Z-score is quite large, it shows the null hypothesis is true. (False)
  - A decision to retain the null hypothesis means you showed that the treatment has no effect. (False)

Why the null hypothesis?
  - Question: it seems odd to focus on the null hypothesis, which we do not believe to be true?
  - Answer: in logic, it is easier to demonstrate that a universal hypothesis is false than to prove that it is true.
    (Recall Popper's falsification criterion from Lecture M1!)
  - So: usually the alternative hypothesis corresponds to your experimental hypothesis.

What could possibly go wrong?
  - Type I error
    - H0 is true, but by chance the outcome is such that H0 is rejected.
      (Test has indicated a non-existent treatment effect)
    - Probability that a Type I error occurs is equal to the significance level α.
  - Type II error
    - H0 is not true, but the outcome is such that H0 is not rejected.
      (Test has failed to detect a real treatment effect)
    - Probability of a Type II error is sometimes denoted with the symbol β.

Summary: possible test outcomes

                 H0 true (no effect)    H1 true (effect exists)
  retain H0      OK                     Type II error
  reject H0      Type I error           OK

Some remarks
  - Terminology in literature: a result is called significant or statistically significant if it makes us reject the null hypothesis.
  - Factors influencing a hypothesis test:
    - Size of the difference between sample mean and original population mean: appears in the numerator of the Z-score
    - Variability of the scores: influences the size of the standard error
    - Number of scores in the sample: influences the size of the standard error

Directional (one-tailed) test
  - So far: two-sided (two-tailed) hypothesis: does not indicate a direction for the possible effect or relation
  - What if you expect an effect in a certain direction?
  - One-sided (one-tailed) hypothesis: indicates a possible direction for the assumed effect or relation

Example directional hypothesis
THE BREEZER DIET (AS SEEN ON MTV)!
How does drinking 4 Breezers a day affect your weight?
We expect that Breezers make you gain weight.
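The critical-region boundaries for two-tailed and one-tailed tests can also be computed instead of read from a table. The following is a minimal sketch using Python's standard `statistics.NormalDist`; the helper names `critical_z`, `z_score` and `reject_two_tailed` are hypothetical and introduced only for illustration. It reproduces the decisions for experiments A and B (µ = 80, σ = 20, α = 0.05) and shows the one-tailed boundary that a directional test uses.

```python
from statistics import NormalDist


def critical_z(alpha, tails=2):
    """Boundary of the critical region for a standard normal statistic.

    tails=2 splits alpha over both tails; tails=1 puts all of alpha
    in a single tail (directional test).
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)


def z_score(sample_mean, mu, sigma, n):
    """Z = (sample mean - mu) / (sigma / sqrt(n))."""
    return (sample_mean - mu) / (sigma / n ** 0.5)


def reject_two_tailed(z, alpha=0.05):
    """True if z falls in the two-tailed critical region."""
    return abs(z) > critical_z(alpha, tails=2)


# Experiments A and B from the lecture.
z_a = z_score(75, mu=80, sigma=20, n=16)
z_b = z_score(88, mu=80, sigma=20, n=25)

print(f"two-tailed boundary: +/-{critical_z(0.05, tails=2):.3f}")
print(f"one-tailed boundary: {critical_z(0.05, tails=1):.3f}")
print(f"A: Z = {z_a:+.2f}, reject H0: {reject_two_tailed(z_a)}")
print(f"B: Z = {z_b:+.2f}, reject H0: {reject_two_tailed(z_b)}")
```

The one-tailed boundary is smaller than the two-tailed one because the whole of α sits in a single tail; printed tables usually round it to 1.65.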
  1. Formulate the hypotheses:
     - $H_0: \mu_{\text{after}} \le 80$ (null hypothesis)
     - $H_1: \mu_{\text{after}} > 80$ (alternative hypothesis)
  2. Set criteria for decision:
     - Significance level α = 0.05
     - Critical region: Z ≥ 1.65 (from Column C in Table B.1)
  We take a sample of n = 25 test persons.

Example directional hypothesis (continued)
  3. Collect data and compute sample statistic:
     - Sample size n = 25
     - Population σ = 20
     - Sample mean $\bar{X} = 87$
     - Standard error of the mean is $\sigma_M = \frac{20}{5} = 4$
     - $Z = \frac{87 - 80}{4} = 1.75$
  4. Make decision:
     - Z-score is in the critical region, so we reject H0
  So: we reject H0 and conclude that Breezers make you gain weight!

Unknown variance
  - More often than not the population variance σ² is unknown
  - So the standard error of the mean $\sigma_M$ is not known either
  - What to do?
    - Use the sample variance $s^2 = \frac{SS}{n - 1}$ as an estimate for σ².
    - Replace $\sigma_M = \sqrt{\frac{\sigma^2}{n}}$ with the estimated standard error $s_M = \sqrt{\frac{s^2}{n}}$
  - If the variance σ² is known, use:
    - $\sigma_M = \frac{\sigma}{\sqrt{n}}$
    - $Z = \frac{\bar{X} - \mu}{\sigma_M}$
    - Z has a standard normal distribution under H0
  - If the variance σ² is unknown, use:
    - $s_M = \frac{s}{\sqrt{n}}$
    - $t = \frac{\bar{X} - \mu}{s_M}$
    - t has a t-distribution with df = n − 1 under H0

t-distribution
  - Is a family of distributions
  - Resembles the standard normal distribution in shape and spread
  - Has a bit "more mass" in the tails (flatter)
  - Has one parameter: degrees of freedom (df)
  - For df = ∞ the t-distribution equals the standard normal distribution
  - Sometimes also called Student distribution.
  [Photo: William Sealy Gosset (1876–1937)]

[Figure: t-distribution: plots]
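When σ is unknown, everything in the t statistic has to come from the sample itself. The sketch below walks through that computation from raw scores: the sum of squares SS, the sample variance s² = SS/(n − 1), the estimated standard error s_M, and t with df = n − 1. The eight weights and the function name `t_statistic` are hypothetical, made up purely for illustration; only the reference mean µ = 80 is taken from the lecture.

```python
import math
import statistics


def t_statistic(scores, mu0):
    """Return (t, df, s2, s_m) for a one-sample t statistic.

    s2  = SS / (n - 1)   sample variance as estimate of sigma^2
    s_m = sqrt(s2 / n)   estimated standard error of the mean
    t   = (mean - mu0) / s_m, with df = n - 1
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations
    s2 = ss / (n - 1)
    s_m = math.sqrt(s2 / n)
    t = (mean - mu0) / s_m
    return t, n - 1, s2, s_m


# Hypothetical weights of eight test persons (not data from the lecture).
weights = [82, 91, 77, 85, 88, 79, 94, 84]
t, df, s2, s_m = t_statistic(weights, mu0=80)

print(f"s^2 = {s2:.3f}, s_M = {s_m:.3f}, t = {t:.3f}, df = {df}")

# Cross-check: statistics.variance also divides by n - 1.
assert math.isclose(s2, statistics.variance(weights))
```

Dividing SS by n − 1 rather than n is what makes s² an unbiased estimate of σ², and the same n − 1 reappears as the degrees of freedom of the t-distribution.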
Example
  - We would like to test the following hypothesis:
    Information Science students on average spend 20 hours per week on INFOWO
    - $H_0: \mu = 20$ (null hypothesis)
    - $H_1: \mu \neq 20$ (alternative hypothesis)
  - We set criteria for decision:
    - Significance level α = 0.05

Example (continued)
  - Assume we have a sample of n = 10 Information Science students
  - Observed data:
    - Observed sample mean $\bar{X} = 21.2$
    - Observed standard deviation s = 3.4
  - Recall calculation of t:
    $t = \frac{\bar{X} - \mu}{s_M} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
  - $t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{21.2 - 20.0}{3.4/\sqrt{10}} = \frac{1.2}{1.08} = 1.11$

Example (continued)
  - Two-sided test, significance level α = 0.05
  - Decision rule:
    - Reject H0 if $t < -t_{\text{crit}}$ or if $t > t_{\text{crit}}$
    - Do not reject H0 if $-t_{\text{crit}} \le t \le t_{\text{crit}}$
  - How do we determine $t_{\text{crit}}$?
    Look up the value in Table B.2

t-test: critical value
t Table

cum. prob     t.50    t.75    t.80    t.85    t.90    t.95   t.975    t.99   t.995   t.999  t.9995
one-tail      0.50    0.25    0.20    0.15    0.10    0.05   0.025    0.01   0.005   0.001  0.0005
two-tails     1.00    0.50    0.40    0.30    0.20    0.10    0.05    0.02    0.01   0.002   0.001
df
1            0.000   1.000   1.376   1.963   3.078   6.314   12.71   31.82   63.66  318.31  636.62
2            0.000   0.816   1.061   1.386   1.886   2.920   4.303   6.965   9.925  22.327  31.599
3            0.000   0.765   0.978   1.250   1.638   2.353   3.182   4.541   5.841  10.215  12.924
4            0.000   0.741   0.941   1.190   1.533   2.132   2.776   3.747   4.604   7.173   8.610
5            0.000   0.727   0.920   1.156   1.476   2.015   2.571   3.365   4.032   5.893   6.869
6            0.000   0.718   0.906   1.134   1.440   1.943   2.447   3.143   3.707   5.208   5.959
7            0.000   0.711   0.896   1.119   1.415   1.895   2.365   2.998   3.499   4.785   5.408
8            0.000   0.706   0.889   1.108   1.397   1.860   2.306   2.896   3.355   4.501   5.041
9            0.000   0.703   0.883   1.100   1.383   1.833   2.262   2.821   3.250   4.297   4.781
10           0.000   0.700   0.879   1.093   1.372   1.812   2.228   2.764   3.169   4.144   4.587
11           0.000   0.697   0.876   1.088   1.363   1.796   2.201   2.718   3.106   4.025   4.437
12           0.000   0.695   0.873   1.083   1.356   1.782   2.179   2.681   3.055   3.930   4.318
13           0.000   0.694   0.870   1.079   1.350   1.771   2.160   2.650   3.012   3.852   4.221
14           0.000   0.692   0.868   1.076   1.345   1.761   2.145   2.624   2.977   3.787   4.140
15           0.000   0.691   0.866   1.074   1.341   1.753   2.131   2.602   2.947   3.733   4.073
16           0.000   0.690   0.865   1.071   1.337   1.746   2.120   2.583   2.921   3.686   4.015
17           0.000   0.689   0.863   1.069   1.333   1.740   2.110   2.567   2.898   3.646   3.965
18           0.000   0.688   0.862   1.067   1.330   1.734   2.101   2.552   2.878   3.610   3.922
19           0.000   0.688   0.861   1.066   1.328   1.729   2.093   2.539   2.861   3.579   3.883
20           0.000   0.687   0.860   1.064   1.325   1.725   2.086   2.528   2.845   3.552   3.850
21           0.000   0.686   0.859   1.063   1.323   1.721   2.080   2.518   2.831   3.527   3.819
22           0.000   0.686   0.858   1.061   1.321   1.717   2.074   2.508   2.819   3.505   3.792
23           0.000   0.685   0.858   1.060   1.319   1.714   2.069   2.500   2.807   3.485   3.768
24           0.000   0.685   0.857   1.059   1.318   1.711   2.064   2.492   2.797   3.467   3.745
25           0.000   0.684   0.856   1.058   1.316   1.708   2.060   2.485   2.787   3.450   3.725
26           0.000   0.684   0.856   1.058   1.315   1.706   2.056   2.479   2.779   3.435   3.707
27           0.000   0.684   0.855   1.057   1.314   1.703   2.052   2.473   2.771   3.421   3.690
28           0.000   0.683   0.855   1.056   1.313   1.701   2.048   2.467   2.763   3.408   3.674
29           0.000   0.683   0.854   1.055   1.311   1.699   2.045   2.462   2.756   3.396   3.659
30           0.000   0.683   0.854   1.055   1.310   1.697   2.042   2.457   2.750   3.385   3.646
40           0.000   0.681   0.851   1.050   1.303   1.684   2.021   2.423   2.704   3.307   3.551
60           0.000   0.679   0.848   1.045   1.296   1.671   2.000   2.390   2.660   3.232   3.460
80           0.000   0.678   0.846   1.043   1.292   1.664   1.990   2.374   2.639   3.195   3.416
100          0.000   0.677   0.845   1.042   1.290   1.660   1.984   2.364   2.626   3.174   3.390
1000         0.000   0.675   0.842   1.037   1.282   1.646   1.962   2.330   2.581   3.098   3.300
z            0.000   0.674   0.842   1.036   1.282   1.645   1.960   2.326   2.576   3.090   3.291
Conf. level     0%     50%     60%     70%     80%     90%     95%     98%     99%   99.8%   99.9%

Example (continued)
  - Two-sided test, significance level α = 0.05.
  - $t_{\text{crit}} = 2.262$ (df = n − 1 = 9)
  - Decision rule:
    - Reject H0 if t < −2.262 or t > 2.262
    - Do not reject H0 if −2.262 ≤ t ≤ 2.262
  - So? t = 1.11, so do not reject H0.

One-sample t-test
Properties of the one-sample t-test
  - Compare one sample mean with a reference value (a mean value that was determined earlier or beforehand)
  - Population standard deviation σ unknown
  - Sample size n < 120
  - Use the t-distribution:
    $t = \frac{\bar{X} - \mu}{s_M} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
  - Determine the correct value for the degrees of freedom (df = n − 1)
  - Use Table B.2 to determine the critical value

One-sample t-test: income example
Student income
  - Question: is your income different from what the average Dutch student (living away from home) needs?
  - According to the National Institute for Family Finance Information (NIBUD) the average Dutch student needs € 962 per month.

One-sample t-test: income example
  - Hypothesis: your income is different from what the average Dutch student, living away from home, needs.
  - Formulation of hypotheses:
    - $H_0: \mu = 962$ (null hypothesis)
    - $H_1: \mu \neq 962$ (alternative hypothesis)
  - Significance level: α = 0.05

One-sample t-test: income example
[Figure: Your net income]

One-sample t-test: income example
Your income:

  Income per month
  N    Valid               49.00
       Missing              3.00
  Mean                    833.02
  Std. Deviation          462.90

  - So: your average income per month is € 833
  - This is € 129 less than the NIBUD average income for Dutch students
  - Std. error of the mean = Std. Deviation / $\sqrt{49}$ ≈ 66.1
  - Question: is this difference due to randomness or is it significant?

One-sample t-test: income example
  - SPSS: menu Analyze > Compare Means > One-sample T Test...
  - [Figure: SPSS output]

One-sample t-test: formulas and output
  - $t = \frac{\bar{X} - \mu}{s_M}$, with $s_M = \frac{s}{\sqrt{n}}$ and df = n − 1
  - So, s = 462.91, $s_M = \frac{462.91}{\sqrt{49}} = 66.13$
  - $t = \frac{833.02 - 962.00}{66.13} = -1.95$
  - $t_{\text{crit}} \approx 2.01$ (df = 48), so H0 is not rejected.

One-sample t-test in SPSS: more output
  - "Sig." column: p-value or significance value of the test result
  - The probability, under the null hypothesis, of obtaining a result at least as extreme as the measured data (here P(|t| ≥ 1.950) under H0).
  - Indication of how extreme the measured data is.
  - Rule: if p-value ≥ α, do not reject H0; if p-value < α, reject H0.
  - So? Do not reject H0.

One-sample t-test: income example
Conclusion (also: the proper way of reporting the result):
  - Your average income (according to the questionnaire) is € 833 per month
  - This is € 129 less than the NIBUD average income for Dutch students living away from home
  - This difference is not significant: t = −1.95 (df = 48), p = .057 (two-sided)

Lessons learnt
  - What is inferential statistics?
  - The proper procedure for hypothesis testing
  - One-sample t-test:
    - What it is
    - And how to use it
  - The role of the hypothesis in research
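The one-sample t-test from the income example, including the two-sided p-value reported in the "Sig." column, can be approximated with the standard library alone. This is a sketch under stated assumptions, not the way SPSS computes it: the p-value is obtained here by numerically integrating the t density with Simpson's rule, and the function names `t_pdf`, `two_sided_p` and `one_sample_t_test` are hypothetical. The inputs are the summary statistics from the lecture (mean 833.02, s = 462.90, n = 49, reference value 962), plus the INFOWO hours example (mean 21.2, s = 3.4, n = 10, reference value 20).

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    scale = math.exp(log_c) / math.sqrt(df * math.pi)
    return scale * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Approximate P(|T| >= |t|) under H0 with Simpson's rule on [0, |t|].

    `steps` must be even for Simpson's rule.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # area between -|t| and +|t|
    return 1.0 - central_area


def one_sample_t_test(sample_mean, mu0, s, n):
    """Return (t, df, two-sided p) from summary statistics."""
    s_m = s / math.sqrt(n)  # estimated standard error of the mean
    t = (sample_mean - mu0) / s_m
    df = n - 1
    return t, df, two_sided_p(t, df)


alpha = 0.05

# Income example: summary statistics taken from the lecture.
t_income, df_income, p_income = one_sample_t_test(833.02, 962.00, 462.90, 49)
print(f"income: t = {t_income:.2f}, df = {df_income}, p = {p_income:.3f}, "
      f"reject H0: {p_income < alpha}")

# INFOWO hours example: summary statistics taken from the lecture.
t_hours, df_hours, p_hours = one_sample_t_test(21.2, 20.0, 3.4, 10)
print(f"hours:  t = {t_hours:.2f}, df = {df_hours}, p = {p_hours:.3f}, "
      f"reject H0: {p_hours < alpha}")
```

Both decision rules from the lecture lead to the same conclusion: comparing |t| with the critical value from the table, or comparing the p-value with α. Small differences from the slide values can arise because the slides round the standard error before dividing.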