Using Statistics To Make Inferences 3

3.1 Summary

Review the normal distribution
Z test
Z test for the sample mean
t test for the sample mean

Wednesday, 24 May 2017 8:13 AM

3.2 Goals

To perform and interpret a Z test.
To perform and interpret tests on the sample mean.
To produce a confidence interval for the population mean.
Know when to employ Z and when t.

Practical
Perform a t test.
Perform a two sample t test, in preparation for next week.

3.3 Normal Distribution

[Figure: normal density curves for Series 1, 2 and 3, horizontal axis from -6 to 6, vertical axis from 0.00 to 0.80]

Series   μ   σ
1        0   1
2        0   ½
3        1   1

Tables present results for the standard normal distribution (μ=0, σ=1).

3.4 Use of Tables

Prob(1≤z≤∞) = Prob(-∞≤z≤-1) = 0.16
Prob(1.96≤z≤∞) = Prob(-∞≤z≤-1.96) = 0.025
Prob(2.58≤z≤∞) = Prob(-∞≤z≤-2.58) = 0.005

68% of the observations lie within 1 standard deviation of the mean
95% of the observations lie within 1.96 standard deviations of the mean
99% of the observations lie within 2.58 standard deviations of the mean

3.5 Use of Tables

Prob(1≤z≤∞) = Prob(-∞≤z≤-1) = 0.16
Prob(1.96≤z≤∞) = Prob(-∞≤z≤-1.96) = 0.025
Prob(2.58≤z≤∞) = Prob(-∞≤z≤-2.58) = 0.005

Z      0.00   -0.01  -0.02  -0.03  -0.04  -0.05  -0.06  -0.07  -0.08  -0.09
-1.0   0.159  0.156  0.154  0.152  0.149  0.147  0.145  0.142  0.140  0.138
-1.9   0.029  0.028  0.027  0.027  0.026  0.026  0.025  0.024  0.024  0.023
-2.5   0.006  0.006  0.006  0.006  0.006  0.005  0.005  0.005  0.005  0.005

3.6 Testing Hypothesis

Null hypothesis H0: assumes that there is no real effect present
Alternate hypothesis H1: assumes that there is some effect

3.7 Z Test

For a value x taken from a population with mean μ and standard deviation σ, the Z-score is

$$z = \frac{x - \mu}{\sigma}$$

3.8 The Central Limit Theorem

When taking repeated samples of size n from the same population:
1. The distribution of the sample means is centred around the true population mean.
2. The spread of the distribution of the sample means is smaller than that of the original observations.
3. The distribution of the sample means approximates a Normal curve.
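The three properties listed for the Central Limit Theorem can be illustrated by simulation. The following is a minimal sketch in Python, not part of the original slides. It assumes an exponential population with mean 1 and standard deviation 1 (chosen only because it is clearly non-normal), a sample size of 25 and 2000 repeated samples; all names and settings are illustrative. The quantity σ/√n used in the last check is the standard error of the sample mean.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

POP_MEAN = 1.0   # mean of the assumed exponential population (rate 1)
POP_SD = 1.0     # standard deviation of that population
n = 25           # size of each sample (assumed)
repeats = 2000   # number of repeated samples (assumed)

# Draw repeated samples and keep the mean of each one
sample_means = []
for _ in range(repeats):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.mean(sample))

# 1. Centre of the sample means, to compare with the population mean
centre = statistics.mean(sample_means)

# 2. Spread of the sample means, to compare with the population sd
spread = statistics.stdev(sample_means)

# 3. Rough normality check: share of sample means lying within
#    1.96 standard errors (sigma / sqrt(n)) of the population mean
se = POP_SD / n ** 0.5
within = sum(abs(m - POP_MEAN) <= 1.96 * se for m in sample_means) / repeats

print(f"mean of sample means: {centre:.3f} (population mean {POP_MEAN})")
print(f"sd of sample means:   {spread:.3f} (population sd {POP_SD})")
print(f"share within 1.96 standard errors: {within:.3f}")
```

If the sample means behave as described, the first figure should sit close to 1, the second well below 1, and the third near 0.95.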
3.9 Central Limit Theorem

If the standard deviation of the individual observations is σ then the standard error of the sample mean is

$$\frac{\sigma}{\sqrt{n}}$$

For a sample mean x̄, with mean μ and standard deviation σ/√n, the Z-score is

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

For a single observation the previous equation (see 3.7) is obtained (n = 1 and x̄ = x).

3.10 Example 1

Mean score 100, standard deviation 16.
What is the probability a score is higher than 108?

$$z = \frac{x - \mu}{\sigma} = \frac{108 - 100}{16} = \frac{8}{16} = 0.5$$

Prob(x≥108) = Prob(z≥0.5) = 0.309

Note, we employ symmetry: Prob(z ≤ -0.5) = Prob(z ≥ 0.5) = 0.309

Z      0.00   -0.01  -0.02  -0.03  -0.04  -0.05  -0.06  -0.07  -0.08  -0.09
-0.5   0.309  0.305  0.302  0.298  0.295  0.291  0.288  0.284  0.281  0.278

3.11 Example 2

Mean score 100, standard deviation 16.
The sample mean of 25 individuals is found to be 110.
The null hypothesis, no real effect present, is that μ = 100.
Wish to test if the mean significantly exceeds this value.

3.12 Solution 2

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{110 - 100}{16 / \sqrt{25}} = \frac{10}{3.2} = 3.125$$

Prob(x̄ ≥ 110) = Prob(z≥3.125) = 0.0009, beyond our basic table

Z       0.00   -0.01  -0.02  -0.03  -0.04  -0.05  -0.06  -0.07  -0.08  -0.09
-3.00   .001   .001   .001   .001   .001   .001   .001   .001   .001   .001

Since the p-value is less than 0.001 the result is highly significant, the null hypothesis is rejected. The sample average is significantly higher.

3.13 Estimating The Population Mean

A confidence interval is constructed around the estimate of a population parameter.

Confidence interval for the population mean

$$\bar{x} \pm z \, \frac{\sigma}{\sqrt{n}}$$
x̄   Sample mean
n   Sample size
σ   Population standard deviation (known)
z   Tabulated value of the z-score that leaves a probability of α in each tail (a significance level of 2α in a two tail test)

Don’t forget to multiply or divide before you add or subtract.
This test is not available in SPSS.

3.14 Estimating The Population Mean

Confidence interval for the population mean

$$\bar{x} \pm z \, \frac{\sigma}{\sqrt{n}}$$

with the symbols defined above. We can be 100(1-2α)% certain the population mean lies in the interval

$$\left[ \bar{x} - z \, \frac{\sigma}{\sqrt{n}}, \; \bar{x} + z \, \frac{\sigma}{\sqrt{n}} \right]$$

3.15 Normal Values

Conf. level   Prob. α (one tail)   Zα
90%           0.05                 1.645
95%           0.025                1.960
99%           0.005                2.576

Notation commonly used to denote Z values for a confidence interval is Zα, where 100(1 - 2α) is the desired confidence level in percent.

Z      0.00   -0.01  -0.02  -0.03  -0.04  -0.05  -0.06  -0.07  -0.08  -0.09
-1.6   0.055  0.054  0.053  0.052  0.051  0.049  0.048  0.047  0.046  0.046
-1.9   0.029  0.028  0.027  0.027  0.026  0.026  0.025  0.024  0.024  0.023
-2.5   0.006  0.006  0.006  0.006  0.006  0.005  0.005  0.005  0.005  0.005

3.16 Example 3

Standard deviation 16.
The mean of a sample of 25 individuals is found to be 110.
Require a 95% confidence interval for the population mean.

x̄ = 110, n = 25, σ = 16, z = 1.96

3.17 Solution 3

x̄ = 110, n = 25, σ = 16, z = 1.96

$$\bar{x} \pm z \, \frac{\sigma}{\sqrt{n}} = 110 \pm 1.96 \times \frac{16}{\sqrt{25}} = [103.728, 116.272]$$

95% sure the population mean lies in the interval [103.7,116.3]

3.18 Is there a snag?

3.19 One Sample t-Test

The basic test statistic is

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

Note now s not σ.

x̄   Sample mean
n   Sample size
s   Sample standard deviation
t   Calculated t statistic
μ   Population mean assumed under the null hypothesis

3.20 Interpreting t-values

The test has ν=n-1 degrees of freedom (ν the Greek letter nu).

If |tcalc| < tν(α) then we cannot reject the null hypothesis that μ=m. Here tν(α) is the critical value from tables.

If |tcalc| > tν(α) the null hypothesis is rejected, the true mean μ differs significantly at the 2α level from m.

3.21 If The Population Standard Deviation Is Not Available?
t values

$$\bar{x} \pm t_{\nu}(\alpha) \, \frac{s}{\sqrt{n}}$$

with ν = n - 1 degrees of freedom (ν the Greek letter nu)

x̄   Sample mean
n   Sample size
ν   Degrees of freedom, n-1 in this case
s   Sample standard deviation
α   Proportion of occasions that the true mean lies outside the range on one side
tν  Critical value of t from tables

Don’t forget to multiply or divide before you add or subtract.
Note in this module, typically, the sample variance is required. Divide by n-1.

3.22 If The Population Standard Deviation Is Not Available?

t values

$$\left[ \bar{x} - t_{n-1}(\alpha) \, \frac{s}{\sqrt{n}}, \; \bar{x} + t_{n-1}(\alpha) \, \frac{s}{\sqrt{n}} \right]$$

with the symbols defined above.

3.23 Two Tail t

To obtain confidence limits a two tail probability is employed since it refers to the proportion of values of the population mean, both above and below the sample mean.

3.24 Example 4

An experiment results in the following estimates.

n = 20, x̄ = 71.4, s = 7.344

Obtain a 90% confidence interval for the population mean.

3.25 Example 4

Given x̄ = 71.4, n = 20, s = 7.344, t19(0.05) = 1.729

ν    p=0.05  p=0.025  p=0.005  p=0.0025  p=0.0010
19   1.729   2.093    2.861    3.174     3.579

$$\bar{x} \pm t_{\nu}(\alpha) \, \frac{s}{\sqrt{n}} = 71.4 \pm 1.729 \times \frac{7.344}{\sqrt{20}} = [68.561, 74.239]$$

We can be 90% (α=0.05) sure that the population mean lies in this interval [68.6,74.2].

3.26 Example 5

Claimed mean is 75 seconds, the times taken for 20 volunteers are

72 70 71 65 64 58 76 73 69 64
60 69 82 81 78 84 76 75 64 77

H0: there is no effect so μ = 75
H1: μ ≠ 75 (two tail test)

3.27 Solution 5

72 70 71 65 64 58 76 73 69 64
60 69 82 81 78 84 76 75 64 77

n = 20
Σx = 72 + 64 + … + 84 + 77 = 1428
Σx² = 72² + 64² + … + 84² + 77² = 102984

3.28 Solution 5

n = 20, Σx = 1428, Σx² = 102984

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1428}{20} = 71.40$$

3.29 Solution 5

n = 20, Σx = 1428, Σx² = 102984
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{1428}{20} = 71.40$$

$$\operatorname{var}(x) = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2}{n - 1} = \frac{102984 - \frac{1}{20} \times 1428^2}{20 - 1} = 53.9368$$

s = 7.34

Note in this module, typically, the sample variance is required. Divide by n-1.
To practice use mean-var.xls.

3.30 Solution 5

n = 20, x̄ = 71.40, s = 7.34

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{71.40 - 75}{7.34 / \sqrt{20}} = -2.193$$

ν    p=0.05  p=0.025  p=0.005  p=0.0025  p=0.0010
19   1.729   2.093    2.861    3.174     3.579

In an attempt to “estimate” p, note that t19(0.005) = 2.861 and t19(0.025) = 2.093.

3.31 Conclusion 5

t19(0.005) = 2.861, |t| = 2.193, t19(0.025) = 2.093

Since 2.093 < 2.193 < 2.861, 0.01 < p-value < 0.05 (note 2α since two tail).

There is sufficient evidence to reject H0 at the 5% level. The experiment is not consistent with a mean of 75.

In fact the 95% confidence interval is [68.0,74.8] which, as expected, excludes 75.

3.32 SPSS 5

The precise p value may be found from software.

Analyze > Compare Means > One Sample t Test
Note insertion of test value

3.33 SPSS 5

Basic descriptive statistics for a manual test

One-Sample Statistics
      N    Mean    Std. Deviation   Std. Error Mean
V1    20   71.40   7.344            1.642

3.34 SPSS 5

As predicted 0.01 < p-value < 0.05

One-Sample Test, Test Value = 75
                                                        95% Confidence Interval of the Difference
      t        df   Sig. (2-tailed)   Mean Difference   Lower    Upper
V1    -2.192   19   .041              -3.600            -7.04    -.16

The confidence interval is 75-7.04 to 75-0.16 that is [67.96, 74.84].

3.35 Graph?

Graph > Legacy Dialogs > Error Bar

3.36 Graph?

Graph > Legacy Dialogs > Error Bar

[Figure: error bar chart for V1, error bars show 95.0% CI of mean, vertical axis from 68 to 74]

3.37 Example 6

Experimental data

0.235 0.323 0.248 0.252 0.241 0.284
0.312 0.284 0.298 0.264 0.306 0.320

Test whether these data are consistent with a population mean of 0.250.
H0 is that μ = 0.250

3.38 Solution 6

x̄ = 0.2806, s = 0.0318, n = 12

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{0.2806 - 0.250}{0.0318 / \sqrt{12}} = 3.333$$

ν    p=0.05  p=0.025  p=0.005  p=0.0025  p=0.0010
11   1.796   2.201    3.106    3.497     4.025

In an attempt to “estimate” p, note that t11(0.005) = 3.106 and t11(0.0025) = 3.497.
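The arithmetic of Solution 6 can be checked numerically. The following is a minimal sketch in Python using only the standard library; it is not part of the original slides, the variable names are illustrative, and the critical values are copied from the ν = 11 row of the slide's t table rather than computed.

```python
import math
import statistics

# Experimental data from Example 6
data = [0.235, 0.323, 0.248, 0.252, 0.241, 0.284,
        0.312, 0.284, 0.298, 0.264, 0.306, 0.320]
mu0 = 0.250  # population mean under H0

n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation, divides by n-1
t = (mean - mu0) / (s / math.sqrt(n))

# One tail critical values for nu = n - 1 = 11, copied from the slide's table
critical = {0.05: 1.796, 0.025: 2.201, 0.005: 3.106, 0.0025: 3.497, 0.0010: 4.025}

# Tabulated one tail probabilities whose critical value |t| exceeds / does not exceed
exceeded = [p for p, c in critical.items() if abs(t) > c]
not_exceeded = [p for p, c in critical.items() if abs(t) <= c]

print(f"n = {n}, mean = {mean:.4f}, s = {s:.4f}, t = {t:.3f}")
if exceeded and not_exceeded:
    # Two tail test, so the tabulated one tail probabilities are doubled
    print(f"{2 * max(not_exceeded)} < p-value < {2 * min(exceeded)}")
```

The tabulated probabilities are doubled because the test is two tail, matching the 2α convention used for the confidence intervals.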
3.39 Conclusion 6

t11(0.005) = 3.106, t = 3.333, t11(0.0025) = 3.497

Since 3.106 < 3.333 < 3.497, 0.005 < p-value < 0.01

There is sufficient evidence to reject H0 at the 1% level. The experimental mean would not appear to be consistent with 0.250.

3.40 SPSS 6

As predicted p-value < 0.01

One-Sample Test, Test Value = 0.250
                                                       95% Confidence Interval of the Difference
      t       df   Sig. (2-tailed)   Mean Difference   Lower     Upper
V1    3.333   11   .007              .030583           .01039    .05078

The confidence interval is 0.250+0.010 to 0.250+0.050 that is [0.26, 0.30].

3.41 Read

Read Howitt and Cramer pages 40-50
Read Russo (e-text) pages 134-145
Read Davis and Smith pages 133-134, 139-143, 200-205, 237-264

3.42 Practical 3

This material is available from the module web page.
http://www.staff.ncl.ac.uk/mike.cox

3.43 Practical 3

This material for the practical is available.
Instructions for the practical: Practical 3
Material for the practical: Practical 3

3.44 Whoops!

From testimony by Michael Gove, British Secretary of State for Education, before their Education Committee:

"Q98 Chair: [I]f 'good' requires pupil performance to exceed the national average, and if all schools must be good, how is this mathematically possible?
"Michael Gove: By getting better all the time.
"Q99 Chair: So it is possible, is it?
"Michael Gove: It is possible to get better all the time.
"Q100 Chair: Were you better at literacy than numeracy, Secretary of State?
"Michael Gove: I cannot remember."

Oral Evidence, British House of Commons, January 31, 2012, p. 28