Statistical inference

Definition: generalization from a sample to a population. There are two cases:
* Does a sample belong to a hypothetical population?
* Do two samples belong to the same hypothetical population?

1st possibility: a sample with mean $\bar{x} = 96$ and a population with $\mu = 100$. Inference: is $\mu = 100$?

2nd possibility: two samples with means $\bar{x}_1 = 104$ and $\bar{x}_2 = 110$, and a population with $\mu = 100$. Inference: is $\mu_1 - \mu_2 = 0$?

Hypotheses
One population: $H_0: \mu = k$, $H_1: \mu \neq k$
Two populations: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$
where $H_0$ = null hypothesis, $H_1$ = alternative hypothesis, $\mu$ = mean of the population, $k$ = a constant, $\mu_1$ = mean of the first population, $\mu_2$ = mean of the second population. We test $H_0$.

Decision
From the sample(s) we decide whether or not to reject the null hypothesis. When we do inference, we are never certain that we made the right decision.

                               Population: identical   Population: different
Sample decision: identical     Good                    Error 2
Sample decision: different     Error 1                 Good

There are two types of error:
1 - Type I error: we infer that two groups belong to two different populations when they actually belong to the same one. We reject $H_0$ when $H_0$ is true.
2 - Type II error: we infer that two groups belong to the same population when they do not. We keep $H_0$ when $H_0$ is false.

1- Inference about the mean of a population

Sampling distribution of the mean: draw many samples of size n from a population (here $\mu = 72$, $\sigma = 3$) and compute each sample mean $\bar{x}_1, \bar{x}_2, \dots$. What are the mean $\mu_{\bar{x}}$ and the standard deviation $\sigma_{\bar{x}}$ of these sample means?

Characteristics:
* It follows a normal curve (exactly when the population is normal, approximately for large n by the central limit theorem).
* Its mean equals the mean of the population: $\mu_{\bar{x}} = \mu$.
* Its standard deviation, called the standard error, equals $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.
The larger the sample size, the smaller the standard error.
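A minimal Python sketch of the kind of simulation that illustrates the standard error, assuming a normal population with $\mu = 72$ and $\sigma = 3$ and using only the standard library. The helper names `standard_error` and `simulate_sample_means` are hypothetical, chosen for this sketch, and the printed values depend on the random seed, so they will differ slightly from any other run.

```python
import math
import random
import statistics


def standard_error(sigma: float, n: int) -> float:
    """Theoretical SD of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def simulate_sample_means(mu: float, sigma: float, n: int,
                          reps: int = 10_000, seed: int = 0) -> list:
    """Draw `reps` samples of size n from Normal(mu, sigma) and return each sample's mean."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
            for _ in range(reps)]


if __name__ == "__main__":
    mu, sigma = 72, 3  # population parameters used in the slides
    for n in (9, 16, 36, 144):
        means = simulate_sample_means(mu, sigma, n)
        print(f"n={n:4d}  mean of means={statistics.fmean(means):.4f}  "
              f"SD of means={statistics.stdev(means):.4f}  "
              f"sigma/sqrt(n)={standard_error(sigma, n):.4f}")
```

For each n, the SD of the simulated means should sit close to $\sigma/\sqrt{n}$, shrinking as n grows.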
Sampling distribution of the mean: simulation with 10,000 samples drawn from a population with $\mu = 72$ and $\sigma = 3$

n      Mean of the sample means   SD of the sample means   $\sigma/\sqrt{n}$
9      71.9958                    0.9959                   1
16     71.9984                    0.74696                  0.75
36     72.0146                    0.50165                  0.5
144    72.0014                    0.24972                  0.25

Test of significance

If we suppose that the null hypothesis is true, what is the probability of observing the given sample mean? If it is unlikely, we reject $H_0$; otherwise we keep $H_0$. "Unlikely" means a probability below a chosen significance threshold $\alpha$, usually 5% or 1%. The test statistic is
$z_{\bar{x}} = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}}$

Example: one-sided test
$H_0: \mu = 72$, $H_1: \mu < 72$ (based on previous studies), $\alpha = 0.05$ (5%)
$\sigma = 9$, $\bar{x} = 65$, $n = 36$
$\sigma_{\bar{x}} = \sigma / \sqrt{n} = 9 / \sqrt{36} = 9 / 6 = 1.5$
$z_{\bar{x}} = (65 - 72) / 1.5 = -4.67$
$z_\alpha = 1.65$ (one-sided)
Because $|z_{\bar{x}}| = 4.67$ is greater than $z_\alpha = 1.65$, and $z_{\bar{x}}$ lies on the side predicted by $H_1$ ($z_{\bar{x}} < -1.65$), we reject the null hypothesis and accept the alternative hypothesis.

Example: two-sided test
$H_0: \mu = 72$, $H_1: \mu \neq 72$, $\alpha = 0.05$ (5%)
$\sigma = 9$, $\bar{x} = 68$, $n = 36$
$\sigma_{\bar{x}} = 9 / \sqrt{36} = 9 / 6 = 1.5$
$z_{\bar{x}} = (68 - 72) / 1.5 = -2.667$
$z_\alpha = 1.96$ (two-sided)
Because $|z_{\bar{x}}| = 2.667$ is greater than $z_\alpha = 1.96$, we reject the null hypothesis and accept the alternative hypothesis.

Confidence intervals

We are never sure that the mean of our sample is exactly the mean of the population. Therefore, instead of giving only the mean, we can quantify our level of certainty by specifying a confidence interval around it.
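The two significance tests above can be sketched as a one-sample z test with Python's standard library (`statistics.NormalDist`), assuming, as in the examples, that $\sigma$ is known. The function names and the `'less'`/`'greater'`/`'two-sided'` labels are hypothetical choices for this sketch.

```python
import math
from statistics import NormalDist


def z_statistic(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """Standardized sample mean under H0: (xbar - mu0) / (sigma / sqrt(n))."""
    return (xbar - mu0) / (sigma / math.sqrt(n))


def critical_z(alpha: float, two_sided: bool) -> float:
    """Critical value from N(0, 1): quantile 1 - alpha (one-sided) or 1 - alpha/2 (two-sided)."""
    q = 1 - alpha / 2 if two_sided else 1 - alpha
    return NormalDist().inv_cdf(q)


def reject_h0(z: float, alpha: float, alternative: str) -> bool:
    """alternative is 'less', 'greater' or 'two-sided' (labels chosen for this sketch)."""
    if alternative == "two-sided":
        return abs(z) > critical_z(alpha, two_sided=True)
    z_crit = critical_z(alpha, two_sided=False)
    return z < -z_crit if alternative == "less" else z > z_crit


def p_value(z: float, alternative: str) -> float:
    """Probability under H0 of a z at least as extreme as the observed one."""
    cdf = NormalDist().cdf
    if alternative == "less":
        return cdf(z)
    if alternative == "greater":
        return 1 - cdf(z)
    return 2 * (1 - cdf(abs(z)))


if __name__ == "__main__":
    # One-sided example: H0: mu = 72, H1: mu < 72, sigma = 9, n = 36, xbar = 65
    z1 = z_statistic(65, 72, 9, 36)
    print(f"z = {z1:.2f}, reject H0: {reject_h0(z1, 0.05, 'less')}")
    # Two-sided example: H0: mu = 72, H1: mu != 72, sigma = 9, n = 36, xbar = 68
    z2 = z_statistic(68, 72, 9, 36)
    print(f"z = {z2:.3f}, reject H0: {reject_h0(z2, 0.05, 'two-sided')}, "
          f"p = {p_value(z2, 'two-sided'):.4f}")
```

`NormalDist().inv_cdf` returns about 1.645 and 1.960, which the slides round to 1.65 and 1.96; both examples lead to rejecting $H_0$. The `p_value` helper answers the slide's question directly: the probability, under $H_0$, of a result at least as extreme as the one observed.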
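$CI_{1-\alpha}: \bar{x} - z_\alpha \sigma_{\bar{x}} \le \mu \le \bar{x} + z_\alpha \sigma_{\bar{x}}$

Example: CI = 95%
$\bar{x} = 50.7$, $n = 100$, $\sigma = 20$
$\sigma_{\bar{x}} = \sigma / \sqrt{n} = 20 / \sqrt{100} = 20 / 10 = 2$
$\alpha = 1 - CI = 1 - 0.95 = 0.05$, $z_\alpha = 1.96$
$CI_{0.95}: 50.7 - 1.96 \times 2 \le \mu \le 50.7 + 1.96 \times 2$
$CI_{0.95}: 50.7 - 3.92 \le \mu \le 50.7 + 3.92$
$CI_{0.95}: 46.78 \le \mu \le 54.62$
Therefore, with 95% confidence, the mean of the population lies between 46.78 and 54.62 (95% of intervals built this way contain $\mu$).

Example: CI = 99%
$\bar{x} = 50.7$, $n = 100$, $\sigma = 20$
$\sigma_{\bar{x}} = 20 / \sqrt{100} = 2$
$\alpha = 1 - CI = 1 - 0.99 = 0.01$, $z_\alpha = 2.58$
$CI_{0.99}: 50.7 - 2.58 \times 2 \le \mu \le 50.7 + 2.58 \times 2$
$CI_{0.99}: 50.7 - 5.16 \le \mu \le 50.7 + 5.16$
$CI_{0.99}: 45.54 \le \mu \le 55.86$
Therefore, with 99% confidence, the mean of the population lies between 45.54 and 55.86.

2- Inference for the difference between two population means

Distribution of sample mean differences: draw pairs of samples of size n from the same population (here $\mu = 72$, $\sigma = 3$) and compute the difference $\bar{x}_1 - \bar{x}_2$ for each pair. What are the mean $\mu_{\bar{x}_1 - \bar{x}_2}$ and the standard deviation $\sigma_{\bar{x}_1 - \bar{x}_2}$ of these differences?

Characteristics:
* It follows a normal distribution.
* Its mean equals 0 ($\mu_1 - \mu_2 = 0$).
* Its standard deviation, the standard error of the mean difference, equals $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2}$.

Decision rule
$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma_{\bar{x}_1 - \bar{x}_2}}$, because $\mu_1 - \mu_2 = 0$ under $H_0$.

Test of significance
Example: what is the probability of the observed difference between the following groups?
$H_0: \mu_1 = \mu_2$ ($\mu_1 - \mu_2 = 0$), $H_1: \mu_1 \neq \mu_2$ ($\mu_1 - \mu_2 \neq 0$), $\alpha = 0.05$ (5%)
$\bar{x}_1 = 50$, $\bar{x}_2 = 48$, $\sigma_1 = 5$, $\sigma_2 = 5$, $n_1 = 36$, $n_2 = 36$
$\sigma_{\bar{x}_1} = 5 / \sqrt{36} = 5 / 6 = 0.833$
$\sigma_{\bar{x}_2} = 5 / \sqrt{36} = 5 / 6 = 0.833$
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{(5/6)^2 + (5/6)^2} = 1.18$
$z = (50 - 48) / 1.18 = 1.69$
Critical $z_\alpha = 1.96$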
Because the observed z (1.69) is lower than the critical value ($z_\alpha = 1.96$), we keep the null hypothesis.

Confidence intervals
$CI_{1-\alpha}: (\bar{x}_1 - \bar{x}_2) - z_\alpha \sigma_{\bar{x}_1 - \bar{x}_2} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z_\alpha \sigma_{\bar{x}_1 - \bar{x}_2}$

Example: a 95% confidence interval for the same groups
$H_0: \mu_1 = \mu_2$ ($\mu_1 - \mu_2 = 0$), $H_1: \mu_1 \neq \mu_2$ ($\mu_1 - \mu_2 \neq 0$), $\alpha = 0.05$ (5%)
$\bar{x}_1 = 50$, $\bar{x}_2 = 48$, $\sigma_1 = 5$, $\sigma_2 = 5$, $n_1 = 36$, $n_2 = 36$
$CI_{0.95}: (50 - 48) - 1.96 \times 1.18 \le \mu_1 - \mu_2 \le (50 - 48) + 1.96 \times 1.18$
$CI_{0.95}: 2 - 2.3128 \le \mu_1 - \mu_2 \le 2 + 2.3128$
$CI_{0.95}: -0.3128 \le \mu_1 - \mu_2 \le 4.3128$
Therefore, with 95% confidence, the difference between the population means lies between -0.3128 and 4.3128. Because this interval contains 0, it agrees with the decision to keep $H_0$.
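The two-sample test and the confidence interval for $\mu_1 - \mu_2$ can be sketched the same way, assuming independent samples with known $\sigma_1$ and $\sigma_2$. `se_difference` and `z_interval` are hypothetical helper names for this sketch; `z_interval` simply computes estimate $\pm z \cdot SE$, so it also covers the single-mean intervals earlier in the section.

```python
import math
from statistics import NormalDist


def se_difference(sigma1: float, n1: int, sigma2: float, n2: int) -> float:
    """Standard error of xbar1 - xbar2: sqrt(sigma1^2/n1 + sigma2^2/n2) (independent samples)."""
    return math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


def z_interval(estimate: float, se: float, confidence: float) -> tuple:
    """Two-sided interval (estimate - z*se, estimate + z*se), with z from N(0, 1)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return estimate - z * se, estimate + z * se


if __name__ == "__main__":
    xbar1, xbar2 = 50, 48
    sigma1, sigma2, n1, n2 = 5, 5, 36, 36
    se = se_difference(sigma1, n1, sigma2, n2)  # about 1.1785
    z = (xbar1 - xbar2) / se                    # H0: mu1 - mu2 = 0
    z_crit = NormalDist().inv_cdf(0.975)        # about 1.96 (two-sided, alpha = 0.05)
    print(f"SE = {se:.4f}, z = {z:.3f}, reject H0: {abs(z) > z_crit}")
    low, high = z_interval(xbar1 - xbar2, se, 0.95)
    print(f"95% CI for mu1 - mu2: [{low:.4f}, {high:.4f}]")
```

Without rounding, SE is about 1.1785 and the 95% interval is about [-0.310, 4.310]; the slides' bounds of -0.3128 and 4.3128 come from rounding SE to 1.18. Calling `z_interval(50.7, 20 / math.sqrt(100), 0.95)` gives about (46.78, 54.62), matching the single-mean example.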