Introduction to Statistical Inference
Lecture 4: Proportion Tests
Lecturer: Pauliina Ilmonen
Slides: Ilmonen/Virtanen/Ailus/Kantala

Contents
• Proportion Test
• Two Sample Proportion Test

Proportion Test

Proportion tests can be used, for example, to test the proportion of faulty products in a production process.

Let $x_1, x_2, \ldots, x_n$ be the observed values of a random variable $x$. Assume that the observed values are independent and identically distributed (i.i.d.) and come from the Bernoulli distribution with parameter $p$. Then $P(x_i = 1) = p$, $P(x_i = 0) = 1 - p$, $E[x] = p$, and the variance is $E[(x - E[x])^2] = p(1 - p)$.

The null hypothesis is $H_0: p = p_0$. The possible alternative hypotheses are $H_1: p > p_0$ (one-tailed), $H_1: p < p_0$ (one-tailed), or $H_1: p \neq p_0$ (two-tailed).

• The test statistic is $C = \sum_{i=1}^{n} x_i$.
• If the null hypothesis $H_0$ is true, the test statistic follows the binomial distribution with parameters $n$ and $p = p_0$.
• Under $H_0$, the expected value of the test statistic is $E[C] = np_0$ and its variance is $np_0(1 - p_0)$.
• If the value of the test statistic is large or small compared to the expected value $np_0$, this is evidence against $H_0$.
• The null hypothesis $H_0$ is rejected if the p-value is small enough.

Binomial distribution

More about the binomial distribution can be found in Wikipedia.

Proportion Test, p-value

The distribution of the test statistic $C$ is tabulated, and statistical software packages calculate p-values of the test. Let $c$ denote the observed value of the test statistic $C$. Then the p-value of the test is given as follows:
• If the alternative hypothesis is $H_1: p > p_0$, the p-value is $P(C \geq c)$.
• If the alternative hypothesis is $H_1: p < p_0$, the p-value is $P(C \leq c)$.
• If the alternative hypothesis is $H_1: p \neq p_0$, the p-value is $2 \min\{P(C \geq c), P(C \leq c)\}$.
The probabilities $P(C \geq c)$ and $P(C \leq c)$ above are calculated under $H_0$.

Asymptotic Proportion Test

If the sample size is large, then under the null hypothesis $H_0$ the standardized test statistic
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}},$$
where $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the unbiased estimator of the parameter $p$, approximately follows the standard normal distribution. The approximation is usually good enough if $n\hat{p} > 10$ and $n(1 - \hat{p}) > 10$. For smaller samples, the test relies on the exact (binomial) distribution of the test statistic $C$.

Numerical Example

Jack's unpalatable princess cookies are sold in every store. The cookies are very popular, because some of the cookies have been made with a different recipe to achieve a horrible demon cookie taste. It is stated in the package that 10 % of the cookies are demon cookies.
Susan selected 150 cookies randomly, and 21 of them tasted unpalatable. You wish to know whether the package lies, and decide to test the null hypothesis $H_0: p = 0.10$ at the 5 % significance level.

In this proportion test, the null hypothesis is $H_0: p = 0.10$ and the alternative hypothesis is $H_1: p \neq 0.10$. Since the sample size is large ($n\hat{p} = 21 > 10$ and $n(1 - \hat{p}) = 129 > 10$), the normal approximation can be used. The estimated probability is $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{21}{150} = 0.14$, and the test statistic is
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{21/150 - 0.1}{\sqrt{0.1 \cdot 0.9/150}} \approx 1.633.$$
Using the standard normal table value $\Phi(1.63) \approx 0.9484$, the p-value is $2(1 - 0.9484) = 0.1032 > 0.05$. The null hypothesis is not rejected at the 5 % significance level.

Two Sample Proportion Test

In the two sample proportion test, the parameters of two independent Bernoulli distributed samples are compared.

Let $x_1, x_2, \ldots, x_n$ be observed values of a random variable $x$, and let $y_1, y_2, \ldots, y_m$ be observed values of a random variable $y$. Assume that the observed values $x_1, \ldots, x_n$ are i.i.d. and come from the Bernoulli distribution with parameter $p_x$, and that the observed values $y_1, \ldots, y_m$ are i.i.d. and come from the Bernoulli distribution with parameter $p_y$. Assume also that $x_i$ and $y_j$ are independent for all $i, j$.

The null hypothesis is $H_0: p_x = p_y$. The possible alternative hypotheses are $H_1: p_x > p_y$ (one-tailed), $H_1: p_x < p_y$ (one-tailed), or $H_1: p_x \neq p_y$ (two-tailed).
• Calculate the sample proportions $\hat{p}_x = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\hat{p}_y = \frac{1}{m}\sum_{i=1}^{m} y_i$, and the pooled proportion $\hat{p} = \frac{n\hat{p}_x + m\hat{p}_y}{n + m}$.
• Calculate the test statistic
$$Z = \frac{\hat{p}_x - \hat{p}_y}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n} + \frac{1}{m}\right)}}.$$
• If the sample size is large, then under the null hypothesis $H_0$ the test statistic $Z$ approximately follows the standard normal distribution. The approximation is usually good enough if $n\hat{p}_x > 5$, $n(1 - \hat{p}_x) > 5$, $m\hat{p}_y > 5$ and $m(1 - \hat{p}_y) > 5$.
• If the test statistic has a large absolute value, this is evidence against $H_0$.
• The null hypothesis $H_0$ is rejected if the p-value is small enough.
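The procedures above can be sketched in Python using only the standard library: the exact binomial test with the p-value rules for $C$, the asymptotic one-sample $Z$ test, and the pooled two-sample $Z$ test. This is a minimal illustration, not code from the lecture; the function names and the `alternative` strings ("greater", "less", "two-sided") are hypothetical choices. The exact test sums binomial probabilities with the direct formula, which is adequate for moderate $n$, and $\Phi$ is expressed through `math.erf`.

```python
import math

# Hypothetical names chosen for this sketch: "greater" means H1: p > p0 (or
# p_x > p_y), "less" means H1: p < p0, "two-sided" means H1: p != p0.
ALTERNATIVES = ("greater", "less", "two-sided")


def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_p_value(z: float, alternative: str) -> float:
    """p-value for a statistic that is approximately N(0, 1) under H0."""
    if alternative == "greater":
        return 1.0 - normal_cdf(z)
    if alternative == "less":
        return normal_cdf(z)
    if alternative == "two-sided":
        return 2.0 * (1.0 - normal_cdf(abs(z)))
    raise ValueError(f"alternative must be one of {ALTERNATIVES}")


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(C = k) for C ~ Bin(n, p); direct formula, adequate for moderate n."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def exact_proportion_test(c: int, n: int, p0: float,
                          alternative: str = "two-sided") -> float:
    """Exact test: under H0, C = x_1 + ... + x_n ~ Bin(n, p0); c is observed."""
    upper = sum(binom_pmf(k, n, p0) for k in range(c, n + 1))  # P(C >= c)
    lower = sum(binom_pmf(k, n, p0) for k in range(0, c + 1))  # P(C <= c)
    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    if alternative == "two-sided":
        # 2 * min{P(C >= c), P(C <= c)}, capped at 1 because it can exceed 1.
        return min(1.0, 2.0 * min(upper, lower))
    raise ValueError(f"alternative must be one of {ALTERNATIVES}")


def asymptotic_proportion_test(c: int, n: int, p0: float,
                               alternative: str = "two-sided"):
    """Large-sample test: Z = (p_hat - p0) / sqrt(p0 (1 - p0) / n)."""
    p_hat = c / n
    z = (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)
    return z, normal_p_value(z, alternative)


def two_sample_proportion_test(cx: int, n: int, cy: int, m: int,
                               alternative: str = "two-sided"):
    """Pooled Z test of H0: p_x = p_y; cx and cy are the counts of ones."""
    px_hat = cx / n
    py_hat = cy / m
    p_hat = (n * px_hat + m * py_hat) / (n + m)  # pooled proportion
    # Needs 0 < p_hat < 1, which the "> 5" large-sample conditions guarantee.
    z = (px_hat - py_hat) / math.sqrt(p_hat * (1.0 - p_hat) * (1.0 / n + 1.0 / m))
    return z, normal_p_value(z, alternative)


if __name__ == "__main__":
    # Cookie example: 21 demon cookies out of 150, H0: p = 0.10, two-tailed.
    z, p_value = asymptotic_proportion_test(21, 150, 0.10)
    print(f"Z = {z:.3f}, asymptotic p-value = {p_value:.4f}")
    print(f"exact p-value = {exact_proportion_test(21, 150, 0.10):.4f}")
```

On the cookie example, the asymptotic version gives $Z \approx 1.633$ and a two-tailed p-value of about 0.102, consistent with the value 0.1032 obtained from the rounded table entry $\Phi(1.63) \approx 0.9484$. Library implementations may define the two-tailed exact p-value differently; SciPy's `scipy.stats.binomtest`, for instance, does not use the $2\min\{\cdot\}$ rule, so its two-sided p-value can differ from the one computed here.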