MIS 331 Data Mining, 2016/2017 Fall

Chapter 2: Sampling Distribution, Confidence Interval Estimation, and Hypothesis Testing for the Variance of a Population

Copyright © 2013 Pearson Education, Inc. Publishing as Prentice Hall

Outline
Sampling Distribution of Sample Variances
Confidence Interval Estimation for the Variance
Tests of the Variance of a Normal Distribution
Tests of Equality of Two Variances

6.4 Sampling Distributions of Sample Variances

Sampling Distributions: Sampling Distributions of Sample Means, Sampling Distributions of Sample Proportions, Sampling Distributions of Sample Variances.

Sample Variance
Let $x_1, x_2, \ldots, x_n$ be a random sample from a population. The sample variance is
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
The square root of the sample variance is called the sample standard deviation. The sample variance is different for different random samples from the same population.

Sampling Distribution of Sample Variances
The sampling distribution of $s^2$ has mean $\sigma^2$:
\[ E[s^2] = \sigma^2 \]
If the population distribution is normal, then
\[ \mathrm{Var}(s^2) = \frac{2\sigma^4}{n-1} \]

Chi-Square Distribution of Sample and Population Variances
If the population distribution is normal, then
\[ \chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2} \]
has a chi-square ($\chi^2$) distribution with $n - 1$ degrees of freedom.

The Chi-square Distribution
The chi-square distribution is a family of distributions, depending on degrees of freedom: d.f. = n - 1.
[Figure: chi-square density curves for d.f. = 1, d.f. = 5, and d.f. = 15, each plotted against $\chi^2$ from 0 to 28]
Text Appendix Table 7 contains chi-square probabilities.
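The two moment results for $s^2$, $E[s^2] = \sigma^2$ and $\mathrm{Var}(s^2) = 2\sigma^4/(n-1)$, can be checked by simulation. The following is a minimal sketch, not part of the original slides: the function name `simulate_sample_variances` and the values n = 10, σ = 2, and 20000 trials are illustrative assumptions, and only the Python standard library is used.

```python
import random
import statistics


def simulate_sample_variances(n, sigma, trials, seed=0):
    """Draw `trials` normal samples of size n and return their sample variances."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.variance([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(trials)
    ]


# Illustrative values (assumptions, not taken from the slides).
n, sigma, trials = 10, 2.0, 20000
s2_values = simulate_sample_variances(n, sigma, trials)

mean_s2 = statistics.fmean(s2_values)      # compare with sigma^2
var_s2 = statistics.variance(s2_values)    # compare with 2 * sigma^4 / (n - 1)
theory_mean = sigma**2
theory_var = 2 * sigma**4 / (n - 1)

print(f"mean of s^2:     {mean_s2:.3f} (theory {theory_mean:.3f})")
print(f"variance of s^2: {var_s2:.3f} (theory {theory_var:.3f})")
```

The simulated values should land near the theoretical ones, with the gap shrinking as the number of trials grows. The variance formula relies on the normality assumption, so replacing `rng.gauss` with a non-normal generator would generally break the second comparison but not the first.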
The expected value of a chi-square distribution with $v$ degrees of freedom is $v$:
\[ E[\chi^2_v] = v \]
The variance of a chi-square distribution with $v$ degrees of freedom is $2v$:
\[ \mathrm{Var}[\chi^2_v] = 2v \]
Since $(n-1)s^2/\sigma^2$ has a chi-square distribution with $n - 1$ degrees of freedom,
\[ E\left[\frac{(n-1)s^2}{\sigma^2}\right] = n-1 \;\Rightarrow\; \frac{n-1}{\sigma^2} E[s^2] = n-1 \;\Rightarrow\; E[s^2] = \sigma^2 \]
Similarly,
\[ \mathrm{Var}\left[\frac{(n-1)s^2}{\sigma^2}\right] = 2(n-1) \;\Rightarrow\; \frac{(n-1)^2}{\sigma^4} \mathrm{Var}[s^2] = 2(n-1) \;\Rightarrow\; \mathrm{Var}[s^2] = \frac{2\sigma^4}{n-1} \]

Degrees of Freedom (df)
Idea: the number of observations that are free to vary after the sample mean has been calculated.
Example: Suppose the mean of 3 numbers is 8.0. Let $X_1 = 7$ and $X_2 = 8$. What is $X_3$? If the mean of these three values is 8.0, then $X_3$ must be 9 (i.e., $X_3$ is not free to vary). Here, n = 3, so degrees of freedom = n - 1 = 3 - 1 = 2 (2 values can be any numbers, but the third is not free to vary for a given mean).

Table 7 in Appendix: d.f. versus probabilities for critical values
For 10 degrees of freedom:
$P(\chi^2_{10} < K_L) = 0.05$ gives $K_L = 3.940$, hence $P(\chi^2_{10} < 3.940) = 0.05$.
$P(\chi^2_{10} > K_U) = 0.05$ gives $K_U = 18.31$, hence $P(\chi^2_{10} > 18.31) = 0.05$.

Chi-square Example
A commercial freezer must hold a selected temperature with little variation. Specifications call for a standard deviation of no more than 4 degrees (a variance of 16 degrees squared). A sample of 14 freezers is to be tested. What is the upper limit (K) for the sample variance such that the probability of exceeding this limit, given that the population standard deviation is 4, is less than 0.05?

Finding the Chi-square Value
\[ \chi^2 = \frac{(n-1)s^2}{\sigma^2} \]
is chi-square distributed with (n - 1) = 13 degrees of freedom. Use the chi-square distribution with area 0.05 in the upper tail: $\chi^2_{13} = 22.36$ (α = .05 and 14 - 1 = 13 d.f.).
[Figure: chi-square density with 13 d.f.; upper-tail probability α = .05 lies to the right of $\chi^2_{13} = 22.36$]

Chi-square Example (continued)
With α = .05 and 14 - 1 = 13 d.f., $\chi^2_{13} = 22.36$. So:
\[ P(s^2 > K) = P\left( \frac{(n-1)s^2}{16} > 22.36 \right) = 0.05 \]
so
\[ \frac{(n-1)K}{16} = 22.36 \quad \text{(where n = 14)} \]
or
\[ K = \frac{(22.36)(16)}{14-1} = 27.52 \]
If $s^2$ from the sample of size n = 14 is greater than 27.52, there is strong evidence to suggest the population variance exceeds 16.

7.5 Confidence Interval Estimation for the Variance

Confidence Intervals: Population Mean (σ² Known, σ² Unknown), Population Proportion, Population Variance (from a normally distributed population).

Confidence Intervals for the Population Variance
Goal: Form a confidence interval for the population variance, σ².
The confidence interval is based on the sample variance, s².
Assumed: the population is normally distributed.

Confidence Intervals for the Population Variance (continued)
The random variable
\[ \chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2} \]
follows a chi-square distribution with (n - 1) degrees of freedom, where the chi-square value $\chi^2_{n-1,\alpha}$ denotes the number for which
\[ P(\chi^2_{n-1} > \chi^2_{n-1,\alpha}) = \alpha \]
Applying this definition at α/2 and 1 - α/2:
\[ P(\chi^2_{n-1} > \chi^2_{n-1,\alpha/2}) = \alpha/2 \]
\[ P(\chi^2_{n-1} > \chi^2_{n-1,1-\alpha/2}) = 1 - \alpha/2 \quad \text{or} \quad P(\chi^2_{n-1} < \chi^2_{n-1,1-\alpha/2}) = \alpha/2 \]
Finally,
\[ P(\chi^2_{n-1,1-\alpha/2} < \chi^2_{n-1} < \chi^2_{n-1,\alpha/2}) = 1 - \alpha/2 - \alpha/2 = 1 - \alpha \]
Example: find two numbers such that the probability that a chi-square variable with 6 d.f. lies between them is 0.90:
\[ P(\chi^2_{6,0.95} < \chi^2_6 < \chi^2_{6,0.05}) = 0.90 \]
The two numbers are $\chi^2_{6,0.95} = 1.635$ and $\chi^2_{6,0.05} = 12.592$, hence $P(1.635 < \chi^2_6 < 12.592) = 0.90$.

Confidence Intervals for the Population Variance (continued)
The 100(1 - α)% confidence interval for the population variance is given by
\[ \mathrm{LCL} = \frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} \qquad \mathrm{UCL} = \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}} \]

Example
You are testing the speed of a batch of computer processors.
You collect the following data (in MHz):
Sample size: 17
Sample mean: 3004
Sample std dev: 74
Assume the population is normal. Determine the 95% confidence interval for $\sigma_x^2$.

Finding the Chi-square Values
n = 17, so the chi-square distribution has (n - 1) = 16 degrees of freedom. α = 0.05, so use the chi-square values with area 0.025 in each tail:
\[ \chi^2_{n-1,\alpha/2} = \chi^2_{16,0.025} = 28.85 \]
\[ \chi^2_{n-1,1-\alpha/2} = \chi^2_{16,0.975} = 6.91 \]
[Figure: chi-square density with 16 d.f.; probability α/2 = .025 lies below 6.91 and probability α/2 = .025 lies above 28.85]

Calculating the Confidence Limits
The 95% confidence interval is
\[ \frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}} \]
\[ \frac{(17-1)(74)^2}{28.85} < \sigma^2 < \frac{(17-1)(74)^2}{6.91} \]
\[ 3037 < \sigma^2 < 12680 \]
Converting to standard deviation, we are 95% confident that the population standard deviation of CPU speed is between 55.1 and 112.6 MHz.

9.6 Tests of the Variance of a Normal Distribution
Goal: Test hypotheses about the population variance, σ² (e.g., $H_0: \sigma^2 = \sigma_0^2$).
If the population is normally distributed,
\[ \chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2} \]
has a chi-square distribution with (n - 1) degrees of freedom.

Tests of the Variance of a Normal Distribution (continued)
The test statistic for hypothesis tests about one population variance is
\[ \chi^2_{n-1} = \frac{(n-1)s^2}{\sigma_0^2} \]

Decision Rules: Variance
Lower-tail test: $H_0: \sigma^2 \ge \sigma_0^2$, $H_1: \sigma^2 < \sigma_0^2$. Reject $H_0$ if $\chi^2_{n-1} < \chi^2_{n-1,1-\alpha}$.
Upper-tail test: $H_0: \sigma^2 \le \sigma_0^2$, $H_1: \sigma^2 > \sigma_0^2$. Reject $H_0$ if $\chi^2_{n-1} > \chi^2_{n-1,\alpha}$.
Two-tail test: $H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 \ne \sigma_0^2$.
For the two-tail test, with area α/2 in each tail, reject $H_0$ if $\chi^2_{n-1} > \chi^2_{n-1,\alpha/2}$ or $\chi^2_{n-1} < \chi^2_{n-1,1-\alpha/2}$.

Newbold 9.47
Test the hypothesis $H_0: \sigma^2 \le 100$ against $H_1: \sigma^2 > 100$ for:
a) s² = 165, n = 25
b) s² = 165, n = 29
c) s² = 159, n = 25
d) s² = 67, n = 38

Solution 9.47
In every part, $H_0: \sigma^2 \le 100$; $H_1: \sigma^2 > 100$, and the test statistic is $\chi^2 = (n-1)s^2/\sigma_0^2$.
a. $\chi^2 = 24(165)/100 = 39.6$; $\chi^2_{(24,.025)} = 39.36$, $\chi^2_{(24,.010)} = 42.98$. Therefore, reject $H_0$ at the 2.5% level but not at the 1% level of significance.
b. $\chi^2 = 28(165)/100 = 46.2$; $\chi^2_{(28,.025)} = 44.46$, $\chi^2_{(28,.010)} = 48.28$. Therefore, reject $H_0$ at the 2.5% level but not at the 1% level of significance.
c. $\chi^2 = 24(159)/100 = 38.16$; $\chi^2_{(24,.050)} = 36.42$, $\chi^2_{(24,.025)} = 39.36$. Therefore, reject $H_0$ at the 5% level but not at the 2.5% level of significance.
d. $\chi^2 = 37(67)/100 = 24.79$; $\chi^2_{(37,.100)} = 48.36$, $\chi^2_{(37,.05)} = 52.19$. Therefore, do not reject $H_0$ at any common level of significance.

Newbold 9.48
After a new safety device was introduced, a random sample of 8 days gave the following observations:
618 660 638 625 571 598 639 582
Management is concerned about variability. Test the null hypothesis that the population variance is at most 500, at a significance level of 10%.

Solution 9.48
$H_0: \sigma^2 \le 500$; $H_1: \sigma^2 > 500$. Reject $H_0$ if the test statistic exceeds $\chi^2_{(7,.10)} = 12.02$.
\[ \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{7(933.982)}{500} = 13.0757 \]
Therefore, reject $H_0$ at the 10% level.

10.4 Tests of Equality of Two Variances
Goal: Test hypotheses about two population variances, using the F test statistic.
Lower-tail test: $H_0: \sigma_x^2 \ge \sigma_y^2$, $H_1: \sigma_x^2 < \sigma_y^2$
Upper-tail test: $H_0: \sigma_x^2 \le \sigma_y^2$, $H_1: \sigma_x^2 > \sigma_y^2$
Two-tail test: $H_0: \sigma_x^2 = \sigma_y^2$, $H_1: \sigma_x^2 \ne \sigma_y^2$
The two populations are assumed to be independent and normally distributed.
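As a numerical check on the one-variance exercises 9.47 and 9.48 worked earlier in this section, the chi-square statistics can be recomputed directly. This is a minimal sketch using only the Python standard library; the helper name `variance_test_statistic` is a hypothetical name introduced here, and the critical value 12.02 is the table value quoted in the solution rather than something the code computes.

```python
import statistics


def variance_test_statistic(s2, n, sigma0_sq):
    """Chi-square statistic (n - 1) * s^2 / sigma0^2 for a test of one variance."""
    return (n - 1) * s2 / sigma0_sq


# Exercise 9.47: (s^2, n) for parts a-d, with sigma0^2 = 100.
parts_9_47 = {"a": (165, 25), "b": (165, 29), "c": (159, 25), "d": (67, 38)}
stats_9_47 = {
    part: variance_test_statistic(s2, n, 100)
    for part, (s2, n) in parts_9_47.items()
}

# Exercise 9.48: the eight observations, with sigma0^2 = 500.
observations = [618, 660, 638, 625, 571, 598, 639, 582]
s2_9_48 = statistics.variance(observations)  # sample variance, n - 1 divisor
stat_9_48 = variance_test_statistic(s2_9_48, len(observations), 500)

# Table value quoted in the solution: chi-square, 7 d.f., upper-tail area 0.10.
critical_9_48 = 12.02
reject_9_48 = stat_9_48 > critical_9_48

for part, value in stats_9_47.items():
    print(f"9.47 {part}: chi-square = {value:.2f}")
print(f"9.48: s^2 = {s2_9_48:.3f}, chi-square = {stat_9_48:.4f}, "
      f"reject H0: {reject_9_48}")
```

The statistics agree with the worked solutions (39.6, 46.2, 38.16, 24.79, and 13.0757). Comparing each 9.47 statistic against the table values at several significance levels is left as in the solutions, since the code does not compute chi-square quantiles.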
Hypothesis Tests for Two Variances (continued)
The random variable
\[ F = \frac{s_x^2/\sigma_x^2}{s_y^2/\sigma_y^2} \]
has an F distribution with $(n_x - 1)$ numerator degrees of freedom and $(n_y - 1)$ denominator degrees of freedom. Denote an F value with $\nu_1$ numerator and $\nu_2$ denominator degrees of freedom by $F_{\nu_1,\nu_2}$.

Test Statistic
The test statistic for a hypothesis test about two population variances is
\[ F = \frac{s_x^2}{s_y^2} \]
where F has $(n_x - 1)$ numerator degrees of freedom and $(n_y - 1)$ denominator degrees of freedom.

Decision Rules: Two Variances
Use $s_x^2$ to denote the larger variance.
Upper-tail test: $H_0: \sigma_x^2 \le \sigma_y^2$, $H_1: \sigma_x^2 > \sigma_y^2$. Reject $H_0$ if $F > F_{n_x-1,n_y-1,\alpha}$.
Two-tail test: $H_0: \sigma_x^2 = \sigma_y^2$, $H_1: \sigma_x^2 \ne \sigma_y^2$. Reject $H_0$ if $F > F_{n_x-1,n_y-1,\alpha/2}$, where $s_x^2$ is the larger of the two sample variances.
[Figure: F density with the rejection region in the upper tail; the tail area is α for the upper-tail test and α/2 for the two-tail test]

Example: F Test
You are a financial analyst for a brokerage firm. You want to compare dividend yields between stocks listed on the NYSE and NASDAQ. You collect the following data:
NYSE: Number = 21, Mean = 3.27, Std dev = 1.30
NASDAQ: Number = 25, Mean = 2.53, Std dev = 1.16
Is there a difference in the variances between the NYSE and NASDAQ at the α = 0.10 level?

F Test: Example Solution
Form the hypothesis test:
$H_0: \sigma_x^2 = \sigma_y^2$ (there is no difference between variances)
$H_1: \sigma_x^2 \ne \sigma_y^2$ (there is a difference between variances)
Find the F critical value for α/2 = .10/2 = .05.
Degrees of freedom:
Numerator (NYSE has the larger standard deviation): $n_x - 1 = 21 - 1 = 20$ d.f.
Denominator: $n_y - 1 = 25 - 1 = 24$ d.f.
\[ F_{n_x-1,\,n_y-1,\,\alpha/2} = F_{20,\,24,\,0.10/2} = 2.03 \]
F Test: Example Solution (continued)
For $H_0: \sigma_x^2 = \sigma_y^2$ against $H_1: \sigma_x^2 \ne \sigma_y^2$, the test statistic is
\[ F = \frac{s_x^2}{s_y^2} = \frac{1.30^2}{1.16^2} = 1.256 \]
The rejection region, with α/2 = .05 in the upper tail, is $F > F_{20,\,24,\,0.10/2} = 2.03$. F = 1.256 is not in the rejection region, so we do not reject $H_0$.
Conclusion: There is not sufficient evidence of a difference in variances at α = .10.
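The arithmetic of the NYSE/NASDAQ example can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the helper name `f_test_statistic` is hypothetical, and the critical value 2.03 is the table value quoted in the example, not a quantity the code derives.

```python
def f_test_statistic(s_first, n_first, s_second, n_second):
    """Return (F, numerator d.f., denominator d.f.), larger variance in the numerator."""
    var_first, var_second = s_first**2, s_second**2
    if var_first >= var_second:
        return var_first / var_second, n_first - 1, n_second - 1
    return var_second / var_first, n_second - 1, n_first - 1


# NYSE: n = 21, s = 1.30; NASDAQ: n = 25, s = 1.16 (from the example).
f_stat, df_num, df_den = f_test_statistic(1.30, 21, 1.16, 25)

# Table value quoted in the example: F(20, 24), upper-tail area 0.10 / 2 = 0.05.
f_critical = 2.03
reject_h0 = f_stat > f_critical

print(f"F = {f_stat:.3f} with ({df_num}, {df_den}) d.f.; reject H0: {reject_h0}")
```

Placing the larger sample variance in the numerator, as the decision rule requires, means only the upper-tail critical value is needed even for the two-tail test. The helper returns the same result whichever sample is passed first.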