Lecture Slides
Elementary Statistics, Eleventh Edition, and the Triola Statistics Series, by Mario F. Triola
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved.

Chapter 7 Estimates and Sample Sizes
7-1 Review and Preview
7-2 Estimating a Population Proportion
7-3 Estimating a Population Mean: σ Known
7-4 Estimating a Population Mean: σ Not Known
7-5 Estimating a Population Variance

Section 7-1 Review and Preview

Preview
This chapter presents the beginning of inferential statistics. The two major activities of inferential statistics are:
(1) to use sample data to estimate the values of population parameters;
(2) to test hypotheses or claims made about population parameters (later chapters).
We introduce methods for estimating values of these important population parameters: proportions, means, and variances. We also present methods for determining the sample sizes necessary to estimate those parameters.

Section 7-2 Estimating a Population Proportion

Key Concept
In this section we present methods for using a sample proportion (like the presidential approval proportion in a poll of 1200 people) to estimate the value of a population proportion (the entire US population).
• The sample proportion $\hat{p}$ is the best point estimate (one number) of the unknown population proportion $p$. A point estimate is a single value (or point) used to approximate a population parameter.
• We can use the sampling distribution of the sample proportion to construct a confidence interval for the population proportion.
$\hat{p}$ is used as the point estimate of $p$ because it is unbiased and it is the most consistent of the estimators that could be used.
Unbiased: the distribution of sample proportions tends to center about the value of $p$; it does not systematically overestimate or underestimate it.
Consistent: the standard deviation of sample proportions tends to be smaller than the standard deviation of any other unbiased estimator.

Example: In the Chapter Problem we noted that in a Pew Research Center poll, 70% of 1501 randomly selected adults in the United States believe in global warming, so the sample proportion is $\hat{p} = 0.70$. Find the best point estimate of the proportion of all adults in the United States who believe in global warming.
Because the sample proportion is the best point estimate of the population proportion, we conclude that the best point estimate of $p$ is 0.70. When using the sample results to estimate the percentage of all adults in the United States who believe in global warming, the best estimate is 70%.

The sample proportion is a good estimate of the population proportion, but just how good is it?
A confidence interval (CI) is a range (or an interval) of values used to estimate the true value of a population parameter. For example: a presidential approval rating of 51% with a 3% margin of error gives CI = [48%, 54%]. The interval is based on the known or approximated distribution of the sample proportion, which is approximately normal by the normal approximation to the binomial.
A confidence level is the probability $1 - \alpha$ that the confidence interval actually does contain the population parameter, assuming that the estimation process of creating a confidence interval is repeated a large number of times. (The confidence level is also called the degree of confidence, or the confidence coefficient.)
The most common choices for the confidence level are 90% ($\alpha$ = 10%), 95% ($\alpha$ = 5%), and 99% ($\alpha$ = 1%).
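The "repeated a large number of times" part of the confidence level definition can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the true proportion `p_true`, the number of simulated polls `n_polls`, and the seed are illustrative choices, not values from the slides, and the interval uses the margin-of-error formula developed later in this section.

```r
# Simulate many polls and count how often a 95% interval captures p.
# p_true, n_polls and the seed are illustrative assumptions.
set.seed(1)
p_true  <- 0.70    # assumed "true" population proportion
n       <- 1501    # sample size of each simulated poll
n_polls <- 10000   # number of repeated polls
alpha   <- 0.05

z <- qnorm(1 - alpha / 2)

# One sample proportion per simulated poll
p_hat <- rbinom(n_polls, size = n, prob = p_true) / n

# Margin of error and interval limits for each poll
E     <- z * sqrt(p_hat * (1 - p_hat) / n)
lower <- p_hat - E
upper <- p_hat + E

# Fraction of the intervals that contain the true proportion
coverage <- mean(lower < p_true & p_true < upper)
coverage
```

The printed fraction should land close to 0.95, which is the sense in which the confidence level describes the success rate of the procedure rather than of any single interval.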
Interpreting a Confidence Interval
We must be careful to interpret confidence intervals correctly. There is one correct interpretation and many different and creative incorrect interpretations of a confidence interval.
For example: 0.677 < p < 0.723 (70% with a margin of error of 2.3%).
"We are 95% confident that the interval from 0.677 to 0.723 actually contains the true value of the population proportion p."
This means that if we were to select many different samples of size 1501 and construct the corresponding confidence intervals, 95% of them would actually contain the value of the population proportion p. Note that in this correct interpretation, the level of 95% refers to the success rate of the process being used to estimate the proportion.
Incorrect interpretations:
1) "There is a 95% chance that the true value of p falls between 0.677 and 0.723." Any specific population has a fixed and constant value p, and a confidence interval constructed from a sample either contains p or does not; there is no probability involved.
2) "95% of sample proportions fall between 0.677 and 0.723." No: we are trying to estimate the true population proportion, not sample proportions.

[Figure: Interpreting a Confidence Interval; 19/20 = 0.95]

Critical Values
We use the normal approximation to the binomial distribution of the sample proportion. A standard z score is used to distinguish between sample statistics that are likely to occur and those that are unlikely to occur. Such a z score is called a critical value. Critical values are based on the following observations:
1) Under certain conditions, the sampling distribution of sample proportions can be approximated by a normal distribution. This is the normal approximation to the binomial distribution from Ch 6.
2) A z score associated with a sample proportion has a probability of $\alpha/2$ of falling in the right tail.
3) The z score separating the right-tail region is commonly denoted by $z_{\alpha/2}$ and is referred to as a critical value because it is on the borderline separating z scores from sample proportions that are likely to occur from those that are unlikely to occur.

Finding $z_{\alpha/2}$ for a 95% Confidence Level
$\alpha$ = 5%, so $\alpha/2$ = 2.5% = 0.025. The critical values are $-z_{\alpha/2}$ and $z_{\alpha/2}$.
TI: 2nd VARS, invNorm(0.975)
Excel: =NORMSINV(0.975)
StatDisk: Analysis->Prob Distributions->Normal
R:
> alpha = 0.05
> z_alpha = qnorm(1-alpha/2)
> z_alpha
[1] 1.959964

Margin of Error
When data from a simple random sample are used to estimate a population proportion $p$, the margin of error, denoted by $E$, is the maximum likely difference (with probability $1 - \alpha$, such as 0.95) between the observed proportion $\hat{p}$ and the true value of the population proportion $p$. The margin of error $E$ is also called the maximum error of the estimate and can be found by multiplying the critical value and the standard deviation of the sample proportions.

Margin of Error for Proportions
\[ E = z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} \]
How do we calculate the mean and standard deviation of the set of sample proportions?
Suppose the sample size is $n$ and the actual unknown population proportion is $p$. For proportions, we have $n$ independent trials, one for each person/element of the sample. Each trial is a success with probability $p$ and a failure with probability $q = 1 - p$, so we deal with a binomial distribution. Its mean is $np$ and its standard deviation is $\sqrt{npq}$.
For large $n$, the binomial can be approximated by a normal distribution. However, here we are interested in the proportion rather than just the number of successes.
When we divide all elements by the total number of elements $n$, the mean and standard deviation are divided by the same $n$:
\[ \mu_{\hat{p}} = \frac{np}{n} = p, \qquad \sigma_{\hat{p}} = \frac{\sqrt{npq}}{n} = \sqrt{\frac{pq}{n}} \]
Furthermore, if each element in an approximately normal distribution is divided by the same constant, the result will still be approximately normal with the above parameters:
\[ z = \frac{\hat{p} - p}{\sqrt{pq/n}} \sim \text{Normal}(0,1) \]
For a $1 - \alpha$ CI take $z_{\alpha/2}$. There is $1 - \alpha$ probability that $-z_{\alpha/2} < z < z_{\alpha/2}$:
\[ -z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{pq/n}} < z_{\alpha/2} \]
\[ -z_{\alpha/2}\sqrt{\frac{pq}{n}} < \hat{p} - p < z_{\alpha/2}\sqrt{\frac{pq}{n}} \]
Let $E = z_{\alpha/2}\,\sigma_{\hat{p}} = z_{\alpha/2}\sqrt{pq/n}$. Then
\[ -E < \hat{p} - p < E \quad\Rightarrow\quad \hat{p} - E < p < \hat{p} + E \]
Since $p$ is unknown, $\hat{p}$ and $\hat{q} = 1 - \hat{p}$ are used in place of $p$ and $q$ when evaluating $E$.

Procedure for Constructing a Confidence Interval for p
1. Verify that the required assumptions are satisfied.
   1) The sample is a simple random sample.
   2) The conditions for the binomial distribution are satisfied.
   3) The normal distribution can be used to approximate the distribution of sample proportions because $np \ge 5$ and $nq \ge 5$ are both satisfied.
2. Find the critical value $z_{\alpha/2}$ that corresponds to the desired confidence level (table or technology).
3. Evaluate the margin of error $E = z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$.
4. Using the value of the calculated margin of error $E$ and the value of the sample proportion $\hat{p}$, find the values of $\hat{p} - E$ and $\hat{p} + E$. Substitute those values in the general format for the confidence interval: $\hat{p} - E < p < \hat{p} + E$.

Example: In the Chapter Problem we noted that a Pew Research Center poll of 1501 randomly selected U.S. adults showed that 70% of the respondents believe in global warming. The sample results are $n = 1501$ and $\hat{p} = 0.70$.
a. Find the margin of error E that corresponds to a 95% confidence level.
b. Find the 95% confidence interval estimate of the population proportion p.
c. Based on the results, can we safely conclude that the majority of adults believe in global warming?
d. Assuming that you are a newspaper reporter, write a brief statement that accurately describes the results and includes all of the relevant information.

Requirement check:
1) The sample is a simple random sample.
2) Binomial distribution: fixed number of trials = 1501; trials are independent; two categories of outcomes (believes or does not); probability remains constant.
3) The numbers of successes and failures, 1501*0.7 and 1501*0.3, are both > 5.

a) Use the formula to find the margin of error:
\[ E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 1.96\sqrt{\frac{(0.70)(0.30)}{1501}} = 0.023183 \]

b) The 95% confidence interval:
\[ \hat{p} - E < p < \hat{p} + E \]
\[ 0.70 - 0.023183 < p < 0.70 + 0.023183 \]
\[ 0.677 < p < 0.723 \]
TI: Stat->Tests->1-PropZInt
STATDISK: Analysis->Confidence Intervals->One Sample Proportion
Margin of error, E = 0.0231784
95% Confidence Interval: 0.6770214 < p < 0.7233783
R:
> pbar = 0.7
> n = 1501
> alpha = 0.05
> z_alpha = qnorm(1-alpha/2)
> z_alpha
[1] 1.959964
>
> E = z_alpha * sqrt( pbar*(1-pbar)/n )
> E
[1] 0.02318288
> CI = c(pbar - E, pbar + E)
> CI
[1] 0.6768171 0.7231829
Excel: [screenshot]

c) Based on the confidence interval obtained in part (b), it does appear that the proportion of adults who believe in global warming is greater than 0.5 (or 50%), so we can safely conclude that the majority of adults believe in global warming. Because the limits of 0.677 and 0.723 are likely to contain the true population proportion, it appears that the population proportion is a value greater than 0.5.

d) Here is one statement that summarizes the results: 70% of United States adults believe that the earth is getting warmer. That percentage is based on a Pew Research Center poll of 1501 randomly selected adults in the United States.
In theory, in 95% of such polls, the percentage should differ by no more than 2.3 percentage points in either direction from the percentage that would be found by interviewing all adults in the United States.

Caution: Never follow the common misconception that poll results are unreliable if the sample size is a small percentage of the population size. The population size is usually not a factor in determining the reliability of a poll.

Sample Size
Suppose we want to collect sample data in order to estimate some population proportion with a given accuracy. The question is: how many sample items must be obtained? Note that the sample size does NOT depend on the population size N.
Solving $E = z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$ for $n$ gives
\[ n = \frac{z_{\alpha/2}^2\,\hat{p}\hat{q}}{E^2} \]
TI: 2nd VARS -> invNorm(0.975)
R, with an estimate $\hat{p} = 0.73$ and $E = 0.03$ (round up to n = 842):
> pbar = 0.73; qbar = 1-pbar;
> alpha = 0.05
> E = 0.03
> z_alpha = qnorm(1-alpha/2)
> z_alpha
[1] 1.959964
> n = z_alpha^2 * pbar * qbar / E^2
> n
[1] 841.2795
R, with $\hat{p} = 0.5$ and $E = 0.03$ (round up to n = 1068):
> pbar = 0.5; qbar = 1-pbar;
> alpha = 0.05
> E = 0.03
> z_alpha = qnorm(1-alpha/2)
> z_alpha
[1] 1.959964
> n = z_alpha^2 * pbar * qbar / E^2
> n
[1] 1067.072

My example: Obtaining the point estimate and margin of error from a CI
A journal article says that, of 1200 people surveyed, the 99% confidence interval for presidential approval is (37%, 42%).
> CI_Lower = 0.37
> CI_Upper = 0.42
> pbar = ( CI_Lower + CI_Upper ) /2
> pbar
[1] 0.395
> E = (CI_Upper - CI_Lower) /2
> E
[1] 0.025

Section 7-3 Estimating a Population Mean: σ Known
In addition to knowing the values of the sample data or statistics, in this section we also assume that we know the value of the population standard deviation, σ.
Point Estimate of the Population Mean
The sample mean $\bar{x}$ is the best point estimate of the population mean µ.
1. For all populations, the sample mean $\bar{x}$ is an unbiased estimator of the population mean µ, meaning that the distribution of sample means tends to center about the value of the population mean µ (review Ch 6).
2. For many populations, the distribution of sample means $\bar{x}$ tends to be more consistent (with less variation) than the distributions of other sample statistics.

Confidence Interval for Estimating a Population Mean (with σ Known)
µ = population mean
σ = population standard deviation
$\bar{x}$ = sample mean
n = number of sample values
E = margin of error
$z_{\alpha/2}$ = z score separating an area of α/2 in the right tail of the standard normal distribution

Confidence Interval for Estimating a Population Mean (with σ Known): Requirements
1. The sample is a simple random sample.
2. The value of the population standard deviation σ is known.
3. Either or both of these conditions is satisfied: the population is normally distributed, OR n > 30, so that the sample mean is approximately normally distributed by the CLT.
In either case, the most important part is this: if we collect simple random samples of size n, the sample means $\bar{x}$ are either exactly or approximately Normal(µ, σ/√n), so
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim \text{Normal}(0,1) \]
For a $1 - \alpha$ CI take $z_{\alpha/2}$. There is $1 - \alpha$ probability that $-z_{\alpha/2} < z < z_{\alpha/2}$:
\[ -z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2} \]
\[ -z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{x} - \mu < z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]
Let $E = z_{\alpha/2}\,\sigma/\sqrt{n}$. Then
\[ -E < \bar{x} - \mu < E \quad\Rightarrow\quad \bar{x} - E < \mu < \bar{x} + E \]
Example: 95% confidence interval with $\bar{x}$ = 172.55, n = 40, σ = 26
R:
> xbar = 172.55
> n = 40
> sigma = 26
> alpha = 0.05
> z_alpha = qnorm(1-alpha/2)
> z_alpha
[1] 1.959964
>
> E = z_alpha * sigma/sqrt(n)
> E
[1] 8.057335
> CI = c(xbar - E, xbar + E)
> CI
[1] 164.4927 180.6073
TI: Stat->Tests->ZInterval
STATDISK: Analysis->Confidence Intervals->Mean One Sample
Margin of error, E = 8.057328
95% Confident the population mean is within the range: 164.4927 < mean < 180.6073

Finding a Sample Size for Estimating a Population Mean
When collecting a simple random sample that will be used to estimate a population mean µ, how many sample values (n) must be obtained to get a certain margin of error?
\[ E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 \]
If the computed sample size n is not a whole number, round the value of n up to the next larger whole number.
If we want a more accurate estimate, we need to decrease the margin of error E, and then the sample size n must be substantially increased.

Example: Assume that we want to estimate the mean IQ score for the population of statistics students. How many statistics students must be randomly selected for IQ tests if we want 95% confidence that the sample mean is within 3 IQ points of the population mean? Note that IQ tests are typically designed so that the mean is around 100 and the standard deviation is 15.
α = 0.05, α/2 = 0.025, $z_{\alpha/2}$ = 1.96, E = 3, σ = 15
\[ n = \left(\frac{1.96 \cdot 15}{3}\right)^2 = 96.04, \text{ rounded up to } 97 \]
With a simple random sample of only 97 statistics students, we will be 95% confident that the sample mean is within 3 IQ points of the true population mean µ.
R:
> E = 3; sigma = 15;
> alpha = 0.05
> z_alpha = qnorm(1-alpha/2)
> z_alpha
[1] 1.959964
> n = ( z_alpha*sigma/E )^2
> n
[1] 96.03647
TI: 2nd VARS -> invNorm(0.975)
Excel: =NORMSINV(0.975)
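The remark that a smaller margin of error requires a substantially larger sample can be made concrete with a short sketch. The helper name `sample_size_mean` is hypothetical (it does not come from the slides); it simply wraps the formula above and rounds up.

```r
# Sample size needed to estimate a mean within margin E when sigma is known.
# sample_size_mean is a hypothetical helper name, not from the slides.
sample_size_mean <- function(E, sigma, conf = 0.95) {
  alpha <- 1 - conf
  z <- qnorm(1 - alpha / 2)      # critical value z_{alpha/2}
  ceiling((z * sigma / E)^2)     # always round up to a whole number
}

sample_size_mean(E = 3,   sigma = 15)   # 97, the IQ example above
sample_size_mean(E = 1.5, sigma = 15)   # 385
```

Because E appears squared in the denominator, halving the margin of error roughly quadruples the required sample size.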
Section 7-4 Estimating a Population Mean: σ Not Known (the most typical situation)

Sample Mean
As before, the sample mean $\bar{x}$ is the best point estimate of the population mean µ. But now we do not assume that we know the population standard deviation σ; we can still estimate it with the sample standard deviation s.
With σ known, $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim \text{Normal}(0,1)$. With s in place of σ, the statistic
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
is NOT normal; it has a t distribution (William Gosset, Guinness Brewery).
For a $1 - \alpha$ CI take $t_{\alpha/2}$ from the t table or a calculator (TI-84). There is $1 - \alpha$ probability that $-t_{\alpha/2} < t < t_{\alpha/2}$:
\[ -t_{\alpha/2} < \frac{\bar{x} - \mu}{s/\sqrt{n}} < t_{\alpha/2} \]
so that
\[ E = t_{\alpha/2}\frac{s}{\sqrt{n}} \quad\text{and}\quad \bar{x} - E < \mu < \bar{x} + E \]

The number of degrees of freedom for a collection of sample data is the number of sample values that can vary after certain restrictions have been imposed on all data values. The degrees of freedom is often abbreviated df.
In our case the restriction is that the sample mean has already been found: sum(x)/n = xbar, a fixed number. If, say, you are averaging 25 test scores and have already found that the sample mean is 77, then the sum is fixed at 77*25 = 1925. We can then freely assign only 24 scores; the last one is fixed by the requirement that the sum equal 1925.
=> degrees of freedom df = n - 1

Procedure for Constructing a Confidence Interval for µ (With σ Unknown)
1. Verify that the requirements are satisfied.
2. Using n - 1 degrees of freedom, refer to Table A-3 or use technology to find the critical value $t_{\alpha/2}$ that corresponds to the desired confidence level.
3. Evaluate the margin of error $E = t_{\alpha/2} \cdot s/\sqrt{n}$.
4. Find the values of $\bar{x} - E$ and $\bar{x} + E$. Substitute those values in the general format for the confidence interval: $\bar{x} - E < \mu < \bar{x} + E$.
5. Round the resulting confidence interval limits.
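Steps 2 through 4 of the procedure above can be sketched as a small function. The name `t_interval` and its argument names are assumptions made for illustration; the call uses the summary statistics of the garlic example that follows (n = 49, mean 0.4, s = 21.0).

```r
# Confidence interval for a mean from summary statistics, sigma unknown.
# t_interval is a hypothetical helper name, not from the slides.
t_interval <- function(xbar, s, n, conf = 0.95) {
  alpha  <- 1 - conf
  df     <- n - 1                      # degrees of freedom
  t_crit <- qt(1 - alpha / 2, df)      # step 2: critical value t_{alpha/2}
  E      <- t_crit * s / sqrt(n)       # step 3: margin of error
  c(lower = xbar - E, upper = xbar + E, E = E)   # step 4: interval limits
}

res <- t_interval(xbar = 0.4, s = 21.0, n = 49)
round(res, 1)   # step 5: round the limits
```

Verifying the requirements (step 1) is left to the analyst; the function only handles the arithmetic.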
Example 1: How to find the critical value $t_{\alpha/2}$ (α = 0.05, n = 7)
TI: 2nd VARS, invT(0.975,6)
STATDISK: Analysis->Probability Distributions->Student t-distrib
t Value: -2.446909
Prob Dens: 0.0339541
Cumulative Probs: Left: 0.025000, Right: 0.975000, 2 Tailed: 0.050000, Central: 0.950000
6 Degrees Freedom
R:
> # Example 1 How to find critical value t_alpha2
> alpha = 0.05
> n = 7
> df = n-1
> t_alpha2 = qt(1-alpha/2,df)
> t_alpha2
[1] 2.446912

Example: A common claim is that garlic lowers cholesterol levels. In a test of the effectiveness of garlic, 49 subjects were treated with doses of raw garlic, and their cholesterol levels were measured before and after the treatment. The changes in their levels of LDL cholesterol (in mg/dL) have a mean of 0.4 and a standard deviation of 21.0. Use the sample statistics of n = 49, $\bar{x}$ = 0.4 and s = 21.0 to construct a 95% confidence interval estimate of the mean net change in LDL cholesterol after the garlic treatment. What does the confidence interval suggest about the effectiveness of garlic in reducing LDL cholesterol?

Requirements are satisfied: simple random sample, and n = 49 (i.e., n > 30, which is good enough even for an arbitrary non-normal distribution).
95% implies α = 0.05. With n = 49, df = 49 - 1 = 48.
TI: 2nd VARS, invT(0.975,48)
The closest df in the table is 50 (two tails), so $t_{\alpha/2}$ = 2.009.
Using $t_{\alpha/2}$ = 2.009, s = 21.0 and n = 49, the margin of error is:
\[ E = t_{\alpha/2}\frac{s}{\sqrt{n}} = 2.009 \cdot \frac{21.0}{\sqrt{49}} = 6.027 \]
$\bar{x}$ = 0.4, E = 6.027
Construct the confidence interval:
\[ \bar{x} - E < \mu < \bar{x} + E \]
\[ 0.4 - 6.027 < \mu < 0.4 + 6.027 \]
\[ -5.6 < \mu < 6.4 \]
TI: STAT->TESTS->TInterval
R:
> xbar = 0.4
> n = 49
> s = 21.0
> alpha = 0.05
> df = n-1
> t_alpha2 = qt(1-alpha/2,df)
> t_alpha2
[1] 2.010635
>
> E = t_alpha2 * s/sqrt(n)
> E
[1] 6.031904
> CI = c(xbar - E, xbar + E)
> CI
[1] -5.631904 6.431904
STATDISK: Analysis->Confidence Intervals->Mean One Sample
Margin of error, E = 6.031898
95% Confident the population mean is within the range: -5.631898 < mean < 6.431898

We are 95% confident that the limits of -5.6 and 6.4 actually do contain the value of µ, the mean of the changes in LDL cholesterol for the population. Because the confidence interval limits contain the value of 0, it is very possible that the mean of the changes in LDL cholesterol is equal to 0, suggesting that the garlic treatment did not affect the LDL cholesterol levels. It does not appear that the garlic treatment is effective in lowering LDL cholesterol.

Properties of the Student t Distribution
1. The Student t distribution is different for different sample sizes (see the picture, for the cases n = 3 and n = 12).
2. The Student t distribution has the same general symmetric bell shape as the standard normal distribution, but it reflects the greater variability (with wider distributions) that is expected with small samples.
3. The Student t distribution has a mean of t = 0 (just as the standard normal distribution has a mean of z = 0).
4. The standard deviation of the Student t distribution varies with the sample size and is greater than 1 (unlike the standard normal distribution, which has σ = 1).
5. As the sample size n gets larger, the Student t distribution gets closer to the normal distribution.
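Properties 4 and 5 can be checked numerically. The sketch below compares 95% critical values of the t distribution with the normal critical value for a few degrees of freedom; the particular df values are illustrative (df = 2 and df = 11 correspond to the n = 3 and n = 12 cases mentioned above), and the standard deviation formula sqrt(df/(df-2)) is the standard result for df > 2.

```r
# Compare t critical values with the normal critical value as df grows.
# The df values are illustrative; df = 2 and 11 match n = 3 and n = 12.
df     <- c(2, 5, 11, 30, 100, 1000)
t_crit <- qt(0.975, df)
z_crit <- qnorm(0.975)

# SD of the t distribution is sqrt(df / (df - 2)), defined only for df > 2
sd_t <- ifelse(df > 2, sqrt(df / (df - 2)), NA)

comparison <- data.frame(
  df     = df,
  t_crit = t_crit,
  excess = t_crit - z_crit,   # how much wider than the normal critical value
  sd_t   = sd_t
)
comparison
```

The `excess` column shrinks toward 0 and `sd_t` shrinks toward 1 as df increases, which is why t-based intervals are noticeably wider than z-based intervals only for small samples.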
Choosing the Appropriate Distribution

Example:
1) n = 11, $\bar{x}$ = 24, σ = 14, and the population has a normal distribution: normal and σ is known => use $z_{\alpha/2}$.
2) n = 11, $\bar{x}$ = 24, s = 14, and the population has a normal distribution: normal and σ is NOT known => use $t_{\alpha/2}$.
3) n = 11, $\bar{x}$ = 24, σ = 14, and the population has a skewed distribution: NOT normal and n ≤ 30 => can NOT use the techniques of this section.
4) n = 40, $\bar{x}$ = 24, σ = 14, and the population has a skewed distribution: n > 30 so we can use the CLT, and σ is known => use $z_{\alpha/2}$.
5) n = 40, $\bar{x}$ = 24, s = 14, and the population has a skewed distribution: n > 30 and σ is NOT known => use $t_{\alpha/2}$.

Example: CI for the mean weight of women.
On a TI calculator: store the values in list L1, then Stat->Tests->TInterval.
R:
> DataFrame = read.csv("FHEALTH.csv")
> attach(DataFrame)
>
> x = WT # weight
> x
[1] 114.8 149.3 107.8 …..
> n = length(x)
> n # check how big is the sample
[1] 40
> qqnorm(x); qqline(x); # check normality assumption
>
> xbar = mean(x)
> xbar
[1] 146.22
> s = sd(x) # sigma is NOT given, use s - sample standard deviation
> s
[1] 37.62104
> alpha = 0.01
> df = n-1
> t_alpha2 = qt(1-alpha/2,df)
> t_alpha2
[1] 2.707913
> E = t_alpha2 * s/sqrt(n) # error margin
> E
[1] 16.10777
> CI = c(xbar - E, xbar + E) # CI for mean mu of the population
> CI
[1] 130.1122 162.3278
STATDISK:
1st: use Datasets to open Female Health Exam
2nd: Data->Descriptive Statistics on column 3 of weights
3rd: Analysis->Confidence Intervals->Mean One Sample
Margin of error, E = 16.10776
99% Confident the population mean is within the range: 130.1122 < mean < 162.3278

Excel computation for pulse in FHEALTH.XLS: [screenshot]
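When the raw data are available, the manual computation can be cross-checked against R's built-in `t.test`, whose `conf.int` component holds the same interval. This is a sketch only: FHEALTH.csv is not reproduced here, so the vector `x` is simulated stand-in data, not the real weights.

```r
# Cross-check a hand-computed t interval against t.test().
# x is hypothetical stand-in data; the slides read real weights from FHEALTH.csv.
set.seed(42)
x <- rnorm(40, mean = 146, sd = 38)

n     <- length(x)
alpha <- 0.01
E     <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
manual_ci <- c(mean(x) - E, mean(x) + E)

# Built-in one-sample t interval at the same confidence level
builtin_ci <- as.numeric(t.test(x, conf.level = 0.99)$conf.int)

manual_ci
builtin_ci
```

The two printed intervals should agree, which is a convenient sanity check on the df and the α/2 bookkeeping.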
Finding the Point Estimate and E from a CI

skip
Section 7-5 Estimating a Population Variance

Chi-Square Distribution
In a normally distributed population with variance $\sigma^2$, assume that we randomly select independent samples of size n and, for each sample, compute the sample variance $s^2$ (which is the square of the sample standard deviation s). The sample statistic $\chi^2$ (pronounced chi-square) has a sampling distribution called the chi-square distribution.

Chi-Square Distribution
\[ \chi^2 = \frac{(n - 1)s^2}{\sigma^2} \]
where
n = sample size
$s^2$ = sample variance
$\sigma^2$ = population variance
degrees of freedom = n - 1

Properties of the Distribution of the Chi-Square Statistic
1. The chi-square distribution is not symmetric, unlike the normal and Student t distributions. As the number of degrees of freedom increases, the distribution becomes more symmetric.
[Figures: Chi-Square Distribution; Chi-Square Distribution for df = 10 and df = 20]
2. The values of chi-square can be zero or positive, but they cannot be negative.
3. The chi-square distribution is different for each number of degrees of freedom, which is df = n - 1. As the number of degrees of freedom increases, the chi-square distribution approaches a normal distribution.
In Table A-4, each critical value of $\chi^2$ corresponds to an area given in the top row of the table, and that area represents the cumulative area located to the right of the critical value.
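The claim that $(n-1)s^2/\sigma^2$ follows a chi-square distribution with n - 1 degrees of freedom can be checked by simulation. This is a sketch under stated assumptions: the sample size, σ, number of samples, and seed are arbitrary illustrative choices.

```r
# Simulate the chi-square statistic (n - 1) s^2 / sigma^2 from normal samples.
# n, sigma, n_samples and the seed are illustrative assumptions.
set.seed(7)
n         <- 10
sigma     <- 2
n_samples <- 20000

chi2_stat <- replicate(n_samples, {
  x <- rnorm(n, mean = 0, sd = sigma)
  (n - 1) * var(x) / sigma^2
})

mean(chi2_stat)   # the mean of a chi-square distribution equals df = n - 1

# Empirical versus theoretical 2.5% and 97.5% points
quantile(chi2_stat, c(0.025, 0.975))
qchisq(c(0.025, 0.975), df = n - 1)

# Table A-4 uses areas to the RIGHT of the critical value;
# lower.tail = FALSE gives qchisq the same convention.
right_crit <- qchisq(0.025, df = n - 1, lower.tail = FALSE)
right_crit
```

Note that the two critical values are not symmetric about the mean, which is why the left and right values must be looked up separately.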
Example: A simple random sample of ten voltage levels is obtained. Construction of a confidence interval for the population standard deviation σ requires the left and right critical values of $\chi^2$ corresponding to a confidence level of 95% and a sample size of n = 10. Find the critical value of $\chi^2$ separating an area of 0.025 in the left tail, and find the critical value of $\chi^2$ separating an area of 0.025 in the right tail.

Example: Critical Values of the Chi-Square Distribution
Excel (CHIINV takes the area to the right): =CHIINV(0.025,9) gives the right critical value; =CHIINV(0.975,9) gives the left critical value.
R:
> alpha = 0.05
> n = 10
> df = n-1
> df
[1] 9
> chi2L = qchisq(alpha/2,df)
> chi2L
[1] 2.700389
> chi2R = qchisq(1-alpha/2,df)
> chi2R
[1] 19.02277
STATDISK: Analysis->Prob Distr->Chi Squared
Chi Sq Value: 2.700392
Prob Dens: 0.0318663
Cumulative Probs: Left: 0.025000, Right: 0.975000
9 Degrees Freedom

Estimators of $\sigma^2$
The sample variance $s^2$ is the best point estimate of the population variance $\sigma^2$.

Estimators of σ
The sample standard deviation s is a commonly used point estimate of σ (even though it is a biased estimate).

Confidence Interval for Estimating a Population Standard Deviation or Variance
Requirements:
1. The sample is a simple random sample.
2. The population must have normally distributed values (even if the sample is large).
Confidence Interval for Estimating a Population Standard Deviation or Variance
If we obtain simple random samples of size n from a normally distributed population with variance $\sigma^2$, there is probability $(1 - \alpha)$ that the statistic $\chi^2 = \dfrac{(n-1)s^2}{\sigma^2}$ will fall between the critical values $\chi^2_L$ and $\chi^2_R$, i.e.
\[ \chi^2_L < \frac{(n-1)s^2}{\sigma^2} < \chi^2_R \]
\[ \frac{1}{\chi^2_R} < \frac{\sigma^2}{(n-1)s^2} < \frac{1}{\chi^2_L} \]
\[ \frac{(n-1)s^2}{\chi^2_R} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_L} \]

Confidence Interval for the Population Standard Deviation σ
\[ \sqrt{\frac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_L}} \]

Example: The proper operation of typical home appliances requires voltage levels that do not vary much. Listed below are ten voltage levels (in volts) recorded in the author's home on ten different days. These ten values have a standard deviation of s = 0.15 volt. Use the sample data to construct a 95% confidence interval estimate of the standard deviation of all voltage levels.
123.3 123.5 123.7 123.4 123.6 123.5 123.5 123.4 123.6 123.8

Requirements are satisfied: simple random sample and normality.
n = 10, so df = 10 - 1 = 9. Use Table A-4 to find $\chi^2_L = 2.700$ and $\chi^2_R = 19.023$.
Construct the confidence interval with n = 10, s = 0.15:
\[ \frac{(n-1)s^2}{\chi^2_R} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_L} \]
\[ \frac{(10-1)(0.15)^2}{19.023} < \sigma^2 < \frac{(10-1)(0.15)^2}{2.700} \]
R (n, chi2L and chi2R as computed in the critical-value example):
> s = 0.15
> CIvar = c( (n-1)*s^2/chi2R, (n-1)*s^2/chi2L )
> CIvar
[1] 0.01064514 0.07498918
> CIsd = sqrt(CIvar)
> CIsd
[1] 0.1031753 0.2738415
STATDISK: Analysis->Confidence Intervals->St-Dev One Sample
95% Confidence Interval for the st dev: 0.1031752 < SD < 0.2738414
95% Confidence Interval for the variance: 0.0106451 < VAR < 0.0749891
TI does not have it. There are some special programs you have to upload, see p 377.
Excel: use DDXL, p 377.

Example: Evaluating the preceding expression yields:
\[ 0.010645 < \sigma^2 < 0.075000 \]
Finding the square root of each part yields this 95% confidence interval estimate of the population standard deviation:
\[ 0.10 \text{ volt} < \sigma < 0.27 \text{ volt} \]
Based on this result, we have 95% confidence that the limits of 0.10 volt and 0.27 volt contain the true value of σ.

Determining Sample Sizes
The procedures for finding the sample size necessary to estimate $\sigma^2$ are much more complex than the procedures given earlier for means and proportions. Instead of using very complicated procedures, we will use Table 7-2. STATDISK also provides sample sizes. With STATDISK, select Analysis, Sample Size Determination, and then Estimate St Dev. Minitab, Excel, and the TI-83/84 Plus calculator do not provide such sample sizes.
[Table 7-2]

Example: We want to estimate the standard deviation σ of all voltage levels in a home. We want to be 95% confident that our estimate is within 20% of the true value of σ. How large should the sample be? Assume that the population is normally distributed.
From Table 7-2, we can see that 95% confidence and an error of 20% for σ correspond to a sample of size 48.
=> We should obtain a simple random sample of 48 voltage levels from the population of voltage levels.
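Table 7-2 is not reproduced here and the exact criterion behind it is not stated in these slides, so the following is only a rough way to see why a sample of about 48 is plausible. It computes the 95% confidence interval limits for σ as multiples of s for a few sample sizes, using the chi-square interval from this section. The helper name `sd_ci_ratio` is hypothetical.

```r
# CI limits for sigma expressed as multiples of s, for a given sample size.
# sd_ci_ratio is a hypothetical helper; this is not the method behind Table 7-2.
sd_ci_ratio <- function(n, conf = 0.95) {
  alpha <- 1 - conf
  df    <- n - 1
  chi2L <- qchisq(alpha / 2, df)
  chi2R <- qchisq(1 - alpha / 2, df)
  c(lower = sqrt(df / chi2R), upper = sqrt(df / chi2L))
}

sizes  <- c(10, 48, 200)
ratios <- t(sapply(sizes, sd_ci_ratio))
rownames(ratios) <- paste0("n=", sizes)
round(ratios, 3)
```

For n = 10 the limits reproduce the voltage example (0.15 times the printed multipliers), and for n = 48 they sit roughly 20% either side of s, in line with the table lookup above.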