Empirical Rule for $\bar{X}$

Consider a sample of size $n$ from a population with mean $\mu$ and standard deviation $\sigma$. Suppose $\bar{X}$ is normal (or approximately normal), with $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$. (This would be the case if the population is normal or if the sample size is large.) Find the probability that $\bar{X}$ will be within (a) $2\sigma_{\bar{X}}$ of $\mu$, (b) $3\sigma_{\bar{X}}$ of $\mu$.

(a) P($\bar{X}$ will be within $2\sigma_{\bar{X}}$ of $\mu$) = ______

(b) P($\bar{X}$ will be within $3\sigma_{\bar{X}}$ of $\mu$) = ______

In general, the statement "$\bar{X}$ will be within $k\sigma_{\bar{X}}$ of $\mu$" means that $\bar{X}$ lies between $\mu - k\sigma_{\bar{X}}$ and $\mu + k\sigma_{\bar{X}}$. If $\bar{X}$ is normal (or approximately normal), then

P($\bar{X}$ will be within $k\sigma_{\bar{X}}$ of $\mu$) = P($-k < Z < k$)

Confidence Interval

Suppose we are given the following:

Normal Population: Scores on a standardized test.
Population Mean $\mu$: (unknown)
Population S.D.: $\sigma = 1.5$

To estimate $\mu$ we will take a simple random sample (srs) of size $n = 25$ and use $\bar{X}$ as our estimator. Recall that since the population is normal, $\bar{X}$ is normally distributed with $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 1.5/5 = .3$.

We would like to be able to express this estimate in the form $\bar{X} \pm E$ or $(\bar{X} - E,\ \bar{X} + E)$. Here $E$ is some error which determines the accuracy of our estimate. Let's take $E = 2\sigma_{\bar{X}}$ for now. Thus we have the interval $(\bar{X} - 2\sigma_{\bar{X}},\ \bar{X} + 2\sigma_{\bar{X}})$.

For any given sample this interval may or may not contain the true mean $\mu$. It would be useful to know the probability that this interval covers $\mu$. If the interval covers the true mean $\mu$, then $\mu$ is somewhere in the interval above, so that $\bar{X}$ is in fact within $2\sigma_{\bar{X}}$ (= 0.6) of $\mu$. Thus

P[ $(\bar{X} - 2\sigma_{\bar{X}},\ \bar{X} + 2\sigma_{\bar{X}})$ covers $\mu$ ]
   = P($\bar{X}$ is within $2\sigma_{\bar{X}}$ of $\mu$)
   = ______
   = ______

To make the probability above a nice number, .95, we should replace 2 by 1.96. Thus we can say

"For 95% of all samples of size $n = 25$, the interval $(\bar{X} - 1.96\sigma_{\bar{X}},\ \bar{X} + 1.96\sigma_{\bar{X}})$ will cover the true value of $\mu$."

Or,

"For 95% of all samples of size $n = 25$, $\bar{X}$ will be within $1.96\sigma_{\bar{X}}$ of the true population mean $\mu$."

The 95% value is called the LEVEL OF CONFIDENCE. This tells us the probability that the interval will cover $\mu$.

The $1.96\sigma_{\bar{X}} = .588$ is called the margin of error. This tells us how accurate $\bar{X}$ is (i.e. how close $\bar{X}$ will be to $\mu$ for 95% of all samples).

The interval $(\bar{X} - 1.96\sigma_{\bar{X}},\ \bar{X} + 1.96\sigma_{\bar{X}})$ is called a 95% Z-CONFIDENCE INTERVAL.

The simulation below will illustrate how confidence intervals work.

    MTB > random 25 c1-c40;
    SUBC> norm 10 1.5.
    MTB > zint 95 1.5 c1-c40.

[The first two command lines select 40 random samples, each of size $n = 25$, from a normal distribution with $\mu = 10$ and $\sigma = 1.5$. The third command line forms the 95% Z-confidence interval for each sample.]

Confidence Intervals (The assumed sigma = 1.5)

    Variable    N     Mean    StDev   SE Mean         95.0% CI
    C1         25   10.459    1.661     0.300   ( 9.871, 11.047)
    C2         25    9.826    1.486     0.300   ( 9.238, 10.414)
    C3         25   10.388    1.600     0.300   ( 9.800, 10.976)
    C4         25    9.741    1.297     0.300   ( 9.153, 10.329)
    C5         25   10.441    1.766     0.300   ( 9.853, 11.029)
    C6         25   10.331    1.637     0.300   ( 9.743, 10.919)
    C7         25    8.941    1.264     0.300   ( 8.353,  9.529)
    C8         25   10.205    1.627     0.300   ( 9.617, 10.793)
    C9         25   10.163    1.560     0.300   ( 9.575, 10.751)
    C10        25   10.009    1.619     0.300   ( 9.421, 10.597)
    C11        25   10.455    1.787     0.300   ( 9.867, 11.043)
    C12        25   10.365    1.220     0.300   ( 9.777, 10.953)
    C13        25   10.626    1.475     0.300   (10.038, 11.214)
    C14        25   10.090    1.677     0.300   ( 9.502, 10.678)
    C15        25   10.339    1.103     0.300   ( 9.751, 10.927)
    C16        25   10.208    1.480     0.300   ( 9.620, 10.796)
    C17        25   10.356    1.508     0.300   ( 9.768, 10.944)
    C18        25    9.943    1.388     0.300   ( 9.355, 10.531)
    C19        25   10.015    1.318     0.300   ( 9.427, 10.603)
    C20        25    9.924    1.473     0.300   ( 9.336, 10.512)
    C21        25   10.037    1.271     0.300   ( 9.449, 10.625)
    C22        25    9.490    1.345     0.300   ( 8.902, 10.078)
    C23        25    9.972    1.484     0.300   ( 9.384, 10.560)
    C24        25   10.330    1.644     0.300   ( 9.742, 10.918)
    C25        25    9.635    1.609     0.300   ( 9.047, 10.223)
    C26        25    9.292    1.558     0.300   ( 8.704,  9.880)
    C27        25   10.053    1.072     0.300   ( 9.465, 10.641)
    C28        25    9.484    1.726     0.300   ( 8.896, 10.072)
    C29        25   10.666    1.402     0.300   (10.078, 11.254)
    C30        25    9.896    1.640     0.300   ( 9.308, 10.484)
    C31        25    9.942    1.583     0.300   ( 9.354, 10.530)
    C32        25   10.100    1.657     0.300   ( 9.512, 10.688)
    C33        25    9.483    1.496     0.300   ( 8.895, 10.071)
    C34        25    9.691    1.623     0.300   ( 9.103, 10.279)
    C35        25   10.390    1.369     0.300   ( 9.802, 10.978)
    C36        25   10.569    1.178     0.300   ( 9.981, 11.157)
    C37        25    9.813    1.326     0.300   ( 9.225, 10.401)
    C38        25    9.905    1.489     0.300   ( 9.317, 10.493)
    C39        25   10.442    1.405     0.300   ( 9.854, 11.030)
    C40        25    9.945    1.919     0.300   ( 9.357, 10.533)

QUESTIONS

1. (a) In theory, how many of the above intervals would you expect to cover the true population mean $\mu$ (= 10)?
   (b) In fact, how many actually do?
   (c) If this simulation were repeated, would you always find that exactly 36 of the 40 intervals contain $\mu$? Explain.

2. Suppose you selected 40 samples of size $n = 25$ from a real population (where typically the population mean $\mu$ and standard deviation $\sigma$ are unknown).
   (a) Could you form a 95% Z-confidence interval for each sample? Explain.
   (b) If you knew $\sigma$ and formed forty 95% Z-confidence intervals, how many of the intervals would you expect to cover the population $\mu$? Could you tell which? Explain.

Note:

(i) A $100(1-\alpha)\%$ Z-confidence interval for $\mu$ is given by $\bar{X} \pm z_{\alpha/2}\,\sigma_{\bar{X}}$, where $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

(ii) For a 95% Z-confidence interval, $\alpha = .05$. Hence the 95% Z-confidence interval for $\mu$ is $\bar{X} \pm 1.96\,\sigma_{\bar{X}}$, where $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

(iii) The 99% Z-confidence interval for $\mu$ is $\bar{X} \pm 2.5758\,\sigma_{\bar{X}}$, where $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

(iv) The 90% Z-confidence interval for $\mu$ is $\bar{X} \pm 1.6449\,\sigma_{\bar{X}}$, where $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

The t-distribution

The t-distribution depends on a single parameter. This parameter is called its degrees of freedom (df).

If sampling is done from a normal distribution whose mean is $\mu$ and standard deviation is $\sigma$, then

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

follows the standard normal distribution. Since in practice $\sigma$ is mostly unknown, we replace it by its estimate $s$. The random variable

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

follows the t-distribution with $n-1$ degrees of freedom.

Sketch of t-distribution

In comparison with the standard normal distribution, the t-distribution has more area in the tails, while the standard normal distribution has more area in the middle. The t-curve approaches the Z-curve if df is large.
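The Minitab Z-interval simulation above can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: the function name `count_covering_intervals` and its parameters are hypothetical, it uses Python's standard `random` and `statistics` modules rather than Minitab, and its random draws will not reproduce the 40 samples in the table.

```python
import random
import statistics


def count_covering_intervals(num_samples=40, n=25, mu=10.0, sigma=1.5,
                             z=1.96, seed=None):
    """Count how many Z-intervals x_bar +/- z*sigma/sqrt(n) cover mu.

    Hypothetical helper: draws num_samples normal samples of size n
    and forms one Z-confidence interval (sigma known) per sample.
    """
    rng = random.Random(seed)
    margin = z * sigma / n ** 0.5  # 1.96 * 0.3 = 0.588 for the defaults
    covered = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        # The interval covers mu exactly when x_bar is within the margin of mu.
        if x_bar - margin < mu < x_bar + margin:
            covered += 1
    return covered


if __name__ == "__main__":
    # One run of 40 samples; the count changes from seed to seed.
    print(count_covering_intervals(seed=1), "of 40 intervals cover mu")
```

Running it with several different seeds shows that the number of covering intervals varies from run to run, while the long-run proportion settles near .95.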
T-Interval: Confidence Interval for the Mean of a Normal Population ($\sigma$ unknown)

If a random sample $X_1, X_2, \ldots, X_n$ is chosen from a normal distribution, then a $100(1-\alpha)\%$ confidence interval for $\mu$ is

$$\bar{X} \pm t_{\alpha/2}\,SE$$

where:
df for $t$ is $n-1$,
$SE = s/\sqrt{n}$ = standard error of $\bar{X}$ (the estimated sd of $\bar{X}$),
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$,
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$,
$s = \sqrt{s^2}$.

Margin of Error: $E = t_{\alpha/2}\,SE = t_{\alpha/2}\,s/\sqrt{n}$

Level of Confidence (Reliability): $100(1-\alpha)\%$

Notes:

1. For all $n$, $t_{\alpha/2} > z_{\alpha/2}$.
2. For df = $\infty$, $t_{\alpha/2} = z_{\alpha/2}$; these are the entries at the bottom of the t-table.
3. For large $n$ ($n > 30$), the normality assumption may be ignored because of the Central Limit Theorem.
4. The estimate of $\mu$, $\bar{X}$, is the mid-point of the CI, and the margin of error is one half the width of the CI. Writing the interval as $(L, U)$, $\bar{X}$ lies midway between $L$ and $U$. Thus $\bar{X} = (L+U)/2$ and $E = (U - L)/2$.

Example: In a health study, the birth weights of a random sample of 100 newborns from mothers with a low socioeconomic status in a large US city were recorded. The sample yielded a mean of 3.21 kg with a standard deviation of 0.71 kg.
(a) Find a 90% confidence interval for the true mean birth weight of newborns from mothers with a low socioeconomic status.
(b) Interpret the confidence interval.

Solution: Here we wish to estimate $\mu$ = mean birth weight of all newborns from mothers with a low socioeconomic status in this US city.

Given: $n$ = ______
$\bar{x}$ = ______ [estimate of $\mu$]
$s$ = ______ [estimate of $\sigma$]

Since $n > 30$, it is not necessary that the population be normal (due to the CLT).

(a) For a 90% CI, $t_{\alpha/2}$ = ______ = ______, df = $n - 1$ = 99

$\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$ = ______ = ______ or, ______

(b) $\bar{x}$ = _________ estimates the true population mean $\mu$ with margin of error E = ____________ and level of confidence (Reliability) ____________. The level of confidence gives the proportion of intervals found this way that would cover $\mu$.

Note: The interpretation of a confidence interval as given in the example above is the popular interpretation often heard on television or reported in newspapers.
A mathematically precise interpretation of the confidence interval for this example would be: "Prior to sampling there was a .90 probability that the confidence interval to be formed would contain the true population mean $\mu$."

Example: For the data in the example above, find a 95% confidence interval for the true mean birth weight of newborns from mothers with a low socioeconomic status.

Solution: Recall, $n = 100$, $\bar{x} = 3.21$, $s = 0.71$.

For a 95% CI, $t_{\alpha/2}$ = ______ = ______, df = $n - 1$ = 99

$\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$ = ______ = ______ or, ______

Interpretation: $\bar{x}$ = _________ estimates the true population mean $\mu$ with margin of error E = ____________ and level of confidence (Reliability) ____________.

Example: For the data in the example above, find a 99% confidence interval for the true mean birth weight of newborns from mothers with a low socioeconomic status.

Solution: Recall, $n = 100$, $\bar{x} = 3.21$, $s = 0.71$.

For a 99% CI, $t_{\alpha/2}$ = ______ = ______, df = $n - 1$ = 99

$\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$ = ______ = ______ or, ______

Interpretation: $\bar{x}$ = _________ estimates the true population mean $\mu$ with margin of error E = ____________ and level of confidence (Reliability) ____________.

Question: Considering these three examples, if the level of confidence is increased and all other things remain the same, the width of the confidence interval will _______________.

Example: A study was conducted to determine the effect of acid rain on the lake water in an industrial region of the country. The data below give the pH levels from a random sample of 10 lakes from this region. (It was assumed that the sample came from a normal distribution.) Minitab was used to find a 95% confidence interval for the mean pH level for all lakes in this region.

    C1: 6.6 7.1 7.3 6.7 6.8 6.2 6.5 5.9 6.9 6.3

    MTB > tint 95 c1

    One-Sample T: C1

    Variable    N     Mean    StDev   SE Mean         95.0% CI
    C1         10    6.630    0.424     0.134   ( 6.326,  6.934)

From the Minitab output answer the following:
(a) What is the 95% confidence interval for $\mu$?
(b) What is the estimate of $\mu$ and the estimated standard deviation of this estimate?
(c) What is the margin of error E and level of confidence (reliability) for the estimate of $\mu$?

Sample Size Determination for Estimating $\mu$

Problem: Suppose you wish to estimate a population mean $\mu$ with a specified margin of error E and level of confidence. What sample size should be used?

Solution: We know that $E = t_{\alpha/2}\,s/\sqrt{n}$. Now we solve this equation for $n$:

$$E^2 = \frac{t_{\alpha/2}^2\,s^2}{n}$$

$$nE^2 = t_{\alpha/2}^2\,s^2$$

$$n = \frac{t_{\alpha/2}^2\,s^2}{E^2} = \left[\frac{t_{\alpha/2}\,s}{E}\right]^2$$

Of course, since we have not sampled yet, we do not have values for $s$ or $t_{\alpha/2}$. In practice $t_{\alpha/2}$ is replaced by $z_{\alpha/2}$ and $s$ is replaced by a prior estimate of $\sigma$. Thus

$$n \approx \left[\frac{z_{\alpha/2}\,\sigma}{E}\right]^2,$$

rounded up to the next whole number.

Example: How large a sample would be required to estimate the mean pH level for all lakes in the industrial region to within .1 with level of confidence 95%? Assume that the prior estimate for $\sigma$ is 0.424.
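The sample-size rule above is easy to check numerically. The sketch below is an illustration, not part of the original notes: the function name `required_sample_size` and its arguments are hypothetical, and it simply evaluates $[z_{\alpha/2}\,\sigma/E]^2$ and rounds up, using the pH example's inputs (E = 0.1, prior estimate 0.424, $z_{\alpha/2}$ = 1.96).

```python
import math


def required_sample_size(sigma_prior, margin, z=1.96):
    """Return the smallest whole n with z * sigma_prior / sqrt(n) <= margin.

    Hypothetical helper: sigma_prior is a prior estimate of sigma,
    margin is the desired margin of error E, z is z_{alpha/2}.
    """
    if sigma_prior <= 0 or margin <= 0:
        raise ValueError("sigma_prior and margin must be positive")
    # n = (z * sigma / E)^2, rounded up to the next whole number.
    return math.ceil((z * sigma_prior / margin) ** 2)


if __name__ == "__main__":
    # Inputs from the pH example: E = 0.1, prior estimate of sigma = 0.424.
    print(required_sample_size(0.424, 0.1))
```

Halving the margin of error roughly quadruples the required sample size, since $n$ grows with $1/E^2$.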