Chapter 7: Statistical Inference and Sampling

Normal Curve for Population

Individual observations, X, follow a normal distribution with mean μ and standard deviation σ. That is, X is a normal random variable.

[Figure: normal curve of the population, centered at μ on the x axis]

The corresponding standard normal variable Z is obtained as follows:

$$Z = \frac{X - \mu}{\sigma}$$

Examples on Normal Curve for Population

The estimated miles-per-gallon ratings of a class of trucks are normally distributed with a mean of 12.8 and a standard deviation of 3.2. What is the probability that one of these trucks selected at random would get between 13 and 15 miles per gallon?

[Figure: normal curve with X values 12.8, 13, and 15 marked, and the corresponding Z values 0, z1, and z2]

We need $P(13 \le X \le 15) = P(z_1 \le Z \le z_2)$.

$$z_1 = \frac{13 - 12.8}{3.2} = 0.0625 \approx 0.06$$

So the area from the mean to z1 is $P(0 \le Z \le z_1) = 0.0239$.

$$z_2 = \frac{15 - 12.8}{3.2} = 0.6875 \approx 0.69$$

So the area from the mean to z2 is $P(0 \le Z \le z_2) = 0.2549$.

The area from z1 to z2 is therefore 0.2549 - 0.0239 = 0.2310.

Examples on Normal Curve for Population

The examination committee of the American Society for Quality passes 40% of those who take the exam. If the scores follow a normal distribution with an average score of 75 and a standard deviation of 16, what is the minimum passing score?

[Figure: normal curve with the upper 40% of the area marked, shown on the X scale (centered at 75) and on the Z scale (centered at 0)]

We need the score x such that $P(X \ge x) = 0.40$, or equivalently the value z such that $P(Z \ge z) = 0.40$. The area from the mean to z is 0.50 - 0.40 = 0.10, so z = 0.26 (from the Normal Distribution Table).

$$z = \frac{x - 75}{16} = 0.26 \quad\Longrightarrow\quad x = 75 + (16)(0.26) = 75 + 4.16 = 79.16$$

Estimation

Statistical estimation is the process of estimating a parameter of a population from the corresponding sample statistic. Example: population means (μ) are usually unknown and have to be estimated from sample means (X-bar).

Two Approaches to Statistical Estimation

Point estimate: A single value that represents the best estimate of the population value. For example, the sample mean (X-bar) is the best point estimate for the population mean (μ).
Similarly, the sample standard deviation (s) is the best point estimate for the population standard deviation (σ). That is, μ is estimated by X-bar and σ is estimated by s.

Interval estimation: Builds on the point estimate to arrive at a range of values that we are confident contains the population parameter. The range of values is called a confidence interval. For example, the confidence interval for the population mean ($\mu_{LL} \le \mu \le \mu_{UL}$) can be estimated from the sample mean.

[Figure: X-bar at the center, with μLL to its left and μUL to its right]

Note that μLL and μUL are equidistant from X-bar and are estimated from X-bar.

Distribution of X-bar

X-bar is a random variable, because different samples drawn from the same population on a specific characteristic will result in different values of X-bar.

Since the sample mean, X-bar, is used to estimate the population mean, μ, we need to understand how X-bar behaves. That is, if we observe values of X-bar indefinitely, where will they center and how will they spread out?

For large samples, X-bar is approximately normally distributed regardless of the shape of the sampled population. That is, if we observe values of X-bar indefinitely and plot these values in a graph, we will obtain an approximately normal curve. If the population itself is normal, X-bar is normally distributed for any sample size.

The distribution of X-bar is based on the Central Limit Theorem. The Central Limit Theorem states that when obtaining large samples (generally n > 30) from any population, the sample mean, X-bar, will follow an approximately normal distribution.

The probability distribution of X-bar is called the sampling distribution of X-bar.

Sampling Distribution of the Sample Mean

The mean of the distribution of X-bar is denoted by $\mu_{\bar{X}}$ and equals μ:

$$\mu_{\bar{X}} = \mu$$

The standard deviation of the distribution of X-bar is denoted by $\sigma_{\bar{X}}$ and equals σ divided by the square root of n:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

The standard deviation of the distribution of X-bar is called the standard error.

[Figure: normal curve of X-bar, centered at $\mu_{\bar{X}} = \mu$ with standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$]

X-bar follows a normal distribution, centered at μ with a standard deviation of $\sigma/\sqrt{n}$.

The corresponding standard normal variable Z of X-bar can be obtained by the following.
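$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

Normal Curves for Population and Sample Mean

Population (mean = μ, standard deviation = σ): X is a value from this population. This assumes the individual observations follow a normal distribution.

Random sample (mean = X-bar, standard deviation = s): X-bar follows a normal distribution, centered at μ with a standard deviation of $\sigma/\sqrt{n}$.

[Figure: two normal curves, one for X centered at μ with standard deviation σ, and a narrower one for X-bar centered at $\mu_{\bar{X}} = \mu$ with standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$]

Example of Normal Population and Sampling Distribution of Mean

The life span of Good Old Everglo Bulbs follows a normal distribution with a mean life of 400 hours and a standard deviation of 30 hours.
a) What percentage of bulbs sold would you expect to last more than 445 hours?
b) What is the probability that 4 bulbs selected at random will have an average life span of more than 445 hours?

[Figure: population curve (μ = 400, σ = 30) with the area P(X > 445) marked, and sampling-distribution curve ($\mu_{\bar{X}} = 400$, $\sigma_{\bar{X}} = 30/\sqrt{4}$) with the area P(X-bar > 445) marked]

a) For a single bulb:

$$Z = \frac{X - \mu}{\sigma} = \frac{445 - 400}{30} = 1.5$$

P(X > 445) = P(Z > 1.5) = 0.5 - 0.4332 = 0.0668

b) For the mean of 4 bulbs:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{445 - 400}{30/\sqrt{4}} = \frac{45}{15} = 3.0$$

P(X-bar > 445) = P(Z > 3.0) = 0.5 - 0.4987 = 0.0013

Confidence Intervals (CI) for Population Mean

According to the distribution of X-bar, the mean of all possible values of X-bar gives the population mean. Then why estimate the population mean? Because in practice we observe only one sample, and therefore only one value of X-bar, not all possible values.

The CI for μ builds on the sample mean to arrive at a range of values that, with a stated level of confidence, includes the population mean. The boundaries of this range are called confidence limits. There are two confidence limits: the lower limit (μLL) and the upper limit (μUL).

[Figure: normal curve with standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, with μLL and μUL marked on either side of X-bar]

Confidence Intervals (CI) for Population Mean

How can we obtain μLL and μUL?

[Figure: normal curve with μLL at $-z_{\alpha/2}$ and μUL at $+z_{\alpha/2}$, with $\sigma_{\bar{X}} = \sigma/\sqrt{n}$]

We know that

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

For μLL:

$$z_{\alpha/2} = \frac{\bar{X} - \mu_{LL}}{\sigma/\sqrt{n}} \quad\Longrightarrow\quad \mu_{LL} = \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

For μUL:

$$z_{\alpha/2} = \frac{\mu_{UL} - \bar{X}}{\sigma/\sqrt{n}} \quad\Longrightarrow\quad \mu_{UL} = \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Therefore,

$$\text{CI for } \mu = \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Confidence Intervals (CI) for Population Mean

How to obtain Z values?

The values (-z) and (+z) are equidistant from the center of the curve. The area from (-z) to (+z) is called the confidence level (CL). The significance level equals (1 - CL) and is denoted by α (alpha).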
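The link between the confidence level, α, and the z value can be sketched in Python. This is a minimal illustration rather than part of the original slides: the function name `z_critical` is hypothetical, and the standard library's `statistics.NormalDist` is used in place of a printed Normal Distribution Table.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1.0 - confidence_level  # significance level
    # The area to the left of +z is the confidence level plus the lower tail,
    # that is, 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for cl in (0.90, 0.95, 0.99):
    print(f"CL = {cl:.2f}, alpha = {1 - cl:.2f}, z = {z_critical(cl):.3f}")
```

For CL = 0.95 this gives a z value of about 1.960, and for CL = 0.90 about 1.645. A table lookup rounded to two decimals may report the latter as 1.64 or 1.65.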
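We can obtain Z values if we know either the significance level or the confidence level.

[Figure: normal curve with μLL at $-z_{\alpha/2}$ and μUL at $+z_{\alpha/2}$; the central area is the confidence level and the two tails together are the significance level]

Confidence Level + Significance Level = 1

To obtain the Z value, we need to know the area from the center of the curve to the Z value. This area equals CL/2. Use the Normal Distribution Table to obtain the Z value.

When the population standard deviation, σ, is known, the standardized value of X-bar follows the standard normal (Z) distribution. Therefore, we use the following to calculate the CI for the population mean when σ is known:

$$\text{CI for } \mu = \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Examples on CI for Population Mean When σ Is Known

A random sample of 100 observations is obtained from a normally distributed population with a standard deviation of 10. What is a 95% confidence interval for the mean of the population if the sample mean is 40?

The confidence level is 0.95, so the area from the center to $z_{\alpha/2}$ is 0.475 and $z_{\alpha/2} = 1.96$.

X-bar = 40, n = 100, σ = 10, $z_{\alpha/2}$ = 1.96

$$\text{95\% CI for } \mu = \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 40 \pm 1.96\,\frac{10}{\sqrt{100}} = (38.04,\ 41.96)$$

Examples on CI for Population Mean When σ Is Known

Find the 90% confidence interval for the mean of a normally distributed population using the following data. Assume a standard deviation of 5.

49, 50, 43, 65, 52, 45, 60, 38, 62

The confidence level is 0.90, so the area from the center to $z_{\alpha/2}$ is 0.45 and $z_{\alpha/2} = 1.65$.

X-bar = 464/9 = 51.56, n = 9, σ = 5, $z_{\alpha/2}$ = 1.65

$$\text{90\% CI for } \mu = \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 51.56 \pm 1.65\,\frac{5}{\sqrt{9}} = (48.81,\ 54.31)$$

CI for Population Mean When σ Is Unknown

When σ is unknown, (1) the standardized value of X-bar follows a t distribution with n - 1 degrees of freedom instead of the Z distribution, and (2) σ is estimated by the sample standard deviation, s, so the standard error is estimated by

$$s_{\bar{X}} = \frac{s}{\sqrt{n}}$$

[Figure: t curve with μLL at $-t_{\alpha/2,\,n-1}$ and μUL at $+t_{\alpha/2,\,n-1}$]

We know that

$$t_{\alpha/2,\,n-1} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

For μLL:

$$t_{\alpha/2,\,n-1} = \frac{\bar{X} - \mu_{LL}}{s/\sqrt{n}} \quad\Longrightarrow\quad \mu_{LL} = \bar{X} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

For μUL:

$$t_{\alpha/2,\,n-1} = \frac{\mu_{UL} - \bar{X}}{s/\sqrt{n}} \quad\Longrightarrow\quad \mu_{UL} = \bar{X} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

Therefore,

$$\text{CI for } \mu = \bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

CI for Population Mean When σ Is Unknown (Cont.)

However, when the sample size is large (n ≥ 30), t values get closer to z values. Also, not all t values are available in the t table when the degrees of freedom exceed 30.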
Therefore, for convenience's sake, when n ≥ 30 and σ is unknown, we use the z distribution instead. That is,

$$\text{CI for } \mu = \bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \qquad (\sigma \text{ is unknown and } n < 30)$$

$$\text{CI for } \mu = \bar{X} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} \qquad (\sigma \text{ is unknown and } n \ge 30)$$

How to obtain t values?

We need two parameters:
(1) The area to the right of the t value (α).
(2) The degrees of freedom = n - 1.

[Figure: t curve with the area α to the right of $t_{\alpha}$]

Examples on How to Obtain t Values

For a t distribution with 20 degrees of freedom, what is the t value such that the following are true?

10% of the area under the t distribution is to the right of the t value:
$t_{0.10,\,20} = 1.325$

90% of the area under the t distribution is to the right of the t value (that is, 10% is to the left):
$t_{0.90,\,20} = -t_{0.10,\,20} = -1.325$

5% of the area under the t distribution is to the left of the t value:
$-t_{0.05,\,20} = -1.725$

Examples on CI for Population Mean When σ Is Unknown

A random sample of size 20 is selected from a normally distributed population. The sample mean is 50 and the sample standard deviation is 10. Find a 90% confidence interval for the population mean.

The confidence level is 0.90, so α/2 = 0.05 in each tail and $t_{0.05,\,19} = 1.729$.

X-bar = 50, n = 20, s = 10, $t_{\alpha/2,\,n-1}$ = 1.729

$$\text{90\% CI for } \mu = \bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} = 50 \pm 1.729\,\frac{10}{\sqrt{20}} = (46.13,\ 53.87)$$

Examples on CI for Population Mean When σ Is Unknown

Find the 95% confidence interval for the mean of a normally distributed population using the following data.
49, 50, 43, 65, 52, 45, 60, 38, 62

The confidence level is 0.95, so α/2 = 0.025 in each tail and $t_{0.025,\,8} = 2.306$.

The sample standard deviation is computed from the squared deviations:

    X     X-bar    (X - X-bar)^2
    49    51.56        6.55
    50    51.56        2.43
    43    51.56       73.27
    65    51.56      180.63
    52    51.56        0.19
    45    51.56       43.03
    60    51.56       71.23
    38    51.56      183.87
    62    51.56      108.99
                     ------
    Sum              670.22

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{670.22}{8} = 83.78, \qquad s = \sqrt{s^2} = \sqrt{83.78} = 9.15$$

X-bar = 464/9 = 51.56, n = 9, s = 9.15, $t_{\alpha/2,\,n-1}$ = 2.306

$$\text{95\% CI for } \mu = \bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} = 51.56 \pm 2.306\,\frac{9.15}{\sqrt{9}} = (44.53,\ 58.59)$$

Margin of Error, E, and Determination of the Sample Size

The general formula for constructing a CI is:

CI = statistic ± (critical value) × (standard error of the statistic)

For the population mean:

$$\text{CI for } \mu = \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Equivalently, CI = statistic ± (margin of error), where the margin of error is

$$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Solving for the sample size:

$$\sqrt{n} = \frac{z_{\alpha/2}\,\sigma}{E} \quad\Longrightarrow\quad n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$

The result is rounded up to the next whole number.

Sample Size for Unknown σ:
(1) σ is estimated by s.
(2) σ is approximated by (H - L)/4, where H and L are the highest and lowest values.

Examples on Determination of the Sample Size

A national retail association wants to estimate the average amount of dollars lost each month due to theft in its member stores. Past records show that the highest and lowest dollar amounts lost due to theft were $1325 and $25, respectively. If it wants to be 95% confident that the error in its estimate is no more than $100, how many stores would need to be included in the sample to produce an estimate of the desired accuracy?

The confidence level is 0.95, so the area from the center to $z_{\alpha/2}$ is 0.475 and $z_{\alpha/2} = 1.96$.

n = ?, E = 100, $z_{\alpha/2}$ = 1.96, σ ≈ (H - L)/4 = (1325 - 25)/4 = 325

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{(1.96)(325)}{100}\right)^2 = 40.58 \approx 41$$

Examples on Determination of the Sample Size

A national retail association wants to estimate the average amount of dollars lost each month due to theft in its member stores. Nine of its member stores lost the following dollar amounts last month. If it wants to be 95% confident that the error in its estimate is no more than $5, how many stores would need to be included in the sample to produce an estimate of the desired accuracy?

49, 50, 43, 65, 52, 45, 60, 38, 62

n = ?, E = 5, $z_{\alpha/2}$ = 1.96, σ = ?
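Since σ is unknown, it is estimated by the sample standard deviation, s:

    X     X-bar    (X - X-bar)^2
    49    51.56        6.55
    50    51.56        2.43
    43    51.56       73.27
    65    51.56      180.63
    52    51.56        0.19
    45    51.56       43.03
    60    51.56       71.23
    38    51.56      183.87
    62    51.56      108.99
                     ------
    Sum              670.22

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{670.22}{8} = 83.78, \qquad s = \sqrt{s^2} = \sqrt{83.78} = 9.15$$

σ ≈ s = 9.15

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{(1.96)(9.15)}{5}\right)^2 = 12.86 \approx 13$$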
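The two sample-size examples can be checked with a short Python sketch. This is an illustration under stated assumptions, not part of the original slides: the function name `required_sample_size` and the variable names are hypothetical, and the z value is computed with the standard library's `statistics.NormalDist` instead of being read from a table (so it is 1.95996... rather than the rounded 1.96).

```python
import math
from statistics import NormalDist, stdev


def required_sample_size(sigma, margin_of_error, confidence_level=0.95):
    """Smallest whole n for which z * sigma / sqrt(n) <= margin_of_error."""
    # z_{alpha/2}: the area to its left is 1 - alpha/2.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    # n = (z * sigma / E)^2, rounded up to the next whole number.
    return math.ceil((z * sigma / margin_of_error) ** 2)


# Example 1: sigma approximated by (H - L) / 4 from the highest and lowest values.
sigma_from_range = (1325 - 25) / 4
n_range = required_sample_size(sigma_from_range, margin_of_error=100)

# Example 2: sigma estimated by the sample standard deviation s (n - 1 divisor).
losses = [49, 50, 43, 65, 52, 45, 60, 38, 62]
s = stdev(losses)
n_sample = required_sample_size(s, margin_of_error=5)

print(n_range, round(s, 2), n_sample)
```

Run as written, this prints `41 9.15 13`, matching the hand calculations above. Rounding up rather than to the nearest integer is deliberate: a smaller n would give a margin of error larger than the target E.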