Sampling: Distribution of the Sample Mean (Sigma Known)
- Suppose a population follows the normal distribution.
- The population is represented by \(X_1, X_2, \dots, X_N\).
- Each \(X\) is normally distributed with mean \(\mu\) and standard deviation \(\sigma\): \(X \sim \text{Norm}(\mu, \sigma) \Rightarrow X_1 \sim \text{Norm}(\mu, \sigma),\ X_2 \sim \text{Norm}(\mu, \sigma), \dots\)
- The sample mean is \(\bar{X} = (X_1 + X_2 + \dots + X_n)/n\), where \(n < N\).
- The distribution of the sample mean is \(\bar{X} \sim \text{Norm}(\mu, \sigma/\sqrt{n})\).

Sampling: Distribution of the Sample Mean (Sigma Known)
- If a population follows the normal distribution, the sampling distribution of the sample mean also follows the normal distribution.
- To determine the probability that a sample mean falls within a particular region, use:
\[ z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]
Note that \(\sigma/\sqrt{n}\) is called the standard error of the mean.

Sampling: Distribution of the Sample Mean (Sigma Unknown)
- If the population does not follow the normal distribution but the sample has at least 30 observations, the sample means will approximately follow the normal distribution (central limit theorem).
- To determine the probability that a sample mean falls within a particular region, use:
\[ z = \frac{\bar{X} - \mu}{s/\sqrt{n}} \]

Example: Suppose that \(X\) has a distribution with \(\mu = 15\) and \(\sigma = 14\).
a) If a random sample of size \(n = 49\) is drawn, find \(P(15 \le \bar{X} \le 17)\).
b) If a random sample of size \(n = 64\) is drawn, find \(P(15 \le \bar{X} \le 17)\).
c) Why should you expect the probability in part b to be higher than in part a?

a) \(\mu_{\bar{x}} = 15\), \(\sigma_{\bar{x}} = 14/\sqrt{49} = 2\).
When \(\bar{X} = 15\), \(z = 0\) (since the mean is 15).
When \(\bar{X} = 17\), \(z = (17 - 15)/2 = 1\).
\(P(15 \le \bar{X} \le 17) = P(0 \le z \le 1) = 0.3413\)

b) \(\mu_{\bar{x}} = 15\), \(\sigma_{\bar{x}} = 14/\sqrt{64} = 1.75\).
When \(\bar{X} = 15\), \(z = 0\).
When \(\bar{X} = 17\), \(z = (17 - 15)/1.75 = 1.14\).
\(P(15 \le \bar{X} \le 17) = P(0 \le z \le 1.14) = 0.3729\)

c) In a larger sample the sample mean has less variability (a smaller standard error). Therefore, increasing the sample size increases the probability that the sample mean is close to the population mean.

Point Estimate
Definition: A statistic, computed from sample information, that is used to estimate a population parameter.
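As a rough cross-check of parts a) and b), the same probabilities can be computed with Python's standard-library `statistics.NormalDist` instead of Appendix B.1. This is a minimal sketch; the helper name `prob_mean_between` is hypothetical, and the result for part b) differs slightly from the table value because the table rounds z to two decimals.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_between(mu: float, sigma: float, n: int, lo: float, hi: float) -> float:
    """P(lo <= sample mean <= hi), using sample mean ~ Norm(mu, sigma / sqrt(n))."""
    standard_error = sigma / sqrt(n)
    sampling_dist = NormalDist(mu, standard_error)
    return sampling_dist.cdf(hi) - sampling_dist.cdf(lo)


# Part a): n = 49, standard error = 14 / 7 = 2
p_a = prob_mean_between(15, 14, 49, 15, 17)
# Part b): n = 64, standard error = 14 / 8 = 1.75
p_b = prob_mean_between(15, 14, 64, 15, 17)
print(round(p_a, 4), round(p_b, 4))
```

In part b) the unrounded z is 2/1.75 (about 1.1429), so the computed probability comes out a little above the table value 0.3729 read at z = 1.14; the conclusion of part c) (p_b > p_a) is unchanged.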
Examples:
- The sample mean \(\bar{X}\) is a point estimate of the population mean \(\mu\).
- The sample standard deviation \(s\) is a point estimate of the population standard deviation \(\sigma\).
- The sample proportion \(p\) is a point estimate of the population proportion \(\pi\).

Confidence Interval
Definition: A range of values constructed from sample data so that the population parameter is likely to occur within that range at a specified probability. The specified probability is called the level of confidence.
Example: We are 90% sure that the mean yearly income of construction workers in the New York area is between $61,000 and $69,000.

Confidence Interval (Sigma Known)
Suppose a population has a normal distribution (or the sample size is at least 30) with known standard deviation \(\sigma\), but the population mean \(\mu\) is unknown. If the mean of a sample of size \(n\) is \(\bar{X}\), then we can say with a K% level of confidence that \(\mu\) falls within the interval
\[ \bar{X} \pm z\frac{\sigma}{\sqrt{n}}, \quad \text{where } P\left(\bar{X} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z\frac{\sigma}{\sqrt{n}}\right) = K. \]

Confidence Interval (Sigma Known)
The CI is given by
\[ \left[\bar{X} - z\frac{\sigma}{\sqrt{n}},\ \bar{X} + z\frac{\sigma}{\sqrt{n}}\right] \]
where \(\bar{X}\), \(\sigma\), and \(n\) are known. We can obtain \(z\) from Appendix B.1 by looking for the value of \(z\) that satisfies
(Area from 0 to z) = K/2 = (level of confidence)/2.

Confidence Interval (Sigma Known)
Proof:
\[ P\left(\bar{X} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z\frac{\sigma}{\sqrt{n}}\right) = K \]
\[ \iff P\left(-z\frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le z\frac{\sigma}{\sqrt{n}}\right) = K \]
\[ \iff P\left(-z \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z\right) = K \]
\[ \iff 2P(0 \le Z \le z) = K \quad \text{(by the symmetry of the standard normal variable } Z = \tfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}\text{)} \]
\[ \iff P(0 \le Z \le z) = K/2 \]

Confidence Interval (Sigma Known)
Example (ex2, page 301): A sample of 81 observations is taken from a normal distribution with a SD of 5, and the sample mean is 40. Determine the 95% CI.
Use \(\bar{X} \pm z\frac{\sigma}{\sqrt{n}}\) with \(\bar{X} = 40\), \(\sigma = 5\), and \(n = 81\).
(Area from 0 to z) = K/2 = 0.95/2 = 0.475. Using Appendix B.1, (Area from 0 to 1.96) = 0.475, so \(z = 1.96\).
\[ \text{CI} = \left[40 - 1.96\frac{5}{\sqrt{81}},\ 40 + 1.96\frac{5}{\sqrt{81}}\right] = [38.911,\ 41.089] \]

Confidence Interval
In the expression \(\bar{X} \pm z\frac{\sigma}{\sqrt{n}}\):
- 95% of the sample means selected from a population will be within 1.96 standard errors (\(\sigma/\sqrt{n}\)) of the population mean \(\mu\). (The z value for a confidence level of 95% is 1.96.)
- 99% of the sample means will lie within 2.58 standard errors of the population mean.
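The recipe above (find z with area K/2 between 0 and z, then form \(\bar{X} \pm z\sigma/\sqrt{n}\)) can be sketched in Python. Instead of Appendix B.1, this sketch assumes the standard-library `statistics.NormalDist.inv_cdf`: the area from 0 to z equals K/2 exactly when the area to the left of z equals 0.5 + K/2. The function names `z_critical` and `z_interval` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """z such that the area from 0 to z under the standard normal is confidence / 2."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def z_interval(x_bar: float, sigma: float, n: int, confidence: float) -> tuple[float, float]:
    """CI for mu when sigma is known: x_bar +/- z * sigma / sqrt(n)."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Example ex2: x_bar = 40, sigma = 5, n = 81, 95% confidence
low, high = z_interval(40, 5, 81, 0.95)
print(f"[{low:.3f}, {high:.3f}]")  # [38.911, 41.089]
```

Because `inv_cdf` returns the unrounded z (about 1.95996), the endpoints agree with the hand calculation to three decimals.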
(The z value for a confidence level of 99% is 2.58.)

Confidence Interval
How did we get 1.96 and 2.58 for the 95% and 99% confidence intervals?
- For the 95% CI: the probability area is 0.95/2 = 0.475. In Appendix B.1, the z value for 0.475 is 1.96.
- Use the same reasoning for the 99% CI: the probability area is 0.99/2 = 0.495. In Appendix B.1, the z value for 0.495 is 2.58.

Confidence Interval
Example: We take a sample (n > 30) of recent college graduates and compute the sample mean annual starting salary. The mean is $39,000 and the standard error of the mean is $200. Between what values does the 95% CI lie?
The confidence limits are $39,000 ± 1.96($200), that is, $38,608 and $39,392.

Standard Error Estimation
When the population SD is known: \(\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}\)
When the population SD is unknown: \(s_{\bar{X}} = \dfrac{s}{\sqrt{n}}\)

Confidence Interval for the Population Mean if SD \(\sigma\) Is Known or \(n \ge 30\)
- \(\sigma\) known: \(\bar{X} \pm z\dfrac{\sigma}{\sqrt{n}}\)
- \(\sigma\) unknown and \(n \ge 30\): \(\bar{X} \pm z\dfrac{s}{\sqrt{n}}\)

Unknown Population SD & a Small Sample
This situation is not covered by the central limit theorem. However, if we can reason that the population is normal, or reasonably close to a normal distribution, we replace the standard normal distribution with the t distribution. The t distribution is a continuous distribution with many similarities to the standard normal distribution.

The Student's t Distribution
- Developed in the early 1900s by William Gosset.
- Also called "Student's" distribution.
- Gosset was concerned with the behavior of the term \(t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}\).
- Gosset was worried about the discrepancy between \(s\) and \(\sigma\) when \(s\) was calculated from a very small sample.

Characteristics of the t Distribution
- Like the z distribution, it is a continuous distribution.
- Like the z distribution, it is bell-shaped and symmetrical.
- There is a family of t distributions. They all have a mean of 0, but their SDs differ according to the sample size (the SD is larger when n is smaller).
- It is more spread out and flatter at the center than the standard normal distribution.
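The table lookup described above ("find the z whose area from 0 to z is 0.475") can be mimicked in Python. This sketch is an assumption about how the table is read: it computes the tabulated area with `math.erf`, scans hypothetical table rows z = 0.00, 0.01, ..., 3.99, and returns the z whose area is closest to K/2. The function names are hypothetical.

```python
from math import erf, sqrt


def area_0_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and z (the quantity in Appendix B.1)."""
    return 0.5 * erf(z / sqrt(2))


def table_lookup_z(confidence: float) -> float:
    """Among z = 0.00, 0.01, ..., 3.99, return the z whose area from 0 to z
    is closest to confidence / 2, as when reading the table by hand."""
    target = confidence / 2
    candidates = [k / 100 for k in range(400)]
    return min(candidates, key=lambda z: abs(area_0_to_z(z) - target))


print(table_lookup_z(0.95), table_lookup_z(0.99))  # 1.96 2.58
```

For 99% the target area 0.495 falls between the rows for 2.57 and 2.58; 2.58 is slightly closer, which is why the slides use 2.58.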
- As the sample size increases, the t distribution becomes closer to the standard normal distribution, because the error in using \(s\) to estimate \(\sigma\) decreases with larger samples.

Confidence Interval for the Population Mean, \(\sigma\) Unknown
\[ \bar{X} \pm t\frac{s}{\sqrt{n}} \]

When to Use the t Distribution
Is the population normal?
- No: Is n 30 or more?
  - No: use a nonparametric test.
  - Yes: use the z distribution.
- Yes: Is the population SD known?
  - No: use the t distribution.
  - Yes: use the z distribution.

Example: A tire manufacturer would like to investigate the tread life of its tires. A sample of 10 tires driven 50,000 miles revealed a sample mean of 0.32 inch of tread remaining, with a SD of 0.09 inch. Construct a 95% CI for the population mean.
We assume the population distribution is normal. Since n = 10 is small and \(\sigma\) is unknown, we use the formula \(\bar{X} \pm t\frac{s}{\sqrt{n}}\).
To find the value of t, we use Appendix B.2: locate the 95% CI column and move down to df = 9 (10 - 1). The value in the cell is 2.262. Substituting the values into the formula:
\[ 0.32 \pm 2.262\frac{0.09}{\sqrt{10}} = 0.32 \pm 0.064 \Rightarrow (0.256,\ 0.384) \]
It is reasonable to conclude that the population mean is in this interval: the manufacturer can be 95% confident that the mean remaining tread depth is between 0.256 and 0.384 inches.

Appendix B.2 (page 785), excerpt for degrees of freedom df = n - 1 = 10 - 1 = 9:
Confidence interval: 90% | 95% | 98%
t (df = 9): 1.833 | 2.262 | 2.821

Example (ex12, page 309): The ASPA wants to estimate the mean yearly sugar consumption. A sample of 16 people reveals a mean yearly consumption of 60 pounds with a standard deviation of 20 pounds.
a) What is the value of the population mean? What is the best estimate of this value?
The population mean is unknown, but the best estimate is 60, the sample mean.
b) Explain why we need to use the t distribution. What assumption do we need to make?
We use the t distribution because the population standard deviation is unknown and the sample size is small. However, we must assume that the population is normally distributed.
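The flowchart and the tire example can be sketched together in Python. Python's standard library has no t-distribution quantile function, so this sketch assumes the t value is read from a table such as Appendix B.2 and passed in; the function names `choose_distribution` and `t_interval` are hypothetical.

```python
from math import sqrt


def choose_distribution(population_normal: bool, sigma_known: bool, n: int) -> str:
    """Follow the 'When to Use the t Distribution' flowchart."""
    if not population_normal:
        # Non-normal population: rely on the CLT only when n >= 30
        return "z" if n >= 30 else "nonparametric"
    # Normal population: z if sigma is known, otherwise t
    return "z" if sigma_known else "t"


def t_interval(x_bar: float, s: float, n: int, t_value: float) -> tuple[float, float]:
    """x_bar +/- t * s / sqrt(n); t_value comes from a t table with df = n - 1."""
    margin = t_value * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Tire example: population assumed normal, sigma unknown, n = 10
print(choose_distribution(population_normal=True, sigma_known=False, n=10))  # t
low, high = t_interval(0.32, 0.09, 10, t_value=2.262)  # df = 9, 95% column
print(f"({low:.3f}, {high:.3f})")  # (0.256, 0.384)
```

Passing `t_value` explicitly keeps the sketch dependency-free; with a statistics library available, the table lookup could be replaced by a t quantile function.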
c) For a 90% confidence interval, what is the value of t?
t = 1.753, obtained from Appendix B.2 for a 90% CI and df = 16 - 1 = 15.
d) Develop the 90% confidence interval for the population mean.
Between 51.235 and 68.765, found by \(60 \pm 1.753\frac{20}{\sqrt{16}} = 60 \pm 8.765\).
e) Would it be reasonable to conclude that the population mean is 63 pounds?
Yes. That value is reasonable because it is inside the interval.

Sample Size for Estimating Population Mean
\[ n = \left(\frac{zs}{E}\right)^2 \]
where
- n is the sample size;
- z is the standard normal value corresponding to the desired level of confidence;
- s is an estimate of the population SD;
- E is the maximum allowable error (half the length of the CI).
If the result is not a whole number, round up.

Choosing an Appropriate Sample Size
2. The maximum allowable error E
- It is the amount added to and subtracted from the sample mean to determine the endpoints of the CI.
- It is the amount of error the researchers are willing to tolerate.
- A small allowable error will require a large sample; a large allowable error will permit a smaller sample.

Choosing an Appropriate Sample Size
3. The population SD
- If the population is widely dispersed, a large sample is required.
- If the population is concentrated (homogeneous), the required sample size will be smaller.
- It may be necessary to estimate the population SD.

Example: A student wants to determine the mean monthly earnings of city council members. The error in estimating the mean is to be less than $100 with a 95% level of confidence. The student found a report by the Department of Labor that estimated the SD to be $1,000. What is the required sample size?
E = 100, z = 1.96, and s = 1,000.

Example (cont'd)
The maximum allowable error, E, is $100. The value of z for a 95% level of confidence is 1.96, and the estimate of the SD is $1,000. Substituting these values into the formula:
\[ n = \left(\frac{zs}{E}\right)^2 = \left(\frac{(1.96)(1{,}000)}{100}\right)^2 = (19.6)^2 = 384.16 \]
A sample of 385 is required to meet the specifications.
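The sample-size formula, including the "round up" rule, can be sketched in a few lines of Python. The function name `sample_size_mean` is hypothetical; z, s, and E are supplied by the user exactly as in the formula.

```python
from math import ceil


def sample_size_mean(z: float, s: float, error: float) -> int:
    """n = (z * s / E) ** 2, rounded up to the next whole number."""
    return ceil((z * s / error) ** 2)


print(sample_size_mean(1.96, 1000, 100))  # 385 (95% confidence)
print(sample_size_mean(2.58, 1000, 100))  # 666 (99% confidence)
```

`ceil` implements the "round up" rule; if (zs/E)^2 happened to be an exact whole number, floating-point rounding could in principle push the result up by one, which is harmless for planning purposes.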
Example (cont'd)
What if the student wanted to increase the level of confidence to 99%? The corresponding z value is 2.58.
\[ n = \left(\frac{zs}{E}\right)^2 = \left(\frac{(2.58)(1{,}000)}{100}\right)^2 = (25.8)^2 = 665.64 \]
The recommended sample size is now 666. Notice the change in the required sample size for the different levels of confidence: an increase of 281 observations. This could greatly increase the cost and time of the study, so the level of confidence should be considered carefully.

Proportion
The fraction, ratio, or percent indicating the part of the sample or the population having a particular trait of interest.
Example: A recent survey indicated that 92 out of 100 people surveyed favored the continued use of daylight saving time in the summer. The sample proportion is 92/100, or 0.92, or 92%.

Assumptions for Proportion CI Construction
1. The binomial conditions have been met:
   a. The sample data are the result of counts.
   b. There are only two possible outcomes (success and failure).
   c. The probability of a success remains the same from one trial to the next.
   d. The trials are independent.
2. The values \(n\pi\) and \(n(1-\pi)\) should both be at least 5 (\(\pi\) is the population proportion), so that we can use the CLT (z distribution).

Sample Proportion
\[ p = \frac{X}{n} \]
If \(\pi\) is the population proportion, then \(p\) is a point estimator of \(\pi\).
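The sample proportion and the \(n\pi \ge 5\), \(n(1-\pi) \ge 5\) check can be sketched as below. Since \(\pi\) is usually unknown, this sketch assumes the common practice of plugging in the sample proportion \(p\) for \(\pi\); that substitution, and the function names `sample_proportion` and `normal_approx_ok`, are illustrative assumptions rather than part of the slides.

```python
def sample_proportion(successes: int, n: int) -> float:
    """p = X / n."""
    return successes / n


def normal_approx_ok(n: int, pi: float) -> bool:
    """Check n*pi >= 5 and n*(1 - pi) >= 5 so the z distribution can be used."""
    return n * pi >= 5 and n * (1 - pi) >= 5


# Daylight saving time survey: 92 of 100 in favor
p = sample_proportion(92, 100)
# Assumption: pi is unknown, so the sample proportion p stands in for it
print(p, normal_approx_ok(100, p))  # 0.92 True
```

Here n(1 - p) = 8, so both counts clear the threshold of 5 even though the proportion is close to 1.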
Confidence Interval for a Population Proportion
\[ p \pm z\sigma_p \]

Standard Error of the Sample Proportion
\[ \sigma_p = \sqrt{\frac{p(1-p)}{n}} \]

Confidence Interval for a Population Proportion
\[ p \pm z\sqrt{\frac{p(1-p)}{n}} \]

Example: The union representing ABC company is considering a merger with the Teamsters Union. According to the ABC union bylaws, at least three-fourths of the union membership must approve any merger. A random sample of 2,000 current ABC members reveals that 1,600 plan to vote for the merger proposal. What is the estimate of the population proportion? Develop a 95% confidence interval for the population proportion. Based on this sample information, can you conclude that the necessary proportion of ABC members favor the merger? Why?
The sample size is n = 2,000, and the number that approve the merger is X = 1,600. The sample proportion is p = X/n = 1,600/2,000 = 0.80.
For the 95% CI, the z value is 1.96:
\[ p \pm z\sqrt{\frac{p(1-p)}{n}} = 0.80 \pm 1.96\sqrt{\frac{0.80(1-0.80)}{2{,}000}} = 0.80 \pm 0.018 \]

Example (cont'd)
The endpoints are 0.782 and 0.818. The lower limit is greater than 0.75, so we conclude that the merger proposal will likely pass: the entire interval estimate lies above 75% of the union membership.

Sample Size for the Population Proportion
Three items need to be specified:
1. The desired level of confidence.
2. The margin of error in the population proportion.
3. An estimate of the population proportion.
\[ n = p(1-p)\left(\frac{z}{E}\right)^2 \]
If an estimate of \(\pi\) is not available, use p = 0.5 to estimate the sample size; this choice gives the largest (most conservative) sample size.

Example: A group of students wants to estimate the proportion of cities with subsidized transportation systems. They want the estimate to be within 0.10 of the population proportion, with a 90% level of confidence. No estimate of the population proportion is available. What is the required sample size?
E = 0.10. The level of confidence is 90%, so the corresponding z value is 1.65. No estimate of p is available, so we use 0.50.
\[ n = p(1-p)\left(\frac{z}{E}\right)^2 = (0.5)(1-0.5)\left(\frac{1.65}{0.10}\right)^2 = 68.0625 \]
Round up: a random sample of 69 cities is needed.
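Both proportion calculations (the union CI and the transit-study sample size) can be sketched in Python. The function names `proportion_interval` and `sample_size_proportion` are hypothetical; z is passed in from the table (1.96 and 1.65 as on the slides) rather than computed.

```python
from math import ceil, sqrt


def proportion_interval(successes: int, n: int, z: float) -> tuple[float, float]:
    """p +/- z * sqrt(p(1 - p) / n), with p = successes / n."""
    p = successes / n
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


def sample_size_proportion(z: float, error: float, p: float = 0.5) -> int:
    """n = p(1 - p)(z / E)^2, rounded up; p defaults to 0.5 when no estimate exists."""
    return ceil(p * (1 - p) * (z / error) ** 2)


# Union merger: 1,600 of 2,000 in favor, 95% confidence
low, high = proportion_interval(1600, 2000, z=1.96)
print(f"({low:.3f}, {high:.3f})", low > 0.75)  # (0.782, 0.818) True
# Transit study: E = 0.10, 90% confidence, no prior estimate of the proportion
print(sample_size_proportion(z=1.65, error=0.10))  # 69
```

The `low > 0.75` check mirrors the slide's reasoning: the decision rests on whether the whole interval clears the three-fourths threshold.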