Statistical Inference: Estimation for Single Populations
Chapter 8
MSIS 111, Prof. Nick Dedeke
PowerPoint presentations prepared by Lloyd Jaisingh, Morehead State University

Learning Objectives
• Know the difference between point and interval estimation.
• Estimate a population mean from a sample mean when σ is known.
• Estimate a population mean from a sample mean when σ is unknown.
• Estimate the population variance from a sample variance.
• Estimate the minimum sample size necessary to achieve given statistical goals.

Concept of Inferential Statistics
• In inferential statistics, the objective is to estimate the parameters of a population using the statistics of a smaller sample drawn from it.
[Diagram: known sample statistics (sample mean, sample variance, z-value) are used to estimate unknown population parameters (population mean, population variance).]

Example: Concept of Estimation
• Three managers wanted to investigate absenteeism in their organization. Each of them took a random sample of 2,000 employees. Here are the results:
– Bill's sample yielded an average of 4 days per year.
– Chen's sample yielded an average of 3.2 days per year.
– Ayo's sample yielded an average of 3.7 days per year.
What should we accept as the average absenteeism for all 10,000 employees of the firm?

Concept of Confidence Level
• After one has specified an interval, the question becomes: how confident can one be that the population parameter truly lies in the range we define?
• The central limit theorem helps answer this question.
• The central limit theorem states that, given a sufficiently large sample size, the distribution of the sample means is approximately normal, centered on the population mean μ with standard deviation $\sigma/\sqrt{n}$.

Confidence Level
[Figure: distribution of the means of all samples drawn from the population, with three sample means $\bar{x}$ and their intervals $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ marked on the z axis.]
If we picked three different samples and calculated the sample means and intervals, we could obtain the three intervals in the figure. In this case, all three intervals, which have the same width, include the population mean.
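The central limit theorem and the "three intervals" picture can be made concrete with a small simulation. This is a minimal Python sketch using only the standard library; the exponential population with mean 3.6 days, the sample size of 50, and the function name `simulate_sample_means` are hypothetical choices for illustration, not values from the slides. It draws many samples from a right-skewed population and checks that the sample means cluster around μ with spread close to $\sigma/\sqrt{n}$, and that roughly 95% of the intervals $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ contain μ.

```python
import math
import random
import statistics


def simulate_sample_means(pop_mean=3.6, n=50, trials=2000, z=1.96, seed=1):
    """Repeatedly sample from a right-skewed population and summarize the sample means.

    Assumption (hypothetical): absence days follow an exponential distribution,
    so the population standard deviation equals the population mean.
    Returns (mean of the sample means, sd of the sample means,
    share of intervals x_bar +/- z * sigma / sqrt(n) that contain the true mean).
    """
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    sigma = pop_mean                   # exponential distribution: sd == mean
    std_error = sigma / math.sqrt(n)   # sd of the sampling distribution of x_bar
    means = []
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(1 / pop_mean) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        means.append(x_bar)
        # Does this sample's interval capture the population mean?
        if x_bar - z * std_error <= pop_mean <= x_bar + z * std_error:
            hits += 1
    return statistics.fmean(means), statistics.stdev(means), hits / trials


mean_of_means, sd_of_means, coverage = simulate_sample_means()
print(f"mean of sample means: {mean_of_means:.3f} (population mean 3.6)")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {3.6 / math.sqrt(50):.3f})")
print(f"share of 95% intervals containing the mean: {coverage:.3f}")
```

Even though the population itself is skewed, the sample means behave approximately normally, which is what justifies using z critical values for the intervals.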
Confidence Level
The 95% confidence limits are set so that the area between the mean and each limit is 0.95/2 = 0.475; the shaded area between the two limits is therefore 0.95.
[Figure: sampling distribution of $\bar{x}$ with the central 95% shaded between $-z_{\alpha/2}$ and $z_{\alpha/2}$.]
A 95% confidence level means that if one took many different samples from the population and built an interval around each sample mean in this way, about 95 out of 100 of those intervals would contain the population mean. Equivalently, about 95 out of 100 sample means would fall within $z_{\alpha/2}$ standard errors of μ.

Confidence Level and Interval Estimates
[Figure: 40%, 60%, and 95% confidence interval lines drawn around the same sample mean $\bar{x}$.]
The three intervals have different widths. Specifically, a higher confidence level requires a wider interval estimate. Narrower interval estimates reduce our confidence that the population mean lies in the interval.

Known Population Standard Deviation
• The figure presents two samples taken from the same population. In the first case the sample mean is higher than the population mean; in the other case it is lower.
[Figure: two sample means $\bar{x}$, each with spread s, on either side of the population mean μ, with the sample minimum and maximum ($x_{min}$, $x_{max}$) marked.]

Confidence Interval Estimates
• The interval estimate approach defines upper and lower limits around the sample mean using a chosen confidence level. If a hypothesized (acceptable) population mean falls within the limits, it is accepted as plausible; if not, it is rejected.
[Figure: confidence intervals #1 and #2 drawn around the two sample means, relative to the population mean μ.]

Inferential Statistics Assumptions
• For inferential statistics to be accurate, some assumptions must be fulfilled:
– The process that the objects or entities passed through is stable, i.e., the variations in attribute observations are not due to special causes.
– The sample is drawn randomly from the population.
– The sample is large enough to represent the population.
– The distribution of values for the attribute in the sample and population can be assumed to be normal.
– Statistical estimates about the population can reasonably be used as a basis for decision-making.

Statistical Estimation
• Point estimate: the single value of a statistic, calculated from a sample, that is used to estimate a population parameter.
• Interval estimate: a range of values calculated from a sample statistic(s) and a standardized statistic, such as z.
– Selection of the standardized statistic is determined by the sampling distribution.
– Selection of critical values of the standardized statistic is determined by the desired level of confidence.

Concept of Inferential Statistics
• The z statistic can be used if the sample mean and the population standard deviation σ are known.
[Diagram: known sample statistics (sample mean, sample variance, z-value) and the known population standard deviation are used to estimate the unknown population mean.]

Confidence Interval Estimate for μ when σ is Known
• Point estimate: $\bar{x}$
• Interval estimate: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, or equivalently $\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

Distribution of Sample Means for (1 − α)% Confidence
[Figure: normal curve of $\bar{x}$ with area α/2 in each tail beyond $-z_{\alpha/2}$ and $z_{\alpha/2}$. Each half of the curve has area .5, so the area between 0 and $z_{\alpha/2}$ is $(1-\alpha)/2$ and the central area is $1-\alpha$.]

Distribution of Sample Means for 95% Confidence
[Figure: central area 95% (.4750 on each side of the mean), tail areas .025 each, critical values z = −1.96 and z = 1.96.]

Example: 95% Confidence Interval for μ (σ known)
$\bar{x} = 510,\ \sigma = 46,\ n = 85,\ z_{\alpha/2} = 1.96$
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
$510 - 1.96\frac{46}{\sqrt{85}} \le \mu \le 510 + 1.96\frac{46}{\sqrt{85}}$
$510 - 9.78 \le \mu \le 510 + 9.78$
$500.22 \le \mu \le 519.78$

95% Confidence Intervals for μ
[Figure: 95% confidence intervals built around many different sample means $\bar{x}$, some highlighted in red.]
Is our interval, $500.22 \le \mu \le 519.78$, in the red?

Example: Interval Estimates, 90% Confidence (Text 8.1)
$\bar{x} = 10.455,\ \sigma = 7.7,\ n = 44$; for 90% confidence, $z_{\alpha/2} = 1.645$
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
$10.455 - 1.645\frac{7.7}{\sqrt{44}} \le \mu \le 10.455 + 1.645\frac{7.7}{\sqrt{44}}$
$10.455 - 1.910 \le \mu \le 10.455 + 1.910$
$8.545 \le \mu \le 12.365$

Concept of Inferential Statistics
• The z statistic with σ cannot be used directly if the population standard deviation is unknown. However, if the sample size exceeds 30, the sample standard deviation s can be used in place of σ, even if the distribution is not normal.
• In that case, we can still estimate the unknown population mean.
[Diagram: known sample statistics (sample mean, sample variance, z-value) are used to estimate the unknown population mean and population standard deviation.]

Confidence Interval to Estimate μ when σ is Unknown (large sample)
$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$, or $\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}$

Car Rental Firm Example
$\bar{x} = 85.5$, sample standard deviation $s = 19.3$, $n = 110$; for 99% confidence, $z_{\alpha/2} = 2.575$
$85.5 - 2.575\frac{19.3}{\sqrt{110}} \le \mu \le 85.5 + 2.575\frac{19.3}{\sqrt{110}}$
$85.5 - 4.7 \le \mu \le 85.5 + 4.7$
$80.8 \le \mu \le 90.2$

Exercise: Derive z Values for Common Levels of Confidence
Confidence Level   $P(0 \le z \le z_{\alpha/2})$   $z_{\alpha/2}$ Value
90%                ?                               ?
95%                0.475                           1.96
98%                ?                               ?
99%                ?                               ?
For 95% confidence: $P(0 \le z \le z_{\alpha/2}) = 0.5 - (1 - 0.95)/2 = 0.5 - 0.025 = 0.475$. From Table A.5 (page 788), $z_{\alpha/2} = 1.96$.

Estimating the Mean of a Normal Population: Unknown σ
• The population has a normal distribution.
• The value of the population standard deviation is unknown.
• The z distribution is not appropriate for these conditions.
• The t distribution is appropriate.

The t Distribution
• Developed by British statistician William Gosset
• A family of distributions: a unique distribution for each value of its parameter, degrees of freedom (d.f.)
• Symmetric, unimodal, mean = 0, flatter than z (with heavier tails)
• t formula: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

Comparison of Selected t Distributions to the Standard Normal
[Figure: standard normal curve overlaid with t curves for d.f. = 25, 5, and 1, plotted from −3 to 3.]

Table of Critical Values of t (column headings give the upper-tail area α)
df     t0.100   t0.050   t0.025   t0.010   t0.005
1      3.078    6.314    12.706   31.821   63.656
2      1.886    2.920    4.303    6.965    9.925
3      1.638    2.353    3.182    4.541    5.841
4      1.533    2.132    2.776    3.747    4.604
5      1.476    2.015    2.571    3.365    4.032
...
23     1.319    1.714    2.069    2.500    2.807
24     1.318    1.711    2.064    2.492    2.797
25     1.316    1.708    2.060    2.485    2.787
...
29     1.311    1.699    2.045    2.462    2.756
30     1.310    1.697    2.042    2.457    2.750
40     1.303    1.684    2.021    2.423    2.704
60     1.296    1.671    2.000    2.390    2.660
120    1.289    1.658    1.980    2.358    2.617
∞      1.282    1.645    1.960    2.327    2.576

With df = 24 and α = 0.05, t = 1.711.
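The σ-known examples and the large-sample car rental example all use the same formula, $\bar{x} \pm z_{\alpha/2}\cdot(\text{standard deviation})/\sqrt{n}$, and the z values in the exercise come from the standard normal distribution. A minimal Python sketch, assuming only the standard library (`statistics.NormalDist`); the function names `z_critical` and `z_interval` are illustrative, not from the text.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a confidence level such as 0.95."""
    alpha = 1 - confidence
    # Area alpha/2 in the upper tail means a cumulative area of 1 - alpha/2 to the left.
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_interval(x_bar, sd, n, z):
    """Return (lower, upper) for x_bar +/- z * sd / sqrt(n).

    Pass sigma when the population standard deviation is known; for large
    samples (n > 30) the sample standard deviation s is used in its place.
    """
    margin = z * sd / math.sqrt(n)
    return x_bar - margin, x_bar + margin


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")

print(z_interval(510, 46, 85, 1.96))        # sigma known, 95% confidence
print(z_interval(10.455, 7.7, 44, 1.645))   # sigma known, 90% confidence (Text 8.1)
print(z_interval(85.5, 19.3, 110, 2.575))   # car rental: s in place of sigma, 99%
```

After rounding, the three intervals match the slide results (500.22 to 519.78, 8.545 to 12.365, and 80.8 to 90.2). Note that the exact 99% critical value is about 2.576; the slides use the table value 2.575.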
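Confidence Intervals for μ of a Normal Population: Unknown σ
$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$, or $\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$, with $df = n - 1$

Solution for Demonstration Problem 8.3
$\bar{x} = 2.14,\ s = 1.29,\ n = 14,\ df = n - 1 = 13$
$1 - \alpha = .99$, so $\alpha/2 = 0.005$ and $t_{.005,13} = 3.012$
$2.14 - 3.012\frac{1.29}{\sqrt{14}} \le \mu \le 2.14 + 3.012\frac{1.29}{\sqrt{14}}$
$2.14 - 1.04 \le \mu \le 2.14 + 1.04$
$1.10 \le \mu \le 3.18$

Determining Sample Size when Estimating μ
• z formula: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
• Error of estimation (tolerable error): $E = \bar{x} - \mu = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
• Estimated sample size: $n = \frac{z_{\alpha/2}^2\sigma^2}{E^2} = \left(\frac{z_{\alpha/2}\sigma}{E}\right)^2$
• Estimated σ when it is unknown: $\sigma \approx \frac{1}{4}\,\text{range}$, which is then substituted into $n = \frac{z_{\alpha/2}^2\sigma^2}{E^2}$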
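The t interval and the sample-size formula can both be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the standard library has no t quantile function, so the critical value (here $t_{.005,13} = 3.012$ from the table) is passed in by hand; rounding n up to the next whole observation is the usual convention; and the planning inputs (E = 5 with σ = 46, and a range of 40 with E = 2) are hypothetical values chosen only for illustration. The function names are illustrative as well.

```python
import math


def t_interval(x_bar, s, n, t_crit):
    """Return (lower, upper) for x_bar +/- t_{alpha/2, n-1} * s / sqrt(n).

    t_crit must be looked up for df = n - 1 (for example, in a t table).
    """
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def required_sample_size(z, sigma, error):
    """n = (z * sigma / E)^2, rounded up so the tolerable error E is not exceeded."""
    return math.ceil((z * sigma / error) ** 2)


def sigma_from_range(data_range):
    """Rough planning estimate: sigma is approximately range / 4 when sigma is unknown."""
    return data_range / 4


# Demonstration Problem 8.3: df = 13, 99% confidence, t_{.005,13} = 3.012 from the table
lower, upper = t_interval(2.14, 1.29, 14, 3.012)
print(f"{lower:.2f} <= mu <= {upper:.2f}")

# Hypothetical planning examples (inputs are illustrative, not from the text)
print(required_sample_size(1.96, 46, 5))                    # sigma known
print(required_sample_size(1.96, sigma_from_range(40), 2))  # sigma estimated from the range
```

The first line reproduces $1.10 \le \mu \le 3.18$. Rounding up rather than to the nearest integer is the conservative choice: a slightly larger sample can only shrink the margin of error.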