Chapter 8: Estimation
Nutan S. Mishra
Department of Mathematics and Statistics
University of South Alabama

Statistical Inference
Statistical inference means drawing conclusions about an unknown population parameter based on the information from a sample. It is studied in two parts: estimation and testing of hypotheses.

Estimation
When we cannot perform a census, we cannot know the value of the population parameter. For example, the US Census Bureau may want to find the average expenditure per month incurred by a household on eating out. The population average is unknown, so we collect a sample and assign value(s) to it based on the sample values. This is estimation.

Examples of Estimation
1. Since we do not know the average expenditure per week on eating out, we collect a sample, compute the sample mean, and assign the value of the sample mean to the unknown population mean.
2. We do not know the proportion p of smokers in the United States, so we collect a sample of people and compute the sample proportion x/n.

Estimation procedure
• Select a sample.
• Collect the required information from the members of the sample.
• Calculate the value of the sample statistic.
• Assign the value(s) to the corresponding population parameter.

An estimator may be a point estimator or an interval estimator.

Point Estimation
The single value of a sample statistic is called a point estimate of the corresponding population parameter.
Example: When the population mean µ is unknown, the sample mean $\bar{x}$ is an estimator of µ. We write $\hat{\mu} = \bar{x}$; the sample mean is a point estimator of the population mean. When the population proportion p is unknown, the sample proportion $\hat{p} = x/n$ is an estimator of p; $\hat{p}$ is a point estimator of p.

Point Estimation
Consider the problem of finding the average GPA of all students at USA (around 13,000). Let µ be the population average; we do not know its value. Collect a sample of, say, n = 80 students. Collect their GPAs. Compute the sample mean.
Suppose the sample mean is 3.04. Then $\bar{x}$ is called an estimator of µ, and 3.04 is an estimate of µ. Suppose we collect another sample of size n = 100, compute the sample mean, and get 2.99. Then 2.99 is another estimate of µ.

Error in the estimation
In the last chapter we saw that there is a difference between the value of µ and the value of $\bar{x}$. The difference $\bar{x} - \mu$ is the error in estimation.
Margin of error = $1.96\,\sigma_{\bar{x}}$

Interval Estimator
To estimate an unknown value, instead of using a single point value, we use an interval of values. This is called interval estimation. In interval estimation, an interval is constructed around the point estimate, and we say that this interval is likely to contain the unknown value of the population parameter.
Question: How likely? Can we assign a likelihood to our statement?

Interval estimator
Consider the estimation of the population mean µ of the GPA of the students at USA. After collecting a sample, we compute the sample mean. Suppose the sample mean is $\bar{x}$ = 3.05. We add and subtract a number from $\bar{x}$ and ask: how confident are we that the resulting interval contains the unknown value of µ?
3.05 + .15 = 3.20 and 3.05 - .15 = 2.90
What is the probability that the value of µ lies between 2.90 and 3.20?
Questions: What number should be added and subtracted? How do we attach a probability (confidence level) to an interval?

Interval estimation
Confidence level and confidence interval: The confidence level associated with an interval states how much confidence we have that the interval contains the true value of the population parameter. Such an interval is called a confidence interval. The confidence level is denoted by $(1-\alpha)100\%$.

Interval estimation of µ
Recall that $\bar{x}$ is a point estimator of µ, and from Chapter 7 that $\bar{x} \sim N(\mu, \sigma_{\bar{x}}) = N(\mu, \sigma/\sqrt{n})$ whenever the sample size is large.
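The link between the sampling distribution and the margin of error $1.96\,\sigma_{\bar{x}}$ can be checked with a small simulation. The sketch below is only an illustration: the population values (µ = 3.0, σ = 0.5), the function name `fraction_within_margin`, and the fixed random seed are assumptions and do not come from the slides.

```python
import random
from math import sqrt

def fraction_within_margin(mu, sigma, n, repetitions, seed=0):
    """Fraction of simulated sample means within 1.96*sigma/sqrt(n) of mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    margin = 1.96 * sigma / sqrt(n)    # margin of error 1.96 * sigma_xbar
    hits = 0
    for _ in range(repetitions):
        # Draw one sample of size n from a normal population and average it.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if abs(xbar - mu) <= margin:
            hits += 1
    return hits / repetitions

# Hypothetical GPA-like population (illustrative values, not from the slides).
print(fraction_within_margin(mu=3.0, sigma=0.5, n=80, repetitions=2000))
```

The printed fraction should be close to .95, which is the sense in which an interval of half-width $1.96\,\sigma_{\bar{x}}$ around $\bar{x}$ carries a 95% confidence level.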
For large samples, the $(1-\alpha)100\%$ confidence interval for µ is given by

$\bar{x} \pm z\,\sigma_{\bar{x}}$ if σ is known
$\bar{x} \pm z\,s_{\bar{x}}$ if σ is unknown

Recall that $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ and $s_{\bar{x}} = s/\sqrt{n}$. The value of z is read from the z-table for the given confidence level.
$E = z\,\sigma_{\bar{x}}$ or $E = z\,s_{\bar{x}}$ is called the maximum error of estimate for µ.

Confidence interval
Question: How do we compute z for a given (1-α) in the above formula?
For a 95% confidence interval, (1-α) = .95, α = .05, and α/2 = .025, thus z = 1.96. Similarly, we can compute z for 99%, 98%, etc. confidence levels.

[Figure: standard normal curve with central area (1-α) between -z and z and area α/2 in each tail]

Example of Confidence Interval
Ex. 8.11: Given n = 64, $\bar{x}$ = 24.5, and s = 3.1.
a. The point estimate of µ is 24.5.
b. The margin of error associated with the point estimate of µ is $1.96\,s_{\bar{x}}$ = 1.96 × s/√n = 1.96 × 3.1/8 = .7595.
c. The 99% confidence interval for µ is $\bar{x} \pm z\,s_{\bar{x}}$ = 24.5 ± z × 3.1/8. To compute z: 1-α = .99, α = .01, α/2 = .005, so z = 2.58. Thus the confidence interval is 24.5 ± 2.58 × 3.1/8 = 24.5 ± .99975 = (23.50025, 25.49975).
d. The maximum error of estimate is .99975.

Interpretation of Confidence Interval
In the earlier example we constructed a 99% confidence interval for µ, which is (23.50025, 25.49975). This means that we are 99% confident that the unknown value of µ lies between 23.50025 and 25.49975. It does not mean that the interval contains µ with probability .99. It means that if we draw all possible samples of size 64 from the given population and construct an interval from each, then 99% of all such intervals will contain the value of µ.

Interpretation of Confidence Interval
Recall the formula for the confidence interval for µ:
$\bar{x} \pm z\,\sigma_{\bar{x}} = \bar{x} \pm z\,\sigma/\sqrt{n}$
Note the following:
• The values in the interval depend on the sample chosen
• The width of the interval is $2z\,\sigma/\sqrt{n}$
• A narrow interval is a better interval
• The width depends on
– the z-value, which in turn depends on the confidence level
– the size of the sample
These are the two quantities which we can control.
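The large-sample formula and its dependence on z and n can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function names `z_value` and `large_sample_ci` are hypothetical, and z is computed with the standard library's `statistics.NormalDist` rather than read from a table, so for 99% it gives about 2.576 instead of the table-rounded 2.58 and the margin comes out near .998 rather than .99975.

```python
from math import sqrt
from statistics import NormalDist

def z_value(confidence):
    """z with central area `confidence` under the standard normal curve."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def large_sample_ci(xbar, s, n, confidence):
    """Return (lower, upper, margin) for xbar +/- z * s / sqrt(n)."""
    margin = z_value(confidence) * s / sqrt(n)
    return xbar - margin, xbar + margin, margin

# Values from the 8.11 example: n = 64, xbar = 24.5, s = 3.1, 99% level.
lower, upper, margin = large_sample_ci(24.5, 3.1, 64, 0.99)
print(f"z = {z_value(0.99):.4f}, interval = ({lower:.4f}, {upper:.4f})")

# Quadrupling n halves the margin, because the margin is proportional
# to 1/sqrt(n).
_, _, margin_256 = large_sample_ci(24.5, 3.1, 256, 0.99)
print(margin / margin_256)
```

Lowering the confidence level shrinks z, and raising n shrinks s/√n; both narrow the interval, which is the trade-off listed in the bullets above.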
To decrease the width of the interval:
– Lower the confidence level (not a good choice)
– Increase the sample size

Application: Ex. 8.22
X = amount of time spent online per week by mothers with children under age 18.
n = 1000, $\bar{x}$ = 16.87 hrs, s = 3.2
To construct a 95% confidence interval for µ: it is a large sample, so the formula for the interval is
$\bar{x} \pm z\,s_{\bar{x}}$ = 16.87 ± z × 3.2/√1000 = 16.87 ± 1.96 × 3.2/√1000 = 16.87 ± .1983 = (16.6717, 17.0683)
Interpretation: If we draw a large number of samples, each of size 1000, and construct a confidence interval from each sample, then 95% of all such intervals will trap the true value of µ.

Small samples case
Objective: To construct a confidence interval for µ when the sample is small.
The t-distribution is used to construct a confidence interval for µ if
1. The population from which the sample is drawn is approximately normal
2. The sample size is small
3. The population standard deviation σ is unknown
The $(1-\alpha)100\%$ confidence interval for µ is $\bar{x} \pm t\,s_{\bar{x}}$, where $s_{\bar{x}} = s/\sqrt{n}$. The value of t is obtained from the t-table for n - 1 degrees of freedom and the given level of confidence.

What is a t-distribution?
• A specific bell-shaped sampling distribution
• Its only parameter is (n-1), where n is the size of the sample
• (n-1) is called the degrees of freedom
• Its shape depends on the degrees of freedom (n-1)
• The t-distribution approaches the standard normal for larger values of n
• Values of t are tabulated for different degrees of freedom and right-tail areas
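The tabulated t-values can be reproduced approximately with a short numerical sketch. It assumes nothing beyond the standard t density; the helper names `t_pdf`, `right_tail_area`, and `t_critical` are hypothetical, and the method (Simpson's rule plus bisection) is chosen here for self-containment, not taken from the slides. A statistics library would normally be used instead.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def right_tail_area(t, df, steps=2000):
    """Area to the right of t >= 0, using Simpson's rule on [0, t]."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The curve is symmetric, so the area right of 0 is 0.5.
    return 0.5 - total * h / 3

def t_critical(tail_area, df):
    """t-value whose right-tail area is tail_area, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if right_tail_area(mid, df) > tail_area:
            lo = mid   # tail still too large, so move right
        else:
            hi = mid
    return (lo + hi) / 2

# t for right-tail area .025 approaches z = 1.96 as df grows.
for df in (5, 30, 1000):
    print(df, round(t_critical(0.025, df), 3))
print("z", round(NormalDist().inv_cdf(0.975), 3))
```

For small degrees of freedom the t-value is noticeably larger than 1.96, so small-sample intervals are wider than the z-based interval at the same confidence level.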
Picture borrowed from: http://www.aiaccess.net/tutor_demo/tutor_t_1.htm

Exercises 8.39, 8.40, 8.41
8.39(a) Area in the right tail = .05, df = 12. From the t-table, t = 1.782.
8.40(a) Given that n = 21 and the area in the left tail is .10. Here df = n - 1 = 20. Since the t-curve is symmetric, we first find the t-value for area in the right tail = .10 and then attach a negative sign to get the required value.
For df = 20 and area in the right tail = .10, t = 1.325.
For df = 20 and area in the left tail = .10, t = -1.325.
8.41(a) Given that the t-value is 2.467 and df = 28, find the area in the right tail. In the first column of the t-table, look for 28. Then, in the row for 28, look for the t-value 2.467 and read the corresponding area from the top row: .01.

http://lib.stat.cmu.edu/DASL/

Exercise 8.43(a)
Given confidence level = 99%: 1-α = .99, α = .01, α/2 = .005. Also given that df = 13. Thus, from the table, for df = 13 and α/2 = .005, the t-value is 3.012.

Exercise 8.49
X = time spent in waiting in a line to….
Assumption: X ~ N(µ, σ), both unknown.
Draw a sample of size n = 16. Computed $\bar{x}$ = 31 and s = 7 minutes.
To construct a 99% CI for µ, note that
1. The population is approximately normal
2. The population standard deviation is unknown
3. The sample size is small
Then the formula for the CI is $\bar{x} \pm t\,s_{\bar{x}}$, where $s_{\bar{x}} = s/\sqrt{n}$:
31 ± t × 7/√16 = 31 ± t × 7/4
Computation of the t-value: α/2 = .005 and df = n - 1 = 15, thus from the table t = 2.947.

Exercise 8.49 continued
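With the table value t = 2.947 in hand, the interval 31 ± t × 7/4 from Exercise 8.49 can be evaluated as in the sketch below. The function name `t_interval` is hypothetical, and the t-value is passed in as read from the table rather than computed.

```python
from math import sqrt

def t_interval(xbar, s, n, t_value):
    """Return (lower, upper) for xbar +/- t * s / sqrt(n)."""
    standard_error = s / sqrt(n)        # s_xbar = s / sqrt(n)
    margin = t_value * standard_error   # maximum error of estimate
    return xbar - margin, xbar + margin

# Exercise 8.49: n = 16, xbar = 31, s = 7, t = 2.947 (df = 15, alpha/2 = .005).
low, high = t_interval(xbar=31, s=7, n=16, t_value=2.947)
print(f"99% CI for mu: ({low:.2f}, {high:.2f})")
```

The margin here is 2.947 × 1.75 = 5.15725, so the interval runs from roughly 25.84 to 36.16 minutes.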