Econ 140
Inference about a Mean Part II
Lecture 7

Today’s Plan
• Confidence Intervals
• Hypothesis testing
  – Small samples
  – Large samples
• Types of errors
• Quick review of what we’ve learned so far

What we’ve seen so far
• We’ve worked with univariate populations
  – Recall that we have the standardized normal $Z \sim N(0,1)$:
    $$Z = \frac{Y - \mu_Y}{\sigma_Y}$$
• Ask questions about $E(Y)$: what is the probability that someone selected at random will have earnings of \$300?

What we’ve seen so far
• Before, when we were considering the distribution around $\mu_Y$, we were considering the distribution of $Y$
• Now we are considering $\bar{Y}$ as a point estimator for $\mu_Y$
• The difference is that the distribution of $\bar{Y}$ has a variance of $\sigma^2/n$, whereas $Y$ has a variance of $\sigma^2$
• Having obtained an estimate of a parameter ($\mu_Y$), and considered the properties of the estimator (BLUE), we need to find out how ‘good’ the estimate is. Estimation is the first side of statistical inference.
• The other side of statistical inference: hypothesis testing

Confidence Intervals
• Recall our picture showing the distributions of $Y$ and $\bar{Y}$
  [Figure: distributions of $Y$ (variance $\sigma^2$) and $\bar{Y}$ (variance $\sigma^2/n$), both centered at $\mu_Y$]
• You repeatedly take samples from the population and get a different estimate $\bar{Y}$ each time
  – The sampling distribution is the probability distribution of the values that $\bar{Y}$ takes on in the different samples

Confidence Intervals (2)
• How do we assign probability bounds on our estimate?
• We don’t know what $\mu_Y$ is, but we know the sample size and the sample estimate $\bar{Y}$
• We can estimate $\mu_Y$ give or take some amount of error:
  $$\mu_Y = \bar{Y} \pm \text{allowance for random error}$$
• We know that $\bar{Y}$ is distributed $\bar{Y} \sim N(\mu_Y, s^2/n)$
  – We use $s^2$ as an estimate of $\sigma^2$
  – Our distribution of $\bar{Y}$:
    [Figure: distribution of $\bar{Y}$ centered at $\mu_Y$ with variance $\sigma^2/n$]

Confidence Intervals (3)
• Remember:
  – Large samples: use the Z distribution
  – Small samples: use the t distribution
• We’ll use the Z distribution for this example
  – Our expression for the Z statistic is
    $$Z = \frac{\bar{Y} - \mu_Y}{s/\sqrt{n}} \sim N(0,1)$$
    $$Z\,(s/\sqrt{n}) = \bar{Y} - \mu_Y$$
    $$\mu_Y = \bar{Y} - Z\,(s/\sqrt{n})$$

Confidence Intervals (4)
• We have the standard normal distribution around $\mu_Y$
  [Figure: normal curve centered at $\mu_Y$ with the bounds $-Z$ and $+Z$ marked]
• We want to describe how much area is between $-Z$ and $+Z$
• We can create a 95% confidence interval around $Z$

Confidence Intervals (5)
• We can write the confidence interval as
  $$\Pr\left(-1.96 \le \frac{\bar{Y} - \mu_Y}{s/\sqrt{n}} \le 1.96\right) = 0.95$$
• Where did we get the values -1.96 and +1.96?
  – Look at the standard normal table
  – We see that 47.5% of the area under the curve can be found between 0 and 1.96, or 95% between -1.96 and +1.96
    [Figure: standard normal curve with 47.5% of the area between -1.96 and 0 and 47.5% between 0 and +1.96]

Confidence Intervals (6)
• So we can rewrite the probability statement as
  $$-1.96\,\frac{s}{\sqrt{n}} \le \bar{Y} - \mu_Y \le 1.96\,\frac{s}{\sqrt{n}}$$
  $$\mu_Y = \bar{Y} \pm 1.96\,\frac{s}{\sqrt{n}}$$
• This is the confidence interval estimate for $\mu_Y$ at a 95% level of confidence
  – You can choose other levels
  – As you increase the confidence level, the interval widens, so you increase the number of possible values $\mu_Y$ can take

Using the t distribution
• If we have a small sample, we should use the t distribution
• Our t statistic looks like
  $$t = \frac{\bar{Y} - \mu_Y}{s/\sqrt{n}}$$
• What will our confidence interval look like?
  – We substitute t for Z:
    $$\mu_Y = \bar{Y} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$
• We don’t know the underlying population distribution
  – But we can use the central limit theorem to assume that the sampling distribution is approximately normal
  – We can use the t distribution to approximate the sampling distribution

Using the t distribution (2)
• We have to choose the confidence level $(1-\alpha)$, which requires a choice of $\alpha$
• The area between the two t values is the confidence interval
  [Figure: t distribution centered at $\mu_Y$ with the confidence interval lying between $-t$ and $+t$]
• The usual accepted confidence level is 95% ($\alpha = 0.05$)

Using the t distribution (3)
• If $(1-\alpha)$ is the area between the two t values, then $\alpha$ is the sum of the area under the two tails
  – If $(1-\alpha) = 0.95$, then $\alpha = 0.05$
  – $0.05/2 = 0.025$
  – So for a 95% confidence level, 0.025 of the area under the curve is found in each tail of the distribution

The t table
• In the first row, there is an upper number and a lower number
  – The upper gives you the area in one tail given a two tail test
  – The lower number gives the area in one tail or in two tails combined
• At an infinite number of observations, 2.5% of the area under the curve is found in each of the tails when our t statistic is 1.96; the t approximates the normal
• With 10 degrees of freedom (a sample size of 11), 95% of the area under the t distribution is between -2.228 and +2.228
  – Note: the t has fatter tails than the standard normal

The t Table
• For a small sample size, the t values corresponding to a 95% confidence interval are larger in absolute value than the Z values for the same interval
• Depending on 3 things we get a very different approximation of the confidence interval
  – Sample size
  – Whether or not we use an estimate for the population $\sigma$
  – These determine the type of distribution we use

Hypothesis Testing
• We want to ask:
  – What is the probability that $\mu_Y$ is equal to some value?
• Using hypothesis testing we can determine whether or not it’s plausible that $\mu_Y$ equals a certain value
• We have two types of samples
  – Large: $n > 30$
  – Small: $n \le 30$

Large Samples
• Large samples
  – Doesn’t matter if the population distribution is skewed or normal
  – Doesn’t matter if the population variance is known or unknown
  – Use the Z table

Small Samples
• Small samples
  – If the population is normally distributed and the population variance is known, use either the Z or t table
  – If the population is normally distributed but the population variance is unknown, use the t distribution with $n-1$ degrees of freedom (calculate the sample variance as an estimate of the population variance)
  – If the population is non-normally distributed, use neither the t nor the Z (I will never give you a case like this)

Setting Up Hypotheses
• In hypothesis testing you set up a null hypothesis $H_0$
• Under the null hypothesis $\mu_Y$ will take a particular value
  – Example: we can create a null such that $H_0: \mu_Y = 300$
• Once we have a null hypothesis we can set up an alternative hypothesis $H_1$

One and Two-Tailed Tests
• We can represent this in the following graph:
  [Figure: sampling distribution centered at $\mu_Y = 300$ with standard error $s/\sqrt{n}$]
• One-tail tests
  – We calculate the area in the right-hand tail if $H_1: \mu_Y > 300$
  – We calculate the area in the left-hand tail if $H_1: \mu_Y < 300$
• Two-tail test:
  – Find the area under both tails if $H_1: \mu_Y \ne 300$

Intervals and Regions
• We also need to assign a significance level (or confidence level)
• For a two-tailed test we are looking to see if a value of 300 lies within the confidence interval
• With hypothesis tests we are creating an acceptance region bounded by critical values
  – Critical values are taken off the Z and t tables
  – The regions in the tails are the critical regions

Intervals and Regions (2)
  [Figure: acceptance region of area $1-\alpha$ between the two critical values, with a critical region of area $\alpha/2$ in each tail]
• $\alpha$ is the significance level
• If you fail to reject the null, the Z or t statistic must fall in the acceptance region
• If you reject the null, the Z or t statistic must fall in one of the critical regions

Types of Errors
• Type I errors
  – Rejecting a hypothesis when it is in fact true
  – Example: In the confidence interval example we constructed the confidence interval $(254 \le \mu_Y \le 380)$. If the true population mean is 400 and we set $H_0: \mu_Y = 400$, we’d falsely reject the null hypothesis, because 400 lies outside the interval!
• Type II errors
  – Accepting a false hypothesis
  – Example: if the true mean is 400 but we accept $H_0: \mu_Y = 300$, we would be accepting a false hypothesis

Types of Errors (2)
• Statisticians worry about Type I errors
  – They choose a significance level that minimizes Type I errors
• To minimize Type I errors choose a small $\alpha$, where $\alpha$ is the total area in both tails
  – Thus the area in each tail is $\alpha/2$

Types of Errors (3)
• As $\alpha$ decreases, the likelihood of rejecting a true null hypothesis also decreases
• Most of the time $\alpha = 5\%$ is used, and $\alpha/2 = 2.5\%$
• We can say that we accept or reject the null, but we can’t say that we accept the alternative!

Hypothesis Testing in General
• Null ($H_0$): $\mu_Y = \mu_0$
• Alternative ($H_1$) and the corresponding critical region:
  – Right tail, $\mu_Y > \mu_0$: $Z = \dfrac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} > Z_{\alpha}$
  – Left tail, $\mu_Y < \mu_0$: $Z = \dfrac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} < -Z_{\alpha}$
  – Two tail, $\mu_Y \ne \mu_0$: $|Z| = \left|\dfrac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}}\right| > Z_{\alpha/2}$
• If you are using the t instead, replace the Z’s with t’s

Where are we now?
• So far we have learned about inference and testing hypotheses using assumptions about distributions
• Distributions
  – We had samples and populations and used weights to make inferences about the population using sample statistics
  – We assumed distributional forms such as the Z or t distributions
• Sampling distribution of the mean
  – You should know the difference between $E(Y)$ and $E(\bar{Y})$

Where are we now? (2)
• BLUE: we’ll return to this in the next few lectures
• Estimation and hypothesis testing
• We now look to return to the regression line and consider the estimators for $a$ and $b$ from:
  $$\hat{Y}_i = a + bX_i$$
  $$e_i = Y_i - \hat{Y}_i$$
• We have to consider the properties of the OLS estimator (BLUE), and how to construct hypothesis tests on the estimates of the parameters $a$ and $b$
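
Worked example (illustrative only)
The large-sample recipe from this lecture can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not part of the original slides: the `earnings` sample is made up for illustration (36 observations, so the Z distribution applies), and the function names `z_confidence_interval` and `z_test_two_tailed` are hypothetical. It builds the 95% interval $\bar{Y} \pm 1.96\, s/\sqrt{n}$ and then runs the two-tail test of $H_0: \mu_Y = 300$.

```python
import math
from statistics import NormalDist, mean, stdev


def z_confidence_interval(sample, level=0.95):
    """Large-sample interval: Ybar +/- Z_{alpha/2} * s / sqrt(n)."""
    n = len(sample)
    y_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 divisor)
    # Critical value leaving alpha/2 in each tail (about 1.96 for 95%).
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z_crit * s / math.sqrt(n)
    return y_bar - half_width, y_bar + half_width


def z_test_two_tailed(sample, mu_0, alpha=0.05):
    """Two-tail test of H0: mu = mu_0. Returns (Z, critical value, reject?)."""
    n = len(sample)
    z = (mean(sample) - mu_0) / (stdev(sample) / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Reject H0 when Z falls in either critical region.
    return z, z_crit, abs(z) > z_crit


# Hypothetical earnings data (an assumption, not from the lecture):
# the values 280, 290, ..., 360, each appearing four times, so n = 36.
earnings = [280 + 10 * (i % 9) for i in range(36)]

low, high = z_confidence_interval(earnings, level=0.95)
z, z_crit, reject = z_test_two_tailed(earnings, mu_0=300, alpha=0.05)

print(f"95% confidence interval for the mean: ({low:.2f}, {high:.2f})")
print(f"Z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

The two results agree by construction: the two-tail test rejects $H_0: \mu_Y = 300$ at $\alpha = 0.05$ exactly when 300 lies outside the 95% confidence interval. For a small sample from a normal population, the same steps would apply with the critical value taken from the t table at $n-1$ degrees of freedom instead of from the normal distribution.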