Econ 140: Univariate Populations, Lecture 3

Today's Plan
• Univariate statistics: the distribution of a single variable
• Making inferences about population parameters from sample statistics
  – (For future reference: how can we relate the 'a' and 'b' parameters from last lecture to sample data?)
• Dealing with two types of probability
  – 'A priori' classical probability
  – Empirical classical probability

A Priori Classical Probability
• Characterized by a finite number of known outcomes
• The expected value of Y is defined as
  \( E(Y) = \mu_Y = \sum_k Y_k p_k \)
• The expected value is the population mean: \( \mu_Y \) denotes the population mean and \( \bar{Y} \) the sample mean
• Each experiment (e.g., a coin flip) is a randomized trial whose result is one of the known outcomes

Flipping Coins
• Example: flipping 2 fair coins
  – The possible outcomes are HH, TT, HT, TH, so we know there are only 4 possible outcomes
  – The outcomes are discrete because there is a finite number of possible outcomes
  – We can represent the known outcomes in a table:

    Coin 1   Coin 2   Number of heads
    T        T        0
    T        H        1
    H        T        1
    H        H        2

Flipping Coins (2)
• The probability of some event A is \( \Pr(A) = \frac{m}{n} \)
  – where m is the number of outcomes consistent with event A and n is the total number of possible outcomes
  – If A is the number of heads when flipping 2 coins, we can represent the probability distribution function (PDF) like this:

    Number of heads   Probability (PDF)
    0                 0.25 = 1/4
    1                 0.50 = 1/2
    2                 0.25 = 1/4

Flipping Coins (3)
• If we graph the PDF we get:
  [Figure: Probability Distribution Function; x-axis: Number of Heads, y-axis: Probability]
• The expected value is
  \( E(Y) = \sum_k Y_k p_k = 0(0.25) + 1(0.5) + 2(0.25) = 1 \)

Empirical Classical Probability
• Characterized by an infinite number of possible outcomes
• With empirical classical probability, we use sample data to make inferences about the underlying population parameters
  – Most of the time we don't know the population values, so we need to use a sample
• Example: GPAs in the Econ 140 population
  – We can take a sample of every 5th person in the room
  – Assuming that our sample is random (that Econ 140 students do not sit in some systematic fashion), we'll have a representative sample of the population

Empirical Classical Probability
• Statisticians and economists collect sample data for many other purposes
• The Current Population Survey (CPS) is another example: sampling occurs at the household level
• The CPS uses weights to correct the data for over-sampling
  – Over-sampling would be if we picked 1 in 3 people in the front of the room and only 1 in 5 in the back of the room.
    In that case we would over-sample the front.
  – There's a spreadsheet example on the course website (the weighted mean is our best guess of the population mean, whereas the unweighted mean is the sample mean)

Empirical Classical Probability
• On the course website you'll find an Excel spreadsheet that we will use to:
  – Calculate the expected value
  – Calculate the PDF and CDF
  – Calculate weights to translate sample data into population estimates
  – Examine the difference between the sample (unweighted) mean and the estimated population (weighted) mean:
    Weighted mean = sum(EARNWKE*EARNWT)/sum(EARNWT)
• The weighted mean is our estimate of the population mean

Empirical Classical Probability (2)
• So how do we construct a PDF for our spreadsheet example?
  – Pick sensible earnings bands (e.g., 10 bands of $100)
  – We can pick as many bands as we want: the more bands, the more closely the PDF follows the shape of the 'true' population distribution. More bands also means more calculation!

Empirical Classical Probability (3)
• Constructing PDFs:
  – Count the number of observations in each band to get an absolute frequency
  – Use weights to translate sample frequencies into estimates of the population frequencies
  – Calculate the relative frequency for each band by dividing the band's absolute frequency by the total frequency

Empirical Classical Probability (4)
  – A weighted way to approximate the PDF is to scale each band's relative frequency by a ratio of average weights:
    \( p_k^{w} = p_k \times \frac{\text{average of weights within band } k}{\text{average of all weights}} \)
    Because both averages divide by their own counts, this equals band k's share of the total weight, \( \sum_{i \in k} w_i \big/ \sum_i w_i \).
  – When we have k bands, always check that \( \sum_k p_k = 1 \); if the probabilities don't sum to 1, we've made a mistake!
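The band-based PDF construction and the weighted-mean formula described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the earnings and weights are made-up, hypothetical numbers standing in for the spreadsheet's EARNWKE and EARNWT columns, the helper names `weighted_mean` and `banded_pdf` are hypothetical (not part of the course spreadsheet), and earnings are assumed nonnegative, with anything at or above the top band's lower edge placed in the top band.

```python
import math

# Hypothetical data for illustration only (NOT the course's CPS extract):
# weekly earnings (stand-in for EARNWKE) and sampling weights (stand-in for EARNWT).
earnwke = [120.0, 250.0, 310.0, 95.0, 480.0, 330.0, 610.0, 205.0]
earnwt = [1.5, 2.0, 1.0, 3.0, 0.5, 2.5, 1.0, 1.5]


def weighted_mean(values, weights):
    """sum(value * weight) / sum(weight), mirroring sum(EARNWKE*EARNWT)/sum(EARNWT)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def banded_pdf(values, weights, width=100.0, n_bands=10):
    """Return (midpoints, unweighted p_k, weighted p_k) for bands [0, width), [width, 2*width), ...

    Assumes nonnegative values; values at or above the top band's lower edge go in the top band.
    """
    n = len(values)
    counts = [0] * n_bands           # absolute frequency per band
    band_weight = [0.0] * n_bands    # total weight per band
    for v, w in zip(values, weights):
        k = min(int(v // width), n_bands - 1)
        counts[k] += 1
        band_weight[k] += w

    avg_all = sum(weights) / n       # average of all weights
    midpoints = [width * k + width / 2 for k in range(n_bands)]  # 50, 150, 250, ...
    unweighted = [c / n for c in counts]                         # relative frequencies
    # Relative frequency scaled by (average weight in band / average of all weights);
    # empty bands get probability 0.
    weighted = [
        (counts[k] / n) * (band_weight[k] / counts[k]) / avg_all if counts[k] else 0.0
        for k in range(n_bands)
    ]
    return midpoints, unweighted, weighted


mids, p_unw, p_w = banded_pdf(earnwke, earnwt)
print("unweighted (sample) mean:", sum(earnwke) / len(earnwke))
print("weighted (population estimate) mean:", round(weighted_mean(earnwke, earnwt), 2))
# The check from the slide: each set of band probabilities must sum to 1.
assert math.isclose(sum(p_unw), 1.0) and math.isclose(sum(p_w), 1.0)
```

With these made-up numbers the weighted mean (about 250.58) sits below the unweighted mean (300.0) because the lowest earner carries the largest weight; with the real spreadsheet data the two means are much closer.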
Empirical Classical Probability (5)
• Going back to our expected value… The expected value of Y will be:
  \( E(Y) = \sum_k Y_k p_k \)
  – The \( p_k \) are relative frequencies, and they can be weighted or not
  – The \( Y_k \) are the earnings band midpoints (50, 150, 250, and so on in the spreadsheet)
• In our spreadsheet example the weighted mean was $316.63 and the unweighted mean was $317.04
  – Since the sample is so large, there is little difference between the sample (unweighted) mean and the estimated population (weighted) mean

Empirical Classical Probability (6)
• We can also calculate the weighted and unweighted expected values:
  E(weighted): $326.85
  E(unweighted): $327.31
• Why are the expected values different from the means?
  – Grouping the earnings data into bands loses some information: every observation in a band is treated as if it earned the band midpoint
• So why would we want to weight the observations?
  – With a small sample from what we think is a large population, we might not have sampled randomly. We use weights to make the sample more closely resemble the population.

Empirical Classical Probability (7)
• The mean is the first moment of the distribution of earnings
• We may also want to consider how variable earnings are; we can do this by finding the variance (or its square root, the standard deviation)
• Calculate the variance
  – In our example, the unweighted variance is
    \( \sigma^2 = \sum_k (Y_k - \bar{Y})^2 p_k = 30{,}353.78 \)
  – The weighted variance is 29,730.34
  – The difference between the two is 623.44

Empirical Classical Probability (8)
• [Figure: weighted and unweighted PDFs of earnings; the weighted PDF is shown in pink]
• It's tough to see, but the weighting scheme makes the population distribution tighter

Empirical Classical Probability (9)
• We can use our PDF to answer:
  – What is the probability that someone earns between $300 and $400?
• But we can't use this PDF to answer:
  – What is the probability that someone earns between $253 and $316?
• Why?
  – The second question can't be answered using our PDF because $253 and $316 fall inside the earnings bands, not at their endpoints

Standard Normal Curve
• We need to calculate something other than our PDF, using the sample mean, the sample variance, and an assumption about the shape of the distribution function
• We will examine that assumption later
• The normal curve is commonly used to approximate the distribution of continuous variables, and areas under the standard normal curve are tabulated in the Z table. (Strictly, the central limit theorem guarantees approximate normality for sample means as the number of observations grows, not for every variable.)

Standard Normal Curve (2)
• The standard deviation, which measures the typical distance of observations from the mean, is the square root of the variance: \( \sigma = \sqrt{\sigma^2} \)
  [Figure: normal curve centered at \( \mu_Y \); about 68% of the area lies within \( \mu_Y \pm \sigma \), 95% within \( \mu_Y \pm 2\sigma \), and 99.7% within \( \mu_Y \pm 3\sigma \)]

Standard Normal Curve (3)
• Properties of the normal curve
  – The curve is centered around \( \mu_Y \)
  – The curve reaches its highest value at \( \mu_Y \) and tails off symmetrically at both ends
  – The distribution is fully described by its expected value and variance
• You can convert any normal distribution for which you have estimates of \( \mu_Y \) and \( \sigma^2 \) to the standard normal distribution

Standard Normal Curve (4)
• A distribution only needs to be approximately normal for us to convert it to the standard normal.
• The mass of the distribution must fall in the center, but the shape of the tails can differ
  [Figure: two curves centered at \( \mu_Y \) with different spreads, \( \sigma_1 \) and \( \sigma_2 \)]

Standard Normal Curve (5)
• If we want to know the probability that someone earns at most $C, we are asking:
  \( P(Y \le C) = ? \)
• We can rearrange terms to get:
  \( P\left(\frac{Y - \mu_Y}{\sigma} \le \frac{C - \mu_Y}{\sigma}\right) = P(Z \le C^*) = ? \), where \( Z = \frac{Y - \mu_Y}{\sigma} \) and \( C^* = \frac{C - \mu_Y}{\sigma} \)
• Properties of the standard normal variate Z:
  – It is normally distributed with a mean of zero and a variance of 1, written in shorthand as \( Z \sim N(0,1) \)

Standard Normal Curve (6)
• If we assume some variable Y is normally distributed, we write this in shorthand as \( Y \sim N(\mu, \sigma^2) \)
• We can use Z to convert Y to the standard normal distribution
• Look at the Z (standardized normal distribution) handout
  – It gives the area under the Z curve from the mean of zero to the value of interest
  – For example, read down the left-hand column to 1.6 and along the top row to .04: the area under the curve between Z = 0 and Z = 1.64 is 0.4495

Standard Normal Curve (7)
• Going back to our earlier question: what is the probability that someone earns between $300 and $400, \( P(300 \le Y \le 400) \)?
  – \( \mu_Y \approx 316.6 \) and \( \sigma^2 = 25{,}608 \), so \( \sigma = \sqrt{25{,}608} \approx 160 \)
  – \( Z_{300} = \frac{300 - 316.6}{160} \approx -0.104 \) and \( Z_{400} = \frac{400 - 316.6}{160} \approx 0.52 \)
  – \( P(-0.104 \le Z \le 0) = 0.0418 \) and \( P(0 \le Z \le 0.52) = 0.1985 \)
  – \( P(300 \le Y \le 400) = P(-0.104 \le Z \le 0.52) = 0.0418 + 0.1985 = 0.2403 \)
  [Figure: normal curve with the area between \( Z_{300} \) and \( Z_{400} \) shaded]

What we've done
• 'A priori' classical probability
  – There is a finite number of possible outcomes
  – Flipping coins example
• Empirical classical probability
  – There is an infinite number of possible outcomes
  – Difference between sample and population means
  – Difference between sample and population expected values
  – Difference in calculating PDFs of a univariate population
• Use of the standard normal distribution
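The standardization steps in the $300 to $400 example can be reproduced in a few lines. This minimal sketch assumes Python 3.8+ for `statistics.NormalDist` (standard library) and plugs in the lecture's values, a mean of 316.6 and a variance of 25,608; the helper names `standardize`, `area_zero_to`, and `prob_between` are hypothetical. Because `NormalDist` computes normal areas directly instead of reading rounded Z-table entries, its answer can differ from the hand calculation in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+

MU = 316.6                         # mean weekly earnings used in the lecture example
SIGMA = sqrt(25608)                # standard deviation, about 160
STD_NORMAL = NormalDist(0.0, 1.0)  # Z ~ N(0, 1)


def standardize(y, mu=MU, sigma=SIGMA):
    """Convert a value of Y ~ N(mu, sigma^2) to Z = (Y - mu) / sigma."""
    return (y - mu) / sigma


def area_zero_to(z):
    """Area under the standard normal curve between 0 and z (the Z-table entry for z >= 0)."""
    return STD_NORMAL.cdf(z) - 0.5


def prob_between(a, b, mu=MU, sigma=SIGMA):
    """P(a <= Y <= b), computed as P(Z_a <= Z <= Z_b) on the standard normal curve."""
    return STD_NORMAL.cdf(standardize(b, mu, sigma)) - STD_NORMAL.cdf(standardize(a, mu, sigma))


if __name__ == "__main__":
    z300, z400 = standardize(300), standardize(400)
    print(f"Z_300 = {z300:.3f}, Z_400 = {z400:.3f}")
    print(f"P(300 <= Y <= 400) = {prob_between(300, 400):.4f}")
    print(f"Z-table check, area from 0 to 1.64 = {area_zero_to(1.64):.4f}")
```

Running it prints Z values of about -0.104 and 0.521 and a probability within about 0.001 of the hand-calculated 0.2403. Working on the Z scale keeps the code parallel to the table method; `NormalDist(MU, SIGMA).cdf(400) - NormalDist(MU, SIGMA).cdf(300)` would give the same probability without standardizing explicitly.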