Econ 140
Univariate Populations
Lecture 3

Today’s Plan
• Univariate statistics: the distribution of a single variable
• Making inferences about population parameters from sample statistics
  – (For future reference: how can we relate the ‘a’ and ‘b’ parameters from last lecture to sample data?)
• Dealing with two types of probability
  – ‘A priori’ classical probability
  – Empirical classical probability

A Priori Classical Probability
• Characterized by a finite number of known outcomes
• The expected value of Y can be defined as
  \[ \mu_Y = E(Y) = \sum_k Y_k p_k \]
• The expected value will always be the mean value
  – \( \mu_Y \) is the population mean
  – \( \bar{Y} \) is the sample mean
• The outcome of an experiment is a randomized trial

Flipping Coins
• Example: flipping 2 fair coins
  – Possible outcomes are: HH, TT, HT, TH
  – We know there are only 4 possible outcomes
  – We get discrete outcomes because there are a finite number of possible outcomes
  – We can represent the known outcomes in a matrix:

  First coin   Second coin   Number of heads
  T            T             0
  T            H             1
  H            T             1
  H            H             2

Flipping Coins (2)
• The probability of some event A is
  \[ \Pr(A) = \frac{m}{n} \]
  – where m is the number of outcomes consistent with event A and n is the total number of possible outcomes.
  – If A is the number of heads when flipping 2 coins, we can represent the probability distribution function (PDF) like this:

  Number of heads   Probability distribution function (PDF)
  0                 0.25 = 1/4
  1                 0.50 = 1/2
  2                 0.25 = 1/4

Flipping Coins (3)
• If we graph the PDF we get:
  [Figure: Probability Distribution Function. Horizontal axis: Number of Heads; vertical axis: Probability.]
• The expected value is
  \[ \mu_Y = E(Y) = \sum_k Y_k p_k = 0(0.25) + 1(0.5) + 2(0.25) = 1 \]

Empirical Classical Probability
• Characterized by an infinite number of possible outcomes
• With empirical classical probability, we use sample data to make inferences about underlying population parameters
  – Most of the time, we don’t know what the population values are, so we need to use a sample
• Example: GPAs in the Econ 140 population
  – We can take a sample of every 5th person in the room
  – Assuming that our sample is random, we’ll have a representative sample of the population

Empirical Classical Probability
• Statisticians/economists collect sample data for many other purposes
• The Current Population Survey (CPS) is another example: sampling occurs at the household level
• The CPS uses weights to correct the data for over-sampling
  – Over-sampling would be if we picked 1 in 3 at the front of the room and only 1 in 5 at the back of the room.
    In that case we would over-sample the front.
  – There’s a spreadsheet example on the course website (the weighted mean is our best guess of the population mean, whereas the unweighted mean is the sample mean)

Empirical Classical Probability
• On the course website you’ll find an Excel spreadsheet that we will use to calculate the following:
  – Expected value
  – PDF and CDF
  – Weights to translate sample data into population estimates
  – The difference between the sample (unweighted) mean and the estimated population (weighted) mean:
    Weighted mean = sum(EARNWKE*EARNWT)/sum(EARNWT)
• The weighted mean is our estimate of the population mean

Empirical Classical Probability(3)
• So how do we construct a PDF for our spreadsheet example?
  – Pick sensible earnings bands (i.e. 10 bands of $100)
  – We can pick as many bands as we want: the greater the number of bands, the more closely the shape of the PDF matches the ‘true population’ distribution. More bands = more calculation!

Empirical Classical Probability(2)
• Constructing PDFs:
  – Count the number of observations in each band to get an absolute frequency
  – Use weights to translate sample frequencies into estimates of the population frequencies
  – Calculate relative frequencies for each band by dividing the absolute frequency for the band by the total frequency

Empirical Classical Probability(4)
  – An alternative way to approximate the PDF:
    Avg weights = (avg of weights within band) / (avg of all weights)
  – When we have k bands, always check:
    \[ \sum_k p_k = 1 \]
    If the probabilities don’t sum to 1, we’ve made a mistake!
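The PDF construction steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `sample` values are invented (they are not the spreadsheet data), the tuple fields merely echo the EARNWKE and EARNWT columns, and `band_index` and `build_pdf` are hypothetical helper names. Earnings above the top band are lumped into the last band, which is a choice made here and not something the lecture specifies.

```python
# Hypothetical (weekly earnings, sampling weight) pairs. The fields echo the
# EARNWKE and EARNWT columns, but every value here is made up for illustration.
sample = [
    (80.0, 1200.0),
    (120.0, 900.0),
    (150.0, 1100.0),
    (260.0, 1000.0),
    (310.0, 800.0),
    (340.0, 1000.0),
    (420.0, 1500.0),
    (950.0, 500.0),
]

BAND_WIDTH = 100.0  # "10 bands of $100"
N_BANDS = 10


def band_index(earnings):
    """Return the band number: 0 for $0-$100, 1 for $100-$200, and so on.

    Assumption: anything above the top band is placed in the last band.
    """
    return min(int(earnings // BAND_WIDTH), N_BANDS - 1)


def build_pdf(observations, use_weights):
    """Relative frequency of each band, weighted or unweighted."""
    totals = [0.0] * N_BANDS
    for earnings, weight in observations:
        # Unweighted: each observation counts once (absolute frequency).
        # Weighted: each observation counts as `weight` people, which gives
        # an estimate of the population frequency for the band.
        totals[band_index(earnings)] += weight if use_weights else 1.0
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


unweighted_pdf = build_pdf(sample, use_weights=False)
weighted_pdf = build_pdf(sample, use_weights=True)

# The check from the slide: the probabilities must sum to 1.
for pdf in (unweighted_pdf, weighted_pdf):
    assert abs(sum(pdf) - 1.0) < 1e-9

for k in range(N_BANDS):
    low = int(k * BAND_WIDTH)
    high = int(low + BAND_WIDTH)
    print(f"${low}-${high}: unweighted {unweighted_pdf[k]:.4f}, "
          f"weighted {weighted_pdf[k]:.4f}")
```

Dividing by the grand total is what turns absolute frequencies into relative ones, so the sum-to-1 check mostly guards against observations that were dropped or counted twice along the way.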

Empirical Classical Probability(5)
• Going back to our expected value… The expected value of Y will be:
  \[ E(Y) = \sum_k Y_k p_k \]
  – The \( p_k \) are frequencies and they can be weighted or not
  – The \( Y_k \) are the earnings band midpoints (50, 150, 250, and so on in the spreadsheet)
• From our spreadsheet example our weighted mean was $316.63 and the unweighted mean was $317.04
  – Since the sample is so large, there is little difference between the sample (unweighted) mean and the population (weighted) mean

Empirical Classical Probability(6)
• We can also calculate the weighted and unweighted expected values:
  E(weighted value): $326.85
  E(unweighted value): $327.31
• Why are the expected values different from the means?
  – We lose some information in calculating the expected values, because the wage data are grouped into bands!
• So why would we want to weight the observations?
  – With a small sample of what we think is a large population, we might not have sampled randomly. We use weights to make the sample more closely resemble the population.

Empirical Classical Probability(7)
• The mean is the first moment of the distribution of earnings
• We may also want to consider how variable earnings are
  – We can do this by finding the variance or the standard deviation
• Calculate the variance
  – In our example, the unweighted variance is:
    \[ \sigma^2 = \sum_k (Y_k - \mu_Y)^2 p_k = 30{,}353.78 \]
  – The weighted variance is 29,730.34
  – The difference between the two is 623.44

Empirical Classical Probability(8)
  [Figure: weighted and unweighted PDFs of earnings. The weighted PDF is shown in pink.]
• It’s tough to see, but the weighting scheme makes the population distribution tighter

Empirical Classical Probability(9)
• We can use our PDF to answer:
  – What is the probability that someone earns between $300 and $400?
• But we can’t use this PDF to answer:
  – What is the probability that someone earns between $253 and $316?
• Why?
  – The second question can’t be answered using our PDF because $253 and $316 fall somewhere within the earnings bands, not at the endpoints. The banded PDF only gives the probability of a whole band.

What we’ve done
• ‘A priori’ classical probability
  – There are a finite number of possible outcomes
  – Flipping coins example
• Empirical classical probability
  – There are an infinite number of possible outcomes
  – Difference between sample and population means
  – Difference between sample and population expected values
  – Difference in calculating PDFs of a univariate population
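The points recapped above (means versus expected values, weighted versus unweighted, and the variance) can be illustrated with a short Python sketch. It rests on stated assumptions: the `sample` values are invented and are not the spreadsheet data, the tuple fields only echo the EARNWKE and EARNWT columns, and all function names are hypothetical. The coin-flip PDF is the one from the lecture. Because this sample is tiny, the weighted and unweighted figures differ far more than in the lecture’s large sample.

```python
# Hypothetical (weekly earnings, sampling weight) pairs; values are invented.
sample = [
    (80.0, 1200.0),
    (120.0, 900.0),
    (150.0, 1100.0),
    (260.0, 1000.0),
    (310.0, 800.0),
    (340.0, 1000.0),
    (420.0, 1500.0),
    (950.0, 500.0),
]

BAND_WIDTH = 100.0  # $100 bands, so the midpoints are 50, 150, 250, ...


def unweighted_mean(observations):
    """Plain sample mean of earnings."""
    return sum(y for y, _ in observations) / len(observations)


def weighted_mean(observations):
    """sum(EARNWKE * EARNWT) / sum(EARNWT), as in the slide formula."""
    total_weight = sum(w for _, w in observations)
    return sum(y * w for y, w in observations) / total_weight


def banded_pdf(observations, use_weights):
    """Map each band midpoint Y_k to its relative frequency p_k."""
    totals = {}
    for y, w in observations:
        midpoint = (y // BAND_WIDTH) * BAND_WIDTH + BAND_WIDTH / 2
        totals[midpoint] = totals.get(midpoint, 0.0) + (w if use_weights else 1.0)
    grand_total = sum(totals.values())
    return {m: t / grand_total for m, t in totals.items()}


def expected_value(pdf):
    """E(Y) = sum over k of Y_k * p_k."""
    return sum(y_k * p_k for y_k, p_k in pdf.items())


def variance(pdf):
    """sum over k of (Y_k - E(Y))^2 * p_k."""
    mu = expected_value(pdf)
    return sum((y_k - mu) ** 2 * p_k for y_k, p_k in pdf.items())


# A priori case: number of heads when flipping 2 fair coins.
coin_pdf = {0: 0.25, 1: 0.50, 2: 0.25}
coin_expected = expected_value(coin_pdf)
coin_variance = variance(coin_pdf)

# Empirical case: means use the raw data, expected values use band midpoints.
mean_unweighted = unweighted_mean(sample)
mean_weighted = weighted_mean(sample)
ev_unweighted = expected_value(banded_pdf(sample, use_weights=False))
ev_weighted = expected_value(banded_pdf(sample, use_weights=True))
var_unweighted = variance(banded_pdf(sample, use_weights=False))
var_weighted = variance(banded_pdf(sample, use_weights=True))

print("coin flips:", coin_expected, coin_variance)
print("means:", mean_unweighted, mean_weighted)
print("expected values:", ev_unweighted, ev_weighted)
print("variances:", var_unweighted, var_weighted)
```

The expected values differ from the means for the reason given in the lecture: each observation is replaced by the midpoint of its band, so the detail within a band is lost.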