Econ 140
Univariate Populations
Lecture 3
Today’s Plan
• Univariate statistics - distribution of a single variable
• Making inferences about population parameters from
sample statistics - (For future reference: how can we relate
the ‘a’ and ‘b’ parameters from last lecture to sample data)
• Dealing with two types of probability
– ‘A priori’ classical probability
– Empirical classical probability
A Priori Classical Probability
• Characterized by a finite number of known outcomes
• The expected value of Y can be defined as
$\mu_Y = E(Y) = \sum_k Y_k \, p_k$
• The expected value will always be the mean value
– $\mu_Y$ is the population mean
– $\bar{Y}$ is the sample mean
• The outcome of an experiment is a randomized trial
Flipping Coins
• Example: flipping 2 fair coins
– Possible outcomes are:
HH, TT, HT, TH
– We know there are only 4 possible outcomes
– We get discrete outcomes because there are a finite
number of possible outcomes
– We can represent known outcomes in a matrix
Coin 1   Coin 2   Number of heads
T        T        0
T        H        1
H        T        1
H        H        2
Flipping Coins (2)
• The probability of some event A is $\Pr(A) = \frac{m}{n}$
– where m is the number of outcomes consistent with event A
and n is the total number of possible outcomes.
– If A is the number of heads when flipping 2 coins we
can represent the probability distribution function like
this:
Number of heads   Probability Distribution Function (PDF)
0                 0.25 = 1/4
1                 0.50 = 1/2
2                 0.25 = 1/4
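The m/n rule above can be checked by brute-force enumeration. The following is a minimal sketch, assuming two fair coins so that all four outcomes are equally likely; the variable names (`outcomes`, `counts`, `pdf`) are illustrative and not part of the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate every equally likely outcome of flipping 2 fair coins.
outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT
n = len(outcomes)                         # total number of possible outcomes

# m = number of outcomes consistent with each value of "number of heads".
counts = Counter(flips.count("H") for flips in outcomes)

# Pr(A) = m / n for each possible number of heads.
pdf = {heads: Fraction(m, n) for heads, m in sorted(counts.items())}

for heads, prob in pdf.items():
    print(heads, prob, float(prob))
```

Using `Fraction` keeps the probabilities exact, so they can be compared directly with the 1/4, 1/2, 1/4 entries in the table.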
Flipping Coins (3)
• If we graph the PDF we get
[Figure: Probability Distribution Function. Vertical axis: Probability (0.00 to 1.00); horizontal axis: Number of Heads (-1 to 3).]
• The expected value is
$\mu_Y = E(Y) = \sum_k Y_k \, p_k = 0(0.25) + 1(0.5) + 2(0.25) = 1$
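The expected-value sum for the coin example can be written out as a short calculation. This is only a sketch: the helper name `expected_value` and the dictionary layout `{Y_k: p_k}` are assumptions made for illustration.

```python
# PDF for the number of heads in 2 fair coin flips (from the table above).
pdf = {0: 0.25, 1: 0.50, 2: 0.25}


def expected_value(pdf):
    """Return the sum over k of Y_k * p_k for a discrete PDF given as {Y_k: p_k}."""
    return sum(y * p for y, p in pdf.items())


mu_y = expected_value(pdf)
print(mu_y)  # 1.0
```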
Empirical Classical Probability
• Characterized by an infinite number of possible outcomes
• With empirical classical probability, we use sample data to
make inferences about underlying population parameters
– Most of the time, we don’t know what the population
values are, so we need to use a sample
• Example: GPAs in the Econ 140 population
– We can take a sample of every 5th person in the room
– Assuming that our sample is random, we’ll have a
representative sample of the population
Empirical Classical Probability
• Statisticians/economists collect sample data for many other
purposes
• The CPS (Current Population Survey) is another example: sampling
occurs at the household level
• The CPS uses weights to correct data for oversampling
– Oversampling would occur if we picked 1 in 3 people in the front
of the room and only 1 in 5 in the back of the room. In
that case we would oversample the front
– There’s a spreadsheet example on the course website
(the weighted mean is our best guess of the population
mean, whereas the unweighted mean is the sample
mean)
Empirical Classical Probability
• On the course website you’ll find an Excel spreadsheet that
we will use to calculate the following:
– Expected value
– PDF and CDF
– Weights to translate sample data into population
estimates
– Examine the difference between the sample
(unweighted) mean and the estimated population
(weighted) mean:
Weighted mean = sum(EARNWKE*EARNWT)/sum(EARNWT)
• This is our estimate of the population mean
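The weighted-mean formula can be tried on a toy sample. The sketch below assumes made-up earnings and weights (they are not taken from the course spreadsheet) and reuses the front-of-room / back-of-room idea: a person sampled at a rate of 1 in 3 gets weight 3, and a person sampled at 1 in 5 gets weight 5.

```python
# Hypothetical sample: weekly earnings (EARNWKE) and sampling weights (EARNWT).
# A weight of 3 means the observation stands in for 3 people (sampled 1 in 3);
# a weight of 5 means it stands in for 5 people (sampled 1 in 5).
earnwke = [400.0, 420.0, 380.0, 250.0, 270.0]
earnwt = [3.0, 3.0, 3.0, 5.0, 5.0]

# Unweighted (sample) mean: every observation counts equally.
unweighted_mean = sum(earnwke) / len(earnwke)

# Weighted mean = sum(EARNWKE * EARNWT) / sum(EARNWT)
weighted_mean = sum(y * w for y, w in zip(earnwke, earnwt)) / sum(earnwt)

print(unweighted_mean, weighted_mean)
```

In this toy data the oversampled group has higher earnings, so the unweighted mean overstates the estimated population mean and the weights pull it back down.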
Empirical Classical Probability (3)
• So how do we construct a PDF for our spreadsheet
example?
– Pick sensible earnings bands (e.g. 10 bands of $100)
– We can pick as many bands as we want - the greater
the number of bands, the more closely the shape of the
PDF matches the ‘true population’. More bands = more
calculation!
Empirical Classical Probability (2)
• Constructing PDFs:
– Count the number of observations in each band to get
an absolute frequency
– Use weights to translate sample frequencies into
estimates of the population frequencies
– Calculate relative frequencies for each band by dividing
the absolute frequency for the band by the total
frequency
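The three steps above (count per band, apply weights, divide by the total) can be sketched as follows. The function name `banded_pdf`, the equal-width bands starting at 0, and the data are all assumptions for illustration; the course spreadsheet may lay this out differently.

```python
import math


def banded_pdf(values, weights, band_width, n_bands):
    """Estimate a PDF over equal-width bands starting at 0.

    Returns one relative frequency per band. Values are assumed to be
    non-negative; anything at or above the top edge goes in the last band.
    """
    totals = [0.0] * n_bands
    for y, w in zip(values, weights):
        band = min(int(y // band_width), n_bands - 1)
        totals[band] += w  # (weighted) absolute frequency for this band
    grand_total = sum(totals)
    return [t / grand_total for t in totals]  # relative frequencies


earnings = [50.0, 120.0, 180.0, 230.0, 310.0, 320.0]  # hypothetical data
weights = [2.0, 1.0, 1.0, 3.0, 2.0, 1.0]              # hypothetical weights

# Unweighted PDF: give every observation a weight of 1.
unweighted = banded_pdf(earnings, [1.0] * len(earnings), 100, 4)
weighted = banded_pdf(earnings, weights, 100, 4)

# The relative frequencies should sum to 1 in both cases.
assert math.isclose(sum(unweighted), 1.0)
assert math.isclose(sum(weighted), 1.0)
print(unweighted, weighted)
```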
Empirical Classical Probability (4)
– An alternative way to approximate the PDF:
$\text{Avg weights} = \dfrac{\text{avg of weights within band}}{\text{avg of all weights}}$
– When we have k bands, always check:
$\sum_k p_k = 1$
if the probabilities don’t sum to 1, we’ve made a
mistake!
Empirical Classical Probability (5)
• Going back to our expected value…
The expected value of Y will be: $E(Y) = \sum_k Y_k \, p_k$
– The pk are frequencies and they can be weighted or not
– The Yk are the earnings bands midpoints (50, 150, 250,
and so on in the spreadsheet)
• From our spreadsheet example our weighted mean was
$316.63 and the unweighted mean was $317.04
– Since the sample is so large, there is little difference
between the sample (unweighted) mean and the
population (weighted) mean
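To see how an expected value built from band midpoints relates to the ordinary mean, here is a small sketch with made-up earnings (not the spreadsheet data) and $100 bands. It is unweighted; a weighted version would add each observation's share of the total weight instead of `1 / len(earnings)`.

```python
earnings = [20.0, 90.0, 110.0, 140.0, 260.0, 290.0]  # hypothetical data
band_width = 100

# Ordinary (unweighted) sample mean, using every observation exactly.
raw_mean = sum(earnings) / len(earnings)

# Relative frequency p_k of each band, keyed by the band midpoint Y_k.
pdf = {}
for y in earnings:
    midpoint = (y // band_width) * band_width + band_width / 2
    pdf[midpoint] = pdf.get(midpoint, 0.0) + 1 / len(earnings)

# E(Y) = sum over bands of Y_k * p_k, using midpoints in place of raw values.
banded_expected_value = sum(y_k * p_k for y_k, p_k in pdf.items())

print(raw_mean, banded_expected_value)
```

The two numbers differ because every observation in a band is treated as if it sat at the midpoint, so the detail inside each band is lost.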
Empirical Classical Probability (6)
• We can also calculate the weighted and unweighted
expected values:
E(Weighted value): $326.85
E(Unweighted value): $327.31
• Why are the expected values different from the means?
– We lose some information by grouping the wage data into
bands when calculating the expected values!
• So why would we want to weight the observations?
– With a small sample of what we think is a large
population, we might not have sampled randomly. We
use weights to make the sample more closely resemble
the population.
Empirical Classical Probability (7)
• The mean is the first moment of the distribution of earnings
• We may also want to consider how variable earnings are
– We can do this by finding the variance, or standard deviation
• Calculate the variance
– In our example, the unweighted variance is:
$\sigma^2 = \sum_k (Y_k - \bar{Y})^2 \, p_k = 30{,}353.78$
– The weighted variance is 29,730.34
– The difference between the two is 623.44
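The variance calculation follows the same pattern as the expected value. Below is a minimal sketch using a hypothetical three-band PDF (the figures are chosen for easy arithmetic and are not the spreadsheet values); `mean_and_variance` is an illustrative helper name.

```python
import math


def mean_and_variance(pdf):
    """Return (mean, variance) for a discrete PDF given as {Y_k: p_k}."""
    mean = sum(y * p for y, p in pdf.items())
    variance = sum((y - mean) ** 2 * p for y, p in pdf.items())
    return mean, variance


# Hypothetical PDF over band midpoints 50, 150 and 250.
pdf = {50: 0.2, 150: 0.5, 250: 0.3}

mean, variance = mean_and_variance(pdf)
std_dev = math.sqrt(variance)  # same units as earnings, unlike the variance

print(mean, variance, std_dev)
```

Passing weighted rather than unweighted frequencies as the `p_k` values gives the weighted variance in the same way.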
Empirical Classical Probability (8)
The weighted PDF is pink. It’s tough to see, but the weighting scheme makes the population distribution tighter.
Empirical Classical Probability (9)
• We can use our PDF to answer:
– What is the probability that someone earns between
$300 and $400?
• But we can’t use this PDF to answer:
– What is the probability that someone earns between
$253 and $316?
• Why?
– The second question can’t be answered using our PDF
because $253 and $316 fall somewhere within the
earnings bands, not at the endpoints
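One way to see why only band endpoints work is to build the cumulative distribution (CDF) from the banded PDF. The sketch below assumes five $100 bands with made-up probabilities; `prob_between` is a hypothetical helper that deliberately refuses bounds that are not band endpoints.

```python
from itertools import accumulate

band_edges = [0, 100, 200, 300, 400, 500]  # hypothetical band endpoints
pdf = [0.10, 0.25, 0.30, 0.20, 0.15]       # hypothetical p_k, one per band

# cdf[i] = Pr(earnings < band_edges[i]); it is 0 at the lowest edge.
cdf = [0.0] + list(accumulate(pdf))


def prob_between(lower, upper):
    """Pr(lower <= earnings < upper); both bounds must be band endpoints."""
    if lower not in band_edges or upper not in band_edges:
        raise ValueError("bounds must coincide with band endpoints")
    return cdf[band_edges.index(upper)] - cdf[band_edges.index(lower)]


print(prob_between(300, 400))  # probability mass of the $300-$400 band

try:
    prob_between(253, 316)
except ValueError as err:
    print("cannot answer:", err)
```

The banded PDF only records how much probability falls in each whole band, so a question about $253 to $316 would need an extra assumption about how earnings are spread within a band.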
What we’ve done
• ‘A priori’ classical probability
– There are a finite number of possible outcomes
– Flipping coins example
• Empirical classical probability
– There are an infinite number of possible outcomes
– Difference between sample and population means
– Difference between sample and population expected
values
– Difference in calculating PDFs of a univariate
population.