Econ 140
Univariate Populations
Lecture 3
Today’s Plan
• Univariate statistics - distribution of a single variable
• Making inferences about population parameters from
sample statistics - (For future reference: how can we relate
the ‘a’ and ‘b’ parameters from last lecture to sample data)
• Dealing with two types of probability
– ‘A priori’ classical probability
– Empirical classical probability
A Priori Classical Probability
• Characterized by a finite number of known outcomes
• The expected value of Y can be defined as
\[ \mu_Y = E(Y) = \sum_k Y_k \, p_k \]
• The expected value will always be the mean value
\(\mu_Y\) is the population mean
\(\bar{Y}\) is the sample mean
• The outcome of an experiment is a randomized trial
Flipping Coins
• Example: flipping 2 fair coins
– Possible outcomes are:
HH, TT, HT, TH
– we know there are only 4 possible outcomes
– we get discrete outcomes because there are a finite
number of possible outcomes
– We can represent known outcomes in a matrix
First coin   Second coin   Number of Heads
T            T             0
T            H             1
H            T             1
H            H             2
Flipping Coins (2)
• The probability of some event A is \( \Pr(A) = \dfrac{m}{n} \)
– where m is the number of outcomes consistent with event A
and n is the total number of possible outcomes.
– If A is the number of heads when flipping 2 coins we
can represent the probability distribution function like
this:
Number of Heads   Probability Distribution Function (PDF)
0                 0.25 = 1/4
1                 0.50 = 1/2
2                 0.25 = 1/4
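The m/n counting rule for the two-coin example can be sketched in Python by listing every outcome and counting heads. This is only an illustration; the variable names (outcomes, counts, pdf) are our own and do not come from the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate every equally likely outcome of flipping two fair coins.
outcomes = list(product("HT", repeat=2))

# m = number of outcomes with a given number of heads, n = total outcomes.
counts = Counter(flips.count("H") for flips in outcomes)
n = len(outcomes)
pdf = {heads: Fraction(m, n) for heads, m in sorted(counts.items())}

for heads, prob in pdf.items():
    print(heads, prob, float(prob))
```

Using Fraction keeps the probabilities exact, so checking that they sum to 1 involves no rounding.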
Flipping Coins (3)
• If we graph the PDF we get
[Figure: Probability Distribution Function. Probability (vertical axis, 0.00 to 1.00) plotted against Number of Heads (horizontal axis, -1 to 3).]
• The expected value is
\[ \mu_Y = E(Y) = \sum_k Y_k \, p_k = 0(0.25) + 1(0.5) + 2(0.25) = 1 \]
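The expected-value sum for the two-coin example can be checked in a couple of lines of Python. The dictionary below simply restates the coin-flip probabilities from the lecture; the names pdf and expected_heads are illustrative.

```python
# Number of heads -> probability, for two fair coins.
pdf = {0: 0.25, 1: 0.50, 2: 0.25}

# E(Y) = sum over k of Y_k * p_k
expected_heads = sum(y_k * p_k for y_k, p_k in pdf.items())
print(expected_heads)
```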
Empirical Classical Probability
• Characterized by an infinite number of possible outcomes
• With empirical classical probability, we use sample data to
make inferences about underlying population parameters
– Most of the time, we don’t know what the population
values are, so we need to use a sample
• Example: GPAs in the Econ 140 population
– We can take a sample of every 5th person in the room
– Assuming that our sample is random (that Econ 140
does not sit in some systematic fashion), we’ll have a
representative sample of the population
Empirical Classical Probability
• Statisticians/economists collect sample data for many other
purposes
• The Current Population Survey (CPS) is another example: sampling occurs at the household
level
• CPS uses weights to correct data for oversampling
– Over-sampling would be if we picked 1 in 3 in front of
the room and only 1 in 5 in the back of the room. In
that case we would over-sample the front
– There’s a spreadsheet example on the course website
(the weighted mean is our best guess of the population
mean, whereas the unweighted mean is the sample
mean)
Empirical Classical Probability
• On the course website you’ll find an Excel spreadsheet that
we will use to calculate the following:
– Expected value
– PDF and CDF
– Weights to translate sample data into population
estimates
– Examine the difference between the sample
(unweighted) mean and the estimated population
(weighted) mean:
Weighted mean = sum(EARNWKE*EARNWT)/sum(EARNWT)
• This is our estimate of the population mean
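The weighted-mean formula can be sketched in Python as follows. The four observations and their weights are made up for illustration and are not the spreadsheet data; only the names EARNWKE and EARNWT come from the lecture.

```python
# Hypothetical sample: weekly earnings (EARNWKE) and sampling weights (EARNWT).
earnwke = [250.0, 400.0, 310.0, 520.0]
earnwt = [1.0, 3.0, 2.0, 2.0]

# Sample (unweighted) mean: every observation counts equally.
unweighted_mean = sum(earnwke) / len(earnwke)

# Estimated population (weighted) mean: sum(EARNWKE*EARNWT)/sum(EARNWT).
weighted_mean = sum(y * w for y, w in zip(earnwke, earnwt)) / sum(earnwt)

print(unweighted_mean, weighted_mean)
```

An observation with weight 3 stands in for three times as many people in the population as an observation with weight 1, which is how the weights undo over- or under-sampling.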
Empirical Classical Probability (3)
• So how do we construct a PDF for our spreadsheet
example?
– Pick sensible earnings bands (e.g., 10 bands of $100)
– We can pick as many bands as we want: the greater the
number of bands, the more closely the shape of the
PDF matches the ‘true population’ distribution. More bands = more
calculation!
Empirical Classical Probability (2)
• Constructing PDFs:
– Count the number of observations in each band to get
an absolute frequency
– Use weights to translate sample frequencies into
estimates of the population frequencies
– Calculate relative frequencies for each band by dividing
the absolute frequency for the band by the total
frequency
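The three steps above can be sketched in Python. The function banded_pdf, the earnings values, and the weights are all hypothetical and only illustrate the procedure; the spreadsheet itself is not reproduced here. The sketch assumes non-negative earnings and puts anything above the top band into the top band.

```python
def banded_pdf(values, weights, band_width, n_bands):
    """Return the relative (weighted) frequency of each band [k*w, (k+1)*w)."""
    totals = [0.0] * n_bands
    for y, w in zip(values, weights):
        # Band index; values beyond the last band are counted in the last band.
        k = min(int(y // band_width), n_bands - 1)
        totals[k] += w  # weighted absolute frequency
    grand_total = sum(totals)
    return [t / grand_total for t in totals]  # relative frequencies

earnings = [40.0, 120.0, 180.0, 260.0, 270.0, 330.0]  # hypothetical
weights = [2.0, 1.0, 1.0, 2.0, 1.0, 1.0]              # hypothetical

unweighted = banded_pdf(earnings, [1.0] * len(earnings), 100, 4)
weighted = banded_pdf(earnings, weights, 100, 4)
print(unweighted)
print(weighted)
```

Passing a weight of 1 for every observation gives the plain sample frequencies, so the same function covers both the weighted and the unweighted PDF.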
Empirical Classical Probability (4)
– A weighted way to approximate the PDF:
\[ \text{Avg weights} = \frac{\text{avg of weights within band}}{\text{avg of all weights}} \]
– When we have k bands, always check:
\[ \sum_k p_k = 1 \]
if the probabilities don’t sum to 1, we’ve made a
mistake!
Empirical Classical Probability (5)
• Going back to our expected value…
The expected value of Y will be: \( E(Y) = \sum_k Y_k \, p_k \)
– The \(p_k\) are relative frequencies and they can be weighted or not
– The \(Y_k\) are the earnings band midpoints (50, 150, 250,
and so on in the spreadsheet)
• From our spreadsheet example our weighted mean was
$316.63 and the unweighted mean was $317.04
– Since the sample is so large, there is little difference
between the sample (unweighted) mean and the
population (weighted) mean
Empirical Classical Probability (6)
• We can also calculate the weighted and unweighted
expected values:
E(Weighted value): $326.85
E(Unweighted value): $327.31
• Why are the expected values different from the means?
– We lose some information by grouping the wage data into
bands when calculating the expected values!
• So why would we want to weight the observations?
– With a small sample of what we think is a large
population, we might not have sampled randomly. We
use weights to make the sample more closely resemble
the population.
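A small Python sketch shows why an expected value computed from band midpoints can differ from the mean of the raw data. The five earnings figures are invented for illustration; with real data the gap may be larger or smaller.

```python
earnings = [110.0, 120.0, 130.0, 260.0, 390.0]  # hypothetical weekly earnings
band_width = 100

# Mean of the raw observations.
sample_mean = sum(earnings) / len(earnings)

# Replace each observation by the midpoint of its band (50, 150, 250, ...).
midpoints = [(y // band_width) * band_width + band_width / 2 for y in earnings]

# Each observation has relative frequency 1/n, so this equals sum_k Y_k * p_k.
banded_expected_value = sum(midpoints) / len(midpoints)

print(sample_mean, banded_expected_value)
```

Here three observations sit near the bottom of the $100 to $200 band but are all treated as $150, which pulls the banded expected value above the raw mean.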
Empirical Classical Probability (7)
• The mean is the first moment of the distribution of earnings
• We may also want to consider how variable earnings are
– we can do this by finding the variance, or standard deviation
• Calculate the variance
– In our example, the unweighted variance is:
\[ \sigma^2 = \sum_k (Y_k - \mu_Y)^2 \, p_k = 30{,}353.78 \]
– The weighted variance is 29,730.34
– The difference between the two is 623.44
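The variance formula can be sketched in Python for a banded PDF. The midpoints follow the lecture's $100 bands, but the probabilities are hypothetical, so the result will not match the spreadsheet figures.

```python
import math

midpoints = [50.0, 150.0, 250.0, 350.0]  # band midpoints Y_k
probs = [0.25, 0.25, 0.375, 0.125]       # hypothetical relative frequencies p_k

# Mean: sum_k Y_k * p_k
mean = sum(y * p for y, p in zip(midpoints, probs))

# Variance: sum_k (Y_k - mean)^2 * p_k
variance = sum((y - mean) ** 2 * p for y, p in zip(midpoints, probs))

# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)
```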
Empirical Classical Probability (8)
The weighted PDF is pink.
It’s tough to see, but the weighting scheme makes the population distribution tighter.
Empirical Classical Probability (9)
• We can use our PDF to answer:
– What is the probability that someone earns between
$300 and $400?
• But we can’t use this PDF to answer:
– What is the probability that someone earns between
$253 and $316?
• Why?
– The second question can’t be answered using our PDF
because $253 and $316 fall somewhere within the
earnings bands, not at the endpoints
Standard Normal Curve
• We need to calculate something other than our PDF, using
the sample mean, the sample variance, and an assumption
about the shape of the distribution function
• We will examine the assumption later
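• The standard normal curve (tabulated in the Z table) is used
to approximate the probability distribution of many
continuous variables. The approximation is justified for sample
means, whose distribution approaches the normal as the number of
observations approaches infinity (the central limit theorem)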
Standard Normal Curve (2)
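• The standard deviation, which measures the typical distance of
observations from the mean, is the square root of the variance:
\[ \sigma = \sqrt{\sigma^2} \]
[Figure: normal curve centered at \(\mu_y\), with the horizontal axis marked at \(\pm\sigma\), \(\pm 2\sigma\), and \(\pm 3\sigma\). Area under the curve: 68% within \(\pm\sigma\), 95% within \(\pm 2\sigma\), 99.7% within \(\pm 3\sigma\).]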
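The 68%, 95%, and 99.7% areas can be checked with Python's standard library. The helper area_within is our own name; it relies on the standard identity that the area within k standard deviations of the mean of a normal curve is erf(k/√2).

```python
import math

def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```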
Standard Normal Curve (3)
• Properties of the standard normal curve
– The curve is centered around \(\mu_y\)
– The curve reaches its highest value at \(\mu_y\) and tails off
symmetrically at both ends
– The distribution is fully described by the expected
value and the variance
• You can convert any distribution for which you have
estimates of \(\mu_y\) and \(\sigma^2\) to a standard normal distribution
Standard Normal Curve (4)
• A distribution only needs to be approximately normal for
us to convert it to the standardized normal.
• The mass of the distribution must fall in the center, but the
shape of the tails can be different
[Figure: two approximately normal curves, labeled 1 and 2, centered at \(\mu_y\).]
Standard Normal Curve (5)
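• If we want to know the probability that someone earns at
most $C, we are asking: \( P(Y \le C) = ? \)
We can rearrange terms to get:
\[ (Y - \mu) \le C - \mu \]
\[ \frac{Y - \mu}{\sigma} \le \frac{C - \mu}{\sigma} \]
\[ P(Z \le C^*) = ? \]
where \( Z = \dfrac{Y - \mu}{\sigma} \) and \( C^* = \dfrac{C - \mu}{\sigma} \)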
• Properties for the standard normal variate Z:
– It is normally distributed with a mean of zero and a
variance of 1, written in shorthand as Z~N(0,1)
Standard Normal Curve (5)
• If we have some variable Y we can assume that Y will be
normally distributed, written in shorthand as Y~N(µ,σ²)
• We can use Z to convert Y to a standard normal distribution
• Look at the Z standardized normal distribution handout
– You can calculate the area under the Z curve from the
mean of zero to the value of interest
– For example: read down the left hand column to 1.6 and
along the top row to .04, and you’ll find that the area under
the curve between Z=0 and Z=1.64 is 0.4495
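The table lookup can be reproduced in Python without the handout. This is a sketch using the standard error-function form of the normal CDF; the function names standard_normal_cdf and area_from_zero are our own.

```python
import math

def standard_normal_cdf(z):
    """P(Z <= z) for Z ~ N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_from_zero(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return standard_normal_cdf(z) - 0.5

print(round(area_from_zero(1.64), 4))
```

The Z table lists areas measured from the mean of zero, which is why 0.5 is subtracted from the cumulative probability.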
Standard Normal Curve (6)
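• Going back to our earlier question: What is the probability
that someone earns between $300 and $400 [\( P(300 \le Y \le 400) \)]?
\[ \mu = 316.6, \quad \sigma^2 = 25608, \quad \sigma = \sqrt{25608} \approx 160 \]
\[ Z_{300} = \frac{300 - 316.6}{160} = -0.104 \]
\[ Z_{400} = \frac{400 - 316.6}{160} = 0.52 \]
\[ P(-0.104 \le Z \le 0) = 0.0418 \]
\[ P(0 \le Z \le 0.52) = 0.1985 \]
\[ P(-0.104 \le Z \le 0.52) = 0.0418 + 0.1985 = 0.2403 \]
[Figure: normal curve centered at 316.6, with the area between 300 (\(Z_1\)) and 400 (\(Z_2\)) marked as \( P(300 \le Y \le 400) \).]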
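The same calculation can be sketched in Python, using the mean of 316.6 and variance of 25608 from the example. The helper normal_cdf is our own, and the code assumes earnings are normally distributed, as the lecture does. Because it does not round Z or read from a printed table, the result is close to, but not exactly, the 0.2403 obtained by hand.

```python
import math

def normal_cdf(y, mu, sigma):
    """P(Y <= y) for Y ~ N(mu, sigma^2), via the standardized value Z."""
    z = (y - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu = 316.6
sigma = math.sqrt(25608)  # roughly 160

z_300 = (300 - mu) / sigma
z_400 = (400 - mu) / sigma

# P(300 <= Y <= 400) = P(Y <= 400) - P(Y <= 300)
prob = normal_cdf(400, mu, sigma) - normal_cdf(300, mu, sigma)
print(round(z_300, 3), round(z_400, 3), round(prob, 4))
```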
What we’ve done
• ‘A priori’ classical probability
– There are a finite number of possible outcomes
– Flipping coins example
• Empirical classical probability
– There are an infinite number of possible outcomes
– Difference between sample and population means
– Difference between sample and population expected
values
– Difference in calculating PDFs of a univariate
population.
• Use of standard normal distribution.