Lesson 1:
Analysis of Economic Data
is difficult but intuitive
Ka-fu Wong © 2007
ECON1003: Analysis of Economic Data
Lesson1-1
Outline
Capture-Recapture experiment
Estimator
Simulations
What is Statistics?
Sampling
How to estimate unemployment rate?
Capture/Re-capture
Goals:
1. Illustrate how to estimate the population size when
the cost of counting all individuals is prohibitive.
2. Illustrate how intuitive statistics could be. Statistics need
not be completely deep, murky, and mysterious. Our
common sense can help us to negotiate our way through
the course.
Counting the stones
 We are interested in knowing the number of black stones
in the box.
 We only need to obtain a reasonable estimate of the number of
stones in the box, allowing for errors of counting or
estimation.
Two examples
 Example #1: The box contains only a small number of
stones.
Two examples
 Example #2: The box contains a lot of stones that will take
days to count.
History and examples of
capture / recapture method
 Capture-recapture methods were originally developed in
wildlife biology to monitor bird, fish, and insect
populations, where counting all individuals is prohibitive.
More recently, these methods have been used extensively in
disease and event monitoring.
 http://www.pitt.edu/~yuc2/cr/history.htm
The fish example
 Estimating the number (N) of fish in a lake or pond.
1. C fish are caught, tagged, and returned to the lake.
2. Later on, R fish are caught and checked for tags. Say T
of them have tags.
3. The numbers C, R, and T are used to estimate the fish
population.
Stones in a box
 The objective is to estimate the number (N) of
fish (represented by black stones) in a pond.
 Capture one handful of fish (black stones).
Count them and call the count C. Mark the fish by
replacing the black stones with white stones. Put
them back into the pond.
 Capture another handful of fish (stones). Count
the total number of fish or stones (R) and the
number of marked fish or white stones (T).
 Based on this information, how can we obtain a
reasonable estimate of the number of fish in the
pond or stones in the box?
Stones in a box
 If the marked stones are well mixed, their share in the box is
close to their share in the second handful: C/N ≈ T/R
 Hence, a simple estimate is
N=CR/T
 C= the number of fish or stones captured in
the first round.
 R= the total number of fish or stones captured
in the second round.
 T= the number of marked fish or white stones
captured in the second round.
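The step from the proportion to the estimate can be written out explicitly. This is only a sketch of the algebra; the hat on N is notation added here to separate the estimate from the true N, and the condition T > 0 is stated because the ratio is undefined otherwise.

```latex
\frac{C}{N} \approx \frac{T}{R}
\quad\Longrightarrow\quad
N \approx \frac{C\,R}{T}
\quad\Longrightarrow\quad
\hat{N} = \frac{C\,R}{T}, \qquad T > 0 .
```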
Stones in a box
N=CR/T is called an estimator.
Give me any box of stones, and I can use the same “method” or
“procedure” or “formula” to estimate the number of stones.
Give me a specific box of stones, and I can give you an estimate
of the number of stones in that box (using the estimator
N=CR/T). For example, I might estimate that there are 510
stones in the box. “510 stones” is an estimate of the number of
stones in the box.
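The distinction between an estimator (a rule) and an estimate (a number) can be sketched in Python. The function name and the counts C=60, R=34, T=4 are hypothetical, chosen only so that the result matches the 510 mentioned above; the slides do not say which counts produced that figure.

```python
def estimate_population(c: int, r: int, t: int) -> float:
    """The estimator: a rule N = C*R/T that works for any box of stones."""
    if t <= 0:
        raise ValueError("T must be positive: no marked stones were recaptured")
    return c * r / t

# The estimate: the number the rule returns for one specific box.
# These counts are hypothetical, picked so that the estimate is 510.
estimate = estimate_population(c=60, r=34, t=4)
print(estimate)  # 510.0
```

The function plays the role of the estimator, and the printed value plays the role of the estimate.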
Simulations to see the properties of this
proposed estimator
 How good is the proposed estimator?
 To see the properties of this proposed estimator, I have
used MATLAB to simulate our capture-recapture
experiment with different numbers of captured fish (C) and
different numbers of recaptured fish (R), relative to the total
number of fish in the pond.
 Throughout,
 N=500 and
 1000 simulations
That is, I will give you 1000 boxes of stones, each having 500
stones. I am not telling you the number of stones in each box.
You will have to produce 1000 estimates of stones in these
1000 boxes. I would like to see how good your estimator
(“method” or “procedure” or “formula”) is.
Definition: Estimator
 An estimator is a formula or a rule that takes a set of data and
returns an estimate of the population quantity (also known
as the population parameter) we are interested in.
θ(x1,x2,...,xn)
Example: An estimator for the population mean
 If we are interested in the population mean, a very
intuitive estimator of the population mean based on a
sample (x1,x2,...,xn) is
θ(x1,x2,...,xn)= (x1+x2+...+xn)/n
 Suppose someone suggests instead
θ(x1,x2,...,xn)= (x1+x2+...+xn + 1)/n
(that is, 1 is added to the sum before dividing by n).
Simulating the properties of a sample
mean estimator
 Suppose we want to study the properties of the following two
estimators of the population mean:
θ(x1,x2,...,xn)= (x1+x2+...+xn)/n
versus
θ(x1,x2,...,xn)= (x1+x2+...+xn + 1)/n
 With some basic computing skills, we can perform Monte
Carlo simulations to compare their properties.
Simulating the properties of a sample mean estimator
1. We will need to define a population. Suppose the
population consists of 10 balls numbered from 1 to 10 in a
bag. We know that the population mean is
(1+2+3+4+5+6+7+8+9+10)/10 = 5.5.
2. We will need to define the sampling process. Suppose we
draw a sample of size 5 with replacement. For the sample,
compute the two sample mean estimates of the population
mean.
3. We will need to decide on the number of repetitions.
Suppose we repeat the process 10,000 times.
4. After repeating the sampling process 10,000 times, we will
have 10,000 sample means for each estimator, each
of which is an estimate of the population mean based on the
respective sample.
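The four steps above can be sketched in Python (the slides used MATLAB; this is not that code). The sketch assumes the second estimator adds 1 to the sum before dividing by n. The function name, the seed, and the variable names are illustrative choices.

```python
import random
import statistics

def simulate_mean_estimators(reps=10_000, n=5, seed=0):
    """Return the average of each estimator over `reps` samples."""
    rng = random.Random(seed)
    population = list(range(1, 11))   # step 1: balls numbered 1..10, mean 5.5
    plain, plus_one = [], []
    for _ in range(reps):             # step 3: repeat the process
        sample = rng.choices(population, k=n)   # step 2: n draws with replacement
        total = sum(sample)
        plain.append(total / n)                 # (x1 + ... + xn) / n
        plus_one.append((total + 1) / n)        # (x1 + ... + xn + 1) / n (assumed reading)
    # step 4: summarise the two sets of sample means
    return statistics.mean(plain), statistics.mean(plus_one)

mean_plain, mean_plus_one = simulate_mean_estimators()
print(round(mean_plain, 4), round(mean_plus_one, 4))
```

Because the random draws differ from the original run, the printed values will not match the slides digit for digit, but the first should land near 5.5 and the second about 0.2 higher.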
Simulating the properties of a sample
mean estimator
 The above simulation is performed using MATLAB. The
means of the 10,000 sample means of the two estimators
are
 5.4990 and
 5.6990.
 The first estimator appears unbiased. That is, on average,
the estimator correctly estimates the population mean.
 5.4990 is very close to 5.5.
 The second estimator appears biased. That is, on average,
the estimator does not correctly estimate the population
mean.
 5.6990 is not close to 5.5.
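The size of the gap can be checked by hand. The following sketch assumes the second estimator adds 1 to the sum before dividing by n, with n = 5 and population mean 5.5 as in the simulation design; the symbols μ, θ̂₁ and θ̂₂ are notation introduced here.

```latex
E[\hat\theta_1] = \frac{1}{n}\sum_{i=1}^{n} E[x_i] = \mu = 5.5,
\qquad
E[\hat\theta_2] = \frac{1}{n}\Bigl(\sum_{i=1}^{n} E[x_i] + 1\Bigr)
               = \mu + \frac{1}{n} = 5.5 + 0.2 = 5.7 .
```

Under that reading, the simulated means 5.4990 and 5.6990 sit close to these two expected values, and the bias of the second estimator is 1/n.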
Which estimator is more desirable?
The estimator with the purple distribution is preferred to
the blue one because although both appear unbiased, the
blue one is less precise, i.e., more likely to yield an
estimate that is far from the truth.
The estimator with the blue distribution is
preferred to the green one because the green
one appears to have a mean different from the
truth but the blue one does not.
[Figure: distributions of the purple, blue, and green estimators]
Simulation design – via MATLAB
 Individual simulation experiment:
 Create 500 “black” fish, labelled 1 to 500.
 Capture a random sample of C fish, mark them by
converting their label to zero (i.e., red fish).
 Capture another random sample of R fish. Count the
number of marked fish in the sample. Call it T.
 Compute the estimate as CR/T.
If T=0, the estimate CR/T is undefined. Experiments with T=0
are dropped.
 Repeat this experiment 1000 times. Hence, we have 1000
estimates.
 Compute the mean and standard deviation of these 1000
estimates.
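The design above can be sketched in Python (the slides used MATLAB; this is not the author's code). The function name, the seed, and the default C and R are illustrative assumptions. Sampling is without replacement in both rounds, which is one reading of "capture a random sample".

```python
import random
import statistics

def simulate_capture_recapture(n=500, c=120, r=120, reps=1000, seed=0):
    """Return (S, mean, std) of the CR/T estimates over `reps` experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        fish = list(range(1, n + 1))        # n "black" fish labelled 1..n
        for i in rng.sample(range(n), c):   # capture C fish at random...
            fish[i] = 0                     # ...and mark them with label 0
        recaptured = rng.sample(fish, r)    # recapture R fish at random
        t = recaptured.count(0)             # marked fish in the recapture
        if t == 0:
            continue                        # CR/T is undefined: drop this run
        estimates.append(c * r / t)
    s = len(estimates)                      # runs with at least one marked fish
    if s == 0:
        return 0, float("nan"), float("nan")
    std = statistics.stdev(estimates) if s > 1 else 0.0
    return s, statistics.mean(estimates), std

print(simulate_capture_recapture(c=120, r=120))
```

The sketch uses its own random draws, so its output should not be expected to reproduce the slides' figures exactly.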
Properties of our estimator
Increasing C and R
N     C     R     S      Mean     Std
500   40    40    971    640.76   401.57
500   60    60    1000   579.22   321.54
500   80    80    1000   533.61   154.67
500   100   100   1000   522.85   104.29
500   120   120   1000   513.82   77.41
500   140   140   1000   507.04   60.98
500   250   250   1000   500.64   22.93
500   500   500   1000   500.00   0.00
•N = Total number of fish in the pond.
•C = number of captured fish.
•R = number of re-captured fish.
•S = number of simulations with at least one marked fish in the recapture.
Properties of our estimator
Constant C and increasing R
N     C     R     S      Mean     Std
500   120   40    1000   507.86   75.07
500   120   60    1000   513.40   79.55
500   120   80    1000   508.19   73.56
500   120   100   1000   511.24   74.55
500   120   120   1000   510.93   75.41
500   120   140   1000   511.21   75.63
500   120   250   1000   510.49   74.04
500   120   500   1000   507.47   77.32
•N = Total number of fish in the pond.
•C = number of captured fish.
•R = number of re-captured fish.
•S = number of simulations with at least one marked fish in the recapture.
Properties of our estimator
Increasing C and constant R
N     C     R     S      Mean     Std
500   40    120   961    646.59   405.72
500   60    120   1000   582.17   327.97
500   80    120   1000   533.28   142.23
500   100   120   1000   512.28   95.40
500   120   120   1000   508.78   78.75
500   140   120   1000   507.50   60.61
500   250   120   1000   500.86   22.38
500   500   120   1000   500.00   0.00
•N = Total number of fish in the pond.
•C = number of captured fish.
•R = number of re-captured fish.
•S = number of simulations with at least one marked fish in the recapture.
Conclusion from the simulations
 The proposed estimator generally overestimates the
number of fish in the pond, i.e., the estimate is larger than the true
number of fish in the pond.
 That is, there is a bias.
 Holding R constant, increasing the number of capture (C)
helps:
 Bias is reduced, i.e., Mean is closer to the true
population
 The estimator is more precise, i.e., standard deviation of
the estimator is smaller.
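 Holding C constant, increasing the number of recapture (R)
does not appear to help in the tables above:
 Bias is more or less unchanged.
 The precision of the estimator is more or less
unchanged.
 Note, however, that T has the same distribution when C and R
are swapped, so in theory CR/T should benefit from a larger R
as much as from a larger C.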
Additional issues
 Our proposed estimator is good enough, but it can be improved.
Alternative estimators have been developed to reduce or
eliminate the bias in estimating N.
 For instance, Seber (1982, p.60) suggests the following
estimator of N:
(C+1)(R+1)/(T+1) – 1
(Note that our proposed formula is CR/T.)
Seber, G. (1982): The Estimation of Animal Abundance and Related
Parameters, second edition, Charles.
Simulations to see the properties of this
modified estimator
 How good is the modified estimator?
 To see the properties of this modified estimator, we repeat
the above simulation exercise with this new formula.
(C+1)(R+1)/(T+1) – 1
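The modified formula can be written as a small Python function to see how it behaves next to CR/T. The function name and the example counts are hypothetical; only the formula itself comes from the slide.

```python
def modified_estimate(c: int, r: int, t: int) -> float:
    """Modified estimator (C+1)(R+1)/(T+1) - 1; defined even when T = 0."""
    return (c + 1) * (r + 1) / (t + 1) - 1

# Hypothetical counts: the modified value is a little below CR/T = 480.0.
print(modified_estimate(120, 120, 30), 120 * 120 / 30)

# With T = 0 the original CR/T is undefined, but this formula still returns a number.
print(modified_estimate(40, 40, 0))  # 1680.0
```

One consequence worth noting is that runs with T = 0 need not be dropped when this formula is used.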
Properties of modified estimator
Increasing C and R
N     C     R     S      Mean     Std
500   40    40    1000   488.60   271.05
500   60    60    1000   504.39   202.16
500   80    80    1000   498.88   121.47
500   100   100   1000   501.72   91.20
500   120   120   1000   498.10   72.01
500   140   140   1000   501.14   58.44
500   250   250   1000   498.60   21.72
500   500   500   1000   500.00   0.00
•N = Total number of fish in the pond.
•C = number of captured fish.
•R = number of re-captured fish.
•S = number of simulations with non-zero marked fish in the recapture.
Properties of modified estimator
Constant C and increasing R
N     C     R     S      Mean     Std
500   120   40    1000   498.55   67.38
500   120   60    1000   500.05   71.54
500   120   80    1000   495.58   69.22
500   120   100   1000   497.01   71.14
500   120   120   1000   498.45   71.05
500   120   140   1000   495.17   67.46
500   120   250   1000   500.41   75.29
500   120   500   1000   496.73   74.27
•N = Total number of fish in the pond.
•C = number of captured fish.
•R = number of re-captured fish.
•S = number of simulations with non-zero marked fish in the recapture.
Properties of modified estimator
Increasing C and constant R
N     C     R     S      Mean     Std
500   40    120   1000   491.84   291.00
500   60    120   1000   499.33   216.81
500   80    120   1000   496.51   117.05
500   100   120   1000   493.50   87.53
500   120   120   1000   503.24   73.65
500   140   120   1000   498.59   56.30
500   250   120   1000   499.76   22.58
500   500   120   1000   500.00   0.00
•N = Total number of fish in the pond.
•C = number of captured fish.
•R = number of re-captured fish.
•S = number of simulations with non-zero marked fish in the recapture.
Conclusion from the simulations
 The modified estimator performs better than the original
estimator.
 There is no apparent bias.
 The estimator is more precise.
 Holding R constant, increasing the number of capture (C)
helps:
 The estimator is more precise, i.e., standard deviation of
the estimator is smaller.
 Holding C constant, increasing the number of recapture (R)
does not appear to help in the tables above:
 The precision of the estimator is more or less unchanged.
What is Meant by Statistics?
Statistics is the science of
1. collecting,
2. organizing,
3. presenting,
4. analyzing, and
5. interpreting numerical data
to assist in making more effective decisions.
Who Uses Statistics?
Statistical techniques are used extensively by
 economists,
 marketers,
 accountants,
 quality control staff,
 consumers,
 professional sports people,
 hospital administrators,
 educators,
 politicians,
 physicians, etc...
Who Uses Statistics?
 As economists,
 We must verify our models with data.
 We need to provide forecasts of the economy (e.g., GDP
growth).
 We need quantitative estimates of
How individual decisions are influenced by policy
variables (such as unemployment benefits,
education subsidy) in order to forecast the impact
of public policies.
How macro policies (government expenditure) will
affect output.
Who Uses Statistics?
 In the business community,
 managers must make decisions based on what will
happen to such things as
demand,
costs, and
profits.
 These decisions are an effort to shape the future of the
organization.
 If the managers make no effort to look at the past and
extrapolate into the future, the likelihood of achieving
success is slim.
Why do we need to understand Statistics?
 We are constantly deluged with statistics in the media
(newspapers, magazines, journals, text books, etc.).
 We need to have a means to condense large quantities of
information into a few facts or figures.
 We need to predict what will likely occur given what has
occurred in the past.
 We need to generalize what we have learned in specific
situations to the more general case.
We are users of statistics
 We do not want to become professors of statistics.
 We do not want to develop advanced statistical theory.
 We are users of statistics.
 To be effective users, we need to have a good grasp of
basic statistical theory.
 We need to practice using the tools.
 This course will give you the basics, enough for you to
move on to your next Econometrics class.
Populations and Samples
 A population is a collection of all possible individuals,
objects, or measurements of interest.
 A sample is a portion, or part, of the population of interest.
Sampling a Population of Existing Units
 Random Sampling
 A procedure for selecting a subset of the population
units in such a way that every unit in the population
has an equal chance of selection
 Sampling with replacement
 When a unit is selected as part of the sample, its
value is recorded and placed back into the
population for possible reselection
 Sampling without replacement
 Units are not placed back into the population after
selection
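The difference between the two sampling schemes can be sketched with Python's standard `random` module. The population of ten numbered units and the seed are hypothetical.

```python
import random

rng = random.Random(0)
population = list(range(1, 11))   # hypothetical population: units 1..10

# With replacement: a unit goes back after selection, so it may repeat.
with_replacement = rng.choices(population, k=5)

# Without replacement: a selected unit is not put back, so no repeats.
without_replacement = rng.sample(population, k=5)

print(with_replacement, without_replacement)
```

In both cases every unit has the same chance of being picked on each draw from whatever remains in the population.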
Approximate Random Samples
Frame
A list of all population units. Required for random
sampling, but not for approximate random sampling
methods like systematic and voluntary response
sampling.
Systematic Sample
Every k-th element of the population is selected for
the sample
Voluntary Response Sample
Sample units are self-selected (as in radio/TV
surveys)
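A systematic sample can be sketched in a few lines of Python. The random starting position within the first k units is a common convention assumed here, not something the slide specifies; the function name and the 100-unit list are hypothetical.

```python
import random

def systematic_sample(units, k, seed=0):
    """Take every k-th unit, starting from a random position among the first k."""
    start = random.Random(seed).randrange(k)   # random start is an assumption
    return units[start::k]

units = list(range(1, 101))        # hypothetical ordered list of 100 units
sample = systematic_sample(units, k=10)
print(sample)
```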
How to estimate the unemployment rate
First, survey a large number of individuals (say, 1000).
 Are you aged 15 or over? If not, you are definitely not in the
labor force.
 If you are aged 15 or over,
 Have you worked for pay or profit during the seven days
before enumeration, or do you have a formal job attachment?
 If yes, you are counted as employed.
 If not employed,
Have you been available for work during the seven
days before enumeration? And
have you sought work during the 30 days before
enumeration?
If yes to both questions, you are counted as
unemployed.
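The survey questions above amount to a small decision rule, sketched here in Python. The function and parameter names are hypothetical. Treating a respondent who is not employed and does not answer yes to both follow-up questions as "not in labor force" is an assumption; the slide does not state that case.

```python
def classify(age, worked_or_attached, available, sought_work):
    """Return 'employed', 'unemployed', or 'not in labor force'."""
    if age < 15:
        return "not in labor force"     # under 15: never in the labor force
    if worked_or_attached:
        return "employed"               # worked for pay/profit or has a job attachment
    if available and sought_work:
        return "unemployed"             # yes to both follow-up questions
    return "not in labor force"         # assumed outcome for all other cases

print(classify(30, False, True, True))  # unemployed
```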
How to estimate the unemployment rate
 The unemployment rate is computed as
#unemployed/ (#unemployed + #employed)
 Note that the estimate of the unemployment rate is based
on a random subset (which we call a sample) of the
individuals of an economy -- not all individuals in an
economy.
Simulate to understand the estimator:
An estimation of the unemployment rate
 The process of estimating the unemployment rate may be
simulated at home or in a classroom with a bag of black
and white stones (as in a game of Go).
 Suppose black stones stand for unemployed and white
stones stand for employed individuals. A random selection
of 20 individuals is like randomly grabbing 20 stones from
the bag.
 We ask each selected individual whether they are white
(employed) or black (unemployed). The unemployment
rate may be computed using the formula
#black / (#black + #white)
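The stone-bag exercise can be sketched in Python. The bag composition (10 black and 190 white stones, a true rate of 5%), the function name, and the seed are hypothetical. The rate is computed as the number of unemployed divided by the number of unemployed plus employed in the sample.

```python
import random

def estimate_unemployment_rate(bag, sample_size=20, seed=0):
    """Grab `sample_size` stones and return black / (black + white)."""
    rng = random.Random(seed)
    sample = rng.sample(bag, sample_size)   # grab stones without replacement
    unemployed = sample.count("black")      # black stones: unemployed
    employed = sample.count("white")        # white stones: employed
    return unemployed / (unemployed + employed)

bag = ["black"] * 10 + ["white"] * 190      # hypothetical bag: true rate 5%
print(estimate_unemployment_rate(bag))
```

Changing the seed gives a different handful and usually a different estimate, which is the sampling variability the exercise is meant to show.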
What to take away today
 Statistics is difficult but can be intuitive.
 Statistics need not be completely deep, murky, and
mysterious.
 Our common sense can help us to negotiate our way
through the course.
Lesson 1:
Analysis of Economic Data is difficult
but intuitive
- END -