SGPE QM Lab 3: Monte Carlos
Mark Schaffer
version of 4.10.2010
Introduction
This lab introduces you to a useful practical tool in statistics and econometrics: Monte
Carlo simulations.
In a Monte Carlo (MC) simulation, we look at the simulated performance of an
estimator or test statistic under various scenarios. The structure of a typical Monte
Carlo exercise is as follows:
1. Specify the “data generation process” (DGP). These are the assumptions that
you make about where the data come from and what their properties are.
2. Choose a sample size N for your MC simulation.
3. Choose the number of times you will repeat your MC simulation. 10,000 is
traditional, but while debugging your code you might choose a much smaller
number, e.g., 100.
4. Generate a random sample of size N based on your DGP.
5. Using the random sample generated in (4), calculate the statistics of interest.
These might be parameter estimates, statistics for tests of hypotheses
involving these estimated parameters, specification tests, or whatever. Save
these.
6. Go back to (4) and repeat (4)-(5) until you have done it 10,000 times.
7. Examine your 10,000 parameter estimates, test statistics, etc. and see what
conclusions you reach.
Specify the DGP
Stata has functions that will generate random numbers according to various
probability distributions; see help functions in Stata’s on-line help.
For example, say you want to examine the behaviour of the OLS estimator in a simple
bivariate estimation with a sample size of 100. Suppose u is an error term drawn from the
normal distribution with mean=0 and standard deviation=2, x is an explanatory
variable randomly drawn from a uniform distribution over [0,1] and uncorrelated with
u, the parameters α and β equal 1 and 2 respectively, and y = α + βx + u = 1 + 2x + u.
You would code this in Stata as follows:
drop _all
set obs 100
gen u = rnormal(0, 2)
gen x = runiform()
gen y = 1 + 2*x + u
And then you would run the regression
reg y x
Comments on the above:
We start by dropping any variables that happen to be in memory. (Thinking ahead –
we are going to be replicating this process 10,000 times and we always want to start
with a clean slate.)
“set obs 100” tells Stata that the dataset we will create will have 100 observations.
The function rnormal() takes the mean and SD as its arguments. When no
arguments are provided, it returns a draw from the standard normal, i.e., rnormal() is
equivalent to rnormal(0,1).
The function runiform() returns a value drawn from the uniform probability
distribution. If you wanted a random variable uniformly distributed over, say, [-2,2],
you would say 4*runiform()-2.
The random numbers Stata generates are “pseudo-random” numbers. You can replicate the
sequence of random numbers generated by choosing the “seed”, which is useful for
replicability. Pseudo-random numbers are fine for MC simulations, but you wouldn’t
want to use them for encryption purposes!
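For example, putting the following at the top of a do file pins down the sequence of draws that follows (the seed value 12345 here is arbitrary; any integer will do):

```stata
* Setting the seed makes every pseudo-random draw after this point reproducible
set seed 12345
display runiform()
```

Running the do file again from the top will produce exactly the same draw.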
Task 1
Open up Stata and the do-file editor. Enter the code above. Run the code 10 times,
each time making a note of the estimated value of the coefficient on x and the
intercept. Draw some conclusions.
Next, change the DGP and repeat the exercise. Options for changing the DGP:
- Increase or decrease the variance of the error u.
- Increase or decrease the variance of the explanatory variable x.
- Use a different probability distribution for x or u. See help functions for options.
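As one illustrative variation (not the only option), here is the DGP from above with the SD of the error doubled from 2 to 4, everything else unchanged:

```stata
* Same bivariate DGP as before, but with a noisier error term
drop _all
set obs 100
gen u = rnormal(0, 4)   // SD raised from 2 to 4
gen x = runiform()
gen y = 1 + 2*x + u
reg y x
```

You should find that the estimated coefficients bounce around their true values of 1 and 2 more than before.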
Estimating the mean
In this lab, we’ll be working with one of the simplest statistics imaginable: the sample
mean. A quick review:
We have a sample of size N. We have observations on a variable x. We calculate the
sample mean of x. Call this sample mean x̄, i.e.,

x̄ = (1/N) Σ_{i=1}^{N} x_i
We assume that the population from which our sample is drawn has a finite mean μ
and finite variance σ².
As is always the case in practice, we don’t actually know the true values of μ and σ².
The natural thing to do is to use the sample mean x̄ to estimate the population mean
μ, and similarly we use the sample variance σ̂² to estimate the population variance of
x, where

σ̂² = (1/(N−1)) Σ_{i=1}^{N} (x_i − x̄)²
(This is the traditional formula for calculating the variance and standard deviation,
and is reported by Stata after the summarize command. Note the “finite sample”
correction in the division by N-1. Asymptotically our results would be the same if we
didn’t use this adjustment and simply divided by N.)
The sample mean x̄ is a statistic; it is a function of the sample. Our statistic has a
distribution, and we use the knowledge of that distribution to perform inference, e.g.,
test hypotheses about μ.
Under the assumptions above, the sample mean is an unbiased estimator of the
population mean:

E(x̄) = μ
This is a “finite-sample” result. It doesn’t require the sample size N going off to
infinity.
The next set of results does require the sample size going off to infinity, i.e., these are
asymptotic approximations that rely on the Central Limit Theorem. The distribution
of the sample mean is approximately:

x̄ ~ N(μ, σ̂²/N)

and if we define the test statistic Z to be

Z = (x̄ − μ) / SE(x̄)

where the SE of the sample mean is SE(x̄) = σ̂/√N, then under the null hypothesis that the
population mean is indeed μ,

Z ~ N(0,1)
We can use the test statistic Z to test hypotheses about the mean.
These are asymptotic results, i.e., they are approximations that are true in the limit
as the sample size N approaches infinity. In finite samples they won’t be exactly
right, and in some cases they may be very poor approximations indeed.
If x is itself normally distributed, more precise (finite sample) results are available,
but we won’t be making use of these.
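To make this concrete, here is a sketch of computing Z by hand for a single sample, testing H0: μ=1 against a DGP whose true mean really is 1 (the numbers will differ from run to run):

```stata
* One sample of N=100 draws; compute Z = (xbar - mu0)/(sigma-hat/sqrt(N))
drop _all
set obs 100
gen x = rnormal(1, 2)
sum x
display "Z = " (r(mean) - 1) / (r(sd)/sqrt(r(N)))
```

Most of the time this Z should land between -1.96 and 1.96; the Monte Carlo below makes that statement precise.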
Monte Carlos and simulations in Stata
We will use Stata to generate 10,000 random samples according to a DGP that we
specify. In each random sample, we will calculate the sample mean x̄ and sample
standard deviation σ̂. After we have collected the results, we will have a new Stata
dataset, consisting of 10,000 observations, where each observation has an x̄ and a σ̂.
We can then look at the distribution of x̄, calculate a test statistic Z and look at its
distribution, and various other things.
To do this in Stata we make use of the simulate command. The following is a
simple use of simulate:
simulate m=r(m), reps(1000) : mysim
The option reps(1000) is easy to understand – it means repeat the MC exercise 1,000
times. We will typically debug our program with 100 repetitions and then ask for
10,000 repetitions to get serious results.
The key to the rest is what follows the “:”. mysim is a Stata program that we have to
write. (It can be called anything, by the way, but “mysim” is easy to remember.) The
program will do what is in Steps (4)-(5) in the Introduction: generate a random sample, calculate
statistics, and return them to Stata. Specifically, in the example above, Stata will call
mysim 1,000 times, and each time mysim will return a statistic “m”. Stata will save
each of these, so that when simulate is done running, the dataset in memory will
have one variable “m” with 1,000 values. Each of those is a value from a call to
mysim.
mysim will be easiest to work with if it is a bit flexible. It is possible to write mysim
so that it takes its own options, but the easiest thing to do is to use global macros. In
this lab, we will write mysim so that it looks for the number of observations N in a
global macro called $obs.
Here is the version of mysim we will start with:
program define mysim, rclass
    drop _all
    set obs $obs
    gen x = rnormal(1,2)
    sum x
    return scalar m = r(mean)
end
“rclass” means mysim will save its stored results in r() macros, like other rclass
commands such as summarize. (Look again at the call to simulate and note the use
of r(mean).)
“drop _all” drops any variables in memory (remember – start with a clean slate).
“set obs $obs” tells Stata to set the number of observations to whatever is in the global
macro $obs.
The next two lines generate a random variable from the normal distribution with
mean=1 and SD=2, and then summarize it so that the sample mean is available. The
last line of the program tells mysim to store the sample mean in r(m) so that
simulate can access it. The “end” command ends the program.
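Before handing mysim over to simulate, it can be worth checking it by hand. One quick sanity check (illustrative):

```stata
* Call mysim once interactively and inspect its stored result
global obs 25
mysim
return list          // should list the scalar r(m), one sample mean
```

If return list does not show r(m), simulate will have nothing to collect.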
Task 2: A simple Monte Carlo
Conduct a simple MC using the example above. In a new do file, insert the following:
capture program drop mysim
program define mysim, rclass
    drop _all
    set obs $obs
    gen x = rnormal(1,2)
    sum x
    return scalar m = r(mean)
end
global obs 25
simulate m=r(m), reps(100) : mysim
Note the additional line at the top. This tells Stata to drop any existing version of
mysim before defining a new one. If we tried to define a new one when one already
existed, we would get an error. The capture trick is standard (see Lab 2). Also note
the line defining the global “obs”.
Save and execute the do file, first with 100 repetitions (as above), to ensure it works,
and then with 10,000 repetitions.
You will have a dataset with 10,000 observations and one variable called m. Each
observation is a sample mean x from one replication.
Summarize m using the summarize command with the detail option.
Plot m using the histogram command, and overlay a normal distribution:
hist m , normal
By default, hist produces a histogram with a density on the vertical axis. This is fine
for now.
You will compare the distribution of m when N=25 with the distribution when the
sample size is something else. Put the following in your do file:
hist m, bin(20) normal name(n25, replace)
This creates a “named” graph in memory called n25. The “replace” option means
overwrite any existing graph in memory with that name. The “bin(20)” option means
force Stata to use 20 bars (bins).
Now add to your do file lines to create and graph m when the sample size is 10, and
when the sample size is 100, e.g.,
global obs = 10
simulate m=r(m), reps(10000): mysim
sum m, detail
hist m, bin(20) normal name(n10, replace)
Finally, combine the three graphs using the graph combine command:
graph combine n10 n25 n100, xcommon ycommon col(1)
“xcommon” and “ycommon” force Stata to use the same scaling for all the X and Y
axes. “col(1)” means put them in one column. (If you want to see what it looks like
without this, or with “row(1)”, try it.)
What do you conclude from an intraocular test of the three graphs? (Intraocular = “it
hits you between the eyes”.)
Size, power, and Type I and Type II errors
Quick review:
Type I error: Incorrectly rejecting the null when null is actually true. The probability
of a Type I error is often denoted by α. In hypothesis testing, α is the “significance
level” or “size” of the test.
Type II error: Incorrectly not rejecting the null when the null is actually wrong. The
probability of a Type II error is often denoted by β. In hypothesis testing, (1-β) is the
“power” of the test.
In empirical work, we want to use tests that have good size properties. In other
words, if we choose 5% as our significance level, we are saying that we are willing to
incorrectly reject the null 5% of the time. We want the test that we use to actually
behave this way. An example of a test statistic with poor size properties would be one
where we choose a nominal size of 5%, but we actually incorrectly reject the null 40%
of the time.
We also want to use tests that have statistical power, i.e., that are good at rejecting the
null when it’s wrong. For example, if we have two estimators that are both unbiased,
but one has a standard error that is smaller than the other, it will be more powerful
and, everything else equal, we will want to use it in preference to the one with the
large SE.
A Monte Carlo simulation is one way of examining the size and power properties of
tests and estimations.
Task 3: The size properties of tests of the sample mean
In this task, we augment the MC simulation so that we can calculate the test statistic Z
(see above). We then examine whether the test has good size properties. Note: the
answer is not a foregone conclusion! Remember – we are relying on the Central
Limit Theorem, which is an asymptotic approximation. If the sample size is small,
and/or the original DGP is “not very normal”, the approximation could be rather poor.
Your DGP should be to generate a normal random variable with mean=1 and SD=2
for a sample size of N=25. Add the following at the end of your mysim program, just
after the command that returns the mean m:
return scalar se = r(sd)/sqrt($obs)
And change your call to simulate so it looks like this:
simulate m=r(m) se=r(se), reps(100) : mysim
Save and run your do file with reps=100 to confirm it works, then run it again with
reps=10,000 to get your results.
You now have a dataset with 2 variables, m and se. To examine the size properties of
tests using the asymptotic approximation given by the Central Limit Theorem, we
choose a significance level of 5% and ask what happens if we test the null hypothesis
H0: μ=1. (Remember, in our DGP, the true mean is indeed 1.)
Add the following to your do file:
gen z = (m-1)/se
If z is standard normal, as the CLT approximation predicts, then for how many
observations should z be > -1.96? How many times should z be < 1.96? How many
times should z be between -1.96 and 1.96? What do you actually find in the data?
(Hint: use the count command; see help count for examples. You may want to add
this to your do file.)
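A sketch of the counting step, assuming the variable z has been generated as above:

```stata
* How often does z fall in each region? (count reports the number of obs)
count if z < -1.96
count if z > 1.96
count if z >= -1.96 & z <= 1.96
```

With 10,000 replications and a well-sized test, the first two counts should each be near 250 and the last near 9,500.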
Next, generalize this graphically by looking at the p-values for the test statistic z,
assuming that z is indeed standard normal. Add the following to your do file:
gen p = normal(z)
If z is standard normal, then what should the distribution of p look like? That is, what
should p look like if the z test statistic has good size properties?
Graph the distribution of p. Use the “percent” option to ease interpretation. Save the
graph as a named graph called “size25”:
hist p , bin(20) percent name(size25, replace)
Repeat the exercise but for a sample size of N=10 by adding the required code to the
bottom of your do file:
drop _all
global obs = 10
simulate m=r(m) se=r(se) , reps(10000): mysim
sum m, detail
gen z = (m-1)/se
gen p = normal(z)
hist p , bin(20) percent name(size10, replace)
Do you see any signs of size distortion now?
Now change the DGP in mysim so that instead of a normally distributed random
variable with mean=1 and SD=2, it’s a two-point (Bernoulli-type) random variable that takes
the value -1 with probability 50% and the value 3 with probability 50%. This random
variable also has mean=1 (0.5×(-1) + 0.5×3 = 1) and SD=2 (variance = 0.5×(-1-1)² + 0.5×(3-1)² = 4).
gen x = 4*rbinomial(1,0.5) - 1
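If you want to convince yourself that this DGP really has mean 1 and SD 2, a quick standalone check (illustrative):

```stata
* Draw a large sample from the two-point DGP and summarize it
drop _all
set obs 100000
gen x = 4*rbinomial(1,0.5) - 1   // takes the value -1 or 3, each with prob 0.5
sum x                            // mean should be close to 1, SD close to 2
```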
Do you see signs of size distortions with N=10? With N=25? With N=100?
Task 4: The power properties of tests of the mean
Now we consider the power of the test to reject the null when the null is false. This
is most easily represented by a “power curve”: on the horizontal axis we put the
value of μ being hypothesized, and on the vertical axis the probability of
rejecting the null. We will do this for a size of 5%, i.e., for tests at the 5%
significance level.
If our test has good size properties, then if we look at the power curve at the point
where the value of μ being hypothesized is the true value of 1 (on the X axis), the
probability of rejecting the null will be 5% (on the Y axis).
We also expect that the probability of rejection should be higher when we test
hypothesized values that are >1, and when we test hypothesized values that are <1.
We want two variables for our power curve plot: a variable with hypothesized values of μ
that we will call “hypoth”, and a variable with the corresponding probability of
rejecting the null that we will call “testpower”.
The data in Stata memory we have so far consists of our 10,000 replications. To
calculate a point on the power curve, we need to use all 10,000 values. Because our
new variables hypoth and testpower have nothing to do with individual replications,
generating the power curve data in Stata is rather fiddly. It can be done in various
ways, some of which we can’t use because we haven’t yet shown how to work with
matrices in Stata.
Here is how to do it using a Stata “loop”:
capture drop hypoth
gen hypoth = -2 if _n<=100
replace hypoth = hypoth[_n-1] + 0.05 if _n>1 & _n<=100
capture drop testpower
gen testpower = .
forvalues i=1/100 {
    capture drop z
    qui gen z = (m - hypoth[`i'])/se
    capture drop reject
    qui gen reject = (z < -1.96 | z > 1.96)
    qui sum reject
    qui replace testpower = r(mean) if _n == `i'
}
line testpower hypoth
We will put a do file with this content in the QM folder so that you can simply load it
into your do file editor and execute it from there.
Explanation of how it works:
- We start by creating a variable “hypoth” that is all -2s. We do this only for the
first 100 rows in the dataset. The special Stata variable _n indexes rows.
- Starting with the 2nd value in hypoth, we replace the contents with the value
from the preceding observation plus 0.05. We do this for the first 100 rows
only. At the end of this, we have a variable “hypoth” that is -2.00, -1.95,
-1.90, …, 2.95. These are the hypothesized values of μ that we will plot on
the horizontal axis.
- Create a variable “testpower” that is initially all missings.
- Next, loop through all the rows, and for each row, based on the value in
hypoth, calculate the probability of rejecting the null hypothesis that the true
mean = the value in hypoth, saving this probability in “testpower”.
- forvalues is a Stata loop command. The code to be executed in the loop is in
{}. The loop makes use of a “local macro” i. This local macro is a scalar that
starts at 1. Each time through the loop, i is incremented by 1. The loop stops
executing when i passes 100.
- Note that to reference a local macro, it needs to be surrounded by ` on the LHS
(the character above 1 on your keyboard) and by ' on the RHS (the character to
the right of ; on your keyboard).
- Each time through the loop, we first calculate the test statistic z, and then the
fraction of times we would have rejected the null based on the Normal critical values
of -1.96 and 1.96. Note that “|” is Stata’s logical “or”.
- Finally, we save the mean of the variable “reject” – the proportion of times we
would have rejected the null – in the corresponding row of the variable
“testpower”.
We conclude by doing a line plot of the power curve.
If time allows, compare the power curve for N=100 and N=1,000. Do this by
executing the code above after a simulation with obs=100 and obs=1000. The line
plot for obs=100 would be saved as
line testpower hypoth, name(power100, replace)
and similarly for obs=1000. Combine them using graph combine. What happens to
the power of the test as the sample size increases?