SGPE QM Lab 3: Monte Carlos
Mark Schaffer
version of 4.10.2010

Introduction

This lab introduces you to a useful practical tool in statistics and econometrics: Monte Carlo simulations. In a Monte Carlo (MC) simulation, we look at the simulated performance of an estimator or test statistic under various scenarios. The structure of a typical Monte Carlo exercise is as follows:

1. Specify the “data generation process” (DGP). These are the assumptions that you make about where the data come from and what their properties are.
2. Choose a sample size N for your MC simulation.
3. Choose the number of times you will repeat your MC simulation. 10,000 is traditional, but while debugging your code you might choose a much smaller number, e.g., 100.
4. Generate a random sample of size N based on your DGP.
5. Using the random sample generated in (4), calculate the statistics of interest. These might be parameter estimates, statistics for tests of hypotheses involving these estimated parameters, specification tests, or whatever. Save these.
6. Go back to (4) and repeat (4)-(5) until you have done it 10,000 times.
7. Examine your 10,000 parameter estimates, test statistics, etc. and see what conclusions you reach.

Specify the DGP

Stata has functions that will generate random numbers according to various probability distributions; see help functions in Stata’s on-line help. For example, say you want to examine the behaviour of the OLS estimator in a simple bivariate estimation with a sample size of 100: u is an error term drawn from the normal distribution with mean=0 and standard deviation=2, x is an explanatory variable randomly drawn from a uniform distribution over [0,1] and uncorrelated with u, the parameters α and β equal 1 and 2, respectively, and y = α + βx + u = 1 + 2x + u.
You would code this in Stata as follows:

drop _all
set obs 100
gen u = rnormal(0, 2)
gen x = runiform()
gen y = 1 + 2*x + u

And then you would run the regression:

reg y x

Comments on the above:

We start by dropping any variables that happen to be in memory. (Thinking ahead – we are going to be replicating this process 10,000 times and we always want to start with a clean slate.)

“set obs 100” tells Stata that the dataset we will create will have 100 observations.

The function rnormal() takes the mean and SD as its arguments. When no arguments are provided, it returns a draw from the standard normal, i.e., rnormal() is equivalent to rnormal(0,1).

The function runiform() returns a value drawn from the uniform probability distribution over [0,1]. If you wanted a random variable uniformly distributed over, say, [-2,2], you would say 4*runiform()-2.

The random numbers generated are “pseudo-random” numbers. You can replicate the sequence of random numbers generated by choosing the “seed”; this is useful for replicability. Pseudo-random numbers are fine for MC simulations, but you wouldn’t want to use them for encryption purposes!

Task 1

Open up Stata and the do-file editor. Enter the code above. Run the code 10 times, each time making a note of the estimated value of the coefficient on x and the intercept. Draw some conclusions.

Next, change the DGP and repeat the exercise. Options for changing the DGP:
- Increase or decrease the variance of the error u.
- Increase or decrease the variance of the explanatory variable x.
- Use a different probability distribution for x or u. See help functions for options.

Estimating the mean

In this lab, we’ll be working with one of the simplest statistics imaginable: the sample mean. A quick review: We have a sample of size N with observations on a variable x. We calculate the sample mean of x. Call this sample mean x̄, i.e.,

    x̄ = (1/N) Σ_{i=1}^{N} x_i

We assume that the population from which our sample is drawn has a finite mean μ and finite variance σ².
As is always the case in practice, we don’t actually know the true values of μ and σ². The natural thing to do is to use the sample mean x̄ to estimate the population mean μ, and similarly we use the sample variance σ̂² to estimate the population variance of x, where

    σ̂² = (1/(N−1)) Σ_{i=1}^{N} (x_i − x̄)²

(This is the traditional formula for calculating the variance and standard deviation, and is reported by Stata after the summarize command. Note the “finite sample” correction in the division by N−1. Asymptotically our results would be the same if we didn’t use this adjustment and simply divided by N.)

The sample mean x̄ is a statistic; it is a function of the sample. Our statistic has a distribution, and we use the knowledge of that distribution to perform inference, e.g., test hypotheses about μ. Under the assumptions above, the sample mean is an unbiased estimator of the population mean:

    E(x̄) = μ

This is a “finite-sample” result. It doesn’t require the sample size N going off to infinity. The next set of results do require the sample size going off to infinity, i.e., they are asymptotic approximations that rely on the Central Limit Theorem. The distribution of the sample mean is:

    x̄ ~ N(μ, σ̂²/N)

and if we define the test statistic Z to be

    Z = (x̄ − μ) / SE(x̄)

where the SE of the sample mean is σ̂/√N, then under the null hypothesis that the population mean is indeed μ,

    Z ~ N(0,1)

We can use the test statistic Z to test hypotheses about the mean. These are asymptotic results, i.e., they are approximations that are true in the limit as the sample size N approaches infinity. In finite samples they won’t be exactly right, and in some cases they may be very poor approximations indeed. If x is itself normally distributed, more precise (finite sample) results are available, but we won’t be making use of these.

Monte Carlos and simulations in Stata

We will use Stata to generate 10,000 random samples according to a DGP that we specify.
In each random sample, we will calculate the sample mean x̄ and sample standard deviation σ̂. After we have collected the results, we will have a new Stata dataset, consisting of 10,000 observations, where each observation has an x̄ and a σ̂. We can then look at the distribution of x̄, calculate a test statistic Z and look at its distribution, and various other things.

To do this in Stata we make use of the simulate command. The following is a simple use of simulate:

simulate m=r(m), reps(1000) : mysim

The option reps(1000) is easy to understand – it means repeat the MC exercise 1,000 times. We will typically debug our program with 100 repetitions and then ask for 10,000 repetitions to get serious results.

The key to the rest is what follows the “:”. mysim is a Stata program that we have to write. (It can be called anything, by the way, but “mysim” is easy to remember.) The program will do what is in Steps (4)-(5) above: generate a random sample, calculate statistics, and return them to Stata. Specifically, in the example above, Stata will call mysim 1,000 times, and each time mysim will return a statistic “m”. Stata will save each of these, so that when simulate is done running, the dataset in memory will have one variable “m” with 1,000 values. Each of those is a value from a call to mysim.

mysim will be easiest to work with if it is a bit flexible. It is possible to write mysim so that it takes its own options, but the easiest thing to do is to use global macros. In this lab, we will write mysim so that it looks for the number of observations N in a global macro called $obs.

Here is the version of mysim we will start with:

program define mysim , rclass
    drop _all
    set obs $obs
    gen x = rnormal(1,2)
    sum x
    return scalar m = r(mean)
end

“rclass” means mysim will save its stored results in r() macros, like other rclass commands such as summarize. (Look again at the call to simulate and note the use of r(mean).)
“drop _all” drops any variables in memory (remember – start with a clean slate).

“set obs $obs” tells Stata to set the number of observations to whatever is in the global macro $obs.

The next two lines generate a random variable from the normal distribution with mean=1 and SD=2, and then summarize it so that the sample mean is available.

The return line of the program tells mysim to store the sample mean in r(m) so that simulate can access it. The “end” command ends the program.

Task 2: A simple Monte Carlo

Conduct a simple MC using the example above. In a new do file, insert the following:

capture program drop mysim
program define mysim , rclass
    drop _all
    set obs $obs
    gen x = rnormal(1,2)
    sum x
    return scalar m = r(mean)
end

global obs 25
simulate m=r(m), reps(100) : mysim

Note the additional line at the top. This tells Stata to drop any existing version of mysim before defining a new one. If we tried to define a new one when one already existed, we would get an error. The capture trick is standard (see Lab 2). Also note the line defining the global “obs”.

Save and execute the do file, first with 100 repetitions (as above), to ensure it works, and then with 10,000 repetitions. You will have a dataset with 10,000 observations and one variable called m. Each observation is a sample mean x̄ from one replication.

Summarize m using the summarize command with the detail option. Plot m using the histogram command, and overlay a normal distribution:

hist m , normal

By default, hist produces a histogram with a density on the vertical axis. This is fine for now.

You will compare the distribution of m when N=25 with the distribution when the sample size is something else. Put the following in your do file:

hist m, bin(20) normal name(n25, replace)

This creates a “named” graph in memory called n25. The “replace” option means overwrite any existing graph in memory with that name. The “bin(20)” option means force Stata to use 20 bars (bins).
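Putting the steps above together, the full N=25 run might look like the following sketch (the graph name n25 matches the text; mysim is the program defined above):

```stata
* Sketch: full N=25 run with 10,000 repetitions, as described above
global obs 25
simulate m=r(m), reps(10000) : mysim
summarize m, detail
hist m, bin(20) normal name(n25, replace)
```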
Now add to your do file lines to create and graph m when the sample size is 10, and when the sample size is 100, e.g.,

global obs = 10
simulate m=r(m), reps(10000): mysim
sum m, detail
hist m, bin(20) normal name(n10, replace)

and similarly for a sample size of 100, saving the graph as n100. Finally, combine the three graphs using the graph combine command:

graph combine n10 n25 n100, xcommon ycommon col(1)

“xcommon” and “ycommon” force Stata to use the same scaling for all the X and Y axes. “col(1)” means put them in one column. (If you want to see what it looks like without this, or with “row(1)”, try it.)

What do you conclude from an intraocular test of the three graphs? (Intraocular = “it hits you between the eyes”.)

Size, power, and Type I and Type II errors

Quick review:

Type I error: Incorrectly rejecting the null when the null is actually true. The probability of a Type I error is often denoted by α. In hypothesis testing, α is the “significance level” or “size” of the test.

Type II error: Incorrectly not rejecting the null when the null is actually wrong. The probability of a Type II error is often denoted by β. In hypothesis testing, (1-β) is the “power” of the test.

In empirical work, we want to use tests that have good size properties. In other words, if we choose 5% as our significance level, we are saying that we are willing to incorrectly reject the null 5% of the time. We want the test that we use to actually behave this way. An example of a test statistic with poor size properties would be one where we choose a nominal size of 5%, but we actually incorrectly reject the null 40% of the time.

We also want to use tests that have statistical power, i.e., that are good at rejecting the null when it’s wrong. For example, if we have two estimators that are both unbiased, but one has a standard error that is smaller than the other, it will be more powerful and, everything else equal, we will want to use it in preference to the one with the larger SE.
A Monte Carlo simulation is one way of examining the size and power properties of tests and estimators.

Task 3: The size properties of tests of the sample mean

In this task, we augment the MC simulation so that we can calculate the test statistic Z (see above). We then examine whether the test has good size properties. Note: the answer is not a foregone conclusion! Remember – we are relying on the Central Limit Theorem, which is an asymptotic approximation. If the sample size is small, and/or the original DGP is “not very normal”, the approximation could be rather poor.

Your DGP should be to generate a normal random variable with mean=1 and SD=2 for a sample size of N=25. Add the following at the end of your mysim program, just after the command that returns the mean m:

return scalar se = r(sd)/sqrt($obs)

And change your call to simulate so it looks like this:

simulate m=r(m) se=r(se), reps(100) : mysim

Save and run your do file with reps=100 to confirm it works, then run it again with reps=10000 to get your results. You now have a dataset with 2 variables, m and se.

To examine the size properties of tests using the asymptotic approximation given by the Central Limit Theorem, we choose a significance level of 5% and ask what happens if we test the null hypothesis H0: μ=1. (Remember, in our DGP, the true mean is indeed 1.) Add the following to your do file:

gen z = (m-1)/se

If z is standard normal, as the CLT approximation predicts, then for how many observations should z be > -1.96? How many times should z be < 1.96? How many times should z be between -1.96 and 1.96? What do you actually find in the data? (Hint: use the count command; see help count for examples. You may want to add this to your do file.)

Next, generalize this graphically by looking at the p-values for the test statistic z, assuming that z is indeed standard normal. Add the following to your do file:

gen p = normal(z)

If z is standard normal, then what should the distribution of p look like?
That is, what should p look like if the z test statistic has good size properties? Graph the distribution of p. Use the “percent” option to ease interpretation. Save the graph as a named graph called “size25”:

hist p , bin(20) percent name(size25, replace)

Repeat the exercise but for a sample size of N=10 by adding the required code to the bottom of your do file:

drop _all
global obs = 10
simulate m=r(m) se=r(se) , reps(10000): mysim
sum m, detail
gen z = (m-1)/se
gen p = normal(z)
hist p , bin(20) percent name(size10, replace)

Do you see any signs of size distortion now?

Now change the DGP in mysim so that instead of a normally distributed random variable with mean=1 and SD=2, it’s a Bernoulli-type random variable that takes the value -1 with probability 50% and the value 3 with probability 50%. This random variable will also have mean=1 and SD=2.

gen x = 4*rbinomial(1,0.5) - 1

Do you see signs of size distortions with N=10? With N=25? With N=100?

Task 4: The power properties of tests of the mean

Now we consider the power of the test to reject the null when the null is false. This will be most easily represented by a “power curve”: on the horizontal axis we put the value of μ being hypothesized, and on the vertical axis will be the probability of rejecting the null. We will do this for a size of 5%, i.e., for tests at the 5% significance level.

If our test has good size properties, then if we look at the power curve at the point where the value of μ being hypothesized is the true value of 1 (on the X axis), the probability of rejecting the null will be 5% (on the Y axis). We also expect that the probability of rejection should be higher when we test hypothesized values that are >1, and when we test hypothesized values that are <1.

We want two variables for our power curve plot: a variable with hypothesized values of μ that we will call “hypoth” and a variable with the corresponding probability of rejecting the null that we will call “testpower”.
The data in Stata memory we have so far consists of our 10,000 replications. To calculate a point on the power curve, we need to use all 10,000 values. Because our new variables hypoth and testpower have nothing to do with individual replications, generating the power curve data in Stata is rather fiddly. It can be done in various ways, some of which we can’t use because we haven’t yet shown how to work with matrices in Stata. Here is how to do it using a Stata “loop”:

capture drop hypoth
gen hypoth = -2 if _n<=100
replace hypoth = hypoth[_n-1] + 0.05 if _n>1 & _n<=100
capture drop testpower
gen testpower = .
forvalues i=1/100 {
    capture drop z
    qui gen z = (m - hypoth[`i'])/se
    capture drop reject
    qui gen reject = (z < -1.96 | z > 1.96)
    qui sum reject
    qui replace testpower = r(mean) if _n == `i'
}
line testpower hypoth

We will put a do file with this content in the QM folder so that you can simply load it into your do file editor and execute it from there.

Explanation of how it works:

- We start by creating a variable “hypoth” that is all -2s. We do this only for the first 100 rows in the dataset. The special Stata variable _n indexes rows.
- Starting with the 2nd value in hypoth, we replace the contents with the value from the preceding observation plus 0.05. We do this for the first 100 rows only. At the end of this, we have a variable “hypoth” that is -2.00, -1.95, -1.90, …, 2.95. These are the hypothesized values of μ that we will plot on the horizontal axis.
- Create a variable “testpower” that is initially all missings.
- Next, loop through the first 100 rows, and for each row, based on the value in hypoth, calculate the probability of rejecting the null hypothesis that the true mean = the value in hypoth, saving this probability in “testpower”.
- forvalues is a Stata loop command. The code to be executed in the loop is in {}.
- The loop makes use of a “local macro” i. This local macro is a scalar that starts at 1.
- Each time through the loop, i is incremented by 1. The loop stops executing when i passes 100. Note that to reference a local macro, it needs to be surrounded by ` on the LHS (the character above 1 on your keyboard) and by ' on the RHS (the character to the right of ; on your keyboard).
- Each time through the loop, we first calculate the test statistic z, and then the fraction of times we would have rejected the null based on the Normal critical values of -1.96 and 1.96. Note that “|” is Stata’s logical “or”.
- Finally, we save the mean of the variable “reject” – the proportion of times we would have rejected the null – in the corresponding row of the variable “testpower”.
- We conclude by doing a line plot of the power curve.

If time allows, compare the power curve for N=100 and N=1,000. Do this by executing the code above after a simulation with obs=100 and obs=1000. The line plot for obs=100 would be saved as

line testpower hypoth, name(power100, replace)

and similarly for obs=1000. Combine them using graph combine. What happens to the power of the test as the sample size increases?
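One way to script the N=100 vs N=1,000 comparison is sketched below. It assumes the power-curve loop above has been saved as a do file; the name powercurve.do is our own label, not given in the text, and mysim must return both m and se as in Task 3.

```stata
* Sketch: compare power curves for N=100 and N=1,000
* (assumes the loop code above is saved as powercurve.do)
global obs 100
simulate m=r(m) se=r(se), reps(10000) : mysim
do powercurve.do
line testpower hypoth, name(power100, replace)

global obs 1000
simulate m=r(m) se=r(se), reps(10000) : mysim
do powercurve.do
line testpower hypoth, name(power1000, replace)

graph combine power100 power1000, xcommon ycommon col(1)
```

With common axis scaling, the steeper power curve for N=1,000 makes the effect of sample size on power easy to see.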