Monte Carlo Simulation and Resampling
Tom Carsey (Instructor)
Jeff Harden (TA)
ICPSR Summer Course, Summer 2011

Introductions and Overview

- What do I plan for this course?
- What do you want from this course?
- What are the expectations for everyone involved?
- Overview of the syllabus

What is the Objective?

The fundamental objective of scientific research is inference. By that I mean we want to use the data we observe to draw broader conclusions about a process we care about, conclusions that extend beyond our data. We have a sample of data we can study, but the goal is to learn about the population from which it came. Monte Carlo simulations and resampling methods help us meet these objectives.

Why Use Simulations?

- Analysis where observable data are not available
- Mimic the repeated-sampling framework of classical frequentist statistics
- Provide solutions where analytic solutions are not available or are intractable
- Test hypothetical processes
- Robustness checks
- Mimic an experimental lab

What is a Monte Carlo Simulation?

A computer simulation that generates a large number of simulated samples of data based on an assumed Data Generating Process (DGP) that characterizes the population from which the simulated samples are drawn. Patterns in those simulated samples are then summarized and described. Such patterns can be evaluated in terms of substantive theory or in terms of the statistical properties of some estimator.

What is a DGP?

A DGP describes how the values of a variable of interest are produced in the population. Most DGP's of interest include a systematic component and a stochastic component. We use statistical analysis to infer characteristics of the DGP by analyzing observable data sampled from the population. In applied statistical work, we never know the DGP; if we did, we wouldn't need statistical estimates of it. In Monte Carlo simulations, we do know the DGP because we create it.

What is Resampling?

Like Monte Carlo simulations, resampling methods use a computer to generate a large number of simulated samples of data. Also like Monte Carlo simulations, patterns in these simulated samples are then summarized, and the results are used to evaluate substantive theory or statistical estimators. What is different is that the simulated samples are generated by drawing new samples (with replacement) from the sample of data you have. In resampling methods, the researcher DOES NOT know or control the DGP, but the goal of learning about the DGP remains the same.

Simulations as Experiments

Experiments rest on control of the research environment. Control is achieved by balanced (often through randomization) assignment of observations to groups. Then, all members of all groups are treated equally except for one factor. If differences emerge between groups, causality is attributed to that factor, which is generally called the treatment effect. Examples in applied research.

Simulations as Experiments (2)

Computer simulations follow the same logic. The computer is the "lab" and the researcher controls how simulated samples are generated.
One factor is varied across groups of simulated samples, and any differences that appear are attributed to that factor (again, generally called the treatment effect).

Simulations as Experiments (3)

The power of experiments rests in their control of the environment and the resulting claims of causality. Of course, finding out that the treatment causes some response does not necessarily explain why that response emerges. The limitations of experimental work include:

- They can quickly become very complex
- Results may not generalize well to the (necessarily more complex) real world outside of the lab

Populations and Samples

The distinction between the population DGP and the sample(s) of data we generate or have available to us is critical. If the goal is inference (descriptive or causal), then we are attempting to make statements about the population based on some sample data. The fundamental difference between Monte Carlo simulation and resampling is that we create/control the population DGP in Monte Carlo simulations, but not in resampling. Both methods allow us to evaluate theoretical and/or statistical assumptions. Both methods offer opportunities to relax or eliminate some statistical assumptions.

Monte Carlo Simulation of OLS

Ordinary Least Squares (OLS) regression assumes some dependent variable (often labeled Y) is a linear function of some set of independent variables (often labeled as X's), plus some stochastic (random) component (often labeled as ε). A set of parameters describes the relationship between the X's and Y; they are often represented as β's. The model might be represented like this:

Y_i = β_0 + β_1 X_1i + β_2 X_2i + ... + ε_i   (1)

Or like this in matrix notation:

Y = Xβ + ε   (2)

Figure: Component Parts of a Simple Regression. The fitted line ŷ_i = β̂_0 + β̂_1 x_i plotted against the independent variable X and the dependent variable Y, with the intercept β̂_0, the slope β̂_1, and one residual (ε_4) marked.

Monte Carlo Simulation of OLS (2)

Next we need to specify more about the stochastic component of the model. In OLS, we generally assume that the residual follows a normal distribution with a mean of zero and a constant variance. This can be expressed as:

ε_i ~ f_N(e_i | 0, σ²)   (3)

where σ² represents a constant variance. We have now specified the systematic and the stochastic components of Y.

Monte Carlo Simulation of OLS (3)

We can rewrite these two components as follows:

Y ~ f_N(y_i | μ_i, σ²)   (4)
μ_i = Xβ   (5)

This set-up models the randomness in Y directly, and makes clear that the conditional mean of Y is captured by Xβ. The value of this set-up is that it can be generalized, like this:

Y ~ f(y | θ, σ²)   (6)
θ = g(X, β)   (7)

This makes clear that the functions f and g must be clearly specified as part of the DGP for Y. Monte Carlo simulations focus on all the nitty-gritty of specifying these functions.

Know Your Assumptions

To simulate a DGP with the goal of evaluating a statistical estimator, you need to know the assumptions of that estimator.
For OLS, the key ones are:

- Independent variables are fixed in repeated samples
- The model's residuals are independently and identically distributed (iid)
- The residuals are distributed normally
- No perfect collinearity among the independent variables

These assumptions must be properly incorporated into the simulation, but they can then be examined one by one by repeating the simulation.

"Fixed in Repeated Samples" - Really?

In experimental analysis, this assumption is plausible: researchers often fix the exact values of the treatment variable. In observational analysis (like most of social science), it is not. The X's are random variables just like Y. Thus, there is some DGP out there for the X's as well. The key element of this assumption boils down to assuming that the X's are uncorrelated with the residual (ε) from the regression model. In short, the DGP for the X's must be uncorrelated with the DGP for the residuals. We'll see how measurement error in X messes this up.

Simulating OLS in R

set.seed(123456)          # Set the seed for reproducible results
sims <- 500               # Set the number of simulations at the top of the script
alpha.1 <- numeric(sims)  # Empty vector for storing the simulated intercepts
B.1 <- numeric(sims)      # Empty vector for storing the simulated slopes
a <- .2                   # True value for the intercept
b <- .5                   # True value for the slope
n <- 1000                 # Sample size
X <- runif(n, -1, 1)      # Create a sample of n observations on the variable X.
                          # Note that this variable is outside the loop, because X
                          # should be fixed in repeated samples.
for(i in 1:sims){                 # Start the loop
  Y <- a + b*X + rnorm(n, 0, 1)   # The true DGP, with N(0, 1) error
  model <- lm(Y ~ X)              # Estimate OLS model
  alpha.1[i] <- model$coef[1]     # Put the estimate for the intercept in alpha.1
  B.1[i] <- model$coef[2]         # Put the estimate for X in B.1
}                                 # End loop

Figure: Simulated Distribution of Intercept and Simulated Distribution of Slope. Density plots of the estimated values of the two parameters across the simulated samples.

What Did We Learn?

We see that the estimated intercepts and slopes vary from one simulated sample to the next. We see that they tend to be centered very near the true values we specified in the DGP. We see that their distributions are at least bell-shaped, if not perfectly normal. We can learn a lot more, however, if we manipulate features of the DGP, re-run the simulation, and then observe what, if anything, changes. I'll leave the nuts and bolts to lab, but let's look at one example.

Multicollinearity in OLS

What is multicollinearity? What does it do to OLS results? Let's investigate this with a simulation.

Multicollinearity Simulation

A model with 2 independent variables, correlated at .1, .5, .9, and -.9. Each sample size is 1,000. I draw 1,000 simulated samples at each level of correlation. True values are β_0 = 0, β_1 = .5, and β_2 = .5. One way to set up such a design is sketched below; here is what I get.
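The slides do not show the script for this design, so the following is a minimal sketch of one way to implement it. It assumes MASS::mvrnorm() for drawing the two correlated predictors; the true parameter values match the slide, and rho would be looped over .1, .5, .9, and -.9 in the full design.

library(MASS)                       # for mvrnorm()
set.seed(1234)
sims <- 1000                        # simulated samples per correlation level
n <- 1000                           # sample size
rho <- 0.9                          # population correlation between X1 and X2
b0 <- 0; b1 <- .5; b2 <- .5         # true parameter values from the slide
X <- mvrnorm(n, mu = c(0, 0),
             Sigma = matrix(c(1, rho, rho, 1), 2, 2))  # fixed in repeated samples
est <- matrix(NA, nrow = sims, ncol = 3)  # store estimates of b0, b1, b2
for (i in 1:sims) {
  Y <- b0 + b1*X[, 1] + b2*X[, 2] + rnorm(n, 0, 1)
  est[i, ] <- coef(lm(Y ~ X[, 1] + X[, 2]))
}
apply(est, 2, sd)   # the spread of the slope estimates grows with |rho|

Re-running this at each correlation level and plotting the densities of the three columns of est produces figures like those that follow.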
Figure: Density estimates of B0, B1, and B2 by level of multicollinearity (Low Corr, Medium Corr, High Corr), plus scatterplots of the estimates of β_1 against β_2 at population correlations of 0, 0.5, and 0.9, including one panel plotted against the difference in the correlation of X1 with Y compared to that of X2 with Y.

Randomness and Probability

Making inferences requires the use of probability and probability distributions. We draw a sample, but we want to speak about the larger population. We can make those statements if we have a sense of the probability of drawing the sample that we have. The key element to drawing a useful sample is randomness.

Randomness

For a sample to be random, every element in the larger population must have had a fair or equal chance of being selected. If the sample is large enough, it will include mostly "typical" cases. It will also include some "odd" cases. When the sample is large, it will have enough cases that are odd in different ways to cancel out, and enough typical cases to outweigh the few odd ones. Thus, large random samples give us a great deal of statistical power.

Probability Model

To make inferences, we need to develop a probability model for the data. We need to generate a belief about the probability that the population of possible cases would produce a sample of observations that looks like the one we have. A probability model for a single variable describes the range of possible values that variable could have and the probability, or likelihood, of the various possible values occurring in a random sample. Note that OLS (logit, probit, really any single-equation model) is really a probability model about a single variable, Y. It is just a probability that is conditional on some X's.

Probability Model (cont.)

A random variable represents a random draw from the population. Each data point is a particular realization of a random variable. That means its value for the variable under consideration is just one observed value from the whole range of possible values that could have been observed. The range of all possible values for a random variable is called the distribution of that variable. The shape of that distribution describes how likely we are to observe particular values of a random variable if we were to draw one out by chance. This is the probability distribution for the random variable in question.

Drawing a Random Sample

Each observation is just one of many we could have selected. Each entire sample is just one of many we could have selected.
We could learn a lot about the probability distribution that describes the population if we could draw lots and lots of samples. In observational work, we usually have only one sample in our hands, which is why we end up making some assumptions about the probability distribution that describes the larger population from which our one sample was taken. But in simulations, we can generate lots and lots of samples.

What is Probability?

Probability involves the study of randomness and uncertainty. At a fundamental level, a probability is a number between 0 and 1 that describes how likely something is to occur. Another way to think about it is how frequently an outcome will result if an action, or trial, is repeated many times. This so-called "Frequentist" view of probability lies at the heart of classical statistical theory and the notion of repeated samples. The idea of an "expected" outcome is how we test hypotheses: we compare what we observe to what we expected given some set of assumptions, and we try to decide how likely it was to observe what we observed.

Example: Flipping a Coin

Suppose I have a coin and I toss it in the air. What is the probability that it will come up "Heads"? We can make an assumption about the coin being fair and assert, based on that assumption, that the probability is .5. Or we could flip the coin a lot of times and see how frequently we get Heads; if the coin is fair, it should be about half the time. The first approach defines a probability as a logical consequence based on assumptions. The second approach relies on the law of large numbers to approach the true probability. The law of large numbers says that increasing the number of observations leads the observed average to converge toward the true average. The coin-toss example is shown in the next figure.

Figure: Cumulative Frequency of the Proportion of Coin Flips that Come Up Heads. The cumulative proportion of success outcomes over 500 trials settles toward .5.
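A minimal sketch reproducing the figure above; the seed and the number of trials (500, matching the figure's axis) are my choices.

set.seed(42)
n.trials <- 500
flips <- rbinom(n.trials, size = 1, prob = 0.5)   # 1 = Heads, 0 = Tails
running.prop <- cumsum(flips) / seq_len(n.trials) # cumulative proportion of Heads
plot(running.prop, type = "l", ylim = c(0, 1),
     xlab = "Number of Trials",
     ylab = "Cumulative Proportion of Success Outcomes")
abline(h = 0.5, lty = 2)   # the true probability the line converges toward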
Randomness Again

Random does NOT mean haphazard or chaotic. Any one observation or trial might be hard to predict. However, random variables are systematic: they follow rules and they show stability in the long run. Again, the way to think about this is that the range of possible values and how likely each one is to occur is described by a probability distribution. You either need to assume what that distribution is, find a way to uncover it, or find methods that are robust to various distributions.

Properties of Probabilities

All probabilities fall between 0 and 1. The probability of some event E happening, often written as P(E), is defined as 0 ≤ P(E) ≤ 1. The sum of the probabilities of all possible outcomes must equal 1. If E is a set of possible outcomes for an event, then P(E) will equal the sum of the probabilities of all of the events included in set E. Finally, for any set of outcomes E, P(E) + P(not E) = 1. In other words, P(E) + (1 − P(E)) = 1.

Conditional Probability

So far, we have been dealing with independent events. When the probability of one outcome changes depending on some other factor, then the probability is conditional on that other factor. For example, the probability that a citizen might turn out to vote could depend upon whether that person lives in a place where the campaign is close and hotly contested. The conditional probability of event E happening given that event F has happened is generally written like this: P(E | F).

Conditional Probability (cont.)

A conditional probability can be computed like this:

P(E | F) = P(E ∩ F) / P(F)

From this, we can say E is independent of F if and only if P(E | F) = P(E) (which also implies that P(F | E) = P(F)). Another way to think about independence is that two events E and F are independent if:

P(E ∩ F) = P(E) P(F)

This second expression captures what is called the "multiplicative" rule regarding conditional probabilities.

Probability Distributions

A probability distribution describes the range of possible values and the probability of observing those values in a random draw (with replacement). PDF: Probability Distribution Function (discrete) or Probability Density Function (continuous). CDF: Cumulative Distribution Function. The total area under a PDF sums to 1; the CDF records the accumulated probability as it approaches 1.

Figure: PDF of the Normal and CDF of the Normal, plotted over X from −3 to 3.

PDF's and CDF's

The sum of the area under a PDF equals 1. The CDF shows this summation across the range of the variable. Note the mass of the PDF centered around the mean: the expected value of a random variable is its mean. OLS is about estimating the probability that Y (the dependent variable) takes on some value conditional on the values of the X, or independent, variables. These conditional probability models are predicting the expected value of the dependent variable, which is the mean.

Random Variables

There are two types of random variables, discrete and continuous. These are roughly similar to categorical and continuous. Discrete random variables can only take on integer values:

- "Heads" or "Tails"
- "Strongly Agree", "Agree", "Disagree", "Strongly Disagree"
- A count of objects or events: how many "Heads", or how many "Wars"?

Continuous random variables can take on any value on the real number line.

Discrete Example

Toss a coin three times and record the number of "Heads" (a count). One possible sequence of how three tosses might come out is (H,T,T). The set of all possible outcomes, then, is the set: (H,H,H), (H,H,T), (H,T,H), (H,T,T), (T,H,H), (T,H,T), (T,T,H), (T,T,T). If X is the number of Heads, it can be either 0, 1, 2, or 3. If the coin we are tossing is fair, then each of the eight possible outcomes is equally likely. Thus, P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8, and P(X = 3) = 1/8.

Discrete (cont.)

This describes the discrete probability distribution function of X. Note that the individual probabilities of all of the possible events sum to 1. The cumulative probability distribution function would be represented like this: P(X ≤ 0) = 1/8, P(X ≤ 1) = 4/8, P(X ≤ 2) = 7/8, and P(X ≤ 3) = 8/8. You can use the sample() function in R to generate observations of a discrete random variable.
My.Sample <- sample(k, size = n, prob = p, replace = TRUE)

Example using sample()

Tossing 3 coins 800 times:

set.seed(23212)  # Allows results to be reproduced
n <- 800         # Sample size I want to draw
k <- c("0 Heads", "1 Head", "2 Heads", "3 Heads")  # Possible outcomes
p <- c(1, 3, 3, 1)/8  # Probability of getting 0, 1, 2, or 3 Heads
My.Sample <- sample(k, size = n, prob = p, replace = TRUE)
table(My.Sample)

My.Sample
0 Heads  1 Head 2 Heads 3 Heads
     94     312     293     101

Probability Distributions

Common discrete distributions include the Bernoulli, Binomial, Multinomial, Poisson, and Negative Binomial. Common continuous distributions include the Uniform, Normal, Chi-squared (χ²), F, and Student's t; the last three are sampling distributions that have a degrees-of-freedom parameter. PDF's of discrete distributions are represented as spike plots, while PDF's of continuous distributions are represented as density plots.

Figure: Spike Plot of a Binomial. The binomial PDF of X plotted against the number of trials.

Continuous PDF's

Continuous PDF's don't really describe the probability of getting any precise value, because the probability of getting any precise value is effectively 0. Rather, they are used to describe the probability of getting a value that falls between an upper and a lower bound. We can consider one tail of the distribution, two tails, or all but the tails. An example with the Normal:

Figure: Areas Under a Normal PDF. Shaded regions show P(X ≤ −1.5), 1 − P(X ≤ 1.5), and P(X ≤ −1.5) + (1 − P(X ≤ 1.5)), marked at X = −1.5 and X = 1.5.

Conclusions

Probability is about uncertainty and randomness. Randomness does not mean haphazard. Random variables follow probability distributions that can be defined with some assumptions or through frequentist repeated samples. The expected value of a random variable is the mean of the distribution from which it was drawn. Thus, we build probability and conditional probability models for data based on classical probability theory.

Generating Random Variables

R has many functions that generate random variables that follow many types of distributions:

runif()   # Random Uniform distribution
rnorm()   # Random Normal distribution
rt()      # Random Student's t distribution
rf()      # Random F distribution
rchisq()  # Random Chi-squared distribution
rbinom()  # Random Binomial distribution

If you type help(Distributions) in R, you will get a complete listing of those built into R. Many others are available in other packages.

Generating Random Variables (2)

However, you are not limited to only those distributions already programmed into R. You can simulate a random draw from any PDF if you know the formula for the PDF (one general-purpose approach is sketched below). When thinking about a probability distribution, you need to consider the number of parameters that describe its location and shape. Key elements to consider include:

- Mean
- Variance
- Range of valid values
- Skewness (symmetry of the distribution)
- Kurtosis ("peakedness" of the middle; "heaviness" of the tails)

When selecting a distribution function for generating a random variable, you have to make sure it is producing the type of variable you want.
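One general-purpose recipe, shown here as a minimal sketch of my own rather than course code, is the inverse-CDF method: if you can write down the quantile function (the inverse of the CDF), then feeding it Uniform(0,1) draws produces draws from the target distribution. The target here is an Exponential with rate 2, chosen because the result is easy to check against rexp().

set.seed(99)
u <- runif(10000)            # Uniform(0,1) draws
rate <- 2
x <- -log(1 - u) / rate      # inverse CDF of the Exponential(rate)
c(mean(x), 1/rate)           # sample mean vs. the theoretical mean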
Examples of Random Variables

Suppose I want a vector of 10 values randomly and uniformly distributed between 0 and 1?

> Random10 <- runif(10)
> Random10
 [1] 0.51932983 0.03848523 0.29820136 0.15254877 0.26798912
 [6] 0.28751082 0.82063644 0.92177149 0.30496555 0.60416280

Now, what if I repeat the command? Will they be the same?

> Random10 <- runif(10)
> Random10
 [1] 0.19536123 0.87823454 0.49350686 0.67970321 0.03955143
 [6] 0.75172914 0.56054510 0.33262119 0.35444109 0.19775575

Why are they different? Well, they are random (100,000,001 numbers between 0 and 1, inclusive, out to 8 decimal places). Actually, they are pseudo-random numbers.

Pseudo-Random Number Generators

Pseudo-random number generators are actually complex computer formulas that generate long strings of numbers that behave as if they were random. They insert a starting value, called a seed, into the formula, and then it cycles. So, you can re-create a "random" sequence by starting with the same seed. R picks a new seed when you start a new session; Stata picks the same seed at the start of every session.

Setting the Seed

You can set the seed in R using the set.seed() function:

> set.seed(682879)
> Random5 <- runif(5)
> Random5
[1] 0.3506136 0.9191146 0.6758455 0.9105095 0.8402629
> Random5 <- runif(5)
> Random5
[1] 0.32960079 0.70555853 0.23793750 0.68339820 0.10286161
> set.seed(682879)
> Random5 <- runif(5)
> Random5
[1] 0.3506136 0.9191146 0.6758455 0.9105095 0.8402629
> Random5 <- runif(5)
> Random5
[1] 0.32960079 0.70555853 0.23793750 0.68339820 0.10286161

Setting the Seed (2)

It is very important to know how your software sets the seed and whether or not it resets it automatically. As noted, R resets its seed to something new every time you open the software, but Stata resets its seed to the same value every time the software is opened. A website for a group called Random.org (http://www.random.org/) offers truly random numbers based on atmospheric noise, along with a discussion of them.

Example of a Random Normal Variable

Using the rnorm() function:

> set.seed(17450)
> Normal500 <- rnorm(500, mean = 5, sd = 2)
> Normal500
  [1] 3.3420737 5.7393314 7.2544833 ...
[499] 3.0382390 6.1508388

The first argument sets N, the second sets the mean, and the third sets the standard deviation. We can check the mean and SD like this:

> mean(Normal500)
[1] 5.052378
> sd(Normal500)
[1] 1.940603

Conclusions

Simulations as experiments give researchers leverage we don't have in observational analysis. Interesting models have both systematic and stochastic components. Getting the distribution of the stochastic component right is critical for inference and a major topic of focus for simulations. Monte Carlo simulations let you define the population DGP; resampling methods do not.

Properties of Statistical Estimators

There are three basic properties of statistical estimators that researchers might want to evaluate using Monte Carlo simulations:

- Bias
- Efficiency
- Consistency

Bias is about getting the right answer on average. Efficiency is about minimizing the variance around an estimate.
Consistency is about getting closer and closer to the right answer as your sample size increases.

Figure: Illustration of Bias and Inefficiency of Parameter Estimates. Four panels: unbiased and inefficient, biased and inefficient, biased and efficient, and unbiased and efficient.

Properties of Statistical Estimators (2)

In the OLS/GLM context, most tend to equate bias with the estimates of the β's and efficiency with the estimates of their standard errors. At one level, this makes sense: we want to know if our point estimates are unbiased, and the standard errors measure their dispersion (which we generally want to be small). This is okay in some settings, but it is not exactly right. The parameters and their standard errors that are computed using sample data are both estimates of something. Either could be biased (i.e., systematically wrong) or inefficient (estimated with less precision than we'd like). Monte Carlo simulation can be used to evaluate both.

Evaluating Bias

Again, bias is about systematically getting the wrong answer. One way to measure it is absolute bias: abs(True Parameter − Simulated Parameter). You can repeat the simulation multiple times and compute the mean of this difference, and also show its distribution. Next, you might vary some feature of the simulation and show how changing that feature affects absolute bias. An example follows; the experiment itself is sketched in code after this discussion.

Figure: Impact of Measurement Error on Absolute Bias in Simple OLS. Absolute bias plotted against measurement error variance from 0 to 1.

Evaluating Bias (2)

In this example, the initial impact of measurement error appears to be small. It then grows more rapidly, but that growth rate appears to slow down. In this example, true β_1 = .5 and true X ranges from −1 to 1, but observed X has random measurement error distributed normally with a mean of zero and a variance that grows to 1. Correct interpretation of the previous figure requires knowing the scale of all of these bits of information. If true β_1 equalled 27, then absolute bias that never exceeds .4 is not bad. What about the ratio of variance in observed X due to true X versus measurement error? In this case, the maximum error variance of 1 results in a variance in observed X of about 1.3, while the variance in true X equals about .33.

Evaluating Bias (3)

Simulations for bias must consider the plausible ranges of values for X and for the factor that might cause bias. One option would be to re-label each axis in the figure to express relative bias and the proportion of variance in X due to measurement error. The other thing to notice in the figure is how the distribution of absolute bias changes as the level of measurement error changes. The variance is lowest at very low and very high levels of measurement error. Why? At low values of error, the parameter is consistently estimated near the true value. At high values of error, the parameter is consistently estimated to be near zero. This is clearer if I double the maximum variance of the measurement error from 1 to 2.

Figure: Impact of Measurement Error on Absolute Bias in Simple OLS. Absolute bias plotted against measurement error variance from 0 to 2.
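A minimal sketch of this measurement-error experiment, reconstructed from the description above (true β_1 = .5, true X uniform on −1 to 1, normal measurement error with growing variance); the number of simulated samples is kept small here to run quickly.

set.seed(2468)
sims <- 200                          # kept small for speed
n <- 1000
b1 <- .5                             # true slope
err.var <- seq(0, 1, by = .1)        # measurement error variance levels
X.true <- runif(n, -1, 1)            # fixed in repeated samples
abs.bias <- numeric(length(err.var))
for (j in seq_along(err.var)) {
  b1.hat <- numeric(sims)
  for (i in 1:sims) {
    Y <- .2 + b1*X.true + rnorm(n, 0, 1)             # the DGP uses the true X
    X.obs <- X.true + rnorm(n, 0, sqrt(err.var[j]))  # but we observe X with error
    b1.hat[i] <- coef(lm(Y ~ X.obs))[2]
  }
  abs.bias[j] <- mean(abs(b1 - b1.hat))
}
plot(err.var, abs.bias, type = "b",
     xlab = "Measurement Error Variance", ylab = "Absolute Bias")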
Efficient Estimates of Parameters

There might be several ways to estimate a parameter; how can we evaluate their efficiency? In a case like multicollinearity, we can see that slopes are less efficiently estimated as multicollinearity increases by looking at standard error estimates. However, it is not accurate to say, as a general rule, that the method with the smallest standard error is the most efficient. We can get standard errors that are wrongly estimated to be small. Violations of OLS assumptions that have efficiency implications DO NOT always inflate standard error estimates. It is better to look at the distribution of the simulated values of the parameter in question.

Efficient Estimation of the "Average"

Two very common methods of measuring the "average" or central tendency of a variable are the mean and the median. The mean is the sum of all values divided by N. The median is the middle value, the 50th-percentile value. In a single-variable case with a symmetric distribution, both will provide unbiased estimates of the central tendency of the variable. Which is more efficient?

A Simulation Study

set.seed(89498)
Sims <- 10000
N <- 100
Results <- matrix(NA, nrow = Sims, ncol = 2)
for(i in 1:Sims){
  Y <- runif(N)
  Results[i, 1] <- mean(Y)
  Results[i, 2] <- median(Y)
}

Figure: Measures of Central Tendency from a Uniform(0,1) Variable and from a Standard Normal Variable. Density plots of the simulated means and medians.

Results of Simulation

We clearly see that in either case, the mean is a more efficient estimator of central tendency than is the median. They look more similar when the underlying distribution from which the sample is being drawn is normal rather than uniform, but that is also a function of scales, so be careful. Notice we used the distribution of the estimates themselves; we did not compute a standard error. What we've done regarding bias and efficiency for parameter estimates could also be applied to estimates of standard errors (they too can be right or wrong, and they too can be widely dispersed or tightly clustered in repeated samples). Epilogue: is the mean always more efficient? (See the sketch following the figure.)

Figure: Measures of Central Tendency from a Laplace(0,1) Variable. Density plots of the simulated medians and means.
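A minimal sketch answering the epilogue, using my own construction of Laplace(0,1) draws: an Exponential(1) magnitude given a random sign. Under this heavier-tailed distribution the ordering reverses and the median is the more efficient estimator.

set.seed(31415)
Sims <- 10000
N <- 100
Results <- matrix(NA, nrow = Sims, ncol = 2)
for (i in 1:Sims) {
  Y <- rexp(N, rate = 1) * sample(c(-1, 1), N, replace = TRUE)  # Laplace(0,1)
  Results[i, 1] <- mean(Y)
  Results[i, 2] <- median(Y)
}
apply(Results, 2, sd)   # the simulated medians are less dispersed than the means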
Performance of Standard Errors

A standard error is meant to serve as a measure of the uncertainty of a parameter estimate. It can be thought of as an estimate of the standard deviation of all possible estimates of a given parameter based on equally sized samples randomly drawn from the same population. We generally use standard errors for hypothesis testing and the construction of confidence intervals. Still, any analytic computation of a standard error relies on some assumptions; if those assumptions are not met, the formula will not produce a proper estimate of the standard error. If the standard error is wrong, our hypothesis tests and confidence intervals will be wrong.

What is a Confidence Interval?

Suppose we run a regression and see the following results:

            Coefficient   Standard Error
Constant    0.5           0.2
X1          1.3           0.4
X2          2.8           1.6

Assuming a large sample, a normal distribution, etc., we could compute a 95% confidence interval for the coefficient operating on X1 like this:

95% CI = 1.3 ± 1.96 × 0.4
95% CI = 0.516 to 2.084

I can do the same for the coefficient operating on X2:

95% CI = 2.8 ± 1.96 × 1.6
95% CI = −0.336 to 5.936

How would you interpret these results?

Confidence Intervals (cont.)

The 95% CI has the estimated parameter at its center, and extends ±1.96 standard errors if we assume the coefficient estimates are normally distributed. How to interpret this? If I had a lot of samples drawn from the same population, 95% of the CI's I computed like this would contain the true value of the parameter. In any one sample, the CI either does or does not include the true parameter; you can't make a probabilistic statement about it (i.e., you cannot say there is a 95% chance that your CI includes the true value). What it does suggest is a plausible range of values for the parameter.

Performance of Standard Errors (2)

Thus, one way to evaluate the performance of standard errors in a Monte Carlo simulation is to determine whether they meet their intended definition: in a large number of repeated samples, a CI set at XX% should include the true population parameter XX% of the time. If it includes the true parameter more often than it should, the confidence interval is too large and you risk accepting a null hypothesis when it is false. If it includes the true parameter less often than it should, the confidence interval is too small and you risk rejecting a null hypothesis when it is true.

Coverage Probabilities

In R, what you need to do is compute a confidence interval at a given level (let's say XX%) each time through the simulation (each of the 1,000 iterations). At each point, check to see whether that confidence interval contains the true population parameter or not (and you set the truth, so you know what it is). Record a 1 when it does and a 0 when it does not. The percentage of times you score a 1 equals the percentage of times that your confidence interval included the true value. If this percentage is approximately equal to XX%, your standard error estimates are accurate. (A sketch follows.)
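A minimal sketch of this recipe for the slope in the simple OLS simulation from earlier; the sample size and seed are my choices.

set.seed(777)
sims <- 1000
n <- 100
b <- .5                                  # true slope: you set the truth
X <- runif(n, -1, 1)
covered <- numeric(sims)
for (i in 1:sims) {
  Y <- .2 + b*X + rnorm(n, 0, 1)
  ci <- confint(lm(Y ~ X), "X", level = .95)       # 95% CI for the slope
  covered[i] <- as.numeric(ci[1] <= b & b <= ci[2]) # 1 if the CI contains the truth
}
mean(covered)   # coverage probability; should be close to .95 here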
What About Type II Error?

Coverage probabilities describe the proportion of estimated confidence intervals that contain the true population parameter. An accurate 95% CI corresponds to a 5% probability of Type I error: rejecting a null hypothesis that is true. What about Type II error, the failure to reject a null hypothesis that is false? For Type I error, there is only one true parameter to compare to the CI that is computed. For Type II error, there are an infinite number of false null hypotheses. Pick a plausible one (say, one exactly 1.96 standard errors away from the true parameter), then compute the proportion of times your simulated CI includes that plausible false null.

Choosing between Bias and Inefficiency

Which should I worry about more, bias or inefficiency? (Classical frequentists, shrinkage models, and Bayesians answer this differently.) In any given sample, your parameter estimates might deviate from the truth because they are biased or because there is variance in their estimation. One way to approach this is to adopt a strategy that considers both factors: mean squared error.

Mean Squared Error

Mean squared error (MSE) is exactly what it sounds like: you compute a series of errors or differences, you square each of those differences, and you compute the mean. This is commonly reported for OLS models as the MSE of the regression, computed from the model residuals. But it can be applied to anything, including parameter estimates. In a Monte Carlo simulation, I can estimate lots of slope coefficients. Each time, I can compute the difference between the estimated value and the true value and then square that difference. The mean of those squared differences is the MSE.

Mean Squared Error (2)

If the MSE = 0, then the estimator always perfectly recovers the population parameter; of course, that is not realistic. If the estimator is unbiased, then the observed squared errors capture only sampling variance, our uncertainty about the parameter estimate. If the estimator is biased, then the observed squared errors capture both this bias and sampling variance. Specifically:

MSE(θ̂) = Var(θ̂) + Bias(θ̂, θ)²

So MSE is a method of comparison that considers both bias and inefficiency in evaluating performance, where smaller MSE is better. (A sketch follows.)
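A minimal sketch computing the MSE of simulated slope estimates and checking the decomposition above; the DGP matches the earlier OLS simulation.

set.seed(555)
sims <- 5000
n <- 100
b <- .5
X <- runif(n, -1, 1)
b.hat <- numeric(sims)
for (i in 1:sims) {
  Y <- .2 + b*X + rnorm(n, 0, 1)
  b.hat[i] <- coef(lm(Y ~ X))[2]
}
mean((b.hat - b)^2)                 # MSE of the slope estimates
var(b.hat) + (mean(b.hat) - b)^2    # variance + squared bias: matches the MSE
                                    # up to simulation error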
Limitations of MSE

MSE is a loss function that considers both bias and inefficiency, but it is just one specific loss function, a quadratic one. The implied weighting of bias and inefficiency might not be the ratio you desire. MSE is also sensitive to outliers: means are more sensitive to outliers than medians, and squaring differences further emphasizes large differences. Alternatives include using the mean of absolute errors rather than squared errors, or using methods that rely on medians rather than means. You will see examples in lab.

Consistency

Consistency is about an estimator converging toward the true value as sample size increases. This assumption gets scant attention in OLS, but it is fundamental to MLE, where the small-sample properties are unknown. This raises a more general concern with the finite-sample properties of estimators compared to their asymptotic properties (e.g., Beck and Katz, 1995). Simulations can be extremely valuable in revealing finite-sample properties. This is the same as saying that an easy factor to vary in a Monte Carlo simulation is the size of each simulated sample that you draw.

Other Performance Evaluations

You can evaluate the performance of models on all sorts of other factors. These might include:

- Explained variance
- Within-sample predictive accuracy
- Out-of-model forecasting

You can add a parsimony discount factor (or use things like AIC or BIC). The burden is on the researcher to identify a characteristic that is appropriate and a way to measure performance on that characteristic. The trick is to make sure your simulation is doing what you think it is doing.

Simulation Error

Simulation error can emerge from a number of places:

- The most common is operator error: you make a mistake in your program or in your logic.
- You stumble across an oddity in the pseudo-random number generator. I generally run simulations several times starting from different seeds to guard against this.
- Simulations themselves are probabilistic. You randomly draw some finite sample of data, and you randomly draw some finite number of those samples. Larger N at either stage can have implications for your study, though some might limit the idea of simulation error just to the number of samples you draw.

Other Reasons to do Simulations

- Evaluate distributional assumptions of estimators
- Evaluate the range of DGP's that might produce a variable
- Evaluate the behavior of a statistic that has no or weak support from analytic theory
- Assess the robustness of sample estimates to different distributional assumptions

Distributional Assumptions

Monte Carlo simulations are well suited to evaluating the distributional assumptions of models. Since you control the DGP, you can vary the distributional assumption and observe if/how the results change. The important question is often one of magnitude. Some examples follow.

Normal Distribution in OLS

OLS assumes the residuals of the model are drawn from a normal distribution. What if the distribution has high kurtosis? You can look at different distributions like the Laplace, or you can vary kurtosis using the Student's t at different degrees of freedom or the Pearson Type VII. What if the distribution is skewed? You can draw a vector of size N from a chi-squared distribution, then standardize the values of that vector. The result will be a vector of random observations with a mean of zero and a variance of 1, but with a positive skew that shrinks as the degrees of freedom of the chi-squared distribution grow (see the sketch below the figure).

Figure: Simulated Standardized Chi-Square Distributions, for DF = 1, 2, 5, and 20.
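A minimal sketch of the standardization step (my reconstruction): a chi-squared draw has mean df and variance 2*df, so subtracting df and dividing by sqrt(2*df) yields mean-zero, variance-one draws that keep the positive skew.

set.seed(808)
n <- 10000
df <- 2
e <- rchisq(n, df = df)
e.std <- (e - df) / sqrt(2 * df)   # standardized: mean 0, variance 1, skewed
c(mean(e.std), var(e.std))         # approximately 0 and 1
# e.std could now replace rnorm() as the error term in an OLS simulation.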
Distributional Assumptions

Sometimes we have multiple estimators we could use to estimate a model that vary in their distributional assumptions. How can we tell which one to use? An example is Jeff's work on using OLS or median regression (MR) to estimate a linear model. OLS estimates the conditional mean of Y and assumes the residuals are drawn from a normal distribution. MR estimates the conditional median of Y and assumes the residuals are drawn from a Laplace distribution. We'll save it for lab.

The Beta Distribution

The Beta distribution is particularly useful if you want to explore a range of distributions over the 0-1 space. The Beta distribution is governed by two parameters, often called a and b, or α and β.

- A Beta(α = 1, β = 1) is the Uniform(0,1) distribution
- A Beta(α < 1, β < 1) is U-shaped
- A Beta(α < 1, β ≥ 1) is strictly decreasing
- A Beta(α > 1, β > 1) is unimodal
- A Beta where both α and β are positive, but with α < β, will have positive skew; α > β will have negative skew

This makes it extremely flexible in exploring how estimators behave over different distributional shapes.

Figure: Examples of different Beta distributions.

Substantive Example using Beta

Mooney (1997, pp. 72-77). Lijphart and Crepaz (1991) score the United States at −1.341 on a standardized scale of corporatism. According to Lijphart and Crepaz (1991, p. 235), corporatism "refers to an interest group system in which groups are organized into national, specialized, hierarchical and monopolistic peak organizations." Since this is a standardized score, it should have a mean of zero and a standard deviation of 1. The question is: is the level of corporatism in the U.S. significantly lower than average?

Example (2)

There are two problems: we don't have a good theory about the proper probability distribution for the DGP, and even if we did, they only measured 18 countries. Thus, an analytic approach is unwise, because such approaches depend on strong theory and/or large samples (i.e., asymptotic properties). Mooney shows that we can use a simulation to get a sense of how likely a score of −1.341 is and what sorts of probability distributions for the DGP are likely or unlikely to produce such scores.

Example (3)

Mooney's simulation:

- Defines a range of Beta distributions
- Draws a very large sample from each distribution
- Standardizes the sample (thus, a mean of zero, like the original scale)
- Records attributes of the sample (level of α and β, kurtosis, and skewness)
- Computes the proportion of observations that fall below −1.341
- Treats that proportion as the probability of Type I error (i.e., the level of statistical significance)

Example (4)

I ran this simulation ranging both α and β from 1 through 30. Note: a Beta(1,1) is a uniform distribution, while a Beta(30,30) is effectively a normal distribution; those in between have various levels of skewness and kurtosis. I drew samples of 10,000 from each distribution. I plotted the resulting levels of Type I error as a function of these attributes of the various Beta distributions. (The core computation is sketched after the results below.)

Figure: Mooney Replication, Slides 1-6. The probability of Type I error plotted against the level of A (α), the level of B (β), both jointly, the level of kurtosis, and the level of skewness.

What Did We Learn?

Under a wide range of distributions, a score of −1.341 or lower occurs more than 5% of the time in large samples. A score of −1.341 is rare only when α is relatively low, and this rarity is not responsive to β. More specifically, such a score is most likely to be rare when there is a positive skew to the DGP. This makes sense, since a variable with a positive skew has a short tail on the left and a long tail on the right. If the mean is 0, then a short tail on the negative side makes −1.341 relatively rare. Is the U.S. significantly below average? Only if there is a strong positive skew to the DGP.
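A minimal sketch of the core computation in Mooney's simulation (my reconstruction; the three example (α, β) pairs are illustrative choices, not values from the slides).

set.seed(1791)
type1.rate <- function(alpha, beta, n = 10000) {
  x <- rbeta(n, alpha, beta)       # a large sample from one Beta DGP
  z <- (x - mean(x)) / sd(x)       # standardized, like the corporatism scale
  mean(z <= -1.341)                # proportion at or below the U.S. score
}
type1.rate(1, 1)     # uniform DGP
type1.rate(2, 10)    # positively skewed DGP: -1.341 becomes rare
type1.rate(30, 30)   # approximately normal DGP

Looping type1.rate() over a grid of α and β values, while also recording each sample's skewness and kurtosis, yields plots like those just described.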
Statistics with No Analytic Support

Some statistics have weak or no theoretical/analytic support at all, and many others have weak support in small samples. Mooney (1997) notes a few:

- The ratio of two correlated regression coefficients (Bartels 1993)
- Jackman's (1994) estimator of legislative vote-to-seat bias
- The difference between two medians

You know enough now to imagine the approach: simulate data with plausible characteristics (i.e., define the systematic and stochastic components of the DGP); compute the statistic in question; examine the simulated distribution of the statistic; then alter one feature at a time in your DGP, repeat the simulation, and observe any changes in the pattern describing your statistic.

Your Results and Your Data

Another great use of simulations centers on evaluating the robustness of your findings from the analysis of your sample of data. In this sort of study, your sample data and initial analysis provide the information you need to define the population DGP for the simulation study. Use the actual values of the X's in your data, or generate X's that look like the X's observed in your sample. Use your OLS estimates of the β's as the values of your population parameters. Simulate the stochastic component of Y based on the observed residuals of your model. Use all of this to generate simulated samples of Y. Your simulation then re-runs your analysis using your X's and the simulated Y's.

Your Results and Your Data (2)

You can evaluate the simulation in two basic ways: does your simulation recover the "true" parameters, and how similar are your simulated Y's to your actual observed Y's? Of course, the real power comes when you begin to manipulate features of your simulation and then re-evaluate the performance of your statistical analysis as noted above. Features you might vary include N, attributes of X, attributes of the stochastic component of the model, and so on. This allows you to determine how sensitive your original results are to different assumptions or different features in the sample of data you have.

Other Kinds of Data

Everything thus far has involved continuous variables and continuous probability distributions. Of course, there are lots of variables (and associated probability distributions) that are not continuous in the population DGP, or are at least not observed as continuous. Common ones include:

- Dichotomous variables
- Ordered categorical variables
- Unordered categorical variables
- Count variables

And other kinds of data structures, including:

- Clustered/multi-level data
- Panel and time-series cross-section (TSCS) data

We'll let Jeff do that in lab! (except ...)

Time Series Cross Section Data

TSCS data have observations for the same set of multiple units across multiple time periods (e.g., 50 states over 30 years). There is tremendous debate on the proper way to analyze such data: see the Political Analysis special issue (2007, 15(2)) and the Political Analysis symposium on fixed-effects vector decomposition (2011, 19(2)). A simple question: does a lagged value of Y capture unit fixed effects?

TSCS (2)

The TSCS model looks like this:

Y_it = X_it β + e_it   (8)

with a residual like this:

e_it = μ_i + α_t + ε_it   (9)

Unit effects capture the history of each unit. They may be viewed as fixed or random (too much to sort out now). But does the previous value of Y also capture that history?

A Simulation Study

I simulated Y_it for 50 units over 50 years. I set Y_it as a function of its own past value (parameter = 0.5). I varied the degree of unit effects by adding a random unit effect drawn from a normal distribution with a zero mean and a variance that ranged from 0 to 4 in increments of 0.5. I drew 1,000 samples at each level of unit effect. I estimated two models: Y regressed just on its lagged value, and Y regressed on its lagged value plus a full set of unit fixed effects. I plot the simulated parameters operating on the lagged value of Y for both models at each level of unit effect. (One iteration of this design is sketched below.)
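A minimal sketch of one iteration of this design (my reconstruction; details such as starting each series at zero rather than discarding burn-in periods are assumptions).

set.seed(1950)
n.units <- 50; n.years <- 50
rho <- 0.5                        # true coefficient on lagged Y
unit.var <- 2                     # one of the unit-effect variance levels
mu <- rnorm(n.units, 0, sqrt(unit.var))   # random unit effects
Y <- matrix(0, nrow = n.units, ncol = n.years)
for (t in 2:n.years) {
  Y[, t] <- rho * Y[, t - 1] + mu + rnorm(n.units, 0, 1)
}
d <- data.frame(y     = c(Y[, 2:n.years]),
                y.lag = c(Y[, 1:(n.years - 1)]),
                unit  = factor(rep(1:n.units, times = n.years - 1)))
coef(lm(y ~ y.lag, data = d))["y.lag"]          # no unit effects: biased upward
coef(lm(y ~ y.lag + unit, data = d))["y.lag"]   # unit fixed effects: near 0.5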
Figure: Density plots of the estimated coefficient on lagged Y (B1) at unit-effect variances from 0.0 to 4.0, comparing the model with no unit effects to the model with unit fixed effects.

Results of TSCS Simulation

Failure to account for fixed effects when they are present results in positive bias in the estimate of the coefficient operating on the lagged value of Y. Not shown, but it is clear that controlling for fixed effects when they are present significantly improves the fit of the model. In other words, just including the lagged value of Y is NOT sufficient to capture unit effects. There is some evidence that the coefficient operating on the lagged value of Y is biased slightly downward when a full set of fixed effects is included. Variance in the parameter is larger when fixed effects are included (likely due to multicollinearity).

Wrapping Up Monte Carlos

- Monte Carlo simulations as experiments
- A vast array of applications for substantive and methodological research
- A great teaching tool
- Can get very complex very quickly
- Be careful: make sure your simulation is doing what you think it is doing