Labor Economics
Exercise session #1: Artificial Data Generation
TA: Natalia Shestakova
October 2007

Overview
Generating random variables
Graphing
Throwing seeds
Generating random dummy variables from sample
Drawing from multivariate distributions
Loops and distribution of estimated coefficients

Generating random variables-1
Random-number functions:
uniform() returns uniformly distributed pseudorandom numbers on the interval [0,1). uniform() takes no arguments, but the parentheses must be typed.
invnormal(uniform()) returns normally distributed random numbers with mean 0 and standard deviation 1.
Reminder:
Uniform distribution: in the discrete case, all values of a finite set of possible values are equally probable; in the continuous case, all intervals of the same length are equally probable.
Normal distribution: a family of continuous probability distributions. Each member of the family is defined by two parameters, location and scale: the mean ("average") and the standard deviation ("variability"), respectively.

Generating random variables-2
Examples (invnorm() is the older name of invnormal()):
500 draws from the uniform distribution on [0,1):
    set obs 500
    gen x1 = uniform()
500 draws from the standard normal distribution (mean 0, variance 1):
    gen x2 = invnorm(uniform())
500 draws from a normal distribution with mean 1. The multiplier is the standard deviation: the command below yields standard deviation 4 (variance 16); for N(1,2) with variance 2, the multiplier would be sqrt(2):
    gen x3 = 1 + 4*invnorm(uniform())
500 draws from the uniform distribution between 3 and 12:
    gen x4 = 3 + 9*uniform()
500 observations of a variable that is a linear combination of other variables:
    gen z = 4 - 3*x4 + 8*x2

Graphing
[Figure: graphs of the generated variables; surviving axis labels: x1, x2, cx1, Frequency, Density]

Throwing seeds
Setting a seed allows you to generate a particular sample again at any time:
    set obs 500
    set seed 2
    gen z1 = invnorm(uniform())
    set seed 2
    gen z2 = invnorm(uniform())
    set seed 19840607
    gen z3 = invnorm(uniform())
    dotplot z1 z2 z3
z1 and z2 are identical because they are drawn after setting the same seed; z3 is a different sample.

Generating random dummy variables from sample
Task: generate a variable that indicates whether an individual smokes (smoke=1) or does not (smoke=0).
(a) In period 1, assume that (s)he smokes with probability 30%.
(b) In each of the following 30 periods, there is a 65% chance that a smoker keeps smoking and a 5% chance that a non-smoker starts smoking.
Solution:
(a) Note that a variable uniformly distributed on [0,1) is less than 0.3 with probability 30%. Then:
    gen smoke = uniform()<.3
(b) First, give every individual an ID (pid) and create observations for 30 years (initially they are all copies of the first one); then, step by step, update the probability of smoking in every year for every ID:
    by pid: replace smoke=uniform()<(.05+.6*smoke[_n-1]) if _n>1
The threshold is .05 when smoke[_n-1] is 0 and .05+.6=.65 when it is 1. The data must be sorted by pid for the by prefix to work.

Drawing from multivariate distributions
Task: generate several variables that are correlated with each other (i.e., that follow a multivariate distribution).
Solution:
(a) drawnorm: draws a sample from a multivariate normal distribution with the desired means and covariance (or correlation) matrix
    drawnorm x y, n(1000) means(m) corr(C)
(b) corr2data: creates an artificial dataset with a specified correlation structure (it is not a sample from an underlying population with the specified summary statistics)
    corr2data x y, n(1000) means(m) corr(C)
Note: the matrices m and C can be specified using mat (the matrix command).

Loops and distribution of estimated coefficients
Why use loops?
-> it is unlikely that one randomly drawn sample coincides with the real one
-> drawing many samples, estimating the coefficient of interest in each, and averaging these estimates brings the estimate closer to the true value
How to use loops?
    set obs 500                    /* b1 needs one observation per repetition */
    gen b1=0                       /* all observations of b1 are assigned the value 0 */
    local i=1                      /* i is the counter in the following loop */
    set more off                   /* so we do not have to hit a key every time the regression output fills the screen */
    while `i'<=500 {               /* start a loop of 500 repetitions */
        keep b1                    /* drop the previously generated variables so we can generate them again; drop _all would also delete b1 */
        /* generate random variables */
        /* regression */
        scalar d = _b[x1]          /* store the estimated coefficient on x1 in a scalar */
        replace b1 = scalar(d) if _n==`i'   /* put the coefficient estimated in the ith regression into the ith observation of b1 */
        local i=`i'+1              /* add 1 to the counter */
    }                              /* end of the loop */

Any questions?
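For reference, the loop skeleton above can be filled in as follows. This is only a minimal sketch under stated assumptions, not the exercise's official solution: the regression model y = 1 + 2*x1 - x2 + e, the variable names x2 and y, and the values of the matrices m and C are all invented for illustration. It also shows one way to define m and C with the matrix command before passing them to drawnorm. Newer Stata versions offer runiform() and rnormal() as replacements for uniform() and invnorm(uniform()).

```stata
clear
set more off
set seed 19840607                 // seed value reused from the slides
set obs 500                       // 500 observations per sample; b1 gets 500 slots

matrix m = (0, 0)                 // ASSUMED means of x1 and x2
matrix C = (1, .5 \ .5, 1)        // ASSUMED correlation matrix of x1 and x2

gen b1 = .                        // will hold one estimate per repetition

local i = 1
while `i' <= 500 {
    keep b1                               // discard the previous sample, keep the estimates
    drawnorm x1 x2, means(m) corr(C)      // no n(): variables are added to the current data
    gen y = 1 + 2*x1 - x2 + invnorm(uniform())   // ASSUMED true model, slope on x1 is 2
    quietly regress y x1 x2
    quietly replace b1 = _b[x1] if _n == `i'     // store the ith estimate in observation i
    local i = `i' + 1
}

summarize b1                      // the mean of b1 should be near the assumed slope of 2
```

Running histogram b1 afterwards displays the distribution of the estimated coefficient across the 500 samples.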