Labor Economics
Exercise session # 1
Artificial Data Generation
TA: Natalia Shestakova
October, 2007
Overview
Generating random variables
Graphing
Throwing seeds
Generating random dummy variables from sample
Drawing from multivariate distributions
Loops and distribution of estimated coefficients
Generating random variables-1
Random-number functions:

uniform() returns uniformly distributed pseudorandom
numbers on the interval [0,1). uniform() takes no
arguments, but the parentheses must be typed.

invnormal(uniform()) returns normally distributed
random numbers with mean 0 and standard deviation 1
(the examples in this session use invnorm(), the older name of the same function).
Reminder:

Uniform distribution: in the discrete case, all values of a finite set of possible values are equally
probable; in the continuous case, all intervals of the same length are equally probable.
Normal distribution: a family of continuous probability distributions. Each member of
the family is defined by two parameters, location and scale: the mean ("average")
and the standard deviation ("variability"), respectively.
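Why invnormal(uniform()) gives standard normal draws, and how draws can be shifted and rescaled, can be written out with three standard results. The symbols U, Z, a, b, mu and sigma are introduced here for illustration only; Phi^{-1} is the inverse of the standard normal distribution function, which is what invnormal() computes.

```latex
\begin{aligned}
U \sim \mathrm{Uniform}[0,1) \;&\Rightarrow\; Z = \Phi^{-1}(U) \sim N(0,1), \\
U \sim \mathrm{Uniform}[0,1) \;&\Rightarrow\; a + (b-a)\,U \sim \mathrm{Uniform}[a,b), \\
Z \sim N(0,1) \;&\Rightarrow\; \mu + \sigma Z \sim N(\mu,\sigma^{2}).
\end{aligned}
```

Note that the normal draw is multiplied by the standard deviation sigma, not by the variance.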
Generating random variables-2
Examples:
500 draws from the uniform distribution on [0,1)
set obs 500
gen x1 = uniform()
500 draws from the standard normal distribution, mean 0, variance 1
gen x2 = invnorm(uniform())
500 draws from the distribution N(1,2), i.e. mean 1 and variance 2 (multiply by the standard deviation, sqrt(2))
gen x3 = 1 + sqrt(2)*invnorm(uniform())
500 draws from the uniform distribution between 3 and 12
gen x4 = 3 + 9*uniform()
500 observations of a variable that is a linear combination of other variables
gen z = 4 - 3*x4 + 8*x2
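More recent Stata releases also provide runiform() and rnormal(), which can replace the uniform() and invnorm(uniform()) constructions. The following is a minimal sketch of the same kind of draws with those functions; the variable names u, n1, n2, u2 and the seed are illustrative assumptions, and N(1,2) is read as mean 1, variance 2.

```stata
clear
* assumed seed, only so that the draws can be reproduced
set seed 12345
set obs 500
* uniform draws on the unit interval
gen u  = runiform()
* standard normal draws
gen n1 = rnormal()
* normal draws with mean 1 and standard deviation sqrt(2), i.e. variance 2
gen n2 = rnormal(1, sqrt(2))
* uniform draws between 3 and 12
gen u2 = 3 + 9*runiform()
* compare sample means and standard deviations with the intended values
summarize u n1 n2 u2
```

The summarize output should show sample moments close to, but not exactly equal to, the intended population values.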
Graphing
[Figure: histograms of x1 (Frequency and Density scales) and further panels with x1, cx1 and x2 on the axes; the plots are not reproduced in this transcript.]
Throwing seeds
=> Setting the seed lets you regenerate exactly the same sample at any time (z1 and z2 below are identical, z3 is different):
set obs 500
set seed 2
gen z1 = invnorm(uniform())
set seed 2
gen z2 = invnorm(uniform())
set seed 19840607
gen z3 = invnorm(uniform())
dotplot z1 z2 z3
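One way to confirm the effect of the seed without looking at a plot is to compare the variables directly. This is a small sketch that regenerates the three variables and checks them with assert and count; it uses only the seeds shown above.

```stata
clear
set obs 500
set seed 2
gen z1 = invnorm(uniform())
* same seed again: the random-number sequence restarts
set seed 2
gen z2 = invnorm(uniform())
* different seed: a different sequence
set seed 19840607
gen z3 = invnorm(uniform())
* stops with an error if any observation of z1 differs from z2
assert z1 == z2
* number of observations in which z1 and z3 differ
count if z1 != z3
```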
Generating random dummy variables
from sample
Task: generate a variable that indicates whether an individual smokes
(smoke=1) or does not smoke (smoke=0).
(a) in period 1, assume that (s)he smokes with probability 30%,
(b) in each of the following 30 periods, there is a 65% chance that a
smoker keeps smoking and a 5% chance that a non-smoker starts
smoking
Solution:
(a) Note that a variable uniformly distributed on [0,1) is less than 0.3 with
30% probability. Then:
gen smoke = uniform()<.3
(b) First, give every individual an ID (pid) and create one observation per
period for each of them (initially these are copies of the period-1 observation).
Then update smoke period by period within each ID: the probability of smoking is
0.65 if the individual smoked in the previous period and 0.05 otherwise. Because
replace works through the observations in order, smoke[_n-1] already holds the
updated value for the previous period. The data must be sorted by pid, hence bysort:
bysort pid: replace smoke=uniform()<(.05+.6*smoke[_n-1]) if _n>1
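The steps of the solution can be put together as a complete sketch. The number of individuals (500), the seed, the variable name period and the choice of 31 observations per person (period 1 plus the 30 following periods) are assumptions made for illustration, not part of the original slides.

```stata
clear
* assumed seed and sample size
set seed 12345
set obs 500
* one ID per individual
gen pid = _n
* (a) period 1: smokes with probability 0.3
gen smoke = uniform() < .3
* create 31 identical copies per individual: period 1 plus 30 later periods (assumption)
expand 31
bysort pid: gen period = _n
* (b) update period by period: 0.65 if smoked last period, 0.05 otherwise
bysort pid (period): replace smoke = uniform() < (.05 + .6*smoke[_n-1]) if _n > 1
* share of smokers in each period
tabulate period, summarize(smoke)
```

The share of smokers starts near 0.3 and should drift down towards the long-run level implied by the two transition probabilities, 0.05/(0.05+0.35) = 0.125.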
Drawing from multivariate distributions
Task: generate a number of variables that are correlated with
each other (follow a joint, multivariate distribution)
Solution:
(a) drawnorm: draws a random sample from a multivariate normal distribution with
the desired means and covariance (or correlation) matrix
drawnorm x y, n(1000) means(m) corr(C)
(b) corr2data: creates an artificial dataset that reproduces the specified means and
correlation structure exactly (so it is not a random sample from an underlying
population with the specified summary statistics)
corr2data x y, n(1000) means(m) corr(C)
Note: the matrices m and C can be defined with the matrix command (abbreviated mat)
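A minimal sketch of both commands, including the definition of the matrices. The means (1 and 2), the correlation of 0.5 and the seed are assumptions chosen for illustration.

```stata
clear
* assumed seed
set seed 12345
* row vector of means and 2x2 correlation matrix (illustrative values)
matrix m = (1, 2)
matrix C = (1, .5 \ .5, 1)
* random sample: the sample correlation is close to, but not exactly, 0.5
drawnorm x y, n(1000) means(m) corr(C)
correlate x y
* artificial data: the sample correlation reproduces 0.5 (up to rounding)
clear
corr2data x y, n(1000) means(m) corr(C)
correlate x y
```

Comparing the two correlate outputs shows the difference between the commands: drawnorm gives sampling variation around the target, corr2data matches it.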
Loops and distribution of estimated
coefficients
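Why use loops?
-> a single randomly drawn sample is unlikely to reproduce the true (population) relationship exactly, so one estimate can be far from the true coefficient
-> drawing many samples, estimating the coefficient of interest in each, and averaging the estimates gives a value closer to the true coefficient (for an unbiased estimator); the collection of estimates also shows their sampling distribution
How to use loops?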
gen b1=0
/* all observations of b1 are assigned the value 0; the data set must already contain at least 500 observations */
local i=1
/* i is the counter used in the following loop */
set more off
/* so that we do not have to press a key every time the output fills the screen */
while `i'<=500 {
/* start a loop of 500 repetitions */
keep b1
/* drop every variable except b1 so the data can be generated again (drop _all would also delete b1) */
/* generate random variables here */
/* run the regression here */
scalar d =_b[x1]
/* store the estimated coefficient on x1 in a scalar */
replace b1 = scalar(d) if _n==`i'
/* put the coefficient estimated in the ith regression into the ith observation of b1 */
local i=`i'+1
/* add 1 to the counter */
}
/* end of the loop */
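The loop skeleton can be filled in as follows. This is only a sketch under an assumed data-generating process, y = 2 + 3*x1 + e with standard normal e; the variable y, the true coefficient 3, the sample size of 500 and the seed are illustrative choices, not taken from the slides.

```stata
clear
* assumed seed
set seed 12345
set more off
* 500 observations: one slot in b1 per replication, and the sample size of each draw
set obs 500
gen b1 = .
local i = 1
while `i' <= 500 {
    * remove the previous sample but keep the stored coefficients
    keep b1
    * draw a new sample from the assumed process y = 2 + 3*x1 + e
    gen x1 = uniform()
    gen y  = 2 + 3*x1 + invnorm(uniform())
    quietly regress y x1
    * store the ith estimate in the ith observation of b1
    quietly replace b1 = _b[x1] if _n == `i'
    local i = `i' + 1
}
* the mean of the 500 estimates should be close to the assumed true value 3
summarize b1
```

Running histogram b1 afterwards displays the distribution of the estimated coefficients around the true value.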
Any questions???