PHP2510 Lab2
Objectives:
At the end of this session, our goal is for you to:
Be able to generate a random sample from a larger dataset
Understand the concepts of random sample, population mean, and sample mean
Commands used in this lab
tab
set seed <new #>
sample <% of sample you want>
display
tabstat
mean
I. Generating random samples. (Example and exercise combined)
1. Generating a random sample from a larger dataset.
In this lab you will create multiple 0.1% samples of the “census” dataset to illustrate some concepts
in probability. You will compare the mean and range of variables in your samples with those in the
original dataset.
A. Open the census data that you have saved and answer the following questions using STATA.
i) How many people are in this town?
N = 10000
ii) What % of the census data is female? Recall that the variable “gender” is coded as 1=male
and 0=female.
49.61%
If I randomly pick one person from the census, what is the probability that that person is female?
0.4961
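. tab gender

     gender |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      4,961       49.61       49.61
          1 |      5,039       50.39      100.00
------------+-----------------------------------
      Total |     10,000      100.00

iii) What is the average age and the range of age? (This average is referred to as the “population mean”.)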
. tabstat age, statistics( mean range ) columns(variables)

   stats |       age
---------+----------
    mean |  37.58071
   range |  109.3736
--------------------
iv) What is the average weight and the range of weight?
. tabstat weight, statistics( mean range ) columns(variables)

   stats |    weight
---------+----------
    mean |   70.5642
   range |  185.6118
--------------------
B. Create a 0.1% random sample
. sample .1
. tab gender

     gender |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          5       50.00       50.00
          1 |          5       50.00      100.00
------------+-----------------------------------
      Total |         10      100.00

i) Suppose the question of interest is the “number of females in your sample”. What is the
sample space? What do you observe?
{0,1,2,3,4,5,6,7,8,9,10}. I observe 5.
ii) What is the average age in your sample? Is it the same as the “population mean” you found
in A iii)?
40.57. It is different from the “population mean”.
iii) What is the average weight in your sample? Is it the same as the “population mean” you
found in A iv)?
70.38. It is different from the “population mean”.
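The comparison between a population mean and a sample mean can be sketched in a few lines. This is only an illustration under stated assumptions: it builds a synthetic stand-in population (ages drawn uniformly between 0 and 100, which is not how the census data are distributed), and the seed 2510 and the scalar names pop_mean, samp_mean and samp_n are arbitrary choices, not part of the lab materials.

```stata
* Synthetic stand-in population: 10,000 ages drawn uniformly between 0 and 100
clear
set seed 2510
set obs 10000
generate age = 100 * runiform()

* Population mean: uses every observation
summarize age, meanonly
scalar pop_mean = r(mean)

* Sample mean: uses only the observations kept by sample .1
preserve
sample .1
summarize age, meanonly
scalar samp_mean = r(mean)
scalar samp_n = r(N)
restore

display "Population mean = " pop_mean
display "Sample mean     = " samp_mean
display "Difference      = " samp_mean - pop_mean
```

The preserve and restore pair keeps the full dataset available after sampling, so you do not have to reload the file by hand. The difference printed on the last line will usually be nonzero, which is the same thing you saw in ii) and iii).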
Now you will each create 7 independent random samples and record the number of females in each
sample using the table below:
Trial       1   2   3   4   5   6   7
# female    5   7   3   6   5   3   3
Reload the original full dataset of 10,000 observations (use the filename that you saved the data under):
. use "C:\Users\Xiaoxing Cheng\Desktop\2510TA\Lab2\newcensus2500.dta", clear
Setting an individual seed.
Use your birth date (MMDDYYYY) as the seed.
. use "C:\Users\Xiaoxing Cheng\Desktop\2510TA\Lab2\newcensus2500.dta", clear
. set seed put_your_birthday_here
. sample .1
(9990 observations deleted)
. tab gender
(This is the example from the instructor’s random seed; your result may be different.)

     gender |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          5       50.00       50.00
          1 |          5       50.00      100.00
------------+-----------------------------------
      Total |         10      100.00

You only have to set the seed once per session. After that, each time you reload the data and run
sample, STATA continues its random-number sequence and draws a new, independent random sample.
If you reset the same seed before each sample, STATA will draw the same random sample every time.
(That does not sound so “random”, does it? In fact, random numbers produced by a computer
algorithm are pseudo-random.)
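To see what reusing a seed does, here is a minimal sketch. It assumes a small synthetic dataset in place of the census file; the variable id, the seed 12345, and the scalar names first_mean and second_mean are made up for illustration.

```stata
* Build a stand-in dataset of 10,000 people (synthetic, not the census file)
clear
set obs 10000
generate id = _n

* First draw: set the seed, then sample
preserve
set seed 12345
sample .1
summarize id, meanonly
scalar first_mean = r(mean)
restore

* Second draw: reset the SAME seed before sampling
preserve
set seed 12345
sample .1
summarize id, meanonly
scalar second_mean = r(mean)
restore

* Same seed and same data give the same observations, so the two means match
display first_mean
display second_mean
```

If you delete the second set seed line, the second draw continues the random-number sequence instead of restarting it, and the two means will generally differ.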
Now you can continue to fill in the table provided.
. use "C:\Users\Xiaoxing Cheng\Desktop\2510TA\Lab2\newcensus2500.dta", clear
. sample .1
(9990 observations deleted)
. tab gender
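If you would rather not repeat the reload, sample, and tab steps by hand, the seven trials can be sketched as a loop. This is a hedged illustration, not part of the lab: it uses a synthetic stand-in dataset in which gender is generated at random (coded 1=male, 0=female as in the census file), and the seed 20240101 is arbitrary.

```stata
* Synthetic stand-in for the census file: 10,000 people, gender coded 1=male, 0=female
clear
* Arbitrary seed, set once for the whole session
set seed 20240101
set obs 10000
generate gender = runiform() < 0.5039

* Draw 7 independent 0.1% samples and count the females in each
forvalues trial = 1/7 {
    * Remember the full dataset
    preserve
    * Keep a random 0.1% of the observations (10 people)
    quietly sample .1
    quietly count if gender == 0
    display "Trial `trial': # female = " r(N)
    * Go back to all 10,000 observations
    restore
}
```

Because the seed is set only once, each pass through the loop continues the same random-number sequence and so produces a different sample, which matches the behavior described above.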
What is the average number of females across the 7 trials? To use STATA as a calculator, you can use
display. For example,
. display (5+7+3+6+5+3+3)/7
The average age and weight in my 7 trials using seed 3427897 are as follows:
Age:      35.03  35.82  43.31  27.33  38.24  36.47  38.57
Weight:   64.04  61.98  70.90  62.79  75.07  77.85  55.04
iv) What are the minimum and maximum numbers of females seen in your table?
Min=3 and Max=7
v) What is the range of the “average age in a sample of 10” in your 7 experiments? Compare
with the range in the entire population.
The range of the average age is 43.31 - 27.33 = 15.98, much narrower than the range in the population (109.37).
vi) What is the range of the “average weight” in your 7 experiments? Compare with the range in
the entire population.
The range of the average weight is 77.85 - 55.04 = 22.81, much narrower than the range in the population (185.61).
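The range arithmetic can also be done in STATA itself instead of by hand. The sketch below uses the seven sample averages reported above and the population ranges from part A; the scalar names age_range and weight_range are arbitrary.

```stata
* Average age and weight in each of the 7 trials, copied from the results above
scalar age_range = max(35.03, 35.82, 43.31, 27.33, 38.24, 36.47, 38.57) - min(35.03, 35.82, 43.31, 27.33, 38.24, 36.47, 38.57)
scalar weight_range = max(64.04, 61.98, 70.90, 62.79, 75.07, 77.85, 55.04) - min(64.04, 61.98, 70.90, 62.79, 75.07, 77.85, 55.04)

display "Range of average age:        " age_range
display "Range of average weight:     " weight_range

* Population ranges from part A, for comparison
display "Population range of age:     " 109.3736
display "Population range of weight:  " 185.6118
```

Averages of 10 people vary much less than individual values do, which is why both ranges of averages come out far smaller than the population ranges.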