PHP2510 Lab 2

Objectives: At the end of this session, our goal is for you to
   Be able to generate a random sample of a larger dataset
   Understand the concepts of random sample, population mean, and sample mean

Commands used in this lab:
   tab
   set seed <new #>
   sample <% of sample you want>
   display
   tabstat
   mean

I. Generating random samples (example and exercise combined)

1. Generating a random sample of a larger database

In this lab you will create multiple 0.1% samples of the “census” database in order to illustrate some parts of probability. You will compare the mean and range of variables from your samples with those from the original dataset.

A. Open the census data that you have saved and find answers to the following questions using STATA.

i) How many people are in this town?
   N = 10000

ii) What % of the census data is female? Recall that the variable “gender” is coded as 1=male and 0=female.
   49.61%
   If I randomly pick one person from the census, what is the probability that that person is female?
   0.4961

   . tab gender

        gender |      Freq.     Percent        Cum.
   ------------+-----------------------------------
             0 |      4,961       49.61       49.61
             1 |      5,039       50.39      100.00
   ------------+-----------------------------------
         Total |     10,000      100.00

iii) What is the average age and the range of age? (This average is referred to as the “population mean”.)

   . tabstat age, statistics( mean range ) columns(variables)

      stats |       age
   ---------+----------
       mean |  37.58071
      range |  109.3736
   --------------------

iv) What is the average weight and the range of weight?

      stats |    weight
   ---------+----------
       mean |   70.5642
      range |  185.6118
   --------------------

B. Create a 0.1% random sample

   . sample .1
   . tab gender

        gender |      Freq.     Percent        Cum.
   ------------+-----------------------------------
             0 |          5       50.00       50.00
             1 |          5       50.00      100.00
   ------------+-----------------------------------
         Total |         10      100.00

i) Suppose the question of interest is the “number of females in your sample”. What is the sample space? What do you observe?
   The sample space is {0,1,2,3,4,5,6,7,8,9,10}. I observe 5.

ii) What is the average age in your sample? Is it the same as the “population mean” you found in A iii)?
   40.57.
   It is different from the “population mean”.

iii) What is the average weight in your sample? Is it the same as the “population mean” you found in A iv)?
   70.38. It is different from the “population mean”.

Now you will each create 7 independent random samples and record the number of females in each sample using the table below:

   Trial    |  1   2   3   4   5   6   7
   # female |  5   7   3   6   5   3   3

Reload the original full dataset of 10000 observations (use the filename that you saved the data under):

   . use "C:\Users\Xiaoxing Cheng\Desktop\2510TA\Lab2\newcensus2500.dta", clear

Setting an individual seed. Use your birth date (MMDDYYYY) as the seed.

   . use "C:\Users\Xiaoxing Cheng\Desktop\2510TA\Lab2\newcensus2500.dta", clear
   . set seed put_your_birthday_here
   . sample .1
   (9990 observations deleted)
   . tab gender

(This is the example from the instructor’s random seed; your result may be different.)

        gender |      Freq.     Percent        Cum.
   ------------+-----------------------------------
             0 |          5       50.00       50.00
             1 |          5       50.00      100.00
   ------------+-----------------------------------
         Total |         10      100.00

You only have to set this seed once. From then on, STATA will use its algorithm to generate a new independent random sample during this session each time you reload and sample. If you keep giving STATA the same seed before each sample, it will draw the same random sample each time. (That does not sound so “random”, does it? In fact, all computer-generated random numbers are pseudo-random.)

Now you can continue to fill in the table provided.

   . use "C:\Users\Xiaoxing Cheng\Desktop\2510TA\Lab2\newcensus2500.dta", clear
   . sample .1
   (9990 observations deleted)
   . tab gender

What is the average number of females from the 7 trials? To use STATA as a calculator, you can use display. For example,

   . display (5+7+3+6+5+3+3)/7

which gives about 4.57.

The average age and weight in my 7 trials using seed 3427897 are as follows:

   Age:     35.03   35.82   43.31   27.33   38.24   36.47   38.57
   Weight:  64.04   61.98   70.90   62.79   75.07   77.85   55.04

iv) What are the minimum and maximum number of females seen in your table?
   Min = 3 and Max = 7

v) What is the range of the “average age in a sample of 10” in your 7 experiments? Compare with the range in the entire population.
   The range of the average age is 43.31 - 27.33 = 15.98, much narrower than the range of age in the population (109.37).

vi) What is the range of the “average weight” in your 7 experiments? Compare with the range in the entire population.
   The range of the average weight is 77.85 - 55.04 = 22.81, much narrower than the range of weight in the population (185.61).
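
The seven reload-and-sample trials above can also be run in one pass. The following is only a sketch, not part of the lab's required steps: the file name "census.dta" and the seed 12345 are placeholders (use the name you saved the data under and your own birth date), and the variable names gender, age, and weight are taken from the lab.

```stata
* Sketch only: repeats the reload-and-sample step 7 times.
* "census.dta" is a placeholder; use the file name you saved the data under.
* 12345 is a placeholder seed; the lab asks for your birth date (MMDDYYYY).
set seed 12345

forvalues trial = 1/7 {
    * Reload the full dataset, then keep a 0.1% random sample (10 of 10,000)
    quietly use "census.dta", clear
    quietly sample .1

    * gender is coded 0 = female; count stores the number of matches in r(N)
    quietly count if gender == 0
    display "Trial `trial': females = " r(N)

    * Sample means of age and weight for this trial
    tabstat age weight, statistics(mean)
}
```

The seed is set once, before the loop, so each pass draws a different sample while the whole run stays reproducible. Setting the same seed inside the loop would instead return the same 10 people in every trial.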