Homework 1
F0612006 陈旻 5061209173

1. Load data “cgss05.csv” into EViews. Obtain descriptive statistics for “income”, “edu”, and “expr”. The statistics should include number of observations, min, max, mean, median, std, skewness, kurtosis, quantile(0.25), quantile(0.75).

Solution:
Step 1: Select “File” → “New” → “Workfile” and set the workfile range.
Step 2: Select “File” → “Import” → “Read Text-Lotus-Excel” and choose “cgss05.csv”.
Step 3: Run the commands “income.hist”, “edu.hist”, and “expr.hist”. Each command produces a histogram together with a panel of descriptive statistics:

(1) Income:
[Figure: histogram of INCOME, horizontal axis from 0 to 400000]
Series: INCOME, Sample: 1 5778, Observations: 5778
  Mean          10165.39
  Median        6000.000
  Maximum       400000.0
  Minimum       30.00000
  Std. Dev.     14259.29
  Skewness      7.549850
  Kurtosis      132.1436
  Jarque-Bera   4070135.
  Probability   0.000000
The skewness of income is 7.549850 >> 0 and its kurtosis is 132.1436 >> 3. Income is therefore strongly right-skewed and heavy-tailed, and it does not follow a normal distribution.

(2) Education:
[Figure: histogram of EDU, horizontal axis from 0 to 20]
Series: EDU, Sample: 1 5778, Observations: 5778
  Mean          8.696262
  Median        9.000000
  Maximum       20.00000
  Minimum       0.000000
  Std. Dev.     4.234267
  Skewness      -0.386088
  Kurtosis      2.563698
  Jarque-Bera   189.3775
  Probability   0.000000
The skewness of education is -0.386088 < 0 and its kurtosis is 2.563698 < 3. The distribution is mildly left-skewed and slightly flatter than the normal distribution (a kurtosis below 3 means thinner tails and a lower peak, not a sharper one). Its shape is roughly normal, although the Jarque-Bera probability of 0.000000 means that exact normality is formally rejected.

(3) Experience:
[Figure: histogram of EXPR, horizontal axis from 0 to 45]
Series: EXPR, Sample: 1 5778, Observations: 5778
  Mean          20.14503
  Median        20.00000
  Maximum       47.00000
  Minimum       1.000000
  Std. Dev.     10.40013
  Skewness      0.075313
  Kurtosis      2.214287
  Jarque-Bera   154.0881
  Probability   0.000000
The skewness of experience is 0.075313 > 0 and its kurtosis is 2.214287 < 3. The distribution is very slightly right-skewed and flatter than the normal distribution. Its shape is roughly normal, although the Jarque-Bera probability of 0.000000 again rejects exact normality.

2. Generate a new variable that is the log of income, say, “logy”. Obtain histograms, densities, and QQ-plots (versus the standard normal distribution) for both “income” and “logy”. Compare the distributions of these two variables.

Solution:
Step 1: Run the command “genr logy = log(income)”.
Step 2: Select “View” → “Descriptive Statistics” → “Histogram and Stats” for both “income” and “logy”.
Step 3: Select “View” → “Distribution Graphs” → “Kernel Density” for both “income” and “logy”.
Step 4: Select “View” → “Distribution Graphs” → “Quantile-Quantile Plot” for both “income” and “logy”.
The resulting graphs are as follows:

(1) Histograms:

(i) Income
[Figure: histogram of INCOME, horizontal axis from 0 to 400000]
Series: INCOME, Sample: 1 5778, Observations: 5778
  Mean          10165.39
  Median        6000.000
  Maximum       400000.0
  Minimum       30.00000
  Std. Dev.     14259.29
  Skewness      7.549850
  Kurtosis      132.1436
  Jarque-Bera   4070135.
  Probability   0.000000
The skewness of income is 7.549850 >> 0 and its kurtosis is 132.1436 >> 3, so income does not follow a normal distribution.

(ii) Logy
[Figure: histogram of LOGY, horizontal axis from 3.75 to 12.50]
Series: LOGY, Sample: 1 5778, Observations: 5778
  Mean          8.636962
  Median        8.699515
  Maximum       12.89922
  Minimum       3.401197
  Std. Dev.     1.130683
  Skewness      -0.196602
  Kurtosis      2.833913
  Jarque-Bera   43.86347
  Probability   0.000000
The skewness of logy is -0.196602 < 0 and its kurtosis is 2.833913 < 3. Both values are close to the normal benchmarks of 0 and 3, so logy is far closer to a normal distribution than income is. It is slightly left-skewed and slightly flatter than the normal distribution, and the Jarque-Bera probability of 0.000000 shows that exact normality is still rejected.
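The Jarque-Bera values in the histogram panels above can be cross-checked by hand. The sketch below is written in Python rather than EViews and assumes the usual formula JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is the skewness and K the kurtosis, with a chi-square(2) reference distribution. The function names are illustrative and are not EViews commands.

```python
import math


def jarque_bera(n, skewness, kurtosis):
    """Jarque-Bera statistic from sample size, skewness, and kurtosis."""
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def jb_p_value(jb):
    """Upper-tail probability of a chi-square(2) variable: exp(-x / 2)."""
    return math.exp(-jb / 2.0)


# Skewness and kurtosis copied from the EDU and LOGY panels (5778 observations).
jb_edu = jarque_bera(5778, -0.386088, 2.563698)
jb_logy = jarque_bera(5778, -0.196602, 2.833913)

print(round(jb_edu, 2), jb_p_value(jb_edu))
print(round(jb_logy, 2), jb_p_value(jb_logy))
```

Under this formula the computed statistics agree with the reported 189.3775 and 43.86347 to about three decimal places. The p-values are far below any conventional significance level, which is why EViews prints 0.000000. With 5778 observations even small departures from skewness 0 and kurtosis 3 are enough to reject normality.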
(2) Densities:

(i) Income
[Figure: kernel density of INCOME (Epanechnikov, h = 2498.5), horizontal axis from 0 to 400000]

(ii) Logy
[Figure: kernel density of LOGY (Epanechnikov, h = 0.3984), horizontal axis from 4 to 12]

(3) QQ-plots:

(i) Income
[Figure: QQ-plot, normal quantiles (-4 to 4) against INCOME (0 to 500000)]
The points in the graph do not lie along a straight line, so income does not follow a normal distribution.

(ii) Logy
[Figure: QQ-plot, normal quantiles (-4 to 4) against LOGY (2 to 14)]
The points in the graph lie roughly along a straight line, so logy is approximately normally distributed.

3. Obtain scatter plots between “logy” and “edu”, between “logy” and “expr”.

Solution:
Step 1: Select “Quick” → “Graph” → enter “logy edu” → “Scatter Diagram”.
Step 2: Select “Quick” → “Graph” → enter “logy expr” → “Scatter Diagram”.
The resulting scatter diagrams are as follows:

(1) Between “logy” and “edu”
[Figure: scatter diagram of EDU (0 to 25) against LOGY (2 to 14)]
The graph shows an upward trend: the longer the subjects were educated, the larger logy tends to be. In other words, more years of education are associated with higher income.

(2) Between “logy” and “expr”
[Figure: scatter diagram of EXPR (0 to 50) against LOGY (2 to 14)]
No obvious trend can be seen in the graph, so experience does not appear to be directly related to logy (or income).

4. Select males from the sample. Obtain descriptive statistics and graphs for the subsample. (Note: you can use menu: Sample in the Workfile window to do the sample selection.)

Solution:
Step 1: Select “Sample” and enter “female = 0” in the “IF condition” box.
Step 2: Select “View” → “Descriptive Statistics” → “Histogram and Stats” for each of “income”, “edu”, and “expr”.
The resulting graphs are as follows:

(1) Income:
[Figure: histogram of INCOME for males, horizontal axis from 0 to 200000]
Series: INCOME, Sample: 1 5778, Observations: 2937
  Mean          11961.98
  Median        8000.000
  Maximum       200000.0
  Minimum       100.0000
  Std. Dev.     14441.47
  Skewness      4.680376
  Kurtosis      41.24098
  Jarque-Bera   189680.8
  Probability   0.000000
The skewness of income is 4.680376 >> 0 and its kurtosis is 41.24098 >> 3, so male income does not follow a normal distribution.

(2) Education:
[Figure: histogram of EDU for males, horizontal axis from 0 to 20]
Series: EDU, Sample: 1 5778, Observations: 2937
  Mean          9.433776
  Median        9.000000
  Maximum       20.00000
  Minimum       0.000000
  Std. Dev.     3.769200
  Skewness      -0.390185
  Kurtosis      2.966716
  Jarque-Bera   74.65924
  Probability   0.000000
The skewness of education is -0.390185 < 0 and its kurtosis is 2.966716, just below 3. Male education is mildly left-skewed with tails very close to those of the normal distribution. Its shape is roughly normal, although the Jarque-Bera probability of 0.000000 rejects exact normality.

(3) Experience:
[Figure: histogram of EXPR for males, horizontal axis from 0 to 45]
Series: EXPR, Sample: 1 5778, Observations: 2937
  Mean          20.29213
  Median        21.00000
  Maximum       47.00000
  Minimum       1.000000
  Std. Dev.     10.46353
  Skewness      0.033172
  Kurtosis      2.154916
  Jarque-Bera   87.93481
  Probability   0.000000
The skewness of experience is 0.033172 > 0 and its kurtosis is 2.154916 < 3. Male experience is almost symmetric (very slightly right-skewed) and flatter than the normal distribution. Its shape is roughly normal, although the Jarque-Bera probability of 0.000000 rejects exact normality.

5. Do the same things for the female sample.

Solution:
Step 1: Select “Sample” and enter “female = 1” in the “IF condition” box.
Step 2: Select “View” → “Descriptive Statistics” → “Histogram and Stats” for each of “income”, “edu”, and “expr”.
The resulting graphs are as follows:

(1) Income:
[Figure: histogram of INCOME for females, horizontal axis from 0 to 400000]
Series: INCOME, Sample: 3 5777, Observations: 2841
  Mean          8308.094
  Median        4600.000
  Maximum       400000.0
  Minimum       30.00000
  Std. Dev.     13827.69
  Skewness      11.29411
  Kurtosis      256.2755
  Jarque-Bera   7653971.
  Probability   0.000000
The skewness of income is 11.29411 >> 0 and its kurtosis is 256.2755 >> 3, so female income does not follow a normal distribution.

(2) Education:
[Figure: histogram of EDU for females, horizontal axis from 0 to 20]
Series: EDU, Sample: 3 5777, Observations: 2841
  Mean          7.933826
  Median        9.000000
  Maximum       20.00000
  Minimum       0.000000
  Std. Dev.     4.543048
  Skewness      -0.248314
  Kurtosis      2.182541
  Jarque-Bera   108.2987
  Probability   0.000000
The skewness of education is -0.248314 < 0 and its kurtosis is 2.182541 < 3. Female education is mildly left-skewed and flatter than the normal distribution. Its shape is roughly normal, although the Jarque-Bera probability of 0.000000 rejects exact normality.

(3) Experience:
[Figure: histogram of EXPR for females, horizontal axis from 0 to 45]
Series: EXPR, Sample: 3 5777, Observations: 2841
  Mean          19.99296
  Median        20.00000
  Maximum       46.00000
  Minimum       1.000000
  Std. Dev.     10.33383
  Skewness      0.119463
  Kurtosis      2.282415
  Jarque-Bera   67.71220
  Probability   0.000000
The skewness of experience is 0.119463 > 0 and its kurtosis is 2.282415 < 3. Female experience is slightly right-skewed and flatter than the normal distribution. Its shape is roughly normal, although the Jarque-Bera probability of 0.000000 rejects exact normality.
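Problem 1 also asks for quantile(0.25) and quantile(0.75), which the EViews histogram panels do not report. As a cross-check outside EViews, the following Python sketch computes the full list of requested statistics for one series and for a subsample selected by the “female” flag. It rests on several assumptions: skewness and kurtosis use moments with divisor n, the standard deviation uses divisor n - 1, and quartiles use the "inclusive" interpolation rule of Python's statistics module, which may differ slightly from the rule EViews applies. The helper names and the toy rows are hypothetical and are not taken from cgss05.csv.

```python
import statistics


def describe(values):
    """Descriptive statistics requested in Problem 1 for one series."""
    data = sorted(values)
    n = len(data)
    mean = statistics.fmean(data)
    # Central moments with divisor n (assumed convention for skewness/kurtosis).
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    # Quartiles by linear interpolation between order statistics.
    q25, _, q75 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "obs": n,
        "min": data[0],
        "max": data[-1],
        "mean": mean,
        "median": statistics.median(data),
        "std": statistics.stdev(data),  # divisor n - 1
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "q25": q25,
        "q75": q75,
    }


def subsample(rows, column, female):
    """Values of one column for rows whose 'female' flag equals `female`."""
    return [row[column] for row in rows if row["female"] == female]


# Hypothetical toy rows for illustration only.
rows = [
    {"female": 0, "income": 1.0}, {"female": 0, "income": 2.0},
    {"female": 0, "income": 3.0}, {"female": 0, "income": 4.0},
    {"female": 0, "income": 5.0}, {"female": 1, "income": 2.0},
    {"female": 1, "income": 4.0}, {"female": 1, "income": 4.0},
    {"female": 1, "income": 10.0},
]

male_stats = describe(subsample(rows, "income", 0))
female_stats = describe(subsample(rows, "income", 1))
print(male_stats)
print(female_stats)
```

To apply this to the actual data, the rows could be read from “cgss05.csv” with csv.DictReader and the relevant fields converted to numbers first. That step is left out here because the file's exact column layout is not shown in this write-up.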