Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Choice modelling wikipedia , lookup
Instrumental variables estimation wikipedia , lookup
Linear regression wikipedia , lookup
Regression analysis wikipedia , lookup
Forecasting wikipedia , lookup
Data assimilation wikipedia , lookup
Time series wikipedia , lookup
CALIFORNIA STATE UNIVERSITY, HAYWARD DEPARTMENT OF STATISTICS ECOMOMICS 6400 Seminar in Econometrics Introduction to gretl Open the data file data2-1 SAT scores. File > Open Data > sample file > Rananathan… Select data2-1 SAT Scores. To see that the data has been opened. Data > Display values > all variables To see information about the data file. Data > Read info Problem: Perform a one sample t-test to see if there is evidence that the population mean Math SAT scores are different from 500 points. First, we need to compute the sample statistics. Data > Summary statistics > all variables Computer Output: What are the sample means, sample standard deviations, and the sample sizes? Summary Statistics, using the observations 1 - 427 (missing values denoted by -999 will be skipped) Variable vsat msat Variable vsat msat MEAN 501.803 566.323 S.D. 91.3142 93.1192 MEDIAN 500 570 C.V. 0.181972 0.164428 MIN 200 330 SKEW -0.334184 -0.227059 MAX 700 770 EXCSKURT 0.0932293 -0.557107 Second, to see the distribution of each sample, plot the stem-and-leaf plot of each variable. Select variable 1 vsat by clicking on it in the main gretl window to turn it blue. Variable > Frequency distribution Computer Output: Frequency distribution for vsat, obs 1-427 (427 valid observations) number of bins = 12, mean = 501.803, sd = 91.314 interval < 205 251 296 342 388 433 479 525 570 616 >= 205 251 296 342 388 433 479 525 570 616 662 662 midpt 203 228 274 319 365 410 456 502 547 593 639 681 frequency 1 1 8 14 18 49 62 96 93 36 37 12 * * **** ***** ******** ******* *** *** * Test for null hypothesis of normal distribution: Chi-square(2) = 8.280 with p-value 0.01592 Select variable 2 msat by clicking on it in the main gretl window to turn it blue. Try the other plotting command. Variable > Frequency plot > against Normal Third, we will perform the hypothesis test. Utilities > Test calculator Enter the sample mean = 566.323 Enter the sample standard deviation = 93.1192 Enter the sample size = 427 Enter the hypothesized mean = 500 Computer Output: Null hypothesis: population mean = 500 Sample size: n = 427 Sample mean = 566.323, std. deviation = 93.1192 Test statistic: t(426) = (566.323 - 500)/4.50635 = 14.7177 Two-tailed p-value = 6.203e-040 (one-tailed = 3.102e-040) 0.5 t(426) sampling distribution test statistic 0.45 0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0 -15 -10 -5 0 5 10 standard errors Use the p-value finder to confirm that the computed p-value is correct. Utilities > p-value finder Select t. Enter the df = 426 Enter the value of the test statistics = 14.7177 Leave the mean = 0 and std. deviation = 1 Click Find. Computer Output: t(426): area to the right of 14.717700 = 3.101e-040 (two-tailed value = 6.201e-040; complement = 1) Problem: Perform a one sample t-test to see if there is evidence that the population mean vsat scores are different from 500 points. Problem: What is the correlation between the Verbal and Math SAT Scores? 15 First plot the data to see what the correlation might be. Data > Graph specified vars > X-Y scatter Choose msat as the X-axis variable. Add vsat as the Y-axis variable. vsat versus msat (with least squares fit) 700 650 600 550 vsat 500 450 400 350 300 250 200 350 400 450 500 550 600 650 msat Second compute the correlation between vsat and msat. Data > correlation matrix > all variables Computer Output: Correlation Coefficients, using the observations 1 - 427 (missing values denoted by -999 will be skipped) 5% critical value (two-tailed) = 0.095 for n = 427 1) vsat 1.000 2) msat 0.420 1.000 (1 (2 Problem: Compute the ols equation for vsat (Y) and msat (X) Model > Ordinary Least Squares Choose vsat as the Dependent variable. Leave the const (constant intercept) in the model. Add msat as the Independent variable. Click OK. 700 750 Test the hypothesis that the slope is different from zero. Computer Output: Model 1: OLS estimates using the 427 observations 1-427 Dependent variable: vsat VARIABLE COEFFICIENT const msat 268.550 0.411874 0) 2) STDERROR T STAT 24.7744 0.0431678 10.840 9.541 2Prob(t > |T|) < 0.00001 *** < 0.00001 *** Mean of dependent variable = 501.803 Standard deviation of dep. var. = 91.3142 Sum of squared residuals = 2.92547e+006 Standard error of residuals = 82.9667 Unadjusted R-squared = 0.176413 Adjusted R-squared = 0.174475 F-statistic (1, 425) = 91.0351 (p-value < 0.00001) Durbin-Watson statistic = 1.71342 First-order autocorrelation coeff. = 0.138202 MODEL SELECTION STATISTICS SGMASQ HQ GCV 6883.47 6967.81 6915.86 AIC SCHWARZ RICE 6915.71 7048.37 6916.01 FPE SHIBATA 6915.71 6915.41 Check the residuals to see if they are normal. In the output window for ols Graphs > residual plot > by observations number Regression residuals (= observed - fitted vsat) 300 200 residual 100 0 -100 -200 -300 -400 0 50 100 150 200 250 index 300 350 400 450 To test if the residuals are normal. Tests > normality of residuals To see the actual fitted values and the residuals for each value of x. Model data > Display actual, fitted, residuals To compute the confidence interval for the slope of the regression line. Model data > Confidence intervals for the coefficients. Computer Output: t(425, .025) = 1.966 VARIABLE 0) 2) COEFFICIENT const msat 95% CONFIDENCE INTERVAL 268.550 0.411874 (219.854, 317.245) (0.327025, 0.496722) To save the fitted values from the ols. Model data > Add to data set > fitted values Note that you will get a new variable in your data set fitted with the description: yhat1 fitted value from model 1. Plot the line on the scatter plot. From the main gretl window Data > Graph specified vars > X-Y scatter Choose msat as the X-axis variable. Add vsat as a Y-axis variable. Add yhat1 as a Y-axis variable. Click OK. Computer Output: 700 vsat 650 yhat1 600 550 500 450 400 350 300 250 200 350 400 450 500 550 msat 600 650 700 750