Download CALIFORNIA STATE UNIVERSITY, HAYWARD

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

Choice modelling wikipedia, lookup

Instrumental variables estimation wikipedia, lookup

Linear regression wikipedia, lookup

Regression analysis wikipedia, lookup

Forecasting wikipedia, lookup

Data assimilation wikipedia, lookup

Time series wikipedia, lookup

Resampling (statistics) wikipedia, lookup

Coefficient of determination wikipedia, lookup

Transcript
CALIFORNIA STATE UNIVERSITY, HAYWARD
DEPARTMENT OF STATISTICS
ECOMOMICS 6400
Seminar in Econometrics
Introduction to gretl
Open the data file data2-1 SAT scores.
File > Open Data > sample file > Rananathan…
Select data2-1 SAT Scores.
To see that the data has been opened.
Data > Display values > all variables
To see information about the data file.
Data > Read info
Problem: Perform a one sample t-test to see if there is evidence that the population
mean Math SAT scores are different from 500 points.
First, we need to compute the sample statistics.
Data > Summary statistics > all variables
Computer Output:
What are the sample means, sample standard deviations, and the sample sizes?
Summary Statistics, using the observations 1 - 427
(missing values denoted by -999 will be skipped)
Variable
vsat
msat
Variable
vsat
msat
MEAN
501.803
566.323
S.D.
91.3142
93.1192
MEDIAN
500
570
C.V.
0.181972
0.164428
MIN
200
330
SKEW
-0.334184
-0.227059
MAX
700
770
EXCSKURT
0.0932293
-0.557107
Second, to see the distribution of each sample, plot the stem-and-leaf plot of each
variable.
Select variable 1 vsat by clicking on it in the main gretl window to turn it blue.
Variable > Frequency distribution
Computer Output:
Frequency distribution for vsat, obs 1-427 (427 valid observations)
number of bins = 12, mean = 501.803, sd = 91.314
interval
<
205 251 296 342 388 433 479 525 570 616 >=
205
251
296
342
388
433
479
525
570
616
662
662
midpt
203
228
274
319
365
410
456
502
547
593
639
681
frequency
1
1
8
14
18
49
62
96
93
36
37
12
*
*
****
*****
********
*******
***
***
*
Test for null hypothesis of normal distribution:
Chi-square(2) = 8.280 with p-value 0.01592
Select variable 2 msat by clicking on it in the main gretl window to turn it blue. Try the
other plotting command.
Variable > Frequency plot > against Normal
Third, we will perform the hypothesis test.
Utilities > Test calculator
Enter the sample mean = 566.323
Enter the sample standard deviation = 93.1192
Enter the sample size = 427
Enter the hypothesized mean = 500
Computer Output:
Null hypothesis: population mean = 500
Sample size: n = 427
Sample mean = 566.323, std. deviation = 93.1192
Test statistic: t(426) = (566.323 - 500)/4.50635 = 14.7177
Two-tailed p-value = 6.203e-040
(one-tailed = 3.102e-040)
0.5
t(426) sampling distribution
test statistic
0.45
0.4
0.35
0.3
0.25
0.2
0.15
0.1
0.05
0
-15
-10
-5
0
5
10
standard errors
Use the p-value finder to confirm that the computed p-value is correct.
Utilities > p-value finder
Select t.
Enter the df = 426
Enter the value of the test statistics = 14.7177
Leave the mean = 0 and std. deviation = 1
Click Find.
Computer Output:
t(426): area to the right of 14.717700 = 3.101e-040
(two-tailed value = 6.201e-040; complement = 1)
Problem: Perform a one sample t-test to see if there is evidence that the population
mean vsat scores are different from 500 points.
Problem: What is the correlation between the Verbal and Math SAT Scores?
15
First plot the data to see what the correlation might be.
Data > Graph specified vars > X-Y scatter
Choose msat as the X-axis variable.
Add vsat as the Y-axis variable.
vsat versus msat (with least squares fit)
700
650
600
550
vsat
500
450
400
350
300
250
200
350
400
450
500
550
600
650
msat
Second compute the correlation between vsat and msat.
Data > correlation matrix > all variables
Computer Output:
Correlation Coefficients, using the observations 1 - 427
(missing values denoted by -999 will be skipped)
5% critical value (two-tailed) = 0.095 for n = 427
1) vsat
1.000
2) msat
0.420
1.000
(1
(2
Problem: Compute the ols equation for vsat (Y) and msat (X)
Model > Ordinary Least Squares
Choose vsat as the Dependent variable.
Leave the const (constant intercept) in the model.
Add msat as the Independent variable.
Click OK.
700
750
Test the hypothesis that the slope is different from zero.
Computer Output:
Model 1: OLS estimates using the 427 observations 1-427
Dependent variable: vsat
VARIABLE
COEFFICIENT
const
msat
268.550
0.411874
0)
2)
STDERROR
T STAT
24.7744
0.0431678
10.840
9.541
2Prob(t > |T|)
< 0.00001 ***
< 0.00001 ***
Mean of dependent variable = 501.803
Standard deviation of dep. var. = 91.3142
Sum of squared residuals = 2.92547e+006
Standard error of residuals = 82.9667
Unadjusted R-squared = 0.176413
Adjusted R-squared = 0.174475
F-statistic (1, 425) = 91.0351 (p-value < 0.00001)
Durbin-Watson statistic = 1.71342
First-order autocorrelation coeff. = 0.138202
MODEL SELECTION STATISTICS
SGMASQ
HQ
GCV
6883.47
6967.81
6915.86
AIC
SCHWARZ
RICE
6915.71
7048.37
6916.01
FPE
SHIBATA
6915.71
6915.41
Check the residuals to see if they are normal. In the output window for ols
Graphs > residual plot > by observations number
Regression residuals (= observed - fitted vsat)
300
200
residual
100
0
-100
-200
-300
-400
0
50
100
150
200
250
index
300
350
400
450
To test if the residuals are normal.
Tests > normality of residuals
To see the actual fitted values and the residuals for each value of x.
Model data > Display actual, fitted, residuals
To compute the confidence interval for the slope of the regression line.
Model data > Confidence intervals for the coefficients.
Computer Output:
t(425, .025) = 1.966
VARIABLE
0)
2)
COEFFICIENT
const
msat
95% CONFIDENCE INTERVAL
268.550
0.411874
(219.854, 317.245)
(0.327025, 0.496722)
To save the fitted values from the ols.
Model data > Add to data set > fitted values
Note that you will get a new variable in your data set fitted with the description: yhat1
fitted value from model 1.
Plot the line on the scatter plot. From the main gretl window
Data > Graph specified vars > X-Y scatter
Choose msat as the X-axis variable.
Add vsat as a Y-axis variable.
Add yhat1 as a Y-axis variable.
Click OK.
Computer Output:
700
vsat
650
yhat1
600
550
500
450
400
350
300
250
200
350
400
450
500
550
msat
600
650
700
750