Lecture 6: Simple Regression
Regression Analysis
The development of a formula for weighting or combining the values of 1 or more
independent variables to predict or explain variation in values of a dependent variable.
Always 1 DV. It'll be labeled Y.
Simple Regression
1 independent variable.
It'll be labeled X.
Multiple Regression
2 or more independent variables
The first will be labeled X1. The second will be labeled X2. And so forth.
Linear Regression
The formula involves only the first power of X(s) and no higher powers or products.
e.g., Y = a + b1X1 + b2X2
Nonlinear Regression
The formula involves powers of X(s) or transformations of X(s) such as logarithmic or
exponential transformations.
e.g., Y = a + bX1² + cX1X2 + d*log(X4)
Simple Linear Regression
The formula has the form: Predicted Y = a + bX.
Multiple Linear Regression
The formula has the form: Predicted Y = aY.12 + bY1.2X1 + bY2.1X2
It will also be written as Predicted Y = a + b1X1 + b2X2.
Later on, we'll write this as Predicted Y = B0 + B1X1 + B2X2.
Symbols that will be used to stand for predicted Y
Predicted Y is written as Y-hat (Ŷ) or Y′.
Regression Analysis Vs. Correlation Analysis.
Correlation: An analysis which assesses the strength and direction of relationship between X and Y.
Regression: An analysis that allows you to predict or explain Y from X. It requires a prior correlation analysis confirming a relationship between X and Y.
So correlation analysis is a step on the way to regression analysis. More on this later.
Why perform regression analysis?
1. Convenience. The formula serves as a convenient way to generate predicted Y values for
persons for whom we have only X or X's.
See the following data matrix of test performance and 1st year sales.
What would be the predicted SALES for a person who scored 30 on the test?
PERSON   TEST   SALES
1        32     890
2        23     790
3        36     1330
4        34     855
5        31     990
6        32     1285
7        26     865
8        21     900
9        27     725
10       38     1115
11       32     1135
12       29     1060
13       28     1160
14       32     1195
15       36     1100
16       37     1165
17       36     1295
18       22     720
19       29     1090
20       33     1040
21       31     805
22       28     925
23       22     520
24       29     1070
25       24     975
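Here's one way to work out that prediction outside of SPSS. This is a minimal Python sketch (not part of the original lecture) assuming the scipy library is available; the fitted values should match the SPSS output shown later in the lecture (a ≈ 176.474, b ≈ 27.524).

    import numpy as np
    from scipy.stats import linregress

    test = np.array([32, 23, 36, 34, 31, 32, 26, 21, 27, 38, 32, 29, 28,
                     32, 36, 37, 36, 22, 29, 33, 31, 28, 22, 29, 24])
    sales = np.array([890, 790, 1330, 855, 990, 1285, 865, 900, 725, 1115,
                      1135, 1060, 1160, 1195, 1100, 1165, 1295, 720, 1090,
                      1040, 805, 925, 520, 1070, 975])

    fit = linregress(test, sales)               # least squares line: sales = a + b*test
    predicted_30 = fit.intercept + fit.slope * 30
    print(f"a = {fit.intercept:.3f}, b = {fit.slope:.3f}")        # ~176.474, ~27.524
    print(f"Predicted SALES for TEST = 30: {predicted_30:.1f}")   # ~1002.2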
2. Objectivity. The formula serves as an objective way to generate predicted Y values for persons
for whom we have only X or X's.
Suppose the boss's daughter or son scored 25 on the test. Should he/she be hired? The formula is a
way of generating predictions which depend only on the data in an objective fashion. You can say,
“Gosh, Boss. I’d really like to hire your kid, but the formula won’t allow it.”
3. Regression Extras. A byproduct of the analysis allows us to determine the accuracy of our
predictions. That is, besides the fact that the formula gives us a convenient, objective prediction, we
are also able to say how accurate that prediction is (by way of a confidence interval).
4. Theory. The form of the relationship (linear vs. nonlinear) may be of theoretical interest. Some
theories predict linear relationships between variables. Others predict specific forms of nonlinear
relationships. Regression analysis affords tests of those predictions.
LaHuis, D. M., Martin, N. R., & Avis, J. M. (2005). Investigating nonlinear conscientiousness–job
performance relations for clerical employees. Human Performance, 18, 199-212.
5. Statistical Control. Multiple regression analysis allows us to investigate the effects of variables
while statistically controlling for the effects of other variables. These statistical controls are often
the only kinds which are possible for many real life data analytic situations.
SIMPLE REGRESSION LECTURE EXAMPLE – Start here on 2/21/17.
Consider an insurance company's desire to predict performance of its sales persons. Clearly it
would be of benefit to the company to know which prospective employees would be good
salespersons and which would be expected to be poor. The company could hire only those expected
to do well in the sales position.
Suppose that a test of sales ability is being given to all current employees. In addition, a record
of sales in the first year on the job is available for all employees.
The interest here is in using the relationship between test scores and first year sales performance
of previously hired employees to predict first year sales of prospective employees.
Suppose data on 25 current employees are available.
Test scores can range from 0 to 50, with 50 representing the best possible test score.
Sales figures can range from 0 to $2,000,000. For the sake of exposition, sales figures are expressed
in units of $1,000s, ranging from 0 to 2000.
PERSON   TEST   SALES
1        32     890
2        23     790
3        36     1330
4        34     855
5        31     990
6        32     1285
7        26     865
8        21     900
9        27     725
10       38     1115
11       32     1135
12       29     1060
13       28     1160
14       32     1195
15       36     1100
16       37     1165
17       36     1295
18       22     720
19       29     1090
20       33     1040
21       31     805
22       28     925
23       22     520
24       29     1070
25       24     975
The formula method for computing a and b.
The formula method computes b and a such that the sum of squares of the differences between
Y's and Y-hats is smaller than it would be for any other values.
Called a least squares solution.
The mean is a least squares measure. The variance (and standard deviation) about the mean is
smaller than it is about any other value. The mean is “closer” to all the scores (in the least squares
sense) than any other value.
Estimate of b
The estimate of b can be obtained using several formulas.
bYX = Covariance of X and Y divided by Variance of the X's:

    b = [Σ(X - X-bar)(Y - Y-bar) / n] / [Σ(X - X-bar)² / n] = r * (SY / SX)
The y-intercept.
Once b has been computed, a is computed from
a = Y-bar - b*X-bar
When r = 0.
Note that b depends on r. If r is 0, b is 0, and all predictions will be the same value, Y-bar.
That is, the prediction equation would be
Predicted Y = a + b*X = a + 0*X = a
Predicted Y = a = Y-bar.
So, if there is no correlation between Xs and Ys, the predicted Y for every person is simply the mean
of the Ys.
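A minimal Python sketch of the formula method (not from the lecture), assuming the test and sales arrays from the earlier sketch; least_squares_line is just an illustrative name.

    import numpy as np

    def least_squares_line(x, y):
        """b = Covariance(X, Y) / Variance(X);  a = Y-bar - b*X-bar."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
        b = cov_xy / np.var(x)
        a = y.mean() - b * x.mean()
        return a, b

    a, b = least_squares_line(test, sales)
    # Equivalent route: b = r * (SY / SX)
    r = np.corrcoef(test, sales)[0, 1]
    print(np.isclose(b, r * sales.std() / test.std()))  # True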
The stat package method. (Formula method in disguise.)
There’ll be an extended SPSS example later in the lecture.
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .684a   .468       .445                149.203
a. Predictors: (Constant), TEST

Coefficientsa
                 Unstandardized Coefficients    Standardized Coefficients
Model 1          B         Std. Error           Beta      t       Sig.
(Constant) (a)   176.474   185.606                        .951    .352
TEST (b)         27.524    6.123                .684      4.495   .000
a. Dependent Variable: SALES

Standard error: The amount that the estimate would be expected to vary if multiple regressions on
different samples were conducted. It's the estimated standard deviation of values across multiple samples.
Sig: p-value for a test of the null hypothesis that in the population the coefficient is zero.
Predicted Y = 176.474 + 27.524*TEST
For TEST = 20, Predicted Y = 176.474 + 27.524*20 = 726.954
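If you're curious where the Std. Error, t, and Sig. columns come from, here's a hand-computed sketch (an aside, not SPSS's internal code), continuing with the test and sales arrays from the earlier sketch. The printed values should match the table above within rounding.

    import numpy as np

    x, y = test.astype(float), sales.astype(float)
    n = len(x)
    b = np.mean((x - x.mean()) * (y - y.mean())) / np.var(x)
    a = y.mean() - b * x.mean()
    resid = y - (a + b * x)
    see = np.sqrt(np.sum(resid**2) / (n - 2))        # std. error of estimate, ~149.203
    ssx = np.sum((x - x.mean())**2)
    se_b = see / np.sqrt(ssx)                        # ~6.123
    se_a = see * np.sqrt(1/n + x.mean()**2 / ssx)    # ~185.606
    print(b / se_b, a / se_a)                        # t values: ~4.495 and ~.951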
Visual representation of the regression analysis
A scatterplot with observed points, predicted points, a couple of residuals and the best fitting
straight line for this analysis. (Created by hand.)
[Scatterplot: Sales (0 to 1500) on the vertical axis, Test (0 to 50) on the horizontal axis, showing actual points, predicted points on the best fitting straight line, one positive residual, and one negative residual.]

Residual: Observed Y – Predicted Y.
Positive Residual: Y is bigger than predicted. Observed was better than predicted.
Negative Residual: Y is smaller than predicted. Observed was worse than predicted.
Interpretation of the regression parameters
a
Expected value of Y when X = 0.
In this example, as in most psychological examples, the X=0 situation is of little interest.
What is 0 on most psychological scales is arbitrary.
b
Interpretation 1: Expected difference in Y between two people who differ by 1 on X
Interpretation 2: Expected change in Y when X increases by 1 if X is manipulable.
Here we would expect about a $27,500 difference in sales (27.524 units of $1,000) between two
people who differed by 1 point on the test.
Regression as a model for a relationship
Model: Something not as complex or detailed as the original that performs similarly to the original.
The above shows how a regression equation serves as a model of a relationship.
The filled in points are an idealization of the Test~Sales relationship.
It shows how Sales would relate to Test if it weren’t for the errors introduced by idiosyncrasies of
individuals.
You will hear data analysts speak of the regression model of the data. The scatterplot of filled
points is what they are referring to.
What to get from the simple regression.
1. The r-squared.
If it is not significantly different from 0, the regression analysis is of no value in explanation or
prediction except to rule out this X as a predictor/explainer of Y.
2. The regression parameters.
a: It may be of interest to know what Y would be expected when X=0. (Probably not, though.)
b: It may be of interest to know how much Y would be expected to change if X increased by 1.
3. Predicted Ys.
It may be of interest to know the expected Y value of a person with a particular value of X.
4. The residuals.
It may be of interest to know how a person did relative to what he/she was expected to do.
Difference between a Correlation Analysis and a Regression Analysis
Correlation                                        Regression
1. No Independent/Dependent distinction.           There is a Dependent variable and an Independent variable.
2. A binomial effect size table is appropriate.    A binomial effect size table is appropriate.
3. The correlation coefficient is computed.        The correlation coefficient is computed.
4. A scatterplot is typically created.             A scatterplot with BFSL is created.
5. (none)                                          Prediction equation with regression parameters is reported.
6. Interpretation focuses on the relationship.     Interpretation focuses on relationship and on predicted values vs observed values.
What to watch for in both. . .
1. Is the relationship essentially linear?
2. Are there any points that are particularly poorly predicted?
3. Are there any points that appear to be too influential?
4. Is the relationship what your theory said it would be?
Regression Statistics for Individual Cases (boring, arcane, grit your teeth)
Here for your reference. Be sure to take advantage of it if you must.

Predicted Y's: Predicted Y = a + b*X
Central Tendency: Mean of the predicted Ys is the mean of the Y's.
Variability: Variance of the predicted Ys is r²*Variance of the Ys.
1) The fact that the variance of the predicted Ys is less than or equal to the variance of Ys means
that the predicted Ys will be conservative – predicted Y for a large Y will be slightly smaller
than the actual Y, for example. Predicted Ys regress to the mean.
2) The predicted Ys are perfectly linearly related to the Xs. The predicted Ys are simply a
restatement or linear recoding of the Xs. The predicted Ys are the Xs, thinly disguised.
3) When you plot predicted Ys vs Xs, you'll always get a perfectly straight line of points.
Residuals: Y - Y-hat.
Central Tendency: Mean of the residuals is 0.
Variability: Variance of the residuals = (1 - r²)*Variance of the Ys.
A positive residual: Y outperformed the prediction. Y overachieved.
A negative residual: Y underperformed the prediction. Y underachieved.
Information about residuals
1) Residuals represent variation in the Ys that is unrelated to the Xs. So the correlation
between the residuals and the Xs (and the Y-hats) is perfectly 0.
2) The regression analysis has extracted all the variation in Ys that is related to Xs and
embodied it in the predicted Ys. All the variation in Ys that remains is variation that is not
related to Xs.
3) This variation may be related to variables other than X. In fact, if the variance of the
residuals is large, this is an indication that there is variation in Y remaining to be predicted.
4) So large residuals in a simple regression will cause us to search for other predictors
of Y.
5) Residuals are said to represent the unique variation of Y with respect to X.
Partitioning the Variance of the Ys.
We often say that regression analysis divides the variation of the Ys into two components –
a) variation that is completely related to X – the variation of the Y-hats - and
b) variation that is completely independent of X – the variation of the residuals
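These properties are easy to verify numerically. A small sketch (not from the lecture), continuing with x, y, a, and b from the earlier sketches:

    import numpy as np

    y_hat = a + b * x
    resid = y - y_hat
    r = np.corrcoef(x, y)[0, 1]

    print(np.isclose(resid.mean(), 0.0))                            # mean residual = 0
    print(np.isclose(np.corrcoef(resid, x)[0, 1], 0.0, atol=1e-8))  # corr(residual, X) = 0
    print(np.isclose(np.var(y_hat), r**2 * np.var(y)))              # var(Y-hat) = r² * var(Y)
    print(np.isclose(np.var(y_hat) + np.var(resid), np.var(y)))     # partition of variance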
Summary Statistics for the whole sample
Coefficient of determination, R² (It may often be symbolized as r².)
It is interpreted as the percentage of variance of the Y's which is linearly related to the X's.

    Coefficient of determination = [Σ(Y-hat - Y-bar)² / N] / [Σ(Y - Y-bar)² / N]
                                 = Variance of the Y-hats / Variance of the Y's
Coefficient of determination ranges from 0 to 1.
0: Y is not related to X in a linear fashion.
1: Y is perfectly related to X.
The coefficient of determination, i.e., r2, is the most often used measure of goodness of fit of the
regression model.
Standard Deviation of the residuals

    SY-Y-hat = √[Σ(Y - Y-hat - 0)² / n]

Typically, it is written as

    SY-Y-hat = √[Σ(Y - Y-hat)² / n]    since the mean of the residuals = 0.

Standard Error of Estimate

    S-hatY-Y-hat = √[Σ(Y - Y-hat)² / (n-2)]    since the mean of the residuals = 0.
Standard error measures how much the points vary about the regression line.
Roughly, it’s a measure of how close we could expect an actual Y to be to its predicted Y.
A large Standard Error of Estimate means that prediction is poor.
A small Standard Error of Estimate means that the prediction equation is doing a good job.
If normal distribution assumptions are met, about 2/3 of Y’s will be within 1 SEE of Y-hat.
About 95% of Y’s will be within 2 SEE’s of Y-hat.
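A small sketch contrasting the two denominators above, continuing with the resid array from the earlier sketch; the coverage check is only approximate and assumes roughly normal residuals.

    import numpy as np

    n = len(resid)
    sd_resid = np.sqrt(np.sum(resid**2) / n)       # standard deviation of residuals
    see = np.sqrt(np.sum(resid**2) / (n - 2))      # standard error of estimate (~149.2 here)
    coverage = np.mean(np.abs(resid) <= 2 * see)   # proportion of Ys within 2 SEEs of Y-hat
    print(sd_resid, see, coverage)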
Talking about regression
We always regress the dependent variable onto the independent variable(s).
Regression with Standardized Variables, Z-scores
If all X's and Y's are converted to their respective Z-scores . . .
Predicted ZY = r * ZX
Since r is invariably less than 1, this equation predicts regression to the mean. The distance of ZY
from its mean will be predicted to be less than the distance of ZX from its mean.
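A small sketch (continuing with x, y, a, and b from the earlier sketches) showing that the z-score equation is just the raw-score line re-expressed:

    import numpy as np

    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    r = np.corrcoef(x, y)[0, 1]
    predicted_zy = r * zx      # slope is r, intercept is 0, for z-scores
    # Identical to converting the raw-score predictions to z-scores:
    print(np.allclose(predicted_zy, (a + b * x - y.mean()) / y.std()))  # True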
Identifying Outliers and Influential Cases
X-outlier
A case whose X-value is way out in the upper or lower tail of the X distribution.
Compute ZX. Those values >= 2 in absolute value are suspect.
Regression outlier
A case whose residual is way out in the upper or lower tail of the distribution of residuals –
a case whose Y is especially poorly predicted by the regression equation.
Compute ZY-Y-hat. Those values >= 2 in absolute value are suspect.
DFBETA
A measure of the extent to which a case affects a parameter of the
regression equation.
DFBETAa for a case = a computed from all cases minus a with the case excluded.
Measures how much a person’s presence in analysis affects a.
DFBETAb for a case = b computed from all cases minus b with the case excluded.
Measures how much a person’s presence in analysis affects b.
In each instance, DFBETA is the amount by which the parameter changed when the case was
included. That is, adding case i changed a (or b) by DFBETAa or b.
[Two scatterplots: on the left, a case (small circle) producing a large positive DFBETAa; on the right, a case producing a large positive DFBETAb.]
On the left, the case represented by the small circle affects the y-intercept (a) but not the slope.
On the right, the case represented by the small circle primarily affects the slope (b).
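DFBETA can be computed by brute force, literally refitting with each case excluded. A sketch (continuing with x and y from the earlier sketches; real packages use a shortcut formula rather than refitting):

    import numpy as np
    from scipy.stats import linregress

    full = linregress(x, y)
    n = len(x)
    dfbeta_a = np.empty(n)
    dfbeta_b = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        drop = linregress(x[keep], y[keep])
        dfbeta_a[i] = full.intercept - drop.intercept   # DFBETAa for case i
        dfbeta_b[i] = full.slope - drop.slope           # DFBETAb for case i

    print(np.argmax(np.abs(dfbeta_b)))  # case with the largest influence on the slope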
SPSS Worked Out Example
Prediction of P5130 scores from P5100/P5110 Scores
We’ll examine the extent to which P5130 scores can be predicted from P5100/P5110 scores. The
scores are proportion of total possible points in the course. The data below are real, gathered over
the past nearly 20 years.
Here’s a scatterplot of the relationship . . .
The scatterplot, along with SPSS’s best fitting straight line and the r2 value printed in the scatterplot
present much of what many data analysts would like to know about the situation.
It shows that the overall relationship is strong and positive.
But there is scatter about that relationship, enough to show that a person who does poorly in the
fall course won't necessarily do as poorly in the spring course, and that a person who does well in
the fall course won't necessarily do as well in the spring course.
Here are univariate statistics on each variable . . .
Statistics
                  p511g    p513g
N      Valid      358      358
       Missing    0        0
Mean              .8763    .8751
Median            .8800    .8900
Std. Deviation    .07591   .08841
Both distributions are unimodal. The distribution of p513g scores is slightly more negatively skewed
than the p511g distribution.
Simple regression analysis
The Output
Syntax, if you're interested . . .
REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT p513g
  /METHOD=ENTER p511g
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID).
(Much of this is boilerplate syntax that represents defaults.)
Regression
[DataSet1] G:\MdbO\Dept\Validation\GRADSTUDENTS.sav
(P511yr < 2010)
Descriptive Statistics
         Mean    Std. Deviation   N
p513g    .8751   .08841           358
p511g    .8763   .07591           358
Correlations
                              p513g   p511g
Pearson Correlation   p513g   1.000   .739
                      p511g   .739    1.000
Sig. (1-tailed)       p513g   .       .000
                      p511g   .000    .
N                     p513g   358     358
                      p511g   358     358
Variables Entered/Removedb (default output)
Model   Variables Entered   Variables Removed   Method
1       p511ga              .                   Enter
a. All requested variables entered.
b. Dependent Variable: p513g
Model Summaryb
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .739a   .546       .545                .05964
a. Predictors: (Constant), p511g
b. Dependent Variable: p513g
Key information: R Square, and the standard error of estimate – the variability of points about the regression line.
ANOVAb
Model 1      Sum of Squares   df    Mean Square   F         Sig.
Regression   1.524            1     1.524         428.526   .000a
Residual     1.266            356   .004
Total        2.790            357
a. Predictors: (Constant), p511g
b. Dependent Variable: p513g
(This is redundant information for a simple regression analysis – redundant with the information printed below.)
Coefficientsa
               Unstandardized Coefficients   Standardized Coefficients
Model 1        B       Std. Error            Beta    t        Sig.
(Constant) a   .121    .037                          3.301    .001
p511g b        .861    .042                  .739    20.701   .000
a. Dependent Variable: p513g
Residuals Statisticsa
                       Minimum   Maximum   Mean     Std. Deviation   N
Predicted Value        .6544     1.0418    .8751    .06534           358
Residual               -.19796   .22065    .00000   .05955           358
Std. Predicted Value   -3.377    2.551     .000     1.000            358
Std. Residual          -3.319    3.700     .000     .999             358
a. Dependent Variable: p513g
The prediction equation: Predicted P5130 scores = 0.121 + 0.861*p511g.
So, if a student got the lowest possible A in 511g, 0.90, the predicted P5130 score would be
.121 + .861*.90 = .896 ≈ .9, also the lowest possible A.
A student with the lowest possible B would be predicted to earn .121 + .861*.80 = .81 in P5130.
Charts
[Histogram of standardized residuals: should be a classic US distribution.]
[Normal P-P plot of standardized residuals: should be a straight line.]
[Scatterplot of standardized residuals vs. standardized predicted values: should be a classic zero correlation scatterplot – a shotgun blast on a wall.]
Comparing observed performance to expected performance
The results of a regression analysis can be used to develop an expectation of a person’s
performance.
The person’s actual performance can then be compared to that expected performance.
Characterizing scores without any expectation
Consider the P5130 scores above. Below is a distribution of the scores with one specific student
identified.
The student’s score in P5130 was .92.
The Z-score corresponding to a .92 was
Z = (Y – Mean of Ys)/SD of Ys = (.92 - .8751)/.08841 = +.51.
So the student was about ½ standard deviation above the mean in a pretty rigorous statistics course.
Not bad, huh?
This is a characterization of the score without any expectations for that student based on prior
knowledge.
Characterizing scores with an expectation
But, let’s compute the student’s expected performance in P5130, based on the student’s P5100/5110
performance.
That student’s P5100/5110 score was 1.02. (I know this from access to the data set.)
Using the above regression equation, based on the P5100/5110 performance, the student was
expected to score: Y-hat = .121 + .861*1.02 = 1.00 = Y-hat for this person.
The student's actual Y value was below the predicted Y value. The difference was -0.08 = residual.
This difference can be divided by the standard error of estimate to get a Z-score. The SEE is
.05964, from the Model Summary of the above regression.
So the student's Z-score of his/her performance relative to what was expected was . . .

    Z of residual = Residual / Standard Error of Estimate = -.08 / .05964 = -1.34
The student did poorly relative to what he/she was expected to do based on P5100/5110
performance.
So the student’s performance was good, based on no information concerning expectation, but poor,
based on a refined expectation.
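Here's the whole comparison in a few lines, a sketch using the numbers from the lecture:

    # Characterizing the score with and without an expectation
    y_actual = 0.92                    # the student's P5130 score
    z_no_expectation = (y_actual - 0.8751) / 0.08841   # ~ +0.51: above average
    y_hat = 0.121 + 0.861 * 1.02       # expectation from the P5100/5110 score of 1.02
    see = 0.05964                      # standard error of estimate from the Model Summary
    z_residual = (y_actual - y_hat) / see              # ~ -1.33 (-1.34 when the residual is rounded to -.08)
    print(round(z_no_expectation, 2), round(z_residual, 2))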
Here’s the scatterplot, with that student’s 5110/5130 point highlighted.
[Scatterplot with the student's point highlighted, showing the expected 5130 score (on the best fitting straight line) and the actual 5130 score below it.]
What does "I did good" mean? And what does "I did poorly" mean?
This kind of analysis can be applied to every point on a scatterplot.
Those points near the best fitting straight line represent persons whose performance on the vertical
variable was about what was expected from consideration of their performance on the horizontal
variable.
But every point far from the best fitting straight line is weird; the farther it is from the BFSL,
the weirder. Some represent good weirdness – performance better than expected; others represent
bad weirdness – performance worse than expected.
This refined evaluation of performance can be carried out whenever you’ve developed an
expectation for performance based on a regression analysis.
Examples of Simple Regression Analyses
1) Regression of College GPA onto ACTComp: Biderman, Nguyen, & Cunningham, Common
method variance in NEO-FFI and IPIP personality measurement. SIOP, 2009.
Regression
[DataSet1] G:\MdbR\1BiasStudy\BiasStudyLWEQ1_090216.sav
Descriptive Statistics
                                                               Mean     Std. Deviation   N
UGPA1 GPA obtained from records during sem of participation   3.1014   .76895           115
ACTComp                                                       22.92    3.918            115

Correlations
                                  UGPA1   ACTComp
Pearson Correlation   UGPA1       1.000   .526
                      ACTComp     .526    1.000

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .526a   .277       .270                .65684
a. Predictors: (Constant), ACTComp

ANOVAb
Model 1      Sum of Squares   df    Mean Square   F        Sig.
Regression   18.654           1     18.654        43.236   .000a
Residual     48.753           113   .431
Total        67.407           114
a. Predictors: (Constant), ACTComp
b. Dependent Variable: UGPA1 GPA obtained from records during sem of participation

Coefficientsa
             Unstandardized Coefficients   Standardized Coefficients
Model 1      B       Std. Error            Beta    t       Sig.
(Constant)   .735    .365                          2.014   .046
ACTComp      .103    .016                  .526    6.575   .000
a. Dependent Variable: UGPA1 GPA obtained from records during sem of participation
Predicted Y = 0.735 + 0.103*ACTComp
So, for ACTComp = 20,
Predicted Y = 0.735 + .103*20 = 2.80
For ACTComp = 30,
Predicted Y = 0.735 + .103*30 = 3.82.
2) Regression of Core First Year I/O Grades onto Formula Score
The dependent variable is a Z-score measure of performance in the 1st year of the I-O program.
The independent variable is a formula score, prformula, used to guide our selection of I-O students.
Mean prformula = 488.46; SD prformula = 54.60. Mean core1st = 0.002; SD core1st = 0.77.
Regression
Variables Entered/Removeda
Model   Variables Entered   Variables Removed   Method
1       prformulab          .                   Enter
a. Dependent Variable: core1st
b. All requested variables entered.

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .543a   .295       .290                .65119
a. Predictors: (Constant), prformula

ANOVAa
Model 1      Sum of Squares   df    Mean Square   F        Sig.
Regression   25.528           1     25.528        60.202   .000b
Residual     61.063           144   .424
Total        86.591           145
a. Dependent Variable: core1st
b. Predictors: (Constant), prformula

Coefficientsa
             Unstandardized Coefficients   Standardized Coefficients
Model 1      B         Std. Error          Beta    t        Sig.
(Constant)   -3.752    .487                        -7.707   .000
prformula    .008      .001                .543    7.759    .000
a. Dependent Variable: core1st
(The unrounded b coefficient is .007685.)
So Predicted Core1st = -3.752 + .007685*prformula.
For prformula = 400, Predicted Core1st = -3.752 + .007685*400 = -.68, about 1 SD below the mean.
For prformula = 500, Predicted Core1st = -3.752 + .007685*500 = +.09, about average.
For prformula = 600, Predicted Core1st = -3.752 + .007685*600 = +.86, about 1 SD above the mean.