Applied Business Forecasting and Planning
Simple Linear Regression
Simple Regression

• Simple regression analysis is a statistical tool that gives us the ability to estimate the mathematical relationship between a dependent variable (usually called y) and an independent variable (usually called x).
• The dependent variable is the variable for which we want to make a prediction.
• While various non-linear forms may be used, simple linear regression models are the most common.
Introduction
• The primary goal of quantitative analysis is to use current information about a phenomenon to predict its future behavior.
• Current information is usually in the form of a set of data.
• In a simple case, when the data form a set of pairs of numbers, we may interpret them as representing the observed values of an independent (or predictor) variable X and a dependent (or response) variable Y.
    Lot size   Man-hours
    30         73
    20         50
    60         128
    80         170
    40         87
    50         108
    60         135
    30         69
    70         148
    60         132
Introduction

• The goal of the analyst who studies the data is to find a functional relation

    y ≈ f(x)

  between the response variable y and the predictor variable x.
[Scatter plot: Statistical relation between Lot size and Man-Hours; x-axis: Lot size (0–90), y-axis: Man-Hours (0–180)]
Regression Function

The statement that the relation between X and Y is statistical should be interpreted as providing the following guidelines:
1. Regard Y as a random variable.
2. For each X, take f(x) to be the expected value (i.e., mean value) of Y.
3. Given that E(Y) denotes the expected value of Y, call the equation

    E(Y) = f(x)

   the regression function.
Pictorial Presentation of the Linear Regression Model
Historical Origin of Regression

• Regression analysis was first developed by Sir Francis Galton, who studied the relation between heights of sons and fathers.
• Heights of sons of both tall and short fathers appeared to "revert" or "regress" to the mean of the group.
Construction of Regression Models

• Selection of independent variables
  Since reality must be reduced to manageable proportions whenever we construct models, only a limited number of independent or predictor variables can or should be included in a regression model. A central problem is therefore choosing the most important predictor variables.
• Functional form of the regression relation
  Sometimes, relevant theory may indicate the appropriate functional form. More frequently, however, the functional form is not known in advance and must be decided once the data have been collected and analyzed.
• Scope of the model
  In formulating a regression model, we usually need to restrict the coverage of the model to some interval or region of values of the independent variables.
Uses of Regression Analysis

• Regression analysis serves three major purposes:
  1. Description
  2. Control
  3. Prediction
• These purposes frequently overlap in practice.
Formal Statement of the Model

• General regression model:

    Y = β0 + β1 X + ε

  1. β0 and β1 are parameters.
  2. X is a known constant.
  3. The deviations ε are independent N(0, σ²).
Meaning of Regression Coefficients

• The values of the regression parameters β0 and β1 are not known. We estimate them from data.
• β1 indicates the change in the mean response per unit increase in X.
Regression Line

• If the scatter plot of our sample data suggests a linear relationship between two variables, i.e.

    y = β0 + β1 x

  we can summarize the relationship by drawing a straight line on the plot.
• The least squares method gives us the "best" estimated line for our set of sample data.
Regression Line

• We will write an estimated regression line based on sample data as

    ŷ = b0 + b1 x

• The method of least squares chooses the values of b0 and b1 to minimize the sum of squared errors

    SSE = Σi=1..n (yi − ŷi)² = Σi=1..n (yi − b0 − b1 xi)²
Regression Line

• Using calculus, we obtain the estimating formulas:

    b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
       = [n Σxiyi − (Σxi)(Σyi)] / [n Σxi² − (Σxi)²]

  or

    b1 = r (Sy / Sx)

    b0 = ȳ − b1 x̄
Estimation of Mean Response

• The fitted regression line can be used to estimate the mean value of y for a given value of x.

Example

• The weekly advertising expenditure (x) and weekly sales (y) are presented in the following table.

    y       x
    1250    41
    1380    54
    1425    63
    1425    54
    1450    48
    1300    46
    1400    62
    1510    61
    1575    64
    1650    71
Point Estimation of Mean Response

• From the previous table we have:

    Σx = 564,  Σx² = 32604,  Σy = 14365,  Σxy = 818755,  n = 10

• The least squares estimates of the regression coefficients are:

    b1 = [n Σxy − (Σx)(Σy)] / [n Σx² − (Σx)²]
       = [10(818755) − (564)(14365)] / [10(32604) − (564)²] ≈ 10.8

    b0 = ȳ − b1 x̄ = 1436.5 − 10.8(56.4) ≈ 828
Point Estimation of Mean Response

• The estimated regression function is:

    ŷ = 828 + 10.8x
    Sales = 828 + 10.8 (Expenditure)

• This means that if the weekly advertising expenditure is increased by $1, we would expect weekly sales to increase by $10.80.
Point Estimation of Mean Response

• Fitted values for the sample data are obtained by substituting the x value into the estimated regression function.
• For example, if the advertising expenditure is $50, then the estimated sales are:

    Sales = 828 + 10.8(50) = 1368

• This is called the point estimate (forecast) of the mean response (sales).
Example: Retail sales and floor space

• It is customary in retail operations to assess the performance of stores partly in terms of their annual sales relative to their floor area (square feet). We might expect sales to increase linearly as stores get larger, with of course individual variation among stores of the same size. The regression model for a population of stores says that

    SALES = β0 + β1 AREA + ε
Example: Retail sales and floor space

• The slope β1 is as usual a rate of change: it is the expected increase in annual sales associated with each additional square foot of floor space.
• The intercept β0 is needed to describe the line but has no statistical importance because no stores have an area close to zero.
• Floor space does not completely determine sales. The term ε in the model accounts for differences among individual stores with the same floor space. A store's location, for example, is important.
Residual

• The residual is the difference between the observed value yi and the corresponding fitted value ŷi:

    ei = yi − ŷi

• Residuals are highly useful for studying whether a given regression model is appropriate for the data at hand.
Example: weekly advertising expenditure

    y       x    ŷ        Residual (e)
    1250    41   1270.8    -20.8
    1380    54   1411.2    -31.2
    1425    63   1508.4    -83.4
    1425    54   1411.2     13.8
    1450    48   1346.4    103.6
    1300    46   1324.8    -24.8
    1400    62   1497.6    -97.6
    1510    61   1486.8     23.2
    1575    64   1519.2     55.8
    1650    71   1594.8     55.2
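A minimal sketch (not from the slides) of how the fitted values and residuals in this table can be reproduced:

```python
import numpy as np

x = np.array([41, 54, 63, 54, 48, 46, 62, 61, 64, 71], dtype=float)
y = np.array([1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650], dtype=float)

b0, b1 = 828.0, 10.8          # estimates from the slides (rounded)
y_hat = b0 + b1 * x           # fitted values
residuals = y - y_hat         # e_i = y_i - y-hat_i

for yi, xi, fi, ei in zip(y, x, y_hat, residuals):
    print(f"{yi:6.0f} {xi:4.0f} {fi:8.1f} {ei:7.1f}")
```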
Estimation of the variance of the error terms, σ²

• The variance σ² of the error terms εi in the regression model needs to be estimated for a variety of purposes:
  • It gives an indication of the variability of the probability distributions of y.
  • It is needed for making inferences concerning the regression function and the prediction of y.
Regression Standard Error

• To estimate σ we work with the variance and take the square root to obtain the standard deviation.
• For simple linear regression the estimate of σ² is the average squared residual:

    s²y.x = (1 / (n − 2)) Σ ei² = (1 / (n − 2)) Σ (yi − ŷi)²

• To estimate σ, use

    s y.x = √(s²y.x)

• s y.x estimates the standard deviation σ of the error term ε in the statistical model for simple linear regression.
Regression Standard Error

    y       x    ŷ         e        e²
    1250    41   1270.8    -20.8      432.64
    1380    54   1411.2    -31.2      973.44
    1425    63   1508.4    -83.4     6955.56
    1425    54   1411.2     13.8      190.44
    1450    48   1346.4    103.6    10732.96
    1300    46   1324.8    -24.8      615.04
    1400    62   1497.6    -97.6     9525.76
    1510    61   1486.8     23.2      538.24
    1575    64   1519.2     55.8     3113.64
    1650    71   1594.8     55.2     3047.04

    ŷ = 828 + 10.8x
    Σe² = 36124.76
    s y.x = √(36124.76 / 8) ≈ 67.198
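A short sketch (not in the original) to confirm the regression standard error for this example:

```python
import numpy as np

x = np.array([41, 54, 63, 54, 48, 46, 62, 61, 64, 71], dtype=float)
y = np.array([1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650], dtype=float)

y_hat = 828.0 + 10.8 * x
sse = np.sum((y - y_hat) ** 2)        # sum of squared residuals
n = len(y)
s_yx = np.sqrt(sse / (n - 2))         # regression standard error

print(f"SSE = {sse:.2f}, s_y.x = {s_yx:.3f}")   # about 36124.76 and 67.2
```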
Basic Assumptions of a Regression Model

• A regression model is based on the following assumptions:
  1. There is a probability distribution of Y for each level of X.
  2. Given that µy = f(x) is the mean value of Y, the standard form of the model is

       Y = f(x) + ε

     where ε is a random variable with a normal distribution with mean 0 and standard deviation σ.
Conditions for Regression Inference

• You can fit a least-squares line to any set of explanatory-response data when both variables are quantitative.
• If the scatter plot doesn't show an approximately linear pattern, the fitted line may be almost useless.
Conditions for Regression Inference

• The simple linear regression model, which is the basis for inference, imposes several conditions.
• We should verify these conditions before proceeding with inference.
• The conditions concern the population, but we can observe only our sample.
Conditions for Regression Inference

• In doing inference, we assume:
  1. The sample is an SRS from the population.
  2. There is a linear relationship in the population.
     • We cannot observe the population, so we check the scatter plot of the sample data.
  3. The standard deviation of the responses about the population line is the same for all values of the explanatory variable.
     • The spread of observations above and below the least-squares line should be roughly uniform as x varies.
Conditions for Regression Inference

• Plotting the residuals against the explanatory variable is helpful in checking these conditions because a residual plot magnifies patterns.
Analysis of Residual

• To examine whether the regression model is appropriate for the data being analyzed, we can check the residual plots.
• Useful residual plots are:
  • A histogram of the residuals.
  • Residuals against the fitted values.
  • Residuals against the independent variable.
  • Residuals over time, if the data are chronological.
Analysis of Residual

• A histogram of the residuals provides a check on the normality assumption. A Normal quantile plot of the residuals can also be used to check the Normality assumption.
• Regression inference is robust against moderate lack of Normality. On the other hand, outliers and influential observations can invalidate the results of inference for regression.
• A plot of residuals against the fitted values or the independent variable can be used to check the assumption of constant variance and the aptness of the model.
Analysis of Residual

• A plot of residuals against time provides a check on the assumption of independence of the error terms.
• The assumption of independence is the most critical one.
Residual plots

• The residuals should have no systematic pattern.
• The residual plot to the right shows a scatter of the points with no individual observations or systematic change as x increases.

[Figure: Degree Days residual plot; x-axis: Degree Days (0–60), y-axis: Residuals (−1 to 1)]
Residual plots

• The points in this residual plot have a curved pattern, so a straight line fits poorly.
Residual plots

• The points in this plot show more spread for larger values of the explanatory variable x, so prediction will be less accurate when x is large.
Variable transformations

• If the residual plot suggests that the variance is not constant, a transformation can be used to stabilize the variance.
• If the residual plot suggests a nonlinear relationship between x and y, a transformation may reduce it to one that is approximately linear.
• Common linearizing transformations are:

    1/x,  log(x)

• Variance stabilizing transformations are:

    1/y,  log(y),  √y,  y²
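As a small illustrative sketch (not from the slides), applying a log transformation before refitting might look like this; the data here are simulated and purely hypothetical.

```python
import numpy as np

# Hypothetical data whose spread grows with x (non-constant variance)
rng = np.random.default_rng(0)
x = np.linspace(1, 100, 50)
y = 5 * x * np.exp(rng.normal(0, 0.3, size=x.size))

# Fit on the log scale to stabilize the variance / linearize the relation
b1, b0 = np.polyfit(np.log(x), np.log(y), deg=1)
print(f"log(y) is roughly {b0:.2f} + {b1:.2f} log(x)")
```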
Inference about the Regression Model

• When a scatter plot shows a linear relationship between a quantitative explanatory variable x and a quantitative response variable y, we can use the least-squares line fitted to the data to predict y for a given value of x.
• Now we want to do tests and confidence intervals in this setting.
Inference about the Regression Model

• We think of the least-squares line we calculated from a sample as an estimate of the regression line for the population.
  • Just as the sample mean x̄ is an estimate of the population mean µ.
Inference about the Regression Model

• We will write the population regression line as

    β0 + β1 x

  The numbers β0 and β1 are parameters that describe the population.
• We will write the least-squares line fitted to sample data as

    b0 + b1 x

  This notation reminds us that the intercept b0 of the fitted line estimates the intercept β0 of the population line, and the slope b1 estimates the slope β1.
Confidence Intervals and Significance Tests

• In our previous lectures we presented confidence intervals and significance tests for means and differences in means. In each case, inference rested on the standard error of the estimates and on t or z distributions.
• Inference for the slope and intercept in linear regression is similar in principle, although the recipes are more complicated.
• All confidence intervals, for example, have the form

    estimate ± t* SE(estimate)

  where t* is a critical value of a t distribution.
Confidence Intervals and Significance Tests

• Confidence intervals and tests for the slope and intercept are based on the sampling distributions of the estimates b1 and b0.
• Here are the facts:
  • If the simple linear regression model is true, each of b0 and b1 has a Normal distribution.
  • The mean of b0 is β0 and the mean of b1 is β1.
    • That is, the intercept and slope of the fitted line are unbiased estimators of the intercept and slope of the population regression line.
Confidence Intervals and Significance Tests

• The standard deviations of b0 and b1 are multiples of the model standard deviation σ:

    SE(b1) = s / √( Σ(xi − x̄)² )

    SE(b0) = s √( 1/n + x̄² / Σ(xi − x̄)² )
Example: Weekly Advertising Expenditure

• Let us return to the weekly advertising expenditure and weekly sales example. Management is interested in testing whether or not there is a linear association between advertising expenditure and weekly sales, using the regression model. Use α = .05.
Example: Weekly Advertising Expenditure

• Hypotheses:

    H0: β1 = 0
    Ha: β1 ≠ 0

• Decision rule: reject H0 if

    t > t(.025; 8) = 2.306   or   t < −t(.025; 8) = −2.306
Example: Weekly Advertising Expenditure

• Test statistic:

    t = b1 / SE(b1)

    SE(b1) = s y.x / √( Σ(x − x̄)² ) = 67.2 / √794.4 ≈ 2.38

    b1 = 10.8

    t = 10.8 / 2.38 ≈ 4.5
Example: Weekly Advertising Expenditure

• Conclusion: since t = 4.5 > 2.306, we reject H0. There is a linear association between advertising expenditure and weekly sales.
Confidence interval for β1

    b1 ± t(α/2; n−2) SE(b1)

• Now that our test showed that there is a linear association between advertising expenditure and weekly sales, management wishes an estimate of β1 with a 95% confidence coefficient.
Confidence interval for β1

• For a 95 percent confidence coefficient, we require t(.025; 8). From Table B in Appendix III, we find t(.025; 8) = 2.306.
• The 95% confidence interval is:

    b1 ± t(α/2; n−2) SE(b1)
    10.8 ± 2.306(2.38)
    10.8 ± 5.49 = (5.31, 16.3)
Example: Do wages rise with experience?

• Many factors affect the wages of workers: the industry they work in, their type of job, their education and their experience, and changes in general levels of wages. We will look at a sample of 59 married women who hold customer service jobs in Indiana banks. The following table gives their weekly wages at a specific point in time and also their length of service with their employer, in months. The size of the place of work is recorded simply as "large" (100 or more workers) or "small." Because industry, job type, and the time of measurement are the same for all 59 subjects, we expect to see a clear relationship between wages and length of service.
Example: Do wages rise with experience?

[Table of weekly wages and length of service (months) for the 59 workers; not reproduced in this transcript.]
Example: Do wages rise with experience?

• From the previous table we have:

    Σx = 4159,  Σx² = 451031,  Σy = 23069,  Σy² = 9460467,  Σxy = 1719376,  n = 59

• The least squares estimates of the regression coefficients are:

    b1 = r (sy / sx)

    b0 = ȳ − b1 x̄
Example: Do wages rise with experience?

• What is the least-squares regression line for predicting wages from LOS (length of service)?
• Suppose a woman has been with her bank for 125 months. What do you predict she will earn?
• If her actual wages are $433, then what is her residual?
• The sum of squared residuals for the entire sample is

    Σi=1..59 (yi − ŷi)² = 385453.641
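As a sketch (not part of the slides) of how the quantities asked for above could be computed, the summary statistics given on the slides are enough to recover the coefficients, the regression standard error, and the slope's t statistic.

```python
import numpy as np

# Summary statistics from the slides for the 59 bank workers
n = 59
sum_x, sum_x2 = 4159, 451031
sum_y, sum_y2 = 23069, 9460467
sum_xy = 1719376
sse = 385453.641                     # sum of squared residuals given on the slide

x_bar, y_bar = sum_x / n, sum_y / n
sxx = sum_x2 - sum_x ** 2 / n        # sum of (x - x_bar)^2
b1 = (sum_xy - sum_x * sum_y / n) / sxx
b0 = y_bar - b1 * x_bar              # about 349.4 and 0.5905, as quoted later

s = np.sqrt(sse / (n - 2))           # regression standard error
se_b1 = s / np.sqrt(sxx)             # standard error of the slope
t_stat = b1 / se_b1
print(f"b0 = {b0:.1f}, b1 = {b1:.4f}, SE(b1) = {se_b1:.4f}, t = {t_stat:.2f}")
```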
Example: Do wages rise with experience?

• Do wages rise with experience?
• The hypotheses are:

    H0: β1 = 0,   Ha: β1 > 0

• The test statistic is:

    t = b1 / SE(b1)

• The P-value is:
• Conclusion:
Example: Do wages rise with experience?

• A 95% confidence interval for the average increase in wages per month of service, for the regression line in the population of all married female customer service workers in Indiana banks, is

    b1 ± t* SE(b1)

• The t distribution for this problem has n − 2 = 57 degrees of freedom.
Example: Do wages rise with experience?

• Regression calculations in practice are always done by software.
• The computer output for the case study is given in the following slide.

[Software regression output for the wages example; not reproduced in this transcript.]
Using the Regression Line

• One of the most common reasons to fit a line to data is to predict the response to a particular value of the explanatory variable.
• In our example, the least-squares line for predicting the weekly earnings of female bank customer service workers from their length of service is

    ŷ = 349.4 + 0.5905x
Using the Regression Line

• For a length of service of 125 months, our least-squares regression equation gives

    ŷ = 349.4 + (0.5905)(125) ≈ $423 per week

• There are two different uses of this prediction:
  • We can estimate the mean earnings of all workers in the subpopulation of workers with 125 months on the job.
  • We can predict the earnings of one individual worker with 125 months of service.
Using the Regression Line

• For each use, the actual prediction is the same, ŷ = $423, but the margin of error is different for the two cases.
  • To estimate the mean response µy = β0 + β1 x*, we use a confidence interval.
  • To predict an individual response y, we use a prediction interval.
    • A prediction interval estimates a single random response y rather than a parameter like µy.
Using the Regression Line

• The main distinction is that it is harder to predict for an individual than for the mean of a population of individuals.
• Each interval has the usual form

    ŷ ± t* SE

• The margin of error for the prediction interval is wider than the margin of error for the confidence interval.
Using the Regression Line

• The standard error for estimating the mean response when the explanatory variable x takes the value x* is:

    SE(µ̂) = s √( 1/n + (x* − x̄)² / Σ(xi − x̄)² )
Using the Regression Line

• The standard error for predicting an individual response when the explanatory variable x takes the value x* is:

    SE(ŷ) = s √( 1 + 1/n + (x* − x̄)² / Σ(xi − x̄)² )
Prediction of a new response (ŷ)

• We now consider the prediction of a new observation y corresponding to a given level x of the independent variable.
• In our advertising expenditure and weekly sales example, management wishes to predict the weekly sales corresponding to an advertising expenditure of x = $50.
Interval Estimation of a new response (ŷ)

• The following formula gives us the point estimator (forecast) for y:

    ŷ = b0 + b1 x

• A (1 − α) prediction interval for a new observation is:

    ŷ ± t(α/2; n−2) Sf

  where

    Sf = s y.x √( 1 + 1/n + (x − x̄)² / Σ(x − x̄)² )
Example

• In our advertising expenditure and weekly sales example, management wishes to predict the weekly sales if the advertising expenditure is $50, with a 90% prediction interval.

    ŷ = 828 + 10.8(50) = 1368

    Sf = s y.x √( 1 + 1/n + (x − x̄)² / Σ(x − x̄)² )
       = 67.2 √( 1 + 1/10 + (50 − 56.4)² / 794.4 ) ≈ 72.11

• We require t(.05; 8) = 1.860.
Example

• The 90% prediction interval is:

    ŷ ± t(.05; 8) Sf
    1368 ± 1.860(72.11)
    (1233.9, 1502.1)
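A sketch (not part of the original slides) of how this 90% prediction interval could be computed directly:

```python
import numpy as np
from scipy import stats

x = np.array([41, 54, 63, 54, 48, 46, 62, 61, 64, 71], dtype=float)
y = np.array([1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650], dtype=float)
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
s_yx = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

x_new = 50.0
y_hat = b0 + b1 * x_new
# Standard error of a new (individual) response at x = 50
s_f = s_yx * np.sqrt(1 + 1 / n + (x_new - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
t_crit = stats.t.ppf(0.95, df=n - 2)     # t(.05; 8) = 1.860 for a 90% interval
print(f"{y_hat:.0f} +/- {t_crit * s_f:.1f}")
```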
Analysis of variance approach to Regression analysis

• Analysis of variance is the term for statistical analyses that break down the variation in data into separate pieces that correspond to different sources of variation.
• It is based on the partitioning of sums of squares and degrees of freedom associated with the response variable.
• In the regression setting, the observed variation in the responses (yi) comes from two sources.
Analysis of variance approach to Regression analysis

• Consider the weekly advertising expenditure and weekly sales example. There is variation in the amount ($) of weekly sales, as in all statistical data. The variation of the yi is conventionally measured in terms of the deviations:

    yi − ȳ
Analysis of variance approach to Regression analysis

• The measure of total variation, denoted by SST, is the sum of the squared deviations:

    SST = Σ(yi − ȳ)²

• If SST = 0, all observations are the same (no variability). The greater SST is, the greater the variation among the y values.
• When we use the regression model, the measure of variation is the variability of the y observations around the fitted line:

    yi − ŷi
Analysis of variance approach to Regression analysis

• The measure of variation in the data around the fitted regression line is the sum of squared deviations (error), denoted SSE:

    SSE = Σ(yi − ŷi)²

• For our weekly expenditure example:

    SSE = 36124.76
    SST = 128552.5

• What accounts for the substantial difference between these two sums of squares?
Analysis of variance approach to Regression analysis

• The difference is another sum of squares:

    SSR = Σ(ŷi − ȳ)²

• SSR stands for the regression sum of squares.
• SSR is the variation among the predicted responses ŷi. The predicted responses lie on the least-squares line. They show how y moves in response to x.
• The larger SSR is relative to SST, the greater the role of the regression line in explaining the total variability of the y observations.
Analysis of variance approach to Regression analysis

• In our example:

    SSR = SST − SSE = 128552.5 − 36124.76 = 92427.74

• This indicates that most of the variability in weekly sales can be explained by the relation between the weekly advertising expenditure and the weekly sales.
Formal Development of the Partitioning

• We can decompose the total variability in the observations yi as follows:

    yi − ȳ = (ŷi − ȳ) + (yi − ŷi)

• The total deviation yi − ȳ can be viewed as the sum of two components:
  • The deviation of the fitted value ŷi around the mean ȳ.
  • The deviation of yi around the fitted regression line.
Formal Development of the Partitioning

• Skipping quite a bit of messy algebra, we just state that this analysis of variance equation always holds:

    Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²

• Breakdown of degrees of freedom:

    n − 1 = 1 + (n − 2)
Mean squares

• A sum of squares divided by its degrees of freedom is called a mean square (MS).
• Regression mean square (MSR):

    MSR = SSR / 1

• Error mean square (MSE):

    MSE = SSE / (n − 2)

• Note: mean squares are not additive.
Mean squares

• In our example:

    MSR = SSR / 1 = 92427.74 / 1 = 92427.74
    MSE = SSE / (n − 2) = 36124.76 / 8 = 4515.6
Analysis of Variance Table

• The breakdowns of the total sum of squares and associated degrees of freedom are displayed in a table called the analysis of variance (ANOVA) table.

    Source of Variation   SS    df    MS                F-Test
    Regression            SSR   1     MSR = SSR/1       MSR/MSE
    Error                 SSE   n-2   MSE = SSE/(n-2)
    Total                 SST   n-1
Analysis of Variance Table

• In our weekly advertising expenditure and weekly sales example the ANOVA table is:

    Source of Variation   SS          df   MS
    Regression            92427.74    1    92427.74
    Error                 36124.76    8    4515.6
    Total                 128552.5    9
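A brief sketch (not from the slides) that reconstructs the ANOVA quantities for this example:

```python
import numpy as np

x = np.array([41, 54, 63, 54, 48, 46, 62, 61, 64, 71], dtype=float)
y = np.array([1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650], dtype=float)
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
sse = np.sum((y - y_hat) ** 2)         # error sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
msr, mse = ssr / 1, sse / (n - 2)
print(f"SST={sst:.1f}  SSR={ssr:.1f}  SSE={sse:.1f}  MSR={msr:.1f}  MSE={mse:.1f}")
```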
Analysis of Variance Table

• The analysis of variance table reports, in a different way, quantities such as r² and s that are needed in regression analysis.
• It also reports, in a different way, the test for the overall significance of the regression.
• If regression on x has no value for predicting y, we expect the slope of the population regression line to be close to 0.
Analysis of Variance Table

• That is, the null hypothesis of "no linear relationship" is:

    H0: β1 = 0

• We standardize the slope of the least-squares line to get a t statistic.
F-Test for β1 = 0 versus β1 ≠ 0

• The analysis of variance approach starts with sums of squares.
• If regression on x has no value for predicting y, we expect SSR to be only a small part of SST, most of which will be made up of SSE.
• The proper way to standardize this comparison is to use the ratio

    F = MSR / MSE
F-Test for β1 = 0 versus β1 ≠ 0

• In order to construct a statistical decision rule, we need to know the distribution of our test statistic F.
• When H0 is true, our test statistic F follows the F distribution with 1 and n − 2 degrees of freedom.
• Table C-5 on page 513 of your text gives the critical values of the F distribution at α = 0.05 and 0.01.
F-Test for β1 = 0 versus β1 ≠ 0

• Construction of the decision rule (at the α = 5% level): reject H0 if

    F > F(α; 1, n − 2)

• Large values of F support Ha, and values of F near 1 support H0.
F-Test for β1 = 0 versus β1 ≠ 0

• Using our example again, let us repeat the earlier test on β1, this time using the F-test. The null and alternative hypotheses are:

    H0: β1 = 0
    Ha: β1 ≠ 0

• Let α = .05. Since n = 10, we require F(.05; 1, 8). From Table C-5 we find that F(.05; 1, 8) = 5.32. Therefore the decision rule is: reject H0 if

    F > 5.32
F-Test for β1 = 0 versus β1 ≠ 0

• From the ANOVA table we have MSR = 92427.74 and MSE = 4515.6.
• Our test statistic F is:

    F = 92427.74 / 4515.6 ≈ 20.47

• Decision: since 20.47 > 5.32, we reject H0; that is, there is a linear association between weekly advertising expenditure and weekly sales.
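As an illustrative sketch (not in the original), the F statistic and its p-value can be checked with SciPy's F distribution, reusing the MSR and MSE values from the ANOVA table above.

```python
from scipy import stats

msr, mse = 92427.74, 4515.6                 # from the ANOVA table above
f_stat = msr / mse                          # about 20.47
f_crit = stats.f.ppf(0.95, dfn=1, dfd=8)    # F(.05; 1, 8) = 5.32
p_value = stats.f.sf(f_stat, dfn=1, dfd=8)  # upper-tail p-value
print(f"F = {f_stat:.2f}, critical value = {f_crit:.2f}, p = {p_value:.4f}")
```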
F-Test for β1 = 0 versus β1 ≠ 0

• Equivalence of the F test and the t test:
  • For a given α level, the F test of β1 = 0 versus β1 ≠ 0 is algebraically equivalent to the two-sided t-test.
  • Thus, at a given level, we can use either the t-test or the F-test for testing β1 = 0 versus β1 ≠ 0.
  • The t-test is more flexible since it can also be used for one-sided tests.
Analysis of Variance Table

• The complete ANOVA table for our example is:

    Source of Variation   SS          df   MS          F-Test
    Regression            92427.74    1    92427.74    20.47
    Error                 36124.76    8    4515.6
    Total                 128552.5    9
Computer Output

• The Excel output for our example is:

SUMMARY OUTPUT

Regression Statistics
    Multiple R           0.847950033
    R Square             0.719019259
    Adjusted R Square    0.683896667
    Standard Error       67.19447214
    Observations         10

ANOVA
                  df   SS            MS         F         Significance F
    Regression    1    92431.72331   92431.72   20.4717   0.0019382
    Residual      8    36120.77669   4515.097
    Total         9    128552.5

                    Coefficients   Standard Error   t Stat     P-value    Lower 95%     Upper 95%
    Intercept       828.1268882    136.1285978      6.083416   0.000295   514.2135758   1142.0402
    AD-Expen (X)    10.7867573     2.384042146      4.524567   0.001938   5.289142698   16.2843719
Coefficient of Determination

• Recall that SST measures the total variation in the yi when no account of the independent variable x is taken.
• SSE measures the variation in the yi when a regression model with the independent variable x is used.
• A natural measure of the effect of x in reducing the variation in y can be defined as:

    R² = (SST − SSE) / SST = SSR / SST = 1 − SSE / SST
Coefficient of Determination

• R² is called the coefficient of determination.
• Since 0 ≤ SSE ≤ SST, it follows that:

    0 ≤ R² ≤ 1

• We may interpret R² as the proportionate reduction of total variability in y associated with the use of the independent variable x.
• The larger R² is, the more the total variation of y is reduced by including the variable x in the model.
Coefficient of Determination

• If all the observations fall on the fitted regression line, SSE = 0 and R² = 1.
• If the slope of the fitted regression line is b1 = 0, so that ŷi = ȳ, then SSE = SST and R² = 0.
• The closer R² is to 1, the greater is said to be the degree of linear association between x and y.
• The square root of R² is called the coefficient of correlation:

    r = ±√R²
Correlation Coefficient

• Recall that the algebraic expression for the correlation coefficient is:

    r = Σ(x − x̄)(y − ȳ) / √( Σ(x − x̄)² Σ(y − ȳ)² )
      = [n Σxy − (Σx)(Σy)] / √( [n Σx² − (Σx)²] [n Σy² − (Σy)²] )