5/24/2017
Multiple Linear Regression
Instructor: Ron S. Kenett
Email: [email protected]
Course Website: www.kpa.co.il/biostat
Course textbook: MODERN INDUSTRIAL STATISTICS,
Kenett and Zacks, Duxbury Press, 1998
(c) 2001, Ron S. Kenett, Ph.D.
Course Syllabus
•Understanding Variability
•Variability in Several Dimensions
•Basic Models of Probability
•Sampling for Estimation of Population Quantities
•Parametric Statistical Inference
•Computer Intensive Techniques
•Multiple Linear Regression
•Statistical Process Control
•Design of Experiments
Simple Linear Regression Model

Probabilistic Model:
    yi = β0 + β1xi + εi
where yi = a value of the dependent variable, y
      xi = a value of the independent variable, x
      β0 = the y-intercept of the regression line
      β1 = the slope of the regression line
      εi = random error, the residual

Deterministic Model:
    ŷi = b0 + b1xi
where b0 ≈ β0 and b1 ≈ β1,
and ŷi is the predicted value of y, in contrast to the actual value of y.
Determining the Least Squares Regression Line

Least Squares Regression Line:
    ŷ = b0 + b1x

Slope:
    b1 = ( Σxiyi – n·x̄·ȳ ) / ( Σxi² – n·x̄² )

y-intercept:
    b0 = ȳ – b1·x̄
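The slope and intercept formulas above translate directly into code. A minimal Python sketch, assuming numpy is available (the function name and the data are invented for illustration):

```python
# Least-squares fit from the textbook formulas (illustrative data).
import numpy as np

def fit_simple_ols(x, y):
    """Return (b0, b1) using b1 = (Sxy - n*xbar*ybar) / (Sxx - n*xbar^2), b0 = ybar - b1*xbar."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Toy sample: for these points the formulas give b1 = 0.8, b0 = 1.8.
b0, b1 = fit_simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 6])
```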
Interval Estimates Using the Regression Model

Confidence Interval for the Mean of y
    places an upper and lower bound around the point estimate for the average value of y given x.

Prediction Interval for an Individual y
    places an upper and lower bound around the point estimate for an individual value of y given x.
To Form Interval Estimates

The Standard Error of the Estimate, s y,x
• The standard deviation of the distribution of the
  – data points above and below the regression line,
  – distances between actual and predicted values of y,
  – residuals, ei
• The square root of MSE given by ANOVA:
    s y,x = √( Σ(yi – ŷi)² / (n – 2) )
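The residual-based definition above can be sketched numerically, assuming numpy is available (the fitted values b0 = 1.8, b1 = 0.8 belong to the toy data used here):

```python
# s_y,x = sqrt(SSE / (n - 2)) = sqrt(MSE), computed from the residuals.
import numpy as np

def std_error_of_estimate(x, y, b0, b1):
    x, y = np.asarray(x, float), np.asarray(y, float)
    residuals = y - (b0 + b1 * x)            # actual minus predicted values of y
    return np.sqrt(np.sum(residuals**2) / (len(x) - 2))

s_yx = std_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 6], b0=1.8, b1=0.8)
```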
Equations for the Interval Estimates

Confidence Interval for the Mean of y
    ŷ ± tα (s y,x) √( 1/n + (x value – x̄)² / ( Σxi² – (Σxi)²/n ) )

Prediction Interval for the Individual y
    ŷ ± tα (s y,x) √( 1 + 1/n + (x value – x̄)² / ( Σxi² – (Σxi)²/n ) )
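A sketch of both interval formulas, assuming numpy is available; the critical value t_crit is looked up from a t table for df = n – 2, and the data are the illustrative points used earlier:

```python
# Confidence and prediction intervals for y at a given x value (illustrative).
import numpy as np

def interval_estimates(x, y, x_value, t_crit):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x_value
    s_yx = np.sqrt(np.sum((y - (b0 + b1 * x))**2) / (n - 2))
    tail = (x_value - x.mean())**2 / (np.sum(x**2) - np.sum(x)**2 / n)
    half_ci = t_crit * s_yx * np.sqrt(1/n + tail)          # mean of y
    half_pi = t_crit * s_yx * np.sqrt(1 + 1/n + tail)      # individual y
    return (y_hat - half_ci, y_hat + half_ci), (y_hat - half_pi, y_hat + half_pi)

ci, pi = interval_estimates([1, 2, 3, 4, 5], [2, 4, 5, 4, 6], x_value=3, t_crit=3.182)
```

Both intervals are centered at ŷ, and the prediction interval is always the wider of the two.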
Comparing the Two Intervals
Notice that the confidence interval for the mean is much narrower than the prediction interval for the individual value. There is greater fluctuation among individual values than among group means. Both are centered at the point estimate, ŷ = 431.872.

[Number line from 0 to 700 — Confidence Interval: (351.8, 511.9); Prediction Interval: (194.1, 669.6)]
Coefficient of Correlation

A measure of the
• Direction of the linear relationship between x and y.
  – If x and y are directly related, r > 0.
  – If x and y are inversely related, r < 0.
• Strength of the linear relationship between x and y.
  – The larger the absolute value of r, the more the value of y depends in a linear way on the value of x.
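A small sketch of computing r, assuming numpy is available (data invented for illustration):

```python
# Pearson correlation coefficient r (illustrative data).
import numpy as np

def corr(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    sxx = np.sum((x - x.mean())**2)
    syy = np.sum((y - y.mean())**2)
    return sxy / np.sqrt(sxx * syy)

r = corr([1, 2, 3, 4, 5], [2, 4, 5, 4, 6])   # r > 0: x and y directly related
```

For simple regression, r² is the coefficient of determination discussed on the next slide.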
Coefficient of Determination

A measure of the
• Strength of the linear relationship between x and y.
  – The larger the value of r², the more the value of y depends in a linear way on the value of x.
• Amount of variation in y that is related to variation in x.
• Ratio of the variation in y explained by the regression model to the total variation in y.
Testing for Linearity
Key Argument:
• If the value of y does not change linearly with the value of x, then the mean of y is the best predictor for the actual value of y. This implies ŷ = ȳ is preferable.
• If the value of y does change linearly with the value of x, then the regression model gives a better prediction for the value of y than the mean of y. This implies using the regression estimate ŷ is preferable.
Three Tests for Linearity

1. Testing the Coefficient of Correlation
   H0: ρ = 0 There is no linear relationship between x and y.
   H1: ρ ≠ 0 There is a linear relationship between x and y.
   Test Statistic:
       t = r / √( (1 – r²) / (n – 2) )

2. Testing the Slope of the Regression Line
   H0: β1 = 0 There is no linear relationship between x and y.
   H1: β1 ≠ 0 There is a linear relationship between x and y.
   Test Statistic:
       t = b1 / ( s y,x / √( Σx² – n(x̄)² ) )
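Both t statistics above can be sketched as below, assuming numpy is available; the b10 argument also covers the general test of the slope against a specific value, which appears on a later slide. For the toy data used earlier the two statistics agree, as they must:

```python
# Test statistics for the correlation test and the slope test (illustrative).
import numpy as np

def t_from_r(r, n):
    return r / np.sqrt((1 - r**2) / (n - 2))

def t_from_slope(b1, s_yx, x, b10=0.0):
    # b10 = 0 gives the linearity test; b10 != 0 gives the general test of the slope.
    x = np.asarray(x, float)
    return (b1 - b10) / (s_yx / np.sqrt(np.sum(x**2) - len(x) * x.mean()**2))
```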
Three Tests for Linearity

3. The Global F-test
   H0: There is no linear relationship between x and y.
   H1: There is a linear relationship between x and y.
   Test Statistic:
       F = MSR / MSE = ( SSR / 1 ) / ( SSE / (n – 2) )
   Note: At the level of simple linear regression, the global F-test is equivalent to the t-test on β1. When we conduct regression analysis of multiple variables, the global F-test takes on a distinct role.
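A sketch of the F statistic, assuming numpy is available; the data and fitted coefficients are the illustrative ones used earlier. For these numbers F = 8.0, and √F ≈ 2.83 is exactly the slope t statistic, illustrating the equivalence noted above:

```python
# Global F statistic for simple linear regression: F = MSR / MSE.
import numpy as np

def global_f(x, y, b0, b1):
    x, y = np.asarray(x, float), np.asarray(y, float)
    sse = np.sum((y - (b0 + b1 * x))**2)     # unexplained variation
    sst = np.sum((y - y.mean())**2)          # total variation
    ssr = sst - sse                          # explained variation
    return (ssr / 1) / (sse / (len(x) - 2))

F = global_f([1, 2, 3, 4, 5], [2, 4, 5, 4, 6], b0=1.8, b1=0.8)
```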
A General Test of β1
• Testing Whether the Slope of the Population Regression Line Equals a Specific Value.
  H0: β1 = β10 The slope of the population regression line is β10.
  H1: β1 ≠ β10 The slope of the population regression line is not β10.
• Test Statistic:
      t = ( b1 – β10 ) / ( s y,x / √( Σx² – n(x̄)² ) )
The Multiple Regression Model

Probabilistic Model
    yi = β0 + β1x1i + β2x2i + ... + βkxki + εi
where yi = a value of the dependent variable, y
      β0 = the y-intercept
      x1i, x2i, ..., xki = individual values of the independent variables, x1, x2, ..., xk
      β1, β2, ..., βk = the partial regression coefficients for the independent variables, x1, x2, ..., xk
      εi = random error, the residual
The Multiple Regression Model

Sample Regression Equation
    ŷi = b0 + b1x1i + b2x2i + ... + bkxki
where ŷi = the predicted value of the dependent variable, y, given the values of x1, x2, ..., xk
      b0 = the y-intercept
      x1i, x2i, ..., xki = individual values of the independent variables, x1, x2, ..., xk
      b1, b2, ..., bk = the partial regression coefficients for the independent variables, x1, x2, ..., xk
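In practice the coefficients are not computed by hand. One minimal sketch, assuming numpy is available, solves the least-squares problem directly (function name and data invented for illustration):

```python
# Fit y-hat = b0 + b1*x1 + ... + bk*xk by least squares.
import numpy as np

def fit_multiple(X, y):
    """X is an (n, k) matrix of predictor values; returns [b0, b1, ..., bk]."""
    X = np.asarray(X, float)
    A = np.column_stack([np.ones(len(X)), X])      # prepend a column of 1s for b0
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(y, float), rcond=None)
    return coeffs

# Toy data constructed so that y = 1 + x1 + x2 exactly.
b = fit_multiple([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]], [4, 4, 8, 8, 12])
```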
The Amount of Scatter in the Data

The multiple standard error of the estimate
    se = √( Σ(yi – ŷi)² / (n – k – 1) )
where yi = each observed value of y in the data set
      ŷi = the value of y that would have been estimated from the regression equation
      n = the number of data values in the set
      k = the number of independent (x) variables
measures the dispersion of the data points around the regression hyperplane.
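A sketch of se computed from fitted coefficients, assuming numpy is available (names and data illustrative):

```python
# Multiple standard error of the estimate: sqrt(SSE / (n - k - 1)).
import numpy as np

def multiple_std_error(X, y, b):
    """b = [b0, b1, ..., bk] as returned by a multiple regression fit."""
    X, y, b = np.asarray(X, float), np.asarray(y, float), np.asarray(b, float)
    n, k = X.shape
    y_hat = b[0] + X @ b[1:]                 # predicted values of y
    return np.sqrt(np.sum((y - y_hat)**2) / (n - k - 1))
```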
Approximating a Confidence Interval for a Mean of y

A reasonable estimate for interval bounds on the conditional mean of y given various x values is generated by:
    ŷ ± t · se / √n
where ŷ = the estimated value of y based on the set of x values provided
      t = critical t value, (1 – α)% confidence, df = n – k – 1
      se = the multiple standard error of the estimate
Approximating a Prediction Interval for an Individual y Value

A reasonable estimate for interval bounds on an individual y value given various x values is generated by:
    ŷ ± t · se
where ŷ = the estimated value of y based on the set of x values provided
      t = critical t value, (1 – α)% confidence, df = n – k – 1
      se = the multiple standard error of the estimate
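Both approximations (this slide and the previous one) are one-liners once ŷ, se, and the critical t are in hand; a sketch with invented numbers:

```python
# Approximate interval bounds: mean of y uses se/sqrt(n), individual y uses se.
def approx_intervals(y_hat, s_e, n, t_crit):
    half_ci = t_crit * s_e / n**0.5          # conditional mean of y
    half_pi = t_crit * s_e                   # individual y value
    return (y_hat - half_ci, y_hat + half_ci), (y_hat - half_pi, y_hat + half_pi)

# Hypothetical values: y_hat from the fitted equation, t_crit from a t table.
ci, pi = approx_intervals(y_hat=10.0, s_e=1.2, n=25, t_crit=2.06)
```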
Coefficient of Multiple Determination

The proportion of variance in y that is explained by the multiple regression equation is given by:
    R² = 1 – SSE/SST = SSR/SST = 1 – Σ(yi – ŷi)² / Σ(yi – ȳ)²
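A sketch of R² from observed and predicted values, assuming numpy is available (the data are the illustrative points used earlier, with their fitted values):

```python
# R^2 = 1 - SSE/SST, computed from observed and predicted y (illustrative data).
import numpy as np

def r_squared(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    sse = np.sum((y - y_hat)**2)
    sst = np.sum((y - y.mean())**2)
    return 1 - sse / sst

R2 = r_squared([2, 4, 5, 4, 6], [2.6, 3.4, 4.2, 5.0, 5.8])
```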
Coefficients of Partial Determination

For each independent variable, the coefficient of partial determination denotes the proportion of total variation in y that is explained by that one independent variable alone, holding the values of all other independent variables constant. The coefficients are reported on computer printouts.
Testing the Overall Significance of the Multiple Regression Model

Is using the regression equation to predict y better than using the mean of y?
The Global F-Test
I. H0: β1 = β2 = ... = βk = 0
   The mean of y does as good a job of predicting the actual values of y as the regression equation.
   H1: At least one βi does not equal 0.
   The regression model does a better job of predicting actual values of y than the mean of y.
Testing Model Significance
II. Rejection Region
    Given α, numerator df = k, and denominator df = n – k – 1.
    Decision Rule: If F > critical value, reject H0.
    [F distribution sketch: area 1 – α to the left of the critical value, rejection region of area α in the right tail]
Testing Model Significance
III. Test Statistic
        F = ( SSR / k ) / ( SSE / (n – k – 1) )
    where SSR = SST – SSE
          SST = Σ(yi – ȳ)²
          SSE = Σ(yi – ŷi)²
    If H0 is rejected:
    • At least one βi differs from zero.
    • The regression equation does a better job of predicting the actual values of y than using the mean of y.
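The F statistic above can be sketched from observed and predicted values, assuming numpy is available (data invented for illustration):

```python
# Global F statistic for a k-predictor model: (SSR/k) / (SSE/(n-k-1)).
import numpy as np

def global_f_multiple(y, y_hat, k):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    n = len(y)
    sse = np.sum((y - y_hat)**2)
    ssr = np.sum((y - y.mean())**2) - sse    # SSR = SST - SSE
    return (ssr / k) / (sse / (n - k - 1))
```

Comparing the result against the F critical value with (k, n – k – 1) degrees of freedom decides whether to reject H0.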
Testing the Significance of a Single Regression Coefficient
Is the independent variable xi useful in predicting the actual values of y?
The Individual t-Test
I. H0: βi = 0
   The dependent variable (y) does not depend on values of the independent variable xi. (This can, with reason, be structured as a one-tail test instead.)
   H1: βi ≠ 0
   The dependent variable (y) does change with the values of the independent variable xi.
Testing the Impact on y of a Single Independent Variable
II. Rejection Region
    Given α and df = n – k – 1.
    Decision Rule: If t > critical value or t < –(critical value), reject H0.
    [t distribution sketch: central "do not reject" region of area 1 – α, rejection regions of area α/2 in each tail]
Testing the Impact on y of a Single Independent Variable
III. Test Statistic
        t = ( bi – 0 ) / s_bi
    where bi = the estimate of βi from the multiple regression equation
          s_bi = the standard deviation (standard error) of bi
    If H0 is rejected:
    • The dependent variable (y) does change with the independent variable (xi).
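The statistic itself is a one-line computation once bi and its standard deviation are read off the printout; the numbers below are hypothetical:

```python
# Individual t statistic: t = (b_i - 0) / s_b_i, compared with the critical
# t value at df = n - k - 1.
def individual_t(b_i, s_b_i):
    return b_i / s_b_i

t_stat = individual_t(b_i=0.8, s_b_i=0.25)   # hypothetical printout values
```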