Some Notes on Regression Analysis:


Another approach to forecasting and data analysis is the "structural" approach, which differs from time series analysis. In time series, all analysis and inference is based on the observations themselves, and we use past observations to forecast future ones. In structural models we assume that the quantity of interest, say "Y", referred to as the response, is a function of a number of other variables X1, X2, X3, ... that are called predictors. In other words, Y = f(X1, X2, X3, ...). We observe values of X1, X2, X3, ... and, based on them, make a forecast of the value of Y. For example, consider the sales (Y) of a product. This amount will depend on the price of the product (X1), the amount of advertising done for this product (X2), the price of competing brands (X3), etc. After observing these quantities and inputting them into our model, we arrive at a forecast.
We start with the simple case of one predictor, Y = f(X), and the simplest functional relationship, a linear one, namely Y = a + b X. Using the observed data we try to find the "best" linear relationship between X and Y. This is equivalent to finding the "best" line that fits the observed data. This model is called Simple Linear Regression: there is only one predictor, hence the word "simple", and the relationship is taken to be linear, hence the word "linear". For illustration, consider the following case dealing with the number of books sold in a bookstore and the amount of shelf space dedicated to books versus stationery, computers and other items:
Observation    Number of Books Sold (Y)    Meters of Shelf Space (X)
     1                   275                         6.8
     2                   142                         3.3
     3                   168                         4.1
     4                   197                         4.2
     5                   215                         4.8
     6                   188                         3.9
Following is the scatter diagram of the above data:
[Scatter plot of Number of Books Sold (Y), 0 to 300, against Meters of Shelf Space (X), 0.00 to 8.00]
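As a side note (this sketch is not part of the original notes, and it assumes the matplotlib library is available), the scatter diagram can be reproduced with a few lines of Python:

    # Reproduce the scatter diagram of books sold vs. shelf space
    # (illustrative sketch only; assumes matplotlib is installed).
    import matplotlib.pyplot as plt

    shelf_space = [6.8, 3.3, 4.1, 4.2, 4.8, 3.9]   # X: meters of shelf space
    books_sold = [275, 142, 168, 197, 215, 188]    # Y: number of books sold

    plt.scatter(shelf_space, books_sold)
    plt.xlabel("Meters of Shelf Space (X)")
    plt.ylabel("Number of Books Sold (Y)")
    plt.xlim(0, 8)
    plt.ylim(0, 300)
    plt.show()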
The question is: what is meant by the "best" line?
We use the sum of squared errors as the criterion for choosing the best line. Specifically, suppose Y = a + b X for the above case. Following is the squared error for each observation (assuming "a" and "b" are known):
Observation    Squared Error
     1         (275 - a - 6.8 b)² = e1²
     2         (142 - a - 3.3 b)² = e2²
     3         (168 - a - 4.1 b)² = e3²
     4         (197 - a - 4.2 b)² = e4²
     5         (215 - a - 4.8 b)² = e5²
     6         (188 - a - 3.9 b)² = e6²
Let Sum of Squared Error = SSE = ∑ (Yi - a - b Xi)² = ∑ ei². The "optimal" choice of "a" and "b" is the one that minimizes SSE. To find these values, SSE is differentiated with respect to "a" and "b", the derivatives are set equal to zero, and the solution gives the best value of "a", which is called the intercept, and of "b", which is called the slope. Specifically, we solve the following equations:
∂SSE/∂a = -2∑ (Yi - a - b Xi) = 0
∂SSE/∂b = -2∑ Xi (Yi - a - b Xi) = 0

Following is the solution to the above equations and the resulting "a" and "b":
Let X̅ = ∑ Xi / n and Y̅ = ∑ Yi / n
b = ∑ (Xi - X̅)(Yi - Y̅) / ∑ (Xi - X̅)²
a = Y̅ - b X̅
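To make the formulas concrete, here is a minimal Python sketch (not part of the original notes) that applies them to the bookstore data using only the standard library; for these six observations it gives roughly a ≈ 34.7 and b ≈ 36.0.

    # Least-squares intercept and slope from the formulas above,
    # applied to the bookstore data.
    xs = [6.8, 3.3, 4.1, 4.2, 4.8, 3.9]   # meters of shelf space (X)
    ys = [275, 142, 168, 197, 215, 188]   # number of books sold (Y)

    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # b = sum((Xi - X_bar)(Yi - Y_bar)) / sum((Xi - X_bar)^2)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
        sum((x - x_bar) ** 2 for x in xs)
    # a = Y_bar - b * X_bar
    a = y_bar - b * x_bar

    print(f"intercept a = {a:.2f}, slope b = {b:.2f}")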


The basic idea of regression is that Y = a + b X + ei. The last term is called the error, and it is the randomness that prevents all the observations from falling perfectly on a line. We make the following assumptions about the errors ei:
o E[ei] = 0, i.e., the errors are not biased upward or downward and on average they are zero. In other words, the observations do not have a tendency to be above the line or below the line and can fall on either side of it.
o VAR[ei] = σ². This means that the variability of the error does not depend on the value of the predictor X. For example, it implies that the magnitude of the error does not change with the value of X.
o Cov(ei, ej) = 0 for i ≠ j. This implies that there is no correlation between errors, e.g., if we underestimate Y at one point it has no bearing on whether we underestimate or overestimate Y at another point.
o Later on we assume that the ei's are independent and identically distributed as a normal distribution with mean 0 and variance σ² (conditions 1 and 2 above); in the case of normal distributions, condition 3 further implies that the errors are independent of each other. A small simulation illustrating these assumptions is sketched below.
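As an illustration only (this sketch is not in the original notes, and the parameter values are hypothetical), the following Python code generates observations from the model Y = a + b X + ei with errors that are independent draws from a normal distribution with mean 0 and constant variance σ², i.e., errors satisfying the assumptions above:

    # Simulate Y = a + b X + e under the stated error assumptions
    # (hypothetical intercept, slope, and error standard deviation).
    import random

    a_true, b_true, sigma = 35.0, 36.0, 15.0

    def simulate_y(x):
        e = random.gauss(0, sigma)        # E[e] = 0, VAR[e] = sigma^2, draws independent
        return a_true + b_true * x + e    # Y = a + b X + e

    xs = [6.8, 3.3, 4.1, 4.2, 4.8, 3.9]
    ys = [simulate_y(x) for x in xs]
    print([round(y, 1) for y in ys])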
Explanatory Capability of Regression
o Variability is a source of uncertainty in forecasting. The more variable and unpredictable the numbers are, the more difficult it is to predict them. We define the sum of squares as the amount of variability of a set of numbers (dividing the sum of squares by the number of observations gives the sample variance). In the above case we define the Total Sum of Squares as ∑ (Yi - Y̅)², which is the variability of the quantity of interest, i.e., the response variable that we would like to forecast.
o Variability that can be explained and predicted is not a source of uncertainty. For example, consider the fitted values of the regression, namely ŷi = a + b xi. Although the ŷi's are variable, this variability can be explained completely: they are points on the regression line, and knowing xi enables us to forecast ŷi, the point on the regression line, with certainty and without error. It should also be pointed out that the identity ∑ Yi = ∑ ŷi always holds for regression. Based on the above argument, the sum of squares ∑ (ŷi - Y̅)² is entirely explainable, because the points lie on the regression line. This sum is called the Sum of Squares Due to Regression.
o It can be shown that
∑ (Yi - Y̅)² = ∑ (ŷi - Y̅)² + ∑ (Yi - ŷi)²
(Note that the last summation is the Sum of Squared Error.) The above expression indicates that
Total Sum of Squares = Sum of Squares Due to Regression + Sum of Squares about Regression
Recall that the first term on the right-hand side of the above equation, namely the Sum of Squares Due to Regression, is explainable, and only the Sum of Squared Error, i.e., ∑ (Yi - ŷi)², cannot be explained. The portion of the Total Variability that is explainable by the regression is therefore
∑ (ŷi - Y̅)² / ∑ (Yi - Y̅)² = Sum of Squares Due to Regression / Total Sum of Squares = R²
and we use this quantity to measure the effectiveness of the regression equation, i.e., the percentage of the variability of the response Y that can be explained by the regression. It should also be pointed out that R² = [Correlation(X,Y)]².
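Finally, as a numerical illustration (again not part of the original notes), the following Python sketch computes the three sums of squares and R² for the bookstore data and checks that R² equals the squared sample correlation between X and Y:

    # Sums of squares and R^2 for the bookstore data.
    xs = [6.8, 3.3, 4.1, 4.2, 4.8, 3.9]
    ys = [275, 142, 168, 197, 215, 188]

    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n

    # Least-squares estimates (same formulas as earlier in the notes)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
        sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    y_hat = [a + b * x for x in xs]          # fitted values on the regression line

    sst = sum((y - y_bar) ** 2 for y in ys)               # Total Sum of Squares
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # Sum of Squares Due to Regression
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # Sum of Squared Error
    r_squared = ssr / sst

    # Squared sample correlation between X and Y
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    corr_sq = sxy ** 2 / (sxx * sst)

    print(f"SST = {sst:.1f}  SSR = {ssr:.1f}  SSE = {sse:.1f}")
    print(f"R^2 = {r_squared:.4f}  Correlation(X,Y)^2 = {corr_sq:.4f}")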