July 2013
Chapter 17
Least-Squares Regression
These notes are free for the public benefit; please help by reporting any errors or remarks you find necessary by SMS to 260 4444 9 or by email.
Physics I/II, English 123, Statics, Dynamics, Strength, Structure I/II, C++, Java, Data, Algorithms, Numerical, Economy
Free explanations and solved problems at eng-hs.net and eng-hs.com
Eng. Hamada Shaaban, 260 4444 9, [email protected]
Where substantial error is associated with data, polynomial
interpolation is inappropriate and may yield unsatisfactory results when
used to predict intermediate values. Experimental data are often of this
type. For example, the following figure (a) shows seven experimentally
derived data points showing significant variability. The data indicates that
higher values of y are associated with higher values of x.
Now, if a sixth-order interpolating polynomial is fitted to this data (fig b), it
will pass exactly through all of the points. However, because of the
variability in the data, the curve oscillates widely in the interval between
the points. In particular, the interpolated values at x = 1.5 and x = 6.5
appear to be well beyond the range suggested by the data.
A more appropriate strategy is to derive an approximating function
that fits the shape of the data. Figure (c) illustrates how a straight line can be used to
generally characterize the trend of the data without passing through any
particular point.
One way to determine the line in figure (c) is to look at the plotted
data and then sketch a “best” line through the points. Such approaches
are not enough because they are arbitrary. That is, unless the points
define a perfect straight line (in which case interpolation would be
appropriate), different analysts would draw different lines.
To avoid this, some criterion must be devised to establish a basis for
the fit. One way to do this is to derive a curve that minimizes the
discrepancy between the data points
and the curve. One technique for doing
this is called least-squares regression.
17.1 Linear Regression
The simplest example of a least-squares approximation is fitting a
straight line to a set of paired observations: (x1,y1), (x2,y2), …, (xn,yn).
The mathematical expression for the straight line is
y = a0 + a1x + e
where a0 and a1 are coefficients representing the intercept and the slope,
respectively, and e is the error between the model and the observations,
which can be represented by rearranging the previous equation as
e = y – a0 – a1 x
thus, the error is the discrepancy between the true value of y and the
approximate value, a0 + a1x, predicted by the linear equation.
17.1.1 Criteria for the “best” fit
One strategy for fitting a “best” line through the data would be to
minimize the sum of the residual errors for all the available data, as in
∑𝑛𝑖=1 𝑒𝑖 = ∑𝑛𝑖=1(𝑦𝑖 − 𝑎0 − 𝑎1 𝑥𝑖 )
where n = total number of points. However, this is an inadequate criterion,
as illustrated by the next figure, which shows the fit of a straight line to
two points.
Obviously, the best fit is the line connecting the points.
However, any straight line passing through the midpoint of the connecting
line results in a minimum value of the previous equation equal to zero
because the errors cancel.
Therefore, another logical criterion might be to minimize the sum of the
absolute values of the discrepancies, as in
∑𝑛𝑖=1|𝑒𝑖 | = ∑𝑛𝑖=1|𝑦𝑖 − 𝑎𝑜 − 𝑎1 𝑥𝑖 |
The previous fig (b) demonstrates why this criterion is also inadequate.
For the four points shown, any straight line falling within the dashed lines
will minimize the sum of the absolute values. Thus, this criterion also does
not yield a unique best fit.
A third strategy for fitting a best line is the minimax criterion.
In this technique, the line is chosen that minimizes the maximum distance
that an individual point falls from the line. As shown in previous fig (c), this
strategy is ill-suited for regression because it gives undue influence to an outlier,
that is, to a single point with a large error.
A strategy that overcomes the shortcomings of the previous
approaches is to minimize the sum of the squares of the residuals between
the measured y and the y calculated with the linear model.
𝑆𝑟 = ∑𝑛𝑖=1 𝑒𝑖2 = ∑𝑛𝑖=1(𝑦𝑖,𝑚𝑒𝑎𝑠𝑢𝑟𝑒𝑑 − 𝑦𝑖,𝑚𝑜𝑑𝑒𝑙 )2 = ∑𝑛𝑖=1(𝑦𝑖 − 𝑎0 − 𝑎1 𝑥𝑖 )2
This criterion has a number of advantages, including the fact that it yields
a unique line for a given set of data.
17.1.2 Least-Squares fit of a straight line
To determine values of a0 and a1, the previous equation is
differentiated with respect to each coefficient:
𝜕𝑆𝑟 ⁄𝜕𝑎0 = −2 ∑(𝑦𝑖 − 𝑎0 − 𝑎1 𝑥𝑖 )
𝜕𝑆𝑟 ⁄𝜕𝑎1 = −2 ∑[(𝑦𝑖 − 𝑎0 − 𝑎1 𝑥𝑖 )𝑥𝑖 ]
Note that we have simplified the summation symbols; unless otherwise
indicated, all summations are from i = 1 to n. Setting these derivatives
equal to zero will result in a minimum Sr.
0 = ∑ 𝑦𝑖 − ∑ 𝑎0 − ∑ 𝑎1 𝑥𝑖
0 = ∑ 𝑦𝑖 𝑥𝑖 − ∑ 𝑎0 𝑥𝑖 − ∑ 𝑎1 𝑥𝑖2
Now, realizing that ∑ 𝑎0 = na0, we can express the equations as a set of
two simultaneous linear equations with two unknowns (a0 and a1):
𝑛𝑎0 + (∑ 𝑥𝑖 )𝑎1 = ∑ 𝑦𝑖
(17.4)
(∑ 𝑥𝑖 )𝑎0 + (∑ 𝑥𝑖2 )𝑎1 = ∑ 𝑥𝑖 𝑦𝑖
These are called the normal equations. They can be solved simultaneously for
𝑎1 = (𝑛 ∑ 𝑥𝑖 𝑦𝑖 − ∑ 𝑥𝑖 ∑ 𝑦𝑖 ) ⁄ (𝑛 ∑ 𝑥𝑖2 − (∑ 𝑥𝑖 )2 )
This result can then be used in conjunction with Eq. (17.4) to solve for
𝑎0 = 𝑦̅ − 𝑎1 𝑥̅
where 𝑦̅ and 𝑥̅ are the means of y and x, respectively.
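In code, the solved normal equations reduce to a handful of running sums. The following is a minimal sketch (the helper name `fit_line` is ours, not the text's):

```python
def fit_line(x, y):
    """Least-squares straight line y = a0 + a1*x from paired observations."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # slope from the solved normal equations
    a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    # intercept: a0 = ybar - a1 * xbar
    a0 = sy / n - a1 * sx / n
    return a0, a1
```

For points lying exactly on a line, the routine returns the line's own intercept and slope.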
Example 17.1 Linear Regression
Problem Statement:
Fit a straight line to the x and y values in the first two columns of the next
table
Solution:
The following quantities can be computed:
n = 7, ∑ 𝑥𝑖 = 28, ∑ 𝑦𝑖 = 24, ∑ 𝑥𝑖 𝑦𝑖 = 119.5, ∑ 𝑥𝑖2 = 140
𝑥̅ = 28⁄7 = 4
𝑦̅ = 24⁄7 = 3.428571
Using the previous two equations,
𝑎1 = [7(119.5) − 28(24)] ⁄ [7(140) − (28)2 ] = 0.8392857
𝑎0 = 3.428571 − 0.8392857(4) = 0.07142857
Therefore, the least-squares fit is
𝑦 = 0.07142857 + 0.8392857𝑥
The line, along with the data, is shown in the first figure (c).
17.1.3 Quantification of Error of Linear Regression
Any line other than the one computed in the previous example results
in a larger sum of the squares of the residuals. Thus, the line is unique and
in terms of our chosen criterion is a “best” line through the points.
A number of additional properties of this fit can be explained by examining
more closely the way in which residuals were computed.
Recall that the sum of the squares is defined as
𝑆𝑟 = ∑𝑛𝑖=1 𝑒𝑖2 = ∑𝑛𝑖=1(𝑦𝑖 − 𝑎0 − 𝑎1 𝑥𝑖 )2
Notice the similarity between the previous equation and
𝑆𝑡 = ∑(𝑦𝑖 − 𝑦̅)2
The similarity can be extended further for cases where (1) the spread of
the points around the line is of similar magnitude along the entire range of
the data and (2) the distribution of these points about the line is normal.
It can be demonstrated that if these criteria are met, least-squares
regression will provide the best (that is, the most likely) estimates of a0
and a1.
In addition, if these criteria are met, a “standard deviation” of the
regression line can be determined as
𝑠𝑦⁄𝑥 = √(𝑆𝑟 ⁄(𝑛 − 2))
where 𝑠𝑦⁄𝑥 is called the standard error of the estimate. The subscript
notation “𝑦⁄𝑥 ” designates that the error is for a predicted value of y
corresponding to a particular value of x.
Also, notice that we now divide by n − 2 because two data-derived estimates,
a0 and a1, were used to compute Sr; thus, we have lost two degrees of
freedom. Another justification for dividing by n − 2 is that there is no such
thing as the "spread of data" around a straight line connecting two points.
The standard error of the estimate quantifies the spread of the data.
However, 𝑠𝑦⁄𝑥 quantifies the spread around the regression line, as shown
in the next figure (b), in contrast to the original standard deviation sy,
which quantified the spread around the mean (fig (a)).
The above concepts can be used to quantify the "goodness" of our fit. This
is particularly useful for the comparison of several regressions
(next figure). To do this, we return to the original data and determine the
total sum of the squares around the mean for the dependent variable
(in our case, y). This quantity is designated as St. This is the magnitude of
the residual error associated with the dependent variable prior to
regression. After performing the regression, we can compute Sr, the sum
of the squares of the residuals around the regression line.
This characterizes the residual error that remains after the regression.
It is, therefore, sometimes called the unexplained sum of the squares.
‫ فسااااتأول‬. . . ‫تكلااااس وأناااات راضاااال‬
‫أعظس حديث تندم عليه طوال حيات‬
‫ أو بالبريد اإللكتروني‬9 4444 260 ‫النوتات مجانية للنفع العام فيرجى المساهمة باإلبالغ عن أي خطأ أو مالحظات تراها ضرورية برسالة نصية‬
Physics I/II, English 123, Statics, Dynamics, Strength, Structure I/II, C++, Java, Data, Algorithms, Numerical, Economy
, eng-hs.neteng-hs.com ‫شرح ومسائل محلولة مجانا بالموقعين‬
[email protected] 9 4444 260 ‫ حمادة شعبان‬.‫م‬
July 2013
The difference between the two quantities, St − Sr, quantifies the
improvement or error reduction due to describing the data in terms of a
straight line rather than as an average value.
Because the magnitude of this quantity is scale-dependent, the difference
is normalized to St to yield
𝑟 2 = (𝑆𝑡 − 𝑆𝑟 )⁄𝑆𝑡
where r2 is called the coefficient of determination and r is the correlation
coefficient (= √𝑟 2 ).
For a perfect fit, Sr = 0 and r = r2 = 1, signifying that the line explains
100 percent of the variability of the data. For r = r2 = 0, Sr = St and the fit
represents no improvement.
An alternative formulation for r that is more convenient for computer
implementations is
𝑟 = [𝑛 ∑ 𝑥𝑖 𝑦𝑖 − (∑ 𝑥𝑖 )(∑ 𝑦𝑖 )] ⁄ [√(𝑛 ∑ 𝑥𝑖2 − (∑ 𝑥𝑖 )2 ) √(𝑛 ∑ 𝑦𝑖2 − (∑ 𝑦𝑖 )2 )]
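This single-pass form translates directly into code; a sketch follows (`corr` is an illustrative name). Note that, unlike taking √r², this formula preserves the sign of the correlation:

```python
import math

def corr(x, y):
    """Correlation coefficient r via the single-pass summation formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx * sx) * math.sqrt(n * syy - sy * sy)
    return num / den
```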
Example 17.2 Estimate of Errors for the Linear Least-Squares Fit
Problem Statement:
Compute the total standard deviation, the standard error of the estimate,
and the correlation coefficient for the data in Example 17.1
Solution:
The summations are performed and represented in the previous
example’s table. The standard deviation is
𝑠𝑦 = √(𝑆𝑡 ⁄(𝑛 − 1)) = √(22.7143⁄(7 − 1)) = 1.9457
and the standard error of the estimate is
𝑠𝑦⁄𝑥 = √(𝑆𝑟 ⁄(𝑛 − 2)) = √(2.9911⁄(7 − 2)) = 0.7735
Thus, because 𝑠𝑦⁄𝑥 < 𝑠𝑦 , the linear regression model has merit.
The extent of the improvement is quantified by
𝑟 2 = (𝑆𝑡 − 𝑆𝑟 )⁄𝑆𝑡 = (22.7143 − 2.9911)⁄22.7143 = 0.868
or
𝑟 = √0.868 = 0.932
These results indicate that 86.8 percent of the original uncertainty has
been explained by the linear model.
17.1.5 Linearization of Nonlinear Relationships
Linear regression provides a powerful technique for fitting a best line
to data. However, it is predicated on the fact that the relationship
between the dependent and independent variables is linear.
This is not always the case and the first step in any regression analysis
should be to plot and visually inspect the data to know whether a linear
model applies. For example, the next figure shows some data that is
obviously curvilinear. In some cases, techniques such as polynomial
regression are appropriate. In others, transformations can be used to
express the data in a form that is compatible with linear regression.
One example is the exponential model
𝑦 = 𝛼1 𝑒 𝛽1𝑥
(17.12)
where 𝛼1 and 𝛽1 are constants. As shown in the next figure, the equation
represents a nonlinear relationship (for 𝛽1 ≠ 0) between x and y.
Another example of a nonlinear model is the simple power equation
𝑦 = 𝛼2 𝑥 𝛽2
(17.13)
where 𝛼2 and 𝛽2 are constant coefficients. As shown in the previous
figure, the equation (for 𝛽2 ≠ 0 or 1) is nonlinear.
A third example of a nonlinear model is the saturation-growth-rate
equation
𝑦 = 𝛼3 𝑥 ⁄(𝛽3 + 𝑥)    (17.14)
where 𝛼3 and 𝛽3 are constant coefficients. This model also represents a
nonlinear relationship between y and x that levels off as x increases.
A simpler alternative is to use mathematical manipulations to
transform the equations into a linear form. Then, simple linear regression
can be employed to fit the equations to data.
Equation (17.12) can be linearized by taking its natural logarithm
ln 𝑦 = ln 𝛼1 + 𝛽1 𝑥 ln 𝑒
But because ln e = 1,
ln 𝑦 = ln 𝛼1 + 𝛽1 𝑥
Thus, a plot of ln y versus x will yield a straight line with a slope of 𝛽1 and
an intercept of ln 𝛼1 (previous fig d).
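The ln-transform fit can be sketched as follows (`fit_exponential` is an illustrative name; the test data are synthetic, generated exactly from y = 2e^(0.5x)):

```python
import math

def fit_exponential(x, y):
    """Fit y = alpha1 * exp(beta1 * x) by regressing ln(y) on x."""
    ly = [math.log(v) for v in y]          # linearize: ln y = ln alpha1 + beta1*x
    n = len(x)
    sx, sly = sum(x), sum(ly)
    sxy = sum(a * b for a, b in zip(x, ly))
    sxx = sum(a * a for a in x)
    beta1 = (n * sxy - sx * sly) / (n * sxx - sx * sx)
    alpha1 = math.exp(sly / n - beta1 * sx / n)  # intercept is ln(alpha1)
    return alpha1, beta1
```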
Equation (17.13) is linearized by taking its base-10 logarithm to give
log 𝑦 = 𝛽2 log 𝑥 + log 𝛼2
Thus, a plot of log y versus log x will yield a straight line with a slope of 𝛽2 and
an intercept of log 𝛼2 (previous fig e).
Equation (17.14) is linearized by inverting it to give
1⁄𝑦 = (𝛽3 ⁄𝛼3 )(1⁄𝑥 ) + 1⁄𝛼3
Thus, a plot of 1⁄𝑦 versus 1⁄𝑥 will be linear, with a slope of 𝛽3 ⁄𝛼3 and an
intercept of 1⁄𝛼3 (previous fig f).
In their transformed forms, these models can use linear regression to
evaluate the constant coefficients. They could then be transformed back
to their original state and used for predictive purposes.
Example 17.4 illustrates this procedure for Eq. (17.13).
Example 17.4 Linearization of a Power Equation
Problem Statement:
Fit Eq. (17.13) to the data in the next table using a logarithmic
transformation of the data.
Solution:
The next figure (a) is a plot of the original data in its untransformed state.
Figure (b) shows the plot of the transformed data. A linear regression of
the log-transformed data yields the result
log 𝑦 = 1.75 log 𝑥 − 0.300
Thus, the intercept, log 𝛼2 , equals −0.300, and therefore, by taking the
antilogarithm, 𝛼2 = 10^−0.3 = 0.5. The slope is 𝛽2 = 1.75.
Consequently, the power equation is
𝑦 = 0.5𝑥 1.75
This curve, as plotted in the next figure (a), indicates a good fit.
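The log-log fit used in this example can be sketched as follows (`fit_power` is an illustrative name; data generated exactly from y = 0.5x^1.75 recover the coefficients):

```python
import math

def fit_power(x, y):
    """Fit y = alpha2 * x**beta2 by regressing log10(y) on log10(x)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    sx, sy = sum(lx), sum(ly)
    sxy = sum(a * b for a, b in zip(lx, ly))
    sxx = sum(a * a for a in lx)
    beta2 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    alpha2 = 10 ** (sy / n - beta2 * sx / n)   # intercept is log10(alpha2)
    return alpha2, beta2
```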
17.1.6 General Comments on Linear Regression
We have focused on the simple derivation and practical use of
equations to fit data.
Some statistical assumptions that are inherent in the linear least-squares
procedures are:
1. Each x has a fixed value; it is not random and is known without
error.
2. The y values are independent random variables and all have the
same variance.
3. The y values for a given x must be normally distributed.
Such assumptions are relevant to the proper derivation and use of
regression. For example, the first assumption means that (1) the x values
must be error-free and (2) the regression of y versus x is not the same as x
versus y.
17.2 Polynomial Regression
Some engineering data, although representing a marked pattern, is
poorly represented by a straight line. For these cases, a curve would be
better suited to fit the data. One method to accomplish this objective is to
use transformations. Another alternative is to fit polynomials to the data
using polynomial regression.
The least-squares procedure can be readily extended to fit the data to
a higher-order polynomial. For example, suppose that we fit a
second-order polynomial, or quadratic:
𝑦 = 𝑎0 + 𝑎1 𝑥 + 𝑎2 𝑥 2 + 𝑒
for this case the sum of the squares of the residuals is
𝑆𝑟 = ∑𝑛𝑖=1(𝑦𝑖 − 𝑎0 − 𝑎1 𝑥𝑖 − 𝑎2 𝑥𝑖2 )2
Following the procedure of the previous section, we take the derivative of
the previous equation with respect to each of the unknown coefficients of
the polynomial, as in
𝜕𝑆𝑟 ⁄𝜕𝑎0 = −2 ∑(𝑦𝑖 − 𝑎0 − 𝑎1 𝑥𝑖 − 𝑎2 𝑥𝑖2 )
𝜕𝑆𝑟 ⁄𝜕𝑎1 = −2 ∑ 𝑥𝑖 (𝑦𝑖 − 𝑎0 − 𝑎1 𝑥𝑖 − 𝑎2 𝑥𝑖2 )
𝜕𝑆𝑟 ⁄𝜕𝑎2 = −2 ∑ 𝑥𝑖2 (𝑦𝑖 − 𝑎0 − 𝑎1 𝑥𝑖 − 𝑎2 𝑥𝑖2 )
These equations can be set equal to zero and rearranged to develop the
following set of normal equations:
(𝑛)𝑎0 + (∑ 𝑥𝑖 )𝑎1 + (∑ 𝑥𝑖2 )𝑎2 = ∑ 𝑦𝑖
(∑ 𝑥𝑖 )𝑎0 + (∑ 𝑥𝑖2 )𝑎1 + (∑ 𝑥𝑖3 )𝑎2 = ∑ 𝑥𝑖 𝑦𝑖
(∑ 𝑥𝑖2 )𝑎0 + (∑ 𝑥𝑖3 )𝑎1 + (∑ 𝑥𝑖4 )𝑎2 = ∑ 𝑥𝑖2 𝑦𝑖
where all summations are from i = 1 through n. Note that the above three
equations are linear and have three unknowns: 𝑎0 , 𝑎1 , and 𝑎2 .
The coefficients of the unknowns can be calculated directly from the
observed data.
For this case, we see that the problem of determining a least-squares
second-order polynomial is equivalent to solving a system of three
simultaneous linear equations.
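The three normal equations can be assembled from power sums of the data and solved directly. A self-contained sketch follows (`fit_quadratic` and the naive elimination routine are ours, not from the text):

```python
def fit_quadratic(x, y):
    """Least-squares parabola y = a0 + a1*x + a2*x^2 via the normal equations."""
    s = [sum(v ** k for v in x) for k in range(5)]          # s[k] = sum of x^k; s[0] = n
    b = [sum((v ** k) * w for v, w in zip(x, y)) for k in range(3)]
    A = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]
    # naive Gauss elimination (no pivoting; adequate for this well-behaved sketch)
    for i in range(3):
        for j in range(i + 1, 3):
            f = A[j][i] / A[i][i]
            for k in range(i, 3):
                A[j][k] -= f * A[i][k]
            b[j] -= f * b[i]
    a = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        a[i] = (b[i] - sum(A[i][k] * a[k] for k in range(i + 1, 3))) / A[i][i]
    return a
```

For data lying exactly on a parabola, the routine returns the parabola's own coefficients.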
Example Polynomial Regression
Problem Statement:
Fit a second-order polynomial to the data in the first two columns of the
next table.
Solution:
From the given data,
m = 2, n = 6
∑ 𝑥𝑖 = 15, ∑ 𝑥𝑖2 = 55, ∑ 𝑥𝑖3 = 225, ∑ 𝑥𝑖4 = 979
∑ 𝑦𝑖 = 152.6, ∑ 𝑥𝑖 𝑦𝑖 = 585.6, ∑ 𝑥𝑖2 𝑦𝑖 = 2488.8
𝑥̅ = 2.5, 𝑦̅ = 25.433
Therefore, the simultaneous linear equations are
[ 6    15    55 ] {a0}   {152.6}
[ 15   55   225 ] {a1} = {585.6}
[ 55  225   979 ] {a2}   {2488.8}
Solving these equations through a technique such as Gauss elimination
gives a0 = 2.47857, a1 = 2.35929, and a2 = 1.86071.
Therefore, the least-squares quadratic equation for this case is
y = 2.47857 + 2.35929x + 1.86071x²
The standard error of the estimate based on the regression polynomial is
𝑠𝑦⁄𝑥 = √(𝑆𝑟 ⁄(𝑛 − (𝑚 + 1))) = √(3.74657⁄(6 − 3)) = 1.12
The coefficient of determination is
𝑟 2 = (𝑆𝑡 − 𝑆𝑟 )⁄𝑆𝑡 = (2513.39 − 3.74657)⁄2513.39 = 0.99851
and the correlation coefficient is r = 0.99925.
These results indicate that 99.851 percent of the original uncertainty
has been explained by the model. This result supports the conclusion that
the quadratic equation represents an excellent fit, as is also evident from
the next figure.
17.3 Multiple Linear Regression
A useful extension of linear regression is the case where y is a linear
function of two or more independent variables. For example, y might be a
linear function of x1 and x2, as in
𝑦 = 𝑎0 + 𝑎1 𝑥1 + 𝑎2 𝑥2 + 𝑒
Such an equation is particularly useful when fitting experimental data
where the variable being studied is often a function of two other variables.
For this two-dimensional case, the regression “line” becomes a “plane”
(next figure).
As with the previous cases, the “best” values of the coefficients are
determined by setting up the sum of the squares of the residuals,
𝑆𝑟 = ∑𝑛𝑖=1(𝑦𝑖 − 𝑎0 − 𝑎1 𝑥1𝑖 − 𝑎2 𝑥2𝑖 )2
and differentiating with respect to each of the unknown coefficients.
𝜕𝑆𝑟 ⁄𝜕𝑎0 = −2 ∑(𝑦𝑖 − 𝑎0 − 𝑎1 𝑥1𝑖 − 𝑎2 𝑥2𝑖 )
𝜕𝑆𝑟 ⁄𝜕𝑎1 = −2 ∑ 𝑥1𝑖 (𝑦𝑖 − 𝑎0 − 𝑎1 𝑥1𝑖 − 𝑎2 𝑥2𝑖 )
𝜕𝑆𝑟 ⁄𝜕𝑎2 = −2 ∑ 𝑥2𝑖 (𝑦𝑖 − 𝑎0 − 𝑎1 𝑥1𝑖 − 𝑎2 𝑥2𝑖 )
The coefficients yielding the minimum sum of the squares of the residuals
are obtained by setting the partial derivatives equal to zero and expressing
the result in matrix form as
[ n         ∑ x1i        ∑ x2i     ] {a0}   { ∑ yi     }
[ ∑ x1i     ∑ x1i²       ∑ x1i x2i ] {a1} = { ∑ x1i yi }
[ ∑ x2i     ∑ x1i x2i    ∑ x2i²    ] {a2}   { ∑ x2i yi }
Example 17.6 Multiple Linear Regression
Problem Statement:
The following data was calculated from the equation y = 5 + 4x1 − 3x2:
Use multiple linear regression to fit this data.
Solution:
The summations required to develop the previous equation are:
The result is
[ 6     16.5   14 ] {a0}   { 54    }
[ 16.5  76.25  48 ] {a1} = { 243.5 }
[ 14    48     54 ] {a2}   { 100   }
which can be solved using a method such as Gauss elimination for
a0 = 5, a1 = 4, a2 = −3
which is consistent with the original equation from which the data was
derived.
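Solving the example's 3×3 normal-equation system takes only a small Gauss-elimination routine; a sketch follows (`solve3` is an illustrative name). Run on the sums above, it reproduces a0 = 5, a1 = 4, a2 = −3:

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gauss elimination with back substitution."""
    A = [row[:] for row in A]     # work on copies so the inputs are untouched
    b = b[:]
    for i in range(3):
        for j in range(i + 1, 3):
            f = A[j][i] / A[i][i]
            for k in range(i, 3):
                A[j][k] -= f * A[i][k]
            b[j] -= f * b[i]
    x = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        x[i] = (b[i] - sum(A[i][k] * x[k] for k in range(i + 1, 3))) / A[i][i]
    return x

# normal-equation matrix and right-hand side from Example 17.6
a0, a1, a2 = solve3([[6, 16.5, 14], [16.5, 76.25, 48], [14, 48, 54]],
                    [54, 243.5, 100])
```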
‫ تعل عاجل عاقبته نجاحو‬:‫ستدفع أحد ثمنين‬
‫أو متعة عاجلة مؤقتة ثمنها فلل مؤلس‬
‫ أو بالبريد اإللكتروني‬9 4444 260 ‫النوتات مجانية للنفع العام فيرجى المساهمة باإلبالغ عن أي خطأ أو مالحظات تراها ضرورية برسالة نصية‬
Physics I/II, English 123, Statics, Dynamics, Strength, Structure I/II, C++, Java, Data, Algorithms, Numerical, Economy
, eng-hs.neteng-hs.com ‫شرح ومسائل محلولة مجانا بالموقعين‬
[email protected] 9 4444 260 ‫ حمادة شعبان‬.‫م‬
July 2013
The foregoing two-dimensional case can be easily extended to m
dimensions, as in
y = a0 + a1x1 + a2x2 + … + amxm + e
where the standard error is formulated as
𝑠𝑦⁄𝑥 = √(𝑆𝑟 ⁄(𝑛 − (𝑚 + 1)))
and the coefficient of determination is computed as in Eq (17.10).
Although there may be certain cases where a variable is linearly
related to two or more other variables, multiple linear regression has
additional utility in the derivation of power equations of the general form
y = a0 x1^a1 x2^a2 … xm^am
Such equations are extremely useful when fitting experimental data.
To use multiple linear regression, the equation is transformed by taking its
logarithm to yield.
log 𝑦 = log 𝑎0 + 𝑎1 log 𝑥1 + 𝑎2 log 𝑥2 + … + 𝑎𝑚 log 𝑥𝑚
This transformation is similar in spirit to the one used to fit a power
equation when y is a function of a single variable x.
Problem 17.5
Use least-squares regression to fit a straight line to
x | 6  | 7  | 11 | 15 | 17 | 21 | 23 | 29 | 29 | 37 | 39
y | 29 | 21 | 29 | 14 | 21 | 15 | 7  | 7  | 13 | 0  | 3
Compute the standard error of the estimate and the correlation
coefficient. Plot the data and the regression line. If someone made an
additional measurement of x = 10, y = 10, would you suspect that the
measurement was valid or faulty? Justify your conclusion.
Solution:
The results can be summarized as
y = 31.0589 − 0.78055x
(sy/x = 4.476306; r = 0.901489)
At x = 10, the best fit equation gives 23.2543. The line and data can be
plotted along with the point (10, 10).
The value y = 10 is nearly 3 times the standard error away from the line:
23.2543 − 10 = 13.2543 ≈ 3(4.476306)
Thus, we can conclude that the value is probably erroneous.
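The quoted results can be reproduced directly from the tabulated data; a sketch (variable names are ours):

```python
import math

# Problem 17.5 data
x = [6, 7, 11, 15, 17, 21, 23, 29, 29, 37, 39]
y = [29, 21, 29, 14, 21, 15, 7, 7, 13, 0, 3]
n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
a0 = sy / n - a1 * sx / n                        # intercept
Sr = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))
St = sum((yi - sy / n) ** 2 for yi in y)
syx = math.sqrt(Sr / (n - 2))                    # standard error of the estimate
r = math.sqrt((St - Sr) / St)                    # magnitude of r (the slope is negative)
pred10 = a0 + a1 * 10                            # prediction at x = 10
```

The computed values agree with the quoted results to rounding.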
Problem 17.13
An investigator has reported the data tabulated below for an experiment
to determine the growth rate of bacteria k (per d), as a function of oxygen
concentration c (mg/L). It is known that such data can be modeled by the
following equation:
𝑘 = 𝑘𝑚𝑎𝑥 𝑐 2 ⁄(𝑐𝑠 + 𝑐 2 )
where 𝑐𝑠 and 𝑘𝑚𝑎𝑥 are parameters. Use a transformation to linearize this
equation. Then use linear regression to estimate 𝑐𝑠 and 𝑘𝑚𝑎𝑥 and predict
the growth rate at c = 2 mg/L.
c (mg/L) | 0.5 | 0.8 | 1.5 | 2.5 | 4
k (/d)   | 1.1 | 2.4 | 5.3 | 7.6 | 8.9
Solution:
The equation can be linearized by inverting it to yield
1⁄𝑘 = (𝑐𝑠 ⁄𝑘𝑚𝑎𝑥 )(1⁄𝑐 2 ) + 1⁄𝑘𝑚𝑎𝑥
Consequently, a plot of 1/k versus 1/c² should yield a straight line with an
intercept of 1/kmax and a slope of cs/kmax.
c, mg/L | k, /d | 1/c²     | 1/k      | (1/c²)(1/k) | (1/c²)²
0.5     | 1.1   | 4.000000 | 0.909091 | 3.636364    | 16.000000
0.8     | 2.4   | 1.562500 | 0.416667 | 0.651042    | 2.441406
1.5     | 5.3   | 0.444444 | 0.188679 | 0.083857    | 0.197531
2.5     | 7.6   | 0.160000 | 0.131579 | 0.021053    | 0.025600
4       | 8.9   | 0.062500 | 0.112360 | 0.007022    | 0.003906
Sum     |       | 6.229444 | 1.758375 | 4.399338    | 18.66844
The slope and the intercept can be computed as
a1 = [5(4.399338) − 6.229444(1.758375)] ⁄ [5(18.66844) − (6.229444)²] = 0.202489
a0 = 1.758375⁄5 − 0.202489(6.229444⁄5) = 0.099396
Therefore, kmax = 1/0.099396 = 10.06074 and cs = 10.06074(0.202489) =
2.037189, and the fit is
k = 10.06074 c² ⁄ (2.037189 + c²)
This equation can be plotted together with the data.
The equation can be used to compute
k = 10.06074(2)² ⁄ (2.037189 + (2)²) = 6.666
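The whole linearized fit for this problem can be reproduced in a few lines; a sketch (variable names are ours):

```python
# Problem 17.13: linearized saturation-growth-rate fit
c = [0.5, 0.8, 1.5, 2.5, 4.0]        # oxygen concentration, mg/L
k = [1.1, 2.4, 5.3, 7.6, 8.9]        # growth rate, per day
u = [1 / ci ** 2 for ci in c]        # transformed abscissa 1/c^2
v = [1 / ki for ki in k]             # transformed ordinate 1/k
n = len(u)
su, sv = sum(u), sum(v)
suv = sum(a * b for a, b in zip(u, v))
suu = sum(a * a for a in u)
slope = (n * suv - su * sv) / (n * suu - su * su)     # cs / kmax
intercept = sv / n - slope * su / n                   # 1 / kmax
kmax = 1 / intercept
cs = slope * kmax
k2 = kmax * 2 ** 2 / (cs + 2 ** 2)   # predicted growth rate at c = 2 mg/L
```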