Ch11 Curve Fitting
Dr. Deshi Ye
[email protected]
Outline
The method of Least Squares
Inferences based on the Least Squares Estimators
Curvilinear Regression
Multiple Regression
2/30
11.1 The Method of Least Squares
Study the case where a dependent
variable is to be predicted in terms of a
single independent variable.
The random variable Y depends on a
random variable X.
Regression curve of Y on x: the relationship
between x and the mean of the
corresponding distribution of Y.
3/30
Linear regression
4/30
Linear regression
Linear regression: for any x, the mean of
the distribution of the Y's is given by $\alpha + \beta x$.
In general, Y will differ from this mean, and we denote
this difference by $\varepsilon$:
$Y = \alpha + \beta x + \varepsilon$
$\varepsilon$ is a random variable, and we can also choose $\alpha$
so that the mean of the distribution of this random variable is
equal to zero.
5/30
EX
x: 1 2 3 4 5 6 7 8 9 10 11 12
y: 16 35 45 64 86 96 106 124 134 156 164 182
6/30
Analysis
Fitted line: $\hat{y} = a + bx$
Residuals: $e_i = y_i - \hat{y}_i$
Make $\sum_{i=1}^{n} e_i$ as close as possible to zero.
7/30
Principle of least squares
Choose a and b so that
$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - (a + b x_i))^2$
is minimum. The procedure of finding the equation of the line
that best fits a given set of paired data is called the method
of least squares. Some notation:
$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{(\sum_{i=1}^{n} x_i)^2}{n}$
$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{(\sum_{i=1}^{n} y_i)^2}{n}$
$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{(\sum_{i=1}^{n} x_i)(\sum_{i=1}^{n} y_i)}{n}$
8/30
Least squares estimators
$a = \bar{y} - b \cdot \bar{x}$ and $b = \frac{S_{xy}}{S_{xx}}$, where $\bar{x}$, $\bar{y}$ are the means of x, y
Fitted (or estimated) regression line
$\hat{y} = a + bx$
Residuals: observation - fitted value = $y_i - (a + b x_i)$
The minimum value of the sum of squares is called the
residual sum of squares or error sum of squares. We
will show that
SSE = residual sum of squares = $\sum_{i=1}^{n} (y_i - a - b x_i)^2 = S_{yy} - S_{xy}^2 / S_{xx}$
9/30
EX solution
Y = 14.8 X + 4.35
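The fit reported for the 12-point example can be checked numerically. The sketch below applies the formulas b = S_xy / S_xx and a = ȳ - b·x̄ to the (x, y) data of the EX slide; the function name `least_squares_line` is an illustrative choice, not something from the lecture.

```python
# Data from the EX slide (x = 1..12 and the corresponding y values).
x = list(range(1, 13))
y = [16, 35, 45, 64, 86, 96, 106, 124, 134, 156, 164, 182]


def least_squares_line(x, y):
    """Return (a, b) for the fitted line y_hat = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # Computational forms of S_xx and S_xy.
    s_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    b = s_xy / s_xx
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = least_squares_line(x, y)
print(f"y_hat = {a:.2f} + {b:.1f} x")  # prints: y_hat = 4.35 + 14.8 x
```

Here S_xx = 143 and S_xy = 2119, which gives the slope 14.8 and intercept 4.35 quoted on the slide.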
10/30
X-and-Y
X-axis: independent, predictor, carrier, input
Y-axis: dependent, predicted, response, output
11/30
Example
You’re a marketing analyst for Hasbro
Toys. You gather the following data:
Ad $    Sales (Units)
1       1
2       1
3       2
4       2
5       4
What is the relationship
between sales & advertising?
12/30
Scattergram
[Figure: "Sales vs. Advertising"; vertical axis Sales (0 to 4), horizontal axis Advertising (0 to 5)]
13/30
14/30
11.2 Inference based on the Least
Squares Estimators
We assume that the regression is linear in
x and, furthermore, that the n random
variables $Y_i$ are independently normally
distributed with the means $\alpha + \beta x_i$.
Statistical model for straight-line regression:
$Y_i = \alpha + \beta x_i + \varepsilon_i$
where the $\varepsilon_i$ are independent normally distributed random
variables having zero means and the common
variance $\sigma^2$.
15/30
Standard error of estimate
The i-th deviation is $y_i - (a + b x_i)$, and the estimate of $\sigma^2$
is
$s_e^2 = \frac{1}{n-2} \sum_{i=1}^{n} [y_i - (a + b x_i)]^2$
The estimate of $\sigma^2$ can also be written as follows
$s_e^2 = \frac{S_{yy} - (S_{xy})^2 / S_{xx}}{n-2}$
16/30
Statistics for inferences: based on the assumption made
concerning the distribution of the values of Y, the following
theorem holds.
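Theorem. The statistics
$t = \frac{(a - \alpha)}{s_e} \sqrt{\frac{n S_{xx}}{S_{xx} + n(\bar{x})^2}}$ and $t = \frac{(b - \beta)}{s_e} \sqrt{S_{xx}}$
are values of random variables having the t distribution
with n-2 degrees of freedom.
Confidence intervals
$\alpha$: $a \pm t_{\alpha/2} \cdot s_e \sqrt{\frac{1}{n} + \frac{(\bar{x})^2}{S_{xx}}}$
$\beta$: $b \pm t_{\alpha/2} \cdot s_e \frac{1}{\sqrt{S_{xx}}}$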
17/30
Example
The following data pertain to number of
computer jobs per day and the central
processing unit (CPU) time required.
Number of jobs x: 1 2 3 4 5
CPU time y:       2 5 4 9 10
18/30
EX
1) Obtain a least squares fit of a line to the
observations on CPU time
$b = \frac{S_{xy}}{S_{xx}} = 2$, $a = \bar{y} - b\bar{x} = 0$
$\hat{y} = 2x$
19/30
Example
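2) Construct a 95% confidence interval for α
$s_e^2 = \frac{S_{yy} - S_{xy}^2 / S_{xx}}{n-2} = \frac{46 - 400/10}{3} = 2$
With $t_{\alpha/2} = t_{0.025} = 3.182$ (3 degrees of freedom), the 95% confidence interval of α is
$a \pm t_{\alpha/2} \cdot s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} = 0 \pm 3.182 \cdot \sqrt{2} \cdot \sqrt{\frac{1}{5} + \frac{9}{10}} = \pm 4.72$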
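The arithmetic of this interval can be reproduced with a few lines of Python. This is a minimal sketch that uses the CPU-time data from the example and takes the critical value 3.182 from the slide rather than computing it; the variable names are illustrative.

```python
import math

# Number of jobs (x) and CPU time (y) from the example.
x = [1, 2, 3, 4, 5]
y = [2, 5, 4, 9, 10]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = s_xy / s_xx            # slope estimate
a = y_bar - b * x_bar      # intercept estimate

# Estimate of sigma^2 with n - 2 degrees of freedom.
se2 = (s_yy - s_xy ** 2 / s_xx) / (n - 2)

# t_{0.025} with 3 degrees of freedom, copied from the slide (not computed here).
t_crit = 3.182
half_width = t_crit * math.sqrt(se2) * math.sqrt(1 / n + x_bar ** 2 / s_xx)

print(f"s_e^2 = {se2:.1f}")
print(f"alpha: {a:.2f} +/- {half_width:.2f}")
```

With S_xx = 10, S_yy = 46 and S_xy = 20, the half-width comes out at about 4.72, so the interval for α is roughly (-4.72, 4.72).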
20/30
Example
3) Test the null hypothesis $\beta = 1$ against
the alternative hypothesis $\beta > 1$ at the 0.05
level of significance.
Solution: the t statistic is given by
$t = \frac{(b - \beta)}{s_e} \sqrt{S_{xx}} = \frac{2 - 1}{\sqrt{2}} \sqrt{10} = 2.236$
Criterion: reject the null hypothesis if $t > t_{0.05} = 2.353$
Decision: since 2.236 < 2.353, we cannot reject the null hypothesis
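The test can be written as a small function. The sketch below starts from the quantities already obtained in the example (b = 2, s_e² = 2, S_xx = 10) and the tabulated critical value 2.353; `slope_t_statistic` is a hypothetical helper name.

```python
import math


def slope_t_statistic(b, beta_0, s_e, s_xx):
    """t statistic for testing the slope against the hypothesized value beta_0."""
    return (b - beta_0) / s_e * math.sqrt(s_xx)


b = 2.0                  # fitted slope from part 1
s_e = math.sqrt(2.0)     # standard error of estimate from part 2
s_xx = 10.0
beta_0 = 1.0             # slope under the null hypothesis
t_crit = 2.353           # t_{0.05} with 3 degrees of freedom, from the slide

t = slope_t_statistic(b, beta_0, s_e, s_xx)
reject = t > t_crit      # one-sided test, alternative: slope greater than beta_0

print(f"t = {t:.3f}, reject null hypothesis: {reject}")
```

The statistic is about 2.236, below the critical value, so the null hypothesis is not rejected.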
21/30
11.3 Curvilinear Regression
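Regression curve is nonlinear.
Polynomial regression:
$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p$
If the regression of Y on x is exponential, the mean of the distribution of
values of Y is given by $y = \alpha \cdot \beta^x$
Taking logarithms, we have $\log y = \log \alpha + x \cdot \log \beta$
Thus, we can estimate $\log \alpha$ and $\log \beta$, and hence $\alpha$ and $\beta$, from the pairs of values $(x_i, \log y_i)$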
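The log transformation turns the exponential fit into an ordinary straight-line fit. Below is a minimal sketch on hypothetical data generated from y = 3·2^x (these numbers are not from the lecture), using base-10 logarithms; `fit_exponential` is an illustrative name.

```python
import math

# Hypothetical data generated from y = 3 * 2**x (not from the lecture).
x = [0, 1, 2, 3, 4]
y = [3, 6, 12, 24, 48]


def fit_exponential(x, y):
    """Fit y = alpha * beta**x by least squares on the pairs (x_i, log10 y_i)."""
    n = len(x)
    log_y = [math.log10(v) for v in y]
    x_bar = sum(x) / n
    ly_bar = sum(log_y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (li - ly_bar) for xi, li in zip(x, log_y))
    slope = s_xy / s_xx                    # estimate of log10(beta)
    intercept = ly_bar - slope * x_bar     # estimate of log10(alpha)
    # Undo the logarithm to recover alpha and beta.
    return 10 ** intercept, 10 ** slope


alpha, beta = fit_exponential(x, y)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Because the data lie exactly on an exponential curve, the recovered values are α ≈ 3 and β ≈ 2 up to floating-point rounding. Note that this minimizes squared errors in log y, not in y itself.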
22/30
Polynomial regression
If there is no clear indication about the
functional form of the regression of Y on x,
we assume it is a polynomial regression
$Y = a_0 + a_1 x + a_2 x^2 + \cdots + a_k x^k$
23/30
Polynomial Fitting
• Really just a generalization of the
previous case
• Exact solution
• Just big matrices
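The "just big matrices" remark can be made concrete: the polynomial coefficients solve a linear system of normal equations. The sketch below builds that system from sums of powers of x and solves it by Gaussian elimination in plain Python. The data are hypothetical (generated from y = 1 + 2x + 3x²), and both function names are illustrative. For high degrees the normal equations become ill-conditioned, so this is meant only for small examples.

```python
def solve_linear_system(A, rhs):
    """Solve A c = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        tail = sum(M[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (M[r][n] - tail) / M[r][r]
    return coeffs


def polyfit_least_squares(x, y, degree):
    """Return [a_0, ..., a_degree] minimizing the sum of squared residuals."""
    m = degree + 1
    # Normal equations: sum_j a_j * sum_i x_i^(j+k) = sum_i y_i * x_i^k
    A = [[sum(xi ** (j + k) for xi in x) for j in range(m)] for k in range(m)]
    rhs = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(m)]
    return solve_linear_system(A, rhs)


# Hypothetical data generated from y = 1 + 2x + 3x^2 (not from the lecture).
x = [0, 1, 2, 3, 4]
y = [1, 6, 17, 34, 57]
coeffs = polyfit_least_squares(x, y, degree=2)
print([round(c, 6) for c in coeffs])
```

With degree=1 the same function reduces to the straight-line case, which is the sense in which polynomial fitting generalizes it.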
24/30
11.4 Multiple Regression
The mean of Y on x is given by
$b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$
Minimize
$\sum_{i=1}^{n} [y_i - (b_0 + b_1 x_{i1} + \cdots + b_k x_{ik})]^2$
We can solve it when k=2 by the following equations
$\sum y = n b_0 + b_1 \sum x_1 + b_2 \sum x_2$
$\sum x_1 y = b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2$
$\sum x_2 y = b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2$
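For two predictors, the three normal equations form a 3×3 linear system in b0, b1, b2. The sketch below assembles the sums and solves the system by Cramer's rule, which is adequate at this size. The data are hypothetical (generated from y = 1 + 2·x1 + 3·x2), and `det3` and `fit_two_predictors` are illustrative names.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_two_predictors(x1, x2, y):
    """Solve the normal equations for y_hat = b0 + b1*x1 + b2*x2."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))
    # Coefficient matrix and right-hand side of the three normal equations.
    A = [[n, s1, s2],
         [s1, s11, s12],
         [s2, s12, s22]]
    rhs = [sy, s1y, s2y]
    d = det3(A)
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column of A by the right-hand side.
        Ac = [row[:] for row in A]
        for r in range(3):
            Ac[r][col] = rhs[r]
        coeffs.append(det3(Ac) / d)
    return tuple(coeffs)


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (not from the lecture).
x1 = [0, 1, 2, 0, 1]
x2 = [0, 0, 1, 1, 2]
y = [1, 3, 8, 4, 9]
b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(b0, b1, b2)
```

For more than two predictors, a general linear solver is the more practical choice than Cramer's rule.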
25/30
Example
P365.
26/30
Multiple Linear Fitting
X1(x), . . ., XM(x) are arbitrary fixed functions of x
(can be nonlinear), called the basis functions.
The normal equations of the least squares
problem can be put in matrix form and solved.
27/30
Correlation Models
1. How strong is the linear relationship
between 2 variables?
2. Coefficient of correlation used
Population correlation coefficient denoted $\rho$
Values range from -1 to +1
28/30
Correlation
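Standardized observation
$\frac{\text{Observation} - \text{Sample mean}}{\text{Sample standard deviation}} = \frac{x_i - \bar{x}}{s_x}$
The sample correlation coefficient r
$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$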
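As an illustration, the sample correlation coefficient can be computed directly from standardized observations. The sketch below uses the advertising and sales figures from the marketing example earlier in the lecture and the standard-library `statistics` module, whose `stdev` uses the n - 1 divisor; `sample_correlation` is an illustrative name.

```python
import statistics

# Advertising and sales data from the marketing example.
ad = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]


def sample_correlation(x, y):
    """Average product of standardized observations, with divisor n - 1."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)   # sample std devs
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)


r = sample_correlation(ad, sales)
print(round(r, 3))
```

For these five points r is about 0.90, a fairly strong positive linear relationship between advertising and sales.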
29/30
Coefficient of Correlation Values
Scale of r: -1.0, -.5, 0, +.5, +1.0
0: no correlation
From 0 toward -1.0: increasing degree of negative correlation
From 0 toward +1.0: increasing degree of positive correlation
30/30