Calculus III ePortfolio Project
The purpose of this project is to investigate the possible linear relationship in a data
set, that is, a finite set of points (x[n],y[n]). Do these given points lie on a line with
equation y=m*x+b, or is there, somehow, a best line to which these points are closest?
The method which we will use is standard operating procedure in Statistics (MAT 120 or
MAT 121 at LaGuardia) and is called the method of least squares, and the "best" line fitting
the set of points (x[n],y[n]) is called the regression line. In an elementary Statistics text,
formulas for the slope m and the y-intercept b are usually presented without mathematical
justification; however, since this is Calculus III, we are in a position to appreciate finding
the best line through a set of points as a problem in minimizing a function of two variables,
namely m and b, the slope and intercept of the line. The best line will be that for which the
sum of the squared vertical distances between each point and the line will be as small as
possible. For a particular point (x[i],y[i]), the vertical distance d[i] to the line is the
absolute value abs(y[i]-m*x[i]-b). Thus d[i]^2=(y[i]-m*x[i]-b)^2, and we will be interested
in finding the minimum (that is, finding the critical points and testing them using D(m,b)) of the
function
$$f(m,b) = \sum_{i=1}^{n} \left(y_i - m\,x_i - b\right)^2.$$
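Carried out by hand, the critical-point analysis described above looks like the following sketch (all sums run over i = 1..n). The MAPLE computation in Part I should agree with it, although MAPLE may arrange the terms differently.

```latex
\begin{aligned}
f_m(m,b) &= -2\sum_{i=1}^{n} x_i\,(y_i - m x_i - b) = 0,\\
f_b(m,b) &= -2\sum_{i=1}^{n} (y_i - m x_i - b) = 0,\\[4pt]
\Longrightarrow\quad m\sum x_i^2 + b\sum x_i &= \sum x_i y_i,\qquad
  m\sum x_i + n\,b = \sum y_i,\\[4pt]
m &= \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2},
\qquad b = \frac{\sum y_i - m\sum x_i}{n},\\[4pt]
f_{mm} &= 2\sum x_i^2,\qquad f_{bb} = 2n,\qquad f_{mb} = 2\sum x_i,\\
D &= f_{mm}\,f_{bb} - f_{mb}^2 = 4\left(n\sum x_i^2 - \left(\sum x_i\right)^2\right).
\end{aligned}
```

Since the second partials do not depend on m or b, D is the same at every point; why it is positive is exactly question (4) at the end of this handout.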
Part I of the project will be to use MAPLE to solve this Calc III problem. Note that
the points (x[n],y[n]) are given, and behave like constants. The variables are m and b, and
the minimum characterizes the "best" line. We have already seen the MAPLE commands; they
will be reviewed on a separate page. They involve defining the function f, differentiating it
partially with respect to m and b, solving the resulting system of two equations in two
unknowns for critical points, and finally testing those points by setting up D from the second
partial derivatives. The formulas for m and b in terms of the points (x[n],y[n]) are readily
available; in fact, they are in our Multivariate Calculus textbook. Before getting
started, I recommend that you read pp. 713-14 and pp. 733-34.
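Once MAPLE has produced the formulas for m and b, it can be reassuring to check them numerically. The sketch below is written in plain Python rather than MAPLE (a choice made only for this example; the project itself uses MAPLE). It codes the closed-form m and b that come from setting both partial derivatives to zero, evaluates f_mm and D, and confirms that nudging m or b away from the critical point increases f. The function names and the sample points are hypothetical, not project data.

```python
def least_squares_line(xs, ys):
    """Return (m, b) minimizing f(m, b) = sum((y[i] - m*x[i] - b)**2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx  # equals D/4; zero only if all x[i] are equal
    if denom == 0:
        raise ValueError("all x values are equal; the best line is not unique")
    m = (n * sxy - sx * sy) / denom
    b = (sy - m * sx) / n
    return m, b


def f(m, b, xs, ys):
    """The least-squares function f(m, b) from the text."""
    return sum((y - m * x - b) ** 2 for x, y in zip(xs, ys))


def second_derivative_test(xs):
    """Return (f_mm, D); the second partials of f are constants, so no (m, b) is needed."""
    n = len(xs)
    f_mm = 2 * sum(x * x for x in xs)
    f_bb = 2 * n
    f_mb = 2 * sum(xs)
    return f_mm, f_mm * f_bb - f_mb ** 2


if __name__ == "__main__":
    # Hypothetical sample points (not from the project) lying near y = 2x + 1.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.1, 2.9, 5.2, 6.8, 9.1]
    m, b = least_squares_line(xs, ys)
    f_mm, D = second_derivative_test(xs)
    print(f"m = {m:.4f}, b = {b:.4f}, f(m,b) = {f(m, b, xs, ys):.4f}")
    print(f"f_mm = {f_mm}, D = {D}")  # both positive, so (m, b) is a local minimum
    # Moving m or b slightly away from the critical point should increase f.
    assert f(m + 0.01, b, xs, ys) > f(m, b, xs, ys)
    assert f(m, b + 0.01, xs, ys) > f(m, b, xs, ys)
```

For these sample points the script reports a slope near 2 and an intercept near 1, as expected for data scattered around y = 2x + 1.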
Part II of the project will involve working with actual data. In particular, the data is
a set of distances (from the sun) and year-lengths (in Earth days) for the first six planets:
   Planet      Year-length (y)   Log(y)   Distance from Sun (x)   Log(x)
               (Earth days)               (10^6 miles)
1. Mercury              88                           36
2. Venus               225                           67
3. Earth               365                           93
4. Mars                687                          142
5. Jupiter            4333                          484
6. Saturn            10759                          886
Why are columns for logarithms included? The German astronomer Johannes
Kepler (1571-1630) was in possession of the above planetary data, and he was trying to find
a formula for a planet's year-length in terms of that planet's distance from the sun. He was
too early for Uranus, Neptune, and Pluto, so he worked with what he had (Aside: revisionist
historians claim that Kepler's data was too good for the measuring instruments of the time,
and that he probably "cooked" his data somewhat to achieve an even better fit). We easily
observe, as did Kepler, that as presented the data is not linear. There are really two
candidates for a function relating y and x as defined above, a power function and an
exponential function:
(1) $y = A\,x^{c}$, where A and c are constants
(2) $y = B\,e^{kx}$, where B and k are constants.
Now, notice that if we take the natural logarithms of both sides of (1) and (2), we obtain:
(3) $\log y = c\,\log x + \log A$
(4) $\log y = k\,x + \log B$.
In (3), log y is a linear function of log x, with slope c and intercept log A. In (4), log y is a
linear function of x with slope k and intercept log B. If formula (1) is correct, we would
expect an approximately linear relationship between the points (log x[n], log y[n]), and we
could use the least squares method to find the best slope c and intercept log A. Working
backwards (how?) we could obtain formula (1). A similar argument holds for (2) and (4). To
accomplish this, we could plug the logarithmic data into the formulas we obtained for best m
and b in Part I. However, since we are working with numerical data, it will be faster to use
MAPLE's Statistical Package. These commands (simple!) will also be reviewed on a
separate page.
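As a cross-check on the MAPLE Statistics output, here is a minimal Python sketch of both fits on the planetary data in the table. It assumes natural logarithms (MAPLE's log is the natural logarithm), so that the slope in (4) is k itself. It also uses the coefficient of determination r^2 as one way to judge which transformed data set is closer to linear; r^2 is a choice made for this sketch, not something the project prescribes. The name fit_line is hypothetical.

```python
import math

# Data from the table: distance from the Sun in millions of miles (x)
# and year-length in Earth days (y), Mercury through Saturn.
DISTANCE = [36, 67, 93, 142, 484, 886]
YEAR = [88, 225, 365, 687, 4333, 10759]


def fit_line(xs, ys):
    """Least-squares slope and intercept, plus r^2 for the fit.

    Uses the centered form of the Part I formulas, which is algebraically
    equivalent to the version written with sums of x, y, x^2 and x*y.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)  # 1.0 means the points lie exactly on a line
    return slope, intercept, r2


log_x = [math.log(x) for x in DISTANCE]  # natural logs, like MAPLE's log
log_y = [math.log(y) for y in YEAR]

# Model (3), log-log: log y = c*log x + log A, so A = exp(intercept).
c, log_A, r2_power = fit_line(log_x, log_y)
A = math.exp(log_A)

# Model (4), semi-log: log y = k*x + log B, so B = exp(intercept).
k, log_B, r2_exp = fit_line(DISTANCE, log_y)
B = math.exp(log_B)

print(f"power law:   y = {A:.4f} * x^{c:.4f}      r^2 = {r2_power:.5f}")
print(f"exponential: y = {B:.4f} * exp({k:.6f} x)   r^2 = {r2_exp:.5f}")
```

The model whose transformed points give r^2 closer to 1 is the better straight-line fit; question (1) below asks you to explain that choice in your own words.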
Once you have "worked up" your data for the possible models (1) and (3), or (2) and
(4), you will submit a "lab report". This will be in the form of a MAPLE Worksheet, which
will then be uploaded to your ePortfolio. Part I consists of the critical point investigation for
the least squares function, f(m,b) as defined above. This gives the slope and intercept for the
best line fitting given data. The first piece of Part II will consist of two MAPLE treatments
for the data in (3) and (4). Note: the data in (3) is sometimes called log-log and the data in
(4) is sometimes referred to as semi-log. The second piece of Part II will consist of answers
to the following questions:
(1) Give a formula to model the relationship between planetary years and distances
from the Sun. Is this formula a power function or an exponential function?
Explain (write!) what the correct formula is (or at least seems to be, on the basis
of the available data; remember that you are playing the role of Kepler). Also
explain what you did: how you went about finding the formula.
(2) More data is now available. Uranus is 1782 million miles from the sun, Neptune is 2790
million miles from the sun, and Pluto is 3660 million miles from the sun. What year
lengths does your formula predict? The observed years are 30600 days, 60190 days, and
90470 days, respectively, for the outer planets. Compute the percent error, given by
$$\text{percent error} = \frac{\lvert \text{observed} - \text{predicted} \rvert}{\text{observed}} \times 100\%.$$
(3) Does the formula work for asteroids? The asteroid Ceres has a mean distance
from the sun of 257 million miles. Predict the length of the year on Ceres and
compare to the observed value of 1681 days. What is the percent error?
(4) Back in Part I, when you were carrying out the critical point analysis for the
least squares function f(m,b), we concluded that the values found minimized
f(m,b). It is easy to see that the second pure partial derivatives are positive. Food for thought:
why is the calculated D positive?
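For checking answers to questions (2) and (3), the sketch below refits the power law y = A*x^c to the six planets in the table using natural logarithms, predicts the year-lengths of the additional bodies, and applies the percent-error formula from question (2). It assumes the power law is the model you settled on in question (1); the names predict_year and percent_error are hypothetical. It prints its results rather than quoting them, so compare them against your own MAPLE numbers.

```python
import math

# The six planets from the table: distance (millions of miles), year-length (days).
DISTANCE = [36, 67, 93, 142, 484, 886]
YEAR = [88, 225, 365, 687, 4333, 10759]

# Least-squares fit of log y = c*log x + log A (natural logs), then undo the log.
lx = [math.log(x) for x in DISTANCE]
ly = [math.log(y) for y in YEAR]
mean_lx = sum(lx) / len(lx)
mean_ly = sum(ly) / len(ly)
c = (sum((u - mean_lx) * (v - mean_ly) for u, v in zip(lx, ly))
     / sum((u - mean_lx) ** 2 for u in lx))
A = math.exp(mean_ly - c * mean_lx)


def predict_year(distance):
    """Year-length in Earth days predicted by the power law y = A * x^c."""
    return A * distance ** c


def percent_error(observed, predicted):
    """abs(observed - predicted) / observed * 100, as defined in question (2)."""
    return abs(observed - predicted) / observed * 100


# (distance in millions of miles, observed year-length in days) from questions (2) and (3)
NEW_BODIES = {
    "Uranus": (1782, 30600),
    "Neptune": (2790, 60190),
    "Pluto": (3660, 90470),
    "Ceres": (257, 1681),
}

errors = {}
for name, (distance, observed) in NEW_BODIES.items():
    predicted = predict_year(distance)
    errors[name] = percent_error(observed, predicted)
    print(f"{name:8s} predicted {predicted:9.0f} days, observed {observed:6d} days, "
          f"error {errors[name]:.2f}%")
```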