Calculus III ePortfolio Project

The purpose of this project is to investigate the possible linear relationship in a data set, that is, a finite set of points (x[n],y[n]). Do these given points lie on a line with equation y=m*x+b, or is there, somehow, a best line to which these points are closest? The method we will use is standard operating procedure in Statistics (MAT 120 or MAT 121 at LaGuardia) and is called the method of least squares; the "best" line fitting the set of points (x[n],y[n]) is called the regression line.

In an elementary Statistics text, formulas for the slope m and the y-intercept b are usually presented without mathematical justification. Since this is Calculus III, however, we are in a position to treat finding the best line through a set of points as a problem in minimizing a function of two variables, namely m and b, the slope and intercept of the line. The best line is the one for which the sum of the squared vertical distances between the points and the line is as small as possible. For a particular point (x[i],y[i]), the vertical distance to the line, d[i], is the absolute value abs(y[i]-m*x[i]-b). Thus d[i]^2=(y[i]-m*x[i]-b)^2, and we will be interested in finding the minimum (that is, finding the critical points and testing them using D(m,b)) of the function f(m,b)=sum((y[i]-m*x[i]-b)^2,i=1..n).

Part I of the project will be to use MAPLE to solve this Calc III problem. Note that the points (x[n],y[n]) are given, and behave like constants. The variables are m and b, and the minimum characterizes the "best" line. We have already seen the MAPLE commands: define the function f, differentiate partially with respect to both m and b, solve the resulting system of two equations in two unknowns for the critical points, and finally test the points by setting up D from the second partial derivatives. These commands will be reviewed on a separate page.
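As a cross-check on the MAPLE work, the critical-point analysis described above can be sketched in plain Python. This is only an illustration and not the MAPLE worksheet the project asks for: the function names and the sample points are made up for the example, and the two equations in the comments are what one obtains by setting the partial derivatives of f(m,b) with respect to m and b equal to zero.

```python
from fractions import Fraction


def least_squares_line(points):
    """Return (m, b) minimizing f(m,b) = sum((y[i] - m*x[i] - b)^2)."""
    n = len(points)
    sx = sum(Fraction(x) for x, _ in points)
    sy = sum(Fraction(y) for _, y in points)
    sxx = sum(Fraction(x) ** 2 for x, _ in points)
    sxy = sum(Fraction(x) * Fraction(y) for x, y in points)
    # Setting df/dm = 0 and df/db = 0 gives the linear system
    #   m*sxx + b*sx = sxy
    #   m*sx  + b*n  = sy
    det = n * sxx - sx ** 2
    if det == 0:
        raise ValueError("all x-values are equal; no unique line")
    m = (n * sxy - sx * sy) / det
    b = (sy - m * sx) / n
    return m, b


def second_derivative_test(points):
    """Return (f_mm, f_bb, D); for this f they do not depend on m or b."""
    n = len(points)
    sx = sum(Fraction(x) for x, _ in points)
    sxx = sum(Fraction(x) ** 2 for x, _ in points)
    f_mm = 2 * sxx          # second partial with respect to m
    f_bb = Fraction(2 * n)  # second partial with respect to b
    f_mb = 2 * sx           # mixed partial
    return f_mm, f_bb, f_mm * f_bb - f_mb ** 2


# Hypothetical sample points, chosen only to exercise the functions.
sample = [(1, 2), (2, 3), (3, 5), (4, 4)]
m, b = least_squares_line(sample)
f_mm, f_bb, D = second_derivative_test(sample)
print(m, b, D)  # prints: 4/5 3/2 80
```

Exact fractions are used so the result can be compared directly with a symbolic MAPLE answer; with real data, floating-point arithmetic would do just as well.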
The formulas for m and b in terms of the points (x[n],y[n]) are readily available. In fact they are in our Multivariate Calculus textbook. Before getting started, I recommend that you read pp. 713-14 and pp. 733-34.

Part II of the project will involve working with actual data. In particular, the data is a set of distances (from the sun) and year-lengths (in Earth days) for the first six planets:

Planet         Year-length (y)   Log(y)   Distance from Sun (x)   Log(x)
               in Earth days              in 10^6 miles
1. Mercury            88                           36
2. Venus             225                           67
3. Earth             365                           93
4. Mars              687                          142
5. Jupiter          4333                          484
6. Saturn          10759                          886

Why are columns for logarithms included? The German astronomer Johannes Kepler (1571-1630) was in possession of the above planetary data, and he was trying to find a formula for a planet's year-length in terms of that planet's distance from the sun. He was too early for Uranus, Neptune, and Pluto, so he worked with what he had. (Aside: revisionist historians claim that Kepler's data was too good for the measuring instruments of the time, and that he probably "cooked" his data somewhat to achieve an even better fit.)

We easily observe, as did Kepler, that as presented the data is not linear. There are really two candidates for a function relating y and x as defined above, a power function and an exponential function:

(1) y=A*x^c, A and c are constants
(2) y=B*exp(k*x), B and k are constants.

Now, notice that if we take the logarithms of both sides of (1) and (2), we obtain:

(3) logy=c*logx+logA
(4) logy=k*x+logB.

In (3), logy is a linear function of logx, with slope c and intercept logA. In (4), logy is a linear function of x with slope k and intercept logB. If formula (1) is correct we would expect an approximately linear relationship between the points (logx[n],logy[n]), and we could use the least squares method to find the best slope c and intercept logA. Working backwards (how?) we could obtain formula (1). A similar argument holds for (2) and (4).
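The two linearizations (3) and (4) can be tried out with a short Python sketch before turning to MAPLE. Several things here are assumptions rather than part of the project: the helper name fit_line, the use of the natural logarithm (the handout does not fix a base; the slope c is the same in any base, and the base only affects how the intercept is converted back), and the use of R^2 as a rough measure of how linear each transformed data set is.

```python
import math

distance = [36, 67, 93, 142, 484, 886]    # x, millions of miles
year = [88, 225, 365, 687, 4333, 10759]   # y, Earth days


def fit_line(u, v):
    """Least-squares slope, intercept, and R^2 for v plotted against u."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    s_vv = sum((vi - v_bar) ** 2 for vi in v)
    slope = s_uv / s_uu
    intercept = v_bar - slope * u_bar
    r_squared = s_uv ** 2 / (s_uu * s_vv)
    return slope, intercept, r_squared


log_x = [math.log(x) for x in distance]
log_y = [math.log(y) for y in year]

# Equation (3): log y against log x (log-log data).
c, log_A, r2_power = fit_line(log_x, log_y)
# Equation (4): log y against x (semi-log data).
k, log_B, r2_exp = fit_line(distance, log_y)

# Exponentiate the intercepts to recover A and B (natural log assumed).
A = math.exp(log_A)
B = math.exp(log_B)

print(f"power:       y = {A:.4f} * x^{c:.4f}        R^2 = {r2_power:.6f}")
print(f"exponential: y = {B:.4f} * exp({k:.6f}*x)   R^2 = {r2_exp:.6f}")
```

Comparing the two R^2 values gives one quick, informal way to judge which transformed data set is closer to a straight line; a plot of each is at least as informative.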
To accomplish this, we could plug the logarithmic data into the formulas we obtained for best m and b in Part I. However, since we are working with numerical data, it will be faster to use MAPLE's Statistical Package. These commands (simple!) will also be reviewed on a separate page.

Once you have "worked up" your data for the possible models (1) and (3), or (2) and (4), you will submit a "lab report". This will be in the form of a MAPLE Worksheet, which will then be uploaded to your ePortfolio.

Part I consists of the critical point investigation for the least squares function f(m,b) as defined above. This gives the slope and intercept for the best line fitting given data.

The first piece of Part II will consist of two MAPLE treatments, one for the data in (3) and one for the data in (4). Note: the data in (3) is sometimes called log-log and the data in (4) is sometimes referred to as semi-log.

The second piece of Part II will consist of answers to the following questions:

(1) Give a formula to model the relationship between planetary years and distances from the Sun. Is this formula a power function or an exponential function? Explain (write!) what the correct formula is (or at least seems to be, on the basis of available data... remember that you are playing the role of Kepler). Also, explain what you did and how you went about finding the formula.

(2) More data is now available. Uranus is 1782 million miles from the sun, Neptune is 2790 million miles from the sun, and Pluto is 3660 million miles from the sun. What year lengths does your formula predict? The observed years are 30600 days, 60190 days, and 90470 days respectively, for the outer planets. Compute the percent error, given by {Abs(observed-predicted)/observed} x 100%.

(3) Does the formula work for asteroids? The asteroid Ceres has a mean distance from the sun of 257 million miles. Predict the length of the year on Ceres and compare to the observed value of 1681 days. What is the percent error?

(4) Back in Part I, when you carried out the critical point analysis for the least squares function f(m,b), we concluded that the values found minimized f(m,b). It is easy to see that the second pure partials are positive. Food for thought: why is the calculated D positive?
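
The prediction and percent-error arithmetic in questions (2) and (3) can be organized as in the following Python sketch. It assumes, purely for illustration, that the power model (1) is the one being tested; the constants A and c below are rounded placeholders and not the result of any fit, so substitute the values from your own MAPLE worksheet. The function names are likewise made up for the example.

```python
def predict_year(x, A, c):
    """Year length in Earth days from the power model y = A*x^c."""
    return A * x ** c


def percent_error(observed, predicted):
    """{Abs(observed - predicted) / observed} x 100%."""
    return abs(observed - predicted) / observed * 100


# Placeholder constants for illustration only; replace with your own fit.
A, c = 0.41, 1.5

# (name, distance in millions of miles, observed year in Earth days)
bodies = [
    ("Uranus", 1782, 30600),
    ("Neptune", 2790, 60190),
    ("Pluto", 3660, 90470),
    ("Ceres", 257, 1681),
]

errors = {}
for name, x, observed in bodies:
    predicted = predict_year(x, A, c)
    errors[name] = percent_error(observed, predicted)
    print(f"{name:8s} predicted {predicted:9.0f} days, "
          f"error {errors[name]:.2f}%")
```

If the exponential model (2) is being tested instead, only predict_year needs to change, to B*exp(k*x).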