“Applied” Homework: Data Description and General Guidelines
The Minitab worksheet “body_fat” was adapted from a data set posted by Roger W.
Johnson (Dept. of Mathematics & Computer Science, South Dakota School of Mines
& Technology) on the website http://www.stat.cmu.edu. R. Johnson also provides the
following references, which you might find useful:
• Bailey, Covert (1994). Smart Exercise: Burning Fat, Getting Fit, Houghton-Mifflin
Co., Boston, pp. 179-186.
• Behnke, A.R. and Wilmore, J.H. (1974). Evaluation and Regulation of Body Build
and Composition, Prentice-Hall, Englewood Cliffs, N.J.
• Siri, W.E. (1956), "Gross composition of the body", in Advances in Biological
and Medical Physics, vol. IV, edited by J.H. Lawrence and C.A. Tobias, Academic
Press, Inc., New York.
• Katch, Frank and McArdle, William (1977). Nutrition, Weight Control, and
Exercise, Houghton Mifflin Co., Boston.
• Wilmore, Jack (1976). Athletic Training and Physical Fitness: Physiological
Principles of the Conditioning Process, Allyn and Bacon, Inc., Boston.
The data concern a sample of 252 men and contain the following variables:
• Density of the body, determined from underwater weighing
• Percentage of body fat, calculated as a function of the Density according to
Siri’s equation: (495/Density) – 450.
• Indicator for Age group (0: up to 45 years, 1: over 45)
• Weight (lbs)
• Height (inches)
• Neck circumference (cm)
• Chest circumference (cm)
• Abdomen circumference (cm)
• Hip circumference (cm)
• Thigh circumference (cm)
• Knee circumference (cm)
• Ankle circumference (cm)
• Biceps circumference (cm)
• Forearm circumference (cm)
• Wrist circumference (cm)
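As a quick illustration of Siri's equation from the variable list above, here is a minimal Python sketch. It assumes Density is expressed in g/cm³, a unit the description does not state. The function name and the density values are hypothetical, chosen only for illustration; they are not rows from the worksheet.

```python
def siri_body_fat(density):
    """Percentage of body fat from body density via Siri's equation.

    Assumes density is in g/cm^3 (an assumption, not stated in the handout).
    """
    if density <= 0:
        raise ValueError("density must be positive")
    return 495.0 / density - 450.0


# Illustrative density values only (not taken from the data set).
for d in (1.03, 1.05, 1.07, 1.10):
    print(f"density {d:.2f} -> body fat {siri_body_fat(d):.1f}%")
```

Note that the formula returns a negative percentage for any density above 1.10, so the formula itself does not keep the result inside the 0 to 100 range.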
The most accurate way of calculating body fat percentage is the one provided by Siri’s
equation, which requires a measurement of Density via underwater weighing. This is
expensive and impractical. On the other hand, age and the body measurements listed
above are easy to obtain. Thus, we want to understand whether we can reliably describe
and predict body fat percentage on the basis of these variables, using regression. For age,
we only have a binary indicator separating men up to 45 years old from men over 45. The
body measurements, on the other hand, are all continuous variables.
Notice also that body fat percentage is a quantity bound to vary between 0 and 100.
This could cause problems when using linear regression models (we could actually
estimate mean levels or predict values smaller than 0 or larger than 100 over certain
ranges of the predictors). On the other hand, we can safely use linear regression as long
as we stay in ranges of the predictors where the fitted values are well above 0 and well
below 100.
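The point about fitted values leaving the 0 to 100 range can be seen in a small Python sketch. It fits a least squares line with a single predictor to made-up (abdomen, body fat) pairs and evaluates it inside and far outside the observed range. The numbers and function names are hypothetical and are not taken from the worksheet.

```python
def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up (abdomen in cm, body fat in %) pairs, for illustration only.
abdomen = [80.0, 90.0, 100.0, 110.0]
body_fat = [10.0, 18.0, 26.0, 34.0]

intercept, slope = fit_line(abdomen, body_fat)


def predict(x_new):
    """Fitted body fat percentage at a new abdomen value."""
    return intercept + slope * x_new


# 95 cm lies inside the observed range; 60 cm and 200 cm lie far outside it.
for x_new in (60.0, 95.0, 200.0):
    fitted = predict(x_new)
    status = "ok" if 0.0 <= fitted <= 100.0 else "outside 0-100"
    print(f"abdomen {x_new:.0f} cm -> fitted {fitted:.1f}% ({status})")
```

Inside the observed range the fitted value is a plausible percentage, while the two extrapolations fall below 0 and above 100.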
Throughout the semester, the “Application” component in each of the five
homework sets will consist of employing various modeling and diagnostics techniques
learned in class on these data – with the final aim of producing a satisfactory
regression model for body fat percentage, based on all or some of the predictor
variables available to us.
NOTE: DENSITY (THE VARIABLE ON WHICH SIRI’S EQUATION IS
BASED) WILL NOT BE USED AS A PREDICTOR.
When preparing your “body fat” write-up for each homework set, make sure that:
• It does not exceed 5 pages, including figures and tables. Do not include raw
computer output (edit the relevant outputs in tables or compact summaries).
• It is divided into two parts, one devoted to technical details and outputs, and one
devoted to interpretation of the results. The latter should resemble a short report
you would write for a client, i.e. be concise and informative but not contain
technical terms, and not assume the reader has statistical knowledge.
Keep in mind that some erroneous values were detected in this data set:
• Density is given to you so that you can verify whether there were mistakes in the
calculation of Percentage of body fat through Siri’s formula.
• There may be some obvious measurement errors in Height, and in other predictor
variables.
When performing the analyses required for each homework set, you can remove, or
present results with and without, a few units (men) that appear to carry erroneous
measurements (never remove more than 10 units, and always provide an appropriate
justification for removing units if you do).
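One possible way to screen for such units is sketched below in Python. It recomputes body fat from Density with Siri's equation, flags rows where the recorded percentage disagrees, flags heights outside a plausible range, and checks the cap of 10 removed units. The tolerance, the height range, the column names, and the three rows are all assumptions made for illustration; none of them come from the worksheet.

```python
MAX_REMOVED = 10  # cap on removed units stated in the guidelines


def siri_body_fat(density):
    """Siri's equation: (495 / Density) - 450."""
    return 495.0 / density - 450.0


def flag_suspect(rows, fat_tol=0.5, height_range=(55.0, 85.0)):
    """Return (index, reason) pairs for rows that look erroneous.

    fat_tol (percentage points) and height_range (inches) are hypothetical
    thresholds; choose and justify your own.
    """
    low, high = height_range
    flagged = []
    for i, row in enumerate(rows):
        reasons = []
        if abs(row["fat"] - siri_body_fat(row["density"])) > fat_tol:
            reasons.append("fat disagrees with Siri's equation")
        if not low <= row["height"] <= high:
            reasons.append("implausible height")
        if reasons:
            flagged.append((i, "; ".join(reasons)))
    return flagged


# Made-up rows, for illustration only.
rows = [
    {"density": 1.07, "fat": 12.6, "height": 70.0},  # consistent
    {"density": 1.05, "fat": 30.0, "height": 69.5},  # fat does not match
    {"density": 1.06, "fat": 17.0, "height": 29.5},  # height far too small
]

flagged = flag_suspect(rows)
for index, reason in flagged:
    print(f"row {index}: {reason}")

to_remove = {index for index, _ in flagged}
if len(to_remove) > MAX_REMOVED:
    raise ValueError("more than 10 units flagged; revisit the thresholds")
cleaned = [row for i, row in enumerate(rows) if i not in to_remove]
```

Keeping both `rows` and `cleaned` makes it straightforward to present results with and without the flagged units.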
Remember that a regression model should be satisfactory in three respects:
• variability explanation
• diagnostics on the basic assumptions
• parsimony and interpretability
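As one common way to put numbers on the first and third criteria, the Python sketch below computes R² (the share of variability explained) and adjusted R² (which penalizes extra terms). These are standard measures, but they are offered here as an assumption, not necessarily the ones emphasized in class. The observed and fitted values are made up.

```python
def r_squared(observed, fitted):
    """R^2 = 1 - SSE / SST."""
    y_bar = sum(observed) / len(observed)
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    sst = sum((y - y_bar) ** 2 for y in observed)
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictor terms (no intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Made-up observed and fitted values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
fitted = [12.0, 18.0, 32.0, 38.0]

r2 = r_squared(observed, fitted)
print(f"R^2 = {r2:.3f}")
print(f"adjusted R^2, 1 term:  {adjusted_r_squared(r2, n=4, p=1):.3f}")
print(f"adjusted R^2, 2 terms: {adjusted_r_squared(r2, n=4, p=2):.3f}")
```

For the same R², the model with more terms gets the lower adjusted R², which is one way the parsimony criterion shows up numerically.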
As we learn more techniques, you will be asked to work with transformations of the
response, transformations of the predictors, power and product terms, etc. Given
the large number of predictors and terms that can be derived from them, you will
often consider and evaluate several alternative models. If you encounter more than
one model that you deem satisfactory, and if you see value in presenting and
interpreting more than one model in your write-up, you are allowed and encouraged
to do so.
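To make "transformations, power and product terms" concrete, the Python sketch below derives three candidate terms from one observation: a log transformation, a squared term, and a product of the age indicator with a body measurement. The column names and values are hypothetical, and the particular terms are examples only, not recommendations for the final model.

```python
import math


def derived_terms(row):
    """Return a copy of one observation with a few candidate terms added."""
    out = dict(row)
    out["log_weight"] = math.log(row["weight"])                   # transformation
    out["abdomen_sq"] = row["abdomen"] ** 2                       # power term
    out["age_x_abdomen"] = row["age_over_45"] * row["abdomen"]    # product term
    return out


# Made-up observation, for illustration only.
row = {"weight": 180.0, "abdomen": 90.0, "age_over_45": 1}
expanded = derived_terms(row)
for name, value in expanded.items():
    print(f"{name}: {value}")
```

Because the age indicator is 0 or 1, the product term equals the abdomen measurement for men over 45 and 0 otherwise, which lets the abdomen slope differ between the two age groups.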