“Applied” Homework: Data Description and General Guidelines

The Minitab worksheet “body_fat” was adapted from a data set posted by Roger W. Johnson (Dept. of Mathematics & Computer Science, South Dakota School of Mines & Technology) at http://www.stat.cmu.edu. R. Johnson also provides the following references, which you might find useful:

• Bailey, Covert (1994). Smart Exercise: Burning Fat, Getting Fit, Houghton-Mifflin Co., Boston, pp. 179-186.
• Behnke, A.R. and Wilmore, J.H. (1974). Evaluation and Regulation of Body Build and Composition, Prentice-Hall, Englewood Cliffs, N.J.
• Siri, W.E. (1956). "Gross composition of the body", in Advances in Biological and Medical Physics, vol. IV, edited by J.H. Lawrence and C.A. Tobias, Academic Press, Inc., New York.
• Katch, Frank and McArdle, William (1977). Nutrition, Weight Control, and Exercise, Houghton Mifflin Co., Boston.
• Wilmore, Jack (1976). Athletic Training and Physical Fitness: Physiological Principles of the Conditioning Process, Allyn and Bacon, Inc., Boston.

The data concern a sample of 252 men and contain the following variables:

• Density of the body, determined from underwater weighing
• Percentage of body fat, calculated as a function of the Density according to Siri’s equation: (495 / Density) - 450
• Indicator for Age group (0: up to 45 years, 1: over 45)
• Weight (lbs)
• Height (inches)
• Neck circumference (cm)
• Chest circumference (cm)
• Abdomen circumference (cm)
• Hip circumference (cm)
• Thigh circumference (cm)
• Knee circumference (cm)
• Ankle circumference (cm)
• Biceps circumference (cm)
• Forearm circumference (cm)
• Wrist circumference (cm)

The most accurate way of calculating body fat percentage is the one provided by Siri’s equation, which requires a measurement of Density via underwater weighing. This is expensive and impractical. On the other hand, age and the body measurements listed above are easy to obtain.
Thus, we want to understand whether we can reliably describe and predict body fat percentage on the basis of these variables, using regression. For age, we only have a binary indicator separating men up to 45 years old from men over 45. The body measurements, on the other hand, are all continuous variables.

Notice also that body fat percentage is a quantity bound to vary between 0 and 100. This could cause problems when using linear regression models (we could actually estimate mean levels or predict values smaller than 0 or larger than 100 over certain ranges of the predictors). On the other hand, we can safely use linear regression as long as we stay within ranges of the predictors where the fitted values are well above 0 and well below 100.

Throughout the semester, the “Application” component in each of the five homework sets will consist of employing various modeling and diagnostic techniques learned in class on these data, with the final aim of producing a satisfactory regression model for body fat percentage, based on all or some of the predictor variables available to us.

NOTE: DENSITY (THE VARIABLE ON WHICH SIRI’S EQUATION IS BASED) WILL NOT BE USED AS A PREDICTOR.

When preparing your “body fat” write-up for each homework set, make sure that:

• It does not exceed 5 pages, including figures and tables. Do not include raw computer output (edit the relevant outputs into tables or compact summaries).
• It is divided into two parts, one devoted to technical details and outputs, and one devoted to interpretation of the results. The latter should resemble a short report you would write for a client, i.e. it should be concise and informative, should not contain technical terms, and should not assume the reader has statistical knowledge.

Keep in mind that some erroneous values were detected in this data set:

• Density is given to you so that you can verify whether there were mistakes in the calculation of Percentage of body fat through Siri’s formula.
• There may be some obvious measurement errors in Height and in other predictor variables.

When performing the analyses required for each homework set, you can remove a few units (men) that appear to carry erroneous measurements, or present results with and without them. Never remove more than 10 units, and always provide an appropriate justification for any unit you remove.

Remember that a regression model should be satisfactory in three respects:

• variability explanation
• diagnostics on the basic assumptions
• parsimony and interpretability

As we learn more techniques, you will be asked to work with transformations of the response, transformations of the predictors, power and product terms, etc. Given the large number of predictors and of terms that can be derived from them, you will often consider and evaluate several alternative models. If you encounter more than one model that you deem satisfactory, and if you see value in presenting and interpreting more than one model in your write-up, you are allowed and encouraged to do so.
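
As a rough illustration of two checks mentioned in these guidelines, the Python sketch below recomputes body fat percentage from Density with Siri’s equation, flags units whose recorded percentage disagrees with it, and flags fitted or predicted percentages that fall outside the 0 to 100 range. The function names, the 0.5-point tolerance, and the example values are hypothetical and are not taken from the worksheet. The course itself uses Minitab, so this is only one possible way to express the checks.

```python
def siri_body_fat(density):
    """Percentage of body fat from Density, via Siri's equation: (495 / Density) - 450."""
    return 495.0 / density - 450.0


def flag_siri_mismatches(records, tolerance=0.5):
    """Return indices of units whose recorded body fat disagrees with Siri's equation.

    `records` is a list of (density, recorded_percent_fat) pairs.
    `tolerance` (in percentage points) is an arbitrary illustrative threshold.
    """
    flagged = []
    for i, (density, recorded) in enumerate(records):
        if abs(siri_body_fat(density) - recorded) > tolerance:
            flagged.append(i)
    return flagged


def out_of_range(fitted_values, low=0.0, high=100.0):
    """Return indices of fitted or predicted percentages outside [low, high]."""
    return [i for i, value in enumerate(fitted_values) if not (low <= value <= high)]


# Made-up values for illustration only (not rows of the body_fat worksheet).
example = [
    (1.10, 0.0),   # Siri's equation gives about 0, consistent
    (1.05, 21.4),  # Siri's equation gives about 21.43, consistent
    (1.05, 12.0),  # disagrees with Siri's equation by about 9.4 points
]
mismatches = flag_siri_mismatches(example)

# Hypothetical fitted values from some regression model.
bad_fits = out_of_range([12.3, -1.5, 48.0, 101.2])
```

A unit flagged by the first check is only a candidate for removal: the guidelines still require a justification for each unit removed and allow at most 10 removals.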