Activity 5: Coefficient of Determination and Simple Linear Regression

ONE OF THE TOUGHEST ASSIGNMENTS OF THE SEMESTER

We’ll be using a new SPSS file for tonight’s class. The dataset was collected with two physical activity monitors. Participants in the study wore a pedometer and the Sensewear Armband for 4 days. The pedometer is a relatively cheap monitor, costing around $25. The Sensewear Armband is one of the most sophisticated monitors on the market, costing around $300 per unit and another $2000 for the software to analyze the data. Both monitors are designed to estimate “steps taken” and “calories burned from physical activity”. We’ll be trying to answer two research questions based on these variables.

As you fill in the information below, please type in RED or BOLD font so it is easy to see.

A. Download the SPSS file ‘ActivityMonitors’ and open it in SPSS. This file contains approximately 60 subjects who wore both monitors for 4 days. Variables include sex (0=male, 1=female), height, weight, body mass index, and stride length (distance covered per step). You should also find the physical activity data: the pedometer-estimated steps per day (PedSteps) and calories burned (PedCal), as well as the armband-estimated steps per day (ArmSteps) and calories burned (ArmCal).

B. Prediction Practice

Just to get a sense of the dataset, create a correlation matrix using age, height, weight (in kilograms), stride length, pedometer steps and armband calories. Look at the correlations between body weight and these other variables.

1) Which variables have a statistically significant association with body weight? List the variables that are statistically significant, with the resulting r values:

2) Which of the statistically significant variables explains the most variance in body weight? How much variance does it explain?

Now, use simple linear regression to predict body weight from height (height should be your answer in number 2 above).
Go to ‘Analyze’ > ‘Regression’ > ‘Linear’. We are trying to predict weight (dependent variable) from height (independent variable). Make sure you use weight in kilograms (which is actually mass!).

3) Is height a statistically significant predictor of weight?

4) Write out our prediction equation (BOLD the slope and underline the y-intercept):

5) Interpret the slope of our equation. “A one unit increase in height…”

6) What is the standard error of the estimate?

Create a new variable called “PredWeight”. Compute this variable using your new prediction equation from number 4 above. This variable should be the predicted weight of each subject based on their height. Calculate the mean and standard deviation of PredWeight, as well as of our original body weight variable.

7) What is the mean of “PredWeight”?

8) What is the mean of the original body weight variable?

9) Why are 7 and 8 identical?

Now, create a new variable called “WeightResid”. This variable should hold the residuals for our line of best fit. To do this, simply create the variable using this equation: WeightResid = Weight_kg - PredWeight.

10) What is the mean of WeightResid?

11) What is the standard deviation of WeightResid?

12) What is the technical term for the standard deviation of our residuals?

Notice that this hand-calculated SEE may not match your answer on #6 perfectly, but it will be very close. SPSS uses a slightly different calculation, but for all practical purposes these numbers are identical.

C. Estimating Calories

Now, let’s redo this process for calories. This is a much more important model. The pedometer costs $25; the armband costs around $300. So, if we can accurately predict the armband calories from just a simple pedometer, then we may be able to save some money. The armband has been tested several times in research papers, and it is one of the most accurate methods of determining daily activity energy expenditure (calories burned).
Both monitors are estimating calories burned from physical activity, not total daily energy expenditure, so expect your numbers to be lower.

13) Pedometer Mean Calories (SD):

14) Armband Mean Calories (SD):

Remember, in this case, the armband is the ‘real’ value (it’s already been shown to be accurate).

15) Does the pedometer accurately estimate calories burned? How much error is there? Does the pedometer over- or under-estimate calories?

16) What is the average difference in calories between the armband and the pedometer (note: any difference here is error)?

17) What is the standard deviation of the error between the two monitors (hint: you’ll need to create a new variable)?

18) Fill in the blanks: 68% of the error (between the armband and the pedometer calories) falls between ______________ calories and _____________ calories.

Let’s see if we can ‘correct’ this error using a regression equation. Run a Pearson correlation between pedometer calories and armband calories.

19) What percentage of the variance in armband calories is accounted for by the pedometer calories?

Now, use simple linear regression to predict armband calories from pedometer calories. Go to ‘Analyze’ > ‘Regression’ > ‘Linear’.

20) Is pedometer calories a statistically significant predictor of armband calories?

21) Write out our prediction equation (BOLD the slope and underline the y-intercept):

22) Interpret the slope of our equation. “A one calorie increase on the pedometer equals…”

23) What is the standard error of the estimate?

24) How does your answer on #23 compare to your answer on #18? Would using this new prediction equation make the pedometer more accurate?

Your answer on #24 should serve as a reminder that just because the p-value is low, and/or the r² is large, does not mean your regression model is ‘perfect’.

Save this Word file and, later in the week, let me know you have completed it before class next Monday night.
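If you want to check the logic of Parts B and C outside SPSS, the same steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The numbers below are made up and are not from the ActivityMonitors file. The helper names (`fit_line`, `residual_summary`, and so on) are hypothetical. The error in Part C is taken as armband minus pedometer, which is one reasonable reading of #16. The sketch also shows the likely source of the small mismatch noted after #12: a regression standard error of the estimate divides the squared residuals by n - 2, while an ordinary standard deviation of the residuals divides by n - 1.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sample_sd(xs):
    """Standard deviation with n - 1 in the denominator."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def residual_summary(x, y):
    """Fit y on x; return slope, intercept, predictions, residuals, and SEE."""
    slope, intercept = fit_line(x, y)
    pred = [intercept + slope * a for a in x]
    resid = [b - p for b, p in zip(y, pred)]
    sse = sum(e ** 2 for e in resid)
    see = math.sqrt(sse / (len(y) - 2))  # regression SEE divides by n - 2
    return slope, intercept, pred, resid, see

# Part B with made-up data (NOT the ActivityMonitors values)
height_cm = [160.0, 165.0, 170.0, 175.0, 180.0, 185.0]
weight_kg = [55.0, 62.0, 66.0, 74.0, 79.0, 90.0]

r = pearson_r(height_cm, weight_kg)
slope, intercept, pred_weight, weight_resid, see = residual_summary(height_cm, weight_kg)

print(f"r = {r:.3f}, variance explained (r^2) = {r * r:.3f}")
print(f"PredWeight = {intercept:.2f} + {slope:.3f} * height")
print(f"mean PredWeight = {mean(pred_weight):.3f}, mean weight = {mean(weight_kg):.3f}")
print(f"mean residual = {mean(weight_resid):.6f}")
print(f"SD of residuals (n - 1) = {sample_sd(weight_resid):.3f}, SEE (n - 2) = {see:.3f}")

# Part C with made-up data; error is assumed to be armband minus pedometer
ped_cal = [300.0, 420.0, 510.0, 380.0, 610.0, 450.0]
arm_cal = [410.0, 500.0, 640.0, 430.0, 700.0, 590.0]

error = [a - p for a, p in zip(arm_cal, ped_cal)]
err_mean, err_sd = mean(error), sample_sd(error)
# Roughly 68% of errors fall within one SD of the mean if they are about normal
low, high = err_mean - err_sd, err_mean + err_sd
_, _, _, _, see_cal = residual_summary(ped_cal, arm_cal)

print(f"mean error = {err_mean:.1f}, SD of error = {err_sd:.1f}")
print(f"about 68% of errors between {low:.1f} and {high:.1f} calories")
print(f"SEE after regression = {see_cal:.1f}")
```

With any dataset, the mean of the predicted values equals the mean of the observed values and the residuals average to zero, because the least-squares line always passes through the point (mean of x, mean of y). Comparing `err_sd` with `see_cal` mirrors the comparison asked for in #24.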