Activity 5: Coefficient of Determination and Simple Linear Regression
ONE OF THE TOUGHEST ASSIGNMENTS OF THE SEMESTER
We’ll be using a new SPSS file for tonight’s class. The dataset comes from a study of two physical activity
monitors. Participants in the study wore a pedometer and the Sensewear Armband for 4 days. The
pedometer is a relatively cheap monitor, costing around $25. The Sensewear armband is one of the
most sophisticated monitors on the market, costing around $300 per unit and another $2000 for the
software to analyze the data. Both of the monitors are designed to estimate “steps taken” and “calories
burned from physical activity”. We’ll be trying to figure out two research questions based on these
variables. As you fill in the information below, please type in RED or BOLD font so it is easy to see.
A. Download the SPSS file ‘ActivityMonitors’. Open this file using SPSS.
This file contains approximately 60 subjects that wore both of the monitors for 4 days. Variables
include sex (0=male, 1=female), height, weight, body mass index, and stride length (distance covered
per step). You should also find the physical activity data, the pedometer estimated steps per day
(PedSteps) and calories burned (PedCal), as well as the armband estimated steps per day (ArmSteps)
and calories burned (ArmCal).
B. Prediction Practice
Just to get a sense of the dataset, create a correlation matrix using age, height, weight (in
kilograms), stride length, pedometer steps and armband calories. Look at the correlations between
body weight and these other variables.
1) Which variables have a statistically significant association with body weight? List the variables that
are statistically significant, with the resulting r values:
2) Which of the statistically significant variables explains the most variance in body weight? How much
variance does it explain?
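For anyone who wants to check the idea outside SPSS, here is a minimal Python sketch of how a Pearson r turns into "variance explained" (r squared). The height and weight numbers are made-up illustration values, not data from the ActivityMonitors file, and the function name pearson_r is just a label chosen for this sketch.

```python
import math

# Hypothetical example values, NOT taken from the ActivityMonitors file.
height_cm = [160.0, 165.0, 170.0, 175.0, 180.0, 185.0]
weight_kg = [55.0, 62.0, 66.0, 74.0, 79.0, 90.0]

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(height_cm, weight_kg)
r_squared = r ** 2  # coefficient of determination: share of variance explained
print(f"r = {r:.3f}, variance explained = {r_squared:.1%}")
```

The same squaring step applies to any r value in your SPSS correlation matrix.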
Now, use simple linear regression to predict body weight from height (height should be your answer in
number 2 above). Go to ‘Analyze’ > ‘Regression’ > ‘Linear’. We are trying to predict weight (dependent
variable) from height (independent variable). Make sure you use weight in kilograms (which is actually
mass!).
3) Is height a statistically significant predictor of weight?
4) Write out our prediction equation (BOLD the slope and underline the y-intercept):
5) Interpret the slope of our equation. “A one unit increase in height…”
6) What is the standard error of the estimate?
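As a rough cross-check on what the SPSS ‘Linear’ regression output reports, the sketch below fits a least-squares line by hand and computes the standard error of the estimate. It uses made-up height and weight values (not the ActivityMonitors data), and the helper names fit_line and standard_error_of_estimate are hypothetical names for this example only.

```python
import math

# Hypothetical example values, NOT taken from the ActivityMonitors file.
height_cm = [160.0, 165.0, 170.0, 175.0, 180.0, 185.0]
weight_kg = [55.0, 62.0, 66.0, 74.0, 79.0, 90.0]

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def standard_error_of_estimate(x, y, slope, intercept):
    """Square root of the residual sum of squares divided by n - 2."""
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))

slope, intercept = fit_line(height_cm, weight_kg)
see = standard_error_of_estimate(height_cm, weight_kg, slope, intercept)
print(f"predicted weight = {intercept:.2f} + {slope:.3f} * height")
print(f"SEE = {see:.2f} kg")
```

The slope is in units of the dependent variable per one unit of the independent variable (here, kilograms per centimeter).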
Create a new variable called “PredWeight”. Compute this variable using your new prediction equation
from number 4 above. This variable should be the predicted weight of all the subjects using their height.
Calculate the mean and standard deviation of PredWeight, as well as our original body weight variable.
7) What is the mean of “PredWeight”?
8) What is the mean of the original body weight variable?
9) Why are 7 and 8 identical?
Now, create a new variable called “WeightResid”. This variable should be the residuals for our line of
best fit. To do this, simply create the variable using this equation:
WeightResid = Weight_kg – PredWeight
10) What is the mean of WeightResid?
11) What is the standard deviation of WeightResid?
12) What is the technical term for the standard deviation of our residuals?
Note that this hand-calculated SEE may not match your answer on #6 perfectly, but it will be very close.
SPSS uses a slightly different calculation (it divides the sum of squared residuals by n − 2 rather than
n − 1), but for all practical purposes these numbers are identical.
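The PredWeight and WeightResid steps can be sketched in Python as follows. This is only an illustration on made-up values (not the ActivityMonitors data); the variable names mirror the ones used in this activity, and the two denominators shown (n − 1 for an ordinary standard deviation, n − 2 for the regression SEE) are the reason a hand calculation and the regression output can differ slightly.

```python
import math
import statistics

# Hypothetical example values, NOT taken from the ActivityMonitors file.
height_cm = [160.0, 165.0, 170.0, 175.0, 180.0, 185.0]
weight_kg = [55.0, 62.0, 66.0, 74.0, 79.0, 90.0]

n = len(height_cm)
mean_x = statistics.mean(height_cm)
mean_y = statistics.mean(weight_kg)

# Least-squares slope and intercept for weight predicted from height.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(height_cm, weight_kg))
         / sum((x - mean_x) ** 2 for x in height_cm))
intercept = mean_y - slope * mean_x

# Predicted values and residuals (observed minus predicted).
pred_weight = [intercept + slope * x for x in height_cm]
weight_resid = [y - p for y, p in zip(weight_kg, pred_weight)]

# Ordinary sample SD of the residuals (divides by n - 1).
sd_resid = statistics.stdev(weight_resid)
# Standard error of the estimate (divides by n - 2).
see = math.sqrt(sum(e ** 2 for e in weight_resid) / (n - 2))

print(f"mean observed  = {statistics.mean(weight_kg):.3f}")
print(f"mean predicted = {statistics.mean(pred_weight):.3f}")
print(f"mean residual  = {statistics.mean(weight_resid):.3f}")
print(f"SD of residuals (n - 1) = {sd_resid:.3f}")
print(f"SEE (n - 2)             = {see:.3f}")
```

With only six made-up subjects the gap between the two values is noticeable; with roughly 60 subjects, as in the class file, it shrinks to almost nothing.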
C. Estimating Calories
Now, let’s re-do this process for calories. This is a much more important model. The pedometer
costs $25, the armband costs around $300. So, if we can accurately predict the armband calories from
just a simple pedometer, then we may be able to save some money. The armband has been tested
several times in research papers and it is one of the most accurate methods of determining daily activity
energy expenditure (calories burned). Both monitors are estimating calories burned from physical
activity – not total daily energy expenditure, so expect your numbers to be lower.
13) Pedometer Mean Calories (SD):
14) Armband Mean Calories (SD):
Remember, in this case, the armband is the ‘real’ value (it’s already been shown to be accurate).
15) Does the pedometer accurately estimate calories burned? How much error is there? Does the
pedometer over- or under-estimate calories?
16) What is the average difference in calories between the armband and the pedometer (note:
any difference here is error)?
17) What is the standard deviation of the error between the two monitors (hint: you’ll need to create a
new variable)?
18) Fill in the blanks: 68% of the error (between the armband and the pedometer calories) falls between
______________ calories and _____________ calories.
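The error variable and the 68% range can be sketched like this. The calorie values are invented for illustration (not the ActivityMonitors data), the name cal_error is hypothetical, and two assumptions are built in: error is defined as armband minus pedometer, and the errors are treated as roughly normal so that about 68% fall within one standard deviation of the mean.

```python
import statistics

# Hypothetical example values, NOT taken from the ActivityMonitors file.
ped_cal = [300.0, 420.0, 250.0, 510.0, 380.0, 460.0]
arm_cal = [450.0, 520.0, 400.0, 700.0, 470.0, 640.0]

# Assumed sign convention: armband minus pedometer.
# A positive value means the pedometer under-estimated.
cal_error = [a - p for a, p in zip(arm_cal, ped_cal)]

mean_error = statistics.mean(cal_error)
sd_error = statistics.stdev(cal_error)  # sample SD (n - 1)

# Roughly 68% of approximately normal errors lie within mean +/- 1 SD.
lower = mean_error - sd_error
upper = mean_error + sd_error

print(f"mean error = {mean_error:.1f} calories")
print(f"SD of error = {sd_error:.1f} calories")
print(f"about 68% of errors between {lower:.1f} and {upper:.1f} calories")
```

In SPSS the equivalent is a computed variable followed by a descriptives run on that variable.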
Let’s see if we can ‘correct’ this error using a regression equation.
Run Pearson correlations between pedometer calories and the armband calories.
19) What percentage of the variance in armband calories is accounted for by the pedometer calories?
Now, use simple linear regression to predict armband calories from pedometer calories. Go to ‘Analyze’
> ‘Regression’ > ‘Linear’.
20) Is pedometer calories a statistically significant predictor of armband calories?
21) Write out our prediction equation (BOLD the slope and underline the y-intercept):
22) Interpret the slope of our equation. A one calorie increase on the pedometer equals…
23) What is the standard error of the estimate?
24) How does your answer on #23 compare to your answer on #18? Would using this new prediction
equation make the pedometer more accurate?
Your answer on #24 should serve as a reminder that a low p-value and/or a large r² does not mean
your regression model is ‘perfect’.
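One way to set the two kinds of error side by side is sketched below: the spread of the raw armband-minus-pedometer differences next to the standard error of the estimate from regressing armband calories on pedometer calories. The numbers are invented (not the ActivityMonitors data) and the sign convention for the raw error is an assumption, so the printed values say nothing about what you will find in the class file.

```python
import math
import statistics

# Hypothetical example values, NOT taken from the ActivityMonitors file.
ped_cal = [300.0, 420.0, 250.0, 510.0, 380.0, 460.0]
arm_cal = [450.0, 520.0, 400.0, 700.0, 470.0, 640.0]
n = len(ped_cal)

# Raw error of the pedometer, treating the armband as the criterion value.
raw_error = [a - p for a, p in zip(arm_cal, ped_cal)]
sd_error = statistics.stdev(raw_error)

# Least-squares line predicting armband calories from pedometer calories.
mean_x = statistics.mean(ped_cal)
mean_y = statistics.mean(arm_cal)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(ped_cal, arm_cal))
         / sum((x - mean_x) ** 2 for x in ped_cal))
intercept = mean_y - slope * mean_x

# Residuals after applying the prediction equation, and the SEE (n - 2).
resid = [y - (intercept + slope * x) for x, y in zip(ped_cal, arm_cal)]
see = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))

print(f"mean raw error      = {statistics.mean(raw_error):.1f} calories")
print(f"SD of raw error     = {sd_error:.1f} calories")
print(f"mean residual       = {statistics.mean(resid):.1f} calories")
print(f"SEE from regression = {see:.1f} calories")
```

The regression removes the average bias (the residuals average zero), but the SEE shows how much scatter around the line is still left for any one person.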
Save this Word file, and later in the week let me know you have completed it before class next
Monday night.