Chapter 10: Re-Expressing Data:
Get it Straight
AP Statistics
Weight vs. Fuel Efficiency
Describe the relationship.
How accurate is the
model? The R-squared
value is 81.6%
Is the model appropriate
for the data? Look at
the residual plot.
Residual Plot
Is the model appropriate
for the data?
Look at the pattern in the
residual plot. This shows
that a linear model is not
appropriate!!
What if a linear model is not
appropriate?
We need to re-express the data so that the relationship is “linear”.
Then we can proceed like normal, with R-squared and prediction (and more).
In this situation, we will take the reciprocal of
the fuel efficiency (so, instead of mpg it will be
gallons per mile [1/y])
Re-expressed data, with residual plot
Note: Our residual plot is of the transformed
variable.
Why re-express?
• There will be many reasons, but in this
example, suppose you want to predict the gas
mileage of a Hummer (about 6400 pounds). If
we use the non-re-expressed data, it would
say that the gas mileage would be about 0. In
the re-expressed data, it would say about 10.3
mpg (after “undoing” the re-expression).
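As a rough check on what “undoing” the re-expression means here: the re-expressed model predicts gallons per mile, so its prediction is converted back to mpg by taking the reciprocal again. The 0.097 below is not read from the fitted model; it is back-computed from the 10.3 mpg quoted above, purely for illustration.

```latex
\[
\widehat{\left(\frac{1}{y}\right)} \approx 0.097\ \text{gal/mi}
\quad\Longrightarrow\quad
\hat{y} = \frac{1}{0.097} \approx 10.3\ \text{mpg}
\]
```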
Why do we Re-Express?
1. Make the distribution
symmetric. It is easier
to summarize the data
(esp. the center) and it
also makes it possible to
use mean and standard
deviation, which allows
us to use a normal
curve to predict.
Why do we Re-Express?
2. Make the spread of
several groups more
alike. Groups that
share a common spread
are easier to compare.
Some methods can be used only when
the groups share a common SD.
Why do we Re-Express?
3. Make the form of a
scatterplot more nearly
linear. This allows us to
describe the relationship
more easily; it allows us to use
a linear model and all
that goes with it.
4. Make the scatter in a
scatterplot spread out
evenly, rather than
following a fan shape.
This will be a
requirement later on in
the course (related to #2).
What re-expression works for
Scatterplots?
Example
Look at the shape of the distribution. What re-expression should we use?
Example
Predict the length of a flight in
which the plane is traveling 480
mph.
Other hints for finding the proper re-expression
Logarithms can be very useful in re-expressing
data to achieve linearity. However, the data
need to have values greater than zero.
When you look at the scatterplot, you may
recognize a pattern from prior courses.
The chart on the next page will help determine
which re-expression to use when you
recognize the graph.
Type of Model    Model Equation            Transformation        Re-Expression Equation
Exponential      $\hat{y} = ab^{x}$        $(x, \log y)$         $\log \hat{y} = a + bx$
Logarithmic      $\hat{y} = a + b \ln x$   $(\log x, y)$         $\hat{y} = a + b \log x$
Power            $\hat{y} = ax^{b}$        $(\log x, \log y)$    $\log \hat{y} = a + b \log x$
Logarithmic Function
Exponential Function
Power Function
Why Not Just a Curve?
• Straight lines are easy to understand.
• We understand and can interpret the slope
and y-intercept
• We may want some of the other benefits from
re-expressing data, such as symmetry or more
equal spreads
• A linear model is very important when we learn about
Inferences for Regression
Be Careful
• Don’t expect the re-expressed model to be perfect
• Don’t choose a model based on R-squared value alone—look at
residual plot!!!
• Multiple modes will not disappear when re-expressed
• Don’t try to re-express data that is like a rollercoaster
• If there are negative data values, add a small constant to make all the data greater
than zero, then re-express (can’t take the log of zero or a negative number)
• If data values are far from one, the re-expression will have a smaller
effect than if the data values are closer to one. Subtract a constant
to get closer to one; for years, use years away from a constant.
Instead of 1950, use the idea of “years since 1949” and use 1.
• SIMPLICITY!!!!!
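The advice about values far from one can be checked with a quick calculation. The sketch below is only an illustration: `log_gap` is a hypothetical helper, and the pair of years 1950 and 1951 is chosen to match the “years since 1949” idea in the list above. It compares how far apart two consecutive years land after taking base-10 logs, before and after subtracting 1949.

```python
import math

def log_gap(first, second):
    """Difference between the base-10 logs of two positive values."""
    return math.log10(second) - math.log10(first)

# Calendar years are far from one, so the log barely separates them.
raw_gap = log_gap(1950, 1951)

# "Years since 1949" turns the same two years into 1 and 2.
shifted_gap = log_gap(1950 - 1949, 1951 - 1949)

print("gap in log10 for 1950 vs 1951:", raw_gap)
print("gap in log10 for 1 vs 2:", shifted_gap)
```

The second gap is far larger than the first, which is why the re-expression has more room to bend the plot once the values sit close to one.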
The data below show the results of an experiment that attempted to
find the relationship between the amount of time a cup of coffee is left out and
the temperature of that cup of coffee.
Time (min)    Temp (F)
19            133
22            122
24            121
27            114
30            110
33.5          108
37            102
a. Create a scatterplot of the data and
describe the relationship.
b. Create an appropriate model for this data.
Check its appropriateness.
c. How accurate is the model in predicting
the temperature of the cup of coffee? Give
evidence.
d. Predict the temperature of a cup of coffee
that has been sitting out for 35 minutes.
Show your work.
e. Did the model underestimate or
overestimate the temperature of a cup of
coffee that has been sitting out for 30 min?
Show all work.
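One possible way to check work on parts b through e is sketched below in Python. It rests on several assumptions: each time in the table is paired with the temperature listed next to it, a straight line is fit to the raw data first, and log10 of temperature is tried as one candidate re-expression. The exercise does not say which model is intended, and the helper names (`fit_line`, `r_squared`, `predict_log_model`) are made up for this sketch.

```python
import math

# Data from the exercise: time in minutes, temperature in degrees F.
times = [19, 22, 24, 27, 30, 33.5, 37]
temps = [133, 122, 121, 114, 110, 108, 102]

def fit_line(xs, ys):
    """Least-squares (intercept, slope) for the line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(xs, ys):
    """Squared correlation between xs and ys (R-squared for a one-predictor line)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# Straight-line model on the raw data, with its residuals (actual - predicted).
a, b = fit_line(times, temps)
residuals = [y - (a + b * x) for x, y in zip(times, temps)]

# Candidate re-expression (an assumption): regress log10(temp) on time.
log_temps = [math.log10(t) for t in temps]
log_a, log_b = fit_line(times, log_temps)

def predict_log_model(minutes):
    """Predict temperature by undoing the log: 10 ** (predicted log10 temp)."""
    return 10 ** (log_a + log_b * minutes)

pred_35_linear = a + b * 35
pred_35_log = predict_log_model(35)
resid_30_linear = 110 - (a + b * 30)  # a negative residual means an overestimate

print("linear fit: intercept", a, "slope", b)
print("R-squared, raw:", r_squared(times, temps))
print("R-squared, log10(temp):", r_squared(times, log_temps))
print("residuals, raw:", residuals)
print("prediction at 35 min:", pred_35_linear, "(linear)", pred_35_log, "(log model)")
print("residual at 30 min, linear:", resid_30_linear)
```

Reading the printed residuals in time order is the numeric stand-in for the residual plot: a run of same-signed residuals in the middle with opposite signs at the ends suggests curvature, which is the cue to try a re-expression and compare its residuals in the same way.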