Download Slide 10

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Time series wikipedia , lookup

Transcript
A least squares regression line was fitted to the weights (in
pounds) versus age (in months) of a group of many
young children. The equation of the line is
yˆ  16.6  0.65t
Where y is the predicted weight and t is the age of the child.
A 20-month-old child in this group has an actual weight
of 25 pounds. Which of the following is the residual
weight, in pounds, for this child?
a.
-7.85
b.
-4.60
c.
4.60
d.
5.00
e.
7.85
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 1
A least squares regression line was fitted to the weights (in
pounds) versus age (in months) of a group of many
young children. The equation of the line is
yˆ  16.6  0.65t
Where y is the predicted weight and t is the age of the child. A
20-month-old child in this group has an actual weight of
25 pounds. Which of the following is the residual weight,
in pounds, for this child?
a.
-7.85
b.
-4.60
c.
4.60
d.
5.00
e.
7.85
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 2
Chapter 4
Re-expressing Data:
Get it Straight!
Copyright © 2010 Pearson Education, Inc.
Straight to the Point



We cannot use a linear model unless the relationship
between the two variables is linear. If the data is not
linear, you may need to re-expression the data,
straightening bent relationships so that we can fit and use
a simple linear model.
Simple ways to re-express data are with logarithms,
square roots and reciprocals.
Re-expressions can be seen in everyday life:
 The Richter scale of earthquake strength is expressed
in logs.
 Decibel scale for sound intensity is measured in logs.
 Gauges of shotguns are measured in square roots.
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 4
Straight to the Point (cont.)

The relationship between fuel efficiency (in miles
per gallon) and weight (in pounds) for late model
cars looks fairly linear at first:
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 5
Straight to the Point (cont.)

A look at the residuals plot shows a problem:
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 6
Straight to the Point (cont.)

We can re-express fuel efficiency as gallons per
hundred miles (a reciprocal) and eliminate the
bend in the original scatterplot:
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 7
Straight to the Point (cont.)

A look at the residuals plot for the new model
seems more reasonable:
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 8
The Goals of Re-expression are:

Goal 1: Make the distribution of a variable (as
seen in its histogram, for example) more
symmetric.
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 9
Goals of Re-expression (cont.)

Goal 2: Make the spread of several groups (as
seen in side-by-side boxplots) more alike, even if
their centers differ.
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 10
Goals of Re-expression (cont.)

Goal 3: Make the form of a scatterplot more
nearly linear.
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 11
Goals of Re-expression (cont.)

Goal 4: Make the scatter in a scatterplot spread
out evenly rather than thickening at one end.
 This can be seen in the two scatterplots we
just saw with Goal 3:
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 12
The Ladder of Powers


There is a family of simple re-expressions that
move data toward our goals in a consistent way.
This collection of re-expressions is called the
Ladder of Powers.
The Ladder of Powers orders the effects that the
re-expressions have on data.
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 13
Monotonic Functions


Increasing Preserves order
 A>B then f(a)>f(b)
Decreasing  Reverses order
 A>B then f(a)<f(b)
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 14
The Ladder of Powers
Power Name
2
1
½
“0”
–1/2
–1
Comment
Square of
data values
Try with unimodal distributions that are
skewed to the left.
Data with positive and negative values
Raw data
and no bounds are less likely to benefit
from re-expression. (Linear)
Square root of Counts often benefit from a square root
data values
re-expression.
We’ll use
Measurements that cannot be negative
logarithms here often benefit from a log re-expression.
Reciprocal
An uncommon re-expression, but
square root
sometimes useful.
The reciprocal Ratios of two quantities (e.g., mph) often
of the data
benefit from a reciprocal.
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 15
Plan B: Attack of the Logarithms



When none of the data values is zero or negative,
logarithms can be a helpful ally in the search for a
useful model.
Try taking the logs of both the x- and y-variable.
Then re-express the data using some
combination of x or log(x) vs. y or log(y).
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 16
Log Properties


Logb(mn)= Logb(m)+ Logb(n)
Logb(m/n)= Logb(m)- Logb(n)
log b m  p log b m
p
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 17
Plan B: Attack of the Logarithms (cont.)
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 18
Multiple Benefits


We often choose a re-expression for one reason
and then discover that it has helped other
aspects of an analysis.
For example, a re-expression that makes a
histogram more symmetric might also straighten
a scatterplot or stabilize variance.
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 19
Why Not Just Use a Curve?

If there’s a curve in the scatterplot, why not just fit
a curve to the data?
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 20
Why Not Just Use a Curve? (cont.)


The mathematics and calculations for “curves of
best fit” are considerably more difficult than “lines
of best fit.”
Besides, straight lines are easy to understand.
 We know how to think about the slope and the
y-intercept.
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 21
Linear vs. Exponential Growth


Linear Growth
 Increases by fixed amount to each period
Exponential Growth
 Increases by fixed percentage for each period
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 22

Example: For each of the
models listed below,
predict y when x = 3:
a. ln yˆ  1.2  .9 x
b. yˆ  1.2  0.9 x
c. 1
yˆ
 1.2  .9 x
d. yˆ  1.2  .9 ln x
e. log yˆ  1.2  0.9 log x
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 23
TI – Tips Re-expressing data




Check the scatterplot of Yr and TUIT. Then check the
residual plot. Don’t forget to calculate the regression line
so it fills the RESID list. Remember RESID is your “y”
when plotting the residual plot.
You decide because of the curve you need to do some reexpressing. But you don’t know which one to do. Let’s
start by using the square root. Fill L1 with the square root
of TUIT. Look at the scatterplot and residual plot
(recalculate the regression line to fill the RESID list.)
Still not good enough … change fill L1 with the log of
TUIT. Scatterplot and residual plot again. Better?!
Now, write the model of the re-expressed data. (it should
have a log in it!)
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 24
What Can Go Wrong?



Don’t expect your
model to be perfect.
Don’t stray too far
from the ladder.
Don’t choose a model
based on R2 alone:
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 25
What Can Go Wrong? (cont.)

Beware of multiple modes.


Re-expression cannot pull separate modes together.
Watch out for scatterplots that turn around.

Re-expression can straighten many bent
relationships, but not those that go up then down, or
down then up.
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 26
What Can Go Wrong? (cont.)

Watch out for negative data values.
 It’s impossible to re-express negative values
by any power that is not a whole number on
the Ladder of Powers or to re-express values
that are zero for negative powers.
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 27
Homework: 1-17 odd (skip 3)
Copyright © 2010 Pearson Education, Inc.
Slide 10 - 28