Chapter 12:
Analyzing the Association
Between Quantitative
Variables: Regression Analysis
Section 12.1
Model How Two Variables Are Related
Copyright © 2013, 2009, and 2007, Pearson Education, Inc.
Regression Analysis
The first step of a regression analysis is to identify the
response and explanatory variables.
We use y to denote the response variable.
We use x to denote the explanatory variable.
The Scatterplot
The first step in answering the question of association
is to look at the data.
A scatterplot is a graphical display of the relationship
between the response variable (y) and the explanatory
variable (x).
Example: The Strength Study
An experiment was designed to measure the strength
of female athletes.
The goal of the experiment was to determine if there is
an association between the maximum number of pounds
that each individual athlete could bench press and the
number of 60-pound bench presses that athlete could do.
Example: The Strength Study
57 high school female athletes participated in the study.
The data consisted of the following variables:
• x: the number of 60-pound bench presses an athlete could do.
• y: maximum bench press.
Example: The Strength Study
For the 57 females in this study, these variables are
summarized by:
• x: mean = 11.0, st. dev. = 7.1
• y: mean = 79.9 lbs, st. dev. = 13.3 lbs
Example: The Strength Study
Figure 12.1 Scatterplot for y=Maximum Bench Press and x=Number of 60-lb. Bench
Presses.
The Regression Line Equation
When the scatterplot shows a linear trend, a straight line
can be fitted through the data points to describe that trend.
The regression line is: $\hat{y} = a + bx$
$\hat{y}$ is the predicted value of the response variable y,
a is the y-intercept and b is the slope.
Example: Regression Line Predicting
Maximum Bench Press
Table 12.1 MINITAB Printout for Regression Analysis of y=Maximum Bench Press
(BP) and x =Number of 60-Pound Bench Presses (BP_60).
TI-83+/84 output
Example: Regression Line Predicting
Maximum Bench Press
The MINITAB output shows the following regression
equation:
BP = 63.5 + 1.49 (BP_60)
The y-intercept is 63.5 and the slope is 1.49.
The slope of 1.49 tells us that predicted maximum bench
press increases by about 1.5 pounds for every additional
60-pound bench press an athlete can do.
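As a small illustration, the prediction equation quoted from the MINITAB output can be evaluated directly. The function name and the x values below are hypothetical, chosen only to show how the intercept and slope are used.

```python
# Prediction equation quoted in the slides: BP = 63.5 + 1.49 (BP_60)
INTERCEPT = 63.5  # a, the y-intercept
SLOPE = 1.49      # b, the slope


def predict_max_bench_press(bp_60: float) -> float:
    """Predicted maximum bench press (pounds) for a given number of 60-pound presses."""
    return INTERCEPT + SLOPE * bp_60


# Hypothetical x values, used only for illustration.
for presses in (0, 10, 11, 20):
    print(presses, round(predict_max_bench_press(presses), 2))

# One additional 60-pound press changes the prediction by exactly the slope.
difference = predict_max_bench_press(11) - predict_max_bench_press(10)
print(round(difference, 2))
```

At x = 11, which is the sample mean of x, the prediction is about 79.9 pounds, the sample mean of y; a least squares line always passes through the point of means.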
Outliers
Check for outliers by plotting the data.
The regression line can be pulled toward an outlier and
away from the general trend of points.
Influential Points
An observation can be influential in affecting the
regression line when one or more of two things happen:
• Its x value is low or high compared to the rest of the data.
• It does not fall in the straight-line pattern that the rest of the data have.
Residuals are Prediction Errors
The regression equation is often called a prediction
equation.
The difference $y - \hat{y}$ between an observed outcome and
its predicted value is the prediction error, called a
residual.
SUMMARY: Review of Residuals
Each observation has a residual.
A residual is the vertical distance between the data point
and the regression line. The smaller the distance, the
better the prediction.
SUMMARY: Review of Residuals
We can summarize how near the regression line the data
points fall by
sum of squared residuals $= \sum (\text{residuals})^2 = \sum (y - \hat{y})^2$
The regression line has the smallest sum of squared
residuals and is called the least squares line.
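The least squares property can be checked numerically. The sketch below uses a small hypothetical data set (not the strength study data) and the standard closed-form formulas for the least squares slope and intercept; it then compares the sum of squared residuals with that of two other lines.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar  # the line passes through (x_bar, y_bar)
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Sum of (y - y_hat)^2 for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
# Any other line, such as a shifted or tilted one, gives a larger sum.
shifted = sum_squared_residuals(xs, ys, a + 0.5, b)
tilted = sum_squared_residuals(xs, ys, a, b + 0.1)
print(round(a, 2), round(b, 2), round(best, 2), round(shifted, 2), round(tilted, 2))
```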
Regression Model: A Line Describes How
the Mean of y Depends on x
At a given value of x, the equation $\hat{y} = a + bx$:
• Predicts a single value of the response variable.
• But we should not expect all subjects at that value of x to have the same value of y, because variability occurs in the y values.
SUMMARY: The Regression Line
The regression line connects the estimated means of y
at the various x values.
In summary, $\hat{y} = a + bx$ describes the relationship between x and the
estimated means of y at the various values of x.
The Population Regression Equation
The population regression equation describes the
relationship in the population between x and the means
of y.
The equation is denoted by: $\mu_y = \alpha + \beta x$
The Population Regression Equation
In the population regression equation, $\alpha$ is a population
y-intercept and $\beta$ is a population slope.
• These are parameters, so in practice their values are unknown.
In practice we estimate the population regression
equation using the prediction equation for the sample
data.
The Population Regression Equation
The population regression equation merely
approximates the actual relationship between x and the
population means of y.
It is a model.
 A model is a simple approximation for how
variables relate in the population.
The Regression Model
Figure 12.2 The Regression Model $\mu_y = \alpha + \beta x$ for the Means of y Is a Simple
Approximation for the True Relationship. Question: Can you sketch a true relationship
for which this model is a very poor approximation?
The Regression Model
If the true relationship is far from a straight line, this
regression model may be a poor one.
Figure 12.3 The Straight-Line Regression Model Provides a Poor Approximation
When the Actual Relationship Is Highly Nonlinear. Question: What type of
mathematical function might you consider using for a regression model in this case?
Variability about the Line
At each fixed value of x, variability occurs in the y values
around their mean, $\mu_y$.
The probability distribution of y values at a fixed value of x is
a conditional distribution.
At each value of x, there is a conditional distribution of y
values.
An additional parameter $\sigma$ describes the standard deviation
of each conditional distribution.
A Statistical Model
A statistical model never holds exactly in practice.
It is merely an approximation for reality.
Even though it does not describe reality exactly, a model
is useful if the true relationship is close to what the model
predicts.
Chapter 12
Analyzing the Association
Between Quantitative
Variables: Regression Analysis
Section 12.2
Describe Strength of Association
SUMMARY: Properties of the Correlation, r
The correlation, denoted by r, describes linear association.
The correlation ‘r’ has the same sign as the slope ‘b’.
The correlation ‘r’ always falls between -1 and +1.
The larger the absolute value of r, the stronger the linear
association.
Correlation and Slope
We can’t use the slope to describe the strength of the
association between two variables because the slope’s
numerical value depends on the units of measurement.
The correlation is a standardized version of the slope.
The correlation does not depend on units of measurement.
Correlation and Slope
The correlation and the slope are related in the following
way:
$r = b\left(\dfrac{s_x}{s_y}\right)$
Example: Predicting Strength
For the female athlete strength study:
• x: number of 60-pound bench presses
• y: maximum bench press
• x: mean = 11.0, st. dev. = 7.1
• y: mean = 79.9 lbs., st. dev. = 13.3 lbs.
Regression equation: $\hat{y} = 63.5 + 1.49x$
Example: Predicting Strength
$r = b\left(\dfrac{s_x}{s_y}\right) = 1.49\left(\dfrac{7.1}{13.3}\right) = 0.80$
The variables have a strong, positive association.
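The arithmetic behind this correlation can be reproduced from the summary statistics reported for the strength study. This is only a check of the relation between the slope and the correlation; the variable names are illustrative.

```python
# Summary values reported in the slides for the strength study.
b = 1.49     # slope of the regression line
s_x = 7.1    # standard deviation of x (number of 60-pound bench presses)
s_y = 13.3   # standard deviation of y (maximum bench press, pounds)

# The correlation is the slope rescaled by the ratio of standard deviations.
r = b * (s_x / s_y)
print(round(r, 2))
```

Because the slope is multiplied by a ratio of standard deviations, the units of x and y cancel, which is why r does not depend on the units of measurement.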

The Squared Correlation
Another way to describe the strength of association
refers to how close predictions for y tend to be to
observed y values.
The variables are strongly associated if you can predict y
much better by substituting x values into the prediction
equation than by merely using the sample mean $\bar{y}$ and
ignoring x.
The Squared Correlation
Consider the prediction error: the difference between the
observed and predicted values of y.
• Using the regression line to make a prediction, each error is: $y - \hat{y}$.
• Using only the sample mean, $\bar{y}$, to make a prediction, each error is: $y - \bar{y}$.
The Squared Correlation
When we predict y using $\bar{y}$ (that is, ignoring x), the error
summary equals:
$\sum (y - \bar{y})^2$
This is called the total sum of squares.
The Squared Correlation
When we predict y using x with the regression equation,
the error summary is:
$\sum (y - \hat{y})^2$
This is called the residual sum of squares.
The Squared Correlation
When a strong linear association exists, the
regression equation predictions tend to be much better
than the predictions using $\bar{y}$.
We measure the proportional reduction in error and
call it $r^2$.
The Squared Correlation
$r^2 = \dfrac{\sum (y - \bar{y})^2 - \sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}$
We use the notation $r^2$ for this measure because it
equals the square of the correlation r.
Example: Strength Study
For the female athlete strength study:
• x: number of 60-pound bench presses
• y: maximum bench press
• The correlation value was found to be $r = 0.80$
We can calculate $r^2$ from r: $(0.80)^2 = 0.64$
For predicting maximum bench press, the regression
equation has 64% less error than $\bar{y}$ has.
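The claim that the proportional reduction in error equals the square of the correlation can be checked on any data set. The sketch below uses a small hypothetical data set (the raw strength study data are not given in these slides) and computes both quantities independently.

```python
import math

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least squares line y_hat = a + b*x.
b = sxy / sxx
a = y_bar - b * x_bar

total_ss = syy  # error when predicting every y with the sample mean
residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Proportional reduction in error.
r_squared = (total_ss - residual_ss) / total_ss

# Correlation computed directly, without using the regression line.
r = sxy / math.sqrt(sxx * syy)

print(round(r_squared, 4), round(r ** 2, 4))
```

For the strength study itself, only r = 0.80 is needed: squaring it gives 0.64.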
The Squared Correlation
SUMMARY: Properties of $r^2$:
• $r^2$ falls between 0 and 1.
• $r^2 = 1$ when $\sum (y - \hat{y})^2 = 0$; this happens only when all the data points fall exactly on the regression line.
• $r^2 = 0$ when $\sum (y - \hat{y})^2 = \sum (y - \bar{y})^2$; this happens when the slope $b = 0$, in which case each $\hat{y} = \bar{y}$.
• The closer $r^2$ is to 1, the stronger the linear association: the more effective the regression equation is compared to $\bar{y}$ in predicting y.
SUMMARY: Correlation r and Its Square $r^2$
Both r and $r^2$ describe the strength of association.
‘r’ falls between -1 and +1.
• It represents the slope of the regression line when x and y have equal standard deviations.
‘$r^2$’ falls between 0 and 1.
• It summarizes the reduction in sum of squared errors in predicting y using the regression line instead of using $\bar{y}$.
Chapter 12
Analyzing the Association
Between Quantitative
Variables: Regression Analysis
Section 12.3
Make Inferences About the Association
Descriptive and Inferential Parts of Regression
The sample regression equation, r, and $r^2$ are descriptive
parts of a regression analysis.
The inferential parts of regression use the tools of
confidence intervals and significance tests to provide
inference about the regression equation, the correlation
and r-squared in the population of interest.
Assumptions for Regression Analysis
SUMMARY: Basic assumption for using regression line
for description:
• The population means of y at different values of x have a straight-line relationship with x, that is: $\mu_y = \alpha + \beta x$
• This assumption states that a straight-line regression model is valid.
• This can be checked with a scatterplot.
Assumptions for Regression Analysis
SUMMARY: Extra assumptions for using regression to
make statistical inference:
• The data were gathered using randomization.
• The population values of y at each value of x follow a normal distribution, with the same standard deviation at each x value.
Assumptions for Regression Analysis
Models, such as the regression model, merely
approximate the true relationship between the variables.
A relationship will not be exactly linear, with exactly
normal distributions for y at each x and with exactly the
same standard deviation of y values at each x value.
Testing Independence between
Quantitative Variables
Suppose that the slope $\beta$ of the regression line equals 0.
Then:
• The mean of y is identical at each x value.
• The two variables, x and y, are statistically independent:
  • The outcome for y does not depend on the value of x.
  • It does not help us to know the value of x if we want to predict the value of y.
Testing Independence between
Quantitative Variables
Figure 12.8 Quantitative Variables x and y Are Statistically Independent When the True
Slope $\beta = 0$. Each normal curve shown here represents the variability in y values at a
particular value of x. When $\beta = 0$, the normal distribution of y is the same at each value of x.
Question: How can you express the null hypothesis of independence between x and y in
terms of a parameter from the regression model?
Testing Independence between
Quantitative Variables
SUMMARY: Steps of Two-Sided Significance Test about
a Population Slope $\beta$:
1. Assumptions:
• The population satisfies the regression line: $\mu_y = \alpha + \beta x$
• Data obtained using randomization
• The population values of y at each value of x follow a normal distribution, with the same standard deviation at each x value.
Testing Independence between
Quantitative Variables
SUMMARY: Steps of Two-Sided Significance Test about
a Population Slope $\beta$:
2. Hypotheses: $H_0: \beta = 0$, $H_a: \beta \neq 0$
3. Test statistic: $t = \dfrac{b - 0}{se}$
• Software supplies the sample slope b and its se.
Testing Independence between
Quantitative Variables
SUMMARY: Steps of Two-Sided Significance Test about
a Population Slope $\beta$:
4. P-value: Two-tail probability of t test statistic value
more extreme than observed:
Use t distribution with $df = n - 2$
5. Conclusions: Interpret P-value in context. If decision
needed, reject $H_0$ if P-value $\le$ significance level.
Example: 60-Pound Strength and Bench
Presses
Table 12.4 MINITAB Printout for Regression Analysis of y=Maximum Bench Press
(BP) and x=Number of 60-Pound Bench Presses (BP_60).
Example: 60-Pound Strength and Bench
Presses
Conduct a two-sided significance test of the null hypothesis of
independence.
1. Assumptions:
• A scatterplot of the data revealed a linear trend so the straight-line regression model seems appropriate.
• The scatter of points has a similar spread at different x values.
• The sample was a convenience sample, not a random sample, so this is a concern.
Example: 60-Pound Strength and Bench
Presses
2. Hypotheses: $H_0: \beta = 0$, $H_a: \beta \neq 0$
3. Test statistic: $t = \dfrac{b - 0}{se} = \dfrac{(1.49 - 0)}{0.150} = 9.96$
4. P-value: 0.000
5. Conclusion: An association exists between the number of
60-pound bench presses and maximum bench press.
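The test statistic and degrees of freedom can be recomputed from the rounded values shown on the slide. Note that dividing the rounded slope by the rounded standard error gives about 9.93 rather than the 9.96 reported; the likely explanation, stated here as an assumption, is that the software uses unrounded values.

```python
# Values quoted in the slides (rounded as printed).
b = 1.49    # sample slope
se = 0.150  # standard error of the slope
n = 57      # number of athletes in the study

# Test statistic for H0: beta = 0.
t = (b - 0) / se

# Degrees of freedom for the t distribution.
df = n - 2

print(round(t, 2), df)
```

With df = 55, a t statistic near 10 lies far in the tail of the t distribution, which is consistent with the reported P-value of 0.000.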
A Confidence Interval for $\beta$
A small P-value in the significance test of $H_0: \beta = 0$
suggests that the population regression line has a
nonzero slope.
To learn how far the slope $\beta$ falls from 0, we construct a
confidence interval:
$b \pm t_{.025}(se)$ with $df = n - 2$
Example: Estimating the Slope for
Predicting Maximum Bench Press
Construct a 95% confidence interval for $\beta$.
$1.49 \pm 2.00(0.150)$, which is
$1.49 \pm 0.30$, or (1.2, 1.8)
Based on a 95% CI, we can conclude, on average, the
maximum bench press increases by between 1.2 and 1.8
pounds for each additional 60-pound bench press that an
athlete can do.
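The interval arithmetic can be written out as follows, using the slope, standard error, and t-score quoted on the slide. The same endpoints, multiplied by 10, give the interval for the effect of a 10-unit increase in x.

```python
# Values quoted in the slides.
b = 1.49        # sample slope
se = 0.150      # standard error of the slope
t_score = 2.00  # t-score used in the slides for a 95% interval with df = 55

margin = t_score * se
ci = (b - margin, b + margin)

# Interval for the change in mean y over a 10-unit increase in x.
ci_10 = (10 * ci[0], 10 * ci[1])

print([round(v, 1) for v in ci])
print([round(v) for v in ci_10])
```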
Example: Estimating the Slope for
Predicting Maximum Bench Press
Let’s estimate the effect of a 10-unit increase in x:
• Since the 95% CI for $\beta$ is (1.2, 1.8), the 95% CI for $10\beta$ is (12, 18).
• On the average, we infer that the maximum bench press increases by at least 12 pounds and at most 18 pounds, for an increase of 10 in the number of 60-pound bench presses.
Chapter 12
Analyzing the Association
Between Quantitative
Variables: Regression Analysis
Section 12.4
How the Data Vary Around the Regression
Line
Residuals and Standardized Residuals
A residual is a prediction error – the difference between
an observed outcome and its predicted value.
 The magnitude of these residuals depends on the
units of measurement for y.
A standardized version of the residual does not depend
on the units.
Standardized Residuals
Standardized residual = $\dfrac{y - \hat{y}}{se(y - \hat{y})}$
The se formula is complex, so we rely on software to find it.
A standardized residual indicates how many standard
errors a residual falls from 0.
If the relationship is truly linear and the standardized
residuals have approximately a bell-shaped distribution,
observations with standardized residuals larger than 3 in
absolute value often represent outliers.
Example: Detecting an Underachieving
College Student
Data were collected on a sample of 59 students at the
University of Georgia.
Two of the variables were:
• CGPA: College Grade Point Average
• HSGPA: High School Grade Point Average
Example: Detecting an Underachieving
College Student
A regression equation was created from the data:
• x: HSGPA
• y: CGPA
Equation: $\hat{y} = 1.19 + 0.64x$
Example: Detecting an Underachieving
College Student
Table 12.6 Observations with Large Standardized Residuals in Student GPA
Regression Analysis, as Reported by MINITAB
Example: Detecting an Underachieving
College Student
Consider the reported standardized residual of -3.14.
• This indicates that the residual is 3.14 standard errors below 0.
• This student’s actual college GPA is quite far below what the regression line predicts.
Analyzing Large Standardized Residuals
Does it fall well away from the linear trend that the other
points follow?
Does it have too much influence on the results?
Note: Some large standardized residuals may occur just
because of ordinary random variability - even if the model
is perfect, we’d expect about 5% of the standardized
residuals to have absolute values > 2 by chance.
Histogram of Residuals
A histogram of residuals or standardized residuals is a
good way of detecting unusual observations.
A histogram is also a good way of checking the
assumption that the conditional distribution of y at each x
value is normal.
• Look for a bell-shaped histogram.
Histogram of Residuals
Suppose the histogram is not bell-shaped:
• The distribution of the residuals is not normal.
However:
• Two-sided inferences about the slope parameter still work quite well.
• The t-inferences are robust.
The Residual Standard Deviation
For statistical inference, the regression model assumes
that the conditional distribution of y at a fixed value of x is
normal, with the same standard deviation at each x.
This standard deviation, denoted by $\sigma$, refers to the
variability of y values for all subjects with the same x
value.
The Residual Standard Deviation
The estimate of $\sigma$, obtained from the data, is called
the residual standard deviation:
$s = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$
Example: Variability of the Athletes’
Strengths
From MINITAB output, we obtain s, the residual
standard deviation of y:
$s = \sqrt{\dfrac{3522.8}{55}} = 8.0$
For any given x value, we estimate the mean y value
using the regression equation and we estimate the
standard deviation using s = 8.0.
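The residual standard deviation can be recomputed from the two numbers quoted from the MINITAB output: the residual sum of squares (3522.8) and the degrees of freedom (n - 2 = 55). The variable names below are illustrative.

```python
import math

residual_ss = 3522.8  # sum of squared residuals, from the slides
n = 57                # number of athletes
df = n - 2            # degrees of freedom

# Residual standard deviation: square root of residual SS over df.
s = math.sqrt(residual_ss / df)
print(round(s, 1))
```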
Confidence Interval for $\mu_y$
We can estimate $\mu_y$, the population mean of y at a given
value of x by: $\hat{y} = a + bx$
We can construct a 95% confidence interval for $\mu_y$
using:
$\hat{y} \pm t_{.025}(\text{se of } \hat{y})$
where the t-score has $df = n - 2$
Prediction Interval for y
The estimate $\hat{y} = a + bx$ for the mean of y at a fixed value
of x is also a prediction for an individual outcome y at the
fixed value of x.
Most regression software will form this interval within
which an outcome y is likely to fall.
$\hat{y} \pm 2s$
where s is the residual standard deviation
Prediction Interval for y vs. Confidence
Interval for $\mu_y$
The prediction interval for y is an inference about where
individual observations fall.
• Use a prediction interval for y if you want to predict where a single observation on y will fall for a particular x value.
Prediction Interval for y vs. Confidence
Interval for $\mu_y$
The confidence interval for $\mu_y$ is an inference about
where a population mean falls.
• Use a confidence interval for $\mu_y$ if you want to estimate the mean of y for all individuals having a particular x value.
$\hat{y} \pm 2\dfrac{s}{\sqrt{n}}$
where s is the residual standard deviation
Prediction Interval for y vs. Confidence
Interval for $\mu_y$
Note that the prediction interval is wider than the
confidence interval - you can estimate a population mean
more precisely than you can predict a single observation.
Caution: In order for these intervals to be valid, the true
relationship must be close to linear with about the same
variability of y-values at each fixed x-value.
Maximum Bench Press and Estimating its
Mean
Table 12.7 MINITAB Output for Confidence Interval (CI) and Prediction Interval (PI) on
Maximum Bench Press for Athletes Who Do Eleven 60-Pound Bench Presses before
Fatigue.
Maximum Bench Press and Estimating its
Mean
Use the MINITAB output to find and interpret a 95% CI
for the population mean of the maximum bench press
values for all female high school athletes who can do
x = 11 sixty-pound bench presses.
For all female high school athletes who can do 11
sixty-pound bench presses, we estimate the mean of
their maximum bench press values falls between 78 and
82 pounds.
Maximum Bench Press and Estimating its
Mean
Use the MINITAB output to find and interpret a 95%
Prediction Interval for a single new observation on the
maximum bench press for a randomly chosen female
high school athlete who can do x = 11 sixty-pound bench
presses.
For all female high school athletes who can do 11
sixty-pound bench presses, we predict that 95% of them
have maximum bench press values between 64 and 96
pounds.
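Both intervals quoted for x = 11 can be approximated with the rough formulas from the earlier slides: the predicted value plus or minus 2s for an individual observation, and plus or minus 2s divided by the square root of n for the mean. This is a sketch using the rounded values from the slides; MINITAB's own intervals come from exact standard error formulas, so agreement is only approximate in general.

```python
import math

# Values quoted in the slides.
a, b = 63.5, 1.49  # intercept and slope of the prediction equation
s = 8.0            # residual standard deviation
n = 57             # sample size
x = 11             # number of 60-pound bench presses

y_hat = a + b * x

# Approximate 95% prediction interval for a single observation.
pi = (y_hat - 2 * s, y_hat + 2 * s)

# Approximate 95% confidence interval for the population mean of y at x.
margin = 2 * s / math.sqrt(n)
ci = (y_hat - margin, y_hat + margin)

print([round(v) for v in ci])
print([round(v) for v in pi])
```

The prediction interval is much wider because it has to cover the spread of individual athletes, not just the uncertainty in the estimated mean.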
Chapter 12
Analyzing the Association
Between Quantitative
Variables: Regression Analysis
Section 12.5
Exponential Regression: A Model for
Nonlinearity
Nonlinear Regression Models
If a scatterplot indicates substantial curvature in a
relationship, then equations that provide curvature are
needed.
• Occasionally a scatterplot has a parabolic appearance: as x increases, y increases and then goes back down.
• More often, y tends to continually increase or continually decrease but the trend shows curvature.
Example: Exponential Growth in
Population Size
Since 2000, the population of the U.S. has been growing
at a rate of 2% a year.
• The population size in 2010 was 309 million
• The population size in 2011 was $309 \times 1.02$
• The population size in 2012 was $309 \times (1.02)^2$
• …
• The population size in 2020 is estimated to be $309 \times (1.02)^{10}$
This is called exponential growth
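The pattern in this example, multiplying by the same factor each year, can be sketched as a short function. The function name and default arguments are illustrative; the base of 309 million and the 2% rate are the figures used in the slide.

```python
def population(years_after_2010: int, base: float = 309.0, growth: float = 1.02) -> float:
    """Population in millions after compounding `growth` once per year."""
    return base * growth ** years_after_2010


# 2010, 2011, 2012, and 2020 under the slide's 2% growth assumption.
for k in (0, 1, 2, 10):
    print(2010 + k, round(population(k), 1))
```

Each year's value is the previous year's value times 1.02, so the increase in absolute terms grows from year to year.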
Exponential Regression Model
An exponential regression model has the formula:
$\mu_y = \alpha \beta^x$
for the mean $\mu_y$ of y at a given value of x,
where $\alpha$ and $\beta$ are parameters.
Exponential Regression Model
In the exponential regression equation, the explanatory
variable x appears as the exponent of a parameter.
The mean $\mu_y$ and the parameter $\beta$ can take only positive
values.
Exponential Regression Model
• As x increases, the mean $\mu_y$ increases when $\beta > 1$.
• It continually decreases when $0 < \beta < 1$.
Figure 12.11 The Exponential Regression Curve for $\mu_y = \alpha \beta^x$. Question: Why does $\mu_y$
decrease if $\beta = 0.5$, even though $\beta > 0$?
Exponential Regression Model
For exponential regression, the logarithm of the mean is
a linear function of x.
When the exponential regression model holds, a plot of
the log of the y values versus x should show an
approximate straight-line relation with x.
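One common way to exploit this property, shown here as a sketch rather than as the method any particular software uses, is to fit a least squares line to log(y) versus x and then exponentiate the intercept and slope. The data below are hypothetical and generated exactly from y = 2(1.5)^x, so the fit should recover those two values.

```python
import math


def fit_exponential(xs, ys):
    """Fit y_hat = alpha * beta**x by least squares on log(y) versus x."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    x_bar = sum(xs) / n
    l_bar = sum(logs) / n
    slope = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = l_bar - slope * x_bar
    # log(y_hat) = intercept + slope * x, so alpha = e^intercept and beta = e^slope.
    return math.exp(intercept), math.exp(slope)


# Hypothetical data lying exactly on y = 2 * 1.5**x.
xs = list(range(6))
ys = [2.0 * 1.5 ** x for x in xs]

alpha, beta = fit_exponential(xs, ys)
print(round(alpha, 3), round(beta, 3))
```

With real data the points will not lie exactly on a curve, and the fitted values then describe the trend in log(y) rather than reproducing y exactly.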
Example: Explosion in Number of
Facebook Users
Table 12.9 Number of Facebook Users Worldwide (in Millions).
Example: Explosion in Number of
Facebook Users
Figure 12.12 Plot of Number of Facebook Users (millions) from December 2004 to
June 2011.
Example: Explosion in Number of
Facebook Users
Figure 12.13 Plot of Log of Number of Facebook Users Between 2004 and 2011. When
the log of the response has an approximate straight-line relationship with the explanatory
variable, the exponential regression model is appropriate.
Example: Explosion in Number of
Facebook Users
Using regression software, we can create the
exponential regression equation:
• x: the number of days since December 1, 2004. Start with x = 0 for 12/1/2004, then x = 1 for 12/2/2004, etc.
• y: number of Facebook users (in millions)
Equation: $\hat{y} = 1.956(1.003)^x$
Interpreting Exponential Regression
Models
In the exponential regression model, $\mu_y = \alpha \beta^x$,
• the parameter $\alpha$ represents the mean value of y when x = 0;
• the parameter $\beta$ represents the multiplicative effect on the mean of y for a one-unit increase in x.
Example: Explosion in Number of
Facebook Users
In this model: $\hat{y} = 1.956(1.003)^x$
The predicted number of Facebook users on 12/1/2004
(for which x = 0) is 1.956 million.
The predicted number of Facebook users on 12/1/2015 is
1.956 times $1.003^{4017}$, which is approximately 120 billion
people.