Chapter 3
Summarizing Descriptive Relationships
Scatter Plots
We can prepare a scatter plot by placing one point for each
observation in the data set, located by its pair of values on the
two variables. The scatter plot provides a picture of the data
including the following:
1. Range of each variable;
2. Pattern of values over the range;
3. A suggestion as to a possible relationship between the two variables;
4. Indication of outliers (extreme points).
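As a minimal sketch of the plotting rule above, the following text-only example places one `*` per observation on a character grid and reports the range of each variable. The helper name `ascii_scatter` and the (size, rent) pairs are hypothetical, invented for illustration; they are not the data behind the slides. The sketch assumes each variable takes at least two distinct values.

```python
def ascii_scatter(xs, ys, width=30, height=10):
    """Return a text scatter plot with one '*' per (x, y) observation."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return "\n".join("".join(line) for line in grid)


# Hypothetical (size, rent) pairs, invented for illustration only.
sizes = [500, 750, 900, 1100, 1400, 1800]
rents = [600, 800, 950, 1200, 1500, 2100]

plot = ascii_scatter(sizes, rents)
print(plot)
print("Size range:", min(sizes), "to", max(sizes))
print("Rent range:", min(rents), "to", max(rents))
```

In practice a spreadsheet or plotting library would draw the chart; the grid here only shows how each observation maps to one point.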
Scatter Plots
(Excel Example)
[Figure: scatter plot of Rent vs. Apartment Size (Sq. Ft); horizontal axis Size, 0 to 2500; vertical axis Rent, 0 to 2500]
Covariance
The covariance is a measure of the linear
relationship between two variables. A positive value
indicates a direct or increasing linear relationship and
a negative value indicates a decreasing linear
relationship. The covariance calculation is defined
by the equation
$$\mathrm{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n - 1}$$
where $x_i$ and $y_i$ are the observed values, $\bar{X}$ and $\bar{Y}$ are the
sample means, and $n$ is the sample size.
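The covariance definition can be checked by hand on a small data set. The sketch below computes the sum of cross-products of deviations from the sample means and divides by n - 1; the function name `sample_covariance` and the five data pairs are assumptions made for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of (x_i - x_bar)(y_i - y_bar), divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cross_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross_products / (n - 1)


# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 5, 4, 5]    # tends to rise with x
ys_down = [5, 4, 5, 4, 2]  # tends to fall with x

cov_up = sample_covariance(xs, ys_up)      # 1.5, a positive covariance
cov_down = sample_covariance(xs, ys_down)  # -1.5, a negative covariance
print(cov_up, cov_down)
```

The sign gives the direction of the linear relationship, while the magnitude depends on the units of x and y.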
Covariance
Scatter Plots of Idealized Positive and Negative Covariance
[Figure 3.5a: Positive Covariance. Figure 3.5b: Negative Covariance. Each panel plots y against x.]
Correlation Coefficient
The sample correlation coefficient, $r_{xy}$, is
computed by the equation
$$r_{xy} = \frac{\mathrm{Cov}(x, y)}{s_x s_y}$$
where $s_x$ and $s_y$ are the sample standard deviations of x and y.
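A short sketch of this calculation, assuming the sample standard deviations use the same n - 1 divisor as the covariance (which is what Python's `statistics.stdev` does). The function name `sample_correlation` and the data are hypothetical.

```python
from statistics import mean, stdev


def sample_correlation(xs, ys):
    """Sample correlation: Cov(x, y) divided by the product s_x * s_y."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    return cov_xy / (stdev(xs) * stdev(ys))


# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_xy = sample_correlation(xs, ys)
print(round(r_xy, 4))
```

Dividing by the standard deviations removes the units, so the result does not change if x or y is rescaled by a positive constant.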
Correlation Coefficient
The correlation ranges from –1 to +1:
• $r_{xy} = +1$ indicates a perfect positive linear relationship: the X and Y
  points would plot an increasing straight line.
• $r_{xy} = 0$ indicates no linear relationship between X and Y.
• $r_{xy} = -1$ indicates a perfect negative linear relationship: the X and Y
  points would plot a decreasing straight line.
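The three reference values can be reproduced with constructed data. In this sketch the helper `correlation` and all data sets are hypothetical. The third case is a reminder that a correlation of 0 rules out only a linear relationship: the points lie exactly on a curve, yet the coefficient is zero.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation coefficient with n - 1 divisors throughout."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))


xs = [-2, -1, 0, 1, 2]

r_increasing = correlation(xs, [2 * x + 1 for x in xs])   # points on a rising line
r_decreasing = correlation(xs, [10 - 3 * x for x in xs])  # points on a falling line
r_curved = correlation(xs, [x ** 2 for x in xs])          # points on a parabola

print(round(r_increasing, 6), round(r_decreasing, 6), round(r_curved, 6))
```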
Correlation Coefficient
Positive correlations indicate positive or increasing linear
relationships. Values closer to +1 indicate data points closer to a
straight line, and values closer to 0 indicate greater deviations
from a straight line.
Correlation Coefficient
Negative correlations indicate negative or decreasing linear
relationships. Values closer to –1 indicate data points closer to a
straight line, and values closer to 0 indicate greater deviations
from a straight line.
Scatter Plots and Correlation
[Figure: three scatter plots of Y against X: (a) r = .8; (b) r = -.8; (c) r = 0]
Linear Relationships
Linear relationships can be represented by the basic
equation
$$Y = \beta_0 + \beta_1 X$$
where Y is the dependent or endogenous variable, which
is a function of X, the independent or exogenous
variable. The model contains two parameters, $\beta_0$ and $\beta_1$,
that are defined as model coefficients. The coefficient
$\beta_0$ is the intercept on the Y-axis, and the coefficient $\beta_1$ is
the change in Y for every unit change in X.
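The roles of the two coefficients can be seen by evaluating the equation at a few values of X. The function name `linear_value` and the coefficient values below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def linear_value(x, beta0, beta1):
    """Evaluate the linear equation Y = beta0 + beta1 * X."""
    return beta0 + beta1 * x


# Hypothetical coefficients, invented for illustration.
beta0, beta1 = 10.0, 2.5

y_at_0 = linear_value(0, beta0, beta1)  # 10.0: the intercept on the Y-axis
y_at_4 = linear_value(4, beta0, beta1)  # 20.0
y_at_5 = linear_value(5, beta0, beta1)  # 22.5

# A one-unit increase in X changes Y by exactly beta1.
print(y_at_0, y_at_5 - y_at_4)
```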
Linear Relationships
The nominal assumption in linear applications is that different
values of X can be set, and that each produces a corresponding mean
value of Y because of the underlying linear process being studied.
The linear equation model computes the mean of Y for every value of X.
This idea is the basis for many economic and business procedures,
including demand functions, production functions, consumption
functions, sales forecasts, and many other application areas.
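Linear Function and Data Points
[Figure: the line $\hat{y} = b_0 + b_1 x_1$ plotted with y against $x_1$, showing a data point $(x_{1i}, y_i)$ and its deviation $e_i$ from the line]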
Least Squares Regression
Least Squares Regression is a technique used to obtain estimates
(i.e. numerical values) for the linear coefficients $\beta_0$ and $\beta_1$.
These estimates are usually denoted $b_0$ and $b_1$, respectively.
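The slides do not show how the estimates are calculated. For a single independent variable, the standard closed-form result is that the slope equals the sample covariance divided by the sample variance of x, and the intercept makes the line pass through the point of sample means. These values minimize the sum of squared deviations between the observed y values and the line. The sketch below applies that result; the function name `least_squares` and the data are hypothetical.

```python
from statistics import mean


def least_squares(xs, ys):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx         # slope; the (n - 1) divisors cancel in the ratio
    b0 = y_bar - b1 * x_bar  # intercept; the line passes through the means
    return b0, b1


# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = least_squares(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(round(b0, 4), round(b1, 4))
```

With an intercept in the model, the residuals from the fitted line sum to zero, which is a quick check on the arithmetic.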
Excel Regression Output
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.560264658
R Square             0.313896487
Adjusted R Square    0.303341048
Standard Error       0.330316589
Observations         67

ANOVA
             df    SS             MS             F              Significance F
Regression    1    3.244673026    3.244673026    29.73789125    8.22E-07
Residual     65    7.092088168    0.109109049
Total        66    10.33676119

             Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept    1.6384          0.2679            6.1147    0          1.1033       2.1735
Satverb      0.0274          0.005             5.4532    0          0.0174       0.0375

The y-intercept is $b_0$ = 1.6384 and the slope is $b_1$ = 0.0274.
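Several entries in the Excel output are linked by standard identities for a regression with one independent variable, so the summary statistics can be recomputed from the ANOVA sums of squares. The sketch below does that using only numbers reported in the output. The identities themselves are standard results that the slides do not state, and the variable names are chosen for illustration.

```python
import math

# Values copied from the Excel regression output.
n = 67
ss_regression = 3.244673026
ss_residual = 7.092088168
ss_total = 10.33676119

df_regression = 1    # one independent variable (Satverb)
df_residual = n - 2  # 65, as in the ANOVA table

r_square = ss_regression / ss_total
multiple_r = math.sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
standard_error = math.sqrt(ms_residual)
f_stat = ms_regression / ms_residual

print(round(r_square, 6), round(adjusted_r_square, 6))
print(round(standard_error, 6), round(f_stat, 4))
```

With a single independent variable, Multiple R is the absolute value of the sample correlation between the two variables, and R Square is the share of total variation accounted for by the fitted line.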
Cross Tables
Cross Tables present the number of
observations that are defined by the
joint occurrence of specific intervals
for two variables. The combination
of all possible intervals for the two
variables defines the cells in a table.
Cross Tables
Area     Lumber   Paint   Tools   None   Total
East        100      50      50     50     250
North        50      95      45     60     250
West         65      70      75     40     250
Total       215     215     170    150     750
Cross Table of Household Demand for Products by Residence
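The row and column totals of a cross table are sums of its cells. As a small sketch, the cell counts from the table above can be stored by area and the totals recomputed; the variable names are chosen for illustration.

```python
products = ["Lumber", "Paint", "Tools", "None"]

# Cell counts copied from the cross table, one row per residence area.
counts = {
    "East": [100, 50, 50, 50],
    "North": [50, 95, 45, 60],
    "West": [65, 70, 75, 40],
}

# Row totals: households surveyed in each area.
row_totals = {area: sum(row) for area, row in counts.items()}

# Column totals: households choosing each product across all areas.
column_totals = dict(zip(products, (sum(col) for col in zip(*counts.values()))))

grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```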
Key Words
Least Squares Estimation Procedure
Least Squares Regression
Sample Correlation Coefficient
Sample Covariance
Scatter Plot