MAT 254 – Probability and Statistics
Sections 1,2 & 3
2015 - Spring
1) Importance and basic concepts of Probability and Statistics. Introduction to Statistics and data analysis
2) Data collection and presentation
3) Measures of central tendency: mean, median, mode
4) Probability
5) Conditional probability
6) Discrete probability distributions
7) Continuous probability distributions
Midterm Exam (April 1, 17:30)
8) Hypothesis testing (2 weeks)
9) Student t-test (2 weeks)
10) Chi-square
11) Correlation and regression analysis
12) REVIEW
Final Exam (May 25 - June 7)
web.adu.edu.tr/user/oboyaci
CORRELATION
The term correlation is used when:
1) Both variables are random variables, and
2) The end goal is simply to find a number that expresses the relation between the variables.
REGRESSION
The term regression is used when:
1) One of the variables is a fixed variable, and
2) The end goal is to use the measure of relation to predict values of the random variable based on values of the fixed variable.
Copyright © 2010 Pearson Addison-Wesley. All rights reserved.
• A scatter plot (or scatter diagram) is used to show the relationship between two variables.
• Correlation analysis is used to measure the strength of the association (linear relationship) between two variables.
◦ It is only concerned with the strength of the relationship.
◦ No causal effect is implied.
Business Statistics: A Decision-Making Approach, 7e, Chap 14, © 2008 Prentice-Hall, Inc.
[Figure: scatter plots of y versus x showing linear relationships and curvilinear relationships]
[Figure: scatter plots of y versus x showing strong relationships and weak relationships]
[Figure: scatter plots of y versus x showing no relationship]
• Correlation measures the strength of the linear association between two variables.
• The sample correlation coefficient r is a measure of the strength of the linear relationship between two variables, based on sample observations.
• r is unit free.
• r ranges between -1 and 1.
• The closer r is to -1, the stronger the negative linear relationship.
• The closer r is to 1, the stronger the positive linear relationship.
• The closer r is to 0, the weaker the linear relationship.
[Figure: scatter plots of y versus x for r = -1, r = -0.6, r = 0, r = +0.3 and r = +1]
Sample correlation coefficient:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$$
or the algebraic equivalent:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
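The two formulas for r are algebraically equivalent. A minimal Python sketch of each, applied to the tree data from the example in this lecture, should give the same value up to floating-point rounding. The function names `r_definitional` and `r_algebraic` are illustrative, not from the source.

```python
import math

def r_definitional(xs, ys):
    """r from deviations about the means (first formula)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_algebraic(xs, ys):
    """r from the raw sums n, Σx, Σy, Σxy, Σx², Σy² (algebraic equivalent)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Tree data: x = trunk diameter, y = tree height.
x = [8, 9, 7, 6, 13, 7, 11, 12]
y = [35, 49, 27, 33, 60, 21, 45, 51]
print(round(r_definitional(x, y), 4), round(r_algebraic(x, y), 4))  # both 0.8862
```

The algebraic form only needs running sums, which is why it suits hand calculation with a table of x, y, xy, x² and y² columns.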
Tree Height, y   Trunk Diameter, x   xy          y²           x²
35               8                   280         1225         64
49               9                   441         2401         81
27               7                   189         729          49
33               6                   198         1089         36
60               13                  780         3600         169
21               7                   147         441          49
45               11                  495         2025         121
51               12                  612         2601         144
Σ = 321          Σ = 73              Σ = 3142    Σ = 14111    Σ = 713
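$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886$$
[Figure: scatter plot of tree height, y (0 to 70) versus trunk diameter, x (0 to 14)]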
r = 0.886 → relatively strong positive
linear association between x and y
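Filling in the intermediate arithmetic from the column totals in the tree table (n = 8):

```latex
r = \frac{8(3142) - (73)(321)}{\sqrt{[8(713) - (73)^2]\,[8(14111) - (321)^2]}}
  = \frac{25136 - 23433}{\sqrt{(5704 - 5329)(112888 - 103041)}}
  = \frac{1703}{\sqrt{375 \times 9847}}
  = \frac{1703}{\sqrt{3692625}}
  \approx \frac{1703}{1921.62}
  \approx 0.886
```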
• Regression analysis is used to:
◦ Predict the value of a dependent variable based on the value of at least one independent variable
◦ Explain the impact of changes in an independent variable on the dependent variable
• Dependent variable: the variable we wish to explain
• Independent variable: the variable used to explain the dependent variable
• Only one independent variable, x
• The relationship between x and y is described by a linear function
• Changes in y are assumed to be caused by changes in x
• Error values (ε) are statistically independent
• Error values are normally distributed for any given value of x
• The probability distribution of the errors is normal
• The distributions of possible ε values have equal variances for all values of x
• The underlying relationship between the x variable and the y variable is linear
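One way to make these assumptions concrete is to simulate data that satisfies them and check that least squares recovers the parameters. This is a sketch only: the population values β0 = 2, β1 = 0.5, σ = 1, the x range 0 to 10 and the sample size are hypothetical choices, not taken from the lecture.

```python
import random

# Hypothetical population parameters (illustration only, not from the lecture).
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0

def simulate(n, seed=0):
    """Draw n points from y = BETA0 + BETA1*x + e under the stated assumptions."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    # Each error is drawn independently from N(0, SIGMA^2), with the same
    # variance for every x (equal variances), and the mean of y is linear in x.
    ys = [BETA0 + BETA1 * x + rng.gauss(0, SIGMA) for x in xs]
    return xs, ys

def least_squares(xs, ys):
    """Return (b0, b1) from the deviation formulas for the slope and intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

xs, ys = simulate(5000)
b0, b1 = least_squares(xs, ys)
print(f"b0 = {b0:.3f} (true {BETA0}), b1 = {b1:.3f} (true {BETA1})")
```

With a large simulated sample the estimates land close to the true values; on real data the assumptions have to be checked (for example with residual plots) rather than taken for granted.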
[Figure: example scatter plots showing a positive linear relationship, a negative linear relationship, a relationship that is not linear, and no relationship]
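The population regression model:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
where y is the dependent variable, x is the independent variable, β0 is the population y intercept and β1 is the population slope coefficient. β0 + β1x is the linear component, and ε is the random error component (the random error term, or residual).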
[Figure: the population regression line y = β0 + β1x + ε with intercept β0 and slope β1. At x = xi, the random error εi is the vertical distance between the observed value of y for xi and the predicted value of y for xi on the line]
The sample regression line provides an estimate of the population regression line:
$$\hat{y}_i = b_0 + b_1 x_i$$
where ŷi is the estimated (or predicted) y value, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope and x is the independent variable.
The individual random error terms ei have a mean of zero.
b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared residuals:
$$\sum e^2 = \sum (y - \hat{y})^2 = \sum \left(y - (b_0 + b_1 x)\right)^2$$
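The formulas for b1 and b0 are:
$$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} \quad\text{and}\quad b_0 = \bar{y} - b_1 \bar{x}$$
Algebraic equivalent for b1:
$$b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}$$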
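A minimal sketch of these formulas in Python, applied to the house price data used in this lecture (x = square feet, y = price in $1000s). The function name `fit_line` is illustrative; it uses the algebraic equivalent for b1 and then b0 = ȳ - b1x̄.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) using the algebraic formula for b1."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * (sum_x / n)  # b0 = y-bar - b1 * x-bar
    return b0, b1

# House data from this lecture: x = square feet, y = price in $1000s.
sq_ft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

b0, b1 = fit_line(sq_ft, price)
print(f"house price = {b0:.5f} + {b1:.5f} (square feet)")
# prints: house price = 98.24833 + 0.10977 (square feet)
```

The output matches the coefficients reported for the house price model.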
• b0 is the estimated average value of y when the value of x is zero
• b1 is the estimated change in the average value of y as a result of a one-unit change in x
• A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
• A random sample of 10 houses is selected
◦ Dependent variable (y) = house price in $1000s
◦ Independent variable (x) = square feet
House Price in $1000s (y)   Square Feet (x)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700
House price model: scatter plot and regression line
[Figure: scatter plot of house price ($1000s) versus square feet, with the fitted regression line; intercept = 98.248, slope = 0.10977]
$$\text{house price} = 98.24833 + 0.10977\,(\text{square feet})$$
$$\text{house price} = 98.24833 + 0.10977\,(\text{square feet})$$
• b0 is the estimated average value of y when the value of x is zero (if x = 0 is in the range of observed x values)
◦ Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet
$$\text{house price} = 98.24833 + 0.10977\,(\text{square feet})$$
• b1 measures the estimated change in the average value of y as a result of a one-unit change in x
◦ Here, b1 = 0.10977 tells us that the average value of a house increases by 0.10977($1000) = $109.77 for each additional square foot of size
• The sum of the residuals from the least squares regression line is 0: $\sum (y - \hat{y}) = 0$
• The sum of the squared residuals is a minimum: $\sum (y - \hat{y})^2$ is minimized
• The simple regression line always passes through the mean of the y variable and the mean of the x variable, i.e. through the point (x̄, ȳ)
• The least squares coefficients are unbiased estimates of β0 and β1
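The first three properties can be spot-checked on the house data. This sketch fits the line with the deviation formulas, then (1) sums the residuals, (2) nudges each coefficient in a few directions to confirm SSE goes up, and (3) evaluates the line at x̄. The nudge test is only a numerical spot check of the minimum, not a proof, and unbiasedness (a property over repeated samples) is not checked here.

```python
# House data from this lecture: x = square feet, y = price in $1000s.
sq_ft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sq_ft)
x_bar = sum(sq_ft) / n
y_bar = sum(price) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sq_ft, price))
      / sum((x - x_bar) ** 2 for x in sq_ft))
b0 = y_bar - b1 * x_bar

def sse(c0, c1):
    """Sum of squared residuals for the candidate line y-hat = c0 + c1 * x."""
    return sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(sq_ft, price))

# Property 1: residuals sum to zero (up to floating-point rounding).
residual_sum = sum(y - (b0 + b1 * x) for x, y in zip(sq_ft, price))

# Property 2 (spot check): moving either coefficient away from (b0, b1) raises SSE.
nudges = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1e-3), (0.0, -1e-3)]
is_minimum = all(sse(b0 + d0, b1 + d1) > sse(b0, b1) for d0, d1 in nudges)

# Property 3: the fitted line passes through (x-bar, y-bar).
line_at_mean = b0 + b1 * x_bar

print(f"{residual_sum:.1e}", is_minimum, round(line_at_mean, 6), y_bar)
```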
• Total variation is made up of two parts:
$$SST = SSE + SSR$$
Total sum of squares: $$SST = \sum (y - \bar{y})^2$$
Sum of squares error: $$SSE = \sum (y - \hat{y})^2$$
Sum of squares regression: $$SSR = \sum (\hat{y} - \bar{y})^2$$
where:
ȳ = Average value of the dependent variable
y = Observed values of the dependent variable
ŷ = Estimated value of y for the given x value
• SST = total sum of squares
◦ Measures the variation of the yi values around their mean ȳ
• SSE = error sum of squares
◦ Variation attributable to factors other than the relationship between x and y
• SSR = regression sum of squares
◦ Explained variation attributable to the relationship between x and y
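The decomposition SST = SSE + SSR can be verified on the house data. A minimal sketch, computing each sum of squares directly from its definition after fitting the least-squares line:

```python
# House data from this lecture: x = square feet, y = price in $1000s.
sq_ft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sq_ft)
x_bar = sum(sq_ft) / n
y_bar = sum(price) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sq_ft, price))
      / sum((x - x_bar) ** 2 for x in sq_ft))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in sq_ft]  # fitted values

sst = sum((y - y_bar) ** 2 for y in price)               # total variation
sse = sum((y - yh) ** 2 for y, yh in zip(price, y_hat))  # unexplained variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained variation

print(f"SST = {sst:.1f}, SSE = {sse:.1f}, SSR = {ssr:.1f}")
print(f"SSE + SSR = {sse + ssr:.1f}")
```

The identity holds exactly (up to rounding) only for the least-squares line; for any other line the cross term does not vanish.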
[Figure: deviations of a data point (xi, yi) from the fitted line and from the mean ȳ, illustrating SST = Σ(yi - ȳ)², SSE = Σ(yi - ŷi)² and SSR = Σ(ŷi - ȳ)²]
• The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
• The coefficient of determination is also called R-squared and is denoted as R²
$$R^2 = \frac{SSR}{SST} \quad\text{where}\quad 0 \le R^2 \le 1$$
[Figure: R² = 1. Perfect linear relationship between x and y: 100% of the variation in y is explained by variation in x]
[Figure: 0 < R² < 1. Weaker linear relationship between x and y: some but not all of the variation in y is explained by variation in x]
[Figure: R² = 0. No linear relationship between x and y: the value of y does not depend on x (none of the variation in y is explained by variation in x)]
Coefficient of determination:
$$R^2 = \frac{SSR}{SST} = \frac{\text{sum of squares explained by regression}}{\text{total sum of squares}}$$
Note: In the single independent variable case, the coefficient of determination is
$$R^2 = r^2$$
where:
R² = Coefficient of determination
r = Simple correlation coefficient
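A short sketch confirming R² = SSR/SST = r² for the house data. R² is computed from the fitted values and r from the correlation formula, independently, so the agreement is a genuine check rather than a restatement.

```python
import math

# House data from this lecture: x = square feet, y = price in $1000s.
sq_ft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(sq_ft)
x_bar = sum(sq_ft) / n
y_bar = sum(price) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sq_ft, price))
sxx = sum((x - x_bar) ** 2 for x in sq_ft)
syy = sum((y - y_bar) ** 2 for y in price)

r = sxy / math.sqrt(sxx * syy)  # simple correlation coefficient

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in sq_ft)  # explained variation
sst = syy                                              # total variation in y
r_squared = ssr / sst                                  # coefficient of determination

print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")  # both 0.5808
```

So about 58% of the variation in house prices in this sample is explained by variation in square footage.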
Estimated regression equation:
$$\text{house price} = 98.25 + 0.1098\,(\text{sq. ft.})$$
Predict the price for a house with 2000 square feet.
Predict the price for a house with 2000 square feet:
$$\text{house price} = 98.25 + 0.1098\,(\text{sq. ft.}) = 98.25 + 0.1098(2000) = 317.85$$
The predicted price for a house with 2000 square feet is 317.85($1,000s) = $317,850.
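The prediction step as a small Python sketch, using the rounded coefficients from the slide. Refusing to predict outside the observed range of square footage (1100 to 2450 in the sample) is a hypothetical design choice, motivated by the earlier caution that the model is only interpreted within the range of observed x values; `predict_price`, `X_MIN` and `X_MAX` are illustrative names.

```python
# Rounded coefficients from the estimated regression equation on the slide.
B0, B1 = 98.25, 0.1098
# Observed range of square footage in the 10-house sample.
X_MIN, X_MAX = 1100, 2450

def predict_price(sq_ft):
    """Predicted price in $1000s; raises instead of extrapolating (hypothetical policy)."""
    if not X_MIN <= sq_ft <= X_MAX:
        raise ValueError(f"{sq_ft} sq ft is outside the observed range {X_MIN}-{X_MAX}")
    return B0 + B1 * sq_ft

estimate = predict_price(2000)
print(f"${estimate * 1000:,.0f}")  # prints $317,850
```

Using the unrounded coefficients (98.24833 and 0.10977) gives a slightly different figure, about $317,790, so reported predictions depend on how the coefficients are rounded.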
END OF THE LECTURE…