General linear models
• One- and two-way ANOVA in SPSS
• Repeated measures ANOVA
• Multiple linear regression

2-way ANOVA in SPSS – Example 14.1
2-way ANOVA in SPSS
Click Add
Repeated measures
The Stroop test
BLUE
The model
Assumptions
If sphericity does not hold…
a) The trick: the Greenhouse-Geisser correction
b) Repeated MANOVA
c) More complex models
And now in SPSS
Multiple linear regression
• Regression: One variable is considered dependent on the other(s)
• Correlation: No variable is considered dependent on the other(s)
• Multiple regression: More than one independent variable
• Linear regression: The dependent variable is scalar and linearly dependent on the independent variable(s)
• Logistic regression: The dependent variable is categorical (hopefully only two levels) and follows an s-shaped relation (see the sketch below)
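The slides demonstrate these models in SPSS; as a language-neutral illustration of the s-shaped relation mentioned above, here is a minimal Python sketch of the logistic curve (the intercept and slope values are invented):

```python
import numpy as np

def logistic(x, alpha=0.0, beta=1.5):
    """The s-shaped curve behind logistic regression:
    P(Y = 1 | x) = 1 / (1 + exp(-(alpha + beta * x)))."""
    return 1.0 / (1.0 + np.exp(-(alpha + beta * x)))

# The probability rises smoothly from ~0 to ~1 as x crosses -alpha/beta
x = np.linspace(-4, 4, 9)
print(np.round(logistic(x), 3))
```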
Remember the simple linear regression?
If Y is linearly dependent on X, simple linear regression is used:

$Y_j = \alpha + \beta X_j$

$\alpha$ is the intercept, the value of Y when X = 0
$\beta$ is the slope, the rate at which Y increases when X increases
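A minimal NumPy sketch of the least-squares estimates of the intercept and slope; the data points are invented for illustration:

```python
import numpy as np

# Invented example data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares estimates: slope from the deviation sums, then intercept
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```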
Is the relation linear?
[Scatter plot of Y (about -4 to 12) against X (-3 to 3)]
Multiple linear regression
If Y is linearly dependent on more than one independent variable:

$Y_j = \alpha + \beta_1 X_{1j} + \beta_2 X_{2j}$

$\alpha$ is the intercept, the value of Y when X1 and X2 = 0
$\beta_1$ and $\beta_2$ are termed partial regression coefficients
$\beta_1$ expresses the change in Y for one unit of X1 when X2 is kept constant
[3D scatter plot of Y against X1 and X2]
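A sketch of fitting this two-variable model with NumPy's least-squares solver (invented data); note how b1 recovers the change in Y per unit of X1 with X2 held constant:

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.uniform(0, 7, 30)                     # invented predictor values
X2 = rng.uniform(0, 25, 30)
Y = 1.0 + 0.5 * X1 + 0.1 * X2 + rng.normal(0, 0.2, 30)

# Design matrix: a column of ones for the intercept plus the two predictors
A = np.column_stack([np.ones_like(X1), X1, X2])
(a, b1, b2), *_ = np.linalg.lstsq(A, Y, rcond=None)
print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")  # close to 1.0, 0.5, 0.1
```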
Multiple linear regression – residual error and estimation
As the collected data are not expected to fall exactly in a plane, an error term must be added:

$Y_j = \alpha + \beta_1 X_{1j} + \beta_2 X_{2j} + \epsilon_j$

The error terms sum to zero.
Estimating the dependent variable and the population parameters:

$\hat{Y}_j = a + b_1 X_{1j} + b_2 X_{2j}$
[3D scatter plot of Y against X1 and X2, repeated from the previous slide]
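Continuing the same kind of sketch: the estimated values and the residual errors, which sum to (numerically) zero when an intercept is included:

```python
import numpy as np

rng = np.random.default_rng(0)
X1, X2 = rng.uniform(0, 7, 30), rng.uniform(0, 25, 30)  # invented data
Y = 1.0 + 0.5 * X1 + 0.1 * X2 + rng.normal(0, 0.2, 30)

A = np.column_stack([np.ones_like(X1), X1, X2])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)

Y_hat = A @ coef        # estimated values from the fitted plane
e = Y - Y_hat           # residual errors
print(f"sum of residuals = {e.sum():.2e}")  # ~0 up to rounding error
```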
Multiple linear regression – general equations
In general, a finite number (m) of independent variables may be used to estimate the hyperplane:

$Y_j = \alpha + \sum_{i=1}^{m} \beta_i X_{ij} + \epsilon_j$

The number of sample points must be at least two more than the number of variables.
Multiple linear regression – least sum of squares
The principle of the least sum of squares is usually used to perform the fit: the parameters are chosen to minimize

$\sum_{j=1}^{n} \left( Y_j - \hat{Y}_j \right)^2$
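A sketch of the general m-variable fit via the normal equations, which minimize this sum of squared deviations (invented data, with n comfortably above m + 2):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 20, 3                           # n sample points, m predictors (n >= m + 2)
X = rng.normal(size=(n, m))            # invented predictors
Y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(0, 0.3, n)

A = np.column_stack([np.ones(n), X])   # intercept column plus the m predictors
b = np.linalg.solve(A.T @ A, A.T @ Y)  # normal equations minimize sum (Y - Y_hat)^2
SS_res = np.sum((Y - A @ b) ** 2)      # the minimized sum of squares
print(b.round(3), f"SS_res = {SS_res:.3f}")
```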
Multiple linear regression – An example
Multiple linear regression – The fitted equation
Multiple linear regression – Are any of the coefficients significant?
F = regression MS / residual MS
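SPSS reports this F in its ANOVA table; here is a NumPy/SciPy sketch of the same computation on invented data (regression DF = m, residual DF = n - m - 1):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, m = 25, 2
X = rng.normal(size=(n, m))                          # invented data
Y = 1.0 + X @ np.array([0.8, -0.5]) + rng.normal(0, 0.4, n)

A = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(A, Y, rcond=None)
Y_hat = A @ b

MS_reg = np.sum((Y_hat - Y.mean()) ** 2) / m         # regression MS
MS_res = np.sum((Y - Y_hat) ** 2) / (n - m - 1)      # residual MS
F = MS_reg / MS_res
p = stats.f.sf(F, m, n - m - 1)                      # upper-tail p-value
print(f"F = {F:.2f}, p = {p:.3g}")
```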
Multiple linear regression – Is it a good fit?
R² = regression SS / total SS = 1 - residual SS / total SS
• Is an expression of how much of the variation can be described by the model
• When comparing models with different numbers of variables the adjusted R-square should be used:
  Ra² = 1 - residual MS / total MS

The multiple regression coefficient: R = sqrt(R²)
The standard error of the estimate = sqrt(residual MS)
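A sketch of these goodness-of-fit quantities computed from the sums of squares (invented data):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 25, 2
X = rng.normal(size=(n, m))                          # invented data
Y = 1.0 + X @ np.array([0.8, -0.5]) + rng.normal(0, 0.4, n)

A = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(A, Y, rcond=None)

SS_res = np.sum((Y - A @ b) ** 2)
SS_tot = np.sum((Y - Y.mean()) ** 2)

R2 = 1 - SS_res / SS_tot                             # = regression SS / total SS
R2_adj = 1 - (SS_res / (n - m - 1)) / (SS_tot / (n - 1))
R = np.sqrt(R2)                                      # multiple regression coefficient
se = np.sqrt(SS_res / (n - m - 1))                   # standard error of the estimate
print(f"R2 = {R2:.3f}, adj. R2 = {R2_adj:.3f}, R = {R:.3f}, SE = {se:.3f}")
```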
Multiple linear regression – Which of the coefficients are significant?
• s_bi is the standard error of the regression parameter b_i
• A t-test tests whether b_i is different from 0: t = b_i / s_bi
• ν is the residual DF
• p values can be found in a table
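A sketch of the per-coefficient t-tests: the standard errors come from the diagonal of (AᵀA)⁻¹ scaled by the residual MS, with ν = n - m - 1 (invented data; SciPy replaces the table lookup):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, m = 30, 2
X = rng.normal(size=(n, m))
Y = 1.0 + X @ np.array([0.8, 0.0]) + rng.normal(0, 0.5, n)  # 2nd slope truly 0

A = np.column_stack([np.ones(n), X])
b = np.linalg.solve(A.T @ A, A.T @ Y)
nu = n - m - 1                                   # residual degrees of freedom
MS_res = np.sum((Y - A @ b) ** 2) / nu

s_b = np.sqrt(MS_res * np.diag(np.linalg.inv(A.T @ A)))  # standard errors of b
t = b / s_b
p = 2 * stats.t.sf(np.abs(t), nu)                # two-sided p-values
for name, ti, pi in zip(["a", "b1", "b2"], t, p):
    print(f"{name}: t = {ti:.2f}, p = {pi:.3g}")
```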
Multiple linear regression – Which of the variables are most important?
• The standardized regression coefficient b' is a normalized version of b:

$b_i' = b_i \sqrt{\frac{\sum x_i^2}{\sum y^2}}$
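A sketch of the standardized coefficients using the sum-of-squares form above; the two invented predictors are on very different scales, yet the b' values become comparable:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 30
X1 = rng.normal(0, 1, n)             # small-scale predictor (invented)
X2 = rng.normal(0, 100, n)           # large-scale predictor (invented)
Y = 2.0 * X1 + 0.02 * X2 + rng.normal(0, 0.5, n)

A = np.column_stack([np.ones(n), X1, X2])
a, b1, b2 = np.linalg.lstsq(A, Y, rcond=None)[0]

sum_y2 = np.sum((Y - Y.mean()) ** 2)
b1_std = b1 * np.sqrt(np.sum((X1 - X1.mean()) ** 2) / sum_y2)
b2_std = b2 * np.sqrt(np.sum((X2 - X2.mean()) ** 2) / sum_y2)
print(f"b'_1 = {b1_std:.3f}, b'_2 = {b2_std:.3f}")  # now on a common scale
```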
Multiple linear regression – multicollinearity
• If two factors are well correlated, the estimated b's become inaccurate.
• This is also termed collinearity, intercorrelation, nonorthogonality, or ill-conditioning.
• Tolerance or variance inflation factors can be computed.
• Extreme correlation is called singularity, and one of the correlated variables must be removed.
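A sketch of tolerance and VIF: each predictor is regressed on the others, and its R² measures how redundant it is (deliberately collinear invented data):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
X1 = rng.normal(size=n)
X2 = X1 + rng.normal(0, 0.1, n)      # deliberately almost collinear with X1
X3 = rng.normal(size=n)
X = np.column_stack([X1, X2, X3])

for i in range(X.shape[1]):
    others = np.delete(X, i, axis=1)              # all predictors except X_i
    A = np.column_stack([np.ones(n), others])
    b, *_ = np.linalg.lstsq(A, X[:, i], rcond=None)
    resid = X[:, i] - A @ b
    R2 = 1 - resid.var() / X[:, i].var()          # fit of X_i on the other X's
    tol = 1 - R2                                  # tolerance; VIF is its inverse
    print(f"X{i+1}: tolerance = {tol:.3f}, VIF = {1 / tol:.1f}")
```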
Multiple linear regression – Pairwise correlation coefficients

$r_{xy} = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}, \quad \text{where } \sum xy = \sum (X_i - \bar{X})(Y_i - \bar{Y}) \text{ and } \sum x^2 = \sum (X_i - \bar{X})^2$
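A sketch of this formula on invented data, checked against NumPy's built-in correlation:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=40)                     # invented data
Y = 0.7 * X + rng.normal(0, 0.5, 40)

x, y = X - X.mean(), Y - Y.mean()           # deviations from the means
r = np.sum(x * y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
print(f"r = {r:.3f}, np.corrcoef gives {np.corrcoef(X, Y)[0, 1]:.3f}")
```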
Multiple linear regression – Assumptions
The same as for simple linear regression:
1. The Y's are randomly sampled
2. The residuals are normally distributed
3. The residuals have equal variance
4. The X's are fixed factors (their errors are small)
5. The X's are not perfectly correlated
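A sketch of two quick checks on assumptions 2 and 3, using a Shapiro-Wilk test for residual normality and a split-sample comparison of residual spread (invented data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n = 40
X = rng.uniform(0, 10, n)                   # invented data
Y = 1.0 + 0.5 * X + rng.normal(0, 0.3, n)

A = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(A, Y, rcond=None)
e = Y - A @ b                               # residuals

W, p = stats.shapiro(e)                     # assumption 2: normality
lo = e[X < np.median(X)].std()              # assumption 3: equal variance
hi = e[X >= np.median(X)].std()
print(f"Shapiro-Wilk p = {p:.3f} (large p: no evidence against normality)")
print(f"residual SD at low X = {lo:.3f}, at high X = {hi:.3f}")
```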