Analysis of Variance: Some Review and Some New Ideas
Remember the concepts of variance and the
standard deviation…
• Variance is the square of the standard deviation
• Standard deviation (s) - the square root of the variance, that is, the square
root of (the sum of the squared deviations from the mean divided by the number of cases).
• See p. 47 in the text.
• We now want to use these concepts in regression analysis.
• We will be learning a new statistical test, the F test, which we will use
to assess the statistical significance of a regression equation (not just
the coefficients)
We will also use Analysis of Variance
(ANOVA)…
• To compare the differences among more than two means.
• To date we have compared only two means at a time, using a t test.
Equations
• Mean
• Variance
• Standard Deviation
• Coefficient of Variation
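The four quantities can be written out as follows. This is a sketch based on the definitions given in the text (dividing by the number of cases N). The symbols x_i, N, \bar{x}, s, and CV are notation chosen here for illustration.

```latex
\begin{align*}
\text{Mean:} \quad & \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i \\
\text{Variance:} \quad & s^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 \\
\text{Standard deviation:} \quad & s = \sqrt{s^2} \\
\text{Coefficient of variation:} \quad & CV = \frac{s}{\bar{x}}
\end{align*}
```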
Steps for calculating variance
• 1. Calculate the mean of a variable
• 2. Find the deviations from the mean: subtract the variable
mean from each case
• 3. Square each of the deviations from the mean
• 4. The variance is the mean of the squared deviations from
the mean, so sum the squared deviations from step 3 and
divide by the number of cases.
• (When we did these steps before we were interested in going
on to calculate a standard deviation and coefficient of
variation. Now we’ll just stick with variance.)
Calculating Variance
PERSONS: 6, 6, 9, 3, 4, 8, 10, 5, 13, 9, 2, 10, 11, 2, 8, 12, 11, 7, 7, 6
• 1. Calculate the mean of a
variable
• 2. Find the deviations from
the mean: subtract the
variable mean from each
case
PERSONS   MEAN   DEVIATIONS
6         7.45   -1.45
6         7.45   -1.45
9         7.45    1.55
3         7.45   -4.45
4         7.45   -3.45
8         7.45    0.55
10        7.45    2.55
5         7.45   -2.45
13        7.45    5.55
9         7.45    1.55
2         7.45   -5.45
10        7.45    2.55
11        7.45    3.55
2         7.45   -5.45
8         7.45    0.55
12        7.45    4.55
11        7.45    3.55
7         7.45   -0.45
7         7.45   -0.45
6         7.45   -1.45
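Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides. It computes each deviation as the case minus the mean, as step 2 describes, and the variable names are chosen here for illustration.

```python
# PERSONS values from the table above
persons = [6, 6, 9, 3, 4, 8, 10, 5, 13, 9, 2, 10, 11, 2, 8, 12, 11, 7, 7, 6]

# Step 1: mean of the variable
mean = sum(persons) / len(persons)

# Step 2: deviation = case minus mean
deviations = [x - mean for x in persons]

# Deviations from the mean cancel out (up to floating-point rounding),
# which is why they must be squared before they can be summed usefully.
deviation_total = sum(deviations)

print(f"mean = {mean:.2f}")
print(f"sum of deviations = {deviation_total:.6f}")
```

The fact that the raw deviations sum to zero is the reason step 3 squares them.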
Calculating Variance, cont.
• 3. Square each of the deviations
from the mean
• 4. The variance is the mean of
the squared deviations from the
mean, so sum the squared
deviations from step 3 and
divide by the number of cases
• The Sum of the squared
deviations = 198.950
• Variance = 198.950/20 =
9.948
PERSONS   MEAN   DEVIATIONS   DEVSQARD
6         7.45   -1.45         2.1025
6         7.45   -1.45         2.1025
9         7.45    1.55         2.4025
3         7.45   -4.45        19.8025
4         7.45   -3.45        11.9025
8         7.45    0.55         0.3025
10        7.45    2.55         6.5025
5         7.45   -2.45         6.0025
13        7.45    5.55        30.8025
9         7.45    1.55         2.4025
2         7.45   -5.45        29.7025
10        7.45    2.55         6.5025
11        7.45    3.55        12.6025
2         7.45   -5.45        29.7025
8         7.45    0.55         0.3025
12        7.45    4.55        20.7025
11        7.45    3.55        12.6025
7         7.45   -0.45         0.2025
7         7.45   -0.45         0.2025
6         7.45   -1.45         2.1025
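The whole calculation can be checked with a short Python sketch. It follows the four steps as stated, dividing by the number of cases, and cross-checks the result against the standard library's `statistics.pvariance`, which uses the same divide-by-N definition. The variable names are chosen here for illustration.

```python
import statistics

# PERSONS values from the table above
persons = [6, 6, 9, 3, 4, 8, 10, 5, 13, 9, 2, 10, 11, 2, 8, 12, 11, 7, 7, 6]
n = len(persons)

# Steps 1-2: mean and deviations from the mean
mean = sum(persons) / n
deviations = [x - mean for x in persons]

# Step 3: square each deviation
squared_devs = [d ** 2 for d in deviations]

# Step 4: sum the squared deviations and divide by the number of cases
sum_of_squares = sum(squared_devs)   # about 198.95
variance = sum_of_squares / n        # about 9.9475

# Cross-check against the standard library (population variance, divides by N)
library_variance = statistics.pvariance(persons)

print(f"sum of squares = {sum_of_squares:.3f}")
print(f"variance       = {variance:.4f}")
```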
A New Concept: Sum of Squares
• The sum of the squared deviations from the mean is called the Sum of
Squares
• Remember, when we know nothing else about an interval variable, the best
estimate of it is its mean.
• By extension, the sum of squares around the mean is the smallest total
squared error we can achieve if we know nothing else about the variable.
• But….when we have more information, for example in a statistically
significant bivariate regression model, we can improve on the best
estimate of the dependent variable by using the information from the
independent variable to estimate it.
The regression equation is a better estimator
of food costs than the mean of food costs.
Calculating Total Sum of Squares
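• The variance reported below divides the sum of squared deviations by N-1
rather than N, so multiplying it by N-1 recovers the sum of squares:
Total Sum of Squares = 8127.019*(638-1), or about 5,176,911.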
Statistics: TOTAL FOOD COSTS
N Valid     638
N Missing   0
Mean        270.2310
Variance    8127.019
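The arithmetic can be sketched in Python. The first part uses the N and variance reported for TOTAL FOOD COSTS. The second part uses a small data set invented here for illustration, to show that a variance computed with an N-1 divisor (as `statistics.variance` does) times N-1 gives back the sum of squared deviations.

```python
import statistics

# Values reported in the Statistics table for TOTAL FOOD COSTS
n = 638
variance = 8127.019

# Total sum of squares recovered from an (N-1)-divisor variance
total_ss = variance * (n - 1)
print(f"Total Sum of Squares = {total_ss:.3f}")

# Hypothetical mini data set, invented only to demonstrate the identity
sample = [6, 6, 9, 3, 4, 8, 10, 5]
sample_mean = sum(sample) / len(sample)
ss_direct = sum((x - sample_mean) ** 2 for x in sample)
ss_from_variance = statistics.variance(sample) * (len(sample) - 1)
print(f"direct = {ss_direct:.4f}, from variance = {ss_from_variance:.4f}")
```

The product comes out near 5,176,911.1, slightly different from the 5176911.308 used later in the deck, presumably because the displayed variance is rounded to three decimals.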
Calculations for the Regression sum of
Squares
• Regression sum of squares equals the sum of the squares of the
deviations between yhat (predicted y) and ymean.
• RSS = Σ (yhat – ymean)²
• Residual Sum of Squares = TSS - RSS (here RSS stands for the Regression Sum of Squares)
Now we want to estimate how much better
• To do that, we use the sum of squares calculations
• We partition the total sum of squares (TSS), i.e., the sum of squared
deviations from the mean, into two parts
• The first part is the sum of squared deviations accounted for by the regression
equation (Regression Sum of Squares).
• The second part is the sum of squared deviations left over, i.e., not
accounted for by the regression equation, or more formally, TSS - Regression Sum of Squares = the Residual Sum of Squares.
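The partition can be illustrated with a minimal Python sketch. The x and y values below are hypothetical, invented for illustration, and are not the food cost data. The regression line is fitted with the usual least-squares formulas for a bivariate model, with a as the intercept and b as the slope.

```python
# Hypothetical data, invented only to illustrate the partition
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]
n = len(x)

x_mean = sum(x) / n
y_mean = sum(y) / n

# Least-squares slope (b) and intercept (a) for y = a + b*x
b = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
     / sum((xi - x_mean) ** 2 for xi in x))
a = y_mean - b * x_mean
y_hat = [a + b * xi for xi in x]

# Total: squared deviations of y from its mean
total_ss = sum((yi - y_mean) ** 2 for yi in y)
# Regression: squared deviations of the predictions from the mean
regression_ss = sum((yh - y_mean) ** 2 for yh in y_hat)
# Residual: squared deviations of y from the predictions
residual_ss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

print(f"TSS = {total_ss:.4f}")
print(f"Regression SS + Residual SS = {regression_ss + residual_ss:.4f}")
```

For a least-squares fit with an intercept, the two printed values agree up to floating-point rounding, which is what allows the residual sum of squares to be obtained by subtraction.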
Now let’s look at what we’ve accomplished…
• To do that, we’ll calculate an F test
• We need to add information about degrees of freedom.
• Remember the concept…how many values are free to vary once a statistic
is fixed. If we know all the values, we can calculate the mean. If we
know the mean, and we know all the values but one, we can calculate
that last value. So only N-1 values are free to vary: one degree of
freedom is used up by the mean.
• For the F test, we need information about the degrees of freedom in
the regression model. The formula is k-1, where k is the number of
parameters to be estimated. For the bivariate model, the parameters
are a and b, so 2-1=1
Degrees of freedom continued…
• For the Residual Sum of Squares, the degrees of freedom are N-k, so for
this model, 638-2 = 636.
• We then calculate the mean squares by dividing each sum of squares
by its degrees of freedom.
• The F statistic is the regression mean square divided by the residual
mean square.
• The probability of the F statistic is read from a table of the F distribution,
using the regression and residual degrees of freedom.
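These steps can be sketched in Python using the sums of squares quoted for this model in the R Square discussion (Regression Sum of Squares 2070301.432, Total Sum of Squares 5176911.308). This is an illustration of the arithmetic only. It does not look up the probability, which would need an F table or a statistics package.

```python
# Values quoted in the deck for the food cost model
n = 638                      # number of cases
k = 2                        # parameters estimated: a and b
total_ss = 5176911.308
regression_ss = 2070301.432

# Residual sum of squares by subtraction
residual_ss = total_ss - regression_ss

# Degrees of freedom
df_regression = k - 1        # 1
df_residual = n - k          # 636

# Mean squares: each sum of squares divided by its degrees of freedom
ms_regression = regression_ss / df_regression
ms_residual = residual_ss / df_residual

# F statistic: regression mean square over residual mean square
f_stat = ms_regression / ms_residual   # about 423.8

print(f"F({df_regression}, {df_residual}) = {f_stat:.2f}")
```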
Another Way to Think about R Square
• The Regression Sum of Squares divided by the Total Sum of Squares is
a measure of the proportion of variance explained by the model.
• So 2070301.432/5176911.308 = .39991 or ~40%.
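The division can be checked with a short Python sketch, which also shows the equivalent form of one minus the unexplained share. The variable names are chosen here for illustration.

```python
# Sums of squares quoted above for the food cost model
regression_ss = 2070301.432
total_ss = 5176911.308
residual_ss = total_ss - regression_ss

# Proportion of variance explained by the model
r_squared = regression_ss / total_ss            # about 0.3999

# Same quantity, written as 1 minus the unexplained proportion
r_squared_alt = 1 - residual_ss / total_ss

print(f"R square = {r_squared:.4f} (about {r_squared:.0%})")
```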