GG313 Lecture 19
Nov 1 2005
Regression Summary and ANOVA Test
What are the major linear regression (line-fitting)
algorithms and their properties?
algorithm                               minimizes
LSY: Least-squares y on x               Σᵢ (yᵢ − a − bxᵢ)²
χ²: LSY with errors in y                Σᵢ (yᵢ − a − bxᵢ)² / σᵢ²
LSXY: Complete orthogonal               Σᵢ [(yᵢ − Yᵢ)² + (xᵢ − Xᵢ)²]
RMA: Reduced Major Axis                 Σᵢ |xᵢ − Xᵢ| |yᵢ − Yᵢ|
LMS: Least median of squares (Robust)   median of squares

Here (Xᵢ, Yᵢ) is the point on the fitted line corresponding to
observation (xᵢ, yᵢ), and σᵢ is the known error in yᵢ.
algorithm                               uses
LSY: Least-squares y on x               normal data, not steep
χ²: LSY with errors in y                data with known errors
LSXY: Complete orthogonal               normal data, all slopes
RMA: Reduced Major Axis                 normal data, all slopes
LMS: Least median of squares (Robust)   bad outliers
algorithm                               poor results
LSY: Least-squares y on x               outliers, steep
χ²: LSY with errors in y                -----
LSXY: Complete orthogonal               outliers
RMA: Reduced Major Axis                 outliers
LMS: Least median of squares (Robust)   heavy groupings
[Figures: LMS and LSY line fits for data with heavy groupings and for data with outliers]
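The contrast in the table can be seen numerically. A sketch (not from the lecture) with made-up data, comparing the LSY, RMA, and LMS slopes when one bad outlier is present; numpy is assumed available:

```python
import numpy as np

# Made-up data: a y = 2 + 0.5x trend plus small noise and one bad outlier.
rng = np.random.default_rng(0)
x = np.arange(10.0)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.1, 10)
y[7] = 20.0  # outlier

# LSY: minimizes sum((y - a - b*x)^2); polyfit returns [slope, intercept].
b_lsy, a_lsy = np.polyfit(x, y, 1)

# RMA: slope is sign(r) * std(y)/std(x).
r = np.corrcoef(x, y)[0, 1]
b_rma = np.sign(r) * np.std(y) / np.std(x)

# LMS (brute force): try the line through every pair of points and keep
# the one with the smallest median squared residual.
best = (np.inf, 0.0, 0.0)
for i in range(len(x)):
    for j in range(i + 1, len(x)):
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        med = np.median((y - a - b * x) ** 2)
        if med < best[0]:
            best = (med, a, b)
_, a_lms, b_lms = best

# LSY (and RMA) are dragged by the outlier; LMS stays near the true slope 0.5.
print(b_lsy, b_rma, b_lms)
```

The pairwise search is an O(N²) stand-in for a real LMS solver, but it shows the point: the median of squared residuals ignores the outlier entirely.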
ANOVA TEST
The anova (analysis of variance) test is used to determine
whether MANY samples come from the same population.
Earlier we tested whether two samples were from the
same population using the t-test; anova tests the means
of many samples in a similar way using the F-test.
We place the sample values in a matrix with n observations
in each sample, so each row holds one of the k samples:
What might these values be?
• densities at different depths in wells
• fossil measurements at different sites
• color values in a photograph
• manganese crust thickness vs water depth at
different sites
As before, we set up a hypothesis that the values at each
site are different, and a null hypothesis that the values
are the same. At our standard confidence level of 95%, we
will see if the null hypothesis can be rejected, implying
that the values at the sites are not from the same population.
The anova test ASSUMES that
1) the populations are normally distributed
2) the populations have the same variance (σ²)
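These assumptions can be checked before running the test. A sketch in Python with made-up sample values; scipy's Shapiro-Wilk and Bartlett tests are one common way to do the checks (they are not part of the lecture):

```python
import numpy as np
from scipy import stats

# Made-up samples, one per row (k = 3 samples, n = 4 observations each).
X = np.array([[2.0, 3.0, 2.5, 3.5],
              [4.0, 5.0, 4.5, 5.5],
              [2.2, 3.1, 2.6, 3.4]])

# 1) Normality: Shapiro-Wilk on each sample (p > 0.05 means no
#    evidence against normality).
normal_ok = all(stats.shapiro(row).pvalue > 0.05 for row in X)

# 2) Equal variances: Bartlett's test across all samples.
equal_var_ok = stats.bartlett(*X).pvalue > 0.05

print(normal_ok, equal_var_ok)
```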
The test involves the calculation of several parameters:
SST = Σᵢ Σⱼ (xᵢⱼ − x̄)² = n Σᵢ (x̄ᵢ − x̄)² + Σᵢ Σⱼ (xᵢⱼ − x̄ᵢ)²   (i = 1..k, j = 1..n),

where x̄ is the "grand mean" and x̄ᵢ is the row mean.
The two terms on the right are known as:

SST = SS(Tr) + SSE
SS(Tr) (treatment sum of squares) is a measure of
the variation of the sample means, and
SSE (error sum of squares) is a measure of the
variation within samples.
(4.65,6)
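The identity SST = SS(Tr) + SSE can be verified numerically. A sketch with made-up data (k = 3 samples of n = 4 observations, one sample per row):

```python
import numpy as np

# Made-up k x n data matrix: each row is one sample of n observations.
X = np.array([[2.0, 3.0, 2.5, 3.5],
              [4.0, 5.0, 4.5, 5.5],
              [2.2, 3.1, 2.6, 3.4]])
k, n = X.shape

grand_mean = X.mean()        # x-bar, the grand mean
row_means = X.mean(axis=1)   # x-bar_i, one mean per sample

SST = ((X - grand_mean) ** 2).sum()                 # total sum of squares
SSTr = n * ((row_means - grand_mean) ** 2).sum()    # SS(Tr), between samples
SSE = ((X - row_means[:, None]) ** 2).sum()         # SSE, within samples

print(SST, SSTr + SSE)  # the two agree
```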
The F-test statistic is then given by:

F = (estimate of σ² from variation of the x̄ᵢ) / (estimate of σ² from variation within samples)
  = [SS(Tr)/(k − 1)] / [SSE/(k(n − 1))]   (4.68)
The value of F ranges from zero to large values. If it is close
to zero, the null hypothesis is likely; if F is large, the null
hypothesis is unlikely.
We obtain our F comparison value (critical value) from the
F-table or from Matlab, using the level of confidence we want
(usually 95% or 5%, depending on the table) and the degrees
of freedom given by k − 1 (the number of samples minus 1) and
k(n − 1) (the total number of observations minus the number
of samples).
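A sketch of the whole one-way test in Python with made-up data; the hand computation of Eq. 4.68 is checked against scipy's f_oneway, and scipy.stats.f.ppf plays the role of the F-table:

```python
import numpy as np
from scipy import stats

# Made-up samples: k = 3 rows of n = 4 observations each.
X = np.array([[2.0, 3.0, 2.5, 3.5],
              [4.0, 5.0, 4.5, 5.5],
              [2.2, 3.1, 2.6, 3.4]])
k, n = X.shape

grand_mean = X.mean()
row_means = X.mean(axis=1)
SSTr = n * ((row_means - grand_mean) ** 2).sum()
SSE = ((X - row_means[:, None]) ** 2).sum()

# Eq. 4.68: F = [SS(Tr)/(k-1)] / [SSE/(k(n-1))]
F = (SSTr / (k - 1)) / (SSE / (k * (n - 1)))

# Critical value at 95% confidence, with k-1 and k(n-1) degrees of freedom.
F_crit = stats.f.ppf(0.95, k - 1, k * (n - 1))

# scipy's one-way ANOVA (samples passed as separate arrays) gives the same F.
F_scipy, p = stats.f_oneway(*X)
print(F, F_scipy, F > F_crit)
```

If F exceeds F_crit, the null hypothesis (all samples from one population) is rejected.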
EXAMPLE: ANOVAEX.html
2-WAY anova
2-way anova is the same as the 1-way anova explained
above except that not only are the columns compared
(the means of the samples) but also the rows.
For example, the means of the samples may be different,
but the means of the rows may be statistically the
same, such as where the densities change in a similar
way with depth.
anova2 generates some new statistics: SSB and a new
SSE:
SSB = (1/k) Σⱼ Tⱼ² − T²/(kn)   (j = 1..n), where

Tⱼ = Σᵢ xᵢⱼ is the sum of observations in the jth row,
T is the grand total of all observations, and

SSE = SST − SS(Tr) − SSB
We then get two F-values, one for the samples
(treatments) and one for the rows:

Fcolumns = MS(Tr)/MSE, where MS(Tr) = SS(Tr)/(k − 1), and
Frows = MSB/MSE, where MSB = SSB/(n − 1) and MSE = SSE/((k − 1)(n − 1)).

The critical F-values are then calculated using k − 1 and
(k − 1)(n − 1) degrees of freedom for Fcolumns, and n − 1 and
(k − 1)(n − 1) degrees of freedom for Frows. We then check the
observed F-values against the critical F-values and reject
the null hypothesis if the observed F-value is larger than
the critical one.
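A sketch of the 2-way computation with made-up data. Here the k = 3 samples (treatments) sit in the columns and the n = 4 rows might be depths, matching the layout described above; the formulas are the ones given for SSB and SSE, with scipy.stats.f.ppf supplying the critical values:

```python
import numpy as np
from scipy import stats

# Made-up data: k = 3 samples in the columns, n = 4 rows (e.g. depths).
X = np.array([[2.0, 4.0, 2.2],
              [3.0, 5.0, 3.1],
              [2.5, 4.5, 2.6],
              [3.5, 5.5, 3.4]])
n, k = X.shape

T = X.sum()                  # grand total
Tj = X.sum(axis=1)           # row totals T_j
grand_mean = X.mean()
col_means = X.mean(axis=0)   # one mean per sample

SST = ((X - grand_mean) ** 2).sum()
SSTr = n * ((col_means - grand_mean) ** 2).sum()
SSB = (Tj ** 2).sum() / k - T ** 2 / (k * n)   # SSB = (1/k) sum T_j^2 - T^2/(kn)
SSE = SST - SSTr - SSB                         # SSE = SST - SS(Tr) - SSB

MSE = SSE / ((k - 1) * (n - 1))
F_cols = (SSTr / (k - 1)) / MSE    # treatments: MS(Tr)/MSE
F_rows = (SSB / (n - 1)) / MSE     # rows: MSB/MSE
Fc_crit = stats.f.ppf(0.95, k - 1, (k - 1) * (n - 1))
Fr_crit = stats.f.ppf(0.95, n - 1, (k - 1) * (n - 1))
print(F_cols > Fc_crit, F_rows > Fr_crit)
```

With these numbers both effects are significant: the samples differ, and the rows track each other closely enough that the row (depth) effect also stands out from the residual scatter.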
EXAMPLE: ANOVA2EX.html