GG313 Lecture 19, Nov 1 2005: Regression Summary and Anova Test

What are the major linear regression (line-fitting) algorithms and their properties? For the line $y = a + bx$, with $(X_i, Y_i)$ the point on the line corresponding to observation $(x_i, y_i)$:

Algorithm                               Minimizes
LSY: least squares, y on x              $\sum_i (y_i - a - b x_i)^2$
χ²: LSY with errors in y                $\sum_i \left[(y_i - a - b x_i)/\sigma_i\right]^2$
LSXY: complete orthogonal               $\sum_i \left[(x_i - X_i)^2 + (y_i - Y_i)^2\right]$
RMA: reduced major axis                 $\sum_i \left|(x_i - X_i)(y_i - Y_i)\right|$
LMS: least median of squares (robust)   $\mathrm{median}_i\,(y_i - a - b x_i)^2$

Algorithm   Use for                              Poor results with
LSY         normal data, slopes not too steep    outliers, steep slopes
χ²          data with known errors               (none listed)
LSXY        normal data, all slopes              outliers
RMA         normal data, all slopes              outliers
LMS         data with bad outliers               heavy groupings

[Figure: LMS and LSY lines fitted to data with groups and to data with outliers]

ANOVA TEST

The anova (analysis of variance) test is used to determine whether MANY samples come from the same population. Earlier we used the t-test to check whether two samples were from the same population; anova is used in a similar way to test the means of many samples, using the F-test.

We place the sample values in the columns of a matrix: each of the k columns is one sample, and each sample has n observations (rows). Element $x_{ij}$ is observation j of sample i.

What might these values be?
• densities at different depths in wells
• fossil measurements at different sites
• color values in a photograph
• manganese crust thickness vs. water depth at different sites

As before, we set up a hypothesis that the values at the sites are different, and a null hypothesis that they are the same. At our standard confidence level of 95%, we check whether the null hypothesis can be rejected; rejecting it implies that the values at the sites are not from the same population.
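The anova test ASSUMES that
1) the populations are normally distributed, and
2) the populations have the same variance, σ².

The test involves the calculation of several sums of squares. The total sum of squares splits into two parts:

$$\mathrm{SST} = \sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij}-\bar{x})^2 = n\sum_{i=1}^{k}(\bar{x}_i-\bar{x})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij}-\bar{x}_i)^2,$$

where $\bar{x}$ is the "grand mean" of all kn observations and $\bar{x}_i$ is the mean of sample i (column i). The two terms on the right are known as SS(Tr) and SSE:

$$\mathrm{SST} = \mathrm{SS(Tr)} + \mathrm{SSE} \qquad (4.65), (4.66)$$

SS(Tr) (the treatment sum of squares) is a measure of the variation of the sample means, and SSE (the error sum of squares) is a measure of the variation within samples.

The F-test statistic is then given by:

$$F = \frac{\text{estimate of } \sigma^2 \text{ from the variation of the } \bar{x}_i}{\text{estimate of } \sigma^2 \text{ from the variation within samples}} = \frac{\mathrm{SS(Tr)}/(k-1)}{\mathrm{SSE}/(k(n-1))} \qquad (4.68)$$

F can range from zero to large values. If the null hypothesis is true, the numerator and denominator both estimate σ², so F should be near 1 or smaller and the null hypothesis is plausible. If F is large, the null hypothesis is unlikely. We obtain our F comparison value (critical value) from an F table or from Matlab, using the confidence level we want (usually 95%, or equivalently a 5% significance level, depending on how the table is labeled) and the degrees of freedom k-1 (the number of samples minus 1) and k(n-1) (the total number of observations minus the number of samples).

EXAMPLE: ANOVAEX.html

2-WAY ANOVA

2-way anova is the same as the 1-way anova explained above, except that not only the columns (the means of the samples) are compared, but also the rows. For example, the means of the samples may differ while the means of the rows are statistically the same, such as where densities change with depth in a similar way in every well.

anova2 generates some new statistics, SSB and a new SSE:

$$\mathrm{SSB} = \frac{1}{k}\sum_{j=1}^{n} T_j^2 - \frac{1}{kn}T^2, \qquad T_j = \sum_{i=1}^{k} x_{ij},$$

where $T_j$ is the sum of the observations in the jth row and $T$ is the sum of all kn observations, and

$$\mathrm{SSE} = \mathrm{SST} - \mathrm{SS(Tr)} - \mathrm{SSB}.$$

We then get two F-values, one for the samples (treatments, the columns) and one for the rows:

$$F_{\text{columns}} = \frac{\mathrm{MS(Tr)}}{\mathrm{MSE}}, \qquad F_{\text{rows}} = \frac{\mathrm{MSB}}{\mathrm{MSE}},$$

where each mean square is a sum of squares divided by its degrees of freedom: $\mathrm{MS(Tr)} = \mathrm{SS(Tr)}/(k-1)$, $\mathrm{MSB} = \mathrm{SSB}/(n-1)$ and $\mathrm{MSE} = \mathrm{SSE}/((k-1)(n-1))$. The critical F-values are then found using k-1 and (k-1)(n-1) degrees of freedom for F_columns, and n-1 and (k-1)(n-1) degrees of freedom for F_rows.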
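To make the bookkeeping concrete, here is a minimal pure-Python sketch of the 1-way and 2-way sums of squares and F statistics defined above. It assumes the matrix layout described earlier, with `data[j][i]` holding observation j (row) of sample i (column), and one observation per cell in the 2-way case. The names `anova1` and `anova2` are hypothetical illustrations, not Matlab's `anova1`/`anova2` interface, and the data in the usage lines are made-up numbers.

```python
def anova1(data):
    """One-way anova on an n-by-k table: returns (SST, SS(Tr), SSE, F)."""
    n, k = len(data), len(data[0])          # n rows (observations), k columns (samples)
    grand_mean = sum(sum(row) for row in data) / (k * n)
    # x-bar_i: mean of sample i, i.e. of column i
    col_means = [sum(row[i] for row in data) / n for i in range(k)]
    sst = sum((x - grand_mean) ** 2 for row in data for x in row)
    ss_tr = n * sum((m - grand_mean) ** 2 for m in col_means)
    sse = sum((row[i] - col_means[i]) ** 2 for row in data for i in range(k))
    # F = [SS(Tr)/(k-1)] / [SSE/(k(n-1))]
    f = (ss_tr / (k - 1)) / (sse / (k * (n - 1)))
    return sst, ss_tr, sse, f


def anova2(data):
    """Two-way anova, one observation per cell: returns (F_columns, F_rows)."""
    n, k = len(data), len(data[0])
    sst, ss_tr, _, _ = anova1(data)
    total = sum(sum(row) for row in data)       # T: sum of all observations
    row_totals = [sum(row) for row in data]     # T_j: sum of row j
    ssb = sum(t * t for t in row_totals) / k - total ** 2 / (k * n)
    sse = sst - ss_tr - ssb                     # the new, smaller SSE
    mse = sse / ((k - 1) * (n - 1))
    f_columns = (ss_tr / (k - 1)) / mse         # MS(Tr) / MSE
    f_rows = (ssb / (n - 1)) / mse              # MSB / MSE
    return f_columns, f_rows


if __name__ == "__main__":
    # Made-up 3 x 3 table: 3 samples (columns), 3 observations each (rows)
    data = [[1, 2, 6],
            [3, 5, 7],
            [2, 5, 5]]
    print(anova1(data))   # SST = 34, SS(Tr) = 24, SSE = 10, F about 7.2
    print(anova2(data))   # (12.0, 3.0)
```

Note how removing the row effect (SSB = 6 here) from the 1-way SSE shrinks the error term, which is why F_columns in the 2-way test is larger than the 1-way F for the same table.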
We then compare each observed F-value with its critical F-value and reject the corresponding null hypothesis if the observed F-value is larger than the critical value.

EXAMPLE: ANOVA2EX.html
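The lecture takes critical values from an F table or from Matlab. As a stand-in that needs only the Python standard library, the sketch below evaluates the F distribution's CDF through the regularized incomplete beta function, using a continued-fraction evaluation in the style of Numerical Recipes' `betacf` (an assumption, not the lecture's method), and inverts it by bisection to get the critical value. All function names are hypothetical; in practice a library routine such as Matlab's `finv` or SciPy's `scipy.stats.f.ppf` is the usual choice.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_cdf(x, d1, d2):
    """P(F <= x) for an F distribution with d1 and d2 degrees of freedom."""
    if x <= 0.0:
        return 0.0
    return reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * x / (d1 * x + d2))


def f_critical(p, d1, d2, tol=1e-10):
    """Critical F value: the x with f_cdf(x, d1, d2) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < p:        # grow the bracket until it contains x
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def reject_null(f_observed, d1, d2, confidence=0.95):
    """Reject the null hypothesis when the observed F exceeds the critical F."""
    return f_observed > f_critical(confidence, d1, d2)


if __name__ == "__main__":
    print(round(f_critical(0.95, 2, 4), 3))   # 6.944
    print(reject_null(12.0, 2, 4))            # True: 12.0 > 6.944
    print(reject_null(3.0, 2, 4))             # False: 3.0 < 6.944
```

Bisection is chosen for simplicity: the CDF increases monotonically in x, so once the bracket contains the target probability the search always converges, at the cost of a few dozen CDF evaluations.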