LECTURE 3: LINEAR REGRESSION & CORRELATION

Parametric hypothesis tests (cont.); linear regression and correlation

Supplementary Readings:
Wilks, chapters 5, 6
Bevington, P.R., Robinson, D.K., Data Reduction and Error Analysis for the Physical Sciences, McGraw-Hill, 1992.

t test (one sample)

Consider a Gaussian sample of data of size $n$. We're interested in whether or not the data are consistent with a population with some specified mean $\mu$. Under the assumption of Gaussian statistics, the statistic

$$t = \frac{\bar{x} - \mu}{\left[\mathrm{var}(\bar{x})\right]^{1/2}}, \qquad \mathrm{var}(\bar{x}) = \frac{s^2}{n}$$

follows a "t" distribution with $\nu = n - 1$ degrees of freedom. The test can be either one- or two-sided, depending on the situation at hand. Note: $t$ approaches $Z$ for large $n$.

t test (two sample)

Consider two independent Gaussian samples of data of size $n_1$ and $n_2$. We're interested in whether or not they appear to have been drawn from parent distributions with different means. The standard deviation of the difference between the means under the assumption of Gaussian statistics is:

$$\left[\mathrm{var}(\bar{x}_1 - \bar{x}_2)\right]^{1/2} = \left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^{1/2}$$

This motivates a modified form of the t statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{1/2}}$$

($\nu = n_1 + n_2 - 2$ for $\mathrm{var}_1 = \mathrm{var}_2$)

F test

What if we're interested in whether or not two subsets of the data appear to have been drawn from parent distributions with different variances?
$$F = \frac{\chi_1^2/\nu_1}{\chi_2^2/\nu_2}$$

[Figure: F distribution]

Note that the confidence levels associated with the F test and t test depend on the assumption of some nominal number ($N$) of degrees of freedom. Frequently the true number of degrees of freedom $N'$ is less than $N$ owing to serial correlation. We will discuss how to correct for this later.

Recall Chi-Squared

For the model of a constant mean process ($y = \mu$), the joint probability of $N$ independent Gaussian observations is

$$P(1,\ldots,N) = \prod_{i=1}^{N} P_i = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{N} \exp\left[-\frac{1}{2}\sum_{i=1}^{N}\left(\frac{y_i - \mu}{\sigma}\right)^2\right]$$

with

$$\chi^2 = \sum_{i=1}^{N}\left(\frac{y_i - \mu}{\sigma}\right)^2$$

Suppose that we're not interested in the model of a constant mean process ($y = \mu$), but rather a process that has a linear trend ($y = a + bx$). What is the appropriately modified chi-squared?

$$\chi^2 = \sum_{i=1}^{N}\left(\frac{y_i - a - b x_i}{\sigma}\right)^2$$

$y$: "dependent variable"
$x$: "independent variable"

Recall Method of Maximum Likelihood

The most probable value for the statistic of interest is given by the peak value of the joint probability distribution. It is easiest to work with the log-likelihood function:

$$L(\mu,\sigma) = -N\ln\sigma - N\ln\sqrt{2\pi} - \frac{1}{2}\sum_{i}\left(\frac{y_i - \mu}{\sigma}\right)^2$$

For the model of a constant mean, we want to maximize $L$ relative to $\mu$:

$$\frac{\partial L(\mu,\sigma)}{\partial \mu} = 0$$

Method of Maximum Likelihood (linear trend)

Again, the most probable value for the statistic of interest is given by the peak value of the joint probability distribution.
Easiest to work with the log-likelihood function:

$$L(a,b) = -N\ln\sigma - N\ln\sqrt{2\pi} - \frac{1}{2}\sum_{i}\left(\frac{y_i - a - b x_i}{\sigma}\right)^2$$

For the model of a linear trend, we want to maximize $L$ relative to $a$ and $b$:

$$\frac{\partial L(a,b)}{\partial a} = 0, \qquad \frac{\partial L(a,b)}{\partial b} = 0$$

Linear Regression

This amounts to the minimization of chi-squared,

$$\chi^2(a,b) = \sum_{i=1}^{n}\left(\frac{y_i - a - b x_i}{\sigma}\right)^2, \qquad \frac{\partial \chi^2(a,b)}{\partial a} = 0, \qquad \frac{\partial \chi^2(a,b)}{\partial b} = 0$$

Setting the derivative with respect to $a$ to zero:

$$\sum_i (y_i - a - b x_i) = 0 \;\Rightarrow\; \sum_i y_i = n a + b \sum_i x_i \;\Rightarrow\; \bar{y} = a + b\bar{x}$$

Setting the derivative with respect to $b$ to zero:

$$\sum_i (y_i - a - b x_i)\,x_i = 0 \;\Rightarrow\; \sum_i x_i y_i = a \sum_i x_i + b \sum_i x_i^2$$

We can write this as a matrix equation,

$$\begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum_i y_i \\ \sum_i x_i y_i \end{pmatrix}$$

The solution is:

$$b = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}, \qquad a = \bar{y} - b\bar{x}$$

If $\bar{x} = \bar{y} = 0$ we have:

$$b = \frac{\sum_i x_i y_i}{\sum_i x_i^2}, \qquad a = 0$$

Linear Correlation

For $\bar{x} = \bar{y} = 0$, the linear correlation coefficient and the regression slope are related by

$$r = \frac{1}{n}\,\frac{\sum_i x_i y_i}{s_x s_y}, \qquad b = r\,\frac{s_y}{s_x}$$

Determination of Trend

What if the independent variable ("x") is time? Then the slope $b$ is the trend in $y$.

Residuals

Define:

$$\epsilon_i = y_i - a - b x_i$$

We call these residuals. What should we require of them? That they be Gaussian.

Chi-Squared

For Gaussian residuals, the quantity

$$x = \chi^2 = \sum_{i=1}^{N}\left(\frac{y_i - a - b x_i}{\sigma}\right)^2$$

follows the chi-squared distribution

$$P(x) = \frac{x^{(N-2)/2}\exp(-x/2)}{\Gamma(N/2)\,2^{N/2}}$$

[Figure: $\chi^2$ distribution ($\nu = 5$) for Gaussian data, with mean $\mu = \nu$ and variance $\sigma^2 = 2\nu$]

This is an important feature in the Analysis of Variance ("ANOVA").
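
The regression formulas above can be checked numerically. What follows is a minimal sketch, not part of the original lecture: the function names (`fit_line`, `std`, `correlation`, `residuals`) and the five data points are made up for illustration, and every point is assumed to have the same uncertainty σ, so σ drops out of the normal equations. The standard deviations use the 1/n normalization so that b = r s_y / s_x holds exactly.

```python
import math

def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x (equal sigma assumed)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    denom = n * sxx - sx * sx          # n*sum(x^2) - (sum x)^2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = (n * sxy - sx * sy) / denom    # slope from the normal equations
    a = sy / n - b * sx / n            # intercept: a = ybar - b*xbar
    return a, b

def std(v):
    """Standard deviation with 1/n normalization (an assumption of this sketch)."""
    n = len(v)
    vbar = sum(v) / n
    return math.sqrt(sum((vi - vbar) ** 2 for vi in v) / n)

def correlation(x, y):
    """Linear correlation coefficient r, computed from deviations about the means."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
    return cov / (std(x) * std(y))

def residuals(x, y, a, b):
    """Residuals eps_i = y_i - a - b*x_i."""
    return [yi - a - b * xi for xi, yi in zip(x, y)]

# Made-up data for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]

a, b = fit_line(x, y)
r = correlation(x, y)
eps = residuals(x, y, a, b)

print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.4f}")
print("slope from r*s_y/s_x:", r * std(y) / std(x))
print("sum of residuals:", sum(eps))
```

By construction, the two normal equations force the residuals to sum to zero and to be uncorrelated with x, so the last printed value should be zero to within rounding. Whether the residuals are also Gaussian is a separate question that the fit itself does not answer.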