BIOINF 2118: ANOVA, logistic regression, discriminant analysis, survival analysis

ANOVA: Analysis of variance

In "supervised learning" (regression analysis), when a predictor is a CATEGORY, we can replace it by a series of indicator functions, then do linear regression as before. See example: hotdogs.R.

The "ANOVA" point of view: partitioning the sum of squares of deviations.

For a "one-way layout", suppose there are n_i observations Y_{ij}, j = 1, ..., n_i, in category i, for i = 1, ..., p, with n = \sum_i n_i observations in all. Define the group means, the grand mean, and the within- and between-group sums of squares:

  \bar{Y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}, \qquad
  \bar{Y} = \frac{1}{n} \sum_{i=1}^{p} \sum_{j=1}^{n_i} Y_{ij},

  S_{within} = \sum_{i=1}^{p} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2, \qquad
  S_{between} = \sum_{i=1}^{p} n_i (\bar{Y}_i - \bar{Y})^2.

We compare the model in which all the individual means are the same,

  H_0: E(Y_{ij}) = \mu \text{ for all } i, j,

to the "omnibus" (all-direction) alternative

  H_A: not all E(Y_{ij}) are the same.

Under either hypothesis, E(S_{within}) = (n - p)\sigma_y^2; replacing S_within by its mean and solving gives the estimate σ̂_y²(within) = S_within / (n - p). Under the null hypothesis, E(S_{between}) = (p - 1)\sigma_y^2 as well, giving a second estimate σ̂_y²(between) = S_between / (p - 1), and the two estimates should be similar. But under the alternative, since the true means differ from each other, S_between will be bigger. So a good test statistic, whose distribution is exactly known when the data are normal, is

  F = \frac{S_{between}/(p-1)}{S_{within}/(n-p)}.

This is a ratio of independent chi-square statistics, each divided by its degrees of freedom, so it is an F_{p-1, n-p} random variate under H_0.

Logistic regression

When the target (dependent variable) is BINARY, the regression methods we've discussed don't work well. Instead we use LOGISTIC models:

  logit(P(Y_i = 1 | X_i)) = \beta' X_i,

where "logit" means log odds,

  logit(p) = \log \frac{p}{1-p}, \qquad \text{with inverse} \qquad antilogit(z) = \frac{e^z}{1 + e^z}.

The likelihood function is

  L(\beta) = \prod_{i=1}^{n} p_i^{Y_i} (1 - p_i)^{1 - Y_i}, \qquad p_i = antilogit(\beta' X_i).

Maximizing the likelihood usually requires an iterative algorithm. Once we have a model fit, say \hat\beta, we can predict for future observations:

  \hat{P}(Y = 1 | X) = antilogit(\hat\beta' X).

Logistic regression is a special case of GENERALIZED LINEAR MODELS or GLMs:

  g(E(Y)) = \beta' X,

where g is called a LINK FUNCTION and the distribution of Y is known given its expectation: Y | E(Y) ~ F_{E(Y)}. For logistic regression, the link function is logit and the distribution is Bernoulli(E(Y)).

See prisonersPicnic.R for examples. A typical call in R would be:

  glm(formula = S.sick ~ E.ate + D.drank, family = binomial, weights = counts)

A 'family' object in R contains both the error distribution and the link function. For glm() in R, the "error distribution" specifies how the variance changes with the mean: for the 'binomial' family, variance = mean * (1 - mean). Each 'family' has a default link function; for the 'binomial' family, the default link is logit.

Poisson regression

Another example of a generalized linear model relates a linear predictor to counts. The natural link function g is the logarithm:

  \log(E(Y)) = \beta' X.

A typical call in R would be:

  glm(formula = counts ~ S.sick + E.ate + D.drank, family = poisson)

(In the prisoners' picnic example, logistic regression and Poisson regression give nearly identical results.) For the 'poisson' family, the variance function is variance = mean, and the default link function is log.
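Putting the two together, here is a minimal runnable sketch of both calls. The prisonersPicnic.R data set isn't reproduced in these notes, so the data frame below is hypothetical (the counts are made up purely for illustration):

  # Hypothetical grouped data: one row per (sick, ate, drank) combination.
  picnic <- expand.grid(S.sick = c(0, 1), E.ate = c(0, 1), D.drank = c(0, 1))
  picnic$counts <- c(20, 2, 8, 15, 12, 3, 5, 25)  # made-up cell counts

  # Logistic regression: binary target, 'binomial' family, default logit link.
  # With a 0/1 response, weights = counts says each row represents that many people.
  fit.lr <- glm(S.sick ~ E.ate + D.drank, family = binomial,
                weights = counts, data = picnic)
  summary(fit.lr)

  # Predict for a future observation: antilogit(beta-hat' x).
  predict(fit.lr, newdata = data.frame(E.ate = 1, D.drank = 0), type = "response")

  # Poisson regression: model the cell counts themselves, default log link.
  fit.pois <- glm(counts ~ S.sick + E.ate + D.drank, family = poisson, data = picnic)
  summary(fit.pois)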
An important observation about logistic regression and discriminant analysis

The data are pairs

  \binom{Y_1}{X_1}, ..., \binom{Y_n}{X_n},

where the Y_i are group-membership indicators. In logistic regression, the Y are considered the "targets" or "dependent variables"; we condition on the X's, the "predictors" or "independent variables". But some techniques (like discriminant analysis) involve modeling the joint distribution of the X's with the Y's. This has the effect of stabilizing estimates. In contrast, logistic regression models only the conditional distribution Y | X.

See Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning. Compare logistic regression (LR),

  \hat\beta_{LR} = \arg\max_\beta \sum_i \log[Y_i | X_i],

with linear discriminant analysis (LDA):

  \hat\beta_{LDA} = \arg\max_\beta \sum_i \log[Y_i, X_i]
                  = \arg\max_\beta \Big( \sum_i \log[Y_i | X_i] + \sum_i \log[X_i] \Big).

So LDA's modeling of [X] is like a regularizer, as if there were a penalty function \sum_i \log[X_i]. For example, if {X_i : Y_i = 0} and {X_i : Y_i = 1} are perfectly separated by a hyperplane, then the LR MLEs will go off to infinity (overfitting), while the LDA MLEs will not. LR conditions on X; LDA models [X]. Since two Gaussian point clouds always extend to infinity and therefore interpenetrate, perfect (over-)fitting can never be achieved by LDA. Which is good.

Proportional hazards regression

Time-to-event data come in two types: either the event has happened ("complete" observations) or it has not yet happened ("censored" observations). You could just regard the outcome as binary, but that discards all the time information. Instead we form a likelihood function that uses all the data correctly: a complete observation contributes its density f(t_i), while a censored observation contributes only the probability of surviving past its censoring time.

[Figure: follow-up timelines for five subjects, two "complete" and three "censored", cut off at a data-freeze line ("administrative censoring"), with the corresponding likelihood terms.]

Doing this correctly requires special "survival analysis" methods.

Cox proportional hazards regression: define the hazard function to be

  h(t) = \frac{f(t)}{1 - F(t)},

interpreted as the probability density of "failing" (the event happening) at time t, given that the event hasn't happened yet. The proportional hazards assumption is

  h(t | X) = h_0(t) \exp(\beta' X),

where h_0 is an unknown "baseline hazard function".

Other methods common in survival analysis are (see the sketch below):

The Kaplan-Meier estimator, which estimates the c.d.f. of the event time nonparametrically. (Actually it estimates the "survival function", defined as 1 - c.d.f.)

The log-rank test, for testing whether two survival functions are the same or different. This is useful, for example, in a randomized clinical trial, for testing whether two medicines differ in time to death, time to relapse, or time to some other event.
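All three methods are implemented in R's survival package. Here is a minimal sketch on simulated data; the simulation and variable names are illustrative, not taken from the course examples:

  library(survival)

  set.seed(1)
  n <- 100
  group  <- rbinom(n, 1, 0.5)                 # e.g., treatment arm in a trial
  event  <- rexp(n, rate = exp(0.7 * group))  # true event times; higher hazard in group 1
  censor <- runif(n, 0, 2)                    # data freeze: administrative censoring
  time   <- pmin(event, censor)               # observed follow-up time
  status <- as.numeric(event <= censor)       # 1 = "complete", 0 = "censored"

  # Cox proportional hazards: h(t|X) = h0(t) exp(beta'X), with h0 left unspecified.
  fit.cox <- coxph(Surv(time, status) ~ group)
  summary(fit.cox)

  # Kaplan-Meier estimate of the survival function S(t) = 1 - F(t), by group.
  fit.km <- survfit(Surv(time, status) ~ group)
  plot(fit.km)

  # Log-rank test: are the two survival functions the same?
  survdiff(Surv(time, status) ~ group)

Here the simulation uses β = 0.7, so summary(fit.cox) should report a coefficient near 0.7 and a hazard ratio near exp(0.7) ≈ 2.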