Download Discrete Joint Distributions

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

German tank problem wikipedia , lookup

Expectation–maximization algorithm wikipedia , lookup

Data assimilation wikipedia , lookup

Choice modelling wikipedia , lookup

Regression toward the mean wikipedia , lookup

Least squares wikipedia , lookup

Coefficient of determination wikipedia , lookup

Regression analysis wikipedia , lookup

Linear regression wikipedia , lookup

Transcript
BIOINF 2118
ANOVA, logistic regression, discriminant analysis, survival analysis p. 1 of 4
ANOVA: Analysis of variance
In “supervised learning” (regression analysis), when a predictor is a CATEGORY, we can replace it
by a series of indicator functions, then do linear regression as before.
See example: hotdogs.R
The “ANOVA” point of view : partitioning the sum of squares of deviations.
For a “one-way layout”, suppose there are n i observations in category i , i = 1,..., p .
Define
We compare the model in which all the individual means are the same:
H0: all
E(Yij ) = m
to the “omnibus” (all-direction) alternative
HA: not all E (Yij ) are the same.
Under either hypothesis,
Replacing the right-hand side by its mean
Under the null hypothesis,
and
.
, we get the estimate
2
sˆ y2 (within) = Swithin
/ (n - p).
.
2
sˆ y2 (between) = Sbetween
/ (p - 1). The two estimates should be similar.
2
But under the alternative, since the true means will differ from each other, Sbetween will be bigger.
So a good test statistic, whose distribution is exactly known when data are normal, is
.
This is a ratio of independent chisquare statistics, so it’s an Fp-1,n-p random variate under H0.
BIOINF 2118
ANOVA, logistic regression, discriminant analysis, survival analysis p. 2 of 4
Logistic regression
When the target (dependent variable) is BINARY, the regression methods we’ve discussed don’t work
well. Instead we use LOGISTIC models:
where “logit” means log odds:
ez
. Its inverse is antilogit( z ) =
.
1+ ez
The likelihood function is
Maximizing the likelihood usually requires an iterative algorithm.
Once we have a model fit, say we can predict for future observations:
.
Logistic regression is a special case of GENERALIZED LINEAR MODELS or GLMs:
where g is called a LINK FUNCTION and the distribution of Y is known given its expectation:
Y | E (Y ) ~ FE (Y ) .
For logistic regression, the link function is logit and the distribution is Bernoulli(E(Y)).
See prisonersPicnic.R for examples. A typical call in R would be:
glm( formula = S.sick ~ E.ate + D.drank ,
family=binomial, weights=counts).
A 'family' object in R contains both the error distribution and the link function.
For glm() in R, "error distribution" signifies how the variance changes with the mean.
For the 'binomial' family, variance = mean*(1-mean).
Each 'family' has a default link function. For the 'binomial', the default link function is logit.
BIOINF 2118
ANOVA, logistic regression, discriminant analysis, survival analysis p. 3 of 4
Poisson regression
Another example of generalized linear models relates a linear predictor to counts.
The natural link function g is the logarithm:
.
A typical call in R would be:
glm( formula = counts ~ S.sick + E.ate + D.drank, family = poisson ).
(In the prisoners' picnic logistic regression and poisson regression are nearly identical.)
For the 'poisson' family, the variance function is variance = mean.
The default link function is log.
An important observation about logistic regression and discriminant analysis
æ Y ö æ Y
The data are: ç 1 ÷ ... ç n
çè X 1 ÷ø çè X n
ö
÷ where the Y are group membership indicators.
÷ø
In logistic regression, the Y are considered the “targets” or “dependent variables”.
We condition on the X’s, the” predictors” or “independent variables”.
But some techniques (like discriminant analysis) involve modeling the joint distribution of the X’s with
the Y’s. This has the effect of stabilizing estimates. In contrast, logistic regression models only the
conditional distribution Y|X. See Hastie, Tibshirani, Friedman Elements of Statistical Learning .
Compare logistic regression (LR)
ˆLR  arg max   log[Yi | X i ]
i
with linear discriminant analysis (LDA):
b̂ LDA = arg max b å log[Yi , X i ]
i
æ
ö
= arg max b ç å log[Yi | X i ] + å log[ X i ]÷ .
è i
ø
i
So LDA’s modeling of [X] is like a regularizer, as if there is a penalty function
.
For example, if { X i
(1) log[ X i ]
i
: Yi  0} and { X i : Yi  1} are perfectly separated by a hyperplane, then the LR
MLE’s will go to infinity (overfitting), while the LDA MLEs will not.
LR conditions on X. LDA models [X]. Since two Gaussian point clouds always extend to infinity and
therefore interpenetrate, perfect (over-)fitting can never be achieved by LDA. Which is good.
BIOINF 2118
ANOVA, logistic regression, discriminant analysis, survival analysis p. 4 of 4
Proportional hazards regression
Time-to-event data comes in two types:
The event has happened.
The event has not yet happened.
You could just regard the outcome as binary, but that discards all the time information. Instead we
form a likelihood function that uses all the data correctly
“complete”
“complete”
“censored”
“censored”
“censored”
Data freeze: 'administrative censoring"
Likelihood terms
To do this correctly require special “survival analysis” methods.

Cox proportional hazards regression:
Define the hazard function to be
, interpreted as the probability density
of “failing” (event happening) at t, given that the event hasn’t happened yet.
The proportional hazards assumption is:
,
where there is an unknown “baseline hazard function” h0 .
Other methods common in survival analysis are:

The Kaplan-Meier estimator, which estimates the c.d.f. of the event time nonparametrically.
(Actually it estimates the “survival function”, defined as 1 – c.d.f..)

The log-rank test, for testing whether two survival functions are the same or different. This is
useful for example, in a randomized clinical trial, when testing if two medicines differ in the time
to death, time to relapse, or time to some other event.``