Download BOULDER WORKSHOP STATISTICS REVIEWED: LIKELIHOOD …

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Bootstrapping (statistics) wikipedia , lookup

Degrees of freedom (statistics) wikipedia , lookup

Foundations of statistics wikipedia , lookup

Expectation–maximization algorithm wikipedia , lookup

Time series wikipedia , lookup

Transcript
BOULDER WORKSHOP
STATISTICS REVIEWED:
LIKELIHOOD MODELS
Andrew C. Heath
PRE-HISTORY (STATISTICS 101)
Binomial distribution
– gives the probabilities of various possible
numbers of ‘successful’ outcomes in a fixed number of discrete trials,
where all trials have the same probability of success.
Probability that X is equal to a particular value x (x= 0, 1, 2…n) is given
by
 n x
P  X  x     p (1  p ) n x
 x
Useful for genetics (e.g. transmission versus non-transmission of an
allele)!
LIKELIHOOD
Focus on probability (“likelihood”) of the data, as a function of model
parameters.
E.g. Sham (1998) uses Neel & Schull (1954) data on opalescent dentine in
random sample of 112 offspring of an affected parent, found 52 affected, 60
normal. Compatible with hypothesis of rare autosomal dominant? Does the
observed population (.464) differ from expected proportion of 0.5?
Likelihood function for segregation ratio p is
 n r
L p     p (1  p)n r
r
MAXIMUM LIKELIHOOD ESTIMATION
= Find the maximum value of the likelihood function in the range 0  p 1. In
more difficult problems, usual to maximize the log-likelihood, since
computationally more convenient, & this also maximizes the likelihood.

p
In this simple case, maximum likelihood estimate (MLE) of p, = r/n.
For the opalescent dentine data, ignoring the constant term involving n, r
which does not vary as a function of p, log-likelihood function is
ln L(p) = 52 ln(p) + 60 ln(1–p)
Figure 2.1 Log-likelihood function of the segregation ratio for the opalescent
dentine data (from Sham, 1998)
p
LIKELIHOOD-RATIO STATISTIC
Likelihood ratio statistic: twice the difference between the log likelihood of the

data at the MLE (i.e. p ), L1, and the log likelihood of the data at the
hypothesized value of 0.5, L0: 2(ln L1 – ln L0).
For the segregation ratio example,

2ln L1  ln L0   2 r  ln r  n  r ln nr  n  ln 1
n
n
2

= 0.57 for the opalescent dentine example.
Likelihood-ratio statistic in this case is distributed as chi-square on one degree of
freedom, hence non-significant.
MATRIX ALGEBRA BASICS
A–1
Inverse of A
AT or A'
Transpose of A
|A|
Determinant of A
A  B
rxc cxr
A postmultiplied by B:
conformable for multiplication
since number of columns
of A = number of rows of B.
Resulting matrix is r x r.
Trace of matrix A
Tr (A)
HISTORY
(MX INTRODUCTORY WORKSHOP)
Maximum-likelihood estimation using linear covariance structure models, e.g. fitting
models to twin data:
Let p be the number of observed variables, the expected covariance matrix be E, and the
expected vector of means be , where E and  are functions of q free parameters to
be estimated from the data. Let x1, x2…xn denote to observed variables. Assuming
that the observed variables follow a multivariate normal distribution, the loglikelihood of the observed data is given by
n
ln L1   n ln2 E   1  xi  μ T E1 xi  μ 
2
2 i 1
This is the formula used for maximum likelihood model-fitting to raw continuous
data, assuming a multivariate normal distribution. (Often, 2 ln L, is estimated).
Requires that we provide:
(a) model for expected covariance matrix – e.g. in terms of additive genetic, shared
and non-shared environmental variance components – that will vary as a
function of relationship;
(b) model for expected means – in simplest applications we might estimate a separate
mean that might differ by gender, or possibly by twin pair zygosity.
EXAMPLE:
MZ Pairs:
VA  VC  VE

VA  VC

E=
 = [ m , m] T
DZ Pairs:
E=
=
 VA  VC  VE
 1

 2 VA  VC
VA  VC

VA  VC  VE 

1V
A
2


 VE 

 VC
VA  VC
[ m , m]T
Where m is estimate of population mean, VA, VC, VE and are additive genetic,
shared environmental and non-shared environmental variances, all estimated
jointly from the data.
Compare e.g. ln L1 for VA VC VE model
ln L0 for VC VE model
2(ln L1 – ln L0) distributed as chi-square on one degree of freedom as before.
HISTORY
(MX INTRODUCTORY WORKSHOP) - II
In practice most applications in the MX Introductory Workshop fitted models to summary covariance
matrices.
We can simplify (e.g. Sham, p. 238):
n
(1)
x  μ T E 1 xi  μ 
ln L1

 n ln2 E   1 
i 1 i
2
2
(2)

 n ln2   ln E  m  μ T E 1 m  μ   tr E 1S 
2


Where m is the vector of sample means of the p observed variables, S is the observed covariance
matrix. For the simple applications in the Introductory Workshop, we had no predictions for the means
structure, so we can saturate that component of the model (i.e. estimate a separate mean for every
observed mean), equivalent to deleting the term (m)T E1 (m) in (2). Thus the log-likelihood of
the observed data becomes
(3)
ln L1   n ln2   ln E  tr E 1S
2



Under a saturated model, which equates every element of E to the corresponding element of S (i.e. a
perfect fit model) we have for the log-likelihood
ln L0



 n ln2
2
 n ln2
2

  ln S  tr S 1S
  ln S  p 

HISTORY
(MX INTRODUCTORY WORKSHOP) - II
Thus the likelihood-radio test of the fitted model against the saturated model becomes



2ln L0  ln L1   n ln E  ln S  tr E1S  p
For multiple group problem, sum over groups.

(4)
Analysis of Australian BMI data-young female MZ twins pairs-MX DYI version
Analysis of Australian BMI data-young female MZ twins pairs-MX DIY version
!
DA NG=1 NI=2 NO=0
begin matrices;
A LO 1 1 FR ! Additive genetic variance
C LO 1 1 FR ! Shared environmental variance
E LO 1 1 FR ! Non-shared environmental variance
M FU 2 2 ! This will be observed MZ covariance matrix
D FU 2 2 ! This will be observed DZ covariance matrix
g fu 1 1 ! coefficient of 0.5 for DZ pairs
n fu 1 1 ! sample size for MZ pairs (female in this illustration)
k fu 1 1 ! sample size for DZ pairs (female in this illustration)
p fu 1 1 ! order of matrices (i.e. number of variables =2 in this case)
end matrices;
mat g 0.5
mat p 2
mat m
0.7247 0.5891
0.5891 0.7915
mat n 532
mat d
0.7786 0.2461
0.2461 0.8365
mat k 326
mat a 0.25
mat c 0.25
mat e 0.25
Analysis of Australian BMI data-young female MZ twins pairs-MX DYI version (ctd)
BEGIN ALGEBRA;
t=n+k; ! total sample size
U=(A+C+E | A+C _ A+C | A+C+E); ! Expected MZ covariance matrix
V=(A+C+E | g*A+C _ g*A+C | A+C+E); ! Expected DZ covariance matrix
H=n*(\ln (\det(U))-\ln (\det(M)) + \tr((U~ *M))-p); ! fit-function for MZ group
J=k*(\ln (\det(V))-\ln (\det(D)) + \tr((V~ *D))-p); ! fit-function for DZ group
F=h+j;
END ALGEBRA;
bo 0.01 1.0 e(1,1)
bo 0.0 1.0 c(1,1) a(1,1)
CO F ;
option user df=6
end
PRE-HISTORY (STATISTICS 101) - II
LINEAR REGRESSION: requires weaker assumptions than linear
covariance structure models. Does not assume multivariate normal
distribution, only homoscedastic residuals. Flexible for handling selective
sampling schemes where we oversample extreme values of predictor
variable(s).
We can fit linear regression models by maximum likelihood e.g. using MX.
HISTORY!
MX INTRODUCTORY WORKSHOP - III
“Definition variables” – an option in MX when fitting to raw data,
which allows us to model effects of some variables as fixed effects, modeling
their contribution to expected means.
Simple example: controlling for linear or polynomial regression of a quantitative
measure on age. Don’t want to model covariance structure with age (which
probably has rectangular distribution!)
Definition variables = variables whose values may vary from individual to
individual, that can be read into matrices
Important example: genotypes at a given locus or set of loci.