Bayesian integration of genomics data
Martijn A. Huynen
CMBI, Radboud University Medical Centre
The cilium, a eukaryotic organelle
Identifying novel ciliary genes using a Bayesian classifier
Data types integrated: proteomics data, shared transcription factors, published datasets, expression data, evolutionary data.

$$\mathrm{GeneScore} = \underbrace{\log_2 \frac{P(\mathrm{Cilium})}{P(\neg\mathrm{Cilium})}}_{\text{prior}} + \sum_{i=1}^{n} \log_2 \frac{P(f_i \mid \mathrm{True})}{P(f_i \mid \mathrm{False})}$$
Compare: Li et al., Cell 2004; Avidor-Reiss et al., Cell 2004; Cildb
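As an illustration of how such a score adds up, here is a minimal Python sketch. The function name `gene_score`, the prior of 0.05, and the per-feature probabilities are all hypothetical values chosen for the example; they are not taken from the slides.

```python
import math

def gene_score(prior_cilium, feature_likelihoods):
    """Log2 prior odds plus the sum of log2 likelihood ratios.

    prior_cilium: assumed prior probability that a gene is ciliary.
    feature_likelihoods: list of (P(f_i | True), P(f_i | False)) pairs,
        one pair per data type observed for this gene.
    """
    prior_term = math.log2(prior_cilium / (1.0 - prior_cilium))
    evidence = sum(math.log2(p_true / p_false)
                   for p_true, p_false in feature_likelihoods)
    return prior_term + evidence

# Hypothetical gene with two supporting features (made-up numbers).
score = gene_score(0.05, [(0.40, 0.05), (0.30, 0.10)])
print(f"GeneScore = {score:.2f}")
```

A positive score means the evidence outweighs the low prior; a negative score means it does not.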
The main trick in this kind of Bayesian analysis

$$P(M_j \mid D) = \frac{P(D \mid M_j)\, P(M_j)}{\sum_{r=1}^{N} P(D \mid M_r)\, P(M_r)}$$
The probability of the model given the data (is a gene ciliary?) is the
probability of the data given the model (e.g. which fraction of
ciliary proteins interact with other ciliary proteins) times the prior
probability that a protein is ciliary, divided by the sum over all
possible models of the probability of the data given each model times
the prior probability of that model.
The “prior” probability of a gene being ciliary is the fraction of
genes that are ciliary (the number of ciliary genes divided by the
total number of genes).
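A minimal Python sketch of Bayes' theorem over a set of competing models follows. The function name `posterior` and the numbers (a 5% prior, likelihoods of 0.6 and 0.1) are assumptions made for the example only.

```python
def posterior(likelihoods, priors):
    """Return P(M_j | D) for every model j.

    likelihoods: P(D | M_j) for each model.
    priors: P(M_j) for each model; should sum to 1.
    """
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    total = sum(joint)  # the denominator: sum over all models
    return [j / total for j in joint]

# Two models: M1 = ciliary, M2 = not ciliary (made-up numbers).
p_ciliary, p_not_ciliary = posterior([0.6, 0.1], [0.05, 0.95])
print(p_ciliary, p_not_ciliary)
```

Because the denominator is the same for every model, the posteriors always sum to 1, and comparing two models only requires the numerators.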
In the simplest example we compare just two models (being ciliary or
not being ciliary), which reduces the calculation to a (log) odds ratio:
the likelihood that ciliary proteins, e.g., interact with ciliary
proteins, compared to the likelihood that non-ciliary proteins interact
with ciliary proteins. We can also combine this with other data (e.g.
co-evolution data) by adding up the logarithms of the ratios
(effectively multiplying the ratios), together with the logarithm of
the prior odds of the model (the number of genes that are ciliary
divided by the number that are not).

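$$\log \frac{P(M_1 \mid \mathrm{data}_{1,2})}{P(M_2 \mid \mathrm{data}_{1,2})} = \log \frac{P(\mathrm{data}_1 \mid M_1)\, P(\mathrm{data}_2 \mid M_1)}{P(\mathrm{data}_1 \mid M_2)\, P(\mathrm{data}_2 \mid M_2)} + \log \frac{P(M_1)}{P(M_2)}$$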
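The step from Bayes' theorem to this odds ratio can be spelled out as follows. Applying the theorem to both models gives the same denominator in each case, so it cancels in the ratio:

$$\frac{P(M_1 \mid \mathrm{data}_{1,2})}{P(M_2 \mid \mathrm{data}_{1,2})} = \frac{P(\mathrm{data}_{1,2} \mid M_1)\, P(M_1)}{P(\mathrm{data}_{1,2} \mid M_2)\, P(M_2)}$$

Splitting the joint likelihood into a product assumes that the two data types are conditionally independent given the model. That assumption is not stated explicitly on the slides:

$$P(\mathrm{data}_{1,2} \mid M_k) = P(\mathrm{data}_1 \mid M_k)\, P(\mathrm{data}_2 \mid M_k), \qquad k = 1, 2$$

Taking the logarithm then turns the products into sums of log ratios.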
What to do with the negatives, e.g.
when you have a protein that does not
interact with a ciliary protein? One
could just ignore it, but we can also
include it in the calculation: the
fraction of ciliary proteins that do not
interact with ciliary proteins, divided by
the fraction of non-ciliary proteins that
do not interact with ciliary proteins.
P M1 data1, 2 
Pdata1 M1  Pdata2 M1 
PM1 
log
 log

 log
Pdata1 M 2  Pdata2 M 2 
PM 2 
PM 2 data1, 2 
Gold standard (M1) & negative set (M2)
[Figure labels: extracellular, chromosome, endosome]
Van Dam et al, Cilia. 2013 May 31;2(1):7.
Genomic co-occurrence of ciliary genes among 52 species: defining a cutoff
[Figure axis: dissimilarity with cilium distribution]
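A minimal Python sketch of comparing a gene's presence/absence profile with the distribution of the cilium across species. The slides do not say which dissimilarity measure was used. The Hamming distance here, the toy five-species profiles, the cutoff of 1, and the function names are all assumptions for illustration.

```python
def dissimilarity(gene_profile, cilium_profile):
    """Hamming distance: number of species where the gene's presence (1/0)
    disagrees with the presence of the cilium (1/0)."""
    if len(gene_profile) != len(cilium_profile):
        raise ValueError("profiles must cover the same species")
    return sum(g != c for g, c in zip(gene_profile, cilium_profile))

def co_occurs_with_cilium(gene_profile, cilium_profile, cutoff):
    """True if the gene's profile is within `cutoff` mismatches of the cilium."""
    return dissimilarity(gene_profile, cilium_profile) <= cutoff

# Toy example with 5 species instead of 52 (made-up data).
cilium = [1, 1, 0, 1, 0]
gene_a = [1, 1, 0, 1, 1]   # one mismatch
gene_b = [0, 0, 1, 0, 1]   # every species mismatches
print(co_occurs_with_cilium(gene_a, cilium, cutoff=1))
print(co_occurs_with_cilium(gene_b, cilium, cutoff=1))
```

In practice the cutoff would be chosen by checking how well it separates the gold standard from the negative set.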
Expression profiling, based on weighting > 1500 human datasets for their
relevance to the cilium. Binning.
Co-expression with the gold standard ciliary genes
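A minimal Python sketch of how binning could turn a continuous co-expression score into per-bin log likelihood ratios. The bin edges, the pseudocount (added here so that empty bins do not produce a log of zero), the scores, and the function name `bin_log_ratios` are all assumptions; the slides do not specify them.

```python
import math
from bisect import bisect_right

def bin_log_ratios(pos_scores, neg_scores, edges, pseudocount=1.0):
    """Per-bin log2( P(bin | gold standard) / P(bin | negative set) ).

    edges: sorted bin boundaries; scores below edges[0] fall in bin 0,
           scores >= edges[-1] fall in the last bin.
    """
    n_bins = len(edges) + 1
    pos_counts = [0] * n_bins
    neg_counts = [0] * n_bins
    for s in pos_scores:
        pos_counts[bisect_right(edges, s)] += 1
    for s in neg_scores:
        neg_counts[bisect_right(edges, s)] += 1
    pos_total = len(pos_scores) + pseudocount * n_bins
    neg_total = len(neg_scores) + pseudocount * n_bins
    return [
        math.log2(((p + pseudocount) / pos_total) / ((n + pseudocount) / neg_total))
        for p, n in zip(pos_counts, neg_counts)
    ]

# Made-up co-expression scores for gold-standard and negative-set genes.
ratios = bin_log_ratios([0.9, 0.8, 0.7, 0.2], [0.1, 0.2, 0.3, 0.8], edges=[0.5])
print(ratios)
```

A new gene is then scored by looking up the log ratio of the bin that its co-expression score falls into.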
Bayesian integration: distinguishing between ciliary and non-ciliary
genes, including a “prior”.