Bayesian integration of genomics data

Martijn A. Huynen
CMBI, Radboud University Medical Centre

The cilium, a eukaryotic organelle

Identifying novel ciliary genes using a Bayesian classifier

Proteomics data; shared transcription factors; published datasets; expression data; evolutionary data

$$\text{GeneScore} = \underbrace{\log_2 \frac{P(\text{Cilium})}{P(\neg\text{Cilium})}}_{\text{Prior}} + \sum_{i=1}^{n} \log_2 \frac{P(f_i \mid \text{True})}{P(f_i \mid \text{False})}$$

Compare Li et al., Cell 2004; Avidor-Reiss et al., Cell 2004; Cilidb

The main trick in these kinds of Bayesian analyses

$$P(M_j \mid D) = \frac{P(D \mid M_j)\, P(M_j)}{\sum_{r=1}^{N} P(D \mid M_r)\, P(M_r)}$$

The probability of the model given the data (is a gene ciliary?) is the probability of the data given the model (e.g. which fraction of ciliary proteins interact with other ciliary proteins) times the prior probability that a protein is ciliary, divided by the sum, over all possible models, of the probability that each model gave rise to the data times the prior probability of that model.

The “prior” probability of a gene being ciliary is the number of genes that are ciliary divided by the total number of genes.

In the simplest example, we compare just two models (being ciliary or not being ciliary) and reduce the calculation to a (log) odds ratio: the likelihood that ciliary proteins, e.g., interact with ciliary proteins, compared to the likelihood that non-ciliary proteins interact with ciliary proteins.

We can also combine it with other data (e.g. co-evolution data) by adding up the logarithms of the likelihood ratios (effectively multiplying the ratios), together with the logarithm of the prior odds of the model (the number of genes that are ciliary divided by the number that are not):

$$\log \frac{P(M_1 \mid \text{data}_{1,2})}{P(M_2 \mid \text{data}_{1,2})} = \log \frac{P(\text{data}_1 \mid M_1)\, P(\text{data}_2 \mid M_1)}{P(\text{data}_1 \mid M_2)\, P(\text{data}_2 \mid M_2)} + \log \frac{P(M_1)}{P(M_2)}$$

What to do with the negatives, e.g. when you have a protein that does not interact with a ciliary protein?
One could just ignore it… but we can also take it along in the calculation: the fraction of ciliary proteins that do not interact with ciliary proteins, divided by the fraction of non-ciliary proteins that do not interact with ciliary proteins.

$$\log \frac{P(M_1 \mid \text{data}_{1,2})}{P(M_2 \mid \text{data}_{1,2})} = \log \frac{P(\text{data}_1 \mid M_1)}{P(\text{data}_1 \mid M_2)} + \log \frac{P(\text{data}_2 \mid M_1)}{P(\text{data}_2 \mid M_2)} + \log \frac{P(M_1)}{P(M_2)}$$

Gold standard (M1) & negative set (M2)

(Figure labels: extracellular, chromosome, endosome.)

Van Dam et al., Cilia. 2013 May 31;2(1):7.

Genomic co-occurrence of ciliary genes among 52 species: defining a cutoff

(Figure axis: dissimilarity with cilium distribution.)

Expression profiling, based on weighting > 1500 human datasets for their relevance to the cilium.

(Figure labels: binning; co-expression with the gold standard ciliary genes.)

Bayesian integration: distinguishing between ciliary and non-ciliary genes, including a “prior”.
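The log-odds integration described above can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the original study: the function names, the feature names, and every number (gene counts, likelihoods) are hypothetical placeholders. It assumes binary features that are treated as independent given the model, and it takes the negatives along by using the complementary fractions when a feature is absent.

```python
import math


def log2_odds_score(observed, likelihoods, n_ciliary, n_non_ciliary):
    """Sum the log2 prior odds and the per-feature log2 likelihood ratios.

    observed:    dict feature name -> True/False (was the feature seen?)
    likelihoods: dict feature name -> (P(feature | ciliary),
                                       P(feature | non-ciliary))
    Features are assumed independent given the model (naive Bayes).
    """
    # Prior odds: number of ciliary genes divided by number that are not.
    score = math.log2(n_ciliary / n_non_ciliary)
    for name, present in observed.items():
        p_cil, p_non = likelihoods[name]
        if present:
            # Positive evidence: fraction of ciliary genes with the feature
            # over the fraction of non-ciliary genes with the feature.
            score += math.log2(p_cil / p_non)
        else:
            # Negative evidence: fraction of ciliary genes WITHOUT the
            # feature over the fraction of non-ciliary genes without it.
            score += math.log2((1.0 - p_cil) / (1.0 - p_non))
    return score


def posterior_probability(log2_odds):
    """Convert log2 odds for 'ciliary' into a posterior probability."""
    odds = 2.0 ** log2_odds
    return odds / (1.0 + odds)


# Hypothetical numbers, for illustration only.
LIKELIHOODS = {
    "interacts_with_ciliary_protein": (0.40, 0.05),
    "co_occurs_with_cilium": (0.30, 0.02),
}
N_CILIARY, N_NON_CILIARY = 500, 19500

both_present = log2_odds_score(
    {"interacts_with_ciliary_protein": True, "co_occurs_with_cilium": True},
    LIKELIHOODS, N_CILIARY, N_NON_CILIARY,
)
no_interaction = log2_odds_score(
    {"interacts_with_ciliary_protein": False, "co_occurs_with_cilium": True},
    LIKELIHOODS, N_CILIARY, N_NON_CILIARY,
)
print(both_present, posterior_probability(both_present))
print(no_interaction, posterior_probability(no_interaction))
```

Working in log space turns the product of likelihood ratios into a sum, so each additional data type contributes one more term, and a missing observation can be handled by leaving its term out.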