Amsterdam 2004
Bioinformatics and Evolutionary Genomics
High-throughput “functional” data / functional genomics / omics

High-throughput data on gene function
• What do I mean: omics, microarray, ChIP-on-chip
• Why are people generating these data?
– Post-genomic era / systems biology: the challenge is to understand the roles of the e.g. 6,000 gene products in yeast and how they interact to create a eukaryotic organism.
– Because they can: automation is applied to areas of molecular biology beyond sequencing.
– To have “screens” for the research question at hand, rather than having to test one guess at a time.
• What about evolutionary genomics?
• Yeast
• Accuracy / noise of HTP data
• What do they mean: experimental knowledge, but what do they still mean in terms of e.g. function?
• A deluge
• Bioinformatics is needed for basic data handling, and has IMHO only scratched the surface in terms of coming up with biological questions with which we can probe these data

Microarray data
• Two conditions: often used for “screens”

(Correlated) mRNA expression
• mRNA levels are systematically measured under a variety of different cellular conditions, and genes are grouped if they show a similar transcriptional response to these conditions.

Hughes et al. 2000, Cell
Figure: Profile similarity identifies sterol-pathway disturbance resulting from deletion of uncharacterized ORF YER044c (ERG28) and from dyclonine treatment. (A) Prominent gene clusters responding to interference with ergosterol biosynthesis. (B) Comparison of the transcript profile of an erg28Δ strain to that of an erg3Δ strain. (C) Sterol content of wild-type (left) and erg28Δ (right) strains.

Ihmels et al. 2002, Nature Genetics
• Conventional hierarchical clustering of co-expression data can fail, because genes can play a role in multiple cellular processes and their common regulatory element can only be detected in a subset of experiments.
• Instead, detect genes that are co-expressed under a subset of conditions.
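The grouping step described above (genes are grouped if they show a similar transcriptional response) can be sketched with a plain Pearson correlation over expression profiles. This is only a minimal illustration, not the method of Hughes et al. or Ihmels et al.: the gene names, the expression values and the 0.9 threshold are all invented for the example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpressed_pairs(profiles, threshold=0.9):
    """Return gene pairs whose profiles correlate at or above the threshold."""
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        if pearson(profiles[g1], profiles[g2]) >= threshold:
            pairs.append((g1, g2))
    return pairs


# Toy log-ratios for four hypothetical genes measured under five conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],    # rises together with geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],    # anti-correlated with geneA
    "geneD": [0.5, -0.2, 0.4, -0.1, 0.3],  # no clear trend
}
pairs = coexpressed_pairs(profiles)
print(pairs)
```

Correlating over all conditions at once is exactly the situation in which, according to the Ihmels et al. slide, co-expression restricted to a subset of conditions can be missed; restricting the columns passed to `pearson` would be one way to explore that.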
• A comprehensive set of overlapping ‘transcriptional modules’
• Citric acid cycle? Different activity under different experimental conditions

Rapid divergence in expression between duplicate genes, inferred from microarray & promoter data
(figure annotation: 0.1 = 3.2 My)

Clustering conditions where the conditions are genes: yet another way to get to functional “links”

Yeast-2-hybrid
• Pairs of proteins to be tested for interaction are expressed as fusion proteins ('hybrids') in yeast: one protein is fused to a DNA-binding domain, the other to a transcriptional activator domain. Any interaction between them is detected by the formation of a functional transcription factor.
• Examples from the original Ito publication: (A) autophagy, (B) spindle pole body function and (C) vesicular transport. Arrows show the orientation of the two-hybrid interaction, from the bait to the prey.

Accuracy of Y2H and how to improve it
• Improving reliability using protein complexes: reasoning / internal consistency
• Internal filtering!

Mass spectrometry of purified complexes
• Individual proteins are tagged and used as 'hooks' to biochemically purify whole protein complexes. These are then separated and their components identified by mass spectrometry.
• Figure: Exosome and Ski, stages in mRNA degradation. Socio-affinity indices: dotted lines, 5–10; dashed lines, 10–15; plain lines, >15. Bait proteins are shown in bold, and shaded circles around groups of proteins indicate cores and modules.
• Figure labels: cellular function, PDB, phylogenetic profile, Y2H

Protein interactions: literature databases
• Literature derived, normally manually curated (as opposed to text mining)
• Biased?
• No new knowledge
• Useful for benchmarking & for the study of the evolution of e.g. protein complexes
• For example: Munich Information Center for Protein Sequences (MIPS)
• Databases that contain literature and omics: Database of Interacting Proteins (DIP), Biomolecular Interaction Network Database (BIND)

Systematic screening for lethality of knockouts on a rich medium
• The functions of many open reading frames (ORFs) identified in genome-sequencing projects are unknown. New, whole-genome approaches are required to systematically determine their function.
• A total of 6925 Saccharomyces cerevisiae strains were constructed, by a high-throughput strategy, each with a precise deletion of one of 2026 ORFs. Of the deleted ORFs, 17 percent were essential for viability in rich medium. Winzeler et al. 1999, Science

Genetic interactions (synthetic lethal/sick)
• Two nonessential genes that cause lethality when mutated at the same time form a synthetic lethal interaction. Such genes are often functionally associated and their encoded proteins may also interact physically. Tong et al. 2001, Science

What to do with synthetic lethals?
• One thing we can do with synthetic lethals: Ideker, protein interactions
• Kelley and Ideker 2005 Natu

ChIP-on-chip
• Tagged strains (one strain for each regulator).
• A microarray for each strain shows which pieces of DNA are found in excess when you isolate the regulator plus its bound DNA.
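The ChIP-on-chip readout above ("which pieces of DNA are found in excess") can be sketched as a per-probe comparison of the immunoprecipitated signal against a control signal. This is a hedged illustration only: the probe names, the intensities and the log2-ratio cutoff of 1.0 are assumptions made for the example, and real analyses add normalization and an error model.

```python
import math


def enriched_probes(ip_signal, control_signal, min_log2_ratio=1.0):
    """Return probes whose IP/control log2 ratio reaches the cutoff."""
    hits = {}
    for probe, ip in ip_signal.items():
        log_ratio = math.log2(ip / control_signal[probe])
        if log_ratio >= min_log2_ratio:
            hits[probe] = round(log_ratio, 2)
    return hits


# Invented intensities for three hypothetical intergenic probes.
ip = {"probe1": 800.0, "probe2": 210.0, "probe3": 1500.0}
control = {"probe1": 200.0, "probe2": 200.0, "probe3": 250.0}

hits = enriched_probes(ip, control)
print(hits)
```

Probes that pass the cutoff are candidate binding sites for the tagged regulator; the cutoff trades off false positives against missed sites.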
GFP localization
• Mating of fluorescent protein markers specific for organelles plus fluorescent protein tags for each gene

Other functional genomics data: the omes
• Quantitative proteomics
• Kinome
• PTMome
• (Almost) all of these data are freely and publicly available
• Take-home message: “wow, this exists!!!”

Bioinformatics for Benchmarking & Integration
• Figure: accuracy versus coverage of the data sets. Coverage = fraction of reference set covered by data; accuracy = fraction of data confirmed by reference set. Data sets shown: purified complexes (HMS-PCI), purified complexes (TAP), genomic context, mRNA co-expression, synthetic lethality, yeast two-hybrid. Annotations: raw data, filtered data, parameter choices, combined evidence (two methods, three methods).

Advanced integration
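The two benchmark measures named in the figure (coverage: fraction of reference set covered by data; accuracy: fraction of data confirmed by reference set) can be written down directly as set operations. The protein names and interaction lists below are invented toy data, and treating an interaction as an unordered pair is an assumption of this sketch.

```python
def coverage_and_accuracy(predicted, reference):
    """Benchmark a set of predicted interactions against a reference set.

    coverage = fraction of the reference set covered by the data
    accuracy = fraction of the data confirmed by the reference set
    """
    # Unordered pairs, so ("P1", "P2") and ("P2", "P1") count as the same link.
    predicted = {frozenset(pair) for pair in predicted}
    reference = {frozenset(pair) for pair in reference}
    confirmed = predicted & reference
    coverage = len(confirmed) / len(reference)
    accuracy = len(confirmed) / len(predicted)
    return coverage, accuracy


# Hypothetical reference set (e.g. curated complexes) and one data set.
reference = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]
predicted = [("P2", "P1"), ("P3", "P4"), ("P1", "P5"), ("P2", "P5"), ("P6", "P7")]

coverage, accuracy = coverage_and_accuracy(predicted, reference)
print(f"coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

Filtering a data set, or keeping only links supported by two or three methods, shrinks `predicted`; in this framing that typically raises accuracy at the cost of coverage.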