The STRING database
Michael Kuhn, EMBL Heidelberg

Protein interactions
Example: tryptophan synthase beta chain, E. coli K12

Many sources
Genomic context, curated knowledge, experimental evidence, literature.
373 genomes (only completely sequenced genomes), 1.5 million genes (not proteins).
Gene sources: Genome Reviews, RefSeq, Ensembl, model organism databases.

Data integration

Genomic context methods
Gene fusion, gene neighborhood, phylogenetic profiles.
Figure: cell, cellulosomes and cellulose.

Automatic inference of interactions
Produces correct interactions, but also wrong associations.

Gene fusion
Score: sequence similarity.

Gene neighborhood
Score: sum of intergenic distances.

Phylogenetic profiles
SVD (singular value decomposition) removes redundancy.
Score: Euclidean distance.

All scores are "raw scores" and are not comparable with each other: sequence similarity, sum of intergenic distances, Euclidean distance.

Benchmarking
Calibrate against a "gold standard" (KEGG): raw scores → probabilistic scores, e.g. "70% chance for an association".

Curated knowledge
KEGG (Kyoto Encyclopedia of Genes and Genomes), Reactome, GO (Gene Ontology).

Primary experimental data
Many sources, many parsers: BIND (Biomolecular Interaction Network Database), GRID (General Repository for Interaction Datasets), HPRD (Human Protein Reference Database).

Co-expression
Microarray data from GEO (Gene Expression Omnibus).
Score: correlation coefficient.

Literature mining
Different gene identifiers are handled with a synonyms list.
Sources: Medline, SGD (Saccharomyces Genome Database), The Interactive Fly, OMIM (Online Mendelian Inheritance in Man).
Simple scheme: co-mentioning.
More advanced: NLP (Natural Language Processing), using gene and protein names, cue words for entity recognition, and verbs for relation extraction.
Example sentence: "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1."
Calibrate against the gold standard.

Combine all evidence
Bayesian scoring scheme.
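The benchmarking and combination steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not STRING's own implementation (the real pipeline is more involved): the function names (pearson, calibrate, to_probability, combine), the bin edges and any benchmark data are hypothetical; gold-standard membership is reduced to a True/False label per gene pair (for instance "shares a KEGG pathway"); and channels are combined as 1 - ∏(1 - p_i), which assumes the evidence channels are independent.

```python
import math
from bisect import bisect_right


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles (a raw co-expression score)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("constant profile: correlation is undefined")
    return cov / (sd_x * sd_y)


def calibrate(raw_scores, in_gold_standard, edges):
    """Benchmark raw scores against a gold standard (hypothetical helper).

    raw_scores       -- raw scores of benchmark gene pairs (any scale)
    in_gold_standard -- True if the pair is associated in the gold standard
    edges            -- sorted bin boundaries; bisect_right assigns each score to a bin
    Returns one probability per bin: the fraction of gold-standard pairs in that bin
    (None for bins that contain no benchmark pairs).
    """
    hits = [0] * (len(edges) + 1)
    totals = [0] * (len(edges) + 1)
    for score, positive in zip(raw_scores, in_gold_standard):
        b = bisect_right(edges, score)
        totals[b] += 1
        hits[b] += int(positive)
    return [h / t if t else None for h, t in zip(hits, totals)]


def to_probability(raw_score, edges, table):
    """Look up the calibrated probability for a new raw score."""
    return table[bisect_right(edges, raw_score)]


def combine(probabilities):
    """Combine per-channel probabilities as 1 - prod(1 - p), assuming independent channels."""
    p_none = 1.0  # probability that no channel indicates a real association
    for p in probabilities:
        p_none *= 1.0 - p
    return 1.0 - p_none
```

For example, combine([0.7, 0.7]) returns approximately 0.91, matching the worked example of two scores of 0.7. Binning is used here only because it is the simplest calibration to show; a smoothed fit of probability against raw score would serve the same purpose.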
Example: two scores of 0.7 give a combined probability of 0.91:
$1 - (1 - 0.7)^2 = 0.91$

Evidence transfer
Evidence is spread over many species; transfer it by orthology (or "fuzzy orthology").
von Mering et al., Nucleic Acids Research, 2005

Two modes
COG mode: higher coverage, lower specificity; includes all available evidence; some orthologous groups are too large to be meaningful.
Proteins mode: maximum specificity, lower coverage; information will be relevant for the selected species.
von Mering et al., Nucleic Acids Research, 2005

Demo

Outlook

Take-home message
STRING integrates information and predicts interactions.
You can always go back to the sources.
Proteins mode: specific species.
COG mode: more coverage, especially for prokaryotic genes.

Acknowledgements
The STRING team: Lars Jensen, Peer Bork, Christian von Mering & group in Zurich, Berend Snel, Martijn Huynen.

Thank you for your attention.

Exercises: tinyurl.com/36twzq (or via course wiki)
Alternative server: xi.embl.de
Bork et al., Current Opinion in Structural Biology, 2004
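The evidence-transfer idea from the slides (projecting interactions from one species onto another via orthologs) can be sketched as follows. This is an illustrative sketch under assumptions, not STRING's transfer procedure: orthology is given as a plain lookup table, "fuzzy orthology" is not modeled, the function name transfer_by_orthology and all gene identifiers and scores are hypothetical, and when several source pairs land on the same target pair the highest score is kept, a choice made only for this sketch.

```python
def transfer_by_orthology(interactions, orthologs):
    """Project scored interactions from a source species onto a target species.

    interactions -- dict: frozenset({protein_a, protein_b}) -> score in the source species
                    (pairs of two distinct proteins)
    orthologs    -- dict: source protein -> list of its orthologs in the target species
    Returns dict: frozenset({target_a, target_b}) -> transferred score.
    """
    transferred = {}
    for pair, score in interactions.items():
        a, b = tuple(pair)
        for target_a in orthologs.get(a, []):
            for target_b in orthologs.get(b, []):
                if target_a == target_b:
                    continue  # both partners map to the same target protein; skip self-pair
                key = frozenset((target_a, target_b))
                # Assumption of this sketch: keep the best-supported score per target pair.
                transferred[key] = max(score, transferred.get(key, 0.0))
    return transferred


# Hypothetical example: a scored trpA-trpB pair in a source species, transferred
# to a target species in which both genes have exactly one ortholog.
source = {frozenset(("src_trpA", "src_trpB")): 0.9}
ortholog_table = {"src_trpA": ["tgt_trpA"], "src_trpB": ["tgt_trpB"]}
print(transfer_by_orthology(source, ortholog_table))
```

A source protein with several orthologs in the target species produces one transferred pair per ortholog combination, and a pair is dropped entirely if either partner has no ortholog.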