The STRING database
Michael Kuhn
EMBL Heidelberg
protein interactions
example
Tryptophan synthase beta chain
E. coli K12
many sources
genomic context
curated knowledge
experimental evidence
literature
373 genomes
(only completely sequenced genomes)
1.5 million genes
(not proteins)
Genome Reviews
RefSeq
Ensembl
model organism databases
data integration
genomic context methods
gene fusion
gene neighborhood
phylogenetic profiles
Cell
Cellulosomes
Cellulose
automatic inference
of interactions
correct interactions
wrong associations
gene fusion
score: sequence similarity
gene neighborhood
score: sum of intergenic distances
phylogenetic profiles
SVD
singular value decomposition
(removes redundancy)
score: Euclidean distance
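The distance step of the phylogenetic profile method can be sketched as follows. This is a minimal illustration, not the STRING implementation: the gene names and presence/absence vectors are made up, and the SVD step that the slides mention for removing redundancy is left out, so the distance is computed on the raw profiles.

```python
import math

# Hypothetical presence/absence profiles: 1 if the gene has a homolog in
# that genome, 0 otherwise. Real profiles would span hundreds of genomes.
profiles = {
    "geneA": [1, 0, 1, 1, 0, 1],
    "geneB": [1, 0, 1, 1, 0, 1],
    "geneC": [0, 1, 0, 0, 1, 1],
}


def profile_distance(p, q):
    """Euclidean distance between two profiles over the same genomes."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Identical profiles give distance 0; dissimilar profiles give a larger value.
d_ab = profile_distance(profiles["geneA"], profiles["geneB"])
d_ac = profile_distance(profiles["geneA"], profiles["geneC"])
print(d_ab, d_ac)
```

A small distance means the two genes tend to be present and absent in the same genomes, which is the signal this method uses as a raw score.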
all scores are “raw scores”
not comparable
sequence similarity
sum of intergenic distances
Euclidean distance
benchmarking
calibrate against “gold standard”
(KEGG)
raw scores
probabilistic scores
e.g. “70% chance for an association”
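One way to picture the benchmarking step is to bin the raw scores and ask what fraction of pairs in each bin is confirmed by the gold standard. The slides do not say how the calibration is actually computed, so the binning approach, the function name `calibrate`, and the data below are all assumptions made for illustration.

```python
def calibrate(scored_pairs, bin_edges):
    """Fraction of gold-standard-confirmed pairs per raw-score bin.

    scored_pairs: list of (raw_score, confirmed) tuples, where confirmed
                  says whether the pair is supported by the gold standard.
    bin_edges:    ascending boundaries; bin i covers
                  [bin_edges[i], bin_edges[i + 1]).
    Returns one value per bin, or None for a bin with no pairs.
    """
    counts = [[0, 0] for _ in range(len(bin_edges) - 1)]  # [confirmed, total]
    for score, confirmed in scored_pairs:
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= score < bin_edges[i + 1]:
                counts[i][1] += 1
                if confirmed:
                    counts[i][0] += 1
                break
    return [c / t if t else None for c, t in counts]


# Made-up raw scores with made-up gold-standard labels.
pairs = [
    (0.5, False), (0.9, False),
    (1.2, False), (1.5, True), (1.8, True),
    (2.1, True), (2.4, True), (2.6, False), (2.9, True),
]
probabilities = calibrate(pairs, [0, 1, 2, 3])
print(probabilities)
```

Each bin's fraction can then be read as the probability attached to a raw score in that range, which puts scores from different methods on a common scale.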
curated knowledge
KEGG
Kyoto Encyclopedia of Genes and Genomes
Reactome
GO
Gene Ontology
primary experimental data
many sources
many parsers
BIND
Biomolecular Interaction Network Database
GRID
General Repository for Interaction Datasets
HPRD
Human Protein Reference Database
co-expression
microarray data
GEO
Gene Expression Omnibus
correlation coefficient
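A co-expression score of this kind can be sketched with a correlation coefficient over expression profiles. The slides do not name the coefficient, so the use of Pearson correlation here is an assumption, and the expression values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two profiles of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression levels of three genes across four experiments.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]  # rises together with gene_a
gene_c = [4.0, 3.0, 2.0, 1.0]  # falls as gene_a rises

r_ab = pearson(gene_a, gene_b)
r_ac = pearson(gene_a, gene_c)
print(r_ab, r_ac)
```

Like the other raw scores, a correlation value would still need calibration before it can be compared with evidence from other sources.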
literature mining
different gene identifiers
synonyms list
Medline
SGD
Saccharomyces Genome Database
The Interactive Fly
OMIM
Online Mendelian Inheritance in Man
simple scheme
co-mentioning
more advanced
NLP
Natural Language Processing
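The simple co-mentioning scheme can be sketched as counting how often two genes appear in the same text after mapping names through a synonym list. The synonym table and the second and third texts below are toy data invented for this sketch; only the first sentence comes from the slides.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical synonym list: surface form (lowercase) -> canonical gene name.
synonyms = {"cyc1": "CYC1", "cyc7": "CYC7", "hap1": "HAP1", "cyp1": "HAP1"}

texts = [
    "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1.",
    "Toy sentence mentioning CYP1 together with CYC1.",
    "Toy sentence mentioning only CYC7.",
]


def genes_in(text):
    """Set of canonical gene names mentioned in a text."""
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
    return {synonyms[t] for t in tokens if t in synonyms}


def co_mentions(documents):
    """Count, for every gene pair, the number of texts mentioning both."""
    counts = Counter()
    for text in documents:
        for pair in combinations(sorted(genes_in(text)), 2):
            counts[pair] += 1
    return counts


counts = co_mentions(texts)
print(counts)
```

Resolving synonyms first matters here: the second text only contributes to the CYC1 and HAP1 pair because CYP1 is mapped to the same canonical name as HAP1.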
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
The expression of
the cytochrome genes
CYC1 and CYC7
is controlled by
HAP1
calibrate against gold standard
combine all evidence
Bayesian scoring scheme
e.g.: two scores of 0.7
combined probability: ?
e.g.: two scores of 0.7
combined probability: 0.91
1 - (1 - 0.7)² = 0.91
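The combination rule in this example generalizes to any number of evidence channels: multiply the probabilities that each channel is wrong, then subtract from one. The sketch below assumes the channels are independent, which is what the formula implies; the function name `combine` is chosen here for illustration.

```python
def combine(probabilities):
    """Combine per-channel probabilities as 1 - product(1 - p_i)."""
    remaining = 1.0  # probability that every channel is wrong
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each score must be a probability in [0, 1]")
        remaining *= 1.0 - p
    return 1.0 - remaining


combined = combine([0.7, 0.7])
print(round(combined, 2))  # 0.91
```

Adding a further channel can only raise the combined score, so weak evidence from several sources can add up to a confident prediction.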
evidence transfer
evidence spread
over many species
transfer by orthology
(or “fuzzy orthology”)
von Mering et al., Nucleic Acids Research, 2005
two modes
COG mode
higher coverage
lower specificity
includes all available evidence
some orthologous groups are too large to be meaningful
proteins mode
maximum specificity
lower coverage
information will be relevant for selected species
Demo
outlook
take home message
STRING integrates information and predicts interactions
You can always go to the sources
Proteins mode: specific species
COG mode: more coverage, especially for prokaryotic genes
Acknowledgements
The STRING team
Lars Jensen
Peer Bork
Christian von Mering & group in Zurich
Berend Snel
Martijn Huynen
Thank you for your attention
Exercises:
tinyurl.com/36twzq
(or via course wiki)
Alternative server:
xi.embl.de
Bork et al., Current Opinion in Structural Biology, 2004