Advancing Science with DNA Sequence
IMG terms and pathways
Natalia Ivanova
Iain Anderson
Thanos Lykidis
Nikos Kyrpides
Krishna Palaniappan
Amy Chen
Frank Korzeniewski
Yuri Grechkin
Ernest Szeto
Victor Markowitz
MGM Workshop
May 16, 2012
New: SEED subsystems
Transport DB, Phenotypes
Why so many?
What’s the difference?
Which one should I use?
Where it all comes from
• Experimental data: gene A in genome X
✓ catalyzes a reaction
✓ interacts with other protein(s)
✓ gene knock-out causes a certain phenotype
✓ …
This information is recorded in a structured way:
✓ ontologies (e.g. Gene Ontology)
✓ pathway collections (metabolic and protein-protein interaction)
✓ other (reasoning rules, like TIGR Genome Properties)
Modeling the data properly – why nobody does that
[Diagram relating: phenotype, gene, pathway, transcript, protein, evidence, reaction, enzyme, compounds]
• Genes are connected to phenotypes via a multi-step process, with many parameters
• We have very vague ideas about the steps/parameters for the majority of genes/phenotypes
• If we design a relational database for gene/phenotype connections, most tables will be empty
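The last point can be illustrated with a small sketch. The table and column names below are hypothetical (they follow the entities named on the slide and are not taken from IMG or any real schema). The sketch only shows that a typical record, a gene with a predicted protein and nothing else, fills a small part of a fully normalized model.

```python
import sqlite3

# Hypothetical, fully normalized schema for the gene -> phenotype chain.
# Table names follow the slide's entities; all columns are assumptions.
SCHEMA = """
CREATE TABLE gene       (gene_id TEXT PRIMARY KEY);
CREATE TABLE transcript (transcript_id TEXT PRIMARY KEY, gene_id TEXT REFERENCES gene(gene_id));
CREATE TABLE protein    (protein_id TEXT PRIMARY KEY, transcript_id TEXT REFERENCES transcript(transcript_id));
CREATE TABLE enzyme     (enzyme_id TEXT PRIMARY KEY, protein_id TEXT REFERENCES protein(protein_id));
CREATE TABLE reaction   (reaction_id TEXT PRIMARY KEY, enzyme_id TEXT REFERENCES enzyme(enzyme_id));
CREATE TABLE compound   (compound_id TEXT PRIMARY KEY, reaction_id TEXT REFERENCES reaction(reaction_id));
CREATE TABLE pathway    (pathway_id TEXT PRIMARY KEY, reaction_id TEXT REFERENCES reaction(reaction_id));
CREATE TABLE phenotype  (phenotype_id TEXT PRIMARY KEY, pathway_id TEXT REFERENCES pathway(pathway_id));
CREATE TABLE evidence   (evidence_id TEXT PRIMARY KEY, subject_id TEXT, note TEXT);
"""


def empty_tables(conn):
    """Return the names of all tables that hold no rows."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    return [n for n in names
            if conn.execute(f"SELECT COUNT(*) FROM {n}").fetchone()[0] == 0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# What we typically know for a newly sequenced gene: it exists and it encodes a protein.
conn.execute("INSERT INTO gene VALUES ('geneB')")
conn.execute("INSERT INTO protein VALUES ('protB', NULL)")

empty = empty_tables(conn)
print(len(empty), "of 9 tables are empty:", empty)
```

Everything downstream of the protein (enzyme, reaction, compounds, pathway, phenotype) and the evidence table stays empty until someone does the experiment.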
What it looks like in real life – KEGG vs MetaCyc
KEGG
http://www.genome.jp/kegg/
MetaCyc
http://metacyc.org/
Ammonia oxidation pathway in KEGG
• Plus 4 more entries: for 1.14.99.39, for each subunit
The same pathway/reaction in MetaCyc
Similar problems to KEGG:
• multifunctional enzymes
• multisubunit enzymes
• differences in reaction recording
Even the MetaCyc record is still incomplete
• Which subunit has which cofactor?
• Type of Cu2+ cluster, type of Fe2+ cluster?
• One of the subunits is a cytochrome c, yet the enzyme is cytosolic?
• Does it require any help with maturation of metal clusters?
• Pseudomonas sp. PB16 was shown to have only 1 enzyme from the pathway, hydroxylamine reductase. Does it have the entire pathway?
Even bigger mess: bioinformatics inference
• Experimental data: gene A in genome X
✓ catalyzes a reaction
✓ interacts with other protein(s)
✓ gene knock-out causes a certain phenotype
✓ …
What about gene B in genome Y, which is similar to gene A?
“True or false?” game
• If the GenBank record says nothing about the gene B annotation protocol, the annotation must be correct
• If the GenBank record says the gene was manually annotated, the annotation must be correct
• If the GenBank record says gene B was manually annotated, and it has a bi-directional best BLAST hit to gene A with an e-value of 1.0e-5, the annotation must be correct
•…
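For reference, the bi-directional best hit test named in the last statement can be sketched as follows. This is a minimal illustration, not the protocol of any particular annotation pipeline: the tuple layout, the function names, and the toy hits are all assumptions, and picking the best hit by bit score is one common convention among several.

```python
def best_hits(hits, max_evalue=1.0e-5):
    """hits: iterable of (query, subject, evalue, bitscore).

    Returns {query: subject} keeping, per query, the highest-scoring hit
    whose e-value does not exceed max_evalue.
    """
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(x_vs_y, y_vs_x, max_evalue=1.0e-5):
    """Pairs (a, b) where b is a's best hit in genome Y and a is b's best hit in genome X."""
    forward = best_hits(x_vs_y, max_evalue)
    reverse = best_hits(y_vs_x, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Made-up hits: genome X searched against genome Y, and the reverse search.
x_vs_y = [("geneA", "geneB", 1.0e-5, 48.0), ("geneA", "geneC", 1.0e-3, 35.0)]
y_vs_x = [("geneB", "geneA", 1.0e-5, 48.0), ("geneC", "geneA", 1.0e-3, 35.0)]

pairs = bidirectional_best_hits(x_vs_y, y_vs_x)
print(pairs)
```

Here geneA and geneB pass the test at an e-value of 1.0e-5. The test only says that each gene is the other's most similar sequence in the two genomes compared; it does not check that they share a function.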
Weaknesses
• Orthology detection: fails on many families with deviation from vertical transmission
• BLAST is agnostic to which amino acids are more important for protein function
• Using a consensus sequence (either as PSSM or HMM) with family-specific bit score cutoffs would be much better, but cannot be used in the current implementation of KEGG
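The family-specific cutoff idea in the last bullet can be sketched as below. The family names, cutoff values, and hit tuples are invented for illustration; in practice the scores would come from a PSSM or HMM search tool and the cutoffs from curation of each family.

```python
# Hypothetical per-family bit-score cutoffs (real values would be curated per model).
FAMILY_CUTOFFS = {"famA": 120.0, "famB": 45.0}


def assign_families(model_hits, cutoffs):
    """model_hits: iterable of (protein, family, bitscore) from a profile search.

    A hit is kept only if its score reaches the cutoff of its own family;
    families without a cutoff are skipped.
    """
    assigned = {}
    for protein, family, bitscore in model_hits:
        cutoff = cutoffs.get(family)
        if cutoff is None or bitscore < cutoff:
            continue
        assigned.setdefault(protein, []).append(family)
    return assigned


# Made-up search results.
hits = [
    ("p1", "famA", 150.0),
    ("p2", "famA", 80.0),   # below famA's cutoff
    ("p2", "famB", 50.0),   # above famB's cutoff
    ("p3", "famB", 30.0),
]

assigned = assign_families(hits, FAMILY_CUTOFFS)
print(assigned)
```

With these toy numbers, a single global cutoff of 60 bits would have accepted p2 as famA and rejected it as famB, which is the opposite of what the per-family thresholds decide.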
Pathway collections: KEGG, MetaCyc and others
Which particular set of interactions is a pathway? (i.e. how do we define pathway boundaries within the network?)
Ideal solution: pathway NR
• All pathway collections share a common skeleton of reactions, which consist of reactants (compounds)
• All reactions share the common base of proteins annotated as catalysts
• Can we merge the information from different collections, using the best features of all of them?
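One way to picture such a merge is to key every reaction by its reactants and pool whatever each collection says about it. The sketch below is only an illustration under strong assumptions: the collection names, reaction identifiers, and record layout are hypothetical, and compound names are taken to be already reconciled across collections, which is a hard problem in its own right.

```python
from collections import defaultdict


def reaction_key(substrates, products):
    """Direction-independent key built from the two reactant sets."""
    return frozenset([frozenset(substrates), frozenset(products)])


def merge_collections(collections):
    """collections: {source: [(reaction_id, substrates, products, catalysts), ...]}

    Returns {reaction_key: {"ids": {source: reaction_id}, "catalysts": set}}.
    """
    merged = defaultdict(lambda: {"ids": {}, "catalysts": set()})
    for source, reactions in collections.items():
        for reaction_id, substrates, products, catalysts in reactions:
            entry = merged[reaction_key(substrates, products)]
            entry["ids"][source] = reaction_id
            entry["catalysts"].update(catalysts)
    return dict(merged)


# Made-up records: the same reaction written in opposite directions by two collections.
collections = {
    "collection_1": [("r1", {"A"}, {"B"}, {"protein_1"})],
    "collection_2": [
        ("rxn-9", {"B"}, {"A"}, {"protein_1", "protein_2"}),
        ("rxn-10", {"B"}, {"C"}, {"protein_3"}),
    ],
}

merged = merge_collections(collections)
shared = merged[reaction_key({"A"}, {"B"})]
print(len(merged), shared["ids"], sorted(shared["catalysts"]))
```

The shared entry carries both source identifiers and the union of annotated catalysts, which is the "common skeleton plus common protein base" idea in its simplest form.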
IMG terms: 3 types
✓ IMG terms of 3 types:
1. gene product
2. multi-subunit protein complex
3. modified protein
[Diagram labels]
Compounds: A, B, C, D
Reactions: R1; R2, spontaneous; R3, spontaneous; R4, chaperone
Protein boxes:
Enzyme (EC x.x.x.x)
Enzyme (EC x.x.x.x) monomeric precursor
Enzyme (EC x.x.x.x) monomeric, needs cofactor C
Enzyme (EC x.x.x.x) heterotrimeric, subunit A precursor
Enzyme (EC x.x.x.x) heterotrimeric, subunit A
Enzyme (EC x.x.x.x) heterotrimeric, subunit B
Enzyme (EC x.x.x.x) heterotrimeric, subunit C
Enzyme (EC x.x.x.x) heterotrimeric, needs cofactor D
Callouts: Not an IMG term!; IMG term of the type “Gene product” (twice); IMG term of the type “Modified protein”; IMG term of the type “Protein complex”
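The three term types can be pictured as a small data model. This is a sketch, not IMG's actual schema: the class names, the `parts` field, and the assignment of types to the heterotrimeric enzyme's boxes are one assumed reading of the diagram labels.

```python
from dataclasses import dataclass, field
from enum import Enum


class TermType(Enum):
    GENE_PRODUCT = "gene product"
    PROTEIN_COMPLEX = "multi-subunit protein complex"
    MODIFIED_PROTEIN = "modified protein"


@dataclass
class Term:
    name: str
    term_type: TermType
    # Assumed linkage: a complex lists its subunits, a modified protein its precursor.
    parts: list = field(default_factory=list)


def gene_products(term):
    """Collect the names of the gene-product terms a term is ultimately built from."""
    if term.term_type is TermType.GENE_PRODUCT:
        return [term.name]
    names = []
    for part in term.parts:
        names.extend(gene_products(part))
    return names


# Assumed reading of the heterotrimeric enzyme on the slide.
a_precursor = Term("heterotrimeric, subunit A precursor", TermType.GENE_PRODUCT)
subunit_a = Term("heterotrimeric, subunit A", TermType.MODIFIED_PROTEIN, [a_precursor])
subunit_b = Term("heterotrimeric, subunit B", TermType.GENE_PRODUCT)
subunit_c = Term("heterotrimeric, subunit C", TermType.GENE_PRODUCT)
enzyme = Term("heterotrimeric, needs cofactor D", TermType.PROTEIN_COMPLEX,
              [subunit_a, subunit_b, subunit_c])

products = gene_products(enzyme)
print(products)
```

Under this reading, only gene products map directly to genes; complexes and modified proteins are reached through them.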
Protein-protein interaction pathways: same model
You’ve been warned!