Advancing Science with DNA Sequence

IMG terms and pathways
Natalia Ivanova, Iain Anderson, Thanos Lykidis, Nikos Kyrpides, Krishna Palaniappan, Amy Chen, Frank Korzeniewski, Yuri Grechkin, Ernest Szeto, Victor Markowitz
MGM Workshop, May 16, 2012

New: SEED subsystems, Transport DB, Phenotypes
Why so many? What’s the difference? Which one should I use?

Where it all comes from
• Experimental data: gene A in a genome X
  - catalyzes a reaction
  - interacts with another protein(s)
  - gene knock-out causes certain phenotype
  - …
This information is recorded in a structured way:
• ontologies (e.g. Gene Ontology)
• pathway collections (metabolic and protein-protein interaction)
• other (reasoning rules, like TIGR Genome Properties)

Modeling the data properly – why nobody does that
[Diagram labels: phenotype, gene, pathway, transcript, protein, evidence, reaction, enzyme, compounds]
• Genes are connected to phenotypes via a multi-step process, with many parameters
• We have very vague ideas about the steps/parameters for the majority of genes/phenotypes
• If we design a relational database for gene/phenotype connections, most tables will be empty

What it looks like in real life – KEGG vs MetaCyc
KEGG http://www.genome.jp/kegg/
MetaCyc http://metacyc.org/

Ammonia oxidation pathway in KEGG
• Plus 4 more entries: for 1.14.99.39 for each subunit

The same pathway/reaction in MetaCyc
Similar problems to KEGG:
• multifunctional enzymes
• multisubunit enzymes
• differences in reaction recording

Even MetaCyc record is still incomplete
• Which subunit has which cofactor?
• Type of Cu2+ cluster, type of Fe2+ cluster?
• One of the subunits is a cytochrome c, yet the enzyme is cytosolic?
• Does it require any help with maturation of metal clusters?
• Pseudomonas sp. PB16 was shown to have only 1 enzyme from the pathway, hydroxylamine reductase. Does it have the entire pathway?

Even bigger mess: bioinformatics inference
• Experimental data: gene A in a genome X
  - catalyzes a reaction
  - interacts with another protein(s)
  - gene knock-out causes certain phenotype
  - …
What about gene B in genome Y, which is similar to gene A?

“True or false?” game
• If GenBank record says nothing about gene B annotation protocol, the annotation must be correct
• If GenBank record says the gene was manually annotated, the annotation must be correct
• If GenBank record says gene B was manually annotated, and it has a bi-directional best BLAST hit to gene A with e-value of 1.0e-5, the annotation must be correct
• …

Weaknesses
• Orthology detection: fails on many families with deviation from vertical transmission
• BLAST is agnostic of which amino acids are more important for protein function
• Using consensus sequence (either as PSSM or HMM) with family-specific bit score cutoffs would be much better, but cannot be used in current implementation of KEGG

Pathway collections: KEGG, MetaCyc and others
Which particular set of interactions is a pathway? (i.e. how do we define pathway boundaries within the network?)

Ideal solution: pathway NR
• All pathway collections share a common skeleton of reactions, which consist of reactants (compounds)
• All reactions share the common base of proteins annotated as catalysts
• Can we merge the information from different collections, using the best features of all of them?

IMG terms: 3 types
IMG terms of 3 types:
1. gene product
2. multi-subunit protein complex
3. modified protein
[Diagram: compounds A, B, C, D and steps R1, R2 (spontaneous), R3 (spontaneous), R4 (chaperone). Box labels include "Enzyme (EC x.x.x.x)"; "Enzyme (EC x.x.x.x) monomeric precursor"; "Enzyme (EC x.x.x.x) monomeric, needs cofactor C"; "Enzyme (EC x.x.x.x) heterotrimeric, subunit A precursor"; "Enzyme (EC x.x.x.x) heterotrimeric, subunit A"; "subunit B"; "subunit C"; "Enzyme (EC x.x.x.x) heterotrimeric, needs cofactor D". Callouts: "Not an IMG term!", IMG term of the type “Gene product” (twice), IMG term of the type “Protein complex”, IMG term of the type “Modified protein”.]

Protein-protein interaction pathways: same model

You’ve been warned!
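
As a rough illustration of the bi-directional best BLAST hit criterion from the “True or false?” game, here is a minimal Python sketch. The hit tuples, gene names, scores and function names are all hypothetical and are not taken from IMG, KEGG or GenBank. It only shows how a pair can pass at an e-value cutoff of 1.0e-5, which by itself says nothing about whether the functional annotation is correct.

```python
from typing import Dict, List, Tuple

# Hypothetical hit record: (query_gene, subject_gene, e_value, bit_score)
Hit = Tuple[str, str, float, float]


def best_hits(hits: List[Hit], max_evalue: float) -> Dict[str, str]:
    """Map each query gene to its highest-scoring subject under the e-value cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, bits in hits:
        if evalue > max_evalue:
            continue  # hit is weaker than the cutoff, ignore it
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(
    x_vs_y: List[Hit], y_vs_x: List[Hit], max_evalue: float = 1e-5
) -> List[Tuple[str, str]]:
    """Return (gene in X, gene in Y) pairs that are each other's best hit."""
    forward = best_hits(x_vs_y, max_evalue)
    reverse = best_hits(y_vs_x, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Made-up search results between genome X (geneA) and genome Y (geneB, geneC).
x_vs_y = [("geneA", "geneB", 1e-5, 48.0), ("geneA", "geneC", 1e-3, 35.0)]
y_vs_x = [("geneB", "geneA", 1e-5, 48.0), ("geneC", "geneA", 1e-3, 35.0)]

print(bidirectional_best_hits(x_vs_y, y_vs_x))
print(bidirectional_best_hits(x_vs_y, y_vs_x, max_evalue=1e-10))
```

With the default cutoff the geneA/geneB pair is reported; with a stricter cutoff nothing is. The sketch treats all positions in the alignment equally, which is the weakness the deck points out for BLAST-based inference.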