Center for Biologisk Sekvensanalyse
”Resources of Biomolecular Data: Sequences, Structures and Functionality”
PhD course #27803
Nikolaj Blom
Center for Biological Sequence Analysis
BioCentrum-DTU
Technical University of Denmark
[email protected]

Outline
Magnitudes and Scales
Resources: Data Sources & Tools
• Primary DNA sources
• Sequence Repositories
• Structure Repositories
• Functional Categorization
• Integration of Databases
• The Human Genome
  • Genome Browsers
  • Prediction Tools
  • Evaluation of Prediction Servers
Starting points
• Link collections

Resources: Sources & Tools
There are A LOT OF biomolecular databases/sources
A LOT OF overlap of information/redundancy
A LOT OF TOOLS
Personal picks/preferences
• User-friendliness
• Update intervals
• Curation efforts / error correction
• Linkage to other DBs

Faster than Moore’s law...

Human Genome Published
HUGO: Nature, 15 Feb. 2001
Celera: Science, 16 Feb. 2001

Magnitudes and Scales
Human genome: 3,200,000,000 bp
• Single basepair → full genome is 9 orders of magnitude
Genome = Football field: ~3 billion leaves of grass
Single base A T G C (or SNP) = 1 leaf of grass
Genome browsing
• Zooming from whole stadium to single leaf

How we got the sequence
Sanger chain termination method

Primary DNA sources
Trace file repositories
Single read: 500-1000 bp (~golf ball size / jigsaw puzzle)
Variable quality
• WashU-Merck Human EST Project / Trace files
• ”Base-calling” non-trivial

Assembly is Non-trivial!

Sequence repositories - GenBank et al.
GenBank / EMBL / DDBJ
• Highly redundant (many versions of same gene)
• Cross-updated daily
• Version history is recorded
• Previous sequence records can be retrieved
• Contigs/HTGS (100-200 kb) finishing at different stages
  • Draft → Finished
• Includes genomic DNA, cDNA, ESTs, translated peptides

Non-redundant and Curated databases
Non-redundant
• Manual or automatic curation
• DNA
  • RefSeq (NCBI; semi-automated)
  • Ensembl gene index (automated)
• Protein
  • RefSeq (NCBI; semi-automated)
  • TrEMBL (EMBL; automated)

Curated database: UniProt/SwissProt
SIB - Swiss Institute of Bioinformatics
Protein Knowledgebase / Sequence Database
• Highly curated
• Experimental evidence evaluated (e.g. modifications)
• All 80,000 entries checked by Amos Bairoch himself ;-)
ExPASy - Expert Protein Analysis System
• Proteomics tools: links + local servers

Structure databases / Protein Data Bank (PDB)
X-ray, NMR biomolecular structures
Protein Data Bank (PDB)
• >22,000 structures (April 2003)
• http://www.rcsb.org/pdb/

Functional Categorization
Gene Ontology (GO)
• Hierarchical
• Controlled vocabulary

Functional Categorization
Gene Ontology (GO) http://www.geneontology.org/
• Molecular Function - the tasks performed by individual gene products; examples are transcription factor and DNA helicase
• Biological Process - broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions
• Cellular Component - subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and origin recognition complex

Integration of databases - Webs of websites
Links, links, links...
SRS = Sequence Retrieval System
• Powerful, complex query language
• http://srs.ebi.ac.uk/
BioDAS – Distributed Annotation System

For ’my gene’, how do I:
Get an overview of the sequence information known? (GeneCards)
Examine the ’Genome Neighbourhood’? (Genome Browsers)
Predict protein post-translational modifications (PTMs)? (Prediction servers)
• (Evaluate the value of predicted features)

GeneCards
http://nciarray.nci.nih.gov/cards/

GeneCards-II
GeneCards-III
GeneCards-IV
GeneCards-V

Genetic/Medical Information
OMIM, Online Mendelian Inheritance in Man (NCBI)
• The OMIM database is a catalog of human genes and genetic disorders
• >13,000 entries (April, 2002)
• Examples: cystic fibrosis, prions, amyloid precursor protein
• Condensed, highly curated descriptions of genetics/disease/animal models/references

OMIM-I (http://www3.ncbi.nlm.nih.gov/Omim/)
OMIM-II
OMIM-III

For ’my gene’, how do I:
Get an overview of the sequence information known? (GeneCards)
Examine the ’Genome Neighbourhood’? (Genome Browsers)
Predict protein post-translational modifications (PTMs)? (Prediction servers)
• (Evaluate the value of predicted features)

Genome Browsing
Three public
• Open access
• Use same genome build/assembly
• NCBI (U.S.)
• UCSC (Santa Cruz, U.S.)
• EnsEmbl (EBI, EU)
One private
• Restricted, commercial
• Academic, free usage: 1 Mbase/week
• Proprietary assembly
• Celera Genomics (U.S.)
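
To put the free academic quota on the private browser into perspective, it can be set against the genome size quoted earlier (3,200,000,000 bp). The sketch below is only a back-of-the-envelope calculation: it assumes that "1 Mbase/week" means 1,000,000 bases may be viewed per week and that a year has 52 weeks. The function name `weeks_to_cover` is made up for this illustration.

```python
import math

# Figures taken from the slides; the interpretation of the quota is an assumption.
GENOME_SIZE_BP = 3_200_000_000   # human genome size from "Magnitudes and Scales"
WEEKLY_QUOTA_BP = 1_000_000      # assumed meaning of "1 Mbase/week"
WEEKS_PER_YEAR = 52              # rough approximation


def weeks_to_cover(genome_bp: int, quota_bp_per_week: int) -> int:
    """Whole weeks needed to view every base once under a weekly quota."""
    return math.ceil(genome_bp / quota_bp_per_week)


weeks = weeks_to_cover(GENOME_SIZE_BP, WEEKLY_QUOTA_BP)
years = weeks / WEEKS_PER_YEAR
orders_of_magnitude = math.log10(GENOME_SIZE_BP)

print(f"{weeks} weeks (~{years:.1f} years) to cover the genome once")
print(f"log10(genome size in bp) = {orders_of_magnitude:.2f}")
```

Under these assumptions the free quota covers the whole genome only after about 3,200 weeks, so it suits looking at individual regions rather than genome-wide work.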

Celera Human/Mouse Genomes

Genome Browsers - Portals to the Genomic World
NCBI – National Center for Biotechnology Information (U.S.)
• http://www.ncbi.nlm.nih.gov/Genomes/index.html
UCSC – Univ. California – Santa Cruz (U.S.)
• http://genome.ucsc.edu/
EnsEmbl – European Molecular Biology Laboratory (E.U.)
• http://www.ensembl.org/

NCBI

UCSC – Genome Browser
UCSC – Genome Browser II

EnsEmbl – Genome Browser

For ’my gene’, how do I:
Get an overview of the sequence information known? (GeneCards)
Examine the ’Genome Neighbourhood’? (Genome Browsers)
Predict protein post-translational modifications (PTMs) or Gene Structure? (Prediction servers)
• ...and evaluate the reliability of prediction methods

CBS Services/Toolbox
http://www.cbs.dtu.dk/services/

NetPhos – a prediction server
http://www.cbs.dtu.dk/services/NetPhos/

Evaluating Prediction Servers
Performance on independent/cross-validated data presented?
Published in peer-reviewed journal?
Cited by others?
• Science Citation Index
Linked to from credible web sites?
• Google Page-rank
• ”link:URL” search

2can Bioinformatics Education
At EBI – European Bioinformatics Institute
http://www.ebi.ac.uk/2can/index.html
Tutorials, resource links, etc.

Starting Points
General Bioinformatics
• NCBI, National Center for Biotechnology Information, U.S.
• EBI, European Bioinformatics Institute
Prediction Tools
• CBS, DK
• Expasy (Protein analysis), Switzerland

Dynamic Resources
Pros
• Includes most recent developments
• Updated regularly
• User interface improves (usually)
Cons
• Difficult to keep pace
• Tutorials and lectures hard to recycle ;-(
• Difficult to use at irregular intervals

Genome Browsers - Portals to the Genomic World
Three main entry points:
• NCBI, UCSC, EnsEmbl
• Essentially contain same information
• High degree of linking to secondary databases
• Advisable to become familiar with only one genome browser
• Learn to navigate and make queries
GeneCards and OMIM
• well suited for getting a quick overview of a gene of interest

Prediction Servers
Evaluate scientific ’soundness’
• Look for indications of quality (citations, etc.)
Remember that prediction servers provide...well, predictions!

The End