Introduction to Bioinformatics
Part 2 of 2
M.E:440.714
September 8, 2003
Jonathan Pevsner, Ph.D.

Copyright notice
Many of the images in this PowerPoint presentation are from Bioinformatics and Functional Genomics by Jonathan Pevsner (ISBN 0-471-21004-8). Copyright © 2003 by John Wiley & Sons, Inc. These images and materials may not be used without permission from the publisher. We welcome instructors to use these PowerPoints for educational purposes, but please acknowledge the source.

The book has a homepage at http://www.bioinfbook.org, including hyperlinks to the book chapters.
We posted 1000 bioinformatics links here: http://pevsnerlab.kennedykrieger.org (then click “bioinformatics”).

Question #3: How can I use NCBI (or other sites) to find information about a protein or gene?

Four ways to access protein and DNA sequences
[1] LocusLink with RefSeq
[2] Entrez
[3] UniGene
[4] ExPASy Sequence Retrieval System (SRS; this is separate from NCBI)

[1] LocusLink with RefSeq
LocusLink is a great starting point: it collects key information on each gene/protein from major databases. It now covers 8 organisms. RefSeq provides a curated, optimal accession number for each DNA sequence (for example, NM_006744) or protein sequence (for example, NP_007635).

What is an accession number?
An accession number is a label that is used to identify a sequence. It is a string of letters and/or numbers that corresponds to a molecular sequence.
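To make the RefSeq prefixes concrete, here is a minimal Python sketch that labels an accession number by its prefix. The function name describe_refseq and its lookup table are hypothetical. The table covers only the NM_, NP_, and NT_ prefixes that appear in this lecture's examples, plus an optional ".version" suffix such as ".1". RefSeq defines more prefixes than these. Accessions from other databases (GenBank, SwissProt, PDB) use different formats, and this sketch simply reports them as "not RefSeq".

```python
import re

# Hypothetical, simplified subset of RefSeq prefixes taken from this lecture's
# examples; RefSeq defines additional prefixes that this sketch ignores.
REFSEQ_PREFIXES = {
    "NM": "RefSeq DNA sequence (from a transcript)",
    "NP": "RefSeq protein",
    "NT": "RefSeq genomic contig",
}

# Two capital letters, an underscore, digits, and an optional ".version".
REFSEQ_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def describe_refseq(accession: str) -> str:
    """Return a short description of a RefSeq accession, or 'not RefSeq'."""
    match = REFSEQ_RE.match(accession.strip())
    if match is None or match.group(1) not in REFSEQ_PREFIXES:
        return "not RefSeq"
    label = REFSEQ_PREFIXES[match.group(1)]
    version = match.group(3)  # None when no ".N" suffix is present
    return f"{label}, version {version}" if version else label


# Usage: the first three are RefSeq-style; X02775 is a GenBank accession.
for acc in ["NM_006744", "NP_007635", "NP_007635.1", "X02775"]:
    print(acc, "->", describe_refseq(acc))
```

A regular expression is used so that malformed strings (for example, a missing underscore) fall through to "not RefSeq" instead of being mislabelled.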
Examples (all for retinol-binding protein, RBP4):

Molecule   Accession    Description
DNA        X02775       GenBank genomic DNA sequence
DNA        NT_030059    Genomic contig
DNA        rs7079946    dbSNP (single nucleotide polymorphism)
RNA        N91759.1     An expressed sequence tag (1 of 170)
RNA        NM_006744    RefSeq DNA sequence (from a transcript)
Protein    NP_007635    RefSeq protein
Protein    AAC02945     GenBank protein
Protein    Q28369       SwissProt protein
Protein    1KT7         Protein Data Bank structure record

[2] Entrez
Entrez is divided into sites for nucleotide, protein, structure, genomes, OMIM, and more. You can use limits (such as RefSeq) to focus your Entrez search.
[Figures: FASTA format; Graphics format]

[3] UniGene
UniGene collects expressed sequence tags (ESTs) into clusters, in an attempt to form one gene per cluster. Use UniGene to study where in the body your gene is expressed, when it is expressed, and how abundant its expression is.

[4] ExPASy SRS
There are many bioinformatics servers outside NCBI. Try ExPASy’s Sequence Retrieval System at http://www.expasy.ch/ (ExPASy = Expert Protein Analysis System), or try ENSEMBL at www.ensembl.org, a premier human genome web browser.

Question #4: How can I find information about a particular disease?
Answer: Try OMIM.

There are two main kinds of disease databases: general and locus-specific.
General:
OMIM
GeneCards (Weizmann): http://bioinformatics.weizmann.ac.il/cards/
Genes & Disease (at NCBI): http://www.ncbi.nlm.nih.gov/disease/
Locus-specific:
Human Gene Mutation Database (HGMD): http://archive.uwcm.ac.uk/uwcm/mg/docs/oth_mut.html

Course sponsors
Dean’s Office, School of Medicine
Division of Health Sciences Informatics
Welch Medical Library
Kennedy Krieger Institute
Dept. of Neuroscience
Dept. of Biostatistics, School of Public Health
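The Entrez discussion mentions FASTA format. In this plain-text layout, each record starts with a ">" definition line, followed by one or more lines of sequence. The Python sketch below reads such text into a dictionary. It is an illustration, not an NCBI tool. The function name parse_fasta and the example record are hypothetical. The sketch assumes that headers are unique, so a later record with the same header would overwrite an earlier one.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}.

    Each record starts with a '>' definition line; the lines that follow,
    up to the next '>', are concatenated into a single sequence string.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)  # close the final record
    return records


# Hypothetical record: the header and bases are made up for illustration.
example = """>example_sequence hypothetical record
ATGAAATGG
GTATGA
"""
print(parse_fasta(example))
# prints {'example_sequence hypothetical record': 'ATGAAATGGGTATGA'}
```

Joining the sequence lines with "".join keeps the parser independent of line width, because different FASTA files wrap sequences at different lengths.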