Introduction to Bioinformatics
Part 2 of 2
M.E:440.714
September 8, 2003
Jonathan Pevsner, Ph.D.
[email protected]
Copyright notice
Many of the images in this powerpoint presentation
are from Bioinformatics and Functional Genomics
by Jonathan Pevsner (ISBN 0-471-21004-8).
Copyright © 2003 by John Wiley & Sons, Inc.
These images and materials may not be used
without permission from the publisher. We welcome
instructors to use these powerpoints for educational
purposes, but please acknowledge the source.
The book has a homepage at http://www.bioinfbook.org,
including hyperlinks to the book chapters.
We have posted 1000 bioinformatics links at
http://pevsnerlab.kennedykrieger.org (click “bioinformatics”).
Question #3:
How can I use NCBI
(or other sites)
to find information
about a protein
or gene?
Four ways to access protein and
DNA sequences
[1] LocusLink with RefSeq
[2] Entrez
[3] UniGene
[4] ExPASy Sequence Retrieval System
(this is separate from NCBI)
4 ways to access protein and DNA sequences
[1] LocusLink with RefSeq
LocusLink is a great starting point: it collects
key information on each gene/protein from
major databases. It now covers 8 organisms.
RefSeq provides a curated, optimal accession
number for each DNA (NM_006744)
or protein (NP_007635)
[2] Entrez
[3] UniGene
[4] ExPASy SRS
What is an accession number?
An accession number is a label that is used to identify a
sequence. It is a string of letters and/or numbers that
corresponds to a molecular sequence.
Examples (all for retinol-binding protein, RBP4):

DNA
  X02775       GenBank genomic DNA sequence
  NT_030059    Genomic contig
  rs7079946    dbSNP (single nucleotide polymorphism)
RNA
  N91759.1     An expressed sequence tag (1 of 170)
  NM_006744    RefSeq DNA sequence (from a transcript)
Protein
  NP_007635    RefSeq protein
  AAC02945     GenBank protein
  Q28369       SwissProt protein
  1KT7         Protein Data Bank structure record
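The accession examples above follow recognizable naming patterns, so a short script can make a first guess at which database an identifier comes from. The sketch below is illustrative only: the function name guess_accession_type and the regular expressions are assumptions, not part of the lecture. The patterns are deliberately simplified and cover just the formats on this slide. Some formats overlap (a SwissProt accession such as Q28369 also looks like a one-letter GenBank accession), so the order of the checks matters and the result should be treated as a guess.

```python
import re

# Simplified, illustrative patterns (assumptions, not official format rules).
# Order matters: the more specific formats are tested first.
_PATTERNS = [
    ("RefSeq mRNA", re.compile(r"^NM_\d+(\.\d+)?$")),
    ("RefSeq protein", re.compile(r"^NP_\d+(\.\d+)?$")),
    ("RefSeq genomic contig", re.compile(r"^NT_\d+(\.\d+)?$")),
    ("dbSNP", re.compile(r"^rs\d+$", re.IGNORECASE)),
    ("Protein Data Bank", re.compile(r"^\d[A-Za-z0-9]{3}$")),
    ("SwissProt", re.compile(r"^[OPQ]\d[A-Z0-9]{3}\d$")),
    ("GenBank protein", re.compile(r"^[A-Z]{3}\d{5}(\.\d+)?$")),
    ("GenBank nucleotide", re.compile(r"^([A-Z]\d{5}|[A-Z]{2}\d{6})(\.\d+)?$")),
]


def guess_accession_type(accession):
    """Return a best-guess database label for an accession string."""
    acc = accession.strip()
    for label, pattern in _PATTERNS:
        if pattern.match(acc):
            return label
    return "unknown"


if __name__ == "__main__":
    # The RBP4 examples from the slide.
    for acc in ["X02775", "NT_030059", "N91759.1", "NM_006744",
                "NP_007635", "AAC02945", "Q28369", "1KT7"]:
        print(acc, "->", guess_accession_type(acc))
```

A pattern match only suggests the source database; looking the identifier up in the database itself is the reliable check.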
4 ways to access protein and DNA sequences
[1] LocusLink with RefSeq
[2] Entrez
Entrez is divided into sites for nucleotide, protein,
structure, genomes, OMIM, and more. You can use limits
(such as RefSeq) to focus your Entrez search.
[3] UniGene
[4] ExPASy SRS
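The Entrez searches described above are run through the web pages, but the same kind of query can be sketched programmatically. The example below assumes NCBI's E-utilities interface (esearch and efetch), which the slides do not mention. The helper names entrez_search_url and entrez_fasta_url are hypothetical, and the filter term srcdb_refseq[PROP] is assumed here to play the role of the RefSeq limit. The code only builds URLs and makes no network request.

```python
from urllib.parse import urlencode

# Base address of NCBI E-utilities (an assumption: the slides use the web forms).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def entrez_search_url(db, term, refseq_only=False, retmax=20):
    """Build an esearch URL for one Entrez database (e.g. 'protein')."""
    if refseq_only:
        # Assumed equivalent of ticking the RefSeq limit on the web page.
        term = f"({term}) AND srcdb_refseq[PROP]"
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


def entrez_fasta_url(db, accession):
    """Build an efetch URL that requests one record as FASTA text."""
    query = urlencode({"db": db, "id": accession,
                       "rettype": "fasta", "retmode": "text"})
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


if __name__ == "__main__":
    print(entrez_search_url("protein", "RBP4[Gene Name]", refseq_only=True))
    print(entrez_fasta_url("protein", "NP_007635"))
```

Building the URL separately from fetching it makes it easy to inspect the query in a browser before running it in bulk.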
FASTA format
Graphics format
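In FASTA format, each record is a header line beginning with ">" followed by one or more lines of sequence. The minimal parser below is a sketch under that assumption only. The function name parse_fasta is hypothetical, and the record in the usage example is a made-up toy, not a real database entry.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may wrap over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Toy input for illustration; not a real sequence record.
    toy = ">toy_record_1 made-up example\nACGT\nTTGA\n>toy_record_2\nGGCC\n"
    for name, seq in parse_fasta(toy):
        print(name, len(seq))
```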
4 ways to access protein and DNA sequences
[1] LocusLink with RefSeq
[2] Entrez
[3] UniGene
UniGene collects expressed sequence tags (ESTs)
into clusters, in an attempt to form one gene per cluster.
Use UniGene to study where your gene is expressed
in the body, when it is expressed, and see its abundance.
[4] ExPASy SRS
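The idea behind using UniGene to see where a gene is expressed can be sketched as counting the ESTs in a cluster by the tissue they came from. The example below is a toy illustration, not UniGene's actual method or data format: the function tissue_profile, the record fields, and every EST and cluster identifier are hypothetical.

```python
from collections import Counter


def tissue_profile(ests, cluster_id):
    """Return the fraction of a cluster's ESTs that come from each tissue."""
    counts = Counter(est["tissue"] for est in ests
                     if est["cluster"] == cluster_id)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tissue: n / total for tissue, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical EST records; identifiers and tissues are invented.
    toy_ests = [
        {"id": "est_a", "cluster": "cluster_1", "tissue": "liver"},
        {"id": "est_b", "cluster": "cluster_1", "tissue": "liver"},
        {"id": "est_c", "cluster": "cluster_1", "tissue": "kidney"},
        {"id": "est_d", "cluster": "cluster_1", "tissue": "liver"},
        {"id": "est_e", "cluster": "cluster_2", "tissue": "brain"},
    ]
    print(tissue_profile(toy_ests, "cluster_1"))
```

EST counts give only a rough view of abundance, since they depend on which tissue libraries were sequenced and how deeply.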
4 ways to access protein and DNA sequences
[1] LocusLink with RefSeq
[2] Entrez
[3] UniGene
[4] ExPASy SRS
There are many bioinformatics servers outside NCBI.
Try ExPASy’s sequence retrieval system at
http://www.expasy.ch/
(ExPASy = Expert Protein Analysis System)
Or try ENSEMBL at www.ensembl.org for a premier
human genome web browser.
Question #4:
How can I find
information about
a particular disease?
Answer:
Try OMIM
Two main kinds of disease databases:
general and locus-specific
General
OMIM
GeneCards (Weizmann)
http://bioinformatics.weizmann.ac.il/cards/
Genes & Disease (at NCBI)
http://www.ncbi.nlm.nih.gov/disease/
Locus-specific
Human Gene Mutation Database (HGMD)
http://archive.uwcm.ac.uk/uwcm/mg/docs/oth_mut.html
Course sponsors
Dean’s Office, School of Medicine
Division of Health Sciences Informatics
Welch Medical Library
Kennedy Krieger Institute
Dept. of Neuroscience
Dept. of Biostatistics, School of Public Health