Download Slide - CSUS

Accessing information on molecular sequences
Bio 224
Dr. Tom Peavy
Sept 1, 2010
What is an accession number?
An accession number is a label that is used to identify a sequence. It is a string of letters and/or numbers that corresponds to a molecular sequence.
Examples (all for retinol-binding protein, RBP4):
DNA
  X02775       GenBank genomic DNA sequence
  NT_030059    Genomic contig
  rs7079946    dbSNP (single nucleotide polymorphism)
RNA
  N91759.1     An expressed sequence tag (1 of 170)
  NM_006744    RefSeq DNA sequence (from a transcript)
Protein
  NP_007635    RefSeq protein
  AAC02945     GenBank protein
  Q28369       SwissProt protein
  1KT7         Protein Data Bank structure record
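The example accessions above have recognizably different shapes. As a rough illustration only, the Python sketch below guesses the source database from an accession's shape. The function name guess_source is hypothetical, and the patterns are simplified assumptions: only the UniProt pattern follows UniProt's published accession format, and the others are approximations. Real identifiers can be ambiguous (a single letter followed by five digits may be GenBank or UniProt), so the result is a guess, not a database lookup.

```python
import re

# Hypothetical heuristic: simplified identifier patterns, checked in order.
# Only the UniProt pattern follows a published format; the rest are approximations.
PATTERNS = [
    ("dbSNP", re.compile(r"rs\d+")),
    ("RefSeq", re.compile(r"[A-Z]{2}_[A-Z0-9]+")),
    ("PDB", re.compile(r"\d[A-Z0-9]{3}")),
    ("UniProt", re.compile(r"[OPQ]\d[A-Z0-9]{3}\d|[A-NR-Z]\d(?:[A-Z][A-Z0-9]{2}\d){1,2}")),
    ("GenBank protein", re.compile(r"[A-Z]{3}\d{5}")),
    ("GenBank nucleotide", re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")),
]


def guess_source(accession: str) -> str:
    """Guess which database an accession comes from, based only on its shape."""
    base = accession.split(".")[0]  # drop a version suffix such as the ".1" in N91759.1
    for name, pattern in PATTERNS:
        if pattern.fullmatch(base):  # the whole string must match the pattern
            return name
    return "unknown"


for acc in ["X02775", "NT_030059", "rs7079946", "N91759.1", "NM_006744",
            "NP_007635", "AAC02945", "Q28369", "1KT7"]:
    print(acc, guess_source(acc))
```

The patterns are tried from most to least distinctive, so an accession with an underscore or an "rs" prefix is classified before the looser letter-plus-digit shapes are considered.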
NCBI’s RefSeq project: accession formats for genomic, mRNA, and protein sequences
Accession        Molecule  Method           Note
AC_123456        Genomic   Mixed            Alternate complete genomic
AP_123456        Protein   Mixed            Protein products; alternate
NC_123456        Genomic   Mixed            Complete genomic molecules
NG_123456        Genomic   Mixed            Incomplete genomic regions
NM_123456        mRNA      Mixed            Transcript products; mRNA
NM_123456789     mRNA      Mixed            Transcript products; 9-digit
NP_123456        Protein   Mixed            Protein products;
NP_123456789     Protein   Curation         Protein products; 9-digit
NR_123456        RNA       Mixed            Non-coding transcripts
NT_123456        Genomic   Automated        Genomic assemblies
NW_123456        Genomic   Automated        Genomic assemblies
NZ_ABCD12345678  Genomic   Automated        Whole genome shotgun data
XM_123456        mRNA      Automated        Transcript products
XP_123456        Protein   Automated        Protein products
XR_123456        RNA       Automated        Transcript products
YP_123456        Protein   Auto. & Curated  Protein products
ZP_12345678      Protein   Automated        Protein products
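The two-letter prefix before the underscore is what distinguishes these RefSeq accessions. The sketch below transcribes the Molecule and Method columns of the RefSeq table into a lookup. It covers only the prefixes listed in that table, and the function name refseq_info is hypothetical.

```python
# Molecule type and method per RefSeq prefix, transcribed from the RefSeq table.
# Only the prefixes listed in that table are included.
REFSEQ_PREFIXES = {
    "AC": ("Genomic", "Mixed"),
    "AP": ("Protein", "Mixed"),
    "NC": ("Genomic", "Mixed"),
    "NG": ("Genomic", "Mixed"),
    "NM": ("mRNA", "Mixed"),
    "NP": ("Protein", "Mixed"),
    "NR": ("RNA", "Mixed"),
    "NT": ("Genomic", "Automated"),
    "NW": ("Genomic", "Automated"),
    "NZ": ("Genomic", "Automated"),
    "XM": ("mRNA", "Automated"),
    "XP": ("Protein", "Automated"),
    "XR": ("RNA", "Automated"),
    "YP": ("Protein", "Auto. & Curated"),
    "ZP": ("Protein", "Automated"),
}


def refseq_info(accession: str) -> tuple[str, str]:
    """Return (molecule, method) for a RefSeq accession such as 'NM_006744'."""
    prefix, sep, rest = accession.partition("_")
    if not sep or prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"not a RefSeq accession: {accession!r}")
    molecule, method = REFSEQ_PREFIXES[prefix]
    digits = rest.split(".")[0]  # ignore a version suffix such as ".2"
    # The table lists a separate method ("Curation") for 9-digit NP_ accessions.
    if prefix == "NP" and len(digits) == 9:
        method = "Curation"
    return molecule, method


print(refseq_info("NM_006744"))  # ('mRNA', 'Mixed')
```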
Six ways to access DNA and protein sequences
1) Entrez Gene with RefSeq database (NCBI)
2) UniGene
3) Nucleotide or Protein databases (NCBI)
4) European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI)
5) ExPASy Sequence Retrieval System (separate from NCBI)
6) UCSC Genome Browser
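Besides the web interfaces listed above, NCBI records can also be retrieved programmatically through the Entrez E-utilities. The sketch below only builds an efetch request URL asking for a record in FASTA format (db, id, rettype and retmode are efetch parameters); the helper name efetch_fasta_url is hypothetical. Actually sending the request (for example with urllib.request.urlopen) needs network access and should respect NCBI's usage limits.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str, db: str = "protein") -> str:
    """Build (but do not send) an efetch URL returning one record as FASTA text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


print(efetch_fasta_url("NP_007635"))
# prints https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_007635&rettype=fasta&retmode=text
```

For a nucleotide record such as NM_006744, the same helper would be called with db="nuccore".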
What is an EST?
• Expressed Sequence Tag
• “A short strand of DNA that is part of a cDNA molecule and can act as an identifier of a gene.”
• In essence, a single-pass DNA sequencing reaction for a particular cDNA
UniGene: unique genes via ESTs
• UniGene at NCBI: www.ncbi.nlm.nih.gov/UniGene
• UniGene clusters contain many ESTs, which are DNA sequences (typically about 500 base pairs long) corresponding to the mRNA from an expressed gene. ESTs are sequenced from a complementary DNA (cDNA) library.
• UniGene data come from many cDNA libraries. Thus, when you look up a gene in UniGene, you get information on its abundance (how many ESTs represent it) and its regional distribution (which tissues or libraries those ESTs came from).
Pages 20-21
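Raw EST counts are only comparable across tissues if you account for how many ESTs were sequenced from each tissue. One simple normalization, shown below as an assumption rather than UniGene's exact procedure, divides a gene's EST count in each tissue by that tissue's total EST count and scales to "per million ESTs". The function name and all tissue counts are made up for illustration.

```python
# Hypothetical sketch: per-tissue abundance of one gene, normalized by the total
# number of ESTs sequenced from each tissue and expressed per million ESTs.


def ests_per_million(gene_ests: dict[str, int],
                     total_ests: dict[str, int]) -> dict[str, float]:
    """Return the gene's ESTs per million ESTs sampled, for each tissue."""
    return {
        tissue: 1e6 * count / total_ests[tissue]
        for tissue, count in gene_ests.items()
    }


gene_ests = {"liver": 30, "brain": 2}              # made-up ESTs for one gene
total_ests = {"liver": 100_000, "brain": 200_000}  # made-up totals per tissue
print(ests_per_million(gene_ests, total_ests))     # {'liver': 300.0, 'brain': 10.0}
```

With these made-up numbers the gene looks 30 times more abundant in liver than in brain, even though the brain library was twice as large.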
Cluster sizes in UniGene
This is a gene with 1 associated EST; the cluster size is 1.
This is a gene (1 cluster) with 10 associated ESTs; the cluster size is 10.
Note: HTC = high-throughput cDNAs
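A cluster's size is simply the number of ESTs assigned to it. Assuming each EST has already been assigned to exactly one cluster (the clustering step itself is not shown), the sizes can be tallied as below. The function name and the EST and cluster identifiers are hypothetical.

```python
from collections import Counter


def cluster_sizes(est_to_cluster: dict[str, str]) -> Counter:
    """Cluster size = number of ESTs assigned to each cluster."""
    return Counter(est_to_cluster.values())


# Made-up assignments mirroring the two cases above:
# cluster "A" has one EST, cluster "B" has ten.
assignments = {"est_a0": "A"}
assignments.update({f"est_b{i}": "B" for i in range(10)})
sizes = cluster_sizes(assignments)
print(sizes["A"], sizes["B"])  # 1 10
```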
FASTA format
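In FASTA format, each record begins with a single header line starting with ">" (an identifier, optionally followed by a description), and the sequence follows on one or more lines until the next ">". The sketch below reads and writes that layout. The function names, the 60-character line width, and the example records are illustrative assumptions, and no checking of the sequence characters is done.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:  # close the last record
        records.append((header, "".join(chunks)))
    return records


def format_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Write one record, wrapping the sequence at `width` characters per line."""
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


example = ">seq1 made-up protein\nACDEFGHIK\nLMNPQRST\n>seq2\nACGT\n"
print(parse_fasta(example))
# [('seq1 made-up protein', 'ACDEFGHIKLMNPQRST'), ('seq2', 'ACGT')]
```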
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=homologene
Orthologous genes for various model species can be easily identified using this site (curated database).