Sequence Databases
What are they and why do we need them?
What is sequence data?
DNA, RNA and Protein (Amino Acids)
Why do I need it?
• Evolution
• Mutation
• Natural Selection
• Intra and Inter-species relationships
• Niche exploitation
• Ecosystems
REALLY?
YES!
(Diagram linking: Evolution, Mutation, Natural Selection, Phenotypes, Intra and Inter-species relationships, Niche exploitation, Ecosystems)
• Phenotypes come from proteins.
• Proteins come from DNA via RNA.
• Changes in DNA can cause changes in proteins.
• Changes in proteins can cause changes in phenotypes.
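The DNA to RNA to protein chain can be made concrete with a minimal Python sketch. It assumes the input is the coding strand of an open reading frame and uses the standard genetic code (NCBI translation table 1); the function names are illustrative, not taken from any library. The example uses the first 12 bases of the sorghum sequence shown later and shows that one base change alters the protein (GAC to GAA, Asp to Glu) while another does not (GAC to GAT, still Asp).

```python
# Minimal sketch: transcription and translation with the standard genetic code.
# Assumption: the input DNA is the coding (sense) strand of an open reading frame.

BASES = "UCAG"
# Standard genetic code (NCBI translation table 1), codons ordered U, C, A, G;
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def transcribe(dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def point_mutation(dna: str, position: int, new_base: str) -> str:
    """Return a copy of dna with the base at a 0-based position replaced."""
    return dna[:position] + new_base + dna[position + 1:]


if __name__ == "__main__":
    original = "ATGGACTTACCG"  # first 12 bases of the sorghum sequence
    for label, dna in [
        ("original", original),
        ("GAC -> GAA", point_mutation(original, 5, "A")),
        ("GAC -> GAT", point_mutation(original, 5, "T")),
    ]:
        print(label, translate(transcribe(dna)))
```

The original fragment translates to MDLP; the first mutation gives MELP (an amino-acid change), while the second still gives MDLP (a silent change).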
How do we find those changes?
Sequencing
Is the Sequence everything?
The sequence by itself is not informative; it must be
analyzed by comparative methods against existing
databases to develop hypotheses about its relatives
and function.
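The idea of comparing a sequence against a database can be illustrated with a toy search. This sketch is not BLAST; it only borrows the general idea of scoring database entries by the short words (k-mers) they share with a query. The database entries, their names, and k = 4 are hypothetical values chosen for illustration.

```python
# Toy illustration: rank entries in a small sequence "database" by how many
# of the query's k-mers they contain. Real tools such as BLAST use far more
# sophisticated seeding, extension, and statistics.


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_matches(query: str, database: dict[str, str], k: int = 4) -> list[tuple[str, float]]:
    """Rank entries by the fraction of the query's distinct k-mers they share."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        return []
    scores = []
    for name, seq in database.items():
        shared = len(query_kmers & kmers(seq, k))
        scores.append((name, shared / len(query_kmers)))
    # Highest score first; ties broken alphabetically by name.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical mini-database: names and sequences are made up.
    database = {
        "seq_A": "ATGGACTTACCGCTTTACCAACAACTG",
        "seq_B": "ATGCCCGGGTTTAAACCCGGGTTTAAA",
        "seq_C": "ATGGACTTACCGCTTTACC",
    }
    for name, score in rank_matches("GACTTACCGCTTTAC", database):
        print(f"{name}\t{score:.2f}")
```

Entries that contain the query score 1.0 and unrelated entries score near 0; a real search also has to judge whether a partial match is better than chance, which is where database search statistics come in.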
What do Databases let you do?
• Explore and investigate sequence data
   - Classify organisms
   - Assign a possible function to a gene
   - Verify a sequence's identity
   - Annotate a genome
   - Design primers for PCR and probe experiments
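For the primer-design use listed above, two quick checks are commonly applied to a candidate primer: its GC content and a rough melting temperature. This sketch uses the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C), which is only a rule of thumb for short oligonucleotides; the primer sequence is hypothetical. It also computes the reverse complement, which is how a reverse primer relates to the top strand.

```python
# Quick primer checks: GC content, Wallace-rule Tm, reverse complement.
# The Wallace rule is a rough estimate for short oligos; dedicated primer
# design tools use nearest-neighbor thermodynamic models instead.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Complement each base (A<->T, C<->G), then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    primer = "ATGGACTTACCGCTTTAC"  # hypothetical 18-mer
    print(f"GC content: {gc_content(primer):.0%}")
    print(f"Wallace Tm: {wallace_tm(primer)} C")
    print(f"Reverse complement: {reverse_complement(primer)}")
```

For this hypothetical 18-mer (10 A/T, 8 G/C) the rule gives 2 × 10 + 4 × 8 = 52 °C.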
What is a Database?
Databases let us find what we need more easily.
What Databases are there?
Ten Important Bioinformatics Databases
Name               Address               Description
GenBank/DDBJ/EMBL  www.ncbi.nlm.nih.gov  Nucleotide sequences
Ensembl            www.ensembl.org       Human/Mouse genome
PubMed             www.ncbi.nlm.nih.gov  Literature references
NR                 www.ncbi.nlm.nih.gov  Protein sequences
SWISS-PROT         www.expasy.ch         Protein sequences
InterPro           www.ebi.ac.uk         Protein domains
OMIM               www.ncbi.nlm.nih.gov  Genetic diseases
Enzymes            www.chem.qmul.ac.uk   Enzymes
PDB                www.rcsb.org/pdb/     Protein structures
KEGG               www.genome.ad.jp      Metabolic pathways
Many other specialized Databases are available.
Bioinformatics for Dummies, 2003
What Database should I use?
A.K.A. GenBank
How big is GenBank?
(Figure: GenBank size over time, annotated with these milestones)
1977  DNA sequencing
1985  PCR
1987  Automated sequencing
1997  Capillary sequencing
Who can put data into GenBank?
Sequence data are submitted to GenBank by
scientists from around the world.
Warning: GenBank does not check the validity or
accuracy of submitted sequences. As with all
published scientific data, verification is left
to the scientific community.
How do I use GenBank?
www.ncbi.nlm.nih.gov
Problem 1. You are constructing a phylogeny of
Euglenoids, and you have determined from the literature
that the beta-tubulin gene is a good gene for this purpose.
How do I start?
How do I use GenBank?
www.ncbi.nlm.nih.gov
Query: Euglenozoa AND tubulin NOT kinetoplastida
Accession: AF182759
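The same search can also be run programmatically through NCBI's E-utilities. This sketch only builds the request URLs for ESearch (run an Entrez query) and EFetch (download one record as FASTA); the endpoint and the parameter names used here (db, term, retmax, id, rettype, retmode) should be confirmed against the E-utilities documentation before use. The helper names are hypothetical, and NCBI asks users to keep their request rate low.

```python
# Sketch: build NCBI E-utilities URLs for the GenBank search above.
# Standard library only; nothing is downloaded unless fetch_text() is called.
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term: str, db: str = "nuccore", retmax: int = 20) -> str:
    """URL that runs an Entrez query and returns matching record IDs."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def efetch_fasta_url(accession: str, db: str = "nuccore") -> str:
    """URL that returns one record in FASTA format as plain text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


def fetch_text(url: str) -> str:
    """Download a URL and return its body as text (requires network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(esearch_url("Euglenozoa AND tubulin NOT kinetoplastida"))
    print(efetch_fasta_url("AF182759"))
```

Separating URL construction from downloading keeps the query logic easy to inspect before any request is sent.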
How do I use GenBank?
Problem 2. You are studying the domestication of Sorghum
vulgare. From reading about sorghum, you find out that it
is closely related to Zea mays (maize).
You also find out that maize has a wild relative,
teosinte, that forms multiple stalks. Domesticated maize
forms a single stalk. Domesticated sorghum has a single
stalk, while wild sorghum (Johnsongrass) has multiple
stalks.
(Photos)
Broomcorn (Sorghum vulgare): domesticated
Johnsongrass (Sorghum halepense): wild
How do I use GenBank?
Problem 2 (continued)
Moreover, a paper you have read states that this trait is
controlled by a single gene, teosinte branched 1 (tb1).
You wonder: does sorghum have this gene?
The paper also provides a set of PCR primers (forward and
reverse) that were used to isolate and sequence the tb1 gene.
Will they work for sorghum?
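One way to approach the question of whether the maize primers will work for sorghum, before going to the bench, is to look for near-matches of each primer in a related template sequence. This sketch slides each primer along a template, checking the primer itself (forward-primer sites) and its reverse complement (reverse-primer sites), and reports windows with at most a given number of mismatches. The template, primers, and threshold are hypothetical; an in-silico match does not guarantee amplification, and mismatches near a primer's 3′ end are especially likely to prevent it.

```python
# Sketch: find approximate primer binding sites on a template (no gaps).
# "+" hits: the primer sequence itself appears in the template (forward primer).
# "-" hits: the primer's reverse complement appears (reverse primer).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_sites(primer: str, template: str, max_mismatches: int = 2) -> list[tuple[str, int, int]]:
    """Return (strand, 0-based start on the template, mismatches) for each hit."""
    primer, template = primer.upper(), template.upper()
    n = len(primer)
    hits = []
    for strand, probe in (("+", primer), ("-", reverse_complement(primer))):
        for start in range(len(template) - n + 1):
            mm = mismatches(probe, template[start:start + n])
            if mm <= max_mismatches:
                hits.append((strand, start, mm))
    return hits


if __name__ == "__main__":
    # Hypothetical template and primers, for illustration only.
    template = "ATGGACTTACCGCTTTACCAACAACTGCAGCTCAGCCCGCCTTCCCC"
    forward_primer = "GACTTACCGCTT"
    reverse_primer = "GCTGAGCTGC"
    print("forward:", find_sites(forward_primer, template, 1))
    print("reverse:", find_sites(reverse_primer, template, 1))
```

Allowing a mismatch or two mimics the situation where a primer designed on maize is tried on a sorghum template that differs at a few positions.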
Sequencing Sorghum
Does sorghum have the tb1 gene?
>Sorghum_vulgare_sequence
ATGGACTTACCGCTTTACCAACAACTGCAGCTCAGCCCGCCTTCCCCAAAGCCGGACCAATCAAGCAGCT
TCTACTGCTGCTACCCATGCTCCCCTCCCTTCGCCGCCGCCGCCGCCGACGCCAGCTTTCACCTGAGCTA
CCAGATCGGTAGTGCCGCCGCCGCCATCCCTCCACAAGCCGTGATCAACTCGCCGGAGGACCTGCCGGTG
CAGCCGCTGATGGAGCAGGCGCCGGCGCCGCCTACAGAGCTTGTCGCCTGCGCCAGTGGTGGTGCACAAG
GCGCCGGCGTCAGCGTCAGCCTGGACAGGGCGGCGGCCGCGGCCGCCGCGAGGAAAGACCGGCACAGCAA
GATATGCACCGCCGGCGGGATGAGGGACCGCCGGATGCGGCTGTCCCTTGACGTCGCCCGCAAGTTCTTC
GCGCTCCAGGACATGCTTGGCTTCGACAAGGCCAGCAAGACGGTACAATGGCTCCTCAACACGTCCAAGG
CCGCCATCCAGGAGATCATGGCCGACGACGTCGACGCGTCGTCGGAGTGCGTGGAGGATGGCTCCAGCAG
CCTCTCCGTCGACGGCAAGCACAACCCGGCGGAGCAGCTGGGAGATCAGAAGCCCAAGGGTAATGGCCGC
AGCGAGGGGAAGAAGCCGGCCAAGTCAAGGAAGGCGGCGACCACCCCAAAGCCGCCAAGAAAATCGGGGA
ATAATGCGCACCCGGTCCCCGACAAGGAGACGAGGGCGAAGGCGAGGGAGAGGGCGAGGGAGCGAACCAA
GGAGAAGCACCGGATGCGTTGGGTAAAGCTTGCATCAGCAATTGACGTGGAGGCGGCGGCTGCCTCGGTG
GCTAGCGACAGGCCGAGCTCGAACCATTTGAACCACCACCACCACTCATCGTCGTCCATGAACATGCCGC
GTGCTGCGGAGGCTGAATTGGAGGAGAGGGAGAGGTGCTCATCAACTCTCAACAATAGAGGAAGGATGCA
AGAAATCACAGGGGCGAGCGAGGTGGTCCTAGGCTTTGGCAACGGAGGAGGATACGGCGGCGGCAACTAC
TACTGCCAAGAACAATGGGAACTCGGTGGAGTCGTCTTTCAGCAGAACTCACGCTTCTACTGA
www.ncbi.nlm.nih.gov/BLAST/
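Before pasting the sorghum sequence into BLAST, a quick sanity check can confirm that it looks like a complete coding sequence. This sketch parses FASTA text and checks for an ATG start, a length divisible by three, a stop codon at the end, and no in-frame stop codon before it. The toy record is hypothetical, and passing these checks says nothing about whether the sequence is tb1; that is what the BLAST comparison is for.

```python
# Sketch: parse FASTA text and run simple open-reading-frame checks.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records: dict[str, list[str]] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def orf_report(seq: str) -> dict[str, bool]:
    """Simple checks for a complete coding sequence read in frame 1."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    return {
        "starts_with_ATG": seq.startswith("ATG"),
        "length_multiple_of_3": len(seq) % 3 == 0,
        "ends_with_stop": bool(codons) and codons[-1] in STOP_CODONS,
        "no_internal_stop": not any(c in STOP_CODONS for c in codons[:-1]),
    }


if __name__ == "__main__":
    # Hypothetical toy record; paste the real FASTA text here instead.
    fasta_text = ">toy_example\nATGGACTTACCG\nGCTTTCTGA\n"
    for name, seq in parse_fasta(fasta_text).items():
        print(name, len(seq), orf_report(seq))
```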
Resources at NCBI
GenBank – Molecular Databases
Nucleotides, Proteins, Structures, Expression (ESTs) and
Taxonomy.
Literature Databases
PubMed, Journals, OMIM, Books, and Citation Matcher.
Genomes and Maps – Entrez
Map Viewer, UniGene, COGs, Organism-specific, Organelle,
Virus, and Plasmids.
Tools – Software Engineering
BLAST, Sequence Analysis, 3-D Structures, Gene Expression,
Literature and Genome Analysis.
Education
Books, Courses, Public Information.
Research
Biology, Computers.
Objectives
1. Explain what you can do with sequence data.
2. Explain what a database is.
3. Describe what kinds of data and resources are available.
4. Describe some of the uses of databases.
Other Specialty Databases