Using Nucleotide Sequence Databases
© Wiley Publishing. 2007. All Rights Reserved.

Learning Objectives
• Distinguish the structure of eukaryotic and prokaryotic genes
• Make sense of a GenBank entry
• Understand the difference between GenBank and a gene-centric resource
• Browse whole-genome databases

Outline
1. Reminder on genes and genomes
2. Searching GenBank (the DNA database)
3. Using gene-centric databases
4. Analyzing microbial genomes
5. Browsing the human genome

Typical Prokaryotic Genome
• Prokaryotes are microscopic organisms
• They have a circular genome
• Its length is a few million bp (0.6–10 Mb)
• Prokaryotes have about 1 gene per kb
• 70% of their genome codes for proteins
• Their genes usually do not overlap

Typical Prokaryotic Protein-Coding Gene
• The gene has an uninterrupted sequence
• Prokaryotic mRNA contains
    • The Ribosome Binding Site (RBS)
    • The Open Reading Frame (ORF) in one piece
    • In operons, the RNA can contain several ORFs

Typical Eukaryotic Genome
• Eukaryotes can be small (yeast) or big (whales)
• Genomes are made of linear pieces of DNA called chromosomes
• One chromosome: 10 to 700 Mb
• The Human Genome
    • Contains 22 + 1 chromosome pairs (22 autosomal pairs + 1 pair of sex chromosomes)
    • Is about 3 Gb long
• One gene every 100 kb (human)
• 5% of the genome codes for proteins

Typical Eukaryotic Protein-Coding Gene
• The coding sequences are made of coding exons separated by introns
• Introns are spliced out and exons are joined together to make the ORF
• One gene can code for several alternative proteins: alternative splicing

Prokaryotes vs. Eukaryotes
• Prokaryotes
    • Genome = one large circular chromosome + a few small circular plasmids
    • 0.5 to 8 Mb / chromosome
    • Genes in one piece
    • 70% of the genome is coding
    • 1 gene / kb
• Eukaryotes
    • Genome = many large linear chromosomes
    • 10 to 700 Mb / chromosome
    • Genes split into exons and introns
    • 5% of the genome is coding
    • 1 gene / 100 kb (human)

GenBank
• Housed by the National Center for Biotechnology Information (NCBI)
• GenBank is the memory of biological science
• Contains EVERY DNA sequence ever published
• GenBank is the original information source for most biological databases
• GenBank is more complicated to use than gene-centric databases

Reading a Prokaryotic GenBank Entry
• ACCESSION is the accession number
    • Unique to each entry
    • Permanent
• LOCUS contains information on the size of the sequence
• ORGANISM defines the organism containing the gene
• REFERENCE indicates who produced the sequence
• FEATURES lists some functional features of the gene
• GenBank entries can contain more than one gene

FEATURES Section of a GenBank Entry
• Promoter
    • Gives the precise coordinates of the promoter
    • There can be more than one promoter
• RBS gives the coordinates of the Ribosome Binding Site
• CDS gives all the properties of the coding sequence (CDS) that codes for the protein

Reading a Eukaryotic GenBank Entry
• The sections are the same as in a prokaryotic entry
• SOURCE contains a map section that indicates the chromosome containing the gene
• GENE gives the information needed to reconstruct the CDS from the gene
• Remember: eukaryotic genes are interrupted by introns

Assembling CDSs from a GenBank Entry
• The gene, mRNA, and CDS sections tell you which segments of which entry must be joined to reconstruct the gene, the mRNA, or the CDS
• A gene can code for several alternative mRNAs
• Example: the dUTPase gene codes for
    • Mitochondrial dUTPase
    • Nuclear dUTPase

Limitations of GenBank
• GenBank entries can contain
    • Entire genes
    • Portions of genes
    • Many genes
• GenBank entries can be of uneven quality
    • Entries can be duplicated and/or inaccurate
    • The database does not select or filter submissions
    • All data is treated equally
• GenBank entries are not the final word on particular genes
    • They have no authoritative biological meaning
    • They merely keep track of what was done
• Gene-centric databases are needed to compile everything that is known about a given gene and to correct potential errors

Using Gene-centric Databases: Entrez Gene
• Entrez Gene can be accessed from the NCBI
• In GenBank, each entry is one sequence from one publication
• In Entrez Gene, each entry is one gene
• Entrez Gene is built with GenBank data

Whole-Genome Databases
• The Entrez Genome resource provides access to whole-genome databases
• Use whole-genome sites to explore complete genomes of
    • Viruses
    • Prokaryotes
    • Eukaryotes
• A genome browser lets you get the details or the big picture
    • Zoom in on a precise gene
    • Zoom out to a portion of the genome
    • Visualize positions

Visualizing a Viral Genome at the NCBI
• Go to www.ncbi.nlm.nih.gov/entrez
• Select viruses on the left side
• Type HIV1
• The browser displays a map of the virus and links to information relevant to the virus and its proteins

Exploring the Human Genome with ENSEMBL
• Accessible at www.ensembl.org
• ENSEMBL is a database of eukaryotic genomes
    • Annotated entries
    • Wide range of examples: human, mouse, dog, and so on
• ENSEMBL annotation is mostly automated
• ENSEMBL contains tools to
    • Browse the complete genome
    • Search the complete genome with BLAST
    • Visualize the position of a gene
    • Visualize all experimental information on this gene (transcripts)

Visualizing Human Chromosomes on ENSEMBL
• By pointing at a chromosome region, you can zoom in on that region
• All genes are cross-indexed with other databases, so you can find all related experimental information

Going Further
• The TIGR Institute: www.tigr.org
    • TIGR = The Institute for Genomic Research
    • Specializes in prokaryotes
• The DoE Joint Genome Institute: img.jgi.doe.gov
    • DoE = Department of Energy (U.S. government agency)
    • Focuses on environmentally important prokaryotes
• University of California at Santa Cruz: genome.ucsc.edu
    • A very good alternative to ENSEMBL
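
Worked example: Assembling CDSs from a GenBank Entry

As a rough illustration of the "Assembling CDSs from a GenBank Entry" slide, the sketch below joins the exon segments listed in a CDS location to rebuild the coding sequence. It is a minimal sketch under stated assumptions, not a GenBank parser: the function names (parse_location, assemble_cds) and the toy sequence are hypothetical, and only simple forward-strand locations inside a single entry, such as join(3..8,15..20), are handled. Locations using complement(...), partial markers (< or >), or segments from another entry are rejected.

```python
import re

# Hypothetical helper names for illustration; not part of any NCBI tool.
_SEGMENT = re.compile(r"(\d+)\.\.(\d+)")


def parse_location(location: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) pairs from a simple location.

    Accepts forms such as '3..8' or 'join(3..8,15..20)'.
    """
    # Assumption: reverse-strand, partial, and remote segments are out of scope.
    unsupported = ("complement", "<", ">", ":")
    if any(token in location for token in unsupported):
        raise ValueError(f"unsupported location: {location}")
    segments = [(int(start), int(end)) for start, end in _SEGMENT.findall(location)]
    if not segments:
        raise ValueError(f"no start..end segment found in: {location}")
    return segments


def assemble_cds(sequence: str, location: str) -> str:
    """Concatenate the listed segments of `sequence`, in the order given."""
    # GenBank coordinates are 1-based and inclusive; Python slices are
    # 0-based and end-exclusive, hence start - 1.
    return "".join(sequence[start - 1:end] for start, end in parse_location(location))


# Toy entry: flanking bases, exon 1, one intron (lowercase), exon 2, flanking bases.
toy_entry = "cc" + "ATGGCC" + "gtaagt" + "TTTTAA" + "gg"

print(assemble_cds(toy_entry, "join(3..8,15..20)"))  # ATGGCCTTTTAA
```

The same join logic applies to the mRNA and gene sections of an entry; a gene with alternative mRNAs simply has several location strings, each assembled separately.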