Download TGAC * Sequence Polymorphisms Module

Biological Concepts

Genomes
- A genome is an organism’s entire complement of DNA.
- DNA is a directional molecule composed of two anti-parallel strands.
- The genetic code is read in the 5’ to 3’ direction; the names refer to the 5’ and 3’ carbons of deoxyribose.
- Eukaryotic genomes contain large amounts of repetitive DNA, including simple repeats and transposons.
- Transposons can be located in intergenic regions (between genes) or in introns (within genes).
- Genes and transposons are directional and can be encoded on either DNA strand.
- Repeats are non-directional and, in effect, occur on both strands.
- Transposons can mutate like any other DNA sequence.

Genes
- Protein-coding information in DNA and RNA begins with a start codon, continues with a series of codons, and ends with a stop codon.
- Codons in mRNA (5’-AUG-3’, etc.) have sequence equivalents in DNA (5’-ATG-3’, etc.).
- The DNA strand that is equivalent to the mRNA is called the “coding strand.” The complementary strand is called the “template strand” because it serves as the template for synthesizing mRNA.
- Non-spliced genes, which are characteristic of prokaryotes, are also found in eukaryotes.
- Even in a spliced gene, the protein-coding information may be organized as an open reading frame (ORF).
- Most eukaryotic genes are spliced: intervening segments (introns) are removed and the remaining segments (exons) are joined together.
- Splice sites (exon-intron boundaries) have sequence patterns that are recognized by the splicing apparatus (spliceosome).
- Gene prediction programs use consensus sequences around splice sites to predict exon-intron boundaries.
- Over 90% of eukaryotic introns have “canonical splice sites”: the intron begins with GT (mRNA: GU) and ends with AG (mRNA: AG).
- The protein-coding sequence of a eukaryotic mRNA (or gene) is flanked by 5’- and 3’-untranslated regions (UTRs); introns can be located in UTRs.
- In most eukaryotic genes, transcripts are alternatively spliced, yielding different mRNAs and proteins.
- UTRs carry information that determines the half-lives of mRNAs and serves other regulatory purposes.
- Gene > mRNA > CDS.
- CDS = nucleotides that encode the amino acid sequence.
- In mRNA: CDS = ORF.

BLAST Searches
- The Basic Local Alignment Search Tool (BLAST) searches databases for matches to a query DNA or protein sequence.
- Gene or protein homologs share sequence similarities due to descent from a common ancestor.
- Biological evidence is needed to edit and confirm gene models predicted by computer algorithms.
- Biological evidence is most often derived from mRNA transcripts (ESTs, cDNAs, RNAseq). Protein sequence data are available, too, but are much less common.
- Many ESTs and cDNAs are disrupted by “introns” when they are aligned against genomic DNA.
- ESTs and cDNAs may be incomplete.
- The BLAST algorithm does not resolve intron/exon boundaries.
- The BLAST algorithm is not restricted to detecting sequences that fully match a query (“global” matches); it also matches subsequences of the query (“local” matches).
- The BLAST algorithm matches sequences to the fullest extent possible and often realigns the same sequence twice.

Web Resources

A. Major Plant Genome Hubs:
- DOE JGI: http://www.phytozome.net
- University of Iowa: http://www.plantgdb.org/
- CSHL: http://www.gramene.org/
- ENSEMBL: http://plants.ensembl.org/index.html
- NCBI: http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html
- NCBI: http://www.ncbi.nlm.nih.gov/mapview/

B. Some Plant Genome Portals:
- Arabidopsis, TAIR: http://www.arabidopsis.org/
- Corn: http://www.maizesequence.org/index.html
- Grape: http://www.cns.fr/externe/GenomeBrowser/Vitis/
- Poplar: http://genome.jgi-psf.org/poplar/poplar.home.html
- Rice: http://rice.plantbiology.msu.edu/
- Tomato: http://solgenomics.net/about/tomato_sequencing.pl

C. Browsers:
- Ensembl: http://www.ensembl.org
- GBrowse: http://gmod.org/wiki/GBrowse
- JBrowse: http://jbrowse.org/
- UCSC Browser: http://genome.ucsc.edu
- xGDB: http://brendelgroup.org/bioinformatics2go/bioinformatics2go.php

D. Annotation Tools:
- Apollo: http://apollo.berkeleybop.org/current/index.html
- Artemis: http://www.sanger.ac.uk/resources/software/artemis/
- yrGATE: http://brendelgroup.org/bioinformatics2go/bioinformatics2go.php

E. Other Resources:
- Course download site: http://gfx.dnalc.org/files/evidence
- DynamicGene: http://www.sanger.ac.uk/resources/software/artemis/
- GeneBoy: http://www.dnai.org/geneboy/
- BioServers: http://www.bioservers.org/bioserver/
- mRNA/gDNA: http://www.ncbi.nlm.nih.gov/spidey/
- mRNA/gDNA: http://pbil.univ-lyon1.fr/sim4.php
- Splice site predictor: http://www.fruitfly.org/seq_tools/splice.html
- Promoter predictor: http://www.fruitfly.org/seq_tools/promoter.html
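The strand and codon statements under Genomes and Genes can be made concrete with a short Python sketch. It is only an illustration, not part of the course material: the function names, the example sequence, and the choice to count the stop codon toward `min_codons` are all assumptions made here.

```python
# Illustrative sketch only; all names below are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna: str) -> str:
    """Return the opposite (anti-parallel) strand, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def coding_to_mrna(coding_strand: str) -> str:
    """mRNA has the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def find_orfs(dna: str, min_codons: int = 2):
    """Yield (strand, start, end, sequence) for ATG...stop ORFs on both strands.

    Coordinates are 0-based, end-exclusive, and relative to the strand that
    was searched. The stop codon is included in the sequence and the count.
    Only the first ATG in each open stretch is reported.
    """
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        yield strand, start, i + 3, seq[start:i + 3]
                    start = None


coding = "CCATGGCTTAAGG"          # made-up example sequence
print(reverse_complement(coding))  # CCTTAAGCCATGG
print(coding_to_mrna(coding))      # CCAUGGCUUAAGG
print(list(find_orfs(coding)))     # [('+', 2, 11, 'ATGGCTTAA')]
```

Searching both the sequence and its reverse complement reflects the point that genes can be encoded on either strand.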
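The splicing statements (introns removed, exons joined, canonical introns beginning with GT and ending with AG) can be sketched in the same way. The coordinates, the toy gene, and the helper names are invented for illustration; real gene predictors use much richer consensus models than this two-base check.

```python
# Illustrative sketch only; the gene model and helper names are hypothetical.

def splice(genomic: str, exons):
    """Join exon segments given as (start, end) pairs, 0-based and end-exclusive."""
    return "".join(genomic[start:end] for start, end in exons)


def introns_between(exons):
    """Return the (start, end) gaps between consecutive exons."""
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def is_canonical(genomic: str, intron) -> bool:
    """True if the intron begins with GT and ends with AG on the coding strand."""
    start, end = intron
    seq = genomic[start:end].upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Toy gene: exon 1, a 12-base intron, exon 2.
genomic = "ATGGCC" + "GTAAGTTTTCAG" + "GGTTAA"
exons = [(0, 6), (18, 24)]

print(splice(genomic, exons))  # ATGGCCGGTTAA
for intron in introns_between(exons):
    print(intron, is_canonical(genomic, intron))  # (6, 18) True
```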
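The difference between “local” and “global” matches noted under BLAST Searches can be illustrated with a small dynamic-programming score. This is not the BLAST algorithm, which relies on heuristic seeding and extension; it is a plain Smith-Waterman (local) and Needleman-Wunsch (global) scoring sketch, with scoring values chosen arbitrarily for the example.

```python
# Illustrative sketch only; this is NOT how BLAST is implemented.
# Scoring parameters (match, mismatch, gap) are arbitrary assumptions.

def alignment_score(a: str, b: str, local: bool,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best alignment score of a against b.

    local=True  -> Smith-Waterman: best-scoring pair of subsequences.
    local=False -> Needleman-Wunsch: both sequences aligned end to end.
    """
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            curr.append(score)
        prev = curr
    return best if local else prev[-1]


query = "ACGT"
subject = "TTTTACGTTTTT"  # contains the query as a subsequence
print(alignment_score(query, subject, local=True))   # 4
print(alignment_score(query, subject, local=False))  # -12
```

The local score rewards the embedded match alone, while the global score is penalized for every unmatched flanking base, which is why a local method suits short queries such as ESTs searched against long genomic sequences.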
