Biological Concepts

Genomes

A genome is an organism’s entire complement of DNA. DNA is a directional molecule composed of two anti-parallel strands. The genetic code is read in the 5’ to 3’ direction, named for the 5’ and 3’ carbons of deoxyribose.

Eukaryotic genomes contain large amounts of repetitive DNA, including simple repeats and transposons. Transposons can be located in intergenic regions (between genes) or in introns (within genes). Genes and transposons are directional and can be encoded on either DNA strand. Simple repeats are non-directional and, in effect, occur on both strands. Transposons can mutate like any other DNA sequence.

Genes

Protein-coding information in DNA and RNA begins with a start codon, continues with a series of codons, and ends with a stop codon. Codons in mRNA (5’-AUG-3’, etc.) have sequence equivalents in DNA (5’-ATG-3’, etc.). The DNA strand that is equivalent to the mRNA is called the “coding strand.” The complementary strand is called the “template strand” because it serves as the template for synthesizing mRNA.

Non-spliced genes, which are characteristic of prokaryotes, are also found in eukaryotes. Even in a spliced gene, the protein-coding information may be organized as an open reading frame (ORF). Most eukaryotic genes are spliced: intervening segments (introns) are removed and the remaining segments (exons) are joined together. Splice sites (exon-intron boundaries) have sequence patterns that are recognized by the splicing apparatus (spliceosome). Gene prediction programs use consensus sequences around splice sites to predict exon-intron boundaries. Over 90% of eukaryotic introns have “canonical splice sites”: the intron begins with GT (pre-mRNA: GU) and ends with AG (pre-mRNA: AG).

The protein-coding sequence of a eukaryotic mRNA (or gene) is flanked by 5’- and 3’-untranslated regions (UTRs); introns can be located in UTRs. In most eukaryotic genes, transcripts are alternatively spliced, yielding different mRNAs and proteins.
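The strand and splicing relationships described above can be made concrete with a short sketch. The toy sequence and all function names here are illustrative assumptions, not part of the course material; the sketch only shows how the coding strand, template strand, mRNA, and a canonical GT...AG intron relate to one another.

```python
# Minimal sketch of strand and splicing relationships.
# The sequence and helper names are hypothetical, chosen for illustration.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the opposite strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

def coding_to_mrna(coding_strand):
    """mRNA has the coding-strand sequence, with U in place of T."""
    return coding_strand.replace("T", "U")

def has_canonical_splice_sites(intron):
    """True if the intron (coding-strand DNA) begins with GT and ends with AG."""
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")

def splice(gene, introns):
    """Remove introns given as 0-based, end-exclusive (start, end) pairs."""
    exons = []
    pos = 0
    for start, end in sorted(introns):
        exons.append(gene[pos:start])
        pos = end
    exons.append(gene[pos:])
    return "".join(exons)

# Toy gene on the coding strand: exon 1 + intron + exon 2.
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTCAG", "AAATAA"
gene = exon1 + intron + exon2

template = reverse_complement(gene)          # template strand, 5' to 3'
intron_span = (len(exon1), len(exon1) + len(intron))
mrna = coding_to_mrna(splice(gene, [intron_span]))
codons = [mrna[i:i + 3] for i in range(0, len(mrna), 3)]

print(has_canonical_splice_sites(intron))    # True
print(codons)                                # ['AUG', 'GCC', 'AAA', 'UAA']
```

In this toy case the spliced mRNA reads as a start codon (AUG), two further codons, and a stop codon (UAA). Real gene models involve many exons and consensus sequences that extend beyond the GT and AG dinucleotides.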
UTRs carry information that determines the half-lives of mRNAs and serves other regulatory purposes. Gene > mRNA > CDS. CDS = nucleotides that encode the amino acid sequence. In mRNA: CDS = ORF.

BLAST Searches

Basic Local Alignment Search Tool (BLAST) searches databases for matches to a query DNA or protein sequence. Gene or protein homologs share sequence similarities due to descent from a common ancestor.

Biological evidence is needed to edit and confirm gene models predicted by computer algorithms. Biological evidence is most often derived from mRNA transcripts (ESTs, cDNAs, RNAseq). Protein sequence data are available, too, but are much less common. When aligned against genomic DNA, many ESTs and cDNAs are interrupted by “introns,” because intron sequences are absent from the transcripts. ESTs and cDNAs may be incomplete.

The BLAST algorithm does not resolve intron/exon boundaries. The BLAST algorithm is not restricted to detecting sequences that fully match a query (“global” matches); it also matches query subsequences (“local” matches). The BLAST algorithm matches sequences to the fullest extent possible and often aligns the same sequence twice.

Web Resources

A. Major Plant Genome Hubs:
DOE JGI’s Phytozome: http://www.phytozome.net
University of Iowa: http://www.plantgdb.org/
CSHL: http://www.gramene.org/
ENSEMBL: http://plants.ensembl.org/index.html
NCBI: http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html
NCBI: http://www.ncbi.nlm.nih.gov/mapview/

B. Some Plant Genome Portals:
Arabidopsis, TAIR: http://www.arabidopsis.org/
Corn: http://www.maizesequence.org/index.html
Grape: http://www.cns.fr/externe/GenomeBrowser/Vitis/
Poplar: http://genome.jgi-psf.org/poplar/poplar.home.html
Rice: http://rice.plantbiology.msu.edu/
Tomato: http://solgenomics.net/about/tomato_sequencing.pl

C. Browsers:
Ensembl: http://www.ensembl.org
GBrowse: http://gmod.org/wiki/GBrowse
JBrowse: http://jbrowse.org/
UCSC Browser: http://genome.ucsc.edu
xGDB: http://brendelgroup.org/bioinformatics2go/bioinformatics2go.php

D. Annotation Tools:
Apollo: http://apollo.berkeleybop.org/current/index.html
Artemis: http://www.sanger.ac.uk/resources/software/artemis/
yrGATE: http://brendelgroup.org/bioinformatics2go/bioinformatics2go.php

E. Other Resources:
Course download site: http://gfx.dnalc.org/files/evidence
DynamicGene: http://www.sanger.ac.uk/resources/software/artemis/
GeneBoy: http://www.dnai.org/geneboy/
BioServers: http://www.bioservers.org/bioserver/
mRNA/gDNA: http://www.ncbi.nlm.nih.gov/spidey/
mRNA/gDNA: http://pbil.univ-lyon1.fr/sim4.php
Splice site predictor: http://www.fruitfly.org/seq_tools/splice.html
Promoter predictor: http://www.fruitfly.org/seq_tools/promoter.html
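
The distinction between “local” and “global” matches drawn under BLAST Searches can be illustrated with a small dynamic-programming sketch. Note the assumption: this is the Smith-Waterman local alignment recurrence, not BLAST itself, which relies on faster heuristics to find similar local matches. The function name and the scoring values (match 2, mismatch -1, gap -2) are arbitrary choices for illustration.

```python
# Minimal sketch of local alignment scoring (Smith-Waterman recurrence).
# This is NOT the BLAST algorithm; it only illustrates what a "local" match is.
# Function name and scoring parameters are hypothetical.

def local_alignment(query, subject, match=2, mismatch=-1, gap=-2):
    """Return (best score, (query_end, subject_end)) using 1-based end positions."""
    rows, cols = len(query) + 1, len(subject) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            same = query[i - 1] == subject[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            # Flooring at 0 lets an alignment start anywhere: this is what
            # makes the match local rather than global.
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    return best, best_pos

# A short query embedded in a longer subject still scores as a full match.
print(local_alignment("ATGGCC", "TTTTATGGCCTTTT"))  # (12, (6, 10))
```

Because the score is floored at zero, the flanking T residues in the subject carry no penalty. This is also why a cDNA aligned against genomic DNA tends to produce separate local matches for each exon rather than one alignment spanning the introns.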