Working with a Single DNA Sequence
© Wiley Publishing. 2007. All Rights Reserved.

Learning Objectives
• Discover how to manipulate your DNA sequence on a computer, analyze its composition, predict its restriction map, and amplify it with PCR
• Find out about gene-prediction methods, their potential, and their limitations
• Understand how genomes are sequenced and assembled

Outline
1. Cleaning your DNA of contaminants
2. Digesting your DNA in the computer
3. Finding protein-coding genes in your DNA sequence
4. Assembling a genome

Cleaning DNA Sequences
• To sequence genomes, DNA fragments are often cloned into a vector (plasmid, YAC, or cosmid)
• Vector sequence can end up mixed with your DNA sequence
• Before working with your DNA sequence, always screen it with VecScreen and remove any vector segments it reports

Computing a Restriction Map
• Restriction enzymes cut DNA molecules at specific recognition sequences
• Each type of restriction enzyme recognizes and cuts a different sequence:
  ◦ EcoRI: GAATTC
  ◦ BamHI: GGATCC
• There are more than 900 different restriction enzymes, each with its own specificity
• The restriction map is the list of all potential cleavage sites in a DNA molecule
• You can compute a restriction map with www.firstmarket.com/cutter

Doing PCR with a Computer
• The polymerase chain reaction (PCR) is a method for amplifying DNA
• PCR is used for many applications, including:
  ◦ Gene cloning
  ◦ Forensic analysis
  ◦ Paternity tests
• PCR amplifies the DNA between two anchors
• These anchors are called PCR primers

Designing PCR Primers
• PCR primers are typically about 20 nucleotides long
• The primers must hybridize well with the DNA
• On biotools.umassmed.edu, you can find the best locations for the primers:
  ◦ Most stable
  ◦ Longest extension

Analyzing DNA Composition
• DNA composition varies a lot, both between species and along a single genome
• The stability of a DNA sequence depends on its G+C content (the proportion of guanine and cytosine)
• A high G+C content makes DNA molecules very stable
• Online resources are available to measure the GC content of your DNA sequence

Predicting Genes
• The most important analysis carried out on DNA sequences is gene prediction
• Gene prediction requires different methods for eukaryotes and prokaryotes
• Most gene-prediction methods use hidden Markov models (HMMs)

Predicting Genes in Prokaryotic Genomes
• In prokaryotes, protein-coding genes are uninterrupted
  ◦ No introns
• Predicting protein-coding genes in prokaryotes is considered a solved problem
  ◦ You can expect 99% accuracy

Finding Prokaryotic Genes with GeneMark
• GeneMark is the state of the art for microbial genomes
• GeneMark can:
  ◦ Find short proteins
  ◦ Resolve overlapping genes
  ◦ Identify the best start codon
• GeneMark uses hidden Markov models
• Use exon.gatech.edu/GeneMark

Predicting Eukaryotic Genes
• Eukaryotic genes (human genes, for example) are very hard to predict
• Precise and accurate eukaryotic gene prediction is still an open problem
  ◦ Ensembl contains 21,662 genes for the human genome
  ◦ There may well be more genes in the genome that have not yet been predicted
• You can expect about 70% accuracy on the human genome with automatic methods
• Experimental information is still needed to predict eukaryotic genes

Finding Eukaryotic Genes with GenomeScan
• GenomeScan is the state of the art for eukaryotic genes
• GenomeScan works best with:
  ◦ Long exons
  ◦ Genes with a low GC content
• GenomeScan uses:
  ◦ Hidden Markov models
  ◦ Homology searches
• It can incorporate experimental information
• Use genes.mit.edu/genomescan

Producing Genomic Data
• Until recently, sequencing an entire genome was very expensive and difficult
• Only major institutes could do it
• At the time of writing (2007), scientists estimated that within 10 years it would cost about $1,000 to sequence a human genome
• With sequencing becoming this cheap, assembling your own genomes is becoming an option
• How could you do it?
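The Cleaning DNA Sequences slide relies on VecScreen, which runs a BLAST search of your sequence against the UniVec vector database. A much cruder way to illustrate the idea is to mask every stretch of a read that shares an exact k-mer with a known vector sequence, on either strand. This is a minimal sketch under that assumption, and it is not VecScreen's method. The helper name `mask_vector` and the default `k=12` are hypothetical choices.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def mask_vector(seq, vector, k=12, mask_char="X"):
    """Mask every base of seq covered by an exact k-mer also found in vector.

    Hypothetical toy screen: both vector strands are checked with exact
    matching only, unlike the BLAST-based search VecScreen performs.
    """
    seq_u, vec_u = seq.upper(), vector.upper()
    # Collect all k-mers of the vector and of its reverse complement.
    kmers = set()
    for v in (vec_u, reverse_complement(vec_u)):
        kmers.update(v[i:i + k] for i in range(len(v) - k + 1))
    # Mark every position of seq that lies inside a shared k-mer.
    covered = [False] * len(seq_u)
    for i in range(len(seq_u) - k + 1):
        if seq_u[i:i + k] in kmers:
            for j in range(i, i + k):
                covered[j] = True
    # Keep the original characters (and case) wherever nothing matched.
    return "".join(mask_char if c else b for b, c in zip(seq, covered))
```

For example, `mask_vector("AAAACCCCGGGGTTTT", "CCCCGGGG", k=4)` returns `"AAAAXXXXXXXXTTTT"`. In practice, the masked ends of a read would then be trimmed off before analysis.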
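The restriction map on the Computing a Restriction Map slide is essentially a string search for each enzyme's recognition site. The sketch below makes several assumptions:

- The molecule is linear.
- Only the two enzymes named on the slide are used, with their standard cut positions (EcoRI cuts G^AATTC and BamHI cuts G^GATCC).
- Only the top strand is scanned. This is enough here because both sites are palindromic; a non-palindromic site would also need to be searched on the reverse complement.

The function names are hypothetical.

```python
import re

# Recognition site and cut offset (number of bases from the start of the
# site to the cut on the top strand): EcoRI cuts G^AATTC, BamHI cuts G^GATCC.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
}


def restriction_map(seq, enzymes=ENZYMES):
    """Return {enzyme: [cut positions]} for a linear DNA sequence.

    A cut position is the 0-based index of the first base after the cut.
    Only the top strand is scanned (sufficient for palindromic sites).
    """
    seq = seq.upper()
    cuts = {}
    for name, (site, offset) in enzymes.items():
        # The zero-width lookahead also reports overlapping matches.
        starts = [m.start() for m in re.finditer(f"(?={site})", seq)]
        cuts[name] = [s + offset for s in starts]
    return cuts


def fragment_lengths(seq_len, cut_positions):
    """Fragment lengths after cutting a linear molecule at every position."""
    bounds = [0] + sorted(set(cut_positions)) + [seq_len]
    return [end - start for start, end in zip(bounds, bounds[1:])]
```

For example, `restriction_map("AAGAATTCTTGGATCCAA")` returns `{'EcoRI': [3], 'BamHI': [11]}`. A double digest of that 18-base sequence, `fragment_lengths(18, [3, 11])`, gives `[3, 8, 7]`.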
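For the Designing PCR Primers slide, two calculations are worth seeing:

- The forward primer is read directly from the top strand.
- The reverse primer is the reverse complement of the top strand at the far end of the region to amplify.

The melting temperature below uses the Wallace rule, Tm = 2(A+T) + 4(G+C) °C. This is only a rough rule of thumb for short oligonucleotides; many primer-design tools use nearest-neighbor thermodynamics instead. The helper names and the fixed primer `length` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C).

    Rule of thumb for short oligonucleotides only.
    """
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def primer_pair(template, fwd_start, rev_end, length=20):
    """Hypothetical helper: primers flanking template[fwd_start:rev_end].

    The forward primer is copied from the top strand; the reverse primer is
    the reverse complement of the last `length` bases of the region.
    """
    fwd = template[fwd_start:fwd_start + length]
    rev = reverse_complement(template[rev_end - length:rev_end])
    return fwd, rev
```

For example, `wallace_tm("ATGCGTACGTTAGCCGATCG")` gives 62 (9 A/T and 11 G/C). `primer_pair("ATGCCCCCCCCCGGTA", 0, 16, length=4)` returns `("ATGC", "TACC")`.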
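The Analyzing DNA Composition slide notes that composition varies along a sequence. One simple way to see this is to compute the GC content in sliding windows. This is a minimal sketch: bases other than A, C, G and T (such as N) are ignored, and the `window` and `step` defaults are arbitrary choices.

```python
def gc_content(seq):
    """Fraction of G and C among the A, C, G, T bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / acgt


def gc_profile(seq, window=100, step=50):
    """GC content of successive windows, as (start, fraction) pairs.

    A sequence shorter than the window yields a single window covering it.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = max(len(seq) - window, 0)
    return [(i, gc_content(seq[i:i + window]))
            for i in range(0, last_start + 1, step)]
```

For example, `gc_profile("GGGGAAAA", window=4, step=4)` returns `[(0, 1.0), (4, 0.0)]`. This shows a GC-rich half followed by an AT-rich half.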
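The Predicting Genes in Prokaryotic Genomes slide stresses that prokaryotic genes are uninterrupted. A candidate gene therefore appears as an open reading frame (ORF): a start codon followed by in-frame codons up to a stop codon. GeneMark does much more than this, since it scores candidates with hidden Markov models and chooses among alternative start codons. A plain six-frame ORF scan is still a useful starting point.

This sketch makes the following assumptions:

- ATG is the only start codon, even though bacteria also use alternatives such as GTG and TTG.
- ORFs that run off the end of the sequence are ignored.
- The length cutoff `min_codons` is a hypothetical parameter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=100):
    """Find ATG...stop open reading frames in all six reading frames.

    Returns (strand, start, end) tuples with 0-based, end-exclusive
    coordinates on the strand that was scanned ("-" coordinates refer to
    the reverse complement). Each ORF begins at the first ATG found after
    the previous stop codon in the same frame.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    # Length in codons, counting the stop codon.
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs
```

For example, `find_orfs("ATGAAATAG", min_codons=3)` returns `[('+', 0, 9)]`. Real genomes need a much larger cutoff, because short ORFs occur by chance.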
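GeneMark and GenomeScan both rely on hidden Markov models. Their models are far richer, with states for coding positions, introns, splice sites and so on. However, a central step in HMM-based predictors is decoding: finding the most probable path of hidden states for an observed sequence with the Viterbi algorithm. The toy model below shows that step on two states, "H" (GC-rich) and "L" (AT-rich). The states and every probability in it are invented for illustration; they are not taken from any gene finder.

```python
import math

# Hypothetical toy parameters, not from GeneMark or GenomeScan.
STATES = ("H", "L")
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
EMIT = {
    "H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def viterbi(seq, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Most probable hidden-state path for seq (A/C/G/T only).

    Works in log space so long sequences do not underflow.
    """
    seq = seq.upper()
    if not seq:
        return ""
    # score[s]: best log-probability of a path ending in state s so far.
    score = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    back = []  # back[t][s]: best predecessor of state s at position t + 1
    for base in seq[1:]:
        pointers, new_score = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + math.log(trans[p][s]))
            pointers[s] = prev
            new_score[s] = (score[prev] + math.log(trans[prev][s])
                            + math.log(emit[s][base]))
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))
```

For example, `viterbi("GCGCGCATATATAT")` returns `"HHHHHHLLLLLLLL"`. The single switch between states is cheaper than mislabeling eight A/T bases. A gene finder applies the same idea with states that stand for exons, introns and intergenic DNA.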
Sequencing and Assembling a Genome (I)
• To sequence a genome, the first task is to cut it into many small, overlapping pieces
• Then clone each piece

Sequencing and Assembling a Genome (II)
• Each piece must be sequenced
• Sequencing machines cannot read an entire sequence at once
  ◦ They can only produce short sequences, shorter than about 1 kb
  ◦ These short sequences are called reads
• The reads must then be assembled into contigs (contiguous stretches of reconstructed sequence)

Sequencing and Assembling a Genome (III)
• The most popular program for assembling reads is PHRAP
  ◦ Available at www.phrap.org
• Other programs exist for assembling smaller datasets
  ◦ For example, try CAP3 at pbil.univ-lyon1.fr/cap3.php

Going Further
• Predicting when and how genes are expressed is one of the main challenges of modern biology
  ◦ It requires predicting genes
  ◦ It also requires predicting promoters
• The challenge is to find these regions and to understand the signals they contain
• Try the following resources:
  ◦ Zhang Lab: rulai.cshl.edu
  ◦ EPD (Eukaryotic Promoter Database): www.epd.isb-sib.ch
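The assembly step on the Sequencing and Assembling a Genome slides, joining reads into contigs, can be illustrated with a greedy overlap merger. This is not how PHRAP or CAP3 work: they align reads while tolerating sequencing errors and can use base-quality values.

This sketch makes the following assumptions:

- Reads are error-free.
- All reads come from the same strand.
- The function names and the `min_len` overlap threshold are hypothetical.

```python
from itertools import permutations


def overlap(a, b, min_len):
    """Length of the longest suffix of a that is also a prefix of b,
    if at least min_len long; otherwise 0."""
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept if everything from there to the end of a is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def drop_contained(reads):
    """Remove duplicate reads and reads contained inside another read."""
    unique = list(dict.fromkeys(reads))
    return [r for r in unique if not any(r != o and r in o for o in unique)]


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two sequences with the longest overlap.

    Stops when no pair overlaps by at least min_len; returns the contigs.
    """
    contigs = drop_contained(reads)
    while True:
        best_len, best_pair = 0, None
        for i, j in permutations(range(len(contigs)), 2):
            olen = overlap(contigs[i], contigs[j], min_len)
            if olen > best_len:
                best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        rest = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs = drop_contained(rest + [merged])
```

For example, the four reads `"ATTAGACCTG"`, `"CCTGCCGGAA"`, `"AGACCTGCCG"` and `"GCCGGAATAC"` assemble into the single contig `"ATTAGACCTGCCGGAATAC"`. Reads with no sufficient overlap stay as separate contigs. Greedy merging is simple, but repeated regions in real genomes can fool it, which is one reason dedicated assemblers exist.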
