Working with a Single DNA Sequence
© Wiley Publishing. 2007. All Rights Reserved.

Learning Objectives
• Discover how to manipulate your DNA sequence on a computer, analyze its composition, predict its restriction map, and amplify it with PCR
• Find out about gene-prediction methods, their potential, and their limitations
• Understand how genomes and sequences are assembled

Outline
1. Cleaning your DNA of contaminants
2. Digesting your DNA in the computer
3. Finding protein-coding genes in your DNA sequence
4. Assembling a genome

Cleaning DNA Sequences
In order to sequence genomes, DNA sequences are often cloned in a vector (plasmid, YAC, or cosmid)
Sequences of the vector can be mixed with your DNA sequence
Before working with your DNA sequence, you should always clean it with VecScreen

Computing a Restriction Map
It is possible to cut DNA sequences using restriction enzymes
Each type of restriction enzyme recognizes and cuts a different sequence:
• EcoRI: GAATTC
• BamHI: GGATCC
There are more than 900 different restriction enzymes, each with a different specificity
The restriction map is the list of all potential cleavage sites in a DNA molecule
You can compile a restriction map with www.firstmarket.com/cutter

Making PCR with a Computer
Polymerase Chain Reaction (PCR) is a method for amplifying DNA
PCR is used for many applications, including
• Gene cloning
• Forensic analysis
• Paternity tests
PCR amplifies the DNA between two anchors
These anchors are called the PCR primers

Designing PCR Primers
PCR primers are typically 20 nucleotides long
The primers must hybridize well with the DNA
On biotools.umassmed.edu, find the best location for the primers:
• Most stable
• Longest extension

Analyzing DNA Composition
DNA composition varies a lot
Stability of a DNA sequence depends on its G+C content (total guanine and cytosine)
High G+C makes very stable DNA molecules
Online resources are available to measure the GC content of your DNA sequence

Predicting Genes
The most
important analysis carried out on DNA sequences is gene prediction
Gene prediction requires different methods for eukaryotes and prokaryotes
Most gene-prediction methods use hidden Markov models

Predicting Genes in Prokaryotic Genomes
In prokaryotes, protein-coding genes are uninterrupted
• No introns
Predicting protein-coding genes in prokaryotes is considered a solved problem
• You can expect 99% accuracy

Finding Prokaryotic Genes with GeneMark
GeneMark is the state of the art for microbial genomes
GeneMark can
• Find short proteins
• Resolve overlapping genes
• Identify the best start codon
GeneMark uses hidden Markov models
Use exon.gatech.edu/GeneMark

Predicting Eukaryotic Genes
Eukaryotic genes (human, for example) are very hard to predict
Precise and accurate eukaryotic gene prediction is still an open problem
• ENSEMBL contains 21,662 genes for the human genome
• There may well be more genes than that in the genome, as yet unpredicted
You can expect 70% accuracy on the human genome with automatic methods
Experimental information is still needed to predict eukaryotic genes

Finding Eukaryotic Genes with GenomeScan
GenomeScan is the state of the art for eukaryotic genes
GenomeScan works best with
• Long exons
• Genes with a low GC content
GenomeScan uses
• Hidden Markov models
• Homology searches
It can incorporate experimental information
Use genes.mit.edu/genomescan

Producing Genomic Data
Until recently, sequencing an entire genome was very expensive and difficult
Only major institutes could do it
Today, scientists estimate that in 10 years, it will cost about $1000 to sequence a human genome
With sequencing so cheap, assembling your own genomes is becoming an option
How could you do it?
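
Before moving on to assembly, the single-sequence analyses described so far (locating restriction sites, measuring G+C content, and spotting uninterrupted open reading frames of the kind found in prokaryotes) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names are hypothetical, only the two enzymes quoted above are included, only the forward strand is scanned, and the ORF scan is a naive ATG-to-stop search, not the hidden Markov model approach that GeneMark or GenomeScan use.

```python
import re

# Recognition sequences quoted on the restriction-map slide.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}

STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def restriction_sites(seq, enzymes=ENZYMES):
    """Map each enzyme name to the 0-based start positions of its recognition sequence."""
    seq = seq.upper()
    sites = {}
    for name, pattern in enzymes.items():
        # A lookahead reports overlapping occurrences as well.
        sites[name] = [m.start() for m in re.finditer(f"(?={pattern})", seq)]
    return sites


def forward_orfs(seq, min_codons=3):
    """Return (start, end) pairs for ATG...stop ORFs in the three forward frames.

    start is 0-based; end is exclusive and includes the stop codon.
    min_codons counts every codon from ATG through the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


if __name__ == "__main__":
    toy = "CCGAATTCATGAAATGAGGATCC"  # made-up sequence, for illustration only
    print(gc_content(toy))
    print(restriction_sites(toy))
    print(forward_orfs(toy))
```

A real analysis would also scan the reverse complement and would use a curated enzyme list; the sketch only shows why these tasks are easy to automate for a single sequence.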

Sequencing and Assembling a Genome (I)
To sequence a genome, the first task is to cut it into many small, overlapping pieces
Then clone each piece

Sequencing and Assembling a Genome (II)
Each piece must be sequenced
Sequencing machines cannot sequence an entire genome at once
• They can only produce short sequences, shorter than 1 kb
• These pieces are called reads
It is necessary to assemble the reads into contigs

Sequencing and Assembling a Genome (III)
The most popular program for assembling reads is PHRAP
• Available at www.phrap.org
Other programs exist for joining smaller datasets
• For example, try CAP3 at pbil.univ-lyon1.fr/cap3.php

Going Farther
Predicting when and how genes are expressed is one of the main challenges of modern biology
• It requires predicting genes
• It also requires predicting promoters
The challenge is to find these regions and to understand the signals they contain
Try the following resources:
• Zhang Lab: rulai.cshl.edu
• EPD: www.epd.isb-sib.ch
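
To make the idea of assembling overlapping reads into contigs more concrete, here is a toy greedy assembler in Python. It is a sketch under strong assumptions, not the algorithm used by PHRAP or CAP3: reads are assumed error-free and all from the same strand, repeats and quality scores are ignored, and the function names and the `min_len` parameter are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no such overlap of at least min_len bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two reads with the largest overlap.

    Returns the list of contigs left when no pair overlaps by min_len or more.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)  # (overlap length, index of left, index of right)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # nothing left to merge
        # Join the pair, writing the shared bases only once.
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


if __name__ == "__main__":
    # Three made-up overlapping reads, for illustration only.
    reads = ["ATGGCGT", "GCGTACG", "TACGGAT"]
    print(greedy_assemble(reads))
```

Reads that share no sufficiently long overlap stay as separate contigs, which mirrors why real projects often end with several contigs and not one finished sequence.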