Working with a Single DNA Sequence
© Wiley Publishing. 2007. All Rights Reserved.
Learning Objectives
 Discover how to manipulate your DNA sequence on
a computer, analyze its composition, predict its
restriction map, and amplify it with PCR
 Find out about gene-prediction methods, their
potential, and their limitations
 Understand how genomes are sequenced and
assembled
Outline
1. Cleaning your DNA of contaminants
2. Digesting your DNA in the computer
3. Finding protein-coding genes in your DNA sequence
4. Assembling a genome
Cleaning DNA Sequences
 In order to sequence genomes, DNA fragments are often cloned into a
vector (plasmid, YAC, or cosmid)
 Vector sequences can therefore contaminate your DNA sequence
 Before working with your DNA sequence, always screen it for vector
contamination with VecScreen
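VecScreen itself runs a BLAST search against the UniVec database of vector sequences. To illustrate the idea only, the sketch below masks every stretch of your sequence that shares an exact k-mer with a known vector sequence. The function `mask_vector` and its default k = 12 are hypothetical choices, not part of VecScreen.

```python
def mask_vector(query: str, vector: str, k: int = 12, mask_char: str = "N") -> str:
    """Toy vector screen (hypothetical helper, not VecScreen's algorithm):
    replace every base of `query` covered by a k-mer that also occurs in `vector`."""
    query = query.upper()
    vector = vector.upper()
    # All k-mers of the vector, for constant-time lookup.
    vector_kmers = {vector[i:i + k] for i in range(len(vector) - k + 1)}
    masked = [False] * len(query)
    for i in range(len(query) - k + 1):
        if query[i:i + k] in vector_kmers:
            for j in range(i, i + k):
                masked[j] = True
    return "".join(mask_char if m else base for base, m in zip(query, masked))


# Usage with a made-up vector fragment followed by an insert.
vector = "GGATCCTCTAGAGTCGACCTGCAG"
insert = "ATGCGTACGTTAGC"
print(mask_vector(vector[:20] + insert, vector))
```

Exact k-mer matching misses vector segments that contain sequencing errors, which is one reason alignment-based tools such as VecScreen are preferred.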
Computing a Restriction Map
 It is possible to cut DNA sequences using restriction enzymes
 Each restriction enzyme recognizes and cuts a specific
sequence:
• EcoRI: GAATTC
• BamHI: GGATCC
 There are more than 900 different restriction enzymes,
recognizing many different sequences
 The restriction map is the list of all potential cleavage sites in a DNA
molecule
 You can compile a restriction map with www.firstmarket.com/cutter
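Computing a restriction map amounts to scanning the sequence for every occurrence of each recognition site. The sketch below does this for the two enzymes listed above. The cut positions (both EcoRI and BamHI cut after the first G: G^AATTC and G^GATCC) are standard, but the `ENZYMES` table and `restriction_map` function are illustrative names. Because both sites are palindromic, scanning the top strand is enough; non-palindromic sites would also require scanning the reverse complement.

```python
# Recognition site and cut offset within the site (top strand), per enzyme.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "BamHI": ("GGATCC", 1),  # G^GATCC
}


def restriction_map(seq: str, enzymes: dict = ENZYMES) -> dict:
    """Return, for each enzyme, the 0-based positions where the top strand is cut."""
    seq = seq.upper()
    result = {}
    for name, (site, cut_offset) in enzymes.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + cut_offset)
            start = seq.find(site, start + 1)  # allow overlapping sites
        result[name] = positions
    return result


print(restriction_map("AAGAATTCTTGGATCCAAGAATTC"))
```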
Simulating PCR on a Computer
 Polymerase Chain Reaction (PCR) is a method for amplifying DNA
 PCR is used for many applications, including
• Gene cloning
• Forensic analysis
• Paternity tests
 PCR amplifies the DNA between two anchors
 These anchors are called PCR primers
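An in-silico PCR can be sketched as locating the forward primer on the template, then locating the reverse complement of the reverse primer downstream of it; the product is everything in between, primers included. This minimal sketch assumes exact primer matches on the given strand only (real tools tolerate mismatches and search both strands), and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str):
    """Return the predicted amplicon, or None if either primer site is missing.
    Assumes exact matches; the reverse primer is given 5'->3' on the opposite strand."""
    template = template.upper()
    fwd = forward.upper()
    rev_site = reverse_complement(reverse)  # how the reverse primer appears on this strand
    start = template.find(fwd)
    if start == -1:
        return None
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


template = "TTTT" + "ACGTACGGTA" + "CCCCCCCC" + "GGATTACAGC" + "TTTT"
print(in_silico_pcr(template, "ACGTACGGTA", "GCTGTAATCC"))
```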
Designing PCR Primers
 PCR primers are typically about 20 nucleotides long
 The primers must hybridize stably and specifically to the DNA
 Use biotools.umassmed.edu to find the best location for the
primers:
• Most stable
• Longest extension
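A quick first check on a candidate primer is its melting temperature (Tm). The sketch below uses the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), a rough approximation that only holds for short oligonucleotides; primer-design tools generally use more accurate nearest-neighbor thermodynamic models. The length and GC-content limits in `check_primer` are common rules of thumb, chosen here as assumptions.

```python
def wallace_tm(primer: str) -> int:
    """Wallace rule: Tm (deg C) ~ 2*(A+T) + 4*(G+C). Rough; short oligos only."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def check_primer(primer: str, min_len: int = 18, max_len: int = 25,
                 gc_min: float = 0.40, gc_max: float = 0.60) -> dict:
    """Hypothetical sanity checks on length, GC fraction and approximate Tm."""
    p = primer.upper()
    gc = (p.count("G") + p.count("C")) / len(p)
    return {
        "length_ok": min_len <= len(p) <= max_len,
        "gc_ok": gc_min <= gc <= gc_max,
        "tm": wallace_tm(p),
    }


print(check_primer("ACGTACGTACGTACGTACGT"))
```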
Analyzing DNA Composition
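DNA composition varies widely between organisms and
between regions of the same genome
The stability of a DNA sequence depends on its G+C
content (the fraction of guanine and cytosine bases)
A high G+C content makes DNA molecules more stable, because
G-C pairs form three hydrogen bonds versus two for A-T pairs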
Online resources are available to measure the GC
content of your DNA sequence
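GC content is simply the fraction of G and C bases. Computing it in sliding windows shows how composition varies along a sequence; the window and step sizes below are arbitrary assumptions. In this sketch, ambiguous bases such as N count toward the length, which slightly lowers the value for sequences that contain them.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in `seq` (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq: str, window: int = 100, step: int = 50) -> list:
    """(window start, GC fraction) for each full window along `seq`."""
    return [(i, gc_content(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]


print(gc_content("GGCCAATT"))
print(sliding_gc("GGGGAAAA", window=4, step=4))
```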
Predicting Genes
 One of the most important analyses carried out on DNA
sequences is gene prediction
 Gene prediction requires different methods for
eukaryotes and prokaryotes
 Most gene-prediction methods use hidden Markov
models
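To give a feel for how a hidden Markov model labels a sequence, the sketch below runs Viterbi decoding on a toy two-state model. The states ("noncoding" and "coding") and all probabilities are made up for illustration. Real gene finders such as GeneMark or GenomeScan use far richer models (codon position, start and stop signals, splice sites), so this shows only the decoding step, not any tool's actual model. It also assumes a non-empty sequence of A, C, G and T only.

```python
import math

STATES = ("noncoding", "coding")
# Hypothetical parameters, chosen only to make the example readable.
START = {"noncoding": 0.5, "coding": 0.5}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2},
    "coding": {"A": 0.2, "T": 0.2, "G": 0.3, "C": 0.3},
}


def viterbi(seq: str) -> list:
    """Most probable state path for `seq` (log space to avoid underflow)."""
    seq = seq.upper()
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        prev = score
        score, pointers = {}, {}
        for s in STATES:
            # Best previous state to have come from when entering state s.
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][base])
            pointers[s] = best
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi("AT" * 10 + "GC" * 10))
```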
Predicting Genes in Prokaryotic Genomes
In prokaryotes, protein-coding genes are
uninterrupted
• No introns
Predicting protein-coding genes in prokaryotes is
considered a solved problem
• You can expect 99% accuracy
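Because prokaryotic genes have no introns, a first approximation to gene finding is scanning for open reading frames (ORFs): an ATG start codon followed in frame by a stop codon. This sketch scans the three forward frames only (the reverse strand would be handled by scanning the reverse complement), and the minimum length of 100 codons is an arbitrary assumption. Dedicated predictors add statistical models on top of this, which is what lets them find short genes and choose among alternative start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list:
    """(start, end) of forward-strand ORFs, 0-based and end-exclusive.
    Each ORF runs from the first ATG after the previous stop to the next
    in-frame stop codon; the length threshold counts the stop codon too."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CC" + "ATG" + "AAA" * 3 + "TAA" + "GG", min_codons=3))
```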
Finding Prokaryotic Genes with GeneMark
 GeneMark is the state of the art
for microbial genomes
 GeneMark can
• Find short proteins
• Resolve overlapping genes
• Identify the best start codon
 GeneMark uses hidden Markov
models
 Use exon.gatech.edu/GeneMark
Predicting Eukaryotic Genes
 Eukaryotic genes (human genes, for example) are very hard to predict, largely because their coding sequences are split into exons separated by introns
 Precise and accurate eukaryotic gene prediction is still an open problem
• ENSEMBL contains 21,662 genes for the human genome
• There may well be more genes than that in the genome, as yet unpredicted
 You can expect 70% accuracy on the human genome with automatic
methods
 Experimental information is still needed to predict eukaryotic genes
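Accuracy figures for gene predictors are often reported at the nucleotide level as sensitivity, TP / (TP + FN), and specificity, which in the gene-prediction literature usually means TP / (TP + FP). The slide does not say which measure its 70% refers to, so the sketch below is only one way such a number could be computed. Gene coordinates are assumed to be half-open (start, end) intervals, and the function names are hypothetical.

```python
def coding_positions(intervals) -> set:
    """Expand half-open (start, end) intervals into a set of positions."""
    return {p for start, end in intervals for p in range(start, end)}


def nucleotide_accuracy(predicted, annotated) -> tuple:
    """(sensitivity, specificity) of predicted vs annotated coding nucleotides."""
    pred = coding_positions(predicted)
    true = coding_positions(annotated)
    tp = len(pred & true)
    fp = len(pred - true)
    fn = len(true - pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity


print(nucleotide_accuracy([(0, 100)], [(50, 150)]))
```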
Finding Eukaryotic Genes with GenomeScan
 GenomeScan is the state of the
art for eukaryotic genes
 GenomeScan works best with
• Long exons
• Genes with a low GC content
 GenomeScan uses
• Hidden Markov models
• Homology searches
 It can incorporate experimental
information
 Use genes.mit.edu/genomescan
Producing Genomic Data
 Until recently, sequencing an entire genome was very
expensive and difficult
 Only major institutes could do it
 Today, scientists estimate that in 10 years, it will cost about
$1000 to sequence a human genome
 As sequencing becomes cheaper, assembling your own genomes is
becoming an option
 How could you do it?
Sequencing and Assembling a Genome (I)
 To sequence a genome, the first task is to cut it into
many small, overlapping pieces
 Then clone each piece
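Shotgun fragmentation can be simulated by sampling pieces at random positions. The Lander-Waterman model predicts an average coverage of c = N × L / G (N pieces of length L over a genome of length G), with roughly e^(-c) of the bases left uncovered, which is why genomes are sequenced to several-fold coverage. The function names and parameters below are illustrative.

```python
import math
import random


def shotgun(genome: str, n_pieces: int, piece_len: int, seed: int = 0) -> list:
    """Sample (start, sequence) fragments at uniformly random positions."""
    rng = random.Random(seed)
    pieces = []
    for _ in range(n_pieces):
        start = rng.randrange(len(genome) - piece_len + 1)
        pieces.append((start, genome[start:start + piece_len]))
    return pieces


def expected_coverage(n_pieces: int, piece_len: int, genome_len: int) -> float:
    """Lander-Waterman average coverage c = N * L / G."""
    return n_pieces * piece_len / genome_len


def uncovered_fraction(pieces: list, genome_len: int) -> float:
    """Fraction of genome positions not covered by any sampled piece."""
    covered = [False] * genome_len
    for start, seq in pieces:
        for i in range(start, start + len(seq)):
            covered[i] = True
    return covered.count(False) / genome_len


genome = "".join(random.Random(1).choice("ACGT") for _ in range(10_000))
pieces = shotgun(genome, n_pieces=100, piece_len=500)
c = expected_coverage(100, 500, len(genome))
print(c, math.exp(-c), uncovered_fraction(pieces, len(genome)))
```

The observed uncovered fraction is usually somewhat higher than e^(-c) here, because bases near the ends of the genome can be covered by fewer possible pieces.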
Sequencing and Assembling a Genome (II)
 Each piece must be sequenced
 Sequencing machines cannot read an entire genome at once
• They can only produce short sequences, typically shorter than 1 kb
• These pieces are called reads
 The reads must then be assembled into contigs (longer, contiguous stretches of sequence)
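Assembly joins reads whose ends overlap. Real assemblers such as PHRAP tolerate sequencing errors and use base quality values; the sketch below instead uses a simple greedy strategy (repeatedly merge the pair with the longest exact suffix-prefix overlap) purely to illustrate how reads become contigs. It assumes error-free reads from a single strand, and `min_overlap` is an arbitrary threshold.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0 if < min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list, min_overlap: int = 3) -> list:
    """Toy greedy assembler: returns the list of contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = suffix_prefix_overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlaps left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGAT"]))
```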
Sequencing and Assembling a Genome (III)
 The most popular program for assembling reads is PHRAP
• Available at www.phrap.org
 Other programs exist for assembling smaller datasets
• For example, try CAP3 at pbil.univ-lyon1.fr/cap3.php
Going Further
 Predicting when and how genes are expressed is one of the
main challenges of modern biology
• It requires predicting genes
• It also requires predicting promoters
 The challenge is to find these regions and to understand the
signals they contain
 Try the following resources:
• Zhang Lab: rulai.cshl.edu
• EPD: www.epd.isb-sib.ch
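As a very small taste of promoter analysis, the sketch below scans for matches to TATAWAW (W = A or T), a commonly cited consensus for the TATA box found in many eukaryotic core promoters. Exact consensus matching produces many false positives and misses variant boxes; dedicated promoter-prediction methods typically use richer models such as position weight matrices plus wider sequence context. The function name is hypothetical.

```python
import re

# TATAWAW consensus; the lookahead reports overlapping matches too.
TATA_BOX = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata_boxes(seq: str) -> list:
    """0-based start positions of TATAWAW matches on the given strand."""
    return [m.start() for m in TATA_BOX.finditer(seq.upper())]


print(find_tata_boxes("GGTATAAAAGG"))
```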