The Human Genome Project – Part 2
BLT / Topic 1 Pt 2 / Apr 2012
© 2005 Prentice Hall Inc. / A Pearson Education Company / Upper Saddle River, New Jersey 07458

Your assignment for last week…
• Groups 1 and 2 – explain the following terms related to genome sequencing:
  ▫ 1: mapping, STSs and ESTs, coverage, contigs, golden tiling path
  ▫ 2: library, BACs, finishing, annotation
• Group 3 – explain the hierarchical approach
• Group 4 – explain the whole-genome shotgun approach

Some animations to watch first:
http://www.yourgenome.org/teachers/bac.shtml
http://www.dnalc.org/resources/animations/
http://bcs.whfreeman.com/thelifewire/content/chp17/1702002.html
http://www.dnaftb.org/39/

Background
• The field of genomics began with the decision to sequence the human genome
  ▫ The human genome is 3 billion base pairs long, which required new ways of sequencing
• Approaches to sequencing the human genome
  ▫ Scale up existing techniques
  ▫ Develop new sequencing techniques
  ▫ Start with smaller genomes as warm-up projects

Whole-genome shotgun sequencing I
• Approach used by Celera Genomics to sequence the human genome
  ▫ Celera and Applied Biosystems, maker of automated sequencers, were sister companies under the same parent corporation
• No mapping
• Instead, the whole genome is sheared into fragments
• The fragments are sequenced at random

Whole-genome shotgun sequencing II
[Figure: shear the genome → generate tens of millions of sequence reads → assemble]
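The shear, sequence and assemble steps above can be illustrated with a toy greedy assembler. This is a minimal sketch, not Celera's software: the function names are hypothetical, reads are assumed error-free and taken from one strand, and the greedy rule (always merge the pair of reads with the longest suffix-prefix overlap) is a simple textbook stand-in for real assembly programs.

```python
import random

def shotgun_reads(genome, read_len, n_reads, seed=0):
    """Sample reads at random start positions: a stand-in for shearing plus random sequencing."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]

def overlap(a, b, min_len):
    """Longest suffix of a that equals a prefix of b, or 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Toy greedy assembler: repeatedly merge the pair with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # drop reads that lie entirely inside another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

if __name__ == "__main__":
    reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

With four overlapping 10-base reads the sketch recovers the 19-base source sequence. If the genome contained a repeat longer than the reads, the greedy merge could join the wrong copies, which is why repetitive elements are the biggest assembly problem.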
Whole-genome shotgun sequencing III
• Major challenge: assembly
  ▫ Repetitive elements are the biggest problem
• Assembly is performed on very high-speed computers using novel software
• Key to assembly: paired reads
  ▫ Sequence both ends of each clone, so each pair of reads is known to lie roughly one insert length apart

Map-based sequencing I
• The public Human Genome Project adopted a map-based (hierarchical) strategy
  ▫ Start with a well-defined physical map
  ▫ Select the shortest tiling path of large-insert clones
  ▫ Assemble the sequence of each clone
  ▫ Then assemble the entire sequence, based on the physical map

Map-based sequencing: steps in genomic sequencing
• Library making
  ▫ Large-insert library from the genome
• Production sequencing
  ▫ Generate fragments to be sequenced
  ▫ Perform sequencing reactions
  ▫ Determine sequence
• Finishing
  ▫ Assemble into continuous sequence
  ▫ Fill gaps

Map-based sequencing: library making
• Library of genomic fragments made in a vector
  ▫ BAC – bacterial artificial chromosome
  ▫ Usually several-fold coverage: every DNA sequence is present on five to eight different clones
• Sequencing directly from a large-insert clone is difficult and inefficient
• Need to break the insert into manageable pieces
  ▫ Random shearing, by nebulization or sonication

Fragments for sequencing
• Generally use 2–10 kb pieces for sequencing
• Clone into a sequencing vector
  ▫ Contains binding sites for sequencing primers

Pros and cons of large-insert vectors
• Lambda phage and cosmids
  ▫ Inserts stable
  ▫ But insert size too small for large-scale sequencing projects
• YACs (yeast artificial chromosomes)
  ▫ Largest insert size
  ▫ But difficult to work with
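The "several-fold coverage" of a large-insert library can be estimated with two standard formulas: fold coverage = clones × insert size / genome size, and the Clarke-Carbon formula N = ln(1 - P) / ln(1 - f), where f is the insert size as a fraction of the genome. The sketch below assumes the 3-billion-base-pair genome from the Background slide and a hypothetical average BAC insert of 150 kb; it also assumes inserts land at random (a Poisson model), which real libraries only approximate.

```python
import math

def clones_for_fold_coverage(genome_bp, insert_bp, fold):
    """Clones whose inserts add up to `fold` genome equivalents."""
    return math.ceil(fold * genome_bp / insert_bp)

def prob_sequence_in_library(fold):
    """Chance a given base is on at least one clone, assuming random (Poisson) insert placement."""
    return 1 - math.exp(-fold)

def clarke_carbon(genome_bp, insert_bp, prob):
    """Clarke-Carbon: clones needed for a given sequence to be present with probability `prob`."""
    f = insert_bp / genome_bp  # insert size as a fraction of the genome
    return math.ceil(math.log(1 - prob) / math.log(1 - f))

if __name__ == "__main__":
    GENOME_BP = 3_000_000_000  # human genome size from the Background slide
    BAC_INSERT_BP = 150_000    # assumed average BAC insert (hypothetical)
    print(clones_for_fold_coverage(GENOME_BP, BAC_INSERT_BP, 5))  # 100000
    print(f"{prob_sequence_in_library(5):.4f}")                   # 0.9933
    print(clarke_carbon(GENOME_BP, BAC_INSERT_BP, 0.99))
```

Under these assumptions about 99.3% of the genome is expected on at least one clone at 5-fold coverage and about 99.97% at 8-fold, which fits the five to eight clones per sequence quoted on the slide.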
BACs and PACs
• BACs and PACs (P1-derived artificial chromosomes)
  ▫ Most commonly used vectors for large-scale sequencing
  ▫ Good compromise between insert size and ease of use
  ▫ Growth and isolation similar to those for plasmids

Contigs
• Contigs are groups of overlapping pieces of chromosomal DNA
  ▫ Together they make up a contiguous set of clones
• For sequencing, one wants to create a “minimum tiling path”
  ▫ The contig with the smallest number of inserts that covers a region of the chromosome
[Figure: genomic DNA, contig, minimum tiling path]

Physical mapping
• Restriction mapping (by restriction endonucleases)
• STS (sequence-tagged site) mapping
• FISH – fluorescence in-situ hybridisation
[Figures: PPT from D. Bartholomeu]

Finishing I
• Process of assembling raw sequence reads into accurate contiguous sequence
  ▫ Required to achieve an accuracy of at most one error per 10,000 bases
• Manual process
  ▫ Look at sequence reads at positions where programs can’t tell which base is the correct one
  ▫ Fill gaps
  ▫ Ensure adequate coverage
[Figure: assembled reads showing a gap and a single-stranded region]

Finishing II
• To fill gaps in the sequence, design primers from the sequence flanking the gap and sequence from those primers
• To ensure adequate coverage, find regions where coverage is insufficient and use specific primers for those areas
[Figure: primers placed on either side of a GAP]
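Choosing a minimum tiling path, as described on the Contigs slide, is an interval-cover problem: given the map positions of the clones in a contig, pick the fewest clones that span the region. A standard greedy rule solves it: starting from the left edge, always take the clone that starts within the already covered part and reaches furthest right. This sketch assumes clone start and end coordinates are already known from the physical map; the clone names and kb coordinates are hypothetical, and it accepts clones that merely abut, whereas a real tiling path also needs some overlap so the clone sequences can be joined.

```python
def minimum_tiling_path(clones, region_start, region_end):
    """Greedy interval cover over clone map positions.

    clones: dict mapping clone name -> (start, end), half-open map coordinates.
    Returns the chosen clone names from left to right; raises ValueError at a gap.
    """
    by_start = sorted(clones.items(), key=lambda item: item[1][0])
    path = []
    covered = region_start            # everything left of this point is covered
    best_name, best_end = None, covered
    i = 0
    while covered < region_end:
        # among clones starting inside the covered part, remember the one reaching furthest
        while i < len(by_start) and by_start[i][1][0] <= covered:
            name, (_, end) = by_start[i]
            if end > best_end:
                best_name, best_end = name, end
            i += 1
        if best_end <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best_name)
        covered = best_end
    return path

if __name__ == "__main__":
    # hypothetical BAC contig, coordinates in kb
    contig = {"A": (0, 150), "B": (40, 180), "C": (100, 260), "D": (170, 300), "E": (250, 400)}
    print(minimum_tiling_path(contig, 0, 400))  # ['A', 'C', 'E']
```

Five clones cover the region, but three of them are enough; B and D are redundant. A ValueError marks a place where no clone bridges the map, the kind of gap that finishing has to close.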
Verification
• Region verified for:
  ▫ Coverage
  ▫ Sequence quality
  ▫ Contiguity
• Determine restriction-enzyme cleavage sites
  ▫ Generate a restriction map of the sequenced region
  ▫ It must agree with the fingerprint of the clone generated during the mapping step

Sequencing coverage
• Coverage is the number of times the same region is sequenced
  ▫ Ideally, one wants an equal number of reads in each direction (from both strands)
• To obtain an accuracy of one error in 10,000 bases, one needs:
  ▫ 10x coverage and stringent finishing
  ▫ A complete, base-perfect sequence

Map-based sequencing II
[Figure: construct clone map and select mapped clones → generate several thousand sequence reads per clone → assemble]

Why map before sequencing?
• Major problem in large-scale sequencing:
  ▫ Sequencing technologies of the time could read only 600–800 bases at a time
• One solution: make a physical map of overlapping DNA fragments
  ▫ Determine the sequence of each fragment
  ▫ Then assemble to form a contiguous sequence

Controversy: map-based sequencing vs. whole-genome shotgun sequencing
• Celera used publicly funded sequence to produce its published draft of the human genome
• Scientists who worked on the map-based effort claimed Celera couldn’t have produced a draft without access to the public sequence
• Celera scientists claimed that they could have produced an accurate draft even without the public sequence

Hybrid approach
• Combines aspects of both map-based and whole-genome shotgun approaches
  ▫ Map clones
  ▫ Sequence some of the mapped clones
  ▫ Do whole-genome shotgun sequencing
  ▫ Combine information from both methods: use sequence from mapped clones as a scaffold to assemble the whole-genome shotgun reads
• Used for sequencing the mouse genome

Sequence annotation
• Annotation performed on completed sequence
• Computer programs used to find:
  ▫ Genes
  ▫ Exons and introns
  ▫ Regulatory sequences
  ▫ Repetitive elements

Industrialization of sequencing
• Most large-scale sequencing projects divide tasks among different teams
  ▫ Large-insert libraries
  ▫ Production sequencing
  ▫ Finishing
• Sequencing machines run 24/7
• Many tasks performed by robots

More about Mapping!
• Genetic mapping
• Physical mapping
• Chromosome walking
• Determining DNA sequences
• New techniques for mapping and sequencing

Mapping I
• Mapping is identifying relationships between genes on chromosomes
  ▫ Just as a road map shows relationships between towns on a highway
• Two types of mapping: genetic and physical
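The figures on the "Sequencing coverage" and "Why map before sequencing?" slides can be related with the Lander-Waterman model, in which coverage C = N × L / G for N reads of length L over a target of G bases, and the expected fraction of bases hit by no read is e^(-C). The sketch assumes a hypothetical 150 kb BAC insert and 700-base reads (the middle of the 600–800 base range), and random read placement. It also converts the "one error in 10,000 bases" target into a Phred quality score, Q = -10 log10(error rate).

```python
import math

def reads_needed(fold, target_bp, read_len):
    """Lander-Waterman coverage C = N * L / G, solved for the number of reads N."""
    return math.ceil(fold * target_bp / read_len)

def fraction_unsequenced(fold):
    """Expected fraction of bases covered by no read, assuming random (Poisson) read placement."""
    return math.exp(-fold)

def phred_quality(error_rate):
    """Phred quality score: Q = -10 * log10(probability of an error)."""
    return -10 * math.log10(error_rate)

if __name__ == "__main__":
    BAC_BP = 150_000   # assumed clone insert size (hypothetical)
    READ_LEN = 700     # middle of the 600-800 base read lengths on the slide
    print(reads_needed(10, BAC_BP, READ_LEN))         # 2143
    print(f"{fraction_unsequenced(10):.1e}")          # 4.5e-05
    print(round(BAC_BP * fraction_unsequenced(10)))   # 7
    print(round(phred_quality(1 / 10_000)))           # 40
```

So roughly 2,100 reads per clone give 10x coverage, in line with the "several thousand sequence reads per clone" of the map-based figure, and even then a few bases per clone would be expected to stay unsequenced, one reason finishing is still needed. The accuracy target corresponds to Phred quality 40.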
Mapping II
• Genetic mapping
  ▫ Based on differences in recombination frequency between genetic loci
• Physical mapping
  ▫ Based on distances in base pairs between specific sequences found on the chromosome
• Most powerful when genetic and physical mapping are combined

Genetic mapping I
• Based on recombination frequencies
  ▫ The farther apart two points are on a chromosome, the more recombination there is between them
• Because recombination frequency increases with distance, comparing the frequencies between pairs of loci gives their relative positions

Genetic mapping II
• Genetic mapping requires a cross between two related organisms
  ▫ The organisms should have phenotypic differences resulting from allele differences at two or more loci
• The frequency of recombination is determined by counting the F2 progeny with each phenotype

Genetic mapping example I
• Genes on two different chromosomes
  ▫ Independent assortment during meiosis
  ▫ No linkage
[Figure: F1 cross; F2 progeny in a 9 : 3 : 3 : 1 ratio]

Genetic mapping example II
• Genes very close together on the same chromosome
  ▫ Usually end up together after meiosis
  ▫ Tightly linked
[Figure: F1 cross; F2 progeny in a 1 : 2 : 1 ratio]

Genetic mapping example III
• Genes on the same chromosome, but not very close together
  ▫ Recombination will occur
  ▫ Frequency of recombination is roughly proportional to the distance between genes (for short distances)
  ▫ Measured in centiMorgans (cM): 1 cM corresponds to 1% recombination
[Figure: recombinants]
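Both the 9 : 3 : 3 : 1 ratio for unlinked genes and the conversion from recombination frequency to map distance can be checked in a few lines. The sketch enumerates all 16 gamete combinations of a hypothetical AaBb × AaBb cross with complete dominance, then converts a recombinant count into centiMorgans. The progeny counts are invented, and the sketch assumes recombinants can be scored directly from phenotype (as in a testcross). The Haldane mapping function, d = -0.5 ln(1 - 2r), is a standard refinement added only for comparison, since the linear 1% = 1 cM rule holds only over short distances.

```python
import math
from collections import Counter
from itertools import product

def dihybrid_f2_phenotypes():
    """Count F2 phenotype classes from an AaBb x AaBb cross of two unlinked genes."""
    gametes = ["".join(g) for g in product("Aa", "Bb")]  # AB, Ab, aB, ab: equally likely if unlinked
    counts = Counter()
    for egg, sperm in product(gametes, repeat=2):
        shows_a = "A" in (egg[0], sperm[0])  # dominant A phenotype if at least one A allele
        shows_b = "B" in (egg[1], sperm[1])
        counts[(shows_a, shows_b)] += 1
    return counts

def recombination_frequency(recombinants, total):
    """Fraction of progeny showing a recombinant combination of alleles."""
    return recombinants / total

def linear_cm(rf):
    """Map distance using 1% recombination = 1 cM (good only for short distances)."""
    return rf * 100

def haldane_cm(rf):
    """Haldane mapping function d = -0.5 * ln(1 - 2r), converted from Morgans to cM (rf < 0.5)."""
    return -0.5 * math.log(1 - 2 * rf) * 100

if __name__ == "__main__":
    print(sorted(dihybrid_f2_phenotypes().values(), reverse=True))  # [9, 3, 3, 1]
    rf = recombination_frequency(80, 1000)  # hypothetical: 80 recombinants among 1,000 progeny
    print(round(linear_cm(rf), 1), round(haldane_cm(rf), 1))  # 8.0 8.7
```

The two distance estimates agree closely at 8% recombination; they diverge as recombination frequency approaches 50%, where the linear rule underestimates distance.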
Genetic markers
• Genetic mapping measures distances between positions on chromosomes
  ▫ Positions can be genes responsible for a phenotype, e.g. eye color or a disease trait
  ▫ Positions can be physical markers, i.e. DNA sequence variation

Physical mapping
• Determination of the physical distance between two points on a chromosome
  ▫ Distance in base pairs
• Physical markers are DNA sequences that vary between two related genomes
  ▫ Referred to as DNA polymorphisms
  ▫ Usually not in a gene
  ▫ Examples: RFLP, SSLP, SNP

RFLP
• Restriction-fragment length polymorphism
  ▫ Cut genomic DNA from two individuals with a restriction enzyme
  ▫ Run a Southern blot
  ▫ Probe with different pieces of DNA
  ▫ A sequence difference creates a different band pattern
[Figure: DNA from individuals 1 and 2 cut at GGATCC sites into 200 bp and 400 bp fragments; where a point mutation (GTATCC or GCATCC) destroys a site, a single 600 bp fragment appears on the blot instead]

SSLP
• Simple-sequence length polymorphism
• Most genomes contain tandem repeats of short units, for example three or four nucleotides
• The number of repeat copies, and so the length of the repeat region, varies
• Use PCR with primers flanking the repeat region
• On a gel, see the difference in length of the amplified fragment
Example:
1 ATCCTACGACGACGACGATTGATGCT (4 × ACG)
2 ATCCTACGACGACGACGACGACGATTGATGCT (6 × ACG)

SNP
• Single-nucleotide polymorphism
  ▫ One-nucleotide difference in the sequence of two organisms
  ▫ Found by sequencing
  ▫ Example: between any two humans, on average one SNP every 1,000 base pairs
Example:
1 ATCGATTGCCATGAC
2 ATCGATGGCCATGAC
(SNP: T/G at position 7)
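The three kinds of physical marker can be mimicked on toy sequences. The sketch below (hypothetical helper names) computes fragment lengths for a BamHI digest (BamHI recognizes GGATCC and cuts between the two G's), counts tandem copies of a repeat unit for the SSLP example, and lists SNP positions between two aligned sequences. It works in silico only and assumes the sequences are already known and aligned, whereas the RFLP method on the slide detects the same difference experimentally by Southern blot. The filler sequences used to build the RFLP alleles are invented.

```python
import re

def digest(seq, site="GGATCC", cut_offset=1):
    """Fragment lengths after cutting at every recognition site.
    Defaults model BamHI, which recognizes GGATCC and cuts after the first G."""
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

def repeat_copies(seq, unit):
    """Copies of `unit` in the longest tandem run (SSLP allele length)."""
    runs = re.findall(f"(?:{unit})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)

def snp_positions(a, b):
    """1-based positions where two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]

# Hypothetical RFLP alleles: invented filler around one site
LEFT, RIGHT = "AC" * 100, "TG" * 200   # 200 bp and 400 bp of filler
allele1 = LEFT + "GGATCC" + RIGHT       # intact BamHI site
allele2 = LEFT + "GTATCC" + RIGHT       # point mutation destroys the site

if __name__ == "__main__":
    print(digest(allele1), digest(allele2))  # [201, 405] [606]
    print(repeat_copies("ATCCTACGACGACGACGATTGATGCT", "ACG"),
          repeat_copies("ATCCTACGACGACGACGACGACGATTGATGCT", "ACG"))  # 4 6
    print(snp_positions("ATCGATTGCCATGAC", "ATCGATGGCCATGAC"))  # [7]
```

Because the cut falls one base into the site, the toy fragments come out as 201 and 405 bp rather than exactly 200 and 400, but the effect is the one on the slide: losing the site turns two bands into one.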
Expressed Sequence Tags (EST)
• Idea: sequence only “important” genes
  ▫ Those genes expressed in a particular tissue
• Sequence random cDNAs made from RNA extracted from the tissue of interest
[Figure: muscle mRNA → cDNA libraries → “New” Biolims, robotized stations → DNA sequencers]

EST sequencing II
• Make a cDNA library
• Select clones at random
• Sequence inward from one or both ends
  ▫ One-pass sequencing
• The resulting partial sequence = expressed sequence tag (EST)
[Figure: 5’ cDNA 3’, partial sequence = EST]

EST sequencing: pros and cons
• Advantages
  ▫ Relatively inexpensive
  ▫ Certainty that the sequence comes from a transcribed gene
  ▫ Information about tissue and developmental stage
• Disadvantages
  ▫ No regulatory information
  ▫ Usually less than 60% of genes found in EST collections
  ▫ Location of sequence in genome unknown
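ESTs are one-pass reads taken from the ends of a cDNA clone. The sketch below (hypothetical function names and read length) takes a 5' read from the start of a cDNA and a 3' read from the other end; because the 3' read is primed on the opposite strand, it is reported as the reverse complement of the last bases. Real reads would also contain sequencing errors, which this sketch ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (uppercase A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def est_reads(cdna, read_len):
    """One-pass 5' and 3' reads from the two ends of a cDNA clone (hypothetical read length)."""
    five_prime = cdna[:read_len]                        # read from the 5' end
    three_prime = reverse_complement(cdna[-read_len:])  # 3' read comes off the opposite strand
    return five_prime, three_prime

if __name__ == "__main__":
    print(est_reads("ATGCCCGGGTTT", 4))  # ('ATGC', 'AAAC')
```

Only the ends of each clone are read, which is why an EST is a partial sequence and says nothing about where the gene sits in the genome.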