BioSci 145B Lecture #5 5/4/2004
• Bruce Blumberg
  – 2113E McGaugh Hall - office hours Wed 12-1 PM (or by appointment)
  – phone 824-8573
  – [email protected]
• TA – Curtis Daly [email protected]
  – 2113 McGaugh Hall, 924-6873, 3116
  – Office hours Tuesday 11-12
• lectures will be posted on web pages after lecture
  – http://eee.uci.edu/04s/05705/ - link only here
  – http://blumberg-serv.bio.uci.edu/bio145b-sp2004
  – http://blumberg.bio.uci.edu/bio145b-sp2004
BioSci 145B lecture 5 page 1 ©copyright Bruce Blumberg 2004. All rights reserved

Genome sequencing
• The problem
  – Genome sizes for most eukaryotes are large (10^8-10^9 bp)
  – High-quality sequence reads are only about 600-800 bp per pass
• The solution
  – Break the genome into lots of small pieces and sequence them all
  – Reassemble the sequence by computer
• The benefit
  – Rapid increase in information about genome size, gene comparisons, etc.

Genome sequencing (contd)
• Shotgun sequencing was NOT invented by Craig Venter
  – Messing 1981: first description of shotgun sequencing
  – Sanger lab developed the current methods in 1983
  – approach
    • blast the genome into small chunks
      – Shearing is usual
      – 4-cutters (restriction enzymes with 4-bp sites) are also used
    • clone these chunks
      – In the early days, the aim was small-insert libraries (0.5-1.5 kb)
      – Now typically make 3 library types
        » 3-5 kb and 8 kb plasmid
        » 40 kb fosmid - to jump repetitive sequences

Genome sequencing (contd)
• sequence + assemble by computer
  – A priori difficulties
    • how to assemble fragments?
      – Software now very good
    • what to do about repeats?
      – Fosmids and BAC STCs (sequence-tagged connectors, i.e., BAC end sequences) help a lot
    • how to get a nice uniform distribution of sequences without too much redundancy?
      – Biggest problem, not really well resolved

Genome sequencing (contd)
  – Assembled sequences always have gaps of various sizes
    • how to cross these gaps?
      – Quickly and cost-effectively
    • Need to link sequences somehow
      – How depends on the size of the gaps to be crossed

Genome sequencing (contd)
  – For small gaps (up to about 8 kb)
    • often can close by sequencing both ends of clones
  – For medium-sized gaps (8-30 kb)
    • Primer walking across a linking clone (cosmid or fosmid)

Genome sequencing (contd)
• Large gaps require much more effort
  – Identify large-insert clones that span the gap
    • Typically from BAC end sequences
    • May have to screen libraries to find them
  – Shotgun sequence these and assemble
  – Close any small gaps remaining with primer walking

Genome sequencing (contd)
• Shotgun sequencing (contd)
  – How to minimize sequence redundancy (re-sequencing the same region)?
    • Best way to minimize redundancy is to map before you start
      – C. elegans was done this way - when the sequence was finished, it was FINISHED
        » mapping took almost 10 years
      – mapping much too tedious and nonprofitable for Celera
        » who cares about redundancy, let's sequence and make $$
    • why does redundancy matter?
      – Finished sequence today costs about $0.50/base

Genome sequencing (contd)
  – Mapping by hybridization
  – Mapping by fingerprinting

Genome sequencing (contd)
• Actual large-insert fingerprinting gel

Traditional (map first) vs STC (map as you go along) mapping
[Figure panels: Map before sequencing | Map as you go]
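The shotgun strategy in these slides (break the genome into small pieces, sequence them, reassemble by computer) can be sketched as a toy greedy assembler that repeatedly merges the two reads with the longest suffix/prefix overlap. This is a minimal sketch under strong assumptions (error-free reads, all from one strand, no repeats); the function names and the `min_overlap` cutoff are hypothetical, and production assemblers also handle sequencing errors, reverse complements, mate pairs and repeats.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` bases exists.
    """
    start = 0
    while True:
        # Look for the first min_len bases of b anywhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # The earliest hit whose remainder of a is a prefix of b is the longest overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Toy greedy shotgun assembler; returns one string per contig."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads wholly contained in another read: they add no new sequence.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining contigs are separated by gaps
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    # Hypothetical short reads from the made-up sequence ATTAGACCTGCCGGAATAC.
    reads = ["TGCCGGAA", "ATTAGACC", "GGAATAC", "GACCTGCC"]
    print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is the simplest possible choice. When a repeat is longer than a read, several merges look equally good and the greedy choice can collapse or misjoin the repeat, which is one reason fosmid and BAC end sequences are so useful.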
The human genome
• On Feb 12, 2001, Celera and the Human Genome Project published "draft" human genome sequences
  – Celera -> 39114 genes (WGS)
  – Ensembl -> 29691 genes (map as you go)
  – Consensus from all sources ~30K
• Number of genes
  – C. elegans – 19,000
  – Arabidopsis – 25,000
• Predictions had been from 50-140K human genes
  – What's up with that?
  – Are we only slightly more complicated than a weed?
  – How can we possibly get a human with fewer than 2x the number of genes of C. elegans?
  – Implications?
• UNRAVELING THE DNA MYTH: The spurious foundation of genetic engineering, Barry Commoner, Harper's Magazine, Feb 2002

The human genome
• The answer
  – Somewhat sloppy science
  – Gene sets don't overlap completely
  – Floor is 42K
  – 105,680 UniGene clusters from ESTs (down from 128,826 last year) = 42113

Genome sequencing (contd)
• Whole genome shotgun sequencing (Celera)
  – premise is that rapid generation of draft sequence is valuable
  – why bother trying to clone and sequence difficult regions?
    • Basically just forget regions of repetitive DNA - not cost effective
      – R0t analysis suggests not many genes there anyway
  – using this approach, the genome was alleged to be 90% finished in 2001
    • More than 95% today
    • rule of thumb is that it takes at least as long to finish the last 5% as it took to get the first 95%
  – problems
    • sequence may never be as complete as that of C. elegans
    • much redundant sequence, with many sparse regions and lots of gaps
    • Fragment assembly for regions of highly repetitive DNA is dubious at best
• Map-as-you-go method inherently more complete
  – Sets up for finishing, since an ordered set of overlapping BACs is produced
• Both methods produce reasonable data given enough sequencing
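How much random sequencing is "enough" can be estimated with the Lander-Waterman model, an idealized model not covered on the slides that assumes reads start at uniformly random positions. With genome size G, read length L and N reads, the coverage is c = N·L/G, the expected fraction of the genome never read is about e^(-c), and, ignoring the minimum overlap needed to detect a join, the expected number of gaps is about N·e^(-c). The sketch below plugs in the Xenopus tropicalis figures from this lecture (1.7 Gb, up to 8x); the 700 bp read length is an assumption taken from the middle of the 600-800 bp range, and the function names are hypothetical.

```python
import math


def reads_needed(genome_bp: float, coverage: float, read_len_bp: float) -> int:
    """Number of reads N so that N * L / G reaches the target coverage."""
    return math.ceil(genome_bp * coverage / read_len_bp)


def fraction_unsequenced(coverage: float) -> float:
    """Poisson estimate of the fraction of bases covered by zero reads: e^-c."""
    return math.exp(-coverage)


def expected_gaps(n_reads: int, coverage: float) -> float:
    """Lander-Waterman expected number of gaps, ignoring the minimum
    overlap needed to detect a join (theta = 0): N * e^-c."""
    return n_reads * math.exp(-coverage)


if __name__ == "__main__":
    G = 1.7e9  # X. tropicalis genome size from the slides (bp)
    L = 700    # assumed average read length (slides: 600-800 bp per pass)
    for c in (1, 2, 4, 6, 8, 10):
        n = reads_needed(G, c, L)
        print(f"{c:>2}x: {n:>11,} reads, "
              f"{fraction_unsequenced(c):.2%} unsequenced, "
              f"~{expected_gaps(n, c):,.0f} gaps")
```

Under these assumptions, going from 4x to 8x coverage lowers the unsequenced fraction from about 1.8% to about 0.03%: each doubling of effort buys less new sequence, consistent with the slides' points that a purely random strategy produces a lot of redundancy and that the last few percent are slow to finish.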
The human genome
• How finished is the human genome sequence?
  – Draft sequence to high coverage
  – Chromosome-by-chromosome finishing now
    • Chr 22 – 1999
    • Chr 21 – 2000
    • Chr 20 – 2001
    • Chr 15 – 2003
    • Chr 6, 7, Y – 2003
    • Chr 13, 19 – 2004

Genome sequencing (contd)
• Knowing what we know now – how to approach a large new genome?
  – Xenopus tropicalis: 1.7 Gb (about ½ human)
  – BAC end sequencing
  – Whole genome shotgun
  – Gaps closed with BACs
  – 8x coverage by end of 2004
  – Finishing dependent on additional funding

Genome sequencing
• DOE – Joint Genome Institute
  – http://www.jgi.doe.gov/
  – Numerous advances in sequencing technology
    • Increased pass rate from ~70% to >90%
    • Lowered cost nearly 3-fold

Other sequencing technologies
• Sequencing by hybridization is most interesting
  – Construct a high-density microchip with all possible combinations of a short oligonucleotide
    • Up to 25-mers
    • By photolithography
      – Synthesized directly on the chip
  – Label and hybridize the fragment to be sequenced
  – Wash stringently
  – Read fluorescent spots
  – Reconstruct sequence by computer

Other sequencing technologies (contd)
• Sequencing by hybridization rarely used for de novo sequencing
  – Extremely fast and useful for resequencing something whose sequence you already know, to identify mutations
  – Disease-causing changes
    • e.g., in mitochondrial DNA
  – SNP discovery
  – Works best for examining sequences of <10 kb
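The final "reconstruct sequence by computer" step of sequencing by hybridization can be sketched as follows: the chip reports which k-mers (the spectrum) hybridized, and the target is rebuilt by chaining k-mers that overlap by k-1 bases. This sketch uses an Eulerian path in a de Bruijn graph (nodes are (k-1)-mers, edges are k-mers), a standard formulation of SBH reconstruction; it assumes an error-free spectrum in which each k-mer occurs once. When a (k-1)-mer repeats, more than one sequence fits the same spectrum and the function simply returns one of them, which illustrates why SBH is rarely used de novo. The function name is hypothetical.

```python
from collections import defaultdict


def reconstruct_from_spectrum(kmers: set[str]) -> str:
    """Rebuild a sequence from its k-mer spectrum via an Eulerian path in the
    de Bruijn graph. Assumes each k-mer occurs once and no hybridization errors."""
    graph = defaultdict(list)
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for kmer in sorted(kmers):  # sorted for deterministic output
        left, right = kmer[:-1], kmer[1:]
        graph[left].append(right)
        out_deg[left] += 1
        in_deg[right] += 1
    nodes = set(out_deg) | set(in_deg)
    # A linear sequence starts at the node with one more outgoing than incoming edge.
    starts = [n for n in nodes if out_deg[n] - in_deg[n] == 1]
    start = starts[0] if starts else min(n for n in nodes if out_deg[n] > 0)
    # Hierholzer's algorithm (iterative): walk edges, backtrack at dead ends.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    if len(path) != len(kmers) + 1:
        raise ValueError("spectrum does not form a single path (gaps or errors)")
    # Spell the sequence: first node, then the last base of each following node.
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    target = "ATGGCGTGCA"  # hypothetical target; no 3-mer occurs twice
    spectrum = {target[i:i + 4] for i in range(len(target) - 3)}
    print(reconstruct_from_spectrum(spectrum))  # ATGGCGTGCA
```

A spectrum with a missing k-mer (a gap) raises an error rather than guessing, mirroring how a single failed probe breaks the chain.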
Other sequencing technologies (contd)
• http://www.affymetrix.com/products/arrays/index.affx
• SNP discovery
  – Photo shows mitochondrial chip
  – Right panel shows pairs of normal (top) vs disease (bottom) (Leber's Hereditary Optic Neuropathy)
    • Top 3: disease mutations
    • Bottom: control with no change

Useful software for molecular biology (contd)
• NCBI
  – www.ncbi.nlm.nih.gov
  – main information and analysis resource
  – indispensable resource

Useful software for molecular biology (contd)
• NCBI
  – Blast – how to find similar genes
  – www.ncbi.nlm.nih.gov/BLAST/

Useful software for molecular biology (contd)
• Why pay Celera?

Practice midterm
1. (6 points) Your laboratory works on the strange organisms that live around hydrothermal vents in the deep ocean as a model system for the first multicellular organisms. Your PI has developed a new method of culturing such organisms, making it possible to grow the wormlike animals found around the vents in the laboratory. One of the first things that needs to be done is to construct the molecular tools that will be required to characterize your assigned animal, the Pompeii worm (Alvinella pompejana), which can survive an environment as hot as 80 °C. The ultimate goal will be to establish an A. pompejana genome project including whole genome sequencing and mapping, an EST project and DNA microarrays. The first goal is to make a genomic library. What type of library will you make, i.e., which type of vector? Justify your choice. What type of equipment will be required to make your library?

Practice midterm 1.
answer
You should choose to make a BAC or PAC library. BAC is best for genome sequencing because it accepts large inserts, is stable, and has a small vector, which facilitates shotgun sequencing.
Not much equipment is required beyond standard molecular biology laboratory equipment, an electroporator, and PFGE (pulsed-field gel electrophoresis). PFGE is indispensable for isolating the large DNA fragments needed to make good genomic libraries.

Practice midterm
2. (4 points) Describe a method to make a physical map of the A. pompejana genome in order to facilitate large-scale sequencing.
Use a large-insert genomic library to construct a map. Map the clones by fingerprinting, map as you go, or hybridization. Restriction mapping of the whole genome was NOT an acceptable answer.

Practice midterm
3. (5 points) You received an E. coli strain with the following genotype from a neighboring laboratory for the purposes of propagating your genomic library:
mcrA, Δ(mrr-hsdRMS-mcrBC), ΔlacX74, deoR, recA1, araD139, Δ(ara-leu)7697, galU, galK, endA1, nupG
(in every case above, the bacteria are DEFICIENT in the indicated gene product)
a) Is this a good strain for the type of genomic library you have chosen to make, i.e., does it have the necessary genetic markers for your library to be stable and readily screened?
b) If so, what are the desirable markers that the strain has? If not, which ones are missing?
c) Would the strain be suitable if you had made a YAC library? Why?
answer
a) Suitable for PAC and BAC libraries.
b) It is restriction deficient (mcrA, Δ(mrr-hsdRMS-mcrBC)) and deoR. Some also pointed out that the strain should have lacZΔM15 for blue/white selection if BACs were being used.
c) The strain is not suitable for a YAC library because yeast artificial chromosomes can only be propagated in YEAST.
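The fingerprinting answer to question 2 can be sketched in code: each large-insert clone is digested, its fragment ("band") sizes are measured on a gel, clones that share many bands are called overlapping, and overlapping clones are chained into contigs. This is a simplified sketch: it counts bands matching within a size tolerance (a greedy one-to-one match), whereas mapping software such as FPC instead scores the probability that shared bands arose by chance (the Sulston score). The tolerance, the `min_shared` threshold, the function names, and the band sizes in the example are all hypothetical.

```python
def shared_bands(a: list[float], b: list[float], tol: float) -> int:
    """Greedily pair fragments of `a` with fragments of `b` whose sizes differ
    by at most `tol`; each band is used once. Returns the number of pairs."""
    remaining = sorted(b)
    shared = 0
    for size in sorted(a):
        for k, other in enumerate(remaining):
            if abs(size - other) <= tol:
                shared += 1
                del remaining[k]
                break
    return shared


def overlapping_pairs(fingerprints: dict[str, list[float]], tol: float,
                      min_shared: int) -> list[tuple[str, str, int]]:
    """Call two clones overlapping if they share at least `min_shared` bands."""
    names = sorted(fingerprints)
    pairs = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            n = shared_bands(fingerprints[x], fingerprints[y], tol)
            if n >= min_shared:
                pairs.append((x, y, n))
    return pairs


def contigs(names: list[str], pairs: list[tuple[str, str, int]]) -> list[list[str]]:
    """Group clones into contigs (connected components of the overlap graph)."""
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for x, y, _ in pairs:
        parent[find(x)] = find(y)
    groups: dict[str, list[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical band sizes (kb) for four BAC clones.
    fp = {
        "BAC_A": [1.2, 2.5, 3.1, 4.0, 5.6, 7.3],
        "BAC_B": [3.1, 4.0, 5.6, 7.3, 8.8, 9.4],
        "BAC_C": [8.8, 9.4, 10.2, 11.5, 12.0, 13.7],
        "BAC_D": [0.8, 1.9, 6.2, 14.5, 15.1, 16.3],
    }
    pairs = overlapping_pairs(fp, tol=0.05, min_shared=2)
    print(pairs)                       # [('BAC_A', 'BAC_B', 4), ('BAC_B', 'BAC_C', 2)]
    print(contigs(sorted(fp), pairs))  # [['BAC_A', 'BAC_B', 'BAC_C'], ['BAC_D']]
```

A fixed band-count cutoff is the crudest possible rule; large clones with many bands share some by coincidence, which is why probabilistic scores are preferred in practice.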
Practice midterm
4. (5 points) A colleague has experimentally determined that the A. pompejana genome is 110 Mb, right between C. elegans (97 Mb) and Drosophila melanogaster (120 Mb). Describe a sequencing strategy that could allow the rapid generation of a draft genome sequence. How might you combine the mapping proposed in your answer to question 2 to facilitate the completion of the genome sequence?
Whole genome shotgun will generate a rapid draft sequence. Combining this with the whole genome map made in question 2 will enable closing gaps.

Practice midterm
5. (6 points) As a side project, you decide to see if the A. pompejana genome contains homeobox genes. You dig into the laboratory archives and find a cDNA probe that contains the Drosophila melanogaster Antennapedia homeobox. What is the best way to find whether the A. pompejana genome contains homeobox genes? If it does, how will you isolate genomic clones containing these homeobox genes? Let's say you find 8 A. pompejana homeobox genes. Describe a quick way to tell whether they are located in one or more clusters as in Drosophila or C. elegans.
Do a genomic Southern blot of A. pompejana DNA probed with the Antp homeobox to work out hybridization conditions.
Screen the genomic library you made with the Antp probe under these conditions.
Once you recover the 8 genes, hybridize them back to the large-insert clones, or to a Southern blot of a PFGE-separated 8-cutter digest of genomic DNA. Note whether more than 1 homeobox gene maps to each clone or fragment.

Practice midterm
7. (6 points) Remember that you also need to provide material for the EST project. This means that it is time to make cDNA libraries, right? Assume that the libraries you make will be used for more than just EST sequencing. What sort of vector will you choose?
Should you go to the trouble of enriching the library for full-length cDNAs? If so, how? Should the libraries be standard, normalized, or subtracted? Justify your answer. If normalized or subtracted libraries are required, describe generally how you will make them.
• Plasmid vector (NOT PAC or BAC)
• Yes, you should enrich for full-length cDNAs, since the library will be used for multiple purposes
• Cap trapping, oligo-capping, or cap-affinity chromatography selects full-length mRNA, which should yield a library enriched for full-length cDNAs
• The libraries should be normalized, since EST sequencing is contemplated and we don't want to sequence the same thing many times
• Make normalized libraries by making driver from the library you wish to normalize, then hybridizing it back to ss-cDNA from that library to a low Cot value (5-10). After removing hybrids, use the remaining cDNA to make the normalized library

Practice midterm
8. (4 points) What are the major differences between normalized and subtracted cDNA libraries? If you want to use a cDNA library to isolate genes expressed specifically in the tail of A. pompejana compared with the head, would it be better to normalize or subtract the probe that you will use? Explain your reasoning.
Normalized libraries are depleted in abundant genes and enriched in rare genes by self-hybridization. Subtracted libraries are depleted in genes that are common to the two sources.
A subtracted probe is appropriate here, since you wish to identify genes specifically expressed in the tail.
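The normalization and subtraction answers to questions 7 and 8 can be contrasted with a toy kinetic model. With driver in large excess, a tester cDNA species that makes up a fraction a of the driver reanneals with pseudo-first-order kinetics, so the fraction left single-stranded is about exp(-k·a·Cot): abundant species are removed as hybrids much faster than rare ones. Subtraction, by contrast, removes whatever the tester shares with a different driver. The rate constant k, the Cot value in arbitrary units, the idealized "complete hybridization" subtraction, and the transcript names and abundances are all hypothetical.

```python
import math


def normalize(abundance: dict[str, float], cot: float, k: float = 1.0) -> dict[str, float]:
    """Relative abundance of each cDNA among molecules still single-stranded
    after hybridizing to an excess of driver made from the same library.

    Toy pseudo-first-order model: a species that is a fraction `a` of the
    driver survives as exp(-k * a * cot). Units of k and cot are arbitrary.
    """
    total = sum(abundance.values())
    survivors = {}
    for gene, amount in abundance.items():
        a = amount / total
        survivors[gene] = a * math.exp(-k * a * cot)
    s = sum(survivors.values())
    return {gene: v / s for gene, v in survivors.items()}


def subtract(tester: dict[str, float], driver: dict[str, float]) -> dict[str, float]:
    """Idealized subtraction: discard tester cDNAs also present in the driver
    (assumes hybridization goes to completion), then renormalize."""
    kept = {gene: amt for gene, amt in tester.items() if driver.get(gene, 0) == 0}
    s = sum(kept.values())
    return {gene: amt / s for gene, amt in kept.items()} if s else {}


if __name__ == "__main__":
    # Hypothetical transcript abundances (arbitrary units).
    tail = {"actin": 50, "ef1a": 30, "collagen_like": 15, "tail_specific": 5}
    head = {"actin": 50, "ef1a": 30, "head_specific": 20}
    print("normalized tail:", {g: round(v, 3) for g, v in normalize(tail, cot=10).items()})
    print("tail minus head:", subtract(tail, head))
```

In this toy model the rare tail-specific species (5% of the input) rises to a much larger share after normalization while the abundant actin species (50%) drops sharply; subtraction instead keeps only the two species absent from the head, which is why a subtracted probe suits the tail-versus-head question.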