Sequencing the Human Genome

The Structure of Proteins and DNA
Pauling 1951
Crick & Watson 1953

The History of Genome Mapping
1955: Fred Sanger produces the first amino-acid sequence of a protein (insulin).
1956: Tjio and Levan determine the number of human chromosomes.
1961: Brenner, Jacob, and Meselson discover the role of mRNA in making proteins.
1963: Wu and Kaiser map the first DNA sequence (12 base pairs).
1966: Nirenberg, Khorana, and Ochoa map the codon-amino acid connections.
1968: Meselson, Smith, Wilcox, and Kelley discover the use of restriction enzymes.
1975: Gilbert and Maxam develop the “clone-contig” method of producing DNA sequences.
1977: Sanger improves on the process by using dideoxynucleotides and DNA polymerase.
1978: Sanger maps the entire genome of the ΦX174 virus (5,386 base pairs).
1980: Sanger maps the first human gene (16,569 base pairs).

How Do We Sequence a Protein?
[Figure: an example protein of nine amino acids, built from methionine (M), cysteine (C), glycine (G), alanine (A), proline (P), and threonine (T).]

Try Breaking It Up
[Figure: breaking the example protein into single amino acids gives only its composition: 1 A, 2 C's, 2 G's, 1 M, 2 P's, 1 T. Panels: The Protein; Protein Broken Apart; Another Breakdown.]

Sanger’s Technique for Protein Mapping
1. Attach a molecule to the end of the protein to make it glow yellow.
2. Use a mild detergent to break the protein apart into random pieces.
3. Put the mixture into a separator material (gel) and let the pieces sink.
4. Smaller molecules sink faster, so the molecules separate by length.
5. Find the glowing molecules at each level, and analyze their amino acid content.
6. Put the protein back together amino acid by amino acid.
Problem: There was a limit to the number of amino acids in a row that could be separated.
7. Solution: Use digestive enzymes to break the protein chain at specific places, then analyze each piece as above.
The Sanger Method
[Figure: the example protein with the marker molecule attached. Panels: The Protein With Molecule Attached; Protein Broken Apart; Another Breakdown.]

Mapping DNA
Some key players in DNA mapping:
restriction enzymes: Cut DNA at specific points, depending upon the sequence at that point.
DNA polymerase: Replicates complementary DNA (cDNA) strands from a single strand of DNA.
primer: Short sequence of single-strand DNA that can start the DNA polymerase off at some point in the main DNA strand.
dideoxynucleotides: Artificial A, C, G, T molecules that serve two functions: first, to tag the growing DNA with a colored dye (a different one for each letter), and second, to cause the DNA polymerase to stop building the DNA at that point.

The Basic Method of Sequencing a Small (700–900 bp) Strand of DNA

The Sanger Sequencing Method:
1. Put many copies of the DNA strand, together with lots of primers, DNA polymerase, regular nucleotides, and a smaller number of special dideoxynucleotides, into a warm broth. Shake well.
2. The DNA polymerase starts a copying reaction on the strands of DNA, constructing a strand of cDNA by grabbing either a nucleotide or a dideoxynucleotide from the broth. The sequencing process starts.
3. As long as normal nucleotides are attached, the cDNA continues to grow. When one of the dideoxynucleotides is attached, however, the process stops, leaving a strand of cDNA beginning with a specific starting sequence and ending with a single dideoxynucleotide.

The Sequence Reading Process:
4. After letting this process run for a specific amount of time, place the mixture into a gel.
5. Draw the mixture through the gel, letting each piece sink to the right level based on its size.
6. Read the dye colors at the various levels to get the base at the end of the DNA piece of each length.
7. Put the whole thing together to get your sequence.
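The reading steps above can be sketched as a toy model. This is only an illustration, not laboratory software: it assumes every possible stopping point is represented in the broth, it models the newly built strand directly (ignoring the complement step), and the 12-base strand and the function names are hypothetical.

```python
import random


def terminated_strands(new_strand):
    """Return every prefix of the copied strand: one for each position
    where a dideoxynucleotide could have stopped the polymerase."""
    return [new_strand[:length] for length in range(1, len(new_strand) + 1)]


def read_gel(strands):
    """Sort the strands by length (the job of the gel) and read the
    final, dye-tagged base of each one."""
    by_length = sorted(set(strands), key=len)
    return "".join(strand[-1] for strand in by_length)


# Hypothetical 12-base strand; the broth holds several copies of each
# stopped strand, in no particular order.
copied = "GATTACAGATTC"
broth = terminated_strands(copied) * 3
random.Random(0).shuffle(broth)

print(read_gel(broth))  # GATTACAGATTC
```

Because exactly one stopped strand exists for each length, sorting by length and reading the last base of each recovers the strand one letter at a time.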
Getting Manageable Pieces of DNA to Sequence
1. Break each chromosome apart at known sequence locations into pieces of about 150,000 bp each, cloned as bacterial artificial chromosomes (BACs).
2. “Shock” these into the DNA of E. coli bacteria, and let the bacteria replicate the BACs as many times as needed.
3. Take each BAC and cut it into manageable pieces, using restriction enzymes.
4. Clone (artificially replicate) these pieces, so as to have enough to work with. This is known as PCR, or polymerase chain reaction.
5. Put the pieces into a bath that unwinds and separates them into single strands.
6. Perform the Sanger sequencing process to obtain the sequence of each piece of DNA.
7. “Put these pieces back together” to form the entire DNA sequence.

Shotgun Sequencing
A major computational tool in all large genome sequencing projects is the shotgun technique of sequencing. Instead of always sequencing a genome from known locations (a difficult and time-consuming job), you sequence from many different locations, and try to put the sequence back together.

Steps of the Shotgun Technique
1. Break the DNA you are trying to sequence into arbitrary smaller fragments of 700–900 base pairs, so that there is a large amount of overlap among the fragments.
2. Sequence each fragment by using the consecutive tagging technique given above.
3. Take pairs of fragments, and match up the overlapping right- and left-hand ends letter by letter to grow longer and longer multifragment subsequences that are consistent with all of the contained fragments.
4. If the overlap among the fragments is sufficiently large, then there will be a unique sequence of the correct size that is “strongly” consistent with the set of smaller fragments.

Coverage
The key to obtaining a unique DNA sequence from a set of DNA fragments is to ensure that the fragments give sufficient coverage of the DNA you are trying to sequence.
k-fold coverage: Ensures that at least k of your fragments cover each base pair of the DNA sequence.
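The pairwise matching in step 3 and the k-fold coverage definition can be sketched in a few lines. This is a minimal illustration rather than a real assembler: the function names are hypothetical, and the sequence and eight fragments are taken from the worked example in this section, with the fragment offsets worked out by hand as an assumption of the sketch.

```python
def overlap(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge(left, right):
    """Join two fragments, writing their shared overlap only once."""
    return left + right[overlap(left, right):]


def coverage(sequence, placements):
    """Count how many placed fragments cover each base of `sequence`.

    `placements` is a list of (offset, fragment) pairs.
    """
    depth = [0] * len(sequence)
    for offset, fragment in placements:
        if sequence[offset:offset + len(fragment)] != fragment:
            raise ValueError(f"{fragment} does not match at offset {offset}")
        for position in range(offset, offset + len(fragment)):
            depth[position] += 1
    return depth


sequence = "CGCCCCATCG"
# Offsets are hand-derived assumptions for the eight example fragments.
placements = [(2, "CCCC"), (7, "TCG"), (4, "CCA"), (0, "CGCC"),
              (6, "ATCG"), (1, "GCC"), (0, "CGC"), (4, "CCAT")]
depth = coverage(sequence, placements)

print(merge("CGCC", "CCCC"))     # CGCCCC
print("".join(map(str, depth)))  # 2343333322
print(min(depth))                # 2
```

The smallest entry of `depth` is the k in k-fold coverage. Adding a CG fragment at offset 0 and another at offset 8 raises that minimum from 2 to 3.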
Mapping the human genome requires a coverage of between 5- and 10-fold to ensure reasonable accuracy.

Example
Suppose you had the following set of 8 fragments:

ATCG CCA CCAT CCCC CGC CGCC GCC TCG

and you wish to find a sequence with 2-fold coverage. The unique (10-base) sequence that has 2-fold coverage is

    CGCCCCATCG
    ----------
      CCCC
           TCG
        CCA
    CGCC
          ATCG
     GCC
    CGC
        CCAT
    2343333322

The last row counts the fragments covering each base. Note that with two additional CG fragments, we could obtain 3-fold coverage of the same sequence:

    CGCCCCATCG
    ----------
      CCCC
           TCG
        CCA
    CGCC
          ATCG
     GCC
    CGC
        CCAT
    CG
            CG
    3443333333

On the other hand, if the original set of fragments consisted of two CC fragments instead of one CCCC fragment, we could also obtain a sequence having 3-fold coverage. How?

The History of the Human Genome Project
1984: Department of Energy needs information on genetic defects of chemical agents. The International Commission for Protection Against Environmental Mutagens and Carcinogens suggests that a map of the human genome would be important in this endeavor.
1986: Renato Dulbecco, in an editorial in Science, suggests a national effort at reconstructing the human genome. DOE sets up the Santa Fe Workshop to pursue the issue. The National Academy of Sciences sets up a blue-ribbon panel to discuss the project. The National Institutes of Health belatedly starts discussions.
1988: NAS report appears, stressing multidisciplinary participation of labs across the country. The House Energy and Commerce Committee decides that the government should fund such an effort.
1990: Joint public effort launched, at an estimated cost of $3 billion, by the International Human Genome Mapping Consortium, jointly administered by NIH and DOE and involving 20 labs and hundreds of scientists.
1998: Celera Genomics, under the direction of Craig Venter, becomes the first private company to enter the race. It worked almost independently of the HGP.
February 2001: IHGMC and Celera announce jointly, in Nature and Science respectively, the draft map of the human genome.
This consisted of 94% of the genome, with 26,000 reported genes and 30,000–40,000 total genes suspected.

A Comparison of Techniques

Organization:
HGP: Public, 20 laboratories and many hundreds of people.
Celera: Private, 1 laboratory and about 65 people (and 40 high-speed computers).

Technique:
HGP: Clone contig. Separate the genome into clone libraries with known locations, and shotgun sequence each library element. Better control of gene locations, but significant startup time to obtain the associated chromosomal maps.
Celera: Whole-genome shotgun. Sequence entire chromosomes by the shotgun method. More computer intensive, and it also needs more coverage.

Source of the genome:
HGP: 5 donors chosen from hundreds of candidates.
Celera: 21 donors.
Both groups were anonymous and chosen from varied ethnic groups.

Time frame:
HGP: 1990–2000, but actual mapping done between 1999 and 2000.
Celera: 1998–2000.

Publication:
HGP: Nature. In addition, newly sequenced sections were made public on the web within 24 hours of sequencing.
Celera: Science. Celera’s intention is to sell or patent further information about the human genome.

Computer time: Celera reported 30,000 CPU hours for assembly of fragments into a single genome.

Number of genes: 25,000–35,000 for both studies, accounting for only about 3% of the entire genome sequence.

Coverage: 90–94% of the genes mapped in both studies (and 25% of the entire genome).

Comparison of results: Hard to judge, since the presentation of the two studies is different. Preliminary studies indicate at least a 99% match between the two sequences.

Current Accomplishments
“Complete” sequencing of the HG: 99% of the euchromatic (gene-containing) portion of the HG has been sequenced with 99.99% accuracy, and with no gaps in this region greater than 150,000 bp.
Current estimate of number of genes: 20,000–25,000.
All chromosomes have been completely sequenced: The last chromosome (#1) was sequenced in May 2006.
Other genomes sequenced: 180 different species have been sequenced, including lots of bacteria, E. coli, brewer’s yeast, roundworm, fruit fly, mosquito, mouse, rat, dog (at NC State), chimpanzee, orangutan, elephant, cat, chicken, and many others.

Cost (human genome):
2003: $3 billion
2012: $1,700
