Lecture of Principles of Gene Engineering, 2008. 4/28
An overview of genomics: from the basis of molecular cloning to the genome sequencing projects (Human Genome Project).
Dr. Jin-Mei Lai

So far in this course you have learned how to clone and identify the gene of your interest.
[Figure: cDNA -> PCR with Taq DNA polymerase (A-tailed products) -> ligation -> Amp+ plate (+ X-gal & IPTG).]
~ ampR selection
~ Inserted DNA disrupts the lacZ' gene
~ Blue colony: non-recombinant
~ White colony: recombinant

How can I get the sequence of a specific gene?
Design specific primers
Nick translation
(search information from gene database)
PCR (using cDNA or a cDNA library as template)
Cloning into an expression vector
* Make a cDNA library

Generate cDNA from tissues or cell lines.
~ isolation of total RNA
~ purification of the mRNA by oligo-dT column
~ reverse transcription
~ PCR amplification
(RT-PCR)

Why should we use cDNA?

What is a gene?
- An open reading frame + its transcriptional control elements (promoter and terminator)

Fig. 1.22 The differences in gene expression between prokaryotes and eukaryotes.

How can we study unknown genes?
A full understanding of the genome under study makes it easier to identify and investigate unknown genes.
* Genome sequencing projects!
1. Genomic mapping.
2. Genetic mapping.
3. Physical mapping.
4. Nucleotide or genome sequencing.

Genomic mapping
The chromosome content of an organism (its karyotype) can be visualized using a microscope.
~ Different chromosomes are usually different sizes (ranging in the human from 279 x 10^6 bp for chromosome 1 to 45 x 10^6 bp for chromosome 21).
~ Distinct chromosome banding patterns (Giemsa stain).
[Figure labels: shorter arm, longer arm]
Cytological map (low resolution)

Some chromosome abnormalities that cause inherited genetic diseases can be observed by karyotype analysis, e.g.
Down’s syndrome (trisomy 21)
Klinefelter’s syndrome (47,XXY)
* Cystic fibrosis: chromosome 7q31; CFTR

* Fluorescence in situ hybridization (FISH)
~ a kind of in situ hybridization (in situ: in place)
DNA probes: radioactively labeled (earlier), fluorescence labeled (now)
Low resolution: less than 3 Mbp
[Figure: yellow signal marks satellite DNA in the centromere]

Genetic mapping
~ A genetic map is a representation of the distance between two DNA elements based upon the frequency at which recombination occurs between the two.
* The first genetic map of a chromosome:
~ from Drosophila mating-cross data
The information gained from the experimental crosses could be used to plot out the location of genes. Tightly linked genes are physically located close to each other, while those that are only weakly linked are physically further apart.

A centimorgan (cM) is the unit of distance between two loci on a genetic map; 1 cM corresponds to a recombination frequency of 1%. A cM is a measure of genetic distance and not physical distance.
Look closely at the diagram. If two loci are far apart, it is possible to miss that a double cross-over occurred.

* Major drawbacks of genetic mapping
~ It requires a phenotype for the gene that is being mapped, and a large number of crosses to generate accurate mapping data.
~ A tacit assumption of mapping based on crosses is that the recombination frequency is equal for all parts of the chromosome. This does not hold at recombinational hot-spots and cold-spots.
In humans, a relatively low number of genes have been identified, so map distances are difficult to estimate.

* An alternative to genetic mapping using phenotypes is to follow the inheritance of DNA sequence variations between individuals.
Although more than 99% of human DNA sequence is the same across the population,
~ there is still a huge number of variations in DNA sequence between individuals.
Several methods exploit the inheritance of these variations to map genomic locations. Ex.
1. Single-nucleotide polymorphisms (SNPs).
2. Variable number tandem repeats (VNTRs).
3. Microsatellites.

1. Single-nucleotide polymorphisms (SNPs).
~ the most common type of sequence variation between individuals.
~ occur as frequently as about once every 100-300 bp

What kinds of genome variations are there?
Genome variations include mutations and polymorphisms. Technically, a polymorphism (a term that comes from the Greek words "poly," or "many," and "morphe," or "form") is a DNA variation in which each possible sequence is present in at least 1% of people. For example, a place in the genome where 93 percent of people have a T and the remaining 7 percent have an A is a polymorphism. If one of the possible sequences is present in less than 1 percent of people (99.9 percent of people have a G and 0.1 percent have a C), then the variation is called a mutation.
Informally, the term mutation is often used to refer to a harmful genome variation that is associated with a specific human disease, while the word polymorphism implies a variation that is neither harmful nor beneficial. However, scientists are now learning that many polymorphisms actually do affect a person's characteristics, though in more complex and sometimes unexpected ways.

About 90 percent of human genome variation comes in the form of single nucleotide polymorphisms, or SNPs (pronounced "snips"). As their name implies, these are variations that involve just one nucleotide, or base.
~ frequency: once every 100-300 bp
~ may be “disease-causing mutations”
~ occur in non-coding regions of DNA
~ some alter restriction enzyme recognition sites, giving restriction fragment length polymorphisms (RFLPs) (detected by Southern blotting using a radioactive DNA probe)

* Restriction fragment length polymorphisms (RFLPs)
[Figures: Southern blotting; RFLP analysis]

Highly repeated DNA sequences.
--- short, arranged in tandem.
1. Satellite DNAs
~ consist of short sequences that form very large clusters. Ex. satellite DNA in the centromere.
2. Minisatellite DNAs
~ range from 12 to 100 base pairs in length and are found in clusters containing as many as 3000 repeats.
* unstable: the copy number often changes from one generation to the next (polymorphic; applied to DNA fingerprinting).
3. Microsatellite DNAs
~ the shortest; present in small clusters of about 10-40 bp in length

2. VNTR stands for "variable number of tandem repeats"
A tandem repeat is a short sequence of DNA that is repeated in a head-to-tail fashion at a specific chromosomal locus. Tandem repeats are interspersed throughout the human genome. Some sequences are found at only one site -- a single locus -- in the human genome. For many tandem repeats, the number of repeated units varies between individuals. Such loci are termed VNTRs.

Think …
One VNTR in humans is a 17 bp sequence of DNA repeated between 70 and 450 times in the genome. The total number of base pairs at this locus could vary from 1190 (17 x 70) to 7650 (17 x 450).
VNTRs are detected as RFLPs by Southern hybridization.

Minisatellite sequences are used to identify individuals in criminal or paternity cases through the technique of DNA fingerprinting.
[Figure: criminal case. V: victim; D: defendant]

3. Microsatellites.
~ are short (2-6 bp), tandemly repeated sequences that are distributed randomly throughout the genome.
~ generated by polymerase “slippage” during replication.
The most common type is 5’-AC-3’.

Physical mapping
~ The physical map of a genome is a map of genetic markers made by analyzing a genomic DNA sequence directly, rather than analyzing recombination events.
1. Restriction maps
2. Radiation hybrid maps
3. STS maps

Ex 1. NotI recognition sequence (5’-GCGGCCGC-3’)
A NotI site would be expected to occur, by chance, once every 4^8 = 65,536 bp; however, NotI cleaves human DNA on average once every 10 Mbp.
Why? The DNA sequence within the genome is not random!
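The 4^8 figure for NotI can be checked with a short calculation. The following Python sketch is illustrative only: the function name `expected_site_spacing` and the skewed base frequencies (roughly 41% G+C) are assumptions introduced here, not part of the lecture. It shows that a change in base composition alone already makes a GC-only site rarer than the uniform estimate, which is one way a non-random sequence departs from the 65,536 bp expectation.

```python
from typing import Dict, Optional


def expected_site_spacing(site: str,
                          base_freqs: Optional[Dict[str, float]] = None) -> float:
    """Expected distance (bp) between occurrences of `site` in a sequence
    whose bases are drawn independently with the given frequencies."""
    if base_freqs is None:
        # Uniform model: each base has probability 1/4.
        base_freqs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    probability = 1.0
    for base in site:
        probability *= base_freqs[base]
    return 1.0 / probability


NOTI_SITE = "GCGGCCGC"  # NotI recognition sequence from the slide

# Uniform base composition reproduces the slide's 4^8 estimate.
uniform_spacing = expected_site_spacing(NOTI_SITE)

# Assumed, illustrative AT-rich composition (about 41% G+C).
skewed_freqs = {"A": 0.295, "C": 0.205, "G": 0.205, "T": 0.295}
skewed_spacing = expected_site_spacing(NOTI_SITE, skewed_freqs)

print(f"uniform model: one site every {uniform_spacing:,.0f} bp")
print(f"AT-rich model: one site every {skewed_spacing:,.0f} bp")
```

The independent-base model is deliberately crude: it ignores dinucleotide effects, so it should not be expected to reproduce the observed spacing quoted on the slide.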
Restriction mapping does provide highly reliable fragment ordering and distance estimation.

Ex 2. Radiation hybrids
Whole-genome radiation hybrids: RH maps are constructed by typing a panel of hybrids with a set of human DNA markers.
Only a PROPORTION of the pieces of the broken human chromosomes will integrate into rodent chromosomes.

Ex 3. STS maps
~ STSs (sequence-tagged sites) are short DNA sequences (100-200 base pairs) that were generated by PCR using primers based on already known DNA sequences.
~ They have been sequenced and assigned to a chromosomal location, and each defines a unique site on the genome.
* Aligning clones by STS mapping.
STS: used to order inserts from individual human chromosomes in a YAC library.

* Resolution ranges encountered in genome mapping.

The different types of cytological, genetic and physical map of a chromosome.
cM: centimorgan
Mbp: megabase pairs

The sequencing projects are then used to determine the individual base sequence of each clone.

Manual DNA sequencing
DNA sequencing methodologies: ca. 1977!
Maxam-Gilbert:
~ base modification by general and specific chemicals
~ depurination or depyrimidination
~ single-strand excision
~ not amenable to automation
Sanger:
~ DNA replication
~ substitution of substrate with chain-terminator chemical
~ more efficient
~ automation??

DNA sequencing: Maxam & Gilbert sequencing
~ The method is reliable for sequencing up to ~100 nucleotides at a time.
The technique requires that the target DNA is end-labeled (usually radioactively).

Either 4 or 5 separate chemical reactions are performed. The reactions are carried out in two stages:
Stage 1: Specific chemical modification of bases in the DNA.
Stage 2: Chemical cleavage of the sugar-phosphate backbone at the modification site.

DNA sequencing: Sanger’s method (“bio”-based method)
~ dideoxynucleotide
~ based upon the faithful replication of DNA using a DNA polymerase
[Figure label: 5’->3’ direction]

Sanger method
- Can lead to clean and unambiguous assignment of about 300 bases per reaction.
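The chain-termination idea can be illustrated with a toy model. The Python sketch below is an idealized assumption-laden example, not a description of any laboratory protocol: the function names are hypothetical, the primer is ignored, the template is written 3'->5' so the new strand reads 5'->3', and every position is assumed to terminate in at least one molecule. Each ddNTP reaction yields fragments ending at that base, and sorting all fragments by length recovers the sequence.

```python
from typing import Dict, List

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminated_fragment_lengths(template_3_to_5: str) -> Dict[str, List[int]]:
    """For each ddNTP reaction (A, C, G, T), list the lengths of the
    extension products that end with that chain-terminating base."""
    # The polymerase copies the template into its complement, 5'->3'.
    new_strand = "".join(COMPLEMENT[base] for base in template_3_to_5)
    lanes: Dict[str, List[int]] = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: Dict[str, List[int]]) -> str:
    """Read the sequence from shortest to longest fragment, as one would
    read the four lanes of a sequencing gel from bottom to top."""
    base_by_length = {length: base
                      for base, lengths in lanes.items()
                      for length in lengths}
    return "".join(base_by_length[n] for n in sorted(base_by_length))


lanes = terminated_fragment_lengths("TACGGTCA")  # hypothetical template
print(lanes)
print(read_gel(lanes))
```

In this model the read is simply the complement of the template; the point is that fragment length plus lane identity is enough to order the bases.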
* 7 M urea gel
* High power level (70 °C)
Both reduce secondary structure of DNA fragments.

Automated DNA sequencing
~ A set of dideoxynucleotides has been developed that are labelled with fluorescent dyes.
BigDye™ terminator

Sophisticated base-calling software is available to convert the fluorescent patterns obtained into a sequence of DNA bases: faster, and more reliable in sequence interpretation.
~ As many as 1000 bases can be read automatically from a single reaction, although the sequence obtained from within 500 bp of the primer is generally more reliable.

ABI 377 envelope: 96 lanes
Capillary electrophoresis

How to sequence the genome?
How can the original genome sequence be reconstructed from the small fragments that are cloned into individual vectors?
Strategies:
~ Clone contigs
~ Whole genome shotgun
~ Hierarchical shotgun

Clone contigs
~ The simplest way to generate overlapping DNA sequence is to isolate and sequence one clone from a library, then identify (by hybridization) a second clone whose insert overlaps with the first. The second clone is then sequenced and the information used to identify a third clone, whose insert overlaps with the second clone, and so on.

* Contig: (the basis of chromosome walking)
~ a contiguous sequence of DNA created by assembling overlapping sequenced fragments of a chromosome.

Chromosome walking
~ This method is used to move systematically along a chromosome from a known location and to clone overlapping genomic clones that represent progressively longer parts of a particular chromosome.

Whole genome shotgun (WGS)
- was first used to sequence the genome of the bacterium Haemophilus influenzae.
~ The fragments of the genome, which have been randomly generated, are cloned into a vector and each insert is sequenced. The sequences are then examined for overlaps and the genome is reconstructed by assembling the overlapping sequences together.
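The overlap-and-assemble step described above can be sketched as a greedy merge. This Python example is a minimal illustration under strong assumptions: reads are error-free, come from one strand, and the genome has no repeats longer than the minimum overlap. The function names and the example reads are hypothetical, and real assemblers use far more sophisticated methods.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    or 0 if that overlap is shorter than `min_len`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the longest overlap
    until no pair overlaps by at least `min_len` bases."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best_k, best_i, best_j = 0, -1, -1
        # Every sequence is compared with every other sequence.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # remaining sequences are separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical reads sampled from the toy sequence ATGCGTACGTTAGC.
reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(reads))
```

The nested loop makes the all-against-all comparison explicit, which is why the cost of this naive approach grows quickly with the number of reads.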
Gaps are closed by identifying additional clones that contain sequences close to the gap point.
* Advantage:
~ No prior knowledge of the sequence of the genome is required.
* Disadvantages:
~ May be limited by the ability to identify overlapping sequences.
~ Time-consuming (every sequence must be compared with every other sequence in order to identify the overlaps).
~ Repetitive DNA sequences in the genome may lead to the incorrect assignment of contigs.

Hierarchical shotgun -- preferred by the Human Genome Project
In this approach, genomic DNA is cut into pieces of about 150 kb and inserted into BAC vectors, which are transformed into E. coli, where they are replicated and stored. Each BAC insert is fragmented randomly into smaller pieces, and each piece is cloned into a plasmid and sequenced on both strands. These sequences are aligned so that identical sequences overlap.

Two general strategies for sequencing a complete genome.

What was the Human Genome Project?
The Human Genome Project (HGP) was the international, collaborative research program whose goal was the complete mapping and understanding of all the genes of human beings. All our genes together are known as our "genome."
Goals of HGP:
1. Determine the DNA sequence of the entire human genome
2. Store this information in databases
3. Identify all of the genes in human DNA
4. Improve tools for data analysis

Brief review of genomics regarding the HGP
Human Genome Project (HGP)
~ Started in the late 1980s by 20 centers in six nations (coordinated by NIH/USA), led first by Watson and after 1992 by Collins.
~ The completed sequence of the human genome (3 x 10^9 bp) was published in April 2003 (efforts spanning 13 yrs).
~ The entry of Celera Co. (funded in 1997 by Venter) accelerated the process (two years ahead of schedule).
[Photo: James D. Watson]

The Beginning of the Project
Most of the first 10 years of the project were spent improving the technology to sequence and analyze DNA.
Scientists all around the world worked to make detailed maps of our chromosomes and to sequence model organisms, like worm, fruit fly, and mouse.

The Human Genome Project Began in 1990
The Mission of the HGP: the quest to understand the human genome and the role it plays in both health and disease.
“The true payoff from the HGP will be the ability to better diagnose, treat, and prevent disease.”
--- Francis Collins, Director of the HGP and the National Human Genome Research Institute (NHGRI)

The genome is our Genetic Blueprint
Nearly every human cell contains 23 pairs of chromosomes: 1-22 and XY or XX.
XY = Male
XX = Female
The length of chr 1-22, X and Y together is ~3.2 billion bases (about 2 meters diploid).

5000 bases per page
[Slide shows one page of raw human genomic sequence, beginning CACACTTGCATGTGAGAGCTTCTAATATC...]

The Completion of the Human Genome Sequence
• June 2000: White House announcement that the majority of the human genome (80%) had been sequenced (working draft).
• Working draft made available on the web in July 2000 at genome.ucsc.edu.
• Publication of 90 percent of the sequence in the February 2001 issue of the journal Nature.
• Completion of 99.99% of the genome as finished sequence in July 2003.

Fully sequenced genomes are, in fact, not usually complete.
Higher-eukaryotic genomes have large regions of DNA that currently can’t be cloned or assembled.
Ex. Telomeres, centromeres and “heterochromatic” DNA (gaps), which has few genes and many repeated regions.

The Project is not Done…
Imagine the genome as a book written without capitalization or punctuation, without breaks between words, sentences, or paragraphs, and with strings of nonsense letters scattered between and even within sentences. A passage from such a book in English might look like this:
[Example passage shown on slide.]
Even in a familiar language it is difficult to pick out the meaning of the passage: The quick brown fox jumped over the lazy dog. The dog lay quietly dreaming of dinner. And the genome is "written" in a far less familiar language, multiplying the difficulties involved in reading it.

The Project is not Done…
Next there is the annotation: if the sequence is like a topographical map, the annotation would include cities, towns, schools, libraries and coffee shops!
So, where are the genes? How do genes work? And how do scientists use this information for scientific understanding and to benefit us?

Next class, we will learn how to find the genes.

Genome projects use two general approaches:
a. The mapping approach divides the genome into segments with genetic and physical mapping, refines the map of each segment, and finally sequences the DNA. (Genetic and physical maps are made first to provide markers for sequencing.)
b. A “shotgun” approach breaks the genome into random, overlapping fragments, and sequences each fragment. Based on overlaps, the sequences are assembled by computer. An advantage is that physical mapping is not required.

Yeast artificial chromosome (YAC) vectors allow the cloning, within yeast cells, of fragments of foreign genomic DNA that can approach 500 kbp in size. A YAC vector carries:
~ A yeast centromere (CEN4)
~ A yeast autonomously replicating sequence (ARS1)
~ Yeast telomeres (TEL)
~ Genes for YAC selection in yeast
~ A bacterial replication origin and a bacterial selectable marker