HUMAN GENOME PROJECT 101
Source: Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001

Human Genome Project
Begun in 1990, the U.S. Human Genome Project is a 13-year effort coordinated by the U.S. Department of Energy and the National Institutes of Health. The project was originally planned to last 15 years, but effective resource use and technological advances have accelerated the expected completion date to 2003.
HGP goals are to:
■ identify all the approximately 35,000* genes in human DNA,
■ determine the sequences of the 3 billion chemical base pairs that make up human DNA,
■ store this information in databases,
■ improve tools for data analysis,
■ transfer related technologies to the private sector, and
■ address the ethical, legal, and social issues (ELSI) that may arise from the project.

Human Genome Data
• Derived from the Human Genome Project
• Sequence freeze date in anticipation of data release: 22 July 2000
• Release of First Draft Sequence of Human Genome: Nature 409 (6822), 15 February 2001; Science 291 (5507), 16 February 2001
• Release of “Complete” Draft Sequence of Human Genome: April 2003

Fine Structure of Human Genomic DNA
[Figure labels: GENE (exons, introns); Intragenic region; GENE; tandem repeats; interspersed repeats]
ACGTTGTGTCGCTGATTAGCTAGACCAAGATAGTTCG
CTATAGGCTATAGCGATATAACCCAGGGGGGATATAT
TAGGAGGAGAGATATAGGATAGATTACATGTGATATA
TAGGAGAGAGAATATATAAGAGAGAGAGAGATTTTTT
CTCCTGGTAAAAAGCTCGCTTAGGATTGCGCTAGATG

The Human Genome
3.2 billion nucleotides
How many genes? < 40,000 vs. > 100,000
[Cartoon dialogue: "But think of all our traits, Jim-bo!" "Ours?! Are you of my species? Get lost, punk!" "Ouch!"]

The Human Genome
ACGTTGTGTCGCTGATTAGCTAGACCAAGATAG
TTCGCTATAGGCTATAGCGATATAACCCAGGGG
GGATACGCWHENISAGENEAGENETATTAGGAG
GAGAGATATAGGATAGATTACATGTGATATATA
GGAGAGAGAATATATAAGAGAGAGAGAGATTTT
TTCTCCTGGTAAAAAGCTCGCTTAGGATTGCGC
Experimental Discovery (Genetics)
Comparative Genomics (Alignment)
Gene Prediction

Alignment
CTCGCTGACTCAATCGGATTATGCTAGTCG
TCATATGACTCAATCGGATTATGCTAGTCG
TGACTCAATCGGATTATGCTAGTCG
ATAGCCTAATAGCTGACTCAATCGGATTATGCTAGTCG
ATTTTTTTGACTCAATCGGATTA
CGGGGTGACTCAATCGGA
GCCCCCCCCCCCCTGAGTCAGGGGGGCTCGCTGCTGTGCTG
AAAAATATATTGACTCAATCGGATTATGCTAGTCG
GTCGTAGCTTGACTCAATCGGATTATGCTAGTCG
[Later frames of this slide repeat the same reads, with the last read shown as GTCGTAGCTTGACGGAATCGGATTATGCTAGTCG.]

Gene Prediction
[Slide graphic: a full screen of raw genomic DNA sequence]
[Figure labels: GENE (exons, introns); Intragenic region; GENE; tandem repeats; interspersed repeats]

Gene Prediction
Algorithms based on consensus nucleotide sequences of
• TATA boxes and start codons
• stop codons
• splice junctions
• CpG islands

Comparative Gross Results from Model Genome Projects
[Cartoon dialogue: "Humans have about 35,000 genes!" "You were right. So what’s new!"]

Human Genes
Surprising Findings = !!
!! Only 35,000 genes
• most genes in euchromatin
• GC/AT patchiness
!! Gene density higher & intron size smaller in GC-rich patches
!! 1.4% translated, 28% transcribed
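
The consensus signals listed on the Gene Prediction slide can be illustrated with a toy scanner. This is only a minimal sketch, not the method used by any real gene finder: the function names (`find_orfs`, `cpg_obs_exp`, `tata_box_positions`) are hypothetical, the TATA pattern `TATA[AT]A` is a simplified assumption, only the forward strand is scanned, and splice junctions are left out entirely.

```python
import re

START_CODON = "ATG"                  # standard start codon
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons


def find_orfs(seq, min_codons=3):
    """Return (start, end) index pairs of forward-strand open reading frames.

    An ORF here is the first ATG in a reading frame up to and including
    the next in-frame stop codon. min_codons counts the stop codon too.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio, a common statistic for CpG islands."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def tata_box_positions(seq):
    """Start indices of matches to the simplified pattern TATA[AT]A (assumed)."""
    return [m.start() for m in re.finditer(r"(?=TATA[AT]A)", seq.upper())]
```

For example, `find_orfs("CCATGAAATAGGG")` reports the single frame that runs from the ATG at index 2 through the TAG stop codon. Real predictors combine many such signals statistically rather than applying each rule in isolation.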

Origins of genes

Some Origins of Human Genes
• Most from distant evolutionary past (basic metabolism, transcription, translation, replication fixed since appearance of bacteria and yeast)
• Only 94/1278 families vertebrate-specific
• 740 are nonprotein-encoding RNA genes
• many derive from partial genomes of viruses and virus-like elements: genomic fossils
• some acquired directly from bacteria (rather than by evolution from bacteria)

Genomic Fossils
Genomic Fossils (also known as Molecular Fossils)
• interspersed repeats
• generated by integration of transposable elements or retrotransposable RNAs
• active contemporary modifier of some vertebrate genomes (mouse)
• formerly active modifier of human genome
• some as prevalent as 1.5 million copies

Alu Elements
Type of Short Interspersed Nuclear Element (SINE)
[Figure labels: direct repeats; 5’; 31 bp; A; Alu; B; A/T-rich region and 3’-UTR; 3’; AAAn; RNA polymerase III Promoter; A-rich region]
• 50-300 bp
• transcribed by RNA polymerase III
• 3’ oligo dA-rich tail
• found only in primates
• 1,500,000 copies
• derived from 7SL RNA gene
• dimer-like structure
• most retroposition occurred 40 mya

Reverse Transcription
Essential for Retroposition and Proliferation of Retroelements
• Converts primed RNAs into cDNAs
• catalyzed by RNA-dependent DNA pol (reverse transcriptase)
• pol encoded by retroviruses and active LINEs
[Figure labels: Retroviral genomic RNA; Alu RNA; LINE RNA]

Alu Elements as Genomic Fossils
Alu Subfamily Structure (millions of years)
Oldest [J]: Jo, Jb (65)
Intermediate [S], 450,000 copies: S (50), Sx, Sq, Sp, Sc, Sg
Youngest [Y], 50,000 copies: Y (25), Ya5, Ya8, Yb8

Alu Subfamily Structure
PS [J]: Primate-Specific. Abundant in all primates. 65-70 mya: Early Prosimian (strepsirhini)

Alu Subfamily Structure
AS [S]: Anthropoid-Specific (haplorhini). 50-60 mya. One mutation different from PS.

Alu Subfamily Structure
CS [S]: Catarrhine-specific.
Nine mutations arising 30-40 mya: Platyrrhines (FN) (Marmoset); Catarrhine (DFN) (Macaque)

Alu Subfamily Structure
HS [Y]: Human-specific. Five or more additional mutations. 20-25 mya: almost exclusively Hominids

Master Gene Model of Retroposition
P. Deininger, M. Batzer, Trends in Genetics 8:307, 1992
1. Amplification
2. Master mutation
[Figure: 5’ to 3’ element diagrams along a TIME (m.y.) axis]

Alus as Genomic Fossils
Alu Subfamily Structure (millions of years)
Oldest [J]: Jo, Jb (65)
Intermediate [S], 450,000 copies: S (50), Sx, Sq, Sp, Sc, Sg
Youngest [Y], 50,000 copies: Y (25), Ya5, Ya8, Yb8

ALU INSERTIONS AND DISEASE
LOCUS | DISTRIBUTION | SUBFAMILY | DISEASE | REFERENCE
BRCA2 | de novo | Y | Breast cancer | Miki et al, 1996
Mlvi-2 | de novo (somatic?) | Ya5 | Associated with leukemia | Economou-Pachnis and Tsichlis, 1985
NF1 | de novo | Ya5 | Neurofibromatosis | Wallace et al, 1991
APC | Familial | Yb8 | Hereditary desmoid disease | Halling et al, 1997
PROGINS | about 50% | Ya5 | Linked with ovarian carcinoma | Rowe et al, 1995
Btk | Familial | Y | X-linked agammaglobulinaemia | Lester et al, 1997
IL2RG | Familial | Ya5 | XSCID | Lester et al, 1997
Cholinesterase | one Japanese family | Yb8 | Cholinesterase deficiency | Muratani et al, 1991
CaR | familial | Ya4 | Hypocalciuric hypercalcemia and neonatal severe hyperparathyroidism | Janicic et al, 1995
C1 inhibitor | de novo | Y | Complement deficiency | Stoppa Lyonnet et al, 1990
ACE | about 50% | Ya5 | Linked with protection from heart disease | Cambien et al, 1992
Factor IX | a grandparent | Ya5 | Hemophilia | Vidaud et al, 1993
2 x FGFR2 | De novo | Ya5 | Apert’s Syndrome | Oldridge et al, 1997
GK | ? | Sx | Glycerol kinase deficiency | McCabe et al, (personal comm.)

What’s New About Old Fossils?
In the Human Genome
• Comprise nearly 50% of genome
• 50% more Alu elements than were predicted by molecular biology
• scarce in highly-regulated regions (detrimental?)
• enriched in GC regions (beneficial?)
• little activity, but little scouring
• occur frequently within exons
• contribute to formation of genes encoding novel proteins

The Human Genome
FEATURES
3.2 billion bases
28% transcribed
<1.4% encodes protein
50% repeats
not many modern protein families
Only ~35,000 genes!

[Cartoon dialogue: "Humans have about 35,000 genes!" "Well, then… How can you explain human complexity?"]
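
The percentages on the FEATURES slide can be turned into absolute base counts with simple arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, the figures are the rounded values quoted on the slide, and the "<1.4%" protein-coding figure is treated as an upper bound. Fractions are stored per mille as integers so the results are exact.

```python
# Rounded figures quoted on the FEATURES slide.
GENOME_BASES = 3_200_000_000   # 3.2 billion bases
GENE_COUNT = 35_000            # ~35,000 genes

# Fractions in parts per thousand (integers keep the arithmetic exact).
PER_MILLE = {
    "transcribed": 280,      # 28% transcribed
    "protein_coding": 14,    # <1.4% encodes protein (upper bound)
    "repeats": 500,          # 50% repeats
}


def feature_bases(total=GENOME_BASES):
    """Absolute number of bases implied by each fraction."""
    return {name: total * pm // 1000 for name, pm in PER_MILLE.items()}


def coding_bases_per_gene(total=GENOME_BASES, genes=GENE_COUNT):
    """Upper bound on average protein-coding bases per gene."""
    return feature_bases(total)["protein_coding"] // genes
```

Under these assumptions, at most about 44.8 million bases encode protein, which averages out to no more than roughly 1,280 coding bases per gene, while about 1.6 billion bases are repeats.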