Human Genome Project

HGP, Fragment Assembly, and Physical Mapping
蔡懷寬

Human Genome Project (HGP)

Human Genome
• Genome: all the genetic material in the chromosomes of a particular organism. Its size is generally given as its total number of base pairs.
• Genome sizes:
  • Human: 3000 million bases
  • Mouse: 3000 million bases
  • Drosophila (fruit fly): 165 million bases
  • Nematode (roundworm): 100 million bases
  • Yeast (fungus): 14 million bases
  • E. coli (bacteria): 4.67 million bases

Human Genome Project
• Abbreviated as HGP.
• Its main goals are to:
  • identify all the genes in human DNA
  • determine the sequences of the 3 billion chemical bases that make up human DNA
  • store this information in databases
  • develop tools for data analysis
  • transfer related technologies to the private sector
  • address the ethical, legal, and social issues (ELSI) that may arise from the project

Human Genome Project (HGP): Who did the work?
• The work began formally in 1990 and was carried out in 16 centers across the world.
• The project was originally planned to last 15 years, but rapid technological advances have accelerated the expected completion date to 2003.
• The international Human Genome Mapping Consortium includes researchers in France, Germany, Japan, China, Great Britain, Canada and the US.
• Celera Genomics (www.celera.com)

Human Genome Project (HGP): Whose genome was sequenced?
• Celera sequenced the genomes of five anonymous individuals: one African-American, one Asian-Chinese, one Hispanic-Mexican and two Caucasians. One individual's genome was sequenced 3.5 times; about half the genome of each of the remaining individuals was sequenced.
• The HGP study is based on data collected from many individuals from around the world over a longer period of time, so it is more difficult to estimate the exact size of the HGP pool (though it is significantly more than Celera's five).

History and Progress of the HGP (continued)
• February 2001:
  • "Initial sequencing and analysis of the human genome" (Nature, Vol. 409, 15 Feb. 2001, by International Human Genome Sequencing Consortium)
  • "The sequence of the human genome" (Science, Vol. 291, 16 Feb. 2001, by J. C. Venter, et al.)

Publication of the Draft Human Genome Sequence, February 12, 2001
• Science, 16 February 2001, Vol. 291, No. 5507, pages 1145-1434
• Nature, 15 February 2001, Vol. 409, pages 813-960

Strategies for Sequencing the Human Genome
• The HGP's approach has primarily relied on a "map-based approach": sequencing an overlapping series of large chunks of human DNA cloned into bacteria. These sequences, represented by overlapping bacterial clones, are then compiled by computer software using knowledge of each clone's position on the map. In this way, the HGP has read each letter of DNA an average of five times.
• Celera's genome sequencing approach relied on "shotgun sequencing", a method in which small bits of the genome are sequenced and assembled by computers into intermediary "scaffolds" and, ultimately, whole chromosomes and the genome.

How to Sequence a Genome by Mapped Clones (clone & clone & clone)

Sequencing Strategies (1)
• Map-Based Assembly:
  • Create a detailed, complete fragment map
  • Time-consuming and expensive
  • Provides a scaffold for assembly
  • Original strategy of the Human Genome Project

Interactive Presentation: How to Sequence a Genome by Mapped Clones
Introduction
1. Mapping (1 min 16 s)
2. Building Libraries (28 s)
3. Subclones (35 s)
4. E. coli to store and copy DNA (54 s)
5. Preparing DNA for sequencing (16 s)
6. Sequencing Reaction (1 min 44 s)
7. Products of Sequencing Reaction (40 s)
8. Separating the Sequencing Reaction (30 s)
9. Reading the Sequencing Reaction (28 s)
10. Assembling the Results (55 s)
11. Working Draft Sequence
12. Conclusion

How to Sequence a Genome by Shotgun Sequencing

Sequencing Strategies (2)
• Shotgun:
  • Quick and highly redundant: requires 7-9X coverage for sequencing reads of 500-750 bp. For the human genome of 3 billion bp, this means that 21-27 billion bases need to be sequenced to provide adequate fragment overlap.
  • Computationally intensive
  • Troubles with repetitive DNA
  • Original strategy of Celera Genomics

Shotgun Sequencing: Assembly of Random Sequence Fragments
• To sequence a Bacterial Artificial Chromosome (BAC, 100-300 kb), millions of copies are sheared randomly, inserted into plasmids, and then sequenced. If enough fragments are sequenced, it will be possible to reconstruct the BAC based on overlapping fragments.

What Does the Draft Human Genome Sequence Tell Us?
• By the Numbers
• The Wheat from the Chaff
• How It's Arranged
• How the Human Compares with Other Organisms
• Variations and Mutations

By the Numbers
• The human genome contains 3164.7 million chemical nucleotide bases (A, C, T, and G).
• The average gene consists of 3000 bases, but sizes vary greatly, with the largest known human gene being dystrophin at 2.4 million bases.
• The total number of genes is estimated at 30,000 to 35,000, much lower than previous estimates of 80,000 to 140,000 that had been based on extrapolations from gene-rich areas as opposed to a composite of gene-rich and gene-poor areas.
• Almost all (99.9%) nucleotide bases are exactly the same in all people.
• The functions are unknown for over 50% of discovered genes.

The Wheat from the Chaff
• Less than 2% of the genome codes for proteins.
• Repeated sequences that do not code for proteins ("junk DNA") make up at least 50% of the human genome.
• Repetitive sequences are thought to have no direct functions, but they shed light on chromosome structure and dynamics. Over time, these repeats reshape the genome by rearranging it, creating entirely new genes, and modifying and reshuffling existing genes.
• During the past 50 million years, a dramatic decrease seems to have occurred in the rate of accumulation of repeats in the human genome.

How It's Arranged
• The human genome's gene-dense "urban centers" are predominantly composed of the DNA building blocks G and C.
• In contrast, the gene-poor "deserts" are rich in the DNA building blocks A and T. GC- and AT-rich regions usually can be seen through a microscope as light and dark bands on chromosomes.
• Genes appear to be concentrated in random areas along the genome, with vast expanses of noncoding DNA between.
• Stretches of up to 30,000 C and G bases repeating over and over often occur adjacent to gene-rich areas, forming a barrier between the genes and the "junk DNA." These CpG islands are believed to help regulate gene activity.
• Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest (231).

How the Human Compares with Other Organisms
• Unlike the human's seemingly random distribution of gene-rich areas, many other organisms' genomes are more uniform, with genes evenly spaced throughout.
• Humans have on average three times as many kinds of proteins as the fly or worm because of mRNA transcript "alternative splicing" and chemical modifications to the proteins. This process can yield different protein products from the same gene.
• Humans share most of the same protein families with worms, flies, and plants, but the number of gene family members has expanded in humans, especially in proteins involved in development and immunity.
• The human genome has a much greater portion (50%) of repeat sequences than the mustard weed (11%), the worm (7%), and the fly (3%).
• Although humans appear to have stopped accumulating repeated DNA over 50 million years ago, there seems to be no such decline in rodents. This may account for some of the fundamental differences between hominids and rodents, although gene estimates are similar in these species.

Variations and Mutations
• Scientists have identified about 1.4 million locations where single-base DNA differences (SNPs) occur in humans. This information promises to revolutionize the processes of finding chromosomal locations for disease-associated sequences and tracing human history.
• The ratio of germline (sperm or egg cell) mutations is 2:1 in males vs females. Researchers point to several reasons for the higher mutation rate in the male germline, including the greater number of cell divisions required for sperm formation than for eggs.

Other Model Organisms

Organism          Genome size   Completion date   Estimated no. of genes
H. influenzae     1.8 Mb        1995              1,740
S. cerevisiae     12.1 Mb       1996              6,034
C. elegans        97 Mb         1998              19,099
A. thaliana       100 Mb        2000              25,000
D. melanogaster   180 Mb        2000              13,061
M. musculus       3000 Mb
H. sapiens        3000 Mb                         35,000-45,000

ELSI: Ethical, Legal, and Social Implications of Human Genetics Research
• Discrimination in insurance and employment based on genetic information
• When and how new genetic tests should be integrated into mainstream health care services
• Informed consent in genetic research protocols
• Public and professional education about genetics research and bioethics

The Next Step: Functional Genomics
• Transcriptomics (microarray)
• Proteomics
• Structural genomics
• Knockout studies
• Comparative genomics

• Transcriptomics (microarray) involves large-scale analysis of messenger RNAs transcribed from active genes to follow when, where, and under what conditions genes are expressed.
• Proteomics, the study of protein expression and function, can bring researchers closer to what's actually happening in the cell than gene-expression studies. This capability has applications to drug design.
• Structural genomics initiatives are being launched worldwide to generate the 3D structures of one or more proteins from each protein family, thus offering clues to function and biological targets for drug design.
• Experimental methods for understanding the function of DNA sequences and the proteins they encode include knockout studies to inactivate genes in living organisms and monitor any changes that could reveal their functions.
• Comparative genomics, the side-by-side analysis of DNA sequence patterns of humans and well-studied model organisms, has become one of the most powerful strategies for identifying human genes and interpreting their function.
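
The shotgun coverage figures quoted under Sequencing Strategies (2), 7-9X coverage of a 3 billion bp genome giving 21-27 billion bases, can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names `bases_needed` and `reads_needed` are made up for this example, and it assumes every read has the same fixed length, which real sequencing reads do not.

```python
GENOME_SIZE_BP = 3_000_000_000  # human genome size quoted in the slides


def bases_needed(genome_size_bp: int, coverage: int) -> int:
    """Total bases that must be sequenced to reach the given fold coverage."""
    return genome_size_bp * coverage


def reads_needed(genome_size_bp: int, coverage: int, read_length_bp: int) -> int:
    """Number of fixed-length reads needed for the coverage, rounded up.

    Assumes all reads have exactly read_length_bp bases (a simplification).
    """
    total = bases_needed(genome_size_bp, coverage)
    return -(-total // read_length_bp)  # integer ceiling division


if __name__ == "__main__":
    for coverage in (7, 9):
        total = bases_needed(GENOME_SIZE_BP, coverage)
        print(f"{coverage}X coverage: {total:,} bases")
    # Fewest reads: lowest coverage with the longest reads; most: the opposite.
    print(f"reads (7X, 750 bp): {reads_needed(GENOME_SIZE_BP, 7, 750):,}")
    print(f"reads (9X, 500 bp): {reads_needed(GENOME_SIZE_BP, 9, 500):,}")
```

Under these assumptions the read count falls between 28 million and 54 million, which gives a sense of why the slides describe the shotgun approach as computationally intensive.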

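The idea behind the Shotgun Sequencing slide, reconstructing a sequence from randomly sheared fragments by their overlaps, can be illustrated with a toy greedy assembler. This is a teaching sketch under strong assumptions (error-free fragments, a single strand, no repeats), not the method used by Celera or the HGP; the names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical choices for this example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 if no such overlap of at least min_len bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns the remaining contigs; a single contig means everything merged.
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments that are fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: the contigs stay separate
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    reads = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]
    print(greedy_assemble(reads))
```

Greedy merging of this kind is easily misled when the same sequence occurs in more than one place, because fragments from different copies overlap equally well. That is one way to see the "troubles with repetitive DNA" listed on the slide.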