CHAPTER 2
Human Genome Project and its Ethical Issues

2.1. INTRODUCTION

The Human Genome Project (HGP) was a 13-year effort that formally began in October 1990. The idea for the project first arose from discussions held during scientific meetings organized by the US Department of Energy and other scientific organizations between 1984 and 1986. The project was planned to span 15 years, but rapid technological advances brought about its completion within 13 years, i.e., in 2003. Funds of three billion US dollars were earmarked for sequencing the more than two meters of human DNA. The goal of the project was to determine the complete sequence of the three billion (3 × 10^9) DNA subunits (bases), identify all human genes, and make them accessible for further biological study. As a part of the HGP, parallel sequencing was done for selected model organisms, such as the bacterium E. coli, to help develop the technology and interpret human gene function.

The Department of Energy’s ‘Human Genome Program (HGP)’ and the National Institutes of Health’s ‘National Human Genome Research Institute (NHGRI)’ together sponsored the US Human Genome Project. Ari Patrinos, head of the Office of Biological and Environmental Research, directed the Department of Energy’s ‘Human Genome Program’ research. Francis Collins directed the efforts of the National Institutes of Health’s National Human Genome Research Institute.

The Corporate Genome Project was initiated rather late by Celera Genomics, a company founded by a former NIH scientist, Craig Venter, and funded by Perkin-Elmer, a large instrumentation manufacturer that makes and sells instruments to the government as well as to the private sector. The Human Genome Project (HGP) and the Corporate Genome Project (CGP) are two distinctly different entities with different cultures and attitudes. The contrast between them has centered on the privacy issue.
Celera wanted to retain some of the information that it was going to publish about the human genome, or control over that information, because its business model depended on doing so. From the inception of this project, due to the huge budget for sequencing human DNA, many laboratories around the United States received

determine, among other things, how the organism looks, how well its body metabolizes food or fights infection, and sometimes even how it behaves.

The human genome is made up of DNA, which has four different chemical building blocks. These four similar chemicals (called bases and abbreviated A, T, C and G) are repeated millions or billions of times throughout a genome. The human genome, for example, has 3 billion pairs of bases. In DNA, the particular order of As, Ts, Cs and Gs is extremely important. The order underlies life’s diversity, even dictating whether an organism is human or another species, such as yeast, rice, or fruit fly, all of which have their own genomes and are themselves the focus of genome projects. Since all organisms are related through similarities in DNA sequences, insights gained from nonhuman genomes often lead to new knowledge about human biology.

To get an idea of the size of the human genome present in each of our cells, consider the following analogy: if the DNA sequence of the human genome were compiled in books, the equivalent of 200 volumes the size of a telephone book (at 1,000 pages each) would be needed to hold it all (Fig. 2.1). Storing all this information is a great challenge for computer experts known as bioinformatics specialists. One million bases (called a megabase and abbreviated Mb) of DNA sequence data is roughly equivalent to 1 megabyte of computer data storage space. Since the human genome is 3 billion base pairs long, 3 gigabytes of computer data storage space is needed to store the entire genome.
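The storage estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original chapter; the function name `sequence_storage_bytes` is hypothetical, and the 2-bit packing shown as a second case is an added assumption (four letters can be encoded in two bits each), not a figure taken from the text.

```python
def sequence_storage_bytes(num_bases: int, bytes_per_base: float = 1.0) -> float:
    """Estimate raw storage for a sequence, ignoring annotations and file overhead.

    The default of 1 byte per base mirrors the chapter's rule of thumb
    that 1 Mb of sequence is roughly 1 megabyte of storage.
    """
    return num_bases * bytes_per_base


HUMAN_GENOME_BASES = 3_000_000_000  # 3 billion base pairs, as stated in the text

# One text character (1 byte) per base, as the chapter assumes.
as_text = sequence_storage_bytes(HUMAN_GENOME_BASES)

# Assumption beyond the text: pack each of the four letters into 2 bits (0.25 byte).
packed = sequence_storage_bytes(HUMAN_GENOME_BASES, bytes_per_base=0.25)

print(f"1 byte per base : {as_text / 1e9:.2f} GB")
print(f"2 bits per base : {packed / 1e6:.0f} MB")
```

Under the one-byte assumption the result matches the 3 gigabytes quoted above; a packed encoding would need less space, which is one reason the raw sequence is small compared with its annotations.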
This 3-gigabyte figure includes only nucleotide sequence data, and does not include data annotations and other information that can be associated with sequence data. As time goes on, more annotations will be entered as a result of laboratory findings, literature searches, data analyses, personal communications, automated data-analysis programs, and auto annotators. These annotations associated with the sequence data are likely to dwarf the amount of storage space actually taken up by the initial 3 billion nucleotide sequence. Of course, that is not much of a surprise, because the sequence is merely a starting point for a much deeper biological understanding.

Fig. 2.1. Compiling the DNA sequence from the human genome into books would require 200 volumes, each the size of the 1,000 page Bangalore telephone book.

Human beings are also similar to other living organisms in their basic cell characteristics.

Cells: These are the fundamental working units of every living system. All the instructions

Fig. 2.2. Human genome and nature of DNA.

blocks, G and C. In contrast, the gene-poor “deserts” are rich in the DNA building blocks, A and T. GC- and AT-rich regions can usually be seen through a microscope as light and dark bands on chromosomes. Genes appear to be concentrated in random areas along the genome, with vast expanses of noncoding DNA in between. Stretches of up to 30,000 C and G bases repeating over and over often occur adjacent to gene-rich areas, forming a barrier between the genes and the “junk DNA”. These CpG islands are believed to help regulate gene activity. Chromosome 1 has the most genes (2,968), and the Y chromosome has the fewest (231). Scientists have identified about 1.4 million locations where single-base DNA differences, i.e., single nucleotide polymorphisms (SNPs), occur in humans.
This information promises to revolutionize the processes of finding chromosomal locations for disease-associated sequences and tracing human history.

2.4. GENOME SEQUENCING

Genome sequencing is the term used to describe the laboratory process of reading the order of the four letters of the genetic alphabet (A, C, G, T) along a strand of DNA. The steps involved in such efforts are as follows:

1. Selection of suitable sample materials.
2. Isolation of DNA from the cells, and preparation of large samples of high-quality DNA from these cells.
3. Cutting the purified DNA at random sites into overlapping pieces of manageable size.
4. Insertion of these DNA pieces into packages for the production of unlimited copies of the selected DNA.
5. Recording the order of bases for each DNA piece by using DNA sequencing techniques.
6. Determination of the overlap between pieces, and assembly of the sequences to give the final human genome.

Bioethics and Biosafety

While following the above approaches, it is necessary to select appropriate sample populations based on the distribution of humans.

A primary goal of the Human Genome Project is to generate detailed maps of the human genome. These maps will aid in determining the location of genes within the human genome. More specifically, they will assign genes to their chromosomes. Two types of maps are being generated: genetic linkage maps and physical maps. Genetic linkage maps determine the relative arrangement of, and approximate distances between, genes and markers on the chromosomes. Physical maps specify the physical location (in base pairs) of, and distance between, genes or DNA fragments with unknown functions that are mapped to specific regions of the chromosomes. Maps have different levels of resolution, ranging from low to high.
The degree of resolution that is appropriate depends on whether, for example, a large fragment of DNA is to be studied or a more detailed picture of a small DNA region is needed. A human genomic library consists of random DNA fragments, and is used to establish sets of ordered, overlapping cloned DNA fragments, or contigs, for each chromosome of the genome. In other words, these are high-resolution maps. After mapping is complete, the DNA must be sequenced to determine the order of all the nucleotide bases of the chromosomes, and the genes in the DNA sequence must be identified.

In all phases of the project, a major focus has been on developing instrumentation to increase the speed of data collection and analysis. New, automated technologies are significantly increasing the speed and accuracy of DNA sequencing, while decreasing the cost. Software and database systems manage the data generated from mapping and sequencing projects. Database management systems store genomic information and aid in distributing it (Fig. 2.3).

Genetic linkage maps show the order of, and genetic distance between, pairs of linked genes, that is, genes on the same chromosome that determine variable phenotypic traits (the difference between genetic distance and physical distance is explained below). Genetic linkage maps enable geneticists to follow the inheritance of specific traits (that is, genes) as they are passed from generation to generation within families. Linkage maps also determine the arrangement of genes or markers with unknown functions on the chromosomes. They show the order of linked genes and the pairwise distances between their loci.

During meiosis, as the haploid egg and sperm cells form, homologous chromosomes (maternally and paternally derived) line up, and DNA segments can be exchanged between the homologs. New combinations of alleles result from this process of homologous recombination.
During meiosis, each human chromosome pair is involved, on average, in 1.5 crossover events. The likelihood of crossing over increases as the distance between the two loci increases. Crossing over between two genes or markers on the same chromosome can occur if there is enough distance between them. If two genes are very close, they are “linked” and recombination is unlikely to occur between them. Thus, the frequency of recombination is a quantitative index of the linear distance between two genes on a genetic linkage map. Distances are measured in centimorgans (cM), named after the famous geneticist Thomas Hunt Morgan. If genes (for example, A and B) are separated by recombination 1% of the time, that is, if one out of 100 products of meiosis is recombinant, they are 1 cM apart. A genetic distance of 1 cM represents a physical distance of approximately one million base pairs (1 Mb).

Genetic maps are very powerful. An inherited disease gene can be located on the map if a second gene or DNA reference marker is also inherited in individuals with the disease, but is not found in individuals who do not have that disease. Exact chromosomal locations have already been found for many disease genes, including those for fragile X syndrome, cystic fibrosis and Huntington’s disease.

Fig. 2.3. Process of determination of DNA sequence from human chromosome.

Many inherited diseases are caused by single genes, and thus can be studied by genetic linkage analysis. Almost 5,000 genetic disorders have been studied in this way. These maps, however, do not relate directly to the physical structure of DNA, and the gene of interest cannot be isolated on the basis of information from genetic linkage maps or human genome mapping alone. Linkage analysis involves the study of family members carrying a particular trait for an inherited disorder.
Often, several generations of one family are studied to obtain enough information with which to infer linkage. Some family members must express the trait (gene) or genetic disorder, and the trait must vary among individuals (that is, there must be different alleles or forms of the gene). Analysis also requires that there be individuals who are heterozygous for DNA reference markers or who have a second gene linked to the gene in question. Heterozygous family members (members carrying two different forms of the trait or gene, one dominant and one recessive allele) enable geneticists to determine which chromosome of the homologous pair carries the allele for the genetic disorder, and whether it is passed on to the offspring. The physical location of the DNA marker on a chromosome can then be found by using the marker sequence as a DNA probe.

Polymorphic DNA markers serve as reference points or landmarks to help find a region of DNA that contains the gene of interest. If a gene is found between two DNA markers, the DNA region can be isolated for further study. An early goal of the investigators of the Human Genome Project was to generate linkage maps with polymorphic DNA markers spaced 2 to 5 cM apart along each chromosome. This goal was reached in 1995. Such a map helps scientists to find genes of interest relative to about 1,500 markers within the genome. Once linkage maps have some 3,300 polymorphic DNA markers, each separated by only 1 cM, gene hunting will be much easier. Thus, for polymorphic DNA markers to be valuable, their linkage with a gene must be established, and their physical locations must be identified through the use of probes. Several large scientific groups working on the human genome are identifying markers to generate comprehensive genetic maps.

2.5. PHYSICAL MAPS

Physical maps specify the exact physical location (in base pairs) of, and distance between, genes, markers, or uncharacterized DNA.
These maps provide information about the physical organization of the DNA; examples are the location of restriction enzyme sites and the order of restriction fragments of chromosomes. An entire genome can be studied using a library of genomic DNA. The clones in such a library are uncharacterized, random fragments and are not placed in order, as they would be on the chromosome.

As the human genome is very large, large DNA fragments must be cloned into vectors to maintain a manageable number of clones in the library. Yeast artificial chromosomes (YACs) are being used as cloning vectors for the human genome, since the inserted DNA can be up to one million base pairs in length. Human DNA is attached to the yeast DNA and transferred into yeast host cells for replication. Only a small portion of the yeast’s total DNA, i.e., the origin of replication, telomere, and centromere, is required for replication, so most of the YAC DNA is the foreign DNA. The average insert used in YAC libraries is 200,000 to 400,000 base pairs in length. This range is 10 times larger than the inserts used in other libraries, such as those for bacteriophage and cosmids, where up to 20,000 and 40,000 base pairs, respectively, can be cloned. The human genome can be represented by 7,500 YAC clones, and is maintained and amplified in yeast host cells.

YACs and their inserts are cut into smaller fragments and recloned or subcloned (for example, into cosmids), so that a detailed map of a YAC clone is obtained. YAC clones are screened by PCR to isolate specific genes of interest. DNA inserts are also analyzed by obtaining restriction maps, identifying polymorphic markers, and/or DNA sequencing. However, without an ordered physical map, i.e., one that refers to actual physical distances in base pairs between landmarks, the location of particular clones cannot be identified.
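The figure of 7,500 YAC clones quoted above appears consistent with dividing the 3-billion-base-pair genome by the upper insert size of 400,000 base pairs. The following Python sketch reproduces that arithmetic and compares it with a cosmid library. The function name `clones_for_one_coverage` is hypothetical, and the calculation assumes perfectly tiled, non-overlapping inserts, so it gives only a lower bound; a real library needs several-fold redundancy to obtain overlapping clones.

```python
import math


def clones_for_one_coverage(genome_bp: int, insert_bp: int) -> int:
    """Minimum number of clones needed to span a genome exactly once.

    Assumes ideal, end-to-end tiling with no overlap between inserts,
    which is a simplification made for illustration only.
    """
    return math.ceil(genome_bp / insert_bp)


GENOME_BP = 3_000_000_000  # human genome size used in the chapter

yac_clones = clones_for_one_coverage(GENOME_BP, 400_000)     # YAC insert, upper end
cosmid_clones = clones_for_one_coverage(GENOME_BP, 40_000)   # cosmid insert

print(f"YAC clones    : {yac_clones:,}")
print(f"Cosmid clones : {cosmid_clones:,}")
```

The tenfold difference in insert size translates directly into a tenfold difference in the number of clones that must be stored, screened, and ordered, which is the practical argument for large-insert vectors.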
Another method, fluorescence in situ hybridization (FISH) of probes to metaphase chromosomes, provides information for constructing low-resolution chromosomal maps. Chromosomal maps are actual physical maps because distances are measured in base pairs. Metaphase chromosomes are spread out on a microscope slide, and a solution containing a fluorescent-tagged DNA probe is added. Under the appropriate conditions, the probe hybridizes to its DNA complement on the chromosome and is detected with a fluorescence microscope (Figs. 2.4 & 2.5). The relative orientation of genes and DNA fragments can be assigned to specific chromosomes, and the gaps between mapped cosmids can be bridged. Chromosomal mapping is used to locate genetic markers that are associated with observable traits.

Fig. 2.4. A microscopic preparation (metaphase squash before a karyotype is made) of human chromosomes showing the differences in size and banding patterns of the chromosomes.

Fig. 2.5. Fluorescence of chromosome position by probes in fluorescence in situ hybridization.

Another type of physical map is the cDNA map, which localizes coding regions (for example, exons) to specific chromosome regions or bands. The cDNA molecules are synthesized from an mRNA template. The cDNA map is probably one of the most important types of map, since it can identify the chromosomal location of specific genes, whether their functions are known or not. Researchers searching for a specific disease-causing gene can use cDNA maps to help locate it after having established a general location by genetic linkage methods.

High-resolution physical maps can be generated by a method that is sometimes called bottom-up mapping. The chromosome is cut into small overlapping fragments, each of which is cloned, and their order is determined. These fragments form continuous DNA blocks called contigs. The bottom-up method generates a detailed map called a ‘contig’ map.
A library of clones ranging from 10,000 base pairs to 1 Mb is used for mapping. Each clone can be localized to specific regions within chromosomal bands. This “linked” library of overlapping clones comprises a chromosomal segment.

The production of human contig maps requires several steps. First, a library must be made that represents the human genome (either the entire genome or a segment) in cloned DNA fragments. The DNA fragments within each clone must overlap other fragments. Overlap is accomplished by cutting the DNA with a specific restriction enzyme. If every restriction site on the DNA were cut, fragments would not overlap. Therefore, enzyme digestion is conducted in such a way that only a fraction of the restriction sites are cut. This partial digestion randomly leaves many sites uncut, so that overlapping DNA fragments are produced and the order along the chromosomes can be determined.

The order of the clones or contigs can be determined by identifying the overlaps in the DNA fragments. Overlap can be detected when some of the DNA bands are the same, i.e., two clones have bands in common. This method of assembling pairs of clones into contigs is difficult and time-consuming. Automation and sophisticated computer algorithms may increase the efficiency.

Different approaches may be used to fill in the gaps that are likely to be present even after researchers generate detailed physical maps. One example is microdissection, which is used to physically cut a piece of DNA from a specific region of a chromosome. This chromosomal piece can be cut into smaller fragments by restriction enzymes, cloned, mapped, and sequenced by standard methods. An alternative method is “chromosome walking”, in which a small region at the end of a DNA fragment is used as a probe to screen the library for the adjacent clone. A DNA piece at the end of this second cloned fragment is used as the next probe.
This process continues until a complete physical map has been obtained.

Since the human genome is divided into chromosomes, chromosome-specific libraries can be constructed so that each chromosome has a contig map. Mapping is simplified if each chromosome is separated from the others before being cut by restriction enzymes and cloned to make libraries. Twenty-four libraries are required: 22 autosomal libraries, and one each for the X and Y chromosomes.

The several types of maps range from coarse to fine resolution. The map with the lowest resolution is the genetic map, which measures the frequency of recombination between linked markers (which can be genes or noncoding DNA). The next level of resolution is the restriction map, on which DNA restriction fragments ranging from 1 to 2 Mb are separated and mapped. The next higher level of resolution is achieved by placing in order 400,000 to 1,000,000 base pair fragments of overlapping clones from libraries of YAC clones. These clones are then further subcloned (with insert sizes of 20,000 to 40,000 base pairs) into other vectors to produce contig maps. Finally, the DNA base sequence map, which has the finest resolution, is determined.

Sequence-Tagged Sites

The Human Genome Project requires that the genome information collected by the various sequencing approaches be shared. A major problem is that investigators from different laboratories use a variety of methods for generating and mapping DNA fragments, thus making correlations difficult when data from different laboratories are compared. To solve this problem, a universal reference system has been developed. Unique regions of 200 to 500 base pairs of partially sequenced DNA are used to identify clones, contigs, and long stretches of DNA. These sequence-tagged sites (STSs) are standard markers that are used for physical mapping. An STS can also be a region of cDNA, i.e., an expressed sequence, called an expressed-sequence tag (EST).
ESTs are used to represent landmarks along the map, thus helping to identify the regions where pairs of clones overlap. These special sequences constitute a “universal mapping language”, enabling everyone to refer to a specific region of the genome by the same name, and enabling investigators to share information and

humans, especially in proteins involved in development and immunity. The human genome has a much greater portion (50%) of repeat sequences than the mustard weed (11%), the worm (7%) and the fly (3%). Although humans appear to have stopped accumulating repeated DNA over 50 million years ago, there seems to be no such decline in rodents. This may account for some of the fundamental differences between hominids and rodents, though gene estimates are similar in these species. Scientists have proposed many theories to explain evolutionary contrasts between humans and other organisms, including those of life span, litter sizes, inbreeding, and genetic drift.

Variations and Mutations

US Human Genome Project Research Goals

The completion of the human DNA sequence in the spring of 2003 coincided with the 50th anniversary of Watson and Crick’s description of the fundamental structure of DNA. The analytical power arising from the reference DNA sequences of entire genomes and other genomics resources has jump-started what some call the “biology century”. The Human Genome Project was marked by accelerated progress. In June 2000, the rough draft of the human genome was completed a year ahead of schedule. In February 2001, special issues of Science and Nature published the working draft sequence and analyses. The project’s first 5-year plan, intended to guide research in financial years 1990-1995, was revised in 1993 due to unexpected progress, and the second plan outlined goals through FY 1998. The third and final plan (Science, 23 October 1998) was developed during a series of DOE and NIH workshops.
Some 18 countries have participated in the worldwide effort, with significant contributions from the Sanger Center in the United Kingdom, and research centers in Germany, France and Japan.

Difference Between Draft Sequence and Finished Sequence

To generate the high-quality reference sequence, completed in April 2003, additional sequencing was done to close the gaps and reduce the ambiguities. Further, only a single error was allowed for every 10,000 bases, the agreed-upon standard for the HGP. Investigators believe that a high-quality sequence is critical for recognizing regulatory components of genes that are very important in understanding human biology and disorders such as heart disease, cancer, and diabetes.

The genomes that have been sequenced completely are shown in Table 2.2. The small genomes of several viruses and bacteria, and the much larger genomes of three higher organisms, have been completely sequenced. These organisms are bakers’ or brewers’ yeast (Saccharomyces cerevisiae), the roundworm (Caenorhabditis elegans) and the fruit fly (Drosophila melanogaster). Scientists finished the first genetic sequence of a plant, the weed Arabidopsis thaliana, in December 2000, and the draft sequence of the pufferfish Fugu rubripes, the first vertebrate after the human, was completed in October 2001. Many more genomes have been sequenced since then.

The Human Genome Project, also called the Human Genome Initiative, is a scientific research effort to analyze the DNA of humans and of several lower organisms. The project began in the United States in 1990 under the sponsorship of the US Department of Energy and the National Institutes of Health. Projects undertaken concurrently in Japan, the United Kingdom, Italy, France, and Russia are coordinated with the American effort through the Human Genome Organization.
The ultimate goal of the project is to identify the chromosomal location of every human gene, and to determine the precise chemical structure of each gene in order to elucidate its function in health and disease. The information gathered is expected to serve as the basic reference for research in human biology and medicine in the 21st century, and to provide fundamental insights into the genetic basis of human diseases. The new technologies developed in the course of the project will be applicable in numerous other fields of biomedical endeavour.

Each cell of an organism has a set of chromosomes containing the heritable genetic material that directs its development, i.e., its genome. The genetic material of chromosomes is DNA. Each of the paired strands of the DNA molecule is a linear array of subunits called nucleotides, or bases, of which there are four types: adenine, cytosine, thymine, and guanine. Genes are discrete stretches of nucleotides that carry the information used by the cell to synthesize proteins. Human genes take up only about 5 to 10% of the DNA. Some of the remaining DNA, which does not code for proteins, may regulate whether or not proteins are made, but the function of most of it is unknown.

Table 2.2. The list of organisms whose genome sequence is completed.

Group         Organism                    Genome size    Haploid number
Virus         MS2                         4 kb           1
Virus         SV40                        5 kb           1
Virus         φX174                       5 kb           1
Virus         M13                         6 kb           1
Virus         λ                           50 kb          1
Virus         Herpes simplex              152 kb         1
Virus         T2, T4, T6                  165 kb         1
Virus         Smallpox                    267 kb         1
Prokaryotes   Methanococcus jannaschii    1600 kb        1
Prokaryotes   E. coli                     4600 kb        1
Prokaryotes   Borrelia burgdorferi        910 kb         1
Eukaryotes    Saccharomyces cerevisiae    13 Mb          16
Eukaryotes    Caenorhabditis elegans      97 Mb          6
Eukaryotes    Arabidopsis thaliana        100 Mb         5
Eukaryotes    Drosophila melanogaster     180 Mb         4
Eukaryotes    Homo sapiens                3000 Mb        23
Eukaryotes    Zea mays                    4500 Mb        10
Eukaryotes    Fugu rubripes               400 Mb         22
Eukaryotes    Amphiuma means              90,000 Mb      14

This landmark of scientific achievement represented the completion of the first stage of the project.
Initial results published by both groups in February 2001 declared that the human genome actually contains only about 30,000 to 40,000 genes, far fewer than originally thought.

Two types of maps were constructed: genetic linkage maps and physical maps. A genetic linkage map provides the relative location of genes and other markers on the basis of how frequently genes are inherited together; the closer genes are to each other on a chromosome, the more likely they are to be inherited together. Physical maps locate genes in relation to the presence of known nucleotide sequences that act as landmarks along the length of a chromosome. One such “marker” used to map the human genome is a sequence-tagged site (STS), a short sequence of nucleotides that occurs only once throughout the genome. A relatively detailed physical map was needed before sequencing could begin. Sequencing, in which the precise order of the nucleotides is determined, was the most technically challenging part of the project.

DNA sequencing of the nematode worm Caenorhabditis elegans and the yeast Saccharomyces cerevisiae was completed in 1996. The DNA sequencing of the other organisms was completed in the following order:

(1) E. coli: 1997.
(2) Fruit fly (Drosophila melanogaster) and the plant Arabidopsis thaliana: 2000.
(3) The laboratory mouse (Mus musculus) and the bacterium Staphylococcus aureus: 2001.

The rationale for these efforts is that many genes with similar functions in disparate organisms have been conserved in evolution and show surprising similarities. Genes from simpler organisms can thus be used to study human beings.

Another objective of the Human Genome Project is to address the ethical, legal, and social implications of the information obtained.
Society will derive the greatest benefit from this knowledge only if it takes measures to prevent abuses, such as invasion of the privacy of an individual’s genetic background by employers, insurers, or government agencies, or discrimination based on genetic grounds.

The HGP was the natural culmination of the history of genetics research. In 1911, Alfred Sturtevant, then an undergraduate researcher in the laboratory of Thomas Hunt Morgan, realized that he could (and had to, in order to manage his data) map the location of the fruit fly (Drosophila melanogaster) genes whose mutations the Morgan laboratory was tracking over generations. Sturtevant’s very first gene map can be likened to the Wright brothers’ first flight at Kitty Hawk. In turn, the Human Genome Project can be compared to the Apollo program bringing humanity to the moon.

The hereditary material of all multicellular organisms is the famous double helix of deoxyribonucleic acid (DNA), which contains all of our genes. DNA, in turn, is made up of four chemical bases, pairs of which form the “rungs” of the twisted, ladder-shaped DNA molecules. All genes are made up of stretches of these four bases, which are arranged in different ways and in different lengths. HGP researchers have deciphered the human genome in three major ways: determining the order, or “sequence”, of all the bases in our genome’s DNA; making maps that show the locations of genes in major sections of all our chromosomes; and producing what are called linkage maps, complex versions of the type originated in early Drosophila research, through which inherited traits (such as those for genetic disease) can be tracked over generations.

The HGP has revealed that there are probably somewhere between 30,000 and 40,000 human genes, and their locations can now be identified. This ultimate product of the HGP has given the world a resource of detailed information about the structure, organization and function of the complete set of human genes.
This information can be thought of as the basic set of inheritable “instructions” for the development and functioning of a human being. The International Human Genome Sequencing Consortium published the first draft of the human genome in the journal ‘Nature’ in February 2001, with the sequence of the entire genome’s three billion base pairs some 90 percent complete. A startling finding of this first draft was that the number of human genes appeared to be significantly lower than previous estimates, which ranged from 50,000 genes to as many as 140,000. The full sequence was completed and published in April 2003.

How to Sequence

The task of determining the complete sequence of the 3,200,000,000 bases of the human genome (30× the size of the nematode genome) was extremely daunting when the project was formally launched. Several lines of investigation focused on alternative approaches to characterizing the human genome. For example, complete genome sequencing may be bypassed by selectively sequencing just the expressed sequences, obtained by extracting mRNA from a wide range of human tissues. Large-scale expressed sequence tag (EST) projects in both the public and private domains resulted in the collection of a huge amount of sequence information on human genes. An international consortium to map the ESTs in the genome, using the genetic map as a framework, resulted in the human gene map of 35,000 genes. This was an important and valuable milestone in the HGP. However, the sequence of most of the mRNAs was incomplete, an unknown number of genes were missing from the collection, and no information was available on gene structures. In contrast, the experience gained from the study of smaller genomes, especially those of the nematode and yeast, illustrated the enormous potential to obtain a complete set of genes, gene structures and all other genetic information by determining the complete sequence of the genome.
Furthermore, by breaking the task into manageable segments, and using a physical map to co-ordinate the work, it was possible to undertake projects to sequence genomes that were far beyond the capabilities of a simple shotgun approach. For the human genome, therefore, the strategy adopted was to use the landmarks provided by the genetic map, and later the gene map, as a framework to anchor a physical map of overlapping clones that represented all human chromosomes. Initially, much of the work was done using yeast artificial chromosomes (YACs), a yeast cloning system that accepts very large fragments and thus allows a physical map to be built quickly over very large distances. However, the development of new bacterial cloning systems called BACs or PACs (bacterial or P1-derived artificial chromosomes), which can take large inserts (up to 250 kb), made it possible to build long-range maps directly in bacterial clones. This, coupled with the greater convenience and stability of bacterial clones compared to YACs, led to the choice of this system for constructing the physical map that provides the tile path of clones for sequencing.

Each BAC or PAC was sequenced using a random shotgun approach. This approach is essentially the same as the one developed for the early whole-genome sequencing projects. DNA from the BAC or PAC is broken up randomly into short fragments (typically 1 to 2 kb long), which are subcloned into plasmids or the bacteriophage M13 cloning vector. The resulting subclones (transformed bacterial colonies) are picked at random and cultured, and the subclone DNA is extracted for use as a sequencing template. A primer (a short DNA strand) is hybridized to the template within the vector sequence (which is common to all clones). This provides a starting point for DNA polymerase to synthesize new strands of DNA by incorporating deoxynucleotide triphosphates (dNTPs), which are the single-base precursors of DNA.
Fluorescently labeled analogues of each base (dideoxy NTPs, or ddNTPs) are included in the same reaction; a different fluorescent tag is used for each of the four bases. These analogues terminate the growing chain in a base-specific manner when they are incorporated (hence the name “chain terminators”). The product of the reaction is a ladder of newly synthesized DNA fragments of increasing size in single-base increments. Each fragment in the mixture ends at a specific base, which is identified by its fluorescent label, and the fragments are separated by size by electrophoresis, either through polyacrylamide “slab” gels or, more recently, through a viscous liquid matrix held in individual capillaries (capillary gel electrophoresis). The ladder of colored bands thus represents the sequence of the bases in the DNA, and can be read automatically by a fluorescence detector. The sequences of all the subclones of a single BAC or PAC are analyzed together. All overlaps between sequences are identified, and the individual reads are assembled into contigs. A consensus sequence is obtained at this stage, and ...

Although the HGP is finished, analyses of the data will continue for many years. An important feature of the HGP was the federal government’s long-standing dedication to the transfer of technology to the private sector. By licensing technologies to private companies and awarding grants for innovative research, the project catalyzed the multibillion-dollar US biotechnology industry and fostered the development of new medical applications.

Rapid progress in genome science and a glimpse into its potential applications have spurred observers to predict that biology will be the foremost science of the 21st century. Technology and resources generated by the Human Genome Project and other genomics research are already having a major impact on research across the life sciences.
The potential for commercial development of genomics research presents US industry with a wealth of opportunities, and sales of DNA-based products and technologies in the biotechnology industry are projected to exceed $45 billion by 2009 (Consulting Resources Corporation Newsletter, Spring 1999).

Current and Potential Applications of Genome Research Include the Following:
• Molecular medicine
• Energy sources and environmental applications
• Risk assessment
• Bioarchaeology, anthropology, evolution, and human migration
• DNA forensics (identification)
• Agriculture, livestock breeding and bioprocessing
• Molecular medicine, including:
• Improved diagnosis of disease
• Earlier detection of genetic predispositions to disease
• Rational drug design
• Gene therapy and control systems for drugs
• Pharmacogenomics “custom drugs”

Broader applications reaching into many areas of the economy include the following:
• Clinical medicine: Many more individualized diagnostics and prognostics, drugs, and other therapies.
• Agriculture and livestock: More nutritious and healthier crops and animals.
• Industrial processes: Cleaner and more efficient manufacturing in sectors such as chemicals, pulp and paper, textiles, food, fuels, metals, and minerals.
• Environmental biotechnology: Biodegradable products, new energy resources, environmental diagnostics and less hazardous cleanup of mixed toxic-waste sites.
• DNA fingerprinting: Identification of humans and other animals, plants and microbes; evolutionary and human anthropological studies; and detection of and resistance to harmful agents that might be used in biological warfare.

Technology and resources promoted by the Human Genome Project are beginning to have profound impacts on biomedical research, and promise to revolutionize the wider spectrum of biological research and clinical medicine.
Increasingly detailed genome maps have aided researchers seeking genes associated with dozens of genetic conditions, including myotonic dystrophy, fragile X syndrome, neurofibromatosis, diabetes types 1 and 2, inherited colon cancer, Alzheimer’s disease and familial breast cancer. On the horizon is a new era of molecular medicine, characterized less by treating symptoms and more by looking to the most fundamental causes of disease. Rapid and more specific diagnostic tests will make earlier treatment of countless maladies possible. Medical researchers will also be able to devise novel therapeutic regimens based on new classes of drugs, immunotherapy techniques, avoidance of environmental conditions that may trigger disease, and possible augmentation or even replacement of defective genes through gene therapy.

... reassembling DNA fragments in their original order. This repeated sequencing is known as genome “depth of coverage”. Draft sequence data is mostly in the form of fragments about 10,000 base pairs long whose approximate chromosomal locations are known. In June 2000, the Human Genome Project and Celera Genomics, a privately owned firm founded in 1998, jointly announced the completion of the initial sequencing of the human genome, which is composed of about three billion nucleotide base pairs.

Developing the Tools and Technologies for the Success of HGP

The DOE investments described below helped to make the Human Genome Project a success. Substantial investments by the NIH and the Wellcome Trust in the UK were equally important, however, and should not be overlooked. In most cases, the DOE achievements outlined below were the result of basic research programs. Research is an incremental process that learns from both the successes and the failures of other research investments, including those of other agencies and organizations.
Furthermore, no single instrument, technology, reagent, or protocol made high-throughput DNA sequencing possible; many contributors were responsible.

DNA Sequencers

Research on capillary-based DNA sequencing contributed to the development of two major DNA sequencing machines: the Perkin-Elmer 3700 and the MegaBace DNA sequencers. The MegaBace DNA sequencer was developed initially with DOE funds by Dr. Richard Mathies at UC Berkeley. The Perkin-Elmer 3700 was based, in part, on DOE-funded research by Dr. Norman Dovichi at the University of Alberta. These high-throughput instruments were among the keys to the success of the genome project.

Fluorescent Dyes

DNA sequencing originally used radiolabeled DNA subunits. DOE-funded research contributed to the development of fluorescent dyes, which increased the accuracy and safety of DNA sequencing as well as the ability to automate the procedures.

DNA Cloning Vectors

Before large DNA molecules are sequenced, they are cut into small pieces and multiplied, or cloned, into numerous copies using microbial-based “cloning” vectors. Today, the bacterial artificial chromosome (BAC) is the most commonly used vector for initial DNA amplification before sequencing. These cloning vectors were developed with DOE funds.

BAC-End Sequencing

The widely agreed-upon strategy for sequencing the human genome is based on the use of BACs, which carry fragments of human DNA from known locations in the genome. DOE-funded research at the Institute for Genomic Research in Rockville, Maryland, and at the University of Washington provided the sequencing community with a complete set of over 450,000 BAC-based genetic “markers” corresponding to a sequence tag every 3 to 4 kilobases across the entire human genome. These markers were needed to assemble both the draft and the final human DNA sequence.
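The marker density quoted above can be checked with simple arithmetic. The sketch below is only an illustration: it uses the genome size and marker count given in this chapter, and it assumes (this is not stated in the text) that the tags are spread evenly and that each BAC clone may contribute two end reads, one from each end of its insert. The function name mean_spacing_kb is hypothetical.

```python
# Rough marker-spacing arithmetic for the BAC-end figures quoted in the text.
GENOME_SIZE_BP = 3_200_000_000  # genome size used elsewhere in this chapter
BAC_MARKERS = 450_000           # "over 450,000" BAC-based markers (from the text)


def mean_spacing_kb(genome_bp: int, n_tags: int) -> float:
    """Average distance in kilobases between evenly spread sequence tags."""
    return genome_bp / n_tags / 1_000


# Case 1: one sequence tag per marker.
one_tag_per_marker = mean_spacing_kb(GENOME_SIZE_BP, BAC_MARKERS)

# Case 2 (assumption, not from the text): two end reads per BAC clone.
two_tags_per_marker = mean_spacing_kb(GENOME_SIZE_BP, 2 * BAC_MARKERS)

print(f"one tag per marker:  {one_tag_per_marker:.1f} kb")
print(f"two tags per marker: {two_tags_per_marker:.1f} kb")
```

Under these assumptions, one tag per marker gives a spacing of roughly 7 kb, while two end reads per clone give roughly 3.6 kb, which falls inside the quoted range of 3 to 4 kilobases. The calculation says nothing about how evenly the real markers were distributed.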
Gene Recognition and Assembly Internet Link (GRAIL)

Gene Recognition and Assembly Internet Link (GRAIL) is one of the most widely used computer programs for identifying potential genes in DNA sequence and for general DNA sequence analysis. This powerful analytical tool was developed with DOE funds by Dr. Ed Uberbacher at Oak Ridge National Laboratory. Although a ...

• Structural genomics: Initiatives are being launched worldwide to generate the 3-D structures of one or more proteins from each protein family, thus offering clues to function and biological targets for drug design.
• Experimental methods for understanding the function of DNA sequences and the proteins they encode include knockout studies, which inactivate genes in living organisms and monitor any changes that could reveal their functions.
• Comparative genomics: Analyzing DNA sequence patterns of humans and well-studied model organisms side by side has become one of the most powerful strategies for identifying human genes and interpreting their function.

2.9. FUTURE OF HGP IN MEDICINE AND GENETICS

The medical industry is building upon the knowledge, resources, and technologies emanating from the HGP to further the understanding of genetic contributions to human health. As a result of this expansion of genomics into human health applications, the field of genomic medicine was born. Genetics is playing an increasingly important role in the diagnosis, monitoring and treatment of diseases.

Diagnosing and Predicting Disease and Disease Susceptibility

All diseases have a genetic component (Fig. 2.6), whether inherited or resulting from the body’s response to environmental stresses like viruses or toxins. The success of the HGP has even enabled researchers to pinpoint errors in genes (the smallest units of heredity) that cause or contribute to disease.
The ultimate goal is to use this information to develop new ways to treat, cure, or even prevent the thousands of diseases that afflict humankind.

Fig. 2.6. Diagram of human chromosome 19 showing the locations of selected defective genes and genetic markers.

But the road from gene identification to effective treatments is long and fraught with challenges. In the meantime, biotechnology companies are racing ahead with commercialization by designing diagnostic tests to detect errant genes in people suspected of having particular diseases or of being at risk of developing them.

... analyzing and addressing the ethical, legal and social implications of human genetics research at the same time that the basic scientific issues are being studied. In this way, problem areas can be identified, and solutions developed, before the scientific information is integrated into health care practice. The ELSI Program is viewed as essential to the success of the genome project in the United States, and is supported by federal HGP funds. The National Human Genome Research Institute (NHGRI) of the National Institutes of Health has committed 5% of its annual research budget to the study of ELSI issues. The US Department of Energy Office of Energy Research, NHGRI’s partner in the US Human Genome Project, also reserves a portion of its funding for ELSI research and education. The ELSI program anticipates and addresses the implications of mapping and sequencing the human genome for individuals and society. It also examines the ethical, legal and social consequences of mapping and sequencing the human genome; stimulates public discussion of the issues; and develops policy options to ensure that the information is used for the benefit of individuals and society. The Working Group envisioned a program that would anticipate potential problems before they actually occur, and identify possible solutions to them.
It suggested a number of means for accomplishing these goals. Specifically, it encouraged the research community to explore and gather data on a wide range of issues pertinent to the human genome program that could be used to develop educational programs, policy recommendations or possible legislative solutions. A number of areas for focus were identified, including fairness in the use of genetic information, the impact of knowledge of genetic variation on individuals, and the privacy and confidentiality of genetic information, to name a few. In 1990, in response to the Working Group’s report, the NHGRI established the ELSI Branch (later renamed the ELSI Research Program) in its Division of Extramural Research, and the DOE established an ELSI program in its Office of Energy Research. Since the beginning, these two programs have collaborated closely, including the joint support of the ELSI Working Group, the development of complementary research priority areas, and the co-funding of ELSI activities of mutual interest.

SUMMARY

Humans are placed high in the hierarchy of living organisms because of their capacity for independent thought and imagination. Thus, understanding the human genome and its contents gives an idea of how a simple, single-celled zygote or organism develops into a complex individual. The Human Genome Project, also called the Human Genome Initiative, is a scientific research effort to analyze the DNA of humans and of several lower organisms. The project began in the United States in 1990 under the sponsorship of the US Department of Energy and the National Institutes of Health. Projects undertaken concurrently in Japan, the United Kingdom, Italy, France, and Russia are coordinated with the American effort through the Human Genome Organization. The project’s ultimate goal is to identify the chromosomal location of every human gene, and to determine each gene’s precise chemical structure in order to elucidate its function in health and disease.
The information gathered is expected to serve as the basic reference for research in human biology and medicine in the 21st century, and to provide fundamental insights into the genetic basis of human disease. The new technologies developed in the course of the project will be applicable in numerous other fields of biomedical endeavour. The total number of genes is estimated to be 30,000, which is much lower than previous estimates of 80,000 to 140,000 that had been based on extrapolations from gene-rich areas as opposed to a composite of gene-rich and gene-poor areas. The functions of over 50% of the discovered genes are unknown. Of the total three billion base pairs of the human genome, less than 2% codes for functional proteins. Repeated sequences that do not code for proteins (“junk DNA”) make up at least 50% of the human genome. Repetitive sequences are thought to have no direct functions, but they shed light on chromosome structure and dynamics. Over time, these repeats reshape the genome by rearranging it, creating entirely new genes, and modifying and reshuffling existing genes. Chromosome 1 has the most genes (2968) and the Y chromosome has the fewest (231). Genome sequencing is the term used to describe the laboratory process of reading the order of the four letters of the genetic alphabet (A, C, G, T) along a strand of DNA.
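Since a genome is read as many short, overlapping pieces that are later put back in order (as the steps that follow describe), the reassembly idea can be illustrated with a small sketch. This is a toy example under stated assumptions, not the software used by the HGP: it uses greedy overlap merging on error-free reads from one strand, and the function names overlap and greedy_assemble, the 19-base toy sequence and the minimum overlap of 3 bases are all hypothetical choices made for illustration.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such match is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two pieces with the largest overlap.

    Stops when no pair overlaps by at least min_len bases and returns
    the remaining pieces (contigs).
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs)
                   if n not in (best_i, best_j)] + [merged]


# Toy "genome" and four overlapping reads cut from it, listed out of order.
genome = "ATGGCGTACCTGAATCGGA"
reads = [genome[8:16], genome[0:8], genome[11:19], genome[4:12]]

contigs = greedy_assemble(reads)
print(contigs)
```

For this toy input the four reads merge back into a single contig equal to the original sequence. Real genomes contain long repeats and real reads contain errors, so a greedy merge of this kind can mis-join pieces; it is shown only to make the overlap-and-merge step concrete.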
The various steps involved in genome sequencing are as follows:
(1) Selection of suitable sample materials for the DNA.
(2) Isolation of DNA from cells and preparation of large samples of high-quality DNA from these cells.
(3) Cutting of the purified DNA at random sites into manageably sized, overlapping pieces.
(4) Insertion of these DNA pieces into packages (cloning vectors) for the production of limitless copies of the selected DNA.
(5) Recording of the order of bases in each DNA piece by using DNA sequencing techniques.
(6) Determination of the overlaps between the pieces, and assembly of the sequences to give the final human genome.
While following the above approach, it is necessary to choose sample populations that appropriately reflect the distribution of human populations.

A primary goal of the Human Genome Project is to generate detailed maps of the human genome. These maps will aid in determining the location of genes within the human genome. More specifically, they will assign genes to their chromosomes. Two types of maps are being generated. Genetic linkage maps determine the relative arrangement of, and approximate distances between, genes and markers on the chromosomes; physical maps specify the physical location (in base pairs) of, and distance between, genes or DNA fragments of unknown function that are mapped to specific regions of the chromosomes.

In the Human Genome Project, importance is also given to the sequencing of other model organisms. DNA sequencing of the nematode worm Caenorhabditis elegans and the yeast Saccharomyces cerevisiae was completed in 1996, the bacterium Escherichia coli in 1997, the fruit fly (Drosophila melanogaster) and the plant Arabidopsis thaliana in 2000, and the laboratory mouse (Mus musculus) and the bacterium Staphylococcus aureus in 2001. The rationale behind these efforts is that many genes with similar functions in disparate organisms have been conserved in evolution and show surprising similarities.
Genes from simpler organisms can thus be used to study their counterparts found in human beings. Another objective of the Human Genome Project is to address the ethical, legal, and social implications of the information obtained. Society will derive benefit from this knowledge only if it takes measures to prevent abuses, such as invasion of the privacy of an individual’s genetic background by employers, insurers, or government agencies, or discrimination on genetic grounds.

A large number of advances in diverse fields, including molecular biology, genetic engineering and sequencing, gave great impetus to the Human Genome Project. These technological developments dramatically decreased the cost of DNA sequencing, while increasing its speed and efficiency. For example, it took four years for the international Human Genome Project to produce the first billion base pairs of sequence, and less than four months to produce the second billion base pairs. In January 2003, the DOE team sequenced 1.5 billion bases. The cost of sequencing has dropped dramatically since the project began and is still dropping rapidly. Other major factors involved in cost and time reduction were greatly improved