The Human Genome Project Human Genome Project The project that has identified and located all of the genes in human DNA, and determined the sequences of the chemical bases that make up human DNA. This information is stored in databases. The aims of the project were to: determine the sequences that comprise human DNA, identify all of the genes in human DNA, store this information in databases and improve tools available for its analysis, transfer technologies gained from the project to private industry (eg biotechnology companies) to develop new medical applications, address the ethical, legal and social issues that may arise from the project. The sequence of a ‘working draft’ of the human genome was published in the science journals Science and Nature in early 2001. This was a special event, as the public and private research efforts were publishing their results at the same time. Analysis of the draft sequence revealed a vast amount of information including: the average human gene consists of 3,000 nucleotide bases, but sizes vary greatly – the largest known human gene has 2.4 million bases, the order of 99.9% of nucleotide bases is exactly the same in all people, the functions of over 50% of discovered genes remain unknown, less than 2% of the genome encodes for the production of proteins, much of the genome consists of repetitive base sequences. These repeats appear to have no direct function, but over time reshape the genome by rearranging it; creating new genes or modifying and reshuffling existing genes, gene-rich areas of the genome are predominantly made up of G and C bases, whereas genepoor regions are mainly composed of A and T bases, chromosome 1 has the most genes (2968) whereas the Y chromosome has the least (231). Much is still unknown about our genome. Some of the things we still don’t know are: the exact number of genes in the human genome, the exact location, function and regulation of these genes, the amount, distribution, information content and functions of ‘non-coding’ DNA, that is, DNA that does not code for a protein product, how gene expression, protein expression and post-translational events are orchestrated, evolutionary conservation of genes and proteins amongst different organisms, correlation of genetic variation between individuals with respect to health and disease. How many genes did you say? The size of genomes differs from one organism to the next. The human genome contains more than 3.2 billion base pairs and about 30,000 genes. The largest known genome belongs to a microscopic amoeba, Amoeba dubia, which is closely followed in size by the lungfish and the Easter lily. Three quarters of the Japanese pufferfish's 31,000 genes have direct human counterparts. genomics The study of the DNA sequence in the chromosomes of an organism. This includes the genes that code for proteins, the regulatory sequences that control the genes and the non-coding DNA segments. Now that we have a map of the human genome, we have to learn how to read it. That means figuring out which gene does what. Of the estimated 30,000 genes in the human genome, we have very little idea about what each one does. One way of studying genes is to directly compare the entire genome with other organisms. This study is called comparative genomics. The human genome is extremely complicated and so, by comparing it with others, such as the mouse or fruit fly genome, we gain insights into the similarities and differences. Scientists can learn much about the function of human genes by comparing them with their mouse counterparts. All the organisms scientists are using for genome comparison are known as model organisms, in that they are a model against which the human genome can be studied. So how can we compare mice genes with human genes when we have 2 legs and mice have 4, when we have opposable thumbs and mice have claws? On a DNA level humans and other organisms aren’t that different - on average, mouse and human genes are 85% similar. So far completely mapped ‘model organism’ genomes include chimpanzee, mouse, rat, pufferfish, fruit fly, sea squirts, roundworm, baker's yeast, the bacterium Escherichia coli and in February 2005, the kangaroo. All of these organisms are being used by comparative genomics researchers to further understanding of the human genome. Around the world scientists from different nations are sequencing genomes. In early 2005 research was underway into the dog, chicken, honey bee and sea urchin genomes. What We've Learned So Far What Does the Draft Human Genome Sequence Tell Us? By the Numbers The human genome contains 3164.7 million chemical nucleotide bases (A, C, T, and G). The average gene consists of 3000 bases, but sizes vary greatly, with the largest known human gene being dystrophin at 2.4 million bases. The total number of genes is estimated at 30,000 —much lower than previous estimates of 80,000 to 140,000 that had been based on extrapolations from gene-rich areas as opposed to a composite of gene-rich and gene-poor areas. Almost all (99.9%) nucleotide bases are exactly the same in all people. The functions are unknown for over 50% of discovered genes. The Wheat from the Chaff Less than 2% of the genome codes for proteins. Repeated sequences that do not code for proteins ("junk DNA") make up at least 50% of the human genome. Repetitive sequences are thought to have no direct functions, but they shed light on chromosome structure and dynamics. Over time, these repeats reshape the genome by rearranging it, creating entirely new genes, and modifying and reshuffling existing genes. During the past 50 million years, a dramatic decrease seems to have occurred in the rate of accumulation of repeats in the human genome. How It's Arranged The human genome's gene-dense "urban centers" are predominantly composed of the DNA building blocks G and C. In contrast, the gene-poor "deserts" are rich in the DNA building blocks A and T. GC- and AT-rich regions usually can be seen through a microscope as light and dark bands on chromosomes. Genes appear to be concentrated in random areas along the genome, with vast expanses of noncoding DNA between. Stretches of up to 30,000 C and G bases repeating over and over often occur adjacent to gene-rich areas, forming a barrier between the genes and the "junk DNA." These CpG islands are believed to help regulate gene activity. Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest (231). How the Human Compares with Other Organisms Unlike the human's seemingly random distribution of gene-rich areas, many other organisms' genomes are more uniform, with genes evenly spaced throughout. Humans have on average three times as many kinds of proteins as the fly or worm because of mRNA transcript "alternative splicing" and chemical modifications to the proteins. This process can yield different protein products from the same gene. Humans share most of the same protein families with worms, flies, and plants, but the number of gene family members has expanded in humans, especially in proteins involved in development and immunity. The human genome has a much greater portion (50%) of repeat sequences than the mustard weed (11%), the worm (7%), and the fly (3%). Although humans appear to have stopped accumulating repeated DNA over 50 million years ago, there seems to be no such decline in rodents. This may account for some of the fundamental differences between hominids and rodents, although gene estimates are similar in these species. Scientists have proposed many theories to explain evolutionary contrasts between humans and other organisms, including those of life span, litter sizes, inbreeding, and genetic drift. 4.1 Discuss the Benefits of the Human Genome Project (H.P.G) 1. The genome of an organism is all the genetic material of an individual or species. Eg in a haploid human cell there are about 3 billion DNA bases arranged along a chromosome. 2. The HGP is an international project that aims to identify all the human genes and to determine the sequences of the 3 billion bases in human DNA. 3. The project began in 1990 – should be completed by 2003 – 97% of human DNA is called “junk” DNA – it serves no obvious purposes. The outcome has been made possible by advances in technology, molecular techniques, electronics and computer software. 4. The goals of the project are:a) Genetic mapping of the human genome – ie locating 3000 genetic markers n human DNA (genes); b) Physical mapping – ie cutting each chromosome into fragments and then determining the correct order of the pieces; c) DNA sequencing – determining the exact order of the nucleotides on each chromosome, d) Analysing the genomes of other organisms eg bacteria, yeast Benefits 1. Medical Benefits Improved diagnosis of disease eg cancer, high blood pressure Detection of genetic predisposition to disease Drug design and gene therapy When faulty genes are detected the disease can be treated early, it also leads to diagnostic tests which can include genetic counselling Normal genes may be cloned and the product of the expression of these genes used for treatment. 2. Non-Medical Benefits Creates understanding of human evolution eg comparison between species, understanding of evolutionary relationships (eg 1% difference between us and chimps) Development in forensics Understanding gene expression, mutate and are expressed, also the function of same genes Possible Ethical Issues. Who controls the genetic information Will researchers be prevented by rival commercial interests from developing new tests and therapies] Will knowledge of our genetic profile allow us to avoid disease, or inform us about our possible early death Will our genetic profile be used against us by employers, insurers or governments (discrimination). 4.2 Describe and Explain the Limitations of data obtained from the HGP – see above – also 1. Some scientists believe some complex processes eg brain function, will never be understood even with the HGP. 2. Some genes, when they interact and function together, produce qualities that can’t be explained by the genes’ activities: (knowing the base sequences of DNA does not determine the function of every gene.) 3. Criticism of money spent on the project – could it be spent elsewhere. Why was a portion of the NHGRI budget set aside for ethical considerations? Since the beginning of the Human Genome Project, it has been clear that science's expanding knowledge of the genome would have a profound impact upon humanity. To maximize the potential for beneficial effects while minimizing the risk of detrimental effects, it was essential that research be conducted to investigate a wide range of issues related to the acquisition and use of genomic information. Five percent of the annual budget of the NHGRI is dedicated to examining ethical, legal and social implications (ELSI) related to human genome research, incorporating specific recommendations into the activities of NHGRI and providing guidance to policymakers and the public. The ELSI program at NHGRI, which is considered unprecedented in biomedical science in terms of scope and level of priority, provides an effective basis from which to assess the implications of genome research, and has resulted in several notable improvements to the HGP. An example is the decision to sequence the DNA of several anonymous individuals, rather than a known individual, in order to protect privacy. Another example is the development of widely used genetic privacy guidelines and draft legislation. The ELSI program at NHGRI now serves as a model for large, publicly funded science efforts. What will the next 50 years of medical science look like? Having the essentially complete sequence of the human genome is similar to having all the pages of a manual needed to make the human body. The challenge to researchers and scientists now is to determine how to read the contents of all these pages and then understand how the parts work together and to discover the genetic basis for health and the pathology of human disease. In this respect, genome-based research will eventually enable medical science to develop highly effective diagnostic tools, to better understand the health needs of people based on their individual genetic make-ups, and to design new and highly effective treatments for disease. Individualized analysis based on each person's genome will lead to a very powerful form of preventive medicine. We'll be able to learn about risks of future illness based on DNA analysis. Physicians, nurses, genetic counselors and other health-care professionals will be able to work with individuals to focus efforts on the things that are most likely to maintain health for a particular individual. That might mean diet or lifestyle changes, or it might mean medical surveillance. But there will be a personalized aspect to what we do to keep ourselves healthy. Then, through our understanding at the molecular level of how things like diabetes or heart disease or schizophrenia come about, we should see a whole new generation of interventions, many of which will be drugs that are much more effective and precise than those available today. When can we expect new and better drugs? It's important to be careful about raising expectations. Most new drugs based on the completed genome are still perhaps 10 to 15 years in the future, although more than 350 biotech products many based on genetic research - are currently in clinical trials, according to the Biotechnology Industry Organization. It usually takes more than a decade for a company to conduct the kinds of clinical studies needed to win marketing approval from the Food and Drug Administration. Testing, however, will arrive more quickly, especially the ability to predict individual future health risks, and the ability to implement an enhanced approach to preventive medicine. In the next decade, we may also be better able to determine which drugs work best for individuals, based on their genetic make-up. How has the Human Genome Project affected biological research? Biological research has traditionally been a very individualistic enterprise, with researchers pursuing medical investigations more or less independently. The magnitude of both the technological challenge and the necessary financial investment prompted the Human Genome Project to assemble interdisciplinary teams, encompassing engineering and informatics as well as biology; automate procedures wherever possible; and concentrate research in major centers to maximize economies of scale. As a result, research involving other genome-related projects (e.g., the International HapMap Project to study human genetic variation and the Encyclopedia of DNA Elements, or ENCODE, project) is now characterized by large-scale, cooperative efforts involving many institutions, often from many different nations, working collaboratively. The era of team-oriented research in biology is here. In addition to introducing large-scale approaches to biology, the Human Genome Project has produced all sorts of new tools and technologies that can be used by individual scientists to carry out smaller scale research in a much more effective manner. Will the era of the genome begin in April? Yes. We are entering a new age of discovery that will transform human health. Our eventual knowledge about the workings of the genome has the potential to fundamentally change our most basic perceptions of our biological world. It is difficult to predict what will be learned and how future knowledge will be applied, but there can be little doubt that understanding the genome will revolutionize our concept of health and improve the human condition in remarkable ways. Now that the genome is complete, what's next for NHGRI? NHGRI's vision for the future, which is being published April 24, 2003 in the journal Nature, details a diverse and exciting landscape of new possibilities. NHGRI will particularly focus on opportunities to translate the results of the Human Genome Project into advances in medicine, including projects that build upon the completed human genome sequence. This is particularly true of projects of a large international scope that require extensive coordination and public investment to ensure that results and discoveries remain freely available in the public domain. An example is NHGRI's genetic variation mapping project, or HapMap, which will speed the discovery of genes related to common illnesses such as asthma, cancer, diabetes and heart disease. The HapMap should also be a powerful resource for studying the genetic factors contributing to variation in response to environmental influences, in susceptibility to infection, and in the effectiveness of drugs and vaccines. Another example is the ENCODE project, which aims to create a comprehensive encyclopedia of the functional elements encoded in the DNA sequence, by cataloging the identity and precise location of all of the protein-encoding and non-protein-encoding genes within the genome. the benefits of the Human Genome Project Completed in 2003, the Human Genome Project (HGP) was a collaborative project that lasted for 13 years. The goals of the project were to: o identify all of the approximately 25,000-30,000 genes in human DNA o determine the sequences of the 3 billion chemical base pairs that make up human DNA o store this information in databases o improve and develop tools for data analysis o address the ethical, legal and social issues that may arise from the project. Analysis of the data will continue for many years. By licensing technologies to private companies and awarding grants for innovative research, the project accelerated the biotechnology industry and aided the development of new medical applications. Current and potential applications and benefits of the HGP include: Improvements in Molecular Medicine: improved diagnosis of inheritable diseases; earlier detection of genetic predispositions to disease; rational drug design; gene therapy and control systems for drugs. More accurate risk assessment: assess health damage and risks caused by exposure to both high and low doses of radiation; assess health damage and risks caused by exposure to mutagenic chemicals and cancercausing toxins; reduce the likelihood of heritable mutations. Better understanding of evolution and human migration (Bioarchaeology, Anthropology, Evolution and Human Migration) : study evolution through germline mutations in lineages; study migration of different population groups based on female genetic inheritance; study mutations on the Y chromosome to trace lineage and migration of males; compare breakpoints in the evolution of mutations with ages of populations and historical events. DNA Forensics: identify potential suspects whose DNA may match evidence left at crime scenes through DNA fingerprinting of samples such as blood or skin exonerate persons wrongly accused of crimes identify crime and catastrophe victims establish paternity and other family relationships match organ donors with recipients in transplant programs. describe and explain the limitations of data obtained from the Human Genome Project It is now believed that only approximately 3% of the DNA in human chromosomes codes for proteins. The other 97% of the DNA consists of non-coding regions (sometimes called ‘junk DNA’), whose functions may include providing chromosomal structural integrity and regulating where, when, and in what quantity proteins are made. The use of about 50% of this ‘"junk DNA" is not known. Some genes are found inside other genes, thus making their identification difficult. Non-coding DNA is used in DNA fingerprinting. It may be a long time before scientists totally understand the role of every gene, its interaction with other genes and how DNA relates to such things as behaviour, brain function and other aspects of neurobiology. Some other limitations of the Human Genome Project involve ethical, legal and social implications such as: o fairness in the use of genetic information o privacy and confidentiality o psychological impact and stigmatisation o education, standards and quality control o commercialisation o conceptual and philosophical implications. What is sequencing and how do you sequence a genome? Sequencing means determining the exact order of the base pairs in a segment of DNA. Human chromosomes range in size from about 50,000,000 to 300,000,000 base pairs. Because the bases exist as pairs, and the identity of one of the bases in the pair determines the other member of the pair, scientists do not have to report both bases of the pair. The primary method used by the HGP to produce the finished version of the human genetic code is map-based, or BAC-based, sequencing. BAC is the acronym for "bacterial artificial chromosome." Human DNA is fragmented into pieces that are relatively large but still manageable in size (between 150,000 and 200,000 base pairs). The fragments are cloned in bacteria, which store and replicate the human DNA so that it can be prepared in quantities large enough for sequencing. If carefully chosen to minimize overlap, it takes about 20,000 different BAC clones to contain the 3 billion pairs of bases of the human genome. A collection of BAC clones containing the entire human genome is called a "BAC library." In the BAC-based method, each BAC clone is "mapped" to determine where the DNA in BAC clones comes from in the human genome. Using this approach ensures that scientists know both the precise location of the DNA letters that are sequenced from each clone and their spatial relation to sequenced human DNA in other BAC clones. For sequencing, each BAC clone is cut into still smaller fragments that are about 2,000 bases in length. These pieces are called "subclones." A "sequencing reaction" is carried out on these subclones. The products of the sequencing reaction are then loaded into the sequencing machine (sequencer). The sequencer generates about 500 to 800 base pairs of A, T, C and G from each sequencing reaction, so that each base is sequenced about 10 times. A computer then assembles these short sequences into contiguous stretches of sequence representing the human DNA in the BAC clone. Whose DNA was sequenced for the Human Genome Project? This is intentionally not known to protect the volunteers who provided DNA samples for sequencing. The sequence is derived from the DNA of several volunteers. To ensure that the identities of the volunteers cannot be revealed, a careful process was developed to recruit the volunteers and to collect and maintain the blood samples that were the source of the DNA. The volunteers responded to local public advertisements near the laboratories where the DNA "libraries" were prepared. Candidates were recruited from a diverse population. The volunteers provided blood samples after being extensively counseled and then giving their informed consent. About 5 to 10 times as many volunteers donated blood as were eventually used, so that not even the volunteers would know whether their sample was used. All labels were removed before the actual samples were chosen. What does it mean when you say you've completed the Human Genome Project? The main goals of the Human Genome Project were first articulated in 1988 by a special committee of the U.S. National Academy of Sciences, and later adopted through a detailed series of five-year plans jointly written by the National Institutes of Health and the Department of Energy. At this time, the principal goals laid out by the National Academy of Sciences have been achieved, including the essential completion of a high-quality version of the human sequence. Other goals included the creation of physical and genetic maps of the human genome, which were accomplished in the mid- 1990s, as well as the mapping and sequencing of a set of five model organisms, including the mouse. All of these goals have been achieved within the time frame and budget first estimated by the NAS committee. Notably, quite a number of additional goals not considered possible in 1988 have been added along the way and successfully achieved. Examples include advanced drafts of the sequences of the mouse and rat genomes, as well as a catalog of variable bases in the human genome.