NEW SCIENTIST (London, England) Feb. 17, 2001, pp. 1-4
"Copyright (c) 2001, NEW SCIENTIST. Distributed by New York Times Special Features/Syndication Sales."

GENES, THE GENOME AND DISEASE
by Kate Bendall

We Have Cleared the First Hurdle in the Race to Understand the Human Genome, but There Is Much Hard Work Ahead. The Potential Rewards in Medicine, However, Are Enormous. They Include Treatments Tailor-Made to Your Own Genes and Possible Cures for Diseases Such As Cancer and Alzheimer's

What makes us human? On Monday 26 June 2000, scientists announced that they had finally found out. After over a decade of hard slog, they had produced a "working draft" of the sequence of the human genome--the recipe for making a human being. Understanding the genome will revolutionise human biology and medicine, but the road to producing the rough draft of the sequence was far from smooth (see "A Brief History Of The Genome Project"). The term genome is a combination of two words: gene and chromosome, and is defined as all the genetic material, or DNA, in a cell. The main aim of the Human Genome Project, launched in 1990, is to produce a complete sequence of the 3 billion base pairs that make up the human genome, using DNA provided by anonymous donors from a variety of ethnic backgrounds. This is a task of mammoth proportions--if written out, the sequence would fill 200 volumes the size of telephone directories and take nine years to read aloud. Scientists hope to perfect the sequence in 2003--fifty years after James Watson and Francis Crick published their landmark paper describing the double-helical structure of DNA. Most genes encode sequences of amino acids which make up proteins, and the main interest in sequencing the human genome lies in identifying and characterising all human genes. This task alone will keep biologists busy for decades. Yet genes make up only about 3 per cent of the human genome (see Figure 1).
The remaining 97 per cent of the data produced in the Human Genome Project describes non-coding DNA, which does not encode proteins. Why are scientists prepared to invest so much time and money in sequencing non-coding DNA--which some have dubbed "junk DNA"? The reason is that it isn't simply junk. There are many different types of non-coding DNA, some of which play important roles in the structure, function and evolution of the genome. Introns are segments of non-coding sequence within genes that interrupt the coding sections, or exons, of eukaryotic genes. To make a protein, cellular machinery first produces an exact RNA copy of the DNA gene sequence. Enzymes then remove the introns and stick the exons back together to make messenger RNA (mRNA), in a process called splicing. The mRNA is the molecular blueprint for making the protein. Why does the cell go to all that trouble? Introns have an important role as inert "spacer" elements. DNA breaks can take place within introns without disrupting a gene's coding regions, so allowing exons to change places in the sequence. This exon shuffling can alter the resulting protein's structure much faster than by gradually accumulating mutations in individual bases. Introns help genes evolve rapidly, allowing species to adapt more easily to changing environments. Perhaps more puzzling is the existence of vast amounts of non-coding DNA between genes, much of which is present in blocks of repeated sequences. Some repetitive sequences are important in maintaining chromosome structure, such as the centromeres and telomeres of eukaryotic chromosomes. Centromeres act like handles for cells to haul copied chromosomes into daughter cells during cell division. Telomeres are sequences at the tips of chromosomes that make sure the cell's DNA replication machinery copies the ends of the chromosome properly.
Telomeres normally wear down with successive divisions and, if they disappear completely, the rest of the chromosome will erode and the cell will die. An enzyme called telomerase rebuilds telomeres and is normally only found in germ cells, which divide to make sperm or eggs. However, many human cancers also contain telomerase, meaning that the cells can keep dividing indefinitely--as if they had become immortal. One class of repetitive DNA, known as minisatellite DNA, is found mainly in telomeres, and is exploited by scientists in genetic fingerprinting (see Inside Science No. 52). Other non-coding sequences regulate gene expression, in other words whether genes are active. For example, promoters and enhancers are associated with many genes, and determine which cells express genes, and at what level. However, no one understands the purpose of much non-coding DNA. It may simply have accumulated over evolutionary time like old junk in an attic. On the other hand, copying it during cell division uses up valuable energy, so it may well play as yet undiscovered roles. Trying to understand how the human genome defines a human being is like assembling a car engine using only a list of numbered parts, without names or descriptions. The overall process is termed annotation. At a bare minimum, it involves identifying the beginning, end and intron-exon structure of each gene--no mean task among so much "junk" DNA. Full annotation means not merely cataloguing all the genes in the human genome, but understanding what each does. Scientists can try to find out what a gene does by seeing whether its sequence is similar, or homologous, to other human genes whose function is known. It can also be very informative to compare a human DNA sequence with sequences from model organisms such as mice or fruit flies. Researchers use complex computer programs written by bioinformatics specialists to perform these tasks, but the process is complicated and the programs do not work perfectly. 
Improving these programs and using them for annotation of human sequence data will be a major focus of biological research over the next two decades. With only a working draft rather than a complete sequence available for now, scientists are using a range of methods to estimate how many genes the human genome contains. Estimates are surprisingly variable, ranging from about 35,000 to over 100,000. The question is so hotly debated that scientists have a $1-a-bet sweepstake on the answer. Those betting at the higher end of the scale argue that, generally speaking, more complex organisms have more genes, and a high gene number is the only way to explain human complexity. Scientists placing their bets on a much lower number say that complexity results from how genes are regulated or expressed, not from how many there are. They point out that the fruit fly Drosophila has around 5,000 fewer genes than the supposedly simpler nematode worm, Caenorhabditis elegans. What's more, researchers have used computer analyses to estimate the number of genes on human chromosome 22. Scale this information up for the rest of the genome, and the grand total is close to 35,000. The question remains open until 2003, when the winner of the sweepstake will finally be announced.

WHAT DO GENES DO? BELT AND BRACES MECHANISMS

Nearly all genes encode proteins, and these proteins interact to produce a living organism in many complex ways. Some genes are essential--the protein product of the insulin receptor gene, for example, plays a critical role in metabolism. Other genes, such as some defining eye or hair colour, are non-essential. However, there are so many genes in the human genome that sometimes protein products from several genes are capable of carrying out the same biological role. These are known as redundant genes, and they often have similar sequences. Redundant genes provide a protective "belt and braces" mechanism against harmful mutations.
If a mutation takes out a redundant gene, another gene can compensate. This back-up mechanism means that redundant genes are a powerful evolutionary force, as they are free to mutate and change their functions without damaging the organism. Which genes are essential for life? Is there a set of proteins common to all organisms? Scientists have begun to address these fascinating questions by comparing the gene content of different bacteria. For example, the bacterium Mycoplasma genitalium, with just 480 protein-coding genes, has the smallest number of genes of any known independently replicating cell. A team led by Craig Venter, head of Celera Genomics, has destroyed the function of some of these genes one at a time, and found that between 265 and 350 genes are essential for the bacterium to grow in the laboratory. This discovery opens the door to building an artificial living organism. In theory, scientists could make an artificial bacterial chromosome containing only these essential genes. They could put it inside a bacterium stripped of its natural genome, to see whether those genes are enough to build a living cell. However, this experiment is not only technically difficult to perform, but raises safety and ethical questions. Venter is currently placing the work on hold until he has a response from a group of 20 leading theologians, lawyers and philosophers, who are debating whether the project is morally and ethically acceptable. Each person's genome is 99.8 per cent identical to everyone else's. So a detailed analysis of a single human sequence will have a huge impact on biological and medical research. However, this is only part of the story--every human being is unique. As Gregor Mendel showed using pea plants in the 1860s, genetic variation within a single species can lead to very different characteristics. Different versions of the same gene are called alleles. 
Alleles are created by various changes in DNA sequence, such as deletions, insertions, rearrangements, or changes in single base pairs. Some researchers are studying changes in single base pairs--single nucleotide polymorphisms or SNPs--as part of the wider Human Genome Project. SNPs occur at about 3 million sites in the genome, and may affect gene function, depending on the exact base change and where it occurs. They are of interest to researchers because they help to make each of us unique. Genetic variation between individuals of a species is essential for evolution: without it, a species would not be able to evolve to adapt to changes in the environment. However, not all genetic variation is beneficial. Researchers have studied SNPs in a gene called LPL, which encodes an enzyme involved in fat metabolism, and found that some of them can indirectly predispose people to heart disease. But some genetic variation disrupts gene function so severely that it causes disease directly. Such changes qualify as harmful mutations. Some genetic diseases, such as cystic fibrosis or muscular dystrophy, are caused by mutations in single genes, and are known as monogenic diseases. Technological leaps over the past two decades have led to the isolation and characterisation of the genes responsible for more than a hundred monogenic disorders. The method used is positional cloning. Researchers study affected families and work out how the disease is inherited through the generations. They take DNA samples from family members, and sequence sections of DNA scattered throughout the genome, known as molecular markers. By seeing which marker sequences are found more often in family members affected by the disease, they can work out on which chromosome the gene responsible for the disease lies, and its rough position on that chromosome, a method called genetic mapping. 
In the past, researchers would then have had to laboriously sequence their way along the chromosome, starting from known sequences located near the area of interest, until they isolated the correct gene, a process known as physical mapping. Now, however, the Human Genome Project is revolutionising the technique, as researchers can simply search a computer database of sequence in the relevant region of the chromosome.

REVOLUTIONARY METHODS--TYPES OF GENETIC DISEASE

Although researchers have collared the culprit genes for many monogenic disorders, the development of effective treatments is lagging disappointingly far behind. Gene therapy, which involves administering functional copies of a defective gene as a drug (see Inside Science No. 66), looks promising, but has been held back by technical difficulties. While monogenic diseases are devastating to individual families, they are rare in populations as a whole. Diseases such as diabetes, hypertension, asthma, common cancers, and the major psychiatric disorders are a far greater threat to public health. Since these diseases are inherited in much more complex ways, identifying their genetic causes is proving an even greater challenge to researchers. Mendel discovered his laws of inheritance by studying horticultural varieties of the garden pea. These varieties differed from each other by mutations in single alleles. The alleles each produced very obvious phenotypes, such as tall or short plants, making it easier for Mendel to interpret his results. He would not have been able to draw his conclusions by working on the genetically diverse weeds in his garden, and people are more like weeds than peas in this regard. The overwhelming majority of human characteristics that are at least partly defined by genes (height, metabolic activity and intelligence, for example), as well as many common diseases, don't follow Mendel's simple laws of inheritance.
Polygenic diseases are thought to result from the combined effects of several genes. Environmental factors also affect the likelihood of disease developing, and for this reason the diseases are also called complex traits. Unlike with laboratory mice, breeding experiments with humans are out of the question, and our environment cannot be closely controlled. This makes it tricky to separate the contributions of environment and genetics to these diseases. Progress in identifying the genes that contribute to some of these conditions has thus been slower than expected. Researchers use positional cloning methods similar to those used to study monogenic disorders, but identifying relevant genes is much harder for several reasons. First, they do not know whether the genes they are searching for are inherited in a dominant or recessive fashion. Second, many genes make a contribution to the disease, making it much harder to be sure whether a given gene is involved or not. Third, the genes that contribute to the disease are often different in different families. These constraints mean that researchers have to study very large numbers of individuals to track down disease genes, and carry out complicated statistical analyses to work out the likelihood that a given gene is contributing to the development of disease. Despite their complexity, genetic contributions to some polygenic diseases have been identified. In some cases, inheritance of a particular allele has a major effect on the likelihood of developing a disease. However, the presence of other genes can influence how this disease develops. For example, while the likelihood of late-onset Alzheimer's disease is very strongly affected by which allele of the apolipoprotein E gene a person carries, researchers are now gathering evidence that other genes also play a role. But they have yet to identify them.
So far, developing effective therapies for these conditions has proved even more difficult than identifying their genetic causes. Although it may become possible to predict how likely a person is to develop a particular disease, there may be no effective treatment available. Sadly, mastectomy is currently the most effective preventative treatment for a young woman with a genetic predisposition to breast cancer. The hope is that more effective drug therapy can be developed for many complex diseases. Although researchers have made a great deal of progress in identifying disease genes, so far they have mainly focused their efforts on identifying individual genes, and inferring how genes interact with each other. But living creatures aren't run by key genes acting in isolation, rather by a complex network of interactions among vast numbers of genes. Two exciting new developments are now enabling scientists to get a wealth of clues to this complicated story. The new techniques, microarray technology and proteomics, provide "snapshots" of all the genes expressed in a cell or tissue under different environmental conditions. They involve analysing the expression of thousands of messenger RNA molecules and proteins, respectively. Microarray technology, like many fundamental techniques in molecular biology, is based on the principle of hybridisation. If single-stranded DNA or RNA molecules are heated together at about 65 degrees Centigrade, any strands that contain complementary sequences stick to each other (hybridise). So if researchers label known DNA or RNA molecules with fluorescent or radioactive compounds, they can then use these molecules to pick out strands with similar sequences in DNA or RNA samples under investigation. Hybridisation-based techniques can be used to find interesting DNA sequences, work out how much of a sequence of interest is present, and even identify genes that are actively expressed in a sample. 
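The pairing rule behind hybridisation can be sketched in a few lines of Python. This is only an illustration, not how any laboratory protocol or analysis package works: it assumes a probe sticks only to a perfectly complementary stretch, whereas real hybridisation tolerates some mismatches and depends on temperature. The function names and the example sequences are invented for the purpose.

```python
# Illustrative sketch: a labelled probe "hybridises" only where the sample
# contains its exact reverse complement (A pairs with T, C pairs with G).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the strand that would pair with `strand`, read in the same direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def hybridisation_sites(probe: str, sample: str) -> list[int]:
    """Start positions in `sample` where the probe can pair exactly."""
    target = reverse_complement(probe)
    return [
        start
        for start in range(len(sample) - len(target) + 1)
        if sample[start:start + len(target)] == target
    ]


# Hypothetical sample strand and probe, chosen only to show the idea.
sample = "GGATCCATGCCGTAAATGCC"
probe = "GGCAT"
sites = hybridisation_sites(probe, sample)
print(sites)
```

A microarray applies the same idea in parallel: each spot on the slide holds a different known sequence, and the spots that light up show which complementary molecules were present in the sample.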
ERA OF GENETIC HOPES--CUSTOMISED DRUG TREATMENT

Traditional hybridisation methods work well, but as only one gene is analysed in each experiment, they are labour-intensive. In contrast, up to a million genes or gene fragments can be analysed simultaneously in a single microarray experiment. This is possible as hybridisation works well on a microscopic scale. Microarrays are just microscope slides on which tiny DNA droplets containing strands of known sequence have been deposited by a robot. The DNA droplets on the slide are hybridised with messenger RNA molecules extracted from cells, so that all the genes expressed in those cells can be identified in one experiment. Microarrays are mostly used for this kind of transcript profiling--comparing, gene by gene, differences in levels of expression between two tissues. They can provide clues to the function of a gene. For example, a gene expressed only in the pancreas is unlikely to be directly involved in the pathology of Alzheimer's disease. Comparing expression in different states of the same tissue can also give clues. By carrying out transcript profiles of the skeletal muscles of young and old mice, scientists have discovered more than 100 genes implicated in ageing. Other studies have identified genes that are more active in breast tumours than in normal tissue. Microarrays should also speed up the development of effective drugs, making it easier to choose suitable genes to target. Drugs that single out genes expressed only in a few tissues may have fewer side effects than those that target genes expressed throughout the body. Studying changes in gene expression when a new drug is administered can help to explain how that drug works, and in clinical trials it can provide an early indication of whether the drug is effective, or has toxic side effects. Increasingly, drug treatments will be customised to particular patients, and microarrays can help here too.
Sometimes a disease can produce very similar symptoms in different patients, although the underlying genetic cause may be completely different. For example, two forms of leukaemia, known as AML and ALL, produce very similar symptoms, but the effective treatments for each are quite different. Using a kind of microarray called gene chips, scientists have found 50 genes that are expressed differently in the two forms. As a result, new cases of leukaemia can now be diagnosed accurately and treated appropriately. Imagine that your doctor could customise your prescription to match your genetic make-up. This is the aim of scientists who have set out to catalogue which SNPs determine how different people respond to drugs. Once they have been identified, microarrays could be used to screen patients' genomes for particular SNPs. For example, people who carry certain variants of the enzyme cytochrome P450 do not convert the painkiller codeine into morphine, and so do not benefit from the drug. Improved understanding of how drugs work should reduce the numbers of patients needed for clinical trials, and so decrease the cost of developing new drugs. While microarray experiments have wide-ranging applications, they only investigate messenger RNA. This is only an intermediate step in gene expression: the end-products of most genes are proteins. The proteome is the set of all proteins expressed by a genome, and differs fundamentally from the genome. The genome is nearly static, whereas the proteome changes in response to internal and external influences. For example, cells containing identical genomes differentiate during development into different tissue types because they express characteristic proteomes. What's more, several factors, such as differences in how exons are spliced together, mean that many different proteins can be produced from a single gene. So there are about ten times as many proteins as there are genes. 
So genomic information alone can't give us the full picture of what goes on inside cells. For example, despite having identified the entire 480 protein-coding genes of Mycoplasma genitalium, we still do not understand the function of the proteins and the relationships between them. Proteomics--the large-scale investigation of which proteins are expressed in a cell--has recently become a much more influential field of study, for two reasons. First, the technique of biological mass spectrometry has been improved so much that it has become a powerful method for identifying proteins. Second, knowledge of the entire human coding sequence means that scientists can characterise just a few amino acids from a protein, and then search databases to identify that protein. Between them, proteomics and microarray experiments will produce a wealth of useful data. Molecular biology has already revolutionised biology, but new technologies are again changing our view of how life works. Powerful techniques such as microarrays and proteomics will give us the muscle to unpick the mind-boggling complexity of how genes and proteins give rise to a living creature. Genomic data from a wide range of organisms will benefit many branches of medical and biological research, holding out the prospect of new, more finely honed treatments for disease. The journey that started with the sequencing of the human genome has only just begun.

***

A BRIEF HISTORY OF THE HUMAN GENOME PROJECT

The Human Genome Project was launched in October 1990 with the publication of a research plan to sequence the human genome. The plan was jointly produced by two American agencies, the Department of Energy and the National Institutes of Health. The project's main aims are to sequence the entire human genome, and to identify the genes encoded. It also aims to develop techniques to analyse the data, and to address ethical, legal and social issues that may arise.
Broader sequencing projects are also under way, including sequencing the entire genomes of several other organisms (more than 30 have been completed to date), and a project which aims to catalogue the variation in human DNA sequences from people around the world. However, in May 1998, entrepreneurial scientist Craig Venter launched a company (later named Celera Genomics) to sequence the human genome in just three years--four years ahead of the predicted completion of the publicly funded project. Venter uses a method called shotgun sequencing. This involves smashing up the entire genome into little pieces, sequencing the pieces, and then reassembling them in the right order using computer programs that look for overlapping sequences between adjacent fragments. This is faster than the approach used by the public consortium, which sequences adjacent segments of chromosomes. Venter also announced that the company would not adhere to the public consortium's policy of allowing researchers immediate and unrestricted access to the sequence data. Fearing that Venter would patent and sell the human genome sequence, the public consortium redoubled its efforts and announced a projected completion date of spring 2001. In fact, Francis Collins (leader of the public project) and Venter made a joint announcement at a ceremony in the White House, Washington DC, on 26 June 2000 that they had each completed a working draft of the human genome. The focus of the project will now shift to identifying genes and working out what they do. ***
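The reassembly step of shotgun sequencing described above, in which programs look for overlapping sequences between fragments, can be illustrated with a toy example. The sketch below is a simple greedy merge written for this purpose; it is not Celera's software, and real assemblers must also cope with sequencing errors, repeats and both DNA strands. The function names, the minimum overlap of three bases and the fragments are all assumptions made for illustration.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest end of `left` that matches the start of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(fragments: list[str], min_len: int = 3) -> str:
    """Repeatedly join the two pieces that share the longest overlap."""
    pieces = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(pieces) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more, so the pieces stay separate
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return max(pieces, key=len)


# Hypothetical fragments of a 15-base "genome", supplied out of order.
fragments = ["CGTTAGC", "ATGGCGTA", "CGTACGTT"]
print(assemble(fragments))
```

With only three short fragments the order is easy to recover. Repeated sequences, which are common in the human genome, make the overlaps ambiguous, and a greedy merge like this one can then join pieces in the wrong place.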