NEW SCIENTIST
(London, England)
Feb. 17, 2001, pp. 1-4
"Copyright (c) 2001, NEW SCIENTIST. Distributed by New York Times Special
Features/Syndication Sales."
GENES, THE GENOME AND DISEASE
by Kate Bendall
We Have Cleared the First Hurdle in the Race to Understand the Human Genome, but
There Is Much Hard Work Ahead. The Potential Rewards in Medicine, However, Are
Enormous. They Include Treatments Tailor-Made to Your Own Genes and Possible
Cures for Diseases Such As Cancer and Alzheimer's
What makes us human? On Monday 26 June 2000, scientists announced that they had
finally found out. After over a decade of hard slog, they had produced a "working draft"
of the sequence of the human genome--the recipe for making a human being.
Understanding the genome will revolutionise human biology and medicine, but the road
to producing the rough draft of the sequence was far from smooth (see "A Brief History
Of The Genome Project").
The term genome is a combination of two words: gene and chromosome, and is
defined as all the genetic material, or DNA, in a cell. The main aim of the Human
Genome Project, launched in 1990, is to produce a complete sequence of the 3 billion
base pairs that make up the human genome, using DNA provided by anonymous donors
from a variety of ethnic backgrounds. This is a task of mammoth proportions--if written
out, the sequence would fill 200 volumes the size of telephone directories and take nine
years to read aloud. Scientists hope to perfect the sequence in 2003--fifty years after
James Watson and Francis Crick published their landmark paper describing the double-helical structure of DNA.
Most genes encode sequences of amino acids which make up proteins, and the main
interest in sequencing the human genome lies in identifying and characterising all human
genes. This task alone will keep biologists busy for decades. Yet, genes make up only
about 3 per cent of the human genome (see Figure 1). The remaining 97 per cent of the
data produced in the Human Genome Project describes non-coding DNA, which does not
encode proteins.
Why are scientists prepared to invest so much time and money in sequencing non-coding DNA--which some have dubbed "junk DNA"? The reason is that it isn't simply
junk. There are many different types of non-coding DNA, some of which play important
roles in the structure, function and evolution of the genome.
Introns are segments of non-coding sequence within genes that interrupt the coding
sections, or exons, of eukaryotic genes. To make a protein, cellular machinery first
produces an exact RNA copy of the DNA gene sequence. Enzymes then remove the
introns and stick the exons back together to make messenger RNA (mRNA), in a process
called splicing. The mRNA is the molecular blueprint for making the protein. Why does
the cell go to all that trouble? One answer is that introns act as inert "spacer" elements.
DNA breaks can take place within introns without disrupting a gene's coding regions, so
allowing exons to change places in the sequence. This exon shuffling can alter the
resulting protein's structure much faster than by gradually accumulating mutations in
individual bases. Introns help genes evolve rapidly, allowing species to adapt more easily
to changing environments.
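The transcription and splicing steps described above can be sketched in a few lines of Python. This is a toy model, not how cells or genome software actually locate introns: it assumes the DNA is given as the coding strand and that the intron positions are already known, and the sequence is invented for illustration. It does follow one real convention, in that the made-up intron begins with GT and ends with AG, as most introns do.

```python
# Toy model of transcription and splicing. The gene sequence and intron
# coordinates are hypothetical, chosen only for illustration.

def transcribe(dna: str) -> str:
    """Return the RNA copy of a DNA coding-strand sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns, given as half-open (start, end) positions, and join the exons."""
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[cursor:start])  # exon before this intron
        cursor = end                          # jump over the intron
    exons.append(pre_mrna[cursor:])           # final exon
    return "".join(exons)


if __name__ == "__main__":
    gene = "ATGGCC" + "GTAAGTCCTAG" + "AAATGA"  # exon 1 + intron + exon 2
    pre_mrna = transcribe(gene)
    mrna = splice(pre_mrna, [(6, 17)])          # the intron occupies positions 6-16
    print(mrna)  # AUGGCCAAAUGA
```

Passing several intron intervals works the same way; the exons are simply concatenated in their original order.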
Perhaps more puzzling is the existence of vast amounts of non-coding DNA between
genes, much of which is present in blocks of repeated sequences. Some repetitive
sequences are important in maintaining chromosome structure, such as the centromeres
and telomeres of eukaryotic chromosomes. Centromeres act like handles for cells to haul
copied chromosomes into daughter cells during cell division. Telomeres are sequences at
the tips of chromosomes that make sure the cell's DNA replication machinery copies the
ends of the chromosome properly. Telomeres normally wear down with successive
divisions and, if they disappear completely, the rest of the chromosome will erode and the
cell will die. An enzyme called telomerase rebuilds telomeres and is normally only found
in germ cells, which divide to make sperm or eggs. However, many human cancers also
contain telomerase, meaning that the cells can keep dividing indefinitely--as if they had
become immortal. One class of repetitive DNA, known as minisatellite DNA, is found
mainly in telomeres, and is exploited by scientists in genetic fingerprinting (see Inside
Science No. 52).
Other non-coding sequences regulate gene expression, in other words whether genes
are active. For example, promoters and enhancers are associated with many genes, and
determine which cells express genes, and at what level. However, no one understands the
purpose of much non-coding DNA. It may simply have accumulated over evolutionary
time like old junk in an attic. On the other hand, copying it during cell division uses up
valuable energy, so it may well play as yet undiscovered roles.
Trying to understand how the human genome defines a human being is like
assembling a car engine using only a list of numbered parts, without names or
descriptions. Working out what each part of the sequence is and does is termed annotation. At a bare minimum, it involves
identifying the beginning, end and intron-exon structure of each gene--no mean task
among so much "junk" DNA. Full annotation means not merely cataloguing all the genes
in the human genome, but understanding what each does. Scientists can try to find out
what a gene does by seeing whether its sequence is similar, or homologous, to other
human genes whose function is known. It can also be very informative to compare a
human DNA sequence with sequences from model organisms such as mice or fruit flies.
Researchers use complex computer programs written by bioinformatics specialists to
perform these tasks, but the process is complicated and the programs do not work
perfectly. Improving these programs and using them for annotation of human sequence
data will be a major focus of biological research over the next two decades.
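Homology searches rest on scoring how well two sequences line up. Programs used in practice, such as BLAST, rely on fast heuristics for local alignment; the sketch below instead shows the classic dynamic-programming method for global alignment (Needleman-Wunsch), which computes the best possible score exactly. The match, mismatch and gap values are arbitrary assumptions, not standard parameters.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch: best score for aligning all of a against all of b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diag = score[i - 1][j - 1] + pair  # align a[i-1] with b[j-1]
            up = score[i - 1][j] + gap         # a[i-1] opposite a gap
            left = score[i][j - 1] + gap       # b[j-1] opposite a gap
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATCACA"))  # 5
```

A higher score means the sequences are more alike; in a real search, a query gene would be scored against many database sequences and the best hits inspected.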
With only a working draft rather than a complete sequence available for now,
scientists are using a range of methods to estimate how many genes the human genome
contains. Estimates are surprisingly variable, ranging from about 35,000 to over 100,000.
The question is so hotly debated that scientists have a $1-a-bet sweepstake on the answer.
Those betting at the higher end of the scale argue that, generally speaking, more
complex organisms have more genes, and a high gene number is the only way to explain
human complexity. Scientists placing their bets on a much lower number say that
complexity results from how genes are regulated or expressed, not from how many there
are. They point out that the fruit fly Drosophila has around 5,000 genes fewer than the
supposedly simpler nematode worm, Caenorhabditis elegans. What's more, researchers
have used computer analyses to estimate the number of genes on human chromosome 22.
Scaling this estimate up to the whole genome gives a grand total close to
35,000. The question will remain open until 2003, when the winner of the sweepstake will
finally be announced.
WHAT DO GENES DO? BELT AND BRACES MECHANISMS
Most genes encode proteins, and these proteins interact to produce a living
organism in many complex ways. Some genes are essential--the protein product of the
insulin receptor gene, for example, plays a critical role in metabolism. Other genes, such
as some defining eye or hair colour, are non-essential. However, there are so many genes
in the human genome that sometimes protein products from several genes are capable of
carrying out the same biological role. These are known as redundant genes, and they
often have similar sequences. Redundant genes provide a protective "belt and braces"
mechanism against harmful mutations. If a mutation takes out a redundant gene, another
gene can compensate. This back-up mechanism means that redundant genes are a
powerful evolutionary force, as they are free to mutate and change their functions without
damaging the organism.
Which genes are essential for life? Is there a set of proteins common to all organisms?
Scientists have begun to address these fascinating questions by comparing the gene
content of different bacteria. For example, the bacterium Mycoplasma genitalium, with
just 480 protein-coding genes, has the smallest number of genes of any known
independently replicating cell. A team led by Craig Venter, head of Celera Genomics, has
destroyed the function of some of these genes one at a time, and found that between 265
and 350 genes are essential for the bacterium to grow in the laboratory.
This discovery opens the door to building an artificial living organism. In theory,
scientists could make an artificial bacterial chromosome containing only these essential
genes. They could put it inside a bacterium stripped of its natural genome, to see whether
those genes are enough to build a living cell. However, this experiment is not only
technically difficult to perform but also raises safety and ethical questions. Venter has
put the work on hold until he receives a response from a group of 20 leading theologians,
lawyers and philosophers, who are debating whether the project is morally and ethically
acceptable.
Each person's genome is 99.8 per cent identical to everyone else's. So a detailed
analysis of a single human sequence will have a huge impact on biological and medical
research. However, this is only part of the story--every human being is unique. As Gregor
Mendel showed using pea plants in the 1860s, genetic variation within a single species
can lead to very different characteristics.
Different versions of the same gene are called alleles. Alleles are created by various
changes in DNA sequence, such as deletions, insertions, rearrangements, or changes in
single base pairs. Some researchers are studying changes in single base pairs--single
nucleotide polymorphisms or SNPs--as part of the wider Human Genome Project. SNPs
occur at about 3 million sites in the genome, and may affect gene function, depending on
the exact base change and where it occurs. They are of interest to researchers because
they help to make each of us unique.
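Once an individual's sequence has been lined up against a reference, spotting candidate SNPs amounts to comparing the two base by base. The sketch below assumes the sequences are already aligned and of equal length, and the example sequences are invented; real variant detection must also cope with sequencing errors, insertions and deletions.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (position, reference_base, sample_base) wherever two aligned sequences differ.

    Assumes both sequences are already aligned to the same length, so only
    single-base substitutions are reported; insertions and deletions are not handled.
    Positions are counted from 0.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


if __name__ == "__main__":
    # Hypothetical reference and individual sequences.
    print(find_snps("ACGTTGCA", "ACGTCGCA"))  # [(4, 'T', 'C')]
```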
Genetic variation between individuals of a species is essential for evolution: without
it, a species would not be able to evolve to adapt to changes in the environment.
However, not all genetic variation is beneficial. Researchers have studied SNPs in a gene
called LPL, which encodes an enzyme involved in fat metabolism, and found that some
of them can indirectly predispose people to heart disease. But some genetic variation
disrupts gene function so severely that it causes disease directly. Such changes qualify as
harmful mutations.
Some genetic diseases, such as cystic fibrosis or muscular dystrophy, are caused by
mutations in single genes, and are known as monogenic diseases. Technological leaps
over the past two decades have led to the isolation and characterisation of the genes
responsible for more than a hundred monogenic disorders. The method used is positional
cloning. Researchers study affected families and work out how the disease is inherited
through the generations. They take DNA samples from family members, and sequence
sections of DNA scattered throughout the genome, known as molecular markers. By
seeing which marker sequences are found more often in family members affected by the
disease, they can work out on which chromosome the gene responsible for the disease
lies, and its rough position on that chromosome, a method called genetic mapping.
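The article describes genetic mapping informally. Geneticists usually quantify the evidence that a marker and a disease gene lie close together with a LOD score: the base-10 logarithm of how much more likely the observed inheritance is if the two are linked at recombination fraction theta than if they are unlinked (theta = 0.5). For R recombinants among N informative meioses, LOD(theta) = log10[theta^R (1 - theta)^(N - R) / 0.5^N], and by convention a LOD above 3 is taken as significant evidence of linkage. The sketch assumes the simplest textbook case, in which each meiosis can be scored directly as recombinant or not; the family counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for linkage between a marker and a disease gene.

    Compares the likelihood of the observed meioses at recombination fraction
    theta with the likelihood under no linkage (theta = 0.5). Assumes each
    meiosis can be classified directly as recombinant or non-recombinant.
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    linked = recombinants * math.log10(theta) + non_recombinants * math.log10(1 - theta)
    unlinked = meioses * math.log10(0.5)
    return linked - unlinked


if __name__ == "__main__":
    # Hypothetical family data: 1 recombinant among 10 informative meioses.
    for theta in (0.05, 0.1, 0.2, 0.3, 0.5):
        print(f"theta={theta:.2f}  LOD={lod_score(1, 10, theta):.2f}")
```

The theta giving the highest LOD is the best estimate of how tightly the marker and the disease gene are linked; here it falls at the observed recombinant fraction, 0.1.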
In the past, researchers would then have had to laboriously sequence their way along
the chromosome, starting from known sequences located near the area of interest, until
they isolated the correct gene, a process known as physical mapping. Now, however, the
Human Genome Project is revolutionising the technique, as researchers can simply search
a computer database of sequence in the relevant region of the chromosome.
REVOLUTIONARY METHODS--TYPES OF GENETIC DISEASE
Although researchers have collared the culprit genes for many monogenic disorders,
the development of effective treatments is lagging disappointingly far behind. Gene
therapy, which involves administering functional copies of a defective gene as a drug (see
Inside Science No. 66), looks promising, but has been held back by technical difficulties.
While monogenic diseases are devastating to individual families, they are rare in
populations as a whole. Diseases such as diabetes, hypertension, asthma, common
cancers, and the major psychiatric disorders are a far greater threat to public health. Since
these diseases are inherited in much more complex ways, identifying their genetic causes
is proving an even greater challenge to researchers.
Mendel discovered his laws of inheritance by studying horticultural varieties of the
garden pea. These varieties differed from each other by mutations in single alleles. The
alleles each produced very obvious phenotypes, such as tall or short plants, making it
easier for Mendel to interpret his results. He would not have been able to draw his
conclusions by working on the genetically diverse weeds in his garden, and people are
more like weeds than peas in this regard. The overwhelming majority of human
characteristics that are at least partly defined by genes (height, metabolic activity and
intelligence, for example), as well as many common diseases, don't follow Mendel's
simple laws of inheritance.
Polygenic diseases are thought to result from the combined effects of several genes.
Environmental factors also affect the likelihood of disease developing, and for this reason
the diseases are also called complex traits. Breeding experiments, routine with laboratory
mice, are out of the question with humans, and our environment cannot be closely controlled.
This makes it tricky to separate the contributions of environment and genetics to these
diseases. Progress in identifying the genes that contribute to some of these conditions has
thus been slower than expected.
Researchers use positional cloning methods similar to those used to study monogenic
disorders, but identifying relevant genes is much harder for several reasons. First, they do
not know whether the genes they are searching for are inherited in a dominant or
recessive fashion. Second, many genes make a contribution to the disease, making it
much harder to be sure whether a given gene is involved or not. Third, the genes that
contribute to the disease are often different in different families. These constraints mean
that researchers have to study very large numbers of individuals to track down disease
genes, and carry out complicated statistical analyses to work out the likelihood that a
given gene is contributing to the development of disease.
Despite their complexity, genetic contributions to some polygenic diseases have been
identified. In some cases, inheritance of a particular allele has a major effect on the
likelihood of developing a disease. However, the presence of other genes can influence
how this disease develops. For example, while the likelihood of late-onset Alzheimer's
disease is very strongly affected by which allele of the apolipoprotein E gene a person
carries, researchers are now gathering evidence that other genes also play a role. But they
have yet to identify them.
So far, developing effective therapies for these conditions has proved even more
difficult than identifying their genetic causes. Although it may become possible to predict
how likely a person is to develop a particular disease, there may be no effective treatment
available. Sadly, mastectomy is currently the most effective preventative treatment for a
young woman with a genetic predisposition to breast cancer. The hope is that more
effective drug therapy can be developed for many complex diseases.
Although researchers have made a great deal of progress in identifying disease genes,
so far they have mainly focused their efforts on identifying individual genes, and
inferring how genes interact with each other. But living creatures aren't run by key genes
acting in isolation; they are run by a complex network of interactions among vast numbers of
genes.
Two exciting new developments are now enabling scientists to get a wealth of clues to
this complicated story. The new techniques, microarray technology and proteomics,
provide "snapshots" of all the genes expressed in a cell or tissue under different
environmental conditions. They involve analysing the expression of thousands of
messenger RNA molecules and proteins, respectively.
Microarray technology, like many fundamental techniques in molecular biology, is
based on the principle of hybridisation. If single-stranded DNA or RNA molecules are
heated together at about 65 degrees Centigrade, any strands that contain complementary
sequences stick to each other (hybridise). So if researchers label known DNA or RNA
molecules with fluorescent or radioactive compounds, they can then use these molecules
to pick out strands with complementary sequences in DNA or RNA samples under investigation.
Hybridisation-based techniques can be used to find interesting DNA sequences, work out
how much of a sequence of interest is present, and even identify genes that are actively
expressed in a sample.
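The pairing rule behind hybridisation (A with T, G with C, the two strands running in opposite directions) can be expressed as a reverse-complement lookup. This sketch assumes DNA sequences only and perfect matches; in the laboratory, temperature and salt conditions decide how many mismatches a probe will tolerate. The probe and target sequences are invented.

```python
# Each base mapped to its pairing partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the strand that would pair with seq, read in the conventional 5' to 3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def probe_binds(probe: str, target: str) -> bool:
    """True if the target contains a stretch perfectly complementary to the probe."""
    return reverse_complement(probe) in target.upper()


if __name__ == "__main__":
    print(reverse_complement("ATGC"))        # GCAT
    print(probe_binds("ATGC", "GGGCATTT"))   # True
```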
ERA OF GENETIC HOPES--CUSTOMISED DRUG TREATMENT
Traditional hybridisation methods work well, but as only one gene is analysed in each
experiment, they are labour-intensive. In contrast, up to a million genes or gene
fragments can be analysed simultaneously in a single microarray experiment. This is
possible as hybridisation works well on a microscopic scale. Microarrays are just
microscope slides on which tiny DNA droplets containing strands of known sequence
have been deposited by a robot. The DNA droplets on the slide are hybridised with
messenger RNA molecules extracted from cells, so that all the genes expressed in those
cells can be identified in one experiment.
Microarrays are mostly used for this kind of transcript profiling--comparing, gene by
gene, differences in levels of expression between two tissues. They can provide clues to
the function of a gene. For example, a gene expressed only in the pancreas is unlikely to
be directly involved in the pathology of Alzheimer's disease. Comparing expression in
different states of the same tissue can also give clues. By carrying out transcript profiles
of the skeletal muscles of young and old mice, scientists have discovered more than 100
genes implicated in ageing. Other studies have identified genes that are more active in
breast tumours than in normal tissue.
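Transcript profiling boils down to comparing, gene by gene, how strongly each gene is expressed in two samples. A common summary is the log2 fold change, where +1 means twice as much expression and -1 means half as much. The sketch below is a simplification: the threshold, the pseudocount (added to avoid dividing by zero) and the gene names and values are all assumptions, and real analyses normalise the data and apply statistical tests across replicate experiments.

```python
import math


def differentially_expressed(tissue_a: dict[str, float], tissue_b: dict[str, float],
                             min_log2_fold: float = 1.0,
                             pseudocount: float = 1.0) -> dict[str, float]:
    """Return genes whose log2(b / a) expression ratio is at least min_log2_fold in size."""
    results = {}
    for gene in tissue_a.keys() & tissue_b.keys():  # genes measured in both samples
        ratio = (tissue_b[gene] + pseudocount) / (tissue_a[gene] + pseudocount)
        log2_fold = math.log2(ratio)
        if abs(log2_fold) >= min_log2_fold:
            results[gene] = log2_fold
    return results


if __name__ == "__main__":
    # Hypothetical expression levels (arbitrary units) in normal and tumour tissue.
    normal = {"GENE_A": 100.0, "GENE_B": 50.0, "GENE_C": 10.0}
    tumour = {"GENE_A": 103.0, "GENE_B": 400.0, "GENE_C": 1.0}
    for gene, change in sorted(differentially_expressed(normal, tumour).items()):
        print(f"{gene}: log2 fold change {change:+.2f}")
```

In this made-up example, GENE_B is flagged as more active in the tumour and GENE_C as less active, while GENE_A, almost unchanged, is left out.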
Microarrays should also speed up the development of effective drugs, making it easier
to choose suitable genes to target. Drugs that act on genes expressed in only a few
tissues may have fewer side effects than those that act on genes expressed throughout the
body. Studying changes in gene expression when a new drug is administered can help to
explain how that drug works, and in clinical trials it can provide an early indication of
whether the drug is effective, or has toxic side effects.
Increasingly, drug treatments will be customised to particular patients, and
microarrays can help here too. Sometimes a disease can produce very similar symptoms
in different patients, although the underlying genetic cause may be completely different.
For example, two forms of leukaemia, acute myeloid leukaemia (AML) and acute lymphoblastic leukaemia (ALL), produce very similar
symptoms, but the effective treatments for each are quite different. Using a kind of
microarray called a gene chip, scientists have found 50 genes that are expressed
differently in the two forms. As a result, new cases of leukaemia can now be diagnosed
accurately and treated appropriately.
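Once a panel of marker genes is known, a new sample can be assigned to whichever disease class its expression pattern most resembles. The sketch below uses a simple nearest-centroid rule: average the profiles of known cases in each class, then pick the class whose average is closest. This is an illustrative stand-in, not the method used in the study the article describes, and the three marker genes and their values are hypothetical.

```python
import math


def centroid(profiles: list[list[float]]) -> list[float]:
    """Average expression of each marker gene across a group of samples."""
    return [sum(values) / len(values) for values in zip(*profiles)]


def classify(sample: list[float], centroids: dict[str, list[float]]) -> str:
    """Assign the sample to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))


if __name__ == "__main__":
    # Hypothetical expression levels for three marker genes in known cases.
    training = {
        "AML": [[8.0, 1.0, 5.0], [7.5, 1.5, 4.5]],
        "ALL": [[2.0, 6.0, 5.0], [1.5, 7.0, 4.0]],
    }
    centroids = {label: centroid(rows) for label, rows in training.items()}
    print(classify([7.0, 2.0, 5.0], centroids))  # AML
```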
Imagine that your doctor could customise your prescription to match your genetic
make-up. This is the aim of scientists who have set out to catalogue which SNPs
determine how different people respond to drugs. Once they have been identified,
microarrays could be used to screen patients' genomes for particular SNPs. For example,
people who carry certain variants of the gene for the enzyme cytochrome P450 2D6 do not convert the
painkiller codeine into morphine, and so do not benefit from the drug. Improved
understanding of how drugs work should reduce the numbers of patients needed for
clinical trials, and so decrease the cost of developing new drugs.
While microarray experiments have wide-ranging applications, they only investigate
messenger RNA. This is only an intermediate step in gene expression: the end-products
of most genes are proteins.
The proteome is the set of all proteins expressed by a genome, and differs
fundamentally from the genome. The genome is nearly static, whereas the proteome
changes in response to internal and external influences. For example, cells containing
identical genomes differentiate during development into different tissue types because
they express characteristic proteomes. What's more, several factors, such as differences in
how exons are spliced together, mean that many different proteins can be produced from
a single gene. So there are about ten times as many proteins as there are genes.
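How one gene can yield many proteins can be illustrated with a single splicing option, exon skipping. If the first and last exons are always kept and each internal exon may be included or left out, a gene with n exons can in principle produce 2^(n-2) different transcripts. The sketch models exon skipping only, uses hypothetical exon labels, and says nothing about which combinations cells actually make; the article's figure of about ten proteins per gene is its own estimate.

```python
from itertools import combinations


def splice_isoforms(exons: list[str]) -> list[str]:
    """Enumerate transcripts that keep the first and last exons plus any subset of the rest.

    Requires at least two exons. Models exon skipping only; real genes also use
    other mechanisms, and many theoretical combinations are never produced.
    """
    first, *middle, last = exons
    isoforms = []
    for k in range(len(middle) + 1):
        for kept in combinations(middle, k):  # combinations keep the original exon order
            isoforms.append(first + "".join(kept) + last)
    return isoforms


if __name__ == "__main__":
    for isoform in splice_isoforms(["E1-", "E2-", "E3-", "E4"]):
        print(isoform)
```

With four exons this prints four isoforms (E1-E4, E1-E2-E4, E1-E3-E4 and E1-E2-E3-E4); each extra optional exon doubles the count.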
Genomic information alone therefore can't give us the full picture of what goes on inside
cells. For example, despite having identified all 480 protein-coding genes of
Mycoplasma genitalium, we still do not understand the function of the proteins and the
relationships between them.
Proteomics--the large-scale investigation of which proteins are expressed in a cell--has
recently become a much more influential field of study, for two reasons. First, the
technique of biological mass spectrometry has been improved so much that it has become
a powerful method for identifying proteins. Second, knowledge of the entire human
coding sequence means that scientists can characterise just a few amino acids from a
protein, and then search databases to identify that protein. Between them, proteomics and
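The database step can be sketched as a search for a short run of amino acids, written in the standard one-letter code, inside every protein sequence predicted from the genome. This assumes a short sequence tag has already been read exactly; real mass spectrometry searches usually match measured peptide masses against masses predicted from the database and must allow for ambiguities (leucine and isoleucine, for instance, have identical masses). The protein names and sequences are invented.

```python
def find_proteins(peptide: str, database: dict[str, str]) -> list[str]:
    """Return names of proteins whose amino-acid sequence contains the peptide."""
    peptide = peptide.upper()
    return [name for name, sequence in database.items() if peptide in sequence.upper()]


if __name__ == "__main__":
    # Hypothetical protein sequences in one-letter amino-acid code.
    database = {
        "protein_1": "MAGKLSTEQRW",
        "protein_2": "MPEVNHTQISLA",
        "protein_3": "MDRYFTGKLEAS",
    }
    print(find_proteins("TQIS", database))  # ['protein_2']
```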
microarray experiments will produce a wealth of useful data.
Molecular biology has already revolutionised biology, but new technologies are again
changing our view of how life works. Powerful techniques such as microarrays and
proteomics will give us the muscle to unpick the mind-boggling complexity of how genes
and proteins give rise to a living creature. Genomic data from a wide range of organisms
will benefit many branches of medical and biological research, holding out the prospect
of new, more finely honed treatments for disease. The journey that started with the
sequencing of the human genome has only just begun.
***
A BRIEF HISTORY OF THE HUMAN GENOME PROJECT
The Human Genome Project was launched in October 1990 with the publication of a
research plan to sequence the human genome. The plan was jointly produced by two
American agencies, the Department of Energy and the National Institutes of Health.
The project's main aims are to sequence the entire human genome, and to identify the
genes encoded. It also aims to develop techniques to analyse the data, and to address
ethical, legal and social issues that may arise. Broader sequencing projects are also under
way, including sequencing the entire genomes of several other organisms (more than 30
have been completed to date), and a project which aims to catalogue the variation in
human DNA sequences from people around the world.
However, in May 1998, entrepreneurial scientist Craig Venter launched a company
(later named Celera Genomics) to sequence the human genome in just three years--four
years ahead of the predicted completion of the publicly funded project. Venter uses a
method called shotgun sequencing. This involves smashing up the entire genome into
little pieces, sequencing the pieces, and then reassembling them in the right order using
computer programs that look for overlapping sequences between adjacent fragments.
This is faster than the approach used by the public consortium, which first maps large cloned
segments of chromosomes and then sequences each one in turn. Venter also announced that the company would not adhere to
the public consortium's policy of allowing researchers immediate and unrestricted access
to the sequence data.
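The reassembly step of shotgun sequencing can be illustrated with a greedy strategy: repeatedly join the two fragments that overlap the most. This is a textbook simplification, far from the assembler Celera actually used; it assumes error-free reads and a genome without long repeats, and the reads are invented.

```python
def overlap(a: str, b: str, min_length: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_length)."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_length: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap.

    Assumes error-free reads and no long repeats; returns the remaining contigs.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Discard reads wholly contained in another read.
    contigs = [c for c in contigs if not any(c != other and c in other for other in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap(a, b, min_length)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            break  # no overlaps left: the fragments cannot be joined further
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    print(greedy_assemble(["TTGCAT", "CATGTC", "ACGTTG"]))  # ['ACGTTGCATGTC']
```

Greedy merging is fast but can join fragments wrongly when the same stretch of sequence occurs in several places, which is one reason repetitive DNA makes real assembly hard.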
Fearing that Venter would patent and sell the human genome sequence, the public
consortium redoubled its efforts and announced a projected completion date of spring
2001. In fact, Francis Collins (leader of the public project) and Venter made a joint
announcement at a ceremony in the White House, Washington DC, on 26 June 2000 that
they had each completed a working draft of the human genome.
The focus of the project will now shift to identifying genes and working out what they
do.
***