A journey into the genome: what's there
Henry Gee
The human genome is 95% junk. Only about 5% of it consists of genes -- the instructions to
make proteins. Despite the complexity of human structure and behaviour, the number of
genes in the human genome is comparable to that in much smaller genomes.
The Human Genome Sequencing Consortium estimates that our genomes contain 31,780
protein-coding genes. So far it has spotted 22,000. This is fewer than the 25,498 genes in
the genome of the tiny plant thale cress (Arabidopsis thaliana) and not much more than the
fruitfly's 13,601 or the roundworm's 19,099 genes.
Image: Arabidopsis thaliana. (C) SPL
Clearly, there is little correlation between the complexity of an organism and the amount of
DNA it has. The human genome contains at least 200 times more DNA than the yeast
genome's 12 million bases (the letters of the genetic code), but the genome of Amoeba
dubia, a unicellular creature as simple as yeast, is 200 times larger than the human genome.
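The ratios quoted above imply rough genome sizes, which a few lines of Python make explicit. This is only a sketch of the arithmetic: the constants come from the text, the names are illustrative, and the human and Amoeba dubia results are bounds implied by the article's ratios, not measured genome sizes.

```python
# Figures taken from the text; the derived values are rough bounds only.
YEAST_BASES = 12_000_000   # yeast genome size, as stated
HUMAN_VS_YEAST = 200       # human genome: "at least 200 times" the yeast genome
AMOEBA_VS_HUMAN = 200      # Amoeba dubia genome: about 200-fold the human genome


def implied_sizes(yeast_bases: int = YEAST_BASES) -> dict[str, int]:
    """Return the genome sizes (in bases) implied by the article's ratios."""
    human = yeast_bases * HUMAN_VS_YEAST
    amoeba = human * AMOEBA_VS_HUMAN
    return {"yeast": yeast_bases, "human": human, "Amoeba dubia": amoeba}


if __name__ == "__main__":
    for organism, bases in implied_sizes().items():
        print(f"{organism}: {bases:,} bases")
```

On these figures the human genome comes to at least 2.4 billion bases, and that of Amoeba dubia to something like 480 billion.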
How many is not many?
Even with a sequence touted as almost complete, the consortium can still only roughly
estimate the number of genes contained in the human genome. There are several reasons
for this. One is that human genes are so few and far between. There are, on average, around
12 genes per million bases of human DNA, compared with 117 in fruit flies, 197 in
roundworms and 221 in Arabidopsis. Spotting genuine genes amid the morass of
meaningless DNA has proven a sore trial to current computer software.
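To get a feel for how sparse human genes are, the densities above can be turned into an average number of bases per gene. The Python sketch below does only that division, using the figures quoted in the text. The names are illustrative, and the result is an average, not the actual distance between any two genes.

```python
# Gene densities (genes per million bases) as quoted in the text.
GENES_PER_MILLION_BASES = {
    "human": 12,
    "fruit fly": 117,
    "roundworm": 197,
    "Arabidopsis": 221,
}


def average_spacing(genes_per_million: float) -> float:
    """Average number of bases per gene at the given density."""
    return 1_000_000 / genes_per_million


if __name__ == "__main__":
    for organism, density in GENES_PER_MILLION_BASES.items():
        spacing = average_spacing(density)
        print(f"{organism}: one gene per ~{spacing:,.0f} bases")
```

By this measure, gene-finding software meets roughly one gene per 83,000 bases of human DNA, against one per 4,500 or so in Arabidopsis.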
Another reason human genes are hard to detect is that, compared with other creatures'
genes, they are highly fragmented. In organisms more complicated than bacteria, genes
tend to be divided into sections of coding sequence, 'exons', interrupted by non-coding
spacers called 'introns' -- just as TV programmes are interrupted by commercial breaks.
Generally, human genes have many small exons and longer-than-average introns -- some
are more than 10,000 bases long.
The largest human gene is 2.4 million bases long. It encodes the muscle protein 'dystrophin'
(and malfunctions in muscular dystrophy). But most of it is non-coding DNA. The record-holder for coding sequence is the gene for 'titin', another muscle protein. The gene is 80,780
bases long, divided into 178 exons, the largest of which contains 17,106 bases.
Introns in the fruitfly and roundworm have a 'preferred' length, tens or at most hundreds of
bases long. Human introns are much more variable. Most are around 87 bases long, but a
substantial population are very long, dragging the average length up to more than 3,300
bases. Human exons, in contrast, can be very small indeed, and therefore easy to miss -- more than 40 are known in the human genome that are each just 19 bases long.
More than 91% of the draft sequence reported by the consortium is 99.99% accurate -- that
is, accurate to one base in 10,000. There are still many gaps, but not so many as to cause
major confusion in ordering the bases. The gaps could hide missing genes, but if so, "they
are running out of places to hide", Peer Bork and Richard Copley of the European Molecular
Biology Laboratory in Heidelberg comment in the same issue of Nature.
Apparently, it is not how many genes you have, but how you use them. The fragmentation of
human genes allows many different proteins to be built from the same genes, by combining
the instructions in different exons in different ways. At least 35% of all human genes, it
appears, may be read in several ways. In this way the human genome could encode five
times as many proteins as the less flexible genomes of the fruitfly or roundworm.
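How combining exons "in different ways" multiplies the possible products can be illustrated with a toy model in Python. The assumptions here are not from the article: the gene and its exon names are hypothetical, the first and last exons are always kept, any internal exon may be skipped, and exon order is preserved. Real splicing is tightly regulated and produces far fewer combinations than this upper bound suggests.

```python
from itertools import combinations


def possible_transcripts(exons: list[str]) -> list[tuple[str, ...]]:
    """Enumerate exon combinations under a toy exon-skipping model.

    Assumptions (not from the article): the first and last exons are
    always kept, any internal exon may be skipped, and exon order is
    preserved.
    """
    if len(exons) < 2:
        return [tuple(exons)]
    first, internal, last = exons[0], exons[1:-1], exons[-1]
    transcripts = []
    # Try every subset of internal exons, from none kept to all kept.
    for keep in range(len(internal) + 1):
        for chosen in combinations(internal, keep):
            transcripts.append((first, *chosen, last))
    return transcripts


if __name__ == "__main__":
    gene = ["E1", "E2", "E3", "E4"]  # hypothetical four-exon gene
    for transcript in possible_transcripts(gene):
        print("-".join(transcript))
```

Under these assumptions a gene with n exons yields up to 2^(n-2) distinct transcripts, so even a modest number of exons leaves room for several proteins per gene.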
So much for the genes -- what's all the other stuff?
More than half of the human genome -- including 47 known genes -- consists of
'transposable elements'. These parasitic stretches of DNA copy themselves and spread
throughout the genome, determining much of its architecture. Almost all of these rogue
elements have been inactive for millions of years.
The human genome is richer in transposable elements and other repetitive DNA sequences
than any other genome known, although the density of repeats varies widely. A 525,000-base region of the X chromosome consisting of 89% of repeated sequences is the most
cluttered. At the other extreme are the 'HOX clusters' which regulate development. These
contain less than 2% of repeated elements.
Many transposable and repeated sequences started life as the genomes of independent
entities that became integrated into the genome. Many viruses, including the human
immunodeficiency virus HIV-1, have genomes made of RNA, a close chemical relative of
DNA. These genomes encode an enzyme, reverse transcriptase, that makes DNA copies of
the RNA genome and integrates them into the genome of a host.
Large stretches of the human genome show signs of having once, perhaps millions of years
ago, been viruses. David Baltimore of the California Institute of Technology in Pasadena, one
of the discoverers of reverse transcriptase, says that "in places, the genome looks like a sea
of reverse-transcribed DNA with a small admixture of genes". The genome is a museum of
the viral infections suffered by humanity and its ancestors. Viruses made us what we are.
Hundreds of other genes -- encoding at least 223 proteins -- seem to have come from
bacteria. Around 40 bacterial genomes are now completely sequenced, from which it is
evident that these organisms exchange genes with bohemian abandon. But it is surprising to
find evidence for the direct transfer of bacterial genes into humans.
Some proteins of bacterial origin seem to be involved in the metabolism of antibiotics and
neurologically active agents. One such protein is the enzyme monoamine oxidase, important
in the metabolism of neuroactive substances (such as alcohol) and a target of important
psychiatric drugs.
The capacity for bacteria and viruses to exchange genes is the basis for the genetic
modification of organisms. It is perhaps ironic that all humans, including those in the anti-GM
lobby, are GM organisms.
So we know that the human genome is a large and disordered jumble of ancient viruses
punctuated by a modest collection of genes, some from bacteria, and that it has come to do
much more than it could possibly have been designed to do. But it is too much to expect that
the study of the human genome should further our understanding of what it is that gives
humans their complexity of structure, behaviour, conscious action, learning, memory -- humanity.
Nonetheless, as Baltimore notes, the questions that the draft genome now opens to
investigation include some of the simplest and deepest, such as: "Daddy, where did I come
from?"