Terry Brown
Genomes
Third Edition
Chapter 18:
How Genomes Evolve
Copyright © Garland Science 2007
 Mutations and recombination provide the genome with the means to
evolve, but we can learn very little about the evolutionary histories of
genomes simply by studying these events in living cells.
 Rather, we need to combine our understanding of mutation and
recombination with comparisons between the genomes of different
organisms in order to infer the patterns of genome evolution that have
occurred.
 These studies provide revealing insights into the way genomes
have evolved.
 Cosmologists believe that the universe began some 14
billion years ago with the gigantic “primordial fireball”
called the Big Bang.
 After passing through several stages, our solar system
formed some 4.6 billion years ago.
 The early Earth was covered with water, and it was in this huge
planetary ocean that the first biological systems appeared.
 Cellular life is seen by the time land masses became evident, around
3.5 billion years ago.
 The first oceans are thought to have had a salt composition similar to that of today's oceans, but the
Earth’s atmosphere, and hence the dissolved gases in the oceans, was very different.
 Oxygen was very scarce, but ammonia and methane were abundant.
 Experiments that mimic these conditions result in the formation of:
 a range of amino acids (alanine, glycine, valine, etc.),
 hydrogen cyanide and formaldehyde, which can react further to form
purines and pyrimidines,
 and sugars, in much smaller amounts.
 This means that some of the building blocks of biomolecules could have formed in the
ancient chemosphere.
 The building blocks in the ocean soup of the ancient chemosphere may have polymerized:
 in the ocean itself,
 by repeated condensation and drying of water droplets in clouds,
 on clay particles at some muddy site,
 or perhaps at some vent.
 The precise mechanism need not concern us; what is important is that conditions at that time were
suitable for the synthesis of polymeric biomolecules.
 The next step was the ordered assembly of this random collection of biomolecules into a form
that shows at least some of the attributes associated with life.
 The steps of this rare event have never been reproduced experimentally; ideas about them are based on
speculation and computer simulations.
 But bearing in mind that the global ocean could have held 10^6 biomolecules per liter, spread across
very different geological settings and persisting over an immense span of time, there was ample
opportunity for any scenario that could lead to the ordered assembly of
these biomolecules.
 Progress in understanding the origins of life was initially stalled by the apparent requirement
that polynucleotides and polypeptides must work in harness in order to produce a self-reproducing biochemical system.
 Proteins can catalyze biochemical reactions but cannot replicate themselves.
 Polynucleotides can carry information, but it was thought that they could not replicate without the help of proteins.
 This is called the polynucleotide-polypeptide dilemma.
 The major breakthrough came in the mid-1980s, when it was discovered that RNA can have
catalytic activity.
 Naturally occurring present-day ribozymes carry out three important biochemical reactions:
 Self-cleavage (the self-splicing Group I, II and III introns).
 Cleavage of other RNAs (as carried out by RNase P).
 Synthesis of peptide bonds (rRNA in the ribosome).
 In vitro experiments have shown that RNA molecules can also carry out
other biologically important reactions:
 Synthesis of ribonucleotides
 Synthesis and copying of RNA molecules
 Transfer of an RNA-bound amino acid to a second amino acid,
forming a dipeptide
 These activities would enable RNA to perform all the functions
needed by a pre-cellular system, that is, an early
biochemical system capable of reproducing itself.
 Evolution in the early RNA world began at a very slow
pace: RNA molecules initially replicated in a slow and
haphazard fashion, simply by acting as templates for the
binding of complementary nucleotides, which polymerized
spontaneously.
 This replication process was very inaccurate, so a variety of RNA sequences
would have been generated,
 eventually leading to one or more with nascent ribozyme properties that were
able to direct their own, more accurate self-replication.
 Natural selection would have favored the systems that replicated most
efficiently, and these would have come to predominate over the others (this has been shown experimentally).
 Greater accuracy in replication would have enabled RNAs to increase in length
without losing their sequence specificity, providing the potential for more
sophisticated catalytic properties,
 leading to more complex systems such as the present-day Group I introns and
ribosomal RNAs.
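The way a more efficient replicator comes to predominate can be illustrated with a toy model. This is only a sketch under stated assumptions: the function name select_replicators, the two RNA types, their replication rates and their starting shares are hypothetical values chosen for illustration, not data from any experiment.

```python
def select_replicators(fractions, rates, generations):
    """Iterate a simple deterministic replicator model.

    fractions: starting share of each RNA type in the pool (should sum to 1)
    rates: copies made per molecule per generation (hypothetical values)
    """
    current = list(fractions)
    for _ in range(generations):
        # Each type grows in proportion to its own replication rate ...
        grown = [f * r for f, r in zip(current, rates)]
        total = sum(grown)
        # ... and the pool is renormalised so the shares again sum to 1.
        current = [g / total for g in grown]
    return current


# Assumption: a rare RNA type that replicates 10% faster starts at 1% of the pool.
start = [0.99, 0.01]
rates = [1.0, 1.1]
final = select_replicators(start, rates, generations=100)
print([round(f, 3) for f in final])
```

Even a small advantage in replication rate compounds every generation, so in this model the faster type ends up making up almost the whole pool.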
 The early replicating RNA was not a true genome; it is more accurately described as a
protogenome, a molecule able to:
 Self-replicate
 Carry out some simple but important biochemical reactions, e.g. energy metabolism (release
of free energy from ATP and GTP)
 Long-chain unbranched lipids could have been formed by
RNA-catalyzed reactions or by chemical synthesis.
 Once present, these could form membrane-like structures in which some of the
protogenomes became compartmentalized.
 Compartmentalization could have enabled RNA to perform functions that were
not possible in the open ocean.
 That could have provided the basis for cellular life.

How did the RNA world develop into the DNA world?

The first transition was the development of protein-based enzymes alongside RNA, before
DNA took over the place of RNA.

There are many open questions about the transfer of catalytic power
from RNA to proteins, but it may have happened for these reasons:

The chemical diversity offered by 20 different amino acids

The diverse folding patterns, and hence diverse chemical capabilities, of
proteins

Or compartmentalization made it necessary to recruit proteins to give RNA
the somewhat hydrophobic coat needed for its membrane-related functions

The transition to protein-mediated catalysis demanded a radical shift in
the function of the RNA protogenomes: they had to become coding molecules!

So the RNA protogenome took on the role of directing the synthesis of proteins for early
biochemical functions.

Whether the ribozymes themselves became coding molecules or used their
catalytic ability to synthesize separate coding molecules is not clear, but the latter seems more probable.

This led to RNA protogenomes that carried out the coding function and
had given up their catalytic function.

RNA is not well suited to a coding function because of its inherent instability, so the
transfer of the coding function to a more stable molecule such as DNA was probably inevitable.

Reduction of ribonucleotides gives deoxyribonucleotides, which could be
polymerized into DNA copies of the RNA by reverse transcription.

Stability was increased by:

Use of thymine instead of uracil.

Adoption of a double-stranded structure, which also facilitates repair.

So the first DNA genomes comprised many molecules, each specifying a single protein and each
equivalent to a single gene.

Linking these genes into a single molecule, like a chromosome, would have made their
distribution during cell division more efficient.
 If our understanding of the origin of life and early biological systems is
correct, then it is possible that the initial stages of biochemical evolution
occurred many times in parallel in the oceans or atmosphere of the early Earth.
 So is it possible that life originated more than once at that time?
 There is, however, much evidence to suggest that present-day organisms are
derived from a single origin.
 The single origin is indicated by remarkable similarities between the basic molecular
biological and biochemical mechanisms in bacterial, archaeal, and eukaryotic cells.
 For example, there is no obvious biological or biochemical reason why particular
codons should specify particular amino acids, yet the genetic code is almost universal. Systems with
different origins would be expected to have different codes.
 At what stage did the single origin of modern biological systems
come to predominate?
 The exact answer is difficult to establish, but most likely
the system that first developed protein enzymes and then DNA
genomes predominated because of its more efficient replication
and catalytic activity, which enabled it to outcompete the early RNA-based
systems.
 It is possible that informational molecules other than
DNA or RNA existed at that time, such as peptide
nucleic acid (PNA) or the pyranosyl version of RNA; however,
there is no indication that any of these molecules was more
likely than RNA to have formed and evolved in the prebiotic soup.

Although the very old fossil record is difficult to interpret, there is reasonably
convincing evidence that by 3.5 billion years ago biochemical systems had evolved
into cells similar in appearance to modern bacteria.

It is very difficult to tell what type of genome these cells contained, but it seems
likely that they had double-stranded DNA genomes consisting of a small number of
chromosomes, possibly just one, each containing many genes.

If we follow the fossil record forward in time we find that:

The first evidence of eukaryotic cells, similar to single-celled algae, dates to 1.4 billion years ago.
Multicellular algae appeared about 0.9 billion years ago.
Multicellular animals appeared around 640 million years ago.
The Cambrian Revolution, when many novel invertebrate life forms arose, occurred 530
million years ago.
A mass extinction occurred 500 million years ago.
Rapid diversification then followed, and the first terrestrial insects, animals and plants were
established by 350 million years ago.
The dinosaurs had been and gone by the end of the Cretaceous period, 65 million years ago.
And the first hominids appeared a mere 4.5 million years ago.

Morphological evolution was accompanied by genome evolution.

It is not appropriate to equate evolution with “progress”, but it is undeniable that as we move up the
evolutionary tree we see increasingly complex genomes.

One indication of this complexity is gene number, which varies from fewer than 1000 in some bacteria to
30,000-40,000 in vertebrates such as humans.

Within individual lineages, e.g. within the bacteria, change in gene number is probably gradual, with the
acquisition of new genes balanced at least in part by the loss of existing ones.

In certain evolutionary pathways organisms have lost more genes than they have gained, e.g.
the minimal genomes of Mycoplasma and other parasitic species.

There are two important points in the evolutionary pathway where we see transitions, with the appearance of
organisms having a greatly increased gene number.

One of these transitions is the arrival of the first eukaryotes about 1.4 billion years ago, containing about 10,000 genes
compared with 5000 or fewer in prokaryotic cells.

The second transition was associated with the arrival of the first vertebrates soon after the end of the Cambrian, these
having at least 30,000 genes.
There are two fundamentally different ways in which new genes could be acquired by a genome:

By duplication of some or all of the existing genes in the genome

By acquiring genes from other species.

A central role for gene duplication in genome evolution was first proposed in 1970.

The initial result of gene duplication is the presence of two identical genes.

Selective constraints ensure that one gene retains its sequence and continues to provide a functional protein, while the additional
copy can have several fates.

If the additional dose is beneficial to the organism, the copy will also retain its sequence.

If the additional gene is not beneficial, it will accumulate mutations, and some deleterious mutations will
inactivate it, producing a pseudogene. Analysis of pseudogenes suggests that most
of the inactivating mutations are frameshift and nonsense mutations occurring in the coding region of the
gene.

Some of the mutations might instead lead to a new gene function that is beneficial to the organism.
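The two inactivating mutation types named above can be told apart with a simple check on the coding sequence. The following is a minimal sketch, assuming the standard genetic code's three stop codons and a sense-strand coding sequence that starts at the first codon; the function names and the short example sequences are hypothetical, and real pseudogene analysis would need a proper alignment.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def first_stop_index(cds):
    """Return the codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def classify_inactivation(original, mutated):
    """Crudely label how a mutated copy differs from the original coding sequence."""
    # An insertion or deletion that is not a multiple of 3 shifts the reading frame.
    if (len(mutated) - len(original)) % 3 != 0:
        return "frameshift"
    stop_orig = first_stop_index(original)
    stop_mut = first_stop_index(mutated)
    # A stop codon appearing earlier than in the original truncates the protein.
    if stop_mut is not None and (stop_orig is None or stop_mut < stop_orig):
        return "nonsense"
    return "other"


# Hypothetical toy gene: ATG AAA CAG TGG TAA
gene = "ATGAAACAGTGGTAA"
print(classify_inactivation(gene, "ATGAAATAGTGGTAA"))  # CAG changed to TAG
print(classify_inactivation(gene, "ATGAACAGTGGTAA"))   # one base deleted
print(classify_inactivation(gene, "ATGAAACAGTGCTAA"))  # TGG changed to TGC
```

The check only looks at length and stop codons, so a substitution that changes an amino acid without creating a stop is reported simply as "other".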

Genes have been duplicated in the past, and even a cursory examination of genome sequences reveals this.

If the increased amount of a gene product is beneficial, the sequences of both copies will be maintained, and
the result will be two genes with identical or near-identical sequences.

Many multigene families are examples of this type of gene duplication.
 rRNA genes, whose copy number ranges from two in Mycoplasma genitalium to 500 in Xenopus laevis.
 This increased copy number reflects the need for rapid synthesis of rRNA at certain stages of the cell cycle.

There must be some mechanism that ensures that the family members retain the same sequence over
evolutionary time.

This type of evolution is called concerted evolution: any advantageous mutation in one member of the family
spreads to the other members.

The molecular mechanism thought to be involved is gene conversion, which depends on recombination.

If the duplicated gene is not under the same
selective pressure as the original copy, it
may accumulate mutations that give it a new and
useful function.

Multigene families provide many indications that such
events have occurred frequently in the past.

The prime example is the globin gene family, in which
duplication and mutation have resulted in the formation of
new family members.

Analysis of this family shows that duplication and mutation have provided its members with new
functions, and by applying a molecular clock to the sequence divergence we can estimate when
these genes were duplicated.

These data have also helped in understanding the events that gave rise to the different groups of β-globin genes present in
different mammals.
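The molecular clock estimate mentioned above can be sketched as a short calculation. This is an illustration under stated assumptions, not the method used for the globin genes: it applies the Jukes-Cantor correction (one common way of correcting observed differences for multiple substitutions at the same site), and the observed difference of 10% and the rate of 1e-9 substitutions per site per year are hypothetical values.

```python
import math


def jukes_cantor_distance(p):
    """Correct an observed proportion of differing sites (p) for multiple hits."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def time_since_duplication(p, rate_per_site_per_year):
    """Estimate the years since two gene copies began to diverge.

    The factor of 2 reflects that both copies accumulate changes independently.
    """
    k = jukes_cantor_distance(p)
    return k / (2 * rate_per_site_per_year)


# Hypothetical inputs: 10% of sites differ; rate of 1e-9 substitutions/site/year.
years = time_since_duplication(0.10, 1e-9)
print(f"{years / 1e6:.1f} million years")
```

The estimate is only as good as the assumed rate, which is why clocks are normally calibrated against a divergence whose date is known from the fossil record.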

Another striking example of gene evolution by duplication is provided by the homeotic
selector genes, which play an important role in determining body plans.

Drosophila has a single cluster of homeotic selector genes (called HOM-C)
containing eight genes, each of which contains a homeodomain sequence that can bind
to DNA.

These genes seem to have evolved from an ancestral gene that existed about 1000 million
years ago.

The pattern of evolution in this cluster is a striking example of how gene
duplication and sequence divergence could have been the underlying
processes responsible for the increasing morphological complexity of the series of
organisms in the Drosophila evolutionary tree.

Drosophila has one Hox cluster, while amphioxus has two and vertebrates have four Hox clusters,
with genes at equivalent positions in the different clusters showing sequence similarity.

The ray-finned fishes, probably the most diverse group of vertebrates, with a vast range of
variations on the basic body plan, have seven Hox clusters.

There are several ways in which a short segment of DNA containing
a single gene or a small group of genes could be duplicated.

Unequal Crossing-Over: A recombination event initiated by
similar sequences at different positions in a pair of homologous chromosomes, resulting in
duplication of a segment of DNA

Unequal Sister Chromatid Exchange: Occurs in a similar
manner but involves the pair of chromatids of a single chromosome

DNA Amplification: A segment can be duplicated by
unequal recombination between the two daughter DNA molecules

Replication Slippage: Results in duplication of short segments such
as microsatellite sequences.

These processes result in tandem duplications, i.e. ones in which the two
duplicated segments lie adjacent to one another in the genome, as
in the globin gene families.

Sometimes duplicated genes do not lie adjacent to each other, e.g. in the human genome there are three functional genes
for the metabolic enzyme aldolase, each on a different chromosome.

One possibility is that these genes were originally in tandem and were later separated by large-scale genome rearrangements.

The other possibility is that these genes are the result of gene duplication by retrotransposition.

The processed mRNA can be copied into cDNA, which can then be reinserted into the genome.

Genes duplicated in this manner are called retrogenes. The inserted copies lack a promoter of their own, so most are inactive pseudogenes.

A retrogene that is reinserted near an existing promoter can be expressed; the distinctive feature of these genes
is that they lack introns.

In a similar process, a duplicate containing introns can arise if the antisense RNA of a gene is transcribed from the “wrong”
strand by a nearby promoter.

So far we have looked at processes that duplicate short stretches of DNA, i.e. up to a few tens of kilobases in
length.

Duplication of an entire chromosome is possible, but it seems unlikely that it
has played any major role in genome evolution.

We know that duplication of an individual human chromosome, giving a cell that
contains three copies of one chromosome and two of all the others (trisomy), is either lethal or results in
a genetic disorder such as Down syndrome.

It seems that an increased dose of some genes but not others results in an imbalance of gene products and disruption of the cellular
biochemistry.

The entire set of chromosomes can also be duplicated, and this is common in plants.

Autopolyploidy can result from aberrant meiosis

Tetraploids are stable and can reproduce, while triploids cannot reproduce

Wheat (Triticum aestivum) is hexaploid, while cotton (Gossypium hirsutum) is tetraploid

Polyploidy is less common in animals, especially those with distinct sex chromosomes;
nevertheless the red viscacha rat of Argentina has a tetraploid genome

Autopolyploidy does not lead directly to an increase in gene number, as the initial product is
an organism that simply has extra copies of every gene, rather than any new genes.

But it does provide the potential for new genes to arise by mutation of those copies
that are not essential to the organism.

Looking for these past events by simple sequence comparison is difficult, because many
of the duplicated genes may have been deleted and many will have diverged so much that they
appear to be totally new sequences.

To detect such events we need to look for entire sets of duplicated genes that have kept
the same order along the DNA molecules, provided they have not undergone too much rearrangement.

Such a search of the Saccharomyces cerevisiae genome revealed many examples, indicating that it underwent a
genome duplication just under 100 million years ago.

Sequence comparisons identified about 800 gene pairs with more than 25% sequence identity in their protein
products.

Of these, 376 could be placed in 55 duplicated sets, each set containing at least three genes in the same order.

These sets together cover half the genome! The fact that there were just two copies of each gene, not three or four,
supports the idea that the copies arose by whole-genome duplication.
 Comparison of S. cerevisiae with the other yeast species Kluyveromyces lactis and Ashbya
gossypii showed that the three species shared a common ancestor that lived over 100
million years ago, before the genome duplication.
 The duplication in S. cerevisiae is also supported by the fact that this species contains
duplicated copies of many genes that are present as singletons in the other yeast species.
 Equivalent work with other genomes has shown that whole-genome
duplication is a relatively frequent event in the evolution of many groups of organisms.
 Comparison of the Arabidopsis thaliana genome sequence with those of other plants showed that its ancestors
underwent four rounds of genome duplication between 100 and 200 million years ago.
 Human and other mammalian genomes also contain so many gene duplicates that at
least one genome duplication event is thought to have occurred in this lineage between
350 and 600 million years ago.
 Some smaller duplication events have
occurred in the human genome
in the more recent past.
 Analysis of the long arm of human
chromosome 22, a region of 35 Mb, showed
more than 200 DNA segments,
each longer than 1 kb, with
90% or more sequence similarity to other segments in the
same region or on other
chromosomes, arising over a period of 34
million years.
 There is other evidence for DNA
duplications of 1 to 400 kb in length
throughout the genome.

The fact that DNA segments of any size can be duplicated raises the
possibility that functional units within a gene, i.e. the segments coding for protein domains, can be
duplicated and recombined with those of other proteins to make novel genes.

Most domains of a protein are encoded by a contiguous sequence in the DNA.

Rearrangement of domain-encoding gene segments could result in novel protein
functions.

Domain Duplication: Occurs when the DNA segment coding for a
domain is duplicated by any of the mechanisms described so far,
resulting in a repeat of that domain in the protein.

The presence of an additional domain may confer novel properties on
the protein, or after accumulating mutations the domain may acquire a different
structure or function.

Domain duplication results in elongation of a gene, which is a
characteristic of higher organisms.

Domain Shuffling: Occurs when domains from different genes are
recombined in new ways, giving totally new combinations, or mosaic
proteins, with entirely new biochemical functions.

Duplication and shuffling of domains are easier if each domain is encoded by a continuous
stretch of DNA that is not interrupted by introns.

Interestingly, a domain in a protein is often encoded by a single exon, so
the physical separation of exons facilitates the movement of complete domains.

An excellent example is the α2 Type I collagen gene, which codes for one of the three polypeptide
chains of collagen, each made up of repeats of the tripeptide glycine-X-Y.

The chicken α2 Type I collagen gene is split into 52 exons, 42 of which cover
the part of the gene coding for the glycine-X-Y repeats.

Within this region, each exon encodes a set of complete tripeptide repeats. The
number of repeats per exon varies but is 5 (5 exons), 6 (23 exons), 11 (5 exons),
12 (8 exons) or 18 (1 exon).
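The exon tally above can be checked with a few lines of arithmetic. The variable names are illustrative only; the one assumption beyond the figures given is that each glycine-X-Y repeat is three codons, i.e. 9 nucleotides. The counts add up to the 42 exons stated, carrying 332 tripeptide repeats in total.

```python
# Repeats per exon -> number of exons with that many repeats, as listed above.
exons_by_repeat_count = {5: 5, 6: 23, 11: 5, 12: 8, 18: 1}

BASES_PER_REPEAT = 3 * 3  # three codons (Gly-X-Y), three nucleotides each

total_exons = sum(exons_by_repeat_count.values())
total_repeats = sum(n * count for n, count in exons_by_repeat_count.items())
# Coding length of an exon of each class, in base pairs.
exon_lengths = {n: n * BASES_PER_REPEAT for n in exons_by_repeat_count}

print("exons:", total_exons)
print("tripeptide repeats:", total_repeats)
print("exon lengths (bp):", exon_lengths)
```

Because every exon holds a whole number of repeats, every exon length is a multiple of 9 bp, which is what would be expected if the gene grew by duplication of repeat-encoding exons.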

This gene could have evolved by duplication of exons, leading to repetition of the structural domain.

There are similar examples among the many proteins involved in blood clotting in humans.
 New genes can also be acquired from other species by a process known as lateral gene
transfer.
 Lateral gene transfer has played a major role in the genome evolution of prokaryotes and
is very common among them.
 It can occur in several ways, e.g. conjugation, composite transposons, or direct
uptake of DNA from the environment.
 Among higher organisms, plants are well known for acquiring genes from other species by
allopolyploidy; wheat and cotton are well-characterized examples.
 In animals the species barrier is not easy to break, so there is little evidence that acquiring genes by
lateral transfer is common among them.
 The most likely route is via transposons and retroviruses, which can carry genes along
with their own genomes from one host to another.
 The transposon P element has been shown to transfer between Drosophila species (from D. willistoni to D. melanogaster).

The coding regions make up only 1.5% of the human genome, so the evolution of noncoding DNA is also
important for genome evolution.

The large amount of noncoding DNA has always puzzled scientists. It has been suggested
that it performs some unknown but important function.

Noncoding DNA might also play a role in genome
organization and control, as suggested by the influence of chromatin structure on gene
expression.

Another view is that it is still there simply because there is no selective pressure to get rid of
it.

It seems that most noncoding DNA evolves randomly, without selective constraint, except for the parts
that precede the coding regions and those involved in structural aspects of
chromosomes.

Nevertheless, among other important regions, transposable elements and introns have interesting
evolutionary histories and are of general importance in genome evolution.

Transposable elements have a number of effects on the evolution of the genome as a whole.

They can initiate recombination events by providing identical sequences at different
places.

Such unequal recombination within and between chromosomes can lead to deletion
of the intervening DNA.

These deletions are usually harmful, resulting in loss of genes from the genome, but occasionally the outcome of such recombination may
be beneficial.

Recombination between a pair of LINE-1 elements approximately 35 million
years ago is thought to have caused the β-globin gene duplication that gave rise to the
Gγ and Aγ members of this gene family.

The movement of transposable elements in the genome has important consequences.

They may affect transcription by inserting into or excising from promoter
regions.

They can alter the splicing of genes by inactivating or activating splice sites.

Although human evolutionary history is controversial, it is generally
accepted that our closest relative among the primates is the chimpanzee and that the
most recent ancestor we shared lived 4.6-5.0 million years ago.

Since that split the human lineage has embraced two genera, Australopithecus and
Homo, resulting in us, a species with what are, at
least to our eyes, important biological attributes that make us different from all other animals.

So how different are we from chimpanzees?

Only 1.73% of nucleotides differ between the human and chimpanzee genomes.

The identity in coding regions is greater than 98.5%, with 29% of genes having
identical amino acid sequences; even noncoding regions are 97% identical.

The gene order is almost the same and the chromosomes are very similar in
appearance.

The only significant difference is that chimpanzees have 24 pairs of
chromosomes while humans have 23. But the gene content is still the same.
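Figures such as the 1.73% nucleotide difference quoted above (equivalent to 98.27% identity) come from counting matching positions in aligned sequences. The following is a minimal sketch of that calculation, assuming the two sequences have already been aligned to the same length; the function name percent_identity and the short example sequences are hypothetical, not real human or chimpanzee data.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences match.

    Assumes the sequences are already aligned and of equal length;
    positions where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical aligned fragments differing at one position out of 18.
fragment_1 = "ATGGCCCTGTGGATGCGC"
fragment_2 = "ATGGCCCTGTGGATGCGT"
identity = percent_identity(fragment_1, fragment_2)
print(f"identity {identity:.2f}%, difference {100 - identity:.2f}%")
```

How gaps are treated changes the result, so published identity values depend on the alignment method as well as on the sequences themselves.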