Gene and Genome Evolution
Model Organisms
• Most interesting experiments can’t morally be performed on humans, so
we use model organisms as a stand-in.
• How similar they are to humans depends on the situation, and occasionally
causes problems.
• Also, we would like to understand how all living things work, but finite
resources lead us to concentrate on just a few organisms that are easy to
work with.
• The main model organisms, widely used for many purposes: mice (Mus
musculus), Drosophila melanogaster, Caenorhabditis elegans (C. elegans:
nematode), Saccharomyces cerevisiae (yeast), Escherichia coli (E. coli:
bacteria), Arabidopsis thaliana (plants).
• All of these have completely sequenced genomes from several different
strains, as well as large collections of mutants and a way to transform them
(i.e. insert DNA into their genomes), plus lots of knowledge about how to work
with them.
• Other model organisms: rhesus monkeys, rats, Xenopus (African clawed
frog), zebrafish, fugu, Schizosaccharomyces pombe, lots of others.
Escherichia coli
• E. coli is a Gram-negative rod-shaped bacterium that lives in
the human gut. It has been an important lab organism since
the beginning of molecular biology (1940 or so).
• Originally it was used as a way to grow the bacteriophage that
early molecular biologists (notably Salvador Luria and Max
Delbrück) wanted to study to determine what genes were and
how they worked.
• It then became the organism of choice for studying gene
expression, recombination, and many other fundamental
genetic properties.
• E. coli grows quickly (20-minute doubling time) under easy lab
conditions: aerobic, 37 °C, with cheap, easy-to-make
growth medium. It can be grown in liquid culture (mass
quantities) or on Petri plates (colonies from single isolated cells).
• The main E. coli strain used, K12, is non-pathogenic and has
lost the ability to grow in the human gut.
• Much biotechnology uses E. coli to grow cloned DNA
segments, or uses enzymes derived from E. coli.
Saccharomyces cerevisiae
• Saccharomyces cerevisiae is “yeast”. It is used to make
alcohol from sugars: almost all beer, wine, and distilled
spirits use S. cerevisiae in their production.
• It is also the yeast used to make bread rise, by
producing bubbles of carbon dioxide that get trapped
by the gluten proteins in the bread dough.
• S. cerevisiae is a eukaryote, a member of the fungus
kingdom. As such it is more closely related to humans
than plants are.
• S. cerevisiae is single celled, and many of the
microbiological techniques used to study E. coli and
other bacteria can be used with it. Many processes
basic to eukaryotes have been studied in yeast, for example control
of the cell cycle and protein-protein interactions.
• It can be grown as a haploid or as a diploid, which
allows easy detection of mutants (as haploids) as well
as the ability to maintain lethal mutations as diploids.
• There are deletion mutants covering most of the
genome, and you can order knockout mutation
strains for every gene.
• It can be propagated vegetatively, but it also
undergoes sexual reproduction readily.
Caenorhabditis elegans
• C. elegans is a small, free-living nematode (nematodes are also called
roundworms). In the lab they are usually just called “worms”.
• They are animals with nervous systems and all other
typical animal tissues. They have 3 germ layers (endoderm,
mesoderm, ectoderm) like humans.
• Started as a model organism in the 1960’s by Sydney
Brenner, who had previously worked with bacteriophage.
• Has about 1000 cells, and every cell’s origin and fate is
predetermined. Very unlike higher animals, where cells
depend on external cues (like morphogen gradients) to
determine how they should develop.
• They live on E. coli growing on Petri plates, and can be
stored indefinitely by freezing them.
• RNA interference was discovered in C. elegans, which has also been
used for many studies of simple nervous systems and meiosis.
Drosophila melanogaster
• Drosophila (“flies”) have been used in genetics research
since the early 1900’s. Thomas Hunt Morgan started using
them in 1910 at Columbia University, and they have
remained popular ever since.
• As most students know, flies have a rapid life cycle, are easy
to grow, and have many interesting morphological mutants.
• Much of genetics knowledge came from fly research. More
recently, fundamental knowledge of development came
from studying various Drosophila mutations.
• The salivary glands of the larvae have giant polytene
chromosomes, which allowed specific genes to be located
and gene activity to be detected: the polytene
chromosomes puff out when active transcription is
occurring.
Mus musculus
• The house mouse, a long term associate of humans, is considered
vermin whose life is not worth protecting. This has made mice the lab
animals of choice for a very long time.
• Use of mice in the lab is not regulated by the Animal Welfare Act.
However, the National Institutes of Health have standards for mouse care
that we at NIU follow.
• The use of mice in genetics started around 1909, when Clarence Cook
Little produced the first inbred strains. Little later founded the Jackson
Laboratory in Bar Harbor, Maine, which is the primary stock center for
mouse genetics today.
• Mice provide the main mammalian model for humans in genetics and
medicine.
• Unlike humans, mice can be made homozygous at almost all loci by
inbreeding (brother-sister matings for many generations).
• Mouse genes can be manipulated in vitro and re-inserted in the
genome of embryos to produce transgenic mice. Similar techniques
allow any specific gene to be inactivated: knockout mice.
• The immune system is almost completely inactivated in the nude
mouse. These mice can accept tissue transplants from humans,
producing mice with a human immune response.
Arabidopsis thaliana
• Arabidopsis thaliana is the primary model
plant (angiosperm = flowering plant).
Sometimes called thale cress, but mostly
just known as Arabidopsis.
• Came into wide use as a model organism in the 1980’s.
• It has a very small genome: about 135 Mbp
(million base pairs), as compared to
humans (3000 Mbp) or even rice (430 Mbp)
• Arabidopsis is small and has a short
generation time (6 weeks), which makes for
easy genetics.
• Huge collection of mutant strains, easy to
transform, large research community.
• Lots of work on basic plant development
that has been easily transferred to crop
species.
Genome Changes in Evolutionary Time
• A basic principle: all current life on Earth arose from a
single common ancestor, the Last Universal Common
Ancestor (LUCA)
• There were certainly other living things before the time of
LUCA, and after it as well
• Perhaps 3.5 billion years ago
• Must have had the same DNA → RNA → protein pathway that we have,
plus several other features common to all living organisms
today.
• Thus we want to explore the forces of mutation and
selection that have converted LUCA into the diversity
we see today.
• Some mechanisms we will discuss:
• Whole genome duplications
• Chromosomal rearrangements: translocations,
inversions, transposon movements
• Gene family expansions
• Horizontal gene transfer
• Natural selection within genes and in regulatory
regions
Evolution by Natural Selection
• A fundamental principle: lots of mutations occur, but only a small number end up
fixed (i.e. present in all individuals) within a species. Natural selection removes
deleterious mutations.
• Some mutations are selectively neutral, neither selected for nor against. Their survival
depends on random chance events.
• A simple way of looking at the effects of selection is to compare homologous
genes (genes in different species that have the same function and are derived
from a common ancestor)
• Two types of selection that can be detected when comparing homologous genes:
• most selection is negative or purifying selection. Most genes perform the
same function in closely related species, and mutations that disrupt that
function are eliminated.
• A few genes undergo positive selection. The homologous genes are evolving
different functions, and so require different amino acid sequences.
Base Substitutions
• The simplest type of mutation is the base substitution, also called a point mutation or
a single nucleotide polymorphism (SNP). One nucleotide has been substituted for
another.
• Caused by tautomeric shifts, incorrect DNA repair, random events.
• Two basic types:
• transition: converting one purine to the other purine, or one pyrimidine into the other
pyrimidine.
• transversion: converting a purine to a pyrimidine or the reverse.
• Logically, transversions should be twice as frequent since there are twice as many
possible transversions as transitions.
• However, in practice, transitions are about twice as common as transversions. Due to a
combination of natural selection and ease of occurrence.
• Neutral substitution rate: how often do nucleotides change in the absence of selection
pressure? In a comparison of the human and mouse genomes, 165 Mbp of DNA
associated with non-functional transposon sequences were identified in both species.
These had about 67% identical bases, which implied a rate of 0.46 substitutions per
position over the 75 million years since the human and mouse lineages diverged. This
works out to about 2 x 10^-9 substitutions per site per year, in the presumed absence of
selection pressure. This estimate agrees with other estimates based on different
methods.
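The arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example, and the Jukes-Cantor correction used here is a simple textbook model that stands in for whatever model the human-mouse comparison actually used. The first part enumerates all 12 possible base changes to show why transversions "should" outnumber transitions 2 to 1. The second part converts 67% identity into an estimated number of substitutions per site, then divides by the total time on both lineages.

```python
import math
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(old, new):
    """Classify a single base change as a transition or a transversion."""
    if old == new:
        raise ValueError("not a substitution: bases are identical")
    pair = {old, new}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_substitution_types():
    """Count how many of the 12 possible base changes fall in each class."""
    counts = {"transition": 0, "transversion": 0}
    for old, new in permutations("ACGT", 2):
        counts[substitution_type(old, new)] += 1
    return counts


def jukes_cantor_distance(identity):
    """Estimate substitutions per site from the fraction of identical bases.

    Assumes the Jukes-Cantor model (all substitutions equally likely),
    which corrects for multiple changes hitting the same site.
    """
    p = 1.0 - identity  # observed fraction of differing sites
    if p >= 0.75:
        raise ValueError("sequences too divergent for this correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def rate_per_site_per_year(distance, divergence_years):
    """Average yearly rate, assuming both lineages evolve equally fast.

    The distance accumulated along two lineages, hence the factor of 2.
    """
    return distance / (2.0 * divergence_years)


if __name__ == "__main__":
    print(count_substitution_types())
    d = jukes_cantor_distance(0.67)
    print(f"substitutions per site: {d:.2f}")
    print(f"rate per site per year: {rate_per_site_per_year(d, 75e6):.1e}")
```

The corrected distance comes out a little below the 0.46 quoted on the slide, and the averaged rate is of the same order as the slide's 2 x 10^-9. The equal-rate assumption is a simplification: the two lineages need not evolve at the same speed.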
Substitutions Within Genes
• We mostly care about the functional parts of the
genome, the genes and their control regions. Since
most of the genes are presumably necessary for life,
some mutations will be deleterious and others not.
• In the human-mouse genome comparison, variation in
the rate of substitutions across the various portions of
genes was clear: fewest in the exons, most in the
introns, and an intermediate amount in the UTRs and
flanking regions.
• For coding regions, the degeneracy of the genetic code
has a large effect.
• some sites are non-degenerate: any change results in a
different amino acid. Mostly in the first or second bases
of codons.
• other sites are two-fold degenerate: transitions give the
same amino acid while transversions give a different
amino acid.
• other sites are four-fold degenerate: any mutation gives
the same amino acid. These sites are all third positions
of codons.
• Mutations that give the same amino acid are called
silent or synonymous mutations. They are presumed
to be selectively neutral.
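The degeneracy classes above can be checked directly against the standard genetic code. The sketch below is illustrative only; the function names are made up for this example. It builds the standard codon table and, for a given codon position, counts how many of the four bases at that position still encode the same amino acid: 1 means non-degenerate, 2 means two-fold, 4 means four-fold.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def degeneracy(codon, position):
    """Count the bases (including the original) at a 0-based codon
    position that leave the encoded amino acid unchanged."""
    original = CODON_TABLE[codon]
    count = 0
    for base in BASES:
        variant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[variant] == original:
            count += 1
    return count


def is_synonymous(codon_a, codon_b):
    """True if two codons encode the same amino acid (a silent change)."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]


if __name__ == "__main__":
    # Glycine GGT: the third position is four-fold degenerate.
    print("GGT position 3:", degeneracy("GGT", 2))
    # Aspartate GAT: the third position is two-fold degenerate
    # (GAT/GAC = Asp, GAA/GAG = Glu).
    print("GAT position 3:", degeneracy("GAT", 2))
    # Methionine ATG: the third position is non-degenerate.
    print("ATG position 3:", degeneracy("ATG", 2))
```

Running the same function over every codon would also turn up the few three-fold sites (isoleucine: ATT, ATC, ATA), which the three classes on the slide leave out.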
More on Substitution
• In addition to synonymous mutations, some
amino acid changes are conservative in that
they have little or no effect on the protein’s
function.
• for example, isoleucine and valine are both
hydrophobic and readily substitute for each
other.
• other amino acid substitutions are very
unlikely: leucine (hydrophobic) for aspartic
acid (hydrophilic and charged). This would be
a non-conservative substitution.
• Some amino acids play unique roles:
cysteines form disulfide bridges, prolines
induce kinks in the chain, etc.
• However, some amino acids are critical for
active sites and cannot be substituted.
• Tables of substitution frequencies for all pairs
of amino acids have been generated. These
are based on counts of homologous
sequences that have been aligned.
• Just counts of changes along the whole length of
the proteins, not accounting for active sites, etc.
BLOSUM62 Table. Numbers on the diagonal
indicate the likelihood of the amino acid
staying the same. The off-diagonal numbers
are relative substitution frequencies.
Numbers greater than zero indicate that the
change is seen more often than predicted by
random chance; negative numbers imply that
the substitution is less frequent than
predicted by chance.
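The sign convention in the caption follows from how such scores are built: each entry is a scaled, rounded logarithm of (observed frequency of the pair in alignments) / (frequency expected by chance). A minimal sketch follows, with a made-up function name and made-up frequencies; BLOSUM62 itself uses half-bit units, which corresponds to the scale factor of 2 with a base-2 logarithm assumed here.

```python
import math


def log_odds_score(observed_freq, expected_freq, scale=2.0):
    """Rounded log-odds score for one amino acid pair.

    observed_freq: how often the pair is seen aligned (hypothetical input).
    expected_freq: how often it would be seen by chance (hypothetical input).
    scale=2.0 with log base 2 gives scores in half-bit units.
    """
    if observed_freq <= 0 or expected_freq <= 0:
        raise ValueError("frequencies must be positive")
    return round(scale * math.log2(observed_freq / expected_freq))


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any real alignment data.
    print(log_odds_score(0.04, 0.01))    # seen 4x more often than chance
    print(log_odds_score(0.01, 0.01))    # seen exactly as often as chance
    print(log_odds_score(0.0025, 0.01))  # seen 4x less often than chance
```

A pair seen more often than chance gets a positive score, a pair seen exactly as often as chance gets zero, and a rarer pair gets a negative score.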
Short Indels
• Indel – insertion/deletion, a position in a protein or DNA
sequence where one species has nucleotides or amino
acids, and the other species doesn’t.
• Since we can’t usually tell whether one species had an insertion
or the other species had a deletion, we just call it an indel.
• Short indels: 1-10 bp or so, are the second most frequent
type of mutation seen (after base substitutions).
• In the human genome, the current estimate is that short indels
occur at about 1/20 the frequency of base substitutions.
• The cause of short indels is slippage of DNA polymerase
during the replication process. The sliding clamp
mechanism keeps the polymerase bound to the DNA most
of the time, but random events (like Brownian motion) can
cause it to temporarily fall off. This can generate small
indels.
• This is especially common in Simple Sequence Repeats (SSRs) in
which a short (2-5 bp) sequence is repeated many times in a
row.
Simple Sequence Repeats
• Simple sequence repeats (SSRs) are found all over the genome.
The first high quality human genetic maps were made using SSRs as
loci.
• Realize that using visible mutant phenotypes or genetic diseases
won’t work: no one has very many of them, and controlled genetic
crosses aren’t possible.
• SSRs work well because the number of repeats at a given SSR locus
is usually stable enough to be almost always inherited from parent
to child, and because they are scattered throughout the genome.
Since everyone has all of the markers, any mating will give
informative results.
• Trinucleotide repeats (TNRs) are a type of SSR that have an array
of 3 bp repeats.
• Because a codon is 3 bp long, TNRs within a coding region don’t
change the reading frame.
• However, some TNRs cause diseases even though they are in the
UTRs.
• Below a certain copy number, the repeats are relatively stable. But,
above that, the copy number can change drastically in both mitosis
and meiosis due to DNA polymerase slippage. Alleles in this unstable
range are called pre-mutation alleles. Above an even higher point, the
mutant phenotype appears.
SSR of the 3 base sequence CTT.
Alleles A, B, and C differ in the
number of CTT repeats present.
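Alleles like A, B, and C can be told apart computationally by counting repeat units. The sketch below is one simple way to do it, using a regular expression with a back-reference. The function name, the default thresholds, and the example sequences are all made up for illustration.

```python
import re


def find_ssrs(sequence, min_unit=2, max_unit=5, min_repeats=4):
    """Find simple sequence repeats in a DNA string.

    Returns a list of (start, unit, copies) tuples. The lazy quantifier
    makes the regex prefer the shortest repeating unit, so (CA)6 is
    reported as CA x 6 rather than CACA x 3.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    # Hypothetical alleles that differ only in the number of CTT repeats.
    allele_a = "GG" + "CTT" * 4 + "AA"
    allele_b = "GG" + "CTT" * 6 + "AA"
    print(find_ssrs(allele_a))
    print(find_ssrs(allele_b))
```

Real genotyping is done by PCR fragment sizing or sequencing rather than pattern matching on a single clean string, but the count reported here is the quantity that distinguishes the alleles.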
Huntington Disease
• Huntington Disease. A dominant autosomal disease, with most
people heterozygotes. Caused by trinucleotide repeat
mutations.
• Onset usually in middle age.
• Neurological: starts with irritability and depression, includes
fidgety behavior and involuntary movement (chorea), followed
by psychosis and death.
• Caused by CAG repeats within the coding region, giving a tract of
glutamines. Below 28 copies is normal, between 28 and 34
copies is the premutation allele: normal phenotype but unstable
copy number that puts the next generation at risk. Above 34
copies gives the disease.
• HD shows “anticipation”: the age of onset tends to get earlier with every
generation. This is because copy number tends to increase between
generations, and higher copy numbers correlate with earlier onset.
• There is a genetic test for the disease, but in the absence of
effective treatment few actually take the test.
• Function of the protein remains unknown, but the excess glutamines
cause it to aggregate and (probably) poison the nerve cells.
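The copy-number ranges on this slide can be written as a small classifier. This is a sketch only: the function names and the example sequence are made up, and the cutoffs are simply the ones stated on this slide, not clinical reference values.

```python
import re


def longest_cag_run(sequence):
    """Number of copies in the longest uninterrupted run of CAG."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_hd_allele(cag_copies):
    """Classify an allele using the cutoffs given on the slide:
    below 28 normal, 28 to 34 premutation, above 34 disease."""
    if cag_copies < 28:
        return "normal"
    if cag_copies <= 34:
        return "premutation"
    return "disease"


if __name__ == "__main__":
    # Hypothetical coding fragment with 30 CAG codons (a glutamine tract).
    fragment = "ATG" + "CAG" * 30 + "CCG"
    copies = longest_cag_run(fragment)
    print(copies, classify_hd_allele(copies))
```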
Comparative Genomics
• Start with 2 completely sequenced genomes.
Find regions of sequence similarity (homology)
using BLAST or some other alignment program.
• The basic principle of comparative genomics is
that sequence conservation across species
lines implies natural selection for that
sequence.
• The sequence must be important, because it affects
fitness. Mutations that alter the sequence mostly
have negative effects and tend to be eliminated by
natural selection.
• Some conserved regions are genes, while
others are regulatory, or have functions we
don't know yet.
• The further the evolutionary distance between
two species is, the less sequence conservation.
• Amino acid sequences are preserved better
than nucleotide sequences, mostly due to the
degeneracy of the genetic code.
Medicago genes
Dotplots
• In a dotplot, the chromosomal
positions of one genome are on
the x-axis, and those of the other genome
are on the y-axis. Sequences that
match are marked with a dot.
• A long diagonal line shows a
region with significant cross-species homology
• Reverse diagonal lines indicate
inversions between the species.
On the left: 2 strains of E. coli are almost completely
collinear. Above, human and mouse chromosomes show
many scattered regions of homology.
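The idea behind a dotplot can be sketched with exact word matches. The code below is a toy illustration with made-up function names: it marks a dot wherever a short word (k bases) from one sequence also occurs in the other. Real genome dotplots are built from alignment hits (from BLAST or similar tools) rather than exact short words, but the geometry is the same.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def dotplot_matches(seq_x, seq_y, k=3):
    """Return (x, y) positions where a k-base word in seq_x also
    occurs in seq_y. Each pair is one dot on the plot."""
    seq_x, seq_y = seq_x.upper(), seq_y.upper()
    # Index every k-base word in seq_y by its start position.
    index = defaultdict(list)
    for j in range(len(seq_y) - k + 1):
        index[seq_y[j:j + k]].append(j)
    dots = []
    for i in range(len(seq_x) - k + 1):
        for j in index.get(seq_x[i:i + k], []):
            dots.append((i, j))
    return dots


if __name__ == "__main__":
    # Identical sequences give an unbroken diagonal: x equals y for every dot.
    print(dotplot_matches("ACGTAC", "ACGTAC", k=3))
```

To look for inverted segments, the same function can be run against `reverse_complement(seq_y)`. Matches found that way correspond to the reverse diagonals on the plot.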
Types of Sequences Conserved Between Species
• Genes: if it looks like a gene (i.e. open reading frame) and is conserved between species, it
probably is a gene. Conversely, ORFs that aren't conserved have often been shown to be
random events, not part of a gene.
• Many RNA genes have been found because they are conserved between species
• Lots of conserved sequences between human and mouse haven't been assigned a definite
function yet. 3.9% of the genome vs. 1.1% that is coding.
• Ultra-conserved elements: greater than 200 bp and 100% sequence identity between
species. Originally found about 400 UCEs between human and mouse. But now, some have
been found between Drosophila and humans, and sea urchins and humans, etc. They are
often found near important genes: transcription factors, developmental regulators, ion
channels. Probably involved in gene regulation, but still unclear. Some may be undetected
RNA genes.
• Human-accelerated regions (HARs). A set of 49 regions that are conserved in vertebrate
evolution but very different between humans and chimpanzees. Quite short: 140 bp average.
Mostly not in genes. One well known one, HAR1, is an RNA gene. Others are enhancers of
nearby gene activity. Many associated with neural development.
Genome Changes in Evolution
• There are very few genes found in humans and nowhere
else.
• Most of the differences between us and our closest relatives
are changes in gene families, altered functions of existing
genes, and changes in regulatory sequences.
• Human vs. chimpanzee:
• For sequences that can be aligned: 1.2% base substitutions, plus 3%
differences in insertions and deletions (indels). There are fewer indels
than base substitutions, but indels can cover many more bases.
• 1500 inversions, ranging from very small (23 bp) to 62 Mbp. 23 bp is at the
detection limit for BLAST searches, and there are probably plenty of
smaller inversions.
• Several hundred changes in gene family copy number
• Lots of changes in repeat sequences (3 x as many Alu elements in
humans as in chimps)
• Loss of function in about 80 genes (half of which are olfactory
receptors).
• About 29% of all proteins with clear orthologs are identical between
humans and chimps, and most of the rest differ by only 1 or 2 amino
acids.
Whole Genome Duplication
• As the name implies, a whole genome duplication is an event
where the genome size doubles, going from diploid to
tetraploid.
• These events also require the chromosomes to pair up as if they were
diploids during meiosis. Otherwise the organism would not produce
offspring.
• Common in plants, but very rare in animals. Plants can undergo
many generations of clonal (non-sexual) propagation.
• Two duplications occurred early in the vertebrate lineage, after it
split from the invertebrate chordates: the tunicates (urochordates)
and the cephalochordates (like Amphioxus).
• A third duplication in bony fish lineage, after they split from the
tetrapod lineage.
• Maintaining a polyploid state occurs frequently in amphibians
and reptiles, but it is thought that X chromosome inactivation
and the problems of maintaining gene balance with 2 different
sex chromosomes makes this very difficult in the mammals.
• The problem can be seen with the abnormalities associated with
XXY and XO individuals: Klinefelter and Turner syndromes.
Diploidization
• After a genome duplication, most of
the genes are duplicated. What
follows is a period of diploidization,
trying to regain the stable diploid
state, during which many genes lose
one or the other copy. The result is
that most genes end up with just one
copy.
• Some genes retain both copies, and
often there will be a functional
divergence: they take on different
roles.
• Notably, the Hox genes have retained all
4 copies: there are 4 clusters on different
chromosomes that are recognizably
similar all the way from the coelacanths
(lobe-finned fishes on the tetrapod side
of the fish/tetrapod split) to humans.
Hox Genes
• Hox genes specify segment identity:
different members of the cluster are
expressed in different segments as you
move from anterior to posterior. Hox
genes make transcription factors.
• Order of the genes on the chromosome
is the same as the order of their expression in the body.
• Same mechanism used in all
bilaterian animals. First described and
understood in Drosophila. Conservation is
enough that a Drosophila Hox gene
works correctly when put into chickens.
• Hox genes contain a homeobox domain,
which is also found in plants and serves
a similar role in development.
Chromosome Rearrangements
• When comparing mammalian genomes, it is clear that synteny is common: when two
genes are neighbors in one species, they are usually neighbors in other species.
• However, comparing the genomes of two species shows the results of multiple
translocations and inversions. Blocks of syntenic genes are seen, but often spread
across multiple chromosomes.
• Average size of synteny blocks between mouse and humans is 10 Mbp.
• Partly a consequence of the fact that genes on a chromosome mostly don’t
interact with their neighbors.
• New centromeres often form in what was previously euchromatin. Centromere
sequences evolve rapidly.
• The difference between human and chimp chromosomes (23 vs 24 pairs) is due to a
fusion that joined two ape chromosomes end to end into a single
human chromosome.
• Notable exception is the X chromosome: most X genes stay on the X over long
evolutionary time. Problems with dosage compensation.
Gene Duplication in Gene Families
• Tandem arrays of genes can very easily expand or contract
their numbers.
• Unequal crossing over, as in the beta-globin genes
• Different gene families expand in different lineages: an
expanded gene family is presumably doing something
important for that lineage.
• Between humans and mice, there are about 15,000 genes
that match 1-to-1 as homologs. However, there are
another 5000 genes in gene families with very different
copy numbers.
• Sometimes the main effect is simply to increase the
amount of gene product. A good example: the salivary
amylase genes (AMY1); amylase converts starch into
sugar. Apes have only 1 amylase gene, but humans have
multiple copies of the gene in a tandem array.
• Since the agricultural revolution (about 10,000 years ago),
we eat much more starch than our hunter-gatherer
ancestors and our ape cousins.
• The copy number of AMY1 genes is different in different
populations, and it correlates with starch levels in the diet.
• We know it's a recent duplication because the different
copies are all very similar: they have picked up very few
random mutations, even at synonymous sites.
Amylase Gene
Duplication
• Copy number is roughly correlated
with starch levels in the diet.
• The Hadza are hunter-gatherers in
Tanzania who rely on starchy roots and
tubers
• The BiAka and the Mbuti are hunter-gatherers in the African rain forest.
• The Datog are pastoralists (they herd
cattle) in east Africa.
• The Yakut are hunters and fishermen
from Siberia.
More on Gene Families
• When genes get duplicated, the two
copies are referred to as paralogs.
(Orthologs are the same gene in two
different species.)
• Several possibilities for the newly
formed paralogs:
• one copy gets inactivated by mutation
and becomes a pseudogene
• one paralog evolves a new function
while the other keeps the old
function. This is called
neofunctionalization.
• The two paralogs split the previous
function: they get expressed in
different tissues or different times in
development. This is called
subfunctionalization.
Orthologs: the same gene in two different
species.
Paralogs: two genes in the same species
derived from a common ancestral gene.
Globin Gene Evolution
• Start with ancestral globin
gene, 800 million years ago.
• 3 single genes on different
chromosomes, all of which
work as monomers. Myoglobin
carries oxygen in muscle cells;
others have less well known
functions.
• Two gene clusters, for alpha
and beta globins. These work
as tetramers to carry oxygen in
the blood. Zeta and epsilon are
active in the embryo, gamma-A
and gamma-G in the fetus, and
alpha, delta, and beta in the
adult.
• Also several pseudogenes in the
tandem clusters.
Horizontal Gene Transfer
• Horizontal gene transfer: transfer of DNA between distantly
related species. As opposed to vertical gene transfer: the
normal method, genes transferred from parent to offspring.
• It’s quite unusual (but it does happen) in eukaryotes (at least, things like
plants and animals), but a major issue in prokaryotes, where 10% or more
of DNA in a species has been transferred in across large evolutionary
distances.
• Prokaryotic sexual processes (conjugation, transduction, transformation)
often work very well between species.
• Detected because a gene’s sequence resembles orthologs in
very different species more than in closely related species.
• Different members of the same bacterial species often differ in
20% or more of their genes: genes present in one strain but
absent in another.
• This makes the definition of “species” difficult in bacteria.
The outer circle above is a comparison of
2 E. coli strains. Shared regions are in
blue; red regions are found in strain
EDL933 only, and yellow regions are in
strain MG1655 only.
Transposon Insertions
• At first glance, transposons seem to
be intra-nuclear parasites, bent on
increasing their copy number without
helping the organism at all. This is
the selfish DNA hypothesis. Viruses
are another example of selfish DNA.
• Some closely related organisms differ
widely in the number of transposable
elements present in their genomes.
• Transposons can cause trouble by
interrupting important genes, but they
mostly have little effect.
• Arabidopsis has about 27,000 genes
and 25 Mbp of transposon DNA;
maize has about 40,000 genes and
about 1800 Mbp of transposon DNA.
Five dipterans (flies) showing differences in
genome size, intron length, CDS length, and
transposon (TE) numbers.
Transposon Insertions and Evolution
• However, transposons can also affect gene regulation,
altering the pattern of gene expression in different
tissues. This is potentially a positive role: the raw
material for natural selection.
• Also, non-autonomous DNA transposons consist of
nothing but a pair of short inverted repeats that are
recognized by transposase. Often, random pieces of
genomic DNA are trapped between pairs of inverted
repeats, and moved to new locations.
• Functional copies of LINE-1 elements, Alu sequences,
and some endogenous retroviral sequences (LTR
retrotransposons) exist in the human genome. They
occasionally transpose into genes, giving a detectable
phenotype.
• The first examples found were two independent
insertions of LINE-1 into exons of the clotting factor 8
gene. These events caused hemophilia: the inability of
blood to clot.
• Transposable element movement has also been
implicated in cancer and the chromosome
rearrangements that accompany it.
• Recombination between Alu sequences in different
parts of the genome can generate deletions and
perform exon shuffling: the insertion of a new exon into
a gene from a completely unrelated gene.