Changes in DNA
Mutations
• Any change in the DNA sequence of an organism is a mutation.
• Mutation is a decay force whose ultimate roots are in the second law of
thermodynamics (entropy). Living things survive inevitable mutations by a
combination of being tolerant of a certain level of mutation, repairing
mutational damage, killing cells that are mutated beyond repair, and relying
on natural selection to remove individuals with unfavorable mutations.
• Mutations are the source of the altered versions of genes that provide the
raw material for evolution.
• A central tenet of biology is that the flow of information from DNA to protein
is one way. DNA cannot be altered in a directed way by changing the
environment. Only random DNA changes occur.
• Some terminology: the genotype is the organism’s genetic constitution:
ultimately, the sequence of its DNA. The phenotype is the physical
characteristics of the organism: its appearance, biochemistry, reactions to
the environment, etc.
– before DNA sequencing, the genotype was deduced from the phenotypes of
parents and offspring.
– the point of genome annotation is to deduce the phenotype that will result from a
given genotype.
More Mutation Generalities
• Most mutations have no effect on the organism, especially among
the eukaryotes, because a large portion of the DNA is not in genes
and thus does not affect the organism’s phenotype.
• Even within genes, mutations can have little or no effect
– the genetic code is degenerate: some mutated codons are translated into the
same amino acid
– many amino acid changes have little or no effect on protein function.
• Of the mutations that do affect the phenotype, the most common
effect of mutations is lethality, because most genes are necessary
for life.
• From a bioinformatics point of view, the three simplest types of
mutation: base substitution, small insertions and deletions (indels),
and simple sequence repeats, affect sequence alignment programs.
Larger mutations such as transposable element movements,
recombination-induced mutations, and general chromosome
rearrangements, affect large scale issues such as genomic maps.
Base Change Mutations
• The simplest mutations are base changes, where one base is converted to
another. (Also called “substitutions”, or “point mutations”.) These can be
classified as either:
– “transitions”, where one purine is changed to another purine (A -> G, for
example), or one pyrimidine is changed to another pyrimidine (T -> C, for
example).
– “transversions”, where a purine is substituted for a pyrimidine, or a
pyrimidine is substituted for a purine. For example, A -> C.
• Transitions are more common than transversions, because they are easier to
create, and because transitions often have less drastic effects than
transversions.
• Base change mutations are the cause of single nucleotide polymorphisms
(SNPs). Mapping SNPs is the current best way to locate human disease genes.
• Base change mutations are the most common mutations, and they are the
easiest to handle for statistics and evolutionary studies.
       A     C     G     T
A     0.6   0.1   0.2   0.1
C     0.1   0.6   0.1   0.2
G     0.2   0.1   0.6   0.1
T     0.1   0.2   0.1   0.6
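The transition/transversion rule and the A/C/G/T table of values above can be checked with a short Python sketch. The function names are hypothetical, and reading the table's rows as the original base and its columns as the resulting base is an assumption; the slide does not label the axes.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Values copied from the slide's table. Treating rows as the original base and
# columns as the resulting base is an assumption, not something the slide states.
SLIDE_TABLE = {
    "A": {"A": 0.6, "C": 0.1, "G": 0.2, "T": 0.1},
    "C": {"A": 0.1, "C": 0.6, "G": 0.1, "T": 0.2},
    "G": {"A": 0.2, "C": 0.1, "G": 0.6, "T": 0.1},
    "T": {"A": 0.1, "C": 0.2, "G": 0.1, "T": 0.6},
}


def classify_substitution(original, mutated):
    """Return 'transition' or 'transversion' for a single base change."""
    original, mutated = original.upper(), mutated.upper()
    valid = PURINES | PYRIMIDINES
    if original not in valid or mutated not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if original == mutated:
        raise ValueError("no change: the two bases are identical")
    # A transition stays within one chemical class (purine or pyrimidine).
    same_class = {original, mutated} <= PURINES or {original, mutated} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def weight_by_class(table):
    """For each original base, sum the off-diagonal values by substitution class."""
    totals = {}
    for original, row in table.items():
        row_totals = {"transition": 0.0, "transversion": 0.0}
        for mutated, value in row.items():
            if mutated != original:
                row_totals[classify_substitution(original, mutated)] += value
        totals[original] = row_totals
    return totals
```

Each base has one possible transition and two possible transversions. Under the assumed reading of the table, the single transition carries as much weight (0.2) as both transversions combined (0.1 + 0.1), which matches the statement that transitions are more common.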
Base Change Causes
• Base changes occur naturally as errors in replication: the
wrong base gets inserted.
– DNA polymerase has an editing function that detects most
errors, then backs up, removes the wrong base and puts in the
proper base.
– enzymes that replicate RNA don’t have the editing function, so
their error rate is 100 x that of DNA polymerase, causing the high
mutation rate of RNA viruses.
• Various chemical changes in a base can cause mutation. For instance, the
spontaneous loss of the amino group on cytosine converts it to uracil (which
will pair with A, not G).
• Environmental chemicals that attach bulky groups onto bases (alkylating
agents) can cause the bases to be misread by DNA polymerase.
Phenotypic Effects of Base Changes
• Mutations can be classified according to their effects on the protein (or mRNA)
produced by the gene that is mutated.
• 1. Silent mutations (synonymous mutations). Since the genetic code is degenerate,
several codons produce the same amino acid. In particular, third-base changes often
have no effect on the amino acid sequence of the protein. These mutations affect the
DNA but not the protein. Therefore they are called neutral mutations, mutations
which should have no effect on the organism’s phenotype.
• 2. Missense mutations. Missense mutations substitute one amino acid for another.
Some missense mutations have very large effects, while others have minimal or no
effect. It depends on where the mutation occurs in the protein’s structure, and how
big a change in the type of amino acid it is.
• 3. Nonsense mutations convert an amino acid codon into a stop codon. The effect is to
shorten the resulting protein. Sometimes this has only a little effect, as the ends of
proteins are often relatively unimportant to function. However, often nonsense
mutations result in completely non-functional proteins.
• 4. Sense mutations are the opposite of nonsense mutations. Here, a stop codon is
converted into an amino acid codon. Since DNA outside of protein-coding regions
contains an average of 3 stop codons per 64, the translation process usually stops
after producing a slightly longer protein.
• Base changes can also affect RNA initiation, splicing and termination.
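The four classes above can be illustrated with a minimal Python sketch that compares a codon before and after a base change. It assumes the standard genetic code; the names `CODON_TABLE` and `classify_codon_change` are hypothetical and chosen only for this example.

```python
from itertools import product

# Standard genetic code, listed with bases in the order T, C, A, G
# at each codon position. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(before, after):
    """Classify a codon change as silent, missense, nonsense, or sense."""
    before, after = before.upper(), after.upper()
    if before == after:
        raise ValueError("the two codons are identical")
    aa_before = CODON_TABLE[before]
    aa_after = CODON_TABLE[after]
    if aa_before == aa_after:
        return "silent"      # same amino acid (or stop replaced by stop)
    if aa_after == "*":
        return "nonsense"    # amino acid codon became a stop codon
    if aa_before == "*":
        return "sense"       # stop codon became an amino acid codon
    return "missense"        # one amino acid replaced by another


# 3 of the 64 codons are stops, the figure quoted in item 4 above.
STOP_COUNT = sum(1 for amino_acid in CODON_TABLE.values() if amino_acid == "*")
```

For example, `classify_codon_change("GAA", "GAG")` is a third-base change that leaves glutamic acid unchanged, so it comes out as silent, while `classify_codon_change("TGG", "TGA")` turns tryptophan into a stop and comes out as nonsense.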
More on Substitution
• In addition to synonymous mutations, some amino acid changes are
“conservative” in that they have little or no effect on the protein’s function.
– for example, isoleucine and valine are both hydrophobic and readily
substitute for each other.
– other amino acid substitutions are very unlikely: leucine (hydrophobic) for
aspartic acid (hydrophilic and charged). This would be a non-conservative
substitution.
– Some amino acids play unique roles: cysteines form disulfide bridges,
prolines induce kinks in the chain, etc.
– However, some amino acids are critical for active sites and cannot be
substituted.
• Tables of substitution frequencies for all pairs of amino acids have been
generated.
BLOSUM62 Table. Entries are log-odds scores. Numbers on the diagonal
score an amino acid staying the same. The off-diagonal numbers score
substitutions: positive for pairs seen more often than expected by chance,
negative for pairs seen less often.
Indels
• Another simple type of mutation is the gain or loss of one or a few bases.
These mutations are called indels, which is short for “insertion/deletion”.
– When comparing two species it isn’t easy to tell whether an insertion
occurred in one species or a deletion occurred in the other.
• Indels are thought to be generated when the DNA polymerase slips forward or
backward on the template DNA it is copying.
– This occurs most easily in repeated sequences, but can occur anywhere.
• A second cause of short indels is chemical- or radiation-induced loss of the
base portion of the nucleotide. The DNA polymerase often skips right over these
sugar/phosphate stumps, leaving a missing base in the resulting DNA chain.
Frameshifts and Reversions
• Translation occurs codon by codon, examining nucleotides in groups of 3.
If a nucleotide or two is added or removed, the grouping of the codons is
altered. This is a frameshift mutation, where the reading frame of the
ribosome is altered.
• Frameshift mutations result in all amino acids downstream from the mutation
site being completely different from wild type. These proteins are generally
non-functional.
• A reversion is a second mutation that reverses the effects of an initial
mutation, bringing the phenotype back to wild type (or almost).
– Frameshift mutations sometimes have “second site reversions”, where a
second frameshift downstream from the first frameshift reverses the effect.
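A frameshift and a second-site reversion can be traced on a short made-up sequence. The sketch below assumes the standard genetic code; the 24-base "gene" and the helper name `translate` are invented for illustration.

```python
from itertools import product

# Standard genetic code with bases ordered T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate every complete codon, reading in groups of 3 from the first base."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Invented wild-type sequence: ATG GCC AAA CGT CTC AAG GGT TAA
wild_type = "ATGGCCAAACGTCTCAAGGGTTAA"

# First mutation: insert one base after the start codon (a frameshift).
frameshift = wild_type[:3] + "T" + wild_type[3:]

# Second-site reversion: delete one base further downstream,
# which restores the original reading frame from that point on.
revertant = frameshift[:12] + frameshift[13:]

wild_protein = translate(wild_type)         # MAKRLKG*
frameshift_protein = translate(frameshift)  # MCQTSQGL
revertant_protein = translate(revertant)    # MCQTLKG*
```

In the revertant, only the three amino acids between the two mutation sites differ from wild type. Everything downstream of the second site is read in the original frame again, including the stop codon.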
Microsatellites/Simple Sequence Repeats
• Two words for the same phenomenon.
• During replication, DNA polymerase can “stutter” when it replicates several tandem
copies of a short sequence, say 2-5 bp.
– For example, CAGCAGCAGCAG, 4 copies of CAG, will occasionally be converted to 3
copies or 5 copies by DNA polymerase stuttering.
• Outside of genes, this effect produces useful genetic markers called SSR (simple
sequence repeats).
• They are heavily used in genetic mapping, for several reasons.
– They are easy to detect,
– They are fairly stable across generations yet have a high enough mutation rate that many
alleles exist in the population.
– They are found in many locations in the genome of all organisms.
• Within a gene, this effect can cause certain amino acids to be repeated many times
within the protein. In some cases this causes disease.
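Counting tandem copies of a repeat unit, and modelling a one-copy stutter, can be sketched in a few lines of Python. The function names are hypothetical, and the sketch simply edits the longest run it finds; it is not a model of how real polymerase slippage picks a site.

```python
import re


def longest_repeat_run(sequence, unit):
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    sequence, unit = sequence.upper(), unit.upper()
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    run_lengths = [m.end() - m.start() for m in pattern.finditer(sequence)]
    if not run_lengths:
        return 0
    return max(run_lengths) // len(unit)


def stutter(sequence, unit, change):
    """Lengthen (change > 0) or shorten (change < 0) the longest run of `unit`."""
    sequence, unit = sequence.upper(), unit.upper()
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    matches = list(pattern.finditer(sequence))
    if not matches:
        return sequence
    longest = max(matches, key=lambda m: m.end() - m.start())
    copies = (longest.end() - longest.start()) // len(unit)
    new_copies = max(copies + change, 0)  # a run cannot have fewer than 0 copies
    return sequence[:longest.start()] + unit * new_copies + sequence[longest.end():]
```

With the slide's example, `longest_repeat_run("CAGCAGCAGCAG", "CAG")` gives 4, and `stutter("CAGCAGCAGCAG", "CAG", 1)` and `stutter("CAGCAGCAGCAG", "CAG", -1)` give the 5-copy and 3-copy alleles.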
Huntington Disease
• Huntington Disease. A dominant autosomal disease, with most affected people
heterozygotes.
• Onset usually in middle age.
• Neurological: starts with irritability and depression, includes fidgety
behavior and involuntary movement (chorea), followed by psychosis and death.
• Caused by CAG repeats within the coding region, giving a tract of glutamines.
Below 28 copies is normal, between 28 and 34 copies is the premutation allele:
normal phenotype but unstable copy number that puts the next generation at
risk. Above 34 copies gives the disease.
• HD shows “anticipation”: the age of onset gets earlier with every generation.
This is due to a correlation between copy number and age of onset: more copies
give earlier onset, and the unstable copy number tends to grow from one
generation to the next.
• There is a genetic test for the disease, but in the absence of effective
treatment few actually take the test.
• Function of the protein remains unknown; the excess glutamines may cause it to
aggregate and lose function.
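The copy-number ranges quoted on this slide can be written as a small lookup. This is only a sketch of the slide's own numbers; the function name and parameter names are hypothetical, and cutoffs used in clinical practice may differ from the ones given here.

```python
def classify_cag_allele(copies, normal_below=28, disease_above=34):
    """Classify a CAG repeat count using the ranges quoted on the slide.

    The default cutoffs are the slide's figures, not clinical guidance.
    """
    if copies < 0:
        raise ValueError("copy number cannot be negative")
    if copies < normal_below:
        return "normal"
    if copies <= disease_above:
        return "premutation"  # normal phenotype, unstable copy number
    return "disease"
```

Keeping the cutoffs as parameters makes it easy to substitute other thresholds without touching the logic.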
Larger Scale Mutations
• Larger mutations include insertion of whole new sequences, often due to
movements of transposable elements in the DNA or to chromosome changes such
as inversions or translocations.
• Deletions of large segments of DNA also occur.
• These phenomena affect the order of genes on the chromosome.
– In classical genetics, synteny means that two genes are on the same
chromosome. This term has a slightly different meaning in genomics and
bioinformatics: that a group of genes are in the same order on the chromosome
in different species.
– Synteny tends to be conserved in closely related species, but breaks down in
more distantly related species.
• Also, the genes at the breakpoints of a large scale mutation are often broken
in half or otherwise disrupted.
Transposable Elements
• Transposable elements are DNA sequences that move from place to place in the
genome. Unlike genes, transposable elements don’t have a fixed location on the
chromosome.
• Transposable elements are essentially parasites. In general they don’t
contribute to the evolutionary fitness of the organism.
• Most of the genes in an organism are necessary, at least under some
circumstances, for the organism’s survival. Genes avoid being destroyed by
random mutations because individuals with mutated genes are less fit: don’t
survive or reproduce as well as unmutated individuals.
• Transposable elements avoid being destroyed by increasing their numbers by
enough to keep some functional copies present even if some are destroyed.
– However, too much increase in numbers will kill the organism because
sometimes transposable elements insert within a gene, inactivating it.
More Transposable Elements
• Two basic types: those that are strictly DNA, and those that replicate
through an RNA intermediate. These are sometimes called type 1 and
type 2, but I have a hard time keeping those arbitrary numbers
straight. The most important nomenclature issue is that the prefix
“retro-” implies the use of reverse transcriptase, which copies RNA
into DNA, the defining characteristic of RNA-intermediate
transposable elements.
• Eukaryotes often contain very short (200-500 bp) elements that
contain the ends of a longer DNA transposon and miscellaneous junk
inside. They move to new locations using the transposase enzyme
from a full length element.
• Most bacterial TEs are DNA only. In eukaryotes, DNA transposable
elements occur, but are less common than retrotransposons.
– Transposable elements were first studied by Barbara
McClintock in corn. They are an important source of the
variation seen in ornamental flowers.
• Most common type in bacteria: Insertion Sequences (IS)
– roughly 1-3 kbp long, containing a transposase gene, and
bounded by short (10-40 bp) inverted repeats
– many different families, not well conserved across species
• Transposons are longer TEs, usually composed of 2 IS elements and
one or more genes in between, often an antibiotic resistance gene.
Retro Elements
• RNA transposable elements are called retrotransposons in
eukaryotes. They are characterized by the use of reverse
transcriptase in their life cycle.
• They are related to retroviruses, such as HIV, feline leukemia
virus, etc.
– Retrotransposons lack the gene necessary to move outside the
cell.
• There are a variety of retro element types, some of which
contain long terminal repeats (LTRs) and some of which don’t.
• Also, there are many non-functional, degenerate sequences
in eukaryotic genomes that started out as retrotransposons.
– Up to 25% of the human genome.
• In bacteria, the common RNA TE is a “mobile group II intron”.
– When transcribed into messenger RNA they can splice
themselves out without the need for proteins
– group II introns contain a gene for reverse transcriptase,
which copies the RNA back into DNA at a new location
in the genome.
Recombination-Induced Mutations
• Most recombination occurs between
homologous sites: two chromosomes
line up in meiosis and have a
break-and-rejoin event at the same location,
resulting in daughter chromosomes
that contain a mixture of alleles from
both parents.
• However, any two sites that contain
similar DNA sequences can pair up
and have a crossover. These events
can significantly rearrange the
genome.
Hemophilia A: Inversion Problems
• The clotting factor VIII gene, F8, is on the X chromosome, and mutations in
it are the major cause of hemophilia.
• F8 is a large gene, and completely contained within intron 22 are two small
genes transcribed from the opposite strand.
• One of these genes, F8A, has another copy several hundred kb away, on the
opposite strand. Thus, these two very similar genes are in opposite
orientation.
• Sometimes these regions pair during meiosis and recombination occurs
between them. This results in an inversion.
• The inversion completely disrupts the main F8 gene, because its 5’ half is
now inverted and far away from its 3’ half.
• This accounts for about 45% of hemophilia A cases.
• Almost all new cases arise during male meiosis: in females, the two
homologous X chromosomes are paired, which seems to inhibit this inversion.
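The effect of a crossover between two copies of a sequence in opposite orientation can be sketched on a toy string. The sequence, the repeat, and the function names below are invented for illustration and are not the real F8 or F8A sequences; the sketch assumes that recombination between inverted repeats flips the segment lying between them.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def invert_between_repeats(chromosome, repeat):
    """Invert the segment between `repeat` and its downstream inverted copy.

    Raises ValueError if the repeat or its inverted copy is not found.
    """
    chromosome, repeat = chromosome.upper(), repeat.upper()
    first = chromosome.find(repeat)
    if first == -1:
        raise ValueError("repeat not found")
    middle_start = first + len(repeat)
    # The second copy sits on the opposite strand, so it appears
    # in this strand as the reverse complement of the repeat.
    second = chromosome.find(reverse_complement(repeat), middle_start)
    if second == -1:
        raise ValueError("inverted copy of the repeat not found")
    middle = chromosome[middle_start:second]
    # The repeats themselves read the same after the flip; only the middle changes.
    return chromosome[:middle_start] + reverse_complement(middle) + chromosome[second:]


# Toy chromosome: flank + repeat + middle + inverted repeat + flank (all invented).
toy = "AAAA" + "GATTACA" + "CCGGTT" + reverse_complement("GATTACA") + "TTTT"
inverted = invert_between_repeats(toy, "GATTACA")
```

Any gene that spans one of the two repeats is split by the flip: the part inside the inverted segment ends up reversed and far from the part outside it, which is the situation described for F8.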
Tandem Duplications
• Genes are duplicated if there is more than one copy present in the
haploid genome.
– Some duplications are “dispersed”, found in very different locations from
each other.
– Other duplications are “tandem”, found next to each other.
• Tandem duplications play a major role in evolution, because it is
easy to generate extra copies of the duplicated genes through the
process of unequal crossing over.
– These extra copies can then mutate to take on altered roles in the cell,
or they can become pseudogenes, inactive forms of the gene, by
mutation.
• Most commonly tandem duplications affect only one gene, resulting
in an array of very similar genes.
– Sometimes duplicated regions exist within a gene, which can cause
havoc in trying to align the sequences
Unequal Crossing Over
• Unequal crossing over happens during prophase of meiosis 1. Homologous
chromosomes pair at this stage, and sometimes pairing occurs between the
similar but not identical copies of a tandem duplication. If a crossover occurs
within the mispaired copies, one of the resulting gametes will have an extra
copy of the duplication and the other will be missing a copy.
• As an example, the beta-globin gene cluster in humans contains 6 genes,
called epsilon (an embryonic form), gamma-G, gamma-A (the gammas are fetal
forms), pseudo-beta-one (an inactive pseudogene), delta (1% of adult beta-type
globin), and beta (99% of adult beta-type globin). Gamma-G and gamma-A are
very similar, differing by only 1 amino acid.
• If mispairing in meiosis occurs, followed by a crossover between delta and
beta, the hemoglobin variant Hb-Lepore is formed. This is a gene that starts
out delta and ends as beta. Since the gene is controlled by DNA sequences
upstream from the gene, Hb-Lepore is expressed as if it were a delta. That is,
it is expressed at about 1% of the level that beta is expressed. Since normal
beta globin is absent in Hb-Lepore, the person has severe anemia.
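The gene-count bookkeeping of unequal crossing over can be sketched by treating each chromosome as an ordered list of gene names. The function name and the `"delta-beta"` fusion label are hypothetical; the model only tracks which genes end up on which product and ignores everything else about recombination.

```python
def unequal_crossover(chrom_a, chrom_b, index_a, index_b):
    """Cross over inside gene `index_a` of chrom_a and gene `index_b` of chrom_b.

    Returns two products. Each carries a fusion gene named
    '<5-prime part>-<3-prime part>' at the crossover point.
    """
    fusion_ab = f"{chrom_a[index_a]}-{chrom_b[index_b]}"
    fusion_ba = f"{chrom_b[index_b]}-{chrom_a[index_a]}"
    product_1 = chrom_a[:index_a] + [fusion_ab] + chrom_b[index_b + 1:]
    product_2 = chrom_b[:index_b] + [fusion_ba] + chrom_a[index_a + 1:]
    return product_1, product_2


# The six genes of the beta-globin cluster, in the order given on the slide.
cluster = ["epsilon", "gamma-G", "gamma-A", "pseudo-beta-one", "delta", "beta"]

# Mispairing: delta on one chromosome lines up with beta on the other.
lepore_like, reciprocal = unequal_crossover(
    cluster, cluster, cluster.index("delta"), cluster.index("beta")
)
```

The first product has five genes and ends in a `delta-beta` fusion with no intact beta gene, matching the description of Hb-Lepore. The reciprocal product has seven genes, so the total across both products is still twelve.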
Chromosome Breaks
• DNA sometimes breaks due to mechanical stress, ionizing radiation, or
chemical attack.
• Most organisms contain enzymes that reassemble broken DNA molecules, a
process called non-homologous end joining.
• If there is more than one break, ends are joined randomly, which can lead to
a rearranged genome.
– This breaks up blocks of genes over evolutionary time
Chromosome Rearrangements
• When comparing mammalian genomes, it is clear that synteny is common:
when two genes are neighbors in one species, they are usually neighbors in
other species.
• However, comparing the genomes of two species shows the results of
multiple translocations and inversions. Blocks of syntenic genes are seen,
but often spread across multiple chromosomes.
– Average size of synteny blocks between mouse and humans is 10 Mbp.
– Partly a consequence of the fact that genes on a chromosome mostly
don’t interact with their neighbors.
• New centromeres often form in what was previously euchromatin.
Centromere sequences evolve rapidly.
• The difference between human and chimp chromosome numbers (23 vs 24 pairs)
is due to an end-to-end fusion of two ape chromosomes into a single human
chromosome.
• Notable exception is the X chromosome: most X genes stay on the X over
long evolutionary time, presumably because moving them would cause problems
with dosage compensation.
Genome Changes in Evolution
• There are very few genes found in humans and nowhere else.
• Most of the differences between us and our closest relatives are changes in
gene families, altered functions of existing genes, and changes in regulatory
sequences.
• Human vs. chimpanzee:
– For sequences that can be aligned: 1.2% base substitutions, plus 3%
differences in insertions and deletions (indels). There are fewer indels than
base substitutions, but indels can cover many more bases.
– 1500 inversions, ranging from very small (23 bp) to 62 Mbp. 23 bp is at
the detection limit for BLAST searches, and there are probably plenty of
smaller inversions.
– Several hundred changes in gene family copy number
– Lots of changes in repeat sequences (3 x as many Alu elements in humans as
in chimps)
– Loss of function in about 80 genes (half of which are olfactory receptors).
– About 29% of all proteins with clear orthologs are identical between humans
and chimps, and most of the rest differ by only 1 or 2 amino acids.
Whole Genome Duplication
• As the name implies, a whole genome duplication is an event where the
genome size doubles, going from diploid to tetraploid.
– These events also require the chromosomes to pair up as if they were
diploids during meiosis. Otherwise the organism would not produce offspring.
• Common in plants, but very rare in animals. Plants can undergo many
generations of clonal (non-sexual) propagation.
• Two duplications in vertebrate lineage between when tunicates
(urochordates) split from the rest of the chordates and when the
cephalochordates (like Amphioxus) split off.
• A third duplication in bony fish lineage, after they split from the tetrapod
lineage.
• Maintaining a polyploid state occurs frequently in amphibians and reptiles,
but it is thought that X chromosome inactivation and the problems of
maintaining gene balance with 2 different sex chromosomes make this very
difficult in the mammals.
– The problem can be seen with the abnormalities associated with XXY and XO
individuals: Klinefelter and Turner syndromes.
Diploidization
• After a genome duplication, most of the genes are duplicated. What follows
is a period of diploidization, trying to regain the stable diploid state,
during which many genes lose one or the other copy. The result is that most
genes end up with just one copy.
• Some genes retain both copies, and often there will be a functional
divergence: they take on different roles.
– Notably, the Hox genes have retained all 4 copies: there are 4 clusters on
different chromosomes that are recognizably similar all the way from the
coelacanths (lobe-finned fishes on the tetrapod side of the fish/tetrapod
split) to humans.
Hox Genes
• Hox genes specify segment identity: different members of the cluster are
expressed in different segments as you move from anterior to posterior. Hox
genes make transcription factors.
• The order of the genes on the chromosome is the same as the order in which
they are expressed along the body.
• The same mechanism is used in all bilaterian animals. It was first described
and understood in Drosophila. Conservation is enough that a Drosophila Hox
gene works correctly when put into chickens.
• Hox genes contain a homeobox domain, which is also found in plants and
serves a similar role in development.
Horizontal Gene Transfer
• In eukaryotes, there is little doubt that almost all genes are transmitted
from parent to offspring, with each species having a separate line of descent.
– Large exceptions: endosymbionts, the mitochondria and chloroplasts. Many
genes from these formerly free-living organisms have migrated into the nucleus.
– There are other cases of single genes being transferred horizontally.
• This is much less true in the prokaryotes, where a great deal of DNA is
transferred across species lines.
– I have seen an estimate that 15% of all prokaryotic genes are derived from
horizontal transfers
• Horizontal gene transfer is usually identified by performing phylogenetic
lineage studies on individual genes, and seeing that some gene has more in
common with genes in distant species than with genes in closely related
species.
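The comparison described in the last bullet can be caricatured with percent identity between pre-aligned sequences. This is a crude stand-in for a real phylogenetic analysis, not a replacement for one; the function names, the 10-point `margin`, and the idea of using raw identity are all assumptions made for this sketch.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions that match between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def looks_horizontally_transferred(gene, close_relative, distant_species, margin=10.0):
    """Flag a gene that resembles a distant species more than a close relative.

    `margin` (in percentage points) is an arbitrary threshold for this sketch.
    """
    to_close = percent_identity(gene, close_relative)
    to_distant = percent_identity(gene, distant_species)
    return to_distant - to_close >= margin
```

A gene inherited vertically would normally be most similar to its counterpart in the close relative, so the function returns `False`. A gene that matches the distant species much better gets flagged for a closer look.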
Sources of New DNA
• Bacteria reproduce by binary fission: replicating their DNA, then splitting
in half. Each cell has only 1 parent, and there is no regular sexual process.
• Bacteria have 3 main ways of bringing in new DNA:
– conjugation: direct transfer of DNA between 2 cells (although not
necessarily of the same species)
– transduction: transfer of DNA between cells using a bacteriophage (virus)
as an intermediate
– transformation: the cell takes up DNA molecules from the environment
Lysogenic Bacteriophage
• Bacteriophage (phage) are bacterial viruses: DNA (or RNA) surrounded by
a protein coat, but with no internal metabolic activity.
• Most bacteriophage enter the cell, hijack its machinery to reproduce
themselves, and then kill the cell by lysing it (breaking it open). This is
called the lytic cycle.
• Some phage have the ability to insert themselves into the bacterial genome
and remain there, inactive, for many generations: the lysogenic cycle.
– First described in phage lambda
– the inserted phage chromosome is called the prophage.
• When conditions get harsh, the phage DNA comes out of the chromosome
and enters the normal lytic pathway. It reproduces and kills the host cell.
• Sometimes the prophage is inactivated by mutation and becomes a
permanent part of the chromosome.