Mutation and DNA Repair

Mutation
Types and Sources
• Mutation is a decay force whose ultimate roots are in the second law of thermodynamics (entropy). Living things survive inevitable mutations by a combination of tolerating a certain level of mutation, repairing mutational damage, killing cells that are mutated beyond repair, and relying on natural selection to remove individuals with unfavorable mutations.
• Simple mutations: base substitutions and small indels. “Indel” stands for insertion-deletion. The term reflects the fact that when you see a difference in DNA sequence between two species, it is usually difficult to tell whether there was an insertion in one species or a deletion in the other.
• More complex mutations are larger events involving the insertion, rearrangement, or deletion of large pieces of DNA. Typical events include fusion of two different genes and insertion of transposable elements.
• Internal sources: DNA polymerase inserts the wrong nucleotide or slips at a certain rate. Transposable elements can move, cause other sections of DNA to move, or produce reverse transcriptase that acts on other messenger RNAs.
• External sources: damage to the DNA caused by chemicals in the environment, including oxygen, or by radiation.
DNA Polymerase
• DNA polymerase, the enzyme that replicates DNA, is not perfectly accurate. One problem is that bases spontaneously undergo a “keto-enol shift”, in which a hydrogen atom moves to a different position on the base. Guanine and thymine are subject to this at a low rate, and it causes mispairing.
• DNA polymerase has a proofreading function, a 3’ to 5’ exonuclease activity, which backs up and removes newly inserted nucleotides if they are mispaired. This function lowers the DNA polymerase error rate from about 1 error in 10^6 nucleotides to about 1 in 10^9. Still, that is about 6 errors every time the diploid genome (roughly 6 x 10^9 bp) is replicated.
• DNA polymerase can also slip, especially when replicating short repeats (microsatellites). This generates small indels.
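The error-rate arithmetic above can be checked with a few lines of Python. This is only a sketch: the two error rates are the approximate figures quoted in the slide, and the diploid genome size of 6 x 10^9 bp is an assumed round number that is not stated in the original.

```python
# Expected replication errors per round of genome replication.
# The error rates are the approximate values quoted in the slide; the genome
# size is an ASSUMED round number for a diploid human cell.
DIPLOID_GENOME_BP = 6e9               # assumption: ~6 x 10^9 bp
ERROR_RATE_NO_PROOFREADING = 1e-6     # ~1 error per 10^6 nucleotides
ERROR_RATE_WITH_PROOFREADING = 1e-9   # ~1 error per 10^9 nucleotides


def expected_errors(genome_bp: float, error_rate: float) -> float:
    """Expected number of misincorporated bases in one round of replication."""
    return genome_bp * error_rate


if __name__ == "__main__":
    # Roughly 6000 errors without proofreading, roughly 6 with it.
    print(expected_errors(DIPLOID_GENOME_BP, ERROR_RATE_NO_PROOFREADING))
    print(expected_errors(DIPLOID_GENOME_BP, ERROR_RATE_WITH_PROOFREADING))
```

Proofreading therefore removes about 999 of every 1000 polymerase errors under these figures.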
CpG Islands
• Another chemical instability is that cytosine occasionally gets deaminated: it loses an amino group. This converts it into uracil, which is not a DNA base and is removed by repair enzymes.
• However, in many places, the C in a C followed by a G (CpG: the “p” is the connecting phosphate) gets methylated: a CH3 group is attached to the 5 position on the ring.
• When 5-methylcytosine is spontaneously deaminated, it is converted to thymine, a standard DNA base. Replication leads to a base change: one daughter stays a C-G base pair while the other is converted to T-A.
• Over evolutionary time, this has led to a loss of CpG dinucleotides in human DNA.
• However, methylation of cytosine is associated with gene inactivation, and genes that are expressed in most cells (housekeeping genes) usually do not have methylated cytosines at their 5’ ends. In these areas, the frequency of CpG stays high.
• These areas of high CpG are called “CpG islands”. There are about 30,000 of them in the human genome, and most of them are associated with genes.
• However, the presence of a CpG island does not necessarily imply the existence of a gene, and vice versa.
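One common way to quantify the CpG depletion described above is the observed/expected CpG ratio of a sequence window. The sketch below is an illustration, not something taken from the slides: the function name is hypothetical, and the formula (CpG count times length, divided by the product of C and G counts) is the usual simple definition, assumed here.

```python
def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio for a DNA sequence.

    Expected CpG count is (number of C * number of G) / length, i.e. what
    you would see if C and G were placed independently of each other.
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c_count == 0 or g_count == 0:
        return 0.0
    return cpg_count * n / (c_count * g_count)


if __name__ == "__main__":
    # A CpG-rich toy sequence versus one with C and G but no CpG at all.
    print(cpg_observed_expected("CGCGCGCG"))
    print(cpg_observed_expected("CAGCAGTG"))
```

Bulk human DNA gives a ratio well below 1 because of the deamination process described above, while CpG islands score much higher.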
Base Substitutions
• Two basic types:
– transition: converting one purine to the other purine, or one pyrimidine into the other pyrimidine.
– transversion: converting a purine to a pyrimidine or the reverse.
• Logically, transversions should be twice as frequent, since there are twice as many possible transversions as transitions.
• However, in practice, transitions are about twice as common as transversions, due to a combination of natural selection and ease of occurrence.
• Neutral substitution rate: how often do nucleotides change in the absence of selection pressure? In a comparison of the human and mouse genomes, 165 Mbp of DNA associated with non-functional transposon sequences was identified in both species. These sequences had about 67% identical bases, and models implied 0.46 substitutions per position over the 75 million years since the human and mouse lineages diverged. The substitutions are divided between the two lineages, and the mouse lineage changes faster, so this works out to about 2 x 10^-9 substitutions per site per year in the human lineage, in the absence of selection pressure. This estimate agrees with other estimates based on different methods.
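The definitions of transition and transversion, and the claim that there are twice as many possible transversions, can be checked directly. This is a minimal sketch; the function names are made up for illustration.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a base substitution: {ref!r} -> {alt!r}")
    # Same chemical class (purine->purine or pyrimidine->pyrimidine) is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def count_possible_changes() -> dict:
    """Count all 12 ordered base changes by type."""
    counts = {"transition": 0, "transversion": 0}
    for ref, alt in permutations("ACGT", 2):
        counts[classify_substitution(ref, alt)] += 1
    return counts


if __name__ == "__main__":
    print(classify_substitution("C", "T"))
    print(count_possible_changes())
```

Of the 12 possible ordered changes, 4 are transitions and 8 are transversions, which is why the observed excess of transitions is notable.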
Substitutions Within Genes
• We mostly care about the functional parts of the genome, the genes and their control regions. Since most of the genes are presumably necessary for life, some mutations will be deleterious and others not.
• In the human-mouse genome comparison, variation in the rate of substitutions across the various portions of genes was clear: fewest in the exons, most in the introns, and an intermediate amount in the UTRs and flanking regions.
• For coding regions, the degeneracy of the genetic code has a large effect.
– some sites are non-degenerate: any change results in a different amino acid. 65% of codon sites.
– other sites are two-fold degenerate: transitions give the same amino acid while transversions give a different amino acid. 19% of codon sites.
– other sites are four-fold degenerate: any mutation gives the same amino acid. These sites are all third positions of codons. 16% of codon sites.
• Mutations that give the same amino acid are called silent or synonymous mutations. They are presumed to be selectively neutral.
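The degeneracy classes above can be computed from the standard genetic code. The sketch below assumes the standard nuclear code, written in the conventional TCAG order; the function name is hypothetical. It returns how many of the four bases at a given codon position encode the same amino acid: 1 means non-degenerate, 2 two-fold, 4 four-fold. A few sites, such as the third position of isoleucine codons, return 3, a class the slide does not list.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def site_degeneracy(codon: str, position: int) -> int:
    """Number of bases at `position` (0, 1 or 2) that keep the amino acid."""
    codon = codon.upper()
    original = CODON_TABLE[codon]
    count = 0
    for base in BASES:
        variant = codon[:position] + base + codon[position + 1:]
        if CODON_TABLE[variant] == original:
            count += 1
    return count


if __name__ == "__main__":
    print(site_degeneracy("GGT", 2))  # glycine, third position
    print(site_degeneracy("GAT", 2))  # aspartate, third position
    print(site_degeneracy("ATG", 2))  # methionine, third position
```

Running this over every site of a coding sequence gives the per-class site counts that are used to correct KA and KS for degeneracy.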
More on Substitution
• In addition to synonymous mutations, some amino acid changes are “conservative” in that they have little or no effect on the protein’s function.
– for example, isoleucine and valine are both hydrophobic and readily substitute for each other.
– other amino acid substitutions are very unlikely: leucine (hydrophobic) for aspartic acid (hydrophilic and charged). This would be a non-conservative substitution.
– Some amino acids play unique roles: cysteines form disulfide bridges, prolines induce kinks in the chain, etc.
– Some amino acids are critical for active sites and cannot be substituted.
• Tables of substitution frequencies for all pairs of amino acids have been generated.
Figure: BLOSUM62 table. Scores on the diagonal indicate the likelihood of the amino acid staying the same. The off-diagonal scores reflect relative substitution frequencies (the entries are log-odds scores, so a higher score means a more frequently observed substitution).
Detecting Natural Selection
• Patterns of base substitution within a gene can be used as evidence for natural selection, by comparing the ratio of synonymous to non-synonymous substitutions.
• Compare orthologs: genes in two different species that can be traced to a common ancestor.
• Can also compare paralogs within a species: genes resulting from duplication.
– a confounding problem: can you accurately identify orthologs between species, or are you comparing paralogs between the species?
• Measured by comparing KS, the number of synonymous substitutions per site, to KA, the number of non-synonymous substitutions per site. Note that these numbers are corrected for the different levels of degeneracy for each site. The summary statistic is the KA / KS ratio.
• Possible results:
– neutral (no selection): the gene is apparently not being selected. Often seen when a pseudogene is compared to a functional gene. Synonymous and non-synonymous substitutions occur at the same frequency. KA / KS = 1.
– negative (purifying) selection: the gene is being selected for similar functions in both species. Synonymous substitutions are more frequent than non-synonymous. KA / KS < 1.
– positive (diversifying) selection: the gene is being selected for different functions in the two species. An unexpectedly high number of non-synonymous substitutions. KA / KS > 1.
• The median KA / KS value for humans vs. mice was 0.115. The lowest values (greatest purifying selection) were for calmodulin, histones, ribosomal proteins, ubiquitin, and actin: genes involved with critical cellular functions common to all organisms. The highest ratios were seen for defense and immune response proteins.
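The interpretation of the KA / KS ratio listed above can be written as a small function. This is a sketch under stated assumptions: the function name and the `tolerance` band around 1 are hypothetical choices for illustration, and the slides do not say how close to 1 a ratio must be to count as neutral. Real analyses use a statistical test rather than a fixed cutoff.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 0.1) -> str:
    """Interpret a KA/KS ratio.

    `tolerance` is an ASSUMED width of the band around 1.0 that is treated
    as neutral; it is not a value from the source.
    """
    if ks <= 0:
        raise ValueError("KS must be positive to form a ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


if __name__ == "__main__":
    # The human-mouse median ratio quoted above, expressed with KS = 1.
    print(classify_selection(0.115, 1.0))
```

With the median human-mouse ratio of 0.115, the typical gene falls clearly in the purifying range.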
Trinucleotide Repeats
• Trinucleotide repeats (TNRs) are a type of microsatellite, an array of
3 bp repeats.
• DNA polymerase often slips at TNRs, increasing or decreasing the
copy number.
• Because a codon is 3 bp long, TNRs within a coding region don’t
change the reading frame.
• However, some TNRs cause diseases even though they are in the
UTRs.
• There are only 10 possible TNRs, considering the two DNA strands and the different orders you could write the bases. For example, the TNR that causes Fragile X syndrome could be written as CCG, CGC, GCC, CGG, GGC, or GCG.
• Below a certain copy number, the repeats are relatively stable. Above that number, the copy number can change drastically in both mitosis and meiosis. Alleles in this unstable range are called “pre-mutation alleles”. Above an even higher copy number, the mutant phenotype appears.
• TNR expansions cause disease through several different mechanisms.
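The count of 10 possible TNRs can be reproduced by treating two motifs as the same repeat when one is a rotation of the other or of its reverse complement. The sketch below is illustrative and the function names are made up. It also assumes that AAA, CCC, GGG and TTT are excluded, since those are runs of a single nucleotide rather than trinucleotide repeats.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def canonical_motif(motif: str) -> str:
    """Alphabetically smallest rotation of the motif or its reverse complement."""
    motif = motif.upper()
    forms = [
        m[i:] + m[:i]
        for m in (motif, reverse_complement(motif))
        for i in range(len(m))
    ]
    return min(forms)


def distinct_trinucleotide_repeats() -> list:
    """All distinct 3 bp repeat motifs, one canonical name per class."""
    motifs = set()
    for bases in product("ACGT", repeat=3):
        motif = "".join(bases)
        if len(set(motif)) == 1:
            continue  # assumption: AAA etc. are mononucleotide runs, not TNRs
        motifs.add(canonical_motif(motif))
    return sorted(motifs)


if __name__ == "__main__":
    print(distinct_trinucleotide_repeats())
    print(canonical_motif("CGG"))
```

All six spellings of the Fragile X repeat map to the same canonical motif, which is why the choice of name is a matter of convention.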
Huntington Disease
• Huntington disease: a dominant autosomal disease, with most affected people heterozygous.
• Onset is usually in middle age.
• Neurological: starts with irritability and depression, includes fidgety behavior and involuntary movement (chorea), followed by psychosis and death.
• Caused by CAG repeats within the coding region, giving a tract of glutamines. Below 28 copies is normal. Between 28 and 34 copies is the premutation allele: normal phenotype but unstable copy number that puts the next generation at risk. Above 34 copies gives the disease.
• HD shows “anticipation”: the age of onset gets earlier with every generation. This is because higher copy numbers give earlier onset, and the copy number tends to increase from one generation to the next.
• There is a genetic test for the disease, but in the absence of effective treatment few people at risk actually take the test.
• The function of the protein remains unknown; the excess glutamines cause it to aggregate and (probably) poison the nerve cells.
Fragile X Syndrome
• Fragile X syndrome: the most common inherited form of intellectual disability in humans.
• The phenotype includes moderate to severe intellectual disability, macroorchidism, large ears, prominent jaw, and high-pitched, jocular speech.
• Expression is variable, with intellectual disability the most common feature.
• Males, having only one X, are affected more frequently and more severely than females.
• Appears as a secondary constriction on the X chromosome, which is seen in cells starved for folate. The X can actually break at that point, but this isn’t a common feature.
• Caused by CGG repeats in the 5’ UTR of the FMR1 gene.
• Normal copy number is about 30. Between 55 and 200 copies, the copy number is unstable, but the person is normal. Above 200 copies, the mutant phenotype appears.
• With the full expansion, the gene gets heavily methylated and is not expressed.
• The function of the protein is unclear, but it is an RNA-binding protein that seems to be involved with translational regulation, possibly through RNA interference as part of the RISC complex.
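The copy-number ranges given for Huntington disease and Fragile X syndrome follow the same pattern: stable, then premutation, then full mutation. The sketch below encodes only the thresholds quoted in these slides. It is not clinical guidance, the names are hypothetical, and it assumes that any Fragile X allele below 55 copies counts as normal, which the slides do not state explicitly.

```python
# (lowest premutation copy number, highest premutation copy number),
# taken from the figures quoted in the slides above.
PREMUTATION_RANGE = {
    "huntington_CAG": (28, 34),
    "fragile_x_CGG": (55, 200),
}


def classify_repeat_allele(disease: str, copies: int) -> str:
    """Classify an allele by repeat copy number using the slide thresholds."""
    premutation_min, premutation_max = PREMUTATION_RANGE[disease]
    if copies < premutation_min:
        return "normal"          # assumption for Fragile X: everything below 55
    if copies <= premutation_max:
        return "premutation"     # normal phenotype, unstable copy number
    return "full mutation"       # mutant phenotype


if __name__ == "__main__":
    for n in (20, 30, 40):
        print(n, classify_repeat_allele("huntington_CAG", n))
    for n in (30, 100, 250):
        print(n, classify_repeat_allele("fragile_x_CGG", n))
```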
Mutations Affecting RNA
• Altered promoters, splice sites, poly-A
addition sites.
Gene Conversion
• If a cell contains two different copies of a gene, either on homologous chromosomes or as paralogs, sometimes one copy will “convert” the other copy to its sequence.
– This is the mechanism that keeps the two copies of important genes in the Y chromosome identical.
• Gene conversion (at least between homologues) is a normal outcome of recombination. We need to look at the Holliday molecular model of recombination to understand this. This model is a bit simple compared to current theory, but is still basically correct.
• The homologues are paired in prophase of meiosis I.
• Single-stranded breaks in both homologues are catalyzed by recombinase.
• The free ends invade the homologous DNA, forming heteroduplexes.
• “Branch migration” occurs and the heteroduplexes are extended.
More Gene Conversion
• Recombinase cuts the DNA molecules.
• Two possibilities at this point, occurring with equal frequency.
• 1. A “north-south” cut occurs after the 2 DNA molecules twist relative to each other. The result is a crossover: the two homologues are broken and rejoined at this point, giving recombinant chromosomes. Note that there is a heteroduplex region at the breakpoint.
More Gene Conversion
• 2. The other possibility is that an “east-west” cut occurs. This gives a short heteroduplex region, but the 2 chromosomes are still intact: no crossover has occurred.
• However, if the heteroduplex occurs within a gene that is being monitored, it will result in an offspring with an altered gene: gene conversion.
Steroid 21-Hydroxylase Deficiency
• The medical condition is “congenital adrenal hyperplasia”, an autosomal recessive condition. 21-hydroxylase is an enzyme necessary for converting cholesterol into aldosterone and cortisol. Aldosterone affects kidney function: it causes salt to be retained. Cortisol is the main stress response hormone.
– The biggest problem is that hormone precursors build up in the adrenals and get converted to testosterone, the major male hormone. This causes the external genitalia to develop in the male pattern, or as “ambiguous genitalia”, regardless of the individual’s chromosomal sex (“virilization”). In milder cases, and in males, puberty occurs early in childhood. Female embryos develop a normal uterus and ovaries.
– In some cases, salt is not retained in the body well, which is life-threatening but treatable with hormones.
• The functional gene, CYP21A2, is located about 30 kb from a pseudogene, CYP21A1P, on chromosome 6p. The pseudogene contains 9 mutations that inactivate it. Almost all cases result from one of two causes:
1. An unequal crossing over between these loci, resulting in a normal 5’ end of the gene and a mutant 3’ end (from the pseudogene), plus deletion of all the intervening sequences.
2. Gene conversion, which converts part of the normal allele to the pseudogene sequence.
Hemophilia A: Inversion Problems
• The clotting factor VIII gene, F8, is on the X chromosome, and mutations in it are the major cause of hemophilia.
• F8 is a large gene, and completely contained within its intron 22 are two small genes transcribed from the opposite strand.
• One of these genes, F8A, has another copy several hundred kb away, on the opposite strand. Thus, these two very similar genes are in opposite orientation.
• Sometimes these regions pair during meiosis and recombination occurs between them. This results in an inversion.
• The inversion completely disrupts the main F8 gene, because its 5’ half is now inverted and far away from its 3’ half.
• This accounts for about 45% of hemophilia A cases.
• Almost all new cases arise during male meiosis: in females, the two homologous X chromosomes are paired, which seems to inhibit this inversion.
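The effect of recombination between two inverted repeats can be sketched with a toy chromosome made of labeled, oriented segments. Everything here is a simplification for illustration: the segment labels, the "+"/"-" orientations and the function name are assumptions. A real crossover also happens inside the repeats rather than exactly at their edges.

```python
from typing import List, Tuple

Segment = Tuple[str, str]  # (label, orientation: "+" or "-")


def invert_between(chromosome: List[Segment], left: int, right: int) -> List[Segment]:
    """Invert everything strictly between the repeats at `left` and `right`.

    Inversion reverses the order of the segments and flips each orientation.
    """
    flip = {"+": "-", "-": "+"}
    middle = [
        (label, flip[strand])
        for label, strand in reversed(chromosome[left + 1:right])
    ]
    return chromosome[:left + 1] + middle + chromosome[right:]


# Toy layout: the two F8A copies are in opposite orientation, and the
# 5' half of F8 lies between them.
before = [
    ("distal F8A copy", "+"),
    ("intervening DNA", "+"),
    ("F8 5' half", "+"),
    ("F8A copy in intron 22", "-"),
    ("F8 3' half", "+"),
]
after = invert_between(before, 0, 3)

if __name__ == "__main__":
    for segment in after:
        print(segment)
```

In the result, the 5’ half of F8 is flipped and separated from the 3’ half by the intervening DNA, which is why no intact transcript can be made.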
Transposable Element Insertions
• Functional copies of LINE-1 elements, Alu sequences, and some endogenous retroviral sequences (LTR retrotransposons) exist in the human genome. They occasionally transpose into genes, giving a detectable phenotype.
• The first examples found were two independent insertions of the 3’ end of LINE-1 into exons of the clotting factor VIII gene (F8). Additional examples have been found since.
• Transposable element movement has also been
implicated in cancer and the chromosome
rearrangements that accompany it.
• Recombination between Alu sequences in different parts
of the genome can generate deletions.
DNA Damage
• A list of agents that damage DNA:
– ionizing radiation: induces breaks in DNA
– ultraviolet light: crosslinks adjacent thymines (thymine dimers)
– alkylating agents: attach hydrocarbon groups to bases, either blocking DNA polymerase or crosslinking the bases
– intercalating agents: slip between the DNA bases and cause DNA polymerase to insert extra bases or misread the sequence
– depurination: the link between purine bases and the deoxyribose spontaneously breaks
– deamination: loss of the amino group from cytosine converts it to uracil
– reactive oxygen: peroxide and superoxide attack the purine and pyrimidine rings
DNA Repair
• There are at least 5 separate DNA repair mechanisms in human cells.
• Direct repair, simply reversing the damage, is possible in some cases, notably removing methyl groups from guanine.
• Base excision repair. A damaged base is removed from its sugar by a DNA glycosylase (several types). After this, the DNA strand is cut by AP endonuclease and the sugar-phosphate without its base is removed from the DNA chain. A new nucleotide is added by DNA polymerase and the chain is re-ligated.
• Nucleotide excision repair. Abnormal bases, including thymine dimers, are removed along with a number of surrounding bases. The missing section is then re-synthesized and ligated. Xeroderma pigmentosum, a genetic disease that causes extreme sensitivity to sunlight, is due to defects in this repair system.
DNA Repair
• Post-replication repair. Double-stranded breaks are repaired by randomly joining DNA ends, or by a gene-conversion-like mechanism that involves the homologous chromosome. The breast cancer susceptibility genes BRCA1 and BRCA2 are involved in this pathway.
• Mismatch repair. Mispaired bases (those not caught by the DNA polymerase’s editing function) are repaired by an enzyme complex that moves along the DNA. When it finds a mismatched base pair, it removes a number of bases on one of the DNA strands and re-synthesizes them. The genes mutated in hereditary non-polyposis colon cancer are part of this system.
• In addition, cells with DNA damage are often induced to kill themselves through the process of apoptosis, or they stop dividing by not entering the S phase of the cell cycle. More on this when we talk about cancer.
• In the past 10 years, at least 10,000 cancer exomes and 2,500 whole genomes from tumors have been sequenced.
• Tumors contain 1,000-20,000 point mutations and several hundred indels and rearrangements. There is a wide range of variation, mostly associated with the age of the patient, and also with the level of exposure to known mutagens like cigarette smoke (lungs) or UV (skin).
– Some tumors develop hypermutation due to loss of repair pathways or chromosome integrity checkpoints.
• Different mutational processes have characteristic signatures, or spectra, of the mutations they produce. CpG deamination is an important process.
• Another source is the APOBEC family of cytidine deaminases, which deaminate cytosine during RNA editing (the name comes from apolipoprotein B mRNA editing). Some family members, especially APOBEC3G, also work on DNA, which is where the cancer problem arises: they induce C->T and G->A mutations. They are targeted to a very specific C, but a certain percentage of the time they deaminate the wrong C.
• Transcription-coupled DNA repair is a type of nucleotide excision repair. It repairs the transcribed strand in preference to the non-transcribed strand.
• Microhomology-mediated end joining DNA repair leads to small indels. BRCA1 and BRCA2 affect this pathway.
• Chromothripsis: a single chromosome (sometimes 2) is broken up into many (tens to hundreds) pieces that are randomly reassembled. Seemingly a single event, but the mechanism is unclear.
• Driver mutations: the ones that are actively selected for in growing tumors. Most other mutations are just coming along by linkage. Currently there are 572 genes that have recurring mutations in tumors, and only 3 of them have been found in more than 10% of tumors of different types.
– TP53 is the most frequent of them: mutated in 36% of all tumors.
– Also a few regulatory mutations. One mutation in the promoter of the telomerase gene TERT is found in 71% of melanomas and more than half of bladder cancers and glioblastomas. It creates a new transcription factor binding site that leads to overexpression of TERT.
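The pairing of C->T with G->A above reflects the two strands of DNA: a C->T change on one strand is read as G->A on the other. Mutation spectra are therefore usually collapsed so that the reference base is always a pyrimidine, which leaves 6 substitution classes. The sketch below illustrates that convention. The convention and the function name are assumptions for illustration and do not come from the slides.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def normalize_substitution(ref: str, alt: str) -> str:
    """Report a substitution relative to the strand where ref is C or T."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a base substitution: {ref!r} -> {alt!r}")
    if ref in "AG":
        # Purine reference: describe the same event on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


if __name__ == "__main__":
    print(normalize_substitution("C", "T"))
    print(normalize_substitution("G", "A"))
```

Both calls describe the same cytosine deamination event, seen from opposite strands.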
• Somatic mutation rate: 2-10 point mutations per diploid genome per cell division. Probably 10x the rate in germ line cells.
• As we age, the number of accumulated mutations increases. The rate at which the incidence of various tumors increases with age suggests that 5 or so mutations are needed to get cancer.
• Stem cells in skin, esophagus, and lung have 3 types of cell division: asymmetric, producing a stem cell and a differentiated cell; producing 2 stem cells (proliferation); or producing 2 differentiated cells (differentiation). Changing the relative frequencies of these can cause cancer: TP53 and NOTCH1 mutations both do this.
• Precancerous progression occurs in some tumor types: cervical cancer, colon polyps, breast ductal carcinoma in situ.
• Only 2 major forms of childhood cancer: leukemia and brain tumors. Our brains have evolved very fast, and the DNA repair mechanisms probably haven’t kept up.
• Natural selection doesn’t work past the age of reproduction, so there is no selective advantage to preventing late-life cancers.
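The point about the three types of stem cell division can be made concrete with a simple expected-value calculation. An asymmetric division leaves 1 stem cell, a proliferative division leaves 2, and a differentiating division leaves 0. So the expected number of stem cells per dividing stem cell is 1 + p(2 stem) - p(2 differentiated). The sketch below uses made-up probabilities and hypothetical function names. It is a deterministic average, not a model taken from the cited paper.

```python
def expected_stem_cells_per_division(
    p_asymmetric: float, p_two_stem: float, p_two_differentiated: float
) -> float:
    """Mean number of stem cells left after one stem cell divides."""
    total = p_asymmetric + p_two_stem + p_two_differentiated
    if abs(total - 1.0) > 1e-9:
        raise ValueError("the three probabilities must sum to 1")
    # asymmetric -> 1 stem cell, proliferation -> 2, differentiation -> 0
    return 1.0 * p_asymmetric + 2.0 * p_two_stem + 0.0 * p_two_differentiated


def expected_pool_size(
    initial: float, rounds: int,
    p_asymmetric: float, p_two_stem: float, p_two_differentiated: float,
) -> float:
    """Expected stem cell pool after `rounds` rounds of division."""
    factor = expected_stem_cells_per_division(
        p_asymmetric, p_two_stem, p_two_differentiated
    )
    return initial * factor ** rounds


if __name__ == "__main__":
    # Hypothetical numbers: a balanced pool versus one skewed to proliferation.
    print(expected_pool_size(100, 10, 0.80, 0.10, 0.10))
    print(expected_pool_size(100, 10, 0.80, 0.15, 0.05))
```

The pool stays constant only when the two symmetric division types are equally frequent. Even a small shift toward proliferation compounds with every round of division.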
I. Martincorena and P.J. Campbell. Somatic mutation in cancer and normal cells. Science 349: 1483-1489 (2015)