Mutation and DNA Repair
Mutation Types and Sources
• Mutation is a decay force whose ultimate roots are in the second law of thermodynamics (entropy). Living things survive inevitable mutations by a combination of being tolerant of a certain level of mutation, repairing mutational damage, killing cells that are mutated beyond repair, and relying on natural selection to remove individuals with unfavorable mutations.
• Simple mutations: base substitutions and small indels. “Indel” stands for insertion-deletion, a term based on the idea that when you see a difference in DNA sequence between two species, it is usually difficult to tell whether there was an insertion in one species or a deletion in the other.
• More complex mutations are larger events involving the insertion, rearrangement, or deletion of large pieces of DNA. Typical events include fusion of two different genes and insertion of transposable elements.
• Internal sources: DNA polymerase can insert the wrong nucleotide or slip at a certain rate. Transposable elements can move, cause other sections of DNA to move, or produce reverse transcriptase that acts on other messenger RNAs.
• External sources: damage to the DNA caused by chemicals in the environment, including oxygen, or by radiation.

DNA Polymerase
• DNA polymerase, the enzyme that replicates DNA, is not perfectly accurate. One problem is that bases spontaneously undergo a “keto-enol shift”, where a hydrogen moves its position in ketones. Guanine and thymine bases are subject to this at a low rate, and it causes mispairing.
• DNA polymerase has a proofreading function, a 3’ to 5’ exonuclease activity, which backs up and removes newly inserted nucleotides if they are mispaired. This function lowers the DNA polymerase error rate from about 1 error in 10^6 nucleotides to about 1 in 10^9. Still, that is about 6 errors every time the genome is replicated.
• DNA polymerase also can slip, especially when replicating short repeats (microsatellites). This generates small indels.
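The "about 6 errors per replication" figure can be checked with a line of arithmetic. The sketch below is only a rough illustration: the genome size (about 6 billion base pairs for a diploid human cell) is an assumption that the slide does not state, and the names `DIPLOID_GENOME_BP` and `expected_errors` are made up for this example.

```python
# Diploid human genome size in base pairs.
# ASSUMPTION: about 6e9 bp; the slide does not state this figure.
DIPLOID_GENOME_BP = 6_000_000_000

# Error rates from the slide, written as "one error per N nucleotides".
NT_PER_ERROR_NO_PROOFREADING = 10**6
NT_PER_ERROR_WITH_PROOFREADING = 10**9


def expected_errors(nt_per_error, genome_bp=DIPLOID_GENOME_BP):
    """Expected number of misincorporated bases in one round of replication."""
    return genome_bp / nt_per_error


errors_without = expected_errors(NT_PER_ERROR_NO_PROOFREADING)
errors_with = expected_errors(NT_PER_ERROR_WITH_PROOFREADING)

print(f"without proofreading: {errors_without:.0f} errors per replication")
print(f"with proofreading:    {errors_with:.0f} errors per replication")
```

Under that assumed genome size, proofreading cuts the expected number of errors by a factor of 1000, from thousands per replication down to a handful.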

CpG Islands
• Another chemical instability is that cytosine occasionally gets deaminated: it loses an amino group. This converts it into uracil, which is not a DNA base and is removed by repair enzymes.
• However, in many places, a C followed by a G (CpG: the “p” is the connecting phosphate) gets methylated: a CH3 group is attached to the 5 position on the ring.
• When 5-methylcytosine is spontaneously deaminated, it is converted to thymine, a standard DNA base. Replication leads to a base change: one daughter stays a C-G base pair while the other is converted to T-A.
• Over evolutionary time, this has led to a loss of CpG dinucleotides in human DNA.
• However, methylation of cytosine is associated with gene inactivation, and genes that are expressed in most cells (housekeeping genes) usually do not have methylated cytosines at their 5’ ends. In these areas, the frequency of CpG stays high.
• These areas of high CpG are called “CpG islands”. There are about 30,000 of them in the human genome, and most of them are associated with genes.
• However, the presence of a CpG island does not necessarily imply the existence of a gene, and vice versa.

Base Substitutions
• Two basic types:
– transition: converting one purine to the other purine, or one pyrimidine to the other pyrimidine.
– transversion: converting a purine to a pyrimidine or the reverse.
• Logically, transversions should be twice as frequent, since there are twice as many possible transversions as transitions. However, in practice, transitions are about twice as common as transversions. This is due to a combination of natural selection and ease of occurrence.
• Neutral substitution rate: how often do nucleotides change in the absence of selection pressure?
• In a comparison of the human and mouse genomes, 165 Mbp of DNA associated with non-functional transposon sequences were identified in both species.
These had about 67% identical bases, and models implied a rate of 0.46 substitutions per position over the 75 million years since the human and mouse lineages diverged. This works out to about 2 × 10^-9 substitutions per year for each site, in the absence of selection pressure. This estimate agrees with other estimates based on different methods.

Substitutions Within Genes
• We mostly care about the functional parts of the genome: the genes and their control regions. Since most of the genes are presumably necessary for life, some mutations will be deleterious and others not.
• In the human-mouse genome comparison, variation in the rate of substitutions across the various portions of genes was clear: fewest in the exons, most in the introns, and an intermediate amount in the UTRs and flanking regions.
• For coding regions, the degeneracy of the genetic code has a large effect.
– some sites are non-degenerate: any change results in a different amino acid. 65% of human codon sites.
– other sites are two-fold degenerate: transitions give the same amino acid while transversions give a different amino acid. 19% of codon sites.
– other sites are four-fold degenerate: any mutation gives the same amino acid. These sites are all third positions of codons. 16% of codon sites.
• Mutations that give the same amino acid are called silent or synonymous mutations. They are presumed to be selectively neutral.

More on Substitution
• In addition to synonymous mutations, some amino acid changes are “conservative” in that they have little or no effect on the protein’s function.
– for example, isoleucine and valine are both hydrophobic and readily substitute for each other.
– other amino acid substitutions are very unlikely: leucine (hydrophobic) for aspartic acid (hydrophilic and charged). This would be a non-conservative substitution.
– some amino acids play unique roles: cysteines form disulfide bridges, prolines induce kinks in the chain, etc.
– however, some amino acids are critical for active sites and cannot be substituted.
• Tables of substitution frequencies for all pairs of amino acids have been generated.

BLOSUM62 Table
• Numbers on the diagonal indicate the likelihood of the amino acid staying the same. The off-diagonal numbers are relative substitution frequencies.

Detecting Natural Selection
• Patterns of base substitution within a gene can be used as evidence for natural selection, by comparing the ratio of synonymous to non-synonymous substitutions.
• Compare orthologs: genes in two different species that can be traced to a common ancestor.
• Can also compare paralogs within a species: genes resulting from duplication.
– a confounding problem: can you accurately identify orthologs between species, or are you comparing paralogs between the species?
• Measured by comparing KS, the number of synonymous substitutions per site, to KA, the number of non-synonymous substitutions per site. Note that these numbers are corrected for the different levels of degeneracy for each site.
• The summary statistic is the KA / KS ratio. Possible results:
– neutral selection: the gene is apparently not being selected. Often seen when a pseudogene is compared to a functional gene. Synonymous and non-synonymous substitutions occur at the same frequency. KA / KS = 1.
– negative (purifying) selection: the gene is being selected for similar functions in both species. Synonymous substitutions are more frequent than non-synonymous. KA / KS < 1.
– positive (disruptive) selection: the gene is being selected for different functions in the two species. An unexpectedly high number of non-synonymous substitutions. KA / KS > 1.
• The median KA / KS value for humans vs. mice was 0.115. The lowest values (greatest purifying selection) were for calmodulin, histones, ribosomal proteins, ubiquitin, and actin: genes involved with critical cellular functions common to all organisms.
The highest ratios were seen for defense and immune response proteins.

Trinucleotide Repeats
• Trinucleotide repeats (TNRs) are a type of microsatellite, an array of 3 bp repeats.
• DNA polymerase often slips at TNRs, increasing or decreasing the copy number.
• Because a codon is 3 bp long, TNRs within a coding region don’t change the reading frame.
• However, some TNRs cause diseases even though they are in the UTRs.
• There are only 10 possible TNRs, considering the two DNA strands and the different orders you could write the bases. For example, the TNR that causes Fragile X syndrome could be written as CGG, GGC, GCG, CCG, CGC, or GCC.
• Below a certain number, the repeats are relatively stable. But above that, the copy number can change drastically in both mitosis and meiosis. These alleles are called “pre-mutation alleles”. Above an even higher point, the mutant phenotype appears.
• Several mechanisms for causing diseases.

Huntington Disease
• Huntington Disease. A dominant autosomal disease, with most people heterozygotes. Onset usually in middle age.
• Neurological: starts with irritability and depression, includes fidgety behavior and involuntary movement (chorea), followed by psychosis and death.
• Caused by CAG repeats within the coding region, giving a tract of glutamines.
• Below 28 copies is normal. Between 28 and 34 copies is the premutation allele: normal phenotype but unstable copy number that puts the next generation at risk. Above 34 copies gives the disease.
• HD shows “anticipation”: the age of onset gets earlier with every generation. This is due to a correlation between copy number and age of onset: the higher the copy number, the earlier the onset.
• There is a genetic test for the disease, but in the absence of effective treatment few actually take the test.
• Function of the protein remains unknown; the excess glutamines cause it to aggregate and (probably) poison the nerve cells.

Fragile X Syndrome
• Fragile X syndrome. The most common inherited form of human mental retardation.
• The phenotype includes moderate to severe mental retardation, macroorchidism, large ears, prominent jaw, and high-pitched, jocular speech.
• Expression is variable, with mental retardation the most common feature.
• Males, having only 1 X, are affected more frequently and severely than females.
• Appears as a secondary constriction on the X, which appears in cells starved for folate. The X can actually break at that point, but this isn’t a common feature.
• Caused by CGG repeats in the 5’ UTR of the FMR1 gene.
• Normal copy number is about 30. Between 55 and 200 copies, the copy number is unstable, but the person is normal. Above 200 copies, the mutant phenotype appears. The gene gets heavily methylated and is not expressed.
• The function of the protein is unclear, but it is an RNA-binding protein that seems to be involved with translational regulation, possibly through RNA interference as part of the RISC complex.

Mutations Affecting RNA
• Altered promoters, splice sites, poly-A addition sites.

Gene Conversion
• If a cell contains two different copies of a gene, either on homologous chromosomes or as paralogs, sometimes one copy will “convert” the other copy to its sequence.
– This is the mechanism that keeps the two copies of important genes in the Y chromosome identical.
• Gene conversion (at least between homologues) is a normal outcome of recombination. We need to look at the Holliday molecular model of recombination to understand this. This model is a bit simple compared to current theory, but is still basically correct.
• The homologues are paired in prophase of meiosis 1.
• Single stranded breaks in both homologues are catalyzed by recombinase.
• The free ends invade the homologous DNA, forming heteroduplexes.
• “Branch migration” occurs and the heteroduplexes are extended.

More Gene Conversion
• Recombinase cuts the DNA molecules.
• Two possibilities at this point, occurring with equal frequency.
• 1.
A “north-south” cut occurs after the 2 DNA molecules twist relative to each other. The result is a crossover: the two homologues are broken and rejoined at this point, giving recombinant chromosomes. Note that there is a heteroduplex region at the breakpoint.

More Gene Conversion
• 2. The other possibility is that an “east-west” cut occurs. This gives a short heteroduplex region, but the 2 chromosomes are still intact: no crossover has occurred.
• However, if the heteroduplex occurs within a gene that is being monitored, it will result in an offspring with an altered gene: gene conversion.

Steroid 21-Hydroxylase Deficiency
• The medical condition is “congenital adrenal hyperplasia”, an autosomal recessive condition.
• 21-hydroxylase is an enzyme necessary for converting cholesterol into aldosterone and cortisol.
– Aldosterone affects kidney function: causes salt to be retained.
– Cortisol is the main stress response hormone.
• The biggest problem is that hormone precursors build up in the adrenals and get converted to testosterone, the major male hormone. This causes the external genitalia to develop into the male pattern, or develop “ambiguous genitalia” regardless of the individual’s gender (“virilization”). In milder cases, and in males, puberty occurs early in childhood. Female embryos develop a normal uterus and ovaries.
• In some cases, salt is not retained in the body well, which is life-threatening but treatable with hormones.
• The functional gene, CYP21A2, is located about 30 kb from a pseudogene, CYP21A1P, on chromosome 6p. The pseudogene contains 9 mutations that inactivate it.
• Almost all cases result from one of two causes:
1. An unequal crossing over between these loci, resulting in a normal 5’ end of the gene and a mutant 3’ end (from the pseudogene), plus deletion of all the intervening sequences.
2. Gene conversion converts part of the normal allele to the pseudogene sequence.

Hemophilia A: Inversion Problems
• The clotting factor VIII gene, F8, is on the X chromosome, and mutations in it are the major cause of hemophilia.
• F8 is a large gene, and completely contained within intron 22 are two small genes transcribed from the opposite strand.
• One of these genes, F8A, has another copy several hundred kb away, on the opposite strand. Thus, these two very similar genes are in opposite orientation.
• Sometimes these regions pair during meiosis and recombination occurs between them. This results in an inversion.
• The inversion completely disrupts the main F8 gene, because its 5’ half is now inverted and far away from its 3’ half.
• This accounts for about 45% of hemophilia A cases.
• Almost all new cases arise during male meiosis: in females, the two homologous X chromosomes are paired, which seems to inhibit this inversion.

Transposable Element Insertions
• Functional copies of LINE-1 elements, Alu sequences, and some endogenous retroviral sequences (LTR retrotransposons) exist in the human genome. They occasionally transpose into genes that give a detectable phenotype.
• The first examples found were two independent insertions of the 3’ end of LINE-1 into exons of the clotting factor VIII gene. Additional examples have been found since.
• Transposable element movement has also been implicated in cancer and the chromosome rearrangements that accompany it.
• Recombination between Alu sequences in different parts of the genome can generate deletions.

DNA Damage
• A list of agents that damage DNA:
– ionizing radiation: induces breaks in DNA
– ultraviolet light: crosslinks adjacent thymidines (thymidine dimers)
– alkylating agents: attach hydrocarbon groups to bases, either blocking DNA polymerase or crosslinking the bases
– intercalating agents: slip between the DNA bases and cause DNA polymerase to insert extra bases or misread the sequence
– depurination: the link between purine bases and the deoxyribose spontaneously breaks
– deamination: loss of the amino group from cytosine converts it to uracil
– reactive oxygen: peroxide and superoxide attack the purine and pyrimidine rings

DNA Repair
• There are at least 5 separate DNA repair mechanisms in human cells.
• Direct repair, simply reversing the damage, is possible in some cases, notably removing methyl groups from guanine.
• Base excision repair. A damaged base is removed from its sugar by a DNA glycosylase (several types). After this, the DNA strand is cut by AP endonuclease and the sugar-phosphate without its base is removed from the DNA chain. A new nucleotide is added by DNA polymerase and the chain is religated.
• Nucleotide excision repair. Abnormal bases, including thymidine dimers, are removed along with a number of surrounding bases. The missing section is then resynthesized and ligated. Xeroderma pigmentosum, a genetic disease that causes extreme sensitivity to sunlight, is due to defects in this repair system.

DNA Repair
• Post-replication repair. Double stranded breaks are repaired by randomly joining DNA ends, or by a gene-conversion-like mechanism that involves the homologous chromosome. The breast cancer susceptibility genes BRCA1 and BRCA2 are involved in this pathway.
• Mismatch repair. Mispaired bases (those not caught by the DNA polymerase’s editing function) are repaired by an enzyme complex that moves along the DNA. When it finds a mismatched base pair, it removes a number of bases on one of the DNA strands and re-synthesizes them. The gene for hereditary non-polyposis colon cancer is involved in this system.
• In addition, cells with DNA damage are often induced to kill themselves through the process of apoptosis, or they stop dividing by not entering the S phase of the cell cycle. More on this when we talk about cancer.

• In the past 10 years, at least 10,000 cancer exomes and 2500 whole genomes from tumors have been sequenced.
• Tumors contain 1000-20,000 point mutations and several hundred indels and rearrangements. Wide level of variation, mostly associated with age of the patient, and also with level of exposure to known mutagens like cigarette smoke (lungs) or UV (skin).
– Some tumors develop hypermutation due to loss of repair pathways or chromosome integrity checkpoints.
• Different mutational processes have characteristic signatures or spectra of mutations produced.
• CpG deamination is an important process.
• Another problem is the APOBEC cytidine deaminases, enzymes that deaminate cytidine during RNA editing (named after apolipoprotein B mRNA editing). Some family members, esp. APOBEC3G, also work on DNA, which is where the cancer problem arises. Induces C->T and G->A mutations. They are targeted to a very specific C, but a certain percentage of the time they deaminate the wrong C.
• Transcription-coupled DNA repair is a type of nucleotide excision repair. It repairs the transcribed strand in preference to the non-transcribed strand.
• Microhomology-mediated end joining DNA repair leads to small indels. BRCA1 and 2 affect this pathway.
• Chromothripsis: a single chromosome (sometimes 2) is broken up into many (10’s to hundreds) pieces that are randomly reassembled. Seemingly a single event, but mechanism is unclear.
• Driver mutations: the ones that are actively selected for in growing tumors. Most other mutations are just coming along by linkage.
• Currently there are 572 genes that have recurring mutations in tumors, and only 3 of them have been found in more than 10% of tumors of different types.
– TP53 is the biggest of them: in 36% of all tumors.
– Also a few regulatory mutations. One mutation in the promoter of the telomerase gene TERT is found in 71% of melanomas and more than half of bladder cancers and glioblastomas.
This creates a new transcription factor binding site that leads to overexpression of TERT.
• Somatic mutation rate: 2-10 point mutations per diploid genome per cell division. This is probably about 10x the rate in germ line cells.
• As we age, the number of accumulated mutations increases. The rate of increase of various tumors with age suggests 5 or so mutations are needed to get cancer.
• Stem cells in skin, esophagus, and lung have 3 types of cell division: asymmetric, producing a stem cell and a differentiated cell; producing 2 stem cells (proliferation); or producing 2 differentiated cells (differentiation). Changing the relative frequencies of these can cause cancer: TP53 and NOTCH1 both do this.
• Precancerous progression occurs in some tumor types: cervical cancer, colon polyps, breast ductal carcinoma in situ.
• Only 2 major forms of childhood cancer: leukemia and brain. Our brains have evolved very fast, and the DNA repair mechanisms probably haven’t kept up.
• Natural selection doesn’t work past the age of reproduction, so there is no selective advantage to preventing late life cancers.

I. Martincorena and P.J. Campbell. Somatic mutation in cancer and normal cells. Science 349: 1483-1488 (2015)
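
The inference in the slides above that "5 or so" mutations are needed comes from how steeply tumor incidence rises with age. The sketch below shows one way that reasoning could go. It assumes the classic multistage approximation (often called the Armitage-Doll model), which the slides do not name: if k rate-limiting mutations are required, incidence grows roughly as age^(k-1), so the slope of log(incidence) against log(age) estimates k - 1. The data are synthetic and generated from that assumption, not real incidence figures, and the function names are made up for this example.

```python
import math


def loglog_slope(ages, incidences):
    """Least-squares slope of log(incidence) against log(age)."""
    xs = [math.log(a) for a in ages]
    ys = [math.log(i) for i in incidences]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def estimated_hits(ages, incidences):
    """Under the ASSUMED model incidence ~ c * age**(k - 1), return k."""
    return loglog_slope(ages, incidences) + 1


# Synthetic data only: ASSUMED k = 5 and an arbitrary scale constant.
ASSUMED_K = 5
ages = [30, 40, 50, 60, 70, 80]
incidences = [1e-9 * age ** (ASSUMED_K - 1) for age in ages]

print(round(estimated_hits(ages, incidences), 2))
```

With real registry data the points would not fall exactly on a line, and the fitted k would depend on the tumor type and the age range used.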