Driscoll 1
Katee Driscoll
Dr. Ely
Genetics
October 20, 2013

Effects of CAG Repeat Length in Huntington’s Disease Patients

Huntington’s Disease (HD) is an inherited degenerative neurological condition that currently has no cure. As Gil and Rego (2008) describe in their review on HD, the disease occurs in approximately 3-10 of every 100,000 individuals. As a result of the degradation of the striatum and other brain structures (namely, the cerebellum, cerebral cortex, and thalamus), patients experience the deterioration of muscular and cognitive processes (Gil and Rego, 2008). The striatum receives nervous information from the cerebral cortex and helps to coordinate muscle movement. With its degradation, the capacity for muscular movement is diminished. Chorea is the term used to describe the abnormal involuntary movements of HD patients. Similarly, the degradation of the thalamus and the cerebellum impairs motor movement and some cognitive processes by weakening the coordination between these structures and other components of the body.

HD is caused by a trinucleotide functional polymorphism in the huntingtin gene, commonly referred to as HTT. In HD, the CAG trinucleotide sequence is repeated multiple times, resulting in alleles extended beyond what is considered to be normal length. These CAG repeats code for glutamine when they are translated, and extended alleles result in extended polyglutamine tracts in the huntingtin protein. The exact function of this protein is unclear, but it is commonly found in neural cells. In the past, most studies have simply examined the effects of this functional polymorphism by dichotomously dividing samples into a group of normal controls and a group with expanded alleles. However, a recent study by Lee et al. has approached the study of the cause of HD from a new angle (2013).
Instead of dividing samples into simply two groups, they examined samples from HD patients with varying allelic lengths and observed the correlation within the range of polymorphic phenotypic effects. To begin the study, they examined a group of isogenic mice (Hdh CAG knock-in) with varying allele sizes and discovered that the expression of many genes was correlated with allelic length. The next step of the study involved studying gene expression and CAG repeat length in human cells. To that end, Lee et al. employed 107 lymphoblastoid cell lines from various HD patients with allele lengths from 15 to 92 repeats, with approximately half of the group being male and half female (2013). With this study, Lee et al. hoped to show that the effects of the mutation on gene expression depended on the CAG repeat length and not upon differential levels of HTT mRNA expression. Using Affymetrix microarray probes for HTT mRNA, they determined that HTT mRNA levels were independent of CAG repeat length because there was no correlation between the lengths of the alleles and the signal intensities of the probes (see Figure 1). In other words, as the lengths of the alleles increased, the HTT mRNA expression levels did not change.

Once this was determined, Lee et al. tested the hypothesis that continuous effects of CAG repeats correlate with the observed variation in genome-wide gene expression by building mathematical models that use changes in gene expression to predict CAG repeat lengths (2013). The original set of 107 lymphoblastoid cell lines was then randomly split into two sets. One set, the training set, consisted of 97 cell lines and was analyzed to find probes that showed the correlation between CAG repeat length and the expression of particular genes. In other words, the probes whose corresponding gene expression levels depended most upon the CAG repeat length were ranked as the most strongly correlated.
These probes were then used to build partial least squares regression (PLSR) models that predicted the CAG length of the other set of cell lines, the test set. Partial least squares regression attempts to develop a linear relationship between two matrices containing multiple variables. One matrix contains a multitude of predictor values, which are the gene expression levels in this case, while the other matrix contains the responses, which are the CAG repeat lengths in this case.

Figure 1. The HTT mRNA expression levels as determined by the two probes (results from one are shown in red and results from the other are shown in blue) are plotted against the length of the alleles. B) The plot shows that as the length of the longer CAG repeats increased, the HTT mRNA expression levels remained approximately at the same level. C) The plot shows that as the length of the longer CAG repeats increased, the HTT mRNA expression levels remained approximately at the same level. D) The plot shows the combined effects of short plus long CAG repeat lengths on HTT mRNA expression, which is similar to the effects of each allele type individually. E and F) The linear regression analysis for each of the probes shows that the change, or slope, due to increasing CAG repeat length is very slight and statistically insignificant (Lee et al., 2013).

Optimization procedures were applied to the models in order to produce the optimal prediction model, which relied on the method of splitting the data (using some of the data as a training set to predict the HTT CAG repeat lengths for the remaining samples that comprised the other set) (Lee et al. 2013). When the model was run, it was found to predict CAG repeat lengths with a variance of 21%, which shows that the gene expression variance can be correlated to the size of the polymorphism (Lee et al. 2013).
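The train/test workflow described above can be made concrete with a small sketch. It is not the pipeline of Lee et al.: the data are synthetic, the probe count, effect size, and noise level are invented, and the model is the simplest possible PLS variant (one latent component, written out by hand instead of using a statistics library). The names `fit_pls1`, `predict_pls1`, and `r_squared` are hypothetical helpers introduced only for this illustration.

```python
import random

random.seed(0)  # fixed seed so the synthetic example is repeatable

# Synthetic stand-in data (NOT the study's data): 107 "cell lines", 20 "probes".
# By construction, only the first 5 probes depend on CAG repeat length.
N_LINES, N_PROBES, N_SIGNAL = 107, 20, 5
cag = [random.randint(15, 92) for _ in range(N_LINES)]
expr = [
    [0.1 * c + random.gauss(0, 1) if j < N_SIGNAL else random.gauss(0, 1)
     for j in range(N_PROBES)]
    for c in cag
]

# Random split into a 97-line training set and a 10-line test set.
order = list(range(N_LINES))
random.shuffle(order)
train_idx, test_idx = order[:97], order[97:]


def fit_pls1(X, y):
    """Fit a one-component partial least squares regression (PLS1)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in probe space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores: each sample projected onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centered response on the scores.
    b = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, b


def predict_pls1(model, X):
    """Predict responses for new samples from a fitted PLS1 model."""
    x_mean, y_mean, w, b = model
    return [
        y_mean + b * sum((row[j] - x_mean[j]) * w[j] for j in range(len(w)))
        for row in X
    ]


def r_squared(y_true, y_pred):
    """Fraction of variance in y_true accounted for by y_pred."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((v - mean) ** 2 for v in y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot


model = fit_pls1([expr[i] for i in train_idx], [cag[i] for i in train_idx])
test_pred = predict_pls1(model, [expr[i] for i in test_idx])
test_r2 = r_squared([cag[i] for i in test_idx], test_pred)
print(f"held-out R^2 on synthetic data: {test_r2:.2f}")
```

Because the synthetic signal is deliberately strong, the held-out fit here will be far better than anything reported for real expression data; the sketch is meant only to show how a model trained on one set of cell lines is scored on another.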
Following these results, the model was again optimized so that only the probes with the lowest error rates were used to generate the predicted length of the HTT genes. In other words, instead of using multiple probes and averaging the predicted lengths for each set, the single most accurate set of predicted lengths was used to represent the length. This model predicted CAG repeat lengths with a variance of 30%, meaning that the original unbiased model (that takes the average of the predicted lengths from all of the probes) was able to detect more of the variance (Lee et al. 2013). The reproducibility of the model was also tested in order to account for cell lines grown in different culture conditions. The subsequently correlated CAG lengths between the differently cultured cell lines showed that the results of the model were indeed reproducible despite the different culture conditions of the samples (Lee et al. 2013).

Finally, microarray probes were ranked to determine which genes have the strongest correlation with the CAG repeat size. Running optimized models 10,000 times led to the determination of the correlation rankings. A summary of the ranking is given in Figure 2. The most correlated genes were examined (using sigPathway, a bioinformatics software package that analyzes cellular pathways) and found to be involved in many cellular processes. Pathways and processes impacted involved nucleic acid metabolism, energy metabolism, and processes impacting the ribosomes of the cell (Lee et al. 2013). The multitude of cellular impacts of the CAG repeat lengths corresponds with HTT being a pleiotropic gene, meaning that it affects more than one trait. However, this does not advance the knowledge of the exact function of mutant huntingtin protein in HD patients. This study by Lee et al.
provides scientists with a better approach for understanding the causes and mechanisms of HD, as they will know to examine the effects of various repeat lengths instead of simply the extended versus normal lengths (2013). However, the CAG repeat length effect does not account for all genetic variation, and other factors must be taken into consideration, including the genetic background of individuals and environmental factors that can influence said background.

Figure 2. This is a graphical representation of the frequency with which each probe participated in the model when it was run 10,000 times. The probes shown in the top left consistently participated in the optimized models and are CAG-correlated genes. The lower probes did not participate as consistently, and their participation was most likely dependent on the way in which the samples were split into sets each time the model was run.

In addition to this study by Lee et al., another group has studied the effects of CAG repeat length on the HTT gene. Duzdevich et al. studied the effects of super-long CAG repeats on DNA structure (2011). The neuronal cells of an HD patient often contain super-long CAG repeat sequences, while the blood cells of the patient contain fewer repeats. This could simply be due to the fact that the HTT gene is expressed at a higher level in neuronal cells and therefore incurs more mutation as a result of the higher frequency of expression. Dragatsis et al. had found that in transgenic mice with super-long CAG repeats, the phenotype of the disease was delayed, and theorized that this could result from the bulky huntingtin protein’s inability to access the nucleus (2009). Duzdevich et al. hypothesized that super-long repeats would affect the structure of the DNA, resulting in deviation from the normal, linear form of DNA (2011). Unusual potential structures of DNA include hairpin loops and G-quadruplexes.
Hairpin loops occur when a strand of DNA sticks out from the main helix and ends up binding to itself as it rejoins the main helix. G-quadruplexes, which occur near telomeres, form when guanine bases associate into stable four-base, ring-like structures. Duzdevich et al. found that when DNA only contained 216 repeats, the DNA structure was normal and linear (2011). However, when the DNA contained 350 or more repeats, unusual structures began to form, and the occurrence of the structures increased with the addition of more repeat sequences (Duzdevich et al. 2011). Atomic force microscopy (AFM) was used to image the DNA samples because AFM measures substances on the level of nanometers by probing surfaces with a small mechanical probe and measuring various forces. The AFM images in Figure 3 show the typical unusual structures observed by Duzdevich et al. (2011). These structures were classified as protruding, convoluted, and folded, as explained in the figure.

Figure 3. Images of DNA structures from AFM. A) DNA consisting of a normal level of CAG repeats (8) usually exhibits normal, linear behavior. B) DNA with 216 CAG repeats continues to show largely normal, linear behavior. C) DNA with 360 CAG repeats shows unusual structural behavior. Top left inset is a normal, linear segment. Convoluted structure is lower left inset. Folded structure is upper right inset. Bottom right inset is an example of a protruding structure from another image of the same product. D) Pictures show more convoluted structures. E) Graph details the relationship between increase in repeat length and increase in unusual structures. (Duzdevich et al. 2011)

Duzdevich et al. also conducted a comparison of the different DNA structures formed and discovered that DNA with more CAG repeats developed hairpin loops and that more force was required to pull apart the DNA with more CAG repeats (2011).
This result is intuitive because DNA with hairpin loops will have DNA packed more tightly together at more places and will thus be harder to separate. Based on these results, Duzdevich et al. postulated that unusual DNA structures formed by super-long CAG repeats reduced transcription levels and led to reduced phenotypic expression as well. While Lee et al. provided evidence that increasing CAG repeat length did not result in altered levels of HTT mRNA, Duzdevich et al. provided evidence, based upon previous studies and their own experiments, that super-long CAG repeat lengths could in fact impact gene expression and transcription levels.

In addition to this, one of the most important consequences of longer than normal CAG repeat length is its effect on the age of onset. The Huntington’s Disease Collaborative Research Group has documented the inverse relationship between allele length and age at onset of clinical symptoms (1993). In other words, the longer an allele is, the earlier symptoms of motor and cognitive distress will be expressed. Likewise, Lee et al. showed that a strong negative correlation existed between the age at onset of clinical symptoms and predicted CAG lengths, supporting this earlier research (2013). When a patient has 60 or more CAG repeats, he or she usually expresses the symptoms before the age of 20 years (Andresen et al. 2007). However, HD symptoms also appear differently in younger patients as opposed to older patients. As Bates et al. established, juvenile-onset HD patients’ brains deteriorate faster than adult-onset HD patients’ brains (2002). One of the major differences in symptom expression between children and adults is that children are often hypokinetic, whereas adults are hyperkinetic. In hypokinesia, patients experience a gradual decrease in muscular activity; in hyperkinesia, patients experience an increase in abnormal muscular movements.
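As a rough illustration of the inverse relationship between repeat length and age at onset noted above, the sketch below fits a straight line to the natural logarithm of age at onset as a function of CAG repeat number, which is one common way to express a single exponential curve. Everything numeric here is an assumption: the data are synthetic, and the intercept, slope, and noise level are invented for illustration and are not taken from any cited study. The helpers `fit_line` and `predicted_age` are hypothetical names.

```python
import math
import random

random.seed(1)  # fixed seed so the synthetic example is repeatable

# Synthetic data only: these parameters are invented for illustration.
TRUE_INTERCEPT, TRUE_SLOPE = 6.0, -0.05
repeats = [random.randint(40, 70) for _ in range(200)]
ln_age = [TRUE_INTERCEPT + TRUE_SLOPE * r + random.gauss(0, 0.2) for r in repeats]


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(repeats, ln_age)


def predicted_age(n_repeats):
    """Back-transform the log-linear fit to an age at onset in years."""
    return math.exp(intercept + slope * n_repeats)


print(f"fitted slope of ln(age at onset) per repeat: {slope:.3f}")
print(f"predicted onset at 45 repeats: {predicted_age(45):.1f} years")
print(f"predicted onset at 60 repeats: {predicted_age(60):.1f} years")
```

A negative fitted slope means that each additional repeat multiplies the predicted age at onset by a constant factor below one, which is what a single exponential curve implies.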
Instead of modeling the relationship between age at onset and the number of CAG repeats with one exponential regression curve, Andresen et al. developed a new model to account for the differences in slope between the adults and the juveniles. By analyzing the data from a population of Venezuelan individuals and then from a population from an earlier HD study, they developed new regression curves that have two slopes, as seen below in Figure 4.

Figure 4. Graphs of CAG repeat lengths versus the natural logarithm of the age at onset. This figure shows the old regression curves and the new ones for the two different populations. In each graph, the dashed line represents the old regression line with only one slope and the solid line represents the new regression curve with a point of inflection where the slope changes value. The population on the left consisted of 443 Venezuelan individuals with HD. The point of inflection for this population occurred at 53 CAG repeats. The population on the right consisted of 692 individuals in the “HD MAPS” study. The inflection point for this group occurred at 49 repeats. (Andresen et al. 2007).

Various explanations for the differences in effect that the CAG repeats have in adult and juvenile HD have been proposed. One explains the differences as a result of the mutant huntingtin protein’s various cellular locations, which could be dependent on the size of the mutant allele. In juvenile HD, the proteins are more frequently found in the nuclei, whereas in adult HD they are more commonly found in the cytoplasm (DiFiglia et al. 1997). Another explanation proposes that as the polyglutamine tract in the mutant proteins increases, the tract might operate via more toxic mechanisms. Andresen et al.
proposed that eight other neurodegenerative diseases involving CAG repeat mutations be studied because their similarities to HD in terms of juvenile and adult diseases could provide some insight into how exactly the CAG repeat affects the cell.

Overall, scientists are a long way from developing a cure for Huntington’s Disease because the mechanisms behind it are not completely understood. However, it is apparent that the length of the CAG repeats does play a role and should be regarded as a continuum of effects instead of simply a dichotomous display of effects based upon “normal versus abnormal” allelic lengths. Perhaps a complete structural analysis of the allele and resulting protein at each specific length needs to be conducted in order to determine precisely at which points changes in the pathogenic mechanisms occur. Once these changes can be understood as a result of varying repeat lengths, the parameters surrounding each repeat length ought to be applied to other molecules and pathways that will be impacted in order to determine where the neuronal degradation begins. Understanding the whole impact of this trinucleotide repeat sequence is imperative because it has implications in several other diseases and will provide us with the knowledge to counteract the effects of the mutations.

Works Cited

1. Bates G, Harper P, Jones L. 2002. Huntington’s Disease. 3rd Edition. Oxford, UK: Oxford University Press.
2. DiFiglia M, Sapp E, Chase KO, Davies SW, et al. 1997. Aggregation of huntingtin in neuronal intranuclear inclusions and dystrophic neurites in brain. Science 277:1990-1993.
3. Dragatsis I, Goldowitz D, Del Mar N, Deng YP, et al. 2009. CAG repeat lengths ≥335 attenuate the phenotype in the R6/2 Huntington’s disease transgenic mouse. Neurobiology of Disease 33:315-330.
4. Duzdevich D, Li J, Whang J, Takahashi H, et al. 2011. Unusual structures are present in DNA fragments containing super-long huntingtin CAG repeats. PLoS ONE 6:e17119.
5. Gil J, Rego AC. 2008. Mechanisms of neurodegeneration in Huntington’s disease. European Journal of Neuroscience 27:2803-2820.
6. The Huntington’s Disease Collaborative Research Group. 1993. A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. Cell 72:971-983.
7. Lee JM, Galkina E, Levantovsky R, Fossale E, et al. 2013. Dominant effects of the Huntington’s disease HTT CAG repeat length are captured in gene-expression data sets by a continuous analysis mathematical modeling strategy. Human Molecular Genetics 22:3227-3238.