Driscoll 1
Katee Driscoll
Dr. Ely
Genetics
October 20, 2013
Effects of CAG Repeat Length in Huntington’s Disease Patients
Huntington’s Disease (HD) is an inheritable degenerative neurological condition,
which currently has no cure. As Gil and Rego (2008) describe in their review on HD, this
disease occurs in approximately 3-10 of every 100,000 individuals. As a result of the
degradation of the striatum and other brain structures (namely, the cerebellum, cerebral
cortex and thalamus), patients experience the deterioration of muscular and cognitive
processes (Gil and Rego, 2008). The striatum receives nervous information from the
cerebral cortex and helps to coordinate muscle movement. With its degradation, the
ability to control muscular movement is diminished. Chorea is the term used to describe the
abnormal involuntary movements of HD patients. As with the striatum, the degradation
of the thalamus and the cerebellum impacts motor movement and some cognitive
processes by weakening the coordination between these structures and other components
of the body.
HD is caused by a functional trinucleotide repeat polymorphism in the huntingtin gene,
commonly referred to as HTT. In HD, the CAG trinucleotide sequence is repeated
many times, resulting in alleles extended beyond what is considered to be normal
length. Each CAG codon is translated as glutamine, so extended
alleles produce extended polyglutamine tracts in the huntingtin protein. The exact
function of this protein is unclear, but it is commonly found in neural cells. In the past,
most studies have simply examined the effects of this functional polymorphism by
dichotomously dividing samples into a group of normal controls and a group consisting
of expanded alleles.
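The link between repeat count and polyglutamine tract length can be illustrated with a minimal Python sketch. The sequence below is a hypothetical toy allele (the flanking bases are invented for illustration), and the function names are assumptions of this sketch rather than anything used in the studies discussed. The search also ignores reading frame, which a real analysis would need to respect.

```python
import re


def longest_cag_run(dna: str) -> int:
    """Return the number of CAG units in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


def polyglutamine_tract(repeats: int) -> str:
    """CAG encodes glutamine (Q), so n repeats translate to n consecutive Q residues."""
    return "Q" * repeats


# Hypothetical toy allele: 42 CAG repeats between invented flanking bases.
toy_allele = "ATGGCG" + "CAG" * 42 + "CCGCCA"

repeat_count = longest_cag_run(toy_allele)
tract = polyglutamine_tract(repeat_count)
print(repeat_count, len(tract))
```

Treating the repeat count as a number, rather than as a "normal" or "expanded" label, is what allows the continuous analyses described in the rest of this paper.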
However, a recent study by Lee et al. has approached the cause of
HD from a new angle (2013). Instead of simply dividing samples into two groups, they
examined samples from HD patients with varying allelic lengths and looked for
correlations across the full range of repeat lengths. To begin the study, they
examined a group of isogenic mice (Hdh CAG knock-in) with varying allele sizes and
discovered that the expression of many genes was correlated with allelic length.
The next step of the study involved examining gene expression and CAG repeat length in
human cells. To do so, Lee et al. employed 107 lymphoblastoid cell lines from various
HD patients with allele lengths from 15 to 92 repeats, with approximately equal
numbers of males and females (2013).
With this study, Lee et al. aimed to show that the effects of the mutation on gene
expression depended on the CAG repeat length and not upon differential levels of HTT
mRNA expression. Using Affymetrix microarray probes for HTT mRNA, they
determined that HTT mRNA levels were independent of CAG repeat length because there
was no correlation between the lengths of the alleles and the signal intensities of the
probes (see Figure 1). In other words, as the lengths of the alleles increased, the HTT
mRNA expression levels did not change. Once this was determined, Lee et al. tested the
hypothesis that continuous effects of CAG repeats correlate with the observed variation
in genome-wide gene expression by building mathematical models that use changes in
gene expression to predict CAG repeat lengths (2013). The original set of 107
lymphoblastoid cell lines was then randomly split into two sets. One set, the training set,
consisted of 97 cell lines and was analyzed to find probes that showed a correlation
between CAG repeat length and the expression of particular genes. In other words, the probes
whose corresponding gene expression levels depended most strongly upon the CAG repeat
length were ranked as the most strongly correlated. These probes were then used to build
partial least squares regression (PLSR) models that predicted the CAG length of the other
set of cell lines, the test set. Partial least squares regression attempts to find
a linear relationship between two matrices containing multiple variables. One matrix
contains the predictor values, which are the gene expression levels in this case,
while the other matrix contains the responses, here the CAG repeat lengths.
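To make the idea of PLSR more concrete, the following is a minimal sketch of a one-component partial least squares model with a single response (often called PLS1), written in plain Python. It is not the model built by Lee et al.: the two "probes", their coefficients, and the repeat lengths are invented toy values, and a real analysis would use many probes, several components, and an established statistics package.

```python
def mean(values):
    return sum(values) / len(values)


def fit_pls1_one_component(X, y):
    """Fit a one-component PLS1 model.

    X is a list of rows (samples x probes); y is a list of responses.
    """
    n, p = len(X), len(X[0])
    x_means = [mean([row[j] for row in X]) for j in range(p)]
    y_mean = mean(y)
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [value - y_mean for value in y]
    # Weight vector: the direction in probe space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores: the projection of each centered sample onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centered response on the scores.
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return x_means, y_mean, w, q


def predict(model, row):
    x_means, y_mean, w, q = model
    score = sum((row[j] - x_means[j]) * w[j] for j in range(len(row)))
    return y_mean + q * score


def toy_expression(cag):
    """Hypothetical expression of two probes that vary linearly with CAG length."""
    return [5.0 + 0.1 * cag, 9.0 - 0.05 * cag]


# Training set: toy repeat lengths and their invented expression profiles.
train_y = [17, 20, 40, 45, 60, 80]
train_X = [toy_expression(cag) for cag in train_y]

model = fit_pls1_one_component(train_X, train_y)
# Test sample: predict the repeat length of a held-out profile.
predicted = predict(model, toy_expression(50))
print(round(predicted, 3))
```

Because the toy data are noise-free and depend on a single underlying variable, one component is enough to recover the held-out repeat length. Real expression data are noisy, which is why the accuracy of such a model has to be assessed on a separate test set.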
Figure 1. The HTT mRNA expression levels as determined by the two probes (results from one are shown in
red and results from the other are shown in blue) are plotted against the length of the alleles. B) The plot
shows that as the length of the longer CAG repeats increased, the HTT mRNA expression levels remained
approximately at the same level. C) The plot shows that as the length of the longer CAG repeats increased,
the HTT mRNA expression levels remained approximately at the same level. D) The plot shows the
combined effects of short plus long CAG repeat lengths on HTT mRNA expression, which is similar to the
effects of each allele type individually. E and F) The linear regression analysis for each of the probes shows
that the change, or slope, due to increasing CAG repeat length is very slight and is statistically
insignificant. (Lee et al., 2013).
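The kind of check summarized in Figure 1 can be sketched as an ordinary least-squares fit of probe intensity against repeat length. The numbers below are invented for illustration and are not data from Lee et al.; the function name is likewise an assumption of this sketch. A formal test of significance would additionally require a t statistic for the slope, which is omitted here.

```python
def slope_and_r(x, y):
    """Return the least-squares slope of y on x and the Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


# Hypothetical toy data: CAG lengths and HTT probe signal intensities.
cag_lengths = [17, 20, 40, 45, 60, 80]
htt_signal = [8.1, 7.9, 8.0, 8.2, 7.9, 8.0]

slope, r = slope_and_r(cag_lengths, htt_signal)
print(f"slope = {slope:.5f}, r = {r:.3f}")
```

A slope and a correlation close to zero, as in this toy example, are what one would expect if HTT mRNA levels do not depend on repeat length.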
Optimization procedures were applied to the models to produce the best
prediction model, which relied on splitting the data (using some of
the data as a training set to predict the HTT CAG repeat lengths for the remaining
samples that comprised the other set) (Lee et al. 2013). When the model was run, it was
found to predict CAG repeat lengths with a variance of 21%, which shows that the gene
expression variance can be correlated to the size of the polymorphism (Lee et al. 2013).
Following these results, the model was again optimized so that only the probes with the
lowest error rates were used to generate the predicted length of the HTT genes. In other
words, instead of using multiple probes and averaging the predicted lengths for each set,
the single most accurate set of predicted length was used to represent the length. This
model predicted CAG repeat lengths with a variance of 30%, meaning that the original
unbiased model (that takes the average of the predicted lengths from all of the probes)
was able to detect more of the variance (Lee et al. 2013). The reproducibility of the
model was also tested in order to account for cell lines grown under different culture
conditions. The CAG lengths predicted for the differently cultured
cell lines were correlated, showing that the results of the model were reproducible despite the different
culture conditions of the samples (Lee et al. 2013).
Finally, microarray probes were ranked to determine which genes have the
strongest correlation with the CAG repeat size. Running optimized models 10,000 times
led to the determination of the correlation rankings. A summary of the ranking is given in
Figure 2. The most correlated genes were examined (using sigPathway, a bioinformatics
package that analyzes cellular pathways) and found to be involved in many cellular
processes. The pathways and processes affected included nucleic acid metabolism, energy
metabolism, and processes involving the ribosomes of the cell (Lee et al. 2013). The
multitude of cellular effects of CAG repeat length is consistent with HTT being a
pleiotropic gene, meaning that it affects more than one trait. However, this
does not advance the knowledge of the exact function of mutant huntingtin protein in HD
patients. This study by Lee et al. provides scientists with a better approach for
understanding the causes and mechanisms of HD, as they will now know to examine the
effects of various repeat lengths instead of simply the extended versus normal lengths
(2013). However, the CAG repeat length effect does not account for all genetic variation
and other factors must be taken into consideration, including the genetic background of
individuals and environmental factors that can influence said background.
Figure 2. This is a graphical representation of the frequency with which each probe participated in the
model when it was run 10,000 times. The probes signified in the top left consistently participated in the
optimized models and are CAG-correlated genes. The lower probes did not participate as consistently and
their participation was most likely dependent on the way in which the samples were split into sets when the
model was run each time.
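The participation count plotted in Figure 2 can be approximated with a short simulation: split the samples at random many times, rank the probes on each training set, and count how often each probe makes the cut. The sketch below ranks probes by absolute Pearson correlation, which is an assumption standing in for the selection procedure of Lee et al. The probe names, expression values, and parameters are all hypothetical.

```python
import random
from collections import Counter


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def participation_counts(expression, cag, n_runs=200, train_size=8, top_k=2, seed=0):
    """Count how often each probe ranks in the top_k over random training sets."""
    rng = random.Random(seed)
    probes = list(expression)
    samples = list(range(len(cag)))
    counts = Counter()
    for _ in range(n_runs):
        train = rng.sample(samples, train_size)
        y = [cag[i] for i in train]
        strength = {
            p: abs(pearson_r([expression[p][i] for i in train], y)) for p in probes
        }
        for p in sorted(probes, key=strength.get, reverse=True)[:top_k]:
            counts[p] += 1
    return counts


# Hypothetical toy data: probe_A tracks CAG length; probe_B and probe_C are noise.
cag = [17, 19, 21, 40, 42, 45, 48, 55, 60, 80]
expression = {
    "probe_A": [5.0 + 0.1 * c for c in cag],
    "probe_B": [7.2, 6.8, 7.5, 7.0, 6.9, 7.4, 7.1, 6.7, 7.3, 7.0],
    "probe_C": [3.1, 3.4, 2.9, 3.3, 3.0, 3.2, 2.8, 3.5, 3.1, 3.0],
}

counts = participation_counts(expression, cag)
for probe, count in counts.most_common():
    print(probe, count)
```

In this toy setting the truly correlated probe is selected in every run, while the selection of the noise probes depends on how the samples happen to be split, which mirrors the pattern described in the caption of Figure 2.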
In addition to this study by Lee et al., another group has studied the effects of
CAG repeat length on the HTT gene. Duzdevich et al. studied the effects of super-long
CAG repeats on DNA structure (2011). The neuronal cells of an HD patient often contain
super-long CAG repeat sequences, while the blood cells of the patient contain fewer
repeats. This could simply be due to the fact that the HTT gene is expressed at a higher
level in neuronal cells and therefore incurs more mutation as a result of the higher
frequency of expression. Dragatsis et al. had found that in R6/2 transgenic mice with super-long
CAG repeats, the phenotype of the disease was delayed and theorized that this could
result from the bulky huntingtin protein’s inability to access the nucleus (2009).
Duzdevich et al. hypothesized that super-long repeats would affect the structure of the
DNA, resulting in deviation from the normal, linear form of DNA (2011). Unusual
potential structures of DNA include hairpin loops and G-quadruplexes. Hairpin loops
occur when a strand of DNA folds back and base-pairs with itself,
forming a stem that protrudes from the main helix. G-quadruplexes, which are found near
telomeres among other places, form when sets of four guanine
bases associate into stable, planar ring-like structures. Duzdevich et al. found that when
DNA contained only 216 repeats, the DNA structure was normal and linear (2011).
However, when the DNA contained 350 or more repeats, unusual structures began to
form and the occurrence of the structures increased with the addition of more repeat
sequences (Duzdevich et al. 2011). Atomic force microscopy (AFM) was used to image
the DNA samples because AFM resolves features on the scale of nanometers by
scanning surfaces with a fine mechanical probe and measuring the forces on it. The AFM
images in Figure 3 show the typical unusual structures observed by Duzdevich et al.
(2011). These structures were classified as protruding, convoluted, and folded, as
explained in the figure.
Figure 3. Images of DNA structures from AFM. A) DNA consisting of a normal level of CAG repeats (8)
usually exhibits normal, linear behavior. B) DNA with 216 CAG repeats continues to show largely normal,
linear behavior. C) DNA with 360 CAG repeats shows unusual structural behavior. The top left inset is a normal,
linear segment. The convoluted structure is the lower left inset. The folded structure is the upper right inset. The bottom right
inset is an example of a protruding structure from another image of the same product. D) Pictures show more
convoluted structures. E) Graph details the relationship between increase in repeat length and increase in
unusual structures. (Duzdevich et al. 2011)
Duzdevich et al. also conducted a comparison of the different DNA structures
formed and discovered that DNA with more CAG repeats developed hairpin loops and
that more force was required to pull apart the DNA with more CAG repeats (2011).
This result is intuitive because DNA with hairpin loops will have DNA packed more
tightly together at more places and will thus be harder to separate. Based on these results,
Duzdevich et al. postulated that unusual DNA structures formed by super-long CAG
repeats reduced transcription levels and led to reduced phenotypic expression as well.
While Lee et al. provided evidence that increasing CAG repeat length did not result in
altered levels of HTT mRNA, Duzdevich et al. provided evidence based upon previous
studies and their own experiments that showed that super-long CAG repeat lengths could
in fact impact gene expression and transcription levels. In addition, one of the
most important consequences of longer-than-normal CAG repeat length is its effect on the
age at onset.
The Huntington’s Disease Collaborative Research Group has documented the
inverse relationship between allele length and age at onset of clinical symptoms (1993).
In other words, the longer an allele is, the earlier symptoms of motor and cognitive
distress will be expressed. Likewise, Lee et al. showed that a strong negative correlation
existed between the age at onset of clinical symptoms and predicted CAG lengths,
supporting this earlier research (2013). When a patient has 60 or more CAG repeats, he
or she usually expresses the symptoms earlier than the age of 20 years (Andresen et al.
2007). However, HD symptoms also appear differently in younger patients as opposed to
older patients. As Bates et al. established, juvenile-onset HD patients’ brains deteriorate
faster than adult-onset HD patients’ brains (2002). One of the major differences in
symptom expression between children and adults is that children are often hypokinetic,
whereas adults are hyperkinetic. In hypokinesia, patients experience a gradual decrease in
muscular activity; in hyperkinesia, patients experience an increase in abnormal muscular
movements. Instead of modeling the relationship between age at onset and the number of
CAG repeats with one exponential regression curve, Andresen et al. developed a new
model to account for the differences in slope between the adults and the juveniles. By
analyzing the data from a population of Venezuelan individuals and then from a
population from an earlier HD study, they developed new regression curves that have
two slopes, as seen below in Figure 4.
Figure 4. Graphs of CAG repeat lengths versus the natural logarithm of the age at onset. This figure shows
the old regression curves and the new ones for the two different populations. In each graph, the dashed line
represents the old regression lines with only one slope and the solid line represents the new regression
curves with a point of inflection where the slope changes values. The population on the left consisted of
443 Venezuelan individuals with HD. The point of inflection for this population occurred at 53 CAG
repeats. The population on the right consisted of 692 individuals in the “HD MAPS” study. The inflection
point for this group occurred at 49 repeats. (Andresen et al. 2007).
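A two-slope model of this kind can be sketched by fitting separate straight lines to the natural logarithm of age at onset on either side of a candidate breakpoint and keeping the breakpoint with the smallest total squared error. This is a simplified stand-in, not the method of Andresen et al.: the two lines are not forced to meet at the breakpoint, and the ages below are generated from invented slopes rather than taken from patient data.

```python
import math


def fit_line(x, y):
    """Least-squares line through (x, y): returns slope, intercept, squared error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return slope, intercept, sse


def best_breakpoint(cag, log_age, min_points=3):
    """Return (first CAG length of the upper segment, lower slope, upper slope)."""
    pairs = sorted(zip(cag, log_age))
    best = None
    for split in range(min_points, len(pairs) - min_points + 1):
        left, right = pairs[:split], pairs[split:]
        slope_left, _, sse_left = fit_line(*zip(*left))
        slope_right, _, sse_right = fit_line(*zip(*right))
        total = sse_left + sse_right
        if best is None or total < best[0]:
            best = (total, right[0][0], slope_left, slope_right)
    return best[1:]


# Hypothetical toy data: the slope of ln(age at onset) changes at 50 repeats.
cag = [40, 42, 44, 46, 48, 52, 54, 56, 58, 60]
ages = [
    math.exp(4.0 - 0.05 * (c - 40)) if c < 50 else math.exp(3.5 - 0.02 * (c - 50))
    for c in cag
]
log_age = [math.log(a) for a in ages]

breakpoint_cag, slope_lower, slope_upper = best_breakpoint(cag, log_age)
print(breakpoint_cag, round(slope_lower, 3), round(slope_upper, 3))
```

Both fitted slopes are negative, reflecting the inverse relationship between repeat length and age at onset, but they differ on either side of the breakpoint, which is the feature a single regression curve cannot capture.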
Various explanations for the differences in effect that the CAG repeats have in
adult and juvenile HD have been proposed. One explains the differences as a result of the
mutant huntingtin protein’s various cellular locations, which could be dependent on the
size of the mutant allele. In juvenile HD, proteins are more frequently found in the nuclei,
whereas the proteins are more commonly found in the cytoplasm in adult HD (DiFiglia et
al. 1997). Another explanation proposes that as the polyglutamine tract in the mutant
proteins increases, the tract might operate via more toxic mechanisms. Andresen et al.
proposed that eight other neurodegenerative diseases involving CAG repeat mutations be
studied because their similarities to HD in terms of juvenile and adult diseases could
provide some insight into how exactly the CAG repeat affects the cell.
Overall, scientists are a long way from developing a cure for Huntington’s
Disease because the mechanisms behind it are not completely understood. However, it is
apparent that the length of the CAG repeats does play a role and should be regarded as a
continuum of effects instead of simply a dichotomous display of effects based upon
“normal versus abnormal” allelic lengths. Perhaps a complete structural analysis of the
allele and resulting protein at each specific length needs to be conducted in order to
determine precisely at which points changes in the pathogenic mechanisms occur. Once
these changes can be understood as a result of varying repeat lengths, then the parameters
surrounding each repeat length ought to be applied to other molecules and pathways that
will be impacted in order to determine where the neuronal degradation begins.
Understanding the whole impact of this trinucleotide repeat sequence is imperative
because it has implications in several other diseases and will provide us with the
knowledge to counteract the effects of the mutations.
Works Cited
1. Bates G, Harper P, Jones L. 2002. Huntington’s Disease. 3rd Edition. Oxford
University Press, Oxford, UK.
2. DiFiglia M, Sapp E, Chase KO, Davies SW, et al. 1997. Aggregation of
huntingtin in neuronal intranuclear inclusions and dystrophic neurites in brain.
Science 277:1990-1993.
3. Dragatsis I, Goldowitz D, Del Mar N, Deng YP, et al. 2009. CAG repeat lengths
≥335 attenuate the phenotype in the R6/2 Huntington’s disease transgenic mouse.
Neurobiology of Disease 33:315-330.
4. Duzdevich D, Li J, Whang J, Takahashi H, et al. 2011. Unusual structures are
present in DNA fragments containing super-long huntingtin CAG Repeats. PLoS
ONE 6:e17119.
5. Gil J, Rego AC. 2008. Mechanisms of Neurodegeneration in Huntington’s Disease.
European Journal of Neuroscience 27:2803-2820.
6. The Huntington’s Disease Collaborative Research Group. 1993. A novel gene containing
a trinucleotide repeat that is expanded and unstable on Huntington’s Disease
chromosomes. Cell 72:971-983.
7. Lee JM, Galkina E, Levantovsky R, Fossale E, et al. 2013. Dominant effects of
the Huntington’s disease HTT CAG repeat length are captured in gene-expression
data sets by a continuous analysis mathematical modeling strategy. Human
Molecular Genetics 22:3227-3238.