MA 584: Statistical Methods in Bioinformatics
Lagoy, R.C. & Burnette, K.
“Significant Gene Expression and Variance in Patients with
Autism and Relatedness to Paternal Age:
further analysis adapted from Alter et al. (2011), PLoS ONE”
Ross C. Lagoy & KaLia Burnette
WORCESTER POLYTECHNIC INSTITUTE
MA584/BCB584: Statistical Methods in Genetics and Bioinformatics
Instructor: Zheyang Wu, PhD
Date: May 6, 2014
I. Table of Contents
I. Table of Contents
II. List of Figures
III. List of Tables
V. Abstract
1.0 Background
1.1 Research Need
1.2 Current Research
1.3 Genome Analysis Method(s)
1.4 Objective(s)
2.0 Methods
2.1 Data Acquisition
2.2 Microarray Data Analysis
2.3 Statistical Analysis
2.4 Variance Across Sample Set(s) & Tests of Normality
2.5 Pearson’s Correlation Coefficient
2.6 Spearman’s Rank Correlation
2.7 Unpaired Student’s T-tests
2.8 Empirical Bayes Statistics
3.0 Results and Discussion
3.1 Complete Data Set Analysis
3.2 Gene Variance & Test of Normality
3.3 Variance in Gene Expression: Replicated Data
3.4 Heatmap Illustration(s) of Significant Differential Gene Expression
3.5 Statistically Significant Autism Linked Genes
4.0 Conclusions & Future Directions
5.0 Appendix
6.0 References
II. List of Figures
Figure 1. Generation and analysis of gene expression from Alter et al.’s Affymetrix probe data set. There were 146 total patients, 64 controls, 82 cases. PBLs were assayed from each individual and stored as a data matrix with samples along the columns and 54,000+ genes along the rows.
Figure 2. Visualization of the complete Alter et al. data set using boxplots (A,B), and a MAplot (C). (A) Illustrates the raw RMA probe values (y-axis) of each individual (x-axis). (B) Illustrates the log2 transformed RMA values plotted in the same manner as (A): controls (green), cases (red). (C) MAplot of the complete log2 transformed data matrix.
Figure 3. Variance in gene expression was not normally distributed across the entire population (A) or control subjects (B) of the log2 transformed data set values (x-axis), reproduced as described from Alter et al.
Figure 4. Variance was normally distributed within experimental groups: (A) autism, (B) children of older fathers, and (C) children of younger fathers of the log2 transformed data set values (x-axis), reproduced as described from Alter et al.
Figure 5. Paternal age was normally distributed across the entire study populations and within relevant experimental groups: (A) all ages, (B) controls, (C) autism, reproduced as described from Alter et al.
Figure 6. Increased paternal age at birth is negatively associated with overall variance in gene expression in peripheral blood lymphocytes of normal children. (A) Illustrating our generated plot replicating (B) Alter et al.’s plot, Fig. 3a.
Figure 7. Further analysis of Figure 6 (above) demonstrating no association between paternal ages and variance in gene expression at birth for autism subjects illustrated as (A) a scatter plot trend and (B) boxplot, described by Alter et al., but not shown in their report.
Figure 8. Heat map illustrations of the top 10 genes calculated by empirical Bayes statistics and sorted by lowest p-value (A.1-A.3), logFC (B.1-B.3), and highest average expression (C.1-C.3). (A.1-C.1) Rows (genes) sorted by Pearson’s coefficient and columns (patients) by Spearman’s correlation. (A.2-C.2) Rows sorted by default, columns sorted by Spearman’s correlation. (A.3-C.3) Rows and columns sorted by default (as shown in Table 1-3). Scale top left: green is low expression (RMA), and red is high expression, black is neutral.
Figure 9. Of the 8,400 genes with p<0.05 calculated using empirical Bayes statistics, 3 are supported as autism linked (A): (B) METTL12, (C) UBE3A, and (D) OXT; AUT = Autism, CTRL = control.
III. List of Tables
Table 1. Table(s) of the top 10 genes calculated by empirical Bayes statistics and sorted by lowest p-value. Gene descriptions are included next to the Affymetrix probe code and select genes are highlighted for interest relevant to Alter et al.’s results and novel findings.
Table 2. Table of the top 10 genes calculated by empirical Bayes statistics and sorted by logFC. Gene descriptions are included next to the Affymetrix probe code and select genes are highlighted for interest relevant to Alter et al.’s results and novel findings.
Table 3. Table of the top 10 genes calculated by empirical Bayes statistics and sorted by average expression. Gene descriptions are included next to the Affymetrix probe code and select genes are highlighted for interest relevant to Alter et al.’s results and novel findings.
Table 4. Additional overlap of our 48 genes reported to be autism-linked with a p-value in a range of +0.03 of 0.05.
V. Abstract
Autism spectrum disorder (ASD) is one of the most common neurodevelopmental
disorders among children. It is defined and diagnosed by impairment in social
interaction, language, and range of interest. Autism is also often diagnosed with other
medical conditions such as seizures and anxiety. These co-occurring conditions and the wide
range of variability in autism have presented challenges in finding a universally accepted
polymorphism that causes this disorder; therefore, multiple co-existing genes have been
correlated, identified, and modeled in vitro and in vivo. Microarray analysis of gene
expression is a critical tool that can allow for the association of specific genes with
various genetic conditions, especially for genome-wide diseases such as ASDs.
Statistical tests and data mining techniques allow for the discovery of distinct expression
levels of gene subsets in individual patients and across populations.
The project chosen for this class uses freely available data from the GEO dataset
browser to (1) replicate some of the preliminary analysis presented in a paper from Alter
et al. using the GEOquery and BioConductor package(s) in R. These results will confirm
our understanding of the large data set, and provide a starting point to (2) extrapolate new
conclusions from this study as well as (3) provide future directions and ask new
questions related to the investigation of autism risk genes. In this study we found
three known autism-linked genes (OXT, METTL12, and ZSCAN18) with
significant differences between experimental groups and interesting expression profiles
when clustered as heat maps, to be further investigated.
Significant Gene Expression in Patients with Autism
1.0 Background
1.1 Research Need
Autism spectrum disorder (ASD) is estimated to affect about 1 in 68 children [1],
ranking it as one of the most common neurodevelopmental disorders among children [2].
Autism is a heterogeneous syndrome defined by impairments in three core domains:
social interaction, language, and range of interests [3]. The disorder has an estimated
heritability of greater than 90%; however, its specific genetic etiology is unknown [4].
Autism is often diagnosed together with other medical conditions [5], such as
seizures or anxiety, which may significantly affect the identification and
treatment needs of the diagnosed individuals [6]. Because of the challenges presented
by these co-occurring conditions, and the wide range of variability in ASD, no
universally accepted susceptibility polymorphism has been found from current research
efforts [2]. However, multiple distinct rare changes in specific genes have been
identified in small subsets of individuals that may cause or contribute to ASDs [7].
1.2 Current Research
Continued efforts in whole-genome linkage studies are used to identify potentially
important disease-risk loci. Current research supports that ASD may be caused by a
single genetic mutation in addition to many relatively rare mutations [3]. It is generally
reported that this disorder is of developmental onset, with an unknown primary cause.
The dominant hypothesis attributes the disorder to cellular, regional, or systemic dysfunction
influenced by environmental factors and heredity (such as paternal age, as reported by
Alter et al. in 2011 using bioinformatics techniques). Genetic causes of autism spectrum
disorders are also known to affect intracellular signaling pathways. Defective synaptic
function and abnormal brain connectivity are proposed biological themes that may
produce the heterogeneous characteristics of ASD while supporting the notion that
there is great variation among the rare genetic mutations found in these patients [8,9].
Additional hypotheses suggest defects in inhibitory synapses in patients with ASDs,
thus accounting for co-diagnosis with seizures [10]. Dysregulation of neurotransmitters such as serotonin
and oxytocin could also account for abnormalities within and
outside of the central nervous system including the brain, and imbalances of these
neurochemicals could cause ASD symptoms [11]. Calcium signaling has also been
proposed as a possible contributing mechanism to ASD [12]. Therefore,
abnormalities due to cellular dysfunction may be caused by genetic mutations and could
be identified, assessed, and correlated through standard bioinformatics methods by
surveying diagnosed and healthy populations using microarray technology.
A few specific proteins have been identified and reported as significantly
associated with autism. The ubiquitin protein ligase E3A, also known as UBE3A,
and the inhibitory neurotransmitter receptor subunit gamma-aminobutyric acid type A receptor beta 3 (GABRB3) are
currently thought to play a central role in ASD. Other research has shown the potential
involvement of centaurin gamma 2 (CENTG2), a GTPase-activating protein. The
synaptic adaptor protein(s), SHANK1-3, and similar scaffolding proteins have also
been shown to be strongly linked to the disorder. Differential expression of these genes,
and of others identified in the literature, can be further identified and correlated
through state-of-the-art genome analysis and bioinformatics
techniques.
1.3 Genome Analysis Method(s)
Gene expression is a critical measure that can be quantified and translated to
describe the levels at which particular genes are expressed within a cell, tissue, or organism.
One method of genetic analysis is genome-wide expression analysis with
commercially available microarrays (e.g., Affymetrix) containing specific hybridized
nucleotide arrangements (Figure 1). Microarray analysis of gene expression involves
the screening of purified and fluorescently labeled DNA or mRNA (generally from
patients’ peripheral blood lymphocytes, PBLs), which binds to complementary hybridized
transcripts immobilized as an organized array on a chip, called probes. Each probe
represents a specific gene (cDNA or mRNA), which translates to a fluorescence
intensity and thus an expression level (Figure 1, middle). Thus, different expression
levels can be observed via fluorescence intensities: the brighter the fluorescence
signal is for a probe, the more complementary oligonucleotides have bound, and the
higher the inferred prevalence in the individual’s blood (expression value), and vice versa
for low intensity signals. These recordings are assigned relative values across the
microarray containing 54,000+ genes that survey the entire genome. Statistical tests
must be done to normalize the data and accurately interpret the identification of
differently expressed genes. These tests, along with data-mining and associative
techniques, allow for the discovery of distinct expression levels in individual patients of
gene subsets and across a population of healthy individuals (controls) and diagnosed
patients (cases).
[Figure 1: schematic overlaid on the first page of Alter et al. (2011), “Autism and Increased Paternal Age Related Changes in Global Levels of Gene Expression Regulation”, PLoS ONE 6(2): e16715, doi:10.1371/journal.pone.0016715. Panel labels: Patients (64 controls, 82 cases); Peripheral Blood Lymphocytes (RNA from PBL); Affymetrix Human U133 Plus 2.0 Expression Array; Data Values (RMA), Log2 transformed.]
Figure 1. Generation and analysis of gene expression from Alter et al.’s Affymetrix probe data
set. There were 146 total patients, 64 controls, 82 cases. PBLs were assayed from each individual
and stored as a data matrix with samples along the columns and 54,000+ genes along
the rows.
1.4 Objective(s)
A recent study reported that mouse behavior, in genetically identical animals,
could be accurately predicted by evaluating patterns of distribution of gene expression
levels when assessed by variance in the distribution [13]. This same group applied this
method of studying the overall pattern of gene expression distribution to test the
hypothesis that autism is associated with alterations in global levels of gene expression
regulation, using Affymetrix probe data [14]. These researchers also analyzed the
correlation between patterns of the gene expression distribution and paternal age. They
found a list of genetic transcripts (down-regulated zinc and transcription pathways) that
overlap between control patients with older fathers and patients with autism.
For this project, we will use Alter et al.’s freely available and accessible
Affymetrix data from the GEO dataset browser to (1) replicate some of the preliminary
analysis presented in this paper using the GEOquery and BioConductor package(s) in R.
These results will confirm our understanding of the large data set, and provide a starting
point to (2) extrapolate new conclusions from this study as well as (3) provide future
directions and ask new questions related to the investigation of autism-risk genes.
2.0 Methods
2.1 Data Acquisition
Original data from Alter et al.’s study were obtained from blood samples taken
from 146 subjects, 82 cases and 64 controls. Paternal age was also recorded for 78 of the children
with autism and 57 of the controls. Briefly, total mRNA was isolated
from these samples (PBLs) and double round amplified, cleaned, and biotin-labeled
using Affymetrix’s GeneChip Two-Cycle Target Labeling kit with a T7 promoter and
Ambion’s MEGAscript T7 High Yield Transcription kit. The Affymetrix Human Genome
U133 Plus 2.0 Array allows for complete coverage of the Human Genome U133 set plus 6,500 additional
genes for analysis of over 47,000 transcripts. Arrays were washed with the prepared
samples, stained, and scanned. Raw signal intensity values were extracted per probe
set on the array and scaled by a factor of 150 to normalize the array signal intensity in
Microarray Analysis Suite (MAS) 5.0. The raw data extracted from these scans were
pre-processed using MAS 5.0 so the gene expression values were not altered. Robust
Multiarray Analysis (RMA) was used to normalize, summarize, and publish the data, as
noted on the NCBI GEO accession display webpage.
2.2 Microarray Data Analysis
Microarray data for this study was obtained as a pre-processed and summarized
Microarray Analysis Suite (MAS) 5.0 and Robust Multiarray Analysis (RMA) SOFT file
from the National Center for Biotechnology Information (NCBI) Gene Expression
Omnibus (GEO) database repository, dataset accession number GDS4431. SOFT files
were imported into R-3.1.0 for data processing and further statistical analysis.
2.3 Statistical Analysis
RMA expression levels were log2 transformed and all following analyses were
completed using R (V. 3.1.0) BioConductor packages, loading the following libraries:
GEOquery, Biobase, gplots, preprocessCore, genefilter, limma, annotate, and
hgu95av2.db, as described in our R code file. The statistical analyses used in this
study are listed below.
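The log2 transformation step can be illustrated with a short sketch. The report's own analysis was carried out in R; the Python below is only an illustration, and the function name `log2_transform` and the toy matrix are hypothetical, not taken from the authors' code file.

```python
import math

def log2_transform(matrix):
    """Return the log2 of every value in a genes-by-samples matrix.

    `matrix` is a list of rows (one row per probe, one column per sample).
    All values are assumed to be strictly positive intensities.
    """
    return [[math.log2(value) for value in row] for row in matrix]

# Toy intensities (hypothetical): 2 probes x 3 samples.
raw_values = [
    [8.0, 16.0, 32.0],
    [1024.0, 512.0, 2048.0],
]
log2_values = log2_transform(raw_values)
```

On a log2 scale a doubling of intensity becomes a difference of one unit, which is why the transformed box plots occupy a much narrower range than the raw values.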
2.4 Variance Across Sample Set(s) & Tests of Normality
To first replicate the authors’ observations, overall variance in gene expression
was calculated across the genome for the subgroups (autism, children of
older fathers, and children of younger fathers), and the distribution of paternal age was
examined across the entire study population and within relevant experimental groups.
Histograms for each of these sample subsets were generated. This also allowed the
authors, and us, to determine whether parametric tests were appropriate by using the
Shapiro-Wilk test on these primary measures. An MA plot was also generated to check
and visualize the normalization of the population’s log2 transformed RMA values.
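The per-sample variance described above (one variance value per individual, taken across all genes) can be sketched as follows. This is a minimal Python illustration under the assumption that the data are held as a genes-by-samples matrix; the name `per_sample_variance` and the toy values are hypothetical. The Shapiro-Wilk test itself is not reproduced here because the Python standard library does not provide it; in R it is available as `shapiro.test`.

```python
import statistics

def per_sample_variance(matrix):
    """Variance of expression across all genes, computed separately per sample.

    `matrix` is a list of rows (genes); each row holds one value per sample.
    Returns a list with one sample variance (n - 1 denominator) per column.
    """
    columns = zip(*matrix)  # transpose: iterate over samples instead of genes
    return [statistics.variance(column) for column in columns]

# Toy log2 expression matrix (hypothetical): 3 genes x 2 samples.
log2_expr = [
    [3.0, 4.0],
    [5.0, 4.0],
    [7.0, 4.0],
]
sample_variances = per_sample_variance(log2_expr)
```

The resulting list of variances is what would then be binned into a histogram for each group and passed to a normality test.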
2.5 Pearson’s Correlation Coefficient
Pearson’s correlation coefficient between two variables is defined as the
covariance of the two variables divided by the product of their standard deviations. As
in Alter et al., p-values below 0.05 were considered statistically significant.
Pearson’s coefficient was also used to cluster our generated heat maps by genes
(rows).
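The definition above translates directly into code. The following Python sketch is illustrative only (the report used R), and the function name `pearson_r` is hypothetical. The 1/(n - 1) factors of the covariance and of the two standard deviations cancel, so they are omitted.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (proportional to the covariance).
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations (proportional to the SDs).
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (dev_x * dev_y)

r_positive = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r_negative = pearson_r([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
```

For clustering, a distance such as 1 - r between gene rows is a common choice, although the exact distance used for the heat maps is not stated in this section.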
2.6 Spearman’s Rank Correlation
Spearman’s correlation assesses how well the relationship between two
variables can be described using a monotonic function. This method is generally used
to assess the relationship between two variables when the data are not normally
distributed. This rank correlation was used to cluster our generated heat maps by individuals
(columns).
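Spearman's correlation can be computed as Pearson's correlation applied to the ranks of the data. The sketch below is a minimal Python illustration of that idea, with ties given their average rank; the function names are hypothetical and the example data are invented.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (dev_x * dev_y)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

tied = average_ranks([10.0, 20.0, 20.0, 30.0])
# A monotonic but non-linear relationship still gives rho = 1.
rho = spearman_rho([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 4.0, 9.0, 16.0, 25.0])
```

Because only the ordering of the values matters, the measure is insensitive to the non-normal shape of the underlying distributions.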
2.7 Unpaired Student’s T-tests
This statistical test compares two groups of normally distributed data to test
whether the means of the distributions are different. The p-value represents the
probability of observing a difference at least as large as the one measured if the group
means were in fact equal. As in Alter et al., p-values below 0.05 were considered
statistically significant.
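The test statistic for an unpaired Student's t-test with pooled variance can be sketched as below. This Python example is illustrative only; the function name `student_t` and the two toy groups are hypothetical. Converting t to a p-value requires the t distribution, which the Python standard library lacks; in R, `t.test(a, b, var.equal = TRUE)` reports both.

```python
import math
import statistics

def student_t(group_a, group_b):
    """Unpaired two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal variances in both groups.
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df

# Toy expression values for one gene in two groups (hypothetical).
t_stat, dof = student_t([1.0, 2.0, 3.0], [3.0, 4.0, 5.0])
```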
2.8 Empirical Bayes Statistics
Empirical Bayes methods are used for statistical inference when the prior
distribution is estimated from the data, rather than when the prior distribution is fixed
before any data are observed. This is viewed as an approximation to a fully Bayesian
treatment of a hierarchical model where the parameters at the highest hierarchical level
are set to their most likely values instead of being integrated out. A table was generated
by subjecting the differential expression data set to empirical Bayes statistics. Our
generated table was ranked in order of p-value, adjusted p-value, logFC,
and average expression. Heat maps were generated from the resulting data and clusters
were observed for further investigation.
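One common use of empirical Bayes in microarray analysis is to shrink each gene's variance toward a value estimated from all genes before computing a test statistic. The Python sketch below illustrates only that shrinkage idea and is not the authors' procedure: the prior variance is taken crudely as the mean of the per-gene variances, and `prior_df` is a hypothetical fixed input, whereas dedicated packages estimate both hyperparameters from the data.

```python
import statistics

def shrink_variances(gene_variances, residual_df, prior_df):
    """Shrink per-gene variances toward a prior variance estimated from all genes.

    Each result is a weighted average of the prior variance and the gene's own
    variance, weighted by their degrees of freedom.
    Assumptions: prior variance = mean of all gene variances (a crude stand-in),
    and prior_df supplied by the caller rather than estimated.
    """
    prior_var = statistics.mean(gene_variances)
    return [
        (prior_df * prior_var + residual_df * s2) / (prior_df + residual_df)
        for s2 in gene_variances
    ]

# Toy per-gene sample variances (hypothetical).
gene_variances = [0.5, 1.0, 4.5]
moderated = shrink_variances(gene_variances, residual_df=4, prior_df=4)
```

Genes with unusually small variances are pulled upward and genes with large variances are pulled downward, which stabilizes the ranking of genes when each group contains few samples.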
3.0 Results and Discussion
3.1 Complete Data Set Analysis
After importing the GEO data set into R, the RMA values were
plotted as a box plot. This plot showed us the range and vast amount of
data we were working with; however, it did not yield visually usable information. Thus, the
gene expression data were log2 transformed and a second box plot was created from the
transformed values. This scaled our working range of gene expression values down to
about 0 to 15, with samples organized by controls (green) and cases (red), a color scheme
used throughout the report.
Figure 2. Visualization of the complete Alter et al. data set using boxplots (A,B), and a MAplot
(C). (A) Illustrates the raw RMA probe values (y-axis) of each individual (x-axis). (B) Illustrates
the log2 transformed RMA values plotted in the same manner as (A): controls (green), cases
(red). (C) MAplot of the complete log2 transformed data matrix.
An MA plot was also generated to visualize how well the population's log2 transformed
RMA values were normalized. The x-axis (A) describes the average log intensities of the
population and the y-axis (M) plots the difference in average log intensities across
individuals (Figure 2C). This plot also provides a visual check of the intensity dependence
of the log ratios. The majority of values fall along y = 0 (±0.2), suggesting a normalized
matrix of log2 RMA values, especially for RMA values between 0 and 6 (low expression),
which can be further analyzed as a cluster or independently.
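The M and A values behind such a plot can be sketched as follows. This is a minimal illustration in Python rather than the R code used for the report. It assumes the standard two-array definition (M is the log2 ratio, A is the mean log2 intensity); the intensities and the helper name `ma_values` are hypothetical, and the report does not state which arrays were paired.

```python
from math import log2
from statistics import mean


def ma_values(sample_a, sample_b):
    """Return (A, M) pairs for two arrays of raw (non-log) probe intensities."""
    pairs = []
    for x, y in zip(sample_a, sample_b):
        lx, ly = log2(x), log2(y)
        a = (lx + ly) / 2  # A: average log2 intensity of the probe
        m = lx - ly        # M: log2 ratio between the two arrays
        pairs.append((a, m))
    return pairs


# Hypothetical intensities for four probes on two arrays (not real data).
array_1 = [64.0, 128.0, 1024.0, 16.0]
array_2 = [64.0, 256.0, 512.0, 16.0]

points = ma_values(array_1, array_2)

# Fraction of probes whose M value lies within +/-0.2 of zero.
fraction_near_zero = mean(1.0 if abs(m) <= 0.2 else 0.0 for _, m in points)
```

A well-normalized pair of arrays would be expected to have most M values close to zero across the whole range of A.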
3.2 Gene Variance & Test of Normality
Alter et al. first described the trends of their data matrix in terms of variance
among experimental groups and paternal age distribution. Using the Shapiro-Wilk Test,
they observed that variance in gene expression was not normally distributed across the
entire population or control subjects. We calculated the variance across all genes for the
entire population and control subjects, plotted a histogram for each, used the
Shapiro-Wilk Test, and confirmed this observation (Figure 3).
Figure 3. Variance in gene expression was not normally distributed across the entire population
(A, "Population Variance") or control subjects (B, "Control Variance") of the log2 transformed
data set values (x-axis), reproduced as described from Alter et al.
Using the same tests, the authors also observed that variance was normally
distributed within experimental groups: (1) autism, (2) children of older fathers, and
(3) children of younger fathers. We confirmed this by plotting histograms and using the
Shapiro-Wilk Test (Figure 4).
Figure 4. Variance was normally distributed within experimental groups: (A) autism, (B) children
of older fathers, and (C) children of younger fathers of the log2 transformed data set values
(x-axis), reproduced as described from Alter et al.
Lastly, the authors stated that paternal age was normally distributed across the
entire study population and within relevant experimental groups: (1) autism and (2)
controls. We again confirmed this by plotting histograms and using the Shapiro-Wilk Test
(Figure 5). Based on these analyses of normality, parametric tests were used when
appropriate.
Figure 5. Paternal age was normally distributed across the entire study population and within
relevant experimental groups: (A) all ages, (B) controls, (C) autism, reproduced as described
from Alter et al.
3.3 Variance in Gene Expression: Replicated Data
Since we obtained results and trends similar to the authors', we decided to
move forward and regenerate some of their graphs to further demonstrate that we
understand their data matrix before applying new analysis methods and looking for novel
results. We first attempted to replicate Alter et al.’s Figure 2, plotting variance in gene
expression (z-score in units of standard deviation) along the y-axis, with each
experimental group as an individual bar. The authors are not clear about how they
calculated these values, so we attempted two methods: (1) calculating variance either
across all genes or across all patients for individual groups, and (2) calculating z-scores
either across all genes or across all patients for individual groups and finding the
standard deviation of the resulting list (Appendix 1). Neither calculation reproduced
what the authors showed in Figure 2: we saw values that did not differ between groups.
Nonetheless, the authors concluded that overall variance in gene expression in peripheral
blood lymphocytes was decreased in children with Autism. We do not fully agree with this
conclusion, since the reported values do not appear to be a direct calculation of variance,
and the units alone do not support the inference unless the calculation is explained
more clearly.
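The two attempted calculations can be sketched as follows. This is only an illustration in Python, not the R code from Appendix 1: the small matrix is hypothetical, rows are assumed to be genes and columns subjects, and the function names are ours. The last lines also show a property worth keeping in mind for method (2): when z-scores are standardized within a list and the standard deviation of that same list is then taken, the result is 1 by construction, so such a value cannot differ between groups.

```python
from statistics import mean, pstdev, variance

# Hypothetical log2 expression matrix: rows are genes, columns are subjects.
expr = [
    [2.0, 4.0, 6.0],
    [1.0, 1.0, 1.0],
    [3.0, 5.0, 10.0],
]


def variance_across_genes(matrix):
    """One sample variance per subject (column), taken over all genes."""
    return [variance(col) for col in zip(*matrix)]


def variance_across_subjects(matrix):
    """One sample variance per gene (row), taken over all subjects."""
    return [variance(row) for row in matrix]


def zscores(values):
    """Standardize a list to mean 0 and population standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


per_subject = variance_across_genes(expr)   # method (1), across genes
per_gene = variance_across_subjects(expr)   # method (1), across patients

# Method (2): z-score the list, then take its standard deviation.
sd_of_z = pstdev(zscores(per_subject))      # equals 1 up to rounding
```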
Therefore, we continued with our first method of calculating variance across all
genes for each experimental group and binning the paternal ages together to generate
Figure 3A in Alter et al.’s paper. We did not generate the same y-axis scale, but noticed
the same negative trend (Figure 6), agreeing that increased paternal age at birth is
negatively associated with overall variance in gene expression in peripheral blood
lymphocytes of healthy children.
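One common way to put a number on such a trend is the Pearson correlation coefficient r and its square. The sketch below is a minimal Python illustration under stated assumptions: the ages and variance values are made-up placeholders, not the study data, and `pearson_r` is our own helper name.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the SDs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Placeholder values for illustration only (not taken from the data set).
paternal_age = [24, 28, 31, 35, 42]
overall_variance = [5.0, 4.6, 4.7, 4.1, 3.9]

r = pearson_r(paternal_age, overall_variance)
r_squared = r * r  # proportion of variance explained by a linear fit
```

A negative r corresponds to the downward trend described above; whether it is statistically significant would still need a separate test.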
Figure 6. Increased paternal age at birth is negatively associated with overall variance in gene
expression in peripheral blood lymphocytes (PBL) of normal children. (A) Our plot, generated in
R, replicating (B) Alter et al.’s plot, Fig. 3a (Alter et al. (2011) PLoS).
This result was promising, and we used the same calculation method to
investigate the autism experimental group. Just as the authors concluded (but did not
show), we noticed that the variance in gene expression had no trend across paternal
age (Figure 7). This was also a promising result, suggesting that our calculation of
“variance” is similar to the one the authors describe, though not exactly the same. Since
we were more interested in the data set as two individual experimental groups, not yet
sub-grouped by paternal age (the authors already provided substantial investigation of
up-regulated and down-regulated genetic differences between these groups), we moved
forward by comparing the two groups inclusively, across all genes, to investigate
statistically significant differences in gene expression as a whole.
Figure 7. Further analysis of Figure 6 (above) demonstrating no association between paternal
age at birth and variance in gene expression for autism subjects, illustrated as (A) a scatter
plot with trend and (B) a boxplot, described by Alter et al. but not shown in their report.
3.4 Heatmap Illustration(s) of Significant Differential Gene Expression
An empirical Bayes statistic was used to calculate p-value, logFC, and average
expression for all 54,000+ genes. Of the 54,000+ genes, 8,400 were statistically
different (p < 0.05) between the two experimental groups. For each of the three
categories, a heat map and descriptive table were generated for the top ten genes, sorted
by lowest p-value, logFC, and highest average expression, respectively. For the heat maps,
each row (gene) was first organized by Pearson’s coefficient, and each column (individual)
was organized and clustered by Spearman’s correlation. Additional organizations of the
rows and columns were generated as shown and described in Figure 8 to more clearly
visualize clustered expression levels. The maps are color scaled, with green as the lowest
expression, black as neutral, and red as high expression. Each heat map was visually
inspected for significant clustering of high or low expression of genes or groups of
individuals within the map. Descriptions of the ten genes were reviewed in the literature
for biological relevance (known or unknown protein-coding function(s)) and/or reports that
have linked them to Autism or related genetic pathways.
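The three sort orders used to pick genes for the heat maps can be sketched as below. This is a hedged Python illustration rather than the R code used for the report: the probe names and statistics are invented, the field names are ours, and sorting logFC by absolute magnitude is an assumption, since the report does not say whether sign was considered.

```python
# Hypothetical per-probe results (placeholders, not values from the data set).
results = [
    {"probe": "probe_a", "p_value": 0.001, "logFC": 0.40, "ave_expr": 7.2},
    {"probe": "probe_b", "p_value": 0.200, "logFC": -1.10, "ave_expr": 12.5},
    {"probe": "probe_c", "p_value": 0.030, "logFC": 0.05, "ave_expr": 3.1},
    {"probe": "probe_d", "p_value": 0.049, "logFC": -0.70, "ave_expr": 9.8},
    {"probe": "probe_e", "p_value": 0.600, "logFC": 0.90, "ave_expr": 5.0},
]


def top_n(rows, key, n=10, reverse=False):
    """Return the first n rows after sorting by the given key function."""
    return sorted(rows, key=key, reverse=reverse)[:n]


# Probes passing the p < 0.05 threshold.
significant = [row for row in results if row["p_value"] < 0.05]

# n=2 here only because the toy table is small; the report used the top ten.
by_p_value = top_n(results, key=lambda row: row["p_value"], n=2)
by_log_fc = top_n(results, key=lambda row: abs(row["logFC"]), n=2, reverse=True)
by_ave_expr = top_n(results, key=lambda row: row["ave_expr"], n=2, reverse=True)
```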
Two zinc-related genes, zinc finger protein (ZSCAN18) and zinc ring finger
protein (ZFP36L2), were among the top ten when the data set was arranged by p-value
and logFC, which is the log2 fold change in expression between experimental groups.
Alter et al. also reported significance regarding down-regulated genes that are enriched
for biological pathways related to transcription and zinc in blood of children with Autism
and controls with paternal ages older than 31. This suggests additional significance of
these genes and their variants probed for on this chip between experimental groups. Also,
a human brain protein (hCG_2033649), Nance-Horan Syndrome (NHS) related protein, and
calcium binding protein (S100A8) showed up on these sorted lists. These genes have been
shown to have possible genetic linkage to Autism in previous studies but not by
Alter et al., suggesting that our analysis has provided additional insight into their data
set.
The heat maps generated from the top ten statistically significant genes all
showed clustering trends, some with high expression in the Autism group and lower
expression in the control group or vice versa (Figure 8A.2). The individuals within these
clustered regions should be investigated further to extract additional information,
like paternal age, and assessed for similarities found by Alter et al. The heat map
generated by logFC sorting gave rise to two almost identically clustered genes
(HLA-DQA1 and HLA-DQB1, both disease-associated genes), but with no apparent
segregation of disease states (Figure 8B.2) and little statistical difference. Although
there is little difference between groups for these genes, a subgroup (like paternal age)
may show a significant difference; the same holds for all genes in these lists. The highest
average expression heat maps (Figure 8C.1-C.3) show little clustering difference between
experimental groups (as also indicated by their p-values), but do show expression level
differences for a couple of genes and could be examined further for clustering of
individuals. Another interesting heat map could use the lowest average expression to
further investigate the extremes of gene expression.
Additional analysis can be conducted following this method to plot and
investigate more genes. We were limited to plotting heat maps of 1,000 genes at a time
before R would stall when executing the code; a server would be needed for
larger data processing.
Figure 8. Heat map illustrations of the top 10 genes calculated by empirical Bayes statistics and
sorted by lowest p-value (A.1-A.3), logFC (B.1-B.3), and highest average expression (C.1-C.3).
(A.1-C.1) Rows (genes) sorted by Pearson’s coefficient and columns (patients) by Spearman’s
correlation. (A.2-C.2) Rows sorted by default, columns sorted by Spearman’s correlation.
(A.3-C.3) Rows and columns sorted by default (as shown in Tables 1-3). Scale top left: green is
low expression (RMA), red is high expression, and black is neutral.
The table below shows the Affymetrix gene code and corresponding gene name
with a description found online (Table 1). The logFC, average expression, t-value, and
resulting p-value are recorded for each of the top ten categorized genes. Highlighted
genes have either been previously reported in the literature to be associated with Autism
or were also found by Alter et al., but through a different analysis method.
Table 1. Table of the top 10 genes calculated by empirical Bayes statistics and sorted by
lowest p-value. Gene descriptions are included next to the Affymetrix probe code and select
genes are highlighted for interest relevant to Alter et al.’s results and novel findings.
Table 2. Table of the top 10 genes calculated by empirical Bayes statistics and sorted by logFC.
Gene descriptions are included next to the Affymetrix probe code and select genes are
highlighted for interest relevant to Alter et al.’s results and novel findings.
Table 3. Table of the top 10 genes calculated by empirical Bayes statistics and sorted by
average expression. Gene descriptions are included next to the Affymetrix probe code and
select genes are highlighted for interest relevant to Alter et al.’s results and novel findings.
3.5 Statistically Significant Autism Linked Genes
To further assess the generated list of 8,400 statistically different autism genes
(Table 1A), we compiled our own list of 48 genes and their related isoforms that have
been associated with Autism. We searched the 8,400 genes for any of our 48
genes and computed statistical significance using a t-test and the empirical Bayes
method (which yielded the same results, since the values for these genes were normally
distributed within experimental groups). Not all Autism genes identified in the
literature were probed for on the Affymetrix mRNA chip, but of the ones searched, three
showed up with p-values < 0.05. These genes are shown numerically (Figure 9A) and
as boxplots (Figure 9B-D) for each group’s mean expression value.
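For a single gene, the t-test comparison between the two groups can be sketched as follows. The report does not say which form of the t-test was used, so this Python illustration assumes an unpaired, equal-variance Student's t-test; the expression values and the function name are hypothetical. Only the statistic and degrees of freedom are computed here, because the p-value requires the t distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Unpaired equal-variance t statistic and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    # Pooled sample variance across both groups.
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Placeholder log2 expression values for one gene (not real data).
autism = [1.0, 2.0, 3.0]
control = [4.0, 5.0, 6.0]

t_stat, dof = student_t(autism, control)
```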
Figure 9. Of the 8,400 genes with p<0.05 calculated using empirical Bayes statistics, 3 are
supported as autism linked (A): (B) METTL12, (C) UBE3A, and (D) OXT. Boxplot y-axes show
log2 RMA values; AUT = Autism, CTRL = control.
This is an exciting discovery from a biologist's perspective, since oxytocin (a
neurotransmitter), methyltransferase (DNA regulation protein), and a ubiquitin ligase
protein are all involved in neurologically relevant biochemical systems in Autism. This
offers further motivation to investigate these genes and similar biological pathways in in
vitro and in vivo models, as well as potential targeted drug delivery based on real patient
data. We also decided to search the list of sorted genes that were within 0.03 units above
the 0.05 cutoff, and noticed additional highly relevant autism-linked genes: two also
relating to DNA methylation, an oxytocin receptor gene, and notably SHANK2, a
scaffolding protein which has been recently investigated and reported to have a high
prevalence of mutation in autism cases (Table 4).
Table 4. Additional overlap of our 48 genes reported to be autism-linked with a p-value up to
0.03 above the 0.05 cutoff.
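The search of the results for literature-derived candidate genes, including the band just above the cutoff, can be sketched as below. This is a minimal Python illustration with assumptions stated up front: the gene names and p-values are placeholders, `classify_candidates` is a hypothetical helper, and genes absent from the results are treated as not probed on the chip.

```python
def classify_candidates(p_values, candidates, cutoff=0.05, margin=0.03):
    """Split candidate genes into significant and near-significant lists."""
    significant, near_cutoff = [], []
    for gene in sorted(candidates):
        p = p_values.get(gene)
        if p is None:
            continue  # candidate gene was not probed on the chip
        if p < cutoff:
            significant.append(gene)
        elif p <= cutoff + margin:
            near_cutoff.append(gene)
    return significant, near_cutoff


# Placeholder results and candidate list (not values from the report).
p_values = {"GENE_A": 0.01, "GENE_B": 0.07, "GENE_C": 0.20, "GENE_D": 0.049}
candidates = {"GENE_A", "GENE_B", "GENE_C", "GENE_E"}

significant, near_cutoff = classify_candidates(p_values, candidates)
```

Genes in the near-cutoff list are not statistically significant at 0.05; they are only flagged as candidates for follow-up.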
A few additional questions, alluded to in this report, could be investigated as future
directions using our analysis as a starting point, with inferred applications:
(1) Are the three autism-risk genes we report on related to paternal age when
subgrouped?
(2) Can we sort the heat map clusters by autism-linked genes and paternal age
subgroups?
(3) What “genetic class” is the majority of the 8,400 statistically significant
(p<0.05) reported genes?
4.0 Conclusions & Future Directions
Bioinformatics is a valuable tool for processing and analyzing large biological
data sets, especially for diseases with a wide spectrum of genetic causes
and differential expression, like Autism. The statistical methods used in this report
revealed potential avenues for possible gene linkage discoveries and confirmed a
few preexisting ones. Future work includes further exploration of subgrouping
differential expression based on paternal age, which may suggest possible gene linkages
and demographic prevalence. Genes that have and have not been previously
linked to Autism were identified in this report, and could also be investigated using
other biological methods to test for significance in this highly prevalent
neurodevelopmental disorder among children.
1: Replicate
the Au
Aim 1: Replicate theAim
Authors
Data
younger than the
co
hypothesized,
theref
statistically
Pearson’s
correlation
coefficient
(r) mean
between
hypothesized,
therefore,
factors
that
modified
risk variables
for
statistically
younger
thansignificant.
the
control that
group
(autism:
- 5.5two
years
SDautism
- 2.1;is
Pearson’ssignificant.
correlation coefficient (r) between two variables is
control:
mean - 7.9
in
the
general
po
defined
as
the
covariance
of
the
two
variables
divided
by
the
in
the
general
population
might
alter
the
variance
in
gene
control: mean - 7.9 SD - 2.2, p,.0001), we tested whether subject
defined as the covariance of the two variables divided by the
age had an effect
expression
in a on
ma
product
standard
deviations.
The the
square
of the
expression
intheir
a on
manner
that
resembled
variance
inoverall
gene
Analysis
covariance
Analysis
covariance
age
had anofof
effect
the relationship
between
diagnosis
andPearson’s
product ofoftheir
standard deviations. The square of the Pearson’s
2
variance. Aninanalysi
expression
childr
correlation
coefficient
(R
) estimates
the
proportion
of
variance
in
expression
in
children
with
autism.
Paternal
age
was
found
in
2 was used to examine the potential
Analysis
of
covariance
was
used
to
examine
the
potential
Analysis
of
covariance
variance.
An
analysis
of
covariance
(ANCOVA)
demonstrated
that
correlation coefficient (R ) estimates the proportion of variance in
even whenstudies
controllin
to
aconfounding
dependent
variable
(e.g.
variance)
accounted
for byK. multiple
multiple
to ofbe
aoverall
risk
factor
forthat
autism
and
other
effects
subject
ageage,
andvariance
scan
batch
on
aexpression
significant
effects
of(e.g.
subject
agevariance)
and scan
batch
on a significant
even
whenstudies
controlling
for
subject
inis&
gene
MA
584: variable
Statistical
Methods
in
Bioinformatics
Lagoy,
R.C.
Burnette,
aconfounding
dependent
overall
that
is accounted
for by
continued to be sign
neurodevelopmenta
an
independent
variable
(e.g.
paternal
age).
Fisher’s
transformaneurodevelopmental
disorders
such
as
schizophrenia
and
mental
association
of
autism
with
decreased
overall
variance.
Variance
was
association
of
autism
with
decreased
overall
variance.
Variance
was
continued
to
be
significantly
decreased
by
the
same
amount
in
the
an independent variable (e.g. paternal age). Fisher’s transformablood of children
larw
retardation
[26,28,3
tion
was
used
tolarge-scale
calculate
a p-value
from
the
Pearson’s
r. control),
our
organization
ofbut
gene
retardation
[26,28,34,35].
We
found
that= in
controls,
not
in Determining
the dependent
variable,
diagnosis
(21
autism,
+1
=(p
the dependent
= the
autism,
+1 =r. control),
blood
of children
with autism
compared
to
controls
=For
.018,
tion
was used tovariable,
calculatediagnosis
a p-value(21
from
Pearson’s
For our Determining
parameter
estimate
children
with
autis
analyses
we
considered
p-value
ofin
.05
to expression
statistically
children
with
autism,
overall
variance
gene
was
subjectexpression
age,
and
scan
batch
(21
= lower
batch
1,autism).
+1
=bebatch
2).
All
expressi
subject age,
scan batch
(21 = batch
1, +1
2). All
in aStd
blood
of
children
parameter
estimate
= 2.45
Dev
in
When
scan
analyses
we and
considered
a p-value
of .05
to =bebatch
statistically
batch was included
significantly
and ne
significant.
significantly
and negatively
associated
paternal
agethe
(Pearson
possible
termssubject
were
also
in the model.
For
our
5.0ourAppendix
possible interaction terms were also included in the model. For
batch
wasinteraction
included
with
age included
inwith
the ANCOVA
results
significant.
with
a
with
autism
and
controls
2
remained
= .08
ralso
= 2.283,
R2 signif
= .08, p =
.03,
parameter
estimate
==
2.054
ralso
= 2.283,
analyses
weRconsidered
a (p
p-value
of
.05 to be
statistically
significant.
analyses we considered a p-value of .05 to be statistically significant.
remained
significant
= .03,
parameter
estimate
2.42 Std
Std
Dev
lower
in
autism
Spearman’s
rank correlation
(rho)
Dev
lower in autism).
Importantly,
parameter estimates for the
Spearman’s rank correlation (rho)
relationship of diagn
rank correlation
assesses
well the
relationship
ChiSpearman’s
squareof diagnosis
relationship
to variance
werehow
virtually
unchanged
in
ChiSpearman’s
square rank correlation assesses how well the relationship
the ANCOVAs indi
between
two indicating
variables
can
be
described
usingwere
a monotonic
An
internet
based
262
chi
square
contingency
table
(http://
the
ANCOVAs
that
increasing
p-values
related
to
An
internet
based
262
chi
square
contingency
table
(http://
between two variables can be described using a monotonic
increases in the deg
function.
It
is
used
to
assess
the
relationship
between
2
variables
www.graphpad.com/quickcalcs/contingency1.cfm)
was not
used
increases
in the degrees of freedom in the analysis, and
to ato
www.graphpad.com/quickcalcs/contingency1.cfm)
was2 variables
used to
function. It is used to assess the relationship between
decreased associatio
when data
issignificant
not normally
distributed.
assess
for
a
overlap
between
gene
lists.
A
chi-square
decreased
association.
assess
for
a
significant
overlap
between
gene
lists.
A
chi-square
when data is not normally distributed.
with
Yates
correction
was
used
to
calculate
chi
squared
and
a
twowith Yates correction was used to calculate chi squared and a twoIncreased paterna
variance( Increased
tailed
p-value.
Unpaired
Student’sage
T-tests
tailed
p-value.
paternal
is associated with decreased
Unpaired
Student’s T-tests
overall variance i
Compares
two
groups
of
normally
distributed
data
to
test
overall variance in gene expression levels (figure 3)
Compares two groups of normally distributed data to test
Previous work in
whether
the
means
of
the
distributions
are
different.
The
p-value
Results
Results
Previous work in mice indicated that factors or interventions
whether the means of the distributions are different. The p-value
that modified mo
represents
the mouse
probability
that the distributions
are actually
that
modified
hippocampal-dependent
behavior
also
represents the probability that the distributions are actually
Decreased
variance
in log-transformed
modified overall va
Decreased
overall variance in log-transformed measures
different.overall
Foroverall
ourvariance
analyses
considered
p-value ofmeasures
.05 toWe
be
modified
inwethe
predicteda direction
[20].
different. For our analyses we considered a p-value of .05 to be
hypothesized, theref
statistically
significant.
of
gene
expression
predicts
the
diagnosis
of
autism
of
gene
expression
predicts
the
diagnosis
of
autism
hypothesized, therefore, that factors that modified risk for autism
statistically significant.
in the general po
1 andpopulation
2)
(figures 1 and 2)
in(figures
the general
might alter the variance in gene
expression in a ma
Analysis
of
covariance
We
used
microarrays
to
measure
the
expression
levels
of
greater
We usedofmicroarrays
to measure the expression levels of greater
expression in a manner that resembled the variance in gene
Analysis
covariance
expression in childr
than
47,000
transcripts
including
38,500
well-characterAnalysis
ofunique
covariance
was
usedPaternal
to examine
the found
potential
than
47,000
unique
transcripts
including
38,500
well-characterexpression
in
children
with
autism.
age
was
in
Analysis of covariance was used to examine the potential
multiple studies to
ized
human
genes
using
the
Affymetrix
Human
U133
Plus
2.0
confounding
effects
of
subject
age
and
scan
batch
on
a
significant
ized
human
genes
using
the
Affymetrix
Human
U133
Plus
2.0
multiple studies to be a risk factor for autism and other
confounding effects of subject age and scan batch on a significant
Figure
2. Overall
neurodevelopmenta
Figure
2. Overall
variance
in from
geneoverall
expression
inVariance
peripheral
microarray
with
purified
peripheral
blood
lymphocytes
association
of
autism
withRNA
decreased
variance.
was
microarray
with
purified
RNA
from
peripheral
blood
lymphocytes
neurodevelopmental
disorders
such
as
schizophrenia
and
mental
association of autism with decreased overall variance. Variance was
blood
lymphocyte
blood
lymphocytes
(PBL)
wassporadic
decreased
inofchildren
with
lar
retardation
[26,28,3
fromdependent
each
of 82
children
with
cases
autism
and
64 Determining
the
variable,
diagnosis
(21
= in
autism,
+1
=gene
control),
from
each
of
82
children
with
sporadic
cases
of
autism
and
64
Determining
large-scale
organization
of
retardation
[26,28,34,35].
We
found
that
controls,
but
not
in
autism. We used m
the dependent variable, diagnosis (21 = autism, +1 = control),
autism. We used microarrays to measure the expression levels of
children
with
autis
control
subjects
(figure
1).
In
contrast
to
comparing
the
expression
subject
age,
and
scan
batch
(21
=
batch
1,
+1
=
batch
2).
All
expressi
control subjects
In contrast
to comparing
expression
childrenexpression
with47,000
autism,
overall
variance
genewell-characterized
expression was
greater than 47,000
greater
than
transcripts
including
38,500
subject
age, and(figure
scan 1).
batch
(21 = batch
1, +1 = the
batch
2). All
inwere
blood
of in
children
significantly
and
levels genes
of interaction
individual
genes,
we
compared
thepaternal
pattern
of the
overall
possible
terms
also
included
in the
For
our
levels of interaction
individual genes,
we compared
the pattern
of theFor
overall
human
geneswith
usingneth
human
usingnegatively
the
Affymetrix
Human
U133
Plus
2.0model.
microarray
on
significantly
and
associated
with
age
(Pearson
a
possible
terms were
also included
in the model.
our
with
autism
and
controls
r = 2.283,
R2 = .08
2 gene
RNA
from
peripheral
distribution
of
expression
levels
between
children
with
autism
analyses
we
considered
a
p-value
of
.05
to
be
statistically
significant.
RNA
from
peripheral
blood
lymphocytes
from
each
of
82
children
with
distribution
of
gene
expression
levels
between
children
with
autism
= .08, p Alter+
= .03, parameter
estimate
= PLoS.!
2.054 Std
r = 2.283, R
et
al.
(2011)
analyses we considered a p-value of .05 to be statistically significant.
variance(
autism
and
64
contr
autism
and
64
control
subjects.
Microarrays
showed
no
group
level
and controls. Measurement of the variance in the distribution of
and controls. Measurement of the variance in the distribution of
differences in quality
differences
in quality
control
measures.
Microarray
expression
levelsof
genesquare
expression
levels
assessed
for differences
at the
global level
gene expression levels assessed for differences at the global level of
Chi
were log-transformed
A.1
A.2
Chi
square
were
log-transformed
and the overall
variance
was calculated
across the
gene
expression
regulation.
To
obtain
a
normal-like
distribution,
gene expression regulation. To obtain a normal-like distribution,
internet based
262 chi
square
contingency
table
(http://
total distribution of e
totalAn
distribution
of expression
levels
on each
microarray.
Variance
in
An internet based 262 chi square contingency table (http://
geneexpression
expression
were decreased
log2-transformed.
The
overall
gene expression levels were log2-transformed. The overall
www.graphpad.com/quickcalcs/contingency1.cfm)
was
used
to
gene expression was
gene
was levels
significantly
in the blood
of children
www.graphpad.com/quickcalcs/contingency1.cfm) was used to
variance
the
distribution
was
measured
for each
with autism (p = .006)
variance of the total distribution was measured for each
assess
for of
a(p =
significant
between
gene
lists.
with
autism
.006). total
Erroroverlap
bars
represent
standard
error.A chi-square
assess for a significant overlap between gene lists. A chi-square
doi:10.1371/journal.po
microarray
(schematicwasinused
figure
1). Thechi
distribution
doi:10.1371/journal.pone.0016715.g002
microarray (schematic in figure 1). The distribution of gene
with
Yates correction
to calculate
squared andofa gene
twowith Yates correction was used to calculate chi squared and a twotailed
p-value.
Z"score(
tailed p-value.
1: Replicate
the Au
Aim 1: Replicate theAim
Authors
Data
PLoS ONE | www.plosone.org
4
PLoS ONE | www.plosone.org
February 2011 | Volume 6 | Issue 2 | e16715
4
Results
Decreased overall variance in log-transformed measures of gene expression predicts the
diagnosis of autism (figures 1 and 2)
We used microarrays to measure the expression levels of greater than 47,000 unique
transcripts including 38,500 well-characterized human genes using the Affymetrix Human U133
Plus 2.0 microarray with purified RNA from peripheral blood lymphocytes from each of 82
children with sporadic cases of autism and 64 control subjects (figure 1). In contrast to
comparing the expression levels of individual genes, we compared the pattern of the overall
distribution of gene expression levels between children with autism and controls.
Measurement of the variance in the distribution of gene expression levels assessed for
differences at the global level of gene expression regulation. To obtain a normal-like
distribution, gene expression levels were log2-transformed. The overall variance of the total
distribution was measured for each microarray (schematic in figure 1). The distribution of
gene...
Figure 2. Overall variance in gene expression in peripheral blood lymphocytes (PBL) was
decreased in children with autism. We used microarrays to measure the expression levels of
greater than 47,000 transcripts including 38,500 well-characterized human genes using the
Affymetrix Human U133 Plus 2.0 microarray on RNA from peripheral blood lymphocytes from each
of 82 children with autism and 64 control subjects. Microarrays showed no group level
differences in quality control measures. Microarray expression levels were log-transformed
and the overall variance was calculated across the total distribution of expression levels on
each microarray. Variance in gene expression was significantly decreased in the blood of
children with autism (p = .006). Error bars represent standard error.
doi:10.1371/journal.pone.0016715.g002
[Panels B.1 and B.2; source: Alter et al. (2011) PLoS.]
Appendix 1. Attempted methods of calculating variance in gene expression (z-score units of
standard deviation) to generate Alter et al.’s Fig. 2 by (A.1-A.2) calculating variance across
all individuals per experimental group for each gene or across all genes per individual and
averaged as experimental groups or by (B.1-B.2) calculating z-score across all individuals per
experimental group for each gene and finding this standard deviation or across all genes per
individual and averaged as experimental groups.
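The variance calculations named in the Appendix 1 caption can be sketched as follows. This is an illustrative reading of the caption, assuming Python with the standard library rather than the R code used for the report. All function names and the toy intensities are hypothetical, and the exact averaging steps the authors applied are not stated in the caption.

```python
import math
from statistics import mean, stdev, variance


def log2_matrix(raw):
    """log2-transform a samples-by-genes matrix of positive intensities."""
    return [[math.log2(x) for x in row] for row in raw]


def per_array_variance(log_rows):
    """A.2-style: sample variance across all genes, one value per individual."""
    return [variance(row) for row in log_rows]


def per_gene_variance(log_rows):
    """A.1-style: sample variance across individuals, one value per gene."""
    return [variance(col) for col in zip(*log_rows)]


def z_scores(values):
    """Express values in units of their own sample standard deviation."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(v - m) / s for v in values]


def mean_and_se(values):
    """Group mean and standard error of the mean."""
    return mean(values), stdev(values) / math.sqrt(len(values))


# Hypothetical toy data: 3 individuals per group, 4 genes each.
autism_raw = [[8, 16, 32, 64], [16, 16, 32, 64], [8, 32, 32, 64]]
control_raw = [[4, 16, 32, 128], [8, 16, 64, 128], [4, 8, 32, 128]]

for label, raw in (("autism", autism_raw), ("control", control_raw)):
    logged = log2_matrix(raw)
    m, se = mean_and_se(per_array_variance(logged))
    print(f"{label}: mean per-array variance = {m:.3f} (SE {se:.3f})")
```

The per-array variance followed by a group mean and standard error corresponds most closely to the quantity described in the Figure 2 caption. The per-gene variant and the `z_scores` helper are included only to show how the alternative orderings in the caption differ.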
None of thes
Thus, try to r
6.0 References
1. Centers for Disease Control and Prevention (CDC). (2014). “Prevalence of autism
spectrum disorder among children aged 8 years.” Autism and Developmental
Disabilities Monitoring Network. Surveillance Summaries. 63(SS02); 1-21.
2. Ma, D. & Salyakina, D., et al. (2009). “A genome-wide association study of autism
reveals a common novel risk locus at 5p14.1.” Annals of Human Genetics. 73(3):
263-273.
3. Abrahams, B., & Geschwind, D.H. (2008). “Advances in autism genetics: on the
threshold of neurobiology.” Nature Reviews Genetics. 9(5): 341-355.
4. Gupta, A.R., & State, M.W. (2007). “Recent advances in the genetics of autism.”
Biological Psychiatry. 61(4): 429-437.
5. Gillberg, C., & Billstedt, E. (2000). “Autism and Asperger syndrome: coexistence
with other clinical disorders.” Acta Psychiatr Scand. 102: 321-330.
6. Levy, S.E., Giarelli, E., et al. (2010). “Autism spectrum disorder and co-occurring
developmental, psychiatric, and medical conditions among children in multiple
populations of the United States.” Journal of Developmental & Behavioral Pediatrics. 31(4):
267-275.
7. O’Roak, B.J., & State, M.W. (2008). “Autism genetics: strategies, challenges, and
opportunities.” Autism Research. 1(1): 4-17.
8. Zoghbi, H.Y., et al. (2003). “Postnatal neurodevelopmental disorders: meeting at the
synapse?” Science. 302(5646): 826-830.
9. Geschwind, D.H., & Levitt, P. (2007). “Review: Autism spectrum disorders:
developmental disconnection syndromes.” Curr Opin Neurobiol. 17(1): 103-111.
10. Tabuchi, K., Blundell, J., Etherton, M.R., et al. (2007). “A neuroligin-3 mutation
implicated in autism increases inhibitory synaptic transmission in mice.” Science.
318(5847): 71-76.
11. Chugani, D.C., et al. (2004). “Review: Serotonin in autism and pediatric epilepsies.”
Ment Retard Dev Disabil Res Rev. 10(2): 112-116.
12. Krey, J.F., & Dolmetsch, R.E. (2007). “Review: Molecular mechanisms of autism: a
possible role for Ca2+ signaling.” Curr Opin Neurobiol. 17(1): 112-119.
13. Jamain, S., Quach, H., Betancur, C., et al. (2003). “Mutations of the X-linked genes
encoding neuroligins NLGN3 and NLGN4 are associated with autism.” Nature
Genetics. 34: 27-29.
14. Alter, M.D., Kharkar, R., Ramsey, K.E., et al. (2011). “Autism and increased
paternal age related changes in global levels of gene expression regulation.”
PLoS ONE. 6(2): e16715.*
*The study from which we adapted the RMA data for our analysis; results were generated using R.