MA 584: Statistical Methods in Bioinformatics
Lagoy, R.C. & Burnette, K.

“Significant Gene Expression and Variance in Patients with Autism and Relatedness to Paternal Age: further analysis adopted from Alter, (2011). PLoS”

Ross C. Lagoy & KaLia Burnette
WORCESTER POLYTECHNIC INSTITUTE
MA584/BCB584: Statistical Methods in Genetics and Bioinformatics
Instructor: Zheyang Wu, PhD
Date: May 6, 2014

I. Table of Contents

I. Table of Contents
II. List of Figures
III. List of Tables
V. Abstract
1.0 Background
    1.1 Research Need
    1.2 Current Research
    1.3 Genome Analysis Method(s)
    1.4 Objective(s)
2.0 Methods
    2.1 Data Acquisition
    2.2 Microarray Data Analysis
    2.3 Statistical Analysis
    2.4 Variance Across Sample Set(s) & Tests of Normality
    2.5 Pearson’s Correlation Coefficient
    2.6 Spearman’s Rank Correlation
    2.7 Unpaired Student’s T-tests
    2.8 Empirical Bayes Statistics
3.0 Results and Discussion
    3.1 Complete Data Set Analysis
    3.2 Gene Variance & Test of Normality
    3.3 Variance in Gene Expression: Replicated Data
    3.4 Heatmap Illustration(s) of Significant Differential Gene Expression
    3.5 Statistically Significant Autism Linked Genes
4.0 Conclusions & Future Directions
5.0 Appendix
6.0 References

Significant Gene Expression in Patients with Autism

II. List of Figures

Figure 1. Generation and analysis of gene expression from Alter et al.’s Affymetrix probe data set. There were 146 total patients, 64 controls, 82 cases. PBLs were assayed from each individual and stored as a data matrix with samples along the columns and 54,000+ genes along the rows.

Figure 2. Visualization of the complete Alter et al. data set using boxplots (A,B), and a MAplot (C). (A) Illustrates the raw RMA probe values (y-axis) of each individual (x-axis). (B) Illustrates the log2 transformed RMA values plotted in the same manner as (A): controls (green), cases (red). (C) MAplot of the complete log2 transformed data matrix.

Figure 3. Variance in gene expression was not normally distributed across the entire population (A) or control subjects (B) of the log2 transformed data set values (x-axis), reproduced as described from Alter et al.

Figure 4. Variance was normally distributed within experimental groups: (A) autism, (B) children of older fathers, and (C) children of younger fathers of the log2 transformed data set values (x-axis), reproduced as described from Alter et al.

Figure 5. Paternal age was normally distributed across the entire study populations and within relevant experimental groups: (A) all ages, (B) controls, (C) autism, reproduced as described from Alter et al.

Figure 6. Increased paternal age at birth is negatively associated with overall variance in gene expression in peripheral blood lymphocytes of normal children. (A) Illustrating our generated plot replicating (B) Alter et al.’s plot, Fig. 3a.

Figure 7. Further analysis of Figure 6 (above) demonstrating no association between paternal ages and variance in gene expression at birth for autism subjects illustrated as (A) a scatter plot trend and (B) boxplot, described by Alter et al., but not shown in their report.

Figure 8. Heat map illustrations of the top 10 genes calculated by empirical Bayes statistics and sorted by lowest p-value (A.1-A.3), logFC (B.1-B.3), and highest average expression (C.1-C.3). (A.1-C.1) Rows (genes) sorted by Pearson’s coefficient and columns (patients) by Spearman’s correlation. (A.2-C.2) Rows sorted by default, columns sorted by Spearman’s correlation. (A.3-C.3) Rows and columns sorted by default (as shown in Table 1-3). Scale top left: green is low expression (RMA), and red is high expression, black is neutral.

Figure 9. Of the 8,400 genes with p<0.05 calculated using empirical Bayes statistics, 3 are supported as autism linked (A): (B) METTL12, (C) UBE3A, and (D) OXT; AUT = Autism, CTRL = control.

III. List of Tables

Table 1. Table(s) of the top 10 genes calculated by empirical Bayes statistics and sorted by lowest p-value. Gene descriptions are included next to the Affymetrix probe code and select genes are highlighted for interest relevant to Alter et al.’s results and novel findings.

Table 2. Table of the top 10 genes calculated by empirical Bayes statistics and sorted by logFC. Gene descriptions are included next to the Affymetrix probe code and select genes are highlighted for interest relevant to Alter et al.’s results and novel findings.

Table 3. Table of the top 10 genes calculated by empirical Bayes statistics and sorted by average expression. Gene descriptions are included next to the Affymetrix probe code and select genes are highlighted for interest relevant to Alter et al.’s results and novel findings.

Table 4. Additional overlap of our 48 genes reported to be autism-linked with a p-value in a range of +0.03 of 0.05.

V. Abstract

Autism spectrum disorder (ASD) is one of the most common neurodevelopmental disorders among children. It is defined and diagnosed by impairment in social interaction, language, and range of interest. Autism is also often diagnosed alongside other medical conditions such as seizures and anxiety. These co-occurring conditions and the wide range of variability in autism have presented challenges in finding a universally accepted polymorphism that causes this disorder; therefore, multiple co-existing genes have been correlated, identified, and modeled in vitro and in vivo. Microarray analysis of gene expression is a critical tool that can allow for the association of specific genes with various genetic conditions, especially for genome-wide diseases such as ASDs. Statistical tests and data mining techniques allow for the discovery of distinct expression levels of gene subsets in individual patients and across populations. The project chosen for this class uses freely available data from the GEO dataset browser to (1) replicate some of the preliminary analysis presented in a paper from Alter et al. using the GEOquery and BioConductor package(s) in R. These results will confirm our understanding of the large data set, and provide a starting point to (2) extrapolate new conclusions from this study as well as (3) provide future directions and ask new questions related to the investigation of autism risk genes. In this study we found three known autism-linked genes (OXT, METTL12, and ZSCAN18) with significant differences between experimental groups and interesting expression profiles when clustered as heat maps, to be further investigated.

1.0 Background

1.1 Research Need

Autism spectrum disorder (ASD) is estimated to affect about 1 in 68 children [1], ranking it as one of the most common neurodevelopmental disorders among children [2].
Autism is a heterogeneous syndrome defined by impairments in three core domains: social interaction, language, and range of interests [3]. The disorder has an estimated heritability of greater than 90%; however, its specific genetic etiology is unknown [4]. Autism is often diagnosed together with other medical conditions [5], such as seizures or anxiety, which may significantly affect the identification and treatment needs of the diagnosed individuals [6]. Because of the challenges presented by these co-occurring conditions, and the wide range of variability in ASD, no universally accepted susceptibility polymorphism has been found from current research efforts [2]. However, multiple distinct rare changes in specific genes have been identified in small subsets of individuals that may cause or contribute to ASDs [7].

1.2 Current Research

Continued efforts in whole-genome linkage studies are used to identify potentially important disease-risk loci. Current research supports that ASD may be caused by a single genetic mutation in addition to many relatively rare mutations [3]. It is generally reported that this disorder is of developmental onset, with an unknown primary cause. The dominant hypothesis attributes it to cellular, regional, or systemic dysfunction influenced by environmental factors and heredity (such as paternal age, identified by Alter et al. in 2011 using bioinformatics techniques). Genetic causes of autism spectrum disorders are also known to affect intracellular signaling pathways. Defective synaptic function and abnormal brain connectivity are proposed biological themes that may produce the heterogeneous characteristics of ASD while supporting the notion that there is great variation among the rare genetic mutations found in these patients [8,9]. Additional hypotheses suggest defects in inhibitory synapses in patients with ASDs, thus accounting for co-diagnosis with seizures [10]. Regulation of neurotransmitters such as serotonin and oxytocin could also potentially account for abnormalities within and outside of the central nervous system, including the brain, while imbalances of these neurochemicals could cause ASD symptoms [11]. Calcium signaling has also been proposed as a possible contributing mechanism to ASD [12]. Therefore, abnormalities due to cellular dysfunction may be caused by genetic mutations and could be identified, assessed, and correlated through standard bioinformatics methods by surveying diagnosed and healthy populations using microarray technology.

A few specific proteins have been identified and reported for significant prevalence in autism patients. The ubiquitin protein ligase E3A (UBE3A) and the inhibitory neurotransmitter receptor for gamma-aminobutyric acid (GABRB3) are currently thought to play a central role in ASD. Other research has shown the potential involvement of centaurin gamma 2 (CENTG2), a GTPase-activating protein. The synaptic adaptor proteins SHANK1-3 and similar scaffolding proteins have also been shown to have high linkage to the disorder. Further identifying and correlating differential expression of these genes, and others identified in the literature, can be appropriately assessed through state-of-the-art genome analysis and bioinformatics techniques.

1.3 Genome Analysis Method(s)

Gene expression is a critical measure that can be quantified to describe the levels at which particular genes are expressed within a cell, tissue, or organism. One method of genetic analysis is genome-wide expression analysis with commercially available microarrays (e.g., Affymetrix) containing specific hybridized nucleotide arrangements (Figure 1).
Microarray analysis of gene expression involves the screening of purified and fluorescently labeled DNA or mRNA (generally from patients’ peripheral blood lymphocytes, PBLs), which binds to complementary transcripts, called probes, immobilized as an organized array on a chip. Each probe represents a specific gene (cDNA or mRNA), and its fluorescence intensity translates to an expression level (Figure 1, middle). Different expression levels can therefore be observed via fluorescence intensities: the brighter the fluorescence signal for a probe, the more complementary oligonucleotides have bound, and the higher the inferred prevalence in the individual’s blood (expression value), and vice versa for low intensity signals. These recordings are assigned relative values across the microarray containing 54,000+ genes that survey the entire genome. Statistical tests must be done to normalize the data and accurately interpret the identification of differentially expressed genes. These tests, along with data-mining and associative techniques, allow for the discovery of distinct expression levels of gene subsets in individual patients and across a population of healthy individuals (controls) and diagnosed patients (cases).

[Figure 1 image: first page of Alter MD, Kharkar R, Ramsey KE, Craig DW, Melmed RD, et al. (2011) Autism and Increased Paternal Age Related Changes in Global Levels of Gene Expression Regulation. PLoS ONE 6(2): e16715. doi:10.1371/journal.pone.0016715. Panel labels: Patients (64 controls, 82 cases); Peripheral Blood Lymphocytes (RNA from PBL); Affymetrix Human U133 Plus 2.0 Array; Expression Data Values (RMA), Log2 transformed.]

Figure 1. Generation and analysis of gene expression from Alter et al.’s Affymetrix probe data set. There were 146 total patients, 64 controls, 82 cases. PBLs were assayed from each individual and stored as a data matrix with samples along the columns and 54,000+ genes along the rows.

1.4 Objective(s)

A recent study reported that mouse behavior, in genetically identical animals, could be accurately predicted by evaluating patterns of gene expression distribution levels when assessed by variance in the distribution [13]. This same group applied this method of studying the overall pattern of gene expression distribution to test the hypothesis that autism is associated with alterations in global levels of gene expression regulation [14]. These researchers also analyzed the Affymetrix probe data for correlation between patterns of the gene expression distribution and paternal age. They found a list of genetic transcripts (down-regulated zinc and transcription pathways) that overlap between control patients with older fathers and patients with autism.

For this project, we will use Alter et al.’s freely available and accessible Affymetrix data from the GEO dataset browser to (1) replicate some of the preliminary analysis presented in this paper using the GEOquery and BioConductor package(s) in R. These results will confirm our understanding of the large data set, and provide a starting point to (2) extrapolate new conclusions from this study as well as (3) provide future directions and ask new questions related to the investigation of autism-risk genes.

2.0 Methods

2.1 Data Acquisition

Original data from Alter et al.’s study was obtained from blood samples taken from 146 subjects, 82 cases and 64 controls. Paternal age was also recorded for 78 of the children with autism and 57 of the controls. Briefly, total mRNA was isolated from these samples (PBLs) and double round amplified, cleaned, and biotin-labeled using Affymetrix’s GeneChip Two-Cycle Target Labeling kit with a T7 promoter and Ambion’s MEGAscript T7 High Yield Transcription kit. The Affymetrix Human Genome U133 Plus 2.0 Array provides complete coverage of the Human Genome U133 set plus 6,500 additional genes, for analysis of over 47,000 transcripts. Arrays were washed with the prepared samples, stained, and scanned. Raw signal intensity values were extracted per probe set on the array and scaled by a factor of 150 to normalize the array signal intensity in Microarray Analysis Suite (MAS) 5.0. The raw data extracted from these scans were pre-processed using MAS 5.0 so the gene expression values were not altered.
Robust Multiarray Analysis (RMA) was used to normalize, summarize, and publish the data, as noted on the NCBI GEO accession display webpage.

2.2 Microarray Data Analysis

Microarray data for this study was obtained as a pre-processed and summarized Microarray Analysis Suite (MAS) 5.0 and Robust Multiarray Analysis (RMA) SOFT file from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database repository, dataset accession number GDS4431. SOFT files were imported into R-3.1.0 for data processing and further statistical analysis.

2.3 Statistical Analysis

RMA expression levels were log2 transformed and all following analyses were completed using R (V. 3.1.0) BioConductor packages, loading the following libraries: GEOquery, Biobase, gplots, preprocessCore, genefilter, limma, annotate, and hgu95av2.db, as described in our R code file. The statistical analyses used in this study are listed below.

2.4 Variance Across Sample Set(s) & Tests of Normality

To first replicate the authors’ observations, overall variance in gene expression was calculated across the genome for the subgroups (autism, children of older fathers, children of younger fathers), and paternal age was examined across the entire study population and within relevant experimental groups. Histograms for each of these sample subsets were generated. Applying the Shapiro-Wilk Test to these primary measures also allowed the authors, and us, to determine whether parametric tests were appropriate. A MAplot was also generated to visually check the normalization of the population’s log2 transformed RMA values.

2.5 Pearson’s Correlation Coefficient

Pearson’s correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations. Following Alter et al., a p-value below 0.05 was considered statistically significant. Pearson’s coefficient was also used to cluster our generated heat maps by genes (rows).

2.6 Spearman’s Rank Correlation

Spearman’s correlation assesses how well the relationship between two variables can be described using a monotonic function. This method is generally used to assess the relationship between two variables when the data are not normally distributed. This rank correlation was used to cluster our generated heat maps by individuals (columns).

2.7 Unpaired Student’s T-tests

This statistical test compares two groups of normally distributed data to test whether the means of the distributions are different. The p-value is the probability of observing a difference in means at least as large as the one measured if the two groups truly had the same mean. Following Alter et al., a p-value below 0.05 was considered statistically significant.

2.8 Empirical Bayes Statistics

Empirical Bayes methods are used for statistical inference when the prior distribution is estimated from the data, rather than fixed before any data are observed. This can be viewed as an approximation to a fully Bayesian treatment of a hierarchical model in which the parameters at the highest level of the hierarchy are set to their most likely values instead of being integrated out. A table was generated by subjecting the differential expression data set to empirical Bayes statistics. The table was ranked in turn by p-value, adjusted p-value, logFC, and average expression. Heat maps were generated from the resulting data and clusters were observed for further investigation.
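The definitions in Sections 2.5 to 2.7 can be made concrete with a short worked sketch. The analysis in this report was run in R; the block below is only an illustrative Python version using the standard library, not the code from our R file. The function names (`pearson_r`, `ranks`, `spearman_rho`, `unpaired_t`) and the toy numbers are hypothetical and do not come from the data set.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


def unpaired_t(a, b):
    """Student's unpaired t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Toy values only (hypothetical): paternal age vs. per-sample expression variance.
paternal_age = [25, 30, 35, 40, 45]
expr_variance = [2.0, 1.8, 1.9, 1.5, 1.2]
print(pearson_r(paternal_age, expr_variance))
print(spearman_rho(paternal_age, expr_variance))
print(unpaired_t([1, 2, 3], [4, 5, 6]))
```

Turning the t statistic into a p-value requires the cumulative t distribution, which the Python standard library does not provide. In R the same three quantities are available from `cor(x, y)`, `cor(x, y, method = "spearman")`, and `t.test(a, b, var.equal = TRUE)`.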
3.0 Results and Discussion 3.1 Complete Data Set Analysis Upon completion of importing the GEO data set into R, RMA values were calculated and plotted as a box plot. This plot showed us the range and vast amount of data we were working with; however, did not yield visually usable information. Thus, the gene expression data was log2 transformed and a box plot was created from this information. This scaled our working range of gene expression values down to a range of about 0 to 15 and organized by controls (green) and cases (red), consistent throughout the report. A B C Figure 2. Visualization of the complete Alter et al. data set using boxplots (A,B), and a MAplot (C). (A) Illustrates the raw RMA probe values (y-axis) of each individual (x-axis). (B) Illustrates the log2 transformed RMA values plotted in the same manner as (A): controls (green), cases (red). (C) MAplot of the complete log2 transformed data matrix. A MAplot was also generated to test for and visualize normality within the populations log2 transformed RMA values. The x-axis (A) describes the average log intensities of the population and the y-axis (M) plots the difference in average log intensities across individuals (Figure 2C). This plot also provides a visual for the ratio of intensity dependence of the microarray data. The majority of values falls along y = 0 (+/0.2) suggesting a normalized matrix of log2 RMA values, especially around RMA values ranging between 0-6 (low expression), which can be further analyzed as a cluster or independently. Significant Gene Expression in Patients with Autism 8 MA 584: Statistical Methods in Bioinformatics Lagoy, R.C. & Burnette, K. 3.2 Gene Variance & Test of Normality Alter et al. first described the trends of their data matrix in terms of variance among experimental groups and paternal age distribution. 
They observed, using the Shapiro-Wilk test, that variance in gene expression was not normally distributed across the entire population or the control subjects. We calculated the variance across all genes for the entire population and for the control subjects, plotted a histogram for each, applied the Shapiro-Wilk test, and confirmed this observation (Figure 3).

Figure 3. Variance in gene expression was not normally distributed across the entire population (A) or control subjects (B) of the log2 transformed data set values (x-axis), reproduced as described from Alter et al.

Using the same tests, the authors also observed that variance was normally distributed within experimental groups, which we also confirmed by plotting histograms and using the Shapiro-Wilk test (Figure 4).

Figure 4. Variance was normally distributed within experimental groups: (A) autism, (B) children of older fathers, and (C) children of younger fathers of the log2 transformed data set values (x-axis), reproduced as described from Alter et al.

Lastly, the authors stated that paternal age was normally distributed across the entire study population and within relevant experimental groups, which we again confirmed by plotting histograms and using the Shapiro-Wilk test (Figure 5). Based on these analyses of normality, parametric tests were used when appropriate.

Figure 5. Paternal age was normally distributed across the entire study population and within relevant experimental groups: (A) all ages, (B) controls, (C) autism, reproduced as described from Alter et al.

3.3 Variance in Gene Expression: Replicated Data
Since we obtained similar results and trends to the authors, we decided to move forward and regenerate some of their graphs to further demonstrate that we understand their data matrix before using new analysis methods and discovering novel results. We first attempted to replicate Alter et al.’s Figure 2, plotting variance in gene expression (z-score in units of standard deviation) along the y-axis, with each experimental group as an individual bar. The authors are not clear about how they calculated these values, so we attempted two methods: (1) calculating variance across all genes for individual groups, or calculating variance across all patients for individual groups, and (2) calculating the z-score across all genes for individual groups and finding the standard deviation of this list, or calculating the z-score across all patients for individual groups and finding the standard deviation of this list (Appendix 1). Neither calculation reproduced what the authors showed in Figure 2: we saw values that were not different between groups. Nonetheless, the authors concluded that overall variance in gene expression in peripheral blood lymphocytes was decreased in children with Autism. We do not particularly agree, since they are not actually calculating variance, and it cannot be inferred from their units that this is indeed true unless the authors explain their calculation more clearly.
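The two orientations of the variance calculation described above can be sketched as follows. This is an illustrative Python sketch, not the R code behind Appendix 1; the function names and the small expression matrix are assumptions made up for the example.

```python
from statistics import mean, pstdev, variance

# Toy log2 expression matrix: rows are genes, columns are individuals.
# These values are invented purely for illustration.
expr = [
    [5.0, 6.0, 7.0],
    [2.0, 2.0, 2.0],
    [9.0, 7.0, 8.0],
]


def variance_per_gene(matrix):
    """Sample variance of each gene (row) across individuals."""
    return [variance(row) for row in matrix]


def variance_per_individual(matrix):
    """Sample variance of each individual (column) across genes."""
    return [variance(col) for col in zip(*matrix)]


def z_scores(values):
    """Center by the mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


per_gene = variance_per_gene(expr)
per_individual = variance_per_individual(expr)
```

Note that values standardized within a group have a standard deviation of 1 by construction, so the standard deviation of a list of within-group z-scores cannot differ between groups. This may be one reason the second method showed no group differences, although the authors' exact calculation remains unknown.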
Therefore, we continued with our first method of calculating variance across all genes for each experimental group and binned the paternal ages together to regenerate Figure 3A of Alter et al.’s paper. We did not obtain the same y-axis scale, but noticed the same negative trend (Figure 6), agreeing that increased paternal age at birth is negatively associated with overall variance in gene expression in peripheral blood lymphocytes of healthy children.

Figure 6. Increased paternal age at birth is negatively associated with overall variance in gene expression in peripheral blood lymphocytes of normal children. (A) Our plot, generated in R, replicating (B) Alter et al.’s plot, Fig. 3a, from Alter et al. (2011), PLoS.

This result was promising, and we used the same calculation method to investigate the autism experimental group. Just as the authors concluded (but did not show), the variance in gene expression had no trend across paternal age (Figure 7). This was also a promising result, suggesting that our calculation of “variance” is similar to the authors’, though not exactly the same. Since we were more interested in the data set as two individual experimental groups, not yet sub-grouped by paternal age (the authors already provided substantial investigation of up-regulated and down-regulated genetic differences between these groups), we moved forward by comparing the two groups inclusively, across all genes, to investigate statistically significant differences in gene expression as a whole.

Figure 7. Further analysis of Figure 6 (above) demonstrating no association between paternal age at birth and variance in gene expression for autism subjects, illustrated as (A) a scatter plot trend and (B) a boxplot, described by Alter et al. but not shown in their report.

3.4 Heatmap Illustration(s) of Significant Differential Gene Expression
An empirical Bayes statistic was used to calculate the p-value, logFC, and average expression for all 54,000+ genes. Of these, 8,400 were statistically different between the two experimental groups. For each of the categories, a heat map and descriptive table were generated for the top ten genes sorted by lowest p-value, logFC, and highest average expression. For the heat maps, each row (gene) was first organized by Pearson’s coefficient, and each column (individual) was organized and clustered by Spearman’s correlation. Additional organizations of the rows and columns were generated as shown and described in Figure 8 to more clearly visualize clustered expression levels. The maps are color scaled with green being the lowest expression, black being neutral, and red being high expression.
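The row and column orderings described above rest on two correlation measures, which can be sketched as follows. This is a minimal standard-library Python illustration rather than the R heat map code used for this report. The helper names are hypothetical, and the distance 1 - r is a common choice that is assumed here; the report does not state which distance its heat map function used.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_distance(r):
    """Assumed clustering distance: 0 for r = 1, 2 for r = -1."""
    return 1.0 - r
```

For a monotonic but non-linear pair such as `[1, 2, 3, 4]` and `[1, 4, 9, 16]`, `spearman` gives a perfect correlation while `pearson` does not, which is the distinction drawn in Sections 2.5 and 2.6.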
Each heat map was visually inspected for significant clustering of high or low expression of genes or groups of individuals. Descriptions of the ten genes were reviewed in the literature for biological relevance (known or unknown protein-coding function(s)) and/or reports linking them to Autism or related genetic pathways. Two zinc-related genes, zinc finger protein (ZSCAN18) and zinc ring finger protein (ZFP36L2), were among the top ten when the data set was arranged by p-value and logFC, the log2 fold change in expression between experimental groups. Alter et al. also reported significance regarding down-regulated genes that are enriched for biological pathways related to transcription and zinc in blood of children with Autism and controls with paternal ages older than 31. This suggests additional significance of these genes and their variants probed for on this chip between experimental groups. Also, a human brain protein (hCG_2033649), Nance-Horan Syndrome (NHS) related protein, and calcium binding protein (S100A8) showed up on these sorted lists. These genes have been shown to have possible genetic linkage to Autism in previous studies but not by Alter et al., suggesting that our analysis has provided additional insight into their data set.

The heat maps generated from the top ten statistically significant genes all showed clustering trends, some with high expression in the Autism group and lower expression in the control group, or vice versa (Figure 8A.2). The individuals within these clustered regions should be investigated further to extract additional information, like paternal age, and assessed for similarities found by Alter et al.
The heat map generated by logFC sorting gave rise to two almost identically clustered genes (HLA-DQA1 and HLA-DQB1, both disease-associated genes), but with no apparent segregation of disease states (Figure 8B.2) and little statistical difference. Although there is little difference between these genes, a subgroup (like paternal age) may show a significant difference; the same holds for all genes in these lists. The highest average expression heat maps (Figure 8C.1-C.3) show little clustering difference between experimental groups (as also shown by their p-values), but do show expression level differences for a couple of genes and could be examined further for individual clustering. Another interesting heat map could sort by lowest average expression to further investigate the extremes of gene expression.

Additional analysis can be conducted following this method to plot and investigate more genes. We were limited to plotting heat maps of 1,000 genes at a time before R would stall when executing the code; a server would be needed for larger data processing.

Figure 8. Heat map illustrations of the top 10 genes calculated by empirical Bayes statistics and sorted by lowest p-value (A.1-A.3), logFC (B.1-B.3), and highest average expression (C.1-C.3). (A.1-C.1) Rows (genes) sorted by Pearson’s coefficient and columns (patients) by Spearman’s correlation. (A.2-C.2) Rows sorted by default, columns sorted by Spearman’s correlation. (A.3-C.3) Rows and columns sorted by default (as shown in Tables 1-3). Scale, top left: green is low expression (RMA), red is high expression, and black is neutral.

The table below shows the Affymetrix gene code and corresponding gene name with a description found online (Table 1).
The logFC, average expression, t-value, and resulting p-value are recorded for each of the top ten categorized genes. Highlighted genes have either been previously reported in the literature to be associated with Autism or were also found by Alter et al., but through a different analysis method.

Table 1. Table of the top 10 genes calculated by empirical Bayes statistics and sorted by lowest p-value. Gene descriptions are included next to the Affymetrix probe code and select genes are highlighted for interest relevant to Alter et al.’s results and novel findings.

Table 2. Table of the top 10 genes calculated by empirical Bayes statistics and sorted by logFC. Gene descriptions are included next to the Affymetrix probe code and select genes are highlighted for interest relevant to Alter et al.’s results and novel findings.

Table 3. Table of the top 10 genes calculated by empirical Bayes statistics and sorted by average expression. Gene descriptions are included next to the Affymetrix probe code and select genes are highlighted for interest relevant to Alter et al.’s results and novel findings.

3.5 Statistically Significant Autism Linked Genes
To further assess the generated list of 8,400 statistically different genes (Table 1A), we generated our own list of 48 genes and their related isoforms that have been associated with Autism.
We searched the 8,400 genes for any of our 48 genes and computed statistical significance using a t-test and the empirical Bayes method (which yielded the same results, since these genes were normally distributed within the experimental groups). Not all Autism genes identified in the literature were probed for on the Affymetrix mRNA chip, but of the ones searched, three showed up with p-values < 0.05. These genes are shown numerically (Figure 9A) and as boxplots of each group’s expression values (Figure 9B-D).

Figure 9. Of the 8,400 genes with p<0.05 calculated using empirical Bayes statistics, 3 are supported as autism linked (A): (B) METTL12, (C) UBE3A, and (D) OXT. Boxplot y-axes show log2 RMA values; AUT = Autism, CTRL = control.

This is an exciting discovery from a biologist’s perspective, since oxytocin (a neurotransmitter), a methyltransferase (DNA regulation protein), and a ubiquitin ligase protein are all involved in neurologically relevant biochemical systems in Autism. This offers further motivation to investigate these genes and similar biological pathways in in vitro and in vivo models, as well as potential targeted drug delivery based on real patient data.
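The search of a results table against a curated gene list, as described above, can be sketched as follows. This is a hedged Python illustration, not the R code used for this report. The function name `overlap`, the placeholder gene names, and every p-value are hypothetical and are not results from this analysis.

```python
def overlap(p_values, curated_genes, low, high):
    """Curated genes with low <= p < high, sorted by ascending p-value."""
    hits = [
        (gene, p)
        for gene, p in p_values.items()
        if gene in curated_genes and low <= p < high
    ]
    return sorted(hits, key=lambda item: item[1])


# Hypothetical p-values, invented for illustration only.
p_values = {"GENE_A": 0.01, "GENE_B": 0.20, "GENE_C": 0.06, "GENE_D": 0.04}
# Hypothetical curated list of literature-supported genes.
curated = {"GENE_A", "GENE_C", "GENE_E"}

# Curated genes below the 0.05 cutoff.
significant = overlap(p_values, curated, 0.0, 0.05)
# Curated genes in a window just above the cutoff.
near_threshold = overlap(p_values, curated, 0.05, 0.08)
```

Genes on the curated list that are absent from the results (here `GENE_E`) simply do not appear, mirroring the case of literature genes that were not probed for on the chip.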
We also decided to search the list of sorted genes with p-values up to 0.03 units above the 0.05 cutoff, and noticed three more highly relevant autism-linked genes, two also relating to DNA methylation, an oxytocin receptor gene, and notably SHANK2, a scaffolding protein which has recently been investigated and reported to have a high prevalence of mutation in autism cases (Table 4).

Table 4. Additional overlap of our 48 genes reported to be autism-linked with a p-value within +0.03 of 0.05.

A few additional questions could be investigated as future directions, using the starting point provided by our analysis and alluded to in this report, with inferred applications: (1) Are the three autism-risk genes we report on related to paternal age when subgrouped? (2) Can we sort the heat map clusters by autism-linked genes and paternal age subgroups? (3) What “genetic class” is the majority of the 8,400 statistically significant (p<0.05) reported genes?

4.0 Conclusions & Future Directions
Bioinformatics is a valuable tool for processing and analyzing large biological data sets, especially for diseases with a wide spectrum of genetic causes and differential expression, like Autism. The statistical methods used in this report revealed potential avenues for possible gene linkage discoveries and confirmed a few preexisting ones. Future work includes further exploration of subgrouping differential expression based on paternal age, suggesting possible gene linkages and demographic prevalence.
Genes that have and have not previously been linked to Autism were identified in this report, and could also be investigated using other biological methods to test for significance in this highly prevalent neurodevelopmental disorder among children.
5.0 Appendix

[Appendix figure: panels A.1 and A.2 (variance) and B.1 and B.2 (z-score) from our Aim 1 attempt to replicate the authors’ data, shown alongside excerpts of Alter et al. (2011), PLoS ONE 6(2): e16715, including their Figure 2 (doi:10.1371/journal.pone.0016715.g002).]
Appendix 1. Attempted methods of calculating variance in gene expression (in z-score units of standard deviation) to reproduce Fig. 2 of Alter et al.: (A.1-A.2) calculating the variance across all individuals per experimental group for each gene, or across all genes per individual and averaging by experimental group; or (B.1-B.2) calculating the z-score across all individuals per experimental group for each gene and taking its standard deviation, or across all genes per individual and averaging by experimental group.

[Figure annotation, truncated in source: “None of thes… Thus, try to r…”]

6.0 References
1. Centers for Disease Control and Prevention (CDC). (2014). “Prevalence of autism spectrum disorder among children aged 8 years.” Autism and Developmental Disabilities Monitoring Network. Surveillance Summaries. 63(SS02): 1-21.
2. Ma, D., Salyakina, D., et al. (2009). “A genome-wide association study of autism reveals a common novel risk locus at 5p14.1.” Annals of Human Genetics. 73(3): 263-273.
3. Abrahams, B., & Geschwind, D.H. (2008). “Advances in autism genetics: on the threshold of neurobiology.” Nature Reviews Genetics. 9(5): 341-355.
4. Gupta, A.R., & State, M.W. (2007). “Recent advances in the genetics of autism.” Biological Psychiatry. 61(4): 429-437.
5. Gillberg, C., & Billstedt, E. (2000). “Autism and Asperger syndrome: coexistence with other clinical disorders.” Acta Psychiatr Scand. 102: 321-330.
6. Levy, S.E., Giarelli, E., et al. (2010). “Autism spectrum disorder and co-occurring developmental, psychiatric, and medical conditions among children in multiple populations of the United States.” Journal of Developmental & Behavioral Pediatrics. 31(4): 267-275.
7. O’Roak, B.J., & State, M.W. (2008). “Autism genetics: strategies, challenges, and opportunities.” Autism Research. 1(1): 4-17.
8. Zoghbi, H.Y., et al. (2003). “Postnatal neurodevelopmental disorders: meeting at the synapse?” Science. 302(5646): 826-830.
9. Geschwind, D.H., & Levitt, P. (2007). “Review: Autism spectrum disorders: developmental disconnection syndromes.” Curr Opin Neurobiol. 17(1): 103-111.
10. Tabuchi, K., Blundell, J., Etherton, M.R., et al. (2007). “A neuroligin-3 mutation implicated in autism increases inhibitory synaptic transmission in mice.” Science. 318(5847): 71-76.
11. Chugani, D.C., et al. (2004). “Review: Serotonin in autism and pediatric epilepsies.” Ment Retard Dev Disabil Res Rev. 10(2): 112-116.
12. Krey, J.F., & Dolmetsch, R.E. (2007). “Review: Molecular mechanism of autism: a possible role for Ca2+ signaling.” Curr Opin Neurobiol. 17(1): 112-119.
13. Jamain, S., Quach, H., Betancur, C., et al. (2003). “Mutations of the X-linked genes encoding neuroligins NLGN3 and NLGN4 are associated with autism.” Nature Genetics. 34: 27-29.
14. Alter, M.D., Kharkar, R., Ramsey, K.E., et al. (2011). “Autism and increased paternal age related changes in global levels of gene expression regulation.” PLoS ONE. 6(2): e16715.*
*The study from which we adapted RMA data for our analysis; our results were generated using R.