A Weighted SNP Correlation Network Analysis for the Estimation of Polygenic Risk Scores
Morgan Levine
Department of Human Genetics, UCLA

PERSONALIZED MEDICINE
• Genetic association studies were expected to “revolutionize the diagnosis, prevention and treatment of human diseases.”
• Unfortunately, identifying predictive genetic markers has proven to be more difficult than anticipated.
• Many results fail to replicate or only explain a very small proportion of the variance in a given trait.

GENETIC ASSOCIATION
• In GWAS, the association between the trait of interest and m SNPs (often in the millions) is assessed using m regression models, where for each model, the trait is regressed on a single SNP.
• However, most complex traits are thought to be “polygenic” (influenced by multiple loci).

POLYGENIC SCORES
Polygenic methods have the ability to aid in genetic association studies by:
• Increasing statistical power to detect true effects via dimension reduction;
• Providing biological insight (e.g. pathways, mechanisms)
In 2007, Wray et al. proposed a method for examining the additive influence of multiple genetic markers.
Biology is complex! It is likely that genes operate in a non-linear manner to influence complex traits.

RESEARCH AIM
• Develop a network-based method for creating polygenic scores.
• Networks and systems biology approaches have been successfully used to associate gene expression and methylation to various traits.
• Use human height for proof of concept.
  • Highly heritable (around 80%)
  • Easy to measure
  • Used in lots of genetic studies.

WSCNA Methodology
• In weighted SNP correlation network analysis (WSCNA), a network can be represented by an adjacency matrix $A = [a_{kj}]$ that encodes a connection between a pair of nodes (SNPs k, j).

\[
\begin{pmatrix}
x_{1,1} & \cdots & x_{k,1} \\
\vdots & \ddots & \vdots \\
x_{1,j} & \cdots & x_{k,j}
\end{pmatrix}
\]

• Use genotype data or published GWAS results from multiple studies/cohorts to build the matrix.
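
The single-SNP regressions described under GENETIC ASSOCIATION and the additive score described under POLYGENIC SCORES can be sketched in a few lines of Python. This is only an illustration on simulated data, not the pipeline used in the study. The function names (`single_snp_gwas`, `additive_score`), the sample sizes and the effect sizes are all assumptions made for the example, and covariates such as age, sex and principal components are left out.

```python
import numpy as np

def single_snp_gwas(genotypes, trait):
    """Regress the trait on each SNP separately and return one slope per SNP.

    genotypes: (n_people, m_snps) array of allele counts (0, 1 or 2).
    trait:     (n_people,) array of trait values.
    Assumes every SNP varies in the sample (no zero-variance columns).
    """
    g = genotypes - genotypes.mean(axis=0)   # centre each SNP column
    y = trait - trait.mean()                 # centre the trait
    # Simple-regression slope for each SNP: cov(snp, trait) / var(snp)
    return (g * y[:, None]).sum(axis=0) / (g ** 2).sum(axis=0)

def additive_score(genotypes, betas):
    """Additive polygenic score: beta-weighted sum of allele counts per person."""
    return genotypes @ betas

# Simulated data, for illustration only (hypothetical sizes and effects)
rng = np.random.default_rng(0)
n_people, m_snps = 2000, 50
genotypes = rng.binomial(2, 0.3, size=(n_people, m_snps)).astype(float)
true_effects = np.zeros(m_snps)
true_effects[:5] = 0.5                       # only five SNPs affect the trait
trait = genotypes @ true_effects + rng.normal(size=n_people)

betas = single_snp_gwas(genotypes, trait)    # m slopes from m separate models
scores = additive_score(genotypes, betas)    # one score per person
```

Each SNP gets its own model here, so the m slopes ignore any relationship between SNPs. The score is purely additive, which is the limitation the network approach is meant to address.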
• Genotype data from 10,466 persons of European ancestry who were participants in the Health and Retirement Study (HRS).

WSCNA Methodology
1. We randomly divided our sample into a training set (70%) and a test set (30%).
2. A GWAS for human height was run using the training data in order to prune SNPs based on linkage disequilibrium (LD).
   • Most significant SNP per block was selected
3. Generated 60 subsamples of 500 participants each (with replacement) from training data. GWAS for human height was run for each of these subsamples using only SNPs selected from step 2.
4. Beta coefficients from GWAS used to populate n×m matrices
   • n refers to the number of examined SNPs
   • m refers to the number of GWAS from which results have been gathered.
   • We generated a 32,284×60 matrix.

Module Detection
[Figure: SNP Dendrogram]
• n×m matrix is used to create adjacency matrix from biweight midcorrelations raised to a power (b).
• Topological Overlap Matrix (based on shared SNP neighbors) calculated from adjacencies, defines dissimilarities for hierarchical clustering.
• Modules (colors) represent clusters (or networks) of densely interconnected SNPs.

Results
1. Fifty-five modules were identified.
2. Calculated “eigen-nodes” (network-specific polygenic scores).
3. Tested whether they were associated with human height in the validation samples.
4. Seven modules were found to be associated with height.

Significant Modules Identified using WSCNA

Module          SNPs   SNPs in Genes   β (P-value)
Light Green     193    100              0.305 (3.84E-6)
Salmon          235    90              -0.236 (0.003)
Ivory           109    48               0.187 (0.015)
Navajo White2   63     37               0.162 (0.021)
Violet          131    66              -0.150 (0.028)
Thistle2        85     38               0.145 (0.030)
Dark orange2    93     49              -0.137 (0.043)

Beta coefficients and P-values come from a single model with the residual of height (adjusting for age, sex and PC1-4) as the dependent variable and all 55 modules identified in WSCNA as the independent variables.

Results
1. Created traditional polygenic scores to compare against.
2. Examined the variance in height as explained by traditional vs. network polygenic scores.

Variance in Human Height Explained by Models Containing Different Polygenic Scores

Independent Variable/s in Each Model           R²       Adjusted R²
PRS 0.05 (n=32,284)                            0.0096   0.0093
PRS 0.005 (n=4,318)                            0.0084   0.0082
PRS 0.0005 (n=507)                             0.0083   0.0079
Light Green Module (n=193)                     0.007    0.0066
The seven significant WSCNA modules (n=909)    0.0163   0.0139

n refers to the number of SNPs used to generate the polygenic score/s in each model.

Hub SNPs
• Calculate intra-modular connectivity to identify “hubs” in each network.
• Do hubs have higher significance in the GWAS? Yes (light green network)
• Are there any hubs previously associated with height? HHIP gene, which has a reported association with height in GWAS and microarray (gene expression)

Conclusions
• Incorporation of network structure in the analysis of large-scale genetic association data can be used to estimate genetic scores for specific traits, identify hub SNPs/genes, and lead to biological insight into the pathways involved.
• Scores generated from WSCNA better relate to phenotypes of interest in validation analysis compared to traditional polygenic risk scores.
• Next Steps: Use this methodology to examine genetic network by environment effects (GxE), and examine genetic pleiotropy.
• Create “aging” scores (e.g. networks identified by comparing SNP contributions across multiple aging-related conditions).

Acknowledgements
Collaborators
Steve Horvath, UCLA
Peter Langfelder, UCLA
NIH Funding
NINDS T32NS048004
NIA 5R01AG042511-02
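
The module-detection, eigen-node and hub steps described in the slides can be sketched in Python as follows. This is a rough illustration under several assumptions, not the authors' implementation:

• Pearson correlation stands in for the biweight midcorrelation.
• The network is treated as unsigned.
• The power of 6 is an arbitrary placeholder.
• A fixed-count tree cut replaces whatever cutting rule was actually used.
• The eigen-node is taken to be the first principal component of the standardised genotypes of a module's SNPs.

All function names and the simulated data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def adjacency_from_betas(betas, power=6):
    """Unsigned adjacency: |correlation between SNP rows| raised to a power.

    betas: (n_snps, m_gwas) matrix of GWAS beta coefficients.
    Pearson correlation is a stand-in for the biweight midcorrelation.
    """
    adj = np.abs(np.corrcoef(betas)) ** power
    np.fill_diagonal(adj, 0.0)               # no self-connections
    return adj

def topological_overlap(adj):
    """Topological overlap: high when two SNPs share many strong neighbours."""
    shared = adj @ adj                       # sum over u of a_iu * a_uj
    k = adj.sum(axis=1)                      # connectivity of each SNP
    tom = (shared + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)
    return tom

def detect_modules(tom, n_modules):
    """Average-linkage clustering on 1 - TOM, cut into at most n_modules groups."""
    dissimilarity = 1.0 - tom
    tree = linkage(squareform(dissimilarity, checks=False), method="average")
    return fcluster(tree, t=n_modules, criterion="maxclust")

def eigen_node(genotypes, snp_indices):
    """First principal component of the standardised genotypes in one module."""
    g = genotypes[:, snp_indices]
    g = (g - g.mean(axis=0)) / g.std(axis=0)
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, 0] * s[0]                    # one score per person

def intramodular_connectivity(adj, snp_indices):
    """Sum of each SNP's adjacencies to the other SNPs in its module."""
    return adj[np.ix_(snp_indices, snp_indices)].sum(axis=1)

# Simulated beta matrix: two groups of 10 SNPs whose betas co-vary across 60 GWAS
rng = np.random.default_rng(1)
m_gwas = 60
factor_a = rng.normal(size=m_gwas)
factor_b = rng.normal(size=m_gwas)
betas = np.vstack(
    [factor_a + 0.3 * rng.normal(size=m_gwas) for _ in range(10)]
    + [factor_b + 0.3 * rng.normal(size=m_gwas) for _ in range(10)]
)

adj = adjacency_from_betas(betas)
tom = topological_overlap(adj)
modules = detect_modules(tom, n_modules=2)

# Simulated genotypes for 500 people, used only to show the eigen-node shape
genotypes = rng.binomial(2, 0.3, size=(500, betas.shape[0])).astype(float)
first_module = np.flatnonzero(modules == modules[0])
eigen_scores = eigen_node(genotypes, first_module)
hub_strength = intramodular_connectivity(adj, first_module)
```

In this sketch the SNP with the largest `hub_strength` would be the module's candidate hub. The sign of an eigen-node is arbitrary, so only its association with the trait is interpretable, not its direction.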