ACCELERATING SPARSE CANONICAL CORRELATION ANALYSIS FOR LARGE BRAIN IMAGING GENETICS DATA
Jingwen Yan, Hui Zhang, Lei Du, Eric Wernert, Andrew J. Saykin, Li Shen
OUTLINE
• Imaging Genetics
• Sparse Canonical Correlation Analysis (SCCA)
• Computational Challenges and Methods
• Data Simulation
• Experimental Results
IMAGING GENETICS
[Figure: levels of analysis from genes to cells to systems to behavior (complex interactions, phenomena, disorders, diseases). Source: UCI, S. Potkin et al.]
IMAGING GENETICS
Underlying Biological Pathway and Mechanism
IMAGING GENETICS
[Figure: grid of example studies organized by genetic scope (candidate gene/SNP, biological pathway, genome-wide) and imaging scope (single ROI, circuit, whole brain). Studies shown:]
• Risacher et al 2010; Sloan et al 2010
• Potkin et al 2009; Saykin et al 2010
• Risacher et al 2013: AV45 ROIs & APOE
• Swaminathan et al 2012: PiB ROIs & amyloid pathway
• Potkin et al 2009, Mol Psych: schizophrenia study
• Ho et al 2010: FTO; Reiman et al, PNAS 2009
• Chiang et al 2012: SNP/gene networks & WM integrity
• Shen et al 2010: ROIs; Stein et al 2010: voxels
SCCA
[Figure: three panels relating variables X1, ..., Xn to Y1, ..., Yn: massive univariate analysis, multivariate multiple regression, and canonical correlation analysis. Labels in the panels: R, W'X, Xu, Yv.]
SCCA
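• Sparse canonical correlation analysis (SCCA)
• R package: Penalized Multivariate Analysis (PMA) (Witten et al, 2009)
$$\max_{\mathbf{u},\mathbf{v}} \; \mathbf{u}^T \mathbf{X}^T \mathbf{Y} \mathbf{v} \quad \text{subject to} \quad \mathbf{u}^T \mathbf{X}^T \mathbf{X} \mathbf{u} = 1,\ \mathbf{v}^T \mathbf{Y}^T \mathbf{Y} \mathbf{v} = 1,\ P_1(\mathbf{u}) \le c_1,\ P_2(\mathbf{v}) \le c_2$$
• $\mathbf{X}$, $\mathbf{Y}$: imaging and genetics data respectively
• $P_1(\mathbf{u})$, $P_2(\mathbf{v})$: sparse penalties, mostly $L_1$ norm
• For simplicity, assuming $\mathbf{X}^T \mathbf{X} = \mathbf{I}$ and $\mathbf{Y}^T \mathbf{Y} = \mathbf{I}$
• Bi-convex and non-differentiable problem
• Iterative solution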
SCCA
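• Sparse canonical correlation analysis (SCCA)
• Problem
$$\max_{\mathbf{u},\mathbf{v}} \; \mathbf{u}^T \mathbf{X}^T \mathbf{Y} \mathbf{v} \quad \text{subject to} \quad \mathbf{u}^T \mathbf{u} = 1,\ \mathbf{v}^T \mathbf{v} = 1,\ \|\mathbf{u}\|_1 \le c_1,\ \|\mathbf{v}\|_1 \le c_2$$
• Iterative solution
1. $\mathbf{u} \leftarrow \arg\max_{\mathbf{u}} \mathbf{u}^T \mathbf{X}^T \mathbf{Y} \mathbf{v}$, subject to $\mathbf{u}^T \mathbf{u} = 1$, $\|\mathbf{u}\|_1 \le c_1$
2. $\mathbf{v} \leftarrow \arg\max_{\mathbf{v}} \mathbf{u}^T \mathbf{X}^T \mathbf{Y} \mathbf{v}$, subject to $\mathbf{v}^T \mathbf{v} = 1$, $\|\mathbf{v}\|_1 \le c_2$
• $\mathbf{u} \leftarrow \dfrac{S(\mathbf{X}^T \mathbf{Y} \mathbf{v}, \Delta)}{\|S(\mathbf{X}^T \mathbf{Y} \mathbf{v}, \Delta)\|_2}$, where $S(\mathbf{X}^T \mathbf{Y} \mathbf{v}, \Delta)$ is the soft thresholding operator and $\Delta \ge 0$ is chosen so that $\|\mathbf{u}\|_1 \le c_1$ (the update for $\mathbf{v}$ is analogous, using $\mathbf{Y}^T \mathbf{X} \mathbf{u}$ and $c_2$)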
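The alternating updates on this slide can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PMA implementation: it takes X^T X = I and Y^T Y = I as on the slide, searches for Δ by bisection (an assumed strategy), starts from a uniform v (also an assumption), and requires c1, c2 ≥ 1. The function names `soft_threshold`, `l1_constrained_unit_vector`, and `scca` are hypothetical.

```python
import math

def soft_threshold(a, delta):
    """S(a, delta): shrink every entry toward zero by delta."""
    return [math.copysign(max(abs(x) - delta, 0.0), x) for x in a]

def l2_normalize(a):
    """Scale a to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in a))
    return [x / norm for x in a] if norm > 0 else list(a)

def l1_constrained_unit_vector(a, c, iters=60):
    """Return S(a, delta) / ||S(a, delta)||_2 with delta >= 0 found by
    bisection (assumed search strategy) so that the L1 norm is <= c."""
    u = l2_normalize(a)
    if sum(abs(x) for x in u) <= c:
        return u  # delta = 0 already satisfies the constraint
    lo, hi = 0.0, max(abs(x) for x in a)
    for _ in range(iters):
        mid = (lo + hi) / 2
        u = l2_normalize(soft_threshold(a, mid))
        if sum(abs(x) for x in u) > c:
            lo = mid  # still too dense: threshold harder
        else:
            hi = mid  # feasible: try a smaller threshold
    return l2_normalize(soft_threshold(a, hi))

def scca(X, Y, c1, c2, n_iter=50):
    """Alternate the u- and v-updates for a fixed number of iterations."""
    n, p, q = len(X), len(X[0]), len(Y[0])
    # K = X^T Y, the p-by-q cross-product matrix
    K = [[sum(X[i][j] * Y[i][k] for i in range(n)) for k in range(q)]
         for j in range(p)]
    v = l2_normalize([1.0] * q)  # assumed starting point
    u = [0.0] * p
    for _ in range(n_iter):
        u = l1_constrained_unit_vector(
            [sum(K[j][k] * v[k] for k in range(q)) for j in range(p)], c1)
        v = l1_constrained_unit_vector(
            [sum(K[j][k] * u[j] for j in range(p)) for k in range(q)], c2)
    return u, v

# Toy data: 4 samples, 3 "voxels" in X and 2 "SNPs" in Y
X = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]]
Y = [[1, 0], [-1, 0], [0, 0.1], [0, -0.1]]
u, v = scca(X, Y, c1=1.2, c2=1.2)
print(u, v)
```

On this toy data the strongest cross-covariance is between the first column of X and the first column of Y, so the weights concentrate there. Every iteration is dominated by products with X^T Y, which is the linear algebra that the acceleration methods in the following slides target.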
COMPUTATIONAL CHALLENGES
• Example SCCA run at a small scale
  • Participants: 1,000
  • Genotype: 3,200 SNPs
  • Phenotype: 10,000 voxels
  • Permutation: 10,000 permutation tests
  • Running time: more than 12,000 hours
• Scale up
  • Genotype (array): 6M SNPs
  • Genotype (NGS): 40M variants
  • Phenotype: 200K voxels, imaging, cognitive and biomarker
  • Permutation: 10M permutations to reach p = 10^-7
• Parameter tuning via cross-validation
  • 10-fold cross-validation coupled with an 11-by-11 grid search
  • SCCA runs: 10×11×11 = 1,210
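The counts on this slide follow from simple arithmetic, sketched below. The run count is folds times grid points. For the permutation figure, the sketch assumes the common estimator p = (exceedances + 1) / (B + 1), which the slide does not state; under that assumption the smallest p-value resolvable with B permutations is 1 / (B + 1), which is why about 10M permutations are needed to reach 10^-7. The function names are hypothetical.

```python
def scca_runs(n_folds, grid_u, grid_v):
    """Number of SCCA fits for k-fold cross-validation over a 2-D grid."""
    return n_folds * grid_u * grid_v

def smallest_p_value(n_permutations):
    """Smallest p-value resolvable with B permutations, assuming the
    (exceedances + 1) / (B + 1) estimator."""
    return 1.0 / (n_permutations + 1)

def permutation_p_value(observed, permuted_stats):
    """Empirical p-value of an observed statistic against permuted ones."""
    exceed = sum(1 for s in permuted_stats if s >= observed)
    return (exceed + 1) / (len(permuted_stats) + 1)

print(scca_runs(10, 11, 11))          # 1210
print(smallest_p_value(10_000_000))   # just under 1e-7
```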
ACCELERATION WITH MKL
• Intel Math Kernel Library (MKL)
  • accelerate application performance and reduce development time
  • highly vectorized and threaded linear algebra, fast Fourier transforms (FFT), vector math and statistics functions
• MKL has been optimized to utilize
  • multiple processing cores
  • wider vector units
  • more varied architectures available in a high end system
• MKL can provide parallelism transparently and speed up programs with supported math routines without changing code.
• Compiling R with MKL
ACCELERATION WITH OFFLOAD MODEL
• Xeon Phi SE10P Coprocessor
  • 60 cores with 8GB GDDR5
  • Intel x86 instruction set
  • Usage of familiar programming models, software, and tools
• Pros
  • The host system can offload computing workload partially to the Xeon Phi
  • Independently run a compatible program
COMPUTATIONAL PLATFORM
• Texas Advanced Computing Center Stampede cluster
• Each computing node
  • Two Intel Xeon E5-2680 processors each with eight cores @2.7GHz.
  • 32GB DDR3 memory
  • The Xeon Phi SE10P Coprocessor has 61 cores with 8GB GDDR5
  • The NVIDIA K20 GPUs on each node have 5GB of on-board GDDR5
• Software
  • CentOS 6.3.
  • Stock R 3.01 package compiled with the Intel compilers (v.13) and built with MKL v.11.
[Figure label: MKL + offload]
SYNTHETIC DATA (GENETICS)
• FREGENE genome simulator
  • Simulate sequence-like data over large genomic regions in large diploid populations
• Simulated data
  • N=1,000 diploid individuals over 20,000 generations
  • 10 Mb genome with an average mutation rate of 2.5e-8 /site/generation
  • 3,274 SNPs with minor allele frequency (MAF) greater than 0.05 included
  • Four SNP data sets (i.e., g500, g1000, g2000, and g3274) by taking the first 500, 1,000, 2,000, and 3,274 SNPs from the entire data, respectively.
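The MAF filter and the nested subsets described above can be sketched as follows. This assumes genotypes are stored as one row per individual with 0/1/2 allele counts per SNP, an encoding chosen here for illustration since the slide does not show the FREGENE output format. The function names are hypothetical.

```python
def minor_allele_frequency(genotype_column):
    """MAF of one SNP from 0/1/2 allele counts in diploid individuals."""
    freq = sum(genotype_column) / (2 * len(genotype_column))
    return min(freq, 1.0 - freq)

def filter_by_maf(genotypes, threshold=0.05):
    """Keep SNP columns whose MAF is strictly greater than threshold."""
    n_snps = len(genotypes[0])
    keep = [j for j in range(n_snps)
            if minor_allele_frequency([row[j] for row in genotypes]) > threshold]
    return [[row[j] for j in keep] for row in genotypes]

def first_k_snps(genotypes, k):
    """Nested subset made of the first k SNP columns (e.g. g500, g1000)."""
    return [row[:k] for row in genotypes]

# Toy example: 4 individuals, 3 SNPs; the first SNP is monomorphic
genotypes = [[0, 1, 0], [0, 2, 0], [0, 0, 1], [0, 1, 0]]
filtered = filter_by_maf(genotypes)
subset = first_k_snps(filtered, 1)
print(filtered, subset)
```

Because each subset takes the leading columns of the same filtered matrix, the smaller data sets are contained in the larger ones, so timing differences reflect problem size only.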
SYNTHETIC DATA (GENETICS)
SYNTHETIC DATA (IMAGING)
• Assumption
  • Each image with multiple regions of interest (ROIs)
  • Voxels within each ROI highly correlated
• Simulation
  • Random positive definite non-overlapping group structured covariance matrix $\mathbf{M}$
  • Apply Cholesky decomposition to obtain the background imaging data
  • Individual: N=1000, Size: 100x100
  • We created three sets of phenotypic imaging data (i.e., p1000, p5000, and p10000), consisting of 1,000, 5,000 and 10,000 voxels respectively
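The Cholesky step above can be sketched in plain Python: if M = L L^T and z is a vector of independent standard normal draws, then x = L z has covariance M. The sketch assumes a block-diagonal M with one shared within-ROI correlation `rho`, a simplification for illustration because the slide does not say how the random entries of M were drawn. The function names are hypothetical.

```python
import math
import random

def block_covariance(group_sizes, rho):
    """Block-diagonal covariance: correlation rho inside each group
    (ROI) and 0 across groups. Positive definite for 0 <= rho < 1."""
    p = sum(group_sizes)
    M = [[0.0] * p for _ in range(p)]
    start = 0
    for size in group_sizes:
        for i in range(start, start + size):
            for j in range(start, start + size):
                M[i][j] = 1.0 if i == j else rho
        start += size
    return M

def cholesky(M):
    """Lower-triangular L with L L^T = M (M must be positive definite)."""
    p = len(M)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(M[i][i] - s)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L

def simulate(n, M, seed=0):
    """Draw n samples x = L z with z ~ N(0, I), so that Cov(x) = M."""
    rng = random.Random(seed)
    L = cholesky(M)
    p = len(M)
    data = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        data.append([sum(L[i][k] * z[k] for k in range(i + 1))
                     for i in range(p)])
    return data

# Two ROIs of two voxels each, five simulated individuals
M = block_covariance([2, 2], rho=0.8)
images = simulate(5, M)
print(len(images), len(images[0]))
```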
SYNTHETIC DATA (IMAGING)
RESULTS
• R snowfall package (sfLapply) with MKL and offload model
[Figure: comparison of Baseline and Parallel (MKL + offload)]
RESULTS
Correlation coefficient between the first pair of canonical components
• Accelerated SCCA implementations yielded the same results
• These correlation coefficients are close to the ground truth value of 1
RESULTS
CONCLUSION
• Initial steps to accelerate the SCCA implementation for brain imaging genetics applications.
• Parallelism achieved at the system implementation level to accelerate linear algebra computation using the Math Kernel Library (MKL) and partial offloading of computing workload.
• The 2-fold speedup, although encouraging, is still insufficient to handle extremely large-scale neuroimaging genetics data
  • millions of image voxels and millions of SNPs.
• Future work
  • Big data analytic strategies at the parallel computing model level
    • Parallelization of multiplicative algorithms using MapReduce and CUDA.
  • Application to accelerate enhanced SCCA models as well as other bi-multivariate statistical models for analyzing brain imaging genetics data.
ACKNOWLEDGEMENT
This research was supported by
• NIH R01 LM011360
• NIH U01 AG024904
• NIH RC2 AG036535
• NIH R01 AG19771
• NIH P30 AG10133
• NSF IIS-1117335
Thank you