Performance Optimization of Clustal W: Parallel Clustal W, HT Clustal and MULTICLUSTAL
Arunesh Mishra
CMSC 838 Presentation
Authors : Dmitri Mikhailov, Haruna Cofer, Roberto Gomperts
SGI
Problem Statement
Multiple Sequence Alignment (MSA)
Goal : Parallelize Clustal W
Basis for phylogenetic analysis - Infer homology relationships
Building protein families - conserved region may imply common function
Aids in function/structure prediction of new proteins
Global MSA – Clustal W
Is it computationally expensive ? Yes, for 100 sequences.
Clustal W takes hours for 100 or more sequences
Parallelization possible for the algorithm
Contribution of the paper
Parallel Clustal W
Parallel version of basic Clustal W
HT Clustal
Parallelize heterogeneous Multiple Sequence Alignment problems
MULTICLUSTAL
Parallel version of an optimization on Clustal W
CMSC 838T – Presentation
Talk Overview
Overview of talk
Motivation
Background
Sequential Clustal W
Parallel Clustal W
HT Clustal
Problem Statement
Optimizations
MULTICLUSTAL
Sequential Algorithm
Optimizations
Observations
Introduction
Sequential Clustal W Algorithm
Given N sequences of length M each
Pairwise Alignment (PW)
Creates an N x N distance matrix based on pairwise alignment scores
Evolutionary distance
Guide Tree (GT) construction (Phylogenetic tree)
Use Neighbor-joining algorithm
Progressive Multiple Alignment (PA)
Use guide tree to align closely related pairs of sequences
Progressively align next sequence to existing alignment
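The first stage can be sketched as follows. This is a minimal illustration, not Clustal W's scoring: `toy_distance` is a hypothetical stand-in that counts mismatched positions between equal-length sequences, whereas Clustal W derives its distances from real pairwise alignment scores.

```python
from itertools import combinations


def toy_distance(a, b):
    """Fraction of mismatched positions (assumed stand-in for an alignment-based distance)."""
    if len(a) != len(b):
        raise ValueError("toy distance assumes equal-length sequences")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def distance_matrix(seqs):
    """Fill a symmetric N x N matrix from the N(N-1)/2 unordered pairs."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = toy_distance(seqs[i], seqs[j])
    return dist


seqs = ["ACGT", "ACGA", "TCGA"]  # made-up example sequences
matrix = distance_matrix(seqs)
```

The guide tree stage would then consume `matrix`; each pair is computed independently of the others, which is what makes this stage a natural target for parallelization.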
Parallel Clustal W
Problem Statement
Parallelize the Sequential Clustal W
Execution time breakdown
PW = pairwise alignment, GT = guide tree, PA = progressive alignment
Parallel Clustal W
Pairwise Alignment Stage
N(N-1)/2 pairwise alignments
Send them randomly to different processors
Random, because the jobs have different loads
Random also produces a statistically uniform distribution (over a large set of jobs)
1.8X speedup achieved on a 1000 sequence MSA with 8 CPUs
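One possible way to realize the random distribution is to shuffle the list of pairs and deal it out to the processors. This is a sketch under assumptions: the function name, the fixed seed, and the round-robin dealing are illustrative choices, not details taken from the paper.

```python
import random
from itertools import combinations


def assign_pairs_randomly(n_sequences, n_workers, seed=0):
    """Shuffle all N(N-1)/2 pairwise jobs, then deal them out to the workers."""
    pairs = list(combinations(range(n_sequences), 2))
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    rng.shuffle(pairs)
    # Slice w::n_workers hands every n_workers-th shuffled job to worker w.
    return [pairs[w::n_workers] for w in range(n_workers)]


buckets = assign_pairs_randomly(n_sequences=100, n_workers=8)
sizes = [len(bucket) for bucket in buckets]
```

Each worker receives nearly the same number of jobs, and because the order is random, long and short alignments tend to be mixed evenly across workers when the job set is large.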
Guide Tree Stage
Parallelize “find closest neighbors from distance matrix”
Used in the neighbor joining algorithm
Find minimum element of each row concurrently
Use this to find minimum element of matrix
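The row-wise search can be sketched as below. It assumes the matrix passed in already holds whatever criterion the neighbor-joining step minimizes; the thread pool only illustrates the decomposition into independent row searches and is not expected to give a real speedup in CPython.

```python
from concurrent.futures import ThreadPoolExecutor


def row_minimum(indexed_row):
    """Return (value, i, j) for the smallest off-diagonal entry of row i."""
    i, row = indexed_row
    best_j = min((j for j in range(len(row)) if j != i), key=lambda j: row[j])
    return row[best_j], i, best_j


def closest_pair(dist, max_workers=4):
    """Find each row's minimum concurrently, then reduce to the matrix minimum."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        row_minima = list(pool.map(row_minimum, enumerate(dist)))
    return min(row_minima)  # tuples compare by value first, then by indices


dist = [
    [0, 5, 2],
    [5, 0, 3],
    [2, 3, 0],
]  # made-up symmetric example
value, i, j = closest_pair(dist)
```

The final reduction over N row minima is cheap compared with the N row scans, so most of the work in this step is in the part that runs concurrently.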
Parallel Clustal W
Progressive Alignment Stage
Computation of a function score(I,J) precomputed in parallel
Alignment score of sequence I and J
Not much parallelization in the third stage
Overall Speedup
Speedup of 10x for a 600-sequence MSA using 16 CPUs
Time reduced from 1 hr 7 minutes to 6.5 minutes
Relative scaling is better for larger inputs
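As a quick check of the quoted figures, speedup and parallel efficiency follow directly from the two timings. The helper names are illustrative; the timings are the ones stated above.

```python
def speedup(serial_minutes, parallel_minutes):
    """Speedup = serial time / parallel time."""
    return serial_minutes / parallel_minutes


def efficiency(speedup_value, n_cpus):
    """Efficiency = speedup per CPU (1.0 would be perfect linear scaling)."""
    return speedup_value / n_cpus


s = speedup(67.0, 6.5)  # 1 hr 7 min versus 6.5 min
e = efficiency(s, 16)
print(f"speedup = {s:.1f}x, efficiency = {e:.0%}")
# prints: speedup = 10.3x, efficiency = 64%
```

An efficiency well below 100% on 16 CPUs is consistent with the progressive alignment stage offering little parallelism.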
HT Clustal
Problem Statement
Calculate large numbers of MSAs of various sizes (independent problems)
Such problems are seen in high-throughput (HT) research environments
Representative Problem (from paper) :
Perform independent MSA over 100 sets of sequences
Each set has between 20 and 100 sequences, with an average of 60 sequences
Average length of sequence = 390
HT Clustal - Optimizations
Basic Idea
Each MSA operation (on one set of sequences) is independent of the others
Run ClustalW as a uniprocessor job on one MSA problem
Launch multiple Clustal W jobs on different processors
Job Scheduling
Jobs of different duration – depends on sequence set
Two scheduling options explored:
Schedule dynamically – if a processor is free, schedule an MSA job, chosen randomly
Schedule dynamically – sequence sets are presorted (based on file size)
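The difference between the two options can be illustrated with a small simulation. It is a sketch under assumptions: job durations stand in for file sizes, presorting is taken to mean largest first (the slides do not state the direction), and the five durations are made up.

```python
import heapq


def makespan(job_durations, n_cpus):
    """Dynamic scheduling: the next job goes to whichever CPU frees up first."""
    finish_times = [0.0] * n_cpus
    heapq.heapify(finish_times)
    for duration in job_durations:
        earliest_free = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest_free + duration)
    return max(finish_times)  # time at which the last CPU finishes


jobs = [2, 2, 2, 2, 8]  # hypothetical durations; the long job arrives last
arrival_order = makespan(jobs, n_cpus=2)
largest_first = makespan(sorted(jobs, reverse=True), n_cpus=2)
print(arrival_order, largest_first)
# prints: 12.0 8.0
```

When a long job is started late, one CPU keeps running after the others have gone idle. Starting the largest jobs first leaves the short jobs to fill in the gaps at the end.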
HT Clustal – Performance Numbers
Speedups
Almost linear speedups
31x on 32 CPUs for the representative MSA problem
116X on 128 CPUs for a larger test case
Solution time reduced from 18.5 hours to 9.5 minutes
Speedup shown for the example MSA set:
HT Clustal – Effect of Presorting
Effect of presorting
Figure shows the effect of presorting for the example MSA set
32 CPUs, 100 sets, ~3 jobs per CPU
If the average number of jobs per CPU is < 5, presorting helps
For a larger number of jobs per CPU, statistical averaging reduces load imbalance
MULTICLUSTAL
MULTICLUSTAL Algorithm
A Perl script to generate high quality MSA with little user intervention
Searches for best combination of Clustal W input parameters
To reduce gaps, increase clustering
Parameters to vary :
Scoring matrices : pairwise and multiple
Gap open and extension penalties (pairwise and multiple)
Sequential Algorithm :
1. Until all parameters are sufficiently varied {
2.     alignment = Run Clustal W ()
3.     Calculate quality of alignment
4.     Change Parameters }
Quality of alignment
A numerical quantity based on:
Identical amino acid matches
Conservative amino acid substitutions
Gap events, amino acid islands, i.e. -X-, -XX-, -XXX-, -XXXX-
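The search loop above can be sketched as a grid search. Everything here is an assumption made for illustration: `toy_quality` is not the MULTICLUSTAL score (it ignores conservative substitutions and islands), `fake_run` is a stub standing in for a Clustal W invocation, and the parameter values are made up.

```python
from itertools import product


def count_gap_events(row):
    """Count runs of '-' in one aligned sequence."""
    return sum(1 for k, ch in enumerate(row)
               if ch == "-" and (k == 0 or row[k - 1] != "-"))


def toy_quality(alignment):
    """Toy score: +1 per fully identical column, -1 per gap event."""
    identical = sum(1 for col in zip(*alignment)
                    if col[0] != "-" and len(set(col)) == 1)
    return identical - sum(count_gap_events(row) for row in alignment)


def search_parameters(run_alignment, matrices, gap_opens, gap_exts):
    """Try every parameter combination and keep the best-scoring alignment."""
    best = None
    for params in product(matrices, gap_opens, gap_exts):
        alignment = run_alignment(*params)
        quality = toy_quality(alignment)
        if best is None or quality > best[0]:
            best = (quality, params, alignment)
    return best


def fake_run(matrix, gap_open, gap_ext):
    """Stub aligner: returns canned alignments instead of calling Clustal W."""
    if gap_open >= 10.0:
        return ["ACGT-A", "ACGTTA"]
    return ["AC-GT-A", "ACG-TTA"]


best_quality, best_params, best_alignment = search_parameters(
    fake_run, ["BLOSUM62", "PAM250"], [5.0, 10.0], [0.1, 0.5])
```

Every pass through the loop is a full alignment run, so the cost of the search grows with the number of parameter combinations tried.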
MULTICLUSTAL Optimizations
Optimization on MULTICLUSTAL
Run Clustal W once
Reuse tree generated in the PW/GT Stages
Guide tree calculated only once for multiple runs
Results in speedups from 1.5X to 3X
Use Parallel Clustal W for each run of Clustal W
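The guide tree reuse can be sketched as follows. `build_guide_tree` and `progressive_align` are hypothetical callables standing in for the PW/GT stages and the progressive alignment stage; the sketch assumes the varied parameters affect only the latter.

```python
def multiclustal_runs(sequences, parameter_sets, build_guide_tree, progressive_align):
    """Build the guide tree once, then reuse it for every parameter set."""
    guide_tree = build_guide_tree(sequences)  # PW + GT stages, run a single time
    return [progressive_align(sequences, guide_tree, params)
            for params in parameter_sets]


calls = {"tree": 0, "align": 0}  # counters to show how often each stage runs


def stub_tree(sequences):
    calls["tree"] += 1
    return "stub-tree"


def stub_align(sequences, guide_tree, params):
    calls["align"] += 1
    return (guide_tree, params)


results = multiclustal_runs(["ACGT", "ACGA"], [1, 2, 3], stub_tree, stub_align)
```

With three parameter sets the tree stage runs once and the alignment stage three times, which is where the saving over rerunning all of Clustal W comes from.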
Observations
Parallelizability
First (pairwise alignment) and second (guide tree) stages are parallelizable
Third stage is mostly sequential – speedup limited
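The limit imposed by a sequential stage can be illustrated with Amdahl's law, which the slides do not cite by name. The parallel fraction of 0.9 below is a hypothetical value chosen for illustration, not a measurement from the paper.

```python
def amdahl_speedup(parallel_fraction, n_cpus):
    """Amdahl's law: overall speedup when only part of the work scales with CPUs."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)


# Hypothetical: 90% of the runtime (stages 1 and 2) parallelizes, 10% does not.
for cpus in (4, 16, 64):
    print(cpus, round(amdahl_speedup(0.9, cpus), 2))
```

Under this assumption the speedup can never exceed 10x regardless of CPU count, because the sequential 10% remains.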
100 sequence MSAs possible ?
PIR at NBRF (Georgetown University) takes a maximum of 20 sequences for MSA
Speedup improves user response, but for 20 sequences a PC would be sufficient
Probable applications:
Research Environments ?
PIR servers ?
Speedup only on shared memory SGI 3000 workstation ?