Generating Robust and Consensus Clusters from Gene Expression Data
Allan Tucker (a), Stephen Swift (a), Xiaohui Liu (a), Nigel Martin (b), Christine Orengo (c), Paul Kellam (c)
Introduction
• Many different clustering algorithms are used for gene expression analysis
• Little work on inter-method consistency or cross-comparison
• Important because results differ (each algorithm implicitly forces a structure on the data)
• Obtaining a consensus across methods should improve confidence
The Talk
• Compare a number of existing methods for clustering gene expression data
• Algorithms for generating robust clusters and consensus clusters
• Tested on a set of Amersham Scorecard data with known structure and on experimentally obtained virus B-Cell data
• Provides specific advantages in the analysis of array-based gene expression data
Clustering Methods
• Hierarchical Clustering (R)
• PAM (R)
• CAST (C++)
• Simulated Annealing (C++)
Datasets
• Amersham Scorecard
– 597 genes, 24 blocks with 32 columns and 12 rows, under 30 experimental conditions
– Repeated experiments, which we assume should cluster together
• B Cell Data
– 1987 genes
Comparison of Methods
The Agreement Matrix
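As a rough illustration of what an agreement matrix might hold, the sketch below assumes that entry (i, j) counts how many of the input clusterings place genes i and j in the same cluster. That definition, the function name agreement_matrix and the toy labelings are assumptions made for illustration; they are not taken from the slides.

```python
from itertools import combinations


def agreement_matrix(labelings):
    """Count, for every pair of items, how many clusterings co-cluster them.

    `labelings` is a list of label lists, one per clustering method; each
    label list assigns a cluster label to every item, in the same item order.
    """
    n = len(labelings[0])
    agree = [[0] * n for _ in range(n)]
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("every clustering must label the same items")
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                # Stored symmetrically here; an upper triangle would also do.
                agree[i][j] += 1
                agree[j][i] += 1
    return agree


# Three hypothetical clusterings of five genes (made-up labels).
labelings = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [1, 1, 2, 2, 2],
]
A = agreement_matrix(labelings)
for row in A:
    print(row)
```

Genes 0 and 1 share a cluster in all three toy clusterings, so their entry is 3, while genes 0 and 3 never meet and score 0.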
Robust Clustering
• Takes the agreement matrix as input
• Places genes that have full agreement across methods into robust clusters
• Deterministic algorithm
• Should give a higher degree of confidence in the clusters
• Not all genes will be assigned
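The points above can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes that "full agreement" means a pair's agreement count equals the number of clustering methods, and it groups such pairs with a small union-find. The function name robust_clusters and the 6-gene matrix are hypothetical.

```python
def robust_clusters(agree, n_methods):
    """Group items whose pairwise agreement equals n_methods.

    Assumption: agree[i][j] counts the methods that co-cluster i and j.
    Items with no full-agreement partner are left unassigned.
    """
    n = len(agree)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if agree[i][j] == n_methods:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Always keep the smaller index as representative,
                    # so the result does not depend on chance.
                    parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Singletons had no full-agreement partner and stay unassigned.
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical agreement matrix for six genes and three methods.
A = [
    [0, 3, 1, 0, 0, 0],
    [3, 0, 1, 0, 0, 0],
    [1, 1, 0, 2, 1, 1],
    [0, 0, 2, 0, 2, 2],
    [0, 0, 1, 2, 0, 3],
    [0, 0, 1, 2, 3, 0],
]
clusters = robust_clusters(A, n_methods=3)
print(clusters)  # [[0, 1], [4, 5]]
```

In this toy case genes 2 and 3 are left out because no other gene agrees with them under every method, which mirrors the last bullet: only part of the data ends up in a robust cluster.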
Robust Clustering
Dataset   No. of Robust   % of variables   Max. Robust    Min. Robust    Mean Robust
          Clusters        assigned         Cluster size   Cluster size   Cluster size
ASC       24              79%              44             2              10.2
Bcell     154             25%              14             2              3.2
Consensus Clustering
• “Full agreement” requirement for robust clusters can be too restrictive
• Algorithm for generating consensus clusters given a minimum agreement parameter
• Approximate stochastic algorithm
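One way to picture an approximate stochastic search driven by a minimum agreement parameter is sketched below. It is an illustration under stated assumptions, not the algorithm from the talk: it uses simulated annealing over partitions and an assumed objective that adds (agreement - min_agreement) for every pair sharing a cluster. The names partition_score and consensus_clusters, the cooling schedule and the toy matrix are all hypothetical.

```python
import math
import random


def partition_score(clusters, agree, min_agreement):
    """Sum (agreement - min_agreement) over all pairs sharing a cluster."""
    total = 0.0
    for members in clusters:
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                total += agree[members[a]][members[b]] - min_agreement
    return total


def to_clusters(labels):
    """Turn a label list into a list of clusters (lists of item indices)."""
    groups = {}
    for item, label in enumerate(labels):
        groups.setdefault(label, []).append(item)
    return list(groups.values())


def consensus_clusters(agree, min_agreement, n_iter=5000, start_temp=1.0, seed=0):
    """Simulated-annealing search for a high-scoring partition (a sketch)."""
    rng = random.Random(seed)
    n = len(agree)
    labels = list(range(n))  # start with every item in its own cluster
    current = partition_score(to_clusters(labels), agree, min_agreement)
    best_labels, best = labels[:], current
    for step in range(n_iter):
        # Linear cooling; the small constant avoids division by zero.
        temp = start_temp * (1.0 - step / n_iter) + 1e-9
        item = rng.randrange(n)
        # n labels for n items: an unused label exists whenever the
        # chosen item currently shares its cluster with another item.
        new_label = rng.randrange(n)
        if new_label == labels[item]:
            continue
        candidate = labels[:]
        candidate[item] = new_label
        score = partition_score(to_clusters(candidate), agree, min_agreement)
        delta = score - current
        # Accept improvements always, worse moves with a shrinking probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            labels, current = candidate, score
            if current > best:
                best_labels, best = labels[:], current
    return to_clusters(best_labels), best


# Hypothetical agreement matrix for six genes and three methods.
A = [
    [0, 3, 1, 0, 0, 0],
    [3, 0, 1, 0, 0, 0],
    [1, 1, 0, 2, 1, 1],
    [0, 0, 2, 0, 2, 2],
    [0, 0, 1, 2, 0, 3],
    [0, 0, 1, 2, 3, 0],
]
clusters, score = consensus_clusters(A, min_agreement=1.5)
print(clusters, score)
```

Under this assumed objective, pairs that agree more often than min_agreement are rewarded for sharing a cluster and pairs that agree less often are penalised, so raising the parameter moves the result toward the stricter robust clusters.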
Consensus Clustering
[Figure: panels "Input Cluster Results", "Agreement Matrix" and "Consensus Clusters". The scatter plots have axes cmdscale(disthhv8)[,1] and cmdscale(disthhv8)[,2]. The agreement matrix panel shows pairwise entries f_ij above a zero diagonal:]

\[
\begin{pmatrix}
0 & f_{12} & f_{13} & f_{14} & \cdots & f_{1n} \\
  & 0      & f_{23} & f_{24} & \cdots & f_{2n} \\
  &        & 0      & f_{34} & \cdots & f_{3n} \\
  &        &        & \ddots & f_{ij} & \vdots \\
  &        &        &        & 0      & f_{n-1,n} \\
  &        &        &        &        & 0
\end{pmatrix}
\]
Consensus Clustering
[Figure panels: B-Cell Dataset, ASC Dataset]
Summary
• Clustering biological data is very useful
• Biases in clustering algorithms mean that success in identifying patterns varies
• Consensus algorithms are used in protein secondary structure prediction
• We apply a similar strategy with robust and consensus clustering
Conclusions
• Robust clusters are good for identifying common transcriptional modules
• Also good for identifying genes in a common functional pathway
• Useful for creating clusters of genes with high confidence
• Can be restrictive, discarding genes that do not have full agreement
Conclusions
• Consensus clustering relaxes the full agreement requirement
• Closely resembles the defined clusters in the synthetic data
• Reliably picks out features in the virus gene expression data
• Fulfils the desire not to rely on one clustering algorithm during gene expression analysis
Acknowledgements
• The Biotechnology and Biological Sciences Research Council (BBSRC), UK
• The Engineering and Physical Sciences Research Council (EPSRC), UK