Consensus Group Stable Feature Selection
Steven Loscalzo, Dept. of Computer Science, Binghamton University
Lei Yu, Dept. of Computer Science, Binghamton University
Chris Ding, Dept. of Computer Science and Engineering, University of Texas at Arlington
The 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Overview
• Background and motivation
• Propose Consensus Feature Group Framework
   • Finding Consensus Groups
   • Feature Selection from Consensus Groups
• Experimental Study
• Conclusion
June 30th, 2009
Loscalzo, Yu, Ding
Consensus Group Stable Feature Selection
Feature Selection Stability
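[Figure: All Training Data is sampled into Sample 1, Sample 2, …, Sample k; each sample goes through Feature Selection and Model Building. The selected subsets differ across samples (F={f2,f5}, F’={f4,f10}, F’’={f5, f11}) while accuracy stays similar (Acc 92%, 91%, 93%).]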
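One way to put a number on the instability illustrated above is to compare the selected subsets pairwise. The sketch below uses average Jaccard similarity, which is an assumption made here for illustration; the slides do not state which stability measure is used. The function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two feature subsets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def average_pairwise_stability(subsets):
    """Mean Jaccard similarity over all pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# The three subsets shown on the slide.
selected = [{"f2", "f5"}, {"f4", "f10"}, {"f5", "f11"}]
stability = average_pairwise_stability(selected)
```

Only one pair of subsets shares a feature (f5), so the score is low even though the three models reach similar accuracy.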
Motivation
• Need for stable feature selection
   • Give confidence to lab tests
   • Uncover “truly” relevant information
• Utility of feature groups
   • Model feature interaction
   • When little is known about a single feature, another feature in the group may be well studied
Dense Feature Group Framework
• Dense feature groups can provide stability and accuracy [Yu, Ding, Loscalzo, KDD-08]
• Dense Group Stable Feature Selection Framework
   • Map features as points in sample space
   • Apply kernel density estimation to locate dense feature groups
   • Select top relevant groups from dense groups
• Limitations of this framework
   • Unreliable density estimation in high-dimensional spaces
   • Restricts selection of relevant groups to dense groups
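The first two framework steps above (features as points in sample space, density estimation to locate dense groups) can be sketched as follows. This is a minimal illustration, not the dense group finder of [Yu, Ding, Loscalzo, KDD-08]: it assumes a Gaussian kernel and a plain mean-shift iteration, and the function names, the `bandwidth` parameter, the merge tolerance, and the toy data are all hypothetical.

```python
import math


def gaussian_weight(p, q, bandwidth):
    """Gaussian kernel weight between two points."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def find_peak(start, points, bandwidth, iters=100):
    """Move `start` uphill on the kernel density estimate via mean shift."""
    x = list(start)
    for _ in range(iters):
        weights = [gaussian_weight(x, q, bandwidth) for q in points]
        total = sum(weights)
        # New position: kernel-weighted average of all points.
        x = [sum(w * q[d] for w, q in zip(weights, points)) / total
             for d in range(len(x))]
    return x


def dense_groups(features, bandwidth):
    """Group feature names whose mean-shift peaks (nearly) coincide."""
    points = list(features.values())
    peaks, groups = [], []
    for name, point in features.items():
        peak = find_peak(point, points, bandwidth)
        for i, known in enumerate(peaks):
            # Assumed tolerance: peaks closer than bandwidth / 2 are merged.
            if math.dist(peak, known) < bandwidth / 2:
                groups[i].append(name)
                break
        else:
            peaks.append(peak)
            groups.append([name])
    return groups


# Hypothetical toy data: each feature is a point whose coordinates are
# its values on three samples.
features = {
    "f1": [1.0, 1.0, 1.0],
    "f2": [1.1, 0.9, 1.0],
    "f3": [5.0, 5.0, 5.0],
    "f4": [5.1, 4.9, 5.0],
}
groups = dense_groups(features, bandwidth=0.5)
```

Because each point's dimensionality equals the number of samples, the kernel weights become hard to tune as the sample count grows, which relates to the first limitation listed above.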
Consensus Feature Group Framework
• Consensus feature groups are an ensemble of feature grouping results
• Select relevant groups from the whole spectrum of consensus groups
• Challenges
   • Base algorithm for ensemble: dense group finder [Yu, Ding, Loscalzo, KDD-08]
   • Aggregate feature grouping results
Group Aggregation
• 3 aggregation ideas:
   • Heuristics (reference set)
   • Cluster based [Fern, Brodley, ICML-03]
   • Instance based [Fern, Brodley, ICML-03]
[Figure: Feature Group Results obtained from data sub-samples 1, 2, and 3 over features f1 to f5 are aggregated into Consensus Feature Groups.]
The CGS Algorithm
[Figure: D is split into D1, …, Dt, giving Result Grouping 1, …, Result Grouping t; Measure Instance Co-occurrence, then Hierarchical Clustering, produces the Consensus Feature Groups.]
CGS: The Consensus Group Stable Feature Selection Algorithm
for i = 1 to t do
    Construct Training Partition Di from D
    Run DGF on Di
    for every pair of features Xi and Xj in D
        Update Wi,j := freq. Xi and Xj appear together in results
create consensus groups CG1,CG2,…,CGL via hierarchical clustering of all features based on Wi,j
for i = 1 to L do
    Obtain a representative feature Xi from CGi
    Measure relevance of Xi, set as relevance of CGi
Rank CG1,CG2,…,CGL and return the top k
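The aggregation and selection steps of the algorithm above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' implementation: the t result groupings are passed in directly instead of being produced by DGF, the clustering uses average linkage with a `min_similarity` stopping threshold, and the representative is taken to be the member that co-occurs most with the rest of its group. The slides specify none of these choices, and all names, groupings, and relevance scores below are hypothetical.

```python
from itertools import combinations


def cooccurrence(groupings, features):
    """W[a][b] = fraction of groupings that place a and b in the same group."""
    counts = {a: {b: 0 for b in features} for a in features}
    for grouping in groupings:
        for group in grouping:
            for a, b in combinations(group, 2):
                counts[a][b] += 1
                counts[b][a] += 1
    t = len(groupings)
    return {a: {b: counts[a][b] / t for b in features} for a in features}


def linkage(w, c1, c2):
    """Average co-occurrence between two clusters (average linkage)."""
    return sum(w[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))


def consensus_groups(w, features, min_similarity):
    """Merge the most similar pair of clusters until no pair reaches
    min_similarity (an assumed stopping rule)."""
    clusters = [[f] for f in features]
    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(w, clusters[ij[0]], clusters[ij[1]]))
        if linkage(w, clusters[i], clusters[j]) < min_similarity:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def representative(w, group):
    """Assumed rule: the member that co-occurs most with the rest of its group."""
    return max(group, key=lambda f: sum(w[f][g] for g in group if g != f))


def top_k_groups(w, clusters, relevance, k):
    """Score each group by its representative's relevance; keep the top k."""
    scored = [(relevance[representative(w, g)], g) for g in clusters]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [g for _, g in scored[:k]]


# Hypothetical result groupings from t = 3 training partitions.
features = ["f1", "f2", "f3", "f4", "f5"]
groupings = [
    [["f1", "f2"], ["f3", "f4"], ["f5"]],
    [["f1", "f2"], ["f3"], ["f4", "f5"]],
    [["f1", "f2"], ["f3", "f4"], ["f5"]],
]
# Hypothetical per-feature relevance scores.
relevance = {"f1": 0.2, "f2": 0.3, "f3": 0.9, "f4": 0.8, "f5": 0.5}

w = cooccurrence(groupings, features)
groups = consensus_groups(w, features, min_similarity=0.5)
top = top_k_groups(w, groups, relevance, k=2)
```

In this toy run f1 and f2 are grouped in every partition and f3 and f4 in two of three, so both pairs clear the 0.5 threshold, while f5 stays a singleton.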
Experimental Setup
Setting
• Used 10 random shuffles of data:
   • 10 fold cross validation
   • 9/10 folds training
   • 1/10 folds testing
• Results shown are averages across 10 folds x 10 shuffles
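The evaluation protocol in the list above can be sketched as an index generator. This is an illustrative sketch only: it assumes plain (non-stratified) folds and a seeded shuffle, neither of which the slides specify, and `repeated_kfold` is a hypothetical name.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_shuffles=10, seed=0):
    """Yield (shuffle, fold, train_idx, test_idx) for each fold of each shuffle."""
    rng = random.Random(seed)
    for s in range(n_shuffles):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Fold f takes every n_folds-th index starting at f, so fold
        # sizes differ by at most one.
        folds = [order[f::n_folds] for f in range(n_folds)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f
                         for i in fold]
            yield s, f, train_idx, test_idx


# 62 samples, matching the size of the Colon data set.
splits = list(repeated_kfold(n_samples=62))
```

Each of the 100 splits trains on 9/10 of the samples and tests on the remaining 1/10; a reported figure would then be the mean of a metric over all 100 splits.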
Data Set    # Genes   # Samples   # Classes
Colon          2000          62           2
Leukemia       7129          72           2
Lung          12533         181           2
Prostate       6034         102           2
Lymphoma       4026          62           3
SRBCT          2308          63           4
Algorithms
• CGS – sub-samples t = 10
• DRAGS [Yu, Ding, Loscalzo, KDD-08] – top dense group based feature selection
• SVM-RFE [Guyon et al, ML-02] – recursively eliminates features based on weights found after training an SVM
Stability
[Figures: Stability of Selected Groups; Stability of Selected Features]
Accuracy Results
Conclusion
• Proposed consensus group stable feature selection framework
   • Stable
   • Accurate
• Future directions
   • Apply different ensemble techniques
   • Incorporate new group finding algorithms
References
Fern, X. Z., and Brodley, C. Random projection for high-dimensional data clustering: a cluster ensemble approach. In Proceedings of the 20th International Conference on Machine Learning (ICML-03). 186-192, 2003.
Guyon, I., Weston, J., Barnhill, S., and Vapnik, V. Gene selection for cancer classification using support vector machines. Machine Learning (ML-02). 46:389-422, 2002.
Yu, L., Ding, C., and Loscalzo, S. Stable feature selection via dense feature groups. In Proceedings of the 14th ACM International Conference on Knowledge Discovery and Data Mining (KDD-08). 803-811, 2008.