Consensus Group Stable Feature Selection

Steven Loscalzo, Dept. of Computer Science, Binghamton University
Lei Yu, Dept. of Computer Science, Binghamton University
Chris Ding, Dept. of Computer Science and Engineering, University of Texas at Arlington

The 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, June 30th, 2009

Overview
• Background and motivation
• Proposed consensus feature group framework
  • Finding consensus groups
  • Feature selection from consensus groups
• Experimental study
• Conclusion

Feature Selection Stability
[Figure: sub-samples (Sample 1, ..., Sample k) are drawn from all training data; feature selection and model building on each sub-sample yield different feature subsets at similar accuracy, e.g. F = {f2, f5} at 92%, F' = {f4, f10} at 91%, F'' = {f5, f11} at 93%]

Motivation
• Need for stable feature selection
  • Gives confidence in lab tests
  • Uncovers the "truly" relevant information
• Utility of feature groups
  • Groups model feature interaction
  • When information about a single feature is lacking, another feature in its group may be well studied

Dense Feature Group Framework
• Dense feature groups can provide stability and accuracy [Yu, Ding, Loscalzo, KDD-08]
• Dense group stable feature selection framework:
  • Map features as points in the sample space
  • Apply kernel density estimation to locate dense feature groups
  • Select the top relevant groups from among the dense groups
• Limitations of this framework:
  • Density estimation is unreliable in high-dimensional spaces
  • Selection of relevant groups is restricted to dense groups

Consensus Feature Group Framework
• Consensus feature groups are an ensemble of feature grouping results
• Relevant groups are selected from the whole spectrum of consensus groups
• Challenges:
  • Choosing a base algorithm for the ensemble: the dense group finder (DGF) [Yu, Ding, Loscalzo, KDD-08]
  • Aggregating the feature grouping results
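The dense-group idea from the KDD-08 framework (map features as points in sample space, use kernel density estimation to find dense feature groups) can be sketched as follows. This is a minimal illustration, not the authors' DGF implementation: the Gaussian bandwidth, the grouping radius, and the greedy densest-first assignment are all assumptions.

```python
import numpy as np

def dense_feature_groups(X, bandwidth=1.0, radius=1.0):
    """Group features whose sample-space representations lie close together.

    X: (n_samples, n_features) data matrix; each feature (column) becomes a
    point in R^n_samples. Density at each feature point is estimated with a
    Gaussian kernel, and groups are grown greedily around density peaks.
    """
    F = X.T                                   # features as points in sample space
    n = F.shape[0]
    # pairwise squared distances between feature points
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)
    # kernel density estimate at each feature point (up to a constant)
    density = np.exp(-d2 / (2 * bandwidth ** 2)).sum(1)
    groups, unassigned = [], set(range(n))
    for i in np.argsort(-density):            # visit features from densest down
        if i not in unassigned:
            continue
        # gather all still-unassigned features within the radius of this peak
        group = {j for j in unassigned if d2[i, j] <= radius ** 2}
        unassigned -= group
        groups.append(sorted(group))
    return groups
```

On a toy matrix where two pairs of columns are near-duplicates, the sketch returns those pairs as the two dense groups; real DGF works on far higher-dimensional expression data, where the slide's point about unreliable density estimation applies.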
Group Aggregation
• Three aggregation ideas:
  • Heuristics (reference set)
  • Cluster based [Fern, Brodley, ICML-03]
  • Instance based [Fern, Brodley, ICML-03]
[Figure: feature group results from three data sub-samples, e.g. (f1, f2, f3, f4, f5), (f2, f1, f3, f4, f5), (f1, f2, f4, f3), are aggregated into consensus feature groups such as {f1, f2}, {f4, f5}, {f3}]

The CGS Algorithm
CGS: The Consensus Group Stable Feature Selection Algorithm
  for i = 1 to t do
    construct training partition Di from D
    run DGF on Di
  for every pair of features Xi and Xj in D do
    update Wi,j := frequency with which Xi and Xj appear together in the t grouping results
  create consensus groups CG1, CG2, ..., CGL via hierarchical clustering of all features based on Wi,j
  for i = 1 to L do
    obtain a representative feature Xi from CGi
    measure the relevance of Xi and set it as the relevance of CGi
  rank CG1, CG2, ..., CGL and return the top k

Experimental Setup
Setting:
• 10 random shuffles of the data
• 10-fold cross-validation on each shuffle: 9/10 folds for training, 1/10 fold for testing
• Results shown are averages across 10 folds x 10 shuffles

Data Set    # Genes   # Samples   # Classes
Colon          2000          62           2
Leukemia       7129          72           2
Lung          12533         181           2
Prostate       6034         102           2
Lymphoma       4026          62           3
SRBCT          2308          63           4

Algorithms:
• CGS: sub-samples t = 10
• DRAGS [Yu, Ding, Loscalzo, KDD-08]: top dense-group-based feature selection
• SVM-RFE [Guyon et al., ML-02]: recursively eliminates features based on the weights learned by training an SVM

Stability Results
[Figures: stability of the selected groups and stability of the selected features]

Accuracy Results
[Figure: classification accuracy of the compared algorithms on the six data sets]
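The CGS pseudocode above can be sketched in Python. Several pieces are assumptions, not taken from the slides: bootstrap sub-sampling for the partitions Di, a pluggable `group_finder` standing in for DGF, a pluggable `relevance` score, and a simple threshold-plus-union-find merge standing in for the hierarchical clustering step.

```python
import numpy as np

def cgs_consensus_groups(X, y, group_finder, relevance, t=10, threshold=0.5, rng=None):
    """Sketch of the CGS aggregation step.

    Runs `group_finder` (DGF stand-in) on t sub-samples, accumulates a feature
    co-occurrence frequency matrix W, merges features that co-occur at least
    `threshold` of the time into consensus groups, then ranks each group by the
    relevance of its representative feature.
    """
    rng = rng or np.random.default_rng(0)
    n_samples, n_features = X.shape
    W = np.zeros((n_features, n_features))
    for _ in range(t):
        idx = rng.choice(n_samples, size=n_samples, replace=True)  # bootstrap Di
        for g in group_finder(X[idx]):
            for i in g:                       # W[i, j] = co-occurrence frequency
                for j in g:
                    W[i, j] += 1.0 / t
    # union-find merge of feature pairs with W >= threshold
    parent = list(range(n_features))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for i in range(n_features):
        for j in range(i + 1, n_features):
            if W[i, j] >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n_features):
        groups.setdefault(find(i), []).append(i)
    # rank each consensus group by the relevance of a representative feature
    scored = []
    for g in groups.values():
        rep = max(g, key=lambda i: relevance(X[:, i], y))
        scored.append((relevance(X[:, rep], y), sorted(g)))
    scored.sort(reverse=True)
    return [g for _, g in scored]
```

The threshold merge is the main simplification: the paper clusters the whole W matrix hierarchically, which lets group granularity vary, whereas a fixed threshold commits to one cut.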
Conclusion
• Proposed the consensus group stable feature selection framework, which is:
  • Stable
  • Accurate
• Future directions:
  • Apply different ensemble techniques
  • Incorporate new group-finding algorithms

References
Fern, X. Z., and Brodley, C. Random projection for high-dimensional data clustering: a cluster ensemble approach. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), 186-192, 2003.
Guyon, I., Weston, J., Barnhill, S., and Vapnik, V. Gene selection for cancer classification using support vector machines. Machine Learning, 46:389-422, 2002.
Yu, L., Ding, C., and Loscalzo, S. Stable feature selection via dense feature groups. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-08), 803-811, 2008.