VARIANCE REDUCTION FOR STABLE FEATURE SELECTION
Presenter: Yue Han
Advisor: Lei Yu
Department of Computer Science
10/27/10

OUTLINE
- Introduction and Motivation
- Background and Related Work
- Preliminaries: Publications
- Theoretical Framework
- Empirical Framework: Margin-Based Instance Weighting
- Empirical Study
- Planned Tasks

INTRODUCTION AND MOTIVATION: FEATURE SELECTION APPLICATIONS
[Figure: example applications of feature selection, such as text categorization (documents x terms matrix with class labels, e.g., Sports vs. Travel), image analysis (samples x pixels), and bioinformatics (samples x genes or proteins).]

INTRODUCTION AND MOTIVATION: FEATURE SELECTION FROM HIGH-DIMENSIONAL DATA
High-dimensional data -> feature selection algorithm (mRMR, SVM-RFE, Relief-F, F-statistics, etc.) -> low-dimensional data -> learning models (classification, clustering, etc.).
With p the number of features and n the number of samples, high-dimensional data means p >> n.
Curse of dimensionality:
- effects on distance functions;
- effects in optimization and learning;
- effects in Bayesian statistics.
Feature selection supports knowledge discovery on high-dimensional data by:
- alleviating the effect of the curse of dimensionality;
- enhancing generalization capability;
- speeding up the learning process;
- improving model interpretability.

INTRODUCTION AND MOTIVATION: STABILITY OF FEATURE SELECTION
Stability of feature selection: the insensitivity of the result of a feature selection algorithm to variations in the training set. Applying the same feature selection method to different training samples may or may not yield consistent feature subsets; this is the stability issue of feature selection.
The stability of learning algorithms (training data -> learning algorithm -> learning model) was first examined by Turney in 1995. The stability of feature selection was relatively neglected until recently, when it attracted interest from data mining researchers.

INTRODUCTION AND MOTIVATION: MOTIVATION FOR STABLE FEATURE SELECTION
Given an unlimited sample size of D, the feature selection results from two samples D1 and D2 are the same. When the size of D is limited (n << p for high-dimensional data), the feature selection results from D1 and D2 differ. The challenge: increasing the number of samples can be very costly or impractical.
Experts in biology and biomedicine are interested in:
- not only prediction accuracy but also the consistency of feature subsets;
- validating stable genes or proteins that are less sensitive to variations in the training data;
- biomarkers that explain the observed phenomena.

BACKGROUND AND RELATED WORK: FEATURE SELECTION METHODS
General process: original set -> subset generation -> candidate subset -> subset evaluation (goodness of subset) -> repeat until a stopping criterion is met -> result validation.
- Search strategies: complete search, sequential search, random search.
- Evaluation criteria: filter model, wrapper model, embedded model.
- Representative algorithms: Relief, SFS, MDLM, FSBC, ELSA, LVW, BBHFS, Dash-Liu's, etc.

BACKGROUND AND RELATED WORK: STABLE FEATURE SELECTION
- Comparison of feature selection algorithms w.r.t. stability (Davis et al., Bioinformatics, vol. 22, 2006; Kalousis et al., KAIS, vol. 12, 2007): quantify stability in terms of consistency of subsets or weights; algorithms vary in stability while performing equally well for classification; choose the algorithm with both good stability and good accuracy.
- Bagging-based ensemble feature selection (Saeys et al., ECML 2007): draw different bootstrapped samples of the same training set, apply a conventional feature selection algorithm to each, and aggregate the feature selection results.
- Group-based stable feature selection (Yu et al., KDD 2008; Loscalzo et al., KDD 2009): explore intrinsic feature correlations, identify groups of correlated features, and select relevant feature groups.

BACKGROUND AND RELATED WORK: MARGIN BASED FEATURE SELECTION
- Sample margin: how far an instance can travel before it hits the decision boundary.
- Hypothesis margin: how far the hypothesis can travel before it hits an instance (the distance between the hypothesis and the opposite hypothesis of an instance).
- Representative algorithms: Relief, Relief-F, G-flip, Simba, etc. In these algorithms the margin is used for feature weighting or feature selection, a totally different use from ours (in this work the margin drives instance weighting; see the sketch below).
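To make the hypothesis-margin idea concrete, the following minimal sketch computes per-instance hypothesis margins in the Relief style (distance to the nearest miss minus distance to the nearest hit). It is an illustration only; the function name and the use of Euclidean distance are assumptions, not taken from the cited algorithms.

```python
import numpy as np

def hypothesis_margins(X, y):
    """Hypothesis margin of each instance, Relief-style:
    distance to its nearest miss (closest instance of a different class)
    minus distance to its nearest hit (closest instance of the same class)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n = X.shape[0]
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)          # never pick the instance itself
    margins = np.empty(n)
    for i in range(n):
        same = (y == y[i])
        same[i] = False
        margins[i] = dists[i, ~same].min() - dists[i, same].min()
    return margins
```

A positive margin means the instance lies closer to its own class than to the opposite class; Relief-style algorithms accumulate such margins feature-wise to weight features.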
PRELIMINARIES: PUBLICATIONS
- Yue Han and Lei Yu. An Empirical Study on Stability of Feature Selection Algorithms. Technical Report, Data Mining Research Laboratory, Binghamton University, 2009.
- Yue Han and Lei Yu. Margin Based Sample Weighting for Stable Feature Selection. In Proceedings of the 11th International Conference on Web-Age Information Management (WAIM 2010), pages 680-691, Jiuzhaigou, China, July 15-17, 2010.
- Yue Han and Lei Yu. A Variance Reduction Framework for Stable Feature Selection. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM 2010), Sydney, Australia, December 14-17, 2010, to appear.
- Lei Yu, Yue Han, and Michael E. Berens. Stable Gene Selection from Microarray Data via Sample Weighting. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 2010, major revision under review.

THEORETICAL FRAMEWORK: BIAS-VARIANCE DECOMPOSITION OF FEATURE SELECTION ERROR
Notation: training data D drawn from the data space; feature selection result r(D); true feature selection result r*.
For a squared-error style loss L, the standard decomposition applies to feature selection results:
- Expected loss (error): Err = E_D[ L(r(D), r*) ]
- Bias: L(E_D[r(D)], r*)
- Variance: E_D[ L(r(D), E_D[r(D)]) ]
- Decomposition: Err = Bias + Variance
This decomposition reveals the relationship between accuracy (the opposite of loss) and stability (the opposite of variance), and suggests seeking a better trade-off between the bias and the variance of feature selection.

THEORETICAL FRAMEWORK: VARIANCE REDUCTION VIA IMPORTANCE SAMPLING
Feature selection (weighting) can be viewed as a Monte Carlo estimator: the relevance score of a feature is an expectation over the data distribution, estimated by averaging over the n training instances. The variance of this Monte Carlo estimator depends on the feature selection algorithm and the sample size. Increasing the sample size is impractical and costly; importance sampling offers an alternative route to variance reduction through a good importance sampling function h(x).
- Intuition behind h(x): draw more instances from important regions and fewer instances from other regions.
- Intuition behind instance weighting: increase the weights of instances from important regions and decrease the weights of instances from other regions.
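The variance reduction argument can be illustrated with a generic importance sampling example; the target quantity, proposal distribution, and densities below are illustrative assumptions, not the estimator from this work. Drawing from a proposal h concentrated on the important region and reweighting each draw by p(x)/h(x) keeps the estimate unbiased while lowering its variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Toy target: estimate E_p[f(X)] = P(X > 2) for X ~ p = N(0, 1).
f = lambda x: (x > 2).astype(float)

# Plain Monte Carlo: sample directly from p.
x_p = rng.normal(0.0, 1.0, n)
plain_est = f(x_p).mean()

# Importance sampling: sample from a proposal h = N(2.5, 1) focused on the
# important region, then correct each sample by the weight w(x) = p(x) / h(x).
mu_h = 2.5
x_h = rng.normal(mu_h, 1.0, n)
w = np.exp(-0.5 * x_h**2) / np.exp(-0.5 * (x_h - mu_h)**2)   # ratio of normal densities
is_est = (w * f(x_h)).mean()

print(plain_est, is_est)   # both approximate P(X > 2) ~ 0.0228; the IS estimate varies less
```

Instance weighting plays the analogous role when the instances are fixed: instead of drawing new samples from h, the existing instances are reweighted so that important regions contribute more to the estimate.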
EMPIRICAL FRAMEWORK: OVERALL FRAMEWORK
Challenges:
- how to produce instance weights from the point of view of feature selection stability;
- how to present weighted instances to conventional feature selection algorithms.
The proposed answer is margin based instance weighting for stable feature selection.

EMPIRICAL FRAMEWORK: MARGIN VECTOR FEATURE SPACE
For each instance in the original space, find its nearest hit (closest same-class instance) and nearest miss (closest other-class instance) and map it into the margin vector feature space. The hypothesis margin captures the local profile of feature relevance across all features at an instance:
- instances exhibit different profiles of feature relevance;
- instances influence feature selection results differently.

EMPIRICAL FRAMEWORK: AN ILLUSTRATIVE EXAMPLE
[Figure: hypothesis-margin based feature space transformation, showing (a) the original feature space and (b) the margin vector feature space.]

EMPIRICAL FRAMEWORK: MARGIN BASED INSTANCE WEIGHTING ALGORITHM
Since each instance exhibits a different profile of feature relevance, instances influence feature selection results differently. Recalling the importance sampling view (more instances from important regions, fewer from other regions), the algorithm computes an outlying degree for each instance in the margin vector feature space and weights instances accordingly (see the sketch after the complexity note below):
- higher outlying degree -> lower weight;
- lower outlying degree -> higher weight.

EMPIRICAL FRAMEWORK: ALGORITHM ILLUSTRATION
Time complexity analysis: the cost is dominated by the instance weighting step (nearest-neighbor computations over all instances and features), which remains efficient for high-dimensional data with small sample size (n << d).
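The following is a minimal sketch of margin based instance weighting under stated assumptions: the margin vector is taken feature-wise as |x - nearest miss| - |x - nearest hit|, the outlying degree is the distance of an instance's margin vector from the average margin vector, and the weights decay exponentially with that degree. The exact definitions in the WAIM 2010 and ICDM 2010 papers may differ; treat this as an illustration of the idea, not the reference implementation.

```python
import numpy as np

def margin_vectors(X, y):
    """Feature-wise hypothesis margin of each instance:
    |x - nearest_miss(x)| - |x - nearest_hit(x)|, a local profile of feature relevance."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n = X.shape[0]
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)          # never pick the instance itself
    M = np.empty_like(X)
    for i in range(n):
        same = (y == y[i])
        same[i] = False
        hit = X[np.where(same)[0][dists[i, same].argmin()]]
        miss = X[np.where(~same)[0][dists[i, ~same].argmin()]]
        M[i] = np.abs(X[i] - miss) - np.abs(X[i] - hit)
    return M

def instance_weights(X, y, scale=1.0):
    """Outlying degree = distance of an instance's margin vector from the mean
    margin vector; weights shrink as the outlying degree grows (assumed form)."""
    M = margin_vectors(X, y)
    outlying = np.linalg.norm(M - M.mean(axis=0), axis=1)
    w = np.exp(-scale * outlying)
    return w * len(w) / w.sum()              # normalize so the weights average to 1
```

The resulting weights can then be handed to any selector that accepts per-sample weights, for example via the `sample_weight` argument accepted by scikit-learn's linear SVMs inside a hand-rolled recursive elimination loop, which is one way a weighted SVM-RFE variant could be assembled.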
EMPIRICAL STUDY: SUBSET STABILITY MEASURES
Stability is assessed by comparing the feature selection results obtained from different training samples (are they consistent or not?).
- Feature subsets: average pair-wise similarity using the Jaccard index, nPOGR, or SIMv; Kuncheva index.
- Feature rankings: Spearman rank correlation coefficient.
- Feature weightings: Pearson correlation coefficient.

EMPIRICAL STUDY: EXPERIMENTS ON SYNTHETIC DATA
Synthetic data generation: feature values are drawn from two multivariate normal distributions; the covariance matrix is a 10x10 matrix with 1 on the diagonal and 0.8 off the diagonal, giving 100 groups of 10 features each; the class label is a weighted sum of all feature values under an optimal feature weight vector. Training data: 100 instances, 50 from each distribution; leave-one-out test data: 5,000 instances.
Method in comparison: SVM-RFE, recursively eliminating 10% of the features remaining from the previous iteration until 10 features remain, with and without instance weighting (IW SVM-RFE).
Measures: variance, bias, and error; subset stability (Kuncheva index); accuracy (SVM).

EMPIRICAL STUDY: EXPERIMENTS ON SYNTHETIC DATA (OBSERVATIONS)
- The error equals the sum of bias and variance for both versions of SVM-RFE.
- The error is dominated by bias during early iterations and by variance during later iterations.
- IW SVM-RFE exhibits significantly lower bias, variance, and error than SVM-RFE as the number of remaining features approaches 50.

EMPIRICAL STUDY: EXPERIMENTS ON SYNTHETIC DATA (CONCLUSION)
Variance reduction via margin based instance weighting yields a better bias-variance trade-off, increased subset stability, and improved classification accuracy.

EMPIRICAL STUDY: EXPERIMENTS ON REAL-WORLD DATA
Data: microarray data. Experiment setup: 10-fold cross-validation. Methods in comparison: SVM-RFE; 20-ensemble SVM-RFE (feature subsets selected from 20 bootstrapped training sets and aggregated); instance weighting SVM-RFE. Measures: variance, subset stability, and accuracies (KNN, SVM).

EMPIRICAL STUDY: EXPERIMENTS ON REAL-WORLD DATA (OBSERVATIONS)
- The methods are non-discriminative during early iterations.
- The variance of SVM-RFE increases sharply as the number of features approaches 10, while IW SVM-RFE shows a significantly slower rate of increase. (Note: 40 iterations, starting from about 1,000 features until 10 features remain.)
- Both the ensemble and the instance weighting approaches improve stability consistently, but the improvement from the ensemble approach is not as significant as that from instance weighting.
- As the number of features increases, the stability score decreases because of the larger correction factor.

EMPIRICAL STUDY: EXPERIMENTS ON REAL-WORLD DATA (CONCLUSIONS)
- Instance weighting improves the stability of feature selection without sacrificing prediction accuracy.
- It performs much better than the ensemble approach and is more efficient.
- It leads to significantly increased stability at only a slight extra cost in time.

PLANNED TASKS: OVERALL FRAMEWORK
[Diagram: the theoretical framework of feature selection stability and the empirical instance weighting framework, to be applied to various real-world data sets (text data, gene data), combined with state-of-the-art weighting schemes (e.g., HHSVM) and representative feature selection algorithms (F-statistics, Relief-F, SVM-RFE), an iterative approach to margin-based instance weighting, and a study of the relationship between feature selection stability and classification accuracy.]

PLANNED TASKS: LISTED TASKS
A. Extensive study of the instance weighting framework
  A1. Extension to various feature selection algorithms
  A2. Study on datasets from different domains
B. Development of algorithms under the instance weighting framework
  B1. Development of instance weighting schemes
  B2. Iterative approach for margin based instance weighting
C. Investigation of the relationship between stable feature selection and classification accuracy
  C1. How the bias-variance properties of feature selection affect classification accuracy
  C2. Study of various factors affecting the stability of feature selection
Timeline: tasks A1, A2, B1, B2, C1, and C2 are scheduled across Oct-Dec 2010, Jan-Mar 2011, Apr-Jun 2011, and Jul-Aug 2011.

Thank you and questions?