Feature Selection:
Algorithms and Challenges
Joint Work with Yanglan Gang, Hao Wang & Xuegang Hu
Xindong Wu
University of Vermont, USA;
Hefei University of Technology, China
Changjiang Scholar Chair Professor in Computer Applications, Hefei University of Technology

Induction: My Research Background

[Slide diagram: a research timeline from 1988 to 2004, spanning deduction and induction, starting from expert systems.]

Outline
1. Why feature selection
2. What is feature selection
3. Components of feature selection
4. Some of my own research efforts
5. Challenges in feature selection

1. Why Feature Selection?

High-dimensional data often contain irrelevant or redundant features, which can:
- reduce the accuracy of data mining algorithms
- slow down the mining process
- be a problem in storage and retrieval
- make the results hard to interpret

2. What Is Feature Selection?

Select the most “relevant” subset of attributes according to some selection criteria.

Outline
1. Why feature selection
2. What is feature selection
3. Components of feature selection
4. Some of my own research efforts
5. Challenges in feature selection

Traditional Taxonomy

- Wrapper approach
  - Features are selected as part of the mining algorithm
- Filter approach
  - Features are selected before a mining algorithm is run, using heuristics based on general characteristics of the data rather than a learning algorithm to evaluate the merit of feature subsets

The wrapper approach is generally more accurate but also more computationally expensive.

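A minimal sketch of the two approaches, assuming scikit-learn; the dataset, the mutual-information filter, and the k-NN wrapper learner are illustrative choices, not from the talk. The filter ranks features by a general data characteristic; the wrapper scores a subset by the accuracy of the target learner itself.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Filter: rank features by mutual information with the class,
# independently of any learning algorithm.
mi = mutual_info_classif(X, y, random_state=0)
filter_top10 = np.argsort(mi)[::-1][:10]

# Wrapper: the merit of a candidate subset is the cross-validated
# accuracy of the target learner trained on exactly that subset.
def wrapper_merit(features):
    return cross_val_score(KNeighborsClassifier(), X[:, features], y, cv=5).mean()

print("filter top-10 features:", sorted(filter_top10.tolist()))
print("wrapper merit of that subset:", wrapper_merit(filter_top10))
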
Components of Feature Selection

Feature selection is actually a search problem, including four basic components:
1. an initial subset
2. one or more selection criteria (*)
3. a search strategy (*)
4. some given stopping conditions

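As a rough illustration of how these four components fit together, here is a generic greedy search skeleton; the parameter names and the greedy strategy are illustrative, not from the talk.

def feature_selection(all_features, initial_subset, criterion, successors, stop):
    """Generic feature-subset search built from the four components: an
    initial subset, a selection criterion, a search strategy (encoded by
    `successors`), and a stopping condition (`stop`)."""
    best = frozenset(initial_subset)
    best_score = criterion(best)
    current = best
    while not stop(best, best_score):
        candidates = list(successors(current, all_features))
        if not candidates:
            break
        # Move to the best neighbouring subset under the criterion.
        current = max(candidates, key=criterion)
        score = criterion(current)
        if score <= best_score:
            break  # no improvement: give up on this greedy path
        best, best_score = current, score
    return best, best_score
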
Feature Selection Criteria

Selection criteria generally use “relevance” to estimate the goodness of a selected feature subset in one way or another:
- Distance measure
- Information measure
- Inconsistency measure
- Relevance estimation
- Selection criteria related to learning algorithms (wrapper approach)

A unified framework for relevance has been proposed recently.

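To make one of these measures concrete, here is a small self-contained information measure: the information gain of a discrete feature with respect to a discrete class. This is the standard textbook formula, not anything specific to this talk.

import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(feature, labels):
    # Gain = H(class) - H(class | feature) for discrete-valued columns.
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    conditional = sum(w * entropy(labels[feature == v])
                      for v, w in zip(values, weights))
    return entropy(labels) - conditional

# Toy data: the feature perfectly predicts the class, so the gain is 1 bit.
x = np.array([0, 0, 1, 1])
y = np.array(['a', 'a', 'b', 'b'])
print(information_gain(x, y))  # -> 1.0
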
Search Strategy

- Exhaustive search
  - Every possible subset is evaluated and the best one is chosen
  - Guarantees the optimal solution
  - Low efficiency (2^N subsets for N features)
  - A modified approach: branch and bound (B&B)

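A minimal sketch of exhaustive search (without the branch-and-bound refinement), assuming any scoring function `criterion`: every non-empty subset is evaluated, which is exactly why it does not scale.

from itertools import combinations

def exhaustive_search(features, criterion):
    # Evaluate every non-empty subset and keep the best: optimal, but O(2^N).
    best_subset, best_score = None, float("-inf")
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            score = criterion(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score
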
Search Strategy (2)

- Heuristic search
  - Sequential search, including SFS, SFFS, SBS, and SBFS
  - SFS (sequential forward selection): start with the empty attribute set
    - Add the “best” attribute
    - Add the “best” of the remaining attributes
    - Repeat until the maximum performance is reached
  - SBS (sequential backward selection): start with the entire attribute set
    - Remove the “worst” attribute
    - Repeat until the maximum performance is reached

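A minimal sketch of SFS, assuming a `merit` function such as cross-validated accuracy on the candidate subset; SBS is the mirror image, starting from the full set and removing the “worst” feature at each step.

def sequential_forward_selection(all_features, merit):
    selected, best_score = [], float("-inf")
    remaining = list(all_features)
    while remaining:
        # Try adding each remaining feature and keep the single best addition.
        score, feature = max((merit(selected + [f]), f) for f in remaining)
        if score <= best_score:
            break  # stop once performance no longer improves
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected
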
Search Strategy (3)

- Random search proceeds in two different ways:
  - Inject randomness into classical sequential approaches (simulated annealing, beam search, genetic algorithms, and random-start hill-climbing)
  - Generate the next subset randomly
- Randomness can help the search escape local optima, and the optimality of the selected subset depends on the available resources.

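A minimal sketch of the second option (“generate the next subset randomly”), again assuming a `merit` scoring function: sample random subsets for a fixed budget, which plays the role of the available resources, and keep the best one seen.

import random

def random_subset_search(all_features, merit, budget=100, seed=0):
    rng = random.Random(seed)
    features = list(all_features)
    best_subset, best_score = None, float("-inf")
    for _ in range(budget):  # the budget is the "available resources"
        subset = rng.sample(features, rng.randint(1, len(features)))
        score = merit(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
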
Outline
1. Why feature selection
2. What is feature selection
3. Components of feature selection
4. Some of my own research efforts
5. Challenges in feature selection

RITIO: Rule Induction Two In One

- Feature selection using information gain in reverse order
- Deletes the features that are least informative
- Results are significant compared to forward selection
- Details: [Wu et al., 1999, TKDE]

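A rough sketch of the reverse-order idea only, not the actual RITIO algorithm of [Wu et al., 1999, TKDE]: repeatedly delete the least informative feature, scored here with a function like the information_gain example shown earlier, until a target number of features remains.

import numpy as np

def reverse_deletion(X, y, score, n_keep):
    # `score(column, y)` is any informativeness measure, e.g. information gain.
    kept = list(range(X.shape[1]))
    while len(kept) > n_keep:
        gains = [score(X[:, f], y) for f in kept]
        kept.pop(int(np.argmin(gains)))  # drop the least informative feature
    return kept
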
Induction as Pre-processing

- Use one induction algorithm to select attributes for another induction algorithm
  - Can be a decision-tree method for rule induction, or vice versa
- Accuracy results are not as good as expected
  - Reason: feature selection normally causes information loss
- Details: [Wu, 1999, PAKDD]

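A minimal sketch of the idea, assuming scikit-learn; the dataset and the two learners (a decision tree selecting attributes for naive Bayes) are illustrative choices, not those used in the paper. Fit a tree, keep only the attributes it actually uses, then train the second learner on that reduced attribute set and compare accuracy.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Induction algorithm #1: a decision tree; keep the attributes it splits on.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
used = np.flatnonzero(tree.feature_importances_ > 0)

# Induction algorithm #2: naive Bayes, with and without the pre-selection.
full = cross_val_score(GaussianNB(), X, y, cv=5).mean()
reduced = cross_val_score(GaussianNB(), X[:, used], y, cv=5).mean()
print(f"{len(used)} of {X.shape[1]} attributes kept; "
      f"accuracy {full:.3f} -> {reduced:.3f}")
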
Subspacing with Asymmetric Bagging

- For when the number of examples is less than the number of attributes
- For when the number of positive examples is smaller than the number of negative examples
- An example application: content-based information retrieval
- Details: [Tao et al., 2006, TPAMI]

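A rough illustration of combining the two ideas, not the method of [Tao et al., 2006, TPAMI]: each base learner is trained on all of the scarce positive examples plus a bootstrap sample of the plentiful negatives (asymmetric bagging) and sees only a random subset of attributes (subspacing); predictions are combined by majority vote. The base learner and parameters are assumptions for the sketch.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_asymmetric_bagging_subspaces(X, y, n_models=10, subspace=0.5, seed=0):
    rng = np.random.default_rng(seed)
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    n_cols = max(1, int(subspace * X.shape[1]))
    models = []
    for _ in range(n_models):
        # All positives + a bootstrap of the negatives, on a random subspace.
        rows = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=True)])
        cols = rng.choice(X.shape[1], size=n_cols, replace=False)
        clf = DecisionTreeClassifier().fit(X[np.ix_(rows, cols)], y[rows])
        models.append((cols, clf))
    return models

def predict_majority(models, X):
    votes = np.mean([clf.predict(X[:, cols]) for cols, clf in models], axis=0)
    return (votes >= 0.5).astype(int)
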
Outline
1. Why feature selection
2. What is feature selection
3. Components of feature selection
4. Some of my own research efforts
5. Challenges in feature selection

Challenges in Feature Selection (1)

Dealing with ultra-high-dimensional data and feature interactions

Traditional feature selection encounters two major problems when the dimensionality runs into tens or hundreds of thousands:
1. the curse of dimensionality
2. the relative shortage of instances

Challenges in Feature Selection (2)

Dealing with active instances (Liu et al., 2005)

When the dataset is huge, feature selection performed on the whole dataset is inefficient, so instance selection is necessary:
- Random sampling (pure random sampling without exploiting any data characteristics)
- Active feature selection (selective sampling using data characteristics achieves better or equally good results with a significantly smaller number of instances)

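As a rough sketch of the first option, feature scores can be computed on a sample of instances instead of the whole dataset; plain random sampling is shown here, whereas selective sampling would pick the sample using data characteristics. The scorer is assumed to be something like the information_gain function sketched earlier.

import numpy as np

def sampled_feature_scores(X, y, score, sample_size, seed=0):
    rng = np.random.default_rng(seed)
    rows = rng.choice(X.shape[0], size=min(sample_size, X.shape[0]), replace=False)
    Xs, ys = X[rows], y[rows]
    return [score(Xs[:, f], ys) for f in range(X.shape[1])]
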
Challenges in Feature Selection (3)

Dealing with new data types (Liu et al., 2005)

The traditional data type is an N*M data matrix. Due to the growth of computer and Internet/Web technologies, new data types are emerging:
- text-based data (e.g., e-mails, online news, newsgroups)
- semi-structured data (e.g., HTML, XML)
- data streams

Challenges in Feature Selection (4)

Unsupervised feature selection
- Feature selection vs. classification: almost every classification algorithm
- The subspace method with the curse of dimensionality in classification
- Subspace clustering

Challenges in Feature Selection (5)

Dealing with predictive-but-unpredictable attributes in noisy data
- Attribute noise is difficult to process, and removing noisy instances is dangerous
- Predictive attributes: essential to classification
- Unpredictable attributes: cannot be predicted by the class and the other attributes
- Noise identification, cleansing, and measurement need special attention [Yang et al., 2004]

Challenges in Feature Selection (6)

Dealing with inconsistent and redundant features
- Redundancy can indicate reliability
- Inconsistency can also indicate a problem that needs handling
- Researchers in rough set theory: what is the purpose of feature selection?
  - Can you really demonstrate the usefulness of reduction, in data mining accuracy or otherwise?
  - Removing attributes can well result in information loss
  - When the data is very noisy, removals can cause a very different data distribution
  - Discretization can possibly bring new issues

Concluding Remarks

- Feature selection is and will remain an important issue in data mining, machine learning, and related disciplines
- Feature selection trades accuracy for efficiency
- Researchers need to keep the bigger picture in mind, not just do selection for the sake of feature selection