Methods of attribute relevance analysis
The general idea behind attribute relevance analysis is to compute a measure that
quantifies the relevance of an attribute with respect to a given class. Such measures
include information gain, the Gini index, uncertainty, and correlation coefficients.
Let S be a set of training objects (or tuples) where the class label of each tuple is known.
Suppose that there are m classes. Let S contain si objects of class Ci, for i = 1,…,m. An
arbitrary object belongs to class Ci with probability si/s, where s is the total number of
objects in set S. The expected information needed to classify a given tuple is
$$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} \frac{s_i}{s} \log_2 \frac{s_i}{s}$$
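Suppose an attribute A with values {a1, a2, ..., av} is used to partition S into the subsets {S1,
S2, ..., Sv}, where Sj contains those objects in S that have value aj of A. Let Sj contain
sij objects of class Ci. The expected information based on this partitioning by A is known
as the entropy of A. It is the weighted average:
$$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s} \, I(s_{1j}, \ldots, s_{mj})$$
Here the weight of subset Sj is the fraction of the objects of S that fall in Sj, and
I(s1j, ..., smj) is the expected information needed to classify a tuple within Sj:
$$I(s_{1j}, \ldots, s_{mj}) = -\sum_{i=1}^{m} p_{ij} \log_2 p_{ij}, \qquad p_{ij} = \frac{s_{ij}}{|S_j|}$$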
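The information gained by branching on A is defined by:
$$\mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_m) - E(A)$$
That is, Gain(A) is the expected information needed to classify a tuple in S minus the
entropy of A, i.e., the expected reduction in information requirement obtained by knowing
the value of A.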
The attribute which maximizes Gain(A) is selected.
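The three quantities above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (expected_info, entropy_of_attribute, gain) and the toy attributes "major" and "gender" are assumptions made for the example and do not come from the text.

```python
from collections import Counter
from math import log2

def expected_info(labels):
    """I(s1, ..., sm): expected information needed to classify a tuple."""
    s = len(labels)
    return -sum((si / s) * log2(si / s) for si in Counter(labels).values())

def entropy_of_attribute(values, labels):
    """E(A): weighted average of the expected information of each subset Sj."""
    s = len(labels)
    partitions = {}  # attribute value aj -> class labels of the objects in Sj
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    return sum(len(p) / s * expected_info(p) for p in partitions.values())

def gain(values, labels):
    """Gain(A) = I(s1, ..., sm) - E(A)."""
    return expected_info(labels) - entropy_of_attribute(values, labels)

# Hypothetical toy data: four tuples, two classes.
labels = ["target", "target", "contrast", "contrast"]
major = ["science", "science", "arts", "arts"]  # separates the classes perfectly
gender = ["M", "F", "M", "F"]                   # carries no class information

print("Gain(major)  =", gain(major, labels))
print("Gain(gender) =", gain(gender, labels))
```

In this toy data the two classes are equally likely, so one bit is needed to classify a tuple. Partitioning on major leaves every subset pure, so its gain is the full bit; partitioning on gender leaves every subset as mixed as S itself, so its gain is zero and major would be the attribute selected.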
Attribute relevance analysis for class description is performed as follows.
1. Data Collection: Collect data for both the target class and the contrasting class by
query processing. Notice that for class comparison, both the target class and the
contrasting class are provided by the user in the data mining query. For class
characterization, the target class is the class to be characterized, whereas the contrasting
class is the set of comparable data which are not in the target class.
2. Preliminary relevance analysis using conservative AOI: Attribute-oriented induction
(AOI) can be used to perform some preliminary relevance analysis on the data by
removing or generalizing attributes having a large number of distinct values (such as
name and phone#). Such attributes are unlikely to be meaningful for concept description.
To be conservative, the AOI should employ attribute generalization thresholds that are
set reasonably large (so as to allow more attributes to be considered in the further relevance
analysis by the selected measure, performed in step 3). The relation obtained by such an
attribute removal and attribute generalization process is called the candidate relation of
the mining task.
3. Remove irrelevant or weakly relevant attributes using the selected measure: The
selected relevance measure is used to evaluate (or rank) each attribute in the candidate
relation. For example, the information gain measure described above may be used. The
attributes are then sorted (i.e., ranked) according to their computed relevance measure
value. Attributes that are irrelevant or only weakly relevant are then removed based on a set
threshold. The resulting relation is called the “Initial Target/Contrast class Working
Relation”.
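Steps 2 and 3 can be sketched as a small filtering pipeline. This is a simplified illustration under stated assumptions: step 2 here only removes attributes whose number of distinct values exceeds a threshold (real AOI would also generalize values using concept hierarchies), information gain is the selected measure, and all names (candidate_attributes, relevant_attributes, max_distinct, min_gain) and the sample rows are hypothetical.

```python
from collections import Counter
from math import log2

def info(labels):
    """Expected information needed to classify a tuple, from class counts."""
    s = len(labels)
    return -sum((n / s) * log2(n / s) for n in Counter(labels).values())

def gain(rows, attr, class_attr):
    """Information gain of attr with respect to class_attr."""
    parts = {}  # attribute value -> class labels of the rows having that value
    for r in rows:
        parts.setdefault(r[attr], []).append(r[class_attr])
    e = sum(len(p) / len(rows) * info(p) for p in parts.values())
    return info([r[class_attr] for r in rows]) - e

def candidate_attributes(rows, attrs, max_distinct):
    """Step 2 (removal only): keep attributes with at most max_distinct values."""
    return [a for a in attrs if len({r[a] for r in rows}) <= max_distinct]

def relevant_attributes(rows, attrs, class_attr, min_gain):
    """Step 3: rank attributes by gain and drop those below min_gain."""
    ranked = sorted(((gain(rows, a, class_attr), a) for a in attrs), reverse=True)
    return [(a, g) for g, a in ranked if g >= min_gain]

# Step 1 is assumed done: hypothetical rows for a target and a contrasting class.
rows = [
    {"name": "Ann",  "major": "science", "gender": "F", "cls": "target"},
    {"name": "Bob",  "major": "science", "gender": "M", "cls": "target"},
    {"name": "Cara", "major": "arts",    "gender": "F", "cls": "contrast"},
    {"name": "Dan",  "major": "arts",    "gender": "M", "cls": "contrast"},
]

candidates = candidate_attributes(rows, ["name", "major", "gender"], max_distinct=3)
kept = relevant_attributes(rows, candidates, "cls", min_gain=0.1)
print("candidate relation attributes:", candidates)
print("attributes kept after ranking:", kept)
```

Here name is dropped in step 2 because every tuple has a different value, and gender is dropped in step 3 because its gain falls below the threshold, leaving major as the only attribute in the working relation. Both thresholds are arbitrary choices for the toy data; in practice they would be set by the analyst.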