
Methods of attribute relevance analysis

The general idea behind attribute relevance analysis is to compute a measure that quantifies the relevance of an attribute with respect to a given class. Such measures include information gain, the Gini index, uncertainty, and correlation coefficients.

Let S be a set of training objects (or tuples) where the class label of each tuple is known. Suppose that there are m classes, and let S contain s_i objects of class C_i, for i = 1, …, m. An arbitrary object belongs to class C_i with probability s_i/s, where s is the total number of objects in S. The expected information needed to classify a given tuple is

$$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} \frac{s_i}{s} \log_2 \frac{s_i}{s}$$

An attribute A with values {a_1, a_2, …, a_v} can be used to partition S into the subsets {S_1, S_2, …, S_v}, where S_j contains those objects in S that have value a_j of A. Let S_j contain s_ij objects of class C_i. The expected information based on this partitioning by A is known as the entropy of A. It is the weighted average

$$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s} \, I(s_{1j}, \ldots, s_{mj})$$

The information gained by branching on A is defined by

$$\mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_m) - E(A)$$

The attribute that maximizes Gain(A) is selected.

Attribute relevance analysis for class description is performed as follows.

1. Data collection: Collect data for both the target class and the contrasting class by query processing. For class comparison, the user provides both the target class and the contrasting class in the data mining query. For class characterization, the target class is the class to be characterized, whereas the contrasting class is the set of comparable data that are not in the target class.

2. Preliminary relevance analysis using conservative AOI: Attribute-oriented induction (AOI) can be used to perform some preliminary relevance analysis on the data by removing or generalizing attributes that have a large number of distinct values (such as name and phone#). Such attributes are unlikely to be meaningful for concept description. To be conservative, the AOI should employ attribute generalization thresholds that are set reasonably large, so that more attributes are retained for the further relevance analysis performed by the selected measure in step 3. The relation obtained by this attribute removal and attribute generalization process is called the candidate relation of the mining task.

3. Remove irrelevant or weakly relevant attributes using the selected measure: The selected relevance measure is used to evaluate (or rank) each attribute in the candidate relation. For example, the information gain measure described above may be used. The attributes are then sorted (i.e., ranked) according to their computed relevance value. Attributes that are irrelevant or only weakly relevant are then removed, based on a set threshold. The resulting relation is called the "Initial Target/Contrast Class Working Relation".
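The ranking in step 3 can be illustrated with a small Python sketch that uses information gain as the selected measure. This is only an illustration under stated assumptions, not a reference implementation: the function names (`expected_info`, `attribute_entropy`, `gain`, `rank_attributes`), the toy rows, the attribute names, and the threshold value are all hypothetical and do not come from the text above.

```python
from collections import Counter
from math import log2


def expected_info(labels):
    """I(s_1, ..., s_m): expected information (in bits) to classify a tuple."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def attribute_entropy(rows, attr, class_attr):
    """E(A): weighted average of I over the subsets S_j induced by attr."""
    subsets = {}
    for row in rows:
        # Group the class labels by the value a_j that the row has for attr.
        subsets.setdefault(row[attr], []).append(row[class_attr])
    total = len(rows)
    return sum(len(s) / total * expected_info(s) for s in subsets.values())


def gain(rows, attr, class_attr):
    """Gain(A) = I(s_1, ..., s_m) - E(A)."""
    labels = [row[class_attr] for row in rows]
    return expected_info(labels) - attribute_entropy(rows, attr, class_attr)


def rank_attributes(rows, attrs, class_attr, threshold):
    """Sort attributes by gain (descending) and drop those below threshold."""
    scored = [(a, gain(rows, a, class_attr)) for a in attrs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(a, g) for a, g in scored if g >= threshold]


# Hypothetical candidate relation: two target and two contrasting tuples.
rows = [
    {"major": "Science", "gender": "M", "cls": "target"},
    {"major": "Science", "gender": "F", "cls": "target"},
    {"major": "Arts", "gender": "M", "cls": "contrast"},
    {"major": "Arts", "gender": "F", "cls": "contrast"},
]

# The threshold of 0.1 is an arbitrary illustrative choice.
ranked = rank_attributes(rows, ["gender", "major"], "cls", threshold=0.1)
print(ranked)
```

In this toy relation, `major` separates the two classes perfectly while `gender` carries no class information, so only `major` survives the threshold. In practice the threshold is a tuning choice, and a different measure (such as the Gini index) could be swapped in for `gain` without changing the ranking step.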
