Subjective Measures and their Role in Data Mining Process
Ahmed Sultan Al-Hegami
Department of Computer Science
University of Delhi
Delhi-110007
INDIA
[email protected]
Abstract
Knowledge Discovery in Databases (KDD) is the process of extracting previously unknown, hidden and interesting patterns from a
huge amount of data stored in databases. Data mining is the stage of the KDD process in which a particular data mining algorithm
is applied to extract interesting knowledge. One of the most important aspects of any data mining task is the evaluation of the
discovered knowledge, and a major issue facing the data mining community is how to use existing knowledge about the domain to
evaluate the discovered patterns. For the patterns to be interesting, the user has to be involved by providing his/her prior
knowledge about the domain. While objective measures can be quantified using statistical methods, subjective measures are
determined by the user's understanding of the domain. The use of objective measures of interestingness in popular data mining
algorithms often leads to another data mining problem, although of reduced complexity. Reducing the volume of the discovered
patterns is desirable in order to improve the efficiency of the overall KDD process, and subjective measures of interestingness are
required to achieve this. In this paper we study the subjective interestingness of discovered patterns and show its role in
extracting novel and interesting knowledge.
Keywords
Knowledge discovery in databases, data mining, subjective measures, objective measures, domain
knowledge, classification, machine learning, decision tree.
1 Introduction
It is no exaggeration to say that the volume of information doubles every year due to the mechanical production of texts [2]. These
potentially large datasets are rich in information, but it is difficult to find the meaningful facts we seek unless there are methods for
developing models to exploit this wealth.
Researchers in different areas of Artificial Intelligence, Expert Systems, Statistics, Machine Learning, Databases, etc., are
striving to find new mechanisms, methods and techniques to transform this ocean of data into useful, effective, meaningful, and
interesting information that plays an effective role in decision support systems [26].
Knowledge Discovery in Databases (KDD) is a new area of research that attempts to address the complexity mentioned above. It is
the process of extracting previously unknown, hidden, novel and interesting knowledge from massive volumes of data stored in
databases [18,44,16,7,11]. It is an iterative process carried out in three stages. The KDD process begins with an understanding of the
problem and ends with the analysis and evaluation of the results. It includes preprocessing of the data (the data preparation stage),
extraction of information (the data mining stage), and analysis of the discovered knowledge (the analysis stage) [18,16]. Actual
extraction of patterns is preceded by preliminary analysis of data, followed by selection of relevant horizontal or vertical subset and
appropriate data transformations. This is the preprocessing stage of KDD and it is considered to be the most time-consuming stage
[45]. Often, the preparation of the data is influenced by the extraction algorithms used during the mining (second) stage. Data
mining algorithms are applied during the second stage of the KDD process, which is considered to be the core stage. It involves
selection and application of appropriate mining algorithm to search for patterns in the data. Sometimes combination of mining
algorithms may be required to extract interesting patterns from the pre-processed data [15,54]. The outcome of this stage is the
discovery of models/patterns hidden in databases, which are interpreted and analyzed during the third stage. The final stage of
KDD process is the analysis and evaluation of the knowledge discovered in the second stage. Obtaining patterns/models is not
the end of the KDD process. Evaluation and analysis are equally important (if not more so), particularly in view of the proliferation
of KDD techniques being used to solve real-life applications.
It is common knowledge that the volume of patterns discovered by data mining algorithms becomes huge due to the large size
of the target database [33,41,48,30,36]. Identifying interesting patterns from this vast set of discovered patterns still remains
fundamentally a mining problem, though of reduced complexity. The time required to generate the rules, and the space required to
store, maintain and understand them, are among the practical issues that need attention.
2 Integrating Subjective Measures with Data Mining
The problem of reducing the volume of the discovered knowledge has been attacked at all three stages of the KDD process. Psaila
proposed the analysis of data to identify meta-patterns during the pre-processing stage [43]. During the data mining stage, researchers
commonly use either constraints or appropriate measures of interestingness to reduce the number of discovered rules. The third
stage of the KDD process, which aims at analyzing and interpreting the discovered knowledge, is carried out by the end user. Post-analysis
of the discovered patterns, as proposed in [33,32,30], helps the user focus on a small subset of the discovered patterns.
On account of the wide variation in users' needs and their subjectivity, end users design and develop need-based filters in an ad hoc
manner. We briefly discuss the three approaches in the following subsections.
2.1 Interestingness Measures
The use of interestingness measures is one of the primary techniques for reducing the number of rules presented to the user.
Interestingness measures guide the KDD process in both the mining stage and the analysis stage, in order to restrict attention to rules
that are of interest to the user [12]. Two types of rule interestingness measures have been studied in the data mining literature, namely,
objective and subjective measures. Objective measures are based on the structure and statistical significance of the patterns
[33,34,41]. Subjective measures are based on the subjectivity of the user, who evaluates the patterns on the basis of novelty,
actionability, unexpectedness, etc. [32,48,49,58,59].
Novelty, unexpectedness and actionability are subjective measures of immense importance to the end user of
the KDD endeavor [12,36,59]. The novelty of a rule1 is the extent to which the rule adds to the prior knowledge of the user [6,59].
Unexpectedness [33,36] is the extent to which the rule is surprising to the user. Actionability indicates the benefit that the rule can
bring to the user; it is implicitly captured by novelty and unexpectedness. It is important to distinguish between the novelty and
unexpectedness measures: while the former implies discovering knowledge that is, to some extent, entirely new, the latter implies
discovering knowledge that increases or decreases the user's expectation about the domain.
2.2 Constrained Mining
Constraint-based mining allows the users to specify the rules to be discovered according to their background knowledge, thereby
making the data mining process more effective. Han and Kamber elaborate on various types of constraints, viz. knowledge type
constraints, data constraints and dimension/level constraints [22]. Though not all of the specified constraints can be pushed into the
data mining algorithm [40], constraints have recently been used successfully to restrict the search space [8,9,10].
2.3 Post-processing Filters
After the rules have been discovered by the mining algorithm, further focusing is possible through the use of post-analysis filters.
Because of their inherently subjective needs, end users design and develop their own post-processing filters in an ad hoc manner.
Consequently, such filters are highly specific and have not been an active area of research. To the best of our knowledge, the only
generalized work on post-analysis is reported in [31,32].
3 Interestingness Measures and Data Mining
3.1 Overview
One of the major issues facing the data mining community is how to use existing knowledge about the domain to discover novel
and interesting rules. For this reason, researchers lay emphasis on interestingness measures as one of the most important
ways of reducing the number of discovered rules. Such measures help to confine the number of uninteresting patterns
discovered. This issue is crucial and the most complicated one. Interestingness measures can guide the analysis stage in
the search for rules that are of interest to the user [12].
There are two aspects of rule interestingness that have been studied in the data mining literature: objective and subjective measures
[46,22,33,32,48,49,30,36,23,25,47,31]. Objective measures are data-driven and domain-independent. Generally, these measures
evaluate the rules based on their quality and the similarity between them, rather than considering the user's beliefs about the
domain. Subjective measures, by contrast, are user-driven and domain-dependent. For example, the user may be asked to specify a
rule template indicating which attribute(s) must occur in a rule for it to be interesting from his/her point of view [27]. As another
example, the user may be asked to give a general, high-level description of his/her expectations about the domain, and the system
then searches only for rules that are unexpected with respect to these expectations [33,32,30,31].
1 Rules are one of the most commonly used forms of representing the discovered models/patterns [13,46].
3.2 Objective Measures
Objective measures play a critical role in the different stages of the KDD process. In the data mining stage, quantitative measures can
be used to reduce the search space. For instance, the support measure is used to reduce the number of itemsets to be examined [3,4,5].
In the evaluation stage of KDD, objective measures are used to select interesting rules from the set of discovered rules
[48,49,20,21,24,42,50,51,53,56,57]. For instance, the confidence measure of association rules is used to select only strong rules from a
set of discovered association rules [4]. Furthermore, objective measures are used in consolidating and acting on discovered
rules; in this phase, they quantify the effectiveness and usefulness of the discovered rules. For instance, cost,
classification error, and classification accuracy are used in this role [19].
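For illustration, the support and confidence measures mentioned above can be computed directly from transaction data; the following sketch uses hypothetical market-basket items:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of antecedent -> consequent: support(A and B) / support(A)."""
    return (support(set(antecedent) | set(consequent), transactions)
            / support(antecedent, transactions))

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
sup = support({"bread"}, transactions)                 # 0.75
conf = confidence({"bread"}, {"milk"}, transactions)   # 2/3
```

A support threshold prunes itemsets during mining, while a confidence threshold selects strong rules afterwards, matching the two uses described above.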
Objective measures are based on the structure and statistics of the patterns [46,33,41,49,14]. Many measures, such as confidence,
support and classification error, are defined on the basis of the statistical characteristics of rules. Statistical methods are generally
easy to apply; they are applied to the data or rules in order to determine the nature of the relationship between variables (attributes),
and they allow the user to assess the interestingness of the rules to be discovered. It should be noted
that objective and subjective measures are complementary. While objective measures can be used as a first filter to
select potentially interesting rules, subjective measures are used as a final filter to select the desired interesting rules. Objective
measures will not be studied further, as they are beyond the scope of this paper.
4 Subjective Measures
Subjective measures are based on the subjectivity of the user who examines the patterns, for example their actionability and
unexpectedness [33,32,48,30,36,31]. This paper studies subjective interestingness. Two main subjective measures have been
studied in the data mining literature, namely unexpectedness and actionability. A rule is unexpected if it contradicts the user's
beliefs about the domain and therefore surprises the user [33,32,37]. A rule is actionable if the user can take some action to his/her
advantage based on it [33,32,1].
Another important subjective measure, which has received less attention in the data mining community, is the novelty of the
discovered rules [18,17,49]. A rule is novel if, to some extent, it contributes new knowledge.
4.1 Unexpectedness
The unexpectedness of discovered rules has been studied exhaustively in the literature [41,48,49,30,36,37,38,52,27,23,25,47].
In particular, [41], [48,49], [30], [36], [37,38], [52] and [39] present different approaches to this measure. [41] studied the
interestingness of discovered rules in the context of a health care application. Their KEFIR system looks for deviations in the data
and examines how a relevant action may affect a deviation. The system analyzes health care information to uncover “key findings”.
Interesting rules are reported after their degree of interestingness has been measured, where the degree of interestingness is
estimated by the amount of benefit obtained when an action is taken. The analyst provides recommendations based on his/her prior
knowledge, and the system ranks all the rules by the interestingness of the deviation. This system is considered a good method for
incorporating domain knowledge into an application system. However, it is domain-dependent and cannot be used for other applications.
[48,49] studied subjective interestingness by providing a framework to measure a rule's unexpectedness with respect to the user's
beliefs. They proposed the use of probabilistic beliefs and belief revision methods. The beliefs are used to define unexpectedness,
and a revision method is used to modify the belief confidence when new evidence arrives. A rule is considered unexpected if it
causes some change in these beliefs. In practice, it is difficult to obtain belief information, especially specific domain knowledge. The
approach presented in [30] is based on a syntactic comparison between a discovered rule and a rule in the domain knowledge. Two
rules are dissimilar if either the consequents of both rules are similar but the antecedents are “far apart”, or the consequents are “far
apart” but the antecedents are similar, where similarity and dissimilarity are defined based on the structure of the rules. The
problem with this approach is that it does not specify the degree of unexpectedness, and it does not consider the case in which both
the antecedents and the consequents of the discovered rule and the domain-knowledge rule are dissimilar.
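The syntactic comparison of [30] can be sketched as follows; the Jaccard similarity on attribute sets and the 0.3 threshold are our own illustrative assumptions, not the paper's exact definition:

```python
def jaccard(a, b):
    """Similarity between two attribute sets (1.0 = identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def syntactically_unexpected(rule, belief, threshold=0.3):
    """rule and belief are (antecedent_attrs, consequent_attrs) pairs.
    Flags the two cases described in [30]: similar consequents with
    far-apart antecedents, or far-apart consequents with similar antecedents."""
    ant_sim = jaccard(rule[0], belief[0])
    con_sim = jaccard(rule[1], belief[1])
    return ((con_sim >= 1 - threshold and ant_sim <= threshold) or
            (ant_sim >= 1 - threshold and con_sim <= threshold))

# Same consequent attribute, disjoint antecedents -> flagged as unexpected.
rule = ({"age", "income"}, {"loan"})
belief = ({"region", "marital_status"}, {"loan"})
syntactically_unexpected(rule, belief)  # True
```

Note that, exactly as criticized in the text, this check stays silent when both parts are dissimilar.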
[36,37,38] proposed a new definition of unexpectedness in terms of a logical contradiction of a rule with respect to a belief. Given a
rule A → B and a belief X → Y, where A and X are antecedents and B and Y are single atomic conditions that logically contradict
each other, the rule A → B is unexpected with respect to the belief X → Y if the rule A, X → B also holds.
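This condition can be checked against data. The following sketch (the toy loan data and the 0.7 confidence cut-off are our assumptions) tests whether the combined rule A ∧ X → B holds while B contradicts the believed consequent Y:

```python
def holds(antecedent, consequent, rows, min_conf=0.7):
    """A rule holds if its confidence on the data reaches min_conf."""
    matching = [r for r in rows
                if all(r.get(k) == v for k, v in antecedent.items())]
    if not matching:
        return False
    hits = sum(1 for r in matching
               if all(r.get(k) == v for k, v in consequent.items()))
    return hits / len(matching) >= min_conf

def unexpected(rule, belief, rows, contradicts):
    """rule = (A, B), belief = (X, Y): unexpected if B contradicts Y
    and the combined rule A and X -> B holds on the data."""
    (A, B), (X, Y) = rule, belief
    return contradicts(B, Y) and holds({**A, **X}, B, rows)

# Toy data: the belief "females get loans" fails for older customers.
rows = ([{"sex": "female", "age": "old", "loan": "no"}] * 3 +
        [{"sex": "female", "age": "young", "loan": "yes"}] * 2)
rule = ({"age": "old"}, {"loan": "no"})
belief = ({"sex": "female"}, {"loan": "yes"})
contradicts = lambda b, y: b["loan"] != y["loan"]
unexpected(rule, belief, rows, contradicts)  # True
```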
An alternative approach is presented in [52], which proposes an autonomous probabilistic estimation method that can discover all rule
pairs (i.e., an exception rule associated with a common-sense rule) with high confidence. The approach discovers pairs of rules
A → B and their corresponding exceptions A, C → B′, where A and C are conjunctions of <attribute, value> pairs and B and B′ are
<attribute, value> pairs corresponding to the same attribute but with different values. In addition, the unexpectedness of the
exception rule is constrained by requiring that the “reference rule” C → B′ has low confidence. Neither the user's evaluation
nor domain knowledge is required in this approach.
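A minimal sketch of this rule-pair test (the confidence thresholds and the medical-flavoured toy data are our assumptions):

```python
def conf(rows, antecedent, consequent):
    """Confidence of antecedent -> consequent on the data."""
    m = [r for r in rows if all(r.get(k) == v for k, v in antecedent.items())]
    if not m:
        return 0.0
    return sum(1 for r in m
               if all(r.get(k) == v for k, v in consequent.items())) / len(m)

def exception_pair(rows, A, B, C, B2, high=0.7, low=0.3):
    """True when A -> B and A, C -> B' are both strong while the
    reference rule C -> B' is weak, as required in [52]."""
    return (conf(rows, A, B) >= high and
            conf(rows, {**A, **C}, B2) >= high and
            conf(rows, C, B2) <= low)

# Toy data: the antibiotic usually cures, except against a resistant germ.
rows = ([{"drug": "antibiotic", "germ": "normal", "outcome": "cured"}] * 7 +
        [{"drug": "antibiotic", "germ": "staph", "outcome": "not_cured"}] * 3 +
        [{"drug": "none", "germ": "staph", "outcome": "cured"}] * 7)
exception_pair(rows,
               A={"drug": "antibiotic"}, B={"outcome": "cured"},
               C={"germ": "staph"}, B2={"outcome": "not_cured"})  # True
```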
Another approach to measuring subjective interestingness requires the user to specify which types of rules are interesting and
which are uninteresting; matching techniques are then used to select rules, taking the user's beliefs into consideration. [27]
proposes this kind of user belief and uses a template-based approach in which the user specifies sets of interesting and
uninteresting rules using templates. A template describes a set of rules in terms of the items occurring in the antecedent and
consequent parts. Finally, the system retrieves the matching rules from the set of discovered rules.
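A template filter of this kind might look as follows; the attribute-set representation is a simplification of the templates in [27]:

```python
# A template lists the attributes that must appear in a rule's antecedent
# and consequent; rules here are (antecedent_attrs, consequent_attrs) pairs.
def matches(rule, template):
    req_ant, req_con = template
    return req_ant <= rule[0] and req_con <= rule[1]

rules = [
    ({"age", "income"}, {"loan"}),
    ({"region"}, {"churn"}),
]
interesting = [({"age"}, {"loan"})]   # user: "rules relating age to loans"
selected = [r for r in rules if any(matches(r, t) for t in interesting)]
# selected keeps only the first rule
```

An analogous list of uninteresting templates could be used to discard rules instead.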
Other methods of identifying subjectively interesting rules are query-based [23,25,47], for example M-SQL in [25], DMQL in
[23], and Metaqueries in [47]. These methods treat the search for subjectively interesting rules as a query-answering process:
the user specifies a set of rules or constraints on the rules using a data mining query, and the system then finds the rules that
satisfy this query [29]. The drawback of query-based approaches is that they find only those expected rules that match the
query specified by the user. The really interesting rules, which are unexpected or novel, can never be found by these methods.
Furthermore, the user may not be able to determine what is interesting to him/her.
4.2 Actionability
The actionability measure is based on the benefit a rule brings to the user, that is, on whether the user can do something to his/her
advantage [12,33,32,48,49,30,31,29]. This measure is very important for a rule to be interesting, in the sense that users are always
looking for patterns that improve their performance and their work, and they can take actions in response to
actionable knowledge. It is also worth remembering that one of the primary application domains of most data mining algorithms is
business. From a business point of view, information is not desired purely for its own sake; the practical
purpose of obtaining information is to improve the business, that is, the information must support successful
decision-making. In practice, however, it is not an easy task to
determine which information is actionable.
[48,49] quantify actionability in terms of unexpectedness, which they define as a subjective measure of
interestingness. They show that most actionable knowledge is unexpected and most unexpected knowledge is
actionable. Since actionability is a subjective measure that is hard to define, they propose unexpectedness as a good
approximation for actionability, and conversely argue that actionability is a good measure of unexpectedness. Since
unexpectedness is easier to measure than actionability, unexpectedness is the measurement used to address actionability. In [48,49],
subjective interestingness is categorized into three categories:
1. Rules that are both unexpected and actionable,
2. Rules that are unexpected and not actionable, and
3. Rules that are expected and actionable.
They argue that a new metric is not needed, since categories 1 and 2 can be handled by finding rules that are unexpected, and
category 3 by finding rules that conform to the user's existing knowledge about the domain. In fact, this process does not solve the
problem of determining how actionability affects interestingness. The actionability and unexpectedness measures must be addressed
individually, and each must be represented separately in the interestingness measure. A rule that is unexpected but not actionable is
not as interesting as a rule that is both unexpected and actionable; the two rules must be presented with different degrees of
interestingness. In order to measure actionability, the user needs to be involved. The user can rank each attribute based on his/her
ability to act on that attribute [12], and it may also be possible to rank a rule based on its actionability.
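One simple way to realise such attribute-based ranking; the per-attribute scores and the mean aggregation are illustrative assumptions:

```python
# Hypothetical user-supplied actionability scores per attribute
# (0 = the user cannot act on it, 1 = fully under the user's control).
scores = {"price": 0.9, "promotion": 0.8, "age": 0.0, "region": 0.1}

def rule_actionability(antecedent_attrs, scores):
    """Rank a rule by the mean actionability of its antecedent attributes;
    attributes the user never ranked default to 0."""
    return sum(scores.get(a, 0.0) for a in antecedent_attrs) / len(antecedent_attrs)

rules = [{"age", "region"}, {"price", "promotion"}]
ranked = sorted(rules, key=lambda r: rule_actionability(r, scores), reverse=True)
# ranked[0] is {"price", "promotion"}: the user can act on price and promotion
```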
Actionability can, in addition, be measured with respect to different data mining algorithms; that is, the set of rules discovered
by one algorithm may be more actionable than the rules discovered by another. Therefore,
actionability has to be measured independently, not through measuring unexpectedness.
4.3 Novelty Measure
A key factor in determining whether a KDD process is successful is whether it provides the user with previously unknown,
useful, and interesting knowledge [17,48,49]. The term “previously unknown” has been argued to imply interestingness [48,49]; that
is, interestingness increases as the newness of the knowledge increases, and vice versa. For instance, consider the
discovered rule age>50 ∧ sex=female → loan=no. If the user does not know this rule and it has not been discovered previously,
then a novel rule has been provided to the user. This is interesting, since it increases the user's knowledge. However, if the user
already knows this rule, no novel information is provided and the rule is considered uninteresting. Since the novelty measure is
based on the user's feelings and subjectivity about the discovered rules, it is considered a subjective measure.
Novelty is a very important aspect of the KDD process and can be applied at its different stages. In the pre-processing
stage, the novelty measure can be used as a filter to select, and concentrate on, a set of instances that should be given more attention.
It can also be used to determine which features are more important to the learning algorithms, and hence to focus attention when
something new arrives. In the second stage of the KDD process, the novelty measure can guide the mining process by forming a
constraint so that only novel rules are discovered. In the post-processing stage of KDD, this measure can be used to analyze the
discovered knowledge objectively and/or subjectively, forming a filter that minimizes the number of discovered rules and thereby
makes them easier for the user to understand.
There are many proposals that study novelty in other disciplines, such as robotics, machine learning and statistical outlier
detection. Generally, these methods build a model of a training set that is selected to contain no examples of the important (i.e.,
novel) class; deviations from this model are then detected in some way. For instance, Kohonen and Oja proposed a
novelty filter based on computing the bit-wise difference between the current input and its closest match in the training set
[28]. In [55], a sample application of association rule learning is presented: by monitoring the variance in the confidence
of particular rules inferred from the training data, the system reports how these parameters differ before and after the test data
enter the system. Hence, with some pre-defined threshold, abnormalities can be reliably detected.
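The bit-wise novelty filter of [28] can be sketched as a nearest-neighbour Hamming distance; the binary encoding of the inputs is an assumption:

```python
def hamming(a, b):
    """Bit-wise difference between two equal-length binary vectors."""
    return sum(x != y for x, y in zip(a, b))

def novelty_score(x, training_set):
    """Distance from the input to its closest match in the training set,
    in the spirit of the novelty filter of [28]."""
    return min(hamming(x, t) for t in training_set)

train = [(0, 0, 1, 1), (1, 0, 1, 0)]
novelty_score((0, 0, 1, 0), train)  # 1: near-duplicate of a training pattern
novelty_score((1, 1, 0, 1), train)  # 3: far from everything seen, hence novel
```

Thresholding this score separates familiar inputs from novel ones, mirroring the "deviation from a model of the normal class" scheme described above.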
The techniques proposed in the statistical literature focus on modeling the support of the dataset and then
detecting inputs that do not belong to that support. The choice between statistical methods and machine learning methods is
based on the available data, the application, and the domain knowledge [35].
To our knowledge, no concrete work has been conducted to tackle the novelty measure in data mining. The only work
that has been proposed is the detection of the novelty of rules mined from text [6]. In [6], novelty is estimated based on the lexical
knowledge in WordNet. The approach defines a measure of semantic distance between two words in WordNet,
determined by the length of the shortest path between the two words (wi, wj). The novelty of a rule is then defined as the average of
this distance across all pairs of words (wi, wj), where wi is a word in the antecedent and wj is a word in the consequent. In [59], we
proposed a framework that quantifies novelty by computing the deviation of the currently discovered knowledge with
respect to the domain knowledge and the previously discovered knowledge. The approach presented in [59] is used as a post-analysis
filter in order to discover only novel rules.
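The path-distance idea of [6] can be sketched over a toy is-a taxonomy standing in for WordNet; the taxonomy below and the breadth-first search are our illustrative assumptions:

```python
from collections import deque

# Toy is-a taxonomy standing in for WordNet; links are stored symmetrically
# so that shortest paths can be found by breadth-first search.
taxonomy = {
    "entity":   {"animal", "artifact"},
    "animal":   {"entity", "dog", "cat"},
    "artifact": {"entity", "car"},
    "dog": {"animal"},
    "cat": {"animal"},
    "car": {"artifact"},
}

def path_distance(w1, w2):
    """Length of the shortest path between two words in the taxonomy."""
    seen, frontier = {w1}, deque([(w1, 0)])
    while frontier:
        node, d = frontier.popleft()
        if node == w2:
            return d
        for nb in taxonomy.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return float("inf")

def rule_novelty(antecedent_words, consequent_words):
    """Average semantic distance over all (wi, wj) pairs, as in [6]."""
    pairs = [(a, c) for a in antecedent_words for c in consequent_words]
    return sum(path_distance(a, c) for a, c in pairs) / len(pairs)

rule_novelty(["dog"], ["cat"])  # 2.0: closely related words, low novelty
rule_novelty(["dog"], ["car"])  # 4.0: distant words, higher novelty
```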
5 Comparison of Subjective Measures
Most existing approaches to measuring subjective interestingness require the user to explicitly state what type of knowledge he/she
expects; the system then applies some search technique to select rules according to these expectations. Most of
these measures concentrate on unexpectedness and actionability as the most influential aspects of rule
interestingness. However, no general approach has been proposed for handling novelty.
Although actionability and unexpectedness are important, for a rule to be interesting it must also be novel. We assume that if
novelty occurs, this implies explicitly or implicitly that the rule may also be unexpected and/or actionable. Even though the
unexpectedness measure may resemble novelty in some respects, most of the research on unexpectedness has focused on generating
rules that contradict the user's beliefs about the domain [33,32,48,49,30,36]. Table 1 shows the subjective measures and their
importance relative to each other.
                            Non-Actionable Rule    Actionable Rule
Novel Rule
  Unexpected Rule           Most Interesting       Most Interesting
  Expected Rule             Not Interesting        More Interesting
Non-Novel Rule
  Unexpected Rule           Less Interesting       Less Interesting
  Expected Rule             Not Interesting        Not Interesting
Table 1. Subjective interestingness measure categories
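Table 1 can equally be read as a small decision function; the following is a sketch of our reading of the matrix:

```python
def interestingness(novel, unexpected, actionable):
    """Category assigned to a rule, following our reading of Table 1."""
    if not novel:
        return "Less Interesting" if unexpected else "Not Interesting"
    if unexpected:
        return "Most Interesting"
    return "More Interesting" if actionable else "Not Interesting"

interestingness(True, True, True)     # "Most Interesting"
interestingness(True, False, True)    # "More Interesting"
interestingness(False, False, True)   # "Not Interesting"
```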
References
1. Adomavicius, G. , Tuzhilin, A., “Discovery of Actionable Patterns in Databases: The Action Hierarchy Approach”, In Proceedings of the
Third International Conference of Knowledge Discovery & Data Mining, The AAAI Press,1997.
2. Adriaans, P., Zantinge, D., “Data Mining”, 1st edition, Addison Wesley Longman, 1999.
3. Agrawal, R. , Mannila, H. , Srikant, R. , Toivonen, H. , Inkeri Verkamo, A., “Fast Discovery of Association Rules”, In Advances in knowledge
discovery and data mining, Edited by Fayyad, U. M. & Piatetsky-Shapiro, G. & Uthurusamy, P. Menlo Park, CA:AAAI/MIT Press,1996.
4. Agrawal, R., Imielinski, T., Swami, A., ”Mining Association Rules between Sets of Items in Large Databases”, In ACM SIGMOD Conference
of Management of Data. Washington D.C., 1993.
5. Agrawal, R., Srikant, R., “Fast Algorithms for Mining Association Rules in Large Databases”, In Proceedings of the 20th International
Conference on Very Large Data Bases, Santiago. Chile, 1994.
6. Basu, S., Mooney, R. J., Pasupuleti, K. V., Ghosh, J., ”Using Lexical Knowledge to Evaluate the Novelty of Rules Mined from Text”, In
Proceedings of the NAACL workshop and other Lexical Resources: Applications, Extensions and Customizations, 2001.
7. Brachman, R. J., Anand, T., “The Process of Knowledge Discovery in Databases”, In Advances in Knowledge Discovery and Data mining.
Edited by Fayyad, U. M. & Piatetsky-Shapiro, G. & Uthurusamy, P. Menlo Park, CA:AAAI/MIT Press, 1996.
8. Bronchi, F., Giannotti, F., Mazzanti, A., Pedreschi, D., “Adaptive Constraint Pushing in Frequent Pattern Mining”, In Proceedings of the 17th
European Conference on PAKDD03, 2003.
9. Bronchi, F., Giannotti, F., Mazzanti, A., Pedreschi, D., “ExAMiner: Optimized Level-wise Frequent pattern Mining with Monotone
Constraints”, In Proceedings of the 3rd International Conference on Data Mining (ICDM03), 2003.
10. Bronchi, F., Giannotti, F., Mazzanti, A., Pedreschi, D., “Exante: Anticipated Data Reduction in Constrained Pattern Mining”, In Proceedings
of the 7th PAKDD03, 2003.
11. Cabena, P., Hadjinian, P., Stadler, R., Verhess, J., Zanasi, A., “Discovering Data Mining from Concepts to Implementation’, New Jersey,
Prentice Hall, 1998.
12. Clair, C., “A Usefulness Metric and its Application to Decision Tree Based Classification”, Ph.D. thesis, School of Computer Science, USA,
1999.
13. Clark, P., Niblett, T., “The CN2 Induction Algorithm”, In Machine learning 3(4), 1989.
14. Dhar, V., Tuzhilin, A., “Abstract-Driven Pattern Discovery in Databases”, In IEEE Transactions on Knowledge and Data Engineering 5(6),
1993.
15. Duda, R. O, Hart, P. E., Stork, D. G., ” Pattern Classification”, 2nd Edition. John Wiley & Sons ( Asia) PV. Ltd, 2002.
16. Dunham M. H., ”Data Mining: Introductory and Advanced Topics”, 1st Edition Pearson Education (Singapore) Pte. Ltd., 2003.
17. Fayyad, U. M., Djorgovski, S. G., Weir, N., “Automating the Analysis and Cataloging of Sky Surveys”, In Advances in Knowledge
Discovery and Data Mining, Edited by Fayyad, U. M. & Piatetsky-Shapiro, G. & Uthurusamy, P., Menlo Park, CA:AAAI/MIT Press, 1996.
18. Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P., “From Data Mining to Knowledge Discovery”, In Advances in Knowledge Discovery and
Data Mining. Edited by Fayyad, U. M. & Piatetsky-Shapiro, G. & Uthurusamy, P. Menlo Park, CA:AAAI/MIT Press, 1996.
19. Freitas, A. A., “On Rule Interestingness Measures”, Knowledge-Based Systems 12, 1999.
20. Gray, B., Orlowska, M. E., “CCAIIA: Clustering Categorical Attributes into Interesting Association Rules”, In Proceedings of the 2nd Pacific-Asia Conference, PAKDD-98, Lecture Notes in Artificial Intelligence, 1998.
21. Guillaume, S., Guillet, F., Philippé, J., “Improving the Discovery of Association Rules with Intensity of Implication”, In Proceedings of the 2nd
European Symposium, PKDD98, Lecture Notes in Artificial Intelligence, 1998.
22. Han, J., Kamber, M.:, “Data Mining: Concepts and Techniques”, 1st Edition, Harcourt India Private Limited, 2001.
23. Han, J., Fu, Y., Wang, W., Koperski, K., Zaiane, O., “DMQL: A Data Mining Query Language for Relational Databases”, In Proceedings of
the SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 1996
24. Hong, J., Mao, C., “Incremental Discovery of Rules and Structure by Hierarchical and Parallel Clustering”, In Knowledge Discovery in
Databases, 1991.
25. Imielinski, T., Virmani, A., Abdulghani, A., “DataMine: Application Programming Interface and Query Language for Database Mining”,
KDD-96, 1996.
26. Janakiramn, V. K., Saurukesi, K., “Decision Support Systems”, 2nd edition, Prentice-Hall, India, 2001.
27. Klemetinen, M., Mannila, H., Ronkainen, P., Toivonen, H., Verkamo, A. I., “Finding Interesting Rules from Large Sets of Discovered
Association Rules”, In Proceedings of the 3rd International Conference on Information and Knowledge Management. Gaithersburg, Maryland,
1994.
28. Kohonen, T., “Self-Organization and Associative Memory”, 3rd edition, Springer, Berlin, 1993.
29. Liu, B., Hsu, W., Chen, S., Ma, Y., “Analyzing the Subjective Interestingness of Association Rules”, IEEE Intelligent Systems, 2000.
30. Liu, B., Hsu, W., “Post Analysis of Learned Rules”, In Proceedings of the 13th National Conference on AI (AAAI’96), 1996.
31. Liu, B., Hsu, W., Lee, H-Y., Mum, L-F., “Tuple-Level Analysis for Identification of Interesting Rules”, In Technical Report TRA5/95, SoC.,
National University of Singapore, Singapore, 1996.
32. Liu, B., Hsu, W., ”Finding Interesting Patterns Using User Expectations”, DISCS Technical Report, 1995.
33. Liu, B., Hsu, W., Chen, S., “Using General Impressions to Analyze Discovered Classification Rules”, In Proceedings of the 3rd International
Conference on Knowledge Discovery and Data mining (KDD 97), 1997.
34. Luger, G. F., “Artificial Intelligence: Structure and Strategies for Complex Problem Solving”, 4th Edition, Pearson Education Ltd.,Delhi, India,
2002.
35. Marsland, S., “On-Line Novelty Detection Through Self-Organization, with Application to Robotics”, Ph.D. Thesis, Department of Computer
Science, University of Manchester, 2001.
36. Padmanabhan, B., Tuzhilin, A., “Unexpectedness as a Measure of Interestingness in Knowledge Discovery”, Working paper # IS-97-. Dept.
of Information Systems, Stern School of Business, NYU, 1997.
37. Padmanabhan, B., Tuzhilin, A., “A Belief-Driven Method for Discovering Unexpected Patterns”, KDD-98, 1998
38. Padmanabhan, B., Tuzhilin, A., “Small is Beautiful: Discovering the Minimal Set of Unexpected Patterns”, KDD-2000, 2000.
39. Patterson, D. W., “Introduction to Artificial Intelligence and Expert Systems”, 8th Edition, Prentice-Hall, India, 2000.
40. Pei, J., Han, J., “Can We Push More Constraints into Frequent Pattern Mining”, In Proceeding of the 6th ACM SIGKDD, 2000.
41. Piatetsky-Shapiro, G., Matheus, C. J., “The Interestingness of Deviations”, In Proceedings of AAAI Workshop on Knowledge Discovery in
Databases, 1994.
42. Piatetsky-Shapiro, G., “Discovery, Analysis, and Presentation of Strong Rules”, In Knowledge Discovery in Databases, The AAAI Press,
1991.
43. Psaila, G., “Discovery of Association Rules Meta-Patterns”, In Proceedings of 2nd International Conference on Data Warehousing and
Knowledge Discovery (DAWAK99), 1999.
44. Pujari , A. K., “Data Mining Techniques”, 1st Edition, Universities Press (India) Limited, 2001.
45. Pyle, D., “Data Preparation for Data Mining”, Morgan Kaufmann, San Francisco, CA, USA, 1999.
46. Quinlan, J. R., “C4.5: Programs for Machine Learning”, San Mateo, CA: Morgan Kaufmann, 1993.
47. Shen, W-M., Ong, K-L., Mitbander, B., Zaniolo, C., “Metaqueries for Data Mining”, In Advances in Knowledge Discovery and Data Mining,
Edited by Fayyad, U. M. & Piatetsky-Shapiro, G. & Uthurusamy, P. Menlo Park, CA:AAAI/MIT Press, 1996.
48. Silberschatz, A., Tuzhilin, A., “On Subjective Measures of Interestingness in Knowledge Discovery”, In Proceedings of the 1st International
Conference on Knowledge Discovery and Data Mining, 1995.
49. Silberschatz, A., Tuzhilin, A., “What Makes Patterns Interesting in Knowledge Discovery Systems”, IEEE Transactions on Knowledge and
Data Engineering, Vol. 5, No. 6, 1996.
50. Smyth, P., Goodman, R. M., “Rule Induction Using Information Theory”, In Knowledge Discovery in Databases, 1991.
51. Suzuki, E., Kodratoff, Y., “Discovery of Surprising Exception Rules Based on Intensity of Implication”, In Proceedings of the 2nd European
Symposium, PKDD98, Lecture Notes in Artificial Intelligence, 1998.
52. Suzuki, E., “Autonomous Discovery of Reliable Exception Rules”, In Proceedings of The 3rd International Conference on Knowledge
Discovery and Data Mining, Newport Beach, CA, USA, 1997.
53. Wang, K., Tay, S.H.W., Liu, B., “Interestingness-Based Interval Merger for Numeric Association Rules”, In Proceedings of the 4th
International Conference on Knowledge and Data Mining, 1998.
54. Williams, G. J., “Evolutionary Hot Spot Data Mining: An Architecture for Exploring For Interesting Discoveries”, In Proceeding of the 3rd
PAKDD99, 1999.
55. Yairi, T., Kato, Y., Hori K., “Fault Detection by Mining Association Rules from House-keeping Data”, In Proceedings of International
Symposium on Artificial Intelligence, Robotics and Automation in Space (SAIRAS 2001), 2001.
56. Yao, Y. Y., Liau, C. J., “A Generalized Decision Logic Language for Granular Computing”, FUZZ-IEEE on Computational Intelligence, 2002.
57. Yao, Y. Y., Zhong, N., “An Analysis of Quantitative Measures Associated with Rules”, In Proceedings of PAKDD, 1999.
58. Al-Hegami, A. S., “Interestingness Measures for KDD: A Comparative Analysis”, In Proceedings of the 11th International Conference on
Concurrent Engineering: Research and Applications, Beijing, China, 2004, pp 321-326.
59. Al-Hegami, A. S., Kumar, N., Bhatnagar, V., “ Novelty Framework for Knowledge Discovery in Databases”, In Proceedings of the 6th
International Conference on Data warehousing and Knowledge Discovery (DaWak 2004), Zaragoza, Spain, 2004, pp 48-55.