Subjective Measures and their Role in Data Mining Process
Ahmed Sultan Al-Hegami
Department of Computer Science
University of Delhi
Delhi-110007
INDIA
[email protected]
Abstract
Knowledge Discovery in Databases (KDD) is the process of extracting previously unknown, hidden and interesting patterns from a
huge amount of data stored in databases. Data mining is the stage of the KDD process in which a particular data mining algorithm
is applied to extract interesting knowledge. One of the most important aspects of any data mining task is the evaluation of the
discovered knowledge, and a major issue facing the data mining community is how to use existing knowledge about the domain to
evaluate the discovered patterns. For the patterns to be interesting, the user has to be involved by providing his/her prior
knowledge about the domain. While objective measures can be quantified using statistical methods, subjective measures are
determined by the user's understanding of the domain. The use of objective measures of interestingness in popular data mining
algorithms often leads to another data mining problem, although of reduced complexity. Reducing the volume of the discovered
patterns is desirable in order to improve the efficiency of the overall KDD process, and subjective measures of interestingness are
required to achieve this. In this paper we study the subjective interestingness of discovered patterns and show its role in
extracting novel and interesting knowledge.
Keywords
Knowledge discovery in databases, data mining, subjective measures, objective measures, domain
knowledge, classification, machine learning, decision tree.
1 Introduction
It is no exaggeration to say that the volume of information doubles every year due to the mechanical production of texts [2]. These
potentially large datasets are rich in information, but it is difficult to find the meaningful facts we seek unless there are methods for
developing models to exploit this wealth.
Researchers in different areas of Artificial Intelligence, Expert Systems, Statistics, Machine Learning, Databases, etc., are
striving to find new mechanisms, methods and techniques to transform this ocean of data into useful, effective, meaningful, and
interesting information that plays an effective role in decision support systems [26].
Knowledge Discovery in Databases (KDD) is a new area of research that attempts to address the complexity mentioned above. It is
the process of extracting previously unknown, hidden, novel and interesting knowledge from massive volumes of data stored in
databases [18,44,16,7,11]. It is an iterative process carried out in three stages. The KDD process begins with an understanding of the
problem and ends with the analysis and evaluation of the results. It includes preprocessing of the data (the data preparation stage),
extraction of information (the data mining stage), and analysis of the discovered knowledge (the analysis stage) [18,16]. Actual
extraction of patterns is preceded by preliminary analysis of data, followed by selection of relevant horizontal or vertical subset and
appropriate data transformations. This is the preprocessing stage of KDD and it is considered to be the most time-consuming stage
[45]. Often, the preparation of the data is influenced by the extraction algorithms used during the mining (second) stage. Data
mining algorithms are applied during the second stage of the KDD process, which is considered to be the core stage. It involves
selection and application of appropriate mining algorithm to search for patterns in the data. Sometimes combination of mining
algorithms may be required to extract interesting patterns from the pre-processed data [15,54]. The outcome of this stage is the
discovery of models/patterns hidden in databases, which are interpreted and analyzed during the third stage. The final stage of
KDD process is the analysis and evaluation of the knowledge discovered in the second stage. Obtaining patterns/models is not
the end of the KDD process. Evaluation and analysis are equally important (if not more so), particularly in view of the proliferation
of KDD techniques being used to solve real-life applications.
It is common knowledge that the volume of patterns discovered by data mining algorithms becomes huge due to the large size
of the target database [33,41,48,30,36]. Identifying interesting patterns from this vast set of discovered patterns still remains
fundamentally a mining problem, though of reduced complexity. The time required to generate the rules, and the space required to
store, maintain and understand them, are among the practical issues that need attention.
2 Integrating Subjective Measures with Data Mining
The problem of reducing the volume of the discovered knowledge has been attacked at all three stages of the KDD process. Psaila
proposed the analysis of data to identify meta-patterns during the pre-processing stage [43]. During the data mining stage, researchers
commonly use either constraints or appropriate measures of interestingness to reduce the number of discovered rules. The third
stage of the KDD process, which aims at analyzing and interpreting the discovered knowledge, is carried out by the end user. Post-analysis
of the discovered patterns, as proposed in [33,32,30], helps the user focus on a small subset of the discovered patterns.
On account of the wide variation in users' needs and their subjectivity, end users design and develop need-based filters in an ad hoc
manner. We briefly discuss the three approaches in the following subsections.
2.1 Interestingness Measures
The use of interestingness measures is one of the primary techniques for reducing the number of rules presented to the user.
Interestingness measures guide the KDD process in both the mining stage and the analysis stage, in order to restrict attention to rules
that are of interest to the user [12]. Two types of rule interestingness measures have been studied in the data mining literature, namely,
objective and subjective measures. Objective measures are based on the structure and statistical significance of the patterns
[33,34,41]. Subjective measures are based on the subjectivity of the user, who evaluates the patterns on the basis of novelty,
actionability, unexpectedness, etc. [32,48,49,58,59].
Novelty, unexpectedness and actionability are subjective measures of immense importance to the end user of
the KDD endeavor [12,36,59]. The novelty of a rule1 is the extent to which the rule adds to the prior knowledge of the user [6,59].
Unexpectedness [33,36] is the extent to which the rule is surprising to the user. Actionability indicates the benefit that the rule can
bring to the user; it is implicitly captured by novelty and unexpectedness. It is important to distinguish between the novelty and
unexpectedness measures: while the former implies discovering knowledge that is, to some extent, entirely new, the latter implies
discovering knowledge that increases or decreases the user's expectation about the domain.
2.2 Constrained Mining
Constraint-based mining allows the users to specify the rules to be discovered according to their background knowledge, thereby
making the data mining process more effective. Han and Kamber elaborate on various types of constraints, viz. knowledge type
constraints, data constraints and dimension/level constraints [22]. Though not all of the specified constraints can be pushed into the
data mining algorithm [40], constraints have recently been used successfully to restrict the search space [8,9,10].
2.3 Post-processing Filters
After the rules have been discovered by the mining algorithm, further focusing is possible through the use of post-analysis filters.
Because of their inherently subjective needs, end users design and develop their own post-processing filters in an ad hoc manner.
Consequently, such filters are highly specific and have not been an active area of research. To the best of our knowledge, the only
generalized work on post-analysis is reported in [31,32].
3 Interestingness Measures and Data Mining
3.1 Overview
One of the major issues facing the data mining community is how to use existing knowledge about the domain to discover novel
and interesting rules. For this reason, researchers lay emphasis on interestingness measures as one of the most important
ways of reducing the number of discovered rules. Such measures help to confine the number of uninteresting patterns
discovered. This issue is crucial and the most complicated one. Interestingness measures can guide the analysis stage in
the search for rules that are of interest to the user [12].
There are two aspects of rule interestingness that have been studied in the data mining literature: objective and subjective measures
[46,22,33,32,48,49,30,36,23,25,47,31]. Objective measures are data-driven and domain-independent. Generally, these measures
evaluate the rules based on their quality and the similarity between them, rather than considering the user's beliefs about the
domain. Subjective measures, by contrast, are user-driven and domain-dependent. For example, the user may be asked to specify a
rule template indicating which attribute(s) must occur in a rule for it to be interesting from his/her point of view [27]. As another
example, the user may be asked to give a general, high-level description of his/her expectations about the domain, and the system
then searches only for rules that are unexpected with respect to these expectations [33,32,30,31].
1 Rules are one of the most commonly used forms of representing the discovered models/patterns [13,46].
3.2 Objective Measures
Objective measures play a critical role in the different stages of the KDD process. In the data mining stage, quantitative measures can
be used to reduce the search space. For instance, the support measure is used to reduce the number of itemsets to be examined [3,4,5].
In the evaluation stage of KDD, objective measures are used to select interesting rules from the set of discovered rules
[48,49,20,21,24,42,50,51,53,56,57]. For instance, the confidence measure of association rules is used to select only strong rules from a
set of discovered association rules [4]. Furthermore, objective measures are used in consolidating and acting on discovered
rules; in this phase, they quantify the effectiveness and usefulness of the discovered rules. For instance, cost,
classification error, and classification accuracy are used in this role [19].
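For illustration, the support and confidence measures mentioned above can be computed directly from transaction data; the following sketch uses hypothetical market-basket items:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of antecedent -> consequent: support(A and B) / support(A)."""
    return (support(set(antecedent) | set(consequent), transactions)
            / support(antecedent, transactions))

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
sup = support({"bread"}, transactions)                 # 0.75
conf = confidence({"bread"}, {"milk"}, transactions)   # 2/3
```

A support threshold prunes itemsets during mining, while a confidence threshold selects strong rules afterwards, matching the two uses described above.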
Objective measures are based on the structure and statistics of the patterns [46,33,41,49,14]. Many measures, such as confidence,
support and classification error, are defined on the basis of the statistical characteristics of rules. Statistical methods are generally
easy to apply; they are applied to the data or rules in order to determine the nature of the relationship between variables (attributes),
and they allow the user to assess the interestingness of the rules to be discovered. It should be noted
that objective and subjective measures are complementary. While objective measures can be used as a first filter to
select potentially interesting rules, subjective measures are used as a final filter to select the desired interesting rules. Objective
measures will not be studied further, as they are beyond the scope of this paper.
4 Subjective Measures
Subjective measures are based on the subjectivity of the user who examines the patterns, for example their actionability and
unexpectedness [33,32,48,30,36,31]. This paper studies subjective interestingness. Two main subjective measures have been
studied in the data mining literature, namely unexpectedness and actionability. A rule is unexpected if it contradicts the user's
beliefs about the domain and therefore surprises the user [33,32,37]. A rule is actionable if the user can take some action to his/her
advantage based on it [33,32,1].
Another important subjective measure, which has received less attention in the data mining community, is the novelty of the
discovered rules [18,17,49]. A rule is novel if, to some extent, it contributes new knowledge.
4.1 Unexpectedness
The unexpectedness of discovered rules has been studied exhaustively in the literature [41,48,49,30,36,37,38,52,27,23,25,47].
In particular, [41], [48,49], [30], [36], [37,38], [52] and [39] present different approaches to this measure. [41] studied the
interestingness of discovered rules in the context of a health care application. Their KEFIR system looks for deviations in the data
and examines how a relevant action may affect a deviation. The system analyzes health care information to uncover “key findings”.
Interesting rules are reported after their degree of interestingness has been measured, where the degree of interestingness is
estimated by the amount of benefit obtained when an action is taken. The analyst provides recommendations based on his/her prior
knowledge, and the system ranks all the rules by the interestingness of the deviation. This system is considered a good method for
incorporating domain knowledge into an application system. However, it is domain-dependent and cannot be used for other applications.
[48,49] studied subjective interestingness by providing a framework to measure a rule's unexpectedness with respect to the user's
beliefs. They proposed the use of probabilistic beliefs and belief revision methods. The beliefs are used to define unexpectedness,
and a revision method is used to modify the belief confidence when new evidence arrives. A rule is considered unexpected if it
causes some change in these beliefs. In practice, it is difficult to obtain belief information, especially specific domain knowledge. The
approach presented in [30] is based on a syntactic comparison between a discovered rule and a rule in the domain knowledge. Two
rules are dissimilar if either the consequents of both rules are similar but the antecedents are “far apart”, or the consequents are “far
apart” but the antecedents are similar, where similarity and dissimilarity are defined based on the structure of the rules. The
problem with this approach is that it does not specify the degree of unexpectedness, and it does not consider the case in which both
the antecedents and the consequents of the discovered rule and the domain-knowledge rule are dissimilar.
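The syntactic comparison of [30] can be sketched as follows; the Jaccard similarity on attribute sets and the 0.3 threshold are our own illustrative assumptions, not the paper's exact definition:

```python
def jaccard(a, b):
    """Similarity between two attribute sets (1.0 = identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def syntactically_unexpected(rule, belief, threshold=0.3):
    """rule and belief are (antecedent_attrs, consequent_attrs) pairs.
    Flags the two cases described in [30]: similar consequents with
    far-apart antecedents, or far-apart consequents with similar antecedents."""
    ant_sim = jaccard(rule[0], belief[0])
    con_sim = jaccard(rule[1], belief[1])
    return ((con_sim >= 1 - threshold and ant_sim <= threshold) or
            (ant_sim >= 1 - threshold and con_sim <= threshold))

# Same consequent attribute, disjoint antecedents -> flagged as unexpected.
rule = ({"age", "income"}, {"loan"})
belief = ({"region", "marital_status"}, {"loan"})
syntactically_unexpected(rule, belief)  # True
```

Note that, exactly as criticized in the text, this check stays silent when both parts are dissimilar.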
[36,37,38] proposed a new definition of unexpectedness in terms of a logical contradiction of a rule with respect to a belief. Given a
rule A → B and a belief X → Y, where A and X are antecedents and B and Y are single atomic conditions that logically contradict
each other, the rule A → B is unexpected with respect to the belief X → Y if the rule A, X → B also holds.
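This condition can be checked against data. The following sketch (the toy loan data and the 0.7 confidence cut-off are our assumptions) tests whether the combined rule A ∧ X → B holds while B contradicts the believed consequent Y:

```python
def holds(antecedent, consequent, rows, min_conf=0.7):
    """A rule holds if its confidence on the data reaches min_conf."""
    matching = [r for r in rows
                if all(r.get(k) == v for k, v in antecedent.items())]
    if not matching:
        return False
    hits = sum(1 for r in matching
               if all(r.get(k) == v for k, v in consequent.items()))
    return hits / len(matching) >= min_conf

def unexpected(rule, belief, rows, contradicts):
    """rule = (A, B), belief = (X, Y): unexpected if B contradicts Y
    and the combined rule A and X -> B holds on the data."""
    (A, B), (X, Y) = rule, belief
    return contradicts(B, Y) and holds({**A, **X}, B, rows)

# Toy data: the belief "females get loans" fails for older customers.
rows = ([{"sex": "female", "age": "old", "loan": "no"}] * 3 +
        [{"sex": "female", "age": "young", "loan": "yes"}] * 2)
rule = ({"age": "old"}, {"loan": "no"})
belief = ({"sex": "female"}, {"loan": "yes"})
contradicts = lambda b, y: b["loan"] != y["loan"]
unexpected(rule, belief, rows, contradicts)  # True
```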
An alternative approach is presented in [52], which proposes an autonomous probabilistic estimation method that can discover all rule
pairs (i.e., an exception rule associated with a common-sense rule) with high confidence. The approach discovers pairs of rules
A → B and their corresponding exceptions A, C → B′, where A and C are conjunctions of <attribute, value> pairs and B and B′ are
<attribute, value> pairs corresponding to the same attribute but with different values. In addition, the unexpectedness of the
exception rule is constrained by requiring that the “reference rule” C → B′ has low confidence. Neither the user's evaluation
nor domain knowledge is required in this approach.
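A minimal sketch of this rule-pair test (the confidence thresholds and the medical-flavoured toy data are our assumptions):

```python
def conf(rows, antecedent, consequent):
    """Confidence of antecedent -> consequent on the data."""
    m = [r for r in rows if all(r.get(k) == v for k, v in antecedent.items())]
    if not m:
        return 0.0
    return sum(1 for r in m
               if all(r.get(k) == v for k, v in consequent.items())) / len(m)

def exception_pair(rows, A, B, C, B2, high=0.7, low=0.3):
    """True when A -> B and A, C -> B' are both strong while the
    reference rule C -> B' is weak, as required in [52]."""
    return (conf(rows, A, B) >= high and
            conf(rows, {**A, **C}, B2) >= high and
            conf(rows, C, B2) <= low)

# Toy data: the antibiotic usually cures, except against a resistant germ.
rows = ([{"drug": "antibiotic", "germ": "normal", "outcome": "cured"}] * 7 +
        [{"drug": "antibiotic", "germ": "staph", "outcome": "not_cured"}] * 3 +
        [{"drug": "none", "germ": "staph", "outcome": "cured"}] * 7)
exception_pair(rows,
               A={"drug": "antibiotic"}, B={"outcome": "cured"},
               C={"germ": "staph"}, B2={"outcome": "not_cured"})  # True
```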
Another approach to measuring subjective interestingness requires the user to specify which types of rules are interesting and
which are uninteresting; matching techniques are then used to select rules, taking the user's beliefs into consideration. [27]
proposes this kind of user belief and uses a template-based approach in which the user specifies sets of interesting and
uninteresting rules using templates. A template describes a set of rules in terms of the items occurring in the antecedent and
consequent parts. Finally, the system retrieves the matching rules from the set of discovered rules.
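A template filter of this kind might look as follows; the attribute-set representation is a simplification of the templates in [27]:

```python
# A template lists the attributes that must appear in a rule's antecedent
# and consequent; rules here are (antecedent_attrs, consequent_attrs) pairs.
def matches(rule, template):
    req_ant, req_con = template
    return req_ant <= rule[0] and req_con <= rule[1]

rules = [
    ({"age", "income"}, {"loan"}),
    ({"region"}, {"churn"}),
]
interesting = [({"age"}, {"loan"})]   # user: "rules relating age to loans"
selected = [r for r in rules if any(matches(r, t) for t in interesting)]
# selected keeps only the first rule
```

An analogous list of uninteresting templates could be used to discard rules instead.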
Other methods of identifying subjectively interesting rules are query-based [23,25,47], for example M-SQL in [25], DMQL in
[23], and Metaqueries in [47]. These methods treat the search for subjectively interesting rules as a query-answering process:
the user specifies a set of rules or constraints on the rules using a data mining query, and the system then finds the rules that
satisfy this query [29]. The drawback of query-based approaches is that they find only those expected rules that match the
query specified by the user. The really interesting rules, which are unexpected or novel, can never be found by these methods.
Furthermore, the user may not be able to determine what is interesting to him/her.
4.2 Actionability
The actionability measure is based on the benefit a rule brings to the user, that is, on whether the user can do something to his/her
advantage [12,33,32,48,49,30,31,29]. This measure is very important for a rule to be interesting, in the sense that users are always
looking for patterns that improve their performance and their work, and they can take actions in response to
actionable knowledge. It is also worth remembering that one of the primary application domains of most data mining algorithms is
business. From a business point of view, information is not desired purely for its own sake; the practical
purpose of obtaining information is to improve the business, that is, the information must support successful
decision-making. In practice, however, it is not an easy task to
determine which information is actionable.
[48,49] quantify actionability in terms of unexpectedness, which they define as a subjective measure of
interestingness. They show that most actionable knowledge is unexpected and most unexpected knowledge is
actionable. Since actionability is a subjective measure that is hard to define, they propose unexpectedness as a good
approximation for actionability, and conversely argue that actionability is a good measure of unexpectedness. Since
unexpectedness is easier to measure than actionability, unexpectedness is the measurement used to address actionability. In [48,49],
subjective interestingness is categorized into three categories:
1. Rules that are both unexpected and actionable,
2. Rules that are unexpected and not actionable, and
3. Rules that are expected and actionable.
They argue that a new metric is not needed, since categories 1 and 2 can be handled by finding rules that are unexpected, and
category 3 by finding rules that conform to the user's existing knowledge about the domain. In fact, this process does not solve the
problem of determining how actionability affects interestingness. The actionability and unexpectedness measures must be addressed
individually, and each must be represented separately in the interestingness measure. A rule that is unexpected but not actionable is
not as interesting as a rule that is both unexpected and actionable; the two rules must be presented with different degrees of
interestingness. In order to measure actionability, the user needs to be involved. The user can rank each attribute based on his/her
ability to act on that attribute [12], and it may also be possible to rank a rule based on its actionability.
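One simple way to realise such attribute-based ranking; the per-attribute scores and the mean aggregation are illustrative assumptions:

```python
# Hypothetical user-supplied actionability scores per attribute
# (0 = the user cannot act on it, 1 = fully under the user's control).
scores = {"price": 0.9, "promotion": 0.8, "age": 0.0, "region": 0.1}

def rule_actionability(antecedent_attrs, scores):
    """Rank a rule by the mean actionability of its antecedent attributes;
    attributes the user never ranked default to 0."""
    return sum(scores.get(a, 0.0) for a in antecedent_attrs) / len(antecedent_attrs)

rules = [{"age", "region"}, {"price", "promotion"}]
ranked = sorted(rules, key=lambda r: rule_actionability(r, scores), reverse=True)
# ranked[0] is {"price", "promotion"}: the user can act on price and promotion
```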
Actionability can, in addition, be measured with respect to different data mining algorithms; that is, the set of rules discovered
by one algorithm may be more actionable than the rules discovered by another. Therefore,
actionability has to be measured independently, not through measuring unexpectedness.
4.3 Novelty Measure
A key factor in determining whether a KDD process is successful is whether it provides the user with previously unknown,
useful, and interesting knowledge [17,48,49]. The term “previously unknown” has been argued to imply interestingness [48,49]; that
is, interestingness increases as the newness of the knowledge increases, and vice versa. For instance, consider the
discovered rule age>50 ∧ sex=female → loan=no. If the user does not know this rule and it has not been discovered previously,
then a novel rule has been provided to the user. This is interesting, since it increases the user's knowledge. However, if the user
already knows this rule, no novel information is provided and the rule is considered uninteresting. Since the novelty measure is
based on the user's feelings and subjectivity about the discovered rules, it is considered a subjective measure.
Novelty is a very important aspect of the KDD process and can be applied at its different stages. In the pre-processing
stage, the novelty measure can be used as a filter to select, and concentrate on, a set of instances that should be given more attention.
It can also be used to determine which features are more important to the learning algorithms, and hence to focus attention when
something new arrives. In the second stage of the KDD process, the novelty measure can guide the mining process by forming a
constraint so that only novel rules are discovered. In the post-processing stage of KDD, this measure can be used to analyze the
discovered knowledge objectively and/or subjectively, forming a filter that minimizes the number of discovered rules and thereby
makes them easier for the user to understand.
There are many proposals that study novelty in other disciplines, such as robotics, machine learning and statistical outlier
detection. Generally, these methods build a model of a training set that is selected to contain no examples of the important (i.e.,
novel) class; deviations from this model are then detected in some way. For instance, Kohonen and Oja proposed a
novelty filter based on computing the bit-wise difference between the current input and its closest match in the training set
[28]. In [55], a sample application of association rule learning is presented: by monitoring the variance in the confidence
of particular rules inferred from the training data, the system reports how these parameters differ before and after the test data
enter the system. Hence, with some pre-defined threshold, abnormalities can be reliably detected.
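The bit-wise novelty filter of [28] can be sketched as a nearest-neighbour Hamming distance; the binary encoding of the inputs is an assumption:

```python
def hamming(a, b):
    """Bit-wise difference between two equal-length binary vectors."""
    return sum(x != y for x, y in zip(a, b))

def novelty_score(x, training_set):
    """Distance from the input to its closest match in the training set,
    in the spirit of the novelty filter of [28]."""
    return min(hamming(x, t) for t in training_set)

train = [(0, 0, 1, 1), (1, 0, 1, 0)]
novelty_score((0, 0, 1, 0), train)  # 1: near-duplicate of a training pattern
novelty_score((1, 1, 0, 1), train)  # 3: far from everything seen, hence novel
```

Thresholding this score separates familiar inputs from novel ones, mirroring the "deviation from a model of the normal class" scheme described above.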
The techniques proposed in the statistical literature focus on modeling the support of the dataset and then
detecting inputs that do not belong to that support. The choice between statistical methods and machine learning methods is
based on the available data, the application, and the domain knowledge [35].
To our knowledge, no concrete work has been conducted to tackle the novelty measure in data mining. The only work
that has been proposed is the detection of the novelty of rules mined from text [6]. In [6], novelty is estimated based on the lexical
knowledge in WordNet. The approach defines a measure of semantic distance between two words in WordNet,
determined by the length of the shortest path between the two words (wi, wj). The novelty of a rule is then defined as the average of
this distance across all pairs of words (wi, wj), where wi is a word in the antecedent and wj is a word in the consequent. In [59], we
proposed a framework that quantifies novelty by computing the deviation of the currently discovered knowledge with
respect to the domain knowledge and the previously discovered knowledge. The approach presented in [59] is used as a post-analysis
filter in order to discover only novel rules.
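The path-distance idea of [6] can be sketched over a toy is-a taxonomy standing in for WordNet; the taxonomy below and the breadth-first search are our illustrative assumptions:

```python
from collections import deque

# Toy is-a taxonomy standing in for WordNet; links are stored symmetrically
# so that shortest paths can be found by breadth-first search.
taxonomy = {
    "entity":   {"animal", "artifact"},
    "animal":   {"entity", "dog", "cat"},
    "artifact": {"entity", "car"},
    "dog": {"animal"},
    "cat": {"animal"},
    "car": {"artifact"},
}

def path_distance(w1, w2):
    """Length of the shortest path between two words in the taxonomy."""
    seen, frontier = {w1}, deque([(w1, 0)])
    while frontier:
        node, d = frontier.popleft()
        if node == w2:
            return d
        for nb in taxonomy.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return float("inf")

def rule_novelty(antecedent_words, consequent_words):
    """Average semantic distance over all (wi, wj) pairs, as in [6]."""
    pairs = [(a, c) for a in antecedent_words for c in consequent_words]
    return sum(path_distance(a, c) for a, c in pairs) / len(pairs)

rule_novelty(["dog"], ["cat"])  # 2.0: closely related words, low novelty
rule_novelty(["dog"], ["car"])  # 4.0: distant words, higher novelty
```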
5 Comparison of Subjective Measures
Most existing approaches to measuring subjective interestingness require the user to explicitly state what type of knowledge he/she
expects; the system then applies some search technique to select rules according to these expectations. Most of
these measures concentrate on unexpectedness and actionability as the most influential aspects of rule
interestingness. However, no general approach has been proposed for handling novelty.
Although actionability and unexpectedness are important, for a rule to be interesting it must also be novel. We assume that if
novelty occurs, this implies explicitly or implicitly that the rule may also be unexpected and/or actionable. Even though the
unexpectedness measure may resemble novelty in some respects, most of the research on unexpectedness has focused on generating
rules that contradict the user's beliefs about the domain [33,32,48,49,30,36]. Table 1 shows the subjective measures and their
importance relative to each other.
                            Non-Actionable Rule    Actionable Rule
Novel Rule
  Unexpected Rule           Most Interesting       Most Interesting
  Expected Rule             Not Interesting        More Interesting
Non-Novel Rule
  Unexpected Rule           Less Interesting       Less Interesting
  Expected Rule             Not Interesting        Not Interesting
Table 1. Subjective interestingness measure categories
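Table 1 can equally be read as a small decision function; the following is a sketch of our reading of the matrix:

```python
def interestingness(novel, unexpected, actionable):
    """Category assigned to a rule, following our reading of Table 1."""
    if not novel:
        return "Less Interesting" if unexpected else "Not Interesting"
    if unexpected:
        return "Most Interesting"
    return "More Interesting" if actionable else "Not Interesting"

interestingness(True, True, True)     # "Most Interesting"
interestingness(True, False, True)    # "More Interesting"
interestingness(False, False, True)   # "Not Interesting"
```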
References
1. Adomavicius, G. , Tuzhilin, A., “Discovery of Actionable Patterns in Databases: The Action Hierarchy Approach”, In Proceedings of the
Third International Conference of Knowledge Discovery & Data Mining, The AAAI Press,1997.
2. Adriaans, P., Zantinge, D., “Data Mining”, 1st edition, Addison Wesley Longman, 1999.
3. Agrawal, R. , Mannila, H. , Srikant, R. , Toivonen, H. , Inkeri Verkamo, A., “Fast Discovery of Association Rules”, In Advances in knowledge
discovery and data mining, Edited by Fayyad, U. M. & Piatetsky-Shapiro, G. & Uthurusamy, P. Menlo Park, CA:AAAI/MIT Press,1996.
4. Agrawal, R., Imielinski, T., Swami, A., ”Mining Association Rules between Sets of Items in Large Databases”, In ACM SIGMOD Conference
of Management of Data. Washington D.C., 1993.
5. Agrawal, R., Srikant, R., “Fast Algorithms for Mining Association Rules in Large Databases”, In Proceedings of the 20th International
Conference on Very Large Data Bases, Santiago. Chile, 1994.
6. Basu, S., Mooney, R. J., Pasupuleti, K. V., Ghosh, J., ”Using Lexical Knowledge to Evaluate the Novelty of Rules Mined from Text”, In
Proceedings of the NAACL workshop and other Lexical Resources: Applications, Extensions and Customizations, 2001.
7. Brachman, R. J., Anand, T., “The Process of Knowledge Discovery in Databases”, In Advances in Knowledge Discovery and Data mining.
Edited by Fayyad, U. M. & Piatetsky-Shapiro, G. & Uthurusamy, P. Menlo Park, CA:AAAI/MIT Press, 1996.
8. Bronchi, F., Giannotti, F., Mazzanti, A., Pedreschi, D., “Adaptive Constraint Pushing in Frequent Pattern Mining”, In Proceedings of the 17th
European Conference on PAKDD03, 2003.
9. Bronchi, F., Giannotti, F., Mazzanti, A., Pedreschi, D., “ExAMiner: Optimized Level-wise Frequent pattern Mining with Monotone
Constraints”, In Proceedings of the 3rd International Conference on Data Mining (ICDM03), 2003.
10. Bronchi, F., Giannotti, F., Mazzanti, A., Pedreschi, D., “Exante: Anticipated Data Reduction in Constrained Pattern Mining”, In Proceedings
of the 7th PAKDD03, 2003.
11. Cabena, P., Hadjinian, P., Stadler, R., Verhess, J., Zanasi, A., “Discovering Data Mining from Concepts to Implementation’, New Jersey,
Prentice Hall, 1998.
12. Clair, C., “A Usefulness Metric and its Application to Decision Tree Based Classification”, Ph.D. thesis, School of Computer Science, USA,
1999.
13. Clark, P., Niblett, T., “The CN2 Induction Algorithm”, In Machine learning 3(4), 1989.
14. Dhar, V., Tuzhilin, A., “Abstract-Driven Pattern Discovery in Databases”, In IEEE Transactions on Knowledge and Data Engineering 5(6),
1993.
15. Duda, R. O, Hart, P. E., Stork, D. G., ” Pattern Classification”, 2nd Edition. John Wiley & Sons ( Asia) PV. Ltd, 2002.
16. Dunham M. H., ”Data Mining: Introductory and Advanced Topics”, 1st Edition Pearson Education (Singapore) Pte. Ltd., 2003.
17. Fayyad, U. M., Djorgovski, S. G., Weir, N., “Automating the Analysis and Cataloging of Sky Surveys”, In Advances in Knowledge
Discovery and Data Mining, Edited by Fayyad, U. M. & Piatetsky-Shapiro, G. & Uthurusamy, P., Menlo Park, CA:AAAI/MIT Press, 1996.
18. Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P., “From Data Mining to Knowledge Discovery”, In Advances in Knowledge Discovery and
Data Mining. Edited by Fayyad, U. M. & Piatetsky-Shapiro, G. & Uthurusamy, P. Menlo Park, CA:AAAI/MIT Press, 1996.
19. Freitas, A. A., “On Rule Interestingness Measures”, Knowledge-Based Systems 12, 1999.
20. Gray, B., Orlowska, M. E., “CCAIIA: Clustering Categorical Attributes into Interesting Association Rules”, In Proceedings of the 2nd Pacific-Asia Conference, PAKDD-98, Lecture Notes in Artificial Intelligence, 1998.
21. Guillaume, S., Guillet, F., Philippé, J., “Improving the Discovery of Association Rules with Intensity of Implication”, In Proceedings of the 2nd
European Symposium, PKDD98, Lecture Notes in Artificial Intelligence, 1998.
22. Han, J., Kamber, M.:, “Data Mining: Concepts and Techniques”, 1st Edition, Harcourt India Private Limited, 2001.
23. Han, J., Fu, Y., Wang, W., Koperski, K., Zaiane, O., “DMQL: A Data Mining Query Language for Relational Databases”, In Proceedings of
the SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 1996
24. Hong, J., Mao, C., “Incremental Discovery of Rules and Structure by Hierarchical and Parallel Clustering”, In Knowledge Discovery in
Databases, 1991.
25. Imielinski, T., Virmani, A., Abdulghani, A., “DataMine: Application Programming Interface and Query Language for Database Mining”,
KDD-96, 1996.
26. Janakiramn, V. K., Saurukesi, K., “Decision Support Systems”, 2nd edition, Prentice-Hall, India, 2001.
27. Klemetinen, M., Mannila, H., Ronkainen, P., Toivonen, H., Verkamo, A. I., “Finding Interesting Rules from Large Sets of Discovered
Association Rules”, In Proceedings of the 3rd International Conference on Information and Knowledge Management. Gaithersburg, Maryland,
1994.
28. Kohonen, T., “Self-Organization and Associative Memory”, 3rd edition, Springer, Berlin, 1993.
29. Liu, B., Hsu, W., Chen, S., Ma, Y., “Analyzing the Subjective Interestingness of Association Rules”, IEEE Intelligent Systems, 2000.
30. Liu, B., Hsu, W., “Post Analysis of Learned Rules”, In Proceedings of the 13th National Conference on AI (AAAI’96), 1996.
31. Liu, B., Hsu, W., Lee, H-Y., Mum, L-F., “Tuple-Level Analysis for Identification of Interesting Rules”, In Technical Report TRA5/95, SoC.,
National University of Singapore, Singapore, 1996.
32. Liu, B., Hsu, W., ”Finding Interesting Patterns Using User Expectations”, DISCS Technical Report, 1995.
33. Liu, B., Hsu, W., Chen, S., “Using General Impressions to Analyze Discovered Classification Rules”, In Proceedings of the 3rd International
Conference on Knowledge Discovery and Data mining (KDD 97), 1997.
34. Luger, G. F., “Artificial Intelligence: Structure and Strategies for Complex Problem Solving”, 4th Edition, Pearson Education Ltd.,Delhi, India,
2002.
35. Marsland, S., “On-Line Novelty Detection Through Self-Organization, with Application to Robotics”, Ph.D. Thesis, Department of Computer
Science, University of Manchester, 2001.
36. Padmanabhan, B., Tuzhilin, A., “Unexpectedness as a Measure of Interestingness in Knowledge Discovery”, Working paper # IS-97-. Dept.
of Information Systems, Stern School of Business, NYU, 1997.
37. Padmanabhan, B., Tuzhilin, A., “A Belief-Driven Method for Discovering Unexpected Patterns”, KDD-98, 1998
38. Padmanabhan, B., Tuzhilin, A., “Small is Beautiful: Discovering the Minimal Set of Unexpected Patterns”, KDD-2000, 2000.
39. Patterson, D. W., “Introduction to Artificial Intelligence and Expert Systems”, 8th Edition, Prentice-Hall, India, 2000.
40. Pei, J., Han, J., “Can We Push More Constraints into Frequent Pattern Mining”, In Proceeding of the 6th ACM SIGKDD, 2000.
41. Piatetsky-Shapiro, G., Matheus, C. J., “The Interestingness of Deviations”, In Proceedings of AAAI Workshop on Knowledge Discovery in
Databases, 1994.
42. Piatetsky-Shapiro, G., “Discovery, Analysis, and Presentation of Strong Rules”, In Knowledge Discovery in Databases, The AAAI Press,
1991.
43. Psaila, G., “Discovery of Association Rules Meta-Patterns”, In Proceedings of 2nd International Conference on Data Warehousing and
Knowledge Discovery (DAWAK99), 1999.
44. Pujari , A. K., “Data Mining Techniques”, 1st Edition, Universities Press (India) Limited, 2001.
45. Pyle, D., “Data Preparation for Data Mining”, Morgan Kaufmann, San Francisco, CA, USA, 1999.
46. Quinlan, J. R., “C4.5: Programs for Machine Learning”, San Mateo, CA: Morgan Kaufmann, 1993.
47. Shen, W-M., Ong, K-L., Mitbander, B., Zaniolo, C., “Metaqueries for Data Mining”, In Advances in Knowledge Discovery and Data Mining,
Edited by Fayyad, U. M. & Piatetsky-Shapiro, G. & Uthurusamy, P. Menlo Park, CA:AAAI/MIT Press, 1996.
48. Silberschatz, A., Tuzhilin, A., “On Subjective Measures of Interestingness in Knowledge Discovery”, In Proceedings of the 1st International
Conference on Knowledge Discovery and Data Mining, 1995.
49. Silberschatz, A., Tuzhilin, A., “What Makes Patterns Interesting in Knowledge Discovery Systems”, IEEE Transactions on Knowledge and
Data Engineering, Vol. 5, No. 6, 1996.
50. Smyth, P., Goodman, R. M., “Rule Induction Using Information Theory”, In Knowledge Discovery in Databases, 1991.
51. Suzuki, E., Kodratoff, Y., “Discovery of Surprising Exception Rules Based on Intensity of Implication”, In Proceedings of the 2nd European
Symposium, PKDD98, Lecture Notes in Artificial Intelligence, 1998.
52. Suzuki, E., “Autonomous Discovery of Reliable Exception Rules”, In Proceedings of The 3rd International Conference on Knowledge
Discovery and Data Mining, Newport Beach, CA, USA, 1997.
53. Wang, K., Tay, S.H.W., Liu, B., “Interestingness-Based Interval Merger for Numeric Association Rules”, In Proceedings of the 4th
International Conference on Knowledge and Data Mining, 1998.
54. Williams, G. J., “Evolutionary Hot Spot Data Mining: An Architecture for Exploring For Interesting Discoveries”, In Proceeding of the 3rd
PAKDD99, 1999.
55. Yairi, T., Kato, Y., Hori K., “Fault Detection by Mining Association Rules from House-keeping Data”, In Proceedings of International
Symposium on Artificial Intelligence, Robotics and Automation in Space (SAIRAS 2001), 2001.
56. Yao, Y. Y., Liau, C. J., “A Generalized Decision Logic Language for Granular Computing”, FUZZ-IEEE on Computational Intelligence, 2002.
57. Yao, Y. Y., Zhong, N., “An Analysis of Quantitative Measures Associated with Rules”, In Proceedings of PAKDD, 1999.
58. Al-Hegami, A. S., “Interestingness Measures for KDD: A Comparative Analysis”, In Proceedings of the 11th International Conference on
Concurrent Engineering: Research and Applications, Beijing, China, 2004, pp 321-326.
59. Al-Hegami, A. S., Kumar, N., Bhatnagar, V., “ Novelty Framework for Knowledge Discovery in Databases”, In Proceedings of the 6th
International Conference on Data warehousing and Knowledge Discovery (DaWak 2004), Zaragoza, Spain, 2004, pp 48-55.