Closed Pattern Mining for the Discovery of
User Preferences in a Calendar Assistant
Alfred Krzywicki and Wayne Wobcke
School of Computer Science and Engineering
University of New South Wales
Sydney NSW 2052, Australia
{alfredk|wobcke}@cse.unsw.edu.au
Abstract. We use closed pattern mining to discover user preferences in appointments in order to build structured solutions for a calendar assistant. Our choice
of closed patterns as a user preference representation is based on both theoretical
and practical considerations supported by Formal Concept Analysis. We simulated interaction with a calendar application using 16 months of real data from
a user’s calendar to evaluate the accuracy and consistency of suggestions, in order to determine the best data mining and solution generation techniques from a
range of available methods. The best performing data mining method was then
compared with decision tree learning, the best machine learning algorithm in this
domain. The results show that our data mining method based on closed patterns
converges faster than decision tree learning, whilst generating only consistent solutions. Thus closed pattern mining is a better technique for generating appointment attributes in the calendar domain.
Keywords: Data mining, closed patterns, Formal Concept Analysis, calendar assistants.
1 Introduction
We are interested in the problem of providing automated assistance to the user of a calendar system to help in defining appointments. In a calendar application, the user may
initially specify some of an appointment’s attributes (e.g. title and day), and the task of
the system is to suggest any or all of the remaining attributes (e.g. time and location).
What makes this problem difficult is that both the set of attributes given initially by the
user and the set of attributes that can be suggested are not fixed; some appointments
may contain only basic information such as the title, time, date and duration, while
others may have additional attributes specified, such as the location, the attendees, etc.
Furthermore the attributes mutually constrain one another.
A calendar appointment can be regarded as a structured solution. A problem requires a structured solution if the solution has the form of a set of components that are
constrained by other components in the solution. In many practical systems, the challenge of building a consistent solution is solved by defining a set of rules that describe
the constraints between components, McDermott [8]. In the calendar domain, the solution “components” are attributes with their values, and the “constraints” are provided
by a model of the user’s preferences. Additional constraints affecting the solution are
the presence of other appointments, dependencies between attributes and other user
knowledge not directly represented in the calendar system. For example, an appointment time and duration may depend on the availability of attendees and the meeting
location. These dependencies are not given explicitly, but may be represented in the
form of patterns.
In this paper, we investigate the use of closed pattern mining to discover user preferences over calendar appointments and to build structured solutions for a calendar
assistant. Traditionally the aim of data mining is to discover association rules, Agrawal
and Srikant [1]. We found, however, that mining association rules is not the most suitable method for applications with real-time user interaction, due to the potentially large
number of frequent patterns and the number of rules that can be generated from each
pattern. In contrast, the number of closed frequent patterns can be an order of magnitude smaller than the number of frequent patterns. In fact, all frequent patterns can be
generated from a complete set of closed frequent patterns. The data mining algorithm
used in this paper is based on the FP-Growth algorithm introduced by Han, Pei and
Yin [6] and implemented by Coenen, Goulbourne and Leng [3]. In order to find closed
frequent patterns, we filter out all non-closed patterns as they are computed by the FP-Growth method. Details of the pattern mining algorithm can be found in Section 3.
Discovered frequent patterns are treated as possibly inconsistent fragments of different solutions that need to be integrated into consistent suggestions before presenting
them to the user. We found that it is best to create solutions only from non-conflicting
patterns, which makes generated solutions less likely to conflict with user preferences.
The method for pattern selection and the support required for pattern generation
were determined from the results of simulated user sessions. The simulation enabled us
to compare the accuracy of our appointment prediction method with the best machine
learning technique, decision tree learning, on realistic calendar data taken from a user’s
diary for a 16 month period. We present the results of the comparison and discuss some
advantages of our pattern mining approach over decision tree learning.
The remainder of this paper is organized as follows. In the next section, we provide the formal framework for the problem of generating structured solutions in the
calendar domain. Section 3 describes our data mining and solution generation method,
which is evaluated and compared with other methods in Section 4. Section 5 contains a
discussion of related research.
2 Formal Problem Statement
This section provides definitions specific to the problem of closed pattern mining for
generating structured solutions in the calendar domain.
Definition 1. Let A = {a1, a2, ..., an} be a set of n attributes used in all appointments. Let each attribute ai have a set of values V(ai) specific to the domain of the attribute. For example, V(day) = {Sunday, Monday, ..., Saturday}. A feature is an attribute-value pair (ai, vij), where vij is an element of V(ai). The set of all features is denoted ℐ.
Definition 2. A data case or case is a nonempty set of features stored in the database of cases, e.g. {(ai1, vi1j1), ..., (aim, vimjm)}. An attribute may appear in a case at most once, and may not occur at all.
For example, a single appointment stored in the calendar database is a data case.
Definition 3. A solution is a potential data case created by the system. A number of
solutions can be selected by the system from a set of solutions and presented to the user
as suggestions for consideration.
Definition 4. A pattern is any part of a data case, a set of features, containing at least
one feature. Solutions/cases may contain more than one pattern.
Definition 5. Two features are overlapping if they have the same attribute.
Definition 6. Two features are conflicting if they have the same attribute with different
values.
Definition 7. Two patterns are conflicting if they contain at least one pair of conflicting
features.
Definition 8. Two patterns are overlapping if they contain overlapping features.
We also call two conflicting patterns inconsistent. It is worth noting that conflicting features/patterns are always overlapping, therefore the “no overlap” condition is stronger
than the “no conflict” condition in the solution generation algorithms below.
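Definitions 5–8 translate directly into code. A minimal sketch, representing a pattern as a dict from attribute to value (our illustrative choice, not the paper's notation), which also exhibits why conflict implies overlap:

```python
def overlapping(p, q):
    """Two patterns overlap if they share at least one attribute (Definition 8)."""
    return bool(set(p) & set(q))

def conflicting(p, q):
    """Two patterns conflict if some shared attribute has different values (Definition 7)."""
    return any(p[a] != q[a] for a in set(p) & set(q))

p1 = {"Category": "Team Meeting", "Period": "Semester"}
p2 = {"Category": "Team Meeting", "Period": "Break"}
p3 = {"Day": "Monday"}

assert conflicting(p1, p2) and overlapping(p1, p2)      # Period differs
assert overlapping(p1, p1) and not conflicting(p1, p1)  # overlap without conflict
assert not overlapping(p1, p3)                          # disjoint attributes
```

Since `conflicting` only inspects shared attributes, any conflicting pair is necessarily overlapping, which is exactly why "no overlap" is the stronger condition.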
The underlying theory of closed patterns is based on Formal Concept Analysis,
Wille [10]. Pasquier et al. [9] extended the theory and introduced the idea of closed
patterns, applying Formal Concept Analysis to data mining. The key terminology of
this theory is summarized below, slightly adjusted for consistency with the above definitions.
Definition 9. A data mining context is a triple D = ⟨𝒪, ℐ, R⟩, where 𝒪 is a set of objects, ℐ is a set of features and R ⊆ 𝒪 × ℐ is a binary relation between objects and features. The fact that object o has feature i is expressed as (o, i) ∈ R.
Definition 10. Let D = ⟨𝒪, ℐ, R⟩ be a data mining context and let O ⊆ 𝒪, I ⊆ ℐ. The functions f and g map the powersets 2^𝒪 → 2^ℐ and 2^ℐ → 2^𝒪 respectively:

f(O) = {i ∈ ℐ | ∀o ∈ O, (o, i) ∈ R}    (1)

g(I) = {o ∈ 𝒪 | ∀i ∈ I, (o, i) ∈ R}    (2)

Less formally, f maps a set of objects into the set of features common to those objects. Similarly, g maps a set of features into the set of objects containing all those features.
Definition 11. The functions h = f ∘ g, i.e. h(I) = f(g(I)), and h′ = g ∘ f, i.e. h′(O) = g(f(O)), are Galois closure operators.
Definition 12. Let I ⊆ ℐ be a set of features. I is a closed pattern iff h(I) = I.
It follows from the last two definitions that a closed pattern is a maximal set of features
common to a given set of objects. We regard each mined closed pattern as an implicit
user preference. This mapping between closed patterns and user preferences proved to
be very useful in data mining for supporting appointment attribute suggestion in the
calendar domain.
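The operators of Definitions 10–12 can be prototyped in a few lines. A minimal sketch over a toy database of our own devising (objects are ids mapped to sets of attribute-value pairs; not the paper's data):

```python
def f(objects, db):
    """Features common to all objects in the given set (Definition 10, eq. 1)."""
    return set.intersection(*(db[o] for o in objects))

def g(features, db):
    """Objects that contain all the given features (Definition 10, eq. 2)."""
    return {o for o, feats in db.items() if features <= feats}

def h(features, db):
    """Galois closure h = f o g (Definition 11)."""
    return f(g(features, db), db)

# Toy database: object id -> set of features
db = {
    1: {("Category", "Team Meeting"), ("Period", "Semester"), ("Time", 1030)},
    2: {("Category", "Team Meeting"), ("Period", "Semester"), ("Time", 1030)},
    3: {("Category", "AI Lecture"), ("Period", "Semester"), ("Time", 1500)},
}

I = {("Category", "Team Meeting")}
print(h(I, db))  # closure also contains Period and Time, so I is not closed
closed = h(I, db)
print(h(closed, db) == closed)  # True: the closure operator is idempotent
```

Here h(I) picks up every feature shared by all objects containing I, matching the reading of a closed pattern as a maximal set of common features.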
3 Pattern Mining for Generating Structured Solutions
This section provides a summary of the closed pattern mining method based on the
FP-Tree algorithm of Han, Pei and Yin [6], and our approach to generating structured
solutions.
3.1 Mining Closed Frequent Patterns
Closed frequent patterns are mined in two steps: 1) build an FP-Tree from the database,
and 2) retrieve frequent patterns from the FP-Tree, filtering out all non-closed patterns.
In the method implemented by Coenen, Goulbourne and Leng [3], frequent patterns
are mined using the FP-Growth algorithm and then stored in a T-Tree structure (Total
Support Tree), which also stores the support calculated for all frequent patterns. In our
implementation, we store only closed frequent patterns in the T-Tree, which provides
fast access to the set of closed frequent patterns. In the first step, an FP-Tree is constructed from the database of past cases using the original FP-Growth method. In the
second step, all closed frequent patterns are extracted from the FP-Tree and stored in a
T-Tree.
In order to filter out non-closed patterns we use the following property, due to
Pasquier et al. [9]: if I is any pattern, then support(I) = support(h(I)). Thus the
support of any pattern is the same as the support of the smallest closed pattern containing it. Therefore any frequent pattern properly contained in the smallest closed pattern
containing it is not a closed pattern. This means we can use the following simple algorithm to filter out non-closed patterns.
Algorithm 1 (Finding closed patterns)

1   T-Tree = {}
2   while not last frequent pattern
3       FrPat = GetFrPatFromFP-Tree()
4       SmallestClosedFrPat = FindSmallestClosedPatContaining(FrPat, T-Tree)
5       if (SmallestClosedFrPat does not exist)
6          or (SmallestClosedFrPat.Support ≠ FrPat.Support)
7           T-Tree = Add(FrPat, T-Tree)
8       end
9   end
10  Output(T-Tree)
The algorithm searches the T-Tree for a smallest closed pattern (line 4) containing the pattern retrieved from the FP-Tree (line 3). If such a pattern is found and has the same support as the retrieved pattern, the retrieved pattern is discarded; otherwise it is stored in the T-Tree (line 7). The original FP-Tree mining algorithm has been modified so that larger patterns are always mined before smaller ones, which enables the above algorithm to discover all closed frequent patterns.
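The filtering step itself is independent of the FP-Tree machinery. A minimal sketch, assuming patterns arrive with their supports and larger patterns are seen before smaller ones (as the modified mining order guarantees); a plain dictionary stands in for the T-Tree:

```python
def filter_closed(patterns):
    """Keep only closed frequent patterns. `patterns` is a sequence of
    (frozenset_of_features, support) pairs, larger patterns first."""
    closed = {}  # pattern -> support; stands in for the T-Tree
    for pat, support in patterns:
        # Smallest already-stored closed pattern properly containing pat
        supersets = [p for p in closed if pat < p]
        smallest = min(supersets, key=len, default=None)
        # pat is non-closed iff such a superset exists with equal support
        if smallest is None or closed[smallest] != support:
            closed[pat] = support
    return closed

mined = [
    (frozenset({"a", "b", "c"}), 2),  # closed
    (frozenset({"a", "b"}), 2),       # same support as its closure {a,b,c}: dropped
    (frozenset({"a"}), 3),            # higher support: closed
]
print(filter_closed(mined))  # only {a,b,c} and {a} survive
```

This mirrors the Pasquier et al. property used above: a pattern with the same support as a closed superset adds no information and is discarded.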
3.2 Generating Solutions
Patterns found in the data mining process are used as building blocks to construct calendar appointment solutions. Individual patterns may complement one another, conflict or
overlap (as defined in Section 2). In order to generate useful suggestions, we aim to efficiently find solutions that make use of as many patterns as possible. The algorithm presented below uses the “no conflict” method for pattern selection (for the “no-overlap”
method, lines 8 and 17 need to be modified).
The following algorithm is not guaranteed to find all possible solutions, though it
has been experimentally verified to provide sufficient time performance and solution
quality. The algorithm first computes the set of all patterns that do not conflict with, but
have at least one common feature with, the initial user features. The algorithm heuristically finds subsets of these patterns jointly consistent with the initial user features; each
such set is heuristically extended to one maximal set of non-conflicting features.
Algorithm 2 (Generating user suggestions)

1   Input: InitFeatures, ClosedPatterns
2   Output: Solns
3   Solns = {}
4   InitSoln = MakeInitSoln(InitFeatures)
5   InitSoln.Patterns = {}
6   PatternList = {}
7   for each Pattern in ClosedPatterns
8       if not Conflicting(Pattern, InitFeatures)
9          and HasCommonFeature(Pattern, InitFeatures)
10          Add(Pattern, PatternList)
11      end
12  end
13  UnusedPatterns = PatternList
14  while UnusedPatterns.Size > 0
15      Soln = InitSoln
16      for each Pattern in UnusedPatterns
17          if not Conflicting(Pattern, Soln.Patterns)
18              Soln = Update(Soln, Pattern)
19              Soln.Patterns = Add(Pattern, Soln.Patterns)
20          end
21      end
22      for each Pattern in PatternList
23          if not Conflicting(Pattern, Soln.Patterns)
24              Soln = Update(Soln, Pattern)
25              Soln.Patterns = Add(Pattern, Soln.Patterns)
26          end
27      end
28      for each Pattern in Soln.Patterns
29          UnusedPatterns = Delete(Pattern, UnusedPatterns)
30      end
31      Solns = Add(Soln, Solns)
32  end
As an example, suppose the initial features are as follows:
Title=”Project Meeting”, Category=”Team Meeting”, Period=”Semester”
Suppose the existing closed frequent patterns are as follows:
P1. Category=”Team Meeting”, Period=”Semester”, AmPm=”am”, Time=1030
P2. Category=”Team Meeting”, Period=”Break”, AmPm=”pm”
P3. Category=”AI Lecture”, Period=”Semester”, AmPm=”pm”, Time=1500
P4. AmPm=”pm”, Day=”Wednesday”, Attendees=”Anna, Alfred, Rita, Wayne”
P5. Period=”Semester”, AmPm=”am”, Time=1030, Day=”Monday”,
Attendees=”Anna, Alfred, Wayne”
P6. Category=”Team Meeting”, Day=”Wednesday”
The initial solution (line 4) is just the initial set of features entered by the user. Since
patterns P2 and P3 conflict with the initial features and P4 has no common features with the initial solution, the PatternList and UnusedPatterns sets (line 13) contain only patterns P1, P5 and
P6. Solutions always start with an initial user solution. A new solution is generated in
lines 16–27. Since the initial solution has no associated patterns, P1 is added and the
solution becomes:
Title=”Project Meeting”, Category=”Team Meeting”, Period=”Semester”, AmPm=”am”,
Time=1030
In the next iteration (lines 16–21), P5 is evaluated and, since it is not conflicting, is also
added to the solution, which becomes:
Title=”Project Meeting”, Category=”Team Meeting”, Period=”Semester”, AmPm=”am”,
Time=1030, Day=”Monday”, Attendees=”Anna, Alfred, Wayne”
Next P6 is evaluated and rejected as conflicting with this solution. The procedure then continues to add patterns from the PatternList set (lines 22–27), but there is nothing new to add at this stage. Therefore this becomes the first solution and UnusedPatterns
is updated. UnusedPatterns is still not empty, so the solution generation iterates again,
this time adding P6 to the initial solution, generating the second solution, as follows:
Title=”Project Meeting”, Category=”Team Meeting”, Period=”Semester”,
Day=”Wednesday”
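The trace above can be reproduced in a few lines. The sketch below is our simplified rendering of the "no conflict" strategy (patterns as dicts; only the main loop of Algorithm 2, without the final re-scan of PatternList; helper names are ours):

```python
def conflicting(p, q):
    """Patterns conflict if a shared attribute has different values (Definition 7)."""
    return any(p[a] != q[a] for a in set(p) & set(q))

def has_common_feature(p, q):
    """Patterns share a feature: the same attribute with the same value."""
    return any(a in q and q[a] == p[a] for a in p)

def generate(init, closed_patterns):
    """Simplified 'no conflict' solution builder."""
    usable = [p for p in closed_patterns
              if not conflicting(p, init) and has_common_feature(p, init)]
    solns, unused = [], list(usable)
    while unused:
        soln, used = dict(init), []
        for p in unused:                  # greedily merge consistent patterns
            if not conflicting(p, soln):
                soln.update(p)
                used.append(p)
        unused = [p for p in unused if p not in used]
        solns.append(soln)
    return solns

init = {"Title": "Project Meeting", "Category": "Team Meeting",
        "Period": "Semester"}
P1 = {"Category": "Team Meeting", "Period": "Semester", "AmPm": "am", "Time": 1030}
P2 = {"Category": "Team Meeting", "Period": "Break", "AmPm": "pm"}
P4 = {"AmPm": "pm", "Day": "Wednesday", "Attendees": "Anna, Alfred, Rita, Wayne"}
P5 = {"Period": "Semester", "AmPm": "am", "Time": 1030,
      "Day": "Monday", "Attendees": "Anna, Alfred, Wayne"}
P6 = {"Category": "Team Meeting", "Day": "Wednesday"}

for s in generate(init, [P1, P2, P4, P5, P6]):
    print(s)
# Two solutions: the first merges P1 and P5 (P2 conflicts with the initial
# features, P4 shares none, P6 conflicts on Day); the second merges only P6.
```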
4 Experimental Evaluation
In this section, we describe our experimental framework for evaluating the closed pattern mining approach for solution generation using simulations over data extracted from
a single user’s calendar. This gave us around 1000 cases of real calendar data (about 16
months of appointments).
4.1 Method
The simulation was conducted in two stages. In the first stage, we compared two methods of appointment solution generation used in conjunction with closed pattern mining:
the “no conflict” and the “no overlap” methods. This confirmed the superiority of the
“no conflict” approach, which is the algorithm presented above. In the second stage,
we compared these results with those generated using decision tree learning, the best
performing machine learning method.
The simulator runs real calendar data through the solution generation system in a
manner resembling interaction with the real user of a calendar system. The calendar
data used for the simulation had 8 attributes: Category, Period, Attendees, Location,
Duration, AmPm, Day and Time. The simulation was conducted as follows. The “user”
(which means “simulated user”) enters case n, which is stored in the database and data
mining on all past cases is performed. Then the “user” enters the first three attributes
of case n + 1 as initial attributes, which are always assumed to be the Category, Period
and Attendees (this is the typical behaviour of actual users based on our informal observation). The system produces a number of suggestions out of which the “user” selects
one closest to case n + 1. Differences are then calculated between the best suggestion
and the real case n + 1. These differences reflect the number of modifications the “user”
needs to make to turn the suggestion into case n + 1. The “user” needs to either add a
missing feature or delete one which is not required, therefore each difference is counted
as 1. These differences are then averaged over a number of data cases. For compatibility with the decision tree learning method, as explained further in this section, the
simulator produces 32 suggestions.
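The difference metric can be stated precisely in code. A sketch under our representation assumptions (appointments as attribute-value dicts; we read a wrong value as one deletion plus one addition, each counted as 1):

```python
def difference(suggestion, actual):
    """Number of single-feature edits (additions or deletions) needed to
    turn the suggested appointment into the actual one."""
    cost = 0
    for a in set(suggestion) | set(actual):
        if a not in suggestion:
            cost += 1      # missing feature: add it
        elif a not in actual:
            cost += 1      # spurious feature: delete it
        elif suggestion[a] != actual[a]:
            cost += 2      # wrong value: delete it, then add the right one
    return cost

actual = {"Day": "Monday", "Time": 1030}
sugg = {"Day": "Monday", "Time": 1500, "Location": "K17"}
print(difference(sugg, actual))  # 3: wrong Time (2) + spurious Location (1)
```

The simulated user then simply picks the suggestion minimising this count against the real case n + 1.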
The machine learning part of the simulation was done using the C4.5 decision tree
algorithm implemented in the Weka toolkit [11], called J48. The method was selected
by testing the performance of a range of machine learning algorithms on the calendar
data for various combinations of attributes. The tested algorithms were rule induction
(OneR), decision tree learning (J48), Bayesian methods (Naive Bayes, BayesNet), k-nearest neighbour (IBk) and case-based reasoning (KStar). The best performing, J48,
was then used on five calendar data sets, each to predict one of the five attributes of case
n + 1 not specified by the “user” (i.e. the Location, Duration, AmPm, Day and Time).
The predicted values were put together to make a set of complete calendar appointments
as in the data mining method. So that the decision tree learning methods could generate
a number of alternative solutions, we had to combine a number of suggested values for
each attribute. This was achieved by modifying the Weka code so that each prediction
consisted of two different values rather than one. For the five attributes to be predicted
this was equivalent to 2^5 = 32 possible solutions for each appointment.
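Combining two candidate values per predicted attribute into complete solutions is a Cartesian product. A sketch with made-up candidate values (the attribute names follow the paper; the values are purely illustrative):

```python
from itertools import product

# Two candidate values per predicted attribute (illustrative values only)
candidates = {
    "Location": ["K17", "Library"],
    "Duration": [30, 60],
    "AmPm": ["am", "pm"],
    "Day": ["Monday", "Wednesday"],
    "Time": [1030, 1500],
}

solutions = [dict(zip(candidates, combo))
             for combo in product(*candidates.values())]
print(len(solutions))  # 2**5 = 32 candidate appointments
```

Note that the product is taken per attribute independently, which is precisely why some of the 32 combinations can contain mutually inconsistent values (e.g. AmPm="am" with Time=1500), as discussed in the results below.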
4.2 Results
We first present the results comparing the “no conflict” and “no overlap” methods used
with closed pattern mining, shown in Figure 1.
The difference between the two methods shows that the “no conflict” method produces significantly better results. This can be explained by the way solutions are created.
It is generally easier to find overlapping (but non-conflicting) patterns for a solution than strictly non-overlapping ones, hence the "no conflict" method creates a greater number and variety of solutions.
One of our objectives in evaluating machine learning methods for generating solutions was to compare our system with CAP, Dent et al. [4]. Although a direct comparison was not possible, the method (ID3 for CAP) and overall results are similar. The
accuracy achieved by decision tree learning on our data set is shown in Figure 2. These
results broadly confirm those reported in the original experiments with CAP, e.g. accuracy for location is close to 70% after around 150 cases. Note that our experiments
compensate for a deficiency in the experimental setup with CAP in that a parameter specific to the time period (e.g. semester, break) is included, which means the average accuracy fluctuates much less than in the original CAP evaluation.

Fig. 1. Average accuracy of appointment prediction for two methods: "no conflict" (thick line) and "no overlap" (normal line).

Fig. 2. Decision tree learning prediction results for calendar appointments. The thick line shows the overall average accuracy of appointment prediction, the continuous line shows the average appointment date prediction, and the dotted line shows the number of inconsistent values in the predicted data (AmPm and Time).

However, our results also show a greater fluctuation of average accuracy for decision tree learning than
with closed pattern mining. We suspect that this could be because at certain points in
the simulation, the decision tree is restructured, resulting in a loss of accuracy, whereas
the pattern mining approach produces smoother behaviour over time. Additionally, unlike the pattern mining method, where values within one case are created from nonconflicting patterns, the decision tree learning method predicts values separately for
each attribute. In consequence, it is possible that some associated attributes may conflict, e.g. AmPm=am and Time=1500. In effect, the system has to choose randomly
between the two options to resolve the conflict, meaning that the date suggestion is often wrong (our simulation arbitrarily chooses the value of Time to resolve the conflict,
and the date is determined from the next free time slot satisfying the chosen solution).
The chart in Figure 2 provides some confirmation of this explanation, where a low average date prediction accuracy corresponds roughly to a high number of inconsistencies
between AmPm and Time.
Comparison of Figure 2 with Figure 1 shows that, although the average accuracy
of prediction is similar for the two methods (closed pattern mining 69%, decision tree
learning 68%), closed pattern mining gives significantly better prediction in the first 200
cases. More specifically, the closed pattern mining method reaches its average accuracy
after only 62 cases, whereas the decision tree learning method reaches its average after
224 cases, staying about 10 percent lower in the first 200 cases. This is an important
difference for interactive calendar users, who would clearly prefer useful suggestions in
a shorter period of time (this corresponds to roughly 1 month for closed pattern mining
vs. 3 months for decision tree learning). Moreover, decision tree learning prediction is
less stable, showing greater sensitivity to user preference changes in transition periods.
5 Related Work
As far as we know, there are no calendar applications supported by pattern mining; however, there are examples of research where some kind of machine learning has been
applied. As described above, the CAP system, Dent et al. [4], provides suggestions for
various appointment attributes. Two methods for predicting attributes were compared:
backpropagation neural networks and ID3. Their results showed that for the Location
attribute, around 70% accuracy was achieved by both learning methods after sufficient
training. As described above, our experiments broadly confirm this result in the case
of decision tree learning, though over the whole set of predicted attributes (not only
Location). As also mentioned above, CAP predicts each attribute of the appointment
separately, which may result in inconsistent appointment solutions when these predictions are combined.
Another preference learning calendar assistant is described by Berry et al. [2]. Their
system, PCalM, is a framework designed to schedule meetings in the open calendar environment. Instead of learning to predict individual appointment attributes, as in CAP,
PCalM learns to rank candidate appointments from pairwise selections provided by
the user. Unlike our calendar system, designed to build and present suggestions unobtrusively, PCalM forces the user to choose amongst suggestions in order to generate
training data for learning the preference function. Furthermore, similar to our method,
PCalM has been evaluated using simulated user interactions; however, the data used in the PCalM evaluation is synthetically generated, while we have used appointments from
a user’s real calendar, providing a more realistic data set for experimentation.
6 Conclusion
We have designed and evaluated a structured solution builder with data mining support
for generating suggestions for calendar appointments. Closed patterns proved to be a
suitable alternative to association rules due to their compactness and flexibility. Moreover, pattern mining has an advantage over single class machine learning methods in
that it better supports creating multiple solutions with consistent structures. We simulated user interaction with real calendar data to configure and tune data mining and
appointment solution generation methods. Our results show the superiority of closed
pattern mining to decision tree learning, the best performing machine learning algorithm in this domain.
We believe that concept-based data mining for building structured solutions can be
applied to other configuration domains. Due to the fact that cases are added to the
system incrementally, it might be possible to use incremental data mining methods in
conjunction with the FP-Growth algorithm, similar to the approaches of Ezeife and
Su [5] and Koh and Shieh [7].
Acknowledgments. This work was funded by the CRC for Smart Internet Technology. We would like to thank Paul Compton for making his calendar data available
for research and Frans Coenen for his open source implementation of the FP-Growth
algorithm.
References
1. Agrawal, R. and Srikant, R. Fast Algorithms for Mining Association Rules. In Proceedings
of the 20th Conference on Very Large Data Bases, pp. 478–499, 1994.
2. Berry, P. M., Gervasio, M., Uribe, T., Myers, K. and Nitz, K. A Personalized Calendar
Assistant. In Proceedings of the AAAI Spring Symposium on Interaction between Humans
and Autonomous Systems over Extended Operation, 2004.
3. Coenen, F., Goulbourne, G. and Leng, P. Tree Structures for Mining Association Rules. Data
Mining and Knowledge Discovery, vol. 8, pp. 25–51, 2004.
4. Dent, L., Boticario, J., Mitchell, T. M. and Zabowski, D. A. A Personal Learning Apprentice.
In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92), pp.
96–103, 1992.
5. Ezeife, C. I. and Su, Y. Mining Incremental Association Rules with Generalized FP-Tree.
In Cohen, R. and Spencer, B., editors, Advances in Artificial Intelligence, pp. 147–160,
Springer-Verlag, Berlin, 2002.
6. Han, J., Pei, J. and Yin, Y. Mining Frequent Patterns without Candidate Generation. In
Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data,
pp. 1–12, 2000.
7. Koh, J.-L. and Shieh, S.-F. An Efficient Approach for Maintaining Association Rules Based
on Adjusting FP-Tree Structures. In Lee, Y., Li, J., Whang, K.-Y. and Lee, D., editors,
Database Systems for Advanced Applications, pp. 417–424, Springer-Verlag, Berlin, 2004.
8. McDermott, J. R1: A Computer-Based Configurer of Computer Systems. Artificial Intelligence, vol. 19, pp. 39–88, 1982.
9. Pasquier, N., Bastide, Y., Taouil, R. and Lakhal, L. Efficient Mining of Association Rules
Using Closed Itemset Lattices. Information Systems, vol. 24, pp. 25–46, 1999.
10. Wille, R. Formal Concept Analysis as Mathematical Theory of Concepts and Concept Hierarchies. In Ganter, B., Stumme, G. and Wille, R., editors, Formal Concept Analysis, pp.
1–23, Springer-Verlag, Berlin, 2005.
11. Witten, I. H. and Frank, E. Data Mining. Morgan Kaufmann, San Francisco, CA, 2005.