A LogAnalyzer – Log Analyzer
A.1 Basic Information
Personalization is becoming more important as the amount of available content grows
and we want to keep work with information effective. Systems are becoming adaptive
by taking into account the characteristics of their users.
LogAnalyzer supports personalization of web-based systems by discovering user
characteristics. It processes the user activity log acquired by the SemanticLog tool,
identifies meaningful user characteristics and updates the user model to reflect the
newly gained knowledge. It focuses on the evaluation of user navigation in the
available information space.
A.1.1 Basic Terms
User characteristic – any piece of information concerning the user which could
be used to provide personalization (e.g., goals, interests, knowledge).
URI – Uniform Resource Identifier, a compact string of characters used to identify
or name a resource.
A.1.2 Method Description
LogAnalyzer uses a rule-based approach to the analysis of user logs. The analysis
process is depicted in Fig. 1. Externally defined rules represent heuristics that link
interesting navigational patterns with sets of changes to be applied to the user model
when an occurrence of the pattern is detected.
[Figure 1 (diagram). Labels: LogAnalysis; User actions; Pattern detection; Intermediate results; Domain model references; Rules (Patterns, Changes); User model update; User model; step numbers 1-4.]
Fig. 1. Overview of user characteristics acquisition process. Data from
presentation tools and client-side logging are stored in a database of user actions.
LogAnalyzer tries to detect (1) occurrences of predefined patterns and optionally
stores intermediate results (2). The heuristics associated with the detected pattern
(3) predict the update of characteristics stored in a user model (4).
The main benefits of this approach are its reusability and flexibility: the behavior of
the process can be changed simply by changing the underlying heuristics.
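The idea of a rule as "pattern plus changes" can be illustrated with a minimal Java sketch. It is not the tool's implementation: the classes SketchRule and RuleEngineSketch, the event type names and the numeric user model are all assumptions made for illustration, and patterns are matched only as contiguous runs of event types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical rule: a pattern of event types plus one change to the user model. */
final class SketchRule {
    final List<String> pattern;      // event types that must occur in this order
    final String characteristic;     // characteristic to update when the pattern is found
    final double delta;              // amount added to the characteristic's value

    SketchRule(List<String> pattern, String characteristic, double delta) {
        this.pattern = pattern;
        this.characteristic = characteristic;
        this.delta = delta;
    }
}

final class RuleEngineSketch {
    /** Counts non-overlapping contiguous occurrences of the pattern in the event list. */
    static int countOccurrences(List<String> events, List<String> pattern) {
        if (pattern.isEmpty()) {
            return 0;
        }
        int count = 0;
        int i = 0;
        while (i + pattern.size() <= events.size()) {
            if (events.subList(i, i + pattern.size()).equals(pattern)) {
                count++;
                i += pattern.size();   // continue after the matched occurrence
            } else {
                i++;
            }
        }
        return count;
    }

    /** Applies each rule's change once per detected occurrence of its pattern. */
    static Map<String, Double> analyze(List<String> events, List<SketchRule> rules) {
        Map<String, Double> userModel = new HashMap<String, Double>();
        for (SketchRule rule : rules) {
            int hits = countOccurrences(events, rule.pattern);
            if (hits > 0) {
                Double old = userModel.get(rule.characteristic);
                double base = (old == null) ? 0.0 : old.doubleValue();
                userModel.put(rule.characteristic, base + hits * rule.delta);
            }
        }
        return userModel;
    }

    public static void main(String[] args) {
        List<String> events =
            Arrays.asList("search", "showDetail", "search", "showDetail", "logout");
        List<SketchRule> rules = new ArrayList<SketchRule>();
        rules.add(new SketchRule(Arrays.asList("search", "showDetail"), "interest", 0.1));
        System.out.println(analyze(events, rules));
    }
}
```

Swapping the list of rules changes what is detected and what is updated without touching the engine code, which is the flexibility the approach aims for.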
A.1.3 Scenarios of Use
LogAnalyzer can be used in the following scenarios:
• The navigational model of a web-based application allows individual users to
navigate freely in the information space (so user behavior can be influenced by
user characteristics).
• The presentation layer of a web application is able to adapt content and/or
navigation according to characteristics stored in a user model.
LogAnalyzer should not be used in the following cases:
• Unsuitable navigational model (e.g., strictly sequential without the possibility to
change the order of displayed pages).
External Links and Publications
• Tvarožek, M., Barla, M., Bieliková, M.: Personalized Presentation in Web-based
Information Systems. In: van Leeuwen, J. et al. (Eds.): SOFSEM 2007, Springer,
LNCS 4362, Harrachov, ČR, pp. 796-807, 2007.
• Tvarožek, M., Barla, M., Bieliková, M.: Personalized Recommendation of
Browsing in Large Information Spaces with Semantics. In: Sobecki, J. (Ed.):
Special Issue of New Generation Computing – Web-based Recommendation
Systems Technologies and Applications, Vol. 26, No. 3, May 2008, submitted.
• Commons Configuration, Java configuration API, Apache Software Foundation
(http://commons.apache.org/configuration/)
• Hibernate, Relational Persistence for Java, Red Hat Middleware, LLC
(http://www.hibernate.org/)
• Log4J, Java-based logging utility, Apache Software Foundation
(http://logging.apache.org/log4j)
• ITG, integration technology developed within the NAZOU project
• OntoCM, Java-based connector to the ontological part of corporate memory,
developed within the NAZOU project
• SemanticLog, Java-based logging service developed within the NAZOU project
• UserLogs, Java-based library providing an object-oriented representation of user
action records
A.2 Integration Manual
LogAnalyzer is developed in Java (Standard Edition 5) and distributed as a jar archive.
Its functionality is accessed through a Java interface. LogAnalyzer is not a stand-alone
application; it is intended to be included in another application or tool (such as the
SemanticLog tool), which calls the LogAnalyzer interface methods.
A.2.1 Dependencies
LogAnalyzer uses these external tools and libraries:
• Commons Configuration – for processing xml-based configuration files.
• Hibernate – for accessing log records and storing intermediate results.
• OntoCM – for accessing and updating the ontology-based user model.
• ITG – for accessing configuration files stored in a common folder.
• UserLogs
• Log4J logging utility
A.2.2 Installation
Deploying LogAnalyzer into another application requires the following steps (any Java
Integrated Development Environment can be used):
1. LogAnalyzer as well as all external jar archives must be included in the existing
project.
2. The hibernate.cfg.xml and log4j.properties files must be included in the classpath.
3. The files LogAnalyzer.properties, UserModelUpdater.properties,
UserCharacteristics.properties, OntoMem.properties and the file containing the set of
rules for LogAnalyzer must be included in the LogAnalyzer directory in the
directory holding common configuration files (see ITG for further details).
4. Integration technology ITG must be deployed correctly, i.e., the
bootstrap.properties file must be placed in the classpath.
5. The database Nazou-UserLogs of the SemanticLog tool should be updated by the
createTables_LogAnalyzer.sql script.
A.2.3 Configuration
Several configuration files need to be set up for LogAnalyzer to work properly:
LogAnalyzer.properties
• startTime – contains the date and time when LogAnalyzer was last invoked.
• rules – points to an xml configuration file containing the rules LogAnalyzer
should work with.
• configDigest – contains the MD5 digest of the rules configuration file. If the set of
LogAnalyzer's rules changes, the tool discards all intermediate results currently
held in the relational database.
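A change check based on configDigest could look like the following Java sketch. It only illustrates the idea: the class ConfigDigestSketch and its methods are hypothetical, and storing the digest as a lowercase hexadecimal string is an assumption, since the manual does not state the encoding.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ConfigDigestSketch {
    /** Returns the MD5 digest of the given bytes as a lowercase hex string. */
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Not expected with a standard JDK, which ships an MD5 implementation.
            throw new IllegalStateException(e);
        }
    }

    /** True if the rules file content no longer matches the stored digest. */
    static boolean rulesChanged(byte[] rulesFileContent, String storedDigest) {
        return !md5Hex(rulesFileContent).equalsIgnoreCase(storedDigest);
    }
}
```

When rulesChanged returns true, a caller would drop the persisted intermediate results and store the new digest, because partially matched patterns from the old rule set may no longer be meaningful.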
UserCharacteristics.properties
Each property is a type of user characteristic identified by its URI while its value is
a full class name (including the package name) of a class which provides the tool with
access to the characteristics of the given type.
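The mapping from characteristic URI to provider class can be sketched in Java as follows. The interface CharacteristicProvider, the class InterestProvider and the example.org URI are hypothetical stand-ins, not the tool's real types. One detail worth noting is standard java.util.Properties behavior: a ':' inside a key must be escaped as "\:", otherwise the key ends at the colon of "http:".

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

final class ProviderLookupSketch {
    /** Hypothetical stand-in for the tool's provider interface. */
    interface CharacteristicProvider {
        String describe();
    }

    /** Hypothetical provider for an "interest" characteristic. */
    static final class InterestProvider implements CharacteristicProvider {
        public InterestProvider() { }

        public String describe() {
            return "interest provider";
        }
    }

    /** Parses properties text; ':' inside a URI key must be escaped as "\:". */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    /** Instantiates the provider class mapped to the given characteristic URI. */
    static CharacteristicProvider providerFor(Properties props, String characteristicUri) {
        String className = props.getProperty(characteristicUri);
        if (className == null) {
            throw new IllegalArgumentException("No provider configured for " + characteristicUri);
        }
        try {
            Class<?> cls = Class.forName(className);
            return (CharacteristicProvider) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create provider " + className, e);
        }
    }

    public static void main(String[] args) {
        // One line of a UserCharacteristics.properties-like file (URI is made up).
        String text = "http\\://example.org/um#Interest="
            + InterestProvider.class.getName() + "\n";
        Properties props = parse(text);
        System.out.println(providerFor(props, "http://example.org/um#Interest").describe());
    }
}
```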
UserModelUpdater.properties
• changeCharacteristicsFromOtherTools – a boolean value that determines whether the
tool is allowed to change characteristics whose source is not LogAnalyzer.
• sourceProperty – URI of a property which connects a characteristic with its
source.
• sourceInstance – URI of an instance representing LogAnalyzer as a source of user
characteristics.
• predicateToDomainDepUser – URI of a property linking a domain independent
instance to the domain dependent user.
• domainIndepNamespace – namespace of the domain independent part of the user model.
• domainDepNamespace – namespace of the domain dependent part of the user model.
• countOfUpdatesProperty – URI of a property linking a characteristic with its count
of updates.
• timeStampProperty – URI of a property linking a characteristic with the timestamp
of its last update.
• confidenceProperty – URI of a property linking a characteristic with its confidence
value.
• relevanceProperty – URI of a property linking a characteristic with its relevance
value.
• characteristicConnectionToDomainDepUser – URI of a property linking a domain
dependent user instance to a characteristic instance.
OntoMem.properties
Contains the configuration the OntoCM tool uses to access an ontological repository.
It can be left blank if the tool should use the global configuration from
Nazoucommons.properties.
Rule configuration file
An xml-based configuration file contains the root element <rules>, which encapsulates
<rule> elements, each having an id attribute (URI). Each rule has two children:
<sequence> and <consequences> elements.
• <sequence> element represents a sequence of events and sub-sequences in a
pattern. It can define contextual conditions in a child <context> element. A sequence
can also have <sequence> and <event> child elements and has the following
attributes:
    o id – serves for cross-referencing purposes within the configuration file and
    internal processing; it should be unique throughout the whole configuration file.
    o count-of-occurrence – an integer value defining the required count of
    occurrences of a sequence within a pattern. A value of -1 means the sequence
    is optional (i.e., not strictly required for the pattern to be found in the log).
    o isContinuous – a boolean value defining whether a sequence is continuous, i.e.,
    whether the events of the sequence must strictly follow each other.
• <context> element (as a child of a sequence element) is used to restrict events
mapped to the sequence to those fulfilling the restriction on an attribute of a certain
type. The displayed item is specified in the <typeOfDisplayedItem> child element and
the type of its attribute in the <attribute> child element.
• <event> element represents a single event in the event log. An event can define
several contextual conditions represented by multiple <context> child elements.
The event element has the following attributes:
    o id – serves for cross-referencing purposes within the configuration file and
    internal processing; it should be unique throughout the whole configuration file.
    o type – represents the type of an event as it is stored in the log database
    (typically identified by its URI).
• <context> element (as a child of an event element) is used to express the relation
of the current event to the previous one by the value of some attribute. It has an
attribute type, which can be either http://fiit.stuba.sk/loganalyzer#SameAsPrevious or
http://fiit.stuba.sk/loganalyzer#DifferentThanPrevious. An <attribute> child element
defines the type of the event attribute to which the contextual condition is applied.
• <consequences> element wraps multiple <change> elements, each representing a
change of one characteristic. Each change relates to one type of characteristic,
which is expressed by one <characteristic> sub-element whose name attribute defines
the type of characteristic. A <change> element can also have multiple <property>
child elements concerning properties of a characteristic. Each property has a name
attribute which contains the URI of the property in the ontology used. A property can
be of three different types, distinguished by a type attribute:
    o Referencing property – whose value is a value of an event or displayed item
    attribute(s)
    o Used property – whose value is stated directly in the configuration file
    o Processed property – whose value is computed using an instruction present in
    the configuration file
• <property> element can have several different attributes and child elements
according to its type. See javadoc for further details.
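To make the nesting of these elements concrete, the following Java sketch parses a small hand-written rule file with the JDK's DOM parser. The sample is an assumption built only from the description above: all example.org URIs and id values are made up, placing the attribute type as text content of <attribute> is a guess, and <property> elements are left out because the manual does not list the literal values of their type attribute.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class RuleFileSketch {
    // Hypothetical rule: two consecutive "ShowDetail" events sharing a category.
    static final String SAMPLE =
        "<rules>"
      + "<rule id='http://example.org/rules#RepeatedDetail'>"
      + "<sequence id='s1' count-of-occurrence='2' isContinuous='true'>"
      + "<event id='e1' type='http://example.org/events#ShowDetail'>"
      + "<context type='http://fiit.stuba.sk/loganalyzer#SameAsPrevious'>"
      + "<attribute>http://example.org/attributes#Category</attribute>"
      + "</context>"
      + "</event>"
      + "</sequence>"
      + "<consequences>"
      + "<change>"
      + "<characteristic name='http://example.org/um#Interest'/>"
      + "</change>"
      + "</consequences>"
      + "</rule>"
      + "</rules>";

    /** Parses the XML text into a DOM document. */
    static Document parse(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse rule file", e);
        }
    }

    /** Describes the first rule: its id and how many elements of each kind it holds. */
    static String summarize(Document doc) {
        NodeList rules = doc.getElementsByTagName("rule");
        if (rules.getLength() == 0) {
            return "no rules";
        }
        Element rule = (Element) rules.item(0);
        return rule.getAttribute("id")
            + " sequences=" + rule.getElementsByTagName("sequence").getLength()
            + " events=" + rule.getElementsByTagName("event").getLength()
            + " changes=" + rule.getElementsByTagName("change").getLength();
    }

    public static void main(String[] args) {
        System.out.println(summarize(parse(SAMPLE)));
    }
}
```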
A.2.4 Integration Guide
LogAnalyzer is invoked when events stored in user logs need to be processed. This is
done by calling the signal method of the ILogAnalyzer interface. When called, the tool
processes all events with a timestamp between the time of the last run of LogAnalyzer
and the current time. The time of the last run is maintained in the startTime property
of the LogAnalyzer.properties file and can be changed manually if needed.
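The time window processed by one invocation can be sketched as follows. This is only an illustration: SignalWindowSketch and LoggedEvent are hypothetical classes, the real signature of ILogAnalyzer.signal is not given in this manual, and treating the window as exclusive of the last run time and inclusive of the current time is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class SignalWindowSketch {
    /** Hypothetical event record; only the timestamp (epoch millis) matters here. */
    static final class LoggedEvent {
        final String type;
        final long timestamp;

        LoggedEvent(String type, long timestamp) {
            this.type = type;
            this.timestamp = timestamp;
        }
    }

    private long startTime;   // plays the role of the startTime property

    SignalWindowSketch(long startTime) {
        this.startTime = startTime;
    }

    /** Returns events logged after the last run and up to 'now', then advances startTime. */
    List<LoggedEvent> signal(List<LoggedEvent> log, long now) {
        List<LoggedEvent> toProcess = new ArrayList<LoggedEvent>();
        for (LoggedEvent e : log) {
            if (e.timestamp > startTime && e.timestamp <= now) {
                toProcess.add(e);
            }
        }
        startTime = now;   // the next run continues where this one ended
        return toProcess;
    }

    long getStartTime() {
        return startTime;
    }
}
```

Setting startTime back to an earlier value makes the next call pick up older events again, which corresponds to editing the property manually.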
A.3 Development Manual
A.3.1 Tool Structure
LogAnalyzer consists of the following packages (structure and dependencies of the
packages are displayed in Fig. 2):
• sk.fiit.nazou.loganalyzer – the main package, which contains core files.
• sk.fiit.nazou.loganalyzer.exception – defines exceptions thrown by the tool.
• sk.fiit.nazou.loganalyzer.instances – contains classes that represent
intermediate results which can be persisted in a relational database.
• sk.fiit.nazou.loganalyzer.kb – contains classes that represent the object model of
the rules the tool should work with.
• sk.fiit.nazou.loganalyzer.model – contains classes for an object-oriented
representation of the user model (used for open user modeling purposes).
• sk.fiit.nazou.loganalyzer.modelprovider – defines the interface of a user model
provider and contains an implementation of the NAZOU user model provider (used for
open user modeling purposes).
• sk.fiit.nazou.loganalyzer.updateStrategies – defines an interface and an abstract
strategy for the update of a processed property, as well as implementations of three
different update strategies.
• sk.fiit.nazou.loganalyzer.usercharacteristicprovider – defines the interface of a
user characteristic provider and a factory for providing specific implementations of
this interface.
• sk.fiit.nazou.loganalyzer.usercharacteristicprovider.nazou – contains
implementations of user characteristic providers for characteristics used in the
NAZOU project.
• sk.fiit.nazou.loganalyzer.util – encapsulates classes providing supporting
functionality.
[Figure 2 (package diagram). Packages shown: LogAnalyzer, kb, instances, modelprovider, usercharacteristicprovider, updatestrategies, usercharacteristicprovider.nazou, model.]
Fig. 2. LogAnalyzer packages – structure and dependencies.
A.3.2 Method Implementation
Fig. 3 depicts the processing of events, i.e., the pattern detection performed by the
LogAnalyzer tool. After a pattern is found, each change in the consequence part of the
rule is executed. This means that the tool retrieves the user characteristic determined
by the referencing and used properties (if such a characteristic does not exist yet, it is
created) and updates all processed properties according to a given strategy, as well as
additional metadata about the characteristic such as its timestamp or count of updates.
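The retrieve-or-create step and the metadata bookkeeping can be sketched in Java as follows. The names are hypothetical and the user model is reduced to an in-memory map; the string key stands for whatever combination of referencing and used property values identifies a characteristic, and simple addition stands in for the configured update strategy.

```java
import java.util.HashMap;
import java.util.Map;

final class CharacteristicUpdateSketch {
    /** Hypothetical in-memory characteristic with the metadata named in the text. */
    static final class Characteristic {
        double confidence;      // stands for a processed property
        long timeStamp;         // time of the last update
        int countOfUpdates;     // number of updates so far
    }

    private final Map<String, Characteristic> userModel =
        new HashMap<String, Characteristic>();

    /** Finds the characteristic by key (creating it on first use) and applies one update. */
    Characteristic applyChange(String key, double confidenceDelta, long now) {
        Characteristic c = userModel.get(key);
        if (c == null) {
            c = new Characteristic();   // does not exist yet, so it is created
            userModel.put(key, c);
        }
        c.confidence += confidenceDelta;   // assumed strategy: plain addition
        c.timeStamp = now;
        c.countOfUpdates++;
        return c;
    }
}
```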
[Figure 3 (sequence diagram). Participants: :client, :LogAnalyzer, :KBProcessor, :PatternDetector, instances:RuleInstance, :UserModelUpdater. Messages: signal, processProperties, processRules, rules, retrieveEventsToProcess, processEvents, findApplicableRuleInstances, apply(event), computeNextExpectedEvent, [pattern is found] updateUM.]
Fig. 3. Sequence diagram – pattern detection.
A.3.3 Enhancements and Optimizing
LogAnalyzer is implemented as a singleton, which ensures that the rule configuration
file is loaded only once, when the JVM loads the LogAnalyzer class. At present,
LogAnalyzer has not been optimized for fast response times, as it is not a part of the
user interface. The primary performance bottleneck is the ontological repository Sesame.
Future enhancements could introduce meta-rules which would dynamically change the
working set of rules, or dynamic rules whose parameters would be deduced
automatically from the actual domain context.
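The load-once behavior of such a singleton can be sketched in Java as follows. SingletonSketch is a hypothetical class, not the LogAnalyzer source, and the counter merely stands in for parsing the rule configuration file.

```java
final class SingletonSketch {
    // Declared before INSTANCE so that it is initialized first.
    private static int loadCount = 0;

    // Created exactly once, when the JVM initializes this class.
    private static final SingletonSketch INSTANCE = new SingletonSketch();

    private SingletonSketch() {
        loadCount++;   // stands in for loading the rule configuration file
    }

    static SingletonSketch getInstance() {
        return INSTANCE;
    }

    static int getLoadCount() {
        return loadCount;
    }
}
```

Because the instance is created in a static initializer, the JVM's class initialization guarantees that the constructor runs once even if several threads request the instance at the same time.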
A.4 Manual for Adaptation to Other Domains
The rule-based method of log analysis implemented by LogAnalyzer does not depend on
the domain used, but rather on the navigation model. However, different domains may
call for different types of characteristics derived from different user actions, which
would require changing the rules LogAnalyzer works with.
A.4.1 Configuring to Other Domain
If another structure of the user model is used, it is necessary to provide specific
implementations of characteristic providers, implementing the methods defined in
sk.fiit.nazou.loganalyzer.usercharacteristicprovider.IUserCharacteristicProvider,
and to provide a mapping from characteristics to these providers in
UserCharacteristics.properties.
Update strategies that are more suitable for a particular domain can be implemented
using the sk.fiit.nazou.loganalyzer.updateStrategies.UpdateStrategy interface,
optionally deriving them from AbstractUpdateStrategy from the same package.
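A domain-specific update strategy might be structured like the Java sketch below. The real UpdateStrategy and AbstractUpdateStrategy signatures are not shown in this manual, so Strategy and ClampingStrategy here are hypothetical local stand-ins, and exponential smoothing is just one possible strategy chosen for illustration.

```java
final class UpdateStrategySketch {
    /** Hypothetical stand-in for the tool's update strategy interface. */
    interface Strategy {
        double update(double oldValue, double observedValue);
    }

    /** Shared base class that keeps every result within the range [0, 1]. */
    abstract static class ClampingStrategy implements Strategy {
        public final double update(double oldValue, double observedValue) {
            double raw = compute(oldValue, observedValue);
            return Math.max(0.0, Math.min(1.0, raw));
        }

        protected abstract double compute(double oldValue, double observedValue);
    }

    /** Exponential smoothing: moves the old value a fraction alpha toward the observation. */
    static final class SmoothingStrategy extends ClampingStrategy {
        private final double alpha;

        SmoothingStrategy(double alpha) {
            this.alpha = alpha;
        }

        protected double compute(double oldValue, double observedValue) {
            return oldValue + alpha * (observedValue - oldValue);
        }
    }
}
```

Putting the range check in the base class means each concrete strategy only has to supply its own formula, which is the usual reason for pairing an interface with an abstract implementation.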
A.4.2 Dependencies
Most dependencies are domain independent and thus require no domain-specific
adjustments. The UserLogs package is derived from the user log database filled by the
SemanticLog tool. This data structure was designed to be largely independent of the
domain and navigation model actually used.