Download Webocrat - People(dot)tuke(dot)sk

Knowledge Technologies and their Applications
Ján Paralič, Tomáš Sabol, Marián Mach, Karol Furdík, Peter Bednár and others
Knowledge Technologies Group
Technical University of Košice, Slovakia

Overview of the presentation
• IT technologies for support of knowledge management (KM)
– Knowledge modelling, ontologies
– Knowledge discovery
• Some outcomes of our projects
– KDD Package – Knowledge discovery in databases
– Webocrat system
– JBowl – Java-based library for support of text mining and retrieval

KM tools
• Support for preservation of existing knowledge
– organizational memories (KnowWeb, Webocrat)
– knowledge modeling, ontologies (KnowWeb, Webocrat, OntoServer)
• Support for dissemination of existing knowledge
– various communication channels (Webocrat)
– ontologies (KnowWeb, Webocrat, OntoServer)
• Support for creation of new knowledge
– knowledge discovery in databases (KDD Package)
– knowledge discovery in texts (JBowl)

Some project outcomes
• KnowWeb toolkit
– “Web in Support of Knowledge Management in Company (KnowWeb)”, FP4, Esprit Project 29065, 1998 - 2000
• CEDAR toolkit
– “Enriching Representations of Work to Support Organisational Learning (ENRICH)”, FP4 Project 29015, 1998 - 2000
• KDD Package
– “Geographical On-line Analysis, GIS – Data warehouse integration (GOAL)”, INCO Copernicus project 977091, 1998 – 2001
• Webocrat and OntoServer
– “Web Technologies Supporting Direct Participation in Democratic Processes (Webocracy)”, FP5 IST-1999-20364, 2000-2003
• JBowl
– “Document classification and annotation for the Semantic web”, Slovak Grant Agency project Nr. 1/1060/04, 2004-2006 and the previous one

Knowledge discovery in databases
Data mining: the core of the knowledge discovery process
1. Data Cleaning
2. Data Integration
3. Data selection
4. Data transformation
5. Data mining
6. Results evaluation
7. Result interpretation and use
[Figure: process diagram; data stages shown, in order: Databases, Data Warehouse, Task-relevant Data, Discovered patterns]

KDD Package
• KDD package serves as
– a special module within the GOAL project
– an open stand-alone application for complete support of the knowledge discovery process
• Based on pilot applications, the following data mining functionality has been implemented:
– Classification/Description
  • Rule induction module (CN2, RISE and various combining strategies)
  • Classification tree induction module (C4.5)
– Prediction
  • Regression-based (linear regression, regression trees, model trees – M5’)
  • Case-based reasoning (k-nearest neighbours with weights optimisation by means of genetic algorithm)
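
The case-based reasoning entry above combines k-nearest neighbours with attribute weights tuned by a genetic algorithm. The prediction step can be sketched roughly as follows. This is not code from the KDD Package: the function name `weighted_knn_predict`, the toy case base and the fixed weight vector are assumptions made for illustration, and the genetic search that would tune the weights is left out.

```python
import math

def weighted_knn_predict(train, query, weights, k=3):
    """Predict a numeric target as the mean of the k nearest cases.

    train   -- list of (features, target) pairs (the case base)
    query   -- feature list of the new case
    weights -- one non-negative weight per attribute; a genetic algorithm
               would tune these, here they are simply given (assumption)
    """
    def distance(a, b):
        # Weighted Euclidean distance: heavier attributes matter more.
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

    nearest = sorted(train, key=lambda case: distance(case[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Hypothetical case base: (attribute values, observed target)
cases = [
    ([1.0, 10.0], 2.0),
    ([2.0, 80.0], 4.0),
    ([3.0, 20.0], 6.0),
    ([9.0, 15.0], 18.0),
]

# With weight 0 on the second attribute, only the first one drives similarity.
prediction = weighted_knn_predict(cases, [2.2, 50.0], weights=[1.0, 0.0], k=2)
```

A weight optimiser would call such a function repeatedly with candidate weight vectors and keep those that give the lowest prediction error on held-out cases.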

Architecture of the KDD Package
[Figure: architecture of the KDD Package. Labelled components: Data access (DB, DWS, text file); Visual tools for data pre-processing; Modules for data preprocessing; DM Modules (Classification / Description, Prediction, . . .); Visualization of discovered patterns]

Webocrat system
• Webocrat is a modular web-based system that is capable of improving communication between LG & citizens, increasing accessibility of LG services, providing new types of services and increasing efficiency of LG
• Some of the Webocrat advantages:
– Multi-channel communication tool with integrated ontology
– Open and modular system, platform independent
– Intelligent retrieval and access mechanism
– User-dependent view of ontology
– Customisable user interface, personalised services
– Security and role management
– Log management, Calendar module
– Strong multilinguality (down to the content level)

Webocrat – overall architecture
[Figure: overall architecture of Webocrat. Labels: Users, Systems Settings; Knowledge Model; Data; Metadata; OntoServer; Webocrat Information Server; Resource Management; Categories; Security, Authentication, Auditing; Resources; Pollings; Metadata; Submissions. Actors: System Administrator, Citizen, Ontology Engineer, LA Employee (Service Operator)]

Webocrat information server
[Figure: Webocrat information server. Labels: Web resources; Tenders; Discussions; Submissions; Web links; Messages; (Extended) Protégé 2000 editor; Documents; Pollings; Articles; Forums; Citizen Interface; Knowledge Model; Searching and Reporting. Actors: LA Employee (Service Operator), System Administrator, Citizen]

Webocrat applications
• Project pilot applications’ sites
– 2 local authorities in Kosice (http://www.kosice-dh.sk, www.tahanovce.sk/mutah),
– 1 in Wolverhampton (http://www.wolforum.org/)
• Kosice self-governing region
– eFiling Room application (http://intersoft.sk/epodat/)
• Carpathian Foundation (http://intersoft.sk/cf/)
– their web sites (in 5 countries) and information system driven by Webocrat
• Regensburg (Germany) – in progress
– To become Cultural City of Europe in 2010

JBowl – motivation, main goals
• For research and development purposes a system with the following functionality is needed:
– Pre-process (potentially large) collections of text documents
– Text documents in various formats (plain text, HTML, XML) and in different languages
– Support for indexing and retrieval of information from these information resources
– Interface to knowledge models (e.g. ontologies)
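
The indexing and retrieval requirement above is commonly met with an inverted index, which maps each term to the documents that contain it. The following Python sketch shows the idea only; it is not JBowl code. The function names, the toy documents and the ASCII-only tokeniser are assumptions, and a real multilingual collection would need a language-aware tokeniser.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens (ASCII only)."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical mini-collection
docs = {
    "d1": "Ontologies support knowledge management",
    "d2": "Text mining of large document collections",
    "d3": "Knowledge discovery in text collections",
}
index = build_index(docs)
hits = search(index, "knowledge collections")
```

Because a query only touches the posting sets of its own terms, lookup cost depends on how many documents match those terms, not on the size of the whole collection.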

Why was JBowl needed?
• The existing relevant software systems and tools may be divided into 4 groups:
– Text indexing and retrieval tools (e.g. Lucene or EGOTHOR)
– Tools for text processing (e.g. GATE, JavaNLP)
– KDD tools (Weka or KDD Package)
– Frameworks for work with ontologies (e.g. KAON)
• Each is well focused on one particular subtask but lacks support for the others => not very well suited for text mining and semantic retrieval

JBowl – main characteristics
• JBowl is a software system developed in Java for support of
– intelligent information retrieval
– text mining
• Main characteristics:
– an easily extensible, easy-to-use, modular framework
– supports pre-processing and indexing of large text collections, as well as creation and evaluation of supervised and unsupervised text-mining models
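
Creating and evaluating a supervised text-mining model, as named in the characteristics above, can be illustrated with a small bag-of-words categoriser. The Python sketch below is an assumption-laden illustration, not JBowl's API: k-nearest neighbours with cosine similarity is just one possible model, and all function names and example texts are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words vector: term -> number of occurrences."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0

def knn_classify(train, text, k=3):
    """Label a text by majority vote among its k most similar training texts."""
    query = vectorize(text)
    ranked = sorted(
        train, key=lambda item: cosine(vectorize(item[0]), query), reverse=True
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Share of held-out texts whose predicted label matches the true one."""
    correct = sum(1 for text, label in test if knn_classify(train, text, k) == label)
    return correct / len(test)

# Hypothetical labelled texts
train = [
    ("road repair schedule for the city", "transport"),
    ("bus timetable and road closures", "transport"),
    ("local tax payment deadline", "finance"),
    ("budget and tax report", "finance"),
]
test = [
    ("road closures this week", "transport"),
    ("tax budget questions", "finance"),
]
score = accuracy(train, test, k=1)
```

Keeping the test texts separate from the training texts is what makes the accuracy figure an estimate of performance on unseen documents.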

Knowledge discovery in texts
Text mining: the core of the process of knowledge discovery in texts
1. Texts selection and acquisition
2. Text Cleaning
3. Text pre-processing
4. Term selection
5. Text mining
6. Results evaluation
7. Result interpretation and use
[Figure: process diagram; data stages shown, in order: Text documents, Internal form, Discovered patterns]

JBowl – supported tasks
1. Document analysis
– support for NLP methods and document vector model
2. Building text mining models
– support for categorization (e.g. SVM, kNN, decision trees and rules …), clustering (modified GHSOM) and attribute selection model (DF, IG, MI …)
3. Testing a model
– estimation of model accuracy
4. Applying a model
– batch processing as well as on-line processing supported
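
Two of the attribute selection measures listed above, document frequency (DF) and information gain (IG), can be computed as in the following Python sketch. It assumes each document is already reduced to a set of terms with a class label; the function names and the four example documents are hypothetical and do not reflect JBowl's own interfaces.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def document_frequency(docs, term):
    """Number of documents that contain the term at least once."""
    return sum(1 for tokens, _ in docs if term in tokens)

def information_gain(docs, term):
    """Reduction in class entropy from knowing whether the term occurs."""
    with_term = [label for tokens, label in docs if term in tokens]
    without_term = [label for tokens, label in docs if term not in tokens]
    total = len(docs)
    remainder = sum(
        len(part) / total * entropy(part)
        for part in (with_term, without_term)
        if part
    )
    return entropy([label for _, label in docs]) - remainder

def select_terms(docs, n):
    """Keep the n terms with the highest information gain."""
    vocabulary = sorted({term for tokens, _ in docs for term in tokens})
    return sorted(vocabulary, key=lambda t: information_gain(docs, t), reverse=True)[:n]

# Hypothetical documents: (set of terms, class label)
docs = [
    ({"tax", "budget", "city"}, "finance"),
    ({"tax", "report", "city"}, "finance"),
    ({"road", "repair", "city"}, "transport"),
    ({"road", "bus", "city"}, "transport"),
]
best_terms = select_terms(docs, 2)
```

In this toy data the term "city" has the highest document frequency yet zero information gain, since it appears in every class. This is why DF and IG can rank terms very differently.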

Use of text mining methods
• Clustering and visualisation of large set of existing textual documents (modified GHSOM algorithm)
• Document categorisation methods for:
– Semi-automatic linking of textual resources to concepts from ontology
– Semi-automatic routing of electronic submissions (requests for information …)
• Discovery of association rules – can be used for ontology management

JBowl used for Webocrat
[Figure: JBowl used for Webocrat. Labels: ontology; text mining; document analysis; vector representation; intelligent retrieval; indexer; full-text search. Captions: Java library for support of text mining and retrieval; Specific Webocrat functionality]
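
The association rule discovery mentioned under "Use of text mining methods" can be sketched in Python as below. The sketch assumes each document has been linked to a set of ontology concepts and looks only for rules between single concepts; the function name, thresholds and concept names are hypothetical, and the slides do not say which algorithm was actually used.

```python
from itertools import combinations
from collections import Counter

def association_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Find rules A -> B between single items.

    support(A, B)      = share of transactions containing both A and B
    confidence(A -> B) = count(A and B) / count(A)
    """
    total = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / total
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical concept sets attached to four documents
annotated_docs = [
    {"budget", "tax"},
    {"budget", "tax", "council"},
    {"tax", "council"},
    {"budget", "tax"},
]
rules = association_rules(annotated_docs)
```

A rule with high confidence between two concepts that the ontology does not yet relate might be offered to the ontology engineer as a candidate relation to review.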

[Screenshot with callouts: navigation bar; menu (category list); actual content; list of news messages; list of resources relevant to the actual category]
[Screenshot with callouts: list of relevant resources; search banner]
[Screenshot with callouts: full-text search query; advanced search settings; search banner; results]