Knowledge Technologies and their Applications
Ján Paralič, Tomáš Sabol, Marián Mach, Karol Furdík, Peter Bednár and others
Knowledge Technologies Group, Technical University of Košice, Slovakia

Overview of the presentation
• IT technologies for support of knowledge management (KM)
  – Knowledge modelling, ontologies
  – Knowledge discovery
• Some outcomes of our projects
  – KDD Package – knowledge discovery in databases
  – Webocrat system
  – JBowl – Java-based library for support of text mining and retrieval

KM tools
• Support for preservation of existing knowledge
  – organizational memories (KnowWeb, Webocrat)
  – knowledge modeling, ontologies (KnowWeb, Webocrat, OntoServer)
• Support for dissemination of existing knowledge
  – various communication channels (Webocrat)
  – ontologies (KnowWeb, Webocrat, OntoServer)
• Support for creation of new knowledge
  – knowledge discovery in databases (KDD Package)
  – knowledge discovery in texts (JBowl)

Some project outcomes
• KnowWeb toolkit
  – “Web in Support of Knowledge Management in Company (KnowWeb)”, FP4, Esprit Project 29065, 1998–2000
• CEDAR toolkit
  – “Enriching Representations of Work to Support Organisational Learning (ENRICH)”, FP4 Project 29015, 1998–2000
• KDD Package
  – “Geographical On-line Analysis, GIS – Data warehouse integration (GOAL)”, INCO Copernicus project 977091, 1998–2001
• Webocrat and OntoServer
  – “Web Technologies Supporting Direct Participation in Democratic Processes (Webocracy)”, FP5 IST-1999-20364, 2000–2003
• JBowl
  – “Document classification and annotation for the Semantic web”, Slovak Grant Agency project Nr. 1/1060/04, 2004–2006 and the previous one

Knowledge discovery in databases
Data mining: the core of the knowledge discovery process
1. Data cleaning
2. Data integration
3. Data selection
4. Data transformation
5. Data mining
6. Results evaluation
7. Result interpretation and use
[Diagram labels: Databases, Data Warehouse, Task-relevant Data, Discovered patterns]

KDD Package
• The KDD Package serves as
  – a special module within the GOAL project
  – an open stand-alone application for complete support of the knowledge discovery process
• Based on pilot applications, the following data mining functionality has been implemented:
  – Classification/Description
    • Rule induction module (CN2, RISE and various combining strategies)
    • Classification tree induction module (C4.5)
  – Prediction
    • Regression-based (linear regression, regression trees, model trees – M5’)
    • Case-based reasoning (k-nearest neighbours with weights optimisation by means of a genetic algorithm)

Architecture of the KDD Package
[Diagram labels: Data access (DB, DWS, text file); Visual tools for data pre-processing; Modules for data preprocessing; DM modules (Classification / Description, Prediction, ...); Visualization of discovered patterns]

Webocrat system
Webocrat is a modular web-based system that is capable of improving communication between LG & citizens, increasing accessibility of LG services, providing new types of services and increasing efficiency of LG.
Some of the Webocrat advantages:
  – Multi-channel communication tool with integrated ontology
  – Open and modular system, platform independent
  – Intelligent retrieval and access mechanism
  – User-dependent view of ontology
  – Customisable user interface, personalised services
  – Security and role management
  – Log management, Calendar module
  – Strong multilinguality (down to the content level)

Webocrat – overall architecture
[Diagram labels: Users, Systems Settings; Knowledge Model; Data; Metadata; OntoServer; Webocrat Information Server; Resource Management; Categories; Security, Authentication, Auditing; Resources; Pollings; Submissions. Actors: System Administrator, Citizen, Ontology Engineer, LA Employee (Service Operator)]

Webocrat information server
[Diagram labels: Web resources; Tenders; Discussions; Submissions; Web links; Messages; (Extended) Protégé 2000 editor; Documents; Pollings; Articles; Forums; Citizen Interface; Knowledge Model; Searching and Reporting. Actors: LA Employee (Service Operator), System Administrator, Citizen]

Webocrat applications
• Project pilot applications’ sites
  – 2 local authorities in Kosice (http://www.kosice-dh.sk, www.tahanovce.sk/mutah)
  – 1 in Wolverhampton (http://www.wolforum.org/)
• Kosice self-governing region
  – eFiling Room application (http://intersoft.sk/epodat/)
• Carpathian Foundation (http://intersoft.sk/cf/)
  – their web sites (in 5 countries) and information system driven by Webocrat
• Regensburg (Germany) – in progress
  – To become Cultural City of Europe in 2010

JBowl – motivation, main goals
• For research and development purposes, a system with the following functionality is needed:
  – Pre-processing of (potentially large) collections of text documents
  – Text documents in various formats (plain text, HTML, XML) and in different languages
  – Support for indexing and retrieval of information from these information resources
  – Interface to knowledge models (e.g. ontologies)

Why was JBowl needed?
• The existing relevant software systems and tools may be divided into 4 groups:
  – Text indexing and retrieval tools (e.g. Lucene or EGOTHOR)
  – Tools for text processing (e.g. GATE, JavaNLP)
  – KDD tools (Weka or KDD Package)
  – Frameworks for work with ontologies (e.g. KAON)
• Each is well focused on one particular subtask but lacks support for the others => not very well suited for text mining and semantic retrieval

JBowl – main characteristics
• JBowl is a software system developed in Java for support of
  – intelligent information retrieval
  – text mining
• Main characteristics:
  – an easily extensible, easy-to-use, modular framework for pre-processing and indexing of large text collections
  – as well as for creation and evaluation of supervised and unsupervised text-mining models

Knowledge discovery in texts
Text mining: the core of the process of knowledge discovery in texts
1. Texts selection and acquisition
2. Text cleaning
3. Text pre-processing
4. Term selection
5. Text mining
6. Results evaluation
7. Result interpretation and use
[Diagram labels: Text documents, Internal form, Discovered patterns]

JBowl – supported tasks
1. Document analysis
  – support for NLP methods and document vector model
2. Building text mining models
  – support for categorization (e.g. SVM, kNN, decision trees and rules …), clustering (modified GHSOM) and attribute selection model (DF, IG, MI …)
3. Testing a model
  – estimation of model accuracy
4. Applying a model
  – batch processing as well as on-line processing supported

Use of text mining methods
• Clustering and visualisation of a large set of existing textual documents (modified GHSOM algorithm)
• Document categorisation methods for:
  – Semi-automatic linking of textual resources to concepts from ontology
  – Semi-automatic routing of electronic submissions (requests for information …)
• Discovery of association rules
  – can be used for ontology management

JBowl used for Webocrat
[Diagram labels: ontology; text mining; document analysis; vector representation; intelligent retrieval; indexer; full-text search. Legend: Java library for support of text mining and retrieval; Specific Webocrat functionality]

[Screenshot labels: navigation bar; menu (category list); current content; list of news messages; list of resources relevant to the current category]

[Screenshot labels: list of relevant resources; search banner]

[Screenshot labels: full-text search query; advanced search settings; search banner; results]
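
To make the JBowl task list concrete (document analysis into a vector model, attribute selection by document frequency, kNN categorisation, applying the model to a new document), here is a minimal Java sketch. It does not use or reproduce JBowl's API. The class `BowSketch`, all method names, the naive tokenizer and the toy documents with the labels "tenders" and "discussions" are assumptions made purely for illustration.

```java
import java.util.*;

/** Hypothetical sketch (not JBowl's API): bag-of-words vectors, DF term selection, kNN. */
class BowSketch {

    /** Naive tokenizer: lower-case, split on anything that is not a letter. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Attribute selection by document frequency (DF): keep terms found in at least minDf documents. */
    static Set<String> selectByDf(List<String> docs, int minDf) {
        Map<String, Integer> df = new HashMap<>();
        for (String d : docs) {
            // Count each term once per document.
            for (String t : new HashSet<>(tokenize(d))) df.merge(t, 1, Integer::sum);
        }
        Set<String> vocab = new TreeSet<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            if (e.getValue() >= minDf) vocab.add(e.getKey());
        }
        return vocab;
    }

    /** Document vector model: term -> raw term frequency, restricted to the selected vocabulary. */
    static Map<String, Double> vectorize(String doc, Set<String> vocab) {
        Map<String, Double> v = new HashMap<>();
        for (String t : tokenize(doc)) {
            if (vocab.contains(t)) v.merge(t, 1.0, Double::sum);
        }
        return v;
    }

    /** Cosine similarity of two sparse vectors; 0 if either vector is empty. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double x : b.values()) nb += x * x;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** kNN categorisation: majority label among the k most similar training documents. */
    static String classify(String doc, List<String> trainDocs, List<String> labels,
                           Set<String> vocab, int k) {
        Map<String, Double> query = vectorize(doc, vocab);
        Integer[] idx = new Integer[trainDocs.size()];
        double[] sim = new double[trainDocs.size()];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
            sim[i] = cosine(query, vectorize(trainDocs.get(i), vocab));
        }
        // Sort training documents by descending similarity to the query.
        Arrays.sort(idx, (x, y) -> Double.compare(sim[y], sim[x]));
        Map<String, Integer> votes = new HashMap<>();
        for (int i = 0; i < Math.min(k, idx.length); i++) {
            votes.merge(labels.get(idx[i]), 1, Integer::sum);
        }
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        // Toy training set; documents and labels are invented for this example.
        List<String> docs = Arrays.asList(
            "tender for road repair works",
            "public tender for school building",
            "discussion forum about city parks",
            "forum discussion on public transport");
        List<String> labels = Arrays.asList("tenders", "tenders", "discussions", "discussions");
        Set<String> vocab = selectByDf(docs, 2);
        System.out.println("Selected terms: " + vocab);
        System.out.println(classify("new tender for bridge repair", docs, labels, vocab, 3));
    }
}
```

A real pipeline would add language-specific pre-processing (stop words, stemming or lemmatisation), a weighting scheme such as TF-IDF, and a held-out test set for the "testing a model" step. The sketch keeps raw term frequencies so that each step stays easy to follow.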