
Integration of Overlay UM for Close Domains Based on Domain Ontology Mapping
Sergey Sosnovsky
PAWS@SIS@PITT
L3S
• Learning Lab of Lower Saxony
• > 60 people (~5 professors, ~20 postdocs, ~35 PhD students)
• > 5 million annual budget
• Area of Research:
– Technology Enhanced Learning
– Semantic Web and Digital Libraries
– Distributed Systems and Networks
• A coordinator for the PROLEARN, REWERSE and KnowledgeWeb networks of excellence.
• KBS (Knowledge Based Systems) Lab
Outline
• Project Motivation (Simple Scenario)
• Addressed Problem and Chosen Solution
• GLUE O-Mapping Algorithm
• System Details:
– Developed Ontologies
– LO Repositories
– Implementation
• Summary…
• Discussion
Scenario
[Figure: the C and Java domains, with a UM of C knowledge and a UM of Java knowledge.]
Problem-Solution
• Problem: “Cold-start”
• Source for solution: Closeness of domains.
• Solution:
– Apply ontology mapping techniques to identify similar
concepts in both domains
– Use the found mappings to translate overlay models of the student's knowledge
• Broader Goals:
– To verify the possibility of User Model mediation between related domains;
– To check how effective ontology mapping technologies could be for this.
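The translation step can be illustrated with a minimal Python sketch. It assumes an overlay model is a dictionary from concept to knowledge level in [0, 1], and that each mapping carries a similarity weight. The function name `translate_overlay`, the concept identifiers, and the use of a similarity-weighted average for many-to-one mappings are illustrative assumptions, not the implemented system.

```python
from collections import defaultdict


def translate_overlay(source_um, mappings):
    """Translate an overlay user model from a source to a target domain.

    source_um: dict, source concept -> knowledge level in [0, 1]
    mappings:  list of (source_concept, target_concept, similarity) triples
    Returns a dict, target concept -> estimated knowledge level.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for src, tgt, sim in mappings:
        # Only mapped concepts that the source model knows about contribute.
        if src in source_um and sim > 0:
            weighted_sum[tgt] += sim * source_um[src]
            weight_total[tgt] += sim
    # Similarity-weighted average over all source concepts mapped to a target.
    return {tgt: weighted_sum[tgt] / weight_total[tgt] for tgt in weighted_sum}


# Hypothetical concept names and values, for illustration only.
c_model = {"c:while_loop": 0.8, "c:for_loop": 0.6, "c:pointer": 0.9}
mappings = [
    ("c:while_loop", "java:WhileStatement", 1.0),
    ("c:for_loop", "java:LoopStatement", 0.5),
    ("c:while_loop", "java:LoopStatement", 0.5),
]
java_model = translate_overlay(c_model, mappings)
print(java_model)
```

Concepts with no mapping (here `c:pointer`) simply do not appear in the translated model, which leaves the target system free to fall back to its default estimate for them.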
Ontology Mapping
• O-Mapping Approaches:
– Using a Shared Ontology
– Heuristics and Rule-Based
– ML-Based
• GLUE-Approach
– Joint Probability Distribution Estimation
– Similarity Estimation
GLUE: Distribution Estimator (1)
• Input data:
– Ontology O1 with corresponding set of instances $U_1$
– Ontology O2 with corresponding set of instances $U_2$
• For every pair of concepts A from O1 and B from O2, the algorithm is as follows:
1. Partition $U_1$ into $U_1^{A}$ and $U_1^{\neg A}$ (instances that do and do not belong to A).
2. Partition $U_2$ into $U_2^{B}$ and $U_2^{\neg B}$ (instances that do and do not belong to B).
3. Train a classifier $C_1$ for instances of A, using $U_1^{A}$ and $U_1^{\neg A}$ as the positive and negative training sets.
GLUE: Distribution Estimator (2)
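4. Apply classifier $C_1$ to each instance in the set $U_2^{B}$. This partitions $U_2^{B}$ into two sets, $U_2^{A,B}$ and $U_2^{\neg A,B}$. Similarly, the set $U_2^{\neg B}$ is partitioned into $U_2^{A,\neg B}$ and $U_2^{\neg A,\neg B}$.
5. Repeat the two previous steps with the roles of O1 and O2 reversed (training a classifier $C_2$ for instances of B) to obtain $U_1^{A,B}$, $U_1^{\neg A,B}$, $U_1^{A,\neg B}$ and $U_1^{\neg A,\neg B}$.
6. Now we can compute joint probabilities for the concepts A and B, where $N(\cdot)$ is the number of instances in a set:
• $P(A,B) = \frac{N(U_1^{A,B}) + N(U_2^{A,B})}{N(U_1) + N(U_2)}$
• $P(A,\neg B) = \frac{N(U_1^{A,\neg B}) + N(U_2^{A,\neg B})}{N(U_1) + N(U_2)}$
• $P(\neg A,B) = \frac{N(U_1^{\neg A,B}) + N(U_2^{\neg A,B})}{N(U_1) + N(U_2)}$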
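The six steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions: instances are plain text strings, and `train_keyword_classifier` is a toy stand-in learner invented for this example (the system described in this talk uses Weka classifiers). The function names and sample data are hypothetical.

```python
def train_keyword_classifier(positive, negative):
    """Toy stand-in learner (an assumption, not a real GLUE learner):
    predicts True if a text shares more words with the positive
    training examples than with the negative ones."""
    pos_words = {w for doc in positive for w in doc.split()}
    neg_words = {w for doc in negative for w in doc.split()}

    def predict(doc):
        words = set(doc.split())
        return len(words & pos_words) > len(words & neg_words)

    return predict


def joint_probabilities(u1, u2, train=train_keyword_classifier):
    """Estimate the joint distribution of concepts A (from O1) and B (from O2).

    u1: list of (text, in_A) pairs, instances of O1 labeled for A
    u2: list of (text, in_B) pairs, instances of O2 labeled for B
    Returns a dict keyed by (in_A, in_B) booleans.
    """
    # Steps 1-3: partition U1 by A and train C1 to recognize A.
    c1 = train([t for t, a in u1 if a], [t for t, a in u1 if not a])
    # Same steps with roles reversed: partition U2 by B and train C2.
    c2 = train([t for t, b in u2 if b], [t for t, b in u2 if not b])

    counts = {(a, b): 0 for a in (True, False) for b in (True, False)}
    # Step 4: for U2 instances B is known, A is predicted by C1.
    for text, in_b in u2:
        counts[(c1(text), in_b)] += 1
    # Step 5: for U1 instances A is known, B is predicted by C2.
    for text, in_a in u1:
        counts[(in_a, c2(text))] += 1

    # Step 6: normalize by N(U1) + N(U2).
    total = len(u1) + len(u2)
    return {key: n / total for key, n in counts.items()}


# Hypothetical instances: A = "loop" concept in O1, B = "loop" concept in O2.
u1 = [
    ("while loop repeats a block", True),
    ("for loop with a counter", True),
    ("pointer to a memory address", False),
]
u2 = [
    ("while loop repeats statements", True),
    ("class defines an object", False),
]
probs = joint_probabilities(u1, u2)
print(probs[(True, True)], probs[(True, False)], probs[(False, True)])
```

Passing the learner in as the `train` argument keeps the estimator independent of any particular ML library, so a different classifier can be swapped in without touching the counting logic.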
GLUE: Similarity Estimator
• $\text{Jaccard-sim}(A,B) = \frac{P(A \cap B)}{P(A \cup B)} = \frac{P(A,B)}{P(A,B) + P(A,\neg B) + P(\neg A,B)}$
• It is 0 when A and B are disjoint and 1 when they are the same concept
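As a small worked illustration, the similarity can be computed directly from the three joint probabilities. The function name `jaccard_sim` and the choice to return 0 when all three probabilities are zero are assumptions made for this sketch.

```python
def jaccard_sim(p_ab, p_a_notb, p_nota_b):
    """Jaccard similarity from P(A,B), P(A,¬B) and P(¬A,B)."""
    denominator = p_ab + p_a_notb + p_nota_b  # equals P(A ∪ B)
    if denominator == 0:
        # Assumption: neither concept has any instance, treat as dissimilar.
        return 0.0
    return p_ab / denominator


disjoint = jaccard_sim(0.0, 0.25, 0.125)   # no shared instances
identical = jaccard_sim(0.5, 0.0, 0.0)     # every instance of A is in B and vice versa
partial = jaccard_sim(0.25, 0.125, 0.125)  # half of the union is shared
print(disjoint, identical, partial)
```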
Ontologies
• Java Ontology:
– Java Ontology designed for Java Personal Reader: http://hoersaal.kbs.uni-hannover.de/rdf/java_ontology.rdf
– Format: rdfs
– # of Classes: 544
– Relations: rdfs:subClassOf
• C ontology:
– Our C ontology (next version): http://www.sis.pitt.edu/~paws/ont/c_programming.rdf
– Format: rdfs
– # of Classes: 546
– Relations: cprog:isA, cprog:partOf (both are rdfs:subPropertyOf rdfs:subClassOf)
Repositories
• Java Repository:
– Sun Java Tutorial: http://java.sun.com/docs/books/tutorial/java/index.html
– # of pages: 208
– Repository description: rdf
– Namespaces used:
   • lom="http://ltsc.ieee.org/2002/09/lom-base#"
   • lom_cls="http://www.imsproject.org/rdf/imsmd_classificationv1p2#"
   • dc="http://purl.org/dc/elements/1.1/"
   • vCard="http://www.w3.org/2001/vcard-rdf/3.0#"
[Slide label: obsolete]
• C Repository:
– Miles C Tutorial (processed by R2Net tool): http://www.sis.pitt.edu/~sergeys/_dev/c_tutorial_Miles/
– # of pages: 117
– Repository description: rdf
– Namespaces used:
   • lom-edu="http://ltsc.ieee.org/2002/09/lom-educational#"
   • dc="http://purl.org/dc/elements/1.1/"
System Implementation
• Tomcat servlet
• Four free third-party APIs are used:
– Tidy for HTML parsing and text extraction.
– Apache Lucene for indexing of text documents.
– HP Jena for RDF processing and ontology inference.
– Weka for classification of concept instances.
• Input:
– Two domain ontologies
– Two repository descriptions
• Output:
– Mapping of two ontologies (rdf)
Summary: Current State
• Ontologies for the Java and C programming languages are developed.
• Two repositories of learning objects are described in terms of the corresponding ontologies: Sun Java Tutorial and Robert Miles's C Tutorial.
• During manual mapping, about 100 possible mapping cases have been identified, which is a good percentage for both ontologies. All possible semantic mapping situations are found (1:n, m:1, n:m mapping). To deal with such granularity discrepancies when calculating actual values, we are so far planning to use a weighted average function.
• The GLUE algorithm for ontology mapping, based on the joint probability distributions of concepts over their instances, is implemented.
Current Problems
• Performance
• Precision
   (A performance-precision tradeoff is necessary)
• Interface development.
– Mapping presentation to the user
– Mapping navigation and manual update
– 1:n, m:1, n:m mapping situations
• Mapping of scales
• Students – subjects for the experiment
Performance Problem Solutions
• Change ML-library
– SMILE instead of WEKA
• Feature selection
– Currently 3500 features for about 300 docs
• Modification of GLUE algorithm
– Divide and conquer approach to reduce the dimensionality of the problem
• Another ML-based O-Mapping method
– e.g. based on support vector machines [Jason Chaffee, Susan Gauch. Personal Ontologies For Web Navigation. In Proceedings of the 9th International Conference On Information Knowledge Management (CIKM), 2000, pp. 227-234.]
• Excluding Human feedback
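To make the feature selection option concrete, here is a minimal Python sketch of one common approach: keeping only terms that occur in a minimum number of documents. The slides do not say which selection method is intended, so the document-frequency criterion, the function name `select_features`, and the thresholds are all assumptions chosen for illustration.

```python
from collections import Counter


def select_features(docs, min_df=2, max_features=500):
    """Keep terms that occur in at least min_df documents.

    The surviving terms are ordered by document frequency (ties broken
    alphabetically) and capped at max_features.
    """
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document.
        doc_freq.update(set(doc.lower().split()))
    kept = [(term, n) for term, n in doc_freq.items() if n >= min_df]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [term for term, _ in kept[:max_features]]


# Hypothetical mini-corpus standing in for the tutorial pages.
docs = [
    "the while loop repeats a block",
    "the for loop uses a counter",
    "a pointer stores an address",
]
vocabulary = select_features(docs, min_df=2)
print(vocabulary)
```

With a few thousand features for a few hundred documents, most terms occur in only one or two pages, so even a low `min_df` threshold can shrink the feature space considerably. Stop-word removal would be a separate step.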
Precision Problem Solutions
• Two sources of information not taken into account by the algorithm yet:
– Structural dependencies
– Naming of concepts
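As one possible way to bring concept names into play, the sketch below compares normalized names with Python's standard `difflib.SequenceMatcher` and blends the result with an instance-based similarity score. The normalization rules, the linear blend, and the `name_weight` parameter are assumptions made for illustration and are not part of GLUE or of the system described here.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Split camelCase, snake_case and hyphenated names into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return " ".join(re.split(r"[\s_\-]+", spaced.lower())).strip()


def name_similarity(name_a, name_b):
    """String similarity of two concept names, in [0, 1]."""
    return SequenceMatcher(None, normalize(name_a), normalize(name_b)).ratio()


def combined_similarity(instance_sim, name_a, name_b, name_weight=0.3):
    """Blend an instance-based similarity with name similarity.

    name_weight is a hypothetical tuning parameter.
    """
    return (1 - name_weight) * instance_sim + name_weight * name_similarity(name_a, name_b)


# Hypothetical concept names from the two ontologies.
same = name_similarity("WhileStatement", "while_statement")
different = name_similarity("pointer", "class")
blended = combined_similarity(0.5, "WhileStatement", "while_statement")
print(same, different, blended)
```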
Scale Mapping Problem Solution
• If some scale ontology exists, we can use it. If not, we need to develop one.
Experiment
• Undergraduate course: INFSCI 1090 Object-Oriented
Programming I
• ~40 students
• 22 students have C/C++ experience and no Java experience
• Only 4 students have taken our IS-0012 course => we have UMs of their C knowledge
• All students took a pre-quiz on C assessing their
knowledge of 56 C-concepts related to Java
• Every week students take a quiz on Java with 2-3 extra
credit questions assessing Java concepts not covered
yet in the class
Other Solutions of Subject Problem
• Administer a Java test at the end of our IS-0012 courses
• Send a Java test to those students who have taken our IS-0012 course (C models exist) and ask them to voluntarily participate in the experiment.
• The mapping is bi-directional => we can act the other way around. If some people are available who know Java but do not know C, we can try to assess their Java knowledge and then evaluate the knowledge transfer to the C domain.
Discussion: Other Open Problems
• General Recommendation on Close Domain Ontology Mapping. When is it worthwhile? What are the metrics of domain closeness? How can we say that knowledge mediation is possible for two given domains?
– It could be the percentage of related concepts.
– It could be the closeness of domains in the general hierarchy of domains based on some common-sense upper-level ontology (SUMO, DOLCE, Cyc).
– It could be some IR-based metric from analyzing related resources (for example: C tutorial and Java Tutorial).
• Comparative evaluation of several different methods of UM mediation:
– Ontology-mapping
– Collaborative
– Stereotype-based clustering
– Provided by Expert
– Self-estimation based on scrutable UM
• Architectural issues. Should the developed component act as a part of a central ontology server, can it be a mediator in a decentralized world, or should every application have such facilities and perform its own mapping? Where should the mapping be stored?
Possible Ways of L3S-PAWS Collaboration
• Java Personal Reader as a component of ADAPT2 architecture
• Implementation of some components of ADAPT2 (KnowledgeSea, AnnotatED) as Adaptation Services for Java Personal Reader
• Mohamed's Web-service based protocol for concurrent learning (WIDEIn and Java Personal Reader).
• Teresa's publication recommender and COPe.