Semantic Middleware

• Investigating fundamental issues in entity/relationship extraction, disambiguation (matching & mapping), and annotation.

1. Entity Identification
2. Entity Disambiguation
3. Semantic Annotation

[Figure: annotation workflow. Labels: World Model; Lexical Analysis, Natural Language Processing; Additional linguistic resources: thesaurus, dictionary (synonyms, common variations); Knowledge Base; Documents to annotate; Entity Identification / Metadata Creation; decision "Multiple matches found during lookup?" (YES: Entity Disambiguation; NO: Semantic Annotation of selected documents); Annotated Documents]

Semantic Annotation

• Entities in a drug advisory annotated with concepts and relationships from a Drug Ontology

[Figure: excerpt of Drug Ontology]

Sample Created Metadata

<Entity id="122805" class="DrugOntology#prescription_drug_brandname">
  Bextra
  <Relationship id="442134" class="DrugOntology#has_interaction">
    <Entity id="14280" class="DrugOntology#interaction_with_physical_condition">sulfa allergy</Entity>
  </Relationship>
</Entity>

Semantic Associations

• Identifying implicit semantic associations between entities in the document

[Figure: annotated RSS feed alongside the ontology]

Annotated RSS feed text: "Today, the Food and Drug Administration (FDA) is announcing that it has asked Pfizer, Inc. to voluntarily withdraw Bextra from the market. Pfizer has agreed to suspend sales and marketing of Bextra in the [...], pending further discussions with the agency."

Grey and white circles indicate ontology nodes in identified semantic associations. Lighter nodes lie in less relevant associations.
Entities in the RSS feed that were extracted and annotated with concepts in an ontology are shown in red.

Disambiguation

• Functionality:
– Merging two databases / ontologies, where multiple references point to the same logical entity
– Adding new instances to an ontology, where a similar entity already exists and has to be merged with the new one
– Example: merging person instances recorded in a government ontology with an incoming choice point person entity

Approaches

• Feature-based Similarity Approach
– Set-Theory Similarity Approach
– Information-Theory Similarity Approach
– Clustering Approach
– Hybrid Approach
• Relationship-based Similarity Approach
• Hybrid Similarity Approach

Challenges

• Varying information content in entities
– Differences in schema
– Variations in representation: use of abbreviations, misspellings, different naming conventions, representation formats changing over time, etc.
• Insufficient information while merging two entities
– Exploiting relationships and other / previous reconciliation decisions

Schema and conflicting instances

Attribute                  Instance 1     Instance 2
Person                     Tim Robins     Timothy Wallace Robinson
SSN                        889889889      889889889
TelNumber                  7065434567     7062123443
FirstName                  Tim            Timothy
MiddleName                 -              Wallace
LastName                   Robins         Robinson
Generation                 -              -
Marital Status             Single         Married
Applicant                  -              -
dependent of               -              -
spouse of                  -              person12332
works for                  People Soft    Oracle
affiliated with            -              -
foreign influence event    event7823      event099
address                    place23        place23

Callouts on the table:
• Nature of attribute indicates its relative importance
– SSN given a high weight in disambiguating person entities
• String similarity metrics
• Recognized as a time-sensitive attribute
• Reconciling Oracle and PeopleSoft indicates the two person entities work for the same organization

Application using this disambiguation algorithm

• Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest Detection
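The attribute weighting shown for the Tim Robins / Timothy Wallace Robinson comparison can be illustrated with a minimal feature-based sketch. Everything specific here is an assumption made for illustration: the weight values (the slides only say SSN gets a high weight), the function names, and the use of Python's `difflib.SequenceMatcher` as a stand-in for the unspecified string similarity metrics. Relationship-based evidence (such as reconciling Oracle and PeopleSoft) and time-sensitive attributes are not modeled.

```python
from difflib import SequenceMatcher

# Hypothetical weights: the slides state only that SSN carries a high weight.
WEIGHTS = {
    "SSN": 0.5,
    "FirstName": 0.15,
    "LastName": 0.15,
    "TelNumber": 0.1,
    "address": 0.1,
}


def string_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; stands in for any string similarity metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def entity_similarity(e1: dict, e2: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of attribute similarities over attributes both entities have."""
    total = 0.0
    used_weight = 0.0
    for attr, weight in weights.items():
        v1, v2 = e1.get(attr), e2.get(attr)
        if not v1 or not v2:
            continue  # missing on either side: contributes no evidence
        total += weight * string_similarity(v1, v2)
        used_weight += weight
    return total / used_weight if used_weight else 0.0


# Attribute values taken from the conflicting-instances table.
tim = {
    "SSN": "889889889",
    "TelNumber": "7065434567",
    "FirstName": "Tim",
    "LastName": "Robins",
    "address": "place23",
}
timothy = {
    "SSN": "889889889",
    "TelNumber": "7062123443",
    "FirstName": "Timothy",
    "LastName": "Robinson",
    "address": "place23",
}

score = entity_similarity(tim, timothy)
print(f"similarity = {score:.2f}")
```

Normalizing by the weight actually used keeps sparse entities comparable: an attribute missing on one side neither rewards nor penalizes the pair. A merge decision would then compare the score against a threshold, which is left out because the slides do not give one.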
• The conflict-of-interest paper was nominated for the Best Paper Award at WWW 2006
• Disambiguate entities in a FOAF and DBLP dataset

[Figure: schema]
– Classes: Person, dblp:Researcher, foaf:Person (linked by rdfs:subClassOf)
– DBLP properties: dblp:label, dblp:homepage, dblp:no_of_publications, dblp:no_of_co_authors, dblp:coauthor, dblp:iswcLocation, dblp:iswc_affiliation, dblp:iswc_type
– FOAF properties: foaf:label, foaf:firstName, foaf:surname, foaf:nickName, foaf:mbox, foaf:mbox_sha1sum, foaf:homepage, foaf:workplacepage, foaf:schoolpage, foaf:depiction, foaf:knows
– Property values are typed rdfs:literal

[Figure: instances]
– dblp:Researcher #2_1518: dblp:label "Amit Sheth"; dblp:homepage http://lsdis.cs.uga.edu/~amit/; dblp:no_of_publications and dblp:no_of_co_authors (values 124 and 134 appear in the figure)
– foaf:Person #4_38624: foaf:label "Amit Sheth"; foaf:nickName "Amit"; foaf:mbox_sha1sum 9c1dfd993ad7d1852e80ef8c87fac30e10776c0c; foaf:homepage http://lsdis.cs.uga.edu/~amit; foaf:workplacepage http://lsdis.cs.uga.edu, http://www.semagix.com; foaf:schoolpage http://www.bitsaa.org/, http://www.cse.ohio-state.edu/
– Other nodes, connected by dblp:coauthor and foaf:knows edges: dblp:Researcher #2_553, #2_1417, #2_324; foaf:Person #4_2629, #4_19269, #4_35126, #4_28045

Statistics of the population and the results

Provenance

• When you see some data on the Web / in a database / in an ontology, do you know
– where it came from?
– why it is there?
• Provenance is the lineage / history of a piece of information

The need for provenance

• Reliability and Quality
– Identify lineage, measure credibility
• Justification and Audit
– Not only when and how, but also why certain derivations have been made
• Re-usability, reproducibility
– Not only how data has been produced, but also all the information necessary to reproduce results
• Ownership, Security and Copyright
– Provides a trusted source from which we can determine who the information belongs to and when and how it was created

Challenges

• In recording and using provenance information
– Systematic annotation for recording provenance
– Using provenance involves propagating annotations
– Location and propagation rules for annotations
• Multiple granularity
– At what level to annotate: the whole database, the relations, the tuples, or the data values? What are the cost and feasibility of each?
– Formal annotations: machine readable and executable
– Meaningful annotations require versioning

http://db.cis.upenn.edu/DL/fsttcs.pdf
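The granularity and propagation questions above can be made concrete with a small sketch. It is only one possible design, under assumptions not found in the slides: the `ProvenanceStore` class, its method names, the location tuples, and the example notes are all hypothetical. A location is a tuple that grows more specific with each level (database, relation, tuple id, column), and the single propagation rule assumed here is that a copied value carries every note visible at its source.

```python
from collections import defaultdict


class ProvenanceStore:
    """Toy annotation store keyed by location tuples of increasing granularity."""

    def __init__(self):
        self._notes = defaultdict(list)

    def annotate(self, location: tuple, note: str) -> None:
        """Attach a note at any level: (db,), (db, rel), (db, rel, tid), (db, rel, tid, col)."""
        self._notes[location].append(note)

    def annotations_for(self, location: tuple) -> list:
        """Collect notes on the location and on every coarser level that encloses it."""
        found = []
        for depth in range(1, len(location) + 1):
            found.extend(self._notes.get(location[:depth], []))
        return found

    def propagate_copy(self, source: tuple, target: tuple) -> None:
        """Assumed rule: a copied value inherits all notes visible at its source."""
        origin = "/".join(str(part) for part in source)
        for note in self.annotations_for(source):
            self.annotate(target, f"{note} (via {origin})")


# Hypothetical databases, relations, and notes, for illustration only.
store = ProvenanceStore()
store.annotate(("example_db",), "imported from an external feed")
src = ("example_db", "advisories", 17, "drug")
store.annotate(src, "entered by curator A")

dst = ("report_db", "summary", 3, "drug")
store.propagate_copy(src, dst)
for note in store.annotations_for(dst):
    print(note)
```

Annotating at the database level is cheap but coarse, while value-level notes are precise but numerous; resolving notes by prefix lookup lets both coexist without copying coarse notes down to every value.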