Semantic Web in Industry
R. Guha

Two Levels of the Semantic Web
• Deep Semantic Web:
  – Intelligent agents performing inference
  – Semantic Web as distributed AI
  – Small problem … the AI problem is not yet solved
• Shallow Semantic Web: using SW/Knowledge Representation techniques for
  – Data integration
  – Search
  – Is starting to see traction in industry

Integration: The new buzzword in business
• Huge explosion in the number of new databases, applications, documents, … in the 90s
  – Lots of redundancy, duplication … => high inefficiency
• Economic pressures are forcing consolidation and efforts to reduce inefficiency
• Two aspects to integration: Process & Data
  – Process integration depends on data integration

Data Integration for Science
• Many experimental fields will generate more data in the next 2 years than exists today
• A large part of research consists of writing programs to analyze data, e.g., NASA
• Tools to normalize, share and integrate data are stuck in the 80s (ftp, perl, …)
• Semantic Web could create a “web of data” that changes all this
• Example of the Internet Observatory

Varieties of Data Integration: Data Transformation
• Data Transformation Example
  – Contact information in SAP, Siebel, PeopleSoft, …
  – We want to reflect updates in one data source into another
[Figure labels: XSLT, etc.; App. Server; Siebel; Clarify; PeopleSoft]

Varieties of Data Integration: Data Aggregation
• Data Aggregation Example
  – Clinical trial data at Stanford, UCSF, Mayo …
  – We want to give a meta-analyst a uniform view of data from these different clinical trials
  – Example of how this would have helped recent meta studies such as the estrogen study
[Figure labels: Relational Views; Meta-Analyst; DBMS; UCSF; Stanford; Mayo]

Data Integration Layers
• Coping with software from different vendors
  – Oracle vs. DB2 vs. SQL Server … this is a solved problem
• Coping with different formats
  – Relational vs. XML vs. ISAM … this too is a solved problem
• Coping with different schemas
  – Solved for the small case where one person understands all the schemas
  – No products for the case where it is truly distributed
  – We know how to do it in theory, but there are lots of practical problems
• Coping with data from unknown sources
  – Wide open … lots of unsolved problems

Typical Data Integration Methodology
• Use a common namespace of terms for the concepts in the domain of the data sources being integrated, e.g., Employee, Customer, Patient, weight, height, bodyTemperature, …
• Mappings relate data items in data sources to terms in the namespace
• Transformation algorithms map queries in terms of the common namespace into corresponding queries in terms of the data source vocabularies
• Background knowledge about terms is essential for transformations … e.g., Employee subClassOf Person; two people with the same last name, first name and street address are likely to be the same person. I.e., the common namespace is really an ontology
• Mappings and the common namespace are the workhorses

Role of Semantic Web in Data Integration
• The XML stack (XML, XSD, XPath, XQuery, …) does not have the concepts (objects, classes, properties, …) required for representing ontologies
• RDF/S does …
• Neither of them has a language for expressing mappings
  – But RDF/S, being closer to logic, has more of the machinery that is required

Kinds of Mappings
• Simple structural
  – DB1.patient.weight corresponds to Patient’s weight
• Conditional structural
  – If DB1.patient.type equals Outpatient then DB1.patient.foo corresponds to Patient’s visit duration …
• Term mappings
  – CA in DB1 corresponds to California in the domain namespace
  – Object with ssn 7687667 in database 1 corresponds to object with id “aksdks” in database 2

Challenges and non-challenges in data integration
• Non-challenge: algorithms for doing the transformations (ISI, MCC, SU & AT&T)
• Engineering Challenges
  – Creating large, useful ontologies that are shared by many
  – Creating mappings
• Research Challenges
  – Semantic drift
  – Fuzzy terms, probabilistic mappings
  – Trust

Engineering Challenges
• Creating large, detailed ontologies is complex and expensive
  – But it is happening … CrossWorlds for business concepts, MAGE, etc. for medicine
  – Danger: some of them might turn out to be proprietary
• Creating mappings is tedious and time consuming
• Object mappings pose special challenges
  – Mappings need to be dynamic and constantly updated

Research Challenges with mappings
• Semantic drift
  – The meaning of terms, as interpreted by different members of a community, can drift over time
  – Cyc experience shows that Description Logic mechanisms are not adequate for either detecting or fixing such drift
• Fuzzy mappings
  – E.g., Walmart’s concept of chair is similar to but not the same as MoMA’s concept of chair
• Probabilistic mappings
  – There is an 82% likelihood that Michael Jordan in database 1 is the same as Michael Jordan in database 2

Other data web related challenges
• Trust: How should the program know whether to trust some new data source?
  – Without this, we will only have closed systems
  – Options: centralized approaches like UDDI or decentralized approaches like WOTs
• Inverse trust: how can I trust you not to indiscriminately distribute my data?
  – A big issue for fresh scientific data
• Systems challenges
  – Caching
  – Preventing accidental DoS attacks

Forecast for SW and Data Integration
• We already have a number of data integration tools on the market
• We are seeing the first generation of ontology-based data integration tools from small companies
• At least some of the big players will probably have some offerings for doing data integration based on Semantic Web concepts in the near future
  – Whether they use Semantic Web formats and acronyms is an open question …
• These common vocabularies will exhibit very strong network effects

Semantic Web for Search: Going beyond search as Location Bar
• Keywords → a particular page
  – Typically a home page or well-known hub page
  – United airlines → www.united.com
  – Unix → gnu.org, linux.org, freebsd.org
• Search as a smarter location bar
• PageRank is ideally suited for this
  – This is largely a solved problem

Varieties of Search: Research searches
• User is searching for info about something
• Could be directed: user is looking for a particular property
  – Price of something, location of some event, …
• Or undirected: user is looking for some general class of properties
  – Reviews/feedback on a product, info on a person or country
• If there is no hub page on the thing, existing search engines perform very poorly
• New focus is on this class of searches

Semantic Web for Search
• Keyword-based approaches haven’t made significant advances since PageRank
• Improvements may be gained by adding a modicum of understanding about the *object* denoted by the search query
• Improvements not just in search itself but also in the relevance of search-related advertising

Basic Issues
• Need a database of potential objects the user may be referring to, along with some properties of the object … e.g., its type
• Too many objects to manually construct the DB
  – At least 300 million distinct object references on the Web
• If the search engine does know something more about the search term’s denotation (e.g., it denotes a musician), how can it do better?

Building the Web KB
• Many different automated approaches
  – Simple natural language processing (Riloff, TAP, …)
  – Scrapers
  – Machine learning
• Most commercial efforts lead to proprietary KBs
• Huge opportunity for the wider SW community
  – Collaborate to actually create the KB

Using the KB
• Word sense disambiguation, e.g., MSN Search, Teoma
• Incorporating data feeds into search results, e.g., MSN with popular musicians
• Incorporating object-type-specific actions, e.g., Google with addresses and stock symbols
• Coming soon … KB construction driven by ads

Conclusions
• Please help Eric Miller
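
Illustrative sketch (not from the original slides): the "Typical Data Integration Methodology" and "Kinds of Mappings" slides can be made concrete with a few lines of Python. Everything below is a hypothetical toy under stated assumptions: the source names, the table and column names (patient, pt_records, wt_kg), and the helper functions are invented for illustration and do not describe any real product or the author's own system. It shows structural mappings, a term mapping, a subClassOf lookup as background knowledge, and a query over common-namespace terms being rewritten into one query per data source.

```python
# Hypothetical structural mappings: common-namespace term -> column, per source.
MAPPINGS = {
    "DB1": {"table": "patient",
            "terms": {"Patient.weight": "weight", "Patient.height": "height"}},
    "DB2": {"table": "pt_records",
            "terms": {"Patient.weight": "wt_kg"}},
}

# Hypothetical term mappings: source-specific code -> namespace value.
TERM_MAPPINGS = {"DB1": {"CA": "California"}}

# Background knowledge: a tiny subClassOf hierarchy.
SUBCLASS_OF = {"Employee": "Person", "Patient": "Person"}


def is_subclass(cls, ancestor):
    """Follow subClassOf links upward from cls until ancestor or the root."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def rewrite_query(terms):
    """Rewrite a query over namespace terms into one SQL string per source.

    A source is skipped if it has no mapping for any requested term.
    """
    queries = {}
    for source, spec in MAPPINGS.items():
        cols = []
        for term in terms:
            col = spec["terms"].get(term)
            if col is None:
                break  # this source cannot answer the query
            cols.append(f'{col} AS "{term}"')
        else:
            queries[source] = f"SELECT {', '.join(cols)} FROM {spec['table']}"
    return queries


def normalize_value(source, value):
    """Apply a term mapping if one exists; otherwise return the value as-is."""
    return TERM_MAPPINGS.get(source, {}).get(value, value)


if __name__ == "__main__":
    for src, sql in rewrite_query(["Patient.weight"]).items():
        print(src, "->", sql)
    print(normalize_value("DB1", "CA"))
    print(is_subclass("Employee", "Person"))
```

Conditional, fuzzy and probabilistic mappings are deliberately left out of this sketch; the slides list them as engineering and research challenges rather than solved steps.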