Finding knowledge, data and answers on the Semantic Web
Tim Finin
University of Maryland, Baltimore County
http://ebiquity.umbc.edu/resource/html/id/183/
Joint work with Li Ding, Anupam Joshi, Yun Peng, Cynthia Parr, Pranam Kolari, Pavan Reddivari, Sandor Dornbush, Rong Pan, Akshay Java, Joel Sachs, Scott Cost and Vishal Doshi
http://creativecommons.org/licenses/by-nc-sa/2.0/
This work was partially supported by DARPA contract F3060297-1-0215, NSF grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.
UMBC, an Honors University in Maryland

This talk
• Motivation
• Swoogle Semantic Web search engine
• Use cases and applications
• Conclusions

Google has made us smarter

But what about our agents?
(Diagram labels: tell, register)
Agents still have a very minimal understanding of text and images.

But what about our agents?
(Diagram labels: Swoogle, tell, register)
A Google for knowledge on the Semantic Web is needed by software agents and programs.

This talk
• Motivation
• Swoogle Semantic Web search engine
• Use cases and applications
• Conclusions

• http://swoogle.umbc.edu/
• Running since summer 2004
• 1.6M RDF docs, 300M triples, 10K ontologies, 15K namespaces, 1.3M classes, 175K properties, 43M instances, 420 registered users

Swoogle Architecture
(Diagram labels: Analysis; SWD classifier; Ranking; Index; …; Search Services; IR Indexer; SWD Indexer; Semantic Web metadata; Web Server; Web Service; html; Discovery; document cache; Candidate URLs; SwoogleBot; Bounded Web Crawler; Google Crawler; rdf/xml; the Web; Semantic Web; human; machine. Legend: information flow; Swoogle’s web interface.)

This talk
• Motivation
• Swoogle Semantic Web search
engine
• Use cases and applications
• Conclusions

Applications and use cases
1 Supporting Semantic Web developers
– Ontology designers, vocabulary discovery, who’s using my ontologies or data?, use analysis, errors, statistics, etc.
2 Searching specialized collections
– Spire: aggregating observations and data from biologists
– InferenceWeb: searching over and enhancing proofs
– SemNews: Text Meaning of news stories
3 Supporting SW tools
– Triple shop: finding data for SPARQL queries

1 Supporting Semantic Web developers

80 ontologies were found that had these three terms. By default, ontologies are ordered by their ‘popularity’, but they can also be ordered by recency or size. Let’s look at this one.

Basic Metadata
hasDateDiscovered: 2005-01-17
hasDatePing: 2006-03-21
hasPingState: PingModified
type: SemanticWebDocument
isEmbedded: false
hasGrammar: RDFXML
hasParseState: ParseSuccess
hasDateLastmodified: 2005-04-29
hasDateCache: 2006-03-21
hasEncoding: ISO-8859-1
hasLength: 18K
hasCntTriple: 311.00
hasOntoRatio: 0.98
hasCntSwt: 94.00
hasCntSwtDef: 72.00
hasCntInstance: 8.00

(Screenshot callouts)
• rdfs:range was used 41 times to assert a value.
• owl:ObjectProperty was instantiated 28 times.
• time:Cal… was defined once and used 24 times (e.g., as range).

These are the namespaces this ontology uses. Clicking on one shows all of the documents using the namespace. All of this is available in RDF form for the agents among us.

Here’s what the agent sees. Note the swoogle and wob (web of belief) ontologies.

We can also search for terms (classes, properties), such as terms for “person”.

10K terms associated with “person”! Ordered by use.
Let’s look at foaf:Person’s metadata.

87K documents used foaf:gender with a foaf:Person instance as the subject.

3K documents used dc:creator with a foaf:Person instance as the object.

Swoogle’s archive saves every version of a SWD it’s seen.

2 Searching specialized collections
An NSF ITR collaborative project with
• University of Maryland, Baltimore County
• University of Maryland, College Park
• U. of California, Davis
• Rocky Mountain Biological Laboratory

An invasive species scenario
• Nile Tilapia fish have been found in a California lake.
• Can this invasive species thrive in this environment?
• If so, what will be the likely consequences for the ecology?
• So… we need to understand the effects of introducing this fish into the food web of a typical California lake.

Food Webs
• A food web models the trophic (feeding) relationships between organisms in an ecology
– Food web simulators are used to explore the consequences of changes in the ecology, such as the introduction or removal of a species
– A location’s food web is usually constructed from studies of the frequencies of the species found there and the known trophic relations among them.
• Goal: automatically construct a food web for a new location using existing data and knowledge
• ELVIS: Ecosystem Location Visualization and Information System

East River Valley Trophic Web
http://www.foodwebs.org/

Species List Constructor
Click a county, get a species list.

The problem
• We have data on what species are known to be in the location and can further restrict and fill in with other ecological models.
• But we don’t know which of these the Nile Tilapia eats or who might eat it.
• We can reason from taxonomic data (similar species) and known natural history data (size, mass, habitat, etc.) to fill in the gaps.

Food Web Constructor
Predict food web links using database and taxonomic reasoning.
In a new estuary, Nile Tilapia could compete with ostracods (green) to eat algae. Predators (red) and prey (blue) of ostracods may be affected.

Evidence Provider
Examine evidence for predicted links.

Status
• Goal is ELVIS (Ecosystem Location Visualization and Information System) as an integrated set of web services for constructing food webs for a given location.
• Background ontologies
– SpireEcoConcepts: concepts and properties to represent food webs, and ELVIS-related tasks, inputs and outputs
– ETHAN (Evolutionary Trees and Natural History): concepts and properties for ‘natural history’ information on species derived from data in the Animal Diversity Web and other taxonomic sources
• Under development
– Connect to visualization software
– Connect to triple shop to discover more data

3 Supporting SW tools
UMBC Triple Shop
• http://sparql.cs.umbc.edu/
• Online SPARQL RDF query processing with several interesting features
• Automatically finds SWDs for given queries using the Swoogle backend database
• Datasets, queries and results can be saved, tagged, annotated, shared, searched for, etc.
• RDF datasets as first-class objects
– Can be stored on our server or downloaded
– Can be materialized in a database or (soon) as a Jena model

Web-scale semantic web data access
(Diagram: an agent, a data access service, and the Web.)
• Search vocabulary: the agent asks (“person”); the service searches URIrefs in the SW vocabulary and informs (“foaf:Person”)
• Compose query: the agent asks (“?x rdf:type foaf:Person”); the service searches URLs in the SWD index and informs (doc URLs)
• Populate RDF database: the agent fetches docs from the Web
• Query local RDF database
• The service indexes RDF data from the Web

Who knows Anupam Joshi? Show me their names, email addresses and pictures.

The UMBC ebiquity site publishes lots of RDF data, including FOAF profiles.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p2name ?p2mbox ?p2pix
FROM ???
WHERE {
  ?p1 foaf:surname "Joshi" .
  ?p1 foaf:firstName "Anupam" .
  ?p1 foaf:mbox ?p1mbox .
  ?p2 foaf:knows ?p3 .
  ?p3 foaf:mbox ?p1mbox .
  ?p2 foaf:name ?p2name .
  ?p2 foaf:mbox ?p2mbox .
  OPTIONAL { ?p2 foaf:depiction ?p2pix } .
}
ORDER BY ?p2name
(Callout: No FROM clause!)

Log in, specify dataset, enter query w/o FROM clause!
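
The step that the Triple Shop automates here, finding documents for a query that has no FROM clause, can be sketched roughly as follows. This is only an illustration in standard-library Python, not the Triple Shop's actual implementation: the function names `vocabulary_terms` and `add_from_clauses` and the example.org document URLs are hypothetical, and the regular expressions handle only simple queries like the one on the slide. The first function expands the prefixed names in a query into full URIrefs (the terms one would look up in a Swoogle-like index); the second adds one FROM clause per candidate document.

```python
import re

# The query from the slide, written without a FROM clause.
QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p2name ?p2mbox ?p2pix
WHERE {
  ?p1 foaf:surname "Joshi" .
  ?p1 foaf:firstName "Anupam" .
  ?p1 foaf:mbox ?p1mbox .
  ?p2 foaf:knows ?p3 .
  ?p3 foaf:mbox ?p1mbox .
  ?p2 foaf:name ?p2name .
  ?p2 foaf:mbox ?p2mbox .
  OPTIONAL { ?p2 foaf:depiction ?p2pix } .
}
ORDER BY ?p2name"""


def vocabulary_terms(query):
    """Return the sorted full URIrefs of every prefixed name used in the query."""
    prefixes = dict(re.findall(r"PREFIX\s+(\w+):\s*<([^>]+)>", query))
    # Remove the PREFIX declarations so they are not matched as terms.
    body = re.sub(r"PREFIX\s+\w+:\s*<[^>]+>", "", query)
    terms = set()
    for prefix, local in re.findall(r"\b(\w+):(\w+)", body):
        if prefix in prefixes:
            terms.add(prefixes[prefix] + local)
    return sorted(terms)


def add_from_clauses(query, doc_urls):
    """Insert one FROM clause per document URL just before the WHERE keyword."""
    from_block = "".join(f"FROM <{url}>\n" for url in doc_urls)
    # count=1: only the first WHERE that starts a line is touched.
    return re.sub(r"^WHERE\b", lambda m: from_block + m.group(0),
                  query, count=1, flags=re.MULTILINE)


if __name__ == "__main__":
    # Terms an agent would send to the search service.
    for term in vocabulary_terms(QUERY):
        print(term)
    # Hypothetical candidate documents returned by the search service.
    candidates = ["http://example.org/a.rdf", "http://example.org/b.rdf"]
    print(add_from_clauses(QUERY, candidates))
```

The resulting query can then be handed to any SPARQL processor that dereferences FROM URLs, which matches the fetch, populate and query steps of the data access diagram.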
302 RDF documents were found that might have useful data.

We’ll select them all and add them to the current dataset.

We’ll run the query against this dataset to see if the results are as expected.

The results can be produced in any of several formats.

Looks like a useful dataset. Let’s save it and also materialize it in the TS triple store.

We can also annotate, save and share queries.

Work in Progress
• There are a host of performance issues
• We plan on supporting some special datasets, e.g.,
– FOAF data collected from Swoogle
– Definitions of RDF and OWL classes and properties from all ontologies that Swoogle has discovered
• Expanding constraints to select candidate SWDs to include arbitrary metadata and embedded queries
– FROM “documents trusted by a member of the SPIRE project”
• We will explore two models for making this useful
– As a downloadable application for client machines
– As an (open source?) downloadable service for servers supporting a community of users

This talk
• Motivation
• Swoogle Semantic Web search engine
• Use cases and applications
• State of the Semantic Web
• Conclusions

Will Swoogle Scale? How?
Here’s a rough estimate of the data in RDF documents on the semantic web, based on Swoogle’s crawling:

System/date   Terms      Documents   Individuals   Triples    Bytes
Swoogle2      1.5x10^5   3.5x10^5    7x10^6        5x10^7     7x10^9
Swoogle3      2x10^5     7x10^5      1.5x10^7      7.5x10^7   1x10^10
2006          1x10^6     5x10^7      5x10^7        5x10^9     5x10^11
2008          5x10^6     5x10^9      5x10^9        5x10^11    5x10^13

We think Swoogle’s centralized approach can be made to work for the next few years, if not longer.

How much reasoning should Swoogle do?
• SwoogleN (N<=3) does limited reasoning
– It’s expensive
– It’s not clear how much should be done
• More reasoning would benefit many use cases
– e.g., type hierarchy
• Recognizing specialized metadata
– e.g., that ontology A maps some terms from B to C

An RDF Dictionary
• We hope to develop an RDF dictionary.
• Given an RDF term, it returns a graph of its definition
– Term: definition from the “official” ontology
– Term+URL: definition from the SWD at URL
– Term+*: union definition
– An optional argument recursively adds definitions of terms in the definition, excluding RDFS and OWL terms
– An optional argument identifies more namespaces to exclude

Conclusion
• The web will contain the world’s knowledge in forms accessible to people and computers
– We need better ways to discover, index, search and reason over SW knowledge
• SW search engines address different tasks than HTML search engines
– So they require different techniques and APIs
• Swoogle-like systems can help create consensus ontologies and foster best practices
– Swoogle is for Semantic Web 1.0
– Semantic Web 2.0 will make different demands

For more information
http://ebiquity.umbc.edu/
Annotated in OWL
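
As an appendix to the RDF dictionary idea, the lookup could behave roughly as in the sketch below. This is a toy illustration under stated assumptions, not Swoogle's design: triples are plain Python tuples of strings, any string starting with "http" is treated as a URIref and everything else as a literal, the function name `define` is hypothetical, and the five sample triples are made up for the example. It implements the "Term+*" union definition over whatever triples it is given, with the optional recursion and extra excluded namespaces.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Toy triple store (illustrative data only).
TRIPLES = [
    (FOAF + "Person", RDF + "type", OWL + "Class"),
    (FOAF + "Person", RDFS + "subClassOf", FOAF + "Agent"),
    (FOAF + "Agent", RDF + "type", OWL + "Class"),
    (FOAF + "Agent", RDFS + "label", "Agent"),
    (FOAF + "Document", RDF + "type", OWL + "Class"),
]


def define(term, triples, recursive=False, exclude=()):
    """Return the set of triples whose subject is `term`.

    With recursive=True, also add the definitions of the terms that appear
    in the definition, skipping RDFS and OWL terms and any namespace
    listed in `exclude`.
    """
    skip = (RDFS, OWL) + tuple(exclude)
    graph, todo, seen = set(), [term], set()
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        for s, p, o in triples:
            if s != current:
                continue
            graph.add((s, p, o))
            if recursive:
                for ref in (p, o):
                    # Assumption: strings starting with "http" are URIrefs.
                    if ref.startswith("http") and not ref.startswith(skip):
                        todo.append(ref)
    return graph


if __name__ == "__main__":
    print(len(define(FOAF + "Person", TRIPLES)))
    print(len(define(FOAF + "Person", TRIPLES, recursive=True)))
```

Passing `exclude=[RDF]` would also keep the recursion from looking up rdf:type, which mirrors the slide's second optional argument.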