SIMILE: Practical Metadata for the Semantic Web

Stefano Mazzocchi, Researcher at MIT, Application Catalyst at Metaweb Technologies, Inc.
Stephen Garland, Principal Research Scientist, Emeritus, MIT Computer Science and Artificial Intelligence Laboratory
Ryan Lee, W3C Research Engineer
January 26, 2005 - XML.com
Massachusetts Institute of Technology (MIT) Research Activity

Alireza Abbasi
Technology Management, Economics and Policy Program (TEMEP), College of Eng., SNU

SIMILE Project
- Focused on collecting and publishing Semantic Web data to the (non-Semantic) Web.
- Researching solutions to data interoperability problems for digital libraries using Semantic Web technologies.
- RDF-based tools:
  - Longwell
  - Gadget
  - RDFizer
  - Welkin
  - Fresnel
  - Timeline *new
  - Referee *new
  - Crowbar *new
  - Piggy Bank *new
  - Solvent *new

Introduction
- The digital libraries' problem:
  - Browsing digital libraries is a difficult process of navigating through different interfaces and different terminologies for each collection.
- SIMILE Project [Semantic Interoperability of Metadata In unLike Environments]
  - Make it easier to wander from collection to collection,
  - and, more generally, to find your way around in the Semantic Web.
- Motivated by DSpace
  - A repository for storing, indexing, preserving, and redistributing digital assets. It manages metadata about the content and distributes it on the web.

DSpace
- Jointly developed by HP Research Labs and the MIT Libraries (open source software).
- Used by many research-producing organizations, and often by their libraries, to manage digital data and for researchers to find that data.

DSpace (2)
- Needs to support additional metadata schemas for a variety of purposes:
  - finding digital research material described in various, domain-specific ways,
  - managing that digital content over time in order to preserve it.
- As DSpace expands to use new metadata schemas, it will have to deal with the problem of interoperability.
Incentive of SIMILE
- The Semantic Web core stack (RDF, RDFS, and OWL) enables people to create ontologies to describe their specialized metadata and to make them generally reusable.
- But most people are not trained Semantic Web developers.
- So they need tools to do this and to assess whether they did the job correctly.

Goals of SIMILE
- To extend DSpace, enhancing support for arbitrary schemas and metadata and providing an architecture for disseminating digital assets.
- To create the tools that metadata specialists (e.g., librarians) need to produce good-quality RDF.
  - These specialists often have limited expertise in defining ontologies, creating RDF, and converting existing XML-based metadata into RDF.
- To make metadata interoperability easier for digital libraries by providing useful tools for browsing, searching, and mapping heterogeneous metadata in RDF.

SIMILE: Delivered Components
- Tools for metadata managers
  - Gadget: XML inspector
  - RDFizers: batch tools to transform existing XML data into RDF
  - Solvent*: Firefox extension for JavaScript screen scraping
  - Welkin: graphical tool to inspect/edit RDF graphs
- Tools for end users
  - Longwell: web-based RDF faceted metadata browser
  - Fresnel: vocabulary for specifying how RDF graphs are presented
  - Piggy Bank*: Firefox extension for personal information management of metadata in RDF
  - Semantic Bank*: web-based server that allows data publishing and sharing by individuals, groups, or communities
  - Exhibit*: lightweight structured data publishing framework
  - Timeline*: AJAX widget for visualizing time-based events
*: new tools after the paper

SIMILE: Tools for Metadata Managers
- RDFizers: batch tools to transform existing XML data into RDF
- Gadget: XML inspector
- Welkin: graphical tool to inspect/edit RDF graphs
- Solvent*: Firefox extension for JavaScript screen scraping
*: new tools after the paper

RDFizers: Transform XML data into RDF
- RDF's strength is defining models that work in a highly distributed environment.
- But the RDF/XML serialization is a very unfriendly compromise.
- So the RDFizers project was created
  - to create and catalog software tools and scripts that transform data from existing syntaxes into RDF,
  - so that people can explore their existing data in available RDF browsing tools.
- It helps to resolve the Semantic Web chicken-and-egg problem:
  - "Not much RDF data will be created without a killer app, but no killer app will be created without more RDF data."
- Solution: make it easier for specialists (like librarians and other metadata experts) to convert popular and widely available metadata sources into RDF.

RDFizers (2)
- Done with XSLT style sheets and simple scripts
- Need to define RDF "ontologies" for each
- List of RDFizers in SIMILE:
  - MARC/MODS -> RDF
  - OAI-PMH -> RDF
  - OCW -> RDF
  - EMail -> RDF
  - BibTeX -> RDF
  - Flat -> RDF
  - Weather -> RDF
  - Java -> RDF
  - Javadoc -> RDF
  - Jira -> RDF
  - Subversion -> RDF
  - Random -> RDF

Gadget: XML inspector
- Problem in transforming existing XML datasets into RDF:
  - lack of tools that give an at-a-glance overview of an XML dataset (or a collection of XML documents).
- Gadget helps data managers understand the structure of an XML dataset by providing a summary of the count, unique values, and percentage of unique values for XML attributes.
- Works on any well-formed XML
- Used for
  - Data exploration, understanding
  - Data migration, transformation
  - Data cleanup
  - Complexity evaluation
  - Schema adherence understanding
  - Schema emergence (if none provided)

Gadget: sample
OCW: 2,002,015 lines of XML

Welkin: Graphical tool to inspect/edit RDF graph
- Configuring tools like Longwell requires a thorough understanding of the structure of the data being examined.
  - It is hard to get a global overview of an RDF model,
  - and few tools exist for summarizing RDF and giving a quick mental model of the data being manipulated with a browser.
- So Welkin was created:
  - an interactive graphical RDF browser that visualizes any RDF model without requiring prior configuration (like Knowle, but unlike Longwell),
  - displays RDF as a clustered set of nodes and arcs,
  - useful for understanding and mining the layout of unfamiliar datasets,
  - tries to empower the user with an interactive approach,
    - allowing users to mine, zoom, drag, select, cluster, filter, and highlight nodes and arcs.

Welkin: Graphical tool to inspect/edit RDF graph

Solvent (new*): Easier Scraping to RDF
- A Firefox extension that helps write JavaScript screen scrapers for Piggy Bank.
- Motivation:
  - To turn a regular web page into a semantic web page, freeing the data from the page/site that contains it.
  - Unfortunately, not many web pages embed or link to RDF information.
  - Piggy Bank needs web pages to embed information in RDF.
  - Piggy Bank can execute a particular screen scraper on particular pages in order to "extract" the information it needs.

Solvent (example)

SIMILE: Tools for End-Users
- Longwell: web-based RDF faceted metadata browser
- Fresnel: vocabulary for specifying how RDF graphs are presented
- Piggy Bank*: Firefox extension for personal information management of metadata in RDF
- Semantic Bank*: web-based server that allows data publishing and sharing by individuals, groups, or communities
- Exhibit*: lightweight structured data publishing framework
*: new tools after the paper

Longwell: RDF faceted metadata browser
- RDF browsing for library users
  - Longwell, a web-based, RDF-powered, highly configurable faceted browser
    - targets end users by hiding the presence of the underlying RDF model.
  - Knowle (shipped as part of the Longwell distribution), a node-focused graph navigation browser
    - targeted at people who want to see or debug the underlying RDF model.
- The browsing suite is written as Java servlets and is built around HP's Jena2 Semantic Web toolkit.

Longwell (sample)

Haystack: extensible "universal information client"
- Enables users to manage diverse sources of information (e.g., email, calendars, address books, and web pages)
  - by defining whichever arrangements of, connections between, and views of information they find most effective.
- The interaction offered by a web-browser interface is too limited, so the Haystack project is exploring a "rich client" interface
  - that allows RDF data to be manipulated as well as navigated.
- Unlike Welkin, which displays information as a graph, Haystack aims for a Longwell-like presentation of information that is natural for simple end users.
- It uses standard primitives like drag and drop and context menus to give users access to various operations on the data being viewed at any given time.
- It is currently being repackaged as a plugin for the Eclipse platform.

Fresnel: vocabulary for specifying how RDF graphs are presented
- In working on RDF browsing for both SIMILE and Haystack, the developers found that it is better to have a general ontology governing how to display RDF,
  - a kind of stylesheet for RDF that allows one to indicate how some abstract data should be presented to the user.
- Together with other members of the Semantic Web development community, SIMILE is working on putting together Fresnel, a generic ontology for describing how to render RDF in a human-friendly manner.

Piggy Bank*: information management of metadata in RDF
- Firefox extension for managing metadata
- Loads RDF into a local Longwell server
- Search and faceted browse of local RDF
- Views defined by library, other users
- Users can find, collect, annotate RDF
- Can then publish for access by others
©MIT CNI Spring 2006

Piggy Bank* (Sample)

Semantic Bank*: Web-based server that allows data publishing and sharing by individuals, groups, or communities
- To persist remotely, share, and publish data on a server
- For individuals, groups, communities
  - e.g., conference proceedings
- Ability to tag resources
- Longwell faceted browsing view of published information

Exhibit*: create web pages with support for sorting, filtering, and rich visualizations

SIMILE Categories of Work

Projects after this Paper - Done
- Timeplot: a cross-browser DHTML (canvas-based) time series plotting widget.
- Timeline: a DHTML AJAX timeline widget for visualizing temporal information.

Projects after this Paper - Ongoing
- Piggy Bank: an extension to Firefox that turns it into a Semantic Web browser, letting you make use of existing information on the Web in more useful and flexible ways not offered by the original Web sites.
- Semantic Bank: the server companion of Piggy Bank that lets you persist, share, and publish data collected by individuals, groups, or communities.
- Solvent: a Firefox extension that helps you write JavaScript screen scrapers for Piggy Bank.
- jsTeX: a JavaScript library that can interpret some (basic) TeX encodings and transform them into HTML directly on a web page.
- Citeline: a web application that facilitates the web publishing of bibliographies and citation collections as interactive exhibits, and the sharing of this type of data.
- Zotz: a Firefox add-on that lets you publish citations from your Zotero to an Exhibit (via Citeline) in one step.

Projects after this Paper - Ongoing (2)
- Referee: reads your web server logs, crawls your referrers (the links that point to your pages), and extracts metadata from those pages and from the text around the links that point to your pages.
- Babel: lets you convert between various data formats.
- Exhibit: lets you create web pages with support for sorting, filtering, and rich visualizations by writing only HTML and optionally some CSS and JavaScript code.
- Appalachian: a Firefox add-on that adds the ability to manage and use several OpenIDs to ease the login parts of your browsing experience.
- Seek: adds faceted browsing features to Mozilla Thunderbird and lets you search through your email more effectively.

An Incomplete Picture
- For metadata specialists and system developers:
  - What about editing RDF?
    - http://www.altova.com/features_RDF.html
    - http://www.cs.rpi.edu/~puninj/rdfeditor
    - http://rhodonite.angelite.nl
  - What about building new ontologies?
    - Universidad Politécnica de Madrid's School of Computing (FIUPM) has developed a new method for building multilingual ontologies that can be applied to the Semantic Web.
  - What about storing vast quantities of (potentially distributed) RDF and accessing it efficiently?
    - http://tucana.es.northropgrumman.com/solutions/technology.htm
  - What about using performance-enhancing techniques (such as caching) for RDF?
  - What about quickly inferencing over RDF data?
- For users:
  - Can we design faceted browsing interfaces that scale to dozens of RDF ontologies?
  - How about improving navigation across the linkages between ontologies?
  - How can we support searching that will start in one domain/ontology and expand into relevant related domains/ontologies?

References
- "SIMILE: Practical Metadata for the Semantic Web," by Stefano Mazzocchi, Stephen Garland, and Ryan Lee, January 26, 2005. http://www.xml.com/pub/a/2005/01/26/simile.html
- http://simile.mit.edu/
- http://en.wikipedia.org/wiki/SIMILE
- "MIT's SIMILE Project: Demonstrating Practical Value of Semantic Web Technology for Digital Libraries," by MacKenzie Smith, MIT Libraries.
- "Tutorial – Semantic Digital Libraries, Comparison and the Future," by Sebastian R. Kruk, Bernhard Haslhofer, Philipp Nußbaumer, Sandy Payette, and Tomasz Woroniecki, Univ. of Vienna, 2007.

Faceted browsing
- A technique for accessing a collection of information represented using a faceted classification, allowing users to explore by filtering the available information.
- Displays only the metadata fields that are configured to be "facets" (i.e., to be important for the user browsing data in one or more specific domains),
  - using values for those fields as a means of zooming into a collection by selecting those items with a particular field-value pair (e.g., 26 works of art in the example dataset have a subject of Abstract Expressionism).
- Provides a mechanism that allows users to explore different schemas from different domains
  - with a unified interface and to discover the synergies across them.
  - For example, the interface can be designed to show users that one schema uses a "subject" facet while another uses a "topic" facet for similar information.

Welkin (sample)
- Welkin is used to browse a fragment of the MIT OpenCourseWare metadata converted to RDF.

Timeline*: visualizing temporal information

Behind the Curtain
- Four groups support SIMILE:
  - HP Research Labs, the W3C, MIT Libraries, and MIT CSAIL.
- The principal investigators have included
  - Mick Bass, Eric Miller, MacKenzie Smith, and David Karger.
- The developers are
  - Stefano Mazzocchi, Stephen Garland, and Ryan Lee.
  - Mark Butler (bootstrapper of the Longwell project)
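
Faceted browsing (sketch)
The faceted browsing technique described in the "Faceted browsing" slide above (configured facet fields, value counts per facet, and zooming in by a field-value pair) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not how Longwell is implemented: the sample records, the field names, the FACET_ALIASES mapping, and the function names are all hypothetical, and plain Python dictionaries stand in for RDF data.

```python
from collections import Counter

# Hypothetical records from two collections: one schema uses "subject",
# the other uses "topic" for similar information.
ITEMS = [
    {"title": "Work A", "subject": "Abstract Expressionism", "medium": "oil"},
    {"title": "Work B", "subject": "Cubism", "medium": "oil"},
    {"title": "Lecture C", "topic": "Abstract Expressionism", "medium": "video"},
]

# Assumed mapping that presents schema-specific fields under one facet name.
FACET_ALIASES = {"topic": "subject"}


def normalize(item):
    """Return a copy of the item with aliased fields renamed to the unified facet."""
    return {FACET_ALIASES.get(field, field): value for field, value in item.items()}


def facet_counts(items, facets):
    """For each configured facet, count how many items carry each value."""
    counts = {facet: Counter() for facet in facets}
    for item in map(normalize, items):
        for facet in facets:
            if facet in item:
                counts[facet][item[facet]] += 1
    return counts


def select(items, facet, value):
    """Zoom into the collection: keep items that have the given field-value pair."""
    return [item for item in items if normalize(item).get(facet) == value]


if __name__ == "__main__":
    # Only the configured facets are summarized; "title" is not a facet here.
    for facet, values in facet_counts(ITEMS, ["subject", "medium"]).items():
        print(facet, dict(values))
    for item in select(ITEMS, "subject", "Abstract Expressionism"):
        print(item["title"])
```

The alias table is one simple way to show a "subject" facet and a "topic" facet through a unified interface; a real RDF-based browser would more likely express such a relationship in the ontology itself.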
