SIMILE: Practical Metadata for the Semantic Web

Stefano Mazzocchi, Researcher at MIT, Application Catalyst at Metaweb Technologies, Inc.
Stephen Garland, Principal Research Scientist, Emeritus, MIT Computer Science and Artificial Intelligence Laboratory
Ryan Lee, W3C Research Engineer
January 26, 2005 - XML.com
Massachusetts Institute of Technology (MIT) Research Activity
Alireza Abbasi
Technology Management, Economics and Policy Program (TEMEP),
College of Eng., SNU
SIMILE Project
 Focused on collecting and publishing Semantic Web data to the
(non-Semantic) Web.
 Researching solutions to data interoperability problems for digital
libraries using semantic web technologies.
 RDF-based Tools
 Longwell
 Gadget
 RDFizer
 Welkin
 Fresnel
 Timeline *new
 Referee *new
 Crowbar *new
 Piggy Bank *new
 Solvent *new
Introduction
 Digital Libraries' Problem:
    Browsing digital libraries is a difficult process of navigating through different interfaces and different terminologies for each collection.
 SIMILE Project [Semantic Interoperability of Metadata and Information in unLike Environments]
    Makes it easier to wander from collection to collection
    and, more generally, to find your way around in the Semantic Web.
 Motivated by DSpace
    A repository for storing, indexing, preserving, and redistributing digital assets.
    Manages metadata about the content and distributes it on the web.
DSpace
 Jointly developed by HP Research Labs and the MIT Libraries (open source software).
 Used by many research-producing organizations, and often by their libraries, to manage digital data and for researchers to find that data.
DSpace (2)
 Needs to support additional metadata schemas for a variety
of purposes:
 finding digital research material described in various, domain-
specific ways,
 managing that digital content over time in order to preserve it.
 As DSpace expands to use new metadata schemas,
it will have to deal with the problem of interoperability.
Incentive of SIMILE
 The Semantic Web core stack (RDF, RDFS, and OWL) enables people to create ontologies to describe their specialized metadata and to make them generally reusable.
 But most people are not trained Semantic Web developers.
 So they need tools to do this and to assess whether they did the job correctly.
Goals of SIMILE
 To extend DSpace, enhancing support for arbitrary schemas and metadata and providing an architecture for disseminating digital assets.
 To create the tools that metadata specialists (e.g., librarians) need to produce good-quality RDF,
    because they have limited expertise in defining ontologies, creating RDF, and converting existing XML-based metadata into RDF.
 To make metadata interoperability easier for digital libraries by providing useful tools for browsing, searching, and mapping heterogeneous metadata in RDF.
SIMILE – Delivered Components
 Tools for Metadata Managers
 Gadget - XML inspector
 RDFizers - Batch tools to transform existing XML data into RDF
 Solvent* - Firefox extension for Javascript screen scraping
 Welkin - Graphical tool to inspect/edit RDF graph
 Tools for End-Users
 Longwell - Web-based RDF faceted metadata browser
 Fresnel - Vocabulary for specifying how RDF graphs are presented
 Piggy Bank* - Firefox extension for personal info. management of metadata in RDF
 Semantic Bank* - Web-based server that allows data publishing and sharing by
individuals, groups, or communities
 Exhibit* - lightweight structured data publishing framework
 Timeline* - AJAXy widget for visualizing time-based events
*: new tools after the paper
SIMILE: Tools for Metadata Managers
 RDFizers
 Batch tools to transform existing XML data into RDF
 Gadget
 XML inspector
 Welkin
 Graphical tool to inspect/edit RDF graph
 Solvent*
 Firefox extension for Javascript screen scraping
*: new tools after the paper
RDFizers: Transform XML data into RDF
 RDF's strength is in defining data models for highly distributed environments.
 But the RDF/XML serialization is a very unfriendly compromise.
 So the RDFizers project was created
    to create and catalog software tools and scripts that can transform data from existing syntaxes into RDF,
    which allows people to explore their existing data in available RDF browsing tools.
 It helps to resolve the Semantic Web chicken-and-egg problem:
    "not much RDF data will be created without a killer app, but no killer app will be created without more RDF data"
    Solution: making it easier for specialists (like librarians and other metadata experts) to convert popular and widely available metadata sources into RDF.
RDFizers (2)
 Done with XSLT style sheets, simple scripts
 Need to define RDF “ontologies” for each
 List of RDFizers in SIMILE:
    MARC/MODS → RDF
    OAI-PMH → RDF
    OCW → RDF
    EMail → RDF
    BibTeX → RDF
    Flat → RDF
    Weather → RDF
    Java → RDF
    Javadoc → RDF
    Jira → RDF
    Subversion → RDF
    Random → RDF
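The kind of transformation an RDFizer performs can be sketched in a few lines. The slides say the real RDFizers use XSLT style sheets and simple scripts; the Python below is only an illustrative stand-in, not code from the SIMILE project. The XML layout (`<record id="...">` with flat child elements), the `EX` namespace, and the function names are all assumptions made for this example. Output is written as N-Triples, one statement per line.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for this example; not a real SIMILE ontology.
EX = "http://example.org/schema#"


def escape_literal(value: str) -> str:
    """Escape a string so it can be used as an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def rdfize(xml_text: str, base_uri: str) -> list[str]:
    """Map each <record id="..."> to a subject and each child element to one triple."""
    root = ET.fromstring(xml_text)
    triples = []
    for record in root.iter("record"):
        subject = f"<{base_uri}{record.get('id')}>"
        for child in record:
            text = (child.text or "").strip()
            if text:  # skip empty elements
                literal = escape_literal(text)
                triples.append(f'{subject} <{EX}{child.tag}> "{literal}" .')
    return triples


# Made-up sample data for illustration only.
SAMPLE = """<records>
  <record id="r1"><title>Semantic Web Primer</title><year>2004</year></record>
  <record id="r2"><title>Metadata "Basics"</title></record>
</records>"""

if __name__ == "__main__":
    for line in rdfize(SAMPLE, "http://example.org/item/"):
        print(line)
```

The element name is reused directly as the property name here; a real RDFizer would instead map each source field to a term in an RDF ontology defined for that format, as the slide notes.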
Gadget: XML inspector
 Problem in transforming existing XML datasets into RDF:
    a lack of tools that give you an at-a-glance overview of an XML dataset (or a collection of XML documents).
 Gadget helps data managers understand the structure of an XML dataset by providing a summary of the count, unique values, and percentage of unique values for XML attributes.
 Works on any well-formed XML
 Used for
    Data exploration, understanding
    Data migration, transformation
    Data cleanup
    Complexity evaluation
    Schema adherence understanding
    Schema emergence (if none provided)
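The summary described above (count, unique values, and percentage of unique values) can be approximated with a short script. This is a minimal sketch, not Gadget's implementation: the path notation (`/a/b/@attr`), the decision to summarize element text as well as attributes, and the sample data are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def summarize(xml_text: str) -> dict[str, dict[str, float]]:
    """Return count, unique count, and unique percentage per element/attribute path."""
    root = ET.fromstring(xml_text)
    values = defaultdict(list)  # path -> every value seen at that path

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        text = (elem.text or "").strip()
        if text:
            values[here].append(text)
        for name, value in elem.attrib.items():
            values[f"{here}/@{name}"].append(value)
        for child in elem:
            walk(child, here)

    walk(root, "")
    summary = {}
    for path, vals in values.items():
        unique = len(set(vals))
        summary[path] = {
            "count": len(vals),
            "unique": unique,
            "unique_pct": 100.0 * unique / len(vals),
        }
    return summary


# Made-up sample data for illustration only.
SAMPLE = """<courses>
  <course dept="EECS"><title>Algorithms</title></course>
  <course dept="EECS"><title>Databases</title></course>
  <course dept="Math"><title>Algorithms</title></course>
</courses>"""

if __name__ == "__main__":
    for path, stats in sorted(summarize(SAMPLE).items()):
        print(path, stats)
```

A path whose unique percentage is close to 100 is a likely identifier, while one with only a handful of distinct values is a likely controlled vocabulary, which is the sort of hint that helps when no schema is provided.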
Gadget: sample
OCW: 2,002,015 Lines of XML
Welkin: Graphical tool to inspect/edit RDF graph
 Configuring tools like Longwell requires a thorough understanding of the structure of the data being examined.
    It is hard to get a global overview of an RDF model,
    and there are few tools for summarizing RDF and giving a quick mental model of the data being manipulated with a browser.
 So Welkin was created:
    an interactive graphical RDF browser that visualizes any RDF model without requiring prior configuration (like Knowle, but unlike Longwell);
    displays RDF as a clustered set of nodes and arcs;
    useful for understanding and mining the layout of unfamiliar datasets;
    tries to empower the user with an interactive approach, allowing users to mine, zoom, drag, select, cluster, filter, and highlight nodes and arcs.
Solvent (new*): Easier Scraping to RDF
 A Firefox extension that helps write JavaScript screen scrapers for Piggy Bank.
 Motivation:
    Piggy Bank turns a regular web page into a semantic web page, freeing the data from the page/site that contains it.
    To do so, Piggy Bank needs web pages to embed information in RDF.
    Unfortunately, not many web pages embed or link to RDF information.
    So Piggy Bank can execute a particular screen scraper on particular pages in order to "extract" the information it needs.
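The idea of "extracting" structured data from a page that carries no RDF can be illustrated with a small scraper. Solvent itself produces JavaScript scrapers that run inside Piggy Bank; the Python below is only an analogue of the technique and does not reflect Solvent's or Piggy Bank's actual APIs. The page markup (`<a class="title">` links) and the choice of the Dublin Core `title` property are assumptions for this example.

```python
from html.parser import HTMLParser

# Dublin Core "title" property, used here as the predicate for every link text.
DC_TITLE = "http://purl.org/dc/elements/1.1/title"


class TitleScraper(HTMLParser):
    """Collect (href, text) pairs from <a class="title" href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments collected inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "title":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.items.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape(html: str) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, literal) triples for each matching link."""
    parser = TitleScraper()
    parser.feed(html)
    parser.close()
    return [(href, DC_TITLE, title) for href, title in parser.items]


# Made-up page fragment for illustration only.
PAGE = (
    '<ul>'
    '<li><a class="title" href="http://example.org/p/1">First paper</a></li>'
    '<li><a href="/about">About</a></li>'
    '<li><a class="title" href="http://example.org/p/2">Second <b>paper</b></a></li>'
    '</ul>'
)

if __name__ == "__main__":
    for triple in scrape(PAGE):
        print(triple)
```

Because a scraper like this depends on one site's markup, each site needs its own scraper, which is why the slides describe running a particular scraper on particular pages.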
Solvent (example)
SIMILE: Tools for End-Users
 Longwell
    Web-based RDF faceted metadata browser
 Fresnel
    Vocabulary for specifying how RDF graphs are presented
 Piggy Bank*
    Firefox extension for personal info. management of metadata in RDF
 Semantic Bank*
    Web-based server that allows data publishing and sharing by individuals, groups, or communities
 Exhibit*
    lightweight structured data publishing framework
*: new tools after the paper
Longwell: RDF faceted metadata browser
 RDF browsing for library users
 Longwell, a web-based, RDF-powered, highly configurable faceted browser
    targets users by hiding the presence of the underlying RDF model.
 Knowle (shipped as part of the Longwell distribution), a node-focused graph navigation browser
    targeted at people who want to see or debug the underlying RDF model.
 The browsing suite is written as Java servlets and is built around HP's Jena2 Semantic Web toolkit.
Longwell
(sample)
Haystack: extensible "universal information client"
 Enables users to manage diverse sources of information (e.g., email, calendars, address books, and web pages)
    by defining whichever arrangements of, connections between, and views of information they find most effective.
 The interaction offered by a web-browser interface is too limited, so the Haystack project is exploring a "rich client" interface
    that allows RDF data to be manipulated as well as navigated.
 Unlike Welkin, which displays information as a graph, Haystack aims for a Longwell-like presentation of information that is natural for ordinary end users.
 It uses standard primitives like drag and drop and context menus to give users access to various operations on the data being viewed at any given time.
 It is currently being repackaged as a plugin for the Eclipse platform.
Fresnel: vocabulary for specifying how RDF graphs are
presented
 In working on RDF browsing for both SIMILE and Haystack, the developers found that it is better to have a general ontology governing how to display RDF:
    a kind of stylesheet for RDF that lets developers indicate how they would like to present some abstract data to the user.
 Together with other members of the Semantic Web development community, SIMILE is working on putting together Fresnel, a generic ontology for describing how to render RDF in a human-friendly manner.
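The "stylesheet for RDF" idea can be sketched as a separation between what to show and how to label it. The Python below is a loose illustration only: it does not use Fresnel's vocabulary or syntax, and the dictionaries `LENSES` and `LABELS`, the `render` function, and the sample resource are all hypothetical names invented for this example.

```python
# Hypothetical resource: property name -> value (prefixed names used for brevity).
RESOURCE = {
    "rdf:type": "foaf:Person",
    "foaf:name": "Ada",
    "foaf:mbox": "ada@example.org",
    "ex:internalId": "42",
}

# "What to show": for each class, the properties to display, in order.
LENSES = {"foaf:Person": ["foaf:name", "foaf:mbox"]}

# "How to show it": human-friendly labels for properties.
LABELS = {"foaf:name": "Name", "foaf:mbox": "Email"}


def render(resource: dict, lenses: dict, labels: dict) -> list[str]:
    """Render a resource as labelled lines, using the display rules for its class."""
    # With no rule for the class, fall back to every property in sorted order.
    fallback = sorted(p for p in resource if p != "rdf:type")
    props = lenses.get(resource.get("rdf:type"), fallback)
    return [f"{labels.get(p, p)}: {resource[p]}" for p in props if p in resource]


if __name__ == "__main__":
    for line in render(RESOURCE, LENSES, LABELS):
        print(line)
```

Keeping the display rules as data rather than code is what would let several browsers share the same presentation description, which is the motivation the slide gives for a general display ontology.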
Piggy Bank*: information management of metadata in RDF
 Firefox extension for managing metadata
    Loads RDF into local Longwell server
 Search and faceted browse of local RDF
    Views defined by library, other users
 Users can find, collect, annotate RDF
    Can then publish for access by others
©MIT
CNI Spring 2006
Piggy Bank* (Sample)
Semantic Bank*: Web-based server that allows data
publishing and sharing by individuals, groups, or communities
 To persist remotely, share, and publish data on a server
 For individuals, groups, communities
    e.g. conference proceedings
 Ability to tag resources
 Longwell faceted browsing view of published information
Exhibit*: create web pages with support for sorting, filtering, and rich visualizations
SIMILE Categories of Work
Projects after this Paper - Done
Timeplot
 a cross-browser DHTML (canvas-based) time series
plotting widget.
Timeline
 A DHTML AJAX timeline widget for visualizing
temporal information.
Projects after this Paper - ongoing
 Piggy Bank
    An extension to Firefox that turns it into a Semantic Web browser, letting you make use of existing information on the Web in more useful and flexible ways not offered by the original Web sites.
 Semantic Bank
    The server companion of Piggy Bank that lets you persist, share, and publish data collected by individuals, groups, or communities.
 Solvent
    A Firefox extension that helps you write JavaScript screen scrapers for Piggy Bank.
 jsTeX
    A JavaScript library that can interpret some (basic) TeX encodings and transform them into HTML directly on a web page.
 Citeline
    A web application that facilitates the web publishing of bibliographies and citation collections as interactive exhibits and the sharing of this type of data.
 Zotz
    A Firefox add-on giving you the ability to publish citations from your Zotero to an Exhibit (via Citeline) in one step.
Projects after this Paper - ongoing (2)
 Referee
    Reads your web server logs, crawls your referrers (the links that point to your pages), and extracts metadata from those pages and from the text around the links that point to your pages.
 Babel
    Lets you convert between various data formats.
 Exhibit
    Lets you create web pages with support for sorting, filtering, and rich visualizations by writing only HTML and optionally some CSS and JavaScript code.
 Appalachian
    A Firefox add-on that adds the ability to manage and use several OpenIDs to ease the login parts of your browsing experience.
 Seek
    Adds faceted browsing features to Mozilla Thunderbird and lets you search through your email more effectively.
An Incomplete Picture
 For metadata specialists and system developers,
    What about editing RDF?
       http://www.altova.com/features_RDF.html
       http://www.cs.rpi.edu/~puninj/rdfeditor
       http://rhodonite.angelite.nl
    What about building new ontologies?
       Universidad Politécnica de Madrid's School of Computing (FIUPM) has developed a new method for building multilingual ontologies that can be applied to the Semantic Web.
    What about storing vast quantities of (potentially distributed) RDF and accessing it efficiently?
       http://tucana.es.northropgrumman.com/solutions/technology.htm
    What about using performance-enhancing techniques (such as caching) for RDF?
    What about quickly inferencing over RDF data?
 For users,
    Can we design faceted browsing interfaces that scale to dozens of RDF ontologies?
    How about improving navigation across the linkages between ontologies?
    How can we support searching that will start in one domain/ontology and expand into relevant related domains/ontologies?
References
 SIMILE: Practical Metadata for the Semantic Web,
 by Stefano Mazzocchi, Stephen Garland, Ryan Lee [January 26, 2005]
 http://www.xml.com/pub/a/2005/01/26/simile.html
 http://simile.mit.edu/
 http://en.wikipedia.org/wiki/SIMILE
 “MIT’s SIMILE Project: Demonstrating Practical Value of Semantic Web Technology for
Digital Libraries” by MacKenzie Smith, MIT Libraries
 “Tutorial – Semantic Digital Libraries, Comparison and the Future” by Sebastian R. Kruk,
Bernhard Haslhofer, Philipp Nußbaumer, Sandy Payette, Tomasz Woroniecki, Univ. of Vienna, 2007.
Faceted browsing
 A technique for accessing a collection of information represented using a faceted classification, allowing users to explore by filtering available information.
 Displays only the metadata fields that are configured to be "facets" (i.e., important for the user browsing data in one or more specific domains),
    using values for those fields as a means for zooming into a collection by selecting those items with a particular field-value pair (e.g., 26 works of art in the example dataset have a subject of Abstract Expressionism).
 Provides a mechanism that allows users to explore different schemas from different domains
    with a unified interface and to discover the synergies across them.
    For example, the interface can be designed to show users that one schema uses a "subject" facet while another uses a "topic" facet for similar information.
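The two operations behind the description above, counting the values of each configured facet and narrowing the collection by a field-value pair, can be sketched as follows. This is a minimal in-memory illustration, not Longwell's implementation; the item records, the `FACETS` configuration, and the function names are made up for the example.

```python
from collections import Counter

# Made-up records for illustration only; "title" is deliberately not a facet.
ITEMS = [
    {"title": "Work A", "subject": "Abstract Expressionism", "medium": "Oil"},
    {"title": "Work B", "subject": "Abstract Expressionism", "medium": "Ink"},
    {"title": "Work C", "subject": "Impressionism", "medium": "Oil"},
]

# Only fields configured as facets are offered for browsing.
FACETS = ["subject", "medium"]


def facet_counts(items: list[dict], facets: list[str]) -> dict[str, Counter]:
    """For each facet, count how many items carry each value."""
    return {f: Counter(item[f] for item in items if f in item) for f in facets}


def select(items: list[dict], facet: str, value: str) -> list[dict]:
    """Zoom in: keep only the items with the given field-value pair."""
    return [item for item in items if item.get(facet) == value]


if __name__ == "__main__":
    print(facet_counts(ITEMS, FACETS))
    narrowed = select(ITEMS, "subject", "Abstract Expressionism")
    # Counts are recomputed on the narrowed set so the user sees what remains.
    print(facet_counts(narrowed, FACETS))
```

Recomputing the counts after every selection is what lets the interface show how many items each further choice would leave.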
34
Welkin (sample)
 Welkin is used to browse a fragment of the MIT OpenCourseWare metadata converted to RDF.
Timeline*: visualizing temporal information
Behind the Curtain
 Four groups support SIMILE:
    HP Research Labs, the W3C, MIT Libraries, and MIT CSAIL.
 The principal investigators have included
    Mick Bass, Eric Miller, MacKenzie Smith, and David Karger.
 The developers are
    Stefano Mazzocchi, Stephen Garland, and Ryan Lee.
    Mark Butler (bootstrapper of the Longwell project)