Emerging Information Technologies: The Role of XML, DOIs, OpenURL, and Federated Search

William H. Mischo
Grainger Engineering Library Information Center
University of Illinois at Urbana-Champaign

2002 International Conference on Digital Archive Technologies (ICDAT2002)
December 19, 2002

Outline
• Digital Libraries and the Distributed Information Environment.
• Document Representation and Full-Text.
• Digital Library Tools.
• Illinois Projects.
• XML Technologies.
• Metadata Technologies.
• DOIs, Linking, Local Resolver.
• Portals, Simultaneous Search, Linking.
• Grainger Search Aid.
• Issues & Trends.

The Digital Library
• ‘Digital’, ‘Virtual’, ‘Electronic’ Library as network-based library without regard to place and time.
• Tendency to apply term to collections and resources.
• Digital Collections vs. Digital Library.
• Emphasis on the integration of collections and services (e.g. NSDL grant).
• Application of standards and protocols is important.

Scholarly Communication Overview
• E-Resources are Web-based and publisher-centric.
• Growth of Heterogeneous Distributed Repositories.
• Value-added services and ‘branding’ of journals.
• Prestige of Journals and Publishers.
• Reciprocal linking relationships between publishers.
• Cooperation on linking standards (DOI, CrossRef).
• Alternative publishing models: Academia, Preprint Servers, disintermediation.

Distributed Information Environment
• We live in a world of multiple, heterogeneous information repositories, resources, portals, and IR systems.
  – OPACs: local, regional, national shared bibliographic databases.
  – Local and remote A & I Services.
  – Discrete publisher and vendor repositories (full-text).
  – Web search engines, vertical portals, custom portals (NSDL, ARL Portal).
  – Local metadata, digital objects, GIS, finding aids.
  – Preprint servers and institutional repositories (D-Space).
  – Instructional (course) management systems (WebCT, Blackboard).
  – Harvestable (OAI) sites and services.

Distributed Repository - Issues
• Integration of discrete, heterogeneous information resources.
• Role of federated and broadcast searching of distributed resources.
• Integration of collections with reference, instructional and navigation services: TOC, remote reference assistance.
• Integration of Library, institutional, vendor, publisher, and government portals and information services.
• Linking technologies.
• Metadata harvesting, archiving.

Distributed Environment Action Plan
• Pressing need for document representation, retrieval, transmission, and linking middleware tools and standards.
• Metadata standards, DOIs, OpenURL.
• Factor: changing landscape of Scholarly Communication and disintermediation of publishers and libraries.
• Federated search and simultaneous search with reference linking as mechanism to integrate DL landscape.

Portal Functions
• Linking:
  – Between full-text using DOI, CrossRef, Appropriate Copy.
  – Between A&I and full-text.
  – Between OPAC and full-text.
• Authorization.
• Linking mechanisms between resources and among resources.
• Simultaneous search.
• Navigation.

[Diagram: portal architecture. Labels: Web Client; Portal Presentation Level; Local Link Server; Local Value-Added; OPAC; A & I Services (Local and Remote); E-Resource Registry; Aggregator (Ebsco, OCLC); Full-Text Resources; Publisher Portal (Elsevier); CrossRef Metadata; DOI Server; Web Resources; Local Databases; OAI Resources via DBMS; Knowledge Environments.]

Document Representation
• Continuum of Web-Enabled technologies, all presently being utilized.
• Evolving technologies and standards.
• Role and history of markup.
• XML: its role and importance.
• The Smart Document.

Digital Library Tools
• We have at our disposal the tools to create integrated digital libraries from the distributed digital resources environment in which we operate:
  – Standard retrieval environment (Web) and interface/client (Web Browser);
  – Standard transport mechanisms to connect heterogeneous content (HTTP, OAI, SOAP);
  – Standard metalanguages and tools for describing and transforming content and metadata (XML, DTDs & Schemas, XSLT, DC/DCQ, RDF, METS);
  – Standardized search/retrieval mechanisms (HTTP Post/Get, SQL, Z39.50, Object Oriented Databases);
  – Standard linking tools and infrastructure (DOI, OpenURL, CrossRef).
• Candidate set of ‘best practices’ for IR.

Work by Illinois DLI Group
• We are attempting to address many of these issues within the Digital Library Initiatives group.
• Headquartered at Grainger Engineering Library Information Center at UIUC.
• Grant Work:
  – Digital Library Initiative I (NSF, others), 1994-1998.
  – Corporation for National Research Initiatives (CNRI) D-Lib Test Suite, 1998-2001.
  – Collaborating Partners Program, 1998--.
  – Andrew W. Mellon Foundation OAI Harvesting grant, 2001-2002.
  – NSF NSDL (National Science, Engineering, Technology, and Mathematics Digital Library) Program, 2002-2004.
  – Institute of Museum and Library Services (IMLS) Registry and Integration grant, 2002-2005.

Illinois Testbed Project
• Funded under DLI-I by NSF, DARPA, and NASA, 1994-1998. Awards made to 6 universities.
• Large-scale Testbed, Distributed Repository models, evaluation, Web software.
• Funded under CNRI D-Lib Test Suite Program, 1998-2001.
• Collaborating Partners Program: AIP, APS, ASCE, IEE, NRL, ASM, ACM, NTT Learning Systems, Elsevier.
• All-XML journals: AIP, APS, ACM.

Illinois Full-Text Testbed
• American Institute of Physics: APL, JAP, RSI
  – 19,000+ articles, 1995--.
• American Physical Society: PRL
  – 15,000+ articles, 1995--, weekly updates.
• ASCE Journals (25 titles)
  – 11,000+ articles, 1995--.
• IEE Proceedings and Electronics Letters
  – 9,500+ articles, 1993--.
• IEEE Computer Society.
• ASM International (formerly the American Society for Metals) Handbook.
• ACM (Association for Computing Machinery) Transactions.
• Elsevier Science.

Accomplishments
• Process & retrieve from multiple publishers & heterogeneous DTDs.
• SGML to XML Conversion.
• Development of a metadata specification that uses RDF, Dublin Core (DCQ and XML), XML Schemas, local Namespace.
• Cross-repository searching (Testbed & D-Lib Test Suite). Full-Text and Metadata.
• XSLT, CSS, for transformation & rendering, including Mathematics.

Accomplishments (2)
• Introduction of numerous technologies now deployed within publisher repositories:
  – Forward and Backward links in bibliographies: within Testbed/Repository, from/to A & I Services.
  – Use of XSLT for transforming XML to HTML.
  – Rich extended abstracts.
• Conversion of ISO 12083 math markup to MathML. CSS/DHTML mathematics rendering. Use of plug-ins.
• Enhanced Web retrieval mechanisms: Author Word Wheels, Co-Occurrence Matrices.
• Local Link Server for DOIs, Context-Sensitive linking.

XML (eXtensible Markup Language)
• Like SGML, a Data Description Metalanguage.
• XML is a subset/version of SGML.
• Document representation and interchange Standard.
• Allows fine-granularity markup of content and structure.
• Authors can create their own elements (extensible).
• Tags define the structure of the document, not the presentation format.
• Validated vs. “well-formed”: separation of authoring process from representation & presentation. A document is either validated against a DTD/Schema or simply well-formed.
• Integrated with relational DBs.

XML Features
• The milestones in document description and transmission: ASCII, TCP/IP, HTTP and HTML, XML. Web Programmability.
• DTD not required with XML. Needed if the document uses internal entities.
• Use of Document Object Model (DOM).
• Technology approach from Web developer’s standpoint: XML data, CSS presentation layer, XSLT to transform the structure (‘view’) of the data/document.

XML in Information Technologies
• Used in Open Archives Initiative (OAI), NSDL.
• Compatible with MS SQL Server, Tamino (Software AG), Oracle, DLXS/XPAT (University of Michigan/OpenText), others.
• Integral to Web Services (WSDL) and SOAP
  – Google Web Service.
• Used in Library of Congress MODS and METS metadata technologies.
• Baked into XyVision and publishing packages.

XML, XSLT, and CSS
• Use XML full-text articles as ordered hierarchy of content objects.
• Generate item-level metadata in XML, using RDF and Dublin Core syntax and semantics.
• XSLT and CSS used to present metadata and articles in either XML or HTML format depending on Browser.
• Mathematics rendering using MathML tools (conversion from ISO 12083 to MathML).
• Real-time transformation between XML and HTML using XSLT.

Schemas vs. DTDs
• Both are systems for representing a data model that defines the data’s elements and attributes, and the relationships among elements.
• Schema addresses limitations of DTDs and the increasingly data-oriented role of XML.
• W3C XML Schema Working Group: two documents, Structures and Datatypes.

Schema Justification
• Description of a document type’s structure should be in an XML document instead of written in a special syntax (DTD).
• Schemas are in XML: easier to edit and process using standard XML DOM manipulation tools.
• DTD notation doesn’t give schema designers the power to impose strong data typing: for example, the ability to say that a certain element type must always have a positive integer value, that it may not be empty, or that it must be one of a list of possible choices.

Metadata and Linking Standards
• Digital Object Identifier (DOI) and Persistent Object Identifiers.
• OpenURL and Value-Added Service Components (SFX).
• Open Archives Initiative (OAI), Dublin Core and Qualifiers, RDF.
• Local Resolver Servers.

Open Archives Initiative (OAI)
• Released version 1.0 of metadata harvesting protocols. Frozen through second quarter 2001.
• Mechanism for data providers to expose their metadata through an HTTP protocol and a mechanism for harvesting records containing metadata from repositories.
• Roots in e-print archives.
• Lightweight, low-barrier. Easy to implement a Web server to handle OAI protocol requests; need to develop procedures to access and extract your metadata.

Ongoing Investigations
• Relationship between interoperability models for search and discovery: federated searching (OAI harvested) and broadcast, simultaneous searching of distributed repositories. Not mutually exclusive.
• OAI Provider and Harvesting software. Encoded Archival Description (EAD). OAI Engineering/CS/Physics site.
• Role of HTTP harvesting, Spider technology.
• Reference Linking integration built on OpenURL and DOI.
• Reference Assistant software with simultaneous search, point-of-contact assistance, and remote reference capability.

Portals and Gateways
• Role is to bring together and integrate disparate e-resources.
• Provide a systematic ‘view’ of the information landscape, particularly full-text.
• Two primary foci: robust search/navigation and the ability to link everywhere from anywhere in the environment of OPACs, A & I Services, full-text.
• Central to this implementation are federated and simultaneous search and reference linking technologies.

Digital Object Identifier (DOI)
• DOI is both a unique identifier of a piece of digital content AND a system to access that content digitally. Persistent object identifier.
• ‘The ISBN for the 21st Century’ (Norman Paskin).
• DOI system has two main parts (the identifier and a directory system) and a third logical component, a database.
• Developed by AAP (Association of American Publishers), now managed by International DOI Foundation.

DOI Construction
• First real open standard for content identification.
• DOI is a number that identifies a digital object:
  – 10.1063/S000369519903216
    • 10: Registration Agency Prefix
    • 1063: Publisher Prefix
    • S000369519903216: Suffix (Publisher-assigned ID)
• Suffix can be SICI or PII.
• The DOI, together with the URL pointing to the digital object, is registered with the International DOI Foundation, e.g.:
  – 10.1063/333 | http://www.pubsite.org/apr99/artl1.pdf

Using a DOI
• DOIs are resolved using the Handle System technology from CNRI (Corporation for National Research Initiatives).
• Retrieval of the object is a two-step process: the link is sent to a central directory where the current Web address is stored, and the location is sent back to the browser with a special message to redirect to that address, e.g.:
  – dx.doi.org/10.1063/333 redirects to www.pubsite.org/apr99/artl1.pdf

Reference Linking
• CrossRef Publisher system: major Sci-Tech professional societies and commercial publishers.
• System design calls for one URL for each DOI; the underlying technology can handle multiple URLs, however.
• Issue: Directing users to locally held or licensed version of Digital Object (locally loaded or from Aggregator). Appropriate Copy problem.

[Diagram: DOI resolution through a local link server. Labels: Cookie on client (Web Browser); OpenURL Aware; dx.doi.org/10.1063/1234; DOI Proxy; Nosfx=y; Handle Server; AIP; IEE; Elsevier; Local AIP, IEE; OpenURL; CrossRef Metadata Database; DOI; Illinois Local Link Server; UIUC Metadata Registry; Local Value Added.]

Simultaneous Search Implementations
• DialIndex from Dialog.
• Ex Libris MetaLib service.
• Endeavor EnCompass.
• Innovative Interfaces MetaFind.
• Ovid Multiple Search and reference De-Duping.
• ISI Web of Knowledge.
• Gale Corporation InfoTrac Total Access.
• WebFeat.
• California Digital Library SearchLight system.
• Los Alamos FlashPoint system.
• Fretwell-Downing partnering with ARL Portal and Monash University.

Grainger Search Aid
• Assist users in the selection of appropriate databases.
• Normalize user search arguments and display search results from candidate databases.
• Cross-database asynchronous concurrent searching.
• Article level and e-journal Web site access to publisher full-text repositories.
• Utilize OpenURL, CrossRef metadata database and DOI for reference linking at the article level.
• Proxying of vendor systems and capability of ‘taking over’ the search in vendor native mode.

Grainger Search Aid Reference Assistant Project
• Utilize Search Aid simultaneous search and link capabilities.
• Opportunity to explore interface and navigation issues.
• Mimics the behavior of a reference librarian.
• Allows the application of ‘best match’ and ‘quorum searching’ algorithms.

Reference Assistant Top Menu

Simultaneous Search Implementations
• Shared Blackboard approach employing Independent Searchbots dedicated to searching information resources and passing results to Web clients.
• Event-Driven, Asynchronous HTTP Queries from within a Single Script returning results to Web browser.

Event-Driven, Asynchronous Queries
• Single, event-driven web server process, asynchronously querying multiple resources.
• Uses WinHTTP from ASP and VBScript.
• Simpler, not as flexible. Search algorithms and processing coded in scripts.
• This is the approach we currently use for our service.
• Implementation of multi-step login and session variable passthru being investigated.

OpenURL-Based Services
• Standard for expressing and transmitting metadata.
• Promise of standardized, normalized search results.
• Provides value-added links to the Ovid search results.
• Using CrossRef metadata database to look up DOIs.

CiteParse.dll
• An ActiveX DLL which can parse various Ovid citations and turn them into OpenURLs:
• Tansu N. Chang YL. Takeuchi T. Bour DP. Corzine SW. Tan MRT. Mawst LJ. Temperature analysis … quantum-well lasers. [Article] IEEE Journal of Quantum Electronics. 38(6):640-651, 2002 Jun.
• http://…/resolver.asp?genre=article&aulast=Tansu&auinit1=N&atitle=Temperature+analysis+…+quantum-well+lasers&title=IEEE+Journal+of+Quantum+Electronics&volume=38&issue=6&spage=640&epage=651&pages=640-651&date=2002-06

Conclusions
• User reactions very positive.
• The one-stop-shopping approach has been successful.
• Users consider the ability to link to full-text from citations in A & I Services and from references on publisher portals very helpful.
• Technically, the best approach appears to be a hybrid of an asynchronous client interface with Web Services querying databases. This moves database middleware to Web Services and eliminates extensive custom script code for search and database query.

Publishing Trends
• Publishers will continue to add value to online journal articles.
• Digital version will become version of record.
• Virtual journals (both publisher-based and cross-publisher) will become common.
• Next-generation knowledge environments will evolve. Multimedia, data exposed, live equations with in-place calculations.

Publishing Trends (Continued)
• Personalized services will be available: agent technology, alerting services.
• Different economic and subscription models will be introduced.
• Deconstruction of Journal (Bob Kelly, APS); article-at-a-time publishing.
• Journal branding or perhaps publisher branding.
• Academia issues: publishing, tenure.

Continuing Issues
• Role of Authors, Academic Institutions, Libraries, Publishers, Abstracting & Indexing Services.
• Disintermediation may affect both Libraries and Publishers.
• Information as Function not Place.
• Provide a ‘Digital Library’ out of digital collections.
• Role of XML technology.
• Service mechanisms: processing & archiving, search and discovery, presentation, linking.
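
Appendix sketch: citation to OpenURL
The CiteParse.dll slide describes turning an Ovid citation into an OpenURL query string. The Python sketch below illustrates that idea only; it is not the ActiveX component itself. The resolver host (resolver.example.org), the regular expression, and the function names parse_locator and build_openurl are assumptions made for illustration, and the regular expression handles only the "volume(issue):spage-epage, year Mon" pattern seen in the example citation. The field names (genre, aulast, auinit1, atitle, title, volume, issue, spage, epage, pages, date) are taken from the example URL in the slides.

```python
import re
from urllib.parse import urlencode

# Hypothetical resolver address; the real host is elided in the slides.
RESOLVER_BASE = "http://resolver.example.org/resolver.asp"

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Assumed pattern for the tail of an Ovid citation, e.g.
# "38(6):640-651, 2002 Jun."
LOCATOR_RE = re.compile(
    r"(?P<volume>\d+)\((?P<issue>[^)]+)\):"
    r"(?P<spage>\d+)-(?P<epage>\d+),\s*"
    r"(?P<year>\d{4})(?:\s+(?P<month>[A-Za-z]{3}))?"
)


def parse_locator(text):
    """Extract volume, issue, pages, and date from the citation tail."""
    match = LOCATOR_RE.search(text)
    if match is None:
        raise ValueError("unrecognized citation locator: %r" % text)
    parts = match.groupdict()
    date = parts["year"]
    # Append a two-digit month only when a known abbreviation is present.
    if parts["month"] in MONTHS:
        date += "-%02d" % MONTHS[parts["month"]]
    return {
        "volume": parts["volume"],
        "issue": parts["issue"],
        "spage": parts["spage"],
        "epage": parts["epage"],
        "pages": "%s-%s" % (parts["spage"], parts["epage"]),
        "date": date,
    }


def build_openurl(aulast, auinit1, atitle, title, locator,
                  base=RESOLVER_BASE):
    """Assemble an OpenURL-style query from already separated fields."""
    fields = {
        "genre": "article",
        "aulast": aulast,
        "auinit1": auinit1,
        "atitle": atitle,
        "title": title,
    }
    fields.update(parse_locator(locator))
    # urlencode escapes spaces as "+" and percent-encodes other characters.
    return base + "?" + urlencode(fields)


if __name__ == "__main__":
    print(build_openurl(
        "Tansu", "N",
        "Temperature analysis … quantum-well lasers",
        "IEEE Journal of Quantum Electronics",
        "38(6):640-651, 2002 Jun.",
    ))
```

Keeping the locator parsing separate from the URL assembly means other vendor citation formats could be supported by adding patterns without touching the OpenURL construction.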