Standards For Hybrid Libraries: Web Standards

Brian Kelly
UK Web Focus
UKOLN
University of Bath
Bath, BA2 7AY
http://www.ukoln.ac.uk/

UKOLN is funded by the Library and Information Commission, the Joint Information Systems Committee (JISC) of the Higher Education Funding Councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.

Contents
• Introduction
• Background To The Web
• Architecture:
  – Addressing
  – Data Format
  – Transfer
  – Metadata
• Conclusions

Standardisation: Relevant Bodies
Community
• Library groups
• Cultural heritage
• Government
Proprietary
• De facto standards
• Often initially appealing (cf. GIF, PowerPoint, PDF)
• May emerge as standards
Formal
• Formal international/national standards processes
• ISO, CEN, NISO, ECMA, ANSI, BSI…
• Can be slow-moving and bureaucratic
• Produce robust standards
W3C
• Produces W3C Recommendations
• Managed approach
• Protocols initially developed by W3C members
• Decisions made by W3C, influenced by member & public review
IETF
• Produces Internet Drafts on Internet protocols
• Bottom-up approach to developments
• Protocols developed by interested individuals
• "Rough consensus and working code"
Examples: ISO 23950 (Z39.50), HTML, HTTP, PNG, URN, whois++, Java?

Background to the Web
The web was initially very successful due to its simplicity.
[Diagram: a client (Mosaic, Netscape, IE) asks a server (CERN, Apache, IIS) "Give me foo.html from www.bath.ac.uk"; the server replies "Here it is" and returns HTML.]
The web is based on three key architectural components:
• Data Format: HTML (HyperText Markup Language)
• Addressing: URLs (Uniform Resource Locators)
• Transport: HTTP (Hypertext Transfer Protocol)

Problems With the Web
Although the web has been successful, there are problems:
• Performance - the web is too slow
• Resource discovery - lack of a metadata architecture
• HTML's lack of arbitrary structure
• Accessibility - difficulties in accessing information for visually impaired people, people using PDAs, etc.
• Functionality - difficult to deploy new applications on the web
• Addressing
• etc.

Solutions (Today)
HTML 4.0 used in conjunction with CSS 2.0 (Cascading Style Sheets) and the DOM provides an architecturally pure, yet functionally rich, environment.
HTML 4.0 - W3C Recommendation
• Improved forms
• Hooks for stylesheets
• Hooks for scripting languages
• Table enhancements
• Better printing
CSS 2.0 - W3C Recommendation
• Support for all HTML formatting
• Positioning of HTML elements
• Multiple media support
DOM - W3C Recommendation
• Document Object Model
• Hooks for scripting languages
• Permits changes to HTML & CSS properties and content
Problems
• Changes during CSS development
• Netscape & IE incompatibilities
• Continued use of browsers with known bugs

HTML's Limitations
HTML 4.0 / CSS 2.0 have limitations:
• Difficulties in introducing new elements:
  – Time-consuming standardisation process (<ABBREV>)
  – Dictated by browser vendors (<BLINK>, <MARQUEE>)
• Area may be inappropriate for standardisation:
  – Covers a specialist area (maths, music, ...)
  – Application-specific (<STUD-NUM>)
• HTML is a display (output) format
• HTML's lack of arbitrary structure limits functionality, e.g.:
  – Find all memos copied to John Smith
  – How many unique tracks are there on Jackson Browne CDs?

XML
• Extensible Markup Language
• A lightweight SGML designed for network use
• Addresses HTML's lack of evolvability
• Arbitrary elements can be defined (<STUDENTNUMBER>, <PART-NO>, etc.)
• Agreement achieved quickly - XML 1.0 became a W3C Recommendation in Feb 1998
• Forms the basis of B2B applications
• Support from industry (SGML vendors, Microsoft, etc.)
• Support in Netscape 5 and IE 5

XML Deployment
Ariadne issue 15 has an article called "What Is XML?"
It describes how XML support can be provided:
• Natively by new browsers
• Back-end conversion of XML to HTML
• Client-side conversion of XML to HTML / CSS
• Java rendering of XML
The last three approaches are examples of intermediaries.
See http://www.ariadne.ac.uk/issue15/what-is/

XHTML
XHTML: an XML representation of HTML
Issues:
• Documents must be well-formed
• Tags in lowercase
• Quoted attributes: <img src="foo" height="10" />
• <li>End tags required</li>
• Empty elements: <img src="foo" /> <br />
• Tidy utility - see <http://www.w3.org/People/Raggett/tidy/>
• See <URL: http://www.w3.org/TR/WD-html-in-xml/>
Question: Is it time to produce XHTML documents?

Namespaces and Linking
XML Namespaces
What if an XML document contains a <TITLE> for the document and a <TITLE> for the name of a book? XML Namespaces enable such clashes to be resolved. Each namespace is identified by a URL (more generally, a URI), so names taken from different vocabularies cannot collide.
The XSL stylesheet language will provide extensibility and transformation facilities (e.g. create a table of contents or create metadata from structured data).
XLink and XPointer should provide richer hyperlinking mechanisms in the future.

Addressing (Problems)
URLs (e.g. http://www.bris-poly.ac.uk/depts/music/) have limitations:
• Lack of long-term persistence
  – Organisation changes name
  – Department shut down or merged
  – Directory structure reorganised
• Inability to support multiple versions of resources (mirroring)
ISBN/ISSN are also problematic:
• Not tied to the work
• Nor to the item at hand

Addressing (Solutions)
PURLs (Persistent URLs):
• Provide a single level of redirection
DOIs (Digital Object Identifiers):
• Proposed by the publishing industry as a solution
• Aimed at supporting rights ownership
• Business model needed
• Do two copies of a digital object get separate DOIs?

Transport
HTTP/0.9 and HTTP/1.0:
• Design flaws and implementation problems
HTTP/1.1:
• Addresses some of these problems
• 60% server support
• Performance benefits! (60% packet traffic reduction)
• Acting as a fire-fighter
• Not sufficiently flexible or extensible
HTTP/NG:
• Radical redesign using object-oriented technologies
• Undergoing trials
• Gradual transition (using proxies)
• Integration of applications (distributed searching?)

Metadata
Metadata - the missing architectural component from the initial implementation of the web.
[Diagram: the web architecture as Addressing (URL), Transport (HTTP), Data format (HTML) and Metadata (RDF).]
Metadata needs:
• Resource discovery
• Content filtering
• Authentication
• Improved navigation
• Multiple format support
• Rights management

RDF Data Model
RDF - the metadata framework:
• Based on a formal data model (directed labelled graphs)
• Syntax for interchange of data
• Schema model
[Diagram: the resource page.html has a Cost property with value £0.05, valid until 11-May-98. The full model represents the property as an object (PropObj, an InstanceOf PropertyType) with PropName Cost, Value £0.05 and ValidUntil 11-May-98.]

Conclusions
To conclude:
• Standards are important, especially for national initiatives and other large-scale services
• Proprietary solutions are often tempting because:
  – They are available
  – They are often well-marketed and well-supported
  – They may become standardised
  – Solutions based on standards may not be properly supported by applications
• Metadata and structured data formats are big growth areas
• Deployment of new standards is an important question
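The "Background to the Web" slide reduces HTTP to a client saying "Give me foo.html from www.bath.ac.uk" and a server replying "Here it is". The sketch below shows the request text a client sends. It is an assumption-laden illustration: the function name build_get_request is hypothetical, and only the request line and the Host header are modelled. HTTP/1.1 makes the Host header mandatory, while HTTP/1.0 does not require it.

```python
# Minimal sketch of an HTTP GET request as raw text (hypothetical helper).
# Real clients send further headers (User-Agent, Accept, ...); omitted here.

CRLF = "\r\n"


def build_get_request(host: str, path: str, version: str = "HTTP/1.1") -> str:
    """Return the text of a minimal GET request for `path` on `host`."""
    lines = [f"GET {path} {version}"]  # request line: method, path, version
    if version == "HTTP/1.1":
        # Mandatory in HTTP/1.1: lets one server address host several sites.
        lines.append(f"Host: {host}")
    # A blank line (an extra CRLF) terminates the header section.
    return CRLF.join(lines) + CRLF + CRLF
```

For example, build_get_request("www.bath.ac.uk", "/foo.html", "HTTP/1.0") yields just the request line followed by a blank line. HTTP/1.1 also keeps connections open by default (persistent connections), which is largely where performance gains of the kind quoted on the Transport slide come from.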
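The XHTML slide insists that documents be well-formed, and the Namespaces slide asks how a document title and a book title can share the name <TITLE>. Both points can be tried with any XML parser. This sketch uses Python's standard xml.etree.ElementTree module. The namespace URIs, the sample record and the helper names is_well_formed and titles are all hypothetical, chosen for illustration. Note that the parser only checks that start and end tags match exactly; it cannot enforce XHTML's lowercase rule by itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs: any URI works, it need not resolve to a page.
NS = {
    "doc": "http://example.org/ns/document",
    "book": "http://example.org/ns/book",
}

# A record with two <title> elements, kept apart by their namespaces.
RECORD = """<record xmlns:doc="http://example.org/ns/document"
        xmlns:book="http://example.org/ns/book">
  <doc:title>Memo: library opening hours</doc:title>
  <book:title>An Example Book</book:title>
</record>"""


def is_well_formed(markup: str) -> bool:
    """True if the markup parses as a single well-formed XML document."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


def titles(markup: str) -> tuple:
    """Return (document title, book title), distinguished by namespace."""
    root = ET.fromstring(markup)
    # The "prefix:name" paths are expanded to "{uri}name" using NS.
    return root.find("doc:title", NS).text, root.find("book:title", NS).text
```

With this sketch, '<img src="foo" height="10" />' and '<br />' are well-formed. By contrast, '<br>', unquoted attributes, a list item with no end tag, and '<BR></br>' are all rejected.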
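The Addressing slides say that PURLs "provide a single level of redirection". In practice this means a resolver answers a request for the persistent name with an ordinary HTTP redirect to wherever the resource currently lives. The sketch below shows that idea under stated assumptions. The table, the PURL path "/net/example/music" and the helpers resolve and move are hypothetical. The use of 302 Found is this sketch's choice of redirect status. The target URL is the example from the slide.

```python
from typing import Dict, Tuple

# Hypothetical resolver table: persistent path -> current location.
PURL_TABLE: Dict[str, str] = {
    "/net/example/music": "http://www.bris-poly.ac.uk/depts/music/",
}


def resolve(purl_path: str) -> Tuple[int, Dict[str, str]]:
    """Answer a request for a PURL with an HTTP status code and headers."""
    target = PURL_TABLE.get(purl_path)
    if target is None:
        return 404, {}  # unknown persistent name
    # Single level of redirection: the client is sent straight to the target.
    return 302, {"Location": target}


def move(purl_path: str, new_url: str) -> None:
    """Record a resource's new location; the published PURL is unchanged."""
    PURL_TABLE[purl_path] = new_url
```

If the department is renamed or reorganised, calling move("/net/example/music", "http://www.example.ac.uk/music/") (a hypothetical new address) is the only change needed. Every existing link to the PURL then follows the new location.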
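The RDF Data Model slide describes metadata as a directed labelled graph and uses the example of page.html costing £0.05, valid until 11-May-98. A graph like this can be stored as (subject, predicate, object) triples. The sketch below is one possible encoding, not the exact PropObj/InstanceOf structure drawn on the slide. The intermediate node "_:cost1" and the predicate name "value" are assumptions: they attach the ValidUntil qualifier to the cost rather than to the page. The helpers objects and cost_of are also hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# page.html --Cost--> an intermediate node carrying the value and a qualifier.
# "_:cost1" and "value" are illustrative names, not taken from the slide.
GRAPH: List[Triple] = [
    Triple("page.html", "Cost", "_:cost1"),
    Triple("_:cost1", "value", "£0.05"),
    Triple("_:cost1", "ValidUntil", "11-May-98"),
]


def objects(graph: List[Triple], subject: str, predicate: str) -> List[str]:
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


def cost_of(graph: List[Triple], resource: str) -> Optional[Tuple[str, str]]:
    """Return (cost, valid-until date) for a resource, or None if it has no Cost."""
    nodes = objects(graph, resource, "Cost")
    if not nodes:
        return None
    node = nodes[0]
    return objects(graph, node, "value")[0], objects(graph, node, "ValidUntil")[0]
```

Because every statement is just an edge in the graph, new properties such as a rights statement can be added without changing the storage format. This is the extensibility the Metadata slide asks for.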