SCHOOL OF LIBRARY, ARCHIVE AND INFORMATION STUDIES
LIS1510 Library and Archives Automation Issues
XML and extensible systems
Andy Dawson
School of Library, Archive & Information Studies, UCL
(University of Malta 2008)

What we will be covering today
• Shortcomings of HTML
• Generalised markup languages
• How XML works
• XML document types
• Other related extensible technologies

Limitations of (X)HTML
• Fixed tag set (specifications determined by W3C)
• Intended for display of documents on the Web
• Doesn’t do everything everyone wants
• Not easy to use for other purposes
  – searching in documents
  – analysis of documents

Principles of Generalized Markup
• Descriptive markup: encodes features within a document
• Say what those features are, not what to do with them
• Need to define your own tags
• Creates machine-independent data
• Data can then be used for different purposes

SGML
• SGML: Standard Generalized Markup Language
  – International standard in 1986
  – Metalanguage (syntactic framework) for defining markup tags
  – Parts of SGML are rather complex
  – Used by large projects
  – Not particularly easy to get started

XML
• XML (Extensible Markup Language)
  – Adopted by World Wide Web Consortium in 1998
  – Cut-down version of SGML
  – Based on same principles
  – Designed to implement easily on the Web

Advantages of XML
• Machine-independent plain ASCII files
• Potential longevity
• Multi-purpose use
• Ability to analyse/manipulate content
• BUT need to define tag set!
• Not a replacement for HTML unless analysis/manipulation of data is required
• However, XHTML has become a ‘reliable’ alternative option for simple web publishing

Defining Your Own Tags
• Need to undertake document analysis
  – Identify key features in document
  – Identify structure of document
  – Choose names for tags
• Only then can we apply the tag scheme

Example of a Newspaper
• Name of newspaper
• Issue
• Article
• Headline
• Author
• Paragraphs
• Pictures

Basics of XML Syntax
• Documents are composed of elements
• Start and end tags for every element: unlike HTML, end tags must be present
  – also “Empty elements”
• Attributes
  – modify an element
  – have a name and a value
  – Value must be enclosed in matching quotes (single or double)
  – An element may have several attributes
• Documents can be “Well-formed” or “Valid”

Well-formed Documents
• Well-formed documents follow XML syntax, i.e.
  – start and end tags
  – attributes in quotes
  – nested structure
• But they have no pre-defined structure!
• Therefore:
  – Can only check the syntax
  – Cannot validate the structure of well-formed documents
• Prepares documents for potential use/conversion

Valid Documents
• A Valid XML document contains (or refers to) a Document Type Definition (DTD)
• The DTD is a specification of the document structure identifying
  – which elements are allowed
  – where they are allowed
  – which attributes they may take

Related technologies
• CSS: Cascading Style Sheets
  – As used with HTML
  – Concentrate only on appearance
• XHTML
  – Version of HTML conformant with XML syntax
• XSL: eXtensible Stylesheet Language
  – XML language for style sheets
  – Controls the appearance of the elements within the document and defines templates for processing elements
• XML Schemas
  – Another way of defining document information

That’s all folks…
• Any questions?
• Optional XML exercise is available…anyone?
• Otherwise, carry on with your coursework
• Next Tuesday: Website management and last chance to finish off your website!
…and have a nice weekend
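
Supplementary sketch: checking well-formedness
The newspaper example and the well-formedness rules covered earlier (start and end tags, quoted attributes, nested structure) can be tried out with a few lines of Python. This is only an illustrative sketch, not part of the original exercise: the tag names (newspaper, issue, article, headline, author, paragraph, picture), the attribute names and the sample content are all assumptions made up for the illustration. It uses the standard-library xml.etree.ElementTree parser, which checks syntax only; it does not validate a document against a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag scheme for the newspaper example (names are assumptions).
WELL_FORMED = """<newspaper name="The Daily Example">
  <issue date="2008-01-01">
    <article>
      <headline>XML explained</headline>
      <author>A. Writer</author>
      <paragraph>Every element has a start tag and an end tag.</paragraph>
      <picture src="photo.jpg"/>
    </article>
  </issue>
</newspaper>"""

# The <headline> end tag is missing, so the nesting is broken.
NOT_WELL_FORMED = """<newspaper name="The Daily Example">
  <article>
    <headline>XML explained
  </article>
</newspaper>"""


def is_well_formed(text):
    """Return True if the text parses as XML, i.e. it is well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def headlines(text):
    """Collect the text of every <headline> element in the document."""
    root = ET.fromstring(text)
    return [h.text for h in root.iter("headline")]


if __name__ == "__main__":
    print(is_well_formed(WELL_FORMED))
    print(is_well_formed(NOT_WELL_FORMED))
    print(headlines(WELL_FORMED))
```

Because the markup is descriptive, the same file can serve other purposes, such as listing headlines, without any change to the data. Checking that a document is Valid (that it follows a DTD) would need a validating parser, which this sketch does not use.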