eXtensible Markup Language (XML) and Java
CISC370 / Object Oriented Programming with Java

When ideas fail, words come in very handy.

CISC370 – Object Oriented Programming with Java / © 2003 J. Six

Use and Distribution Notice

Possession of any of these files implies understanding and agreement to this policy. The slides are provided for the use of students enrolled in Jeff Six's Object Oriented Programming with Java class (CISC 370) at the University of Delaware. They are the creation of Mr. Six and he reserves all rights as to the slides. These slides are not to be modified or redistributed in any way. All of these slides may only be used by students for the purpose of reviewing the material covered in lecture. Any other use, including but not limited to the modification of any slides or the sale of any slides or material, in whole or in part, is expressly prohibited.

Most of the material in these slides, including the examples, is derived from multiple textbooks. Credit is hereby given to the authors of these textbooks for much of the content. This content is used here for the purpose of presenting this material in CISC 370, which uses, or has used, these textbooks.

J2EE Territory

The technology covered in this topic is part of the Java 2 Enterprise Edition (J2EE). To program with it, you must first download and install the J2EE SDK (along with the J2SE SDK you should already have). As always, just go to java.sun.com.

What is XML? How does Java relate?

- XML is a standard for defining data representations using tag syntax.
- Using XML, data can be encoded, stored, and transmitted as text in a way that preserves its meaning and structure.
- Java class libraries exist to read, parse, validate, generate, and transform XML data.
- Java is one of the languages of choice for a great deal of XML software development. The two technologies are often used together, but there is no direct relation between them.

What is XML?

XML is a standardized way of defining tagged syntax for data. In an XML document:
- you identify data using tags, like this: <name>
- you place content between tags, like this: <name>Jeffrey A. Six</name>
- tags can have attributes, like this: <name language="English">Jeffrey A. Six</name>

XML documents are always text. This makes it relatively easy to create, examine, and debug them.

Well-Formedness

All XML documents must be well-formed. Among other things, this means that each start tag must have a corresponding end tag (or be written as an empty tag such as <yesterday/>) and that elements must be properly nested. For example:

    <name>Jeffrey A. Six</name>

Here <name> is the start tag for a name piece of data and </name> is the corresponding end tag.

Examples of XML Syntax

Tags:
    <date>22 October 2002</date>
    <yesterday/>
    <date><month>10</month><day>22</day></date>
Attributes:
    <time zone="-2">5:57:39</time>
    <time gmt='yes'>7:57:39</time>
Comments:
    <!-- XML data generated by Jeff Six -->
Entity references:
    &otherLoc; &copyr; &LegalDisclaimer;
Document type reference:
    <!DOCTYPE MAIL SYSTEM "email.dtd">

XML Syntax Example

    <?xml version="1.0"?>
    <!DOCTYPE mail SYSTEM "email.dtd">
    <mail>
      <message>
        <to>[email protected]</to>
        <from>[email protected]</from>
        <body type="text/plain">Work harder!</body>
        <unread/>
      </message>
      <message importance="high">
        <to>[email protected]</to>
        <from>[email protected]</from>
        <subject>CISC370</subject>
        <body>You are not doing well!</body>
      </message>
    </mail>

[Figure: the same document drawn as a tree. The root mail element has two message children. The first contains to, from, body (type=text/plain, "Work harder!"), and unread. The second (importance=high) contains to, from, subject ("CISC370"), and body ("You are not doing well!").]

XML Structure Specifications

- XML documents not accompanied by some kind of specification can contain any tags. This makes standardized communication VERY hard!
- Most XML documents obey a particular set of rules as to their structure. These rules define which tags, attributes, and entities may appear in the document (more about that in a sec…).
- XML documents can use namespaces, which are separate scopes within which tags are defined (very similar to namespaces in C++).

DTDs and Schemas

- The first way of specifying valid tags for an XML document is a Document Type Definition (DTD). A DTD is written in a syntax inherited from SGML rather than in XML itself (and that syntax can be very confusing!). The DTD approach is not used very much anymore.
- The newer way of specifying the valid tags for an XML document is a schema. A schema is a document that is itself written in XML, which makes it much easier to use to define the structure of an XML document.
- Schemas are the current and future way of defining and specifying XML document structure.

XML Programming

Generally, programs (and thus programmers) deal with XML in one of three ways:
- Parsing: given an XML document as input, extract the stored data and process it programmatically (XML doc as program input).
- Generation: given some data, generate an XML representation of it (XML doc as program output).
- Transformation: given an XML document, transform it into some other kind of document (XML doc as both program input and output).

Java XML Programming

To use Java for XML programming, you can use the Java API for XML Processing (JAXP). It is included with Java 1.4 and later (and is available as a downloadable component for Java 1.3.x).
These APIs make working with XML programmatically under Java straightforward. Note that JAXP is Sun's collection of XML APIs. Other packages are available, and each works a little differently (each is just a library of classes and methods).

Fundamental Models for XML Parsing

Two basic programming models exist for XML parsing in Java:
- SAX (Simple API for XML): serial, sequential access to XML tags and content; event-oriented callback model; fast and low overhead; difficult to use for transforms.
- DOM (Document Object Model for XML): tree-based access to the entire XML document; data traversal model; keeps the entire document in memory; easy to use for transforms.

The SAX Programming Model

A SAX parser chops an XML document into a sequence of events (an event is encountering an element of the XML document). The parser delivers XML events, and errors if any occur, to your application by calling methods on handler objects provided by your program. Handler objects must implement particular interfaces:
- org.xml.sax.ContentHandler, for receiving events about document contents and tags
- org.xml.sax.ErrorHandler, for receiving errors and warnings that occur during parsing
- org.xml.sax.DTDHandler
- org.xml.sax.EntityResolver

SAX Parsers

- Most SAX parsers can read XML data (the XML document) from any valid java.io.Reader (remember, the Reader category of classes handles text input), or directly from a URL (for parsing XML that exists on the Internet).
- SAX parsers can be validating or non-validating. Validation refers to checking a document for conformance to its specified DTD or schema.
- Most SAX parsers (including the one in the JAXP package) support both operational modes.

SAX Programming – Event Passing

    <?xml version="1.0" ?>
    <mail><message>
    <from>[email protected]</from>
    </message></mail>

[Figure: the XML input is fed through a Reader to the SAX parser, which calls the content handler's callback methods: setDocumentLocator, startDocument, startElement, characters, endElement, endDocument. The locator is an object that keeps track of the current position in the doc.]

SAX Programming – Basics

In order to use SAX, you need to import the relevant packages (assuming you are using the JAXP packages):

    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    import javax.xml.parsers.*;

To do any parsing, you need a SAX XMLReader object. It is very easy to construct:

    SAXParserFactory sp_factory = SAXParserFactory.newInstance();
    SAXParser sp = sp_factory.newSAXParser();
    XMLReader theParser = sp.getXMLReader();

SAX Programming – Errors

When a SAX parser is parsing, there are three kinds of errors it can encounter:

    warning      a non-serious problem occurred (parsing can continue)
    error        a serious but recoverable problem occurred (parsing can continue)
    fatalError   a problem occurred that is so grave that parsing cannot continue

Not all SAX parsers treat each condition in the same category. Check the docs!

- Usually, warning and error conditions result from violations of document validation (DTD or schema constraints).
- fatalError conditions result from I/O problems or non-well-formed XML.
- All SAX parsers can throw SAXExceptions. Any SAX handler method (the methods the parser calls on your handler object) can also throw a SAXException.

SAX Programming – Example

Let's look at a very simple SAX example program. It consists of two classes:
- SimpleXML1 supplies a main() method that creates a SAX parser and invokes it to parse a file specified on the command line.
- SAXHandler1 acts as the SAX ContentHandler and ErrorHandler.
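Before the full listing, a shorter sketch may help make the three error categories concrete. It is not part of the original slides: the class name SaxErrorSketch and the method reportedProblems are hypothetical, and it assumes the JAXP parser bundled with the JDK. It records which ErrorHandler callback fires for a well-formed and for a non-well-formed input.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not from the slides.
class SaxErrorSketch {
    // Parses the given XML text and returns the names of the
    // ErrorHandler callbacks that fired, in order.
    static List<String> reportedProblems(String xml) {
        final List<String> seen = new ArrayList<>();
        // DefaultHandler implements ContentHandler and ErrorHandler with no-op methods.
        DefaultHandler handler = new DefaultHandler() {
            @Override public void warning(SAXParseException e) { seen.add("warning"); }
            @Override public void error(SAXParseException e) { seen.add("error"); }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                seen.add("fatalError");
                throw e; // parsing cannot continue after a fatal error
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Expected for non-well-formed input; already recorded by the handler.
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Well-formed: start tag and end tag match.
        System.out.println(reportedProblems("<name>Jeffrey A. Six</name>"));
        // Not well-formed: the end tag does not match the start tag.
        System.out.println(reportedProblems("<name>Jeffrey A. Six</nam>"));
    }
}
```

Because validation is off by default, neither input should produce a warning or an error here; only the mismatched end tag is reported, and it arrives through fatalError.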
SAX Programming – Example

    import java.io.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    import javax.xml.parsers.*;

    public class SimpleXML1 {
        public static void main(String[] args) {
            try {
                System.out.println("Creating and setting up the SAX parser.");
                SAXParserFactory sp_factory = SAXParserFactory.newInstance();
                // Without this, JAXP parsers pass empty local names to the handler.
                sp_factory.setNamespaceAware(true);
                XMLReader theReader = sp_factory.newSAXParser().getXMLReader();
                SAXHandler1 theHandler = new SAXHandler1();
                theReader.setContentHandler(theHandler);
                theReader.setErrorHandler(theHandler);
                theReader.setFeature("http://xml.org/sax/features/validation", false);

                System.out.println("Making InputSource for " + args[0]);
                FileReader file_in = new FileReader(args[0]);
                System.out.println("About to parse...");
                theReader.parse(new InputSource(file_in));
                System.out.println("...parsing done.");
            } catch (Exception e) {
                System.err.println("Error: " + e);
                e.printStackTrace();
            }
        }
    }

SAX Programming – Example

    import org.xml.sax.*;

    public class SAXHandler1 implements ContentHandler, ErrorHandler {
        private Locator loc = null;

        public void setDocumentLocator(Locator l) { loc = l; }

        // Called when a set of characters is encountered.
        public void characters(char[] ch, int st, int len) {
            String s = new String(ch, st, len);
            // System.out.println("Got content string '" + s + "'");
            return;
        }

        // Called when a starting element tag is encountered.
        public void startElement(String uri, String lname, String qname, Attributes attrs) {
            System.out.print(lname + " tag with ");
            System.out.print(attrs.getLength() + " attrs starts");
            System.out.println(" at line " + loc.getLineNumber());
        }

        // Called when an ending element tag is encountered.
        public void endElement(String uri, String lname, String qname) {
            System.out.print(lname + " tag ends ");
            System.out.println("at line " + loc.getLineNumber());
        }

        // Called when the start or end of the doc is encountered.
        public void startDocument() { }
        public void endDocument() { }

        // Remaining ContentHandler callbacks, unused in this example.
        public void processingInstruction(String t, String d) { }
        public void skippedEntity(String name) { }
        public void ignorableWhitespace(char[] ch, int st, int len) { }
        public void startPrefixMapping(String p, String uri) { }
        public void endPrefixMapping(String p) { }

        // Called when a SAX warning is encountered.
        public void warning(SAXParseException e) {
            System.err.print("SAX Warning: " + e);
            System.err.println(" at line " + loc.getLineNumber());
        }

        // Called when a SAX error is encountered.
        public void error(SAXParseException e) {
            System.err.print("SAX Error: " + e);
            System.err.println(" at line " + loc.getLineNumber());
        }

        // Called when a SAX fatal error is encountered.
        public void fatalError(SAXParseException e) {
            System.err.print("SAX Fatal Error: " + e);
            System.err.println(" at line " + loc.getLineNumber());
        }
    }

DOM Programming – Overview

In contrast to SAX, a DOM parser builds an in-memory tree representation of the entire XML document. A reference to this tree is returned to your program.

The DOM tree is composed of objects, most of which implement the following interfaces from the org.w3c.dom package:

    Node       parent interface of all DOM tree nodes
    Element    represents a tag in an XML document, may have children
    Document   represents an entire XML document, always has children
    Text       contents of an element or an attribute
    Attr       represents an attribute of an element
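To make the interface list above concrete, here is a minimal sketch (not from the original slides) that parses the one-element document used earlier and picks out one node of each kind. The class name DomInterfacesSketch and the method describe are hypothetical; the DOM and JAXP calls are the standard ones.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper class, not from the slides.
class DomInterfacesSketch {
    // Parses a small document whose root element has a "language" attribute
    // and a single text child, then describes the nodes it found.
    static String describe(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml))); // Document: the whole tree
            Element root = doc.getDocumentElement();                         // Element: the root tag
            Attr attr = root.getAttributeNode("language");                   // Attr: an attribute of the element
            Text text = (Text) root.getFirstChild();                         // Text: the element's contents
            return "element=" + root.getTagName()
                 + " attr=" + attr.getName() + ":" + attr.getValue()
                 + " text=" + text.getData();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // prints: element=name attr=language:English text=Jeffrey A. Six
        System.out.println(describe("<name language=\"English\">Jeffrey A. Six</name>"));
    }
}
```

Every object touched here (doc, root, attr, text) is also a Node, which is why a generic tree walker can treat them uniformly and switch on getNodeType().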
DOM Programming – Overview

- Once the parser has created the DOM tree, your program can manipulate it as it wants to. It can traverse it, modify it, print it out, apply XSL transforms to it (Extensible Stylesheet Language Transformations; more about that in a bit), and much more.
- Like most SAX parsers, most DOM parsers allow you to select either validating or non-validating mode. Remember, validation refers to checking for conformance against the document's specified DTD or schema.

DOM Programming – Model

    <?xml version="1.0" ?>
    <mail><message>
    <from>[email protected]</from>
    </message></mail>

[Figure: application code passes the XML input (through a Reader) to the DocumentBuilder's parse call; the parser, consulting the DTD or schema if any, returns a DOM tree to the application code.]

DOM Programming – Creating the DOM Tree

Although not relevant to using the DOM model for programming, the question often arises: how does the DOM API generate the tree representing the XML document?

The answer is simple: it typically uses a SAX parser! The DOM APIs run a SAX parser to parse the XML document in an event-driven manner, constructing the DOM tree in memory.

DOM Programming – Basics

To create a DOM DocumentBuilder (with the JAXP packages), construction is easy:

    DocumentBuilderFactory dbuilder_factory = DocumentBuilderFactory.newInstance();
    dbuilder_factory.setValidating(false);
    DocumentBuilder dbuilder = dbuilder_factory.newDocumentBuilder();

Building a document tree from XML data found in a particular file is just as easy:

    FileReader file_in = new FileReader("thefile.xml");
    Document doc = dbuilder.parse(new InputSource(file_in));
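The basics above switch validation off. As a rough illustration of the validating mode, the following sketch (not from the original slides) turns it on and counts the recoverable errors reported for a document that breaks its DTD. The class name DomValidationSketch, the method countValidationErrors, and the small DTD are all hypothetical; the DTD is embedded as an internal subset so that the example needs no external file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, not from the slides.
class DomValidationSketch {
    // Hypothetical DTD for illustration: a mail holds messages,
    // and each message holds exactly one "to" followed by one "from".
    static final String DTD =
        "<!DOCTYPE mail ["
        + "<!ELEMENT mail (message*)>"
        + "<!ELEMENT message (to, from)>"
        + "<!ELEMENT to (#PCDATA)>"
        + "<!ELEMENT from (#PCDATA)>"
        + "]>";

    // Parses DTD + body with validation on and returns how many
    // recoverable validation errors the parser reported.
    static int countValidationErrors(String body) {
        final List<SAXParseException> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            db.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { errors.add(e); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            db.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return errors.size();
    }

    public static void main(String[] args) {
        // Conforms to the DTD.
        System.out.println(countValidationErrors(
            "<mail><message><to>a</to><from>b</from></message></mail>"));
        // Violates the DTD: the message has no "from" element.
        System.out.println(countValidationErrors(
            "<mail><message><to>a</to></message></mail>"));
    }
}
```

Note that the invalid document is still well-formed, so the parser keeps going and a DOM tree is still built; the violation surfaces only through the error callback.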
DOM Programming – Example

    import java.io.*;
    import org.w3c.dom.*;
    import javax.xml.parsers.*;
    import org.xml.sax.*;

    public class SimpleXML2 {
        public static void main(String[] args) {
            String uri = args[0];
            try {
                System.out.println("Creating Document builder.");
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                dbf.setValidating(false);
                DocumentBuilder db = dbf.newDocumentBuilder();

                System.out.println("Ready to parse!");
                // Read the file named on the command line.
                FileReader file_in = new FileReader(uri);
                Document doc = db.parse(new InputSource(file_in));
                System.out.println("Parsed document, ready to process.");

                myDOMTreeProcessor proc = new myDOMTreeProcessor();
                proc.process(doc, System.out);
            } catch (Exception e) {
                System.err.println("XML Exception thrown: " + e);
                e.printStackTrace();
            }
        }
    }

DOM Programming – Example

    import org.w3c.dom.*;

    public class myDOMTreeProcessor {
        public void process(Document doc, java.io.PrintStream os) {
            printTree(doc, os, 0);
        }

        // Recursively prints the subtree rooted at n, indented by the given amount.
        void printTree(Node n, java.io.PrintStream os, int indent) {
            int i;
            switch (n.getNodeType()) {
            case Node.ATTRIBUTE_NODE:
                for (i = 0; i < indent; i++) os.print(" ");
                os.println("Attr " + n.getNodeName() + "=" + n.getNodeValue());
                break;
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                // os.println("Text '" + n.getNodeValue() + "'.");
                break;
            case Node.DOCUMENT_NODE: {
                os.println("DOCUMENT:");
                NodeList kids = n.getChildNodes();
                for (i = 0; i < kids.getLength(); i++)
                    printTree(kids.item(i), os, indent + 2);
                break;
            }
            case Node.DOCUMENT_TYPE_NODE:
                for (i = 0; i < indent; i++) os.print(" ");
                os.println("Document type is " + n.getNodeName());
                break;
            case Node.ELEMENT_NODE: {
                for (i = 0; i < indent; i++) os.print(" ");
                os.println("Element " + n.getNodeName());
                // Attributes are not children, so they are visited separately.
                NamedNodeMap attrs = n.getAttributes();
                for (i = 0; i < attrs.getLength(); i++)
                    printTree(attrs.item(i), os, indent + 5);
                NodeList kids = n.getChildNodes();
                for (i = 0; i < kids.getLength(); i++)
                    printTree(kids.item(i), os, indent + 2);
                break;
            }
            default:
                os.println("Unexpected kind of node - ignored.");
                break;
            }
        }
    }

Changing the DOM Tree

Once you have the DOM tree, you can:
- search for particular nodes
- extract element and text values

You can alter the tree:
- add elements to the tree, or change them
- add attributes to elements, or change their values
- re-arrange elements and subtrees
- synthesize entirely new trees or subtrees

If you create a new tree or subtree, you should apply the normalize() method before processing it further (this merges adjacent text nodes and removes empty ones, leaving the tree in a canonical layout).

XSL and Transformations

- The Extensible Stylesheet Language (XSL) is a core XML technology for performing transformations to and on XML documents.
- XSLT offers extensive template matching and processing instructions for transforming XML data into other textual data (e.g. XML->XML, XML->HTML, XML->text).
- An XSL processor is a system (usually invoked through an object interface) that can apply XSLT transforms to some XML input and generate results (normally the data in another form).
- The transformation rules are expressed in XSL, a programming language unto itself, geared for pattern matching and formatting. It is similar to LISP.
- JAXP includes the Java implementation of the Apache Xalan XSLT processor.
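To give a feel for what such transformation rules look like, here is a minimal sketch of an XML->text transform. It is not from the original slides: the class name XsltSketch, the method transform, and the tiny stylesheet are hypothetical, and the stylesheet is held in a string only to keep the example self-contained. It uses the standard javax.xml.transform classes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not from the slides.
class XsltSketch {
    // Hypothetical stylesheet: one template matches the mail element and,
    // for each message, emits the subject text followed by a semicolon.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/mail'>"
        + "<xsl:for-each select='message'>"
        + "<xsl:value-of select='subject'/><xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies STYLESHEET to the given XML text and returns the result as a string.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
            "<mail>"
            + "<message><subject>CISC370</subject></message>"
            + "<message><subject>Homework</subject></message>"
            + "</mail>"));
    }
}
```

The stylesheet contains no loops over indices or explicit tree walking: it declares a pattern (match='/mail') and what to emit when the pattern matches, which is the pattern-matching style the slide describes.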
XML Programming Transformations

One of the advantages of the DOM programming model is that it can be used easily with XSL. JAXP fully supports XSLT with the package javax.xml.transform and its sub-packages.

The sequence of steps for creating and applying transforms with XSLT and Xalan is:
1. Obtain or create an initial DOM tree, and create a Source from it.
2. Create a TransformerFactory.
3. Create a Templates object based on a particular XSLT document.
4. Create a Transformer object from the Templates object.
5. Create an output Result object.
6. Apply the Transformer to the Source and the Result.

XSLT Programming – Example

This program applies a supplied XSL file to an XML file, and produces an output file.

    % java SimpleXML3 tiny.xml mail2html.xsl tiny-res.html

[Figure: inside SimpleXML3, the DOM parser reads the input XML (tiny.xml) and the input DTD (mail1.dtd); the XSLT processor then applies the stylesheet (mail2html.xsl) and writes the output HTML (tiny-res.html).]

XSLT Programming – Example

    import org.w3c.dom.*;
    import javax.xml.parsers.*;
    import javax.xml.transform.*;
    import org.xml.sax.*;

    public class SimpleXML3 {
        public static void main(String[] args) {
            String src_file = args[0];
            String template_file = args[1];
            String res_file = args[2];
            try {
                System.out.println("Creating Document builder.");
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                dbf.setValidating(true);
                DocumentBuilder db = dbf.newDocumentBuilder();

                // The file name serves as the system ID of the input.
                InputSource file_in = new InputSource(src_file);
                Document doc = db.parse(file_in);
                System.out.println("Parsed document okay");

                System.out.println("Using " + template_file + " to make " + res_file);
                myXSLTProcessor proc = new myXSLTProcessor();
                proc.process(doc, template_file, res_file);
            } catch (Exception e) {
                System.err.println("XML Exception thrown: " + e);
            }
        }
    }

XSLT Programming – Example

    import org.w3c.dom.*;
    import javax.xml.parsers.*;
    import javax.xml.transform.*;
    import javax.xml.transform.dom.*;
    import javax.xml.transform.stream.*;

    public class myXSLTProcessor {
        public void process(Document d, String tmpl, String res)
                throws TransformerException, DOMException {
            System.out.println("Creating transformer factory.");
            TransformerFactory tf = TransformerFactory.newInstance();

            System.out.println("Creating sources and results.");
            DOMSource ds = new DOMSource(d);                            // the parsed input document
            StreamSource ss = new StreamSource(tmpl);                   // the XSLT stylesheet
            StreamResult sr = new StreamResult(new java.io.File(res));  // the output file

            System.out.println("Creating template & transformer");
            Templates tt = tf.newTemplates(ss);
            Transformer xformer = tt.newTransformer();

            System.out.println("Performing transform to make " + res);
            xformer.transform(ds, sr);
        }
    }

Summary

- XML is a technology for storing data in a standard, portable, text-based way.
- Java packages exist to help your Java programs parse XML, build XML documents, and perform XML transformations.
- There are two basic models for XML parsing, and Java has support for both:
    SAX: sequential access, event-driven
    DOM: in-memory tree structure
- XSLT can be used to transform XML into other XML or other textual formats.
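Of the three tasks named under XML Programming, generation is the one with no listing in these slides. The following is a minimal sketch of one way to do it, assuming the standard JAXP classes: build a new DOM tree in memory, normalize it, and serialize it with an identity Transformer. The class name DomGenerationSketch and the method buildMail are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, not from the slides.
class DomGenerationSketch {
    // Builds a mail document with one high-importance message
    // carrying the given subject, and returns it as XML text.
    static String buildMail(String subject) {
        try {
            // Synthesize an entirely new tree.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element mail = doc.createElement("mail");
            doc.appendChild(mail);

            Element message = doc.createElement("message");
            message.setAttribute("importance", "high"); // add an attribute
            mail.appendChild(message);

            Element subj = doc.createElement("subject");
            subj.appendChild(doc.createTextNode(subject)); // add text content
            message.appendChild(subj);

            doc.normalize(); // tidy the new tree before further processing

            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildMail("CISC370"));
    }
}
```

Building the tree through the DOM API, rather than concatenating strings, leaves escaping to the serializer: a subject containing a < character is written out as an entity reference instead of breaking the document.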