eXtensible Markup Language (XML) and Java
CISC370/Object Oriented Programming with Java
When ideas fail, words come in very handy.
CISC370 – Object Oriented Programming with Java / © 2003 J. Six
Use and Distribution Notice
Possession of any of these files implies understanding and
agreement to this policy.
The slides are provided for the use of students enrolled in Jeff
Six's Object Oriented Programming with Java class (CISC 370) at
the University of Delaware. They are the creation of Mr. Six and
he reserves all rights as to the slides. These slides are not to be
modified or redistributed in any way. All of these slides may only
be used by students for the purpose of reviewing the material
covered in lecture. Any other use, including but not limited to, the
modification of any slides or the sale of any slides or material, in
whole or in part, is expressly prohibited.
Most of the material in these slides, including the examples, is
derived from multiple textbooks. Credit is hereby given to the
authors of these textbooks for much of the content. This content
is used here for the purpose of presenting this material in CISC
370, which uses, or has used, these textbooks.
J2EE Territory
The technology that is covered as part
of this topic is part of the Java 2
Enterprise Edition (J2EE).
In order to program with this
technology, you must first download
and install the J2EE SDK (along with the
J2SE SDK you should already have).
As always, just go to java.sun.com.
What is XML?
How does Java relate?
XML is a standard for defining data
representations using tag syntax.
Using XML, data can be encoded, stored, and
transmitted, in a way that preserves meaning
and structure, and maintains it in text format.
Java class libraries exist to read, parse,
validate, generate, and transform XML data.
Java is one of the languages of choice for a
great deal of XML software development;
while the two technologies are often used
together, there is no direct relation between
them.
What is XML?
XML is a standardized way of defining tagged
syntax for data.
In an XML document:
you identify data using tags, like this: <name>
you place content between tags, like this: <name>Jeffrey A. Six</name>
tags can have attributes, like this: <name language="English">Jeffrey A. Six</name>
XML documents are always text. This makes
it relatively easy to create, examine, and debug
them.
Well-Formed-ness
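All XML documents must be well-formed.
Well-formed means, among other rules, that each start tag must
have a corresponding end tag (an empty element may instead use a
single self-closing tag such as <yesterday/>), that elements nest
properly without overlapping, and that there is exactly one root
element. For example,
<name>Jeffrey A. Six</name>
Here <name> is the start tag for a name piece of data, and </name>
is its corresponding end tag.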
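A quick way to check well-formedness from Java is to run a non-validating parser over the text: a missing end tag, improper nesting, or a second root element is reported as a fatal error. The sketch below assumes the JAXP SAXParserFactory and the SAX DefaultHandler (whose fatalError() rethrows the exception); the class name WellFormedCheck is hypothetical, it reads from a string rather than a file to stay self-contained, and it needs Java 7 or later for the multi-catch.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string of XML is well-formed.
public class WellFormedCheck {
    public static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false); // well-formedness only, no DTD/schema check
            // DefaultHandler ignores all events; its fatalError() rethrows.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                                         new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // e.g. a missing end tag arrives as a SAXParseException
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<name>Jeffrey A. Six</name>")); // true
        System.out.println(isWellFormed("<name>Jeffrey A. Six"));        // false
    }
}
```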
Examples of XML Syntax
Tags:
<date>22 October 2002</date>
<yesterday/>
<date><month>10</month><day>22</day></date>
Attributes:
<time zone="-2">5:57:39</time>
<time gmt='yes'>7:57:39</time>
Comments:
<!-- XML data generated by Jeff Six -->
Entity References:
&otherLoc;
&copyr;
&LegalDisclaimer;
Document type reference:
<!DOCTYPE MAIL SYSTEM "email.dtd">
XML Syntax Example
<?xml version="1.0"?>
<!DOCTYPE mail SYSTEM “email.dtd">
<mail>
<message>
<to>[email protected]</to>
<from>[email protected]</from>
<body type="text/plain">Work harder!</body>
<unread/>
</message>
<message importance=“high">
<to>[email protected]</to>
<from>[email protected]</from>
<subject>CISC370</subject>
<body>You are not doing well!</body>
</message>
</mail>
XML Syntax Example
mail
  message
    to: [email protected]
    from: [email protected]
    body (type=text/plain): Work harder!
    unread
  message (importance=high)
    to: [email protected]
    from: [email protected]
    subject: CISC370
    body: You are not doing well!
XML Structure Specifications
XML documents not accompanied by some
kind of specification can contain any tags.
This makes standardized communication VERY hard!
Most XML documents obey a particular set of
rules as to their structure; this defines what
tags, attributes, and entities may appear in
the document (more about that in a sec…).
XML documents can use namespaces, which
are separate scopes within which tags are
defined (very similar to namespaces in C++).
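In JAXP, namespace handling is switched on through the parser factory, after which each element arrives with its namespace URI and local name already separated. A minimal sketch, assuming SAX with a DefaultHandler and Java 7 or later; the class name NamespaceDemo and the namespace URIs urn:example:mail and urn:example:html are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceDemo {
    // Returns every element name in document order, as {namespaceURI}localName.
    public static List<String> qualifiedNames(String xml) {
        final List<String> names = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers are not namespace-aware by default
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attrs) {
                        // uri is "" for elements in no namespace
                        names.add("{" + uri + "}" + localName);
                    }
                });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // mail and body use the default namespace; the h prefix names a second one.
        String xml = "<mail xmlns='urn:example:mail' xmlns:h='urn:example:html'>"
                   + "<body><h:b>Work harder!</h:b></body></mail>";
        System.out.println(qualifiedNames(xml));
        // [{urn:example:mail}mail, {urn:example:mail}body, {urn:example:html}b]
    }
}
```

Because the two b-style names live in different namespaces, a mail vocabulary and an HTML-like vocabulary can be mixed in one document without their tags colliding.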
DTDs and Schemas
The first way of specifying valid tags for an XML
document is a Document Type Definition (DTD).
A DTD is written in a syntax inherited from SGML
(which is very confusing!). The DTD approach is not
used very much anymore.
The newer way of specifying the valid tags for an
XML document is a schema. A schema is a document
that is itself written in XML, which makes it much
easier to use to define the structure of an XML
document. Schemas are the current and future
way of defining and specifying XML document
structure.
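For comparison, the sketch below checks documents against a small W3C XML Schema using the javax.xml.validation package. Note that this package arrived with Java 5 (JAXP 1.3), after these slides were written. The schema, which only describes the <time> element from the syntax examples, and the class name SchemaCheck are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaCheck {
    // Hypothetical schema: a <time> element holding text, with an optional
    // integer "zone" attribute (compare <time zone="-2"> in the examples).
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='time'>"
        + "<xs:complexType><xs:simpleContent>"
        + "<xs:extension base='xs:string'>"
        + "<xs:attribute name='zone' type='xs:integer'/>"
        + "</xs:extension>"
        + "</xs:simpleContent></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    private static final Schema SCHEMA = compile(XSD);

    private static Schema compile(String xsd) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return sf.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    // True if xml is well-formed and conforms to the schema.
    public static boolean isValid(String xml) {
        Validator validator = SCHEMA.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // with no ErrorHandler set, the first error is thrown
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<time zone='-2'>5:57:39</time>"));  // true
        System.out.println(isValid("<time zone='GMT'>7:57:39</time>")); // false
    }
}
```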
XML Programming
Generally, programs (and thus
programmers) deal with XML in one of
three ways…
Parsing - given an XML document as input,
extract the stored data and process it
programmatically (XML doc as program input).
Generation - given some data, generate an
XML representation of it (XML doc as program
output).
Transformation - given an XML document,
transform it into some other kind of document
(XML doc as both program input and output).
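Generation is often done by building a DOM tree in memory and then serializing it. One minimal sketch, assuming JAXP's DOM and transform packages (Java 7 or later for the multi-catch): the hypothetical GenerateDemo builds a <message> element like the ones in the mail example and uses a Transformer created without a stylesheet to write it out as text.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class GenerateDemo {
    // Builds <message importance="high"><to>..</to><subject>..</subject></message>
    public static String toXml(String to, String subject) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element message = doc.createElement("message");
            message.setAttribute("importance", "high");
            doc.appendChild(message);
            message.appendChild(textElement(doc, "to", to));
            message.appendChild(textElement(doc, "subject", subject));

            // A Transformer with no stylesheet copies its input unchanged,
            // which here means serializing the DOM tree as XML text.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates <name>text</name>; characters such as & and < are escaped on output.
    private static Element textElement(Document doc, String name, String text) {
        Element e = doc.createElement(name);
        e.appendChild(doc.createTextNode(text));
        return e;
    }

    public static void main(String[] args) {
        System.out.println(toXml("student@example.com", "CISC370"));
    }
}
```

Building the tree through the DOM, rather than concatenating strings, means characters such as & and < in the data are escaped automatically.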
Java XML Programming
To use Java for XML programming, you can
use the Java APIs for XML Processing (JAXP).
This is included with Java 1.4 and later (and
is available as a downloadable component
for Java 1.3.x).
These APIs make programmatically working
with XML under Java very easy and
straightforward.
Note that this package is Sun’s collection of
XML APIs. There are other packages
available, which all work a little differently (each is
just a library of classes and methods).
Fundamental Models for XML Parsing
Two basic programming models exist for
XML parsing in Java:
SAX - Simple API for XML
Serial, sequential access to XML tags and content;
event-oriented callback model; fast and low
overhead; difficult to use for transforms
DOM - Document Object Model for XML
Tree-based access to the entire XML document;
data traversal model; keeps the entire document
in memory; easy to use for transforms
The SAX Programming Model
A SAX Parser chops an XML document into a
sequence of events (an event is the parser encountering
a piece of the document, such as a start tag, an end tag,
or a run of character data).
The parser delivers XML events, and errors if any
occur, to your application by calling methods on
handler objects (provided by your program).
Handler objects must implement particular interfaces:
org.xml.sax.ContentHandler
for receiving events about document contents and tags
org.xml.sax.ErrorHandler
for receiving errors and warnings that occur during parsing
org.xml.sax.DTDHandler
for receiving notation and unparsed-entity declarations from the DTD
org.xml.sax.EntityResolver
for supplying the input for external entities (such as a DTD) yourself
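Implementing every method of these interfaces by hand is tedious, so SAX also provides org.xml.sax.helpers.DefaultHandler, which implements all four with default behavior (mostly empty methods, though its fatalError() rethrows). A sketch under those assumptions, for Java 8 or later: the hypothetical TagCounter extends DefaultHandler and overrides only startElement to count tags by name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Only the callback of interest is overridden; DefaultHandler covers the rest.
public class TagCounter extends DefaultHandler {
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        // qName is always filled in; localName may be empty when the parser
        // is not namespace-aware (the JAXP default).
        counts.merge(qName, 1, Integer::sum);
    }

    public static Map<String, Integer> count(String xml) {
        TagCounter handler = new TagCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        return handler.counts;
    }

    public static void main(String[] args) {
        String xml = "<mail><message><to>a</to></message>"
                   + "<message><to>b</to><subject>CISC370</subject></message></mail>";
        System.out.println(count(xml)); // {mail=1, message=2, subject=1, to=2}
    }
}
```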
SAX Parsers
Most SAX parsers can read XML data (the
XML document) from any valid
java.io.Reader (remember, the Reader
category of classes handles text input), or
directly from a URL (for directly parsing XML
that exists on the Internet).
SAX parsers can be validating or nonvalidating. Validation refers to the checking
of a document for conformance to its
specified DTD or schema. Most SAX parsers
(including the one in the JAXP package)
support both operational modes.
SAX Programming – Event Passing
<?xml version="1.0" ?>
<mail><message>
<from>[email protected]</from>
</message></mail>
[Diagram: the document above is read through an input Reader by the SAX Parser, which calls back methods on your Content Handler: setDocumentLocator, startDocument, startElement, characters, endElement, endDocument. The Locator is an object that keeps track of the current position in the doc.]
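To watch the callback sequence yourself, the sketch below records each event delivered for the document above, using a placeholder address because the original one is redacted. It is an illustrative handler (the name EventTrace is hypothetical, Java 8 or later) that skips whitespace-only character data to keep the trace short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EventTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override public void setDocumentLocator(Locator l) { events.add("setDocumentLocator"); }
    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes a) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Skip whitespace-only runs. A parser may also split longer text
        // into several characters() calls.
        String s = new String(ch, start, length).trim();
        if (!s.isEmpty()) events.add("characters " + s);
    }

    public static List<String> trace(String xml) {
        EventTrace handler = new EventTrace();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" ?>\n<mail><message>\n"
                   + "<from>someone@example.com</from>\n</message></mail>";
        trace(xml).forEach(System.out::println);
    }
}
```

For this input the trace starts with setDocumentLocator and startDocument, then one startElement each for mail, message, and from, the characters of the address, the matching endElement calls in reverse order, and finally endDocument.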
SAX Programming – Basics
In order to use SAX, you need to import the
relevant packages (assuming you are using the
JAXP packages)…
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;
To do any parsing, you need a SAX XMLReader
object. Very easy to construct…
SAXParserFactory sp_factory =
SAXParserFactory.newInstance();
SAXParser sp = sp_factory.newSAXParser();
XMLReader theParser = sp.getXMLReader();
SAX Programming – Errors
When a SAX Parser is parsing, there are three kinds of
errors it can encounter:
warning: a non-serious problem occurred (parsing can continue)
error: a serious but recoverable problem occurred (parsing can continue)
fatalError: a problem occurred that is so grave that parsing cannot continue
Not all SAX parsers treat each condition in the same category. Check the docs!
Usually, warning and error conditions result from
violations of document validity (DTD or schema
constraints).
Fatal error conditions result from I/O problems or non-well-formed XML.
All SAX parsers can throw SAXExceptions. Any SAX
handler method (the methods the parser calls on your
handler object) can also throw a SAXException.
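The sketch below illustrates the usual division of labor: an ErrorHandler that records warnings and errors (so a validating parse keeps going) but aborts on a fatal error. It assumes a validating JAXP SAX parser, a tiny DTD embedded in the document, and Java 7 or later; the names ValidationDemo, CollectingErrorHandler, and check are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class ValidationDemo {
    // Records warnings and errors so parsing continues; stops on fatal errors.
    static class CollectingErrorHandler implements ErrorHandler {
        final List<String> problems = new ArrayList<>();

        public void warning(SAXParseException e) {
            problems.add("warning (line " + e.getLineNumber() + "): " + e.getMessage());
        }
        public void error(SAXParseException e) {
            problems.add("error (line " + e.getLineNumber() + "): " + e.getMessage());
        }
        public void fatalError(SAXParseException e) throws SAXException {
            problems.add("fatal (line " + e.getLineNumber() + "): " + e.getMessage());
            throw e; // the document is unusable, so abort the parse
        }
    }

    public static List<String> check(String xml) {
        XMLReader reader;
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // DTD violations are reported via error()
            reader = factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("cannot create parser", e);
        }
        CollectingErrorHandler handler = new CollectingErrorHandler();
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Thrown from fatalError(); it is already in the list.
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.problems;
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>";
        System.out.println(check(dtd + "<note><to>Jeff</to></note>"));
        System.out.println(check(dtd + "<note><from>Jeff</from></note>"));
        System.out.println(check(dtd + "<note><to>Jeff</note>"));
    }
}
```

The first document should produce no problems. The second should produce validity errors (from is not declared, and the content of note does not match (to)) without stopping the parse. The third is not well-formed, so its list ends with a fatal entry.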
SAX Programming – Example
Let’s look at a very simple SAX example
program.
This example program consists of two
classes:
SimpleXML1 - supplies a main() method
that creates a SAX parser and invokes it to
parse a file specified on the command line.
SAXHandler1 - acts as the SAX
ContentHandler and ErrorHandler.
SAX Programming – Example
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;

public class SimpleXML1 {
    public static void main(String [] args) {
        try {
            System.out.println("Creating and setting up the SAX parser.");
            SAXParserFactory sp_factory = SAXParserFactory.newInstance();
            XMLReader theReader = sp_factory.newSAXParser().getXMLReader();
            // One object serves as both the content handler and the error handler
            SAXHandler1 theHandler = new SAXHandler1();
            theReader.setContentHandler(theHandler);
            theReader.setErrorHandler(theHandler);
            // Non-validating: only well-formedness is checked
            theReader.setFeature("http://xml.org/sax/features/validation", false);
            System.out.println("Making InputSource for " + args[0]);
            FileReader file_in = new FileReader(args[0]);
            System.out.println("About to parse...");
            theReader.parse(new InputSource(file_in));
            System.out.println("...parsing done.");
            file_in.close();
        }
        catch (Exception e) {
            System.err.println("Error: " + e);
            e.printStackTrace();
        }
    }
}
SAX Programming – Example
import org.xml.sax.*;

public class SAXHandler1
    implements ContentHandler, ErrorHandler
{
    private Locator loc = null;

    public void setDocumentLocator(Locator l) {
        loc = l;
    }

    // Called when a set of characters is encountered.
    public void characters(char [] ch, int st, int len) {
        String s = new String(ch, st, len);
        // System.out.println("Got content string '" + s + "'");
        return;
    }

    // Called when a starting element tag is encountered.
    // qname is printed because lname is empty when the parser
    // is not namespace-aware (the JAXP default).
    public void startElement(String uri, String lname,
                             String qname, Attributes attrs) {
        System.out.print(qname + " tag with ");
        System.out.print(attrs.getLength() + " attrs starts");
        System.out.println(" at line " + loc.getLineNumber());
    }
SAX Programming – Example
    // Called when an ending element tag is encountered.
    public void endElement(String uri, String lname, String qname) {
        System.out.print(qname + " tag ends ");
        System.out.println("at line " + loc.getLineNumber());
    }

    // Called when the start or end of the doc is encountered.
    public void startDocument() { }
    public void endDocument() { }

    // The remaining ContentHandler methods are not needed here.
    public void processingInstruction(String t, String d) { }
    public void skippedEntity(String name) { }
    public void ignorableWhitespace(char[] ch, int st, int len) { }
    public void startPrefixMapping(String p, String uri) { }
    public void endPrefixMapping(String p) { }

    // Called when a SAX warning is encountered.
    // A SAXParseException records where the problem was found,
    // so its own line number is reported.
    public void warning(SAXParseException e) {
        System.err.print("SAX Warning: " + e);
        System.err.println(" at line " + e.getLineNumber());
    }

    // Called when a SAX error is encountered.
    public void error(SAXParseException e) {
        System.err.print("SAX Error: " + e);
        System.err.println(" at line " + e.getLineNumber());
    }

    // Called when a SAX fatal error is encountered.
    public void fatalError(SAXParseException e) {
        System.err.print("SAX Fatal Error: " + e);
        System.err.println(" at line " + e.getLineNumber());
    }
}
DOM Programming – Overview
In contrast to SAX, a DOM parser builds an in-memory tree representation of the entire XML
document. A reference to this tree is returned to
your program.
The DOM tree is composed of objects, most of which
implement the following interfaces from the
org.w3c.dom package:
Node - parent interface of all DOM tree nodes
Element - represents a tag in an XML document, may have children
Document - represents an entire XML document, always has children
Text - contents of an element or an attribute
Attr - represents an attribute of an element
DOM Programming – Overview
Once the parser has created the DOM tree, your
program can manipulate it as it wants to. It can
traverse it, modify it, print it out, apply XSL
transforms to it (Extensible Stylesheet Language
Transformations … more about that in a bit), and
perform many other operations.
Like most SAX parsers, most DOM parsers allow
the specification of either validation or nonvalidation modes. Remember, validation refers
to checking for conformance against the
document’s specified DTD or schema.
DOM Programming – Model
<?xml version="1.0" ?>
<mail><message>
<from>[email protected]</from>
</message></mail>
[Diagram: your Application Code calls parse on the Document Builder, which reads the document above through an input Reader (checking it against the DTD or schema, if any) and returns a DOM tree.]
DOM Programming – Creating the DOM Tree
Although not relevant to using the DOM
model for programming, the question
often arises…how does the DOM API
generate the tree representing the XML
document?
The answer is simple…it typically uses a SAX-style
parser! The DOM implementation runs an event-driven
parser over the XML document, constructing the
DOM tree in memory as the events arrive.
DOM Programming – Basics
To create a DOM DocumentBuilder (with the
JAXP packages), construction is easy…
DocumentBuilderFactory dbuilder_factory =
    DocumentBuilderFactory.newInstance();
dbuilder_factory.setValidating(false);
DocumentBuilder dbuilder =
    dbuilder_factory.newDocumentBuilder();
Building a document tree from XML data
found in a particular file is just as easy…
FileReader file_in = new FileReader("thefile.xml");
Document doc = dbuilder.parse(new InputSource(file_in));
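The same builder can parse from any InputSource, and the resulting Document can be queried directly. A minimal sketch, assuming a trimmed copy of the mail example with placeholder addresses and no DOCTYPE (so no external DTD is needed); DomQueryDemo is a hypothetical name, and getTextContent() needs DOM Level 3 (Java 5 or later; the multi-catch needs Java 7).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomQueryDemo {
    static final String MAIL =
          "<mail>"
        + "<message><to>a@example.com</to>"
        + "<body type='text/plain'>Work harder!</body><unread/></message>"
        + "<message importance='high'><to>b@example.com</to>"
        + "<subject>CISC370</subject><body>You are not doing well!</body></message>"
        + "</mail>";

    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(false);
            DocumentBuilder db = dbf.newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(MAIL);
        // getElementsByTagName searches all descendants, in document order
        NodeList messages = doc.getElementsByTagName("message");
        for (int i = 0; i < messages.getLength(); i++) {
            Element m = (Element) messages.item(i);
            String importance = m.getAttribute("importance"); // "" when absent
            Element body = (Element) m.getElementsByTagName("body").item(0);
            System.out.println("message " + i + ": importance='" + importance
                    + "', body='" + body.getTextContent() + "'");
        }
    }
}
```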
DOM Programming – Example
import java.io.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;
import org.xml.sax.*;

public class SimpleXML2 {
    public static void main(String [] args) {
        String uri = args[0];   // name of the XML file to parse
        try {
            System.out.println("Creating Document builder.");
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(false);
            DocumentBuilder db = dbf.newDocumentBuilder();
            System.out.println("Ready to parse!");
            FileReader file_in = new FileReader(uri);
            Document doc = db.parse(new InputSource(file_in));
            System.out.println("Parsed document, ready to process.");
            myDOMTreeProcessor proc = new myDOMTreeProcessor();
            proc.process(doc, System.out);
        }
        catch (Exception e) {
            System.err.println("XML Exception thrown: " + e);
            e.printStackTrace();
        }
    }
}
DOM Programming – Example
import org.w3c.dom.*;

public class myDOMTreeProcessor {
    public void process(Document doc, java.io.PrintStream os) {
        printTree(doc, os, 0);
    }

    void printTree(Node n, java.io.PrintStream os, int indent) {
        int i;
        switch (n.getNodeType()) {
            case Node.ATTRIBUTE_NODE:
                for(i = 0; i < indent; i++) os.print(" ");
                os.println("Attr " + n.getNodeName() + "=" + n.getNodeValue());
                break;
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                // os.println("Text '" + n.getNodeValue() + "'.");
                break;
            case Node.DOCUMENT_NODE: {
                os.println("DOCUMENT:");
                NodeList kids = n.getChildNodes();
                for(i = 0; i < kids.getLength(); i++)
                    printTree(kids.item(i), os, indent + 2);
                break;
            }
DOM Programming – Example
            case Node.DOCUMENT_TYPE_NODE:
                for(i = 0; i < indent; i++) os.print(" ");
                os.println("Document type is " + n.getNodeName());
                break;
            case Node.ELEMENT_NODE: {
                for(i = 0; i < indent; i++) os.print(" ");
                os.println("Element " + n.getNodeName());
                NamedNodeMap attrs = n.getAttributes();
                for(i = 0; i < attrs.getLength(); i++)
                    printTree(attrs.item(i), os, indent + 5);
                NodeList kids = n.getChildNodes();
                for(i = 0; i < kids.getLength(); i++)
                    printTree(kids.item(i), os, indent + 2);
                break;
            }
            default:
                os.println("Unexpected kind of node - ignored.");
                break;
        }
    }
}
Changing the DOM Tree
Once you have the DOM tree, you can:
search for particular nodes
extract element and text values
You can alter the tree:
add elements to the tree, or change them
add attributes to elements, or change their values
re-arrange elements and subtrees
synthesize entirely new trees or subtrees
If you create a new tree or subtree, you should
apply the normalize() method before processing
it further (it merges adjacent Text nodes and removes
empty ones, so the tree has a predictable layout).
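The sketch below exercises several of these operations on a small tree: it adds an attribute, synthesizes a <subject> element whose text is deliberately split across two Text nodes, moves it to the front of <message>, and then calls normalize() so the two Text nodes become one. The class name DomEditDemo is hypothetical, the input is assumed to contain at least one <message> element, and getTextContent() needs Java 5 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomEditDemo {
    public static Document edit(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        Element message = (Element) doc.getElementsByTagName("message").item(0);

        // Add an attribute (setAttribute also changes an existing value).
        message.setAttribute("importance", "high");

        // Synthesize a new element whose text is split over two Text nodes.
        Element subject = doc.createElement("subject");
        subject.appendChild(doc.createTextNode("CISC"));
        subject.appendChild(doc.createTextNode("370"));

        // Re-arrange: insert the new subject before the existing first child.
        message.insertBefore(subject, message.getFirstChild());

        // Merge adjacent Text nodes (and drop empty ones) throughout the tree.
        doc.normalize();
        return doc;
    }

    public static void main(String[] args) {
        Document doc = edit("<mail><message><to>student@example.com</to></message></mail>");
        Element subject = (Element) doc.getElementsByTagName("subject").item(0);
        System.out.println(subject.getChildNodes().getLength()); // 1
        System.out.println(subject.getTextContent());             // CISC370
    }
}
```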
XSL and Transformations
The Extensible Stylesheet Language (XSL) is a core XML
technology for performing transformations to and on
XML documents.
XSLT (XSL Transformations) offers extensive template matching and processing
instructions for transforming XML data into other formats
(e.g. XML->XML, XML->HTML, XML->text).
An XSL Processor is a system (usually invoked through an
object interface) that can apply XSLT transforms to some XML
input and generate results (normally the data in another form).
The transformation rules are expressed in XSL, a
programming language unto itself, geared for pattern
matching and formatting…it is similar to LISP.
JAXP includes the Java implementation of the Apache
Xalan XSLT processor.
XML Programming Transformations
One of the advantages of the DOM programming
model is that it can be used easily with XSL.
JAXP fully supports XSLT with the package
javax.xml.transform and its sub-packages.
The sequence of steps for creating and applying
transforms with XSLT and Xalan is:
Obtain or create an initial DOM tree, create a Source from it.
Create a TransformerFactory.
Create a Templates object based on a particular XSLT
document.
Create a Transformer object from the Templates object.
Create an output Result object.
Apply the Transformer to the Source and the Result.
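The six steps map directly onto JAXP calls. The sketch below follows them in order, assuming a small inline stylesheet (hypothetical, written for this illustration) that lists each message's <to> address as plain text; XsltStepsDemo is likewise a made-up name, and the multi-catch needs Java 7 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XsltStepsDemo {
    // Hypothetical stylesheet: print each message's <to> followed by ';'.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='mail/message'>"
        + "<xsl:value-of select='to'/><xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static String run(String xml) {
        try {
            // 1. Obtain a DOM tree and create a Source from it.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            DOMSource source = new DOMSource(doc);
            // 2. Create a TransformerFactory.
            TransformerFactory tf = TransformerFactory.newInstance();
            // 3. Create a Templates object from the XSLT document.
            Templates templates = tf.newTemplates(new StreamSource(new StringReader(XSL)));
            // 4. Create a Transformer from the Templates object.
            Transformer xformer = templates.newTransformer();
            // 5. Create an output Result.
            StringWriter out = new StringWriter();
            StreamResult result = new StreamResult(out);
            // 6. Apply the Transformer to the Source and the Result.
            xformer.transform(source, result);
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                 | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<mail><message><to>a@example.com</to></message>"
                + "<message><to>b@example.com</to></message></mail>"));
    }
}
```

Compiling the stylesheet into a Templates object (step 3) is what lets the same stylesheet be reused for many transforms, or shared between threads, with each thread creating its own Transformer.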
XSLT Programming – Example
This program applies a supplied XSL file to an XML
file, and produces an output file.
% java SimpleXML3 tiny.xml mail2html.xsl tiny-res.html
[Diagram: SimpleXML3 passes the input XML (tiny.xml) to the DOM Parser, which checks it against the input DTD (mail1.dtd); the XSLT Processor then applies the stylesheet (mail2html.xsl) to the parsed tree and writes the output HTML (tiny-res.html).]
XSLT Programming – Example
import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import org.xml.sax.*;

public class SimpleXML3 {
    public static void main(String [] args) {
        String src_file = args[0];
        String template_file = args[1];
        String res_file = args[2];
        try {
            System.out.println("Creating Document builder.");
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(true);   // check the input against its DTD
            DocumentBuilder db = dbf.newDocumentBuilder();
            // An InputSource built from the file name (its system ID)
            InputSource file_in = new InputSource(src_file);
            Document doc = db.parse(file_in);
            System.out.println("Parsed document okay");
            System.out.println("Using " + template_file + " to make " + res_file);
            myXSLTProcessor proc = new myXSLTProcessor();
            proc.process(doc, template_file, res_file);
        }
        catch (Exception e) {
            System.err.println("XML Exception thrown: " + e);
        }
    }
}
XSLT Programming – Example
import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;

public class myXSLTProcessor {
    public void process(Document d, String tmpl, String res)
        throws TransformerException, DOMException
    {
        System.out.println("Creating transformer factory.");
        TransformerFactory tf = TransformerFactory.newInstance();
        System.out.println("Creating sources and results.");
        DOMSource ds = new DOMSource(d);
        StreamSource ss = new StreamSource(tmpl);
        StreamResult sr = new StreamResult(new java.io.File(res));
        System.out.println("Creating template & transformer");
        Templates tt = tf.newTemplates(ss);
        Transformer xformer = tt.newTransformer();
        System.out.println("Performing transform to make " + res);
        xformer.transform(ds, sr);
    }
}
Summary
XML is a technology for storing data in a
standard, portable, and text-based way.
Java packages exist to help your Java
programs parse XML, build XML documents,
and perform XML transformations.
There are two basic models for XML parsing
and Java has support for both…
SAX: sequential access, event-driven
DOM: in-memory tree structure
XSLT can be used to transform XML into
other XML or other textual formats.