No More Pain for XML’s Gain
XJ: Facilitating XML Processing in Java
Matthew Harren
Mukund Raghavachari
Oded Shmueli
Michael Burke
Rajesh Bordawekar
Igor Pechtchanski
Vivek Sarkar
Itay Maman
236826 Seminar lecture, 15 June 2005
The basic premise
• XML is getting increasingly popular
• XML manipulation is now a common programming task
• The lead question:
– Do modern OO languages sufficiently support XML?
Introduction: Schema file
(file: technioncatalog.xsd)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="course" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="points" type="xs:int"/>
              <xs:element name="number" type="xs:int"/>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="teacher" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Introduction: XML document
(file: short.xml)
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <course>
    <points>3</points>
    <number>234319</number>
    <name>Programming Languages</name>
    <teacher>Ron Pinter</teacher>
  </course>
  <course>
    <points>3</points>
    <number>234141</number>
    <name>Combinatorics for CS</name>
    <teacher>Ran El-Yaniv</teacher>
  </course>
</catalog>
Desired output:
“Combinatorics for CS (234141) by Ran El-Yaniv, 3 credit points”
Introduction: The XJ program
import java.io.*;
import technioncatalog.*;
public class Demo1 {
  public static void main(String[] args) throws Throwable {
    // Load short.xml as a value of the schema-derived type catalog
    catalog cat = new catalog(new File("short.xml"));
    // Typed XPath query: the second <course> child
    catalog.course c = cat [| /course[2] |];
    printCourse(c);
  }
  private static void printCourse(catalog.course c) {
    // Atomic values are converted to Java types implicitly
    String name = c [| /name |];
    String teacher = c [| /teacher |];
    int points = c [| /points |];
    int id = c [| /number |];
    System.out.println(name + " (" + id + ") by "
        + teacher + ", " + points + " credit points");
  }
}
Traditional XML processing (DOM, XPath APIs)
• The types of the XML objects (Node, Document) do not reflect the schema
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
public static void main(String[] args) throws Throwable {
  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
  DocumentBuilder db = dbf.newDocumentBuilder();
  Document doc = db.parse(new java.io.File("short.xml"));
  XPath xp = XPathFactory.newInstance().newXPath();
  // The result is an Object and must be cast; its type is unknown at compile time
  NodeList nodes = (NodeList)
      xp.evaluate("//course", doc, XPathConstants.NODESET);
  printCourse(nodes.item(1));
}
• The XPath query is a plain string. It may be:
– Syntactically incorrect
– Incompatible with the document
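Both weaknesses of the string-based query can be reproduced with the standard javax.xml.xpath API alone. The following is a minimal sketch; the class and method names are illustrative. A misspelled element name is accepted and silently selects nothing. A malformed query is rejected only when it is compiled, at run time.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StringXPathPitfalls {
    // Parses an XML string into a DOM Document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of nodes selected by the query. A query that does not match the
    // document (e.g. a misspelled name) is not an error: it just selects 0 nodes.
    static int count(Document doc, String query) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(query, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Syntax errors surface only here, when the string is compiled at run time.
    static boolean compiles(String query) {
        try {
            XPathFactory.newInstance().newXPath().compile(query);
            return true;
        } catch (XPathExpressionException e) {
            return false;
        }
    }
}
```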
Traditional XML processing (DOM APIs)
private static void printCourse(Node n) {
  NodeList nodes = n.getChildNodes();
  // Indices 1, 3, 5, 7 skip the whitespace text nodes between the elements
  System.out.println(nodes.item(5).getTextContent()
      + " (" + nodes.item(3).getTextContent()
      + ") by " + nodes.item(7).getTextContent()
      + ", " + nodes.item(1).getTextContent() + " credit points");
}
Hidden assumptions in this code:
– Assumption: child node 3 is the course number
– Assumption: the child nodes being read have no child elements of their own (getTextContent() concatenates all descendant text)
– Assumption: four child elements must exist
What about reading the numeric value of an element?
• These assumptions will not hold if the schema is changed
– => run-time errors
– Problems remain even if we identify nodes by name
• Possible schema changes:
– Allowing a new optional <students> sub-element
– Changing the order of the sub-elements
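Looking elements up by name removes the dependence on child order. It does not remove the run-time failures. The following is a minimal sketch using only standard DOM calls; the helper names are illustrative. Reordered children are handled, but a missing element is still detected only when the code runs. Numeric values also arrive as strings and have to be parsed by hand.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ByNameAccess {
    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Text of the first direct child element named `tag`, in any position.
    static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(tag)) {
                return n.getTextContent().trim();
            }
        }
        // The schema guarantees this element exists, but the compiler cannot know that.
        throw new IllegalStateException("missing <" + tag + "> element");
    }

    static String describe(Element course) {
        // Numeric values must be parsed explicitly; bad text fails at run time.
        int points = Integer.parseInt(childText(course, "points"));
        int number = Integer.parseInt(childText(course, "number"));
        return childText(course, "name") + " (" + number + ") by "
                + childText(course, "teacher") + ", " + points + " credit points";
    }
}
```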
No easy solution
• Similar problems occur when:
1. XML elements are created by the program
2. Other libraries are used for reading/writing XML documents
– Such as: Xalan, SAX
3. The developer wraps several complex operations within a single function/method/class
• These are inherent problems of the language
Shaping the future
• What XML-related facilities do we want?
– Typed XML objects
– Seamless translation of a Schema/DTD into a Java type
– Two composition techniques
• XML notation
• Java’s object creation syntax
– Two decomposition techniques
• Typed XPath
• Typed, named methods/fields
– XPath expressions as first-class values
Has the future arrived yet?
• Significant effort has gone into integrating XML into modern programming languages
– XJ
– Scala
– Cω
– XTatic
– …
• We will survey the constructs offered by XJ
– A superset of Java
– Available at: http://www.research.ibm.com/xj
XJ’s Type system
• Hierarchy of classes
– A common root class: XMLObject
– Automatic import of package com.ibm.xj.*
• Genericity: Sequence<T>, XMLCursor<T>
– XMLCursor<T> is a Sequence<T> iterator
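A plain-Java analogue of this arrangement helps show what "genericity" buys: a common root class, a sequence type bounded by that root, and a cursor that iterates over the sequence. This is a sketch under stated assumptions. XmlObject, XmlSequence, Teacher and the inner Cursor are hypothetical names chosen to mirror the slide. They are not XJ's actual API.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical root of all XML values (plays the role of XMLObject).
abstract class XmlObject {
    abstract String text();
}

// Hypothetical element type, as a schema compiler might generate for <teacher>.
final class Teacher extends XmlObject {
    private final String name;
    Teacher(String name) { this.name = name; }
    @Override String text() { return name; }
}

// Typed, immutable sequence (plays the role of Sequence<T>).
// The bound means XmlSequence<String> does not compile.
final class XmlSequence<T extends XmlObject> implements Iterable<T> {
    private final List<T> items;

    XmlSequence(List<T> items) { this.items = List.copyOf(items); }

    int size() { return items.size(); }

    // The cursor plays the role of XMLCursor<T>: an iterator over the sequence.
    @Override public Iterator<T> iterator() { return new Cursor(); }

    private final class Cursor implements Iterator<T> {
        private int next = 0;
        @Override public boolean hasNext() { return next < items.size(); }
        @Override public T next() {
            if (!hasNext()) throw new NoSuchElementException();
            return items.get(next++);
        }
    }
}
```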
Integration with Schema
• The rationale:
1. An OO program is a collection of class definitions
2. A Schema file is a collection of type definitions
• => let’s integrate these definitions
• Every Schema type is also an XJ type
– The XJ compiler generates a “logical class” for each such type
– Schema file == package name
– Using a schema == import schema_file_name;
XML literals in XJ code
• Invalid XML content triggers a compile-time error
• Resulting elements are typed!
• Curly braces allow “escaping” back into XJ
import technioncatalog.*;
public class Demo2 {
  public static void main(String[] args) throws Throwable {
    String x = "Algorithms 1"; int y = 234247;
    // {y} and {x} escape back into XJ to embed Java values in the literal
    catalog cat = buildCatalog(new catalog.course(
        <course><points>3</points>
        <number>{y}</number><name>{x}</name>
        <teacher>Shlomo Moran</teacher></course>));
  }
  private static catalog buildCatalog(catalog.course c) {
    return new catalog(<catalog>{c}</catalog>);
  }
}
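For comparison, here is a minimal sketch that builds the same <course> and <catalog> elements with the standard DOM API. The class and method names are illustrative. The values of x and y go in as text nodes, which roughly corresponds to what the curly-brace escapes do. Nothing here checks the result against the schema, and every element is typed simply as Element.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomCourseBuilder {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Appends <tag>text</tag> to parent. The value is stored as a text node,
    // so characters such as '<' or '&' are escaped when serialized.
    static void appendLeaf(Document doc, Element parent, String tag, String text) {
        Element e = doc.createElement(tag);
        e.appendChild(doc.createTextNode(text));
        parent.appendChild(e);
    }

    // Untyped counterpart of new catalog.course(<course>...</course>).
    static Element buildCourse(Document doc, String name, int number) {
        Element course = doc.createElement("course");
        appendLeaf(doc, course, "points", "3");
        appendLeaf(doc, course, "number", Integer.toString(number));
        appendLeaf(doc, course, "name", name);
        appendLeaf(doc, course, "teacher", "Shlomo Moran");
        return course;
    }

    // Untyped counterpart of new catalog(<catalog>{c}</catalog>).
    static Element buildCatalog(Document doc, Element course) {
        Element catalog = doc.createElement("catalog");
        catalog.appendChild(course);
        doc.appendChild(catalog);
        return catalog;
    }
}
```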
An ill-typed program
...
// Compile-time error: wrong <course> element
course c = new course(<course>
  <teacher>Shlomo Moran</teacher></course>);
buildCatalog(c);
XMLObject x = new course.teacher(
  <teacher>Shlomo Moran</teacher>);
// Compile-time error: an XMLObject cannot be passed as a course element
buildCatalog(x);
...
private static catalog buildCatalog(catalog.course c) {
  return new catalog(<catalog>{c}</catalog>);
}
Embedding XPath Queries in XJ
• Syntax: XmlValue [| XPathQuery |]
course doSomething(catalog cat, int courseNum) {
  return cat [| /course[./number = $courseNum] |];
}
• Requires a context provider:
– An XML element over which the XPath query is invoked (see the cat variable in the sample)
• Escaping back into XJ: use a ‘$’ prefix
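In the standard javax.xml.xpath API, the closest counterpart of the ‘$’ escape is an XPath variable bound through an XPathVariableResolver. The following is a minimal sketch with illustrative class and method names. The int is handed over as a Double because XPath numbers are doubles. Unlike in XJ, the query string itself remains unchecked until run time, and the result comes back as an untyped Node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XPathWithVariable {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the <course> whose <number> equals courseNum, or null if none.
    static Node findCourse(Document doc, int courseNum) {
        XPath xp = XPathFactory.newInstance().newXPath();
        // Binds $courseNum in the query to the Java value.
        xp.setXPathVariableResolver(name ->
                "courseNum".equals(name.getLocalPart()) ? Double.valueOf(courseNum) : null);
        try {
            return (Node) xp.evaluate("/catalog/course[number = $courseNum]",
                    doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```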
XPath Semantics
• Problem: the resulting type is sometimes not so clear
• Two options
– Sequence<T>
• If the compiler determines that all result elements are of type T
– Sequence<XMLObject>
• (Otherwise)
• Automatic conversion from a singleton sequence
• Static check of XPath queries
– If the result is always empty => compile-time error
– (The compiler cannot catch all cases)
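The singleton conversion has a run-time side: a sequence can stand in for a single value only when it holds exactly one item. The slides do not say what XJ does otherwise. This sketch assumes the conversion fails with an exception, and the helper name single is hypothetical.

```java
import java.util.List;

class SingletonConversion {
    // Hypothetical helper: converts a one-item sequence into its item.
    // Assumption: an empty or multi-item sequence is a run-time error.
    static <T> T single(List<T> seq) {
        if (seq.size() != 1) {
            throw new IllegalStateException("expected exactly one item, found " + seq.size());
        }
        return seq.get(0);
    }
}
```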
Implicit coercions
• An atomic XML value can be seamlessly converted into a corresponding Java value
– xsd:double => double
– xsd:boolean => boolean
– xsd:string => java.lang.String
– …
• This reduces the verbosity of XML-related code:
import technioncatalog.*;
import technioncatalog.catalog.*;
public static String getTeacher(course c) {
  // Conversion chain: Sequence<teacher> ► teacher ► String
  return c [| /teacher |];
}
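Without such coercions, the same conversions must be written by hand. The mapping is also not just a call to Java's parsers, because the XML Schema lexical forms differ from Java's. xsd:boolean accepts "1" and "0", which Boolean.parseBoolean would silently treat as false. xsd:double spells infinity "INF", which Double.parseDouble rejects. The following is a minimal sketch with illustrative names.

```java
class XsdCoercions {
    // xsd:int -> int. Surrounding whitespace is collapsed in XSD, so trim first.
    static int toInt(String lexical) {
        return Integer.parseInt(lexical.trim());
    }

    // xsd:double -> double. XSD writes infinity as "INF"/"-INF";
    // "NaN" and ordinary decimals are accepted by Double.parseDouble.
    static double toDouble(String lexical) {
        String s = lexical.trim();
        switch (s) {
            case "INF": return Double.POSITIVE_INFINITY;
            case "-INF": return Double.NEGATIVE_INFINITY;
            default: return Double.parseDouble(s);
        }
    }

    // xsd:boolean -> boolean. The lexical space is {true, false, 1, 0}.
    static boolean toBoolean(String lexical) {
        switch (lexical.trim()) {
            case "true": case "1": return true;
            case "false": case "0": return false;
            default: throw new IllegalArgumentException("not an xsd:boolean: " + lexical);
        }
    }
}
```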
Updates: Assignment to Query Result
public static void changePoint(catalog.course c, int p) {
  c [| /points |] = p;
}
• An XPath expression returns a reference to an existing element
– (No copying is involved)
– Consistent with Java’s semantics for objects
• Thus, it can be assigned to
– An XPath expression is a legal lvalue
• Bulk assignment
– Occurs when the XPath expression denotes a sequence
– Bulk assignment operator := allows multiple assignments
– Double the credit points of each course:
cat [| //points |] *:= 2;
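The reference semantics have a direct counterpart in plain DOM code. For a DOM input, the nodes returned by the standard XPath API are the document's own nodes, so updating them updates the document. The following is a minimal sketch of the doubling example written as an explicit loop in place of the bulk *:= operator. The class and method names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BulkUpdate {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Multiplies every <points> value by factor, in place: the NodeList holds
    // references into doc, so no copy is made and doc itself changes.
    static void scalePoints(Document doc, int factor) {
        try {
            NodeList points = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//points", doc, XPathConstants.NODESET);
            for (int i = 0; i < points.getLength(); i++) {
                Node p = points.item(i);
                int value = Integer.parseInt(p.getTextContent().trim());
                p.setTextContent(Integer.toString(value * factor));
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```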
Tree structure update
• Class XMLObject also defines methods such as:
– insertAfter()
– insertBefore()
– insertAsFirst()
– detach()
public static void addCourse(catalog cat) {
  course c = new course(<course><points>4</points>
      <number>234111</number><name>Introduction to CS</name>
      <teacher>Roy Friedman</teacher></course>);
  cat.insertAsLast(c);
}
Which object is being modified?
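The standard DOM offers the same kinds of tree update on org.w3c.dom.Node. The following sketch maps the slide's method names onto DOM calls. The wrapper methods are hypothetical. Only appendChild, insertBefore and removeChild are real DOM methods, and each of them mutates the parent node in place.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class DomTreeUpdates {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static void insertAsLast(Node parent, Node child) { parent.appendChild(child); }

    // insertBefore(x, null) appends, so this also works on an empty parent.
    static void insertAsFirst(Node parent, Node child) {
        parent.insertBefore(child, parent.getFirstChild());
    }

    static void insertAfter(Node ref, Node node) {
        ref.getParentNode().insertBefore(node, ref.getNextSibling());
    }

    static void detach(Node node) { node.getParentNode().removeChild(node); }

    // Comma-separated names of parent's children, for inspection.
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(n.getNodeName());
        }
        return sb.toString();
    }
}
```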
Problems: Type Consistency
• Definitions
1. An XML update operation, u, is a mapping over XML values
• u: T1 -> T2
2. An update is consistent if T1 = T2
• Ideally, a compile-time error should be triggered for each inconsistent update in the program
– Why do we want the two types to be equal?
• Unfortunately, this cannot be promised
– Can you think of an example?
• The solution: an additional run-time check
Problems: Covariant subtyping (1/2)
• Covariance: the change of type in a signature is in the same direction as that of the inheritance
class X { }
class A { public void m(X x) { } }
class X1 extends X { }
// A1.m() is “spoiled”: it requires only X1 objects
class A1 extends A { public void m(X1 x) { } }
...
A a = new A1();
a.m(new X());   // Which method should be invoked: A.m() or A1.m()?
• Java favors type safety: a method with covariant arguments is considered an overload rather than an override
– The same approach is taken by C++ and C#
• But covariance is allowed for arrays
– Array assignments may fail at run time
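Both Java behaviors can be checked directly. This sketch reuses the slide's class names X, X1, A and A1. The String return values and the CovarianceDemo helper are additions made so the dispatch can be observed. A1.m(X1) overloads A.m(X) rather than overriding it, so a call through a reference of static type A always runs A.m(X). Arrays, by contrast, are covariant, and a bad store is caught only at run time.

```java
class X { }
class X1 extends X { }

class A {
    String m(X x) { return "A.m(X)"; }
}

class A1 extends A {
    // Covariant parameter type: this overloads m, it does not override A.m(X).
    String m(X1 x) { return "A1.m(X1)"; }
}

class CovarianceDemo {
    // Static type A: only A.m(X) is a candidate, and A1 does not override it.
    static String callThroughBase() {
        A a = new A1();
        return a.m(new X());
    }

    // Even with an X1 argument, the call through A still selects A.m(X).
    static String callThroughBaseWithX1() {
        A a = new A1();
        return a.m(new X1());
    }

    // Arrays are covariant: this compiles, but the store fails at run time.
    static boolean arrayStoreFails() {
        Object[] objects = new String[1];
        try {
            objects[0] = Integer.valueOf(42);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }
}
```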
Problems: Covariant subtyping (2/2)
(Now let us get back to our technioncatalog schema…)
• A <course> value is also spoiled
– It requires unique children: <points>, <name>, etc.
• But it also has an unspoiled superclass: XMLObject
– All updates to XMLObject are legal at compile time
• The following code compiles successfully:
public static void trick(course c) {
  XMLObject x = c;
  points p = new points(<points>4</points>);
  x.appendAsLast(p);   // Run-time error is here!!
}
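Plain Java has no compile-time view of the schema at all. The inconsistency that trick() creates can still be detected at run time by validating against technioncatalog.xsd with the standard javax.xml.validation API. This is not how XJ implements its check. It is a minimal sketch with illustrative names that only shows the violation being caught after the update. The parser is namespace-aware, and the new element is created with createElementNS, so the validator sees namespace-aware DOM nodes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class RuntimeSchemaCheck {
    // technioncatalog.xsd, inlined.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='catalog'><xs:complexType><xs:sequence>"
      + "<xs:element name='course' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
      + "<xs:element name='points' type='xs:int'/>"
      + "<xs:element name='number' type='xs:int'/>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:element name='teacher' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    private static final Schema SCHEMA = compileSchema();

    private static Schema compileSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the document conforms to the schema.
    static boolean isValid(Document doc) {
        try {
            SCHEMA.newValidator().validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // DOM version of trick(): appends a second <points> to the first course.
    static void trick(Document doc) {
        Element course = (Element) doc.getElementsByTagName("course").item(0);
        Element p = doc.createElementNS(null, "points");
        p.setTextContent("4");
        course.appendChild(p);
    }
}
```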
Shaping the future (revisited)
• Language constructs seen so far
– Typed XML objects
– Seamless translation of a Schema/DTD into a Java type
– Two composition techniques
• XML notation
• Java’s object creation syntax
– Two decomposition techniques
• Typed XPath
• Typed, named methods/fields
– XPath expressions as first-class values
XPath expressions as first-class values
• What is a first-class value?
– A value that can be used “naturally” in the program:
• Passed as an argument
• Stored in a variable/field
• Returned from a method
• Created
• In XJ, XPath expressions do not meet these conditions
– The main obstacle: the XPath part of the expression cannot be separated from its context provider
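For contrast, the untyped standard API already treats a compiled query as an ordinary object. A javax.xml.xpath.XPathExpression can be created at run time, stored in a field, passed as an argument, and evaluated against any context node. What it lacks is the typing: the compiler knows neither the context type nor the result type, and the caller must cast. The following is a minimal sketch with illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CompiledQueries {
    // A query stored in a field, separate from any context node.
    static final XPathExpression ALL_TEACHERS = compile("//teacher");

    // Queries are created at run time; syntax errors surface here.
    static XPathExpression compile(String query) {
        try {
            return XPathFactory.newInstance().newXPath().compile(query);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // A query passed as an argument and applied to a context supplied later.
    // The result is an untyped Object; the cast is the caller's responsibility.
    static int countMatches(XPathExpression query, Document context) {
        try {
            return ((NodeList) query.evaluate(context, XPathConstants.NODESET)).getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```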
XPath expressions as first-class values (cont’d)
• Let’s speculate on XPath as an FCV…
• (The following code IS NOT a legal XJ program)
private static Sequence<teacher> teachers;
static Sequence<teacher> find(XPath<catalog,teacher> q) {
  catalog c = new catalog(new File("file1.xml"));
  return q.evaluate(c);
}
public static void main(String[] args) {
  Sequence<teacher> all = find(<catalog>[| //teacher |]);
  Sequence<teacher> few = find(
      <catalog>[| //course[number = 234319]/teacher |]);
}
XPath expressions as first-class values (cont’d)
• Operators on XPath values
– Composition
– Conjunction
– Disjunction
• These operators would allow the developer to easily create a rich array of safe XPath values
• The compiler must keep track of the type of each such value
– Basically, an XPath value is a function T -> R, where both T and R are subclasses of XMLObject
– When two XPath values are composed, the result type is deduced from the types of the operands
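The typing idea can be sketched in plain Java with generics. A query is modeled as a function from a context of type T to a sequence of R. Composition then yields a query whose result type is deduced from its operands, and mismatched compositions are rejected by the compiler. Everything here is hypothetical: Query, then, and, or and the *Elem classes are illustrative names. Queries are built from lambdas, not parsed from XPath syntax. Unlike XPath's union, or keeps left-operand-first order rather than document order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Function;

// Hypothetical typed query value: a function T -> sequence of R.
final class Query<T, R> {
    private final Function<T, List<R>> step;

    Query(Function<T, List<R>> step) { this.step = step; }

    List<R> evaluate(T context) { return step.apply(context); }

    // Composition (like '/'): Query<T,R> then Query<R,S> gives Query<T,S>.
    // Query<R,S> with the wrong R does not compile.
    <S> Query<T, S> then(Query<R, S> next) {
        return new Query<>(t -> {
            List<S> out = new ArrayList<>();
            for (R r : evaluate(t)) out.addAll(next.evaluate(r));
            return out;
        });
    }

    // Disjunction (like '|'): items selected by either query, without duplicates.
    Query<T, R> or(Query<T, R> other) {
        return new Query<>(t -> {
            LinkedHashSet<R> union = new LinkedHashSet<>(evaluate(t));
            union.addAll(other.evaluate(t));
            return new ArrayList<>(union);
        });
    }

    // Conjunction: items selected by both queries.
    Query<T, R> and(Query<T, R> other) {
        return new Query<>(t -> {
            List<R> both = new ArrayList<>(evaluate(t));
            both.retainAll(other.evaluate(t));
            return both;
        });
    }
}

// Minimal stand-ins for schema-derived element types.
final class TeacherElem {
    final String name;
    TeacherElem(String name) { this.name = name; }
}

final class CourseElem {
    final int number;
    final TeacherElem teacher;
    CourseElem(int number, TeacherElem teacher) { this.number = number; this.teacher = teacher; }
}

final class CatalogElem {
    final List<CourseElem> courses;
    CatalogElem(List<CourseElem> courses) { this.courses = List.copyOf(courses); }
}
```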
Scala: Composition of XML elements
• In Scala, types can be defined in a DTD file
– A DTD can be translated into Scala classes via the dtd2scala utility
• Scala offers two options for composing XML elements:
– Using XML notation (similar to XJ)
– Using case-class construction notation:
import Data._;       // import generated definitions
import scala.xml._;  // for creating PCDATA nodes
object Main with Application {
  val x = course(teacher(Text("Ran El-Yaniv")),
    points(Text("3")), name(Text("Combinatorics for CS")),
    number(Text("234141")));
  Console.println(x);
}
Typed, named methods/fields
• Usually, values aggregated by a Java object are accessed through fields/methods
– Can we access XML sub-elements this way?
– (The following code IS NOT a legal XJ program)
import technioncatalog.*;
void printTeachers(catalog cat) {
  for (int i = 0; i < cat.courses.length; ++i) {
    catalog.course c = cat.courses[i];
    System.out.println(c.teacher);
  }
}
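For this particular schema, field access can be written in plain Java by mapping the types by hand. This works because <course> is a fixed sequence of mandatory elements and <catalog> simply repeats it. The classes below are a hypothetical hand-written mapping of technioncatalog.xsd, not something XJ generates. With them, the slide's loop compiles essentially unchanged. Optional or alternative elements are exactly what such a mapping cannot express directly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written mapping of technioncatalog.xsd.
final class Catalog {
    // <course>: four mandatory children, each becoming a typed field.
    static final class Course {
        final int points;
        final int number;
        final String name;
        final String teacher;

        Course(int points, int number, String name, String teacher) {
            this.points = points;
            this.number = number;
            this.name = name;
            this.teacher = teacher;
        }
    }

    // <course maxOccurs="unbounded"> becomes an array field.
    final Course[] courses;

    Catalog(Course... courses) { this.courses = courses.clone(); }
}

class TypedFields {
    // The slide's loop, collecting the teachers instead of printing them.
    static List<String> teachers(Catalog cat) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < cat.courses.length; ++i) {
            Catalog.Course c = cat.courses[i];
            result.add(c.teacher);
        }
        return result;
    }

    static void printTeachers(Catalog cat) {
        for (String t : teachers(cat)) System.out.println(t);
    }
}
```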
Typed, named methods/fields (cont’d)
• Some of the difficulties:
– Sub-elements are not always named
– Schema supports a choice between alternative elements: <xsd:choice>
• How can Java express an “optional” field?
• Observation: Java’s typing mechanisms cannot capture the wealth of Schema/DTD types
– Missing features: virtual fields, inheritance without polymorphism
– Other features can be found in functional languages
• E.g.: variant types, immutability, structural conformance
• But their popularity lags behind
Summary
• XJ is a Java extension that has built-in support for XML
– Type safety: many things are checked at compile time
– Ease of use
• OO languages are not powerful enough (in terms of typing)
– Some type information is lost in the transition Schema -> Java
- The End -