CPAN 330 XML
Lecture #7: SAX 2.0

The Simple API for XML (SAX) is an interface that provides an event-driven mechanism for accessing XML documents. SAX started as a Java project and was later implemented in many other languages and environments, including C++, VB, COM, Perl, and PHP.

A SAX parser works differently from a DOM parser. A DOM parser loads the entire XML document into memory and then builds the DOM tree of that document, whereas a SAX parser reads the XML document sequentially and generates events based on the content it finds. When working with SAX, the developer's responsibility is to provide the event-handling code that performs the required task. The SAX parser takes care of mapping each event to its corresponding event-handling method.

SAX has some limitations. It cannot be used to change the structure of an existing XML document. Also, the developer cannot control the sequence in which data are retrieved by the SAX parser: events are always delivered in document order.

The current version of SAX is 2.0. The SAX APIs include the required interfaces and classes for implementing SAX in your application.

SAX with Java

The SAX API core provides two packages: org.xml.sax and org.xml.sax.helpers. These packages include the required interfaces and classes for implementing SAX in your Java application. To support SAX, your Java application must either extend the DefaultHandler class or implement the ContentHandler interface.

The ContentHandler interface defines the following methods:

public void setDocumentLocator(Locator locator)
    The SAX parser passes an object for locating the origin of SAX document events.

public void startDocument()
    Invoked by the SAX parser at the beginning of the XML document.

public void endDocument()
    Invoked by the SAX parser at the end of the XML document.

public void startPrefixMapping(String prefix, String uri)
    Begins the scope of a prefix-URI namespace mapping.

public void endPrefixMapping(String prefix)
    Ends the scope of a prefix mapping.

public void startElement(String namespaceURI, String localName, String qName, Attributes atts)
    Invoked by the SAX parser when the start tag of an element is found.

public void endElement(String namespaceURI, String localName, String qName)
    Invoked by the SAX parser when the end tag of an element is found.

public void characters(char[] ch, int start, int length)
    Invoked by the SAX parser when character data is found.

public void ignorableWhitespace(char[] ch, int start, int length)
    Invoked by the SAX parser when ignorable whitespace in element content is found.

public void processingInstruction(String target, String data)
    Invoked by the SAX parser when a processing instruction is found.

public void skippedEntity(String name)
    Invoked by the SAX parser when a skipped entity is found.

If you implement the ContentHandler interface, you have to provide an implementation for all of the above methods. The DefaultHandler class provides empty implementations of the methods defined in the ContentHandler interface, so a subclass only needs to override the methods it is interested in.

Appendix F of the textbook provides detailed documentation of the SAX interfaces and classes.

SAX with .NET

The .NET Framework doesn't support SAX parsing. Instead, it supports an XMLDOM parsing module and a new module called the XML reader. SAX parsing is supported by Microsoft MSXML, which can be imported as a COM object from within .NET. To add SAX support to your application, select Add Reference from the Project menu, then select Microsoft XML, v4 from the COM tab of the Add Reference dialog box.

Note: You may have to download and install MSXML 4.0 from http://microsoft.com if it is not listed under the COM tab of the Add Reference dialog box. MSXML also has support for XMLDOM.

Some of the handlers provided by SAX are:

ContentHandler: This handler receives notification of the document's content events, such as startDocument, endDocument, startElement and endElement.

ErrorHandler: This handler receives notification of the events warning, error and fatalError.

DTDHandler: This handler receives notification of notations and unparsed entities.

DeclHandler: This handler receives notification of other events related to the DOCTYPE declaration.

LexicalHandler: This handler receives notification of events such as startDTD and endCDATA.

To support SAX, your .NET application must first create an XML reader instance:

    MSXML2.SAXXMLReader reader = new MSXML2.SAXXMLReader();

In order to receive content events from the reader, the application must implement MSXML2.IVBSAXContentHandler and provide an implementation of every member of this interface. Following is a list of the members of the MSXML2.IVBSAXContentHandler interface:

public MSXML2.IVBSAXLocator documentLocator { set; }
    A write-only property through which the SAX parser passes an object for locating the origin of SAX document events.

public void startDocument()
    Invoked by the SAX parser at the beginning of the XML document.

public void endDocument()
    Invoked by the SAX parser at the end of the XML document.

public void startPrefixMapping(ref string strPrefix, ref string strURI)
    Begins the scope of a prefix-URI namespace mapping.

public void endPrefixMapping(ref string strPrefix)
    Ends the scope of a prefix mapping.

public void startElement(ref string strNamespaceURI, ref string strLocalName, ref string strQName, MSXML2.IVBSAXAttributes oAttributes)
    Invoked by the SAX parser when the start tag of an element is found.

public void endElement(ref string strNamespaceURI, ref string strLocalName, ref string strQName)
    Invoked by the SAX parser when the end tag of an element is found.

public void characters(ref string strChars)
    Invoked by the SAX parser when character data is found.

public void ignorableWhitespace(ref string strChars)
    Invoked by the SAX parser when ignorable whitespace in element content is found.

public void processingInstruction(ref string strTarget, ref string strData)
    Invoked by the SAX parser when a processing instruction is found.

public void skippedEntity(ref string strName)
    Invoked by the SAX parser when a skipped entity is found.

To receive parsing error events, the application should implement MSXML2.IVBSAXErrorHandler and provide an implementation of all the methods of this interface. Following is a list of the methods of the MSXML2.IVBSAXErrorHandler interface:

public void error(MSXML2.IVBSAXLocator oLocator, ref string strErrorMessage, int nErrorCode)
    Invoked by the SAX parser when an error is encountered.

public void ignorableWarning(MSXML2.IVBSAXLocator oLocator, ref string strErrorMessage, int nErrorCode)
    Invoked by the SAX parser when a warning is encountered.

public void fatalError(MSXML2.IVBSAXLocator oLocator, ref string strErrorMessage, int nErrorCode)
    Invoked by the SAX parser when a fatal error is encountered.

The easiest way to provide content and error handlers is to write a separate class that implements each interface. After that, we have to associate both the content handler and the error handler with the reader, using the contentHandler and errorHandler properties of the XML reader.

The XML reader parsing module is an alternative to the SAX parsing module. The XML reader works under the control of the client application, which pulls only the data it needs and skips unwanted nodes. Microsoft provides three parsers that derive from the XmlReader class:

XmlTextReader: A simple, fast, forward-only, read-only parser for accessing a stream of XML data. This parser ensures that the XML document is well-formed, but it doesn't validate it against a DTD or a schema document.

XmlValidatingReader: This parser builds on a parser such as XmlTextReader and adds extended features such as validating an XML document against a DTD or a schema document. Validation handling is done with an event-based architecture.
To receive notification that a validation error has occurred, a callback must be registered with the XmlValidatingReader. This is done by creating a helper class that supplies the callback. An event callback is constructed from a reference to a method that takes two parameters:

    o The object that sent the event.
    o A ValidationEventArgs object, which has information about the validation event that triggered the call.

The following code shows how to construct such a helper class:

    using System.Xml.Schema;   // ValidationEventArgs

    class ValidationHandler
    {
        // stays true unless a validation error has occurred
        public bool result = true;

        // string to hold information about the validation error(s)
        public string message = "";

        public void validationCallBack(object sender, ValidationEventArgs args)
        {
            // append, so that every reported error is kept
            message += "Validation Error: " + args.Message + "\n";
            result = false;
        }
    }

After the error-handling class has been constructed, we can parse the document and test whether the validation was successful or not.

XmlNodeReader: Parses data from an XMLDOM subtree and doesn't support validation.

Java SAX Tools:

In order to write Java applications that implement SAX, you need to have JSDK 1.3 or later installed on your machine. You also need a SAX parser; many are available for free. In this lecture we will use Xerces, the Apache XML parser for Java and C++ (plus Perl and COM). You can download this parser from http://xml.apache.org/xerces2-j/index.html

Unzip the downloaded package to your local drive. I will assume that the path will be C:\xerces-2_4_0. You need to add the new parser to your system path and the JAR files to the classpath.
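
One quick way to confirm the setup is to ask Java which SAX parser class it actually loads. The following is only a sketch: the class name ParserCheck and the helper readerClassName are made up for illustration, and it uses the standard JAXP factory instead of naming Xerces directly, so it reports whichever parser the run-time environment selects.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

public class ParserCheck {

    /* Returns the class name of the XMLReader that JAXP selects at run time.
       (Hypothetical helper, not part of the lecture's examples.) */
    static String readerClassName() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            XMLReader reader = factory.newSAXParser().getXMLReader();
            return reader.getClass().getName();
        } catch (Exception e) {
            // No usable SAX parser could be created
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SAX parser in use: " + readerClassName());
    }
}
```

If the printed class name does not belong to the parser you expected, the classpath is the first thing to check.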

Following is a batch file that I run on my system:

    set path=%path%;C:\Program Files\Java\jdk1.5.0_06\bin;C:\Xerces-J-bin.2.8.0
    set classpath=%classpath%;C:\Xerces-J-bin.2.8.0\xercesImpl.jar;
    javac BooksReader.java
    java BooksReader
    pause

Examples:

Ex1: BooksReader.java

This example is a Java console application that parses an XML document called books.xml, located in the same folder as the application class BooksReader.java.

    import javax.xml.parsers.SAXParserFactory;
    import javax.xml.parsers.SAXParser;
    import org.xml.sax.XMLReader;
    import org.xml.sax.SAXException;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;
    import org.xml.sax.helpers.XMLReaderFactory;

    public class BooksReader extends DefaultHandler {

        /* StringBuffer object that will be used to hold character data */
        private StringBuffer sb = new StringBuffer();

        public static void main(String[] args) throws Exception {
            BooksReader r = new BooksReader();
            r.read();
        }

        public void read() throws Exception {
            /* Instantiate a SAX parser */
            XMLReader ro = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
            /* Register our application to listen for SAX events */
            ro.setContentHandler(this);
            /* Instruct the SAX parser to start parsing the books.xml document */
            ro.parse("books.xml");
        }

        public void startDocument() {
            System.out.println("Start of document\n");
        }

        public void endDocument() {
            System.out.println("End of document\n");
        }

        public void startElement(String url, String localname, String name, Attributes atts)
                throws SAXException {
            /* a new element starts: discard any text collected so far */
            sb.setLength(0);
        }

        public void endElement(String url, String localname, String name) {
            System.out.println("Element " + name + " contains " + sb);
            /* after finding the end tag of an element, reset the StringBuffer object.
            */
            sb.setLength(0);
        }

        public void characters(char[] ch, int start, int len) {
            sb.append(ch, start, len);
        }
    }

Following is the books.xml file that will be parsed by this application:

    <?xml version="1.0"?>
    <bookstore>
      <book>
        <title>Hardware</title>
        <author>Dan Kingston</author>
        <publisher>New Book Technology</publisher>
        <price>100</price>
        <edition>2000,2nd Edition </edition>
      </book>
      <book>
        <title>Software </title>
        <author>Scott Tiger</author>
        <publisher>All Books Publisher</publisher>
        <price>84</price>
        <edition>2002,1st Edition </edition>
      </book>
    </bookstore>

Following is the batch file that I run to execute this application:

    set path=%path%;C:\jdk1.3.1_06\bin;C:\sax2\xerces-2_4_0
    set classpath=%classpath%;C:\sax2\xerces-2_4_0\xercesImpl.jar;C:\sax2\xerces-2_4_0\xmlParserAPIs.jar;
    javac BooksReader.java
    java BooksReader
    pause

Following is a screenshot of the output:

[Screenshot: console output of BooksReader]

Ex2: inserttoBooksDB.java

This example parses the books.xml document and stores its records in the book table of the Microsoft Access database introduced in the previous lecture.

    import java.sql.*;
    import javax.xml.parsers.SAXParserFactory;
    import javax.xml.parsers.SAXParser;
    import org.xml.sax.XMLReader;
    import org.xml.sax.SAXException;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;
    import org.xml.sax.helpers.XMLReaderFactory;

    public class inserttoBooksDB extends DefaultHandler {

        /* StringBuffer object to hold character data */
        private StringBuffer sb = new StringBuffer();

        /* array of Strings of size 5 to hold an entire book record at a time.
           We will use this array to build a SQL insert statement */
        private String[] sval = new String[5];

        /* flag set when the end tag of a book record has been reached */
        private boolean isLast = false;

        /* Counter used to keep track of the element location in the book record */
        private int i = 0;

        private static Connection connection;
        private static PreparedStatement statement;

        public static void main(String[] args) throws Exception {
            String driver = "sun.jdbc.odbc.JdbcOdbcDriver";
            try {
                /* Load the driver */
                Class.forName(driver);
                String url = "jdbc:odbc:myData";
                /* establish connection to the database */
                connection = DriverManager.getConnection(url);
            } catch (Exception e) {
                /* without a connection nothing can be stored, so report and stop */
                System.out.println(e.toString());
                return;
            }
            inserttoBooksDB r = new inserttoBooksDB();
            r.read();
        }

        public void read() throws Exception {
            /* Instantiate a SAX parser */
            XMLReader ro = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
            /* Register our application to listen to SAX events */
            ro.setContentHandler(this);
            /* Start parsing the books.xml document */
            ro.parse("books.xml");
        }

        public void startElement(String url, String localname, String name, Attributes atts)
                throws SAXException {
            /* reset the StringBuffer object */
            sb.setLength(0);
        }

        public void endElement(String url, String localName, String name)
                throws SAXException {
            /* Determine if the end of a book element is found */
            if (name.equals("book"))
                isLast = true;
            else
                isLast = false;

            /* store the child elements of the book element in an array of Strings.
               The parser will invoke this event for all the elements it finds.
               Therefore the book and bookstore elements should not be stored in the array */
            if (!name.equals("book") && !name.equals("bookstore")) {
                sval[i] = sb.toString();
                /* increase the counter to keep track of the number of elements */
                i++;
            }

            if (isLast) {
                /* reset the counter after each end of book element */
                i = 0;
                /* After all the child elements of book have been stored in the array
                   of Strings, build an insert statement based on that array.
                   Note: the values are concatenated directly, so a value that contains
                   a single quote would break the statement. */
                String query = "insert into book values (";
                for (int j = 0; j < 5; j++) {
                    if (j == 4)
                        query += " '" + sval[j] + "' ) ";
                    else
                        query += " '" + sval[j] + "' , ";
                }
                try {
                    /* Store the data into the database */
                    statement = connection.prepareStatement(query);
                    statement.execute();
                } catch (SQLException se) {
                    System.out.println(se.toString());
                }
            }
        }

        public void characters(char[] ch, int start, int len) throws SAXException {
            sb.append(ch, start, len);
        }

        public void endDocument() throws SAXException {
            System.out.println("End of processing");
            try {
                connection.close();
            } catch (SQLException se) {
                System.out.println(se.toString());
            }
            System.exit(0);
        }
    }

Following is the batch file used to execute this application:

    set path=%path%;C:\jdk1.3.1_06\bin;C:\sax2\xerces-2_4_0
    set classpath=%classpath%;C:\sax2\xerces-2_4_0\xercesImpl.jar;C:\sax2\xerces-2_4_0\xmlParserAPIs.jar;
    javac inserttoBooksDB.java
    java inserttoBooksDB
    pause

Following is a screenshot of the book table after the two new records have been added:

[Screenshot: book table with the two new records]

Ex3: DBtoXML.java

This is another example, which queries the book table and generates an XML document based on the query output:

    import java.io.*;
    import java.sql.*;
    import sun.jdbc.odbc.JdbcOdbcDriver;

    public class DBtoXML {

        Connection con;

        public void initDB() {
            try {
                JdbcOdbcDriver driver = new JdbcOdbcDriver();
                String url = "jdbc:odbc:myData";
                con = driver.connect(url, new java.util.Properties());
            } catch (Exception e) {
                /* report the failure instead of silently ignoring it */
                System.out.println(e.toString());
            }
        }

        public void processDB() {
            /* number of book records written so far */
            int cnt = 0;
            try {
                Statement stmt = con.createStatement();
                String sql = "select * from book";
                ResultSet rs = stmt.executeQuery(sql);
                System.out.println("<?xml version='1.0'?>");
                System.out.println("<bookstore>");
                /* write one book element per row of the result set */
                while (rs.next()) {
                    System.out.println("<book>");
                    System.out.println("<title>" + rs.getString("title") + "</title>");
                    System.out.println("<author>" + rs.getString("author") + "</author>");
                    System.out.println("<publisher>" + rs.getString("publisher") + "</publisher>");
                    System.out.println("<price>" + rs.getString("price") + "</price>");
                    System.out.println("<edition>" + rs.getString("edition") + "</edition>");
                    System.out.println("</book>");
                    cnt++;
                }
            } catch (Exception e) {
                System.out.println(e.toString());
            } finally {
                if (cnt == 0)
                    System.out.println("No Matching Record Found");
                else
                    System.out.println("</bookstore>");
            }
        }

        public static void main(String args[]) {
            DBtoXML obj = new DBtoXML();
            obj.initDB();
            obj.processDB();
        }
    }

Ex5: Using SAX in a .NET Application

In this application we will open an XML document, parse it using SAX and display the result. Create a new C#.NET application called SAXApp with the following GUI:

[Screenshot: SAXApp form]

Following is a description of each control:

    Control Type    Control Name    Properties
    Form            frmSAXApp       Text="SAX Example"
    Button          btnXMLFile      Text="Open XML File"
    TextBox         txtXMLFile
    Button          btnSAX          Text="Parse"
    TextBox         txtOutput       Text="", MultiLine=true

Following is the source code for the application. Only the code in bold face needs to be added; the rest should be generated by the .NET IDE:

frmSAXApp.cs

    using System;
    using System.Drawing;
    using System.Collections;
    using System.ComponentModel;
    using System.Windows.Forms;
    using System.Data;
    using System.IO;

    namespace SAXApp
    {
        /// <summary>
        /// Summary description for Form1.
        /// </summary>
        public class frmSAXApp : System.Windows.Forms.Form
        {
            private System.Windows.Forms.Button btnXMLFile;
            private System.Windows.Forms.TextBox txtXMLFile;
            private System.Windows.Forms.Button btnSAX;
            private System.Windows.Forms.TextBox txtOutput;

            /// <summary>
            /// Required designer variable.
            /// </summary>
            private System.ComponentModel.Container components = null;

            // Path of the XML file selected by the user
            private string xmlFileName;

            public frmSAXApp()
            {
                //
                // Required for Windows Form Designer support
                //
                InitializeComponent();
                //
                // TODO: Add any constructor code after InitializeComponent call
                //
            }

            /// <summary>
            /// Clean up any resources being used.
            /// </summary>
            protected override void Dispose( bool disposing )
            {
                if( disposing )
                {
                    if (components != null)
                    {
                        components.Dispose();
                    }
                }
                base.Dispose( disposing );
            }

            #region Windows Form Designer generated code
            /// <summary>
            /// Required method for Designer support - do not modify
            /// the contents of this method with the code editor.
            /// </summary>
            private void InitializeComponent()
            {
                this.btnXMLFile = new System.Windows.Forms.Button();
                this.txtXMLFile = new System.Windows.Forms.TextBox();
                this.btnSAX = new System.Windows.Forms.Button();
                this.txtOutput = new System.Windows.Forms.TextBox();
                this.SuspendLayout();
                //
                // btnXMLFile
                //
                this.btnXMLFile.Location = new System.Drawing.Point(24, 16);
                this.btnXMLFile.Name = "btnXMLFile";
                this.btnXMLFile.Size = new System.Drawing.Size(112, 32);
                this.btnXMLFile.TabIndex = 0;
                this.btnXMLFile.Text = "Open XML File";
                this.btnXMLFile.Click += new System.EventHandler(this.btnXMLFile_Click);
                //
                // txtXMLFile
                //
                this.txtXMLFile.Location = new System.Drawing.Point(152, 22);
                this.txtXMLFile.Name = "txtXMLFile";
                this.txtXMLFile.Size = new System.Drawing.Size(200, 20);
                this.txtXMLFile.TabIndex = 1;
                this.txtXMLFile.Text = "";
                //
                // btnSAX
                //
                this.btnSAX.Location = new System.Drawing.Point(24, 56);
                this.btnSAX.Name = "btnSAX";
                this.btnSAX.Size = new System.Drawing.Size(112, 32);
                this.btnSAX.TabIndex = 2;
                this.btnSAX.Text = "Parse";
                this.btnSAX.Click += new System.EventHandler(this.btnSAX_Click);
                //
                // txtOutput
                //
                this.txtOutput.Location = new System.Drawing.Point(16, 104);
                this.txtOutput.Multiline = true;
                this.txtOutput.Name = "txtOutput";
                this.txtOutput.Size = new System.Drawing.Size(392, 168);
                this.txtOutput.TabIndex = 3;
                this.txtOutput.Text = "";
                //
                // frmSAXApp
                //
                this.AutoScaleBaseSize = new System.Drawing.Size(5, 13);
                this.ClientSize = new System.Drawing.Size(424, 273);
                this.Controls.AddRange(new System.Windows.Forms.Control[] {
                    this.txtOutput,
                    this.btnSAX,
                    this.txtXMLFile,
                    this.btnXMLFile});
                this.Name = "frmSAXApp";
                this.Text = "SAX Example";
                this.Load += new System.EventHandler(this.frmSAXApp_Load);
                this.ResumeLayout(false);
            }
            #endregion

            /// <summary>
            /// The main entry point for the application.
            /// </summary>
            [STAThread]
            static void Main()
            {
                Application.Run(new frmSAXApp());
            }

            private void frmSAXApp_Load(object sender, System.EventArgs e)
            {
            }

            private void btnXMLFile_Click(object sender, System.EventArgs e)
            {
                try
                {
                    OpenFileDialog of = new OpenFileDialog();
                    of.Filter = "XML Files (*.xml)|*.xml|All Files(*.*)|*.*";
                    DialogResult dr = of.ShowDialog();
                    if (dr == DialogResult.Cancel)
                        MessageBox.Show("No file has been selected", "Error", MessageBoxButtons.OK);
                    else
                    {
                        xmlFileName = of.FileName;
                        if (xmlFileName == "" || xmlFileName == null)
                            MessageBox.Show("Invalid file name", "Error", MessageBoxButtons.OK);
                        else
                        {
                            // fs=new FileStream(xmlFileName,FileMode.Open,FileAccess.Read);
                            txtXMLFile.Text = xmlFileName;
                        }
                    }
                }
                catch (IOException ex)
                {
                    txtOutput.Text = "IOException " + ex.ToString();
                }
            }

            private void btnSAX_Click(object sender, System.EventArgs e)
            {
                txtOutput.Text = "";
                MSXML2.SAXXMLReader reader = new MSXML2.SAXXMLReader();

                /* create an instance of our user defined content handler class.
                   It is defined separately in the same application.
                   The main form is passed as an argument in order to display
                   the output from the content handler class */
                contHandler ch = new contHandler(this);
                reader.contentHandler = ch;

                /* create an instance of our user defined error handler class.
                   It is defined separately in the same application.
                   The main form is passed as an argument in order to display
                   the output from the error handler class */
                errHandler eh = new errHandler(this);
                reader.errorHandler = eh;

                reader.parseURL(txtXMLFile.Text);
            }

            /* An internal property used by the content handler class and the
               error handler class to append text to txtOutput. */
            internal string txtOutputProperty
            {
                get
                {
                    return txtOutput.Text;
                }
                set
                {
                    txtOutput.Text = txtOutput.Text + value;
                }
            }
        }
    }

Before you run the example, you need to add two classes for content and error handling. From the Project menu, select Add Class and create a new class called contHandler.cs. Following is the source code for this class, which must implement the MSXML2.IVBSAXContentHandler interface:

contHandler.cs

    using System;

    namespace SAXApp
    {
        /// <summary>
        /// Summary description for contHandler.
        /// </summary>
        public class contHandler : MSXML2.IVBSAXContentHandler
        {
            /* reference to the main form in order to display the result in txtOutput */
            private frmSAXApp form;

            // string to hold the character data of an element
            private string s;

            public contHandler(frmSAXApp form)
            {
                this.form = form;
                //
                // TODO: Add constructor logic here
                //
            }

            #region Implementation of IVBSAXContentHandler

            public void processingInstruction(ref string strTarget, ref string strData)
            {
                form.txtOutputProperty = "Processing Instruction is found. The target is :" + strTarget + " the data is: " + strData + " \r\n";
            }

            public void endDocument()
            {
                form.txtOutputProperty = "Document ended \r\n";
            }

            public void skippedEntity(ref string strName)
            {
            }

            public void characters(ref string strChars)
            {
                // character data may arrive in several chunks, so accumulate it
                s += strChars;
            }

            public void endElement(ref string strNamespaceURI, ref string strLocalName, ref string strQName)
            {
                form.txtOutputProperty = strQName + " element has the value " + s + " \r\n";
            }

            public void startElement(ref string strNamespaceURI, ref string strLocalName, ref string strQName, MSXML2.IVBSAXAttributes oAttributes)
            {
                // a new element starts: discard any text collected so far
                s = "";
                if (oAttributes.length > 0)
                {
                    form.txtOutputProperty = strQName + " has " + oAttributes.length + " Attributes: \r\n ";
                    for (int i = 0; i < oAttributes.length; i++)
                    {
                        form.txtOutputProperty = "Attribute Name: " + oAttributes.getQName(i) + " Attribute Value :" + oAttributes.getValue(i) + "\r\n";
                    }
                }
                else
                    form.txtOutputProperty = strQName + " element is found \r\n";
            }

            public void startDocument()
            {
                form.txtOutputProperty = "Document started \r\n";
                form.txtOutputProperty = "_____________________\r\n";
            }

            public void ignorableWhitespace(ref string strChars)
            {
            }

            public void endPrefixMapping(ref string strPrefix)
            {
            }

            public void startPrefixMapping(ref string strPrefix, ref string strURI)
            {
            }

            public MSXML2.IVBSAXLocator documentLocator
            {
                set
                {
                }
            }

            #endregion
        }
    }

After that, use the same procedure to create a class called errHandler that must implement MSXML2.IVBSAXErrorHandler.

errHandler.cs

    using System;

    namespace SAXApp
    {
        /// <summary>
        /// Summary description for errHandler.
        /// </summary>
        public class errHandler : MSXML2.IVBSAXErrorHandler
        {
            /* reference to the main form in order to display the result in txtOutput */
            private frmSAXApp form;

            public errHandler(frmSAXApp form)
            {
                this.form = form;
                //
                // TODO: Add constructor logic here
                //
            }

            #region Implementation of IVBSAXErrorHandler

            public void error(MSXML2.IVBSAXLocator oLocator, ref string strErrorMessage, int nErrorCode)
            {
                form.txtOutputProperty = "Error :" + strErrorMessage;
            }

            public void ignorableWarning(MSXML2.IVBSAXLocator oLocator, ref string strErrorMessage, int nErrorCode)
            {
                form.txtOutputProperty = "Warning : " + strErrorMessage;
            }

            public void fatalError(MSXML2.IVBSAXLocator oLocator, ref string strErrorMessage, int nErrorCode)
            {
                form.txtOutputProperty = "Fatal error " + strErrorMessage;
            }

            #endregion
        }
    }
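
For comparison with the Java examples earlier in this lecture, the same error-reporting idea can be sketched in Java with the org.xml.sax.ErrorHandler interface, whose warning, error and fatalError methods correspond to the events listed for ErrorHandler above. This is only an illustrative sketch, not part of the lecture's examples: the class name ErrorCollector and the helper method parse are hypothetical, the document is parsed from an in-memory string instead of a file, and the parser is obtained through the standard JAXP factory, so whichever SAX parser the run-time environment provides is used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ErrorCollector implements ErrorHandler {

    /* Messages collected while parsing, in the order they were reported */
    final List<String> messages = new ArrayList<>();

    public void warning(SAXParseException e) {
        messages.add("Warning: " + e.getMessage());
    }

    public void error(SAXParseException e) {
        messages.add("Error: " + e.getMessage());
    }

    public void fatalError(SAXParseException e) throws SAXException {
        messages.add("Fatal error: " + e.getMessage());
        /* a fatal error means the document is not well-formed: stop parsing */
        throw e;
    }

    /* Hypothetical helper: parses the given XML text and returns the collected messages */
    static List<String> parse(String xml) {
        ErrorCollector collector = new ErrorCollector();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler());
            reader.setErrorHandler(collector);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            /* parsing stopped on a fatal error that fatalError has already recorded */
        } catch (Exception e) {
            /* I/O or parser configuration problem */
            throw new RuntimeException(e);
        }
        return collector.messages;
    }

    public static void main(String[] args) {
        /* the book element is never closed, so a fatal error is reported */
        System.out.println(parse("<bookstore><book></bookstore>"));
    }
}
```

As in the .NET version, the handler is a separate class that is registered with the reader, here through setErrorHandler instead of the errorHandler property.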