XML with AuctionBase For Lab Use
17 January 2005
Mehmet Cihan Kurt
9702413, Cmpe Dept.

INDEX
1 Introduction
  1.1 MSSQL and Web Warm-up I
  1.2 MSSQL and Web Warm-up II
  1.3 AuctionBase Schema and Data
  1.4 MSSQL Features
  1.5 AuctionBase Web Site
2 MSSQL and Web Warm-up Part I
  2.1 Section A: Getting familiar with MSSQL
    2.1.1 Logging In to Query Analyzer
    2.1.2 Creating a Table
    2.1.3 Creating a Table With a Primary Key
    2.1.4 Inserting Tuples
    2.1.5 Getting the Value of a Relation
    2.1.6 Getting Rid of Your Tables
  2.2 Section B: Simple Web Interface and Servlets
    2.2.1 Overview
    2.2.2 Setting up Java and Tomcat
    2.2.3 Configuring & Using Apache Tomcat [2]
      2.2.3.1 Summary
      2.2.3.2 Steps for Installation & Setup
      2.2.3.3 Install the JDK
      2.2.3.4 Set the JAVA_HOME Variable
      2.2.3.5 Change the Port to 80
      2.2.3.6 Turn on Servlet Reloading
      2.2.3.7 Enable the Invoker Servlet
      2.2.3.8 Test Server
        2.2.3.8.1 Verify That the Server Can Start
        2.2.3.8.2 Try Some Simple HTML and JSP Pages
        2.2.3.8.3 Setup Your Development Environment
        2.2.3.8.4 Create a Development Directory
        2.2.3.8.5 Make Shortcuts to Start and Stop the Server
        2.2.3.8.6 Set Your CLASSPATH
      2.2.3.9 Installing JDBC Drivers for Windows XP & Windows 2000 [4]
    2.2.4 Java and HTML Warm up [9]
    2.2.5 Retrieving Input from the User
    2.2.6 Forms
    2.2.7 Java Server-Side Input Handling [10]
    2.2.8 Returning Output to the User
    2.2.9 Java Code Output
    2.2.10 A Complete JDBC Example [5]
      2.2.10.1 Creating a Database
      2.2.10.2 Getting Information from a Database
      2.2.10.3 Obtaining Result MetaData Type Information
    2.2.11 Handling Special Characters in HTML
3 MSSQL and Web Warm-up Part II
  3.1 Preliminary Information about XML and DTDs
    3.1.1 What is XML?
    3.1.2 What Do XML Documents Look Like? [12]
    3.1.3 DTD (Document Type Definitions) for XML [11]
      3.1.3.1 Elements
      3.1.3.2 Attributes
      3.1.3.3 Comments
  3.2 Section A: Examining XML Files
    3.2.1 Sample eBay Data
  3.3 Section B: Designing Relational Schema
  3.4 Section C: Creating Tables in MS SQL
  3.5 Section D: Writing a data transformation program [10]
    3.5.1 Creating the Skeleton
    3.5.2 Importing Classes
    3.5.3 Setting up for I/O
    3.5.4 Implementing the ContentHandler Interface
    3.5.5 Setting up the Parser
    3.5.6 Writing the Output
    3.5.7 Spacing the Output
    3.5.8 Handling Content Events
      3.5.8.1 Document Events
      3.5.8.2 Element Events
      3.5.8.3 Character Events
    3.5.9 Compiling and Running the Program
  3.6 Section D: Load the data into MSSQL [13]
4 AuctionBase Schema and Data
  4.1 Section A: Indexes
  4.2 Section B: Views
    4.2.1 What is a View? [14]
    4.2.2 Views of AuctionBase [13]
5 MSSQL Features
  5.1 Section A: Current Time
  5.2 Section B: Constraints and Triggers [15] [14]
    5.2.1 What is a CONSTRAINT?
    5.2.2 What is a TRIGGER? [15]
        5.2.2.1.1 Triggers Compared to Constraints [15]
    5.2.3 CONSTRAINTs & TRIGGERs of AuctionBase DB [13]
      5.2.3.1 CONSTRAINTs of AuctionBase DB
      5.2.3.2 TRIGGERs of AuctionBase DB
6 AuctionBase Web Site
  6.1 Functionality
  6.2 Web Interface [13]
  6.3 System testing
7 Conclusion
8 References & Resources

1 Introduction
“XML with AuctionBase for Lab Use” is a complete implementation of an auction web site: the database design and a working web site, supported by examples, implementation details, references and links, so that a student with this manual in hand can carry out the project himself/herself with enough information and references.
This is a project “For Lab Use”, so much attention is given to examples, references and World Wide Web links: the student can follow this booklet by reading the supporting material and doing its exercises, gain working knowledge, and then apply it to the project. Real data from the auction site eBay is supplied to the student in XML form, so that s/he can work on the data and design his/her own database on MSSQL in 4NF. After the database design, some special MSSQL features (e.g. CONSTRAINTS) should be learnt and applied so that database consistency is preserved and the required functionality can be implemented. After all these design issues, a simple web interface is needed for user interaction; its Java implementation (including Servlets, JSP and JDBC) is covered in this lab booklet. The assumed operating systems are Windows XP or Windows 2000 with the latest patches and service packs applied. Since the Java and HTML code is portable, you can set up the final project on any Java-compatible platform with minor changes.

This booklet is partitioned into 5 sections, summarized below. Each section ends with a references and links part, so that additional information can be found in books and documents (hardcopy) or on the internet (softcopy).

1.1 MSSQL and Web Warm-up I
The student will become familiar with MSSQL and Java by implementing a very simple end-to-end system: running queries on the database and displaying the results in HTML using Java Servlets.

1.2 MSSQL and Web Warm-up II
The simple end-to-end system from Part I will be extended to include additional features of MSSQL and HTML, such as input boxes, menus, parameterized queries, database updates and result browsing.

1.3 AuctionBase Schema and Data
A large amount of data will be supplied to the student in XML form. The student will examine the given data and design a relational schema for it.
A small program or script will be written to parse the XML data and convert it into an importable form for loading into MSSQL.

1.4 MSSQL Features
The student will experiment with indexes and their effect on performance, and will also deal with jobs and views. Some advanced MSSQL functionality will be used to implement “current time” and other features of an auction system, subject to identified real-world constraints. CONSTRAINTS and TRIGGERS will be applied in this part.

1.5 AuctionBase Web Site
The AuctionBase web site will be designed with the necessary queries and updates on the database, adapting the Part II web interface to the new requirements and functionality. A friendly and simple web interface is enough, but further details can be implemented if time permits or for bonus credit.

2 MSSQL and Web Warm-up Part I

2.1 Section A: Getting familiar with MSSQL
This section introduces the MSSQL Query Analyzer interface: connecting to the database with a username and password, trying some SQL commands, creating a table, making selections over the table and dropping the table, plus some experimentation with the interface.

2.1.1 Logging In to Query Analyzer
Query Analyzer can be reached from the Microsoft SQL Server entry in the Programs menu. It opens a window where you choose the server and then log in with your username and password. Since many people will connect to the database from the same lab computers or from their home computers, SQL Server Authentication will be used instead of Windows Authentication. The database administrator in the department should provide you with a username and password that have the necessary privileges.

2.1.2 Creating a Table
In Query Analyzer we can execute any SQL command. One simple type of command creates a table (relation). The form is

CREATE TABLE <tableName> (
    <list of attributes and their types>
);

You may enter the text on one line or on several lines. In Query Analyzer the terminating semicolon is optional, but ending each command with one makes it clear where a multi-line command ends.
An example table-creation command is:

CREATE TABLE test (
    i int,
    s char(10)
);

This command creates a table named test with two attributes. The first, named i, is an integer, and the second, named s, is a character string of (up to) 10 characters; since char is a fixed-length type, shorter values are padded with blanks.

If a command that returns no results executes successfully, Query Analyzer reports “The command(s) completed successfully.” To run only a single command, or part of one, select its text before clicking the Execute Query button (or pressing F5); only the selected text is executed.

2.1.3 Creating a Table With a Primary Key
To create a table that declares attribute a to be a primary key:

CREATE TABLE <tableName> (..., a <type> PRIMARY KEY, b <type>, ...);

To create a table that declares the set of attributes (a,b,c) to be a primary key:

CREATE TABLE <tableName> (<attrs and their types>, PRIMARY KEY (a,b,c));

2.1.4 Inserting Tuples
Having created a table, we can insert tuples into it. The simplest way to insert is with the INSERT command:

INSERT INTO <tableName>
VALUES( <list of values for attributes, in order> );

For instance, we can insert the tuple (10, 'foobar') into relation test by

INSERT INTO test VALUES(10, 'foobar');

2.1.5 Getting the Value of a Relation
We can see the tuples in a relation with the command:

SELECT * FROM <tableName>;

For instance, after the above create and insert statements, the command

SELECT * FROM test;

produces the result

i           s
----------- ----------
10          foobar

2.1.6 Getting Rid of Your Tables
To remove a table from your database, execute

DROP TABLE <tableName>;

We suggest you execute

DROP TABLE test;

after trying out this sequence of commands, to avoid leaving a lot of garbage around that will still be there the next time you use the MSSQL system.

2.2 Section B: Simple Web Interface and Servlets
This section is a simple introduction to web interfaces implemented with HTML and Java. JDBC (Java Database Connectivity) will be used for database interaction.
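As a preview, the Section A warm-up (create, insert, select, drop) can also be driven from Java through JDBC instead of Query Analyzer. The sketch below is a hypothetical example (the class name WarmupSql is not part of this booklet); it assumes the Microsoft SQL Server 2000 JDBC driver described later in this section is on the CLASSPATH, and the connection URL, user and password are taken from the command line rather than from this booklet.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

// Hypothetical example: replays the Section A warm-up (CREATE, INSERT, SELECT, DROP)
// through JDBC. URL, user and password come from the command line.
class WarmupSql {

    // Driver class of the Microsoft SQL Server 2000 Driver for JDBC (assumed to be
    // on the CLASSPATH); pre-JDBC 4 drivers have to be loaded explicitly.
    static final String DRIVER = "com.microsoft.jdbc.sqlserver.SQLServerDriver";

    // The four statements from Section A, for a table with the given name.
    static List<String> statements(String table) {
        return Arrays.asList(
                "CREATE TABLE " + table + " (i int, s char(10))",
                "INSERT INTO " + table + " VALUES (10, 'foobar')",
                "SELECT * FROM " + table,
                "DROP TABLE " + table);
    }

    // Runs the sequence on an open connection and prints every selected row.
    static void run(Connection conn, String table) throws SQLException {
        List<String> sql = statements(table);
        Statement stmt = conn.createStatement();
        try {
            stmt.executeUpdate(sql.get(0));                 // CREATE TABLE
            stmt.executeUpdate(sql.get(1));                 // INSERT one tuple
            ResultSet rs = stmt.executeQuery(sql.get(2));   // SELECT *
            while (rs.next()) {
                // char(10) values come back blank-padded, hence trim()
                System.out.println(rs.getInt("i") + " " + rs.getString("s").trim());
            }
            rs.close();
            stmt.executeUpdate(sql.get(3));                 // DROP TABLE: leave no garbage
        } finally {
            stmt.close();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: java WarmupSql <jdbc-url> <user> <password>");
            return;
        }
        Class.forName(DRIVER);
        Connection conn = DriverManager.getConnection(args[0], args[1], args[2]);
        try {
            run(conn, "test");
        } finally {
            conn.close();
        }
    }
}
```

With that driver, a URL of the form jdbc:microsoft:sqlserver://<Host>:1433 would be passed as the first argument, followed by <UID> and <PWD>. Keeping the statements in one method makes it easy to compare them with the commands typed in Query Analyzer.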
The tasks are setting up a Servlet & JSP web environment, compiling Servlets, and writing HTML and Java code that retrieves data from the database and displays it on the web interface.

2.2.1 Overview
Java Servlets and JSP (JavaServer Pages) are the Java solution for providing web-based services. They provide an interface for handling client requests and producing server responses. As such, much of the discussion of input and output will overlap with HTML. Students will interface with MSSQL from Java Servlets using JDBC.

2.2.2 Setting up Java and Tomcat
Java Servlets interact with the user through HTML forms. To run them, you need a servlet container of your choice running on a specific port of a PC; in this project we use Tomcat, which is part of the Apache Jakarta Project. You can find many useful documents and manuals, and download the free server for the Windows platform, from http://jakarta.apache.org/tomcat/ ; at the time of writing the latest version is Tomcat 5.5.

Tomcat requires a Java development environment, which should be set up before Tomcat, so get the latest version from http://java.sun.com and install it. Instead of deploying the whole development environment, you can choose a smaller package suited to the needs of your application: the Java WSDP (Java Web Services Developer Pack), or J2SE, which targets desktop environments. We suggest installing J2SE: it is a general-purpose package that will also be useful if you write Java code beyond web services in the future.

2.2.3 Configuring & Using Apache Tomcat [2]

2.2.3.1 Summary
Using Tomcat as a deployment server, or integrating Tomcat as a plugin within the regular Apache server or a commercial Web server, is more complicated than what is described in this tutorial.
Although such integration is valuable for a deployment scenario (see http://jakarta.apache.org/tomcat/tomcat-5.5-doc/), our goal here is to show how to use Tomcat as a development server on your desktop. Regardless of what deployment server you use, you'll want a standalone server on your desktop for development. The examples here assume you are using Windows, but they can easily be adapted for Linux, Solaris, and other versions of Unix.

Steps for Installation & Setup
1. Install the JDK. Make sure JDK 5.0 is installed and your PATH is set so that both "java -version" and "javac -help" give a result.
2. Configure Tomcat.
   1. Download the software. Go to http://jakarta.apache.org/site/binindex.cgi#tomcat and download and run the latest production-quality release, which at the time of writing is Tomcat 5.5.4.
   2. Set the JAVA_HOME variable. Set it to refer to the base JDK directory, not the bin subdirectory.
   3. Change the port to 80. Edit install_dir/conf/server.xml and change the port attribute of the Connector element from 8080 to 80.
   4. Turn on servlet reloading. Edit install_dir/conf/context.xml and change <Context> to <Context reloadable="true">.
   5. Enable the invoker servlet. Go to install_dir/conf/web.xml and uncomment the servlet and servlet-mapping elements that map the invoker servlet to /servlet/*.
   6. Set the CATALINA_HOME variable. Optionally, set CATALINA_HOME to refer to the top-level Tomcat installation directory. This is not necessary unless you copy the startup scripts instead of making shortcuts to them.
3. Test the server.
   1. Verify that you can start the server. Double-click install_dir/bin/startup.bat and try accessing http://localhost/.
   2. Check that you can access your own HTML & JSP pages. Drop some simple HTML and JSP pages into install_dir/webapps/ROOT and access them with http://localhost/filename.
4. Set up your development environment.
   1. Create a development directory. Put it anywhere except within the Tomcat installation hierarchy.
   2. Make shortcuts to the Tomcat startup & shutdown scripts. Put shortcuts to install_dir/bin/startup.bat and install_dir/bin/shutdown.bat in your development directory and/or on your desktop.
   3. Set your CLASSPATH. Include the current directory ("."), the servlet / JSP JAR files (install_dir/common/lib/servlet-api.jar and install_dir/common/lib/jsp-api.jar), and your main development directory from Step 1.
   4. Bookmark the servlet & JSP javadocs. Add install_dir/webapps/tomcat-docs/servletapi/index.html and install_dir/webapps/tomcat-docs/jspapi/index.html to your bookmarks/favorites list.
5. Compile and test some simple servlets.
   1. Test a packageless servlet. Compile a simple servlet, put the .class file in install_dir/webapps/ROOT/WEB-INF/classes, and access it with http://localhost/servlet/ServletName.
   2. Test a servlet that uses packages. Compile the servlet, put the .class file in install_dir/webapps/ROOT/WEB-INF/classes/packageName, and access it with http://localhost/servlet/packageName.ServletName.
   3. Test a servlet that uses packages and utility classes. Compile a servlet, put both the servlet .class file and the utility .class file in install_dir/webapps/ROOT/WEB-INF/classes/packageName, and access the servlet with http://localhost/servlet/packageName.ServletName. This third step verifies that the CLASSPATH includes the top level of your development directory.
6. Establish a simplified deployment method.
   1. Copy to a shortcut. Make a shortcut to install_dir/webapps/ROOT/WEB-INF/classes. Copy packageless .class files directly there. With packages, copy the entire package directory there.
   2. Use the -d option of javac. Use -d to tell javac where the deployment directory is.
   3. Let your IDE take care of deployment. Tell your IDE where the deployment directory is and let it copy the necessary files.
   4. Use Ant or a similar tool. Use Ant, the Apache make-like tool, to automate copying of files.
7. Get more info. Access the complete set of Tomcat docs, get free JSP and servlet tutorials, read the official servlet and JSP specifications, get JSP-savvy editors and IDEs, look for J2EE jobs, etc.

2.2.3.2 Install the JDK
Your first step is to download and install Java. The servlet 2.4 (JSP 2.0) specification requires JDK 1.3 or later; J2EE 1.4 (which includes servlets 2.4 and JSP 2.0) requires JDK 1.4 or later. You might as well get a recent Java version, so use JDK 5.0, the version Tomcat 5.5 is designed for. If you know which Java version will be used on your project, get that one. See the following site for download and installation information.

JDK 5.0 for Windows, Linux, and Solaris: http://java.sun.com/j2se/1.5.0/download.jsp

Be sure you download the full SDK (Software Development Kit), not just the JRE (Java Runtime Environment). The JRE is only for running already-compiled .class files, and lacks a compiler. Once you've installed Java, confirm that everything, including your PATH, is configured properly by opening a DOS window and typing "java -version" and "javac -help". You should see a real result both times, not an error message about an unknown command. Or, if you use an IDE, compile and run a simple program to confirm that the IDE knows where you installed Java.

2.2.3.3 Set the JAVA_HOME Variable
Next, you must set the JAVA_HOME environment variable to tell Tomcat where to find Java. Failing to set this variable properly prevents Tomcat from compiling JSP pages. This variable should list the base JDK installation directory, not the bin subdirectory. For example, on almost any version of Windows, if you installed the JDK in C:\j2sdk1.5.0, you might put the following line in your C:\autoexec.bat file.

set JAVA_HOME=C:\j2sdk1.5.0

On Windows XP, you could also go to the Start menu, select Control Panel, choose System, click on the Advanced tab, press the Environment Variables button at the bottom, and enter the JAVA_HOME variable and value directly.
On Windows 2000 and NT, you do Start, Settings, Control Panel, System, then Environment. However, you can use C:\autoexec.bat on those versions of Windows also (unless a system administrator has set your PC to ignore it).

2.2.3.4 Change the Port to 80
Assuming you have no other server already running on port 80, you'll find it convenient to configure Tomcat to run on the default HTTP port (80) instead of the out-of-the-box port of 8080. Making this change lets you use URLs of the form http://localhost/blah instead of http://localhost:8080/blah. Note that you need admin privileges to make this change on Unix/Linux. Also note that some versions of Windows XP automatically start IIS on port 80. So, if you use XP and want to use port 80 for Tomcat, you may need to disable IIS (see the Administrative Tools section of the Control Panel).

To change the port, edit install_dir/conf/server.xml and change the port attribute of the Connector element from 8080 to 80, yielding a result similar to that below.

<Connector port="80" ...
           maxThreads="150" minSpareThreads="25" ...

2.2.3.5 Turn on Servlet Reloading
The next step is to tell Tomcat to check the modification dates of the class files of requested servlets, and to reload the ones that have changed since they were loaded into the server's memory. This slightly degrades performance in deployment situations, so it is turned off by default. However, if you fail to turn it on for your development server, you'll have to restart the server every time you recompile a servlet that has already been loaded into the server's memory. Since this tutorial discusses the use of Tomcat for development, this change is strongly recommended.
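Tomcat performs this check internally. The plain-Java sketch below only illustrates the idea described above: remember each class file's modification date when it is loaded, and flag any file whose date has changed since. ReloadCheck and its methods are hypothetical names, not part of Tomcat's API.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "reload a class if its file changed"; not Tomcat code.
class ReloadCheck {

    // Modification date recorded when each class file was (notionally) loaded.
    private final Map<File, Long> loadedAt = new HashMap<File, Long>();

    // Records the file's current modification date as its load time.
    void markLoaded(File classFile) {
        loadedAt.put(classFile, Long.valueOf(classFile.lastModified()));
    }

    // Returns the class files whose modification date differs from the recorded one.
    List<File> changedSinceLoad() {
        List<File> changed = new ArrayList<File>();
        for (Map.Entry<File, Long> e : loadedAt.entrySet()) {
            if (e.getKey().lastModified() != e.getValue().longValue()) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        ReloadCheck check = new ReloadCheck();
        for (String name : args) {
            check.markLoaded(new File(name));
        }
        System.out.println("Changed so far: " + check.changedSinceLoad());
    }
}
```

For example, after markLoaded is called for a compiled servlet's .class file, recompiling that servlet makes changedSinceLoad() report the file; a reloading server reacts to this by discarding the old class and loading the new one, which is the per-request cost mentioned above.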
To turn on servlet reloading, edit install_dir/conf/context.xml and change <Context> to <Context reloadable="true">.

2.2.3.6 Enable the Invoker Servlet
The invoker servlet lets you run servlets without first making changes to your Web application's deployment descriptor (i.e., the WEB-INF/web.xml file). Instead, you just drop your servlet into WEB-INF/classes and use the URL http://host/servlet/ServletName (or http://host/webAppName/servlet/ServletName once you start using your own Web applications). The invoker servlet is extremely convenient when you are learning and even when you are doing your initial development. You almost certainly want to enable it when learning, but you should disable it again before deploying any real applications.

To enable the invoker servlet, uncomment the following servlet and servlet-mapping elements in install_dir/conf/web.xml. Do not confuse this Apache Tomcat-specific web.xml file with the standard one that goes in the WEB-INF directory of each Web application.

<servlet>
  <servlet-name>invoker</servlet-name>
  <servlet-class>
    org.apache.catalina.servlets.InvokerServlet
  </servlet-class>
  ...
</servlet>
...
<servlet-mapping>
  <servlet-name>invoker</servlet-name>
  <url-pattern>/servlet/*</url-pattern>
</servlet-mapping>

2.2.3.7 Test Server

2.2.3.7.1 Verify That the Server Can Start
Before trying your own servlets or JSP pages, you should make sure that the server is installed and configured properly. For Tomcat, click on install_dir/bin/startup.bat (or execute install_dir/bin/startup.sh on Unix/Linux). Next, enter the URL http://localhost/ in your browser and make sure you get the Tomcat welcome page, not an error message saying that the page could not be displayed or that the server could not be found. If you chose not to change the port number to 80 as described above, you will need to use a URL like http://localhost:8080/ that includes the port number.
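Besides using a browser, the same check can be scripted. The sketch below (ServerCheck is a hypothetical class name) requests the home page with java.net.HttpURLConnection and prints the HTTP status code; 200 means the server answered with a page. Pass 8080 as the argument if you kept Tomcat's default port.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: checks whether a local server answers on the given port.
class ServerCheck {

    // Home-page URL of a local server; port 80 needs no explicit port in the URL.
    static String homeUrl(int port) {
        return port == 80 ? "http://localhost/" : "http://localhost:" + port + "/";
    }

    // Returns the HTTP status code of the home page (e.g. 200 for the welcome page).
    static int status(int port) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(homeUrl(port)).openConnection();
        conn.setConnectTimeout(3000);   // milliseconds
        conn.setReadTimeout(3000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 80;
        try {
            System.out.println(homeUrl(port) + " answered with HTTP " + status(port));
        } catch (IOException e) {
            // Typical causes: the server is not running, or it listens on another port
            System.out.println(homeUrl(port) + " is not reachable: " + e);
        }
    }
}
```

A "connection refused" message usually means Tomcat is not running on that port, which points to the troubleshooting steps that follow.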
If this does not work, there are a couple of things to check:
- Did the Tomcat window pop up and stay open? If not, the error messages are lost and it is hard to know what you did wrong. So, open a DOS window, go to install_dir/bin and type "catalina run" to start Tomcat without popping up a new window. Now the error messages should help you figure out the problem (e.g., JAVA_HOME not set properly or IIS already reserving port 80).
- Does the server appear to be running, but you cannot access the home page? Maybe your browser is using a proxy and you have not set it to bypass proxies for local addresses. To fix this:
  o On Internet Explorer, go to Tools, Internet Options, Connections, and LAN Settings. If the "Use a proxy server" checkbox is selected, make sure the "Bypass proxy server for local addresses" box is also selected.
  o On Netscape 6/7, go to the Edit menu, then select Preferences, Advanced, and Proxies. Then enter "localhost" in the textfield labeled "No Proxy for:".
  o On Mozilla Firefox, go to Tools, Options, General, and Connection Settings. Make sure "localhost" is in the textfield labeled "No Proxy for:". Note that this entry is the default with Firefox, so you probably do not need to change it.

To halt the server, double-click on install_dir/bin/shutdown.bat. I recommend that you make shortcuts to (not copies of) the startup and shutdown scripts and place those shortcuts on the desktop or in your main development directory. If you put them on the desktop, you can assign keyboard shortcuts, which is convenient.

2.2.3.7.2 Try Some Simple HTML and JSP Pages
After you have verified that the server is running, you should make sure that you can install and access simple HTML and JSP pages. This test, if successful, shows two important things. First, successfully accessing an HTML page shows that you understand which directories should hold HTML and JSP files, and what URLs correspond to them.
Second, successfully accessing a new JSP page shows that the Java compiler (not just the Java virtual machine) is configured properly.

Eventually, you will almost certainly want to create and use your own Web applications, but for initial testing many people prefer to use the default Web application. With Tomcat and the default Web application, you put HTML and JSP pages in install_dir/webapps/ROOT or install_dir/webapps/ROOT/somePath and access them with http://localhost/filename or http://localhost/somePath/filename.

For your first tests, I suggest you simply take the following Hello.jsp together with any simple HTML file:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- Simple JSP file to test server setup and configuration. -->
<HTML>
<HEAD><TITLE>JSP Test</TITLE></HEAD>
<BODY BGCOLOR="#FDF5E6">
<H1>JSP Test</H1>
Time: <%= new java.util.Date() %>
</BODY></HTML>

and drop them into the appropriate locations. If you put the files in the top-level directory of the default Web application (i.e., in install_dir/webapps/ROOT), access the JSP page with the URL http://localhost/Hello.jsp. If you put them in a subdirectory of install_dir/webapps/ROOT, use the URL http://localhost/directoryName/Hello.jsp. If you successfully started the server as described above, but neither file works (e.g., you get File Not Found (404) errors), you are likely using the wrong directory for the files. If the HTML file works but the JSP file fails, you have probably specified the base JDK directory incorrectly (i.e., in the JAVA_HOME variable).

2.2.3.7.3 Setup Your Development Environment
The server startup script startup.bat automatically sets the server's CLASSPATH to include the standard servlet and JSP classes and the WEB-INF/classes directory (containing compiled servlets) of each Web application. But you need similar settings yourself, or you will be unable to compile servlets in the first place.
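Because java and javac both read the CLASSPATH environment variable when no -classpath option is given, one rough check is to run a small program from the same DOS window you compile in and see whether it can find the servlet and JSP API classes. ClasspathCheck is a hypothetical diagnostic, not a Tomcat tool; javax.servlet.http.HttpServlet and javax.servlet.jsp.JspPage are standard classes supplied by servlet-api.jar and jsp-api.jar.

```java
import java.io.File;

// Hypothetical diagnostic: lists the class path this JVM was started with and
// reports whether the servlet and JSP API classes can be found on it.
class ClasspathCheck {

    // True if the named class can be located on the current class path.
    static boolean canLoad(String className) {
        try {
            // false = do not run static initializers; we only want to know it exists
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // e.g. jsp-api.jar present but servlet-api.jar missing
            return false;
        }
    }

    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path");
        for (String entry : cp.split(File.pathSeparator)) {
            System.out.println("CLASSPATH entry: " + entry);
        }
        // Supplied by install_dir/common/lib/servlet-api.jar and jsp-api.jar respectively
        System.out.println("Servlet API found: " + canLoad("javax.servlet.http.HttpServlet"));
        System.out.println("JSP API found:     " + canLoad("javax.servlet.jsp.JspPage"));
    }
}
```

If the program cannot even be started with "java ClasspathCheck" from its own directory, "." is probably missing from the CLASSPATH.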
Configuring your system for servlet development involves the following four steps:
1. Creating a development directory
2. Making shortcuts to the Tomcat startup and shutdown scripts
3. Setting your CLASSPATH
4. Bookmarking the servlet & JSP javadocs
Details on the first three steps are given below.

2.2.3.7.4 Create a Development Directory
The first thing you should do is create a directory in which to place the servlets and JSP pages that you develop. This directory can be in your home directory (e.g., C:\Documents and Settings\Your Name\My Documents\Servlets+JSP on Windows 2000) or in a convenient general location (e.g., C:\Servlets+JSP). It should not, however, be in the Tomcat deployment directory (e.g., anywhere within install_dir/webapps).

Eventually, you will organize this development directory into different Web applications. For initial testing of your environment, however, you can just put servlets either directly in the development directory (for packageless servlets) or in a subdirectory that matches the servlet package name. Many developers simply put all their code in the server's deployment directory (within install_dir/webapps). I strongly discourage this practice and instead recommend one of the approaches listed under step 6 of the installation summary (Establish a simplified deployment method). Although developing in the deployment directory seems simpler at the beginning, since it requires no copying of files, it significantly complicates matters in the long run. Mixing locations makes it hard to separate an operational version from a version you are testing, makes it difficult to test on multiple servers, and makes organization much more complicated. Besides, your desktop is almost certainly not the final deployment server, so you'll eventually have to develop a good system for deploying anyhow.
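The relationship between a servlet's package name, its directory and the invoker URL (steps 5.1 to 5.3 of the installation summary) can be captured in a few lines. ServletLayout is a hypothetical helper; install_dir and packageName.ServletName are the placeholders used in this booklet, and the URL assumes Tomcat on port 80 with the invoker servlet enabled. The same package-to-subdirectory rule applies inside your development directory.

```java
import java.io.File;

// Hypothetical helper: maps a servlet's fully qualified class name to its .class
// file under a classes directory and to its invoker-servlet URL.
class ServletLayout {

    // "packageName.ServletName" -> classesDir/packageName/ServletName.class
    // (using the platform's separator); "ServletName" -> classesDir/ServletName.class
    static File classFile(File classesDir, String className) {
        // each package component becomes one subdirectory
        String relative = className.replace('.', File.separatorChar) + ".class";
        return new File(classesDir, relative);
    }

    // Invoker URL in the default Web application, assuming Tomcat listens on port 80
    static String invokerUrl(String className) {
        return "http://localhost/servlet/" + className;
    }

    public static void main(String[] args) {
        // install_dir is the booklet's placeholder for the Tomcat installation directory
        File classes = new File("install_dir/webapps/ROOT/WEB-INF/classes");
        String name = args.length > 0 ? args[0] : "packageName.ServletName";
        System.out.println("Put the class file at: " + classFile(classes, name));
        System.out.println("Then request:          " + invokerUrl(name));
    }
}
```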
2.2.3.7.5 Make Shortcuts to Start and Stop the Server
Since I find myself frequently restarting the server, I find it convenient to use the Tomcat Configuration Monitor in the system tray and stop and start the server from there. Assuming Tomcat was installed as the Windows service Tomcat5, you can also use a batch file containing the commands

net stop tomcat5
net start tomcat5

Name it restart.bat and place it in your Quick Launch toolbar.

2.2.3.7.6 Set Your CLASSPATH
Since servlets and JSP are not part of the Java 2 Platform, Standard Edition, you have to identify the servlet classes to the compiler. The server already knows about the servlet classes, but the compiler (i.e., javac) you use for development probably doesn't. So, if you don't set your CLASSPATH, attempts to compile servlets, tag libraries, filters, Web app listeners, or other classes that use the servlet and JSP APIs will fail with error messages about unknown classes. Here are the standard Tomcat locations:

install_dir/common/lib/servlet-api.jar
install_dir/common/lib/jsp-api.jar

You need to include both files in your CLASSPATH.

Now, in addition to the servlet and JSP JAR files, you also need to put your development directory in the CLASSPATH. Although this is not necessary for simple packageless servlets, once you gain experience you will almost certainly use packages. Compiling a file that is in a package and that uses another class in a user-defined package requires the CLASSPATH to include the directory at the top of the package hierarchy. In this case, that's the development directory I just discussed. Forgetting this setting is perhaps the most common mistake made by beginning servlet programmers!

Finally, you should include "." (the current directory) in the CLASSPATH. Otherwise, you will only be able to compile packageless classes that are in the top-level development directory.

Here are two representative methods of setting the CLASSPATH.
These instructions assume that your development directory is C:\Servlets+JSP. Replace install_dir with the actual Tomcat installation path (e.g., C:\jakarta-tomcat-5.5.4). Also, be sure to use the appropriate case for the filenames, and enclose your pathnames in double quotes if they contain spaces.
Note that this is only one approach to setting the CLASSPATH. Many Java integrated development environments have global or project-specific settings that accomplish the same result, but these settings are entirely IDE-specific and won't be discussed here. Another alternative is to make a .bat file or Ant build script whereby -classpath ... is automatically appended to calls to javac.
Windows NT/2000/XP. On WinXP, go to the Start menu and select Control Panel, then System, then the Advanced tab, then the Environment Variables button. On Win2K/WinNT, go to the Start menu and select Settings, then Control Panel, then System, then Environment. Either way, enter a CLASSPATH value such as:
.;C:\Servlets+JSP;install_dir\common\lib\servlet-api.jar;install_dir\common\lib\jsp-api.jar

2.2.3.8 Installing JDBC Drivers for Windows XP & Windows 2000 [4]
JDBC is a mechanism that allows Java programs to communicate with databases through a standard Application Programming Interface (API), regardless of the driver and the database product. We have to install the necessary drivers so that our code can communicate with SQL Server. Besides the JDK, which every Java program requires, we need Microsoft's implementation of the JDBC specification, the Microsoft SQL Server 2000 Driver for JDBC.
Download site: http://www.microsoft.com/downloads/details.aspx?FamilyID=9f1874b6f8e1-4bd6-947c-0fc5bf05bf71&displaylang=en (or search Google for the keywords "jdbc sql server windows" and follow the first link).
The Microsoft® SQL Server 2000 Driver for JDBC is a Type 4 JDBC driver that provides highly scalable and reliable connectivity for the enterprise Java environment.
This driver provides JDBC access to SQL Server 2000 through any Java-enabled applet, application, or application server. Drivers are available for Windows and Unix. If we download the Windows and Solaris drivers, we get the following two files:
setup.exe (Windows)
mssqlserver.tar (Solaris)
When we tried to install setup.exe on Windows, we immediately noticed that we don't know exactly what the installation program does (does it modify the registry?), and it additionally installs about 30 DLLs. That's not what we want! JDBC needs no installation or environment settings, and instead of DLLs we want a pure Java driver.
The Solaris mssqlserver.tar file looks much better. Why not take the required JAR files out of this tar? If they are pure Java, they will run on Windows too ... and they do.

Installation on all of the above platforms:
- Create a directory anywhere on your system.
- Untar the file mssqlserver.tar; you get: install.ksh, msjdbc.tar, read.me
- Untar the file msjdbc.tar; you get: lib/msbase.jar, lib/mssqlserver.jar, lib/msutil.jar

A Small Test Program: Test.java
Here's a sample program that shows how to establish a connection to Microsoft SQL Server. Don't forget to import the java.sql package to get access to DriverManager and the related classes and methods. Take a careful look at the connection string: <Host> is the IP address or name of the SQL Server in the domain you are working on, and 1433 is the default SQL Server port, which may be different in your development environment; ask your DB Administrator for the relevant information. <UID> and <PWD> are your user ID and password, which you should already have; if not, contact your DB Administrator.
import java.sql.*;

/**
 * Microsoft SQL Server JDBC test program
 */
public class Test {
    public Test() throws Exception {
        // Register the Microsoft SQL Server 2000 JDBC driver
        DriverManager.registerDriver(new com.microsoft.jdbc.sqlserver.SQLServerDriver());
        // Get connection: replace <Host>, <UID> and <PWD> with your own values
        Connection connection = DriverManager.getConnection(
            "jdbc:microsoft:sqlserver://<Host>:1433", "<UID>", "<PWD>");
        if (connection != null) {
            System.out.println();
            System.out.println("Successfully connected");
            System.out.println();
            // Meta data
            DatabaseMetaData meta = connection.getMetaData();
            System.out.println("\nDriver Information");
            System.out.println("Driver Name: " + meta.getDriverName());
            System.out.println("Driver Version: " + meta.getDriverVersion());
            System.out.println("\nDatabase Information ");
            System.out.println("Database Name: " + meta.getDatabaseProductName());
            System.out.println("Database Version: " + meta.getDatabaseProductVersion());
            // Release the connection when done
            connection.close();
        }
    } // Test

    public static void main(String args[]) throws Exception {
        Test test = new Test();
    }
}

Compile it
Compile the Java source Test.java (all on one line):
$ javac -classpath ".;./lib/mssqlserver.jar;./lib/msbase.jar;./lib/msutil.jar" Test.java
You need access to a javac program on your computer; if it is not on your PATH, simply specify its full path in front of javac. The above command is good for Java 2. If you are using, for instance, Java 1.1.8, add your JDK's classes.zip to the classpath. On Unix systems, replace the semicolons ";" with colons ":". Forward slashes "/" work on both platforms; you don't have to use backslashes "\" on Windows.
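Because the separator differs between platforms, a small Java helper can build the -classpath value and check which classes the JVM can actually see. The following is a minimal sketch, not part of the lab code: the class name ClasspathHelper and its methods are hypothetical, and String.join requires Java 8 or later. It relies only on File.pathSeparator and Class.forName from the standard library.

```java
import java.io.File;

// Hypothetical helper (not part of the lab code). join() builds a -classpath
// value with the separator of the current platform: File.pathSeparator is
// ";" on Windows and ":" on Unix. isLoadable() reports whether a class can be
// found on the classpath this JVM was started with.
final class ClasspathHelper {

    static String join(String... entries) {
        return String.join(File.pathSeparator, entries);
    }

    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("javac -classpath \""
            + join(".", "./lib/mssqlserver.jar", "./lib/msbase.jar", "./lib/msutil.jar")
            + "\" Test.java");
        String[] needed = {
            "com.microsoft.jdbc.sqlserver.SQLServerDriver", // JDBC driver used in Test.java
            "javax.servlet.http.HttpServlet"                // needed to compile servlets
        };
        for (String name : needed) {
            System.out.println(name + (isLoadable(name) ? ": found" : ": NOT found, check your CLASSPATH"));
        }
    }
}
```

Running it with the same -classpath you pass to java for Test gives a quick answer to "is the driver on my classpath?" before you start debugging connection errors.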
Run it
As with compilation, you can run it like this (again all on one line):
$ java -classpath ".;./lib/mssqlserver.jar;./lib/msbase.jar;./lib/msutil.jar" Test
The output looks something like this:
Successfully connected

Driver Information
Driver Name: SQLServer
Driver Version: 2.2.0022

Database Information
Database Name: Microsoft SQL Server
Database Version: Microsoft SQL Server 2000 8.00.194 (Intel X86) Aug 6 2000 00:57:48 Copyright (c) 1988-2000 Microsoft Corporation Enterprise Edition on Windows NT 5.0 (Build 2195: Service Pack 2)

2.2.4 Java and HTML Warm-up [9]
Now that you have installed and fine-tuned your development environment and installed the JDBC driver for SQL Server, it's time to play with it a bit, since our aim is to build a Web application that uses a DBMS.

2.2.5 Retrieving Input from the User
Input is passed to servlet programs through Web forms. Forms include text fields, radio buttons, check boxes, popup menus, scrolling lists, and the like. Retrieving input is therefore a two-step process: you must create an HTML document that provides forms that let users pass information to the server, and your servlet program must have a means of parsing the input data and deciding what action to take. Java Servlets provide this mechanism for you.

2.2.6 Forms
Forms are designated within an HTML document by the fill-out form tag:
<FORM> ... Contents of the form ... </FORM>
In practice, the FORM tag also carries an ACTION attribute (the URL of the servlet that receives the data) and a METHOD attribute (GET or POST). Within the form you may have anything except another form. The tags used to create user interface objects are INPUT, SELECT, and TEXTAREA.
The INPUT tag specifies a simple input interface:
<INPUT TYPE="text" NAME="thisinput" VALUE="default" SIZE=10 MAXLENGTH=20>
<INPUT TYPE="checkbox" NAME="thisbox" VALUE="on" CHECKED>
<INPUT TYPE="radio" NAME="radio1" VALUE="1">
<INPUT TYPE="submit" VALUE="done">
<INPUT TYPE="radio" NAME="radio1" VALUE="2" CHECKED>
<INPUT TYPE="hidden" NAME="notvisible" VALUE="5">
These tags produce the form shown in Figure 1.
Figure 1
The different attributes are mostly self-explanatory. TYPE is the kind of input object you are presenting. Valid types include "text", "password", "checkbox", "radio", "submit", "reset", and "hidden". Every input except "submit" and "reset" has a NAME, which is associated with the value returned to the program. The name is not visible to the user (unless they read the HTML source). The other attributes are explained with the types:
"text" - a simple text entry field. VALUE is the default text within the field, SIZE is the visual length of the field, and MAXLENGTH is the maximum number of characters the field will accept. There are defaults for all of these (empty, 20, unlimited).
"password" - the same as a normal text entry field, but the characters entered are obscured.
"checkbox" - a toggle button that is independently either on or off. VALUE is the string sent to the server when the box is checked (unchecked boxes are not sent at all). The default value is "on".
"radio" - a toggle button grouped with other radio buttons so that only one in the group can be on. It's essentially the same as a checkbox, but all radio buttons with the same NAME attribute form one group.
"submit" and "reset" - the pushbuttons at the bottom of most forms, which submit the form or reset it to its default values. These are not required to have a NAME, and VALUE is the label on the button.
The default labels are "Submit Query" and "Reset", respectively.
"hidden" - this input is invisible as far as the user interface is concerned. Don't be fooled into thinking this is some kind of security feature: it's easy to find hidden fields by viewing the document source or by examining the URL of a GET request. It simply creates a name/value binding, requiring no user action, that is passed along transparently when the form is submitted.
The second type of interface is the SELECT interface, which includes popup menus and scrolling lists. Here is an example of a popup menu:
<SELECT NAME="menu">
<OPTION>option 1
<OPTION>option 2
<OPTION>option 3
<OPTION SELECTED>option 4
<OPTION>option 5
<OPTION>option 6
<OPTION>option 7
</SELECT>
Figure 2
The SIZE attribute determines whether it is a menu or a scrolled list. If it is 1 or absent, you get a popup menu. If it is greater than 1, you will see a scrolled list showing SIZE elements. The MULTIPLE option, which forces the select to be a scrolled list, signifies that more than one value may be selected (by default only one value can be selected). OPTION is more or less self-explanatory: it gives the name and value of each entry in the menu or scrolled list, and you can specify which entries are SELECTED by default.

2.2.7 Java Server-Side Input Handling [10]
Java parses the input for you, so you are completely insulated from the actual format of the input data. Your program will be a subclass of HttpServlet, the generic Java servlet class for handling Web requests. Your servlet must override the doGet() or doPost() method; these methods are executed in response to the client's GET or POST request, respectively. Both take two arguments: HttpServletRequest request and HttpServletResponse response.
Let's take a look at a very simple servlet program, the traditional Hello World (this time with a doGet method):

import java.io.*;
import java.text.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class Hello extends HttpServlet {
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException, ServletException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html>");
        out.println("<head>");
        String title = "Hello World";
        out.println("<title>" + title + "</title>");
        out.println("</head>");
        out.println("<body bgcolor=white>");
        out.println("<h1>" + title + "</h1>");
        // Value bound to "param" in the request, or null if it was not sent
        String param = request.getParameter("param");
        if (param != null)
            out.println("Thanks for the lovely param='" + param + "' binding.");
        out.println("</body>");
        out.println("</html>");
    }
}

We'll return to this code in the section on Java output; for now, we focus on the input side. The argument HttpServletRequest request represents the client request, and the values of the parameters passed from the HTML FORM can be retrieved by calling the HttpServletRequest getParameter method. This method takes as its argument the name of the parameter (the NAME of the HTML INPUT object) and returns the value assigned to the parameter as a Java String. Where a parameter may have multiple bindings, the method getParameterValues retrieves all of the values as an array of Java Strings; getParameter returns the first value of this array. Through these mechanisms you can retrieve any of the values entered in, or implicit in, the form. As the example suggests, getParameter returns null if no value was sent for the requested name. Recall that the bindings of unchecked boxes are not sent with the form, so you can check for null to determine when a box is off.
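The servlet container does this parsing for you, but it can help to see the rules in isolation. The sketch below is a hypothetical stand-in (the class QueryParams is not part of the Servlet API and is not how Tomcat implements it) that decodes a GET query string with java.net.URLDecoder and mimics the getParameter and getParameterValues behaviour described above: the first value, all values, or null when a name was not sent.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the container's parameter parsing: maps a query
// string onto getParameter / getParameterValues semantics.
final class QueryParams {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    QueryParams(String query) {
        if (query == null) return;
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) continue;               // tolerate "&&"
            int eq = pair.indexOf('=');
            // A pair without '=' is treated as a name with an empty value
            String name = decode(eq < 0 ? pair : pair.substring(0, eq));
            String value = eq < 0 ? "" : decode(pair.substring(eq + 1));
            values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
    }

    // Undo URL escaping: "+" becomes a space, "%26" becomes "&", etc.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Like getParameter: the first value, or null if the name was not sent.
    String getParameter(String name) {
        List<String> v = values.get(name);
        return v == null ? null : v.get(0);
    }

    // Like getParameterValues: all values in order, or null if not sent.
    String[] getParameterValues(String name) {
        List<String> v = values.get(name);
        return v == null ? null : v.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // A SELECT MULTIPLE named "menu" with two choices; "thisbox" unchecked
        QueryParams p = new QueryParams("param=M%26M&menu=option+2&menu=option+4");
        System.out.println(p.getParameter("param"));             // M&M
        System.out.println(p.getParameterValues("menu").length); // 2
        System.out.println(p.getParameter("thisbox"));           // null
    }
}
```

The unchecked checkbox simply never appears in the query string, which is why checking for null is the way to detect it.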
2.2.8 Returning Output to the User
In your project, you will be returning HTML documents to the user. The documents will be created dynamically from the output of a query, and you can format them however you like using ordinary HTML. Before we look at producing output in Java, here is the complete Hello World servlet again, with a header comment:

/********************************************************************
 *
 * Hello.java
 *
 * A simple servlet that returns a single page.
 * It looks for a binding called "param" and, if present, incorporates
 * it into its response.
 *
 ********************************************************************/
import java.io.*;
import java.text.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class Hello extends HttpServlet {
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException, ServletException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html>");
        out.println("<head>");
        String title = "Hello World";
        out.println("<title>" + title + "</title>");
        out.println("</head>");
        out.println("<body bgcolor=white>");
        out.println("<h1>" + title + "</h1>");
        String param = request.getParameter("param");
        if (param != null) {
            out.println("Thanks for the lovely param='" + param + "' binding.");
        }
        out.println("</body>");
        out.println("</html>");
    }
}

2.2.9 Java Code Output
Look again at the code example. Output is handled entirely by the HttpServletResponse object, which lets you set the content type with the setContentType method: instead of printing the HTTP header yourself, you tell the HttpServletResponse object explicitly that the content type is "text/html". All HTML is returned to the user through a PrintWriter object, which you obtain from the response object with the getWriter method. The HTML is then written line by line with the println method.
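The Hello servlet echoes param into the page unchanged, so a value containing HTML tags (including script tags) would be interpreted by the browser. Below is a minimal sketch, assuming you are willing to move the page-writing code out of doGet into a hypothetical helper class, HelloPage: the page is written to any PrintWriter, and the parameter is HTML-escaped first. Inside doGet you would call HelloPage.write(response.getWriter(), request.getParameter("param")); the render method exists only so the output can be inspected without Tomcat, by writing into a StringWriter.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical refactoring of the Hello servlet's output code. write() takes
// any PrintWriter, so in doGet you would pass response.getWriter().
final class HelloPage {

    // Escape characters that are special in HTML text (&, <, >), plus quotes
    // so the result is also safe inside attribute values.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    static void write(PrintWriter out, String param) {
        String title = "Hello World";
        out.println("<html>");
        out.println("<head>");
        out.println("<title>" + title + "</title>");
        out.println("</head>");
        out.println("<body bgcolor=white>");
        out.println("<h1>" + title + "</h1>");
        if (param != null) {
            // User input is escaped before it is echoed back into the page
            out.println("Thanks for the lovely param='" + escapeHtml(param) + "' binding.");
        }
        out.println("</body>");
        out.println("</html>");
    }

    // Render the page into a String (handy for trying it outside Tomcat).
    static String render(String param) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        write(out, param);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(render("<b>bold</b> & more"));
    }
}
```

Keeping the HTML generation separate from the HttpServletResponse is a design choice, not a requirement of the Servlet API; it simply makes the output easy to check in isolation.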
We assume you all have a basic background in Java, so we won't treat exceptions in detail here, but note that IOException and ServletException must each be either handled or declared in the throws clause.

2.2.10 A Complete JDBC Example [5]
Running through a simple but complete example will help you grasp the overall concepts of JDBC. The fundamental issues encountered when writing any database application are:
Creating a database. You can create the database either outside of Java, with tools supplied by the database vendor, or with SQL statements fed to the database from a Java program.
Connecting to a data source. In Java you can use either the JDBC-ODBC bridge or a vendor-specific JDBC driver to connect to the data source. In this project we use the Microsoft SQL Server Driver for JDBC, which is vendor-specific.
Inserting information into a database. Again, you can enter data either outside of Java, with database-specific tools, or with SQL statements sent by a Java program.
Selectively retrieving information. You use SQL commands from Java to get results and then use Java to display or manipulate the data.

2.2.10.1 Creating a Database
For this example, consider the scenario of tracking coffee consumption at the Boğaziçi University Computer Engineering Department. A weekly report must be generated for University management that includes total coffee sales and the maximum coffee consumed by a programmer in one day. Here is the data:

Coffee Consumption at CMPE Dept, Boğaziçi University
"Caffeinating the World, one programmer at a time"
Programmer   Day   # Cups
Gilbert      Mon   1
Wally        Mon   2
Edgar        Tue   8
Wally        Tue   2
Eugene       Tue   3
Josephine    Wed   2
Eugene       Thu   3
Gilbert      Thu   1
Clarence     Fri   9
Edgar        Fri   3
Josephine    Fri   4

To create this database, you can feed SQL statements to SQL Server through the JDBC driver. To enter the data into the CafeJolt database, create a Java application that follows these steps:
1. Load the JDBC Driver for MS SQL.
You must load a driver that tells the JDBC classes how to talk to a data source:
Class.forName("com.microsoft.jdbc.sqlserver.SQLServerDriver");
2. Connect to a data source.
A URL is used to connect to a particular JDBC data source. Using the DriverManager class, you request a connection to a URL, and the DriverManager selects the appropriate driver; here, only the MS SQL driver is loaded.
Connection con = DriverManager.getConnection(URL, username, password);
3. Send SQL statements to create the table.
Ask the connection object for a Statement object:
Statement stmt = con.createStatement();
Then execute the following SQL statement to create a table called JoltData:
create table JoltData (
    programmer varchar (32),
    day char (3),
    cups integer);
The Java code to do this is:
stmt.execute(
    "create table JoltData (" +
    "programmer varchar (32)," +
    "day char (3)," +
    "cups integer);");
After you have created the table, you can insert the appropriate values, for example:
insert into JoltData values ('Gilbert', 'Mon', 1);
insert into JoltData values ('Wally', 'Mon', 2);
insert into JoltData values ('Edgar', 'Tue', 8);
...
Each of these statements can be sent from Java with stmt.executeUpdate(...).
Review what you have done so far: you connected to the database through the Microsoft SQL Server JDBC driver and sent a series of SQL statements to create a table called JoltData filled with rows of data.

2.2.10.2 Getting Information from a Database
To retrieve information from a database, use SQL select statements via the Java Statement.executeQuery method, which returns results as rows of data in a ResultSet object. The results are examined row by row using the ResultSet.next and ResultSet.getXXX methods.
Consider how you would obtain the maximum number of cups of coffee consumed by a programmer in one day. In terms of SQL, one way to get the maximum value is to sort the table by the cups column in descending order.
The programmer column is also selected, so the name associated with the highest coffee consumption can be printed too. Use the SQL statement:
SELECT programmer, cups FROM JoltData ORDER BY cups DESC;
From Java, execute the statement with:
ResultSet result = stmt.executeQuery(
    "SELECT programmer, cups FROM JoltData ORDER BY cups DESC;");
The cups column of the first row of the result set will contain the largest number of cups:
Clarence 9
Edgar 8
Josephine 4
Eugene 3
Eugene 3
Edgar 3
Wally 2
Wally 2
Josephine 2
Gilbert 1
Gilbert 1
(Rows with the same number of cups may come back in any order.)
Examine the ResultSet by:
1. "Moving" to the first row of data. Perform:
   result.next();
2. Extracting data from the columns of that row. Perform:
   String name = result.getString("programmer");
   int cups = result.getInt("cups");
The information can be printed easily via:
System.out.println("Programmer " + name + " consumed the most coffee: " + cups + " cups.");
resulting in the following output:
Programmer Clarence consumed the most coffee: 9 cups.
Computing the total sales for the week is a matter of adding up the cups column. Use an SQL select statement to retrieve the cups column:
result = stmt.executeQuery("SELECT cups FROM JoltData;");
Go through the results by calling the method next until it returns false, indicating that there are no more rows of data:
// for each row of data
cups = 0;
while (result.next()) {
    cups += result.getInt("cups");
}
Print the total number of cups sold:
System.out.println("Total sales of " + cups + " cups of coffee.");
The output should be:
Total sales of 38 cups of coffee.

2.2.10.3 Obtaining Result MetaData Type Information
You will occasionally need type information about the result of a query. For example, the SQL statement
SELECT * from JoltData
will return a ResultSet with the same number of columns (and rows) as the table JoltData. If you do not know beforehand how many columns there are, you must use metadata, via the ResultSetMetaData class, to find out.
Continuing the Cafe Jolt scenario, determine the number and type of columns returned by the same SQL query:
SELECT programmer, cups FROM JoltData ORDER BY cups DESC;
First, perform the usual executeQuery call:
ResultSet result = stmt.executeQuery(
    "SELECT programmer, cups FROM JoltData ORDER BY cups DESC;");
Then obtain the column and type metadata from the ResultSet:
ResultSetMetaData meta = result.getMetaData();
You can easily query the ResultSetMetaData to determine how many columns there are:
int columns = meta.getColumnCount();
and then walk the list of columns, printing their names and types:
int numbers = 0;
for (int i = 1; i <= columns; i++) {
    System.out.println(meta.getColumnLabel(i) + "\t" + meta.getColumnTypeName(i));
    if (meta.isSigned(i)) { // is it a signed number?
        numbers++;
    }
}
System.out.println("Columns: " + columns + " Numeric: " + numbers);

2.2.11 Handling Special Characters in HTML
The special characters &, <, and > need to be escaped as &amp;, &lt;, and &gt;, respectively, in HTML text (see NCSA Beginner's Guide to HTML). Moreover, special characters appearing in URLs need to be escaped differently from when they appear in HTML text. For example, if you link on text with special characters and want to embed them into extended URLs as parameter values, you need to escape them: convert space to + or %20, & to %26, = to %3D, % to %25, etc. (In general, any special character can be escaped by a percent sign followed by the character's hexadecimal ASCII value.)
Important: Do NOT escape the & that actually separates parameters! For example, if you want two parameters p1 and p2 to have the values 3 and M&M, you should write something like:
http://cgi-courses.stanford.edu/~username/cgi-bin/cgiprog?p1=3&p2=M%26M
Be careful not to confuse the escape strings for HTML text with those for URLs.
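In Java, the URL escaping described above does not have to be done by hand: java.net.URLEncoder implements it (space becomes +, & becomes %26, = becomes %3D, % becomes %25). The sketch below is a hypothetical helper, QueryBuilder, that encodes each name and value but leaves the separating & and = untouched, reproducing the p1/p2 example. One difference from the text: for non-ASCII characters, such as the ğ in Boğaziçi, URLEncoder with UTF-8 emits the percent-escapes of the character's UTF-8 bytes rather than a single ASCII code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the parameter part of a URL. Names and values
// are URL-encoded; the "&" and "=" that separate parameters are not.
final class QueryBuilder {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    static String build(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&'); // separator: NOT escaped
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("p1", "3");
        params.put("p2", "M&M");
        System.out.println("cgiprog?" + build(params)); // cgiprog?p1=3&p2=M%26M
    }
}
```

If such a URL is then placed inside an HTML href attribute, the HTML rules from the previous paragraph apply on top of the URL rules, which is exactly the confusion the text warns about.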
3 MSSQL and Web Warm-up Part II
After the first warm-up part, in which you got acquainted with the Web interface and servlet coding, it's time to work with the real data, which is in XML form. We'll analyze the real eBay data, design our AuctionBase database accordingly, and bulk load the data into the database.

3.1 Preliminary Information about XML and DTDs
3.1.1 What Is XML?
XML is a markup language for documents containing structured information. Structured information contains both content (words, pictures, etc.) and some indication of what role that content plays (for example, content in a section heading has a different meaning from content in a footnote, which means something different than content in a figure caption or content in a database table, etc.). Almost all documents have some structure. A markup language is a mechanism to identify structures in a document. The XML specification defines a standard way to add markup to documents.

3.1.2 What Do XML Documents Look Like? [12]
If you are conversant with HTML, XML documents will look familiar. A simple XML document is presented in the following example.
Example: A Simple XML Document
<?xml version="1.0"?>
<oldjoke>
<burns>Say <quote>goodnight</quote>, Gracie.</burns>
<allen><quote>Goodnight, Gracie.</quote></allen>
<applause/>
</oldjoke>
A few things may stand out to you:
The document begins with a processing instruction: <?xml ...?>. This is the XML declaration. While it is not required, its presence explicitly identifies the document as an XML document and indicates the version of XML to which it was authored.
There's no document type declaration. Unlike SGML, XML does not require a document type declaration. However, a document type declaration can be supplied, and some documents will require one in order to be understood unambiguously.
Empty elements (<applause/> in this example) have a modified syntax.
While most elements in a document are wrappers around some content, empty elements are simply markers where something occurs (a horizontal rule for HTML's <hr> tag, for example). The trailing /> in the modified syntax indicates to a program processing the XML document that the element is empty and no matching end-tag should be sought. Since XML documents do not require a document type declaration, without this clue it could be impossible for an XML parser to determine which tags were intentionally empty and which had been left empty by mistake.
XML has softened the distinction between elements which are declared as EMPTY and elements which merely have no content. In XML, it is legal to use the empty-element tag syntax in either case. It's also legal to use a start-tag/end-tag pair for empty elements: <applause></applause>. If interoperability is of any concern, it's best to use the empty-element tag syntax only for elements declared as EMPTY, and to always write those elements in that form.
XML documents are composed of markup and content. There are six kinds of markup that can occur in an XML document: elements, entity references, comments, processing instructions, marked sections, and document type declarations. The following sections concentrate on document type declarations, which are what we need for the auction data.

3.1.3 DTD (Document Type Definitions) for XML [11]
An XML document primarily consists of a strictly nested hierarchy of elements with a single root. Elements can contain character data, child elements, or a mixture of both. In addition, they can have attributes. Child character data and child elements are strictly ordered; attributes are not. For example:
<?xml version="1.0" ?>
<Book Author="Anonymous">
<Title>Sample Book</Title>
<Chapter id="1">
This is chapter 1. It is not very long or interesting.
</Chapter>
<Chapter id="2">
This is chapter 2. Although it is longer than chapter 1, it is not any more interesting.
</Chapter>
</Book>
The names of the elements and attributes and their order in the hierarchy (among other things) form the XML markup language used by the document. This language can be defined by the document author or it can be inferred from the document's structure. In the example shown above, the language contains three elements: Book, Title, and Chapter. The Book element contains a single Title element and one or more Chapter elements. The Book element has an Author attribute and the Chapter element has an id attribute.
The main reason to define the language explicitly is so that documents can be checked for conformance to it. For example, if we defined a grammar for the Book language, authors using this grammar could use a validating parser to ensure that their documents conformed to the language.
An XML markup language is defined in a Document Type Definition (DTD). The DTD is either contained in a <!DOCTYPE> tag, contained in an external file and referenced from a <!DOCTYPE> tag, or both. For example, the document shown above could contain the following <!DOCTYPE> tag:
<!DOCTYPE Book [
<!ELEMENT Book (Title, Chapter+)>
<!ATTLIST Book Author CDATA #REQUIRED>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Chapter (#PCDATA)>
<!ATTLIST Chapter id ID #REQUIRED>
]>
Note that a validating parser would reject the values id="1" and id="2" used above: values of an ID attribute must be XML names, which cannot begin with a digit, so values such as c1 and c2 should be used instead.
3.1.3.1 Elements
1) An element is defined as a group of one or more subelements/subgroups, character data, EMPTY, or ANY. For example:
Group: <!ELEMENT A (B, C)>
Character data: <!ELEMENT A (#PCDATA)>
EMPTY: <!ELEMENT A EMPTY>
ANY: <!ELEMENT A ANY>
2) Elements defined as groups of subelements/subgroups constitute non-terminals in the language. Elements defined as character data, EMPTY, or ANY constitute terminals. For example:
<!-- Element A is a non-terminal. -->
<!ELEMENT A (B)>
<!-- Element B is a terminal.
-->
<!ELEMENT B (#PCDATA)>
Although it is legal to define a language containing non-terminals that never resolve to terminals, such as one with purely circular definitions, it is generally impossible and/or useless to create any valid documents for such languages.
3) Groups can be either a sequence or a choice of subelements and/or subgroups. For example:
Sequence:
<!-- Element A consists of a single element B. -->
<!ELEMENT A (B)>
<!-- Element A consists of element B followed by element C. -->
<!ELEMENT A (B, C)>
<!-- Element A consists of a sequence, including a choice subgroup. -->
<!ELEMENT A (B, (C | D), E)>
Choice:
<!-- Element A consists of either element B or element C. -->
<!ELEMENT A (B | C)>
<!-- Element A consists of a choice, including a sequence subgroup. -->
<!ELEMENT A (B | C | (D, E))>
4) Optional (?), one-or-more (+), and zero-or-more (*) operators can be applied to groups, subgroups, and subelements. For example:
Optional:
<!-- Subelement B is optional. -->
<!ELEMENT A (B?, C)>
One or more:
<!-- Subgroup (C | D) occurs one or more times. -->
<!ELEMENT A (B, (C | D)+, E)>
Zero or more:
<!-- Group (B, C) occurs zero or more times, i.e. A can be empty. -->
<!ELEMENT A (B, C)*>
5) Elements containing character data can be declared as containing only character data:
<!ELEMENT A (#PCDATA)>
or as containing a mixture of character data and elements in any order:
<!ELEMENT A (#PCDATA | B | C)*>
In the latter case, the declaration must place #PCDATA first in the group, the group must be a choice, and the group must appear zero or more times. Such groups are generally referred to as "mixed content" (as opposed to element-only groups, or "element content"). Technically, mixed content refers to any element containing character data; in common usage, however, it refers only to the latter case.
Note: "PCDATA" in the declarations is short for "Parsed Character DATA".
The term is inherited from SGML and comes from the fact that the text in the XML document following the element tag is parsed, looking for more markup tags. Although it is possible to include unparsed character data through the use of CDATA sections, these can occur only where PCDATA occurs. While this is of interest to parser writers, it does not affect the syntax of DTDs, nor does it affect the resulting elements; they still contain character data.
6) EMPTY means that the element has no child elements or character data. Empty elements often have attributes (see below).
7) ANY means that the element can contain zero or more child elements of any declared type, as well as character data. It is therefore a shorthand for mixed content containing all declared elements.
3.1.3.2 Attributes
1) Elements can have zero or more attributes. For example:
<!ELEMENT A (#PCDATA)>
<!-- Declare an attribute a for element A -->
<!ATTLIST A a CDATA #IMPLIED>
2) A single ATTLIST statement can declare multiple attributes for the same element, and multiple ATTLIST statements can declare attributes for the same element. That is, the following are equivalent:
Single ATTLIST statement declaring multiple attributes for an element:
<!-- Element A has attributes a and b -->
<!ATTLIST A a CDATA #IMPLIED
            b CDATA #IMPLIED>
Multiple ATTLIST statements declaring attributes for the same element:
<!-- Element A has attributes a and b -->
<!ATTLIST A a CDATA #IMPLIED>
<!ATTLIST A b CDATA #IMPLIED>
3) Attributes can be optional, required, or have a fixed value. Optional attributes can have a default; fixed attributes must have a default. For example:
Optional without a default:
<!-- Element A has an attribute a. #IMPLIED = "optional, no default" -->
<!ATTLIST A a CDATA #IMPLIED>
Optional with a default:
<!-- If attribute a is not provided, a default of "aaa" will be used.
-->
<!ATTLIST A a CDATA "aaa">
Required:
<!ATTLIST A a CDATA #REQUIRED>
Fixed:
<!-- The value of attribute a is always "aaa" -->
<!ATTLIST A a CDATA #FIXED "aaa">
4) Each attribute has a type:
Character data:
<!ATTLIST A a CDATA #IMPLIED>
A user-defined enumerated type:
<!-- Attribute a uses a simple enumeration. -->
<!ATTLIST A a (yes | no) #IMPLIED>
<!-- Attribute a uses an enumeration of notation types. See the XML specification for complete details. -->
<!ATTLIST A a NOTATION (ps | pdf) #IMPLIED>
ID, IDREF: These attributes point from one element to another. The value of the IDREF attribute on the pointing element is the same as the value of the ID attribute on the pointed-to element.
<!-- Attribute id gives the ID of element A -->
<!ATTLIST A id ID #REQUIRED>
<!-- Attribute ref points to the ID of another element -->
<!ATTLIST A ref IDREF #IMPLIED>
ENTITY, ENTITIES: These attributes point to external data in the form of unparsed entities. For complete details, see the XML specification.
<!-- Attribute a points to a single unparsed entity -->
<!ATTLIST A a ENTITY #IMPLIED>
<!-- Attribute b points to multiple unparsed entities -->
<!ATTLIST A b ENTITIES #IMPLIED>
NMTOKEN, NMTOKENS: These attributes have single/multiple tokens as values.
<!ATTLIST A a NMTOKEN #IMPLIED>
<!ATTLIST A b NMTOKENS #IMPLIED>
3.1.3.3 Comments
1) DTDs can contain comments. Comments are delimited by <!-- and -->. For example:
<!-- This is a comment in an XML file. -->

3.2 Section A: Examining XML Files
In this section we examine the XML and DTD files to make sure that the data we are working with is fully understood; this data will then be transformed into a relational schema and loaded into the AuctionBase system. One of the most important aspects of the data is that it represents a single point in time: 1st January 2005, one second after midnight. It contains items whose auctions have already closed as well as items that are still open for bidding.
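One practical way to examine XML files against their DTD is to run them through a validating parser. The sketch below uses the DOM parser that ships with the JDK (javax.xml.parsers) on the Book example from section 3.1.3, with the DTD in the internal <!DOCTYPE> subset; the class DtdCheck and its validate method are hypothetical names. Note that the chapter ids are c1-style values, since "1" is not a legal ID value. The same call should work on any file that declares its DTD in a <!DOCTYPE>, but we make no assumption here about how the provided items-*.xml files reference items.dtd.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Sketch: validate an XML string against the DTD in its <!DOCTYPE> using the
// validating DOM parser bundled with the JDK.
final class DtdCheck {

    static final String DTD =
          "<!DOCTYPE Book [\n"
        + "<!ELEMENT Book (Title, Chapter+)>\n"
        + "<!ATTLIST Book Author CDATA #REQUIRED>\n"
        + "<!ELEMENT Title (#PCDATA)>\n"
        + "<!ELEMENT Chapter (#PCDATA)>\n"
        + "<!ATTLIST Chapter id ID #REQUIRED>\n"
        + "]>\n";

    // Returns null if the document is valid, otherwise the first error message.
    static String validate(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity errors are reported here; rethrow so parse() stops
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e); // configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        String good = "<?xml version=\"1.0\"?>\n" + DTD
            + "<Book Author=\"Anonymous\"><Title>Sample Book</Title>"
            + "<Chapter id=\"c1\">This is chapter 1.</Chapter></Book>";
        System.out.println("valid: " + (validate(good) == null));
        // Dropping the #REQUIRED Author attribute makes the document invalid
        System.out.println(validate(good.replace(" Author=\"Anonymous\"", "")));
    }
}
```

Turning error() into an exception matters: by default a validating parser only reports validity errors and keeps going, so without it an invalid document would still parse "successfully".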
3.2.1 Sample eBay Data
This auction data was gathered from real auctions on ebay.com, crawled by Yuan Wang at the University of Wisconsin. It is a somewhat-random, somewhat-edited sample of the data. The Buy_Price and Bids elements were synthetically generated, but all other data is real. The data is available at http://www-2.cs.cmu.edu/~olston/streamseminar/project.html; follow the links near the bottom of that page to the DTD and the item files, which come in a 10 MB zip file.
The data is organized into files items-*.xml, where each items-*.xml file contains information about 500 auctions. Each auction corresponds to one Item element, which gives complete information about the auction. The files conform to the DTD given in the file data-xml/items.dtd, reproduced here:
    <!ELEMENT Items (Item*)>
    <!ELEMENT Item (Name, Category+, Currently, Buy_Price?, First_Bid,
                    Quantity?, Number_of_Bids, Bids, Location, Country,
                    Started, Ends, Seller, Description)>
    <!ATTLIST Item ItemID CDATA #REQUIRED>
    <!ELEMENT Name (#PCDATA)>
    <!ELEMENT Category (#PCDATA)>
    <!ELEMENT Currently (#PCDATA)>
    <!ELEMENT Buy_Price (#PCDATA)>
    <!ELEMENT First_Bid (#PCDATA)>
    <!ELEMENT Quantity (#PCDATA)>
    <!ELEMENT Number_of_Bids (#PCDATA)>
    <!ELEMENT Bids (Bid*)>
    <!ELEMENT Bid (Bidder, Time, Amount, Quantity?)>
    <!ATTLIST Bidder UserID CDATA #REQUIRED
                     Rating CDATA #REQUIRED>
    <!ELEMENT Bidder (Location?, Country?)>
    <!ELEMENT Time (#PCDATA)>
    <!ELEMENT Amount (#PCDATA)>
    <!ELEMENT Location (#PCDATA)>
    <!ELEMENT Country (#PCDATA)>
    <!ELEMENT Started (#PCDATA)>
    <!ELEMENT Ends (#PCDATA)>
    <!ELEMENT Seller EMPTY>
    <!ATTLIST Seller UserID CDATA #REQUIRED
                     Rating CDATA #REQUIRED>
    <!ELEMENT Description (#PCDATA)>
The meaning of each element and attribute is explained below:
ItemID (attribute): An identifier unique across all items.
Name: A short item description used as the auction's title.
Category: A category to which the item belongs. An item may belong to multiple categories.
Currently: The current highest bid. This amount is always equal to the amount of the highest bid, or First_Bid if there are no bids.
Buy_Price: This element was synthetically generated.
First_Bid: The minimum qualifying first-bid amount, as determined by the seller before the auction starts. It does not mean there is a bid at all.
Quantity: The number of copies of the item up for sale. Usually this number is 1, although some auctions are for multiple copies. In such auctions, each bidder may bid on more than 1 copy, and there may be multiple winners. (Note that auction winners are not encoded in our data. It is up to you to determine winners.) Assumed 1 if missing.
Number_of_Bids: Number of Bids/Bid elements, each corresponding to a bid.
Bids: This element was synthetically generated.
Bids/Bid/Bidder: Attribute UserID uniquely identifies a user. Attribute Rating is the user's rating. Note that a user may be a bidder in one auction and a seller in another. However, the user's Rating, Location, and Country information are the same wherever the user appears in our data (which reflects a snapshot in time). Note that this implies that UserIDs with missing location or country information cannot be sellers in another auction.
Bids/Bid/Time: The time the bid was placed. Note that bids must be placed after the auction starts and before it ends. A user may bid on an item multiple times, but not at the same time.
Bids/Bid/Amount: Bid amount. If the bid quantity is greater than 1, this is the price per copy.
Bids/Bid/Quantity: The number of copies bid on. Must be less than or equal to the number of copies up for auction. Assumed 1 if missing.
Location: The seller's location information (e.g., city, state). See the comment under Bids/Bid/Bidder.
Country: The seller's country. See the comment under Bids/Bid/Bidder.
Started: Auction start time.
Ends: Auction end time. If this is in the past with respect to the current system time, the auction is closed. If it is in the future, the auction is still open.
Seller: Attributes give the seller's UserID and rating.
Description: The item's full description.
All money values are in the form $x,xxx.xx and are in US dollars. All times are in 24-hour format; see the actual data for the exact time format. The auctions in the data set range in time from November to December of 2001. Both open and closed auctions are included, and it is up to you to determine which auctions are still open based on the current system time, taken to be Dec. 20, 2001 00:00:01. Times in the data are consistent with the current system time, so all bid times and auction start times are earlier.
The following excerpt from eBay1.xml, one of the XML files we will be working on, shows a single item:
    <Items>
      <Item ItemID="1043374545">
        <Name>christopher radko | fritz n_ frosty sledding</Name>
        <Category>Collectibles</Category>
        <Category>Decorative &amp; Holiday</Category>
        <Category>Decorative by Brand</Category>
        <Category>Christopher Radko</Category>
        <Currently>$30.00</Currently>
        <First_Bid>$30.00</First_Bid>
        <Number_of_Bids>0</Number_of_Bids>
        <Bids />
        <Location>its a dry heat</Location>
        <Country>USA</Country>
        <Started>Dec-03-01 18:10:40</Started>
        <Ends>Dec-13-01 18:10:40</Ends>
        <Seller UserID="rulabula" Rating="1035" />
        <Description>brand new beautiful handmade european blown glass ornament from christopher radko. this particular ornament features a snowman paired with a little girl bundled up in here pale blue coat sledding along on a silver and blue sled filled with packages. the ornament is approximately 5_ tall and 4_ wide. brand new and never displayed, it is in its clear plastic packaging and comes in the signature black radko gift box. PLEASE READ CAREFULLY!!!! payment by cashier's check, money order, or personal check. personal checks must clear before shipping. the hold period will be a minimum of 14 days. I ship with UPS and the buyer is responsible for shipping charges.
the shipping rate is dependent on both the weight of the package and the distance that package will travel. the minimum shipping/handling charge is $6 and will increase with distance and weight. shipment will occur within 2 to 5 days after the deposit of funds. a $2 surcharge will apply for all USPS shipments if you cannot have or do not want ups service. If you are in need of rush shipping, please let me know and I_will furnish quotes on availability. the BUY-IT-NOW price includes free domestic shipping (international winners and residents of alaska and hawaii receive a credit of like value applied towards their total) and, as an added convenience, you can pay with paypal if you utilize the feature. paypal is not accepted if you win the auction during the course of the regular bidding-I only accept paypal if the buy it now feature is utilized. thank you for your understanding and good luck! Free Honesty Counters powered by Andale! Payment Details See item description and Payment Instructions, or contact seller for more information. 
Payment Instructions See item description or contact seller for more information.</Description>
      </Item>
    </Items>
The DTD (Document Type Definition) of the sample XML file, eBay.dtd, is identical to items.dtd as reproduced in Section 3.2.1.
3.3 Section B: Designing Relational Schema
1. Designing a relational schema, with all the keys (the key of each relation is given after it):
    auctions(item_id, name, seller, current_price, buy_price, min_price, starts, ends, description)    key: item_id
    bids(item_id, bidder, time, amount)    key: (item_id, bidder, time)
    itemInCategory(item_id, cat_id)    key: (item_id, cat_id)
    categories(cat_id, name)    key: cat_id
    users(user_id, location, country, rating, last_rated)    key: user_id
2. Listing all nontrivial functional dependencies that hold on each relation, excluding those that effectively specify keys.
    In auctions: item_id → name, seller, current_price, buy_price, min_price, starts, ends, description
    In bids: item_id, bidder, time → amount
    In categories: cat_id → name
    In users: user_id → location, country, rating, last_rated
3.4 Section C: Creating Tables in MS SQL
After completing the relational schema, it's time to create our tables in MS SQL Query Analyzer by issuing the following commands.
    CREATE TABLE auctions(
        item_id       int PRIMARY KEY,
        name          varchar(255) NOT NULL,
        seller        varchar(50) NOT NULL,
        current_price money NOT NULL,
        buy_price     money,
        min_price     money NOT NULL,
        starts        datetime NOT NULL,
        ends          datetime NOT NULL,
        description   varchar(4000)
    );

    CREATE TABLE bids(
        item_id int,
        bidder  varchar(50),
        time    datetime,
        amount  money NOT NULL,
        PRIMARY KEY(item_id, bidder, time)
    );

    CREATE TABLE itemInCategory(
        item_id int,
        cat_id  int,
        PRIMARY KEY(item_id, cat_id)
    );

    CREATE TABLE categories(
        cat_id int PRIMARY KEY,
        name   varchar(50)
    );

    CREATE TABLE users(
        user_id    varchar(50) PRIMARY KEY,
        location   varchar(120),
        country    varchar(50),
        rating     int,
        last_rated datetime
    );
3.5 Section D: Writing a data transformation program [10]
We will use the SAX XML parser that comes with the Java platform and write a parser in Java that takes the XML data and converts it into a format the MSSQL loader can import. The program should either eliminate duplicate entries itself or rely on MSSQL reporting errors for the duplicates while continuing the load.
In this exercise, you'll echo SAX parser events to System.out. Consider it the "Hello World" version of an XML-processing program. It shows you how to use the SAX parser to get at the data, and then echoes it to show you what you've got.
3.5.1 Creating the Skeleton
Start by creating a file named Echo.java and enter the skeleton for the application:
    public class Echo {
        public static void main(String argv[]) {
        }
    }
Since we're going to run it standalone, we need a main method. And we need command-line arguments so we can tell the application which file to echo.
3.5.2 Importing Classes
Next, add the import statements for the classes the application will use:
    import java.io.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;
    import javax.xml.parsers.SAXParserFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;

    public class Echo {
        ...
    }
The classes in java.io, of course, are needed to do output. The org.xml.sax package defines all the interfaces we use for the SAX parser.
The SAXParserFactory class creates the parser instance we use. Its newSAXParser() method throws a ParserConfigurationException if it is unable to produce a parser that matches the specified configuration of options. (You'll see more about the configuration options later.) The SAXParser is what the factory returns for parsing, and the DefaultHandler defines the class that will handle the SAX events that the parser generates.
3.5.3 Setting up for I/O
The first order of business is to process the command-line argument, get the name of the file to echo, and set up the output stream. Add the code below to take care of those tasks and do a bit of additional housekeeping:
    public static void main(String argv[])
    {
        if (argv.length != 1) {
            System.err.println("Usage: cmd filename");
            System.exit(1);
        }
        try {
            // Set up output stream
            out = new OutputStreamWriter(System.out, "UTF8");
        } catch (Throwable t) {
            t.printStackTrace();
        }
        System.exit(0);
    }

    static private Writer out;
When we create the output stream writer, we are selecting the UTF-8 character encoding. We could also have chosen US-ASCII or UTF-16, which the Java platform also supports.
3.5.4 Implementing the ContentHandler Interface
The most important interface for our current purposes is the ContentHandler interface. That interface requires a number of methods that the SAX parser invokes in response to different parsing events. The major event-handling methods are startDocument, endDocument, startElement, endElement, and characters.
The easiest way to implement that interface is to extend the DefaultHandler class, defined in the org.xml.sax.helpers package. That class provides do-nothing methods for all of the ContentHandler events. Change the class declaration as shown below to extend that class:
    public class Echo extends DefaultHandler
    {
        ...
    }
Note: DefaultHandler also defines do-nothing methods for the other major events, defined in the DTDHandler, EntityResolver, and ErrorHandler interfaces. You'll learn more about those methods as we go along.
Each of these methods is declared by the interface to throw a SAXException. An exception thrown here is sent back to the parser, which sends it on to the code that invoked the parser. In the current program, that means it winds up back at the Throwable exception handler at the bottom of the main method.
When a start tag or end tag is encountered, the name of the tag is passed as a String to the startElement or endElement method, as appropriate. When a start tag is encountered, any attributes it defines are also passed in an Attributes list. Characters found within the element are passed as an array of characters, along with the number of characters (length) and an offset into the array that points to the first character.
3.5.5 Setting up the Parser
Now (at last) you're ready to set up the parser. Add the code below to set it up and get it started:
    public static void main(String argv[])
    {
        if (argv.length != 1) {
            System.err.println("Usage: cmd filename");
            System.exit(1);
        }

        // Use an instance of ourselves as the SAX event handler
        DefaultHandler handler = new Echo();
        // Use the default (non-validating) parser
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            // Set up output stream
            out = new OutputStreamWriter(System.out, "UTF8");

            // Parse the input
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse( new File(argv[0]), handler );

        } catch (Throwable t) {
            t.printStackTrace();
        }
        System.exit(0);
    }
With these lines of code, you created a SAXParserFactory instance, as determined by the setting of the javax.xml.parsers.SAXParserFactory system property. You then got a parser from the factory and gave the parser an instance of this class to handle the parsing events, telling it which input file to process.
Note: The javax.xml.parsers.SAXParser class is a wrapper that defines a number of convenience methods. It wraps the (somewhat less friendly) org.xml.sax.Parser object.
If needed, you can obtain that parser using the SAXParser's getParser() method.
3.5.6 Writing the Output
The ContentHandler methods throw SAXExceptions but not IOExceptions, which can occur while writing. The SAXException can wrap another exception, though, so it makes sense to do the output in a method that takes care of the exception-handling details. Add the code below to define an emit method that does that:
    static private Writer out;

    private void emit(String s)
    throws SAXException
    {
        try {
            out.write(s);
            out.flush();
        } catch (IOException e) {
            throw new SAXException("I/O error", e);
        }
    }
    ...
When emit is called, any I/O error is wrapped in a SAXException along with a message that identifies it. That exception is then thrown back to the SAX parser. You'll learn more about SAX exceptions later on. For now, keep in mind that emit is a small method that handles the string output. (You'll see it called a lot in the code ahead.)
3.5.7 Spacing the Output
Here is another bit of infrastructure we need before doing some real processing. Add the code below to define a nl() method that writes the line-ending character(s) used by the current system:
    private void emit(String s)
    ...
    }

    private void nl()
    throws SAXException
    {
        String lineEnd = System.getProperty("line.separator");
        try {
            out.write(lineEnd);
        } catch (IOException e) {
            throw new SAXException("I/O error", e);
        }
    }
Note: Although it seems like a bit of a nuisance, you will be invoking nl() many times in the code ahead. Defining it now will simplify the code later on. It also provides a place to indent the output when we get to that section of the tutorial.
3.5.8 Handling Content Events
Finally, let's write some code that actually processes the ContentHandler events.
3.5.8.1 Document Events
Add the code below to handle the start-document and end-document events:
    static private Writer out;

    public void startDocument()
    throws SAXException
    {
        emit("<?xml version='1.0' encoding='UTF-8'?>");
        nl();
    }

    public void endDocument()
    throws SAXException
    {
        try {
            nl();
            out.flush();
        } catch (IOException e) {
            throw new SAXException("I/O error", e);
        }
    }

    private void emit(String s)
    ...
Here, you are echoing an XML declaration when the parser encounters the start of the document. Since you set up the OutputStreamWriter using the UTF-8 encoding, you include that specification as part of the declaration.
3.5.8.2 Element Events
Now for the interesting stuff. Add the code below to process the start-element and end-element events:
    public void startElement(String namespaceURI,
                             String sName, // simple name
                             String qName, // qualified name
                             Attributes attrs)
    throws SAXException
    {
        String eName = sName; // element name
        if ("".equals(eName)) eName = qName; // not namespaceAware
        emit("<"+eName);
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                String aName = attrs.getLocalName(i); // Attr name
                if ("".equals(aName)) aName = attrs.getQName(i);
                emit(" ");
                emit(aName+"=\""+attrs.getValue(i)+"\"");
            }
        }
        emit(">");
    }

    public void endElement(String namespaceURI,
                           String sName, // simple name
                           String qName  // qualified name
                          )
    throws SAXException
    {
        String eName = sName; // element name
        if ("".equals(eName)) eName = qName; // not namespaceAware
        emit("</"+eName+">");
    }

    private void emit(String s)
    ...
With this code, you echoed the element tags, including any attributes defined in the start tag. Note that when the startElement() method is invoked, the simple name ("local name") for elements and attributes could turn out to be the empty string, if namespace processing was not enabled. The code handles that case by using the qualified name whenever the simple name is the empty string.
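The empty-local-name case can be observed directly. The following sketch, whose class NameCheck and method rootNames are hypothetical names written only for this illustration, parses the same prefixed element twice, once with the factory's default settings and once after calling setNamespaceAware(true), and records the localName and qName that startElement() receives. Per the SAX documentation the local name is the empty string when namespace processing is off, which is exactly why the Echo code falls back to qName.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: what startElement() receives with and without namespace processing.
class NameCheck {
    // Returns {localName, qName} as reported for the root element of xml.
    static String[] rootNames(String xml, boolean namespaceAware) {
        final String[] names = new String[2];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (names[1] == null) { // record only the first (root) element
                    names[0] = localName;
                    names[1] = qName;
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware); // false is the factory default
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<ab:Item xmlns:ab=\"urn:example:auctionbase\"/>";
        String[] plain = rootNames(xml, false);
        String[] aware = rootNames(xml, true);
        System.out.println("default:         local='" + plain[0] + "' qName='" + plain[1] + "'");
        System.out.println("namespace-aware: local='" + aware[0] + "' qName='" + aware[1] + "'");
    }
}
```

The namespace URI urn:example:auctionbase is invented for the demo; the AuctionBase files themselves use no namespaces, so with the default parser qName is simply the element name.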
3.5.8.3 Character Events
To finish handling the content events, you need to handle the characters that the parser delivers to your application.
Parsers are not required to return any particular number of characters at one time. A parser can return anything from a single character at a time up to several thousand, and still be a standard-conforming implementation. So, if your application needs to process the characters it sees, it is wise to accumulate the characters in a buffer, and operate on them only when you are sure they have all been found.
Add the line below to define the text buffer:
    public class Echo extends DefaultHandler
    {
        StringBuffer textBuffer;

        public static void main(String argv[])
        {
        ...
Then add the code below to accumulate the characters the parser delivers in the buffer:
    public void endElement(...)
    throws SAXException
    {
        ...
    }

    public void characters(char buf[], int offset, int len)
    throws SAXException
    {
        String s = new String(buf, offset, len);
        if (textBuffer == null) {
            textBuffer = new StringBuffer(s);
        } else {
            textBuffer.append(s);
        }
    }

    private void emit(String s)
    ...
Next, add the method below to send the contents of the buffer to the output stream.
    public void characters(char buf[], int offset, int len)
    throws SAXException
    {
        ...
    }

    private void echoText()
    throws SAXException
    {
        if (textBuffer == null) return;
        String s = ""+textBuffer;
        emit(s);
        textBuffer = null;
    }

    private void emit(String s)
    ...
When this method is called twice in a row (which will happen at times, as we'll see next), the buffer will be null. So in that case, the method simply returns. When the buffer is non-null, however, its contents are sent to the output stream.
Finally, add the code below to echo the contents of the buffer whenever an element starts or ends:
    public void startElement(...)
    throws SAXException
    {
        echoText();
        String eName = sName; // element name
        ...
    }

    public void endElement(...)
    throws SAXException
    {
        echoText();
        String eName = sName; // element name
        ...
    }
You're done accumulating text when an element ends, of course. So you echo it at that point, which clears the buffer before the next element starts. But you also want to echo the accumulated text when an element starts! That's necessary for document-style data, which can contain XML elements that are intermixed with text. For example, in this document fragment:
    <para>This paragraph contains <bold>important</bold> ideas.</para>
The initial text, "This paragraph contains", is terminated by the start of the <bold> element. The text "important" is terminated by the end tag, </bold>, and the final text, "ideas.", is terminated by the end tag, </para>.
Note: Most of the time, though, the accumulated text will be echoed when an endElement() event occurs. When a startElement() event occurs after that, the buffer will be empty. The first line in the echoText() method checks for that case, and simply returns.
3.5.9 Compiling and Running the Program
In the Java SDK, the JAXP libraries are distributed in the directory <JAVA_HOME>/common/lib. To compile the program you created, you'll first need to install the JAXP JAR files in the appropriate location. (The names of the JAR files depend on which version of JAXP you are using, and their location depends on which version of the Java platform you are using. See the Java XML release notes at <JAVA_HOME>/docs/jaxp/ReleaseNotes.html for the latest details.)
Note: Since JAXP 1.1 is built into version 1.4 of the Java 2 platform, you can also execute the majority of the JAXP tutorial (SAX, DOM, and XSLT) sections without doing any special installation of the JAR files. However, to make use of the features added in JAXP 1.2 (XML Schema and the XSLTC compiling translator), you will need to install JAXP 1.2, as described in the release notes.
For versions 1.2 and 1.3 of the Java 2 platform, you can execute the following commands to compile and run the program:
    javac -classpath jaxp-jar-files Echo.java
    java -cp jaxp-jar-files Echo slideSample.xml
Alternatively, you could place the JAR files in the platform extensions directory and use the simpler commands:
    javac Echo.java
    java Echo slideSample.xml
For version 1.4 of the Java 2 platform, you must identify the JAR files as newer versions of the "endorsed standards" that are built into the Java 2 platform. To do that, put the JAR files in the endorsed standards directory, jre/lib/endorsed. (You copy all of the JAR files except jaxp-api.jar. You ignore that one because the JAXP APIs are already built into the 1.4 platform.) You can then compile and run the program with these commands:
    javac Echo.java
    java Echo slideSample.xml
Note: You could also elect to set the java.endorsed.dirs system property on the command line so that it points to a directory containing the necessary JAR files, using a command-line option like this: -D"java.endorsed.dirs=somePath".
slideSample01.xml
    <?xml version='1.0' encoding='utf-8'?>

    <!-- A SAMPLE set of slides -->

    <slideshow
        title="Sample Slide Show"
        date="Date of publication"
        author="Yours Truly"
        >

        <!-- TITLE SLIDE -->
        <slide type="all">
          <title>Wake up to WonderWidgets!</title>
        </slide>

        <!-- OVERVIEW -->
        <slide type="all">
            <title>Overview</title>
            <item>Why <em>WonderWidgets</em> are great</item>
            <item/>
            <item>Who <em>buys</em> WonderWidgets</item>
        </slide>

    </slideshow>
Echo01.java
    /*
     * @(#)Echo01.java 1.5 99/02/09
     *
     * Copyright 2002 Sun Microsystems, Inc. All Rights Reserved.
     */

    import java.io.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;

    import javax.xml.parsers.SAXParserFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;

    public class Echo01 extends DefaultHandler
    {
        StringBuffer textBuffer;

        public static void main(String argv[])
        {
            if (argv.length != 1) {
                System.err.println("Usage: cmd filename");
                System.exit(1);
            }

            // Use an instance of ourselves as the SAX event handler
            DefaultHandler handler = new Echo01();
            // Use the default (non-validating) parser
            SAXParserFactory factory = SAXParserFactory.newInstance();
            try {
                // Set up output stream
                out = new OutputStreamWriter(System.out, "UTF8");

                // Parse the input
                SAXParser saxParser = factory.newSAXParser();
                saxParser.parse( new File(argv[0]), handler);

            } catch (Throwable t) {
                t.printStackTrace();
            }
            System.exit(0);
        }

        static private Writer out;

        //===========================================================
        // SAX DocumentHandler methods
        //===========================================================

        public void startDocument()
        throws SAXException
        {
            emit("<?xml version='1.0' encoding='UTF-8'?>");
            nl();
        }

        public void endDocument()
        throws SAXException
        {
            try {
                nl();
                out.flush();
            } catch (IOException e) {
                throw new SAXException("I/O error", e);
            }
        }

        public void startElement(String namespaceURI,
                                 String sName, // simple name
                                 String qName, // qualified name
                                 Attributes attrs)
        throws SAXException
        {
            echoText();
            String eName = sName; // element name
            if ("".equals(eName)) eName = qName; // not namespaceAware
            emit("<"+eName);
            if (attrs != null) {
                for (int i = 0; i < attrs.getLength(); i++) {
                    String aName = attrs.getLocalName(i); // Attr name
                    if ("".equals(aName)) aName = attrs.getQName(i);
                    emit(" ");
                    emit(aName+"=\""+attrs.getValue(i)+"\"");
                }
            }
            emit(">");
        }

        public void endElement(String namespaceURI,
                               String sName, // simple name
                               String qName  // qualified name
                              )
        throws SAXException
        {
            echoText();
            String eName = sName; // element name
            if ("".equals(eName)) eName = qName; // not namespaceAware
            emit("</"+eName+">");
        }

        public void characters(char buf[], int offset, int len)
        throws SAXException
        {
            String s = new String(buf, offset, len);
            if (textBuffer == null) {
                textBuffer = new StringBuffer(s);
            } else {
                textBuffer.append(s);
            }
        }

        //===========================================================
        // Utility Methods ...
        //===========================================================

        // Display text accumulated in the character buffer
        private void echoText()
        throws SAXException
        {
            if (textBuffer == null) return;
            String s = ""+textBuffer;
            emit(s);
            textBuffer = null;
        }

        // Wrap I/O exceptions in SAX exceptions, to
        // suit handler signature requirements
        private void emit(String s)
        throws SAXException
        {
            try {
                out.write(s);
                out.flush();
            } catch (IOException e) {
                throw new SAXException("I/O error", e);
            }
        }

        // Start a new line
        private void nl()
        throws SAXException
        {
            String lineEnd = System.getProperty("line.separator");
            try {
                out.write(lineEnd);
            } catch (IOException e) {
                throw new SAXException("I/O error", e);
            }
        }
    }
3.6 Section E: Load the data into MSSQL [13]
Now we are ready to load the data prepared in the previous section into the tables created according to the schema. All tables should have at least a primary key for efficiency. Two SQL Server data types deserve special mention: datetime for date and time values, and money for attributes representing money.
The parser we have written in the previous sections produces text files whose columns are separated by the |*| delimiter, which can be imported directly with the MS SQL Import facility. Our parser produces .dat files that look like the example below:
Auctions.dat
    1045769659|*|SPRINGERLE COOKIE BOARD ** NO RESERVE**|*|dosouth|*|14.50|*||*|14.50|*|Dec-08-01 16:23:53|*|Dec-15-01 16:23:53|*|Wood Springerle cookie borad depicting a FISH, flowers & birds. It will imprint 8 designs in all.
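Rows like the Auctions.dat example have to be produced by the transformation program of Section 3.5. The sketch below shows one possible way to do it with the same SAX callbacks used in Echo; it is not the program the author actually used. A hypothetical AuctionRows handler collects the text of a few leaf elements of each Item, strips the $ sign and thousands separators from money values, and keeps each seller and bidder only once in a map keyed by UserID, which is one way to eliminate the duplicate user entries mentioned in Section 3.5. Two choices are assumptions: times are rewritten as yyyy-MM-dd HH:mm:ss (the sample row keeps eBay's Dec-08-01 16:23:53 form, and which formats the import accepts can depend on the server's date settings), and line breaks inside descriptions are replaced by spaces because a line break ends a row.

```java
import java.io.StringReader;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: turns Item elements into |*|-delimited rows for the auctions table.
class AuctionRows extends DefaultHandler {
    static final String SEP = "|*|";
    static final DateTimeFormatter EBAY = DateTimeFormatter.ofPattern("MMM-dd-yy HH:mm:ss", Locale.US);
    static final DateTimeFormatter SQL = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    // Leaf elements of Item whose text is kept; Category, Bids, etc. would feed other files.
    static final List<String> FIELDS = Arrays.asList(
            "Name", "Currently", "Buy_Price", "First_Bid", "Started", "Ends", "Description");

    final List<String> auctionRows = new ArrayList<>();
    final Map<String, String> users = new LinkedHashMap<>(); // UserID -> Rating, one entry per user
    private final Map<String, String> item = new HashMap<>();
    private final StringBuilder text = new StringBuilder();

    // "$1,014.50" -> "1014.50"; a missing value (e.g. no Buy_Price) becomes ""
    static String money(String s) {
        return s == null ? "" : s.replace("$", "").replace(",", "").trim();
    }

    // "Dec-08-01 16:23:53" -> "2001-12-08 16:23:53"
    static String time(String s) {
        return LocalDateTime.parse(s.trim(), EBAY).format(SQL);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        text.setLength(0); // start collecting this element's text afresh
        if (qName.equals("Item")) {
            item.clear();
            item.put("ItemID", atts.getValue("ItemID"));
        } else if (qName.equals("Seller") || qName.equals("Bidder")) {
            // putIfAbsent drops duplicates: a user's rating is the same everywhere
            users.putIfAbsent(atts.getValue("UserID"), atts.getValue("Rating"));
            if (qName.equals("Seller")) item.put("Seller", atts.getValue("UserID"));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (qName.equals("Item")) {
            auctionRows.add(String.join(SEP, item.get("ItemID"), item.get("Name"), item.get("Seller"),
                    money(item.get("Currently")), money(item.get("Buy_Price")), money(item.get("First_Bid")),
                    time(item.get("Started")), time(item.get("Ends")),
                    // rows end at a line break, so line breaks inside the text are replaced
                    item.getOrDefault("Description", "").replaceAll("[\r\n]+", " ")));
        } else if (FIELDS.contains(qName)) {
            item.put(qName, text.toString().trim());
        }
    }

    static AuctionRows parse(String xml) {
        AuctionRows handler = new AuctionRows();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler;
    }
}
```

After parsing, auctionRows holds one line per Item in the column order of the auctions table, and users holds each distinct UserID with its rating (location and country are left out to keep the sketch short). For a real items-*.xml file it is probably better to call the parser's parse(File, handler) method, as Echo does, so that a DTD referenced by the file can be located relative to it.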
In the .dat files, |*| is the column delimiter and a line break ends each row, so that MS SQL can map the columns of the text file to the actual columns of the table in the database.
To import Auctions.dat into the database, open the MSSQL Enterprise Manager, select the auctions table, right-click on it, and click Tasks > Import. When asked for the data source, choose Text File from the bottom of the drop-down menu. Then browse to and select the Auctions.dat file; you should then see the window below.
Figure 3.6.1
We did not enclose text in quotation marks, because doing so is unsafe (the text itself may contain quotation marks), so no text qualifier is used. The row delimiter is a carriage return followed by a line feed (CR LF), which is standard on Windows systems; Unix-like systems use only a line feed (LF). Click Next to continue, and the wizard should parse the columns. In the next screen, select |*| as the delimiter, and the parsed data files will be imported into the relational database.
4 AuctionBase Schema and Data
4.1 Section A: Indexes
An important technique for improving the performance of queries is to create indexes. An index on an attribute A of relation R allows the DBMS to quickly find all tuples in R matching a given value or range of values for attribute A (useful when evaluating selection or join conditions involving attribute A). An index can be created on any attribute of any relation, or on several attributes combined.
Create at least one useful index on each table in your large AuctionBase schema. Run several queries over your large AuctionBase database with and without the indexes. Try to write queries that are realistic, that are complex enough to take a while to execute, and that can exploit the indexes you chose, so that you can best observe the performance differences. Turn in a transcript showing your commands to create the indexes and the relative times of query execution with and without the indexes.
4.2 Section B: Views
4.2.1 What is a View?
[14] A view can be thought of as either a virtual table or a stored query. The data accessible through a view is not stored in the database as a distinct object. What is stored in the database is a SELECT statement. The result set of the SELECT statement forms the virtual table returned by the view. A user can use this virtual table by referencing the view name in Transact-SQL statements the same way a table is referenced.
A view is used to do any or all of these functions:
Restrict a user to specific rows in a table. For example, allow an employee to see only the rows recording his or her work in a labor-tracking table.
Restrict a user to specific columns. For example, allow employees who do not work in payroll to see the name, office, work phone, and department columns in an employee table, but do not allow them to see any columns with salary information or personal information.
Join columns from multiple tables so that they look like a single table.
Aggregate information instead of supplying details. For example, present the sum of a column, or the maximum or minimum value from a column.
The syntax for a VIEW is:
    CREATE VIEW view_name AS
    SELECT columns
    FROM table
    WHERE predicates;
Whether INSERT, DELETE, and UPDATE statements can be performed on a view is an interesting question: views that meet certain criteria can be updated, and views that do not meet them cannot.
Views in all versions of SQL Server are updatable (can be the target of UPDATE, DELETE, or INSERT statements), as long as the modification affects only one of the base tables referenced by the view.
4.2.2 Views of AuctionBase [13]
Here are three simple views we can use on our database. In Query Analyzer, each CREATE VIEW must be the first statement in its batch, so the views are separated by GO:
    CREATE VIEW openItemSummary AS
    SELECT item_id, name, seller, current_price, starts,
           DATEDIFF(minute, curTime, ends) / 60.0 AS hours_left
    FROM auctions, time
    WHERE ends > curTime AND current_price < buy_price;
    GO

    CREATE VIEW num_bids AS
    SELECT b.item_id, COUNT(b.item_id) AS num_bids
    FROM auctions a, bids b
    WHERE a.item_id = b.item_id
    GROUP BY b.item_id;
    GO

    CREATE VIEW auction_winner AS
    SELECT *
    FROM bids b1
    WHERE amount >= ALL (SELECT b2.amount
                         FROM bids b2
                         WHERE b1.item_id = b2.item_id);
    GO
5 MSSQL Features
5.1 Section A: Current Time
The original auction data that we provided for you in XML, which you translated into relations and loaded into your AuctionBase database, represents a single point in time, specifically one second after midnight on January 17th, 2005 ("Jan-17-05 00:00:01"). In the final part of the project, outlined below, we will develop full auction functionality: users will be able to browse items, enter and retrieve bids, create new auctions, run statistics, etc. To fully test our functionality, and to simulate the true operation of an online auction system in which auctions close as time passes, we suggest that you maintain a fictitious "current time" in your database.
Add a new one-attribute table to your AuctionBase schema. This table should at all times contain a single row (i.e., a single value) representing the "current time," which can be updated to represent time passing. (It's up to you whether you also want to permit backward time travel.) Initialize the table by inserting the current time for the initial state of your database: Jan-17-05 00:00:01.
5.2 Section B: Constraints and Triggers [15] [14]
5.2.1 What is a CONSTRAINT?
A constraint is a property assigned to a column or a set of columns in a table that prevents certain types of inconsistent data values from being placed in the column(s). Constraints are used to enforce data integrity, which ensures the accuracy and reliability of the data in the database. The following categories of data integrity exist:

- Entity integrity ensures that there are no duplicate rows in a table.
- Domain integrity enforces valid entries for a given column by restricting the type, the format, or the range of possible values.
- Referential integrity keeps corresponding data values between tables consistent; for example, a row cannot be deleted while other records still refer to it.
- User-defined integrity enforces specific business rules that do not fall into the entity, domain, or referential integrity categories.

Each of these categories of data integrity can be enforced by the appropriate constraints. Microsoft SQL Server supports the following constraints: PRIMARY KEY, UNIQUE, FOREIGN KEY, CHECK, and NOT NULL.

A PRIMARY KEY constraint is a unique identifier for a row within a database table. Every table should have a primary key constraint to uniquely identify each row, and only one primary key constraint can be created for each table. Primary key constraints are used to enforce entity integrity.

A UNIQUE constraint enforces the uniqueness of the values in a set of columns, so no duplicate values are entered. Like primary key constraints, unique constraints are used to enforce entity integrity.

A FOREIGN KEY constraint prevents any actions that would destroy the links between tables with corresponding data values. A foreign key in one table points to a primary key in another table. Foreign keys prevent actions that would leave rows with foreign key values when there is no primary key with that value.
Foreign key constraints are used to enforce referential integrity.

A CHECK constraint is used to limit the values that can be placed in a column. Check constraints are used to enforce domain integrity.

A NOT NULL constraint enforces that the column will not accept null values. Like check constraints, not null constraints are used to enforce domain integrity.

You can create constraints when the table is created, as part of the table definition in the CREATE TABLE statement. Example:

    CREATE TABLE cust_sample (
        cust_id           int PRIMARY KEY,
        cust_name         char(50),
        cust_address      char(50),
        cust_credit_limit money,
        CONSTRAINT chk_id CHECK (cust_id BETWEEN 0 AND 10000)
    );

5.2.2 What is a TRIGGER? [15]

A trigger is a special type of stored procedure that automatically takes effect when the data in a specified table is modified. A trigger is invoked in response to an INSERT, UPDATE, or DELETE statement. A trigger can query other tables and can include complex Transact-SQL statements. The trigger and the statement that fires it are treated as a single transaction, which can be rolled back from within the trigger. If a severe error is detected (for example, insufficient disk space), the entire transaction automatically rolls back. Triggers are useful in these ways:

- Triggers can cascade changes through related tables in the database; however, these changes can be executed more efficiently using cascading referential integrity constraints.
- Triggers can enforce restrictions that are more complex than those defined with CHECK constraints. Unlike CHECK constraints, triggers can reference columns in other tables. For example, a trigger can use a SELECT from another table to compare against the inserted or updated data and to perform additional actions, such as modifying the data or displaying a user-defined error message.
- Triggers can also evaluate the state of a table before and after a data modification and take action(s) based on that difference.
- Multiple triggers of the same type (INSERT, UPDATE, or DELETE) on a table allow multiple, different actions to take place in response to the same modification statement.

5.2.2.1.1 Triggers Compared to Constraints [15]

Constraints and triggers each have benefits that make them useful in special situations. The primary benefit of triggers is that they can contain complex processing logic that uses Transact-SQL code. Therefore, triggers can support all of the functionality of constraints; however, triggers are not always the best method for a given feature.

Entity integrity should always be enforced at the lowest level by indexes that are part of PRIMARY KEY and UNIQUE constraints or are created independently of constraints. Domain integrity should be enforced through CHECK constraints, and referential integrity (RI) should be enforced through FOREIGN KEY constraints, assuming their features meet the functional needs of the application.

Triggers are most useful when the features supported by constraints cannot meet the functional needs of the application. For example:

- FOREIGN KEY constraints can validate a column value only with an exact match to a value in another column, unless the REFERENCES clause defines a cascading referential action.
- A CHECK constraint can validate a column value only against a logical expression or another column in the same table. If your application requires that a column value be validated against a column in another table, you must use a trigger.
- Constraints can communicate about errors only through standardized system error messages. If your application requires (or can benefit from) customized messages and more complex error handling, you must use a trigger.
- Triggers can cascade changes through related tables in the database; however, these changes can be executed more efficiently through cascading referential integrity constraints.
- Triggers can disallow or roll back changes that violate referential integrity, thereby canceling the attempted data modification. Such a trigger might go into effect when you change a foreign key and the new value does not match its primary key. For example, you can create an insert trigger on titleauthor.title_id that rolls back an insert if the new value does not match some value in titles.title_id. However, FOREIGN KEY constraints are usually used for this purpose.

If constraints exist on the trigger table, they are checked after the INSTEAD OF trigger executes but before the AFTER trigger executes. If the constraints are violated, the INSTEAD OF trigger actions are rolled back and the AFTER trigger is not executed.

5.2.3 CONSTRAINTs & TRIGGERs of AuctionBase DB [13]

If the data in your AuctionBase system at a given point in time represents a correct state of the real world, a number of constraints are expected to hold. Here are a few possible examples, some of which depend on a particular schema:

- In every auction, the number-of-bids field (if included) corresponds to the actual number of bids.
- In every auction and every bid, the quantity (if present) must be greater than 0.
- The item-id in every bid corresponds to an actual item.
- No auction may have a bid before its start time or after its end time.
- There are no bids after the current time.
- The quantity in a bid must not exceed the quantity available.
- A user may not bid on an item he or she is offering. (This one is a judgment call.)
- All sellers and bidders must exist as users. (Whether this one makes sense depends on your relational schema.)

5.2.3.1 CONSTRAINTs of AuctionBase DB

The following constraints enforce the referential integrity of foreign keys; that is, every foreign key value must match an existing key in the table it references.
    ALTER TABLE bids ADD CONSTRAINT itemIDRef
        FOREIGN KEY (item_id) REFERENCES auctions(item_id);
    ALTER TABLE bids ADD CONSTRAINT bidderIDRef
        FOREIGN KEY (bidder) REFERENCES users(user_id);
    ALTER TABLE auctions ADD CONSTRAINT sellerIDRef
        FOREIGN KEY (seller) REFERENCES users(user_id);
    -- constraint names must be unique, so this one cannot reuse itemIDRef
    ALTER TABLE itemInCategory ADD CONSTRAINT itemInCatItemIDRef
        FOREIGN KEY (item_id) REFERENCES auctions(item_id);
    ALTER TABLE itemInCategory ADD CONSTRAINT catIDRef
        FOREIGN KEY (cat_id) REFERENCES categories(cat_id);

5.2.3.2 TRIGGERs of AuctionBase DB

The bidsBetweenStartAndEnd trigger enforces that a bid's time always lies between the start and end of its auction. It checks every insert into and update of the bids table and rolls the statement back if the check fails. In SQL Server, a trigger fires once per statement, and the new versions of all affected rows are available in the inserted pseudo-table.

    CREATE TRIGGER bidsBetweenStartAndEnd
    ON bids
    AFTER INSERT, UPDATE
    AS
    -- inserted holds the new version of every row touched by the statement
    IF EXISTS (SELECT *
               FROM inserted i JOIN auctions a ON a.item_id = i.item_id
               WHERE i.time < a.starts OR i.time > a.ends)
    BEGIN
        RAISERROR('Bid time must be between the starting and ending time of the item being bid upon', 16, 1);
        ROLLBACK TRANSACTION;
    END
    GO

The cannotBidOnOwnItem trigger prevents the owner of an auction from bidding on his or her own item. The check must hold on every insert into and update of bids; otherwise the trigger raises an error and rolls the statement back.

    CREATE TRIGGER cannotBidOnOwnItem
    ON bids
    AFTER INSERT, UPDATE
    AS
    IF EXISTS (SELECT *
               FROM inserted i JOIN auctions a ON a.item_id = i.item_id
               WHERE i.bidder = a.seller)
    BEGIN
        RAISERROR('A user may not bid upon his or her own items', 16, 1);
        ROLLBACK TRANSACTION;
    END
    GO

The setCurrentPriceOnNewBid trigger sets the current price of an auction to the new bid amount after every insert into bids, that is, whenever a new bid is made.
    CREATE TRIGGER setCurrentPriceOnNewBid
    ON bids
    AFTER INSERT
    AS
    -- one INSERT may add several bids; use the highest new amount per item
    UPDATE a
    SET current_price = i.max_amount
    FROM auctions a
    JOIN (SELECT item_id, MAX(amount) AS max_amount
          FROM inserted
          GROUP BY item_id) i ON a.item_id = i.item_id;
    GO

6 AuctionBase Web Site

6.1 Functionality

The functionality of our AuctionBase system is quite flexible and open-ended. However, we want to implement some basic capabilities:

- Ability to manually change the "current time."
- Automatic auction closing. An auction is "open" after its start time and "closed" when its end time is past or its buy price is reached for its entire quantity. Your design may be such that an auction closes implicitly with high enough bids or a time update, or you may have chosen to represent open/closed status with an explicit data field.
- Ability for new auction users to provide their information to be entered into the database (name, initial rating if not assigned automatically, optional location and country), if relevant in your schema.
- Ability to browse auctions of interest based on a variety of input choices. Possible parameters include open/closed status, category, date, price, substring match in description, etc. Use your imagination.
- Ability to see the winner(s) of a closed auction.
- Ability for auction users to enter bids on open auctions.
- Ability for auction users to add new items up for auction.
- Ability to retrieve auction or bidding history for a given user, including current auctions or bids.
- Ability to run various statistics over the auctions. Possibilities include average number of bids per user, highest selling price over initial bid, average time to reach buy-price, etc. Use your imagination.

6.2 Web Interface [13]

Implementation of the web interface is left entirely to the student, but a clean and user-friendly interface is expected. Below are some template pages:

Figure 6.2.1
Figure 6.2.2
Figure 6.2.3

6.3 System testing

Debug your queries directly on MSSQL before hooking them into your Web interface.
JDBC is not particularly friendly when it comes to "runaway" queries, so you will benefit yourself and the rest of the class by using Query Analyzer first to ensure that your queries work properly and finish in a reasonable amount of time. Once you are certain your queries are working correctly, incorporate them into your Web interface. Even with prior debugging, it is prudent to set a timeout for all of your JDBC queries. Call setQueryTimeout(seconds) on each of your statement objects, for example:

    Statement stmt = conn.createStatement(); // conn is an open java.sql.Connection
    stmt.setQueryTimeout(180);               // give up on any query after 180 seconds
    ...

7 Conclusion

In this project, we have covered some practical issues of building a simple web application: its design, its technical background, the setup, installation, and fine-tuning of its development environment, and finally its implementation. There is infinite detail in even a small project like this, because there is usually no limit to optimizing performance, writing better code, and building a good, user-friendly interface and a secure system; it is a small-scale software engineering challenge. This manual is written as a help document for students doing a database project as part of an introductory database course. The References & Resources section provides a decent number of book references and World Wide Web links so that the student can find more detail in any area that needs clarification or deeper knowledge. I hope that this tutorial will be of use to many database students.

8 References & Resources

XML References & Resources

Books
- Inside XML, Steven Holzner, 2001, New Riders Publishing
- XML Schema: The W3C's Object-Oriented Descriptions for XML, Eric van der Vlist, 2002, O'Reilly
- XML in a Nutshell, 3rd Edition, Elliotte Rusty Harold, W. Scott Means, 2004, O'Reilly

World Wide Web
- W3C Application Domain: http://www.w3.org/XML/
- A Technical Introduction: http://www.xml.com/pub/a/98/10/guide0.html
- [12] XML from the Inside Out: http://www.xml.com
- Well-formedness Checker: http://www.cogsci.ed.ac.uk/~richard/xml-check.html
- W3C Markup Validator: http://validator.w3.org/
- Microsoft's XML Perspective: http://msdn.microsoft.com/xml/
- Oracle XML Technology Center: http://www.oracle.com/technology/tech/xml/index.html
- [1] SAX (Simple API for XML): http://www.saxproject.org/
- [11] Declaring Elements and Attributes in an XML DTD: http://www.rpbourret.com/xml/xmldtd.htm

Java References & Resources

Books
- Java Servlet & JSP Cookbook, Bruce W. Perry, 1st Edition, January 2004, O'Reilly
- Java Servlet Programming, 2nd Edition, Jason Hunter, O'Reilly
- Tomcat: The Definitive Guide, Jason Brittain, Ian F. Darwin, O'Reilly
- Java: How to Program, 5/E, Deitel & Deitel, 2004

World Wide Web
- [10] Sun Microsystems: http://java.sun.com
- Everything about Tomcat, Apache Jakarta Project: http://jakarta.apache.org/
- [2] Tomcat Configuration for Windows XP & 2000: http://www.coreservlets.com/Apache-Tomcat-Tutorial/
- [3] A Servlet and JSP Tutorial: http://www.apl.jhu.edu/~hall/java/Servlet-Tutorial/ServletTutorial-First-Servlets.html
- J2SE v1.4.2 API Specification: http://java.sun.com/j2se/1.4.2/docs/api/index.html
- Apache Tomcat 5.0 Documentation: http://jakarta.apache.org/tomcat/tomcat-5.0-doc/
- JDBC Microsoft SQL Driver for Windows XP/2000: http://www.microsoft.com/downloads/details.aspx?FamilyID=9f1874b6-f8e1-4bd6-947c-0fc5bf05bf71&displaylang=en
- [4] JDBC Driver for Microsoft SQL Server Installation How-to: http://www.akadia.com/services/sqlsrv_jdbc.html
- [5] A Complete JDBC Example: http://www.eas.asu.edu/~cse494db/IonJDBC/JDBCExample.html
- Sun's JDBC Course on Web: http://java.sun.com/developer/onlineTraining/Database/JDBCShortCourse/jdbc/jdbc.html
- [6] Java Developers Almanac, JSP & Servlet Examples: http://javaalmanac.com/egs/javax.servlet.jsp/pkg.html
- JSP & Servlet Tutorials: http://www.coreservlets.com

Database Design

Books
- An Introduction to Database Systems, C. J. Date, 7/E, 2000, Addison-Wesley

World Wide Web
- Database Modeling Using UML: http://www.sparxsystems.com.au/uml_topics/uml_datamodel/uml_datamodel.htm
- Data Warehousing, SQL for Nerds: http://philip.greenspun.com/sql/data-warehousing.html

SQL & Microsoft SQL Server References & Resources

Books
- SQL Bible, Alex Kriegel, Boris M. Trukhnov, 2002, Wiley
- SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in SQL, Michael J. Hernandez, John L. Viescas, 2000, Addison-Wesley
- SQL: The Complete Reference, Paul N. Weinberg, James R. Groff, 2002, McGraw-Hill
- Inside Microsoft SQL Server 2000, Kalen Delaney, 2000, Microsoft Press

World Wide Web
- SQL for Web Nerds: http://philip.greenspun.com/sql/introduction.html
- SQL for Web Nerds, Queries: http://philip.greenspun.com/sql/queries.html
- SQL for Web Nerds, Complex Queries: http://philip.greenspun.com/sql/complex-queries.html
- [7] Microsoft SQL Server Programming Guide: http://www.informit.com/guides/content.asp?g=sqlserver&seqNum=46
- [15] MS SQL Constraints: http://www.mssqlcity.com/Articles/General/using_constraints.htm
- [8] SQL for Web Nerds, Triggers & Constraints: http://philip.greenspun.com/sql/triggers.html
- [14] Microsoft SQL Home: http://www.microsoft.com/sql/

Sample Parser, DB Code and Help from Stanford Database Group
- [13] Josh Sandberg
- [9] Stanford CS 145 Page: http://www.stanford.edu/class/cs145/