Implementation of One Stop
Search by XSLT
By Dave Low
University of Hong Kong
9-Dec-2003
Agenda
• Flow of One Stop Search
• Reasons to use Extensible Stylesheet Language Transformations (XSLT)
• Difficulties in implementing One Stop Search by XSLT
• Our solution
• Our implementation
• Summary
Flow of One Stop Search
1. Capture the search keyword
2. Issue the search to different search engines
3. Collect the results, clicking the next button until we have all the records
4. Compile the search results from the different search engines
5. Present the results to the user
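Steps 2 to 4 above can be sketched in Java as follows. This is only an illustrative outline: the `SearchEngine` interface and the `OneStopSearch` class are hypothetical names, not part of the original system, and the engines in `main` are stubs so that no network access is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical abstraction of one target search engine (an assumption, not from the slides). */
interface SearchEngine {
    /** Returns one page of records; pageNo starts at 0. An empty list means there is no further page. */
    List<String> fetchPage(String keyword, int pageNo);
}

final class OneStopSearch {
    /** Queries every engine, keeps requesting the next page, and merges all records. */
    static List<String> search(String keyword, List<SearchEngine> engines) {
        List<String> all = new ArrayList<>();
        for (SearchEngine engine : engines) {
            int pageNo = 0;
            while (true) {
                List<String> page = engine.fetchPage(keyword, pageNo);
                if (page.isEmpty()) {
                    break; // no "next" page left for this engine
                }
                all.addAll(page);
                pageNo++;
            }
        }
        return all;
    }

    public static void main(String[] args) {
        // Stub engines standing in for real sites: the first has two pages, the second has one.
        SearchEngine first = (kw, p) -> p < 2
                ? Collections.singletonList("first:" + kw + ":" + p)
                : Collections.<String>emptyList();
        SearchEngine second = (kw, p) -> p < 1
                ? Collections.singletonList("second:" + kw + ":" + p)
                : Collections.<String>emptyList();
        System.out.println(search("xslt", Arrays.asList(first, second)));
    }
}
```

A real engine adapter would decide that there is no further page by looking for the next button in the result page, rather than by receiving an empty list.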
Flow of One Stop Search
[Diagram: One Stop Search: Capture Keyword => Search and next (ProQuest, Science Direct, Kluwer Online) => Compile Result => Present Result]
Reasons to use XSL
• Simple
– XSL is plain text
• Multiplatform
– Can run on any machine with an XSLT engine
• Easy to maintain
– When the output layout of a target search engine changes
• Just change the content of the XSL file
• No recompilation is needed
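The maintenance point can be illustrated with a small Java sketch that uses the standard `javax.xml.transform` API. The class name `XslRunner`, the sample pages and both stylesheets are made up for illustration. The Java code stays the same while only the stylesheet text changes to follow a new page layout.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XslRunner {
    /** Applies the stylesheet text to the XML text and returns the output as a string. */
    static String apply(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    /** Builds a text-output stylesheet that lists every node matched by the given XPath. */
    static String listStylesheet(String xpath) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='text'/>"
                + "<xsl:template match='/'>"
                + "<xsl:for-each select='" + xpath + "'><xsl:value-of select='.'/>;</xsl:for-each>"
                + "</xsl:template></xsl:stylesheet>";
    }

    public static void main(String[] args) {
        // Old layout: each record is a <p>. New layout: each record is an <li>.
        String oldPage = "<html><body><p>Record 1</p><p>Record 2</p></body></html>";
        String newPage = "<html><body><ul><li>Record 1</li><li>Record 2</li></ul></body></html>";
        System.out.println(apply(listStylesheet("//p"), oldPage));
        System.out.println(apply(listStylesheet("//li"), newPage));
    }
}
```

In a deployed system the stylesheet would normally live in its own file, so a layout change means editing that file only.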
Two main problems when using XSL
1. An XSLT engine requires well-formed XML files as input
– Web-based search engines output HTML only
– HTML is not well-formed XML
• HTML allows some tags to be opened without being closed
• E.g. <br>
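The problem can be reproduced with the XML parser that ships with Java. The sketch below (the class name `WellFormedCheck` is made up for illustration) shows that markup with an unclosed `<br>` is rejected, while the same markup with `<br/>` is accepted.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedCheck {
    /** Returns true when the markup can be parsed as XML, false when the parser rejects it. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed XML
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p>one<br>two</p>"));  // HTML-style open tag
        System.out.println(isWellFormed("<p>one<br/>two</p>")); // XML-style empty element
    }
}
```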
Solution
1. Use HTML Tidy (http://tidy.sourceforge.net/) to convert HTML to well-formed XML
– “A HTML syntax checker and pretty printer. It can be used as a tool for cleaning up malformed and faulty HTML. In addition, it provides a DOM interface to the document that is being processed, which effectively makes you able to use it as a DOM parser for real-world HTML”
– It is open source
– It has many implementations, such as Java, Perl and Python
Solution
• Sample code in Java
StringReader strReader = new StringReader(html);
Tidy tidy = new Tidy();
return tidy.parseDOM(strReader, null);
• HTML => XML
Two main problems when using XSL
2. There is no browse function in XSL
– In one-stop search, we need to click the next button several times to collect all the results
– We need to tell the program to find the next button and then issue a browse request based on the URL of the next button
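Finding the next button can be sketched in Java with the standard XPath API. This is a minimal sketch under stated assumptions: the class name `NextLinkFinder` is made up, the page is assumed to have already been converted to well-formed XML, and the next button is assumed to be an `<a>` element whose text is exactly "Next". Real search engines mark their next links differently.

```java
import java.io.StringReader;
import java.net.URI;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NextLinkFinder {
    /** Returns the absolute URL of the next page, or null when the page has no "Next" link. */
    static String nextUrl(Document page, String pageUrl) {
        try {
            // evaluate() returns an empty string when no node matches.
            String href = XPathFactory.newInstance().newXPath()
                    .evaluate("//a[normalize-space(.)='Next']/@href", page);
            return href.isEmpty() ? null : URI.create(pageUrl).resolve(href).toString();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Demo helper: parses markup that is already well-formed XML. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document page = parse("<html><body><a href='results?page=2'>Next</a></body></html>");
        System.out.println(nextUrl(page, "http://example.org/search/results?page=1"));
    }
}
```

The collecting loop would then request the returned URL and repeat until `nextUrl` returns null.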
Solution
2. Add a browse function to XSL through an XSL extension
– XSLT allows two kinds of extension: extension elements and extension functions
– The types of extension supported depend on the XSLT implementation
– Details can be found at http://www.w3.org/TR/xslt#extension
Solution
• Our implementation
– Select a Java-based XSLT engine
– Use Java to write the function
– Compile it into classes and package them in a jar
– Include the jar file in the classpath of the XSLT engine
– Run it
Sample code on XSL extension
<?xml version="1.0" encoding="UTF-8"?>
<!-- Define the class to be used -->
<xsl:stylesheet version="1.1"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:HKUL="http://www.lib.hku.hk/java/hkul.apps.web.Browser"
    exclude-result-prefixes="HKUL">
  <xsl:template match="/">
    <xsl:variable name="url">http://www.lib.hku.hk/</xsl:variable>
    <!-- Create it -->
    <xsl:variable name="browser" select="HKUL:new($url)" />
    <!-- Call the browse function -->
    <xsl:variable name="content" select="HKUL:browse($browser,$url)" />
    <xsl:apply-templates select="$content/html/*" />
  </xsl:template>
</xsl:stylesheet>
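For orientation, the Java side of such an extension might have the shape sketched below. The source of `hkul.apps.web.Browser` is not shown in the slides, so `BrowserSketch` is a hypothetical stand-in. It assumes the binding convention used by Java-based engines such as Xalan-J and Saxon, where `prefix:new(...)` calls a public constructor and an instance method receives the object as its first argument. The fetch step is replaced by a canned page so the sketch runs without network access or HTML Tidy.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical stand-in for an extension class; not the presenters' actual code. */
class BrowserSketch {
    private final String homeUrl;

    /** Assumed to be reachable from the stylesheet as prefix:new($url). */
    public BrowserSketch(String homeUrl) {
        this.homeUrl = homeUrl;
    }

    /** Assumed to be reachable as prefix:browse($browser, $url); returns the page as a DOM tree. */
    public Document browse(String url) {
        String target = (url == null || url.isEmpty()) ? homeUrl : url;
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(fetchAsXml(target))));
        } catch (Exception e) {
            throw new IllegalStateException("Could not load " + target, e);
        }
    }

    /** Placeholder: a real version would download the page and tidy it into well-formed XML. */
    protected String fetchAsXml(String url) {
        return "<html><body><p>Fetched " + url + "</p></body></html>";
    }

    public static void main(String[] args) {
        Document page = new BrowserSketch("http://www.lib.hku.hk/").browse(null);
        System.out.println(page.getDocumentElement().getTextContent());
    }
}
```

Returning a DOM `Document` is what lets the stylesheet continue with an expression such as `$content/html/*`.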
Our Implementation
[Diagram: Browse, Next, Tidy, Parse, Result]
Our Implementation
• Both client and server programs are written in Java
• Client and server programs communicate over HTTP
• Makes use of a wireless network
Our Implementation (Client side)
• Palm OS
– Sun’s Java 2 Platform, Micro Edition (J2ME)
http://java.sun.com/j2me/
– Mobile Information Device Profile (MIDP)
http://java.sun.com/products/midp
Our Implementation (Server side)
• Application Server (Running on Sun Solaris with
JDK1.4)
– Jakarta Tomcat (http://jakarta.apache.org/tomcat)
– Jakarta Struts Framework (http://jakarta.apache.org/struts)
• Xerces XSLT Engine (http://xml.apache.org/#xerces)
• MySQL database (http://www.mysql.com)
Summary
• Implement the one stop search by XSLT
– Simple
– Multiplatform
– Easy to maintain
• Two problems
– HTML is not well-formed XML
– No browse function in XSL
Summary
• Solutions
– HTML Tidy
– XSL Extension
• Implementation
– J2ME
– Jakarta Tomcat + Struts
– Xerces
– MySQL
Questions?
• Thank you