Test On Line: reusing SAS code in WEB applications
Author: Carlo Ramella – TXT e-solutions
Chapter 1:
Abstract
The Proway System is a complete system for Process and Testing Data Analysis in the IC industry.
The system, based on a classical client-server architecture, is designed to access several databases at the
same time, placed in different sites all around the world, moving and computing very large sets of data.
A complete set of statistical analysis tools has been written using SAS procedures (histograms, box
plots, scatter plots, bar charts, statistical reporting, and so on). The increasing number of users (more
than 3000, on four continents), with different skill levels in different application areas, led to the design
of a new generation of tools dedicated to fast, light and very specific analyses.
The new architecture includes the classic client-server tools for "heavy" analysis (SAS datasets of 1
Gigabyte are not so unusual in these cases) and a new set of WEB tools for quick reporting.
The new WEB architecture is heavily based on components built with SAS modules designed for the
classic Proway System, inserted in a standard pure-J2EE architecture. This guarantees the following
benefits:
1) one code - many applications: support and maintenance are easier, and new features are
automatically shared by all applications;
2) portability (only SAS and Java code are used);
3) flexibility.
Chapter 2:
The Web Data Analysis: introduction
One of the possible ways to classify the group of existing Data Analysis Tools is in terms of response
time.
In a world where data are stored in huge relational databases or in very large file systems, the response
time is more or less directly related to the size of the data sample.
This kind of splitting automatically identifies two analysis areas:
- “look and run” applications, where the aim is to quickly select and display an output on a small
amount of data;
- “number crunching” applications, producing summary statistics on a large amount of data. Typically
this kind of application involves a very detailed Data Selection and a massive Data Extraction and
elaboration.
The first class of applications fits the Web paradigm very well, with short response times, an easy and
fast GUI and “browser suitable” HTML outputs. These applications will be collected in what we shall
call Data Analysis Components (DAC).
The Proway System, using the classical SAS System 2-tier architecture, currently covers the second
group of applications:
[Figure: 2-tier architecture - Tier 1 (GUI & Application) communicates via SQL and file I/O with
Tier 2 (DBMS, legacy and other resource managers).]
This approach has been valid for years, but these days the Web revolution is providing a set of new-
generation technologies that are deeply impacting this architecture. These technologies could be
applied to the classical Proway System, maintaining the kernel architecture while building a
completely new environment based on the Web.
In both cases, the general schema is based on customisations of the classical 3-tier architecture:
[Figure: 3-tier architecture - Tier 1 (browser) communicates via RPC, ORB, MOM or HTTP with
Tier 2 (application services), which in turn accesses Tier 3 (DBMS, legacy and other resource
managers) via SQL and file I/O.]
This architecture meets the requirements of large-scale Intranet client/server applications. These
systems are easier to manage and deploy on the network because most of the code runs on the servers.
Also, 3-tier applications minimise network interchanges by creating abstract levels of service. Instead of
interacting with the database directly, the client calls business logic on the server. The business logic
then accesses the database on the client’s behalf.
3-tier substitutes a few server calls for many SQL queries, so it performs much better than 2-tier. It also
provides better security by not exposing the database schema to the client and by enabling more
fine-grained authorisation on the server.
The middle tier (“middleware”) provides a platform for running server-side components, balancing their
loads, managing the integrity of transactions, maintaining high-availability, and securing the
environment. It must also provide pipes that allow server components to communicate using a variety of
protocols.
The technology that we propose to implement this layer is based on servlets and Java Server Pages.
2.1.1
Servlets and Java Server Pages
A servlet is a small piece of Java code that a Web server loads to handle client requests. Unlike a CGI
application, the Servlet code stays resident in memory after the request terminates. In addition, a
Servlet can connect to a database when it is initialised and then retain its connection across requests. A
Servlet can also pass a client request to another Servlet; this is called Servlet chaining. All these
features make Servlets an excellent workaround for many of CGI’s limitations.
The Servlet runs inside a Java Virtual Machine (JVM) on the server, processing data (called
parameters in HTTP terminology) sent by a client web browser via the HTTP protocol. The Servlet
returns to the user, again via HTTP, a byte stream defining an HTML page. This page is said to be
dynamic, as it contains data defined at run time based on the parameters sent by the client.
To build a Servlet it is necessary to implement either a Java class or a Java Server Page (JSP), a file
that allows defining, in addition to the dynamic content defined by Java code, the static content and
the page layout by means of the HTML language. Using JSP technology, even unskilled developers
can easily develop dynamic pages.
The servlet engine is in charge of executing the Servlets and providing the framework of the HTTP
connection to the clients. The JAVA class implementing a servlet must provide specific methods that
are invoked by the Servlet engine to fulfil the client requests. Upon a JSP request from the client, the
Servlet engine creates an actual Java class that provides the content defined within the JSP file itself.
This on-demand class compilation is referred to as JSP translation.
2.1.2
Servlet/JSP engine
The Servlet engine, upon a client request, generates one dedicated thread that executes the Java class
implementing the Servlet. This class, after being loaded into memory, is kept there until the available
memory is insufficient to load other Servlets (or other processes); if necessary, the least recently
requested Servlets are unloaded to free space for the new ones (a mechanism similar to that of LRU
caches), provided there are no pending requests for the Servlets being replaced.
Servlets are unloaded very seldom, so approximately we can say that a Servlet is loaded only once
to serve all subsequent client requests. The overhead due to code loading is thus minimised.
Furthermore, the Servlet initialisation method is executed only when the Servlet code is loaded;
this means that most of the data structures used by the Servlets are initialised at load time, eliminating
initialisation overhead during service time (i.e. when client requests are served).
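The unloading policy described above can be sketched with the standard `LinkedHashMap` in access order: servlets are loaded (and initialised) once, and when capacity is exceeded the least recently requested one is evicted. The class name and capacity are illustrative, not part of any real servlet engine API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of LRU-style servlet unloading: loaded servlets stay in memory and
// the least recently requested ones are evicted first when space runs out.
public class ServletCache {
    private final int capacity;
    private final Map<String, Object> loaded;

    public ServletCache(int capacity) {
        this.capacity = capacity;
        // accessOrder = true: iteration order follows recency of access (LRU)
        this.loaded = new LinkedHashMap<String, Object>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                return size() > ServletCache.this.capacity;
            }
        };
    }

    // Returns the servlet instance, loading (and initialising) it only once.
    public Object request(String name) {
        return loaded.computeIfAbsent(name, n -> new Object());
    }

    public boolean isLoaded(String name) {
        return loaded.containsKey(name);
    }
}
```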
2.1.3
Multithreaded execution
The multithreaded Servlet engine generates a dedicated thread for each client request, which means that
concurrent requests involve concurrent thread execution. Some data structures used by the Servlet,
such as database connections, streams or parameter lists read from initialisation files, are initialised
only once, upon the first request. Subsequent requests then find already initialised data
structures, with a significant reduction in service time. All the servlet threads share these data
structures, so particular attention during development must be paid to managing concurrent access to
them.
Chapter 3:
3.1
Data Analysis Components (DAC)
Overview
[Figure: DAC overview - a Web Browser talks to the Servlet Engine, which contains the Data
Manager and the Application Manager, each driven by an XML config; the Data Manager reads the
Database and produces a Data Image, which the Application Engine turns into Raw Output, returned
to the browser as an HTML Document.]
This architecture is designed to satisfy the following main constraints:
- very short response time;
- simple and quick Data Selection;
- low flexibility, in terms of output customisation.
The aim is to build a collection of components in order to improve the code reuse and portability.
All the configurations, including metadata, will be implemented via XML documents and manipulated
with the Document Object Model (DOM).
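As an illustration of this approach, reading one value from such a configuration with the standard DOM API (`javax.xml.parsers` / `org.w3c.dom`, both in the JDK) could look like the sketch below. The `<connections>` element and the error handling are invented for the example, not taken from the real configuration files:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Sketch: parse an XML configuration with DOM and read one setting from it.
public class DacConfig {
    public static int connectionCount(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Read the number of pooled DB connections from the configuration
            String value = doc.getElementsByTagName("connections")
                              .item(0).getTextContent();
            return Integer.parseInt(value.trim());
        } catch (Exception e) {
            // Configuration errors are fatal for this sketch
            throw new RuntimeException("bad configuration: " + e.getMessage(), e);
        }
    }
}
```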
From the architectural point of view, the DAC is composed of two main modules:
1. the Servlet Engine, containing two servlets managing Data Access and Applications;
2. the Application Engine, containing the business rules to produce the output. This Engine is
completely written using existing Proway SAS/SCL code.
3.2
Data Manager
[Figure: Data Manager - the Data Selection GUI queries the Selection Database through a JDBC
Driver and a DB Connection Dispatcher (XML config: number of connections, DB parameters; field
names, types, relationships and descriptions). Running the application produces an XML Selection
Criterion, which the Extraction Engine (XML config: STDF reader settings, table/field names, join
and path descriptions) applies to the Data Sources (STDF files, databases) to build the Data Image.]
This Servlet manages the Data Selection and the Data Extraction tasks, including database connection
management.
3.2.1
Data Selection
The HTML form used to navigate the Selection Database and to prepare the Selection Criterion is
contained in a Java Server Page (JSP), a dynamic content page which presents to the user Lists Of
Values (LOV) acquired at run time from a remote database (the user can also directly edit the entity
identification codes, avoiding querying the database).
In more detail, the servlet (servlet #1) connects to the database by means of a JDBC driver, avoiding
the need to install database access components on the client.
Main features are:
- the connections to the database are opened during the initialisation process only, and are then used
to satisfy all client requests. The number of connections is set in an XML configuration file, and a
dedicated DB Connection Dispatcher manages the incoming requests;
- the JDBC driver connects directly to the database instance listener, supporting pre-fetching of the
record set returned by the DBMS and consequently minimising the round trips between client and
server.
For the reasons stated above, the system throughput, at least as far as database access is concerned,
is rather high.
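The dispatcher idea can be sketched with a bounded blocking queue: a fixed number of connections is opened once at initialisation and handed out to incoming requests. The generic type stands in for `java.sql.Connection`, since opening real JDBC connections would need a driver and a live database; the dispatching logic is the point:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Sketch of the DB Connection Dispatcher: a fixed pool of connections,
// opened once at initialisation, shared by all client requests.
public class ConnectionDispatcher<C> {
    private final BlockingQueue<C> pool;

    // 'size' would come from the XML configuration file ("# of connections").
    public ConnectionDispatcher(int size, Supplier<C> opener) {
        pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) pool.add(opener.get()); // opened once, at init
    }

    // A request borrows a connection, blocking until one is free...
    public C acquire() {
        try {
            return pool.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
    }

    // ...and returns it when done, so it can serve the next request.
    public void release(C connection) { pool.offer(connection); }

    public int available() { return pool.size(); }
}
```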
The Data Selection Frame is fully XML driven. The names of the database tables and fields involved,
their types, the relationships between them and other information are dynamically read by the servlet
from the XML configuration file.
The output of the Data Selection will be an XML Selection Criterion, a data structure containing all
the information needed to extract data from the Data Sources (databases, files).
The Extraction Engine will use the selection criterion as input to access the Data Source and build a
data structure ready to be elaborated. An SQL script containing the query will be automatically created.
Once again, the necessary information about the database structure or the file reader configuration is
stored in an XML configuration file.
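A hedged sketch of that last step, turning a selection criterion into the SQL query text: the real system reads the criterion and the table/field names from XML configuration, while here both are passed in directly to keep the idea visible (and no quoting/escaping is attempted):

```java
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: build the extraction query from a (table, criterion) pair.
public class ExtractionEngine {
    public static String buildQuery(String table, Map<String, String> criterion) {
        // Each criterion entry becomes one equality condition
        String where = criterion.entrySet().stream()
                .map(e -> e.getKey() + " = '" + e.getValue() + "'")
                .collect(Collectors.joining(" AND "));
        return "SELECT * FROM " + table
                + (where.isEmpty() ? "" : " WHERE " + where);
    }
}
```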
3.2.2
Application Manager
[Figure: Application Manager - the Data Image is passed to a Dispatcher (XML config: listener
configuration, directory structure) that balances requests across a pool of Application Engines (1..n)
supervised by a Process Monitor; each engine produces a Raw Output and returns an
acknowledgement (ACK), and an Output Formatter turns each Raw Output into an HTML Document.]
This servlet will manage the output generation, including the HTML page preparation.
The Application Engine is written in SAS (reusing existing Proway code); the architecture is based on a
pool of “listeners”, a set of daemons waiting for requests, driven by XML configuration files.
The communication between the servlet and the SAS process is implemented by means of TCP/IP
listeners. The servlet sends the activation string to the listener associated with the SAS process
through a Dispatcher that balances the load on the different queues. The selected engine reads and
processes the Data Image, produces the chart and puts it into a JPEG file, finally returning an
acknowledgement to the servlet. This acknowledgement consists of a string, referred to as the
acknowledgement string, which is sent to the listener opened by the servlet. This string contains a
progressive number that uniquely identifies the JPEG file mentioned above.
The Servlet then returns to the web browser, via the HTTP protocol, an HTML file referencing this
JPEG file (the reference lies within an IMG tag).
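The acknowledgement handling can be sketched as follows. The format of the acknowledgement string ("OK <number>") and the JPEG naming scheme are assumptions for illustration; the text only states that the string carries a progressive number uniquely identifying the output file:

```java
// Sketch: extract the progressive number from the acknowledgement string and
// derive the IMG reference the servlet writes into the HTML page. The "OK n"
// format and the chart_<n>.jpg naming are invented for this example.
public class AckHandler {
    public static int outputId(String ack) {
        if (!ack.startsWith("OK ")) {
            throw new IllegalArgumentException("engine reported failure: " + ack);
        }
        return Integer.parseInt(ack.substring(3).trim());
    }

    public static String imgTag(int id) {
        return "<img src=\"/proway/out/chart_" + id + ".jpg\">";
    }
}
```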
Chapter 4:
An example: Bin Distribution Map Gallery
This application is a good example of the potential of the new architecture.
The aim is to build a wafer map showing the distribution of testing failures classes starting from a
binary file containing testing results.
Using the Data Selection, the user is able to browse a database indexing all source files. The search can
be made at single file level (that is, a single wafer) or at a higher aggregation level (for example the
lot, which is normally composed of twenty-five wafers).
Once the files have been selected, the file list is passed to the SAS engine via the socket. The Extraction
Engine reads binary files, producing datasets that will be used by the Application Engine to produce the
output (in jpeg format).
The servlet will take these outputs, generating an HTML page that will be sent back to the browser.
The final result is an HTML page showing the gallery of bin distribution wafer maps.