Test On Line: reusing SAS code in Web applications

Author: Carlo Ramella – TXT e-solutions

Chapter 1: Abstract

The Proway System is a powerful, complete system for process and testing data analysis in the IC industry. The system, based on a classical client-server architecture, is designed to access several databases at the same time, located at different sites all around the world, moving and computing very large sets of data. A complete set of statistical analysis tools has been written using SAS procedures (histograms, box plots, scatter plots, bar charts, statistical reporting, and so on).

The increasing number of users (more than 3000, on four continents), with different levels of skill in different application areas, led to the design of a new generation of tools dedicated to fast, light and very specific analysis. The new architecture includes the classic client-server tools for "heavy" analysis (SAS datasets of 1 gigabyte are not unusual in these cases) and a new set of Web tools for quick reporting.

The new Web architecture is heavily based on components built with SAS modules designed for the classic Proway System, inserted in a standard pure-J2EE architecture. This guarantees the following benefits:
1) one code, many applications: support and maintenance are easier, and new features are automatically shared by all applications;
2) portability (only SAS and Java code are used);
3) flexibility.

Chapter 2: The Web Data Analysis: introduction

One possible way to classify the existing data analysis tools is in terms of response time. In a world where data are stored in huge relational databases or in very large file systems, the response time is more or less directly related to the size of the data sample. This kind of splitting automatically identifies two analysis areas:
- "look and run" applications, where the aim is to quickly select and display an output on a small amount of data;
- "number crunching" applications, producing summary statistics on a large amount of data. Typically this kind of application involves a very detailed data selection and a massive data extraction and elaboration.

The first class of applications fits the Web paradigm very well, with short response times, an easy and fast GUI and "browser suitable" HTML outputs. These applications will be collected in what we'll call Data Analysis Components (DAC). The Proway System, using the classical SAS System 2-tier architecture, currently covers the second group of applications:

[Figure: 2-tier architecture — Tier 1: GUI & application; Tier 2: DBMS, legacy & other resource managers, accessed via SQL and file I/O]

This approach has been valid for years, but these days the Web revolution is providing a set of new-generation technologies that are deeply impacting this architecture. These technologies can be applied to the classical Proway System, maintaining the kernel architecture while building a completely new environment based on the Web. In both cases, the general schema is based on customisations of the classical 3-tier architecture:

[Figure: 3-tier architecture — Tier 1: browser; Tier 2: application services, reached via RPC, ORB, MOM or HTTP; Tier 3: DBMS, legacy & other resource managers, accessed via SQL and file I/O]

This architecture meets the requirements of large-scale intranet client/server applications. These systems are easier to manage and deploy on the network because most of the code runs on the servers. Also, 3-tier applications minimise network interchanges by creating abstract levels of service. Instead of interacting with the database directly, the client calls business logic on the server. The business logic then accesses the database on the client's behalf. 3-tier substitutes a few server calls for many SQL queries, so it performs much better than 2-tier. It also provides better security by not exposing the database schema to the client and by enabling more fine-grained authorisation on the server.
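The "few coarse-grained server calls instead of many SQL queries" idea can be made concrete with a minimal sketch (plain Java, no real DBMS; all names are invented for illustration): the client invokes one server-side method that aggregates the data on its behalf and returns only a small summary.

```java
// Hypothetical sketch of a 3-tier coarse-grained call: the client asks the
// server-side business logic for a summary instead of pulling raw rows itself.
import java.util.Arrays;
import java.util.List;

public class ThreeTierSketch {
    // Tier 3 stand-in: rows that would normally live in the DBMS.
    static final List<double[]> MEASUREMENTS = Arrays.asList(
            new double[]{1.0, 2.0}, new double[]{3.0, 4.0});

    // Tier 2: the business logic aggregates on the client's behalf and
    // returns only a small summary (count and mean) instead of the raw rows.
    static double[] summarise() {
        double sum = 0;
        int n = 0;
        for (double[] row : MEASUREMENTS)
            for (double v : row) { sum += v; n++; }
        return new double[]{n, sum / n};
    }

    public static void main(String[] args) {
        double[] s = summarise();   // one server call, not many SQL queries
        System.out.println((int) s[0] + " values, mean " + s[1]);
    }
}
```

Only the two-element summary crosses the network, which is the interchange reduction described above.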
The middle tier ("middleware") provides a platform for running server-side components, balancing their loads, managing the integrity of transactions, maintaining high availability, and securing the environment. It must also provide pipes that allow server components to communicate using a variety of protocols. The technology that we propose to implement this layer is based on servlets and Java Server Pages.

2.1.1 Servlets and Java Server Pages

A Servlet is a small piece of Java code that a Web server loads to handle client requests. Unlike a CGI application, the Servlet code stays resident in memory when the request terminates. In addition, a Servlet can connect to a database when it is initialised and then retain its connection across requests. A Servlet can also pass a client request to another Servlet; this is called Servlet chaining. All these features make Servlets an excellent workaround for many of CGI's limitations.

The Servlet runs inside a Java Virtual Machine (JVM) on the server, processing data (called parameters in HTML terminology) sent by a client Web browser via the HTTP protocol. The Servlet returns to the user, again via HTTP, a byte stream defining an HTML page. This page is said to be dynamic, as it contains data defined at run time based on the parameters sent by the client.

To build a Servlet it is necessary to implement either a Java class or a Java Server Page (JSP), a file that allows the developer to define, in addition to the dynamic content defined by Java code, the static content and the page layout by means of the HTML language. Thanks to JSP technology, even unskilled developers can easily create dynamic pages.

The Servlet engine is in charge of executing the Servlets and providing the framework of the HTTP connection to the clients. The Java class implementing a Servlet must provide specific methods that are invoked by the Servlet engine to fulfil the client requests.
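The lifecycle just described — initialise once, serve many requests with parameters — can be sketched in plain Java without a servlet container (the class and method names below mimic the servlet model but are invented for illustration):

```java
// Minimal sketch of the servlet lifecycle: init() runs once when the class
// is loaded; service() runs for every request and reuses structures built
// at initialisation time (a real Servlet would keep a DB connection here).
import java.util.HashMap;
import java.util.Map;

public class LifecycleSketch {
    private Map<String, String> config;   // stands in for connections, streams
    private int served = 0;

    public void init() {                  // called once by the engine
        config = new HashMap<>();
        config.put("greeting", "Hello");
    }

    // Called once per request; the map mimics HTTP request parameters.
    public String service(Map<String, String> params) {
        served++;
        return config.get("greeting") + ", " + params.get("name") + "!";
    }

    public int requestsServed() { return served; }

    public static void main(String[] args) {
        LifecycleSketch s = new LifecycleSketch();
        s.init();                                   // once
        Map<String, String> p = new HashMap<>();
        p.put("name", "world");
        System.out.println(s.service(p));           // per request
        System.out.println(s.service(p));
        System.out.println(s.requestsServed() + " requests served");
    }
}
```

The dynamic HTML page a real Servlet emits would be built from the string `service` returns.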
Upon a JSP request from the client, the Servlet engine creates an actual Java class that provides the content defined within the JSP file itself. This compilation on demand is referred to as JSP translation.

2.1.2 Servlet/JSP engine

The Servlet engine, upon a client request, generates one dedicated thread that executes the Java class implementing the Servlet. This class, after being loaded into memory, is kept there until the available memory is no longer enough to load other Servlets (or other processes); if necessary, the least recently requested Servlets are unloaded to free space for the new ones (a mechanism similar to that of LRU caches), provided there are no pending requests for the Servlets to be replaced. Servlets are unloaded very seldom, so to a first approximation we can say that a Servlet is loaded only once to serve all subsequent client requests, and the overhead due to code loading is minimised. Furthermore, it is only when the Servlet code is loaded that the Servlet initialisation method is executed; this means that most of the data structures used by the Servlet are initialised at load time, eliminating initialisation overhead during service time (i.e. when client requests are served).

2.1.3 Multithreaded execution

The multithreaded Servlet engine generates a dedicated thread for each client request, which means that concurrent requests involve concurrent thread execution. Some data structures used by the Servlet, such as database connections, streams or parameter lists read from initialisation files, are initialised only once, upon the first request. Subsequent requests then work on already initialised data structures, with a significant reduction in service time. All the Servlet threads share these data structures, so particular attention during development must be paid to managing concurrent access to them.
Chapter 3: Data Analysis Components (DAC)

3.1 Overview

[Figure: DAC architecture — the Web browser talks to the Servlet Engine (Data Manager and Application Manager servlets, driven by XML config files); the Application Engine reads the Data Image from the database and produces the raw output; the Servlet Engine returns an HTML document to the browser]

This architecture is designed to satisfy the following main constraints:
- very short response time;
- simple and quick data selection;
- low flexibility, in terms of output customisation.

The aim is to build a collection of components in order to improve code reuse and portability. All the configurations, including metadata, will be implemented via XML documents and manipulated with the Document Object Model (DOM). From the architectural point of view, the DAC is composed of two main modules:
1. the Servlet Engine, containing two servlets managing data access and applications;
2. the Application Engine, containing the business rules to produce the output. This engine is written entirely using existing Proway SAS/SCL code.

3.2 Data Manager

[Figure: Data Manager — the Data Selection GUI queries the Selection Database through a JDBC driver and a DB Connection Dispatcher (XML config: number of connections, DB parameters; field names, types, relationships and descriptions) and produces an XML Selection Criterion; the Extraction Engine (XML config: STDF reader settings, table/field names, join and path descriptions) reads the Data Sources (STDF files, DB, XML) and builds the Data Image]

This servlet manages the data selection and data extraction tasks, including database connection management.

3.2.1 Data Selection

The HTML form used to navigate the Selection Database and to prepare the Selection Criterion is contained in a Java Server Page (JSP), a dynamic content page which presents to the user Lists Of Values (LOV) acquired at run time from a remote database (the user can also directly edit the entities' identification codes, avoiding querying the database). In more detail, the servlet (servlet #1) connects to the database by means of a JDBC driver, avoiding the need to install database access components on the client.
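Reading such metadata with the DOM can be sketched as follows; the element and attribute names (`selection`, `field`, `name`) are invented for illustration, only the DOM API itself is real:

```java
// Hedged sketch of XML-driven configuration read with the DOM (JAXP, part
// of the JDK): the servlet discovers field names at run time from XML
// instead of hard-coding them.
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XmlConfigSketch {
    public static List<String> fieldNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        NodeList fields = doc.getElementsByTagName("field");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < fields.getLength(); i++)
            names.add(((Element) fields.item(i)).getAttribute("name"));
        return names;
    }

    public static void main(String[] args) throws Exception {
        // An invented configuration fragment describing two selection fields.
        String cfg = "<selection><field name='LOT' type='char'/>"
                   + "<field name='WAFER' type='num'/></selection>";
        System.out.println(fieldNames(cfg));
    }
}
```

The same pattern extends to types, relationships and the other metadata listed in the configuration file.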
Its main features are:
- the connections to the database are opened during the initialisation process only, and are then used to satisfy all client requests. The number of connections is set in an XML configuration file, and a dedicated DB Connection Dispatcher manages the incoming requests;
- the JDBC driver connects directly to the database instance listener, supporting pre-fetching of the record set returned by the DBMS and consequently minimising the round trips between client and server.

For these reasons the system throughput, at least as far as database access is concerned, is rather high. The Data Selection frame is fully XML driven: the names of the database tables and fields involved, their types, the relationships between them and other information are dynamically read by the servlet from the XML configuration file.

The output of the Data Selection is an XML Selection Criterion, a data structure containing all the information needed to extract data from the Data Sources (databases, files). The Extraction Engine uses the Selection Criterion as input to access the Data Source and to build a data structure ready to be elaborated; an SQL script containing the query is automatically created. Once again, the necessary information about the database structure or the file reader configuration is stored in an XML configuration file.

3.2.2 Application Manager

[Figure: Application Manager — a Dispatcher (XML config: listener configuration, directory structure) and a Process Monitor route the Data Image to a pool of Application Engines (1..n); each engine produces a raw output which an Output Formatter turns into an HTML document, and an ACK is returned to the servlet]

This servlet manages the output generation, including the HTML page preparation. The Application Engine is written in SAS (reusing existing Proway code); the architecture is based on a pool of "listeners", a set of daemons waiting for requests, driven by XML configuration files.
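The listener handshake used between the servlet and the SAS daemons can be sketched in plain Java sockets; the activation and acknowledgement strings below are invented stand-ins for the real protocol:

```java
// Illustrative sketch of the servlet/listener exchange: the servlet sends an
// activation string over TCP/IP, the daemon replies with an acknowledgement
// string carrying a progressive number identifying the output file.
import java.io.*;
import java.net.ServerSocket;
import java.net.Socket;

public class ListenerSketch {
    // Daemon side: accept one activation string and reply with an ack.
    static void startListener(ServerSocket server, int jobNumber) {
        new Thread(() -> {
            try (Socket s = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream()));
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                in.readLine();                    // activation string received
                out.println("ACK " + jobNumber);  // identifies the JPEG produced
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }).start();
    }

    // Servlet side: send the activation string, wait for the acknowledgement.
    static String activate(int port, String activation) throws IOException {
        try (Socket s = new Socket("localhost", port);
             PrintWriter out = new PrintWriter(s.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream()))) {
            out.println(activation);
            return in.readLine();
        }
    }

    // Full round trip on a free local port.
    public static String demo(int jobNumber, String activation) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            startListener(server, jobNumber);
            return activate(server.getLocalPort(), activation);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo(42, "RUN data-image-42"));
    }
}
```

In the real system the Dispatcher would choose which listener port receives the activation string, balancing the load across the pool.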
The communication between the servlet and the SAS process is implemented by means of TCP/IP listeners. The servlet sends the activation string to the listener associated with the SAS process by means of a Dispatcher that balances the load on the different queues. The selected engine reads and processes the Data Image, produces the chart, puts it into a JPEG file and finally returns an acknowledgement to the servlet. This acknowledgement consists of a string, referred to as the acknowledgement string, which is sent to the listener opened by the servlet. This string contains a progressive number that uniquely identifies the JPEG file mentioned before. The servlet then returns to the Web browser, via the HTTP protocol, an HTML file referencing this JPEG file (the reference lies within an IMG tag).

Chapter 4: An example: Bin Distribution Map Gallery

This application is a good example of the potential of the new architecture. The aim is to build a wafer map showing the distribution of testing failure classes, starting from a binary file containing testing results. Using the Data Selection, the user is able to browse a database indexing all source files. The search can be made at single file level (that is, a single wafer) or at a higher aggregation level (for example the lot, which is normally composed of twenty-five wafers). Once the files have been selected, the file list is passed to the SAS engine via the socket. The Extraction Engine reads the binary files, producing datasets that will be used by the Application Engine to produce the output (in JPEG format). The servlet takes these outputs and generates an HTML page that is sent back to the browser. The final result is the following: