GREENLEAF & KALYANDRUG / Tumor Registry System

A TRAUMA REGISTRY SYSTEM USING THE SAS SYSTEM AND DBASE III PLUS

Andrew S. Greenleaf, ARC
Sivaram Kalyandrug, ARC

Abstract

ARC Professional Services Group (formerly ORI, Inc.) has developed a clinical Trauma Registry System to capture injury related data, generate reports, and perform statistical analysis. For data entry and management of the registry, dBASE III PLUS/Clipper was utilized because of its efficiency and speed in processing a large number of data elements distributed across different files. PC/SAS Version 6.0 provided the extensive statistical and analytical support needed for report generation and analysis.

Introduction

The SAS System provides for very extensive statistical analysis of data; however, it lacks many of the capabilities database management systems (DBMS) offer. While SAS and DBMS's have overlapping strengths in areas such as data manipulation and report generation, each has features that surpass the corresponding features of the other. For example, most DBMS's do not provide the extensive statistical, analytical, and graphical capabilities of the SAS System. Conversely, the SAS System does not easily support relational links that DBMS's provide for data entry, update, and management. This is especially true when large database applications require data to span many tables but record by record processing is still desired. After we analyzed the functional specifications of our application, the data organization, the user interfaces required, and the system resources available, an integration of a DBMS and the SAS system was a logical choice.

This paper addresses some of the reasons for selecting our configuration as well as problems encountered during the implementation. Specifically, the following areas of concern will be addressed:

• Database Management System: why it was selected, how it is used;
• SAS System: why it was selected, how it is used;
• User Interfaces, including the use of predefined procedures and functions;
• Problems and some solutions;
• Desired SAS enhancements;
• Conclusions.

Background

The Maryland Institute for Emergency Medical Services Systems (MIEMSS), University of Maryland, Baltimore, Maryland, is the leading organization in the country for the treatment and care of shock trauma patients (critically injured accident victims). In fulfillment of a legislative requirement and more as a research tool in the study of etiology and treatment of trauma, MIEMSS collects enormous amounts of data on trauma victims. The data includes information on the physiological and anatomical aspects of injury to various body parts, the nature, cause, and extent of injury, the type and course of treatment, and other patient and hospital indicators.

The comprehensive database was designed to help the physicians and traumatologists in their research. It was part of a VAX system dedicated to on-line clinical care, but could not sufficiently meet the data processing demands for application research. To relieve dependence on the over-burdened VAX system and to better address the research needs, the new Trauma Registry System was designed and developed for use on microcomputers.

Why a Database Management System was Selected

While the primary objective of our system is to provide statistical and analytical support to traumatologists, the issues of how the data was to be entered into the computer had to be addressed. The relative merits of SAS and DBMS's as vehicles for data entry were weighed and prototypes of the two methodologies were compared. Following is a discussion of the evaluation process which led us to choose dBASE as a data entry and update tool.
In designing the specifications for our Trauma Registry System, one factor stood out as most crucial to the design of the system: the number of discrete attributes recorded per patient. Because the physicians who would be using the system anticipated a need to use any and all recorded attributes, elimination of nonessential data entities was not possible. Even though the approximately 3000 data entities could be organized into many smaller clusters, there was still the problem of displaying, editing, and updating this volume of data on a patient-by-patient basis. The volume of data collected yearly within the hospital made it quickly evident that multiple files would be required.

Entity-Relationship modeling was used to design the database. The approximately 3000 data entities were grouped into 29 tables, each containing a primary key and specifically related data. The resultant design was then analyzed in terms of SAS usage and DBMS usage.

User requirements indicated a strong desire to enter all of a single patient's information at a given time. Since this data would span numerous database tables, some relational links were needed. Relational links between tables are much simpler to implement in a DBMS because much of the control is readily available to users. The SAS System concentrates on a single data set at a time, basically working file by file. For support personnel to enter a patient's entire record of information which spans multiple files, a connection among those files must be established. One method of accomplishing this is to simply merge files together. Memory constraints, apparently due to limits on the number of variables which can be merged into one file, made it virtually impossible to merge more than a couple of our files.
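The difference between following a keyed link and merging whole files can be illustrated with a small sketch. This is not the authors' dBASE code; it is a minimal Python illustration in which the table names, field names, and patient IDs are all hypothetical, and three small tables stand in for the 29 described above.

```python
# Hypothetical miniature of the registry: each table is keyed by patient ID.
# Table and field names are illustrative only; the real design had 29 tables.
tables = {
    "demographics": {1001: {"age": 34, "sex": "F"}, 1002: {"age": 58, "sex": "M"}},
    "injuries": {1001: {"region": "thorax"}},
    "treatment": {1002: {"course": "surgical"}},
}


def patient_record(tables, patient_id):
    """Gather one patient's row from every table that holds the key."""
    return {
        name: rows[patient_id]
        for name, rows in tables.items()
        if patient_id in rows
    }


# One patient's full record is assembled by key lookup, without first
# merging every table into a single wide file.
record = patient_record(tables, 1001)
print(record)
```

Only the rows belonging to the requested patient are touched, which is the property that made patient-by-patient entry practical across many tables.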
While the SAS System documentation does not indicate an upper limit on the number of variables within a data file, our experience has shown that attempts to merge files with more than 100 variables each result in inconsistent, but ever present, memory constraint problems. These merge steps at best caused run-time aborts and at worst locked the system up (requiring cold starts), created numerous lost clusters of memory, and at times corrupted the DOS file allocation table.

Since a patient's record spans many data sets, prototypes indicated that SAS macros might be used to loop through each data set, opening and closing each for the single record being processed. Setting aside the overhead and inefficiency of the approach, SAS macros on the microcomputer were, at the time, too limited to even permit such a design. This approach would also have required extensive processing to handle cursor control from one file's information to another, and basically would have required coding many of the features already present in a DBMS. Even the SAS FSP procedures would have required complicated processing to handle cursor control from one file to another and would also have increased response time when processing multiple patients at a time.

This left us with the choice of designing the system to process each SAS data set for all observations before proceeding to another data set, thus abandoning the desire to enter an entire patient's record at one time, or of selecting some other software package for the system. It is for these reasons that we explored the possibilities of using a DBMS for the data management portion of our Trauma Registry System.

After analyzing various database packages, we chose dBASE III PLUS for several reasons:

1. It is relational, and prototypes met our basic requirements quite satisfactorily.
2. It is simple to use and has interactive program development facilities.
3. Use of a code compiler provided us with executable code which could be distributed to users without their need to purchase additional software.
4. It was readily available for our immediate use.

Since dBASE is relational and permits easy linking of multiple database tables, the original user requirements were met and speed and performance were greatly increased. Of course, if our database were small enough to reside within one or two SAS data sets, SAS could have easily been used for data entry functions.

How the Database Management System is Used

Using the relational database capabilities of dBASE III PLUS, a database was constructed and systems for data entry and update were created. The Nantucket Clipper compiler was chosen to compile our dBASE code into object modules which were linked to create discrete executable modules for these systems. The entry system is screen-driven and offers easy access for support personnel to enter and modify patient information. The entry system also permits patient-by-patient browsing of the data and individual-record edit and analysis capabilities.

While our Trauma Registry System is a distributed database system operational at a number of independent nodes, there are no physical connections among the nodes of the implied network. The users are relied upon to act as the media for transferring data between the workstations and the main machine; thus, maintaining data integrity is difficult when the users do not make timely transfers of the data. To address integrity and consistency in the database, transaction management principles were examined.

Additions of new data and modifications to existing data are performed via transaction records, which are stored in transaction tables created to contain the new information. The system creates a transaction table for every master table to be updated.

An update system, residing on the main machine, applies the transaction records to the master tables. Support personnel transport their transaction tables to this machine and the update system applies these changes to the appropriate master database files. Copies of the new master files are then distributed to each subordinate machine to maintain data integrity.

Because multiple workstations are capable of processing the same data simultaneously, updates to the master database tables are processed sequentially. Consequently, multiple transaction batches must be applied sequentially, with the most recently executed batch becoming the master. Data integrity can be compromised if separate workstations process updates to a given record but do not contain identical master tables. A fully interconnected distributed database system would apply updates to a single master database table which would then be immediately available to all users. Since our configuration is not fully interconnected, it was necessary to stress the fact that simultaneous work from separate workstations on a given record could result in data inconsistency and integrity problems unless each workstation maintained identical master files.

A number of problems exist because of this configuration. The first occurs when the users of the subordinate machines fail to update their copies of the master files. If this happens, new transaction records could contain data which has since changed. Applying the transactions to the master files would then reset some attributes to previous values, thus compromising data integrity and creating inconsistency. Because the potential for this problem could not be eliminated, logs are maintained to record all changes to the database. In this manner, inconsistencies can be tracked and corrected if they arise.
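The stale-copy problem and the role of the change log can be made concrete with a short sketch. It is written in Python for illustration, not in the dBASE/Clipper code the system actually used. The class name, batch names, field names, and the assumption that a workstation submits a whole record as its transaction are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateSystem:
    """Toy update system: applies transaction batches to a master table."""
    master: dict = field(default_factory=dict)  # patient_id -> {attribute: value}
    log: list = field(default_factory=list)     # (batch, patient_id, attribute, old, new)

    def apply_batch(self, batch_name, transactions):
        # Batches are applied one after another; the latest batch wins.
        for patient_id, record in transactions:
            current = self.master.setdefault(patient_id, {})
            for attribute, new_value in record.items():
                old_value = current.get(attribute)
                if old_value != new_value:
                    # Every change is logged so inconsistencies can be traced.
                    self.log.append(
                        (batch_name, patient_id, attribute, old_value, new_value)
                    )
                    current[attribute] = new_value

    def snapshot(self):
        # Copy of the master files as distributed to a workstation.
        return {pid: dict(rec) for pid, rec in self.master.items()}


system = UpdateSystem()
system.apply_batch("initial", [(1001, {"age": 34, "discharge": "pending"})])

# Workstation B receives a copy of the master and then fails to refresh it.
stale_copy = system.snapshot()

# Workstation A's batch changes the discharge status on the main machine.
system.apply_batch("ws_A", [(1001, {"discharge": "home"})])

# Workstation B corrects the age using its stale copy and submits the record.
record_b = dict(stale_copy[1001], age=35)
system.apply_batch("ws_B", [(1001, record_b)])

# The log shows that ws_B silently reset "discharge" to its earlier value.
reverted = [e for e in system.log if e[0] == "ws_B" and e[2] == "discharge"]
print(system.master[1001])
print(reverted)
```

In this sketch the second batch overwrites the discharge status with the older value, and only the log entry reveals that a reset happened, which is why such a log is needed when stale copies cannot be ruled out.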
Our system design has minimized the problems of data integrity and inconsistency significantly. Because the workstations composing the Trauma Registry System network are used primarily for data-entry purposes, the entire master database is not stored on these machines. Patient information is entered at any of the workstations and these transaction batches are applied to the master database files on the main machine. Updates to existing data occur infrequently and are made directly on the main machine. In this manner, data integrity is not easily compromised should separate workstations be accessing similar data.

This approach also permits the introduction of some security on the sensitive patient information processed throughout the Trauma Registry System. By limiting each workstation's access to the master database to small portions at a time, only the main machine has capabilities for reporting and analyzing the entire spectrum of data in the database. This has the effect of limiting the number of sites requiring physical security features (i.e. lock and key).

Because data are processed in dBASE/Clipper-based systems as well as SAS-based systems, either the data must be in a format acceptable to both software packages or two versions of the database tables are needed. Presently, dBASE cannot access data in SAS files and, while SAS can access data in dBASE files, this data must be converted into SAS data sets for use. In light of this fact, we decided to maintain two databases, a dBASE III PLUS formatted database and a SAS database. The dBASE database was selected as the master for several basic reasons:

1. Each dBASE system designed for the Trauma Registry System can access the master data directly;
2. dBASE stores data in a more compressed format, and the volume of data anticipated for this project is very large;
3. SAS can convert dBASE files into SAS files while no corresponding procedure exists in dBASE to convert SAS files.
The SAS procedure PROC DBF provides a means to convert dBASE table information into SAS accessible form, but the process requires some care. Variable names are copied from the dBASE table header; however, since SAS restricts variable names to 8 characters, longer names are truncated. Since SAS issues no message that this truncation has occurred, users can potentially create programs which attempt to access seemingly nonexistent variables.

The assignment of data types can also lead to problems. In SAS, character variables are stored exactly as in the dBASE table; however, numeric data are invariably stored in 16 bytes. Consequently, numeric variables which have been defined with lengths of 1 byte, 2 bytes, etc. in dBASE are defined as 16-byte numerics in SAS. While no data values are lost this way, the resultant SAS files can increase in size over their dBASE equivalents by tremendous amounts. Since microcomputers do not have the virtually endless supply of disk space that mainframes appear to have, this can create problems. (Of course, if your microcomputer has an unlimited supply of hard disk space, this will be of no concern to you.)

In our applications, data sets had on the order of 2000-3000 records, each with 150-200 numeric variables. While each of these variables could easily be captured using the minimum SAS numeric length of 3 bytes, each is set to the default 16 bytes. This extra length of 13 bytes per variable per record increased our working versions of the data sets by 3.9-7.8 megabytes per data set. Limitations on storage space made it difficult to work with data sets of this size and greatly contributed to our decision to maintain separate versions of our database.
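Both pitfalls can be checked with a few lines of arithmetic. The sketch below is in Python, not SAS, and is only an illustration. The field names are hypothetical (dBASE III field names may run to 10 characters), while the byte lengths and record counts are the figures quoted above.

```python
def truncated_names(field_names, limit=8):
    """Map each field name longer than `limit` to its truncated form."""
    return {name: name[:limit] for name in field_names if len(name) > limit}


def extra_bytes(records, numeric_vars, stored_len=16, needed_len=3):
    """Bytes added when every numeric is stored at stored_len, not needed_len."""
    return records * numeric_vars * (stored_len - needed_len)


# Hypothetical dBASE field names; only those over 8 characters are renamed.
fields = ["INJ_SEVER1", "AGE", "TRAUMA_SC"]
renamed = truncated_names(fields)

# Smallest and largest data sets described in the text.
low = extra_bytes(records=2000, numeric_vars=150)
high = extra_bytes(records=3000, numeric_vars=200)

print(renamed)
print(low / 1e6, high / 1e6)  # overhead in megabytes (10**6 bytes)
```

A program written against the original name INJ_SEVER1 would find no such variable after conversion, and the 13 extra bytes per value reproduce the 3.9 to 7.8 megabyte range reported for each data set.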
To correct the reassigned numeric variables, a separate DATA step can be run after the PROC DBF to define each variable data type explicitly; however, the problem of memory management remains for the duration of the SAS procedure until the converted data set is released.

While this procedure itself is not overly slow, it is still not a desirable process to follow whenever a statistical analysis is desired on some query. If we chose to maintain a single database in dBASE III PLUS format, a conversion would be required any time a statistical procedure was desired. Because the end users of the Trauma Registry System are physicians for whom analysis via SAS is of primary importance, it was not reasonable to convert the data and reset the data types each time a SAS statistical function is desired. This is a time-consuming process which would appear to treat analysis as a secondary concern, which it is not. It is mainly for this reason that a SAS database is created and maintained in addition to a dBASE III PLUS database.

While the data are maintained in two separate databases, any and all changes to information retained in the Trauma Registry System are applied to the dBASE III PLUS database tables. The update system applies the transaction tables to the dBASE database and also creates the SAS programs necessary to include these records in the SAS database. To maintain a level of consistency in the database, no manual changes are permitted to the SAS files. The SAS database is simply a static copy of the master database and is used whenever any of the multitude of SAS functions is desired.

It goes without saying that the problems of database maintenance, storage, and backup are doubled because of the need for separate databases. Unfortunately, there was no foreseeable method for avoiding the problem of multiple databases if the best features of DBMS's and SAS were to be exploited.

Why the SAS System was Selected

Because one of the primary objectives in developing our Trauma Registry System was to provide a means for extensive analytical capabilities for all data collected, SAS seemed a perfect solution. SAS was selected for the following simple reasons:

• It is an industry standard for statistical analysis;
• SAS provides routines to access dBASE files;
• Our clients were familiar with the output from SAS procedures;
• PC/SAS was available to our clients as members of the site license;
• We could depend on the vast SAS expertise within our company.

The SAS System was selected not only because it is one of the most extensive statistical and analytical packages available, but because it is easy to use and constantly being improved. Our experience with SAS in other projects has led us to understand that the SAS System is not without its limitations, but it more than adequately meets our immediate project needs as well as our anticipated future needs.

How the SAS System is Used

While the SAS System is fairly simple to use, users have been known to have trouble either because they were not familiar with computers or they were not comfortable with the fourth generation design. To avoid any problems with its use, we developed an interface to permit users to create queries and request statistics without knowing the syntax and/or operating procedures for the SAS System. The SAS System is utilized by the trauma surgeons to obtain a variety of results.

• Demographic statistics of all trauma patients: by race, sex, age, etc.
• Statistics based on type of injury: head, neck, spine, thorax, etc.
• Cross-tabulations such as type of injury versus injury severity score or trauma score, and type of injury versus discharge status.
• Graphs and charts summarizing distributions of trauma by type, time, location, etc.
• Reports summarizing yearly data to identify trends in trauma incidence over time, relationships between injury severity and survival time, expected course of stay based on initial injuries, etc.
• Application of advanced multivariate procedures to isolate important factors and determine probabilities of survival.

As user knowledge of the database and the SAS System progressed, the scope of their applications grew. The micro-to-mainframe link has been employed to tap into data stored on other mainframes and to utilize SAS procedures not available on the microcomputer. This link removes the limitation of having a strictly mainframe or microcomputer system. SAS/GRAPH has also been used for line plots, pie charts, bar charts, etc.

User Interface

An extensive menu-driven interface was developed to permit users easy access to the data and to provide a number of prepared procedures for their use. Since each machine in our distributed database network is dedicated to the use of the Trauma Registry System, the AUTOEXEC.BAT file was modified to invoke the system at boot time. The users then conduct their work from the menus and can invoke any of the subsystems of the Trauma Registry System including:

• the dBASE III PLUS/Clipper-based data entry and data update systems,
• the DOS-based data management system,
• the SAS-based reporting and analysis system,
• operating system commands.

Selection of either the data entry system or the data update system results in the direct execution of one of these systems. When one of these systems terminates, the user is returned to the main menu.

Selection of the data management system yields further menus provided to assist users with various management functions:

• Backup of existing data and system directories to diskette and restoration of such data and systems to hard disks.
• Backup of user-defined data and programs to diskette and restoration of these to hard disks.
• Installation of systems including the data entry, data update, data management, reporting, and analysis systems.
• Removal of obsolete data from the hard disks.

Selection of the SAS-based reporting and analysis system results not only in the initialization of the SAS environment but in the invocation of a menu system. The AUTOEXEC.SAS file is used to define the user environment, input/output libraries, menu system directories, and various function key settings. Subordinate menus provide for specific report generation and analysis capabilities.

Since the system was designed for use by physicians and support personnel, we strove to provide as many prepared procedures as possible so a working knowledge of SAS would not be needed. Generic SAS procedures were prepared to create data subsets based on user-specified criteria, to merge data from several data sets based on user-specified criteria, and to generate a variety of reports containing extensive analysis of data on a patient-by-patient basis as well as on more general groupings. By making the system menu-based, a user needed only to know how to select a menu option and the desired end result. The menu system generates the SAS code needed to perform the desired function. Then, the user needs only to execute the code.

As user proficiency in SAS increased through usage, there was less dependency upon the menu system for program generation. Experienced SAS users are able to bypass the menu system after SAS is invoked to create their own data steps and procedures. Inexperienced SAS users were quickly able to become proficient via our menu system because of the number of examples invoked to produce usable analytical results.

The result of this interface is an environment whereby the user can employ features from both dBASE III PLUS and SAS without being concerned with the details of the system.

Problems and Some Solutions

During prototype development, we encountered a problem with combining large SAS files. Merges of SAS files where each contained more than 100 variables always resulted in run-time memory errors. As an attempt to rectify the problem, KEEP= options were placed on each data set in the MERGE statement to restrict the number of variables in each. Some run-time errors were eliminated; however, aborts nevertheless occurred. To solve this problem, each data set used in the MERGE was first created as a working file containing only the variables necessary to satisfy the current query. These working data sets were then merged. While this approach is more time-consuming, it enables merges of large data sets to reach a normal completion.

A strong note of caution should be given concerning these merges. Lost clusters and data fragments result from failed merges. While this problem of lost clusters is probably evident to all PC SAS users, what we were not aware of was the potentially catastrophic results of running numerous aborted processes without freeing up these clusters for reuse. Several times during the course of developing our system and testing the upper limits of SAS in terms of the volumes of data SAS could handle, our file allocation tables were corrupted. Fortunately, we had retained backups of our computer configuration, so reformatting of the hard disk resulted only in loss of valuable time and not resources.

Desired SAS Enhancements

Because PROC DBF reformats numeric data types to a default length, it would be desirable to be able to explicitly define the data types of the variables within the procedure itself. This would eliminate the need to run a separate DATA step to redefine data types. It could also greatly reduce the size of the data sets created because numerics would not be assigned a maximum length if not specifically needed.

Conclusions

The Trauma Registry System which we developed incorporates many of the best features of both the SAS System Version 6.0 and the dBASE III PLUS database management system. Our system design and software integration indicate that a system of both dBASE and SAS, while not without its problems, can provide users with dependable systems to collect, maintain, and analyze data on the microcomputer level without having to rely upon third party support.

We have implemented user-friendly menus which trigger the writing of basic SAS code. This has resulted in introducing SAS to beginners without intimidating them with complex file definitions, library declarations, syntax, etc.

Ideally, we, as software developers, would much prefer to design and implement a system in a single programming language, thus minimizing problems with maintainability, portability, and compatibility. Unfortunately, at the time we developed our Trauma Registry System it would have been difficult and time-consuming to achieve that desire. Perhaps future versions of the SAS System will provide better data accessing capabilities to allow for direct linking to databases in non-SAS formats. Until that time, our system will be actively used by the trauma surgeons for whom it was developed and will be maintained to provide for their continuing needs and desires.

Acknowledgements

We would like to thank Dr. C. Michael Dunham, Dr. David Genns, and Mr. Dick Switalski for their support of our efforts. Mr. Chamrong Chutt was instrumental in implementing Injury Severity Score modules and various other SAS modules of the Trauma Registry System.