GREENLEAF & KALYANDRUG / Tumor Registry System
A TRAUMA REGISTRY SYSTEM USING THE SAS SYSTEM AND DBASE III PLUS
Andrew S. Greenleaf, ARC
Sivaram Kalyandrug, ARC
Abstract
ARC Professional Services Group (formerly ORI, Inc.) has developed a clinical Trauma Registry System to capture injury-related data, generate reports, and perform statistical analysis. For data entry and management of the registry, dBASE III PLUS/Clipper was utilized because of its efficiency and speed in processing a large number of data elements distributed across different files. PC/SAS Version 6.0 provided the extensive statistical and analytical support needed for report generation and analysis.
Introduction
The SAS System provides for very extensive statistical analysis of data; however, it lacks many of the capabilities database management systems (DBMS) offer. While SAS and DBMS's have overlapping strengths in areas such as data manipulation and report generation, each has features that surpass the corresponding features of the other. For example, most DBMS's do not provide the extensive statistical, analytical, and graphical capabilities of the SAS System. Conversely, the SAS System does not easily support relational links that DBMS's provide for data entry, update, and management. This is especially true when large database applications require data to span many tables but record-by-record processing is still desired.
After we analyzed the functional specifications of our application, the data organization, the user interfaces required, and the system resources available, an integration of a DBMS and the SAS system was a logical choice. This paper addresses some of the reasons for selecting our configuration as well as problems encountered during the implementation. Specifically, the following areas of concern will be addressed:
• Database Management System: why it was selected, how it is used;
• SAS System: why it was selected, how it is used;
• User Interfaces, including the use of pre-defined procedures and functions;
• Problems and some solutions;
• Desired SAS enhancements;
• Conclusions.
Background
The Maryland Institute for Emergency Medical Services Systems (MIEMSS), University of Maryland, Baltimore, Maryland, is the leading organization in the country for the treatment and care of shock trauma patients (critically injured accident victims). In fulfillment of a legislative requirement and more as a research tool in the study of etiology and treatment of trauma, MIEMSS collects enormous amounts of data on trauma victims. The data includes information on the physiological and anatomical aspects of injury to various body parts, the nature, cause, and extent of injury, the type and course of treatment, and other patient and hospital indicators.
The comprehensive database was designed to help the physicians and traumatologists in their research. It was part of a VAX system dedicated to on-line clinical care, but could not sufficiently meet the data processing demands for application research. To relieve dependence on the over-burdened VAX system and to better address the research needs, the new Trauma Registry System was designed and developed for use on microcomputers.
Why a Database Management System was Selected
While the primary objective of our system is to provide statistical and analytical support to traumatologists, the issue of how the data were to be entered into the computer had to be addressed. The relative merits of SAS and DBMS's as vehicles for data entry were weighed and prototypes of the two methodologies were compared. Following is a discussion of the evaluation process which led us to choose dBASE as a data entry and update tool.
In designing the specifications for our Trauma
Registry System, one factor stood out as most crucial
to the design of the system: the number of discrete
attributes recorded per patient. Because the physicians
who would be using the system anticipated a need to
use any and all recorded attributes, elimination of nonessential data entities was not possible. Even though
the approximately 3000 data entities could be
organized into many smaller clusters, there was still
the problem of displaying, editing, and updating this
volume of data on a patient-by-patient basis.
The volume of data collected yearly within the
hospital made it quickly evident that multiple files
would be required. Entity-Relationship modeling was
used to design the database. The approximately 3000
data entities were grouped into 29 tables, each
containing a primary key and specifically related data.
The resultant design was then analyzed in terms of SAS usage and DBMS usage. User requirements indicated a strong desire to enter all of a single patient's information at a given time. Since this data would span numerous database tables, some relational links were needed. Relational links between tables are much simpler to implement in a DBMS because much of the control is readily available to users.
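To make the idea of a relational link concrete, the following minimal sketch gathers one patient's rows from several tables through a shared primary key. It is written in Python purely for illustration and is not the authors' dBASE III PLUS code; the table names, field names, and the `patient_record` helper are all hypothetical.

```python
# Hypothetical tables keyed by patient ID. The real registry used 29 tables;
# three are enough to show the link.
TABLES = {
    "demographics": {"P001": {"age": 34, "sex": "M"},
                     "P002": {"age": 61, "sex": "F"}},
    "injury":       {"P001": {"region": "thorax", "cause": "vehicle"}},
    "treatment":    {"P001": {"procedure": "chest tube"},
                     "P002": {"procedure": "splint"}},
}


def patient_record(tables, patient_id):
    """Collect every table's row for one patient via the shared key."""
    record = {}
    for name, rows in tables.items():
        row = rows.get(patient_id)
        if row is not None:          # a patient need not appear in every table
            record[name] = dict(row)
    return record


if __name__ == "__main__":
    print(patient_record(TABLES, "P001"))
```

The point of the sketch is only that a single key lookup per table is enough to assemble a whole patient record for one data entry session, which is the kind of control a DBMS makes readily available.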
The SAS System concentrates on a single data set
at a time; basically on a file-by-file basis. For support
personnel to enter a patient's entire record of
information which spans multiple files, a connection
among those files must be established. One method of accomplishing this is to simply merge files together. Memory constraints, apparently due to limits on the number of variables which can be merged into one file, made it virtually impossible to merge more than a couple of our files. While the SAS System documentation does not indicate an upper limit on the number of variables within a data file, our experience has shown that attempts to merge files with more than 100 variables each result in inconsistent, but ever present, memory constraint problems. These merge steps at best caused run-time aborts and at worst locked the system up (requiring cold starts), created numerous lost clusters of memory, and at times corrupted the DOS file allocation table.
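The merge problem is easier to picture with a small sketch. The remedy the paper reports under Problems and Some Solutions was to build working files that hold only the variables a query needs and then merge those. The Python below illustrates that idea only; it is not SAS and not the authors' code. The helper names `keep` and `merge_by_key` and the sample variables are hypothetical, and each key is assumed to occur at most once per file.

```python
def keep(rows, key, variables):
    """Build a working copy holding only the key and the named variables."""
    return [{name: row[name] for name in (key, *variables)} for row in rows]


def merge_by_key(left, right, key):
    """Match-merge two row lists on a shared key, keeping unmatched rows.

    Variables absent from one side are simply left out of the merged row
    (a real SAS merge would carry them as missing values instead).
    """
    merged = {}
    for row in left:
        merged.setdefault(row[key], {}).update(row)
    for row in right:
        merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


if __name__ == "__main__":
    vitals = [{"id": 1, "sbp": 120, "hr": 80, "temp": 37.0},
              {"id": 2, "sbp": 90, "hr": 110, "temp": 36.5}]
    injury = [{"id": 1, "region": "head", "iss": 25},
              {"id": 3, "region": "thorax", "iss": 16}]
    # Subset first, then merge: the merged rows carry two variables, not five.
    print(merge_by_key(keep(vitals, "id", ["sbp"]),
                       keep(injury, "id", ["iss"]), "id"))
```

Subsetting before the merge keeps the combined row narrow, which is the property that mattered when each source file carried more than 100 variables.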
Since a patient's record spans many data sets,
prototypes indicated that SAS macros might be used to
loop through each data set, opening and closing each
for the single record being processed. Despite the
overhead and the inefficiency of the approach, SAS
macros on the microcomputer were, at the time, too
limited to even permit such a design. This approach
would also have required extensive processing to
handle cursor control from one file's information to
another, and basically would have required coding
many of the features already present in a DBMS.
Even the SAS FSP procedures would have required
complicated processing to handle cursor control from
one file to another and would also have increased
response time when processing multiple patients at a
time. This left us with the choice of designing the system to process each SAS data set for all observations before proceeding to another data set, thus abandoning the desire to enter an entire patient's record at one time, or of selecting some other software package for the system.
It is for these reasons that we explored the possibilities of using a DBMS for the data management portion of our Trauma Registry System. After analyzing various database packages, we chose dBASE III PLUS for several reasons:
1. It is relational, and prototypes met our basic requirements quite satisfactorily.
2. It is simple to use and has interactive program development facilities.
3. Use of a code compiler provided us with executable code which could be distributed to users without their need to purchase additional software.
4. It was readily available for our immediate use.
Since dBASE is relational and permits easy linking of multiple database tables, the original user requirements were met and speed and performance were greatly increased. Of course, if our database were small enough to reside within one or two SAS data sets, SAS could have easily been used for data entry functions.
How the Database Management System is Used
Using the relational database capabilities of dBASE III PLUS, a database was constructed and systems for data entry and update were created. The Nantucket Clipper compiler was chosen to compile our dBASE code into object modules which were linked to create discrete executable modules for these systems. The entry system is screen-driven and offers easy access for support personnel to enter and modify patient information. The entry system also permits patient-by-patient browsing of the data and individual-record edit and analysis capabilities.
While our Trauma Registry System is a distributed database system operational at a number of independent nodes, there are no physical connections among the nodes of the implied network. The users are relied upon to act as the media for transferring data between the workstations and the main machine; thus, maintaining data integrity is difficult when the users do not make timely transfers of the data. To address integrity and consistency in the database, transaction management principles were examined. Additions of new data and modifications to existing data are performed via transaction records, which are stored in transaction tables created to contain the new information. The system creates a transaction table for every master table to be updated.
An update system, residing on the main machine, applies the transaction records to the master tables. Support personnel transport their transaction tables to this machine and the update system applies these changes to the appropriate master database files. Copies of the new master files are then distributed to each subordinate machine to maintain data integrity. Because multiple workstations are capable of processing the same data simultaneously, updates to the master database tables are processed sequentially. Consequently, multiple transaction batches must be applied sequentially with the most recently executed batch becoming the master. Data integrity can be compromised if separate workstations process updates to a given record but do not contain identical master tables. A fully interconnected distributed database system would apply updates to a single master database table which would then be immediately available to all users. Since our configuration is not fully interconnected, it was necessary to stress the fact that simultaneous work from separate workstations on a given record could result in data inconsistency and integrity problems unless each workstation maintained identical master files.
A number of problems exist because of this configuration. The first occurs when the users of the subordinate machines fail to update their copies of the master files. If this happens, new transaction records could contain data which has since changed. Applying the transactions to the master files would then reset some attributes to previous values, thus compromising data integrity and creating inconsistency. Because the potential for this problem could not be eliminated, logs are maintained to record all changes to the database. In this manner, inconsistencies can be tracked and corrected if they arise.
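The sequential application of transaction batches, and the way a stale workstation copy can reset a value, can be sketched as follows. This is an illustrative Python model under stated assumptions, not the authors' Clipper update system: the transaction layout `(patient_id, field, new_value)`, the function names, and the sample data are all hypothetical.

```python
def apply_batch(master, batch, log):
    """Apply one workstation's transaction records to the master table.

    Each transaction is (patient_id, field, new_value). Every change is
    logged together with the value it replaced so it can be traced later.
    """
    for patient_id, field, new_value in batch:
        row = master.setdefault(patient_id, {})
        old_value = row.get(field)
        row[field] = new_value
        log.append((patient_id, field, old_value, new_value))


def touched_more_than_once(log):
    """List the (patient_id, field) pairs changed by more than one log entry."""
    seen, repeated = set(), []
    for patient_id, field, _old, _new in log:
        key = (patient_id, field)
        if key in seen and key not in repeated:
            repeated.append(key)
        seen.add(key)
    return repeated


if __name__ == "__main__":
    master = {"P001": {"status": "admitted"}}
    log = []
    batch_a = [("P001", "status", "discharged")]  # entered against a current copy
    batch_b = [("P001", "status", "admitted")]    # entered against a stale copy
    apply_batch(master, batch_a, log)
    apply_batch(master, batch_b, log)             # last batch applied wins
    print(master)
    print(touched_more_than_once(log))
```

Because batches are applied strictly one after another, the second batch silently restores the older value. The log does not prevent that, but it records enough to find such records and correct them, which is the role the paper assigns to its change logs.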
Our system design has minimized the problems of data integrity and inconsistency significantly. Because the workstations composing the Trauma Registry System network are used primarily for data-entry purposes, the entire master database is not stored on these machines. Patient information is entered at any of the workstations and these transaction batches are applied to the master database files on the main machine. Updates to existing data occur infrequently and are made directly on the main machine. In this manner, data integrity is not easily compromised should separate workstations be accessing similar data. This approach also permits the introduction of some security on the sensitive patient information processed throughout the Trauma Registry System. By limiting each workstation's access to the master database to small portions at a time, only the main machine has capabilities for reporting and analyzing the entire spectrum of data in the database. This has the effect of limiting the number of sites requiring physical security features (i.e. lock and key).
Because data are processed in dBASE/Clipper-based systems as well as SAS-based systems, either the data must be in a format acceptable to both software packages or two versions of the database tables are needed. Presently, dBASE cannot access data in SAS files and while SAS can access data in dBASE files, this data must be converted into SAS data sets for use. In light of this fact, we decided to maintain two databases, a dBASE III PLUS formatted database and a SAS database. The dBASE database was selected as the master for several basic reasons:
1. Each dBASE system designed for the Trauma Registry System can access the master data directly;
2. dBASE stores data in a more compressed format, and the volume of data anticipated for this project is very large;
3. SAS can convert dBASE files into SAS files while no corresponding procedure exists in dBASE to convert SAS files.
The SAS procedure PROC DBF provides a means to convert dBASE table information into SAS accessible form, but the process requires some care. Variable names are copied from the dBASE table header; however, since SAS restricts variable names to 8 characters, truncation of longer names occurs. Since SAS issues no messages that this truncation has occurred, users can potentially create programs which attempt to access seemingly nonexistent variables. The assignment of data types can also lead to problems. In SAS, character variables are stored exactly as in the dBASE table; however, numeric data are invariably stored in 16 bytes. Consequently, numeric variables which have been defined with lengths of 1 byte, 2 bytes, etc. in dBASE are defined as 16-byte numerics in SAS. While no data values are lost this way, the resultant SAS files can increase in size over their dBASE equivalents by tremendous amounts. Since microcomputers do not have the virtually endless supply of disk space that mainframes appear to have, this can create problems. (Of course, if your microcomputer has an unlimited supply of hard disk space, this will be of no concern to you.) In our applications, data sets had on the order of 2000-3000 records each with 150-200 numeric variables. While each of these variables could easily be captured using the minimum SAS numeric length of 3 bytes, each is set to the default 16 bytes. This extra length of 13 bytes per variable per record increased our working versions of the data sets by 3.9-7.8 megabytes per data set. Limitations on storage space made it difficult to work with data sets of this size and greatly contributed to our decision to maintain separate versions of our database. To correct the reassigned numeric variables, a separate DATA step can be run after the PROC DBF to define each variable data type explicitly; however, the problem of memory management remains for the duration of the SAS procedure until the converted data set is released.
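Both conversion hazards can be checked before running the conversion. The Python sketch below is an illustration only, not part of SAS or of the authors' system: `truncated_names` lists the field names that would change under an 8-character limit, and `extra_bytes` reproduces the paper's storage arithmetic from the figures stated above (16 bytes stored, 3 bytes needed). The function names and the sample field names are hypothetical.

```python
def truncated_names(field_names, limit=8):
    """Map each field name longer than `limit` to its truncated form."""
    return {name: name[:limit] for name in field_names if len(name) > limit}


def extra_bytes(records, numeric_vars, stored_len, needed_len):
    """Bytes added when every numeric is stored wider than it needs to be."""
    return records * numeric_vars * (stored_len - needed_len)


if __name__ == "__main__":
    # Hypothetical dBASE field names (dBASE III allows up to 10 characters).
    print(truncated_names(["PAT_ID", "ADMIT_DATE", "ISS"]))
    # The paper's ranges: 2000-3000 records, 150-200 numeric variables.
    low = extra_bytes(2000, 150, stored_len=16, needed_len=3)
    high = extra_bytes(3000, 200, stored_len=16, needed_len=3)
    print(low / 1_000_000, high / 1_000_000)  # megabytes, counted as 10**6 bytes
```

With 13 surplus bytes per variable per record, the two ends of the range come to 3,900,000 and 7,800,000 bytes, which matches the 3.9-7.8 megabytes reported per data set.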
While this procedure itself is not overly slow, it is still not a desirable process to follow whenever a statistical analysis is desired on some query. If we chose to maintain a single database in dBASE III PLUS format, any time a statistical procedure was desired a conversion would be required. Because the end users of the Trauma Registry System are physicians for whom analysis via SAS is of primary importance, it was not reasonable to convert the data and reset the data types each time a SAS statistical function is desired. This is a time-consuming process which appears to prioritize analysis as a secondary concern, which it is not. It is mainly for this reason that a SAS database is created and maintained in addition to a dBASE III PLUS database.
While the data are maintained in two separate databases, any and all changes to information retained in the Trauma Registry System are applied to the dBASE III PLUS database tables. The update system applies the transaction tables to the dBASE database and also creates the SAS programs necessary to include these records in the SAS database. To maintain a level of consistency in the database, no manual changes are permitted to the SAS files. The SAS database is simply a static copy of the master database and is used whenever any of the multitude of SAS functions is desired. It goes without saying that the problems of database maintenance, storage, and backup are doubled because of the need for separate databases. Unfortunately, there was no foreseeable method for avoiding the problem of multiple databases if the best features of DBMS's and SAS were to be exploited.
Why the SAS System was Selected
Because one of the primary objectives in developing our Trauma Registry System was to provide a means for extensive analytical capabilities for all data collected, SAS seemed a perfect solution. SAS was selected for the following simple reasons:
• It is an industry standard for statistical analysis;
• SAS provides routines to access dBASE files;
• Our clients were familiar with the output from SAS procedures;
• PC/SAS was available to our clients as members of the site license;
• We could depend on the vast SAS expertise within our company.
The SAS System was selected not only because it is one of the most extensive statistical and analytical packages available, but because it is easy to use and constantly being improved. Our experience with SAS in other projects has led us to understand that the SAS System is not without its limitations but it more than adequately meets our immediate project needs as well as our anticipated future needs.
How the SAS System is Used
While the SAS System is fairly simple to use, users have been known to have trouble either because they were not familiar with computers or they were not comfortable with the fourth generation design. To avoid any problems with its use, we developed an interface to permit users to create queries and request statistics without knowing the syntax and/or operating procedures for the SAS System. The SAS System is utilized by the trauma surgeons to obtain a variety of results:
• Demographic statistics of all trauma patients; by race, sex, age, etc.
• Statistics based on type of injury: head, neck, spine, thorax, etc.
• Cross-tabulations such as type of injury versus injury severity score or trauma score, and type of injury versus discharge status.
• Graphs and charts summarizing distributions of
trauma by type, time, location, etc.
• Reports summarizing yearly data to identify trends in trauma incidence over time, relationships between injury severity and survival time, expected course of stay based on initial injuries, etc.
• Application of advanced multivariate procedures to isolate important factors and determine probabilities of survival.
As user knowledge of the database and the SAS System progressed, the scope of their applications grew. The micro-to-mainframe link has been employed to tap into data stored on other mainframes and to utilize SAS procedures not available on the microcomputer. This link removes the limitation of having a strictly mainframe or microcomputer system. SAS/GRAPH has also been used for line plots, pie charts, bar charts, etc.
User Interface
An extensive menu-driven interface was developed to permit users easy access to the data and to provide a number of prepared procedures for their use. Since each machine in our distributed database network is dedicated to the use of the Trauma Registry System, the AUTOEXEC.BAT file was modified to invoke the system at boot time. The users then conduct their work from the menus and can invoke any of the subsystems of the Trauma Registry System including:
• the dBASE III PLUS/Clipper-based data entry and data update systems,
• the DOS-based data management system,
• the SAS-based reporting and analysis system,
• operating system commands.
Selection of either the data entry system or the data update system results in the direct execution of one of these systems. When one of these systems terminates, the user is returned to the main menu. Selection of the data management system yields further menus provided to assist users with various management functions:
• Backup of existing data and system directories to diskette and restoration of such data and systems to hard disks.
• Backup of user-defined data and programs to diskette and restoration of these to hard disks.
• Installation of systems including the data entry, data update, data management, reporting, and analysis systems.
• Removal of obsolete data from the hard disks.
Selection of the SAS-based reporting and analysis system results not only in the initialization of the SAS environment but in the invocation of a menu system. The AUTOEXEC.SAS file is used to define the user environment, input/output libraries, menu system directories, and various function key settings. Subordinate menus provide for specific report generation and analysis capabilities. Since the system was designed for use by physicians and support personnel, we strove to provide as many prepared procedures as possible so a working knowledge of SAS would not be needed. Generic SAS procedures were prepared to create data subsets based on user-specified criteria, to merge data from several data sets based on user-specified criteria, and to generate a variety of reports containing extensive analysis of data on a patient-by-patient basis as well as on more general groupings. By making the system menu-based, a user needed only to know how to select a menu option and the desired end result. The menu system generates the SAS code needed to perform the desired function. Then, the user needs only to execute the code.
As user proficiency in SAS increased through usage, there was less dependency upon the menu system for program generation. Experienced SAS users are able to bypass the menu system after SAS is
invoked to create their own data steps and procedures. Inexperienced SAS users were quickly able to become proficient via our menu system because of the number of examples it provides for producing usable analytical results. The result of this interface is an environment whereby the user can employ features from both dBASE III PLUS and SAS without being concerned with the details of the system.
Problems and Some Solutions
During prototype development, we encountered a problem with combining large SAS files. Merges of SAS files where each contained more than 100 variables always resulted in run-time memory errors. As an attempt to rectify the problem, KEEP= options were placed on each data set in the MERGE statement to restrict the numbers of variables in each. Some run-time errors were eliminated; however, aborts nevertheless occurred. To solve this problem, each data set used in the MERGE was first created as a working file containing only the variables necessary to satisfy the current query. These working data sets were then merged. While this approach is more time-consuming, it enables merges of large data sets to reach a normal completion. A strong note of caution should be given concerning these merges. Lost clusters and data fragments result from failed merges. While this problem of lost clusters is probably evident to all PC SAS users, what we were not aware of was the potentially catastrophic results of running numerous aborted processes without freeing up these clusters for reuse. Several times during the course of developing our system and testing the upper limits of SAS in terms of volumes of data SAS could handle, our file allocation tables were corrupted. Fortunately, we had retained backups of our computer configuration so reformatting of the hard disk resulted only in loss of valuable time and not resources.
Desired SAS Enhancements
Because PROC DBF reformats numeric data types to a default length, it would be desirable to be able to explicitly define the data types of the variables within the procedure itself. This would eliminate the need to run a separate DATA step to redefine data types. It could also greatly reduce the size of the data sets created because numerics would not be assigned a maximum length if not specifically needed.
Conclusions
The Trauma Registry System which we developed incorporates many of the best features of both the SAS System Version 6.0 and the dBASE III PLUS database management system. Our system design and software integration indicate that a system of both dBASE and SAS, while not without its problems, can provide users with dependable systems to collect, maintain, and analyze data on the microcomputer level without having to rely upon third party support.
We have implemented user-friendly menus which activate writing of basic SAS code. This has resulted in introducing SAS to beginners without intimidating them with complex file definitions, library declarations, syntax, etc.
Ideally, we, as software developers, would much prefer to design and implement a system in a single programming language, thus minimizing problems with compatibility, portability, and maintainability. Unfortunately, at the time we developed our Trauma Registry System it would have been difficult and time-consuming to achieve that desire. Perhaps future versions of the SAS System will provide better data accessing capabilities to allow for direct linking to databases in non-SAS formats. Until that time, our system will be actively used by the trauma surgeons for whom it was developed and will be maintained to provide for their continuing needs and desires.
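The menu-driven generation of SAS code described in this paper can be pictured with a short sketch. The Python function below turns a few menu selections into SAS source text for a subset followed by a cross-tabulation. It is a hypothetical illustration, not the authors' menu system: the function name, the `TRAUMA` library, and the variable names are assumptions, and only ordinary DATA step and PROC FREQ statements are emitted.

```python
def generate_crosstab(library, dataset, row_var, col_var, condition=None):
    """Return SAS source text for a cross-tabulation, optionally on a subset."""
    lines = ["DATA WORK.SUBSET;", f"  SET {library}.{dataset};"]
    if condition:
        # A subsetting IF keeps only the observations that satisfy the condition.
        lines.append(f"  IF {condition};")
    lines += [
        "RUN;",
        "PROC FREQ DATA=WORK.SUBSET;",
        f"  TABLES {row_var} * {col_var};",
        "RUN;",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Menu choices: injury region versus discharge status, severe cases only.
    print(generate_crosstab("TRAUMA", "INJURY", "REGION", "DISCHARG", "ISS > 15"))
```

The user supplies only the selections; the generated text is what would then be submitted to SAS, so a beginner never has to type a library reference or a procedure statement.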
Acknowledgements
We would like to thank Dr. C. Michael Dunham,
Dr. David Genns, and Mr. Dick Switalski for their
support of our efforts. Mr. Chamrong Chutt was
instrumental in implementing Injury Severity Score
modules and various other SAS modules of the
Trauma Registry System.