Linking Medical and Research Data Bases with TMR* and SAS* Software
Lawrence H. Muhlbaier
Jean A. Dozier
Duke University Medical Center
selection, and a specification file that
describes the records to the SAS data base. A
SAS program reads the specification file to
generate the SAS code to read the TMR data
records and create or update the SAS data
base. Each of these components will be
described, with examples taken from the TMR to SAS interface.
ABSTRACT
Patient care data bases and clinical
research data bases often share a large
portion of data. An automatic transfer of
data eliminates the duplicate entry process
and increases consistency between the data
bases. This paper provides a description of
the different components necessary for the
transfer between patient care records of TMR
and the relational data base records of SAS:
structure, selection, and interpretation. A
TMR program extracts record components for
selected patients and places them in a file
for SAS to access. TMR creates a
specification file that describes the contents
of the data file. A SAS program reads the TMR
specification file and generates the SAS
program to then read the TMR data and create
or update the SAS data base. We describe the
SAS programming aspects of the transfer. See
Dozier (1985) for details on the TMR aspects.
The system is in use at Duke University
Medical Center for Cardiology and Cardiac
Surgery data bases. The techniques used and
problems encountered are applicable to any
transfer between a network-type database and
SAS.
The transfer system was designed to
minimize the number of changes that would need
to be made to the SAS code to maintain the
system. All changes in the SAS data base
specifications come from the TMR data
dictionary. The TMR data dictionary already
contained information that could be
transformed into SAS names, labels, and value
labels, as well as some of the SAS formatting
information.
RECORD STRUCTURE
Patient care data bases are used to store
the complete patient record including
financial and administrative information.
Records include demographic information,
medical history, laboratory results,
medications prescribed and dispensed,
treatments, clinic or hospital visits,
appointments, and financial information. The
clinical data base must allow the entry of one
or more of several thousand findings, each
with a variety of possible responses. Data
must be displayed by problem, encounter, or
over time, but not usually across patients.
The system must be flexible enough to satisfy
physicians that their findings have been
accurately described; TMR must be able to
satisfy any clinical need. The TMR data base
is a multi-user system, with records
accessible for read and write from multiple
locations at one time.
INTRODUCTION
The increasingly detailed information
from clinical data bases for patient care and
reimbursement has made these data bases an
important source of information for addressing
medical research problems (Pryor, 1985).
Rather than add the statistical capabilities
to a medical data base, an automatic transfer
system from a medical data base to a
statistical data base was designed. The
Cardiology and Cardiac Surgery divisions at
Duke University Medical Center use TMR
(Hammond, 1979), "The Medical Record"*, to
capture data on patients undergoing invasive
and non-invasive studies and therapies for
coronary artery disease. The information in
TMR is used for the management of patient
care, including clinical test and procedure
notes and follow-up flow sheets. SAS (SAS
Institute, Inc., 1985) products are used to
determine which clinical variables are
prognostically important and to describe the
differences found between groups of patients
undergoing different treatments.
Because of their different purposes, the
record format of a patient care data base is
quite different from that of an analytical
data base. TMR creates a single record for
each patient (Figure 1). The different types
of information (lab results, medications) are
stored in different sections of the record.
Further, repeat entries of an item are stored
together in reverse time order (most recent
first). Analytical data bases used for
clinical research require their records to be
rigidly defined, typically using a relational
model (Figure 2). A separate record exists
for each component of a medical record
(demographic, history, each lab test, etc.).
If a serum creatinine level, say, is taken on
three different occasions, the results would
be located in three different records in the
SAS data file for creatinine labs ordered.
Multiple labs that are performed together
would go into one SAS record for each
ordering.
This paper describes the components
necessary to link the TMR data structures
(essentially a network or hierarchical file
structure allowing many repeats of data items
in one record) to the SAS data structures
(relational tables). Creating such a link
involves working with three separate TMR
components: record structure, record
The single record per patient in TMR must
be converted to multiple records per patient
per type of finding in the SAS data base.
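The conversion from one nested record per patient to several flat records per patient can be sketched in Python. This is only an illustration of the idea, not the TMR or SAS implementation: the dictionary layout, the section names, the sample values, and the `flatten_patient` helper are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical stand-in for one TMR patient record: a single record per
# patient, with repeated items stored most recent first.
patient = {
    "id": "A001",
    "demographic": {"sex": "F", "birth_year": 1930},
    "labs": {
        "creatinine": [
            ("1985-03-01", 1.4),
            ("1985-02-01", 1.2),
            ("1985-01-01", 1.1),
        ],
    },
}


def flatten_patient(record):
    """Split one nested patient record into rows for separate tables."""
    tables = defaultdict(list)
    # One demographic row per patient.
    tables["demographic"].append({"id": record["id"], **record["demographic"]})
    # One row per lab result, re-sorted into chronological order.
    for test, results in record["labs"].items():
        for date, value in sorted(results):
            tables[test].append({"id": record["id"], "date": date, "value": value})
    return dict(tables)


tables = flatten_patient(patient)
for name, rows in tables.items():
    print(name, len(rows))
```

Each table in the result corresponds to what would be one SAS data set, keyed by the patient identifier so the rows can be joined again later.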
TMR record to see if this assumption is valid.
If it is not a valid assumption, then the log
will need to be expanded.
RECORD INTERPRETATION
In order to develop a system that is
flexible enough for different applications,
TMR provides a user-defined data dictionary.
This dictionary contains a list of data
elements, characteristics of each element (for
example, response format, value limits,
element labels, and value labels), and
relationships among data elements (such as the
lab items that make up a lab panel).
Dictionary entries are used to specify the
record components for transfer to the
analytical data base.
SELECTING FOR RECORD TRANSFER
There are six options for transfer, given
the different structures of the data bases:
Replace:
1) All records, all components.
2) Selected records, all components.
3) Selected records, selected components.
Update:
4) All updated records, all components.
5) Selected updated records, selected components.
6) Updated components.
SPECIFICATION FILE
The specification file is necessary to
define for SAS the identity of the data
elements that are to be transferred. A SAS
program reads the specification file and
generates the SAS code necessary to read the
data record components that TMR has created.
The specification file contains the name of
the SAS data base for each record component
and the variable names, labels, response
types, and lengths. The specification file
also contains the value labels, where
appropriate, for SAS's PROC FORMAT.
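As an illustration of how value labels in a specification file could be turned into PROC FORMAT source, here is a short Python sketch. The paper does not show the layout of the specification file, so the `value_labels` structure, the format names, and the `proc_format_source` helper are hypothetical; the authors' generator was itself a SAS program.

```python
# Hypothetical value-label section of a specification file:
# format name -> {coded value: label}.
value_labels = {
    "SEXF": {1: "Male", 2: "Female"},
    "YESNO": {0: "No", 1: "Yes"},
}


def proc_format_source(labels):
    """Return SAS PROC FORMAT source text for numeric value labels."""
    lines = ["PROC FORMAT;"]
    for fmt_name, mapping in labels.items():
        lines.append(f"  VALUE {fmt_name}")
        for code, label in sorted(mapping.items()):
            # Double embedded single quotes, as SAS string literals require.
            escaped = label.replace("'", "''")
            lines.append(f"    {code} = '{escaped}'")
        # Terminate the VALUE statement on its last line.
        lines[-1] += ";"
    lines.append("RUN;")
    return "\n".join(lines)


print(proc_format_source(value_labels))
```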
Options 3) and 5) were chosen for
implementation for several reasons. Option 3)
is a generalization of Options 1) and 2) and
is needed to start up a SAS data base from an
existing TMR data base. Option 5) is what is
used for the main task of maintaining the TMR
and SAS data bases in parallel. While option
6) would minimize the amount of data to
process and transfer, the increase in the
amount of information stored to identify the
updated components is substantial, putting on
the order of 30-50 times the volume of entries
into the TMR log file. Logging selected
records instead of selected components for
transfer reduces the number of update
identification entries created. The savings
in time on the TMR system is significant, in
that the initial processing of the Cardiology
TMR data base for transfer to SAS takes
several days in real time to conclude. The
savings on the SAS system are smaller since
the main SAS data file must still be passed by
the SAS UPDATE program to create each new SAS
data file. The choice to send only selected
record components is based on the differing
needs of clinical medicine and research. The
variables of interest to examine for research
are only a portion of those needed for patient
care. In the cardiac surgery data base, for
example, only 32% of the record components in
the TMR data dictionary are selected for
transfer to SAS. This may well be an upper
bound on the amount of data transferred for a
more general clinical practice.
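A rough Python sketch of Option 5 (selected updated records, selected components) follows. The log entries, record layout, dictionary flags, and the `select_for_update` helper are assumptions for illustration; the actual extraction is done by a TMR program described in Dozier (1985).

```python
def select_for_update(records, log_entries, dictionary):
    """Return the selected components of every record named in the log."""
    # The log identifies updated records, not updated components.
    updated_ids = {entry["patient_id"] for entry in log_entries}
    # The dictionary marks which components are transferred at all.
    wanted = {name for name, info in dictionary.items() if info["transfer"]}
    extract = {}
    for patient_id in sorted(updated_ids):
        record = records[patient_id]
        extract[patient_id] = {k: v for k, v in record.items() if k in wanted}
    return extract


records = {
    "A001": {"labs": [1.4, 1.2], "medications": ["aspirin"], "accounting": [100]},
    "A002": {"labs": [0.9], "medications": [], "accounting": [250]},
}
log_entries = [
    {"patient_id": "A001", "section": "LABS"},
    {"patient_id": "A001", "section": "SAPS"},
]
dictionary = {
    "labs": {"transfer": True},
    "medications": {"transfer": True},
    "accounting": {"transfer": False},
}

extract = select_for_update(records, log_entries, dictionary)
print(extract)
```

Because the log works at the record level, every selected component of a logged patient is sent, including components that did not change. That is the trade-off the text describes between log volume and transfer volume.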
Figure 3 shows a block diagram of the
levels of data base communication between TMR
and SAS.
SAS CONVERSION PROGRAMS
Using techniques of automatic
programming, a SAS program reads the TMR
Specification File and generates SAS code to
read the created data records and update the
SAS data base. This program can access a SAS
file of exceptions to handle variable name and
label changes and to resolve duplicate name
conflicts. Name conflicts arise because TMR
names may be 12 characters long containing any
characters, while SAS is limited to 8, and TMR
names are actually short labels. Thus the
same TMR name may legitimately be used in two
different places. Having a file to change the
names also means that we need to verify that
there are no duplicate names that slip through
to SAS. After checking for all duplicates,
the program halts if any are found.
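The name handling described above can be sketched in Python as follows. The truncation rule, the override table keyed by TMR item number, and the function names are assumptions made for the example; the paper states only that an exceptions file exists and that the program halts when duplicates remain.

```python
import re


def to_sas_name(tmr_name):
    """Reduce a TMR name to a candidate SAS name of at most 8 characters."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "", tmr_name).upper()
    # SAS names may not be empty or start with a digit.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned[:8]


def build_name_map(items, overrides):
    """Return {item number: SAS name}; raise if any SAS name is reused."""
    by_sas_name = {}
    for item_no, tmr_name in items:
        sas_name = overrides.get(item_no) or to_sas_name(tmr_name)
        by_sas_name.setdefault(sas_name, []).append(item_no)
    # Collect every conflict first, then halt, as the text describes.
    duplicates = {n: nums for n, nums in by_sas_name.items() if len(nums) > 1}
    if duplicates:
        raise ValueError(f"duplicate SAS names: {duplicates}")
    return {nums[0]: name for name, nums in by_sas_name.items()}


items = [(101, "CREATININE 1"), (102, "CREATININE 2")]
try:
    build_name_map(items, {})
except ValueError as err:
    print("halted:", err)
print(build_name_map(items, {102: "CREAT2"}))
```

Both 12-character names truncate to the same 8 characters, so the first call halts; supplying an override for one item resolves the conflict.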
Certain assumptions were made in the
implementation of the TMR update logging
facility that may need to be revised in the
future. In particular, TMR logs activities in
LABS and SAPS, but does not log any activities
in other areas of the patient's record such as
demographic, problems, or medications. This
is based on the assumption that changes in
these non-logged areas have a very high
probability of being associated with changes
in other areas that are logged. During the
next year we will closely monitor the activity
in the logged versus non-logged areas of the
The design difference between TMR and SAS
that causes the most problems is that TMR
actually labels data with code numbers based
on the data dictionary; TMR "names" are
just short labels. Thus changes can be made to
the TMR data dictionary that are totally
transparent to the TMR system but completely
compromise the data in the SAS data base. For
example, it is perfectly legitimate in TMR to
exchange the names on two variables. An update
in SAS would proceed smoothly, but the data
already in the SAS data base is now garbage. To
prevent this the TMR transfer specification
file contains the TMR item number for each
entity. The SAS transfer program compares the
current specification file to the previous
update's specification file for upward
compatibility.
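One way to picture this comparison is the Python sketch below. The paper does not define the compatibility test precisely, so treating "same item number, different name" as the conflict is an assumption, as are the sample item numbers, the variable names, and the `incompatible_items` helper.

```python
def incompatible_items(previous, current):
    """Return item numbers whose SAS name differs between two specifications.

    Each argument maps a TMR item number to the SAS variable name in that
    specification. Items present in only one specification are ignored here,
    since a new item is an addition rather than a conflict.
    """
    return sorted(
        item for item in previous.keys() & current.keys()
        if previous[item] != current[item]
    )


previous_spec = {201: "SYSBP", 202: "DIABP", 203: "HR"}
# Two names exchanged in the dictionary, plus one new item.
current_spec = {201: "DIABP", 202: "SYSBP", 203: "HR", 204: "WEIGHT"}

print(incompatible_items(previous_spec, current_spec))
```

A comparison by name alone would see the same set of names in both files and miss the exchange; keying on the item number is what exposes it.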
and Data File can be read by any of the
popular analytical programs.
The TMR section of the programs is in
place and operational. The SAS programs are
functional for replacement of a data base, but
not for incremental update. That is expected
by the time this paper is published.
In processing the specification file, the
SAS code generator creates INPUT statements
and KEEP lists for each SAS dataset being
created. Length statements are created based
on the variable's data type, the formats are
attached permanently to the associated
variables, and the format library is updated
and compressed. A possible addition to the
system would be the capability to specify a
format to be used in the name change file.
The SAS code generator also accesses a general
code section that allows the SAS data base
manager to insert SAS data base specific
changes into the generated code each time the
update is performed.
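The code generation step can be illustrated with a small Python sketch that emits DATA step source from a variable list. The variable descriptions, the data set name, the fileref `TMRDATA`, and the use of list input are assumptions for the example; the paper does not give the data file layout, and the authors' generator was written in SAS.

```python
def data_step_source(dataset, fileref, variables):
    """Return SAS DATA step source that reads `variables` with list input.

    `variables` is a list of dicts with keys name, type ("char" or "num"),
    length, and label.
    """
    length_parts, input_parts, label_parts, names = [], [], [], []
    for var in variables:
        dollar = " $" if var["type"] == "char" else ""
        escaped = var["label"].replace("'", "''")
        length_parts.append(f"{var['name']}{dollar} {var['length']}")
        input_parts.append(f"{var['name']}{dollar}")
        label_parts.append(f"{var['name']} = '{escaped}'")
        names.append(var["name"])
    lines = [
        f"DATA {dataset};",
        f"  INFILE {fileref};",
        "  LENGTH " + " ".join(length_parts) + ";",
        "  INPUT " + " ".join(input_parts) + ";",
        "  LABEL " + " ".join(label_parts) + ";",
        "  KEEP " + " ".join(names) + ";",
        "RUN;",
    ]
    return "\n".join(lines)


variables = [
    {"name": "ID", "type": "char", "length": 8, "label": "Medical record number"},
    {"name": "CREAT", "type": "num", "length": 8, "label": "Serum creatinine"},
]
source = data_step_source("SASDSL.LABS", "TMRDATA", variables)
print(source)
```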
For further information, please contact
the authors at
Box 3865 (Muhlbaier) or Box 2914 (Dozier)
Duke University Medical Center
Durham, North Carolina 27710
REFERENCES:
Dozier JA, Hammond WE, and Stead WW (1985).
Creating a link between medical and
analytical databases. Proceedings of the
Ninth Annual Symposium on Computer
Applications in Medical Care, MJ
Ackerman, Editor. IEEE Computer Society,
478-482, 1985.
While holding down the very real computer
time charges, the choice to transfer records
from TMR to SAS in a component-wise
incremental update mode necessitates a rather
more complicated SAS program. This is
primarily due to two factors: the design
differences that keep names in TMR from being
unique identifiers and the need to change ID's
and to delete records. Although medical
record numbers are supposedly constant for a
patient, data entry errors are made and
medical record numbers may legitimately
change. To change and delete ID's in update
mode, the SAS code generator creates a batch
PROC EDITOR stream to access the SAS data
bases and update all ID's before the remainder
of the update is performed. PROC EDITOR is
not well suited to this task: we do not know
the maximum number of entries for a particular
ID that will occur in any one SAS data file,
so we had to hard code a maximum number of
occurrences that the PROC EDITOR command
stream would check for. As a final step, the
SAS data base's contents are listed and check
totals are printed to cross check with the TMR
data base's check totals.
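The ID maintenance and the final cross check can be sketched in Python. This is not how PROC EDITOR works; it is a simplified picture of the intent, and the table layout, the helper names, and the use of row counts as check totals are assumptions made for the example.

```python
def apply_id_changes(tables, renames, deletions):
    """Rename and delete patient IDs in every table before the update step."""
    result = {}
    for name, rows in tables.items():
        kept = []
        for row in rows:
            if row["id"] in deletions:
                continue
            # Every matching row is handled, however many times an ID occurs.
            kept.append({**row, "id": renames.get(row["id"], row["id"])})
        result[name] = kept
    return result


def check_totals(tables):
    """Row count per table, for comparison with totals from the source system."""
    return {name: len(rows) for name, rows in tables.items()}


tables = {
    "demographic": [{"id": "A001"}, {"id": "A002"}, {"id": "A003"}],
    "creatinine": [
        {"id": "A001", "value": 1.1},
        {"id": "A001", "value": 1.4},
        {"id": "A003", "value": 0.9},
    ],
}
updated = apply_id_changes(tables, renames={"A001": "A010"}, deletions={"A003"})
print(check_totals(updated))
```

Scanning every row avoids the hard-coded maximum number of occurrences mentioned in the text, at the cost of a full pass over each table.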
Hammond WE, Stead WW, Straube MJ, and Jelovsek
FR (1979). A clinical data base
management system. Proceedings of the
First International Symposium on Policy
Analysis & Information Systems, 454-461,
1979.
Pryor DB, Califf RM, Harrell FE, Hlatky MA,
Lee KL, Mark DB, and Rosati RA (1985).
Clinical databases: accomplishments and
unrealized potential. Journal of Medical
Care, 23:623-647, 1985.
SAS Institute, Inc. (1985). SAS User's
Guide: Basics, Version 5 Edition. Cary,
NC, SAS Institute, Inc., 1985.
This approach provides sufficient
flexibility to the total system to have a TMR
data base completely define the SAS code or to
allow the SAS data base manager to extensively
modify the data base outside of the TMR
system.
SUMMARY
We have identified the requirements
placed on a clinical data base and on a
research data base with different format and
structure for the two systems to effectively
communicate. Clinical and analytic data bases
can maintain a symbiotic relationship if there
is a mechanism for converting the information
from the format of the clinical to that of the
research data base. Though we have
implemented this conversion in the SAS
framework for analysis, the Specification File
* TMR and "The Medical Record" are registered
trademarks of Database, Inc., P.O. Box
3054, Durham, NC 27705, USA.
* SAS is a registered trademark of SAS
Institute, Inc., Cary, NC, USA.
Figure 1. Medical data base record
structure for a patient in the TMR data
base. Note that each patient has one
record that contains all of his or her
data.
[Diagram: PATIENT CARE RECORD with sections Demographic, Laboratory,
Medications, Subjective & Physical, Problems, Accounting, Insurance,
and Encounters.]
Figure 2. Research data base relational structure.
Each patient in the TMR record generates multiple
records in multiple SAS data sets in the SAS data base.
[Diagram: SAS data sets, including SASDSL.DEMOGRAP (Demographic Section)
and SASDSL.CABGI (... Lab).]
Figure 3. Block diagram of the levels of data base communication between TMR and
SAS.
[Diagram labels: TMR Data Base, TMR Dictionary, Patient Records,
Transfer Record, Section Extraction, Record Formatting, TMR Transfer System,
Specification File, Archived Specification File, Name Changes,
SAS Transfer System, Format Library, SAS Data Base, Query.]