Linking Medical and Research Data Bases with TMR* and SAS* Software

Lawrence H. Muhlbaier
Jean A. Dozier
Duke University Medical Center

selection, and a specification file that describes the records to the SAS data base. A SAS program reads the specification file to generate the SAS code to read the TMR data records and create or update the SAS data base. Each of these components will be described taken from the TMR to SAS interface.

ABSTRACT

Patient care data bases and clinical research data bases often share a large portion of data. An automatic transfer of data eliminates the duplicate entry process and increases consistency between the data bases. This paper describes the components necessary for the transfer between the patient care records of TMR and the relational data base records of SAS: structure, selection, and interpretation. A TMR program extracts record components for selected patients and places them in a file for SAS to access. TMR creates a specification file that describes the contents of the data file. A SAS program reads the TMR specification file and generates the SAS program that then reads the TMR data and creates or updates the SAS data base. We describe the SAS programming aspects of the transfer. See Dozier (1985) for details on the TMR aspects. The system is in use at Duke University Medical Center for Cardiology and Cardiac Surgery data bases. The techniques used and problems encountered are applicable to any transfer between a network-type database and

The transfer system was designed to minimize the number of changes that would need to be made to the SAS code to maintain the system. All changes in the SAS data base specifications come from the TMR data dictionary. The TMR data dictionary already contained information that could be transformed into SAS names, labels, and value labels, as well as some of the SAS formatting information.
RECORD STRUCTURE

Patient care data bases are used to store the complete patient record, including financial and administrative information. Records include demographic information, medical history, laboratory results, medications prescribed and dispensed, treatments, clinic or hospital visits, appointments, and financial information. The clinical data base must allow the entry of one or more of several thousand findings, each with a variety of possible responses. Data must be displayed by problem, encounter, or over time, but not usually across patients. The system must be flexible enough to satisfy physicians that their findings have been accurately described; TMR must be able to satisfy any clinical need. The TMR data base is a multi-user system, with records accessible for read and write from multiple locations at one time.

SAS.

INTRODUCTION

The increasingly detailed information from clinical data bases for patient care and reimbursement has made these data bases an important source of information for addressing medical research problems (Pryor, 1985). Rather than add statistical capabilities to a medical data base, an automatic transfer system from a medical data base to a statistical data base was designed. The Cardiology and Cardiac Surgery divisions at Duke University Medical Center use TMR (Hammond, 1979), "The Medical Record"*, to capture data on patients undergoing invasive and non-invasive studies and therapies for coronary artery disease. The information in TMR is used for the management of patient care, including clinical test and procedure notes and follow-up flow sheets. SAS (SAS Institute, Inc., 1985) products are used to determine which clinical variables are prognostically important and to describe the differences found between groups of patients undergoing different treatments. Because of their different purposes, the record format of a patient care data base is quite different from that of an analytical data base.
TMR creates a single record for each patient (Figure 1). The different types of information (lab results, medications) are stored in different sections of the record. Further, repeat entries of an item are stored together in reverse time order (most recent first).

Analytical data bases used for clinical research require their records to be rigidly defined, typically using a relational model (Figure 2). A separate record exists for each component of a medical record (demographic, history, each lab test, etc.). If a serum creatinine level, say, is taken on three different occasions, the results would be located in three different records in the SAS data file for creatinine labs ordered. Multiple labs that are performed together would go into one SAS record for each ordering.

This paper describes the components necessary to link the TMR data structures (essentially a network or hierarchical file structure allowing many repeats of data items in one record) to the SAS data structures (relational tables). Creating such a link involves working with three separate TMR components: record structure, record

The single record per patient in TMR must be converted to multiple records per patient per type of finding in the SAS data base.

TMR record to see if this assumption is valid. If it is not a valid assumption, then the log will need to be expanded.

SELECTING FOR RECORD TRANSFER

RECORD INTERPRETATION

In order to develop a system that is flexible enough for different applications, TMR provides a user-defined data dictionary. This dictionary contains a list of data elements, characteristics of each element (for example, response format, value limits, element labels, and value labels), and relationships among data elements (such as the lab items that make up a lab panel). Dictionary entries are used to specify the record components for transfer to the analytical data base.
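The conversion from one record per patient to multiple records per patient per type of finding can be illustrated with a minimal Python sketch. It is not the TMR or SAS implementation: the dictionary layout, the field names (`patient_id`, `labs`, `panel`), and the table names `DEMOGRAP` and `CHEM` are assumptions made only for this example. Labs ordered together on the same date become one row, and rows are emitted oldest first even though the source record stores entries most recent first.

```python
from collections import defaultdict

# Hypothetical stand-in for one TMR patient record. Repeat entries are
# stored most recent first, as the text describes.
patient_record = {
    "patient_id": "A123",
    "demographic": {"sex": "F", "birth_year": 1931},
    "labs": [
        {"date": "1985-03-02", "panel": "CHEM", "item": "CREATININE", "value": 1.4},
        {"date": "1985-03-02", "panel": "CHEM", "item": "BUN", "value": 22},
        {"date": "1985-01-15", "panel": "CHEM", "item": "CREATININE", "value": 1.1},
    ],
}


def flatten_patient(record):
    """Split one patient record into rows for per-section relational tables."""
    pid = record["patient_id"]
    tables = defaultdict(list)

    # One demographic row per patient.
    tables["DEMOGRAP"].append({"patient_id": pid, **record["demographic"]})

    # Labs performed together (same panel, same date) share one row.
    orders = {}
    for entry in record["labs"]:
        key = (entry["panel"], entry["date"])
        row = orders.setdefault(key, {"patient_id": pid, "date": entry["date"]})
        row[entry["item"]] = entry["value"]

    # Emit rows oldest first; ISO dates sort correctly as strings.
    for (panel, _date), row in sorted(orders.items(), key=lambda kv: kv[0][1]):
        tables[panel].append(row)
    return dict(tables)


tables = flatten_patient(patient_record)
for name, rows in tables.items():
    print(name, rows)
```

In this sketch the two creatinine results land in two separate `CHEM` rows, while the creatinine and BUN values ordered on the same date share a row, mirroring the "one SAS record for each ordering" rule.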
There are six options for transfer, given the different structures of the data bases:

Replace:
1) All records, all components.
2) Selected records, all components.
3) Selected records, selected components.

Update:
4) All updated records, all components.
5) Selected updated records, selected components.
6) Updated components.

SPECIFICATION FILE

The specification file is necessary to define for SAS the identity of the data elements that are to be transferred. A SAS program reads the specification file and generates the SAS code necessary to read the data record components that TMR has created. The specification file contains the name of the SAS data base for each record component and the variable names, labels, response types, and lengths. The specification file also contains the value labels, where appropriate, for SAS's PROC FORMAT.

Options 3) and 5) were chosen for implementation for several reasons. Option 3) is a generalization of Options 1) and 2) and is needed to start up a SAS data base from an existing TMR data base. Option 5) is used for the main task of maintaining the TMR and SAS data bases in parallel. While Option 6) would minimize the amount of data to process and transfer, the increase in the amount of information stored to identify the updated components is substantial, putting on the order of 30-50 times the volume of entries into the TMR log file. Logging selected records instead of selected components for transfer reduces the number of update identification entries created. The savings in time on the TMR system is significant, in that the initial processing of the Cardiology TMR data base for transfer to SAS takes several days in real time to conclude. The savings on the SAS system are smaller, since the main SAS data file must still be passed by the SAS UPDATE program to create each new SAS data file. The choice to send only selected record components is based on the differing needs of clinical medicine and research.
The variables of interest for research are only a portion of those needed for patient care. In the cardiac surgery data base, for example, only 32% of the record components in the TMR data dictionary are selected for transfer to SAS. This may well be an upper bound on the amount of data transferred for a more general clinical practice.

Figure 3 shows a block diagram of the levels of data base communication between TMR and SAS.

SAS CONVERSION PROGRAMS

Using techniques of automatic programming, a SAS program reads the TMR Specification File and generates SAS code to read the created data records and update the SAS data base. This program can access a SAS file of exceptions to handle variable name and label changes and to resolve duplicate name conflicts. Name conflicts arise because TMR names may be 12 characters long and contain any characters, while SAS names are limited to 8 characters, and because TMR names are actually short labels. Thus the same TMR name may legitimately be used in two different places. Having a file to change the names also means that we need to verify that no duplicate names slip through to SAS. After checking for all duplicates, the program halts if any are found.

Certain assumptions were made in the implementation of the TMR update logging facility that may need to be revised in the future. In particular, TMR logs activities in LABS and SAPS, but does not log any activities in other areas of the patient's record such as demographic, problems, or medications. This is based on the assumption that changes in these non-logged areas have a very high probability of being associated with changes in other areas that are logged. During the next year we will closely monitor the activity in the logged versus non-logged areas of the

The design difference between TMR and SAS that causes the most problems is that TMR actually labels data with code numbers based on the data dictionary; TMR "names" are just short labels.
Thus changes can be made to the TMR data dictionary that are totally transparent to the TMR system but completely compromise the data in the SAS data base. For example, it is perfectly legitimate in TMR to exchange the names on two variables. An update in SAS would proceed smoothly, but the data in the SAS data base would then be garbage. To prevent this, the TMR transfer specification file contains the TMR item number for each entity. The SAS transfer program compares the current specification file to the previous update's specification file for upward compatibility.

and Data File can be read by any of the popular analytical programs.

The TMR section of the programs is in place and operational. The SAS programs are functional for replacement of a data base, but not for incremental update. Incremental update is expected to be working by the time this paper is published.

In processing the specification file, the SAS code generator creates INPUT statements and KEEP lists for each SAS data set being created. LENGTH statements are created based on the variable's data type, the formats are attached permanently to the associated variables, and the format library is updated and compressed. A possible addition to the system would be the capability to specify a format to be used in the name change file. The SAS code generator also accesses a general code section that allows the SAS data base manager to insert changes specific to the SAS data base into the generated code each time the update is performed.

For further information, please contact the authors at Box 3865 (Muhlbaier) or Box 2914 (Dozier), Duke University Medical Center, Durham, North Carolina 27710.

REFERENCES

Dozier JA, Hammond WE, and Stead WW (1985). Creating a link between medical and analytical databases. Proceedings of the Ninth Annual Symposium on Computer Applications in Medical Care, MJ Ackerman, Editor. IEEE Computer Society, 478-482, 1985.
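The two safeguards described for the SAS code generator, halting on duplicate names and comparing item numbers across specification files, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' SAS program: the specification layout (a list of item number and TMR name pairs), the shortening rule, and the function names `shorten_name`, `build_sas_names`, and `changed_items` are all hypothetical. The exceptions file is modeled as a dictionary keyed by TMR item number, since the text notes that TMR names are not unique identifiers.

```python
def shorten_name(tmr_name):
    """Reduce a TMR name to at most 8 letters, digits, or underscores (assumed rule)."""
    cleaned = "".join(ch for ch in tmr_name.upper() if ch.isalnum() or ch == "_")
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned[:8]


def build_sas_names(spec, overrides):
    """Map item number -> SAS name; halt (raise) if any two items share a name.

    spec: list of (item_number, tmr_name) pairs (hypothetical layout).
    overrides: item_number -> replacement name, standing in for the exceptions file.
    """
    names, seen, duplicates = {}, {}, []
    for item_number, tmr_name in spec:
        sas_name = overrides.get(item_number) or shorten_name(tmr_name)
        if sas_name in seen:
            duplicates.append((sas_name, seen[sas_name], item_number))
        else:
            seen[sas_name] = item_number
        names[item_number] = sas_name
    # Collect every clash first, then stop, as the text describes.
    if duplicates:
        raise ValueError(f"duplicate SAS names: {duplicates}")
    return names


def changed_items(previous, current):
    """Item numbers present in both specifications whose SAS name differs."""
    return sorted(n for n in previous if n in current and previous[n] != current[n])


spec = [(101, "Serum Creat."), (102, "Serum Creat-2"), (205, "Ejection Frac")]

try:
    build_sas_names(spec, overrides={})
    clash_message = None
except ValueError as err:
    clash_message = str(err)  # both creatinine items shorten to the same name

names = build_sas_names(spec, overrides={102: "SCREAT2"})

# Two names exchanged in the dictionary between updates are caught by item number.
swapped = changed_items({101: "SERUMCRE", 205: "EJECTION"},
                        {101: "EJECTION", 205: "SERUMCRE"})
print(clash_message, names, swapped)
```

Keying every comparison on the item number rather than the name is the point of the sketch: a name exchange that an update would otherwise accept silently shows up as a changed mapping.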
While holding down the very real computer time charges, the choice to transfer records from TMR to SAS in a component-wise incremental update mode necessitates a rather more complicated SAS program. This is primarily due to two factors: the design differences that keep names in TMR from being unique identifiers, and the need to change ID's and to delete records. Although medical record numbers are supposedly constant for a patient, data entry errors are made and medical record numbers may legitimately change. To change and delete ID's in update mode, the SAS code generator creates a batch PROC EDITOR stream to access the SAS data bases and update all ID's before the remainder of the update is performed. PROC EDITOR is not a very good tool for this purpose because we do not know the maximum number of entries for a particular ID that will occur in any one SAS data file; thus we had to hard code a maximum number of occurrences that the PROC EDITOR command stream would check for. As a final step, the SAS data base's contents are listed and check totals are printed to cross-check with the TMR data base's check totals.

Hammond WE, Stead WW, Straube MJ, and Jelovsek FR (1979). A clinical data base management system. Proceedings of the First International Symposium on Policy Analysis & Information Systems, 454-461, 1979.

Pryor DB, Califf RM, Harrell FE, Hlatky MA, Lee KL, Mark DB, and Rosati RA (1985). Clinical databases: accomplishments and unrealized potential. Journal of Medical Care, 23:623-647, 1985.

SAS Institute, Inc. (1985). SAS User's Guide: Basics, Version 5 Edition. Cary, NC, SAS Institute, Inc., 1985.

This approach provides sufficient flexibility to the total system to have a TMR data base completely define the SAS code or to allow the SAS data base manager to extensively modify the data base outside of the TMR system.
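The ID maintenance step described above, changing and deleting medical record numbers before the rest of the update runs, can be sketched in Python. This is not the PROC EDITOR stream the authors generated; it is a hypothetical illustration in which each SAS data file is modeled as a list of rows, and the names `apply_id_changes`, `renames`, and `deletions` are assumptions. A single pass over all rows handles however many entries an ID has, which is the limitation the text attributes to the hard-coded maximum.

```python
def apply_id_changes(rows, renames, deletions):
    """Return a new list of rows with IDs renamed and deleted IDs dropped.

    rows: list of dicts, each with a 'patient_id' key (hypothetical layout).
    renames: old ID -> new ID.
    deletions: set of IDs whose rows are removed entirely.
    """
    updated = []
    for row in rows:
        pid = row["patient_id"]
        if pid in deletions:
            continue  # drop every row for a deleted ID
        # Copy the row so the input data is left untouched.
        updated.append({**row, "patient_id": renames.get(pid, pid)})
    return updated


lab_rows = [
    {"patient_id": "A1", "creatinine": 1.1},
    {"patient_id": "A1", "creatinine": 1.4},
    {"patient_id": "B2", "creatinine": 0.9},
    {"patient_id": "C3", "creatinine": 2.0},
]

fixed_rows = apply_id_changes(lab_rows, renames={"A1": "A9"}, deletions={"C3"})
print([row["patient_id"] for row in fixed_rows])
```

The same function would be applied to every data file in the data base before the incremental update, so that old and new records match on the corrected ID.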
SUMMARY

We have identified the requirements that must be met for a clinical data base and a research data base of different format and structure to communicate effectively. Clinical and analytic data bases can maintain a symbiotic relationship if there is a mechanism for converting the information from the format of the clinical data base to that of the research data base. Though we have implemented this conversion in the SAS framework for analysis, the Specification File

* TMR and "The Medical Record" are registered trademarks of Database, Inc., P.O. Box 3054, Durham, NC 27705, USA.
* SAS is a registered trademark of SAS Institute, Inc., Cary, NC, USA.

Figure 1. Medical data base record structure for a patient in the TMR data base. Note that each patient has one record that contains all of his or her data. [Diagram: PATIENT CARE RECORD with sections Demographic, Laboratory, Medications, Subjective & Physical, Problems, Accounting, Insurance, Encounters.]

Figure 2. Research data base relational structure. Each patient in the TMR record generates multiple records in multiple SAS data sets in the SAS data base. [Diagram: SAS data sets SASDSL.DEMOGRAP (Demographic Section) and SASDSL.CABGI (... Lab).]

Figure 3. Block diagram of the levels of data base communication between TMR and SAS. [Diagram labels: TMR DATA BASE, TMR Patient Records, Dictionary, Transfer Record Section Extraction, Record Formatting, Query File, Specification File, TMR TRANSFER SYSTEM, Name Changes, Format Library, Archived Specification File, SAS Data Base, SAS TRANSFER SYSTEM.]