Paper PO13

Leveraging Base SAS for Constructing a Low-Cost, Data Clarification Request Generation and Tracking System Independent of a Clinical Data Management System

David M. Escobar, Barbara Mathewson, George Strang
Cystic Fibrosis Therapeutics Development Network Coordinating Center
Seattle Children’s Hospital Research Institute, Seattle, WA

ABSTRACT

Data clarification requests (DCR) are used to send questions to clinical study participants regarding illogical or incorrect data and are used to ensure the quality of data. We created a low-cost in-house DCR generation and tracking system that utilizes SAS® programs and Microsoft Access® to detect, produce and track DCRs. Such a system could give sponsors with limited study budgets a tool to clean discrepant data. It is easy to use and allows the clinical site staff to efficiently complete and return DCRs to the Data Management Unit (DMU).

SAS data steps identify errant or ambiguous data based on pre-determined logic checks, and an aggregate dataset of errors is created via concatenation; unique error identification numbers correlate to specific errant data points. PROC SQL is used in synergy with the data steps to compare aggregate error datasets with a set operator. Running the edit checks regularly allows the detection of new errors and the identification of data points with outstanding DCRs.

Creating and tracking DCRs helps to ensure quality data is being recorded for future study analysis. The DMU produces and tracks DCRs for multiple research sites across the country using this system. Multiple users have successfully utilized the system created by the DMU, increasing study management efficiency and intra-study team communication.

INTRODUCTION

DCRs are important clinical research data management tools and are particularly useful for multi-site or multi-center studies.
Identifying and correcting illogical or erroneous data requires significant time and effort, necessitating an organized and efficient system to support this process. DCRs are typically generated from pre-determined edit checks within an electronic data management system or as the result of data managers and coordinators reviewing the data. Tracking DCRs as they are transmitted from data management to research sites is important so that each DCR is accounted for.

Generally, clinical data management systems (CDMS) contain the capabilities needed to run edit checks and to generate and track the resulting DCRs. However, an in-house DCR generation and tracking system developed with commercial off-the-shelf software can be appealing to sponsors with limited study budgets; often the existing information technology infrastructure of an organization can be used to develop such a system. This removes the need to outsource the process to a third party.

CASE STUDY

The DCR generation and tracking system described here was developed for a not-for-profit coordinating center overseeing multiple nationwide, multi-center clinical studies. As a service-oriented unit, the DMU was asked to assume responsibility for the DCR process for two longitudinal studies, with the added tasks of streamlining and improving upon past practices. The DMU used existing software platforms and information technology infrastructure to create the current DCR system. The two chief components of this system are Base SAS® and Microsoft Access 2000®.

DCR PROCESS: EDIT CHECK DATASET

The DMU runs SAS programs to execute pre-determined logical edit checks against the study database to find illogical, erroneous or missing data. SAS data steps are used for individual edit checks, with each pre-determined logic check assigned a numeric value.
The end of the edit check program concatenates the individual edit check datasets into one large dataset; certain variables which are needed within the tracking database are also added to this concatenated dataset (please see Figure 1 for an illustration). A small data dictionary with the definitions of the tracking database variables is included in Appendix A.

Example SAS code for creating the edit checks and final concatenated dataset:

    /* Labels for each numbered edit check */
    Proc Format;
       Value Prob
          1 = "Adverse event end date is missing"
          2 = "Weight value does not correspond to weight units"
          3 = "Birth date is missing";
    Run;

    /* One data step per edit check; Prob holds the check number */
    Data AEDT_Prob;
       Set Study_data;
       Prob = 1;
       If aestdt = . then output;
    Run;

    Data WT_prob;
       Set study_data;
       Prob = 2;
       If wtu = 1 and xx > wt > xx then output;
    Run;

    Data Brth_prob;
       Set study_data;
       Prob = 3;
       If brthdt = . then output;
    Run;

    /* Concatenate the individual edit check datasets */
    Data Prob;
       Set AEDT_Prob WT_Prob Brth_prob;
       Resolution = ' ';   /*these variables are place holders for the tracking database*/
       DCRSent = ' ';
       DateSent = .;
       DateDCRReturn = .;
       Confirmed = ' ';
       ConfirmINT = ' ';
       DBUpdate = ' ';
       PortCFUpdate = ' ';
       PrdDt = .;
       INt = ' ';
       Keep pt site prob resolution dcrsent datesent datedcrreturn
            confirmed confirmint dbupdate portcfupdate prddt int;
    Run;

Figure 1.

    Subject ID    Study Center    Problem Number
    xxx-xxx       xxxxxx          1
    xxx-xxx       xxxxxx          2
    xxx-xxx       xxxxxx          3

DCR PROCESS: ESTABLISH ONLY NEW DCRS

Unless the study database is updated regularly with the resolutions of the DCRs, the same edit check triggers will recur until the database is updated. Hence, if the edit checks are intended to be run with regular frequency, for example on a weekly basis, a means for identifying only “new” edit check triggers is required. This is where PROC SQL can play a powerful role in synergy with data steps.
Two datasets, containing the weekly triggers from two back-to-back edit check executions, can be compared using PROC SQL with the set operator “except all.” In this procedure, a new dataset is created containing only “new” edit check triggers not previously identified (please see Figure 2 for a visualization of this concept). This new dataset is then imported into MS Access. As each round of edit checks occurs (e.g., on a daily or weekly basis), each new dataset is appended to the data table within the database.

An example of the PROC SQL code used is:

    PROC SQL;
       Create table work.SQL_TEMP as
          Select site, pt, prob
          From NEW as A      /*this is the latest edit check dataset, created as in Figure 1*/
          Except all
          Select site, pt, prob
          From OLD as B;     /*this is the older edit check dataset*/

       /*this last step rejoins the subject identifiers with the other data from NEW;
         b.* avoids duplicate-column warnings for site, pt and prob*/
       Create table SQL_MMDDYYYY as
          Select b.*
          From work.SQL_TEMP as a, new as b
          Where a.pt = b.pt AND a.site = b.site AND a.prob = b.prob;
    Quit;

Figure 2. The current week’s edit check dataset is compared with the previous week’s edit check dataset. The PROC SQL set operator selects only what is unique in the new (top) dataset; the resulting “new” edit check trigger(s) are imported/appended into the relational database of choice.

DCR PROCESS: THE RELATIONAL DATABASE

The relational database component of our DCR system is based on a Microsoft SQL Server® / Access platform; the data reside on a secure server with SQL Server as the back-end database, and an ODBC pass-through to a Microsoft Access front-end is used. Our internal customers utilize user-friendly switchboard menus to edit data and produce reports (e.g., update the status of a particular DCR, list outstanding DCRs that have not been returned from the research sites, etc.). Please see Figure 3 for an illustration.

Figure 3.
    SQL Server Database (Back-end)
    Microsoft Access Individual User Interfaces (Front-end)

The database can also produce DCR forms to be sent to the research sites via the study clinical research associates (CRA). In our case study, the research coordinators at each site find the DCR forms easy to use and complete. Once the DCR form is returned to the coordinating center, the CRA inputs the DCR resolution text into the tracking database, along with other pertinent information (e.g., date DCR was returned, initials of research coordinator stipulating the resolution, etc.).

In addition to accepting electronic edit check datasets created by SAS programs, the database will (by design) accept manual DCRs when errant data are discovered that are not covered by the electronic edit checks. This design feature is particularly useful for the study biostatisticians and statistical programmers, who may find errant or ambiguous data during analytical exercises.

DCR PROCESS: DATABASE RESOLUTIONS

One of the last steps in the DCR generation and tracking cycle is to verify that appropriate changes have been made to the study database. Our DCR system has the additional capability of electronically verifying that changes have occurred in the study database as a result of DCR resolutions. A SAS program accesses the database table holding the DCR tracking data and compares it against a comprehensive edit check dataset (i.e., a dataset which is created with an edit check program, but which has not been run through the PROC SQL program to establish only “new” DCRs). Logically, if an edit check no longer fires, then that particular errant data point has been corrected. For example, the data point in Figure 4 would cause an edit check to fire one week; several weeks later, after a DCR resolution has been created and the database updated, the data point would appear as in Figure 5 and the edit check would no longer fire.
The SAS program comparing the datasets recognizes this and populates a variable in the dataset which indicates the study database change has been verified.

Figure 4

    PT         Site      Birth Date
    xxx-xxx    xxxxxx    NULL

Figure 5

    PT         Site      Birth Date
    xxx-xxx    xxxxxx    01/27/2001

Example SAS code:

    Data EditChk;
       Set Prob;
    Run;   /*this should be the latest weekly SAS Edit Check Dataset*/

    Data Observations;
       Set DCR_Database_Table;
    Run;   /*this is the data table from the tracking database*/

    Proc SQL;
       Create Table EditChkComp as
          Select ccdesc, cffidno, problem
          From work.editchk;        /*latest edit check*/

       Create Table ObsComp as
          Select ccdesc, cffidno, problem
          From work.observations;   /*tracking database table*/

       /*tracked problems that no longer fire in the latest edit check*/
       Create Table Compare as
          Select * From obscomp as a
          except
          Select * From editchkcomp as b;
    Quit;

    Data Resolved;
       Set work.Compare;
       Format problem prob.;
    Run;

DCR PROCESS: CHALLENGES AND ASSUMPTIONS

The biggest assumption the current system makes is that a given errant data point can occur only once during the course of the study. Our original design is intended to run edit checks and generate and track DCRs primarily for study enrollment data. If such a system were to be derived for use with multiple visit data (e.g., serum chemistry, hematology, pulmonary function testing, etc.), the parameterization of variables would need to be more specific. In the current system, the parameterization consists of “patient – research site – problem;” a system used for multiple visit data would need to include additional variables. In the example of serum chemistry, variable parameterization might look like this: “patient – research site – visit – lab test – problem.”

CONCLUSION

Creating and tracking DCRs helps to ensure quality data is being recorded for future study analysis. The DMU produces and tracks DCRs for multiple research sites across the country using this system.
End-users of this system, including CRAs, statistical programmers, biostatisticians and project managers, are enthusiastic about the new system, and the DMU provides intensive and prompt assistance when there are questions or system malfunctions. Since its inception, we have modified the system so that it also accepts DCRs and edit checks from sources other than SAS edit checks. We occasionally utilize Contract Research Organizations (CRO) to store a study database and run electronic edit checks; third-party edit checks can be uploaded into our system (after negotiating the structure of the edit check data) in parallel with our own manual DCRs and electronic edit checks.

The system has been in service for over eight months now, and multiple users have successfully used the system, increasing study management efficiency and intra-study team communication. DMU staff members are currently exploring ways of using this system as a prototype for other clinical studies our organization may be coordinating, in an attempt to provide a standardized method of DCR generation and tracking across all studies.

Acknowledgements:

This work was supported by a grant from Cystic Fibrosis Foundation Therapeutics, Inc. and from National Center for Research Resources (NIH) Grant # M01-RR-000037.

Contact Information:

David M. Escobar, MPH
Cystic Fibrosis Therapeutics Development Network Coordinating Center
Seattle Children’s Hospital Research Institute
1100 Olive Way, Suite 500
Seattle, WA 98101
T: (206) 987-7515
F: (206) 987-7504
E: [email protected]
W: www.seattlechildrens.org

SAS and all other SAS Institute Inc. products or services names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
APPENDIX A

DCR TRACKING DATABASE DATA DICTIONARY

    Variable Name    Description
    Closed           Name of individual closing the query (DMU Only)
    CMLaccnum        Core Micro Lab Account Number
    Comments         Free text comments
    CRApproval       CRA Approval Initials
    DateDCRRtr       Date DCR Received Back from Site
    Dbupdate         Date Database Updated
    Dbupdatedt       Date Database Update Request Sent
    Dbupdatesnt      Sent to whom for database update
    Dbverif          Database Updated Verified By Whom
    Dbverifdt        Date Database Update was Verified
    DCRsnt           Date Query Sent to Site
    Edtchknum        Edit Check Category
    Flwup            Followup if query response not approved
    Inits            Subject Initials
    PRDDT            Date Problem Identified
    PRDORG           Organization that identified the problem
    Prob             Problem Description
    PT001            PT No.
    QID              Query ID Number (Auto-generated by the Database)
    Resolution       Site Response
    Siteno           Clinical Site No.
    Srcform          CRF Module (e.g., AE, ConMed)
    Srcnum           CRF Page Number
    Variable         The field name the query corresponds to
    Varvalue         Field name value change
    Vst              Visit Description
    Vstdt            Visit Date
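
The section DCR PROCESS: CHALLENGES AND ASSUMPTIONS suggests that multiple visit data would need a more specific parameterization, such as “patient – research site – visit – lab test – problem.” The following is a minimal sketch of how the “except all” comparison might be keyed on that five-part parameterization. It is not the authors' code: the dataset names (new_visit, old_visit, new_only), the labtest variable, the sample values and the problem numbers 4 and 5 are all hypothetical, and vst is borrowed from the data dictionary above purely for illustration.

```sas
/* Hypothetical sketch: dataset names, the labtest variable, sample rows  */
/* and problem numbers are assumptions made for illustration only.        */

/* Latest edit check output: one row per                                  */
/* patient - site - visit - lab test - problem                            */
data new_visit;
    input pt $ site $ vst $ labtest $ prob;
    datalines;
001-001 S01 V1 ALT 4
001-001 S01 V2 ALT 4
001-002 S01 V1 WBC 5
;
run;

/* Previous edit check output */
data old_visit;
    input pt $ site $ vst $ labtest $ prob;
    datalines;
001-001 S01 V1 ALT 4
;
run;

proc sql;
    /* Keep only triggers whose full five-part key was not in the older run */
    create table new_only as
    select pt, site, vst, labtest, prob
    from new_visit
    except all
    select pt, site, vst, labtest, prob
    from old_visit;
quit;
```

In this toy data the same patient, site and problem fire at two different visits. Under the three-part key the two rows would be indistinguishable from each other; with visit and lab test included in the key, each trigger can be traced to a specific data point.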