Paper PO13
Leveraging Base SAS for Constructing a Low-Cost, Data Clarification
Request Generation and Tracking System Independent of a Clinical Data
Management System
David M. Escobar
Barbara Mathewson
George Strang
Cystic Fibrosis Therapeutics Development Network Coordinating Center
Seattle Children’s Hospital Research Institute, Seattle, WA
ABSTRACT
Data clarification requests (DCRs) are used to send questions to clinical study sites regarding illogical or incorrect data
and help ensure the quality of data. We created a low-cost, in-house DCR generation and tracking system that utilizes
SAS® programs and Microsoft Access® to detect, produce and track DCRs. Such a system could give sponsors with limited
study budgets a tool to clean discrepant data. It is easy to use and allows the clinical site staff to efficiently complete and
return DCRs to the Data Management Unit (DMU).
SAS data steps identify errant or ambiguous data based on pre-determined logic checks, and an aggregate dataset of errors is
created via concatenation; unique error identification numbers correlate to specific errant data points. PROC SQL is used in
synergy with the data steps to compare aggregate error datasets with a set operator. Running the edit checks regularly allows
the detection of new errors and the identification of data points with outstanding DCRs.
Creating and tracking DCRs helps to ensure quality data is being recorded for future study analysis. The DMU produces and
tracks DCRs for multiple research sites across the country using this system. Multiple users have successfully utilized the
system created by the DMU, increasing study management efficiency and intra-study team communication.
INTRODUCTION
DCRs are important clinical research data management tools and are particularly useful for multi-site or multi-center studies.
Identifying and correcting illogical or erroneous data requires significant time and effort, necessitating an organized and
efficient system to support this process. DCRs are typically generated from pre-determined edit checks within an electronic
data management system or as the result of data managers and coordinators reviewing the data. Tracking DCRs as they are
transmitted from data management to research sites is important so that every DCR is accounted for.
Generally, clinical data management systems (CDMS) will contain the necessary capabilities to effect edit checks and
subsequent DCR generation and tracking. However, an in-house DCR generation and tracking system developed with
commercial-off-the-shelf systems can be appealing to sponsors with limited study budgets; often the existing information
technology infrastructure of an organization can be used to develop such a system. This removes the need to outsource the process to a third party.
CASE STUDY
The DCR generation and tracking system described here was developed for a not-for-profit coordinating center overseeing
multiple nationwide, multi-center clinical studies. As a service-oriented unit, the DMU was asked to assume responsibility for
the DCR process for two longitudinal studies, with the added tasks of streamlining and improving upon past practices. The
DMU used existing software platforms and information technology infrastructure to create the current DCR system. The two
chief components of this system are Base SAS® and Microsoft Access 2000®.
DCR PROCESS: EDIT CHECK DATASET
The DMU runs SAS programs to execute pre-determined logical edit checks against the study database to find illogical,
erroneous or missing data. SAS data steps are used for individual edit checks, with each pre-determined logic check assigned
a numeric value. The end of the edit check program concatenates the established datasets into one large dataset; certain
variables which are needed within the tracking database are also added to this concatenated dataset (please see Figure 1 for
an illustration). A small data dictionary with the definitions of the tracking database variables is included in Appendix A.
Example SAS code for creating the edit checks and final concatenated dataset:
Proc Format;
   Value Prob
      1 = "Adverse event end date is missing"
      2 = "Weight value does not correspond to weight units"
      3 = "Birth date is missing";
Run;

Data AEDT_Prob;   /*edit check 1*/
   Set Study_data;
   Prob = 1;
   If aestdt = . then output;
Run;

Data WT_prob;     /*edit check 2; xx = placeholder limit values*/
   Set study_data;
   Prob = 2;
   If wtu = 1 and xx > wt > xx then output;
Run;

Data Brth_prob;   /*edit check 3*/
   Set study_data;
   Prob = 3;
   If brthdt = . then output;
Run;

Data Prob;        /*concatenate the individual edit check datasets*/
   Set AEDT_Prob WT_Prob Brth_prob;
   Resolution = ' ';   /*these variables are place holders for the tracking database*/
   DCRSent = ' ';
   DateSent = .;
   DateDCRReturn = .;
   Confirmed = ' ';
   ConfirmINT = ' ';
   DBUpdate = ' ';
   PortCFUpdate = ' ';
   PrdDt = .;
   Int = ' ';
   Keep pt site prob resolution dcrsent datesent datedcrreturn confirmed confirmint
        dbupdate portcfupdate prddt int;
Run;
Figure 1.
Subject ID     Study Center     Problem Number
xxx-xxx        xxxxxx           1
xxx-xxx        xxxxxx           2
xxx-xxx        xxxxxx           3
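Because each edit check follows the same pattern (read the study data, assign the problem number, output the offending records), the individual data steps could also be generated by a small macro. The following is only a sketch under stated assumptions, not the program used by the DMU: the macro name %editcheck, its parameters num= and cond=, the variable aeendt, and the toy study_data records are all hypothetical.

```sas
/* Hypothetical toy study data; aeendt is an assumed variable name */
data study_data;
  length pt $7 site $3;
  format brthdt aeendt date9.;
  pt = '001-001'; site = 'S01'; brthdt = '27JAN2001'd; aeendt = .;           output;
  pt = '001-002'; site = 'S01'; brthdt = .;            aeendt = '15MAR2008'd; output;
run;

/* Hypothetical macro: one call per edit check.           */
/* num  = numeric problem code (matches the Prob format)  */
/* cond = condition that identifies an errant record      */
%macro editcheck(num=, cond=);
  data chk&num;
    set study_data;
    prob = &num;
    if &cond then output;
    keep pt site prob;
  run;
%mend editcheck;

%editcheck(num=1, cond=missing(aeendt))
%editcheck(num=3, cond=missing(brthdt))

/* Concatenate the individual edit check datasets */
data prob;
  set chk1 chk3;
run;
```

With the toy data, subject 001-001 triggers problem 1 and subject 001-002 triggers problem 3. Adding a new edit check then only requires one more macro call and one more dataset name in the final SET statement.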
DCR PROCESS: ESTABLISH ONLY NEW DCRS
Assuming the study database is not updated with the resolutions of the DCRs on a regular basis, the same edit check triggers
would recur until the database is updated with the resolutions. Hence, if the edit checks are intended to
be run regularly, for example on a weekly basis, a means of identifying only “new” edit check triggers is
required. This is where PROC SQL can play a powerful role in synergy with data steps. Two datasets containing the
triggers from two back-to-back edit check executions can be compared using PROC SQL with the set operator “except all.”
This procedure creates a new dataset containing only “new” edit check triggers not previously identified (please see
Figure 2 for a visualization of this concept). This new dataset is then imported into Microsoft Access. As each round of edit checks
occurs (e.g., on a daily or weekly basis), each new dataset is appended to the data table within the database. An example of
the PROC SQL code used is:
PROC SQL;
   /*keep only the triggers in NEW that are not already in OLD*/
   Create table work.SQL_TEMP as
      Select site, pt, prob
      From NEW as A   /*this is the latest edit check dataset, created as in Figure 1*/
      Except all
      Select site, pt, prob
      From OLD as B;  /*this is the older edit check dataset*/
   /*this last step rejoins the subject identifiers with the other data from NEW*/
   Create table SQL_MMDDYYYY as
      Select b.*      /*b.* avoids duplicate pt, site and prob columns*/
      From work.SQL_TEMP as a, new as b
      Where a.pt = b.pt AND a.site = b.site AND a.prob = b.prob;
Quit;
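The behavior of the set operator can be checked on a small example. The sketch below uses made-up records (the site and subject values are hypothetical) in which the older dataset holds two triggers and the newer dataset holds the same two plus one additional trigger.

```sas
/* Hypothetical toy data: last week's edit check triggers */
data old;
  input site $ pt $ prob;
  datalines;
S01 001-001 2
S01 001-002 3
;
run;

/* Hypothetical toy data: this week's edit check triggers */
data new;
  input site $ pt $ prob;
  datalines;
S01 001-001 2
S01 001-002 3
S02 002-001 1
;
run;

proc sql;
  /* Rows of NEW that have no matching row in OLD */
  create table new_triggers as
    select site, pt, prob from new
    except all
    select site, pt, prob from old;
quit;

proc print data=new_triggers noobs;
run;
```

Only the record for site S02, subject 002-001, problem 1 remains in new_triggers, because the other two triggers were already present in the older dataset.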
Figure 2. The current week's edit check dataset (A, B, C) is compared with the previous week's edit check dataset (B, C). The PROC SQL set operator only selects what is unique in the new (top) dataset, i.e., the “new” edit check trigger(s) (A), which are then imported/appended into the relational database of choice.
DCR PROCESS: THE RELATIONAL DATABASE
The relational database component of our DCR system is based on a Microsoft SQL Server® / Access platform; the data
reside on a secure server with SQL Server as the back-end database, and an ODBC pass-through connects it to a Microsoft
Access front-end. Our internal customers use user-friendly switchboard menus to edit data and produce reports
(e.g., update the status of a particular DCR, list outstanding DCRs that have not been returned from the research sites, etc.).
Please see Figure 3 for an illustration.
Figure 3. SQL Server database (back-end) connected to individual Microsoft Access user interfaces (front-end).
The database can also produce DCR forms to be sent to the research sites via the study clinical research associates (CRA).
In our case study, the research coordinators at each site find the DCR forms easy to use and complete. Once the DCR form is
returned to the coordinating center, the CRA inputs the DCR resolution text into the tracking database, along with other
pertinent information (e.g., date DCR was returned, initials of research coordinator stipulating the resolution, etc.). In addition
to accepting electronic edit check datasets created by SAS programs, the database will (by design) accept manual DCRs
when errant data is discovered that is not covered by the electronic edit checks. This design feature is particularly useful for
the study biostatisticians and statistical programmers, who may find errant or ambiguous data during analytical exercises.
DCR PROCESS: DATABASE RESOLUTIONS
One of the last steps in the DCR generation and tracking cycle is to verify appropriate changes have been made to the study
database. Our DCR system has the additional capability of electronically verifying changes have occurred in the study
database as a result of DCR resolutions. A SAS program accesses the database table holding the DCR tracking data, and
compares it against a comprehensive edit check dataset (i.e., a dataset which is created with an edit check program, but which
has not been run through the PROC SQL program to establish only “new” DCRs). Logically, if an edit check no longer fires,
then that particular errant data point has been corrected. For example, suppose the data points in Figure 4 caused an edit check to
fire one week. Several weeks later, after a DCR resolution was created and the database updated, the data points would appear
as in Figure 5 and the edit check would no longer fire. The SAS program comparing the datasets recognizes this and
populates a variable in the dataset which indicates the study database change has been verified.
Figure 4
PT          Site       Birth Date
xxx-xxx     xxxxxx     NULL

Figure 5
PT          Site       Birth Date
xxx-xxx     xxxxxx     01/27/2001
Example SAS code:
Data EditChk;
   Set Prob;
Run; /*this should be the latest weekly SAS Edit Check Dataset*/

Data Observations;
   Set DCR_Database_Table;
Run; /*this is the data table from the tracking database*/

/*site, pt and prob are the key variables kept in the Prob dataset;*/
/*the tracking table is assumed to use the same variable names*/
Proc SQL;
   Create Table EditChkComp as
      Select site, pt, prob
      From work.editchk; /*latest edit check*/
   Create Table ObsComp as
      Select site, pt, prob
      From work.observations; /*tracking database table*/
   /*tracked DCRs whose edit check no longer fires*/
   Create Table Compare as
      Select *
      From obscomp as a
      except
      Select *
      From editchkcomp as b;
Quit;

Data Resolved;
   Set work.Compare;
   Format prob prob.;
Run;
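The step that populates the verification variable is not shown in the example. One possible way to do it is a match-merge of the tracking records with the latest comprehensive edit check dataset; the sketch below is an assumption-laden illustration rather than the DMU's program. The variable name verified and the toy records are hypothetical, and each patient, site and problem combination is assumed to occur at most once in each dataset.

```sas
/* Hypothetical tracking records (one row per DCR) */
data observations;
  input site $ pt $ prob;
  datalines;
S01 001-001 2
S01 001-002 3
;
run;

/* Hypothetical latest comprehensive edit check: only problem 2 still fires */
data editchk;
  input site $ pt $ prob;
  datalines;
S01 001-001 2
;
run;

proc sort data=observations; by site pt prob; run;
proc sort data=editchk;      by site pt prob; run;

/* Flag each tracked DCR according to whether its edit check still fires */
data tracking_checked;
  merge observations (in=tracked) editchk (in=still_firing);
  by site pt prob;
  if tracked;                       /* keep tracked DCRs only */
  length verified $1;
  if still_firing then verified = 'N';
  else verified = 'Y';              /* edit check no longer fires */
run;
```

In this toy example the DCR for subject 001-002 is flagged 'Y', because its edit check no longer fires, while the DCR for subject 001-001 remains 'N'.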
DCR PROCESS: CHALLENGES AND ASSUMPTIONS
The biggest assumption the current system makes is that each errant data point can occur only once during the course of the
study. Our original design was intended to run edit checks and generate and track DCRs primarily for study enrollment data. If
such a system were to be adapted for use with multiple visit data (e.g., serum chemistry, hematology, pulmonary function
testing, etc.), the parameterization of variables would need to be more specific. In the current system, the parameterization
consists of “patient – research site – problem;” a system used for multiple visit data would need to include additional variables.
In the example of serum chemistry, variable parameterization might look like this: “patient – research site – visit – lab test –
problem.”
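The extended parameterization described above could be carried into the PROC SQL comparison by simply widening the list of key variables. The following is a minimal sketch with made-up records; the variable names visit and labtest and the dataset names old_lab and new_lab are hypothetical.

```sas
/* Hypothetical toy data: last week's lab edit check triggers */
data old_lab;
  input site $ pt $ visit $ labtest $ prob;
  datalines;
S01 001-001 WEEK4 ALT 5
;
run;

/* Hypothetical toy data: this week's lab edit check triggers */
data new_lab;
  input site $ pt $ visit $ labtest $ prob;
  datalines;
S01 001-001 WEEK4 ALT 5
S01 001-001 WEEK8 ALT 5
;
run;

proc sql;
  /* visit and labtest are part of the comparison key */
  create table new_lab_triggers as
    select site, pt, visit, labtest, prob from new_lab
    except all
    select site, pt, visit, labtest, prob from old_lab;
quit;
```

Here the WEEK8 record is identified as a new trigger. With only patient, research site and problem as the key, the two visits would be indistinguishable from one another.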
CONCLUSION
Creating and tracking DCRs helps to ensure quality data is being recorded for future study analysis. The DMU produces and
tracks DCRs for multiple research sites across the country using this system. End-users of this system, including CRAs,
statistical programmers, biostatisticians and project managers are enthusiastic about the new system, and the DMU provides
intensive and prompt assistance when there are questions or system malfunctions.
Since its inception, we have modified the system so that it can also accept DCRs and edit checks from sources other than
SAS edit checks. We occasionally utilize Contract Research Organizations (CROs) to store a study database and run
electronic edit checks; third-party edit checks can be uploaded into our system (after negotiating the structure of the edit check
data) in parallel with our own manual DCRs and electronic edit checks.
The system has been in service for over eight months now, and multiple users have successfully used the system, increasing
study management efficiency and intra-study team communication. DMU staff members are currently exploring ways of using
this system as a prototype for other clinical studies our organization may be coordinating, in an attempt to provide a
standardized method of DCR generation and tracking across all studies.
Acknowledgements:
This work was supported by a grant from Cystic Fibrosis Foundation Therapeutics, Inc. and from the National
Center for Research Resources (NIH) Grant # M01-RR-000037.
Contact Information:
David M. Escobar, MPH
Cystic Fibrosis Therapeutics Development
Network Coordinating Center
Seattle Children’s Hospital Research Institute
1100 Olive Way, Suite 500
Seattle, WA 98101
T: (206) 987-7515
F: (206) 987-7504
E: [email protected]
W: www.seattlechildrens.org
SAS and all other SAS Institute Inc. products or services names are registered trademarks or trademarks of SAS Institute Inc.
in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
APPENDIX A
DCR TRACKING DATABASE DATA DICTIONARY
Variable Name     Description
Closed            Name of individual closing the query (DMU Only)
CMLaccnum         Core Micro Lab Account Number
Comments          Free text comments
CRApproval        CRA Approval Initials
DateDCRRtr        Date DCR Received Back from Site
Dbupdate          Date Database Updated
Dbupdatedt        Date Database Update Request Sent
Dbupdatesnt       Sent to whom for database update
Dbverif           Database Updated Verified By Whom
Dbverifdt         Date Database Update was Verified
DCRsnt            Date Query Sent to Site
Edtchknum         Edit Check Category
Flwup             Followup if query response not approved
Inits             Subject Initials
PRDDT             Date Problem Identified
PRDORG            Organization that identified the problem
Prob              Problem Description
PT001             PT No.
QID               Query ID Number (Auto-generated by the Database)
Resolution        Site Response
Siteno            Clinical Site No.
Srcform           CRF Module (e.g., AE, ConMed)
Srcnum            CRF Page Number
Variable          The field name the query corresponds to
Varvalue          Field name value change
Vst               Visit Description
Vstdt             Visit Date