GLOBAL INTEGRATED DATABASE (GIDB): Not Just a SET Statement

Marla A. Childers
Quintiles, Inc., Kansas City, MO

ABSTRACT

Last minute changes to the definition of algorithms and reports are inevitable. Creating and using a Global Integrated Database (GIDB) for reporting can reduce programming time and the number of last minute changes across any project that contains more than one protocol. This paper presents the advantages and disadvantages of creating a GIDB and offers insight into defining and building one.

INTRODUCTION

Developing a GIDB is an integral part of reporting Integrated Safety Summaries (ISS) and Integrated Efficacy Summaries (ISE), and of answering FDA questions after a New Drug Application (NDA) submission. The time and effort spent creating a GIDB is repaid in the reporting for an NDA submission. Strategies for the development and content of the GIDB should be set at the onset of a project whenever possible. The later the GIDB is created during the project, the more likely it is that resources will be lacking to produce a full and robust GIDB. If a full and robust GIDB is created in the early stage of a project, the effort is rewarded in creating the individual Clinical Study Reports (CSRs), ISS, and ISE. For a drug submission, the drive behind creating a GIDB is the idea of using one set of code, whenever possible, to create reports at both the study and project level.

DESCRIPTION OF A GIDB

A global integrated database can be as sparse or as robust as the project team defines it. It can contain part of the database or the full database. The GIDB is a concerted effort between the programmer and the statistician. To create a GIDB, there should be more than one protocol involved, or one protocol split into two sets of reports based upon time (e.g. reports based upon one year of data collection and two years of data collection). This paper explains different concepts of a GIDB and the mapping of variables to one variable per data point for reporting.

Integration allows programs to access variables without being study specific. The person using the GIDB does not need to know the study specific values and formats, because the GIDB carries one set of variables associated with one set of formats for all studies included. This should also eliminate discrepancies when validating study level reports against project level reports, if both are reported from the GIDB: the numbers should match because one set of code accessing one set of data creates both sets of reports. However, a project team can decide to create only a partial GIDB for ISS and ISE reporting, which leaves room for discrepancies between these reports and the study level reports. Should there be a discrepancy, there is a time cost for resolving it.

"Pay me now or pay me later" is a common saying when discussing the creation of a GIDB. Full GIDBs do come with a higher up-front cost, but should reward the reporting aspect with speedier access to output and a lower number of discrepancies to resolve. When a last minute cosmetic change is made to a report, such as adding a data point, it is made in a minimum number of places that serve both the study level and the project level. Each team must decide in what way a GIDB can be most efficacious for the project. The availability of personnel and resources, along with the time estimated to design and create a GIDB, is also a factor. Past experience shows that paying now is much more rewarding than paying later in the final hours.

EXAMPLE OF MAPPING EXISTING VARIABLE

One example of what is meant by integrating is the data point of race. The following simplified example maps a data point whose values and formats are not consistent across studies.

STUDY 001

    Proc Format;
       Value Race 1='Caucasian' 2='Black' 3='Other';
    Run;

    Proc Print Data=Study1.Demog;
       Format Race Race.;
    Run;

    Study  Subject  Age  Sex  Race
    0001   1        43   F    Black
    0001   2        43   M    Other
    0001   3        55   F    Caucasian

STUDY 002

    Proc Format;
       Value Race 1='Caucasian' 2='Black' 3='Asian' 4='Other';
    Run;

    Proc Print Data=Study2.Demog;
       Format Race Race.;
    Run;

    Study  Subject  Age  Sex  Race
    0002   1        59   M    Asian
    0002   2        43   F    Black
    0002   3        43   M    Other
    0002   4        55   F    Caucasian

COMBINED WITH A SET STATEMENT

    Data All;
       Set Study1.Demog Study2.Demog;
    Run;

    Proc Print Data=All;
       Format Race Race.;
    Run;

    Study  Subject  Age  Sex  Race
    0001   1        43   F    Black
    0001   2        43   M    Asian
    0001   3        55   F    Caucasian
    0002   1        59   M    Asian
    0002   2        43   F    Black
    0002   3        43   M    Other
    0002   4        55   F    Caucasian

Looking only at this final Proc Print, the error in the data is not apparent. Comparing the original data with the newly created data shows that subject 2 in study 001 has an incorrect value for race: the original formatted value is Other. Because a second Race format was created with the same name before the final SET statement, the formatting defaulted to the last format defined prior to the data step. This is just a simplified example of the types of mapping errors that can occur through the use of formats.
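One way to catch this kind of same-name format conflict before the data are set together is to compare the format definitions themselves. The following is a minimal sketch and not part of the original example. It assumes each study's formats are stored in their own catalog; the catalog names WORK.FMT001 and WORK.FMT002 and the data set names CTL001, CTL002 and FMT_CONFLICTS are hypothetical. It uses the CNTLOUT= option of PROC FORMAT to write each catalog to a data set, then lists every code whose decode differs between the studies or that exists in only one of them.

```sas
/* Hypothetical setup: each study's RACE format kept in its own catalog. */
proc format library=work.fmt001;
   value race 1='Caucasian' 2='Black' 3='Other';
run;

proc format library=work.fmt002;
   value race 1='Caucasian' 2='Black' 3='Asian' 4='Other';
run;

/* Write each catalog to a data set: one row per format value. */
proc format library=work.fmt001
            cntlout=work.ctl001(keep=fmtname start label);
run;

proc format library=work.fmt002
            cntlout=work.ctl002(keep=fmtname start label);
run;

proc sort data=work.ctl001; by fmtname start; run;
proc sort data=work.ctl002; by fmtname start; run;

/* Keep codes that decode differently, or that appear in one study only. */
data work.fmt_conflicts;
   merge work.ctl001(in=in1 rename=(label=label001))
         work.ctl002(in=in2 rename=(label=label002));
   by fmtname start;
   if not (in1 and in2) or label001 ne label002;
run;

proc print data=work.fmt_conflicts noobs;
run;
```

For the two RACE formats above, the listing should flag code 3 (Other in one study, Asian in the other) and code 4 (defined only for Study 002). Those are the values that need an explicit mapping decision before a SET statement is used.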
To form a robust integrated data point, map the data point as follows:

    Proc Format;
       Value G_race 1='Caucasian' 2='Black' 3='Asian' 4='Other';
    Run;

    Data All;
       Set Study1.Demog(in=a) Study2.Demog(in=b);
       /* Study 001 codes Other as 3; move it to the global code 4 */
       If a and race=3 then race=4;
    Run;

    Proc Print Data=All;
       Format Race G_race.;
    Run;

    Study  Subject  Age  Sex  Race
    0001   1        43   F    Black
    0001   2        43   M    Other
    0001   3        55   F    Caucasian
    0002   1        59   M    Asian
    0002   2        43   F    Black
    0002   3        43   M    Other
    0002   4        55   F    Caucasian

Using a SET statement alone to combine the two datasets in this example will not provide an accurate data point. When the data point has the value three, it would not be known whether it stood for Other or Asian. By mapping this data point and creating a new format for race, or by using an existing format that encompasses all data points, the correct associations for race occur for this new derivation.

The reason to be robust with the mapping is to keep the values of any given data point that are specific to a study accessible. If Other and Asian were both mapped to Other, one would lose the ability to report these two races in the study level report without returning to the study specific database, and the concept of building the GIDB would be lost. Maintaining the unique values across studies keeps the capability to be study specific. If desired, Other and Asian can still be combined at the program level, or through a second format for race in which the two are combined.

Caution should be used when creating new formats and using an existing format name, because confusion can arise as to the appropriate format to use. To avoid confusion, one could adopt the philosophy of having unique format names within a given project. Depending upon the project set up, there may be more than one library of formats in use. For example, there may be a format library at the project level and then one format library for each study. With this structure, the project level should contain all formats that are consistent across all of the project studies. The study level should contain formats specific to that study.

EXAMPLE OF MAPPING A DERIVATION

    CASE REPORT FORM                     Study 001   Subject _ _ _
    FINAL DOSING STATUS RECORD
    Did Subject Complete Dosing of Medication?
       _ No (0)   _ Yes (1)                          [COMPLETE]
    If No:
       Number of Completed Doses   _ _               [DOSECOMP]
       Number of Missed Doses      _ _               [DOSEMISS]

    CASE REPORT FORM                     Study 002   Subject _ _ _
    FINAL DOSING STATUS RECORD
    Did Subject Complete Dosing of Medication?
       _ No (0)   _ Yes (1)                          [COMPLETE]
    If No:
       Number of Completed Doses   _ _ . _ _         [DOSECOMP]
       Number of Missed Doses      _ _ . _ _         [DOSEMISS]

Both Case Report Form (CRF) pages look exactly alike except that study 002 allows for partial doses. The derivation to be calculated is compliance. For this example both studies dispensed tablets. Study 001 subjects received 1 tablet once a day for 1 week. Study 002 subjects received 2 tablets twice a day for 1 week. For Study 001, one tablet equals one dose. For Study 002, four tablets equals one dose and should one or more tablets on a given day not be taken for any reason, the study will capture the partial dose.

In deriving the compliance, the assumption for both studies is that "Number of Completed Doses" + "Number of Missed Doses" equals seven for Study 001 and fourteen for Study 002. The second assumption is that for Study 002, partial doses can be added towards a whole dose. If this second assumption is not valid for the derivation of compliance, then the programmer will need to use the individual dosing records to re-summarize the number of completed doses and the number of missed doses. As with mapping variables such as race, attention must be given to the details on the CRF in order for the derivation to reflect its intended meaning.

EXAMPLE OF ALIKE CASE REPORT FORM

    CASE REPORT FORM                     Study 001   Subject _ _ _ _
    FINAL DOSING STATUS RECORD
    Did Subject Complete Dosing of Medication?
       _ No (0)   _ Yes (1)                          [COMPLETE]
    If No:
       Number of Completed Doses   _ _               [DOSECOMP]
       Number of Missed Doses      _ _               [DOSEMISS]

    CASE REPORT FORM                     Study 002   Subject _ _ _ _
    FINAL DOSING STATUS RECORD
    Did Subject Complete Dosing of Medication?
       _ No (0)   _ Yes (1)                          [COMPLETE]
    If No:
       Number of Completed Doses   _ _               [DOSECOMP]
       Number of Missed Doses      _ _               [DOSEMISS]

Both Case Report Form (CRF) pages look exactly alike, and the information captured is the same. The assumption could be that a SET statement would satisfy the integration of this module for these two studies. However, the dose, dosing unit and regimen are not captured here, and they could be crucial to accurately reflecting the total amount of dosing and the percentage of doses missed or completed. If the dose, dosing unit and regimen are not exactly alike for both studies, then it would be beneficial to build a GIDB module that contains the above record plus the dosing unit and regimen, or the dosing unit and the total number of doses that were to be taken. If one or both were a cross over study, the coding would be a little more complex.

For study 001 each subject received 40mg twice a week for 4 weeks. For study 002 each subject received 30mg three times a week for 4 weeks. With this information the following could be coded into a GIDB module:

    Data All;
       Set Study1.Findose(in=a) Study2.Findose(in=b);
       If a then do;
          Tdose=320;      /* 40mg x 2 per week x 4 weeks */
          Tnumdose=8;     /* 2 per week x 4 weeks        */
       End;
       Else do;
          Tdose=360;      /* 30mg x 3 per week x 4 weeks */
          Tnumdose=12;    /* 3 per week x 4 weeks        */
       End;
    Run;

EXAMPLE OF ATTRIBUTE DIFFERENCES ACROSS STUDIES

Using the above Final Dosing Status Record CRFs, here is an example of what might be revealed through a Proc Contents.

    Study 001
    Variable  Type  Len  Format   Label
    COMPLETE  Char  1    $NOYES.  Complete Dosing Medication
    DOSECOMP  Num   8             Number of Completed Doses
    DOSEMISS  Num   8             Number of Missed Doses
    SUBJECT   Num   4             Subject Number
    SUNO      Num   3             Study Number

    Study 002
    Variable  Type  Len  Format   Label
    COMPLETE  Num   1    $NY.     Complete Dosing Medication
    DOSECOMP  Num   8             Completed Doses
    DOSEMISS  Num   8             Number of Missed Doses
    SUBJECT   Num   4             Subject Number
    SUNO      Num   3             Study Number

There are three attribute differences that will need to be resolved through the mapping process. The variable COMPLETE has discrepancies in both type and format. The variable DOSECOMP has a label discrepancy. Each of these will need to be addressed in programming the module.

FULL GIDB

What is a full GIDB? A full GIDB starts with the concept that all data points will be contained within the new database structure for the use of all reporting. A full GIDB requires immediate attention to formatting issues and content structure. The content structure can be modular with like data, left as the original database structure, or a combination of both. Project teams will most likely choose a combination of both for ease of reporting from the GIDB.

Establishing rules for who may add or modify formats, and how, is important. Once the base set of formats is created and programming has commenced, no changes can be made to existing format values; the only changes allowed are adding a value to an existing format or adding a new format not previously defined. Thus, one person should be established as the keeper of the GIDB formats. Using the previous example of the G_race format:

    Proc Format;
       Value G_race 1='Caucasian' 2='Black' 3='Asian' 4='Other';
    Run;

Once programming commences and another race value is introduced, one would simply add 5='New Race Value'. One would not do 4='New Race Value' 5='Other'. Printing in a specific order can be established by other methods of programming.

PARTIAL GIDB

What is a partial GIDB? A partial GIDB starts with the concept that not all data points will be contained in the new database structure and will be used for selective reporting.
Another term that could be used is "analysis data sets". In this case, the selective reporting usually entails only the ISS and ISE and descriptive statistic reports. The project team will then need to provide specifications at an earlier stage in creating the GIDB. If a full GIDB is to be created, the programmer can start writing code to integrate the data immediately, knowing that there will be the merging of some modules. Merging modules with like data, such as demographic data or vitals data, is of a like nature.

DEFINING A GIDB

The statistician is the most likely person to provide the design for the layout of a GIDB. The statistician is most likely the hub of information on the database design, the collection of data points, the derivations, and how and what will be reported. Other direct or indirect participants in creating a GIDB would be: database programmer(s) from both Phase I and Phase III who are familiar with the database setup and collection of data; SAS programmer(s) with knowledge of reporting needs; and clinician(s) with knowledge of how the data was reported on the Case Report Form (CRF). These four key participants enhance the making of intelligent decisions on grouping data, including derivations, and on integration of Phase I data.

Data can be grouped by data types or by modules. Derivations can be included with the appropriate data modules or in a separate module for derivations. Combinations of these methods can also be used. This is where the logic for retrieving data from the database, and not the efficiency of the database, should be the priority. Unless there is a space efficiency requirement, the ease of retrieving and reporting should be the focal point.

When working with relational databases, there will be a need to merge and manipulate data for reporting. For merges and manipulations that would otherwise be coded and executed again and again for tables and listings, creating a GIDB module from the merged or manipulated data would be efficient. One common possibility would be adverse events and related information. There are also instances where a statistician with less SAS experience may need access to modules; here the module could be tailored to the statistician's SAS ability. Not all statisticians become involved with writing their own SAS code for analysis and verification purposes. Thus, the definition of the GIDB should be tailored to its users.

QUESTIONS TO ASK

Prior to making decisions on what kind of GIDB to build, there are questions that should be asked and answered. Answering these questions will help in deciding what type of GIDB should be created for the project.

- Is there any standardization of databases across studies?
- What modules are expected to have differences and what types of differences are anticipated?
- What modules are known to have differences? Are these differences a mapping issue or are they data points collected in one study and not the other?
- What derivations are expected?
- How do visit numbers relate across studies if being used as markers for screening, baseline, on drug, and post drug?
- For a given module, were directions to fill out the form the same?
- What is the availability of staffing resources?
- Is there time to complete specifications and creation of the GIDB prior to the first study database closure?
- Does the project team agree with and understand the potential benefits of creating a GIDB for all reporting aspects?

These are some questions to resolve and keep in mind when building a GIDB. Once these questions, and any further questions arising from them, are answered, the process of deciding how to build the GIDB can commence.

BUILDING A GIDB

When deciding when and how to build a GIDB, approach the task with the focal point on retrieving and reporting the data. Be aware that the concept for building the GIDB will determine its efficiency for that particular project.

The complexity of coding a GIDB will be determined in part by the database structure. Are the databases standardized across studies for a given project or not? Databases with less standardization will most likely require more coding in order to create the GIDB.

GIDBs can be a proactive tool for reporting. Last minute changes can be reduced from many to one. The number of programs written and maintained for any one report can be reduced from the number of individual studies plus the ISS or ISE to one. For example, if there are ten studies and a column change is requested for an adverse event table, the number of changes would reduce from eleven program changes to one program change. The cost savings would be realized in both the reduction of time to change and the elimination of errors in the final result for all studies plus the ISE.

A GIDB must have the commitment of project team members as early as possible. The timing of the commitment will directly affect the amount of resources needed. The statistical analysis plan must be in good order for the derivations needed for reporting. Each project team has to assess its resources and time constraints in order to decide what a GIDB will do for it. GIDBs are not always the answer, but in many cases they are an efficient use of cost and time at the end of a project, when timelines are fast approaching and major or minor changes are being encountered. Most of all, remember that "GIDBs are not just a SET statement".

CONCLUSION

Each project team is challenged to decide the profit margin of creating a GIDB. The decision is based upon many aspects of that project and its design. A GIDB encompasses both advantages and disadvantages as explored in this paper. The rewards can be substantial.
The amount of standardization in both the database and the reporting affects the different aspects of creating a GIDB. If only we could just "SET" the data together for a GIDB. Since this is not possible, design the GIDB to reward the NDA submission team. Let the GIDB work for the project. A GIDB reduces programming time and the amount of rework for the inevitable last minute changes to the definition of algorithms and reports.

TRADEMARK INFORMATION

SAS® is a registered trademark of the SAS Institute Inc., Cary, NC, USA.

ABOUT THE AUTHOR

The author welcomes your comments and suggestions.

Marla A. Childers
Quintiles, Inc.
Post Office Box 9708
Kansas City, MO 64134-0708
(816) 767-6464
[email protected]

ACKNOWLEDGEMENTS

The author would like to thank Elizabeth Dennis of Quintiles Inc., Kansas City, MO for her invaluable assistance in the preparation of this paper.