GLOBAL INTEGRATED DATABASE (GIDB): Not Just a SET Statement
Marla A. Childers
Quintiles, Inc., Kansas City, MO
ABSTRACT
Last minute changes to the definition of algorithms and reports are
inevitable. Creating and using a Global Integrated Database (GIDB)
for reporting can reduce programming time and the number of last
minute changes across any given project that contains more than
one protocol. This paper reveals the advantages and
disadvantages associated with the creation of a GIDB and provides
insight into defining and building a GIDB.
INTRODUCTION
Developing a GIDB is integral to reporting Integrated Safety
Summaries (ISS) and Integrated Efficacy Summaries (ISE), and to
answering FDA questions after a New Drug Application (NDA)
submission. The time and effort spent creating a GIDB are repaid in
the reporting phase of an NDA submission. Strategies for the
development and content of the GIDB should be set at the onset of a
project whenever possible. The later the GIDB is created during the
project, the more likely it is that resources will be lacking to
produce a full and robust GIDB. If a full and robust GIDB is created
in the early stage of a project, the effort is rewarded in creating
the individual Clinical Study Reports (CSRs), the ISS, and the ISE.
For a drug submission, the drive behind creating a GIDB is the idea
of using one set of code, whenever possible, to create reports at
both the study and the project level.
DESCRIPTION OF A GIDB
A global integrated database can be as sparse or as robust as one
chooses to define it. It can contain part of the database or the
full database. The GIDB is a concerted effort between the programmer
and the statistician. To create a GIDB, there should be more than
one protocol involved, or one protocol split into two sets of
reports based upon time (e.g. reports based upon one year of data
collection and two years of data collection). This paper explains
different concepts of a GIDB and the mapping of variables to one
variable per data point to be used for reporting.
Integration allows programs to access variables without being study
specific. The person using the global integrated database does not
need to know the study-specific values and formats, because the GIDB
has one set of variables associated with one set of formats for all
studies included. This should also eliminate discrepancies when
validating study level reports against project level reports, if
both are reported from the GIDB. The numbers should match because
one set of code accessing one set of data is used to create both
sets of reports. However, a project team can decide to create only a
partial GIDB for ISS and ISE reporting, which leaves room for
potential discrepancies between these reports and the study level
reports. Should there be a discrepancy, there is a time cost for
resolving the issue.
“Pay me now or pay me later” is a common saying when discussing
the creation of a GIDB. Full GIDBs do come with a higher up-front
cost, but should reward the reporting aspect with speedier access to
output and a lower number of discrepancies to resolve. When a last
minute cosmetic change is made to a report, such as adding a data
point, it occurs in a minimum number of places that serve both the
study level and the project level. Each team must decide in what way
a GIDB can be most efficacious for the project. The availability of
personnel and resources, along with the time estimated to design and
create a GIDB, is also a factor. Past experience shows that paying
now is much more rewarding than paying later in the final hours.
EXAMPLE OF MAPPING EXISTING VARIABLE
One example of what is meant by integrating is the data point of
race. The following provides a simplistic example of mapping a data
point that is not consistent in data values and formats across
studies.
STUDY 001
Proc Format;
   Value Race 1='Caucasian'
              2='Black'
              3='Other';
Run;
Proc Print Data=Study1.Demog;
   Format Race Race.;
Run;
Study   Subject   Age   Sex   Race
0001    1         43    F     Black
0001    2         43    M     Other
0001    3         55    F     Caucasian
STUDY 002
Proc Format;
   Value Race 1='Caucasian'
              2='Black'
              3='Asian'
              4='Other';
Run;
Proc Print Data=Study2.Demog;
   Format Race Race.;
Run;
Study   Subject   Age   Sex   Race
0002    1         59    M     Asian
0002    2         43    F     Black
0002    3         43    M     Other
0002    4         55    F     Caucasian
Data All;
   Set Study1.Demog Study2.Demog;
Run;
Proc Print Data=All;
   Format Race Race.;
Run;
Study   Subject   Age   Sex   Race
0001    1         43    F     Black
0001    2         43    M     Asian
0001    3         55    F     Caucasian
0002    1         59    M     Asian
0002    2         43    F     Black
0002    3         43    M     Other
0002    4         55    F     Caucasian
Looking only at this final proc print, the error in the data is not
apparent. A closer comparison of the original data with the newly
created data shows that subject 2 in study 001 has an incorrect
value for race. The original formatted value is Other. Because a
second format named Race was created before the final SET statement,
it replaced the first, and the formatting defaulted to the last
format defined prior to the data step. This is just a simplified
example of the types of mapping errors that can occur through the
use of formats.
Using a SET statement to combine the two datasets in this example
will not provide an accurate data point. When the data point results
in the number three, it would not be known whether it stood for
Other or Asian. Thus, by mapping this data point and creating a new
format for race, or by using an existing format that encompasses all
data points, the correct associations for race would occur for this
new derivation.
A robust integrated data point would be mapped as follows:
Proc Format;
   Value G_race 1='Caucasian'
                2='Black'
                3='Asian'
                4='Other';
Run;
Data All;
   Set Study1.Demog(in=a) Study2.Demog(in=b);
   /* Study 001 codes Other as 3; the global format codes it as 4 */
   If a and race=3 then race=4;
Run;
Proc Print Data=All;
   Format Race G_race.;
Run;
Study   Subject   Age   Sex   Race
0001    1         43    F     Black
0001    2         43    M     Other
0001    3         55    F     Caucasian
0002    1         59    M     Asian
0002    2         43    F     Black
0002    3         43    M     Other
0002    4         55    F     Caucasian
The reason to be robust with the mapping is to keep the
study-specific values of any given data point accessible. If Other
and Asian were both mapped to Other, one would lose the ability to
report these two races in the study level report without returning
to the study-specific database, and the purpose of building the GIDB
would be lost. Maintaining the unique values across studies
preserves the ability to be study specific. If desired, Other and
Asian can still be combined at the program level, or through a
second format for race in which Other and Asian are combined.
Caution should be used when creating new formats under an existing
format name, because confusion can arise as to which format is the
appropriate one to use. To avoid confusion, one could adopt the
philosophy of having unique format names within a given project.
Depending upon the project setup, more than one format library may
be in use. For example, there may be one format library at the
project level and then one format library for each study. With this
structure, the project level library should contain all formats that
are consistent across all of the project studies, and each study
level library should contain the formats specific to that study.
EXAMPLE OF MAPPING A DERIVATION
CASE REPORT FORM
Study 001 Subject _ _ _
FINAL DOSING STATUS RECORD
Did Subject Complete Dosing of Medication? _ No (0) _ Yes (1)
[COMPLETE]
If No:
Number of Completed Doses _ _
[DOSECOMP]
Number of Missed Doses _ _
[DOSEMISS]
CASE REPORT FORM
Study 002 Subject _ _ _
FINAL DOSING STATUS RECORD
Did Subject Complete Dosing of Medication? _ No (0) _ Yes (1)
[COMPLETE]
If No:
Number of Completed Doses _ _ . _ _
[DOSECOMP]
Number of Missed Doses _ _ . _ _
[DOSEMISS]
Both Case Report Form (CRF) pages look exactly alike except that
Study 002 allows for partial doses. The derivation to be calculated
is compliance. In this example both studies dispensed tablets. Study
001 subjects received 1 tablet once a day for 1 week. Study 002
subjects received 2 tablets twice a day for 1 week. For Study 001,
one tablet equals one dose. For Study 002, four tablets equal one
dose, and should one or more tablets on a given day not be taken for
any reason, the study will capture the partial dose. In deriving
compliance, the first assumption for both studies is that “Number of
Completed Doses” + “Number of Missed Doses” equals seven for Study
001 and fourteen for Study 002. The second assumption is that for
Study 002, partial doses can be added towards a whole dose. If this
second assumption is not valid for the derivation of compliance, the
programmer will need to use the individual dosing records to
re-summarize the number of completed doses and the number of missed
doses. As with mapping variables such as race, attention must be
given to the details on the CRF in order for the derivation to
reflect its intended meaning.
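A minimal sketch of the compliance derivation under the two assumptions stated above may help. Everything in it is illustrative and not taken from the study programs: the data sets WORK.FINDOSE1 and WORK.FINDOSE2 and their values are toy records, the variables SUNO, EXPDOSE and COMPLY are hypothetical names, and the rule that a Yes answer counts as every expected dose taken is an added assumption, since the CRF collects the dose counts only when the answer is No.

```sas
/* Toy records standing in for the two Final Dosing Status modules */
data work.findose1;              /* Study 001: whole doses only */
   input subject complete dosecomp dosemiss;
   datalines;
1 1 . .
2 0 5 2
;
run;

data work.findose2;              /* Study 002: partial doses allowed */
   input subject complete dosecomp dosemiss;
   datalines;
1 1 . .
2 0 10.50 3.50
;
run;

data work.comply;
   set work.findose1(in=a) work.findose2(in=b);
   /* Expected totals from the first assumption: 7 and 14 doses */
   if a then do; suno = 1; expdose = 7;  end;
   else      do; suno = 2; expdose = 14; end;
   /* Added assumption: Yes means every expected dose was taken */
   if complete = 1 then dosecomp = expdose;
   /* Partial doses count towards whole doses (second assumption) */
   if not missing(dosecomp) then comply = 100 * dosecomp / expdose;
   label comply = 'Compliance (%)';
run;
```

Keeping the expected total in its own variable, rather than hard-coding it in the percentage, lets one report program serve both studies.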
EXAMPLE OF ALIKE CASE REPORT FORM
CASE REPORT FORM
Study 001 Subject _ _ _ _
FINAL DOSING STATUS RECORD
Did Subject Complete Dosing of Medication? _ No (0) _ Yes (1)
[COMPLETE]
If No:
Number of Completed Doses _ _
[DOSECOMP]
Number of Missed Doses _ _
[DOSEMISS]
CASE REPORT FORM
Study 002 Subject _ _ _ _
FINAL DOSING STATUS RECORD
Did Subject Complete Dosing of Medication? _ No (0) _ Yes (1)
[COMPLETE]
If No:
Number of Completed Doses _ _
[DOSECOMP]
Number of Missed Doses _ _
[DOSEMISS]
Both Case Report Form (CRF) pages look exactly alike. The
information captured is the same. The assumption could be that a
“SET” statement would satisfy the integration of this module for
these two studies. The dose, dosing unit and regimen are not
captured here and could be crucial to reflecting accuracy of total
amount of dosing and percentage of doses missed or completed. If
the dose, dosing unit and regimen are not exactly alike for both
studies, then it would be beneficial to build a GIDB module that
would contain the above record plus the dosing unit and regimen or
the dosing unit and a total number of doses that were to be taken.
Of course, if one or both were crossover studies, the coding
would be a little more complex.
For Study 001, each subject received 40 mg twice a week for 4
weeks. For Study 002, each subject received 30 mg three times a
week for 4 weeks. With this information, the following could be
coded into a GIDB module:
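Data All;
   Set Study1.Findose(in=a) Study2.Findose(in=b);
   If a then do;
      Tdose=320;     /* Study 001: 40 mg x 2 per week x 4 weeks */
      Tnumdose=8;    /* 2 doses per week x 4 weeks */
   End;
   Else do;
      Tdose=360;     /* Study 002: 30 mg x 3 per week x 4 weeks */
      Tnumdose=12;   /* 3 doses per week x 4 weeks */
   End;
Run;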
EXAMPLE OF ATTRIBUTE DIFFERENCES ACROSS STUDIES
Using the above Final Dosing Status Record CRFs, here is an
example of what might be revealed through a proc contents.
Study 001
Variable   Type   Len   Format    Label
COMPLETE   Char   1     $NOYES.   Complete Dosing Medication
DOSECOMP   Num    8               Number of Completed Doses
DOSEMISS   Num    8               Number of Missed Doses
SUBJECT    Num    4               Subject Number
SUNO       Num    3               Study Number
Study 002
Variable   Type   Len   Format    Label
COMPLETE   Num    1     $NY.      Complete Dosing Medication
DOSECOMP   Num    8               Completed Doses
DOSEMISS   Num    8               Number of Missed Doses
SUBJECT    Num    4               Subject Number
SUNO       Num    3               Study Number
There are three attribute differences that will need to be resolved
through the mapping process. The variable COMPLETE has
discrepancies for both the type and format. The variable
DOSECOMP has a label discrepancy. Each of these will need to be
addressed in programming the module.
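One way these three differences might be resolved in the module is sketched below. It is only an illustration: the data sets WORK.FD1 and WORK.FD2 and their values are toy stand-ins, the format name G_NOYES and the helper variable C_COMPLETE are hypothetical, and it assumes that COMPLETE holds '0'/'1' as text in Study 001 and 0/1 as a number in Study 002.

```sas
/* Toy stand-ins for the two study modules (hypothetical values) */
data work.fd1;                   /* Study 001: COMPLETE is character */
   length complete $1;
   input subject complete $ dosecomp dosemiss;
   label dosecomp = 'Number of Completed Doses';
   datalines;
1 1 . .
2 0 5 2
;
run;

data work.fd2;                   /* Study 002: COMPLETE is numeric */
   input subject complete dosecomp dosemiss;
   label dosecomp = 'Completed Doses';
   datalines;
1 0 10 4
;
run;

proc format;
   value g_noyes 0='No' 1='Yes';   /* one project level format */
run;

data work.g_findose;
   /* Rename the character version on input so that the SET does not
      stop on a type conflict, then convert it to numeric */
   set work.fd1(in=a rename=(complete=c_complete))
       work.fd2(in=b);
   if a then complete = input(c_complete, 1.);
   drop c_complete;
   format complete g_noyes.;
   /* One label per variable, stated explicitly for the module */
   label complete = 'Complete Dosing Medication'
         dosecomp = 'Number of Completed Doses';
run;
```

Stating the format and labels explicitly in the module avoids depending on which input data set happens to be listed first in the SET statement.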
FULL GIDB
What is a full GIDB? A full GIDB starts with the concept that all
data points will be contained within the new database structure for
use in all reporting. A full GIDB requires immediate attention to
formatting issues and content structure. The content structure can
be modular with like data, left as the original database structure,
or a combination of both. Project teams will most likely choose a
combination of both for ease of reporting from the GIDB.
Establishing rules for who may add or modify formats, and how, is
important. Once the base set of formats is created and programming
has commenced, no value changes can be made to the formats, except
to add a value to an existing format or to add a new format not
previously defined. Thus, one person should be established as the
keeper of the GIDB formats.
Using the previous example of the G_race format:
Proc Format;
   Value G_race 1='Caucasian'
                2='Black'
                3='Asian'
                4='Other';
Run;
Once programming commences and another race value is introduced,
one would simply add “5='New Race Value'”. One would not do:
“4='New Race Value' 5='Other'”, because existing data and programs
already rely on 4 meaning Other. Printing in a specific order can be
established by other methods of programming.
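As one illustration of such other methods, the minimal sketch below appends the new value to G_race and controls the print order with a separate sort key instead of renumbering the codes. The data set WORK.DEMOG, its values, and the variable RACEORD are hypothetical and not part of the original example.

```sas
proc format;
   /* Existing codes keep their meaning; the new value is appended */
   value g_race 1='Caucasian'
                2='Black'
                3='Asian'
                4='Other'
                5='New Race Value';
run;

data work.demog;                 /* hypothetical records */
   input subject race;
   /* Sort key kept separate from the stored code (assumed rule:
      Other prints last, all other codes print in code order) */
   if race = 4 then raceord = 99;
   else raceord = race;
   datalines;
1 5
2 4
3 1
;
run;

proc sort data=work.demog;
   by raceord subject;
run;

proc print data=work.demog noobs;
   var subject race;
   format race g_race.;
run;
```

Because the order lives in RACEORD, a later request to reorder the report changes only the sort key and leaves the stored codes and the format untouched.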
PARTIAL GIDB
What is a partial GIDB? A partial GIDB starts with the concept that
not all data points will be contained in the new database structure
and that it will be used for selective reporting. Another term that
could be used is “analysis data sets”. In this case, the selective
reporting usually entails only the ISS, the ISE, and descriptive
statistic reports. The project team will then need to provide
specifications at an earlier stage in creating the GIDB. If a full
GIDB is to be created, by contrast, the programmer can start writing
code to integrate the data immediately, knowing that some modules
will be merged. Modules with like data, such as demographic data or
vitals data, are natural candidates for merging.
DEFINING A GIDB
The statistician is the most likely person to define the layout of a
GIDB, being most likely the hub of information on the database
design, the collection of data points, the derivations, and how and
what will be reported. Other direct or indirect participants in
creating a GIDB would be: database programmer(s) from both Phase I
and Phase III who are familiar with the database setup and the
collection of data; SAS programmer(s) with knowledge of reporting
needs; and clinician(s) with knowledge of how the data was reported
on the Case Report Form (CRF). These four key participants enhance
the making of intelligent decisions on grouping data, including
derivations, and on integrating Phase I data. Data can be grouped by
data types or by modules. Derivations can be included with the
appropriate data modules or in a separate module for derivations.
Combinations of these methods can also be used. This is where the
logic for retrieving data from the database, and not the efficiency
of the database, should be the priority. Unless there is a space
efficiency requirement, the ease of retrieving and reporting should
be the focal point.
When working with relational databases, there will be a need to
merge and manipulate data for reporting. For merges and
manipulations that would otherwise be coded and executed again and
again for tables and listings, it is efficient to create a GIDB
module from the result of the merge or manipulation. One common
candidate is Adverse Event and related information.
There are instances where a statistician with less SAS experience
may need access to modules. Here the module could be tailored to the
statistician’s SAS ability. Not all statisticians become involved
with writing their own SAS code for analysis and verification
purposes. Thus, the definition of the GIDB should be tailored to its
users.
QUESTIONS TO ASK
Prior to deciding what kind of GIDB to build, there are questions
that should be asked and answered. The answers will help in deciding
what type of GIDB should be created for the project.
The complexity of coding a GIDB will be determined in part by the
database structure. Are the databases standardized across studies
for a given project or not? Databases with less standardization will
most likely require more coding in order to create the GIDB.
Is there any standardization of databases across studies? What
modules are expected to have differences and what types of
differences are anticipated? What modules are known to have
differences? Are these differences a mapping issue or are they data
points collected in one study and not the other? What derivations
are expected? How do visit numbers relate across studies if being
used as markers for screening, baseline, on drug, and post drug?
For a given module, were the directions for filling out the form the
same? What is the availability of staffing resources? Is there time
to complete specifications and creation of the GIDB prior to the
first study database closure? Does the project team agree with and
understand the potential benefits of creating a GIDB for all
reporting aspects?
These are some of the questions to resolve and keep in mind when
building a GIDB. Once these questions, and any others arising from
them, are answered, the process of deciding how to build the GIDB
can commence.
BUILDING A GIDB
When deciding when and how to build a GIDB, approach the task with
the focal point on retrieving and reporting the data. Be aware that
the concept chosen for building the GIDB will determine its
efficiency for that particular project. GIDBs can be a proactive
tool for reporting. Last minute changes can be reduced from many to
one. The number of programs written and maintained for any one
report can be reduced from the number of individual studies plus the
ISS or ISE to one. For example, if there are ten studies and a
column change is requested for an adverse event table, the number of
changes would reduce from eleven program changes to one program
change. The cost savings would be realized in both a reduction of
the time to change and the elimination of errors in the final result
for all studies plus the ISE.
A GIDB must have the commitment of project team members as early as
possible. The timing of the commitment will directly affect the
amount of resources needed. The statistical analysis plan must be in
good order for the derivations needed for reporting. Each project
team has to assess its resources and time constraints in order to
decide what a GIDB will do for it. GIDBs are not always the answer,
but in many cases they are an efficient use of cost and time at the
end of a project, when timelines are fast approaching and major or
minor changes are being encountered. Most of all, remember that
“GIDBs are not just a set statement”.
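The reduction from eleven program changes to one, described under BUILDING A GIDB, can be sketched as a single parameterized report program. This is a minimal sketch under stated assumptions, not the project's actual code: the module WORK.G_DEMOG and its values, the macro name RACETAB, and its STUDY parameter are all hypothetical.

```sas
/* Toy GIDB demography module (hypothetical values) */
data work.g_demog;
   input suno subject sex $ race;
   datalines;
1 1 F 2
1 2 M 4
2 1 M 3
2 2 F 2
;
run;

proc format;
   value g_race 1='Caucasian' 2='Black' 3='Asian' 4='Other';
run;

/* One report program: STUDY=ALL gives the project level table,
   STUDY=<number> restricts the same code to a single study */
%macro racetab(study=ALL);
   title "Race by Sex, Study: &study";
   proc freq data=work.g_demog;
      %if %upcase(&study) ne ALL %then %do;
         where suno = &study;
      %end;
      tables race * sex / norow nocol nopercent;
      format race g_race.;
   run;
   title;
%mend racetab;

%racetab(study=1)      /* study level report   */
%racetab(study=ALL)    /* project level report */
```

A requested column change is then made once, inside the macro, and every study level and project level run picks it up.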
CONCLUSION
Each project team is challenged to weigh the payoff of creating a
GIDB. The decision is based upon many aspects of the project and its
design. A GIDB encompasses both advantages and disadvantages, as
explored in this paper. The rewards can be substantial. The amount
of standardization in both the database and the reporting affects
the different aspects of creating a GIDB. If only we could just
“SET” the data together for a GIDB. Since this is not possible,
design the GIDB to reward the NDA submission team. Let the GIDB work
for the project. A GIDB reduces programming time and the amount of
rework for the inevitable last minute changes to the definition of
algorithms and reports.
TRADEMARK INFORMATION
SAS® is a registered trademark of the SAS Institute Inc., Cary, NC,
USA.
ABOUT THE AUTHOR
The author welcomes your comments & suggestions
Marla A. Childers
Quintiles, Inc.
Post Office Box 9708
Kansas City, MO 64134-0708
(816) 767-6464
[email protected]
ACKNOWLEDGEMENTS
The author would like to thank Elizabeth Dennis of Quintiles Inc.,
Kansas City, MO for her invaluable assistance in the preparation of
this paper.