Data Warehousing and CANDA Concepts for Biometry
Mr Colin Brown, Dr Hans Ulrich Burger
F. Hoffmann-La Roche, Basle
1. Introduction
According to Greek mythology, Cassandra -- daughter of King Priam and Queen Hecuba of Troy
-- was given the gift of prophecy by Apollo the sun god. Later, in revenge for her refusal to sleep
with him, Apollo put a curse on her which made it impossible for her to convince anyone of the
truth of her prophecies. Thus, she predicted the fall of Troy and warned of the wooden horse.
She foresaw her own death and that of Agamemnon whose slave she had become after the fall of
Troy. But all was in vain. Neither she nor Troy was able to benefit from her extraordinary gift.
Today, the pharmaceutical industry finds itself in a highly competitive and rapidly changing
environment. The stakes involved in drug development and registration are increasingly high. The
ability to predict the truth about new drugs -- just as Cassandra was able to prophesy the future --
would massively reduce the risks of drug development. As with Cassandra, real benefit could
only be derived from the predictions if the decision makers and authorities could be convinced.
Obviously, any pharmaceutical company basing its development strategy on the will of a Greek
god would have serious difficulties finding investors to back the risks. Nevertheless, the
need to accelerate development and review times remains fundamental to success. Unlike
Cassandra, we have a reliable basis for our predictions in the data which have been collected --
provided that these can be processed and presented in a manner which can be validated and be
convincing. For some years, CANDAs (Computer Assisted New Drug Applications) -- often
based on the latest technology -- have been used to try to speed up regulatory authority review
times and they have now become a requirement for the FDA. However, pharmaceutical
companies are still not deriving the full benefit during drug development from their CANDAs.
This is because CANDAs are too often seen as an end product of the development process rather
than as a tool of the development process itself. It is increasingly clear that the new CANDA
technologies will be of practical benefit not only for registration but for the whole drug
development process. They should be implemented and used as an integral part of the drug
development process as a whole.
Major problems in the in-house implementation of CANDAs are data flow and data handling
processes and the setting-up of the necessary standards. This paper introduces the concept of
"Data Warehousing" in pharmaceutical development and addresses some key biometric issues
which have been highlighted by the impact of CANDA development. The primary goal of the
Data Warehouse is to enable the right decisions to be taken at the right time. This is achieved by
providing the drug development teams with the same high quality tools for reviewing the data as
are available to the regulatory authorities.
2. Data Warehousing
SAS Institute defines Data Warehousing as: "a strategy and method for physically and logically
separating operational data from decision support data" (Randy Betancourt, SAS Program
Manager). The traditional approach of having one fully normalized database no longer meets
today's requirements for quick, flexible data analysis and visualization. Data Warehousing is an
alternative concept on which to build a distributed data environment better adapted to today's
more exacting requirements.
Data Warehousing can be split into three logical sub-divisions:
1. Data Warehouse Management: Accessing data from operational and legacy systems
2. Data Warehouse Organization: Re-organising data (i.e. moving or viewing the data into
a form better suited to extracting interpretable information)
3. Data Warehouse Exploitation: Exploiting the data with data visualization/analysis tools
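The three sub-divisions can be illustrated with a minimal sketch. It assumes two in-memory SQLite databases standing in for an operational system and a separate decision support store; the table names, function names and values are hypothetical and are not taken from any system described in this paper.

```python
import sqlite3

# Hypothetical operational system: a table used for data entry.
operational = sqlite3.connect(":memory:")
operational.execute(
    "CREATE TABLE lab_entry (patient INTEGER, visit INTEGER, test TEXT, value REAL)"
)
operational.executemany(
    "INSERT INTO lab_entry VALUES (?, ?, ?, ?)",
    [(101, 1, "ALT", 35.0), (101, 2, "ALT", 80.0), (102, 1, "ALT", 28.0)],
)

# Physically separate decision support store (the warehouse).
warehouse = sqlite3.connect(":memory:")


def manage(source):
    """Management: access data held in an operational or legacy system."""
    return source.execute(
        "SELECT patient, visit, test, value FROM lab_entry"
    ).fetchall()


def organise(rows, target):
    """Organization: move the data into a form suited to analysis."""
    target.execute(
        "CREATE TABLE lab_fact (patient INTEGER, visit INTEGER, test TEXT, value REAL)"
    )
    target.executemany("INSERT INTO lab_fact VALUES (?, ?, ?, ?)", rows)


def exploit(target):
    """Exploitation: summarise the organised data for review."""
    return target.execute(
        "SELECT visit, COUNT(*), AVG(value) FROM lab_fact "
        "GROUP BY visit ORDER BY visit"
    ).fetchall()


organise(manage(operational), warehouse)
summary = exploit(warehouse)  # one row per visit: (visit, n, mean value)
```

The summary query only ever touches the warehouse copy, which is the separation of operational data from decision support data that the definition above refers to.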
3. Current Pharmaceutical Industry Requirements
Within Clinical Research, Biometry is the primary department dealing with clinical trials data,
from planning to data entry through to reporting. Within a rapidly changing environment,
Biometry has to deal with many different tasks and requirements with respect to the underlying
data structure. These include:
• data entry and cleaning as well as data storage
• handling data from different sources (various in-house databases and contractor
databases with differing structures)
• reporting
• CANDAs
• etc.
There is considerable experience in the pharmaceutical industry, with different companies using
different approaches to building databases. However, the environment is undergoing considerable
changes due to CANDAs, validation requirements and other issues. The days when reporting
programs were used exclusively by biometricians now appear to be past. Moreover, in addition to
producing tables for reports, Biometry departments are increasingly becoming responsible for the
data and the reporting programs themselves. This new responsibility can impact on the underlying
data structure and data processing requirements.
3.1 Data Handling
There are specific requirements for the underlying data structure to facilitate data entry and data
cleaning as well as data storage. These include:
• To facilitate data checking, the underlying data structure must support the Case Report
Form (CRF) format.
• The structure must have back-tracking facilities to enable changes made to the data
during the study to be logged.
• The structure must provide the clinical development team with tools to manage the data
flow on the data entry level.
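The back-tracking requirement listed above can be sketched as a record that logs every change before applying it. This is only an illustration under assumed names (AuditedRecord, update, value_as_first_entered are hypothetical), not the design of any particular clinical data management system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditedRecord:
    """One CRF record whose changes are logged rather than silently overwritten."""
    values: dict
    history: list = field(default_factory=list)

    def update(self, item, new_value, user, reason):
        # Log the old and new value before applying the change.
        self.history.append({
            "item": item,
            "old": self.values.get(item),
            "new": new_value,
            "user": user,
            "reason": reason,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        self.values[item] = new_value

    def value_as_first_entered(self, item):
        # The earliest logged change for an item holds its original value.
        for change in self.history:
            if change["item"] == item:
                return change["old"]
        return self.values.get(item)


# Hypothetical correction of a data entry error.
record = AuditedRecord({"patient": 101, "weight_kg": 17.0})
record.update("weight_kg", 71.0, user="dm01", reason="digits transposed at entry")
```

After the update the current value is the corrected one, while the history still allows the value as first entered to be recovered.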
In addition to these requirements, the situation is complicated by the fact that data come from
different sites and from different contractors in different formats. These formats cannot be
standardized. To avoid wasting time and resources in repeated data entry, these additional data
sources must be integrated in the Biometry data process. If these different sources are not
properly taken into account in the database, they can easily lead to major problems in the data
capture and processing.
3.2 Reporting
Data listings and summary data must be made available to the project team very rapidly after file
closure to enable project decisions to be taken as early as possible. This requires that planning of
the programming has to start at the beginning of the study. It is also necessary for output to be
validated which requires that additional time and resources must be included in the project plans.
This can be achieved through standardization of reports and the use of generic reporting tools.
However, this requires standardized data structures which are customized for reporting.
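To make the dependence of generic reporting tools on standardized data structures concrete, the following sketch shows one reporting function that works on any list of records, provided the classification variables are named consistently. The function name, variable names and data are hypothetical.

```python
from collections import Counter


def frequency_table(records, row_var, col_var):
    """Cross-tabulate records by two classification variables; return text lines."""
    counts = Counter((r[row_var], r[col_var]) for r in records)
    rows = sorted({key[0] for key in counts})
    cols = sorted({key[1] for key in counts})
    width = max(len(str(x)) for x in rows + cols + [row_var]) + 2
    # Header line, then one line per row category.
    lines = ["".join(str(c).ljust(width) for c in [row_var] + cols)]
    for r in rows:
        cells = [r] + [counts[(r, c)] for c in cols]
        lines.append("".join(str(c).ljust(width) for c in cells))
    return lines


# Hypothetical adverse event records in a standardized structure.
ADVERSE_EVENTS = [
    {"treatment": "A", "severity": "mild"},
    {"treatment": "A", "severity": "mild"},
    {"treatment": "B", "severity": "mild"},
    {"treatment": "B", "severity": "severe"},
]

table = frequency_table(ADVERSE_EVENTS, "treatment", "severity")
```

Because the function depends only on the variable names passed to it, the same code could be validated once and reused for other standardized datasets.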
3.3 CANDAs
CANDAs are increasing in complexity, incorporating ever more sophisticated data review
capabilities, based on clinical, pharmacokinetic or toxicology data, as well as image handling,
document storage and navigation capabilities. In this paper, without loss of generality, we focus
on clinical data.
CANDAs impose further requirements on the Biometry data processing. Typical CANDA
functionality such as data browsing or drilldown may not be easily implemented on data structures
which are suitable for data entry and reporting. Such problems have a major impact on Biometry
data structures requiring a new concept for the whole data process. The special requirements of
CANDAs are considered below.
4. Special Requirements for CANDAs
4.1 Business Requirements
Business requirements for CANDAs include:
• The FDA require that from 1995 onward all New Drug Applications include a CANDA
in addition to the paper submission.
• CANDAs are expected to accelerate the regulatory authority review process. This could
result in tremendous cost savings as well as faster time to market.
• CANDAs which provide users with high quality, validated data review and analysis tools
improve the quality and utility of both the clinical team and regulatory authority reviews.
• The use of sophisticated CANDA tools during the drug development process improves
the quality of the NDA. (This is also a basic request of the FDA: CANDAs found to be
useful in-house are more likely to be useful in the review process.)
This last requirement impacts most directly on Biometry. The most important CANDA facilities
for in-house use are those for clinical data review and analysis. For these tools to be used and
integrated into the drug development process, CANDAs have to be implemented in-house with
the Biometry department working closely with the drug development team to:
• ensure that listings and analyses are consistent between final study reports and CANDAs
• ensure that resources are used efficiently.
4.2 Special Requirements
The implementation of in-house CANDAs impacts on the Data Warehouse concept within
Biometry. Additional requirements include:
• Data listing and reporting programs are portable across operating systems.
• Data are structured enabling fast reporting within the CANDA.
• Programs which generate output for the final study report are reused in CANDAs to
ensure consistency.
• A consistent standardized underlying data structure is used to enable the use of generic
tools.
• Generic reporting and browsing tools are available.
5. Biometry Data Warehousing Concept
Data handling in Clinical Biometry typically begins with data entry, data cleaning and data
storage. This is done in a central clinical data management database that meets the need to control
data changes and data quality. Although this database is the main source of data, it is also
frequently necessary for data to be incorporated from other systems. These may include external
data management systems or older in-house clinical data management systems still in use. These
systems may store data in SAS datasets, ORACLE® tables or in a variety of other sources. The
tools used for building the Data Warehouse must, therefore, be capable of accessing all these
different sources.
The second layer of data handling is a generic data model. This is a standard, non-normalized
data model capable of handling data from all drug projects using the same data structure. The
clinical data contained within the generic data model are in a raw, unprocessed form. This stable
standard model acts as the basis for the gateway to the next parts of the Data Warehouse. It also
acts as the data input to an application framework.
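The paper does not specify the layout of the generic data model. As one possible reading, the sketch below assumes a "tall" layout with one record per observed value, so that data from any project share the same fields. All names, field choices and values are assumptions made for illustration.

```python
def to_generic(study, domain, rows, key_fields=("patient", "visit")):
    """Turn project specific records into one generic record per observed value."""
    generic = []
    for row in rows:
        for item, value in row.items():
            if item in key_fields:
                continue  # keys identify the record; they are not observations
            generic.append({
                "study": study,
                "domain": domain,
                "patient": row["patient"],
                "visit": row["visit"],
                "item": item,
                "value": value,  # raw, unprocessed value
            })
    return generic


# Two hypothetical sources with different layouts.
in_house_vitals = [{"patient": 101, "visit": 1, "sbp": 120, "dbp": 80}]
contractor_labs = [{"patient": 101, "visit": 1, "alt": 35.0}]

generic_model = (
    to_generic("STUDY-A", "VITALS", in_house_vitals)
    + to_generic("STUDY-A", "LAB", contractor_labs)
)
```

Both sources end up with identical fields, so any tool written against those fields works regardless of where the data originated.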
The final stage of data handling in Clinical Biometry is the data reporting for final study reports.
Nowadays, this also includes providing the data review tools for the CANDA for submission to
regulatory authorities. These tools are based on a pre-processed data structure in order to
achieve the best possible performance for data visualisation and reporting. The diagram displays
the Data Warehouse concept fulfilling the needs of Clinical Biometry.
Figure: Data Warehousing Concept. The diagram shows three layers: Management (data and other sources), Organisation and Exploitation.
Having the generic data model as a standard structure enables standard reporting, analysis and
data query tools to be built which are independent of the original clinical data management data
structure. This allows the data structure to be designed in a way which is best suited to the tasks
of data entry and data cleaning, without adversely affecting the subsequent data analysis
processes. It also means that data from non-standard sources can easily be brought into the Data
Warehouse by transforming the data into the generic data model. From standard clinical
databases, meta data is used to automatically build logical views which form the generic data
model. This eases the resource overhead of transforming to the generic data model.
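How meta data might drive the automatic construction of logical views can be sketched with SQLite. The source table, the meta data layout and the view name are all hypothetical; the sketch only shows the general technique of generating a view definition from a description of the source columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vitals_p1 (pt INTEGER, vis INTEGER, sbp REAL, dbp REAL);
INSERT INTO vitals_p1 VALUES (101, 1, 120, 80), (102, 1, 135, 85);
""")

# Hypothetical meta data:
# (source table, patient column, visit column, source column, generic item name)
METADATA = [
    ("vitals_p1", "pt", "vis", "sbp", "SYSTOLIC_BP"),
    ("vitals_p1", "pt", "vis", "dbp", "DIASTOLIC_BP"),
]


def build_generic_view(connection, metadata, view_name="generic_data"):
    """Create one logical view that presents every described column generically."""
    # Identifiers cannot be bound as SQL parameters, so the meta data must come
    # from a trusted source.
    selects = [
        f"SELECT {pt} AS patient, {vis} AS visit, '{item}' AS item, "
        f"{col} AS value FROM {table}"
        for table, pt, vis, col, item in metadata
    ]
    connection.execute(f"CREATE VIEW {view_name} AS " + " UNION ALL ".join(selects))


build_generic_view(conn, METADATA)
rows = conn.execute(
    "SELECT patient, visit, item, value FROM generic_data ORDER BY patient, item"
).fetchall()
```

Because the result is a view, no data are copied: adding a new source column only requires a new meta data entry and a rebuilt view.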
The pre-processed data structure is required by fast data visualization and analysis applications.
Additionally, CANDA systems must be able to browse data easily and to have drill down
functionality on summary reports. However, before clinical data can be analysed, they must be
prepared appropriately, assigning time windows, preferred terms, normal ranges, etc., to observed
values. The FDA have made it quite clear that they wish to change the assumptions when making
such preparations. However, a CANDA system that tries to take raw data, pre-process it,
summarize it and then allow drilldown on the summarized results back to the raw data would be
prohibitively complex. It would push even the most powerful computer to its capacity limits and
system maintenance would be difficult.
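The alternative to doing everything at review time is to prepare the data once and let the review tool work on the prepared records. The sketch below assumes hypothetical visit windows and normal ranges; it assigns a time window and a normal range flag to each raw value, summarises the result, and supports drilldown from a summary cell to the prepared records behind it.

```python
from collections import defaultdict

# Hypothetical analysis assumptions.
VISIT_WINDOWS = [("Baseline", -7, 1), ("Week 4", 22, 35)]  # (label, first day, last day)
NORMAL_RANGES = {"ALT": (0.0, 40.0)}                        # test -> (lower, upper)

RAW = [
    {"patient": 101, "day": 0, "test": "ALT", "value": 35.0},
    {"patient": 101, "day": 29, "test": "ALT", "value": 80.0},
    {"patient": 102, "day": -2, "test": "ALT", "value": 28.0},
    {"patient": 102, "day": 27, "test": "ALT", "value": 38.0},
]


def assign_window(day):
    """Return the label of the window containing the study day, or None."""
    for label, first, last in VISIT_WINDOWS:
        if first <= day <= last:
            return label
    return None


def preprocess(raw):
    """Attach a time window and a normal range flag to every raw record."""
    prepared = []
    for rec in raw:
        low, high = NORMAL_RANGES[rec["test"]]
        if rec["value"] < low:
            flag = "LOW"
        elif rec["value"] > high:
            flag = "HIGH"
        else:
            flag = "NORMAL"
        prepared.append({**rec, "window": assign_window(rec["day"]), "flag": flag})
    return prepared


def summarise(prepared):
    """Group prepared records into summary cells keyed by (window, flag)."""
    cells = defaultdict(list)
    for rec in prepared:
        cells[(rec["window"], rec["flag"])].append(rec)
    return cells


CELLS = summarise(preprocess(RAW))
counts = {key: len(recs) for key, recs in CELLS.items()}        # summary table
drilldown = [r["patient"] for r in CELLS[("Week 4", "HIGH")]]   # records behind one cell
```

Changing an assumption, such as a window boundary, means rerunning the preparation step rather than rebuilding the review tool.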
Pre-processed data, therefore, are data that have been processed in preparation for the demands
of fast summary and analysis. The structure of these pre-processed data could be made generic
for a large part of the data structure (as required for standard safety reporting). However, project
specific requirements such as efficacy analysis would not necessarily be met by a generic
pre-processed efficacy data structure and would need project specific specifications. To implement this, a
meta base is needed to describe this project specific environment.
On this basis, standard data analysis tools can be used. These would include tools for data
queries, data visualization and for standard reporting. They could be used across all projects and
the environment would enable further development of the tools. For project specific requirements
such as efficacy analysis, project specific tools complete the system. The usefulness and necessity
of these tools differ for different groups of users. They need to be customized to different user
capabilities and requirements. Highly specialized tools such as those for inferential statistics
should be available only to specific groups of users.
At the beginning of a drug development a whole range of generic analysis tools will already be
available and project specific tools can be developed and added to the system as required.
Together, the generic and project specific tools enable the generation of all output for final study
reports. Following production of the last final study report, the whole system -- including generic
data model, pre-processed data and all analysis tools -- is ready for submission to the regulatory
authorities as the data analysis part of a CANDA. At Roche we differentiate between a computer
system built mainly for use by regulatory authorities (CANDAs) and this system, which is an
integral part of the whole drug development process. We call our system "CASANDrA"
(Computer Assisted Summary and Analysis for a New Drug Application). This system is used
for both drug development and regulatory authority review. It provides both the sponsors and the
regulatory authorities with the same data and the same analysis tools for drawing conclusions.
6. What is the Role of the SAS system?
The reasons why the pharmaceutical industry has used the SAS system in the past have been its
very powerful data manipulation capabilities -- needed for analyses for final study reports -- and
its data access functionality. Now in addition, in this Data Warehousing environment, the SAS
system meets two other important criteria. Firstly, having a single tool for the whole process
eases the development of the environment. Secondly, the portability of the SAS system enables
the whole CASANDrA system to be delivered to the authorities on their own preferred operating
system.
7. Costs and Benefits
The proposed Data Warehouse concept has pros and cons. Advantages include:
• Productivity: Standardization enables the creation of one set of generic tools for both
drug development and CANDAs.
• Quality:
  - Using a broader set of generic review and analysis tools distributed to the whole drug
    development team increases the quality of drug development as well as the quality of
    review by regulatory authorities.
  - Better quality CANDA system (i.e. fewer bugs) for submission to regulatory authorities
    because it has already been extensively used in-house.
  - The CANDAs reflect the drug development process. Conclusions drawn during drug
    development and conclusions drawn during the review by regulatory authorities are
    based on the same data and review tools and hence on the same information.
• Efficiency: User empowerment: putting information into the hands of the whole drug
development team from the beginning of the project onwards.
• Validation: Reduction of the amount of validation needed because the CANDA system
is a reuse of production tools already validated rather than an additional computer
system. This also avoids inconsistencies between CANDAs and final study reports.
Besides these benefits, however, there are also costs and potential pitfalls which need to be taken
into account:
• Such a Data Warehouse concept needs a larger amount of sophisticated programming
which in turn requires more resources.
• The concept requires a greater standardization of the drug development process from the
creation of Case Report Forms onwards. Full management support is vital to persuade
project personnel to seek and implement standard solutions before considering project
specific approaches.
• There might be additional costs in giving inferential power -- the capacity to easily
perform inferential analyses -- to non-statisticians.
8. Conclusions
Despite their astonishing potential, data review tools for CANDAs are currently often kept
completely outside the drug development process. However, it is becoming increasingly clear that
these tools should also be used in-house as early as possible. The focus of CANDAs is currently
changing from the delivery to regulatory authorities to in-house implementation. In order to be
consistent with the whole data analysis process within drug development, these tools have to be
implemented within the Biometry department since this is the main department responsible for
clinical data. Without this condition, the risk of failure would seem to be high. This, however,
impacts on the traditional way of working and changes some of the key requirements for reporting
and data handling.
The Data Warehouse concept addresses these changes and provides a suitable environment for the
implementation of the data analysis functionality of CANDAs within drug development. In this
way, Data Warehousing is a concept which can be of major importance to pharmaceutical
companies. Our experience so far is that such a concept is realistic and can be achieved with the
SAS system. It provides major advantages not only for CANDAs for regulatory authorities, but
also to the pharmaceutical companies from the beginning of a drug development onwards.
We hope that Cassandra may then be heard.
Acknowledgments
We would like to thank our colleagues Mr Michael Clarke, Mr Thorkild Nielsen, Dr Olga
Rutman, Mr Daniel Gronell and Ms Cheryl Altieri as well as our manager Dr Samuel Givens for
many fruitful discussions which have provided significant input into this paper.