Data Warehousing and CANDA Concepts for Biometry

Mr Colin Brown, Dr Hans Ulrich Burger
F. Hoffmann-La Roche, Basle

1. Introduction

According to Greek mythology, Cassandra -- daughter of King Priam and Queen Hecuba of Troy -- was given the gift of prophecy by Apollo the sun god. Later, in revenge for her refusal to sleep with him, Apollo put a curse on her which made it impossible for her to convince anyone of the truth of her prophecies. Thus, she predicted the fall of Troy and warned of the wooden horse. She foresaw her own death and that of Agamemnon, whose slave she had become after the fall of Troy. But all was in vain. Neither she nor Troy was able to benefit from her extraordinary gift.

Today, the pharmaceutical industry finds itself in a highly competitive and rapidly changing environment. The stakes involved in drug development and registration are increasingly high. The ability to predict the truth about new drugs -- just as Cassandra was able to prophesy the future -- would massively reduce the risks of drug development. Also, as with Cassandra, real benefit could only be derived from the predictions if the decision makers and authorities could be convinced.

Obviously, any pharmaceutical company basing its development strategy on the will of a Greek god would have serious difficulties finding investors to back the risks. Nevertheless, the need to accelerate development and review times remains fundamental to success. Unlike Cassandra, we have a reliable basis for our predictions in the data which have been collected -- provided that these can be processed and presented in a manner which can be validated and is convincing. For some years, CANDAs (Computer Assisted New Drug Applications) -- often based on the latest technology -- have been used to try to speed up regulatory authority review times, and they have now become a requirement for the FDA.
However, pharmaceutical companies are still not deriving the full benefit from their CANDAs during drug development. This is because CANDAs are too often seen as an end product of the development process rather than as a tool of the development process itself. It is increasingly clear that the new CANDA technologies will be of practical benefit not only for registration but for the whole drug development process. They should be implemented and used as an integral part of that process. The major problems in the in-house implementation of CANDAs are the data flow and data handling processes and the setting-up of the necessary standards.

This paper introduces the concept of "Data Warehousing" in pharmaceutical development and addresses some key biometric issues which have been highlighted by the impact of CANDA development. The primary goal of the Data Warehouse is to enable the right decisions to be taken at the right time. This is achieved by providing the drug development teams with the same high quality tools for reviewing the data as are available to the regulatory authorities.

2. Data Warehousing

SAS Institute defines Data Warehousing as: "a strategy and method for physically and logically separating operational data from decision support data" (Randy Betancourt, SAS Program Manager). The traditional approach of having one fully normalized database no longer meets today's requirements for quick, flexible data analysis and visualization. Data Warehousing is an alternative concept on which to build a distributed data environment better adapted to today's more exacting requirements. Data Warehousing can be split into three logical sub-divisions:

1. Data Warehouse Management: Accessing data from operational and legacy systems
2. Data Warehouse Organization: Re-organising data (i.e. moving or viewing the data into a form better suited to extracting interpretable information)
3. Data Warehouse Exploitation: Exploiting the data with data visualization/analysis tools

3. Current Pharmaceutical Industry Requirements

Within Clinical Research, Biometry is the primary department dealing with clinical trials data, from planning to data entry through to reporting. Within a rapidly changing environment, Biometry has to deal with many different tasks and requirements with respect to the underlying data structure. These include:

• data entry and cleaning as well as data storage
• handling data from different sources (various in-house databases and contractor databases with differing structures)
• reporting
• CANDAs
• etc.

There is considerable experience in the pharmaceutical industry, with different companies using different approaches to building databases. However, the environment is undergoing considerable changes due to CANDAs, validation requirements and other issues. The days when reporting programs were used exclusively by biometricians now appear to be past. Moreover, in addition to producing tables for reports, Biometry departments are increasingly becoming responsible for the data and the reporting programs themselves. This new responsibility can impact on the underlying data structure and data processing requirements.

3.1 Data Handling

There are specific requirements for the underlying data structure to facilitate data entry and data cleaning as well as data storage. These include:

• To facilitate data checking, the underlying data structure must support the Case Report Form (CRF) format.
• The structure must have back-tracking facilities to enable changes made to the data during the study to be logged.
• The structure must provide the clinical development team with tools to manage the data flow on the data entry level.

In addition to these requirements, the situation is complicated by the fact that data come from different sites and from different contractors in different formats. These formats cannot be standardized.
To avoid wasting time and resources in repeated data entry, these additional data sources must be integrated in the Biometry data process. If these different sources are not properly taken into account in the database, they can easily lead to major problems in data capture and processing.

3.2 Reporting

Data listings and summary data must be made available to the project team very rapidly after file closure to enable project decisions to be taken as early as possible. This requires that planning of the programming starts at the beginning of the study. Output must also be validated, which requires that additional time and resources be included in the project plans. This can be achieved through standardization of reports and the use of generic reporting tools. However, this requires standardized data structures which are customized for reporting.

3.3 CANDAs

CANDAs are increasing in complexity, incorporating ever more sophisticated data review capabilities, based on clinical, pharmacokinetic or toxicology data, as well as image handling, document storage and navigation capabilities. In this paper, without loss of generality, we focus on clinical data. CANDAs impose further requirements on the Biometry data processing. Typical CANDA functionality such as data browsing or drilldown may not be easily implemented on data structures which are suitable for data entry and reporting. Such problems have a major impact on Biometry data structures, requiring a new concept for the whole data process. The special requirements of CANDAs are considered below.

4. Special Requirements for CANDAs

4.1 Business Requirements

Business requirements for CANDAs include:

• The FDA require that from 1995 onward all New Drug Applications include a CANDA in addition to the paper submission.
• CANDAs are expected to accelerate the regulatory authority review process. This could result in tremendous cost savings as well as faster time to market.
• CANDAs which provide users with high quality, validated data review and analysis tools improve the quality and utility of both the clinical team and regulatory authority reviews.
• The use of sophisticated CANDA tools during the drug development process improves the quality of the NDA. (This is also a basic request of the FDA: CANDAs found to be useful in-house are more likely to be useful in the review process.)

This last requirement impacts most directly on Biometry. The most important CANDA facilities for in-house use are those for clinical data review and analysis. For these tools to be used and integrated into the drug development process, CANDAs have to be implemented in-house, with the Biometry department working closely with the drug development team to:

• ensure that listings and analyses are consistent between final study reports and CANDAs
• ensure that resources are used efficiently.

4.2 Special Requirements

The implementation of in-house CANDAs impacts on the Data Warehouse concept within Biometry. Additional requirements include:

• Data listing and reporting programs are portable across operating systems.
• Data are structured to enable fast reporting within the CANDA.
• Programs which generate output for the final study report are reused in CANDAs to ensure consistency.
• A consistent standardized underlying data structure is used to enable the use of generic tools.
• Generic reporting and browsing tools are available.

5. Biometry Data Warehousing Concept

Data handling in Clinical Biometry typically begins with data entry, data cleaning and data storage. This is done in a central clinical data management database which meets the need to control data changes and data quality. Although this database is the main source of data, it is also frequently necessary for data to be incorporated from other systems. These may include external data management systems or older in-house clinical data management systems still in use.
These systems may store data in SAS datasets, ORACLE® tables or in a variety of other sources. The tools used for building the Data Warehouse must, therefore, be capable of accessing all these different sources.

The second layer of data handling is a generic data model. This is a standard, non-normalized data model capable of handling data from all drug projects using the same data structure. The clinical data contained within the generic data model are in a raw, unprocessed form. This stable standard model acts as the basis for the gateway to the next parts of the Data Warehouse. It also acts as the data input to an application framework.

The final stage of data handling in Clinical Biometry is the data reporting for final study reports. Nowadays, this also includes providing the data review tools for the CANDA for submission to regulatory authorities. These tools are based on a pre-processed data structure in order to achieve the best possible performance for data visualisation and reporting. The diagram displays the Data Warehouse concept fulfilling the needs of Clinical Biometry.

[Figure: Data Warehousing Concept. Labels recoverable from the original diagram: Management, Other Sources, Organisation, Exploitation, CASANDrA.]

Having the generic data model as a standard structure enables standard reporting, analysis and data query tools to be built which are independent of the original clinical data management data structure. This allows the data management data structure to be designed in the way best suited to the tasks of data entry and data cleaning, without adversely affecting the subsequent data analysis processes. It also means that data from non-standard sources can easily be brought into the Data Warehouse by transforming the data into the generic data model. From standard clinical databases, meta data is used to automatically build logical views which form the generic data model.
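The idea of building logical views from meta data can be illustrated with a minimal sketch. It is not the authors' SAS implementation: it uses Python with an in-memory SQLite database, and all table names, column names and the layout of the METADATA dictionary are hypothetical, chosen only for illustration. Two differently structured source tables are mapped onto one generic structure by a view generated entirely from the meta data.

```python
import sqlite3

# Columns of the (hypothetical) generic data model.
GENERIC_COLUMNS = ["study", "patient", "visit", "parameter", "result"]

# Hypothetical meta data: for each source table, the source column that
# feeds each generic column. Names are illustrative assumptions only.
METADATA = {
    "inhouse_labs": {
        "study": "protocol_no",
        "patient": "pat_id",
        "visit": "visit_no",
        "parameter": "lab_test",
        "result": "lab_result",
    },
    "contractor_labs": {
        "study": "study_code",
        "patient": "subject",
        "visit": "visit",
        "parameter": "test_name",
        "result": "test_value",
    },
}


def build_view_sql(view_name, metadata):
    """Generate one CREATE VIEW statement that unions all mapped sources."""
    selects = []
    for table, mapping in metadata.items():
        cols = ", ".join(f"{mapping[g]} AS {g}" for g in GENERIC_COLUMNS)
        # Keep the originating table name so each row can be traced back.
        selects.append(f"SELECT {cols}, '{table}' AS source FROM {table}")
    return f"CREATE VIEW {view_name} AS " + " UNION ALL ".join(selects)


def demo():
    """Create two differently structured sources and query the generic view."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE inhouse_labs "
        "(protocol_no, pat_id, visit_no, lab_test, lab_result)"
    )
    con.execute(
        "CREATE TABLE contractor_labs "
        "(study_code, subject, visit, test_name, test_value)"
    )
    con.execute("INSERT INTO inhouse_labs VALUES ('S1', 101, 1, 'ALT', 32.0)")
    con.execute("INSERT INTO contractor_labs VALUES ('S1', 202, 1, 'ALT', 41.0)")
    con.execute(build_view_sql("generic_labs", METADATA))
    rows = con.execute(
        "SELECT study, patient, visit, parameter, result, source "
        "FROM generic_labs ORDER BY patient"
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

In this sketch, adding a further source only requires a new entry in the meta data; the reporting tools continue to query the single generic view.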
Building the views from meta data eases the resource overhead of transforming to the generic data model.

The pre-processed data structure is required by fast data visualization and analysis applications. Additionally, CANDA systems must be able to browse data easily and to have drilldown functionality on summary reports. However, before clinical data can be analysed, they must be prepared appropriately, assigning time windows, preferred terms, normal ranges, etc., to observed values. The FDA have made it quite clear that they wish to be able to change the assumptions made in such preparations. However, a CANDA system that tries to take raw data, pre-process it, summarize it and then allow drilldown on the summarized results back to the raw data would be prohibitively complex. It would push even the most powerful computer to its capacity limits, and system maintenance would be difficult.

Pre-processed data, therefore, are data that have been processed in preparation for the demands of fast summary and analysis. The structure of these pre-processed data could be made generic for a large part of the data structure (as required for standard safety reporting). However, project specific requirements such as efficacy analysis would not necessarily be met by a generic pre-processed efficacy data structure and need project specific specifications. To implement this, a meta base is needed to describe this project specific environment.

On this basis, standard data analysis tools can be used. These would include tools for data queries, data visualization and standard reporting. They could be used across all projects, and the environment would enable further development of the tools. For project specific requirements such as efficacy analysis, project specific tools complete the system. The usefulness and necessity of these tools differ for different groups of users. They need to be customized to different user capabilities and requirements.
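The relationship between raw data, pre-processed data and drilldown described in this section can be sketched as follows. This is a minimal illustration in Python, not the system described in the paper: the window boundaries, the normal range, the record layout and all function names are assumptions made for the example. The preparation assumptions are passed in as parameters, so the pre-processed data can be rebuilt when a reviewer wishes to change them, and each derived row keeps the identifier of its raw record so that a summary cell can be traced back.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RawLab:
    """One raw, unprocessed laboratory record (hypothetical layout)."""
    record_id: int
    patient: int
    study_day: int
    parameter: str
    value: float


# Illustrative assumptions only: inclusive study-day windows and normal ranges.
WINDOWS = {"Baseline": (-7, 0), "Week 1": (1, 10), "Week 2": (11, 17)}
NORMAL_RANGES = {"ALT": (10.0, 40.0)}


def assign_window(study_day, windows):
    """Return the name of the first window containing the study day."""
    for name, (first, last) in windows.items():
        if first <= study_day <= last:
            return name
    return "Unscheduled"


def flag_value(parameter, value, ranges):
    """Classify a value against the normal range of its parameter."""
    low, high = ranges[parameter]  # raises KeyError for unknown parameters
    if value < low:
        return "LOW"
    if value > high:
        return "HIGH"
    return "NORMAL"


def preprocess(raw_records, windows, ranges):
    """Derive one pre-processed row per raw record, keeping its record_id."""
    return [
        {
            "record_id": r.record_id,
            "patient": r.patient,
            "parameter": r.parameter,
            "window": assign_window(r.study_day, windows),
            "flag": flag_value(r.parameter, r.value, ranges),
        }
        for r in raw_records
    ]


def summarize(derived):
    """Group record ids by summary cell (parameter, window, flag)."""
    cells = defaultdict(list)
    for row in derived:
        key = (row["parameter"], row["window"], row["flag"])
        cells[key].append(row["record_id"])
    return dict(cells)


def drill_down(cells, raw_records, key):
    """Return the raw records behind one summary cell."""
    wanted = set(cells.get(key, []))
    return [r for r in raw_records if r.record_id in wanted]


RAW = [
    RawLab(1, 101, -1, "ALT", 32.0),
    RawLab(2, 101, 8, "ALT", 55.0),
    RawLab(3, 102, 9, "ALT", 61.0),
    RawLab(4, 102, 30, "ALT", 35.0),
]

if __name__ == "__main__":
    cells = summarize(preprocess(RAW, WINDOWS, NORMAL_RANGES))
    for key, ids in sorted(cells.items()):
        print(key, len(ids))
    print(drill_down(cells, RAW, ("ALT", "Week 1", "HIGH")))
```

Because the summary is computed from the stored pre-processed rows rather than from the raw data at query time, the drilldown is a simple lookup; changing an assumption means re-running preprocess with different windows or ranges.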
Highly specialized tools such as those for inferential statistics should be available only to specific groups of users. At the beginning of a drug development a whole range of generic analysis tools will already be available, and project specific tools can be developed and added to the system as required. Together, the generic and project specific tools enable the generation of all output for final study reports. Following production of the last final study report, the whole system -- including generic data model, pre-processed data and all analysis tools -- is ready for submission to the regulatory authorities as the data analysis part of a CANDA.

In Roche we differentiate between a computer system built mainly for use by regulatory authorities (CANDAs) and this system, which is an integral part of the whole drug development process. We call our system "CASANDrA" (Computer Assisted Summary and Analysis for a New Drug Application). This system is used for both drug development and regulatory authority review. It provides both the sponsors and the regulatory authorities with the same data and the same analysis tools for drawing conclusions.

6. What is the Role of the SAS System?

The pharmaceutical industry has used the SAS system in the past for its very powerful data manipulation capabilities -- needed for analyses for final study reports -- and its data access functionality. Now, in this Data Warehousing environment, the SAS system meets two other important criteria. Firstly, having a single tool for the whole process eases the development of the environment. Secondly, the portability of the SAS system enables the whole CASANDrA system to be delivered to the authorities on their own preferred operating system.

7. Costs and Benefits

The proposed Data Warehouse concept has pros and cons. Advantages include:

• Productivity: Standardization enables the creation of one set of generic tools for both drug development and CANDAs.
• Quality:
  - Using a broader set of generic review and analysis tools distributed to the whole drug development team increases the quality of drug development as well as the quality of review by regulatory authorities.
  - The CANDA system submitted to regulatory authorities is of better quality (i.e. has fewer bugs) because it has already been extensively used in-house.
  - The CANDAs reflect the drug development process. Conclusions drawn during drug development and conclusions drawn during the review by regulatory authorities are based on the same data and review tools and hence on the same information.
• Efficiency: User empowerment: putting information into the hands of the whole drug development team from the beginning of the project onwards.
• Validation: Reduction of the amount of validation needed because the CANDA system is a reuse of production tools already validated rather than an additional computer system. This also avoids inconsistencies between CANDAs and final study reports.

Besides these benefits, however, there are also costs and potential pitfalls which need to be taken into account:

• Such a Data Warehouse concept needs a larger amount of sophisticated programming, which in turn requires more resources.
• The concept requires a greater standardization of the drug development process from the creation of Case Report Forms onwards. Full management support is vital to persuade project personnel to seek and implement standard solutions before considering project specific approaches.
• There might be additional costs in giving inferential power -- the capacity to easily perform inferential analyses -- to non-statisticians.

8. Conclusions

Despite their astonishing potential, data review tools for CANDAs are currently often kept completely outside the drug development process. However, it is becoming increasingly clear that these tools should also be used in-house as early as possible.
The focus of CANDAs is currently changing from delivery to regulatory authorities to in-house implementation. In order to be consistent with the whole data analysis process within drug development, these tools have to be implemented within the Biometry department, since this is the main department responsible for clinical data. Without this condition, the risk of failure would seem to be high. This, however, impacts on the traditional way of working and changes some of the key requirements for reporting and data handling. The Data Warehouse concept addresses these changes and provides a suitable environment for the implementation of the data analysis functionality of CANDAs within drug development.

In this way, Data Warehousing is a concept which can be of major importance to pharmaceutical companies. Our experience so far is that such a concept is realistic and can be achieved with the SAS system. It provides major advantages not only for CANDAs for regulatory authorities, but also to the pharmaceutical companies from the beginning of a drug development onwards. We hope that then Cassandra may be heard.

Acknowledgments

We would like to thank our colleagues Mr Michael Clarke, Mr Thorkild Nielsen, Dr Olga Rutman, Mr Daniel Gronell and Ms Cheryl Altieri as well as our manager Dr Samuel Givens for many fruitful discussions which have provided significant input into this paper.