Version 1.0 Draft 3

1. Analyses and Displays Associated with Outliers or Shifts from Normal to Abnormal – Focus on Vital Sign, Electrocardiogram, and Laboratory Analyte Measurements in Phase 2-4 Clinical Trials and Integrated Summary Documents

Version 1.0
Created xx XXXX 201x

A White Paper by the PhUSE Computational Science Development of Standard Scripts for Analysis and Programming Working Group

Disclaimer: The opinions expressed in this document are those of the authors and do not necessarily represent the opinions of PhUSE, members' respective companies or organizations, or regulatory authorities. The content in this document should not be interpreted as a data standard and/or information required by regulatory authorities.

Note to reviewers: This is the 3rd draft sent for broad review and likely the last round. Please review all sections. Thanks!

2. Table of Contents

1. Analyses and Displays Associated with Outliers or Shifts from Normal to Abnormal – Focus on Vital Sign, Electrocardiogram, and Laboratory Analyte Measurements in Phase 2-4 Clinical Trials and Integrated Summary Documents
2. Table of Contents
3. Revision History
4. Purpose
5. Introduction
6. General Considerations
  6.1. All Measurement Types
    6.1.1. P-values and Confidence Intervals
    6.1.2. Importance of Visual Displays
    6.1.3. Conservativeness
    6.1.4. Measurements After Stopping Study Medication
    6.1.5. Measurements at a Discontinuation Visit
    6.1.6. Measurements Collected in Reflex Manner
    6.1.7. Screening Measurements versus Special Topics
    6.1.8. Number of Therapy Groups
    6.1.9. Multi-phase Clinical Trials
    6.1.10. Integrated Analyses
  6.2. Laboratory Analyte Measurements
    6.2.1. Planned versus Unplanned Measurements
    6.2.2. Analytes Collected Qualitatively
    6.2.3. Central Versus Local Laboratories
    6.2.4. Reference Limits
    6.2.5. Above and Below Quantifiable Limits
  6.3. ECG Quantitative Measurements
    6.3.1. QT Correction Factors
    6.3.2. Reference Limits
    6.3.3. JT Interval
  6.4. Vital Sign Measurements
    6.4.1. Reference Limits
7. Tables and Figures for Individual Studies
  7.1. Recommended Displays
  7.2. Discussion
8. Tables and Figures for Integrated Summaries
  8.1. Recommended Displays
  8.2. Discussion
9. Example SAP Language
  9.1. Individual Study
  9.2. Integrated Summary
10. References [To be further developed]
11. Acknowledgements
12. Appendix

3. Revision History

Version 1.0 was finalized xx XXXX 201x.

4. Purpose

The purpose of this white paper is to provide advice on displaying, summarizing, and/or analyzing measures of outliers or shifts, with a focus on vital signs, electrocardiogram (ECG) quantitative findings, and laboratory analyte measurements in Phase 2-4 clinical trials and integrated submission documents. This white paper also provides advice on collection if a particular recommended display requires data to be collected in a certain manner that may differ from current practice. The intent is to begin the process of developing industry standards with respect to analysis and reporting for measurements that are common across clinical trials and across therapeutic areas. In particular, this white paper provides recommended tables, figures, and listings for measures of outliers or shifts for a common set of safety measurements. Separate white papers address other types of data or analytical approaches (e.g., central tendency).

This advice can be used when developing the analysis plan for individual clinical trials, integrated summary documents, or other documents in which measures of outliers or shifts are of interest.
Although the focus of this white paper pertains to specific safety measurements (vital signs, ECG quantitative findings, and laboratory analyte measurements), some of the content may apply to other measurements (e.g., different safety measurements and efficacy assessments). Similarly, although the focus of this white paper pertains to Phase 2-4, some of the content may apply to Phase 1 or other types of medical research (e.g., observational studies).

Development of standard Tables, Figures, and Listings (TFLs) and associated analyses will lead to improved standardization from collection through data storage. (You need to know how you want to analyze and report results before finalizing how to collect and store data.) The development of standard TFLs will also lead to improved product lifecycle management by ensuring reviewers receive the desired analyses for the consistent and efficient evaluation of patient safety and drug effectiveness. Although having standard TFLs is an ultimate goal, this white paper reflects recommendations only and should not be interpreted as “required” by any regulatory agency.

Detailed specifications for TFL or dataset development are considered out-of-scope for this white paper. However, the hope is that specifications and code (utilizing SDTM and ADaM data structures) will be developed consistent with the concepts outlined in this white paper, and placed in the publicly available PhUSE Standard Scripts Repository.

5. Introduction

Industry standards have evolved over time for data collection (CDASH), observed data (SDTM), and analysis datasets (ADaM). There is now recognition that the next step would be to develop standard TFLs for common measurements across clinical trials and across therapeutic areas.
Some could argue that the industry should have started by creating standard TFLs before creating standards for collection and data storage (consistent with an end-in-mind philosophy). However, having industry standards for data collection and analysis datasets provides a good basis for creating standard TFLs.

The beginning of the effort leading to this white paper came from the PhUSE Computational Science Collaboration, an initiative between PhUSE, FDA, and Industry where key priorities were identified to tackle various challenges using collaboration, crowd sourcing, and innovation (Rosario, et al. 2012). Several Computational Science (CS) working groups were created to address a number of these challenges. The working group titled “Development of Standard Scripts for Analysis and Programming” has led the development of this white paper, along with the development of a platform for storing shared code. Most contributors and reviewers of this white paper are industry statisticians, with input from non-industry statisticians (e.g., FDA and academia) and industry and non-industry clinicians. Hopefully additional input (e.g., other regulatory agencies) will be received for future versions of this white paper.

There are several existing documents that contain suggested TFLs for common measurements. However, many of the documents are now relatively outdated, and generally lack sufficient detail to be used as support for the entire standardization effort. Nevertheless, these documents were used as a starting point in the development of this white paper. The documents include:

  ICH E3: Structure and Content of Clinical Study Reports
  Guideline for Industry: Structure and Content of Clinical Study Reports
  Guidance for Industry: Premarketing Risk Assessment
  Reviewer Guidance: Conducting a Clinical Safety Review of a New Product Application and Preparing a Report on the Review
  ICH M4E: Common Technical Document for the Registration of Pharmaceuticals for Human Use: Efficacy
  ICH E14: The Clinical Evaluation of QT/QTc Interval Prolongation and Proarrhythmic Potential for Non-Antiarrhythmic Drugs
  Guidance for Industry: ICH E14 Clinical Evaluation of QT/QTc Interval Prolongation and Proarrhythmic Potential for Non-Antiarrhythmic Drugs

The Reviewer Guidance is considered a key document. As discussed in the guidance, there is generally an expectation that analyses of outliers or shifts are conducted for vital signs, ECG quantitative findings, and laboratory analyte measurements. The guidance recognizes value in both analyses of central tendency and analyses of outliers or shifts from within reference limits to outside reference limits (below the lower reference limit or above the upper reference limit). We assume both will be conducted for safety signal detection. This white paper covers the outliers or shifts portion, with the expectation that an additional TFL or TFLs will also be created with a focus on central tendency (see the CSS white paper pertaining to central tendency).

6. General Considerations

This section contains some general considerations for the plan of analyses and displays associated with outliers or shifts from normal to abnormal for laboratory analyte measurements, vital signs, and ECG quantitative measurements. Section 6.1 discusses general considerations for all three safety domains. Section 6.2 discusses considerations specific to laboratory analyte measurements. Section 6.3 discusses considerations specific to ECG quantitative measurements. Section 6.4 discusses considerations specific to vital signs.

6.1. All Measurement Types

6.1.1. P-values and Confidence Intervals

There has been ongoing debate on the value or lack of value of the inclusion of p-values and/or confidence intervals in safety assessments (Crowe, et al. 2009).
This white paper does not attempt to resolve this debate. As noted in the Reviewer Guidance, p-values or confidence intervals can provide some evidence of the strength of the finding, but unless the trials are designed for hypothesis testing, these should be thought of as descriptive. Throughout this white paper, p-values and measures of spread are included in several places. Where these are included, they should not be considered as hypothesis testing. If a company or compound team decides that these are not helpful as a tool for reviewing the data, they can be excluded from the display.

Some teams may find p-values and/or confidence intervals useful to facilitate focus, but have concerns that lack of “statistical significance” provides unwarranted dismissal of a potential signal. Conversely, there are concerns that due to multiplicity issues, there could be over-interpretation of p-values adding potential concern for too many outcomes. Similarly, there are concerns that the lower or upper bound of confidence intervals will be over-interpreted. (A percentage can be as high as x causing undue alarm.) It is important for the users of these TFLs to be educated on these issues.

6.1.2. Importance of Visual Displays

Communicating information effectively and efficiently is crucial in detecting safety signals and enabling decision-making. Current practice, which focuses on tables and listings, has not always enabled us to communicate information effectively since tables and listings may be very long and repetitive. Graphics, on the other hand, can provide more effective presentation of complex data, increasing the likelihood of detecting key safety signals and improving the ability to make clinical decisions. They can also facilitate identification of unexpected values.

Standardized presentation of visual information is encouraged. The FDA/Industry/Academia Safety Graphics Working Group was initiated in 2008.
The working group was formed to develop a wiki and to improve safety graphics best practice. It has recommendations on the effective use of graphics for three key safety areas: adverse events, ECGs, and laboratory analytes. The working group focused on static graphs, and their recommendations were considered while developing this white paper. In addition, there has also been advancement in interactive visual capabilities. The interactive capabilities are beneficial, but are considered out-of-scope for this version of the white paper.

6.1.3. Conservativeness

The focus of this white paper pertains to clinical trials in which there is comparator data. As such, the concept of “being conservative” is different than when assessing a safety signal within an individual subject or a single arm. A seemingly conservative approach may end up not being conservative in the end. For example, for studies that collect safety data during an off-drug follow-up period, one might consider it conservative to include the adverse events reported in the follow-up period. However, this approach may result in smaller odds ratios than including only the exposed period in the analysis.

Another example occurs when choosing cut-offs for shift/outlier analyses. A conservative approach for defining outcomes, from a single arm perspective, is one that would lead to a higher number of patients reaching a threshold. However, a conservative approach for defining outcomes may actually make it more difficult to identify safety signals with respect to comparing treatment with a comparator (see Section 7.1.7.3.2 in the Reviewer Guidance). Thus, some of the approaches recommended in this white paper may appear less conservative than alternatives, but the intent is to propose methodology that can identify meaningful safety signals for a treatment relative to a comparator group.

6.1.4. Measurements After Stopping Study Medication

Measurements collected after stopping medications under study (e.g., treatment under study and comparators) are common for various reasons. In some cases, “follow-up” phases are included to monitor patients for a period of time after study medication is stopped. Additionally, study designs that keep patients in a study for the entire planned length of time after they decide to stop medication early are becoming more popular. In these cases, patients can be off study medication for an extended period of time. Measurements post study medication can also arise not by design. For example, a subject can decide to stop study medication at any time, and then later attend the planned visit where the planned measurements are obtained.

There is currently no standard approach on how to handle safety assessments post study medication. Some guidances contain advice on how long to collect safety measurements post study medication (e.g., 30 days post, or x half-lives). Any advice or decisions related to the collection of safety measurements post study medication should not be confused with how to include such data in displays and/or analyses. It is extremely important to document within the database for analysis the best estimate of the last date study treatment was taken, as well as the dates on which all numerical safety data were collected, so that the time of data collection relative to the last dose of medication can be determined accurately.

We recommend that the TFLs in this white paper generally exclude measurements taken during a “follow-up” phase. Separate TFLs can be created for the follow-up phase and/or the treatment and follow-up phases combined. We also recommend that the TFLs in this white paper exclude measurements taken after the visit which is considered the “study medication discontinuation” visit.
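The date bookkeeping described above can be illustrated with a short sketch. This is not a specification from this white paper: the record layout, the function name `annotate`, and the rule that a measurement collected on the discontinuation visit date itself is retained are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class Measurement:
    """Hypothetical analysis record; field names are illustrative only."""
    subject_id: str
    collected_on: date
    value: float


def annotate(m: Measurement,
             last_dose_on: date,
             discontinuation_visit_on: Optional[date]) -> Tuple[int, bool]:
    """Return (days after last dose, include in the main TFLs?).

    Assumption: a measurement is kept when it is collected on or before the
    study medication discontinuation visit; later measurements belong to
    separate follow-up or off-medication displays.
    """
    days_after_last_dose = (m.collected_on - last_dose_on).days
    in_main_tfl = (discontinuation_visit_on is None
                   or m.collected_on <= discontinuation_visit_on)
    return days_after_last_dose, in_main_tfl


# Example with made-up dates for one subject.
last_dose = date(2020, 3, 10)
disc_visit = date(2020, 3, 12)
records = [
    Measurement("S1", date(2020, 3, 1), 4.8),   # on treatment
    Measurement("S1", date(2020, 3, 12), 5.0),  # discontinuation visit
    Measurement("S1", date(2020, 4, 15), 5.1),  # follow-up phase
]
for r in records:
    print(r.collected_on, annotate(r, last_dose, disc_visit))
```

Keeping the day offset alongside the inclusion flag means the same records can feed the separate follow-up or combined displays without re-deriving dates.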
In study designs that keep patients in a study for the entire planned length of time even after stopping medication, separate TFLs can be created for the “off-medication” time and/or the treatment and “off-medication” times combined. This enables the researcher to distinguish between drug-related safety signals and safety signals that could be more related to discontinuing a drug (e.g., return of disease symptoms, introduction of a concomitant medication, and/or discontinuation or withdrawal effects of the drug) or due to subsequent therapy. We assume it is important to distinguish among these.

Generally, at least some TFLs that include data from follow-up phases and/or “off-medication” time will be required, but usually not as many as for the treatment period and not necessarily in the same format as provided in this white paper. For some compounds (e.g., compounds with a long half-life compared to the duration of the study, or compounds used for a very short time, such as antibiotics), a more complete set of TFLs including such data may be required. The ease of interpretation of such TFLs will vary depending on the compound, disease, and/or design aspects, such as the half-life of the compound, the likelihood of taking alternative therapy, and the concomitant medications allowed during the observation period.

For the case where a subject decides to stop study medication at any time and then later attends the planned visit to obtain the planned measurements, we recommend that measures taken at the study medication discontinuation visit be included. Although some patients may be off medication, the time is generally short in these situations. For this example, the inclusion of such measurements may more accurately reflect the safety profile of a compound than their exclusion. In study designs with a long period of time between visits, an alternative approach may be warranted.

6.1.5. Measurements at a Discontinuation Visit

When creating displays or conducting analyses over time, how to handle data collected at discontinuation visits should be specified. Since a subject’s discontinuation visit is not always aligned with planned timing, it is not obvious whether to include these measurements in displays or analyses over time. Such measurements are “planned” per protocol, but not consistent with the planned timing. We generally recommend assigning measures taken at the discontinuation visit to the next planned timepoint. For example, if a patient discontinues medication and the study between Visits 6 and 7 and goes to the office for their discontinuation visit, we recommend that the measurements taken at the discontinuation visit be grouped with “Visit 7”. The inclusion of such measurements may more accurately reflect trends over time for the compound than their exclusion. In study designs with a long period of time between visits, an alternative approach may be warranted.

6.1.6. Measurements Collected in Reflex Manner

In some study designs, certain measurements are collected only when another measurement meets certain criteria (i.e., collected in a reflex manner). For example, sometimes a peripheral smear is only performed when certain Complete Blood Count (CBC) analytes meet a specified threshold. How to handle such measurements should be specified in analysis planning, which requires an understanding of collection practices. Generally, measurements collected in a reflex manner would be used for individual patient management and possibly for individual patient listings or individual case descriptions (e.g., as included in patient narratives).
Summaries of such measurements within or between treatment groups tend to be uninterpretable, as you cannot generally assume normality among those who did not have the measurement, and a summary among those meeting the criteria for receiving the measurement (sometimes a very small denominator) tends not to be very helpful for signal detection purposes.

6.1.7. Screening Measurements versus Special Topics

The focus of this white paper pertains to measurements as part of normal safety screening. For many compounds, some measurements are relevant to addressing a priori special topics of interest. In these cases, it is possible that additional TFLs and/or different TFLs are warranted. TFLs designed for special topics are out-of-scope for this white paper.

In addition, it is possible that additional TFLs are warranted when a safety signal is identified using the TFLs recommended in this white paper and/or the TFLs that focus on central tendency (separate white paper). Additional TFLs that would be considered “post-hoc” for further investigation are considered out-of-scope.

6.1.8. Number of Therapy Groups

The example TFLs show one treatment arm versus comparator in this version of the white paper. Most TFLs can be easily adapted to include multiple treatment arms or a single arm.

6.1.9. Multi-phase Clinical Trials

The example TFLs for individual studies show two treatment arms and a comparator arm within a controlled phase of a study. The example TFLs for integrated summaries show one treatment arm (assumes all the treated arms pooled) and a comparator arm within the controlled phase of the studies. Discussion around additional phases (e.g., open-label extensions) is considered out-of-scope in this version of the white paper. Many of the TFLs recommended in this white paper can be adapted to display data from additional phases and/or additional treatment arms.

6.1.10. Integrated Analyses

For submission documents, TFLs are generally created using data from multiple clinical trials. Determining which clinical trials to combine for a particular set of TFLs can be complex. Section 7.4.1 of the Reviewer Guidance contains a discussion of points to consider. Generally, when p-values are computed, adjusting for study is important. Creating visual displays or tables in which timepoints or treatment comparisons are confounded with study is discouraged. Understanding whether the overall representation accurately reflects the review across individual clinical trial results is important.

6.2. Laboratory Analyte Measurements

The following topics generally pertain to laboratory analyte measurements, though they may apply to other measurement types, as well. In these cases, the discussion below may or may not apply.

6.2.1. Planned versus Unplanned Measurements

One topic that tends to be unique to safety (laboratory analyte measurements in particular) is the collection of unplanned measurements. Unplanned safety measurements can arise for various reasons. During a study, the clinical investigator sometimes orders a repeat test or “retest” of a laboratory test, especially if he/she has received an unexpected value. The investigator may also request that the patient return for a “follow-up visit” due to clinical concerns. In general, retests are repeat tests performed because an initial test result had an unexpected value. The repeat result may either confirm the initial test result or (less commonly) suggest that a laboratory error occurred in the case of the initial result. Retests are often performed to verify that the action taken by the investigator (e.g., changing the dose of study drug as allowed by the protocol) has the desired effect (e.g., test results have returned to within reference limits).
If such retests are conducted until desired measurement results have been reached, analyses from baseline to last observation, for example, would be biased toward “normality”. Thus, we recommend including only planned measurements when creating displays or conducting analyses over time and when assessing change from baseline to endpoint. However, we recommend including planned and unplanned measurements for analyses that focus on outliers or shifts across an entire period, as these are intended to focus on the most extreme changes.

6.2.2. Analytes Collected Qualitatively

Some laboratory analyte measurements are collected in a qualitative manner that is usually binary (e.g., Elliptocytes: normal/abnormal) or ordinal (e.g., Spherocytes: 0 [implied by lack of reporting], +, ++, +++, ++++). Some analytes have a numeric value when present, but are better treated as qualitative data (e.g., atypical lymphocytes, a type of abnormal white blood cell seen with some viral infections, should be treated as present or not present). How to handle such analytes should be included in analysis planning. In general, a listing of abnormal findings is sufficient. A summary of those shifting from normal during the pre-treatment period to abnormal during the treatment period can also be considered. The conversion of qualitative measurements to abnormal versus normal categories, when they are not collected as abnormal and normal, is usually defined by laboratories and included in routine data transfers, but should be confirmed and well understood by study teams.

6.2.3. Central Versus Local Laboratories

In recent years, most large studies have utilized a central laboratory to ensure consistency in laboratory assessments across institutions. However, there are times when this is not feasible. For example, some studies may need to utilize local laboratories due to the nature of the study.
There are also cases where the scheduled labs are done using a central laboratory, but ad-hoc local laboratory tests are done as needed for patient care. Generally, results from different laboratories should not be combined unless, after careful review, the laboratory assay methods and the methods for determining laboratory limits have been deemed consistent. When feasible, samples can be split such that the local laboratory results can be provided for urgent patient care, but results from the central laboratory would also be available. If such a practice is adopted, including data from the central laboratory only is sufficient.

6.2.4. Reference Limits

Laboratories generally maintain reference limits that can be used to screen for potential pathology. Methods to develop such limits vary, but many are developed with individual subject safety monitoring in mind. Thus, the limits from many laboratories tend to be “sensitive” (reduced false negatives). Several statistical authors (Copeland et al, 1977; O’Neil, xxxx; Quade et al, 1980) have presented arguments suggesting that conventional reference limits set at the 2.5th and 97.5th percentiles (the 95% reference interval, a commonly used method for reference limit determination) after removal of outliers (Clinical and Laboratory Standards Institute, 2008) might not be optimal for outlier/shift categorical analysis of laboratory analytes aimed at detecting differences between groups. Quade et al (1980) in particular discussed the impact of misclassification on estimation of incidence and on the power of an inferential test to detect a difference between groups when it exists. Translating this into a problem of choosing reference limits determined by the reference interval, it will be more important to choose a limit that is extreme enough that specificity remains high, but one that is not so high as to decrease sensitivity to a very low value.
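The role of specificity can be illustrated with a small worked calculation. The sketch below uses the standard misclassification identity (observed incidence = sensitivity × true incidence + (1 − specificity) × (1 − true incidence)); it is not necessarily the formulation used by Quade et al, and all incidences, sensitivities, and specificities shown are made-up values chosen only for illustration.

```python
def observed_incidence(true_incidence: float,
                       sensitivity: float,
                       specificity: float) -> float:
    """Expected proportion flagged as abnormal under misclassification.

    True positives plus false positives, as a share of all subjects.
    """
    true_positives = sensitivity * true_incidence
    false_positives = (1.0 - specificity) * (1.0 - true_incidence)
    return true_positives + false_positives


def observed_risk_ratio(p_treatment: float, p_comparator: float,
                        sensitivity: float, specificity: float) -> float:
    """Ratio of observed incidences, treatment versus comparator."""
    return (observed_incidence(p_treatment, sensitivity, specificity)
            / observed_incidence(p_comparator, sensitivity, specificity))


# Hypothetical true incidences: 6% on treatment, 2% on comparator (ratio 3).
sensitive_limit = observed_risk_ratio(0.06, 0.02, sensitivity=0.90, specificity=0.80)
specific_limit = observed_risk_ratio(0.06, 0.02, sensitivity=0.60, specificity=0.99)
print(round(sensitive_limit, 2), round(specific_limit, 2))
```

Under these assumed values, the "sensitive" limit gives an observed ratio of roughly 1.1, while the more extreme, more specific limit gives roughly 2.1 despite its lower sensitivity. False positives dilute both arms equally, which is why a loss of specificity masks a true between-group difference more than a comparable loss of sensitivity.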
The choice of optimal reference limit will be data dependent and is likely to vary across analytes, but using the principle that specificity has a greater effect than sensitivity, we can make reasonable choices that could be superior to the reference limits provided by the laboratory. Currently, such alternatives are not widely available. When such alternatives are available, their use is generally recommended.

Another aspect of choosing an optimal reference limit pertains to the population in which reference limits are developed. “Reference individuals (patients)” are the individuals from whom biological samples are collected for measuring an analyte in order to establish the reference limits for the analyte. For clinical use, the reference sample group is generally determined to be healthy by some means. This is appropriate for screening individual patients for presence or lack of health. Authors have suggested that it might be important to tailor the reference population based on the purpose for which the derived reference limits will be used (Solberg, xxx). This could include using a reference sample of clinical trial patients, or clinical trial patients with the disease under study. As with limits developed using higher percentile reference intervals, limits developed using alternative populations are not widely available. When such alternatives are available, their use is generally recommended.

For some laboratory analytes, clinical thresholds (e.g., Fasting Glucose ≥126 mg/dL) have been published and can be considered for use in outlier/shift summaries and analyses. Use of clinically-derived limits is recommended (likely in addition to use of statistically-derived limits) when the analyte is of special interest.

For purposes of this white paper, it is assumed a reference limit is chosen that would identify values as low, normal, or high for quantitative measurements.
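As a minimal sketch of the low/normal/high categorization assumed above, the following classifies quantitative values against a pair of reference limits and counts subjects who shift to high after baseline. The limits, the data, the function names, the treatment of a value equal to a limit as normal, and the definition of the at-risk denominator are all assumptions for illustration; they are not definitions taken from this white paper.

```python
from typing import Iterable, Sequence, Tuple


def classify(value: float, lower: float, upper: float) -> str:
    """Categorize a quantitative value against reference limits.

    Assumption: values equal to a limit count as normal.
    """
    if value < lower:
        return "low"
    if value > upper:
        return "high"
    return "normal"


def shift_to_high(subjects: Iterable[Tuple[float, Sequence[float]]],
                  lower: float, upper: float) -> Tuple[int, int]:
    """Return (subjects shifting to high, subjects at risk).

    Each subject is (baseline value, post-baseline values). Assumption: a
    subject is at risk when baseline is not high and at least one
    post-baseline value exists; a shift is any post-baseline high value.
    """
    shifted = 0
    at_risk = 0
    for baseline, post_baseline in subjects:
        if classify(baseline, lower, upper) == "high" or not post_baseline:
            continue
        at_risk += 1
        if any(classify(v, lower, upper) == "high" for v in post_baseline):
            shifted += 1
    return shifted, at_risk


# Hypothetical analyte with reference limits 3.5 to 5.0 (arbitrary units).
example = [
    (4.0, [4.2, 5.3]),  # normal at baseline, high afterwards: a shift
    (4.8, [4.9, 4.7]),  # stays normal
    (5.4, [5.6]),       # already high at baseline: not at risk
    (3.2, []),          # no post-baseline value: not at risk
]
print(shift_to_high(example, lower=3.5, upper=5.0))  # (1, 2)
```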
For qualitative measurements, it is assumed observations would be identified as normal or abnormal. Providing a specific recommendation for the reference limits is out-of-scope for this version of the white paper. The specific choice of limit should be documented (protocol, Statistical Analysis Plan, study report methods section, etc.).

Reference limits for a laboratory analyte may vary across demographics. For example, reference limits for a laboratory analyte may be different for patients < 45 years old and patients ≥ 45 years old. We recommend using the reference limit that corresponds to the patient’s actual age at the time the laboratory measurement was taken instead of the patient’s age at study entry.

6.2.5. Above and Below Quantifiable Limits

Values above or below the quantifiable range (e.g., <0.0001) include critical information and should not be discarded. Such values can generally be categorized as low or high and their inclusion in outlier/shift summaries and analyses is recommended.

6.3. ECG Quantitative Measurements

Special considerations for “thorough QT/QTc studies” are considered out-of-scope for this white paper.

6.3.1. QT Correction Factors

As noted in the ICH QT/QTc guidance (Section IA; Background), because of its inverse relationship to heart rate, the measured QT interval is routinely corrected by means of various formulae to a less heart-rate-dependent value known as the QTc interval. Section IIIA of the same guidance provides a discussion of some of the various correction formulas and notes the controversy around appropriate corrections. Generally, we recommend that the TFLs include the corrected QT interval using Fridericia’s method (QTcF = QT/RR^{0.33}). We believe the regulatory and medical environments are ready to accept the exclusion of Bazett’s method from standard TFLs. We believe a second method would likely be warranted for a more complete evaluation. The second method could be one that is derived from a linear regression technique (Dmitrienko et al,
2005).

6.3.2. Reference Limits

As with laboratory reference limits, the choice for ECG reference limits can be controversial. Unlike laboratory analytes, clinically-derived limits are commonly used for ECG outlier/shift summaries and analyses. In addition, it’s common to include clinically-derived limits for both raw measures and change values for identifying patients of potential concern. Unfortunately, the specific clinically-derived thresholds that are used vary widely, hampering efforts to standardize analysis data across the industry.

For purposes of this white paper, it is assumed a reference limit is chosen that would identify raw values as low, normal, or high. Providing a specific recommendation for the reference limits for either raw measures or changes is out-of-scope for this version of the white paper. The specific choice of limits should be documented (protocol, Statistical Analysis Plan, study report methods section, etc.).

6.3.3. JT Interval

QTc is a biomarker with a long established history of being used to assess the duration of ventricular repolarization. However, QTc encompasses both ventricular depolarization and ventricular repolarization. The length of the QRS complex represents ventricular depolarization and the length of the JT interval, measured from the end of the QRS complex to the end of the T-wave, specifically represents ventricular repolarization. JT can be corrected for heart rate as with QT. Thus, when the QRS is prolonged (e.g., a complete bundle branch block), QTc should not be used to assess ventricular repolarization. The choice between QTc and JT as the basis for assessing potential changes in ventricular repolarization should be based on the expected proportion of patients with widened QRS complexes for any reason in that study. It is worth noting that this proportion increases with the age of the patient population and the extent to which the population is expected to suffer cardiac disease.
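The heart rate correction in Section 6.3.1 and the JT interval in Section 6.3.3 can be illustrated with a minimal sketch. It assumes QT and QRS are recorded in msec and RR is expressed in seconds, and it uses an exponent of exactly 1/3 for Fridericia's method. The function names are hypothetical, and the QRS cutoff of 120 msec is taken from the example SAP language in Section 9 for illustration only, not as a recommended threshold.

```python
from typing import Optional


def rr_seconds(heart_rate_bpm: float) -> float:
    """RR interval in seconds derived from heart rate in beats per minute."""
    return 60.0 / heart_rate_bpm


def qtcf(qt_msec: float, rr_sec: float) -> float:
    """Fridericia correction: QTcF = QT / RR^(1/3), with RR in seconds."""
    return qt_msec / rr_sec ** (1.0 / 3.0)


def jt_interval(qt_msec: float, qrs_msec: float) -> float:
    """JT interval (end of QRS to end of T-wave), computed as QT minus QRS."""
    return qt_msec - qrs_msec


def qtcf_for_analysis(qt_msec: float, rr_sec: float, qrs_msec: float,
                      qrs_cutoff_msec: float = 120.0) -> Optional[float]:
    """Return QTcF, or None (missing) when the QRS is prolonged.

    The cutoff is an illustrative assumption; the study's own
    documented limit should be used in practice.
    """
    if qrs_msec >= qrs_cutoff_msec:
        return None
    return qtcf(qt_msec, rr_sec)
```

With a heart rate of 60 bpm the RR interval is 1 second, so QTcF equals the measured QT. At faster heart rates RR is below 1 second and QTcF is larger than the measured QT.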
6.4. Vital Sign Measurements

6.4.1. Reference Limits

As with laboratory and ECG reference limits, the choice for vital sign reference limits can be controversial. Similar to ECG limits, clinically-derived limits are commonly used for vital sign outlier/shift summaries and analyses, but they vary widely. For purposes of this white paper, it is assumed a reference limit is chosen that would identify raw values as low, normal, or high. Providing a specific recommendation for the reference limits for either raw measures or changes is out-of-scope for this version of the white paper. The specific choice of limits should be documented (protocol, Statistical Analysis Plan, study report methods section, etc.).

7. Tables and Figures for Individual Studies

7.1. Recommended Displays

For quantitative laboratory analyte measurements, quantitative ECG measurements, and vital signs in which low and high limits are based on raw values without a change or percent change criterion, a 3-panel display that includes a scatterplot, shift table, and a shift to low/high table is recommended. See Figures 7.1 and 7.2. In the scatterplot portion, lines indicating the reference limits are included to ease the review of the plots. In cases where limits vary across demographic characteristics and/or laboratories, lines indicating the most common limit can be displayed, which is an especially good option if a relatively large percentage of the population under study belongs to a particular demographic. Alternatively, lines for the lowest of the high limits and the highest of the low limits can be displayed. Displaying lines for all limits can be considered but will likely be too confusing to the users of the display. Figure 7.1 is an example for assessing low values, and Figure 7.2 is an example for assessing high values. Two sets of visuals, one for low and one for high, are generally desired for each laboratory analyte, vital sign, and ECG measurement.
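The two options for choosing which reference lines to draw when limits vary across patients can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and each input list is assumed to hold one applicable limit per patient.

```python
from collections import Counter
from typing import List, Tuple


def most_common_limit(limits: List[float]) -> float:
    """Option 1: the limit that applies to the largest number of patients.

    Ties are resolved in favour of the limit encountered first.
    """
    return Counter(limits).most_common(1)[0][0]


def innermost_limits(low_limits: List[float],
                     high_limits: List[float]) -> Tuple[float, float]:
    """Option 2: the highest of the low limits and the lowest of the high limits."""
    return max(low_limits), min(high_limits)
```

For example, with low limits of 3.4 and 3.5 and high limits of 5.0 and 5.2, the second option draws lines at 3.5 and 5.0, so every point outside the drawn lines is outside at least one patient group's limits.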
The summary of shifts from normal/high to low includes patients whose minimum baseline value is normal or high. The summary of shifts from normal/low to high includes patients whose maximum baseline value is low or normal.

For quantitative laboratory analyte measurements, quantitative ECG measurements, and vital signs in which low and high limits are based on a specified change or percent change value or a combination of a specified value and a change or percent change, a 2-panel display that includes a scatterplot and a shift to low/high table is recommended. See Figures 7.3 and 7.4.

For laboratory analyte measurements collected qualitatively, a listing of abnormal findings is recommended (Table 7.1).

For the shift from normal/high to low and shift from normal/low to high summaries, a comparison of treatments using Fisher’s exact test can be included as reflected in Figures 7.1 through 7.4.

Figure 7.1 Scatter Plot and Shift Summary for Quantitative Safety Measures Assessing Low Value – Individual Study

Figure 7.2 Scatter Plot and Shift Summary for Quantitative Safety Measures Assessing High Value – Individual Study

Figure 7.3 Scatter Plot and Shift Summary for Quantitative Safety Measures Assessing Low Value with Change Criteria – Individual Study

Figure 7.4 Scatter Plot and Shift Summary for Quantitative Safety Measures Assessing High Value with Change Criteria – Individual Study

Table 7.1 Treatment-Emergent Abnormal Summary for Qualitative Safety Measures – Individual Study

Laboratory Test (unit)   Treatment   N     n (%)      P value*
Lab Test 1               T1          xxx   xx(xx.x)   .xxx
                         T2          xxx   xx(xx.x)   .xxx
                         PL          xxx   xx(xx.x)
Lab Test 2               T1          xxx   xx(xx.x)   .xxx
                         T2          xxx   xx(xx.x)   .xxx
                         PL          xxx   xx(xx.x)
…                        …
Lab Test n               T1          xxx   xx(xx.x)   .xxx
                         T2          xxx   xx(xx.x)   .xxx
                         PL          xxx   xx(xx.x)

Abbreviations: N = number of patients with a normal baseline and at least one post-baseline measure, n = number of
patients with abnormal post-baseline result.
* – P values are from Fisher’s Exact test, compared with PL.

7.2. Discussion

There are certainly multiple ways to display outlier/shift summaries. For quantitative laboratory analyte measurements, quantitative ECG measurements, and vital signs, we considered only displaying the scatterplots, only displaying the shift tables (See Table 12.1), and only displaying a shift to low/high table (See Table 12.2). We also considered a display that combined the boxplot (from the central tendency white paper) with a treatment-emergent table (See Figure 12.1).

We quickly discarded only displaying the scatterplots. With just a scatterplot, users of the plots will likely attempt to count and create percentages manually. We also quickly discarded only displaying the shift table (Table 12.1) for similar reasons. Users of the shift tables tend to count and create grouped percentages manually for those shifting to high from low/normal (or low from normal/high). Of note, shift tables become complex to create and difficult to interpret if the definition of an outlier/shift includes a specified change or percent change value. Thus, creating the shift table is not recommended in these cases.

We strongly considered only displaying shifts to low/high with all analytes on the table (Table 12.2). This table has the advantage of being succinct and still reflecting the data that tends to be the most useful for signal detection. However, feedback from the medical community has indicated a desire to sort information by analyte as opposed to by analytical method. Thus, there’s a preference to have the ability to see the central tendency summary followed by the outlier/shift summary by analyte. Table 12.2 is not suited for this type of presentation. Thus, we strongly considered the display that has the boxplot and shift to low/high summary on the same page (Figure 12.1).
This has the advantage of being succinct (one page for each analyte) and is organized by analyte, consistent with medical preferences. However, the 3-panel display is recommended since we believe the additional information provided in the scatterplot is generally worth the extra page per analyte. Researchers can visually see the extent of the shift (how high or how low relative to baseline measurements) and can visually see if there’s a clustering by treatment. The shift table portion is perhaps of less value, but it is quite popular across current practices. Thus, it will likely help researchers who are used to shift tables have what they are used to seeing while also seeing other useful displays (when limits are based on specified values without change criterion). The information can still be sorted by analyte in the clinical study report (likely in the appendix): the boxplot followed by the 3-panel outlier/shift display, sorted by analyte. This also makes it easy to bring the analytes that end up being interesting into the body of the clinical study report. If a table such as Table 12.2 is created, a manually created summary or a new table would be required to discuss the analyte of interest in the body of the clinical study report.

For laboratory analyte measurements collected qualitatively, a shift from normal to abnormal table was considered (Table 12.3). In most cases, we believe the listing of abnormal findings is sufficient. For any analyte that is part of a topic of special interest, a shift from normal to abnormal table will likely be of interest. As noted in Section 6.2.2, it would be important to understand data collection to properly create the table.

We also considered another display for summarizing information in a succinct manner within the body of a clinical study report. See Figure 12.2. This display has the advantage of allowing users to browse quickly through all the analytes that were analyzed, sorted by decreasing odds ratios.
However, given the medical feedback to present information sorted by analyte as opposed to analytical method, we prefer bringing the boxplot and 3-panel outlier/shift figure into the body of the clinical study report for the analytes of interest (with all the displays in the appendix), discussed by analyte.

8. Tables and Figures for Integrated Summaries

8.1. Recommended Displays

For quantitative laboratory analyte measurements, quantitative ECG measurements, and vital signs, a display that includes scatterplots by study and a shift to low/high table is recommended for summaries across studies. See Figure 8.1 (for low values) and Figure 8.2 (for high values). For cut-off criteria that include a change value, an additional reference line is added to the scatterplots in Figure 8.1 and Figure 8.2 (see Figures 7.3 and 7.4).

For laboratory analyte measurements collected qualitatively, the same listing of abnormal findings recommended for individual studies (Table 7.1) is recommended for integrated summaries, or a shift from normal to abnormal table can be considered (Table 8.1).

Figure 8.1 Scatter Plot and Shift Summary for Quantitative Safety Measures for Low Value – Integrated Database

Figure 8.2 Scatter Plot and Shift Summary for Quantitative Safety Measures for High Value – Integrated Database

Table 8.1 Treatment-Emergent Abnormal Summary for Qualitative Safety Measures – Integrated Database

Laboratory Test (unit)   Treatment   N     n (%)      OR*a    Heterogeneity P value*b   P value*c
Lab Test 1               A           xxx   xx(xx.x)   xx.xx   .xxx                      .xxx
                         B           xxx   xx(xx.x)
Lab Test 2               A           xxx   xx(xx.x)   xx.xx   .xxx                      .xxx
                         B           xxx   xx(xx.x)
…                        …
Lab Test n               A           xxx   xx(xx.x)   xx.xx   .xxx                      .xxx
                         B           xxx   xx(xx.x)

Abbreviations: N = number of patients with a normal baseline and at least one post-baseline measure, n = number of patients with abnormal post-baseline result.
OR = Mantel-Haenszel odds ratio; add more as needed (alphabetically).
*a – Mantel-Haenszel Odds Ratio stratified by study. Treatment B is numerator, treatment A is denominator.
*b – Heterogeneity of odds ratios across studies was assessed using the Breslow-Day test.
*c – P values are from Cochran-Mantel-Haenszel (CMH) test of general association stratified by study.

8.2. Discussion

For quantitative laboratory analyte measurements, quantitative ECG measurements, and vital signs, utilizing the same display recommended for individual studies was considered (3-panel display with a single scatterplot, shift table, and shift to low/high table) with meta-analytical methods added. However, due to concerns with potential paradoxes (can we reference our book chapter?) when combining data from multiple studies, a single scatterplot (with studies combined) and a single shift table (with studies combined) were discarded. Instead, a scatterplot by study is recommended (unless the number of studies prohibits the use of such a display). A shift table by study is not recommended due to space limitations, but would be available in the study reports of the individual studies. The percentages provided in the shift to low/high table are subject to similar potential paradoxes; however, they can be reviewed in context with the Mantel-Haenszel odds ratio (which does account for study). Users of the figure would need to be educated to look for situations when the odds ratio appears inconsistent with the presented percentages. In such cases, the odds ratio would reflect the data more appropriately and understanding by-study results would be important. Utilizing methods to provide “adjusted cumulative proportions” suggested by Chuang-Stein et al (2011) might also be useful, but is considered out-of-scope for this version of the white paper.

Another display that was considered was a shift to low/high table with a corresponding forest plot that shows incidence differences by study (Figure 12.4).
This display has the advantage of being practical even when many studies are included in a summary. However, when the number of studies is small enough (e.g., 6 or fewer) the scatterplot is recommended as it provides insight into patient level information by individual study that is often very valuable for users of the figure. When the number of studies is large (e.g., >6), Figure 12.4 can be considered.

A simple shift to low/high table was also considered (Table 12.2). Again, we would strongly recommend the display shown in Figure 8.1 over this display for integrated analysis when the number of studies is small enough, for the same reason stated above. When the number of studies is large (e.g., >6), Table 12.2 can be considered.

9. Example SAP Language

9.1. Individual Study

For quantitative laboratory analyte measurements, 3-panel displays that include a scatterplot, shift table, and a shift to high/low table will be created. Specifically, for each measurement, both a 3-panel display assessing low values and a 3-panel display assessing high values will be created. In the 3-panel display to assess low values, the scatterplot will plot the minimum value during the baseline period versus the minimum value during the treatment period. Lines indicating the reference limits are included. In cases where limits vary across demographic characteristics, lines indicating the most common limit will be displayed. The shift table will include the number and percentage of patients within each baseline category (minimum value is low, normal, high, or missing) versus each treatment category (minimum value is low, normal, or high) by treatment. Patients with at least one result in the treatment period will be included in the shift table. The shift from normal or high to low table will include the number and percentage of patients by treatment whose minimum baseline result is normal or high and whose minimum treatment result is low.
Patients whose minimum baseline result is normal or high and have at least one result during the treatment period are included. The Fisher’s exact test will be used to compare percentages of patients who shift from normal or high to low between treatments. The 3-panel display to assess high values will be created similarly. The scatterplot will plot the maximum value during the baseline period versus the maximum value during the treatment period. The shift table will include the number and percentage of patients within each baseline category (maximum value is low, normal, high, or missing) versus each treatment category (maximum value is low, normal, or high) by treatment. The shift from normal or low to high table will include the number and percentage of patients by treatment whose maximum baseline result is normal or low and whose maximum treatment result is high. Patients whose maximum baseline result is normal or low and have at least one result during the treatment period are included. For laboratory analyte measurements collected qualitatively, a listing of abnormal findings will be created. The listing will include patient ID, treatment group, laboratory collection date, analyte name, analyte finding. For quantitative ECG measurements and vital signs with limits defined using a specified value without a change criterion, 3-panel displays will be created as described above. For quantitative ECG measurements and vital signs with limits defined using a specified value and a change criterion, 2-panel displays will be created. The 2-panel display will include the scatterplot and the shift to low/high table. To assess increases, change from the maximum value during the baseline period to the maximum value during the treatment period will be used. To assess decreases, change from the minimum value during the baseline period to the minimum value during the treatment period will be used. 
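The shift from normal or high to low summary and the treatment comparison described above can be sketched in a few lines. This is a minimal illustration, not a validated implementation: the function names are hypothetical, values strictly below the low limit are assumed to be "low", and the two-sided Fisher's exact p-value is computed directly from the hypergeometric distribution by summing the probabilities of all tables no more likely than the observed one.

```python
from math import comb
from typing import List, Optional


def shifted_to_low(baseline: List[float], treatment: List[float],
                   low_limit: float) -> Optional[bool]:
    """Classify one patient for the shift from normal/high to low summary.

    Returns None when the patient is not in the denominator (no baseline
    result, no treatment-period result, or a minimum baseline value that
    is already low); otherwise True if the minimum treatment value is low.
    """
    if not baseline or not treatment:
        return None
    if min(baseline) < low_limit:
        return None
    return min(treatment) < low_limit


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are treatments; columns are shifted / not shifted.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x shifted patients in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)
```

The display to assess high values follows the same pattern with the maximum values and the high limit in place of the minimum values and the low limit.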
Laboratory tests include all planned analytes as defined in the protocol, excluding those collected in a reflex manner (only collected under certain circumstances). Alanine aminotransferase (ALT), aspartate aminotransferase (AST), and total bilirubin will not be included in this analysis as they will be analyzed as described in the hepatotoxicity section.

Vital signs include systolic blood pressure, diastolic blood pressure, pulse, and temperature. Physical characteristics include weight and BMI.

ECG parameters include heart rate, PR, QRS, QT, corrected QT using Fridericia’s correction factor (QTcF = QT/RR^{0.333}), and corrected QT using a large clinical trial population based correction factor (QTcLCTPB = QT/RR^{0.413}; Dmitrienko AA, Sides GD, Winters KJ, Kovacs RJ, Rebhun DM, Bloom JC, Groh W, Eisenberg PR. Electrocardiogram reference ranges derived from a standardized clinical trial population. DRUG INF J 39:395-405; 2005). When the QRS is prolonged (for example, a complete bundle branch block), QT and QTc should not be used to assess ventricular repolarization. Thus, for a particular ECG, the following will be set to missing (for analysis purposes) when QRS is ≥120 msec: QT, QTcF and QTcLCTPB.

Large clinical trial population based reference limits will be used to define the low and high limits for laboratory analyte measurements (Reference x or Attachment x – not shown in this example). Reference limits for ECGs and vital signs are defined in Tables 9.1 and 9.2, respectively.
Table 9.1 Selected Categorical Limits for ECG Data

Limits are given as Age (yrs): limit.

Heart Rate (bpm)
  Low, Males:    ≥18: <50 and decrease ≥15
  Low, Females:  ≥18: <50 and decrease ≥15
  High, Males:   ≥18: >100 and increase ≥15
  High, Females: ≥18: >100 and increase ≥15
PR Interval (msec)
  Low, Males:    All ages: <120
  Low, Females:  All ages: <120
  High, Males:   All ages: ≥220
  High, Females: All ages: ≥220
QRS Interval (msec)
  Low, Males:    All ages: <60
  Low, Females:  All ages: <60
  High, Males:   All ages: ≥120
  High, Females: All ages: ≥120
QTcF (msec)
  Low, Males:    All ages: <330
  Low, Females:  All ages: <340
  High, Males:   ≥16: >450
  High, Females: ≥16: >470
QTcLCTPB (msec)1
  Low, Males:    All ages: <330
  Low, Females:  All ages: <340
  High, Males:   <18: >444; 18-25: >449; 26-35: >438; 36-45: >446; 46-55: >452; 56-65: >448; >65: >460
  High, Females: <18: >445; 18-25: >455; 26-35: >455; 36-45: >459; 46-55: >464; 56-65: >469; >65: >465

NA=Not applicable
1. Dmitrienko AA, Sides GD, Winters KJ, Kovacs RJ, Rebhun DM, Bloom JC, Groh W, Eisenberg PR. Electrocardiogram reference ranges derived from a standardized clinical trial population. DRUG INF J 39:395-405; 2005

Table 9.2 Categorical Criteria for Abnormal Treatment-Emergent Blood Pressure and Pulse Measurement, and Categorical Criteria for Weight and Temperature Changes for Adults

Systolic BP (mm Hg) (Supine or sitting – forearm at heart level)
  Low:  ≤ 90 and decrease from baseline ≥ 20
  High: ≥ 140 and increase from baseline ≥ 20
Diastolic BP (mm Hg) (Supine or sitting – forearm at heart level)
  Low:  ≤ 50 and decrease from baseline ≥ 10
  High: ≥ 90 and increase from baseline ≥ 10
Pulse (bpm) (Supine or sitting)
  Low:  < 50 and decrease from baseline ≥ 15
  High: > 100 and increase from baseline ≥ 15
Temperature
  Low:  < 96 degrees F and decrease ≥ 2 degrees F
  High: ≥ 101 degrees F and increase ≥ 2 degrees F

9.2. Integrated Summary

For quantitative laboratory analyte measurements, quantitative ECG measurements, and vital signs, a display that includes scatterplots by study and a shift to low/high table will be created.
Specifically, for each measurement, both a 2-panel display assessing low values and a 2-panel display assessing high values will be created. In the 2-panel display to assess low values, the scatterplots will plot the minimum value during the baseline period versus the minimum value during the treatment period for each study. Lines indicating the reference limits are included. For cut-off criteria including a change value, an additional reference line will be added to the scatterplots. In cases where limits vary across demographic characteristics, lines indicating the most common limit will be displayed. The shift from normal or high to low table will include the number and percentage of patients by treatment whose minimum baseline result is normal or high and whose minimum treatment result is low. Patients whose minimum baseline result is normal or high and have at least one result during the treatment period are included. The Cochran-Mantel-Haenszel test stratified by study will be used to compare percentages of patients who shift from normal or high to low between treatments. The Mantel-Haenszel odds ratio and Breslow-Day test for heterogeneity will also be provided. The 2-panel display to assess high values will be created similarly. The scatterplots will plot the maximum value during the baseline period versus the maximum value during the treatment period for each study. The shift from normal or low to high table will include the number and percentage of patients by treatment whose maximum baseline result is normal or low and whose maximum treatment result is high. Patients whose maximum baseline result is normal or low and have at least one result during the treatment period are included. For quantitative ECG measurements and vital signs with limits defined using a specified value and a change criterion, change from the maximum value during the baseline period to the maximum value during the treatment period will be used to assess increases. 
Change from the minimum value during the baseline period to the minimum value during the treatment period will be used to assess decreases.

For laboratory analyte measurements collected qualitatively, a listing of abnormal findings will be created. The listing will include patient ID, treatment group, laboratory collection date, analyte name, and analyte finding.

Laboratory tests include all planned analytes as defined in the protocol, excluding those collected in a reflex manner (only collected under certain circumstances). Alanine aminotransferase (ALT), aspartate aminotransferase (AST), and total bilirubin will not be included in this analysis as they will be analyzed as described in the hepatotoxicity section.

Vital signs include systolic blood pressure, diastolic blood pressure, pulse, and temperature. Physical characteristics include weight and BMI.

ECG parameters include heart rate, PR, QRS, QT, corrected QT using Fridericia’s correction factor (QTcF = QT/RR^{0.333}), and corrected QT using a large clinical trial population based correction factor (QTcLCTPB = QT/RR^{0.413}; Dmitrienko AA, Sides GD, Winters KJ, Kovacs RJ, Rebhun DM, Bloom JC, Groh W, Eisenberg PR. Electrocardiogram reference ranges derived from a standardized clinical trial population. DRUG INF J 39:395-405; 2005). When the QRS is prolonged (for example, a complete bundle branch block), QT and QTc should not be used to assess ventricular repolarization. Thus, for a particular ECG, the following will be set to missing (for analysis purposes) when QRS is ≥120 msec: QT, QTcF and QTcLCTPB.

Large clinical trial population based reference limits will be used to define the low and high limits for laboratory analyte measurements (Reference x or Attachment x – not shown in this example). Reference limits for ECGs and vital signs are defined in Tables 9.1 and 9.2, respectively.
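The Mantel-Haenszel odds ratio stratified by study, as named in the integrated summary language above, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the tuple layout are hypothetical, each study contributes one 2x2 table, and treatment B is the numerator as in the footnote to Table 8.1. The Cochran-Mantel-Haenszel test and the Breslow-Day test are not implemented here.

```python
from typing import Iterable, Tuple

# One stratum (study): (B shifted, B not shifted, A shifted, A not shifted)
Stratum = Tuple[int, int, int, int]


def mantel_haenszel_odds_ratio(strata: Iterable[Stratum]) -> float:
    """Pooled odds ratio: sum(a*d/n) / sum(b*c/n) over studies.

    Treatment B is the numerator and treatment A the denominator.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty study contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0.0:
        raise ValueError("odds ratio is undefined: no stratum has b*c > 0")
    return numerator / denominator
```

Because each study contributes its own within-study comparison, the pooled estimate accounts for study, unlike the crude percentages obtained by simply adding patients across studies.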
10. [To be further developed] References

Amit O, Heiberger RM, and Lane PW. Graphical approaches to the analysis of safety data from clinical trials. Pharmaceut. Statist. 2008; 7: 20–35. doi: 10.1002/pst.254.
Crowe BJ, Xia A, Berlin JA, Watson DJ, Shi H, Lin SL, et. al.
Recommendations for safety planning, data collection, evaluation and reporting during drug, biologic and vaccine development: a report of the safety planning, evaluation, and reporting team. Clinical Trials 2009; 6: 430-440.
Fraser CG. Biological Variation: From Principles to Practice. Washington, DC: AACC Press, 2001, 151 pp.
Dmitrienko AA, Sides GD, Winters KJ, Kovacs RJ, Rebhun DM, Bloom JC, Groh W, Eisenberg PR. Electrocardiogram reference ranges derived from a standardized clinical trial population. Drug Inf J 2005; 39:395-405.
McGill R, Tukey JW, and Larsen WA. Variations of Box Plots. The American Statistician 1978; 32(1): 12-16. doi:10.2307/2683468. JSTOR 2683468.
Rosario LA, Kropp TJ, Wilson SE, Cooper CK. Join FDA/PhUSE Working Groups to help harness the power of computational science. Drug Information Journal 2012; 46: 523-524.
1) Solberg HE and Grasbeck R. Reference values. Adv Clin Chem 27:1-79; 1989.

Keep if used (related to reference limits):
1) CLSI. DEFINING, ESTABLISHING, AND VERIFYING REFERENCE INTERVALS IN THE CLINICAL LABORATORY; APPROVED GUIDELINE – THIRD EDITION. CLSI document C28A3c. Wayne, PA: Clinical and Laboratory Standards Institute; 2008.
2) Copeland KT, Checkoway H, McMichael AJ, Holbrook RH. Bias due to misclassification in the estimation of relative risk. AM J EPIDEMIOL 105:488-495; 1977.
3) Dixon WJ. Processing Data for Outliers. BIOMETRICS 9:74-89; 1953.
4) Horn PS, Pesce AJ. REFERENCE INTERVALS: A USER’S GUIDE. Washington, DC: AACC Press; 2005.
5) O’Neil RT. Assessment of safety. Chapter 13 in Peace KE (Ed.) BIOPHARMACEUTICAL STATISTICS FOR DRUG DEVELOPMENT. New York: Marcel Dekker; 1988.
6) Quade D, Lachenbruch PA, Whaley FS, McClish DK, Haley RW. Effects of misclassifications on statistical inferences in epidemiology. AM J EPIDEMIOL 111:503-515; 1980.
7) Reed AH, Henry RJ, Mason WB. Influence of the statistical method used on the resulting estimate of normal range. CLIN CHEM 17:275-284; 1971.
8) Thompson WL, Brunelle RL, Enas GG, Simpson PJ. Routine laboratory tests in clinical trials: interpretation of results. J Clin Research and Drug Dev 1:95-119; 1987.
9) Tukey J. Exploratory Data Analysis. Reading, MA: Addison-Wesley; 1977.

Chuang-Stein C, Beltangady M. Reporting cumulative proportion of subjects with an adverse event based on data from multiple studies. Pharmaceut Statist 2011; 10(1): 3-7.
The FDA/Industry/Academia Safety Graphics Working Group [reference to be added]

11. Acknowledgements

The key contributors include: xxxx.

Additional contributors and members of the white paper project within the PhUSE Development of Standard Scripts for Analysis and Programming Working Group include:

Acknowledgement to others who provided text for various sections, review comments, and/or participated in discussions related to methodology:

12. Appendix

Figure 12.1 Summary for Quantitative Safety Measures – Individual Study

Figure 12.2 Summary of Common Treatment Emergent Abnormal for Quantitative Safety Measures – Individual Study

Table 12.1 Shift from Normal/high to Low and from Normal/low to High for Laboratory Measures

Laboratory Tests
Shift from Normal/high to Low and from Normal/low to High

Laboratory Test   Abnormality Direction   Treatment   N     n (%)      P value*
Lab Test 1        Low                     T1          xxx   xx(xx.x)   .xxx
                                          T2          xxx   xx(xx.x)   .xxx
                                          PL          xxx   xx(xx.x)
                  High                    T1          xxx   xx(xx.x)   .xxx
                                          T2          xxx   xx(xx.x)   .xxx
                                          PL          xxx   xx(xx.x)
Lab Test 2        Low                     T1          xxx   xx(xx.x)   .xxx
                                          T2          xxx   xx(xx.x)   .xxx
                                          PL          xxx   xx(xx.x)
                  High                    T1          xxx   xx(xx.x)   .xxx
                                          T2          xxx   xx(xx.x)   .xxx
                                          PL          xxx   xx(xx.x)
…                 …                       …           …     …
Lab Test n        Low                     T1          xxx   xx(xx.x)   .xxx
                                          T2          xxx   xx(xx.x)   .xxx
                                          PL          xxx   xx(xx.x)
                  High                    T1          xxx   xx(xx.x)   .xxx
                                          T2          xxx   xx(xx.x)   .xxx
                                          PL          xxx   xx(xx.x)

Abbreviations: N = number of patients with a normal (i.e., not low if calculating ‘low’ and not high if calculating ‘high’) baseline and at least one post-baseline measure; n = number of patients with an abnormal post-baseline result in the specified category.
*P values are from Fisher’s Exact test, compared with PL.

Table 12.2 Shift from Normal/high to Low and from Normal/low to High – Integrated Database

Laboratory Test (unit)   Direction   Treatment   N     n (%)      OR*a    Heterogeneity P value*b   P value*c
Lab Test 1               High        A           xxx   xx(xx.x)   xx.xx   .xxx                      .xxx
                                     B           xxx   xx(xx.x)
                         Low         A           xxx   xx(xx.x)   xx.xx   .xxx                      .xxx
                                     B           xxx   xx(xx.x)
…                        …           …
Lab Test n               High        A           xxx   xx(xx.x)   xx.xx   .xxx                      .xxx
                                     B           xxx   xx(xx.x)
                         …

Abbreviations: N = number of patients with a normal baseline and at least one post-baseline measure; n = number of patients with abnormal post-baseline result; OR = Mantel-Haenszel odds ratio; add more as needed (alphabetically).
*a – Mantel-Haenszel odds ratio stratified by study. Treatment B is numerator, treatment A is denominator.
*b – Heterogeneity of odds ratios across studies was assessed using the Breslow-Day test.
*c – P values are from Cochran-Mantel-Haenszel (CMH) test of general association stratified by study.
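As a rough illustration of footnote *a in the integrated-database table above, the Mantel-Haenszel odds ratio stratified by study can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the working group's script: the `StudyCounts` class, its field names, and the example counts are hypothetical, and the Breslow-Day and CMH P values shown in the table are not computed here.

```python
from dataclasses import dataclass


@dataclass
class StudyCounts:
    """Hypothetical per-study counts among patients with a normal baseline."""
    b_abnormal: int  # treatment B: n, patients with an abnormal post-baseline result
    b_total: int     # treatment B: N
    a_abnormal: int  # treatment A: n
    a_total: int     # treatment A: N


def mantel_haenszel_or(studies):
    """Mantel-Haenszel odds ratio across studies (B is numerator, A is denominator)."""
    num = 0.0
    den = 0.0
    for s in studies:
        b_normal = s.b_total - s.b_abnormal
        a_normal = s.a_total - s.a_abnormal
        total = s.b_total + s.a_total
        if total == 0:
            continue  # an empty study contributes nothing to either sum
        num += s.b_abnormal * a_normal / total
        den += b_normal * s.a_abnormal / total
    if den == 0:
        raise ValueError("Mantel-Haenszel odds ratio is undefined (zero denominator)")
    return num / den


# Illustrative (made-up) counts for two studies.
studies = [StudyCounts(10, 100, 5, 100), StudyCounts(8, 50, 4, 50)]
print(round(mantel_haenszel_or(studies), 2))
```

With a single study the function reduces to the ordinary 2x2 odds ratio, which is a convenient sanity check when validating output against a statistical package.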
Table 12.2 Shift Table Analyses

Lab Test Name
Shifts from Last Baseline to Last Post-Baseline Result

                                   Post-Baseline Result
             Baseline    Low        Normal     High       Total
Treatment    Result      n (%)      n (%)      n (%)      n (%)
T1           Low         xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
(N = xxx)    Normal      xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
             High        xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
             Missing     xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
             Total       xx(xx.x)   xx(xx.x)   xx(xx.x)
T2           Low         xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
(N = xxx)    Normal      xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
             High        xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
             Missing     xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
             Total       xx(xx.x)   xx(xx.x)   xx(xx.x)
PL           Low         xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
(N = xxx)    Normal      xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
             High        xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
             Missing     xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
             Total       xx(xx.x)   xx(xx.x)   xx(xx.x)

             Decreased   Same       Increased
Treatment    n (%)       n (%)      n (%)       P value*
T1           xx(xx.x)    xx(xx.x)   xx(xx.x)    .xxx
T2           xx(xx.x)    xx(xx.x)   xx(xx.x)    .xxx
PL           xx(xx.x)    xx(xx.x)   xx(xx.x)

Abbreviations: N = number of patients with a baseline and post-baseline result; n = number of patients in category; add more as needed (alphabetically).
*P values are from likelihood-ratio chi-square test, compared with PL.

Figure 12.4 Scatter Plot and Shift Summary for Quantitative Safety Measures – Integrated Database

Table 12.3 Shift from Normal to Abnormal Summary for Qualitative Safety Measures – Individual Study

Laboratory Test   Treatment   N     n (%)      P value*
Lab Test 1        T1          xxx   xx(xx.x)   .xxx
                  T2          xxx   xx(xx.x)   .xxx
                  PL          xxx   xx(xx.x)
Lab Test 2        T1          xxx   xx(xx.x)   .xxx
                  T2          xxx   xx(xx.x)   .xxx
                  PL          xxx   xx(xx.x)
…                 …           …     …
Lab Test n        T1          xxx   xx(xx.x)   .xxx
                  T2          xxx   xx(xx.x)   .xxx
                  PL          xxx   xx(xx.x)

Abbreviations: N = number of patients with a normal baseline and at least one post-baseline measure; n = number of patients with an abnormal post-baseline result.
*P values are from Fisher’s Exact test, compared with PL.
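The P values in Table 12.3 compare each treatment with PL using Fisher's Exact test on the n and N counts. A minimal Python sketch of that comparison follows, using only the standard library. It assumes a two-sided P value obtained by summing the probabilities of all 2x2 tables that are no more likely than the observed one; the white paper does not say which two-sided definition is intended, and the function name and example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(n_trt, N_trt, n_pl, N_pl):
    """Two-sided Fisher's exact P value for n abnormal out of N in each of two arms."""
    total_abnormal = n_trt + n_pl
    total = N_trt + N_pl
    denom = comb(total, total_abnormal)

    def prob(k):
        # Hypergeometric probability that k of the abnormal results fall in the treated arm.
        return comb(N_trt, k) * comb(N_pl, total_abnormal - k) / denom

    lo = max(0, total_abnormal - N_pl)   # smallest feasible treated-arm count
    hi = min(N_trt, total_abnormal)      # largest feasible treated-arm count
    p_obs = prob(n_trt)
    # Sum over tables at least as extreme (no more probable) as the observed table;
    # the small relative tolerance guards against floating-point ties.
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Illustrative (made-up) counts: 3 of 4 abnormal on treatment, 1 of 4 on PL.
print(round(fisher_exact_two_sided(3, 4, 1, 4), 4))
```

For production tables a validated statistical package would normally be used; the sketch is only meant to make the footnote's calculation concrete.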