Version 1.0 Draft 3
1.
Analyses and Displays Associated with Outliers or Shifts
from Normal to Abnormal – Focus on Vital Sign,
Electrocardiogram, and Laboratory Analyte
Measurements in Phase 2-4 Clinical Trials and
Integrated Summary Documents
Version 1.0
Created xx XXXX 201x
A White Paper by the PhUSE Computational Science Development of Standard Scripts for Analysis
and Programming Working Group
Disclaimer: The opinions expressed in this document are those of the authors and do not necessarily
represent the opinions of PhUSE, members' respective companies or organizations, or regulatory
authorities. The content in this document should not be interpreted as a data standard and/or
information required by regulatory authorities.
Note to reviewers: This is the 3rd draft sent for broad review and likely the last round. Please review
all sections. Thanks!
2. Table of Contents
1. Analyses and Displays Associated with Outliers or Shifts from Normal to Abnormal – Focus on Vital Sign, Electrocardiogram, and Laboratory Analyte Measurements in Phase 2-4 Clinical Trials and Integrated Summary Documents
2. Table of Contents
3. Revision History
4. Purpose
5. Introduction
6. General Considerations
   6.1. All Measurement Types
      6.1.1. P-values and Confidence Intervals
      6.1.2. Importance of Visual Displays
      6.1.3. Conservativeness
      6.1.4. Measurements After Stopping Study Medication
      6.1.5. Measurements at a Discontinuation Visit
      6.1.6. Measurements Collected in Reflex Manner
      6.1.7. Screening Measurements versus Special Topics
      6.1.8. Number of Therapy Groups
      6.1.9. Multi-phase Clinical Trials
      6.1.10. Integrated Analyses
   6.2. Laboratory Analyte Measurements
      6.2.1. Planned versus Unplanned Measurements
      6.2.2. Analytes Collected Qualitatively
      6.2.3. Central Versus Local Laboratories
      6.2.4. Reference Limits
      6.2.5. Above and Below Quantifiable Limits
   6.3. ECG Quantitative Measurements
      6.3.1. QT Correction Factors
      6.3.2. Reference Limits
      6.3.3. JT Interval
   6.4. Vital Sign Measurements
      6.4.1. Reference Limits
7. Tables and Figures for Individual Studies
   7.1. Recommended Displays
   7.2. Discussion
8. Tables and Figures for Integrated Summaries
   8.1. Recommended Displays
   8.2. Discussion
9. Example SAP Language
   9.1. Individual Study
   9.2. Integrated Summary
10. [To be further developed] References
11. Acknowledgements
12. Appendix
3. Revision History
Version 1.0 was finalized xx XXXX 201x.
4. Purpose
The purpose of this white paper is to provide advice on displaying, summarizing, and/or analyzing measures of
outliers or shifts, with a focus on vital signs, electrocardiogram (ECG) quantitative findings, and laboratory
analyte measurements in Phase 2-4 clinical trials and integrated submission documents. This white paper also
provides advice on collection if a particular recommended display requires data to be collected in a certain
manner that may differ from current practice. The intent is to begin the process of developing industry standards
with respect to analysis and reporting for measurements that are common across clinical trials and across
therapeutic areas. In particular, this white paper provides recommended tables, figures, and listings for
measures of outliers or shifts for a common set of safety measurements. Separate white papers address other
types of data or analytical approaches (e.g., central tendency).
This advice can be used when developing the analysis plan for individual clinical trials, integrated summary
documents, or other documents in which measures of outliers or shifts are of interest. Although the focus of
this white paper pertains to specific safety measurements (vital signs, ECG quantitative findings, and laboratory
analyte measurements), some of the content may apply to other measurements (e.g., different safety
measurements and efficacy assessments). Similarly, although the focus of this white paper pertains to Phase 2-4, some of the content may apply to Phase 1 or other types of medical research (e.g., observational studies).
Development of standard Tables, Figures, and Listings (TFLs) and associated analyses will lead to improved
standardization from collection through data storage. (You need to know how you want to analyze and report
results before finalizing how to collect and store data.) The development of standard TFLs will also lead to
improved product lifecycle management by ensuring reviewers receive the desired analyses for the consistent
and efficient evaluation of patient safety and drug effectiveness. Although having standard TFLs is an ultimate
goal, this white paper reflects recommendations only and should not be interpreted as “required” by any
regulatory agency.
Detailed specifications for TFL or dataset development are considered out-of-scope for this white paper.
However, the hope is that specifications and code (utilizing SDTM and ADaM data structures) will be
developed consistent with the concepts outlined in this white paper, and placed in the publicly available PhUSE
Standard Scripts Repository.
5. Introduction
Industry standards have evolved over time for data collection (CDASH), observed data (SDTM), and analysis
datasets (ADaM). There is now recognition that the next step would be to develop standard TFLs for common
measurements across clinical trials and across therapeutic areas. Some could argue that perhaps the industry
should have started with creating standard TFLs prior to creating standards for collection and data storage
(consistent with end-in-mind philosophy); however, having industry standards for data collection and analysis
datasets provides a good basis for creating standard TFLs.
The beginning of the effort leading to this white paper came from the PhUSE Computational Science
Collaboration, an initiative between PhUSE, FDA, and Industry where key priorities were identified to tackle
various challenges using collaboration, crowd sourcing, and innovation (Rosario et al., 2012). Several
Computational Science (CS) working groups were created to address a number of these challenges.
The working group titled “Development of Standard Scripts for Analysis and Programming” has led the
development of this white paper, along with the development of a platform for storing shared code. Most
contributors and reviewers of this white paper are industry statisticians, with input from non-industry
statisticians (e.g., FDA and academia) and industry and non-industry clinicians. Hopefully additional input
(e.g., other regulatory agencies) will be received for future versions of this white paper.
There are several existing documents that contain suggested TFLs for common measurements. However, many
of the documents are now relatively outdated, and generally lack sufficient detail to be used as support for the
entire standardization effort. Nevertheless, these documents were used as a starting point in the development of
this white paper. The documents include:
• ICH E3: Structure and Content of Clinical Study Reports
• Guideline for Industry: Structure and Content of Clinical Study Reports
• Guidance for Industry: Premarketing Risk Assessment
• Reviewer Guidance: Conducting a Clinical Safety Review of a New Product Application and Preparing a Report on the Review
• ICH M4E: Common Technical Document for the Registration of Pharmaceuticals for Human Use - Efficacy
• ICH E14: The Clinical Evaluation of QT/QTc Interval Prolongation and Proarrhythmic Potential for Non-Antiarrhythmic Drugs
• Guidance for Industry: ICH E14 Clinical Evaluation of QT/QTc Interval Prolongation and Proarrhythmic Potential for Non-Antiarrhythmic Drugs
The Reviewer Guidance is considered a key document. As discussed in the guidance, there is generally an
expectation that analyses of outliers or shifts are conducted for vital signs, ECG quantitative findings, and
laboratory analyte measurements. The guidance recognizes value to both analyses of central tendency and
analyses of outliers or shifts from within reference limits to outside reference limits (below lower reference
limit or above upper reference limit). We assume both will be conducted for safety signal detection. This white
paper covers the outliers or shifts portion, with the expectation that an additional TFL or TFLs will also be
created with a focus on central tendency (see the CSS white paper pertaining to central tendency).
6. General Considerations
This section contains some general considerations for the plan of analyses and displays associated with outliers
or shifts from normal to abnormal for laboratory analyte measurements, vital signs and ECG quantitative
measurements. Section 6.1 discusses general considerations for all three safety domains. Section 6.2
discusses considerations specific to laboratory analyte measurements. Section 6.3 discusses considerations
specific to ECG quantitative measurements. Section 6.4 discusses considerations specific to vital signs.
6.1. All Measurement Types
6.1.1. P-values and Confidence Intervals
There has been ongoing debate on the value or lack of value of the inclusion of p-values and/or confidence
intervals in safety assessments (Crowe et al., 2009). This white paper does not attempt to resolve this debate.
As noted in the Reviewer Guidance, p-values or confidence intervals can provide some evidence of the strength
of the finding, but unless the trials are designed for hypothesis testing, these should be thought of as descriptive.
Throughout this white paper, p-values and measures of spread are included in several places. Where these are
included, they should not be considered as hypothesis testing. If a company or compound team decides that
these are not helpful as a tool for reviewing the data, they can be excluded from the display.
Some teams may find p-values and/or confidence intervals useful to facilitate focus, but have concerns that lack
of “statistical significance” provides unwarranted dismissal of a potential signal. Conversely, there are concerns
that due to multiplicity issues, there could be over-interpretation of p-values adding potential concern for too
many outcomes. Similarly, there are concerns that the lower- or upper-bound of confidence intervals will be
over-interpreted. (For example, an upper bound suggesting that a percentage can be as high as x may cause undue alarm.) It is important for the users of these
TFLs to be educated on these issues.
6.1.2. Importance of Visual Displays
Communicating information effectively and efficiently is crucial in detecting safety signals and enabling
decision-making. Current practice, which focuses on tables and listings, has not always enabled us to
communicate information effectively since tables and listings may be very long and repetitive. Graphics, on the
other hand, can provide more effective presentation of complex data, increasing the likelihood of detecting key
safety signals and improving the ability to make clinical decisions. They can also facilitate identification of
unexpected values.
Standardized presentation of visual information is encouraged. The FDA/Industry/Academia Safety Graphics
Working Group was initiated in 2008. The working group was formed to develop a wiki and to improve safety
graphics best practice. It has recommendations on the effective use of graphics for three key safety areas:
adverse events, ECGs and laboratory analytes. The working group focused on static graphs, and their
recommendations were considered while developing this white paper. In addition, there has also been
advancement in interactive visual capabilities. The interactive capabilities are beneficial, but are considered
out-of-scope for this version of the white paper.
6.1.3. Conservativeness
The focus of this white paper pertains to clinical trials in which there is comparator data. As such, the concept
of “being conservative” is different than when assessing a safety signal within an individual subject or a single
arm. A seemingly conservative approach may end up not being conservative in the end. For example, for
studies that collect safety data during an off-drug follow-up period, one might consider it conservative to
include the adverse events reported in the follow-up period. However, this approach may result in smaller odds
ratios than including only the exposed period in the analysis. Another example occurs when choosing cut-offs
for shift/outlier analyses. A conservative approach for defining outcomes, from a single arm perspective, is one
that would lead to a higher number of patients reaching a threshold. However, a conservative approach for
defining outcomes may actually make it more difficult to identify safety signals with respect to comparing
treatment with a comparator (see Section 7.1.7.3.2 in the Reviewer Guidance). Thus, some of the approaches
recommended in this white paper may appear less conservative than alternatives, but the intent is to propose
methodology that can identify meaningful safety signals for a treatment relative to a comparator group.
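The follow-up example above can be illustrated with a small arithmetic sketch. The counts below are made up purely for illustration (they are not taken from any study or from this white paper), and the function name odds_ratio is hypothetical. The sketch assumes the off-drug follow-up period adds the same number of patients with events to each arm, which is one way the odds ratio can shrink toward 1.

```python
def odds_ratio(events_trt, n_trt, events_cmp, n_cmp):
    """Odds ratio of treatment versus comparator from patient counts."""
    odds_trt = events_trt / (n_trt - events_trt)
    odds_cmp = events_cmp / (n_cmp - events_cmp)
    return odds_trt / odds_cmp

# Hypothetical counts: 100 patients per arm.
# Exposed period only: 10 treated and 5 comparator patients have the event.
exposed_only = odds_ratio(10, 100, 5, 100)

# Assumption: follow-up adds 5 more patients with the event in each arm.
with_follow_up = odds_ratio(15, 100, 10, 100)

print(f"exposed period only: {exposed_only:.2f}")
print(f"exposed + follow-up: {with_follow_up:.2f}")
```

Under these assumed counts, the combined-period odds ratio is closer to 1 than the exposed-period odds ratio, even though more events were counted.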
6.1.4. Measurements After Stopping Study Medication
Measurements collected after stopping medications under study (e.g., treatment under study and comparators)
are common for various reasons. In some cases, “follow-up” phases are included to monitor patients for a
period of time after study medication is stopped. Additionally, study designs that keep patients in a study
(for the entire planned length of time) after they stop medication early are becoming more popular. In
these cases, patients can be off study medication for an extended period of time.
Measurements post study medication can also arise not by design. For example, a subject can decide to stop
study medication at any time, and then later attend the planned visit where the planned measurements are
obtained. There is currently no standard approach on how to handle safety assessments post study medication.
Some guidances contain advice on how long to collect safety measurements post study medication (e.g., 30 days
post or x half-lives). Any advice or decisions related to the collection of safety measurements post study
medication should not be confused with how to include such data in displays and/or analyses. It is extremely
important to document within the database for analysis the best estimate of the last date study treatment was
taken as well as dates on which all numerical safety data were collected so that an accurate determination can be
made of time of data collection relative to last dose of medication.
We recommend that the TFLs in this white paper generally exclude measurements taken during a “follow-up”
phase. Separate TFLs can be created for the follow-up phase and/or the treatment and follow-up phases
combined. We also recommend that the TFLs in this white paper exclude measurements taken after the visit
which is considered the “study medication discontinuation” visit. In the study designs which keep patients in a
study for the entire planned length of time even after stopping medication, separate TFLs can be created for the
“off-medication” time and/or the treatment and “off-medication” times combined. This enables the researcher
to distinguish between drug-related safety signals versus safety signals that could be more related to
discontinuing a drug (e.g., return of disease symptoms, introduction of a concomitant medication, and/or
discontinuation- or withdrawal-effects of the drug) or due to subsequent therapy. We assume it is important to
distinguish among these. Generally, at least some TFLs that include data from follow-up phases and/or “off-medication” time will be required, but not usually as many as are produced for the treatment period and not necessarily in
the same format as provided in this white paper. For some compounds (e.g., compounds with a long half-life
compared to the duration of the study, compounds used for a very short time like antibiotics), a more complete
set of TFLs including such data may be required. The ease of interpretation from such TFLs will vary
depending on the compound, disease, and/or design aspects, such as, the half-life of the compound, likelihood
of taking alternative therapy, allowed concomitant medications during the observation period, etc.
For the case where a subject decides to stop study medication at any time and then later attends the planned visit
to obtain the planned measurements, we recommend measures taken at the study medication discontinuation
visit be included. Although some patients may be off medication, the time is generally short in these situations.
For this example, the inclusion of such measurements may more accurately reflect the safety profile of a
compound versus their exclusion. In study designs with a long period of time between visits, an alternative
approach may be warranted.
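As a rough sketch of the selection rule described in this section, the following keeps measurements taken up to and including the study medication discontinuation visit and records the time since last dose for each one. The record layout and the names days_since_last_dose and select_for_treatment_tfl are assumptions made for illustration; they are not SDTM or ADaM variable names.

```python
from datetime import date

def days_since_last_dose(measure_date, last_dose_date):
    """Days from last dose to the measurement (negative while on treatment)."""
    return (measure_date - last_dose_date).days

def select_for_treatment_tfl(records, last_dose_date, disc_visit_date):
    """Keep measurements on or before the discontinuation visit date.

    Later measurements (follow-up or off-medication time) are left out so
    that they can be summarized in separate TFLs.
    """
    kept = []
    for rec in records:
        if rec["date"] <= disc_visit_date:
            kept.append(
                {**rec,
                 "days_since_last_dose": days_since_last_dose(rec["date"], last_dose_date)}
            )
    return kept

# Hypothetical subject: last dose on 1 March, discontinuation visit on 5 March.
measurements = [
    {"date": date(2020, 2, 1), "value": 72},
    {"date": date(2020, 3, 5), "value": 80},   # discontinuation visit: kept
    {"date": date(2020, 4, 1), "value": 75},   # follow-up phase: excluded
]
selected = select_for_treatment_tfl(measurements, date(2020, 3, 1), date(2020, 3, 5))
```

Storing the days since last dose alongside each kept record makes it possible to check how long a patient had been off medication at the discontinuation visit.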
6.1.5. Measurements at a Discontinuation Visit
When creating displays or conducting analyses over time, how to handle data collected at discontinuation visits
should be specified. Since a subject’s discontinuation visit isn’t always aligned with planned timing, it’s not
obvious whether to include these measurements in displays or analyses over time. Such measurements are
“planned” per protocol, but not consistent with the planned timing. We generally recommend including
measures taken at the discontinuation visit with the next timepoint. For example, if a patient discontinues
medication and the study between Visits 6 and 7 and goes to the office for their discontinuation visit, we
recommend that the measurements taken at the discontinuation visit be grouped with “Visit 7”. The inclusion
of such measurements may more accurately reflect trends over time for the compound than their exclusion. In
study designs with a long period of time between visits, an alternative approach may be warranted.
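The visit assignment in the example above can be sketched as follows. This is a minimal illustration that assumes visits are identified by increasing integers; the function name assign_discontinuation_visit is hypothetical.

```python
def assign_discontinuation_visit(last_completed_visit, scheduled_visits):
    """Return the next scheduled visit after the last completed one.

    Returns None when no later scheduled visit exists, in which case the
    analysis plan would need to say how the measurement is handled.
    """
    later = [v for v in sorted(scheduled_visits) if v > last_completed_visit]
    return later[0] if later else None

schedule = [1, 2, 3, 4, 5, 6, 7, 8]
# Patient discontinues between Visits 6 and 7.
analysis_visit = assign_discontinuation_visit(6, schedule)
print(analysis_visit)
```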
6.1.6. Measurements Collected in Reflex Manner
In study designs, it is possible to have some measurements collected only when another measurement meets a
certain criterion (i.e., collected in a reflex manner). For example, sometimes a peripheral smear is only
performed when certain Complete Blood Count (CBC) analytes meet a specified threshold. How to handle
such measurements should be specified in analysis planning, which requires an understanding of collection
practices. Generally, measurements collected in a reflex manner would be used for individual patient
management and possibly for individual patient listings or individual case descriptions (e.g., as included in
patient narratives). Summaries of such measurements within or between treatment groups tend to be
uninterpretable as you cannot generally assume normality among those who did not have the measurement, and
a summary among those meeting the criteria for receiving the measurement (sometimes a very small
denominator) tends not to be very helpful for signal detection purposes.
6.1.7. Screening Measurements versus Special Topics
The focus of this white paper pertains to measurements as part of normal safety screening. For many
compounds, some measurements are relevant to addressing a-priori special topics of interest. In these cases, it
is possible that additional TFLs and/or different TFLs are warranted. TFLs designed for special topics are out-of-scope for this white paper. In addition, it is possible that additional TFLs are warranted when a safety signal
is identified using the TFLs recommended in this white paper and/or the TFLs that focus on central tendency
(separate white paper). Additional TFLs that would be considered “post-hoc” for further investigation are
considered out-of-scope.
6.1.8. Number of Therapy Groups
The example TFLs show one treatment arm versus comparator in this version of the white paper. Most TFLs
can be easily adapted to include multiple treatment arms or a single arm.
6.1.9. Multi-phase Clinical Trials
The example TFLs for individual studies show two treatment arms and a comparator arm within a controlled
phase of a study. The example TFLs for integrated summaries show one treatment arm (assumes all the treated
arms pooled) and a comparator arm within the controlled phase of the studies. Discussion around additional
phases (e.g., open-label extensions) is considered out-of-scope in this version of the white paper. Many of the
TFLs recommended in this white paper can be adapted to display data from additional phases and/or additional
treatment arms.
6.1.10. Integrated Analyses
For submission documents, TFLs are generally created using data from multiple clinical trials.
Determining which clinical trials to combine for a particular set of TFLs can be complex. Section 7.4.1 of the
Reviewer Guidance contains a discussion of points to consider. Generally, when p-values are computed,
adjusting for study is important. Creating visual displays or tables in which timepoints or treatment
comparisons are confounded with study is discouraged. Understanding whether the overall representation
accurately reflects the review across individual clinical trial results is important.
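To illustrate why adjusting for study matters, the sketch below compares a crude pooled odds ratio with a study-stratified one. The Mantel-Haenszel pooled odds ratio is used here only as one common stratified estimator; this white paper does not prescribe it. All counts are made up for illustration, and the function names are hypothetical.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across studies.

    Each stratum is (a, b, c, d): treated with outcome, treated without,
    comparator with outcome, comparator without.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def crude_or(strata):
    """Odds ratio after simply adding the counts across studies."""
    a = sum(s[0] for s in strata)
    b = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    d = sum(s[3] for s in strata)
    return (a * d) / (b * c)

# Hypothetical studies with different randomization ratios and outcome rates.
studies = [
    (10, 90, 5, 95),    # study 1: 1:1 randomization
    (4, 36, 20, 180),   # study 2: 1:5 randomization
]
stratified = mantel_haenszel_or(studies)
pooled = crude_or(studies)
print(f"stratified: {stratified:.3f}, crude: {pooled:.3f}")
```

With these assumed counts the two estimates differ because the treatment comparison in the crude pooling is partly confounded with study.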
6.2. Laboratory Analyte Measurements
The following topics generally pertain to laboratory analyte measurements, though they may apply to other
measurement types, as well. In these cases, the discussion below may or may not apply.
6.2.1. Planned versus Unplanned Measurements
One topic that tends to be unique to safety (laboratory analyte measurements in particular) is the collection of
unplanned measurements. Unplanned safety measurements can arise for various reasons. During a study, the
clinical investigator sometimes orders a repeat test or “retest” of a laboratory test especially if he/she has
received an unexpected value. The investigator may also request the patient return for a “follow-up visit” due
to clinical concerns. In general, retests are repeat tests performed because an initial test result had an
unexpected value. The repeat result may either confirm the initial test results, or (less commonly) suggest that a
laboratory error occurred in the case of the initial result. Retests are often performed to verify that the action
taken by the investigator (e.g., changing the dose of study drug as allowed by the protocol) has the desired
effect (e.g., test results have returned to within reference limits). If such retests are conducted until desired
measurement results have been reached, analyses from baseline to last observation, for example, would be
biased toward “normality”. Thus, we recommend including only planned measurements when creating displays
or conducting analyses over time and when assessing change from baseline to endpoint. However, we
recommend including planned and unplanned measurements for analyses that focus on outliers or shifts across
an entire period, as these are intended to focus on the most extreme changes.
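A minimal sketch of this recommendation follows: the endpoint value uses planned measurements only, while the most extreme value across the period uses planned and unplanned measurements. The record layout (seq, visit, planned, value) and the function names are assumptions for illustration, and the values are made up.

```python
# Hypothetical records for one subject and one analyte, in collection order.
records = [
    {"seq": 1, "visit": "BASELINE", "planned": True, "value": 30.0},
    {"seq": 2, "visit": "WEEK 4", "planned": True, "value": 45.0},
    {"seq": 3, "visit": "UNSCHEDULED", "planned": False, "value": 120.0},
    {"seq": 4, "visit": "UNSCHEDULED", "planned": False, "value": 35.0},
    {"seq": 5, "visit": "WEEK 8", "planned": True, "value": 38.0},
]

def endpoint_value(recs):
    """Last planned post-baseline value (for change from baseline to endpoint)."""
    planned_post = [r for r in recs if r["planned"] and r["visit"] != "BASELINE"]
    if not planned_post:
        return None
    return max(planned_post, key=lambda r: r["seq"])["value"]

def max_post_baseline(recs):
    """Highest post-baseline value, planned or unplanned (for outlier/shift analyses)."""
    post = [r["value"] for r in recs if r["visit"] != "BASELINE"]
    return max(post) if post else None

print(endpoint_value(records), max_post_baseline(records))
```

In this made-up example the unplanned value of 120.0 is captured by the outlier/shift summary but does not enter the by-visit or endpoint analyses.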
6.2.2. Analytes Collected Qualitatively
Some laboratory analyte measurements are collected in a qualitative manner that is usually binary (e.g., Elliptocytes:
normal/abnormal) or ordinal (e.g., Spherocytes: 0 [implied by lack of reporting], +, ++, +++, ++++). Some analytes
have a numeric value when present, but are better treated as qualitative data (e.g., atypical lymphocytes, a type of
abnormal white blood cell seen with some viral infections, should be treated as present or not present).
How to handle such analytes should be included in analysis planning. In general, a listing of abnormal findings is
sufficient.
A summary of those shifting from normal during the pre-treatment period to abnormal during the treatment period
can also be considered. The conversion of qualitative measurements to abnormal versus normal categories, when they are not
collected as abnormal and normal, is usually defined by laboratories and included in routine data transfers, but
should be confirmed and well understood by study teams.
6.2.3. Central Versus Local Laboratories
In recent years, most large studies have utilized a central laboratory to ensure consistency in laboratory
assessments across institutions. However, there are times when this is not feasible. For example, some studies
may need to utilize local laboratories due to the nature of the study. There are also cases where the scheduled
labs are done using a central laboratory, but ad-hoc local laboratory results are done as needed for patient care.
Generally, results from different laboratories should not be combined unless a careful review has found the laboratory assay
methods and reference limit determination methods to be consistent. When feasible, samples can
be split such that the local laboratory results can be provided for urgent patient care, but results from the central
laboratory would also be available. If such a practice is adopted, including data from the central laboratory only
is sufficient.
6.2.4. Reference Limits
Laboratories generally maintain reference limits that can be used to screen for potential pathology. Methods to
develop such limits vary, but many are developed with individual subject safety monitoring in mind. Thus, the
limits from many laboratories tend to be “sensitive” (reduced false negatives). Several statistical authors
(Copeland et al, 1977; O’Neil, xxxx; Quade et al, 1980) have presented arguments suggesting that conventional
reference limits with limits set at the 2.5th and 97.5th percentiles (the 95% reference interval, a commonly
used method for reference limit determination) after removal of outliers (Clinical and Laboratory Standards
Institute, 2008) might not be optimal for outlier / shift categorical analysis of laboratory analytes aimed at
detecting differences between groups. Quade et al (1980) in particular discussed the impact of misclassification
on estimation of incidence and on the power of an inferential test to detect a difference between groups when it
exists. Translating this into a problem of choosing reference limits determined by the reference interval, it will
be more important to choose a limit that is extreme enough that specificity remains high, but one that is not so
high as to decrease sensitivity to a very low value. The choice of optimal reference limit will be data dependent
and is likely to be variable across analytes, but using the principle that specificity has a greater effect than
sensitivity, we can make reasonable choices that could be superior to reference limits provided by the
laboratory. Currently, such alternatives are not widely available. When such alternatives are available, their
use is generally recommended.
Another aspect of choosing an optimal reference limit pertains to the population in which reference limits are
developed. “Reference individuals (patients)” are the individuals from whom biological samples are collected
for measuring an analyte in order to establish the reference limits for the analyte. For clinical use, the reference
sample group is generally determined to be healthy by some means. This is appropriate for screening individual
patients for presence or lack of health. Authors have suggested that it might be important to tailor the reference
population based on the purpose for which the derived reference limits will be used (Solberg, xxx). This could
include using a reference sample of clinical trial patients, or clinical trial patients with the disease under study.
As with limits developed using higher percentile reference intervals, limits developed using alternative
populations are not widely available. When such alternatives are available, their use is generally recommended.
For some laboratory analytes, clinical thresholds (e.g., Fasting Glucose ≥126 mg/dL) have been published and
can be considered for use in outlier/shift summaries and analyses. Use of clinically-derived limits is
recommended (likely in addition to use of statistically-derived limits) when the analyte is of special interest.
For purposes of this white paper, it is assumed a reference limit is chosen that would identify values as low,
normal, or high for quantitative measurements. For qualitative measurements, it is assumed observations would
be identified as normal or abnormal. Providing a specific recommendation for the reference limits is out-of-scope for this version of the white paper. The specific choice of limit should be documented (protocol,
Statistical Analysis Plan, study report methods section, etc.). Reference limits for a laboratory analyte may vary
across demographics. For example, reference limits for a laboratory analyte may be different for patients < 45 years old
and patients ≥ 45 years old. We recommend using the reference limit that corresponds to the patient’s actual age at the time the
laboratory measurement was taken rather than the patient’s age at study entry.
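As an illustration of applying age-specific limits at the time of measurement, the following is a minimal sketch. The limit values, the inclusive handling of the age boundary, and all names are hypothetical and are not recommendations of this white paper.

```python
from datetime import date


def age_in_years(birth_date, on_date):
    """Completed years of age on a given date."""
    had_birthday = (on_date.month, on_date.day) >= (birth_date.month, birth_date.day)
    return on_date.year - birth_date.year - (0 if had_birthday else 1)


# Hypothetical age-specific limits: (minimum age in years, low limit, high limit).
LIMITS_BY_AGE = [(0, 10.0, 40.0), (45, 12.0, 50.0)]


def classify(value, birth_date, sample_date):
    """Classify a result as LOW, NORMAL, or HIGH using the age at sample collection."""
    age = age_in_years(birth_date, sample_date)
    low, high = next((lo, hi) for min_age, lo, hi in reversed(LIMITS_BY_AGE)
                     if age >= min_age)
    if value < low:
        return "LOW"
    if value > high:
        return "HIGH"
    return "NORMAL"


birth = date(1970, 6, 15)
print(classify(45.0, birth, date(2015, 6, 14)))  # patient is 44 at this sample
print(classify(45.0, birth, date(2015, 6, 15)))  # patient is 45 at this sample
```

In this sketch the same result value is classified differently before and after the patient's 45th birthday, which is the behavior the recommendation implies.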
6.2.5. Above and Below Quantifiable Limits
Values above or below the quantifiable range (e.g., <0.0001) include critical information and should not be
discarded. Such values can generally be categorized as low or high and their inclusion in outlier/shift
summaries and analyses is recommended.
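One possible way to retain such values is sketched below. It assumes results are reported as text with a leading "<" or ">" and that a value can be categorized only when the reported bound lies at or beyond the corresponding reference limit. The function name and the "UNKNOWN" category are hypothetical.

```python
def categorize_result(raw, low_limit, high_limit):
    """Categorize a reported result, keeping values outside the quantifiable range."""
    text = raw.strip()
    if text.startswith("<"):
        bound = float(text[1:])
        # The true value is below the bound, so it is low if the bound is at or under the low limit.
        return "LOW" if bound <= low_limit else "UNKNOWN"
    if text.startswith(">"):
        bound = float(text[1:])
        # The true value is above the bound, so it is high if the bound is at or over the high limit.
        return "HIGH" if bound >= high_limit else "UNKNOWN"
    value = float(text)
    if value < low_limit:
        return "LOW"
    if value > high_limit:
        return "HIGH"
    return "NORMAL"


print(categorize_result("<0.0001", 0.5, 2.0))
print(categorize_result(">500", 0.5, 2.0))
```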
6.3. ECG Quantitative Measurements
Special considerations for “thorough QT/QTc studies” are considered out-of-scope for this white paper.
6.3.1. QT Correction Factors
As noted in the ICH QT/QTc guidance (Section IA; Background), because of its inverse relationship to heart
rate, the measured QT interval is routinely corrected by means of various formulae to a less heart-rate-dependent value known as the QTc interval. Section IIIA of the same guidance provides a discussion of some
of the various correction formulas and notes the controversy around appropriate corrections. Generally, we
recommend that the TFLs include the corrected QT interval using Fridericia’s method (QTcF = QT/RR^0.33). We
believe the regulatory and medical environments are ready to accept the exclusion of Bazett’s method from
standard TFLs. We believe a second method would likely be warranted for a more complete evaluation. The
second method could be one that is derived from a linear regression technique (Dmitrienko et al., 2005).
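The power-type corrections discussed above can be sketched as follows. This is an illustration only; it assumes QT is in msec and RR is in seconds (derived here as 60 divided by heart rate in bpm), and the function names are hypothetical.

```python
def rr_seconds(heart_rate_bpm):
    """RR interval in seconds derived from heart rate in beats per minute."""
    return 60.0 / heart_rate_bpm


def corrected_qt(qt_msec, rr_sec, exponent=1.0 / 3.0):
    """Power-type QT correction: QTc = QT / RR**exponent.

    exponent = 1/3 gives Fridericia's correction; exponent = 0.5 gives Bazett's.
    """
    return qt_msec / (rr_sec ** exponent)


qtcf = corrected_qt(400.0, rr_seconds(75.0))         # Fridericia
qtcb = corrected_qt(400.0, rr_seconds(75.0), 0.5)    # Bazett, shown for comparison only
print(round(qtcf, 1), round(qtcb, 1))
```

At a heart rate of 60 bpm (RR = 1 second) all power-type corrections return the uncorrected QT; the methods diverge as heart rate moves away from 60 bpm.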
6.3.2. Reference Limits
As with laboratory reference limits, the choice for ECG reference limits can be controversial. Unlike
laboratory analytes, clinically-derived limits are commonly used for ECG outlier/shift summaries and
analyses. In addition, it’s common to include clinically-derived limits for both raw measures and change values
for identifying patients of potential concern. Unfortunately, the specific clinically-derived thresholds that are
used vary widely, hampering efforts to standardize analysis data across the industry. For purposes of this white
paper, it is assumed a reference limit is chosen that would identify raw values as low, normal, or high.
Providing a specific recommendation for the reference limits for either raw measures or changes is out-of-scope
for this version of the white paper. The specific choice of limits should be documented (protocol, Statistical
Analysis Plan, study report methods section, etc.).
6.3.3. JT Interval
QTc is a biomarker with a long established history of being used to assess the duration of ventricular repolarization.
However, QTc encompasses both ventricular depolarization and ventricular repolarization. The length of the QRS
complex represents ventricular depolarization and the length of the JT interval, measured from the end of the QRS
complex to the end of the T-wave, specifically represents ventricular repolarization. JT can be corrected for heart
rate as with QT. Thus, when the QRS is prolonged (e.g., a complete bundle branch block), QTc should not be used to
assess ventricular repolarization. The decision as to which basis for assessing potential changes in ventricular
repolarization will be used should be based on the expected proportion of patients with widened QRS complexes for
any reason in that study. It is worth noting that this proportion increases with the age of the patient population and
the extent to which the population is expected to suffer cardiac disease.
6.4. Vital Sign Measurements
6.4.1. Reference Limits
As with laboratory and ECG reference limits, the choice for vital sign reference limits can be controversial.
Similar to ECG limits, clinically-derived limits are commonly used for vital sign outlier/shift summaries
and analyses, but vary widely. For purposes of this white paper, it is assumed a reference limit is chosen that
would identify raw values as low, normal, or high. Providing a specific recommendation for the reference
limits for either raw measures or changes is out-of-scope for this version of the white paper. The specific
choice of limits should be documented (protocol, Statistical Analysis Plan, study report methods section, etc.).
7. Tables and Figures for Individual Studies
7.1. Recommended Displays
For quantitative laboratory analyte measurements, quantitative ECG measurements, and vital signs in which low
and high limits are based on raw values without a change or percent change criterion, a 3-panel display that
includes a scatterplot, shift table, and a shift to low/high table is recommended. See Figures 7.1 and 7.2. In the
scatterplot portion, lines indicating the reference limits are included to ease the review of the plots. In cases
where limits vary across demographic characteristics and/or laboratories, lines indicating the most common
limit can be displayed, which is especially a good option if the population under study contains a relatively
large percentage of a particular demographic. Alternatively, lines for the lowest of the high limits and the
highest of the low limits can be displayed. Displaying lines for all limits can be considered but will likely be
too confusing to the users of the display.
Figure 7.1 is an example for assessing low values, and Figure 7.2 is an example for assessing high values.
Two sets of visuals, one for low and one for high for each laboratory analyte, vital sign and ECG are generally
desired. The summary of shifts from normal/high to low includes patients whose minimum baseline value is
normal or high. The summary of shifts from normal/low to high includes patients whose maximum baseline
value is low or normal.
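The derivation of the shift from normal/low to high summary can be sketched as follows. This is a minimal illustration, assuming one record per patient with lists of baseline and treatment-period values, and assuming a value is high when it exceeds the high limit. The data structure and names are hypothetical.

```python
def shift_to_high(patients, high_limit):
    """Count patients at risk (N) and patients shifting to high (n) by treatment.

    Each patient is a dict with 'treatment', 'baseline' (list of values),
    and 'post' (list of treatment-period values).
    """
    summary = {}
    for patient in patients:
        if not patient["baseline"] or not patient["post"]:
            continue  # needs a baseline value and at least one treatment-period value
        if max(patient["baseline"]) > high_limit:
            continue  # maximum baseline is already high
        counts = summary.setdefault(patient["treatment"], {"N": 0, "n": 0})
        counts["N"] += 1
        if max(patient["post"]) > high_limit:
            counts["n"] += 1
    return summary


example = [
    {"treatment": "T1", "baseline": [5, 6], "post": [7, 12]},  # shifts to high
    {"treatment": "T1", "baseline": [11], "post": [12]},       # excluded: high at baseline
    {"treatment": "T1", "baseline": [4], "post": []},          # excluded: no post-baseline value
    {"treatment": "PL", "baseline": [5], "post": [6]},         # at risk, no shift
]
result = shift_to_high(example, high_limit=10)
print(result)
```

The shift from normal/high to low summary would follow the same pattern using minimum values and the low limit.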
For quantitative laboratory analyte measurements, quantitative ECG measurements, and vital signs in which low
and high limits are based on a specified change or percent change value or a combination of a specified value and a
change or percent change, a 2-panel display that includes a scatterplot and a shift to low/high is recommended. See
Figures 7.3 and 7.4.
For laboratory analyte measurements collected qualitatively, a listing of abnormal findings is recommended
(Table 7.1).
For the shift from normal/high to low and shift from normal/low to high summaries, a test to compare
treatments using Fisher’s exact test can be included as reflected in Figures 7.1 through 7.4.
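For readers unfamiliar with the calculation, a two-sided Fisher's exact test for a 2x2 table can be sketched with the standard library alone. This sketch assumes the common two-sided definition (the sum of the probabilities of all tables that are no more likely than the observed table); production analyses would normally rely on validated statistical software. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are treatments; columns are shifted / not shifted.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x shifted patients in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # The small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-7))


p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```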
Figure 7.1
Scatter Plot and Shift Summary for Quantitative Safety Measures Assessing Low Value – Individual Study
Figure 7.2
Scatter Plot and Shift Summary for Quantitative Safety Measures Assessing High Value – Individual Study
Figure 7.3
Scatter Plot and Shift Summary for Quantitative Safety Measures Assessing Low Value with Change Criteria – Individual Study
Figure 7.4
Scatter Plot and Shift Summary for Quantitative Safety Measures Assessing High Value with Change Criteria – Individual Study
Table 7.1
Treatment-Emergent Abnormal Summary for Qualitative Safety Measures – Individual Study

Laboratory Test (unit) | Treatment | N | n (%) | P value*
Lab Test 1 | T1 | xxx | xx(xx.x) | .xxx
 | T2 | xxx | xx(xx.x) | .xxx
 | PL | xxx | xx(xx.x) |
Lab Test 2 | T1 | xxx | xx(xx.x) | .xxx
 | T2 | xxx | xx(xx.x) | .xxx
 | PL | xxx | xx(xx.x) |
… | … | | |
Lab Test n | T1 | xxx | xx(xx.x) | .xxx
 | T2 | xxx | xx(xx.x) | .xxx
 | PL | xxx | xx(xx.x) |

Abbreviations: N = number of patients with a normal baseline and at least one post-baseline measure, n
= number of patients with abnormal post-baseline result.
* – P values are from Fisher’s Exact test, compared with PL.
7.2. Discussion
There are certainly multiple ways to display outlier/shift summaries. For quantitative laboratory analyte
measurements, quantitative ECG measurements, and vital signs, we considered only displaying the scatterplots,
only displaying the shift tables (See Table 12.1), and only displaying a shift to low/high table (See Table 12.2).
We also considered a display that combined the boxplot (from the central tendency white paper) with a
treatment-emergent table (See Figure 12.1).
We quickly discarded only displaying the scatterplots. With just a scatterplot, users of the plots will likely be
attempting to count and create percentages manually. We also quickly discarded only displaying the shift table
(Table 12.1) for similar reasons. Users of the shift tables tend to count and create grouped percentages
manually for those shifting to high from low/normal (or low from normal/high). Of note, shift tables become
complex to create and difficult to interpret if the definition of an outlier/shift includes a specified change or
percent change value. Thus, creating the shift table is not recommended in these cases. We strongly considered
only displaying shifts to low/high with all analytes on the table (Table 12.2). This table has the advantage of
being succinct and still reflecting the data that tends to be the most useful for signal detection. However,
feedback from the medical community has indicated a desire to sort information by analyte as opposed to by
analytical method. Thus, there’s a preference to have the ability to see the central tendency summary followed
by the outlier/shift summary by analyte. Table 12.2 is not suited for this type of presentation. Thus, we
strongly considered the display that has the boxplot and shift to low/high summary on the same page (Figure
12.1). This has the advantage of being succinct (one page for each analyte) and is by analyte consistent with
medical preferences. However, the 3-panel display is recommended since we believe the additional information
provided in the scatterplot is generally worth the extra page per analyte. Researchers can visually see the extent
of the shift (how high or how low relative to baseline measurements) and can visually see if there’s a clustering
by treatment. The shift table portion is perhaps of less value, but it is quite popular across current practices.
Thus, it will likely help researchers who are used to shift tables have what they are used to seeing while also
seeing other useful displays (when limits are based on specified values without change criterion). The
information can still be sorted by analyte in the clinical study report (likely in the appendix) – the boxplot
followed by the 3-panel outlier/shift display sorted by analyte. This also makes it easy to bring the analytes that
end up being interesting into the body of the clinical study report. If a table such as Table 12.2 is created, a
manually created summary or a new table would be required to discuss the analyte of interest in the body of the
clinical study report.
For laboratory analyte measurements collected qualitatively, a shift from normal to abnormal table was
considered (Table 12.3). In most cases, we believe the listing of abnormal findings is sufficient. For any
analyte that is part of a topic of special interest, a shift from normal to abnormal table will likely be of interest.
As noted in Section 6.2.2, it would be important to understand data collection to properly create the table.
We also considered another display for summarizing information in a succinct manner within the body of a
clinical study report. See Figure 12.2. This display has the advantage of allowing users to browse quickly through all the
analytes that were analyzed, sorted by decreasing odds ratio. However, given the medical feedback to present
information sorted by analyte as opposed to analytical method, we believe the approach to bring forward the
boxplot and 3-panel outlier/shift figure into the body of the clinical study report for those analytes of interest
(with all the displays in the appendix), discussed by analyte, is a preferred approach.
8. Tables and Figures for Integrated Summaries
8.1. Recommended Displays
For quantitative laboratory analyte measurements, quantitative ECG measurements, and vital signs, a display
that includes scatterplots by study, and a shift to low/high table is recommended for summaries across
studies. See Figure 8.1 (for low values) and Figure 8.2 (for high values). For cut-off criteria that include a change
value, an additional reference line will be added to the scatterplots in Figure 8.1 and Figure 8.2 (see Figures 7.3
and 7.4).
For laboratory analyte measurements collected qualitatively, the same listing of abnormal findings recommended
for individual studies (Table 7.1) is recommended for integrated summaries, or a shift from normal to abnormal table
can be considered (Table 8.1).
Figure 8.1 Scatter Plot and Shift Summary for Quantitative Safety Measures for Low Value – Integrated Database
Figure 8.2 Scatter Plot and Shift Summary for Quantitative Safety Measures for High Value – Integrated Database
Table 8.1
Treatment-Emergent Abnormal Summary for Qualitative Safety Measures – Integrated Database

Laboratory Test (unit) | Treatment | N | n (%) | OR*a | Heterogeneity P value*b | P value*c
Lab Test 1 | A | xxx | xx(xx.x) | xx.xx | .xxx | .xxx
 | B | xxx | xx(xx.x) | | |
Lab Test 2 | A | xxx | xx(xx.x) | xx.xx | .xxx | .xxx
 | B | xxx | xx(xx.x) | | |
… | … | | | | |
Lab Test n | A | xxx | xx(xx.x) | xx.xx | .xxx | .xxx
 | B | xxx | xx(xx.x) | | |

Abbreviations: N = number of patients with a normal baseline and at least one post-baseline measure, n
= number of patients with abnormal post-baseline result. OR = Mantel-Haenszel odds ratio; add more as
needed (alphabetically).
*a – Mantel-Haenszel Odds Ratio stratified by study. Treatment B is numerator, treatment A is
denominator.
*b – Heterogeneity of odds ratios across studies was assessed using the Breslow-Day test.
*c – P values are from Cochran-Mantel-Haenszel (CMH) test of general association stratified by study.
8.2. Discussion
For quantitative laboratory analyte measurements, quantitative ECG measurements, and vital signs, utilizing the
same display recommended for individual studies was considered (3-panel display with a single scatterplot, shift
table, and shift to low/high table with meta-analytical methods added). However, due to concerns with potential
paradoxes (can we reference our book chapter?) when combining data from multiple studies, a single scatterplot
(with studies combined) and a single shift table (with studies combined) were discarded. Instead, a scatterplot by
study is recommended (unless the number of studies prohibits the use of such a display). A shift table by study
is not recommended due to space limitations, but would be available in the study reports of the individual
studies. The percentages provided in the shift to low/high table are subject to similar potential paradoxes;
however, they can be reviewed in context with the Mantel-Haenszel odds ratio (which does account for study). Users
of the figure would need to be educated to look for situations when the odds ratio appears inconsistent with the
presented percentages. In such cases, the odds ratio would reflect the data more appropriately and
understanding by-study results would be important. Utilizing methods to provide “adjusted cumulative
proportions” suggested by Chuang-Stein, et al (2011) might also be useful, but considered out-of-scope for this
version of the white paper.
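The kind of paradox described above can be illustrated with a small numerical sketch. The counts are invented for illustration (two studies with different randomization ratios and different background incidence), and the function names are hypothetical. Each stratum is given as (a, b, c, d) = (B with shift, B without shift, A with shift, A without shift), so that treatment B is the numerator.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio across strata of (a, b, c, d) counts."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


def crude_or(strata):
    """Odds ratio after naively pooling all strata into one 2x2 table."""
    a, b, c, d = (sum(column) for column in zip(*strata))
    return (a * d) / (b * c)


# Hypothetical studies: B has the higher incidence within each study,
# but most B patients come from the low-incidence study.
studies = [
    (18, 882, 1, 99),    # Study 1: B 18/900 (2%),  A 1/100 (1%)
    (20, 80, 135, 765),  # Study 2: B 20/100 (20%), A 135/900 (15%)
]
or_mh = mantel_haenszel_or(studies)
or_crude = crude_or(studies)
print(round(or_mh, 2), round(or_crude, 2))
```

In this example the pooled percentages (3.8% for B versus 13.6% for A) and the crude odds ratio suggest less risk on B, while the stratified odds ratio is above 1, consistent with the by-study results.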
Another display that was considered was a shift to low/high table with a corresponding forest plot that shows
incidence differences by study (Figure 12.4). This display has the advantage of being practical even when
many studies are included in a summary. However, when the number of studies is small enough (e.g., 6 or fewer)
the scatterplot is recommended as it provides insight to patient level information by individual study that is
often very valuable for users of the figure. When the number of studies is large (e.g., >6), Figure 12.4 can be
considered.
A simple shift to low/high table was also considered (Table 12.2). Again, we would strongly recommend the
display shown in Figure 8.1 over this display for integrated analysis when the number of studies is small enough
for the same reason stated above. When the number of studies is large (e.g., >6), Table 12.2 can be considered.
9. Example SAP Language
9.1. Individual Study
For quantitative laboratory analyte measurements, 3-panel displays that include a scatterplot, shift table, and a shift
to high/low table will be created. Specifically, for each measurement, both a 3-panel display assessing low values
and a 3-panel display assessing high values will be created.
In the 3-panel display to assess low values, the scatterplot will plot the minimum value during the baseline period
versus the minimum value during the treatment period. Lines indicating the reference limits are included. In
cases where limits vary across demographic characteristics, lines indicating the most common limit will be
displayed. The shift table will include the number and percentage of patients within each baseline category
(minimum value is low, normal, high, or missing) versus each treatment category (minimum value is low,
normal, or high) by treatment. Patients with at least one result in the treatment period will be included in the
shift table. The shift from normal or high to low table will include the number and percentage of patients by
treatment whose minimum baseline result is normal or high and whose minimum treatment result is low.
Patients whose minimum baseline result is normal or high and have at least one result during the treatment
period are included. The Fisher’s exact test will be used to compare percentages of patients who shift from
normal or high to low between treatments.
The 3-panel display to assess high values will be created similarly. The scatterplot will plot the maximum
value during the baseline period versus the maximum value during the treatment period. The shift table will
include the number and percentage of patients within each baseline category (maximum value is low, normal,
high, or missing) versus each treatment category (maximum value is low, normal, or high) by treatment. The
shift from normal or low to high table will include the number and percentage of patients by treatment whose
maximum baseline result is normal or low and whose maximum treatment result is high. Patients whose
maximum baseline result is normal or low and have at least one result during the treatment period are included.
For laboratory analyte measurements collected qualitatively, a listing of abnormal findings will be created.
The listing will include patient ID, treatment group, laboratory collection date, analyte name, and analyte finding.
For quantitative ECG measurements and vital signs with limits defined using a specified value without a change
criterion, 3-panel displays will be created as described above. For quantitative ECG measurements and vital signs
with limits defined using a specified value and a change criterion, 2-panel displays will be created. The 2-panel
display will include the scatterplot and the shift to low/high table. To assess increases, change from the
maximum value during the baseline period to the maximum value during the treatment period will be used. To
assess decreases, change from the minimum value during the baseline period to the minimum value during the
treatment period will be used.
Laboratory tests include all planned analytes as defined in the protocol, excluding those collected in a reflex
manner (only collected under certain circumstances). Alanine aminotransferase (ALT), aspartate aminotransferase
(AST), and total bilirubin will not be included in this analysis as they will be analyzed as described in the
hepatotoxicity section. Vital signs include systolic blood pressure, diastolic blood pressure, pulse, and temperature.
Physical characteristics include weight and BMI. ECG parameters include heart rate, PR, QRS, QT, corrected QT
using Fridericia’s correction factor (QTcF=QT/RR^0.333), and corrected QT using a large clinical trial population
based correction factor (QTcLCTPB=QT/RR^0.413; Dmitrienko AA, Sides GD, Winters KJ, Kovacs RJ, Rebhun
DM, Bloom JC, Groh W, Eisenberg PR. Electrocardiogram reference ranges derived from a standardized clinical
trial population. DRUG INF J 39:395-405; 2005). When the QRS is prolonged (for example, a complete bundle
branch block), QT and QTc should not be used to assess ventricular repolarization. Thus, for a particular ECG, the
following will be set to missing (for analysis purposes) when QRS is ≥120 msec: QT, QTcF and QTcLCTPB.
Large clinical trial population based reference limits will be used to define the low and high limits for laboratory
analyte measurements (Reference x or Attachment x – not shown in this example). Reference limits for ECGs and
vital signs are defined in Tables 9.1 and 9.2, respectively.
Table 9.1
Selected Categorical Limits for ECG Data
(Each limit is shown as Age (yrs): limit)

Parameter | Low: Males | Low: Females | High: Males | High: Females
Heart Rate (bpm) | ≥18: <50 and decrease ≥15 | ≥18: <50 and decrease ≥15 | ≥18: >100 and increase ≥15 | ≥18: >100 and increase ≥15
PR Interval (msec) | All ages: <120 | All ages: <120 | All ages: ≥220 | All ages: ≥220
QRS Interval (msec) | All ages: <60 | All ages: <60 | All ages: ≥120 | All ages: ≥120
QTcF (msec) | All ages: <330 | All ages: <340 | ≥16: >450 | ≥16: >470
QTcLCTPB (msec)1 | All ages: <330 | All ages: <340 | <18: >444; 18-25: >449; 26-35: >438; 36-45: >446; 46-55: >452; 56-65: >448; >65: >460 | <18: >445; 18-25: >455; 26-35: >455; 36-45: >459; 46-55: >464; 56-65: >469; >65: >465

NA=Not applicable
1. Dmitrienko AA, Sides GD, Winters KJ, Kovacs RJ, Rebhun DM, Bloom JC, Groh W, Eisenberg PR. Electrocardiogram reference
ranges derived from a standardized clinical trial population. DRUG INF J 39:395-405; 2005
Table 9.2
Categorical Criteria for Abnormal Treatment-Emergent Blood Pressure
and Pulse Measurement, and Categorical Criteria for Weight and
Temperature Changes for Adults

Parameter | Low | High
Systolic BP (mm Hg) (Supine or sitting – forearm at heart level) | ≤ 90 and decrease from baseline ≥ 20 | ≥ 140 and increase from baseline ≥ 20
Diastolic BP (mm Hg) (Supine or sitting – forearm at heart level) | ≤ 50 and decrease from baseline ≥ 10 | ≥ 90 and increase from baseline ≥ 10
Pulse (bpm) (Supine or sitting) | < 50 and decrease from baseline ≥ 15 | > 100 and increase from baseline ≥ 15
Temperature | < 96 degrees F and decrease ≥ 2 degrees F | ≥ 101 degrees F and increase ≥ 2 degrees F
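Criteria that combine a specified value with a change from baseline, such as those in Table 9.2, can be applied as in the following sketch. The blood pressure and pulse thresholds are transcribed from Table 9.2; the parameter codes, the dictionary layout, and the function name are hypothetical. For high criteria, the baseline and treatment values are the maximum values in each period; for low criteria, the minimum values.

```python
import operator

# parameter: {direction: (comparison with threshold, threshold, minimum change)}
CRITERIA = {
    "SYSBP": {"low": (operator.le, 90, 20), "high": (operator.ge, 140, 20)},
    "DIABP": {"low": (operator.le, 50, 10), "high": (operator.ge, 90, 10)},
    "PULSE": {"low": (operator.lt, 50, 15), "high": (operator.gt, 100, 15)},
}


def meets_criterion(parameter, direction, baseline, treatment):
    """True if the treatment value meets both the threshold and the change criterion."""
    compare, threshold, min_change = CRITERIA[parameter][direction]
    if direction == "high":
        change = treatment - baseline   # increase from maximum baseline
    else:
        change = baseline - treatment   # decrease from minimum baseline
    return compare(treatment, threshold) and change >= min_change


print(meets_criterion("SYSBP", "high", baseline=125, treatment=145))
print(meets_criterion("SYSBP", "high", baseline=130, treatment=145))
```

In the second call the treatment value exceeds the threshold but the increase is only 15 mm Hg, so the criterion is not met.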
9.2. Integrated Summary
For quantitative laboratory analyte measurements, quantitative ECG measurements, and vital signs, a display
that includes scatterplots by study, and a shift to low/high table will be created. Specifically, for each
measurement, both a 2-panel display assessing low values and a 2-panel display assessing high values will be
created.
In the 2-panel display to assess low values, the scatterplots will plot the minimum value during the baseline period
versus the minimum value during the treatment period for each study. Lines indicating the reference limits are
included. For cut-off criteria including a change value, an additional reference line will be added to the
scatterplots. In cases where limits vary across demographic characteristics, lines indicating the most common
limit will be displayed. The shift from normal or high to low table will include the number and percentage of
patients by treatment whose minimum baseline result is normal or high and whose minimum treatment result is
low. Patients whose minimum baseline result is normal or high and have at least one result during the
treatment period are included. The Cochran-Mantel-Haenszel test stratified by study will be used to compare
percentages of patients who shift from normal or high to low between treatments. The Mantel-Haenszel odds
ratio and Breslow-Day test for heterogeneity will also be provided.
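For orientation, the Cochran-Mantel-Haenszel statistic for a set of 2x2 tables stratified by study can be sketched as follows. This sketch assumes no continuity correction and a chi-square reference distribution with 1 degree of freedom; the counts are hypothetical, and validated statistical software would normally be used in practice. Each stratum is (a, b, c, d) = (treatment 1 with shift, treatment 1 without shift, treatment 2 with shift, treatment 2 without shift).

```python
from math import erfc, sqrt


def cmh_test(strata):
    """Cochran-Mantel-Haenszel statistic and p-value for 2x2 tables across strata."""
    difference = 0.0
    variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        difference += a - row1 * col1 / n                      # observed minus expected
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    statistic = difference ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value


one_study = cmh_test([(15, 5, 5, 15)])
two_studies = cmh_test([(15, 5, 5, 15), (15, 5, 5, 15)])
print(one_study, two_studies)
```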
The 2-panel display to assess high values will be created similarly. The scatterplots will plot the maximum
value during the baseline period versus the maximum value during the treatment period for each study. The
shift from normal or low to high table will include the number and percentage of patients by treatment whose
maximum baseline result is normal or low and whose maximum treatment result is high. Patients whose
maximum baseline result is normal or low and have at least one result during the treatment period are included.
For quantitative ECG measurements and vital signs with limits defined using a specified value and a change
criterion, change from the maximum value during the baseline period to the maximum value during the
treatment period will be used to assess increases. Change from the minimum value during the baseline period
to the minimum value during the treatment period will be used to assess decreases.
For laboratory analyte measurements collected qualitatively, a listing of abnormal findings will be created.
The listing will include patient ID, treatment group, laboratory collection date, analyte name, and analyte finding.
Laboratory tests include all planned analytes as defined in the protocol, excluding those collected in a reflex
manner (only collected under certain circumstances). Alanine aminotransferase (ALT), aspartate
aminotransferase (AST), and total bilirubin will not be included in this analysis as they will be analyzed as
described in the hepatotoxicity section. Vital signs include systolic blood pressure, diastolic blood pressure,
pulse, and temperature. Physical characteristics include weight and BMI. ECG parameters include heart rate,
PR, QRS, QT, corrected QT using Fridericia’s correction factor (QTcF=QT/RR^0.333), and corrected QT using a
large clinical trial population based correction factor (QTcLCTPB=QT/RR^0.413; Dmitrienko AA, Sides GD,
Winters KJ, Kovacs RJ, Rebhun DM, Bloom JC, Groh W, Eisenberg PR. Electrocardiogram reference ranges
derived from a standardized clinical trial population. DRUG INF J 39:395-405; 2005). When the QRS is
prolonged (for example, a complete bundle branch block), QT and QTc should not be used to assess ventricular
repolarization. Thus, for a particular ECG, the following will be set to missing (for analysis purposes) when
QRS is ≥120 msec: QT, QTcF and QTcLCTPB.
Large clinical trial population based reference limits will be used to define the low and high limits for
laboratory analyte measurements (Reference x or Attachment x – not shown in this example). Reference limits
for ECGs and vital signs are defined in Tables 9.1 and 9.2, respectively.
Table 9.1
Selected Categorical Limits for ECG Data
(Each limit is shown as Age (yrs): limit)

Parameter | Low: Males | Low: Females | High: Males | High: Females
Heart Rate (bpm) | ≥18: <50 and decrease ≥15 | ≥18: <50 and decrease ≥15 | ≥18: >100 and increase ≥15 | ≥18: >100 and increase ≥15
PR Interval (msec) | All ages: <120 | All ages: <120 | All ages: ≥220 | All ages: ≥220
QRS Interval (msec) | All ages: <60 | All ages: <60 | All ages: ≥120 | All ages: ≥120
QTcF (msec) | All ages: <330 | All ages: <340 | ≥16: >450 | ≥16: >470
QTcLCTPB (msec)1 | All ages: <330 | All ages: <340 | <18: >444; 18-25: >449; 26-35: >438; 36-45: >446; 46-55: >452; 56-65: >448; >65: >460 | <18: >445; 18-25: >455; 26-35: >455; 36-45: >459; 46-55: >464; 56-65: >469; >65: >465

NA=Not applicable
1. Dmitrienko AA, Sides GD, Winters KJ, Kovacs RJ, Rebhun DM, Bloom JC, Groh W, Eisenberg PR. Electrocardiogram reference
ranges derived from a standardized clinical trial population. DRUG INF J 39:395-405; 2005
Table 9.2
Categorical Criteria for Abnormal Treatment-Emergent Blood Pressure
and Pulse Measurement, and Categorical Criteria for Weight and
Temperature Changes for Adults

Parameter | Low | High
Systolic BP (mm Hg) (Supine or sitting – forearm at heart level) | ≤ 90 and decrease from baseline ≥ 20 | ≥ 140 and increase from baseline ≥ 20
Diastolic BP (mm Hg) (Supine or sitting – forearm at heart level) | ≤ 50 and decrease from baseline ≥ 10 | ≥ 90 and increase from baseline ≥ 10
Pulse (bpm) (Supine or sitting) | < 50 and decrease from baseline ≥ 15 | > 100 and increase from baseline ≥ 15
Temperature | < 96 degrees F and decrease ≥ 2 degrees F | ≥ 101 degrees F and increase ≥ 2 degrees F
10. References [To be further developed]
Amit O, Heiberger RM, and Lane PW. Graphical approaches to the analysis of safety data from clinical trials.
Pharmaceut. Statist. 2008; 7: 20–35. doi: 10.1002/pst.254.
Crowe BJ, Xia A, Berlin JA, Watson DJ, Shi H, Lin SL, et al. Recommendations for safety planning, data
collection, evaluation and reporting during drug, biologic and vaccine development: a report of the safety
planning, evaluation, and reporting team. Clinical Trials 2009; 6: 430-440.
Biological Variation: From Principles to Practice. Callum G. Fraser. Washington, DC: AACC Press, 2001, 151
pp.
Dmitrienko AA, Sides GD, Winters KJ, Kovacs RJ, Rebhun DM, Bloom JC, Groh W, Eisenberg PR.
Electrocardiogram reference ranges derived from a standardized clinical trial population. Drug Inf J 2005;
39:395-405.
McGill R, Tukey JW, and Larsen WA. Variations of Box Plots. The American Statistician 1978; 32(1): 12-16.
doi:10.2307/2683468. JSTOR 2683468.
Rosario LA, Kropp TJ, Wilson SE, Cooper CK. Join FDA/PhUSE Working Groups to help harness the power
of computational science. Drug Information Journal 2012; 46: 523-524.
Solberg HE and Grasbeck R. Reference values. Adv Clin Chem 27:1-79; 1989.
Keep if used (related to reference limits):
1) CLSI. DEFINING, ESTABLISHING, AND VERIFYING REFERENCE INTERVALS IN THE
CLINICAL LABORATORY; APPROVED GUIDELINE – THIRD EDITION. CLSI document C28-A3c. Wayne, PA: Clinical and Laboratory Standards Institute; 2008.
2) Copeland KT, Checkoway H, McMichael AJ, Holbrook RH. Bias due to misclassification in the
estimation of relative risk. AM J EPIDEMIOL 105:488-495; 1977.
3) Dixon WJ. Processing Data for Outliers. BIOMETRICS 9:74-89; 1953.
4) Horn PS, Pesce AJ. REFERENCE INTERVALS: A USER’S GUIDE. Washington, DC: AACC Press;
2005.
5) O’Neill RT. Assessment of safety. Chapter 13 in Peace KE (Ed.) BIOPHARMACEUTICAL
STATISTICS FOR DRUG DEVELOPMENT. New York: Marcel Dekker; 1988.
6) Quade D, Lachenbruch PA, Whaley FS, McClish DK, Haley RW. Effects of misclassifications on
statistical inferences in epidemiology. AM J EPIDEMIOL 111:503-515; 1980.
7) Reed AH, Henry RJ, Mason WB. Influence of the statistical method used on the resulting estimate of
normal range. CLIN CHEM 17:275-284; 1971.
8) Thompson WL, Brunelle RL, Enas GG, Simpson PJ. Routine laboratory tests in clinical trials:
interpretation of results. J CLIN RESEARCH AND DRUG DEV 1:95-119; 1987.
9) Tukey J. Exploratory Data Analysis. Reading, MA: Addison-Wesley; 1977.
Chuang-Stein C, Beltangady M. Reporting cumulative proportion of subjects with an adverse event based on
data from multiple studies. Pharmaceut Statist, 10(1), 3-7 (2011).
The FDA/Industry/Academia Safety Graphics Working Group [reference to be added]
11. Acknowledgements
The key contributors include: xxxx.
Additional contributors and members of the white paper project within the PhUSE Development of Standard
Scripts for Analysis and Programming Working Group include:
Acknowledgement to others who provided text for various sections, review comments, and/or participated in
discussions related to methodology: .
12. Appendix
Figure 12.1
Summary for Quantitative Safety Measures – Individual Study
Figure 12.2
Summary of Common Treatment Emergent Abnormal for Quantitative Safety Measures – Individual Study
Table 12.1
Shift from Normal/high to Low and from Normal/low to High for Laboratory Measures
Laboratory Tests
Shift from Normal/high to Low and from Normal/low to High

Laboratory    Abnormality
Test          Direction     Treatment   N     n (%)      P value*
Lab Test 1    Low           T1          xxx   xx(xx.x)   .xxx
                            T2          xxx   xx(xx.x)   .xxx
                            PL          xxx   xx(xx.x)
              High          T1          xxx   xx(xx.x)   .xxx
                            T2          xxx   xx(xx.x)   .xxx
                            PL          xxx   xx(xx.x)
Lab Test 2    Low           T1          xxx   xx(xx.x)   .xxx
                            T2          xxx   xx(xx.x)   .xxx
                            PL          xxx   xx(xx.x)
              High          T1          xxx   xx(xx.x)   .xxx
                            T2          xxx   xx(xx.x)   .xxx
                            PL          xxx   xx(xx.x)
…             …             …           …     …
Lab Test n    Low           T1          xxx   xx(xx.x)   .xxx
                            T2          xxx   xx(xx.x)   .xxx
                            PL          xxx   xx(xx.x)
              High          T1          xxx   xx(xx.x)   .xxx
                            T2          xxx   xx(xx.x)   .xxx
                            PL          xxx   xx(xx.x)

Abbreviations: N = number of patients with a normal (i.e., not low if calculating ‘low’ and not high
if calculating ‘high’) baseline and at least one post-baseline measure, n = number of patients with an
abnormal post-baseline result in the specified category.
*P values are from Fisher’s Exact test, compared with PL.
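
The counting rule in the abbreviations and the Fisher's Exact comparison with PL can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the working group's standard script: the record layout (treatment, baseline flag, list of post-baseline flags), the flag values "LOW"/"NORMAL"/"HIGH", the function names, and the two-sided form of the test are all hypothetical choices made for the example.

```python
from math import comb


def shift_counts(subjects, direction):
    """Return {treatment: (N, n)} for one abnormality direction.

    subjects: iterable of (treatment, baseline_flag, post_baseline_flags),
    with flags "LOW", "NORMAL" or "HIGH" (hypothetical layout).
    N counts subjects whose baseline is not already `direction` and who have
    at least one post-baseline result; n counts those with any post-baseline
    result equal to `direction`. A missing baseline (None) is excluded here,
    which is an assumption of this sketch.
    """
    counts = {}
    for treatment, baseline, post in subjects:
        if baseline is None or baseline == direction or not post:
            continue
        big_n, small_n = counts.get(treatment, (0, 0))
        counts[treatment] = (big_n + 1, small_n + int(direction in post))
    return counts


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the observed
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    total = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-7)  # small tolerance for float ties
    )
    return min(1.0, total)


def p_value_vs_placebo(counts, treatment, placebo="PL"):
    """Compare n/N for one treatment arm with the placebo arm."""
    big_n_t, n_t = counts[treatment]
    big_n_p, n_p = counts[placebo]
    return fisher_exact_two_sided(n_t, big_n_t - n_t, n_p, big_n_p - n_p)
```

Calling `shift_counts(subjects, "LOW")` and then `p_value_vs_placebo(counts, "T1")` would fill one N, n (%), and P value row of the shell for the Low direction; the High direction uses the same functions with `"HIGH"`.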
Table 12.2
Shift from Normal/high to Low and from Normal/low to High – Integrated Database

Laboratory Test                                                    Heterogeneity
(unit)            Direction   Treatment   N     n (%)      OR*a    P value*b       P value*c
Lab Test 1        High        A           xxx   xx(xx.x)   xx.xx   .xxx            .xxx
                              B           xxx   xx(xx.x)
                  Low         A           xxx   xx(xx.x)   xx.xx   .xxx            .xxx
                              B           xxx   xx(xx.x)
…                 …           …           …     …
Lab Test n        High        A           xxx   xx(xx.x)   xx.xx   .xxx            .xxx
                              B           xxx   xx(xx.x)
                  …

Abbreviations: N = number of patients with a normal baseline and at least one post-baseline measure, n = number of
patients with abnormal post-baseline result. OR = Mantel-Haenszel odds ratio; add more as needed (alphabetically).
*a – Mantel-Haenszel Odds Ratio stratified by study. Treatment B is numerator, treatment A is denominator.
*b – Heterogeneity of odds ratios across studies was assessed using the Breslow-Day test.
*c – P values are from Cochran-Mantel-Haenszel (CMH) test of general association stratified by study.
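
As an illustration of footnotes *a and *c, the sketch below computes a Mantel-Haenszel odds ratio and a CMH statistic from one 2x2 table per study. It is a simplified example under assumptions: the tuple layout and function name are hypothetical, the CMH statistic is computed without a continuity correction, and for a 2x2 table per stratum it is referred to a chi-square distribution with 1 degree of freedom. The Breslow-Day test of footnote *b is not shown.

```python
from math import erfc, sqrt


def mantel_haenszel(strata):
    """Return (MH odds ratio, CMH chi-square, p-value) across studies.

    strata: list of (a, b, c, d), one tuple per study (hypothetical layout):
      a = treatment B patients with the abnormality
      b = treatment B patients without it
      c = treatment A patients with the abnormality
      d = treatment A patients without it
    Treatment B is the numerator of the odds ratio, treatment A the
    denominator.
    """
    num = den = diff = var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        num += a * d / n
        den += b * c / n
        expected = (a + b) * (a + c) / n
        diff += a - expected
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))

    if den > 0:
        odds_ratio = num / den
    else:
        odds_ratio = float("inf") if num > 0 else float("nan")

    chi2 = diff * diff / var if var > 0 else 0.0
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value
```

For example, `mantel_haenszel([(10, 20, 5, 25), (8, 22, 6, 24)])` pools two hypothetical studies; with a single stratum the odds ratio reduces to the ordinary cross-product ratio a*d / (b*c).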
Table 12.2
Shift Table Analyses
Lab Test Name
Shifts from Last Baseline to Last Post-Baseline Result

                                    Post-Baseline Result
              Baseline    Low        Normal     High       Total
Treatment     Result      n (%)      n (%)      n (%)      n (%)
T1            Low         xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
(N = xxx)     Normal      xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
              High        xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
              Missing     xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
              Total       xx(xx.x)   xx(xx.x)   xx(xx.x)
T2            Low         xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
(N = xxx)     Normal      xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
              High        xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
              Missing     xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
              Total       xx(xx.x)   xx(xx.x)   xx(xx.x)
PL            Low         xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
(N = xxx)     Normal      xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
              High        xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
              Missing     xx(xx.x)   xx(xx.x)   xx(xx.x)   xx(xx.x)
              Total       xx(xx.x)   xx(xx.x)   xx(xx.x)

              Decreased   Same       Increased
Treatment     n (%)       n (%)      n (%)      P value*
T1            xx(xx.x)    xx(xx.x)   xx(xx.x)   .xxx
T2            xx(xx.x)    xx(xx.x)   xx(xx.x)   .xxx
PL            xx(xx.x)    xx(xx.x)   xx(xx.x)

Abbreviations: N = number of patients with a baseline and post-baseline result; n = number of patients
in category; add more as needed (alphabetically).
*P values are from likelihood-ratio chi-square test, compared with PL.
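
The Decreased/Same/Increased summary and its likelihood-ratio chi-square comparison with PL could be computed along the following lines. This is a sketch under assumptions rather than a definitive implementation: the category ordering Low < Normal < High, the function names, and the use of a 2x3 table (one treatment arm versus PL) with 2 degrees of freedom are choices made for the example. The closed-form p-value applies only when all three shift categories are observed in the pooled arms.

```python
from math import exp, log

# Assumed ordering of the reference-range categories.
ORDER = {"LOW": 0, "NORMAL": 1, "HIGH": 2}


def shift_direction(baseline, post):
    """Classify a shift from last baseline to last post-baseline category."""
    delta = ORDER[post] - ORDER[baseline]
    if delta < 0:
        return "Decreased"
    if delta > 0:
        return "Increased"
    return "Same"


def likelihood_ratio_chi_square(treatment_counts, placebo_counts):
    """Return (G2, p-value) for a 2x3 table of shift counts.

    Each argument is (decreased, same, increased) for one arm.
    G2 = 2 * sum(observed * ln(observed / expected)), where expected counts
    come from the row and column totals. Cells with zero observed count
    contribute nothing to the sum.
    """
    rows = [treatment_counts, placebo_counts]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    g2 = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            if observed > 0:
                expected = row_total * col_total / grand_total
                g2 += 2.0 * observed * log(observed / expected)
    # Chi-square survival function for exactly 2 degrees of freedom.
    p_value = exp(-g2 / 2.0)
    return g2, p_value
```

For instance, `likelihood_ratio_chi_square((5, 10, 15), (15, 10, 5))` compares a hypothetical treatment arm with PL; identical distributions in both arms give a statistic of 0 and a p-value of 1.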
Figure 12.4 Scatter Plot and Shift Summary for Quantitative Safety Measures – Integrated Database
Table 12.3 Shift from Normal to Abnormal Summary for Qualitative Safety Measures – Individual Study
Laboratory Test   Treatment   N     n (%)      P value*
Lab Test 1        T1          xxx   xx(xx.x)   .xxx
                  T2          xxx   xx(xx.x)   .xxx
                  PL          xxx   xx(xx.x)
Lab Test 2        T1          xxx   xx(xx.x)   .xxx
                  T2          xxx   xx(xx.x)   .xxx
                  PL          xxx   xx(xx.x)
…                 …           …     …
Lab Test n        T1          xxx   xx(xx.x)   .xxx
                  T2          xxx   xx(xx.x)   .xxx
                  PL          xxx   xx(xx.x)

Abbreviations: N = number of patients with a normal baseline and at least one post-baseline measure,
n = number of patients with an abnormal post-baseline result.
* – P values are from Fisher’s Exact test, compared with PL.