A Sample Size Formula for the Supremum Log-Rank
Statistic
University of Wisconsin-Madison
Department of Biostatistics and Medical Informatics
Technical Report 179
Kevin Hasegawa Eng and Michael R. Kosorok
Department of Biostatistics & Medical Informatics,
University of Wisconsin, Madison, Wisconsin 53792, U.S.A.
email: [email protected]
Summary. An advantage of the supremum log-rank over the standard log-rank statistic is
an increased sensitivity to a wider variety of stochastic ordering alternatives. In this paper,
we develop a formula for sample size computation for studies utilizing the supremum log-rank statistic. The idea is to base power on the proportional hazards alternative, so that the
supremum log-rank will have the same power as the standard log-rank in the setting where
the standard log-rank is optimal. This results in a slight increase in sample size over that
required for the standard log-rank. For example, a 5.733% increase occurs for a two-sided
test having type I error 0.05 and power 0.80. This slight increase in sample size is offset by
the significant gains in power the supremum log-rank test achieves for a wide range of non-proportional hazards alternatives. A small simulation study is used for illustration. These
results should facilitate the wider use of the supremum log-rank statistic in clinical trials.
Key words. Brownian motion with drift, Contiguous alternatives, Counting processes,
Renyi type supremum, Sample size formula, Weighted log-rank statistics.
1. Introduction
Schoenfeld’s (1983) sample size formula for the log-rank test for two-sample, censored data
is well known and widely used. While the log-rank test assumes that the relative hazard
functions of the two samples are not time varying, Stablein, Carter and Novak (1981) establish
several biological examples where this assumption is not viable, such as where we might
observe a transitory treatment effect, and note that we frequently apply the log-rank test
inappropriately. For some of these stochastic ordering alternatives, the log-rank test has no
power to detect a group difference. This happens in particular when the hazards cross but
the survival functions remain ordered. An example of this phenomenon will be presented in
a simulation study given in section 6.
Lakatos (1988) and Ahnn and Anderson (1998) derive sample size estimators for the
Tarone-Ware class (Tarone and Ware, 1977) of statistics under an assumption of non-proportional hazards, arguing that a weighted test offers better sensitivity to these time-dependent hazards. The log-rank statistic is included in both the Tarone-Ware and the
$G^{\rho,\gamma}$ class (Harrington and Fleming, 1982) of statistics, all of which fall into the family of
weighted log-rank statistics. As shown in Gill (1980), a weighted log-rank statistic with
estimated weight function consistent for φ(t) gives the optimal power against the contiguous
hazards alternative with time-varying proportionality function φ(t). Hence misspecifying the
weight function can result in less-than-optimal power.
Fleming, Harrington, and O'Sullivan (1987) conclude that one-sided Renyi-type supremum statistics are "nearly" as powerful as the traditional statistics under the proportional
hazards assumption. More importantly, they are more sensitive than their linear rank counterparts in cases where the hazards are non-proportional. Kosorok and Lin (1999) conduct
power simulations which verify that the two-sided supremum log-rank is superior to the
traditional log-rank in a variety of non-proportional hazards settings. Kosorok and Lin
also provide a cautionary note that the supremum log-rank might in some settings be too
sensitive to non-stochastic ordering alternatives at the expense of more clinically relevant
alternatives, and it may be better to use the supremum plus infimum log-rank in these settings. While the evidence from their simulation studies partially supports this conjecture,
the results are inconclusive, except that it is quite clear that both the supremum and the
supremum plus infimum versions have greater power than the usual log-rank for a substantial
variety of meaningful stochastic ordering alternatives.
The goal of this paper is to derive a sample size formula based on the limiting distribution
of the two-sided supremum weighted log-rank statistic. The idea is to base power on the
corresponding weighted proportional hazards alternative, so that the supremum statistic will
have the same power as the associated weighted log-rank in the setting where the weighted
log-rank is optimal. In other words, a sample size is first computed, via Schoenfeld’s (1983)
formula, based on the weighted log-rank test and the corresponding contiguous alternative
for which that weight is optimal, then an adjustment for the supremum version is computed.
Somewhat surprisingly, this adjustment ends up not depending on the chosen weight function.
A similar sample size algorithm for the supremum plus infimum version would also be worth
investigating, but this appears to be much more difficult technically and will not be pursued
further in this paper.
A brief description of weighted log-rank statistics and their supremum versions is given
in section 2. Section 3 provides the main derivation of the sample size formula. Section 4
outlines the algorithm for applying the sample size formula. Several sample size results are
presented in section 5, and a small simulation study is given in section 6. A few concluding
comments are given in section 7. Several technical details are given in the appendix.
2. Background
Weighted log-rank statistics are based on the counting process integral (Fleming and Harrington, 1991; Kalbfleisch and Prentice, 2002)
$$Z_n(t) = n^{-1/2}\int_0^t \hat{W}_n(s)\,\frac{Y_1(s)\,Y_2(s)}{Y_1(s)+Y_2(s)}\left[\frac{dN_1(s)}{Y_1(s)} - \frac{dN_2(s)}{Y_2(s)}\right], \qquad (1)$$
where $N_j$ and $Y_j$ are the counting and at-risk processes for group $j$, $j = 1, 2$, and where $\hat{W}_n$ is an estimated weight function. Let $n = n_1 + n_2$ be the total number of subjects, with $n_j$ subjects in group $j$, and let $\tau$ be the smallest $t$ such that $Y_1(t)Y_2(t) = 0$ almost surely for all $n$ large enough, i.e., $\tau$ is the end of study time. We make the standard assumption that $\hat{W}_n$ is uniformly consistent for some constant limiting weight function $\phi$ over all closed subintervals of $[0, \tau)$. For each $t \in [0, \tau]$, the variance of $Z_n(t)$ can be consistently estimated by
$$\sigma_n^2(t) = n^{-1}\int_0^t \left[\hat{W}_n(s)\right]^2 \frac{Y_1(s)\,Y_2(s)}{Y_1(s)+Y_2(s)}\,\frac{dN_1(s)+dN_2(s)}{Y_1(s)+Y_2(s)},$$
when the integrated hazard functions are continuous. Using the standardized process $T_n(t) = Z_n(t)/\sigma_n(\tau)$, the standardized weighted log-rank statistic is $T_n(\tau)$, while the corresponding supremum version is $\sup_{t\in(0,\tau]} |T_n(t)|$.
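For concreteness, here is a minimal R sketch of this statistic in the unweighted case ($\hat{W}_n = 1$). It is our own illustration, not the authors' software, and it relies on the identity that the increments of (1) then reduce to the usual "observed minus expected" form, with the powers of $n$ cancelling in $T_n$:

```r
# Minimal sketch of sup_t |T_n(t)| for the unweighted case (W_hat = 1).
# time: follow-up times; status: 1 = event, 0 = censored; group: 1 or 2.
sup_logrank <- function(time, status, group) {
  tt <- sort(unique(time[status == 1]))                      # event times
  Y1 <- sapply(tt, function(s) sum(time >= s & group == 1))  # at risk, group 1
  Y2 <- sapply(tt, function(s) sum(time >= s & group == 2))  # at risk, group 2
  dN1 <- sapply(tt, function(s) sum(time == s & status == 1 & group == 1))
  dN2 <- sapply(tt, function(s) sum(time == s & status == 1 & group == 2))
  dZ <- dN1 - Y1 * (dN1 + dN2) / (Y1 + Y2)   # increments of (1): "O minus E"
  dV <- Y1 * Y2 * (dN1 + dN2) / (Y1 + Y2)^2  # variance increments
  Tn <- cumsum(dZ) / sqrt(sum(dV))           # T_n(t) = Z_n(t) / sigma_n(tau)
  max(abs(Tn))                               # supremum over the event times
}
```

Under the null hypothesis, an approximate two-sided p-value is $1 - G(\sup_t |T_n(t)|)$, where $G$ is the limiting distribution function given in the series (10) of section 4.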
We will base our sample size studies on the contiguous time-varying proportional hazards alternative wherein the hazard functions for the two groups are $\lambda_{n1}(t) = e^{\gamma^*\phi(t)/(2\sqrt{n})}\,\lambda_0(t)$ and $\lambda_{n2}(t) = e^{-\gamma^*\phi(t)/(2\sqrt{n})}\,\lambda_0(t)$, respectively, where $\lambda_0$ is a continuous baseline hazard and $\gamma^*$ is a scalar constant. This is the standard contiguous alternative used for power and sample size studies (see, for example, Kosorok and Lin, 1999).
3. Sample Size Formula
We now derive the sample size formula for the supremum log-rank based on the statistic
and contiguous alternative presented in section 2. After presenting the derivation, we briefly
describe its relationship to Schoenfeld’s formula.
3.1 Derivation
As discussed in Fleming and Harrington (1991), the quantity $M_j(t) = N_j(t) - \int_0^t Y_j(s)\,d\Lambda_{nj}(s)$, where $\Lambda_{nj}$ is the integral of $\lambda_{nj}$, is a continuous-time martingale, for $j = 1, 2$. Hence we can re-express (1) in terms of martingales as follows:
$$Z_n(t) = n^{-1/2}\int_0^t \hat{W}_n(s)\,\frac{Y_1(s)\,Y_2(s)}{Y_1(s)+Y_2(s)}\left[\frac{dM_1(s)}{Y_1(s)} - \frac{dM_2(s)}{Y_2(s)}\right] \qquad (2)$$
$$\phantom{Z_n(t) =}\; + \; n^{-1/2}\int_0^t \hat{W}_n(s)\,\frac{Y_1(s)\,Y_2(s)}{Y_1(s)+Y_2(s)}\,\big(d\Lambda_{n1}(s) - d\Lambda_{n2}(s)\big). \qquad (3)$$
The martingale central limit theorem yields that the martingale part $G_n(t)$ on the right-hand side of (2) converges weakly to a mean zero Gaussian process with independent increments, with variance equal to the limiting value of the predictable compensator of $G_n^2$. This predictable compensator is
$$n^{-1}\int_0^t \hat{W}_n^2(s)\,\frac{Y_1^2(s)\,Y_2^2(s)}{(Y_1(s)+Y_2(s))^2}\left[\frac{d\Lambda_{n1}(s)}{Y_1(s)} + \frac{d\Lambda_{n2}(s)}{Y_2(s)}\right] \;\approx\; n^{-1}\int_0^t \phi^2(s)\,\frac{Y_1(s)\,Y_2(s)}{Y_1(s)+Y_2(s)}\,d\Lambda_0(s), \qquad (4)$$
where $\Lambda_0$ is the integral of $\lambda_0$; the approximation uses the identity $[Y_1Y_2/(Y_1+Y_2)]^2(1/Y_1 + 1/Y_2) = Y_1Y_2/(Y_1+Y_2)$, together with the uniform consistency of $\hat{W}_n$ for $\phi$ and the fact that $d\Lambda_{n1}$ and $d\Lambda_{n2}$ converge to $d\Lambda_0$ under the contiguous alternative.
Define $\pi_j$ to be the limiting value of $Y_j(s)/n_j$, $j = 1, 2$, and assume that the censoring distribution is the same for all individuals and hence $\pi_1 = \pi_2\ (= \pi_0)$. Also, assume that $n_j/n$, the proportion of individuals assigned to group $j$, converges to $a_j \in (0, 1)$, $j = 1, 2$. Thus (4) is consistent for
$$\int_0^t \phi^2(s)\,\frac{a_1 a_2\,\pi_1(s)\,\pi_2(s)}{a_1\pi_1(s) + a_2\pi_2(s)}\,d\Lambda_0(s) \;=\; a_1 a_2 \int_0^t \phi^2(s)\,\pi_0(s)\,d\Lambda_0(s) \;=\; a_1 a_2\, D_\phi(t),$$
uniformly over $t \in (0, \tau]$, where $D_\phi(t) = \int_0^t \phi^2(s)\,dD(s)$ and $D(t)$ is the probability of observing an event by time $t$. Thus the limiting distribution of $G_n$ is $W(a_1 a_2 D_\phi(t))$, where $W$ is a standard Brownian motion.
For the remaining component (3), note that
$$n^{1/2}\big(d\Lambda_{n1}(s) - d\Lambda_{n2}(s)\big) = n^{1/2}\left(e^{\gamma^*\phi(s)/(2\sqrt{n})} - e^{-\gamma^*\phi(s)/(2\sqrt{n})}\right)\lambda_0(s)\,ds.$$
By Taylor expansion, this is equal to $\gamma^*\phi(s)\,\lambda_0(s)\,ds\,(1 + o(1))$, where $o(1)$ is an error term going to zero uniformly in $s$ as $n \to \infty$. Now arguing as we did for $G_n$, the component (3) converges uniformly in probability to $\gamma^* a_1 a_2 D_\phi(t)$.
As is generally done for sample size formulas, power is based on a fixed alternative rather than on a contiguous alternative. The idea is that asymptotic arguments will also approximately apply for moderately large sample sizes $n$ and a reasonable fixed alternative $\gamma$. Thus, under the assumed contiguous alternative, we can obtain approximate power calculations by setting $\gamma^* = \sqrt{n}\,\gamma$. Hence we obtain that $Z_n$ converges weakly to the Gaussian process $W(a_1 a_2 D_\phi(t)) + \sqrt{n}\,\gamma\,a_1 a_2 D_\phi(t)$. Using similar arguments, we can show that $\sigma_n^2(t)$ converges uniformly to $a_1 a_2 D_\phi(t)$. Hence, if we let $u(t) = D_\phi(t)/D_\phi(\tau)$, $T_n(t)$ converges weakly to
$$T(t) = \frac{W(a_1 a_2 D_\phi(t))}{\sqrt{a_1 a_2 D_\phi(\tau)}} + \frac{\gamma\,a_1 a_2\,\sqrt{n}\,D_\phi(t)}{\sqrt{a_1 a_2 D_\phi(\tau)}} \;\sim\; W\!\left(\frac{D_\phi(t)}{D_\phi(\tau)}\right) + \gamma\sqrt{a_1 a_2\,n\,D_\phi(\tau)}\,\frac{D_\phi(t)}{D_\phi(\tau)} \;=\; W(u(t)) + \mu\,u(t), \quad \mu = \gamma\sqrt{a_1 a_2\,D}, \qquad (5)$$
where $\sim$ denotes equality in distribution (the first term uses the Brownian scaling property $W(ct) \sim \sqrt{c}\,W(t)$ with $c = a_1 a_2 D_\phi(\tau)$), $D = n D_\phi(\tau)$, and $u(t)$ ranges over $[0, 1]$.
Expression (5) is the equation for a Brownian motion process with drift (non-centrality parameter) $\mu$, which we denote hereafter as $W_\mu(u)$ (hence $W(u) = W_0(u)$). Let $S_{1-\alpha}$ be the two-sided critical value for the supremum of standard Brownian motion, i.e.,
$$P\left(\sup_{u\in[0,1]} |W(u)| > S_{1-\alpha}\right) = \alpha. \qquad (6)$$
To compute power, we are interested in the probability of both rejecting the null hypothesis and concluding the correct sign of the treatment effect under the alternative. This is essentially equivalent to computing the probability of the event $\sup_{u\in[0,1]} W_\mu(u) > S_{1-\alpha}$ when $\mu > 0$, or of the event $\inf_{u\in[0,1]} W_\mu(u) < -S_{1-\alpha}$ when $\mu < 0$. By symmetry of Brownian motion, both of these probabilities are equal. Thus we will only consider the $\mu > 0$ case.
From the joint distribution of $\sup_{u\in[0,1]} W_\mu(u)$ and $W_\mu(1)$ given in Borodin and Salminen (2002, p. 251), we obtain after integration
$$P\left(\sup_{u\in[0,1]} W_\mu(u) > x\right) = \bar\Phi(x-\mu) + e^{2\mu x}\,\bar\Phi(x+\mu),$$
where $\bar\Phi = 1 - \Phi$ and $\Phi$ is the standard normal cumulative distribution. Thus, to compute the sample size needed to achieve a power $1-\beta$ with a two-sided type I error of $\alpha$, we solve for $\mu$ in the expression
$$\bar\Phi(S_{1-\alpha} - \mu) + e^{2\mu S_{1-\alpha}}\,\bar\Phi(S_{1-\alpha} + \mu) = 1 - \beta. \qquad (7)$$
We then compute
$$D = \frac{\mu^2}{a_1 a_2 \gamma^2}, \qquad (8)$$
where $D = n D_\phi(\tau)$. Note that when $\phi = 1$, $D$ is the expected number of events. More generally, $D_\phi(\tau)$ must be estimated. The required sample size is then $n = D/D_\phi(\tau)$.
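As a worked illustration (the numbers here follow from (6)–(8) and agree with the 5.733% increase quoted in the Summary): for $\alpha = 0.05$ and $1-\beta = 0.80$, the critical value is $S_{0.95} \approx 2.2414$ and solving (7) numerically gives $\mu \approx 2.881$. With $\phi = 1$, $a_1 = a_2 = 1/2$, and a two-fold hazard ratio ($\gamma = \log 2$), (8) then gives $D = (2.881)^2/\{0.25\,(\log 2)^2\} \approx 69.1$, i.e., about 70 expected events.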
3.2 Relationship to Schoenfeld’s Formula
Schoenfeld's (1983) formula is based on the limiting distribution of $T_n(\tau)$ under the same contiguous alternative we used in section 3.1. This limiting distribution is $W_\mu(1)$, which is a normal deviate with mean $\mu$ and variance 1. The critical value in this case is $Z_{1-\alpha/2}$, where $Z_q$ is the $q$th quantile of a standard normal distribution. Hence we must solve for $\mu$ in $\bar\Phi(Z_{1-\alpha/2} - \mu) = 1 - \beta$ rather than in (7). The resulting solution is $\tilde\mu = Z_{1-\alpha/2} + Z_{1-\beta}$, where we use the tilde to distinguish this noncentrality parameter from the solution of (7). Hence expression (8) becomes
$$D = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2}{a_1 a_2 \gamma^2}, \qquad (9)$$
which is Schoenfeld's (1983) formula when $\phi = 1$, since $\gamma$ is then the log of the targeted hazard ratio.
The ratio of the sample size required for the supremum log-rank relative to the usual log-rank for the same effect size $\gamma$ is thus the ratio of the $D$ computed in (8) to the $D$ computed in (9). Since the denominators cancel, the ratio becomes $R = \mu^2/\tilde\mu^2$, where $\mu$ solves (7). Hence the focus of our algorithm is to compute $R$ for a given type I error $\alpha$, type II error $\beta$, proportionality function $\phi$, log hazard ratio $\gamma$, and proportion $a_1$ assigned to group 1. Then one determines the sample size $\tilde n$ based on the log-rank. Thus the conservative sample size we recommend using for the supremum log-rank is $n = R\tilde n$.
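Continuing the worked illustration above, $\tilde\mu = Z_{0.975} + Z_{0.80} = 1.960 + 0.842 = 2.802$, so $R = (2.881/2.802)^2 \approx 1.0573$: the supremum log-rank requires about 5.7% more events (roughly 70 rather than 66 in the two-fold hazard ratio example).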
4. Algorithm
Software in the R programming language for computing $R = (\mu/\tilde\mu)^2$ from section 3.2 can be downloaded from the web site http://www.biostat.wisc.edu/~kosorok/renyi.html. This software also includes a program for computing supremum weighted log-rank statistics and the accompanying p-values.
The solution $S_{1-\alpha}$ to expression (6) must first be computed. We accomplish this by applying Newton-Raphson iteration to Billingsley's (1968, p. 77–80) formula for the cumulative probability distribution
$$G(x) = P\left(\sup_{t\in[0,1]} |W(t)| \le x\right) = \frac{4}{\pi}\sum_{k=0}^{\infty} \frac{(-1)^k}{2k+1}\exp\left(-\frac{(2k+1)^2\pi^2}{8x^2}\right), \qquad (10)$$
which has density
$$g(x) = \frac{\pi}{x^3}\sum_{k=0}^{\infty} (-1)^k (2k+1)\exp\left(-\frac{(2k+1)^2\pi^2}{8x^2}\right). \qquad (11)$$
As we show in the appendix, these series converge quite quickly, and we can easily determine
an upper bound for the required lengths of these series to ensure a specified error.
We then use Newton-Raphson iteration to solve for µ in (7). Interestingly, this necessitated writing a new algorithm for computing the tail probability of a standard normal
deviate since the default R function for doing this was not sufficiently accurate. This new
function is incorporated in the downloadable software mentioned above.
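The following is a minimal R sketch of this algorithm (our own illustration, not the downloadable software; in particular, pnorm(..., log.p = TRUE) is used here as one simple way to keep the normal tail probability in (7) accurate, not necessarily the approach taken in the posted code):

```r
# G(x): P(sup |W(t)| <= x over [0, 1]), Billingsley's series (10), truncated.
G <- function(x, m = 20) {
  k <- 0:m
  (4 / pi) * sum((-1)^k / (2 * k + 1) * exp(-(2 * k + 1)^2 * pi^2 / (8 * x^2)))
}

# g(x): the density (11), used as the derivative in the Newton-Raphson step.
g <- function(x, m = 20) {
  k <- 0:m
  (pi / x^3) * sum((-1)^k * (2 * k + 1) * exp(-(2 * k + 1)^2 * pi^2 / (8 * x^2)))
}

# Newton-Raphson for S_{1-alpha}, the root of G(S) = 1 - alpha in (6).
sup_critical_value <- function(alpha, tol = 1e-10) {
  S <- qnorm(1 - alpha / 2)        # normal critical value as a starting point
  repeat {
    step <- (G(S) - (1 - alpha)) / g(S)
    S <- S - step
    if (abs(step) < tol) return(S)
  }
}

# Left-hand side of (7): P(sup W_mu > S). The second term is evaluated on the
# log scale so exp(2*mu*S) times a tiny tail probability stays accurate.
sup_power <- function(mu, S) {
  pnorm(S - mu, lower.tail = FALSE) +
    exp(2 * mu * S + pnorm(S + mu, lower.tail = FALSE, log.p = TRUE))
}

# Inflation factor R = (mu / mu.tilde)^2 of section 3.2.
sample_size_ratio <- function(alpha, beta) {
  S <- sup_critical_value(alpha)
  mu <- uniroot(function(m) sup_power(m, S) - (1 - beta), c(0, 10),
                tol = 1e-10)$root
  (mu / (qnorm(1 - alpha / 2) + qnorm(1 - beta)))^2
}

sample_size_ratio(0.05, 0.20)      # approximately 1.0573 (a 5.733% increase)
```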
5. Results
We present in figure 1 the necessary increase in sample size ($R - 1$), as a percent, of the supremum log-rank versus the traditional log-rank, for various type I ($\alpha$) and type II ($\beta$) errors. We can see that for smaller $\alpha$ and smaller $\beta$ (greater power), the required increase in sample size is less. The values are computed from the downloadable software mentioned above. All the curves appear to be asymptotically approaching Schoenfeld's formula as the power $1 - \beta$ increases. The magnitude of the increase in sample size is comfortably small: for $\alpha = 0.05$ and $\beta = 0.2$, the required increase is only 5.733%.
The approximations for the distribution function $G$ and its inverse, and the algorithm for computing $\mu$, are all extremely fast, taking a fraction of a second to reach 7 decimal places of accuracy.
6. A simulation study
A simple simulation study was done to evaluate the proposed sample size formula and to check the sensitivity of the supremum log-rank to a non-proportional hazards alternative. Verification of type I error preservation was not performed since this was adequately verified for both the log-rank and supremum log-rank in simulations reported in Kosorok and Lin (1999). Two scenarios were considered. In scenario I, the control group had a hazard function with constant intensity 1.6, while the treatment group had a hazard function with constant intensity 0.8. In scenario II, the control group had the same hazard function as the control group in scenario I, while the treatment group hazard function had intensity 0.4 over the time interval [0, 1/3], 2.2 over the interval (1/3, 1], and 1.6 thereafter. The hazard functions and survival functions for both the control and treatment groups and both scenarios are given in figure 2 for the time interval [0, 1.2]. While the treatment and control hazards cross under scenario II, the survival functions are clearly ordered and the treatment is thus beneficial. In all groups, censoring was uniform on [0, 1], resulting in 59.4% censoring under scenario I. The probability of being assigned to the treatment group was fixed at 1/2.
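For concreteness, here is a minimal R sketch of one simulated scenario II data set under this design (our own illustration, not the authors' code; rpwexp is a helper we introduce here, and sup_logrank is the sketch from section 2):

```r
# Piecewise-constant hazard: rates[j] applies on [cuts[j], cuts[j+1]),
# with the last rate extending beyond the last cut point; cuts[1] = 0.
rpwexp <- function(n, rates, cuts) {
  H <- c(0, cumsum(rates[-length(rates)] * diff(cuts)))  # cumulative hazard at cuts
  e <- rexp(n)                                           # solve H(T) = e for T
  j <- findInterval(e, H)
  cuts[j] + (e - H[j]) / rates[j]
}

set.seed(1)
n <- 170                                     # table 1: alpha = 0.05, power 0.80
grp <- rbinom(n, 1, 0.5) + 1                 # 1 = control, 2 = treatment
ev <- numeric(n)
ev[grp == 1] <- rexp(sum(grp == 1), rate = 1.6)            # control hazard 1.6
ev[grp == 2] <- rpwexp(sum(grp == 2), rates = c(0.4, 2.2, 1.6),
                       cuts = c(0, 1/3, 1))                # treatment hazard
cens <- runif(n)                             # uniform censoring on [0, 1]
time <- pmin(ev, cens)
status <- as.numeric(ev <= cens)
sup_logrank(time, status, grp)               # supremum log-rank statistic
```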
Both scenarios were examined under four design settings based on two-sided type I error (either 0.05 or 0.01) and on power (either 0.80 or 0.90) for detecting a two-fold proportional hazards treatment effect. Schoenfeld's formula was combined with the proposed adjustment for the supremum log-rank. The resulting quantity was then divided by the estimated probability of observing an event under scenario I (0.406) and rounded up to obtain a projected sample size. These sample sizes and the increase up from the sample size based on Schoenfeld's formula are given in table 1. Note that the size of the increase is quite small (ranging from 9 to 12).
Simulation studies were performed for all eight sample size by scenario combinations, and 500 data sets were generated for each combination. Both the log-rank and the supremum log-rank were applied to each data set. The resulting observed powers and Monte Carlo standard errors (in parentheses) are also given in table 1, with scenario I and scenario II results given in separate labeled rows. McNemar's test (see p. 210 of Fisher and van Belle, 1993) for comparing proportions from paired data was used to test whether the powers for the log-rank and supremum log-rank were significantly different. The resulting p-values are given in the far right column of the table.
The fact that all observed powers for the supremum log-rank under scenario I are within twice the Monte Carlo standard error of the target power indicates that the proposed sample size algorithm performs quite well. While the observed power under scenario I for the log-rank is slightly larger than for the supremum log-rank, the difference is quite small and only statistically significant part of the time. Under scenario II, the power of the supremum log-rank is substantially higher than the power of the usual log-rank, and the difference in power between the two statistics is always highly statistically significant. These results provide evidence that the supremum log-rank can be much more powerful than the usual log-rank for meaningful non-proportional hazards alternatives with very little increase in sample size.
7. Discussion
The primary goal of this paper was to develop a sample size formula based on the supremum
log-rank statistic for use in planning clinical trials with time-to-event outcomes. In order
to ensure that there is no loss in power relative to the optimal weighted log-rank statistic,
a slight increase in sample size is required, as illustrated in figure 1 and verified in the
simulation study. In computing sample size for a clinical trial, we recommend first computing
the sample size ñ based on the optimal weighted log-rank statistic, and then using the
algorithm presented above to compute the factor R. The conservative sample size based on
the supremum log-rank is then n = ñR.
In clinical trial settings where sensitivity to a wide variety of stochastic ordering alternatives is desired, the supremum log-rank statistic should probably be the statistic of choice,
and thus the sample size formula developed in this paper will be applicable.
Acknowledgements
The second author was partially supported by grant CA75142 from the U.S. National Cancer
Institute.
References
Ahnn, S. & Anderson, S.J. (1998) Sample Size Determination in Complex Clinical Trials Comparing More than Two Groups for Survival Endpoints. Statistics in Medicine 17, 2525–2534.
Billingsley, P. (1968) Convergence of Probability Measures. New York: Wiley.
Borodin, A.N. & Salminen, P. (2002) Handbook of Brownian Motion – Facts and Formulae. Basel: Birkhäuser.
Fisher, L.D. & van Belle, G. (1993) Biostatistics: A Methodology for the Health Sciences.
New York: Wiley.
Fleming, T.R., & Harrington, D.P. (1991) Counting Processes and Survival Analysis.
New York: Wiley.
Fleming, T.R., Harrington, D.P. & O'Sullivan, M. (1987) Supremum Versions of the Log-Rank and Generalized Wilcoxon Statistics. Journal of the American Statistical Association 82, 312–320.
Gill, R.D. (1980) Censoring and Stochastic Integrals. Tract 124, Amsterdam: The Mathematical Center.
Harrington, D.P. & Fleming, T.R. (1982) A Class of Rank Test Procedures for Censored
Survival Data. Biometrika 69, 553–66.
Kalbfleisch, J.D. & Prentice, R.L. (2002) Statistical Analysis of Failure Time Data. New
York: Wiley.
Kosorok, M.R. & Lin, C.-Y. (1999) The Versatility of Function-Indexed Weighted Log-Rank Statistics. Journal of the American Statistical Association 94, 320–332.
Lakatos, E. (1988) Sample Sizes Based on the Log-Rank Statistic in Complex Clinical
Trials. Biometrics 44, 229–241.
Schoenfeld, D. A. (1983). Sample-size formula for the proportional-hazards regression
model. Biometrics 39, 499–503.
Stablein, D.M., Carter, W.H. Jr. & Novak, J.W. (1981) Analysis of survival data with
nonproportional hazard functions. Controlled Clinical Trials 2, 149–159.
Tarone, R.E. & Ware, J. (1977) On distribution-free tests for equality of survival distributions. Biometrika 64, 156–160.
Appendix: Error Approximation
If we truncate the series in (10) at $m$, the error for computing $G(x)$ is
$$\mathrm{error}_G(m; x) = \frac{4}{\pi}\sum_{k=m+1}^{\infty} \frac{(-1)^k}{2k+1}\exp\left(-\frac{(2k+1)^2\pi^2}{8x^2}\right).$$
If we wish to bound the error by $\epsilon$, the required $m$ is bounded above by
$$m_G(\epsilon) = \left\lceil \frac{x\sqrt{2}}{\pi}\sqrt{\log\frac{1}{\pi\epsilon}} - \frac{1}{2} \right\rceil, \qquad (12)$$
provided $\epsilon \le 0.1$, and where we define $\lceil u \rceil$ to be the smallest integer $\ge u$. To see this, note that
$$\mathrm{error}_G(m; x) \le \frac{4}{\pi}\sum_{k=m+1}^{\infty} \frac{1}{2k+1}\exp\left(-\frac{(2k+1)^2\pi^2}{8x^2}\right) \le \frac{4}{\pi}\int_m^{\infty} \frac{1}{2k+1}\exp\left(-\frac{(2k+1)^2\pi^2}{8x^2}\right)dk \le \frac{1}{\pi}\int_{U_m}^{\infty} \frac{e^{-u}}{u}\,du, \qquad (13)$$
where $U_m = (2m+1)^2\pi^2/(8x^2)$. If we insist that $U_m \ge 1$, then (13) is bounded above by
$$\frac{1}{\pi}\int_{U_m}^{\infty} e^{-u}\,du = \frac{1}{\pi}\,e^{-U_m}.$$
Thus an upper bound for $m$ can be obtained by solving $e^{-U_m}/\pi = \epsilon$ for $m$. The resulting solution is (12), provided $U_{m_G(\epsilon)} \ge 1$. Fortunately, whenever $\epsilon \le 0.1$, $\epsilon$ will also be less than $(\pi e)^{-1}$ and hence $U_{m_G(\epsilon)} \ge 1$.
We can use a similar approach to examine the error resulting from truncating the series in (11) at $m$ for computing the derivative $g(x)$. Denote this error
$$\mathrm{error}_g(m; x) = \frac{\pi}{x^3}\sum_{k=m+1}^{\infty} (-1)^k (2k+1)\exp\left(-\frac{(2k+1)^2\pi^2}{8x^2}\right).$$
If we wish to bound the error by $\epsilon$, the required $m$ is bounded above by
$$m_g(\epsilon) = \left\lceil \frac{x\sqrt{2}}{\pi}\sqrt{\log\frac{2}{\pi x\epsilon}} - \frac{1}{2} \right\rceil, \qquad (14)$$
provided $\epsilon \le 0.3\,(2/x)$. To show this, we use arguments similar to those used above to obtain
$$\mathrm{error}_g(m; x) \le \frac{\pi}{x^3}\int_m^{\infty} (2k+1)\exp\left(-\frac{(2k+1)^2\pi^2}{8x^2}\right)dk \le \frac{2}{x\pi}\int_{U_m}^{\infty} e^{-u}\,du = \frac{2}{x\pi}\,e^{-U_m},$$
where $U_m$ is as defined above. Thus an upper bound for $m$ can be obtained by solving $2e^{-U_m}/(\pi x) = \epsilon$ for $m$. The resulting solution is (14), provided $2/(\pi x\epsilon) \ge 1$ to ensure the existence of the square root. Fortunately, whenever $\epsilon \le 0.3\,(2/x)$, $2/(\pi x\epsilon) \ge 1$.
For computing both $G$ and $g$ in our algorithm, we truncate the infinite series in (10) and (11) at the larger of $m_G(\epsilon)$ and $m_g(\epsilon)$. We also ensure that $\epsilon$ is smaller than the lesser of 0.1 and $0.3\,(2/x)$. For example, when $x = 2$ and $\epsilon = 10^{-8}$, both $m_g(\epsilon)$ and $m_G(\epsilon)$ equal $\lceil 3.2421 \rceil = 4$, and thus we only need to sum over the first 5 terms ($k = 0, \ldots, 4$).
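These bounds are simple to evaluate; the following short R check (our own, simply restating (12) and (14)) reproduces the example above:

```r
# Truncation bounds (12) and (14); eps is the error tolerance epsilon.
mG <- function(eps, x) ceiling(x * sqrt(2) / pi * sqrt(log(1 / (pi * eps))) - 0.5)
mg <- function(eps, x) ceiling(x * sqrt(2) / pi * sqrt(log(2 / (pi * x * eps))) - 0.5)
mG(1e-8, 2)   # 4, as in the example above
mg(1e-8, 2)   # 4 (note that 2/(pi*x*eps) = 1/(pi*eps) when x = 2)
```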
Table 1: Power comparisons from simulation studies. The sample size total based on the proposed method is given along with the increase (in parentheses) above the Schoenfeld sample size. The Monte Carlo error is given in parentheses next to the observed power based on 500 simulations. Results for scenarios I and II are given in separate labeled rows. The p-value of the difference between the log-rank and supremum log-rank observed powers is also given.

Type I  Target  Sample size        Scenario  Observed power             P-value of
error   power   total (increase)             Log-rank    Sup. log-rank  difference
0.05    0.80    170 (9)            I         81.2 (1.7)  77.6 (1.9)     0.0020
                                   II        58.0 (2.2)  84.6 (1.6)     <0.0001
0.05    0.90    227 (11)           I         91.2 (1.3)  90.2 (1.3)     0.1967
                                   II        73.2 (2.2)  95.2 (1.0)     <0.0001
0.01    0.80    249 (9)            I         81.8 (1.7)  80.2 (1.8)     0.0593
                                   II        56.8 (2.2)  88.4 (1.4)     <0.0001
0.01    0.90    317 (12)           I         91.2 (1.3)  89.4 (1.4)     0.0126
                                   II        65.4 (2.1)  96.2 (0.9)     <0.0001
[Figure 1: percent increase (vertical axis, roughly 2 to 8) plotted against power 1 − beta (horizontal axis, 0.5 to 1.0), with one curve for each of alpha = 0.1, 0.05, 0.02, 0.01, 0.005, and 0.001.]
Figure 1: Percent increase in sample size for the supremum log-rank over the standard log-rank for various type I errors (alpha) and powers (1 − beta).
[Figure 2: four panels over the time interval [0, 1.2]; hazard functions (intensity) on the left, survival functions (probability) on the right, for scenarios I (top) and II (bottom).]
Figure 2: Hazard (left) and survival (right) functions for scenarios I (top) and II (bottom). Dashed lines are for the control group while solid lines are for the treatment group.