Design and analysis of clinical correlative studies

4/27/16
Fangxin Hong, Ph.D.
Sr. Research Scientist
Dept. Biostatistics and Computational Biology
Dana-Farber Cancer Institute

Disclosure:
- Goal: practically useful.
  Does not try to be comprehensive; does not plan to present the most rigorous methods or the latest developments in the field.
- Scope: Discover or validate markers using archived specimens.
  (Prospective) correlative studies embedded in a clinical study protocol, or (retrospective) studies in grants using samples/data from a clinical study.
  => Will not cover pre-clinical studies: no cell-line, no animal experiments.
- Will not distinguish prognostic from predictive biomarkers, or prospective from retrospective studies.
  A (retrospective) study design may have evidentiary value close to that of a prospective study under certain conditions (Simon et al., 2009).

Outline
- Overview
  What is a biomarker?
  What is a correlative study?
  General principles and guidelines for design, analysis and reporting
- Study types
  Types of sampling
  Types of marker value and outcome data
  Types of tests/analyses
- Study design and analysis with examples
  Power/sample size calculation: approaches and programming code
  Statistical section write-up
  Analysis plan and actual analysis

What is a Biomarker?
- Biomarker
  A characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention.
  (Biomarkers Definitions Working Group, National Institutes of Health, 2001)
  Examples: genetic markers: gene expression, mutation status, SNP, ...; imaging biomarkers: PET +/- ...; molecular biomarkers: circulating tumor cells ...; pharmacologic markers: serum concentration ...; pathogenic biomarkers: EBV viral load ...
- Assay
  A method for determining the presence or quantity of a component
- Test
  A procedure that makes use of an assay for a particular purpose
Good biomarkers ≠ Good assays ≠ Tests

Types of Clinical Biomarkers:
Prediagnosis: risk; screening; early detection
Diagnosis: confirmation; staging; subtyping
Pretreatment: prognostic; predictive
Intratreatment: early response or futility; toxicity monitoring
Posttreatment: early endpoint; recurrence/progression monitoring

What is a Correlative Study?
Correlative science is a term used for showing the relationship between biology (e.g., biomarkers) and clinical outcomes (e.g., disease progression). This is the promise of personalized medicine.
- Addresses the question: What is the correlation between X and Y?
  - Cancer and obesity
  - QoL score and treatment
  - Response and performance status
  - Prognosis (progression) and gene mutation (KRAS)
Timing of correlative studies
- As correlative objectives of a clinical trial/protocol (prospective study)
  This has become less common, as the science is changing rapidly. It is hard to predict which markers will be of interest when the trial completes (usually 5-10 years after concept development).
- Grant application using archived samples from a completed or ongoing clinical trial.

Clinical correlative studies: Design
Guideline principles
- Defining clear objectives: descriptive or inferential
- Specifying the study sample*: all patients or selected patients
- Specifying design and procedures:
  Defining correlative measures: what will be quantified in each sample
  Sampling time points: baseline, mid-treatment, end of treatment, ...
  Choice of outcome measures: ORR, PFS, OS, toxicity
- Specifying methods and considering power and sample size**
- Writing the statistical analysis plan: should be outlined for each objective
* Full-cohort design vs. cohort-sampling design
** Depends on whether the objective is descriptive or inferential.

Analysis and Reporting

Type of sampling
For various reasons, including the cost of biomarker evaluation and lack of available tissue or of consent for use of tissue, biomarkers are often evaluated on only a subset of the patients on the study.
- Full-cohort design: use all patients who have a bio-specimen (or biomarker data) available (or who have a PET scan);
- Cohort-sampling design: sample all cases (patients with an event) and only a fraction of controls (patients without an event).
If the clinical outcomes of interest are event times, and the event rate is low, then it is well known that a more efficient design is obtained by sampling subjects with observed events (cases) more intensively than subjects with censored event outcomes (non-cases).
1) Classic case-cohort sampling: all study cases, plus a random subset of controls
2) Nested case-control (NCC) sampling: all study cases but only a fraction of controls, selected randomly from the risk set of the matched cases.
The set of patients enrolled on the study will be referred to as the cohort. The subset of the cohort analyzed for biomarkers will be referred to as the sample. The larger set of patients meeting the criteria for entry on the clinical trial will be referred to as the population. The terms "study population" and "target population" could also be used for the cohort and the population.

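The two cohort-sampling schemes described on this slide can be sketched in base R on a simulated cohort. This is only an illustration under stated assumptions: the `cohort` data frame, its column names (`id`, `time`, `event`) and the helper `ncc_controls` are hypothetical, the NCC step ignores matching factors, and the same patient may be drawn as a control for more than one case.

```r
set.seed(42)

# Hypothetical trial cohort: follow-up time (months) and event indicator
n <- 500
cohort <- data.frame(
  id    = seq_len(n),
  time  = rexp(n, rate = 0.05),
  event = rbinom(n, size = 1, prob = 0.2)
)
case_ids    <- cohort$id[cohort$event == 1]
control_ids <- cohort$id[cohort$event == 0]

# 1) Case-cohort style sampling as described on the slide:
#    all cases plus a simple random subset of controls (here, as many as cases)
n_ctrl <- min(length(case_ids), length(control_ids))
case_cohort_ids <- c(case_ids,
                     control_ids[sample.int(length(control_ids), n_ctrl)])

# 2) Nested case-control sampling: for each case, draw m controls at random
#    from its risk set (patients still under follow-up at the case's event time)
ncc_controls <- function(case_id, m = 1) {
  t_case   <- cohort$time[cohort$id == case_id]
  risk_set <- cohort$id[cohort$time >= t_case & cohort$id != case_id]
  risk_set[sample.int(length(risk_set), min(m, length(risk_set)))]
}
ncc <- lapply(case_ids, ncc_controls, m = 1)
names(ncc) <- case_ids

length(case_cohort_ids)   # number of patients assayed under design 1
length(unlist(ncc))       # number of matched controls drawn under design 2
```

Under design 2 the risk set deliberately includes patients who become cases later, since they are still at risk at the index case's event time.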
Common Types of data
- Biomarker
  (1) Numerical measurement
      e.g., gene expression, circulating cancer cells
  (2) Categorical (dichotomized) data*
      e.g., mutation present/absent, high/low score, SNP
  (3) Percentage outcome
      e.g., Ki-67 index, growth pattern
  => Two types: continuous or categorical
- Clinical outcome
  (1) Rate/dichotomized: CR/ORR, x-year PFS rate
  (2) Time-to-event outcome: PFS, DFS, OS
* Ordinal data usually would be regrouped into dichotomized groups

Type of tests/analyses

Biomarker                  Clinical outcome:                    Clinical outcome:
                           rate/dichotomized                    time-to-event
Categorical/dichotomized   Fisher's exact / chi-square test     Log-rank test
Continuous                 t-test / Wilcoxon test, or similar   Cox regression model

Note: Testing (power calculation) usually considers one marker at a time; analysis often works on multiple markers, e.g., high-throughput data.
Grant proposal or protocol correlative objectives: power/sample size calculation, analysis plan
Manuscripts: analysis and reporting

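As a rough guide to how the four cells of this table map onto R calls, here is a minimal sketch on simulated data. The data frame `dat` and its column names are hypothetical; the `survival` package is assumed to be installed (it ships with standard R distributions).

```r
library(survival)  # assumed available; provides Surv, survdiff, coxph

set.seed(1)
n <- 200
dat <- data.frame(
  marker_bin  = rbinom(n, 1, 0.3),       # categorical marker (e.g., mutation yes/no)
  marker_cont = rnorm(n),                # continuous marker (e.g., log expression)
  response    = rbinom(n, 1, 0.6),       # dichotomized outcome (e.g., CR/PR vs. not)
  time        = rexp(n, rate = 1 / 24),  # time-to-event outcome, months
  status      = rbinom(n, 1, 0.7)        # 1 = event, 0 = censored
)

# Categorical marker vs. dichotomized outcome: Fisher's exact test
p_fisher <- fisher.test(table(dat$marker_bin, dat$response))$p.value

# Continuous marker vs. dichotomized outcome: Wilcoxon rank-sum test
p_wilcox <- wilcox.test(marker_cont ~ response, data = dat)$p.value

# Categorical marker vs. time-to-event outcome: log-rank test
lr <- survdiff(Surv(time, status) ~ marker_bin, data = dat)
p_logrank <- pchisq(lr$chisq, df = 1, lower.tail = FALSE)

# Continuous marker vs. time-to-event outcome: Cox regression
cox <- coxph(Surv(time, status) ~ marker_cont, data = dat)
p_cox <- summary(cox)$coefficients["marker_cont", "Pr(>|z|)"]

round(c(fisher = p_fisher, wilcoxon = p_wilcox, logrank = p_logrank, cox = p_cox), 3)
```

The simulated marker and outcome are independent here, so the p-values only demonstrate the calls, not an effect.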
Categorical biomarker:
E4402: Phase III Trial Comparing Two Different Rituximab Dosing Regimens for Patients with Low Tumor Burden Indolent Non-Hodgkin's Lymphoma
[Figure: E4402 protocol schema]
Step 1 (registration): Arm I, induction rituximab 375 mg/m2 IV x 4 weekly doses (dose based on actual body weight); restage at week 13.
  PD or < PR/CR: discontinue protocol therapy.
  PR/CR: proceed to Step 2.
Step 2 (randomization, to be done within 1 week of the week 13 restaging visit):
  Arm A, rituximab retreatment: administer only for progressive disease (within 28 days of CT scans documenting disease progression), 375 mg/m2 IV x 4 weekly doses, then re-enter observation. Patients may be retreated on Arm A indefinitely following each progression until they meet the criteria for rituximab failure.
  Arm B, rituximab scheduled: single 375 mg/m2 IV dose every 13 weeks; continue to rituximab failure. Treatment should start within 14 working days of the week 13 restaging visit.
Classification/stratification factors: histology (follicular vs. other); age (< 60 years vs. > 60 years); time from diagnosis (< 1 year vs. > 1 year).
Rituximab induction is the first 4-week course of rituximab the patient receives during Step 1. For scheduling according to the protocol study calendar, count from the first dose of the 4-week course of rituximab.
Rituximab failure is defined as no response (< PR/CR), TTP < 26 weeks (TTP calculation begins on day 1 of rituximab induction, rituximab retreatment, or rituximab scheduled), initiation of alternative therapy, or inability to complete therapy. See section 6.2.1 for full definition.
For Quality of Life time points, see section 6.3.
Accrual goal: approximately 519 total patients (270 randomized follicular lymphoma patients).

E4402 LRF grant: design elements
- Aim: To correlate immunoglobulin Fc receptor (FcγR) polymorphisms in FL patients receiving single-agent rituximab with response, response duration, and time to rituximab resistance.
- Sample: (anticipated) 259 out of 408 FL patients will have data on FcγR polymorphisms and thus be available to correlate with response, and 192 (assuming a 74% response rate) to correlate with duration of response (DOR).
- Clinical outcome (the whole trial): response rate 74%; median response duration 18 months; median time to rituximab resistance 25 months.
  Note: the outcome was known for the whole trial; the numbers might vary within the study sample.
- Marker measurement:
  FcγRIIIa polymorphism: VV, VF, FF, assuming 20% VV patients and 80% F carriers
- Hypotheses: patients with the VV type have a higher response rate and longer DOR than patients who are F carriers.
  => one-sided testing

Power/sample size: response rate
Fisher's exact test will be used (one-sided significance level of 0.05) to evaluate whether the response rate for VV patients is higher than that for F carriers. With 20% (n=52) VV patients and 80% (n=207) F carriers, the following table lists the differences that can be detected with at least 80% power, for overall response rates within the 95% CI of the observed one (ORR: 74%, 95% C.I. 69%, 78%). For example, a difference of 83% vs 65%, 87% vs 71% or 90% vs 75% can be detected between VV patients and F carriers. A logistic regression model will be employed to evaluate the polymorphism effect on response rate, accounting for other patient characteristics.

Overall response rate   Response rate in VV patients   Response rate in F carriers
                        (HH patients)                  (R carriers)
69%                     83%                            65%
74%                     87%                            71%
78%                     90%                            75%

R code:
library(desmon)
b2diff(0.69, 0.2, 259, alpha=0.05)
b2diff(0.74, 0.2, 259, alpha=0.05)
b2diff(0.78, 0.2, 259, alpha=0.05)

Usage: b2diff(p, r, n, alpha=0.025, power=0.8, exact=TRUE)
b2diff is intended to facilitate power calculations when a new (binary) marker will be measured on an existing sample, so that the combined sample size n and the overall response rate p are known. r gives the expected proportion of the sample in group 1 (defined by the new marker). b2diff then calculates the response probabilities in the two groups that give a difference for which the two-sample comparison will have the specified power, subject to the overall response probability and sample size constraints. The probabilities are computed for the one-sided tests in each direction (higher response rates in group 1 and lower response rates in group 1).

> b2diff(0.69, 0.2, 259, alpha=0.05)
[1] 0.6900000 0.2000000 0.8324440 0.6542170 0.5345053 0.7290615
80% power at one-sided 0.05 to detect 0.83 (group 1) vs. 0.65 (group 2), or 0.53 (group 1) vs. 0.73 (group 2)

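For readers without the desmon package, the b2diff result for an overall rate of 0.69 can be cross-checked in base R. The sketch below is not the package's implementation: it applies the overall-rate constraint with the group sizes 52 and 207, and then computes the exact unconditional power of a one-sided Fisher test by enumerating all outcomes. The function name `fisher_power` is hypothetical.

```r
# Inputs taken from the slide: overall response rate, group sizes, and the
# VV response rate returned by b2diff
p_all <- 0.69
n1 <- 52; n2 <- 207
p1 <- 0.8324440

# Overall-rate constraint: n1*p1 + n2*p2 = (n1 + n2)*p_all
p2 <- ((n1 + n2) * p_all - n1 * p1) / n2

# Exact power of the one-sided Fisher test (H1: rate in group 1 is higher)
fisher_power <- function(p1, p2, n1, n2, alpha = 0.05) {
  x1 <- 0:n1
  x2 <- 0:n2
  # one-sided p-value: P(X1 >= x1 | x1 + x2 responders in total)
  pval <- outer(x1, x2, function(a, b)
    phyper(a - 1, n1, n2, a + b, lower.tail = FALSE))
  # probability of each (x1, x2) outcome under the alternative
  prob <- outer(dbinom(x1, n1, p1), dbinom(x2, n2, p2))
  sum(prob[pval <= alpha])
}

p2                              # response rate implied for F carriers
fisher_power(p1, p2, n1, n2)    # should be close to the targeted 80%
```

The implied F-carrier rate agrees with the fourth element of the b2diff output (0.654), which suggests b2diff applies the same constraint with integer group sizes.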
Power/sample size: Duration of response
Duration of response (DOR) will be estimated using the Kaplan-Meier method, overall with arms combined and by arm if a significant difference is observed. A log-rank test (one-sided significance level of 0.05) will be used to compare DOR among 192 patients (70% of 274 FL responders) enrolled to the maintenance step over 58 months with 24 months of follow-up. For arms combined, assuming 20% (n=38) are VV patients and 80% (n=154) are F carriers, the following table lists the differences in median DOR that can be detected with at least 80% power for a series of possible overall median estimates. For example, there is at least 80% power to detect a difference in median DOR of 27 months vs 16 months with an overall median DOR of 18 months. A Cox proportional hazards regression model will be fit to evaluate the significance of the polymorphism effect on DOR (TTRF) after adjusting for other patient characteristics.

Median DOR (months)
Overall median   VV patients   F carriers
16               24            14
18               27            16
20               31            18

R code:
library(desmon)   ## provides powlgrnk

## Overall median DOR is fixed; assuming a median for one group
## => compute the median for the other group (exponential distribution)
hazG2=function(med.all,med.g1,r1,r2)
{
  haz.all=log(2)/med.all
  haz.g1=log(2)/med.g1
  haz.g2=(-1)*log((0.5-r1*exp(-haz.g1*med.all))/r2)/med.all
  med.g2=log(2)/haz.g2
  haz.g2
}

## compute power (log-rank test) to detect the difference in DOR for a
## control group (low median, F carriers): r2=80% of patients, and an
## experimental group (VV patients): r1=20%
## 192 pts/58 months => 3.3 pts/month
hazG2.pow=function(med.all,med.g1,r1,r2,alp,acc.per=58,
                   acc.rate=3.3,add.fu=24)
{
  haz.g1=log(2)/med.g1
  haz.g2=c()
  for(i in 1:length(med.g1))
  {haz.g2=c(haz.g2,hazG2(med.all,med.g1[i],r1,r2))}
  med.g2=log(2)/haz.g2
  pow=c()
  for(i in 1:length(haz.g1))
  {pow.i=powlgrnk(acc.per,acc.rate,add.fu,alpha=alp,p.con=r2,
                  control.rate=haz.g2[i],test.rate=haz.g1[i])
   pow=c(pow,pow.i[1])
   rm(pow.i)}
  hazG2.pow=cbind(med.g1,med.g2,pow)
  hazG2.pow
}

## (1) overall median 16 months; a selected series of medians for VV patients
med.all=16
med.g1=c(20,21,22,23,24,25,26,27,28,29,30,31)
r1=0.2
r2=0.8
hazG2.pow(med.all,med.g1,r1,r2,0.05)
##       med.g1   med.g2       pow
## power     20 15.17085 0.3924527
## power     21 15.00637 0.5156537
## power     22 14.85505 0.6324386
## power     23 14.71540 0.7337765
## power     24 14.58615 0.8152157
## power     25 14.46620 0.8764927
## power     26 14.35460 0.9200949
## power     27 14.25052 0.9497021
## power     28 14.15323 0.9690430
## power     29 14.06210 0.9812853
## power     30 13.97657 0.9888412
## power     31 13.89614 0.9934133

## (2) overall median 18 months

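The median constraint used in hazG2 can also be checked without desmon. The sketch below is a vectorized restatement of that constraint (a mixture of two exponential distributions whose combined median is fixed); the function name `median_other_group` is hypothetical. For the three rows of the DOR table it gives F-carrier medians of about 14.6, 16.4 and 18.1 months, which the table reports as 14, 16 and 18.

```r
# Median of group 2 implied by the overall median and the group 1 median,
# for a mixture of two exponential distributions with proportions r1 and 1 - r1:
#   0.5 = r1 * S1(med_all) + (1 - r1) * S2(med_all)
median_other_group <- function(med_all, med_g1, r1) {
  s1 <- 0.5^(med_all / med_g1)        # S1(med_all) under an exponential model
  s2 <- (0.5 - r1 * s1) / (1 - r1)    # S2(med_all) from the constraint
  med_all * log(0.5) / log(s2)        # median m2 solving S2(m2) = 0.5
}

# Rows of the DOR table: overall median 16/18/20 with VV median 24/27/31
med_f <- median_other_group(med_all = c(16, 18, 20),
                            med_g1  = c(24, 27, 31),
                            r1      = 0.2)
round(med_f, 2)
```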
Alternative approach
If the analytical subset is known, and the number of events is fixed (or within a range) => calculate the HR that would be detected between two marker groups with a certain power.
[Formula not recovered; annotations: exponential distribution, proportional hazards, # events, HR (HR < 1)]
Example: A study to confirm the prognostic significance of HGAL protein expression.
It is expected to have 77 failures in the analytical sample. Assuming proportional hazards between the HGAL-positive and HGAL-negative groups, there is 80% power (log-rank test) at a one-sided 0.05 significance level to detect a hazard ratio of 0.49 in FFS. For example, with the observed 5-year FFS rate of 72%, 5-year FFS rates of 76% vs. 57% can be detected between the HGAL-positive and HGAL-negative groups.

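The hazard ratio of 0.49 quoted for the HGAL example is consistent with the usual events-based approximation for the log-rank test (often attributed to Schoenfeld), with D events and marker-group proportions r_pos and r_neg. Written out under that assumption, with D = 77, r_pos = 0.8, r_neg = 0.2, one-sided alpha = 0.05 and power = 0.80:

```latex
\log \mathrm{HR} = -\frac{z_{1-\alpha} + z_{1-\beta}}{\sqrt{D\, r_{\mathrm{pos}}\, r_{\mathrm{neg}}}}
\quad\Longrightarrow\quad
\mathrm{HR} = \exp\!\left(-\frac{1.645 + 0.842}{\sqrt{77 \times 0.8 \times 0.2}}\right)
= \exp(-0.708) \approx 0.49
```

The sign is chosen so that HR < 1, i.e. the hazard in the positive group relative to the negative group.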
R code
## total 77 failures
n.fail=77
## proportion in the positive and negative groups
r.pos=0.8
r.neg=0.2
alpha=0.05   ## one-sided alpha
z.alpha=qnorm(alpha,lower.tail=FALSE)
beta=0.2     ## beta level (power = 1-beta)
z.beta=qnorm(beta,lower.tail=FALSE)
## hazard ratio detected between positive and negative groups
## (the HR computed is < 1, with one-sided alpha and power 1-beta;
##  thus, in our case, HR (pos/neg))
haz.ratio.FFS=exp(-(z.alpha+z.beta)/sqrt(n.fail*r.pos*r.neg))
## [1] 0.4924313   ## pos/neg

## Overall 5-year FFS is fixed: select possible 5-year FFS rates for one group
## => compute the 5-year FFS rate for the other group
S5.overall=0.72
S5.pos=c(0.76,0.75,0.74,0.73)
S5.neg=(S5.overall-r.pos*S5.pos)/r.neg
## [1] 0.56 0.60 0.64 0.68
## check which one gives an HR that matches
haz.ratio=log(S5.pos)/log(S5.neg)   ## pos vs. neg
## [1] 0.4733151 0.5631708 0.6746892 0.8160263
log(0.76)/log(0.57)   ## [1] 0.4882185

Important feature of a clinical correlative study: the overall clinical outcome is fixed! Thus, the rates/medians for the two groups are constrained by the overall rates/medians.
Power/sample size calculation (not the analysis) is mainly used to justify the study; it does not need to be 100% rigorous.

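Instead of scanning a grid of candidate 5-year FFS rates, the pair of rates that matches the detectable hazard ratio can be solved for directly. The sketch below is one way to do this in base R with `uniroot`, under the same assumptions as the code above (proportional hazards, overall 5-year FFS fixed at 0.72, 80%/20% split); the helper names `s5_neg` and `gap` are hypothetical.

```r
n_fail <- 77; r_pos <- 0.8; r_neg <- 0.2
hr_target <- exp(-(qnorm(0.95) + qnorm(0.80)) / sqrt(n_fail * r_pos * r_neg))

S5_overall <- 0.72
# 5-year FFS in the negative group implied by the fixed overall rate
s5_neg <- function(s5_pos) (S5_overall - r_pos * s5_pos) / r_neg

# Under proportional hazards, HR (pos/neg) = log(S_pos) / log(S_neg)
gap <- function(s5_pos) log(s5_pos) / log(s5_neg(s5_pos)) - hr_target

# Search between 0.73 and 0.85 for the positive-group rate (s5_neg stays > 0)
sol <- uniroot(gap, interval = c(0.73, 0.85), tol = 1e-8)
round(c(S5_pos = sol$root, S5_neg = s5_neg(sol$root), HR = hr_target), 4)
```

The solution rounds to 76% vs. 57%, the pair quoted in the example.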
Analysis and reporting
Practical tips:
- Check biomarker data
  Duplicate biological samples => two marker values
- Baseline patient characteristics: compare patients who are included in the analysis vs. the entire trial cohort
  => Is the study sample a representative subset of the trial cohort?
  If not, this has nothing to do with bias; rather, it is a matter of generalizability.

Continuous biomarker:
E1411: Intergroup Randomized Phase II Four-Arm Study in Patients with Previously Untreated Mantle Cell Lymphoma of Therapy with: Arm A = Rituximab + Bendamustine Followed by Rituximab Consolidation (RB→R); Arm B = Rituximab + Bendamustine + Bortezomib Followed by Rituximab Consolidation (RBV→R); Arm C = Rituximab + Bendamustine Followed by Lenalidomide + Rituximab Consolidation (RB→LR); or Arm D = Rituximab + Bendamustine + Bortezomib Followed by Lenalidomide + Rituximab Consolidation (RBV→LR)
N=332, randomization ratio 1:1:1:1

Correlative objectives (prospective study)
[Figure: correlative objectives, annotated "Prognostic biomarker" and "Surrogate endpoint"]

Minimal Residual Disease (MRD) Correlative Studies: write-up
(aim) To determine whether the number of malignant cells in circulation or in marrow at the end of induction correlates with CR or 2-year PFS.
(approach) The number of malignant cells will be compared between CR vs. non-CR, and between those who are progression-free at 2 years vs. those who are not, using Wilcoxon rank-sum tests with a one-sided Type I error of 10%.
(sample) We expect that blood samples will be collected in approximately 90% of patients at baseline, and 90% of these patients will have blood samples at the end of induction (260 samples).
(power calculation) Assuming 50% of patients will achieve CR after induction, we will have adequate power (90%) to detect an effect size of 0.33 (0.33 standard deviation difference) in the number of malignant cells between CR and non-CR with 260 blood samples. Assuming 70% of patients will be progression-free at 2 years, we will have adequate power (90%) to detect an effect size of 0.36 in the number of malignant cells between those who are progression-free at 2 years vs. those who are not, with 260 blood samples.

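The effect sizes in the power statement can be approximated with a normal-theory formula for a two-group comparison. The slides do not say how the numbers were obtained, so the sketch below is an assumption: it uses the two-sample z approximation and deflates the sample size by the asymptotic relative efficiency of the Wilcoxon test under normality (3/pi). The function name `detectable_effect` is hypothetical. It gives roughly 0.325 and 0.355, in line with the quoted 0.33 and 0.36.

```r
# Detectable standardized difference for a two-group comparison
# (normal approximation, one-sided test).
# are = asymptotic relative efficiency of the Wilcoxon test vs. the t-test
# under normality (3/pi); using it here is an assumption of this sketch.
detectable_effect <- function(n, frac_g1, alpha = 0.10, power = 0.90, are = 3 / pi) {
  n1 <- n * frac_g1
  n2 <- n * (1 - frac_g1)
  (qnorm(1 - alpha) + qnorm(power)) * sqrt(1 / n1 + 1 / n2) / sqrt(are)
}

d_cr  <- detectable_effect(260, 0.50)  # CR vs. non-CR, assuming 50% CR
d_pfs <- detectable_effect(260, 0.70)  # progression-free at 2 years vs. not, assuming 70%
round(c(CR = d_cr, PFS2 = d_pfs), 3)
```

The less balanced the two groups, the larger the detectable effect size for the same total of 260 samples.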
High-throughput continuous biomarker data with dichotomized clinical outcome
- Aim: To identify genes whose expression level correlates with response
  => Identify differentially expressed genes between responders (CR/PR) and non-responders (SD/PD).
- Power/sample size: the study can be powered based on inference for individual genes => two-sample test on mean/median
- Methods: modified t-test (SAM) or linear models (t- or F-statistic); fold-change based method; rank-based method
Bioconductor has a collection of packages to identify differentially expressed genes while controlling for multiple testing, e.g., with a false discovery rate (FDR) approach.
Example: Bioconductor packages: limma, SAM, RankProd, multtest, ...
Rank Product: based on fold-change, also assesses biological variation
-- intuitive fold-change (FC) criterion, natural control of the FDR
-- fewer assumptions under the model
-- increased performance with noisy data and/or low numbers of replicates
-- application in meta-analysis
-- New version is coming soon ... (improved computational capacity ...)

Example of grant write-up
The analysis of gene expression data will be primarily exploratory. After microarray hybridization, the raw CEL files will first be analyzed with the GC-RMA method (for normalization and expression value computation) to obtain gene expression values as a two-way data table. Thereafter, the statistical analysis will be performed using R software and Bioconductor packages.
To test whether GEP will distinguish FL patients with different responses to single-agent rituximab, we will search for differentially expressed genes between responders (CR/PR) and non-responders (SD/PD). A rank-based method developed by Dr. Hong (RankProd, R package, Bioconductor Project) will be applied, with control of the overall type I error (multiple comparison adjustment) using a false discovery rate (FDR) approach. A two-way unsupervised hierarchical clustering analysis will be applied to both the genes and the samples to illustrate whether GEP can separate responders from non-responders. The top-ranked genes will be assessed by gene ontology and pathway analysis (e.g., Gene Set Enrichment Analysis (GSEA)) for their potential functions in lymphoma progression and treatment.

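The gene-by-gene testing and FDR adjustment described in this write-up can be illustrated in base R. The sketch below is not the RankProd method: it swaps in an ordinary two-sample t-test per gene and the Benjamini-Hochberg adjustment from `p.adjust`, on simulated data in which the first 50 genes are truly shifted. Matrix dimensions, group sizes and the effect size are hypothetical.

```r
set.seed(2016)
n_genes <- 1000; n_resp <- 30; n_nonresp <- 20

# Simulated log-expression matrix (genes x samples); the first 50 genes are
# shifted by 1 unit in responders (hypothetical effect)
expr  <- matrix(rnorm(n_genes * (n_resp + n_nonresp)), nrow = n_genes)
group <- rep(c("R", "NR"), times = c(n_resp, n_nonresp))
expr[1:50, group == "R"] <- expr[1:50, group == "R"] + 1

# One two-sample test per gene
p_raw <- apply(expr, 1, function(g)
  t.test(g[group == "R"], g[group == "NR"])$p.value)

# Benjamini-Hochberg adjustment controls the false discovery rate
p_bh <- p.adjust(p_raw, method = "BH")
selected <- which(p_bh < 0.05)

length(selected)      # number of genes called differentially expressed
sum(selected > 50)    # how many of those are false discoveries in this simulation
```

With real data the same adjustment step applies to whatever per-gene p-values the chosen method produces.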
Sample Size Planning for Developing Classifiers Using High Dimensional Data
- Aim: To build a classifier (pre-defined diagnostic or prognostic classes)
  (step 1) Identify differentially expressed genes between two classes
  (step 2) Construct a classifier using the identified genes
- Both of these steps will introduce variability into the resulting classifier, so both must be incorporated in sample size estimation.
- Method:
  Dobbin K, Simon R. Sample size determination in microarray experiments for class comparison and prognostic classification. Biostatistics. 2005 Jan;6(1):27-38.
- Tool
  http://linus.nci.nih.gov/brb/samplesize/samplesize4GE.html
- The program can be used heuristically for developing classifiers for predicting risk groups with survival data.
  How to use this program with survival data

High-throughput continuous biomarker data with survival outcome
E2496: Randomized Phase III Trial of ABVD vs Stanford V +/- Radiation Therapy in Locally Extensive and Advanced Stage Hodgkin's Disease
[Figure: E2496 protocol schema]
Stratification factors: disease (locally extensive, stage I-IIA/B with massive mediastinal adenopathy, vs. advanced, stage III-IV); number of adverse risk factors (0-2 vs. 3-7); time of entry (before addendum 6 vs. after addendum 6).
Randomization:
  Arm A: ABVD, repeated every 28 days, minimum of 6 and maximum of 8 cycles. Modified involved field radiation will be given ONLY to patients with massive mediastinal disease regardless of stage (see Section 5.51).
  Arm B: Stanford V, 12 weeks of chemotherapy. Modified involved field radiation will be given to all patients with sites of disease > 5 cm in maximum transverse dimension and/or macroscopic splenic disease (see Section 5.52).
Arm A cycles: see Section 5.2. The number of cycles will depend on response; 2 cycles will be given beyond best response, with a minimum of 6 cycles and a maximum of 8 cycles. If an objective response occurs between cycles 4 and 6, a total of 8 cycles of therapy will be given. Patients with residual radiographic abnormalities after 4 cycles will receive 2 additional cycles. If there is no change in residual radiographic abnormalities, these will be interpreted as residual scarring and ABVD will be discontinued. If there is an objective change in radiographic measurements after 2 additional cycles of ABVD, 2 further cycles will be given for a total of 8 cycles.
Arm B: for patients currently receiving treatment on Arm B (the Stanford chemotherapy regimen) for whom nitrogen mustard is not available, a substitution with cyclophosphamide will be made on weeks 1, 5 and 9. The dose of cyclophosphamide is 650 mg/m2. Dose modification and use of G-CSF should continue as outlined in the protocol.

Grant proposal
- Aim: to identify a gene expression profile for outcome prediction and risk stratification in Hodgkin lymphoma, and to develop and validate this profile into a robust multi-gene classifier suitable for formalin-fixed paraffin-embedded (FFPE) tissues.
- Sample: (fixed) 309 out of 752 eligible cases have a gene expression profile measured on FFPE samples.
- Clinical outcome: failure-free survival (FFS) and overall survival (OS)
- Marker measurement: expression levels of 259 genes, including those previously reported to be associated with outcome in cHL, were determined by digital expression profiling (NanoString technology) of pretreatment FFPE biopsies.

How to do a power calculation?
“When constructing a prognostic marker, even if the ultimate goal is to build a multivariate marker, a first step is typically to identify individual genes associated with disease outcome, so that again the study should be powered based on inference for individual genes.” (Dobbin and Simon, 2005)
Which genes?
“genes that have low variation across the entire set of samples would be difficult to use for prognostic prediction in clinical situations”. (Simon, 2002)
=> Based on our previous study using NanoString technology, the variance of log gene expression has a median value of 0.57 (1.15 and 2.01 for the 3rd quartile and 90th percentile, respectively). Therefore, we demonstrate statistical power in gene identification using the median, upper quartile, and 90th percentile from the previous experiment.

What Methods?
Compare with the sample size formula for binary testing

Power calculation statement
There are 35 deaths and 81 failures (progression/relapse, or death) observed within 309 cases from E2496. Using the formula by Hsieh and Lavori (2000), the following table lists the hazard ratios of death/failure associated with a one-unit change in log expression that can be detected with 90% power, under a series of possible variances in gene expression. To adjust for the multiple comparison problem (total 235 genes), a small one-sided alpha of 0.001 is used in this calculation. For example, there is 90% power to detect a hazard ratio of 2.10 for death, or 1.62 for failure, with a one-unit change in log expression for a gene with variance 1.0, at a one-sided significance level of 0.001, using a Cox model. Compared with the hazard ratios demonstrated for established gene-expression-based predictors (e.g., Oncotype DX), the study has appropriate power for a similar type of analysis in this patient population.

Variance (log gene expression)   Hazard ratio, OS   Hazard ratio, FFS
0.5                              2.85               1.99
1.0                              2.10               1.62
2.0                              1.69               1.41

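The table can be approximately reproduced from the events-based form of the Hsieh and Lavori formula for a continuous covariate in a Cox model, events = (z_alpha + z_beta)^2 / (variance x log(HR)^2), solved for HR. The sketch below assumes the unadjusted form (no correction for correlation with other covariates); the function name `detectable_hr` is hypothetical. The results agree with the table to within about 0.01, and the small differences may reflect rounding or details not shown on the slide.

```r
# Detectable hazard ratio per one-unit change in a continuous covariate,
# given the number of events and the covariate variance:
#   events = (z_alpha + z_beta)^2 / (variance * log(HR)^2)
detectable_hr <- function(events, variance, alpha = 0.001, power = 0.90) {
  exp((qnorm(1 - alpha) + qnorm(power)) / sqrt(events * variance))
}

variance <- c(0.5, 1.0, 2.0)
hr_table <- data.frame(
  variance = variance,
  OS  = detectable_hr(35, variance),   # 35 deaths
  FFS = detectable_hr(81, variance)    # 81 failures
)
round(hr_table, 2)
```

More events or a more variable gene both shrink the detectable hazard ratio toward 1.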
Analysis (Plan)
There are many approaches to building such a predictor from high-throughput (biomarker) data. Usually log-transformed and standardized data are used, and the predictor is a linear combination of a few “important” genes.
1) Parsimonious predictive models using a penalized Cox model on the training cohort.
The R package “penalized” implements the “elastic-net” method on a Cox regression model. The λ1 and λ2 parameters can be trained using a leave-one-out cross-validation approach with the log-likelihood as the cross-validation metric. λ1 will be trained first, and then λ2 will be trained with respect to the optimal λ1.
To test whether the modeling process could be sensitive to the number of genes introduced, the model building process will be repeated with the genes associated with OS in univariate analysis at p<0.05, p<0.10, p<0.20, p<0.50 and p<1.0 (all 229 genes).

Analysis (Plan)
2) Supervised Principal Components (SPC) technique with a nested K-fold cross-validation method.
Specifically, the K-fold cross-validation method divides the data into K parts; (K-1) parts are combined as a training set to develop a model and one part is used as a testing set to validate it, and the procedure repeats K times. Within each repetition, we will calculate the univariate score for each gene using the training set and then retain only those features whose score exceeds a threshold. Principal components will be derived from the selected genes in the training set and used in a regression model to predict outcome in the testing set. The threshold will be established as the one that gives the highest average partial log-likelihood. The SPC technique will then be applied to the whole data set and a list of genes will be selected with the established threshold; the principal components derived using only the data from these selected features are used in a multivariate regression model to predict outcome.

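The threshold-selection loop of the SPC plan can be sketched as follows. This is a simplified, single-level K-fold illustration on simulated data, not the nested procedure or a dedicated SPC package: it scores each gene with a univariate Cox model, keeps genes above a threshold, uses only the first principal component, and evaluates the held-out partial log-likelihood at the training coefficient. All names, thresholds and simulated effects are hypothetical; the `survival` package is assumed to be installed.

```r
library(survival)  # assumed available; ships with standard R distributions

set.seed(1)
n <- 120; p <- 30; K <- 5
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("g", seq_len(p))))
# Hypothetical outcome: genes g1 and g2 drive the hazard
time   <- rexp(n, rate = exp(0.8 * x[, "g1"] - 0.8 * x[, "g2"]) / 24)
status <- rbinom(n, 1, 0.7)                      # 1 = event, 0 = censored
fold   <- sample(rep(seq_len(K), length.out = n))

# Univariate Cox score (absolute Wald z) for each gene
cox_scores <- function(xm, tm, st) {
  apply(xm, 2, function(g) {
    fit <- coxph(Surv(tm, st) ~ g)
    abs(coef(fit)) / sqrt(vcov(fit)[1, 1])
  })
}

# Held-out partial log-likelihood of a first-principal-component predictor
fold_loglik <- function(k, threshold) {
  tr <- fold != k
  keep <- cox_scores(x[tr, ], time[tr], status[tr]) > threshold
  if (!any(keep)) return(NA_real_)
  pc <- prcomp(x[tr, keep, drop = FALSE])
  d_tr <- data.frame(tm = time[tr], st = status[tr],
                     pc1 = predict(pc, x[tr, keep, drop = FALSE])[, 1])
  d_te <- data.frame(tm = time[!tr], st = status[!tr],
                     pc1 = predict(pc, x[!tr, keep, drop = FALSE])[, 1])
  fit <- coxph(Surv(tm, st) ~ pc1, data = d_tr)
  # evaluate the training coefficient on the test fold without refitting
  held <- coxph(Surv(tm, st) ~ pc1, data = d_te, init = unname(coef(fit)),
                control = coxph.control(iter.max = 0))
  held$loglik[2]
}

thresholds <- c(1, 1.5, 2)
cv_loglik <- sapply(thresholds, function(th)
  mean(sapply(seq_len(K), fold_loglik, threshold = th), na.rm = TRUE))
best <- thresholds[which.max(cv_loglik)]
rbind(threshold = thresholds, cv_loglik = cv_loglik)
```

Gene selection is repeated inside every training fold, so the held-out fold never influences which genes enter the principal component.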
So far ...
Covered:
Design (power/sample size), development (write-up) and analysis for an individual marker and for high-throughput data.

Biomarker     Clinical outcome:                    Clinical outcome:
              rate/dichotomized                    time-to-event
Categorical   Fisher's exact / chi-square test     Log-rank test
Continuous    t-test / Wilcoxon test, or similar   Cox regression model

All under a full-cohort design, and with an inferential objective.
With descriptive objectives, usually with small sample size (phase I or II trials) => no power statement; could provide a confidence interval.

Cohort sampling
Sample all cases (patients with an event) and only a fraction of controls (patients without an event). Popular in epidemiology studies.
Sampling approaches:
1) Classic case-cohort sampling: all study cases, plus a random subset of controls (independent of characteristics of cases)
2) Nested case-control (NCC) sampling: all study cases but only a fraction of controls, selected randomly from the risk set of the matched cases
Example:
Investigate blood-based markers (e.g., EBV load) in HL as predictors of treatment response and clinical outcomes in HL (E2496).
Among cases with specimens available, a total of 54 patients relapsed (cases) during the study course. For each case, we will randomly draw 1-2 matched controls based on other baseline risk factors (age, years since diagnosis, gender, ...).
Matching methods: propensity score matching

Cohort sampling design: analysis approaches
Weighted analysis is often used.
The underlying assumption for weighted analysis is that the study sample is representative of the study cohort, though with a distorted proportion of events (failures) produced by the sampling method (enrichment for failure). Therefore, unbiased estimates can be achieved by assigning different weights to the study sample with the goal of restoring the proportion of treatment failures; in other words, making the sample a “random” selection from the cohort.
1. Gray (2009) gives details for implementation of weighted analyses for event-stratified subsampling for biomarker studies in clinical trials and epidemiologic cohort studies.
2. Cai and Zheng (2011) proposed inverse probability weighted (IPW) estimators, where contributions of individuals are weighted inversely proportional to their sampling fractions.

A hypothetical example
- Aim: verify a biomarker in risk prediction (separate high- vs low-risk patients)
- Trial cohort: 500 patients, with 100 cases and 400 controls
- Biomarker performance: the high-risk group includes 90% of cases and 20% of controls; the low-risk group includes 10% of cases and 80% of controls
- Case-cohort design: sample all 100 cases and only 100 controls

Trial cohort
Risk group   # cases   # controls   Event rate
High         90        80           0.53
Low          10        320          0.03
Ratio = 17

Study sample
Risk group   # cases   # controls   Event rate
High         90        20           0.82
Low          10        80           0.11
Ratio = 7

The ratio of failure rates in the “high” vs. “low” risk group is lower in the study sample relative to the trial cohort; thus a simple log-rank test would underestimate the difference.

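The arithmetic of this hypothetical example, and the effect of inverse-probability weighting, can be checked in a few lines of R. This sketch works on the event-rate scale only, as a simplified stand-in for a weighted survival analysis; the helper `event_rate` is hypothetical. Weighting each sampled control by the inverse of its sampling fraction (400/100 = 4) restores the cohort event rates and their ratio.

```r
# Counts from the hypothetical example
cohort  <- data.frame(risk = c("High", "Low"), cases = c(90, 10), controls = c(80, 320))
sampled <- data.frame(risk = c("High", "Low"), cases = c(90, 10), controls = c(20, 80))

# Inverse sampling fractions: all 100 cases kept, 100 of 400 controls kept
w_case    <- 1 / (100 / 100)
w_control <- 1 / (100 / 400)

event_rate <- function(cases, controls, w_cases = 1, w_controls = 1) {
  (w_cases * cases) / (w_cases * cases + w_controls * controls)
}

rate_cohort   <- event_rate(cohort$cases, cohort$controls)
rate_sample   <- event_rate(sampled$cases, sampled$controls)
rate_weighted <- event_rate(sampled$cases, sampled$controls, w_case, w_control)

round(rbind(cohort = rate_cohort, sample = rate_sample, weighted = rate_weighted), 2)
c(ratio_cohort   = rate_cohort[1] / rate_cohort[2],
  ratio_sample   = rate_sample[1] / rate_sample[2],
  ratio_weighted = rate_weighted[1] / rate_weighted[2])
```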
Example of a cohort using case-cohort design
Figure: Kaplan-Meier estimates of FFS: unweighted analysis (left) and weighted analysis (right)

Closing remarks
- Clinical correlative studies are usually designed to discover or validate markers using archived specimens from a trial
  => Sample size is relatively fixed; clinical outcome is (largely) known.
  => Power/sample size calculation is constrained by trial design and sample size.
- Descriptive or inferential? For clinical trials with small sample size (phase I, II studies), descriptive might be an option.
- One, multiple or high-throughput biomarkers might be involved.
  => Power calculation is mainly based on inference for individual biomarkers, though it is not necessary (or possible) for every biomarker.
- Power/sample size calculation is to justify the study (that it will be able to yield something meaningful).
- Analysis plan (including tests) needs to be prospectively specified for all objectives.