SAS Beyond TFLs:
An Application as a Statistician’s Tool
William Coar
[email protected]
Date: 10/15/2009
Denver SAS User’s Group
Outline
 Motivation
 Normal Distribution
 Ratio of normally distributed random variables
 Simulation
 Visualization
 Brief Application
Motivation
 SAS is the standard for analysis and reporting in the
pharmaceutical industry
– generation of Tables, Figures, and Listings (TFLs)
 Data generally reside in SAS format
 Statistician is a link between data and the report
– Use of TFLs
– Ad hoc analyses
– Etc.
 SAS as a tool beyond reporting and analysis
Motivation
 Drug to improve exercise capacity in subjects with
pulmonary arterial hypertension
 The treatment effect is the change from baseline (cfb) in
distance walked in 6 minutes
 Food and Drug Administration approves the drug but
mandates additional information
– Is the treatment effect greater when the blood
concentration of drug is higher?
 Suppose less than ½ of the effect is retained when
concentrations are low
– Consider changing the dosing schedule?
Motivation
 Consider the ratio of two treatment effects
– Effect at peak concentration
– Effect at trough concentration
 Allow for a placebo correction
 Not uncommon to assume the treatment effect
(change from baseline) is a distribution that is
approximately normal
 What happens when we have a ratio of two normally
distributed random variables?
Motivation
 Consider the following random variables:
Trough effect, $x \sim N(\mu_{trough}, \sigma_x^2)$
Peak effect, $y \sim N(\mu_{peak}, \sigma_y^2)$
Placebo effect, $z \sim N(\mu_{placebo}, \sigma_z^2)$

 x, y, z can be individual observations or sample
means; x and y may be correlated
 Possible correlation
– Peak and trough for same treated subject
– Successive measurements in subjects on placebo
Motivation
 Wish to estimate
$R = \frac{\mu_{trough} - \mu_{placebo}}{\mu_{peak} - \mu_{placebo}}$
with
$r = \frac{x - z}{y - z} = \frac{a}{b}$
 By assumption, we expect 0 ≤ R ≤1
 a and b are both normal random variables
 r is a random variable from some distribution
– Possibility of zero in the denominator
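Because a = x − z and b = y − z are linear combinations of normal variables, their distributions can be written down directly. A short derivation, assuming x, y, z are jointly normal; the covariance symbols $\sigma_{xy}$, $\sigma_{xz}$, $\sigma_{yz}$ are introduced here only for illustration:

```latex
\begin{align*}
a &= x - z \sim N\!\left(\mu_{trough} - \mu_{placebo},\ \sigma_x^2 + \sigma_z^2 - 2\sigma_{xz}\right) \\
b &= y - z \sim N\!\left(\mu_{peak} - \mu_{placebo},\ \sigma_y^2 + \sigma_z^2 - 2\sigma_{yz}\right) \\
\operatorname{Cov}(a, b) &= \sigma_{xy} - \sigma_{xz} - \sigma_{yz} + \sigma_z^2
\end{align*}
```

If the placebo arm is independent of the treated arm, $\sigma_{xz} = \sigma_{yz} = 0$, yet a and b stay correlated because both subtract the same z.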
Normal Distribution
 Normal distribution:
– Bell-shaped continuous probability distribution
– (theoretical) frequency distribution is symmetric and
bell shaped
• Frequency distribution: a set of intervals, usually
adjacent and of equal width, each associated with a
frequency indicating the number of measurements in
that interval.
– Also called the Gaussian distribution
– Many desirable statistical properties
Normal Distribution
 Denoted $N(\mu, \sigma^2)$
– Mean ($\mu$)
– Variance ($\sigma^2$)
 Continuous range from (-∞, +∞)
– Small neighborhood around 0 always possible though it may not be probable
– Likelihood around zero depends on $\mu$ and $\sigma$
Ratio of Normal Random Variables
 Problems arise when it is possible for the
denominator to be zero
 Example
– Let $x \sim N(\mu_x = 5, \sigma_x^2 = 25)$, $y \sim N(\mu_y = 10, \sigma_y^2 = 25)$ where x
and y are independent
– About a 95% chance we observe x between (-5,15) and y
between (0,20)
– Expect to see a ratio around $\mu_x/\mu_y = 5/10 = 1/2$
– Suppose we observe a 10 in the numerator and a
0.001 in the denominator
– r=10/.001=10,000
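The example above can be explored numerically. This is a minimal sketch, not the presenter's program: the seed, the cutoff of 1 for "near zero", and the dataset name ratio are arbitrary choices for illustration.

```sas
/* Sketch only: seed, cutoff, and dataset names are illustrative assumptions */
data _null_;
  /* P(|y| < 1) for y ~ N(10, 5**2): chance the denominator lands near zero */
  p_near0 = cdf('NORMAL', 1, 10, 5) - cdf('NORMAL', -1, 10, 5);
  put p_near0=;
run;

data ratio;
  call streaminit(20091015);
  do j = 1 to 10000;
    x = rand('NORMAL', 5, 5);    /* numerator:   mean 5,  sd 5 */
    y = rand('NORMAL', 10, 5);   /* denominator: mean 10, sd 5 */
    r = x / y;
    output;
  end;
run;

/* Extreme min/max values relative to the median hint at the heavy tails */
proc means data=ratio n mean median min max;
  var r;
run;
```

The first step evaluates to roughly 0.02, so about 1 in 50 simulated denominators falls within one unit of zero.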
Ratio of Normal Random Variables
 Unconditional distribution is a Cauchy-like
distribution
– Undefined mean and variance
 Consider a condition
– Statistical test to see if denominator is unlikely to be
zero
– Assures the denominator is not near zero
 Not unreasonable to require a (significant) treatment
effect at peak blood concentrations
 Goal is to determine how the ratio behaves under
this condition
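One way to write the condition, assuming a two-sided test at level $\alpha$ and a known standard deviation $\sigma_b$ for the denominator b (the symbol $\sigma_b$ is introduced here for illustration):

```latex
\[
\text{retain } r = \frac{a}{b} \quad \text{only if} \quad \left| \frac{b}{\sigma_b} \right| > z_{1-\alpha/2}
\]
```

For $\alpha = 0.05$ the critical value is about 1.96, so every retained denominator lies at least 1.96 standard deviations away from zero.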
Simulate!
Simulation: Generate data
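 If $z \sim N(0,1)$, then $x = \mu_x + z\,\sigma_x \sim N(\mu_x, \sigma_x^2)$
 Bivariate Normal
– Two normally distributed random variables that
may/may not be correlated
– Using Cholesky Decomposition, if $z_1 \sim N(0,1)$ and
$z_2 \sim N(0,1)$ then
$x = \mu_x + z_1 \sigma_x$
$y = \mu_y + z_1 \frac{\sigma_{xy}}{\sigma_x} + z_2 \sqrt{\sigma_y^2 - \frac{\sigma_{xy}^2}{\sigma_x^2}}$
and $x \sim N(\mu_x, \sigma_x^2)$ and $y \sim N(\mu_y, \sigma_y^2)$, $\rho_{xy} = \sigma_{xy}/(\sigma_x \sigma_y)$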
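A quick check that the construction gives the intended moments, assuming $z_1$ and $z_2$ are independent standard normal variables:

```latex
\begin{align*}
\operatorname{Var}(y) &= \frac{\sigma_{xy}^2}{\sigma_x^2} + \left(\sigma_y^2 - \frac{\sigma_{xy}^2}{\sigma_x^2}\right) = \sigma_y^2 \\
\operatorname{Cov}(x, y) &= \operatorname{Cov}\!\left(z_1 \sigma_x,\ z_1 \frac{\sigma_{xy}}{\sigma_x}\right) = \sigma_x \cdot \frac{\sigma_{xy}}{\sigma_x} = \sigma_{xy}
\end{align*}
```

The square root is real whenever $\sigma_{xy}^2 \le \sigma_x^2 \sigma_y^2$, that is, whenever the implied correlation lies between -1 and 1.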
Simulation: SAS Code
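/* Univariate normal: x ~ N(mu, sd**2).                      */
/* &seed is a macro variable assumed to be defined earlier.  */
data x;
  mu=0;
  sd=1;
  do j=1 to 10000;
    z=normal(&seed);
    x=mu + z*sd;
    output;
  end;
run;

/* Bivariate normal via the Cholesky construction */
data x;
  mux=0;
  muy=2;
  sdx=1;
  sdy=2;
  sxy=0;
  do j=1 to 10000;
    z1=normal(12356);
    z2=normal(6789);
    x=mux + z1*sdx;
    y=muy + sxy*z1/sdx + sqrt(sdy**2-(sxy**2)/sdx**2 )*z2;
    output;
  end;
run;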
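The slide code sets sxy=0, so x and y come out uncorrelated. A minimal sketch of the correlated case follows, with a check of the sample covariance; the value sxy=1.8 (a correlation of 0.9 with these standard deviations), the seed, and the dataset name chk are illustrative assumptions.

```sas
/* Sketch: sxy=1.8 gives rho = sxy/(sdx*sdy) = 0.9; values and names are illustrative */
data chk;
  mux=0; muy=2; sdx=1; sdy=2; sxy=1.8;
  call streaminit(2009);
  do j=1 to 10000;
    z1=rand('NORMAL');           /* independent N(0,1) draws */
    z2=rand('NORMAL');
    x=mux + z1*sdx;
    y=muy + sxy*z1/sdx + sqrt(sdy**2-(sxy**2)/sdx**2)*z2;
    output;
  end;
  keep x y;
run;

/* Sample covariance and correlation should be close to 1.8 and 0.9 */
proc corr data=chk cov;
  var x y;
run;
```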
Simulation: SAS Code
goptions reset=all lfactor=2;
axis1 value=none label=none minor=none;
axis2 value=(h=1.5) label=(h=2);
proc univariate data=x gout=intdata.dist1 noprint;
  var x;
  histogram / vaxis=axis2 midpoints=-5 to 5 by 1 haxis=axis1;
  histogram / vaxis=axis2 midpoints=-5 to 5 by .5 haxis=axis1;
  histogram / vaxis=axis2 midpoints=-5 to 5 by .25 haxis=axis1;
  histogram / normal(color=red mu=est sigma=est w=6) vaxis=axis2
    midpoints=-5 to 5 by .1 haxis=axis1;
run;

filename got "graphs\normal_freqdist.jpeg";
goptions reset=all device=jpeg gsfname=got gsfmode=replace rotate=portrait
  xmax=2in ymax=2in fontres=presentation xpixels=950 ypixels=950;
proc greplay igout=intdata.dist1 gout=work.gseg tc=sashelp.templt template=l2r2 nofs;
  treplay 1=univar 2=univar2 3=univar1 4=univar3;
run;
quit;
Simulation: SAS Code
ods graphics on / reset imagefmt=JPEG imagename="compnrml" noborder
height=4in width=4in;
proc sgplot data=y;
density x / lineattrs=(thickness=4)
legendlabel='Mu=0, Std=1';
density y / lineattrs=(thickness=4)
legendlabel='Mu=2, Std=2';
keylegend / location=inside
position=topright across=1;
run;
quit;
ods graphics off;
Simulation
 We can simulate data fairly easily
 Smooth functions show the general behavior and will
allow for multiple distributions per plot
 There are various ways to obtain histograms
– Proc Univariate
– Proc GPlot on the histogram output dataset
– Proc SGPlot and SGPanel
 Desire techniques that allow for multiple distributions
per plot
– Visualize behavior of the ratio for various R, sample
sizes, correlation structures, restrictions on the
denominator
Simulation
 Three different ratios (peak effect of 30m)
– 0, ½, 1
 Equal standard deviations for cfb
– 65m
 Three different sample sizes (2:1 randomization drug:placebo)
– 60:30
– 112:56
– 184:92
 Two (within-subject) correlation structures
– High (≈ .9)
– Low (≈ .1)
 Two different levels of alpha for testing the denominator
– 0.05, 0.01
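The conditions listed above can be enumerated in a single dataset that later drives the simulation. A minimal sketch; the dataset and variable names (conditions, ratio, n_drug, n_placebo, corr, alpha, cond_id) are illustrative and not taken from the original program.

```sas
/* Sketch: one row per combination of ratio, sample size, correlation, and alpha */
data conditions;
  length corr $4;
  do ratio=0, 0.5, 1;
    do n_drug=60, 112, 184;
      n_placebo=n_drug/2;        /* 2:1 randomization drug:placebo */
      do corr='High', 'Low';
        do alpha=0.05, 0.01;
          cond_id+1;             /* running identifier for the condition */
          output;
        end;
      end;
    end;
  end;
run;

proc print data=conditions noobs;
run;
```

The dataset has 3 x 3 x 2 x 2 = 36 rows, one per condition.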
Simulation
 Methods for organization should be well thought out
– That being said, tasks often evolve…
 Many conditions to vary (3x3x2x2)
– Separate programs and/or folders
– Macros
– Permanent data
– Graphs in permanent catalogs to be replayed later
– Naming conventions
Simulation
 Process
– Define the set of conditions (ratio, sample size, etc.)
– Derive distributions of the numerator and denominator
– Simulate normal random variables
• Outcomes of a clinical trial
– Hypothesis test on the denominator (Type 1 Error)
– Exclude the observation if denominator not
significantly different from zero
– Plot
• Multiple curves per plot
• Multiple plots per page
Simulation
 Generate x, y, and z
x=mux + z1*sdx;
y=muy + sxy*z1/sdx + sqrt(sdy**2-(sxy**2)/sdx**2 )*z2;
z=muz + z3*sdz;
 Check for significance in denominator
sp=sqrt(sdy**2 + sdz**2);
zstat=probit(1-&alpha/2);
zcrit=(y-z)/sp;
sig=abs(zcrit)>zstat;
 Determine r
r=(x-z)/(y-z);
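The three steps above can be put together in one data step for a single condition. This is a sketch under stated assumptions, not the presenter's program: it assumes a placebo mean of 0, treats x, y, z as sample means with standard deviation 65/sqrt(n), takes the placebo arm as independent of the treated arm, and uses an arbitrary seed. Note that in the slide code zstat holds the critical value and zcrit holds the test statistic.

```sas
/* Sketch for one condition: R = 1/2, 60:30 subjects, high correlation, alpha = 0.05.
   All parameter values below are assumptions chosen to match that condition. */
%let alpha=0.05;

data sim;
  call streaminit(437);
  muz=0; muy=30; mux=15;               /* true R = (15-0)/(30-0) = 1/2   */
  sdx=65/sqrt(60); sdy=65/sqrt(60);    /* 60 subjects on drug            */
  sdz=65/sqrt(30);                     /* 30 subjects on placebo         */
  sxy=0.9*sdx*sdy;                     /* high within-subject correlation */
  zstat=probit(1-&alpha/2);            /* two-sided critical value       */
  sp=sqrt(sdy**2 + sdz**2);            /* sd of the denominator y-z      */
  do trial=1 to 10000;
    z1=rand('NORMAL'); z2=rand('NORMAL'); z3=rand('NORMAL');
    x=mux + z1*sdx;
    y=muy + sxy*z1/sdx + sqrt(sdy**2-(sxy**2)/sdx**2)*z2;
    z=muz + z3*sdz;
    zcrit=(y-z)/sp;                    /* test statistic for the denominator */
    sig=abs(zcrit)>zstat;
    if sig then do;                    /* keep only trials with a significant denominator */
      r=(x-z)/(y-z);
      output;
    end;
  end;
  keep trial r;
run;

proc means data=sim n mean median p5 p95;
  var r;
run;
```

The N reported by PROC MEANS is the number of simulated trials that passed the denominator test.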
Visualization: One curve
 GPlot using histogram
output dataset
 SMxx interpolation in the
symbol statement
– Fits a smooth line to
jagged (noisy) data
– xx=degree of smoothing
• 0-99
• Higher=smoother
symbol1 value=none i=SM55 color=red l=3 width=5 ;
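A minimal sketch of the one-curve workflow: write the histogram bins to a dataset, then let GPlot draw a smoothed line through them. The names sim, hist, and r are placeholders, and the simulated values are stand-ins rather than the conditional ratio itself.

```sas
/* Sketch: stand-in data, placeholder names */
data sim;
  call streaminit(1);
  do j=1 to 10000;
    r=rand('NORMAL', 0.5, 0.3);
    output;
  end;
run;

/* Save bin midpoints and percentages without drawing the histogram */
proc univariate data=sim noprint;
  var r;
  histogram r / midpoints=-1.5 to 2.5 by 0.05 outhistogram=hist noplot;
run;

/* i=SM55 fits a smooth line through the jagged bin percentages */
symbol1 value=none i=SM55 color=red l=3 width=5;
proc gplot data=hist;
  plot _obspct_*_midpt_;
run;
quit;
```

With MIDPOINTS= the bin location is stored in _MIDPT_; specifying ENDPOINTS= instead stores the lower bin edge in _MINPT_, the variable used in the multi-curve plot.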
Visualization
 Multiple curves
– Set histogram datasets
together and create a by
variable
– Add another symbol
statement
– Update plot statement and
add legend
proc gplot data=forplot ;
plot _obspct_*_minpt_=bygrp / href=(0.5) chref=("red") whref=(2)
haxis=axis1 vaxis=axis2 legend=legend1;
run;
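A sketch of the multi-curve plot. As a shortcut it uses BY-group processing in Proc Univariate to produce one histogram dataset that already carries the by variable, in place of concatenating separate histogram datasets with SET. The data are stand-ins, and the axis, legend, and symbol settings are illustrative assumptions.

```sas
/* Sketch: three illustrative groups with different centers */
data sim;
  call streaminit(2);
  do bygrp=1 to 3;
    do j=1 to 5000;
      r=rand('NORMAL', 0.5*(bygrp-1), 0.3);
      output;
    end;
  end;
run;

/* One histogram dataset with a row per bin per group (data already sorted by bygrp) */
proc univariate data=sim noprint;
  by bygrp;
  var r;
  histogram r / endpoints=-2 to 3 by 0.05 outhistogram=forplot noplot;
run;

/* One symbol statement per curve */
symbol1 value=none i=SM55 color=red   l=1 width=3;
symbol2 value=none i=SM55 color=blue  l=2 width=3;
symbol3 value=none i=SM55 color=green l=3 width=3;
axis1 label=('Ratio');
axis2 label=('Percent');
legend1 label=('Group');

proc gplot data=forplot;
  plot _obspct_*_minpt_=bygrp / href=(0.5) chref=red whref=2
       haxis=axis1 vaxis=axis2 legend=legend1;
run;
quit;
```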
Visualization
 Add the remaining 4 lines
 Store it in a permanent
catalog to be replayed
later.
– gout=intdata.dist_6030L
 Update titles and labels
as needed
 For Proc GReplay
– Suppress some labels
– Increase font sizes and
line thickness
Visualization
 Proc GReplay
– TDEF statement to create a 2x3 panel of plots
– x,y coordinates defined by percentages
[Figure: template layout with six panels numbered 1 to 6]
– TREPLAY to insert the graphs created in the
simulation exercise
Visualization
proc greplay nofs;
tc=work.templt;
tdef Box2x3 des='2 by 3 Boxes'
1 / llx=0 lly=50 ulx=0 uly=100 urx=33.3 ury=100 lrx=33.3 lry=50
2 / llx=33.3 lly=50 ulx=33.3 uly=100 urx=66.6 ury=100 lrx=66.6 lry=50
3 / llx=66.6 lly=50 ulx=66.6 uly=100 urx=100 ury=100 lrx=100 lry=50
4 / llx=0 lly=0 ulx=0 uly=50 urx=33.3 ury=50 lrx=33.3 lry=0
5 / llx=33.3 lly=0 ulx=33.3 uly=50 urx=66.6 ury=50 lrx=66.6 lry=0
6 / llx=66.6 lly=0 ulx=66.6 uly=50 urx=100 ury=50 lrx=100 lry=0
;
run;
Visualization
 Prepare for replaying graphs
 Copy the graphs to the work graphics catalog
 Change the names
proc catalog c=work.gseg;
copy in=intdata.dist_6030L out=work.gseg;
change gplot=g6030L / entrytype=grseg;
copy in=intdata.dist_6030H out=work.gseg;
change gplot=g6030H / entrytype=grseg;
copy in=intdata.dist_11256L out=work.gseg;
change gplot=g11256L / entrytype=grseg;
copy in=intdata.dist_11256H out=work.gseg;
change gplot=g11256H / entrytype=grseg;
copy in=intdata.dist_18492L out=work.gseg;
change gplot=g18492L / entrytype=grseg;
copy in=intdata.dist_18492H out=work.gseg;
change gplot=g18492H / entrytype=grseg;
quit;
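Once the entries are renamed, they can be placed into the 2x3 template. A minimal sketch, assuming the Box2x3 template in work.templt and the six GRSEG entries in work.gseg already exist; the assignment of graphs to panels (low correlation on the top row, high on the bottom) is an assumption, not taken from the slides.

```sas
/* Sketch: panel order is an illustrative assumption */
proc greplay nofs igout=work.gseg tc=work.templt;
  template Box2x3;
  treplay 1=g6030L 2=g11256L 3=g18492L
          4=g6030H 5=g11256H 6=g18492H;
run;
quit;
```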
Visualization
 “Almost” complete
– Adjust graphics options to optimize display
– Graphic options can be powerful to enhance display
• Time versus information trade-off
 Other ways to do this?
– Keep intermediate simulated data or histogram
datasets
– Proc SGplot or SGPanel
– How much processing is needed?
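As one possible alternative, the stored histogram datasets can be paneled with Proc SGPanel in place of GReplay. A rough sketch with stand-in data; the names sim, hist, corr, n_drug, and ratio are illustrative, and the curves are unsmoothed lines through the bin percentages.

```sas
/* Sketch: stand-in data for 3 sample sizes x 2 correlations x 3 ratios */
data sim;
  call streaminit(3);
  length corr $4;
  do n_drug=60, 112, 184;
    do corr='Low', 'High';
      do ratio=0, 0.5, 1;
        do j=1 to 2000;
          r=rand('NORMAL', ratio, 0.3);
          output;
        end;
      end;
    end;
  end;
run;

proc sort data=sim;
  by corr n_drug ratio;
run;

/* One histogram dataset covering every condition */
proc univariate data=sim noprint;
  by corr n_drug ratio;
  var r;
  histogram r / midpoints=-2 to 3 by 0.1 outhistogram=hist noplot;
run;

/* Columns = correlation, rows = sample size, one line per ratio in each cell */
proc sgpanel data=hist;
  panelby corr n_drug / layout=lattice;
  series x=_midpt_ y=_obspct_ / group=ratio;
  refline 0.5 / axis=x;
run;
```

This keeps the intermediate histogram dataset, so the panel can be redrawn without rerunning the simulation.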
Application to Design
 Scenario 1: Data from one 16-week clinical trial
– At week 16, half of the subjects have 6MWD (distance walked in
6 minutes) measured at peak blood concentrations, the other half
at trough concentrations
– Peak and trough observations are independent
– Similar to the low correlation scenario
 Scenario 2: Schedule an extra visit at week 12 in one
clinical trial
– At week 12, some test at peak, some test at trough
– Switch and repeat at week 16
– Similar to high correlation scenario
Application to Design
 Higher variation in low correlation scenario
 If the true ratio was near 1:
– For Scenario 1: there is a non-negligible probability of
observing something less than ½
– For Scenario 2: much less likely to see something
near ½
 Costs associated with extra visit?
 Consequences of decisions made about the true
ratio?
– Asked to consider a different dosing schedule?
Conclusions
 Simulations in SAS allowed us to understand the
conditional distributions
– Beneficial in planning
 Many ways to use graphics to visualize in this simulation
– Available procedures
• Organizational structure to simulate data and graphs
– Obtain a lot of information in each panel
– Even more when panels are combined
 Graphic options can be powerful to enhance display
– Time versus information trade-off
Questions?