SAS Beyond TFLs: An Application as a Statistician’s Tool
William Coar
Date: 10/15/2009
Denver SAS User’s Group

Outline
Motivation
Normal Distribution
Ratio of normally distributed random variables
Simulation
Visualization
Brief Application

Motivation
SAS is the standard for analysis and reporting in the pharmaceutical industry
– Generation of Tables, Figures, and Listings (TFLs)
Data generally reside in SAS format
Statistician is a link between data and the report
– Use of TFLs
– Ad hoc analyses
– Etc.
SAS as a tool beyond reporting and analysis

Motivation
Drug to improve exercise capacity in subjects with pulmonary arterial hypertension
The treatment effect is change from baseline (cfb) in distance walked in 6 minutes
Food and Drug Administration approves the drug but mandates additional information
– Is the treatment effect greater when the blood concentration of drug is higher?
Suppose less than ½ of the effect is retained when concentrations are low
– Consider changing the dosing schedule?

Motivation
Consider the ratio of two treatment effects
– Effect at peak concentration
– Effect at trough concentration
Allow for a placebo correction
Not uncommon to assume the treatment effect (change from baseline) is approximately normally distributed
What happens when we have a ratio of two normally distributed random variables?

Motivation
Consider the following random variables:
– Trough effect, $x \sim N(\mu_{trough}, \sigma_x^2)$
– Peak effect, $y \sim N(\mu_{peak}, \sigma_y^2)$
– Placebo effect, $z \sim N(\mu_{placebo}, \sigma_z^2)$
x, y, z can be individual observations or sample means; x and y may be correlated
Possible correlation
– Peak and trough for same treated subject
– Successive measurements in subjects on placebo

Motivation
Wish to estimate
    $R = \dfrac{\mu_{trough} - \mu_{placebo}}{\mu_{peak} - \mu_{placebo}}$  with  $r = \dfrac{x - z}{y - z} = \dfrac{a}{b}$
By assumption, we expect $0 \le R \le 1$
a and b are both normal random variables
r is a random variable from some distribution
– Possibility of zero in the denominator

Normal Distribution
Normal distribution:
– Bell-shaped continuous probability distribution
– (Theoretical) frequency distribution is symmetric and bell shaped
  • Frequency distribution: a set of intervals, usually adjacent and of equal width, each associated with a frequency indicating the number of measurements in that interval
– Also called the Gaussian distribution
– Many desirable statistical properties

Normal Distribution
[Figure]

Normal Distribution
Denoted $N(\mu, \sigma^2)$
– Mean ($\mu$)
– Variance ($\sigma^2$)
Continuous range from (-∞, +∞)
– Small neighborhood around 0 always possible though it may not be probable
– Likelihood around zero depends on $\mu$ and $\sigma^2$

Ratio of Normal Random Variables
Problems arise when it is possible for the denominator to be zero
Example
– Let $x \sim N(\mu_x = 5, \sigma_x^2 = 25)$, $y \sim N(\mu_y = 10, \sigma_y^2 = 25)$ where x and y are independent
– About a 95% chance we observe x between (-5, 15) and y between (0, 20)
– Expect to see a ratio around $\mu_x / \mu_y = 5/10 = 1/2$
– Suppose we observe a 10 in the numerator and a 0.001 in the denominator
– r = 10/0.001 = 10,000

Ratio of Normal Random Variables
Unconditional distribution is a Cauchy-like distribution
– Undefined mean and variance
Consider a condition
– Statistical test to see if denominator is unlikely to be zero
– Assures the denominator is not near zero
Not unreasonable to require a (significant) treatment effect at peak blood concentrations
Goal is to determine how the ratio behaves under this
condition

Simulate!

Simulation: Generate data
If $z \sim N(0,1)$, then $x = \mu_x + z\,\sigma_x \sim N(\mu_x, \sigma_x^2)$
Bivariate Normal
– Two normally distributed random variables that may or may not be correlated
– Using Cholesky decomposition, if $z_1 \sim N(0,1)$ and $z_2 \sim N(0,1)$ are independent, then
    $x = \mu_x + \sigma_x z_1$
    $y = \mu_y + \dfrac{\sigma_{xy}}{\sigma_x} z_1 + \sqrt{\sigma_y^2 - \dfrac{\sigma_{xy}^2}{\sigma_x^2}}\; z_2$
  and $x \sim N(\mu_x, \sigma_x^2)$ and $y \sim N(\mu_y, \sigma_y^2)$, $r_{xy} = \sigma_{xy} / (\sigma_x \sigma_y)$

Simulation: SAS Code
    /* One normal variable; assumes the macro variable &seed is defined earlier */
    data x;
      mu=0;
      sd=1;
      do j=1 to 10000;
        z=normal(&seed);
        x=mu + z*sd;
        output;
      end;
    run;

    /* Two (possibly correlated) normal variables.                     */
    /* The slide shows "data x;" here as well; the data set is named y */
    /* so that it matches data=y in the SGPLOT step further on.        */
    data y;
      mux=0;
      muy=2;
      sdx=1;
      sdy=2;
      sxy=0;
      do j=1 to 10000;
        z1=normal(12356);
        z2=normal(6789);
        x=mux + z1*sdx;
        y=muy + sxy*z1/sdx + sqrt(sdy**2-(sxy**2)/sdx**2)*z2;
        output;
      end;
    run;

Simulation: SAS Code
    goptions reset=all lfactor=2;
    axis1 value=none label=none minor=none;
    axis2 value=(h=1.5) label=(h=2);
    /* Four histograms of x with progressively narrower bins, stored in a catalog */
    proc univariate data=x gout=intdata.dist1 noprint;
      var x;
      histogram / vaxis=axis2 midpoints=-5 to 5 by 1 haxis=axis1;
      histogram / vaxis=axis2 midpoints=-5 to 5 by .5 haxis=axis1;
      histogram / vaxis=axis2 midpoints=-5 to 5 by .25 haxis=axis1;
      histogram / normal(color=red mu=est sigma=est w=6) vaxis=axis2 midpoints=-5 to 5 by .1 haxis=axis1;
    run;

    /* Replay the four stored histograms into one 2x2 JPEG */
    filename got "graphs\normal_freqdist.jpeg";
    goptions reset=all device=jpeg gsfname=got gsfmode=replace rotate=portrait
             xmax=2in ymax=2in fontres=presentation xpixels=950 ypixels=950;
    proc greplay igout=intdata.dist1 gout=work.gseg tc=sashelp.templt template=l2r2 nofs;
      treplay 1=univar 2=univar2 3=univar1 4=univar3;
    run;
    quit;

Simulation: SAS Code
    ods graphics on / reset imagefmt=JPEG imagename="compnrml" noborder height=4in width=4in;
    /* Overlay the two fitted normal densities in one plot */
    proc sgplot data=y;
      density x / lineattrs=(thickness=4) legendlabel='Mu=0, Std=1';
      density y / lineattrs=(thickness=4) legendlabel='Mu=2, Std=2';
      keylegend / location=inside position=topright across=1;
    run;
    quit;
    ods graphics off;

Simulation
We can simulate data fairly easily
Smooth functions show the general behavior and will allow for multiple distributions per plot
There are various ways to obtain histograms
– Proc Univariate
– Proc GPlot on the histogram output
dataset
– Proc SGPlot and SGPanel
Desire techniques that allow for multiple distributions per plot
– Visualize behavior of the ratio for various R, sample sizes, correlation structures, restrictions on the denominator

Simulation
Three different ratios (peak effect of 30m)
– 0, ½, 1
Equal standard deviations for cfb
– 65m
Three different sample sizes (2:1 randomization drug:placebo)
– 60:30
– 112:56
– 184:92
Two (within-subject) correlation structures
– High (≈ .9)
– Low (≈ .1)
Two different levels of alpha for testing the denominator
– 0.05, 0.01

Simulation
Methods for organization should be well thought out
– That being said, tasks often evolve…
Many conditions to vary (3x3x2x2)
– Separate programs and/or folders
– Macros
– Permanent data
– Graphs in permanent catalogs to be replayed later
– Naming conventions

Simulation
Process
– Define the set of conditions (ratio, sample size, etc.)
– Derive distributions of the numerator and denominator
– Simulate normal random variables
  • Outcomes of a clinical trial
– Hypothesis test on the denominator (Type 1 Error)
– Exclude the observation if denominator not significantly different from zero
– Plot
  • Multiple curves per plot
  • Multiple plots per page

Simulation
Generate x, y, and z
    x=mux + z1*sdx;
    y=muy + sxy*z1/sdx + sqrt(sdy**2-(sxy**2)/sdx**2)*z2;
    z=muz + z3*sdz;
Check for significance in denominator
    sp=sqrt(sdy**2 + sdz**2);      /* SD of the denominator y-z (y and z independent) */
    zstat=probit(1-&alpha/2);      /* two-sided critical value */
    zcrit=(y-z)/sp;                /* observed test statistic  */
    sig=abs(zcrit)>zstat;          /* 1 if denominator differs significantly from zero */
Determine r
    r=(x-z)/(y-z);

Visualization: One curve
GPlot using histogram output dataset
SMxx interpolation in the symbol statement
– Fits a smooth line to jagged (noisy) data
– xx = degree of smoothing
  • 0-99
  • Higher = smoother
    symbol1 value=none i=SM55 color=red l=3 width=5;

Visualization
Multiple curves
– Set histogram datasets together and create a by variable
– Add another symbol statement
– Update plot statement and add legend
    proc gplot data=forplot;
      plot _obspct_*_minpt_=bygrp / href=(0.5) chref=("red")
           whref=(2) haxis=axis1 vaxis=axis2 legend=legend1;
    run;

Visualization
Add the remaining 4 lines
Store it in a permanent catalog to be replayed later
– gout=intdata.dist_6030L
Update titles and labels as needed
For Proc GReplay
– Suppress some labels
– Increase font sizes and line thickness

Visualization
Proc GReplay
– TDEF statement to create a 2x3 panel of plots
– x,y coordinates defined by percentages
  [Diagram: panels 1, 2, 3 across the top row; panels 4, 5, 6 across the bottom row]
– TREPLAY to insert the graphs created in the simulation exercise

Visualization
    proc greplay nofs;
      tc=work.templt;
      /* Each panel is given by its four corners, in percent of the page */
      tdef Box2x3 des='2 by 3 Boxes'
        1 / llx=0    lly=50 ulx=0    uly=100 urx=33.3 ury=100 lrx=33.3 lry=50
        2 / llx=33.3 lly=50 ulx=33.3 uly=100 urx=66.6 ury=100 lrx=66.6 lry=50
        3 / llx=66.6 lly=50 ulx=66.6 uly=100 urx=100  ury=100 lrx=100  lry=50
        4 / llx=0    lly=0  ulx=0    uly=50  urx=33.3 ury=50  lrx=33.3 lry=0
        5 / llx=33.3 lly=0  ulx=33.3 uly=50  urx=66.6 ury=50  lrx=66.6 lry=0
        6 / llx=66.6 lly=0  ulx=66.6 uly=50  urx=100  ury=50  lrx=100  lry=0
      ;
    run;

Visualization
Prepare for replaying graphs
Copy the graphs to the work graphics catalog
Change the names
    proc catalog c=work.gseg;
      copy in=intdata.dist_6030L out=work.gseg;
      change gplot=g6030L / entrytype=grseg;
      copy in=intdata.dist_6030H out=work.gseg;
      change gplot=g6030H / entrytype=grseg;
      copy in=intdata.dist_11256L out=work.gseg;
      change gplot=g11256L / entrytype=grseg;
      copy in=intdata.dist_11256H out=work.gseg;
      change gplot=g11256H / entrytype=grseg;
      copy in=intdata.dist_18492L out=work.gseg;
      change gplot=g18492L / entrytype=grseg;
      copy in=intdata.dist_18492H out=work.gseg;
      change gplot=g18492H / entrytype=grseg;
    quit;

Visualization
[Figure]

Visualization
“Almost” complete
– Adjust graphics options to optimize display
– Graphic options can be powerful to enhance display
  • Time versus information trade-off
Other ways to do this?
– Keep intermediate simulated data or histogram datasets
– Proc SGPlot or SGPanel
– How much processing is needed?
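
As one possible illustration of the Proc SGPanel alternative mentioned above, the sketch below keeps the simulated ratios in a single dataset and lets SGPanel lay out the panels, instead of storing and replaying catalog entries. It is a sketch under stated assumptions, not the author's program. The names ratios, r0, r50, r100, alpha, and nsim are hypothetical. It also assumes that x, y, and z are sample means with standard deviation 65/sqrt(n), that the placebo mean is 0, and that ratios outside (-1, 2) are dropped for display only. It uses rand('normal') with call streaminit rather than normal().

```sas
/* Sketch only: dataset, variable, and macro-variable names are hypothetical. */
%let alpha = 0.05;    /* level for the test on the denominator */
%let nsim  = 10000;   /* simulated trials per condition        */

data ratios;
  call streaminit(2009);
  array truth[3] _temporary_ (0 0.5 1);  /* true ratios R from the slides   */
  array r[3] r0 r50 r100;                /* one ratio column per true R     */
  mupeak = 30;  sd = 65;                 /* peak effect and cfb SD (slides) */
  muz = 0;                               /* assumption: placebo mean of 0   */
  zcut = probit(1 - &alpha/2);           /* two-sided critical value        */
  do rho = 0.1, 0.9;                     /* low and high correlation        */
    do ndrug = 60, 112, 184;             /* 2:1 randomization               */
      nplac = ndrug/2;
      sdx = sd/sqrt(ndrug);  sdy = sdx;  /* assumption: SDs of sample means */
      sdz = sd/sqrt(nplac);
      sxy = rho*sdx*sdy;
      do trial = 1 to &nsim;
        do k = 1 to 3;
          z1 = rand('normal');  z2 = rand('normal');  z3 = rand('normal');
          x = truth[k]*mupeak + z1*sdx;
          y = mupeak + sxy*z1/sdx + sqrt(sdy**2 - sxy**2/sdx**2)*z2;
          z = muz + z3*sdz;
          sig = abs((y - z)/sqrt(sdy**2 + sdz**2)) > zcut;
          r[k] = .;
          if sig then r[k] = (x - z)/(y - z);  /* keep only if denominator is significant */
          /* display-only trimming, an assumption of this sketch */
          if not missing(r[k]) and (r[k] < -1 or r[k] > 2) then r[k] = .;
        end;
        output;
      end;
    end;
  end;
  keep rho ndrug r0 r50 r100;
run;

proc sgpanel data=ratios;
  panelby ndrug rho / layout=lattice;    /* one panel per sample size and correlation */
  density r0   / type=kernel legendlabel='R=0';
  density r50  / type=kernel legendlabel='R=1/2';
  density r100 / type=kernel legendlabel='R=1';
  refline 0.5 / axis=x;                  /* reference line at a ratio of 1/2 */
  colaxis label='Ratio r';
run;
```

The trade-off is the one raised on the slide: this route needs no template definition or catalog renaming, but every change to the display reruns the density estimation on the full simulated dataset.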

Application to Design
Scenario 1: Data from one 16 week clinical trial
– At week 16, ½ of the subjects have 6mwd (6-minute walk distance) measured at peak blood concentrations, the other ½ have 6mwd measured at trough concentrations
– Observations that measure peak and trough are independent
– Similar to low correlation scenario
Scenario 2: Schedule an extra visit at week 12 in one clinical trial
– At week 12, some test at peak, some test at trough
– Switch and repeat at week 16
– Similar to high correlation scenario

Application to Design
[Figure]

Application to Design
Higher variation in low correlation scenario
If the true ratio was near 1:
– For Scenario 1: there is a non-negligible probability of observing something less than ½
– For Scenario 2: much less likely to see something near ½
Costs associated with extra visit?
Consequences of decisions made about the true ratio?
– Asked to consider a different dosing schedule?

Conclusions
Simulations in SAS allowed us to understand the conditional distributions
– Beneficial in planning
Many ways to use graphics to visualize in this simulation
– Available procedures
  • Organizational structure to simulate data and graphs
– Obtain a lot of information in each panel
– Even more when panels are combined
Graphic options can be powerful to enhance display
– Time versus information trade-off

Questions?