Pharmaceutical Industry
SIMPLIFYING NDA PROGRAMMING WITH PROC SQL
Aileen L. Yam
Corning Besselaar Inc., Princeton, NJ
ABSTRACT
The programming of New Drug Application (NDA) Integrated Summary of Safety (ISS)
usually involves obtaining patient counts, percentages, and other summary statistics such as
mean, standard deviation and range. This paper shows how to obtain all of these results with
the SQL procedure. While PROC SQL is often perceived as a data retrieval tool, its unique
features allow programmers to write compact code to obtain data summaries for any
application similar to the NDA ISS or the safety summary tables in individual new drug
studies. This paper also shows that several DATA or other PROC steps can be reduced to one or
two steps with PROC SQL.
OVERVIEW
At the end of this paper are three tables that represent the most commonly
presented types of summary statistics in safety tables in pharmaceutical
research. The data in those tables are fictitious and for illustration
purposes only.

The three types of summary tables are:
1. counts, percentages, mean, standard deviation, range and missing value
   frequencies of demographic data;
2. counts and percentages of adverse events by body system;
3. counts and percentages of adverse events by body system and COSTART
   term.

This paper shows that the summary statistics of each of these three types of
tables can be obtained entirely within one or two PROC SQL steps. The unique
features of PROC SQL make it possible to reduce many DATA or other PROC steps
in summing, grouping, sorting, selecting first occurrences of each subgroup,
merging, concatenating, conditional processing, and calculating percentages,
mean, standard deviation, range and missing values. Such unique features are
boldfaced in the following programs. Repeated uses of the same features in a
subsequent program are not boldfaced or discussed again.

Variable names, data set names, macro variable names and macro variable
references from the programs are italicized in the discussion.

The intention of this paper is not to advocate PROC SQL over DATA steps or
other procedures, and there are no benchmark statistics to compare their
performance differences. The objective, however, is to present the SQL
procedure as a valuable alternative for summarizing data with fewer steps.
TOTAL PATIENT COUNTS
The following SQL procedure obtains total patient counts for drug 1, drug 2,
and all drug groups (drug 1 and drug 2 combined, assigned as drug 3 for report
writing purposes). Since total patient counts appear in all three summary
tables, the counts are calculated once and saved in a permanent data set
called totpat.

    %let numdrug=2;

    proc sql;
       create table perm.totpat as
       /* one row per drug group */
       select drug,
              count(distinct patient) as totn
       from raw.data
       group by drug
       union
       /* one row for all drug groups combined */
       select %eval(&numdrug+1) as drug,
              count(distinct patient) as totn
       from raw.data;
    quit;

The DISTINCT keyword eliminates duplicate rows before counting. The GROUP BY
clause is used to classify patient counts into drug groups. The UNION operator
combines two queries, putting the result from the first query on top of the
result from the second query. The AS keyword assigns a name to the selected
value or expression.

Assume that there are two drug groups, 1 and 2. The first SELECT statement
counts the number of nonduplicating patients in each drug group. The second
SELECT statement counts the number of nonduplicating patients without grouping
patients by drug. Notice that the variable drug is given a value of 3 in the
second SELECT statement. Since PROC SQL allows the selection of a literal
numeric value or a character string for a variable, any arbitrary value can be
assigned. The number of drug groups is set to 2 in the macro variable
reference, &numdrug; therefore, the two drug groups combined are assigned as
3, that is, the number of drug groups, &numdrug, plus one. The results from
the two SELECT statements are concatenated into a permanent data set, totpat.
Totpat consists of patient counts in drug 1, drug 2, as well as in drug 1 and
drug 2 combined. The macro variable reference, &numdrug, can be adjusted
according to the number of drug groups in a study.

Several steps are saved. The data do not need to be sorted by drug group and
patient. There is no need to set the data by the sorted variables into a DATA
step to get the first observation of each patient for a count of
nonduplicating patients. There is no need to count the patients in two steps,
one by drug group and the other without drug group. The resulting counts do
not need to be passed into another DATA step to be concatenated together, or
to be reorganized by _TYPE_ if a PROC step for summary statistics is used.

DEMOGRAPHIC TABLE
The demographic table consists of two parts, so two SQL procedures are
written. The first SQL procedure generates counts (cnt) and percentages (pct)
of gender and race groups in Table 1.

    %macro xx(outds=,var=);
    proc sql;
       create table &outds as
       /* counts and percentages within each drug group */
       select *,
              round(cnt /
                    case sum(cnt)
                       when 0 then .
                       else sum(cnt)
                    end * 100) as pct
       from (select drug, &var,
                    count(distinct patient) as cnt
             from raw.data
             group by drug, &var)
       group by drug
       union
       /* counts and percentages for all drug groups combined */
       select *,
              round(cnt /
                    case sum(cnt)
                       when 0 then .
                       else sum(cnt)
                    end * 100) as pct
       from (select %eval(&numdrug+1) as drug,
                    &var,
                    count(distinct patient) as cnt
             from raw.data
             group by &var)
       order by 1, 2;
    quit;
    %mend xx;

    %xx(outds=gencnt,var=gender);
    %xx(outds=racecnt,var=race);

In the macro calls to xx, there are two major queries joined by the UNION
operator. In each of these queries, a subquery is used by nesting the second
SELECT statement within the first SELECT statement. A CASE expression is used
to perform conditional processing. The SUM function is used to calculate the
grand total for the denominator. The ORDER BY clause sorts the results by the
order-by items in a default sequence, from the lowest value to the highest
value.
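The nested SELECT, the CASE guard on the denominator, the SUM function and the
positional ORDER BY can be tried on a few rows before running against study
data. The following is only a minimal sketch: the data set work.demo and its
values are invented for illustration and are not part of the original
programs.

```sas
/* Toy data (hypothetical): one row per visit, so patients repeat. */
data work.demo;
   input drug patient gender $;
   datalines;
1 101 M
1 101 M
1 102 F
1 103 F
2 201 M
2 202 M
2 202 M
2 203 F
;
run;

proc sql;
   /* Inner query: nonduplicating patients per drug and gender.        */
   /* Outer query: SUM(cnt) is computed within each drug group, and    */
   /* the CASE expression replaces a zero denominator with missing.    */
   select *,
          round(cnt /
                case sum(cnt)
                   when 0 then .
                   else sum(cnt)
                end * 100) as pct
   from (select drug, gender,
                count(distinct patient) as cnt
         from work.demo
         group by drug, gender)
   group by drug
   order by 1, 2;   /* first and second result columns: drug, gender */
quit;
```

With these eight rows, each drug group has three distinct patients, so the
listing should show counts of 2 and 1 with percentages of 67 and 33 in each
group. The log is expected to carry a note that summary statistics were
remerged with the original data, which is how SUM(cnt) becomes available on
every row of its drug group.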
An asterisk (*) after the SELECT statement in the outer query indicates that
all the values, drug, &var and cnt, returned by the second SELECT statement
are used. In the second SELECT statement, the number (cnt) of nonduplicating
patients is counted by drug group and by the macro variable reference, &var,
when it is resolved. Percentages (pct) are calculated in the outer query using
cnt as numerator and the SUM of cnt as denominator. The CASE expression is
used to prevent an error message when the denominator is zero. WHEN the SUM of
cnt is zero, THEN it is set to missing, ELSE the SUM of cnt is the
denominator. Similar calculations are done after the UNION operator without
grouping patients by drug. Thus, the counts (cnt) and percentages (pct) of
gender and race for the two drug groups separately and combined are obtained.
The results are ordered by the values in the first and second columns, as
indicated by 1 and 2 in the ORDER BY clause. The first column is the first
variable specified in the SELECT statement, and the first variable is drug.
Similarly, the second column refers to the second variable in the SELECT
statement, and the second variable is a macro variable that varies depending
on the values supplied in the macro calls. In other words, the results are
ordered by drug and gender in the first macro call, and by drug and race in
the second macro call.

Besides the steps mentioned under the Total Patient Counts section, two
additional steps are saved. One is not having to pass the patient counts into
a DATA step for calculating percentages. The other is not having to sort the
result table in ascending order.

The second SQL procedure generates mean, standard deviation, range and number
of missing values of age, weight and height in Table 1.

    %macro yy(outds=,var=);
    proc sql;
       create table &outds as
       /* statistics within each drug group */
       select drug,
              "&var" as var,
              mean(&var) as mean,
              std(&var) as std,
              min(&var) as min,
              max(&var) as max,
              nmiss(&var) as nmiss
       from raw.data
       group by drug
       union
       /* statistics for all drug groups combined */
       select %eval(&numdrug+1) as drug,
              "&var" as var,
              mean(&var) as mean,
              std(&var) as std,
              min(&var) as min,
              max(&var) as max,
              nmiss(&var) as nmiss
       from raw.data;
    quit;
    %mend yy;

    %yy(outds=agestat,var=age);
    %yy(outds=wtstat,var=weight);
    %yy(outds=htstat,var=height);

In the macro calls to yy, the functions MEAN, STD, MIN, MAX and NMISS are used
to calculate summary statistics. The character string resolved from the macro
variable reference, &var, is used to associate each variable in the macro call
with its corresponding summary statistics. All the summary statistics for the
two drug groups separately and combined are calculated and concatenated within
one SQL procedure.
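How the summary functions treat missing values is easiest to see on a handful
of rows. The sketch below is a hedged illustration only: work.vitals and its
values are made up, and the query is the same shape as the body of the yy
macro with the variable name written out instead of &var.

```sas
/* Toy data (hypothetical) with one missing age in each drug group. */
data work.vitals;
   input drug patient age;
   datalines;
1 101 30
1 102 40
1 103 .
2 201 50
2 202 .
2 203 60
;
run;

%let numdrug=2;

proc sql;
   /* Statistics by drug group */
   select drug,
          "age" as var,
          mean(age) as mean,
          std(age) as std,
          min(age) as min,
          max(age) as max,
          nmiss(age) as nmiss
   from work.vitals
   group by drug
   union
   /* Statistics for all groups combined, labeled as drug 3 */
   select %eval(&numdrug+1) as drug,
          "age" as var,
          mean(age) as mean,
          std(age) as std,
          min(age) as min,
          max(age) as max,
          nmiss(age) as nmiss
   from work.vitals;
quit;
```

Missing ages are left out of MEAN, STD, MIN and MAX and are counted by NMISS,
so the means here should be 35, 55 and 45 for drug 1, drug 2 and the combined
row, with 1, 1 and 2 missing values respectively.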
ADVERSE EVENTS TABLES
The summary statistics for the two adverse events tables, Table 2 and Table 3,
can be obtained by calling the same single PROC SQL statement below.

    %macro zz(inds1=,inds2=,outds=,var=,selectif=,
              sortord=);
    proc sql;
       create table &outds (drop=totn) as
       /* seq=1: all patients with adverse events, by drug group */
       select distinct *,
              round(cnt /
                    case totn
                       when 0 then .
                       else totn
                    end * 100) as pct,
              1 as seq
       from (select &inds1..drug,
                    count(distinct patient) as cnt,
                    &inds2..totn
             from raw.&inds1 left join
                  perm.&inds2
             on &inds1..drug=&inds2..drug
             where &selectif
             group by &inds1..drug)
       outer union corresponding
       /* seq=2: patients with adverse events, by drug group and &var */
       select distinct *,
              round(cnt /
                    case totn
                       when 0 then .
                       else totn
                    end * 100) as pct,
              2 as seq
       from (select &inds1..drug, &var,
                    count(distinct patient) as cnt,
                    &inds2..totn
             from raw.&inds1 left join
                  perm.&inds2
             on &inds1..drug=&inds2..drug
             where &selectif
             group by &inds1..drug, &var)
       order by seq, drug, &sortord;
    quit;
    %mend zz;

    %zz(inds1=ae,inds2=totpat,outds=aebcnt,
        var=body,selectif=%str(period=2),sortord=
        cnt desc);
    %zz(inds1=ae,inds2=totpat,outds=aebccnt,var=
        %str(body,costart),selectif=%str(period=2),
        sortord=%str(body, cnt desc));

The DISTINCT keyword eliminates duplicate rows of data. The LEFT JOIN operator
retrieves matching rows and nonmatching rows based on the data specified on
the left (raw.&inds1). The ON clause specifies the columns for matching rows
in two data sets to be joined. The WHERE clause specifies a condition for
selecting the data. The OUTER UNION CORRESPONDING operator concatenates
results from SELECT statements similar to using a DATA step with a SET
statement. The differences between UNION and OUTER UNION CORRESPONDING are:
UNION matches columns in a table expression by ordinal position, keeping the
column names in the result table from the first table. OUTER UNION
CORRESPONDING, on the other hand, matches columns by column names. In
addition, when the OUTER UNION CORRESPONDING operator is used, the
non-matching columns are retained in the result table. The DESC keyword sorts
the result table in descending order.

Two sets of queries, identified by the variable seq as 1 and 2 for report
writing purposes, are concatenated by the OUTER UNION CORRESPONDING operator.
The first set of queries counts the total number (cnt) of nonduplicating
patients with adverse events, merges the results with the totpat permanent
data set by drug group, keeping only the rows from the adverse events counts
with the LEFT JOIN operator, and calculates the percentages (pct) of cnt. Only
patients from the double-blind period (period=2) are selected in the WHERE
clause. The second set of queries performs similar calculations, except that
the patient counts (cnt) and percentages (pct) are by body system or by body
system and COSTART term.

The first macro call to zz groups adverse events by body system. The second
macro call to zz groups adverse events by body system and COSTART term.

For the Adverse Events tables, the DISTINCT option is used in two different
ways: to count the number of nonduplicating patients for each adverse event
category, and to eliminate duplicate rows as a result of the LEFT JOIN.
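The two roles of DISTINCT alongside a LEFT JOIN can be seen in isolation with
a few toy rows. This is a sketch under stated assumptions: work.ae and
work.totpat are invented stand-ins for the adverse events data and the totpat
data set, the table aliases a and t are added for readability, and only the
seq=1 part of the query is reproduced.

```sas
/* Toy adverse event records (hypothetical): patient 101 has the same */
/* event twice, and one record falls outside period 2.                */
data work.ae;
   input drug patient period body $;
   datalines;
1 101 2 SKIN
1 101 2 SKIN
1 102 2 BODY
2 201 2 BODY
2 201 1 SKIN
;
run;

/* Toy total patient counts (hypothetical), including combined group 3. */
data work.totpat;
   input drug totn;
   datalines;
1 4
2 5
3 9
;
run;

proc sql;
   /* COUNT(DISTINCT patient) counts each patient once per drug group. */
   /* totn is not in the GROUP BY, so the inner query returns one row  */
   /* per joined record; SELECT DISTINCT collapses those duplicates.   */
   select distinct *,
          round(cnt /
                case totn
                   when 0 then .
                   else totn
                end * 100) as pct
   from (select a.drug,
                count(distinct patient) as cnt,
                t.totn
         from work.ae as a left join work.totpat as t
         on a.drug=t.drug
         where period=2
         group by a.drug);
quit;
```

The result should hold one row per drug group present in the adverse events
data: 2 of 4 patients (50) for drug 1 and 1 of 5 patients (20) for drug 2. The
drug 3 row of work.totpat has no match on the left side and is therefore not
returned. Dropping the outer DISTINCT would repeat the drug 1 row once for
each of its three period 2 records.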
The DISTINCT option is particularly useful for counting patients with adverse
events, because patients with multiple occurrences of the same adverse event
are to be counted once only. Among the steps mentioned previously, the most
important steps saved here are not having to sort the adverse events data and
to set the sorted data to get the first occurrences of adverse events by
patient.

The selection of first occurrences of each adverse event, the conditional
processing, the sorting, the summing, the calculation of percentages, the
concatenation of data sets, and the sorting of the result table by seq, drug,
body system, and descending adverse event counts (cnt) can all take place
within the same SQL procedure.

SUMMARY
This paper uncovers the potential of PROC SQL as a very useful data summary
tool, in addition to being a data retrieval tool.

The beauty of PROC SQL lies in the simplicity and resourcefulness of the code.
Several steps can be condensed to make one-step programming possible. The
tradeoff is that it generally takes more time to write and debug SQL programs,
because with PROC SQL, the intermediate results from each step take place
internally, and all the query expressions produce a single output table.

The programs in this paper were originally developed for an NDA, but the
programming logic and techniques can be used for other similar data summaries.

(Three sample NDA Integrated Summary of Safety tables are included at the end
of this paper.)

For additional information, contact:
Aileen L. Yam
Corning Besselaar, Inc.
210 Carnegie Center
Princeton, NJ 08540-6233
Tel.: (609) 452-4200

SAS is a registered trademark or trademark of SAS Institute Inc. in the USA
and other countries. (R) indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their
respective companies.
TABLE 1
SUMMARY OF DEMOGRAPHIC DATA

                          Drug 1        Drug 2        All Drug Groups

Total Patients            849           851           1700

Gender
  Male                    429 (51%)     432 (51%)     861 (51%)
  Female                  420 (49%)     419 (49%)     839 (49%)

Race
  White                   467 (55%)     471 (55%)     938 (55%)
  Black                   362 (43%)     371 (44%)     733 (43%)
  Other                    20 (2%)        9 (1%)       29 (2%)

Age
  Mean                    36.2          37.1          36.8
  Standard Deviation      16.3          16.1          16.2
  Range                   17-69         17-69         17-69
  # Missing               1             2             3

Weight (pounds)
  Mean                    155.2         156.1         155.8
  Standard Deviation      10.6          10.9          10.8
  Range                   96-209        98-212        96-212
  # Missing               0             1             1

Height (inches)
  Mean                    64.7          65.6          65.2
  Standard Deviation      8.8           9.0           8.9
  Range                   59-72         60-76         59-76
  # Missing               0             0             0

TABLE 2
NUMBER AND PERCENT OF PATIENTS WITH ADVERSE EVENTS
BY BODY SYSTEM

                                        Drug 1         Drug 2

Total Patients                          849            851

Total Patients with Adverse Events      420 (49%)      320 (38%)

BODY AS A WHOLE                         360 (42%)      277 (33%)
DIGESTIVE SYSTEM                        280 (33%)      230 (27%)
SKIN AND APPENDAGES                     200 (24%)      207 (24%)
RESPIRATORY SYSTEM                       39 (5%)        32 (4%)
CARDIOVASCULAR SYSTEM                    30 (4%)        28 (3%)
ENDOCRINE SYSTEM                          9 (1%)         3 (.4%)
NERVOUS SYSTEM                            2 (.2%)        1 (.1%)
etc.

TABLE 3
NUMBER AND PERCENT OF PATIENTS WITH ADVERSE EVENTS
BY BODY SYSTEM AND COSTART TERM

                                        Drug 1         Drug 2

Total Patients                          849            851

Total Patients with Adverse Events      420 (49%)      320 (38%)

BODY AS A WHOLE
  HEADACHE                              120 (14%)      110 (13%)
  CHILLS                                 70 (8%)        62 (7%)
  FLU SYNDROME                           64 (8%)        52 (6%)
  ALLERGIC REACTION                      52 (6%)        42 (5%)
  INFECTION                              47 (6%)        30 (4%)
  FEVER                                  18 (2%)        12 (1%)
  PAIN                                    3 (.4%)        1 (.1%)
  Subtotal                              360 (42%)      277 (33%)

DIGESTIVE SYSTEM
  DIARRHEA                              120 (14%)      100 (12%)
  NAUSEA                                 80 (9%)        70 (8%)
  FLATULENCE                             72 (8%)        34 (4%)
  STOMATITIS                             40 (5%)        30 (4%)
  GASTRITIS                              12 (1%)         8 (1%)
  ESOPHAGITIS                             6 (1%)         3 (.4%)
  CONSTIPATION                            2 (.2%)        1 (.1%)
  Subtotal                              280 (33%)      230 (27%)
etc.