THE CONSTRUCTION OF LITERATURE INDEXES USING SAS

Paul J. von Doehren, G. D. Searle & Co.

ABSTRACT

Index preparation is often useful when a moderate-to-large number of literature sources are available for a particular application. Simple alphabetical listings, perhaps with subgrouping by topic, are usually adequate when the number of references is small (e.g., up to 50 sources) and the number of topics is restricted. An improved form of indexing becomes attractive as the numbers of references, topics and potential users increase. Several indexing algorithms already exist. However, the acquisition and maintenance of an existing indexing software package for use with moderate-size source lists may not provide sufficient flexibility, simplicity and convenience compared to an algorithm implemented in a familiar high-level language (e.g., SAS).

A collection of SAS statements that constructs a listing of references according to subject is presented. The subject listing is produced using a combination of keyword and keyword-in-context approaches to index construction. Simple author and source listings are also constructed by the SAS code.

INTRODUCTION

Keyword indexes produced from title words and special keywords provide a convenient means of locating articles on a particular topic within a longer list of reference articles covering a wide range of subjects. This is especially true when the list of titles becomes too long to manage in a simple alphabetical listing (e.g., by author). A variety of potential users also tends to complicate the construction of simple alphabetical listings. Computer software is already available for the construction of keyword indexes (e.g., References 1 and 2). However, these are not readily accessible and convenient for the occasional user.

A previous application (Reference 3) illustrated how SAS could be used to generate a keyword-in-context (K.W.I.C.) index for a specialized application. It was pointed out that a basic keyword index could be constructed using a small number of SAS statements. The present paper introduces a set of SAS statements that provides a more general capability for literature indexing. SAS users should find it easy to implement and modify the statements (if necessary) to fit specialized needs.

DISCUSSION

The indexing code has been divided into four sections. Each section has been placed in a separate macro as follows:

MACRO INDXDATA - Transfers reference information from a user file to a SAS data set. (See documentation in Figure 6 for instructions on preparation of the data file.)

MACRO INDXAUTH - Constructs an alphabetical listing by author. One entry is produced for each author.

MACRO INDXSOUR - Constructs an alphabetical listing according to reference source. One entry is included for each reference.

MACRO INDXKEYW - Transfers drop-list information from a user file to a SAS data set and constructs an alphabetical listing according to title words and special keywords. One entry is included for each word or keyword except for those words that appear on the drop-list.

The author index (macro INDXAUTH) is constructed very simply by creating a separate data set observation for each author as it is assigned to a special sort variable, SORTKEY. The resulting data set is then sorted by the SORTKEY variable and formatted for output.

The source index (macro INDXSOUR) is just a simple sort on the SOURCE and AUTHOR1 (i.e., principal author) variables followed by output formatting.

The keyword index (macro INDXKEYW) is a little more complicated. However, the majority of the code is concerned with formatting the output. Separate data set observations are created for each title "word" (i.e., SAS variable) and each keyword as they are assigned to a special KEYWORD variable. The resulting data set is sorted by the KEYWORD variable and merged with the drop-list data set. All observations that match the drop-list entries are deleted. The remainder of the code is concerned with formatting the output and in particular with shifting the title left or right (wrapping around to the opposite side of the page where necessary) so that the keywords form a vertical column down the page. The appearance of the output is described in the discussion of the example.

EXAMPLE

The information typically available for a reference includes author(s), title, keywords and publication source. The list of references for this paper provides an example of these types of information except for special keywords, which might be obtained from the reference, an abstract or a review. Figure 1 presents this same information (edited) in a form that is usable by the indexing code. The author(s) for a particular reference are entered on a single line (or card). The title is entered using up to two lines. The keywords are also entered on up to two lines and the source identification is entered on a single line. A detailed description for preparing input is included in the documentation provided in the listing (Figure 6).

For the example of Figure 1 the first two references used five lines (cards) each and the last reference used four lines. The '$$' entries in the title lines of the first two references indicate that the title continues on the following line (see file description, Figure 6). The information for the references is placed on a "file" and is accessed in the code by an INFILE statement under the DDNAME of DATA (Figure 6).

Words that appear in the title but which are to be dropped by the code when constructing entries for the keyword index are included in a second file. Figure 2 provides a listing of words that were dropped during index construction for the example. This file is accessed by the code under the DDNAME of NOTKEY (Figure 6).

Figures 3, 4, and 5 display the author, source and keyword indexes, respectively, that were produced using the macros plus the following SAS code.

    INDXDATA
    INDXAUTH
    TITLE TITLE FOR AUTHOR INDEX EXAMPLE;
    INDXSOUR
    TITLE TITLE FOR SOURCE INDEX EXAMPLE;
    INDXKEYW
    TITLE TITLE FOR KEYWORD INDEX EXAMPLE;

Separate index entries appear for each author of a reference in the author index (Figure 3). The author(s) appear at the left of the index entry. When a reference has more than one author the positions of the authors are interchanged so that the author used in the sorting operation appears at the left of the entry. The reference title and source are also included in each index entry.

The same information appears in the source index entries (Figure 4). The positions of the source and author(s) are interchanged so that the source appears at the left side of the entry.

The title in the keyword index (Figure 5) is shifted left or right so that the keyword is positioned in a vertical column down the page to facilitate visual scanning. If necessary, the title will "wrap around" to the opposite side of the page (e.g., see the title line for the keyword "THEORY" in Figure 5). A dashed line is also included in the line below each keyword to make the keyword column easier to follow. If the entry corresponds to a special keyword, e.g., 'STATISTICAL-PACKAGES' in Figure 5, the title is positioned with the first word in the keyword column and the keyword is positioned directly below. Author(s) and source information are also included in each keyword index entry.

REFERENCES

(1) Joiner, B. L. and J. M. Gwyne (Eds.), "Current Index to Statistics: Applications, Methods and Theory, Volume 5," Am Stat Assn and Inst of Math Stat, 1979.

(2) Ross, I. C. and J. W. Tukey, "Index to Statistics and Probability: Permuted Titles; A-Microbiology," The R&D Press, Los Altos, Calif., 1975.

(3) Spitznagel, E. L., Jr., "K.W.I.C. Indexes With SAS," Proc. Third Annual SUGI Conference, pp. 267-270, 1977.

FIGURE 1. LISTING OF REFERENCE INFORMATION (EXAMPLE)

JOINER, B. L.(ED.)  GWYNE, J. M. (AST.ED.)
CURRENT INDEX TO STATISTICS- $$
APPLICATIONS, METHODS AND THEORY, VOLUME-5
NONE
AM ST ASN & INST MATH ST, 1979
ROSS, I. C.  TUKEY, J. W.
INDEX TO STATISTICS AND PROBABILITY: $$
PERMUTED TITLES, A-MICROBIOLOGY
NONE
THE R&D PRESS,LOS ALTOS,CA,'75
SPITZNAGEL, E. L.,JR
K.W.I.C. INDEXES WITH SAS
STATISTICAL-PACKAGES
PROC 3RD SUGI CONF,P267,1977

FIGURE 2. LISTING OF WORDS TO BE DROPPED (EXAMPLE)

A-MICROBIOLOGY
AND CURRENT
TITLES TO VOLUME-5 WITH

FIGURE 3. AUTHOR INDEX (EXAMPLE)

                                                  TITLE FOR AUTHOR INDEX EXAMPLE          0:16 MONDAY, JANUARY 26, 1981   1
                                                           AUTHOR INDEX

 AUTHOR(S)                                                                                         SOURCE
     TITLE
 ----------------------------------------------------------------------------------------------------------------------------------

 GWYNE, J. M. (AST.ED.) * JOINER, B. L.(ED.)                                                       AM ST ASN & INST MATH ST, 1979
     CURRENT INDEX TO STATISTICS- APPLICATIONS, METHODS AND THEORY, VOLUME-5

 JOINER, B. L.(ED.) * GWYNE, J. M. (AST.ED.)                                                       AM ST ASN & INST MATH ST, 1979
     CURRENT INDEX TO STATISTICS- APPLICATIONS, METHODS AND THEORY, VOLUME-5

 ROSS, I. C. * TUKEY, J. W.                                                                        THE R&D PRESS,LOS ALTOS,CA,'75
     INDEX TO STATISTICS AND PROBABILITY: PERMUTED TITLES, A-MICROBIOLOGY

 SPITZNAGEL, E. L.,JR                                                                              PROC 3RD SUGI CONF,P267,1977
     K.W.I.C. INDEXES WITH SAS

 TUKEY, J. W. * ROSS, I. C.                                                                        THE R&D PRESS,LOS ALTOS,CA,'75
     INDEX TO STATISTICS AND PROBABILITY: PERMUTED TITLES, A-MICROBIOLOGY

FIGURE 4. SOURCE INDEX (EXAMPLE)

                                                  TITLE FOR SOURCE INDEX EXAMPLE          0:16 MONDAY, JANUARY 26, 1981   2
                                                           SOURCE INDEX

 SOURCE                                          AUTHOR(S)
     TITLE
 ----------------------------------------------------------------------------------------------------------------------------------

 AM ST ASN & INST MATH ST, 1979                  JOINER, B. L.(ED.) * GWYNE, J. M. (AST.ED.)
     CURRENT INDEX TO STATISTICS- APPLICATIONS, METHODS AND THEORY, VOLUME-5

 PROC 3RD SUGI CONF,P267,1977                    SPITZNAGEL, E. L.,JR
     K.W.I.C. INDEXES WITH SAS

 THE R&D PRESS,LOS ALTOS,CA,'75                  ROSS, I. C. * TUKEY, J. W.
     INDEX TO STATISTICS AND PROBABILITY: PERMUTED TITLES, A-MICROBIOLOGY

FIGURE 5. KEYWORD INDEX (EXAMPLE)

                                                  TITLE FOR KEYWORD INDEX EXAMPLE          0:16 MONDAY, JANUARY 26, 1981   3
                                                          KEYWORD INDEX

                                                 KEYWORD             * AUTHOR(S)                   SOURCE
 ----------------------------------------------------------------------------------------------------------------------------------
                    CURRENT INDEX TO STATISTICS- APPLICATIONS, METHODS AND THEORY, VOLUME-5
                                                 -------------       * JOINER, B. L.(ED.) * GWYNE, J. M. (AST.ED.)
                                                                                                   AM ST ASN & INST MATH ST, 1979
                                         CURRENT INDEX TO STATISTICS- APPLICATIONS, METHODS AND THEORY, VOLUME-5
                                                 -----               * JOINER, B. L.(ED.) * GWYNE, J. M. (AST.ED.)
                                                                                                   AM ST ASN & INST MATH ST, 1979
                                                 INDEX TO STATISTICS AND PROBABILITY: PERMUTED TITLES, A-MICROBIOLOGY
                                                 -----               * ROSS, I. C. * TUKEY, J. W.
                                                                                                   THE R&D PRESS,LOS ALTOS,CA,'75
                                        K.W.I.C. INDEXES WITH SAS
                                                 -------             * SPITZNAGEL, E. L.,JR
                                                                                                   PROC 3RD SUGI CONF,P267,1977
                                                 K.W.I.C. INDEXES WITH SAS
                                                 --------            * SPITZNAGEL, E. L.,JR
                                                                                                   PROC 3RD SUGI CONF,P267,1977
      CURRENT INDEX TO STATISTICS- APPLICATIONS, METHODS AND THEORY, VOLUME-5
                                                 -------             * JOINER, B. L.(ED.) * GWYNE, J. M. (AST.ED.)
                                                                                                   AM ST ASN & INST MATH ST, 1979
            INDEX TO STATISTICS AND PROBABILITY: PERMUTED TITLES, A-MICROBIOLOGY
                                                 --------            * ROSS, I. C. * TUKEY, J. W.
                                                                                                   THE R&D PRESS,LOS ALTOS,CA,'75
                         INDEX TO STATISTICS AND PROBABILITY: PERMUTED TITLES, A-MICROBIOLOGY
                                                 ------------        * ROSS, I. C. * TUKEY, J. W.
                                                                                                   THE R&D PRESS,LOS ALTOS,CA,'75
                           K.W.I.C. INDEXES WITH SAS
                                                 ---                 * SPITZNAGEL, E. L.,JR
                                                                                                   PROC 3RD SUGI CONF,P267,1977
                                                 K.W.I.C. INDEXES WITH SAS
                                                 STATISTICAL-PACKAGES* SPITZNAGEL, E. L.,JR
                                                 --------------------                              PROC 3RD SUGI CONF,P267,1977
                                        INDEX TO STATISTICS AND PROBABILITY: PERMUTED TITLES, A-MICROBIOLOGY
                                                 ----------          * ROSS, I. C. * TUKEY, J. W.
                                                                                                   THE R&D PRESS,LOS ALTOS,CA,'75
                                CURRENT INDEX TO STATISTICS- APPLICATIONS, METHODS AND THEORY, VOLUME-5
                                                 -----------         * JOINER, B. L.(ED.) * GWYNE, J. M. (AST.ED.)
                                                                                                   AM ST ASN & INST MATH ST, 1979
  INDEX TO STATISTICS- APPLICATIONS, METHODS AND THEORY, VOLUME-5                                                        CURRENT
                                                 -------             * JOINER, B. L.(ED.) * GWYNE, J. M. (AST.ED.)
                                                                                                   AM ST ASN & INST MATH ST, 1979
   INDEX TO STATISTICS AND PROBABILITY: PERMUTED TITLES, A-MICROBIOLOGY
                                                 -------             * ROSS, I. C. * TUKEY, J. W.
                                                                                                   THE R&D PRESS,LOS ALTOS,CA,'75

FIGURE 6. INDEX CODE LISTING

OPTIONS NOSOURCE;
MACRO INDXDATA
***********************************************************************;
********                                                       ********;
********    L I T E R A T U R E   I N D E X I N G   C O D E    ********;
********                                                       ********;
********                  PAUL J. VON DOEHREN                  ********;
********           SCIENTIFIC EVALUATION DEPARTMENT            ********;
********             G. D. SEARLE & CO., BOX 5110              ********;
********                CHICAGO, ILLINOIS 60680                ********;
********                                                       ********;
***********************************************************************;
* NAME    FILE DESCRIPTION
  ******* *************************************************************;
*NOTKEY   FILE CONTAINING WORDS TO BE DROPPED DURING THE CONSTRUCTION
          OF THE KEYWORD INDEX. LINE SIZE CURRENTLY LIMITED TO 72
          COLUMNS. MAXIMUM WORD SIZE IS 20 CHARACTERS. WORDS MUST BE
          SEPARATED BY AT LEAST ONE BLANK CHARACTER. ANY NUMBER OF
          WORDS MAY BE INCLUDED ON A GIVEN LINE SUBJECT TO THE ABOVE.
*DATA     FILE CONTAINING REFERENCE INFORMATION. EACH REFERENCE ENTRY
          CONSISTS OF FOUR PARTS:
          **AUTHOR: UP TO 5 AUTHORS ON ONE 72-COLUMN LINE. MAX 25
            CHARACTERS PER AUTHOR ENTRY, LAST NAME FIRST. SINGLE
            EMBEDDED BLANKS WITHIN AUTHOR. DOUBLE BLANK-CHARACTER
            BETWEEN AUTHORS.
          **TITLE: UP TO 25 WORDS ON ONE OR TWO (OPTIONAL) 72-COLUMN
            LINES. MAX 20 CHARACTERS PER WORD, NO EMBEDDED BLANKS.
            IF SECOND LINE USED THEN -$$- REQUIRED AS LAST ENTRY ON
            FIRST LINE TO INDICATE CONTINUATION.
          **KEY: UP TO 15 KEYWORDS ON ONE OR TWO (OPTIONAL) 72-COLUMN
            LINES. MAX 20 CHARACTERS PER WORD, NO EMBEDDED BLANKS.
            IF SECOND LINE USED THEN -$$- REQUIRED AS LAST ENTRY ON
            FIRST LINE TO INDICATE CONTINUATION.
          **SOURCE: UP TO 30 CHARACTERS (SINGLE EMBEDDED BLANKS) ON
            ONE LINE IDENTIFYING THE SOURCE OF THE REFERENCE.
******************************DATA INPUT*******************************;
DATA SETDATA;
LENGTH TITLE1-TITLE25 $ 20.; ARRAY TITLE (I) TITLE1-TITLE25;
LENGTH KEY1-KEY15 $ 20.; ARRAY KEY (I) KEY1-KEY15;
INFILE DATA LS=72 MISSOVER;
INPUT (AUTHOR1-AUTHOR5) (& $25.);
DO OVER TITLE; INPUT TITLE @; IF TITLE NE '$$' THEN GOTO END01;
   INPUT; INPUT TITLE @; END01: END; INPUT;
DO OVER KEY; INPUT KEY @; IF KEY NE '$$' THEN GOTO END02;
   INPUT; INPUT KEY @; END02: END; INPUT;
TLAST=26; KLAST=16;
INPUT SOURCE & $30.;
TLT=0; DO OVER TITLE; IF TITLE=' ' THEN GO TO ENDTITLE;
   TLT=TLT+LENGTH(TITLE)+1; END;
ENDTITLE: TLAST=I-1; IF TLT GT 115 THEN GOTO OBSDROP;
DO OVER KEY; IF KEY EQ ' ' THEN GOTO ENDKEY; END;
ENDKEY: KLAST=I-1; RETURN;
OBSDROP: PUT 'TITLE EXCEEDS LINE LENGTH FOR AUTHOR ENTRY';
   PUT @5 _N_ AUTHOR1-AUTHOR5; DELETE; %
MACRO INDXAUTH **************AUTHOR INDEX******************************;
DATA SETAUTH; SET SETDATA; ARRAY AUTHOR(J) AUTHOR1-AUTHOR5;
DO OVER AUTHOR; IF AUTHOR EQ ' ' THEN DELETE;
   SORTKEY=AUTHOR; OUTPUT; END;
PROC SORT DATA=SETAUTH OUT=SETAUTH; BY SORTKEY;
DATA _NULL_; SET SETAUTH;
ARRAY TITLE(I) TITLE1-TITLE25; ARRAY AUTHOR(J) AUTHOR1-AUTHOR5;
FILE PRINT COLUMN=NCOL HEADER=HEADER LINE=NLINE;
IF _N_ EQ 1 THEN PUT _PAGE_; IF NLINE GT 55 THEN PUT _PAGE_;
PUT @ 2 @; START=0;
DO OVER AUTHOR; IF AUTHOR EQ ' ' THEN GOTO FINISH;
   IF AUTHOR EQ SORTKEY THEN START=1;
   IF START EQ 1 THEN DO; PUT AUTHOR @; PUT '* ' @; END; END;
FINISH: DO OVER AUTHOR; IF AUTHOR EQ SORTKEY THEN GO TO SKIPA;
   PUT AUTHOR @; PUT '* ' @; END;
SKIPA: BACK2=-2; PUT +BACK2 '  ' @ 100 SOURCE;
PUT @ 6 @; DO OVER TITLE; IF TITLE EQ ' ' THEN GO TO SKIPT;
   PUT TITLE @; END;
SKIPT: PUT; PUT; RETURN;
HEADER: PUT @ 60 'AUTHOR INDEX'; PUT;
   PUT @ 2 'AUTHOR(S)' @100 'SOURCE';
   PUT @ 6 'TITLE'; PUT @ 2 130*'-'; PUT; %
MACRO INDXSOUR **************SOURCE INDEX******************************;
PROC SORT DATA=SETDATA OUT=SETSOUR; BY SOURCE AUTHOR1;
DATA _NULL_; SET SETSOUR;
ARRAY TITLE(I) TITLE1-TITLE25; ARRAY AUTHOR(J) AUTHOR1-AUTHOR5;
FILE PRINT COLUMN=NCOL HEADER=HEADER LINE=NLINE;
IF _N_ EQ 1 THEN PUT _PAGE_; IF NLINE GT 55 THEN PUT _PAGE_;
PUT @ 50 @; DO OVER AUTHOR; IF AUTHOR EQ ' ' THEN GOTO SKIPA;
   IF J NE 1 THEN PUT '* ' @; PUT AUTHOR @; END;
SKIPA: PUT @ 2 SOURCE;
PUT @ 6 @; DO OVER TITLE; IF TITLE EQ ' ' THEN GO TO SKIPT;
   PUT TITLE @; END;
SKIPT: PUT; PUT; RETURN;
HEADER: PUT @ 60 'SOURCE INDEX'; PUT;
   PUT @ 50 'AUTHOR(S)' @2 'SOURCE';
   PUT @ 6 'TITLE'; PUT @ 2 130*'-'; PUT; %
MACRO INDXKEYW **************KEYWORD INDEX*****************************;
DATA SETNOTK; INFILE NOTKEY LS=72; INPUT KEYWORD : $20. @@;
PROC SORT; BY KEYWORD;
DATA SETPERM; SET SETDATA;
ARRAY TITLE (LKEY) TITLE1-TITLE25; ARRAY KEY (I) KEY1-KEY15;
DO OVER TITLE; IF TITLE = ' ' THEN GOTO ENDTITLE;
   KEYWORD = TITLE; OUTPUT; END;
ENDTITLE: LKEY=0; IF KEY1='NONE' THEN GOTO ENDKEY;
DO OVER KEY; IF KEY=' ' THEN GOTO ENDKEY;
   KEYWORD=KEY; OUTPUT; END; RETURN;
ENDKEY: DELETE; RETURN;
PROC SORT; BY KEYWORD;
DATA INDEX; MERGE SETPERM SETNOTK(IN=NOTKEY); BY KEYWORD;
IF NOTKEY=1 THEN DELETE;
DATA _NULL_; SET INDEX;
ARRAY TITLE (I) TITLE1-TITLE25; ARRAY KEY (I) KEY1-KEY15;
ARRAY AUTHOR (J) AUTHOR1-AUTHOR5;
FILE PRINT COLUMN=NCOL HEADER=HEADER LINE=NLINE;
IF _N_ EQ 1 THEN PUT _PAGE_; IF NLINE GT 55 THEN PUT _PAGE_;
KEYWD=0; IF LKEY EQ 0 THEN DO; LKEY=1; KEYWD=1; END;
PUT @50 @; DO I=LKEY TO TLAST;
   IF LENGTH(TITLE) + NCOL GT 130 THEN PUT @2 @;
   PUT TITLE : $20. @; END;
IF LKEY GT 1 THEN DO; PUT @50 @; DO I=(LKEY-1) TO 1 BY -1;
   LOC=NCOL-LENGTH(TITLE)-1;
   IF LOC LT 2 THEN LOC=130-LENGTH(TITLE)-1;
   PUT @LOC TITLE : $20. @; PUT @LOC @; END; END; PUT;
KEYSIZE=LENGTH(KEYWORD);
IF KEYWD=0 THEN DO; PUT @50 @; DO NK=1 TO KEYSIZE; PUT '-' @; END;
   PUT @70 @; END; ELSE PUT @50 KEYWORD @70 @;
DO OVER AUTHOR; IF LENGTH(AUTHOR) + NCOL GT 130 THEN PUT @2 @;
   IF AUTHOR EQ ' ' THEN GO TO SKIPOUT; PUT '* ' AUTHOR @; END;
SKIPOUT: PUT;
IF KEYWD=0 THEN PUT @100 SOURCE;
ELSE DO; PUT @50 @; DO NK=1 TO KEYSIZE; PUT '-' @; END;
   PUT @ 100 SOURCE; END; RETURN;
HEADER: PUT @59 'KEYWORD INDEX'; PUT;
   PUT @50 'KEYWORD' @70 '* AUTHOR(S)' @100 'SOURCE';
   PUT @2 130*'-'; RETURN;
*****************************END OF CODE******************************;%
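
The permute, sort and merge logic of macro INDXKEYW can also be sketched in more recent SAS syntax. The fragment below is only a minimal illustration under stated assumptions, not the author's code: the data set names REFS, DROPLIST, PERM and KWIC, the '|' delimiter, the two sample records, the keyword column of 30, and the use of the COUNTW, SCAN, CATX and LENGTHN functions (none of which appear in the listing above) are all hypothetical choices made for the example. Special keywords, wrap-around of long titles, and the author and source lines are left out.

```sas
/* Hypothetical miniature input: author and title separated by '|'. */
data refs;
  infile datalines dlm='|' truncover;
  length author $ 25 title $ 115;
  input author $ title $;
  datalines;
SPITZNAGEL, E. L.,JR|K.W.I.C. INDEXES WITH SAS
ROSS, I. C.|INDEX TO STATISTICS AND PROBABILITY
;
run;

/* Hypothetical drop-list: title words that should not be indexed. */
data droplist;
  length keyword $ 20;
  input keyword $ @@;
  datalines;
AND TO WITH
;
run;

/* One observation per title word; LKEY records the word position. */
data perm;
  set refs;
  length keyword $ 20;
  do lkey = 1 to countw(title, ' ');
    keyword = scan(title, lkey, ' ');
    output;
  end;
run;

proc sort data=perm;     by keyword; run;
proc sort data=droplist; by keyword; run;

/* Keep only the title words that match no drop-list entry. */
data kwic;
  merge perm(in=inperm) droplist(in=indrop);
  by keyword;
  if inperm and not indrop;
run;

/* Write each title to the SAS log with its keyword starting in     */
/* column 30. Titles in this sketch are short enough that the words */
/* ahead of the keyword always fit to its left (no wrap-around).    */
data _null_;
  set kwic;
  length before after $ 115;
  before = ' ';
  after  = ' ';
  do i = 1 to countw(title, ' ');
    if i < lkey then before = catx(' ', before, scan(title, i, ' '));
    else             after  = catx(' ', after,  scan(title, i, ' '));
  end;
  startcol = 30 - lengthn(before) - 1;  /* first column of leading words */
  if lengthn(before) = 0 then put @30 after;
  else put @startcol before @30 after;
run;
```

As in the original macro, the drop-list comparison is an exact match on the whole word, so a title word that carries trailing punctuation (e.g., "TITLES,") is not removed by a drop-list entry of "TITLES".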