SAS DATABASE SYSTEMS FOR RESEARCH AND DEVELOPMENT, ANALYSIS, AND DATA COLLECTION

Lee H. Schwartz, 20MAR84
Diamond Shamrock Chemicals Company

ABSTRACT

Database systems utilizing mini or personal computers as SAS program generators are powerful and cost-effective data collection and analysis tools. These database systems meet the needs of a Research and Development environment. The capabilities of the database systems include data collection, maintenance, analysis, and reporting. All systems can be supported by a general analysis package which features subsetting and the common SAS PROCs. Database operation eliminates the need for users to know the SAS language and editing techniques.

TABLE OF CONTENTS

1. INTRODUCTION
   1.1 Purpose of this paper
   1.2 Research and Development data requirements
   1.3 System description
   1.4 System benefits
2. DATABASE CONCEPTS AND DESCRIPTION
   2.1 Database types
   2.2 JCL support files
   2.3 Database program types
   2.4 Database maintenance programs
       2.4.1 Data input programs
       2.4.2 Data correction programs
       2.4.3 Data recalculation programs
       2.4.4 Data deletion programs
       2.4.5 Notes on database maintenance
   2.5 Analysis programs
       2.5.1 Standard analyses
       2.5.2 Non-standard analyses
       2.5.3 Complex non-standard analyses
   2.6 Data printout programs
   2.7 Utility programs
3. SUMMARY

b) Reporting
Reporting, usually in tabular form, is needed to print input data and reproduce existing data. Many times a variety of reports is produced to satisfy the various informational needs of different personnel.

c) Analysis
Even though reporting displays the information collected, many times the report does not give insight into the system being studied. Therefore, in addition to reporting, analysis is a key step in the understanding or development of a process or product.

d) Ease of use and cost effectiveness
As with any computer project the system must be easy enough for the users to understand and operate with confidence. And, of course,
the costs of using the computer system must not exceed the benefits obtained.

e) Security
Since the information being collected is usually proprietary, some form of security may be required.

1.3 System description

Interaction with the database, by the user, is achieved by running Databus programs written to accomplish specific database operations. These Databus programs accept user input to produce files containing SAS programs. Communication is then established with the mainframe computer which runs SAS, and the files containing the SAS programs are submitted in batch.

Typically there is one Databus program for each specific database operation. Each program requires input of only the information that is variable. All the constant information is built into the Databus program and is unseen by the user.

Besides the specific support programs written for each database system, a generalized analysis software package has been developed. This package, called Data Analysis Software, gives simple analysis capabilities to personnel who do not know SAS or program editing. Instead, users can concentrate upon the analysis they want to perform. Data Analysis Software covers JCL, locating the database, subsetting, grouping, and various SAS PROCs.

Section 1. INTRODUCTION

1.1 Purpose of this paper

The SAS language can be a very important data collection and analysis tool in a research and development environment. Database systems utilizing SAS offer versatility, ease of use, and cost effectiveness. The purpose of this paper is to describe SAS based data collection and analysis systems that have been developed at the Diamond Shamrock Chemicals Company over the past few years.

The database systems described in this paper utilize remote mini-computers for preparation of batch SAS programs. These SAS programs are then submitted over the telephone lines to the mainframe computer where they are executed.
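The front-end generator idea can be illustrated with a short sketch. It is written in Python rather than Databus, and it is only an illustration under stated assumptions: the template text, the simplified one-card column layout, and the names `JOB_TEMPLATE`, `format_card`, and `write_input_job` are hypothetical and are not taken from the systems described here. The point it shows is that the constant text lives inside the generator while the user supplies only the variable information.

```python
# Hypothetical sketch of a front-end SAS program generator.
# The paper's generators were written in Databus; every name and the
# simplified one-card layout below are assumptions made for illustration.

# Constant part of the job: the user never sees or edits this text.
JOB_TEMPLATE = """\
//job card
// EXEC SAS
//SYSIN DD *
TITLE1 INSTITUTION NAME;
TITLE2 BOWLING DATABASE;
DATA INCOMING;
  INPUT NAME $ 1-20 GAME1 22-24 GAME2 26-28 GAME3 30-32;
  SERIES=GAME1+GAME2+GAME3;
CARDS;
{cards}
;
"""


def format_card(name, games):
    """Lay one observation out in the fixed columns named in the INPUT statement."""
    if len(name) > 20:
        raise ValueError("name must fit in columns 1-20")
    if len(games) != 3 or not all(0 <= g <= 300 for g in games):
        raise ValueError("expected three scores between 0 and 300")
    g1, g2, g3 = games
    # Columns: name 1-20, blank, GAME1 22-24, blank, GAME2 26-28, blank, GAME3 30-32.
    return f"{name:<20} {g1:>3} {g2:>3} {g3:>3}"


def write_input_job(observations):
    """Return the text of a complete batch job for the given observations."""
    cards = "\n".join(format_card(name, games) for name, games in observations)
    return JOB_TEMPLATE.format(cards=cards)
```

Calling `write_input_job([("Joe Smith", (150, 160, 170))])` returns the full job text with one data card. Because the range check rejects impossible scores before anything is written, a bad entry never reaches the mainframe.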
This paper covers the benefits of using mini-computers as front-end SAS generators for data collection and analysis.

The SAS PROCs covered by Data Analysis Software include PLOT, GLM, REG, STEPWISE, CHART, PRINT, RSQUARE, MEANS, and SAS/GRAPH GPLOT. All the software described can be operated by technicians or secretaries. Data processing personnel are only needed to establish the database system.

Note: Databus is a Datapoint computer language. A different type of mini-computer (or personal computer) and alternate language could be used in place of Datapoint equipment and the Databus language.

SAS is a trademark of SAS Institute Inc. DATABUS and DATAPOINT are registered trademarks of the Datapoint Corporation.

1.2 Research and Development data requirements

In many R&D environments data collection and analysis is the only tool for understanding and improving a product or process. Unfortunately, many products or processes cannot be easily analyzed. This may be due to the complexity of the system or magnitude of data to be analyzed. With the advent of computers, analyses which were formerly not justified due to the time and cost involved are now possible and economical.

The typical data handling requirements of an R&D project are:

a) Data collection and maintenance
Some form of input is required to enter information into the computer. This may take the form of manual input or automated input. For correction purposes the capability to modify existing data must also be provided.

Example: In a bowling league database the bowler's name is the key variable. The bowler's sex, birthdate, and team name are ID variables. The bowler's weekly scores and the date they were bowled are operating data variables.

1.4 System benefits

The following benefits have been observed from using this database system structure:

a) Simplicity
The database programs require only the information which is variable. The users do not need to see or understand the SAS programs and JCL which they have written.
Program operations are: pick from the list, fill in the blank, yes or no questions. Defaults allow repetition of previous values or generation of missing values.

b) Integrity
A quality database program will always write a syntactically correct SAS job, regardless of size. Any user can prepare SAS programs with the knowledge and confidence that they will always run the first time.

c) Versatility
The databases are just tables of data located at the mainframe computer. All the characteristics and relationships of the database system are resident in the Databus programs on the Datapoint equipment. This means the structure and operation of the database can be easily changed by altering the Databus programs. New and existing functions, input variables, and dependent variables can be added or changed at will. This structure permits the evolution of the database into a form most suited to the application.

d) Minimized cost
There are no timesharing costs for running a Datapoint computer for the required hours of data input. Often batch processing costs far less than timesharing. A high speed modem allows low cost database operation thousands of miles from the mainframe site. Most importantly, a SAS job can be generated in less than 10% of the time it would take to write by hand. There are no debugging costs since the SAS programs run correctly the first time.

e) Data processing staff relief
Many times the data processing staff becomes overloaded. Programs which write SAS in effect transfer these data processing tasks to the users placing the requests. Not only does this reduce a data processing staff's workload, but it forces the user to assign the proper priority to the task.

Section 2. DATABASE CONCEPTS AND DESCRIPTION

2.1 Database types

There are two types of database systems which are commonly used. These database systems operate similarly, differing only in the amount of disk storage required.

a) One dataset database
A one dataset database is the simplest type of database. It consists of observations containing a value for each variable. Usually a variable is selected as the key variable.

b) Two dataset database
A two dataset database consists of matched 'ID data' and 'Operating data' datasets. These two datasets are merged, or associated, using a key variable. The ID data dataset contains the key variable and variables which are always constant for any given key variable. The operating data dataset contains the key variable and variables which are not necessarily constant for the key variable.

    ID data dataset:         NAME  TEAMNAME  SEX  BIRTHDAY
    Operating data dataset:  NAME  DATE  GAME1  GAME2  GAME3

This structure is used to eliminate redundant storage of constant data such as TEAMNAME, SEX, and BIRTHDAY. This bowling database will be used as a simple example throughout this paper.

2.2 JCL support files

The Job Control Language (JCL) cards required for a database system generally remain constant. Modifications to the JCL cards are only required when the administrators of the mainframe computer make changes in the JCL structure and when the database is moved to a different SAS dataset.

To simplify conversion when JCL changes occur, all database programs call upon two available JCL support files. These programs access only one of the JCL files, either the read-write or the read-only file. Only programs which write to the database access the read-write JCL. This is a safety measure which prevents any possible loss of data during generation of reports or analyses.

To implement a JCL change, the two JCL support files are converted through edit to the new format. All SAS programs generated after the conversion will contain the updated JCL cards.

2.3 Database program types

The various database programs can be broken down into four types:

a) Database maintenance programs, Section 2.4
Database maintenance programs perform input, correction, calculation of dependent variables, and deletion of data. Many times reports from the printout programs are included in the database input programs. Database maintenance programs are the only programs which can write to the database datasets.

b) Analysis programs, Section 2.5
Analysis programs perform some type of analysis on the database or database subset and produce a report. A system may have any number of analysis programs. Whenever the need develops, additional analysis programs may be added to the database system. Analysis programs are read-only; they cannot write to the database datasets.

c) Data printout programs, Section 2.6
Data printout programs produce a formatted report of the database or database subset. A printout program may be classified as a simple analysis program. A system can have any number of printout programs, which present the database information in different forms. Printout programs are read-only; they cannot write to the database datasets.

d) Utility programs, Section 2.7
Utility programs are written for a database system but are not involved with database maintenance, analysis, or printing. Menu and communications programs are examples of utility programs.

2.4 Database maintenance programs

Database maintenance programs are the only programs which can write to the database. The following database maintenance program types have been developed:

a) Data input programs, Section 2.4.1
Data input programs accept screens of data input by the user. A SAS job is written which, when sent to the mainframe, creates an input dataset, calculates dependent variables, and appends the input dataset to the database dataset.

b) Data correction programs, Section 2.4.2
Data correction programs allow modification of database information by specifying the observation containing the error, the variable to be corrected, and the new value for that variable.

c) Data recalculation programs, Section 2.4.3
Data recalculation programs regenerate dependent variables for selected observations in the database. Recalculation programs are used to correct dependent variables after an input variable has been corrected.

d) Data deletion programs, Section 2.4.4
Data deletion programs allow purging of unwanted database observations.

2.4.1 Data input programs

Data input programs allow the repeated addition of observations to a database. There are two types of data input programs, manual and automatic. Manual input programs require personnel to enter data by hand. Automatic input programs transform data collected by instruments into SAS programs without requiring manual input. Both types of input programs write similar SAS programs.

A typical manual data input program starts by requesting the name of the output file which will contain the SAS program. The user then begins filling in the input screen. Usually an observation can be input with one input screen. If there are too many variables to fit in one screen, multiple screens are used to input one observation. After the screen has been entered, the user is allowed to make as many changes as required to correct input errors. After approval of the data on the screen, the observation is written to the SAS program being generated, and the data input program cycles back to a fresh input screen. Input continues until all observations have been entered. After input is complete the report options are selected (ex: number of copies). The SAS program is then ready to be sent to the mainframe for execution. This SAS program is submitted only once and is destroyed, since resubmission would duplicate observations which have already been added.

Since a significant amount of time is spent running data input programs, it is important to make the input screens efficient for the operators. Defaults to missing values or to the last value entered can often reduce input time by 10-50%. Range checking prevents input of bad data and the subsequent time spent on correction.

Body of a SAS program generated by a data input program (bowling database):

    DATA INCOMING;
      INPUT #1 NAME $ 1-20 MONTH 25-26 DAY 28-29 YEAR 31-32
            #2 GAME1 5-10 GAME2 15-20 GAME3 25-30;
    ***;
      DATE=MDY(MONTH,DAY,YEAR);
      SERIES=GAME1+GAME2+GAME3;
      DAYAVG=SERIES/3;
    ***;
    CARDS;
    Joe Smith               08/08/83
        155       172       145
    John Brown              08/08/83
        150       161       190
    Steve Johnson           01/22/83
        156       176       182
    more data
    ***;
    PROC SORT; BY NAME DATE;
    ***;
    DATA BOTH; SET RDAT.BOWLING INCOMING; BY NAME DATE;
    ***;
    DATA BOWLING; SET BOTH; BY NAME DATE;
    ***;
    DATA _NULL_; SET INCOMING; BY NAME DATE;
      FILE PRINT

2.4.2 Data correction programs

Data correction programs allow modification of existing database information. This may be necessary due to operator input error, discovery that data was originally in error, security, or for various other reasons.

A typical data correction program starts by requesting the name of the output file which will contain the SAS program. The user then selects a specific database observation which will be modified. In a bowling database the bowler's name and the date bowled would be entered. After specifying the observation the user selects the variable to be modified. A new value for the variable is then entered. The user can then modify another variable in the current observation, specify another observation to be modified, or terminate correction. Whenever an observation is specified a FIND statement is written to the SAS program. Each time a variable and value is entered a REP statement is written. A VERIFY RESET statement is written after completing the modification of each observation. This prevents SAS from skipping the remaining modifications if an observation could not be found. The SAS program is submitted only once and is destroyed, since resubmission is redundant.

The following example details the structure of a typical SAS program generated by a data correction program. PROC EDITOR is used but a DATA step with IF statements would also perform the same operation.
In order to maintain the database sort, modification of the observation specification variables (ex: bowler's name and date bowled) is not permitted using the PROC EDITOR program style.

    //job card
    // EXEC SAS
    //RDAT DD DSN=dataset name,DISP=SHR
    //WDAT DD DSN=dataset name,DISP=OLD
    //SYSIN DD *
    ***;
    TITLE1 INSTITUTION NAME; TITLE2 BOWLING DATABASE;
    ***;
    PROC EDITOR DATA=WDAT.BOWLING; RUN;
    ***;
    VER 1,LAST FIND NAME='Joe Smith' DATE=8589;
    REP GAME1=166; REP GAME2=167;
    VERIFY RESET;
    ***;

The following example details the structure of a typical SAS program generated by a data input program. Only the data cards in the program vary. All other cards are constant and written by the data input program. The data input program writes the data to the SAS program in a format which matches the SAS INPUT statement.

    //job card
    // EXEC SAS
    //RDAT DD DSN=dataset name,DISP=SHR
    //WDAT DD DSN=dataset name,DISP=OLD
    //SYSIN DD *
    ***;
    TITLE1 INSTITUTION NAME; TITLE2 BOWLING DATABASE;
    ***;

A typical data deletion program starts by requesting the name of the output file which will contain the SAS program. The user then specifies observations to be deleted. In a bowling database the bowler's name and date bowled would be entered. The user specifies all observations to be deleted. The SAS program is submitted only once and is destroyed, since resubmission is redundant.

The following example details the structure of a typical SAS program generated by a data deletion program. Only the names between single quotes and the date are entered. The cards remain constant except for the number of OR cards in the IF statement. Data deletion programs are identical to data recalculation programs except DELETE replaces the equations.
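The way the OR cards grow with the number of specified observations, and the way deletion and recalculation differ only in the action taken, can be sketched as follows. This is an illustration in Python, not the Databus program described here; the function names and the `RECALC_ACTION` constant are hypothetical, and the date values are passed in already converted to SAS date numbers.

```python
# Hypothetical sketch: building the subsetting IF statement from the
# observations a user has specified. Function names are illustrative only.

def sas_quote(text):
    """Quote a value as a SAS character literal, doubling embedded quotes."""
    return "'" + text.replace("'", "''") + "'"


def subset_condition(keys):
    """One 'NAME=... AND DATE=...' clause per observation, joined by OR cards."""
    if not keys:
        raise ValueError("at least one observation must be specified")
    clauses = [f"NAME={sas_quote(name)} AND DATE={int(date)}" for name, date in keys]
    return "\n   OR ".join(clauses)


def maintenance_step(keys, action="DELETE;"):
    """DATA steps that apply `action` to the selected observations.

    The default action deletes; a recalculation program would instead pass
    a DO ... END block holding the dependent-variable equations.
    """
    return (
        "DATA GOT; SET RDAT.BOWLING; BY NAME DATE;\n"
        f"IF {subset_condition(keys)}\n"
        f"THEN {action}\n"
        "DATA WDAT.BOWLING; SET GOT; BY NAME DATE;\n"
    )


# Equations taken from the bowling example; used for recalculation.
RECALC_ACTION = (
    "DO; DATE=MDY(MONTH,DAY,YEAR); SERIES=GAME1+GAME2+GAME3; "
    "DAYAVG=SERIES/3; END;"
)
```

With two observations, `maintenance_step([("Joe Smith", 8589), ("John Brown", 8589)])` produces one IF card and one OR card followed by `THEN DELETE;`. Passing `RECALC_ACTION` as the second argument yields the recalculation form of the same program.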
    //job card
    // EXEC SAS
    //RDAT DD DSN=dataset name,DISP=SHR
    //WDAT DD DSN=dataset name,DISP=OLD
    //SYSIN DD *
    ***;
    TITLE1 INSTITUTION NAME; TITLE2 BOWLING DATABASE;
    ***;
    DATA GOT; SET RDAT.BOWLING; BY NAME DATE;
    ***;
    IF NAME='Joe Smith' AND DATE=8589
    OR NAME='John Brown' AND DATE=8589
    THEN DELETE;
    ***;
    DATA WDAT.BOWLING; SET GOT; BY NAME DATE;

Remaining statements of the data correction example:

    FIND VER 1,LAST NAME='John Brown' DATE=8589;
    REP GAME1=155;
    VERIFY RESET;
    ***;

2.4.3 Data recalculation programs

Data recalculation programs regenerate dependent variables for selected observations in the database. If an input variable is corrected and dependent values were calculated from it, data recalculation is necessary. Data recalculation is not needed when the corrected input variable is not used in dependent variable calculations.

A typical data recalculation program starts by requesting the name of the output file which will contain the SAS program. The user then specifies observations which need to be recalculated. In a bowling database the bowler's name and date bowled would be entered. The user specifies as many observations as needed. The SAS program is submitted only once and is destroyed, since resubmission is redundant.

The data recalculation program utilizes the same equations as the dependent variable calculation section of the data input SAS program (Section 2.4.1). The following example details the structure of a typical SAS program generated by a data recalculation program. Only the names between single quotes and the date are entered. The cards remain constant except for the number of OR cards in the IF statement.

    //job card
    // EXEC SAS
    //RDAT DD DSN=dataset name,DISP=SHR
    //WDAT DD DSN=dataset name,DISP=OLD
    //SYSIN DD *
    ***;
    TITLE1 INSTITUTION NAME; TITLE2 BOWLING DATABASE;
    ***;
    DATA GOT; SET RDAT.BOWLING; BY NAME DATE;
    ***;
    IF NAME='Joe Smith' AND DATE=8589
    OR NAME='John Brown' AND DATE=8589
    THEN DO;
      DATE=MDY(MONTH,DAY,YEAR);
      SERIES=GAME1+GAME2+GAME3;
      DAYAVG=SERIES/3;
    END;
    ***;
    DATA WDAT.BOWLING; SET GOT; BY NAME DATE;

2.4.4 Data deletion programs

Data deletion programs allow purging of unwanted database observations. Data deletion combined with data input, data correction, and data recalculation gives the user the ability to modify the database to any desired form.

2.4.5 Notes on database maintenance

The database system described in Section 2.4 is versatile, but is not the only type of structure which may be used. A system can be created which has correction and recalculation combined with data input. When a new observation is entered through data input, it is added to the database. If the observation already exists, then the entered observation replaces the existing observation. Depending on the requirements of the database system, a structure can be selected which best meets the needs of the user.

If a new input variable is to be added to the database the following steps are performed:

a) A new field is added to the input screen.
b) The SAS INPUT statement is modified to include the new input variable.
c) The data input program's write statement is modified to also write the new input variable.
d) Any report which will include the new input variable is modified.

SAS automatically creates missing values, for the new variable, for every observation input before addition of the new variable.

If a new dependent variable needs to be added to the database, its formula is added to the equation sections of the SAS programs generated by the data input program (Section 2.4.1) and the data recalculation program (Section 2.4.3).

2.5 Analysis programs

Analysis programs perform some type of analysis upon the database or database subset and produce a report. A system may have any number of analysis programs. Whenever the need develops, additional analysis programs may be added to the database system. Analysis programs are read-only, and cannot write to the database datasets.
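The read-only rule ties back to the two JCL support files of Section 2.2. A minimal sketch of that selection follows, written in Python for illustration. The dictionary stands in for the two support files, `dataset name` is the same placeholder used in the listings, and the names `JCL_SUPPORT` and `jcl_for` are hypothetical.

```python
# Hypothetical sketch of the two JCL support files and how a generator
# might choose between them. The dictionary stands in for two files kept
# on the front-end machine; 'dataset name' is a placeholder, not a real DSN.

JCL_SUPPORT = {
    "read-only": (
        "//job card\n"
        "// EXEC SAS\n"
        "//RDAT DD DSN=dataset name,DISP=SHR\n"
        "//SYSIN DD *\n"
    ),
    "read-write": (
        "//job card\n"
        "// EXEC SAS\n"
        "//RDAT DD DSN=dataset name,DISP=SHR\n"
        "//WDAT DD DSN=dataset name,DISP=OLD\n"
        "//SYSIN DD *\n"
    ),
}

# Only maintenance programs write to the database.
WRITING_PROGRAMS = {"input", "correction", "recalculation", "deletion"}
READING_PROGRAMS = {"analysis", "printout"}


def jcl_for(program_type):
    """Return the JCL cards appropriate to the kind of program being generated."""
    if program_type in WRITING_PROGRAMS:
        return JCL_SUPPORT["read-write"]
    if program_type in READING_PROGRAMS:
        return JCL_SUPPORT["read-only"]
    raise ValueError(f"unknown program type: {program_type!r}")
```

Since an analysis or printout job never receives a WDAT card, it has no way to open the database for output. A JCL change is made in one place, the two stored texts, and every program generated afterwards picks it up.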
Analysis programs fall into three main categories:

a) Standard analyses, Section 2.5.1
A standard analysis is any analysis which is performed more than once. This is accomplished by running an analysis program that generates a SAS program, which performs the analysis. Usually the input to the analysis program consists of providing the specifications necessary to obtain the desired database subset.

b) Non-standard analyses, Section 2.5.2
A non-standard analysis is any analysis performed only once. Since it is a one-time SAS program, writing a special analysis program is not justified. Instead, a generalized analysis package has been written which covers database subsetting and the common SAS PROCs.

c) Complex non-standard analyses, Section 2.5.3
If a non-standard analysis is not simple enough to be accomplished using the generalized analysis package, then the user must resort to writing a SAS program by hand. Therefore, only users that know SAS will be able to perform complex analyses.

2.5.1 Standard analyses

A standard analysis is generally used to summarize information in the database. Standard analyses can be used to evaluate logical subsets of the database, or to reduce large amounts of data to a reasonable and understandable level. Since the analysis will be performed repeatedly, development of a specialized program is justified.

Subsetting and analysis statements of a simple bowling analysis SAS program:

    ***;
    IF NAME='Joe Smith' OR NAME='John Brown'
    THEN OUTPUT;
    ***;
    *** PRINTOUT INDIVIDUAL'S RECORD;
    ***;
    PROC PRINT DATA=GOT; BY NAME;
      VAR DATE SERIES GAME1 GAME2 GAME3 DAYAVG;
    ***;
    *** MEANS, MAXIMUMS, AND MINIMUMS;
    ***;
    PROC MEANS MAXDEC=0 DATA=GOT; BY NAME;
      VAR SERIES GAME1 GAME2 GAME3;
    ***;
    *** PLOT & CORRELATION OF SERIES VS DATE;
    ***;
    PROC REG DATA=GOT; BY NAME;
      MODEL SERIES=DATE;
      OUTPUT OUT=SERIES P=PREDVALU;
    PROC PLOT UNIFORM; BY NAME;
      PLOT SERIES*DATE PREDVALU*DATE='+' / OVERLAY;
    ***;

Notes: SERIES is the total of the three games bowled on a given day. DAYAVG is the daily average.
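The arithmetic behind the dependent variables and the per-bowler summary can be checked with a few lines of Python. This is only a worked illustration: the scores are made up, the dictionary layout and function names are hypothetical, and it does not reproduce the output format of PROC MEANS.

```python
from statistics import mean

# Hypothetical sketch of the dependent-variable arithmetic and a
# per-bowler summary. Scores below are invented for illustration.


def add_dependents(obs):
    """Add SERIES (three-game total) and DAYAVG (daily average) to one observation."""
    series = obs["GAME1"] + obs["GAME2"] + obs["GAME3"]
    return {**obs, "SERIES": series, "DAYAVG": series / 3}


def summarize_by_name(observations):
    """Group by NAME and report count, mean, maximum, and minimum of SERIES."""
    groups = {}
    for obs in observations:
        groups.setdefault(obs["NAME"], []).append(add_dependents(obs))
    summary = {}
    for name, rows in groups.items():
        series = [row["SERIES"] for row in rows]
        summary[name] = {
            "n": len(series),
            "mean": mean(series),
            "max": max(series),
            "min": min(series),
        }
    return summary


scores = [
    {"NAME": "Joe Smith", "GAME1": 150, "GAME2": 160, "GAME3": 170},
    {"NAME": "Joe Smith", "GAME1": 140, "GAME2": 150, "GAME3": 160},
]
```

For the first observation SERIES is 150 + 160 + 170 = 480 and DAYAVG is 480 / 3 = 160. Over both observations the series values are 480 and 450, so the mean is 465, the maximum 480, and the minimum 450.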
To develop a standard analysis program a SAS program is written by hand and debugged. A decision is made about what part of the SAS program will become variable. For instance, the subsetting section, expanded or regular report types, or optional analyses may be variable. The debugged SAS program is inserted into the analysis program and an input section is added which writes the variable part of the SAS program.

A bowling database individual summary program might include:

a) chronological table of all scores to date (data used in analyses)
b) means of the series, first, second, and third games (overall an increase or decrease after the first game)
c) maxima and minima of the series, first, second, and third games (find the best and the worst games and series)
d) plot with linear correlation of the series versus date (indicating an upward or downward trend)

The individual summary program starts by requesting the name of the output file which will contain the SAS program. The user then specifies the names of the bowlers to be analyzed. These names are used in the subsetting IF statement. The SAS program is submitted to the computer and the output is received. If this SAS program is not destroyed it can be resubmitted, after more data has been input, to obtain an updated analysis.

The following example shows the structure of a simple bowling analysis SAS program.

    //job card
    // EXEC SAS
    //RDAT DD DSN=dataset name,DISP=SHR
    //SYSIN DD *
    ***;
    TITLE1 INSTITUTION NAME; TITLE2 BOWLING DATABASE;
    ***;
    DATA GOT; SET RDAT.BOWLING; BY NAME DATE;

2.5.2 Non-standard analyses

Often unforeseen analyses are required. In research and development the various relationships between variables are investigated to gain an understanding and control of the system. This developmental, intuitive, and trial-and-error investigative process creates a need for one-time SAS analysis programs. These SAS analysis programs prove or disprove, and quantify, proposed hypotheses.

The demand for one-time SAS analysis programs can be satisfied by either the originator or by data processing personnel. When the data processing staff writes the SAS analysis programs, the number of requests tends to increase beyond the staff's capabilities. This can create a situation where the responsibility of writing SAS analysis programs is shifted to the originator (user). Unfortunately, many times users lack the ability or time to learn SAS.

To give users the ability to generate SAS analysis programs, without knowing SAS or editing, an extensive analysis software package has been written. This software utilizes variable lists, one per database, which contain a table of all the variables and variable characteristics. The user references variables by selecting them from the variable list. The variable heading, SAS variable name, type (character or numeric), and other variable information is loaded and used by the analysis software. The key database information is also stored in the variable list. This includes the dataset name(s) and location, the database title, and JCL information.

A typical run of the analysis software starts by entering the name of the database system to be accessed. The program then requests the name of the output file which will contain the SAS program. The analysis software writes the JCL and DATA statement. A subsetting process, consisting of selecting variables and conditions, generates the subsetting IF statement. An analysis selection menu appears allowing access to various SAS PROCs. Any number of analyses may be selected and specified. When the SAS program is complete it is submitted to the mainframe and the output is returned. This SAS program may still be of value after more data has been input.

The analysis software package currently supports the SAS PROCs: PLOT, PLOT linked with GLM, REG, STEPWISE, CHART, PRINT, RSQUARE, MEANS, and SAS/GRAPH GPLOT. Additional PROCs are added when justified.

This short section cannot fully describe the capabilities, versatility, ease of use, and operation of the analysis software package.

2.5.3 Complex non-standard analyses

Often the analysis software package described in Section 2.5.2, non-standard analyses, will not perform the analysis of interest. For example, the analysis may be too complex, the PROC may not currently be supported by the analysis package, or new dependent variables must be calculated.

There are two ways SAS analysis programs can be written. Both methods require knowledge of the SAS language:

a) The user produces the framework of the SAS analysis program by running the analysis package (Section 2.5.2). This SAS program is modified through edit to accomplish the desired analyses. Since a majority of the SAS program is already written, program development time can be reduced significantly.

b) The analysis is complex enough to require writing the SAS analysis program from scratch. This method allows any analysis if the time and expense can be justified.

2.6 Data printout programs

Data printout programs produce a formatted report of the database subset. A printout program can be classified as a simple analysis program. Printout programs are read-only; they cannot write to the database datasets. A database system may have any number of printout programs which present the database information in different forms. In a bowling database there might be a weekly report which prints out all games bowled in a given week, a bowler's report which prints the history for a given bowler, and a team report which prints team records.

The development of a printout program is similar to development of an analysis program (Section 2.5.1). A FILE PRINT routine is written and debugged which produces the desired formatted report. This report is inserted into the printout program as write statements. Inputs are written for the variable section of the program. In the above examples the week would be input for weekly reports, the bowler's names for bowler's reports, and team names for team reports.

Many times one of the FILE PRINTs developed for report programs is included in the data input program. This produces a standard report for all input data. In a bowling database the weekly report would be included in the bowling data input program.

The structure of a SAS program produced by a printout program is identical to the program shown in Section 2.5.1 except the PROCs are replaced by a FILE PRINT.

2.7 Utility programs

Utility programs do not write SAS programs but do provide database services. Some utility programs simplify database operation while others add capabilities. Examples of utility programs are:

a) Main menu programs
The database system main menu allows easy access to all database system programs. Upon completion of a database system program the main menu is redisplayed. Main menu programs eliminate the need to remember program names.

b) Communication programs
Communications programs aid in submitting SAS programs to the mainframe computer.

c) Data stripping programs
Data stripping programs extract data from spooled SAS output. Stripping programs allow retrieval of data from the database. This data may then be manipulated as desired (ex: local plotters).

3. SUMMARY

This paper describes a technique for interfacing with SAS in the batch mode. Due to the versatility of SAS, this is only one of the many interface possibilities.

The system described not only satisfies research and development data handling needs, but many general data collection and analysis applications as well. Many systems may utilize this database structure, as long as there are no requirements for instantaneous data retrieval. The system's response time can only be as fast as the job turnaround time at the mainframe computer. Most database systems which require analysis do not need instantaneous data retrieval, and can therefore profit from this type of database operation.

Lee H. Schwartz
Diamond Shamrock Chemicals Company
7528 Auburn Road
Painesville, Ohio 44077
(216) 357-3918