SAS DATABASE SYSTEMS FOR RESEARCH AND DEVELOPMENT, ANALYSIS, AND DATA COLLECTION
Lee H. Schwartz, 20MAR84, Diamond Shamrock Chemicals Company
ABSTRACT - Database systems utilizing mini or
personal computers as SAS program generators are
powerful and cost-effective data collection and
analysis tools. These database systems meet the
needs of a Research and Development environment.
The capabilities of the database systems include
data collection, maintenance, analysis, and
reporting. All systems can be supported by a
general analysis package which features subsetting
and the common SAS PROCs. Database operation
eliminates the need for users to know the SAS
language and editing techniques.

TABLE OF CONTENTS

1. INTRODUCTION
1.1 Purpose of this paper
1.2 Research and Development data requirements
1.3 System description
1.4 System benefits
2. DATABASE CONCEPTS AND DESCRIPTION
2.1 Database types
2.2 JCL support files
2.3 Database program types
2.4 Database maintenance programs
2.4.1 Data input programs
2.4.2 Data correction programs
2.4.3 Data recalculation programs
2.4.4 Data deletion programs
2.4.5 Notes on database maintenance
2.5 Analysis programs
2.5.1 Standard analyses
2.5.2 Non-standard analyses
2.5.3 Complex non-standard analyses
2.6 Data printout programs
2.7 Utility programs
3. SUMMARY

Section 1. INTRODUCTION

1.1 Purpose of this paper

The SAS language can be a very important data
collection and analysis tool in a research and
development environment. Database systems utilizing
SAS offer versatility, ease of use, and cost
effectiveness. The purpose of this paper is to
describe SAS based data collection and analysis
systems that have been developed at the Diamond
Shamrock Chemicals Company over the past few years.

The database systems described in this paper utilize
remote mini-computers for preparation of batch SAS
programs. These SAS programs are then submitted over
the telephone lines to the mainframe computer where
they are executed. This paper covers the benefits of
using mini-computers as front-end SAS generators for
data collection and analysis.

1.2 Research and Development data requirements

In many R&D environments data collection and analysis
is the only tool for understanding and improving a
product or process. Unfortunately, many products or
processes cannot be easily analyzed. This may be due
to the complexity of the system or magnitude of data
to be analyzed. With the advent of computers,
analyses which were formerly not justified due to the
time and cost involved are now possible and
economical.

The typical data handling requirements of an R&D
project are:

a) Data collection and maintenance
Some form of input is required to enter
information into the computer. This may take
the form of manual input or automated input.
For correction purposes the capability to modify
existing data must also be provided.

b) Reporting
Reporting, usually in tabular form, is needed to
print input data and reproduce existing data.
Many times a variety of reports is produced to
satisfy the various informational needs of
different personnel.

c) Analysis
Even though reporting displays the information
collected, many times the report does not give
insight into the system being studied.
Therefore, in addition to reporting, analysis is
a key step in the understanding or development
of a process or product.

d) Ease of use and cost effectiveness
As with any computer project the system must be
easy enough for the users to understand and
operate with confidence. And, of course, the
costs of using the computer system must not
exceed the benefits obtained.

e) Security
Since the information being collected is usually
proprietary, some form of security may be
required.

1.3 System description

Interaction with the database, by the user, is
achieved by running Databus programs written to
accomplish specific database operations. These
Databus programs accept user input to produce files
containing SAS programs. Communication is then
established with the mainframe computer which runs
SAS, and the files containing the SAS programs are
submitted in batch.

Typically there is one Databus program for each
specific database operation. Each program requires
input of only the information that is variable. All
the constant information is built into the Databus
program and is unseen by the user.

Besides the specific support programs written for
each database system, a generalized analysis software
package has been developed. This package, called
Data Analysis Software, gives simple analysis
capabilities to personnel who do not know SAS or
program editing. Instead users can concentrate upon
the analysis they want to perform. Data Analysis
Software covers JCL, locating the database,
subsetting, grouping, and various SAS PROCs
including PLOT, GLM, REG, STEPWISE, CHART, PRINT,
RSQUARE, MEANS, and SAS/GRAPH GPLOT.

All the software described can be operated by
technicians or secretaries. Data processing
personnel are only needed to establish the database
system.

SAS is a trademark of SAS Institute Inc.
DATABUS and DATAPOINT are registered trademarks
of the Datapoint Corporation.

Note: Databus is a Datapoint computer language. A
different type of mini-computer (or personal
computer) and alternate language could be used in
place of Datapoint equipment and the Databus
language.
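
As a rough illustration of the generator idea in Section 1.3, the sketch below holds the constant cards inside the program and asks only for the variable data. It is written in Python rather than Databus. The names `CONSTANT_HEADER`, `CONSTANT_FOOTER`, `data_card`, and `write_sas_program`, the single-line card layout, and the closing PROC PRINT are assumptions made for this example, not the layout used by the systems described in this paper.

```python
# Hypothetical sketch: Python stands in for a Databus generator program.
# The constant cards are built into the program; the user supplies only
# the variable information (bowler name and three game scores).

CONSTANT_HEADER = [
    "TITLE1 INSTITUTION NAME;",
    "TITLE2 BOWLING DATABASE;",
    "DATA INCOMING;",
    # Assumed column layout for this example only.
    "INPUT NAME $ 1-20 GAME1 22-24 GAME2 26-28 GAME3 30-32;",
    "SERIES=GAME1+GAME2+GAME3;",
    "CARDS;",
]

# A line containing a semicolon ends the data cards.
CONSTANT_FOOTER = [";", "PROC PRINT DATA=INCOMING;", "RUN;"]


def data_card(name, games):
    """Format one observation so it matches the INPUT statement columns."""
    if len(name) > 20:
        raise ValueError("name does not fit in columns 1-20")
    g1, g2, g3 = games
    return f"{name:<20} {g1:>3} {g2:>3} {g3:>3}"


def write_sas_program(observations):
    """Return the text of a complete SAS program for the given observations."""
    lines = list(CONSTANT_HEADER)
    lines += [data_card(name, games) for name, games in observations]
    lines += CONSTANT_FOOTER
    return "\n".join(lines) + "\n"
```

Calling `write_sas_program([("Joe Smith", (155, 172, 145))])` returns program text in which only the single data card depends on the user's input; every other card comes from the constants.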

1.4 System benefits

The following benefits have been observed from using
this database system structure:

a) Simplicity
The database programs require only the
information which is variable. The users do not
need to see or understand the SAS programs and
JCL which they have written. Program operations
are: pick from the list, fill in the blank, and
yes or no questions. Defaults allow repetition
of previous values or generation of missing
values.

b) Integrity
A quality database program will always write a
syntactically correct SAS job, regardless of
size. Any user can prepare SAS programs with
the knowledge and confidence that they will
always run the first time.

c) Versatility
The databases are just tables of data located at
the mainframe computer. All the characteristics
and relationships of the database system are
resident in the Databus programs on the
Datapoint equipment. This means the structure
and operation of the database can be easily
changed by altering the Databus programs. New
and existing functions, input variables, and
dependent variables can be added or changed at
will. This structure permits the evolution of
the database into a form most suited to the
application.

d) Minimized cost
There are no timesharing costs for running a
Datapoint computer for the required hours of
data input. Often batch processing costs far
less than timesharing. A high speed modem
allows low cost database operation thousands of
miles from the mainframe site. Most
importantly, a SAS job can be generated in less
than 10% of the time it would take to write by
hand. There are no debugging costs since the
SAS programs run correctly the first time.

e) Data processing staff relief
Many times the data processing staff becomes
overloaded. Programs which write SAS in effect
transfer these data processing tasks to the
users placing the requests. Not only does this
reduce a data processing staff's workload, but
it forces the user to assign the proper priority
to the task.

Section 2. DATABASE CONCEPTS AND DESCRIPTION

2.1 Database types

There are two types of database systems which are
commonly used. These database systems operate
similarly, differing only in the amount of disk
storage required.

a) One dataset database
A one dataset database is the simplest type of
database. It consists of observations
containing a value for each variable. Usually a
variable is selected as the key variable.

b) Two dataset database
A two dataset database consists of matched 'ID
data' and 'Operating data' datasets.
These two datasets are merged, or associated,
using a key variable. The ID data dataset
contains the key variable and variables which
are always constant for any given key variable.
The operating data dataset contains the key
variable and variables which are not necessarily
constant for the key variable.

Example: In a bowling league database the
bowler's name is the key variable. The bowler's
sex, birthdate, and team name are ID variables.
The bowler's weekly scores and the date they
were bowled are operating data variables.

ID Data Dataset      Operating Data Dataset
NAME                 NAME
TEAMNAME             DATE
SEX                  GAME1
BIRTHDAY             GAME2
                     GAME3

This structure is used to eliminate redundant
storage of constant data such as TEAMNAME, SEX,
and BIRTHDAY. This bowling database will be used
as a simple example throughout this paper.

2.2 JCL support files

The Job Control Language (JCL) cards required for a
database system generally remain constant.
Modifications to the JCL cards are only required when
the administrators of the mainframe computer make
changes in the JCL structure and when the database is
moved to a different SAS dataset.

To simplify conversion when JCL changes occur, all
database programs call upon two available JCL support
files. These programs access only one of the JCL
files, either the read-write or the read-only file.
Only programs which write to the database access the
read-write JCL. This is a safety measure which
prevents any possible loss of data during generation
of reports or analyses.

To implement a JCL change the two JCL support files
are converted through edit to the new format. All
SAS programs generated after the conversion will
contain the updated JCL cards.

2.3 Database program types

The various database programs can be broken down into
four types:

a) Database maintenance programs, Section 2.4
Database maintenance programs perform input,
correction, calculation of dependent variables,
and deletion of data. Many times reports from
the printout programs are included in the
database input programs. Database maintenance
programs are the only programs which can write
to the database datasets.

b) Analysis programs, Section 2.5
Analysis programs perform some type of analysis
on the database or database subset and produce a
report. A system may have any number of
analysis programs. Whenever the need develops,
additional analysis programs may be added to the
database system. Analysis programs are
read-only; they cannot write to the database
datasets.

c) Data printout programs, Section 2.6
Data printout programs produce a formatted
report of the database or database subset. A
printout program may be classified as a simple
analysis program. A system can have any number
of printout programs, which present the database
information in different forms. Printout
programs are read-only; they cannot write to the
database datasets.

d) Utility programs, Section 2.7
Utility programs are written for a database
system but are not involved with database
maintenance, analysis, or printing. Menu and
communications programs are examples of utility
programs.
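
The access rule in Sections 2.2 and 2.3 (only maintenance programs use the read-write JCL) can be sketched as a lookup from program type to JCL support file. This is a Python illustration, not the Databus implementation. `ProgramType`, `jcl_for`, `wrap_in_jcl`, the in-memory lists standing in for the two support files, and the closing `/*` card are assumptions for the example; the DD cards copy the schematic form used in this paper's examples.

```python
from enum import Enum


class ProgramType(Enum):
    MAINTENANCE = "maintenance"   # input, correction, recalculation, deletion
    ANALYSIS = "analysis"
    PRINTOUT = "printout"


# Stand-ins for the two JCL support files (schematic cards, not real JCL).
READ_ONLY_JCL = [
    "//job card",
    "// EXEC SAS",
    "//RDAT DD DSN=dataset name,DISP=SHR",
    "//SYSIN DD *",
]

READ_WRITE_JCL = [
    "//job card",
    "// EXEC SAS",
    "//RDAT DD DSN=dataset name,DISP=SHR",
    "//WDAT DD DSN=dataset name,DISP=OLD",
    "//SYSIN DD *",
]


def jcl_for(program_type):
    """Only maintenance programs receive the read-write JCL."""
    if program_type is ProgramType.MAINTENANCE:
        return READ_WRITE_JCL
    return READ_ONLY_JCL


def wrap_in_jcl(program_type, sas_statements):
    """Prefix the generated SAS statements with the appropriate JCL cards."""
    cards = list(jcl_for(program_type)) + list(sas_statements) + ["/*"]
    return "\n".join(cards) + "\n"
```

Because every generator obtains its JCL through one function, a JCL change only requires editing the two card lists, which mirrors the conversion procedure described in Section 2.2.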

2.4 Database maintenance programs

Database maintenance programs are the only programs
which can write to the database. The following
database maintenance program types have been
developed.

a) Data input programs, Section 2.4.1
Data input programs accept screens of data input
by the user. A SAS job is written which, when
sent to the mainframe, creates an input dataset,
calculates dependent variables, and appends the
input dataset to the database dataset.

b) Data correction programs, Section 2.4.2
Data correction programs allow modification of
database information by specifying the
observation containing the error, the variable
to be corrected, and the new value for that
variable.

c) Data recalculation programs, Section 2.4.3
Data recalculation programs regenerate dependent
variables for selected observations in the
database. Recalculation programs are used to
correct dependent variables after an input
variable has been corrected.

d) Data deletion programs, Section 2.4.4
Data deletion programs allow purging of unwanted
database observations.

2.4.1 Data input programs

Data input programs allow the repeated addition of
observations to a database. There are two types of
data input programs, manual and automatic. Manual
input programs require personnel to enter data by
hand. Automatic input programs transform data
collected by instruments into SAS programs without
requiring manual input. Both types of input programs
write similar SAS programs.

A typical manual data input program starts by
requesting the name of the output file which will
contain the SAS program. The user then begins
filling in the input screen. Usually an observation
can be input with one input screen. If there are too
many variables to fit in one screen, multiple screens
are used to input one observation. After the screen
has been entered, the user is allowed to make as many
changes as required to correct input errors. After
approval of the data on the screen, the observation
is written to the SAS program being generated, and
the data input program cycles back to a fresh input
screen. Input continues until all observations have
been entered. After input is complete the report
options are selected (ex: number of copies). The SAS
program is then ready to be sent to the mainframe for
execution. This SAS program is submitted only once
and is destroyed, since resubmission would duplicate
observations which have already been added.

Since a significant amount of time is spent running
data input programs, it is important to make the
input screens efficient for the operators. Defaults
to missing values or to the last value entered can
often reduce input time by 10-50%. Range checking
prevents input of bad data and the subsequent time
spent on correction.

The following example details the structure of a
typical SAS program generated by a data input
program. Only the data cards in the program vary.
All other cards are constant and written by the data
input program. The data input program writes the
data to the SAS program in a format which matches the
SAS INPUT statement.

//job card
// EXEC SAS
//RDAT DD DSN=dataset name,DISP=SHR
//WDAT DD DSN=dataset name,DISP=OLD
//SYSIN DD *
***;
TITLE1 INSTITUTION NAME;
TITLE2 BOWLING DATABASE;
***;
DATA INCOMING;
INPUT #1 NAME $ 1-20 MONTH 25-26 DAY 28-29 YEAR 31-32
      #2 GAME1 5-10 GAME2 15-20 GAME3 25-30;
***;
DATE=MDY(MONTH,DAY,YEAR);
SERIES=GAME1+GAME2+GAME3;
DAYAVG=SERIES/3;
***;
CARDS;
Joe Smith               08/08/83
    155       172       145
John Brown              08/08/83
    150       161       190
 ... more data ...
Steve Johnson           01/22/83
    156       176       182
***;
PROC SORT; BY NAME DATE;
***;
DATA BOTH;
SET WDAT.BOWLING INCOMING; BY NAME DATE;
***;
DATA WDAT.BOWLING;
SET BOTH; BY NAME DATE;
***;
DATA _NULL_;
SET INCOMING; BY NAME DATE;
FILE PRINT

2.4.2 Data correction programs

Data correction programs allow modification of
existing database information. This may be necessary
due to operator input error, discovery that data was
originally in error, security, or for various other
reasons.

A typical data correction program starts by
requesting the name of the output file which will
contain the SAS program. The user then selects a
specific database observation which will be modified.
In a bowling database the bowler's name and the date
bowled would be entered. After specifying the
observation the user selects the variable to be
modified. A new value for the variable is then
entered. The user can then modify another variable
in the current observation, specify another
observation to be modified, or terminate correction.
Whenever an observation is specified a FIND statement
is written to the SAS program. Each time a variable
and value is entered a REP statement is written. A
VERIFY RESET statement is written after completing
the modification of each observation. This prevents
SAS from skipping the remaining modifications if an
observation could not be found. The SAS program is
submitted only once and is destroyed, since
resubmission is redundant.

The following example details the structure of a
typical SAS program generated by a data correction
program. PROC EDITOR is used but a DATA step with IF
statements would also perform the same operation. In
order to maintain the database sort, modification of
the observation specification variables (ex: bowler's
name and date bowled) is not permitted using the PROC
EDITOR program style.

//job card
// EXEC SAS
//RDAT DD DSN=dataset name,DISP=SHR
//WDAT DD DSN=dataset name,DISP=OLD
//SYSIN DD *
***;
TITLE1 INSTITUTION NAME;
TITLE2 BOWLING DATABASE;
***;
PROC EDITOR DATA=WDAT.BOWLING;
RUN;
***;
FIND VER 1,LAST NAME='Joe Smith' DATE=8589;
REP GAME1=166;
REP GAME2=167;
VERIFY RESET;
***;
FIND VER 1,LAST NAME='John Brown' DATE=8589;
REP GAME1=155;
VERIFY RESET;
***;

2.4.3 Data recalculation programs

Data recalculation programs regenerate dependent
variables for selected observations in the database.
If an input variable is corrected and dependent
values were calculated from it, data recalculation is
necessary. Data recalculation is not needed when the
corrected input variable is not used in dependent
variable calculations.

A typical data recalculation program starts by
requesting the name of the output file which will
contain the SAS program. The user then specifies
observations which need to be recalculated. In a
bowling database the bowler's name and date bowled
would be entered. The user specifies as many
observations as needed. The SAS program is submitted
only once and is destroyed, since resubmission is
redundant.

The data recalculation program utilizes the same
equations as the dependent variable calculation
section of the data input SAS program (Section 2.4.1).
The following example details the structure of a
typical SAS program generated by a data recalculation
program. Only the names between single quotes and
the date are entered. The cards remain constant
except for the number of OR cards in the IF
statement.

//job card
// EXEC SAS
//RDAT DD DSN=dataset name,DISP=SHR
//WDAT DD DSN=dataset name,DISP=OLD
//SYSIN DD *
***;
TITLE1 INSTITUTION NAME;
TITLE2 BOWLING DATABASE;
***;
DATA GOT;
SET RDAT.BOWLING; BY NAME DATE;
***;
IF NAME='Joe Smith' AND DATE=8589
OR NAME='John Brown' AND DATE=8589
THEN DO;
DATE=MDY(MONTH,DAY,YEAR);
SERIES=GAME1+GAME2+GAME3;
DAYAVG=SERIES/3;
END;
***;
DATA WDAT.BOWLING;
SET GOT; BY NAME DATE;
/*

2.4.4 Data deletion programs

Data deletion programs allow purging of unwanted
database observations. Data deletion combined with
data input, data correction, and data recalculation
gives the user the ability to modify the database to
any desired form.

A typical data deletion program starts by requesting
the name of the output file which will contain the
SAS program. The user then specifies observations to
be deleted. In a bowling database the bowler's name
and date bowled would be entered. The user specifies
all observations to be deleted. The SAS program is
submitted only once and is destroyed, since
resubmission is redundant.

The following example details the structure of a
typical SAS program generated by a data deletion
program. Only the names between single quotes and
the date are entered. The cards remain constant
except for the number of OR cards in the IF
statement. Data deletion programs are identical to
data recalculation programs except DELETE replaces
the equations.

//job card
// EXEC SAS
//RDAT DD DSN=dataset name,DISP=SHR
//WDAT DD DSN=dataset name,DISP=OLD
//SYSIN DD *
***;
TITLE1 INSTITUTION NAME;
TITLE2 BOWLING DATABASE;
***;
DATA GOT;
SET RDAT.BOWLING; BY NAME DATE;
***;
IF NAME='Joe Smith' AND DATE=8589
OR NAME='John Brown' AND DATE=8589
THEN DELETE;
***;
DATA WDAT.BOWLING;
SET GOT; BY NAME DATE;
/*

2.4.5 Notes on database maintenance

The database system described in Section 2.4 is
versatile, but is not the only type of structure
which may be used. A system can be created which has
correction and recalculation combined with data
input. When a new observation is entered through
data input, it is added to the database. If the
observation already exists, then the entered
observation replaces the existing observation.
Depending on the requirements of the database system,
a structure can be selected which best meets the
needs of the user.

If a new input variable is to be added to the
database the following steps are performed:

a) A new field is added to the input screen.

b) The SAS INPUT statement is modified to include
the new input variable.

c) The data input program's write statement is
modified to also write the new input variable.

d) Any report which will include the new input
variable is modified.

SAS automatically creates missing values, for the new
variable, for every observation input before addition
of the new variable.

If a new dependent variable needs to be added to the
database, its formula is added to the equation
sections of the SAS programs generated by the data
input program (Section 2.4.1) and the data
recalculation program (Section 2.4.3).

2.5 Analysis Programs

Analysis programs perform some type of analysis upon
the database or database subset and produce a report.
A system may have any number of analysis programs.
Whenever the need develops, additional analysis
programs may be added to the database system.
Analysis programs are read-only, and cannot write to
the database datasets.

Analysis programs fall into three main categories:

a) Standard analyses, Section 2.5.1
A standard analysis is any analysis which is
performed more than once. This is accomplished
by running an analysis program that generates a
SAS program, which performs the analysis.
Usually the input to the analysis program
consists of providing the specifications
necessary to obtain the desired database subset.

b) Non-standard analyses, Section 2.5.2
A non-standard analysis is any analysis
performed only once. Since it is a one-time SAS
program, writing a special analysis program is
not justified. Instead, a generalized analysis
package has been written which covers database
subsetting and the common SAS PROCs.

c) Complex non-standard analyses, Section 2.5.3
If a non-standard analysis is not simple enough
to be accomplished using the generalized
analysis package, then the user must resort to
writing a SAS program by hand. Therefore, only
users that know SAS will be able to perform
complex analyses.

2.5.1 Standard analyses

A standard analysis is generally used to summarize
information in the database. Standard analyses can
be used to evaluate logical subsets of the database,
or to reduce large amounts of data to a reasonable
and understandable level. Since the analysis will be
performed repeatedly, development of a specialized
program is justified.

To develop a standard analysis program a SAS program
is written by hand and debugged. A decision is made
about what part of the SAS program will become
variable. For instance, the subsetting section,
expanded or regular report types, or optional
analyses may be variable. The debugged SAS program
is inserted into the analysis program and an input
section is added which writes the variable part of
the SAS program.

A bowling database individual summary program might
include:

a) chronological table of all scores to date (data
used in analyses)

b) means of the series, first, second, and third
games (overall an increase or decrease after the
first game)

c) maxima and minima of the series, first, second,
and third games (find the best and the worst
games and series)

d) plot with linear correlation of the series
versus date (indicating an upward or downward
trend)

The individual summary program starts by requesting
the name of the output file which will contain the
SAS program. The user then specifies the names of the
bowlers to be analyzed. These names are used in the
subsetting IF statement. The SAS program is
submitted to the computer and the output is received.
If this SAS program is not destroyed it can be
resubmitted, after more data has been input, to
obtain an updated analysis.

The following example shows the structure of a simple
bowling analysis SAS program.

//job card
// EXEC SAS
//RDAT DD DSN=dataset name,DISP=SHR
//SYSIN DD *
***;
TITLE1 INSTITUTION NAME;
TITLE2 BOWLING DATABASE;
***;
DATA GOT;
SET RDAT.BOWLING; BY NAME DATE;
***;
IF NAME='Joe Smith'
OR NAME='John Brown'
THEN OUTPUT;
***;
*** PRINTOUT INDIVIDUAL'S RECORD;
***;
PROC PRINT DATA=GOT; BY NAME;
VAR DATE SERIES GAME1 GAME2 GAME3 DAYAVG;
***;
*** MEANS, MAXIMUMS, AND MINIMUMS;
***;
PROC MEANS MAXDEC=0 DATA=GOT; BY NAME;
VAR SERIES GAME1 GAME2 GAME3;
***;
*** PLOT & CORRELATION OF SERIES VS DATE;
***;
PROC REG DATA=GOT; BY NAME;
MODEL SERIES=DATE;
OUTPUT OUT=SERIES P=PREDVALU;
PROC PLOT UNIFORM; BY NAME;
PLOT SERIES*DATE PREDVALU*DATE='*' / OVERLAY;
***;
/*

Notes:
SERIES is the total of the three games bowled on a
given day. DAYAVG is the daily average.

2.5.2 Non-standard analyses

Often unforeseen analyses are required. In research
and development the various relationships between
variables are investigated to gain an understanding
and control of the system. This developmental,
intuitive, and trial-and-error investigative process
creates a need for one-time SAS analysis programs.
These SAS analysis programs prove or disprove, and
quantify, proposed hypotheses.

The demand for one-time SAS analysis programs can be
satisfied by either the originator or by data
processing personnel. When the data processing staff
writes the SAS analysis programs, the number of
requests tends to increase beyond the staff's
capabilities. This can create a situation where the
responsibility of writing SAS analysis programs is
shifted to the originator (user). Unfortunately,
many times users lack the ability or time to learn
SAS.

To give users the ability to generate SAS analysis
programs, without knowing SAS or editing, an
extensive analysis software package has been written.
This software utilizes variable lists, one per
database, which contain a table of all the variables
and variable characteristics. The user references
variables by selecting them from the variable list.
The variable heading, SAS variable name, type
(character or numeric), and other variable
information is loaded and used by the analysis
software. The key database information is also
stored in the variable list. This includes the
dataset name(s) and location, the database title, and
JCL information.

A typical run of the analysis software starts by
entering the name of the database system to be
accessed. The program then requests the name of the
output file which will contain the SAS program. The
analysis software writes the JCL and DATA statement.
A subsetting process, consisting of selecting
variables and conditions, generates the subsetting IF
statement. An analysis selection menu appears
allowing access to various SAS PROCs. Any number of
analyses may be selected and specified. When the SAS
program is complete it is submitted to the mainframe
and the output is returned. This SAS program may
still be of value after more data has been input.

The analysis software package currently supports the
SAS PROCs: PLOT, PLOT linked with GLM, REG, STEPWISE,
CHART, PRINT, RSQUARE, MEANS, and SAS/GRAPH GPLOT.
Additional PROCs are added when justified.

This short section cannot fully describe the
capabilities, versatility, ease of use, and operation
of the analysis software package.
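
To make the variable list and subsetting process of Section 2.5.2 more concrete, here is a minimal Python sketch of how a selection from a variable list could be turned into the cards of a subsetting IF statement. It is an illustration under stated assumptions, not the analysis software itself: the `Variable` record, `BOWLING_VARIABLES`, `ALLOWED_OPERATORS`, `condition`, and `subsetting_if` are hypothetical names, and the real variable list also stores dataset, title, and JCL information that is omitted here.

```python
from dataclasses import dataclass

# Comparison operators accepted by this sketch (all are valid in SAS).
ALLOWED_OPERATORS = {"=", "<", ">", "<=", ">="}


@dataclass(frozen=True)
class Variable:
    heading: str        # text shown to the user in the selection list
    sas_name: str       # SAS variable name written into the program
    is_character: bool  # character values are quoted, numeric are not


# Hypothetical variable list for the bowling database.
BOWLING_VARIABLES = [
    Variable("Bowler's name", "NAME", True),
    Variable("Series total", "SERIES", False),
    Variable("Daily average", "DAYAVG", False),
]


def condition(var, operator, value):
    """Build one comparison, quoting the value only for character variables."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    if var.is_character:
        # A single quote inside a SAS quoted string is written twice.
        literal = "'" + str(value).replace("'", "''") + "'"
    else:
        literal = str(value)
    return f"{var.sas_name}{operator}{literal}"


def subsetting_if(conditions):
    """Return the IF/OR/THEN OUTPUT cards, one alternative per card."""
    if not conditions:
        return []
    cards = ["IF " + conditions[0]]
    cards += ["OR " + c for c in conditions[1:]]
    cards.append("THEN OUTPUT;")
    return cards
```

Selecting the name variable twice, with the values Joe Smith and John Brown, yields the cards `IF NAME='Joe Smith'`, `OR NAME='John Brown'`, and `THEN OUTPUT;`, the same form as the subsetting section of the bowling analysis program in Section 2.5.1.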

2.5.3 Complex non-standard analyses

Often the analysis software package described in
Section 2.5.2, non-standard analyses, will not
perform the analysis of interest. For example, the
analysis may be too complex, the PROC may not
currently be supported by the analysis package, or
new dependent variables must be calculated.

There are two ways a SAS analysis program can be
written. Both methods require knowledge of the SAS
language:

a) The user produces the framework of the SAS
analysis program by running the analysis package
(Section 2.5.2). This SAS program is modified
through edit to accomplish the desired analyses.
Since a majority of the SAS program is already
written, program development time can be reduced
significantly.

b) The analysis is complex enough to require
writing the SAS analysis program from scratch.
This method allows any analysis if the time and
expense can be justified.

2.6 Data printout programs

Data printout programs produce a formatted report of
the database subset. A printout program can be
classified as a simple analysis program. Printout
programs are read-only; they cannot write to the
database datasets.

A database system may have any number of printout
programs which present the database information in
different forms. In a bowling database there might
be a weekly report which prints out all games bowled
in a given week, a bowler's report which prints the
history for a given bowler, and a team report which
prints team records.

The development of a printout program is similar to
development of an analysis program (Section 2.5.1).
A FILE PRINT routine is written and debugged which
produces the desired formatted report. This report
is inserted into the printout program as write
statements. Inputs are written for the variable
section of the program. In the above examples the
week would be input for weekly reports, the bowler's
names for bowler's reports, and team names for team
reports.

Many times one of the FILE PRINTs developed for
report programs is included in the data input
program. This produces a standard report for all
input data. In a bowling database the weekly report
would be included in the bowling data input program.

The structure of a SAS program produced by a printout
program is identical to the program shown in Section
2.5.1 except the PROCs are replaced by a FILE PRINT.

2.7 Utility programs

Utility programs do not write SAS programs but do
provide database services. Some utility programs
simplify database operation while others add
capabilities. Examples of utility programs are:

a) Main menu programs
The database system main menu allows easy access
to all database system programs. Upon
completion of a database system program the main
menu is redisplayed. Main menu programs
eliminate the need to remember program names.

b) Communication programs
Communications programs aid in submitting SAS
programs to the mainframe computer.

c) Data stripping programs
Data stripping programs extract data from
spooled SAS output. Stripping programs allow
retrieval of data from the database. This data
may then be manipulated as desired (ex: local
plotters).

3. SUMMARY

This paper describes a technique for interfacing with
SAS in the batch mode. Due to the versatility of
SAS, this is only one of the many interface
possibilities.

The system described not only satisfies research and
development data handling needs, but many general
data collection and analysis applications as well.
Many systems may utilize this database structure, as
long as there are no requirements for instantaneous
data retrieval. The system's response time can only
be as fast as the job turnaround time at the
mainframe computer. Most database systems which
require analysis do not need instantaneous data
retrieval, and can therefore profit from this type of
database operation.

Lee H. Schwartz
Diamond Shamrock Chemicals Company
7528 Auburn Road
Painesville, Ohio 44077
(216)-357-3918