Using the DBLOAD Procedure to Create and Populate SYSTEM 2000®
Data Management Software Databases
David W. Pitts, SAS Institute Inc., Austin, Texas
Kim D. Hiserote, SAS Institute Inc., Austin, Texas
ABSTRACT
The architecture in Version 6 of the SAS® System has opened up
new ways to migrate data from SAS data sets to SYSTEM 2000®
databases. It also provides a way to migrate data from other DBMS
databases to SYSTEM 2000 software. This paper describes the
process of taking existing SAS data sets, creating a SYSTEM 2000
database, and populating that database. Examples show how to
create a database view, how to map data variables from a SAS data
set to that view in order to add complete new entries in the
database, or to append records to existing entries. Input is not
limited to SAS data sets. Any view supported by the ACCESS procedure
in Version 6 of the SAS System can be used to populate a SYSTEM
2000 database.

INTRODUCTION
This paper discusses the basic concept of a SAS data set and the
pertinent information required by the DBLOAD procedure. It provides
step-by-step guidance on how to select and use the screens and
commands needed to create a database, add new entries to an
existing database, and add descendant records to already existing
entries. The example in this paper shows the screens using the
interactive SAS Display Manager System facility. The lowercase
data shown on the screens represent the data that were just
entered.

SAS DATA LIBRARIES
You need to understand the different SAS data libraries and SAS
files in Version 6 before you can effectively use the PROC DBLOAD
interface with SYSTEM 2000 software. The SAS data library is the
highest level of file organization; it contains files that are managed
by the SAS System. Each SAS file belongs to one of three general
categories: a SAS data set, a SAS catalog, or other SAS file. Each
file is a member of the data library and each member has a member
type. A SAS data set can be one of two member types, type data
or type view.

Member Type Data
A SAS data set is any file that the SAS System can access as
though it were a physical object containing a data portion with
values stored in a rectangular form and a descriptor portion that
identifies the values to the SAS System. The descriptor portion of a
data file can be stored at the beginning of the data file, or it can
be in a different file as well as in a different format. The
descriptor information includes the names and attributes of the
variables in the data file. It also contains other information,
including the date and time of the data file's creation, the engine
used to create it, and any host-dependent information, such as the
number of observations per page and the data file's physical name.
The SAS System uses this information to process the data correctly.

The data portion of a data file contains data values in rows and
columns. Each row in a data file represents one observation. Each
column has a variable name associated with it and contains data
values for the variable.

SAS Data Variables
Each observation in a SAS data file contains one data value for each
variable; that is, each column of data values is a variable. SAS
variables have several defining attributes. The attributes used by
PROC DBLOAD to build a SYSTEM 2000 software item are declaration
type, length, format, name, and label.

There are two types of variables, numeric and character. The length
attribute is the number of bytes used to store each of a variable's
values in a SAS data file. A variable's format is the pattern the SAS
System uses to display each value of a variable. The name is the
8-byte name that becomes the SYSTEM 2000 item name. If the
label option is yes, then the label name is used for the item name.

Members Of Type Access
The SAS System files of type access are called access descriptors.
These files hold essential information about databases you want to
access, for example, the database name, item names, and item
types. They also contain the corresponding SAS System information,
such as the SAS variable names and formats that describe the
data.

For SYSTEM 2000, the access descriptor can contain the entire
database definition from which you create your view descriptors.
You can use the SAS/ACCESS® interface to create an access
descriptor, or PROC DBLOAD can create the access descriptor
when it creates a new database.

Members of Type View
A SAS data set of type view is called a SAS data view. You can
think of a SAS data view just as you do a SAS data file. It does not
matter to the SAS software whether the data come from a SAS data
view or a SAS data file. The difference between a SAS data view
and a SAS data file is that a SAS data view does not actually contain
data values. Instead, it contains the definition of data stored
elsewhere. You use SAS data views to define supersets or subsets of
a SAS access descriptor.

In Release 6.06 of the SAS System, SAS data views can be created
with PROC SQL, PROC ACCESS, and PROC DBLOAD. PROC
DBLOAD creates both the access and view descriptors when it
creates a new database.

DATABASE CREATION
The DBLOAD procedure enables you to create and load a SYSTEM
2000 database from a SAS data set or a SAS data view. You create
a database by allocating the database files and invoking PROC
DBLOAD in batch, interactive line mode, or interactive display
manager mode.

Database File Allocation
When using SYSTEM 2000 software in a single-user environment,
you must allocate the appropriate database files in your SAS session
prior to invoking the DBLOAD procedure. For Multi-User
software, the database files must be allocated in the Multi-User
region. For single-user environments, you can issue a clist in TSO
to allocate SYSTEM 2000 database files. You can issue this clist
prior to running the SASS2K clist, or you can use the TSO subset
mode when you are already executing the SAS software. The
following example shows the allocation for the BANKING database
using the S2KDBAL clist:

S2KDBAL DBN(BANKING) DSN(BANKING) DBVOL(SAS999) NEW

PROC DBLOAD with Interactive Display Manager Mode
Although the examples in the paper show full-screen processing,
everything can be done in batch with SAS statements. If you are
using a full-screen terminal, you can specify all the procedure
statements except the LOAD statement and still invoke the interactive
display manager. The simplest way to run interactively is to type

proc dbload; run;

For the initial load it is recommended that you pre-sort your input
data and use the S2KLOAD statement to tell SYSTEM 2000 software
that you want to do an optimized load. You must specify this
statement before you enter the RUN statement, as in this example:

proc dbload; s2kload; run;

If you have only the Version 6 SAS/ACCESS interface to SYSTEM
2000 software licensed, then the SYSTEM 2000 Load Identification
window appears. Otherwise, a list of all licensed SAS/ACCESS
interfaces is displayed. You then place your cursor by SYSTEM 2000
and press ENTER, and the Load Identification window appears as
shown in Screen 1.

   DBLOAD: SYSTEM 2000 Load Identification Window
   Command ===>

   Input Data        - Library: trans   Member: bank      Type: DATA
   Database View     - Library:         Member: banking   Type: VIEW
   Access Descriptor - Library:         Member: banking   Type: ACCESS

   If Creating a New Database, Please Enter:
   Database Name: banking
   Password:
   Multi-User: NO     Label: NO     Create only: NO

Screen 1  Sample Load Identification Window Creating the
Banking Database

These fields appear in the window:

Input Data
   indicates the input SAS data set or a SAS/ACCESS view that will
   be used to create the SYSTEM 2000 database. If the input data is
   a view, overtype TYPE: DATA with TYPE: VIEW.

Database View
   indicates the view descriptor that the DBLOAD procedure will
   create and use to populate the database. When creating a new
   database, the view descriptor must not exist.

Password
   becomes the SYSTEM 2000 master password for the specified
   database name being created.

Database Name
   indicates the database name being created.

Access Descriptor
   indicates the name of the access descriptor that the DBLOAD
   procedure will create. When creating a new database, the access
   descriptor must not exist.

Multi-User
   is NO if you are creating the database in the single-user
   environment or YES if you are in a Multi-User environment.

Label
   is NO if you want the 8-character SAS variable name for the
   SYSTEM 2000 item name or YES if you want the 40-character SAS
   label to be used (if any).

Create only
   is NO if you want to create and load the database or YES if you
   only want to create the database and not load it.

If you leave the Database View and Access Descriptor fields blank,
PROC DBLOAD creates the access and view descriptors in the
WORK data library with a name of WORK.<database-name>.<type>.
This means you must use the SAS/ACCESS interface to
create your permanent access and view descriptors later.

When you have entered all the necessary information, press
ENTER. At this point the database name and view and access
descriptors are checked to ensure they do not already exist. For
the initial load the NEW DATA BASE IS <database> command is
issued to SYSTEM 2000 software. When that is successful, the
Load Display window appears as shown in Screen 2.

   DBLOAD: SYSTEM 2000 Load Display Window
   Command ===>
   Database: <database name>

   Component Name    SAS Name    Format
   CUSTNAME          CUSTNAME    $20.
   CUSTID            CUSTID
   account number    ACCTNUM
   account type      ACCTTYP
   trans type        TRANSTYP
   trans amount      TRANSAMT    DOLLAR10.2
   trans date        TRANSDAT    DATE7.

Screen 2  Sample Load Display Window for the BANKING
Database

Database Name, SAS Name, and Format are protected fields and
cannot be changed. Any name or format change must be made prior
to or when you invoke PROC DBLOAD. You can enter and change
the following fields:

Func
   specifies which variables to use. Use D to drop or S to select a
   variable. By default all variables are selected.

Lvl
   specifies the hierarchical database level. The default is level
   zero.

Component Name
   is the same as the SAS name field unless the label option was
   specified. You can change the names by typing over them.
   Notice in the sample screen the component names in lowercase
   were changed and are different from the SAS names.

Index
   specifies key items. Type a Y for any item you want to be a
   SYSTEM 2000 key item. Or you may initially load the database
   with all non-key items, then use the QUEST procedure and issue
   the CREATE INDEX <item name or number> command later. The
   load is faster with fewer key items.

The DBLOAD procedure generates all of the component numbers
automatically, starting with one for the first level zero item. The item
numbers that follow continue consecutively to the next level change.
The numbers for the records below level zero start with the next
available hundred number. Screen 3 is a sample DESCRIBE of the
BANKING database that DBLOAD created. Notice the record
names. You can easily change the record names by invoking PROC
QUEST and issuing DEFINE language commands to change the
name to something more descriptive.
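
The numbering rule just described can be sketched in a few lines of Python. This is only an illustration, not the DBLOAD procedure's actual implementation: the function name `assign_component_numbers`, the `(name, level)` input shape, and the arithmetic used for the "next available hundred" are assumptions inferred from the BANKING example, and the generated record names simply follow the RECORD_LEVEL_n pattern seen in the sample DESCRIBE.

```python
def assign_component_numbers(items):
    """Number items given as (name, level) pairs in definition order.

    Assumption: levels start at 0, and each level change introduces one
    record whose number is the next unused multiple of 100.
    """
    numbered = []
    next_number = 1          # the first level-zero item is component 1
    current_level = 0
    for name, level in items:
        if level != current_level:
            # Next multiple of 100 that has not been used yet.
            record_number = ((next_number - 1) // 100 + 1) * 100
            numbered.append((record_number, f"RECORD_LEVEL_{level}"))
            next_number = record_number + 1
            current_level = level
        numbered.append((next_number, name))
        next_number += 1
    return numbered


# The BANKING variables with the levels used in the paper's example.
banking = [
    ("CUSTNAME", 0), ("CUSTID", 0),
    ("ACCOUNT NUMBER", 1), ("ACCOUNT TYPE", 1),
    ("TRANS TYPE", 2), ("TRANS AMOUNT", 2), ("TRANS DATE", 2),
]
for number, name in assign_component_numbers(banking):
    print(f"{number}* {name}")
```

For the BANKING variables the sketch yields 1, 2, 100, 101, 102, 200, 201, 202, and 203, which are the component numbers in the sample DESCRIBE. How the procedure behaves in cases the example does not cover, such as more than 99 items at one level, is not stated in the paper.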
Here are a few other commands you can type on the command line:

CANCEL
   terminates processing without executing the load and returns to
   the Load Identification window.

RESET
   resets all item names, level numbers, and key/non-key status to
   the defaults, including deleted variables.

SHOW ALL
   shows all previously deleted and selected variables. It works
   like a toggle switch in that it leaves the D in the function field
   to drop the items again unless you change it.

Once you have made the desired changes, type END or press the
PF key for END so that your changes can be verified. When
everything is correct, you receive the following message:

At most (n) obs will be loaded.
Enter LOAD to continue.

If the number of observations is too large and you are only running
a test, you can type the command WHERE followed by a valid SAS
WHERE clause to subset your data before you issue the LOAD
command.

At this point the SAS/ACCESS access and view descriptors are built
and the SYSTEM 2000 DEFINE commands are issued to define the
database. Then the database is loaded unless you requested create
only.

   SYSTEM RELEASE NUMBER     11.6A
   DATA BASE NAME IS         BANKING
   DEFINITION NUMBER
   DATA BASE CYCLE NUMBER    18
     1* CUSTNAME (CHAR X(20))
     2* CUSTID (CHAR X(?))
   100* RECORD_LEVEL_1 (RECORD)
   101*   ACCOUNT NUMBER (INTEGER NUMBER 9999 IN 100)
   102*   ACCOUNT TYPE (CHAR X IN 100)
   200*   RECORD_LEVEL_2 (RECORD IN 100)
   201*     TRANS TYPE (CHAR X IN 200)
   202*     TRANS AMOUNT (NON-KEY MONEY $9(7).99 IN 200)
   203*     TRANS DATE (DATE IN 200)

Screen 3  Sample BANKING Database Definition

Creating SYSTEM 2000 Item Descriptions from SAS Variables
Table 1 shows the conversion of a SAS variable to a SYSTEM 2000
item.

Table 1  Converting SAS Variables to SYSTEM 2000 Data Items

SAS NUMERIC VARIABLES
                                  SYSTEM 2000
   Length   Format                Item Description
   any      DATE                  DATE
            DATE and TIME         DOUBLE
            w.d                   DEC 9(x).9(d)
            DOLLARw.d             MONEY 9(x).9(d)
            w                     INT 9(w)
   <8       E                     REAL
   >=8      E                     DOUBLE
   <8       none of the above     REAL
   >=8      none of the above     DOUBLE

   where x = w - (1 + d)

SAS CHARACTER VARIABLES
                                  SYSTEM 2000
   Length   Format                Item Description
   y        $HEX                  UNDEFINED X(y)
   y        $CHAR                 TEXT X(y)
   y        none of the above     CHAR X(y)

Loading Data into an Existing Database
Loading additional entries into an existing database is just as easy
as the initial creation. When you see the Load Identification window
(as shown in Screen 1), you only need to fill in the top two lines to
describe the input data set and the database view. Information such
as the database name, password, single-user or Multi-User
environment, and variable names is stored in the view descriptor.
This time you do not see the Load Display window since the database
and the view must already exist. When you press ENTER you
receive the same message stating the number of observations and
to enter LOAD when you are ready to begin the loading. Again, you
can use the SAS WHERE clause to subset your data before you
begin loading.

Updating Existing Logical Entries
For a brief recap, a SYSTEM 2000 logical entry begins with the top
record, normally called the C0 ENTRY record. All descendant records
belong to a logical entry. The prior discussions were concerned with
adding complete new logical entries to the database.

Loading data after the database already exists means that not all
input data variables may match a SAS variable name within the view
descriptor. Only those input variable names that do match are
loaded and the mismatched variables are ignored.

To add descendant records to an existing logical entry, you need
to specify BY keys in your view descriptor. A BY key is similar to
a BY group in the SAS System. You need to specify enough BY
keys to uniquely identify the record to which you want your new
records attached. If the BY keys do not qualify a unique record, the
new records are attached to the first record that meets the
qualification.

You specify the BY keys when you define or edit a view descriptor.
A BY key is an optional collection of one or more database items,
usually at least one from each database level in the view descriptor.
When a view descriptor contains BY keys, a SYSTEM 2000
where-clause is issued using those keys to look for already existing
records. If no records are found, then a complete logical entry is
added to the database. Otherwise, the SYSTEM 2000 interface
view engine determines the descendant records that are added
based on the records qualified using the BY keys.
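
The BY-key behavior described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the interface view engine itself: the real engine issues a SYSTEM 2000 where-clause, whereas here a database is just a Python list of two-level entries, and the function `load_observation` and its parameters are hypothetical names introduced for the illustration.

```python
def load_observation(database, obs, by_keys, entry_items, child_items):
    """Attach obs below a matching entry, or add a complete new entry.

    database    -- list of entries: {"items": {...}, "children": [...]}
    by_keys     -- entry-level item names used to look for an entry
    entry_items -- item names stored on the level-zero record
    child_items -- item names stored on the descendant record
    """
    child = {name: obs[name] for name in child_items}
    for entry in database:
        if all(entry["items"][key] == obs[key] for key in by_keys):
            # The first qualifying entry receives the new record, even
            # when the BY keys match more than one entry.
            entry["children"].append(child)
            return "attached"
    # Nothing qualified: add a complete logical entry.
    items = {name: obs[name] for name in entry_items}
    database.append({"items": items, "children": [child]})
    return "added"


# Two entries share CUSTID "C1", so CUSTID alone is not a unique key.
db = [
    {"items": {"CUSTID": "C1", "CUSTNAME": "SMITH"}, "children": []},
    {"items": {"CUSTID": "C1", "CUSTNAME": "SMITH JR"}, "children": []},
]
obs = {"CUSTID": "C1", "CUSTNAME": "SMITH", "ACCTNUM": 1001, "ACCTTYP": "S"}
print(load_observation(db, obs, ["CUSTID"],
                       ["CUSTID", "CUSTNAME"], ["ACCTNUM", "ACCTTYP"]))
```

In this example the new account record is attached to the first of the two matching entries, which is why the paper advises specifying enough BY keys to identify a record uniquely. Adding CUSTNAME to the BY keys would make the match unambiguous.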

SPECIAL CONSIDERATIONS
Here are a few considerations you need to be aware of when using
PROC DBLOAD with SYSTEM 2000 software.

The S2KLOAD Statement
The S2KLOAD statement indicates that you want SYSTEM 2000
software to use optimized load mode processing. You can use
optimized load for the initial load or for incremental loads that
involve adding entirely new logical entries. You cannot use optimized
loading when you are attaching new records to existing entries using
BY keys.

If you are loading a large amount of data, it is recommended that
you use the S2KLOAD statement for your initial and incremental
inserts. You must issue this statement before you enter the RUN
statement. Optimized load mode is more efficient than the default
insert mode, but it has some restrictions:
• Data must be sorted in data tree order prior to the load.
• Entire logical entries are always inserted.
• Your input cannot be a SYSTEM 2000 view in the same
  Multi-User or single-user environment.
• For the Multi-User environment, your output database is
  opened in exclusive mode.
• Coordinated Recovery is temporarily disabled for the
  database during the load process.

Pre-sort the Input Data when Adding Complete Entries
The number of inserts and the levels at which inserts are performed
depend on the order of the data and on what fields change from
observation to observation. When you insert an observation the
interface view engine compares the data to the prior observation.
Depending on how many fields have changed, one or more records
are inserted at the levels that did change. When the data are not
in proper order, more redundant data could be added to the
database.

The SORT procedure is used to sort a SAS data set. You cannot
use PROC SORT with a view descriptor used as input to a load,
but you can include an ordering clause in the view descriptor.

The S2KLEN Statement
A unique feature of SYSTEM 2000 software is character overflow
that allows for more efficient data storage. CHARACTER and TEXT
fields defined as four or more characters in length can hold up to
250 characters of data, although space in the data table is limited
to the defined length. When a data value exceeds the defined length,
a pointer in the data table points to the displacement in the overflow
table where the data resides. This allows you to define the length
of your items that will hold most data values but still accommodate
values up to 250 characters when necessary. This technique can
save a lot of disk storage if your database contains optional
comment fields that are rarely valued.

S2KLEN variable-identifier=n (where n is an integer from 1 to 250)
defines an item's length; n must be four or greater to use overflow.
This statement is recognized only when you are creating a new
database. The value of n is used in the definition of the new
database instead of the SAS variable length for that item. You must
issue this statement before you enter the RUN statement.

Your access and view descriptors define the length of a database
item. If the length is not altered in the descriptor, the default is the
SYSTEM 2000 item definition length. Therefore, to prevent
truncation when retrieving values that overflow, the descriptor length
must be as big as your largest value. Note that SYSTEM 2000 overflow
values can be up to 250 characters, and the maximum length you
can specify in a descriptor is 200.

CONCLUSION
SYSTEM 2000 Data Management software offers data storage that
allows for fast data access, flexible data query, full security at the
schema item level, automatic Coordinated Recovery, Multi-User,
PLEX, and the Self-Contained Facility. Beginning with Version 6 of
the SAS System, the SAS/ACCESS interface to SYSTEM 2000
software allows any SAS procedure to use a view of a database
just as you would use a SAS data set. This gives the SAS user the
full advantages of SYSTEM 2000 software and still allows you to
use the SAS System tools that are familiar to you.

PROC DBLOAD offers a method to migrate your SAS data set to
a SYSTEM 2000 database. Either interactively or in batch, you can
easily create and load a SYSTEM 2000 database from your SAS data
set or from a view of another DBMS supported by SAS/ACCESS
software. With PROC DBLOAD and SYSTEM 2000 software, you
can initially load your data, do incremental loads to an existing
database, and add descendant records to existing entries.

SAS, SAS/ACCESS, and SYSTEM 2000 are registered trademarks
and Multi-User is a trademark of SAS Institute Inc., Cary, NC, U.S.A.