Using the DBLOAD Procedure to Create and Populate SYSTEM 2000® Data Management Software Databases

David W. Pitts, SAS Institute Inc., Austin, Texas
Kim D. Hiserote, SAS Institute Inc., Austin, Texas

ABSTRACT

The architecture in Version 6 of the SAS® System has opened up new ways to migrate data from SAS data sets to SYSTEM 2000® databases. It also provides a way to migrate data from other DBMS databases to SYSTEM 2000 software. This paper describes the process of taking existing SAS data sets, creating a SYSTEM 2000 database, and populating that database. Examples show how to create a database view and how to map data variables from a SAS data set to that view in order to add complete new entries to the database or to append records to existing entries. Input is not limited to SAS data sets. Any view supported by the ACCESS procedure in Version 6 of the SAS System can be used to populate a SYSTEM 2000 database.

INTRODUCTION

This paper discusses the basic concept of a SAS data set and the pertinent information required by the DBLOAD procedure. It provides step-by-step guidance on how to select and use the screens and commands needed to create a database, add new entries to an existing database, and add descendant records to already existing entries. The example in this paper shows the screens using the interactive SAS Display Manager System facility. The lowercase data shown on the screens represent the data that were just entered.

SAS DATA LIBRARIES

You need to understand the different SAS data libraries and SAS files in Version 6 before you can effectively use the PROC DBLOAD interface with SYSTEM 2000 software. The SAS data library is the highest level of file organization; it contains files that are managed by the SAS System. Each SAS file belongs to one of three general categories: a SAS data set, a SAS catalog, or other SAS file. Each file is a member of the data library and each member has a member type. A SAS data set can be one of two member types, type data or type view.

Member Type Data

A SAS data set is any file that the SAS System can access as though it were a physical object containing a data portion with values stored in a rectangular form and a descriptor portion that identifies the values to the SAS System. The descriptor portion of a data file can be stored at the beginning of the data file, or it can be in a different file as well as in a different format.

The descriptor information includes the names and attributes of the variables in the data file. It also contains other information, including the date and time of the data file's creation, the engine used to create it, and any host-dependent information, such as the number of observations per page and the data file's physical name. The SAS System uses this information to process the data correctly.

The data portion of a data file contains data values in rows and columns. Each row in a data file represents one observation. Each column has a variable name associated with it and contains data values for the variable.

SAS Data Variables

Each observation in a SAS data file contains one data value for each variable; that is, each column of data values is a variable. SAS variables have several defining attributes. The attributes used by PROC DBLOAD to build a SYSTEM 2000 software item are declaration type, length, format, name, and label. There are two types of variables, numeric and character. The length attribute is the number of bytes used to store each of a variable's values in a SAS data file. A variable's format is the pattern the SAS System uses to display each value of a variable. The name is the 8-byte name that becomes the SYSTEM 2000 item name. If the label option is YES, then the label is used for the item name.

Members of Type Access

The SAS System files of type access are called access descriptors. These files hold essential information about databases you want to access, for example, the database name, item names, and item types. They also contain the corresponding SAS System information, such as the SAS variable names and formats that describe the data.

For SYSTEM 2000, the access descriptor can contain the entire database definition from which you create your view descriptors. You can use the SAS/ACCESS® interface to create an access descriptor, or PROC DBLOAD can create the access descriptor when it creates a new database.

Members of Type View

A SAS data set of type view is called a SAS data view. You can think of a SAS data view just as you do a SAS data file. It does not matter to the SAS software whether the data come from a SAS data view or a SAS data file. The difference between a SAS data view and a SAS data file is that a SAS data view does not actually contain data values. Instead, it contains the definition of data stored elsewhere. You use SAS data views to define supersets or subsets of a SAS access descriptor. In Release 6.06 of the SAS System, SAS data views can be created with PROC SQL, PROC ACCESS, and PROC DBLOAD. PROC DBLOAD creates both the access and view descriptors when it creates a new database.

DATABASE CREATION

The DBLOAD procedure enables you to create and load a SYSTEM 2000 database from a SAS data set or a SAS data view. You create a database by allocating the database files and invoking PROC DBLOAD in batch, interactive line mode, or interactive display manager mode.

Database File Allocation

When using SYSTEM 2000 software in a single-user environment, you must allocate the appropriate database files in your SAS session prior to invoking the DBLOAD procedure. For Multi-User™ software, the database files must be allocated in the Multi-User region. For single-user environments, you can issue a clist in TSO to allocate SYSTEM 2000 database files.
You can issue this clist prior to running the SASS2K clist, or you can use the TSO subset mode when you are already executing the SAS software. The following example shows the allocation for the BANKING database using the S2KDBAL clist:

   S2KDBAL DBN(BANKING) DSN(BANKING) DBVOL(SAS999) NEW

PROC DBLOAD with Interactive Display Manager Mode

Although the examples in the paper show full-screen processing, everything can be done in batch with SAS statements. If you are using a full-screen terminal, you can specify all the procedure statements except the LOAD statement and still invoke the interactive display manager. The simplest way to run interactively is to type

   proc dbload;
   run;

For the initial load it is recommended you pre-sort your input data and use the S2KLOAD statement to tell SYSTEM 2000 software that you want to do an optimized load. You must specify this statement before you enter the RUN statement, as in this example:

   proc dbload;
   s2kload;
   run;

If you have only the Version 6 SAS/ACCESS interface to SYSTEM 2000 software licensed, then the SYSTEM 2000 Load Identification window appears. Otherwise, a list of all licensed SAS/ACCESS interfaces is displayed. You then place your cursor by SYSTEM 2000 and press ENTER, and the Load Identification window appears as shown in Screen 1.

   DBLOAD:
   Command ===>
                SYSTEM 2000 Load Identification Window

   Input Data:         Library: trans   Member: bank      Type: DATA
   Database View:                       Member: banking   Type: VIEW

   If Creating a New Database, Please Enter:
   Database Name:      banking
   Access Descriptor:                   Member: banking   Type: ACCESS
   Multi-User(tm):     NO
   Label:              NO
   Create only:        NO

Screen 1  Sample Load Identification Window Creating the Banking Database

These fields appear in the window:

Input Data
   indicates the input SAS data set or a SAS/ACCESS view that will be used to create the SYSTEM 2000 database. If the input data is a view, overtype TYPE: DATA with TYPE: VIEW.

Database View
   indicates the view descriptor that the DBLOAD procedure will create and use to populate the database. When creating a new database, the view descriptor must not exist.

Password
   becomes the SYSTEM 2000 master password for the specified database name being created.

Database Name
   indicates the database name being created.

Access Descriptor
   indicates the name of the access descriptor that the DBLOAD procedure will create. When creating a new database, the access descriptor must not exist.

Multi-User
   is NO if you are creating the database in the single-user environment or YES if you are in a Multi-User environment.

Label
   is NO if you want the 8-character SAS variable name for the SYSTEM 2000 item name or YES if you want the 40-character SAS label to be used (if any).

Create only
   is NO if you want to create and load the database or YES if you only want to create the database and not load it.

If you leave the Database View and Access Descriptor fields blank, PROC DBLOAD creates the access and view descriptor in the WORK data library with a name of WORK.<database-name>.<type>. This means you must use the SAS/ACCESS interface to create your permanent access and view descriptors later.

When you have entered all the necessary information, press ENTER. At this point the database name and view and access descriptors are checked to ensure they do not already exist. For the initial load the NEW DATA BASE IS <database> command is issued to SYSTEM 2000 software. When that is successful, the Load Display window appears as shown in Screen 2.

   SYSTEM 2000 Load Display Window

   Component Name      SAS Name
   CUSTNAME            CUSTNAME
   CUSTID              CUSTID
   account number      ACCTNUM
   account type        ACCTTYP
   trans type          TRANSTYP
   trans amount        TRANSAMT
   trans date          TRANSDAT

Screen 2  Sample Load Display Window for the BANKING Database

Database Name, SAS Name, and Format are protected fields and cannot be changed. Any name or format change must be made prior to or when you invoke PROC DBLOAD. You can enter and change the following fields:

Func
   specifies which variables to use. Use D to drop or S to select a variable. By default all variables are selected.

Lvl
   specifies the hierarchical database level. The default is level zero.

Component Name
   is the same as the SAS name field unless the label option was specified. You can change the names by typing over them. Notice in the sample screen the component names in lowercase were changed and are different from the SAS names.

Index
   specifies key items. Type a Y for any item you want to be a SYSTEM 2000 key item. Or you may initially load the database with all non-key items, then use the QUEST procedure and issue the CREATE INDEX <item name or number> command later. The load is faster with fewer key items.

The DBLOAD procedure generates all of the component numbers automatically, starting with one for the first level zero item. The item numbers that follow continue consecutively to the next level change. The numbers for the records below level zero start with the next available hundred number. Screen 3 is a sample DESCRIBE of the BANKING database that DBLOAD created. Notice the record names. You can easily change the record names by invoking PROC QUEST and issuing DEFINE language commands to change the name to something more descriptive.

   SYSTEM RELEASE NUMBER 11.6A
   DATA BASE NAME IS BANKING
   DEFINITION NUMBER
   DATA BASE CYCLE NUMBER
     1* CUSTNAME (CHAR X(20))
     2* CUSTID (CHAR X(?))
   100* RECORD_LEVEL_1 (RECORD)
   101*   ACCOUNT NUMBER (INTEGER NUMBER 9999 IN 100)
   102*   ACCOUNT TYPE (CHAR X IN 100)
   200*   RECORD_LEVEL_2 (RECORD IN 100)
   201*     TRANS TYPE (CHAR X IN 200)
   202*     TRANS AMOUNT (NON-KEY MONEY $9(7).99 IN 200)
   203*     TRANS DATE (DATE IN 200)

Screen 3  Sample BANKING Database Definition

Here are a few other commands you can type on the command line:

CANCEL
   terminates processing without executing the load and returns to the Load Identification window.

RESET
   resets all item names, level numbers, and key/non-key status to the defaults, including deleted variables.

SHOW ALL
   shows all previously deleted and selected variables. It works like a toggle switch in that it leaves the D in the function field to drop the items again unless you change it.

Once you have made the desired changes, type END or press the PF key for END so that your changes can be verified. When everything is correct, you receive the following message:

   At most (n) obs will be loaded. Enter LOAD to continue.

If the number of observations is too large and you are only running a test, you can type the command WHERE followed by a valid SAS WHERE clause to subset your data before you issue the LOAD command.

At this point the SAS/ACCESS access and view descriptors are built and the SYSTEM 2000 DEFINE commands are issued to define the database. Then the database is loaded unless you requested create only.

Loading Data into an Existing Database

Loading additional entries into an existing database is just as easy as the initial creation. When you see the Load Identification window (as shown in Screen 1), you only need to fill in the top two lines to describe the input data set and the database view. Information such as the database name, password, single-user or Multi-User environment, and variable names is stored in the view descriptor. This time you do not see the Load Display window since the database and the view must already exist. When you press ENTER you receive the same message stating the number of observations and telling you to enter LOAD when you are ready to begin the loading. Again, you can use the SAS WHERE clause to subset your data before you begin loading.
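The component-numbering rule described for the Load Display window (numbers start at one for the first level-zero item, continue consecutively until the level changes, and each record below level zero starts at the next available hundred) can be illustrated with a short Python sketch. Python is used here purely for illustration. This is only a model of the rule as the paper states it, not the DBLOAD procedure's own logic: the function name `number_components`, the `(name, level)` input layout, and the assumption that levels never decrease down the list are all hypothetical choices, and the generated `RECORD_LEVEL_n` names simply mimic those visible in Screen 3.

```python
def number_components(items):
    """Assign component numbers to (name, level) pairs.

    Assumes items are listed in Load Display order with
    non-decreasing levels (an assumption of this sketch).
    Returns a list of (number, name, parent_record_number).
    """
    numbered = []
    next_number = 1              # first level-zero item is component 1
    current_level = 0
    record_for_level = {0: None}  # level zero has no enclosing record number

    for name, level in items:
        if level != current_level:
            # Level change: open a record at the next available hundred.
            record_number = ((next_number + 99) // 100) * 100
            parent = record_for_level.get(level - 1)
            numbered.append((record_number, f"RECORD_LEVEL_{level}", parent))
            record_for_level[level] = record_number
            next_number = record_number + 1
            current_level = level
        numbered.append((next_number, name, record_for_level[level]))
        next_number += 1
    return numbered


banking_items = [
    ("CUSTNAME", 0), ("CUSTID", 0),
    ("ACCOUNT NUMBER", 1), ("ACCOUNT TYPE", 1),
    ("TRANS TYPE", 2), ("TRANS AMOUNT", 2), ("TRANS DATE", 2),
]
banking_numbers = [number for number, _, _ in number_components(banking_items)]
```

With the BANKING items as input, the sketch produces the component numbers 1, 2, 100, 101, 102, 200, 201, 202, and 203, which is the same sequence shown in the sample DESCRIBE output.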
Creating SYSTEM 2000 Item Descriptions from SAS Variables

Table 1 shows the conversion of a SAS variable to a SYSTEM 2000 item.

Table 1  Converting SAS Variables to SYSTEM 2000 Data Items

SAS NUMERIC VARIABLES

   Length   Format               SYSTEM 2000 Item Description
   any      DATE                 DATE
   any      DATE and TIME        DOUBLE
   any      w.d                  DEC 9(x).9(d)
   any      DOLLARw.d            MONEY 9(x).9(d)
   any      w                    INT 9(w)
   <8       E                    REAL
   >=8      E                    DOUBLE
   <8       none of the above    REAL
   >=8      none of the above    DOUBLE

   where x = w - (1 + d)

SAS CHARACTER VARIABLES

   Length   Format               SYSTEM 2000 Item Description
   y        $HEX                 UNDEFINED X(y)
   y        $CHAR                TEXT X(y)
   y        none of the above    CHAR X(y)

Updating Existing Logical Entries

For a brief recap, a SYSTEM 2000 logical entry begins with the top record, normally called the C0 ENTRY record. All descendant records belong to a logical entry. The prior discussions were concerned with adding complete new logical entries to the database.

Loading data after the database already exists means that not all input data variables may match a SAS variable name within the view descriptor. Only those input variable names that do match are loaded and the mismatched variables are ignored.

To add descendant records to an existing logical entry, you need to specify BY keys in your view descriptor. A BY key is similar to a BY group in the SAS System. You need to specify enough BY keys to uniquely identify the record to which you want your new records attached. If the BY keys do not qualify a unique record, the new records are attached to the first record that meets the qualification.

You specify the BY keys when you define or edit a view descriptor. A BY key is an optional collection of one or more database items, usually at least one from each database level in the view descriptor. When a view descriptor contains BY keys, a SYSTEM 2000 where-clause is issued using those keys to look for already existing records. If no records are found, then a complete logical entry is added to the database.
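The BY-key behavior described above can be pictured with a small Python model. It is a toy sketch under stated assumptions, not the SYSTEM 2000 interface view engine: the database is reduced to a list of level-zero records that each carry a `children` list, the function `load_observation` and its parameters are hypothetical, and the customer values are invented for illustration. The sketch covers the two cases the text spells out: no qualifying record (a complete entry is added) and one or more qualifying records (the new record is attached to the first one).

```python
def load_observation(entries, obs, by_keys, child_fields):
    """Attach obs to an existing entry located by BY keys, or add a new entry.

    entries      -- list of dicts, each a level-zero record with a 'children' list
    obs          -- dict holding one input observation
    by_keys      -- field names used to look for an existing record
    child_fields -- field names that belong to the descendant record
    """
    child = {field: obs[field] for field in child_fields}
    matches = [
        entry for entry in entries
        if all(entry.get(key) == obs[key] for key in by_keys)
    ]
    if not matches:
        # No record qualifies: add a complete new logical entry.
        parent = {k: v for k, v in obs.items() if k not in child_fields}
        parent["children"] = [child]
        entries.append(parent)
        return "added complete entry"
    # One or more records qualify: attach to the first one found.
    matches[0]["children"].append(child)
    return "attached to existing entry"


# Hypothetical sample data, loosely modeled on the BANKING example.
entries = [
    {"CUSTID": "A1", "CUSTNAME": "SMITH", "children": []},
    {"CUSTID": "B2", "CUSTNAME": "JONES", "children": []},
]
first = load_observation(
    entries,
    {"CUSTID": "A1", "CUSTNAME": "SMITH", "ACCTNUM": 1001, "ACCTTYP": "C"},
    by_keys=("CUSTID",), child_fields=("ACCTNUM", "ACCTTYP"),
)
second = load_observation(
    entries,
    {"CUSTID": "Z9", "CUSTNAME": "BROWN", "ACCTNUM": 2002, "ACCTTYP": "S"},
    by_keys=("CUSTID",), child_fields=("ACCTNUM", "ACCTTYP"),
)
```

In this model the first observation matches customer A1 and is attached to that entry, while the second finds no match and is added as a new entry. Choosing BY keys that match more than one record would silently attach the new record to whichever match comes first, which is why the text recommends specifying enough keys to identify a unique record.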
When matching records are found, the SYSTEM 2000 interface view engine determines which descendant records to add, based on the records qualified by the BY keys.

SPECIAL CONSIDERATIONS

Here are a few considerations you need to be aware of when using PROC DBLOAD with SYSTEM 2000 software.

The S2KLOAD Statement

The S2KLOAD statement indicates that you want SYSTEM 2000 software to use optimized load mode processing. You can use optimized load for the initial load or for incremental loads that involve adding entirely new logical entries. You cannot use optimized loading when you are attaching new records to existing entries using BY keys.

If you are loading a large amount of data, it is recommended that you use the S2KLOAD statement for your initial and incremental inserts. You must issue this statement before you enter the RUN statement. Optimized load mode is more efficient than the default insert mode, but it has some restrictions:

• Data must be sorted in data tree order prior to the load.
• Entire logical entries are always inserted.
• Your input cannot be a SYSTEM 2000 view in the same Multi-User or single-user environment.
• For the Multi-User environment, your output database is opened in exclusive mode.
• Coordinated Recovery is temporarily disabled for the database during the load process.

Pre-sort the Input Data when Adding Complete Entries

The number of inserts and the levels at which inserts are performed depend on the order of the data and on what fields change from observation to observation. When you insert an observation, the interface view engine compares the data to the prior observation. Depending on how many fields have changed, one or more records are inserted at the levels that did change. When the data are not in proper order, more redundant data could be added to the database.

The SORT procedure is used to sort a SAS data set. You cannot use PROC SORT with a view descriptor used as input to a load, but you can include an ordering clause in the view descriptor.

The S2KLEN Statement

A unique feature of SYSTEM 2000 software is character overflow, which allows for more efficient data storage. CHARACTER and TEXT fields defined as four or more characters in length can hold up to 250 characters of data, although space in the data table is limited to the defined length. When a data value exceeds the defined length, a pointer in the data table points to the displacement in the overflow table where the data resides. This allows you to define an item length that holds most data values but still accommodates values of up to 250 characters when necessary. This technique can save a lot of disk storage if your database contains optional comment fields that are rarely valued.

   S2KLEN variable-identifier=n

The S2KLEN statement, where n is an integer from 1 to 250, defines an item's length; n must be four or greater to use overflow. This statement is recognized only when you are creating a new database. The value of n is used in the definition of the new database instead of the SAS variable length for that item. You must issue this statement before you enter the RUN statement.

Your access and view descriptors define the length of a database item. If the length is not altered in the descriptor, the default is the SYSTEM 2000 item definition length. Therefore, to prevent truncation when retrieving values that overflow, the descriptor length must be as big as your largest value. Note that SYSTEM 2000 overflow values can be up to 250 characters, and the maximum length you can specify in a descriptor is 200.

CONCLUSION

SYSTEM 2000 Data Management software offers data storage that allows for fast data access, flexible data query, full security at the schema item level, automatic Coordinated Recovery, Multi-User, PLEX, and the Self-Contained Facility. Beginning with Version 6 of the SAS System, the SAS/ACCESS interface to SYSTEM 2000 software allows any SAS procedure to use a view of a database just as you would use a SAS data set. This gives the SAS user the full advantages of SYSTEM 2000 software and still allows you to use the SAS System tools that are familiar to you.

PROC DBLOAD offers a method to migrate your SAS data set to a SYSTEM 2000 database. In either interactive or batch mode, you can easily create and load a SYSTEM 2000 database from your SAS data set or from a view of another DBMS supported by SAS/ACCESS software. With PROC DBLOAD and SYSTEM 2000 software, you can initially load your data, do incremental loads to an existing database, and add descendant records to existing entries.

SAS, SAS/ACCESS, and SYSTEM 2000 are registered trademarks and Multi-User is a trademark of SAS Institute Inc., Cary, NC, U.S.A.
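As a supplementary illustration of the pre-sorting advice, the following Python sketch models how comparing each observation with the prior one affects the number of records inserted. It is a simplified model, not the interface view engine itself. The function `count_inserts`, the grouping of fields by level, and the sample values are hypothetical. The sketch also assumes that when a field at some level changes, a record is inserted at that level and at every level beneath it, and that an observation identical to the prior one inserts nothing.

```python
def count_inserts(observations, levels):
    """Count records inserted per hierarchy level.

    observations -- list of dicts, in load order
    levels       -- list of tuples of field names; index = hierarchy level
    """
    inserts = [0] * len(levels)
    prior = None
    for obs in observations:
        first_changed = 0  # the very first observation inserts at every level
        if prior is not None:
            first_changed = len(levels)  # assume nothing changed until shown otherwise
            for depth, fields in enumerate(levels):
                if any(obs[f] != prior[f] for f in fields):
                    first_changed = depth
                    break
        # Assumption: a change at one level forces inserts at all deeper levels.
        for depth in range(first_changed, len(levels)):
            inserts[depth] += 1
        prior = obs
    return inserts


levels = [("CUSTID",), ("ACCTNUM",), ("TRANSAMT",)]
obs_a = {"CUSTID": 1, "ACCTNUM": 10, "TRANSAMT": 5}
obs_b = {"CUSTID": 1, "ACCTNUM": 10, "TRANSAMT": 7}
obs_c = {"CUSTID": 2, "ACCTNUM": 20, "TRANSAMT": 9}

sorted_counts = count_inserts([obs_a, obs_b, obs_c], levels)
unsorted_counts = count_inserts([obs_a, obs_c, obs_b], levels)
```

With the observations in tree order, the model inserts two customer records, two account records, and three transaction records. With the same observations out of order, it inserts three records at every level, so customer 1 and its account are stored twice. That duplication is the redundant data the pre-sorting recommendation is meant to avoid.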