THE MULTIPLE ENGINE ARCHITECTURE: A COMPARISON OF DATABASE MANAGEMENT SYSTEMS USING THE SAS/ACCESS® INTERFACE

Alma Zuniga, SAS Institute Inc., Austin, Texas

ABSTRACT

SAS® System Version 6 architecture has opened up new ways to interface the SAS® System with several database management systems. Engines were a new feature of Version 6 of the SAS System. These engines retrieve data directly from files formatted by other software vendors. This enables you to use SAS procedures and program statements to process data values stored in these files without the cost of converting them into SAS data files. This paper will compare the different database management systems using the SAS/ACCESS® interface.

INTRODUCTION

This paper discusses the purpose of the SAS/ACCESS interface software. It will give you an overview of the interface software and describe the procedures that make up the software. The intent of the Multiple Engine Architecture is to access your data from your database management system (DBMS) directly, without having to create an intermediate SAS data file. Is the SAS/ACCESS interface software similar across the different software vendors? The differences, if any, are discussed.

PURPOSE OF SAS/ACCESS SOFTWARE

SAS/ACCESS software provides an interface between the SAS System and a database management system. The Multiple Engine Architecture was designed to meet several goals, including a transparent method of accessing data stored in a database management system. The SAS/ACCESS interface includes an interface view engine for reading and writing databases. It also includes the ACCESS procedure for defining a new type of SAS file called a descriptor file and the DBLOAD procedure for creating and loading databases. You can perform the following tasks with the SAS/ACCESS interface:

* create SAS/ACCESS descriptor files using the ACCESS procedure
* access data directly in a database management system from within a SAS program, using the descriptor files created with the ACCESS procedure
* extract data from a database management system and place it in a SAS data file using the ACCESS procedure or the DATA step
* load data into a database management system using the DBLOAD procedure
* update data in a database management system using the SQL procedure, SAS/FSP® software, and the APPEND procedure.

OVERVIEW OF USING THE INTERFACE

The SAS/ACCESS interface consists of three parts:

* the ACCESS procedure, which you use to define the SAS/ACCESS descriptor files
* the interface view engine, which enables you to use data from your database management system in SAS programs in much the same way you use SAS data files
* the DBLOAD procedure, which enables you to create and load databases using data from SAS data sets.

The ACCESS procedure enables you to describe data from your database management system to the SAS System. You store the description in SAS/ACCESS descriptor files, which you can use in SAS programs just as you would use SAS data files. You can print, plot, and chart the data described by the descriptor files, use them to create other SAS data files, and so on.

The interface view engine is an integral part of the SAS/ACCESS interface, but you do not have to deal directly with the engine. The SAS System automatically interacts with the interface view engine when you use data from your database management system in your SAS programs. You can simply use data from your database management system just as you would use SAS data.

The DBLOAD procedure enables you to create and load databases using SAS data.

SAS/ACCESS DESCRIPTOR FILES

To use the SAS/ACCESS interface with your database management system software, you must define special files that describe the database and data to the SAS System. These files are called SAS/ACCESS descriptor files. The two types of descriptor files are

* access descriptor files (member type ACCESS)
* view descriptor files (member type VIEW).

An access descriptor contains information about the database you want to use. The information includes the database name, the item descriptions, and their data types. You use the access descriptor to build view descriptors. You can think of an access descriptor as being a master descriptor file for a single database because it contains a complete description of the database.

A view descriptor defines a subset of the data described by an access descriptor file. You choose this subset by selecting particular items, and you can specify selection criteria that the data must meet. In most cases you can even specify a sequence order for the data. After you create your view descriptor files, you can use them in a SAS program to read the data directly from the database or to extract the data and place them in a SAS data file. Typically, for each access descriptor that you define, you have several view descriptors, each selecting different subsets of data.

ACCESS PROCEDURE

The ACCESS procedure is an interactive windowing procedure that enables you to create and edit the descriptor files used by the SAS/ACCESS interface to your database management system. The procedure consists of the following windows:

Access Window
   enables you to create access descriptors and view descriptors, as well as edit these files and extract data by placing them into a SAS data file. The Access Window lists all the files and their associated librefs that are active during your SAS session.

Engine Selection Window
   lists all the available Version 6 SAS/ACCESS interface products installed on your machine. This window appears only if you are creating an access descriptor and you have more than one Version 6 SAS/ACCESS interface product installed on your system. If you have only one Version 6 SAS/ACCESS interface product, the Engine Selection Window does not appear and you go directly to the Access Descriptor Identification Window.

Access Descriptor Identification Window
   begins the process of creating an access descriptor. You use this window to specify the database you would like to access. If all the fields are filled in correctly, the information is processed and the Access Descriptor Display Window appears.

Access Descriptor Display Window
   is used to define, edit, or browse an access descriptor. You can choose SAS variable names, formats, informats, and lengths, and drop any items you do not want to appear in the View Descriptor Display Window.

View Descriptor Display Window
   is used to define, edit, or browse a view descriptor. You choose SAS variable names, formats, informats, and lengths if the access descriptor has not assigned them with the Assign Names = YES option.

Selection Criteria Entry Window
   appears when the SUBSET command in the View Descriptor Display Window is used. This window is used to specify criteria for selecting logical entries to be presented to the SAS System and to specify a sort order for the data. The SAS System sends the criteria you enter to your database management system software to be processed when you use the view descriptor.

DBLOAD PROCEDURE

The DBLOAD procedure runs in interactive display manager, interactive line, and batch modes. It enables you to create and load a database using data from a SAS data file, from a view created with the SQL procedure, or from another database (using a view descriptor created with the ACCESS procedure). In some cases you can submit SQL statements (except the SELECT statement) to your database management system for processing without leaving your SAS session.
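
The relationship between access descriptors and view descriptors described above can be pictured with a small model. The sketch below is not SAS code and says nothing about how the SAS System actually stores descriptor files; it is a minimal Python illustration in which every name (`Item`, `AccessDescriptor`, `ViewDescriptor`, `create_view`, and the `EMPLOYEE` items) is hypothetical. It assumes only what the text states: an access descriptor describes one database, a view descriptor selects a subset of its items with optional selection criteria and sort order, and Assign Names = YES prevents SAS names from being changed in a view descriptor.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Item:
    """One DBMS item (column) as described to the SAS System (hypothetical model)."""
    dbms_name: str
    data_type: str
    sas_name: str


@dataclass
class ViewDescriptor:
    """A subset of an access descriptor's items plus optional criteria."""
    items: List[Item]
    selection_criteria: Optional[str] = None  # handed to the DBMS, not evaluated here
    sort_order: Optional[str] = None


@dataclass
class AccessDescriptor:
    """Master description of a single database; view descriptors are built from it."""
    database: str
    items: List[Item]
    assign_names: bool = False  # models the Assign Names = YES/NO field

    def create_view(self, selected: List[str],
                    renames: Optional[Dict[str, str]] = None,
                    selection_criteria: Optional[str] = None,
                    sort_order: Optional[str] = None) -> ViewDescriptor:
        renames = renames or {}
        if self.assign_names and renames:
            # Assign Names = YES: names are fixed when the access descriptor is made.
            raise ValueError("SAS names are controlled by the access descriptor")
        by_name = {item.dbms_name: item for item in self.items}
        chosen = []
        for name in selected:
            if name not in by_name:
                raise KeyError(f"{name!r} is not described by the access descriptor")
            src = by_name[name]
            chosen.append(Item(src.dbms_name, src.data_type,
                               renames.get(name, src.sas_name)))
        return ViewDescriptor(chosen, selection_criteria, sort_order)


employee = AccessDescriptor(
    database="EMPLOYEE",
    items=[Item("EMPLOYEE_NUMBER", "INTEGER", "EMPNUM"),
           Item("LAST_NAME", "CHAR(20)", "LASTNAME"),
           Item("SALARY", "DECIMAL(9,2)", "SALARY")],
)

# Several view descriptors typically share one access descriptor.
names_only = employee.create_view(["EMPLOYEE_NUMBER", "LAST_NAME"])
high_paid = employee.create_view(["LAST_NAME", "SALARY"],
                                 renames={"SALARY": "PAY"},
                                 selection_criteria="SALARY > 50000",
                                 sort_order="LAST_NAME")
```

The one-to-many shape is the point of the sketch: the access descriptor is defined once per database, and each view descriptor is a cheap, narrower selection from it.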
The DBLOAD procedure enables you to

* create a new database definition only
* create a new database definition and load data
* load new logical entries into an existing database
* insert new data records into existing logical entries.

The DBLOAD procedure consists of the following windows:

Engine Selection Window
   lists all available Version 6 SAS/ACCESS interface products installed on your machine. This window appears only if you have more than one Version 6 SAS/ACCESS interface product licensed on your system. If you have only one interface, this window does not appear and you go directly to the Load Identification Window.

Load Identification Window
   is used to identify the location of the input data and specify information on the database that you want to create.

Load Display Window
   is used to specify the items and data type associated with each SAS variable. You can also choose to load a subset of your input data.

Query Entry Window
   can be invoked from either the Load Identification Window or the Load Display Window. The Query Entry Window enables you to submit SQL statements to your database management system without leaving the SAS System.

MULTIPLE ENGINE ARCHITECTURE

In Version 5 software, you had to use a procedure to extract data from your database management system and place it in a SAS data file before you could use the data in a SAS program. In Version 6 of the SAS System, the Multiple Engine Architecture was designed to provide transparent access to data stored in database management systems of other vendors. In Version 6 with SAS/ACCESS software, any SAS procedure, as well as the DATA step, can access data from a database management system directly, without having to create an intermediate SAS data file. Version 6 also gives you the option of extracting data from your database management system, if that is better suited for your applications.

The SAS/ACCESS software is in production for the following database management systems:

* SYSTEM 2000® Data Management Software
* DB2™
* SQL/DS™
* ORACLE®
* Rdb/VMS™
* Prime INFORMATION™

OPERATING SYSTEMS

SYSTEM 2000 data management software is SAS Institute's hierarchical database software for mainframe computer systems running under MVS and CMS operating systems.

DB2 is an IBM® relational database management system that runs under the MVS operating system.

SQL/DS is an IBM relational database management system that runs under the CMS operating system.

ORACLE is Oracle Corporation's relational database management system that runs under the VMS™ operating system.

Prime INFORMATION is an information management system that runs under the PRIMOS® operating system.

Rdb/VMS is a relational database management system created by the Digital Equipment Corporation (DEC®) that runs under the VMS operating system on VAX™ machines.

CREATING AN ACCESS DESCRIPTOR

The ACCESS procedure allows you to create and edit descriptor files and create output SAS data files. When creating an access descriptor file, certain fields must be filled in on the Access Descriptor Identification Window. The following fields are common in the interface software for the database management systems:

Library
   is automatically filled in with the libref specified in the PROC ACCESS statement or in the Access Window command line. This can be modified.

Member
   is automatically filled in with the member name specified in the PROC ACCESS statement or in the Access Window command line. The member name for the access descriptor being created can be modified.

Type
   has the value ACCESS because all access descriptors have a member type of ACCESS. This field cannot be edited.

Assign Names
   gives the option of standardizing SAS variable names for subsequent view descriptors. The default value NO permits you to edit SAS variable names, formats, informats, and lengths when you create view descriptors. The value YES gives the ACCESS procedure control of assigning SAS variable names, formats, informats, and lengths. This prevents you from changing them when you create a view descriptor.

The following fields should be filled in based on the database management system:

SYSTEM 2000 Data Management Software

Database
   enter the name of the SYSTEM 2000 database you want to use.

Password
   enter a SYSTEM 2000 password of one to four characters. The Access Descriptor Display Window will contain only those components for which the password is authorized.

Multi-User™
   specifies the execution environment that the SYSTEM 2000 database exists in. YES, the default, specifies that the database exists in the SYSTEM 2000 Multi-User environment. NO specifies that the database exists in a single-user environment, that is, your SAS program environment.

DB2

Authorization ID
   is filled in with the authorization ID for the table. If you do not specify an authorization ID, it defaults to your userid.

Table Name
   is filled in with the DB2 table or view name. You must enter a value in this field.

SSID
   is filled in with the DB2 subsystem ID. If you do not know the subsystem ID, contact your DBA.

SQL/DS

Authorization ID
   indicates the authorization ID for the SQL/DS table. The authorization ID is the SQL/DS userid of the table's creator. If you do not fill in this field, it will default to your userid.

Table Name
   indicates the table or view you want to use. You must enter a value in this field.

User ID
   indicates the userid for the table or view you want to use. If the table or view was created under another person's userid, you must enter a value in this field. If you fill in this field, you must fill in the Password field.

Password
   enter the password for the userid.

ORACLE

Table Name
   enter the ORACLE table or view name. You must enter a value in this field.

User Name
   enter your ORACLE username.

Password
   enter your ORACLE password. The password is not displayed on the screen.

Rdb/VMS

Database
   specify the location and name of the Rdb/VMS database. It is recommended that you use the fully qualified VMS filename so that the access descriptor can be used from any location. You must enter a value for this field.

Table Name
   specify the Rdb/VMS table or view name. You must enter a value for this field.

Prime INFORMATION

File Name
   specify the Prime INFORMATION filename. You must enter a value in this field; it cannot include the directory specification.

Vocabulary Path
   fill in with a fully qualified pathname for the vocabulary file. If the name does not contain a directory specification, then only your current directory will be searched.

After you fill in the appropriate fields, and they are correct, the information is processed and the Access Descriptor Display Window appears. At this point you are able to select the items you wish to appear when creating a view descriptor.

CREATING A VIEW DESCRIPTOR

Use the View Descriptor Display Window to define, edit, or browse a view descriptor. You use this window to select items for a view descriptor and to choose SAS variable names, formats, informats, and lengths if the access descriptor has not assigned them with Assign Names = YES.

The following fields are common in the interface software for the database management systems:

Library
   specify the libref associated with the SAS data library where you want to store the view descriptor.

Member
   specify a member name for the view descriptor.

Type
   has the value VIEW because all view descriptors are of member type VIEW.

Output SAS Data Set: Library and Member
   used when extracting data from your database into a SAS data file. This is an optional field.

The following fields vary based on the database management system being accessed:

SYSTEM 2000 Data Management Software

Database
   taken from the access descriptor. This field cannot be edited.

Password
   is an optional, nondisplay field. No validation of the password is done at the time it is entered. An appropriate message will result at the time the database is accessed if the password is invalid. If a password is not specified, one must be provided with a data set option at the time the database is accessed by a SAS program.

Multi-User
   specifies the execution environment.

DB2
   Authorization ID, Table Name, and SSID are taken from the access descriptor. These fields cannot be edited.

SQL/DS
   Authorization ID and Table Name are taken from the access descriptor. These fields cannot be edited.

ORACLE
   Table and User Name are taken from the access descriptor. These fields cannot be edited.

Rdb/VMS
   Database and Table Name are taken from the access descriptor. These fields cannot be edited.

Prime INFORMATION
   File Name and Vocabulary Path are taken from the access descriptor. These fields cannot be edited.

Once the fields are filled in correctly and the items are selected for your view descriptor, you have another option. The SUBSET command displays the Selection Criteria Entry Window. Use this window to specify criteria for selecting logical entries to be presented to the SAS System and to specify a sort order for the data. The SAS System sends the criteria you enter to your database management software to be processed when you use the view descriptor.

Note: The SUBSET command is not available in the SAS/ACCESS interface to Prime INFORMATION.

USING YOUR DBMS DATA IN SAS PROGRAMS

Once the descriptors are created you can begin to use them in your SAS programs. The advantage of the SAS/ACCESS interface to your DBMS is that it enables the SAS System to read and write your DBMS data directly from SAS programs. You can use your DBMS data in SAS procedures and the DATA step in virtually the same way you use SAS data files. You can also select and combine data described by view descriptors (from other DBMS) using the SAS WHERE statement and the SQL procedure. You cannot update your DBMS data directly using the DATA step, but you can update DBMS data using the following procedures: APPEND, FSEDIT, FSVIEW, and SQL.

The following are some differences found in the SAS/ACCESS interface software:

DB2, ORACLE
   Any ORDER BY clause associated with the view descriptor is ignored by the FSEDIT procedure. The data are not presented for editing in any particular order.

DB2, SQL/DS, ORACLE, and Rdb/VMS
   In SAS/FSP procedures, scrolling backward through data described by a view descriptor is less efficient than scrolling through a SAS data file because the DBMS's cursors process rows first-to-last only, not last-to-first. For example, suppose the table has 5,000 rows and the current row is 3,400. To scroll backward to row 3,399, PROC FSEDIT must sequentially pass rows 1 through 3,398.

Prime INFORMATION
   You cannot update Prime INFORMATION data through the SAS/ACCESS interface.

CREATING AND LOADING YOUR DATABASE

The DBLOAD procedure enables you to create and load a database from a SAS data file or from data described by a view descriptor. The following fields are common in the interface software of the database management systems:

Input Data
   specifies the name of the input SAS data set. If you do not enter a value, the default is the last data set that was created.

Access Descriptor
   specifies the name of the access descriptor to be created when the procedure creates the new database.

For DB2, the Authorization ID field specifies the authorization ID for the new table. If you do not enter a value, it defaults to your userid.
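
As an aside on the backward-scrolling cost noted for SAS/FSP procedures (the 5,000-row example in which moving from row 3,400 back to row 3,399 means passing rows 1 through 3,398), the arithmetic can be checked with a small simulation. This is a minimal Python sketch, not SAS or DBMS code. The class `ForwardOnlyCursor` and its methods are hypothetical, and the sketch assumes that moving to an earlier row is done by reopening the cursor and reading forward from the first row.

```python
class ForwardOnlyCursor:
    """Toy cursor that can only fetch rows first-to-last (hypothetical model)."""

    def __init__(self, n_rows: int) -> None:
        self.n_rows = n_rows
        self.position = 0  # number of the row fetched last; 0 means before row 1

    def fetch_next(self) -> int:
        if self.position >= self.n_rows:
            raise IndexError("no more rows")
        self.position += 1
        return self.position

    def move_to(self, target: int) -> int:
        """Make `target` the current row; return how many rows were passed over."""
        if not 1 <= target <= self.n_rows:
            raise ValueError("row number out of range")
        if target <= self.position:
            # A forward-only cursor cannot step back, so start over from the top.
            self.position = 0
        passed = 0
        while self.position < target - 1:
            self.fetch_next()
            passed += 1
        self.fetch_next()  # this fetch lands on the target row itself
        return passed


cursor = ForwardOnlyCursor(5000)
cursor.move_to(3400)                # current row is now 3,400
backward = cursor.move_to(3399)     # one row back: rows 1 through 3,398 are passed
forward = cursor.move_to(3400)      # one row forward: nothing is passed
print(backward, forward)
```

Under these assumptions a one-row step backward costs 3,398 passed rows while a one-row step forward costs none, which is the asymmetry the text describes.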
The following fields vary based on the database management system being accessed:

SYSTEM 2000 Data Management Software

Database View
   specifies the name of the view descriptor you want to create, based on the new database. The view descriptor name must not already exist. This field is optional.

Password
   specifies the password that will become the master password for the new database. When you enter the password, it does not appear on the screen.

Database Name
   specifies the name of the new database. This field must not be blank.

Multi-User
   specifies the access mode. The default (NO) indicates a database will be created in a single-user environment. YES indicates a Multi-User environment.

Label
   specifies whether you want to use the SAS label information for item names (YES). The default is NO, which uses the 8-character SAS variable identifiers for item names. This field is only used when creating a new database.

Create only
   specifies whether you want to load any data. YES means you want to create a new database but you do not want to load any data. NO (the default) means you do want to load data.

DB2

Input Limit
   specifies the maximum number of observations you want to load into the DB2 table. A value of 0 indicates you want to load all observations. The default value is 5000.

Table Name
   specifies the name of the new table. You must fill in this field because there is no default value.

SSID
   specifies the DB2 subsystem ID. You must fill in this field because there is no default.

In Database/Tablespace
   specifies the name of the table space in which you want the new table stored.

Commit Frequency
   specifies the number of inserts you want issued before a commit is done. An entry of 0 indicates you want only one commit performed after all inserts have been issued. The default value is 1000.

Error Limit
   specifies the number of SQL errors you will allow without stopping the load. The default value is 100.

SQL/DS

Same fields specified as in DB2 software. In SQL/DS the term In DBSPACE is used in place of In Database/Tablespace.

ORACLE

Input Limit
   specifies the maximum number of observations you want to load into the ORACLE table. A value of 0 indicates you want to load all observations. The default value is 5000.

User Name
   specifies the ORACLE username. If username and password are left blank, ORACLE's automatic operating system OPS$sysid option is used.

Password
   specifies the ORACLE password.

Table Name
   specifies the name of the new table. You must fill in this field because there is no default value.

SPACE
   specifies the name of the ORACLE space definition where you want the new table stored.

Commit Frequency
   specifies the number of inserts you want issued before a commit is done. An entry of 0 indicates you want only one commit performed after all inserts have been issued. The default value is 1000.

Error Limit
   specifies the number of SQL errors you will allow without stopping the load. The default value is 100.

Rdb/VMS

Input Limit
   specifies the maximum number of observations you want to load into the Rdb/VMS table. A value of 0 indicates you want to load all observations. The default value is 5000.

Database
   is the name and physical location of the database in which you want the new table stored. If you do not fill in this field, the table is created in the default database, if you have designated one.

Table
   specifies the name of the new table to be created and loaded. You must fill in this field because there is no default value.

Commit Frequency
   specifies the number of inserts you want issued before a commit is done. An entry of 0 indicates you want only one commit performed after all rows have been inserted. The default value is 1000.

Error Limit
   specifies the number of VAX SQL errors you will allow without stopping the load. The default value is 100.

Prime INFORMATION

The SAS/ACCESS interface to Prime INFORMATION does not contain a DBLOAD procedure.

Once the information has been processed from the Load Identification Window, you proceed to the Load Display Window. If you want to load a subset of your input data, you can do so from this window. The following fields are common to the database management systems:

Func
   enables you to control whether a SAS variable is loaded into the table (database). D drops a column (item) from the load. S selects a previously dropped column (item) with default settings.

Column (Component) Name
   lists all column (component) names for the new table (database). The names default to the SAS variable names, but you can change them.

Column Type
   lists the data types for each column. The values are based on the SAS variable formats, but you can change them. This field is not specified in the SAS/ACCESS interface to SYSTEM 2000 Data Management Software.

Nulls
   indicates whether the column accepts null values. The default value is Y, but you can change it. This field is not specified in the SAS/ACCESS interface to SYSTEM 2000 Data Management Software.

SAS Name
   lists the SAS variable names from the input data set. You cannot edit this field.

(SAS) Format
   lists the SAS variable formats from the input data set. You cannot edit this field.

The following fields vary based on the database management system being accessed:

SYSTEM 2000 Data Management Software

Database
   is filled in with the database name that was specified in the Load Identification Window.

Lvl
   is the level number for the SYSTEM 2000 item. The default value is 0, but you can change it. The level number must be an integer from 0 through 9.

Index
   allows you to specify the item as a key item. If you type the character Y in this field, the item will become a key (indexed) item; any other character or a blank means non-key. The default is non-key.

DB2

Table
   is filled in with the table name that was specified in the Load Identification Window. You may edit this field.

Database/Tablespace
   is filled in with the database or table space name specified in the Load Identification Window. You may edit this field.

SQL/DS

Table
   is filled in with the table name that was specified in the Load Identification Window. You may edit this field.

DBSPACE
   is filled in with the database or table space name specified in the Load Identification Window. You may edit this field.

ORACLE

User Name
   is filled in with the ORACLE user name that was specified in the Load Identification Window. This field cannot be modified.

Table Name
   is filled in with the table name specified on the Load Identification Window. You may edit this field.

Space
   is filled in with the space definition name specified on the Load Identification Window. You may edit this field.

Rdb/VMS

Database
   is filled in with the database name and location specified in the Load Identification Window. You may edit this field.

Table
   is filled in with the table name specified in the Load Identification Window. You may edit this field.

Although the main purpose of the DBLOAD procedure is to create and load your table (database), there are other options.

SYSTEM 2000 Data Management Software

To add data to an existing database, you must issue the VIEWDESC= statement. If you are adding new logical entries to the database, you can specify the S2KLOAD statement, which causes the procedure to use optimized load processing.

When creating a new database, the procedure always creates an access descriptor and a view descriptor.

The DBLOAD procedure allows you to have records at multiple levels, but they must be on the same path. If you have disjoint schema records, you must create the database definition outside of the DBLOAD procedure.

The following applies to DB2, SQL/DS, ORACLE, and Rdb/VMS:

You can use the procedure to submit SQL statements without creating and loading a table. The Query Entry Window can be invoked from either the Load Identification Window or the Load Display Window. The SQL statements you enter must not refer to the table being created because it does not exist. Otherwise, you can enter any valid SQL statement except the SELECT statement. (However, you can enter the SELECT statement as a substatement within another statement.) If you have stored your SQL statements in an external file, you can use the INCLUDE command to copy them to the window.

QUEST PROCEDURE

The QUEST procedure is the fourth part of the SAS/ACCESS interface to SYSTEM 2000 Data Management Software. The QUEST procedure allows you to access a SYSTEM 2000 database directly, that is, without using a view descriptor. The procedure is basically a messenger for SYSTEM 2000 statements: when you submit a statement in the QUEST procedure, the SAS System scans the statement and passes it to SYSTEM 2000 software, which then executes it.

SYSTEM 2000 software includes an interactive language, also called QUEST, that is used for creating, browsing, updating, and managing SYSTEM 2000 databases. The QUEST procedure gives you full access to that language from the display manager, interactive line-mode sessions, or batch mode. This procedure enables you to access SYSTEM 2000 software to perform a variety of tasks, for example,

* retrieving data from a database
* updating data in a database
* defining a new database
* assigning passwords to a database
* saving a database
* restoring a database
* enabling rollback.

CONCLUSION

The Multiple Engine Architecture's main purpose was to access data stored in a database management system through a transparent method. The naming convention and the design of the SAS/ACCESS interface software were kept very similar, if not the same, across the different DBMS vendors. Through the ACCESS procedure you can create descriptors to access your data, which can be stored in hierarchical or relational databases. The data may be handled differently based on the DBMS used, but when that data is used with your SAS procedures, the result is the same. The DBLOAD procedure enables you to create and load tables (databases) from your existing SAS data files.

Most businesses have more than one DBMS installed, and that is why it is so important to keep the terminology similar. By using the SAS/ACCESS interface software, you can now generate reports from different DBMS through the SAS System.

SAS, SAS/ACCESS, SAS/FSP, and SYSTEM 2000 are registered trademarks and Multi-User is a trademark of SAS Institute Inc., Cary, NC, USA.

IBM is a registered trademark and DB2 and SQL/DS are trademarks of International Business Machines Corporation.

Rdb/VMS, VAX, and VMS are trademarks of Digital Equipment Corporation.

PRIMOS is a registered trademark and Prime INFORMATION is a trademark of Prime Computer, Inc.

ORACLE is a registered trademark of Oracle Corporation.
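
As a closing illustration, the way the Input Limit, Commit Frequency, and Error Limit fields of the Load Identification Window interact can be sketched as a simple loop. This is a minimal Python simulation, not the DBLOAD procedure itself, and `simulate_load`, `LoadResult`, and `reject_odd` are hypothetical names. The defaults (5000, 1000, 100) and the meaning of 0 come from the field descriptions. Several details are assumptions made only for the sketch: Input Limit counts observations read, the load stops once the error count exceeds Error Limit, a `ValueError` stands in for an SQL error, and no final commit is issued after an error stop.

```python
from typing import Callable, Iterable, NamedTuple


class LoadResult(NamedTuple):
    inserted: int
    errors: int
    commits: int
    stopped_on_errors: bool


def simulate_load(rows: Iterable, insert: Callable[[object], None],
                  commit_frequency: int = 1000, error_limit: int = 100,
                  input_limit: int = 5000) -> LoadResult:
    """Count inserts, errors, and commits for one simulated load."""
    inserted = errors = commits = pending = 0
    stopped = False
    for read, row in enumerate(rows, start=1):
        if input_limit and read > input_limit:
            break  # Input Limit reached; a limit of 0 means load everything
        try:
            insert(row)
        except ValueError:  # stand-in for an SQL error reported by the DBMS
            errors += 1
            if errors > error_limit:
                stopped = True  # assumption: stop once the limit is exceeded
                break
            continue
        inserted += 1
        pending += 1
        if commit_frequency and pending == commit_frequency:
            commits += 1  # commit after every `commit_frequency` inserts
            pending = 0
    if pending and not stopped:
        commits += 1  # final commit; the only one when commit_frequency is 0
    return LoadResult(inserted, errors, commits, stopped)


def reject_odd(row: int) -> None:
    """Hypothetical insert routine that fails on odd values."""
    if row % 2:
        raise ValueError("simulated SQL error")


print(simulate_load(range(2500), lambda row: None))
print(simulate_load(range(2500), lambda row: None, commit_frequency=0))
print(simulate_load(range(10), reject_odd, error_limit=2))
```

With the defaults, 2,500 clean rows produce three commits (after rows 1,000 and 2,000, plus a final one), while a Commit Frequency of 0 produces a single commit at the end.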