INTERFACING NORMALIZED RELATIONAL DATABASE STRUCTURES WITH SAS® SOFTWARE

James R. Johnson, Glaxo Inc.
Roger D. Cornejo, Glaxo Inc.

Abstract

The use of relational database (RDBMS) technology and different levels of normalization (1st, 2nd, 3rd, and 4th normal forms) is proliferating throughout the data processing industry. RDBMS systems are valued for their ability to maintain the integrity of data, reduce unnecessary data redundancy, and provide maximum flexibility in retrieval. At the same time, SAS® software is established as the general fourth generation language (4GL) tool for data analysis and reporting. Clearly, the use of relational databases and SAS® software should be included in the ideal tool kit for systems development.

Use of the different and distinct forms of normalized data structures within an RDBMS impacts the creation and use of SAS data sets. Well designed RDBMS applications typically result in normalized tables that remove repeating data groups and duplicated data, and that establish appropriate key-to-data associations within a table. Many of the powerful features in the SAS System utilize data sets that are not normalized, thus presenting several interfacing problems. This dichotomy in each product's usage need not force a compromise in each product's strengths. In this paper we discuss four different normalization forms used in relational databases and present methods of interfacing with three normalized data structures using SAS® software. It is the intent of this paper to discuss the varying degrees of a normalized database, through a real world example, and to show how PROC SQL, the DATA step, and SAS System procedures can effectively be used to access and analyze information stored in these data structures.

Introduction

Throughout the 1980s the implementation of relational theory in the design and development of database management systems has been evident with the introduction of such products as DEC's Rdb®, IBM's DB2®, and Oracle Corporation's ORACLE RDBMS®. Each of these database engines utilizes the basic relational principle that tables of information can be defined which obey constraints, called relations, as outlined by Date (1981). Relational database technology offers application developers and end-users a tool that can improve the usefulness and quality of data by organizing the information into varying degrees of normalized tables which will better meet the business needs of the application being developed.

One of the problems that has persisted throughout the development of relational technology is the lack of procedural data analysis and reporting tools for application developers and end-users to interface with a relational database. SQL is the industry standard data manipulation language (DML) for selecting, inserting, updating, and deleting rows and columns of data from a relational database. SQL is also a data definition language (DDL) for creating and deleting database objects (tables, views, indexes, synonyms, etc.). Additionally, SQL is a data control language (DCL) through which user access to the database and its tables and views is granted (grants). SQL is not a procedural language, since it does not offer programming constructs such as logical subsetting with IF-THEN-ELSE or CASE statements, or looping with DO-END, DO-WHILE, or DO-UNTIL constructs. SAS® Software, in contrast, does have a procedural language component in the DATA step that supports logical and looping programming constructs. The library of SAS® Software procedures is internationally accepted as one of the most comprehensive tool sets of pre-packaged analysis routines available for application developers and end-users. SAS/ACCESS® software specifically offers an interface to relational databases, thus providing the application developer and end-user with a tool for taking advantage of the strengths of relational database technology and the procedural language components of the SAS System to solve various business needs.

Normalization

Generally speaking, normalization is a formal process of organizing data into tables (relations) of logically related information that satisfy conditions defined for the various normal forms (Kemm 1989). The normalized database environment is a means by which an organization can:

  - Provide a base of all data elements relevant to the business requirements.
  - Provide a processing environment that makes these data elements easily available to all appropriate users, both current and future.
  - Ensure data integrity in all of the data elements.
  - Provide a stable, reproducible, highly flexible, and standardized data architecture to meet the clearly defined business needs.
  - Optimize the performance of the database.

In poorly designed (non-normalized) relational database applications the developer and end-user will often find that some or all of the following anomalies take place:

  - Unrelated data elements may be placed into tables together.
  - Data elements may appear as repeated groups of information (e.g. un-indexed arrays of data elements).
  - Data element values may be stored as null or blank for selected types of rows in the table.
  - Referential integrity problems may exist that can result in a corrupt database.

It is the goal of the normalization process to reduce, and in most cases eliminate, these features of poorly designed applications. Normalization techniques were created to provide database designers with a methodology for detecting and preventing these problems. Normalizing an application will remove uncontrolled redundancy of data elements, reduce the number of structural changes to table designs, and favor the transaction process for inserting, updating, and deleting rows in a table.

Many developers have arrived at 1NF, 2NF, and 3NF database designs just by applying a common sense understanding of the data elements which are to be stored in their applications. Nevertheless, there are distinct levels of normalization and rules for how each level can be applied to meet the business needs of the database being designed. Table 1 lists the various states of normalization to be discussed in this paper, as well as a simple rule for the existence of each.

Table 1. States of Normalization in a Relational Database.*

  Normalization Form   Rule for Existence
  1NF                  No repeating groups of data elements.
  2NF                  All non-key columns dependent upon the entire key.
  3NF                  All non-key columns dependent on the entire key and
                       have no other dependencies. No non-key column is
                       dependent on another non-key column.
  4NF                  No row in a table has more than one multivalued
                       dependency.

  * Other normalization states have been defined, but are not important to the discussion in this paper.

The normalization process will generate tables that generally favor update processing over the query process. Sometimes this is not a desirable situation for the application. For example, in tables utilized for query only, that are built and maintained by a loading utility, 1NF representation may be sufficient to support the application. It is often desirable in query-based applications to de-normalize the table for performance reasons (e.g. to increase the access speed to the information). This can result in an increase in the redundancy of information in the table, albeit planned redundancy.

Non-Normalized Data Structure

A sample, single table database that does not apply the rules of normalization defined in Table 1 is presented in Table 2. The sample data set for this table, as a line listing, is presented in Appendix 1. This example will be used to show how the rules of normalization can be applied, as well as to demonstrate access to the data via PROC SQL and/or the DATA step. Access to this type of table structure is as simple as creating a descriptor file and view member of a catalog using SAS/ACCESS® Software. The SAS/ACCESS to ORACLE procedure was utilized for this paper. The rules and presentation could, however, be generalized for use with access to the Rdb and DB2 relational database environments.

Table 2. Non-Normalized Data Table for Drug Survey Example. All information is contained in one data table.

  Name              Null?      Type
  *PATIENT          NOT NULL   NUMBER
  *VISIT            NOT NULL   NUMBER
  VISIT_DATE        NOT NULL   DATE
  SEX                          CHAR(6)
  AGE                          NUMBER
  AGE_UNITS                    CHAR(6)
  RACE                         CHAR(1)
  MARTIAL_STATUS               CHAR(7)
  TEST_1                       NUMBER
  TEST_2                       NUMBER
  TEST_3                       NUMBER
  TEST_4                       NUMBER
  TEST_5                       NUMBER
  STUDY_DRUG                   CHAR(1)
  INV_NUM           NOT NULL   NUMBER
  INV_NAME                     CHAR(7)
  INV_PHONE_EXT                NUMBER
  BP_SYS                       NUMBER
  BP_DYS                       NUMBER
  HEART_RATE                   NUMBER

  * = primary key

This non-normalized table structure looks similar to the "file" structures historically utilized to store and retrieve information for a particular application. In the example, all repeating groups, information regarding individual visits, and information about the investigator are physically stored together in a single row of the table. Also note that redundant information on data elements collected once per patient proliferates throughout the table.

Access to this type of non-normalized table can be accomplished with the following simple PROC SQL, DATA step, or procedure examples. Note that in each case the SAS statements are reading directly from an ORACLE database table to complete the task requested. The PROC SQL example will automatically read and display the information requested in a single step, with no intermediate data set being formed.

Note that the naming conventions in the programming statements are different from the naming of columns in the respective tables. SAS/ACCESS® Interface to ORACLE software was used to create descriptor files to the databases. Naming conventions in the access procedures must follow the traditional 8 character naming conventions used historically in SAS software.

    libname oracledb "[]";

    proc sql;
       select *
          from oracledb.alldata
          where patient = 10;
    quit;

or

    proc print data=oracledb.alldata;
       where patient = 10;
    run;

The DATA step can optionally be utilized to read directly from an ORACLE database table or view to create a traditional SAS data set.

    data test;
       set oracledb.alldata;
       if patient = 10;
    run;

Clearly, the non-normalized data structure is not desirable because of the problems that are associated with update and maintenance of the table (including the addition of a new test), excess and redundant storage of information (see Appendix 1), and the poor performance that would be seen when sorting and querying the example. This sample data set, along with almost every type of data storage and retrieval application, can be analyzed and stored within a relational database management system using the normalization techniques that follow in the discussion.

First Normal Form (1NF) Data Structure

First normal form (1NF) provides for no repeating groups of information. This means that no array structures (e.g. A1, A2, A3, ...) should be utilized in the table design. It also means that all repeating groups of information are moved into their own separate table (along with the primary key columns) and values for A1, A2, A3 are stored as rows rather than columns. The primary reason for this form is that in most cases it is difficult to predict the number of repeating elements that may be utilized, and adding another row to a table is trivial, while adding another column is not an elementary task. Additionally, if we need to perform operations on the values for A1, A2, A3, ... in the non-normalized table, the operation needs to be duplicated for each repeated group rather than coded once (e.g. A1*10, A2*10, A3*10, ..., versus A*10).

On occasion 1NF can be violated when the number of repeating elements is fixed either by definition or business rule (e.g. fixed monthly columns such as JAN, FEB, MAR, ...). Denormalization may also at times be utilized for performance reasons to create valid 1NF violations. However, very clear business or performance rules must be predefined to control the violation of 1NF database architecture.

Table 3 presents a representation of the sample database where the repeating groups of test scores are moved into another table, thus creating a two table database environment. Notice that the TEST_SCORES table assigns the test number to an individual column while the corresponding test result is placed in the results column. The keys patient, visit, and test number are utilized to uniquely identify each record in the TEST_SCORES table.

Table 3. First Normal Form (1NF). 1NF Data Tables for Drug Survey Example. Repeated groups of test scores are partitioned out into a test scores table. All other information resides in the patient table.

  Table Name: PATIENTS

  Name              Null?      Type
  *PATIENT          NOT NULL   NUMBER
  *VISIT            NOT NULL   NUMBER
  VISIT_DATE        NOT NULL   DATE
  SEX                          CHAR(6)
  AGE                          NUMBER
  AGE_UNITS                    CHAR(6)
  RACE                         CHAR(1)
  MARTIAL_STATUS               CHAR(7)
  STUDY_DRUG                   CHAR(1)
  INV_NUM           NOT NULL   NUMBER
  INV_NAME                     CHAR(7)
  INV_PHONE_EXT                NUMBER
  BP_SYS                       NUMBER
  BP_DYS                       NUMBER
  HEART_RATE                   NUMBER

  * = primary key

  Table Name: TEST_SCORES

  Name              Null?      Type
  *PATIENT          NOT NULL   NUMBER
  *VISIT            NOT NULL   NUMBER
  *TEST_NUM                    NUMBER
  TEST_VALUE                   NUMBER

  * = primary key

Access to the collective set of information in the 1NF patient and test scores tables is completed via a simple join of the tables using the SQL join construct in the WHERE clause. The SORT procedure and the DATA step may also be utilized to collectively evaluate the join of the two tables in a similar fashion. However, using traditional SAS programming constructs will create intermediate data sets in either a permanent or work library, depending upon how they are specified. Clearly, the use of the SQL procedure to join and display the information in a single step provides a more robust processing environment than the multi-step procedures and DATA step environment.
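The decomposition and join described above can also be tried outside the SAS and ORACLE environment. The following is a minimal sketch, assuming Python's standard sqlite3 module and an in-memory database; the helper names to_1nf and load_and_join, the trimmed column list, and the two sample rows are hypothetical and are not taken from the paper's tables or appendix.

```python
import sqlite3

# Hypothetical wide rows: (patient, visit, sex, test_1 .. test_5).
WIDE_ROWS = [
    (10, 1, "MALE", 50, 97, 99, 100, 61),
    (10, 2, "MALE", 50, 42, 41, 100, None),
]


def to_1nf(wide_rows):
    """Split wide rows into patient rows and one test-score row per test."""
    patients, scores = [], []
    for patient, visit, sex, *tests in wide_rows:
        patients.append((patient, visit, sex))
        # Each repeating TEST_n column becomes a row keyed by its test number.
        for test_num, value in enumerate(tests, start=1):
            scores.append((patient, visit, test_num, value))
    return patients, scores


def load_and_join(wide_rows):
    """Load both 1NF tables into SQLite and join them in the WHERE clause."""
    patients, scores = to_1nf(wide_rows)
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE patients (patient INTEGER NOT NULL, "
        "visit INTEGER NOT NULL, sex TEXT, PRIMARY KEY (patient, visit))"
    )
    con.execute(
        "CREATE TABLE test_scores (patient INTEGER NOT NULL, "
        "visit INTEGER NOT NULL, test_num INTEGER NOT NULL, "
        "test_value INTEGER, PRIMARY KEY (patient, visit, test_num))"
    )
    con.executemany("INSERT INTO patients VALUES (?, ?, ?)", patients)
    con.executemany("INSERT INTO test_scores VALUES (?, ?, ?, ?)", scores)
    rows = con.execute(
        "SELECT a.patient, a.visit, a.sex, b.test_num, b.test_value "
        "FROM patients a, test_scores b "
        "WHERE a.patient = b.patient AND a.visit = b.visit "
        "ORDER BY a.patient, a.visit, b.test_num"
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in load_and_join(WIDE_ROWS):
        print(row)
```

Because each test result is a row, recording a sixth test in this sketch would only add rows to test_scores; neither table definition would change, which is the maintenance benefit the 1NF discussion points to.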
Access to the 1NF database tables via PROC SQL is as follows:

    libname oracledb "[]";

    proc sql;
       select a.patient, b.visit, a.invnum, a.sex, a.race,
              b.testnum, b.result
          from oracledb.fnpat a, oracledb.fnscore b
          where a.patient = b.patient and
                a.visit = b.visit
          order by patient, visit;
    quit;

Access via traditional SAS programming constructs would be as follows, assuming that the data are not stored in sorted order.

    proc sort data=oracledb.fnpat out=pat;
       by patient visit;
    run;

    proc sort data=oracledb.fnscore out=score;
       by patient visit;
    run;

    data scores;
       merge pat score;
       by patient visit;
       keep patient visit invnum sex race testnum result;
    run;

    proc print data=scores;
    run;

For purposes of discussion, the ORDER BY clause is similar to the SORT procedure, and the WHERE clause completes the same function as the match MERGE/BY statement syntax in the DATA step. The WHERE clause can be extended to generate results similar to a subsetting IF statement in the DATA step.

Second Normal Form (2NF) Data Structure

Second normal form (2NF) provides that all of the columns in a table are dependent upon the entire key of the table. Any column of information in a 2NF designed table should require all elements that comprise the table's primary key to uniquely identify it. If the column can be uniquely identified by only part of the key, then that column, and the relevant keys, should be separated out into another table. More formally, a table is in 2NF if it is in 1NF and there are no columns which are dependent on only part of the primary key.

The information regarding repeated measurements of blood pressure and heart rate is visit dependent, and as such should be partitioned out into an individual measurement table to meet the rules of 2NF. A measurements table that uniquely identifies the visit number and associated measurements collected at a particular visit is created to eliminate the problems of redundant data seen in Appendix 1. All data elements that are collected once and are visit independent are placed in the patients table.

Table 4. Second Normal Form (2NF). 2NF data tables for Drug Survey Example. Information not associated with a visit is partitioned out into a patient demog table, while information that is visit dependent is placed in a patient measure table. The test scores table is maintained from 1NF.

  Table Name: PATIENT_DEMOG

  Name              Null?      Type
  *PATIENT          NOT NULL   NUMBER
  SEX                          CHAR(6)
  AGE                          NUMBER
  AGE_UNITS                    CHAR(6)
  RACE                         CHAR(1)
  MARTIAL_STATUS               CHAR(7)
  STUDY_DRUG                   CHAR(1)
  INV_NUM           NOT NULL   NUMBER
  INV_NAME                     CHAR(7)
  INV_PHONE_EXT                NUMBER

  * = primary key

  Table Name: PATIENT_MEASURE

  Name              Null?      Type
  *PATIENT          NOT NULL   NUMBER
  *VISIT            NOT NULL   NUMBER
  VISIT_DATE        NOT NULL   DATE
  BP_SYS                       NUMBER
  BP_DYS                       NUMBER
  HEART_RATE                   NUMBER

  * = primary key

Figure 1 presents a pictorial view of how the sample database is implemented in 2NF form. The key to this database design is the fact that a one-to-many relationship exists between the patient_demog table and the patient_measure and test scores tables. It is the implementation of these one-to-many relationships that serves as the foundation of good database design.

Access to the information in each of the tables is obtained using a join, similar to the 1NF query, except that additional WHERE clause criteria are required to link the patient measure data with patient demography data via the patient key. The SQL statements below present the logical join required to read selected information from each of the tables in a 2NF implementation of the database. As with the 1NF join, the example below will complete the join and display the results in a single step.

    libname oracledb "[]";

    proc sql;
       select a.patient, a.sex, a.race, a.studydg, a.invname,
              b.testnum, b.result,
              c.visit, c.visitdt, c.bpsys, c.bpdys
          from oracledb.snpat a, oracledb.snscore b, oracledb.snrslt c
          where a.patient = b.patient and
                a.patient = c.patient and
                b.patient = c.patient and
                b.visit = c.visit
          order by a.patient, b.visit, b.testnum;
    quit;

To accomplish the same task as in the above SQL statements, the following basic DATA step and procedure logic could be utilized to complete the join of these three tables. This traditional implementation of SAS programming steps assumes that the data are not stored in sorted order in the tables.

    libname oracledb "[]";

    proc sort data=oracledb.snpat out=pat;
       by patient;
    run;

    proc sort data=oracledb.snscore out=score;
       by patient visit testnum;
    run;

    proc sort data=oracledb.snrslt out=rslts;
       by patient visit;
    run;

    /* Scores and measurements share the patient/visit key. */
    data visits;
       merge score(in=in1) rslts(in=in2);
       by patient visit;
       if in1 and in2;
    run;

    /* The 2NF patient table has no visit column, so it is merged by patient alone. */
    data multipat;
       merge visits(in=in1) pat(in=in2);
       by patient;
       if in1 and in2;
    run;

    proc print data=multipat;
    run;

Access to a 2NF database architecture need not be as cumbersome as in the above example; it can be as simple as reading from the traditional SAS data set structures utilized historically (e.g. the single SAS data set) by creating a view (a stored SQL statement). The ability to create a view of the joined information and access the view directly, with either the DATA step or a procedure, is a new feature of SAS® Software that will allow for greater speed of retrieval and much more robust access to data in a normalized database. Now applications can utilize the procedural language programming constructs that have been time tested over many SAS® Software applications with views directly into a database as well as with the traditional SAS data set. The following example creates a view of the information and permanently stores this view in a catalog. Note that this example creates a view of three tables of information in a manner that is consistent with looking at the data in 1NF. Once this type of view of the database is made available to the end-user, all of the components of the SAS System become available to an application.

    libname oracledb "[]";

    proc sql;
       create view oracledb.sntable as
          select a.patient, a.sex, a.race, a.studydg, a.invname,
                 b.testnum, b.result,
                 c.visit, c.visitdt, c.bpsys, c.bpdys
             from oracledb.snpat a, oracledb.snscore b, oracledb.snrslt c
             where a.patient = b.patient and
                   a.patient = c.patient and
                   b.patient = c.patient and
                   b.visit = c.visit;
    quit;

Once a view of the database is created and stored in the catalog, the complete power of the SAS System can be utilized to access, analyze, and report on information stored in the database as if it were a SAS data set. For example, simple statistical procedures can be executed directly against the database using SAS procedures. The following example reads directly from the database view defined as SNTABLE, created above, to complete a means and frequency distribution computation.

    proc means n mean data=oracledb.sntable;
       by patient visit;
       var result;
    run;

    proc freq data=oracledb.sntable;
       where visit = 1;
       table studydg*sex*race;
    run;

Advantages of Views into Databases

Creating and utilizing views offers several advantages when working with relational databases. These advantages are not limited to 2NF form architectures and should provide benefit across all forms of database design. Advantages include:

  - Savings on disk space, as views are virtual usage of the machine. Views are nothing more than stored SQL statements (virtual tables); no disk space is used to store the data that is "projected/joined" by the view.
  - Data will always be current, in that the database is accessed directly each time a query is made.
  - Data can be shared among more users without the creation of additional data sets.
  - Many different views or combinations of views can be created to grant access to all or selected information in the database.

Third Normal Form (3NF) Data Structure

Third Normal Form (3NF) eliminates all columns of information in a table that are uniquely identified by data items in the table other than key data columns. All non-key columns are dependent upon the entire key (2NF) and have no other field dependencies. This latter type of dependency is referred to as a transitive data dependency (Date 1981). Columns that have a transitive data dependency are removed, along with their respective identifiers, into a separate table. Most database designers aim to create a 3NF database environment. The Boyce-Codd Normal Form (BCNF) is an extension of 3NF. BCNF differs from 3NF in that it applies to candidate keys as well (Date 1981).

In the 2NF implementation of the sample database a transitive dependency exists in the patient demographics table. The information associated with an investigator (inv_num, inv_name, inv_phone) is not totally dependent upon the components of the primary keys in the 2NF patients table. In fact, the investigator's information is independent of the patient's information, but a patient cannot be without an investigator. Therefore, the information regarding an investigator is partitioned out into its own table, thus eliminating the transitive dependency. The identifying key in the investigators table (inv_num) is left as a column in the patient data table to allow for the identification of the investigator's information associated with that patient's information. This is referred to as a foreign key.

In the 3NF design, one investigator is associated with many patients, and a one-to-many relationship exists between the patients table and the patient_measure and test scores tables. The projection from 2NF to 3NF offers greater flexibility in database design in that information is discretely grouped into logical tables of information that are properly related.

Access to the 3NF database is the same as with the 2NF database design. Additional WHERE clause criteria are required to address the investigators table. The example below demonstrates the join of four tables to create a view into the sample database in 3NF form.
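Independently of the SAS statements in this paper, the 3NF decomposition can be sketched in a self-contained form. The following assumes Python's standard sqlite3 module and an in-memory database; the schema is a trimmed version of the 3NF tables, the view name tntable is borrowed from the paper, and the build_3nf and names_by_patient helpers and all sample rows are hypothetical.

```python
import sqlite3

# Trimmed, illustrative 3NF schema plus a view that joins all four tables.
SCHEMA = """
CREATE TABLE investigators (
    inv_num INTEGER PRIMARY KEY,
    inv_name TEXT,
    inv_phone_ext INTEGER
);
CREATE TABLE patient_demog (
    patient INTEGER PRIMARY KEY,
    sex TEXT,
    -- Foreign key; SQLite enforces it only when PRAGMA foreign_keys is ON.
    inv_num INTEGER NOT NULL REFERENCES investigators (inv_num)
);
CREATE TABLE patient_measure (
    patient INTEGER NOT NULL,
    visit INTEGER NOT NULL,
    bp_sys INTEGER,
    bp_dys INTEGER,
    PRIMARY KEY (patient, visit)
);
CREATE TABLE test_scores (
    patient INTEGER NOT NULL,
    visit INTEGER NOT NULL,
    test_num INTEGER NOT NULL,
    test_value INTEGER,
    PRIMARY KEY (patient, visit, test_num)
);
CREATE VIEW tntable AS
    SELECT a.patient, a.sex, b.test_num, b.test_value,
           c.visit, c.bp_sys, c.bp_dys, d.inv_name
    FROM patient_demog a, test_scores b, patient_measure c, investigators d
    WHERE a.patient = b.patient
      AND a.patient = c.patient
      AND b.visit = c.visit
      AND a.inv_num = d.inv_num;
"""


def build_3nf():
    """Create the tables and view, then load a few hypothetical rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany(
        "INSERT INTO investigators VALUES (?, ?, ?)",
        [(1, "Smith", 7223), (5, "DeMarco", 8901)],
    )
    con.executemany(
        "INSERT INTO patient_demog VALUES (?, ?, ?)",
        [(10, "MALE", 1), (20, "FEMALE", 5), (30, "MALE", 1)],
    )
    con.executemany(
        "INSERT INTO patient_measure VALUES (?, ?, ?, ?)",
        [(10, 1, 120, 72), (20, 1, 129, 81), (30, 1, 135, 68)],
    )
    con.executemany(
        "INSERT INTO test_scores VALUES (?, ?, ?, ?)",
        [(10, 1, 1, 50), (10, 1, 2, 97), (20, 1, 1, 65), (30, 1, 1, 45)],
    )
    return con


def names_by_patient(con):
    """Return (patient, investigator name) pairs as seen through the view."""
    return con.execute(
        "SELECT DISTINCT patient, inv_name FROM tntable ORDER BY patient"
    ).fetchall()


if __name__ == "__main__":
    con = build_3nf()
    print(names_by_patient(con))
    # The name is stored once, so one UPDATE corrects it for every patient.
    con.execute("UPDATE investigators SET inv_name = 'Smythe' WHERE inv_num = 1")
    print(names_by_patient(con))
```

In this sketch the investigator's name is stored once, so a single UPDATE on investigators is seen by every row of the view on the next query; the view holds only the stored SELECT statement, not a copy of the data.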
    libname oracledb "[]";

    proc sql;
       create view oracledb.tntable as
          select a.patient, a.sex, a.race, a.studydg,
                 b.testnum, b.result,
                 c.visit, c.visitdt, c.bpsys, c.bpdys,
                 d.invname
             from oracledb.tnpat a, oracledb.tnscore b,
                  oracledb.tnrslt c, oracledb.tninv d
             where a.patient = b.patient and
                   a.patient = c.patient and
                   b.patient = c.patient and
                   b.visit = c.visit and
                   a.invnum = d.invnum;
    quit;

The 3NF database design can be implemented as shown in Table 5. Table 5 shows that an investigators table is created and that the patient demographics table is altered to allow for the investigator number to be a foreign key. The transitive dependencies of investigator name and phone number are eliminated from the 2NF implementation. All other 1NF and 2NF table designs are retained.

Table 5. Third Normal Form (3NF). 3NF data tables for Drug Survey Example. The information from the investigator is partitioned out into a separate table. The test scores table is maintained from 1NF, and the measure table is retained from 2NF.

  Table Name: INVESTIGATORS

  Name              Null?      Type
  *INV_NUM          NOT NULL   NUMBER
  INV_NAME                     CHAR(7)
  INV_PHONE_EXT                NUMBER

  * = primary key

  Table Name: PATIENT_DEMOG

  Name              Null?      Type
  *PATIENT          NOT NULL   NUMBER
  SEX                          CHAR(6)
  AGE                          NUMBER
  AGE_UNITS                    CHAR(6)
  RACE                         CHAR(1)
  MARTIAL_STATUS               CHAR(7)
  STUDY_DRUG                   CHAR(1)
  INV_NUM           NOT NULL   NUMBER

  * = primary key

Figure 2 presents a pictorial view of how the sample database is implemented in 3NF form. The keys to this database design are its one-to-many relationships.

Access to a 3NF implementation of the database using traditional SAS programming constructs (DATA and PROC steps) would be similar to the 2NF database implementation. Additional sorting and merging would be required to match merge the investigator with the patient. The traditional SAS programming steps to match merge these tables of information are not presented because, as a database is fully normalized, the benefits of using views become obvious with the saving of machine and programming resources.

Fourth Normal Form (4NF)

Fourth normal form (4NF) provides for the decomposition of a relation into two or more projections (e.g. subset tables) if the relation has multi-valued dependencies that are not functional dependencies (Date 1981; Watterson 1989). Simply stated, if a table has more than one column (or set of columns) which are independently dependent upon the key, then these column(s) and the key should be partitioned out into a separate set of tables. It is not the intent of this paper to discuss the architecture, access, or retrieval from this form of normalized database in the examples presented. This definition is presented to demonstrate that the relational model can be extended beyond the traditional 3NF form that is normally used by database designers.

Transposing a Normalized Data Structure

One of the strengths of a normalized relational database outlined thus far is the elimination of repeating groups (e.g. arrayed structures). This aspect of normalization is carried from the very first step in applying normalization techniques to database design. Figure 1 clearly shows that the TEST_SCORES table provides for three keys to fully describe the individual rows in the table (patient, visit, and test_num), thus eliminating the array constructions. Often programmers want to present or analyze information structured with repeating groups, utilizing the strengths of array processing. An application can still take advantage of these array processing features by creating a discrete view that transposes the information. In the sample database a view of the TEST_SCORES table is created that transforms the individual test numbers into columns. For example, the following syntax creates a permanent view into the TEST_SCORES table that is the join of the table on itself once for each of the test numbers (1 through 5).

    proc sql;
       create view oracledb.tranpose as
          select t1.patient, t1.visit,
                 t1.result as test_1,
                 t2.result as test_2,
                 t3.result as test_3,
                 t4.result as test_4,
                 t5.result as test_5
             from oracledb.snscore t1, oracledb.snscore t2,
                  oracledb.snscore t3, oracledb.snscore t4,
                  oracledb.snscore t5
             where t1.patient = t2.patient and
                   t1.patient = t3.patient and
                   t1.patient = t4.patient and
                   t1.patient = t5.patient and
                   t1.visit = t2.visit and
                   t1.visit = t3.visit and
                   t1.visit = t4.visit and
                   t1.visit = t5.visit and
                   t1.testnum = 1 and
                   t2.testnum = 2 and
                   t3.testnum = 3 and
                   t4.testnum = 4 and
                   t5.testnum = 5;
    quit;

Access to this type of view can utilize the ARRAY statement syntax supported in the DATA step or the VAR statement in procedures, as demonstrated in the following example.

    data _null_;
       set oracledb.tranpose;
       array tests{5} test_1 - test_5;
       do j = 1 to 5;
          if tests{j} = . then
             put "Missing Test Score: " patient= +1 visit=;
       end;
    run;

    proc print data=oracledb.tranpose;
       var patient visit test_2 test_4;
    run;

    proc means data=oracledb.tranpose;
       var test_1 test_3 test_5;
    run;

Summary

In this paper a discussion has been presented on the use of different degrees of normalization using a relational database and the interface with SAS® Software's procedural language components. The use of a relational database can improve the quality and accessibility of information in many different applications. By applying normalization techniques, application developers and end-users alike can take advantage of the strengths of storing data in the various normal forms (1NF, 2NF, 3NF, or 4NF). These strengths include, but are not limited to, the elimination of redundant data, broader access, and improved utilization of disk space. With the introduction of SAS/ACCESS® software the application developer and end-user now have the ability to fully integrate the strengths of SAS programming constructs (DATA step, Macro facility, and statistical procedures) with a relational database. The combination of these two application development software packages provides a tool set for systems developed in the 1990s.

Acknowledgements

The authors wish to thank Ms. Nancy Wheeler and Mr. Robin Pasley for their contributions to and review of this paper.

References

Date, C.J. 1981. An Introduction to Database Systems. Addison-Wesley Publishing, Reading, MA. 574 pages.

Kemm, T. 1989. A Practical Guide to Normalization. DBMS, December 1989, pp. 46-52.

Watterson, K. 1989. From chaos to order. Data Based Advisor, Feb. 1989, 7(2):33-37.

Contact Information

James R. Johnson
Roger D. Cornejo
Glaxo, Inc.
MIS - Scientific Computing
Five Moore Drive
Research Triangle Park, North Carolina 27709
Phone: (919) 248-7341
FAX: (919) 248-2571

SAS, SAS Software, and SAS/ACCESS Software are registered trademarks of SAS Institute Inc., Cary, North Carolina USA. DB2 is a registered trademark of IBM Corporation. Rdb is a registered trademark of Digital Equipment Corporation. ORACLE RDBMS is a registered trademark of Oracle Corporation, Belmont, California USA.

Figure 1. Physical implementation of a 2NF form database for the example. [Diagram; legible labels: PATIENT_DEMOG, Primary Key.]

Figure 2. Physical implementation of 3NF database tables for sample application. [Diagram; legible labels: PATIENT_MEASURE, TEST_SCORES, INVESTIGATORS, Primary Key.]

APPENDIX 1. Sample Data for Normalization Examples.

Sample Drug/Response Survey Information

Patient Visit Visit Date Sex 10 10 10 10 10 1 2 01-JAN-89 05-JAN-89 01-FEB-89 15-FEB-89 01-MAR-89 50 50 50 50 50 1 4 5 1 2 YEARS C YEARS C YEARS C YEARS C SINGLE 2. SINGLE SINGLE 26 21 SINGLE 20 20
" " 40 50 50 55 55 45 97 42 56 55 96 99 41 99 98 99 100 100 99 100 A A 25 B B A A A MARRIED 65 60 78 63 60 30 35 'tEARS 8 77 65 60 37 19 27 YEARS B MARRIED 79 69 MAAR"'D 70 39 42 26 25 B YEARS B 61 63 35 35 35 35 35 YEARS C 45 44 43 30 B 59 100 99 97 61 100 YEARS C SINGLE SINGLE SINGLE SINGLE SINGLE 01-FEB-89 10-FE8-89 26-FEB-89 25-MAR-89 16-APR-89 FEMALE 45 FEKALE 45 FEMALE 45 FEMALE 45 FEMALE 45 YEARS B MARRIED 8 YEARS B YEARS 8 YEARS B MARRIED 15-FEB-89 01-MAR-89 07-MAR-89 MALE MALE 'tlUIRS C YEARS C MALE 55 55 55 15-MAR-89 MALE 55 YEARS C 26-HAR-89 MALE 55 'tlUIRS C SINGLE SINGLE SINGLE SINGLE SINGLE 26-JAN-89 10-FE8-89 25-FEB-89 05-MAR-89 26-MAR-89 40 40 40 40 40 MALE YEARS C FEMALR 25 25 25 25 25 FEMALE FEMALE FEMALE FEMALE MALE MALE MALE MALE MALE YEARS C YEARS c YEARS C YEARS YEARS C MARRIED MA!\RIED MARRIED '7 58 57 66 33 53 63 31 32 " . B B B B 100 A A 90 65 67 67 100 91 100 A 84 100 100 99 " 35 2' 71 27 B 100 61 65 66 61 B 67 . 428 so 5 10 12 83 n 19 16 B5 91 B5 N_ Phone Ext. SyS/Dys . .c_ 01 01 01 01 01 Smith Smith Smith Smith smith 1223 1223 1223 1223 1223 120/12 122/76 134/79 126/12 135/79 75 05 05 05 05 05 DeMarco DeMarco DeMarco DeMarco DeMarcQ 8901 8901 8901 8901 8901 129/81 127/11 125/11 121/72 128/80 " " 01 01 01 01 01 Smith Smith smith Smith smith 1223 1223 1223 7223 1223 135/68 135/69 .2 134/77 80 10 10 10 10 10 Yourdon Yourdon Y ourdon Yourdon Yourdon 5512 5512 5512 5512 5512 161/80 165/72 166/82 165/88 178/90 "82 05 05 05 05 05 DeMarco DeMarco DeMarco DeMarco DeMarco 8901 8901 8901 8901 8901 192/99 188/86 115/82 166/19 126/12 75 74 72 74 Study Inv. Drug NUIII. """""'D MAAR"'D 30 30 30 30 30 5 MALE MALE 15 15 15 15 15 os YEARS B 20 4 MALE ----- Test Scores ----- YEARS B 15-JAN-89 01-FEB-89 15-FEB-89 15-MAR-89 15-APR-89 20 20 20 20 MALE Age Martial Age Units Race status A A B B B B B Inv. BP Seart .." 99 97 66 67 75 80 ./. ./. 85 84 84 65