Calhoun: The NPS Institutional Archive
Theses and Dissertations, Thesis and Dissertation Collection
1989-06

Database creation and/or reorganization over multiple database backends
McGhee, Deborah A.
Monterey, California. Naval Postgraduate School
http://hdl.handle.net/10945/25800

NAVAL POSTGRADUATE SCHOOL
Monterey, California

DATABASE CREATION AND/OR REORGANIZATION OVER MULTIPLE DATABASE BACKENDS

by

Deborah A. McGhee
June 1989

Thesis Advisor: David K. Hsiao

Approved for public release; distribution unlimited.

REPORT DOCUMENTATION PAGE

Report Security Classification: UNCLASSIFIED
Distribution/Availability of Report: Approved for public release; distribution is unlimited.
Performing Organization: Naval Postgraduate School, Monterey, CA 93943-5000
Title (Include Security Classification): DATABASE CREATION AND/OR REORGANIZATION OVER MULTIPLE DATABASE BACKENDS
Personal Author(s): McGhee, Deborah A.
Type of Report: Master's Thesis
Date of Report: June 1989
Page Count: 105
Supplementary Notation: The views expressed in this thesis are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
Subject Terms: Multi-Lingual Database System (MLDS), Multi-Model Database System (MMDS), Multi-Backend Database System (MBDS)

Abstract: To create a record in a database, one uses the INSERT command. However, in the Multi-Backend Database System (MBDS), the INSERT command only inserts one record at a time. When creating very large databases consisting of thousands or millions of records, the use of the INSERT command is a time-consuming process. Once a database is created, some of the records of the database may be tagged for deletion. MBDS uses the DELETE command to tag these records. Over some period of time, those records tagged for deletion should be physically removed from the database. Hence, removing tagged records is in essence creating new databases from untagged records.

Continued: In this thesis, we present a methodology to efficiently create very large databases in gigabytes on parallel computers and to reorganize them when their records have been tagged for deletion. Specifically, we design a utility program to by-pass the system's INSERT command, to load the data of the database directly onto disks, and to create all the necessary base data and meta data of the database.

Abstract Security Classification: UNCLASSIFIED
Name of Responsible Individual: Prof. David K. Hsiao
Telephone (Include Area Code): (408) 646-2253
Office Symbol: Code 52Hq

Approved for public release; distribution is unlimited.
Database Creation and/or Reorganization over Multiple Database Backends

by

Deborah A. McGhee
Lieutenant, United States Navy
B.S., University of South Carolina, 1982

Submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE IN COMPUTER SCIENCE

from the

NAVAL POSTGRADUATE SCHOOL
June 1989

ABSTRACT

To create a record in a database, one uses the INSERT command. However, in the Multi-Backend Database System (MBDS), the INSERT command only inserts one record at a time. When creating very large databases consisting of thousands or millions of records, the use of the INSERT command is a time-consuming process. Once a database is created, some of the records of the database may be tagged for deletion. MBDS uses the DELETE command to tag these records. Over some period of time, those records tagged for deletion should be physically removed from the database. Hence, removing tagged records is in essence creating new databases from untagged records.

In this thesis, we present a methodology to efficiently create very large databases in gigabytes on parallel computers and to reorganize them when their records have been tagged for deletion. Specifically, we design a utility program to by-pass the system's INSERT command, to load the data of the database directly onto disks, and to create all the necessary base data and meta data of the database.

TABLE OF CONTENTS

I. INTRODUCTION
   A. MOTIVATION
   B. BACKGROUND
      1. The Multi-Lingual Database System
      2. The Multi-Backend Database System
   C. THESIS ORGANIZATION
II. THE KERNEL SYSTEM
   A. THE KERNEL DATA MODEL AND THE KERNEL DATA LANGUAGE
      1. The Attribute-Based Data Language
      2. The Attribute-Based Data Model
         a. The Meta Data
         b. The Base Data
   B. THE TEMPLATE SPECIFICATION
      1. Template Descriptions
      2. Descriptor Specifications
      3. Database Records
   C. LIMITATIONS
   D. POSSIBLE SOLUTIONS
      1. One-Pass Approach
      2. Two-Pass Approach
      3. The Selected Solution
III. THE PROPOSED DESIGN
   A. THE DESIGN OF DATABASE CREATION/REORGANIZATION
   B. MULTIPLE PHASES OF THE ALGORITHM
      1. Phase-One
      2. Phase-Two
   C. SEQUENTIAL VS. PARALLEL OPERATIONS
   D. THE TIME-COMPLEXITY ANALYSIS
   E. DETAILED DESIGN SPECIFICATION OF THE ALGORITHM
IV. IMPLEMENTATION ISSUES
V. CONCLUSIONS
   A. A REVIEW OF THE RESEARCH
   B. SOME OBSERVATIONS AND INSIGHT
APPENDIX A - ATTRIBUTE TABLE PROGRAM SPECIFICATIONS
APPENDIX B - DESCRIPTOR TABLE PROGRAM SPECS
APPENDIX C - CLUSTER DEFINITION TABLE PROGRAM SPECS
APPENDIX D - RECORD CHECKER PROGRAM SPECS
LIST OF REFERENCES
INITIAL DISTRIBUTION LIST

LIST OF FIGURES

Figure 1.1 The Multi-Lingual Database System
Figure 1.2 Multiple Language Interfaces for the Same KDS
Figure 1.3 The Multi-Backend Database System
Figure 2.1 Sample Directory Tables
Figure 2.2 Sample Record Relationship
Figure 2.3 Sample Template File
Figure 2.4 Sample Descriptor File
Figure 2.5 Sample Record File
Figure 3.1 Phase-One
Figure 3.2 Phase-Two

I. INTRODUCTION

A. MOTIVATION

In order to create a record in a database, an INSERT command is used. Inserting records into any database requires input/output (I/O) operations. The record is read from one secondary storage, processed by the database system, and then stored onto another secondary storage. We notice that there are three operations per record. The INSERT command is usually efficient for a single record insertion, as well as for a small batch of insert operations. The problems arise when massive amounts of records are to be loaded, generating thousands upon thousands of I/O requests. When this occurs, the loading of a new database is a time-consuming process. Hence, the building of a utility package that by-passes the system's INSERT command and loads the data directly onto the secondary storage would be a viable solution to help create databases more quickly.
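The argument above can be made concrete with a toy cost model. The "three operations per record" figure comes from the text; the blocking factor of 100 records per block is an illustrative assumption, not a measurement from MBDS, and the sketch below is only meant to show why a block-oriented bulk loader generates far fewer I/O requests than record-at-a-time insertion.

```python
# Toy I/O cost model contrasting record-at-a-time insertion with bulk
# loading. The three-operations-per-record figure comes from the thesis;
# the blocking factor is an illustrative assumption.

def insert_io_cost(num_records: int, ops_per_record: int = 3) -> int:
    """Each INSERT reads, processes, and writes one record."""
    return num_records * ops_per_record

def bulk_load_io_cost(num_records: int, records_per_block: int = 100) -> int:
    """A bulk loader transfers whole blocks: one read and one write per block."""
    blocks = -(-num_records // records_per_block)  # ceiling division
    return 2 * blocks

if __name__ == "__main__":
    n = 1_000_000
    print(f"record-at-a-time: {insert_io_cost(n):,} I/O operations")
    print(f"bulk load:        {bulk_load_io_cost(n):,} I/O operations")
```

Under these assumptions, loading a million records drops from three million I/O operations to twenty thousand, which is the kind of gap that motivates the utility package.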
Over the course of time, some records are deleted from the database. Most operational systems do not physically remove records at the time of deletion. Instead, they tag the records so that they are identified for deletion and may be removed at a later time. Most systems employ a "garbage collection" routine, which is done at non-prime times. The garbage collection process imposes additional overhead and complexity on the system [Ref. 1:p. 348]. The storage reclamation [Ref. 2:p. 432] identifies all the unused or unnecessary cells (i.e., the storage space occupied by tagged records) and returns them to free storage. Garbage collection allows a record to be only tagged and identified for deletion; its space is not reclaimed at every instance when a record is identified for deletion. The recognition and disposal of tagged records within any database [Ref. 3:p. 396] is essential to the optimization of the secondary storage. Good record optimization produces good data organization, which, in turn, provides efficient data retrieval. Records that are tagged for deletion are removed from the database, and the active records are reorganized on the secondary storage to fill the space vacated by the removed records. Removing tagged records is in essence creating a new database from the untagged records.

This thesis will provide a detailed design specification for the development of a utility package capable of creating and/or reorganizing large databases over multiple disks. The utility package is in direct support of the multi-backend database system (MBDS), which will be discussed in more detail in the next section. The utility package will provide a more efficient means for generating new databases as well as provide a garbage-collection capability.

B. BACKGROUND

Over the past two decades, database design and implementation methods were fairly standard. The general approach was to specify a data model, define a data language for that particular model, and develop a system to manage and execute transactions written in the data language. This approach led to the development of homogeneous database systems, which restrict the user to a single data model with its corresponding data language. Some examples of systems using the homogeneous database system approach are IBM's Information Management System (IMS), which supports the hierarchical data model and the Data Language I (DL/I); Sperry Univac's DMS-1100, supporting the network data model and the CODASYL data language; IBM's SQL/Data System, supporting the relational data model and the Structured English Query Language (SQL); and Computer Corporation of America's Local Data Manager (LDM), which supports the functional data model and the Daplex data language.

A revolutionary approach to database management system development which eliminates the restrictions discussed previously is the multi-lingual database system (MLDS) [Ref. 4]. The design of MLDS affords the user the ability to access and manipulate several different databases, using their corresponding data models with their respective data languages. The major design goal of MLDS is the development of a system which can be accessed via different data models and their model-based data languages (e.g., hierarchical/DL/I, network/CODASYL, relational/SQL, and functional/Daplex). MLDS will function as a heterogeneous collection of databases vice a single database system. The many advantages of the MLDS are its ability to support a wide variety of databases using different models and languages, economy and efficiency of hardware upgrades, and reusability of database transactions developed on a conventional system.

MLDS has taken further steps toward a more complete utilization of its resident databases. Currently, all data models are allowed access to the database only through their corresponding languages: hierarchical databases are only accessible through DL/I, network databases are only accessible through CODASYL, functional databases are only accessible through Daplex, and relational databases are only accessible through SQL. MLDS extends the concept of a multi-lingual database system to a multi-model database system (MMDS), in which databases based on different data models can be accessed by data languages based on different data models. This type of environment allows the user of one data model to access and manipulate data stored in another data model. The obvious benefit of the cross-access of databases based on different models is true sharing of data over multiple databases. The following subsections will give the reader an overview of the structure and function of the MMDS. We also introduce the reader to the architecture of the multi-backend database system (MBDS). MBDS is the database system used by MLDS to support database transaction processing.

1. The Multi-Lingual Database System

A block diagram of the multi-lingual database system is shown in Figure 1.1. To access or modify the database, the user issues transactions through the language interface layer (LIL) using a user data language (UDL) written in a user data model (UDM) for that particular model. LIL routes the transaction to the kernel mapping system (KMS). KMS performs one of two tasks, depending on the type of database transaction requested. If the transaction specified by the user is for the creation of a new database, KMS transforms the UDM database definition to an equivalent kernel data model (KDM) database definition. KMS then routes the database definition request to the kernel controller (KC), which in turn sends it to the kernel database system (KDS) for processing. Upon completion, KDS notifies KC, which forwards its results to the kernel formatting system (KFS). KFS transforms the results from the KDM structure to its UDM equivalent. The user is then informed via LIL that the request has been processed and more requests can be accepted.

[Figure 1.1 The Multi-Lingual Database System. Legend: UDM - User Data Model; UDL - User Data Language; LIL - Language Interface Layer; KMS - Kernel Mapping System; KC - Kernel Controller; KFS - Kernel Formatting System; KDM - Kernel Data Model; KDL - Kernel Data Language; KDS - Kernel Database System.]

If the transaction specified by the user is a database manipulation request, KMS translates the UDL transaction to an equivalent kernel data language (KDL) transaction and sends the KDL transaction, via KC, to KDS for execution. Once processing is complete, KDS returns the results in KDM form to KC. KC sends the results to KFS, which transforms the data from KDM form to UDM form. KFS then sends the transformed results to the user via LIL.

Four similar modules are required for each language interface. The LIL, KMS, and KFS are a separate and unique set for each language interface. For example, there is a LIL, KMS, and KFS each for the relational/SQL, hierarchical/DL/I, network/CODASYL-DML, and functional/Daplex interfaces. The KDS is a single, major component shared and accessed by all the language interfaces. Figure 1.2 depicts this concept. The attribute-based data model and the attribute-based data language (ABDL) have been selected and implemented as the kernel data model and kernel data language, respectively, for MLDS. It is through the KDL, incorporating the KDM, that the actual raw data can be accessed and manipulated by the various user-defined language interfaces.

A series of reports provides a complete set of algorithms for the data-language translations from SQL, DL/I, Daplex, and CODASYL-DML into ABDL [Refs. 8, 9, 10, 11], while at the same time presenting the corresponding data-language transactions [Refs. 5, 6, 7]. The language interface software has been completed for the relational [Ref. 12], hierarchical [Ref. 13], and network [Ref. 14] data models, and its implementation has been documented. The language interface software for the functional model has not been completed at the time of this thesis, but its detailed design is on hand [Ref. 15]. Additionally, there is preliminary work on a language interface for an object-oriented data model and its specified data language. The model is the Graphic LAnguage for Database (GLAD) using the Actor language, currently under research and development at the Naval Postgraduate School. Associate Professor of Computer Science Thomas C. Wu is in charge of this project.

[Figure 1.2 Multiple Language Interfaces for the Same KDS.]

2. The Multi-Backend Database System

To overcome the performance and upgrade problems associated with the traditional approach to database system design, the multi-backend database system (MBDS) was designed. MBDS has solved these problems by using multiple backend processors connected in parallel to a single controller. Each backend has its own hardware, software, and disk system, as shown in Figure 1.3. Hardware and software are not unique to each backend, but are replicated over all backends. The processors are connected to the backend controller via a communication bus. The backend controller can be accessed by the user either directly or through the host computer. The backend controller is responsible for supervising the execution of database transactions. Performance gains are realized by increasing the number of backend processors in the system.
If the number of backend processors increases in direct proportion to the database size, MBDS produces nearly invariant response time for the same transaction. Also, if the database size remains constant, then the response time for a user transaction is inversely proportional to the number of backend processors. For a detailed discussion on MBDS the reader is referred to [Refs. 16 and 17].

C. THESIS ORGANIZATION

Under the current mode of operation there is no efficient manner in which to generate or reorganize large databases. Current operations are satisfactory for single record insertions. The process to determine where a record should be inserted into the database is actually quick and simple. Each insert operation generates a set of I/O operations. This in itself is not bad, but when performed several thousand times, it can take several days to load a new and large database. When the delete operation is used, records are not physically removed from the database, but are actually tagged for deletion. Over a period of time, these tagged records grow in numbers. They should be periodically collected and removed from the database to obtain the maximum amount of disk space for active records. This thesis will provide an algorithm which is flexible enough to create large databases and provide a "garbage collection" feature to get rid of the tagged records and reorganize the remaining active records.

[Figure 1.3 The Multi-Backend Database System: backend processors 1 through n, each with its own backend store, connected to the backend controller (and to a host) via a communications bus.]

In Chapter II, we look at the kernel software and describe the system's limitations in regard to creating new databases and deleting records from a database. We also describe two approaches for an algorithm designed to create and/or reorganize large databases over multiple backends, and then select the appropriate approach. Chapter III outlines the multiple phases of the algorithm. A complexity analysis of the new algorithm versus the complexity analysis of the insert operation is given, indicating the efficiency of the new algorithm. Also included in this chapter is a pseudo-code design of the actual algorithm. In Chapter IV, we discuss implementation issues. In Chapter V, we make our conclusions about the proposed design. Finally, Appendices A, B, C, and D provide the program design specifications for building the Attribute Table, the Descriptor-to-Descriptor-Id Table, and the Cluster Definition Table, and the program specifications for the Record Error Checker routine, respectively.

II. THE KERNEL SYSTEM

In this chapter we discuss the overall configuration of the kernel system. In the first section, we discuss the kernel data language and the kernel data model. In the second section, we discuss the specifications for the input files to database creation (i.e., the template, descriptor and record files). In the third section, we discuss the limitations of the system with respect to database creation and garbage collection. In the final section we discuss possible solutions and select a proposed solution for overcoming the limitations discussed in the third section.

A. THE KERNEL DATA MODEL AND THE KERNEL DATA LANGUAGE

The kernel system is composed of two parts: the kernel data model and its model-based data language. The kernel data model used in the multi-backend database system (MBDS) is the attribute-based data model. The kernel data language is the attribute-based data language (ABDL) that supports the attribute-based data model. The next two sections introduce the concepts and terminology of the kernel system.

1. The Attribute-Based Data Language

ABDL supports five primary database operations: INSERT, DELETE, UPDATE, RETRIEVE and RETRIEVE-COMMON. A request in ABDL is a primary operation with a qualification. A qualification is used to specify the part of the database on which to operate.

The INSERT request is used to insert a new record into the database. The qualification of the INSERT is a list of keywords and a record body being inserted. For example, the following INSERT command

INSERT(<FILE,USCensus>,<CITY,Cumberland>,<POPULATION,4000>)

will insert a record into the USCensus file for the city of Cumberland with a population of 4,000.

The DELETE request is used to remove one or more records from the database. The qualification of a DELETE is a query. A query, in the delete operation, specifies which records in the database will be deleted. For example, the following DELETE command

DELETE ((FILE = USCensus) and (POPULATION > 100000))

will delete all records from the USCensus file with a population greater than 100,000.

The UPDATE request is used to modify records of the database. The qualification of the UPDATE request consists of two parts: the query and the modifier. The query specifies which records of the database are to be modified, and the modifier specifies how the record to be updated will be changed. For example, the following UPDATE command

UPDATE (FILE = USCensus) (POPULATION = POPULATION + 5000)

will modify all records in the USCensus file by increasing all populations by 5,000. In this example the query is (FILE = USCensus) and the modifier is (POPULATION = POPULATION + 5000).

The RETRIEVE request is used to retrieve records in the database. The RETRIEVE request consists of three parts: the query, a target list, and a by-clause. The query identifies the records to be retrieved, and the target list consists of the attributes (i.e., fields in the record) whose values are to be output to the user. The by-clause, which is optional, is used to group records.
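Before continuing with the remaining parts of the RETRIEVE request, the request types introduced so far can be sketched with a small in-memory model. This is an illustrative Python sketch, not the actual MBDS code: records are attribute-value dictionaries, a query is a list of ANDed (attribute, operator, value) predicates, and DELETE tags records rather than removing them, mirroring the tagging behavior described in Chapter I.

```python
# Illustrative in-memory model of ABDL-style requests (not the MBDS
# implementation). A query is a conjunction of predicate tuples.
import operator

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       ">": operator.gt, "<=": operator.le, ">=": operator.ge}

def matches(record, query):
    """True if the record satisfies every predicate in the query."""
    return all(OPS[op](record.get(attr), value) for attr, op, value in query)

def retrieve(records, query, target):
    """RETRIEVE: return the target attribute of each qualifying record."""
    return [r[target] for r in records if matches(r, query)]

def delete(records, query):
    """DELETE: tag qualifying records rather than physically removing them."""
    for r in records:
        if matches(r, query):
            r["_tagged"] = True

us_census = [
    {"FILE": "USCensus", "CITY": "Cumberland", "POPULATION": 4000},
    {"FILE": "USCensus", "CITY": "Detroit", "POPULATION": 1100000},
]
print(retrieve(us_census, [("POPULATION", ">", 50000)], "CITY"))  # ['Detroit']
```

The `_tagged` marker is a hypothetical name chosen for this sketch; it stands in for whatever deletion flag the real system keeps, and it is exactly the kind of residue the thesis's reorganization utility is meant to clean up.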
Also, the by-clause may consist of an aggregate operation (i.e., COUNT, SUM, AVG, MIN, MAX) on one or more output attribute values. For example, the following RETRIEVE command

RETRIEVE ((FILE = USCensus) and (POPULATION > 50000)) (CITY)

will output the city of all records in the USCensus file with populations greater than 50,000.

The RETRIEVE-COMMON request is used to merge two files by a common attribute value. The RETRIEVE-COMMON request consists of three parts: the query, the target list, and a common attribute between the two files. The query and target list are used as specified above, while the common attribute is the key used to merge and join the two target files. The RETRIEVE-COMMON functions the same as the JOIN command in SQL. For example, the following RETRIEVE-COMMON command

RETRIEVE ((FILE=CanadaCensus) and (POPULATION>50000)) (CITY) COMMON (POPULATION, POPULATION) RETRIEVE ((FILE = USCensus) and (POPULATION > 50000)) (CITY)

will find all records in the CanadaCensus file with a population greater than 50,000, find all records in the USCensus file with a population greater than 50,000, identify the records from these files whose population figures are common, and return the city names of the cities that have the same population figures.

With these five simple commands, ABDL provides the user with the means to access and manipulate the database.

2. The Attribute-Based Data Model

In the attribute-based data model, data is considered in the following constructs: the database, files, records, attributes, value domains, attribute-value pairs, attribute-value ranges, keywords, directories, directory keywords, non-directory keywords, keyword predicates, record bodies, and queries. These constructs are applied to two types of data: meta data and base data.

a. The Meta Data

The meta data is the stored data about the structure and form of the base data. The various meta data constructs form the directory of the database. The directory contains the following constructs: attributes, descriptors, and clusters. An attribute is used to represent a category of a common property of the base data. A descriptor is used to describe a unique range of values or a distinct value for a certain attribute. A cluster is a group of records in which every record in the cluster is of the same set of descriptors. More specifically, the directory is organized in three tables: the attribute table (AT), the descriptor-to-descriptor-id table (DDIT), and the cluster-definition table (CDT). AT maps directory attributes to descriptors, while DDIT maps each descriptor to a unique descriptor-id. CDT maps descriptor-id sets to cluster-ids. There are three classifications of descriptors. A type-A descriptor is a conjunction of less-than-or-equal-to and greater-than-or-equal-to predicates, where the same attribute appears in each predicate. A type-B descriptor is a unique attribute-value equality predicate. A type-C descriptor defines a collection of equality predicates over all unique attribute-values which exist in the database; the equality predicates of a type-C descriptor are called type-C sub-descriptors. A sample of each directory table is provided in Figure 2.1.

b. The Base Data

The base data is the actual raw data of the database. A database is a set of files. A file contains a group of records that are related by a common set of directory keywords. A record is composed of two parts: an attribute-value pair (or keyword) and the textual information (record body). An attribute-value pair is a member of the Cartesian product of the attribute name and the value domain of the attribute. For example, <POPULATION, 50000> is an attribute-value pair having 50,000 as the value for the population attribute. A record contains at most one attribute-value pair for each attribute defined in the database.
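The three directory tables can be modeled with ordinary dictionaries. This is an illustrative sketch of the AT/DDIT/CDT relationships described above (attribute to descriptor-ids, descriptor-id to predicate, descriptor-id set to cluster), not the MBDS data structures; the sample entries loosely follow Figure 2.1, and the tuple encodings are assumptions made for the sketch.

```python
# Sketch of the three directory tables (illustrative only; entries loosely
# follow Figure 2.1, and the encodings are assumptions of this sketch).

# Attribute table (AT): maps each directory attribute to its descriptor-ids.
AT = {
    "POPULATION": ["D11", "D12", "D13", "D14"],   # type-A range descriptors
    "CITY":       ["D21", "D22", "D23", "D24"],   # type-B equality descriptors
    "FILE":       ["D31", "D32"],
}

# Descriptor-to-descriptor-id table (DDIT): maps each descriptor-id to the
# predicate it stands for.
DDIT = {
    "D11": ("POPULATION", 0, 50_000),        # type-A: low <= value <= high
    "D12": ("POPULATION", 50_001, 100_000),
    "D21": ("CITY", "Columbus"),             # type-B: attribute = value
    "D31": ("FILE", "CanadaCensus"),
    "D32": ("FILE", "USCensus"),
}

# Cluster-definition table (CDT): maps a descriptor-id set to a cluster-id
# and the address list of the records in that cluster.
CDT = {
    frozenset({"D11", "D21", "D32"}): ("C1", ["A1"]),
    frozenset({"D12", "D23", "D32"}): ("C3", ["A7", "A8"]),
}

def cluster_of(descriptor_ids):
    """Look up the cluster whose descriptor set matches exactly."""
    return CDT.get(frozenset(descriptor_ids))

print(cluster_of(["D11", "D21", "D32"]))  # ('C1', ['A1'])
```

Using `frozenset` keys makes the lookup order-independent, which matches the idea that a cluster is identified by its set of descriptors rather than by any particular listing order.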
[Figure 2.1 Sample Directory Tables: the Attribute Table (AT) lists the attributes POPULATION (type A), CITY (type B), and FILE (type C) with their DDIT entries; the Descriptor-to-Descriptor-ID Table (DDIT) maps descriptor-ids D11-D14 to POPULATION ranges (0-50,000; 50,001-100,000; 100,001-250,000; 250,001-1,000,000), D21-D24 to CITY values (Columbus, Monterey, Toronto, Detroit), and D31-D32 to FILE values (CanadaCensus, USCensus); the Cluster-Definition Table (CDT) maps descriptor-id sets (e.g., {D11, D21, D32}) to cluster-ids C1-C4 and their record address lists.]

Certain attribute-value pairs are kept in the directory for identifying records or files. Those attribute-value pairs of a record or file that are kept in the directory are called directory keywords, and those not kept in the directory are called non-directory keywords. The files of the database constitute the base data of the database. Below is an example of a record.
17 in the database of the database: schema would be normalized. and form clusters. consists of purchase-order, part, shows how normal contain the actual data or form the record Figure 2.2 describes the relationship of the records. shows files better describe the input files created, consider a purchasing large business. a three descriptor file contains the directory The record to which record The template file. is the =,!=,<,>,<=,>=), and disjunctive by three input and the record file. (i.e., in specifies controlled and their descriptor definitions. attributes records. is enclose the {,}, The records of enclosed in parenthesis. ) Normalization requires that Purchase Order Order Number Supplier Order Delivery Number Date Date Y Total Cost Supplier Supplier City Name 1r Part Quantity Part Price Number (a) Schema Purchase-Order for a Purchase Order System (order-number order-date, Part (b) (supplier-number, Normalized file is form a unique for a supplier-name, city) Purchase Order System Schema Sample Record Relationship certain attributes appear in order total-cost) (order-number, part-number, quantity, price) Supplier Figure 2.2 supplier-number, delivery-date, more than one repeated in the supplier identifier. file file. and The is supplier number combined with in the purchase- the supplier This duplication does not imply that the value stored because normalization is is name to redundantly concerned with logical structures rather than physical organization 18 Template Descriptions 1. The template database. file In general, there contains the descriptions of the templates defined in the may be many databases in the The format of keeping in the different databases mind, the template descriptors for the records separate. MBDS. 
So a template file for a given database with this in must be kept n templates could be as follows: Database name of templates in the database Template description for template #1 Template description for template #2 Number Template description A typical descriptor with Number m for template #n could be as follows: attributes of attributes in a template Template name Attribute #1 data type Attribute #2 data type Attribute #m data type There are three different data types: integer name of this database will be Order, Part, and Supplier. for the PURCHASING 2. directory string (s), PURCHASING. The With this database, as information shown and floating point (0- The template names will be Purchase- we can now generate the template file in Figure 2.3. Descriptor Specifications A name = (i), descriptor WANG) is or (price keywords a keyword predicate of = $200). MBDS the form, for example, (supplier- recognizes two kinds of keywords: non- for search and retrievals, and directory 19 keywords for search and PURCHASING 3 6 Purchase-order template s order-# s supplier-# s order-date s delivery -date s total-cost f 5 Part template s order-# s part-# s quantity i price f 4 Supplier Figure 2.3 retrievals as well as directory keyword template s supplier-# s supplier-name s city s Sample Template forming The clusters. descriptors only. descriptor file is an input As an example, of $10,000 and up to $50,000 since 1988, WANG), (10,000 =< is WANG with a descriptor or-equal-to. is a total cost derivable with a set of three descriptors: total-cost < 50,000), and (order-date There are three types of descriptors: type-A, type-B, and type-C. A contains a cluster containing records purchasing database for purchase orders from a supplier (supplier-name = file that Cluster formulations are based on the attribute values and value ranges of the descriptors. in the File = 1988). 
A type-A descriptor is a conjunction of two predicates: a greater-than-or-equal-to and a less-than predicate. An example of a type-A descriptor is (10,000 <= total-cost < 50,000). For creating a type-A descriptor, the attribute (i.e., total-cost) and the value range (i.e., 10,000 and up to 50,000) must be specified; the value range is expressed in terms of upper and lower limits. A type-B descriptor is an equality predicate. An example of a type-B descriptor is (supplier-name = WANG). The type-C predicate is also an equality predicate; however, its values are provided later by the input records. Type-C descriptors are automatically converted to a set of type-B descriptors with the same attribute name and values corresponding to the value range. As an example, if the template attribute has a type-C descriptor with the values purchase-order, part, and supplier provided by the records, the set of type-B descriptors generated is (template = purchase-order), (template = part), and (template = supplier). When specifying descriptors, the attribute of a given descriptor must be unique, and the specification of the values and the value ranges must be mutually exclusive. Below is an example of a descriptor file with n descriptors:

Database name
Descriptor definition #1
Descriptor definition #2
...
Descriptor definition #n
$

The "$" indicates the end of the descriptor file. Each descriptor definition is expressed in terms of the attribute and its associated descriptor type and data type, followed by the value ranges, as shown below:

Attribute Descriptor-type Data-type
Value range 1
Value range 2
...
Value range k
@

Value ranges are expressed in terms of upper and lower limits. Since type-B and type-C descriptors are equality values, which are exact, the upper limit is not used; the placeholder "!" is therefore indicated for the upper limit, and the value of the lower limit is used for holding the exact value. The "@" indicates the end of a descriptor definition. An example of a descriptor file is depicted in Figure 2.4.
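The three descriptor types can be illustrated with a small sketch. The class layout and names below are assumptions for illustration; only the predicate forms (a half-open range for type-A, equality for type-B and type-C) come from the text above.

```python
# Hedged sketch of the three descriptor types and the "covers" test used
# later for cluster formation. Representation is an assumption, not MBDS code.
class Descriptor:
    def __init__(self, attr, kind, low, high=None):
        self.attr, self.kind = attr, kind
        self.low, self.high = low, high   # type-B/C: low holds the exact value

    def covers(self, attr, value):
        if attr != self.attr:
            return False
        if self.kind == "A":              # range: lower-limit <= value < upper-limit
            return self.low <= value < self.high
        return value == self.low          # type-B and type-C: equality

type_a = Descriptor("total-cost", "A", 10000.0, 50000.0)
type_b = Descriptor("supplier-name", "B", "WANG")
```

Under this representation a type-C descriptor would start with no values at all and gain one equality (type-B-like) entry per new value seen in the records, as the conversion described above requires.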
3. Database Records

Once the template and descriptor files are defined, the data record format can be specified. Data records can be prepared in separate files for loading. The format for a typical record file is as follows:

Database name
@ Template name #1
Record #1 of template #1
Record #2 of template #1
...
Record #n of template #1
@ Template name #2
Record #1 of template #2
Record #2 of template #2
...
Record #n of template #2

The database name identifies the database to which the records belong. The "@" indicates the beginning of a new template and is followed by the template's name. All records following the template name belong to that template until either the next "@" or the "$" is encountered.

[Figure 2.4 Sample Descriptor File:
PURCHASING
template C s
Purchase-order Part Supplier @
total-cost A f
1000.00 100000.00
100000.00 500000.00 @
order-# A s
#1 #50
#50 #100
#100 #1000 @
price A f
1000.00 50000.00
50000.00 500000.00 @
supplier-name B s
WANG IBM DEC @
city B s
Monterey San Jose Carmel @
$]

The "$" indicates the end of the entire record file. Values in the record are separated by a space. An example of a record in the Purchasing database in the purchase-order template would look like

<order-#,26>,<supplier-#,51>,<order-date,18-May-1988>,<delivery-date,22-Jul-1988>,<total-cost,200,500.00>

If we extract the values of each attribute-value pair, we have

26 51 18-May-1988 22-Jul-1988 $200,500.00

which is a record of the purchase-order template in the Purchasing database. Figure 2.5 gives a more complete record file.

C. LIMITATIONS

The kernel system described in the earlier section provides its users with the same functional capabilities as most other database systems. It provides its users with a clean, easy way to access and manipulate data. It is a simple but powerful database system. Yet for all its simplicity, the system lacks an efficient means to generate very large databases.
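The attribute-value extraction shown in the example can be sketched as follows. This is a hypothetical helper, and the regular expression is an assumption about the pair syntax (attribute names without commas, values without angle brackets).

```python
# Hypothetical helper mirroring the example above: strip a record's
# attribute-value pairs down to the bare values for loading.
import re

def extract_values(record):
    # each pair looks like <attribute,value>; keep only the value part
    return [m.group(1) for m in re.finditer(r"<[^,<>]+,([^<>]*)>", record)]

rec = ("<order-#,26>,<supplier-#,51>,<order-date,18-May-1988>,"
       "<delivery-date,22-Jul-1988>,<total-cost,200,500.00>")
values = extract_values(rec)
```

Note that the value portion may itself contain commas (as in 200,500.00), which is why the pattern splits on the first comma only.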
The capability to create large databases does exist, but it is slow, cumbersome, and quite tedious. The INSERT command is the means by which records are inserted into the database. Let us briefly discuss how this process works. In order to add a record to the database, an insert operation is issued by the user. The kernel system receives the command, recognizes it as an insert operation, and proceeds to process the request. The record undergoes a record-processing phase where it is checked for validity prior to insertion on a backend. Record processing is a necessary and very involved process and is outlined here. The INSERT operation requires considerable work on the part of the kernel system. The complication is due to the type of the directory keywords of the record to be inserted. If the attributes of the directory keywords are of type-A or type-B, then there are no complications. The complications arise when the attributes of the directory keywords are of type-C. We first look at the steps required to insert a record without type-C attributes.

[Figure 2.5 Sample Record File for the PURCHASING database, containing purchase-order, part, and supplier records.]

In step one, the attribute-value pairs (or keywords) of the record are identified. The attribute of an attribute-value pair is used for matching against the attributes in the AT. A successful match indicates that the keyword of the record is a directory keyword. There may be more than one match if the record contains more than one directory keyword. If there is no match, then the keyword of the record is not a directory keyword. If none of the attributes of a record is a directory keyword, the record for insertion is rejected by the system.
In step two, once the attribute-ids are obtained, the descriptors in the DDIT are searched to determine whether a descriptor covers a keyword of the record for insertion having the same attribute. If a descriptor covers the keyword, then the corresponding descriptor-id is used to form the descriptor-id group. In step three, a search is performed for a descriptor-id set in the CDT which is identical to the descriptor-id group. If a descriptor-id set is found to be identical to the descriptor-id group, then a cluster has been identified to receive the record to be inserted. If an identical descriptor-id set is not found during the search of the CDT, this implies that either (1) the descriptor-id group is a new descriptor-id set, or (2) the descriptor-id group cannot be a descriptor-id set. In case (1), a new cluster is to be created for the record to be inserted: a new entry in the CDT is created for the new descriptor-id set and placed with its associated new cluster-id. In case (2), the descriptor-id group cannot form a new descriptor-id set, and the record is rejected. In step four, we are concerned with the placement of the record into the cluster identified in step three. The record-id is transformed into a disk address and the record is placed at the secondary storage so addressed. This process of record placement is more complicated than it sounds. The records of a cluster must be evenly distributed across the multiple backends, so the placement of the record on a backend is not a simple task. The following steps are taken: (1) The cluster-id is sent to the controller. This information is maintained in the table known as the cluster-id-to-next-backend table (CINBT), which is kept by the controller. In the beginning, the controller does not know the cluster into which the record is to be inserted, so it tasks all the backends to identify the cluster by broadcasting the operation and the record to all backends via the communication bus.
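The first three insertion steps can be summarized in a hedged sketch. The table representations below (a set for the AT, dictionaries for the DDIT and CDT) are simplified assumptions for illustration, not the MBDS structures.

```python
# Hedged sketch of insert steps one through three: match record attributes
# against AT, form the descriptor-id group from covering DDIT descriptors,
# then look the group up as a descriptor-id set in CDT.
def cluster_for_insert(record, AT, DDIT, CDT):
    # step one: keep only directory keywords (attributes present in AT)
    keywords = [(a, v) for a, v in record if a in AT]
    if not keywords:
        return None                      # rejected: no directory keyword at all
    # step two: collect the ids of descriptors that cover each keyword
    group = frozenset(
        did for a, v in keywords
            for did, covers in DDIT.get(a, [])
            if covers(v)
    )
    # step three: an identical descriptor-id set in CDT names the cluster
    return CDT.get(group)

AT = {"supplier-name", "total-cost"}
DDIT = {
    "supplier-name": [(7, lambda v: v == "WANG")],
    "total-cost":    [(3, lambda v: 10000 <= v < 50000)],
}
CDT = {frozenset({3, 7}): "cluster-42"}
rec = [("supplier-name", "WANG"), ("total-cost", 25000.0), ("order-#", "26")]
found = cluster_for_insert(rec, AT, DDIT, CDT)
```

A miss on the CDT lookup (a `None` result here) corresponds to the case in step three where a new cluster must be formed or the record rejected.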
Meta-data processing by the backends is performed in parallel. Once the cluster is determined by the backends, the controller receives the same cluster-id from all the backends, again via the communication bus. (2) The controller locates the entry in the CINBT whose cluster-id is identical to the cluster-id received. Once the cluster is identified, it is checked to determine whether enough space is available to receive the record. If space is available, the cluster's available space is decreased by the length of the record to be inserted. If the space is not large enough, the controller forfeits the space and allocates a new block of secondary storage from the next backend. The controller then sends a message over the communication bus, simply instructing the backend to write the record into the backend's available secondary space. To write a record, we retrieve the backend's portion of the cluster from secondary storage, write the record into that portion, and then store the portion back onto the secondary storage. We note that this operation is not parallel, since only one backend performs the writing. The entire process described above is very efficient when inserting a single record, but not when creating a very large database consisting of thousands or millions of records. Each record generates two I/O requests, and the hardware is idle during the I/O processing. Thus, thousands of records will generate thousands of separate I/O requests. We can see where the inefficiency lies when we attempt to create very large databases.
What we want to do in this thesis is to minimize this involved record-by-record mode of operation, in which the entire kernel system is tied up by the large number of insert operations for a large number of records. The kernel system also lacks the capability to reclaim storage space once a record has been identified for deletion. The DELETE command does not physically remove a record from the database; it simply tags the record, identifying it as no longer active. The system no longer recognizes this record as a part of the database, but the record continues to reside on the secondary storage medium and is still a physical part of the database. This results in fragmentation within the secondary storage, which now holds both active records and "garbages" (i.e., tagged records). If a new record added to the database is smaller than the record that previously lived at that spot, then the new record could be inserted into this space. But a smaller record will not solve the problem of fragmented space on disk; it only creates smaller fragments. If the new record to be inserted is larger than the previous record, then it cannot be inserted into that slot and must be placed someplace else on the medium. If all records were of the same size, then a new record would fit nicely into the space occupied by the tagged record and there would be no fragmentation. Unfortunately, in most databases not all records are fixed-length; most are variable-length. When large systems want to reclaim memory that is no longer needed, they perform an operation called storage reclamation, or compaction. This type of function involves gathering all the occupied areas of storage into one end or the other of secondary storage, leaving a single large free-storage hole instead of numerous small holes. This concept can be applied to the database problem, where records tagged for deletion are not required to remain in the database. If all the active records are collected and then redistributed onto secondary storage, the secondary storage will be rid of the fragmentation.
Compressing the storage medium in this way amounts to the creation of a new database with only the active records. In other words, the reorganization of the database (i.e., of the active records remaining in storage) is the route that should be taken.

D. POSSIBLE SOLUTIONS

In this section we propose two solutions to resolve the lack of an efficient database creation and reorganization function. After each proposal is presented, the best possible solution will be selected. The basic idea behind each solution is to bypass the system and to load the data directly onto the disks. With both proposed solutions, the intention is to speed up the process of creating very large databases. Both approaches use the same type of input and produce the same output. The inputs to the algorithm are: (1) the template file, the descriptor file, and a record file (on tape medium), if creating a new database, or (2) the AT and DDIT of the database to be reorganized, along with a record file (on tape medium) containing the data to be redistributed. The output, once the algorithm is used, will be a database over multiple disks consisting of both base and meta data.

1. One-Pass Approach

The One-Pass approach is so named because of the number of times each record will be handled before it is placed onto secondary storage. The basic strategy for this algorithm is as follows:

(1) Generate the AT and DDIT into a temporary work space
(2) Load records from tape
(3) Process each record - check syntax and format the record, adding new type-C attributes
(4) Update DDIT
(5) Build CDT
(6) Assign records to a cluster
(7) Write the cluster to a backend - if it is the first record, on the first track; update CDT
(8) Repeat steps three through seven
(9) Write meta data to secondary storage
The advantage of this algorithm is that each record is handled only once in processing. The disadvantage is that clusters of records are written onto the secondary storage many times as the clusters grow in size.

2. Two-Pass Approach

The name of the two-pass approach informs us that the data will be handled twice before it is placed onto secondary storage. The basic strategy for this algorithm is as follows:

Phase-One
(1) Generate AT and DDIT into a temporary work space
(2) Load records
(3) Process each record - check record syntax and format the record
(4) Update DDIT
(5) Build/update CDT
(6) Repeat steps three through five for all records
(7) Determine the record distribution - if there is a good distribution and a small percent of bad records, then go to Phase-Two; else redefine the descriptor file and repeat Phase-One

Phase-Two
(1) Scan and sort the records by cluster-id number
(2) Build CINBT
(3) Load records onto their disks
(4) Write meta data to disk
(5) Send CINBT, IIGAT, and IIGDDIT to the controller disk

The advantages of this approach are: (1) it enforces an even distribution of records among the clusters, (2) it sorts records by cluster-id number, which assists in the building and loading of clusters of records onto the disks, (3) it disallows proceeding into the second pass if a large percentage of the input records are corrupted, and (4) it writes clusters of records only once onto the permanent storage. The disadvantage of this algorithm is that each record is handled twice. IIGAT and IIGDDIT are the insert-information-generation-attribute-table and the insert-information-generation-descriptor-to-descriptor-id-table, respectively. These tables are part of the meta data residing on the controller and are used for generating new type-C descriptors: IIGAT houses the attribute entry and the next descriptor-id for any particular type-C descriptor, and IIGDDIT houses all the type-C descriptor-ids for the pre-defined type-C attributes. Together, these two tables are used to generate new descriptor-ids for new type-C descriptor values.
3. Selected Solution

In selecting the best solution to the database creation and reorganization problem, we must weigh the advantages and disadvantages of each proposed solution. As it appears in our case, the advantage of one approach is the disadvantage of the other. Based on this, we must determine which approach will yield the most effective and efficient manner in which to store large databases. The major considerations are how often a single record is handled, how good a record distribution is obtained within the clusters, and how often a cluster is written onto the secondary storage. We look at these considerations in more detail and determine the relative importance of each.

In the One-Pass approach the record is processed and then placed onto secondary storage. In the Two-Pass approach the record is processed in pass-one and then, during pass-two, it is read, sorted, and placed onto secondary storage; the record is thus read and handled twice. Because the record is handled twice, it is important to understand what impact, if any, this has on the overall efficiency of the algorithm. As mentioned earlier, the problem is the large number of I/O operations encountered because of the use of the INSERT command each time a record is added to a database. Single-record insertion using the INSERT command is quick and easy; the problem arises with large database creation, when each record is individually inserted using the INSERT command. If we look at the general design of the two algorithms, we see that in the One-Pass approach each record processed generates an I/O operation for writing the record onto the secondary storage. This is similar to the current design, and no performance gain is obtained. In the Two-Pass approach each record does not generate its own I/O request; the placing of the records onto secondary storage is deferred to the second pass, where records are written as clusters and not individually. Since records are written in clusters, rather than each record generating its own I/O request, the number of I/O requests is reduced greatly.
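The I/O argument above can be made concrete with a rough count. This is illustrative only: it ignores meta-data I/O and assumes one write per record in the record-at-a-time case and one write per track-sized block in the clustered case.

```python
# Back-of-envelope comparison of write requests: record-at-a-time insertion
# versus deferred, clustered writes in the second pass.
def writes_record_at_a_time(num_records):
    return num_records                        # each insert issues its own write

def writes_clustered(num_records, records_per_track):
    # full tracks written once each (ceiling division)
    return -(-num_records // records_per_track)

single = writes_record_at_a_time(100000)
clustered = writes_clustered(100000, 50)
```

With, say, 50 records per track, 100,000 records drop from 100,000 write requests to 2,000 - the factor-of-records-per-track reduction the Two-Pass approach relies on.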
In the The placing of the deferred to the second pass where records are written and not individually. each record generating is Since records are written in clusters vice I/O request, the number of I/O requests greatly. 32 is reduced Since records of a cluster are available for distribution on the parallel disks, an even distribution of records within a cluster will produce better disk utilization and speedier data retrieval. even facilitate quarter which are of we distribution, On track. accommodate The numbers of records within the We may other hand, block sizes smaller we do not split may have have shown that the number of times distribution of the records of the cluster when all difficulty to a record over different blocks a record We note it is easier to have a it is writes each This will require a which induces many computations and good record distribution records of a given cluster are ready for distribution. approach embodies both of these concepts, handled does not is The One-Pass approach cluster of records repeatedly onto the disks as the cluster grows. I/O operations. To different backend's disks, respectively. necessarily imply the processing will be slower. new vary greatly. should choose smaller blocks sizes, say one-half or one- large record sizes, since two clusters among clusters Since the Two-Pass our choice as the best approach. These concepts are (1) fewer I/O operations for cluster creation, and (2) better distribution of clustered records on parallel disks due to one-time record collection 33 and distribution. m. THE PROPOSED DESIGN In this chapter we discuss the algorithm in detail. multiple phases of the Two-Pass approach. complexity analysis for this Also in algorithm to show that this chapter, it we look at the provide a time- does provide a more efficient means, over current operations, of generating large databases. Finally, we provide a pseudo code of the actual design specifications for the algorithm. Pascal-like A. 
A. THE DESIGN OF DATABASE CREATION/REORGANIZATION

The INSERT command is used to insert a single record into the database. In order to create a large database holding thousands or millions of records under the current mode of operation, we must execute as many INSERT commands as there are records to be inserted. Although it is possible to generate a large database under this method, it is a very laborious and inefficient process. The DELETE command is used to delete records from the database. Records are not physically removed from the database but are actually tagged as no longer active. These tagged records are considered "garbages" and lead to fragmentation on the medium. To rid the database of this fragmentation, we must collect all the active records and then redistribute them evenly, by clusters, across the multiple disks. We discovered in the previous chapter that ridding the system of its tagged records is in essence creating a new database with its active records. We have designed an algorithm which is flexible enough for both database creation and reorganization (garbage collection). The user will specify in the beginning stage what type of operation he/she will be performing. Based on this input, the algorithm will proceed to perform either a creation or a reorganization function. The input, output, and internal processing for the two types of operations are slightly different, but in either case the algorithm will produce a database in which each cluster is evenly distributed across the multiple backends.

B. MULTIPLE PHASES OF THE ALGORITHM

The proposed algorithm has two phases: Phase-One and Phase-Two. The first phase is concerned mostly with record processing, while the second phase is concerned with loading the data, both meta and base, onto the backends. The logic used in the two-phase approach is similar to the logic used by the MBDS.
The difference between the two is when the records are actually placed on the disks. The next two sections discuss in detail the two phases of the Two-Phase approach.

1. Phase-One

Phase-One is concerned primarily with record processing and with building three main directory tables: the attribute table (AT), the descriptor-to-descriptor-id table (DDIT), and the cluster-definition table (CDT). Upon entering this phase, the user must have specified the type of function he/she wishes to perform (i.e., create a new database or reorganize the active data). If the user has specified the creation of a new database, then the template and descriptor files will be retrieved. Once these files are retrieved, the AT will be generated. The AT maps directory attributes to descriptors and contains all the directory keyword attributes; it is used as an index into the DDIT. A partial DDIT will also be built. The DDIT maps each descriptor to a unique descriptor-id. Each descriptor definition is expressed in terms of the attribute and its associated descriptor type and data type, followed by the value ranges; this information is obtained from the descriptor file and is used to construct both the AT and the DDIT. To construct them, the descriptor file is read in a single pass: the attributes and their corresponding descriptor types are extracted and placed into the AT, and at the same time the DDIT is being constructed, with each descriptor assigned a unique descriptor-id and all of its corresponding range values placed in the DDIT. The values are read sequentially from the descriptor file; an entry for an attribute is known to be complete when the "@" marking the end of its group of range values is encountered, and the end of the descriptor file is reached when the "$" is encountered. The AT is not completed at this time, because new type-C attribute values may be introduced later by the records.

If the user has specified a reorganization of the active records (i.e., garbage collection), then the AT and DDIT will be retrieved, since they already exist. Once the AT and DDIT are retrieved, the type-C descriptors will be removed from the DDIT. The reasoning behind this is to ensure that the DDIT contains only attributes that are active: since records are tagged for deletion, it is conceivable that all the records carrying a given type-C descriptor could have been deleted. If not, the descriptors will be re-introduced when we encounter them in the active records. Appendices A and B provide the program specifications for building the AT and the DDIT, respectively.

Once we have built the AT and a partial DDIT, we proceed to the record-processing phase. At this point, records are processed the same way whether the function being performed is creation or garbage collection. We start by loading a block of records from the record file into a main-memory work space. We then get a single record from the work space, keeping a running total of the records in the database; this number will be used later to determine the percentage of bad records processed. At this point the record is scanned. After scanning the record, we can check its syntax and determine the record's template.

Each record is checked for errors. First, the first attribute of the record is identified as the template. Once the template is identified as valid, the remaining attributes are checked to determine whether the record contains the correct number of attributes and whether each attribute also has its correct associated data type. The records are checked against specific data structures which hold the record-template descriptions; these data structures reflect the information contained in the template file. A running total of the number of bad records is maintained, and if a record is found to contain errors (i.e., it is not a complete record), it is sent to a bad-record handler for processing. A record without errors is formatted for subsequent placement onto the disk in Phase-Two. Formatting involves embedding special characters within the record: the special character "#" is placed between each attribute in the record, and the special character "&" is used to mark the end of the record. New DDIT entries are placed in the DDIT, and the CDT is updated (or built). The CDT is not completed at this time, because the address of each cluster is not yet known. Recall that the CDT provides us with a count of the records in each cluster; this information is used later when we determine whether or not a good record distribution within the clusters has been achieved. After an entry is made into the CDT, the record is placed into a temporary work file. Records are not placed onto secondary storage until Phase-Two.

This process continues until all the records loaded into the work space have been processed. Once all the records in the work space have been processed, we check to see whether there are more records in the record file to be processed. If there are, then another block of records is read into the work space and the procedure is repeated. This entire procedure is repeated until all records in the record file have been processed. After all records are processed, a tape is produced containing all the error-free records that compose the database. Appendices C and D provide the program specifications for building the CDT and for the record-error-checking routine, respectively.
there is good record a percentage of records containing errors records processed). We that are relatively the the CDT. We can same now is size. look The achieved is to look distribution then proceed to Phase -Two under two within the clusters, relatively small (e.g., less than ideal size of a cluster the track size. at the partial obtained good distribution of the records. is We we must and 10% if the of total define a good record distribution within a cluster as clusters number of backends times built phase has been completed at the A average cluster 38 would be approximately During the record processing phase CDT, and judge whether criterion to determine if size. If the or not good we we have distribution average cluster size deviates greatly from the ideal cluster size, then a good distribution was not obtained. If it is determined that the records within the clusters are not distributed evenly, then the algorithm will not proceed to Phase-Two. attribute values Cluster formulations are based on the problem would be the automatic division of the range might be an appropriate solution. Problems within clusters. non-numeric, or cluster size. may this division In any case, combined would guarantee a encountered when when even possible solution to this attributes. At glance this first definite redistribution of records dividing range attributes that are much result in clusters that are record distribution is smaller than the ideal not obtained the range These ranges should be either decreased and reduced Adjusting to created a larger interval. be required or desired. combinations. It may be attributes should be adjusted. sized or One and values ranges of the descriptors. all ranges variables The procedures described above can be taken The readjustment of range in a may in not number of attributes for a particular database should be handled on a case-by-case basis. The second condition under which we proceed small percent (say 10%) of the records contain errors. 
corrupted and a large percent algorithm will not proceed (e.g., 30%) of onto Phase-Two. to Phase-Two If for is if only a some reason data the data processed is is bad, then this At the time, the user can reexamine the input data, take whatever actions are needed to correct the problem with the data and then reenter the algorithm. If Phase-One is a success (both good record distribution and small percent of bad records), then the algorithm will proceed to Phase-Two. Figure 3.1 provides a flow-chart diagram of Phase-One. 39 L—EUASE-QUeJ REDEFINE DESCRIPTOR FILE CREATE <CH^ril!^>- RBDRG 1 1 GET FILES GET TABLES t BUILD AT RMTYPEC 31 FROM DDrr BUILD DDIT LOAD RECS ^ t GETREC t SCAN REC t FRROFCHK no TYES FORMAT REC 1 i iPnATP nnu 1 BUILD CDT 1 W/REC TEMP YE W/REC TAPE NO ^T YES CLUST. DIST t BAD REC % NO ^Tyes CEPHAS E-TWO^ Figure 3.1 Phase-One 40 1 BAD REC HANDLER Phase-Two 2. Phase-Two onto disks. mainly concerned with loading the meta data and base data is The records that make up This tape was built during Phase-One. Two is The records on a tape sort. In other words, each other. all Most systems provide a algorithms available (i.e., Quick The first clusters Bubble is placed into either main stored on tape. their cluster-id id will numbers. be placed next to There are a number of so sort, etc.) we will not specify a sort memory work or a Since the space. records are sorted by their cluster-id numbers, the building of the cluster simpler. We things: cluster size A is made need only read the records sequentially, since the records of a given cluster are next to each other. blocks. sort After the tape has been sorted, algorithm but leave this decision to the implementor. a portion of the tape by have the same tape-sort capability. sort, now been function to be performed in Phase- the tape will be sorted whose the records the database have When building a cluster and cluster number. total size without surpassing it. 
when number of bytes The of bytes in the block (track). first Each time is block (track) in the records is as close to a cluster made is to disks in number of records have a sufficient two bytes of each block the address of the record of the cluster ourselves with two Records of a cluster are loaded block will be written to a disk been collected and the we concern refer to the total number loaded onto one or more disks into the CDT. The record address, called record-id, therefore consists of the track address and the offset of the record within the track in which the blocked record resides. It is at this point that a CDT is Each backend has being built for each backend. the backends have identical cluster definition information 41 (i.e., its Although own CDT. all CDTs in descriptor-id sets), the ones in their own backends their record-ids in the id-to-next-backend table controller meta data and CDT on the disks of a backend, the backend's If the record tracks are not have own contain only those records-ids that refer to their CDT entries. (CINBT) does not this loading process, the cluster- The being constructed. is this table is Also during tracks. table part of the is used to identify the next backend to receive the next block of records of a given cluster to be placed on the backend's disk. process is repeated until the entire input tape tape have been processed, a unique CDT is will processed. have been After built for the records insert-information-generation-descriptor-to-descriptor-id-table are also part of the meta data residing on IIGAT houses type-C descriptors. type-C descriptor. nGDDIT are used to generate DDIT and AT we From id of houses new type-C all all global DDIT entry for a available to create these type-C attributes and from We DDIT we two Once tables. get the last descriptor- extract this information insert this data into their respective tables. new These two tables descriptor-ids for pre-defined type-C attributes. 
These tables exist to provide information for keeping track of type-C attributes, which in turn helps keep the descriptor information on the multiple backends consistent.

All the meta data has been generated and is now ready to be placed on the disks. CINBT, IIGAT and IIGDDIT can be loaded onto the controller disk. If the original function in entering Phase-One was a database creation, then AT and DDIT will be replicated onto each backend and each individual CDT will be loaded to its appropriate backend. If the original function in entering Phase-One was a reorganization of the active records, DDIT will be replicated onto each backend and the individual CDTs will be loaded to their respective backends; AT is not affected in this function. Figure 3.2 provides a flow-chart diagram of Phase-Two.

C. SEQUENTIAL VS. PARALLEL OPERATIONS

Sequential operations infer that all tasks will be performed one at a time, in sequence. No other operation will be performed until the operation before it has been completed. In most cases, sequential processing implies some sort of task dependency: one task cannot start until another task has been completed. Parallel operations infer that one or more tasks can be performed at the same time. The start of one task does not depend on the completion of another task, so these tasks can be processed in unison. In most cases, parallel processing provides a quicker response than sequential processing.

The proposed two-phase algorithm performs sequentially. The algorithm reads a flat file (i.e., the record file). It retrieves "chunks" of records from the file and then processes these records. Once those records are processed, it will retrieve another chunk, and it continues to repeat this process until all records in the file have been processed.
During record processing another flat file is created. This file will contain the records, by clusters, to be loaded onto the disk space. This clustered record file is also processed sequentially. Records of a cluster are stripped from the file, blocked into tracks, and these blocks are then placed onto the backends' disks. The placing of the clusters onto the disks is not performed randomly. A round-robin approach is used.

[Figure 3.2, Phase-Two: flow chart of the Phase-Two steps. The sorted tape is read; record sizes are accumulated into track-sized clusters; CINBT is built; the backend to receive each cluster is determined and loaded; the individual CDTs are built; IIGAT and IIGDDIT are built and, with CINBT, loaded to the controller; finally, depending on whether the operation is a creation or a reorganization, AT and DDIT are loaded to each backend and the individual CDTs are loaded to the appropriate backends.]
The first backend's disk receives the first block of records of a cluster, the second backend's disk receives the next block of records of the cluster, and so on. When the last backend's disk receives its block of the cluster, and if there are more records remaining in the cluster, then the next backend to receive a block of records of the cluster is the first backend's disk again. This procedure is repeated until the records of all clusters are placed on the disks.

There seems to be no overlapping of operations. All tasks are seen to be performed sequentially. Records cannot be processed until they are read into main memory, and they cannot be loaded onto the disks until they are processed. Nevertheless, even though blocking records for given clusters is done sequentially, different blocks of a given cluster may be sent to different backends to be placed on their respective disks in parallel. Here we have record-serial processing and block-parallel storing operations.

MBDS, when loading records onto the backends' disks, does perform some parallel operations, but in a limited capacity. After a record has been assigned a cluster number, the record is ready to be placed onto a backend. This is where the parallel operations actually take place. The backend controller polls each backend, over the communication bus, searching for the backend on which the record will be placed. All backends will simultaneously search their meta data, specifically their respective CDTs, to determine if the record belongs to one of their clusters. All backends respond to the controller, identifying the cluster to which the record belongs. The controller takes the backends' acknowledgements and identifies the backend to receive the record. The portion of the cluster on the backend is retrieved into main memory, the record is written into that portion of the cluster, and the portion of the cluster is placed back onto the disk.

When using the MBDS INSERT command, the procedure just described is sequential. The architectural design of MBDS offers advantages toward parallel operations. The remaining four commands take full advantage of the design to utilize parallel processing to its fullest. If the reader requires more information on the benefits of parallel operations using the other four commands, he/she is referred to the references.

D. THE TIME-COMPLEXITY ANALYSIS

In this section we present a complexity analysis of the new algorithm versus the complexity analysis of the INSERT operation, indicating the efficiency of the new algorithm. The new algorithm functions almost identically to the INSERT operation.
The difference between the two strategies is how each handles the placement of the data onto the multiple disks. Under the basic strategy behind the INSERT mode of operation for multiple-record inserts, each record is processed and then generates its own I/O request. The new algorithm, on the other hand, processes the records and defers the loading of the records until its second phase. Phase-One will concern itself mainly with computations that are CPU-bound, although there are some I/O requests to read the records from tape into main memory for initial record processing. The second phase will be concerned with operations that are mainly I/O-bound.

We will consider the following observations that will be used to simplify our calculations. First, we look at the sequence of operations for both the INSERT command and the new algorithm. Below is the sequence of operations for the INSERT command:

- load records from the record file
- get a record
- process a record (check record syntax and format record)
- update DDIT (new type-C descriptor)
- update IIGAT and IIGDDIT (new type-C descriptor)
- update CDT (determine the cluster to receive the record)
- check CINBT (determine the backend whose available disk is to receive the record)
- I/O: retrieve the cluster to receive the record into main memory
- write the record to its cluster
- I/O: place the block of the cluster on a specific backend's disk

Repeat the process until all records are processed.
The general sequence of operations for the new algorithm is as follows:

Phase-One
- load records from the record tape
- get a record
- process a record (check record syntax and format record)
- update DDIT (new type-C attributes)
- update CDT (determine the cluster to receive the record)
- repeat until all records are processed

Phase-Two
- sort the record tape by cluster-ids
- load in records from the sorted tape
- build clusters
- update CINBT
- I/O: write a block of records per cluster to a specific backend's disk
- repeat until all clusters are placed on the backends' disks
- generate IIGAT
- generate IIGDDIT
- I/O: write CINBT, IIGAT and IIGDDIT to the controller meta data disk
- I/O: write AT and DDIT to all backends' meta data disks
- I/O: write the different CDTs to the different backends' meta data disks

We can see the areas in which both strategies contain similar operations. The major differences are the order in which the operations are executed and the degree of parallelism with which they are achieved. We pay close attention to where the I/O operations are executed, since this is the area we are seeking to optimize. Secondly, both the INSERT command and the new algorithm require that the AT and the DDIT be constructed. Since both strategies use exactly the same process to generate these tables, their construction will not be included in the comparison study. We will assume that the tables will be in main memory at all times, so no I/O operations at all are required to read the directory tables.

With these considerations, let us proceed with our comparison study. We define the following variables:

w: total number of backends
n: total number of records
x: total number of tracks placed on the disks
t_load: time to read a block of records into main memory
t_r: time to get a single record
t_process: time to process a record
t_ddit: time to insert a new type-C attribute into DDIT
t_cdt: time to build/update CDT
t_bcinbt: time to access CINBT (determine the backend to receive the next cluster)
t_i/o: time to retrieve/return a cluster from/to a backend disk
t_write: time to write record(s) to a cluster
t_bc: time to build a cluster of track size (8K bytes)
t_iigat: time to build/update IIGAT
t_iigddit: time to build/update IIGDDIT
t_sort: time to perform the tape sort
t_cinbt: time to build/update CINBT

Let us calculate the total time required to process n records into the database.
With these considerations, let We These us proceed define the following variables: number of backends number of records number of tracks placed on disks to read a block of records into main memory to get a single record to process a record to inset new type-C attribute into DDIT time to build/update CDT time to access CINBT — determine the backend to receive next the cluster t i/o- time to retrieve/return a cluster from/to a backend Migddit' time to write record(s) to a cluster time to build a cluster of track size in time to build/update IIGAT time to build/update IIGDDIT t„ time to perform the tape sort time to build/update CINBT '-write' V: t iig.t : Let us calculate the total time required n records into the database. system in the 48 bytes INSERT command The time required is 8K to process to process a single record and load through the The time required + tr + n(tl0^ We now , t mt into the database. + n(t l0^ + t^ + U + t^ + t^,, "*" writing the to load X (tbc + 2^ + t„J + -- (1) algorithm, to process and load required to process n records in Phase-One The time + tbcinbt n records + ti/o) in + t jig „ Phase-Two ti jgdd]t CINBT, IIGAT and nGDDIT to generate a large database using the n(t lotd + V, + 3(t 1/c ) - x vnnu + tMl + t cdI ) + + CDT writing each individual is i(t i/0 ) new is is + tp^ + U, + U) t^, The time required where t^ + calculate the total time required, using the n records t ion n records through the system to process l(t i/0 ) to is + 3ti/ respective backend and 3(t i/0 ) its to the controller disk. two-pass approach t ion + x(t* + tMnh{ The total is time required is + t if J + t iig> , + tj igddil + i(t i/0 ) + (2). We observe that there are some similar constants that appear in both equations (1) and (2) (i.e., t losd , equations. t r„ They fmtu ). Tr „ are calculated the same t m„ t cd , and t t lotd . t cd , and l Mu can be eliminated from both against the same number of records. 
Since they are constant factors that appear in both equations, removing them from each equation will not impact the overall comparative analysis. The term t_process also appears in both equations. This constant cannot be removed, because in equation (1) it is linked to the per-record I/O operations, whereas in equation (2) it is not. The resulting equations are:

n(t_process + t_iigat + t_iigddit + t_bcinbt + 3(t_i/o) + t_write) -- (1')

n(t_process) + t_sort + x(t_bc + t_cinbt + t_i/o + t_write) + t_iigat + t_iigddit + (w + 3)(t_i/o) -- (2')

If we ignore the references to the meta data and concentrate solely on record processing and data loading, we can simplify the equations to the following:

n(t_process) + 3n(t_i/o) -- (1'')

n(t_process) + t_sort + x(t_i/o) + (w + 3)(t_i/o) -- (2'')

Now we see that the number of I/O requests is drastically reduced. The number of I/O requests in the new algorithm depends largely on the number of tracks that will be written to the backends instead of the number of records. If we view these two equations solely on the basis of loading the records, equations (1'') and (2'') will appear as

3n(t_i/o) -- (3)

t_sort + x(t_i/o) -- (4)

respectively. Notice that from equation (4) we have removed the (w + 3)(t_i/o) operations that load the meta data onto disk. If the database is sufficiently large, then these I/O operations can be ignored. These two equations are now in terms of record processing and I/O operations alone. We can see that deferring the I/O operations until Phase-Two reduces the number of I/O operations greatly. When generating large databases, equation (3) will perform on the order of (3n), while equation (4) performs on the order of (n + x), the tape sort being on the order of n. Since records are loaded in blocks, less I/O is required. This algorithm does prove to be advantageous over the current mode of operation.

E. DETAILED DESIGN SPECIFICATION OF ALGORITHM

program Create_Reorg(input,output);
/*********************************************************/
/* This utility program can be used to generate a        */
/* database over multiple backends/disks.                */
/* A two-pass approach is used in processing the data.   */
/* Phase-One will scan the data, process the data, load  */
/* this data to a tape and build three tables.           */
/* Phase-Two will sort the tape produced in Phase-One,   */
/* load the sorted data onto the disk, then load the     */
/* meta data tables built in Phase-One onto disk.        */
/*********************************************************/

begin
  Phase-One(...);
  if ((Good_Cluster_Distribution) and (Less_N%_Bad_Records)) then
    Phase-Two(...)
  else
    Exit -- Redefine Descriptor File;
end. /* main */

procedure Phase-One(...);
begin
  case TypeOperation of
    Create:
      begin
        get template file;
        get descriptor file;
        Build_AT(...);
        Build_DDIT(...);
      end;
    Reorg:
      begin
        get AT;
        get DDIT;
        remove type-C attributes from DDIT;
      end;
  end; /* case */

  repeat  /* process group of records */
    load in records;
    get a record;
    repeat  /* process each record */
      count record;
      scan record;
      Error_Check_Record(...);
      if good_record then
        begin  /* process good records */
          format record;
          Build_DDIT(...);
          Build_CDT(...);
          write record to work space;
        end
      else
        begin  /* process bad records */
          Bad_Record_Handler;
          count bad records;
        end;
      get record;
    until no_more_records;
    write block of records to tape;
  until end_of_tape;

  /* CRITERIA: even distribution of records among clusters */
  determine if there is a good cluster distribution;
  calculate percentage of bad records;
end; /* Phase-One */

procedure Phase-Two(...);
begin
  tape sort by cluster id;

  repeat  /* load records to disk */
    load records;
    get a record;
    repeat
      if (record_in_same_cluster) then
        begin
          if (recordsize + clustersize <= tracksize - 2 bytes) then
            begin  /* build a cluster */
              add recordsize to clustersize;
              build cluster;
            end
          else
            begin  /* new cluster, same cluster id */
              add recordsize to clustersize;
              determine backend to receive next cluster;
              load backend;
              Build_CDT(...);  /* for each individual backend */
            end;
        end
      else
        begin  /* new cluster, different cluster id */
          Build_CINBT;
          add recordsize to clustersize;
          determine backend to receive next cluster;
          load backend;
          Build_CDT(...);
        end;
      get a record;
    until no_more_records;
  until end_of_tape;

  /* load last cluster to backend */
  Build_CINBT;
  determine backend to receive next cluster;
  load backend;
  Build_CDT(...);

  /* load controller meta data */
  build IIGAT;
  build IIGDDIT;
  load IIGAT to controller;
  load IIGDDIT to controller;
  load CINBT to controller;

  /* load meta data to disk */
  case TypeOperation of
    Create:
      begin
        load AT to each backend;
        load DDIT to each backend;
        load individual CDT to appropriate backend;
      end;
    Reorg:
      begin
        load DDIT to each backend;
        load individual CDT to appropriate backend;
      end;
  end; /* case */
end; /* Phase-Two */

IV. IMPLEMENTATION ISSUES

The user interface is the avenue by which the user accesses the database or interacts with the system. The user interface is usually characterized by the data model and its model-based data language. The data model allows the user to refer to the database in terms of its logical representation, and the data language allows the user to write generic transactions and queries against the database. The user data model and language is always a high-level construct, abstract enough so that the user is not "bogged down" with the details of the database and the database system.

The kernel software of the multi-lingual, multi-model, multi-backend database system (MLDS, MMDS, MBDS) is the attribute-based data model (ABDM) and the attribute-based data language (ABDL). ABDL provides the user with a means of accessing and manipulating the database. ABDL provides the user with five primary operations and an interactive dialogue. It is through these operations and dialogue that the user interacts with the system. It will be through the interactive dialogue that the algorithm will be incorporated into the system.

The interactive dialogue is menu driven.
The menus are extensive, but organized simply. All menu items are organized as a multi-level hierarchy, with the top levels indicating the type of dialogue and the low levels indicating the specific dialogue to follow.

The algorithm embodies many of the capabilities already existing in the system. Integration of the algorithm into MBDS should be done easily. The inputs needed to generate the database are the template, descriptor and record files. MBDS has the capability, through the user interface, to prompt the user for this information. The inputs needed to reorganize a database are the attribute table (AT) and the descriptor-to-descriptor-id table (DDIT). These tables are already in existence. As stated earlier, the interactive dialogue is menu driven. We will take advantage of the MBDS functions already developed. We propose to modify some of the existing menus to integrate the algorithm into the system. The next few pages should reflect how the menus should be changed and what menu options should be added to facilitate the incorporation of the algorithm.

Upon entering MBDS, the user will have the following menu appear on the screen:

The Multi-Lingual/Multi-Backend Database System
Select an operation:
  (a) - Execute the attribute-based/ABDL interface
  (r) - Execute the relational/SQL interface
  (h) - Execute the hierarchical/DL/I interface
  (n) - Execute the network/CODASYL interface
  (f) - Execute the functional/DAPLEX interface
  (x) - Exit to the operating system
Select-> a

If the user is creating a new database or reorganizing an existing database, then he/she will select "a" as shown above. The next menu to appear will be:

The attribute-based/ABDL interface:
  (g) - Generate a database
  (o) - Reorganize an existing database
  (l) - Load a database
  (r) - Request interface
  (x) - Exit to operating system
Select-> g

If the user is creating a new database, then he/she will select "g" as shown above.
Next the user will be prompted for the number of backends:

Enter number of backends->

The user will enter a numeric value such as "8". Next the system will provide the user with a series of prompts. These prompts are used to collect information about the template, descriptor and record files (meta data files). The user can enter data into the system in two ways. He/she can have a file already built containing the meta data files, or can generate the meta data files at this point. The following prompts will appear:

What operation would you like to perform:
  (p) - Load template, descriptor and record files (predefined files)
  (g) - Load template, descriptor and record files (generate)
  (x) - Exit to operating system
  (z) - Exit and stop

If the user has already built the meta data files, he/she will select "p" and the following prompts will appear in succession, requesting the names of the meta data files:

Enter name of the file containing Template information:
Enter name of the file containing Descriptor information:
Enter name of the file containing Records to be loaded:

If the user does not have predefined meta data files, then he/she will select "g" and the following series of prompts will appear:

For building the Template file:
Enter database id:
Enter name of the file to be used to store template information:
Enter number of templates for database-name:
Enter number of attributes for template #1:
Enter name of template #1:
Enter attribute name #1 for template #1:
Enter value type:
Enter attribute name #2 for template #1:
Enter value type:
Enter number of attributes for template #2:
to be a directory attribute: Enter the descriptor type for "attribute -name" (A,B,C): Enter upper bound for each descriptor in rum -Enter "@" to stop: Upper bound: Use "!" to indicate no lower bound exists —Enter "@" to stop: Lower bound: For building the record Enter name The system now has A message of the file: file containing records to be loaded: the three files that will appear it needs on the screen infonning to actually begin using the algorithm. the user, that all inputs are received and database generation has begun. The algorithm will proceed If Phase-One fails, is successful, the algorithm will proceed on to the system will send a message to the user. to process the records. If Phase-One will read one of the Phase-Two. The message following: PHASE-ONE FAILED!!! - BAD RECORD DISTRIBUTION PHASE-ONE FAILED!!! - DATA CORRUPT The user will correct the problem. If Phase-One failed distribution, the user should redefine his/her descriptor file. range variables should be adjusted. If Phase-One failed the user should check the file containing the bad data. 59 because of bad record Specifically, the type-A because of corrupt data, then This file was produced during made the first pass. After the user has new process to generate the the correction needed, he/she should re-start the database. If the user's operation is to reorganize an existing database then at the menu shown below: The attribute-based/ABDL interface: (0) - Generate a database Reorganize an existing database (1) - Load (r) - Request interface (x) - Exit to operating system - (g) a database Select-> o the user should select "o" as The user shown above. will receive a series of requesting information about the database to be reorganized. prompts The following prompt will appear Enter number of backends: name of database to be reorganized: name of the containing the records Enter Entering the Once now the user enters the identify the tables algorithm. 
A name of needed message will the database to be reorganized, the algorithm can as input. appear The system on the Two. If pass-one If fails, Pass-One is will now start actually informing screen reorganization of the specified database has begun. process the records. to be loaded: the The algorithm user will successful, the algorithm will proceed the system will send a message to the user. PHASE-ONE FAILED!!! - BAD RECORD DISTRIBUTION -- DATA CORRUPT 60 the that proceed on to to Phase- The message read one of the following: PHASE-ONE FAILED!!! using the will The user will take whatever action is appropriate to eliminate the problem. problem has been corrected the user process. 61 will re-enter the system and Once restart the the V. We have found model and its CONCLUSIONS that all database systems have four major components: a data model -based data language, a database, software and hardware. discussed these components in some detail over the past few chapters. system is (MBDS). have The kernel the underlying system used to support the multi-backend database system The kernel system has provides own its data model and language known as the model and language (ABDM, ABDL). Like most database systems attribute -based data ABDL We users its with the capability to and manipulate access its data. Unfortunately, as mentioned earlier, the kernel system does not lend itself to providing a fast and method of generating efficient kernel data language command. we store large databases onto disk This methodology is DELETE command reality, tags the is then we must INSERT In order to it We does work. have learned that the does not physically remove a record from the database but record as longer active. The through the use the one-record-at-a-time methodology. not very efficient but redistribute fragmentation. that a one-record-at-a-time operation. These tagged records leave "holes" database creating fragmentation within the medium. 
and We know can create these large databases with the use of the INSERT command The large database. them over the backends If we we 62 in the collect the active records eliminate the problem entire thrust of this thesis is to provide a solution to the of database creation and garbage collection. in of problem A. A REVIEW OF THE RESEARCH In this thesis, we have addressed of topic the database reorganization (garbage collection) over multiple backends. creation Specifically, and/or we have presented a methodology' that will efficiently create very large databases of gigabytes on parallel computers and reorganize them when they have records tagged for deletion. INSERT command, We have designed a utility program to load the data directiy onto disk to that have been by -pass the system's and create all necessary base data and meta data of the database. In this thesis, approaches) that We we may be recognized two approaches (i.e., One-Pass and Two-Pass taken with respect to database creation or garbage collection. discussed the two methods and gave our reasons for selecting the Two-Pass approach as the best alternative. The Two-Pass chosen methodology The as Phase-One. first I/O processing performed during the onto the backends' disks. second phase is two phases. The first phase deals mostly with record processing. in the first pass is the initial building Two. The second phase entailed of the three directory tables. first phase. The second phase phase known Also performed There is is is known very little as Phase- concerned primarily with loading data, both base and meta, is The loading of I/O-bound. the data generates The separation of 63 I/O requests, so the the data being processed and the data being loaded have reduced the number of I/O operations. clusters instead of individually. many Now records are loaded in We feel implementation. 
that the With methodology presented in this for sufficient is an efficient means of generating large the implementation databases with even distributed over multiple backends will B. thesis become a reality. SOME OBSERVATIONS AND INSIGHT The MBDS) multi-lingual, multi-model, multi-backend database system is The power of a very powerful, yet simple system. (MLDS,MMDS, the system lies in its provide one user access to a neighboring database which was created under ability to a different database model, using his/her model and language is own the kernel system data language. which supports the MBDS. There are considerable design, development and system a research vehicle for new The research attribute-based data MMDS MLDS, testing efforts in undertakings. This and making thesis, this once implemented, will provide an additional capability to an already powerful system. Some areas of future research in database creation, would be algorithm to generate more than one database on the backends that already at expanding a time, and (2) placing a have resident databases. 64 (1) new this database APPENDIX A ATTRIBUTE TABLE PROGRAM SPECIFICATIONS - #include <stdio.h> #include "flags. def #include "dblocal.def #include "beno.def #include "commdata.def #include "dirman.def #include "tmpl.def #include "dirman.ext" ddit_definition *ATM_FIND(attribute, desc_type, AT) Find an attribute in AT and return the pointer to its DDIT. Set */ desc_type to the type of descriptors defined on the attribute. 
*/ struct /* /* char attribute [], *desc_type; /* not C, C, struct at_tbl_definition *AT; AT_NOTFOUND or AT_DELETED */ /* attribute table */ { pos; /* position of attribute in Attribute Table */ int #ifdef EnExFlagg printf("Enter ATM_FIND\n"); fflush(stdout); #endif /* find attribute in AT */ pos = AT_binsearch(AT, /* if attribute not in if (pos == -1 II AT attribute); */ AT->at_entrv[pos].at_desc_type AT_NOTFOUND; *desc_type = EnExFlagg #ifdef printf("Exitl ATM_FIND\n\n"); fflush(stdout); #endif 65 == AT_DELETED) { return(NULL); } else { *desc_type = AT->at_entry[pos].at_desc_type; #ifdef EnExFlagg printf("Exit2 ATM_FDSfD\n\n"); fflush(stdout); #endif return( AT->at_entry [pos] .at_ddit_ptr); } } /* end ATM_FIND */ ATM_INSERT(attr_name, /* Insert char attr_id, desc_type, ddit_ptr, AT) an attribute into AT. */ attr_name[]; int attr_id; char stmct ddit_definition struct at_tbl_definition desc_type; /* not C, C, AT_NOTFOUND *ddit_ptr; /* pointer to first *AT; or AT_DELETED DDIT */ element */ /* attribute table */ { position int = compare, 0. 
#ifdef EnExFlagg
    printf("Enter ATM_INSERT\n");
    fflush(stdout);
#endif

    /* The attribute table is maintained sorted by attribute name.  Before */
    /* an entry is made, the table must be checked for fullness; if the    */
    /* table is full, attributes marked for deletion are removed; if the   */
    /* table is still full (no attributes were marked for deletion) an     */
    /* error condition exists.                                             */

    if (!AT->at_no_entry) {
        /* if first entry into table simply insert */
        strcpy(AT->at_entry[0].at_AttrName, attr_name);
        AT->at_entry[0].at_AttrId = attr_id;
        AT->at_entry[0].at_desc_type = desc_type;
        AT->at_entry[0].at_ddit_ptr = ddit_ptr;
        ++AT->at_no_entry;
    }
    else {
        /* if table full, remove deleted entries */
        if (AT->at_no_entry == AT_MAX_ENTRIES)
            AT_remove_del(AT);

        /* if table still full */
        if (AT->at_no_entry == AT_MAX_ENTRIES) {
            printf("ERROR: AT is full in ATM_INSERT().\n");
            sleep(ErrDelay);
        }
        else {
            /* find correct position */
            while ((position < AT->at_no_entry) &&
                   (compare = strcmp(attr_name, AT->at_entry[position].at_AttrName)) > 0)
                ++position;

            /* check for duplicate entries */
            if (!compare && (AT->at_entry[position].at_desc_type != AT_DELETED))
                printf("ATM_INSERT attribute is already in AT\n");
            else {
                /* shift down table entries */
                for (i = AT->at_no_entry - 1; i >= position; --i) {
                    strcpy(AT->at_entry[i + 1].at_AttrName, AT->at_entry[i].at_AttrName);
                    AT->at_entry[i + 1].at_AttrId = AT->at_entry[i].at_AttrId;
                    AT->at_entry[i + 1].at_desc_type = AT->at_entry[i].at_desc_type;
                    AT->at_entry[i + 1].at_ddit_ptr = AT->at_entry[i].at_ddit_ptr;
                }

                /* insert new entry */
                strcpy(AT->at_entry[position].at_AttrName, attr_name);
                AT->at_entry[position].at_AttrId = attr_id;
                AT->at_entry[position].at_desc_type = desc_type;
                AT->at_entry[position].at_ddit_ptr = ddit_ptr;
                ++AT->at_no_entry;
            }
        }
    }
at_ddit_ptr = ddit_ptr; ++AT->at_no_entry; ) #ifdef EnExFlagg ATM_INSERTNn\n"); printf("Exit fflush(stdout); #endif return (position); } /* end ATMJNSERT */ 67 ATM_DELETE(attributeAT) Mark an attribute in AT for deletion. static /* */ char attribute!]; *AT; struct at_tbl_definition /* attribute table */ { position; /* position of attribute in int /* find attribute in position if AT AT table */ */ = AT_binsearch(AT,attribute); (position != -1) mark /* the attribute for deletion */ AT_DELETED; AT->at_entry [position]. at_desc_type = else SysError(5, } /* end "ATM_DELETE"); ATM_DELETE */ ATM_UPDATE( attribute, Update the dditptr /* char new_ddit_ptr, AT) for an attribute in AT. */ attribute []; struct ddit_definition *new_ddit_ptr: /* ptr to struct at_tbl_definition *AT; first element of /* attribute table */ 1 int position; /* index to AT */ EnExFlagg #ifdef printf( "Enter ATM_UPDATE\n"); fflush(stdout); #endif /* find attribute in position if AT */ = AT_binsearch(AT, (position != attribute); -1 update ddit ptr in AT */ AT->at_entry[position].at_ddit_ptr = new_ddit_ptr; /* else { SysError(5, "ATMJJPDATE"); sleep(ErrDelay); 68 DDIT */ #ifdef EnExFlagg ATM_UPDATE\nW); printf("Exit fflush(stdout); #cndif } /* ATM_UPDATE end at_tbl_definition struct AT Find the char dbid[]; /* */ *AT_lookuptbl(dbid) for a database and return a pointer I struct db_info *db_ptr, *DB_find(); EnExFlagg #ifdef printf( "Enter AT_lookuptbI\n"); fflush(stdout); #endif AT for database dbid */ db_ptr = DB_find(dbid); /* find if (db_ptr) /* AT is { found */ #ifdef EnExFlagg printf("Exitl AT_lookuptbl\n\n"); fflush(stdout); #endif return(db_ptr->db_at_pointer); } else /* { AT not found */ #ifdef EnExFlagg printf("Exit2 AT_lookuptbr\n\n"); fflush(stdout); #endif return(NULL); } } /* end ATJookuptbl */ 69 to it */ AT_binsearch(AT, attribute) /* Find an attribute in AT using binary search and return struct at_tbl_definition *AT; /* Attribute Table */ char its position */ attribute []; { high, int /* 
highest low = 0, /* lowest mid, /* compare; index in portion of tbl being searched */ index in portion of tbl being searched */ index who's attr name is being compared */ value from string comparison of attributes */ /* return #ifdef EnExFlagg printf( "Enter AT_binsearchVi"); fflush(stdout); #endif high = AT->at_no_entry - 1; while (low <= high) mid = (low + high) / 2; compare = strcmp( attribute, AT->at_entry[mid].at_AttrName); { if (! compare) { #ifdef EnExFlagg printf("Exitl AT_binsearch\nVi"); fflush(stdout); #endif return(mid); } else if (compare < 0) high = mid - 1; low = mid + 1; else } /* end while */ /* attribute not found in AT */ #ifdef EnExFlagg printf("Exit2 AT_binsearch\n\n"); fflush(stdout); #endif return (-1 } /* ); end AT_binsearch */ 70 AT_remove_del(AT) attributes marked for at_tbl_definition struct *AT; /* deletion AT table */ static Remove /* from AT. */ attribute table */ { int i, for (i if j = 0; /* indexes to = 0; i < AT->at_no_entry; ++i) (AT->at_entry[i].at_desc_type != AT_DELETED) { keep this attribute */ strcpy ( AT->at_entry [j] at_AttrName AT->at_entry [i] at_AttrName); /* , . AT->at_entry[j].at_AttrId = . AT->at_entry[i].at_AttrId; AT->at_entry[j].at_desc_type = AT->at_entry[i].at_desc_type; AT->at_entry[j].at_ddit_j>tr = AT->at_entry[i].at_ddit_ptr; } /* update number of entries in AT->at_no_entry = j; AT->at_write_required = } /* AT */ TRUE; end AT_remove_del */ 71 APPENDIX B - DESCRIPTOR TABLE PROGRAM SPECS. #include <stdio.h> #include "flags. def" #include "dblocal.def #include #include #include #include "beno.def "commdata.def "dinnan.def "dirman.ext" struct ddit_definition * DM_INSERT_DDIT(descriptor, val_type, ddit_list_header, new_desc_id) /* Add a descriptor to DDIT. If the descriptor is added to the beginning, return a pointer to /* */ */ * descriptor; struct desc_definition char it. 
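The attribute-table routines above keep AT sorted by attribute name, so ATM_INSERT shifts entries down to make room and AT_binsearch locates a name by binary search. A minimal standalone sketch of the same pattern (a simplified table type with hypothetical field names, not the thesis's structures) might read:

```c
#include <string.h>

#define MAX_ENTRIES 8

/* Simplified stand-in for the attribute table: a fixed-size
 * array of names kept in ascending strcmp() order. */
struct table { int count; char name[MAX_ENTRIES][16]; };

/* Binary search; returns the index of name, or -1 if absent,
 * as AT_binsearch does. */
int tbl_find(struct table *t, const char *name)
{
    int low = 0, high = t->count - 1;
    while (low <= high) {
        int mid = (low + high) / 2;
        int cmp = strcmp(name, t->name[mid]);
        if (cmp == 0) return mid;
        if (cmp < 0) high = mid - 1; else low = mid + 1;
    }
    return -1;
}

/* Sorted insert; shifts later entries down and returns the
 * insertion position, as ATM_INSERT does. */
int tbl_insert(struct table *t, const char *name)
{
    int pos = 0, i;
    if (t->count >= MAX_ENTRIES) return -1;   /* table full */
    while (pos < t->count && strcmp(name, t->name[pos]) > 0)
        ++pos;
    for (i = t->count - 1; i >= pos; --i)     /* make room */
        strcpy(t->name[i + 1], t->name[i]);
    strcpy(t->name[pos], name);
    ++t->count;
    return pos;
}
```

The thesis's real table additionally carries per-entry ids, descriptor types, and DDIT pointers, which move together with the name during the shift.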
APPENDIX B - DESCRIPTOR TABLE PROGRAM SPECS.

#include <stdio.h>
#include "flags.def"
#include "dblocal.def"
#include "beno.def"
#include "commdata.def"
#include "dirman.def"
#include "dirman.ext"

struct ddit_definition *
DM_INSERT_DDIT(descriptor, val_type, ddit_list_header, new_desc_id)
/* Add a descriptor to DDIT.  If the descriptor is added to the */
/* beginning, return a pointer to it. */
struct desc_definition *descriptor;
char val_type;
struct ddit_definition *ddit_list_header;   /* first element in DDIT list */
struct DescId *new_desc_id;
{
    int compare;
    struct ddit_definition *create_ddit_node(), *new_ddit, *next_ddit, *prev_ddit;

#ifdef EnExFlag
    printf("Enter DM_INSERT_DDIT\n");
#endif

    new_ddit = create_ddit_node();
#ifdef m_pr_flag
    printf("new_ddit = %o\n", new_ddit);
#endif

    /* place descriptor into ddit element */
    strcpy(new_ddit->lower, descriptor->lower);
    strcpy(new_ddit->upper, descriptor->upper);
    di_cpy(&(new_ddit->ddit_did), new_desc_id);

    if (!ddit_list_header)
    {   /* creating new list */
#ifdef EnExFlag
        printf("Exit1 DM_INSERT_DDIT\n");
#endif
        return(new_ddit);
    }
    else
    {   /* find proper place in existing list */
        prev_ddit = NULL;
        next_ddit = ddit_list_header;
        if (next_ddit->lower[0] == NOBOUND && next_ddit->upper[0] == NOBOUND)
        {   /* the first one is a catchall descriptor; skip it */
            prev_ddit = next_ddit;
            next_ddit = next_ddit->next_ddit_definition;
        }

        while (next_ddit)
        {
            compare = datacmp(next_ddit->upper, new_ddit->upper, val_type);
            if (compare < 0)
            {
                prev_ddit = next_ddit;
                next_ddit = next_ddit->next_ddit_definition;
            }
            else
                break;
        }

        /* if new ddit is at beginning of list */
        if (!prev_ddit)
        {
            new_ddit->next_ddit_definition = ddit_list_header;
#ifdef EnExFlag
            printf("Exit2 DM_INSERT_DDIT\n");
#endif
            return(new_ddit);
        }
        else
        {   /* new ddit somewhere after first element */
            new_ddit->next_ddit_definition = prev_ddit->next_ddit_definition;
            prev_ddit->next_ddit_definition = new_ddit;
#ifdef EnExFlag
            printf("Exit3 DM_INSERT_DDIT\n");
#endif
            return(NULL);
        }
    }   /* end if (ddit_list_header) */
}   /* end DM_INSERT_DDIT */
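DM_INSERT_DDIT above threads each new descriptor node into a singly linked list kept in ascending order of its upper bound, walking a prev/next pointer pair and splicing at the first node whose key is not smaller. A stripped-down sketch of the same insertion pattern (integer keys and a hypothetical node type, not the thesis's `ddit_definition`) could be:

```c
#include <stdlib.h>

/* Hypothetical node mirroring the shape of a DDIT list element:
 * nodes are kept in ascending order of a single integer key. */
struct node { int key; struct node *next; };

/* Insert a key in sorted position, walking prev/cur pointers the
 * way DM_INSERT_DDIT does; returns the (possibly new) list head. */
struct node *sorted_insert(struct node *head, int key)
{
    struct node *n = malloc(sizeof *n);
    struct node *prev = NULL, *cur = head;
    n->key = key;
    while (cur && cur->key < key) {   /* advance past smaller keys */
        prev = cur;
        cur = cur->next;
    }
    n->next = cur;
    if (!prev) return n;              /* spliced in at the head */
    prev->next = n;
    return head;
}
```

The thesis version adds two wrinkles this sketch omits: a catchall descriptor that is always skipped at the head of the list, and a return convention that distinguishes head insertions from interior ones.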
APPENDIX C - CLUSTER DEFINITION TABLE PROGRAM SPECS.

#include <stdio.h>
#include "flags.def"
#include "dblocal.def"
#include "beno.def"
#include "commdata.def"
#include "dirman.def"
#include "dirman.ext"

struct cdt_definition *CDTM_INSERT(DT, d_i_s, cid)
/* Add an entry to the cluster-definition table and return a pointer to it.
 * It will also update DT if necessary (this happens if one or more of the
 * descriptor ids defining the cluster are not in DT).  Update DTCT. */
struct dt_definition *DT;   /* descriptor table */
struct des_id_set *d_i_s;
int cid;
{
    struct cdt_definition *create_cdt_node(), *new_cdt_ptr;
    struct did_link_definition *create_did_link_node(), *desc_ptr, *prev_desc_ptr;
    struct dtct_definition *create_dtct_node(), *new_dtct_ptr;
    int k, ind;

#ifdef EnExFlag
    printf("Enter CDTM_INSERT\n");
#endif

    /* sort the descriptor-id set */
    DescSort(d_i_s);

    /* allocate new CDT entry */
    new_cdt_ptr = create_cdt_node();
#ifdef m_pr_flag
    printf("new_cdt_ptr = %o\n", new_cdt_ptr);
#endif

    /* store the information in the new entry */
    new_cdt_ptr->cdt_clus_no = cid;

    /* copy the descriptor ids into the cdt entry */
    new_cdt_ptr->cdt_no_desc = d_i_s->dis_desc_count;
    desc_ptr = create_did_link_node();
#ifdef m_pr_flag
    printf("desc_ptr = %o\n", desc_ptr);
#endif
    di_cpy(&(desc_ptr->dld_did), &(d_i_s->dis_dids[0]));
    new_cdt_ptr->cdt_did_pointer = desc_ptr;
    for (k = 1; k < d_i_s->dis_desc_count; ++k)
    {
        prev_desc_ptr = desc_ptr;
        desc_ptr = create_did_link_node();
#ifdef m_pr_flag
        printf("desc_ptr = %o\n", desc_ptr);
#endif
        di_cpy(&(desc_ptr->dld_did), &(d_i_s->dis_dids[k]));
        prev_desc_ptr->next_did_link_definition = desc_ptr;
    }   /* end for */
    desc_ptr->next_did_link_definition = NULL;

    /* update DT and DTCT */
    for (k = 0; k < d_i_s->dis_desc_count; ++k)
    {
        /* find the descriptor id in DT */
        if ((ind = binsr_dt(&(d_i_s->dis_dids[k]), DT)) == -1)
            /* the descriptor id is not in DT; add it to DT */
            ind = CDTM_DT_INSERT(&(d_i_s->dis_dids[k]), DT);
        new_dtct_ptr = create_dtct_node();
#ifdef m_pr_flag
        printf("new_dtct_ptr = %o\n", new_dtct_ptr);
#endif
        new_dtct_ptr->dtct_cdt_pointer = new_cdt_ptr;
        new_dtct_ptr->next_dtct_definition = DT->dt_entry[ind].dt_dtct_pointer;
        DT->dt_entry[ind].dt_dtct_pointer = new_dtct_ptr;
        DT->dt_entry[ind].dt_clus_count++;
    }   /* end for */

    DT->dt_write_required = TRUE;

#ifdef EnExFlag
    printf("Exit CDTM_INSERT\n\n");
#endif
    return(new_cdt_ptr);
}   /* end CDTM_INSERT */


CDTM_DT_INSERT(d_id, DT)
/* Add a descriptor id to the descriptor table (DT) if it is not already
 * there.  Return the position of the descriptor id in DT. */
struct DescId *d_id;
struct dt_definition *DT;   /* descriptor table */
{
    int pos, k;

#ifdef EnExFlag
    printf("Enter CDTM_DT_INSERT\n");
#endif

    /* check to see if the descriptor id is already in DT ... done where called
    if ((pos = binsr_dt(d_id, DT)) != -1)
    {   #* descriptor id is already in DT *#
#ifdef EnExFlag
        printf("Exit1 CDTM_DT_INSERT\n\n");
#endif
        return(pos);
    }
    */

    /* the descriptor id is not in DT; add it */
#ifdef special_flag
    printf("DT->dt_no_did = %d\n", DT->dt_no_did);
#endif

    /* check for room */
    if (DT->dt_no_did >= DT_MAX_DESC)
    {
        SysError(9, "CDTM_DT_INSERT");
        sleep(ErrDelay);
    }

    /* find the appropriate position in DT for the descriptor id (DT is
     * ranked in ascending order by descriptor ids) */
    for (pos = 0;
         pos < DT->dt_no_did &&
         strcmp(DT->dt_entry[pos].dt_did.did, d_id->did) < 0;
         ++pos)
        ;

    /* the new descriptor id should be added to DT at position 'pos' */
    /* move down the descriptor ids in DT to make space for the new one */
    for (k = DT->dt_no_did; k > pos; --k)
    {
        di_cpy(&(DT->dt_entry[k].dt_did), &(DT->dt_entry[k-1].dt_did));
        DT->dt_entry[k].dt_clus_count = DT->dt_entry[k-1].dt_clus_count;
        DT->dt_entry[k].dt_dtct_pointer = DT->dt_entry[k-1].dt_dtct_pointer;
    }

    /* add the new descriptor id */
    di_cpy(&(DT->dt_entry[pos].dt_did), d_id);
    DT->dt_entry[pos].dt_clus_count = 0;
    DT->dt_entry[pos].dt_dtct_pointer = NULL;

    /* increment the number of descriptor ids in DT */
    ++DT->dt_no_did;

#ifdef EnExFlag
    printf("Exit2 CDTM_DT_INSERT\n\n");
#endif
    return(pos);
}   /* end CDTM_DT_INSERT */


APPENDIX D - RECORD CHECKER PROGRAM SPECS.

#include <stdio.h>
#include "flags.def"
#include "beno.def"
#include "commdata.def"
#include "reqprep.def"
#include "reqprep.ext"

chk_ParsedTrafUnit(ParsedRequestsPtr, dbid, err_msg)
/* Check all the requests in a traffic unit against the record template */
struct req_index_definition *ParsedRequestsPtr;
char dbid[], err_msg[];
{
    int k, tmpl_index;
    struct REQtbl_definition *req_ptr;
    struct rtemp_definition *tmpl_ptr, *get_tmpl_ptr();
#ifdef pr_flag
    int z = 0;
#endif

#ifdef EnExFlag
    printf("Enter chk_ParsedTrafUnit\n");
#endif

    for (k = 0; (req_ptr = ParsedRequestsPtr->req_tbl_pointer[k]) != NULL; ++k)
    {
#ifdef pr_flag
        while (req_ptr->req_tbl[z][0] != EORequest)
            printf("%s\n", req_ptr->req_tbl[z++]);
        printf("%s\n", req_ptr->req_tbl[z+1]);
#endif
        /* Get the record template.  Template name is found as follows:
         *   req_tbl[4][0]       = request type
         *   req_tbl[tmpl_index] = template name pointer */
        if ((req_ptr->req_tbl[4][0] - '0') == INSERT)
            tmpl_index = 7;
        else
            tmpl_index = 8;
        /* switch (req_ptr->req_tbl[4][0] - '0')
        {
            case INSERT:
                tmpl_index = 7;
                break;
            case RETRIEVE:
            case RET_COM:
            case DELETE:
            case UPDATE:
                tmpl_index = 8;
        } */

        tmpl_ptr = get_tmpl_ptr(dbid, req_ptr->req_tbl[tmpl_index]);
        if (!chk_request(ParsedRequestsPtr->req_tbl_pointer[k], tmpl_ptr, err_msg))
        {
#ifdef EnExFlag
            printf("Exit1 chk_ParsedTrafUnit\n");
#endif
            return(FALSE);
        }
    }   /* end of for loop */

    /* all the parsed requests are ok */
#ifdef EnExFlag
    printf("Exit2 chk_ParsedTrafUnit\n");
#endif
    return(TRUE);
}   /* end chk_ParsedTrafUnit */


chk_request(req_ptr, rtemp_ptr, err_msg)
/* Purpose: This procedure checks the validity of a request by comparing it */
/* against the record template.  It returns TRUE if the request is valid;   */
/* otherwise, it returns FALSE and an error message.                        */
struct REQtbl_definition *req_ptr;
struct rtemp_definition *rtemp_ptr;
char err_msg[];
{
    char flag_attr;
    int flag_insrt, flag_dlt, flag_rtrv, flag_updt;
    int j, mod_type;
    char chk_attr_name();

#ifdef EnExFlag
    printf("Enter chk_request\n");
#endif

    /* check request type */
    switch (req_ptr->req_tbl[4][0] - '0')
    {
    case INSERT:
        flag_insrt = chk_insrt_rec(req_ptr, rtemp_ptr, err_msg);
#ifdef EnExFlag
        printf("Exit1 chk_request\n");
#endif
        return(flag_insrt);
        break;

    case DELETE:
        j = 6;
        flag_dlt = chk_non_insrt_q(req_ptr, &j, rtemp_ptr, err_msg);
#ifdef EnExFlag
        printf("Exit2 chk_request\n");
#endif
        return(flag_dlt);
        break;

    case RET_COM:
    case RETRIEVE:
        j = 6;
        flag_rtrv = chk_non_insrt_q(req_ptr, &j, rtemp_ptr, err_msg);
        if (!flag_rtrv)
        {
#ifdef EnExFlag
            printf("Exit3 chk_request\n");
#endif
            return(FALSE);
        }
        j++;   /* skip EOQuery */

        /* check target list */
        while (req_ptr->req_tbl[j][0] != ETList)
        {
            flag_attr = chk_attr_name(rtemp_ptr, req_ptr->req_tbl[j]);
            if (flag_attr == '0')
            {
                concat("invalid attribute name: ", req_ptr->req_tbl[j], err_msg);
#ifdef EnExFlag
                printf("Exit4 chk_request\n");
#endif
                return(FALSE);
            }
            j++;   /* skip attribute name */
            j++;   /* skip the aggregate */
        }   /* end while */

        /* check the attribute for by clause */
        if ((req_ptr->req_tbl[j][0] != ETList) &&
            (strcmp(req_ptr->req_tbl[j], "000") != 0))
        {
            flag_attr = chk_attr_name(rtemp_ptr, req_ptr->req_tbl[j]);
            if (flag_attr == '0')
            {
#ifdef EnExFlag
                printf("Exit5 chk_request\n");
#endif
                concat("invalid attribute name: ", req_ptr->req_tbl[j], err_msg);
                return(FALSE);
            }
        }
#ifdef EnExFlag
        printf("Exit6 chk_request\n");
#endif
        return(TRUE);
        break;

    case UPDATE:
        j = 6;
        flag_updt = chk_non_insrt_q(req_ptr, &j, rtemp_ptr, err_msg);
        if (!flag_updt)
        {
#ifdef EnExFlag
            printf("Exit7 chk_request\n");
#endif
            return(FALSE);
        }
        j++;   /* skip EOQuery */
        mod_type = req_ptr->req_tbl[j][0] - '0';
        j++;   /* skip modifier type */

        /* check the attr-being-modified */
        flag_attr = chk_attr_name(rtemp_ptr, req_ptr->req_tbl[j]);
        if (flag_attr == '0')
        {
#ifdef EnExFlag
            printf("Exit8 chk_request\n");
#endif
            concat("invalid attribute name being modified: ",
                   req_ptr->req_tbl[j], err_msg);
            return(FALSE);
        }
        j++;   /* skip attribute being modified */

        /* check modifier type */
        switch (mod_type)
        {
        case MT0:
            /* check the new value for valid value type */
            /* <<< TBC >>> */
            break;

        case MT1:
        case MT2:
            flag_attr = chk_attr_name(rtemp_ptr, req_ptr->req_tbl[j]);
            if (flag_attr == '0')
            {
#ifdef EnExFlag
                printf("Exit9 chk_request\n");
#endif
                concat("invalid base attribute name: ",
                       req_ptr->req_tbl[j], err_msg);
                return(FALSE);
            }
            break;

        case MT3:
            flag_attr = chk_attr_name(rtemp_ptr, req_ptr->req_tbl[j]);
            if (flag_attr == '0')
            {
                concat("invalid base attribute name: ",
                       req_ptr->req_tbl[j], err_msg);
#ifdef EnExFlag
                printf("Exit10 chk_request\n");
#endif
                return(FALSE);
            }
            j++;   /* skip base attribute */
            while (req_ptr->req_tbl[j][0] != EOExpr)
                ++j;
            j += 2;   /* skip EOExpr and number of predicates */
            flag_attr = chk_non_insrt_q(req_ptr, &j, rtemp_ptr, err_msg);
            if (!flag_attr)
            {
#ifdef EnExFlag
                printf("Exit11 chk_request\n");
#endif
                return(FALSE);
            }
            break;

        case MT4:
            flag_attr = chk_attr_name(rtemp_ptr, req_ptr->req_tbl[j]);
            if (flag_attr == '0')
            {
                concat("invalid base attribute name: ",
                       req_ptr->req_tbl[j], err_msg);
#ifdef EnExFlag
                printf("Exit12 chk_request\n");
#endif
                return(FALSE);
            }
            break;

        default:
            concat("invalid modifier type", (mod_type + '0'), err_msg);
#ifdef EnExFlag
            printf("Exit13 chk_request\n");
#endif
            return(FALSE);
            break;
        }   /* end switch (mod_type) */

#ifdef EnExFlag
        printf("Exit14 chk_request\n");
#endif
        return(TRUE);
        break;

    default:
        concat("invalid request type: ", req_ptr->req_tbl[4], err_msg);
#ifdef EnExFlag
        printf("Exit15 chk_request\n");
#endif
        return(FALSE);
        break;
    }   /* end switch (req_ptr->req_tbl[4][0] - '0') */

#ifdef EnExFlag
    printf("Exit chk_request\n");
#endif
}   /* end chk_request */
only) attribute names in record template are in an insert request. It also checks the */ /* value type associated with each attribute. /* if ok; otherwise it returns struct REQtbl_definition struct rtemp_definition err_msg[]; char FALSE flag_ins; int k; int no_kwrd; char returns TRUE and an error message. *req_ptr; *rtemp_ptr; ( int It str_no_attr[5]; 85 */ */ EnExFlag #ifdef printf("Enter chk_insrt_rec W); #endif no_kwrd = str_to_num(req_ptr->req__tbl[5]); if ( no_kwrd != rtemp_ptr->no_cntries ) { num_to_str(rtemp_ptr->no_entries, str_no_attr); concat( "number of keywords in the insert request should be: ", str_no_attr, err_msg); EnExFlag #ifdef printf("Exitl chk_insrt_rec \n"); #endif return(FALSE); ) for k = ( 0; k < rtemp_ptr->no_entries; -H-k ) { flag_ins = insrt_attr_name(rtemp_ptr->rt_entry[k].attr_name, rtemp_ptr->rt_entry [k] value_data_type . , req_ptr ,err_msg ) if (!flag_ins) { EnExFlag #ifdef printf("Exit2 chk_insrt_rec Vi"); #endif re turn( FALSE); } end for */ EnExFlag }/* #ifdef printf("Exit3 chk_insrt_rec Nn"); #endif return(TRUE); }/* end chk_insrt_rec */ insrt_attr_name(att_name, att_val_type, req_ptr, err_msg) /* /* Purpose: This routine checks */ if attribute name in record template is */ */ and also checks the validity of the value type associated with attribute name. It returns TRUE if */ */ attribute name exsists and valid valuetype; otherwise, it /* in the insert request, /* /* /* returns struct char FLASE. 
REQtbl_definition */ *req_ptr; att_name[]: 86 char att_val_type; char err_msg[]; { int j; int flag_value; EnExFlag #ifdef printf( "Enter insrt_attr_name \n"); #endif j = 6; while req_ptr->req_tbl[j][0] != ( EORecord { if ( strcmp(att_name,req_ptr->req_tbl[j]) ) = ) { flag_value if = chk_value_type(att_val_type^eq_ptr->req_tbl[j]); (!flag_value) { concat("invalid value type for attribute: ", req_ptr->req_tbl[j-l], err_msg); #ifdef EnExFlag printf("Exitl chk_insrt_rec W); #endif retum( FALSE); } EnExFlag #ifdef printf("Exit2 chk_insrt_rec \n"); #endif return(TRUE); } j = }/* j + 2; /* end while skip attribute name and */ concat("missing attribute name: #ifdef EnExFlag ", att_name, err_msg); printf("Exit3 chk_insrt_rec proc \n"); #endif return( FALSE); }/* attribute value */ end insrt_attr_name */ 87 chk_non_insrt_q(req_ptr, rtemp_ptr, err_msg) i, */ /* Purpose: This procedure checks the validity of the query part of /* /* non-insert request. /* returns FALSE It returns TRUE if valid; otherwise, it and an error message. 
chk_non_insrt_q(req_ptr, i, rtemp_ptr, err_msg)
/* Purpose: This procedure checks the validity of the query part of a */
/* non-insert request.  It returns TRUE if valid; otherwise, it       */
/* returns FALSE and an error message.                                */
struct REQtbl_definition *req_ptr;
int *i;
struct rtemp_definition *rtemp_ptr;
char err_msg[];
{
    int flag_value;
    char chk_attr_name();
    char flag_attr;

#ifdef EnExFlag
    printf("Enter chk_non_insrt_q\n");
#endif

    while (req_ptr->req_tbl[*i][0] != EOQuery)
    {
        flag_attr = chk_attr_name(rtemp_ptr, req_ptr->req_tbl[*i]);
        if (flag_attr == '0')
        {
            concat("invalid attribute name: ", req_ptr->req_tbl[*i], err_msg);
#ifdef EnExFlag
            printf("Exit1 chk_non_insrt_q\n");
#endif
            return(FALSE);
        }
        ++*i;   /* skip attribute name */
        ++*i;   /* skip relational operator */

        flag_value = chk_value_type(flag_attr, req_ptr->req_tbl[*i]);
        if (!flag_value)
        {
            concat("invalid value for attribute: ",
                   req_ptr->req_tbl[*i-2], err_msg);
#ifdef EnExFlag
            printf("Exit2 chk_non_insrt_q\n");
#endif
            return(FALSE);
        }
        ++*i;   /* skip attribute value */
        if (req_ptr->req_tbl[*i][0] == EOConj)
            ++*i;   /* skip EOConj */
    }   /* end while */

#ifdef EnExFlag
    printf("Exit3 chk_non_insrt_q\n");
#endif
    return(TRUE);
}   /* end chk_non_insrt_q */


char chk_attr_name(rtemp_ptr, attribute)
/* Purpose: This procedure checks if an attribute name exists in the   */
/* record template.  If the attribute is in the record template, it    */
/* returns the type of value (i, s, f, ...) for the attribute.  If the */
/* attribute is not in the record template, it returns '0'.            */
struct rtemp_definition *rtemp_ptr;
char attribute[];
{
    int i;

#ifdef EnExFlag
    printf("Enter chk_attr_name\n");
#endif

    for (i = 0; i < rtemp_ptr->no_entries; ++i)
        if (strcmp(rtemp_ptr->rt_entry[i].attr_name, attribute) == 0)
        {
#ifdef EnExFlag
            printf("Exit1 chk_attr_name\n");
#endif
            return(rtemp_ptr->rt_entry[i].value_data_type);
        }

#ifdef EnExFlag
    printf("Exit2 chk_attr_name\n");
#endif
    return('0');
}   /* end chk_attr_name */
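The chk_value_type routine accepts integer values with an optional leading minus sign but rejects a bare "-" with no digits behind it. The same check, folded into a standalone predicate (hypothetical name, not the thesis's function), might be:

```c
/* Standalone version of the integer check performed by
 * chk_value_type: an optional leading '-', then at least
 * one digit, and nothing else. */
int is_valid_int(const char *s)
{
    int i = 0;
    if (s[0] == '-')                 /* allow negative numbers */
        i = 1;
    if (s[i] == '\0')                /* reject "" and a bare "-" */
        return 0;
    for (; s[i] != '\0'; ++i)        /* every remaining char a digit */
        if (s[i] < '0' || s[i] > '9')
            return 0;
    return 1;
}
```

The thesis's version reaches the same result with a negflag plus a length test after the digit loop; folding the bare-minus case into an early return removes the extra state variable.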
chk_value_type(value_type, val)
/* Purpose: This procedure checks the validity of a value.  It returns */
/* TRUE if valid; otherwise, it returns FALSE.                         */
char value_type;
char val[];
{
    int i, negflag = 0;

#ifdef EnExFlag
    printf("Enter chk_value_type\n");
#endif

    switch (value_type)
    {
    case 'i':
        /* allow for negative numbers */
        if ((val[0] < '0' || val[0] > '9') && (val[0] != '-'))
            return(FALSE);
        else if (val[0] == '-')
            /* since it is negative, make sure numbers follow,
             * not just '-' */
            negflag = 1;
        for (i = 1; val[i] != '\0'; ++i)
            if (val[i] < '0' || val[i] > '9')
            {
#ifdef EnExFlag
                printf("Exit1 chk_value_type\n");
#endif
                return(FALSE);
            }
        /* scold the user if just a negative sign */
        if ((negflag) && (i == 1))
            return(FALSE);
#ifdef EnExFlag
        printf("Exit2 chk_value_type\n");
#endif
        return(TRUE);
        break;

    case 's':
#ifdef EnExFlag
        printf("Exit3 chk_value_type\n");
#endif
        return(TRUE);
        break;

    case 'f':
#ifdef EnExFlag
        printf("Exit4 chk_value_type\n");
#endif
        return(TRUE);
        break;

    default:
#ifdef EnExFlag
        printf("Exit5 chk_value_type\n");
#endif
        return(FALSE);
        break;
    }   /* end switch */

#ifdef EnExFlag
    printf("Exit6 chk_value_type\n");
#endif
}   /* end chk_value_type */


LIST OF REFERENCES

1. Korth, H. F. and Silberschatz, A., Database System Concepts, p. 348, McGraw-Hill Book Co., 1986.

2. MacLennan, B. J., Principles of Programming Languages: Design, Evaluation and Implementation, 2d ed., p. 432, CBS College Publishing, 1987.

3. Stubbs, D. F. and Webre, N. W., Data Structures with Abstract Data Types and Modula-2, p. 396, Brooks/Cole Publishing Co., 1987.

4. Demurjian, S. A. and Hsiao, D. K., "New Directions in Database-Systems Research and Development," Technical Report, NPS-52-85-001, Naval Postgraduate School, Monterey, California, February 1985.

5. Banerjee, J. and Hsiao, D. K., "The Use of a Database Machine for Supporting Relational Databases," Proceedings of the 5th Workshop on Computer Architecture for Nonnumeric Processing, August 1987.

6. Banerjee, J., Hsiao, D. K., and Ng, F., "Database Transformation, Query Transformation and Performance Analysis of a Database Computer in Supporting Hierarchical Database Management," IEEE Transactions on Software Engineering, Vol. SE-6, No. 1, January 1980.

7. Banerjee, J. and Hsiao, D. K., "A Methodology for Supporting Existing CODASYL Databases with New Database Machines," Proceedings of the ACM National Conference, 1978.

8. Macy, G., "Design and Analysis of an SQL Interface for a Multi-Backend Database System," M.S. Thesis, Naval Postgraduate School, Monterey, California, March 1984.

9. Weisher, D., "Design and Analysis of a Complete Hierarchical Interface for a Multi-Backend Database System," M.S. Thesis, Naval Postgraduate School, Monterey, California, June 1984.

10. Wortherly, C. R., "Design and Analysis of a Network Interface for a Multi-Backend Database System," M.S. Thesis, Naval Postgraduate School, Monterey, California, December 1985.

11. Goisman, P. L., "The Design and Analysis of a Complete Entity-Relationship Interface for a Multi-Backend Database System," M.S. Thesis, Naval Postgraduate School, Monterey, California, December 1985.

12. Kloepping, G. R. and Mack, J. F., "The Design and Implementation of a Relational Interface for the Multi-Lingual Database System," M.S. Thesis, Naval Postgraduate School, Monterey, California, June 1985.

13. Benson, T. P. and Wentz, G. L., "The Design and Implementation of a Hierarchical Interface for the Multi-Lingual Database System," M.S. Thesis, Naval Postgraduate School, Monterey, California, June 1985.

14. Emdi, B., "The Implementation of a CODASYL-DML Interface for the Multi-Lingual Database System," M.S. Thesis, Naval Postgraduate School, Monterey, California, December 1985.

15. Anthony, J. A. and Billings, A. J., "The Implementation of an Entity-Relationship Interface for the Multi-Lingual Database System," M.S. Thesis, Naval Postgraduate School, Monterey, California, December 1985.

16. Hsiao, D. K. and Menon, M. J., "Design and Analysis of a Multi-Backend Database System for Performance Improvement, Functionality Expansion and Capacity Growth (Part I)," Technical Report, OSU-CISRC-TR-81-7, The Ohio State University, Columbus, Ohio, July 1981.

17. Hsiao, D. K. and Menon, M. J., "Design and Analysis of a Multi-Backend Database System for Performance Improvement, Functionality Expansion and Capacity Growth (Part II)," Technical Report, OSU-CISRC-TR-81-8, The Ohio State University, Columbus, Ohio, August 1981.


INITIAL DISTRIBUTION LIST

1. Defense Technical Information Center (2 copies)
   Cameron Station
   Alexandria, Virginia 22304-6145

2. Library, Code 0142 (2 copies)
   Naval Postgraduate School
   Monterey, California 93943-5002

3. Chief of Naval Operations (1 copy)
   Director, Information Systems (OP-945)
   Navy Department
   Washington, D.C. 20350-2000

4. Department Chairman, Code 52 (2 copies)
   Department of Computer Science
   Naval Postgraduate School
   Monterey, California 93943-5000

5. Curriculum Officer, Code 37 (1 copy)
   Computer Technology
   Naval Postgraduate School
   Monterey, California 93943-5000

6. Professor David K. Hsiao, Code 52Hq (2 copies)
   Computer Science Department
   Naval Postgraduate School
   Monterey, California 93943-5000

7. Professor Steven A. Demurjian (2 copies)
   Computer Science & Engineering Department
   The University of Connecticut
   260 Glenbrook Road
   Storrs, Connecticut 06268

8. Deborah A. McGhee (2 copies)
   232 Foxhunt Road
   Columbia, S.C. 29223