Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter 2 DBMS Architecture Overview BASIS is a database management system (DBMS) composed of application databases, application database journals, special purpose databases like the Authority Database and the Master Definition Database, and other system resources. For introductory information about these databases, see Database Definition and Development, “Overview of Database Definition and Development.” Application databases always consist of a pair of related databases: a Definition Database (DDB) and a Record Database (RDB). The DDB describes what is stored in the RDB: The records you want to store The fields in each record How the fields are indexed What the fields look like to users and How the records and indexes are constructed The RDB stores the data and related indexes. DBMS Architecture 25 In addition, application databases may have a special purpose RDB file to store queued updates. Optionally, your application may also have thesaurus databases (TDBs); thesauri are stored in a special purpose database that has a combination of DDB and RDB characteristics. Journals can be used with DDBs, RDBs, queues, and thesauri. The following sections describe these in further detail. Databases A BASIS DBMS contains one or more DDBs, one or more RDBs, an Authority Database (ADB), and a Master Definition Database (MDDB). Definition Databases (DDBs) A DDB is a special purpose Record Database that stores the Actual Data Model (ADM), User Data Model (UDM), and Structural Data Model (SDM) objects which define what is to be stored in your RDB. Creating and Maintaining the DDB The DMDBA utility lets you query, update, and display information in a DDB. DDBs are journaled and recovered the same way as RDBs. There is no queue file associated with a DDB; however, definition changes will accumulate in the form of change orders until an APPLY is done, at which time all pending changes are put into effect. Changes cannot be selectively applied. If one change causes an error, none of the pending changes is accepted. The DDB contains statement records and system records. Statement Records Statement records can be manipulated via the DMDBA utility and extracted by the DMDDBE utility to regenerate a source file of the current definition. Statement record information can also be queried and displayed by opening the Xdatabase.USER model in FQM, just as you would a normal database. System Records System records are the condensed, “applied” form of statement records used by other system modules (the Kernel, HVU, etc.) APPLY The DMDBA compilation of the statement records to create system records is performed with the APPLY action. 26 DBMS Architecture For more information about DDBs, see Database Definition and Development and see “Changing the Definition Database.” Record Databases (RDBs) An RDB can consist of one or more files which are used to store data and indexes. Initializing the RDB RDB files are created and reinitialized via DMDBA. The initial size of the file will depend on whether FIXED or DYNAMIC areas are used. FIXED areas are allocated to the file when it is initialized. DYNAMIC areas cause the file to grow gradually but may result in a more fragmented file. Controlling file extents Only VMS offers control over the extent or size to be used for a file, and that syntax is supported for BASIS datafile descriptors. For more information about file extents, see BASIS Reference, “VMS System Specifics.” Updating the RDB RDB updating can be done sharing access with other users, or it can be done with exclusive access. How the database is opened controls which type of access is employed. OPEN/DB Updating via OPEN/DB allows concurrency—many users can be updating different records at the same time. The Kernel controls locking of the database. Storage of data and corresponding updating of related indexes are done online. High Volume Update (HVU); Fundamental Query and Manipulation (FQM); OpenAPI; Document Handler Interface (DHI), if available to your operating system; and BASIS Webtop provide for this type of updating. This kind of access supports interactive updating. OPEN/DIRECT Updating via OPEN/DIRECT requires exclusive access to the database by one user and is often 5 to 10 times faster than updating using OPEN/DB. Because it bypasses the Kernel, no journal records are created. HVU, DMQ, and DMR can use this type of updating to load or update data and indexes. This kind of access supports batch-oriented updating. (DMQ is not available on Windows.) The structure of RDB files is described in the “Structure of RDB Files” section later in this chapter. DBMS Architecture 27 Authority Database (ADB) The ADB helps to control access to BASIS application databases. It contains a record for each database and records for each registered user. Each database record indicates the owner (creator) of the database, where its DDB is located, and the status of the database. User records indicate each user’s database privileges, read and write access codes, and priority. For more information about adding and changing information in the ADB, see System Administration. Master Definition Database (MDDB) The MDDB stores the definition of DDBs and portions of Thesaurus Databases. When updates are made to a DDB or TDB, the MDDB content is used to control construction of objects in these databases. The MDDB is delivered as part of the BASIS installation. This database cannot be changed onsite; thus the structure of all DDBs is fixed, but the data, the definition of each database, can vary. Occasionally, the MDDB changes (for example, when support of a new database object, such as a new parameter for one of the definitions in the ADM, is introduced). When this happens, it occurs with a new release of BASIS. Changes to the MDDB normally require DDB or TDB database system records to be regenerated using the new MDDB. For information about the MDDB, see System Administration. For more information about regenerating system records, see Database Definition and Development. Support Databases and Features Optional files, databases, and features supporting a DBMS include queue database files, one or more Thesaurus Databases (TDBs), database journal files, national language support features, and a markup and style guide. Queue Database Files UNIX, VMS A queue file is a special type of database file. The queue can be used as a holding area for RDB changes that are applied as batched updates or for records that do not pass validation. Only the primary key index is maintained for records in the queue, and the structure of that index is located in the queue file. The internal structure and format of the queue file is similar to that of regular data files. Multiple queue files can be defined for a single database. The queue file can contain only queue AREAs; it may not include regular data or index AREAs. 28 DBMS Architecture Windows A queue is a special type of database file. When a record is checked out of the database, the system automatically creates a small record (referred to as Queue Status Record) containing information relevant to the checked-out status of that record. The queue is where that information is stored. Queue Status Records contain not only information about the CHECKOUT status of the record but also the user ID of its owner (the user who checked it out), the date and time that the record was checked out, and optionally, a comment added and stored in the Queue Status Record during the CHECKOUT operation. The internal structure and format of the queue file is similar to that of regular data files. Multiple queue files can be defined for a single database. The queue file can contain only queue AREAs; it may not include regular data or index AREAs. For more information about queue management, see “DMBS Services” and “Using Queues.” Thesaurus Databases (TDBs) A thesaurus is a BASIS database; its structure is defined in the MDDB. Unlike a regular RDB, both the thesaurus definition and data are stored in a single file. Thesaurus Manager (TM) The Thesaurus Manager module lets you define and update the definition of the relations in the thesaurus and also change the contents of the thesaurus. For more information about the Thesaurus Manager module, see Thesaurus. Use with a RDB A thesaurus can have a very active or passive role for a given application database. DBMS Architecture 29 Search control A thesaurus may be used to facilitate user queries. When a user states the terms of interest to be searched, the supplied terms are looked up in the thesaurus for related terms, and the search is then broadened to include all the similar terms. You can also have user-supplied terms replaced with preferred terms, and you can notify users of related terms to assist users in refining search terms. Verify A thesaurus can be used to ensure that only values defined in the thesaurus are entered for a field. Data switch A thesaurus can be used to switch data supplied for a field to preferred terms. How a thesaurus is used by a specific database is determined in the ADM of the application DDB. For more information about the SEARCH_CONTROL_SET and THESAURUS_DATA_CONTROL_SET definitions, see Database Definition and Development. Use with multiple RDBs If the thesaurus is tied into validation or expanding searches, it must be controlled by the same Kernel as the application database. When you open an application database that references a thesaurus, the TDB is opened implicitly. Use with DMNAM If you are using the BASIS Network Access Method (DMNAM) utility, thesauri should not be declared as global databases because the implicit open bypasses the Global Database Directory. If the TDB cannot be opened, the open of the application database fails also. For more information about DMNAM and the Global Database Directory, see System Administration. Database Journals The Kernel can record interactive, OPEN/DB updates in journal files. The Kernel saves changes made to the RDB by sequentially writing transactions to the journal file before the changes are committed to the RDB. Once a transaction is FINISHed, the transaction in the journal is marked as a committed change. The journal contains Start and End Transaction entries and Image entries. Note: 30 DBMS Architecture OPEN/DIRECT updates and DMQ updates are not journaled. Journaling provides a basis for recovery from damaged DDBs, RDBs, queues, and TDBs. If a database is damaged by either software or hardware failure, restoring a prior version of the database and replaying the journals enables you to restore a damaged database without significant loss of time and data. You may also back out updates from a database if "before images" are kept in the journals Journal files are defined in the DDB SDM JOURNAL statement. The DMDBA utility controls journal usage. For more information about the DMJ utility, which provides journal REPLAY and BACKOUT facilities, see “DMJ, Journal Processor.” For more information about how journaling works, see Database Definition and Development. For information about using journals to recover from database damage, see “Recovery.” National Language Support Features Among other system resources in the BASIS system are National Language Support features that adapt BASIS products to local languages. Among the National Language Support features are tools to modify messages and prompts. For information about using these features, see the NLS online help file in the DMHELP area of BASIS. For information about accessing online help about a utility on your operating system, see “Introduction.” Markup and Style Guide The BASIS system uses a special system file called the markup and style guide to handle rich text documents—namely, documents whose text is enhanced by variations of font, renditions (bolding or italics), and formatting. A markup and style guide provides the following capabilities: Lets you import documents produced on word processors or other document processing systems into your database DBMS Architecture 31 Lets you insert markup to indicate text enhancements, formatting, or special characters in documents created in a simple file editor. Lets you see enhancements, formatting, and special characters when viewing documents in your database Lets users with different terminals, printers, personal computers, and workstations take advantage of the special presentational features of their equipment Lets you export documents from the database restored in their original format As illustrated in Figure 2-1, a markup and style guide consists of markup definitions, converters, presenters, context unit parsers, and various data resources. Documents Markup and Style Guide Markup Definitions Converters Presenters Context Unit Parsers Data Resources BASIS Database Figure 2-1: Role of a markup and style guide Markup definitions These are statements that define the meaning of tags you use in marking up a document. The default markup and style guide supplies a standard set of markup tag definitions which you can add to or modify. Converters These change the external format of a document to the BASIS internal storage format. They reverse the process on export. The default markup and style guide supplies you with a standard set of converters. You can define as many converters as you need. 32 DBMS Architecture Presenters These map a document from its descriptive form to a presentational form for display or print. The system uses a presenter to translate markup tags to presentation procedures, which may include specifications for type font, size, rendition, word wrap, and other visual attributes. A presenter can also hide or reveal components of a document. You could use this hide/reveal capability, for example, to suppress proprietary parts of a report and thus create a “generic version.” The default markup and style guide supplies you with a standard set of presenters. You can define as many presenters as you need. Context unit parsers These help in indexing text fields for proximity and phrase searching. A context unit is a sentence, paragraph, or some other portion of text. In the style guide, you can define how the system will locate context units when importing a document. Context unit parsers define sentence delimiters, abbreviations, etc. The default markup and style guide supplies default context unit parsers suitable for most applications. For more information about the default markup and style guide and how to make changes to it, see Markup and Style Guide. Structure of RDB Files A functional database is composed of many individual files. Some are simple sequential files containing Data Definition Language (DDL) or BASIS commands and parameters. Other files are specially structured to contain database definitions, data, and indexes. The structure and function of files that comprise the RDB is briefly described in this section. The RDB consists of files composed of areas, which are composed of pages, which are composed of records, which are composed of segments. Files The RDB can be composed of one or more physical files. Multiple versions of the database can be defined, in which case another full set of RDB files is created for each version. All versions must be governed by the same DDB. DBMS Architecture 33 Areas Each data file has one or more AREAs. Every record type and index must be assigned to an AREA, which is then assigned to a FILE. In this manner the user determines the contents of each file. You assign different record types and indexes to areas of files by defining FILES and AREAs in your SDM. Pages Data records are stored in units of pages. Page size differs on various operating systems, but a page is always the finest granularity possible for read/write operations. For 32-bits-perword machines, up to 60 records can be stored on a single page. A page must be at least 2000 bytes long. Its size is fixed but depends on the I/O system of the host computer. The page size is selected so it can be used efficiently. Each page contains a list of records stored on that page. Records All of the substructures of a file are constructed from records. A record may be a special system record, a reference or node record (used to form the indexes), or a data record. Data records can be data or supporting data structures. A page could contain records from each of the categories. A user sees only views of data records. The same scheme is used to allocate, write, and read every kind of record. The record pointer describes the physical address of a record in the database. A record pointer contains a file number (data records can be distributed across files), a page number, and a record number. Record pointers are used in BASIS index structures to reference a record. Segments The basic unit of storage of a record is the record segment, the piece of a data record that resides on one database page. One or more segments are used to store a record. For conventional documents only one segment is necessary. For continuous and sectioned 1 documents, multiple segments are used. Text streams, which typically require large amounts of storage, are broken into segments that are linked. 1 34 DBMS Architecture Sectioned records are not available on Windows. Structure of Indexes Indexes are carefully organized lists of index terms which allow the data of a field to be searched quickly. Indexes Three types of indexes can be created: EXACT Content of a field is indexed identically as it appears in the field. UNIQUE A unique index is a special case of an exact index in which no two index terms can be the same. INCLUSIVE An index term is created for each word contained in the field. This type of index is used for text fields (character strings or text streams). Common words that are of no interest when searching can be excluded from the index by using a stopword list. Numbers can optionally be excluded. A breaklist is used to determine rules for distinguishing words from punctuation. For an overview of indexes and how they are defined in the DDB, see Database Definition and Development, “Indexes.” Following are details about how indexes are structured. B-tree Index Structure A list of unique values for a field is used to form an index directory. This directory is kept in a structure known as a B-tree. BASIS stores information about the number and location of terms in a series of B-trees. Figure 2-2 displays a simplified version of a series of B-trees storing index information for an indexed simple field that contains the following terms and numbers of occurrences: Field Value Occurrences Record Numbers (key values) BLUE 7 3, 8, 12, 15, 18, 20, 22 CHARTREUSE 1 1 GREEN 3 2, 4, 6 DBMS Architecture 35 Root Node BLUE CHARTREUSE OTHERS BLUE See REF B-TREE CHARTREUSE REC1 REC4 OTHERS REC8 REC15 REC20 OTHERS REC3 REC8 GREEN See REF B-TREE REC22 REC2 REC4 REC6 REC18 REC20 REC12 REC15 Terminal Nodes Figure 2-2: B-tree index structure A B-tree is formed from special variable-length node records which are arranged in a hierarchical or tree structure. The top node of Figure 2-2 is the root node of the tree. This particular structure has a term B-tree with two levels and two reference B-trees beneath the blue and green terms. Sometimes a B-tree can be a single node, but usually it is a two- or three-level structure. BASIS can handle a tree with 16 levels. Each term B-tree node contains entries where part of the entry is a key. The keys are kept in alphabetic or numeric order (depending on the key type) within each node. When linked together, the keys in entries of the terminal nodes form an ordered list of all references. The key type is related to the data type of the corresponding field. The key can be: A binary integer value A binary floating point value 36 DBMS Architecture A packed decimal string A fixed-size character string (1:250 characters) A variable-size character string (1:250 characters) Fixed-size keys form fixed-size entries in the nodes and are very easy to manipulate. Variable-size character keys are important when a field may have values that significantly vary in size. Variable-size keys require a slightly more complicated node format, but are efficiently handled by BASIS. The higher-level nodes have abbreviated entries that have a key and a pointer to a lowerlevel node. In non-terminal nodes, each entry is derived from the highest entry of the node it references. Term entries The non-terminal nodes typically contain entries for each term in the index (regardless of whether the index type is unique, exact, inclusive). The size of the entry key depends on the size of the index term. The entry data comprises either lists of the records which include the term or a pointer to a subtree of nodes that contain the reference tree for records with the term. Terminal nodes The terminal nodes (leaf nodes) contain entries that have a key and one or more words of reference data. References Reference data, or references, consist of a unique record pointer and reference details. Unique record pointer The unique record pointer identifies a specific record which has a field value representing the term which owns the reference. Unique record pointers can be direct or indirect. (See Figures 2-3 and 2-4.) DBMS Architecture 37 Reference details Reference details, also commonly called postings and proximity information, require one or two words of storage, depending on the index requirements. Reference details include the file number, page number, and record number (or the system key). Reference details in the reference B-tree can be 32 or 64 bits long (one or two computer words depending on the platform). Simple fields with unique/exact indexes have just 32-bit postings because they do not require any word-level proximity information. Compound unique/exact and all inclusive indexes are 64-bit postings by default. Information in postings differs depending on the type of index (unique, exact, and inclusive) and the proximity options elected. For a UNIQUE index the single reference to the term is stored with the key in the terminal node of the term tree, and no reference tree is needed. Following is a list of reference details for various types of indexes: UNIQUE | EXACT mfisi, sect#, cntx# INCLUSIVE mfisi, sect#, cntx#, token# where 38 DBMS Architecture mfisi is the multi-field index field indicator, the field from which the index reference is produced. It is used only with multifield indexes (MFIs). For more information about multi-field indexes, see Database Definition and Development. sect# is the section number. It is used only in sectioned documents 1. cntx# is the context unit number. How this is determined varies depending on the field definition. For a unique-indexed or exact-indexed compound standard field, the context unit is the subfield number. For an inclusively indexed text stream, the context unit is normally the sentence. token # is each piece of text in the field being indexed. Each token (including STOP_WORDS) is numbered. The BREAK_LIST used with the index determines what characters delimit text tokens. For more information about BREAK_LIST, see Database Definition and Development. Text field postings For textual fields, reference details can also identify where in the field each term is referenced. For an INCLUSIVE index, a posting might provide the information that the term is referenced at the nth word of the nth sentence of the referenced record’s field. Reference tree If a key references only one record, the reference is in the entry. Keys that require two small references (just a record pointer) also carry the references in the entry. The keys that have more references point to a reference tree, which can accommodate 2000000000 references. These key entries indicate how many references are in the tree, how many different records are referenced, and where the root of the reference tree is located. Ordering Reference entries are stored in unique record pointer order. This is done for efficiency of updating (looking for old references) and search operations. FIND command processing makes use of the ordering in intermediate set manipulations which involve merging and comparing references. DBMS Architecture 39 Index Structure The SDM statement RECORD_STORAGE parameter INDEX_STRUCTURE controls what is used for unique record pointers in user-defined indexes. References can be direct or indirect. For a list of advantages and disadvantages of each structure and a comparison of functionality, see Database Definition and Development, “Indexes.” Direct Indexing 1 When a direct index is constructed, the physical location of the record is used to locate the record. The unique record pointer in this case is the record’s record pointer. Sectioned records are not available on Windows. Direct indexing offers more efficient retrieval than indirect indexing because BASIS does not have to look up the physical address RDB INDEX 1 2 Physical Location Figure 2-3: Direct indexing If the index structure is direct, references contain the physical location of the record pointer, and the record location is directly available. 40 DBMS Architecture With direct indexes, a FIND command without ORDER BY or SORT BY orders the result set in record pointer order. Since the record pointers have no particular relationship to user data, members in result sets with direct indexes seem to be in an unpredictable order. Note: Using area distribution lists with direct indexes has significant ramifications for the optimal setting of the insert method; see “Node splitting” below. For more information about defining an area distribution list and RECORD_STORAGE, see Database Definition and Development, “SDM Definitions.” Indirect Indexing With indirect indexes, the reference’s unique record pointer is the value of the primary key (date key or system key or user key) for the associated record. As a result, the references in the index are stored in primary key order. The primary key index is used to look up the record pointer. The key data of the primary key index is the only index that points directly to the records using record pointers. Because references in the index are stored in primary key order, members in result sets are displayed in primary key order unless otherwise sorted by ORDER BY or SORT BY on the FIND command. Indirect indexes are also necessary for STORE/SET and RESTORE/SET FQM commands. RDB Physical System Location Key INDEX 4 1 2 1 2 Figure 2-4: Indirect indexing DBMS Architecture 41 Because references in the index are stored in primary key order, members in result sets are displayed in primary key order unless otherwise sorted by ORDER BY or SORT BY on the FIND command. Capacity Proximity information (section, context, token) is stored in the reference details of an index reference. With the CAPACITY parameter of the SDM RECORD_STORAGE definition, you can tailor how much of the posting is spent for the different levels of proximity. In other words, you can specify how much of the posting capacity to allocate for any one piece of proximity information based on the data you have. For more information about capacity options available, see Database Definition and Development. Node splitting An index node can hold only a finite number of entries, so at some point an insertion of a new entry will cause a full node to split into two additional nodes, each representing a subset of the previous node. 42 DBMS Architecture Padding To avoid constant splitting, nodes are normally constructed with space for future insertions. The amount of space left in nodes for future insertions is called padding. If many insertions are made, the padding will be consumed and a node split will occur. This could result in a new level in the B-tree. Generally, keeping the number of levels low results in less space to contain the nodes and fewer I/Os to locate leaf nodes of interest. On the other hand, if you have more padding than you need, you waste space in the nodes. Insert method Insert method determines the general rule for padding when new nodes are constructed and thus influences node splitting. Insert methods are defined for each field indexed with the SDM INDEX statement. For more information about rules for selecting an insert method, see Database Definition and Development, “Indexes.” Note: If updating patterns should change, you can change the insert method defined for an index after the index is built. You can do so without having to re-index a field. Term nodes The SDM INDEX statement INSERT_METHOD parameter for TERMS determines how new term nodes are created when an existing term node is filled and thus must be split. New nodes are created half, 3/4, or entirely empty. If the terms are coming in a random order (such as a textual field), having space reserved allows for some insertions before the node must split again. If terms are entering the system in a sequential order (such as an ever-increasing date or key value), new terms will naturally fall at the end of the index, and it is not necessary to reserve free space for future insertions. DBMS Architecture 43 Reference nodes Reference nodes also fill up and split, and thus also benefit by padding. Since there are typically many more references than terms, reference node padding is of more importance than term node padding. The SDM INDEX statement INSERT_METHOD parameter for REFERENCES controls this in the same manner as the INSERT_METHOD for TERMS. New nodes are constructed half empty for RANDOM inserting, 3/4 empty for MOSTLY_SEQUENTIAL inserting, and entirely empty for SEQUENTIAL inserting. A typical inclusive index will have its term tree grow rapidly at first, then slow down after most of the vocabulary is entered. The reference tree, however, will continue to grow as long as records are added to the database. For this reason, it is important to understand the difference between the term tree and reference tree growth patterns and assign an appropriate setting for the reference insert method. Replacing records You typically think of padding issues when loading new records, but replacing records may impact node splitting in the same manner. A SEQUENTIAL insert method or little or no padding generally would not be appropriate for record types involving a significant volume of replaces. Area distribution list cycles 44 DBMS Architecture If you have direct indexes and are also using an area distribution list, a RANDOM insert method is appropriate. The reason relates back to the unique record pointer used for direct indexes; the high order component of the record pointer is the file number of the record. This file number cycles with area distribution. Therefore, if new data records are being cycled into several physical files, the associated index references will “cycle” also, in terms of where they are inserted into the reference B-tree. As a result, even if you are logically adding references sequentially, the use of this feature results in random-like references. For more information about area distribution and about the AREAS parameter of the SDM RECORD_STORAGE statement, see Database Definition and Development, “SDM Definitions.” Index Balancing BASIS uses efficient algorithms to search a tree and insert or delete entries from a tree. Since nodes are kept in variablesize records, the size of each node may vary depending on the number of entries kept in a node. It is important to keep the levels balanced so the access path to any terminal node is the same (and as short as possible). This provides a consistent access time to any key in the tree. The update algorithms use careful replacement techniques so a pointer is never updated in a file before the data it references has been written to the file. This prevents problems resulting from “broken” structures caused by catastrophes that occur during an update. Extensive updating (key insertions and deletions) may result in excessive node splits to the point where indexes may grow much larger (more levels) than needed. In this condition, searches will take longer to access the leaf nodes of the tree, and database files may grow much larger than necessary. Compression The excess space may be removed with DMR ACTION=COMPRESS. Compression is supported on files used to store indexes, not record data. ACTION=COMPRESS will rebalance the trees with the node padding specified (the padding specified does not have to match the insert method). For more information about compression, see “DBMS Services” and see “DMR, Restructure.” Note: HVU OPEN/DIRECT always inserts no padding on an initial load. It ignores the insert method. It optimizes this case because the index references in the initial load are in sorted order. As a result, a very compact index is created. Depending on your insert method and data loaded, the next significant loading of data may result in a significant increase in the size of the files storing large indexes. The rate of growth, however, will stabilize as the previously compact index nodes reach padding thresholds. If desired, you can use compress and change the padding to avoid such a transition. DBMS Architecture 45 Multi-field index The multi-field index (MFI) is a special type of index structure. It is populated with data from several fields (all from the same record type). In this case, the reference tree contains information about the field in which the term appeared as well as all the other proximity information. Because there may be a large number of references, it is especially important to set the REFERENCES INSERT_METHOD appropriately to RANDOM. In general, searching an MFI is preferred over a search on an equivalent field list because only one index structure needs to be touched. However, an MFI should not be used as a substitute index for a field that is frequently searched. Since the MFI has data for multiple fields, using it to search a single field requires extra processing to strip out the irrelevant data. 46 DBMS Architecture