Download DLM Chapter 2:DBMS Architecture

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
Transcript
Chapter 2
DBMS Architecture
Overview
BASIS is a database management system (DBMS) composed of application databases,
application database journals, special purpose databases like the Authority Database and
the Master Definition Database, and other system resources. For introductory information
about these databases, see Database Definition and Development, “Overview of Database
Definition and Development.”
Application databases always consist of a pair of related databases: a Definition Database
(DDB) and a Record Database (RDB).
The DDB describes what is stored in the RDB:

The records you want to store

The fields in each record

How the fields are indexed

What the fields look like to users and

How the records and indexes are constructed
The RDB stores the data and related indexes.
DBMS Architecture  25
In addition, application databases may have a special purpose RDB file to store queued
updates. Optionally, your application may also have thesaurus databases (TDBs);
thesauri are stored in a special purpose database that has a combination of DDB and RDB
characteristics. Journals can be used with DDBs, RDBs, queues, and thesauri.
The following sections describe these in further detail.
Databases
A BASIS DBMS contains one or more DDBs, one or more RDBs, an Authority Database
(ADB), and a Master Definition Database (MDDB).
Definition Databases (DDBs)
A DDB is a special purpose Record Database that stores the Actual Data Model (ADM),
User Data Model (UDM), and Structural Data Model (SDM) objects which define what is
to be stored in your RDB.
Creating and
Maintaining the
DDB
The DMDBA utility lets you query, update, and display
information in a DDB. DDBs are journaled and recovered the
same way as RDBs. There is no queue file associated with a
DDB; however, definition changes will accumulate in the form
of change orders until an APPLY is done, at which time all
pending changes are put into effect. Changes cannot be
selectively applied. If one change causes an error, none of the
pending changes is accepted.
The DDB contains statement records and system records.
Statement Records
Statement records can be manipulated via the DMDBA utility
and extracted by the DMDDBE utility to regenerate a source file
of the current definition. Statement record information can also
be queried and displayed by opening the Xdatabase.USER
model in FQM, just as you would a normal database.
System Records
System records are the condensed, “applied” form of statement
records used by other system modules (the Kernel, HVU, etc.)
APPLY
The DMDBA compilation of the statement records to create
system records is performed with the APPLY action.
26  DBMS Architecture
For more information about DDBs, see Database Definition and Development and see
“Changing the Definition Database.”
Record Databases (RDBs)
An RDB can consist of one or more files which are used to store data and indexes.
Initializing the
RDB
RDB files are created and reinitialized via DMDBA. The
initial size of the file will depend on whether FIXED or
DYNAMIC areas are used. FIXED areas are allocated to the
file when it is initialized. DYNAMIC areas cause the file to
grow gradually but may result in a more fragmented file.
Controlling file
extents
Only VMS offers control over the extent or size to be used for
a file, and that syntax is supported for BASIS datafile
descriptors. For more information about file extents, see BASIS
Reference, “VMS System Specifics.”
Updating the RDB
RDB updating can be done sharing access with other users, or
it can be done with exclusive access. How the database is
opened controls which type of access is employed.
OPEN/DB
Updating via OPEN/DB allows concurrency—many users can
be updating different records at the same time. The Kernel
controls locking of the database. Storage of data and
corresponding updating of related indexes are done online.
High Volume Update (HVU); Fundamental Query and
Manipulation (FQM); OpenAPI; Document Handler Interface
(DHI), if available to your operating system; and BASIS
Webtop provide for this type of updating. This kind of access
supports interactive updating.
OPEN/DIRECT
Updating via OPEN/DIRECT requires exclusive access to the
database by one user and is often 5 to 10 times faster than
updating using OPEN/DB. Because it bypasses the Kernel, no
journal records are created. HVU, DMQ, and DMR can use
this type of updating to load or update data and indexes. This
kind of access supports batch-oriented updating.
(DMQ is not available on Windows.)
The structure of RDB files is described in the “Structure of RDB Files” section later in
this chapter.
DBMS Architecture  27
Authority Database (ADB)
The ADB helps to control access to BASIS application databases. It contains a record for
each database and records for each registered user. Each database record indicates the
owner (creator) of the database, where its DDB is located, and the status of the database.
User records indicate each user’s database privileges, read and write access codes, and
priority. For more information about adding and changing information in the ADB, see
System Administration.
Master Definition Database (MDDB)
The MDDB stores the definition of DDBs and portions of Thesaurus Databases. When
updates are made to a DDB or TDB, the MDDB content is used to control construction of
objects in these databases.
The MDDB is delivered as part of the BASIS installation. This database cannot be
changed onsite; thus the structure of all DDBs is fixed, but the data, the definition of each
database, can vary. Occasionally, the MDDB changes (for example, when support of a
new database object, such as a new parameter for one of the definitions in the ADM, is
introduced). When this happens, it occurs with a new release of BASIS. Changes to the
MDDB normally require DDB or TDB database system records to be regenerated using
the new MDDB.
For information about the MDDB, see System Administration. For more information
about regenerating system records, see Database Definition and Development.
Support Databases and Features
Optional files, databases, and features supporting a DBMS include queue database files,
one or more Thesaurus Databases (TDBs), database journal files, national language
support features, and a markup and style guide.
Queue Database Files
UNIX, VMS
A queue file is a special type of database file. The queue can be used as a holding area
for RDB changes that are applied as batched updates or for records that do not pass
validation. Only the primary key index is maintained for records in the queue, and the
structure of that index is located in the queue file. The internal structure and format of the
queue file is similar to that of regular data files. Multiple queue files can be defined for a
single database. The queue file can contain only queue AREAs; it may not include
regular data or index AREAs.
28  DBMS Architecture
Windows
A queue is a special type of database file. When a record is checked out of the database,
the system automatically creates a small record (referred to as Queue Status Record)
containing information relevant to the checked-out status of that record. The queue is
where that information is stored.
Queue Status Records contain not only information about the CHECKOUT status of the
record but also the user ID of its owner (the user who checked it out), the date and time
that the record was checked out, and optionally, a comment added and stored in the
Queue Status Record during the CHECKOUT operation.
The internal structure and format of the queue file is similar to that of regular data files.
Multiple queue files can be defined for a single database. The queue file can contain only
queue AREAs; it may not include regular data or index AREAs.
For more information about queue management, see “DMBS Services” and “Using
Queues.”
Thesaurus Databases (TDBs)
A thesaurus is a BASIS database; its structure is defined in the MDDB. Unlike a regular
RDB, both the thesaurus definition and data are stored in a single file.
Thesaurus Manager
(TM)
The Thesaurus Manager module lets you define and update the
definition of the relations in the thesaurus and also change the
contents of the thesaurus. For more information about the
Thesaurus Manager module, see Thesaurus.
Use with a RDB
A thesaurus can have a very active or passive role for a given
application database.
DBMS Architecture  29
Search control
A thesaurus may be used to facilitate user queries. When a user
states the terms of interest to be searched, the supplied terms are
looked up in the thesaurus for related terms, and the search is then
broadened to include all the similar terms. You can also have
user-supplied terms replaced with preferred terms, and you can
notify users of related terms to assist users in refining search
terms.
Verify
A thesaurus can be used to ensure that only values defined in the
thesaurus are entered for a field.
Data switch
A thesaurus can be used to switch data supplied for a field to
preferred terms.
How a thesaurus is used by a specific database is determined in the ADM of the
application DDB. For more information about the SEARCH_CONTROL_SET and
THESAURUS_DATA_CONTROL_SET definitions, see Database Definition and
Development.
Use with multiple
RDBs
If the thesaurus is tied into validation or expanding searches, it
must be controlled by the same Kernel as the application
database. When you open an application database that references
a thesaurus, the TDB is opened implicitly.
Use with DMNAM
If you are using the BASIS Network Access Method (DMNAM)
utility, thesauri should not be declared as global databases
because the implicit open bypasses the Global Database
Directory. If the TDB cannot be opened, the open of the
application database fails also. For more information about
DMNAM and the Global Database Directory, see System
Administration.
Database Journals
The Kernel can record interactive, OPEN/DB updates in journal files. The Kernel saves
changes made to the RDB by sequentially writing transactions to the journal file before
the changes are committed to the RDB. Once a transaction is FINISHed, the transaction
in the journal is marked as a committed change. The journal contains Start and End
Transaction entries and Image entries.
Note:
30  DBMS Architecture
OPEN/DIRECT updates and DMQ updates are not journaled.
Journaling provides a basis for recovery from damaged DDBs, RDBs, queues, and TDBs.
If a database is damaged by either software or hardware failure, restoring a prior version
of the database and replaying the journals enables you to restore a damaged database
without significant loss of time and data. You may also back out updates from a database
if "before images" are kept in the journals
Journal files are defined in the DDB SDM JOURNAL statement. The DMDBA utility
controls journal usage.
For more information about the DMJ utility, which provides journal REPLAY and
BACKOUT facilities, see “DMJ, Journal Processor.” For more information about how
journaling works, see Database Definition and Development. For information about
using journals to recover from database damage, see “Recovery.”
National Language Support Features
Among other system resources in the BASIS system are National Language Support
features that adapt BASIS products to local languages. Among the National Language
Support features are tools to modify messages and prompts. For information about using
these features, see the NLS online help file in the DMHELP area of BASIS. For
information about accessing online help about a utility on your operating system, see
“Introduction.”
Markup and Style Guide
The BASIS system uses a special system file called the markup and style guide to handle
rich text documents—namely, documents whose text is enhanced by variations of font,
renditions (bolding or italics), and formatting. A markup and style guide provides the
following capabilities:

Lets you import documents produced on word processors or other document
processing systems into your database
DBMS Architecture  31

Lets you insert markup to indicate text enhancements, formatting, or special
characters in documents created in a simple file editor.

Lets you see enhancements, formatting, and special characters when viewing
documents in your database

Lets users with different terminals, printers, personal computers, and workstations
take advantage of the special presentational features of their equipment

Lets you export documents from the database restored in their original format
As illustrated in Figure 2-1, a markup and style guide consists of markup definitions,
converters, presenters, context unit parsers, and various data resources.
Documents
Markup and Style Guide
Markup Definitions
Converters
Presenters
Context Unit Parsers
Data Resources
BASIS
Database
Figure 2-1: Role of a markup and style guide
Markup
definitions
These are statements that define the meaning of tags you use in
marking up a document. The default markup and style guide
supplies a standard set of markup tag definitions which you can add
to or modify.
Converters
These change the external format of a document to the BASIS
internal storage format. They reverse the process on export. The
default markup and style guide supplies you with a standard set of
converters. You can define as many converters as you need.
32  DBMS Architecture
Presenters
These map a document from its descriptive form to a presentational
form for display or print. The system uses a presenter to translate
markup tags to presentation procedures, which may include
specifications for type font, size, rendition, word wrap, and other
visual attributes. A presenter can also hide or reveal components of
a document. You could use this hide/reveal capability, for example,
to suppress proprietary parts of a report and thus create a “generic
version.” The default markup and style guide supplies you with a
standard set of presenters. You can define as many presenters as
you need.
Context unit
parsers
These help in indexing text fields for proximity and phrase
searching. A context unit is a sentence, paragraph, or some other
portion of text. In the style guide, you can define how the system
will locate context units when importing a document. Context unit
parsers define sentence delimiters, abbreviations, etc. The default
markup and style guide supplies default context unit parsers suitable
for most applications.
For more information about the default markup and style guide and how to make changes
to it, see Markup and Style Guide.
Structure of RDB Files
A functional database is composed of many individual files. Some are simple sequential
files containing Data Definition Language (DDL) or BASIS commands and parameters.
Other files are specially structured to contain database definitions, data, and indexes. The
structure and function of files that comprise the RDB is briefly described in this section.
The RDB consists of files composed of areas, which are composed of pages, which are
composed of records, which are composed of segments.
Files
The RDB can be composed of one or more physical files.
Multiple versions of the database can be defined, in which case
another full set of RDB files is created for each version. All
versions must be governed by the same DDB.
DBMS Architecture  33
Areas
Each data file has one or more AREAs. Every record type and
index must be assigned to an AREA, which is then assigned to a
FILE. In this manner the user determines the contents of each file.
You assign different record types and indexes to areas of files by
defining FILES and AREAs in your SDM.
Pages
Data records are stored in units of pages. Page size differs on
various operating systems, but a page is always the finest
granularity possible for read/write operations. For 32-bits-perword machines, up to 60 records can be stored on a single page.
A page must be at least 2000 bytes long. Its size is fixed but
depends on the I/O system of the host computer. The page size is
selected so it can be used efficiently. Each page contains a list of
records stored on that page.
Records
All of the substructures of a file are constructed from records. A
record may be a special system record, a reference or node record
(used to form the indexes), or a data record. Data records can be
data or supporting data structures. A page could contain records
from each of the categories. A user sees only views of data
records. The same scheme is used to allocate, write, and read
every kind of record.
The record pointer describes the physical address of a record in
the database. A record pointer contains a file number (data
records can be distributed across files), a page number, and a
record number. Record pointers are used in BASIS index
structures to reference a record.
Segments
The basic unit of storage of a record is the record segment, the
piece of a data record that resides on one database page. One or
more segments are used to store a record. For conventional
documents only one segment is necessary. For continuous and
sectioned 1 documents, multiple segments are used.
Text streams, which typically require large amounts of storage, are
broken into segments that are linked.
1
34  DBMS Architecture
Sectioned records are not available on Windows.
Structure of Indexes
Indexes are carefully organized lists of index terms which allow the data of a field to be
searched quickly.
Indexes
Three types of indexes can be created:
EXACT
Content of a field is indexed identically as it appears in the field.
UNIQUE
A unique index is a special case of an exact index in which no two
index terms can be the same.
INCLUSIVE
An index term is created for each word contained in the field. This
type of index is used for text fields (character strings or text
streams). Common words that are of no interest when searching
can be excluded from the index by using a stopword list. Numbers
can optionally be excluded. A breaklist is used to determine rules
for distinguishing words from punctuation.
For an overview of indexes and how they are defined in the DDB, see Database
Definition and Development, “Indexes.” Following are details about how indexes are
structured.
B-tree Index Structure
A list of unique values for a field is used to form an index directory. This directory is
kept in a structure known as a B-tree. BASIS stores information about the number and
location of terms in a series of B-trees. Figure 2-2 displays a simplified version of a
series of B-trees storing index information for an indexed simple field that contains the
following terms and numbers of occurrences:
Field Value
Occurrences
Record Numbers (key values)
BLUE
7
3, 8, 12, 15, 18, 20, 22
CHARTREUSE
1
1
GREEN
3
2, 4, 6
DBMS Architecture  35
Root Node
BLUE
CHARTREUSE
OTHERS
BLUE
See REF B-TREE
CHARTREUSE
REC1
REC4
OTHERS
REC8
REC15
REC20
OTHERS
REC3
REC8
GREEN
See REF B-TREE
REC22
REC2
REC4
REC6
REC18
REC20
REC12
REC15
Terminal Nodes
Figure 2-2: B-tree index structure
A B-tree is formed from special variable-length node records which are arranged in a
hierarchical or tree structure. The top node of Figure 2-2 is the root node of the tree.
This particular structure has a term B-tree with two levels and two reference B-trees
beneath the blue and green terms. Sometimes a B-tree can be a single node, but usually it
is a two- or three-level structure. BASIS can handle a tree with 16 levels.
Each term B-tree node contains entries where part of the entry is a key. The keys are kept
in alphabetic or numeric order (depending on the key type) within each node. When
linked together, the keys in entries of the terminal nodes form an ordered list of all
references.
The key type is related to the data type of the corresponding field. The key can be:

A binary integer value

A binary floating point value
36  DBMS Architecture

A packed decimal string

A fixed-size character string (1:250 characters)

A variable-size character string (1:250 characters)
Fixed-size keys form fixed-size entries in the nodes and are very easy to manipulate.
Variable-size character keys are important when a field may have values that significantly
vary in size. Variable-size keys require a slightly more complicated node format, but are
efficiently handled by BASIS.
The higher-level nodes have abbreviated entries that have a key and a pointer to a lowerlevel node. In non-terminal nodes, each entry is derived from the highest entry of the
node it references.
Term entries
The non-terminal nodes typically contain entries for each term
in the index (regardless of whether the index type is unique,
exact, inclusive). The size of the entry key depends on the size
of the index term. The entry data comprises either lists of the
records which include the term or a pointer to a subtree of
nodes that contain the reference tree for records with the term.
Terminal nodes
The terminal nodes (leaf nodes) contain entries that have a key
and one or more words of reference data.
References
Reference data, or references, consist of a unique record
pointer and reference details.
Unique record pointer
The unique record pointer identifies a specific record
which has a field value representing the term which
owns the reference. Unique record pointers can be
direct or indirect. (See Figures 2-3 and 2-4.)
DBMS Architecture  37
Reference details
Reference details, also commonly called postings and
proximity information, require one or two words of
storage, depending on the index requirements.
Reference details include the file number, page number,
and record number (or the system key).
Reference details in the reference B-tree can be 32 or 64
bits long (one or two computer words depending on the
platform). Simple fields with unique/exact indexes have
just 32-bit postings because they do not require any
word-level proximity information. Compound
unique/exact and all inclusive indexes are 64-bit
postings by default.
Information in postings differs depending on the type of
index (unique, exact, and inclusive) and the proximity
options elected. For a UNIQUE index the single
reference to the term is stored with the key in the
terminal node of the term tree, and no reference tree is
needed.
Following is a list of reference details for various types
of indexes:
UNIQUE | EXACT
mfisi, sect#, cntx#
INCLUSIVE
mfisi, sect#, cntx#, token#
where
38  DBMS Architecture
mfisi
is the multi-field index field indicator,
the field from which the index reference
is produced. It is used only with multifield indexes (MFIs). For more
information about multi-field indexes,
see Database Definition and
Development.
sect#
is the section number. It is used only in
sectioned documents 1.
cntx#
is the context unit number. How this is
determined varies depending on the field
definition. For a unique-indexed or
exact-indexed compound standard field,
the context unit is the subfield number.
For an inclusively indexed text stream,
the context unit is normally the sentence.
token #
is each piece of text in the field being
indexed. Each token (including
STOP_WORDS) is numbered. The
BREAK_LIST used with the index
determines what characters delimit text
tokens. For more information about
BREAK_LIST, see Database Definition
and Development.
Text field postings
For textual fields, reference details can also identify where in
the field each term is referenced. For an INCLUSIVE index,
a posting might provide the information that the term is
referenced at the nth word of the nth sentence of the
referenced record’s field.
Reference tree
If a key references only one record, the reference is in the
entry. Keys that require two small references (just a record
pointer) also carry the references in the entry. The keys that
have more references point to a reference tree, which can
accommodate 2000000000 references. These key entries
indicate how many references are in the tree, how many
different records are referenced, and where the root of the
reference tree is located.
Ordering
Reference entries are stored in unique record pointer order.
This is done for efficiency of updating (looking for old
references) and search operations. FIND command
processing makes use of the ordering in intermediate set
manipulations which involve merging and comparing
references.
DBMS Architecture  39
Index Structure
The SDM statement RECORD_STORAGE parameter
INDEX_STRUCTURE controls what is used for unique
record pointers in user-defined indexes. References can be
direct or indirect. For a list of advantages and disadvantages
of each structure and a comparison of functionality, see
Database Definition and Development, “Indexes.”
Direct Indexing
1
When a direct index is constructed, the physical location
of the record is used to locate the record. The unique
record pointer in this case is the record’s record pointer.
Sectioned records are not available on Windows.
Direct indexing offers more
efficient retrieval than indirect
indexing because BASIS does
not have to look up the physical
address
RDB
INDEX
1
2
Physical Location
Figure 2-3: Direct indexing
If the index structure is direct, references contain the
physical location of the record pointer, and the record
location is directly available.
40  DBMS Architecture
With direct indexes, a FIND command without ORDER
BY or SORT BY orders the result set in record pointer
order. Since the record pointers have no particular
relationship to user data, members in result sets with
direct indexes seem to be in an unpredictable order.
Note: Using area distribution lists with direct indexes has significant ramifications for
the optimal setting of the insert method; see “Node splitting” below. For more
information about defining an area distribution list and RECORD_STORAGE, see
Database Definition and Development, “SDM Definitions.”
Indirect
Indexing
With indirect indexes, the reference’s unique record pointer is
the value of the primary key (date key or system key or user
key) for the associated record. As a result, the references in the
index are stored in primary key order. The primary key index
is used to look up the record pointer. The key data of the
primary key index is the only index that points directly to the
records using record pointers.
Because references in the index are stored in primary key
order, members in result sets are displayed in primary key order
unless otherwise sorted by ORDER BY or SORT BY on the
FIND command. Indirect indexes are also necessary for
STORE/SET and RESTORE/SET FQM commands.
RDB
Physical System
Location
Key
INDEX
4
1
2
1
2
Figure 2-4: Indirect indexing
DBMS Architecture  41
Because references in the index are stored in primary key
order, members in result sets are displayed in primary key
order unless otherwise sorted by ORDER BY or SORT BY on
the FIND command.
Capacity
Proximity information (section, context, token) is stored in the
reference details of an index reference. With the CAPACITY
parameter of the SDM RECORD_STORAGE definition, you
can tailor how much of the posting is spent for the different
levels of proximity. In other words, you can specify how
much of the posting capacity to allocate for any one piece of
proximity information based on the data you have. For more
information about capacity options available, see Database
Definition and Development.
Node splitting
An index node can hold only a finite number of entries, so at
some point an insertion of a new entry will cause a full node
to split into two additional nodes, each representing a subset
of the previous node.
42  DBMS Architecture
Padding
To avoid constant splitting, nodes are normally constructed
with space for future insertions. The amount of space left in
nodes for future insertions is called padding.
If many insertions are made, the padding will be consumed
and a node split will occur. This could result in a new level in
the B-tree. Generally, keeping the number of levels low
results in less space to contain the nodes and fewer I/Os to
locate leaf nodes of interest. On the other hand, if you have
more padding than you need, you waste space in the nodes.
Insert method
Insert method determines the general rule for padding when
new nodes are constructed and thus influences node splitting.
Insert methods are defined for each field indexed with the
SDM INDEX statement. For more information about rules for
selecting an insert method, see Database Definition and
Development, “Indexes.”
Note: If updating patterns should change, you can change the insert method defined for
an index after the index is built. You can do so without having to re-index a field.
Term nodes
The SDM INDEX statement INSERT_METHOD parameter
for TERMS determines how new term nodes are created
when an existing term node is filled and thus must be split.
New nodes are created half, 3/4, or entirely empty.
If the terms are coming in a random order (such as a textual
field), having space reserved allows for some insertions
before the node must split again.
If terms are entering the system in a sequential order (such as
an ever-increasing date or key value), new terms will
naturally fall at the end of the index, and it is not necessary to
reserve free space for future insertions.
DBMS Architecture  43
Reference nodes
Reference nodes also fill up and split, and thus also benefit
by padding. Since there are typically many more references
than terms, reference node padding is of more importance
than term node padding. The SDM INDEX statement
INSERT_METHOD parameter for REFERENCES controls
this in the same manner as the INSERT_METHOD for
TERMS. New nodes are constructed half empty for
RANDOM inserting, 3/4 empty for
MOSTLY_SEQUENTIAL inserting, and entirely empty for
SEQUENTIAL inserting. A typical inclusive index will have
its term tree grow rapidly at first, then slow down after most
of the vocabulary is entered. The reference tree, however,
will continue to grow as long as records are added to the
database. For this reason, it is important to understand the
difference between the term tree and reference tree growth
patterns and assign an appropriate setting for the reference
insert method.
Replacing records
You typically think of padding issues when loading new
records, but replacing records may impact node splitting in
the same manner.
A SEQUENTIAL insert method or little or no padding
generally would not be appropriate for record types involving
a significant volume of replaces.
Area distribution list
cycles
44  DBMS Architecture
If you have direct indexes and are also using an area
distribution list, a RANDOM insert method is appropriate.
The reason relates back to the unique record pointer used for
direct indexes; the high order component of the record
pointer is the file number of the record. This file number
cycles with area distribution. Therefore, if new data records
are being cycled into several physical files, the associated
index references will “cycle” also, in terms of where they are
inserted into the reference B-tree. As a result, even if you are
logically adding references sequentially, the use of this
feature results in random-like references. For more
information about area distribution and about the AREAS
parameter of the SDM RECORD_STORAGE statement, see
Database Definition and Development, “SDM Definitions.”
Index Balancing
BASIS uses efficient algorithms to search a tree and insert or
delete entries from a tree. Since nodes are kept in variablesize records, the size of each node may vary depending on
the number of entries kept in a node. It is important to keep
the levels balanced so the access path to any terminal node is
the same (and as short as possible). This provides a
consistent access time to any key in the tree.
The update algorithms use careful replacement techniques so
a pointer is never updated in a file before the data it
references has been written to the file. This prevents
problems resulting from “broken” structures caused by
catastrophes that occur during an update.
Extensive updating (key insertions and deletions) may result
in excessive node splits to the point where indexes may grow
much larger (more levels) than needed. In this condition,
searches will take longer to access the leaf nodes of the tree,
and database files may grow much larger than necessary.
Compression
The excess space may be removed with DMR
ACTION=COMPRESS. Compression is supported on files
used to store indexes, not record data.
ACTION=COMPRESS will rebalance the trees with the
node padding specified (the padding specified does not have
to match the insert method). For more information about
compression, see “DBMS Services” and see “DMR,
Restructure.”
Note: HVU OPEN/DIRECT always inserts no padding on an initial load. It ignores
the insert method. It optimizes this case because the index references in the initial load
are in sorted order. As a result, a very compact index is created. Depending on your
insert method and data loaded, the next significant loading of data may result in a
significant increase in the size of the files storing large indexes. The rate of growth,
however, will stabilize as the previously compact index nodes reach padding thresholds.
If desired, you can use compress and change the padding to avoid such a transition.
DBMS Architecture  45
Multi-field index
The multi-field index (MFI) is a special type of index
structure. It is populated with data from several fields (all
from the same record type). In this case, the reference tree
contains information about the field in which the term
appeared as well as all the other proximity information.
Because there may be a large number of references, it is
especially important to set the REFERENCES
INSERT_METHOD appropriately to RANDOM.
In general, searching an MFI is preferred over a search on an
equivalent field list because only one index structure needs to
be touched. However, an MFI should not be used as a
substitute index for a field that is frequently searched. Since
the MFI has data for multiple fields, using it to search a
single field requires extra processing to strip out the
irrelevant data.
46  DBMS Architecture