Creating and Maintaining a Database
The DBA's Job
New York Conference 2005

Tasks
– Design
  – Logical design
  – Physical design
  – Documentation
– Implementation
  – Test
  – Performance
  – Security
  – Concurrent updates
– Maintenance
  – Backup
  – Recovery
  – Data integrity
– New releases
  – SIR
  – Application

Designing a Relational Database
Normalization
– Eliminate redundant data
– Identify data dependencies
– Keys

1st Normal Form
– One value per column
– Unique primary key

2nd Normal Form
– No subsets of data in multiple rows of a table

3rd Normal Form
– All columns fully dependent on the primary key

Example
Order #  Customer #  Address  Part #  Description  Unit Price  Quantity  Total
1234     409         xxxxxx   10      xxxxxx       $10.00      5         50.00
1234     409         xxxxxx   20      xxxxxx       $15.00      3         45.00

Possible Tables
– Order: Order # – Customer #
– Order Item: Order #, Line # – Product Code – Qty – Unit Price ?
– Customer: Customer # – Address
– Product: Product # – Description – Unit Price

Keys
– Must be unique
– Good if real world (Employee Id, Product Code, etc.)
– May not be the only access required
– Should be short
– Avoid unformatted alphabetic keys
– If subordinate repeating group, consider a sequence number

Normalized Implementation
– Know the rules
  – Know the application
– Alternatives
  – How many repeats of a column/group?
  – Dependent data volatility/convenience

Document
– Variables: labels, descriptions
– Records: keys, variables, foreign keys

SIR Schema
Case Definition
– Case Id
– Max counts
Record Definition
– Key fields
– Max counts
– Default security
– Variables within records
Documentation command for case and record
Variable Definition
– Format & position
– Missing values
– Valid values
– Value labels
– Categorical variables
– Variable ranges
– Variable label
– Extended label for variable documentation
– Variable security

Schema Functions in PQL
– 60+ database functions
– 30 tabfile functions
– Examples: NRECS, RECNAME, NKEY, KEYNAME, NVARS, VARNAME, VARLABSC, VFORMAT, VTYPE
Secondary Index Functions
– DBINDS, DBINDR, DBINDV, DBINDT

Quick Data Dictionary
– Four record types
  – Variables
  – Records
  – Record keys
  – Record data
– Populate from any database
– Check consistency

Example Data Dictionary
– Create
– Populate from MNYR
  – 55 record types
  – 2216 variables in records
– Check consistent use of variables
  – Labels
  – Formats
  – Types
– Identify foreign keys
– Look at secondary indexes

SIR Structures
– Multiple databases
  – Until SIR2000, exactly one database per SIR session
  – Design suggests separate databases for separate hierarchies
  – Had to use 'dummy' cases in a single database
– Inverted lists
  – Until SIR2002, no secondary indexes
  – Had to use 'dummy' cases for an inverted list
– Auto increment keys

Physical Structure
– Single data file
– Two types of blocks
  – Index: contains keys plus pointers to other blocks; a single top level block; from one to six further levels; the bottom level points to data blocks
  – Data: contains keys and data

[Figure: SIR Data File, showing the top level index, index level 1 blocks, the bottom level index, and data blocks]

Data Record
– Header
  – Size
  – Update level
  – Lock status
– Separate key in front of record
  – All keys the same size in a single database
  – Case id, record number, record key fields
  – Special, so can be searched
– Record organized by data format
  – Real8, Real4, I4, I2, I1, Character

CIR
– One per case
– Count for each record type, e.g. max record types 100 gives 100 integers
  – I1: up to 123
  – I2: up to 32,000
  – I4: over this
– Common variables

Size of Blocks
– Index
  – Calculated from key size and number of records
  – Minimum 2K (253 dwords), maximum 32K
– Data
  – Calculated from maximum record size and number of records
  – Minimum 2K (254 dwords), maximum 32K
  – Minimum 4 records per block

Index Calculation Example
Key length 16, number of records 1 million
– At 4 records per block, need 250,000 data blocks
– Each index entry takes 3 dwords (key in dwords + 1 dword for pointer & count)
– Minimum block holds 253/3 = 84 entries
– Top level: single block, 84 entries
– Second level: 84 blocks, each with 84 entries
– Two level index points to 7,056 data blocks
– Three level index points to 592,704 data blocks, so 250,000 data blocks need three levels

Data Block Growth
Add first record: start with 3 blocks
– Top level index: 1 entry, key of record, points to second level
– Second level index: 1 entry, key of record, points to data block
– Data block: 1 entry
Add records
– Find data block
– Put record in block in correct key sequence
– If new record is first in block, update the higher level entry that pointed to the block
Block too big
– Split into two blocks
– Add new key to higher level

Loading Factor
How to split a block
– Records being added randomly
  – Split block in middle
  – Any block added to is likely to have space
  – .5 loading factor
– Records added in ascending key order
  – Leave original block as full as possible
  – All adding goes to the new block
  – .99 loading factor

Suggested Loading Factors
– Standard updates: .5
  – If set very high and activity is all on the original block, lots of empty new blocks
– IMPORT: .99
  – Let standard updates split blocks when needed
– RELOAD: .99
  – No block splitting
  – Block filled to loading factor
  – Space for largest record

List Stats Info
Number of Index Levels          2
Max Entries Per Index Block     42
Index/Data Block Size           253/3314
Active/Inactive Data Blocks     92/0
Active/Inactive Index Blocks    4/0
Keysize In Bytes                34
Min/Max Record Size             0/808

Inactive Blocks
– New blocks added at end
– List maintained of empty blocks
– Re-used when a new block is needed
– All records deleted (stand alone)
– Block update strategy in Master

Master
– Allows multiple users to update a database concurrently
– Intended primarily for multiple interactive users
– Communicates via TCP/IP
– Machine dependent database access
– Provides a consistent database view for independent retrievals from the database

Master Operation
– Start Master
  – Starts with an address
  – Waits for client message
  – Does nothing else, NO database access
– Master is NOT permanently connected to any specific database
– Client tells Master which database to connect to

Database Access
Database is opened and closed during a SIR session as needed
– During PQL retrievals
– During utilities (no Master): batch data input, export, unload, spreadsheet, …
– During schema updates (no Master)
– 'Old' forms, while the form is running

Database Access
– Database open for write
  – Single user
  – Master
– Exclusive use
– Shared read

SIR database files
– sr1/sr2: metadata, needed by both client (read only) and Master
– sr3: data, controlled by Master
– sr4: procedures, controlled by client
– sr5: journal, controlled by Master
– sr6: secondary index, controlled by Master

Single user SIR allows multiple readers OR a single writer to a database.

Master: How it Works, Part 1
[Figure: SIR (Copy 1), User A reads & writes exclusively; OR SIR (Copy 2) and SIR (Copy 3), Users B & C share read only]

Master: How it Works, Part 2
[Figure: Master allows multiple writers and readers plus independent readers. SIR (Copy 1) and SIR (Copy 2), Users A & B read & write through Master; SIR (Copy 3), User C independent read only; all on the SIR data file]

How does Master work?
– Client changes access to use Master
– Lock manager for clients accessing through Master
– Delayed view of updates
– 'Difference File Copy' for independent readers

Client
– SIR session switches from single user to use a specific Master
  – Master must be available at this point
  – All subsequent retrievals then automatically use Master
  – Various utilities not available
– Sends Master a request for a single data record at a time
  – Master selects on key values
  – Client does any selection on data values
– Data requests preceded by lock requests

Master
– Gets initial logon from client
  – Allocates identifier
– Gets database open from client
  – Checks if already known (open by another client)
  – Allocates identifier

Database Identification
– Full pathname is passed by client
– Path is 'as seen' by client
  – Client needs to find database
  – Master needs to find database
  – Master needs to know that a database referenced by multiple clients is the same database

Master Resource Control
– Gets request for lock on resource (e.g. case/record key)
  – Checks lock table
  – Creates entry if resource available
– Gets request for record retrieval
– Gets request for record update

Lock Types
– Transmitted by client from PQL
– Checks existing lock on resource
1 = Null: becomes exclusive in update, concurrent read in retrieval
2 = Concurrent read: fails if exclusive
3 = Concurrent write: fails if protected or exclusive
4 = Protected read: fails if concurrent write, protected write or exclusive
5 = Protected write: fails if concurrent write, protected or exclusive
6 = Exclusive: fails if any prior lock

Locks in PQL
– Ignored in single user mode
– RETRIEVAL
  – LOCK = CR, CW, PR, PW, EX (2, 3, 4, 5, 6)
  – CIRLOCK, RECLOCK
  – Default: update EX, retrieval CR
– CASE/RECORD commands
  – LOCK = numeric_expression
  – Nested cases/records inherit the outer lock
  – Lock held until NEXT or EXIT at this level

Locked Case/Record
– Block is entered
– Variables set to undefined
– Test status with functions
  – SYSTEM(36) = 1: record available
  – SYSTEM(37) = 1: case available
  – SYSTEM(38) = 1: Master mode
– Wait and retry, tell user with option, …
  – RETRY CASE|RECORD

'Delayed' Updates
– Enables an independent retrieval to have a consistent view of data, i.e. no updates seen while the retrieval is running
– Master creates a local copy of the master index block
– Whenever an index or data block is rewritten for the first time, Master allocates a new block
– Keeps list of redundant blocks (index & data)
– Identical process on secondary indexes

Difference File Copy
– Makes updates available to independent retrieval
– Increments update level
– Creates journal header
– Writes master index
– Makes redundant blocks available if no other users (can get exclusive access)

Managing Master
Start parameters
– MST =
– PW =
– DFC =

MST = parameter
– Master
  – MST = : Master finds machine name, port 3000
  – MST = even_number: changes port number to even_number
– Client
  – MST = machine_name[:port]
  – The machine name consists of a host and a domain.
  – Quoting both the host and domain name (DNS) makes start up faster for clients
– Start Master: "Master started SirNT:3000"
– Start Forms: MST=SirNT or MST=SirNT.sir.com.au

Other Parameters
– PW = password
  – Any remote user who wants to administer Master must specify a matching password
– DFC = number of minutes since the last difference file copy after which an automatic copy is forced

Administering Master
– Interrupt
  – No users being served
  – Commands
  – No password
  – Usage statistics
– Remote
  – Other users still active
  – Menu driven
  – Password protected

Administering Master
– List logged on users
– List attached databases
– Stop
  – Immediately
  – After users log off
– Difference file copy
  – Set interval

Backup and Recovery
Unload
– Header
– Internal copy of database
– Machine specific/SIR version specific
– Brings all unloaded records up to the current schema definition
– Can have multiple unloads on the same file
– Accessed sequentially
Journal
– Header
– Copy of database record after update

Update Level & Restructure
– Update level incremented when the database is opened/closed for update
– When a record is written, the update level is held in the record header
– If the schema changed, old and new versions are kept with the update level
– When a record is read, if its update level is earlier than the schema change, the record is transformed
– When the record is written, it is in the new format

Immediate Unload
– Length of key changes
  – Record type is in the key, so an increase in its length (number of record types > 123) forces an immediate unload
– Key definition changes for an existing record
– Record is locked until reloaded

Reload
– Reload takes a specific unload
  – Defined by update level
– Database is recreated
– If there is a complete set of journals, they can be applied to bring the reloaded database up to date
– ITEMIZE lists unloads or journals

EXPORT
– Text version of database
– Machine independent
– IMPORT rebuilds completely
– SIR version independent
– Choose for long term archive

VERIFY
– Walks index
  – Retrieves each data block
  – Checks counts and pointers
  – Reports structural problems
  – Patch puts calculated values in counts
  – Clears corruption flag
– Walks secondary indexes
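The CIR slide gives the integer width used for each record-type count: I1 up to 123, I2 up to 32,000, I4 beyond that. A minimal Python sketch of that choice follows. It assumes the limits are inclusive and that the width is chosen from the largest count expected for the record type. `cir_count_width` is a hypothetical helper, not a SIR function.

```python
def cir_count_width(max_count: int) -> str:
    """Return the integer format for one CIR record-type count.

    Hypothetical helper: the limits come from the CIR slide and are
    assumed to be inclusive upper bounds.
    """
    if max_count < 0:
        raise ValueError("a record count cannot be negative")
    if max_count <= 123:
        return "I1"
    if max_count <= 32_000:
        return "I2"
    return "I4"


# With 100 record types the CIR holds 100 such counts, one per record type.
for count in (100, 5_000, 250_000):
    print(count, cir_count_width(count))
```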
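The Index Calculation Example can be checked with a few lines of arithmetic. The helper below is hypothetical. It takes the slide's figures as inputs: 3 dwords per index entry, 253 dwords per minimum index block, and 4 records per data block. It counts levels the same way the slide does, with the single top level block as level 1. It takes the entry size as given and does not derive it from the key length.

```python
import math


def index_plan(n_records: int, records_per_block: int,
               entry_dwords: int, block_dwords: int = 253):
    """Return (entries per index block, data blocks, index levels, reach).

    Hypothetical helper mirroring the Index Calculation Example: each
    extra level multiplies the number of data blocks the index can reach
    by the number of entries that fit in one index block.
    """
    fanout = block_dwords // entry_dwords          # 253 // 3 = 84
    data_blocks = math.ceil(n_records / records_per_block)
    levels, reach = 1, fanout                      # top level: one block
    while reach < data_blocks:
        levels += 1
        reach *= fanout
    return fanout, data_blocks, levels, reach


print(index_plan(1_000_000, 4, 3))  # (84, 250000, 3, 592704)
```

Two levels reach only 7,056 data blocks. A million records at 4 per block therefore need the third level, as on the slide.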
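The Loading Factor slides describe two ways to split a full data block. The simulation below is a simplified sketch, not SIR's code:

- It models only data blocks. A sorted list of first keys stands in for the index above them.
- It uses a hypothetical capacity of 100 records per block.
- On a split, it keeps `round(loading_factor * capacity)` records in the original block and moves the rest to a new block that follows it.

```python
import bisect
import random


def load(keys, capacity, loading_factor):
    """Insert keys one at a time into data blocks, splitting when too big."""
    keep = min(max(1, round(loading_factor * capacity)), capacity)
    blocks = []   # data blocks, each a sorted list of keys
    firsts = []   # first key of each block (stands in for the index level)
    for key in keys:
        if not blocks:                          # first record: first data block
            blocks.append([key])
            firsts.append(key)
            continue
        i = max(bisect.bisect_right(firsts, key) - 1, 0)   # find data block
        block = blocks[i]
        bisect.insort(block, key)               # correct key sequence
        firsts[i] = block[0]                    # new record may be first in block
        if len(block) > capacity:               # block too big: split it
            blocks.insert(i + 1, block[keep:])
            firsts.insert(i + 1, block[keep])
            del block[keep:]
    return blocks


def fill(blocks, capacity):
    """Average fraction of each block's capacity in use."""
    return sum(len(b) for b in blocks) / (len(blocks) * capacity)


n, capacity = 10_000, 100
ascending = list(range(n))
shuffled = ascending[:]
random.Random(42).shuffle(shuffled)
for name, keys in (("ascending", ascending), ("random", shuffled)):
    for lf in (0.5, 0.99):
        blocks = load(keys, capacity, lf)
        print(f"{name:9} lf={lf:<4} blocks={len(blocks):4} fill={fill(blocks, capacity):.2f}")
```

With ascending keys, .99 leaves blocks about 99% full (101 blocks) and .5 leaves them about half full (199 blocks). This matches the slide's advice to use .99 for IMPORT and RELOAD.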
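The Lock Types slide defines when each lock request fails against a lock already held on the resource. Below is a minimal sketch of a lock table built from that list. It assumes that locks are keyed by a resource tuple such as a case id, and that a client never conflicts with its own lock. `LockTable`, `CONFLICTS` and `effective_mode` are hypothetical names, not part of SIR or PQL.

```python
CR, CW, PR, PW, EX = 2, 3, 4, 5, 6   # lock numbers from the Lock Types slide
NULL = 1

# Existing locks that make a request for each mode fail (from the slide).
CONFLICTS = {
    CR: {EX},
    CW: {PR, PW, EX},
    PR: {CW, PW, EX},
    PW: {CW, PR, PW, EX},
    EX: {CR, CW, PR, PW, EX},        # exclusive fails if any prior lock
}


def effective_mode(mode, updating):
    """Null becomes exclusive in an update, concurrent read in a retrieval."""
    if mode == NULL:
        return EX if updating else CR
    return mode


class LockTable:
    """Hypothetical sketch of a lock table: resource -> {client: mode}."""

    def __init__(self):
        self.held = {}

    def request(self, resource, client, mode, updating=False):
        mode = effective_mode(mode, updating)
        holders = self.held.get(resource, {})
        if any(held in CONFLICTS[mode]
               for other, held in holders.items() if other != client):
            return False                  # caller may wait and retry
        # Resource available: create (or update) the entry.
        self.held.setdefault(resource, {})[client] = mode
        return True

    def release(self, resource, client):
        holders = self.held.get(resource, {})
        holders.pop(client, None)
        if not holders:
            self.held.pop(resource, None)


table = LockTable()
print(table.request(("case", 17), "A", CR))   # True
print(table.request(("case", 17), "B", PR))   # True: PR is compatible with CR
print(table.request(("case", 17), "C", EX))   # False: EX fails if any prior lock
```

The conflict sets are symmetric. For example, concurrent write fails against protected read, and protected read fails against concurrent write. The order in which two clients ask therefore does not change whether they can share a resource.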
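The 'Delayed' Updates and Difference File Copy slides amount to copy-on-write with a published index. The sketch below models that idea with dictionaries instead of disk blocks. The following names are hypothetical:

- `ShadowStore`
- its methods
- the `readers_active` flag, which stands in for "can get exclusive access"

Journal headers and secondary indexes are left out.

```python
class ShadowStore:
    """Hypothetical model of Master's delayed updates (copy-on-write)."""

    def __init__(self):
        self.blocks = {}      # physical block number -> contents
        self.committed = {}   # master index seen by independent retrievals
        self.working = {}     # Master's local copy of the master index
        self.copied = set()   # logical blocks already moved since the last copy
        self.redundant = []   # superseded blocks an independent reader may need
        self.free = []        # blocks available for re-use
        self.next_block = 0
        self.update_level = 0

    def _allocate(self):
        if self.free:
            return self.free.pop()
        self.next_block += 1
        return self.next_block

    def write(self, logical, data):
        if logical not in self.copied:
            # First rewrite since the last copy: allocate a new block and
            # keep the old one for readers using the committed index.
            old = self.committed.get(logical)
            if old is not None:
                self.redundant.append(old)
            self.working[logical] = self._allocate()
            self.copied.add(logical)
        self.blocks[self.working[logical]] = data   # later rewrites in place

    def snapshot(self):
        """View taken by an independent retrieval when it starts."""
        return dict(self.committed)

    def read(self, view, logical):
        return self.blocks[view[logical]]

    def difference_file_copy(self, readers_active=False):
        self.committed = dict(self.working)   # write the new master index
        self.copied.clear()
        self.update_level += 1
        if not readers_active:                # exclusive access: recycle blocks
            for block in self.redundant:
                del self.blocks[block]
                self.free.append(block)
            self.redundant.clear()


store = ShadowStore()
store.write("data-7", "v1")
store.difference_file_copy()
view = store.snapshot()              # an independent retrieval starts
store.write("data-7", "v2")
print(store.read(view, "data-7"))    # v1: the retrieval sees no updates
```

The reader's view stays valid only while its old blocks are on the redundant list. That is why the sketch recycles them only when no other user is active.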
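The Update Level & Restructure slide describes lazy conversion. Old records are transformed when read and stored in the new format when written. The minimal sketch below assumes that each schema change is recorded as the update level at which it happened, plus a function that converts values from the previous format. The record layout and the two example changes (`add_region`, `salary_to_cents`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StoredRecord:
    update_level: int   # update level held in the record header when written
    values: dict


def add_region(values):
    # Hypothetical change at level 4: new variable, undefined in old records.
    return {**values, "region": None}


def salary_to_cents(values):
    # Hypothetical change at level 6: salary now held in cents.
    return {**values, "salary": values["salary"] * 100}


# (update level of the schema change, conversion from the previous format)
SCHEMA_CHANGES = [(4, add_region), (6, salary_to_cents)]


def read_record(rec, changes=SCHEMA_CHANGES):
    """Apply every schema change made after the record was last written."""
    values = dict(rec.values)
    for level, convert in changes:
        if rec.update_level < level:
            values = convert(values)
    return values


def write_record(values, current_level):
    """Records are always written in the current format."""
    return StoredRecord(update_level=current_level, values=dict(values))


old = StoredRecord(update_level=3, values={"name": "Smith", "salary": 250})
print(read_record(old))  # {'name': 'Smith', 'salary': 25000, 'region': None}
```

Stamping each written record with the current update level is what stops a conversion such as `salary_to_cents` from being applied twice.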