Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
C20.0046: Database Management Systems Lecture #26 Matthew P. Johnson Stern School of Business, NYU Spring, 2004 M.P. Johnson, DBMS, Stern/NYU, Sp2004 1 Agenda Previously: Indices Next: Finish Indices, advanced indices Failure/recovery Data warehousing & mining Websearch Hw3 due today no extensions! 1-minute responses Review: clustered, dense, primary, #/tbl, syntax M.P. Johnson, DBMS, Stern/NYU, Sp2004 2 User/ Application Transaction commands Let’s get physical Query update Query compiler/optimizer Record, index requests Transaction manager: •Concurrency control •Logging/recovery Read/write pages Execution engine Query execution plan Index/record mgr. Page commands Buffer manager Storage manager storage M.P. Johnson, DBMS, Stern/NYU, Sp2004 3 BSTs Very simple data structure in CS: BSTs Each node has two children: Binary Search Trees Keep balanced Each node ~ one item Left subtree: < Right subtree: >= Can search, insert, delete in log time log2(1MB = 220) = 20 M.P. Johnson, DBMS, Stern/NYU, Sp2004 4 Search for DBMS Big improvement: log2(1MB) = 20 Each op divides remaining range in half! But recall: all that matters is #disk accesses 20 is better than 220 but: Can we do better? M.P. Johnson, DBMS, Stern/NYU, Sp2004 5 BSTs B-trees Like BSTs except each node ~ one block Branching factor is >> 2 Data stored only in leaves Each access divides remaining range by, say, 300 B-trees = BSTs + blocks B+ trees are a variant of B-trees Leaves form a (sorted) linked list Better supports range queries Consequences: Much shorter depth Many fewer disk reads Must find element within node Trades CPU/RAM time for disk time M.P. Johnson, DBMS, Stern/NYU, Sp2004 6 B+ Trees Parameter n branching factor is n+1 Largest number s.t. one block can contain n search-key values and n+1 pointers 30 Keys k < 30 120 Keys 30<=k<120 240 Keys 120<=k<240 Keys 240<=k Each node (except root) has at least n/2 keys 40 50 60 Next leaf 40 50 60 M.P. Johnson, DBMS, Stern/NYU, Sp2004 7 Searching a B+ Tree Exact key values: Start at the root If we’re in leaf, walk through its key values; If not, look at keys K1..Kn If Ki <= K <= Ki+1, look in child i Range queries: Select name From people Where age = 25 As above Then walk left until test fails M.P. Johnson, DBMS, Stern/NYU, Sp2004 Select name From people Where 20 <= age and age <= 30 8 B+ Tree Example Find the key 40 n=4 80 40 80 20 60 100 12 0 140 20 < 40 60 10 15 18 20 30 40 50 60 65 80 85 90 30 < 40 40 10 15 18 20 30 40 50 60 65 80 85 90 NB: Leaf keys are sorted; data pointed to is only if clustered M.P. Johnson, DBMS, Stern/NYU, Sp2004 9 Clustered & unclustered B-trees Data entries Data entries (Index File) (Data file) Data Records CLUSTERED Data Records UNCLUSTERED B+ trees, and, or Assume index on a,b,c Intuition: phone book WHERE a = ‘x’ and b = ‘y’ WHERE b = ‘y’ and c = ‘z’ WHERE a = ‘a’ and c = ‘z’ WHERE a = ‘x’ or b = ‘y’ or c = ‘z’ M.P. Johnson, DBMS, Stern/NYU, Sp2004 11 B+ trees and LIKE Supports only hard-coded prefix LIKE checks Intuition: phone book Select * from T where a like ‘xyz%’ Select * from T where a like ‘%xyz’ Select * from T where a like ‘xyz%zyx%’ M.P. Johnson, DBMS, Stern/NYU, Sp2004 12 B-tree search efficiency With params: the largest n satisfying 4n+8(n+1) <= 4096 is n=340 block=4k integer = 4b, pointer = 8b Each node has 170..340 keys assume on avg has (170+340)/2=255 Then: 255 rows depth = 1 2552 = 64k rows depth = 2 2553 = 16M rows depth = 3 2554 = 4G rows depth = 4 M.P. Johnson, DBMS, Stern/NYU, Sp2004 13 B-trees in practice Most DBMSs use B-trees for most indices Speeds up Default in MySQL Default in Oracle where clauses Some like checks Min or max functions joins Limitation: fields used must Be a prefix of indexed fields Be ANDed together M.P. Johnson, DBMS, Stern/NYU, Sp2004 14 Next topic: Advanced types of indices Spatial indices based on R-trees (R = region) Support multi-dimensional searches on “geometry” fields 2-d not 1-d ranges Oracle: CREATE INDEX geology_rtree_idx ON geology_tab(geometry) INDEXTYPE IS MDSYS.SPATIAL_INDEX; MySQL: CREATE TABLE geom (g GEOMETRY NOT NULL, SPATIAL INDEX(g)); M.P. Johnson, DBMS, Stern/NYU, Sp2004 15 Advanced types of indices Inverted indices for web doc search First, think of each webpage as a tuple One column for every possible word True means the word appears on the page Index on all columns Now can search: you’re fired select * from T where youre=T and fired=T M.P. Johnson, DBMS, Stern/NYU, Sp2004 16 Advanced types of indices Can simplify somewhat: 1. 2. For each field index, delete False entries True entries for each index become a bucket Create “inverted index”: One entry for each search word Search word entry points to corresponding bucket Bucket points to pages with its word Amazon M.P. Johnson, DBMS, Stern/NYU, Sp2004 17 Advanced types of indices Function-based indices Speeds up WHERE upper(name)=‘BUSH’, etc. create index on T(my_soundex(name)); create index on T(substr(DOB),4,5)); Now supported in Oracle 8, not MySQL Bitmap indices Speeds up arbitrary combination of reqs Not limited to prefixes or conjunctions Now supported in Oracle 9, not MySQL M.P. Johnson, DBMS, Stern/NYU, Sp2004 18 Bitmap indices Assume table has n records Assume F is a field with m different values Bitmap index on F: m length-n bitstrings One bitstring for each value of F Each one says which rows have that value for F Example: n = , mF = 1 , mG = 2 3 Q: find rows where 4 F=50 or (F=30 and G=‘Baz’) 5 6 M.P. Johnson, DBMS, Stern/NYU, Sp2004 F 30 30 40 50 40 30 G Foo Bar Baz Foo Bar Baz 19 Bitmap index search Larger example: (age,salary) of jewelry buyers: Sal. Age Sal. 1 25 60 2 45 3 4 Age Sal. 5 50 120 9 25 400 60 6 70 110 10 45 350 50 75 7 85 140 11 50 275 50 100 8 30 260 12 50 260 Bitmaps for age: Age 25:100000001000, 30:000000010000, 45:01000000100, 50:001110000010, 60:000000000001, 70:000001000000, 85:000000100000 Bitmaps for salary: 60:110000000000, 75:001000000000, 100:000100000000, 110:000001000000, 120:000010000000, 140:000000100000, 260:000000010001, 275:000000000010, 20 M.P. Johnson, DBMS, Stern/NYU, Sp2004 350:000000000100, 400:000000001000 Bitmap index search Query: find buyers of age 45-55 with salary 100-200 Age range: 010000000100 (45) | 001110000010 (50) = 011110000110 Bitwise or of Salary range: 000111100000 AND together: 011110000110 & 000111100000 = 000110000000 What does this mean? M.P. Johnson, DBMS, Stern/NYU, Sp2004 21 Bitmap index search Once we have row numbers, then what? Get rows with those numbers (How?) Bitmap indices in Oracle: CREATE BITMAP INDEX ON T(F,G); Best for low-cardinality fields Boolean, enum, gender lots of 0s in our bitmaps Compress: 000000100001 6141 “run-length encoding” M.P. Johnson, DBMS, Stern/NYU, Sp2004 22 New topic: Recovery Type of Crash Prevention Wrong data entry Constraints and Data cleaning Disk crashes Redundancy: e.g. RAID, archive Fire, theft, bankruptcy… Buy insurance, Change jobs… System failures: e.g. blackout DATABASE RECOVERY M.P. Johnson, DBMS, Stern/NYU, Sp2004 23 System Failures Each transaction has internal state When system crashes, internal state is lost Don’t know which parts executed and which didn’t Remedy: use a log A file that records each action of each xact Trail of breadcrumbs M.P. Johnson, DBMS, Stern/NYU, Sp2004 24 Media Failures Rule of thumb: Pr(hard drive has head crash within 10 years) = 50% Simpler rule of thumb: Pr(hard drive has head crash within 1 years) = 10% Serious problem Soln: different RAID strategies RAID: Redundant Arrays of Independent Disks M.P. Johnson, DBMS, Stern/NYU, Sp2004 25 RAID levels RAID level 1: each disk gets a mirror RAID level 4: one disk is xor of all others E.g.: Each bit is sum mod 2 of corresponding bits Disk 1: 11110000 Disk 2: 10101010 Disk 3: 00111000 Disk 4: How to recover? M.P. Johnson, DBMS, Stern/NYU, Sp2004 26 Transactions Transaction: unit of code to be executed atomically In ad-hoc SQL one command = one transaction In embedded SQL Transaction starts = first SQL command issued Transaction ends = COMMIT ROLLBACK (=abort) Can turn off/on autocommit M.P. Johnson, DBMS, Stern/NYU, Sp2004 27 Primitive operations of transactions Each xact reads/writes rows or blocks: elms INPUT(X) READ(X,t) copy transaction local variable t to element X OUTPUT(X) copy element X to transaction local variable t WRITE(X,t) read element X to memory buffer write element X to disk LOG RECORD M.P. Johnson, DBMS, Stern/NYU, Sp2004 28 Transaction example Xact: Transfer $100 from savings to checking A = A+100; B = B-100; READ(A,t); t := t+100; WRITE(A,t); READ(B,t); t := t-100; WRITE(B,t) M.P. Johnson, DBMS, Stern/NYU, Sp2004 29 Transaction example READ(A,t); t := t+100;WRITE(A,t); READ(B,t); t := t-100;WRITE(B,t) Action t INPUT(A) Mem A Mem B Disk A Disk B 1000 1000 1000 READ(A,t) 1000 1000 1000 1000 t:=t+100 1100 1000 1000 1000 WRITE(A,t) 1100 1100 1000 1000 INPUT(B) 1100 1100 1000 1000 1000 READ(B,t) 1000 1100 1000 1000 1000 t:=t-100 900 1100 1000 1000 1000 WRITE(B,t) 900 1100 900 1000 1000 OUTPUT(A) 900 1100 900 1100 1000 OUTPUT(B) 900 1100 900 1100 900 M.P. Johnson, DBMS, Stern/NYU, Sp2004 30 The log An append-only file containing log records Note: multiple transactions run concurrently, log records are interleaved After a system crash, use log to: Redo some transaction that didn’t commit Undo other transactions that didn’t commit Three kinds of logs: undo, redo, undo/redo We’ll discuss only Undo M.P. Johnson, DBMS, Stern/NYU, Sp2004 31 Undo Logging Log records <START T> <COMMIT T> T has committed <ABORT T> transaction T has begun T has aborted <T,X,v> T has updated element X, and its old value was v M.P. Johnson, DBMS, Stern/NYU, Sp2004 32 Undo-Logging Rules U1: Changes logged (<T,X,v>) before being written to disk U2: Commits logged (<COMMIT T>) after being written to disk Results: May forget we did whole xact (and so wrongly undo) Will never forget did partial xact (and so leave) Log-change, change, log-change, change, Commit, log-commit M.P. Johnson, DBMS, Stern/NYU, Sp2004 33 Undo-Logging e.g. (inputs omitted) Action T Mem A Mem B Disk A Disk B Log <START T> READ(A,t) 1000 1000 1000 1000 t:=t+100 1100 1000 1000 1000 WRITE(A,t) 1100 1100 1000 1000 READ(B,t) 1000 1100 1000 1000 1000 t:=t-100 900 1100 1000 1000 1000 WRITE(B,t) 900 1100 900 1000 1000 OUTPUT(A) 900 1100 900 1100 900 OUTPUT(B) 900 1100 900 1100 900 <T,A,8> <T,B,8> COMMIT <COMMIT T> M.P. Johnson, DBMS, Stern/NYU, Sp2004 34 Recovery with Undo Log After system’s crash, run recovery manager 1. Decide for each xact T whether it was completed <START T>….<COMMIT T> yes <START T>….<ABORT T> yes <START T>…………………… no 2. Undo all modifications from incomplete xacts, in reverse order (why?) and abort each M.P. Johnson, DBMS, Stern/NYU, Sp2004 35 Recovery with Undo Log Read log from the end; cases: <COMMIT T>: mark T as completed <ABORT T>: mark T as completed <T,X,v>: if T is not completed then write X=v to disk else ignore <START T>: ignore M.P. Johnson, DBMS, Stern/NYU, Sp2004 36 Recovery with Undo Log Start: … … <T2,X2,v2> … … <START T5> <START T4> <T1,X1,v1> <T5,X5,v5> <T4,X4,v4> <COMMIT T5> <T3,X3,v3> <T2,X2,v2> Q: Which updates are undone? Crash! M.P. Johnson, DBMS, Stern/NYU, Sp2004 37 Recovery with Undo Log Note: undo commands are idempotent How far back in the log do we go? No harm done if we repeat them Q: What if system crashes during recovery? Don’t go all the way back to the start May be very large Better idea: use checkpointing M.P. Johnson, DBMS, Stern/NYU, Sp2004 38 Checkpointing Checkpoint the database periodically Stop accepting new transactions Wait until all current xacts complete Flush log to disk Write a <CKPT> log record, flush log Resume accepting new xacts M.P. Johnson, DBMS, Stern/NYU, Sp2004 39 Undo Recovery with Checkpointing During recovery, can stop at first <CKPT> … … <T1,X1,v1> … … (all completed) <CKPT> <START T2> <START T3 <START T5> <START T4> <T4,X4,v4> <T5,X5,v5> <T4,X4,v4> <COMMIT T5> <T3,X3,v3> <T2,X2,v2> other xacts xacts T2,T3,T4,T5 M.P. Johnson, DBMS, Stern/NYU, Sp2004 40 Non-quiescent Checkpointing Problem: database must freeze during checkpoint Would like to checkpoint while database is operational Idea: non-quiescent checkpointing Quiescent: quiet, still, at rest; inactive M.P. Johnson, DBMS, Stern/NYU, Sp2004 41 Next time Next: Data warehousing mining! For next time: reading online Proj5 due next Thursday no extensions! Now: one-minute responses Relative weight: warehousing, mining, websearch Data mining techniques NNs GAs kNN Decision Trees M.P. Johnson, DBMS, Stern/NYU, Sp2004 42