Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Unit 12 Index in Database 大量資料存取方法之研究 Approaches to Access/Store Large Data 楊維邦 博士 國立東華大學 資訊管理系教授 6.1 Introduction The Role of Access Method/Index in DBMS Query in SQL: SELECT CUSTOMER. NAME FROM CUSTOMER, INVOICE WHERE REGION = 'N.Y.' AND AMOUNT > 10000 AND CUSTOMER.C#=INVOICE.C# DBMS Language Processor Internal Form: P(s (S SP) Optimizer Operator: SCAN C using region index, create C SCAN I using amount index, create I SORT C?and I?on C# JOIN C?and I?on C# EXTRACT name field Calls to Access Method: OPEN SCAN on C with region index GET next tuple . . . Calls to file system: GET10th to 25th bytes from block #6 of file #5 Operator Processor Access Method/Index File System database Wei-Pang Yang, Information Management, NDHU 12-3 The Internal Level Objectives: - concern the way the data is actually stored. - store data on direct access media. e.g. disk. - minimize the number of disk access (disk I/O). - disk access is much slower than main storage access time. Main Disk index CPU Buffer I/O index Wei-Pang Yang, Information Management, NDHU 12-4 The Internal Level (cont.) Physical database design: • Process of choosing an appropriate storage representation for a given database (by DBA). E.g. designing B-tree index or hashing • Nontrivial task • require a good understanding of how the database will be used. Logical database design: S P S P-SP SP data Wei-Pang Yang, Information Management, NDHU 12-5 6.2 Indexing Indexing: Introduction Consider the Supplier table, S. Suppose "Find all suppliers in city xxx" is an important query. i.e. it is frequency executed. => DBA might choose the stored representation as Fig. 6.2. City-Index (index) S (indexed file) Athens S1 Smith 20 London London S2 Jones 10 Paris London S3 Blake 30 Paris Paris S4 Clark 20 London Paris S5 Adams 30 Athens Fig. 6.2: Indexing the supplier file on CITY. Wei-Pang Yang, Information Management, NDHU 12-7 Indexing: Introduction (cont.) Now the DBMS has two possible strategies: <1> Search S, looking for all records with city = 'xxx'. <2> Search City-Index for the desired entry. Advantage: • speed up retrieval. • index file is sorted. • fewer I/O's because index file is smaller. Disadvantages: • slow down updates. • both index and indexed file should be updated when we insert new tuple. Wei-Pang Yang, Information Management, NDHU 12-8 Indexing: Multiple Fields Primary index : index on primary key. <e.g> s# Secondary index: index on other field. <e.g> city A given table may have any number of indexes. CITY-index Status-index S Athens S1 Smith 20 London 10 London S2 Jones 10 Paris 20 London S3 Blake 30 Paris 20 Paris S4 Clark 20 London 30 Paris S5 Adams 30 Athens 30 Fig. 6.3: Indexing the supplier file on both CITY and STATUS. Wei-Pang Yang, Information Management, NDHU 12-9 How Index are used? Consider: London <1> Sequential access : S (indexed file) accessed in the sequence defined by values of the indexed field. S1 ... ... London <e.g> Range query : "Find the suppliers whose S2 ... ... Paris city begins with a letter in the range L-R." London S3 ... ... Paris Paris S4 ... ... London Paris S5 ... ... Athens City-Index Athens <2> Direct Access : <e.g> "Find suppliers in London." <e.g> list query:"Find suppliers whose city is in London, Paris, and N.Y." <3> Existence test : <e.g> "Is there any supplier in London ?" Note: It can be done from the index alone. Wei-Pang Yang, Information Management, NDHU 12-10 Indexing on Field Combinations To construct an index on the basis of values of two or more fields. <e.g.> City/Status-Index S Athens/30 S1 Smith 20 London London /20 S2 Jones 10 Paris London /20 S3 Blake 30 Paris Paris /10 S4 Clark 20 London Paris/30 S5 Adams 30 Athens Query: “Find suppliers in Paris with status 30.” - on city/status index: a single scan of a single index. - on two separate indexes: two index scan => still difficult. (Fig. 6.3) Wei-Pang Yang, Information Management, NDHU 12-11 Dense V.S. Nondense Indexing Assume the Supplier file (S) is clustered on S# . Index (dense) S page1 S1... S1 S2 S3 S4 S5 S6 page3 page2 S2... S4... S3... S5... S6... City_index S1 S3 S5 A L L P P Index (nondense) Wei-Pang Yang, Information Management, NDHU 12-12 Dense V.S. Nondense Indexing (cont.) Nondense index: not contain an entry for every record in the indexed file. • retrieval steps: <1> scan the index (nondense) to get page # , say p. <2> retrieve page p and scan it in main storage. • advantages: <1> occupy less storage than a corresponding dense index <2> quicker to scan. • disadvantages: can not perform existence test via index alone. Note: At most only one nondense index can be constructed. (why?) Clustering: logical sequence = physical sequence Wei-Pang Yang, Information Management, NDHU 12-13 B-tree Introduction: • is a particular type of multi-level (or tree structured) index. • proposed by Bayer and McCreight in 1972. • the commonest storage structure of all in modern DBMS. Definition: (from Horowitz "Data Structure") A B-tree T of order m is an m-way search tree, such that <1> the root node has at least 2 children. <2> non-leaf nodes have at least [m/2] children. <3> all leave nodes are at the same level. Goal: maintain balance of index tree by dynamically restructuring the tree as updates proceed. Wei-Pang Yang, Information Management, NDHU 12-14 + B+-tree (Knuth's variation) 50 82 index set 12 32 58 70 89 94 (nondense) 6 8 12 15 18 32 35 40 50 51 52 58 60 62 70 71 78 82 83 85 89 91 93 94 96 97 99 Sequence set - index set: provides fast direct access to the sequential set and thus to the data too. - sequence set: provides fast sequential access to the indexed data. (with pointers to data records) (dense or nondense) Other variations: B*-tree, B'-tree,... Wei-Pang Yang, Information Management, NDHU 12-15