Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Open Database Connectivity wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Concurrency control wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Relational model wikipedia , lookup
Functional Database Model wikipedia , lookup
IMS 4212: Database Implementation Physical Database Implementation—Topics • • • • Reasons for concern Data Volume Analysis Data Usage Analysis Index Design Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 1 IMS 4212: Database Implementation Reasons for Concern • Write the SQL query to calculate your GPA COURSE SECTION DeptCode CourseNo Name CreditHrs LabHrs SectionID DeptCode <AK> CourseNo <AK> SecNo <AK> Term <AK> Year <AK> Room <FK1> Days Time InstructorID <FK2> Has ENROLLMENT Has SectionID <FK1> StudentID <FK2> Grade <FK3> Dr. Lawrence West, Management Dept., University of Central Florida [email protected] STUDENT Has GRADE Grade GradePts Has : StudentID LastName FirstName : 2 IMS 4212: Database Implementation Reasons for Concern (cont.) • Normalization spreads data around the database – Optimizes storage space needed (usually) – Protects against anomalies • Normalization may make retrieval inefficient Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 3 IMS 4212: Database Implementation Reasons for Concern (cont.) • Each retrieval efficiency enhancement introduces some level of inefficiency with respect to a fully normalized database • We don’t introduce retrieval enhancements with wild abandon – Analyze the nature of the problem – Know capabilities of DBMS & Operating System – Select carefully from available options • Must understand tradeoffs (advantages and disadvantages) of each retrieval enhancement Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 4 IMS 4212: Database Implementation How Computers Work • Data flows through different parts of the computer as application instructions are executed Memory CPU ALU Unused Instruction Data Application Data Data Value 1 Application Value 2 Operating System Results Table Pages Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 5 IMS 4212: Database Implementation SQL Server Data Storage • Data in tables is stored on pages and there are eight pages per extent. RowID1 Data for Row 1 RowID2 Data for Row 2 • When more RowID3 Data for Row 3 RowID4 Data for Row 4 space is needed RowID5 Data for Row 5 an entire extent RowID6 Data for Row 6 RowID7 Data for Row 7 is added to the RowID8 Data for Row 8 database RowID9 Data for Row 9 RowID10 Data for Row 10 • Each row (record) in the database RowID11 Data for Row 11 RowID12 Data for Row 12 is physically stored on a page RowID13 Data for Row 13 and in an extent • Each row has a RowID that identifies it and it’s location in the page Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 6 IMS 4212: Database Implementation SQL Server Data Storage (cont.) • Without a clustered index (covered soon) rows are added to pages in the order of insertion. • When pages are full rows are added to the next page in the extent. • When extents are full new extents are created • Tables keep track of the sequence of extents that contain their contents Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 7 IMS 4212: Database Implementation Data Retrieval • By default, queries of tables require that each page be loaded into memory in sequence and each row examined to see if it meets the query conditions This is a full table scan Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 8 IMS 4212: Database Implementation Data Retrieval (cont.) • The Page is the basic unit of IO – Entire page is moved from physical storage to RAM for evaluation • In a pure table scan (the default method of retrieval) each record is examined to see if it matches the WHERE clause conditions (if any) – Test value and column value moved to CPU for testing – Records where condition is TRUE are added to result set • Pages are cached and the cached copy will be read if available and needed Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 9 IMS 4212: Database Implementation Data Retrieval (cont.) • In SQL Server page sizes are fixed at 8 KB – (Entire extent is 64 KB) – Some DBMS have different sizes – Some DBMS allow tuning on a table by table basis – 8 KB is also the maximum record size • Number of Records on a page depends on record size – Sum of data sizes of each column • IO time for a pure scan increases with – Number of records – Record size Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 10 IMS 4212: Database Implementation Data Retrieval Costs • Two levels of costs associated with data retrieval – Most Important: IO moving page from disk storage to RAM – Less Important: CPU effort to evaluate records – In default mode records cannot be evaluated until they have been moved into RAM • We also care about physical storage space – Less important as a performance issue • We also care about costs of reorganizing data as it is added to the DB or updated (later) Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 11 IMS 4212: Database Implementation Data Retrieval Costs (cont.) • ALL Retrieval Enhancement mechanisms must be evaluated on the dimensions from the previous slide • None of the enhancements come without cost • Decisions affected by use of the data, not just pure database characteristics – Understanding organizational tasks and priorities key – Requires balance between technical and organizational knowledge – MIS graduates ideally positioned to participate in this analysis Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 12 IMS 4212: Database Implementation Data Retrieval Costs (cont.) • Degree of the cost changes with many factors – Table sizes – Access mechanisms (paths—more later) – Nature of query – Number of tables needed in query – Nature of the enhancement approach • Remember that our DB design goal of minimizing storage space and redundancy (normalization) spread data around the database – More tables containing transaction logic – More complicated queries Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 13 IMS 4212: Database Implementation Data Volume Analysis • We don’t have retrieval problems with small tables • Need to know how big a table will get over the life of the system to understand the potential magnitude of the problem • Q: How many records are expected in the ENROLLMENT table? • Document in the data dictionary – Estimate of number of records expected – How estimate was computed COURSE SECTION DeptCode CourseNo Name CreditHrs LabHrs SectionID DeptCode <AK> CourseNo <AK> SecNo <AK> Term <AK> Year <AK> Room <FK1> Days Time InstructorID <FK2> Has ENROLLMENT Has SectionID <FK1> StudentID <FK2> Grade <FK3> GRADE Grade GradePts Dr. Lawrence West, Management Dept., University of Central Florida [email protected] STUDENT Has : StudentID LastName FirstName : 14 Has IMS 4212: Database Implementation Data Volume Analysis (cont.) • Estimating DV – Absolute count: We know there are 12 possible grades that can be contained in the GRADE table – Estimate: We think that we will have 32,000 students next year (use your statistics!) – Derived: Each enrolled student takes an average of four sections per semester and there are an average of 35 students per section – Historical trends: Enrollment is growing at 2% per year Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 15 IMS 4212: Database Implementation Data Volume Analysis (cont.) • Don’t forget historical data! – Are graduated or withdrawn student records retained in the STUDENT and ENROLLMENT tables? – How long will they be kept? – What is the potential size of the ENROLLMENT table if records are never discarded? • Precise entity definitions are critical in DVA • Document where or how you came up with volume estimates Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 16 IMS 4212: Database Implementation Data Usage Analysis • DUA is concerned with three factors – How frequently are tables accessed? – How urgent are the table accesses? – What is the access path into the table? • Usually means what fields are being compared in a WHERE clause • Fields in a JOIN expression • Goal is to find the high frequency, important retrievals and to put enhancements on the path used by the retrieval Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 17 IMS 4212: Database Implementation Data Usage Analysis (cont.) • Many frequency and urgency estimates will come from an analysis of the organization’s business practices and needs – What is max time a customer can be allowed to wait for a response? – How many sales take place a day? – Can this transaction take place in batch overnight? • How many sales are made per hour? Do we expect it to grow? • Consider electronic credit card clearing from retail stores Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 18 IMS 4212: Database Implementation Data Usage Analysis (cont.) • The access path is the fields being searched to find appropriate records in a transaction • What is the path taken through the sample ERD to: – Calculate your GPA? – Determine if you have met a course prerequisite? • Don’t forget checks of operational business rules made in conjunction with a transaction – What if we had a business rule that said only students with a 3.0 GPA could take ISM 4212? – How about checking prerequisites? Dr. Lawrence West, Management Dept., University of Central Florida [email protected] 19