Connecting Databases and the Web
Big-Boy Databases

What's wrong with Access?
- Concurrent user limit
- Performance restricted by OS
- Limited ability to handle large data volumes
- Limited ability to manipulate LOBs
- No stored procedures or triggers
- Limited indexing
...but it's good for rapid prototyping!

What's wrong with Access?

Measure                   Access    Oracle
Concurrent user limit     256       LICENSE_MAX_USERS; 32000 concurrent end-users (8i)
Max database size         < 2 GB    128 TB / tablespace; eight exabytes* of data (65,536 BIGFILE datafiles)
Max fields in a table     255       1000
Max columns in an index   10        32
Max indexes / table       32        unlimited
Binary indexes?           ×         √
Index organized tables    ×         √
Partitioned tables        ×         √

* exabyte = 1,152,921,504,606,846,976 bytes (2 to power 60); gigabyte = 1,073,741,824 bytes (2 to power 30)

DBA areas of concern
- Scalability: E-Business solutions can grow from a few concurrent users to many thousands in a few months.
- Availability: A customer who can't use the system is potentially a lost customer.
- Performance: Expectations for database performance are constantly getting higher.

Database scalability eras
[Figure: number of users plotted against time (late 1980s, mid-1990s, now). User bands: single user, several users, 100s of users, 1000s of users, unlimited users. Architectures shown, from earliest to latest: PC based, single user; PC based with file-level locking, single user or peer-to-peer; 2-tier client/server, LAN based, user = process, max users depends upon file handles; mainframe; 3-tier Internet client/server with a single stateless connection per CGI call; 3-tier Internet client/server with connection pooling; n-tier Internet client/server with business components, servlets...]

Physical issues
- The slowest task in any record retrieval is reading from disk (it can be 500 times slower to locate and read from a disk than from Random Access Memory).
- No matter what the disk's spin speed, there will come a point when you need to wait for your turn.
- So Oracle will do all it can to reduce trips to the database.
  http://www.oracle.com/timesten/index.html

Physical issues
- Access uses FAT.
- Oracle grabs contiguous disk at the outset.
- If the single disk fails in Access, you lose the database back to the point of your last backup (best case).
- If a disk fails in Oracle (and if you have a multiplexed online redo log) it is possible to continue as normal and replace the disk offline.

Clusters: the scalable option
- Most vendors offer CLUSTERING as the solution to the scalability problem.
- DB2 and SQL Server opt for Shared Nothing, Oracle for Shared Everything.

Shared Everything Cluster
[Figure: Node 1, Node 2 and Node 3 joined by a high-speed interconnect, all attached through a storage area network to a unified set of database files. Note on slide: The G in 10g = Grid.]

Shared Nothing Cluster
[Figure: Node 1, Node 2 and Node 3 joined by a high-speed interconnect, each owning its own partitioned database files; the three partitions together make up the entire database.]

Clusters: the scalable option
Shared Disk:
- Multiple database servers have equal access to shared disks via a super-cache.
- Lock management is the potential bottleneck.
- Can automatically cope with node failure.
Shared Nothing:
- Independent servers owning disks.
- No worries about locks, because the data is partitioned rather than shared.
- Vulnerable if a node goes down.

Oracle Building Blocks
[Figure: an Instance mounts a Database. Tablespace building blocks: Tablespace, then Segment, then Extents, then Data blocks.]

Data blocks
Regardless of what the block is being used to store (it could be part of an extent in a table segment, or an index segment, or any other segment), the data block will be of a set format:
- The overhead: information about the block (type, count of entries, timestamp, pointers to items in the block, etc.). This is often no more than 100 bytes in size.
- The data section (or Row Data): contains the rows from the table, or branches of an index.
- Free Space: the area in a block not yet taken by row data.
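The three-part block layout above can be turned into a rough capacity estimate. The sketch below is illustrative only: it assumes a flat 100-byte overhead per block (the slide's "often no more than 100 bytes"), ignores any free space Oracle would deliberately reserve, and the function names are made up for this example.

```python
# Rough data-block capacity estimate.
# ASSUMPTIONS: a flat 100-byte block overhead and no reserved free space;
# real Oracle blocks vary, so treat the numbers as ballpark figures only.
BLOCK_OVERHEAD_BYTES = 100


def rows_per_block(block_size, row_size, overhead=BLOCK_OVERHEAD_BYTES):
    """Whole rows that fit in the row-data area of one block."""
    return (block_size - overhead) // row_size


def overhead_fraction(block_size, overhead=BLOCK_OVERHEAD_BYTES):
    """Share of the block taken up by the overhead section."""
    return overhead / block_size


# Compare a small, a medium and a large block for 50-byte rows.
for size in (2048, 8192, 32768):
    print(size, rows_per_block(size, 50), f"{overhead_fraction(size):.1%}")
```

Under these assumptions the overhead is a much larger share of a small block than of a large one, which is worth keeping in mind when choosing a block size.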
Block Size: Summary of Advantages/Disadvantages

Small (2KB-4KB)
- Advantages: Reduces block contention. Good for small rows, or lots of random access.
- Disadvantages: Has relatively large overhead.

Medium (8KB)
- Disadvantages: Space in the buffer cache will be wasted if you are doing random access to small rows and have a large block size. For example, with an 8KB block size and 50 byte row size, you are wasting about 7,950 bytes in the buffer cache when doing random access (taking 8KB as 8,000 bytes).

Large (16KB-32KB)
- Advantages: There is relatively less overhead, thus more room to store useful data. Good for sequential access, or very large rows.
- Disadvantages: Even more potential waste than above.

From Query to Recordset
[Figure: flowchart. An ad hoc user or application asks the question: all people and their grades in a list giving sname and fname and grade.

    SELECT fname, sname, grade
    FROM persons p, grades g
    WHERE p.GRADEID = g.GRADEID ;

Is the cursor in the shared pool already? If yes, re-use it. If no, the statement goes to the Parser (which consults the Data Dictionary), then to the question "Which optimiser mode?": cost-based (which uses Statistics) or rule-based. The result is an Execution Plan, and finally the returned data set.]

Query efficiency: Parser
Decomposes what has been passed and makes sense of it by:
- verifying it to be correct in syntax
- looking up any table and column definitions in the data dictionary to check for accuracy
- breaking the string up and looking for:
  - what is being selected, and is it distinct?
  - what are the prefixes all about?
  - what are the conditions (WHERE)?
  - what sorts and grouping need to be carried out?
- checking privileges

Query efficiency: Optimiser
- The optimiser determines the most efficient way to execute a SQL statement.
- For any SQL statement there are a finite number of possible 'execution plans'.
- "Best" can mean either:
  - uses least resources to process all rows affected by the statement, or
  - returns the first row of a statement the quickest.

Query efficiency: Optimiser
- For an Oracle instance you can choose which with the OPTIMIZER_MODE init.ora parameter: ALL_ROWS, FIRST_ROWS, CHOOSE, RULE
- At the query level you can override using a hint after the SELECT:

    Select /*+FULL*/ Fname, Sname, Grade….
- This forces a full table scan, even if the optimiser thinks using an index would be better. Well, most of the time! The optimiser is empowered to ignore hints!

Tools to help performance testing
- Set Timing On
- Set Autotrace on Stat
- Explain Plan
Review the tutorial: http://www.shu.ac.uk/schools/cms/teaching/pl3/Tutorials/PerformandTest.htm

SET AUTOTRACE ON STAT
Just in case example!

    SQL> select count(sname) from persons;

    COUNT(SNAME)
    ------------
             478

    Statistics
    ----------------------------------------------------------
              0  recursive calls
              4  db block gets
             20  consistent gets
              0  physical reads
              0  redo size
            372  bytes sent via SQL*Net to client
            425  bytes received via SQL*Net from client
              2  SQL*Net roundtrips to/from client
              1  sorts (memory)
              0  sorts (disk)
              1  rows processed

    SQL>

Explain Plan
Just in case example!

    EXPLAIN PLAN FOR
    SELECT a.composer, b.cdid, b.title
    FROM composers a, cds b
    WHERE a.composer like 'V%'
    AND b.composerid = a.composerid
    ORDER BY a.composer ;

Using Explain Plan

Testing
Always remember to run tests many times, but especially remember:
- The first time a query runs is likely to take longer than subsequent runs, because later runs do not need to parse the statement if the cursor is still in the SGA.
- Timing, in particular, can be dramatically impacted by what else the database server is doing.

More Query efficiency
Reduce the number of trips to the database.
Example scenario: you need to pull the name back to a client application from the Persons table for 2 people. Do you:
- Write 2 selects?
- Write a select using an IN?
- Write a self join?

    -- Two distinct goes
    spool twogoes
    select sname, fname
    from persons
    where personid = 406 ;

    select sname, fname
    from persons
    where personid = 447 ;
    Select sname, fname
    from persons
    where personid in (406, 447) ;

    select a.sname, a.fname, b.sname, b.fname
    from persons a, persons b
    where a.personid = 406
    and b.personid = 447 ;

Finding Data
- Indexes help us by telling us the ROWID, and hence the physical address of the row.
- There is, however, a cost:
  - the processing needed to "look up" the ROWID
  - the need to maintain the index
  - the storage implications
  - performance hits because of multiple writes for each created or updated row
  - the problem of keeping indexes consistent (a well known problem with Paradox, for example)

Finding Data
- Two main types of index:
  - B-tree Index
  - Bitmap Index
- The selection of index type depends primarily upon cardinality.
- Low cardinality means columns in which the number of distinct values is small compared to the number of rows in the table.
- Bitmap indexes are best for low-cardinality columns.
- B-tree indexes are most effective for high-cardinality data such as Name, or Phone Number.
- A B-tree index can grow to be larger than the indexed data.
- Bitmap indexes can be significantly smaller than a corresponding B-tree index.

To index, or not to index…..
- The purpose of an index is to reduce the cost of data retrieval.
- Overhead involved in maintenance and use of secondary indexes has to be balanced against the performance improvement gained when retrieving data. This includes:
  - adding/updating an index record to every secondary index whenever a row is inserted/updated;
  - increase in disk space required;
  - possible performance degradation during query optimization to consider all secondary indexes.

To index, or not to index…..
Considerations:
- The relative "cost" of a Full Table Scan.
- For example: create an index if you frequently want to retrieve less than 15% of the rows in a large table. The percentage varies greatly according to the relative speed of a table scan and how clustered the row data is about the index key.
  The faster the table scan, the lower the percentage; the more clustered the row data, the higher the percentage. (Source: Oracle manual)
- In Oracle, primary and unique keys automatically have indexes.
Refer to: http://www.shu.ac.uk/schools/cms/teaching/pl3/Tutorials/IndexandPartitioning.doc

Oracle Indexing Options
- Straightforward index: use the SQL command CREATE INDEX

    CREATE INDEX emp_ename ON emp(ename);

- UNIQUE:

    CREATE UNIQUE INDEX uniq_dept_dname ON dept(dname);

  Be sure you need this, as it is a costly extra!
- Bitmap index:

    create bitmap index i_alloc on Requirements(Allocated) ;

IOT: An Oracle 8i+ alternative
- An index-organized table has a storage organization that is a variant of a primary B-tree.
- Unlike an ordinary table, whose data is stored as an unordered collection, data for an index-organized table is stored in primary key sorted order.
- Index-organized tables provide faster access to table rows by the primary key.
- Since rows are stored in primary key order, range access by the primary key involves minimum block accesses.

IOT: When?
- Broadly speaking, any table that is frequently searched and accessed almost exclusively via the primary key.
- But a table which needs secondary indexes and unique key constraints can bring problems.
- IOTs can improve performance by reducing the I/O associated with having to read blocks from both an index and a table.
- IOTs can also save storage because they avoid the need to duplicate key columns in both index and table segments.
- The down-side is the increased maintenance requirement: it's an index, and indexes do need rebuilding.
Refer to: http://www.shu.ac.uk/schools/cms/teaching/pl3/Tutorials/IndexandPartitioning.doc
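The idea behind an index-organized table can be tried out without an Oracle instance. The sketch below uses SQLite (through Python's sqlite3 module) purely as a stand-in: a SQLite WITHOUT ROWID table keeps its rows inside the primary-key B-tree, which is roughly analogous to an IOT but is not the same implementation. The table names grades_heap and grades_iot and the sample rows are invented for illustration.

```python
import sqlite3

# SQLite is only a runnable stand-in here; this is not Oracle syntax.
conn = sqlite3.connect(":memory:")

# Ordinary table: rows live in a rowid B-tree, and the primary key is
# enforced through a separate automatically created index.
conn.execute("CREATE TABLE grades_heap (code TEXT PRIMARY KEY, descr TEXT)")

# WITHOUT ROWID table: rows are stored in the primary-key B-tree itself,
# so a key lookup does not need a second hop from index to table.
conn.execute(
    "CREATE TABLE grades_iot (code TEXT PRIMARY KEY, descr TEXT) WITHOUT ROWID"
)

rows = [("C", "Pass"), ("A", "Distinction"), ("B", "Merit")]
conn.executemany("INSERT INTO grades_heap VALUES (?, ?)", rows)
conn.executemany("INSERT INTO grades_iot VALUES (?, ?)", rows)


def plan(table):
    """Return the query-plan detail text for a primary-key lookup."""
    sql = f"EXPLAIN QUERY PLAN SELECT descr FROM {table} WHERE code = 'B'"
    return " ".join(row[-1] for row in conn.execute(sql))


heap_plan = plan("grades_heap")
iot_plan = plan("grades_iot")
print("ordinary table:", heap_plan)
print("WITHOUT ROWID: ", iot_plan)
```

Comparing the two plans shows the ordinary table being reached through a separate index, while the WITHOUT ROWID table is searched directly by its primary key. That is the "read one structure instead of two" saving the IOT slides describe.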