Connecting Databases
and the Web
Big-Boy Databases
What's wrong with Access?
- Concurrent user limit
- Performance restricted by the OS
- Limited ability to handle large data volumes
- Limited ability to manipulate LOBs
- No stored procedures or triggers
- Limited indexing
- ............ but it's good for rapid prototyping!
What's wrong with Access?

Measure: Access vs Oracle
- Concurrent user limit: Access 256; Oracle LICENSE_MAX_USERS, 32,000 concurrent end-users (8i)
- Max database size: Access < 2 GB; Oracle 128 TB per tablespace, eight exabytes* of data (65,536 BIGFILE datafiles)
- Max fields in a table: Access 255; Oracle 1,000
- Max columns in an index: Access 10; Oracle 32
- Max indexes per table: Access 32; Oracle unlimited
- Binary indexes: Access ×; Oracle √
- Index-organized tables: Access ×; Oracle √
- Partitioned tables: Access ×; Oracle √

* exabyte = 1,152,921,504,606,846,976 bytes (2 to the power 60); gigabyte = 1,073,741,824 bytes (2 to the power 30)
DBA areas of concern
- Scalability: E-Business solutions can grow from a few concurrent users to many thousands in a few months.
- Availability: a customer who can't use the system is potentially a lost customer.
- Performance: the expectation for database performance is constantly getting higher.
Database scalability eras (from the late 1980s to now)
- Single User: PC based, single user (late 1980s)
- Several Users: PC based, file-level locking; single user or Peer-to-Peer
- 100s of Users: 2-tier Client/Server, LAN based, user = process; max users depends upon file-handles (mid-1990s)
- 1000s of Users: Mainframe; 3-tier Internet Client/Server with a single stateless connection per CGI call
- Unlimited Users: 3-tier Internet Client/Server with connection pooling; n-tier Internet Client/Server with business components, servlets... (now)
Physical issues
- The slowest task in any record retrieval is reading from disk (it can be 500 times slower to locate and read data from disk than from Random Access Memory).
- No matter what the disk's spin speed, there will come a point when you need to wait for your turn.
- So Oracle will do all it can to reduce trips to the database.
- http://www.oracle.com/timesten/index.html
Physical issues
- Access uses FAT; Oracle grabs contiguous disk space at the outset.
- If the single disk fails in Access you lose the database back to the point of your last backup.
- (Best case) If a disk fails in Oracle, and you have a multiplexed online redo log, it is possible to continue as normal and replace the disk offline (see the sketch below).
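A rough illustration of that multiplexing - a sketch only, with a hypothetical group number, file paths and size not taken from the slides - is to give each online redo log group a member on two separate disks:

ALTER DATABASE ADD LOGFILE GROUP 4
  ('/u01/oradata/orcl/redo04a.log',   -- member on disk 1
   '/u02/oradata/orcl/redo04b.log')   -- member on disk 2
  SIZE 50M;

If either disk is lost, the surviving member keeps the redo stream intact.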
Clusters: the scalable option
- Most vendors offer CLUSTERING as the solution to the scalability problem.
- DB2 and SQL Server opt for Shared Nothing, Oracle for Shared Everything.
Shared Everything Cluster
[Diagram: Nodes 1, 2 and 3 linked by a high-speed interconnect, all attached via a storage area network to a unified set of database files. The G in 10g = Grid.]
Shared Nothing Cluster
[Diagram: Nodes 1, 2 and 3 linked by a high-speed interconnect, each owning its own partitioned database files; together the partitions make up the entire database.]
Clusters: the scalable option
- Shared Disk
  - Multiple database servers have equal access to a shared disk via a super-cache.
  - Lock management is the potential bottleneck.
  - Can automatically cope with node failure.
- Shared Nothing
  - Independent servers, each owning its own disks.
  - No worries about locks, because the partitioned data is not shared between nodes.
  - Vulnerable if a node goes down.
Oracle Building Blocks
[Diagram: an Instance mounts a Database.]
Tablespace building blocks
[Diagram: a Tablespace contains Segments; a Segment is made up of Extents; each Extent is made up of Data blocks.]
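A minimal sketch of how those building blocks appear in practice (the tablespace name, datafile path and sizes here are hypothetical):

CREATE TABLESPACE grades_data
  DATAFILE '/u01/oradata/orcl/grades_data01.dbf' SIZE 100M
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 1M;

-- segments, extents and blocks can then be seen in the data dictionary
SELECT segment_name, extents, blocks
FROM   user_segments
WHERE  tablespace_name = 'GRADES_DATA';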
Data blocks
- Regardless of what the block is being used to store (it could be part of an Extent in a table segment, an index segment, or any other segment) the data block will be of a set format.
- The overhead: information about the block (type, count of entries, timestamp, pointers to items in the block, etc.). This is often no more than 100 bytes in size.
- The data section (or Row Data): contains the rows from the table, or branches of an index.
- Free Space: the area in a block not yet taken by row data.
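To see which data block a particular row sits in, something like the following can be used (a sketch reusing the persons/personid names from the earlier query example):

SELECT rowid,
       DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid)  AS block_no,   -- block within the file
       DBMS_ROWID.ROWID_RELATIVE_FNO(rowid)  AS file_no     -- datafile number
FROM   persons
WHERE  personid = 406;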
Block Size
Summary of Advantages / Disadvantages
- Small (2KB-4KB)
  - Has relatively large overhead.
  - Reduces block contention.
  - Good for small rows, or lots of random access.
- Medium (8KB)
  - Space in the buffer cache will be wasted if you are doing random access to small rows and have a large block size. For example, with an 8KB block size and 50 byte row size, you are wasting 7,950 bytes in the buffer cache when doing random access.
- Large (16KB-32KB)
  - Even more potential waste than above.
  - There is relatively less overhead, thus more room to store useful data.
  - Good for sequential access, or very large rows.
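Since Oracle 9i a tablespace can also use a non-default block size, provided a buffer cache of that size has been configured first. A sketch only - the tablespace name, datafile path and sizes are hypothetical:

-- a 16K buffer cache must exist before a 16K tablespace can be created
ALTER SYSTEM SET db_16k_cache_size = 64M;

CREATE TABLESPACE big_rows
  DATAFILE '/u01/oradata/orcl/big_rows01.dbf' SIZE 200M
  BLOCKSIZE 16K;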
From Query to Recordset
An ad hoc user or an application asks the question: "All people and their grades in a list giving sname and fname and grade."

SELECT fname, sname, grade
FROM persons p, grades g
WHERE p.GRADEID = g.GRADEID ;

[Flow: is the cursor in the shared pool already? Yes: re-use it. No: the Parser checks the statement against the Data Dictionary; then, depending on which optimiser mode is in force, the cost-based optimiser (using statistics) or the rule-based optimiser produces an Execution Plan, and the data set is returned.]
Query efficiency: Parser
- Decomposes what has been passed and makes sense of it by:
  - verifying it to be correct in syntax;
  - looking up any table and column definitions in the data dictionary to check for accuracy;
  - breaking the string up and looking for:
    - what is being selected, and is it distinct?
    - what are the prefixes all about?
    - what are the conditions (WHERE)?
    - what sorts and grouping need to be carried out?
  - checking privileges.
Query efficiency: Optimiser
- The optimiser determines the most efficient way to execute a SQL statement.
- For any SQL statement there is a finite number of possible 'execution plans'.
- "Best" can be:
  - uses the least resources to process all rows affected by the statement, or
  - returns the first row of the statement the quickest.
Query efficiency: Optimiser
- For an Oracle instance you can choose which with the OPTIMIZER_MODE init.ora parameter:
  - ALL_ROWS, FIRST_ROWS, CHOOSE, RULE
- At the query level you can override using a hint after the SELECT (sketched below):
  - Select /*+ FULL(p) */ Fname, Sname, Grade.... (the FULL hint names the table, or its alias, to scan)
  - Forces a full table scan, even if the optimiser thinks using an index would be better - well, most of the time!
  - The Optimiser is empowered to ignore hints!
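Putting the two together - a sketch that reuses the persons/grades query from earlier (FIRST_ROWS is just one of the possible modes):

-- session-level override of the init.ora OPTIMIZER_MODE
ALTER SESSION SET optimizer_mode = FIRST_ROWS;

-- the hint names the alias it applies to
SELECT /*+ FULL(p) */ fname, sname, grade
FROM   persons p, grades g
WHERE  p.gradeid = g.gradeid ;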
Tools to help performance testing
- Set Timing On
- Set Autotrace On Stat
- Explain Plan
- Review the tutorial: http://www.shu.ac.uk/schools/cms/teaching/pl3/Tutorials/PerformandTest.htm
SET AUTOTRACE ON STAT
just in case example!

SQL> select count(sname) from persons;

COUNT(SNAME)
------------
         478

Statistics
----------------------------------------------------------
          0  recursive calls
          4  db block gets
         20  consistent gets
          0  physical reads
          0  redo size
        372  bytes sent via SQL*Net to client
        425  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>
Explain Plan
just in case example!
EXPLAIN PLAN FOR
SELECT a.composer, b.cdid, b.title
FROM composers a, cds b
WHERE a.composer like 'V%'
AND b.composerid = a.composerid
ORDER BY a.composer ;
Using Explain Plan
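One common way to read the plan back after EXPLAIN PLAN FOR ... (shown here only as a sketch; the output layout will vary):

-- Oracle 9i onwards
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- or, on older versions, query PLAN_TABLE directly
SELECT LPAD(' ', 2 * (level - 1)) || operation || ' ' || options
       || ' ' || object_name AS plan
FROM   plan_table
CONNECT BY PRIOR id = parent_id
START WITH id = 0;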
Testing
- Always remember to run tests many times, but especially remember:
  - the first time a query runs it is likely to take longer than subsequent runs, because subsequent runs need no parse while the cursor is still in the SGA (see the timing sketch below);
  - timing, in particular, can be dramatically impacted by what else the database server is doing.
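A minimal timing sketch along those lines (the statement reuses the persons example; elapsed times will obviously differ):

SET TIMING ON
-- first go: the statement has to be parsed
select count(sname) from persons ;
-- second go: the cursor should still be in the shared pool
select count(sname) from persons ;
SET TIMING OFF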
More Query efficiency
- Reduce the number of trips to the database.
- Example scenario: you need to pull the name back to a client application from the Persons table for 2 people. Do you:
  - write 2 selects,
  - write a select using an IN, or
  - write a self join?

-- Two distinct goes
spool twogoes
select sname, fname
from persons
where personid = 406 ;

select sname, fname
from persons
where personid = 447 ;

-- One trip using IN
Select sname, fname
from persons
where personid in (406, 447) ;

-- One trip using a self join
select a.sname, a.fname, b.sname, b.fname
from persons a, persons b
where a.personid = 406
and b.personid = 447 ;
Finding Data
- Indexes help us by telling us the ROWID, and hence the physical address, of the row.
- There is, however, a cost:
  - the processing needed to "look up" the ROWID;
  - the need to maintain the index;
  - the storage implications;
  - performance hits because of multiple writes for each created or updated row;
  - the problem of keeping indexes consistent - a well known problem with Paradox, for example.
Finding Data
- Two main types of index:
  - B-tree Index
  - Bitmap Index
- The selection of index type depends primarily upon cardinality:
  - low cardinality means columns in which the number of distinct values is small compared to the number of rows in the table;
  - bitmap indexes are best for low-cardinality columns;
  - B-tree indexes are most effective for high-cardinality data such as Name or Phone Number.
- A B-tree index can grow to be larger than the indexed data. Bitmap indexes can be significantly smaller than a corresponding B-tree index.
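A quick way to gauge cardinality before choosing an index type (a sketch reusing the persons/gradeid names from the earlier join example):

SELECT COUNT(DISTINCT gradeid) AS distinct_values,
       COUNT(*)                AS total_rows
FROM   persons;
-- a low ratio of distinct_values to total_rows points towards a bitmap index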
To index, or not to index.....
- The purpose of an index is to reduce the cost of data retrieval.
- The overhead involved in maintenance and use of secondary indexes has to be balanced against the performance improvement gained when retrieving data. This includes:
  - adding/updating an index record in every secondary index whenever a row is inserted/updated;
  - the increase in disk space required;
  - possible performance degradation during query optimization, to consider all secondary indexes.
To index, or not to index.....
- Considerations:
  - The relative "cost" of a Full Table Scan. For example: create an index if you frequently want to retrieve less than 15% of the rows in a large table. The percentage varies greatly according to the relative speed of a table scan and how clustered the row data is about the index key. The faster the table scan, the lower the percentage; the more clustered the row data, the higher the percentage. (Source: Oracle manual)
  - In Oracle, primary + unique keys automatically have indexes.
Refer to:
http://www.shu.ac.uk/schools/cms/teaching/pl3/Tutorials/IndexandPartitioning.doc
Oracle Indexing Options
- Straightforward index: use the SQL command CREATE INDEX
  CREATE INDEX emp_ename ON emp(ename);
- UNIQUE:
  CREATE UNIQUE INDEX uniq_dept_dname ON dept(dname);
  Be sure you need this as it is a costly extra!
- Bitmap index:
  create bitmap index i_alloc on Requirements(Allocated) ;
IOT: An Oracle 8i+ alternative
- An index-organized table has a storage organization that is a variant of a primary B-tree. Unlike an ordinary table, whose data is stored as an unordered collection, the data of an index-organized table is stored sorted by primary key.
- Index-organized tables provide faster access to table rows by the primary key.
- Since rows are stored in primary key order, range access by the primary key involves the minimum of block accesses.
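A minimal sketch of creating one (the table and column names here are hypothetical; the PRIMARY KEY is required):

CREATE TABLE grade_lookup (
  gradeid  NUMBER PRIMARY KEY,
  grade    VARCHAR2(20)
) ORGANIZATION INDEX;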
IOT: When?
- Broadly speaking, any table that is frequently searched and accessed almost exclusively via the primary key.
- But a table which needs secondary indexes and unique key constraints can bring problems.
- IOTs can improve performance by reducing the I/O associated with having to read blocks from both an index and a table.
- IOTs can also save storage because they avoid the need to duplicate key columns in both index and table segments.
- The down-side is the increased maintenance requirement: it's an index, and indexes do need rebuilding.
Refer to:
http://www.shu.ac.uk/schools/cms/teaching/pl3/Tutorials/IndexandPartitioning.doc