Download branch-name

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Microsoft Access wikipedia , lookup

IMDb wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

Oracle Database wikipedia , lookup

Serializability wikipedia , lookup

Ingres (database) wikipedia , lookup

Functional Database Model wikipedia , lookup

Database wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Relational model wikipedia , lookup

Concurrency control wikipedia , lookup

Database model wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Clusterpoint wikipedia , lookup

ContactPoint wikipedia , lookup

Transcript
Chapter 21:
Database System Concepts
21.1
©Silberschatz, Korth and Sudarshan
Índices e Queries
Database System Concepts
21.2
©Silberschatz, Korth and Sudarshan
Índices
 Indices
 Sequenciais, Hash,
 Únicos e não únicos
 primários
 Simples e compostos
 Vantagens e desvantagens dos indices
 Declaração em SQL:
 CREATE UNIQUE INDEX nome ON tabela(a1,....,an)
3
Database System Concepts
21.3
©Silberschatz, Korth and Sudarshan
Índices e Queries
 Select A1,..,An from T where Ai = x
 O índices tipo hash são melhores nestes casos
 Select A1,..,An from T where Ai < x AND Ai > y
O índices tipo sequenciais são melhores nestes casos.
 Select A1,..,An from T where Ai = x AND Aj = y
Podemos usar índices simples em Ai e Aj
Podemos usar índices compostos (Ai, Aj)
4
Database System Concepts
21.4
©Silberschatz, Korth and Sudarshan
Basic Concepts
 Indexing mechanisms used to speed up access to desired data.
 E.g., author catalog in library
 Search Key - attribute to set of attributes used to look up
records in a file.
 An index file consists of records (called index entries) of the
form
search-key
pointer
 Index files are typically much smaller than the original file
 Two basic kinds of indices:
 Ordered indices: search keys are stored in sorted order
 Hash indices: search keys are distributed uniformly across
“buckets” using a “hash function”.
5
Database System Concepts
21.5
©Silberschatz, Korth and Sudarshan
Index Evaluation Metrics
 Access types supported efficiently. E.g.,
 records with a specified value in the attribute
 or records with an attribute value falling in a specified range of
values.
 Access time
 Insertion time
 Deletion time
 Space overhead
6
Database System Concepts
21.6
©Silberschatz, Korth and Sudarshan
Ordered Indices
Indexing techniques evaluated on basis of:
 In an ordered index, index entries are stored sorted on the
search key value. E.g., author catalog in library.
 Primary index: in a sequentially ordered file, the index whose
search key specifies the sequential order of the file.
 Also called clustering index
 The search key of a primary index is usually but not necessarily the
primary key.
 Secondary index: an index whose search key specifies an order
different from the sequential order of the file. Also called
non-clustering index.
 Index-sequential file: ordered sequential file with a primary index.
7
Database System Concepts
21.7
©Silberschatz, Korth and Sudarshan
Hash Indices
 Hashing can be used not only for file organization, but also for
index-structure creation.
 A hash index organizes the search keys, with their associated
record pointers, into a hash file structure.
 Strictly speaking, hash indices are always secondary indices
 if the file itself is organized using hashing, a separate primary hash
index on it using the same search-key is unnecessary.
 However, we use the term hash index to refer to both secondary
index structures and hash organized files.
8
Database System Concepts
21.8
©Silberschatz, Korth and Sudarshan
Example of Hash Index
9
Database System Concepts
21.9
©Silberschatz, Korth and Sudarshan
Comparison of Ordered Indexing and Hashing
 Cost of periodic re-organization
 Relative frequency of insertions and deletions
 Is it desirable to optimize average access time at the expense of
worst-case access time?
 Expected type of queries:
 Hashing is generally better at retrieving records having a specified
value of the key.
 If range queries are common, ordered indices are to be preferred
10
Database System Concepts
21.10
©Silberschatz, Korth and Sudarshan
Index Definition in SQL
 Create an index
create index <index-name> on <relation-name>
(<attribute-list>)
E.g.: create index b-index on branch(branch-name)
 Use create unique index to indirectly specify and enforce the
condition that the search key is a candidate key is a candidate
key.
 Not really required if SQL unique integrity constraint is supported
 To drop an index
drop index <index-name>
11
Database System Concepts
21.11
©Silberschatz, Korth and Sudarshan
Multiple-Key Access
 Use multiple indices for certain types of queries.
 Example:
select account-number
from account
where branch-name = “Perryridge” and balance = 1000
 Possible strategies for processing query using indices on
single attributes:
1. Use index on branch-name to find accounts with balances of
$1000; test branch-name = “Perryridge”.
2. Use index on balance to find accounts with balances of $1000;
test branch-name = “Perryridge”.
3. Use branch-name index to find pointers to all records pertaining
to the Perryridge branch. Similarly use index on balance. Take
intersection of both sets of pointers obtained.
12
Database System Concepts
21.12
©Silberschatz, Korth and Sudarshan
Indices on Multiple Attributes
Suppose we have an index on combined search-key
(branch-name, balance).
 With the where clause
where branch-name = “Perryridge” and balance = 1000
the index on the combined search-key will fetch only records
that satisfy both conditions.
Using separate indices in less efficient — we may fetch many
records (or pointers) that satisfy only one of the conditions.
 Can also efficiently handle
where branch-name = “Perryridge” and balance < 1000
 But cannot efficiently handle
where branch-name < “Perryridge” and balance = 1000
May fetch many records that satisfy the first but not the
second condition.
13
Database System Concepts
21.13
©Silberschatz, Korth and Sudarshan
Performance Tuning
Database System Concepts
21.14
©Silberschatz, Korth and Sudarshan
Performance Tuning
 Adjusting various parameters and design choices to improve
system performance for a specific application.
 Tuning is best done by
1. identifying bottlenecks, and
2. eliminating them.
 Can tune a database system at 3 levels:
 Hardware -- e.g., add disks to speed up I/O, add memory to
increase buffer hits, move to a faster processor.
 Database system parameters -- e.g., set buffer size to avoid
paging of buffer, set checkpointing intervals to limit log size. System
may have automatic tuning.
 Higher level database design, such as the schema, indices and
transactions (more later)
15
Database System Concepts
21.15
©Silberschatz, Korth and Sudarshan
Desempenho dos Sistemas
Como desempenho do Processadores actualmente,
porquê que ainda temos problemas com o desempenho?


Decomposição do tempo
gasto numa transacção
óptima:
 10% – CPU
 30% – Rede
 30% – Base de Dados
 30% – Idle (caso
contrário as queues
entram em “trash”)
Rede
Processador
& Memória
IO
Base de
Dados
Log
Com transacções
distribuídas, o cenário é
bastante pior
Tipicamente, 5 a 20 I/Os
por transacção
16
Database System Concepts
21.16
©Silberschatz, Korth and Sudarshan
Bottlenecks
 Performance of most systems (at least before they are tuned)
usually limited by performance of one or a few components:
these are called bottlenecks
 Transactions request a sequence of services
 e.g. CPU, Disk I/O, locks
 E.g. 80% of the code may take up 20% of time and 20% of code
takes up 80% of time
 Worth spending most time on 20% of code that take 80% of time
 Bottlenecks may be in hardware (e.g. disks are very busy, CPU
is idle), or in software
 Removing one bottleneck often exposes another
 De-bottlenecking consists of repeatedly finding bottlenecks, and
removing them
 This is a heuristic
17
Database System Concepts
21.17
©Silberschatz, Korth and Sudarshan
Identifying Bottlenecks

With concurrent transactions, transactions may have to wait for a
requested service while other transactions are being served
 Can model database as a queueing system with a queue for each
service
 transactions repeatedly do the following
 request a service, wait in queue for the service, and get serviced
 Bottlenecks in a database system typically show up as very high
utilizations (and correspondingly, very long queues) of a particular
service
 E.g. disk vs CPU utilization
 100% utilization leads to very long waiting time:
 Rule of thumb: design system for about 70% utilization at peak load
 utilization over 90% should be avoided
18
Database System Concepts
21.18
©Silberschatz, Korth and Sudarshan
Queues In A Database System
19
Database System Concepts
21.19
©Silberschatz, Korth and Sudarshan
Tunable Parameters
 Tuning of hardware
 Tuning of schema
 Tuning of indices
 Tuning of materialized views
20
Database System Concepts
21.20
©Silberschatz, Korth and Sudarshan
Tuning of Hardware
 Even well-tuned transactions typically require a few I/O
operations
 Typical disk supports about 100 random I/O operations per second
 Suppose each transaction requires just 2 random I/O operations.
Then to support n transactions per second, we need to stripe data
across n/50 disks (ignoring skew)
 Number of I/O operations per transaction can be reduced by
keeping more data in memory
 If all data is in memory, I/O needed only for writes
 Keeping frequently used data in memory reduces disk accesses,
reducing number of disks required, but has a memory cost
21
Database System Concepts
21.21
©Silberschatz, Korth and Sudarshan
Hardware Tuning: Five-Minute Rule
 Question: which data to keep in memory:
 If a page is accessed n times per second, keeping it in memory saves

n*
price-per-disk-drive
accesses-per-second-per-disk
 Cost of keeping page in memory

price-per-MB-of-memory
ages-per-MB-of-memory
 Break-even point: value of n for which above costs are equal
 If accesses are more then saving is greater than cost
 Solving above equation with current disk and memory prices leads to:
5-minute rule: if a page that is randomly accessed is used
more frequently than once in 5 minutes it should be kept in
memory
 (by buying sufficient memory!)
22
Database System Concepts
21.22
©Silberschatz, Korth and Sudarshan
Hardware Tuning: One-Minute Rule
 For sequentially accessed data, more pages can be read per
second. Assuming sequential reads of 1MB of data at a time:
1-minute rule: sequentially accessed data that is accessed
once or more in a minute should be kept in memory
 Prices of disk and memory have changed greatly over the years,
but the ratios have not changed much
 so rules remain as 5 minute and 1 minute rules, not 1 hour or 1
second rules!
23
Database System Concepts
21.23
©Silberschatz, Korth and Sudarshan
RAID Levels
 RAID 0: nonredudant striping
(Block striping)
C
C
 RAID 1: mirroed disks
C
C
(Block striping)
P
 RAID 3: Bit interleaved parity
 RAID 5: Block interleaved distributed parity
P
P
P
P
P
24
Database System Concepts
21.24
©Silberschatz, Korth and Sudarshan
Hardware Tuning: Choice of RAID Level
 To use RAID 1 or RAID 5?
 Depends on ratio of reads and writes
 RAID 5 requires 2 block reads and 2 block writes to write out one
data block
 If an application requires r reads and w writes per second
 RAID 1 requires r + 2w I/O operations per second
 RAID 5 requires: r + 4w I/O operations per second
 For reasonably large r and w, this requires lots of disks to handle
workload
 RAID 5 may require more disks than RAID 1 to handle load!
 Apparent saving of number of disks by RAID 5 (by using parity, as
opposed to the mirroring done by RAID 1) may be illusory!
 Thumb rule: RAID 5 is fine when writes are rare and data is very
large, but RAID 1 is preferable otherwise
 If you need more disks to handle I/O load, just mirror them since
disk capacities these days are enormous!
25
Database System Concepts
21.25
©Silberschatz, Korth and Sudarshan
Tuning the Database Design
 Schema tuning
 Vertically partition relations to isolate the data that is accessed most
often -- only fetch needed information.
• E.g., split account into two, (account-number, branch-name) and
(account-number, balance).
• Branch-name need not be fetched unless required
 Improve performance by storing a denormalized relation
• E.g., store join of account and depositor; branch-name and
balance information is repeated for each holder of an account, but
join need not be computed repeatedly.
• Price paid: more space and more work for programmer to keep
relation consistent on updates
• better to use materialized views (more on this later..)
 Cluster together on the same disk page records that would
match in a frequently required join,
 compute join very efficiently when required.
26
Database System Concepts
21.26
©Silberschatz, Korth and Sudarshan
Tuning the Database Design (Cont.)
 Index tuning
 Create appropriate indices to speed up slow queries/updates
 Speed up slow updates by removing excess indices (tradeoff between
queries and updates)
 Choose type of index (B-tree/hash) appropriate for most frequent types
of queries.
 Choose which index to make clustered
 Index tuning wizards look at past history of queries and updates
(the workload) and recommend which indices would be best for the
workload
27
Database System Concepts
21.27
©Silberschatz, Korth and Sudarshan