Download database - Dl4a.org

Document related concepts

Tandem Computers wikipedia , lookup

DBase wikipedia , lookup

SQL wikipedia , lookup

Serializability wikipedia , lookup

Microsoft Access wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

PL/SQL wikipedia , lookup

IMDb wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Oracle Database wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Ingres (database) wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Concurrency control wikipedia , lookup

Functional Database Model wikipedia , lookup

Database wikipedia , lookup

Versant Object Database wikipedia , lookup

Clusterpoint wikipedia , lookup

ContactPoint wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
Fundamentals of Relational
Database Design and
Database Planning
J.Trumbo
Fermilab
CSS-DSG
1
Outline







Definitions
Selecting a dbms
Selecting an application layer
Relational Design
Planning
A very few words about Replication
Space
2
Definitions
What is a database?
A database is the implementation of freeware or
commercial software that provides a means to
organize and retrieve data. The database is the
set of physical files in which all the objects and
database metadata are stored. These files can
usually be seen at the operating system level.
This talk will focus on the organize aspect of
data storage and retrieval.
Commercial vendors include MicroSoft and Oracle.
Freeware products include mysql and postgres.
For this discussion, all points/issues apply to both
commercial and freeware products.
3
Definitions
Instance
A database instance, or an ‘instance’ is
made up of the background processes
needed by the database software.
These processes usually include a process
monitor, session monitor, lock monitor,
etc. They will vary from database vendor
to database vendor.
4
Definitions
What is a schema?
A SCHEMA IS NOT A DATABASE, AND A DATABASE IS NOT
A SCHEMA.
A database instance controls 0 or more databases.
A database contains 0 or more database application
schemas.
A database application schema is the set of database
objects that apply to a specific application. These objects
are relational in nature, and are related to each other,
within a database to serve a specific functionality. For
example payroll, purchasing, calibration, trigger, etc. A
database application schema not a database. Usually
several schemas coexist in a database.
A database application is the code base to manipulate
and retrieve the data stored in the database application
schema.
5
Definitions Cont.
Primary Definitions




Table, a set of columns that contain data. In
the old days, a table was called a file.
Row, a set of columns from a table reflecting a
record.
Index, an object that allows for fast retrieval of
table rows. Every primary key and foreign key
should have an index for retrieval speed.
Primary key, often designated pk, is 1 or more
columns in a table that makes a record unique.
6
Definitions Cont.
Primary Definitions


Foreign key, often designated fk, is a common
column common between 2 tables that define
the relationship between those 2 tables.
Foreign keys are either mandatory or optional.
Mandatory forces a child to have a parent by
creating a not null column at the child. Optional
allows a child to exist without a parent, allowing
a nullable column at the child table (not a
common circumstance).
7
Definitions Cont.
Primary Definitions
Entity Relationship Diagram or ER is a
pictorial representation of the application
schema.
8
Er Example
REVISION
MODULE
STATUS
# STAT_ID
o STATUS _NAME
* C RE ATE _DATE
* C RE ATE _US ER
...
# MOD ULE _ID
* MOD ULE _NAME
* C RE ATE _DATE
* C RE ATE _US ER
...
describes
has
have
may have
associated with
part of
OWNER
PARAMETER
# PAR _ID
* P AR_ NAME
* TEXT
* VALU E
o UPP ER_ LIMIT
o LOWER_ LIMIT
* S RC
* D OC UME NTATION
o DRAWIN GS
* C RE ATE _DATE
* C RE ATE _US ER
...
creates
REV_ID
REV_NAME
R EV_ DATE
C RE ATE _DATE
C RE ATE _US ER
UPD ATE _DATE
UPD ATE _US ER
associated with
own
# OWN ER _ID
* FIRS T_N AME
* L AST_NAME
* P ASS WORD
* E MAIL
* U SE RNAME
* C RE ATE _DATE
* C RE ATE _US ER
...
#
o
*
*
*
o
o
UNIT
have
describes
#
*
*
*
*
o
UNIT_ID
U NIT_NAME
C RE ATE _DATE
C RE ATE _US ER
U PD ATE _DATE
UPD ATE _US ER
has
has
describes
HISTORY
#
*
*
*
*
o
o
HIST_ID
D ATE _CH AN GED
R EAS ON
C RE ATE _DATE
C RE ATE _US ER
UPD ATE _DATE
UPD ATE _US ER
9
Definitions Cont.
Primary Definitions
Constraints are rules residing in the
database’s data dictionary governing
relationships and dictating the ways
records are manipulated, what is a
legal move vs. what is an illegal
move. These are of the utmost
importance for a secure and
consistent set of data.
10
Definitions Cont.
Primary Definitions
Data Manipulation Language or DML,
sql statements that insert, update or
delete database in a database.
Data Definition Language or DDL, sql
used to create and modify database
objects used in an application schema.
11
Definitions Cont.
Primary Definitions
A transaction is a logical unit of work that
contains one or more SQL statements. A
transaction is an atomic unit. The effects
of all the SQL statements in a transaction
can be either all committed (applied to
the database) or all rolled back (undone
from the database), insuring data
consistency.
12
Definitions Cont.
Primary Definitions

A view is a selective presentation of the
structure of, and data in, one or more
tables (or other views). A view is a ‘virtual
table’, having predefined columns and
joins to one or more tables, reflecting a
specific facet of information.
13
Definitions Cont.
Primary Definitions
Database triggers are PL/SQL, Java, or C
procedures that run implicitly whenever a table
or view is modified or when some user actions
or database system actions occur. Database
triggers can be used in a variety of ways for
managing your database. For example, they can
automate data generation, audit data
modifications, enforce complex integrity
constraints, and customize complex security
authorizations. Trigger methodology differs
between databases.
14
Definitions Cont.
Primary Definitions
Replication is the process of copying and
maintaining database objects, such as tables, in
multiple databases that make up a distributed
database system.
Backups are copies of the database data in a
format specific to the database. Backups are
used to recover one or more files that have been
physically damaged as the result of a disk
failure. Media recovery requires the restoration
of the damaged files from the most recent
operating system backup of a database. It is of
the utmost importance to perform regularly
15
scheduled backups.
Definitions Cont.
Mission Critical Applications
An application is defined as mission critical,
imho, if
1. there are legal implications or financial loss to
the institution if the data is lost or unavailable.
2. there are safety issues if the data is lost or
unavailable.
3. no data loss can be tolerated.
4. uptime must be maximized (98%+).
16
Definitions Cont.
‘large’ or ‘very large’ or ‘a lot’
Seems odd, but ‘large’ is a hard definition to
determine. Vldb is an acronym for very large
databases. Its definition varies depending on
the database software one selects. Very large
normally indicates data that is reaching the
limits of capacity for the database software, or
data that needs extraordinary measures need to
be taken for operations such as backup,
recovery, storage, etc.
17
Definitions Cont.
Commercial databases do not a have a practical
limit to the size of the load. Issues will be
backup strategies for large databases.
Freeware does limit the size of the databases, and
the number of users. Documentation on these
issues vary widely from the freeware sites to the
user sites. Mysql supposedly can support 8T and
100 users. However, you will find arguments on
the users lists that these numbers cannot be
met.
18
Selecting a DBMS
Many options, many decisions, planning,
costs, criticality.
For lots of good information, please refer to
the urls on the last slides. Many examples
of people choosing product.
19
Selecting a DBMS
How do I Choose?
Which database product is appropriate for my
application? You must make a requirements
assessment.
Does you database need 24x7 availability?
Is your database mission critical, and no data loss can
be tolerated?
Is your database large? (backup recovery methods)
What data types do I need? (binary, large objects?)
Do I need replication? What level of replication is
required? Read only? Read/Write? Read/Write is
very expensive, so can I justify it?
20
Selecting a DBMS
How do I Choose? Cont.
If your answer to any of the above is ‘yes’, I would
strongly suggest purchasing and using a commercial
database with support. Support includes:
 24x7 assistance with technical issues
 Patches for bugs and security
 The ability to report bugs, and get them resolved in
a timely manner.
 Priority for production issues
 Upgrades/new releases
 Assistance with and use of proven backup/recovery
methods
21
Selecting a DBMS
The Freeware Choice
Freeware is an alternative for applications.
However, be fore warned, support for
these databases is done via email to a ad
hoc support group. The level of support
via these groups may vary over the life of
your database. Be prepared. Also expect
less functionality than any commercial
product. See http://wwwcss.fnal.gov/dsg/external/freeware/
22
Selecting a DBMS
The Freeware Choice
Freeware is free.
Freeware is open source.
Freeware functionality is improving.
Freeware is good for smaller non-mission
critical applications.
23
Selecting an Application Layer
Again, planning takes center stage. In the end you
want stability and dependability.
 How many users need access?
 What will the security requirements be?
 Are there software licensing issues that need
consideration?
 Is platform portability a requirement?
 Two tier or three tier architecture?
24
Selecting an Application Layer



Direct access to the database layer? (probably
should be avoided)
Are you replicating? How? Where? With what?
There are no utilities that will port data from 1
database to another (i.e., postgres to mysql). if
database portability is a requirement, an
independent code must be written to satisfy this
requirement.
25
Selecting an Application Layer
Cont.







Application maintenance issues
People availability, working with users as a team, talent,
and turnover? (historically a huge issue)
A ‘known’ or ‘common’ language?
Freeware? Bug fixes, patches…are they important and
timely?
Documentation? Set standards, procedures, code
reviews making sure the documentation exists and is
clear.
Is the application flexible enough to easily accommodate
business rule changes that mandate modifications?
The availability of an ER diagram at this stage is
invaluable. We consider it a must have.
There are no utilities to port data from 1 type of db to 26
Selecting an Application Layer
Misc. application definitions…
This presentation is not an application
presentation, but I will mention a few terms you
may hear.
Sql the query language for relational databases. A
must learn.
ODBC, open database connectivity. The software
that allows a database to talk to an application.
JDBC, java database connectivity.
27
Relational Design
The design of the application schema will
determine the usability and query ability
of the application. Done incorrectly, the
application and users will suffer until
someone else is forced to rewrite it.
28
Relational Design
The Setup
The database group has a standard 3 tier
infrastructure for developing and deploying
production databases and applications. This
infrastructure provides 3 database instances,
development, integration and production. This
infrastructure is applicable to any application
schema, mission critical or not. It is designed
to insure development, testing, feedback,
signoff, and an protected production
environment.
Each of these instances contain 1 or more
applications.
29
Relational Design
The Setup
The 3 instances are used as follows:
1. Development instance. Developers
playground. Small in size compared to
production. Much of the data is
‘invented’ and input by the developers.
Usually there is not enough disk space to
ever ‘refresh’ with production data.
30
Relational Design Cont.
The Setup
2. The integration instance is used for
moving what is thought to be ‘complete’
functionality to a pre production
implementation. Power users and
developers work in concert in integration
to make sure the specs were followed.
The users should use integration as their
sign off area. Cuts from dev to int are
frequent and common to maintain the
newest releases in int for user testing.
31
Relational Design Cont.
The Setup
3. The production instance, real data. Needs
to be kept pure. NO testing allowed. Very
few logons. The optimal setup of a
production database server machine has
~3 operating system logons, root, the
database logon (ie oracle), and a
monitoring tool. In a critical 24x7
supported database, developers,
development tools, web servers, log files,
all should be kept off the production
32
database server.
Relational Design Cont.
The Setup
Let’s talk about mission critical & 24x7 a bit.
1.
To optimize a mission critical 24/7 database, the
database server machine should be dedicated to
running the database, nothing else.
2.
All software products need maintenance and downtime.
Resist putting software products on the db server
machine so that their maintenance does not inhibit the
running of the database. Further, if the product
breaks, it could inhibit access to the database for a
long period. Example, a logging application, monitoring
users on the db goes wild, fills all available space and
halts the database. If this logging app. were not on
the dbserver machine, the db would be unaffected by
the malfunction.
33
Relational Design Cont.
The Setup
3. All database applications and database software require
modifications. Most times these modification require
down time because the schema or data modifications
need to lock entire tables exclusively. If you are sharing
your database instance with other many other
applications, and 1 of those applications needs the
database for an upgrade, all apps may have to take the
down time. Avoid this by insuring your 24/7 database
application is segregated from all other software that is
not absolutely needed. In that way you insure any down
times are specific to your cause.
34
Our 1st relational example
A cpu can
house
1 or more
databases
CPU
(d0ora2)
An database can
accommodate 1 or
more instances
Databases
on d0ora2
(d0ofprd1,
d0ofint1)
An instance may
contain 1 or more
application
schemas
schema
applications in
d0ofprd1
(sam, runs, calib)
schema
applications in
d0ofint1
(sam, runs, calib)
35
What is a schema?
It is
It is not
Tables (columns/datatypes) having
Constraints (not null, unique, foreign &
primary keys)
Triggers
Indexes
etc.
Accounts
Privileges & Roles
Server side processes
The
environment (servers, OS)
The results of queries, I.e objects
Application Code
One implements a schema by running scripts.
These scripts can be run against multiple
servers and should be archived.
36
Relational Design
Getting Started
Using your design tool, you will begin by relating objects that will
eventually become tables. All the other schema objects will fall out
of this design.
You will spend LOADS of time in your design tool, honing, redoing,
reacting to modifications, etc.
The end users and the designers need to be working almost at the
same desk for this process. If the end user is the designer, the end
user should involve additional users to insure an unbiased and
general design.
It is highly suggested that the design be kept up to date for future
documentation and maintainers.
Tables are related, most frequently in a 0 to many relationship.
Example, 1 run will result in 0 or more events. Analyzing and
defining these relationships results in an application schema.
37
What will a good schema design
buy you?
I am afraid the 80% planning 20% implementation
rule applies. Gather requirements.
 Discovery of data that needs to be gathered.
 Fast query results
 Limited application code maintenance
 Data flexibility
 Less painful turnover of application to new
maintainers.
 Fewer long term maintenance issues.
38
Relational Design
Let’s get started
Write a requirements document.
You will not be able to anticipate all requirements,
but a document will be a start. A well designed
schema naturally allows for additional
functionality.
Who are the users? What is their mission?
Identify objects that need to be stored/tracked.
Think about how objects relate to each other.
Do not be afraid to argue/debate the relationships
with others.
39
Relational Design
So how do you get there?
Design tools are available, however, they do not think for
you. They will give you a clue that you are doing
something stupid, but it won’t stop you. It is highly
recommended you use a design tool.
A picture says 1000 words. Create ER, entity relationship,
diagrams.
Get a commitment from the developer(s) to see the
application through to implementation. We have seen
several applications redone multiple times. A string of
developers tried, left the project, and left a mess. A new
developer started from scratch because there was no
documentation or design.
40
Relational Design
How do I get there?
Adhere to the recommendations of your database
vendor for setup and architecture.
Don’t be afraid to ask for help or to see other
examples.
Don’t be afraid to pilfer others design work, if it is
good, if it closely fits your requirements, then use it.
Ask questions, schedule reviews with experts and
users.
Work with your hardware system administrators to
insure you have the hardware you need for the
proposed job to be done.
41
Relational Design
Common Mistakes
Mistakes we see ALL the time


Do not design your schema around your favorite
query. A relational design will enable all queries
to be speedy, not only your favorite.
Don’t design the schema around your narrow
view of the application. Get other users involved
from the start, ask for input and review.
42
Relational Design
Common Mistakes

Create a relational structure, not a
hierarchical structure. The ER diagram
should not necessarily resemble a tree
or a circle. It is the logical building of
relationships between data.
Relationships flow between subsets of
data. The resulting ER diagram’s ‘look’
is not a standard by which one can
judge the quality of the design.
43
Relational Design
Common Mistakes



Do not create 1 huge table to hold 99% of
the data. We have seen a table with 1100+
columns…unusable, unqueryable, required
an entire application rewrite, took over a
year, made 80 tables from the 1 table.
Do not create separate schemas for the
same application or functions within an
application.
Use indices and constraints, this is a MUST!
44
Relational Design
Examples of Common Mistakes


Using timestamp as the primary key assumes
that within a second, no other record will be
inserted. Actually this was not the case, and an
insert operation failed. Use database generated
sequences as primary keys and NON-UNIQUE
index on timestamp.
A table with more than 900 columns. Such
design will cause chaining since each record is
not going to fit in one block. One record
spanning many blocks, thus chaining, hence bad
performance.
45
Relational Design
Examples of Common Mistakes


Do not let the application control a generated
sequence. Have seen locking issues, and
duplicate values issues when the application
increments the sequence. Have the database
increment/lock/constrain the sequence/primary
key. That is why the databases have sequence
mechanisms, use them.
Use indices! An Atlas table with 200,000 rows,
halted during a query. Reason? No indices.
Added a primary key index, instantaneous query
response. Indices are not wasted space!
46
Relational Design
Examples of Common Mistakes
USE DATABASE CONSTRAINTS!!!!!!
Have examples where constraints were not used,
but ‘implemented’ via the api. Bugs in the api
allowed data to be deleted that should not have
been deleted, and constraints would have
prevented the error. Have also seen apis error
with ‘cannot delete’ errors. They were trying to
force an invalid delete, luckily the database
constraints saved the data.
47
Entity Relationship Diagrams
1 to many
PARENT
have
# PAR ENT_ID
A
belong to
# E_ID
B
# B_ID
belong to
have
# C_ID
E
# CHIL D_ID
have
# A_ID
C
CHILD
D
belong to
have
# D_ID
F
belong to
# F_ID
48
Entity Relationship Diagrams
many to many
define
G
H
# G_ID
define
G2
# G2_ ID
I
G2H2
# I2_ID
map to
map to
H2
define
define
relate to
define
I2J2
map to
# H2_ ID
J
# I_ID
I2
# H_ID
owned by
# J_ID
map to
J2
define
# J2_ID
49
Entity Relationship Diagrams
1 to 1
K
define
# K_ID
M
relate to
define
# M_ID
O
# O_ID
L
# L_ID
N
relate to
define
# N_ID
P
relate to
# P_ID
50
Relational Design
The Good
CALIB_TYPE
CALIBRATION
# CAL IB_TYPE_ID
* D ESCRIPTION
# CAL IBRATION_ID
* TSTART
o TEN D
define
be defined by
Calibration type might have 3
rows, drift, pedestal, & gain
This is a parent table.
Each calibration record will be
Defined by drift, pedestal or gain.
In addition to start and end times.
This is a child table.
51
Relational Design
The Bad
CALIBRATION
# CAL IBRATION _ID
* TSTART
o TEN D
define
relate to
define
relate to
define
relate to
DRIFT_CALIB
PEDESTAL_CALIB
GAIN_CALIB
# DRIFT_C ALIB_ID
* TSTART
o TEN D
# PED ESTAL_C ALIB_ID
* TSTART
o TEN D
# GAIN _CALIB_ ID
* TSTART
o TEN D
You have now created 3 different children, all reporting the same information, when 1 child would
suffice. Code will have to be written, tested, and maintained for 4 tables now instead of 2.
52
Relational Design
The Ugly
CALIBRATION
CALIBRATION(2)
# CAL IBRATION_ID
* TSTART
o TEN D
# CAL IBRATION_ID
* TSTART
o TEN D
defines
relate to
PEDESTAL_CALIB
# PED ESTAL_ CAL IB_ID
* TSTART
o TEN D
CALIBRATION(3)
# CAL IBRATION_ID
* TSTART
o TEN D
defines
relate to
defines
relate to
DRIFT_CALIB
GAIN_CALIB
# DRIFT_C ALIB_ID
* TSTART
o TEN D
# GAIN _CALIB_ID
* TSTART
o TEN D
Now you have created 3 different applications, using 6 tables. All of which could be managed with 2 tables.
Extra code, extra testing, extra maintenance.
53
Relational Design
The Good…let’s recap
CALIB_TYPE
CALIBRATION
# CAL IB_TYPE_ID
* D ESCRIPTION
# CAL IBRATION_ID
* TSTART
o TEN D
define
be defined by
AHHH, back to normal, or normalization as we refer to it.
54
Relational Design
What to expect from a design tool




An entity relationship diagram
The ability to create the ddl (data
definition language) needed
The ability to project disk space usage
Ddl in a format to allow you to enter the
code into a code library (cvs), and that will
allow you to run against your database
55
Relational Design Why bother?
Experience from RunII
TO SAVE TIME AND PRECIOUS PEOPLE RESOURCES!
Personnel consistency does not exist. Application
developers come and go regularly. The
documentation that a design product provides will
the next developer an immediate understanding of
the application in picture format.
Application sharing is enhanced when others can look
at your design and determine whether the
application is reusable in their environment. Sam is
a good example of an application that 3 experiments
are now using.
56
Relational Design
Why bother? Cont.
When an application is under construction,
the ER diagram goes to every application
meeting, and quite possibly the wallet of
the application leader. It is the pictorial
answer to many issues.
Planning for disk space has been an issue,
the designer tool should assist with this
task.
57
Planning
Overall
What do I need to plan for?
People, hardware, software, obsolescence,
maintenance, emergencies.
How far out do I need to plan?
Initially 2-4 years.
How often do I need to review the plans?
Annually.
What if my plan fails or looks undoable?
Nip it in the bud, be proactive, come up with
options.
58
Planning
Overall


Disk space requirements. My experience is all the
wags, (wild guesses) fall short of what is needed. It
is hard to predict the number of rows in a table. It
would be easier if we knew the amount and results
of the science ahead of time! Remember, 10x what
you think the data will take.
Hardware requirements. Experience tells us that the
database machine should serve 1 master (if it is a
large database or mission critical), the database,
nothing else. Ideally there will be root, a database
monitor user and a database user, oracle for
example. No apache, no log file areas, no
applications, etc.
59
Planning
Overall


Growth and obsolesce. Plan for 3-4 years before
needing to replace hardware. Hardware and
software become obsolete. New/upgraded software
gives addition functionality that you will want/need.
Maintenance. Do you change the oil in your car?
Plan on 1 morning per month downtime for caring
for the hardware and software. Security patches
could mandate additional stoppages. I cannot stress
how important this is. Fire walling will not protect
you from bugs and obsolescence. If the downtime is
not needed, it will not be taken. Planning
maintenance time is as important as planning to buy
60
disks.
Planning
User Requirements
Will user requirements influence your
hardware & software decisions?
Do you need replication?
What architecture is your api going to be?
How many users will be loading the
database and hardware?
61
Planning
Maintenance

Database/Operating system software need
upgrades. One always hopes one can get
on a stable version of something and not
upgrade. That is a fallacy. Major version
upgrades provide needed and new
functionality. Bug patches and security
patches are a never ending fact of life.
62
Planning
Backup and Recovery
Backup and recovery procedures of vldb
(very large databases) are difficult at best.
Vldb is normally defined as mulitple Gig or
tera byte databases. This is probably the
most sensitive area when choosing a
freeware database.
Hardware plays a part here as well. Insure
when planning for hardware there is plan
for backup and recovery. Disk and tape
63
Planning
Good Practices with a Hammer
Make a standards document and enforce its
use. When dbas and developers are
always on the same page, life is easier for
both. Expectations are clear and defined.
Anger and disappointment are lessened.
System as well as database standards need
to be followed and enforced.
64
Planning
Failover
Yikes, we are down!
Everyone always wants 24x7 scheduled uptime.
Until they see the cost.
Make anyone who insists on real 100% uptime to
justify it (and pay for it?). 98-99% uptime can
be realized at a much lower cost.
Uptime requirements will influence, possibly
dictate, database choices, hardware choices, fte
requirements.
65
Planning
Failover
The cheapest method of addressing a failure is proactive
planning.
Make sure your database and database software are
backed up. Unless you are using a commercial database
with roll forward recovery, assume you will lose all dml
since your last backup if you need to recover. This
should dictate your backup schedule.
Do not forget tape backups as a catastrophic recovery
method.
Practice recovery on your integration and development
databases. Practice different scenarios, delete a datafile,
delete the entire database.
66
Replication
Replication is the process of copying and
maintaining database objects in multiple
databases that make up a distributed
database system. Replication can improve
the performance and protect the
availability of applications because
alternate data access options exist.
67
Replication Cont.




Oracle Supports 3 types of replication READ ONLY Snapshots
(Materialized views), Advanced Replication and streams based
replication.
Streams allows ddl modifications made to the master
automatically.
Streams can be configured in uni-directional ( Single Source
and one or more than targets) or master to master where
updates can happen to any participant database.
Advanced replication also supports master to master . But
streams based replication is recommended.
READ ONLY Snapshots replication from a Sun box to a Sun &
Linux box(s) is being done in CDF. When a replica is under
maintenance there is failover to another replica. The replicas
are up and running in read only mode if the master is down
for maintenance.
68
Replication cont.
Oracle master to master replication allows for
updates on both the master and replica sides.
Master to master is a complex and a high
maintenance replication. It seems to be the 1st
option the unwitting opt for. Both Cern and
Fermi dbas have requested firm justification
before considering this type of replication
request.
Every link in the multi master would be required to
be a fully staffed, as downtime will be critical.
69
Replication cont.
1.
2.
3.
4.
Disk Space for Archives. If receiving site is down for extended
period of time, then source db should be tuned enough to
hold the archives logs, otherwise, one has to reinstantiate the
replication. Reasonable downtime for target depends upon
archive area being generated on source. Space, space and
more space.
Conflict Resolution In Master to Master, conflict resolution
may be challenge. Rules should be well defined to resolve the
data conflicts.
Design of Data Model if Primary Keys are populated by
sequences , there is very much chance of overlapping the
sequences and will cause integrity constraints. Data Model
should be designed very carefully.
DB Support In Master to Master Replication, all master sites
should be in 24*7 support mode. Otherwise , sync up of data
will be challenge or one may lead to reinstantiation of
replication. Reinstantiation is not unplug and play type of 70
situation.
Freeware Replication
MySQL has replication in the last stable
version (3.23.32, v4.1 is out). It is masterslave replication using binary log of
operations on the server side. It is
possible to build star or chain type
structures.
There is a PostgreSQL replication tool. We
have not tested it yet.
71
Lost in Space
Space is the 1 area consistently under estimated in
every application I have seen. Imho,
consistently, data volume initial estimates were
undersized by a factor of 2 or 3. For example,
RunII events were estimated at 1 billion rows.
This estimate was surpassed Feb. 2004. We will
probably end up with 4-5 billion event rows.
That is a lot of disk space.
Disk hardware becomes unsupported, and
obsolete in what seems to be a blink of an eye.
72
Lost In Space cont.
8 x N Gb
N Gb
Unexpected?
Rollback
Indexuse disk
Redo
AllData
databases
to store
data.
Data
Index
mirror
mirror
Backup
Replication
Good rule of thumb:
You need 10x the disk to hold a given amount of data in an RDB.
Operate in 2 year cycles:
• First 2 years storage available on day 1.
• Evaluate growth at end of year 1, begin prep of next 2 yr.
73
Lost in Space, cont.
You will use as much disk space as you
purchase, and then some.
Database indices will take MINIMALLY at
least as much space as the tables,
probably considerably more.
Give WIDE lead time to purchase disk
storage. New disks are not installed and
configured over night. They require
planning, downtime and $.
74
Additional References
**WARNING some of these may be database specific.
 Intro to database design
http://www.cc.gatech.edu/classes/AY2000/cs4400_sprin
g/cs4400a/
 Intro to Oracle tutorial
http://w2.syronex.com/jmr/edu/db/
 Evolutionary Database Design
http://www.martinfowler.com/articles/evodb.html
mentions 1 dba for atlas
 Sql course http://sqlcourse.com/
75
Additional References


***Highly recommended reading, db
comparatives http://wwwcss.fnal.gov/dsg/external/freeware/
db infrastructure standard, support levels,
etc. for fermi computing http://wwwcss.fnal.gov/dsg/external/oracle_admin/
76
Additional References


Oracle Designer tutorial http://wwwcss.fnal.gov/dsg/internal/ora_adm/index.htm#d
esigner (choose Oracle Designer tutorial or
Oracle Designer Short Cuts and Lessons
Learned)
Btev specific additional information
http://wwwcss.fnal.gov/dsg/external/BTeV/index.html
77