Download IDDC & CAM Cores - Harvard University

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Database Design
Presented to
Molecular Biology Bioinformatics Meeting
by
Norbert Kremer
April 29, 2002
Database Features
• Efficient storage and retrieval of data
relevant to a specific purpose
– Data integrity
– Security
– Performance, indexing
• Best design depends on application
Database Technologies
• Manual Systems
– Shoebox full of index cards
– Yellow stickies
• Mechanical Systems
– Hollerith Cards
– Indexing holes
Database Technologies
• Automated Systems
– Flat file, fields in fixed or delimited columns
– Early database systems: network, hierarchical
– Client-server paradigm: relational database
management systems
– Object databases, e.g. AceDB
– Object-relational, e.g. Oracle 9
– XML, eg Neocore and Tamino
RDBMS Features 1/2
• Data stored as tables with keys
• Sound mathematical foundations in set
theory, T.Codd, IBM Research Labs
• Relational algebra – closure
• Best for highly structured data
RDBMS Features 2/2
• SQL is most common relational language
• SQL is non-procedural, relationally
complete
• Robust, mature technology, performance
and scalability, security, data integrity
Relational Table Structure
• Examples showing
– One big table
– Multiple tables with keys
• Normalization Process
– Data elements depend on the “key, the whole
key, and nothing but the key”
Data Modeling
• Identify and define data elements
• Discover logical relationships between data
elements
Software Engineering Methodology
ANALYSIS
Identify and
Define data
elements
DESIGN
Logical
Relationships
BUILD
Physical
layout
DEPLOY
Requirements
Doc,
Functional
Spec
Data
dictionary
Technical
Spec
Database
schema
Code,
Release cycle
DB creation
scripts, seed
data
Maintenance
SOPs
Backup,
DB Tuning
Data Integrity
• Constraints enforced by DBMS
• Transaction processing
Data Integrity - Constraints
• Constraints enforced by DBMS
– Central location eases maintenance
– Uniform application in all code
– Cannot be subverted by malicious or lazy
programming
• Foreign keys
Data Integrity - Transactions
• ACID Properties
–
–
–
–
Atomic
Consistent
Isolated
Durable
Data Integrity - Transactions
• ACID Properties
– Atomic
Logical unit of work, complete
all or roll back
– Consistent All constraints observed
– Isolated
Transactions are independent
– Durable
“Commit” is irreversible
Software Engineering Methodology
ANALYSIS
Identify and
Define data
elements
DESIGN
Logical
Relationships
BUILD
Physical
layout
DEPLOY
Requirements
Doc,
Functional
Spec
Data
dictionary
Technical
Spec
Database
schema
Code,
Release cycle
DB creation
scripts, seed
data
Maintenance
SOPs
Backup,
DB Tuning
PowerDesigner Demo
• Graphical communication among analysts,
architects, developers, end users
• Centralized maintenance of database
schema and data dictionary
• Forward engineering (SQL script from
model)
• Reverse engineering (model from existing
database)
• Medline example, showing modeling
process for “associative tables”
References
• WinSQL graphical database client for any ODBCcompliant database (just about all) Get the free Lite version
from http://www.indus-soft.com/winsql/
• ODBC driver for MySQL (also need the usual MySQL
client installed)
http://www.mysql.com/downloads/api-myodbc-3.51.html
• Powerdesigner 9 Trial The trial is fully functional for 45
days. See
http://www.sybase.com/products/enterprisemodeling/powerdesigner
Related documents