Download RDBMS - Simmons College

RDBMS
Introduction
Any collection of objects that is intentionally organized for a purpose and for efficient
retrieval is a “database”. In that sense, an entire library is a “database”. The term
“database” is used casually, and usually incorrectly, to refer to the computer-based
“relational database management systems” or “RDBMS”, discussed below. An
RDBMS refers to the entire concept: business requirements, rules for describing
data, commands for creating and manipulating the data, and generating reports.
You should note, though, that most people use the term to refer solely to the
computer program.
An RDBMS has several major components: the business logic (the rules that govern what data are to be
collected, by whom, when, and how they’re to be used), the data themselves (grouped by function based on
the business needs, and by the specific types of data; see below), and the commands to manipulate the data.
What is informally called a “database” is usually the computer application that carries out the business rules
and manipulates the data. Some database applications, such as FileMaker Pro, MS Access, and
phpMyAdmin, provide GUI tools to facilitate defining databases and data tables and creating reports from the
data. Other tools, such as MySQL, Oracle, and Sybase, are just the tools for creating and manipulating data - the
programmer must create the rest of the computer application for input, reports, etc. See also the .pdf
(Database1.pdf).
For students new to RDBMS, it is helpful to think of it as a spreadsheet. There are columns and rows. Every
cell in the spreadsheet has a unique row and column address (e.g., row 5, column 2). If we know the row
and column, we can find the data. RDBMS are computer implementations of the relational algebra that
makes it possible to identify uniquely every cell and every set of related cells. This is the key: we can
identify a single cell by some unique value, such as a record number, or by a set of values, such as
lastname+firstname+date_of_birth+shoe_size.
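As a rough illustration of the two ways of identifying a row described above, the sketch below finds a record first by a unique record number and then by a combination of values. It uses Python's built-in sqlite3 module as a stand-in for a full RDBMS server; the table name people, the helper function names, and the sample rows are all invented for this example.

```python
import sqlite3

def build_people_db():
    """Create an in-memory database with a small, made-up 'people' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people ("
        " record_no INTEGER PRIMARY KEY,"   # one unique value per row
        " lastname TEXT, firstname TEXT,"
        " date_of_birth TEXT, shoe_size REAL)"
    )
    conn.executemany(
        "INSERT INTO people VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Smith", "Jane", "1990-04-02", 7.5),
            (2, "Smith", "Jane", "1985-11-30", 9.0),
            (3, "De la Rosa", "Tom", "1990-04-02", 10.5),
        ],
    )
    return conn

def find_by_record_no(conn, record_no):
    """Identify one row by its unique record number."""
    return conn.execute(
        "SELECT lastname, firstname FROM people WHERE record_no = ?",
        (record_no,),
    ).fetchall()

def find_by_values(conn, lastname, firstname, date_of_birth, shoe_size):
    """Identify one row by a set of values that is unique only in combination."""
    return conn.execute(
        "SELECT record_no FROM people"
        " WHERE lastname = ? AND firstname = ?"
        " AND date_of_birth = ? AND shoe_size = ?",
        (lastname, firstname, date_of_birth, shoe_size),
    ).fetchall()

conn = build_people_db()
print(find_by_record_no(conn, 3))
print(find_by_values(conn, "Smith", "Jane", "1985-11-30", 9.0))
```

Two of the sample rows share a last and first name, which is why the name alone would not be enough to pick out a single record.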
Steps in creating an RDBMS
In both systems analysis and information architecture, one of the first phases of creating an RDBMS is to
gather information about what data need to be collected, how they will be used and by whom, and the
technical concerns about storing, retrieving, and presenting the data.
Data collection: only data that are actually needed ought to be collected. This is determined by
interviews, reviewing emails and other files that document the kind of problems people have with their
current system, and by studying how people in an organization use the data. Ultimately a report is
generated that outlines the problem and presents candidate solutions. From that report, a database
administrator or programmer considers the data problem from three perspectives: the conceptual, logical,
and physical.
Conceptual phase: in this phase, the programmer or analyst breaks down the needs into functions
and the relationships among the data and the people who use the data. For example, say you’re creating a
payroll system (this becomes the “payroll database”). You need to know the names of the staff, their pay
rates, hours worked, whether the check has been printed, how much tax and insurance is to be deducted,
and so on. In the conceptual phase, you’d decide how the data should be grouped by function: e.g., record
hours worked function, print paycheck function, check data for accuracy function, and so on. At this high
level, only the major functions are defined and related and will become the “tables” that belong to the
single database. How the data move between these functions is called “data flow analysis.” The analyst
determines also who gets to see what data and when (this is called the “view of the data”). For example,
Joe cannot see Tom’s paycheck. Joe cannot change his pay rate but his manager can. These controls have
to be specified and documented.
From this we can move to the logical phase. Here the main activities are “data decomposition” and
“data normalization.” Data decomposition means breaking down the concepts into something closer to
how a computer might use them. The concept of “staff name” must be broken down, say, into last name,
first name, middle initial, job title. In addition, the analyst tries to keep redundant data from being
collected. For example we want to gather the staff member’s name only once and at the right time (when
hired). This task is called normalization. There are many forms of normalization but the goal for most of us
is to reach “third normal form” or 3NF. The interfaces of FileMaker Pro and MS Access can guide you toward
3NF but do not guarantee it, so the programmer remains responsible for enforcing normalization. [There are
times when redundant data must be captured, but this has to be justified in the design of the database system.]
All the data that are to be captured must be defined by being given a data type (see the Topic
Data_Types), and all the definitions are gathered into a document called the data dictionary. The Data Dictionary is
literally that - the names of all the data to be captured, their data types, their aliases, how the data are used,
what table contains what fields, etc. It is critical to the success of any project to have and maintain the data
dictionary.
Physical phase. Once the data have been defined, the programmer uses the logical design phase
documents to construct the actual database and tables in software. If you use MS Access or a similar
product, the Access program guides you but it is still easy to make mistakes if you don’t understand how
RDBMS work. We’ll focus on MySQL to demonstrate.
MySQL is a software program that helps you to create other software programs, specifically the
data part of an RDBMS, by issuing various commands. You’ll still need some kind of programming or
scripting to interact with the database and tables. [In LIS488, we’ll use PHP and Java.] All SQL programs
cluster their commands into two groups: Data Definition Language (or DDL) and Data Manipulation
Language (DML).
DDL commands include CREATE DATABASE …, CREATE TABLE … and others
DML commands include INSERT INTO …, SELECT …, UPDATE … and others
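To make the DDL/DML split concrete, here is a minimal sketch that runs one statement of each kind. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (SQLite has no CREATE DATABASE or USE step, so those are skipped), and the table and column names are chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in; no CREATE DATABASE / USE needed

# DDL: statements that define structure.
conn.execute(
    "CREATE TABLE grades (last_name VARCHAR(25), course_1 VARCHAR(6), grade_1 FLOAT)"
)

# DML: statements that work with the data inside that structure.
conn.execute("INSERT INTO grades VALUES ('Smith', 'LIS488', 3.7)")
conn.execute("UPDATE grades SET grade_1 = 4.0 WHERE last_name = 'Smith'")
rows = conn.execute("SELECT last_name, course_1, grade_1 FROM grades").fetchall()
print(rows)

conn.execute("DELETE FROM grades WHERE last_name = 'Smith'")
remaining = conn.execute("SELECT COUNT(*) FROM grades").fetchone()[0]
print(remaining)
```

The CREATE statement changes what the database can hold; the INSERT, UPDATE, SELECT, and DELETE statements only change or read what it currently holds.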
Example: Let’s say you’re creating a database about yourself and you want to insert into and retrieve data
from the database over the Internet. As a first try, we create a database called “Transcripts” and then a
table called “grades”. This is a possible data definition:
field name        data type   size            example
last_name         String      25 characters   Smith, De la Rosa
first_name        String      15 characters   Jane, Tom
middle_initial    char        1               M
collegeName       String      25              Wellesley College
major             String      25              Art History
age               int                         24
course_1          String      6               LIS488
grade_1           float       4               4.0
course_2          String      6               LIS458
grade_2           float       4               3.7
[and so on...]
If we know these data, we can plan the kind of report (screen) design you want:
Welcome to Jane Doe’s online resume
About me
I’m a 24-year-old Art major from Wellesley College now enrolled in GSLIS at
Simmons College.
My classes
LIS488   4.0   A
LIS458   3.7   A
GPA: 3.52
To get these data, we issue the command to SQL to use the database we want: “USE Transcripts”
Then we issue the command to the table that is part of the database (“grades”):
SELECT * FROM grades;
* means “all columns”, and with no WHERE clause every record is returned, so the command is “get all the
records from the grades table”. Because there is only 1 record in the grades table, we retrieve all our data
(in this case only the 1 record).
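The difference between asking for everything and asking for part of a record can be sketched as follows. This is a hedged illustration using Python's built-in sqlite3 module rather than MySQL, with a cut-down version of the grades table and one invented record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grades (last_name TEXT, first_name TEXT,"
    " course_1 TEXT, grade_1 REAL, course_2 TEXT, grade_2 REAL)"
)
conn.execute("INSERT INTO grades VALUES ('Doe', 'Jane', 'LIS488', 4.0, 'LIS458', 3.7)")

# SELECT * returns every column; with no WHERE clause it returns every row too.
cursor = conn.execute("SELECT * FROM grades")
all_columns = [col[0] for col in cursor.description]
all_rows = cursor.fetchall()

# Naming columns and adding a WHERE clause narrows the result.
one_grade = conn.execute(
    "SELECT course_1, grade_1 FROM grades WHERE last_name = ?", ("Doe",)
).fetchall()

print(all_columns)
print(all_rows)
print(one_grade)
```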
If everyone in the class wanted to post their grades online, then it would make more sense to
separate (decompose) the data into different functions: the student name and personal data in one table,
the grades in another table, and the information about the courses into another table.
database studentInfo
table 1: student_names
table 2: grades
table 3: class_info
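The decomposition just described can be sketched in plain Python, without any database at all. The function name decompose and the linking fields record_no and class_number are assumptions made for this illustration; the point is only that the repeating course_N / grade_N pairs of the wide record become separate rows.

```python
def decompose(wide_record, record_no):
    """Split one wide 'grades' record into rows for three narrower tables."""
    student_row = {
        "record_no": record_no,
        "last_name": wide_record["last_name"],
        "first_name": wide_record["first_name"],
        "major": wide_record["major"],
    }
    grade_rows = []
    class_rows = []
    n = 1
    # Walk the repeating course_N / grade_N pairs until they run out.
    while f"course_{n}" in wide_record:
        course = wide_record[f"course_{n}"]
        grade_rows.append(
            {"record_no": record_no, "class_number": course,
             "grade": wide_record[f"grade_{n}"]}
        )
        class_rows.append({"class_number": course})
        n += 1
    return student_row, grade_rows, class_rows

wide = {
    "last_name": "Doe", "first_name": "Jane", "major": "Art History",
    "course_1": "LIS488", "grade_1": 4.0,
    "course_2": "LIS458", "grade_2": 3.7,
}
student, grades, classes = decompose(wide, record_no=1)
print(student)
print(grades)
print(classes)
```

After the split, a student with ten courses needs ten short grade rows instead of twenty extra columns, and the student's name is stored only once.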
So, let’s redefine our database tables and update the data dictionary.
Database: studentInfo
tables: student_names, grades, class_info

table: student_names
record_no        int not null auto_increment
last_name        String 25
first_name       String 15
middle_initial   char 1
major            String 25
college          String 25

table: grades
record_no        int not null auto_increment
class_number     String 6 not null
grade            String 2

table: class_info
class_number     String 6
class_name       String 25
desc             Text
Note that we define Strings but must choose the right kind of String for our database, MySQL.
Strings are “varchar()” [a variable-length field; older versions of MySQL limit it to 0-255 characters]. A “text”
holds up to 65,535 characters (fewer when a character takes more than one byte).
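The limits of 255 and 65,535 are not arbitrary: they are the largest numbers that fit in one and two bytes. The short sketch below works that arithmetic out. The helper name max_unsigned is invented here, and the byte sizes listed for MySQL's integer types are taken from MySQL's documented storage sizes, included as a side note.

```python
def max_unsigned(n_bytes):
    """Largest value that fits in n_bytes when no bit is reserved for a sign."""
    return 2 ** (8 * n_bytes) - 1

# Length limits that follow from the size of the stored length prefix.
print(max_unsigned(1))   # 1-byte length prefix
print(max_unsigned(2))   # 2-byte length prefix, as used by TEXT

# Unsigned ranges of MySQL's integer types, keyed by storage size in bytes.
integer_bytes = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}
for name, size in integer_bytes.items():
    print(f"{name} UNSIGNED: 0 to {max_unsigned(size):,}")
```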
Using this data dictionary, the programmer creates the physical form of the database and tables.
The underline means the field is indexed. More on that shortly. SQL commands end with a semi-colon ;
CREATE database studentinfo;
Now, let’s use the empty database and add tables: USE studentinfo; To create tables, we add these
commands:
CREATE TABLE student_names (
record_no int unsigned not null auto_increment,
last_name varchar(25),
first_name varchar(15),
middle_initial char(1),
major varchar(25),
college varchar(25),
primary key (record_no)
);
[press the return key]
CREATE TABLE grades (
record_no int unsigned not null,
class_number varchar(6) not null,
grade varchar(2)
);
CREATE TABLE class_info (
class_number varchar(6),
class_name varchar(25),
`desc` text -- desc is a reserved word in MySQL, so it is quoted with backticks
);
Notice that student_names.record_no field is “int unsigned” - this means the record number is an integer
between 0 and 4,294,967,295 (the smaller 0 - 65,535 range belongs to “smallint unsigned”). It has to be an
integer because the computer performs an arithmetic function on the
record number every time a new record is added. [The old record number is reviewed and 1 added to it to
create the new record number.] It is “not null” because the field is the primary key, which is indexed for fast
retrieval. A primary key cannot hold a missing value, so the addition of “not null” forces the SQL program to give us an
error if we try to save the record without a value. We know there will be an index (and the primary key) by
the “primary key” line at the end of the create statement. In the grades table, we see two fields cannot be null. Later we would
create indices on these fields, too. Finally, notice that the class_info table has a field called “desc” and a
type of Text. This means we can enter up to 65,535 characters in this field - far longer than we need to add
a course description.
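The two behaviors described above, automatic record numbers and the refusal of missing values, can be tried out in a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, so the spelling differs slightly (AUTOINCREMENT rather than auto_increment), and the sample names are invented. MySQL's own handling of a missing “not null” value can depend on how the server's SQL mode is configured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite spelling of the same ideas: AUTOINCREMENT instead of MySQL's auto_increment.
conn.execute(
    "CREATE TABLE student_names ("
    " record_no INTEGER PRIMARY KEY AUTOINCREMENT,"
    " last_name VARCHAR(25), first_name VARCHAR(15))"
)
conn.execute(
    "CREATE TABLE grades ("
    " record_no INTEGER NOT NULL,"
    " class_number VARCHAR(6) NOT NULL,"
    " grade VARCHAR(2))"
)

# No record_no is supplied; the database assigns the next number itself.
for last, first in [("Smith", "Jane"), ("De la Rosa", "Tom")]:
    conn.execute(
        "INSERT INTO student_names (last_name, first_name) VALUES (?, ?)",
        (last, first),
    )
assigned = [
    row[0]
    for row in conn.execute("SELECT record_no FROM student_names ORDER BY record_no")
]
print(assigned)

# Leaving out a NOT NULL field is rejected instead of being stored as a blank.
try:
    conn.execute("INSERT INTO grades (record_no, grade) VALUES (1, 'A')")
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)
```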
Assignment or Lab - Due before the next class.
1. Practice the various commands listed below and compare their behavior with different options.
Bridge from Entity Relationship modeling to creating SQL databases, tables, & relations
No doubt, you’ve learned already some of the concepts & terminology of relational database modeling. To
get up to speed with what you’ve covered, here are some quick notes that may help us harmonize our
perspectives.
Components of a DBMS:
Many roles and activities are involved, not all of which you may have encountered. Here’s a view of
the components of a DBMS. Notice the different contributions of programmers, users, and the DBA - the
database administrator. Note, too, the functions that constitute a DBMS (and where the DDL and DML fit
in). Ultimately the commands and functions must be communicated to the computer system itself - via the
file manager, various other access methods and buffers - before reaching the actual data stored in the
relational databases and tables.
Query Processor: transforms queries into a series of low-level instructions directed to the database
manager.
Database Manager (DM): interacts with the user-submitted application programs and queries. The DM
accepts queries and examines the external and conceptual schema to determine what conceptual records
are required to satisfy the request. The DM places a call to the File Manager to perform the request.
File Manager: manipulates the underlying storage files and manages the allocation of storage space on
the disk. Actual physical manipulation of the data is passed to the appropriate access method.
DML preprocessor: converts DML statements embedded in an application program into standard function
calls in the host language (e.g., PHP or Java).
DDL compiler: converts DDL statements into a set of tables containing metadata. These tables are then
stored in the catalog while control info is stored in the data file headers.
Catalog manager: [not the library kind!] - manages access to and maintains the system catalog; the
system catalog is accessed by most of the DBMS components.
There are other important functions in SQL software such as authorization control, command processor,
integrity checker, query optimizer, transaction manager, scheduler, recovery manager, and buffer manager.
Most of these functions are performed by the 3rd party application (such as MS Access) or can be
manipulated at the command line or in other resource files. Very large systems, such as those that usually
use Oracle, have lots of these helper resources.
Terms:
Relation: a relation is a table with columns & rows
Attribute: an attribute is a named column of a relation
Domain: is the set of allowable values for one or more attributes.
Figure 2: Instances of two tables (branch and staff relations):
What to know
Database
Shared collection of logically related data (and a description of these data),
designed to meet the info needs of an organization.
Entity
Distinct object (person, place, thing, concept, or event) represented in the
database
Attribute
A property that describes some aspect of the object we wish to record; also a
named column of a relation.
Domain
Set of allowable values for one or more attributes.
Tuple
Row of a relation
Degree
The degree of a relation is the number of attributes it contains.
Cardinality
The number of rows in a relation.
Superkey
Attribute or a set of attributes that identifies uniquely a tuple within a relation.
Candidate key
Superkey such that no proper subset is a superkey within the relation.
Relational db
Collection of normalized relations.
Primary key
Candidate key that is selected to identify tuples uniquely within the relation.
Foreign key
Attribute or set of attributes within one relation that matches the candidate key of
some (possibly the same) relation.
Null
A value for an attribute that is currently unknown or is not applicable for this tuple
Referential integrity
If a foreign key exists in a relation, either the foreign key value must match a
candidate key value of some tuple in its home relation or the foreign key value
must be wholly null.
Join
The combination of two or more tables into one result, made by matching rows on
related column values.
Relationship
Association between several entities.
Relation
A table with columns and rows.
DDL
Data Definition Language - set of commands to define the data and constraints on
the data stored in the database
DML
Data Manipulation Language - set of commands to insert, change, delete, and retrieve data; the
retrieval part provides a general enquiry facility to the data (aka query language)
View of the data
Dynamic result of one or more relational operations on the base relations to
produce another relation. A view is a virtual relation that doesn’t actually exist in
the DB.
Logical db design
The layout of the relationships of data, using specific design techniques and tools
that document the needs and uses of data, according to an organization’s specific
data needs.
Business rules
The articulation of an organization’s data needs.
Physical db design
The implementation of the logical database design, usually expressed as the
creation of databases, tables, indices, etc.
OODBMS
Object-Oriented Database Management System.
External view
Users’ view of the db
Conceptual level
What data are stored in the db and the relationship among the data
Internal level
Physical representation of the db on the computer.
DB Schema
The definition of a database; the structure and content in each data element
within the structure. Often created using visualization tools.
Data independence
Techniques that allow data to be changed without affecting the applications that
process it.
5GL
Fifth Generation Language - computer languages are numbered by generation (e.g., 4GL, 5GL),
and each generation is meant to be closer to human language than the last.
Data normalization
The process of analyzing data into record groups for more efficient processing.
There are many stages, the most standard result being “3NF” (third normal form)
where every non-key field depends only on the key field of its record. The main purpose is
to eliminate having to store a single datum in more than one place.
3NF
Third normal form; the standard goal of normalization.
Information system
Resources that enable the collection, management, control, and dissemination of
info throughout an organization.
DB app lifecycle
DB planning, system definition, requirements collection and analysis, db design,
application design and prototyping, etc.
Requirements analysis
Process of collecting and analyzing info about the part of the organization that is
to be supported by the DB application and using this info to identify the users’
requirements of the new system.
CASE
Computer-Aided Software Engineering - usually software that helps in the
development of an information system, including analysis, design, and
programming.
Data dictionary
The document (or database) that defines the databases, tables, data types,
sources, etc., for an information system.
Entity type
An object or concept that is identified by the organization as having an
independent existence.
Weak entity type
Entity that is existence-dependent on some other entity.
Strong entity type
Entity that is not existence-dependent on some other entity.
Composite attribute
Attribute composed of components, each with an independent existence.
Multi-valued attribute
Attribute that holds multiple values for a single entity
Derived attribute
Attribute that represents a value that is derivable from the value of a related
attribute or set of attributes not necessarily in the same entity.
Superclass
Entity type that includes distinct subclasses that need to be represented in
the data model.
Normalization
The process of producing a set of relations with desirable properties, given the
data requirements of an enterprise.
1:M
Representation of the one-to-many relationship, one data element related to
many others.
M:N
Representation of many-to-many data relationships; database normalization
breaks down M:N to 1:M
1:1
One-to-one data relationship.
Oracle
Popular very large scale relational database product.
SELECT
Database command, or statement, to select data from a table
ALTER
Database command, or statement, to modify the structure of a database table
UPDATE
Database command to change the data contents of an existing row.
INSERT
Database command to add new data (add a new row) into a table.
phpMyAdmin
Web-based GUI for working with MySQL
Data decomposition
Breaking down of work functions into discrete units of data
ER
Entity-relationship diagram, a graphic representation of data relationships
UML
Unified Modeling Language, an object oriented analysis and design language; has
12 diagrams (four structural, five behavioral, and three model management:
packages, subsystems, and models).
Web-enabled DB
A relational database that has been linked to the Internet.
Object-oriented programming
Writing software that supports a model wherein the data and associated
processing (“methods”) are defined as self-contained “objects.” OOP has three
major features: encapsulation, inheritance, and polymorphism.
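Several of the terms above (primary key, foreign key, referential integrity, join, view, and M:N broken down into 1:M) can be seen working together in one small sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL; the view name transcript and the sample rows, including the course name, are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
CREATE TABLE student_names (
    record_no INTEGER PRIMARY KEY,
    last_name VARCHAR(25)
);
CREATE TABLE class_info (
    class_number VARCHAR(6) PRIMARY KEY,
    class_name VARCHAR(25)
);
-- grades links the other two tables: two 1:M relationships replace one M:N.
CREATE TABLE grades (
    record_no INTEGER NOT NULL REFERENCES student_names(record_no),
    class_number VARCHAR(6) NOT NULL REFERENCES class_info(class_number),
    grade VARCHAR(2)
);
INSERT INTO student_names VALUES (1, 'Doe');
INSERT INTO class_info VALUES ('LIS488', 'Example course');
INSERT INTO grades VALUES (1, 'LIS488', 'A');
-- A view is a stored query; its rows are computed when it is read.
CREATE VIEW transcript AS
    SELECT s.last_name, c.class_name, g.grade
    FROM grades AS g
    JOIN student_names AS s ON s.record_no = g.record_no
    JOIN class_info AS c ON c.class_number = g.class_number;
""")

transcript = conn.execute("SELECT * FROM transcript").fetchall()
print(transcript)

# Referential integrity: a grade for a student who does not exist is refused.
try:
    conn.execute("INSERT INTO grades VALUES (99, 'LIS488', 'B')")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False
print(orphan_allowed)
```

The join is what reassembles the one-page transcript from three narrow tables, so nothing is lost by storing each fact only once.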