Physical Design - RDBMS
LIS458 | Benoit
Spring 2011
Monday, March 14, 2011
Today
✤ Main points about the transition from the conceptual & logical to the physical aspects of RDBMS
✤ Main points about what to look for
✤ Supplemental points about the mechanics of RDBMS systems
✤ DCL, DML
✤ Students’ demo of candidate queries on their tables
✤ The other readings, hands-on practice with phpMyAdmin and the terminal window, and especially Forta’s text come into play today!
Example (Microsoft)
believe it or not
✤ SQL Server 2000
✤ The I/O subsystem (storage engine) is a key component of any relational database. A successful database implementation usually requires careful planning at the early stages of your project. The storage engine of a relational database requires much of this planning, which includes determining:
  ✤ What type of disk hardware to use, such as RAID (redundant array of independent disks) devices. ...
  ✤ How to place your data onto the disks ...
  ✤ Which index design to use to improve query performance in accessing data. ...
  ✤ How to set all configuration parameters appropriately for the database to perform well. … [http://msdn.microsoft.com/en-us/library/aa178575(v=sql.80).aspx]
Main points
✤ After needs assessment, ER design, schema refinement, and definition of views, we have addressed most of the conceptual and logical (or external) schema issues related to our data needs
✤ Now we must consider
  ✤ what to index on
  ✤ how to cluster data
  ✤ what will impact optimizations for storage and retrieval
  ✤ when to break the rules of data decomposition
✤ What should we do to the conceptual schema if the queries we create are unwieldy or inefficient? Can we undo the decomposition of the data?
✤ Should we consider optimizations in the indices, such as hash, B+ tree, or others?
✤ How might we cluster and join our data?
A model of the design process
[Figure: diagram with the labels Conceptual Requirements (Application 1, Application 2), Conceptual Model, Logical Model, Physical Model, External Model (Application 1, Application 2), and Internal Model]
Review - Conceptual Modeling
✤ Usually the result of Systems Analysis
✤ Conceptual level: a top-down (or bottom-up) process that translates the “business information requirements” into an operational database
✤ Info requirements are tightly coupled with business function requirements
✤ Objective is to define and model the things of significance that the business needs to know and the relationships between them
✤ Ignores specifics of hardware and software
✤ Higher level look at the “database”
Review - Data modeling (logical)
✤ Objective is to map the information requirements reflected in the Entity-Relationship Model (and its helper tools, relational schema or variants such as UML) into a Relational Database Design
✤ Necessarily software specific because data types vary by implementation
✤ Should be independent of the hardware
  ✤ Not always the case in commercial systems!
✤ So by now your understanding of the data needs and the components of RDBMS should be in place so we can
  ✤ transform entities into tables
  ✤ transform attributes into columns
  ✤ transform domains into data types and constraints.
Physical Modeling
✤ Creates the physical relational database, tables, etc., to implement the database design in machine form
✤ Hardware and software dependent
✤ Introduces file structure and memory requirements into the database designers’ world! [Design decisions affect the physical storage and retrieval of data - from the machine’s p.o.v. and from the user’s]
✤ International standard for communicating with the database - Structured Query Language (SQL)
  ✤ Data Control Language (DCL) [some say Data Creation Language]
  ✤ Data Definition Language (DDL)
  ✤ Data Manipulation Language (DML)
Physical design
✤ To be efficient - optimize performance of databases
✤ Implements the requirements of users and data gathered during the design phase
✤ Especially in large systems, most DB designers determine the physical file storage requirements
✤ The normalized relations + size estimates for them
✤ Descriptions of the attributes (e.g., varchar(25))
✤ When, how, and how often data are manipulated: entered, retrieved, deleted, updated
✤ Expectations of the data: speed of retrieval, security, backup, recovery, retention, integrity
✤ Descriptions of the technologies used to implement the DB
Physical design
✤ We are, then, affected by
  ✤ System performance - storage formats
    ✤ While storage is cheap, it isn’t free: the sizes of the fields add up, so don’t waste space
    ✤ Need to store strings correctly and be able to manipulate data appropriately (numbers and strings)
  ✤ Physical record composition
  ✤ Data arrangement
  ✤ Indices
  ✤ Query optimization and performance tuning
Remember...
✤ The power of an RDBMS is in its making links to and among data
✤ Tables are related to each other through columns sharing identical data (the keys)
✤ Each table is based on set theory (ultimately each element in the set must be unique)
✤ Relational databases usually manipulate a set of data at a time rather than a single record at a time.
✤ Rows of data are called tuples - each is uniquely identified and describes a phenomenon of interest in the organization
✤ Rows consist of columns or attributes that describe the phenomenon
Terminology
Table (Relation); each line below the headings is a Row (tuple); each heading names a Column (attribute)

ID    Name        Phone      clientID
201   Snowflake   555-1212   12
202   Crumpet     617-3038   14
203   Fishlips    555-9383   2
Terminology
✤ Primary key (PK) - a column or set of columns that uniquely identifies each row in a table
  ✤ A PK of multiple columns is a Composite Primary Key
  ✤ No part of the PK can be null
  ✤ Auto-generated data are great for this.
✤ Foreign key (FK) - a column or combination of columns in one table that refers to a primary key in the same or another table
  ✤ A non-null FK must match an existing primary key value
  ✤ If a FK is part of a primary key, the FK cannot be null
Terminology
✤ ERD and/or UML should be accurate representations of the info needs and organization’s activities
✤ Effective means for collecting and documenting an organization’s info needs
✤ To facilitate communicating ideas to the users
✤ Facilitate development of the physical design because relations are clarified
✤ ERDs are often part of other, larger projects
✤ Goal is to decompose relationships, by understanding the work flow processes, into 1:M or 1:1
✤ Goal is 3NF: all attributes of the entity depend on the primary key
✤ If you can get to the row via the key, you can get all the data
Data types (e.g., MS Access)
✤ Numeric (1, 2, 4, 8 bytes; fixed or float)
✤ Text (255 characters max)
✤ Memo (64000 max)
✤ Date/Time (8 bytes)
✤ Currency (8 bytes, 15 digits + 4 decimals)
✤ Autonumber (4 bytes)
✤ Yes/No (1 bit)
✤ Hyperlinks (64000 characters max)
✤ Byte (0-255)
✤ Integer (-32,768 to 32,767)
✤ Long Integer, String, Double, etc.
Data Types (MySQL)
Physical Requirement Issues
✤ A physical record is a group of fields stored in adjacent memory locations and retrieved together
  ✤ Combination of fixed length & variable length fields
✤ How are data stored?
  ✤ On disks
  ✤ In a buffer
  ✤ With other data representing the relational data on the disk
✤ Processor cache, disk speed, RAM, and other storage devices all add up to the length of time it takes to retrieve data
By the way ...
• Null values cause trouble because different versions of SQL treat nulls differently; also null values affect sums, counts, etc.
• Use “is null” or “is not null”
SELECT * FROM myTable WHERE fname IS NULL;
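To see why nulls “affect sums, counts, etc.”, here is a minimal sketch in plain Java (no database involved) that imitates the usual SQL rule: COUNT(*) counts rows, while COUNT(column), SUM, and AVG skip nulls. The class name, method names, and sample ages are hypothetical, and since versions of SQL differ, check your own system before relying on this behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Imitates, in plain Java, how SQL aggregates commonly treat NULLs.
// All names and data here are hypothetical illustrations.
class NullAggregates {
    // COUNT(*): counts every row, null or not.
    static long countRows(List<Integer> column) {
        return column.size();
    }

    // COUNT(column): counts only the non-null values.
    static long countValues(List<Integer> column) {
        return column.stream().filter(Objects::nonNull).count();
    }

    // SUM(column): nulls are skipped.
    // (SQL returns NULL when there are no non-null values; this sketch returns 0.)
    static long sum(List<Integer> column) {
        return column.stream().filter(Objects::nonNull)
                     .mapToLong(Integer::longValue).sum();
    }

    // AVG(column): divides by the number of non-null values, not the number of rows.
    static double avg(List<Integer> column) {
        return (double) sum(column) / countValues(column);
    }

    public static void main(String[] args) {
        // Four rows, two of which have no age recorded.
        List<Integer> ages = Arrays.asList(30, null, 50, null);
        System.out.println("COUNT(*)   = " + countRows(ages));
        System.out.println("COUNT(age) = " + countValues(ages));
        System.out.println("SUM(age)   = " + sum(ages));
        System.out.println("AVG(age)   = " + avg(ages));
    }
}
```

With this sample the average is 40.0 (80 / 2), not 20.0 (80 / 4): the two rows with null ages are left out of the division.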
By the way...
✤ Recall we need to be aware of nulls! What would happen to our data and model if there were null values?
✤ When is there an advantage, if there is, to fixed length and variable length fields?
✤ Should the RDBMS assign values to sequences (e.g., auto number)?
✤ 3NF vs. BCNF
✤ All decisions about data are affected by the work flow and organization’s info needs
  ✤ including the decision to break the rules and “denormalize” the data
Records on the disk ...
✤ Consist of variable and fixed length fields - but how to know where the data are?
✤ Draw on the board how fixed, variable length, and record references appear
✤ Header: base address (B); location of data is Address = B + length 1 + length 2…
✤ Record header: pointer to the schema, to the length of the record, and to the timestamp … then the data
✤ Fixed fields first; then pointer from header to variable length fields - see MARC
✤ “Reference fields” are pointers to -other- chunks of data located on the disk. They represent the 1:M and M:N relationships.
✤ Physical space on disk: a record may require more than 1 block of disk space, so it needs a pointer to the next block’s location on the disk
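The rule “Address = B + length 1 + length 2…” can be sketched in a few lines of Java. This is only an illustration for fixed length fields: the base address, field lengths, and names below are assumptions, not values from any real RDBMS.

```java
// Computes where each fixed-length field starts inside a record:
// start of field i = base address + lengths of all fields before it.
// The class name, base address, and field lengths are hypothetical.
class RecordLayout {
    static int addressOf(int base, int[] fieldLengths, int index) {
        int address = base;
        for (int i = 0; i < index; i++) {
            address += fieldLengths[i];
        }
        return address;
    }

    public static void main(String[] args) {
        int base = 1000;              // B, the base address from the record header
        int[] lengths = {4, 25, 8};   // e.g., a 4-byte id, a 25-byte name, an 8-byte date
        for (int i = 0; i < lengths.length; i++) {
            System.out.println("field " + i + " starts at "
                + addressOf(base, lengths, i));
        }
    }
}
```

Because every length is known in advance, any field can be reached by arithmetic alone. Variable length fields break this, which is why the header needs pointers to them.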
BLOBs; Disk Space
✤ Binary large objects - images, sounds, etc.
✤ RDBMS tries to store these items in contiguous blocks
✤ Note: the commands affect disk space - the data are shifted or new blocks are required
✤ The host, disk, cylinder number, track number, block within the track, and offset within the block are required to be stored, too.
✤ Consequently, there are many techniques: we don’t have to manipulate them but it’s useful to know about them.
Access Methods comparison
Factor                     Sequential              Indexed                              Hashed
Storage space              No wasted space         No waste but needs index data        More space needed for add/del
Sequential on primary key  Very fast               Moderately fast                      Impractical
Random retrieval           Impractical             Moderately fast                      Very fast
Multikey                   OK but needs full scan  Very fast with multiple indices      Not possible
Deleting                   Can waste space         OK, if dynamic                       Very easy
Adding                     Requires rewriting      OK, if dynamic                       Very easy
Updating                   Usually rewrite         Easy but requires index maintenance  Very easy
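The trade-offs in the comparison can be felt with in-memory stand-ins. This is a rough sketch, not how an RDBMS stores files: a list stands in for a sequential file, a TreeMap for a sorted index such as a B+ tree, and a HashMap for a hashed file. The record values are borrowed from the Terminology slide; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-ins for the three access methods; all names are hypothetical.
class AccessMethods {
    // A record with an integer key, e.g. (201, "Snowflake").
    static final class Rec {
        final int id;
        final String name;
        Rec(int id, String name) { this.id = id; this.name = name; }
    }

    // Sequential file: random retrieval means scanning until the key turns up.
    static String scan(List<Rec> file, int id) {
        for (Rec r : file) {
            if (r.id == id) return r.name;
        }
        return null;
    }

    // Indexed file: a sorted structure (TreeMap here, standing in for a B+ tree).
    static TreeMap<Integer, String> buildIndex(List<Rec> file) {
        TreeMap<Integer, String> index = new TreeMap<>();
        for (Rec r : file) index.put(r.id, r.name);
        return index;
    }

    // Hashed file: direct lookup by key, but no useful key order.
    static Map<Integer, String> buildHash(List<Rec> file) {
        Map<Integer, String> hash = new HashMap<>();
        for (Rec r : file) hash.put(r.id, r.name);
        return hash;
    }

    static List<Rec> sampleFile() {
        List<Rec> file = new ArrayList<>();
        file.add(new Rec(203, "Fishlips"));
        file.add(new Rec(201, "Snowflake"));
        file.add(new Rec(202, "Crumpet"));
        return file;
    }

    public static void main(String[] args) {
        List<Rec> file = sampleFile();
        TreeMap<Integer, String> index = buildIndex(file);
        Map<Integer, String> hash = buildHash(file);

        // Random retrieval works with all three, at different costs.
        System.out.println(scan(file, 202));
        System.out.println(index.get(202));
        System.out.println(hash.get(202));

        // Retrieval in key order and range queries favor the sorted index.
        System.out.println(index.subMap(201, true, 202, true));
    }
}
```

Note what the hash cannot do: there is no cheap way to ask it for “all ids from 201 to 202”, which matches the “impractical” entry for sequential retrieval on a hashed file.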
In short...
1. What storage and media are used?
2. How big is the database? How will it grow over time?
3. What are the required access speeds?
4. Should data be partitioned somehow?
5. Should the data be stored centrally or distributed? On what servers?
6. Who is responsible for maintaining the physical data (the computers) and the data (indices and other needs)?
7. Who controls quality assurance and quality control on updates and additions to the data, the programs?
8. How does your documentation look? Would someone else be able to follow your analysis, documentation, data design, etc.?
And ...
1. What programs (applications) can reach your data?
2. Is the integrity of the data (referential, null values, domain and range) addressed?
   1. Where will the quality control be?
      1. Data control on insertion (e.g., web forms & JavaScript, or on the server in the program?)
      2. Formatting data on output (have you checked for nulls, numbers, strings, etc.?)
3. What are the permissions (grant rights) on your database and tables?
Data Retrieval
✤ SELECT Statements
  ✤ Used to retrieve data from the RDBMS in an ad-hoc manner
  ✤ Data are almost always returned in a table (rows of data described by columns). Programming languages have optimized ways of getting data out of these tables:
    ✤ Java uses “ResultSet”
    ✤ PHP uses $
✤ There’s a logic to how commands are structured - but it’s always best to check the documentation of your version of SQL!
SELECT syntax
SELECT      is followed by a list of at least one column
DISTINCT    suppresses duplicates
*           selects all columns
COLUMN      selects the named column(s)
Alias       gives selected columns a heading
FROM table  specifies the source table(s)
Condition   e.g., WHERE with column names, expressions, constants, and comparisons
ORDER BY    specifies the display order
ASC         in ascending order (default)
DESC        in descending order
Select example
SELECT bookCol.callNo, bookCol.loanedTo, userGroups.userID
FROM bookCol, userGroups
WHERE userGroups.userID = '100'
  AND bookCol.loanedTo = userGroups.userID;
Some options on rows
✤ On individual rows:
  ✤ LOWER, UPPER, CONCAT, SUBSTRING, LENGTH (on strings)
  ✤ ROUND, TRUNC, MOD (on numbers)
  ✤ MONTHS_BETWEEN, ADD_MONTHS, NEXT_DAY, LAST_DAY, ROUND, TRUNC (on dates)
  ✤ TO_CHAR, TO_DATE, and others (conversion functions)
✤ On multiple rows (GROUP BY, HAVING clause)
  ✤ AVG
  ✤ COUNT
  ✤ MIN, MAX
  ✤ SUM, STDDEV, VARIANCE
DDL
✤ CREATE, ALTER, DROP, RENAME, TRUNCATE
✤ CREATE VIEW theData AS SELECT … FROM … WHERE ...
DCL
✤ Some say Data Control Language, some say Data Creation Language
✤ GRANT, REVOKE
✤ Transaction Control:
  ✤ COMMIT, ROLLBACK, SAVEPOINT
DML
✤ INSERT, UPDATE, DELETE
✤ INSERT INTO table [(column [, column…])] VALUES (value [, value…]);
✤ UPDATE table SET columnName = value WHERE condition;
✤ DELETE FROM table WHERE condition;
OORDBMS
✤ Increasingly popular - much more work for the programmer and designer
Documentation
✤ See the MySQL homepage
✤ Worth finding a text and websites whose examples you can understand and apply.
✤ Practice commands using the terminal window or phpMyAdmin and save the commands that work (with comments) in a text file for your own use. This is extremely useful for documentation and for remembering what to do on your next project!
✤ Example: Practice-DB-SQL.pdf
✤ http://web.simmons.edu/~benoit/LIS458/Practice-DB-SQL.pdf
Students...
✤ Who wants to volunteer to issue commands on what they’ve created?
✤ Examples: Using Perl, PHP and Java as part of web-enabled RDBMS.
✤ First Java, Perl, then PHP – note: the purpose is not to master writing the code but to see the parallels of connecting to the SQL server, creating a bridge to send/receive data, an object to capture the data, and then how the data are extracted row by row or element by element and then wrapped in HTML (or XML) to be returned to the user.
Example using a Java program
import java.io.*;
import java.sql.*;
import javax.servlet.*;
import javax.servlet.http.*;
import javax.sql.*;
import java.util.*;
import java.math.*;
public void doPost(
        HttpServletRequest req, HttpServletResponse res)
        throws ServletException, IOException {
    …
    theServer = req.getServerName();   // host name the request was sent to
    id = req.getParameter("idno");     // value of the form field named "idno"
    …
    res.setContentType("text/html");
    sos = res.getOutputStream();
Connection cu = null;
Statement su = null;
ResultSet ru = null;
String driver = "org.gjt.mm.mysql.Driver";   // LINUX
// driver = "com.mysql.jdbc.Driver";         // MAC
try {
    Class.forName( driver ).newInstance();
    cu = DriverManager.getConnection(
        "jdbc:mysql://" + theServer + ":3306/myDB", "gb", "cat");
    // myDB = database; gb = db user name; cat = password
    su = cu.createStatement();
    su.executeUpdate(
        "DELETE FROM users WHERE idno='" + id + "'");
Example of retrieving data
try {
    Class.forName( driver ).newInstance();
    con = DriverManager.getConnection("jdbc:mysql://" +
        theServer + ":3306/" + dbName, dbUser, dbPassword );
    stmt = con.createStatement();
    rs = stmt.executeQuery( reviewQuery );
    // step through the result table one row at a time
    while (rs.next()) {
        sos.println("Welcome, " + rs.getString("first_name"));
    }
} catch (Exception e) {
    if (e instanceof SQLException) {
        SQLException sqlex = (SQLException) e;
        sos.println("SQL state = " + sqlex.getSQLState());
        sos.println("<br/>Error message = " + e.getMessage()
            + sqlex.getErrorCode());
        // MySQL error 1045: access denied for this user/password
        if (sqlex.getErrorCode() == 1045) {
            sos.println("<hr>Sorry, the connection has been refused by the database server.");
        }
    }
} finally {
    // close in reverse order of creation; any of these may still be null
    try {
        if (rs != null) rs.close();
        if (stmt != null) stmt.close();
        if (con != null) con.close();
    } catch (Exception ee) {
        // nothing more can be done here
    }
}
public boolean checkID(String id, String password,
        String tableName, String idfield, String dbUser, String dbPassword,
        String dbName, String theServer) {
    // driver and errorStatus are assumed to be fields of the enclosing class
    String checkID = "SELECT * FROM " + tableName + " WHERE " + idfield + "='" + id
        + "' AND password='" + password + "'";
    boolean returnValue = false;
    Connection con = null; Statement stmt = null; ResultSet rs = null;
    try {
        Class.forName( driver ).newInstance();
        con = DriverManager.getConnection("jdbc:mysql://" + theServer + ":3306/" +
            dbName, dbUser, dbPassword );
        stmt = con.createStatement();
        rs = stmt.executeQuery( checkID );
        // true only if at least one matching row came back
        returnValue = rs.next();
    } catch (Exception e) {
        if (e instanceof SQLException) {
            SQLException sqlex = (SQLException) e;
            errorStatus = "SQL state = " + sqlex.getSQLState() + " Error message = " + sqlex.getMessage();
            errorStatus += "<br/>Database status: state code " + sqlex.getSQLState();
            // MySQL error 1045: access denied for this user/password
            if (sqlex.getErrorCode() == 1045) {
                errorStatus += "<hr>Sorry, the connection has been refused by the database server.";
            }
        } else {
            errorStatus += "<blockquote>Error message " + e.getMessage() + "</blockquote>";
        }
    } finally {
        // close in reverse order of creation; any of these may still be null
        try {
            if (rs != null) rs.close();
            if (stmt != null) stmt.close();
            if (con != null) con.close();
        } catch (Exception ee) {
            // nothing more can be done here
        }
    }
    return returnValue;
}
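Because checkID builds its SQL by gluing user input into a string, a crafted id or password could change the query itself. A hedged sketch of the same check using java.sql.PreparedStatement follows. The table and column names (users, idno, password) and the class name are assumptions for illustration, and the method expects an already open Connection rather than opening its own.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch of a parameterized ID check; table and column names are assumed.
class PreparedCheck {
    // Placeholders (?) stand in for values only, so the table and column
    // names are fixed in the text instead of being passed in as strings.
    static final String CHECK_ID_SQL =
        "SELECT 1 FROM users WHERE idno = ? AND password = ?";

    // Returns true if a row matches; the caller supplies an open connection.
    static boolean checkID(Connection con, String id, String password)
            throws SQLException {
        // try-with-resources closes the statement and result set for us
        try (PreparedStatement ps = con.prepareStatement(CHECK_ID_SQL)) {
            ps.setString(1, id);        // first ?  (parameters are 1-based)
            ps.setString(2, password);  // second ?
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }

    public static void main(String[] args) {
        // Running the check itself needs a live database; just show the SQL.
        System.out.println(CHECK_ID_SQL);
    }
}
```

The driver sends the values separately from the SQL text, so a quote character in id is treated as data rather than as part of the command.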
Perl example
✤ Near the top of your script add this code
use Mysql;
$dbhost = "localhost";
$dbname = "mydatabase";
$dbuser = "perlscripts";
$dbpass = "YWE6YWNQ";
$db = Mysql->connect($dbhost, $dbname, $dbuser, $dbpass);
# run the query; $qry is the statement handle we fetch rows from
$qry = $db->query(qq~SELECT * FROM employees WHERE id < 100~);
while( @emps = $qry->fetchrow) {
    print qq~
    $emps[0], $emps[1], $emps[2] <br>
    ~;
}
The code above, when translated into English, says "connect to the server, select all columns from the table named employees where id is less than 100, then while the data is placed into an array called emps using the fetchrow method, print columns 1, 2 and 3 then a line break."
PHP Examples
<?php
// Note: the mysql_* functions used here were removed in PHP 7
$username = "pee_wee";
$password = "let_me_in";
$hostname = "localhost";
$dbh = mysql_connect($hostname, $username, $password)
    or die("Unable to connect to MySQL");
print "Connected to MySQL<br>";
$selected = mysql_select_db("first_test",$dbh)
    or die("Could not select first_test");
// you're going to do lots more here soon
mysql_close($dbh);
?>
See also 488 notes on PHP and UsingPHPandMySQL.txt
1 of 4
#!/usr/local/rbin/perl
use DBI;
print <<END;
Content-type: text/html

<html><head>
<title>Example of Perl calling MySQL</title>
</head><body bgcolor="white">
END
2 of 4
# database information
$db="fifisdatabase";
$host="gslis.simmons.edu";
$userid="scott";
$passwd="tiger";
$connectionInfo="DBI:mysql:database=$db;host=$host";
# make connection to database
$dbh = DBI->connect(
    $connectionInfo,$userid,$passwd);
3 of 4
# prepare and execute query
$query = "SELECT * FROM people WHERE age > 30 ORDER BY name";
$sth = $dbh->prepare($query);
$sth->execute();
# assign fields to variables (assumes people has exactly three columns)
$sth->bind_columns(\$id, \$name, \$age);
# output name list to the browser
print "Names in the people database:<p>\n";
print "<table>\n";
while($sth->fetch()) {
    print "<tr><td>$name<td>$age\n";
}
4 of 4
print "</table>\n";
print "</body>\n";
print "</html>\n";
$sth->finish();
# disconnect from database
$dbh->disconnect;
By the way, 3
✤ A few things from practice: query blocks
✤ SELECT DISTINCT * FROM cats WHERE cats.name IN (SELECT pets.name FROM pets) is the same as SELECT DISTINCT cats.* FROM cats, pets WHERE cats.name=pets.name
✤ BUT note the nested select statement: this is often a solution but isn’t considered a best practice.
  ✤ SELECT * FROM cats WHERE cats.name IN (SELECT DISTINCT pets.name FROM pets)
✤ COUNT() may not always return what you expect (e.g., COUNT(column) skips nulls).
✤ Some folk recommend avoiding DISTINCT if duplicates are acceptable or if the answer set contains a key
✤ Minimize the use of GROUP BY and HAVING, e.g., prefer the second statement to the first:
  ✤ SELECT MIN(pets.age) FROM pets GROUP BY pets.idno HAVING pets.idno='100'
  ✤ SELECT MIN(pets.age) FROM pets WHERE pets.idno='100';
Next steps...
✤ Ensure your documentation is on your website
✤ Finalize your SQL statements that create the data views you want
✤ Post the statements on your page (after you’ve tested ‘em, of course!)
✤ Next class we take your practiced statements and create “Prepared Statement” objects out of them and integrate them into a web-enabled RDBMS.
✤ Comments on your work to be emailed individually shortly.