Download Document

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Oracle Database wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

IMDb wikipedia , lookup

Open Database Connectivity wikipedia , lookup

SQL wikipedia , lookup

PL/SQL wikipedia , lookup

Concurrency control wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Ingres (database) wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Database wikipedia , lookup

Functional Database Model wikipedia , lookup

ContactPoint wikipedia , lookup

Clusterpoint wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
Introduction to databases
Bent Thomsen
Who uses databases?
• Everybody!
– Every time you use your credit card, you
(indirectly) use a database
– When you book a flight you use a database
– When you take out insurance or a mortgage,
you use a database
– When you contact public services
– When you buy something online
Classes of database users
• Workers on the scene
– End user: The person who makes queries,
updates and extracts reports
– Database administrator: The person
responsible for managing the database system
– Database designer
– Application programmer: Design and
implement transactions for end-users
Database users (cont.)
• Workers behind the scene
– Database designers and implementers
– Tool developers
– Operators and maintenance personnel
• Run and maintain the hardware and software
Types of databases
• Large database systems
– Oracle, DB2, Informix, Ingres, MS SQLServer
– Huge amounts of data – high throughput
• Small(ish) database systems
–
–
–
–
Excell DataLists
MySQL
MS Access
Small companies, clubs, scientific data
When may you need/want to use a
database?
• To access data relevant to your project
– E.g. KMS database of roads in Denmark
• To store and retrieve data from scientific
experiments
• To store, organize and retrieve data from
questionnaires
• To organize your reference library
• To manage the membership of your sports club
Concepts of Database
– A database is a collection of records that are organized for
a particular purpose
– A student record book with addresses and grades
– An inventory with software, books, and hardware
– A field is a basic fact (or data element).
• Name, address,…
– A record is a set of fields.
• A student, a book
– A table is a set of records.
– A database consists of one or more tables.
– A primary key is a field that identify a single record.
• In the primary key field, every record has a unique number.
(Student ID Number)
Parts of a database
Record
Tables
Attribute/Field
• Records become “rows”
• Attributes/fields become “columns”
• Rules determine the relationship
between the tables and tie the data
together to form a database
Creating a database
• What information are we trying to store?
• How do we describe the information?
• Phone Book/Contact entries
–
–
–
–
–
–
–
–
–
Name
Address
Company
Phone Number
URL/Web Page
Age
Height (in meters)
Birthday
When we added the entry
Data Types
• Binary
– Database specific binary objects
– Pictures, digital signatures, etc.
• Boolean
– True/False values
• Character
– Fixed width or variable size
• Numeric
– Integer, Real (floating decimal point), Money
• Temporal
– Time, Date, Timestamp
Phone Book/Contact Record
Name
Address
Company
Phone Number
URL/Web Page
Age
Height
Birthday
When we added the entry
Character
Character
Character
Character
Character
Integer
Real (float)
Date
Timestamp
An example
Contacts
Name
Company
Address
Phone1
Phone2
Phone3
ZipCode
Joe
ABC
123
5532
2234
3211
12345
Jane
XYZ
456
3421
Chris
PDQ
789
2341
14454
6655
14423
Example (cont.)
Contacts
Id
Name
Company
Address
Phone
ZipCode
1
Joe
ABC
123
5532
12345
1
Joe
ABC
123
2234
12345
1
Joe
ABC
123
3211
12345
2
Jane
XYZ
456
3421
14454
3
Chris
PDQ
789
2341
14423
3
Chris
PDQ
789
6655
14423
Could be a DataList table in Excell!
What is a relational database?
• Originally developed by E.F. Codd in 1970
• Organizes data into tables where each item
is a row and the attributes of the item are in
columns.
• Different from “flat file” databases because
you can define “relationships” between
items in different tables.
A non-relational database
STAFF
Manager
Phone Location Sex
Name
DOB
Role
Alan Spencer
5/2/55
Administrator Colin Morton 4242
G23
M
Sarah Supton
3/12/70 Administrator Alan Spencer 2487
G23
F
Leanne Thurr
3/1/71
Teaching
Assistant
Colin Morton 9210
111
F
Colin Morton
1/1/40
Dean
-
4297
323
M
A relational version
RoleID
1
2
3
ROLES
Title
Administrator
Teaching Assistant
Dean
STAFF
AnnualLeave
1
3
90
LOCATIONS
LocationID LocationName
01
G23
02
111
03
323
ID
Firstname Lastname
Role Manager DOB
Phone Location Sex
1
2
3
4
Alan
Sarah
Leanne
Colin
1
1
2
3
4242
2487
9210
4297
Spencer
Supton
Thurr
Morton
4
1
4
4
5/2/55
3/12/70
3/1/71
1/1/40
01
01
02
03
M
F
F
M
Kinds of Relationships
• “One to One”
– One row of a table matches exactly to another
• One person, one id number, one address
• “One to Many”
– One row of a table matches many of another
• One person, many phone numbers
• “Many to Many”
– One row may match many of another or many
rows match one row of another
Normal forms
• E. F. Codd in 1972 wrote a paper on
“Further Normalization of the Data Base
Relational Model”
• Normal forms reduce the amount of
redundancy and inconsistent dependency
within databases.
• Codd proposed three normal forms and
through the years two more have been
added.
Summarized from Barry Wise’s article on Database
Normalization
http://www.phpbuilder.com/columns/barry20000731.php3?page=1
The Zero Form
• No rules have
been applied
• Where most
people start
(and stop)
• No room for
growth
• Usually wastes
space
Contacts
Name
Company
Address
Phone1
Phone2
Phone3
ZipCode
Joe
ABC
123
5532
2234
3211
12345
Jane
XYZ
456
3421
Chris
PDQ
789
2341
14454
6655
14423
First Normal Form
• Eliminate repeating
columns in each
table
• Create a separate
table for each set of
related data
• Identify each set of
related data with a
primary key
Contacts
Id
Name
Company
Address
Phone
ZipCode
1
Joe
ABC
123
5532
12345
1
Joe
ABC
123
2234
12345
1
Joe
ABC
123
3211
12345
2
Jane
XYZ
456
3421
14454
3
Chris
PDQ
789
2341
14423
3
Chris
PDQ
789
6655
14423
Benefits: Now we can have infinite phone numbers or
company addresses for each contact.
Drawback: Now we have to type in everything over and over
again. This leads to inconsistency, redundancy and wasting
space. Thus, the second normal form…
Second Normal Form
People
• Create separate tables
for sets of values that
apply to multiple
records
• Relate these tables
with a “foreign key”.
Id
Name
Company
Address
Zip
1
Joe
ABC
123
12345
2
Jane
XYZ
456
14454
3
Chris
PDQ
789
14423
PhoneNumbers
PhoneID
Id
Phone
1
1
5532
2
1
2234
3
1
3211
4
2
3421
5
3
2341
6
3
6655
Third Normal Form
• Eliminate fields that
do not depend on the
primary key.
PhoneNumbers
PhoneID
Id
Phone
1
1
5532
2
1
2234
3
1
3211
4
2
3421
5
3
2341
6
3
6655
People
Id
Name
AddressID
1
Joe
1
2
Jane
2
3
Chris
3
Address
AddressID
Company
Address
Zip
1
ABC
123
12345
2
XYZ
456
14454
3
PDQ
789
14423
Is this enough? Codd thought so…
What about “many to many”?
•
Fourth
Normal
Form
In a “many to many” relationship,
independent entities cannot be
stored in the same table.
PhoneNumbers
People
PhoneID
Phone
1
5532
2
2234
Id
Name
AddressID
3
3211
1
Joe
1
4
3421
2
Jane
2
5
2341
3
Chris
6
3
Address
6655
PhoneRelations
PhoneRelID
Id
PhoneID
1
1
1
AddressID
Company
Address
Zip
2
1
2
1
ABC
123
12345
3
1
3
2
XYZ
456
14454
4
2
4
3
PDQ
789
14423
5
3
5
6
3
6
Fifth Normal Form
• The “very esoteric” one that is probably not
required to get the most out of your database.
• “The original table must be reconstructed from the
tables into which it has been broken down.”
• The rule ensures that you have not created any
extraneous columns and all the tables are only as
large as they need to be.
The Normal Forms
• First Form
– Eliminate replicated data in tables
– Create separate tables for each set of related data
– Identify each set of related data with a primary key
• Second Form
– Create separate tables for sets of values that apply to multiple
records
– Relate the tables with a foreign key
• Third Form
– Eliminate fields that do not depend on the primary key
• Fourth Form
– In many-to-many relationships, independent entities cannot be
stored in the same table
Why normalize?
•
•
•
•
Increases the integrity of the data
Reduces redundancy
Improves efficiency
Although normalization can be hard, it is
worth it in the long run.
What do I need to remember?
• Keep normalization in mind.
• Don’t replicate data in a table.
• If you break the rules, know why you are
breaking the rules and do it for a good
reason.
All you need to know about SQL
in 30 minutes (or less)
SQL = Structured Query Language
Basic SQL Commands
•
•
•
•
•
•
Creating tables with CREATE
Adding data with INSERT
Viewing data with SELECT
Removing data with DELETE
Modifying data with UPDATE
Destroying tables with DROP
Creating tables with CREATE
• Generic form
CREATE TABLE tablename (
column_name data_type attributes…,
column_name data_type attributes…,
…
)
• Table and column names can’t have spaces or be
“reserved words” like TABLE, CREATE, etc.
Phone Book/Contact Record
Name
Address
Company
Phone Number
URL/Web Page
Age
Height
Birthday
When we added the entry
Character
Character
Character
Character
Character
Integer
Real (float)
Date
Timestamp
Phone Book/Contact Table
CREATE TABLE contacts (
Name
VARCHAR(40),
Address
VARCHAR(60),
Company
VARCHAR(60),
Phone
VARCHAR(11),
URL
VARCHAR(80),
Age
INT,
Height
FLOAT,
Birthday
DATE,
WhenEntered
TIMESTAMP
);
Plan your tables very carefully!
Once created, they are difficult to change!
Phone Book/Contact Table
CREATE TABLE contacts (
ContactID
INT PRIMARY KEY,
Name
VARCHAR(40),
Address
VARCHAR(60),
Company
VARCHAR(60),
Phone
VARCHAR(11),
URL
VARCHAR(80),
Age
INT,
Height
FLOAT,
Birthday
DATE,
WhenEntered
TIMESTAMP
);
If you are going to use the relational nature of a database,
don’t forget you need to have a unique way to access records!
There is a way to make the key automatically increment,
so you don’t have to worry about which one is next.
Adding data with INSERT
• Generic Form
INSERT INTO tablename (column_name,…)
VALUES (value,…)
Inserting a record into ‘contacts’
INSERT INTO contacts
(contactid,name,address,company,phone,url
,age,height,birthday,whenentered)
VALUES
(1,‘Joe’,’123 Any St.’,’ABC’,
’800-555-1212’,‘http://abc.com’,30,1.9,
’6/14/1972’,
now());
Inserting a partial record
INSERT INTO contacts
(contactid,name,phone)
VALUES (2,’Jane’,’212-555-1212’);
Viewing data with SELECT
• Generic Form
SELECT column,… FROM table,…
WHERE condition
GROUP BY group_by_expression
HAVING condition
ORDER BY order_expression
• The most used command
• Probably the most complicated also
• If used improperly, can cause very long waits
because complex computations
A few simple SELECTs
• SELECT * FROM contacts;
– Display all records in the ‘contacts’ table
• SELECT contactid,name FROM contacts;
– Display only the record number and names
• SELECT DISTINCT url FROM contacts;
– Display only one entry for every value of URL.
Refining selections with WHERE
• The WHERE “subclause” allows you to
select records based on a condition.
• SELECT * FROM contacts
WHERE age<10;
– Display records from contacts where age<10
• SELECT * FROM contacts
WHERE age BETWEEN 18 AND 35;
– Display records where age is 18-35
Additional selections
• The “LIKE” condition
– Allows you to look at strings that are alike
• SELECT * FROM contacts
WHERE name LIKE ‘J%’;
– Display records where the name starts with ‘J’
• SELECT * FROM contacts
WHERE url LIKE ‘%.com’;
– Display records where url ends in “.com”
Removing data with DELETE
• Generic Form
DELETE FROM table WHERE condition;
DELETE FROM contacts WHERE age<13;
Modifying data with UPDATE
• Generic Form
UPDATE table SET column=expression
WHERE condition;
UPDATE contacts SET company=‘AOL’
WHERE company=‘Time Warner’;
Destroying tables with DROP
• Generic Form
DROP TABLE tablename;
DROP TABLE contacts;
More about SELECT
“Normal Forms” and SELECT
• Good database design using the normal
forms requires data to be separated into
different tables
• SELECT allows us to join the data back
together
• We can use “views” to create virtual tables
Joining together tables
•
SELECT name,phone,zip FROM
people, phonenumbers, address
WHERE
people.addressid=address.addressid
AND people.id=phonenumbers.id;
PhoneNumbers
PhoneID
Id
Phone
1
1
5532
2
1
2234
3
1
3211
4
2
3421
5
3
2341
6
3
6655
People
Id
Name
AddressID
1
Joe
1
2
Jane
2
3
Chris
3
Address
AddressID
Company
Address
Zip
1
ABC
123
12345
2
XYZ
456
14454
3
PDQ
789
14423
General form of SELECT/JOIN
SELECT columns,…
FROM left_table
join_type JOIN right_table ON condition;
SELECT name,phone FROM people
JOIN phonenumbers ON
people.id=phonenumbers.id;
ORDER BY
• The “ORDER BY” clause allows you to
sort the results returned by SELECT.
SELECT * FROM contacts
ORDER BY company;
SELECT * FROM contacts
ORDER BY company, name;
Views
• You can use “CREATE VIEW” to create a
virtual table from a SELECT statement.
CREATE VIEW contactview AS
(SELECT name,phone,zip FROM
people,phonenumbers,address
WHERE people.id=phonenumbers.id AND
people.addressid=address.addressid);
Finishing off
• SQL is a large language
–
–
–
–
You have seen some basic commands
Takes time and effort to learn
Mainly needed for powerusers
Lots of stuff we haven’t talked about
• E.g. security, multiple users, program accessing DB, …
– Many database systems now use graphical user
interfaces instead of SQL directly