Download lecture 2 - tables and relationships

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Big data wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Clusterpoint wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Functional Database Model wikipedia , lookup

Object-relational impedance mismatch wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
RELATIONS/TABLES
This weeks lecture will look at how database
tables/relations work. Last week we identified how we
could identify the possible tables and contents and this
week we are taking this in more detail, looking at how a
table/relation works (set theory) and then looking at how
this set approach relates to the way relational databases
are structured.
Objectives of the Lecture :
SET THEORY
The relational model that underpins the relational database is based
on mathematical sets. We looked at tables already containing
collections of related data (patient details, student rows,
programmes, etc)
Definition : “A set is a collection of things”.

A set has two main characteristics :


It has no sequence or ordering.
It has no duplicates.
Example :
{ 2, 7, 5, 4 }  { 7, 5, 4, 2 }
They are the same set because sets have no order.
Example :
{ 2, 2, 7, 5, 4, 8, 8 }
is not a set because it has duplicates.
A database table/relation should hold data in some order where the
value of the data is NOT dependent on the order it is retrieved in.
RELATION/TABLE = SET OF TUPLES/ROWS
A
B
C
First Name
Sam
Surname
Vines
Location
Ankh-Morpork
Job
Watch
Commander
Marital Status
Married
William
De Worde
Ankh-Morpork
Single
Jason
Esmeralda
Thomas
Ogg
Weatherwax
Silverfish
Lancre
Lancre
Ankh-Morpork
Newspaper
Editor
Blacksmith
Witch
Alchemist
First Name
Sam
Surname
Vines
Location
Ankh-Morpork
Marital Status
Married
Esmeralda
Thomas
William
Weatherwax
Silverfish
De Worde
Lancre
Ankh-Morpork
Ankh-Morpork
Jason
Ogg
Lancre
Job
Watch
Commander
Witch
Alchemist
Newspaper
Editor
Blacksmith
First Name
Sam
Surname
Vines
Location
Ankh-Morpork
Esmeralda
Thomas
William
Weatherwax
Silverfish
De Worde
Ankh-Morpork
Ankh-Morpork
Lancre
Jason
Ogg
Ankh-Morpork
Job
Watch
Commander
Witch
Alchemist
Newspaper
Editor
Blacksmith
Married
Single
Single
Single
Single
Single
Married
Marital Status
Married
Single
Single
Single
Married
Table A and B are the
same because even
though the data is in a
different order it is still
the same data in each
table.
Table C is different
because the data in the
location column is
different, this changes
the data on a row by
row comparison.
RELATION - NO DUPLICATE TUPLES
The data in the relation/table should NOT hold any duplicate,
duplicated data can lead to errors in any calculation.
There are ways to ensure that each row of data is unique by
having one column/attribute in the table that is unique to each
row (an extra column/attribute can be added if required)
•student row number
•NHS number
•NI number
•etc.
Alternative is to have an attribute/column that indicate
multiple rows for example a ‘quantity’ attribute
CONSIDER THE FOLLOWING ....
Product
Brand
Size
Price
Ripple
Galaxy
Single unit
£0.79
Malteesers
Nestle
300g
£1.55
Malteesers
Nestle
100g
£.066
Ripple
Galaxy
Single unit
£0.79
Ripple
Galaxy
Single unit
£0.79
Sour mix
Harribo
200g
£1.00
The above data has duplicated rows, we need to remove duplicates but we
can’t just delete the rows because we need to consider the nature of the data. If
the data relates to the products a shop sells then we may be able to simply
remove the duplicates because retaining a single row will still reflect that the
store sells ripple bars but if the tables relates to stock then to remove the
duplicates will change the data and make it invalid, the store will need to know
they have 3 ripples in stock.
SOLUTION ....
Product
Brand
Size
Price
Quantity in
stock
Ripple
Galaxy
Single unit £0.79
3
Malteesers
Nestle
300g
£1.55
1
Malteesers
Nestle
100g
£.066
1
Sour mix
Harribo
200g
£1.00
1
Product
Brand
Size
Price
Ripple
Galaxy
Single unit £0.79
Malteesers
Nestle
300g
£1.55
Malteesers
Nestle
100g
£.066
Sour mix
Harribo
200g
£1.00
The way we
handle the
duplicated data
depends on the
nature of the data
and how we are
using it. A
database
developer must
understand the
data they are
working with and
its use.
CONVERTING DATA ELEMENTS TO RELATIONS/TABLE
This section will be covered in more detail throughout the
module as your understanding develops more complexity to
this procedure (known as normalisation). At a basic level,
converting the data gathered in analysis:
1. Data elements are collected and ‘grouped’
• Look at related data, cluster things
2. Identify the data types that are needed and consider what
size they should be
• Varchar, numbers, dates, 50 characters etc
3. Do you need extra data to give meaning?
• Analysis should indicate if the data currently held is
sufficient or if more are needed
4. Does the data need to be given more granularity?
• Name = Surname, forename, middle initial etc
EXAMPLE
A company currently takes orders over the phone or by sales staff, the
orders are noted on paper order forms and these are used to raise
picking notes, invoices etc. The company wants to put these orders into
a database so that they can use the data to analyse sales and do some
sales forecasting and target marketing (send relevant promotions to
interested customers rather than a blanket mail shot). The paper system
records the following information:
Order number, order line, invoice address, delivery address, customer
name, customer id, cost of order, delivery cost, vat, discount code,
additional notes, picking note id, pallet number, warehouse id, checked
by, product details, product id, unit cost, quantity, line cost, (these are
repeated values for each item on order)
EXAMPLE CONTINUED – GROUP DATA
Order number, order line, invoice address, delivery address, customer name, customer id,
total cost, delivery cost, vat, discount code, additional notes, picking note id, pallet number,
warehouse id, checked by, product details, product id, unit cost, quantity, line cost, (these are
repeated values for each item on order)
The elements above come from different documents (invoice & order form
for simplification) the source documents can be used to indicate possible
relations/tables
Order customer name, customer id, delivery address, product details,
product id, unit cost, quantity, line cost, VAT, discount code, total cost,
invoice address
Invoice customer name, customer id, product details, product id, unit cost,
quantity, line cost, VAT, discount code, total cost, invoice address
Note: that some data is duplicated and the data seems more cumbersome
than necessary (do we need the customer name and number?)
EXAMPLE CONTINUED – RELATE THE DATA
Order customer name, customer id, delivery address, product details, product id,
unit cost, quantity, line cost, VAT, discount code, total cost, invoice address
Invoice customer name, customer id, product details, product id, unit cost, quantity,
line cost, VAT, discount code, total cost, invoice address
1. Consider the data, instead of duplicating the data we could include a
reference to where the data is, if the invoice data has the order data on
it why not simply hold something that links to the order row?
2. The order data has 2 elements, there is the information that relates to
the order in general and there is data that is repeated, the order data
therefore breaks down into the order header info and the order line
detail
3. The above changes mean that we have all the data held but the
structure of the data is change from the paper systems to a more
RELATIONAL format.
EXAMPLE ....
1. To relate the data we need to add a reference element
2. Breaking the data down gives a greater granularity
3. Consider renaming elements so it is clear which table they belong to
(the reason for this is apparent in later elements)
The outcome of the relations/table could result in:
OrderHeader (ord_id, ord_date, ord_custid, ord_value, ord_vat,
ord_deladdr)
Orderline (OL_ordrid (f), OL_ordline_no, OL_prod, OL_quant,
OL_unitprice)
Invoice (In_vid, inv_date, inv_orderid (f), inv_custid (f), inv_comments)
Customer ????
•These are not definitive tables but you should see how they evolve.
•Each of these relations/tables SHOULD have a unique element that
prevents duplication of data (PRIMARY KEY) if a relation references
another relation there needs to be a referencing element within the relation
(FORIGN KEY (f))
EXAMPLE – LOOKING AT RELATIONSHIPS
customer
Order
Orderline
Invoice
An order has 1 or more
order lines (you may
order many different
product)
Each order will have one
or more invoices
associate with them
(think about when you
order something from
Amazon, dispatched in a
number of parcels)
If you have added a
customer relation then a
customer can place
many orders but each
order is only associated
with one cusotmer
OVERVIEW
1. Regardless of the data that is currently in use within an
organisation’s paper system, data held in a database
should not be duplicated.
2. Data should be re-organised so that duplications are
removed, links between relations are identified so that the
chain of data is clear and accessible.
3. Additional elements that are not in the paper system will
need to be added to make a computer based system
4. The translation from paper systems to computer/database
systems require extensive understanding of the way the
data is to be used
5. Each row in a relation should have a unique identifier
6. Related relations must have common attributes having the
same data types and sizes but not necessarily same
names.
DO FOR NEXT WEEK!
1. Students should read through the notes on BB and these
slides
2. Write up the notes you have made so that you can
understand them when you come to review and revise them
3. Ensure that you can follow the example gone through today
and ask yourself the following questions:
• Do you understand why the information that is held on
the paper system CANNOT simply be transferred onto a
database system
• Why does each row have to be unique and how do you
ensure it is unique?
• Why do we break the data into smaller subsets
(orderheader and orderline tables/relations)
• Why do we ‘add’ extra attributes