Lecture 1 - Michigan State University

What is a database?
An organized collection of data. This can be in an electronic, paper, or other format.
Types of databases
Operational - constantly changing because entries are dynamic. Examples are customer purchase and inventory control databases
Analytical - once data are collected, they remain static. This is typical of scientific databases
Legacy - also known as an inherited database; created by someone else
Created - my own term for a database you create
Derived - a database you create by importing another database
Flat file databases
This is commonly the way we first view “databases”. Spreadsheets, word processing
documents or simple ASCII files are common examples
Order ID | Order Date | Ship Date | Sales Rep | Customer | Item 1 | Quantity | Item 2 | Quantity | ….
1 | 10 May, 2003 | 11 May 2003 | Jim | MSU | Plankton Splitter | 1 | Ekman Dredge | 1 | ….
2 | May 11, 2003 | 11 May 2003 | Jim | Michigan State | Ekman Dredge | 2 | Plankton Splitter | 2 | ….
3 | 5/12/2003 | 11 May 2003 | Bill, Jim | M.S.U. | Plankton net | 3 | | | ….
4 | 5/12/03 | 11 May 2003 | Jim, Bill | That other school in Ann Arbor | Zooplankton net | 1 | | | ….
This example shows a lot of problems. For example,
-Very constrained: only two items are allowed per order
-Lacks the ability to search easily (e.g., finding a specific item ordered is difficult and not always robust)
-Lacks database integrity. For example, MSU is not represented consistently
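To make the search and integrity problems concrete, here is a minimal Python sketch that stores the four example orders as flat rows, one dict per spreadsheet row. The dict keys and the helpers `orders_containing` and `orders_from` are hypothetical names for illustration. Because an item can sit in either the Item 1 or the Item 2 column, every search must check each item column, and an exact-match lookup for "MSU" misses the rows where the same customer is spelled differently.

```python
# Flat-file version of the example order sheet: one dict per spreadsheet row.
# Keys are hypothetical; blank Item 2 cells are stored as None.
flat_orders = [
    {"order_id": 1, "customer": "MSU", "item1": "Plankton Splitter", "qty1": 1,
     "item2": "Ekman Dredge", "qty2": 1},
    {"order_id": 2, "customer": "Michigan State", "item1": "Ekman Dredge", "qty1": 2,
     "item2": "Plankton Splitter", "qty2": 2},
    {"order_id": 3, "customer": "M.S.U.", "item1": "Plankton net", "qty1": 3,
     "item2": None, "qty2": None},
    {"order_id": 4, "customer": "That other school in Ann Arbor",
     "item1": "Zooplankton net", "qty1": 1, "item2": None, "qty2": None},
]


def orders_containing(item):
    """Hypothetical helper: every item column has to be checked separately."""
    return [row["order_id"] for row in flat_orders
            if item in (row["item1"], row["item2"])]


def orders_from(customer):
    """Exact-match lookup, which silently misses alternate spellings."""
    return [row["order_id"] for row in flat_orders if row["customer"] == customer]


print(orders_containing("Ekman Dredge"))  # [1, 2]
print(orders_from("MSU"))                 # [1], although orders 2 and 3 are also from MSU
```

Allowing a third item per order would mean adding another pair of columns and updating every search like `orders_containing`, which is the "very constrained" problem in practice.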
The first and most critical concept is that of a relational database
where the data are stored in multiple tables when necessary
Associated with this is the key idea that the data may be stored in a
different format than how we view the data
We will get back to these ideas again (probably more often than you
would like!)
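As a minimal sketch of the relational idea, assuming Python's built-in `sqlite3` module and hypothetical table names (`customers`, `orders`, `order_items`), the first two example orders can be stored in three tables and then reassembled by a view (`order_sheet`, also hypothetical) that looks much like the original sheet. Each customer is stored once under one agreed name, dates use one format, and an order can have any number of items.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE        -- each customer stored exactly once
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT NOT NULL,              -- one date format: YYYY-MM-DD
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
);
CREATE TABLE order_items (
    order_id INTEGER NOT NULL REFERENCES orders(order_id),
    item     TEXT NOT NULL,
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, item)            -- any number of items per order
);
""")

conn.execute("INSERT INTO customers VALUES (1, 'Michigan State University')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "2003-05-10", 1), (2, "2003-05-11", 1)])
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)",
                 [(1, "Plankton Splitter", 1), (1, "Ekman Dredge", 1),
                  (2, "Ekman Dredge", 2), (2, "Plankton Splitter", 2)])

# The view joins the tables back together, so how we *view* the data
# differs from how the data are *stored*.
conn.execute("""
CREATE VIEW order_sheet AS
SELECT o.order_id, o.order_date, c.name AS customer, i.item, i.quantity
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN order_items i ON i.order_id = o.order_id
""")

# Searching for an item is now a condition on a single column.
rows = conn.execute(
    "SELECT order_id FROM order_sheet WHERE item = ? ORDER BY order_id",
    ("Ekman Dredge",)).fetchall()
print(rows)  # [(1,), (2,)]
```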
Overview of Database Design Process
1. Goals and objectives for database
2. Analyze current database
3. Create data structure
4. Establish table relationships
5. Define business rules
6. Establish views
7. Review data integrity
One of the key points is that this is an iterative process –
you may need to go back to earlier steps if you find problems
Example of Database Design Process
-Introduction to example data set
1. Goals and objectives
The goal is to be able to determine the catch and size distribution of individual fish species at specific sites or groups of sites in our research program.
We also want to be able to describe habitat conditions at these sites and relate them to the fish catches.
Objectives:
1. To be able to compute catch per effort for each species at individual sites, and for the above-barrier sites and the below-barrier sites as groups
2. To be able to compute mean size for each species at individual sites, and for the above-barrier and below-barrier sites
3. ...
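Objectives 1 and 2 become simple aggregations once the data are organized by site. The sketch below is illustrative only: the records are made up, the species names are placeholders, and it assumes that effort is the length of segment shocked (in meters), which the lecture does not specify. The functions `catch_per_effort` and `mean_length` are hypothetical names; passing a one-site list gives the per-site value, and passing all above-barrier (or below-barrier) sites gives the group value.

```python
from statistics import mean

# Hypothetical, made-up shocking events. Effort is ASSUMED to be the
# segment length shocked, in meters.
events = {
    "S1": {"position": "above", "segment_length_m": 100.0},
    "S2": {"position": "above", "segment_length_m": 150.0},
    "S3": {"position": "below", "segment_length_m": 120.0},
}
# One made-up record per fish measured: (site, species, length in mm).
fish = [
    ("S1", "brook trout", 150), ("S1", "brook trout", 170),
    ("S2", "brook trout", 160), ("S3", "brook trout", 210),
    ("S3", "mottled sculpin", 60),
]


def catch_per_effort(species, sites):
    """Total catch of a species over a group of sites divided by total effort."""
    catch = sum(1 for site, sp, _ in fish if sp == species and site in sites)
    effort = sum(events[s]["segment_length_m"] for s in sites)
    return catch / effort


def mean_length(species, sites):
    """Mean length of a species over a group of sites (None if none were caught)."""
    lengths = [ln for site, sp, ln in fish if sp == species and site in sites]
    return mean(lengths) if lengths else None


above = [s for s, e in events.items() if e["position"] == "above"]
print(catch_per_effort("brook trout", above))  # 3 fish / 250 m = 0.012
print(mean_length("brook trout", above))       # 160
```

Note that catch per effort for a group is computed as total catch over total effort, not as the average of the per-site values; which of the two the research program wants is a design decision to settle under the goals and objectives step.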
2. Analyze current database
In this case, we have data sheets already filled in, so we will use them to analyze our current (paper) database
Begin by describing how data are collected. During this process, focus on units of observation (entities) or sampling events, and their descriptions or measurements.
Create a list of all variables (attributes), entities and events
Associate every variable with one or more entities or events
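[Figure: sampling layout. Numbered sites lie along the stream above and below a barrier, with the direction of water flow marked. Within a site, three transects (Transect 1, 2, 3) are each measured for width, depth, and 50 substrate particles.]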
Variables
Stream name
Fish species caught
Fish length
Sample date
Position (Above or Below Barrier)
Treatment or Reference Stream
Segment ID number (=site)
Length of segment
Crew members
Conductivity
Water Temperature
Weather Conditions
Water Conditions
Transect width
Transect depth
Transect ID number
Particle size
Entities or Events
Shocking
Habitat
Refinements
Variables
Stream name
Fish species caught (Common name, scientific name, family)
Fish length
Sample date (Year, Month, Day)
Position (Above or Below Barrier)
Treatment or Reference Stream
Segment ID number (=site)
Length of segment
Crew members (always three)
Conductivity
Water Temperature
Weather Conditions (Cloud Cover, Precipitation)
Water Conditions (Water color, Water height)
Transect width
Transect depth
Transect ID number
Particle size
Entities or Events
Shocking
Habitat
Streams
Transects
Substrate
From this preliminary set of entities and descriptors, develop a preliminary list of tables and fields
TABLES - contain information on a particular entity or event
FIELDS - describe the attributes of entities or events
RECORD - contains the information or data on an individual
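To tie the three terms to something concrete, here is a small sketch using Python's `sqlite3` module. The `stream` table, its column names, and the stream name "Example Creek" are hypothetical. The table holds one class of entity (streams), each column is a field, and each row returned is one record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# TABLE: one class of entity (streams). FIELDS: the columns describing it.
conn.execute("""
CREATE TABLE stream (
    stream_id            INTEGER PRIMARY KEY,
    stream_name          TEXT NOT NULL,
    barrier_or_reference TEXT NOT NULL
        CHECK (barrier_or_reference IN ('barrier', 'reference'))
)
""")
# RECORD: one row holding the data for one individual stream (made-up values).
conn.execute("INSERT INTO stream VALUES (1, 'Example Creek', 'barrier')")

cur = conn.execute("SELECT * FROM stream")
fields = [col[0] for col in cur.description]  # column names of the result
record = cur.fetchone()
print(fields)  # ['stream_id', 'stream_name', 'barrier_or_reference']
print(record)  # (1, 'Example Creek', 'barrier')
```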
Characteristics of a “Good” Field
• It represents a characteristic of the subject of the table
• It contains only a single value (e.g., if a course has two instructors, the instructor field should not contain both names). This is in contrast to MULTIVALUED FIELDS.
• It cannot be broken down into smaller components (e.g., a person's full address is not a good field, because it can be broken down into street address, city, state, and zip code). This is in contrast to MULTIPART FIELDS.
• It does not contain a calculated value. Fields whose values are determined by values in other fields are CALCULATED FIELDS.
• The field is unique within the database unless it is needed to link tables
• The field retains all its characteristics if it appears in more than one table
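The three kinds of problem field can be avoided as in the following sketch, which uses Python's `sqlite3` module with hypothetical tables (`person`, `course`, `course_instructor`) and made-up rows. The address is split into single-value parts, two instructors become two rows in a linking table rather than two names in one field, and the number of instructors is calculated in a query instead of being stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Multipart field avoided: the address is split into single-value parts.
CREATE TABLE person (
    person_id      INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    street_address TEXT,
    city           TEXT,
    state          TEXT,
    zip_code       TEXT
);
CREATE TABLE course (
    course_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
-- Multivalued field avoided: one row per (course, instructor) pair
-- instead of a single instructor field holding two names.
CREATE TABLE course_instructor (
    course_id INTEGER NOT NULL REFERENCES course(course_id),
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    PRIMARY KEY (course_id, person_id)
);
""")

# Made-up example rows.
conn.executemany(
    "INSERT INTO person (person_id, name, city, state) VALUES (?, ?, ?, ?)",
    [(1, "Instructor A", "East Lansing", "MI"),
     (2, "Instructor B", "East Lansing", "MI")])
conn.execute("INSERT INTO course VALUES (1, 'Intro to Databases')")
conn.executemany("INSERT INTO course_instructor VALUES (1, ?)", [(1,), (2,)])

# Calculated field avoided: the instructor count is derived when queried,
# so it can never disagree with the course_instructor rows.
summary = conn.execute("""
    SELECT c.title, COUNT(ci.person_id)
    FROM course c
    LEFT JOIN course_instructor ci ON ci.course_id = c.course_id
    GROUP BY c.course_id
""").fetchall()
print(summary)  # [('Intro to Databases', 2)]
```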
Characteristics of a “Good” Table
• Each table refers to a single class of entities
or unit of observation or event
• There is a way to uniquely identify each entry
in a table. This is called the PRIMARY KEY.
• It does not contain multipart, multivalued, or
calculated fields.
• It does not contain unnecessary fields, or
unnecessary redundant data
• It contains all of the fields necessary to link it to the other tables you want to link (or relate) it to
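A database can enforce two of these table rules for you: the primary key guarantees that each entry is uniquely identifiable, and a foreign key checks that a linking field points at a real row in the related table. The sketch below assumes Python's `sqlite3` module, hypothetical table and column names, and made-up dates. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE stream (
    stream_id   INTEGER PRIMARY KEY,
    stream_name TEXT NOT NULL
);
CREATE TABLE shocking_event (
    sampling_event_id INTEGER PRIMARY KEY,                       -- uniquely identifies each entry
    stream_id  INTEGER NOT NULL REFERENCES stream(stream_id),    -- field needed to link tables
    event_date TEXT NOT NULL
);
""")
conn.execute("INSERT INTO stream VALUES (1, 'Example Creek')")
conn.execute("INSERT INTO shocking_event VALUES (10, 1, '2003-06-15')")


def try_insert(sql, params):
    """Return None on success, or the integrity error message if rejected."""
    try:
        conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as err:
        return str(err)


insert_event = "INSERT INTO shocking_event VALUES (?, ?, ?)"
dup = try_insert(insert_event, (10, 1, "2003-06-16"))     # rejected: duplicate primary key
orphan = try_insert(insert_event, (11, 99, "2003-06-16"))  # rejected: no stream 99
print(dup)
print(orphan)
```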
First Cut at Developing Tables
Stream Table
Stream ID
Stream Name
Barrier or Reference
Shocking Event Table
Stream ID
Position (above/below)
Segment
Date
Crew
Segment Length
Conductivity
Water Temperature
Weather
Water Conditions
Habitat Transect Table
Stream ID
Transect number
Width
Depth
???Substrate???
Fish Table
Stream ID
Position (above/below)
Fish name
Length
Total Catch
Refinements to Tables
Stream Table
Stream ID
Stream Name
Barrier or Reference
Shocking Event Table
Stream ID
Sampling Event ID
Position (above/below)
Segment
Date
Crew
Segment Length
Conductivity
Water Temperature
Weather
Water Conditions
Substrate Table
Sampling Event ID
Transect number
Particle ID
Particle size code
Fish Table
Stream ID
Sampling Event ID
Position (above/below)
Fish name
Fish species code
Length
Total Catch
Habitat Transect Table
Stream ID
Sampling Event ID
Transect number
Width
Depth
???Substrate???
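One possible translation of the refined tables into a schema is sketched below with Python's `sqlite3` module. It is not the lecture's final design: the column names, units, the `fish_species` lookup table, and the species code `'BKT'` are assumptions. It also makes two judgment calls worth revisiting in the iterative process. First, each fish is one record keyed to its sampling event, so position and stream are looked up through that event rather than repeated. Second, total catch is calculated with `COUNT` rather than stored, in line with the "no calculated fields" rule. Substrate particles get their own table, linked to a transect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE stream (
    stream_id            INTEGER PRIMARY KEY,
    stream_name          TEXT NOT NULL,
    barrier_or_reference TEXT NOT NULL
);
CREATE TABLE shocking_event (
    sampling_event_id INTEGER PRIMARY KEY,
    stream_id         INTEGER NOT NULL REFERENCES stream(stream_id),
    position          TEXT NOT NULL CHECK (position IN ('above', 'below')),
    segment           INTEGER NOT NULL,
    sample_date       TEXT NOT NULL,      -- ISO format YYYY-MM-DD
    segment_length_m  REAL,
    conductivity      REAL,
    water_temp_c      REAL,
    weather           TEXT,
    water_conditions  TEXT
    -- crew omitted here; a separate crew table would avoid a multivalued field
);
CREATE TABLE habitat_transect (
    sampling_event_id INTEGER NOT NULL REFERENCES shocking_event(sampling_event_id),
    transect_number   INTEGER NOT NULL,
    width_m           REAL,
    depth_m           REAL,
    PRIMARY KEY (sampling_event_id, transect_number)
);
CREATE TABLE substrate (
    sampling_event_id  INTEGER NOT NULL,
    transect_number    INTEGER NOT NULL,
    particle_id        INTEGER NOT NULL,  -- 50 particles per transect
    particle_size_code TEXT NOT NULL,
    PRIMARY KEY (sampling_event_id, transect_number, particle_id),
    FOREIGN KEY (sampling_event_id, transect_number)
        REFERENCES habitat_transect(sampling_event_id, transect_number)
);
CREATE TABLE fish_species (
    species_code    TEXT PRIMARY KEY,
    common_name     TEXT NOT NULL,
    scientific_name TEXT,
    family          TEXT
);
CREATE TABLE fish (
    fish_id           INTEGER PRIMARY KEY,
    sampling_event_id INTEGER NOT NULL REFERENCES shocking_event(sampling_event_id),
    species_code      TEXT NOT NULL REFERENCES fish_species(species_code),
    length_mm         INTEGER
);
""")

# Made-up example data.
conn.execute("INSERT INTO stream VALUES (1, 'Example Creek', 'barrier')")
conn.execute(
    "INSERT INTO shocking_event (sampling_event_id, stream_id, position, segment, sample_date) "
    "VALUES (1, 1, 'above', 1, '2003-06-15')")
conn.execute("INSERT INTO fish_species VALUES "
             "('BKT', 'brook trout', 'Salvelinus fontinalis', 'Salmonidae')")
conn.executemany("INSERT INTO fish (sampling_event_id, species_code, length_mm) "
                 "VALUES (1, 'BKT', ?)", [(150,), (170,)])

# Total catch and mean length per species per event are calculated, not stored.
catch = conn.execute("""
SELECT e.sampling_event_id, s.common_name, COUNT(*) AS total_catch, AVG(f.length_mm)
FROM fish f
JOIN shocking_event e ON e.sampling_event_id = f.sampling_event_id
JOIN fish_species s ON s.species_code = f.species_code
GROUP BY e.sampling_event_id, s.species_code
""").fetchall()
print(catch)  # [(1, 'brook trout', 2, 160.0)]
```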
Another example: Deer habitat use in SE Michigan
Habitat patches
-size
-cover type
Deer characteristics
-Deer ID
-age
-sex
Telemetry observation
-Year
-Month
-Day
-Time
-Deer ID
-Habitat patch (or lat/lon ?)
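The deer example follows the same pattern: two entity tables and one event table that links them. The sketch below assumes Python's `sqlite3` module, hypothetical column names, made-up values, and an added `observation_id` primary key; it records the habitat patch rather than lat/lon, one of the two options the slide leaves open. Counting observations per cover type for one deer then becomes a single join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE habitat_patch (
    patch_id   INTEGER PRIMARY KEY,
    size_ha    REAL,
    cover_type TEXT NOT NULL
);
CREATE TABLE deer (
    deer_id INTEGER PRIMARY KEY,
    age     INTEGER,
    sex     TEXT CHECK (sex IN ('M', 'F'))
);
CREATE TABLE telemetry_observation (
    observation_id INTEGER PRIMARY KEY,   -- hypothetical added key
    deer_id   INTEGER NOT NULL REFERENCES deer(deer_id),
    obs_year  INTEGER,
    obs_month INTEGER,
    obs_day   INTEGER,
    obs_time  TEXT,
    patch_id  INTEGER REFERENCES habitat_patch(patch_id)  -- patch chosen over lat/lon
);
""")

# Made-up example data.
conn.executemany("INSERT INTO habitat_patch VALUES (?, ?, ?)",
                 [(1, 12.5, "forest"), (2, 30.0, "cropland")])
conn.execute("INSERT INTO deer VALUES (7, 3, 'F')")
conn.executemany(
    "INSERT INTO telemetry_observation "
    "(deer_id, obs_year, obs_month, obs_day, obs_time, patch_id) "
    "VALUES (7, 2003, 5, ?, '06:00', ?)",
    [(1, 1), (2, 1), (3, 2)])

# Number of observations of deer 7 in each cover type.
use = conn.execute("""
SELECT p.cover_type, COUNT(*)
FROM telemetry_observation t
JOIN habitat_patch p ON p.patch_id = t.patch_id
WHERE t.deer_id = 7
GROUP BY p.cover_type
ORDER BY p.cover_type
""").fetchall()
print(use)  # [('cropland', 1), ('forest', 2)]
```

Storing lat/lon instead would keep the raw location but push the patch lookup into a spatial step outside the database; that trade-off is part of the design decision the slide raises.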
Homework
• Develop list of tables and fields for
your database project
• With a partner, go over your list to
determine if each table and field meets
the criteria for being “good”