Download Essig Museum Inventory and Loans Database

Document related concepts

Big data wikipedia , lookup

Data Protection Act, 2012 wikipedia , lookup

Versant Object Database wikipedia , lookup

Expense and cost recovery system (ECRS) wikipedia , lookup

Data center wikipedia , lookup

Database wikipedia , lookup

Data model wikipedia , lookup

Data analysis wikipedia , lookup

Clusterpoint wikipedia , lookup

Forecasting wikipedia , lookup

SAP IQ wikipedia , lookup

Information privacy law wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

3D optical data storage wikipedia , lookup

Business intelligence wikipedia , lookup

Data vault modeling wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
Introduction
to
Data Management
and
Relational Databases
Outline of presentation


Data Management
Compare database


Versus spreadsheets, word processor docs, …
Relational Databases

Parts & Terms
tables, forms, queries, reports (we’ll skip reports)
 fields, records, keys



Relationships
Linking Tables
1
1
2
3
2
3
A
B
1
1
2
Mature
Forest
2
3
3
A
Old Growth
C
1
2
2
2
3
A
2
3
B
1
2
3
C
B
1
1
1
Clear Cut
3
3
C
Data Management Issues

Organization!

Data Entry (error-prone process)

Quality Control – Quality Assurance

Metadata (possible data values, how collected, etc.)

Tracking specimens, samples

Data retrieval
Spreadsheet vs. Relational Databases

Relational Database




Data entry
Data storage
Data retrieval
Spreadsheet



Manipulating Data (eg. Pivot tables)
Summarizing & Presenting Data (eg. graphing)
(Formatting data for statistics programs)
?
Embedded Information
Spreadsheets = “flat files”
Databases = “multi-dimensional”
A look at spreadsheets
It is possible to sort and filter records
in the spreadsheet (look under DATA
in the menu bar). Filtering
temporarily removes all unwanted
records from view. This is also
possible in a database, with some
more sophisticated options available.
It is very easy to copy cells or entire
rows of data in spreadsheets, but more
difficult in databases (one of the few
advantages of spreadsheets over
databases). However, if one needs to
copy-down a lot of data, then the
database is not well “normalized”
(discussed later).
It is easy to search for and replace words
in spreadsheets. This is also possible in
databases, but with more sophisticated
search and replace options.
The “auto-fill” option in a spreadsheet
completes a word it recognizes from
entries immediately above the current
one. In databases one can use a
“lookup table” (discussed later) for a
full list of values (eg. names), which
might not yet appear in the data set.
Word processor files are the least
capable of filtering, finding & replacing,
and assisting data entry compared with
spread sheets and databases.
Although with proper formatting a word
processor document can look like a spreadsheet
or database table, one cannot manipulate the
rows and columns in the same way.
Archiving Data


As an aside, the best form to archive data in is
tab delimited (.txt) or comma separated values
(.csv) text files
Although programs and formats come and go,
all database, spreadsheet, and word processor
programs know how to handle .txt and .csv
files
Comparing databases, spreadsheets, and documents
Sort
Filter
Replace
Fill down
Auto-complete
Whole row
Link tables
Database
Yes
Yes
Yes
No3
Yes1
Yes
Yes
Spreadsheet
Yes
Yes
Yes
Yes
Yes1
Yes
No2
1 Auto complete is done in very different ways
2 Not linking in a true relational sense, except through a database
3 Properly set up (normalized) data can be back-filled
Document
Yes
No
Yes
No
No
Yes
No
Tabulations

Matrix-style synopsis of data


“crosstab query” in MS Access
“pivot table” in MS Excel
?
Relational Databases

Four major components




Tables – these are where ALL data reside
Queries – select subsets of data (retrieve data)
Forms – “windows” into data tables (views of data)
Reports – summaries of data (formatted synopses)
Tables
All data in relational databases reside in tables. Queries, forms, and
reports are just convenient ways of looking at the data in the tables. As
we shall soon see, the sizes and types of data that can be entered into a
table can be regulated for better efficiency and error-proofing. And two
or more tables that have a field in common can be linked to draw
information from all related tables.
Some terminology:
Each square is a “cell” of data
Columns = “Fields”
Rows = “Records”
To change between datasheet view and design view
Another way to enter design view is to click on the table name
once (so that it is highlighted), then click on the design view
icon. Or right-click the table name and choose design view.
The DESIGN VIEW of a table is where
one dictates the type and range of data that
can be entered into each field. This can
include formatting (such as capitalization),
default values, and valid/non-valid entries.
“Lookup” is used to create a list of possible values that a field can
take. This example uses a list of values in the field’s properties
settings (in DESIGN VIEW). In DATA VIEW the field will have a
drop down list of values (“Combo Box”). The full value will be
filled in when the first letter is typed.
In this example the “lookup” is set to the list of species
codes in the table “Species”
Miscellaneous

New records are always added at the end of the
table (many people find this annoying)


“Esc” once to undo current typing
“Esc” twice to undo the whole record

Changes are saved when you move off the cell

No need to save the data in a database after any changes
(formatting changes must be saved)
?
Linking Tables




Fields common b/w two or more tables can link
Keyed fields prevent duplicate entries
Keyed fields determine relationships b/w tables
Linked tables can reduce data entry and storage
needs (using an idea called data normalization)
Keys and relationships

A keyed field is one that does not allow
repeated values.

For example, if the field “Code Name” is keyed
in a table, then the user would not be allowed to
enter the same Code Name more than once (an
error, “key violation” would appear). In this way,
one constructs a list of unique values (eg. Code
Names).
?
One-to-many relationship
Because each Species Code is unique in the
“keyed” Species table, and can be repeated
many times in the Collections table, a “oneto-many” relationship is created between the
two (indicated by the “1” and infinity).
Referential integrity means that a Species
Code cannot be entered into the Collections
table if it is not in the Species table.
Cascade Update allows one to change the
species code once and propagate that change
through Collections table.
This is the “relationships view” of
the database that allows the user
to define which tables are linked
and how. Keyed fields are in bold.
Cascade Delete deletes that species code in
all tables that are connected. Use this feature
cautiously.
One-to-one relationship?
Normalization


Identical data should not be repeated
Fields with records that are repeated should be
pulled out into a separate table and referenced
when needed
?
Forms
“Windows into the tables” (which really contain the data). Forms
allow the user to customize/arrange fields on the screen for ease of
data entry and retrieval. Forms also allow the user to view several
linked tables in one display. Forms display all the records in the
underlying tables as opposed to subsets (see queries for this). Data
can be entered and updated in forms.
Forms can (should) look like field collection sheets
Forms can combine several tables in one view (this one has five)
A form with a subform (ex: Loans Masterform)
Here two tables are combined in one display
subform
Using Forms:



Create data entry forms that look like field sheets
Viewing multiple related tables in one view
The easiest way to create forms in MS Access is to use
the “form wizard”






Select first table to be in form
Select fields to be in form
Select next table to be in form
Select fields to be in form …
Select style of query to create
Once the form is created, go to Design View to modify the
size and position of fields in the display
?
Queries
Queries are used to ask questions of the data. That is, to select subsets
of records based on particular search criteria. The real power of
relational database queries is that data can be summarized across
tables that are linked by common fields.
Queries




Used to select subsets of data or combinations of
data from linked tables
Two or more tables may be combined in one
query using linked fields
Data can be entered and changed (changes will
propagate through the underlying tables)
Queries can also be used to append one table to
another, update records given certain criteria, or
make new tables for exporting as spreadsheets or
documents
Creating Queries

Queries are easily created in Design View






Click “Create query in design view”
Select tables to be in the query
Select the fields to be in the query
Type in any criteria to search for in each field
Set any sorting preferences
Press run button (!)
Review

Steps for creating a relational database
1
1
2
3
Step 1: Nested Data Structure
2
3
A
Mature
B
1
1
2
3
3
Old Growth
2
A
C
1
2
2
2
3
A
2
3
B
1
2
3
C
B
1
1
1
Clear Cut
3
3
C
Step 2: Sketch-out tables

Table 1










Trap
Plot
Site
Date
Species
Genus
Family
Order
Quantity
Collector

Table 2







Dissolved Oxygen
Dissolved Nitrogen
pH
Temperature
Plot
Site
Collector
Step 3: Data Normalization

Look for fields with repeated records



Consider pulling these fields out into a separate table
that links back to the first table
Set up look up tables or look up values
Designate Keyed fields


These are fields with non-repeated values
Keys can be sequential numbers, or unique codes
Step 4: Establish one-to-many links
Step 5: Set up forms and queries


Set up forms for easier data entry
Set up queries to retrieve specific data sets
?
http://nature.berkeley.edu/~poboyski/download/