Lecture 1 - Michigan State University
What is a database?

An organized collection of data. This can be in an electronic, paper, or other format.

Types of databases

Operational - constantly changing because entries are dynamic. Examples are customer purchase and inventory control databases.
Analytical - once data are collected, they remain static. This is typical of scientific databases.
Legacy - also known as an inherited database. Created by someone else.
Created - my own term for a database you create.
Derived - a database you create by importing another database.

Flat file databases

This is commonly the way we first view “databases”. Spreadsheets, word processing documents, or simple ASCII files are common examples.

Order ID  Order Date    Ship Date    Sales Rep  Customer                        Item 1             Quantity  Item 2             Quantity  ….
1         10 May, 2003  11 May 2003  Jim        MSU                             Plankton Splitter  1         Ekman Dredge       1         ….
2         May 11, 2003  11 May 2003  Jim        Michigan State                  Ekman Dredge       2         Plankton Splitter  2         ….
3         5/12/2003     11 May 2003  Bill, Jim  M.S.U.                          Plankton net       3                                      ….
4         5/12/03       11 May 2003  Jim, Bill  That other school in Ann Arbor  Zooplankton net    1                                      ….

This example shows a lot of problems. For example,
-Very constrained - only two items allowed per order
-Lacks ability to search easily (e.g., finding a specific item ordered is difficult and not always robust)
-Lacks database integrity. For example, MSU is not represented consistently

The first and most critical concept is that of a relational database, where the data are stored in multiple tables when necessary.

Associated with this is the key idea that the data may be stored in a different format than how we view the data.

We will get back to these ideas again (probably more often than you would like!)

Overview of Database Design Process

1. Goals and objectives for database
2. Analyze current database
3. Create data structure
4. Establish table relationships
5. Define business rules
6. Establish views
7. Review data integrity

One of the key points is that this is an iterative process: you may need to go back to earlier steps if you find problems.

Example of Database Design Process
-Introduction to example data set

1. Goals and objectives

Goal is to be able to determine the catch and size distribution of individual fish species at specific sites or groups of sites in our research program. We also want to be able to describe habitat conditions at these sites and relate them to the fish catches.

Objectives:
1. To be able to compute catch per effort for each species at individual sites, and for the above barrier sites and for the below barrier sites as a group
2. To be able to compute mean size for each species at individual sites, and for the above barrier sites and below barrier sites
3. ...

2. Analyze current database

In this case, we have data sheets already filled in, so we will use these to analyze our current (paper) database.

Begin by describing how data are collected. During this process, focus on units of observation (entities) or sampling events, and descriptions or measurements.
Create a list of all variables (attributes), entities, and events.
Associate every variable with one or more entities or events.

[Figure: sampling layout. Labels: Water flow; Barrier; site numbers 1, 2, 3; Within a site: Transect 1, Transect 2, Transect 3; Width, Depth, 50 substrate particles]

Variables
-Stream name
-Fish species caught
-Fish length
-Sample date
-Position (Above or Below Barrier)
-Treatment or Reference Stream
-Segment ID number (=site)
-Length of segment
-Crew members
-Conductivity
-Water Temperature
-Weather Conditions
-Water Conditions
-Transect width
-Transect depth
-Transect ID number
-Particle size

Entities or Events
-Shocking
-Habitat

Refinements

Variables
-Stream name
-Fish species caught (Common name, scientific name, family)
-Fish length
-Sample date (Year, Month, Day)
-Position (Above or Below Barrier)
-Treatment or Reference Stream
-Segment ID number (=site)
-Length of segment
-Crew members (always three)
-Conductivity
-Water Temperature
-Weather Conditions (Cloud Cover, Precipitation)
-Water Conditions (Water color, Water height)
-Transect width
-Transect depth
-Transect ID number
-Particle size

Entities or Events
-Shocking
-Habitat
-Streams
-Transects
-Substrate

From this preliminary set of entities and descriptors, develop a preliminary list of tables and fields.

TABLES - contain information on particular entities or events
FIELDS - describe the attributes of entities or events
RECORD - contains the information or data on an individual entity or event

Characteristics of a “Good” Field
• It represents a characteristic of the subject of the table
• It contains only a single value (e.g., if a course had two instructors, the instructor field should not contain both names). This is in contrast to MULTIVALUED FIELDS.
• It cannot be broken down into smaller components. This is in contrast to MULTIPART FIELDS (e.g., a field holding the entire address for a person can be broken down into street address, city, state, zip code).
• It does not contain a calculated value. Fields which are determined by values in other fields are CALCULATED FIELDS.
• The field is unique within the database unless it is needed to link tables
• The field retains all its characteristics if it appears in more than one table

Characteristics of a “Good” Table
• Each table refers to a single class of entities or unit of observation or event
• There is a way to uniquely identify each entry in a table. This is called the PRIMARY KEY.
• It does not contain multipart, multivalued, or calculated fields.
• It does not contain unnecessary fields, or unnecessary redundant data
• It contains all of the fields necessary to link it to other tables you want to link (or relate) it to

First Cut at Developing Tables

Stream Table
-Stream ID
-Stream Name
-Barrier or Reference

Shocking Event Table
-Stream ID
-Position (above/below)
-Segment
-Date
-Crew
-Segment Length
-Conductivity
-Water Temperature
-Weather
-Water Conditions

Habitat Transect Table
-Stream ID
-Transect number
-Width
-Depth
-???Substrate???

Fish Table
-Stream ID
-Position (above/below)
-Fish name
-Length
-Total Catch

Refinements to Tables

Stream Table
-Stream ID
-Stream Name
-Barrier or Reference

Shocking Event Table
-Stream ID
-Sampling Event ID
-Position (above/below)
-Segment
-Date
-Crew
-Segment Length
-Conductivity
-Water Temperature
-Weather
-Water Conditions

Substrate Table
-Sampling Event ID
-Transect number
-Particle ID
-Particle size code

Fish Table
-Stream ID
-Sampling Event ID
-Position (above/below)
-Fish name
-Fish species code
-Length
-Total Catch

Habitat Transect Table
-Stream ID
-Sampling Event ID
-Transect number
-Width
-Depth
-???Substrate???

Another example: Deer habitat use in SE Michigan

Habitat patches
-size
-cover type

Deer characteristics
-Deer ID
-age
-sex

Telemetry observation
-Year
-Month
-Day
-Time
-Deer ID
-Habitat patch (or lat/lon ?)

Homework
• Develop list of tables and fields for your database project
• With a partner, go over your list to determine if each table and field meets the criteria for being “good”
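
As a starting point for the homework, the refined Stream, Shocking Event, and Fish tables listed earlier in this lecture can be sketched as a small relational database. What follows is a minimal sketch using Python's built-in sqlite3 module, not the course's actual implementation. Several things here are assumptions made for illustration: the column names, the stream name "Example Creek", the species codes "SPA" and "SPB", the sample values, and the definition of effort as segment length in metres. The sketch also takes the refinement one step further than the slides: it stores one row per fish and leaves Total Catch, Stream ID, and Position out of the fish table, because the catch can be calculated and the other two can be reached through the Sampling Event ID link.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with three linked tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the links between tables
    conn.executescript("""
        CREATE TABLE stream (
            stream_id   INTEGER PRIMARY KEY,
            stream_name TEXT NOT NULL,
            stream_type TEXT NOT NULL
                CHECK (stream_type IN ('Barrier', 'Reference'))
        );
        CREATE TABLE shocking_event (
            sampling_event_id INTEGER PRIMARY KEY,
            stream_id         INTEGER NOT NULL REFERENCES stream(stream_id),
            position          TEXT NOT NULL
                CHECK (position IN ('Above', 'Below')),
            sample_date       TEXT NOT NULL,   -- one consistent date format
            segment_length_m  REAL NOT NULL    -- assumed measure of effort
        );
        CREATE TABLE fish (
            fish_id           INTEGER PRIMARY KEY,
            sampling_event_id INTEGER NOT NULL
                REFERENCES shocking_event(sampling_event_id),
            species_code      TEXT NOT NULL,
            length_mm         REAL NOT NULL
        );
    """)
    # Made-up sample data, for illustration only.
    conn.execute("INSERT INTO stream VALUES (1, 'Example Creek', 'Barrier')")
    conn.executemany(
        "INSERT INTO shocking_event VALUES (?, ?, ?, ?, ?)",
        [(1, 1, "Above", "2003-05-10", 100.0),
         (2, 1, "Below", "2003-05-10", 50.0)],
    )
    conn.executemany(
        "INSERT INTO fish (sampling_event_id, species_code, length_mm) "
        "VALUES (?, ?, ?)",
        [(1, "SPA", 100.0), (1, "SPA", 120.0), (1, "SPB", 80.0),
         (2, "SPA", 90.0)],
    )
    conn.commit()
    return conn


def catch_summary(conn):
    """Return catch, catch per metre, and mean length per event and species."""
    return conn.execute("""
        SELECT e.position,
               f.species_code,
               COUNT(*)                      AS catch,
               COUNT(*) / e.segment_length_m AS catch_per_m,
               AVG(f.length_mm)              AS mean_length_mm
        FROM fish AS f
        JOIN shocking_event AS e
          ON e.sampling_event_id = f.sampling_event_id
        GROUP BY e.sampling_event_id, f.species_code
        ORDER BY e.sampling_event_id, f.species_code
    """).fetchall()


if __name__ == "__main__":
    for row in catch_summary(build_demo_db()):
        print(row)
```

Note that the catch and mean length are computed by the query when we view the data rather than stored in a field, which is one way to satisfy the "no calculated fields" criterion while still meeting objectives 1 and 2.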