* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Essig Museum Inventory and Loans Database
Data Protection Act, 2012 wikipedia , lookup
Versant Object Database wikipedia , lookup
Expense and cost recovery system (ECRS) wikipedia , lookup
Data center wikipedia , lookup
Data analysis wikipedia , lookup
Clusterpoint wikipedia , lookup
Forecasting wikipedia , lookup
Information privacy law wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
3D optical data storage wikipedia , lookup
Business intelligence wikipedia , lookup
Data vault modeling wikipedia , lookup
Introduction to Data Management and Relational Databases Outline of presentation Data Management Compare database Versus spreadsheets, word processor docs, … Relational Databases Parts & Terms tables, forms, queries, reports (we’ll skip reports) fields, records, keys Relationships Linking Tables 1 1 2 3 2 3 A B 1 1 2 Mature Forest 2 3 3 A Old Growth C 1 2 2 2 3 A 2 3 B 1 2 3 C B 1 1 1 Clear Cut 3 3 C Data Management Issues Organization! Data Entry (error-prone process) Quality Control – Quality Assurance Metadata (possible data values, how collected, etc.) Tracking specimens, samples Data retrieval Spreadsheet vs. Relational Databases Relational Database Data entry Data storage Data retrieval Spreadsheet Manipulating Data (eg. Pivot tables) Summarizing & Presenting Data (eg. graphing) (Formatting data for statistics programs) ? Embedded Information Spreadsheets = “flat files” Databases = “multi-dimensional” A look at spreadsheets It is possible to sort and filter records in the spreadsheet (look under DATA in the menu bar). Filtering temporarily removes all unwanted records from view. This is also possible in a database, with some more sophisticated options available. It is very easy to copy cells or entire rows of data in spreadsheets, but more difficult in databases (one of the few advantages of spreadsheets over databases). However, if one needs to copy-down a lot of data, then the database is not well “normalized” (discussed later). It is easy to search for and replace words in spreadsheets. This is also possible in databases, but with more sophisticated search and replace options. The “auto-fill” option in a spreadsheet completes a word it recognizes from entries immediately above the current one. In databases one can use a “lookup table” (discussed later) for a full list of values (eg. names), which might not yet appear in the data set. Word processor files are the least capable of filtering, finding & replacing, and assisting data entry compared with spread sheets and databases. Although with proper formatting a word processor document can look like a spreadsheet or database table, one cannot manipulate the rows and columns in the same way. Archiving Data As an aside, the best form to archive data in is tab delimited (.txt) or comma separated values (.csv) text files Although programs and formats come and go, all database, spreadsheet, and word processor programs know how to handle .txt and .csv files Comparing databases, spreadsheets, and documents Sort Filter Replace Fill down Auto-complete Whole row Link tables Database Yes Yes Yes No3 Yes1 Yes Yes Spreadsheet Yes Yes Yes Yes Yes1 Yes No2 1 Auto complete is done in very different ways 2 Not linking in a true relational sense, except through a database 3 Properly set up (normalized) data can be back-filled Document Yes No Yes No No Yes No Tabulations Matrix-style synopsis of data “crosstab query” in MS Access “pivot table” in MS Excel ? Relational Databases Four major components Tables – these are where ALL data reside Queries – select subsets of data (retrieve data) Forms – “windows” into data tables (views of data) Reports – summaries of data (formatted synopses) Tables All data in relational databases reside in tables. Queries, forms, and reports are just convenient ways of looking at the data in the tables. As we shall soon see, the sizes and types of data that can be entered into a table can be regulated for better efficiency and error-proofing. And two or more tables that have a field in common can be linked to draw information from all related tables. Some terminology: Each square is a “cell” of data Columns = “Fields” Rows = “Records” To change between datasheet view and design view Another way to enter design view is to click on the table name once (so that it is highlighted), then click on the design view icon. Or right-click the table name and choose design view. The DESIGN VIEW of a table is where one dictates the type and range of data that can be entered into each field. This can include formatting (such as capitalization), default values, and valid/non-valid entries. “Lookup” is used to create a list of possible values that a field can take. This example uses a list of values in the field’s properties settings (in DESIGN VIEW). In DATA VIEW the field will have a drop down list of values (“Combo Box”). The full value will be filled in when the first letter is typed. In this example the “lookup” is set to the list of species codes in the table “Species” Miscellaneous New records are always added at the end of the table (many people find this annoying) “Esc” once to undo current typing “Esc” twice to undo the whole record Changes are saved when you move off the cell No need to save the data in a database after any changes (formatting changes must be saved) ? Linking Tables Fields common b/w two or more tables can link Keyed fields prevent duplicate entries Keyed fields determine relationships b/w tables Linked tables can reduce data entry and storage needs (using an idea called data normalization) Keys and relationships A keyed field is one that does not allow repeated values. For example, if the field “Code Name” is keyed in a table, then the user would not be allowed to enter the same Code Name more than once (an error, “key violation” would appear). In this way, one constructs a list of unique values (eg. Code Names). ? One-to-many relationship Because each Species Code is unique in the “keyed” Species table, and can be repeated many times in the Collections table, a “oneto-many” relationship is created between the two (indicated by the “1” and infinity). Referential integrity means that a Species Code cannot be entered into the Collections table if it is not in the Species table. Cascade Update allows one to change the species code once and propagate that change through Collections table. This is the “relationships view” of the database that allows the user to define which tables are linked and how. Keyed fields are in bold. Cascade Delete deletes that species code in all tables that are connected. Use this feature cautiously. One-to-one relationship? Normalization Identical data should not be repeated Fields with records that are repeated should be pulled out into a separate table and referenced when needed ? Forms “Windows into the tables” (which really contain the data). Forms allow the user to customize/arrange fields on the screen for ease of data entry and retrieval. Forms also allow the user to view several linked tables in one display. Forms display all the records in the underlying tables as opposed to subsets (see queries for this). Data can be entered and updated in forms. Forms can (should) look like field collection sheets Forms can combine several tables in one view (this one has five) A form with a subform (ex: Loans Masterform) Here two tables are combined in one display subform Using Forms: Create data entry forms that look like field sheets Viewing multiple related tables in one view The easiest way to create forms in MS Access is to use the “form wizard” Select first table to be in form Select fields to be in form Select next table to be in form Select fields to be in form … Select style of query to create Once the form is created, go to Design View to modify the size and position of fields in the display ? Queries Queries are used to ask questions of the data. That is, to select subsets of records based on particular search criteria. The real power of relational database queries is that data can be summarized across tables that are linked by common fields. Queries Used to select subsets of data or combinations of data from linked tables Two or more tables may be combined in one query using linked fields Data can be entered and changed (changes will propagate through the underlying tables) Queries can also be used to append one table to another, update records given certain criteria, or make new tables for exporting as spreadsheets or documents Creating Queries Queries are easily created in Design View Click “Create query in design view” Select tables to be in the query Select the fields to be in the query Type in any criteria to search for in each field Set any sorting preferences Press run button (!) Review Steps for creating a relational database 1 1 2 3 Step 1: Nested Data Structure 2 3 A Mature B 1 1 2 3 3 Old Growth 2 A C 1 2 2 2 3 A 2 3 B 1 2 3 C B 1 1 1 Clear Cut 3 3 C Step 2: Sketch-out tables Table 1 Trap Plot Site Date Species Genus Family Order Quantity Collector Table 2 Dissolved Oxygen Dissolved Nitrogen pH Temperature Plot Site Collector Step 3: Data Normalization Look for fields with repeated records Consider pulling these fields out into a separate table that links back to the first table Set up look up tables or look up values Designate Keyed fields These are fields with non-repeated values Keys can be sequential numbers, or unique codes Step 4: Establish one-to-many links Step 5: Set up forms and queries Set up forms for easier data entry Set up queries to retrieve specific data sets ? http://nature.berkeley.edu/~poboyski/download/