Data Models for Ecological Databases
John Porter
Department of Environmental Sciences
University of Virginia

DBMS Types
•File system-based
•Hierarchical
•Network
•Relational
•Object-oriented
You’ve seen these before, now let’s go into more detail

File-System Based
[Diagram: a Directory containing Files]
•very simple and easy to set up
•inefficient
•few capabilities

Hierarchical
[Diagram labels: Project; Datasets; Investigators; Variables; Locations; Codes; Methods]
•efficient
•not very general
•e.g. phylogenetic structures, geographical images

Network Database
[Diagram labels: Projects; Datasets; Locations]
Links are hard-coded into the database. They are not a property of the data.
•very flexible
•unwieldy to modify
•not widely used

Relational Database
[Diagram labels: Projects; Datasets (Data_id, Location_id); Locations (Location_id)]
Linkages are through the properties of the data itself - not hard coded
•widely-used, mature
•table-oriented
•restricted range of structures

Object Oriented
[Diagram: an Object made up of Methods and a Data Structure]
Complex data structures, along with the methods to use the data, are in the database
•developing - few commercial implementations
•diverse structures
•extensible

Data Modeling
• DBMS Systems are highly flexible
• Good: they can do a lot!
• Bad: they have to be told how to do it!
• A Database Management System is the CANVAS, the DATA MODEL is the painting...

Data Modeling
Data modeling is used to develop the database structures used in a database
Your data model affects
– reliability of the data
– efficiency and speed of queries
– the complexity of the database
Data modeling is an art, not a science!

Some Terminology:
Tables contain attributes or fields (columns) and multiple observations or tuples (rows)

Spec_code   Genus     Species   Common Name
QRCALB      Quercus   alba      White Oak
QRCRBR      Quercus   rubra     Red Oak

Flat-file

Genus     Species   Common Name   Observer     Date
Quercus   alba      White Oak     Jones, D.    15-Jun-1998
Quercus   alba      White Oak     Smith, D.    12-Jul-1935
Quercus   alba      White Oat     Doe, J.      15-Sep-1920
Quercus   rubra     Red Oak       Fisher, K.   15-Jun-1998
Quercus   rubra     Red Oak       James, J.    15-Sep-1920

[Diagram: table Observation with attributes Genus, Species, Common Name, Observer, Date. Tables in boxes, attributes in ovals.]

Normalization
One widely-used approach for reducing errors within a database is to normalize your data structures
Normalization is the process of eliminating duplicate or redundant information

Two-table Relational Database

Species
Spec_code   Genus     Species   Common Name
QRCALB      Quercus   alba      White Oak
QRCRBR      Quercus   rubra     Red Oak

Observation
Spec_code   Observer     Date
QRCALB      Jones, D.    15-Jun-1998
QRCALB      Smith, D.    12-Jul-1935
QRCALB      Doe, J.      15-Sep-1920
QRCRBR      Fisher, K.   15-Jun-1998
QRCRBR      James, J.    15-Sep-1920

[Diagram: table Species (Spec_code, Genus, Species, Common Name) linked to table Observation (Spec_code, Observer, Date)]

Complex Data Model
[Diagram labels: Species; Images; Observations; Internet Links; Locations; Observers; Specimens]
Notation: One-to-one or One-to-many

Data Model for Metadata at the VCR/LTER
[Diagram labels: Personnel; Projects; Mailing Lists; Dataset; Locations; Variable Codes; Dataset Variable]
Notation: Optional Linkage, Mandatory Linkage

“Beanstalk” & “String of Pearls”

What     Value   Date       Location
Temp     23      10/19/00   SEV
Humid    95      10/19/00   SEV
Precip   0.01    10/18/00   VCR

[Diagram labels: Metadata (methods, units); Location Table (Lat/Lon)]

Beanstalk / String of Pearls
• Highly normalized
• Extremely flexible - capable of handling many different kinds of data
• Inefficient
– Queries can be very slow
– Can require large amounts of space

Why is there no perfect data model for ecological data?
• One of the reasons data modeling is an ART not a SCIENCE is that ecologists use data in many different ways
– Data that is perfectly formed for one kind of analysis may be unusable for another
– Different analytical software may be used

Why No Perfect Model?
• Generally ecologists want to use data in “flat file” formats that combine all the tables containing data into a single, denormalized “spreadsheet”-type format, but even that format can vary between researchers
– ClimDB needed to support single parameter and multiple parameter formats to meet researcher needs
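The trade-off above can be made concrete with a small sketch. It is not taken from the slides or from ClimDB. It assumes SQLite (through Python's standard sqlite3 module) as the relational DBMS. The lowercase table and column names (species, observation, spec_code, obs_date) and the helper names build_flat_file and long_to_wide are hypothetical, chosen only for illustration. The first helper joins the two normalized tables from the oak example back into a denormalized flat file. The second pivots "String of Pearls" rows (one parameter per row) into a multiple-parameter layout.

```python
import sqlite3
from collections import defaultdict


def build_flat_file(conn):
    """Join the normalized tables back into denormalized, spreadsheet-style rows."""
    return conn.execute(
        "SELECT s.genus, s.species, s.common_name, o.observer, o.obs_date "
        "FROM observation AS o "
        "JOIN species AS s ON s.spec_code = o.spec_code "
        "ORDER BY o.rowid"
    ).fetchall()


def long_to_wide(rows):
    """Pivot (what, value, date, location) rows into one record per (date, location)."""
    wide = defaultdict(dict)
    for what, value, date, location in rows:
        wide[(date, location)][what] = value
    return dict(wide)


# Hypothetical schema mirroring the two-table example (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE species ("
    "spec_code TEXT PRIMARY KEY, genus TEXT, species TEXT, common_name TEXT)"
)
conn.execute(
    "CREATE TABLE observation ("
    "spec_code TEXT REFERENCES species(spec_code), observer TEXT, obs_date TEXT)"
)
conn.executemany(
    "INSERT INTO species VALUES (?, ?, ?, ?)",
    [
        ("QRCALB", "Quercus", "alba", "White Oak"),
        ("QRCRBR", "Quercus", "rubra", "Red Oak"),
    ],
)
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?)",
    [
        ("QRCALB", "Jones, D.", "15-Jun-1998"),
        ("QRCALB", "Smith, D.", "12-Jul-1935"),
        ("QRCALB", "Doe, J.", "15-Sep-1920"),
        ("QRCRBR", "Fisher, K.", "15-Jun-1998"),
        ("QRCRBR", "James, J.", "15-Sep-1920"),
    ],
)

# Each common name is stored once, so every flat row repeats it consistently.
flat = build_flat_file(conn)
for row in flat:
    print(row)

# Single-parameter ("String of Pearls") rows from the slide example.
long_rows = [
    ("Temp", 23, "10/19/00", "SEV"),
    ("Humid", 95, "10/19/00", "SEV"),
    ("Precip", 0.01, "10/18/00", "VCR"),
]
wide = long_to_wide(long_rows)
print(wide)
```

Because the flat file is generated from the normalized tables rather than stored, a spelling slip such as "White Oat" can occur in only one place, and different researchers can be given different flat layouts from the same underlying data.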