* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Database basics
Survey
Document related concepts
Transcript
CTFS Workshop Shameema Esufali Suzanne Lao Data coordinators and technical resources for the network [email protected] [email protected] CTFS Workshop Relational database basics Why relational databases? Why MySQL? What about R? From an input sheet to a database What is a database? Why do we need to convert our input sheet/ excel spreadsheet to a database? What are the advantages and disadvantages? How does a data entry program help? Input form / Excel sheet How best to store data for Accuracy Easy retrieval Relational Theory In order to work with MySQL it is necessary to understand the basics of relational theory. i.e how and why data is stored and managed in a relational database. The guiding principle behind a relational database is to store data once and only once. What is a Relation? A table. Columns are fields (attributes) of data related to other fields on the same row (tuple). Primary Key Identifies the row of a table without duplicates. Tells you what the row contains Eg. If treeid is the primary key then the row has information about that tree Candidate Primary Key Any attribute(s) which together would serve as the primary key. Must uniquely identify a row of data. Each part of the key must be essential to unique identification. No redundancy. Foreign Key A foreign key is a column in a table that matches the primary key column of another table. Its function is to link the basic data of two entities on demand, i.e. when two tables are joined using the common key. First Normal Form One piece of information per column. No repeated rows. Eliminate fused data eg Code1,Code2 Tag Species Code 1234 SHORME A, BA Tag Species Code 1234 SHORME A 1234 SHORME BA Wrong! Right Second Normal Form Each column depends on the entire primary key. Wrong Tag Census 1234 1 Tag Species Seedsize X SHORTR Medium Species 1234 SHORTR Seedsize X Medium Y DBH 11.3 15.4 12 Y 11.3 15.4 Right Third Normal Form Each column depends ONLY on the primary key. i.e. there are no transitive dependencies Wrong Tag Species Seedsize X 1234 SHORTR Medium 11.3 15.4 Tag Species X Y 1234 SHORTR 11.3 15.4 Right Y Fourth Normal Form The table must contain no more than one multi-valued dependency Tag DBH Cod e 1234 10 A 1234 11 A 1234 11 BA Entity Relationship diagram (ERD) Shows in a diagram how entities (tables) are related to one another. One to One One to many Many to many One to one Extension of number of attributes in a single table Rarely required Tree More tree attributes One to Many Most common Child Parent Requires two tables. Linked by Foreign Key Family Genus Species Many to many Need to break down to one to many Measurement Tree Requires three tables Associative table provides common key Measurement Code Code Reassembling data Data was broken down into tables to preserve integrity How can we put it together to derive information? Use Structured Query Language (SQL) to JOIN tables using a common attribute Joins Two tables may be joined when they share at least one common attribute The Primary key of the Parent table is stored in the Child table as a cross reference. This is called a Foreign Key. Primary Key in Parent GenusID Genus FamilyID 1 Acacia 4 2 Acalypha 3 3 Adelia 3 4 Aegiphila 3 5 Alchornea 3 SpeciesID Species Foreign Key in Child GenusId 1 melanoceras 1 2 diversifolia 2 3 macrostachya 2 4 triloba 3 5 panamensis 4 6 costaricensis 5 7 latifolia 5 Table joined on Foreign Key GenusID SpeciesID 1 2 3 4 5 6 7 Species melanoceras diversifolia macrostachya triloba panamensis costaricensis latifolia GenusId 1 2 2 3 4 5 5 ⇿ ⇿ ⇿ ⇿ ⇿ ⇿ ⇿ GenusID 1 2 2 3 4 5 5 Genus Acacia Acalypha Acalypha Adelia Aegiphila Alchornea Alchornea The Genus ID in the Species table is used to pick up information for the corresponding Genus. It looks for a row with the matching Primary Key FamilyID 4 3 3 3 3 3 3 Extend to join many tables With SQL you can join as many tables as you need to in order to get the set of information you need. Thus the previous example can be extended to include Family which is a parent table of Genus and/or extended in the another direction to include Tree which is a child of Species as long as there is a linking attribute. This attribute is called a Foreign Key.