Week1-DatabaseIntroduction - Cardiff Biodiversity Informatics
MET282 Information Systems in Bioinformatics
Databases – the Story So Far
Dr. Richard White

The Knowledge Pyramid

What a database is
• Data is stored separately from any application programs which might use it
• Multiple uses of the data are envisaged
• Designed for retrieval in various anticipated and unanticipated ways

What databases are not
• unstructured piles of data (including heaps of web pages in web sites, wikis, blogs etc.)
• directories full of text files
• data collected and stored to be read by a single program or for one kind of analysis
• spreadsheets

Spreadsheets versus databases (1)
• A spreadsheet is typically viewed as an entire table of cells which may contain
  – numbers (data)
  – text (labels)
  – formulae (calculations producing results)
• A database may be structured in various ways, usually so that a small subset of the data is presented as the result of a search

Spreadsheets versus databases (2)
Compared with a spreadsheet, a database
• requires planning
• keeps the data "hidden" until retrieved
• may require a program to help enter data
• may require a program to help retrieve data
• allows integrity checking to be performed (Week 4)
• can be multi-user
• can be available on the Web

Uses in bioinformatics
Contents of databases in bioinformatics:
• species names
• nucleotide sequence databases
• protein sequence databases
• protein structure databases
• phenotypic effects
• bibliographic data
• special-purpose databases

Uses in biodiversity informatics
Uses of databases in biodiversity:
• information about species names
• data about species
• data about biological specimens
• data about areas, places, sampling sites, etc.
(sometimes stored in Geographical Information Systems (GIS))

Database architecture
• There are several very different ways to organise data in databases, sometimes called database architectures
• In the first part of this module we shall focus on relational databases, widely used for scientific data
• Later, we shall investigate other types of database architecture

Database system components
A relational database management system (DBMS) has the following essential components:
• Data tables (the data itself)
• “Storage engine” (stores data to and retrieves data from the tables)
• User interface software (for programs and humans to enter, view and edit data)
Some commercial general-purpose DBMSs, such as Microsoft Access, make the engine and the interface appear as one

Database system software
A DBMS usually also includes, in order of increasing user-friendliness:
1. Database “drivers”
2. APIs (application program interface modules, so that the driver(s) can be called from, say, Perl, Python or Java)
3. Other import & export modules, etc. (to make it easier for programs to store, retrieve and alter data)
4. Application programs (using the above to make it easier for people to store, retrieve and alter data, and do useful things with it, sometimes called “business logic”, including ...)

Database application programs
Application programs allow users to store, retrieve and alter data, and do useful things with it, sometimes called “business logic”, including
• data analysis
• report writing
• utilities for database managers for
  – backup
  – integrity checking
  – etc.

Example 1
• Imagine a database of your digital photo file collection
  – Table of photo file names (with title, location, date, exposure details, tags)
  – Table of locations (holidays, visits, etc.) with dates, coordinates, etc.
  – Index of tags
  – Might include physical slides and prints

Example 2
• Imagine a database of your CD or music file (e.g.
MP3) collection
  – Table of CDs or files (with track titles, performers, record companies)
  – Table of tracks (linked to CD or file)
  – Table of performers
  – Table of record companies

Data retrieval from a database
• A relational database consists of one or more tables
• Data retrieved from a relational database can be thought of as consisting of another (usually smaller) table
• So how is this smaller table specified?

Specifying the result table
• By “selecting” rows, by some property such as performer = “Nigel Kennedy”
• By “projecting” (choosing a subset of) the columns required, as in title, performer, label
• By “joining” two tables together, by means of a linking column such as performer
SQL (Structured Query Language), which you met briefly in the Computing module, is a commonly used language in which to make these requests

End
This presentation is available on Learning Central and on my web pages at
http://users.cs.cf.ac.uk/R.J.White/InfoSystemsInBioinfo/
http://biodiversity.cs.cf.ac.uk/teaching/InfoSystemsInBioinfo/
as file Week1-DatabaseIntroduction.ppt

No trees were harmed in the production of this presentation. However, a large number of electrons were terribly inconvenienced.
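
Worked sketch: selecting, projecting and joining
The three operations listed under "Specifying the result table" can be tried out with a small in-memory database. The sketch below uses Python's standard sqlite3 module and the CD collection of Example 2. The table names (cds, performers), the column names (cd_id, instrument) and all the rows are assumptions invented for illustration; they are not taken from the module's own teaching database.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema loosely based on Example 2 (names are assumptions).
conn.executescript("""
    CREATE TABLE performers (performer TEXT PRIMARY KEY, instrument TEXT);
    CREATE TABLE cds (cd_id INTEGER PRIMARY KEY, title TEXT,
                      performer TEXT, label TEXT);
""")

# Placeholder rows, made up purely for illustration.
conn.executemany(
    "INSERT INTO performers VALUES (?, ?)",
    [("Nigel Kennedy", "violin"), ("Another Performer", "piano")],
)
conn.executemany(
    "INSERT INTO cds VALUES (?, ?, ?, ?)",
    [
        (1, "Example Album 1", "Nigel Kennedy", "Label X"),
        (2, "Example Album 2", "Another Performer", "Label Y"),
        (3, "Example Album 3", "Nigel Kennedy", "Label X"),
    ],
)

# "Selecting": keep only the rows with a given property.
selected = conn.execute(
    "SELECT * FROM cds WHERE performer = ? ORDER BY cd_id",
    ("Nigel Kennedy",),
).fetchall()

# "Projecting": keep only some of the columns (cd_id is left out).
projected = conn.execute(
    "SELECT title, performer, label FROM cds ORDER BY cd_id"
).fetchall()

# "Joining": combine two tables through the linking column performer.
joined = conn.execute(
    """
    SELECT cds.title, performers.instrument
    FROM cds JOIN performers ON cds.performer = performers.performer
    ORDER BY cds.cd_id
    """
).fetchall()

for name, rows in [("selected", selected), ("projected", projected),
                   ("joined", joined)]:
    print(name)
    for row in rows:
        print("  ", row)
```

Each query returns a list of rows, that is, another (usually smaller) table, which is the point made under "Data retrieval from a database". In practice the three operations are usually combined in a single SQL statement.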