Database Technologies
Plotting Omics Associations
Jake Lin
GEBI 2014

Database: Motivation
• Bioinformatics is a data-driven science
• Systems Biology
  – *Omics
    • Human Genome Sequence
    • ENCODE
    • Model Organisms
  – Web + Cloud Abstraction
• Persistence + Organization
• Provenance + Discovery
• Integration

DBMS Timeline
• A Database Management System Primer
  – Early 1960s: Hierarchical
  – Mid 1960s: Network
  – 1969: Relational*
  – 1980s: Object Oriented
    • CAD, complex nested objects
    • No separation between language/db
  – 1990s: Internet
    • HTTP
  – 2009: NoSQL*

Hierarchical
• Tree
  – Pyramid
  – Manufacturing/Suppliers
  – Root
  – Leaf Nodes
  – Fast access/update
• Limitation
  – Difficult to relate branches
    • ChildX to ChildY

Network
• Flexible Hierarchical
  – Children with X Parents
    • Members -> Owners
  – Loose Graph
• Not Declarative

Relational SQL: Principles
• Relational Algebra/Set Theory
  – Same math properties
  – Union, Intersection …
• Schema
  – Relational model
    • Everything (entity + relationship) is defined as a column in a table
  – Tuple
    » {'key', 'name', 'role', …}

SQL Vocabulary
• Table
  – Columns
    • Key/Types
  – Primitives/VARCHAR/blobs
  – Relationships/Constraints
• Update
  – Procedure
  – Trigger
• Select
  – Joins + Views

ACID: Relational SQL DBMS
• ACID
  – http://en.wikipedia.org/wiki/ACID
  – Atomicity
  – Consistency
  – Isolation
  – Durability
• Guarantees that transactions are processed fully and reliably
  – Table A/Table B/Table C
    • Cascade Rollback

Relational SQL: Normalization
• Normalization*
  – Data organization rules that reduce redundancy and prevent insert/update/delete anomalies
  – Eight "Forms" - Theory
  – http://michaelmclaughlin.info/db1/lesson-3-modeling-data/normalization/
  – http://researcher.watson.ibm.com/researcher/files/us-fagin/tods81.pdf

Relational SQL: Market leaders
• Commercial
  – Oracle - ~$24 Billion
  – IBM - DB2
  – Microsoft - SQL Server
  – Sybase
  – Apple - FileMaker
• Open source
  – MySQL – most popular – 65K downloads per day
  – PostgreSQL – most advanced
  – SQLite – most widely deployed
    • Gadgets - Smart Phones
    • Light Apps

Relational SQL: Example Apps
• MySQL
  – Most web companies + startups
• SQLite
  – Gadgets
  – iPhone
• PostgreSQL
  – BASF
  – Affymetrix
  – Governments

Relational SQL: Schema
• Schema*
  – Relational model example
    • Tuple
      – {'gene_key', 'gene_alias', 'gene_chr', 'gene_start', 'gene_end', …}
    • Math: Set theory
    • Columns {Types}: {byte, (var)char, int, blob}
• Querying
  – Joins – something in common
  – Inner/Outer …

Relational SQL: API
• Drivers
  – ODBC/JDBC
  – Interface between programming language and DB
  – Python packages
    • MySQLdb
  – Java JDBC jar
  – Perl/C++ …
• Model View Controller
  – App Interface
  – Separation/(HTTP/AJAX)
    • Create (Put)
    • Read (Get)
    • Update (Post)
    • Delete (Post)

Relational SQL: Querying
• Selects – set theory
  – Indexing
  – Keys
• Syntax*
  – Select * from TableA as A where A.cA1 = ['someValue']
• Joins
  – Foreign Keys
  – Inner and Outer Joins
  – Select A.cA1, B.cB1 from TableA as A, TableB as B where A.key = B.key
  – Unions, Intersects
    • Common Attributes
  – Traversals
    • 2 joins
    • 3 joins
    • 4 joins …

Relational SQL: Limitations
• Not all data structures map well onto the relational model
  – Many joins for one query
    • To get a few tuples
    • Self joins (n) …
  – Relationships between rows stored in the same table
  – Complex Objects
    • Stored Procedures/Triggers
    • Blobs
  – Data Stream/File

NoSQL Foundation + Motivation
• 1998/2009 NoSQL
• Not Only SQL
• Big Data
  – Web Scale + Simplicity
    • Insert + Retrieve
  – Distributed across Cloud
  – Complex relationships
    • Graphs

NoSQL Technologies
• Key-Value Store
  – {p,v} distributed across machines
• Column Family Store
  – Key-Value where keys are in families
• Document-Oriented Databases
  – Collection of Key-Values in Documents
• Graph DB

Graph DB Foundation
– Graph DB
  • Nodes (vertices)
  • Edges (relationships; path length gives the degrees of separation)
  • Properties (key-value)

Neo4j Graph DB
• Manages
• Records data
  – Nodes
  – Relationships
• Properties
  – Belong
    » Nodes
    » Relationships
  – INDEX
    » Look Up
• Traversal
• PATH
  – Look Up

Watch + Learn
http://player.vimeo.com/video/50787208

Relationships and Degrees
[Figure: relationship graph linking Kevin Bacon, Paul Erdős and Natalie Portman]

POMO: Plotting Genomic Associations
– http://pomo.cs.tut.fi
– Web Viz Tool
  • Circos light (http://circos.ca)
  • No installs, dependencies
– Associations + Annotations
– Human
– Mouse
– Yeast
– Views
  • Network/Grid
  • Annotations
    – Bar/Histogram/Heatmaps
  • Filter
    – Unmapped

Sketches
– http://pomo.cs.tut.fi
– Custom
– Human-Mouse
– New

POMO Architecture
• HTML5
• JavaScript
  – jQuery/ExtJS/protovis/circvis
  – Cytoscapeweb
• Apache Linux
  – Python CGI
• SQLite
  – Ensembl Biomart – latest reference builds
  – Ensembl Plants
  – SGD
• https://code.google.com/p/pomo

NoSQL Players
• Google*
  – Bigtable: Column Based
• Amazon*
  – Dynamo – Key:Value
• Facebook
  – Cassandra – Column Based
• Redis
  – Open source Key:Value store
• MongoDB – Document
• CouchDB – Document
• Neo4j – Graph DB
  – Many other Graph Databases
  – http://en.wikipedia.org/wiki/Graph_database

NoSQL: Vocabulary
• Not ACID compliant
• Focus: Speed + Scale
• Distributed
  – Replication + Propagation
• Sharding
  – Single logical DB system
    » Ranges of Documents
• Cluster of machines
• Latency

Acknowledgements!
Bioinformatics Core - Luxembourg Centre for Systems Biomedicine
Shmulevich Lab – ISB Seattle
Nykter Group – BioMediTech
Reija Autio – TUT/UTA
For more information: [email protected]

References
• C. J. Date, An Introduction to Database Systems, 8th Edition, Addison-Wesley (2003)
• http://en.wikipedia.org/wiki/Nosql
• Gentle Introduction to Object and Relational DB: http://www.cl.cam.ac.uk/~fms27/db/tr-98-2.pdf
• Google Bigtable: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/sv//archive/bigtable-osdi06.pdf
• http://newtech.about.com/od/databasemanagement/a/Nosql.htm
• http://en.wikipedia.org/wiki/Create,_read,_update_and_delete
• http://lucene.apache.org/
• Dynamo Paper – Amazon: http://www.read.seas.harvard.edu/~kohler/class/cs239w08/decandia07dynamo.pdf
• Free Graph DB Book: http://graphdatabases.com/?_ga=1.245224581.1016191170.1409035486
• POMO – http://pomo.cs.tut.fi
  • http://www.biomedcentral.com/1471-2164/14/918
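
Worked example: schema, join and rollback
The Relational SQL slides (Schema, Querying, ACID) can be sketched with Python's built-in sqlite3 module, the same database engine the POMO architecture slide lists. This is only an illustrative sketch, not the POMO schema: the gene columns follow the tuple on the Schema slide, while the association table, its columns, the function names and every row of toy data are assumptions made up for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (toy data)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE gene (
            gene_key   INTEGER PRIMARY KEY,
            gene_alias TEXT    NOT NULL,
            gene_chr   TEXT    NOT NULL,
            gene_start INTEGER NOT NULL,
            gene_end   INTEGER NOT NULL
        );
        -- Hypothetical table, not taken from the slides.
        CREATE TABLE association (
            assoc_key INTEGER PRIMARY KEY,
            gene_key  INTEGER NOT NULL REFERENCES gene(gene_key),
            trait     TEXT    NOT NULL,
            score     REAL    NOT NULL
        );
    """)
    with conn:  # commits on success, rolls back on an exception
        conn.executemany(
            "INSERT INTO gene VALUES (?, ?, ?, ?, ?)",
            [(1, "GENE_A", "1", 1000, 5000),
             (2, "GENE_B", "1", 8000, 12000),
             (3, "GENE_C", "X", 200, 900)],
        )
        conn.executemany(
            "INSERT INTO association VALUES (?, ?, ?, ?)",
            [(1, 1, "trait_x", 0.9),
             (2, 2, "trait_y", 0.4),
             (3, 3, "trait_x", 0.7)],
        )
    return conn


def associations_on_chromosome(conn, chromosome):
    """Inner join on the shared key, filtered by chromosome."""
    return conn.execute(
        """
        SELECT g.gene_alias, a.trait, a.score
        FROM gene AS g
        INNER JOIN association AS a ON g.gene_key = a.gene_key
        WHERE g.gene_chr = ?
        ORDER BY a.score DESC
        """,
        (chromosome,),
    ).fetchall()


def add_gene_with_association(conn, gene_row, assoc_row):
    """Insert both rows atomically; return False if the pair is rejected."""
    try:
        with conn:
            conn.execute("INSERT INTO gene VALUES (?, ?, ?, ?, ?)", gene_row)
            conn.execute("INSERT INTO association VALUES (?, ?, ?, ?)", assoc_row)
        return True
    except sqlite3.IntegrityError:
        # The failed second insert also undoes the first (atomicity).
        return False


if __name__ == "__main__":
    db = build_demo_db()
    print(associations_on_chromosome(db, "1"))
    # The association points at gene_key 99, which does not exist.
    print(add_gene_with_association(
        db, (4, "GENE_D", "2", 10, 20), (4, 99, "trait_z", 0.1)))
```

The query uses the explicit INNER JOIN ... ON form, which is equivalent to the comma-separated "from TableA as A, TableB as B where A.key = B.key" form on the Querying slide. The "?" placeholders let the driver handle quoting instead of pasting values into the SQL string.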