CS 292 Special Topics on Big Data
Yuan Xue ([email protected])

Part I: Relational Database

Discussion
  Did you ever encounter a data management problem?
    Experimental data from a homework?
    Personal data?
    Other data?
  How did you manage your data?

Database
  Database: an integrated collection of related data
    Usually stored on secondary storage (as files)
    Also in-memory databases
  Examples of databases
    Vanderbilt student database, course registration and grading database (backend of YES)
    Amazon's products and customer database
    eBay's products and transaction database
    Facebook's user and message database
    And more…

Database Management System (DBMS)
  DBMS: a collection of software/programs
    Designed to assist in creating and managing databases
    Supports defining, constructing, manipulating, and sharing databases
  Examples of DBMSs
    Relational DBMSs
      Commercial: Oracle, IBM (DB2, Informix), Microsoft (SQL Server, Access)
      Open source: MySQL, PostgreSQL
    NoSQL and NewSQL: BigTable/HBase, Cassandra, Redis, Riak, MongoDB, Dynamo, DynamoDB, Spanner
    Other: object-oriented databases, etc.

Database System Environment
  [Figure: two panels, "With DBMS" and "Without DBMS", showing users, applications, the DBMS, and data]

Benefits of a DBMS
  Development convenience
    Reduces application development time
    Data independence: application programs do not depend on data representation and storage details
  Data integrity and consistency
    Enforces consistency constraints on data
  Data sharing and concurrency control
    Data is better utilized (discovered and reused); redundancy of data is minimized
    Avoids undesirable race conditions that arise with simultaneous access/updates to data
  Centralized control
    The DBA tunes the database to balance users' needs
  Security
    Prevents unauthorized access
  Crash recovery
    Ensures the integrity of data in the presence of failures
Example Application: MiniTwitter
  What data do we need? Information required to record system state
    User profile info: ID, password, email, display name, picture, people I follow, people who follow me
    Tweets: author, time, content (topic), replies (author, time, content), favorites (author, time)
  What capabilities on the data do we need? Operations that update and retrieve system state
    Register a new user
    Follow/unfollow a user (approve following request)
    Post/delete a tweet
    Read/update in real time all the tweets from the people I follow
    Show the number of tweets I posted, the number of people following me, and the number of people I follow
    Trend information

Three-Level Architecture
  Key question: how to describe data?
    Conceptual data model: entities, attributes, relationships (entity-relationship model)
    Logical data model: coming next
    Physical data model: storage, data structure

Database Model
  Logical data model: logical structure of data organization
  Types of data model
    Relational model: table
    Semistructured data model (XML/JSON): tree
    Various data models in NoSQL systems: key-value pair, column-family, graph
    Object-oriented model: object, class, inheritance; a layer over the relational model

Relational Data Model
  Database = set of named relations (or tables)
    Each relation has a set of named attributes (or columns)
    Each tuple (or row) has a value for each attribute
    Each attribute has a type (or domain)
  Schema = structural description of relations in the database
  Instance = actual contents at a given point in time

  ID       Name   Email              Password
  Alice00  Alice  alice00@gmail.com  Aadf1234
  Bob2013  Bob    bob13@gmail.com    qwer6789

Discussion
  How to design relations (tables) for MiniTwitter?
  What are the aspects we need to consider?

Design: Version 0.1
  User
  ID        Name   Email              Password
  Alice00   Alice  alice00@gmail.com  Aadf1234
  Bob2013   Bob    bob13@gmail.com    qwer6789
  Cathy123  Cathy  cath@vandy         Tyuoa~!@
  (Note on the Password column: pretending to be md5 hashcode ;)

  Follow
  Followee  Follower  Timestamp
  Alice00   Bob2013   2011.1.1.3.6.6
  Bob2013   Cathy123  2012.10.2.6.7.7
  Alice00   Cathy123  2012.11.1.2.3.3
  Cathy123  Alice00   2012.11.1.2.6.6
  Bob2013   Alice00   2012.11.1.2.6.7

  Tweet
  ID    Timestamp           Author   Content
  0001  2013.12.20.11.20.2  Alice00  Hello
  0002  2013.12.20.11.23.6  Bob2013  Nice weather
  0003  2014.1.6.1.25.2     Alice00  @Bob Not sure..

Relational Data Model
  Key: an attribute whose value is unique in each tuple,
       or a set of attributes whose combined values are unique
  (Illustrated on the User, Follow, and Tweet tables from Design: Version 0.1.)
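The key definition can be tried out directly. The following is a minimal sketch using Python's built-in sqlite3 module, not something taken from the slides. It assumes that ID is the key of User and that the pair (Followee, Follower) is the key of Follow; the slides do not state which attributes were marked as keys, so both choices are assumptions. The helper violates_key and the rejected rows are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed keys: User.ID (single attribute) and Follow(Followee, Follower)
# (a set of attributes whose combined values are unique).
conn.executescript("""
CREATE TABLE User (
    ID       TEXT PRIMARY KEY,
    Name     TEXT,
    Email    TEXT,
    Password TEXT
);
CREATE TABLE Follow (
    Followee  TEXT,
    Follower  TEXT,
    Timestamp TEXT,
    PRIMARY KEY (Followee, Follower)
);
""")

conn.execute("INSERT INTO User VALUES ('Alice00', 'Alice', 'alice00@gmail.com', 'Aadf1234')")
conn.execute("INSERT INTO Follow VALUES ('Alice00', 'Bob2013', '2011.1.1.3.6.6')")
# Same Followee, different Follower: the combined value is still unique.
conn.execute("INSERT INTO Follow VALUES ('Alice00', 'Cathy123', '2012.11.1.2.3.3')")


def violates_key(sql):
    """Hypothetical helper: True if the statement is rejected by a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# A second user with an existing ID breaks the single-attribute key.
dup_user = violates_key(
    "INSERT INTO User VALUES ('Alice00', 'Other', 'other@example.com', 'zzzz')")
# Repeating an existing (Followee, Follower) pair breaks the composite key.
dup_follow = violates_key(
    "INSERT INTO Follow VALUES ('Alice00', 'Bob2013', '2014.1.1.0.0.0')")

follow_rows = conn.execute("SELECT COUNT(*) FROM Follow").fetchone()[0]
print(dup_user, dup_follow, follow_rows)
```

Choosing the pair as the key of Follow means a follower can follow many users and be followed by many, but the same follow relationship cannot be recorded twice.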
Relational Data Model
  Foreign key: an attribute or set of attributes in one table that points to the primary key of another
  (Illustrated on the User, Follow, and Tweet tables from Design: Version 0.1.)

More on Relational Data Model
  NULL: special value for "unknown" or "undefined"

Relational Model Constraint Summary
  Domain constraints
  Key constraints
  Integrity constraints

Relational Data Model and Database
  Relational model
    Simple representation
    Efficient implementation
    Driven by relational algebra and relational calculus
    Up-front definition of schemas and types that the data will thereafter adhere to
    High-level, simple yet expressive query language
  Relational databases
    Proven success for both open source and proprietary systems
    Provide full ACID guarantees
    SQL as the widely used, standard way of database interaction

Creating and Using a Relational Database
  Steps in creating and using a (relational) database
    1. Design schema (using DDL, the data definition language)
    2. Initialization: "bulk load" initial data
    3. Operation: execute queries and modifications
  Meta-data: database definition
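The three steps can be walked through end to end. The following is a rough sketch with Python's built-in sqlite3 module, under several assumptions that are not in the slides. Tweet.Author is treated as a foreign key to User.ID. The Password column is omitted for brevity. Cathy's email is set to NULL purely to show the "unknown" value. The tweet with ID 0004 and author 'Nobody' is a made-up row used to trigger the foreign key check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Step 1: design the schema with DDL.
conn.executescript("""
CREATE TABLE User (
    ID    TEXT PRIMARY KEY,
    Name  TEXT NOT NULL,
    Email TEXT                 -- NULL allowed: email unknown
);
CREATE TABLE Tweet (
    ID        TEXT PRIMARY KEY,
    Timestamp TEXT NOT NULL,
    Author    TEXT NOT NULL REFERENCES User(ID),  -- assumed foreign key
    Content   TEXT
);
""")

# Step 2: initialization, bulk loading the initial data.
users = [
    ("Alice00", "Alice", "alice00@gmail.com"),
    ("Bob2013", "Bob", "bob13@gmail.com"),
    ("Cathy123", "Cathy", None),  # assumption: NULL here only for illustration
]
tweets = [
    ("0001", "2013.12.20.11.20.2", "Alice00", "Hello"),
    ("0002", "2013.12.20.11.23.6", "Bob2013", "Nice weather"),
    ("0003", "2014.1.6.1.25.2", "Alice00", "@Bob Not sure.."),
]
with conn:  # one transaction for the whole load
    conn.executemany("INSERT INTO User VALUES (?, ?, ?)", users)
    conn.executemany("INSERT INTO Tweet VALUES (?, ?, ?, ?)", tweets)

# Step 3: operation, executing queries and modifications.
alice_tweets = [row[0] for row in conn.execute(
    "SELECT Content FROM Tweet WHERE Author = ? ORDER BY ID", ("Alice00",))]

with conn:
    conn.execute("DELETE FROM Tweet WHERE ID = ?", ("0002",))
remaining = conn.execute("SELECT COUNT(*) FROM Tweet").fetchone()[0]

# NULL is matched with IS NULL, not with "= NULL".
unknown_email = [row[0] for row in conn.execute(
    "SELECT ID FROM User WHERE Email IS NULL")]

# A tweet whose Author matches no User.ID violates the foreign key.
try:
    with conn:
        conn.execute(
            "INSERT INTO Tweet VALUES ('0004', '2014.1.7.0.0.0', 'Nobody', 'Hi')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(alice_tweets, remaining, unknown_email, fk_rejected)
```

An in-memory SQLite database keeps the sketch self-contained; a server DBMS such as MySQL or PostgreSQL would follow the same three steps with its own driver and bulk-load tooling.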