CS 292 Special Topics on Big Data
Yuan Xue ([email protected])

Part II: NoSQL Database (Overview)

Outline
• From SQL to NoSQL: motivation, challenges, approaches, notable NoSQL systems
• Database user / application developer -- how to use? (Logic) data model and CRUD operations
• Database system designer -- how to design? Under the hood: (physical) data model and distribution algorithm
• Database designer -- how to link application needs with database design: schema design
• Summary and moving forward: summary of NoSQL data modeling techniques, summary of NoSQL data distribution models/algorithms, limits of NoSQL, NewSQL
• Data models covered: column family, key-value, document

From SQL to NoSQL: Relational Database Review
• Persistent data storage
• Transaction support: ACID, concurrency control + recovery
• Standard data interface for data sharing: SQL
• Design and operation: conceptual design (entity/relationship model) -> logic design (data model mapping, logical schema, normalization, normalized schema) -> physical design (physical/internal schema); SQL queries over the result

SQL Review -- Putting Things Together
• Users and application programs issue queries; the DBMS query processor plans them, and the database engine performs data access against the data and meta-data
• http://www.dbinfoblog.com/post/24/the-query-processor/

From SQL to NoSQL: Motivation I -- Scaling (Up/Out) an SQL Database
• Web-based applications with an SQL database as the backend face high web traffic (a large volume of transactions) and more users (a large amount of data)
• Solution 1: cache (e.g., memcached) -- only handles read traffic
• Solution 2: scale up (vertically) -- add more resources to a single node
• Solution 3: scale out (horizontally) -- add more nodes to the DB system; distributing data among multiple nodes is non-trivial

Scaling Out an SQL Database -- Techniques and Challenges
• Two techniques: replication and sharding
• Replication, master-slave: duplication facilitates reads, but data consistency problems arise. All writes go to the master; all reads are performed against the replicated slave databases. Critical reads may be incorrect because writes may not yet have propagated down, and large data sets can pose problems because the master must duplicate data to the slaves.
• Replication, peer-to-peer: writes can happen at any node, so inconsistent writes (which can be persistent) are possible. SQL and multi-node clusters do not go well together.
• Sharding (partitioning the dataset across multiple nodes): partitioning facilitates writes and reads, so it scales well for both. It is not transparent -- the application must be partition-aware -- relationships/joins across partitions are no longer possible, referential integrity is lost across shards, and transaction support across partitions is lost. A minimal routing sketch appears after this list.

From SQL to NoSQL: Motivation II -- Limits of the Relational Data Model
• Impedance mismatch: the difference between in-memory data structures and the relational model
• Predefined schema
• Join operation
• Not appropriate for graph data, geographical data, or unstructured data

From SQL to NoSQL: Notable Systems
• Google (search engine): stores billions of documents; Bigtable + Google File System
• Amazon (online shopping): shopping cart management; Dynamo
• These form the foundation of HBase + HDFS, Cassandra, Riak, Redis, MongoDB, ... -- open-source DBMSs supported by many social media sites with large data needs (e.g., Facebook, Twitter)
• Amazon DynamoDB, Amazon SimpleDB, ... -- cloud-hosted managed DBMSs used by companies (e.g., IMDb, startups)
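As a minimal illustration of key-based sharding (a sketch, not from the slides: the node names and the simple modulo scheme are assumptions, and production systems typically use consistent hashing instead), a router might map each row key to a shard like this:

    import java.util.Arrays;
    import java.util.List;

    // Sketch: hash-based shard routing. Every read/write for a key must go to
    // the shard that owns it; joins across shards are no longer possible.
    public class ShardRouter {
        private final List<String> shards;

        public ShardRouter(List<String> shards) {
            this.shards = shards;
        }

        public String shardFor(String rowKey) {
            // Mask the sign bit instead of Math.abs (which overflows for MIN_VALUE).
            int bucket = (rowKey.hashCode() & 0x7fffffff) % shards.size();
            return shards.get(bucket);
        }

        public static void main(String[] args) {
            ShardRouter router = new ShardRouter(
                    Arrays.asList("db-node-1", "db-node-2", "db-node-3"));  // hypothetical nodes
            System.out.println(router.shardFor("user:alice"));  // always the same node for this key
        }
    }

Note that adding or removing a node changes the modulus and remaps most keys, which is one reason real systems prefer consistent hashing.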
What Is NoSQL?
• Stands for Not Only SQL: not a relational database
• An umbrella term for many different types of data stores (databases)
• Different types of data model: key-value DB, column-family DB, document DB, graph DB
• Just as there are different programming languages, NoSQL provides different data storage tools in the toolbox -- polyglot persistence

What Is the Magic?
• Logic data model: from table to aggregate
  - Diverse data models: aggregate-oriented (column family, key-value, document) and graph
  - No predefined schema: an attribute/field can be added at run time, but you still need to consider how to define the "key" and the "column family"
  - Giving up built-in joins
• Physical data handling: from ACID to BASE (CAP theorem coming up)
  - No full transaction support; support only at the aggregate level
  - Support for both replication and sharding (automatically)
  - Relax one or more of the ACID properties

NoSQL Database Classification -- Data Model View
• Key-value store: Dynamo, Riak, Redis, Memcached (in memory)
• Column family: HBase (BigTable), Cassandra
• Document: MongoDB, Terrastore
• Graph: FlockDB, Neo4J

Transaction Review
• ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably.
• Atomicity: the "all-or-nothing" proposition -- each unit of work performed in the database must either complete in its entirety or have no effect whatsoever.
• Consistency: conformance to database constraints -- each client and each transaction can assume all constraints hold when the transaction begins and must guarantee all constraints hold when the transaction ends.
• Isolation: serial equivalency -- operations may be interleaved, but execution must be equivalent to some sequential (serial) order of all transactions.
• Durability: durable storage -- if the system crashes after a transaction commits, all effects of the transaction remain in the database.

ACID and Transaction Support in a Distributed Environment
• Recall scaling out the database: a distributed environment with multiple nodes, with data distributed across nodes via replication and sharding
• Concerns from the distributed networking environment: message loss/delay and network partitions
• Can the ACID properties still hold for a database in a distributed environment? The CAP theorem serves as a guideline.

CAP Theorem
Start with three properties for distributed systems:
• Consistency: all nodes see the same data at the same time
• Availability: every request to a non-failing node in the system returns a response about whether it was successful or failed
• Partition tolerance: system properties (consistency and/or availability) hold even when the system is partitioned (communication is lost)

CAP Theorem -- Consistency (atomic data object)
• As in ACID
• In a distributed environment, multiple copies of a data item may exist on different nodes (diagram: clients accessing data item X, replicated as copy 1 and copy 2).
• Consistency requires that all operations on a data item be executed as if they were performed at a single instant.

CAP Theorem -- Availability (available data object)
• Requests to the data (reads and writes) always succeed.
• All (non-failing) nodes remain able to read and write even when the network is partitioned.
• A system that keeps some, but not all, of its nodes able to read and write is not Available in the CAP sense, even if it remains available to clients and satisfies its SLAs for high availability.
• Reference: https://foundationdb.com/white-papers/the-cap-theorem

CAP Theorem -- Partition Tolerance
• The network is allowed to lose arbitrarily many messages sent from one node to another.
• When the network is partitioned, all messages from one component to another are lost.

Under Partition Tolerance
• The consistency requirement implies that every data operation will be atomic, even though arbitrary messages may be lost.
• The availability requirement implies that every node receiving a request from a client must respond, even though arbitrary messages may be lost.

CAP Theorem -- The Trade-off
• You can have at most two of these three properties -- consistency, availability, and partition tolerance -- for any shared-data system.
• To scale out, you have to support partition tolerance, so under a network partition a NoSQL system must pick one side: consistency or availability.

NoSQL Database Classification -- View from the CAP Theorem
• Consistency + availability: relational (MySQL, PostgreSQL)
• Availability + partition tolerance: Dynamo and its derivatives (Cassandra, Riak)
• Consistency + partition tolerance: BigTable and its derivatives (HBase), Redis, MongoDB

More on Consistency
• Question: in an AP system, if the consistency property cannot hold, what property can be claimed?
• Example: data item X is replicated on nodes M and N. Client A writes X to node N; some period of time t elapses; client B then reads X from node M. Does client B see the write from client A?
• From the client's perspective, there are two kinds of consistency:
  - Strong consistency (as C in CAP): any subsequent access is guaranteed to return the updated value.
  - Weak consistency: a subsequent access is not guaranteed to return the updated value.
• Inconsistency window: the period between the update and the moment when it is guaranteed that any observer will always see the updated value.
• Consistency is a continuum with trade-offs; there are multiple consistency models.

Eventual Consistency
• Eventual consistency is a specific form of weak consistency: when no updates occur for a long period of time, eventually all updates propagate through the system, all nodes become consistent, and all accesses return the last updated value (a toy simulation follows below).
• For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service.
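To make the inconsistency window concrete, here is a small, self-contained simulation (not from the slides; the node names and the 100 ms propagation delay are made up). A write lands on node N immediately and reaches replica M only after the replication delay, so a read from M inside that window returns the stale value:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Toy model of eventual consistency: one data item X replicated on two nodes.
    // Replication from N to M is asynchronous, so reads from M may lag behind.
    public class EventualConsistencyDemo {
        static volatile String xOnN = "v0";   // copy on node N (receives the write)
        static volatile String xOnM = "v0";   // copy on node M (replica)

        public static void main(String[] args) throws Exception {
            ScheduledExecutorService replicator = Executors.newSingleThreadScheduledExecutor();

            // Client A writes X = "v1" to node N; replication to M happens 100 ms later.
            xOnN = "v1";
            replicator.schedule(() -> { xOnM = xOnN; }, 100, TimeUnit.MILLISECONDS);

            // Client B reads from M inside the inconsistency window: stale value.
            System.out.println("read from M (t = 0 ms):   " + xOnM);   // prints v0

            Thread.sleep(200);   // wait past the inconsistency window

            // After the window, every replica returns the last updated value.
            System.out.println("read from M (t = 200 ms): " + xOnM);   // prints v1
            replicator.shutdown();
        }
    }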
Based on CAP: SQL vs. NoSQL
• SQL: ACID
• NoSQL: BASE (Basically Available, Soft state, Eventual consistency)
  - Basically Available: the system seems to work all the time
  - Soft State: it doesn't have to be consistent all the time
  - Eventually Consistent: it becomes consistent at some later time

References and Additional Reading for CAP
• http://en.wikipedia.org/wiki/CAP_theorem
• Formal proof of the CAP theorem: "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services" by Seth Gilbert and Nancy Lynch
• Graphical illustration of the CAP theorem: http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
• Recent post from Brewer: "CAP Twelve Years Later: How the 'Rules' Have Changed"
• "Eventually Consistent" by Werner Vogels

Part II: NoSQL Database (Overview) -- BigTable and HBase Introduction
Yuan Xue ([email protected])

BigTable Background
• Development began in 2004 at Google (published in 2006), driven by the need to store and handle large amounts of (semi-)structured data
• Many Google projects store data in BigTable: Google's web crawl, Google Earth, Google Analytics

HBase Background
• Open-source implementation of BigTable, built on top of HDFS
• Initial HBase prototype in 2007
• Hadoop became an Apache top-level project and HBase became a subproject in 2008

Road Map
• Database user / application developer -- how to use? (Logic) data model and CRUD operations
• Database system designer -- how to design? Under the hood: (physical) data model and distribution algorithm
• Database designer -- how to link application needs with database design: schema design

Data Model
• A sparse, distributed, persistent, multidimensional sorted map
• The map is indexed by a row key, a column key, and a timestamp: (row:string, column:string, time:int64) -> uninterpreted byte array
• Rows
  - Maintained in sorted lexicographic order by row key; a row key is an arbitrary string
  - Every read or write of data under a single row is atomic
  - Row ranges are dynamically partitioned into tablets, the unit of distribution and load balancing; applications can exploit this property for efficient row scans
• Columns
  - Grouped into column families; a column key has the form family:qualifier
  - A column family must be created before data can be stored under a column key
  - Column families provide locality hints; the number of columns is unbounded
• Timestamps
  - 64-bit integers, assigned either by Bigtable (real time in microseconds) or by the client application (when unique timestamps are a necessity)
  - Items in a cell are stored in decreasing timestamp order
  - The application specifies how many versions (n) of a data item are maintained in a cell; Bigtable garbage-collects obsolete versions

Data Model -- MiniTwitter Example
• View the table as a map of maps

Operations & APIs in HBase
• Create and delete tables and column families; modify meta-data
• Operations are based on row keys
• Single-row operations: Put, Get, Delete
• Multi-row operations: Scan, MultiPut
• Atomic read-modify-write sequences on data stored under a single row key (no support for transactions across multiple rows)
• A minimal client sketch of these calls appears below.
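The following is a rough sketch of Put, Get, and Scan using the classic HBase Java client, in the same pre-1.0 API style as the table-creation code later in these notes. The table name "MiniTwitter", the column family "info", and the qualifier "email" are placeholder names, not taken from the slides:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MiniTwitterClient {
        public static void main(String[] args) throws Exception {
            Configuration config = HBaseConfiguration.create();
            HTable table = new HTable(config, "MiniTwitter");   // placeholder table name

            // Put: insert (or update) one cell under a single row key.
            Put put = new Put(Bytes.toBytes("TheRealMT"));
            put.add(Bytes.toBytes("info"), Bytes.toBytes("email"), Bytes.toBytes("mt@example.com"));
            table.put(put);

            // Get: read back the highest version of that cell.
            Get get = new Get(Bytes.toBytes("TheRealMT"));
            get.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"));
            Result result = table.get(get);
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"))));

            // Scan: iterate over a range of rows (here, the whole 'info' family).
            Scan scan = new Scan();
            scan.addFamily(Bytes.toBytes("info"));
            ResultScanner scanner = table.getScanner(scan);
            for (Result row : scanner) {
                System.out.println(Bytes.toString(row.getRow()));
            }
            scanner.close();
            table.close();
        }
    }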
No Built-in Joins
• Joins can be done in the application, using scan() and get() operations or using MapReduce

Creating a Table

    HBaseAdmin admin = new HBaseAdmin(config);
    HColumnDescriptor[] column = new HColumnDescriptor[2];
    column[0] = new HColumnDescriptor("columnFamily1:");
    column[1] = new HColumnDescriptor("columnFamily2:");
    HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
    desc.addFamily(column[0]);
    desc.addFamily(column[1]);
    admin.createTable(desc);

Altering a Table
• Disable the table before changing the schema

Single-row operations: Put()
• Insert a new record (with a new key), or insert a record for an existing key
• Implicit version number (timestamp) or explicit version number
• Put() in MiniTwitter: update user information

Single-row operations: Get()
• Given a key, return the corresponding record
• For each value, return the highest version; you can control the number of versions you want
• Get() in MiniTwitter

Single-row operations: Delete()
• Marks table cells as deleted, at multiple levels: an entire column family can be marked as deleted, or all column families of a given row

    // delete an entire row
    Delete d = new Delete(Bytes.toBytes("rowkey"));
    userTable.delete(d);

    // delete all versions of one column
    Delete d = new Delete(Bytes.toBytes("rowkey"));
    d.deleteColumns(Bytes.toBytes("cf"), Bytes.toBytes("attr"));
    userTable.delete(d);

Multi-row operations: Scan()

Road Map
• Database user / application developer -- how to use? (Logic) data model and CRUD operations
• Database system designer -- how to design? Under the hood: (physical) data model and distribution algorithm; single-node write, read, delete; distributed system
• Database designer -- how to link application needs with database design: schema design

Basic Terms (BigTable vs. HBase)
• SSTable -- HFile
• memtable -- MemStore
• tablet -- region
• tablet server -- RegionServer

HFile/SSTable
• The basic building block of Bigtable
• A persistent, ordered, immutable map from keys to values
• A sequence of blocks on disk (64 KB each) plus an index for block lookup; stored in GFS; can be completely mapped into memory
• Supported operations: look up the value associated with a key; iterate over key/value pairs within a key range

HDFS: Hadoop Distributed File System
• A client requests metadata about a file from the NameNode; data is served directly from the DataNodes.
• File read: the client opens the file via the DistributedFileSystem, gets block locations from the NameNode, reads each block through an FSDataInputStream from the closest DataNode (falling back to the next-closest replica), and closes the stream. A sketch of this client-side read path follows below.
• File write: the client creates the file via the DistributedFileSystem (which contacts the NameNode), writes packets through an FSDataOutputStream to a pipeline of three DataNodes obtained from the NameNode, receives acknowledgment packets, closes the stream, and the NameNode is notified that the file is complete.
• If a DataNode crashes during a write, the crashed node is removed from the pipeline, the current block receives a new id so that the partial data on the crashed node can be deleted later, and the NameNode allocates another node.
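As a rough sketch of the client side of that read path (not from the slides; the file path is a placeholder), the HDFS Java API goes through FileSystem and FSDataInputStream roughly as follows:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // 1. open: FileSystem.get() returns the DistributedFileSystem for the
            //    configured cluster; open() asks the NameNode for block locations.
            FileSystem fs = FileSystem.get(conf);
            FSDataInputStream in = fs.open(new Path("/user/demo/data.txt"));  // placeholder path

            // 2. read: bytes are streamed directly from the DataNodes holding each
            //    block (the NameNode only serves metadata).
            byte[] buffer = new byte[4096];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) > 0) {
                System.out.write(buffer, 0, bytesRead);
            }

            // 3. close the stream when all blocks have been consumed.
            in.close();
            fs.close();
        }
    }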
HBase: Logical Storage vs. Physical Storage
• Region/tablet: a dynamically partitioned range of rows, built from multiple SSTables; storage is column-family oriented
• (Diagram: a tablet covering the row range Alice00 to Dave11, composed of SSTables, each a sequence of 64 KB blocks plus an SSTable index)

Table (HTable)
• Multiple tablets make up the table: the entire BigTable is split into tablets of contiguous row ranges, approximately 100 MB to 200 MB each
• Tablets are split as their size grows; SSTables can be shared between tablets
• (Diagram: tablets of an HTable, each built from SSTables, some shared, served by a RegionServer; graphic from slides by Erik Paulson)
• Each column family is stored in a separate file
• Key and version numbers are replicated with each column family
• Empty cells are not stored

Table to Region / Physical Storage: MiniTwitter Example (an HTable split into Tablet1 and Tablet2)

Write Path in HBase
• HLog: an append-only write-ahead log (WAL) on HDFS, one per RegionServer

Read Path in HBase

Deletion and Compaction in HBase
• Delete() marks the record for deletion: a new "tombstone" record is written for that value
• BigTable vs. HBase terminology: minor compaction -- flush; merging compaction -- minor compaction; major compaction -- major compaction

Announcements
• Lab 1 due; Lab 2 released (team up); project team-up; Quiz 1 graded

Data Distribution and Serving -- Big Picture

Placement of Tablets and Data Serving
• A tablet is assigned to one tablet server at a time
• Metadata for tablet locations and start/end rows are stored in a special Bigtable cell
• The master maintains the set of live tablet servers and the current assignment of tablets to tablet servers (including the unassigned ones)

RegionServer and DataNode; Interacting with HBase

HBase Schema Design
• How many column families should the table have?
• What data goes into what column family?
• How many columns should be in each column family?
• What should the column names be? Although column names don't have to be defined at table creation, you need to know them when you write or read data.
• What information should go into the cells?
• How many versions should be stored for each cell?
• What should the row-key structure be, and what should it contain?

MiniTwitter Review
• Read operations: Whom does TheFakeMT follow? Does TheFakeMT follow TheRealMT? Who follows TheFakeMT? Does TheRealMT follow TheFakeMT?
• Write operations: a user follows someone; a user unfollows someone

MiniTwitter Schema Versions
• Version 1, Version 2: add a read operation -- how many people does a user follow? This requires an atomic operation.
• Version 3: get rid of the counter; problem: row access overhead
• Version 4: wide table vs. tall table; Version 4 client code
• Version 5: trick with hash codes (a row-key sketch follows the tables below)

Normalization vs. Denormalization (relational design for comparison)

Course
CourseID | CourseName | Hour | Description
CS292 | Special Topics on Big Data | 3 | large-scale data processing
CS283 | Computer Networks | 3 | Networking technology

ClassSchedule
ClassID | CourseID | Semester | InstructorID | Classroom | Time
2014CS292 | CS292 | S2014 | xuey1 | FGH134 | Tue/Th 1:10-2:25
2014CS283 | CS283 | S2014 | jmatt | FGH236 | Tue/Th 1:10-2:25

Registration
ClassID | StudentID | Grade
2014CS292 | balice1 | NULL

VandyUser
VUNetID | FirstName | LastName | Email
xuey1 | Yuan | Xue | Yuan.xue
balice1 | Alice | Burch | Alice.burch

ClassSchedule
eID | SectionID | Semester | InstructorID | Classroom | Time
2 | 01 | S2014 | xuey1 | FGH134 | Tue/Th 1:10-2:25
3 | 01 | S2014 | jmatt | FGH236 | Tue/Th 1:10-2:25
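One plausible reading of the "trick with hash codes" in MiniTwitter version 5 (an illustrative sketch only, not taken from the slides; the choice of MD5 and the key layout are assumptions) is to build the follows-table row key from fixed-length hashes of the two user IDs, so every row key has the same length while all of one follower's relations still share a common prefix:

    import java.security.MessageDigest;

    // Sketch: MiniTwitter "follows" row key built from hashes of the two users.
    // rowkey = md5(follower) + md5(followed): fixed length, and all of a
    // follower's relations stay contiguous because the follower hash is the prefix.
    public class FollowsRowKey {
        static byte[] md5(String s) throws Exception {
            return MessageDigest.getInstance("MD5").digest(s.getBytes("UTF-8"));
        }

        static byte[] rowKey(String follower, String followed) throws Exception {
            byte[] a = md5(follower);
            byte[] b = md5(followed);
            byte[] key = new byte[a.length + b.length];
            System.arraycopy(a, 0, key, 0, a.length);
            System.arraycopy(b, 0, key, a.length, b.length);
            return key;   // 32 bytes total
        }

        public static void main(String[] args) throws Exception {
            byte[] key = rowKey("TheRealMT", "TheFakeMT");
            System.out.println("row key length: " + key.length);
        }
    }

Under this layout, a scan that starts at the 16-byte md5(follower) prefix visits exactly that user's follow relations, which addresses the row-access-overhead problem noted for version 3.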