NoSQL continued
CMSC 461
Michael Wilson

MongoDB
  MongoDB is another NoSQL solution
  Provides a bit more structure than a solution like Accumulo
  Data is stored as BSON (Binary JSON)
    Binary encoded JSON, extends JSON
  Allows storage of large amounts of data

SQL vs. MongoDB
  SQL has databases, tables, rows, columns
  MongoDB has databases, collections, documents, fields
  Both have primary keys, indexes
  Collection structures are not enforced heavily
    Inserts automatically create schemas

Interacting with MongoDB
  Multiple databases within MongoDB
  Switch databases
    use newDb
    New databases will be stored after an insert
  Create collection
    db.createCollection("collectionName")
    Not necessary, collections are implicitly created on insert

BSON
  MongoDB uses BSON very heavily
  Binary JSON
    Like JSON with a binary serialization method
    Has extensions so that it can represent data types that JSON cannot
  Used to represent documents, provide input to queries

Selects/queries
  In MongoDB, querying typically consists of providing an appropriately crafted BSON document
  SELECT * FROM collectionName
    db.collectionName.find()
  SELECT * FROM collectionName WHERE field = value
    db.collectionName.find( {field: value} )
  SELECT * FROM collectionName WHERE field > 5
    db.collectionName.find( {field: {$gt: 5} } )
  Other functions that take a query argument have queries that are formatted this way

Interacting with MongoDB
  Insert
    db.collectionName.insert( {queryBSON} )
    Here the BSON argument is the document to store, not a filter
  Update
    db.collectionName.update( {queryBSON}, {updateBSON}, {optionBSON} )
    updateBSON
      Set field to 5: {$set: {field: 5}}
      Increment field by 1: {$inc: {field: 1}}
    optionBSON
      Options that determine whether or not to create new documents, update more than one document, write concerns

Interacting with MongoDB
  Delete
    db.collectionName.remove( {queryBSON} )

Apache Hive
  Also runs on Hadoop, uses HDFS as a data store
  Queryable like SQL
    Using an SQL-inspired language, HiveQL

Hive data organization
  Databases
  Tables
  Partitions
    Tables are broken down into partitions
    Partition keys allow data to be stored into separate data files on HDFS
    Can query on particular partitions
  Buckets
    Can bucket by column to sample data

Purpose of Hive
  Provide analytics, query large volumes of data
  NOT to be used for real time queries like Postgres or Oracle
  Hive queries take a long time to run
    Partitions and buckets can help reduce this amount of time

Hive queries
  Hive queries actually generate MapReduce jobs
  MapReduce jobs take a while to set up and run
  MapReduce jobs can be run manually, but for structured data and analytics, Hive can be used instead
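
Worked example: MongoDB query and update documents
  Returning to the MongoDB slides: the query and update documents are plain data that the server interprets. The sketch below is a toy in-memory model of that idea in plain JavaScript, so it runs without a MongoDB server. It is not MongoDB's implementation. The helper names (matches, find, update) and the students data are hypothetical, and only equality, $gt, $lt, $set and $inc are handled.

```javascript
// Toy in-memory model of MongoDB-style query and update documents.
// NOT MongoDB's implementation; all names here are illustrative assumptions.

// True when `doc` satisfies every condition in the query document.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const isOperatorDoc =
      condition !== null && typeof condition === "object" && !Array.isArray(condition);
    if (!isOperatorDoc) {
      return doc[field] === condition; // {field: value} means equality
    }
    // {field: {$gt: 5}} style: every operator must hold
    return Object.entries(condition).every(([op, operand]) => {
      switch (op) {
        case "$gt": return doc[field] > operand;
        case "$lt": return doc[field] < operand;
        default: throw new Error(`unsupported operator in this sketch: ${op}`);
      }
    });
  });
}

// Analogue of db.collectionName.find(query); an empty query matches everything.
function find(collection, query = {}) {
  return collection.filter((doc) => matches(doc, query));
}

// Analogue of db.collectionName.update(query, updateDoc) with no options:
// only the first matching document is changed. Returns the number modified.
function update(collection, query, updateDoc) {
  const target = collection.find((doc) => matches(doc, query));
  if (target === undefined) return 0;
  for (const [field, value] of Object.entries(updateDoc.$set ?? {})) {
    target[field] = value; // $set overwrites or creates the field
  }
  for (const [field, amount] of Object.entries(updateDoc.$inc ?? {})) {
    target[field] = (target[field] ?? 0) + amount; // $inc treats a missing field as 0
  }
  return 1;
}

// Hypothetical sample collection
const students = [
  { name: "Ann", credits: 3 },
  { name: "Bob", credits: 9 },
  { name: "Cy", credits: 12 },
];

const everyone = find(students);                       // like find()
const bob = find(students, { name: "Bob" });           // like find( {name: "Bob"} )
const heavy = find(students, { credits: { $gt: 5 } }); // like find( {credits: {$gt: 5}} )
const changed = update(
  students,
  { name: "Ann" },
  { $set: { year: 2 }, $inc: { credits: 1 } }
);

console.log(everyone.length, bob, heavy.map((s) => s.name), changed, students[0]);
```

  In a real deployment the same query and update documents would be passed unchanged to db.collectionName.find(...) and db.collectionName.update(...); the server does the matching rather than application code.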
