NoSQL, continued
CMSC 461
Michael Wilson
MongoDB
- MongoDB is another NoSQL solution
  - Provides a bit more structure than a solution like Accumulo
- Data is stored as BSON (Binary JSON)
  - Binary-encoded JSON that extends JSON
  - Allows storage of large amounts of data
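SQL vs. MongoDB
- SQL has databases, tables, rows, columns
- MongoDB has databases, collections, documents, fields
  - A table corresponds to a collection, a row to a document, a column to a field
- Both have primary keys (in MongoDB, the _id field) and indexes
- Collection structures are not enforced heavily
  - Inserts automatically create schemas: no schema has to be declared before documents are added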
Interacting with MongoDB
- Multiple databases within MongoDB
- Switch databases
  - use newDb
  - A new database is only stored once something is inserted into it
- Create a collection
  - db.createCollection("collectionName")
  - Not necessary: collections are implicitly created on insert
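To make the implicit-creation behavior concrete, here is a small Python model. It is not MongoDB code: ToyServer and its methods are hypothetical names that only mimic the shell commands above (use, db.createCollection(), inserts), keeping each database in memory as a dict of collections.

```python
class ToyServer:
    """Hypothetical in-memory stand-in for a MongoDB server (illustration only)."""

    def __init__(self):
        # database name -> {collection name -> list of documents}
        self.databases = {}
        # the mongo shell starts out using a database named "test"
        self.current = "test"

    def use(self, name):
        # Like `use newDb`: only changes the current database, creates nothing
        self.current = name

    def create_collection(self, name):
        # Like db.createCollection(): explicit creation, which also creates the database
        self.databases.setdefault(self.current, {}).setdefault(name, [])

    def insert(self, collection, doc):
        # The first insert implicitly creates both the database and the collection
        db = self.databases.setdefault(self.current, {})
        db.setdefault(collection, []).append(doc)

    def show_dbs(self):
        # Only databases materialized by an insert or createCollection are listed
        return sorted(self.databases)


server = ToyServer()
server.use("newDb")
print(server.show_dbs())  # []
server.insert("people", {"name": "Ada"})
print(server.show_dbs())  # ['newDb']
```

Switching with use is cheap precisely because it creates nothing; storage appears only when the first document (or an explicit collection) arrives.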
BSON
- MongoDB uses BSON very heavily
  - Binary JSON
  - Like JSON, but with a binary serialization format
  - Has extensions so that it can represent data types that JSON cannot (e.g., dates, raw binary data, distinct 32-bit and 64-bit integers)
- Used to represent documents and to provide input to queries
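The byte layout can be seen with a minimal encoder. This sketch follows the BSON specification for a handful of element types only (double, string, embedded document, boolean, int32, int64); encode_document is a hypothetical helper, and real drivers (e.g. the bson package shipped with PyMongo) handle many more types. Note that int32 and int64 get different type tags, a distinction plain JSON numbers cannot express.

```python
import struct


def _cstring(s):
    # BSON "cstring": UTF-8 bytes followed by a NUL terminator
    return s.encode("utf-8") + b"\x00"


def _element(name, value):
    """Encode one (name, value) pair as type byte + name + payload."""
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return b"\x08" + _cstring(name) + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + _cstring(name) + struct.pack("<i", value)  # int32
        return b"\x12" + _cstring(name) + struct.pack("<q", value)      # int64
    if isinstance(value, float):
        return b"\x01" + _cstring(name) + struct.pack("<d", value)      # double
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # string payload: int32 byte length (including the NUL), then the bytes
        return b"\x02" + _cstring(name) + struct.pack("<i", len(data)) + data
    if isinstance(value, dict):
        return b"\x03" + _cstring(name) + encode_document(value)       # embedded doc
    raise TypeError(f"type not handled in this sketch: {type(value).__name__}")


def encode_document(doc):
    """Serialize a dict as a BSON document (little-endian, length-prefixed)."""
    body = b"".join(_element(k, v) for k, v in doc.items())
    # total length = 4-byte length prefix + elements + trailing NUL
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


print(encode_document({"hello": "world"}).hex(" "))
# 16 00 00 00 02 68 65 6c 6c 6f 00 06 00 00 00 77 6f 72 6c 64 00 00
```

The first four bytes hold the total document length (0x16 = 22), so a reader can skip over a whole document without parsing its fields.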
Selects/queries
- In MongoDB, querying typically consists of providing an appropriately crafted BSON document
- SELECT * FROM collectionName
  - db.collectionName.find()
- SELECT * FROM collectionName WHERE field = value
  - db.collectionName.find( {field: value} )
- SELECT * FROM collectionName WHERE field > 5
  - db.collectionName.find( {field: {$gt: 5} } )
- Other functions that take a query argument have queries that are formatted this way
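The SQL-to-MongoDB query pairs can be mimicked over an in-memory list of dicts. This is a hedged sketch: matches and find are hypothetical helpers, only $gt, $gte, $lt and $lte are modelled, and MongoDB details such as null/missing-field handling, array matching and cross-type ordering are ignored. As in MongoDB, several fields in one query document are combined with AND.

```python
import operator

# Hypothetical subset of MongoDB's comparison operators, for illustration only
_OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}
_MISSING = object()  # sentinel for "field not present in the document"


def matches(doc, query):
    """True if `doc` satisfies every field condition in `query` (implicit AND)."""
    for field, condition in query.items():
        value = doc.get(field, _MISSING)
        if isinstance(condition, dict) and condition and all(
            key.startswith("$") for key in condition
        ):
            # Operator form, e.g. {"field": {"$gt": 5}}
            if value is _MISSING:
                return False
            if not all(_OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Equality form, e.g. {"field": 5}
            return False
    return True


def find(collection, query=None):
    """Rough analogue of db.collectionName.find(query) over a list of dicts."""
    query = query or {}
    return [doc for doc in collection if matches(doc, query)]


docs = [{"_id": 1, "field": 3}, {"_id": 2, "field": 7}, {"_id": 3, "other": 9}]
print(len(find(docs)))                                        # 3
print([d["_id"] for d in find(docs, {"field": 7})])           # [2]
print([d["_id"] for d in find(docs, {"field": {"$gt": 5}})])  # [2]
```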
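Interacting with MongoDB
- Insert
  - db.collectionName.insert( {documentBSON} )
  - The argument is the document to store, not a query
- Update
  - db.collectionName.update( {queryBSON}, {updateBSON}, {optionBSON} )
  - updateBSON
    - Set field to 5: {$set: {field: 5}}
    - Increment field by 1: {$inc: {field: 1}}
  - optionBSON
    - Options that determine whether or not to create a new document when nothing matches (upsert), update more than one document (multi), and the write concern (writeConcern)
- Newer shells deprecate insert() and update() in favor of insertOne()/insertMany() and updateOne()/updateMany()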
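A toy model of the update operators and options, assuming equality-only matching. apply_update and update are hypothetical names; the upsert and multi flags mirror the option names of MongoDB's update(), but this sketch supports only $set and $inc and does not generate an _id for upserted documents.

```python
import copy


def apply_update(doc, update_doc):
    """Return a copy of `doc` with $set / $inc operators applied."""
    new_doc = copy.deepcopy(doc)
    for op, fields in update_doc.items():
        for field, value in fields.items():
            if op == "$set":
                new_doc[field] = value
            elif op == "$inc":
                # A missing field is created with the increment as its value
                new_doc[field] = new_doc.get(field, 0) + value
            else:
                raise ValueError(f"operator not modelled in this sketch: {op}")
    return new_doc


def update(collection, query, update_doc, upsert=False, multi=False):
    """Toy analogue of db.collectionName.update(); edits `collection` in place.

    Matching is equality-only; returns the number of documents written.
    """
    hits = [i for i, doc in enumerate(collection)
            if all(doc.get(k) == v for k, v in query.items())]
    if not hits:
        if upsert:
            # Upsert: seed a new document from the query's fields, then apply the update
            collection.append(apply_update(dict(query), update_doc))
            return 1
        return 0
    if not multi:
        hits = hits[:1]  # by default only the first matching document changes
    for i in hits:
        collection[i] = apply_update(collection[i], update_doc)
    return len(hits)


counters = [{"name": "a", "n": 1}, {"name": "a", "n": 10}]
update(counters, {"name": "a"}, {"$inc": {"n": 1}})
print(counters)  # [{'name': 'a', 'n': 2}, {'name': 'a', 'n': 10}]
```

Without multi, only the first match changes, which is why the second counter keeps its value of 10.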
Interacting with MongoDB
- Delete
  - db.collectionName.remove( {queryBSON} )
  - Newer shells prefer deleteOne() / deleteMany()
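Deletion reuses the same query format. The hypothetical remove below (a toy with equality-only matching) shows one consequence worth remembering: an empty query document matches every document, so passing {} removes everything in the collection.

```python
def remove(collection, query):
    """Toy analogue of db.collectionName.remove(query); returns the number removed.

    Matching is equality-only; an empty query matches every document.
    """
    keep = [doc for doc in collection
            if not all(doc.get(k) == v for k, v in query.items())]
    removed = len(collection) - len(keep)
    collection[:] = keep  # mutate in place so callers see the change
    return removed


docs = [{"type": "tmp"}, {"type": "keep"}, {"type": "tmp"}]
print(remove(docs, {"type": "tmp"}))  # 2
print(remove(docs, {}))               # 1
print(docs)                           # []
```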
Apache Hive
- Also runs on Hadoop, uses HDFS as a data store
- Queryable like SQL
  - Using an SQL-inspired language, HiveQL
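Hive data organization
- Databases
- Tables
- Partitions
  - Tables are broken down into partitions
  - Partition keys allow data to be stored in separate directories (and therefore separate data files) on HDFS
  - Can query on particular partitions, so only the matching data is read
- Buckets
  - Can bucket by column (rows are spread over a fixed number of files by hashing that column) to sample data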
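Partitioning and bucketing can be pictured as a file layout. In this sketch the table directory /user/hive/warehouse/sales.db/orders, the dt partition column and the user bucketing column are hypothetical; /user/hive/warehouse is Hive's default warehouse location, but it is configurable. The bucket hash is a stand-in (CRC32), not the function Hive actually uses; the point is only that rows with equal bucketing values always land in the same bucket, and that a filter on the partition key lets whole directories be skipped.

```python
import zlib
from collections import defaultdict


def partition_dir(table_dir, partition_values):
    """Partition directory for a row, e.g. <table_dir>/dt=2024-01-01/country=US."""
    return "/".join([table_dir] + [f"{k}={v}" for k, v in partition_values.items()])


def bucket_for(value, num_buckets):
    # Stand-in hash: Hive uses its own hash function, not CRC32
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def lay_out(rows, table_dir, partition_keys, bucket_col, num_buckets):
    """Group rows into (partition directory, bucket number) 'files'."""
    files = defaultdict(list)
    for row in rows:
        pdir = partition_dir(table_dir, {k: row[k] for k in partition_keys})
        files[(pdir, bucket_for(row[bucket_col], num_buckets))].append(row)
    return dict(files)


def prune(files, partition_filter):
    """Partition pruning: keep only files under directories matching the filter."""
    wanted = {f"{k}={v}" for k, v in partition_filter.items()}
    return {key: rows for key, rows in files.items()
            if wanted <= set(key[0].split("/"))}


def sample_bucket(files, bucket):
    """Roughly what sampling one bucket reads: every file with that bucket number."""
    return [row for (_, b), rows in files.items() if b == bucket for row in rows]


rows = [
    {"dt": "2024-01-01", "user": "u1"},
    {"dt": "2024-01-01", "user": "u2"},
    {"dt": "2024-01-02", "user": "u3"},
]
files = lay_out(rows, "/user/hive/warehouse/sales.db/orders", ["dt"], "user", 4)
print([r["user"] for rs in prune(files, {"dt": "2024-01-02"}).values() for r in rs])  # ['u3']
```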
Purpose of Hive
- Provide analytics, query large volumes of data
- NOT to be used for real-time queries like Postgres or Oracle
  - Hive queries take a long time to run
  - Partitions and buckets can help reduce this amount of time
Hive queries
- Hive queries actually generate MapReduce jobs
  - MapReduce jobs take a while to set up and run
- MapReduce jobs can be run manually, but for structured data and analytics, Hive can be used instead
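The translation from query to MapReduce job can be sketched for a simple aggregation such as SELECT dt, COUNT(*) FROM orders GROUP BY dt (table and column names hypothetical). This is a conceptual model in plain Python, not Hive's actual compiled plan, which may add further stages or partial aggregation in the mappers; it only shows the map, shuffle and reduce phases that such a job goes through.

```python
from collections import defaultdict
from itertools import chain


def mapper(split, key_col):
    # Map: emit (group key, 1) for each row in this input split
    for row in split:
        yield row[key_col], 1


def shuffle(pairs):
    # Shuffle/sort: the framework groups every emitted value by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(groups):
    # Reduce: sum the 1s for each key to get the count
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(splits, key_col):
    """Conceptual plan for: SELECT key_col, COUNT(*) FROM t GROUP BY key_col."""
    mapped = chain.from_iterable(mapper(s, key_col) for s in splits)
    return reducer(shuffle(mapped))


splits = [[{"dt": "d1"}, {"dt": "d2"}], [{"dt": "d1"}]]  # input for two mappers
print(group_by_count(splits, "dt"))  # {'d1': 2, 'd2': 1}
```

On a cluster each phase runs as distributed tasks, which is where the setup cost mentioned above comes from.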