NoSQL
continued
CMSC 461
Michael Wilson
MongoDB
MongoDB is another NoSQL solution
Provides a bit more structure than a solution like Accumulo
Data is stored as BSON (Binary JSON)
Binary-encoded JSON, extends JSON
Allows storage of large amounts of data
SQL vs. MongoDB
SQL has databases, tables, rows, columns
MongoDB has databases, collections, documents, fields
Both have primary keys, indexes
Collection structures are not heavily enforced
Inserts automatically create schemas
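To illustrate how loosely a collection's structure is enforced, here is a minimal sketch in plain JavaScript, with no MongoDB server involved. The array `students` stands in for a collection and its objects for documents; every name and value is hypothetical. The `_id` field plays the role of the primary key, as it does in MongoDB.

```javascript
// Hypothetical stand-in for a collection: a plain array of documents.
const students = [];

// Two documents in the same collection with different fields.
// A SQL table would reject the second row unless matching columns existed.
students.push({ _id: 1, name: "Ada", major: "CMSC" });
students.push({ _id: 2, name: "Grace", credits: 90, clubs: ["ACM"] });

// Collect the field names that actually occur in each document.
const fieldsPerDocument = students.map((doc) => Object.keys(doc).sort());

console.log(fieldsPerDocument);
```

Nothing declared the fields ahead of time; each document simply carries its own.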
Interacting with MongoDB
Multiple databases within MongoDB
Switch databases
use newDb
New databases will be stored after an insert
Create collection
db.createCollection("collectionName")
Not necessary, collections are implicitly created on insert
BSON
MongoDB uses BSON very heavily
Binary JSON
Like JSON with a binary serialization method
Has extensions so that it can represent data types that JSON cannot
Used to represent documents, provide input to queries
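One way to see why extra data types matter: plain JSON has no date type, so a date does not survive a round trip through JSON text. The sketch below uses only built-in JavaScript and does not touch BSON itself; BSON includes a dedicated date type (along with others such as binary data) to avoid this kind of loss. The document contents are made up for illustration.

```javascript
// A document containing a date value (hypothetical contents).
const original = { name: "exam", when: new Date(0) };

// Round-trip the document through plain JSON text.
const roundTripped = JSON.parse(JSON.stringify(original));

console.log(original.when instanceof Date); // true
console.log(typeof roundTripped.when);      // "string"
console.log(roundTripped.when);             // "1970-01-01T00:00:00.000Z"
```

After the round trip the date is just a string, and the reader has to know by convention that it should be parsed back into a date.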
Selects/queries
In MongoDB, querying typically consists of providing an appropriately crafted BSON document
SELECT * FROM collectionName
db.collectionName.find()
SELECT * FROM collectionName WHERE field = value
db.collectionName.find( {field: value} )
SELECT * FROM collectionName WHERE field > 5
db.collectionName.find( {field: {$gt: 5} } )
Other functions that take a query argument have queries that are formatted this way
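The three query shapes above can be modeled with a small in-memory matcher. This is a sketch in plain JavaScript, not MongoDB's implementation: the functions `matches` and `find` are hypothetical helpers, and only equality and `$gt` are supported.

```javascript
// Return true when `doc` satisfies `query`.
// Supports only equality ({field: value}) and {field: {$gt: n}}.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const isGtCondition =
      condition !== null && typeof condition === "object" && "$gt" in condition;
    if (isGtCondition) {
      return doc[field] > condition.$gt;
    }
    return doc[field] === condition;
  });
}

// Hypothetical stand-in for db.collectionName.find(query).
function find(collection, query = {}) {
  return collection.filter((doc) => matches(doc, query));
}

const collection = [
  { _id: 1, field: 3 },
  { _id: 2, field: 5 },
  { _id: 3, field: 8 },
];

console.log(find(collection));                        // all three documents
console.log(find(collection, { field: 5 }));          // only _id 2
console.log(find(collection, { field: { $gt: 5 } })); // only _id 3
```

An empty query matches everything, which is why `find()` with no argument behaves like `SELECT *` without a `WHERE` clause.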
Interacting with MongoDB
Insert
db.collectionName.insert( {documentBSON} )
Update
db.collectionName.update( {queryBSON}, {updateBSON}, {optionBSON} )
updateBSON
Set field to 5: {$set: {field: 5}}
Increment field by 1: {$inc: {field: 1}}
optionBSON
Options that determine whether or not to create new documents, update more than one document, write concerns
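The effect of `$set`, `$inc`, and the "update more than one document" option can be sketched in memory. This is plain JavaScript under stated assumptions, not MongoDB's implementation: `applyUpdate` and `updateCollection` are hypothetical helpers, the query supports equality only, and of the options only `multi` is modeled.

```javascript
// Apply an update document to one document, returning a modified copy.
// Supports only the $set and $inc operators.
function applyUpdate(doc, update) {
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    // Assumption of this sketch: a missing field counts as 0 before incrementing.
    const current = field in result ? result[field] : 0;
    result[field] = current + amount;
  }
  return result;
}

// Hypothetical stand-in for db.collectionName.update(query, update, options).
// Without {multi: true}, only the first matching document is updated.
function updateCollection(collection, query, update, options = {}) {
  let updatedCount = 0;
  return collection.map((doc) => {
    const isMatch = Object.entries(query).every(([f, v]) => doc[f] === v);
    const mayUpdate = options.multi || updatedCount === 0;
    if (isMatch && mayUpdate) {
      updatedCount += 1;
      return applyUpdate(doc, update);
    }
    return doc;
  });
}

const docs = [
  { _id: 1, kind: "a", field: 1 },
  { _id: 2, kind: "a", field: 2 },
];

const one = updateCollection(docs, { kind: "a" }, { $set: { field: 5 } });
const many = updateCollection(docs, { kind: "a" }, { $inc: { field: 1 } }, { multi: true });

console.log(one);  // first document has field 5, second is unchanged
console.log(many); // both documents incremented: fields 2 and 3
```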
Interacting with MongoDB
Delete
db.collectionName.remove( {queryBSON} )
Apache Hive
Also runs on Hadoop, uses HDFS as a data store
Queryable like SQL
Using an SQL-inspired language, HiveQL
Hive data organization
Databases
Tables
Partitions
Tables are broken down into partitions
Partition keys allow data to be stored into separate data files on HDFS
Can query on particular partitions
Buckets
Can bucket by column to sample data
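A rough picture of how partitions and buckets split up a table's data, sketched in plain JavaScript. Hive lays partitions out as `key=value` subdirectories under the table's directory; the table path, column names, and the hash function below are all hypothetical, and `toyHash` is explicitly not the hash Hive uses.

```javascript
// Build the directory a row would land in, given its partition key values.
function partitionPath(tableDir, partitionKeys, row) {
  const parts = partitionKeys.map((key) => `${key}=${row[key]}`);
  return [tableDir, ...parts].join("/");
}

// Toy string hash for illustration only (NOT Hive's actual hash function).
function toyHash(text) {
  let hash = 0;
  for (const ch of String(text)) {
    hash = (hash * 31 + ch.codePointAt(0)) % 1000003;
  }
  return hash;
}

// Assign a value of the bucketing column to one of numBuckets buckets.
function bucketFor(value, numBuckets) {
  return toyHash(value) % numBuckets;
}

const row = { user: "alice", country: "US", dt: "2015-04-01" };

console.log(partitionPath("/warehouse/visits", ["dt", "country"], row));
// /warehouse/visits/dt=2015-04-01/country=US

console.log(bucketFor(row.user, 4)); // some bucket number from 0 to 3
```

A query that filters on `dt` only has to read the matching directories, and reading a single bucket gives a sample of the table instead of the whole thing.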
Purpose of Hive
Provide analytics, query large volumes of data
NOT to be used for real time queries like Postgres or Oracle
Hive queries take forever
Partitions and buckets can help reduce this amount of time
Hive queries
Hive queries actually generate MapReduce jobs
MapReduce jobs take a while to set up and run
MapReduce jobs can be run manually, but for structured data and analytics, Hive can be used
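To give a feel for what a generated job does, here is a conceptual sketch of how a query such as `SELECT country, COUNT(*) FROM visits GROUP BY country` could be expressed as map, shuffle, and reduce phases. It is plain in-memory JavaScript, not the plan Hive actually produces, and the table and column names are hypothetical.

```javascript
// Map phase: emit one (key, 1) pair per input row.
function mapPhase(rows) {
  return rows.map((row) => [row.country, 1]);
}

// Shuffle phase: group the emitted values by key.
function shufflePhase(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce phase: sum the values collected for each key.
function reducePhase(groups) {
  const result = {};
  for (const [key, values] of groups) {
    result[key] = values.reduce((sum, v) => sum + v, 0);
  }
  return result;
}

const visits = [
  { user: "alice", country: "US" },
  { user: "bob", country: "DE" },
  { user: "carol", country: "US" },
];

const counts = reducePhase(shufflePhase(mapPhase(visits)));
console.log(counts); // { US: 2, DE: 1 }
```

Writing these phases by hand for every question is the manual route; HiveQL lets the same question be stated in one line.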