What is NoSQL?
Definition: “Next generation databases mostly addressing some of the points: being non-relational,
distributed, open source and horizontally scalable… schema-free, easy replication support, simple API,
eventually consistent, huge amount of data…”
- nosql-database.org
Non-relational: data items are not stored as rows of attributes; there are no tables with a fixed number
of columns and no fixed relationships between them.
Distributed: not all storage devices are attached to a common processing unit.
Open source: available to everyone to copy, modify, redistribute.
Horizontally scalable: more nodes can be added to the system. Because computer prices have dropped and
performance has increased, it is often more convenient to run many low-cost computers than a single
high-performance one.
Schema-free: NoSQL databases are built to allow the insertion of data without a predefined schema.
That makes it easy to make significant application changes in real time without worrying about service
interruption, which means development is faster, code integration is more reliable, and less database
administration time is needed.
Replication support: storing multiple copies of data across the cluster, and even across data centers, to
ensure high availability and support disaster recovery.
Simple API: simple application programming interface.
Eventually consistent: if no new updates are made to a given data item, eventually all accesses to that
item will return the last updated value. This is known as BASE semantics (Basically Available, Soft state,
Eventual consistency).
Huge amount of data: designed for big data workloads.
NoSQL database types
Document databases pair each key with a complex data structure known as a document. Documents
may contain many key-value pairs, or key-array pairs, or even nested documents. (MongoDB, CouchDB)
Graph stores are used to store information about networks, such as social connections. (Neo4j,
HyperGraphDB)
Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an
attribute name (or “key”), together with its value. (Riak, Voldemort, Redis)
Wide-column stores are optimized for queries over large datasets, and store columns of data together,
instead of rows. (Cassandra, HBase)
What is MongoDB?
MongoDB (from "humongous") is an open-source document database that provides high performance,
high availability, and automatic scaling.
- mongodb.org
Key features
High performance
MongoDB provides high-performance data persistence. In particular:
- Support for embedded data models reduces I/O activity on the database system.
- Indexes support faster queries and can include keys from embedded documents and arrays.
High availability
To provide high availability, MongoDB’s replication facility, called replica sets, provides:
- Automatic failover
- Data redundancy
Automatic scaling
MongoDB provides horizontal scalability as part of its core functionality.
- Automatic sharding distributes data across a cluster of machines.
- Replica sets can provide eventually consistent reads for low-latency, high-throughput
deployments.
Who uses MongoDB?
MongoDB is the most popular NoSQL database system according to the DB-Engines ranking
(db-engines.com/en/ranking).
Running MongoDB
MongoDB requires a data folder to store its files; the default location is C:\data\db.
To start MongoDB from the Command Prompt, run "C:\mongodb\bin\mongod.exe". A "waiting for connections"
message indicates that mongod.exe is running successfully.
To connect to MongoDB, open another Command Prompt and run
"C:\mongodb\bin\mongo.exe". The mongo.exe shell connects to the mongod.exe instance running on the
localhost interface, port 27017, by default. You can also set up MongoDB as a Windows Service so that
the database starts automatically after each reboot.
Documents and Collections
A document is the basic unit of data. Documents are stored on disk in BSON (binary JSON) serialization
format. The advantages of using documents are:
- Documents (i.e. objects) correspond to native data types in many programming languages.
- Embedded documents and arrays reduce the need for expensive joins.
- Dynamic schema supports fluent polymorphism.
A collection is a group of documents (equivalent to a table in an RDBMS). A collection exists within a
single database. MongoDB creates a collection upon its first use, so you do not need to create a
collection before inserting data. Because MongoDB uses dynamic schemas, you also do not need to specify
the structure of your documents before inserting them into the collection.
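As a rough illustration of documents, embedded data, and dynamic schemas, the following plain JavaScript sketch models a collection as an array of objects. It does not talk to MongoDB: the testData array and all field names are hypothetical, and JSON.stringify only stands in for BSON, which is a binary format.

```javascript
// Plain JavaScript sketch (no MongoDB required); all names are hypothetical.
// A "collection" is modeled as an array of documents.
const testData = [];

// A document is an ordinary object: key-value pairs, arrays, nested documents.
testData.push({
  name: "Ada",
  languages: ["en", "fr"],        // key-array pair
  address: { city: "London" }     // embedded document, so no join is needed
});

// Dynamic schema: a document with a different shape goes into the same collection.
testData.push({ name: "Bob", age: 42 });

// JSON text stands in for BSON here; real BSON is a binary encoding.
const serialized = JSON.stringify(testData[0]);
console.log(serialized);
console.log(testData.map(doc => Object.keys(doc)));
```

In the mongo shell, inserting such an object with "db.testData.insert( { name: "Ada" } )" would create the testData collection on first use.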
MongoDB features


Querying
MongoDB supports searches by field, range queries, and regular expression searches. Queries can
return specific fields of documents and can also include user-defined JavaScript functions. The find()
method returns a cursor to the results. In the mongo shell, if the returned cursor is not assigned to a
variable, it is automatically iterated up to 20 times to print the first 20 documents that match the
query. Typing "it" then shows the next batch of results.
To display all results:
"var c = db.testData.find()"
"while ( c.hasNext() ) printjson( c.next() )"
"db.testData.find().limit(3)" limits the number of results to three.
"printjson( c[1] )" prints the second result, but be careful with array indexes on a cursor: the shell
first loads all of the cursor's results into RAM, so for very large result sets mongo may run out of memory.
"db.testData.find({x:3})" returns the documents whose x field has the value 3.
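The cursor behaviour described above can be sketched in plain JavaScript without a running server. The find and matches functions below are hypothetical stand-ins that only handle equality on top-level fields; they are not MongoDB's implementation.

```javascript
// Plain JavaScript sketch of query matching and cursor iteration.
// find() and matches() are hypothetical helpers, not MongoDB functions.
const testData = [];
for (let i = 1; i <= 25; i++) testData.push({ _id: i, x: i % 5 });

// True when every field named in the query equals the document's field.
function matches(doc, query) {
  return Object.keys(query).every(key => doc[key] === query[key]);
}

// A minimal cursor: results are produced one at a time, on demand.
function find(collection, query = {}) {
  let position = 0;
  let remaining = Infinity;
  const cursor = {
    limit(n) { remaining = n; return cursor; },
    hasNext() {
      // Skip forward to the next matching document, if any.
      while (position < collection.length && !matches(collection[position], query)) {
        position++;
      }
      return remaining > 0 && position < collection.length;
    },
    next() {
      if (!cursor.hasNext()) throw new Error("cursor exhausted");
      remaining--;
      return collection[position++];
    }
  };
  return cursor;
}

// Same loop shape as the shell example above.
const c = find(testData, { x: 3 });
const found = [];
while (c.hasNext()) found.push(c.next());
console.log(found.map(doc => doc._id));

// limit(3) stops the cursor after three results.
const limited = find(testData, { x: 3 }).limit(3);
let count = 0;
while (limited.hasNext()) { limited.next(); count++; }
console.log(count);
```

Because the cursor hands out one document at a time, the loop never needs the whole result set in memory, which is the reason the text warns against indexing into a cursor.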
Projections
Queries in MongoDB return all fields in all matching documents by default. To limit the amount
of data that MongoDB sends to applications, include a projection in the queries. By projecting
results with a subset of fields, applications reduce their network overhead and processing
requirements.
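To make the saving concrete, here is a minimal plain JavaScript sketch that compares the serialized size of full documents with a projected subset. The project helper and the sample fields are hypothetical, and JSON length is only a rough proxy for bytes sent over the network.

```javascript
// Plain JavaScript sketch of a projection; project() is a hypothetical helper.
function project(doc, fields) {
  const out = {};
  for (const key of fields) {
    if (key in doc) out[key] = doc[key];   // copy only the requested fields
  }
  return out;
}

const docs = [
  { _id: 1, name: "Ada", email: "ada@example.com", bio: "x".repeat(1000) },
  { _id: 2, name: "Bob", email: "bob@example.com", bio: "y".repeat(1000) }
];

const projectedDocs = docs.map(doc => project(doc, ["name", "email"]));

// Compare how much text each variant would occupy when serialized.
const fullSize = JSON.stringify(docs).length;
const projectedSize = JSON.stringify(projectedDocs).length;
console.log(fullSize, projectedSize);
```

In the mongo shell, a projection is passed as the second argument to find(), for example "db.testData.find( {x:3}, {x:1, _id:0} )".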
Indexing
Indexes provide high performance read operations for frequently used queries. Without
indexes, MongoDB must scan every document in a collection to select those documents
that match the query statement. These collection scans are inefficient and require the
mongod to process a large volume of data for each operation. Indexes are special data
structures that store a small portion of the collection’s data set in an easy to traverse form. The
index stores the value of a specific field or set of fields, ordered by the value of the field. Indexes
in MongoDB are similar to indexes in other database systems. If an appropriate index exists for a
query, MongoDB can use the index to limit the number of documents it must inspect.
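The difference between a collection scan and an index lookup can be sketched as follows. This is a simplified model under stated assumptions: a sorted array with binary search stands in for the tree structure a real database index would use, and scan, buildIndex, and lookup are hypothetical names.

```javascript
// Plain JavaScript sketch: collection scan versus lookup in an ordered index.
const collection = [];
for (let i = 0; i < 1000; i++) collection.push({ _id: i, x: (i * 7) % 1000 });

// Collection scan: every document is inspected.
function scan(coll, value) {
  let inspected = 0;
  const hits = [];
  for (const doc of coll) {
    inspected++;
    if (doc.x === value) hits.push(doc);
  }
  return { hits, inspected };
}

// The index stores [field value, position] pairs ordered by field value.
function buildIndex(coll, field) {
  return coll.map((doc, pos) => [doc[field], pos]).sort((a, b) => a[0] - b[0]);
}

// Binary search for the first entry >= value, then collect equal entries.
function lookup(coll, index, value) {
  let lo = 0, hi = index.length, steps = 0;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    steps++;
    if (index[mid][0] < value) lo = mid + 1; else hi = mid;
  }
  const hits = [];
  while (lo < index.length && index[lo][0] === value) {
    hits.push(coll[index[lo][1]]);
    lo++;
  }
  return { hits, steps };
}

const xIndex = buildIndex(collection, "x");
const scanned = scan(collection, 3);
const indexed = lookup(collection, xIndex, 3);
console.log(scanned.inspected, indexed.steps, indexed.hits[0]._id);
```

Both calls find the same document, but the scan touches all 1000 documents while the index lookup needs only about ten comparisons.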



Index types
MongoDB provides a number of different index types to support specific types of data and
queries: default _id, single field (user-defined), compound indexes (user-defined on
multiple fields), multikey index (array field), geospatial index (coordinates), text indexes
(for text search), and hashed indexes (which index the hash value of a field).
Replication
MongoDB provides high availability and increased throughput with replica sets. A replica set
consists of two or more copies of the data. Each replica may act in the role of primary or
secondary replica at any time. The primary replica performs all writes and reads by default.
Secondary replicas maintain a copy of the data on the primary using built-in replication. When a
primary replica fails, the replica set automatically conducts an election process to determine
which secondary should become the primary. Secondaries can also perform read operations, but
the data is eventually consistent by default.
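The following plain JavaScript sketch illustrates why reads from a secondary are only eventually consistent. It rests on two loudly stated assumptions: secondaries catch up by replaying an ordered log of the primary's writes, and the election simply promotes the most up-to-date secondary. Neither is claimed to be MongoDB's actual protocol, and all names are hypothetical.

```javascript
// Plain JavaScript sketch of a replica set; every name here is hypothetical.
// Assumption: secondaries copy writes by replaying an ordered log.
const log = [];
const makeNode = name => ({ name, data: {}, applied: 0 });

let primary = makeNode("A");
let secondaries = [makeNode("B"), makeNode("C")];

// All writes go to the primary and are appended to the log.
function write(key, value) {
  log.push({ key, value });
  primary.data[key] = value;
  primary.applied = log.length;
}

// A secondary catches up by applying the log entries it has not seen yet.
function sync(node) {
  while (node.applied < log.length) {
    const { key, value } = log[node.applied];
    node.data[key] = value;
    node.applied++;
  }
}

// Simplified election (an assumption): promote the most up-to-date secondary.
function failover() {
  secondaries.sort((a, b) => b.applied - a.applied);
  primary = secondaries.shift();
}

write("x", 1);
sync(secondaries[0]);                      // B now has x = 1
write("x", 2);
const staleRead = secondaries[0].data.x;   // B has not applied the second write yet
sync(secondaries[0]);
const freshRead = secondaries[0].data.x;   // B has caught up

failover();                                // the primary "fails" and B takes over
console.log(staleRead, freshRead, primary.name);
```

The stale read is the window in which a secondary lags behind the primary; once replication catches up, the secondary returns the latest value.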
Load balancing
MongoDB scales horizontally using sharding. The user chooses a shard key, which determines
how the data in a collection will be distributed. The data is split into ranges (based on the shard
key) and distributed across multiple shards. (A shard is a master with one or more slaves.)
MongoDB can run over multiple servers, balancing the load and/or duplicating data to keep the
system up and running in case of hardware failure. Automatic configuration is easy to deploy,
and new machines can be added to a running database.
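Range-based distribution by shard key can be sketched in a few lines of plain JavaScript. The range boundaries, shard names, and the userId shard key below are all hypothetical; a real cluster chooses and rebalances its ranges itself.

```javascript
// Plain JavaScript sketch of range-based sharding; all names are hypothetical.
// Each shard owns a half-open range [min, max) of shard key values.
const ranges = [
  { min: -Infinity, max: 100, shard: "shard0" },
  { min: 100, max: 200, shard: "shard1" },
  { min: 200, max: Infinity, shard: "shard2" }
];

// Route a shard key value to the shard whose range contains it.
function shardFor(keyValue) {
  const range = ranges.find(r => keyValue >= r.min && keyValue < r.max);
  return range.shard;
}

const docs = [{ userId: 5 }, { userId: 100 }, { userId: 150 }, { userId: 999 }];

// Group documents by the shard they would be stored on.
const placement = {};
for (const doc of docs) {
  const shard = shardFor(doc.userId);
  (placement[shard] = placement[shard] || []).push(doc.userId);
}
console.log(placement);
```

A query that includes the shard key can be routed to a single shard in the same way, which is why the choice of shard key matters for load balancing.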
File storage
MongoDB can be used as a file system, taking advantage of its load balancing and data replication
features to store files across multiple machines.
This function, called GridFS, is included with the MongoDB drivers and is readily available from the
development languages those drivers support. MongoDB exposes functions for manipulating files and
their content to developers. In a multi-machine MongoDB system, files can be distributed and copied multiple
times between machines transparently, effectively creating a load-balanced and fault-tolerant system.

Aggregation
MapReduce can be used for batch processing of data and aggregation operations. The
aggregation framework enables users to obtain the kind of results for which the SQL GROUP BY
clause is used.
In a map-reduce operation, MongoDB applies the map phase to each input document. The
map function emits key-value pairs. For those keys that have multiple values, MongoDB applies
the reduce phase, which collects and condenses the aggregated data. MongoDB then stores the
results in a collection. Optionally, the output of the reduce function may pass through
a finalize function to further condense or process the results of the aggregation.
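The map, reduce, and finalize phases can be sketched in plain JavaScript as below. The mapReduce helper and the sample orders are hypothetical. In MongoDB itself the map function reads the document from "this" and calls a global emit(), whereas this sketch passes both as arguments to stay self-contained.

```javascript
// Plain JavaScript sketch of the map-reduce flow; mapReduce() is a hypothetical helper.
function mapReduce(docs, map, reduce, finalize) {
  // Map phase: each document may emit key-value pairs, grouped here by key.
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map(doc, emit);

  const results = [];
  for (const [key, values] of groups) {
    // Reduce phase: applied only to keys that have multiple values.
    let value = values.length > 1 ? reduce(key, values) : values[0];
    // Optional finalize phase: post-process the reduced value.
    if (finalize) value = finalize(key, value);
    results.push({ _id: key, value });
  }
  return results;   // stands in for the output collection
}

const orders = [
  { cust: "A", amount: 10 },
  { cust: "B", amount: 5 },
  { cust: "A", amount: 30 }
];

// Total amount per customer, similar in spirit to SQL GROUP BY with SUM.
const totals = mapReduce(
  orders,
  (doc, emit) => emit(doc.cust, doc.amount),
  (key, values) => values.reduce((sum, v) => sum + v, 0),
  (key, total) => ({ total, large: total > 20 })
);
console.log(JSON.stringify(totals));
```

Customer "B" has a single emitted value, so the reduce function is skipped for that key and the value goes straight to finalize.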