What is MongoDB?

What is NoSQL?

Definition: “Next generation databases mostly addressing some of the points: being non-relational, distributed, open source and horizontally scalable… schema-free, easy replication support, simple API, eventually consistent, huge amount of data…” - nosql-database.org

- Non-relational: data items are not stored as rows of attributes; there are no tables with a fixed number of columns and no relationships between them.
- Distributed: not all storage devices are attached to a common processing unit.
- Open source: available to everyone to copy, modify, and redistribute.
- Horizontally scalable: more nodes can be added to the system. Computer prices have dropped and performance has increased, so it is often more practical to run many low-cost computers than a single high-performance one.
- Schema-free: NoSQL databases allow data to be inserted without a predefined schema. This makes it easier to make significant application changes without interrupting service, which can mean faster development, more reliable code integration, and less database administration time.
- Replication support: multiple copies of the data are stored across the cluster, and even across data centers, to ensure high availability and support disaster recovery.
- Simple API: a simple application programming interface.
- Eventually consistent: if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. This is known as BASE semantics (Basically Available, Soft state, Eventually consistent).
- Huge amount of data: big data.

NoSQL database types

- Document databases pair each key with a complex data structure known as a document. Documents may contain many key-value pairs, key-array pairs, or even nested documents. (MongoDB, CouchDB)
- Graph stores are used to store information about networks, such as social connections. (Neo4J, HyperGraphDB)
- Key-value stores are the simplest NoSQL databases.
Every single item in the database is stored as an attribute name (or “key”), together with its value. (Riak, Voldemort, Redis)
- Wide-column stores are optimized for queries over large datasets, and store columns of data together, instead of rows. (Cassandra, HBase)

What is MongoDB?

MongoDB (from "humongous") is an open-source document database that provides high performance, high availability, and automatic scaling. - mongodb.org

Key features

High performance
MongoDB provides high performance data persistence. In particular:
- Support for embedded data models reduces I/O activity on the database system.
- Indexes support faster queries and can include keys from embedded documents and arrays.

High availability
To provide high availability, MongoDB’s replication facility, called replica sets, provides:
- Automatic failover
- Data redundancy

Automatic scaling
MongoDB provides horizontal scalability as part of its core functionality:
- Automatic sharding distributes data across a cluster of machines.
- Replica sets can provide eventually consistent reads for low-latency, high-throughput deployments.

Who uses MongoDB?

MongoDB is the most popular NoSQL database system according to the DB-Engines ranking (db-engines.com/en/ranking).

Running MongoDB

MongoDB requires a data folder to store its files; the default location is C:\data\db.
To start MongoDB from the Command Prompt, run "C:\mongodb\bin\mongod.exe". A "waiting for connections" message indicates that mongod.exe is running successfully.
To connect to MongoDB, open another Command Prompt and run "C:\mongodb\bin\mongo.exe". By default, the mongo.exe shell connects to the mongod.exe instance running on the localhost interface, port 27017.
You can also set up MongoDB as a Windows Service so that the database starts automatically after each reboot.

Documents and Collections

A document is the basic unit of data. Documents are stored on disk in the BSON (binary JSON) serialization format.
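As a rough illustration, a document can be pictured as a JavaScript object. The sketch below is plain JavaScript rather than a MongoDB call, and its field names (name, address, tags) are hypothetical, not taken from the slides. It shows an embedded document and an array inside one document, with a JSON round trip standing in for the BSON serialization that MongoDB performs.

```javascript
// A hypothetical document; every field name and value here is an illustrative assumption.
const person = {
  name: "Ada",
  address: { city: "London", zip: "N1" }, // embedded document
  tags: ["admin", "dev"]                  // array field
};

// JSON text stands in for BSON in this sketch; BSON itself is a binary encoding.
const serialized = JSON.stringify(person);
const restored = JSON.parse(serialized);

// Nested fields and array elements survive the round trip.
console.log(restored.address.city);
console.log(restored.tags.length);
```

Because the address and tags travel inside the document itself, reading the person requires no second lookup.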
The advantages of using documents are:
- Documents (i.e. objects) correspond to native data types in many programming languages.
- Embedded documents and arrays reduce the need for expensive joins.
- Dynamic schema supports fluent polymorphism.

A collection is a group of documents (roughly equivalent to a table in an RDBMS). A collection exists within a single database. MongoDB creates a collection upon its first use, so you do not need to create a collection before inserting data. Because MongoDB uses dynamic schemas, you also do not need to specify the structure of your documents before inserting them into the collection.

MongoDB features

Querying
MongoDB supports search by field, range queries, and regular expression searches. Queries can return specific fields of documents and can also include user-defined JavaScript functions.
The find() method returns a cursor to the results. In the mongo shell, if the returned cursor is not assigned to a variable, it is automatically iterated up to 20 times to print up to the first 20 documents that match the query. Typing "it" shows the next batch of results.
To display all results:
"var c = db.testData.find()"
"while ( c.hasNext() ) printjson( c.next() )"
"db.testData.find().limit(3)" limits the number of results to three.
"printjson( c [ 1 ] )" prints the second result, but be careful when using array indexes: all of the cursor's results are first loaded into RAM, so for very large result sets mongo may run out of memory.
"db.testData.find({x:3})" returns the documents where the x field has the value 3.

Projections
Queries in MongoDB return all fields in all matching documents by default. To limit the amount of data that MongoDB sends to applications, include a projection in the queries. By projecting results with a subset of fields, applications reduce their network overhead and processing requirements.

Indexing
Indexes provide high performance read operations for frequently used queries.
Without indexes, MongoDB must scan every document in a collection to select those documents that match the query statement. These collection scans are inefficient and require mongod to process a large volume of data for each operation.
Indexes are special data structures that store a small portion of the collection’s data set in an easy-to-traverse form. The index stores the value of a specific field or set of fields, ordered by the value of the field. Indexes in MongoDB are similar to indexes in other database systems. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.

Index types
MongoDB provides a number of different index types to support specific types of data and queries:
- default _id index
- single field index (user-defined)
- compound index (user-defined, on multiple fields)
- multikey index (array fields)
- geospatial index (coordinates)
- text index (for text search)
- hashed index (indexes the hash value of a field)

Replication
MongoDB provides high availability and increased throughput with replica sets. A replica set consists of two or more copies of the data. Each replica may act in the role of primary or secondary at any time. The primary performs all writes and, by default, all reads. Secondaries maintain a copy of the primary's data using built-in replication. When a primary fails, the replica set automatically conducts an election to determine which secondary should become the new primary. Secondaries can also serve read operations, but the data they return is eventually consistent by default.

Load balancing
MongoDB scales horizontally using sharding. The user chooses a shard key, which determines how the data in a collection will be distributed. The data is split into ranges (based on the shard key) and distributed across multiple shards. (A shard is a master with one or more slaves.)
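The range-based distribution described above can be pictured with a small sketch. This is plain JavaScript, not MongoDB's implementation: the range table, the shard names, and the "userId" shard key are all hypothetical, and the key values are assumed to be finite numbers.

```javascript
// Hypothetical range table: each shard owns a half-open range [min, max) of shard key values.
const chunks = [
  { min: -Infinity, max: 100, shard: "shardA" },
  { min: 100, max: 200, shard: "shardB" },
  { min: 200, max: Infinity, shard: "shardC" }
];

// Return the name of the shard whose range contains the given (finite, numeric) key.
function shardFor(key) {
  const chunk = chunks.find(c => key >= c.min && key < c.max);
  return chunk.shard;
}

// Group documents by shard, using the assumed "userId" field as the shard key.
function distribute(docs) {
  const placement = {};
  for (const doc of docs) {
    const shard = shardFor(doc.userId);
    (placement[shard] = placement[shard] || []).push(doc);
  }
  return placement;
}

const placement = distribute([
  { userId: 5 }, { userId: 150 }, { userId: 100 }, { userId: 999 }
]);
console.log(placement);
```

Since each range is half-open, a key that sits exactly on a boundary (100 here) belongs to exactly one shard.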
MongoDB can run over multiple servers, balancing the load and/or duplicating data to keep the system up and running in case of hardware failure. Automatic configuration is easy to deploy, and new machines can be added to a running database.

File storage
MongoDB can be used as a file system, taking advantage of its load balancing and data replication features to store files across multiple machines. This function, called GridFS, is included with the MongoDB drivers and is readily available from the supported development languages. MongoDB exposes functions for manipulating files and their content to developers. In a multi-machine MongoDB system, files can be distributed and copied multiple times between machines transparently, effectively creating a load-balanced and fault-tolerant system.

Aggregation
MapReduce can be used for batch processing of data and for aggregation operations. The aggregation framework enables users to obtain the kind of results for which the SQL GROUP BY clause is used.
In a map-reduce operation, MongoDB applies the map phase to each input document. The map function emits key-value pairs. For those keys that have multiple values, MongoDB applies the reduce phase, which collects and condenses the aggregated data. MongoDB then stores the results in a collection. Optionally, the output of the reduce function may pass through a finalize function to further condense or process the results of the aggregation.
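The map, reduce, and finalize phases can be imitated in memory with a short sketch. This is plain JavaScript, not MongoDB's own mapReduce API: the mapReduce helper, the orders data, and the custId and amount fields are hypothetical, and the map function receives the document and an emit callback as arguments, which is a simplification made for this example.

```javascript
// Hypothetical input documents; the field names are assumptions for illustration.
const orders = [
  { custId: "A", amount: 10 },
  { custId: "B", amount: 5 },
  { custId: "A", amount: 7 }
];

// Minimal in-memory imitation of the map, reduce, and finalize phases.
function mapReduce(docs, map, reduce, finalize) {
  const groups = new Map();
  // Map phase: each input document may emit key-value pairs.
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  // Reduce phase: only keys with multiple values are reduced.
  const results = [];
  for (const [key, values] of groups) {
    let value = values.length > 1 ? reduce(key, values) : values[0];
    // Optional finalize phase: post-process each reduced value.
    if (finalize) value = finalize(key, value);
    results.push({ _id: key, value });
  }
  return results;
}

// Total order amount per customer.
const totals = mapReduce(
  orders,
  (doc, emit) => emit(doc.custId, doc.amount),
  (key, values) => values.reduce((a, b) => a + b, 0),
  (key, total) => ({ total })
);
console.log(totals);
```

The per-customer totals correspond to what a SQL GROUP BY on custId with SUM(amount) would return.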
