Document1 (page 1 of 24)

Contents

Annotated glossary
  Android.content.SharedPreferences - BSON
  cache-hit - collection.find
  consistent hashing - cooperating caches
  Data Definition Language (DDL) - findOne()
  Data Manipulation Language (DML) - Database query
  Database Management System - DOM events
  findOne() - JSON
  Collections - Mobile and embedded DBMS
  Mongod - multicast and directory schemes
  Network congestion and server swamping - Persistence for client platforms
  The Plaxton/Rajaraman algorithm - replication strategy
  Shard - Sharding
  shared caching machine - SQLite
  swamped - views of which machines are available
References without annotation
  [Alvermann 2011] - [Copeland 2013]
  [Dede 2013] - [Imran m.fl.2003+]
  [Karger 1997] - [Nori 2007]
  [Padmanabhan 2008] - [Yimeng 2012]

Notes

TODO: Revisit the MongoDB documentation. Tie the concept of a page to documents in MongoDB, and promote describing the database as using a semistructured data model rather than calling it schemaless.

Page
…A DBMS stores vast quantities of data, and the data must persist across program executions. Therefore, data is stored on external storage devices such as disks or tapes, and fetched into main memory as needed for processing. The unit of information read from or written to a disk is a page. The size of a page is a DBMS parameter, and typical values are 4KB or 8KB. … [Ramakrisnan & Gehrke 2003] p274

I think that in a key-value store like MongoDB the page size is increased to megabytes. Further, the page size and the concept of a document are similar. In a key-value store, collections of documents can be hashed across a distributed system. Also, when a document is accessed, the whole document is read. If a document can be compared to a page, then the technology can be explained. A collection is then a directory of documents, and the structure of a document itself follows the BSON format [Bson xxxx].
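The analogy in the note above can be made concrete with a small sketch. It is an illustration only, not how MongoDB is implemented: the class `ToyCollection`, the method `node_for` and the node count are hypothetical names chosen for this example. A collection is modelled as a directory that hashes each document key to a node, and a read always returns the whole document, much as a DBMS reads a whole page.

```python
import hashlib
import json


class ToyCollection:
    """A collection as a directory of whole documents spread over nodes."""

    def __init__(self, node_count):
        # One dict per hypothetical storage node: key -> serialized document.
        self.nodes = [{} for _ in range(node_count)]

    def node_for(self, key):
        # A stable hash, so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def insert(self, key, document):
        # The document is stored as one unit, like a page on disk.
        self.nodes[self.node_for(key)][key] = json.dumps(document)

    def find_one(self, key):
        # Reading fetches the whole document, never a single attribute.
        raw = self.nodes[self.node_for(key)].get(key)
        return None if raw is None else json.loads(raw)


users = ToyCollection(node_count=3)
users.insert("u1", {"name": "Ada", "tags": ["admin", "dev"]})
users.insert("u2", {"name": "Bo"})  # a different structure, same collection
```

Note that the two documents have different attributes; nothing in the collection declares them in advance.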
Although there is no conceptual model, there is still a format to conform to. This is called schemaless data definition. BSON and JSON [ecma 2013] are both semistructured data models [Garcia-Molina m.fl 2012]: the data model is described by the data itself. Attribute-value pairs are distributed throughout a document, and a search query names the attributes to include in its result set. So instead of calling the database schemaless, I think it is more important to say that it is semistructured with limited indexing capabilities.

[Bson xxxx] Homepage BSON, http://www.bsonspec.org. Creative Commons. No copyright. URL (accessed 2015-10-21): http://www.bsonspec.org

[ecma 2013] ECMA-404. The JSON Data Interchange Format. 1st Edition, October 2013. ECMA International. PDF URL (accessed 2015-10-21): http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf
Homepage JSON. Introducing JSON. ECMA-404 The JSON Data Interchange Standard. URL (accessed 2015-10-21): http://json.org/

[Garcia-Molina m.fl 2012] Hector Garcia-Molina, Jeffrey D. Ullman and Jennifer Widom. Database Systems. Selected Chapters. Compiled by Marcos Vaz Salles. University of Copenhagen. Pearson, 2014. ISBN 978-1-78399-319-2.

[Ramakrisnan & Gehrke 2003] Raghu Ramakrishnan, Johannes Gehrke. Database Management Systems. Third Edition. McGraw-Hill, 2003.

Annotated glossary

A glossary with cited text. As a learning tool, text snippets are copied and referenced.

Android.content.SharedPreferences - BSON

Android.content.SharedPreferences
… A developer can use the SDK to retrieve, add, delete and modify preferences either related to the current activity or shared across multiple activities for the same user session. For both variants, a public interface is available, android.content.SharedPreferences, that contains two other nested classes, one as an editor for the current key-value pairs and another which acts as a listener for changing preferences and related callbacks.
… [Fotache 2013]

android.os.Environment
… Almost the same file and directory related functions are used, to which dedicated locations as Music, Podcasts, Ringtones, Alarms, Pictures and others are predefined by the system, since API level 8 (all contained by the public class Environment - android.os.Environment). … [Fotache 2013]

API
Application programming interface. An application programming interface is a collection of library functions in the host language that implements access to an application. For example, JDBC is a database connectivity API for Java; comparable drivers or APIs exist for other programming languages (C#, JavaScript or PHP) as libraries or class hierarchies.

Balancer
…The balancer (page 28) is a background process that manages chunk migrations. The balancer can run from any of the query routers in a cluster. When the distribution of a sharded collection in a cluster is uneven, the balancer process migrates chunks from the shard that has the largest number of chunks to the shard with the least number of chunks until the collection balances. For example: if collection users has 100 chunks on shard 1 and 50 chunks on shard 2, the balancer will migrate chunks from shard 1 to shard 2 until the collection achieves balance. … [MongoDB 2015D]

BSON
… [bee · sahn], short for Binary JSON, is a binary-encoded serialization of JSON-like documents. Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays. BSON also contains extensions that allow representation of data types that are not part of the JSON spec. For example, BSON has a Date type and a BinData type. BSON can be compared to binary interchange formats, like Protocol Buffers. BSON is more "schema-less" than Protocol Buffers, which can give it an advantage in flexibility but also a slight disadvantage in space efficiency (BSON has overhead for field names within the serialized data).
BSON was designed to have the following three characteristics:
1. Lightweight. Keeping spatial overhead to a minimum is important for any data representation format, especially when used over the network.
2. Traversable. BSON is designed to be traversed easily. This is a vital property in its role as the primary data representation for MongoDB.
3. Efficient. Encoding data to BSON and decoding from BSON can be performed very quickly in most languages due to the use of C data types.
… [Bson xxxx]

cache-hit - collection.find

cache
…A cache retains a copy of any page it obtains for some time. … [Karger 1997]

cache-hit
A cache hit is a request that the cache can handle itself. No lookup is made on the primary server (the URL in the request).

cache-miss
A cache miss is a cache request for an entry that is stale, has been replaced by another request, or has not yet been requested. In all these cases the request must be looked up on, or forwarded to, the primary server (the URL in the request).

Minimize cache memory requirements
…The first is to minimize cache memory requirements. A protocol should work well without requiring any cache to store a large number of pages … [Karger 1997]

Cache Resolver
A system of caches that uses hashing to determine the primary cache for each request.
…In this paper we suggest an approach that does away with all inter-cache communication, yet allows caches to behave together in one coherent system. Cache Resolver, the distributed web caching system that we developed, eliminates inter-cache communication on a miss by letting clients decide for themselves which cache has the required data. Instead of contacting a primary cache that, on a miss, locates the desired resource in another cache, a user's browser directly contacts the one cache that should contain the required resource. Browsers make their decision with help of a hash function that maps resources (or URLs) to a dynamically changing set of available caches. … [Karger m.fl.
2015]

Caching
…Caching has been employed to improve the efficiency and reliability of data delivery over the Internet. A nearby cache can serve a (cached) page quickly even if the originating server is swamped or the network path to it is congested. While this argument provides the self-interested user with the motivation to exploit caches, it is worth noting that widespread use of caches also engenders a general good: if requests are intercepted by nearby caches, then fewer go to the source server, reducing load on the server and network traffic to the benefit of all users. … [Karger m.fl. 2015]

candidate key
…A key is the minimal set of attributes whose values uniquely identify an entity in the set. There could be more than one candidate key; if so we designate one of them as the primary key… [Ramakrisnan & Gehrke 2003]

Cassandra
Cassandra is a globally distributed database management system that also implements a limited relational model. Cassandra is highly scalable. Cassandra is durable and handles multiple server failures on a daily basis. [Lakshman & Malik 2010]

Client-side Sharded SQL Server and MongoDB
…For our experiments, we created a SQL Server implementation (SQL-CS) that uses client-side hashing to determine the home node/shard for each record by modifying the client-side application that runs the YCSB benchmark. We implemented this client-side sharding so that we could compare MongoDB(-AS) with SQL Server in a cluster environment. We also took the client-side sharding code and implemented it on top of MongoDB. This implementation of client-side sharding on MongoDB is denoted as MongoDB-CS, allowing us to compare MongoDB-AS with MongoDB-CS (and SQL-CS). We note that both SQL-CS and Mongo-CS do not support some of the features that are supported by Mongo-AS.
First, whereas Mongo-AS uses a form of range partitioning to distribute the records across the shards, the Mongo-CS and SQL-CS implementations both use hash partitioning. Another difference is that the Mongo-CS implementation does not use any of the routing (mongos), configuration (config db), and balancer processes that are part of Mongo-AS. As a result, load balancing cannot happen automatically as in Mongo-AS, where the auto-sharding mechanism aims to continually balance the load across all the nodes in the cluster. However, Mongo-CS makes use of the basic "mongod" process, which is responsible for processing the client's requests. Finally, Mongo-CS and SQL-CS do not support automatic failover. We note that these features listed above were not the key subject of performance testing in the benchmark (YCSB) that we use in this paper. On the flip side, we also note that SQL Server has many features that are not supported in MongoDB. For example, MongoDB has a flexible data model that makes it far easier to deal with schema changes. MongoDB also supports read/write atomic operations on single data entities, whereas SQL Server provides full ACID semantics and multiple isolation levels. SQL Server also has better manageability and performance analysis tools (e.g. database tuning advisor). … [Floratou 2012]

clustered index
…When a file is organized so that the ordering of data records is the same as or close to the ordering of data entries in some index, we say that the index is clustered; otherwise the index is an unclustered index… [Ramakrisnan & Gehrke 2003]

collection.find()
…An important functionality is the usage of cursors by assigning the result of a collection.find() call to a DBCursor object. Sequential operations can be performed using the obtained cursor object.
In addition to the operations enumerated above, some administrative actions can be performed as creating indexes, dropping them and obtaining a list for the collections in the current database. … [Fotache 2013]

consistent hashing - cooperating caches

consistent hashing
…We describe a family of caching protocols for distributed networks that can be used to decrease or eliminate the occurrence of hot spots in the network … [Karger 1997]

Consistent hashing
… consistent hashing assigns a set of items to buckets so that each bin receives roughly the same number of items. Unlike standard hashing schemes, a small change in the bucket set does not induce a total remapping of items to buckets. In addition, hashing items into slightly different sets of buckets gives only slightly different assignments of items to buckets. We apply consistent hashing to our tree-of-caches scheme, and show how this makes the scheme work well even if each client is aware of only a constant fraction of all the caching machines. In [5] Litwin et al proposes a hash function that allows buckets to be added one at a time sequentially. However our hash function allows the buckets to be added in an arbitrary order. Another scheme that we can improve on is given by Devine [2]. In addition, we believe that consistent hashing will be useful in other applications (such as quorum systems [7] [8] or distributed name servers) where multiple machines with different views of the network must agree on a common storage location for an object without communication.
[2] Robert Devine. Design and Implementation of DDH: A Distributed Dynamic Hashing Algorithm. In Proceedings of 4th International Conference on Foundations of Data Organizations and Algorithms, 1993.
[5] Witold Litwin, Marie-Anne Neimat and Donovan A. Schneider. LH* - A Scalable, Distributed Data Structure. ACM Transactions on Database Systems, Dec. 1996.
[7] M. Naor and A. Wool. The load, capacity, and availability of quorum systems.
In Proceedings of the 35th IEEE Symposium on Foundations of Computer Science, pages 214-225, November 1994.
[8] D. Peleg and A. Wool. The availability of quorum systems. Information and Computation 123(2):210-233, 1995.
[9] Greg Plaxton and Rajmohan Rajaraman. Fast Fault-Tolerant Concurrent Access to Shared Objects. In Proceedings of 37th IEEE Symposium on Foundations of Computer Science, 1996.
… [Karger 1997]

Config servers
…Config servers store the cluster's metadata. This data contains a mapping of the cluster's data set to the shards. The query router uses this metadata to target operations to specific shards. Production sharded clusters have exactly 3 config servers. … [MongoDB 2015D]

cooperating caches
…To achieve fault tolerance, scalability, and aggregation of larger numbers of requests (which improves hit rates) several groups [2,7,5,4] have proposed using systems of several cooperating caches. These systems all share certain common properties. Every client selects one primary cache in the system. A request from the client goes to its primary cache. If the primary cache misses, instead of going directly to the content server, it tries to locate the requested resource in other cooperating caches. If it succeeds, a slow fetch of the resource from the content server is replaced by a fetch of the resource from a (presumably closer) cooperating cache. Thus, the other cooperating caches serve as a "second level cache" to reduce the cost of misses in the primary cache. … [Karger m.fl. 2015]

Data Definition Language (DDL) - findOne()

Data Definition Language (DDL)
[Fotache 2013]. Definition language. Creation of tables, views, foreign keys and indexes.

Data layer in mobile applications
[Fotache 2013].

Data Manipulation Language (DML) - Database query

Data Manipulation Language (DML)
[Fotache 2013]. Query language against the database: SQL (SELECT, INSERT, UPDATE, DELETE).

Data warehouse
…One popular approach to information integration is the creation of data warehouses, where information from many legacy databases is copied periodically, with the appropriate translation, to a central database. … [Garcia-Molina m.fl 2012]

Database
A database is a collection of data that exists over a long period of time. [Garcia-Molina m.fl 2012]

Database system
A database is a collection of data, kept over a long period of time, that is managed by a database system. [Garcia-Molina m.fl 2012]
A (relational) database system has the following properties:
- Users can create new databases and specify their schemas (the logical structure of the data) using a specialized data-definition language (DDL). (relational)
- Users must be able to query the data and modify it using a language called a 'query language' or 'data manipulation language' (DML). (relational)
- It must be able to store large amounts of data (terabytes) over a long period of time.
- The system must be able to come back up after a range of failures, or after it has been taken down (durability).
- It must be able to handle queries and modifications from many users at the same time, so that each change runs as if its user had the whole system to themselves (isolation). This has to do with permissions and transactions.
- Atomicity means that a transaction runs as a unit and can be either rolled back or committed. (relational)
… [Garcia-Molina m.fl 2012]

Database query
[Fotache 2013]. Query against the database: SQL (SELECT, INSERT, UPDATE, DELETE).

database schema

Database Management System - DOM events

Database Management System
[TODO]

Database service in the cloud
… Some limitations might appear when connecting to a database service in the cloud. A very important functionality covered by the Java MongoDB driver is the ability to access the aggregation framework (see section 6). The wrapper around this functionality provided by the driver is DBCollection.aggregate().
Generic DBObject instances can be used by the programmer to create the pipeline of operations needed in aggregation processes. Finally, an AggregationOutput object can be obtained after calling the aggregate method over a certain collection. … [Fotache 2013]

minimize the delay
…A second objective is, naturally, to minimize the delay a browser experiences in obtaining a page. … [Karger 1997]

DOM manipulation methods
… JavaScript provides a large API for dealing with these DOM structures, in terms of both parsing and manipulating the document. This is one of the primary ways to accomplish the smaller, piece-by-piece changes to a web page that we see in an AJAX application [4] … in [Eernisse 2006]

DOM events
… The other important function of the DOM is that it provides a standard means for JavaScript to attach events to elements on a web page. This makes possible much richer user interfaces, because it allows you to give users opportunities to interact with the web page beyond simple links and form elements. A great example of this is drag-and-drop functionality, which lets users drag pieces of the page around the screen, and drop them into place to trigger specific pieces of functionality. This kind of feature used to exist only in desktop applications, but now it works just as well in the browser, thanks to the DOM. … [Eernisse 2006]

findOne() - JSON

findOne()
…The driver also allows the developer to use authentication by username and password. After the connection sequence is performed, collections (all, or a specific one) can be retrieved, documents can be inserted, simple find operations can be performed by calling findOne() method. … [Fotache 2013]

Harvest Cache
…Chankhunthod et al. [1] developed the Harvest Cache, a more scalable approach using a tree of caches. A user obtains a page by asking a nearby leaf cache. If neither this cache nor its siblings have the page, the request is forwarded to the cache's parent.
If a page is stored by no cache in the tree, the request eventually reaches the root and is forwarded to the home site of the page. A cache retains a copy of any page it obtains for some time. The advantage of a cache tree is that a cache receives page requests only from its children (and siblings), ensuring that not too many requests arrive simultaneously. Thus, many requests for a page in a short period of time will only cause one request to the home server of the page, and won't overload the caches either. A disadvantage, at least in theory, is that the same tree is used for all pages, meaning that the root receives at least one request for every distinct page requested of the entire cache tree. This can swamp the root if the number of distinct page requests grows too large, meaning that this scheme also suffers from potential scaling problems.
[1] Anawat Chankhunthod, Peter Danzig, Chuck Neerdaels, Michael Schwartz and Kurt Worrell. A Hierarchical Internet Object Cache. In USENIX Proceedings, 1996.
… [Karger 1997]

Hash based partitioning
…For hash based partitioning, MongoDB computes a hash of a field's value, and then uses these hashes to create chunks. With hash based partitioning, two documents with "close" shard key values are unlikely to be part of the same chunk. This ensures a more random distribution of a collection in the cluster … [MongoDB 2015D]

Hive
…Hive [3] is an open-source data warehouse built on top of Hadoop [2]. It provides a structured data model for data that is stored in the Hadoop Distributed Filesystem (HDFS), and a SQL-like declarative query language called HiveQL. Hive converts HiveQL queries to a directed acyclic graph of MapReduce jobs, and thus saves the user from having to write the more complex MapReduce jobs directly. Data organization in Hive is similar to that found in relational databases. Starting from a coarser granularity, data is stored in databases, tables, partitions and buckets.
More details about the data layout in Hive are provided in Section 3.3.2. Finally, Hive has support for multiple data storage formats including text files, sequence files, and RCFiles [17]. Users can also create custom storage formats as well as serializers/deserializers, and plug them into the system.
[2] Hadoop. http://hadoop.apache.org/
[3] Hive. http://hive.apache.org/
[17] Y. He, R. Lee, Y. Huai, Z. Shao, N. Jain, X. Zhang, and Z. Xu. RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems. In ICDE, pages 1199-1208, 2011.
… [Floratou 2012]

home server of the page
The page's own server. The caches are consulted before the page's server is accessed.

Hotspot
A hotspot on the net occurs when a large number of requests go to the same system. If a large number of clients access a website at the same time, a hotspot arises. To handle a growing number of requests, the requests can be distributed over a number of servers.

The "hot spot problem"
…The "hot spot problem" is to satisfy all browser page requests while ensuring that with high probability no cache or server is swamped. The phrase "with high probability" means "with probability at least 1-1/N", where N is a confidence parameter used throughout the paper. … [Karger 1997]

The inbox search problem
… Inbox Search is a feature that enables users to search through their Facebook Inbox. At Facebook this meant the system was required to handle a very high write throughput, billions of writes per day, and also scale with the number of users. Since users are served from data centers that are geographically distributed, being able to replicate data across data centers was key to keep search latencies down. Inbox Search was launched in June of 2008 for around 100 million users and today we are at over 250 million users and Cassandra has kept up the promise so far.
… [Lakshman & Malik 2010]

Information integration
…Joining the information contained in many related databases into a whole… [Garcia-Molina m.fl 2012]

IP Multicast
…Malpani et al. [6] work around this problem by making a group of caches function as one. A user's request for a page is directed to an arbitrary cache. If the page is stored there, it is returned to the user. Otherwise, the cache forwards the request to all other caches via a special protocol called "IP Multicast". If the page is cached nowhere, the request is forwarded to the home site of the page. The disadvantage of this technique is that as the number of participating caches grows, even with the use of multicast, the number of messages between caches can become unmanageable.
[6] Radhika Malpani, Jacob Lorch and David Berger. Making World Wide Web Caching Servers Cooperate. In Proceedings of World Wide Web Conference, 1996.
… [Karger 1997]

JSON
…JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.
JSON is built on two structures:
- A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
- An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
These are universal data structures. Virtually all modern programming languages support them in one form or another.
It makes sense that a data format that is interchangeable with programming languages also be based on these structures. … [ecma 2013]

Collections - Mobile and embedded DBMS

key
…A key is the minimal set of attributes whose values uniquely identify an entity in the set. There could be more than one candidate key; if so we designate one of them as the primary key… [Ramakrisnan & Gehrke 2003]

Collections
Collections contain documents. A collection is a stack of documents.

Layers of the enterprise Mobile Applications Development Framework
[Unhelkar 2010] in [Fotache 2013]. The security layer, which conceptually covers all the other layers, is the sixth layer; it does not appear on a drawing with five layers.
Mobile technology stack: presentation layer (app); application (application server); middleware binding; information/database; communications/TCP.
Compared with web technology:
- Limited resources on a mobile device.
- May require replication of data to the client persistence layer.
- May require synchronization of data between the client and server persistence layers.

"load" property
…Similarly, over all the client views, the number of distinct objects assigned to a particular cache is small. We call this property "load". … [Karger 1997]

Locality (caching)
Locality. Regardless of caching scheme, user latency is greatly influenced by the proximity of the cache servers. Our system ensures that users are always served by the caches in their physically local regions. Our caches are split among geographical regions and the users are served only from the caches in their region. We place the knowledge of determining the user's geographical region inside the JavaScript function in the users' browsers. The JavaScript function is customizable: when users download it, they are given a choice of regions. The virtual names generated by the JavaScript function take on the following form: A456.ProxyCache3.com, where '456' is the hash of the URL and '3' represents the variable geographical region.
We then split our DNS system into a two-layer hierarchy. The top layer DNS servers resolve for the part of the name that contains geographical information and direct the user's DNS resolver to a set of bottom layer DNS servers that correspond to the user's geographical region. The bottom layer DNS servers are placed physically in a specific geographical region of caches and they resolve virtual names in terms of IP addresses of only those caches. [Karger m.fl. 2015]

MANET
A Mobile Ad-hoc Network (MANET) is a collection of autonomous wireless nodes that may move unpredictably, forming a temporary network without any fixed backbone infrastructure ([Gruenwald, 03], [IETF, 03]) in [Padmanabhan 2008].

Mobile and embedded DBMS
11 characteristics by [Nori 2007] in [Fotache 2013]:
a. Embeddable in applications, sometimes requiring no administration.
b. Small footprint, in order to be downloadable in broader range of mobile devices.
c. Run on mobile devices and operates in conditions of small amount of processor power, RAM and permanent memory.
d. Componentized DBMS: supports from all DBMS functions only the ones required by the specific application.
e. Self-managed DBMS, with no hope for the user to be a DBA who can be able to restore a crashed database.
f. In-memory DBMS, requiring special query processing and indexing techniques which are optimized for main memory usage.
g. Portable databases with very simple deployment.
h. No code in the database that protects against viruses and malware.
i. Synchronize with back-end data sources.
j. Remote management, especially in the case of enterprise-wide applications.
k. Custom programming interfaces for specialized data-centric applications.

Mongod - multicast and directory schemes

Mongod
mongod is the MongoDB daemon. MongoDB is classified as a document store []. MongoDB is schemaless: one does not define a schema in the database sense. One defines a collection.
Collections are named either implicitly, when the first piece of information is inserted [Alvermann 2011], or explicitly, by creating the collection. Afterwards, the collection is part of the object name whose methods are used to access the document. The data format of a document is BSON [Alvermann 2011] [BSON xxxx], that is, binary JSON. JSON is a semistructured data model. The model is self-describing, and data is given as key-value pairs. Using BSON does not change the logical content of a document, which can still be shown as readable JSON text: it is not the document that is binary, only its representation. MongoDB is programmed in C++ ([Alvermann 2011], citing [Cho10] and [Bson, here [BSON xxxx]], says C). In MongoDB every document automatically gets a primary key [Alvermann 2011]. The primary key can be customized [Alvermann 2011]. {} denotes objects and [..] denotes arrays [ecma 2013].

MongoDB
MongoDB is an example of a NoSQL datastore [Fotache 2013]. Another good name for a NoSQL datastore is key-value store. Here data is stored clustered together in a document which, in the case of MongoDB, can be distributed horizontally across MongoDB nodes (using a hash key, the sharding key) [Alvermann 2011; Chodorow 2011]. In this way MongoDB implements a distributed database.

MongoDB as persistence layer in the cloud
…In order to better illustrate the above features, a popular service for MongoDB database in the cloud will be used as persistence layer for a demonstrative Android application. The service is provided by MongoLab and it offers physical storage in different data centers like Amazon, Joyent or Microsoft Azure. At the time of writing of this paper, the available storage options for shared plans are: 0.5, 1, 2 and 4 GB. The corresponding prices are between 10 to 40 USD each month (except the 0.5 GB instance which is free). The customer can also buy dedicated plans (details and prices vary from one storage provider to another).
For Amazon hosted MongoDB databases in the cloud, available RAM resources start from 1.7 GB and go up to 68 GB, available processor cores are between 1 and 8 and the user can choose one or more dedicated nodes. Default storage for each unit type varies from 40 to 160 GB and can be easily extended or moved to SSD disks. Just as a plan example, for a 34 GB of RAM, 2 dedicated nodes, 4 processor cores, 80 GB storage capacity (on SSD disks) costs about $3000 each month. Of course, the storage capacity doesn't come cheap, but includes additional services as MongoDB monitoring service activated, real-time access to created log files, 24/7 DBA assistance from 10Gen (creator of MongoDB), replica sets and dedicated virtual machines. … [Fotache 2013]

MongoDB
… Mongo is a high performance and very scalable document-oriented database developed in C++ that stores data in BSON format, a dynamic schema document structured like JSON… [Truica et.al. 2013]

MongoDB
… MongoDB [7] is a popular open-source NoSQL database. Some of its features are a document-oriented storage layer, indexing in the form of B-trees, auto-sharding and asynchronous replication of data between servers. In MongoDB data is stored in collections and each collection contains documents. Collections and documents are loosely analogous to tables and records, respectively, found in relational databases. Each document is serialized using BSON. MongoDB does not require a rigid schema for the documents. Specifically, documents in the same collection can have different structures.
Another important feature of MongoDB is its support for auto-sharding. With sharding, data is partitioned amongst multiple nodes in an order-preserving manner. Sharding is similar to the horizontal partitioning technique that is used in parallel database systems. This feature enables horizontal scaling across multiple nodes.
When some nodes contain a disproportionate amount of data compared to the other nodes in the cluster, MongoDB redistributes the data automatically so that the load is equally distributed across the nodes/shards. Finally, MongoDB supports failover via replica sets, which is its mechanism for implementing asynchronous master/slave replication. A replica set consists of two or more nodes that are copies of each other. More information about the semantics of replica sets can be found in [8]. In the following sections, we use the name Mongo-AS (MongoDB with auto-sharding) when referring to the original MongoDB implementation. [7] MongoDB. http://www.mongodb.org/ [8] MongoDB – Replica Sets. http://www.mongodb.org/display/DOCS/Replica+Sets … [Floratou m.fl. 2012]

MongoDB driver is thread safe. …As stated in the official documentation, the java MongoDB driver is thread safe and it is recommended to create a single object which can be accessed by multiple threads. Internally, this object creates a pool of connections, and for each operation it is able to find an available connection, use it and release the resources after it is done. The developer can enforce a certain consistent behaviour (usage of the same socket by the client) by calling the two specially designed functions, db.requestStart() and db.requestDone(). …[Fotache 2013].

Mongos. A mongos is a process that runs in front of your cluster and presents itself as a single mongod server to everyone who connects, which makes the cluster implementation transparent. (Transparent here means that a client connects to a process called mongos that looks like one server, while it actually fronts a cluster server, that is, a server that directs other servers.) Under the surface it routes requests to the respective servers and afterwards gathers the response messages. [Chodorow 2011]

Mongos instances. See Query Routers.
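The routing role described in the Mongos entry above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not MongoDB's implementation: `ToyShard`, `ToyRouter`, `split_points` and the `user_id` shard key are hypothetical names invented for the illustration, shards are plain in-memory lists, and range-based routing on a numeric key is assumed.

```python
from bisect import bisect_right


class ToyShard:
    """Hypothetical stand-in for one shard holding a subset of the documents."""

    def __init__(self, name):
        self.name = name
        self.docs = []

    def insert(self, doc):
        self.docs.append(doc)

    def find(self, predicate):
        return [d for d in self.docs if predicate(d)]


class ToyRouter:
    """Hypothetical stand-in for a mongos: clients use it as if it were one server."""

    def __init__(self, shards, split_points, shard_key):
        # split_points[i] is the first key value owned by shards[i + 1].
        assert len(split_points) == len(shards) - 1
        self.shards = shards
        self.split_points = split_points
        self.shard_key = shard_key

    def _shard_for(self, key_value):
        # Range lookup: count how many split points are <= key_value.
        return self.shards[bisect_right(self.split_points, key_value)]

    def insert(self, doc):
        self._shard_for(doc[self.shard_key]).insert(doc)

    def find_by_key(self, key_value):
        # Targeted query: the shard key identifies the single shard to ask.
        shard = self._shard_for(key_value)
        return shard.find(lambda d: d[self.shard_key] == key_value)

    def find(self, predicate):
        # Scatter-gather: ask every shard, then merge the partial results.
        results = []
        for shard in self.shards:
            results.extend(shard.find(predicate))
        return results


shards = [ToyShard("shard0"), ToyShard("shard1"), ToyShard("shard2")]
router = ToyRouter(shards, split_points=[100, 200], shard_key="user_id")
for uid in (5, 150, 250, 99, 100):
    router.insert({"user_id": uid, "even": uid % 2 == 0})

print([len(s.docs) for s in shards])
print(router.find_by_key(150))
print(sorted(d["user_id"] for d in router.find(lambda d: d["even"])))
```

A query on the shard key touches one shard only, while a query on any other field has to be sent to all shards and merged, which is why the choice of shard key matters for the request load.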
multicast and directory schemes. …An important issue in many caching systems is how to decide what is cached where at any given time. Solutions have included multicast queries and directory schemes. … Consistent hashing provides an alternative to multicast and directory schemes, and has several other advantages in load balancing and fault tolerance. … [Karger m.fl. 2015]

Network congestion and server swamping - Persistence for client platforms

Network congestion and server swamping. …Two main causes of these delays and failures are congested networks and swamped servers. Data travels slowly through congested networks. Swamped servers (facing more simultaneous requests than their resources can support) will either refuse to serve certain requests or will serve them very slowly. Network congestion and server swamping are common because network and server infrastructure expansions have not kept pace with the tremendous growth in Internet use. … [Karger m.fl. 2015]

NoSQL. Term used for non-relational databases [Wei-ping 2011]. It presumably means that SQL cannot be used to access the database. In this context the database is what is called a 'persistent datastore', the place where data is stored permanently. MongoDB is regarded as a NoSQL database [Fotache 2013].

NoSQL datastore. MongoDB is an example of a NoSQL datastore [Fotache 2013]. Another good name for a NoSQL datastore is key-value store. Data is stored 'clustered' together in a document, which in MongoDB's case can be distributed horizontally across MongoDB nodes (using a hash key, the sharding key) [Alvermann 2011; Chodorow 2011]. In this way MongoDB implements a distributed database.

Not only SQL. ..Recently, databases known as NoSQL systems have been increasingly used for data not well suited for relational databases [Lawrence 2014], for example streaming and social media data.

Page. …A DBMS stores vast quantities of data, and the data must persist across program executions.
Therefore, data is stored on external storage devices such as disks or tapes, and fetched into main memory as needed for processing. The unit of information read from or written to a disk is a page. The size of a page is a DBMS parameter, and typical values are 4 KB or 8 KB. ... …[Ramakrisnan & Gehrke 2003] p274

I think that in a key-value store like MongoDB the page size is increased to the megabyte range. Further, the page size and the concept of a document are similar. In a key-value store, collections of documents can be hashed across a distributed system. Further, when a document is accessed, the whole document is read.

…The basic abstraction of data in a DBMS is a collection of records, or a file, and each file consists of one or more pages. The files and access methods software layer organizes data carefully to support fast access to desired subsets of records. Understanding how these records are organized is essential to using a database system efficiently …[Ramakrisnan & Gehrke 2003] p273.

...each node in this figure is a physical page, and retrieving a node involves a disk I/O. … [Ramakrisnan & Gehrke 2003] p280, about tree-based indexing.

Parallel Data Warehouse (PDW). …SQL Server PDW [6] is a classic shared-nothing parallel database system from Microsoft that is built on top of SQL Server. PDW consists of multiple compute nodes, a single control node and other administrative service nodes. Each compute node is a separate server running SQL Server. The data is horizontally partitioned across the compute nodes. The control node is responsible for handling the user query and generating an optimized plan of parallel operations. The control node distributes the parallel operations to the compute nodes where the actual data resides. A special module running on each compute node called the Data Movement Service (DMS) is responsible for shuffling data between compute nodes as necessary to execute relational operations in parallel.
When the compute nodes are finished, the control node handles post-processing and re-integration of results sets for delivery back to the users. [6] Microsoft SQL Server 2008 R2 Parallel Data Warehouse. http://www.microsoft.com/sqlserver/en/us/solutions-technologies/data-warehousing/pdw.aspx … [Floratou m.fl. 2012]

Persistent data layer [Fotache 2013]. A persistent datastore is a persistent data layer. A layer is an abstract description of a level in the technology stack; here it is the level of the stack that holds data persistently (durably). One can speak of a server persistence layer and a client persistence layer. A client persistence layer can serve the web, as a cache, or play the corresponding role on a mobile device. Many mobile devices require replication and synchronization with larger centralized servers.

Persistent datastore. An SQL datastore and a NoSQL datastore are examples of persistent datastores. A persistent datastore preserves its state when the system is taken down. A persistent datastore is a persistent data layer.

Persistence for client platforms [Fotache 2013]. This is persistence on the client side, meaning apps, using either lightweight SQL datastores or a NoSQL (probably also lightweight) representation such as MongoDB, which stores semistructured data models in documents. In web terms, it is a formalized use of a cache.

The Plaxton/Rajaraman algorithm - replication strategy

The Plaxton/Rajaraman algorithm. Plaxton and Rajaraman [9] show how to balance the load among all caches by using randomization and hashing. In particular, they use a hierarchy of progressively larger sets of “virtual cache sites” for each page and use a random hash function to assign responsibility for each virtual site to an actual cache in the network. Clients send a request to a random element in each set in the hierarchy. Caches assigned to a given set copy the page to some members of the next, larger set when they discover that their load is too heavy.
This gives fast responses even for popular pages, because the largest set that has the page is not overloaded. It also gives good load balancing, because a machine in a small (thus loaded) set for one page is likely to be in a large (thus unloaded) set for another. Plaxton and Rajaraman's technique is also fault tolerant. The Plaxton/Rajaraman algorithm has drawbacks, however. For example, since their algorithm sends a copy of each page request to a random element in every set, the small sets for a popular page are guaranteed to be swamped. In fact, the algorithm uses swamping as a feature since swamping is used to trigger replication. This works well in their model of a synchronous parallel system, where a swamped processor is assumed to receive a subset of the incoming messages, but otherwise continues to function normally. On the Internet, however, swamping has much more serious consequences. Swamped machines cannot be relied upon to recover quickly and may even crash. Moreover, the intentional swamping of large numbers of random machines could well be viewed unfavorably by the owners of those machines. The Plaxton/Rajaraman algorithm also requires that all communications be synchronous and/or that messages have priorities, and that the set of caches available be fixed and known to all users. [9] Greg Plaxton and Rajmohan Rajaraman. Fast Fault-Tolerant Concurrent Access to Shared Objects. In Proceedings of 37th IEEE Symposium on Foundations of Computer Science, 1996. …[Karger 1997]

Primary cache miss. See cooperating caches. Every request goes to a primary cache server. If the primary cache misses, instead of going directly to the content server, it tries to locate the requested resource in other cooperating caches. …The systems (systems of several cooperating caches) differ in precisely how data is located in the case of a primary cache miss.
Some schemes broadcast a query to all other caches using multicast [7] or UDP broadcasts [2]. Besides consuming excess bandwidth with broadcast queries, the primary cache must wait for all cooperating caches to report misses before it contacts the content server; this can slow down performance on second-level cache misses. Other schemes use directories, either centralized [5] or repeatedly broadcast to support local queries [4]. Directory queries or transmissions also consume bandwidth, and centralized directories can become new points of failure in the system [5]. … [Karger m.fl. 2015]

primary key. ..A key is the minimal set of attributes whose values uniquely identify an entity in the set. There could be more than one candidate key; if so we designate one of them as the primary key... [Ramakrisnan & Gehrke 2003]

Query Routers (mongos instances). Query Routers, or mongos instances, interface with client applications and direct operations to the appropriate shard or shards. The query router processes and targets operations to shards and then returns results to the clients. A sharded cluster can contain more than one query router to divide the client request load. A client sends requests to one query router. Most sharded clusters have many query routers. [MongoDB 2015D]

random cache trees. …Our first tool, random cache trees, … we use a tree of caches [Chankhunthod xxx] to coalesce requests. Like Plaxton and Rajaraman, we balance load by using a different tree for each page and assigning tree nodes to caches via a random hash function [Plaxton xxxx]. …we prevent any server from becoming swamped with high probability, a property not possessed by either Chankhunthod et al. or Plaxton/Rajaraman. In addition, our protocol shows how to minimize memory requirements (without significantly increasing cache miss rates) by only caching pages that have been requested a sufficient number of times. …[Karger 1997]

Range-based sharding.
…For range-based sharding, MongoDB divides the data set into ranges determined by the shard key values to provide range based partitioning. Consider a numeric shard key: If you visualize a number line that goes from negative infinity to positive infinity, each value of the shard key falls at some point on that line. MongoDB partitions this line into smaller, non-overlapping ranges called chunks where a chunk is a range of values from some minimum value to some maximum value. Given a range based partitioning system, documents with “close” shard key values are likely to be in the same chunk, and therefore on the same shard. …[MongoDB 2015D]

Ranged hash function. A ranged hash function maps to a bucket. …A ranged hash family is a family of ranged hash functions. A random ranged hash function is a function drawn at random from a particular ranged hash family. …[Karger 1997]

replication strategy to store copies of hot pages throughout the Internet. ..Several approaches to overcoming the hot spots have been proposed. Most use some kind of replication strategy to store copies of hot pages throughout the Internet; this spreads the work of serving a hot page across several servers. In one approach, already in wide use, several clients share a proxy cache. All user requests are forwarded through the proxy, which tries to keep copies of frequently requested pages. It tries to satisfy requests with a cached copy; failing this, it forwards the request to the home server. The dilemma in this scheme is that there is more benefit if more users share the same cache, but then the cache itself is liable to get swamped. .. …[Karger 1997]

Shard - Sharding

Shard. …Shards store the data. To provide high availability and data consistency, in a production sharded cluster, each shard is a replica set. For more information on replica sets, see Replica Sets. …[MongoDB 2015D], [MongoDB 2015C]

Shard key. …MongoDB distributes data, or shards, at the collection level.
Sharding partitions a collection’s data by the shard key. To shard a collection, you need to select a shard key. A shard key is either an indexed field or an indexed compound field that exists in every document in the collection. MongoDB divides the shard key values into chunks and distributes the chunks evenly across the shards. To divide the shard key values into chunks, MongoDB uses either range based partitioning or hash based partitioning. See the Shard Key (page 18) documentation for more information. …[MongoDB 2015D]

Shard Key. Auto-sharding can be switched on through configuration, after which collections can be sharded across several machines. The document key that every document is given as its primary key can be changed and configured as needed. The shard key is the defined primary key that is used to shard, that is distribute, the individual documents among the configured machines [Alvermann 2011].

Sharded cluster, about mongostat. …Discovers and reports on statistics from all members of a replica set or sharded cluster. When connected to any member of a replica set, --discover all non-hidden members of the replica set. When connected to a mongos, mongostat will return data from all shards in the cluster. If a replica set provides a shard in the sharded cluster, mongostat will report on non-hidden members of that replica set. …[MongoDB 2015]

Sharding. According to [Chodorow 2011], sharding is the partitioning method MongoDB implements to automatically partition data across new machines and, conversely, to coalesce it again.

Sharding. MongoDB has sharding built in. When auto-sharding is switched on, a database is automatically distributed over a number of machines. This is called horizontal partitioning. The individual documents are placed on different machines. Configuring sharding for MongoDB involves 3 components: A shard. This is a machine that holds a subset of the data. A mongos.
A kind of router that can route queries to the individual shards and combine their answers into a single response. A mongos needs a configuration server… A configuration server. This holds the information about the individual shards. [Alvermann 2011].

shared caching machine - SQLite

shared caching machine. …The most obvious approach, of providing a group of users with a single, shared caching machine, has several drawbacks. If the caching machine fails, all users are cut off from the Web. Even while running, a single cache is limited in the number of users it can serve, and may become a bottleneck during periods of intense use. Finally, two important limits arise on the hit rate a single cache can achieve. First, since the amount of storage available is limited, the cache will suffer “false misses” when requests are repeated for objects which it was forced to evict for lack of space. Second, the limit on the number of users that the cache can serve works against the desire to aggregate requests from as many users as possible for caching purposes: typically, the more user requests are aggregated together, the better the hit rate becomes as one user requests objects already requested by other users. …[Karger m.fl. 2015]

“smoothness” property. When a machine is added to or removed from the set of caches, the expected fraction of objects that must be moved to a new cache is the minimum needed to maintain a balanced load across the caches. …[Karger 1997]

Software options [Fotache 2013]. Technologies. …Mahmoud et al.
[Mahmoud 2012] in [Fotache 2013] point out the most common software options for storage in mobile applications: HTML5 (localStorage API which stores objects as key-value pairs and IndexedDB which implements relational technology); SQLite, an over-simplified relational database server; Cloud storage (Apple iCloud, Dropbox, Google Drive, etc.); Device specific storage (APIs, tools, frameworks such as WebWorks, Shared Preferences, Network IO, WebView, Core Data). ..

Splitting. ..Splitting is a background process that keeps chunks from growing too large. When a chunk grows beyond a specified chunk size (page 32), MongoDB splits the chunk in half. Inserts and updates trigger splits. Splits are an efficient meta-data change. To create splits, MongoDB does not migrate any data or affect the shards. … [MongoDB 2015D]

“spread” property. …Second, over all the client views, the total number of different caches to which an object is assigned is small. We call this property “spread”. …[Karger 1997]

SQL datastore. Postgres is an example of an SQL datastore [Fotache 2013]. An SQL datastore is called a relational database. Other important examples of relational databases are Oracle, DB2 and MySQL. A relational database implements an interpreted language, SQL, which follows relational algebra. In contrast to relational algebra, which describes an algebra over unique tuples, databases can hold duplicate tuples, so SQL implements what is called 'bag theory': the result set of an SQL query can contain duplicate tuples.

SQLite. …Among other data layer options, the most popular is SQLite, advertised by its producers as a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. In fact, SQLite is present on a variety of devices and operating systems, precompiled packages can be downloaded for Linux, Mac OS, Windows, Windows Phone 8, Windows .Net platform [7], [5] [3].
Its popularity can be explained by the following features [3]: easy to install and configure; simplicity; does not require a server (can run in client-only mode); compactness of database (all database resides in a single file for each application); it is an open source product. … The small footprint library can handle both DML and DDL statements, offers cursors access to clients, indexes, multiple data types, primary and foreign keys and a lot more features, which makes it very easy to use and a good candidate for being used as a persistence layer. …[Fotache 2013]

swamped - views of which machines are available

swamped. Many of us have experienced the hot spot phenomenon in the context of the Web. A Web site can suddenly become extremely popular and receive far more requests in a relatively short time than it was originally configured to handle. In fact, a site may receive so many requests that it becomes “swamped,” which typically renders it unusable. Besides making the one site inaccessible, heavy traffic destined to one location can congest the network near it, interfering with traffic at nearby sites. …[Karger 1997]

tag aware sharding. …MongoDB allows administrators to direct the balancing policy using tag aware sharding. Administrators create and associate tags with ranges of the shard key, and then assign those tags to the shards. Then, the balancer migrates tagged data to the appropriate shards and ensures that the cluster always enforces the distribution of data that the tags describe. Tags are the primary mechanism to control the behavior of the balancer and the distribution of chunks in a cluster. Most commonly, tag aware sharding serves to improve the locality of data for sharded clusters that span multiple data centers.
See Tag Aware Sharding (page 26) for more information… [MongoDB 2015D]

TCP and UDP port numbers. ...Originally, port numbers were used by the Network Control Program (NCP) in the ARPANET for which two ports were required for half-duplex transmission. Later, the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP) needed only one port for full-duplex, bidirectional traffic. The even-numbered ports were not used, and this resulted in some even numbers in the well-known port number range being unassigned. The Stream Control Transmission Protocol (SCTP) and the Datagram Congestion Control Protocol (DCCP) also use port numbers. They usually use port numbers that match the services of the corresponding TCP or UDP implementation, if they exist. The Internet Assigned Numbers Authority (IANA) is responsible for maintaining the official assignments of port numbers for specific uses. However, many unofficial uses of both well-known and registered port numbers occur in practice... [Imran m.fl.2003+]

User Datagram Protocol, UDP. …The User Datagram Protocol, UDP, is a protocol for transferring data. UDP is part of the Internet protocol stack, most often referred to as TCP/IP. In the protocol stack either TCP or UDP is used. UDP gives no guarantee that data arrives (or rather: the sender is not notified if data fails to arrive, nor is the sender notified whether data has been received). UDP belongs to layer 4 of the TCP/IP protocol stack. The packet header therefore gains one more layer: compared with IP, a port number is added. This port number is used for demultiplexing the data, to make sure that the right data is delivered to the right process on the computer. In addition, UDP provides a checksum service that checks that the contents of a DatagramPacket (packet) are intact.
Moving one layer further down the protocol stack, one meets the network layer's IP protocol, which does NOT guarantee delivery or receipt of packets. …[Rasser m.fl. 2004]

ultrametric. …The ultrametric is a natural model of Internet distances, since it essentially captures the hierarchical nature of the Internet topology, under which, for example, all machines in a given university are equidistant, but all of them are farther away from another university, and still farther from another continent. The logical point-to-point connectivity is established atop a physical network, and it is generally the case that the latency between two sites is determined by the “highest level” physical communication link that must be traversed on the path between them. Indeed, another definition of an ultrametric is as a hierarchical clustering of points. The distance in the ultrametric between two points is completely determined by the smallest cluster containing both of the points. …[Karger 1997]

unclustered index. …When a file is organized so that the ordering of data records is the same as or close to the ordering of data entries in some index, we say that the index is clustered; otherwise the index is an unclustered index... [Ramakrisnan & Gehrke 2003]

View. …We define a view to be the set of caches of which a particular client is aware. … A client uses a consistent hash function to map an object to one of the caches in its view. …[Karger 1997]

views of which machines are available. …The Internet, however, does not have a fixed collection of machines. Instead, machines come and go as they crash or are brought into the network. Even worse, the information about what machines are functional propagates slowly through the network, so that clients may have incompatible “views” of which machines are available to replicate data. This makes standard hashing useless since it relies on clients agreeing on which caches are responsible for serving a particular page.
…[Karger 1997]

References without annotation

Result of a search for 'MongoDB' on scholar.google.com [Lassen 2015].

[Alvermann 2011] - [Copeland 2013]

[Alvermann 2011] Markus Alvermann. Alles ist möglich. Einführung in MongoDB. JavaSPEKTRUM 1/2011. URL20151005: https://www.iks-gmbh.com/assets/downloads/Einfuehrung-in-MongoDB-iks.pdf. URL: www.javaspektrum.de

[Boicea 2013] Alexandru Boicea, Florin Radulescu, Laura Ioana Agapin, "MongoDB vs Oracle -- Database Comparison", EIDWT, 2012, 2013 Fourth International Conference on Emerging Intelligent Data and Web Technologies, pp. 330-335, doi:10.1109/EIDWT.2012.32. URL20151005: http://www.computer.org/csdl/proceedings/eidwt/2012/4734/00/4734a330-abs.html

[Bson xxxx] Homepage BSON, http://www.bsonspec.org. Creative Commons. No copyright. URL20151021: http://www.bsonspec.org

[Chodorow 2011] Kristina Chodorow. Scaling MongoDB. Sharding, Cluster Setup and Administration. O’Reilly 2011. URL20151005: https://books.google.no/books?hl=da&lr=&id=Fp6NtavDpEC&oi=fnd&pg=PR5&dq=mongodb&ots=iTT8IhDPWl&sig=DiB0xrj8-_jEZ2_Kdx6zzeiUCA&redir_esc=y#v=onepage&q=mongodb&f=false

[Chodorow 2011B] Kristina Chodorow. 50 Tips and Tricks for MongoDB Developers. O’Reilly 2011. URL20151005: https://books.google.no/books?hl=da&lr=&id=Np_aXDGVKoC&oi=fnd&pg=PR5&dq=mongodb&ots=jNC8QI_3GH&sig=fhSXEojdXSL4plcUIvyfsrrCh2Y&redir_esc=y#v=onepage&q=mongodb&f=false

[Chodorow 2013] Kristina Chodorow. MongoDB: The Definitive Guide. Powerful and scalable data storage. O’Reilly, 2013. ISBN 978-1-449-34468-9. URL20151005: https://books.google.no/books?hl=da&lr=&id=uGUKiNkKRJ0C&oi=fnd&pg=PP1&dq=mongodb&ots=h8mxJeeTte&sig=vwf6uyPyjWOu8ug2mHen4MRLQ4k&redir_esc=y#v=onepage&q=mongodb&f=false

[Copeland 2013] Rick Copeland. MongoDB Applied Design Patterns. O’Reilly. 2013.
URL20151005: https://books.google.no/books?hl=da&lr=&id=S53RrxZZMtcC&oi=fnd&pg=PR2&dq=mongodb&ots=6xBalbjAmC&sig=FnEh2qnpe6Bqjczg5bLfrVRkclE&redir_esc=y#v=onepage&q=mongodb&f=false

[Dede 2013] - [Imran m.fl.2003+]

[Dede 2013] Elif Dede, Madhusudhan Govindaraju, Daniel Gunter, Richard Shane Canon, and Lavanya Ramakrishnan. 2013. Performance evaluation of a MongoDB and Hadoop platform for scientific data analysis. In Proceedings of the 4th ACM workshop on Scientific cloud computing (Science Cloud '13). ACM, New York, NY, USA, 13-20. http://doi.acm.org/10.1145/2465848.2465849. URL20151005: http://dl.acm.org/citation.cfm?id=2465849#

[ecma 2013] ECMA-404. The JSON Data Interchange Format. 1st Edition / October 2013. ECMA International. URLPDF20151021: http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf Homepage JSON. Introducing JSON. ECMA-404 The JSON Data Interchange Standard, http://json.org/ URL20151021: http://json.org/

[Eernisse 2006] Matthew Eernisse. Build your own AJAX Web Applications. Sitepoint. 2006

[Floratou m.fl. 2012] Avrilia Floratou, Nikhil Teletia, David J. DeWitt, Jignesh M. Patel, Donghui Zhang. Can the Elephants Handle the NoSQL Onslaught? Proceedings of the VLDB Endowment, Vol. 5, No. 12. August 27th - 31st 2012, Istanbul, Turkey. Copyright 2012 VLDB Endowment

[Fotache 2013] Marin Fotache, Dragos Cogean. NoSQL and SQL Databases for Mobile Applications. Case Study: MongoDB versus PostgreSQL. Al. I. Cuza University of Iasi, Romania. Informatica Economică vol. 17, no. 2/2013. URL20151005: http://revistaie.ase.ro/content/66/04%20-%20Fotache,%20Cogean.pdf

[Garcia-Molina m.fl 2012] Hector Garcia-Molina, Jeffrey D. Ullman and Jennifer Widom. Database Systems. Selected Chapters. Compiled by Marcos Vaz Salles. University of Copenhagen. Pearson, 2014. ISBN 978-1-78399-319-2.

[Hecht & Jablonski 2011] Robin Hecht, Stefan Jablonski.
NoSQL Evaluation: A Use Case Oriented Survey. 2011 International Conference on Cloud and Service Computing. URL20151101: http://rogerking.me/wpcontent/uploads/2012/03/DatabaseSystemsPaper.pdf

[Imran m.fl.2003+] Imran (), Delirium (Mark). List of TCP and UDP port numbers. From Wikipedia, the free encyclopedia. URL20151101: https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers

[Karger 1997] - [Nori 2007]

[Karger 1997] David Karger, Eric Lehman, Tom Leighton, Matthew Levine, Daniel Lewin and Rina Panigrahy. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web. In Proceedings of the 29th ACM Symposium on Theory of Computing, pages 654-663, 1997

[Karger m.fl. 2015] David Karger, Alex Sherman, Andy Berkheimer, Bill Bogstad, Rizwan Dhanidina, Ken Iwamoto, Brian Kim, Luke Matkins, Yoav Yerushalmi. Web Caching with Consistent Hashing. MIT Laboratory for Computer Science. URL20151016: http://www8.org/w8-papers/2a-webserver/caching/paper2.html

[Lakshman & Malik 2010] Avinash Lakshman and Prashant Malik. 2010. Cassandra: a decentralized structured storage system. ACM SIGOPS Oper. Syst. Rev. 44, 2 (April 2010), 35-40.

[Lassen 2015] Lassen, Anders. Search on scholar.google.com. The first 3 pages. URL20151005: https://scholar.google.dk/scholar?start=20&q=mongodb&hl=da&as_sdt=0,5

[Lawrence 2014] Ramon Lawrence. Integration and Virtualization of Relational SQL and NoSQL Systems including MySQL and MongoDB. 2014 International Conference on Computational Science and Computational Intelligence

[Mahmoud 2012] Q. H. Mahmoud, S. Zanin, T. Ngo, “Integrating Mobile Storage into Database Systems Courses”, in Proc. of the 13th annual conference on Information technology education - SIGITE '12, 2012, pp. 165-170

[Marcus 2015] The NoSQL Ecosystem. Adam Marcus. 16th International Workshop on High Performance Transaction Systems (HPTS). September 27-30, 2015. Asilomar Conference Grounds. Pacific Grove, CA. MIT CSAIL.
DBg Database Group. MIT Computer Science and Artificial Intelligence Lab. Presentation. 58 pages. URL20151005: http://hpts.ws/papers/2011/sessions_2011/nosql-ecosystem.pdf

[MongoDB 2015] MongoDB - Mongostat. From MongoDB reference. URL20151024: http://www.mongodb.org/display/DOCS/mongostat

[MongoDB 2015C] Replication and MongoDB. Release 3.0.7 October 23, 2015. MongoDB, Inc. Copyright MongoDB, Inc. 2008 - 2015. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License. URL20151024: https://docs.mongodb.org/master/MongoDB-replication-guide-master.pdf

[MongoDB 2015D] Sharding and MongoDB. Release 3.0.7 October 23, 2015. MongoDB, Inc. Copyright MongoDB, Inc. 2008 - 2015. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License. URL20151024: https://docs.mongodb.org/

[Nori 2007] A. K. Nori (2007). Mobile and Embedded Databases. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering. Available: ftp://ftp.research.microsoft.com/pub/debull/A07sept/nori.pdf (accessed March 2013)

[Padmanabhan 2008] - [Yimeng 2012]

[Padmanabhan 2008] P. Padmanabhan, L. Gruenwald, A. Vallur, M. Atiquzzaman, “A survey of data replication techniques for mobile ad hoc network databases,” The VLDB Journal, 17, pp. 1143–1164, 2008

[Public Domain Dedication] CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No Copyright. New public domain dedication. Applies to BSON and perhaps JSON. Some reference work is still missing here. TODO

[Ramakrisnan & Gehrke 2003] Raghu Ramakrishnan, Johannes Gehrke. Database Management Systems. Third Edition. McGraw-Hill 2003

[Rasser m.fl. 2004] Rasser (), Glenn (Glenn Møller-Holst), Rune (Rune Magnussen). UDP. Fra Wikipedia, den frie encyklopædi. URL20151101: https://da.wikipedia.org/wiki/UDP

[Tran 2009] V.T.K. Tran, R.K. Wong, W.K.
Cheung, J. Liu, “Mobile Information Exchange and Integration: From Query to Application Layer”, in Proc. of the 20th Australasian Database Conference (ADC 2009), 2009, pp. 115-124

[Truica m.fl. 2013] Ciprian-Octavian Truica, Alexandru Boicea, Ionut Trifan. CRUD Operations in MongoDB. Advances in Intelligent Systems Research. 2013 International Conference on Advanced Computer Science and Electronics. ICACSEI-13. July 2013. ISBN 978-90-78677-74-1. URL20151005: http://www.atlantispress.com/php/pub.php?publication=icacsei-13&frame=http%3A//www.atlantis-press.com/php/paperdetails.php%3Ffrom%3Dauthor%2520index%26id%3D7568%26querystr%3Dauthorstr%253DB%2526publication%253Dicacsei-13

[Unhelkar 2010] B. Unhelkar, S. Murugesan, “The Enterprise Mobile Applications Development Framework,” IT Professional, 12 (3), May/June 2010, pp. 33-39

[Wei-ping 2011] Zhu Wei-ping; Li Ming-xin; Chen Huan, "Using MongoDB to implement textbook management system instead of MySQL," in Communication Software and Networks (ICCSN), 2011 IEEE 3rd International Conference on, pp. 303-305, 27-29 May 2011. doi: 10.1109/ICCSN.2011.6013720 URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6013720&isnumber=6013532

[Yimeng 2012] Yimeng Liu; Yizhi Wang; Yi Jin. Research on the improvement of MongoDB Auto-Sharding in cloud environment. 7th International Conference on Computer Science & Education (ICCSE), 14-17 July 2012. Melbourne, VIC. pp 851–854. ISBN: 978-1-4673-0241-8. URL20151005: http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6295203&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6295203