* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download SAND CDBMS
Data analysis wikipedia , lookup
Expense and cost recovery system (ECRS) wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Information privacy law wikipedia , lookup
Data vault modeling wikipedia , lookup
3D optical data storage wikipedia , lookup
Concurrency control wikipedia , lookup
Open data in the United Kingdom wikipedia , lookup
Business intelligence wikipedia , lookup
Versant Object Database wikipedia , lookup
Relational model wikipedia , lookup
Accelerating the largest user populations, the biggest data, and the most complex analytics™ SAND CDBMS A Technological Overview Contents 1. Introduction .........................................................................................................................................................................................................................................5 Architecture Overview ....................................................................................................................................................................................................................5 Supported Operating Systems ........................................................................................................................................................................................................6 2. SAND Technology’s Patented Storage Architecture ...............................................................................................................................................................................6 Tokenization ...................................................................................................................................................................................................................................6 Benefits of the SAND CDBMS Storage Architecture..........................................................................................................................................................................8 3. Virtual Mode Operation ........................................................................................................................................................................................................................9 4. Time Travel Database Versioning.........................................................................................................................................................................................................10 5. Workload Manager .............................................................................................................................................................................................................................11 Octopus Architecture ....................................................................................................................................................................................................................11 24/7 Operations ...........................................................................................................................................................................................................................12 6. Text Search .........................................................................................................................................................................................................................................12 7. Security ..............................................................................................................................................................................................................................................12 Database File Security ...................................................................................................................................................................................................................12 Virtual/Time Travel Mode Delta File Security.................................................................................................................................................................................13 Secure Record De-Identification....................................................................................................................................................................................................13 8. Backup and Recovery .........................................................................................................................................................................................................................14 Disaster Recovery..........................................................................................................................................................................................................................14 Conclusion..............................................................................................................................................................................................................................................15 About SAND Technology .........................................................................................................................................................................................................................16 Copyright ©2010 SAND Technology. All rights Reserved. SAND CDBMS is a trademark and Nucleus is a registered trademark of SAND Technology. Other trademarks remain the property of their respective owners. Visit SAND Technology on the web at www.sand.com. 100620 SAND CDBMS: A Technological Overview 1. Introduction SAND CDBMS is a specialized Business Intelligence (BI) software platform that streamlines and accelerates the development and deployment of high-performance multi-user analytic applications. Unlike repurposed online transaction processing (OLTP) systems, SAND CDBMS implements a patented data structure specifically designed to provide speed and responsiveness for large and complex analytic tasks. This columnbased, tokenized, bitmap-indexed data structure offers very efficient storage of data at a granular level, and provides 100%-”indexing” of the data without requiring imposition of specialized schemas. At the same time, it appears to users as a standard relational database and fully supports standard SQL. The key benefits provided by SAND CDBMS are speed, flexibility, responsiveness, scalability and low overall cost of ownership: • Because it is designed specifically for high-performance analytic tasks, SAND CDBMS is very fast to set up without extensive requirements for specialized IT resources, and very fast to respond to user requests. • Because data is automatically 100%-”indexed”, SAND CDBMS empowers users with “go-anywhere” capabilities and responds to changes in business requirements without the need for intervention by the IT department. • Because it offers efficient data storage and scales in a sublinear fashion, SAND CDBMS is three to ten times more resource-efficient than traditional RDBMSs or Multi-Dimensional Databases. • Because it fully supports standard SQL and ODBC/OLE DB/JDBC interfaces, SAND CDBMS data can easily be integrated with existing or fa-miliar query and reporting tools, reducing implementation and training costs. • The Time Travel feature enables database updates concurrent with user access, permitting 24/7 operation without “down time”. • SAND CDBMS’ workload management utility enables easy scaling within SMP or MPP environments. This paper describes in detail the architecture of SAND CDBMS and shows how its various features can work as part of a powerful BI environment for organizations of any size that need fast and flexible analytical access to masses of corporate data. Architecture Overview SAND CDBMS typically sits at the heart of a high-performance BI application. Data can be loaded from flat files or any ODBC data source, and once loaded can be immediately accessed using any SQL92-compliant, ODBC-, OLE DB- or JDBC-enabled BI front-end tool or application. This permits SAND CDBMS to act as a single access point for data from multiple sources. Because data can be loaded from any legacy, operational or third-party source without the necessity of first building a schema, users can begin to run queries immediately, yielding fast business benefits. Figure 1: SAND CDBMS 5 SAND Technology White Paper Since SAND CDBMS incorporates a three-phased data loader that has the ability to support scripted Extract, Transform and Load (ETL) rou-tines, it is not necessary to use a third-party ETL tool. Most ETL tools were designed to load data into OLTP databases, and part of their func-tion is to build the kind of indexes and schemas that are not required by SAND CDBMS. However, in certain high-performance applications, users may choose an ETL tool in order to take advantage of specialized features such as aggregation-building. Supported Operating Systems SAND CDBMS is available for the Windows, Unix, and Linux operating systems. Hardware requirements are minimal: current implementa-tions typically involve SAND CDBMSs running on two to four processors, depending on the number of concurrent sessions that need to be supported. 2. SAND Technology’s Patented Storage Architecture SAND CDBMS has been designed to achieve a single goal: the transformation of operational or warehoused data into useful information in as quick, efficient, and simple a way as possible. The unique, patented Nucleus® technology that is at SAND CDBMS’ core enables rapid and flexible access to data coming from any source, which in turn enables fast return of business benefits at a very low total cost of ownership. SAND CDBMS employs a column-based data storage architecture, implemented using tokens to represent atomic data values and encoded bit vectors to represent data relationships. These features, central to SAND CDBMS’ data storage, retrieval and manipulation architecture, are perfectly adapted to satisfy the unique set of requirements posed by the analytical data warehouse. Together, they yield spectacular ad hoc query performance, tremendous memory and storage resource efficiency, and near-zero administration requirements. Tokenization Token database technology differs radically from standard database technology. In a standard database, when a record is added to the sys-tem, a physical representation of the data is appended on disk. Consider a few simple records that might be found in a standard database management system, as shown in Table 1: Table 1 Record 01 James Johnson 459-070-1872 Dallas TX 1000.00 Record 02 Tom Swanson 498-998-1097 El Paso TX 550.00 Record 03 James Thomas 409-550-1093 Waco TX 1000.00 Record 04 Tom Jameson 409-209-1097 Dallas TX 275.00 Record 05 Bill Johnson 498-550-1082 El Paso TX 300.00 Record 06 Terry Thomas 459-220-9832 Dallas TX 1000.00 Record 07 James Swanson 409-220-2987 El Paso TX 575.00 Record 08 Bill Wilson 498-209-1098 Dallas TX 1000.00 Record 09 John Thompson 498-207-1108 El Paso TX 550.00 Record 10 John Thomas 459-207-2008 Waco TX 1000.00 Each time a transaction is completed, a new record is added to the standard database. The scaling of data is said to be linear, because the volume of data is a function of the number of records. However, as the example in Table 1 shows, there is significant redundancy of values in this kind of database: the name “James” appears three times; the number “1000.00” appears 5 times; the state “Texas” appears 10 times, and so forth. Now consider the possibility of creating a token for each entry in the database. “James” would have a token of 01 for a first name; “Texas” would have a token of 01 for state name; “Dallas” might have a token of 01 for city name, and so on. The same can be done for the telephone number area codes and local exchanges. The database could be reduced to a series of tokens, represented as in Table 2. 6 SAND CDBMS: A Technological Overview Table 2 First Name Last Name City State James 01 Johnson 01 Dallas 01 Tom 02 Swanson 02 El Paso Bill 03 Thomas 03 Waco Terry 04 Jameson 05 John 05 Thompson 06 TX Amount 01 Area Code Prefix Extension 1000.00 01 459 01 070 01 1872 01 02 550.00 02 498 02 998 02 1097 02 03 275.00 03 409 03 550 03 1093 03 300.00 04 220 04 1082 04 575.00 05 209 05 9832 05 207 06 2987 06 1098 07 1108 08 2008 09 SAND CDBMS employs a domain construct to record the correspondence between actual scalar data values and their respective tokens. This allows multiple columns to reference the tokens and values contained in a single domain. Table 3 Record 01 01 01 01 01 01 01 01 01 Record 02 02 02 02 02 02 02 01 02 Record 03 01 03 01 03 03 03 01 01 Record 04 02 04 03 01 02 01 01 03 Record 05 03 01 02 03 04 02 01 04 Record 06 04 03 01 04 05 01 01 01 Record 07 01 02 02 04 06 02 01 05 Record 08 03 05 02 05 07 01 01 01 Record 09 05 06 03 06 08 02 01 02 Record 10 05 03 03 06 09 03 01 01 While the database can be conceived as a structure like the one shown in Table 3, SAND CDBMS actually uses arrays of bit vectors (where a ‘1’ indicates that the value is present, and a ‘0’ that it is not) to represent the positional relationships of tokens — and hence their respective atomic data values — in columns. In this respect SAND CDBMS is a “fully inverted” database, automatically keeping track of the location of each data value to enable extremely fast searches for specific information. It might be objected that these bit vector arrays could become extremely large when table columns contain many unique values. To avoid this problem, SAND CDBMS applies a unique, patented encoding technique to the bit arrays, producing substantial, lossless compression of sparse vectors (that is, vectors where either 1’s or 0’s predominate). The encoded bit vectors are further compressed in order to optimize stor-age economy; when SAND CDBMS processes a database query, it works with the compressed form of the bit vectors, which provides an addi-tional performance advantage. The second crucial feature of SAND CDBMS’ data storage architecture is column-orientation. While traditional RDBMS’ are row-oriented, accessing data at the record level, SAND CDBMS accesses data in columns using the bit vectors as described above. Thus, instead of having to bring each entire data record into memory for processing, only the columns that are required to execute the query are “read”. For a more comprehensive description of SAND CDBMS’ data storage architecture, please refer to the SAND Technology white paper entitled Nucleus: Technological Core of SAND CDBMS. 7 SAND Technology White Paper Benefits of the SAND CDBMS Storage Architecture Speed Since SAND CDBMS separates a stored data value from bit vectors indicating whether it exists in a given database table column, data which satisfies a certain criterion can be found very quickly: minimal work is required to simply scan the bit vectors. In fact, because of the patented compression applied to the bit vectors, it is possible to find a unique value in a column that contains one billion rows using just four bytes of data. It is this technique that allows SAND CDBMS to deliver the effective equivalent of 100% indexing, without imposing the need to create the additional space-consuming and management-intensive index structures that traditional systems require to provide fast access to data. Due to the column/token-based architecture, active working sets are very small. This characteristic alone accelerates queries by a factor of two or three (or even more). On top of this, complex queries are primarily resolved using operations performed on encoded bit vectors, which compounds the acceleration effect in most cases. When the speed of processing is greatly accelerated, many possibilities arise: for example, very high-speed in-memory processing allows techniques for heuristic analysis that otherwise would not be feasible. Analysts can get quick answers to queries that would require other systems to scan entire databases. Storage Economy Data tokenization, using domain-based storage along with the application of encoded bit vector representation, drastically reduces the storage volume required for structured data. In fact, the larger the amount of data, the greater the difference in storage requirements between a standard linearly-scaling database and a tokenized database — in other words, the greater the number of records, the greater the storage advantage of the token-based approach. Flexibility The SAND CDBMS storage architecture intrinsically indexes all columns in database tables, but without actually creating additional index structures. When all columns are indexed, the possibilities for analysis are unlimited — analysts can look at any field, in any manner. Query speed is such that if the analyst wants to refine the results, a new query can be reformulated and rerun. All reformulation and recalculation can be done in a fraction of the time that would have been required if the database resided in standard record-based storage with adjunctive index structures. Furthermore, SAND CDBMS’ column-based architecture makes it uncomplicated to add or drop table columns to accommo-date data model evolution. Simplified Administration Like a standard relational database, SAND CDBMS provides read-write access to data, with full ACID-compliant transaction support. However, SAND CDBMS does this without imposing the usual price of administrative complexity. The benefits of this simplified administration are manifest in a number of areas. First, SAND CDBMS takes responsibility for the details of storage out of the hands of the user, allowing completely free-form data manipula-tion — including ad hoc query, transformation, bulk load, create and destroy steps — without the administrative intervention that would traditionally be required at each turn. This is significant because in a standard RDBMS, physical storage issues normally devour time and ad-ministrative resources with constant structural reorganizations, I/O tuning, partitioning, cache tuning and indexing. SAND CDBMS uses pat-ented memory management techniques together with a container-based storage model to remove most of the details of storage manage-ment from the administrative workload. Administrators need only concern themselves with providing adequate file system storage to ac-commodate data growth, and setting a maximum amount of memory to be used for data operations. Second, SAND CDBMS uses the operating system to handle its file storage, so backup is also straightforward (see also section 7, Backup and Recovery). The database can be backed up by simply copying the database files, even when they are being actively queried. In fact, commercially available file system backup utilities can be used for this purpose, with full exploitation of parallelism. The fact that a SAND CDBMS da-tabase is made up of operating system files also makes it readily transportable from one system to another. Finally, SAND CDBMS removes two other administrative activities from the equation: locking and tuning, both notorious for their voracious appetite for DBA resources. SAND CDBMS employs a lock-less, optimistic concurrency control scheme that is notable for its complete absence of administrative requirements. SAND CDBMS also requires little performance tuning, making it ideal for supporting data exploration activi-ties consisting mainly of unexpected queries, as well as for prototyping applications. 8 SAND CDBMS: A Technological Overview 3. Virtual Mode Operation SAND CDBMS’ Virtual Mode of operation is the foundation for some of its most powerful capabilities, particularly when it is used as a BI sys-tem development facility. In Virtual Mode, any number of separate virtual instances of a single SAND CDBMS database can be made available, without the need to physically replicate any data. Each of these virtual database instances can be altered and manipulated by users in any way — including changes to data models, updates, inserts, deletes, and so on — without affecting users of other instances or touching the actual data in the real database. Only users connected to the same virtual instance will see the changes. In a SAND CDBMS system operating in standard (“Real”) mode, the data “pages” making up the database are in one of three states at any given time: unread blocks on disk, unmodified pages in memory, and modified (“dirty”) pages in memory. Dirty pages are written back to the disk-resident database space as necessary. When Virtual Mode is enabled, contrastingly, dirty pages are paged out to a separate, “temporary” disk location. This means that, in Virtual Mode, a consistent version of a database is made up of the unread blocks in the main data space, dirty pages in memory, and paged-out disk-resident blocks in the temporary location. Figure 2: Virtual Mode vs. Real Mode Thus client programs actually have “read-only” access to the real database when working in Virtual Mode. When a COMMIT WORK command is executed by a user connected to a virtual instance, the database changes effected during the preceding transaction are recorded in a temporary Delta File associated with that instance, rather than in the database itself. When the instance is shut down, the Delta File is deleted, and the changes are discarded. However, it is possible to preserve alterations to the database made in Virtual Mode, by using a special in-stance configured as the master virtual instance. This permits changes recorded in the instance Delta File to be written back to the core data-base. Changes made through the master virtual instance, whether live updates or high-speed bulk loads, can be performed concurrently with full client access through the virtual engines. Virtual Mode even supports high-speed backup of the database files while virtual engines are active. Figure 3: Multiple Concurrent Servers 9 SAND Technology White Paper This becomes very powerful when multiple SAND CDBMS engines are set up to run against a single database, each with its own virtual mem-ory and temporary space. The engines run completely independently of one another, with full read-write capability. Multiple engines can share processors, run on multiple processors in an SMP environment, or even run on separate computers in a networked or clustered envi-ronment. The “virtual database” concept is revolutionary, especially as it applies to the BI system design process. Separate audiences can take the data through many different experimental scenarios without fear of polluting or irrevocably modifying the original legacy data. The ability to explore multiple scenarios in parallel magnifies the power of SAND CDBMS and represents a quantum leap in BI system development capabilities. 4. Time Travel Database Versioning SAND CDBMS’ Time Travel feature is a unique database multi-versioning system, extending the basic Virtual Mode capabilities described above to permit multiple different virtual versions or “Snapshots” of a SAND CDBMS database to be stored and re-used — without requiring the creation of multiple physical copies of database files. As in Virtual Mode, a SAND CDBMS operating in Time Travel Mode records only the changes made to the original data, and then represents the result as a new version of the complete database. However, this new version can be preserved and used as a new starting point for database clients, who can make further updates and save them as a new Snapshot, in effect producing a “chain” of interrelated virtual database versions. Any of these database versions can be concurrently accessed and freely updated, without affecting the others. Figure 4: Example of a Time Travel Environment When working in Time Travel mode, the initial state of the core physical database can be considered as the root node of a tree structure, which grows as each set of updates is saved as a Snapshot. Altered database Snapshots (representing physical Update Files) are added to the tree as child nodes dependent on the Snapshot from which they started. When further changes are made to these new database Snapshots, a new level of nodes is added, forming “branches” that represent the dependencies between the various states (see Figure 5). At any time, selected Time Travel database Snapshots can be merged together (for example, to collapse a series of daily updates into one that represents a week), or used to update the core database. Furthermore, Snapshot merges and database updates can be processed “in the background” and then, once complete, be substituted seamlessly as the current version of the data made available to clients (this is accomplished using the Octopus workload management utility, described in the next section). Figure 5: The Tree Structure of Time Travel Database Versions The Time Travel option enables the following unique functionality for SAND CDBMS users: • The possibility of “freezing” analytical environments so that long-running analyses are unaffected by updates to the core database • The ability to interrogate the data as it existed at some time in the past, without having to restore any files, even while other users continue to use the current version of the database • The ability to preserve database structures created as part of “what-if” scenarios during data prototyping • 24/7 operations (with Octopus), allowing updates with no “down time”. 10 SAND CDBMS: A Technological Overview 5. Workload Manager A crucial component of large-scale SAND CDBMS implementations, the Octopus utility is designed to optimize the use of computing resources in a busy database environment by automatically managing the workload generated by multiple clients using the same SAND CDBMS database. When users issue queries simultaneously, Octopus transparently distributes query processing among a set of SAND CDBMS database engines running in parallel. Octopus is especially well suited for use in SMP (symmetric multi-processing) computing architectures; it is also the central component of the Massively Parallel Server Option (MPSO®), which allows SAND CDBMS implementations to scale to powerful multi-server (MPP) environments.. Octopus Architecture The basic architecture of the Octopus system is very simple. Each instance of the Octopus utility works with a single SAND CDBMS database. When Octopus is started, it automatically starts the database with a specified number of special server engines, which can reside on a single computer or on multiple computers connected over a network. In the second configuration, the system can start with a single server machine, then add more as demand grows — and the database files can themselves also be distributed among the server machines. All users connect to the same Octopus server, and proceed to work with the data as they would in a standard SAND CDBMS environment. Octopus will automatically send a user’s query to an engine that is not busy processing another command; if none are free, the query is queued for execution by the first engine that becomes available. Furthermore, these engines can be grouped into classes of service that are configured to handle specific processing requirements, so that the execution of complicated, long-running queries does not delay response to shorter-running ones. This feature gives DBA’s a simple and flexible way to guarantee specific levels of database service, without having to worry about the management of individual database connections . Figure 6: The Octopus System The operation of Octopus is entirely transparent: even while multiple connections to different engines are being made internally, users only need to specify one connection to Octopus. Each client session continues uninterrupted until it is disconnected from the Octopus server. From the point of view of ODBC connectivity, the Octopus database is a single data source. 11 SAND Technology White Paper 24/7 Operations When Octopus is used in conjunction with Time Travel Mode, SAND CDBMS is easily configured for 24/7 operations, enabling round-the-clock availability of data for clients. In Time Travel mode, database updates can be performed and saved as a new Snapshot “in the background” while users continue to access the old data. Once the update is complete, the Octopus utility can be instructed to switch client connections transparently to make the updated version available to users. 6. Text Search Advanced Text Search extends the analytic capabilities of SAND CDBMS to include high-performance searching within the blocks of unstructured text – email messages, documents, web pages , and so on. With the SAND CDBMS Advanced Text Search option, users can easily locate valuable information that previously lay hidden and unexploited within vast amounts of textual data to power applications in areas like as e-discovery, web site analytics, fraud detection and more. Enhancements to standard SQL provide extensive search capabilities, including: • • • • • SOUNDS LIKE SPELLED LIKE Proximity searches for words within or beyond a specified distance “Concept” searches for synonyms “Stem” searches for related words. SSearch terms can be combined with the boolean operators AND, OR and NOT. These booleans connectors allow very elaborate search terms to be constructed, and search results can be scored for relevance, making it simple to assess results and quickly find the precise information that is required. Any or all of these searches can be incorporated within standard database queries, delivering unprecedented analytic capabilities to users. SAND CDBMS’ column-oriented data management technology is based on the use of inverted indexes, a structure which is ideal for full-text indexing and searching: words in unstructured documents are tokenized and automatically indexed according to their location in each document, enabling very fast location of specific entries. The SAND CDBMS Advanced Text Search option takes full advantage of this inherent architectural adaptation to provide the highest performance for text analytics. 7. Security The security features of a data management product allow administrators to prevent unauthorized access to and manipulation of data, thus protecting a resource that is of vital significance to the enterprise. SAND CDBMS ensures data security not only by means of the usual SQL mechanisms for controlling access to database contents (that is, database user authorizations and privileges), but also through the intrinsic encryption provided by the tokenization/bitmap indexing process; furthermore, in data environments containing sensitive personal information, the optional Secure Record De-identification module can be implemented to guarantee the privacy of individuals. Database File Security An earlier section of this paper described how SAND CDBMS “tokenizes” all data held in database tables, and therefore needs to store each unique value only once. A side effect of this process is that the database is intrinsically encrypted, since the original relationship between the data token and its actual value is stored in the form of multiple encoded bit vectors that are unreadable to programs other than SAND CDBMS. Furthermore, all data values (text strings, numbers, and date/time/timestamps) in a SAND CDBMS database are stored in an “obfus-cated” way so that there is no readable clear text present in the database files; additionally, the stored values are not NULL delimited, so an intruder will not be able to discern the boundaries between values. Thus, while a SAND CDBMS database consists physically of a collection of operating system files that can be browsed, or copied as required for backup purposes, SAND’s “defense in depth” of stored data values en-sures that looking into the contents of these files will reveal nothing useful about the database. 12 SAND CDBMS: A Technological Overview As in any enterprise computing environment, the operating system administrator will be responsible for enforcing the necessary permissions on the actual database files, in order to ensure that these files can be accessed only by the SAND CDBMS system account. Database end users do not generally need to have direct access to the actual database files, since they will access the data through SAND CDBMS’ communications layer. Virtual/Time Travel Mode Delta File Security When changes are made to a SAND CDBMS database running in Virtual or Time Travel mode, a Delta File is produced. This is a temporary file that is locked exclusively by SAND CDBMS, and so cannot be read by any other program. When the administrator decides to make this temporary file permanent, the file is transformed into a persistent Update File. The Update File cannot be decoded by anyone — not even by SAND Technology personnel — without access to the original database, for while the Update File may indicate that token ‘23’ has changed, it is impossible to determine what value (or format) this token represents without reference to the original domain of values in the database. One of the major benefits of Virtual Mode Update Files is that it is possible to distribute them to remote sites and apply them there to ‘slave’ databases, allowing the simple replication of the master database at remote locations using only a single command. These files can also be compressed and then securely transferred using FTP (or, in some cases, even email) to the remote location; since a SAND CDBMS Update File cannot be decoded without access to the original database, it will be of no use on its own should it be intercepted by an unauthorized user. Secure Record De-Identification Protection of individual privacy is a major concern in healthcare institutions, government departments, law enforcement agencies, and other organizations that collect sensitive personal data. For situations where such considerations call for a particularly rigorous data security regime, SAND Technology offers a Data Privacy Option for use with SAND CDBMS, allowing for analytics on “de-identified” data while retaining the possibility of restoring the identification of records when required. The SAND Data Privacy Module separates attributes in the database that allow identification of an individual from fields that do not, and encrypts personal information using a AES 256-bit encryption key that is specific to the customer organization. This means that the central repository holding all personal information is fully encrypted on disk, protecting the organization against theft of storage (for example, of a tape backup containing a copy of the repository). Built-in monitoring ensures full compliance with any applicable legal regulations. The SAND secure data model, incorporating a user profile manager to manage access privileges, permits correlation of records relating to a given patient without decrypting any identifying information, enabling analytical tracking of visits or procedures while maintaining privacy. Sensitive personal information may be “re-identified” on a user’s workstation (based on the user’s security profile) when analytical results are returned, but only in strictly regulated circumstances. This means that all information transferred over the network is encrypted, protecting the confidentiality of the data against network hacking. Figure 7: SAND Data Privacy Module The SAND Advanced Data Security option features a utility for management of the access privilege profiles of various users in relation to specific data elements in the warehouse. This utility identifies the level of sensitivity of each data element contained in SAND CDBMS and the access level of each user of the server. For audit purposes, it also keeps track of all commands that grant or revoke user privileges on various data elements. 13 SAND Technology White Paper 8. Backup and Recovery SAND CDBMS databases consist of files that are visible to the operating system. This characteristic, when coupled with the SAND CDBMS Time Travel architecture, makes database backup, recovery, and restoration considerably simpler (and frequently more effective) than is possible with conventional data warehouse and data mart technologies. It also makes even large databases highly portable, yielding a very natural and effective approach to full disaster recovery — something that can be extremely difficult or even impossible for conventional RDBMSs. When deployed in Virtual Mode, all persistent database files can be backed up while the database is available for query, and even during load or update processes. A separate database server instance can be set up to process periodic refreshes (load and update). All changes will then be reflected in a Delta File under the control of this server instance. At predetermined intervals, the changes reflected in the Delta File can be propagated back to the database. SAND CDBMS’ Time Travel Mode provides 24x7 system availability, as well as incremental backup capability. Time Travel Mode permits the maintenance of a chain of Update Files referring back to a persistent database. In this context, any pair of consecutive Update Files in a chain can be merged without interrupting system availability. Merges back to the core database can be scheduled to occur weekly, monthly or at longer intervals. In this case, the backup strategy would entail backing up the Update Files, constituting an effective incremental backup capability. A database is restorable to the last complete set of changes by backing up the latest Update File in the chain. Disaster Recovery Conventional DBMS strategies for disaster recovery of high-availability data marts and warehouses are usually highly complex, expensive, and subject to failure. The fact that SAND CDBMS’ databases are made up of files that are visible to the host operating system makes these databases effectively “portable”. A database can literally be transferred to a new host system and mounted there with only a trivial amount of administrative overhead. This, together with Time Travel’s incremental backup capability, makes SAND CDBMS uniquely capable of realiz-ing simple and effective backup and recovery in disaster scenarios. Implementation of an off-site “shadow” SAND CDBMS system is straightforward: 1. Backup the entire set of persistent database files to an off-site system. 2. Periodically transfer incremental backups (Time Travel Update Files) to the off-site system via high-speed network transfer. 3. The original backup, together with the latest Time Travel Update File, represents the latest baseline of the primary system. In a disaster scenario, operations can be switched over to the off-site backup nearly instantaneously. There are a number of simple options for handing Time Travel Update Files in this situation, including maintaining them as a chain, merging delta files, or applying them back to the backup copy of the original persistent database. 14 SAND CDBMS: A Technological Overview Conclusion Ironically, while Business Intelligence has grown in importance, it has also developed a cumbersome infrastructure that hinders rapid re-sponse to evolving business needs. SAND CDBMS is designed to realize the initial promise of Business Intelligence, by enabling information delivery that is responsive to the pace and demands of business, and by placing “intelligence” directly in the hands of business users without requiring extensive assistance from the IT department. The advanced technological features described in this document combine to enable the speed, performance, flexibility, scalability and low cost of ownership of SAND CDBMS. Together, these features give organizations the ability to quickly implement powerful BI applications without disturbing their existing infrastructure, accelerating the benefits associated with enhanced business understanding. 15 SAND Technology White Paper About SAND Technology SAND CDBMS is the world’s most advanced massively parallel, column-oriented database software, accelerating the largest user populations, the biggest data, and the most complex analytics. SAND CDBMS also provides cost-effective nearline data access designed to lower TCO and improve operational performance for SAP NetWeaver BW, IBM DB2, Microsoft SQL Server, Oracle, SAS, and more. SAND Technology has offices in the United States, Canada, Western and Central Europe, and can be reached online at www.sand.com. 16 www.sand.com +1 877-468-2538 +44 (0)1276 804604 +49 40 211 0765 0