Download SAND CDBMS

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Data analysis wikipedia , lookup

Data model wikipedia , lookup

Expense and cost recovery system (ECRS) wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Information privacy law wikipedia , lookup

Data vault modeling wikipedia , lookup

SAP IQ wikipedia , lookup

3D optical data storage wikipedia , lookup

Concurrency control wikipedia , lookup

Open data in the United Kingdom wikipedia , lookup

Business intelligence wikipedia , lookup

Versant Object Database wikipedia , lookup

Database wikipedia , lookup

Relational model wikipedia , lookup

Clusterpoint wikipedia , lookup

Database model wikipedia , lookup

Transcript
Accelerating the largest user populations, the biggest data, and the most complex analytics™
SAND CDBMS
A Technological Overview
Contents
1. Introduction .........................................................................................................................................................................................................................................5
Architecture Overview ....................................................................................................................................................................................................................5
Supported Operating Systems ........................................................................................................................................................................................................6
2. SAND Technology’s Patented Storage Architecture ...............................................................................................................................................................................6
Tokenization ...................................................................................................................................................................................................................................6
Benefits of the SAND CDBMS Storage Architecture..........................................................................................................................................................................8
3. Virtual Mode Operation ........................................................................................................................................................................................................................9
4. Time Travel Database Versioning.........................................................................................................................................................................................................10
5. Workload Manager .............................................................................................................................................................................................................................11
Octopus Architecture ....................................................................................................................................................................................................................11
24/7 Operations ...........................................................................................................................................................................................................................12
6. Text Search .........................................................................................................................................................................................................................................12
7. Security ..............................................................................................................................................................................................................................................12
Database File Security ...................................................................................................................................................................................................................12
Virtual/Time Travel Mode Delta File Security.................................................................................................................................................................................13
Secure Record De-Identification....................................................................................................................................................................................................13
8. Backup and Recovery .........................................................................................................................................................................................................................14
Disaster Recovery..........................................................................................................................................................................................................................14
Conclusion..............................................................................................................................................................................................................................................15
About SAND Technology .........................................................................................................................................................................................................................16
Copyright ©2010 SAND Technology.
All rights Reserved. SAND CDBMS is a trademark and Nucleus is a registered trademark of SAND Technology. Other trademarks remain the property of their respective
owners.
Visit SAND Technology on the web at www.sand.com.
100620
SAND CDBMS: A Technological Overview
1. Introduction
SAND CDBMS is a specialized Business Intelligence (BI) software platform that streamlines and accelerates the development and deployment
of high-performance multi-user analytic applications. Unlike repurposed online transaction processing (OLTP) systems, SAND CDBMS implements a patented data structure specifically designed to provide speed and responsiveness for large and complex analytic tasks. This columnbased, tokenized, bitmap-indexed data structure offers very efficient storage of data at a granular level, and provides 100%-”indexing” of the
data without requiring imposition of specialized schemas. At the same time, it appears to users as a standard relational database and fully
supports standard SQL.
The key benefits provided by SAND CDBMS are speed, flexibility, responsiveness, scalability and low overall cost of ownership:
• Because it is designed specifically for high-performance analytic tasks, SAND CDBMS is very fast to set up without extensive requirements
for specialized IT resources, and very fast to respond to user requests.
• Because data is automatically 100%-”indexed”, SAND CDBMS empowers users with “go-anywhere” capabilities and responds to changes
in business requirements without the need for intervention by the IT department.
• Because it offers efficient data storage and scales in a sublinear fashion, SAND CDBMS is three to ten times more resource-efficient than
traditional RDBMSs or Multi-Dimensional Databases.
• Because it fully supports standard SQL and ODBC/OLE DB/JDBC interfaces, SAND CDBMS data can easily be integrated with existing or
fa-miliar query and reporting tools, reducing implementation and training costs.
• The Time Travel feature enables database updates concurrent with user access, permitting 24/7 operation without “down time”.
• SAND CDBMS’ workload management utility enables easy scaling within SMP or MPP environments.
This paper describes in detail the architecture of SAND CDBMS and shows how its various features can work as part of a powerful BI environment for organizations of any size that need fast and flexible analytical access to masses of corporate data.
Architecture Overview
SAND CDBMS typically sits at the heart of a high-performance BI application. Data can be loaded from flat files or any ODBC data source, and
once loaded can be immediately accessed using any SQL92-compliant, ODBC-, OLE DB- or JDBC-enabled BI front-end tool or application. This
permits SAND CDBMS to act as a single access point for data from multiple sources. Because data can be loaded from any legacy, operational or
third-party source without the necessity of first building a schema, users can begin to run queries immediately, yielding fast business benefits.
Figure 1: SAND CDBMS
5
SAND Technology White Paper
Since SAND CDBMS incorporates a three-phased data loader that has the ability to support scripted Extract, Transform and Load (ETL) rou-tines,
it is not necessary to use a third-party ETL tool. Most ETL tools were designed to load data into OLTP databases, and part of their func-tion is to
build the kind of indexes and schemas that are not required by SAND CDBMS. However, in certain high-performance applications, users may
choose an ETL tool in order to take advantage of specialized features such as aggregation-building.
Supported Operating Systems
SAND CDBMS is available for the Windows, Unix, and Linux operating systems. Hardware requirements are minimal: current implementa-tions
typically involve SAND CDBMSs running on two to four processors, depending on the number of concurrent sessions that need to be supported.
2. SAND Technology’s Patented Storage Architecture
SAND CDBMS has been designed to achieve a single goal: the transformation of operational or warehoused data into useful information in as
quick, efficient, and simple a way as possible. The unique, patented Nucleus® technology that is at SAND CDBMS’ core enables rapid and flexible
access to data coming from any source, which in turn enables fast return of business benefits at a very low total cost of ownership.
SAND CDBMS employs a column-based data storage architecture, implemented using tokens to represent atomic data values and encoded
bit vectors to represent data relationships. These features, central to SAND CDBMS’ data storage, retrieval and manipulation architecture, are
perfectly adapted to satisfy the unique set of requirements posed by the analytical data warehouse. Together, they yield spectacular ad hoc
query performance, tremendous memory and storage resource efficiency, and near-zero administration requirements.
Tokenization
Token database technology differs radically from standard database technology. In a standard database, when a record is added to the sys-tem,
a physical representation of the data is appended on disk. Consider a few simple records that might be found in a standard database management system, as shown in Table 1:
Table 1
Record 01
James Johnson
459-070-1872
Dallas
TX
1000.00
Record 02
Tom Swanson
498-998-1097
El Paso
TX
550.00
Record 03
James Thomas
409-550-1093
Waco
TX
1000.00
Record 04
Tom Jameson
409-209-1097
Dallas
TX
275.00
Record 05
Bill Johnson
498-550-1082
El Paso
TX
300.00
Record 06
Terry Thomas
459-220-9832
Dallas
TX
1000.00
Record 07
James Swanson
409-220-2987
El Paso
TX
575.00
Record 08
Bill Wilson
498-209-1098
Dallas
TX
1000.00
Record 09
John Thompson
498-207-1108
El Paso
TX
550.00
Record 10
John Thomas
459-207-2008
Waco
TX
1000.00
Each time a transaction is completed, a new record is added to the standard database. The scaling of data is said to be linear, because the volume
of data is a function of the number of records. However, as the example in Table 1 shows, there is significant redundancy of values in this kind
of database: the name “James” appears three times; the number “1000.00” appears 5 times; the state “Texas” appears 10 times, and so forth.
Now consider the possibility of creating a token for each entry in the database. “James” would have a token of 01 for a first name; “Texas” would
have a token of 01 for state name; “Dallas” might have a token of 01 for city name, and so on. The same can be done for the telephone number
area codes and local exchanges. The database could be reduced to a series of tokens, represented as in Table 2.
6
SAND CDBMS: A Technological Overview
Table 2
First Name
Last Name
City
State
James
01
Johnson
01
Dallas
01
Tom
02
Swanson
02
El Paso
Bill
03
Thomas
03
Waco
Terry
04
Jameson
05
John
05
Thompson
06
TX
Amount
01
Area Code
Prefix
Extension
1000.00 01
459
01
070
01
1872
01
02
550.00 02
498
02
998
02
1097
02
03
275.00 03
409
03
550
03
1093
03
300.00 04
220
04
1082
04
575.00 05
209
05
9832
05
207
06
2987
06
1098
07
1108
08
2008
09
SAND CDBMS employs a domain construct to record the correspondence between actual scalar data values and their respective tokens. This
allows multiple columns to reference the tokens and values contained in a single domain.
Table 3
Record 01
01
01
01
01
01
01
01
01
Record 02
02
02
02
02
02
02
01
02
Record 03
01
03
01
03
03
03
01
01
Record 04
02
04
03
01
02
01
01
03
Record 05
03
01
02
03
04
02
01
04
Record 06
04
03
01
04
05
01
01
01
Record 07
01
02
02
04
06
02
01
05
Record 08
03
05
02
05
07
01
01
01
Record 09
05
06
03
06
08
02
01
02
Record 10
05
03
03
06
09
03
01
01
While the database can be conceived as a structure like the one shown in Table 3, SAND CDBMS actually uses arrays of bit vectors (where a
‘1’ indicates that the value is present, and a ‘0’ that it is not) to represent the positional relationships of tokens — and hence their respective
atomic data values — in columns. In this respect SAND CDBMS is a “fully inverted” database, automatically keeping track of the location of each
data value to enable extremely fast searches for specific information.
It might be objected that these bit vector arrays could become extremely large when table columns contain many unique values. To avoid this
problem, SAND CDBMS applies a unique, patented encoding technique to the bit arrays, producing substantial, lossless compression of sparse
vectors (that is, vectors where either 1’s or 0’s predominate). The encoded bit vectors are further compressed in order to optimize stor-age
economy; when SAND CDBMS processes a database query, it works with the compressed form of the bit vectors, which provides an addi-tional
performance advantage.
The second crucial feature of SAND CDBMS’ data storage architecture is column-orientation. While traditional RDBMS’ are row-oriented, accessing data at the record level, SAND CDBMS accesses data in columns using the bit vectors as described above. Thus, instead of having to bring
each entire data record into memory for processing, only the columns that are required to execute the query are “read”.
For a more comprehensive description of SAND CDBMS’ data storage architecture, please refer to the SAND Technology white paper entitled
Nucleus: Technological Core of SAND CDBMS.
7
SAND Technology White Paper
Benefits of the SAND CDBMS Storage Architecture
Speed
Since SAND CDBMS separates a stored data value from bit vectors indicating whether it exists in a given database table column, data which
satisfies a certain criterion can be found very quickly: minimal work is required to simply scan the bit vectors. In fact, because of the patented
compression applied to the bit vectors, it is possible to find a unique value in a column that contains one billion rows using just four bytes of
data. It is this technique that allows SAND CDBMS to deliver the effective equivalent of 100% indexing, without imposing the need to create
the additional space-consuming and management-intensive index structures that traditional systems require to provide fast access to data.
Due to the column/token-based architecture, active working sets are very small. This characteristic alone accelerates queries by a factor of
two or three (or even more). On top of this, complex queries are primarily resolved using operations performed on encoded bit vectors, which
compounds the acceleration effect in most cases. When the speed of processing is greatly accelerated, many possibilities arise: for example, very
high-speed in-memory processing allows techniques for heuristic analysis that otherwise would not be feasible. Analysts can get quick answers
to queries that would require other systems to scan entire databases.
Storage Economy
Data tokenization, using domain-based storage along with the application of encoded bit vector representation, drastically reduces the storage volume required for structured data. In fact, the larger the amount of data, the greater the difference in storage requirements between
a standard linearly-scaling database and a tokenized database — in other words, the greater the number of records, the greater the storage
advantage of the token-based approach.
Flexibility
The SAND CDBMS storage architecture intrinsically indexes all columns in database tables, but without actually creating additional index structures. When all columns are indexed, the possibilities for analysis are unlimited — analysts can look at any field, in any manner. Query speed
is such that if the analyst wants to refine the results, a new query can be reformulated and rerun. All reformulation and recalculation can be
done in a fraction of the time that would have been required if the database resided in standard record-based storage with adjunctive index
structures. Furthermore, SAND CDBMS’ column-based architecture makes it uncomplicated to add or drop table columns to accommo-date data
model evolution.
Simplified Administration
Like a standard relational database, SAND CDBMS provides read-write access to data, with full ACID-compliant transaction support. However,
SAND CDBMS does this without imposing the usual price of administrative complexity. The benefits of this simplified administration are manifest in a number of areas.
First, SAND CDBMS takes responsibility for the details of storage out of the hands of the user, allowing completely free-form data manipula-tion
— including ad hoc query, transformation, bulk load, create and destroy steps — without the administrative intervention that would traditionally be required at each turn. This is significant because in a standard RDBMS, physical storage issues normally devour time and ad-ministrative
resources with constant structural reorganizations, I/O tuning, partitioning, cache tuning and indexing. SAND CDBMS uses pat-ented memory
management techniques together with a container-based storage model to remove most of the details of storage manage-ment from the
administrative workload. Administrators need only concern themselves with providing adequate file system storage to ac-commodate data
growth, and setting a maximum amount of memory to be used for data operations.
Second, SAND CDBMS uses the operating system to handle its file storage, so backup is also straightforward (see also section 7, Backup and
Recovery). The database can be backed up by simply copying the database files, even when they are being actively queried. In fact, commercially available file system backup utilities can be used for this purpose, with full exploitation of parallelism. The fact that a SAND CDBMS
da-tabase is made up of operating system files also makes it readily transportable from one system to another.
Finally, SAND CDBMS removes two other administrative activities from the equation: locking and tuning, both notorious for their voracious
appetite for DBA resources. SAND CDBMS employs a lock-less, optimistic concurrency control scheme that is notable for its complete absence of
administrative requirements. SAND CDBMS also requires little performance tuning, making it ideal for supporting data exploration activi-ties
consisting mainly of unexpected queries, as well as for prototyping applications.
8
SAND CDBMS: A Technological Overview
3. Virtual Mode Operation
SAND CDBMS’ Virtual Mode of operation is the foundation for some of its most powerful capabilities, particularly when it is used as a BI sys-tem
development facility. In Virtual Mode, any number of separate virtual instances of a single SAND CDBMS database can be made available,
without the need to physically replicate any data. Each of these virtual database instances can be altered and manipulated by users in any
way — including changes to data models, updates, inserts, deletes, and so on — without affecting users of other instances or touching the
actual data in the real database. Only users connected to the same virtual instance will see the changes.
In a SAND CDBMS system operating in standard (“Real”) mode, the data “pages” making up the database are in one of three states at any
given time: unread blocks on disk, unmodified pages in memory, and modified (“dirty”) pages in memory. Dirty pages are written back to the
disk-resident database space as necessary. When Virtual Mode is enabled, contrastingly, dirty pages are paged out to a separate, “temporary”
disk location. This means that, in Virtual Mode, a consistent version of a database is made up of the unread blocks in the main data space, dirty
pages in memory, and paged-out disk-resident blocks in the temporary location.
Figure 2: Virtual Mode vs. Real Mode
Thus client programs actually have “read-only” access to the real database when working in Virtual Mode. When a COMMIT WORK command
is executed by a user connected to a virtual instance, the database changes effected during the preceding transaction are recorded in a temporary Delta File associated with that instance, rather than in the database itself. When the instance is shut down, the Delta File is deleted, and
the changes are discarded. However, it is possible to preserve alterations to the database made in Virtual Mode, by using a special in-stance
configured as the master virtual instance. This permits changes recorded in the instance Delta File to be written back to the core data-base.
Changes made through the master virtual instance, whether live updates or high-speed bulk loads, can be performed concurrently with full
client access through the virtual engines. Virtual Mode even supports high-speed backup of the database files while virtual engines are active.
Figure 3: Multiple Concurrent Servers
9
SAND Technology White Paper
This becomes very powerful when multiple SAND CDBMS engines are set up to run against a single database, each with its own virtual mem-ory
and temporary space. The engines run completely independently of one another, with full read-write capability. Multiple engines can share
processors, run on multiple processors in an SMP environment, or even run on separate computers in a networked or clustered envi-ronment.
The “virtual database” concept is revolutionary, especially as it applies to the BI system design process. Separate audiences can take the data
through many different experimental scenarios without fear of polluting or irrevocably modifying the original legacy data. The ability to explore multiple scenarios in parallel magnifies the power of SAND CDBMS and represents a quantum leap in BI system development capabilities.
4. Time Travel Database Versioning
SAND CDBMS’ Time Travel feature is a unique database multi-versioning system, extending the basic Virtual Mode capabilities described
above to permit multiple different virtual versions or “Snapshots” of a SAND CDBMS database to be stored and re-used — without requiring
the creation of multiple physical copies of database files. As in Virtual Mode, a SAND CDBMS operating in Time Travel Mode records only the
changes made to the original data, and then represents the result as a new version of the complete database. However, this new version can
be preserved and used as a new starting point for database clients, who can make further updates and save them as a new Snapshot, in effect
producing a “chain” of interrelated virtual database versions. Any of these database versions can be concurrently accessed and freely updated,
without affecting the others.
Figure 4: Example of a Time Travel Environment
When working in Time Travel mode, the initial state of the core physical database can be
considered as the root node of a tree structure, which grows as each set of updates is saved
as a Snapshot. Altered database Snapshots (representing physical Update Files) are added to
the tree as child nodes dependent on the Snapshot from which they started. When further
changes are made to these new database Snapshots, a new level of nodes is added, forming
“branches” that represent the dependencies between the various states (see Figure 5).
At any time, selected Time Travel database Snapshots can be merged together (for example,
to collapse a series of daily updates into one that represents a week), or used to update the
core database. Furthermore, Snapshot merges and database updates can be processed “in the
background” and then, once complete, be substituted seamlessly as the current version of the
data made available to clients (this is accomplished using the Octopus workload management
utility, described in the next section).
Figure 5: The Tree Structure of Time Travel Database
Versions
The Time Travel option enables the following unique functionality for SAND CDBMS users:
• The possibility of “freezing” analytical environments so that long-running analyses are unaffected by updates to the core database
• The ability to interrogate the data as it existed at some time in the past, without having to restore any files, even while other users
continue to use the current version of the database
• The ability to preserve database structures created as part of “what-if” scenarios during data prototyping
• 24/7 operations (with Octopus), allowing updates with no “down time”.
10
SAND CDBMS: A Technological Overview
5. Workload Manager
A crucial component of large-scale SAND CDBMS implementations, the Octopus utility is designed to optimize the use of computing resources
in a busy database environment by automatically managing the workload generated by multiple clients using the same SAND CDBMS database.
When users issue queries simultaneously, Octopus transparently distributes query processing among a set of SAND CDBMS database engines
running in parallel. Octopus is especially well suited for use in SMP (symmetric multi-processing) computing architectures; it is also the central
component of the Massively Parallel Server Option (MPSO®), which allows SAND CDBMS implementations to scale to powerful multi-server
(MPP) environments..
Octopus Architecture
The basic architecture of the Octopus system is very simple. Each instance of the Octopus utility works with a single SAND CDBMS database.
When Octopus is started, it automatically starts the database with a specified number of special server engines, which can reside on a single
computer or on multiple computers connected over a network. In the second configuration, the system can start with a single server machine,
then add more as demand grows — and the database files can themselves also be distributed among the server machines.
All users connect to the same Octopus server, and proceed to work with the data as they would in a standard SAND CDBMS environment.
Octopus will automatically send a user’s query to an engine that is not busy processing another command; if none are free, the query is queued
for execution by the first engine that becomes available.
Furthermore, these engines can be grouped into classes of service that are configured to handle specific processing requirements, so that the
execution of complicated, long-running queries does not delay response to shorter-running ones. This feature gives DBA’s a simple and flexible
way to guarantee specific levels of database service, without having to worry about the management of individual database connections
.
Figure 6: The Octopus System
The operation of Octopus is entirely transparent: even while multiple connections to different engines are being made internally, users only
need to specify one connection to Octopus. Each client session continues uninterrupted until it is disconnected from the Octopus server. From
the point of view of ODBC connectivity, the Octopus database is a single data source.
11
SAND Technology White Paper
24/7 Operations
When Octopus is used in conjunction with Time Travel Mode, SAND CDBMS is easily configured for 24/7 operations, enabling round-the-clock
availability of data for clients. In Time Travel mode, database updates can be performed and saved as a new Snapshot “in the background” while
users continue to access the old data. Once the update is complete, the Octopus utility can be instructed to switch client connections transparently to make the updated version available to users.
6. Text Search
Advanced Text Search extends the analytic capabilities of SAND CDBMS to include high-performance searching within the blocks of unstructured text – email messages, documents, web pages , and so on.
With the SAND CDBMS Advanced Text Search option, users can easily locate valuable information that previously lay hidden and unexploited within vast amounts of textual data to power applications in areas like as e-discovery, web site analytics, fraud detection and more.
Enhancements to standard SQL provide extensive search capabilities, including:
•
•
•
•
•
SOUNDS LIKE
SPELLED LIKE
Proximity searches for words within or beyond a specified distance
“Concept” searches for synonyms
“Stem” searches for related words.
SSearch terms can be combined with the boolean operators AND, OR and NOT. These booleans connectors allow very elaborate search terms to
be constructed, and search results can be scored for relevance, making it simple to assess results and quickly find the precise information that
is required. Any or all of these searches can be incorporated within standard database queries, delivering unprecedented analytic capabilities
to users.
SAND CDBMS’ column-oriented data management technology is based on the use of inverted indexes, a structure which is ideal for full-text
indexing and searching: words in unstructured documents are tokenized and automatically indexed according to their location in each document, enabling very fast location of specific entries. The SAND CDBMS Advanced Text Search option takes full advantage of this inherent architectural adaptation to provide the highest performance for text analytics.
7. Security
The security features of a data management product allow administrators to prevent unauthorized access to and manipulation of data, thus
protecting a resource that is of vital significance to the enterprise. SAND CDBMS ensures data security not only by means of the usual SQL
mechanisms for controlling access to database contents (that is, database user authorizations and privileges), but also through the intrinsic
encryption provided by the tokenization/bitmap indexing process; furthermore, in data environments containing sensitive personal information, the optional Secure Record De-identification module can be implemented to guarantee the privacy of individuals.
Database File Security
An earlier section of this paper described how SAND CDBMS “tokenizes” all data held in database tables, and therefore needs to store each
unique value only once. A side effect of this process is that the database is intrinsically encrypted, since the original relationship between the
data token and its actual value is stored in the form of multiple encoded bit vectors that are unreadable to programs other than SAND CDBMS.
Furthermore, all data values (text strings, numbers, and date/time/timestamps) in a SAND CDBMS database are stored in an “obfus-cated” way
so that there is no readable clear text present in the database files; additionally, the stored values are not NULL delimited, so an intruder will not
be able to discern the boundaries between values. Thus, while a SAND CDBMS database consists physically of a collection of operating system
files that can be browsed, or copied as required for backup purposes, SAND’s “defense in depth” of stored data values en-sures that looking into
the contents of these files will reveal nothing useful about the database.
12
SAND CDBMS: A Technological Overview
As in any enterprise computing environment, the operating system administrator will be responsible for enforcing the necessary permissions on
the actual database files, in order to ensure that these files can be accessed only by the SAND CDBMS system account. Database end users do not
generally need to have direct access to the actual database files, since they will access the data through SAND CDBMS’ communications layer.
Virtual/Time Travel Mode Delta File Security
When changes are made to a SAND CDBMS database running in Virtual or Time Travel mode, a Delta File is produced. This is a temporary file that
is locked exclusively by SAND CDBMS, and so cannot be read by any other program. When the administrator decides to make this temporary file
permanent, the file is transformed into a persistent Update File. The Update File cannot be decoded by anyone — not even by SAND Technology
personnel — without access to the original database, for while the Update File may indicate that token ‘23’ has changed, it is impossible to
determine what value (or format) this token represents without reference to the original domain of values in the database.
One of the major benefits of Virtual Mode Update Files is that it is possible to distribute them to remote sites and apply them there to ‘slave’
databases, allowing the simple replication of the master database at remote locations using only a single command. These files can also be
compressed and then securely transferred using FTP (or, in some cases, even email) to the remote location; since a SAND CDBMS Update File
cannot be decoded without access to the original database, it will be of no use on its own should it be intercepted by an unauthorized user.
Secure Record De-Identification
Protection of individual privacy is a major concern in healthcare institutions, government departments, law enforcement agencies, and other
organizations that collect sensitive personal data. For situations where such considerations call for a particularly rigorous data security regime,
SAND Technology offers a Data Privacy Option for use with SAND CDBMS, allowing for analytics on “de-identified” data while retaining the
possibility of restoring the identification of records when required.
The SAND Data Privacy Module separates attributes in the database that allow identification of an individual from fields that do not, and
encrypts personal information using a AES 256-bit encryption key that is specific to the customer organization. This means that the central
repository holding all personal information is fully encrypted on disk, protecting the organization against theft of storage (for example, of
a tape backup containing a copy of the repository). Built-in monitoring ensures full compliance with any applicable legal regulations. The
SAND secure data model, incorporating a user profile manager to manage access privileges, permits correlation of records relating to a given
patient without decrypting any identifying information, enabling analytical tracking of visits or procedures while maintaining privacy. Sensitive
personal information may be “re-identified” on a user’s workstation (based on the user’s security profile) when analytical results are returned,
but only in strictly regulated circumstances. This means that all information transferred over the network is encrypted, protecting the confidentiality of the data against network hacking.
Figure 7: SAND Data Privacy Module
The SAND Advanced Data Security option features a utility for management of the access privilege profiles of various users in relation to specific
data elements in the warehouse. This utility identifies the level of sensitivity of each data element contained in SAND CDBMS and the access
level of each user of the server. For audit purposes, it also keeps track of all commands that grant or revoke user privileges on various data
elements.
13
SAND Technology White Paper
8. Backup and Recovery
SAND CDBMS databases consist of files that are visible to the operating system. This characteristic, when coupled with the SAND CDBMS Time
Travel architecture, makes database backup, recovery, and restoration considerably simpler (and frequently more effective) than is possible
with conventional data warehouse and data mart technologies. It also makes even large databases highly portable, yielding a very natural and
effective approach to full disaster recovery — something that can be extremely difficult or even impossible for conventional RDBMSs.
When deployed in Virtual Mode, all persistent database files can be backed up while the database is available for query, and even during load
or update processes. A separate database server instance can be set up to process periodic refreshes (load and update). All changes will then
be reflected in a Delta File under the control of this server instance. At predetermined intervals, the changes reflected in the Delta File can be
propagated back to the database.
SAND CDBMS’ Time Travel Mode provides 24x7 system availability, as well as incremental backup capability. Time Travel Mode permits the
maintenance of a chain of Update Files referring back to a persistent database. In this context, any pair of consecutive Update Files in a chain can
be merged without interrupting system availability. Merges back to the core database can be scheduled to occur weekly, monthly or at longer
intervals. In this case, the backup strategy would entail backing up the Update Files, constituting an effective incremental backup capability. A
database is restorable to the last complete set of changes by backing up the latest Update File in the chain.
Disaster Recovery
Conventional DBMS strategies for disaster recovery of high-availability data marts and warehouses are usually highly complex, expensive, and
subject to failure. The fact that SAND CDBMS’ databases are made up of files that are visible to the host operating system makes these databases
effectively “portable”. A database can literally be transferred to a new host system and mounted there with only a trivial amount of administrative overhead. This, together with Time Travel’s incremental backup capability, makes SAND CDBMS uniquely capable of realiz-ing simple and
effective backup and recovery in disaster scenarios.
Implementation of an off-site “shadow” SAND CDBMS system is straightforward:
1. Backup the entire set of persistent database files to an off-site system.
2. Periodically transfer incremental backups (Time Travel Update Files) to the off-site system via high-speed network transfer.
3. The original backup, together with the latest Time Travel Update File, represents the latest baseline of the primary system. In a disaster
scenario, operations can be switched over to the off-site backup nearly instantaneously.
There are a number of simple options for handing Time Travel Update Files in this situation, including maintaining them as a chain, merging
delta files, or applying them back to the backup copy of the original persistent database.
14
SAND CDBMS: A Technological Overview
Conclusion
Ironically, while Business Intelligence has grown in importance, it has also developed a cumbersome infrastructure that hinders rapid re-sponse
to evolving business needs. SAND CDBMS is designed to realize the initial promise of Business Intelligence, by enabling information delivery
that is responsive to the pace and demands of business, and by placing “intelligence” directly in the hands of business users without requiring
extensive assistance from the IT department. The advanced technological features described in this document combine to enable the speed,
performance, flexibility, scalability and low cost of ownership of SAND CDBMS. Together, these features give organizations the ability to quickly
implement powerful BI applications without disturbing their existing infrastructure, accelerating the benefits associated with enhanced business understanding.
15
SAND Technology White Paper
About SAND Technology
SAND CDBMS is the world’s most advanced massively parallel, column-oriented database software, accelerating the largest user populations,
the biggest data, and the most complex analytics. SAND CDBMS also provides cost-effective nearline data access designed to lower TCO and
improve operational performance for SAP NetWeaver BW, IBM DB2, Microsoft SQL Server, Oracle, SAS, and more. SAND Technology has offices
in the United States, Canada, Western and Central Europe, and can be reached online at www.sand.com.
16
www.sand.com
+1 877-468-2538 +44 (0)1276 804604 +49 40 211 0765 0