BETTER WITH BITEMPORAL
MARKLOGIC WHITE PAPER • JUNE 2015
In our age of billion-dollar regulatory fines and time-consuming, costly litigation,
a database must hold up as the main system of record. Unfortunately, traditional
databases do not keep a complete history of the past. Only with a bitemporal
database can you truly maintain a complete and accurate picture of the past,
understanding exactly “what you knew” and “when you knew it.”
ASSESSMENT: DO YOU NEED BITEMPORAL?
Before you go any further, it is probably helpful to first ask whether you might need bitemporal data management
in your organization. If you answer “yes” to any of the following questions, then bitemporal is a solution that you
should consider.
QUESTION | YES | NO
1. Is tracking when events or transactions occur critical to your business? | ✔ |
2. Are there ever cases when historical data needs to be updated? | ✔ |
3. Do you run into circumstances in which there is a lag between when something happened in the real world, and when it was recorded in the database? | ✔ |
4. Do you get frequent requests from regulators to review historical data? | ✔ |
5. Do you work in an industry in which the sequence of when you learn about certain information is significant, such as in law and intelligence? | ✔ |
6. Is the cost and complexity of storing and accessing historical data in your organization overwhelming? | ✔ |
7. Does managing and accessing historical data cost significant developer resources, or carry increasing risk over time? | ✔ |
Contents
Introduction
  The Cost of Not Having Bitemporal
Three Types of Temporality
  Non-temporal
  Unitemporal
  Bitemporal
The Benefits of Bitemporal
  Things You Can Do With Bitemporal
  The Increasing Need for Bitemporal
  Bitemporal Across Industries
Why Bitemporal Has Been Difficult
Why the Time for Bitemporal is Now
  Key Features of Bitemporal in MarkLogic
  Get Going Quickly
  More Information
INTRODUCTION
Today, databases are the primary system of record, not paper. In this new reality, organizations are required to keep an accurate picture of all the facts, as they occur. For certain industries such as financial services, insurance, and healthcare, there are even laws that mandate how historical data is tracked and managed.
Unfortunately, traditional databases cannot provide a truly accurate picture of your business at different points in time. The reason is that traditional databases are unitemporal, and can only track start and end times along a single timeline. But what if there is a lag between when something happened and when you found out about it? Which time should you record? Or, what if you realize you need to make a correction to when something happened, but do not want to overwrite any historical data? In those cases, a single timeline is not enough.
With a bitemporal database, you can store and query data along two timelines, with timestamps for both valid time, when a fact occurred in the real world (“what you knew”), and system time, when that fact was recorded to the database (“when you knew it”). By tracking events along two timelines with a bitemporal database, it is possible to keep a complete and accurate picture of your business at any given time for internal search and discovery purposes or for when regulators conduct audits.
Consider some of the new questions that a bitemporal database allows you to ask:
• What were my customer’s credit ratings last year as I knew them last quarter?
• What was our position with that security before the trade was amended?
• What did our intelligence indicate before we learned that new piece of information?
With a traditional unitemporal database, you can ask what your customer’s credit ratings looked like as you know them today, but not as you knew them yesterday or last quarter. Only a bitemporal database allows you to go back and see an accurate and unaltered picture of historical data, including past and present changes. A bitemporal database is necessary for today’s enterprises to be able to accurately explore historical data, manage that data across systems, ensure full data integrity, and do more complex analysis.
MarkLogic® is an Enterprise NoSQL database that is best suited for storing and managing bitemporal data for the following reasons:
• Flexible Data Model – MarkLogic’s document-oriented data model is schema-agnostic and able to manage the complexities of bitemporal data that relational databases are ill-suited for, such as integrity constraints, evolving schemas, and multiple different data models.
• Enterprise Reliability – MarkLogic has the enterprise features that other new generation databases do not. MarkLogic is a proven database that runs mission-critical applications at hundreds of world-leading organizations.
• Bitemporal Out-of-the-Box – Bitemporal is a feature built into MarkLogic, whereas other vendors make it an additional software add-on that increases cost and complexity.1
THE COST OF NOT HAVING BITEMPORAL
Not having bitemporal has been directly blamed for costing one company $25 Million.1 It has cost (or perhaps saved) many politicians their jobs. In our age of super-regulation and the need to maintain provenance, immutability, and governance with historical data, the potential cost of not using bitemporal grows much larger. This is particularly true in industries such as financial services, where not having an accurate picture of the past has contributed to multibillion-dollar fines and further increases in regulation.
1 Hudson Foods recalled one-fifth of their annual output in 1997 due to an outbreak of E. coli, costing them an estimated $25 Million. Their database only allowed them to see a current view of which beef came from which sources, and not a view of their data as it existed on the day the supplier processed the small batch of contaminated meat. This meant the entire product had to be recalled. For more information, see Richard T. Snodgrass’ book, Developing Time-Oriented Database Applications in SQL (ch.2, 11).
THREE TYPES OF TEMPORALITY
To understand bitemporal, you first have to understand
how databases currently manage time. In relation to
time, there are three basic categories of databases:
non-temporal, unitemporal, and bitemporal. Each
type is discussed below, using the example of when
a patient was diagnosed with an allergy and when the
doctor found out about it as a guide.
NON-TEMPORAL
Non-temporal databases store data with no time dimension. A fact is just a fact: there is no history and it is only understood to be true at the current point in time. Data models that do not support a time dimension are just called snapshots.
Just imagine the example of when a patient was diagnosed with an allergy, which is an important piece of information considering the potential adverse and even deadly reactions that some patients can have to common medications like penicillin. With a non-temporal database, you would just see the current state, which would be either “patient has no allergy” or “patient has a positive allergy diagnosis,” as depicted in Figure 1, in which the shaded area represents when the fact is true.
FIGURE 1: A non-temporal database does not store any time dimensions.
In a non-temporal database, you just get a single view of the data without respect to time. It should not be surprising that non-temporal databases are very uncommon, as most applications deal with time-varying data.
UNITEMPORAL
Unitemporal databases support time across one dimension: valid time. Most people think of valid time as just “time”: it represents when something happened in the real world. Valid time is tracked along a single timeline to answer questions such as: When did that patient get diagnosed with an allergy? How many patients have that same allergy? How long has the patient had the allergy? In the example of the patient with the allergy, it is clear from the graph in Figure 2 that the patient was diagnosed with an allergy at 9:00am along the valid timeline.
FIGURE 2: A unitemporal database only tracks valid time.
The problem is that valid time only shows a piece of the picture. Looking at the figure above, it would not be clear to an outside observer when the doctor learned that the patient was diagnosed with the allergy. What if it was the lab that first discovered the allergy, but there was a lag in time before the doctor actually found out about the lab results? That is valuable information that is not recorded in a unitemporal database. In this example, imagine if a drug was administered to the patient that day that caused an anaphylactic reaction. Didn’t the doctor know not to administer that drug? Let’s look at how to solve this problem with a bitemporal database.
BITEMPORAL
A bitemporal database records timestamps for events along two dimensions of time: valid time and system time. Valid time tracks when an event occurred in the real world. System time (sometimes called “transaction time”) tracks when the event is recorded to the database. These two time dimensions are depicted graphically along both axes in Figure 3. In this example, valid time represents when the lab discovered the allergy, and system time represents when the doctor found out about it and recorded it to his chart.
Unitemporal databases make the false assumption that valid time is always equal to system time, and in doing so lose valuable information. Sometimes, as Figure 3 depicts, valid time is equal to system time. But you would not know that unless you had a bitemporal database. A bitemporal database records time along both dimensions independently so you can keep accurate records.
FIGURE 3: A bitemporal database tracks both valid time and system time.
[Axes in Figures 3 to 5: valid time (“When the lab discovered the allergy”) and system time (“When the doctor found out about it”).]
Using the example of the patient with the allergy, imagine that the doctor actually found out about the allergy at 10:30am, an hour and a half after the lab did their tests and concluded that the patient had an allergy. The lab noted that the patient had an allergy at 9:00am, but that information did not get to the doctor until 10:30am. This represents a lag between valid time and system time, and would look like Figure 4.
FIGURE 4: A bitemporal database tracks lags in information.
Taking this example a bit further, imagine that later on the same day, at 11:30am, the doctor gets a call from the lab saying that they just discovered that they did the tests incorrectly. The lab result was actually negative: the patient does not have an allergy. This correction is shown in Figure 5. With a bitemporal database, it is easy to make corrections to historical data, and the process does not overwrite any data.
FIGURE 5: A bitemporal database tracks corrections without overwriting data.
By looking at Figure 5, we can ascertain the following facts:
• Before 10:30am (system time), the doctor did not know about the allergy
• At 10:30am (system time), the doctor recorded the patient having an allergy, which had been discovered by the lab at 9:00am (valid time)
• At 11:30am (system time), the lab and doctor discover the mistake and update the records to show that the patient does not have an allergy
With this timeline tracked across both axes, it is now possible to go back and see a true picture of events. This can be extremely helpful in understanding and avoiding mistakes, as the doctor’s decisions can be easily married to what he knew or did not know at any given point in time. In the setting of a hospital, drug allergies can be life threatening, so having an accurate record of when a patient was diagnosed and when care providers learn this information is critical.
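The allergy timeline above can be sketched in a few lines of plain JavaScript. This is a minimal in-memory illustration, not MarkLogic's API: the record shape, the helper names record and asOf, and the calendar date are all assumptions made for the example. Every change appends a new version and closes the system period of the version it replaces, so nothing is overwritten.

```javascript
// Illustrative in-memory bitemporal history for ONE fact about ONE patient.
// Hypothetical helper names; this is not a MarkLogic API.
const FOREVER = "9999-12-31T23:59:59";

// Append a new version and close the system period of the version it supersedes.
// Timestamps are ISO strings in one fixed format, so plain string comparison orders them.
function record(history, fact, validStart, systemTime) {
  for (const v of history) {
    if (v.systemEnd === FOREVER) v.systemEnd = systemTime;
  }
  history.push({
    fact,
    validStart, validEnd: FOREVER,                 // when it was true in the real world
    systemStart: systemTime, systemEnd: FOREVER,   // when the database believed it
  });
}

// What did the database say about `validTime`, as known at `systemTime`?
function asOf(history, validTime, systemTime) {
  const hit = history.find(v =>
    v.systemStart <= systemTime && systemTime < v.systemEnd &&
    v.validStart <= validTime && validTime < v.validEnd);
  return hit ? hit.fact : "no allergy diagnosis on record";
}

const chart = [];
// 10:30 system time: the doctor records the lab's 9:00 finding (the lag in Figure 4).
record(chart, "positive allergy diagnosis", "2015-06-01T09:00:00", "2015-06-01T10:30:00");
// 11:30 system time: the correction in Figure 5; the earlier version is kept, not overwritten.
record(chart, "no allergy", "2015-06-01T09:00:00", "2015-06-01T11:30:00");

const question = "2015-06-01T09:30:00"; // valid time we are asking about
console.log(asOf(chart, question, "2015-06-01T10:00:00")); // no allergy diagnosis on record
console.log(asOf(chart, question, "2015-06-01T11:00:00")); // positive allergy diagnosis
console.log(asOf(chart, question, "2015-06-01T12:00:00")); // no allergy
```

Asking the same valid-time question at three different system times returns three different answers, which is the "what you knew" and "when you knew it" distinction that a single timeline cannot express.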
The example of the allergy diagnosis may seem somewhat simple, but the same concept can be applied to any piece of data, whether it is when a financial trade occurred, when someone got insurance, or when someone owned a house. In all of these cases, START DATE and END DATE for both valid time and system time can be tracked in order to preserve the most accurate picture of reality.
TABLE 1: Comparing Unitemporal to Bitemporal for a variety of examples.
UNITEMPORAL | BITEMPORAL
When did the lab results indicate that the patient had an allergy to penicillin? | When did the lab results indicate that the patient had an allergy to penicillin, and when did the care provider learn about the allergy?
When was the sell order cancelled by the bank’s counterparty? | When was the sell order cancelled by the bank’s counterparty, and when did the trader learn that it was cancelled?
What reference data existed regarding trade events on December 4th? | What reference data did the trader actually have on December 4th?
When did John become eligible for insurance coverage, as the employment records indicate now? | When did John become eligible for insurance coverage, as the employment records indicated in 2012?
THE BENEFITS OF BITEMPORAL
Bitemporal, simply put, gives you a better way to manage time. No alternative to bitemporal, even temporal versioning, can provide a seamless, queryable, flexible view of historical data. Bitemporal is a critical capability any organization can take advantage of, and there is a growing need for it, particularly in industries that face increasing regulatory pressures and litigation, such as financial services, insurance, and healthcare. In these industries, organizations are having to better account for all of their past actions with the onset of new laws and litigation, more frequent and in-depth audits, and increased fines for non-compliance. Organizations that better manage their historical data are able to reduce their risk and get through audits unscathed.
THINGS YOU CAN DO WITH BITEMPORAL
• Handle Regulation and Audits – Provide an accurate picture of the past to meet requirements for increased transparency and accountability
• Manage Risk – Create better risk models and improve business intelligence by analyzing true historical data
• Reduce Costs – Simplify architecture and reduce the cost and operational risk of storing redundant historical data
THE INCREASING NEED FOR BITEMPORAL
The need to better manage regulatory concerns is growing in general, though it is having a particularly significant impact in certain industries, such as financial services. Large banks have been hit with record-breaking fines in recent years, coupled with an increase in regulatory pressures. Since 2009, banks in the U.S. and Europe have paid over $128 billion to regulators, and 2014 was the biggest year ever, with $65 billion in penalties and fines, about 40% greater than in 2013.2 Today, regulators are more intrusive and carry out more vigorous enforcement as they drill into the details.
“We’re in an era of very, very vigorous enforcement, of heightened super regulation. It’s not a one-off thing.”
Benjamin Lawsky, Superintendent for Financial Services, New York State6
According to Gerold Grasshoff, the global head of risk management and regulation at Boston Consulting Group, regulatory pressures are now a core issue for banks. “You have to change your operating model, change your products, change the legal risks now... Nothing is changing business models as much as the regulatory issues. That is the biggest strategic challenge.” To adapt to the changing way in which business is done, banks are having to change their IT and data management approaches to increase transparency.3
Other industries are also facing increased regulatory pressures. In healthcare, for example, there is the cost of medical errors, which some reports estimate at $1 Trillion.4 Knowing when and how errors occurred is critical to improving medical decision making and avoiding medical malpractice. And consider the growing cost of fraud and abuse across the healthcare industry, estimated to be anywhere between $82 Billion and $272 Billion in the U.S.5 Unfortunately, the general cost and complexity surrounding patient safety, malpractice litigation, and fraud and abuse is only increasing.6
By implementing bitemporal data management, organizations can take a bold step towards lowering risk, improving transparency, and gaining a competitive advantage.
2 James Sterngold, “For Banks, 2014 Was a Year of Big Penalties”, Dec. 30, 2014 <http://www.wsj.com/articles/no-more-regulatory-nice-guy-forbanks-1419957394>
3 Boston Consulting Group, “Building the Transparent Bank”, Dec. 2014 <https://www.bcgperspectives.com/Images/Building_the_Transparent_Bank_Dec_2014_tcm80-177814.pdf>
4 Andel, Davidow, Hollander, Moreno. “The economics of health care quality and medical errors.” Journal of Health Care Finance 39(1):39-50 (2012) <http://www.ncbi.nlm.nih.gov/pubmed/23155743>
5 Berwick, Hackbarth. “Eliminating waste in US health care.” JAMA 307(14):1513-6 (2012) <http://www.ncbi.nlm.nih.gov/pubmed/22419800>
6 James Sterngold. “For Banks, 2014 Was a Year of Big Penalties.” Wall Street Journal, 2014.
BITEMPORAL ACROSS INDUSTRIES
FINANCIAL SERVICES
Bitemporal helps large banks better manage their data and adapt to the changes in laws and regulation that are impacting how business is done. For example, bitemporal helps by providing an accurate record of trades as they occur and are amended. After trades are made, they are later reconciled with counterparties and updates often occur before the trade is closed. With a unitemporal database, updates overwrite historical data, which can put enormous risk on individual traders and entire companies. Bitemporal provides an accurate picture of the entire lifecycle of a trade review, including when changes to counterparty names, transaction IDs, or price corrections occurred.
TABLE 2: Bitemporal in Financial Services
BEFORE BITEMPORAL | AFTER BITEMPORAL
What do we think the trader’s position was, and what information do we think was available to the trader around the time when the trade was executed? | What was the trader’s exact position when the trade was executed, and what exact reference data was available at the time the trade was executed?
What were our customer’s credit ratings last year? | What were our customer’s credit ratings last year, as we knew them last quarter?
What was our market exposure when the trade was made at 11:00am? | What was our market exposure when that trade was made at 11:00am, as we knew it at 11:30am?
What was the company’s profit when we gave guidance? | What did we think the company’s profit was when we gave guidance?
“MarkLogic’s bitemporal offers the flexibility of correlating and delivering additional value of data (by providing intraday information, not just end-of-day information) to a diverse customer group—rapidly—that just hasn’t been fully realized before... In fact, MarkLogic’s bitemporal will provide an entirely new opportunity for our customers to perform additional analytics as well as enabling much richer capabilities in the area of compliance management.”
Paolo Pelizzoli, Global Head of Architecture, Global Technology Operations, Broadridge Financial Solutions
INSURANCE
In the insurance industry, bitemporal helps by providing a clear determination of coverage over the course of history, ensuring that even if there are retroactive changes, data is never overwritten. The insurance company can always go back and see a history of past coverage at any point in time in the past. An insurer may also want to know employee status, and may need an accurate picture of when an employee was actually with a company at any point in time, as they knew it at any point in time.
TABLE 3: Bitemporal in Insurance
BEFORE BITEMPORAL | AFTER BITEMPORAL
What was the estimated impact of the disaster on insurance premiums? | What was the estimated impact of the disaster on insurance premiums, before the data was adjusted retroactively?
Did the beneficiary have coverage at the point of diagnosis? | Did the beneficiary have coverage at the point of diagnosis, before the legislation was enacted?
Was the employee with the company when the event occurred? | Was the employee with the company when the event occurred, as indicated by your records at that time?
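A retroactive coverage change can be sketched as follows. This is only an illustration in plain JavaScript under stated assumptions: the row layout, the dates, and the function name coveredAsKnown are hypothetical and do not come from the paper or from any product API. The point is that the corrected row is added alongside the original one, so both "what the records say now" and "what the records said then" remain answerable.

```javascript
// Hypothetical coverage history: each row is one version of what the insurer believed.
// valid*  = when the coverage applied in the real world
// system* = when this version was the current record in the database
const FOREVER = "9999-12-31";
const coverage = [
  // Recorded on 15 January 2010: coverage starts 1 July 2010.
  { covered: true, validStart: "2010-07-01", validEnd: FOREVER,
    systemStart: "2010-01-15", systemEnd: "2010-08-10" },
  // Recorded on 10 August 2010: coverage is retroactively moved back to 1 May 2010.
  { covered: true, validStart: "2010-05-01", validEnd: FOREVER,
    systemStart: "2010-08-10", systemEnd: FOREVER },
];

// Was the beneficiary covered on `eventDate`, according to the records as they stood on `knownOn`?
// ISO dates in a fixed format compare correctly as plain strings.
function coveredAsKnown(rows, eventDate, knownOn) {
  return rows.some(r =>
    r.covered &&
    r.systemStart <= knownOn && knownOn < r.systemEnd &&
    r.validStart <= eventDate && eventDate < r.validEnd);
}

const diagnosisDate = "2010-06-15";
console.log(coveredAsKnown(coverage, diagnosisDate, "2010-06-30")); // false: records at the time
console.log(coveredAsKnown(coverage, diagnosisDate, "2010-08-31")); // true: after the retroactive change
```

A unitemporal table that simply updated the start date in place could only ever give the second answer.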
HEALTHCARE
Healthcare faces enormous challenges for all stakeholders, including providers, payers, and pharmaceutical and biotechnology companies. Bitemporal is one component of improvements in health IT that helps lower costs and improve outcomes by giving providers a more accurate picture of a patient’s history as varied teams direct the course of treatment, and an improved investigative tool when looking at adverse events. And when payers receive billing codes for procedures, they are able to track the full history of each patient. Even if changes to insurance coverage were made retroactively, no part of the history is lost. There are also benefits to pharmaceutical and biotechnology companies as they are able to use bitemporal to enhance decision making in both research and business.
TABLE 4: Bitemporal in Healthcare
BEFORE BITEMPORAL | AFTER BITEMPORAL
What did the patient’s chart look like when the medication was prescribed? | What did the patient’s chart look like when the medication was prescribed, before the chart was updated with the lab results?
What was the coverage determination for that patient in June 2010? | What was the coverage determination for that patient in June 2010, as we knew it in August 2010?
What did the clinical trial results indicate when you made the additional investment? | What did the clinical trial results indicate when you made the additional investment, before the research results were updated?
LAW AND INTELLIGENCE
Bitemporal helps paint a complete picture even when disparate facts are gathered piecemeal before and after certain events. With a more complete picture, government agencies have the ability to better understand motives and even better predict future events. During investigations, bitemporal enables law enforcement officers to go back and ask why you went down a certain path, which is particularly useful when investigations are resurrected from cold case files.
TABLE 5: Bitemporal in Law and Intelligence
BEFORE BITEMPORAL | AFTER BITEMPORAL
What was happening when we made the decision? | What did we think was happening when we made the decision?
When did the event happen? | When did the event happen, and when was that recorded?
Why do we currently think that we pursued that course of action? | What were we thinking when we pursued that course of action?
WHY BITEMPORAL HAS BEEN DIFFICULT
“Relational bitemporal offerings are not widely adopted because as time changes, the shape of the data usually changes as well… and RDBMS’ are not able to capture the evolving schema.”
Global Investment Bank
At this point you are likely asking, “If bitemporal is that important, why haven’t I heard about it?” Although there have been thousands of research papers written on the topic of temporal data in the past twenty years and the topic of bitemporal has been discussed by experts since the early 1990s, bitemporal is still relatively unknown.
Bitemporal clearly has incredible business value. Yet most analysts on the business side do not even know they can ask for bitemporal data because it is so seldom put into production. The problem is that with relational databases the complexities of implementing and maintaining bitemporal generally outweigh the benefits. In fact, just handling ordinary temporal data in a relational database can be a huge challenge.
Unfortunately, despite efforts to make bitemporal data easier to manage in relational databases, bitemporal remains an unreachable goal with traditional tools. The number of experts in the world who can manage the complexities inherent in bitemporal implementations using relational databases is probably limited to a special few individuals. Without going too deeply into the details of bitemporal data modelling, here are some of the key reasons why relational databases are ill-suited for bitemporal data management:
• Integrity Constraints – The relational data model comes with constraints such as referential integrity, entity integrity, and defined schemas that are not easily changed. Some constraints are specific to temporal data, such as child rows within a table only being able to include valid periods of time within the valid period of time defined by the parent row of the table. When bitemporal columns are added to a relational table, they can wreak havoc on the relational data model.
• Schema Evolution – There is incredible complexity when adding bitemporal to a relational model. Architectural and structural changes are temporal themselves, and when new columns are added with temporal dimensions or new tables are created as new data is ingested, the schema will change. Handling a changing schema and resulting changes in application code are complex projects already, even before trying to add bitemporal.
• Multiple Data Models – Handling schema evolution is a difficult challenge, but now imagine the task of handling multiple evolving schemas across multiple data models and data silos, and then aggregating them into a single source of truth. Data integration is an expensive task, but when bitemporal data is included, the complexity grows exponentially.
• Decline in Performance – Read and write performance typically dips because bitemporal queries must consider the additional axis of time in every query, and data usually spans multiple tables and in some cases even multiple servers. Attempts have been made to simplify queries and improve performance, but they have not gone far enough in eliminating the inherent complexity and performance issues caused by scattering bitemporal data across tables.
• Vendor Lock-in – Some vendors have begun to implement improvements in bitemporal. However, as happened in the past with implementing SQL standards, each vendor will implement them differently with their own syntax and then tack on an additional cost of the feature as an add-on.
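The temporal integrity constraint mentioned above (a child row's valid period must fall within its parent's valid period) is simple to state, and a small sketch makes the rule concrete. The row layout and the names isContained and findViolations are hypothetical, chosen only for this illustration; the difficulty the text describes lies in enforcing such a rule on every insert, update, and retroactive correction, not in the comparison itself.

```javascript
// Hypothetical check: a child row's valid period must lie inside its parent's valid period.
// Periods are half-open [validStart, validEnd) ISO dates, which compare correctly as strings.
function isContained(child, parent) {
  return parent.validStart <= child.validStart && child.validEnd <= parent.validEnd;
}

// Return the child rows that violate the constraint for a given parent.
function findViolations(parent, children) {
  return children.filter(c => c.parentId === parent.id && !isContained(c, parent));
}

const policy = { id: "P1", validStart: "2012-01-01", validEnd: "2013-01-01" };
const claims = [
  { id: "C1", parentId: "P1", validStart: "2012-03-01", validEnd: "2012-04-01" }, // inside the policy period
  { id: "C2", parentId: "P1", validStart: "2012-12-15", validEnd: "2013-02-01" }, // runs past the policy end
];

console.log(findViolations(policy, claims).map(c => c.id)); // [ 'C2' ]
```

Note that shortening the parent's valid period later can turn previously valid child rows into violations, which is one reason such constraints are hard to maintain once history can be corrected.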
Oftentimes, the response to the challenges of
implementing bitemporal in a relational database is
to find the next best solution. Here are some of the
common responses.
“Despite the near universality of time and the time-varying nature of the enterprise being modeled—a static and unmalleable configuration is rare and uninteresting—SQL quite frankly does a lousy job in capturing those aspects that are changing in time, or in providing constructs to effectively model, query, or modify such information.”
Richard Snodgrass, Developing Time-Oriented Database Applications in SQL8
• “But, I can just use Slowly Changing Dimensions” – Attempts to use dimensional modelling “type two” Slowly Changing Dimensions (SCDs) as a way to approximate bitemporal data have been made in recent years, and the problems with this approach have been well documented.7 Using SCDs only approximates valid temporal data and results in many inconsistencies that are later difficult to uncover and fix. And, even if everything is designed properly, query performance will likely still be slow and results may not be reproducible.
• “But, I can just take frequent snapshots” – This approach, also referred to as “temporal versioning,” is a more common argument against bitemporal, as most organizations are already taking regular weekly or monthly snapshots of their data. This approach is stable and predictable. Unfortunately, this approach results in massive amounts of redundant data, immense storage costs, and still lots of lost information because of the gaps between snapshots. And, even if frequent snapshots are taken, regulators in most industries view this as increasingly unacceptable. Both regulators and data analysts have specific questions, require fast answers, and do not appreciate any gaps.
• “But, I can just rely on my audit logs” – While useful for tracking event information, logs are not sufficient for bitemporal because they cannot be easily or quickly queried and would not meet standards for maintaining immutable records.
Bitemporal is the only approach to managing time that provides a quick and seamless way to look back at historical data, query it on the fly at any point in time, and work with it operationally just as you would with your most current data.8
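The information lost in the gaps between snapshots can be seen in a small sketch that reuses the allergy timeline from earlier in the paper. The snapshot times (10:00 and 12:00) and all names here are assumptions made for the illustration, not anything prescribed by the paper.

```javascript
// Hypothetical change log: what the live record said, and when it changed (system time, HH:MM).
const changes = [
  { at: "10:30", state: "positive allergy diagnosis" },
  { at: "11:30", state: "no allergy" }, // the correction
];

// State of the live record at a given time: the last change at or before `time`.
function stateAt(time) {
  const applied = changes.filter(c => c.at <= time);
  return applied.length ? applied[applied.length - 1].state : "no diagnosis recorded";
}

// Snapshot approach: copy the current state at fixed times only.
const snapshotTimes = ["10:00", "12:00"];
const snapshots = snapshotTimes.map(t => ({ takenAt: t, state: stateAt(t) }));

// Any state that never appears in a snapshot cannot be recovered from the snapshots later.
const seen = new Set(snapshots.map(s => s.state));
const lost = changes.filter(c => !seen.has(c.state)).map(c => c.state);

console.log(snapshots);
console.log(lost); // [ 'positive allergy diagnosis' ]
```

In this sketch the hour during which the chart showed a positive diagnosis falls entirely between two snapshots, so the snapshot archive holds no trace that the doctor ever believed the patient had an allergy.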
WHY THE TIME FOR
BITEMPORAL IS NOW
As an Enterprise NoSQL database, MarkLogic
provides the flexibility required to make storing and
managing bitemporal data a practical reality, without
sacrificing any performance with complex queries or
data resiliency and security. MarkLogic is also unique
in being the only Enterprise NoSQL database that has
bitemporal capability.
MarkLogic is schema-agnostic, and manages data
as documents. This means that you do not have to
maintain a strict schema that must be adhered to
throughout the life of the database. If you have to
Advantages of Bitemporal in MarkLogic
Schema-agnostic to handle schema evolution and multiple varying data models
Simpler coding and operations
Quicker time-to-value
Scalability, elasticity, and reduced storage costs
MarkLogic
Other DBs
✔
✔
✔
✔
✖
✖
✖
✖
8
Richard T. Snodgrass. Developing Time-Oriented Database Applications in
SQL. Morgan Kaufmann Publishers, Inc., San Francisco, July, 1999. <http://www.
cs.arizona.edu/~rts/tdbbook.pdf>
7 Tom Johnston. Bitemporal Data: Theory and Practice (Waltham, MA: Elsevier,
2014) 311 - 313.
8
HOW BITEMPORAL
WORKS IN MARKLOGIC
integrate a new data source at a later date, you do
not have to do complex ETL before loading that data
into MarkLogic. The frustration of having to add a new
column into a relational database simply disappears—
whether you are adding a DATE column or
anything else.
For those with a relational database background,
working with temporal and bitemporal data in
MarkLogic should be very familiar. The main difference
is that rather than columns of dates in a table, that
information now appears as timestamps within
documents. MarkLogic stores and manages all data as
documents, including bitemporal data.
Bitemporal data may have a lifespan of decades, and
organizations need a database that can respond rapidly
to keep pace with schema evolution as new data
sources are added. MarkLogic makes it easy to ingest
new data sources, and if there are conflicts that need
to be resolved (e.g., a new data source has the column
name “SRC_DATE” but it should be “CLAIM_DATE”),
MarkLogic makes it easy to perform the necessary
transformations to ensure a standard vocabulary.
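The renaming described above can be sketched in a few lines of plain JavaScript. This is a minimal illustration under stated assumptions, not MarkLogic's transformation API: the helper `renameKeys` and the sample record are hypothetical, and only the names `SRC_DATE` and `CLAIM_DATE` come from the text.

```javascript
// Hypothetical helper: returns a copy of `doc` with keys renamed per `mapping`.
// Keys that do not appear in `mapping` are copied unchanged.
function renameKeys(doc, mapping) {
  const result = {};
  for (const [key, value] of Object.entries(doc)) {
    const hasRule = Object.prototype.hasOwnProperty.call(mapping, key);
    result[hasRule ? mapping[key] : key] = value;
  }
  return result;
}

// Illustrative record from a newly added source (made-up values).
const incoming = { SRC_DATE: "2014-04-03", amount: 1200 };

// Map the source's column name onto the standard vocabulary.
const normalized = renameKeys(incoming, { SRC_DATE: "CLAIM_DATE" });
console.log(normalized);
```

Because the function returns a new object rather than editing the input, the record as it arrived from the source stays available for auditing.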
With MarkLogic, you never have to worry about
the constraints found with relational data modelling
such as entity integrity, referential integrity, and
denormalization—even when it comes to bitemporal
data management.
Whether working with JSON or XML documents, a
document is considered to be bitemporal if it includes
timestamps for valid start and end times, and for
system start and end times. One way to load a
bitemporal document into MarkLogic is with MarkLogic
Content Pump, or mlcp. You can also use the REST
API. Or, you can load a bitemporal document using a
simple JavaScript update query, as shown in
Figure 6.
After loading bitemporal documents into MarkLogic,
they are managed as a series of documents with range
indexes for valid and system time axes. The valid
and system time axes each serve as a container for
a named pair of range indexes. And, the bitemporal
documents are stored in temporal collections, which
are logical groupings of temporal documents. You
can create additional temporal collections if you have
documents that require a different schema for the
timestamps.
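To make the idea of two axes concrete, here is a minimal in-memory sketch in plain JavaScript. It is not the MarkLogic API and does not use range indexes: the names `axes`, `covers`, and `asOf`, the sample versions, the half-open treatment of periods, and the use of `null` for an open-ended period are all assumptions made for illustration.

```javascript
// Illustrative model only. Each axis is a named pair of properties
// holding the start and end timestamps of a period.
const axes = {
  valid:  { start: "validStart",  end: "validEnd" },
  system: { start: "systemStart", end: "systemEnd" }
};

// True when instant `t` falls inside the half-open period [start, end)
// on the given axis. A null end is treated as "still open" (assumption).
function covers(doc, axis, t) {
  const start = Date.parse(doc[axis.start]);
  const end = doc[axis.end] === null ? Infinity : Date.parse(doc[axis.end]);
  const instant = Date.parse(t);
  return start <= instant && instant < end;
}

// Point-in-time lookup: which versions were true at `validTime`
// according to what the database held at `systemTime`?
function asOf(docs, validTime, systemTime) {
  return docs.filter(d =>
    covers(d, axes.valid, validTime) && covers(d, axes.system, systemTime));
}

// Two versions of the same fact; the second one is a later correction.
const versions = [
  { price: 10, validStart: "2014-04-03T11:00:00Z", validEnd: "2014-04-03T16:00:00Z",
    systemStart: "2014-04-03T11:05:00Z", systemEnd: "2014-04-04T09:00:00Z" },
  { price: 12, validStart: "2014-04-03T11:00:00Z", validEnd: "2014-04-03T16:00:00Z",
    systemStart: "2014-04-04T09:00:00Z", systemEnd: null }
];

// What did we believe on April 3, and what do we believe after the correction?
const believedThen = asOf(versions, "2014-04-03T12:00:00Z", "2014-04-03T13:00:00Z");
const believedNow = asOf(versions, "2014-04-03T12:00:00Z", "2014-04-05T00:00:00Z");
console.log(believedThen.map(d => d.price), believedNow.map(d => d.price));
```

The same valid time gives different answers depending on the system time, which is the "what you knew" and "when you knew it" distinction.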
MarkLogic performs orders of magnitude better than
relational databases for large-scale data integration
projects, speeding up project delivery times by
reducing the amount of time spent doing requirements
gathering and data modelling, and improving the quality
of prototypes. At Broadridge, a large financial services
organization, it was remarked that “The first MarkLogic
project took 60 days… It was estimated to take 3,000
days with existing technology.”
FIGURE 6. Updating a bitemporal document
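// Mark this request as an update transaction.
declareUpdate();
// System times are left null; they are managed by MarkLogic rather than set by the caller.
var root =
  { "tempdoc": {
      "systemStart": null,
      "systemEnd": null,
      "validStart": "2014-04-03T11:00:00",
      "validEnd": "2014-04-03T16:00:00",
      "content": "some data, like closing price"
    }
  };
// Arguments: temporal collection name, document URI, document root.
temporal.documentInsert("temporalCollection", "exampledata.json", root);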
FIGURE 7. A bitemporal query as viewed in MarkLogic Query Console
Adapt to evolving schema – Avoid worrying about the
changing shape of the data over time. Unlike relational
databases, MarkLogic is schema-agnostic and can
easily manage schema changes over time
After initial documents are loaded into MarkLogic,
they are always kept and never changed. Even if a
bitemporal document is “deleted”, MarkLogic still
keeps the document, but the system end time is changed
from infinity to the time of the delete. The same process
works for updates: older versions are still kept and
the “new” version is simply added. MarkLogic also
does not allow updates to system start times. Once the
system time is set for a collection, it continues to roll
forward to further ensure the integrity of the data.
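The append-only behavior described above can be modeled with a short plain JavaScript sketch. It illustrates the idea only and is not MarkLogic's implementation: the class `VersionStore`, its method names, and the use of `null` to stand for an end time of "infinity" are assumptions.

```javascript
// Illustrative in-memory model of append-only versioning.
class VersionStore {
  constructor() {
    this.versions = []; // every version ever written; nothing is removed
  }

  // Stamp the system end time on the current (open) version of `uri`, if any.
  close(uri, now) {
    const current = this.versions.find(v => v.uri === uri && v.systemEnd === null);
    if (current) current.systemEnd = now;
  }

  // Insert and update are the same operation: close the old version, add a new one.
  insert(uri, content, now) {
    this.close(uri, now);
    this.versions.push({ uri, content, systemStart: now, systemEnd: null });
  }

  // "Delete" only closes the current version; the history stays in place.
  remove(uri, now) {
    this.close(uri, now);
  }

  history(uri) {
    return this.versions.filter(v => v.uri === uri);
  }
}

const store = new VersionStore();
store.insert("exampledata.json", "closing price 10", "2014-04-03T17:00:00Z");
store.insert("exampledata.json", "closing price 12", "2014-04-04T09:00:00Z"); // an update
store.remove("exampledata.json", "2014-04-05T09:00:00Z");                     // a "delete"

const history = store.history("exampledata.json");
console.log(history);
```

After the insert, update, and delete, both versions are still present; the only thing that has changed is each version's system end time.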
Maintain a Last Stable Query Time – A special
timestamp, called the LSQT (Last Stable Query Time),
can be enabled in order to manage and coordinate
system start times across systems
Combine with tiered storage – Use tiered storage to
easily migrate historical data to less expensive storage
tiers, without losing the ability to query the data
Keeping track of the provenance of information with
full governance and immutability is critical, which
is why MarkLogic applies its security model to
bitemporal documents. MarkLogic is certified by the
National Information Assurance Partnership (NIAP)
Common Criteria Evaluation and Validation Scheme,
and uses Role Based Access Control (RBAC) by
default to manage access to documents. This high
level of security ensures that historical records are
not tampered with, and that documents maintain their
permissions over time.
Combine with semantics – Assign bitemporal
elements to documents, whether they are RDF triples,
or documents that include RDF triples, giving you the
ability to track how relationships change over time
Combine with geospatial – Gain the ability to track
your data over time and space. MarkLogic stores
geospatial data, and now you can accurately track how
geospatial data changes over time
KEY FEATURES OF BITEMPORAL IN MARKLOGIC
Take advantage of certified security – Manage
bitemporal documents with the same certified security
as all other documents, using Role Based Access
Control (RBAC) or other security models
Insert, update (and never delete) – Ingest temporal
JSON or XML documents with references to valid time
using the Temporal API or mlcp, and make changes
without losing any data as new versions are added
Scale quickly and easily – Avoid any concerns
of under-provisioning with MarkLogic’s scale-out
architecture, which allows you to easily add nodes to
handle the increased demands of bitemporal data
Complex temporal queries – Query the database
along valid and system time axes using standard Allen
and SQL operators when comparing time periods
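For readers unfamiliar with Allen operators, the sketch below classifies how two time periods relate using Allen's thirteen interval relations. It is plain JavaScript written for illustration, not MarkLogic's query API: the function name `allenRelation`, the relation labels, and the numeric endpoints (for example, hours or epoch milliseconds, with start less than end) are assumptions.

```javascript
// Returns the Allen relation of period `a` to period `b`.
// Each period is { start, end } with numeric endpoints and start < end.
function allenRelation(a, b) {
  // Disjoint or touching periods.
  if (a.end < b.start) return "before";
  if (a.end === b.start) return "meets";
  if (b.end < a.start) return "after";
  if (b.end === a.start) return "met-by";

  // From here on the periods share at least some time.
  if (a.start === b.start && a.end === b.end) return "equals";
  if (a.start === b.start) return a.end < b.end ? "starts" : "started-by";
  if (a.end === b.end) return a.start > b.start ? "finishes" : "finished-by";
  if (a.start > b.start && a.end < b.end) return "during";
  if (a.start < b.start && a.end > b.end) return "contains";

  // Partial overlap: neither period encloses the other.
  return a.start < b.start ? "overlaps" : "overlapped-by";
}

// Illustrative periods, expressed as hours of a single day.
const session = { start: 11, end: 16 };
console.log(allenRelation({ start: 12, end: 14 }, session)); // during
console.log(allenRelation({ start: 9, end: 12 }, session));  // overlaps
console.log(allenRelation(session, { start: 16, end: 18 })); // meets
```

Exactly one relation holds for any pair of periods, which is what makes these operators useful for asking precise questions along either the valid or the system axis.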
“MarkLogic has a history of bringing advanced data management
technology to market, and many of their customers and partners are
accustomed to managing complex data in an agile manner. As a result,
MarkLogic customers and partners, in general, have a more mature
and creative view of how to manage and use data than do most other
database users.”
Carl Olofson, Research Vice President for Data Management Software Research, IDC
HOW TO GET STARTED
Managing time is not easy. If it were, we probably could
have avoided multi-billion dollar problems like Y2K.9
But managing time is a necessity, and bitemporal is
the future of managing time in databases as we seek to
maintain a better record of “what we knew” and “when
we knew it.” MarkLogic takes away the constraints
that prevent the adoption of bitemporal, and is the best
database for storing and managing bitemporal data.
MORE INFORMATION
• Read MarkLogic Documentation – Learn how to
work with bitemporal data in MarkLogic at
docs.marklogic.com/guide/temporal
• Watch a Presentation – Hear from a MarkLogic
customer about “Why Banks Care About
Bitemporal” www.marklogic.com/resources/whybanks-care-about-bitemporality/
• Schedule a Meeting – Discuss your particular
use case with a MarkLogic sales representative by
contacting us at [email protected]
GET GOING QUICKLY
1. Identify the questions your business cannot
currently answer
2. Identify the business benefits of adding bitemporal
3. Assess the current data management environment
4. Engage with MarkLogic to discuss implementation
5. Download MarkLogic
6. Learn more in MarkLogic’s free training
9 According to the BBC and ComputerWorld, the estimated cost of the preparation and remediation for the “Year 2000 problem”, or Y2K, was $608 Billion, and
that’s not taking into account inflation. For more information: Robert L. Mitchell.
“Y2K: The good, the bad and the crazy”. ComputerWorld (28 December 2009)
<http://www.computerworld.com/article/2522197/it-management/y2k--the-good-the-bad-and-the-crazy.html?page=2>