Download ColumnarDatabaseExperiencesV5 5390KB Feb 10 2014 12:05

Columnar Database
Experiences
Unlocking the Value of Big Healthcare Data
Enterprise Informatics, BCBSA
Nasir Khan
Bob Kero
Biography
Nasir Khan
Executive Director of Enterprise Informatics
BCBSA
Nasir has been with BCBSA for over 12 years and has over 25
years of leadership experience in the Healthcare Insurance,
Biomedical, Banking and Pharmaceutical industries.
Recently he was named “One to Watch” by CIO magazine.
2
Biography
Bob Kero
Managing Director of Enterprise Informatics
BCBSA
Bob has been with BCBSA for over four years and has over 25
years of professional leadership experience in the Healthcare
and P&C Insurance, Consulting, Healthcare Provider, and
Federal R&D industries. Additionally, he has been a guest
lecturer in the Graduate Department of Health Systems
Management, RUSH University.
3
Agenda
Living Big in Interesting Times
Our Own Big Data Challenge
Our Search For Effective Solutions
What We Were Able to Achieve
How Our Experience Can Help
4
Living Big In Interesting
Times
5
What is Big Data?
Is it a Terabyte of data?
• = 1,000 Gigabytes of data
Is it a Petabyte of data?
• = 1,000 Terabytes of data
Is it an Exabyte of data?
• = 1,000 Petabytes of data
Data whose size or structure is beyond the ability of an organization's existing technology or processes to use to full business advantage
6
2014:
The Big Data Inflection Point
For Healthcare?
[Timeline figure, 2011 to 2015. Drivers shown along the timeline:]
• Medical Loss Ratio
• Meaningful Use
• ICD-10/HIPAA 5010
• Accountable Care Organizations
• Health Info Exchanges
• HC Reform Coverage Mandates
• Disease mgmt/predictive modeling
• Payer M&As / Consolidation
• Value-Based Reimbursement / Shared Risk
• Medical Home
• Health Insurance Exchanges
• Medicare Advantage Cuts
• Diversification
• Evidence-Based Medicine
• Individual Insurance Growth
• Social Media
• Personalized Medicine
• Provider Collaboration
• International Expansion
• Medicaid Expansion
• Genetic Testing
7
The Challenge:
We Must Analyze Exponentially
Growing Healthcare Data Assets
8
The Opportunity:
Big Data Processing Offers
$300B Potential Annual Savings
to Healthcare
• Clinical, $165B: Transparency in clinical data & clinical decision support
• R&D, $108B: Personalized medicine & clinical trial design
• Accounts, $47B: Advanced fraud detection & performance-based drug pricing
• Public Health, $9B: Public health surveillance & response systems
• Business Model, $5B: Aggregation of patient records & online communities
Source: McKinsey Global Institute, May 2011
9
The Objective:
Big Data Solutions Facilitate
Healthcare Transformation
Evidence Based Healthcare
• Driven by healthcare outcomes
• Requires analyzing structured and semi-structured data
Health Outcomes
• Looking at all patient data to provide optimal care
• Scoring and outcomes-based incentive calculations
Patient Centered Care
• Knowing about lifestyles choices helps improve health outcomes
• Assess vital signs & diagnostic information from medical devices
Disease Management
• Processing of structured and unstructured data to identify and
manage chronic and emerging diseases
Drug Discovery & Genomics Analytics
• Integration of clinical, compound & journal information
• Combine clinical data with patient genomics
10
Our Own Big Data Challenge
11
BCBSA, FEP & BHI
Healthcare Data Repositories
[Data-flow diagram. Labels in the figure:]
• Federal Employee Program Claims
• FEP Systems
• National Account Claims
• BHI
• Member Plan Claims
• Third Party Data
• Daily: Claims, Enrollment, Provider IDs, Plans, Surveys
• Standard Reports
• Extract Files
• CCTI: Members & Claims
• CCTI Marts
• PDR: Providers
• Monthly: Claims, Membership, Contracts, Products, Provider IDs
• NDW: Members & Claims
• Ad-hoc Queries & Reports
• ADaM: Members & Claims
• BDR: BHI Data Repository
• Extract Files
• Plans
12
ADaM:
Key Functionality Centered
Around Business Flexibility
• Cost and utilization reporting (PMPM, Utilization and Trends)
• Detailed access to claim-line information, with data enhancements
• Development of custom comparisons
• Facility and provider level drill down
• Proactive identification of at-risk members (concurrent and prospective)
• Population risk adjustment functionality
• Prescription cost and utilization information
• Evidence-based physician quality performance
• Ability to design custom reports to suit specific business needs
13
ADaM:
An Encounter with the Challenges
of Big Data
June 2009: Ad-hoc Queries only
• Disappointing performance
Aug 2009: Hardware & DB2 upgrade
• Additional expense
• No improvement in query performance
Sep 2009: Indexes tuned for queries
• Additional effort
• Some performance improvement
Dec 2009: Add summary tables
• Additional effort
• Places a heavy load on database servers
2010: Other Reports deferred
• Each additional report needed summary tables to meet
performance requirements
14
Our Search For Effective Solutions
15
Assessment:
Emerging Trends in Big Data
DBMS Technology
Column Storage
• More data warehouses will be stored in highly compressed columnar fashion
Clustering
• Most large-scale database servers will achieve horizontal scalability through clustering
In Memory
• Most OLTP databases will be augmented by an in-memory database
NoSQL
• Many reporting problems will be solved with no-schema/NoSQL databases
Smart Tuning
• Many new systems will deemphasize partitioning schemes, indexing strategies and buffer management
16
Initial Choice: HP Vertica
An Advanced Columnar Database
• Massively Parallel Processing
• Column Storage
• Advanced Compression
• Standard SQL Interface
• High Availability
• Auto Database Design
Native DB-aware clustering on low-cost x86 Linux nodes
Simple integration with existing ETL and BI solutions
Not Supported:
• Referential integrity
• Triggers
• Stored procedures
17
What is a Columnar Database?
Imagine an Excel Spreadsheet to
Load into a Database
Customer Purchases
CustID  Name   City    Item ID  Description     Qty  Total
000001  Smith  Tucson  100101   Green Widgets   1    $50.00
000001  Smith  Tucson  100102   Blue Widgets    2    $100.00
000001  Smith  Tucson  100103   Yellow Widgets  1    $50.00
000002  Jones  L.A.    100101   Green Widgets   2    $100.00
000002  Jones  L.A.    100106   Orange Widgets  1    $50.00
18
What is a Columnar Database?
Let’s First Load it Into a Standard
Database, Oracle or IBM DB2
CustID  Name   City    Item ID  Description     Qty  Total
000001  Smith  Tucson  100101   Green Widgets   1    $50.00
000001  Smith  Tucson  100102   Blue Widgets    2    $100.00
000001  Smith  Tucson  100103   Yellow Widgets  1    $50.00
000002  Jones  L.A.    100101   Green Widgets   2    $100.00
000002  Jones  L.A.    100106   Orange Widgets  1    $50.00
Storage
• Each record is stored by row
Access Rule
• Read data row by row, moving from left to right in each row
Query
• Read the description of each widget ordered by ‘Smith’
Result
• Reads attributes of no interest, slow for big data
19
What is a Columnar Database?
Now Let’s Load it Into a Columnar
Database: HP Vertica
CustID  Name   City    Item ID  Description     Qty  Total
000001  Smith  Tucson  100101   Green Widgets   1    $50.00
000001  Smith  Tucson  100102   Blue Widgets    2    $100.00
000001  Smith  Tucson  100103   Yellow Widgets  1    $50.00
000002  Jones  L.A.    100101   Green Widgets   2    $100.00
000002  Jones  L.A.    100106   Orange Widgets  1    $50.00
Storage
• Each record is stored column by column
Access Rule
• Read only the columns needed to answer the query
Query
• Read the description of each widget ordered by ‘Smith’
Result
• Skips all columns of no interest, fast for big data
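The difference between the two layouts can be sketched in a few lines of Python. This is a toy model only: it counts how many stored values each layout touches for the 'Smith' query, using in-memory lists rather than disk blocks. The function names and the `values_read` counter are illustrative assumptions, not part of any database product.

```python
# Toy illustration only: real engines read disk blocks, not Python lists.
COLUMNS = ["CustID", "Name", "City", "ItemID", "Description", "Qty", "Total"]

ROWS = [
    ("000001", "Smith", "Tucson", "100101", "Green Widgets", 1, 50.00),
    ("000001", "Smith", "Tucson", "100102", "Blue Widgets", 2, 100.00),
    ("000001", "Smith", "Tucson", "100103", "Yellow Widgets", 1, 50.00),
    ("000002", "Jones", "L.A.", "100101", "Green Widgets", 2, 100.00),
    ("000002", "Jones", "L.A.", "100106", "Orange Widgets", 1, 50.00),
]

# Column layout: one list per attribute, all of equal length.
COLUMN_STORE = {name: [row[i] for row in ROWS] for i, name in enumerate(COLUMNS)}


def query_row_store(rows, customer):
    """Scan whole rows; every attribute of every row is touched."""
    name_idx = COLUMNS.index("Name")
    desc_idx = COLUMNS.index("Description")
    values_read = 0
    result = []
    for row in rows:
        values_read += len(row)
        if row[name_idx] == customer:
            result.append(row[desc_idx])
    return result, values_read


def query_column_store(store, customer):
    """Read only the Name and Description columns."""
    names = store["Name"]
    descriptions = store["Description"]
    values_read = len(names) + len(descriptions)
    result = [d for n, d in zip(names, descriptions) if n == customer]
    return result, values_read


row_result, row_reads = query_row_store(ROWS, "Smith")
col_result, col_reads = query_column_store(COLUMN_STORE, "Smith")
print(row_result, row_reads)
print(col_result, col_reads)
```

Both functions return the same three descriptions, but the row layout touches all 7 attributes of all 5 rows while the column layout touches only 2 columns of 5 values each.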
20
Columnar Storage Allows Options
for Data Compression of Repeating
Values
CustID  Name   City    Item ID  Description     Qty  Total
000001  Smith  Tucson  100101   Green Widgets   1    $50.00
000001  Smith  Tucson  100102   Blue Widgets    2    $100.00
000001  Smith  Tucson  100103   Yellow Widgets  1    $50.00
000002  Jones  L.A.    100101   Green Widgets   2    $100.00
000002  Jones  L.A.    100106   Orange Widgets  1    $50.00
Compress ~60%
Name       City
3 X Smith  3 X Tucson
2 X Jones  2 X L.A.
Requires much less disk I/O time to retrieve
Spends fast CPU time to save slow disk I/O time
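The "3 X Smith, 2 X Jones" notation above is a run-length encoding of a column. A minimal sketch of that idea in Python follows. It assumes plain run-length encoding over adjacent equal values; the slide does not say which encodings the product actually applies, and `rle_encode` / `rle_decode` are hypothetical helper names.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse runs of adjacent equal values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


names = ["Smith", "Smith", "Smith", "Jones", "Jones"]
cities = ["Tucson", "Tucson", "Tucson", "L.A.", "L.A."]

name_runs = rle_encode(names)
city_runs = rle_encode(cities)
print(name_runs)
print(city_runs)

# Fraction of stored entries saved for the Name column: 5 values become 2 runs.
entries_saved = 1 - len(name_runs) / len(names)
print(entries_saved)
```

Five stored values become two runs, a 60% reduction in entries. That happens to match the "~60%" on the slide, but whether the slide's figure was derived this way is an assumption. Decoding costs a little CPU, which is the trade the slide describes.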
21
Columnar Databases Are Mostly
Plug Compatible with Standard
Databases Like IBM DB2
           Columnar Database  DB2 / Oracle   NoSQL Database
Interface  SQL                SQL            Map Reduce, JSON, Other APIs
Behavior   Transactional      Transactional  Eventual Consistency
Storage    Column Storage     Row Storage    In Memory, Document, Key/Value
22
Compatibility Enables
a Manageable
Replacement Strategy
             IBM DB2 / Oracle / MS SQL Server  Columnar Database
Application  Access Layer                      Access Layer+
Interface    SQL                               SQL (w/ limits)
Behavior     Transactional                     Transactional
Storage      Rows                              Columns
23
Compatibility Enables
a Manageable
Replacement Strategy
From IBM DB2, Oracle or MS SQL Server to a Columnar Database:
• Update App Access Layer
• Update Database Schema
• Replace Stored Procedures
• Update Database Connectors
• Update Data Loading Process
• Tune Database Configuration
24
What We Were Able to Achieve
25
ADaM:
Performance Issue
Root Cause Assessment
ADaM is a relatively small data mart!
• Less than 2TB of raw data
Current system only scales at great expense
• Hardware upgrades
• More DB2 licenses
• Complex database optimizations
Required queries are inefficient in IBM DB2
Optimizations only support predefined queries; limited performance when asking ‘what if’
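Why summary tables help predefined reports but not 'what if' questions can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in; the table names, columns, and figures are invented for illustration and do not reflect the ADaM schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE claims (member_id TEXT, provider_id TEXT, month TEXT, paid REAL);
INSERT INTO claims VALUES
  ('M1', 'P1', '2009-06', 120.0),
  ('M1', 'P2', '2009-06', 80.0),
  ('M2', 'P1', '2009-07', 200.0),
  ('M3', 'P2', '2009-07', 50.0);
-- Summary table built for one predefined report: paid amount per month.
CREATE TABLE paid_by_month AS
  SELECT month, SUM(paid) AS total_paid FROM claims GROUP BY month;
""")

# Predefined report: answered from the small summary table.
monthly = conn.execute(
    "SELECT month, total_paid FROM paid_by_month ORDER BY month"
).fetchall()

# Ad-hoc 'what if' question: paid amount per provider.
# The summary table has no provider column, so the full claims table is scanned.
by_provider = conn.execute(
    "SELECT provider_id, SUM(paid) FROM claims "
    "GROUP BY provider_id ORDER BY provider_id"
).fetchall()

print(monthly)
print(by_provider)
```

Each new question that groups by a different attribute would need its own summary table to get the same benefit, which is the pattern the 2009 to 2010 timeline describes.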
26
HP Vertica Offered Savings for
Hardware and License Costs
Database Software
• Vertica POC: Vertica 4.0 pre-release build
• ADaM Production: DB2 LUW 9.5
Operating System
• Vertica POC: Red Hat Enterprise Linux 5
• ADaM Production: AIX 5.3
Compute Platform
• Vertica POC: 3 HP DL380 Intel nodes (total SPECint2006 rate: 324)
• ADaM Production: IBM P550 Power6 node; 2 IBM P570 Power6 nodes (total SPECint2006 rate: 294)
Storage Platform
• Vertica POC: 24 SCSI disks @ 146GB; 10,000 rpm; 3TB total usable space
• ADaM Production: 288 FC disks @ 146GB; 15,000 rpm; 29TB total usable space
Hardware Cost
• Vertica POC: ~9x less
Software Cost
• Vertica POC: ~2.7x less
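The platform figures above can be put side by side with a little arithmetic. The sketch below uses only the numbers on the slide; the dictionary keys and the helper name are illustrative, and raw capacity is computed in decimal terabytes as an assumption.

```python
# Figures taken from the comparison above.
poc = {"specint_rate": 324, "disks": 24, "disk_gb": 146, "usable_tb": 3}
prod = {"specint_rate": 294, "disks": 288, "disk_gb": 146, "usable_tb": 29}


def raw_capacity_tb(system):
    """Raw disk capacity in decimal terabytes (disk count x disk size)."""
    return system["disks"] * system["disk_gb"] / 1000


compute_ratio = poc["specint_rate"] / prod["specint_rate"]
disk_ratio = prod["disks"] / poc["disks"]

print(round(compute_ratio, 2))  # POC compute relative to production
print(disk_ratio)               # production disks per POC disk
print(raw_capacity_tb(poc), raw_capacity_tb(prod))
```

On these numbers the POC cluster has roughly the same aggregate compute rating as production while using one twelfth as many disks.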
27
HP Vertica ADaM
POC Findings
Query Performance
• 12 of 15 queries execute
faster by 40% to 1000%
• Most execute at least 150%
faster
• Run time reduced by at
least 60%
Load Performance
• ADaM database can be loaded
in less than 8 hours
• 500M rows/hour
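The "percent faster" and "run time reduced" figures above are two views of the same measurement. The sketch below assumes that "X% faster" means the query speed increased by X%, so the new run time is the old run time divided by (1 + X/100); the slide does not state its convention, and the function name is illustrative.

```python
def runtime_reduction(percent_faster):
    """Fraction of run time saved when a query runs `percent_faster` percent faster."""
    speedup = 1 + percent_faster / 100
    return 1 - 1 / speedup


for pct in (40, 150, 1000):
    print(pct, round(runtime_reduction(pct), 3))

# Upper bound on rows loaded in the stated window at the stated rate.
rows_per_hour = 500_000_000
load_window_hours = 8
max_rows_loaded = rows_per_hour * load_window_hours
print(max_rows_loaded)
```

Under that assumption, 150% faster corresponds to a 60% shorter run time, which agrees with the two query bullets. At 500M rows/hour, an 8-hour window covers at most 4 billion rows.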
28
How Our Experiences Can Help
29
Help Clarify Your Business
Vision For Big Data Support
Inventory your usage scenarios
• Current
• Known future
• Wishlist/dream-list future
Establish reasonable constraints
• People and platforms
• Money and time
Develop target SLAs with stakeholders
• Must-haves
• Nice-to-haves
30
Help Understand and Clarify
Your Specific Requirements
What’s your tolerance for specialized hardware?
What’s your tolerance for set-up effort?
What’s your tolerance for ongoing administration?
What are your insert and update requirements?
At what volumes will you run fairly simple queries?
What are your complex queries like?
For which third-party tools do you need support?
31
BCBSA Enterprise Informatics:
A Big Data Projects Accelerator
Flexible resource model
• Business intelligence focused practice
• Fast ramp-up and roll-off (elastic staffing)
Mature processes
• Rational Unified Process
• Agile/SCRUM
• Kimball Lifecycle
World class solution options
• Ability to deliver ultra-large scale informatics systems
• Leveraging advanced database technologies
• Service bus for fast deployment
Medical informatics experience
• How the business functions
• What drives cost and revenue
• What improves productivity and efficiency
Value Proposition
• Staffing cost benefits
• Faster time to delivery
• Instant business alignment
32
Informatics Service Bus Enables
Lower Risk & Faster Deployment
of Big Data Apps
App Hosting
• Plan Connexion
• Plan Custom Apps
• BHI & Partner Apps
Tools Layer
• Plan Custom Tools
• SAS
• COGNOS
• DataStage
• Other Standard Tools
Data Layer
• Plan Data (Vertica)
• Plan Data (DB2)
• NDW
• IPDS
• Etc.
Other labels in the diagram
• LDAP
• SiteMinder
• S-FTP
• Web Services
• Monitoring
• Administration
• SAML (Certificates)
• Client Access Portal
33
Summary
• Healthcare is undergoing radical & historic
changes
– Regulatory, business, research and clinical
• Use of big data will be mandatory to adapt and
succeed
• Columnar databases enable efficient use of big data
– Cost effectiveness, high performance, fault tolerance
• We are ready to support entry into the big data area
– Database, healthcare, analytics, and business expertise
– Available tools & infrastructure for POCs and deployment
34
Questions?
35