Distribution, Data, Deployment
Software Architecture Convergence
in Big Data Systems
Ian Gorton and John Klein
Presenter: Weicong Ma
Agenda
•  Background
•  The Challenges for Big Data
•  Big Data Application Characteristics
•  Example: Clinical Application
•  Systematic Design Using Tactics
•  Discussions
Background
The exponential growth of data
Data-intensive (big data) software systems
Open source and commercial data management technologies have been developed
Designing these systems requires:
•  Design tradeoffs spanning the distributed software, data and deployment architectures
•  Extending traditional software architecture design knowledge to account for the tight coupling that exists in scalable systems
Background
“…Distribution, data and deployment architectural qualities can no
longer be effectively considered separately…”
Distribution architecture: the level of design that deals with the high-level organization of computational elements and the interactions between those elements
Data architecture: a set of rules, policies, standards and models that govern and define the type of data collected and how it is used, stored, managed and integrated
Deployment architecture: depicts the mapping of a logical architecture to a physical environment
The Challenges for Big Data
Traditionally: SQL database technology
•  Vertical scaling
   – faster processors and bigger disks as workloads or storage requirements increase
•  Strictly defined, normalized data models
•  Strong data consistency guarantees
•  SQL standards
Recently: NoSQL products have emerged
•  Horizontal scaling
   – across clusters of low-cost, moderate-performance servers
   – partitioning and replicating datasets across a cluster
•  Schemaless and intentionally denormalized data models
•  Weak consistency
•  Proprietary APIs to expose data management mechanisms
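Horizontal scaling by partitioning and replicating datasets across a cluster can be illustrated with a minimal sketch. It assumes simple hash partitioning with a fixed replication factor, where each key's replicas sit on consecutive nodes; the function `place_key` and the node names are hypothetical, and real NoSQL products each use their own partitioning schemes.

```python
import hashlib
from typing import List


def place_key(key: str, nodes: List[str], replication_factor: int) -> List[str]:
    """Hypothetical placement helper: hash the key to pick a primary node,
    then place the remaining replicas on the following nodes (wrapping around)."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication_factor must be between 1 and len(nodes)")
    # Use a stable digest; Python's built-in hash() for str varies between runs.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    primary = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(primary + i) % len(nodes)] for i in range(replication_factor)]


# Demo: place a few hypothetical patient records on a four-node cluster.
cluster = ["node-a", "node-b", "node-c", "node-d"]
for record_key in ["patient:1042", "patient:2077", "patient:3001"]:
    print(record_key, "->", place_key(record_key, cluster, replication_factor=3))
```

With this modulo scheme, adding a node remaps most keys; consistent hashing is a common way to limit that data movement.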
The Challenges for Big Data: NoSQL
•  Each technology supports its own query mechanism
•  Programmers are responsible for formulating how queries execute
•  Programmers are responsible for combining results from different data collections
•  Programmers must manage consistency when concurrent updates occur, and design applications to tolerate stale data caused by latency in update replication
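Managing consistency under concurrent updates means the application itself must detect conflicting writes. One common way to do this is optimistic concurrency control with version numbers, sketched below as a single-process, in-memory simulation. `VersionedStore`, `write_if_version` and `update_with_retry` are hypothetical names; real NoSQL products expose their own, differing conditional-update mechanisms, which are not shown here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Versioned:
    value: Any
    version: int


class VersionedStore:
    """In-memory stand-in (hypothetical) for a store that supports conditional writes."""

    def __init__(self) -> None:
        self._data: Dict[str, Versioned] = {}

    def read(self, key: str) -> Versioned:
        # Missing keys read as version 0 with no value.
        return self._data.get(key, Versioned(None, 0))

    def write_if_version(self, key: str, value: Any, expected_version: int) -> bool:
        """Write only if no one else has written since the caller's read."""
        if self.read(key).version != expected_version:
            return False  # conflict: the caller's snapshot is stale
        self._data[key] = Versioned(value, expected_version + 1)
        return True


def update_with_retry(store: VersionedStore, key: str,
                      update: Callable[[Any], Any], max_attempts: int = 5) -> bool:
    """Read-modify-write loop: on conflict, re-read the latest value and retry."""
    for _ in range(max_attempts):
        snapshot = store.read(key)
        if store.write_if_version(key, update(snapshot.value), snapshot.version):
            return True
    return False


# Demo: two clients read the same version; only the first conditional write wins.
store = VersionedStore()
first = store.read("patient:7/insurer")
second = store.read("patient:7/insurer")
ok_first = store.write_if_version("patient:7/insurer", "Acme", first.version)
ok_second = store.write_if_version("patient:7/insurer", "Other", second.version)
ok_retry = update_with_retry(store, "visits", lambda v: (v or 0) + 1)
print(ok_first, ok_second, ok_retry)
```

The losing client is not silently overwritten: it learns its snapshot was stale and must re-read, which is exactly the burden the slide places on the programmer.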
The Challenges for Big Data: Tradeoffs
Fundamental quality constraints of distributed databases (Eric Brewer)
CAP Theorem: a distributed system cannot guarantee all three of the following properties at once
•  Consistency: each server returns the right response to each request
•  Availability: each request eventually receives a response
•  Partition tolerance: guaranteed properties are maintained even when network failures prevent some machines from communicating with others
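A toy model can make the CAP choice concrete: when a partition separates two replicas, a system can either refuse writes it cannot replicate (keeping consistency, losing availability) or accept them locally (staying available, letting replicas diverge). The `TwoReplicaRegister` class is a hypothetical, single-process illustration, not a model of any particular product.

```python
from typing import Any, Dict


class TwoReplicaRegister:
    """Toy value replicated on servers "r1" and "r2".

    mode="CP": during a partition, reject writes that cannot reach the peer.
    mode="AP": during a partition, accept writes locally; replicas may diverge.
    """

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values: Dict[str, Any] = {"r1": None, "r2": None}
        self.partitioned = False

    def write(self, at: str, value: Any) -> bool:
        """Write via replica `at`; return True if the write was accepted."""
        if at not in self.values:
            raise KeyError(at)
        peer = "r2" if at == "r1" else "r1"
        if self.partitioned:
            if self.mode == "CP":
                return False           # give up availability
            self.values[at] = value    # give up consistency
            return True
        self.values[at] = value        # no partition: update both replicas
        self.values[peer] = value
        return True

    def read(self, at: str) -> Any:
        return self.values[at]


cp = TwoReplicaRegister("CP")
ap = TwoReplicaRegister("AP")
for system in (cp, ap):
    system.write("r1", "v1")
    system.partitioned = True          # network failure separates r1 and r2

cp_accepted = cp.write("r1", "v2")     # rejected: replicas stay identical
ap_accepted = ap.write("r1", "v2")     # accepted: r2 now returns stale "v1"
print(cp_accepted, cp.read("r1"), cp.read("r2"))
print(ap_accepted, ap.read("r1"), ap.read("r2"))
```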
The Challenges for Big Data: Tradeoffs
Practical interpretation of the CAP Theorem (Daniel Abadi)
PACELC: if there is a Partition, trade off Availability and Consistency; Else, trade off Latency and Consistency
If there is a partition (P), how does the system trade off availability and
consistency (A and C);
else (E), when the system is running normally in the absence of partitions,
how does the system trade off latency (L) and consistency (C)?
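The Else (E) branch of PACELC can be illustrated with Dynamo-style quorum replication, which is one common mechanism rather than something the paper prescribes. Assuming N replicas with known response times, an operation that waits for k acknowledgments finishes when the k-th fastest replica answers, and every read quorum overlaps every write quorum exactly when R + W > N. The latency figures are made up for illustration.

```python
from typing import List


def quorum_latency(replica_latencies_ms: List[float], acks_needed: int) -> float:
    """Latency of an operation that must wait for `acks_needed` replica replies:
    it completes when the acks_needed-th fastest replica responds."""
    if not 1 <= acks_needed <= len(replica_latencies_ms):
        raise ValueError("acks_needed must be between 1 and the number of replicas")
    return sorted(replica_latencies_ms)[acks_needed - 1]


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Every read quorum of size r shares a replica with every write quorum
    of size w exactly when r + w > n."""
    return r + w > n


# Hypothetical latencies (ms): two nearby replicas and one remote replica.
latencies = [2.0, 5.0, 40.0]
configs = [(1, 1), (2, 2), (1, 3)]  # (R, W) pairs for N = 3
results = [
    (r, w, quorums_overlap(len(latencies), r, w),
     quorum_latency(latencies, r), quorum_latency(latencies, w))
    for r, w in configs
]
for r, w, overlap, read_ms, write_ms in results:
    print(f"R={r} W={w} overlap={overlap} read={read_ms}ms write={write_ms}ms")
```

R=1, W=1 is fastest but a read may miss the latest write; R=2, W=2 and R=1, W=3 both overlap, but R=1, W=3 pays the remote replica's 40 ms on every write in exchange for single-replica reads.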
The Challenges for Big Data: Others
•  Achieving high scalability and availability leads to highly distributed systems
•  The abstraction of a single system image, with transactional writes and consistent reads using SQL-like query languages, is difficult to achieve at scale
•  Each NoSQL product embodies a specific set of quality attribute tradeoffs
   – polyglot persistence: using different technologies to store different datasets in a single system
•  Required hardware resources grow as data volumes grow, and many widely used software architecture patterns are unsuitable at this scale
The Challenges for Big Data: Scale
•  Scale changes our designs’ problem space
Problems: partial failures, communication latencies, concurrency, consistency and replication
Scalable applications must treat failures as common events:
-  Replicate data
-  Make architecture components stateless, replicated, and tolerant of failures of dependent services
•  Economics-based implications
“At a very large scale, small optimizations in resource use can lead to very large cost reductions.”
•  Testing and fault diagnosis
-  Comprehensively validating code before deployment is impossible
-  Test at scale, relying on advanced monitoring and logging
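Treating failures as common events usually means a component calls its dependent services defensively. A minimal sketch, assuming failures surface as Python's built-in `ConnectionError`, retries a call with exponential backoff and random jitter; `call_with_retries` and the flaky lookup are hypothetical. The helper keeps no state of its own between calls, in line with the stateless-component guidance.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retries(operation: Callable[[], T], max_attempts: int = 4,
                      base_delay_s: float = 0.05,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call a dependent service, retrying on ConnectionError with exponential
    backoff plus jitter; re-raise the error after the final attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # The delay doubles with each attempt; jitter avoids synchronized retries.
            sleep(base_delay_s * (2 ** attempt) * random.uniform(0.5, 1.0))
    raise RuntimeError("unreachable")


# Demo: a hypothetical replica lookup that fails twice, then succeeds.
calls = {"count": 0}


def flaky_lookup() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("replica unavailable")
    return "patient record"


delays: List[float] = []
result = call_with_retries(flaky_lookup, sleep=delays.append)  # record delays instead of sleeping
print(result, calls["count"], len(delays))
```

Injecting `sleep` keeps the demo fast and makes the backoff schedule observable, which also helps when testing failure handling.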
Big Data Application Characteristics
Big data systems must be able to:
1.  Sustain write-heavy workloads
2.  Deal with variable request loads
3.  Support computation-intensive analytics
4.  Be highly available
These requirements crosscut the distribution, data and deployment
architectures
Example: elasticity requires…
•  Processing capacity that can be acquired from the execution platform on demand
•  Policies and mechanisms to appropriately start and stop services as the application load varies
•  A database architecture that can reliably satisfy queries under an increased load
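A policy for starting and stopping services as load varies can be sketched as a target-utilization rule. This is an assumption for illustration, not a policy from the paper: run enough instances that each sits near a target utilization, clamped between a minimum kept for availability and a maximum set by budget. `desired_instances`, `scaling_action` and all numbers are hypothetical.

```python
import math


def desired_instances(observed_load: float, capacity_per_instance: float,
                      target_utilization: float = 0.7,
                      min_instances: int = 2, max_instances: int = 20) -> int:
    """Number of service instances needed so each runs at about
    target_utilization of its capacity, clamped to [min_instances, max_instances]."""
    if capacity_per_instance <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("capacity must be positive and utilization in (0, 1]")
    needed = math.ceil(observed_load / (capacity_per_instance * target_utilization))
    return max(min_instances, min(max_instances, needed))


def scaling_action(current: int, desired: int) -> str:
    """Translate the target into a start/stop decision for the execution platform."""
    if desired > current:
        return f"start {desired - current}"
    if desired < current:
        return f"stop {current - desired}"
    return "no change"


# Hypothetical load samples in requests/s, with 100 requests/s per instance.
current = 4
plan = []
for load in [1000.0, 50.0, 5000.0]:
    desired = desired_instances(load, capacity_per_instance=100.0)
    plan.append((load, desired, scaling_action(current, desired)))
    current = desired
print(plan)
```

Production autoscalers usually add cooldown periods or hysteresis so instances are not started and stopped on every small load fluctuation.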
Example: Healthcare Application
Patient demographics (name, insurance provider …)
•  Immediately visible at the local site where the data was modified
•  Delay acceptable at other sites: eventual replica consistency
Diagnostic-test results (blood, image test results …)
•  Immediately visible everywhere: strong replica consistency
MongoDB prototype solution
Patient demographics (name, insurance provider …)
•  Writes are durable on the primary replica
•  Reads can be directed to the closest replica for low latency
Diagnostic-test results (blood, image test results …)
•  Writes are durable on all replicas
•  Reads are insensitive to partitions
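The two policies can be simulated without a database to show their observable difference. In MongoDB terms they correspond roughly to a write concern acknowledged by the primary only versus one acknowledged by every replica-set member, combined with a `nearest` read preference; the sketch does not use the MongoDB driver. `ReplicaSet`, the site names, and the assumption that the local site hosts the primary are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class ReplicaSet:
    """Toy primary/secondary replica set; the first site is the primary.
    durable_on_all=True copies a write to every replica before acknowledging it;
    durable_on_all=False acknowledges after the primary and replicates later."""

    def __init__(self, sites: List[str]) -> None:
        self.primary = sites[0]
        self.data: Dict[str, Dict[str, str]] = {site: {} for site in sites}
        self.pending: List[Tuple[str, str]] = []  # writes not yet on secondaries

    def write(self, key: str, value: str, durable_on_all: bool) -> None:
        if durable_on_all:
            for replica in self.data.values():
                replica[key] = value
        else:
            self.data[self.primary][key] = value
            self.pending.append((key, value))

    def replicate(self) -> None:
        """Asynchronous replication catching up: apply pending writes everywhere."""
        for key, value in self.pending:
            for replica in self.data.values():
                replica[key] = value
        self.pending.clear()

    def read_nearest(self, key: str, nearest_site: str) -> Optional[str]:
        """Serve the read from the caller's nearest replica for low latency."""
        return self.data[nearest_site].get(key)


sites = ["site-a", "site-b", "site-c"]  # site-a is local to the clinician
demographics = ReplicaSet(sites)   # eventual replica consistency
test_results = ReplicaSet(sites)   # strong replica consistency

demographics.write("patient:7/insurer", "Acme", durable_on_all=False)
test_results.write("patient:7/blood", "normal", durable_on_all=True)

local_view = demographics.read_nearest("patient:7/insurer", "site-a")    # "Acme"
remote_view = demographics.read_nearest("patient:7/insurer", "site-b")   # None: not yet replicated
result_view = test_results.read_nearest("patient:7/blood", "site-b")     # "normal"
demographics.replicate()
remote_after = demographics.read_nearest("patient:7/insurer", "site-b")  # "Acme"
print(local_view, remote_view, result_view, remote_after)
```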
Scale drives a consolidation of concerns so that the distribution, data
and deployment architectural qualities can no longer be effectively
considered separately…
Today, healthcare informatics applications:
•  Are built atop SQL databases
•  Have the physical data model and deployment topology hidden from developers
•  Separate concerns between the application and the database
Shift to NoSQL:
•  Fault handling depends on the physical data distribution
•  Low-level infrastructure concerns must now be explicitly handled in application logic
“Low-level infrastructure concerns, traditionally hidden under the database interface, must be explicitly handled in big data systems.”
Systematic Design Using Tactics
Tactics: elemental design decisions, embodying architectural knowledge
of how to satisfy one design concern of a quality attribute.
In designing an architecture, systematically select and apply a sequence of architecture tactics.
Tactic catalogs enable reuse of architectural knowledge, but existing catalogs don’t contain tactics specific to big data systems.
References
[1] Gorton, I., & Klein, J. (2015). Distribution, Data, Deployment: Software Architecture Convergence in Big Data Systems. IEEE Software, 32(3), 78-85. doi:10.1109/ms.2014.51
[2] Magee, J., Dulay, N., Eisenbach, S., & Kramer, J. (1995). Specifying Distributed Software Architectures. In Software Engineering - ESEC '95, Lecture Notes in Computer Science (pp. 137-153).
[3] Chapter 5 Designing a Deployment Architecture. (2004). Retrieved October 30, 2016, from https://docs.oracle.com/cd/E19199-01/817-5759/dep_architect.html
[4] What is Data Architecture? - Definition from Techopedia. (n.d.). Retrieved October 30, 2016, from https://www.techopedia.com/definition/6730/data-architecture
[5] NoSQL Databases: An Overview. (2014). Retrieved October 13, 2016, from https://www.thoughtworks.com/insights/blog/nosql-databases-overview
[6] Abadi, D. (2012). Consistency Tradeoffs in Modern Distributed Database System Design: CAP is Only Part of the Story. Computer, 45(2), 37-42. doi:10.1109/mc.2012.33
Discussions
Strengths and Weaknesses
Strengths:
•  Goals and motivations are clearly identified
•  Gives a comprehensive overview of the challenges, concerns and solutions in designing software architecture for big data problems
•  Uses the concrete example of a healthcare system to explain and demonstrate hard-to-understand concepts
Weaknesses:
•  Too much background information
•  Lacks in-depth detail, e.g., how were the tradeoffs handled in real-world problems?
•  Shows only a limited set of architecture tactics, for limited problem domains
•  The future work lacks in-depth discussion
Related Papers
[1] Abadi, D. (2012). Consistency Tradeoffs in Modern Distributed
Database System Design: CAP is Only Part of the Story. Computer,
45(2), 37-42. doi:10.1109/mc.2012.33
[2] Klein, J., & Gorton, I. (2015). Design Assistant for NoSQL
Technology Selection. Proceedings of the 1st International Workshop
on Future of Software Architecture Design Assistants - FoSADA '15.
doi:10.1145/2751491.2751494
[3] Kruchten, P. (1995). The 4+1 View Model of architecture. IEEE Software, 12(6), 42-50. doi:10.1109/52.469759
Future Work
Two complementary directions:
•  Expanding the collection of architecture tactics and encoding them in an environment that supports navigation between quality attributes and tactics, making the crosscutting concerns of design choices explicit
•  Linking tactics to design solutions based on specific big data technologies, enabling architects to rapidly relate a particular technology’s capabilities to a specific set of tactics
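The first direction, navigating between quality attributes and tactics, suggests a simple bidirectional index. This is only an illustration of that idea: `TacticCatalog` is hypothetical, and the tactic entries are illustrative examples drawn loosely from concerns discussed in these slides, not the paper's actual catalog.

```python
from collections import defaultdict
from typing import Dict, List, Set


class TacticCatalog:
    """Bidirectional index between quality attributes and tactics, so an architect
    can navigate from a quality concern to candidate tactics and back, and see
    which other qualities each tactic also affects (its crosscutting concerns)."""

    def __init__(self) -> None:
        self._tactics_for: Dict[str, Set[str]] = defaultdict(set)
        self._qualities_for: Dict[str, Set[str]] = defaultdict(set)

    def add(self, tactic: str, qualities: List[str]) -> None:
        for quality in qualities:
            self._tactics_for[quality].add(tactic)
            self._qualities_for[tactic].add(quality)

    def tactics_for(self, quality: str) -> Set[str]:
        return set(self._tactics_for.get(quality, set()))

    def qualities_for(self, tactic: str) -> Set[str]:
        return set(self._qualities_for.get(tactic, set()))

    def crosscutting(self, quality: str) -> Dict[str, Set[str]]:
        """For each tactic serving `quality`, the other qualities it also touches."""
        return {t: self.qualities_for(t) - {quality} for t in self.tactics_for(quality)}


# Illustrative (hypothetical) entries, not the paper's catalog.
catalog = TacticCatalog()
catalog.add("replicate data", ["availability", "consistency", "performance"])
catalog.add("partition data across nodes", ["scalability", "performance"])
catalog.add("quorum reads and writes", ["consistency", "availability", "performance"])
catalog.add("stateless, replicated services", ["availability", "scalability"])

availability_view = catalog.crosscutting("availability")
for tactic in sorted(availability_view):
    print(tactic, "also affects", sorted(availability_view[tactic]))
```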
Questions
•  Are there SQL databases that try to solve big data problems the way their NoSQL counterparts discussed in this paper do?
•  How could we expand our collection of architecture tactics? Any ideas?
•  Is it easy to link tactics to design solutions based on specific big data technologies?
•  Would it be easier to design a software system for a real-world big data application using these tactics? Is it easy to make the tradeoffs?
Thanks!
Any questions?