Download Course Catalog - Big Data Science School

Big Data Science
Certified Professional
(BDSCP)
Course
Catalog
Provided by Arcitura Education
Step 1: Get Trained
Take instructor-led workshops or
purchase Self-Study Kits
Step 2: Get Tested
Take exams anywhere in the world
via Pearson VUE testing centers or
Pearson VUE Online Proctoring
Step 3: Get Certified
Get recognized by attaining one
or more industry certifications
BIG DATA SCIENCE CERTIFIED PROFESSIONAL (BDSCP)
How to Get Trained
■■Public Workshops
Visit www.bigdataworkshops.com for
calendar.
■■On-Site Training
Contact [email protected] for details.
■■Self-Study
Visit www.bigdataselfstudy.com for details.
How to Get Tested
Exams are available worldwide via regional
testing centers. For details, visit
www.arcitura.com/exams.
Exams can also be proctored on-site as part
of private on-site and select public workshops.
How to Get Certified
Receiving passing grades on the exams that
correspond to a certification track results in the
automatic issuance of an official certificate.
The matrix at the center of this course catalog
shows how exams relate to certifications.
BIGDATASCIENCESCHOOL.COM
The Big Data Science Certified
Professional (BDSCP) program
comprises a comprehensive
curriculum of course modules, exams
and industry certifications that give
IT professionals the opportunity
to obtain formal accreditation
in recognition of proficiency in
specialized areas of Big Data
practice and technology.
The BDSCP curriculum is strictly
vendor-neutral and aligned
with the Big Data industry as a
whole. Its academic coverage of
contemporary Big Data topics
ensures that skills developed through
study are applicable to different
commercial Big Data vendor tools
and environments. This program was
developed in cooperation with
best-selling author Thomas Erl and
several organizations and
subject-matter experts.
To receive automatic updates about
new courses, exams and certifications,
send a blank e-mail to
notify@arcitura.com.
Module 1
Fundamental
Big Data
(Exam B90.01)
This foundational course provides a high-level overview
of essential Big Data topic areas. A basic understanding
of Big Data from business and technology perspectives
is provided, along with an overview of common benefits,
challenges and adoption issues.
The following primary topics are covered:
■■ Fundamental Terminology and Concepts
■■ A Brief History of Big Data
■■ Business Drivers leading to Big Data Innovations
■■ Characteristics of Big Data
■■ Benefits of Adopting Big Data
■■ Challenges and Limitations of Big Data
■■ Basic Big Data Analytics
■■ Big Data and Traditional Business Intelligence and
Data Warehouses
■■ Big Data Visualization
■■ Common Adoption Issues
■■ Planning for Big Data Initiatives
■■ New Roles Introduced by Big Data Projects
■■ Emerging Trends
For more information about course materials provided
during instructor-led workshops and as part of self-study
kits, visit: www.bigdatascienceschool.com/courses/module1
[Figure: BDSCP Master Symbol Legend]
[Figure: Module 1: Fundamental Big Data, Official Mind Map Supplement. Branches: Terminology & Concepts, Big Data Characteristics, Big Data Types, Big Data Drivers, Big Data Sources, Analytics, Business Intelligence, Data Visualization, Enterprise Technologies, Adoption & Planning Considerations]
[Figure: Module 1: Fundamental Big Data, Official Relationship Map Supplement]
Big Data Science Certified Professional (BDSCP) Program
www.arcitura.com • www.bigdatascienceschool.com
Copyright © Arcitura Education Inc.
Module 2
Big Data Analysis
& Technology
Concepts
(Exam B90.02)
This course explores a range of the most relevant
topics that pertain to contemporary analysis practices,
technologies and tools for Big Data environments.
The following primary topics are covered:
■■ The Big Data Analysis Lifecycle (from Dataset
Identification to Integration, Analysis and Visualization)
■■ Common Analysis and Analytics Techniques (including A/B
Testing, Regression, Correlation, Text Analytics, Sentiment
Analysis, Time Series Analysis, Network Analysis, Spatial
Analysis)
■■ Automated Recommendation, Classification, Clustering,
Machine Learning, Natural Language Processing, Semantic
Analysis, Data Visualization and Visual Analysis
■■ Assessing Hierarchies, Part-to-Whole Relationships,
Plotting Connections and Relationships, Mapping GeoSpatial Data
■■ Foundational Big Data Technology Mechanisms, Big Data
and Cloud Computing
■■ Big Data Storage (Query Workload, Sharding,
Replication, CAP, ACID, BASE)
■■ Big Data Processing (Parallel Data Processing,
Distributed Data Processing, Shared-Everything/Nothing
Architecture, SCV)
For more information about course materials provided
during instructor-led workshops and as part of self-study
kits, visit: www.bigdatascienceschool.com/courses/module2
[Figure: Module 2: Big Data Analysis & Technology Concepts, Official Mind Map Supplement. Branches: Big Data Analysis Lifecycle Stages (Business Case Evaluation, Data Identification, Data Acquisition & Filtering, Data Extraction, Data Validation & Cleansing, Data Aggregation & Representation, Data Analysis, Data Visualization, Utilization of Analysis Results), Big Data Analysis Techniques, Big Data Technology Components & Concepts, Big Data Mechanisms (Analytics Engine, Coordination Engine, Data Transfer Engine, Processing Engine, Query Engine, Resource Manager, Storage Device, Workflow Engine)]
Module 3
Big Data Analysis &
Technology Lab
(Exam B90.03)
This course module presents participants with a series
of exercises and problems designed to test their ability
to apply knowledge of topics covered previously in
Modules 1 and 2. Completing this lab will help foster
cross-topic proficiency and will assist in highlighting
areas that require further attention.
As a hands-on lab, this course provides a set of
detailed exercises that require participants to solve
a number of inter-related problems, with the goal of
fostering a comprehensive understanding of how Big
Data environments work from both front and back-ends,
and how they are used to solve real-world analysis and
analytics problems.
For instructor-led delivery of this lab course, the
Certified Trainer works closely with participants to
ensure that all exercises are carried out completely and
accurately. Attendees can voluntarily have exercises
reviewed and graded as part of the class completion.
For individual completion of this course as part of the
Module 3 Self-Study Kit, a number of supplements are
provided to help participants carry out exercises with
guidance and numerous resource references.
For more information about course materials provided
during instructor-led workshops and as part of self-study
kits, visit: www.bigdatascienceschool.com/courses/module3
[Figure: Module 3: Big Data Analysis & Technology Lab, Official Mind Map Supplement. Branches: Big Data Analysis Lifecycle Stages, Big Data Analysis Techniques, Big Data Technology Components & Concepts, Big Data Mechanics, Big Data Drivers, Big Data Types, Big Data Sources, Big Data Characteristics, Terminology & Concepts, Analytics, Business Intelligence, Data Visualization, Enterprise Technologies, Adoption & Planning Considerations]
Module 4
Fundamental Big
Data Analysis &
Science
(Exam B90.04)
This course provides an in-depth overview of essential
topic areas pertaining to data science and analysis
techniques relevant and unique to Big Data, with an
emphasis on how analysis and analytics need to be
carried out individually and collectively in support of
the distinct characteristics, requirements and challenges
associated with Big Data datasets.
The following primary topics are covered:
■■ Data Science, Data Mining and Data Modeling
■■ Big Data Dataset Categories
■■ Exploratory Data Analysis (EDA) (including numerical
summaries, rules, data reduction)
■■ EDA Analysis Types (including univariate, bivariate,
multivariate)
■■ Essential Statistics (including variable categories, relevant
mathematics)
■■ Statistics Analysis (including descriptive, inferential,
correlation, covariance, hypothesis testing)
■■ Data Munging and Machine Learning
■■ Variables and Basic Mathematical Notations
■■ Statistical Measures and Statistical Inference
■■ Distributions and Data Processing Techniques
■■ Data Discretization, Binning and Clustering
■■ Visualization Techniques and Numerical Summaries
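As an illustration of the numerical summaries named in the topics above, the following is a minimal Python sketch, not course material. It computes measures of central tendency and dispersion plus z-scores using only the standard library. The function names `numerical_summary` and `z_scores` are hypothetical, and the choice of the population standard deviation is an assumption made for simplicity.

```python
import statistics


def numerical_summary(values):
    """Return basic measures of central tendency and dispersion."""
    # quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3).
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,                       # interquartile range
        "stdev": statistics.pstdev(values),   # population standard deviation (assumption)
    }


def z_scores(values):
    """Express each value as a number of standard deviations from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # All values are identical, so every deviation is zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Calling `numerical_summary([2, 4, 4, 4, 5, 5, 7, 9])` gives a mean of 5 and a population standard deviation of 2, so the value 9 has a z-score of 2.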
For more information about course materials provided
during instructor-led workshops and as part of self-study
kits, visit: www.bigdatascienceschool.com/courses/module4
[Figure: Module 4: Fundamental Big Data Analysis & Science, Official Mind Map Supplement. Branches: Exploratory Data Analysis, Statistics Mathematics, Statistics Variable Categories, Statistics Analysis, Big Data Dataset Categories, Visualization]
Module 5
Advanced Big Data
Analysis & Science
(Exam B90.05)
This course delves into a range of advanced data
analysis practices and analysis techniques that are
explored within the context of Big Data. The course
content focuses on topics that enable participants
to develop a thorough understanding of statistical,
modeling and analysis techniques for data patterns,
clusters and text analytics, as well as the identification
of outliers and errors that affect the significance and
accuracy of predictions made on Big Data datasets.
The following primary topics are covered:
■■ Statistical Models, Model Evaluation Measures (including
cross-validation, bias-variance, confusion matrix, f-score)
■■ Machine Learning Algorithms, Pattern Identification
(including association rules, Apriori algorithm)
■■ Advanced Statistical Techniques (including parametric
vs. non-parametric, clustering vs. non-clustering distance-based,
supervised vs. semi-supervised)
■■ Linear Regression and Logistic Regression for Big Data
■■ Decision Trees for Big Data
■■ Classification Rules for Big Data
■■ K Nearest Neighbor (kNN) for Big Data
■■ Naïve Bayes for Big Data
■■ Association Rules for Big Data
■■ K-Means for Big Data
■■ Text Analytics for Big Data
■■ Outlier Detection for Big Data
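The model evaluation measures listed above (confusion matrix, precision, recall, f-score) can be sketched for a binary classifier as follows. This is an illustrative example rather than content from the course; the function names and the convention that label `1` is the positive class are assumptions.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive and a != positive:
            fp += 1
        elif p != positive and a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluation_measures(actual, predicted, positive=1):
    """Derive accuracy, error rate, precision, recall and F1 from the counts."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0   # also called sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

The F1 variant of the f-score is used here because it weights precision and recall equally; other weightings are possible.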
For more information about course materials provided
during instructor-led workshops and as part of self-study
kits, visit: www.bigdatascienceschool.com/courses/module5
[Figure: Module 5: Advanced Big Data Analysis & Science, Official Mind Map Supplement. Branches: Modeling, Classification, Pattern Identification, Outlier Detection, Model Evaluation Measures]
Module 6
Big Data Analysis &
Science Lab
(Exam B90.06)
This course module covers a series of exercises and
problems designed to test the participant’s ability
to apply knowledge of topics covered previously in
Modules 4 and 5. Completing this lab will help highlight
areas that require further attention, and will further
prove hands-on proficiency in Big Data analysis and
science practices as they are applied and combined to
solve real-world problems.
As a hands-on lab, this course incorporates a set
of detailed exercises that require participants to
solve various inter-related problems, with the goal
of fostering a comprehensive understanding of how
different data analysis techniques can be applied
to solve problems in Big Data environments and used
to make significant, relevant predictions that offer
increased business value.
For more information about course materials provided
during instructor-led workshops and as part of self-study
kits, visit: www.bigdatascienceschool.com/courses/module6
[Figure: Module 6: Big Data Analysis & Science Lab, Official Mind Map Supplement. Branches: Big Data Dataset Categories, Exploratory Data Analysis, Statistics Mathematics, Statistics Variable Categories, Statistics Analysis, Statistical Models, Classification, Pattern Identification, Outlier Detection, Model Evaluation Measures, Visualization]
Module 7
Fundamental Big
Data Engineering
(Exam B90.07)
This course explores introductory topics pertaining to
the field of developing data processing solutions (data
engineering) in the context of Big Data environments.
Specifically, it covers concepts, techniques and
technologies related to the processing and storage of
Big Data datasets, including MapReduce and NoSQL. It
highlights the unique challenges faced when processing
and storing Big Data datasets and further introduces
the main components of Hadoop, the de facto platform
for data processing and data storage within Big Data
environments.
The following primary topics are covered:
■■ Big Data Storage Terminologies (including sharding,
replication, CAP theorem, ACID, BASE)
■■ Big Data Storage Requirements
■■ On-Disk Storage (including distributed file system –
databases)
■■ Introduction to NoSQL – NewSQL
■■ NoSQL Rationale – Characteristics
■■ NoSQL Database Types (including key-value, document,
column-family and graph databases)
■■ Big Data Processing Requirements, Big Data Processing
(including batch mode and realtime mode)
■■ Introduction to MapReduce for Big Data Processing
(batch mode)
■■ MapReduce Explained (including map, combine,
partition, shuffle and sort, and reduce)
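The MapReduce stages named above (map, combine, partition, shuffle and sort, reduce) can be illustrated with a small single-process word count in Python. This is only a sketch of the data flow, not how Hadoop or any other engine implements it; all function names are hypothetical, and the partitioning rule (sum of character codes modulo the number of reducers) is an assumption chosen to keep the example deterministic.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combine (optional): pre-aggregate counts locally on the map side."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    """Partition: decide which reducer receives a given key."""
    return sum(ord(ch) for ch in key) % num_reducers


def shuffle_and_sort(mapped_outputs, num_reducers):
    """Shuffle and sort: group values by key per reducer, ordered by key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped_outputs:
        for key, value in pairs:
            partitions[partition(key, num_reducers)][key].append(value)
    return [sorted(p.items()) for p in partitions]


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines, num_reducers=2):
    """Run the stages in order over a list of text lines."""
    mapped = [combine(map_phase(line)) for line in lines]
    result = {}
    for reducer_input in shuffle_and_sort(mapped, num_reducers):
        for key, values in reducer_input:
            word, total = reduce_phase(key, values)
            result[word] = total
    return result
```

For example, `word_count(["big data big", "data science"])` returns a dictionary with counts of 2 for "big", 2 for "data" and 1 for "science". In a real cluster each map task and reduce task would run on a different node, and the shuffle would move data across the network.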
For more information about course materials provided
during instructor-led workshops and as part of self-study
kits, visit: www.bigdatascienceschool.com/courses/module7
[Figure: Module 7: Fundamental Big Data Engineering, Official Mind Map Supplement. Branches: Storage Device Characteristics, On-Disk Storage, Big Data Storage Terminology & Concepts, Processing Engine Characteristics, Fundamental Big Data Processing, MapReduce Algorithms]
Big Data Science Professional (BDSCP) Certification Matrix
Use this matrix to map exam requirements to certification tracks. These views can help you plan certification paths and discover how
exams that you have passed may already be giving you credit toward additional certifications. This matrix is available online at
www.bigdatascienceschool.com/matrix.
[Table: one row per exam, one column per certification; an x marks each exam required for a certification]
Certifications (columns):
■■ Certified Big Data Science Professional
■■ Certified Big Data Consultant
■■ Certified Big Data Scientist
■■ Certified Big Data Engineer
■■ Certified Big Data Architect
■■ Certified Big Data Governance Specialist
Exams (rows):
■■ B90.01 – Fundamental Big Data
■■ B90.02 – Big Data Analysis & Technology Concepts
■■ B90.03 – Big Data Analysis & Technology Lab
■■ B90.04 – Fundamental Big Data Analysis & Science
■■ B90.05 – Advanced Big Data Analysis & Science
■■ B90.06 – Big Data Analysis & Science Lab
■■ B90.07 – Fundamental Big Data Engineering
■■ B90.08 – Advanced Big Data Engineering
■■ B90.09 – Big Data Engineering Lab
■■ B90.10 – Fundamental Big Data Architecture
■■ B90.11 – Advanced Big Data Architecture
■■ B90.12 – Big Data Architecture Lab
■■ B90.13 – Fundamental Big Data Governance
■■ B90.14 – Advanced Big Data Governance
■■ B90.15 – Big Data Governance Lab
Module 8
Advanced Big
Data Engineering
(Exam B90.08)
This course builds upon Module 7 by exploring
advanced topics pertaining to the storage and
processing of Big Data datasets. Specifically, it covers
advanced Big Data engineering mechanisms, in-memory
data storage and realtime data processing. The
following primary topics are covered:
■■ Advanced Big Data Engineering Mechanisms (including
serialization & compression engines)
■■ In-Memory Storage Devices, In-Memory Data Grids &
In-Memory Databases
■■ Read-Through, Read-Ahead, Write-Through & Write-Behind
Integration Approaches
■■ Polyglot Persistence (including Explanation, Issues &
Recommendations)
■■ Realtime Big Data Processing Concepts (including
Speed Consistency Volume (SCV), Event Stream
Processing (ESP) & Complex Event Processing (CEP))
■■ General Realtime Big Data Processing & Realtime Big
Data Processing & MapReduce
■■ Advanced MapReduce Algorithm Design
■■ Bulk Synchronous Parallel (BSP) Processing Engine & BSP
vs. MapReduce
■■ Graph Data & Graph Data Processing using BSP
■■ Big Data Pipelines (including Definition and Stages)
■■ Big Data with Extract-Load-Transform (ELT)
■■ Big Data Solutions (including Characteristics, Design
Considerations & Design Process)
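Two of the integration approaches named above, read-through and write-through, can be sketched with an in-memory cache placed in front of a slower store. This is a simplified illustration under stated assumptions: the class name is hypothetical, a plain Python dict stands in for the on-disk storage device, and eviction, expiry and concurrency are ignored.

```python
class ReadThroughWriteThroughCache:
    """In-memory layer in front of a slower backing store (here, any dict)."""

    def __init__(self, backing_store):
        self._store = backing_store   # stand-in for on-disk storage (assumption)
        self._memory = {}             # in-memory copies of recently used items
        self.misses = 0               # how many reads had to go to the store

    def get(self, key):
        """Read-through: on a miss, load from the store and keep a copy."""
        if key not in self._memory:
            self.misses += 1
            self._memory[key] = self._store[key]
        return self._memory[key]

    def put(self, key, value):
        """Write-through: write to the store and the cache in the same call."""
        self._store[key] = value
        self._memory[key] = value
```

With write-through, the store is always up to date when `put` returns. A write-behind variant would instead queue the store update and apply it later, trading durability for lower write latency.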
For more information about course materials provided
during instructor-led workshops and as part of self-study
kits, visit: www.bigdatascienceschool.com/courses/module8
Module 9
Big Data
Engineering Lab
(Exam B90.09)
This course module covers a series of exercises and
problems designed to test the participant’s ability
to apply knowledge of topics covered previously in
Modules 7 and 8. Completing this lab will help highlight
areas that require further attention, and will further
prove hands-on proficiency in Big Data engineering
practices as they are applied and combined to solve
real-world problems.
As a hands-on lab, this course incorporates a set
of detailed exercises that require participants to
solve various inter-related problems, with the goal
of fostering a comprehensive understanding of how
different data engineering technologies, mechanisms
and techniques can be applied to solve problems in Big
Data environments.
For instructor-led delivery of this lab course, the
Certified Trainer works closely with participants to
ensure that all exercises are carried out completely and
accurately. Attendees can voluntarily have exercises
reviewed and graded as part of the class completion.
For individual completion of this course as part of the
Module 9 Self-Study Kit, a number of supplements are
provided to help participants carry out exercises with
guidance and numerous resource references.
For more information about course materials provided
during instructor-led workshops and as part of self-study
kits, visit: www.bigdatascienceschool.com/courses/module9
Module 7: Fundamental Big Data Engineering (Official Mind Map Supplement)
Storage Device Characteristics: scalability, redundancy & availability, fast access, long-term storage, schema-less storage, inexpensive storage
On-Disk Storage: distributed file system, database (RDBMS, NoSQL [key-value, column-family, document, graph], NewSQL)
MapReduce Algorithms: map task, reduce task; stages: map, combine (optional), partition, shuffle & sort, reduce
Fundamental Big Data Processing: cluster, batch mode, realtime mode
Processing Engine Characteristics: distributed/parallel data processing, schema-less data processing, multi-workload support, scalability, redundancy & fault-tolerance, low cost
Big Data Storage Terminology & Concepts: master-slave, peer-to-peer, replication, sharding, CAP theorem (consistency, availability, partition tolerance), ACID (atomicity, consistency, isolation, durability), BASE (basically available, soft state, eventual consistency)
www.arcitura.com • www.bigdatascienceschool.com
Certified Big Data
Science Professional
A Certified Big Data Science Professional has
demonstrated proficiency in the analysis practices
and technology concepts and mechanisms that
comprise and are featured in contemporary Big Data
environments and tools.
The following course modules are part of the official
Big Data Science Professional Certification curriculum:
Module 1: Fundamental Big Data
Foundational course that establishes a
basic understanding of Big Data from
business and technology perspectives,
including common benefits, challenges
and adoption issues.
Module 2: Big Data Analysis & Technology Concepts
Explores contemporary analysis practices,
technologies and tools for Big Data
environments at a conceptual level, focusing
on common analysis functions and features
of Big Data solutions.
Module 3: Big Data Analysis & Technology Lab
A hands-on lab providing a series of
real-world exercises for assessing and
establishing Big Data environments, and
for solving problems using Big Data
analysis techniques and tools.
Workshops & Self-Study
Attend an instructor-led
workshop, or purchase the
official Big Data Science
Professional Certification
Self-Study Kit Bundle.
Certified
Big Data Scientist
A Certified Big Data Scientist has demonstrated
proficiency in the application of techniques and tools
required for exploring large volumes of complex
data and the communication of the analysis results.
The courses in this certification track focus on the
application of numerous contemporary analysis and
analytics techniques.
In addition to Modules 1 and 2, the following courses
are part of this certification:
Module 4: Fundamental Big Data Analysis & Science
Essential coverage of Big Data analysis
algorithms, as well as the application
of analytics, data mining and basic
mathematical and statistical techniques.
Module 5: Advanced Big Data Analysis & Science
An in-depth course that covers the
application of a range of advanced
analysis techniques, including machine
learning algorithms, data visualization
and various forms of data preparation
and querying.
Module 6: Big Data Analysis & Science Lab
A case study-based lab providing a
series of real-world exercises that
require participants to apply Big Data
analysis and analytics techniques to
fulfill requirements and solve problems.
Workshops & Self-Study
Attend an instructor-led
workshop or purchase the
official Big Data Scientist
Certification Self-Study Kit
Bundle.
Certified
Big Data Consultant
A Certified Big Data Consultant has demonstrated
proficiency in the most common Big Data analysis
and analytics concepts and techniques, as well as
contemporary Big Data technologies, tools and solution
environments.
In addition to Modules 1 and 2, the following courses
are part of this certification:
Module 3: Big Data Analysis & Technology Lab
A hands-on lab providing a series of
real-world exercises for assessing and
establishing Big Data environments,
and for solving problems using Big
Data analysis techniques and tools.
Module 4: Fundamental Big Data Analysis & Science
Essential coverage of Big Data analysis
algorithms, as well as the application
of analytics, data mining and basic
mathematical and statistical techniques.
Module 7: Fundamental Big Data Engineering
Focuses on the hands-on usage of the
Hadoop and MapReduce frameworks,
HDFS, Hive, Pig, Sqoop, Flume and
NoSQL databases.
Workshops & Self-Study
Attend an instructor-led
workshop or purchase the
official Big Data Consultant
Certification Self-Study Kit
Bundle.
Certified
Big Data Engineer
A Certified Big Data Engineer has demonstrated
proficiency in utilizing, configuring and programming
an established Big Data solution (using Hadoop,
MapReduce and other tools) to customize and optimize
features in support of Big Data Scientists and general
business requirements.
In addition to Modules 1 and 2, the following courses
are part of this certification:
Module 7: Fundamental Big Data Engineering
Focuses on the hands-on usage of the
Hadoop and MapReduce frameworks,
HDFS, Hive, Pig, Sqoop, Flume and
NoSQL databases.
Module 8: Advanced Big Data Engineering
Builds upon Module 7 to delve into
advanced development, testing
and debugging techniques, as well as
the application of Big Data
design patterns.
Module 9: Big Data Engineering Lab
A hands-on lab during which
participants carry out a series of
exercises based upon the tools and
technologies covered in preceding
course modules.
Workshops & Self-Study
Attend an instructor-led
workshop or purchase the
official Big Data Engineer
Certification Self-Study
Kit Bundle.
Certified
Big Data Architect
A Certified Big Data Architect has demonstrated
proficiency in the design, implementation and
integration of Big Data solutions within IT enterprise
and cloud-based environments. The courses in this
certification track focus on a drill-down perspective of
Big Data platforms and environments via the definition
of mechanisms and architectural design patterns.
In addition to Modules 1 and 2, the following courses
are part of this certification:
Module 10: Fundamental Big Data Architecture
Coverage of the Hadoop stack,
data pipelines and other technology
architecture layers, mechanisms
and components, and associated
design patterns.
Module 11: Advanced Big Data Architecture
Drill-down of Big Data solution
environments, additional advanced
design patterns and coverage of cloud
implementations and various enterprise
integration considerations.
Module 12: Big Data Architecture Lab
A hands-on lab in which a set of real-world
exercises challenges participants
to build and integrate Big Data
solutions within IT enterprise and
cloud-based environments.
Workshops & Self-Study
Attend an instructor-led
workshop or purchase the
official Big Data Architect
Certification Self-Study
Kit Bundle.
Certified Big Data
Governance Specialist
A Certified Big Data Governance Specialist has
demonstrated proficiency in establishing and
administering Big Data governance frameworks that
standardize and regulate the Big Data lifecycle, the
bodies of data processed by Big Data solutions, as
well as the Big Data environments themselves.
In addition to Modules 1 and 2, the following courses
are part of this certification:
Module 13: Fundamental Big Data Governance
Introduces Big Data governance
frameworks, and covers the basics of
governing high-volume, multi-source data
and Big Data technology environments.
Module 14: Advanced Big Data Governance
Steps through the Big Data lifecycle
to cover specific precepts, processes
and associated policies for regulating
disparate bodies of data and Big
Data solution environments.
Module 15: Big Data Governance Lab
A hands-on lab during which
participants are required to
work with Big Data governance
precepts, processes and policies
to address a series of real-world
governance concerns.
Workshops & Self-Study
Attend an instructor-led
workshop or purchase
the official Big Data
Governance Specialist
Certification Self-Study
Kit Bundle.
Certified
Big Data Professional
A Certified Big Data Professional has mastered
the fundamental topic areas pertaining to
Big Data, and has met minimum BDSCP
qualifications by demonstrating proficiency in
at least one additional area.
To achieve this certification, Exam B90.01:
Fundamental Big Data must be completed with a
passing grade, together with a passing grade
in any one additional exam from the Big Data
Science Certified Professional (BDSCP) program.
The Certified Big Data Professional designation
can be used as a standalone accreditation
to verify fundamental competency. This
certification can also be used as an interim
accreditation for IT professionals pursuing one
or more specialized certification tracks.
Arcitura Education™
The Big Data Science
Certified Professional
program is operated and
overseen by Arcitura
Education Inc., a global
provider of vendor-neutral
IT training and
accreditation. To learn
more, visit: www.arcitura.com
Arcitura Community
Connect with the Arcitura Community via
Facebook, Twitter, LinkedIn and YouTube.
Explore the ever-growing network of
schools, practitioners, instructors, academic
institutions, authors and events.
www.arcitura.com/community
Becoming a Certified Trainer,
Training Partner or Reseller
Arcitura provides comprehensive programs
dedicated to the development of certified
trainers and different types of partnerships.
To learn more, contact:
[email protected]
ARCITURA.COM
Cloud Certified Professional (CCP)
The Cloud Certified Professional (CCP) program
establishes a series of vendor-neutral industry
certifications dedicated to areas of specialization
in the fields of cloud technology, architecture,
virtualization, storage, capacity management
and networking.
SOA Certified Professional (SOACP)
The SOA Certified Professional (SOACP) program
establishes a series of vendor-neutral industry
certifications dedicated to areas of specialization
in the fields of service-oriented architecture (SOA),
service-orientation and service-oriented computing.
SOACP certifications (SOASchool.com): SOA Certified Professional, SOA Certified Consultant, SOA Certified Analyst, SOA Certified Architect, SOA Certified Java Developer, SOA Certified .NET Developer, SOA Certified Governance Specialist, SOA Certified Security Specialist, SOA Certified QA Specialist
Ask for the latest CCP and SOACP Course Catalogs
www.cloudschool.com • www.cloudselfstudy.com • www.cloudworkshops.com
www.soaschool.com • www.soaselfstudy.com • www.soaworkshops.com