Big Data Science Certified Professional (BDSCP) Course Catalog™
Provided by Arcitura Education

Step 1: Get Trained
Take instructor-led workshops or purchase Self-Study Kits.

Step 2: Get Tested
Take exams anywhere in the world via Pearson VUE testing centers or Pearson VUE Online Proctoring.

Step 3: Get Certified
Get recognized by attaining one or more industry certifications.

BIG DATA SCIENCE CERTIFIED PROFESSIONAL (BDSCP)

How to Get Trained
■■ Public Workshops: Visit www.bigdataworkshops.com for calendar.
■■ On-Site Training: Contact [email protected] for details.
■■ Self-Study: Visit www.bigdataselfstudy.com for details.

How to Get Tested
Exams are available world-wide via regional testing centers. For details, visit www.arcitura.com/exams. Exams can also be proctored on-site as part of private on-site and select public workshops.

How to Get Certified
Receiving passing grades on the exams that correspond to a certification track results in the automatic issuance of an official certificate. The matrix at the center of this course catalog shows how exams relate to certifications.

BIGDATASCIENCESCHOOL.COM

The Big Data Science Certified Professional (BDSCP) program is comprised of a comprehensive curriculum of course modules, exams and industry certifications providing IT professionals with the opportunity to obtain formal accreditation in recognition of proficiency in specialized areas of Big Data practice and technology. The BDSCP curriculum is strictly vendor-neutral and aligned with the Big Data industry as a whole. Its academic coverage of contemporary Big Data topics ensures that skills developed through study are applicable to different commercial Big Data vendor tools and environments.

This program was developed in cooperation with best-selling author Thomas Erl and several organizations and subject-matter experts.

To receive automatic updates about new courses, exams and certifications, send a blank e-mail to notify@arcitura.com.
Module 1: Fundamental Big Data (Exam B90.01)

This foundational course provides a high-level overview of essential Big Data topic areas. A basic understanding of Big Data from business and technology perspectives is provided, along with an overview of common benefits, challenges and adoption issues.

The following primary topics are covered:
■■ Fundamental Terminology and Concepts
■■ A Brief History of Big Data
■■ Business Drivers Leading to Big Data Innovations
■■ Characteristics of Big Data
■■ Benefits of Adopting Big Data
■■ Challenges and Limitations of Big Data
■■ Basic Big Data Analytics
■■ Big Data and Traditional Business Intelligence and Data Warehouses
■■ Big Data Visualization
■■ Common Adoption Issues
■■ Planning for Big Data Initiatives
■■ New Roles Introduced by Big Data Projects
■■ Emerging Trends

For more information about course materials provided during instructor-led workshops and as part of self-study kits, visit: www.bigdatascienceschool.com/courses/module1

[Figure: BDSCP Master Symbol Legend]

[Figure: Module 1: Fundamental Big Data, Official Mind Map Supplement]

[Figure: Module 1: Fundamental Big Data, Official Relationship Map Supplement]

Big Data Science School™
Big Data Science Certified Professional (BDSCP) Program
www.arcitura.com • www.bigdatascienceschool.com
Copyright © Arcitura Education Inc.

Module 2: Big Data Analysis & Technology Concepts (Exam B90.02)

This course explores a range of the most relevant topics that pertain to contemporary analysis practices, technologies and tools for Big Data environments.
The following primary topics are covered:
■■ The Big Data Analysis Lifecycle (from Dataset Identification to Integration, Analysis and Visualization)
■■ Common Analysis and Analytics Techniques (including A/B Testing, Regression, Correlation, Text Analytics, Sentiment Analysis, Time Series Analysis, Network Analysis, Spatial Analysis)
■■ Automated Recommendation, Classification, Clustering, Machine Learning, Natural Language Processing, Semantics, Data Visualization and Visual Analysis
■■ Assessing Hierarchies, Part-to-Whole Relationships, Plotting Connections and Relationships, Mapping GeoSpatial Data
■■ Foundational Big Data Technology Mechanisms, Big Data and Cloud Computing
■■ Big Data Storage (Query Workload, Sharding, Replication, CAP, ACID, BASE)
■■ Big Data Processing (Parallel Data Processing, Distributed Data Processing, Shared-Everything/Nothing Architecture, SCV)

For more information about course materials provided during instructor-led workshops and as part of self-study kits, visit: www.bigdatascienceschool.com/courses/module2

[Figure: Module 2: Big Data Analysis & Technology Concepts, Official Mind Map Supplement]

Module 3: Big Data Analysis & Technology Lab (Exam B90.03)

This course module presents participants with a series of exercises and problems designed to test their ability to apply knowledge of topics covered previously in Modules 1 and 2. Completing this lab will help foster cross-topic proficiency and will assist in highlighting areas that require further attention.

As a hands-on lab, this course provides a set of detailed exercises that require participants to solve a number of inter-related problems, with the goal of fostering a comprehensive understanding of how Big Data environments work from both the front end and the back end, and how they are used to solve real-world analysis and analytics problems.

For instructor-led delivery of this lab course, the Certified Trainer works closely with participants to ensure that all exercises are carried out completely and accurately. Attendees can voluntarily have exercises reviewed and graded as part of the class completion.

For individual completion of this course as part of the Module 3 Self-Study Kit, a number of supplements are provided to help participants carry out exercises with guidance and numerous resource references.
For more information about course materials provided during instructor-led workshops and as part of self-study kits, visit: www.bigdatascienceschool.com/courses/module3

[Figure: Module 3: Big Data Analysis & Technology Lab, Official Mind Map Supplement]

Module 4: Fundamental Big Data Analysis & Science (Exam B90.04)

This course provides an in-depth overview of essential topic areas pertaining to data science and analysis techniques relevant and unique to Big Data, with an emphasis on how analysis and analytics need to be carried out individually and collectively in support of the distinct characteristics, requirements and challenges associated with Big Data datasets.
The following primary topics are covered:
■■ Data Science, Data Mining and Data Modeling
■■ Big Data Dataset Categories
■■ Exploratory Data Analysis (EDA) (including numerical summaries, rules, data reduction)
■■ EDA Analysis Types (including univariate, bivariate, multivariate)
■■ Essential Statistics (including variable categories, relevant mathematics)
■■ Statistics Analysis (including descriptive, inferential, correlation, covariance, hypothesis testing)
■■ Data Munging and Machine Learning
■■ Variables and Basic Mathematical Notations
■■ Statistical Measures and Statistical Inference
■■ Distributions and Data Processing Techniques
■■ Data Discretization, Binning and Clustering
■■ Visualization Techniques and Numerical Summaries

For more information about course materials provided during instructor-led workshops and as part of self-study kits, visit: www.bigdatascienceschool.com/courses/module4

[Figure: Module 4: Fundamental Big Data Analysis & Science, Official Mind Map Supplement]

Module 5: Advanced Big Data Analysis & Science (Exam B90.05)

This course delves into a range of advanced data analysis practices and analysis techniques that are explored within the context of Big Data. The course content focuses on topics that enable participants to develop a thorough understanding of statistical, modeling and analysis techniques for data patterns, clusters and text analytics, as well as the identification of outliers and errors that affect the significance and accuracy of predictions made on Big Data datasets.

The following primary topics are covered:
■■ Statistical Models, Model Evaluation Measures (including cross-validation, bias-variance, confusion matrix, f-score)
■■ Machine Learning Algorithms, Pattern Identification (including association rules, Apriori algorithm)
■■ Advanced Statistical Techniques (including parametric vs. non-parametric, clustering vs. non-clustering, distance-based, supervised vs. semi-supervised)
■■ Linear Regression and Logistic Regression for Big Data
■■ Decision Trees for Big Data
■■ Classification Rules for Big Data
■■ K Nearest Neighbor (kNN) for Big Data
■■ Naïve Bayes for Big Data
■■ Association Rules for Big Data
■■ K-Means for Big Data
■■ Text Analytics for Big Data
■■ Outlier Detection for Big Data

For more information about course materials provided during instructor-led workshops and as part of self-study kits, visit: www.bigdatascienceschool.com/courses/module5

[Figure: Module 5: Advanced Big Data Analysis & Science, Official Mind Map Supplement]

Module 6: Big Data Analysis & Science Lab (Exam B90.06)

This course module covers a series of exercises and problems designed to test the participant’s ability to apply knowledge of topics covered previously in Modules 4 and 5. Completing this lab will help highlight areas that require further attention, and will further prove hands-on proficiency in Big Data analysis and science practices as they are applied and combined to solve real-world problems.

As a hands-on lab, this course incorporates a set of detailed exercises that require participants to solve various inter-related problems, with the goal of fostering a comprehensive understanding of how different data analysis techniques can be applied to solve problems in Big Data environments and used to make significant, relevant predictions that offer increased business value.

For more information about course materials provided during instructor-led workshops and as part of self-study kits, visit: www.bigdatascienceschool.com/courses/module6

[Figure: Module 6: Big Data Analysis & Science Lab, Official Mind Map Supplement]
Module 7: Fundamental Big Data Engineering (Exam B90.07)

This course explores introductory topics pertaining to the field of developing data processing solutions (data engineering) in the context of Big Data environments. Specifically, it covers concepts, techniques and technologies related to the processing and storage of Big Data datasets, including MapReduce and NoSQL. It highlights the unique challenges faced when processing and storing Big Data datasets and further introduces the main components of Hadoop, the de facto platform for data processing and data storage within Big Data environments.

The following primary topics are covered:
■■ Big Data Storage Terminologies (including sharding, replication, CAP theorem, ACID, BASE)
■■ Big Data Storage Requirements
■■ On-Disk Storage (including distributed file system and databases)
■■ Introduction to NoSQL and NewSQL
■■ NoSQL Rationale and Characteristics
■■ NoSQL Database Types (including key-value, document, column-family and graph databases)
■■ Big Data Processing Requirements, Big Data Processing (including batch mode and realtime mode)
■■ Introduction to MapReduce for Big Data Processing (batch mode)
■■ MapReduce Explained (including map, combine, partition, shuffle and sort, and reduce)

For more information about course materials provided during instructor-led workshops and as part of self-study kits, visit: www.bigdatascienceschool.com/courses/module7

[Figure: Module 7: Fundamental Big Data Engineering, Official Mind Map Supplement]

Big Data Science Professional (BDSCP) Certification Matrix

Use this matrix to map exam requirements to certification tracks. These views can help you plan certification paths and discover how exams that you have passed may already be giving you credit toward additional certifications. This matrix is available online at www.bigdatascienceschool.com/matrix.

Exams:
B90.01 – Fundamental Big Data
B90.02 – Big Data Analysis & Technology Concepts
B90.03 – Big Data Analysis & Technology Lab
B90.04 – Fundamental Big Data Analysis & Science
B90.05 – Advanced Big Data Analysis & Science
B90.06 – Big Data Analysis & Science Lab
B90.07 – Fundamental Big Data Engineering
B90.08 – Advanced Big Data Engineering
B90.09 – Big Data Engineering Lab
B90.10 – Fundamental Big Data Architecture
B90.11 – Advanced Big Data Architecture
B90.12 – Big Data Architecture Lab
B90.13 – Fundamental Big Data Governance
B90.14 – Advanced Big Data Governance
B90.15 – Big Data Governance Lab

Certifications:
Certified Big Data Consultant
Certified Big Data Scientist
Certified Big Data Science Professional
Certified Big Data Engineer
Certified Big Data Architect
Certified Big Data Governance Specialist

Module 8: Advanced Big Data Engineering (Exam B90.08)

This course builds upon Module 7 by exploring advanced topics pertaining to the storage and processing of Big Data datasets. Specifically, it covers advanced Big Data engineering mechanisms, in-memory data storage and realtime data processing.

The following primary topics are covered:
■■ Advanced Big Data Engineering Mechanisms (including serialization & compression engines)
■■ In-Memory Storage Devices, In-Memory Data Grids & In-Memory Databases
■■ Read-Through, Read-Ahead, Write-Through & Write-Behind Integration Approaches
■■ Polyglot Persistence (including Explanation, Issues & Recommendations)
■■ Realtime Big Data Processing Concepts (including Speed Consistency Volume (SCV), Event Stream Processing (ESP) & Complex Event Processing (CEP))
■■ General Realtime Big Data Processing & Realtime Big Data Processing & MapReduce
■■ Advanced MapReduce Algorithm Design
■■ Bulk Synchronous Parallel (BSP) Processing Engine & BSP vs. MapReduce
■■ Graph Data & Graph Data Processing using BSP
■■ Big Data Pipelines (including Definition and Stages)
■■ Big Data with Extract-Load-Transform (ELT)
■■ Big Data Solutions (including Characteristics, Design Considerations & Design Process)

For more information about course materials provided during instructor-led workshops and as part of self-study kits, visit: www.bigdatascienceschool.com/courses/module8

[Figure: Module 7: Fundamental Big Data Engineering, Official Mind Map Supplement]

Module 9: Big Data Engineering Lab (Exam B90.09)

This course module covers a series of exercises and problems designed to test the participant’s ability to apply knowledge of topics covered previously in Modules 7 and 8.
Completing this lab will help highlight areas that require further attention, and will further prove hands-on proficiency in Big Data engineering practices as they are applied and combined to solve real-world problems. As a hands-on lab, this course incorporates a set of detailed exercises that require participants to solve various inter-related problems, with the goal of fostering a comprehensive understanding of how different data engineering technologies, mechanisms and techniques can be applied to solve problems in Big Data environments. For instructor-led delivery of this lab course, the Certified Trainer works closely with participants to ensure that all exercises are carried out completely and accurately. Attendees can voluntarily have exercises reviewed and graded as part of the class completion. For individual completion of this course as part of the Module 9 Self-Study Kit, a number of supplements are provided to help participants carry out exercises with guidance and numerous resource references. 
For more information about course materials provided during instructor-led workshops and as part of self-study kits, visit: www.bigdatascienceschool.com/courses/module9

[Figure: Module 7: Fundamental Big Data Engineering, Official Mind Map Supplement]

Certified Big Data Science Professional

A Certified Big Data Science Professional has demonstrated proficiency in the analysis practices and technology concepts and mechanisms that comprise and are featured in contemporary Big Data environments and tools. The following course modules are part of the official Big Data Science Professional Certification curriculum:

Module 1: Fundamental Big Data
Foundational course that establishes a basic understanding of Big Data from business and technology perspectives, including common benefits, challenges and adoption issues.
Module 2: Big Data Analysis & Technology Concepts
Explores contemporary analysis practices, technologies and tools for Big Data environments at a conceptual level, focusing on common analysis functions and features of Big Data solutions.

Module 3: Big Data Analysis & Technology Lab
A hands-on lab providing a series of real-world exercises for assessing and establishing Big Data environments, and for solving problems using Big Data analysis techniques and tools.

Workshops & Self-Study
Attend an instructor-led workshop, or purchase the official Big Data Science Professional Certification Self-Study Kit Bundle.

Certified Big Data Scientist
A Certified Big Data Scientist has demonstrated proficiency in the application of techniques and tools required for exploring large volumes of complex data and the communication of the analysis results. The courses in this certification track focus on the application of numerous contemporary analysis and analytics techniques. In addition to Modules 1 and 2, the following courses are part of this certification:

Module 4: Fundamental Big Data Analysis & Science
Essential coverage of Big Data analysis algorithms, as well as the application of analytics, data mining and basic mathematical and statistical techniques.

Module 5: Advanced Big Data Analysis & Science
An in-depth course that covers the application of a range of advanced analysis techniques, including machine learning algorithms, data visualization and various forms of data preparation and querying.

Module 6: Big Data Analysis & Science Lab
A case study-based lab providing a series of real-world exercises that require participants to apply Big Data analysis and analytics techniques to fulfill requirements and solve problems.

Workshops & Self-Study
Attend an instructor-led workshop or purchase the official Big Data Scientist Certification Self-Study Kit Bundle.
Certified Big Data Consultant
A Certified Big Data Consultant has demonstrated proficiency in the most common Big Data analysis and analytics concepts and techniques, as well as contemporary Big Data technologies, tools and solution environments. In addition to Modules 1 and 2, the following courses are part of this certification:

Module 3: Big Data Analysis & Technology Lab
A hands-on lab providing a series of real-world exercises for assessing and establishing Big Data environments, and for solving problems using Big Data analysis techniques and tools.

Module 4: Fundamental Big Data Analysis & Science
Essential coverage of Big Data analysis algorithms, as well as the application of analytics, data mining and basic mathematical and statistical techniques.

Module 7: Fundamental Big Data Engineering
Focuses on the hands-on usage of the Hadoop and MapReduce frameworks, HDFS, Hive, Pig, Sqoop, Flume and NoSQL databases.

Workshops & Self-Study
Attend an instructor-led workshop or purchase the official Big Data Consultant Certification Self-Study Kit Bundle.

Certified Big Data Engineer
A Certified Big Data Engineer has demonstrated proficiency in utilizing, configuring and programming an established Big Data solution (using Hadoop, MapReduce and other tools) to customize and optimize features in support of Big Data Scientists and general business requirements. In addition to Modules 1 and 2, the following courses are part of this certification:

Module 7: Fundamental Big Data Engineering
Focuses on the hands-on usage of the Hadoop and MapReduce frameworks, HDFS, Hive, Pig, Sqoop, Flume and NoSQL databases.

Module 8: Advanced Big Data Engineering
Builds upon Module 7 to delve into advanced development, testing and debugging techniques, as well as the application of Big Data design patterns.
Module 9: Big Data Engineering Lab
A hands-on lab during which participants carry out a series of exercises based upon the tools and technologies covered in preceding course modules.

Workshops & Self-Study
Attend an instructor-led workshop or purchase the official Big Data Engineer Certification Self-Study Kit Bundle.

Certified Big Data Architect
A Certified Big Data Architect has demonstrated proficiency in the design, implementation and integration of Big Data solutions within IT enterprise and cloud-based environments. The courses in this certification track focus on a drill-down perspective of Big Data platforms and environments via the definition of mechanisms and architectural design patterns. In addition to Modules 1 and 2, the following courses are part of this certification:

Module 10: Fundamental Big Data Architecture
Coverage of the Hadoop stack, data pipelines and other technology architecture layers, mechanisms and components, and associated design patterns.

Module 11: Advanced Big Data Architecture
Drill-down of Big Data solution environments, additional advanced design patterns and coverage of cloud implementations and various enterprise integration considerations.

Module 12: Big Data Architecture Lab
A hands-on lab in which a set of real-world exercises challenges participants to build and integrate Big Data solutions within IT enterprise and cloud-based environments.

Workshops & Self-Study
Attend an instructor-led workshop or purchase the official Big Data Architect Certification Self-Study Kit Bundle.

Certified Big Data Governance Specialist
A Certified Big Data Governance Specialist has demonstrated proficiency in establishing and administering Big Data governance frameworks that standardize and regulate the Big Data lifecycle, the bodies of data processed by Big Data solutions, as well as the Big Data environments themselves.
In addition to Modules 1 and 2, the following courses are part of this certification:

Module 13: Fundamental Big Data Governance
Introduces Big Data governance frameworks, and covers the basics of governing high-volume, multi-source data and Big Data technology environments.

Module 14: Advanced Big Data Governance
Steps through the Big Data lifecycle to cover specific precepts, processes and associated policies for regulating disparate bodies of data and Big Data solution environments.

Module 15: Big Data Governance Lab
A hands-on lab during which participants are required to work with Big Data governance precepts, processes and policies to address a series of real-world governance concerns.

Workshops & Self-Study
Attend an instructor-led workshop or purchase the official Big Data Governance Specialist Certification Self-Study Kit Bundle.

Certified Big Data Professional
A Certified Big Data Professional has mastered the fundamental topic areas pertaining to cloud computing, and has met minimum BDSCP qualifications by demonstrating proficiency in at least one additional area. To achieve this certification, Exam B90.01: Fundamental Big Data must be completed with a passing grade, together with a passing grade in any one additional exam from the Big Data Certified Professional program. The Certified Big Data Professional designation can be used as a standalone accreditation to verify fundamental competency. This certification can also be used as an interim accreditation for IT professionals pursuing one or more specialized certification tracks.

Arcitura Education
The Big Data Science Certified Professional program is operated and overseen by Arcitura Education Inc., a global provider of vendor-neutral IT training and accreditation. To learn more, visit: www.arcitura.com

Arcitura Community
Connect with the Arcitura Community via Facebook, Twitter, LinkedIn and YouTube.
Explore the ever-growing network of schools, practitioners, instructors, academic institutions, authors and events. www.arcitura.com/community

Becoming a Certified Trainer, Training Partner or Reseller
Arcitura provides comprehensive programs dedicated to the development of certified trainers and different types of partnerships. To learn more, contact: [email protected]

[Image: Certified Trainer Guide 2014]

Cloud Certified Professional (CCP)
The Cloud Certified Professional (CCP) program establishes a series of vendor-neutral industry certifications dedicated to areas of specialization in the fields of cloud technology, architecture, virtualization, storage, capacity management and networking.

SOA Certified Professional (SOACP)
The SOA Certified Professional (SOACP) program establishes a series of vendor-neutral industry certifications dedicated to areas of specialization in the fields of service-oriented architecture (SOA), service-orientation and service-oriented computing.

[Image: SOASchool.com certification badges: SOA Certified Professional, Consultant, Analyst, Architect, Java Developer, .NET Developer, Governance Specialist, Security Specialist and QA Specialist.]

Ask for the latest CCP and SOACP Course Catalogs, provided by Arcitura Education.
Cloud Certified Professional (CCP): www.cloudschool.com • www.cloudselfstudy.com • www.cloudworkshops.com
SOA Certified Professional (SOACP): www.soaschool.com • www.soaselfstudy.com • www.soaworkshops.com