Download Chapter 11 - Faculty Server Contact

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Data Protection Act, 2012 wikipedia , lookup

Data model wikipedia , lookup

Clusterpoint wikipedia , lookup

Big data wikipedia , lookup

Data center wikipedia , lookup

Predictive analytics wikipedia , lookup

Data analysis wikipedia , lookup

Database model wikipedia , lookup

Data vault modeling wikipedia , lookup

Forecasting wikipedia , lookup

Information privacy law wikipedia , lookup

3D optical data storage wikipedia , lookup

Apache Hadoop wikipedia , lookup

Web analytics wikipedia , lookup

Business intelligence wikipedia , lookup

Transcript
CHAPTER 11:
BIG DATA AND ANALYTICS
Modern Database Management
12th Edition
Jeff Hoffer, Ramesh Venkataraman,
Heikki Topi
Copyright © 2016 Pearson Education, Inc.
INTRODUCTION
 Big
Data
 Data
that exist in very large volumes and many
different varieties (data types) and that need
to be processed at a very high velocity (speed).
 Analytics
 Analytics
is the systematic process of
transforming raw data into knowledge and
insight for making better decisions.
Chapter 11
Copyright © 2016 Pearson Education, Inc.
11-2
CHARACTERISTICS OF BIG DATA

The Three (Five) Vs of Big Data
 Volume
– much larger quantity of data than typical for
relational databases
 Variety
– lots of different data types and formats
 Velocity
– data comes at very fast rate (e.g. mobile
sensors, web click stream)
 Veracity
– traditional data quality methods don’t apply;
how to judge the data’s accuracy and relevance?
 Value
– big data is valuable to the bottom line, and for
fostering good organizational actions and decisions
Chapter 11
Copyright © 2016 Pearson Education, Inc.
11-3
Figure 11-2 Schema on write vs. schema on read
Traditional
database
design
The big data
approach
Chapter 11
Copyright © 2016 Pearson Education, Inc.
11-4
Figure 11-1 Examples of JSON and XML
JavaScript Object
Notation
eXtensible Markup
Language
Chapter 11
Copyright © 2016 Pearson Education, Inc.
11-5
NOSQL






NoSQL = Not Only SQL
A category of recently introduced data storage and
retrieval technologies not based on the relational
model
Scaling out (more servers) rather than scaling up
(a larger server)
Natural for a cloud environment
Supports schema on read
Largely open source
Chapter 11
Copyright © 2016 Pearson Education, Inc.
11-6
Figure 11-3 Four types of NoSQL databases
Chapter 11
Copyright © 2016 Pearson Education, Inc.
11-7
MAPREDUCE




Computation model for massive parallel processing of
various types of computing tasks in an environment
consisting of a large number of commodity servers
Core idea – divide a computing task so that multiple
nodes can work on it at the same time
Each node works on local data doing local processing.
Two stages:
 Map stage – divide for local processing
 Reduce stage – integrate the results of the
individual map processes
Chapter 11
Copyright © 2016 Pearson Education, Inc.
11-8
Figure 11-6 Schematic representation of MapReduce
Chapter 11
Copyright © 2016 Pearson Education, Inc.
11-9
HADOOP




Hadoop is an open source implementation
framework of MapReduce on HDFS.
Hadoop Distributed File System (HDFS) is a file
system designed for managing a large number of
potentially very large files in a highly distributed
environment
HDFS is a file system, not a DBMS, not relational
Hadoop is the most talked about Big-Data data
management product today
Chapter 11
Copyright © 2016 Pearson Education, Inc.
11-10
HADOOP DISTRIBUTED FILE SYSTEM (HDFS)





Breaks data into blocks and distributes them on
various computers (nodes) throughout a Hadoop
cluster
Each cluster consists of a NameNode (master
server) and some DataNodes (slaves)
Overall control through YARN (“yet another resource
allocator”)
No updates to existing data in files, just appending
to files
Move computation to the data
Chapter 11
Copyright © 2016 Pearson Education, Inc.
11-11
INTEGRATED DATA ARCHITECTURE
Figure 11-8 Teradata Unified Data Architecture – logical view
Chapter 11
Copyright © 2016 Pearson Education, Inc.
11-12
TYPES OF ANALYTICS



Descriptive analytics – describes the past status of
the domain of interest using a variety of tools through
techniques such as reporting and data visualization
Predictive analytics – applies statistical and
computational methods and models to data regarding
past and current events to predict what might happen
in the future
Prescriptive analytics –uses results of predictive
analytics along with optimization and simulation tools
to recommend actions that will lead to a desired
outcome
Chapter 11
Copyright © 2016 Pearson Education, Inc.
11-13
ONLINE ANALYTICAL PROCESSING (OLAP)



Online Analytical Processing (OLAP) -- the use of a
set of graphical tools that provides users with
multidimensional views of their data and allows
them to analyze the data using simple windowing
techniques
Mainly for descriptive analytics
OLAP operations



Slicing a (hyper)cube
Pivoting
Drill down/Roll-up
Chapter 11
Copyright © 2016 Pearson Education, Inc.
11-14
Figure 11-12 Slicing a data cube
Slicing, dicing, pivoting, and drill-down are useful cube operations
Chapter 11
Copyright © 2016 Pearson Education, Inc.
11-15
Summary report
Figure 11-13
Example of drill-down
Starting with summary
data, users can obtain
details for particular
cells.
Chapter 11
Drill-down with
color added
Copyright © 2016 Pearson Education, Inc.
11-16
PREDICTIVE ANALYTICS



Predictive analytics uses statistical and computational
methods for prediction based on past and current data
Data mining is the systematic process of discovering
hidden patterns and knowledge in data
Data mining techniques and applications
 Classification – credit evaluation, fraud detection
 Clustering - market segmentation
 Association - market basket analysis, Web usage
analysis
 (Numeric) Prediction - stock price forecasting
Chapter 11
Copyright © 2016 Pearson Education, Inc.
11-17
Chapter 11
Copyright © 2016 Pearson Education, Inc.
11-18