BIG DATA ANALYTICS
A Presentation by Meg Monsen, Michael Leonard, and Eric Zeng
AGENDA
 Big Data Analytics and its Objectives
 Financial Impact
 Structured vs Unstructured Data
 Users of Big Data
 Relevant Technologies (Hadoop, MongoDB)
 Coding Examples
 Future of Analytics
WHAT IS BIG DATA AND WHY DOES IT MATTER?
 Defining Big Data Analytics
 Examining large sets of data
 Discovering patterns and trends
 Data warehouses are insufficient
 Purposes
 Uncovering hidden needs of customers
 Improving operational efficiency
BIG DATA & OPERATIONAL EFFICIENCY
 “By using big data for operations analysis, organizations can gain real-time visibility into operations, customer experience, transactions and behavior.” – IBM
 Core Objectives
 Gain
 Analyze
 Apply
 Optimize
FINANCIAL IMPACT OF BIG DATA
 High cost of poor data quality
 $3.1 trillion to US government annually
 10-25% of US business revenues
 Opportunities for qualified analysts
 Business Analyst: $66,000
 Data Analyst: $60,000
 Data Scientist: $113,000
DIMENSIONS OF BIG DATA
 Essential Characteristics:
 Volume - Data quantity
 Velocity - Data speed
 Variety - Data types
STRUCTURED VS. UNSTRUCTURED DATA
Structured Data
• Represented as text
• Transactional data, formal reports, accounting records of sales and costs
• Relational databases / data warehouse
• SQL
Unstructured Data
• May be textual or non-textual
• Mobile usage, clickstream activity, social media responses, genomic data
• No structured database / data lake
• NoSQL (Not only SQL), SQL batch queries
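As a rough illustration of the contrast above, the sketch below totals a structured sales table with SQL (using Python's built-in sqlite3 module) and then counts clicks in loosely structured JSON records whose fields vary from record to record. The table name, field names, and sample values are made up for illustration.

```python
import json
import sqlite3

# Structured data: fixed columns, queried with SQL.
# Table name, columns, and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 120.0), ("West", 80.0), ("East", 50.0)],
)
sales_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
conn.close()

# Semi-structured data: records need not share the same fields,
# so each record is inspected individually instead of by column.
raw_events = [
    '{"user": "a", "action": "click", "page": "/home"}',
    '{"user": "b", "action": "post", "text": "great product!"}',
    '{"user": "a", "action": "click"}',
]
events = [json.loads(line) for line in raw_events]
clicks_per_user = {}
for event in events:
    if event.get("action") == "click":
        user = event["user"]
        clicks_per_user[user] = clicks_per_user.get(user, 0) + 1

print(sales_by_region)
print(clicks_per_user)
```

The SQL query relies on every row having the same columns, while the JSON loop has to tolerate missing fields, which is the practical difference between the two columns of the slide.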
ILLUSTRATIVE EXAMPLE
Inventory Analyst
Insurance Actuary
INTERPRETATIONS
[Figure: two diagrams relating Structured Data and Big Data Analytics]
USERS OF BIG DATA
 Device manufacturers, ERP providers, and consulting firms make up 7 of the top 10 users of Big Data
 Based on a survey conducted by Dell of large corporations in 2014…
 55% now follow a Big Data strategy
 60% of Big Data projects involve a cloud
 32% involve real-time or near real-time processing
 22% use a data lake
 20% of projects are carried out by outside consultants
HADOOP
 Free, Java-based programming framework
 Distributes the storage and processing of large data sets
 Started from a Google File System paper published in October 2003
 Development was furthered by Apache
 Named after Doug Cutting’s son’s toy elephant (logo!)
WHEN TO USE (AND NOT USE) HADOOP
YES!
 Analytics
 Search
 Data Retention
 Log File processing
 Analysis of text, image, audio, and video content
 Recommendation systems, such as those on e-commerce websites
NO!
 Low-latency or near real-time data access
 Large number of small files to process
 Multiple write scenarios requiring arbitrary writes between files
WHO USES HADOOP?
HADOOP FRAMEWORK
 Hadoop Common: Contains all the libraries and utilities
 Hadoop Distributed File System (HDFS): Storage with high bandwidth
 Hadoop YARN: Resource-management platform
 Hadoop MapReduce: Programming model for data processing
HDFS
MAPREDUCE
MAPREDUCE EXAMPLE
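A minimal sketch of the MapReduce programming model: the classic word count, simulated in plain Python on a single machine. This is not Hadoop's API; the function names and sample documents are illustrative assumptions, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over all documents."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data analytics", "big data tools", "data lake"])
print(counts)
```

Because each call to map_phase looks at only one document and each call to reduce_phase looks at only one key, the work can be split across machines without the steps needing to share state.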
MONGODB
MONGODB = “THE DATABASE FOR GIANT IDEAS”
 Cross-platform, document-oriented database
 Open-source
 “The database for giant ideas”
 Founded in 2007; written to handle specific problems with DoubleClick
 Classified as a NoSQL database
MONGODB EXAMPLE
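To give a feel for the document-oriented model without needing a running server, the sketch below stores documents as Python dictionaries and filters them with a small helper. The collection contents and the find() helper are hypothetical; the helper only mimics the shape of a simple equality filter and is not MongoDB's implementation. With a real server and the pymongo driver, a comparable query would pass the same kind of filter document, for example collection.find({"city": "Austin"}).

```python
# Documents in one collection need not share the same fields.
# The data and the find() helper below are illustrative assumptions.
customers = [
    {"_id": 1, "name": "Ana", "city": "Austin", "orders": [{"sku": "A1", "qty": 2}]},
    {"_id": 2, "name": "Ben", "city": "Boston"},
    {"_id": 3, "name": "Cy", "city": "Austin", "email": "cy@example.com"},
]


def find(collection, query):
    """Return documents whose fields equal every key/value pair in query."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in query.items())
    ]


austin_names = [doc["name"] for doc in find(customers, {"city": "Austin"})]
print(austin_names)
```

Note that the three documents carry different fields (orders, email), which a relational table would need extra columns or extra tables to represent.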
Also, we can practice!
http://www.w3resource.com/mongodbexercises/#PracticeOnline
THE FUTURE OF BIG DATA ANALYTICS
ANY QUESTIONS?