High Performance Analytics and the Challenges of Big Data
Toronto Area SAS Users Group
12 Dec 2014
Charu Shankar, SAS Technical Training Specialist
Copyright © 2012, SAS Institute Inc. All rights reserved.
Agenda
What is Big Data
Thriving in the Big Data era
Our Perspective – the Analytics Gap
1.1 Volume
1.2 Variety
1.3 Velocity
2.1 Problem #1 Data Prep time part of problem
2.2 Problem #2 Shortage of talent
2.3 Problem #3 Our working ways don’t help
3.1 Some definitions
3.2 What can data mining models tell us?
3.3 How can HPA help?
Questions
What is BIG DATA
Big Data is RELATIVE not ABSOLUTE
Big data is when the volume, velocity and variety of data exceed an
organization’s storage or compute capacity for accurate and timely decision-making.
Thriving in the BIG DATA era
[Figure: DATA SIZE from TODAY to THE FUTURE, labelled VOLUME, VARIETY, VELOCITY, VALUE]
Our Perspective – The Analytics Gap
Most organizations:
Can’t generate the information they need.
Can’t generate information fast enough to act on it.
Continue to incur huge costs due to uninformed decisions
and misguided strategies.
The opportunities afforded by analytics have never been greater.
Does this look familiar?
Data is a corporate asset, yet organizations are not leveraging it the way
they leverage their labour and capital assets.
1.1 VOLUME
Data is no longer measured in megabytes or gigabytes.
We’re talking petabytes, and a petabyte is 10^15 bytes.
Putting a Petabyte in perspective
• If the average MP3 encoding for mobile is around 1MB per minute,
and the average song lasts about 4 minutes, then a petabyte of
songs would last over 2,000 years playing continuously.
• If the average smartphone camera photo is 3MB in size and the
average printed photo is 8.5 inches wide, then the assembled
petabyte of photos placed side by side would be over 48,000 miles
long - almost long enough to wrap around the equator twice.
• 1 petabyte is enough to store the DNA of the entire population of
the US – and then clone them, twice.
Wes Biggs, chief technology officer at Adfonic
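The first two comparisons can be checked with a little arithmetic. The sketch below is a back-of-envelope check, not part of the original talk. It assumes binary units (1 MB = 2^20 bytes, 1 PB = 2^50 bytes), which is the reading that reproduces the quoted figures, and an equatorial circumference of roughly 24,901 miles.

```python
# Back-of-envelope check of the petabyte comparisons quoted above.
# Assumption: binary units (1 MB = 2**20 bytes, 1 PB = 2**50 bytes).
MB = 2**20
PB = 2**50

# Music: about 1 MB per minute of audio, played continuously.
minutes_of_music = PB / (1 * MB)
years_of_music = minutes_of_music / (60 * 24 * 365)

# Photos: about 3 MB each, 8.5 inches wide when printed.
photo_count = PB / (3 * MB)
INCHES_PER_MILE = 63_360
miles_of_photos = photo_count * 8.5 / INCHES_PER_MILE

# Assumption: approximate circumference of the Earth at the equator.
EQUATOR_MILES = 24_901
trips_around_equator = miles_of_photos / EQUATOR_MILES

print(f"{years_of_music:,.0f} years of music")
print(f"{miles_of_photos:,.0f} miles of photos")
print(f"{trips_around_equator:.2f} trips around the equator")
```

With decimal units (10^15 bytes per petabyte, 10^6 bytes per megabyte) the music figure comes out closer to 1,900 years, so the quoted numbers appear to rely on the binary definition.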
Big data on social media
73% of online adults use a social networking site of some kind
684 million daily active users on Facebook
500 million tweets per day in 2013
1.2 Variety – And this is a real-life experience
The New
LinkedIn
Twitter
Instagram
Tumblr
Google+
Vine
ooVoo
Ask.fm
Yik Yak
WhatsApp
Whisper
YIKES!
The Old
Print Media
Television
Radio
And it was only a one-way monologue
1.3 Velocity
Big data is coming at high velocity. Are you ready?
2.1 Problem #1 – Data prep time is part of the problem
THE ANALYTICS LIFECYCLE
Identify / Formulate Problem > Data Preparation > Data Exploration >
Transform & Select > Build Model > Validate Model > Deploy Model >
Evaluate / Monitor Results
Data preparation:
• Consumes up to 80% of the project
• Specific to the data and the analysis
Data is the number one challenge in the adoption or use of business
analytics. Companies continue to struggle with data accuracy, consistency,
and even access.
Bloomberg BusinessWeek Survey, 2011
A single electronic medical record (EMR) system from one cancer center
showed lab results for albumin, a protein measured in cancer patients, in
over 30 ways.
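To make the data prep burden concrete, here is a minimal sketch of the kind of clean-up this implies. The label variants and the mapping table are hypothetical, made up for illustration; the talk does not list the actual 30+ forms, and a real project would build its mapping from the source system.

```python
import re

# Hypothetical label variants; the talk does not list the real ones.
RAW_LABELS = [
    "Albumin",
    "ALB",
    "albumin, serum",
    "Serum Albumin",
    "ALBUMIN (g/dL)",
    "Glucose",
]

# Hypothetical mapping from a cleaned-up label to one canonical test name.
CANONICAL = {
    "albumin": "albumin",
    "alb": "albumin",
    "albumin serum": "albumin",
    "serum albumin": "albumin",
}

def normalize(label: str) -> str:
    """Lowercase, drop parenthesised units and punctuation, then look up."""
    cleaned = re.sub(r"\(.*?\)", " ", label.lower())  # remove "(g/dl)"
    cleaned = re.sub(r"[^a-z ]", " ", cleaned)         # remove punctuation
    cleaned = " ".join(cleaned.split())                # collapse whitespace
    # Unknown labels fall through unchanged so they can be reviewed later.
    return CANONICAL.get(cleaned, cleaned)

normalized = [normalize(label) for label in RAW_LABELS]
print(normalized)
```

Every new variant needs a new mapping entry, which is one reason preparation is specific to the data and the analysis.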
2.2 Problem #2 – Shortage of Talent
Who is a data scientist?
2.3 Problem #3 – Our working ways don’t help
3.1 Some definitions
1. HPA is the ability to rapidly perform complex analysis on big data, enabling
you to solve problems that you thought were unsolvable. Look for HP on the
front of a PROC name.
2. HPA Server - lifts data into memory. When it sees an HP PROC, it splits the
work across worker nodes, so that tasks such as sorting and summarizing data
run in parallel.
3. SAS VA (Visual Analytics) provides a drag-and-drop web interface that lets
you quickly explore huge amounts of data.
4. Hadoop - think of it as an infinitely expandable filing cabinet that can
also help you summarize what is stored in it.
5. SAS LASR Server - is part of HPAS (High-Performance Analytics Server). Its
role is to push data into memory.
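The idea of splitting work across worker nodes can be sketched in a few lines. This is an illustration of the general split, summarize and combine pattern only, not how the HPA Server is implemented. The function names are hypothetical, and Python threads stand in for worker nodes, so the structure is the point, not the speed.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(values, n_workers):
    """Split values into at most n_workers roughly equal slices."""
    size = -(-len(values) // n_workers)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]

def partial_summary(part):
    """Work done on one 'node': a count and a sum for its slice."""
    return len(part), sum(part)

def parallel_mean(values, n_workers=4):
    """Summarize each slice separately, then combine the partial results."""
    parts = chunk(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_summary, parts))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_sum / total_count

data = list(range(1, 1_000_001))
mean = parallel_mean(data)
print(mean)
```

Each worker only needs to return a small summary (a count and a sum), which is what makes this kind of work easy to spread across many machines.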
3.2 What questions should we be asking?
Data Mining Models
Which products are customers likely to buy?
Which workers are likely to quit/resign/be fired?
Text Models
What are people saying about my products and services? Can I detect emerging
issues from customer feedback or service claims?
Forecasting Models
How many products will be sold this year, next year?
How does this break down into each product over the next 3 months, 6 months?
Operations Research
What is the optimal inventory and stock to be held of each of the products to
minimize out of stock and overall holding costs?
What is the least cost route for transporting goods from warehouses to final
destinations? (PRESCRIPTIVE)
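As a small illustration of the least cost route question, the sketch below runs Dijkstra's shortest-path algorithm over a made-up shipping network. The site names and costs are hypothetical, and this is one textbook technique for the question, not necessarily what an operations research product would use.

```python
import heapq

# Hypothetical shipping network: cost of moving one load between sites.
COSTS = {
    "warehouse": {"hub_a": 4, "hub_b": 2},
    "hub_a": {"store": 5},
    "hub_b": {"hub_a": 1, "store": 8},
    "store": {},
}

def least_cost_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_cost, path), or (inf, []) if unreachable."""
    queue = [(0, start, [start])]
    best = {start: 0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale entry: a cheaper way to this node was already found
        for neighbour, step in graph[node].items():
            new_cost = cost + step
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour, path + [neighbour]))
    return float("inf"), []

cost, path = least_cost_route(COSTS, "warehouse", "store")
print(cost, path)
```

In this toy network the cheapest route goes through both hubs rather than taking either direct leg to the store, the kind of result that is easy to miss by inspection once the network grows.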
3.2 What can data mining models tell us?
Range penetration: salary level compared to peers
The value of harvesting big data in different industries
TELCO - If wait time is important to customer satisfaction at a telco, I might
take action to put my best customers at the head of the line. I can influence
customer satisfaction by understanding the underlying factors and then taking
action to influence purchasing behaviour.
HEALTH - The next cure for cancer lies in big data. If we had a way to track,
monitor, store and retrieve cancer patients’ ways of life, we would be able to
draw inferences that lead us to a cure.
Example – HPA in unemployment statistics
Saskatchewan - 5%
Alberta - 4.5%
Ontario - 7.9%
Looks like labour doesn't move easily.
HPA value – another example
More labour economics, this time about your work: the data scientist.
EMC survey: 65% of respondents expect demand for data scientists to
outstrip availability over the next five years.
3.3 How can HPA Help?
Key takeaways of working with big data using HPA
• Work with the entire data set, no longer just a sample
• Leverage real-time data access
Thanks for attending
QUESTIONS???
BLOG http://blogs.sas.com/content/sastraining/author/charushankar/
LINKEDIN http://ca.linkedin.com/in/charushankar
TWITTER https://twitter.com/CharuSAS
EMAIL [email protected]
Charu Shankar, SAS institute Inc.
sas.com