High Performance Analytics and the Challenges of Big Data
Toronto Area SAS Users Group, 12 Dec 2014
Charu Shankar, SAS Technical Training Specialist
Copyright © 2012, SAS Institute Inc. All rights reserved.

Agenda
What is Big Data
Thriving in the Big Data era
Our Perspective – the Analytics Gap
1.1 Volume
1.2 Variety
1.3 Velocity
2.1 Problem #1 Data Prep time part of problem
2.2 Problem #2 Shortage of talent
2.3 Problem #3 Our working ways don’t help
3.1 Some definitions
3.2 What can data mining models tell us?
3.3 How can HPA help?
Questions

What is BIG DATA
Big Data is RELATIVE, not ABSOLUTE. Data is "big" when its volume, velocity and variety exceed an organization’s storage or compute capacity for accurate and timely decision-making.

THRIVING IN THE BIG DATA ERA
[Figure: DATA SIZE (VOLUME, VARIETY, VELOCITY) and VALUE, TODAY versus THE FUTURE]

OUR PERSPECTIVE: THE ANALYTICS GAP
Most organizations:
• Can’t generate the information they need.
• Can’t generate information fast enough to act on it.
• Continue to incur huge costs due to uninformed decisions and misguided strategies.
The opportunities afforded by analytics have never been greater.

Does this look familiar?
Data is a corporate asset, yet organizations are not leveraging it the way they leverage their labour and capital assets.

1.1 VOLUME
Data is no longer measured in megabytes or gigabytes. We’re talking petabytes, and a petabyte is 10^15 bytes.

Putting a Petabyte in perspective
• If the average MP3 encoding for mobile is around 1MB per minute, and the average song lasts about 4 minutes, then a petabyte of songs would last over 2,000 years playing continuously.
• If the average smartphone camera photo is 3MB in size and the average printed photo is 8.5 inches wide, then the assembled petabyte of photos placed side by side would be over 48,000 miles long - almost long enough to wrap around the equator twice.
• 1 petabyte is enough to store the DNA of the entire population of the US – and then clone them, twice.
Wes Biggs, chief technology officer at Adfonic

Big data on social media
• 73% of online adults use a social networking site of some kind
• 684 million daily active users on Facebook
• 500 million tweets per day in 2013

1.2 Variety – And this is a real life experience
The New: LinkedIn, Twitter, Instagram, Tumblr, Google+, Vine, ooVoo, Ask.fm, Yik Yak, WhatsApp, Whisper. YIKES!
The Old: Print Media, Television, Radio. And it was only a 1-way monologue.

1.3 Velocity
Big data is coming at high velocity. Are you ready?

2.1 Problem #1 Data Prep time part of problem
THE ANALYTICS LIFECYCLE
[Figure: cycle of IDENTIFY / FORMULATE PROBLEM, DATA PREPARATION, DATA EXPLORATION, TRANSFORM & SELECT, BUILD MODEL, VALIDATE MODEL, DEPLOY MODEL, EVALUATE / MONITOR RESULTS]
Data preparation:
• Consumes up to 80% of the project
• Is specific to the data and the analysis
Data is the number one challenge in the adoption or use of business analytics. Companies continue to struggle with data accuracy, consistency, and even access. (Bloomberg BusinessWeek Survey 2011)

A single electronic medical record (EMR) system from one cancer center showed lab results for Albumin, a protein measured in cancer patients, in over 30 ways.

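To make the Albumin example from Problem #1 concrete, here is a minimal Python sketch of the kind of clean-up that eats data preparation time: collapsing variant spellings of one lab test into a single label. The variant spellings and the function name normalize_lab_name are hypothetical, chosen only for illustration; the slide does not list the 30+ forms actually found in the EMR system.

```python
import re

# Hypothetical variant spellings of one lab test (not taken from the slide).
ALBUMIN_VARIANTS = {"albumin", "alb", "albumin serum", "serum albumin", "albumin level"}


def normalize_lab_name(raw: str) -> str:
    """Lower-case, replace punctuation runs with a space, then map known variants to one label."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()
    return "albumin" if cleaned in ALBUMIN_VARIANTS else cleaned


records = ["Albumin", "ALB", "Albumin, Serum", "serum-albumin", "Glucose"]
normalized = [normalize_lab_name(r) for r in records]
print(normalized)
```

A real project would build the variant list from the data itself, which is part of why this step is specific to the data and the analysis.
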
2.2 Problem #2 – Shortage of Talent
Who is a data scientist?

2.3 Problem #3 – Our working ways don’t help

3.1 Some definitions
1. HPA (High-Performance Analytics) is the ability to rapidly perform complex analysis on big data, enabling you to solve problems that you thought were unsolvable. Look for HP on the front of a proc name.
2. HPA Server lifts data into memory. When it sees an HP PROC, it splits the work (sorting data, summarizing data and so on) across worker nodes that run in parallel.
3. SAS VA provides a drag-and-drop web interface that lets you quickly explore huge amounts of data.
4. Hadoop: think of it as an infinitely expandable filing cabinet that can also help you summarize what is stored in it.
5. SAS LASR Server is part of HPAS (High-Performance Analytics Server). Its role is to push data into memory.

3.2 What questions should we be asking?
Data Mining Models
• Which products are customers likely to buy?
• Which workers are likely to quit/resign/be fired?
Text Models
• What are people saying about my products and services?
• Can I detect emerging issues from customer feedback or service claims?
Forecasting Models
• How many products will be sold this year, next year?
• How does this break down into each product over the next 3 months, 6 months?
Operations Research (PRESCRIPTIVE)
• What is the optimal inventory and stock to be held of each of the products to minimize out of stock and overall holding costs?
• What is the least cost route for transporting goods from warehouses to final destinations?

3.2 What can data mining models tell us?
Range penetration: salary level compared to peers

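The slide names range penetration without defining it. A minimal sketch, assuming the usual compensation definition (how far a salary sits between the minimum and maximum of its pay range); the function name and the figures are hypothetical and are not taken from the presentation:

```python
def range_penetration(salary: float, range_min: float, range_max: float) -> float:
    """Return the position of a salary within its pay range (0.0 = minimum, 1.0 = maximum)."""
    if range_max <= range_min:
        raise ValueError("range_max must be greater than range_min")
    return (salary - range_min) / (range_max - range_min)


# Hypothetical pay range of 50,000 to 70,000 with a salary at the midpoint.
example = range_penetration(60_000, 50_000, 70_000)
print(f"{example:.0%}")
```

A value like this can serve as one input to a data mining model, for instance when asking which workers are likely to quit.
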
The value of harvesting big data in different industries
TELCO - If wait time is important to customer satisfaction at a telco, I might take action to put my best customers at the head of the line. I can influence customer satisfaction by understanding the underlying factors and then taking action to influence purchasing behaviour.
HEALTH - The next cure for cancer lies in big data. If we had a way to track, monitor, store and retrieve cancer patients’ way of life, we would be able to draw inferences that lead us to a cure.

Example: HPA in unemployment statistics
Saskatchewan - 5%
Alberta - 4.5%
Ontario - 7.9%
Looks like labour doesn't move easily.

HPA value: another example
More labour economics, this time about your work: the data scientist.
EMC Survey: 65% of the respondents expect demand for data scientists to outstrip availability over the next five years.

3.3 How can HPA Help?

Key Takeaways of working with big data using HPA
• Work with the entire data, no longer just a sample
• Leverage real-time data access

Thanks for attending
QUESTIONS???
BLOG http://blogs.sas.com/content/sastraining/author/charushankar/
LINKEDIN http://ca.linkedin.com/in/charushankar
TWITTER https://twitter.com/CharuSAS
EMAIL [email protected]
Charu Shankar, SAS Institute Inc.
Copyright © 2013, SAS Institute Inc. All rights reserved.
sas.com