BIG DATA ANALYTICS
A Presentation by Meg Monsen, Michael Leonard, and Eric Zeng

AGENDA
• Big Data Analytics and its Objectives
• Financial Impact
• Structured vs Unstructured Data
• Users of Big Data
• Relevant Technologies (Hadoop, MongoDB)
• Coding Examples
• Future of Analytics

WHAT IS BIG DATA AND WHY DOES IT MATTER?
Defining Big Data Analytics
• Examining large sets of data
• Discovering patterns and trends
• Data warehouses are insufficient
Purposes
• Uncovering hidden needs of customers
• Improving operational efficiency

BIG DATA & OPERATIONAL EFFICIENCY
“By using big data for operations analysis, organizations can gain real-time visibility into operations, customer experience, transactions and behavior.” – IBM
Core Objectives
• Gain
• Analyze
• Apply
• Optimize

FINANCIAL IMPACT OF BIG DATA
High cost of poor data quality
• $3.1 trillion to the US economy annually
• 10-25% of US business revenues
Opportunities for qualified analysts
• Business Analyst: $66,000
• Data Analyst: $60,000
• Data Scientist: $113,000

DIMENSIONS OF BIG DATA
Essential Characteristics:
• Volume - Data quantity
• Velocity - Data speed
• Variety - Data types

STRUCTURED VS. UNSTRUCTURED DATA
Structured Data
• Represented as text
• Transactional data, formal reports, accounting records of sales and costs
• Relational databases / data warehouse
• SQL
Unstructured Data
• May be textual or non-textual
• Mobile usage, click stream activity, social media responses, genomic data
• No structured database / data lake
• NoSQL (Not only SQL), SQL batch queries

ILLUSTRATIVE EXAMPLE
• Inventory Analyst
• Insurance Actuary

INTERPRETATIONS
[Diagram with labels: Structured Data, Big Data Analytics]

USERS OF BIG DATA
Device manufacturers, ERP providers, and consulting firms make up 7 of the top 10 users of Big Data
Based on a survey of large corporations conducted by Dell in 2014…
• 55% now follow a Big Data strategy
• 60% of Big Data projects involve a cloud
• 32% involve real-time or near real-time processing
• 22% use a data lake
• 20% of projects are done by outside consultants

HADOOP
• Free, Java-based programming framework
• Distributes storage and processing of large data sets
• Started from a Google File System paper published in October 2003
• Development was furthered by Apache
• Named after Doug Cutting’s son’s toy elephant (logo!)

WHEN TO USE (AND NOT USE) HADOOP
YES!
• Analytics
• Search
• Data retention
• Log file processing
• Analysis of text, image, audio, and video content
• Recommendation systems, as in e-commerce websites
NO!
• Low-latency or near real-time data access
• Large number of small files to process
• Multiple write scenarios requiring arbitrary writes between files

WHO USES HADOOP?
HADOOP FRAMEWORK
• Hadoop Common: Contains the common libraries and utilities used by the other modules
• Hadoop Distributed File System (HDFS): Distributed storage with high bandwidth
• Hadoop YARN: Resource-management platform
• Hadoop MapReduce: Programming model for data processing

HDFS

MAPREDUCE

MAPREDUCE EXAMPLE

MONGODB

MONGODB = “THE DATABASE FOR GIANT IDEAS”
• Cross-platform, document-oriented database
• Open-source
• “The database for giant ideas”
• Founded in 2007; written to handle specific problems encountered at DoubleClick
• Classified as a NoSQL database

MONGODB EXAMPLE
Also, we can practice! http://www.w3resource.com/mongodbexercises/#PracticeOnline

THE FUTURE OF BIG DATA ANALYTICS

ANY QUESTIONS?
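
SUPPLEMENT: MAPREDUCE WORD COUNT SKETCH
The MapReduce programming model named in the Hadoop framework slide can be illustrated with a word count. The sketch below is plain Python running in a single process, not Hadoop code. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are assumptions made for illustration and do not come from the original slides or from any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Made-up sample input, used only to show the data flow.
    sample = ["big data needs big tools", "data tools process data"]
    print(word_count(sample))
```

On a real Hadoop cluster the map and reduce tasks run in parallel on many nodes and the framework performs the shuffle between them; this version only shows how the data moves through the three steps.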