BIG DATA ANALYTICS
A Presentation by Meg Monsen, Michael Leonard, and Eric Zeng

AGENDA
• Big Data Analytics and its Objectives
• Financial Impact
• Structured vs Unstructured Data
• Users of Big Data
• Relevant Technologies (Hadoop, MongoDB)
• Coding Examples
• Future of Analytics

WHAT IS BIG DATA AND WHY DOES IT MATTER?
• Defining Big Data Analytics
  • Examining large sets of data
  • Discovering patterns and trends
  • Data warehouses are insufficient
• Purposes
  • Uncovering hidden needs of customers
  • Improving operational efficiency

BIG DATA & OPERATIONAL EFFICIENCY
• “By using big data for operations analysis, organizations can gain real-time visibility into operations, customer experience, transactions and behavior.” – IBM
• Core Objectives
  • Gain
  • Analyze
  • Apply
  • Optimize

FINANCIAL IMPACT OF BIG DATA
• High cost of poor data quality
  • $3.1 trillion to the US government annually
  • 10-25% of US business revenues
• Opportunities for qualified analysts
  • Business Analyst: $66,000
  • Data Analyst: $60,000
  • Data Scientist: $113,000

DIMENSIONS OF BIG DATA
• Essential Characteristics:
  • Volume - Data quantity
  • Velocity - Data speed
  • Variety - Data types

STRUCTURED VS. UNSTRUCTURED DATA
Structured Data
• Represented as text
• Transactional data, formal reports, accounting records of sales and costs
• Relational databases / data warehouse
• SQL
Unstructured Data
• May be textual or non-textual
• Mobile usage, click stream activity, social media responses, genomic data
• No structured database / data lake
• NoSQL (Not only SQL), SQL Batch Queries

ILLUSTRATIVE EXAMPLE
• Inventory Analyst
• Insurance Actuary

INTERPRETATIONS
[Diagram labels: Structured Data, Big Data Analytics]

USERS OF BIG DATA
• Device manufacturers, ERP providers, and consulting firms comprise 7 of the top 10 users of Big Data
• Based on a survey conducted by Dell of large corporations in 2014…
  • 55% now follow a Big Data strategy
  • 60% of Big Data projects involve the cloud
  • 32% involve real-time or near real-time processing
  • 22% use a data lake
  • 20% of projects are done by outside consultants

HADOOP
• Free, Java-based programming framework
• Distributes storage and processing of large data sets
• Started from a Google File System paper published in October 2003
• Development was furthered by Apache
• Named after Doug Cutting’s son’s toy elephant (logo!)

WHEN TO USE (AND NOT USE) HADOOP
YES!
• Analytics
• Search
• Data retention
• Log file processing
• Analysis of text, image, audio, and video content
• Recommendation systems, as in e-commerce websites
NO!
• Low-latency or near real-time data access
• Large number of small files to process
• Multiple write scenarios requiring arbitrary writes between files

WHO USES HADOOP?

HADOOP FRAMEWORK
• Hadoop Common: Contains all the libraries and utilities
• Hadoop Distributed File System (HDFS): Storage with high bandwidth
• Hadoop YARN: Resource-management platform
• Hadoop MapReduce: Programming model for data processing

HDFS

MAPREDUCE

MAPREDUCE EXAMPLE

MONGODB

MONGODB = “THE DATABASE FOR GIANT IDEAS”
• Cross-platform, document-oriented database
• Open-source
• “The database for giant ideas”
• Founded in 2007; written to handle specific problems with DoubleClick
• Classified as a NoSQL database

MONGODB EXAMPLE
Also, we can practice!
http://www.w3resource.com/mongodbexercises/#PracticeOnline

THE FUTURE OF BIG DATA ANALYTICS

ANY QUESTIONS?
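
As a rough illustration of the MapReduce programming model listed under HADOOP FRAMEWORK, the sketch below counts words in plain Python. It does not use Hadoop or any Hadoop API; it only imitates the map, shuffle, and reduce steps in a single process. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made for this example and are not taken from the presentation.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's list of values into one count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map, shuffle, and reduce over a list of text records."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Hypothetical sample records standing in for lines of a large file.
    sample = ["big data big ideas", "data lake"]
    print(word_count(sample))
```

In a real Hadoop job, the map and reduce steps would run on many machines and the framework would perform the shuffle between them; here all three steps are ordinary function calls so the data flow is easy to follow.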