Big Data

What is Big Data?
• https://www.youtube.com/watch?v=c4BwefH5Ve8
• Big Data Analytics: 11 Case Histories and Success Stories
• https://www.youtube.com/watch?annotation_id=annotation_3535169775&feature=iv&src_vid=c4BwefH5Ve8&v=t4wtzIuoY0w

Big Data
• Data Size:
  – Gigabyte
  – Terabyte: e.g., a terabyte USB drive
  – Petabyte: Wal-Mart handles more than 1m customer transactions every hour, feeding databases of more than 2.5 petabytes
  – Exabyte: the traffic flowing over the internet amounts to about 700 exabytes annually
  – Zettabyte

Big Data: Some Facts
• The world's information is doubling every two years
• The world generated 1.8 ZB of information in 2011
• Cisco predicts that by 2016 global IP traffic will reach 1.3 zettabytes
• There will be 19 billion networked devices by 2016
• 70% of this data is being generated by individuals, as opposed to enterprises and organizations

Big Data Sources
• Web sites
• Social media
• Machine generated
• RFID
• Image, video, and audio
• Etc.

Big Data Challenges
• Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization.
• "3Vs":
  – Volume: size >= 30-50 TB
  – Velocity: processing speed
  – Variety:
    • Structured data: able to fit in a database table
    • Unstructured data

Do Companies Care about Data?
• Not really. What they care about are Key Performance Indicators (KPIs)
• Some examples of KPIs are
  – Revenue
  – Profit
  – Revenue per customer/employee
  – Customer attrition: the loss of clients or customers
• Big Data is only useful if it helps drive KPIs

Big Data to KPIs Applications
• Text mining: deriving high-quality information from text.
  – Text categorization, text clustering, concept/entity extraction, sentiment analysis, etc.
• Web mining:
  – Web usage mining
  – Web content mining
• Social media mining
  – Salesforce Radian6 Social Marketing Cloud
    • http://www.youtube.com/watch?v=EH1dcFh_-I4

Hadoop HDFS: Hadoop Distributed File System
• "Imagine you had a file that was larger than your PC's capacity. You could not store that file, right? Hadoop lets you store files bigger than what can be stored on one particular node or server. So you can store very, very large files. It also lets you store many, many files."

Hadoop: MapReduce
• "rather than take the conventional step of moving data over a network to be processed by software, MapReduce uses a smarter approach tailor made for big data sets."
• "…rather than move the data to the software, MapReduce moves the processing software to the data." (InfoWeek)

NoSQL Database
• NoSQL ("Not Only SQL") is a broad class of database management systems identified by non-adherence to the widely used relational database management system model.
• NoSQL databases are useful when working with a huge quantity of data whose nature does not require a relational model.

In-Memory Database
• An in-memory database is a database management system that primarily relies on main memory for computer data storage. It is contrasted with database management systems that employ a disk storage mechanism.
• Main-memory databases are faster than disk-optimized databases.
• Good for Big Data analytics.
• Can use non-volatile memory modules that retain data even when electrical power is removed.

SAP HANA
• The High-Performance Analytic Appliance (HANA) uses sophisticated data compression to store data in random access memory. HANA's performance is reported to be 10,000 times faster than standard disks, which allows companies to analyze data in a matter of seconds instead of long hours. (Techopedia)
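
Example: MapReduce word count (illustrative sketch)
• The "Hadoop: MapReduce" slide says the processing is moved to the data. The short Python sketch below imitates that idea on one machine. It does not use the Hadoop API. The node names, the sample text, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are all assumptions made up for illustration. Each simulated node counts the words in the lines it already stores, and only the small per-node summaries are combined afterwards.

```python
from collections import Counter, defaultdict

# Hypothetical cluster: each "node" already stores its own block of a large file.
NODE_BLOCKS = {
    "node1": ["big data is big", "data moves slowly"],
    "node2": ["move the code to the data"],
}

def map_phase(lines):
    """Run where the lines are stored; emit local (word, count) pairs.

    Counting locally before sending anything plays the role of a combiner:
    only a small summary leaves the node, not the raw lines.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return list(counts.items())

def shuffle(per_node_pairs):
    """Group the partial counts from all nodes by word."""
    grouped = defaultdict(list)
    for pairs in per_node_pairs:
        for word, count in pairs:
            grouped[word].append(count)
    return grouped

def reduce_phase(grouped):
    """Add up the partial counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

def word_count(node_blocks):
    # One map task per node, then a single shuffle and reduce step.
    per_node = [map_phase(lines) for lines in node_blocks.values()]
    return reduce_phase(shuffle(per_node))

result = word_count(NODE_BLOCKS)
print(result["data"])  # "data" appears twice on node1 and once on node2
```

• In a real Hadoop cluster the map tasks would be scheduled on the servers that hold the HDFS blocks; here the dictionary of nodes only stands in for that placement.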