1 ISQS 6339: Data Management & Business Intelligence
  Spring, 2017
  Instructor: Zhangxi Lin
  Office: BA E311
  Phone: (806) 834-1926
  E-mail: [email protected]
  Homepage: http://zlin.ba.ttu.edu
  Class meetings: TR 2-4:50p, BA287
  Office hours: TR 10:30a-12:30p, or by appointment

2 About me
  PhD, IS, UT Austin, 1999 (at TTU since then)
  MS, Economics, UT Austin, 1996
  MEng, CS, Tsinghua University, Beijing, 1982
  EE, Tongji University, Shanghai, 1978-1979
  Hometown: Fuzhou, China

3 What is Business Intelligence
  A simple definition: the applications and technologies that transform business data into action.
  Business intelligence (BI) is a business management term that refers to the applications and technologies used to gather, provide access to, and analyze data and information about a company's operations.
  Business intelligence systems can help companies gain more comprehensive knowledge of the factors affecting their business and make better business decisions.
  YouTube: What is BI? – B, 2’
  Global warming 0’31”
  World Economy & Population 2’45”
  Microsoft Business Intelligence Surface Demo 6’34”
  ISQS 6339, Data Mgmt & BI

4 Data, information, and knowledge
  Data – a collection of raw value elements or facts used for calculating, reasoning, or measuring.
  Information – the result of collecting and organizing data in a way that establishes relationships between data items, thereby providing context and meaning.
  Knowledge – the understanding of information based on recognized patterns, in a way that provides insight into the information.
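The three definitions above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the sales records, the customer names, and the helper functions `to_information` and `to_knowledge` are hypothetical and do not come from the course materials. It is one possible reading of the data, information, knowledge progression, not a formal definition.

```python
from collections import defaultdict

# Data: raw facts with no context (hypothetical sales records:
# customer, date, amount).
raw_sales = [
    ("alice", "2017-01-05", 120.0),
    ("bob", "2017-01-06", 35.0),
    ("alice", "2017-02-11", 80.0),
    ("carol", "2017-02-12", 15.0),
    ("bob", "2017-03-02", 45.0),
]


def to_information(records):
    """Organize raw records into per-customer totals.

    Grouping amounts by customer establishes a relationship between
    data items, which gives the numbers context.
    """
    totals = defaultdict(float)
    for customer, _date, amount in records:
        totals[customer] += amount
    return dict(totals)


def to_knowledge(totals):
    """Recognize a simple pattern: customers who spend above average."""
    average = sum(totals.values()) / len(totals)
    return sorted(c for c, total in totals.items() if total > average)


information = to_information(raw_sales)
knowledge = to_knowledge(information)
print(information)
print(knowledge)
```

In this sketch the raw tuples are the data, the per-customer totals are the information, and the list of above-average spenders is a small piece of knowledge that could drive an action such as targeted marketing.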

5 BI Problems
  Structured
    Detecting credit card fraud
    Setting loan parameters
    Market segmentation/Mass customization
    Deciding marketing mix
    Customer churn
    Reducing employee turnover
    Improving quality/efficiency
    …
  Unstructured
    Data exploration
    Utilization of resources (stored knowledge) to maximum effectiveness
    …

6 BI Applications
  Customer Analytics
    Customer profiling
    Targeted marketing
    Personalization
    Collaborative filtering
    Customer satisfaction
    Customer lifetime value
    Customer loyalty
  Sales Channel Analytics
    Marketing
    Sales performance and pipeline

7 BI Applications (2)
  Supply Chain Analytics
    Supplier and vendor management
    Shipping
    Inventory control
    Distribution analysis
  Behavior Analysis
    Purchasing trends
    Web activity
    Fraud and abuse detection
    Customer attrition
    Social network analysis

8 Business Intelligence Evolution
  Stream Analytics*: real-time, continuous, sequential analysis (ranging from basic to advanced analytics)
  3rd-Generation BI: Advanced Analytics/Optimization; Rules; Predictive Analytics; Real-time and traditional Data Mining
  “New Traditional” Analytics: “2.5-Gen” Analytics (In-Memory OLAP, Search-Based)
  Traditional Analytics: 1st Generation Analytics (Query & Reporting); 2nd Generation Analytics (OLAP, Data Warehousing)
  Legacy BI
  Source: Bill O’Connell, IBM, Aug 2007

9 Driving Force - Big Data
  A collection of data sets so large and complex that it becomes awkward to work with using on-hand database management tools.
  Difficulties include capture, storage, search, sharing, analysis, and visualization.
  Videos
    What is big data 1’33”
    Big Data Analytics 3’05”
    Artificial intelligence & big data 1’54”

10 Copyright 2012, 8/14/2012, ISQS 7339, Fall 2012

11 Data Scale

14 Big Data Companies
  IBM
  Oracle
  Facebook
  LinkedIn
  Cloudera (Hortonworks)
  Yahoo
  Amazon
  Google
  AirBNB: an online marketplace and hospitality service that enables people to lease or rent short-term lodging, including vacation rentals, apartment rentals, homestays, hostel beds, and hotel rooms.
  Uber: a transportation network company headquartered in San Francisco, California, operating in 528 cities worldwide.
  Palantir: a private American software and services company headquartered in Palo Alto, California, which specializes in big data analysis. In January 2015, the company was valued at US$15 billion. This valuation rose to US$20.33 billion in late 2015 as the company closed an $880 million round of funding.

15 Cloud Computing
  Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet).
  The name comes from the cloud-shaped symbol used in system diagrams as an abstraction for the complex infrastructure it contains.
  Cloud computing entrusts remote services with a user's data, software, and computation.
  Buzzwords: SaaS/IaaS/PaaS

16 Cloud versus cloud
  Amazon Elastic Compute Cloud
  Google App Engine
  Microsoft Azure
  GoGrid
  AppNexus

17 Case Study: Alibaba
  A privately owned e-commerce company, started in 1999, covering B2B, B2C (Tmall), C2C (Taobao), ePayment (Alipay, 49% market share), financing (AliFinance), and data-centric cloud computing services.
  Facts:
    One of the 20 most-visited websites globally.
    Accounts for over 60% of the parcels delivered in China.
    In 2012, handled 1.1 trillion yuan ($170 billion) in sales, more than competitors eBay and Amazon combined.
  Recent events
    IPO on the NYSE with a market cap of $200 billion
    Became the second-largest e-commerce company in the world.
  Z.
  Lin, ISQS Colloquium 2014-02-28

18 Apache Hadoop
  An open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware.
  The Apache Hadoop framework is composed of the following modules:
    Hadoop Common - contains libraries and utilities needed by other Hadoop modules.
    Hadoop Distributed File System (HDFS).
    Hadoop YARN - a resource-management platform responsible for managing compute resources in clusters and using them to schedule users' applications.
    Hadoop MapReduce - a programming model for large-scale data processing.
  Apache Hadoop's MapReduce and HDFS components were originally derived from Google's MapReduce and Google File System (GFS) papers, respectively.

20 Hadoop 2: Big data's big leap forward
  The new Hadoop is the Apache Software Foundation's attempt to create a whole new general framework for the way big data can be stored, mined, and processed.
  The biggest constraint on scale has been Hadoop’s job handling. In Hadoop 1, all jobs are run as batch processes through a single daemon called JobTracker, which creates a scalability and processing-speed bottleneck.
  Hadoop 2 uses an entirely new job-processing framework built on two daemons: ResourceManager, which governs all jobs in the system, and NodeManager, which runs on each Hadoop node and keeps the ResourceManager informed about what's happening on that node.

21 MapReduce 2.0 – YARN (Yet Another Resource Negotiator)

22 Data Center

23 Big Data Analytics
  To understand the nature of a complex system from a huge amount of data, which are observations of the system.
  What is the core of big data technology?
    Data abstraction for knowledge creation
    Data mining – knowledge discovery by computer
    Visualization – human-computer interactive knowledge discovery
  To do the above we need to
    Collect data
    Manage data – database and data warehousing
  Z.
  Lin, TTU/SWUFE 01/06/2015

24 What is Data Mining?
  Many definitions:
    Non-trivial extraction of implicit, previously unknown, and potentially useful information from data.
    Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns. (Berry and Linoff, 1997, 2000)
    Data mining is the process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques. (Gartner Group, 2004)
  Data analytics 2’37”
  Visual data mining 4’32”
  ISQS 6347, Data & Text Mining

25 Origins of Data Mining
  Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems.
  Traditional techniques may be unsuitable due to
    Enormity of data
    High dimensionality of data
    Heterogeneous, distributed nature of data
  [Diagram: Data Mining at the intersection of Statistics/AI, Machine Learning/Pattern Recognition, and Database systems]

26 Why Mine Data? Commercial Viewpoint
  Lots of data is being collected and warehoused
    Web data, e-commerce
    Purchases at department/grocery stores
    Bank/credit card transactions
  Computers have become cheaper and more powerful
  Competitive pressure is strong
    Provide better, customized services for an edge (e.g. in Customer Relationship Management)

27 ISQS 6339 Course Description
  Data management comprises all the disciplines related to managing data as a valuable resource.
  Business intelligence (BI) refers to the applications and technologies used to gather, provide access to, and analyze data and information about a company's operations.
  Three main topics
    Data warehousing
    Introductory data mining
    Big data and its trends

28 Syllabus
  Textbook and references
  Deliverables: projects, exercises
  Exams
  Grading policy
  Schedule

29 Application Tools
  Microsoft SQL Server 2016
  SAS Enterprise Guide
  SAS Enterprise Miner
  Hadoop – Hortonworks or CDH

30 Big Data Tools
  Pentaho: a company that offers Pentaho Business Analytics, a suite of open source Business Intelligence (BI) products. Founded in 2004 by five founders, headquartered in Orlando, FL, USA, and acquired by Hitachi in 2015 (https://en.wikipedia.org/wiki/Pentaho).
    Pentaho Data Integration (PDI)
    Pentaho for Big Data
    Pentaho Data Mining
  Tableau Software: founded in Mountain View, California, in January 2003 by Chris Stolte, Christian Chabot, and Pat Hanrahan; headquartered in Seattle, Washington. It produces a family of interactive data visualization products focused on business intelligence.

31 Your checklist
  Website
    Class home page, Schedule, online Notes
  Shared network drive
    \\TechShare\coba\d\ISQS3358\
    \\TechShare\coba\d\ISQS6347\
  Downloadable materials
    E-Textbooks
    Datasets
    Homework assignments
    Slides
    Exercises
    Demonstrative Videos

32 CAABI
  Center for Advanced Analytics and Business Intelligence, started in 2004 by Dr. Peter Westfall, ISQS, Rawls College of Business.
  Looking to offer support to companies in developing BI capabilities.
  Lots of technical expertise.

33 Your opportunities to contact BI industry
  SAS Analytics 2017 Conference, September 18-20, Washington DC. Check https://www.sas.com/en_us/events/analytics-conference.html
  SAS Global Forum 2017, April 2-5, Orlando, FL. https://www.sas.com/en_us/events/sasglobal-forum/sas-global-forum-2017.html

34 Posters in SAS M2008