"Analytic Tasks from Business Perspective"
ISI CODATA International Training Workshop on Big Data
18 March 2015, DRTC, ISI Bangalore
By K. K. CHOWDHURY, SQC & OR Division, ISI, Bangalore

NEEDS OF THE BUSINESS
• An organization is in business to ensure sustainable growth in profit.
• Financial results do not mean profit alone.
• An organization looks for
  – Quantum of money: Revenue
  – Quality of money: Margin / Savings / Profit
  – Speed of money: Liquidity / Cash flow
  – Ease of money: Ease of doing business
• The first three are termed the hard aspects of business, sometimes spelt out as 'looking for hard savings', i.e. short-term financial results.
• The fourth one calls for soft gain. Customers, employees, management and suppliers should feel comfortable doing business or working with the organization. The soft aspect is very important for any organization.
• Only the 'soft' aspect gives an organization the ability to address overall sensitivity to all its stakeholders.
• Anybody who is impacted by the products and processes of an organization is called a stakeholder.
• A customer can be defined as the 'ultimate recipient of the products or services for their respective use'.
• Addressing sensitivity involves three important issues:
  o The customer has to be satisfied.
  o Government (legal) requirements have to be at least complied with.
  o We need to be sensitive to all other stakeholders. Sensitivity need not mean satisfaction. It may mean transparency on values / policies / practices / processes / requirements / attitude or behaviour etc.
• Stakeholders support the organization to ensure long-term survival. Any gain in sensitivity to the stakeholders would keep the organization better focused.
Hence the real meaning of business is
  o Business = Hard (Finance) + Soft (Sensitivity to the stakeholders)

STRATEGY TO IMPROVE PROFIT AND PROFITABILITY
• Existing resources: More output from fewer resources --> More output from less investment --> Reducing opportunities for waste --> Cash flow
• Existing resources: More output from the same resources --> Reducing defects --> Problem solving --> Margins
• More resources: More output from more resources/investments --> Expand capacity --> Prevent defects from the beginning --> Revenue
• More resources: More output from more resources/investments --> Increase market share --> New product --> Prevent defects from the beginning --> Revenue

What is Big Data
Every day, we create 2.5 quintillion (10^18) bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. These data come from everywhere, to name a few:
• Sensors used to gather climate information
• Posts to social media sites
• Digital pictures and videos
• Purchase transaction records
• Cell phone GPS signals, etc.
THIS DATA IS "BIG DATA"

4 V's OF BIG DATA
• High Volume
• High Velocity
• Large Variety
• Poor Veracity

BIG DATA: Volume
• Enterprises are acquiring very large volumes of data through a variety of sources.
• Some examples of use:
  - Sentiment analysis: terabytes of tweets are created on Twitter each day, which can be used for improved product sentiment analysis
  - Predicting power consumption: convert billions of annual meter readings into better predictions of power consumption, say by the hour or by the minute

BIG DATA: Velocity
• For time-sensitive processes such as catching fraud, preventing accidents, giving life-saving medication etc.
  Big data must be used as it streams into an enterprise in order to maximize its value.
• Some examples of use:
  - Scrutinize millions of credit card transactions each day to identify potential fraud
  - Analyze billions of daily call detail records in real time to predict customer churn faster
  - In an ICU, analyze blood chemistry / ECG readings in real time to deliver life-saving medication

BIG DATA: Variety
• Big data can be of any type, structured or unstructured, such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analysing these data types together.
• Some examples of use:
  - Monitor live video feeds from surveillance cameras to identify potential threats
  - Utilize image, audio, video and web information about a customer to give better product usage training, safety tips and recommendations

BIG DATA: Veracity
• Accuracy is a big concern in big data. There is no easy way to segregate good data from bad.
• Some concerns:
  - Among thousands of hotel reviews, which ones are authentic and which are not
  - How to find the truth from thousands of product reviews
  - How to tell a rumour from an informed communication

Data and Extraction of Information - Current Scenario
• The growth of data availability is mind-boggling.
  According to Intel, the quantity of information generated from the dawn of human history till 2003, some 5 exabytes, is now created every two days.
• Data processing and storage costs have decreased by a factor of 1000 over the past decade.
• Technologies like Hadoop and MapReduce eliminate the need to structure data in rigidly defined formats, which is a costly, labour-intensive proposition.
• Powerful techniques for analyzing data to extract various insights have been developed, and software is available to enable easy implementation.
• Advanced statistical, optimization, machine-learning and data-mining techniques enable extraction of hitherto unavailable insights.
• At present, technology allows us to keep a lot of data on phenomena as well as on individual entities.
• It should be possible for us to learn a lot about phenomena and entities from these data, and this knowledge may be used for improved decision making.
• The ability to store the right data in an appropriate structure and extract meaningful information from it is, therefore, becoming crucial for business success.

Some Examples
• An automobile manufacturer wants to understand how the fault and failure related data captured through sensors may be used to classify the condition of vehicles so that preventive maintenance may be carried out optimally. Similar situations apply to large manufacturers having many machines, e.g. miners, aircraft manufacturers.
• Insurers may wish to classify drivers as very risky, risky, safe etc.
  on the basis of their driving habits so that insurance premiums may be fixed intelligently.
• A company engaged in oil exploration may need to estimate the time and expense of drilling under different geological conditions before taking up a drilling assignment.
• A company in any segment may wish to forecast total demand based on past demand as well as past and current economic conditions.
• Manufacturers of consumer electronics may need to understand the sentiment of people communicating over social media about their products.
• A large retailer may like to understand the impact of a natural disaster like a hurricane on purchase behaviour.
• An e-commerce company may want to know the impact of changes in the portal or sales policy on the quantum of sales.
• Credit card as well as health insurance companies may wish to identify fraudulent transactions so that appropriate actions may be initiated.
• A retailer may like to suggest additional products a customer may be willing to buy on the basis of current as well as past surfing data.

Definition of Business Analytics (BA)
1. BA is the science and the art of improving business functions using data and analytical techniques.
2. It is a science since it uses theories of probability, statistics, data mining techniques and a well-defined process.
3. It is an art because, like a brilliant painter, the analyst has to draw from a diverse palette of colours (data sources) to find the perfect mix that will yield actionable results.
4. It is also an art as the analyst must have a deep level of creativity and business understanding to be able to clearly identify the problem, understand the implementation challenges and effectively communicate the proposed solution.
As the saying goes, in business analytics problems will often have to be taken rather than being given.
Note: The solutions of analytics problems will often come in the following forms:
• Insights that may be acted upon (or that stop unnecessary actions)
• Models or solutions that may be used to improve the effectiveness of business functions
• Automatic solutions embedded in software systems

Components of Business Analytics
• Feedback to business and implementation: understanding the organizational structure and skills required for effective implementation, identification of important business problems, understanding how information is used and created by line managers (understanding of cognitive and behavioural sciences), setting up measurement systems to assess success, linking business analytics to the strategy of the firm
• Data acquisition, engineering and processing (mostly compilation): operational databases, data warehouses, online processing and mining, enterprise information management systems, data acquisition and cleaning, big data technologies like Hadoop & MapReduce
• Application of statistical and data mining tools on specific business problems: formulation of the business problem in statistical terms, breaking down a problem into a set of canonical analytic tasks, identification of statistical / data mining tools, verification of assumptions, avoiding traps like selection bias, processing and interpreting data, model validation, presentation of the quantitative solution, design of data collection plans, analysis of data collected on a campaign basis (e.g.
  surveys)
• Identification of types and classes of problems (horizontals and verticals)

Business Analytics Process
[Diagram with the following elements: Problem Statement, Problem Formulation, Business Understanding, Data Understanding, Data Preparation, Model Building and Validation, Deployment, Operational Databases, Data Repository]

Types of Analytics Problems
• Analytics problems may be classified from two perspectives, namely
  – Method of analysis
  – Type of business problem

Two Broad Types from Methods Perspective
• Supervised learning
  – Understanding the behaviour of a target (response / dependent / Y) variable as a set of inputs (independent / explanatory / X) varies
  – Typically, attempts are made to develop a function to estimate the target
  – These methods are often called dependency analyses
• Unsupervised learning
  – Discovering associations and patterns among a set of input measures. After patterns are found, the analyst is responsible for working out how to interpret and use them.
  – These analyses do not attempt to estimate some Y on the basis of X variables. Rather, attempts are made to understand relationships / patterns among the X variables.
  – These methods are often called inter-dependency analyses

Examples of Supervised Learning
• Predict whether a patient, hospitalized due to a heart attack, will have a second heart attack. The prediction is to be based on demographic, diet and clinical measurements for that patient.
• Predict the price of a stock 6 months from now, on the basis of company performance measures and economic data.
• Predict whether a particular credit card transaction is fraudulent.
  The prediction is to be based on past transaction history, transaction type, reputation of the merchants involved and other similar variables.
• Identify the impact of different variables like price, relative brand position, general economic condition, level of competition, and product type (luxury / necessity…) on the demand for a particular product during a given period.

Examples of Unsupervised Analytics
• Find the typical profile of employees who quit quickly
• Find products that are usually sold together
• Group cities with respect to their characteristics
• Develop a scale to measure brand position

Analytic Tasks from Business Perspective
1. Hypothesis testing
2. Classification and class probability estimation
3. Value estimation, explanatory and causal models
4. Discovering dimensions, and construction and validation of measures
5. Profiling: understanding the behavioural pattern of individual entities
6. Associations and co-occurrence grouping
7. Exploration of phenomena and understanding trends
8. Link prediction
9. Constrained optimization (primarily LP, its variants and network optimization)
Most business problems can be solved using a combination of these tasks. An analyst should be able to break a problem down in terms of these tasks.

Hypothesis Testing
• Hypotheses are statements about a given phenomenon, e.g. an increasing number of years of education increases earning potential; design A produces a lower defect rate than design B; a particular web page design leads to more conversions than another.
• Hypothesis testing consists of determining the plausibility of such statements on the basis of data.

Classification and Class Probability Estimation
• There are situations where the target is categorical, e.g.
  whether a particular credit card transaction is fraudulent or not; whether a customer will renew her contract or not; whether a sales bid will be won, lost or abandoned by the customer; whether a loan application is low, medium or high risk.
• The problem is to allocate the target variable to one of the classes based on the value of some explanatory variable(s).
• In most cases, the probability that the target belongs to each class is estimated first. An allocation to a particular class is then made on the basis of the estimated probabilities.

Value Estimation
• Some business problems require estimating the value of a target variable rather than classifying it.
• Some examples of value estimation: finding the lifetime value of a customer; estimating the effort required to complete a software development project; finding the total number of cheques that may arrive for processing…
• The value needs to be estimated based on certain explanatory variables, and hence this task comes under the broad class of supervised analytics (dependency analyses).

Discovering Dimensions and Constructing Scales
• Some business problems require understanding unobserved variables, e.g. it may be important to identify the different dimensions of customer satisfaction or skill so that they can be measured.
• Developing the dimensions of an unobserved variable in terms of observable variables to facilitate its measurement is called scale construction.
• Businesses may also need to group a large number of measured variables into a small number of dimensions. This reduces the dimension of a problem.
  – A retail store may try to discover the different dimensions of store appearance and performance on the basis of a large number of measurements.
    Controlling a small set of dimensions may be easier than controlling a large number of variables.
  – Construction of indices like a corruption index or performance measures requires discovering the different dimensions of these unobserved variables.
• Note: The unobserved (unobservable) variables are often called latent variables or constructs.

Profiling – Understanding Behavioural Pattern
• This is an example of unsupervised learning.
• Profiling is often referred to as behaviour description.
• Examples of profiling questions are
  – What are the usual characteristics of persons buying this brand of automobile?
  – Can we describe the credit card spending pattern of a particular customer?
  – What is the typical cell phone usage for this customer segment?

Association and (Co-Occurrence) Grouping
• Some business analytics tasks require grouping a set of entities such that actions may be initiated for the entire group. Some examples are
  – Customers may be grouped with respect to payment behaviour so that the finance team can have different collection strategies and targets for the different groups.
  – A company wants to carry out test marketing in a large number of tier-II cities but faces a budget constraint. To overcome it, the company groups the cities into a small number of clusters and test markets in one randomly selected city from each cluster. It is assumed that the cities within each cluster are likely to behave similarly.
  – Employees of an organization may be grouped with respect to a number of behavioural traits. Performance as well as propensity to leave the company may be similar within groups, while between-group differences may be large.
  – Identification of simultaneous occurrences, such as which products are frequently bought together, or which events take place together (a behaviour of the customer at the beginning of a project may indicate a behaviour later)

Link Prediction
• This analytic task attempts to predict connections between data items, usually by suggesting that a link should exist between the items and also estimating the strength of the link.
• Link prediction is common in social networking systems: since you and Amitava share 10 friends, maybe you would like to be Amitava's friend.
• Link prediction can also attempt to measure the strength of links and use it in recommendation systems, e.g. for recommending movies, books or other product purchases.

Phenomenon and Trend Understanding
• There are situations when we try to understand how a system will behave when the inputs change. Examples:
  – Variation of customer waiting time (at a bank, call center, garage…) as the arrivals, service time and number of servers change
  – Stock-out conditions depending on variation of purchase lead time, demand, and market conditions leading to changes in demand
  – Default risks based upon changes in economic conditions
Notes
1. These tasks are the same as classification or value estimation. However, in some cases the situation becomes so complex that developing models becomes extremely difficult. In certain cases closed-form models may not even exist. In these cases the phenomenon understanding / classification / value estimation is carried out using a technique called simulation.

Optimization
• This is an important class of problems where we try to maximize or minimize an objective function subject to a number of constraints
  – Planning the transportation system within a city such that utilization is maximized
  – Allocating jobs to different machines so that the waiting time is minimized

Relationship Between Fundamental Tasks and Techniques
Sl. No. | Fundamental Task | Statistical / Data Mining Techniques
1 | Phenomenon Understanding | Descriptive statistics, EDA, hypothesis testing, graphical analysis and data visualization, contingency tables
2 | Classification | Logistic regression, discriminant analysis, decision trees (CART and CHAID), ANN, support vector machine
3 | Value Estimation | Table lookup, naive Bayesian, nearest neighbor, regression models (MLR and its variants including shrinkage methods), count regression and zero-inflated models, Cox regression, survival analysis, different non-parametric methods, ANN
4 | Profiling | Distributional and descriptive analyses, clustering
5 | Association and Grouping | Correlation and similarity analyses, MTS, cluster analysis, multi-dimensional scaling, market basket analysis / association rule mining
6 | Link Prediction | Graph theory and graph traversal rules
7 | Phenomenon Exploration and Trend Understanding | Time series analyses, ARIMA models, other forecasting models, time series regression, simulation
8 | Discovering Dimensions | Principal component analysis, factor analysis, SEM, Cronbach's alpha
9 | Optimization | Linear programming and its variants, genetic algorithms and swarm intelligence

Summary
• Business analytics problems may be looked at from two perspectives: methods and business problems.
• Statistical learning problems may be divided into two broad classes from the methods perspective: supervised and unsupervised.
• Supervised learning consists of building a model to establish the relationship between a dependent variable (target) Y and a set of input (explanatory) variables. Unsupervised learning finds patterns of association among the inputs.
• Business problems may be divided into nine major classes. Real-life analytics problems are often combinations of these.
• An expert in business analytics must know the different techniques of supervised and unsupervised learning.
  At the same time, s/he should be able to construct a business problem in terms of the fundamental tasks. S/he should be aware of the techniques of supervised / unsupervised learning typically used to address the different fundamental tasks.
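
Illustration: link prediction by shared friends
As a small worked illustration of one of the fundamental tasks, the link prediction idea described earlier (two people who share many friends may want to connect) can be sketched in Python. This is only a minimal sketch under stated assumptions: the friendship data in EDGES, the function names build_graph and suggest_links, and the min_shared threshold are all hypothetical and invented for illustration. The score used here is a plain count of common neighbours, which is one simple choice among many and is not claimed to be the method intended in the slides.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical friendship pairs, invented purely for illustration.
EDGES = [
    ("you", "asha"), ("you", "bina"), ("you", "chen"), ("you", "dev"),
    ("amitava", "asha"), ("amitava", "bina"), ("amitava", "chen"),
    ("farid", "chen"), ("farid", "dev"),
]


def build_graph(edges):
    """Turn a list of undirected pairs into a dict: person -> set of friends."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def suggest_links(graph, min_shared=1):
    """Score every pair that is not yet linked by its number of shared friends.

    Returns (person_a, person_b, shared_count) tuples, strongest link first.
    """
    suggestions = []
    for a, b in combinations(sorted(graph), 2):
        if b in graph[a]:
            continue  # already friends, nothing to predict
        shared = graph[a] & graph[b]
        if len(shared) >= min_shared:
            suggestions.append((a, b, len(shared)))
    # Sort by descending strength, then alphabetically for a stable order.
    return sorted(suggestions, key=lambda s: (-s[2], s[0], s[1]))


graph = build_graph(EDGES)
for a, b, score in suggest_links(graph, min_shared=2):
    print(f"Suggest {a} <-> {b} (shared friends: {score})")
```

The shared-friend count plays the role of the estimated link strength, so the same ranking could feed a simple recommendation list. In practice the score would usually be normalised or combined with other signals.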