1 Outline
  Background
  Lessons and challenges presented
    Business-level
    Technical-level (by data mining lifecycle stages)
      Data collection
      Data warehouse construction
      Business intelligence
      Deployment

2 Background
  Blue Martini Software
    From the beginning, significant consideration was given to data transformation and analysis needs
  Lessons from 1999-2003
    More than 20 clients
    Durations from a few person-weeks to several person-months
    Some are available as case studies
  Sources of data
    Customer registration and demographic information, web click streams, responses to DM and email campaigns, orders placed through a website, call center, or in-store POS systems
    A few thousand records to more than 100 million records
    Collected over periods from a few months to several years

3 Business lessons
  By data mining lifecycle stages
    Requirement gathering
    Data collection
    Data warehouse construction
    Business intelligence
    Deployment

4 Requirement gathering lessons
  Clients are often reluctant to list specific business questions
    Whet the clients’ appetite by presenting preliminary findings
  Push clients to ask characterization and strategic questions
    “What is the distribution of males/females among those spending more than $500?”
    “What characterizes people who spend more than $500?”
  Challenge: developing methodology and best practices to help business people define appropriate questions

5 Data collection
  The system transparently collects
    Every search and the number of results returned
    Shopping cart events
    Important events such as registration, initiation of checkout, and order confirmation
    Any form field failure
    User’s local time zone, data for robot detection, color depth, screen resolution

6 Data collection lessons
  Collect the right data, up front
  Integrate external events

7 Data warehouse construction
  Lessons
    Automatic generation of the Decision Support System database is appreciated
  Challenges
    Firewalls
    Integration

8 Business intelligence lessons
  Expect the operational channels to be higher priority than decision support
  Crawl, walk, run
    Start from basic reporting
    Train data analysts
  Tell people the time, not how to build clocks
  Define the terminology
    Writing a good glossary and sharing the terms across reports is important

9 Business intelligence challenges
  Make it easier to map business questions to data transformations
  Automate feature construction
  Build comprehensible models
  Experiment because correlation does not imply causality
  Explain counter-intuitive insights
  Assess the ROI (return on investment) of insights

10 Deployment
  Lessons
    Share insights
    Take action
  Challenges
    Have transformed data available for scoring

11 Technical details (1) - Data definition, collection, and preparation
  Data collection and management
  Data cleansing
  Data processing

12 Data collection and management
  Lessons
    Collect data at the right abstraction levels
    Design forms with data mining in mind
    Validate forms to ease data cleansing and analysis
    Determine thresholds based on careful data analysis
      Example: session timeout

13 Data collection and management
  Challenges
    Sample at collection
    Support slowly changing dimensions
    Perform data warehouse updates effectively

14 Data cleansing
  Lessons
    Audit the data

15 Data cleansing
  Challenges
    Detect bots
      Between 5% and 40% of visits are due to bots
    Perform regular de-duping of customers and accounts
      Many-to-many relationship

16 Data processing
  Lessons
    Support hierarchical attributes
    Handle cyclical attributes
    Support rich data transformations

17 Data processing
  Challenges
    Support hierarchical attributes
    Handle “unknown” and “not applicable” attribute values
      NULL

18 Technical details (2) - Analysis
  Understanding and enriching the data
  Building models and identifying insights
  Deploying models, acting upon the insights, and closing the loop
  Empowering business users to conduct their own analysis

19 Understanding and enriching the data
  Lessons
    Statistics
      Distributions, min, max, mean, number of NULL and non-NULL values
      Weighted average
    Visualization
      Line chart, bar chart, scatter plot, heatmap, filter chart

20 Building models and identifying insights
  Lessons
    Mine data at the right granularity levels
    Handle leaks in predictive models
      Leaks are attributes highly correlated with the target but not useful in practice as good predictors
    Improve scalability
    Build simple models first
    Use data mining suites
    Peel the onion and validate results

21 Sharing insights, deploying models, and closing the loop
  Lessons
    Represent models visually for better insights
    Understand the importance of the deployment context
    Create actionable models and close the gap

22 Empowering business users to conduct their own analysis
  Lessons
    Share the results among business users via simple, easy-to-understand reports
    Provide canned reports that business users can run by simply specifying values for a few parameters
    Technically savvy business users might be comfortable designing their own investigations if given a simple user interface

23 Empowering business users to conduct their own analysis
  Challenges
    Visualize models
    Prune rules and associations
    Analyze and measure long-term impact of changes

24 Summary
  Top three lessons
    Integrate data collection into operations to support analytics and experimentation
    Do not confuse yourself with the target user
    Provide simple reports and visualizations before building more complex models
  Top three challenges
    The ability to translate business questions to the desired data transformations
    Efficient algorithms whose output is comprehensible for business insight, and which can handle multiple data types
    Integrated workflow
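
The data collection lesson "Determine thresholds based on careful data analysis" gives session timeout as its example. A minimal sketch of that idea follows, assuming one visitor's request timestamps are available as a list: it splits the clicks into sessions for several candidate timeouts so the effect of each threshold can be compared before one is fixed. The function names and the sample clicks are hypothetical and are not taken from the slides.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, timeout):
    """Group one visitor's request times into sessions.

    A new session starts whenever the gap since the previous
    request is longer than `timeout`.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def session_counts_by_timeout(timestamps, timeouts_minutes):
    """Return {timeout in minutes: number of sessions} for each candidate."""
    return {
        minutes: len(sessionize(timestamps, timedelta(minutes=minutes)))
        for minutes in timeouts_minutes
    }


# Made-up clicks at 0, 5, 20, 70, 75 and 200 minutes after noon.
base = datetime(2003, 1, 1, 12, 0)
clicks = [base + timedelta(minutes=m) for m in (0, 5, 20, 70, 75, 200)]
counts = session_counts_by_timeout(clicks, [10, 30, 60])
print(counts)
```

Running the same comparison over real click streams shows how sensitive the session count is to the threshold, which is the kind of analysis the lesson asks for before a timeout is chosen.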
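
The data processing lesson "Handle cyclical attributes" concerns values such as hour of day, where 23 and 0 are neighbors even though they are numerically far apart. The slides do not say how the attribute should be handled; the sketch below assumes one common approach, mapping the value onto a circle with sine and cosine, and uses hypothetical function names.

```python
import math


def encode_cyclical(value, period):
    """Map a cyclical value (e.g. hour 0-23, period 24) to a point on the unit circle."""
    angle = 2 * math.pi * (value % period) / period
    return (math.sin(angle), math.cos(angle))


def cyclical_distance(a, b, period):
    """Euclidean distance between two encoded values."""
    sin_a, cos_a = encode_cyclical(a, period)
    sin_b, cos_b = encode_cyclical(b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


# 23:00 and 00:00 end up close together; 00:00 and 12:00 end up far apart.
near = cyclical_distance(23, 0, 24)
far = cyclical_distance(0, 12, 24)
print(near < far)
```

The two encoded columns can replace the raw attribute in distance-based or linear models; tree-based models may handle the raw value acceptably, so whether the encoding helps is an empirical question.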
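
The model-building lessons describe leaks as attributes highly correlated with the target but not useful in practice as predictors. As a rough first screen, and only as an assumption about how such a check might be done, the sketch below flags numeric attributes whose absolute Pearson correlation with the target exceeds a threshold. The column names, data, and the 0.95 threshold are made up for illustration; a flagged attribute still needs an analyst to decide whether it is a genuine leak.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def suspected_leaks(columns, target, threshold=0.95):
    """Return names of columns whose |correlation| with the target is suspiciously high."""
    return sorted(
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    )


# Hypothetical target: 1 if the customer spent more than $500.
target = [1, 0, 1, 0, 1, 0]
columns = {
    # Assumed to be assigned only after a large order, so it mirrors the target.
    "premium_shipping_flag": [1, 0, 1, 0, 1, 0],
    "visits_last_month": [3, 2, 5, 4, 1, 6],
}
flagged = suspected_leaks(columns, target)
print(flagged)
```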