1 Integrating E-Commerce and Data Mining: Architecture and Challenges
  WEB-KDD Workshop, August 2000
  Llew Mason, [email protected]
  Joint work with Suhail Ansari, Ron Kohavi, Zijian Zheng
  Blue Martini Software
  © Copyright 1998-2000, Blue Martini Software. San Mateo, California, USA

2 Outline
  - E-Commerce: A Killer Domain
  - Integrated Architecture
  - Data Collection
  - Analysis
  - Challenges
  - Summary

3 Killer Domain: E-Commerce
  - Data records are plentiful
  - Electronic collection provides reliable data
  - Enables closed-loop analysis
  - Insight can easily be turned into action
  - Success can be directly measured, e.g., return on investment (ROI)

4 Integrated Architecture
  [Diagram labels: Business Data Definition, Stage Data, Deploy Results, Customer Interaction, Build Data Warehouse, Analysis]

5 Integrated Architecture
  - Business facing
  - Products, content
  - Attributes
  - Shared meta-data

6 Integrated Architecture
  - Build store
  - Test before production
  - Transform for efficiency
  - Zero down-time

7 Integrated Architecture
  - Customer facing
  - Multiple touchpoints
  - Integrated data collection
8 Integrated Architecture
  - Build warehouse
  - Automated using meta-data
  - Reduces pre-processing
  - Transform for analysis

9 Integrated Architecture
  - Analysis
  - Data transformations
  - Exploration
  - Modeling

10 Integrated Architecture
  - Close the loop
  - Transfer scores, models
  - Personalize

11 Clickstream Logging
  - Web server logs
    - Logs every HTTP request: filtering required
    - Stateless: must identify users and sessions
    - Captures URLs: must map to content
    - Can’t understand dynamic content
  - Packet sniffers
    - Streaming data: must parse to understand content
    - Can’t understand encrypted data (SSL)
  - Solution: application server logging

12 Beyond Clickstream Logging
  Business Event Logging
  - Consider several requests as one logical event
    - Add or remove from shopping cart
    - Initiate or finalize checkout
    - Search
    - Register
    - Personalization rule evaluation
  - Provides business insight
  - Difficult to log outside of application server

13 Aggregation
  - Data occurs at multiple granularities
  [Diagram with an arrow labeled "Finer Granularity"; labels: Customers, Sessions, Cities, Requests, Customers, Orders]
  - Many interesting attributes need to be aggregated for analysis

14 Aggregation
  Interesting customer attributes:
  - What wallet share did each customer spend on books?
  - How much is each female customer’s average order amount above the mean value for female customers?
  - What is the total amount of each customer’s five most recent purchases over $30?
  - What is the frequency of each customer’s purchases?
  - How long ago was each customer’s last purchase?

15 Hierarchies
  - E-Commerce data contains many hierarchies
  - How can we use them in analysis?
  [Diagram: product hierarchy; labels: Products, Clothing, Books, Mens, Womens]

16 Analytical Tools
  Tools:
  - Reporting
  - OLAP
  - Modeling algorithms
  - Visualization
  Example questions:
  - How do sales vary over time in each geographic region?
  - Who are the top referrers by sales generated?
  - What are the top abandoned products?
  - What are the conversion rates for each product?
  - What characterizes visitors that do not buy?
  - What characterizes customers that prefer promotions?
  - Which are the potential cross-sells and up-sells?

17 E-Commerce Challenges
  - Make data mining comprehensible
  - Support multiple granularity levels
  - Utilize hierarchies
  - Support date and time types effectively
  - Support external events and changing data
  - Identify bots and crawlers
  - Handle large amounts of data

18 Summary
  - Integrated E-Commerce and data mining enables effective closed-loop analysis
  - Application server logging provides integrated data collection and reduces pre-processing
  - Powerful data transformations and a broad suite of analysis techniques are needed
  - There are many challenges ahead
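Illustrative note on the Aggregation slides: the customer-level questions listed there (purchase frequency, time since last purchase, total of the five most recent purchases over $30) amount to rolling order-level rows up to one row per customer. The sketch below shows one way this might look. The record layout, the sample data, the function name customer_attributes, and the field names are all assumptions made for illustration; they are not taken from the Blue Martini system. Purchase frequency is approximated here by a simple order count.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order records: (customer_id, order_date, amount_in_dollars).
ORDERS = [
    ("c1", date(2000, 5, 1), 45.00),
    ("c1", date(2000, 6, 15), 20.00),
    ("c1", date(2000, 7, 20), 80.00),
    ("c2", date(2000, 3, 10), 35.00),
]


def customer_attributes(orders, today, min_amount=30.0, top_n=5):
    """Roll order-level rows up to one dict of attributes per customer."""
    by_customer = defaultdict(list)
    for customer_id, order_date, amount in orders:
        by_customer[customer_id].append((order_date, amount))

    result = {}
    for customer_id, rows in by_customer.items():
        rows.sort(key=lambda r: r[0], reverse=True)  # most recent first
        # Amounts strictly above the threshold, still in most-recent-first order.
        qualifying = [amount for _, amount in rows if amount > min_amount]
        result[customer_id] = {
            # Simple frequency proxy: number of orders on record.
            "order_count": len(rows),
            # Recency: days between the latest order and the analysis date.
            "days_since_last_purchase": (today - rows[0][0]).days,
            # Total of the top_n most recent purchases over min_amount.
            "recent_total_over_min": sum(qualifying[:top_n]),
        }
    return result


ATTRIBUTES = customer_attributes(ORDERS, today=date(2000, 8, 1))
```

Each resulting dict is a customer-granularity row that could be joined to other customer attributes before modeling; the same pattern would apply when rolling requests up to sessions.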