Optimizing Online Yield via Predictive Modeling of Individual Site Visitors

Team: David Lapayowker, Marissa Quitt, Elaine Shaver (PM), Devin Smith
Magnify360 Liaisons: Olivier Chaine, Jim Healy, Nate Pool, Gilles ?????
HMC Advisor: Zachary Dodds

What Magnify360 does
• Designs multiple websites for each client, with each site customized to meet the needs of a different type of user.
• Analyzes clickstream data from site visitors in order to serve each one the website that will suit them best.
• The result: convert a larger set of users than any single page could.
(Screenshots: old Facebook vs. new Facebook)

System Overview
User actions: navigates to a site → tailored interactions → "Conversion" (e.g., [email protected])

Dataflow
• Clickstream data → our system (online classifier): classify the user into user groups (e.g., Musician, Pasadena resident, Insomniac Musician), choose a page, and serve it.
• Results (user data, pages served, conversion data) → offline analysis: clustering into groups such as Musician, Bioengineer, Pachyphile.

Problem Statement
(Detailed problem statement here; same user-action and dataflow diagram as above.)

Clickstream Data
• Database: 80 tables, 110,000,000 rows, 13 GB (example columns…)
• Ethics: anonymous ~ no purchased data!

User profiles
• A profile is a binary attribute that captures a specific combination of data values.
• Currently 42 of them, hand-specified from Magnify360's site. Example: insomniac (something something).
• Tradeoffs:
  + captures experienced intuition about what is important
  + takes advantage of Magnify360's site-design expertise
  - binary attributes
  - may miss patterns not captured by the user profiles

Conversion data
The site yield, or conversion, is client-specified:
• Amount of transaction(s)
• Time spent on (a part of) the site
• Contact information: presence and/or time of an email address
(Example table: ~3% conversion)
Goal: to determine those clusters of visitors who will be best served by (convert via) a particular version of a client site.

Offline analysis ~ user clustering
• One big cluster ~ "best page"
• Hand-tuned clusters
• Hierarchical clustering
• Growing neural gas
• Decision-tree clustering
• Fuzzy k-means clustering
• Support vector machines
Visitors ~ vectors of profile attributes (see the sketch below)

Support vector machine example
(Can we get one of the real data pages?)
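The "Offline analysis" slide above treats each visitor as a vector of binary profile attributes and lists several candidate clustering methods. Below is a minimal sketch, assuming synthetic data and the 42 hand-specified profiles mentioned earlier, of what clustering those vectors could look like with plain k-means, just one of the candidates listed; the data, cluster count, and library choice are illustrative assumptions, not the team's actual pipeline.

```python
# Sketch: clustering visitors represented as binary profile-attribute vectors.
# Assumes 42 hand-specified profiles (per the "User profiles" slide); the data
# here is synthetic and k-means is only one of the candidate algorithms listed.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_visitors, n_profiles = 1000, 42

# Each row is one visitor: 1 if the profile attribute fires, else 0.
visitors = (rng.random((n_visitors, n_profiles)) < 0.15).astype(int)

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(visitors)
print("cluster sizes:", np.bincount(kmeans.labels_))
```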
From clusters to sites
Training data from each cluster determines the best site. For one cluster of six people:
• Page A visits: yields 7, 1, 1 → page A score = (7 + 1 + 1) / 3 visits ~ 3.0
• Page B visits: yields 7, 8, 3 → page B score = (7 + 8 + 3) / 3 visits ~ 6.0
This cluster of six people responds better to site B…

Time-based site choice
Magnify360 wants to adapt quickly to new preferences, so yields are time-weighted by 2^(-t), where t ~ age of the data:
• Page A: yield 7 at t = 0, yield 1 at t = 3, yield 1 at t = 4
  page A score = (2^0 · 7 + 2^(-3) · 1 + 2^(-4) · 1) / (2^0 + 2^(-3) + 2^(-4)) ~ 6.05
• Page B: yield 3 at t = 1, yield 7 at t = 4, yield 8 at t = 5
  page B score = (2^(-1) · 3 + 2^(-4) · 7 + 2^(-5) · 8) / (2^(-1) + 2^(-4) + 2^(-5)) ~ 3.68
…but site A has had better recent performance. (A short sketch of this scoring appears later in this section.)

Online classification procedure
(Possible results…)

Results ~ Packet 8
• All on one graph
• What about hand-tuned system results?
• (comments)

A closer look…
• Talk about SVM parameters here?
• (comments)

Sensitivity to scoring parameters?
• David's charts
• (comments)

Software structure
• Diagram
• What's done and not done…
• (comments)

Perspective
• Concluding comments

Questions?

Clickstream Data
• The Good: we have DATA!
• The Bad: too much? (~80 tables, ~13 GB)
• The Ugly: what is this data!?
(One of our tables… ID, anyone? Fun statistics)

Data: To do
• Understand the purpose of each table / column
• Understand relationships between tables
• Create a single table (or file) of relevant information in order to test and evaluate our clustering algorithms (table demodularization, against all design principles)

Clustering Algorithms
• k-Means: choose centroids at random, and place points in clusters such that distances inside clusters are minimized. Recalculate centroids and repeat until a steady state is reached.
• Fuzzy k-Means: similar, but every datapoint belongs to each cluster to some degree, not just in or out.
• Hierarchical Clustering: uses a bottom-up approach to bring together points and clusters that are close together.
(Figure: FuzME's best 10-cluster results ~ synthetic data)
Bottom line: these clustering algorithms are simple and effective techniques for categorizing data, but they cannot exist in a vacuum; we are investigating other techniques that may be used in parallel or instead.

Growing Neural Gas
• A clustering algorithm masquerading as a neural network.
• Given a data distribution, dynamically determines nodes or "centroids" to represent the data.
• "Dynamic" because it adds or deletes nodes as necessary, as well as adapting nodes toward changes in the data.
(Diagram: representative nodes overlaid on user profiles)

How it works… Given some input x:
1. Find the closest node, s, and the next closest, t.
2. Update the error of s by εw|s − x|.
3. Shift s and its neighbors toward x, and increment the age of all of s's edges.
4. If s and t are adjacent, set the age of that edge to 0. Otherwise, create that edge.
5. Remove edges that are too old; decrease the error of all nodes by a small amount.
6. Add a node every λ generations, putting it between the node with the largest error and its largest-error neighbor.
7. Repeat! (A sketch of this update loop appears below.)
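Returning to the "Time-based site choice" slide above: each observed yield is weighted by 2^(-t), where t is the age of the observation. The small sketch below reproduces the slide's example numbers; the function name and data layout are illustrative assumptions.

```python
# Sketch: time-weighted page scoring from the "Time-based site choice" slide.
# Each observation is (yield, age t); weight = 2**(-t), score = weighted mean.
def page_score(observations):
    num = sum(y * 2 ** (-t) for y, t in observations)
    den = sum(2 ** (-t) for _, t in observations)
    return num / den

page_a = [(7, 0), (1, 3), (1, 4)]
page_b = [(3, 1), (7, 4), (8, 5)]
print(round(page_score(page_a), 2))  # ~6.05
print(round(page_score(page_b), 2))  # ~3.68
```

With all ages set to 0 this reduces to the plain per-visit average from the "From clusters to sites" slide (3.0 for page A, 6.0 for page B).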
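The "How it works…" steps above outline the Growing Neural Gas update. The sketch below follows Fritzke's standard formulation of those steps on synthetic 2-D data; the parameter names mirror the "A Few Parameters…" slide that follows, but the parameter values, the squared-distance error update, and the omission of isolated-node removal are simplifying assumptions, not the team's implementation.

```python
# Sketch: Growing Neural Gas update loop (standard Fritzke-style formulation).
# Parameter names mirror the "A Few Parameters..." slide; data is synthetic.
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((2000, 2))                      # stand-in for profile vectors

lam, max_age = 100, 50
eps_w, eps_n, alpha, beta = 0.05, 0.006, 0.5, 0.0005

nodes = [rng.random(2), rng.random(2)]            # start with two random nodes
errors = [0.0, 0.0]
edges = {}                                        # (i, j) with i < j -> age

def edge_key(i, j):
    return (min(i, j), max(i, j))

for step, x in enumerate(data, 1):
    d = [np.linalg.norm(x - n) for n in nodes]
    s, t = np.argsort(d)[:2]                      # winner and runner-up

    errors[s] += d[s] ** 2                        # accumulate winner's error
    nodes[s] += eps_w * (x - nodes[s])            # move winner toward x
    for (i, j) in list(edges):
        if s in (i, j):
            other = j if i == s else i
            nodes[other] += eps_n * (x - nodes[other])
            edges[(i, j)] += 1                    # age edges incident to s

    edges[edge_key(s, t)] = 0                     # refresh (or create) edge s-t
    edges = {k: a for k, a in edges.items() if a <= max_age}
    # (isolated-node removal omitted for brevity)

    if step % lam == 0:                           # insert a node every lam steps
        q = int(np.argmax(errors))
        nbrs = [j if i == q else i for (i, j) in edges if q in (i, j)]
        if nbrs:
            f = max(nbrs, key=lambda n: errors[n])
            nodes.append((nodes[q] + nodes[f]) / 2)
            errors[q] *= alpha
            errors[f] *= alpha
            errors.append(errors[q])
            edges.pop(edge_key(q, f), None)
            edges[edge_key(q, len(nodes) - 1)] = 0
            edges[edge_key(f, len(nodes) - 1)] = 0

    errors = [e * (1 - beta) for e in errors]     # decay all node errors

print(len(nodes), "nodes after", len(data), "inputs")
```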
A Few Parameters… (Making sense of the GUI)
• λ: controls how frequently new nodes are inserted
• Max edge age: dictates how often old edges are deleted
• εw: factor scaling the update of the "winning" node
• εn: factor scaling the update of the next-nearest node
• α: scale factor for decreasing the error of parent nodes
• β: scale factor for decreasing the error of all nodes

… and the difference they make
• λ = 100: smaller λ, nodes inserted more often; leaves straggler nodes that don't accurately match the data.
• λ = 1000: larger λ, nodes inserted less often; takes longer, but yields more accurate placement of nodes.

Support Vector Machines
(Plots: clearly planar vs. planar in feature space)

Support Vector Regression (Machine?)
Goal: minimize error between the hyperplane and the data points.
• SVM: maximize cluster separation
• SVR: minimize plane-to-data distance

Getting the correct page… What do we want from a technique?
• Classification: input is user data; output is the page to serve.
• Regression: input is user data and a possible page; output is predicted success.
Both require multiple SVMs. (Sketches of both routes appear at the end of this section.)

Using Classification via SVMs
(Diagram: user DATA → classifier over pages A/B/C → Predicted Page: C)

Using Regression via SVRs
(Diagram: user DATA → Page A Predictor: 0.42, Page B Predictor: 0.24, Page C Predictor: 0.78 → Predicted Page: C)

Goal Breakdown

Short-term Plan

Plan for Algorithm Comparison

Schedule and Conclusion
• Friday November 14 / Friday November 21: initial testing on real data; meeting with Magnify360
• Friday December 5: prototype algorithm comparison method; initial composition of classification algorithms
• Friday December 12: Midyear Report

Questions?

SVM vs SVR
• SVM: maximize distance
• SVR: minimize distance

Data: The Bad, or, The Challenges
• Lots of SQL data
(Some data tables: 80 tables total… data size)

Problem Statement
• Officially: develop an innovative predictive modeling system to predict shopping cart abandonment based on profiles, clusters, and shopping cart contents.
• Most importantly: GRAB from email!
• Research and implement various AI techniques to optimize the process of matching users with websites.

Individualized Online Experiences

Classifying Users
• Unsupervised clustering: points are clustered without knowledge of the results.
• Supervised clustering: clusters are built using prior knowledge of the results.
• Ethical concerns?

Recap: What Magnify360 Does
• Individualize a website for different types of users.
• Collect data on users from their clickstream, and give them the site that will appeal to them best.
• Appeal to a larger base of users by making the site more interesting to a larger group.
(Screenshots: old Facebook … serving both!)
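The "Using Classification via SVMs" slide above maps user data directly to a page label. Here is a minimal sketch with scikit-learn's SVC on synthetic binary profile vectors; the kernel choice, data, and page labels are illustrative assumptions rather than the project's actual model.

```python
# Sketch: the classification route from "Getting the correct page..." --
# one multi-class SVM mapping user data straight to a page label.
# Data and labels are synthetic stand-ins for profile vectors and pages A/B/C.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X_train = (rng.random((300, 42)) < 0.15).astype(float)   # binary profile vectors
y_train = rng.choice(["A", "B", "C"], size=300)           # page that converted best

clf = SVC(kernel="rbf").fit(X_train, y_train)

new_visitor = (rng.random((1, 42)) < 0.15).astype(float)
print("Predicted page:", clf.predict(new_visitor)[0])
```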
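The "Using Regression via SVRs" slide instead trains one predictor per candidate page and serves the page with the highest predicted success. A companion sketch using scikit-learn's SVR, again on synthetic data; the page names, yields, and kernel are made up for illustration.

```python
# Sketch: the regression route -- one SVR per candidate page predicts the
# expected yield for a visitor; serve the page with the highest prediction.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(3)
pages = ["A", "B", "C"]
X = (rng.random((300, 42)) < 0.15).astype(float)           # binary profile vectors
yields = {p: rng.random(300) * 10 for p in pages}           # observed yield per page

predictors = {p: SVR(kernel="rbf").fit(X, yields[p]) for p in pages}

visitor = (rng.random((1, 42)) < 0.15).astype(float)
scores = {p: float(predictors[p].predict(visitor)[0]) for p in pages}
best = max(scores, key=scores.get)
print(scores, "-> Predicted page:", best)
```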