Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Personalization Technologies: A Process-Oriented Perspective Gediminas Adomavicius Information and Decision Sciences Carlson School of Management University of Minnesota [email protected] Alexander Tuzhilin Information, Operation and Management Sciences Stern School of Business New York University [email protected] 1. Introduction Over the past several years, there has been much work done in personalization focusing on the development of new technologies, understanding personalization from the business point of view, and developing novel personalization applications [CACM00]. Since personalization constitutes a young and rapidly developing field, there still exist different points of view on what personalization is expressed by academics and practitioners. In this article we synthesize these various points of view and describe personalization technologies from a process-oriented perspective. Several attempts have been made to define personalization by the industry practitioners and the academic researchers. Some of the representative definitions include: • “Personalization is the ability to provide content and services that are tailored to individuals based on knowledge about their preferences and behavior” [“Smart Personalization,” Forrester Report by Paul Hagen, 1999]. • “Personalization is the combined use of technology and customer information to tailor electronic commerce interactions between a business and each individual customer. Using information either previously obtained or provided in real-time about the customer and other customers, the exchange between the parties is altered to fit that customer's stated needs so that the transaction requires less time and delivers a product best suited to that customer” [www.personalization.com]. 1 • “Personalization is the capability to customize communication based on knowledge preferences and behaviors at the time of interaction” [Jill Dyche, CRM Handbook, Addison-Wesley, 2002]. • “Personalization is about building customer loyalty by building a meaningful one-to-one relationship; by understanding the needs of each individual and helping satisfy a goal that efficiently and knowledgeably addresses each individual’s need in a given context” [Doug Riecken, “Personalized Views of Personalization,” Communications of the ACM, 43(8), 2000]. These definitions cover various aspects of personalization, and several important features of personalization emerge from them. In particular, these definitions state collectively that personalization tailors certain offerings (e.g., content, services, product recommendations, communications, and e-commerce interactions) by providers (e.g., e-commerce Web sites) to the consumers of these offerings (e.g., customers, visitors, users, etc.) based on knowledge about them with certain goal(s) in mind. In the rest of this section, we will elaborate on these concepts. Personalization Participants. As stated above, personalization takes place between one or several providers of personalized offerings and one or several consumers of these offerings, such as customers, users, and Web site visitors. Personalized offerings can be delivered from providers to consumers by personalization engines in three ways presented in Fig. 1. In these diagrams, providers and consumers of personalized offerings are denoted by white boxes, personalization engines by grey boxes, and the interactions between consumers and providers by lines. Fig. 1(a) presents a provider-centric personalization approach that assumes that each provider has its own personalization engine that tailors the provider’s content to its consumers. This is the most common approach to personalization, as popularized by Amazon.com. The second approach, presented in Fig. 1(b), is a consumer-centric approach, which assumes that each consumer has its own personalization engine (or agent) that “understands” this particular consumer and provides personalization services based on this knowledge. This type of 2 consumer-centric personalization delivered across a broad range of providers and offerings, is called an e-Butler service [AT02]. Rudimentary e-Butler services are provided by such Web sites as Gator.com, MySimon.com and MyGeek.com. The third approach, presented in Fig. 1(c), is a market-centric approach that provides personalization services for a marketplace in a certain industry or sector. In this case, the personalization engine performs the role of an infomediary [GT01] by knowing the needs of the consumers and the providers’ offerings and trying to match the two parties in the best ways according to their internal goals. While this approach has not been extensively used before, it has a high potential, especially because of the proliferation of electronic marketplaces, such as Covisint [www.covisint.com] for the automobile industry. Consumers Providers (a) Provider-centric Consumers Providers (b) Consumer-centric Consumers Providers (c) Market-centric Figure 1. Classification of Personalization Approaches. What Is Being Personalized? The result of personalization is the delivery of various offerings to consumers by the personalization engine(s) on behalf of the providers using any of the approaches shown in Fig. 1. Examples of the personalized offerings include • personalized content, such as personalized Web pages and links; • product and service recommendations, e.g., for books, CDs and vacations; 3 • personalized email; • personalized information searches; • personalized (dynamic) prices; • personalized products for individual customers, such as custom-made CDs. Personalization Goals. The personalization objectives usually are multifaceted. They may range from simply improving the consumer’s browsing and shopping experience (e.g., by presenting only the content that is relevant to the consumer) to much more complex objectives, such as building long-term relationships with consumers, improving consumer loyalty, and generating a measurable value for the company. Currently, the most commonly used metrics are accuracy metrics that measure how the consumer liked a specific personalized offering, e.g., how accurate the recommendation was [BS97, Paz99]. Although important, accuracy metrics are fairly simplistic and do not reflect the “bigger picture”, i.e., to what extent the more complex personalization objectives have been met. For this purpose, there is a need for more sophisticated metrics, such as the consumer lifetime value, consumer loyalty value, purchasing and consumption experiences, and other ROI-based metrics [CS00, RNE+02]. Consumer Knowledge. Successful personalization depends to a very large extent on the knowledge about personal preferences and behavior of the consumers that is usually distilled from the large volumes of granular information about the consumers and stored in consumer profiles [Paz99, AT01]. We will cover this topic in Section 3. The definitions listed above collectively cover several major points about personalization. However, the point that personalization constitutes an iterative process that takes place over time has not been sufficiently addressed before, and we describe it in the next section. 4 2. Personalization Process Personalization constitutes an iterative process that can be defined by the Understand-DeliverMeasure cycle taking place in time and consisting of the following stages shown in Fig. 2: • Understand consumers by collecting comprehensive information about them and converting it into actionable knowledge stored in consumer profiles. • Deliver personalized offering based on the knowledge about each consumer, as stored in the consumer profiles. The personalization engine must be able to find the most relevant offerings and deliver them to the consumer. • Measure personalization impact by determining how much the consumer is satisfied with the delivered personalized offerings. It provides information that can enhance our understanding about consumers or point out the deficiencies of the methods for personalized delivery. Therefore, this additional information serves as a feedback for possible improvements to each of the other components of personalization process. This feedback information completes one cycle of the personalization process, and sets the stage for the next cycle where improved personalization techniques can make better personalization decisions. The technical implementation of the Understand-Deliver-Measure cycle consists of the six stages presented in Fig. 2 and described below. Data collection. The personalization process begins with collecting data across different channels of interaction between consumers and providers (e.g., Web, phone, direct mail, and other channels) and from various other heterogeneous data sources. Such data can be solicited explicitly (e.g., via surveys) or tracked implicitly and may include histories of consumers’ purchasing and searching activities, as well as demographic and psychographic information. The 5 objective is to obtain the most comprehensive “picture” of a consumer. After the data is collected, it is usually processed, cleaned, and stored in a consumer-oriented data warehouse. Building consumer profiles. A key issue in developing personalization applications is constructing accurate and comprehensive consumer profiles based on the collected data. We discuss the techniques for building consumer profiles in more detail in Section 3. Matchmaking. Personalization systems must match appropriate content and services to individual consumers. There are many matchmaking technologies including user-specified rulebased content delivery systems, e.g., as deployed by BroadVision [http://www.broadvision.com] and ATG [http://www.atg.com], statistics-based predictive approaches, and recommender systems. However, in this paper we will focus on recommender systems technologies because of the space limitation and because they represent the most developed matchmaking technologies applicable to various types of personalized offerings. We will describe them further in Section 3. Adjusting Personalization Strategy Feedback loop Measuring Personalization Impact Measure Impact of Personalization Delivery and Presentation Deliver Personalized Offering Matchmaking Building Consumer Profiles Understand the Consumer Data Collection Figure 2. Personalization process. 6 Delivery and presentation. E-companies deliver personalized information to consumers in several ways. One classification of delivery methods is pull, push, and passive [SKR01]. Push methods reach a consumer who is not currently interacting with the system, e.g., by sending an email message. Pull methods notify consumers that personalized information is available but display this information only when the consumer explicitly requests it. Passive delivery displays personalized information as a by-product of other activities of the consumer. For example, while looking at a product on a Web site, a consumer also sees recommendations for related products. The system can present personalized information in various forms: narrative, a list ordered by relevance, an unordered list of alternatives, or various types of visualization. Measuring personalization impact. As was explained before, various accuracy metrics as well as consumer lifetime value, loyalty value, and purchasing and consumption experience can be used to evaluate the effectiveness of personalization. The quality of recommendations, as measured by these metrics, depends on the sophistication of technologies deployed in the previous four stages in Fig. 2. Adjusting personalization strategy. As Fig. 2 shows, measuring personalization impact serves as a feedback for possible improvements to each of the other five steps of the personalization process. This feedback should be used to decide whether to collect additional data, build better user profiles, develop better matchmaking algorithms, improve information delivery and presentation, or to use additional measures of personalization impact. If this feedback is properly integrated in the personalization process, the quality of interactions with individual consumers, as measured by the metrics discussed above, should grow over time resulting in the virtuous cycle of personalization. If the feedback is not properly integrated in the personalization process, then the metrics can decrease over time producing the effect of de-personalization, when the 7 consumer is getting frustrated with the personalization system and stops using it. The depersonalization effect is largely responsible for failures of some of the personalization projects reported in the literature [PR01]. Therefore, one of the main challenges of personalization is the ability to achieve the virtuous cycle of personalization and not fall into the de-personalization trap. From the algorithmic sophistication perspective, the technologies that contribute the most to this goal are the profiling and the matchmaking technologies. We describe them and their interaction in the next section. 3. Profiling and Matchmaking Technologies One of the key issues in developing personalization applications is the problem of constructing accurate and comprehensive profiles of individual consumers. Such profiles should provide the most relevant information describing who the consumers are and how they behave. Traditionally, consumer profiles consist of simple factual information. For example, this information may include consumer’s demographics, such as name, gender, date of birth and address, or be derived from the past transactions of a consumer, such as the largest purchase value made at a Web site. This factual profile information is usually defined as a record of values and stored in a relational database, one record per consumer. In addition to the factual information, [AT01] considers profiles that capture more complex behavioral information of consumers. These profiles are modeled using such techniques as • Conjunctive rules. For example, the rule “John Doe prefers to see action movies on weekends” (i.e., Name = “John Doe” & MovieType = “action” → TimeOfWeek = “weekend”) can be a part of John Doe’s profile that describes his movie viewing habits [AT01]. Such rules can be learned from the transactional history of the consumer using 8 various data mining techniques, including association and classification rule discovery methods [HMS01]. • Sequences, such as sequences of Web browsing activities. For example, we may want to store in John Doe’s profile his typical browsing sequence “when John Doe visits the book Web site XYZ, he usually first accesses the home page, then goes to the Home&Gardening section of the site, then browses the Gardening section and then leaves the Web site” (i.e., XYZ: StartPage → Home&Gardening → Gardening → Exit). Such sequences can be learned from the transactional histories of consumers using frequent episodes and other sequence learning methods [HMS01]. • Signatures, i.e., the data structures that are used to capture the evolving behavior learned from large data streams of simple transactions [CFP+00]. For example, “top 5 most frequently browsed product categories over the last 30 days” would be an example of a signature that could be stored in individual consumer profiles in a Web store application. In summary, all profiling approaches can be classified into simple, that support unstructured factual information about the customers (e.g., demographic information), and advanced, that support the behavioral information about consumers expressed in the form of rules, sequences, signatures, and other knowledge representation methods. Besides profiling, delivering targeted content and services for the consumers is another crucial aspect of personalization that depends significantly on the quality of the underlying matchmaking technologies. There has been much research done on this subject, including rulebased matchmaking, statistics-based predictive approaches, and recommender systems. However, as was explained in Section 2, in this paper we will focus on recommender systemsbased matchmaking techniques. 9 In the context of recommender systems, matchmaking technologies are often classified into broad categories according to their recommendation approach as well as their algorithmic technique as described below. Classification based on the recommendation approach [BS97]: • Content-based recommendations: the consumer is recommended items (e.g., content, services, products) similar to the ones the consumer preferred in the past. In other words, content-based methods analyze the commonalities among the items the consumer has rated highly in the past. Then, only the items that have high similarity with the consumer’s past preferences would get recommended. • Collaborative recommendations (or collaborative filtering): the customer is recommended items that people with similar tastes and preferences liked in the past. Collaborative methods first find the closest peers for each consumer, i.e., the ones with the most similar tastes and preferences. Then, only the items that are most liked by the peers would get recommended. • Hybrid approaches: these methods combine collaborative and content-based methods. This combination can be done in many different ways, e.g., separate content-based and collaborative systems are implemented and their results are combined to produce the final recommendations. Another approach would be to use content-based and collaborative techniques in a single recommendation model, rather than implementing them separately. Classification based on the algorithmic technique [BHK98]: • Heuristic-based techniques constitute heuristics that calculate recommendations based on the previous transactions made by the consumers. An example of such a heuristic for a movie recommender system could be to find the person whose taste in movies is the closest to mine, and recommend me everything this person liked that I have not seen yet. 10 • Model-based techniques use the previous transactions to learn a model (usually using some machine learning or statistical learning technique), which is then used to make recommendations. For example, based on the movies that I have seen, a probabilistic model is built to estimate the probability of how I would like each of the unseen movies. As we have done with profiling techniques, we classify various matchmaking methods into simple and advanced. Existing empirical research suggests that hybrid approaches outperform the pure content-based and pure collaborative approaches [BS97, Paz99] and that model-based techniques outperform heuristic-based ones in terms of recommendation accuracy [BHK98]. Based on these results, we classify content-based and collaborative approaches that use heuristic algorithms as “simple” and hybrid heuristics and model-based matchmaking techniques are classified as “advanced.” Combining the profiling and the matchmaking classifications, personalization technologies can be characterized by the 2x2 matrix presented in Table 1 emphasizing that various recommendation techniques often use different types of consumer profiles for recommendation purposes. In particular, many collaborative techniques only use the ratings of items that were provided by individuals as a feedback to the recommender system, although some techniques take into account simple demographic attributes (e.g., age, gender) in the recommendation process. Content-based techniques commonly use keywords to represent tastes and preferences of individuals. In other words, traditional recommendation techniques use very limited, “factual” profiling information. Even more advanced approaches to recommender systems, such as the current generation of hybrid recommendation heuristics and model-based techniques, still use the same limited profiling information as simple heuristics (i.e., ratings, keywords, demographic attributes). Therefore, all these approaches to recommender systems where 11 classified as having “simple” profiling components. Thus, heuristic techniques for content-based and collaborative recommendations were placed in the upper left quadrant of Table 1. Similarly, the more advanced recommendation approaches, such as hybrid heuristics and model-based techniques, also utilize simple profiling methods, as discussed above. Therefore, they fall into the lower left quadrant of Table 1. PROFILING Simple Simple MATCHMAKING Advanced Content-based heuristics Collaborative filtering heuristics Hybrid heuristics Model-based approaches Advanced Rule-based matching Future work? Table 1. Existing personalization techniques according to their profiling and matchmaking components. Furthermore, the upper right quadrant of Table 1 corresponds to the advanced profiling methods, such as rule-based consumer profiling techniques, described above. However, the advanced profiling methods are underutilized in the modern recommender systems, and there has been relatively little work done on developing matchmaking technologies that fit into this quadrant. Some exceptions to this constitute fairly straightforward rule matching techniques, such as the ones utilized by BroadVision [www.broadvision.com] and ATG [www.atg.com] companies. For example, if there are rules in the consumer’s profile indicating that this consumer reads politicsrelated news in the morning (i.e., rule morning → politics) and sports-related news in the evening (i.e., evening → sports), then in the morning the news Web site will present more politics-related stories and in the evening more sports-related stories to this consumer. Since such systems use advanced profiling and simple recommendation methods, they fall into the upper right quadrant of Table 1. 12 Finally, there has been very little prior work done for the lower right quadrant of Table 1 because most of the personalization research has focused on the development of matchmaking techniques that do not require a comprehensive understanding of consumers’ tastes, preferences, and behavior. Similarly, the advanced profiling methods have only been applied for simple matchmaking problems. Since it is crucial to have a comprehensive understanding of the consumers and deploy sophisticated recommendation methods in order to provide accurate recommendations, the lower right quadrant of Table 1 represents an important research opportunity in personalization technologies. 4. Future Work on Personalization Process Some of the outstanding issues in personalization have been previously pointed out by several researchers and discussed extensively in the literature. Such issues include the degree of personalization, privacy, scalability, trustworthiness, intrusiveness, and usage of various metrics to measure effectiveness of personalization [CACM00, SKR01]. The integration of advanced profiling and matchmaking techniques also constitutes an important research problem that we discussed in Section 3. In the context of the process-oriented view of personalization, one of the most important problems lies in understanding the dynamics between various stages of the personalization process and in their proper integration using the feedback loop resulting in the virtuous cycle of personalization discussed in Section 2. As depicted in Fig. 2, personalization is a process consisting of several stages, and it is crucial to understand how much each stage contributes towards the success (or failure) of the overall personalization strategy, as measured by various ebusiness and other metrics. In other words, if these metrics suggest that our personalization strategy is not performing well, is it because of poor data collection, inaccurate consumer 13 profiles, poorly chosen techniques for matchmaking or content delivery? Alternatively, the selection metrics may not be well suited for the application at hand and are not giving us accurate measurements. We call this a feedback integration problem, since the main issue is how to improve the personalization process by integrating the feedback into the overall process. Note, that the feedback integration problem is a recursive one, i.e., if we are able to identify the underperforming stages of the personalization process, we may still face similar challenges when deciding on the specific adjustments within each stage. For example, if we need to improve the data collection phase of the personalization process, should we collect more data, collect different data, or use better data pre-processing techniques [SMB+03]? The problem of feedback integration in the personalization process has not been extensively studied before. Therefore, more research is needed in order to achieve a more comprehensive understanding of how to transform the e-business measurements into specific adjustments to various stages of the personalization process. Much of personalization research has been focusing on only few stages of personalization process. The process-oriented view of personalization described in this paper suggests the importance and the need for vertical personalization research, i.e., research that integrates all the stages of the personalization process. References [AT01] G. Adomavicius and A. Tuzhilin. Expert-Driven Validation of Rule-Based User Models in Personalization Applications. Data Mining and Knowledge Discovery, 5(1-2): 33- 58, 2001. 14 [AT02] G. Adomavicius and A. Tuzhilin. e-Butler: An Architecture of a CustomerCentric Personalization System. International Journal of Computational Intelligence and Applications, 2(3): 313-327, 2002. [BHK98] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, 1998. [BS97] M. Balabanovic. and Y. Shoham. Fab: Content-based, collaborative recommendation. Communications of the ACM, 40(3): 66-72, 1997. [CACM00] Communications of the ACM, 43(8), 2000. Special issue on personalization. [CFP+00] C. Cortes, K. Fisher. D. Pregibon, A. Rogers, and F. Smith. Hancock: a language for extracting signatures from data streams. Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000. [CS00] M. Cutler and J. Sterne. E-Metrics: Business Metrics for the New Economy, NetGenesis Corporation, 2000. [http://www.emetrics.org/articles/emetrics.pdf] [GT01] V. Grover and J. Teng. E-commerce and the information market. Communications of the ACM, 44(4), 2001. [HMS01] D. Hand, H. Mannila, and P. Smyth. Principles of Data Mining, MIT Press, 2001. [Paz99] M. Pazzani. A Framework for Collaborative, Content-Based and Demographic Filtering. Artificial Intelligence Review, 13(5-6): 393-408, 1999. [PR01] Peppers, D. and Rogers, M. Why CRM Initiatives Fail and What You Can Do about It. Inside 1to1. Oct. 8, 2001. [RNE+02] S. Rosset, E. Neumann, U. Eick, N. Vatnik, Y. Idan. Customer Lifetime Value Modeling and Its Use for Customer Retention Planning. Proceedings of the 8th 15 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002. [SKR01] Schafer, J. B., J. A. Konstan, and J. Riedl. E-commerce recommendation applications. Data Mining and Knowledge Discovery, 5(1-2): 115-153, 2001. [SMB+03] M. Spiliopoulou, B. Mobasher, B. Berendt, and M. Nakagawa. A Framework for the Evaluation of Session Reconstruction Heuristics in Web-Usage Analysis. INFORMS Journal on Computing, 15(2), 2003. 16