Technology for E-commerce
Helena Ahonen-Myka

In this part...
- search tools
- metadata
- personalization
- collaborative filtering
- data mining

Search tools
- the site has to be accessible: site architecture and navigation structure are important
- ... but some users prefer search
- keep users on the site
- usage can be monitored: useful knowledge about the users' needs

Users' preferences
- search: 50%
- navigation: 20%
- mixed: the rest

Search tools
- Indexer: gathers the words from documents (HTML pages, local files, database records) and puts them into an index file
- Search engine: accepts queries, locates the relevant pages in the index, and formats the results as an HTML page

Remote vs. local search
- the search tool can reside on a different server, also in a remote location
- indexing may take a lot of processing time, and the resulting index may need a lot of space
- local software may be faster

Indexer
- local: scans directories
- web spider: an indexing robot begins at a given page, then follows the links and stores the words of the pages
- 'robots.txt' file: states which robots are allowed
- HTML meta elements:
  <meta name="robots" content="noindex,follow">
  <meta name="robots" content="index,nofollow">
  <meta name="robots" content="noindex,nofollow">

Indexer
- the link structure should reach all the pages that should be indexed
- non-text links (imagemaps etc.): robots may not be able to follow them -> provide text links as well
- frames: provide some navigational links to give a context, in case the page is retrieved by a query

Search page
- search forms are the user interface of the search engine
- simple form: just a text field and a button
- or a(n advanced) search page: boolean search, date ranges, subscopes...

Search results
- the occurrences of the query terms are located in the index
- the results are sorted according to their (assumed) relevance to the query
- the results page should have the same look and feel as the other pages on the site

Why do searches fail?
- empty searches: people just press the search button without giving any words
- wrong scope: people think they are searching the entire web
- vocabulary mismatch: terms are too specific, too general, or just not used
- spelling mistakes
- query requirements not met

Why do searches fail?
- problems with query syntax: spaces, parentheses, etc.
- capitalization and special characters: exact matches required
- stopwords: some common words are not indexed
- short words are not indexed
- numbers are not indexed

No-matches pages
- answer pages shown to the user if the search does not return any matches
- should have the same look and feel as the other pages, plus navigation aids and a search-again field
- explanations of why the search might have failed and what to do next

Some usability issues
- web design: a strong sense of structure and navigation support; some people do not like to search
- people who search end up on some page: they should know where they are
- people need to move around in the neighborhood
- search should be available on every page

Some usability issues
- scoped search: it is difficult for users to understand what the scope is -> the scope should be stated clearly, and a search of the entire site has to be offered easily
- boolean search is difficult: 'cats and dogs' vs. 'cats or dogs' -> 'or' could be used in the query, 'and' in the ordering

Metadata
- often a search results in a long list of matches; many of them may be irrelevant
- metadata can make the queries more powerful

HTML meta elements
<head profile="http://www.acme.com/profiles/core">
<title>How to complete memo cover sheets</title>
<meta name="author" content="John Doe">
<meta name="copyright" content="© 2000 Acme">
<meta name="keywords" content="corporate, guidelines, cataloging">
<meta name="date" content="2000-10-17">
</head>

Metadata
- RDF (Resource Description Framework):
  - gives means to define metadata for XML and HTML documents
  - gives means to interchange it between different applications on the Web
- example: Dublin Core metadata, which contains 15 elements (title, creator, date...)

Dublin Core
Dublin Core Metadata Elements:
- Content: Title, Subject, Description, Language, Relation, Coverage
- Intellectual Property: Creator, Publisher, Contributor, Rights
- Instance: Date, Type, Format, Identifier

Dublin Core in RDF
Dublin Core represented in RDF:
<RDF:RDF>
  <RDF:Description RDF:HREF="URI">
    <DC:Relation>
      <RDF:Description>
        <DC:Relation.Type>isPartOf</DC:Relation.Type>
        <RDF:Value RDF:HREF="URI2"/>
      </RDF:Description>
    </DC:Relation>
  </RDF:Description>
</RDF:RDF>

Searching XML documents
- the structure of XML documents can be used to make more precise queries, e.g. find Albert Einstein in the Author element only
- problem: how does the user specify the structure?

Searching XML documents
1) The user specifies the hierarchy in the query: Einstein in Author
2) The user makes a simple query, but the search engine presents the alternative contexts: Einstein can be in Author, in Street, or in School

Using links
- good site: many links into the site, particularly from other good sites
- the text surrounding a link (probably) describes what the target of the link is about
- the knowledge above + the contents of the page itself are taken into account, e.g. by Google (www.google.com)

Natural language queries
- e.g.
Ask Jeeves:
- questions and answers prepared by human editors
- the user's query is mapped to the prepared queries

Personalization
- goal: the right people receive the right information at the right time
- but: people do not like to state complex queries or to initialize a service (like answering a questionnaire)
- user profiles have to be generated and stored, preferably automatically

User profiles
- may contain data like: interests, geographical area, age
- could be collected once and shared with many services
- trust of the user: the profile should only be used to offer better service, and only if the user wants to let a service use it

Recommendations
- users who bought this book also bought these books / liked these CDs etc.
- rating movies, TV programs, wines...
- recommending paths on a site

Recommendations
- based on the user's former behavior and profile data
- based on social (collaborative) filtering: what similar users liked

User's former behavior
- if used as the only source, the user never sees anything new
- in particular, a new user hardly gets any recommendations

Collaborative filtering
- draws on the experiences of a population or community of users
- the profile information of the target user is compared to the profiles of nearest-neighbor users
- look for correlations between users in terms of their ratings: recommend items that are included in the neighbors' profiles but not in the target user's profile

Collaborative filtering
Problems:
- cannot recommend new items (some users have to rate an item before it can be recommended)
- an unusual user may not get (good) recommendations: no neighbors that are close enough

Matching engines
- apply one set of complex characteristics to another
- e.g., recruiting sites: match a job seeker and a job

Data mining for e-commerce
Users' behavior on the web site provides a lot of information:
- Which pages do the users view?
- Which paths do the users navigate?
- How long do the users spend on the site?
- What is the rate of viewing a product and purchasing it?
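The nearest-neighbor step of collaborative filtering described above can be sketched as follows. This is a minimal illustration, not any particular product's algorithm; the ratings data is invented, and cosine similarity over co-rated items is one assumed choice of correlation measure.

```python
# Minimal user-based collaborative filtering sketch (hypothetical data):
# find the users most similar to the target, then recommend items they
# rated that the target has not rated yet.

from math import sqrt

def cosine_sim(a, b):
    """Cosine similarity over the items both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    na = sqrt(sum(a[i] ** 2 for i in common))
    nb = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (na * nb)

def recommend(target, ratings, k=2):
    """Collect items rated by the k nearest neighbors of `target`
    that the target has not rated, weighted by neighbor similarity."""
    neighbors = sorted(
        ((cosine_sim(ratings[target], prof), user)
         for user, prof in ratings.items() if user != target),
        reverse=True)[:k]
    scores = {}
    for sim, user in neighbors:
        if sim <= 0:
            continue  # no overlap with the target: not a real neighbor
        for item, r in ratings[user].items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)

ratings = {  # user -> {item: rating}, invented example data
    "alice": {"book_a": 5, "book_b": 3},
    "bob":   {"book_a": 5, "book_b": 3, "book_c": 4},
    "carol": {"book_d": 5},
}
print(recommend("alice", ratings))  # bob is alice's closest neighbor
```

Note how the sketch also shows the two problems from the slide: an item nobody has rated can never enter `scores`, and a user with no co-rated items (like carol relative to alice) contributes nothing.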
Data mining process
- gathering the data
- cleaning/preprocessing the data
- transforming the data
- analysis / finding general models
- interpreting the results
- using the knowledge

Data collection
- clickstream logging: web server logs or packet sniffers
- business event logging

Clickstream logging
- web log: page requested, time of request, client HTTP address, etc.
- many requests are for images -> these have to be filtered out
- users and user sessions are difficult to identify
- requests for a page: the same page, but different dynamic content

Clickstream logging
- more efficient at the application server layer
- instead of just pages, knowledge of products
- user and session tracking possible
- also tracks information absent from web server logs: pages that were aborted while being downloaded

Business event logging
Looking at subsets of requests as one logical event or episode:
- add/remove an item to/from the shopping cart
- initiate/finish checkout
- search (log keywords and number of results)
- register

From order data to customers
- the collected data is order-oriented: the data for each customer is spread over many records
- information on customers is the real target
- the information for each customer has to be aggregated

From order data to customers
- What percentage of each customer's orders used a VISA credit card?
- How much money does each customer spend on books?
- What is the frequency of each customer's purchases?

Model generation
Answer questions like:
- What characterizes heavy spenders?
- What characterizes customers that prefer promotion X over Y?
- What characterizes customers that buy quickly?
- What characterizes visitors that do not buy?

Data mining tools
e.g., classification rules:
IF Income > $80,000 AND Age <= 30
AND Average Session Duration is between 10 AND 20 minutes
THEN Heavy spender

Understanding the results
- the result of a data mining process may be difficult for a business user to understand: e.g.
thousands of rules
- visualization is important, tailored to the specific domain

Using the results
- the site structure can be updated
- procedures like registering or checking out can be simplified
- metadata can be added to make search more efficient
- personalization rules, recommender systems
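The clickstream preprocessing described under "Clickstream logging" (filter out image requests, then identify user sessions) can be sketched as follows. The log entries are invented, and the 30-minute inactivity timeout is an assumed sessionization heuristic, not part of the slides.

```python
# Sketch of clickstream preprocessing: drop image requests from a
# (simplified) web log and split the remaining requests into sessions
# per client, starting a new session after 30 minutes of inactivity.

IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")
SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session

def sessionize(entries):
    """entries: list of (client_ip, timestamp_seconds, path).
    Returns {client_ip: [session, ...]}, each session a list of paths."""
    sessions = {}
    last_seen = {}
    for ip, ts, path in sorted(entries, key=lambda e: (e[0], e[1])):
        if path.lower().endswith(IMAGE_SUFFIXES):
            continue  # image requests carry no page-view information
        if ip not in sessions or ts - last_seen[ip] > SESSION_TIMEOUT:
            sessions.setdefault(ip, []).append([])  # start a new session
        sessions[ip][-1].append(path)
        last_seen[ip] = ts
    return sessions

log = [  # invented example entries: (client, seconds, path)
    ("10.0.0.1", 0,    "/index.html"),
    ("10.0.0.1", 5,    "/logo.gif"),      # filtered out
    ("10.0.0.1", 60,   "/products.html"),
    ("10.0.0.1", 4000, "/index.html"),    # > 30 min later: new session
]
print(sessionize(log))
```

Note that using the client IP as the user key illustrates exactly the difficulty named on the slide: proxies and shared addresses make users and sessions hard to identify from a web log alone.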
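The aggregation step from "From order data to customers" can be illustrated like this: order-oriented records are folded into one summary per customer, answering the three example questions. The order records and field names are invented for the example.

```python
# Sketch: aggregate order-oriented records into one summary per customer
# (order count, share of VISA payments, total spent on books).
# Records and field names are hypothetical.

def aggregate(orders):
    """orders: list of dicts with customer, payment, category, amount.
    Returns {customer: {"orders": ..., "visa_share": ..., "book_total": ...}}."""
    out = {}
    for o in orders:
        c = out.setdefault(o["customer"],
                           {"orders": 0, "visa": 0, "book_total": 0.0})
        c["orders"] += 1
        if o["payment"] == "VISA":
            c["visa"] += 1
        if o["category"] == "books":
            c["book_total"] += o["amount"]
    for c in out.values():
        c["visa_share"] = c.pop("visa") / c["orders"]
    return out

orders = [  # two orders by the same customer, spread over two records
    {"customer": "42", "payment": "VISA", "category": "books", "amount": 20.0},
    {"customer": "42", "payment": "cash", "category": "music", "amount": 15.0},
]
print(aggregate(orders))
```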
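The classification rule from the "Data mining tools" slide reads directly as a predicate. The thresholds come from the slide; the function name is arbitrary, and treating "between 10 and 20 minutes" as inclusive is an assumption.

```python
# The "Data mining tools" classification rule as a predicate. In practice
# a mined model would contain many such machine-generated rules.

def is_heavy_spender(income, age, avg_session_minutes):
    """IF Income > $80,000 AND Age <= 30 AND
    Average Session Duration between 10 and 20 minutes
    THEN Heavy spender."""
    return (income > 80_000
            and age <= 30
            and 10 <= avg_session_minutes <= 20)

print(is_heavy_spender(90_000, 25, 15))  # all three conditions hold
```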
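The indexer / search-engine split described in the "Search tools" slides at the start (gather words from documents into an index file; look query terms up and rank the matches) can be sketched as an inverted index. Documents are plain strings here for simplicity; the ranking (count of matching query terms) is an assumed, minimal relevance measure.

```python
# Sketch of the indexer/search-engine split: the indexer maps each word
# to the set of pages containing it (an inverted index); the engine looks
# the query terms up and ranks pages by how many terms they contain.

def build_index(docs):
    """docs: {page_id: text}. Returns {word: set of page_ids}."""
    index = {}
    for page, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(page)
    return index

def search(index, query):
    """Rank pages by the number of query terms they contain."""
    hits = {}
    for term in query.lower().split():
        for page in index.get(term, ()):
            hits[page] = hits.get(page, 0) + 1
    return sorted(hits, key=hits.get, reverse=True)

docs = {  # invented pages
    "a.html": "guidelines for corporate cataloging",
    "b.html": "corporate memo cover sheets",
}
index = build_index(docs)
print(search(index, "corporate cataloging"))
```

Lowercasing everything sidesteps the capitalization pitfall from the "Why do searches fail?" slide; a real indexer would also drop stopwords and handle stemming.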