Web Mining
by: Katharotiya Manthan

Overview
Web Mining
Semantic Web
Ontologies
Semantic Web Mining
Future Work
References

Problems With Web Interaction
Finding relevant information
Creating new knowledge using existing resources
Personalization of information
Learning about consumers or individual users

Web Mining
The term was coined by Oren Etzioni (1996).
Application of data mining techniques to Web data.

Web Mining Subtasks
Resource finding
Information selection and pre-processing
Generalization
Analysis

Types of Web Mining
Web Usage Mining
Web Content Mining
Web Structure Mining

Data Mining vs. Web Mining
Traditional data mining: data is structured and relational, with well-defined tables, columns, rows, keys, and constraints.
Web data: semi-structured and unstructured, readily available, and rich in features and patterns.

Web Structure Mining
Generates a structural summary of a Web site and its Web pages.
Extraction of patterns from hyperlinks.
Mining of the structure of individual documents.

Web Usage Mining
Discovering user navigation patterns from Web data.
Predicting user behavior while the user interacts with the Web.
Helps improve large collections of resources.

Usage Mining Techniques
Data preparation: data collection, data selection, data cleaning
Data mining: navigation patterns, sequential patterns

Data Mining Techniques: Navigation Patterns
Example:
70% of users who accessed /company/product2 did so by starting at /company and proceeding through /company/new, /company/products and /company/product1.
80% of users who accessed the site started from /company/products.
65% of users left the site after four or fewer page references.

Data Mining Techniques: Sequential Patterns
In Google search, within the past week, 30% of users who visited /company/product/ had 'camera' as their search text.
60% of users who placed an online order in /company/product1 also placed an order in /company/product4 within 15 days.

Web Content Mining
The process of information or resource discovery from the content of millions of sources across the World Wide Web.
Web data contents include text, images, audio, video, metadata and hyperlinks.
Goes beyond keyword extraction or simple statistics of words and phrases in documents.

Semantic Web
The Semantic Web is an evolving development of the World Wide Web in which the meaning (semantics) of information and services on the Web is defined, making it possible for the Web to "understand" and satisfy the requests of people and machines to use Web content.

XML, RDF and Web Data
Structured and unstructured data
W3C standards for RDF
Semantic Web: different kinds of databases
Tight coupling and loose coupling

RDF - Resource Description Framework
The data model consists of three object types:
Resources
Properties
Statements
Example: "Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila"
This sentence has the following parts:
Subject (resource): http://www.w3.org/Home/Lassila
Predicate (property): Creator
Object (literal): "Ora Lassila"

Ontologies
Ontologies are developed to provide machine-processable semantics of information sources that can be communicated between different agents (software and humans).

Developing an Ontology
Defining classes in the ontology
Arranging the classes in a taxonomic (subclass–superclass) hierarchy
Defining slots and describing allowed values for these slots
Filling in the values of slots for instances

Semantic Web Mining
Closing the gap between the Semantic Web and Web Mining
Use of ontologies
Mining the semantics of the Web

Evaluation of Semantic Web Mining

Web Mining vs. Semantic Web Mining

A Note on E-Commerce

Research Initiatives
Vivísimo proposes a clustering approach for Web document organization.
Haveliwala also proposes a methodology for evaluating strategies for similarity search on the Web.
Jaccard coefficient

Future Work
The utility of Web mining can be demonstrated by making exploratory changes to Web sites, e.g., adding links from hot parts of a Web site to cold parts, and then extracting, visualizing and interpreting the resulting changes in access patterns.
There is often a tension in the design of algorithms between accommodating a wide range of data and customizing the algorithm to capitalize on known constraints or regularities.
Web content mining could also be introduced into implementations of this architecture.

References
http://en.wikipedia.org/wiki/Web_mining
http://www.engr.sjsu.edu/meirinaki/papers/NEMIS.pdf
http://www.w3.org
http://www.cs.washington.edu/research/projects/WebWare1/www/softbots/papers/agents97.pdf
http://infomesh.net/2001/swintro/
http://www.ksl.stanford.edu/people/dlm/etai/etaiabstract.html
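The Jaccard coefficient listed under Research Initiatives measures how much two sets overlap: the size of their intersection divided by the size of their union. The sketch below is minimal and rests on assumptions. Each page is represented as a set of lowercase terms. The sample page paths and the helper names `term_set`, `jaccard` and `most_similar` are hypothetical and not taken from Haveliwala's methodology. It shows how the coefficient could rank pages by similarity to a query.

```python
import re


def term_set(text: str) -> set[str]:
    """Assumed representation: lowercase the text and keep the set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; two empty sets are treated as identical (1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_similar(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank hypothetical pages by the Jaccard similarity of their term sets to the query's."""
    q = term_set(query)
    scores = [(doc_id, jaccard(q, term_set(text))) for doc_id, text in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical pages, reusing the path style of the usage-mining examples.
    docs = {
        "/company/product1": "digital camera with zoom lens",
        "/company/product4": "camera tripod",
        "/company/new": "company news and press releases",
    }
    for doc_id, score in most_similar("camera lens", docs):
        print(f"{doc_id}: {score:.3f}")
```

With this data, /company/product1 shares both query terms out of five distinct terms (0.4), /company/product4 shares one of three (about 0.333), and /company/new shares none (0.0). A real similarity-search system would likely use richer features (for example shingles or link information) and approximate methods such as min-hashing for scale. The plain set version is used here only to make the coefficient itself visible.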