Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Ontological Classification of Web Pages Zafer Erenel • Many users use search engines to locate and buy goods and services (such as choosing a vacation). • Web pages presented on the internet do not conform to any data organization standard and search engines provide primitive query capabilities for users to retrieve relevant data [1]. • In addition to that, they do not list sites equally and are inclined toward listing more popular pages. These tendencies brush many web pages aside and leave a limited number of alternatives to the users. I have created a lightweight domain ontology that consists of a taxonomic hierarchy and made use of it by an automated agent to classify web pages on the internet. • The automated agent discovers and classifies relevant pages with the help of Yahoo and Google search engines. Related Research • Desai and Spink presented a clustering scheme that groups documents into partially and substantially relevant pages by using similarity measures and ranking heuristics [2]. • They worked with the end-user queries (limited number of terms) to obtain the relevance score. Instead, my automated agent will act along with an established ontology to discover and classify documents. • Chiang, Chua, and Storey parsed snippets of returned links to find the ratio of the number of matching terms to rank the web pages for relevance [1]. • I believe snippets consist of a very few number of words and we cannot judge the web page on the basis of snippets. My agent scours the entire web page which is more time-consuming but more effective • Yahoo and Google search results contain scores of links. • My lightweight domain ontology consists of 7 branches. Each branch is comprised of predetermined terms. • Score of the web page increases in a certain branch as the agent comes across these predetermined terms on the html code. • I’ve chosen country ontology because internet users’ interest in a certain country can be quite high. • The ranks of web pages in each cluster will clarify their content to the user. • In addition to that, we can compare result sets of different search engines (Yahoo and Google) for the same queries and find complement and intersection of their result sets to have a clear understanding of search engines’ behaviors. • I’ve used web stream classes in C# Programming language to create my agent. • A WebRequest is an object that requests a Uniform Resource Identifier (URI) such as the URL for a web page [3]. • You can use a WebRequest object to create a WebResponse object that will encapsulate the object pointed to by the URI. • Once you get the actual object (e.g., a web page) pointed to by the URI, what you get back is a stream of the web page. • I used this capability for reading a page from a site to extract the information I need. I have created two web requests using search syntax given below http://www.google.com/search?q=cyprus+vacation+travel&lr=&start=0&sa=N http://search.yahoo.com/search?p=cyprus+vacation+travel&b=1 • Google search engine has returned 200 URLs and I have created 200 web requests to extract relevant information from each web page. • Yahoo search engine has been used in the same manner to extract relevant information. • If we analyze price rankings, we come across pages that have information about student flights, travel insurances, vacation package discounts, cheap flights and etc.. • If we analyze nature rankings, we come across web pages that offer adventure and etc. • If I want to do scuba diving on my vacation, I know that hawai and fiji are among my options by looking at activities rankings • Interestingly enough, in the top 100 search lists, the number of web pages that both appear on Google and Yahoo is 19. • In the top 200 search lists, the number of web pages that both appear on Google and Yahoo is 24. • As a result, ontologically organized clusters of web sites that are offering information about a given country regarding vacation and travel alternatives serve our objective to a greater extent in finding what we are in search of. • In my work, I have used 2 search engines and a single ontology. Search Engines’ shortcomings can be prevented by combining multiple engines with multiple ontologies to ease the search for most needed information on the internet. • Venn Diagrams prove that a specific search engine is not very effective by itself. References [1] R.H.L. Chiang, C.E.H. Chua, V.C. Storey, A smart web query method for semantic retrieval of web data, Data & Knowledge Engineering 38 (2001) 63-84. [2] M. Desai, A. Spink, An algorithm to cluster documents based on relevance, Information Processing and Management 41 (2005) 1035-1049. [3] Liberty, J., Programming C#,3rd ed. O’REILLY, 2003.