Commercial Systems
by Sylvia King

Overview
- Crawler-based Search Engines
  - 3 Main Components
  - Google's PigeonRank Technology
  - Advantages and Disadvantages
- Directory-based Search Engines
  - Human-powered directory
  - Natural Language Processing – AskJeeves
  - Advantages and Disadvantages
- MetaCrawler Search Engine
- Conclusion

Crawler-based Search Engines
3 Main Components:

Spider
Also known as a crawler, the spider visits a web page, reading it as it goes along, and then follows links to other pages within that site. It returns to the site every month or two to check for changes, and every change it detects is transferred into the index.

The index
Sometimes known as the catalogue, the index contains a copy of every web page that the spider finds, and it is updated with all new changes. A web page may have been "spidered" but not yet "indexed": until it is added to the index, it is not available to people searching with a crawler-based search engine.

Location & Frequency
Search engines follow a set of rules called algorithms. These algorithms concentrate on the location and frequency of keywords on a web page.

Crawler-based Search Engines (cont.)
PigeonRank Technology
First, a user submits a query to Google. The query is then routed to what is known as a data coop. When one of the pigeons in the cluster locates a relevant result, it strikes a rubber-coated steel bar, which gives the page a PigeonRank value of one. Each further peck increases the PigeonRank value. The pages that receive the most pecks are prioritised and shown at the top of the user's results page, and the remaining results are displayed in the order set by this pecking system.
The PigeonRank method makes results difficult to manipulate. Aside from Location & Frequency tricks, some site owners try to boost their rankings by including images on their pages, but Google's PigeonRank technology is not fooled by such techniques.

Crawler-based Search Engines (cont.)
Advantages
- Offers much larger databases of web sites for searches.
- The full text of individual web pages is often searchable.
- Great for searching very obscure terms or phrases.
Disadvantages
- No humans to weed out problems such as duplicates and rubbish.
- The huge size of the database can lead to high numbers of search results.
- Search command languages can often be complex and confusing.

Directory-based Search Engines
Human-powered directory
Directories depend on humans to collect their listings. They point to sites rather than compiling databases containing pages. You submit a short description of your site to the directory, and a search then looks for matches only in the descriptions submitted.

Natural Language Processing – AskJeeves
Through the use of Teoma Technologies, AskJeeves guides the user with questions that help narrow the search, and it also searches up to six other search sites for relevant web pages. This technique spares searchers from having to use Boolean or other query languages.

Teoma technology & AskJeeves
Teoma technology places strong emphasis on the popularity of web sites in its ranking algorithm. This search engine ranks a site based on the following:
- Subject-Specific Popularity: the number of web pages about the subject that reference the page.
- General Popularity: the number of all other web pages that reference the page.
Teoma technology also uses what are known as "communities" of expert sites. Communities are relevant knowledge hubs that are used to guide the user through their search.

Directory-based Search Engines (cont.)
Advantages
- Useful when the user is uncertain which keywords to use.
- Because these directories use human editors, their general standards are higher than those found in search engines.
Disadvantages
- It can take the user longer to locate a suitable website.
- Directories tend to be smaller than search engine databases.
- Because directories are maintained by people rather than spiders, and because they point to sites rather than compiling databases containing pages, the content of a site or page can change without the directory being updated.
- Dead links, which do not go to the pages they are intended to but produce an error message instead, are a problem because the responsibility for maintaining the directory's content lies with human editors.

MetaCrawler Search Engine
A meta search engine works as an agent between the user and the search engines. Meta search engines do not build or maintain their own web indexes; they use the indexes built by others. They generally present the first 10 to 30 results from each engine's results page.

Advantages and Disadvantages
The advantage is that a meta search engine can single-handedly search several databases for the required topic. The disadvantage is that it may return a limited number of hits.

Conclusion
Why does Google hold number one status? Google processes its search queries at a speed much greater than that of traditional search engines; it accomplishes this by collecting pigeons in thick clusters.
Suggestions to minimize duplicates and rubbish within crawler-based search engines?
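The spider-and-index pipeline from the Crawler-based Search Engines slides can be sketched in a few lines of Python. This is a minimal illustration, not how any real engine is built: it assumes a hypothetical in-memory site (`SITE`, mapping each URL to its text and outgoing links) in place of real web fetches, and the names `spider` and `Index` are made up for the example. It shows the slide's distinction between a page that has been spidered and one that has been indexed: only indexed pages can be found by a search.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory web standing in for real HTTP fetches:
# URL -> (page text, outgoing links).
SITE = {
    "http://example.com/": ("welcome to our pigeon shop",
                            ["http://example.com/seed", "http://other.org/"]),
    "http://example.com/seed": ("bird seed and pigeon food", ["http://example.com/"]),
    "http://other.org/": ("an unrelated site", []),
}


def spider(start_url, site):
    """Visit start_url, then follow links to other pages on the same site (breadth-first)."""
    host = urlparse(start_url).netloc
    seen, queue, fetched = {start_url}, deque([start_url]), []
    while queue:
        url = queue.popleft()
        text, links = site[url]
        fetched.append((url, text))  # spidered, but not yet indexed
        for link in links:
            # Stay within the starting site, as the slide describes.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


class Index:
    """A tiny 'catalogue': a copy of each page plus an inverted index (word -> URLs)."""

    def __init__(self):
        self.pages = {}
        self.postings = {}

    def add(self, url, text):
        # Drop the old copy first so changes found on a revisit replace it.
        old = self.pages.get(url)
        if old is not None:
            for word in set(old.split()):
                self.postings[word].discard(url)
        self.pages[url] = text
        for word in set(text.split()):
            self.postings.setdefault(word, set()).add(url)

    def search(self, word):
        return sorted(self.postings.get(word, set()))


pending = spider("http://example.com/", SITE)
index = Index()
print(index.search("pigeon"))  # [] because nothing has been indexed yet
for url, text in pending:
    index.add(url, text)
print(index.search("pigeon"))  # ['http://example.com/', 'http://example.com/seed']
```

Calling `index.add` again with a page's new text, as would happen after the spider's next visit, replaces the old copy, mirroring the point that the index is updated with the changes the spider finds.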
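The Location & Frequency rule from the Crawler-based Search Engines slides can be illustrated with a toy scoring function. This sketch assumes a simple additive score: one point per occurrence of the keyword, plus hypothetical bonuses for appearing in the title (`TITLE_BONUS`) and near the top of the page (`EARLY_WORDS`, `EARLY_BONUS`). These weights are invented for illustration and are not those of any real search engine.

```python
TITLE_BONUS = 3.0   # hypothetical weight for the keyword appearing in the title
EARLY_WORDS = 20    # hypothetical: "near the top" means within the first 20 body words
EARLY_BONUS = 1.0   # hypothetical weight for an early occurrence in the body


def keyword_score(keyword, title, body):
    """Score a page for one keyword using location (title, top of page) and frequency."""
    keyword = keyword.lower()
    words = body.lower().split()
    score = float(words.count(keyword))          # frequency
    if keyword in title.lower().split():         # location: title
        score += TITLE_BONUS
    if keyword in words[:EARLY_WORDS]:           # location: near the top
        score += EARLY_BONUS
    return score


pages = {
    "a": ("Pigeon care", "pigeon food and pigeon housing tips"),
    "b": ("Garden birds", "many birds visit gardens " + "filler " * 30 + "pigeon"),
}
ranking = sorted(pages, key=lambda p: keyword_score("pigeon", *pages[p]), reverse=True)
print(ranking)  # ['a', 'b']
```

Page "a" scores 6 (two occurrences, a title match and an early occurrence), while page "b" scores 1 because its only mention is deep in the body.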
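The closing question about duplicates and rubbish in crawler-based engines points to a problem that software can partly address without human editors. One possible approach, not taken from this presentation, is to fingerprint each crawled page and discard later pages with the same fingerprint. The sketch assumes that two pages are duplicates only when their text is identical after lower-casing and collapsing whitespace; catching near-duplicates would need a fuzzier technique such as shingling.

```python
import hashlib


def fingerprint(text):
    """SHA-256 hash of the page text after lower-casing and collapsing whitespace."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def drop_duplicates(pages):
    """Keep the first URL for each fingerprint; pages is a list of (url, text) pairs."""
    seen = set()
    unique = []
    for url, text in pages:
        fp = fingerprint(text)
        if fp not in seen:
            seen.add(fp)
            unique.append(url)
    return unique


crawl = [
    ("http://a.example/page", "Pigeon   care tips"),
    ("http://mirror.example/page", "pigeon care TIPS"),
    ("http://b.example/other", "Something else entirely"),
]
print(drop_duplicates(crawl))  # ['http://a.example/page', 'http://b.example/other']
```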
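The human-powered directory's key limitation, that a search matches only the submitted description, can be shown with a small sketch. The `Listing` type and `directory_search` function are hypothetical names, and the rule that every query word must appear in the description is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    url: str
    description: str  # short description submitted by the site owner
    page_text: str    # the site's actual content, which the directory never searches


def directory_search(listings, query):
    """Return URLs whose submitted description contains every query word."""
    terms = query.lower().split()
    return [
        entry.url
        for entry in listings
        if all(term in entry.description.lower().split() for term in terms)
    ]


directory = [
    Listing("http://pigeons.example", "Racing pigeon club",
            "Club news, seed prices and race results"),
    Listing("http://garden.example", "Garden bird guide",
            "Includes a long section on racing pigeons"),
]
print(directory_search(directory, "racing pigeon"))  # ['http://pigeons.example']
```

Searching for `seed` returns nothing, even though the first site's content mentions seed prices, because the directory never looks beyond the description.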
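The two Teoma popularity measures described under Teoma technology & AskJeeves can be computed from a link graph. This sketch assumes a hypothetical graph (`WEB`) in which every page is already labelled with a single topic. The slides do not describe how Teoma identified subjects or how it combined the two measures, so the ordering used by the hypothetical `rank` function (subject-specific popularity first, general popularity as a tie-breaker) is an assumption.

```python
# Hypothetical link graph: page -> (topic, set of pages it links to).
WEB = {
    "p1": ("pigeons", {"target", "p2"}),
    "p2": ("pigeons", {"target"}),
    "p3": ("cars", {"target"}),
    "p4": ("cars", {"p1"}),
    "target": ("pigeons", {"p1"}),
}


def general_popularity(page, web):
    """Number of all other pages that link to the page."""
    return sum(1 for src, (_, links) in web.items() if src != page and page in links)


def subject_popularity(page, subject, web):
    """Number of pages about `subject` that link to the page."""
    return sum(
        1 for src, (topic, links) in web.items()
        if src != page and topic == subject and page in links
    )


def rank(pages, subject, web):
    # Assumption: subject-specific popularity first, general popularity breaks ties.
    return sorted(
        pages,
        key=lambda p: (subject_popularity(p, subject, web), general_popularity(p, web)),
        reverse=True,
    )


print(general_popularity("target", WEB))             # 3
print(subject_popularity("target", "pigeons", WEB))  # 2
print(rank(["p1", "p2", "target"], "pigeons", WEB))  # ['target', 'p1', 'p2']
```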
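A meta search engine's role as an agent that reuses other engines' indexes, as described on the MetaCrawler slide, can be sketched as follows. The engines here are hypothetical stubs returning fixed lists; a real meta search engine would send the query to each engine over the network. The default of 10 results per engine comes from the low end of the 10 to 30 range on the slide, and the round-robin merge with duplicate removal is an assumption, since the slides do not say how results are combined.

```python
from itertools import zip_longest


# Hypothetical stand-ins for other engines' indexes (these stubs ignore the query).
def engine_a(query):
    return ["http://x.example", "http://y.example", "http://z.example"]


def engine_b(query):
    return ["http://y.example", "http://w.example"]


def meta_search(query, engines, per_engine=10):
    """Query every engine, keep its first `per_engine` hits, merge round-robin without duplicates."""
    result_lists = [engine(query)[:per_engine] for engine in engines]
    merged, seen = [], set()
    for row in zip_longest(*result_lists):
        for url in row:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


print(meta_search("pigeons", [engine_a, engine_b], per_engine=2))
# ['http://x.example', 'http://y.example', 'http://w.example']
```

Capping each engine at `per_engine` results also illustrates the slide's disadvantage: a meta search engine may return a limited number of hits.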