Searching the web
MG 25th March '02

TABLE OF CONTENTS
• Description and explanation of search engines on the web.
• Table 1: Engine Comparison Chart. Describes methods that are beneficial for successful retrieval of information from the web.
• Table 2: Methods various search engines use to gather and format information for their databases. Description of some of the more popular search engines on the web today.
• Guidelines for the best search engine for your information needs.

• Search engines have evolved into a dynamic and powerful means of gathering, sorting and selecting a wide range of information that would not otherwise be accessible to the general public. Today there are hundreds of search engines, also called 'web location services', vying for a competitive edge in the market. These search engines have made the gathering and retrieval of information very simple for the user. Everyone who has had contact with the Internet has encountered, used or heard about a search engine. Some claim that search engines have put a great amount of power in the hands of the user free of cost, but others question whether all this abundant information actually comes with a price.

• The creation of the search engine can be credited to Alan Emtage, a grad student, who wrote a program that automatically searched for keywords appearing in archives reachable through the File Transfer Protocol (FTP, the protocol created to send files over the net). He released this program, which he called 'Archie', in 1989. Other search tools such as Veronica and Wide Area Information Servers (WAIS) were created later. With these tools, you supplied general keywords describing your topic and the sources where the tool should look. The tool then matched the criteria by looking into its index and counting how many times each item contained the selected keywords.
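The keyword counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of Archie or WAIS: the function names and the sample file listing are invented for the example, and "best match" is assumed to mean the highest total number of keyword occurrences.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_keyword_count(documents, query):
    """Return (name, count) pairs for documents that contain any query keyword,
    ordered by how many times the keywords occur (highest first)."""
    keywords = set(tokenize(query))
    scores = {}
    for name, text in documents.items():
        counts = Counter(tokenize(text))
        total = sum(counts[word] for word in keywords)
        if total > 0:
            scores[name] = total
    # Sort by descending count, then by name so that ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical archive listing, invented for illustration.
documents = {
    "readme.txt": "FTP archive of network tools and FTP clients",
    "games.txt": "Archive of classic games",
    "notes.txt": "Meeting notes",
}

results = rank_by_keyword_count(documents, "ftp archive")
```

Here `readme.txt` ranks first because the two keywords occur three times in it, `games.txt` follows with one occurrence, and `notes.txt` is left out because it contains neither keyword.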
Unfortunately, Emtage did not file a patent application, and an Anchover, Mass. company has since claimed patent rights and is disputing alleged patent infringement by other search tool companies. The battle continues, but that does not prevent scores of new search engines from being created every day.

• Today we are bombarded with hundreds of search tools crawling the Internet. To find information with any search engine, you type in one or more keywords. The search engine then searches for documents that contain the requested keyword(s). The number of times the keyword(s) occur in each document determines which documents are chosen. A search engine usually displays the results page by page. There are usually thousands of documents on the Web that seem to relate to an inquiry. As it is impossible to look through so many documents, it is best to confine your enquiry to a particular topic in order to achieve satisfactory results. To find what you need from a search engine, analyze your information needs, create queries and select appropriate search strategies. The more specific the topic, the more important it is to have a distinct idea of what you are searching for.

– There are tricks and pitfalls in search engines that a user should be aware of. Due to the increasingly competitive nature of the World Wide Web, site owners have taken drastic steps to ensure that their sites are seen by as many people as possible. Corporations and businesses usually try to attract potential customers by using tricks to guarantee that their sites are seen. A key example of this practice is "keyword spamming": the repeated use of keywords in a document to ensure that the site is placed at the top of the hit list. Another method is the use of hidden text.
Black text is hidden against a black background in a web page to push the page to the top of the list. Some site owners simply pay the search engines to place their sites at the top. The abuse of keywords and other aggressive means are only a few of the techniques used to ensure popularity.

– These questionable practices are employed because they attract hits. This does not guarantee, however, that these sites are the best ones. To benefit from the information on the web, a user needs to know what to look for and what to discard. Search engines are vast in number, and it is best to use well-thought-out keywords and prior knowledge of where to search for the information. The primary purpose of most search engines is to get as many visitors as possible to the sites listed. Many web sites are caught up in the grand idea of getting as many hits as possible and fail to realize that, in order to maintain the best possible quality of service, they must provide quality information.

– Bruceclay.com provides excellent ideas for improving and designing your site to best suit the general needs of the user. One suggestion is to use a 'follow the leader' method when selecting keywords and page wording. Meta tags in a site's source code supply keywords to search engines, and the right keywords are the backbone of a site's popularity. Knowing competitors' keywords, and improving and maintaining quality information, is the next step to maintaining a popular ranking on a search engine. This is by no means similar to 'keyword spamming', as proper selection of keywords will enable a site to maintain popularity.
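To make the role of meta tags concrete, the sketch below reads the keywords meta tag from a page's source and flags keywords that are repeated suspiciously often. It is a hedged illustration only: the class name, the helper function, the sample page and the repetition threshold of 3 are all assumptions made for this example, and real search engines do not publish how they detect keyword spamming.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collect the contents of <meta name="keywords" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            # Keywords are conventionally separated by commas.
            self.keywords.extend(
                word.strip().lower() for word in content.split(",") if word.strip()
            )


def repeated_keywords(keywords, limit=3):
    """Return keywords listed more than `limit` times (an arbitrary threshold)."""
    return sorted({word for word in keywords if keywords.count(word) > limit})


# Hypothetical page source, invented for illustration.
page = """
<html><head>
<title>Cheap flights</title>
<meta name="keywords" content="flights, cheap, cheap, cheap, cheap, travel">
</head><body>Book here.</body></html>
"""

parser = MetaKeywordParser()
parser.feed(page)
suspicious = repeated_keywords(parser.keywords)
```

On this sample page the parser collects six keywords, and only "cheap", listed four times, exceeds the threshold.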
• There are different types of web resources that can help you to find the answers to your questions:
• Subject tree: A hierarchically organized set of topic categories with lists of web sites and online documents relevant to each topic. Also called a directory.
• Clearinghouse: A collection of web sites and online documents about a specific topic. Clearinghouses are similar to subject trees but on a larger scale.
• General search engine: Indexes a large collection of web pages that users retrieve by entering keywords. General search engines rely heavily on web spiders to do most of the sorting and gathering. These databases are huge, and sometimes the relevant information may be hidden deep in the list. Providing specific and relevant keywords is ideal for retrieving information.
• Specialized search engine: Similar to a general search engine but limited to specific web pages. It takes the concept of the clearinghouse but does more than just provide links to the documents: the specialized search engine provides the actual documents. These are handpicked documents that someone has selected as relevant to the topic.

– The best search engines offer a simple query option where you type full sentences or questions that describe your information needs. The engine with the most pages in its database is not necessarily the best search engine. The attached chart indicates the best search engines to date, though this is always subject to change. There are many types of search engines, ranging from general to specialized information. The compilation of databases of documents and the indexing of these documents provide users with on-the-spot results if queries are successful.

– Again, there are hundreds of search engines that specialize in different levels of searches. By using web spiders, search engines create and update their document databases automatically.
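How a web spider builds such a database can be sketched as a breadth-first walk over links. The example below is a simplified illustration, not the crawler of any particular search engine: the names `LinkParser`, `crawl` and `FAKE_WEB` are hypothetical, and a small in-memory dictionary stands in for the real Web so that the sketch runs without a network connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url; returns {url: page_source}."""
    seen = {start_url}          # every URL discovered so far
    queue = deque([start_url])  # URLs still waiting to be fetched
    database = {}
    while queue and len(database) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:        # page missing or unreachable
            continue
        database[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return database


# Hypothetical three-page site standing in for the real Web.
FAKE_WEB = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://example.com/a.html": '<a href="/">Home</a>',
    "http://example.com/b.html": '<a href="/missing.html">Gone</a>',
}

database = crawl("http://example.com/", FAKE_WEB.get)
```

The `seen` set plays the part of the spider's list of URLs: it stops the spider from fetching the same page twice. A real spider would replace `FAKE_WEB.get` with a function that downloads pages, and would also honour each site's robots.txt.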
A web spider is a computer program that searches for web pages, collecting new pages and updating or replacing old ones. This program keeps a list of all the Uniform Resource Locators (URLs) and returns all the information to the search engine. There are numerous techniques used to index these documents, and large search engines constantly run web spiders to index as much of the Web as possible. Much of what is on the Internet is legitimate, but quite a bit of the information found is also very unreliable. Therefore, it is best to be critical of the information received. The bottom line, however, is that in order to get satisfaction from a search engine, users must be aware of the nature of their query.

• http://www.submitcorner.com
• http://webreference.com/search/background.html
• http://ariade.ac.uk/issue10/search engines
• http://searchability.com/about.htm

Search engine features chart (AltaVista, Excite, FAST, Google, Inktomi, NLight; columns: Yes, No, Notes)
• Crawling: Deep Crawl, Frames Support, Image Maps, robots.txt, Meta Robots Tag, Link Popularity Helps Deep Crawl, Learns Frequency, Paid Inclusion
• Indexing: Full Body Text, Stop Words, Meta Description, Meta Keywords, ALT text, Comments, Stemming
• Ranking: Meta Tags Boost Ranking, Link Popularity Boosts Ranking, Direct Hit Boost Ranking
• Spam: Meta Refresh, Invisible Text, Tiny Text

Web Search Engine Comparison Chart

• AltaVista
– Connector terms (Boolean): Yes. Can use AND, OR, AND NOT, and (...) in Advanced Search. Default connector is AND.
– Phrase searching: Yes. Put phrase in quotation marks.
– Search modifiers: Can use + and - in the Simple Search. Not available in Advanced Search.
– Proximity searching: Yes. Use NEAR to specify that terms be within 10 words of each other.
– Truncation or wildcards: Yes. Use the * for truncation or as a wildcard.

• Excite
– Connector terms (Boolean): Yes. Can use AND, OR, AND NOT, and (...). Must be in ALL CAPS. Default connector is OR.
– Phrase searching: Yes. Put phrase in quotation marks.
– Search modifiers: Can use + and - to require or exclude terms.
– Proximity searching: Not available.
– Truncation or wildcards: Not available.

• Google
– Connector terms (Boolean): Yes. Can use AND. Default connector is AND.
– Phrase searching: Yes. Put phrase in quotation marks.
– Search modifiers: Can use + and - to require or exclude terms.
– Proximity searching: Not available.
– Truncation or wildcards: Not available.

• HotBot
– Connector terms (Boolean): Yes. Specify Boolean Phrase in the drop-down box. Can use AND, OR, NOT, and (...). Default connector is AND.
– Phrase searching: Yes. Put phrase in quotation marks or specify in the drop-down box.
– Search modifiers: Can use + and - to require or exclude terms.
– Proximity searching: Not available.
– Truncation or wildcards: Yes. Use the * for truncation or as a wildcard.

• InfoSeek
– Connector terms (Boolean): Not available. Default connector is OR.
– Phrase searching: Yes. Put phrase in quotation marks.
– Search modifiers: Can use + and - to require or exclude terms.
– Proximity searching: Not available.
– Truncation or wildcards: Not available.

• Lycos Pro
– Connector terms (Boolean): Yes. Can use AND, OR, NOT, and (...). Can specify AND, OR in the drop-down box. Default connector is AND.
– Phrase searching: Yes. Put phrase in quotation marks or specify in the drop-down box.
– Search modifiers: Can use + and - to require or exclude terms.
– Proximity searching: Available in the drop-down box.
– Truncation or wildcards: Not available.

Choose the Best Search for Your Information Needs
• http://www.searchenginewatch.com/
• http://nuevaschool.org/~debbie/library/r esearh/adviceengine.html

End