Finding Information on the Information Highway
How to get around in the Internet

Finding information on the information highway
• The Internet vs. the World Wide Web
• Search engines
• Subject directories
• Online databases
• Boolean searches

Aren’t the Internet and the Web the same thing?
• The Internet: Think of the Internet as the physical components necessary to build a massive computer network (nodes, cables, servers, gateways, routers, firewalls, etc.).
• The World Wide Web: Think of the Web as one of the services available over the Internet: the linked pages delivered by HTTP. Other Internet services, such as email and file transfers, each require their own protocol (SMTP, FTP, etc.).

The primary Internet protocols:
• Transmission Control Protocol / Internet Protocol (TCP/IP)
• File transfers: File Transfer Protocol (FTP)
• Web pages: HyperText Transfer Protocol (HTTP)
• Email: Simple Mail Transfer Protocol (SMTP), Post Office Protocol (POP)

Finding information on the Internet: Search Engines

Search engines consist of three basic components:
• A spider (also known as a crawler or bot): a program that crawls across the Web collecting information
• A database organized by an indexer program
• Search engine software that pulls hits based on your query

The search process:
1. The user enters keywords or phrases.
2. The search engine software searches the database index to find matching items.
3. The software returns “hits” (results), prioritized according to multiple factors.

A search engine is just a program: nothing validates or authenticates the results, and no human review takes place.

Why does the same search in different search engines get different results?
• Each search engine uses different algorithms or “spiders.”
• Each engine has a different method for ranking relevance. The ranking might be based on factors such as:
    - Frequency: How many times do the words occur in the website?
    - Location: Are keywords contained in the URL or the site name?
• The hits are dependent on database content.
• Each engine may search different sites:
    - Is the search being conducted across the entire web?
    - Is this a specialty search?

Bottom line? Use more than one search engine to perform research!

Some of the factors search engines may use to rank results:

Factors based on the site itself:
• Frequency
• Location
• Page count
• Website structure

Factors based on external criteria:
• Link popularity
• Click popularity
• Demographics
• Alliances
• $$$ (who pays the most to have their sites shown)

What are “metasearch” engines?
• Search engines that search search engines instead of individual websites
• Think how much wider the search area is!

Finding information on the Internet: Subject Directories

How do subject directories differ from search engines?
• Utilize the human element to categorize
• Typically more commercial/consumer oriented
• “Drill-down” search by subject, not keywords
• Hierarchical organization
    - Topics
    - Subtopics

A great resource on subject directories:
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SubjDirectories.html

Finding information on the Internet: Online Databases

Online databases are referred to as the hidden Internet or deep web.
• Online databases provide access to resources outside the reach of web crawlers or search spiders:
    - Newspapers, journals, periodicals
    - Academic papers, white papers
    - Corporate data and specialty data

About Yahoo! Library Index

A great resource for databases:
http://www.itc.nl/Pub/Home/library/Library-generalinformation/more-info-databases/lii_info.html

Making the most of your search

Do your searches end up returning an overwhelming number of hits? Use Boolean operators to “tweak” them!
The basic Boolean operators: AND, OR, NOT

Examples of how Boolean operators affect your search:
• car AND ford
    Returns: Documents containing BOTH the words car and ford. (AND is assumed when two words are used.)
• car OR ford
    Returns: Documents containing either word or both words. OR results in the greatest number of hits.
• car NOT ford
    Returns: Documents containing car that do NOT contain ford. NOT generally returns the smallest number of hits.

Ways to further refine your search:
• Combinations
    Example: (car AND ford) NOT Gerald
    Returns: Documents containing BOTH the words car and ford but nothing about President Gerald Ford.
• Quotes
    Example: “Men In Black”
    Returns: Documents containing the exact string of words within the quotes, not any occurrence of any of the words.
• Wildcard (*)
    Example: Bio*
    Returns: Documents that contain any words that begin with the letters bio (biology, biography, biotech, etc.).
• Wildcard (%, ?)
    Example: Smithw%ck
    Returns: % stands for any letter, which is great when a word may be spelled in a variety of ways (Smithwick, Smithwyck, Smithweck).

More considerations (these may vary based on the search engine used):
• Stopwords
    Ignored by the search engine (a, an, the, of, by, with, for, to, etc.).
• Keys (not concepts)
    Break phrases down into keywords.
    Example: “TQM in manufacturing assembly lines” becomes total quality management, TQM, production, manufacturing, assembly line production, etc.
• Proximity operators
    Designate how close keywords should be.
• Variety
    Change spellings; try abbreviations, singular/plural forms, related terms, synonyms.
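
To see why AND narrows a search while OR widens it, it can help to treat each keyword as the set of documents that contain it. The sketch below is only an illustration under stated assumptions: the four sample documents, the names `DOCS`, `build_index`, and `lookup`, and the tiny in-memory index are all hypothetical, and real search engines index and rank results in far more elaborate ways.

```python
import re
from collections import defaultdict

# Hypothetical mini-collection standing in for a search engine's database.
DOCS = {
    1: "Ford unveils a new electric car",
    2: "Used car prices keep falling",
    3: "President Gerald Ford signed the bill",
    4: "Biology and biotech careers",
}

def build_index(docs):
    """Indexer step: map each lowercase word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

def lookup(index, term):
    """Return matching document ids; a trailing * matches any word with that prefix."""
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        for word, ids in index.items():
            if word.startswith(prefix):
                hits |= ids
        return hits
    return set(index.get(term, set()))

INDEX = build_index(DOCS)
car = lookup(INDEX, "car")
ford = lookup(INDEX, "ford")

both = car & ford                                   # car AND ford
either = car | ford                                 # car OR ford
car_only = car - ford                               # car NOT ford
no_gerald = (car & ford) - lookup(INDEX, "gerald")  # (car AND ford) NOT Gerald
bio = lookup(INDEX, "bio*")                         # Bio*

print(sorted(both))       # [1]
print(sorted(either))     # [1, 2, 3]
print(sorted(car_only))   # [2]
print(sorted(no_gerald))  # [1]
print(sorted(bio))        # [4]
```

In this toy collection, OR returns the most documents and AND the fewest, because AND is a set intersection and OR is a set union. Note that the plain term car does not match “careers”; only the wildcard form would.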