LIS618 lecture 1
Thomas Krichel
2004-01-2

Structure of talk
• Recap on Boolean (aurally)
• Before online searching
• Working with DIALOG
  – Overview
  – Search command
  – Bluesheets
  – Basic and additional index

before a search I
• What is the purpose?
  – brief overview
  – comprehensive search
• What perspective on the topic?
  – scholarly
  – technical
  – business
  – popular

before a search II
• What type of information?
  – Fulltext
  – Bibliographic
  – Directory
  – Numeric
• Are there any known sources?
  – Authors
  – Journals
  – Papers
  – Conferences

before a search III
• What are the language restrictions?
• What, if any, are the cost restrictions?
• How current do the data need to be?
• How much of each record is required?

Concept analysis
• This is the art/science of taking the topic to search for and developing facets.
• Example: “Internet filtering in Libraries”
  – Internet filter
  – Libraries
  – Controversy, not technical issues
• We may also need to think about the aim of the search.

Search aims
• a known needle in a known haystack
• a known needle in an unknown haystack
• an unknown needle in an unknown haystack
• any needle in a haystack
• the sharpest needle in a haystack
• most of the sharpest needles in a haystack

Search aims (continued)
• all the needles in a haystack
• affirmation of no needles in a haystack
• things like needles in a haystack
• is there a new needle in the haystack
• where are the haystacks
• needles, haystacks, anything

types of searches
• known-item searches
• negative searches
• selective dissemination of information
• topical or subject searches
• passage searching, where the user is only interested in part of the item

search strategies I
• Building block approach
  – Do a number of elementary searches
  – Combine the resulting sets with Boolean operators
• This is what I did in the example in the previous lecture
• Works only with the Boolean model

search strategies II
• Snowballing approach
  – Start with a very specific query
  – Think of other terms that can be added to get more results
  – Stop when a reasonable number of results is reached.
• Not sure this really works well in practice.

search strategies III
• The successive fraction approach is the opposite of the snowballing approach
  – First search for a broad concept
  – Then repeat the query, adding various limiting factors.
• Can work well if the IR system allows you to repeat and edit queries.
• But queries can become unwieldy.

search strategies IV
• Most specific facet first
  – Conduct concept analysis
  – Look for the most specific facet
  – Search that first, add others later
• Presupposes that you have done a decent concept analysis.

two steps in DIALOG
• step one: select databases (aka files) to look at
• step two: perform searches on the selected databases
• You may wonder why one does not have one single step like in a search engine. Discuss.
• today we concentrate on the second step

working on selected files
• We assume that we have selected a database that we know, and we look at the search interface on that database.
• The database selection process is a bit more complicated; it is covered next week.
• First, let us log in and look at the command prompt.
• Then we select the first database (file) with the begin command

The begin command
• As its name suggests, usually the first command.
• begin number, number,…
• selects the files with the given numbers
• Once they are selected they can be searched.
• Now select ERIC: "begin 1"
• "Begin 1" can be abbreviated as "b 1"

Substeps in the second step
• Identify search terms
• Use Dialog basic commands to conduct a search
• View records online or print the results

the 's' (select) command
• Once we have issued the "begin" command to select a database, we issue the "s" command on the database.
• "s query_terms" where query_terms are the query terms
• This will search the index of the selected database in full-text view for the query issued
• It will not find any of the following: "an and by for from of the to with". They are stop words.
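The effect of the stop word list on a query can be illustrated locally. The following is a minimal Python sketch, not a description of how DIALOG builds or searches its index; the helper name searchable_terms is hypothetical, and only the stop word list itself comes from the slide above.

```python
# Stop words quoted on the slide for the DIALOG 's' (select) command.
STOP_WORDS = frozenset("an and by for from of the to with".split())


def searchable_terms(query: str) -> list:
    """Return the words of a query that are not stop words.

    Hypothetical helper for illustration only; DIALOG itself is not involved.
    """
    return [word for word in query.lower().split() if word not in STOP_WORDS]


if __name__ == "__main__":
    # prints ['filtering', 'internet', 'libraries']
    print(searchable_terms("filtering of the Internet by libraries"))
```

Running a planned query through such a filter before going online shows which words will actually carry the search.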
connectors
• If you want to use several keywords there are three ways
  – you can truncate search terms
  – you can build an expression by putting several keywords together. This is achieved by DIALOG's connectors.
  – you can combine several expressions with the use of Boolean operators
• we will cover these in turn now

truncation of terms
• Open Truncation
  – "select path?" retrieves all words that begin with path: paths, pathos, pathway, pathology
• Controlled-Length Truncation
  – "select path??" retrieves the root and up to two additional characters: paths, pathos

truncation of terms II
• Embedded Character truncation can be used for variant spellings:
  – "select organi?ation" -> organization organisation
  – "select fib??board" -> fiberboard fibreboard
• This truncation feature is also useful for searching for unusual plural forms:
  – "select wom?n" -> woman women
• Apparently you can also do prefixes by putting the ? at the beginning.
  – "?mobile" -> automobile metamobile

Use of connectors
• Connectors are used to put several words together.
• One instance where this is useful is when you have words that on their own mean different things.
• For example "mate" is a herbal beverage consumed in South America. Looking for mate on the Internet retrieves a lot of singles' pages.

example: terms related to "mate"
• What other terms could be used?
  – matear (drink mate)
  – matero (mate drinker)
  – cebar (prepare mate)
  – cebador (mate preparer)
  – yerba (mate herb)
  – bombilla (mate straw)

connectors I
• '(W)' requires the terms to appear directly next to each other, in the order given, e.g. 'yerba(W)mate?' matches "yerba mate".
• '(i W)', where i is an integer, means the first term is followed by the second with at most i words in between, e.g. 'ceba?(3W)mate?' matches "cebar un maravilloso mate" but not "cebador guapo mirando un buen mate"

connectors II
• '(N)' requires the terms to be next to each other, in either order, e.g. 'yerba(N)mate?' matches "yerba mate" or "mate yerba".
• '(i N)', where i is an integer, means the terms are within at most i words of each other, in either order, e.g. 'ceba?(3N)mate?' matches "cebar mate" or "matear con la cebadora".
• '(S)' searches for the occurrence of connected terms in the same paragraph.

using Boolean operators
• In your query, you can combine several expressions with Boolean operators
• Example: "S LIBRARY(W)SCHOOL? AND DISTANCE(W)EDUCATION"
• But I usually do not issue such fancy queries.

executing several searches
• Several searches can be done sequentially, and the result sets are saved by the system.
• Each time, the system assigns a set number, Si.
• These can be combined in Boolean expressions, e.g. 's S1 or S2 and S3'
• Remember that Boolean operations are set-theoretic!

Boolean operators on sets
• when using Booleans, be aware that "and" has higher precedence than "or".
• Thus: a or b and c is not the same as (a or b) and c but it is a or (b and c)
• use parentheses when in doubt

DS (display sets)
• This command can be executed any time to review the sets that have been formed since the last B (begin) command.
• This can be useful to review your search history.

the target command
• "target set", where set is a search result set, creates a subset of the "statistically most relevant results" in the original set.
• I have not seen details about how this subset is computed.
• A new result set is formed.

display: the type command
type set/format/range
• set is a result set
• format is a format
• range can be
  – start-end
    • start is the record number to start at
    • end is the record number to end at
  – all

standard delivery formats
• 2: full record except abstract
• 3 or medium: citation
• 5 or long: full except full text
• 6 or free: title and dialog number
• 8 or short: title plus indexing terms
  – useful to find other indexing terms
• 9 or full: everything
• KWIC or K: keywords in context

options for delivery
• I once tried to email results to myself, to no avail
• You can save the html of the search results in the browser.
• You can print the results within the browser.
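
The precedence rule from the "Boolean operators on sets" slide can be checked with ordinary Python sets. This is a sketch under stated assumptions: the record numbers in s1, s2 and s3 are made up, and Python's "&" and "|" merely stand in for DIALOG's "and" and "or" (conveniently, "&" also binds tighter than "|").

```python
# Made-up record numbers standing in for three DIALOG result sets S1, S2, S3.
s1 = {1, 2, 3}
s2 = {3, 4}
s3 = {4, 5}

# 's S1 or S2 and S3': "and" binds tighter, so this reads as S1 or (S2 and S3).
default_reading = s1 | s2 & s3
explicit_reading = s1 | (s2 & s3)

# Grouping the "or" first gives a different, much smaller set.
other_grouping = (s1 | s2) & s3

if __name__ == "__main__":
    print(sorted(default_reading))   # prints [1, 2, 3, 4]
    print(sorted(other_grouping))    # prints [4]
```

The two groupings differ by three records even on this tiny example, which is why parentheses are worth typing when in doubt.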

http://openlib.org/home/krichel
Thank you for your attention!