COPING WITH INFORMATION RETRIEVAL PROBLEMS
ON THE WEB: TOWARDS PERSONAL WEB WEAVER AGENTS
Mohamed Yassine El Amrani, Sylvain Delisle & Ismaïl Biskri *
Département de mathématiques et d'informatique
Université du Québec à Trois-Rivières,
C.P. 500, Trois-Rivières, Québec, Canada, G9A 5H7
{elamrani, biskri, delisle}@uqtr.uquebec.ca
http://www.uqtr.uquebec.ca/{~delisle/, ~biskri/}
Abstract
We present the salient features of a new research project in
which we are developing software that can help the user to
customise (subsets of) the WWW to her individual and
subjective information needs. The software’s architecture is
based on several natural language processing tools, some of
them numeric and some others linguistic. The software
involves a semi-automatic processing strategy in which the
user’s contribution ensures that the results are useful and
meaningful to her. These results are saved in various
databases that capture several types of information (e.g.
classes of related Web sites/pages, user profile, data
warehouse, lexicons, etc.) that will constitute a local and
personalised subset of the WWW, support the user’s
information retrieval tasks and make them more effective
and more productive.
1. Introduction
Whether one looks at the World Wide Web (abbreviated
here as WWW or Web) as the largest electronic library
ever built by man or as the biggest semi-structured
database of digital information available today, one has to
admit that the WWW has also generated unprecedented
levels of user frustration. Indeed, finding on the Web the
information that is relevant to one’s needs all too often
leads to one or both of these situations: either search results
are far too numerous to be processed in a reasonable period
of time (typically with unacceptable levels of noise, i.e.
very low precision), or search results are too few to be
trusted (typically due to unacceptable levels of silence, i.e.
very low recall). There are several interrelated reasons why
information retrieval (IR) on the Web is so difficult; let us
now consider them.
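In standard information retrieval terms (Salton & McGill, 1983), noise and silence correspond to low precision and low recall. Writing R for the set of relevant documents and A for the set of documents actually retrieved, the two measures are:

\[ \mathit{precision} = \frac{|R \cap A|}{|A|}, \qquad \mathit{recall} = \frac{|R \cap A|}{|R|} \]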
* The work described here is in progress. In particular, the
agent-based architecture is being implemented and is not
described here. However, further details will be available in
the final version of this paper and at the conference.
First, most WWW search engines are based on information
access schemes, in particular indexing, with which only
superficial searches may be performed. Accessing
information via an index is a simple and efficient technique
but it is not necessarily the best one for IR purposes, at
least from the user’s perspective. Typically, search engines
do not have enough “intelligence” to go beyond the surface
of the Web.
Second, the wide variety of data on the WWW and the lack
of more content-based access information (i.e. metadata)
associated with each document make it difficult for
engines to search the Web more intelligently. The various
types of document encoding schemes, formats and
contents, combined with the heterogeneous and multimedia
nature (i.e. text, image, sound, etc.) of documents now
available on the WWW, significantly complicate the task
of making Web documents more intelligible to
computerised means.
Third, the incredible rate at which the Web has developed
and continues to do so, plus the number of communication
links, computers, documents, and users it now involves,
and all this within the context of an obvious lack of
standardisation and organisation, cast serious doubts on the
feasibility of an eventual project of “Web normalisation”.
The WWW has pretty much evolved as a dynamic
organism and will most likely keep on doing so in the near
future.
Fourth, despite certain hopes of normalisation and self-imposed control within the WWW community, an
important difficulty remains, a difficulty which we think is
often overlooked: when a user performs a search on the
WWW, whether she knows exactly what she is looking for
or not, she is actually trying to find a match between her
concepts (i.e. what she knows) and those available on the
Web (i.e. what the Web “knows”). As of today, this
mapping can essentially be established through the use of
keywords (indexing terms). However, if the available
keywords are not at the meeting point of the user’s
concepts and the concepts used in the relevant Web
documents, the user will find search engines’ results
desperately useless as they appear meaningless to her.
Some argue that in order to solve the IR problem on the
WWW, we should develop “brighter” search engines and
standardise all aspects of the Web, including how indexing
terms should be determined, how documents should be
categorised, what metadata should be associated with any
document, and even how any document should be encoded
(e.g. with XML). Maybe this is how it should be. However,
we think this is not how it will be in the foreseeable future.
We believe the WWW will continue to evolve almost
freely, subject to the democratic pressure of its large user
community, and will thus resist any large-scale
normalisation effort. If our hypothesis is right, if only for a
few more years, the conclusion is clear: Web users will
continue to experience frustration with search engines’
results for a good while!
The purpose of this paper is to present an alternative to the
current situation in which many WWW users are trapped:
the Web is not how they think it should be and they simply
cannot productively make use of current search engines’
results. So be it! Why not allow them to weave (i.e.
construct) themselves a meaningful subset of the WWW
especially tailored to their needs? In order to do that, we
propose a semi-automatic approach which involves the use
of natural language processing (NLP) tools. In Section 3,
we argue that the evaluation of Web documents with
regard to a user’s information needs is highly personal and
that the user must actively take part in the evaluation
(interpretation) process. The computerised tools are
designed to facilitate her IR endeavour by supporting the
task of identifying intersecting concepts between what she
knows and what the WWW has to offer, by supporting the
weaving of a personalised sub-WWW, and by learning
from the user’s interaction with the system in order to push
user assistance even further. Our work can be associated
with recent research related to the personalisation (or
customisation) of the Web and also to other recent work on
the use of NLP and Artificial Intelligence techniques for
the Web: we summarise such related work in the next
section.
2. Related Work
Due to space constraints, the review of related work will
only appear in the final version. However, our references
appear at the end of the paper.
3. The Personal Web Weaver Project
We now describe the main elements of our Personal Web
Weaver Project, which is currently in progress. Although
the project is still in its early stage, we have already
designed and developed several components; the
implemented components were programmed in C++ and
Prolog. In continuation with some of our recent work on
hybrid models, in particular Biskri & Meunier (1998),
Biskri & Delisle (1999) and Biskri & Delisle (2000), this
project combines numerical and linguistic tools. The main
justification for such a hybrid approach is that numerical
tools quickly give indices about a text’s (corpus’) subject
matter, whereas linguistic tools use these indices to
perform further, more expensive processing in order to
extract the most interesting information for a given user.
The quick numerical processing can be applied to a very
large text (corpus) in order to select subsets that seem to
deserve further detailed processing by the linguistic tools.
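As a minimal illustration of this division of labour (the sketch is ours, and all names, including the cut-off TOP_K, are hypothetical rather than part of the project's C++/Prolog implementation), a cheap numerical pass can rank the segments and hand only the best candidates to the expensive linguistic stage:

```cpp
// Hypothetical sketch of the numeric-then-linguistic pipeline;
// scoreSegment, linguisticAnalysis and TOP_K are illustrative names.
#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Cheap numerical pass: count occurrences of query words in a segment.
int scoreSegment(const std::string& segment,
                 const std::vector<std::string>& queryWords) {
    int score = 0;
    std::istringstream in(segment);
    std::string token;
    while (in >> token)
        for (const auto& w : queryWords)
            if (token == w) ++score;
    return score;
}

// Expensive linguistic pass (tagger, parser, summariser, ...); stubbed here.
void linguisticAnalysis(const std::string& segment) {
    std::cout << "analysing segment of " << segment.size() << " chars\n";
}

// Rank all segments numerically, then analyse only the TOP_K best ones.
void hybridPipeline(const std::vector<std::string>& corpus,
                    const std::vector<std::string>& queryWords) {
    const std::size_t TOP_K = 10;  // illustrative cut-off
    std::vector<std::pair<int, std::size_t>> ranked;
    for (std::size_t i = 0; i < corpus.size(); ++i)
        ranked.emplace_back(scoreSegment(corpus[i], queryWords), i);
    std::sort(ranked.rbegin(), ranked.rend());  // highest score first
    for (std::size_t k = 0; k < std::min(TOP_K, ranked.size()); ++k)
        linguisticAnalysis(corpus[ranked[k].second]);
}
```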
As we know all too well, the Web grows in size and
complexity at an incredible rate: every day thousands of
new Web pages are created. In a few years’ time, the
WWW has become an impressive source of information.
But how to get access to this information? Or, more
precisely, how to reach Web sites (and Web pages) that are
of interest to us, i.e. of interest to our individual needs?
The current default solution to this non-trivial problem is to
perform a search on the WWW with the help of a search
engine. Typically, such a search is performed after the user
has specified a few keywords, usually fewer than
five and often only two or three. This deceptively simple
procedure hides several difficulties and subtleties. First, the
user often has only partial knowledge of the domain she is
searching or knows only partially what her exact
information needs are. Second, as a consequence of the
first point, the user does not know what keywords will best
represent the profile of her information needs. Third, many
search engines use indexes which are built automatically
with a list of “objective” or “standard” keywords. But these
keywords are anything but objective! Many words have
many different meanings or usages, depending on the
particular context or domain in which they appear. This is
the problem we consider in the Personal Web Weaver
Project: we present a hybrid software tool, comprised of
several components, that constitutes an environment for
tailoring (a subset of) the Web to the user’s information
needs. Let us now have a look at the overall processing
strategy which is organised in four main phases.
Phase #1 (fully automatic): An initial query is formulated
and submitted to a standard search engine, such as
AltaVista or Yahoo!. This initial query is most probably
sub-optimal, for the reasons we mentioned just
above. But that is exactly our starting point: we only
expect the user to have a relatively good idea of what she is
looking for, not too vague but not too precise either. In
fact, we expect this initial query to subsume the exact
information she is hoping to find on the Web. In a
subsequent step, the user will be given the opportunity to
reconsider her query in the light of the search results
returned for her original query.

[Figure 1: From the user’s original query to a satisfying query (phases #1 and #2 of a 4-phase process). The diagram shows the loop: the user’s original query goes to a standard Web search engine; the returned list of Web page addresses feeds a corpus construction process; the resulting corpus of segments (i.e. of Web page contents) is passed to the numerical classifier, which yields classes of similar (or related) Web sites and classes of co-occurring words; the user picks new words from these classes, improves the query, and iterates until the results satisfy her information needs, ending with the final query and final classes.]

These search results are
typically Web addresses of sites containing information in
textual or other forms. Of course, there is the possibility
that the number of Web addresses returned by the search
engine will be very large, maybe too large for practical
reasons. We are planning to consider only a certain
percentage of these addresses (amongst the highest ranked)
when their number exceeds a certain upper limit, a limit
which remains to be determined by empirical evaluation.
This selection will be done, if necessary, by the “corpus
construction process” module (see Figure 1 above).
An important hypothesis in this project is the following:
we consider that each Web site represents a text segment.
Thus, a set of Web sites forms a set of text segments
which, taken altogether, can be looked at as a corpus—
every segment retains its identity and source. We then
submit this corpus to a numerical classifier (Meunier et al.,
1997; Rialle et al., 1998) that will help us identify
segments sharing lexical regularities. Assuming that such
similarities correspond to content similarities between the
text segments, and thus between the Web sites, related
segments will tend to be grouped in the same classes. In
other words, the classes produced by the numerical
classifier will tend to contain segments about the same
topic and, by the same token, will identify the lexical units
that tend to co-occur within these topics. And this is where
we have a first gain with respect to the original query: the
numerical classifier’s results will provide a list of
candidate query terms that are related to the user’s original
query and that will help her reformulate a new, more
precise, query (perhaps using some of the words in the
original query plus some others from step #1). Figure 1
above depicts phase #1.
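The classifier used in the project is the connectionist one of Meunier et al. (1997); the rough stand-in below, entirely our own, only illustrates the kind of grouping intended, clustering segments by the cosine similarity of their term-frequency vectors:

```cpp
// Illustrative stand-in for the numerical classifier: group text segments
// whose term-frequency vectors have cosine similarity above a threshold.
#include <cmath>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using TermVector = std::map<std::string, double>;

// Build the term-frequency vector of one segment.
TermVector termFrequencies(const std::string& segment) {
    TermVector tf;
    std::istringstream in(segment);
    std::string w;
    while (in >> w) tf[w] += 1.0;
    return tf;
}

double cosine(const TermVector& a, const TermVector& b) {
    double dot = 0, na = 0, nb = 0;
    for (const auto& [w, f] : a) {
        na += f * f;
        auto it = b.find(w);
        if (it != b.end()) dot += f * it->second;
    }
    for (const auto& [w, f] : b) nb += f * f;
    return (na && nb) ? dot / (std::sqrt(na) * std::sqrt(nb)) : 0.0;
}

// Single-pass clustering: assign each segment to the first class whose
// representative is similar enough; otherwise start a new class.
std::vector<std::vector<std::size_t>>
classify(const std::vector<std::string>& segments, double threshold = 0.3) {
    std::vector<TermVector> reps;                     // one representative per class
    std::vector<std::vector<std::size_t>> classes;    // segment indices per class
    for (std::size_t i = 0; i < segments.size(); ++i) {
        TermVector tf = termFrequencies(segments[i]);
        bool placed = false;
        for (std::size_t c = 0; c < classes.size() && !placed; ++c)
            if (cosine(tf, reps[c]) >= threshold) {
                classes[c].push_back(i);
                placed = true;
            }
        if (!placed) {
            classes.push_back({i});
            reps.push_back(std::move(tf));
        }
    }
    return classes;
}
```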
Phase #2 (semi-automatic): The classes of Web pages
obtained at the end of phase #1 are now considered as extra
or contextual information, relative to the user’s original
query, that will allow her to reformulate her query (if need
be). Indeed, words in classes of co-occurring words can act
as carriers of a more selective or complementary sense when
compared with the keywords used in the original query.
For instance, such words will be more selective when the
original keywords are polysemous or when they have
multiple usages in different domains; these new words
will be complementary when they broaden the subset of
the WWW implied by the original query. Now, which of
these new words should be picked by the user to
reformulate her query? An obvious starting point is to pick
words that appear in classes in which the original query
keywords appear. At that point the user can formulate an
updated query from which she will obtain, through the
processing already presented in phase #1, another set of
classes containing similar (or related) Web pages and co-occurring words. Again, at the end of this step, the user
may discover new words that could guide her in a more
precise reformulation of her query. Eventually, after a few
iterations, the user will end up with a precise query
formulation that leads (via the Web addresses) to the
information she was looking for or, at least, with which she
is now satisfied. It should be clear that it is entirely up to the
user to determine when this iterative query reformulation
process ends. This is a personal decision relative to a
personal information need via a personal evaluation
scheme: this is what subjectivity is all about and it must be
accounted for in customisable Web tools. Phase #2 is
depicted in the lower part of Figure 1 above.
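A minimal sketch of the selection heuristic just described, assuming the classes of co-occurring words are available as sets of strings (the function name suggestTerms is ours):

```cpp
// Hypothetical helper for phase #2: from the classes of co-occurring words,
// suggest expansion terms that share a class with an original query keyword.
#include <algorithm>
#include <set>
#include <string>
#include <vector>

std::set<std::string>
suggestTerms(const std::vector<std::set<std::string>>& wordClasses,
             const std::set<std::string>& queryKeywords) {
    std::set<std::string> candidates;
    for (const auto& cls : wordClasses) {
        // Does this class contain at least one original keyword?
        bool hit = std::any_of(queryKeywords.begin(), queryKeywords.end(),
                               [&](const std::string& k) { return cls.count(k) > 0; });
        if (!hit) continue;
        for (const auto& w : cls)
            if (!queryKeywords.count(w)) candidates.insert(w);  // new words only
    }
    return candidates;
}
```

The returned candidates are only suggestions: as stressed above, it is the user who decides which words actually enter the reformulated query.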
Phase #3 (semi-automatic): At the end of phase #2, the
user has an improved request that satisfies her information
needs. It is time now to explore in further detail the final
resulting segments produced by the numerical classifier.
This in-depth exploration will serve two main purposes: i)
to allow the user to construct (weave) her own personal
subset of the Web, including the construction of a
meaningful index; and ii) to extract information and
knowledge from these Web sites, much in the sense of data
mining. To assist the user during this phase, a variety of
NLP tools will be available, such as text summarisers, text
analysers, or, more simply, lexicon construction tools, that
can all be applied to entire segments or subsets of them
that are of particular interest to the user—we already have
several tools (e.g. Biskri & Delisle, 1999; Biskri &
Meunier, 1998; Biskri et al., 1997), some others are being
developed, and others are available on the WWW. Further
research is required to identify other potentially useful
tools and eventually develop them.
The extracted information will be saved in a database,
following the basic principles of data warehousing. This
information will be organised in a way that will facilitate
further processing in the context of this project. For
example, for the reformulation of her query (see phase #2),
the user may find it useful to consult the database of
relationships between words, terms, concepts and Web
addresses. In a sense, this database will constitute the basic
level of a memory-based learning mechanism that will
guide the user during her interaction with the system. The
database aspects of the project have numerous
ramifications, one of which is the evaluation of extended
SQLs that are especially developed for the processing of
textual data. This is, again, subject to further research.
Finally, the improved request will be used to update the
user’s profile kept by the system. User profiles have
several applications (e.g. information routing), the most
important of which we mentioned in our review of related
work (Section 2). Phase #3 is depicted in Figure 2 (not
included here due to space constraints).
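As a sketch only, the core relation of such a database, from terms to the Web addresses where they occur, might look as follows; a real data warehouse would of course sit behind (possibly extended) SQL rather than in memory, and TermDatabase is an illustrative name:

```cpp
// Minimal in-memory sketch of the phase #3 term database: relations
// between terms and the Web addresses in which they were found.
#include <map>
#include <set>
#include <string>

class TermDatabase {
    std::multimap<std::string, std::string> termToUrl;  // term -> URL
public:
    void record(const std::string& term, const std::string& url) {
        termToUrl.emplace(term, url);
    }
    // All addresses where a term occurs, e.g. to support query reformulation.
    std::set<std::string> addressesFor(const std::string& term) const {
        std::set<std::string> urls;
        auto [lo, hi] = termToUrl.equal_range(term);
        for (auto it = lo; it != hi; ++it) urls.insert(it->second);
        return urls;
    }
};
```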
Phase #4 (semi-automatic): It is in this phase that
personalisation (or, customisation) takes place. With the
system being developed within the Personal Web Weaver
Project, the user will be able to shape a subset of the
WWW according to her own personal vision. Web pages
that the user has explored (in the course of previous
phases) and decided to keep as truly useful and informative
will be indexed by her, according to her criteria. Indexing
will be done locally on her computer with the help of a set
of words, terms, proper nouns, etc., to which she can relate
in a meaningful way. At the end of this fourth phase, the
user will be able to directly consult Web pages that she
knows are useful to her, as well as to submit search queries
based on her own (local) personal index only. The
customised Web represents well her information needs and
can be seen as a projection of the WWW over the set of
attributes that come from a combination of the user’s
profile and the dynamic choices she made when exploring
the Web.
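To fix the idea, such a personal index reduces to an inverted index from the user’s own terms to the pages she kept. The sketch below (PersonalIndex and its members are hypothetical names) answers conjunctive queries from that local index only:

```cpp
// Sketch of the phase #4 local index: the user attaches her own index
// terms to pages she has kept; queries are answered from this index only.
#include <map>
#include <set>
#include <string>
#include <vector>

class PersonalIndex {
    std::map<std::string, std::set<std::string>> postings;  // term -> URLs
public:
    // The user indexes a kept page under terms meaningful to her.
    void indexPage(const std::string& url,
                   const std::vector<std::string>& userTerms) {
        for (const auto& t : userTerms) postings[t].insert(url);
    }
    // Conjunctive query: return pages indexed under all given terms.
    std::set<std::string> query(const std::vector<std::string>& terms) const {
        std::set<std::string> result;
        bool first = true;
        for (const auto& t : terms) {
            auto it = postings.find(t);
            std::set<std::string> urls =
                (it == postings.end()) ? std::set<std::string>{} : it->second;
            if (first) { result = urls; first = false; }
            else {
                std::set<std::string> inter;
                for (const auto& u : result)
                    if (urls.count(u)) inter.insert(u);
                result = std::move(inter);
            }
        }
        return result;
    }
};
```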
Of course, this customised Web comes at a certain price.
As is typical of semi-automatic approaches involving some
knowledge extraction mechanism, this cost is usually
relatively high at the beginning but tends to significantly
decrease in the long run—see Barker et al. (1998). In this
case, a semi-automatic “multi-extraction” process (see
Figure 3: not included here due to space constraints)
working on the textual content of the Web pages kept by
the user will produce a lexicon, a list of single- and multiword terms, a list of proper names, a list of acronyms, etc.
These lists will populate a term database that will be useful
during the personalised indexing process. Since it is the
user who selected the Web pages that were of interest to
her, the probability is high that the resulting terms (and
associated concepts) will be meaningful to her and thus
well adapted to her customised Web index. Obviously, not
all terms in that database will be used for indexing: here
again, the user’s supervision and decision will determine
the final outcome. Finally, in order to keep track of
changes in Web sites or pages that are on the user’s
personal index (i.e. in her customised Web), an update
process will regularly verify the URLs that appear in the
user’s index.
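One conceivable, deliberately naive form of this multi-extraction process collects acronym and proper-name candidates by surface patterns alone; the real process would rely on the linguistic tools cited above, so this sketch only fixes the idea:

```cpp
// Very rough stand-in for the "multi-extraction" process: collect candidate
// acronyms (all-caps tokens) and proper-name candidates (capitalised tokens)
// from a kept page's text, to populate the term database.
#include <cctype>
#include <set>
#include <sstream>
#include <string>

struct ExtractionResult {
    std::set<std::string> acronyms;
    std::set<std::string> properNameCandidates;
};

ExtractionResult multiExtract(const std::string& pageText) {
    ExtractionResult r;
    std::istringstream in(pageText);
    std::string tok;
    while (in >> tok) {
        // Strip trailing punctuation.
        while (!tok.empty() &&
               std::ispunct(static_cast<unsigned char>(tok.back())))
            tok.pop_back();
        if (tok.size() < 2) continue;
        bool firstUpper = std::isupper(static_cast<unsigned char>(tok[0])) != 0;
        bool allUpper = true;
        for (char c : tok)
            if (!std::isupper(static_cast<unsigned char>(c))) allUpper = false;
        if (allUpper) r.acronyms.insert(tok);
        else if (firstUpper) r.properNameCandidates.insert(tok);
    }
    return r;
}
```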
4. Conclusion
There are three important aspects that distinguish our work
when compared to others with similar objectives: 1) we
think that users should not expect too much from the Web
community in terms of self-regulation or generalised
development of tools that will significantly improve the
adaptability of the Web to individual information needs; 2)
we think that the very idea of developing an “objective”
Web is problematic; 3) we think that the idea that only
fully automatic Web tools are of any interest will prevent
users from reaching many of their personal goals when
searching the Web. Such tools must make room for the
user’s subjectivity: this is a must in a process involving the
human in a personal reading or text interpretation activity.
All this justifies an approach such as the one put forward
here: let us give Web users the tools that will allow
them to weave their own personal and subjective, but truly
meaningful, WWW. Only then will information retrieval
take on another dimension from the user’s viewpoint: more
efficient, more effective and certainly more productive
than it typically is now.
We have presented the main ideas of a new research
project called the Personal Web Weaver Project. The latter
is still at an early stage and it is much too soon to draw
conclusions: a lot of work remains to
be done at the implementation level and an extensive
evaluation involving “real” users will have to be carried
out once the implementation is completed. One aspect of
evaluation could try to measure the gains that our software
can bring to the user, in the short and longer run, and make
a comparison with current tools. Several aspects of our
present design might also have to be reconsidered or
extended. For instance, user queries are currently
formulated with keywords only. But we are also interested
in testing queries based on: bookmarks, phrases, sentences
or paragraphs; syntactic and semantic markers; logical
representations; and even extended SQL queries in
situations where our software could be integrated in a
database environment. As well, several aspects of machine
learning seem quite promising for further investigation in the
context of the current system architecture. For instance,
the potential iteration from phase #4 to phase #1, guided
by the information already integrated in the user’s personal
Web, could intelligently support further extensions of the
user’s Web as well as provide relevant background for
new information retrieval queries. There is also the
possibility of integrating an interactive user feedback
mechanism that would allow the system to keep track of
successful operations during the overall Web
customisation process and thus better adapt to the user’s
information needs. Finally, two other areas of research will
have to be investigated: user interface design (to make the
semi-automatic processing as easy to use as possible) and
multimedia adaptation (to process data in other forms than
text alone).
Acknowledgments. This research is supported by
l’UQTR’s internal funding program (FIR) and by the
Natural Sciences and Engineering Research Council of
Canada (NSERC).
5. References
Amati G., F. Crestani & F. Ubaldini (1997), “A Learning
System for Selective Dissemination of Information”, Proc. of
the 15th International Joint Conf. on Artificial Intelligence
(IJCAI-97), Nagoya, Japan, 23-29 August 1997, 764-769.
Anikina N., V. Golender, S. Kozhukhina, L. Vainer & B.
Zagatsky (1997), “REASON: NLP-based Search System for
the WWW”, Lingosense Ltd., 1997 AAAI Symposium on
Natural Language Processing for the World Wide Web, Menlo
Park, California, 24-26 March 1997, 1-9.
Arampatzis A.T., T. Tsoris & C.H.A. Koster (1997),
“IRENA: Information Retrieval Engine Based on Natural
Language Analysis”, Proc. of the Conf. on Computer-Assisted
Information Searching on Internet (RIAO-97), Montréal
(Québec), Canada, 25-27 June 1997, 159-175.
Barker K., S. Delisle & S. Szpakowicz (1998), “Test-Driving
TANKA: Evaluating a Semi-automatic System of
Text Analysis for Knowledge Acquisition”, 12th Biennial
Conference of the Canadian Society for Computational Studies
of Intelligence (CAI'98), Vancouver (B.C.), Canada, June 18-20,
1998, 60-71. Published in Lecture Notes in Artificial
Intelligence #1418, Springer.
Biskri I. & J.-G. Meunier (1998), “Vers un modèle hybride
pour le traitement de l’information lexicale dans les bases de
données textuelles”, Actes du Colloque International JADT-98,
Nice, France.
Biskri I. & S. Delisle (1999), “Un modèle hybride pour le
textual data mining - un mariage de raison entre le numérique
et le linguistique”, 6ème Conférence Annuelle sur le Traitement
Automatique des Langues (TALN-99), Cargèse, Corse, 12-17
juillet 1999, 55-64.
Biskri I. & S. Delisle (2000), “User-Relevant Access to
Textual Information Through Flexible Identification of Terms:
A Semi-Automatic Method and Software Based on a
Combination of N-Grams and Surface Linguistic Filters”, Actes
de la 6ème Conférence RIAO-2000 (Content-Based Multimedia
Information Access), Paris (France), 12-14 avril 2000, to
appear.
Biskri I., C. Jouis, F. Le Priol, J.-P. Desclés, J.-G. Meunier
& W. Mustafa Elhadi (1997), “Outil d'aide à la fouille
documentaire : approche hybride numérique linguistique”,
Actes du colloque international FRACTAL 1997, Linguistique
et Informatique : Théories et Outils pour le Traitement
Automatique des Langues, Revue Annuelle BULAG - Année
1996-1997 – Numéro Hors-Série, Besançon, France, 35-43.
Chandrasekar R. & B. Srinivas (1997), “Using Syntactic
Information in Document Filtering: A Comparative Study Of
Part-Of-Speech Tagging And Supertagging”, Proc. of the Conf.
on Computer-Assisted Information Searching on Internet
(RIAO-97), Montréal (Québec), Canada, 25-27 June 1997, 531-545.
Ciravegna F., A. Lavelli, N. Mana, J. Matiasek, L.
Gilardoni, S. Mazza, M. Ferraro, W.J. Black, F. Rinaldi
& D. Mowatt (1999), “FACILE: Classifying Texts
Integrating Pattern Matching and Information Extraction”,
Proc. of the 16th International Joint Conf. on Artificial
Intelligence (IJCAI-99), Stockholm, Sweden, 31 July-6 August
1999, 890-895.
Cohen W.W. & D. Kudenko (1997), “Transferring and
Retraining Learned Information Filters”, Proc. of the 14th
National Conf. on Artificial Intelligence (AAAI-97), Providence
(Rhode Island), USA, July 27-31 1997, 583-590.
Corvaisier F., A. Mille & J.M. Pinon (1997), “Information
Retrieval on The World Wide Web Using a Decision Making
System”, Proc. of the Conf. on Computer-Assisted Information
Searching on Internet (RIAO-97), Montréal (Québec), Canada,
25-27 June 1997, 284-295.
Craven, M., D. DiPasquo, D. Freitag, A. McCallum, T.
Mitchell, K. Nigam & S. Slattery (1998), “Learning to
Extract Symbolic Knowledge from the World Wide Web”,
Proc. of the 15th National Conf. on Artificial Intelligence
(AAAI-98), Madison, Wisconsin, USA, July 26-30 1998, 509-516.
DiMarco C. & M.E. Foster (1997), “The Automated
Generation of Web Documents that Are Tailored to the
Individual Reader”, 1997 AAAI Symposium on Natural
Language Processing for the World Wide Web, Menlo Park,
California, 24-26 March 1997, 44-53.
Gaizauskas R. & A.M. Robertson (1997), “Coupling
Information Retrieval and Information Extraction: A New
Text Technology for Gathering Information from the Web”,
Proc. of the Conf. on Computer-Assisted Information
Searching on Internet (RIAO-97), Montréal (Québec), Canada,
25-27 June 1997, 356-372.
Gaizauskas R. & Y. Wilks (1998), “Information Extraction:
Beyond Document Retrieval”, Journal of Documentation,
54(1), 70-105.
Geldof S. & W. van de Velde (1997), “Context-Sensitive
Hypertext Generation”, 1997 AAAI Symposium on Natural
Language Processing for the World Wide Web, Menlo Park,
California, 24-26 March 1997, 54-61.
Joachims T., D. Freitag & T. Mitchell (1997), “Webwatcher:
A Tour Guide for the World Wide Web”, Proc. of the 15th
International Joint Conf. on Artificial Intelligence (IJCAI-97),
Nagoya, Japan, 23-29 August 1997, 770-777.
Jouis C., W. Mustafa el-Hadi & V. Rialle (1998),
“COGNIWEB, Modélisation hybride linguistique et numérique
pour un outil de filtrage d’informations sur les réseaux”, Actes
de la Rencontre Internationale sur l’Extraction, le Filtrage et
le Résumé Automatique (RIFRA-98), Sfax, Tunisie, 11-14
novembre 1998, 191-203.
Katz B. (1997), “Annotating the World Wide Web Using
Natural Language”, Proc. of the Conf. on Computer-Assisted
Information Searching on Internet (RIAO-97), Montréal
(Québec), Canada, 25-27 June 1997, 136-158.
Kindo T., H. Yoshida, T. Morimoto & T. Watanabe
(1997), “Adaptive Personal Information Filtering System that
Organizes Personal Profiles Automatically”, Proc. of the 15th
International Joint Conf. on Artificial Intelligence (IJCAI-97),
Nagoya, Japan, 23-29 August 1997, 716-721.
Leong H., S. Kapur & O. de Vel (1997), “Text
Summarization for Knowledge Filtering Agents in Distributed
Heterogeneous Environments”, 1997 AAAI Symposium on
Natural Language Processing for the World Wide Web, Menlo
Park, California, 24-26 March 1997, 87-94.
Meunier J.-G., I. Biskri, G. Nault & M. Nyongwa (1997),
“ALADIN et le traitement connexionniste de l’analyse
terminologique”, Proc. of the Conf. on Computer-Assisted
Information Searching on Internet (RIAO-97), Montréal
(Québec), Canada, 25-27 June 1997, 661-664.
Ohsawa Y. & M. Yachida (1997), “An Index Navigator for
Understanding and Expressing User’s Coherent Interest”, Proc.
of the 15th International Joint Conf. on Artificial Intelligence
(IJCAI-97), Nagoya, Japan, 23-29 August 1997, 722-728.
Perkowitz, M. & O. Etzioni (1998), “Adaptive Web Sites:
Automatically Synthesizing Web Pages”, Proc. of the 15th
National Conf. on Artificial Intelligence (AAAI-98), Madison,
Wisconsin, USA, July 26-30 1998, 727-732.
Pustejovsky J., B. Boguraev, M. Verhagen, P. Buitelaar &
M. Johnston (1997), “Semantic Indexing and Typed
Hyperlinking”, 1997 AAAI Symposium on Natural Language
Processing for the World Wide Web, Menlo Park, California,
24-26 March 1997, 120-128.
Rialle V., J.-G. Meunier, S. Oussedik, I. Biskri & G. Nault
(1998), “Application de l'algorithmique génétique à l'analyse
terminologique”, Acte du Colloque International JADT 98,
Nice, France.
Salton G. & M. McGill (1983), Introduction to Modern
Information Retrieval, McGraw-Hill.
Salton G. (1989), Automatic Text Processing: The
Transformation, Analysis and Retrieval of Information by
Computer, Addison-Wesley.
Sugimoto M., N. Katayama & A. Takasu (1997),
“COSPEX: A System for Constructing Private Digital
Libraries”, Proc. of the 15th International Joint Conf. on
Artificial Intelligence (IJCAI-97), Nagoya, Japan, 23-29
August 1997, 738-744.
Zaiane O.R., A. Fall, S. Rochefort, V. Dahl & P. Tarau
(1997), “On-Line Resource Discovery Using Natural
Language”, Proc. of the Conf. on Computer-Assisted
Information Searching on Internet (RIAO-97), Montréal
(Québec), Canada, 25-27 June 1997, 336-355.
Zarri G.P. (1997), “Natural Language Indexing of Multimedia
Objects in the Context of a WWW Distance Learning
Environment”, 1997 AAAI Symposium on Natural Language
Processing for the World Wide Web, Menlo Park, California,
24-26 March 1997, 155-158.