Part 3B: Text Indexing, Term Lists & Taxonomies
Value space continuum of expressivity…
[Figure: a continuum running from Less to More, i.e. increasing control over form, relationships and meaning. Labelled points: thesauri, text indexing, term lists, ontology, faceted classification, taxonomies, tagging, enumerated classification, analytico-synthetic classification.]
Text Indexing
▪ Full-text and inverted files/indexes
Inverted files…

The primary form of index developed for use in information systems for full-text retrieval.

It is called an “inverted file” because the normal rows (documents) and columns (words) of a database are inverted, with rows representing words and columns representing documents.
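A minimal Python sketch of that row/column inversion, assuming a toy document-by-word incidence matrix (the words and documents here are hypothetical, not from the slides). Reading down each word's column and collecting the documents that contain it produces the inverted, word-major view.

```python
# Hypothetical toy collection: rows are documents, columns are words.
words = ["apple", "banana", "cherry"]
doc_ids = [1, 2, 3]
# matrix[d][w] == 1 if word w occurs in document d
matrix = [
    [1, 0, 1],  # document 1
    [0, 1, 1],  # document 2
    [1, 1, 0],  # document 3
]

def invert(matrix, doc_ids, words):
    """Turn the document-major matrix into word -> list of document IDs."""
    inverted = {}
    for w_index, word in enumerate(words):
        # Read down one column: every document whose cell for this word is set
        inverted[word] = [doc_ids[d] for d, row in enumerate(matrix) if row[w_index]]
    return inverted

print(invert(matrix, doc_ids, words))
# {'apple': [1, 3], 'banana': [2, 3], 'cherry': [1, 2]}
```

Real systems never materialize the full matrix, since most cells are zero; they store only the non-empty lists per word, which is what the inverted file is.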
Example inverted file…
Main data file
ID  HOUSE                         PRICE
1   1208 Twin Oaks Way            $100,000
2   100 Sutton Heights            $200,000
3   10 Pine Street                $150,000
4   8539 Billings Circle          $100,000
5   9537 Highway 101 North        $100,000
6   10 Capitol Hill Avenue North  $150,000

Inverted file (or inverted index) on PRICE
PRICE     IDs
$100,000  1, 4, 5
$150,000  3, 6
$200,000  2
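The price index above can be derived mechanically from the main data file. A minimal Python sketch, assuming the records fit in memory; `build_price_index` is a hypothetical name, not part of any particular system.

```python
from collections import defaultdict

# Main data file from the slide: ID -> (house, price)
houses = {
    1: ("1208 Twin Oaks Way", "$100,000"),
    2: ("100 Sutton Heights", "$200,000"),
    3: ("10 Pine Street", "$150,000"),
    4: ("8539 Billings Circle", "$100,000"),
    5: ("9537 Highway 101 North", "$100,000"),
    6: ("10 Capitol Hill Avenue North", "$150,000"),
}

def build_price_index(records):
    """Invert the file on PRICE: each price maps to the IDs of houses with that price."""
    index = defaultdict(list)
    for record_id in sorted(records):
        _, price = records[record_id]
        index[price].append(record_id)
    return dict(index)

price_index = build_price_index(houses)
print(price_index)
# {'$100,000': [1, 4, 5], '$200,000': [2], '$150,000': [3, 6]}
```

A query such as "houses priced at $150,000" then becomes one lookup, `price_index["$150,000"]`, instead of a scan of every record.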
Inverted file (document level)…
Document  Text
1         Gold silver truck
2         Shipment of gold damaged in a fire
3         Delivery of silver arrived in a silver truck
4         Shipment of gold arrived in a truck

Number  Term      <No. of documents; Document IDs>
1       a         <3; 2,3,4>
2       arrived   <2; 3,4>
3       damaged   <1; 2>
4       delivery  <1; 3>
5       fire      <1; 2>
6       gold      <3; 1,2,4>
7       of        <3; 2,3,4>
8       in        <3; 2,3,4>
9       shipment  <2; 2,4>
10      silver    <2; 1,3>
11      truck     <3; 1,3,4>
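A minimal Python sketch of how the document-level file above could be built and queried, assuming case-folding and whitespace tokenization (the slides do not specify either). `build_document_index` and `and_query` are hypothetical names.

```python
from collections import defaultdict

documents = {
    1: "Gold silver truck",
    2: "Shipment of gold damaged in a fire",
    3: "Delivery of silver arrived in a silver truck",
    4: "Shipment of gold arrived in a truck",
}

def tokenize(text):
    # Assumption: case-fold and split on whitespace (these texts have no punctuation)
    return text.lower().split()

def build_document_index(docs):
    """Map each term to (number of documents, sorted document IDs)."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)  # a set: repeats within one document count once
    return {term: (len(ids), sorted(ids)) for term, ids in sorted(postings.items())}

def and_query(index, *terms):
    """Boolean AND: intersect the document lists of all query terms."""
    result = None
    for term in terms:
        ids = set(index.get(term, (0, []))[1])
        result = ids if result is None else result & ids
    return sorted(result or [])

index = build_document_index(documents)
print(index["silver"])                       # (2, [1, 3])
print(and_query(index, "gold", "truck"))     # [1, 4]
```

Note that silver occurs twice in document 3 but is counted once: the first number is how many documents contain the term, not how many times it occurs.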
Inverted file (term-level)…
Document  Text
1         Gold silver truck
2         Shipment of gold damaged in a fire
3         Delivery of silver arrived in a silver truck
4         Shipment of gold arrived in a truck

Number  Term      <No. of documents; (Document ID; Word positions)>
1       a         <3; (2;6),(3;6),(4;6)>
2       arrived   <2; (3;4),(4;4)>
3       damaged   <1; (2;4)>
4       delivery  <1; (3;1)>
5       fire      <1; (2;7)>
6       gold      <3; (1;1),(2;3),(4;3)>
7       of        <3; (2;2),(3;2),(4;2)>
8       in        <3; (2;5),(3;5),(4;5)>
9       shipment  <2; (2;1),(4;1)>
10      silver    <2; (1;2),(3;3,7)>
11      truck     <3; (1;3),(3;8),(4;7)>

Proximity operator support: because the file records the position of every word occurrence, it can answer queries that constrain how close terms appear (e.g., gold within two words of shipment), which a document-level file cannot.
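A minimal Python sketch of the term-level file, assuming the tokenization the slide's positions imply (case-folded, whitespace-separated, counted from 1). `near` is a hypothetical proximity operator written for illustration, not the syntax of any specific retrieval system.

```python
from collections import defaultdict

documents = {
    1: "Gold silver truck",
    2: "Shipment of gold damaged in a fire",
    3: "Delivery of silver arrived in a silver truck",
    4: "Shipment of gold arrived in a truck",
}

def build_positional_index(docs):
    """Map term -> {document ID: [1-based word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split(), start=1):
            index[term][doc_id].append(position)
    return index

def near(index, term1, term2, k):
    """Documents in which term1 and term2 occur within k word positions of each other."""
    docs1 = index.get(term1, {})  # .get avoids creating empty entries in the defaultdict
    docs2 = index.get(term2, {})
    hits = []
    for doc_id in sorted(set(docs1) & set(docs2)):
        if any(abs(p1 - p2) <= k for p1 in docs1[doc_id] for p2 in docs2[doc_id]):
            hits.append(doc_id)
    return hits

positional_index = build_positional_index(documents)
print(dict(positional_index["silver"]))              # {1: [2], 3: [3, 7]}
print(near(positional_index, "gold", "truck", 2))    # [1]
```

A document-level AND of gold and truck returns documents 1 and 4; the proximity constraint keeps only document 1, where the two words are two positions apart.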
Inverted file (document level)…
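Document  Text
1         Gold silver truck
2         Shipment of gold damaged in a fire
3         Delivery of silver arrived in a silver truck
4         Shipment of gold arrived in a truck

Number  Term      <No. of documents; Document IDs>
1       a         <3; 2,3,4>   (stop word)
2       arrived   <2; 3,4>
3       damaged   <1; 2>
4       delivery  <1; 3>
5       fire      <1; 2>
6       gold      <3; 1,2,4>
7       of        <3; 2,3,4>   (stop word)
8       in        <3; 2,3,4>   (stop word)
9       shipment  <2; 2,4>
10      silver    <2; 1,3>
11      truck     <3; 1,3,4>

Stop words: a, of and in each occur in three of the four documents and carry little meaning on their own; words like these are commonly left out of the index.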
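A minimal Python sketch of excluding stop words while building the document-level file. The stop list here is a hypothetical three-word list chosen for these documents; real systems use longer, language-specific lists.

```python
from collections import defaultdict

documents = {
    1: "Gold silver truck",
    2: "Shipment of gold damaged in a fire",
    3: "Delivery of silver arrived in a silver truck",
    4: "Shipment of gold arrived in a truck",
}

# Hypothetical stop list for this example only
STOP_WORDS = {"a", "in", "of"}

def build_index_without_stop_words(docs, stop_words):
    """Document-level inverted file that skips any term in the stop list."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            if term in stop_words:
                continue  # high-frequency function word: not indexed
            postings[term].add(doc_id)
    return {term: (len(ids), sorted(ids)) for term, ids in sorted(postings.items())}

index = build_index_without_stop_words(documents, STOP_WORDS)
print(sorted(index))
# ['arrived', 'damaged', 'delivery', 'fire', 'gold', 'shipment', 'silver', 'truck']
```

Dropping three terms shrinks this tiny index from 11 to 8 entries; the trade-off is that queries relying on those words (e.g., exact phrases containing "of") can no longer be answered from the index alone.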
Term Lists
Term lists…

The simplest controlled value spaces are term lists: lists of controlled terms ordered by some principle (frequently alphabetical).

Examples:
▪ Infants, Ankle biters and Rug rats, all controlled to Infants (preferred term)
▪ The list of authorized U.S. state abbreviations
▪ An alphabetic list of enumerated subject terms
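The preferred-term example can be sketched as a lookup that sends every variant to one authorized form. A minimal Python sketch, assuming a hypothetical `PREFERRED` mapping and a `normalize` helper that case-folds input before lookup.

```python
# Hypothetical entry vocabulary: each allowed variant points to the preferred term.
PREFERRED = {
    "infants": "Infants",
    "ankle biters": "Infants",
    "rug rats": "Infants",
}

def normalize(term):
    """Return the preferred form of a term, or None if it is not in the controlled list."""
    return PREFERRED.get(term.strip().lower())

print(normalize("Rug rats"))   # Infants
print(normalize("toddlers"))   # None
```

Returning `None` for unknown input, rather than passing it through, is what keeps the value space controlled: anything outside the list must be rejected or added deliberately.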
Simple (yet powerful) lists…

A list (also sometimes called a pick list) is a limited set of terms arranged as a
simple alphabetical list or in some other logically evident way. Lists are used to
describe aspects of entities that have a limited number of possibilities. Examples
include geography (e.g., country, state, city), language (e.g., English, French,
Swedish) and format (e.g., text, image, sound).
Simple alphabetical list:
Alabama
Alaska
Arkansas
California
Connecticut
Delaware
Simple logical list:
Mercury
Venus
Earth
Mars
Jupiter
Saturn
Uranus
Neptune
Pluto* (*reclassified as a dwarf planet by the International Astronomical Union in 2006)
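A minimal Python sketch of how a pick list constrains input and how its ordering principle can be used. The list contents and the `validate` and `logical_rank` helpers are hypothetical illustrations; the planet list uses the eight planets in order from the Sun.

```python
# Hypothetical pick lists illustrating the two ordering principles.
LANGUAGES = ["English", "French", "Swedish"]          # alphabetical principle
PLANETS = ["Mercury", "Venus", "Earth", "Mars",       # logical principle:
           "Jupiter", "Saturn", "Uranus", "Neptune"]  # order from the Sun

def validate(value, pick_list):
    """Accept a value only if it appears in the controlled list."""
    if value not in pick_list:
        raise ValueError(f"{value!r} is not an allowed value: {pick_list}")
    return value

def logical_rank(value, pick_list):
    # In a logically ordered list, position carries meaning (here, order from the Sun)
    return pick_list.index(value)

print(validate("French", LANGUAGES))  # French
print(sorted(["Mars", "Earth", "Venus"], key=lambda p: logical_rank(p, PLANETS)))
# ['Venus', 'Earth', 'Mars']
```

Sorting by list position rather than alphabetically preserves the logical principle; an alphabetical sort would put Earth first.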
Taxonomies
▪ Yahoo! Directory
Dominant form on the Web…

▪ Hierarchical tree structure
  ▪ Example: Yahoo! Directory
▪ Frequently permit polyhierarchy (multiple parents)
▪ No general principles guide the design of taxonomies

“A collection of controlled vocabulary terms organized into a
hierarchical structure. Each term in a taxonomy is in one or
more parent/child (broader/narrower) relationships to other
terms in the taxonomy.” [NISO/Z39.19] [emphasis added]
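The NISO definition can be read as a checkable property. A minimal Python sketch, assuming a taxonomy stored as child-to-parents (broader) links; the fragment and the helper names `narrower_from_broader` and `unrelated_terms` are hypothetical.

```python
from collections import defaultdict

# Hypothetical taxonomy fragment, stored as term -> list of broader (parent) terms.
# A list allows more than one parent, i.e. polyhierarchy.
BROADER = {
    "stringed instruments": ["musical instruments"],
    "percussion instruments": ["musical instruments"],
    "violin": ["stringed instruments"],
    "drums": ["percussion instruments"],
}

def narrower_from_broader(broader):
    """Derive the reciprocal narrower (parent -> children) relationships."""
    narrower = defaultdict(list)
    for child, parents in broader.items():
        for parent in parents:
            narrower[parent].append(child)
    return dict(narrower)

def unrelated_terms(terms, broader):
    """Terms that appear in no broader/narrower relationship, contrary to the definition."""
    related = set(broader)
    for parents in broader.values():
        related.update(parents)
    return sorted(set(terms) - related)

narrower = narrower_from_broader(BROADER)
print(narrower["musical instruments"])  # ['stringed instruments', 'percussion instruments']
print(unrelated_terms(["violin", "kazoo"], BROADER))  # ['kazoo']
```

Storing only one direction and deriving the other keeps broader and narrower links consistent by construction.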
Polyhierarchy
Polyhierarchy… [NISO/Z39.19]
▪ Based on generic relationship: piano has two broader terms, stringed instruments and percussion instruments (both narrower terms of musical instruments)
▪ Based on whole-part relationship: biochemistry has two broader terms, biology and chemistry
▪ Based on multiple types of relationship: skull has two broader terms, bones and head
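A minimal Python sketch of the three polyhierarchy examples, assuming an acyclic child-to-parents mapping; `paths_to_top` and `polyhierarchical` are hypothetical helper names. A polyhierarchical term is reachable from the top along more than one path.

```python
# The three examples, stored as term -> broader terms (assumed acyclic).
BROADER = {
    "stringed instruments": ["musical instruments"],
    "percussion instruments": ["musical instruments"],
    "piano": ["stringed instruments", "percussion instruments"],
    "biochemistry": ["biology", "chemistry"],
    "skull": ["bones", "head"],
}

def paths_to_top(term, broader):
    """Every path from a term up to a top term; one per chain of broader terms."""
    parents = broader.get(term, [])
    if not parents:
        return [[term]]
    return [[term] + path for parent in parents for path in paths_to_top(parent, broader)]

def polyhierarchical(broader):
    """Terms with more than one broader term."""
    return sorted(term for term, parents in broader.items() if len(parents) > 1)

for path in paths_to_top("piano", BROADER):
    print(" -> ".join(path))  # term -> broader term -> ...
# piano -> stringed instruments -> musical instruments
# piano -> percussion instruments -> musical instruments
print(polyhierarchical(BROADER))  # ['biochemistry', 'piano', 'skull']
```

This is why a directory with polyhierarchy can show the same entry under several categories: each path is a different route to the same term.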
Node Labels
milk
. <milk by source animal>
.. buffalo milk
.. cow milk
.. goat milk
.. sheep milk
. <milk by region>
.. United States
.. India
.. China

Node labels (shown in angle brackets) are non-indexable concepts used to organize other concepts in meaningful ways.
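A minimal Python sketch of treating node labels as display-only structure, assuming the milk hierarchy is stored as (depth, label) pairs and that node labels are recognized by their angle brackets. `indexable_terms` and `render` are hypothetical helpers.

```python
# The milk hierarchy as (depth, label) pairs; angle brackets mark node labels.
HIERARCHY = [
    (0, "milk"),
    (1, "<milk by source animal>"),
    (2, "buffalo milk"),
    (2, "cow milk"),
    (2, "goat milk"),
    (2, "sheep milk"),
    (1, "<milk by region>"),
    (2, "United States"),
    (2, "India"),
    (2, "China"),
]

def is_node_label(label):
    # Assumption: node labels follow the angle-bracket convention shown above
    return label.startswith("<") and label.endswith(">")

def indexable_terms(hierarchy):
    """Terms an indexer may assign; node labels only organize the display."""
    return [label for _, label in hierarchy if not is_node_label(label)]

def render(hierarchy):
    # Reproduce the dot-indented display: one '.' per level of depth
    return "\n".join(("." * depth + " " + label).strip() for depth, label in hierarchy)

print(indexable_terms(HIERARCHY))
print(render(HIERARCHY))
```

Keeping node labels in the same structure but filtering them at indexing time lets the display stay organized without letting a label like `<milk by region>` be assigned to a document.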
End