Full Text – Connection
Copyright © 2016 Lexmark. All rights reserved.
Lexmark is a trademark of Lexmark International, Inc., registered in the U.S. and/or other countries. All other trademarks
are the property of their respective owners. No part of this publication may be reproduced, stored, or transmitted in any
form without the prior written permission of Lexmark.
Table of Contents
1 Introduction
1.1 Storing in the Full Text Database
1.2 Changing a Full Text Database
1.3 Increasing Performance
2 Using Full Text
2.1 Copying of Texts from an Application File in a Full Text Field
2.2 Displaying Hit Ratios in Result List
2.3 Displaying the Full Text Field in Result List
2.4 Highlighting of Hits in the Index Form
2.4.1 Characteristics of dtSearch
3 dtSearch
3.1 Setup of Full Text Databases
3.2 Configuration
3.2.1 Activating dtSearch
3.2.2 Definition and Configuration of Full Text Fields
3.2.3 Entries in PROGRAM.INI
3.3 Search with Operators
3.4 Search with Wildcards
3.5 Using Dictionaries
3.6 Extended Search Options
4 Oracle Full Text
4.1 Introduction – Oracle Full Text
4.2 Setting up the Oracle Full Text
4.3 Searching using Wildcards
5 SQL Server Full Text
5.1 Introduction – SQL Server Full Text
5.2 Setting Up the SQL Server Full Text
5.2.1 Requirements
5.2.2 Installation of the SQL Full Text
5.2.3 Maintenance of the SQL Full Text
5.3 Using the SQL Full Text
5.3.1 Searching for Terms with Boolean Operators
5.3.2 Searching for Terms Using Proximity Operators
5.3.3 Emphasis of Terms
5.3.4 Search for Phrases
5.3.5 Using Stop Word Lists
5.3.6 Search with Prefixes
1 Introduction
A full text field accepts text of almost unlimited length (up to 2 GB) and offers considerably more
search options than a search via entry fields. This gives you a convenient and powerful way to find
documents quickly, even in large, unstructured data volumes.
Full text fields offer a large set of functions and many configuration options, which this chapter
describes in depth.
Note: The full text function is NOT available for Linux or Solaris. Furthermore, SAPERION does not support
Unicode languages (e.g. Chinese, Japanese, Korean) for full text databases. Even if the full text
database itself supports these languages, SAPERION cannot make use of this support.
1.1 Storing in the Full Text Database
When you use full text fields, the information is not saved in the database tables of the search database
but word by word in special full text databases. When you enter a query, the full text database
and the search database are searched separately. The individual search results are then combined and
displayed as the overall result in the result list.
1.2 Changing a Full Text Database
Since the storage structures of the individual full text databases normally differ considerably, a direct
conversion is not possible. Because the import and export functions are limited, you change the full text
database by reorganizing the media.
Depending on the full text database in use, the database may not be exportable while a full text search
is running.
Important: You can normally change to another full text database only by reorganizing the media. Do not
underestimate the time this operation takes.
1.3 Increasing Performance
You can increase the performance of full text queries with the "MaxUIDsPerSearch" parameter in the
[full text] section. This parameter is only effective when external databases such as dtSearch are used.
[full text]
MaxUIDsPerSearch=500 (default 40, max. 1000)
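The value range stated above (default 40, maximum 1000) can be checked before an INI file is deployed. The following Python sketch is only an illustration: it reads the [full text] section with the standard configparser module and clamps the value. The helper name read_max_uids, the lower bound of 1, and the clamping behaviour are assumptions made for the example; how SAPERION itself treats out-of-range values is not stated here.

```python
import configparser

DEFAULT_MAX_UIDS = 40    # default stated in the text
UPPER_LIMIT = 1000       # maximum stated in the text


def read_max_uids(ini_text: str) -> int:
    """Return MaxUIDsPerSearch from the [full text] section, clamped to 1..1000.

    Hypothetical helper for illustration; not part of SAPERION.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    raw = parser.get("full text", "MaxUIDsPerSearch", fallback=str(DEFAULT_MAX_UIDS))
    value = int(raw)
    # Assumption: values outside the documented range are clamped.
    return max(1, min(value, UPPER_LIMIT))


sample = "[full text]\nMaxUIDsPerSearch=500\n"
print(read_max_uids(sample))
```

The sample INI text contains only the bare value, without the explanatory "(default 40, max. 1000)" remark, because the sketch converts the whole value to an integer.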
2 Using Full Text
To use the full text function, you have to define an index field as a full text field (Text-Retrieval).
Fig. 2–1: Definition of a full text field
Important: When you use SQL full text (MS-SQL, ORACLE), use only one full text field per DDC.
2.1 Copying of Texts from an Application File in a Full Text Field
In full text fields, you can use the special system variable "FileText" for default values. It automatically
copies the content of an application file into a full text field during indexing.
SAPERION supports the following file types for this operation: BAT, CMD, CSV, DIF, DOC, DOT, HTM,
HTML, INI, LOG, OBD, OBT, PDF, POT, PPT, RTF, SAM, SLK, TXT, WPF, WRI, XLS, XLT. You can restrict
the copy operation to particular file types by means of suitable filters. The following example copies
text only from files with the file extension ".TXT":
Pos(".txt" , ""+FileName)>0 ? FileText : ""
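Read as a conditional expression, the filter above returns FileText when ".txt" occurs in the file name and an empty string otherwise. The Python sketch below mirrors that logic for illustration only. It assumes that Pos returns the 1-based position of the first match and 0 when there is none, as in Pascal-style string functions; the function names are hypothetical.

```python
def pos(needle: str, haystack: str) -> int:
    """1-based position of needle in haystack, 0 if absent (assumed Pos semantics)."""
    return haystack.find(needle) + 1


def default_full_text(file_name: str, file_text: str) -> str:
    # Mirrors the filter expression: text is copied only if ".txt" occurs in the name.
    return file_text if pos(".txt", "" + file_name) > 0 else ""


print(repr(default_full_text("report.txt", "quarterly figures")))
print(repr(default_full_text("report.pdf", "quarterly figures")))
```

The sketch compares case-sensitively; whether Pos does the same is not stated in the text.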
Note: Texts in a full text field are ANSI texts, according to the system's default code page. Thus,
Russian texts can also be processed, for example.
2.2 Displaying Hit Ratios in Result List
If the full text search is installed and the definition contains a full text field, you can add
two additional variables to the result list. These variables display the number of hits in a document
(TRHits) or a graphical percentage of the document's relevance (TRHitGraph). To calculate the
percentage, the number of hits is compared with the maximum number of hits among all found documents.
Documents shown with 100% therefore have the highest number of hits.
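The percentage calculation described above can be sketched in a few lines of Python. This is an illustration only: the function name and the rounding to whole percent are assumptions, and the text does not specify how TRHitGraph rounds or renders the value.

```python
def hit_percentages(hits_per_document: dict[str, int]) -> dict[str, int]:
    """Scale each document's hit count against the highest count in the result set."""
    if not hits_per_document:
        return {}
    max_hits = max(hits_per_document.values())
    if max_hits == 0:
        return {doc: 0 for doc in hits_per_document}
    # Rounding to whole percent is an assumption made for display purposes.
    return {doc: round(100 * hits / max_hits) for doc, hits in hits_per_document.items()}


print(hit_percentages({"A": 12, "B": 6, "C": 3}))
```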
Characteristics of SQL full text
For SQL full text, the system field "SYSRELEVANCE" is required to display hit ratios (TRHits or
TRHitGraph). If this field does not exist, hit ratios cannot be displayed.
By default, the hit with the highest relevance is normalized to 100 and the other hits are scaled
accordingly. If you do not want this normalization to 100, set the following INI entry:
[Compatibility]
SQLfull textRanking=FALSE
2.3 Displaying the Full Text Field in Result List
Since the data in full text databases is only stored word by word, it cannot be displayed in the result
list. However, a workaround lets you display at least part of the full text field:
In the DDC file, define a so-called "mirror field" of data type "Character" with a suitable text length
(e.g. 100 characters). The only purpose of this field is to hold a copy of part of the full text field.
If, for example, you enter
Copy(@Comment,1,100)
as the default value for this field in the index form, the first 100 characters of the full text field
"Comment" are automatically synchronized with the mirror field. Since you do not enter anything in this
field yourself, it makes sense to hide it in the form.
2.4 Highlighting of Hits in the Index Form
After a full text search, you can display the index data of a found document by clicking [Index] in the
result list. All hits that match the search criteria are highlighted in the full text field.
To restrict highlighting to full text fields, enter the following in the PROGRAM.INI or the local
ARCHIEF.INI of the respective client:
Example
[Setup]
OnlyFulltextWordMarking=TRUE
Highlighting of hits
Parameter | Description
OnlyFulltextWordMarking | TRUE = Highlighting only carried out in full text fields (default: FALSE)
Characteristics of dtSearch
In some cases, the highlighting of hits is not active in dtSearch. If you only require a simple search,
as with SQL full text, you can activate it via the following INI switch:
[full text]
SimpleLocationSearch=TRUE
3 dtSearch
3.1 Setup of Full Text Databases
When index data is saved, the complete content of a full text field is written to the medium. To achieve
better performance, however, this content is written to the search database only word by word. SAPERION
saves this word list after each word of the full text field has been checked against the dictionaries,
where applicable, and the words have been filtered (see below).
By default, full text fields are quite demanding on memory (20 to 100 bytes per word). You should
therefore enter a limited number of appropriate words during indexing, or set up dictionaries and filters
so that only relevant entries are copied to the search database.
When dtSearch is used for the full text search, the full text data is saved in a special full text
database in separate subdirectories (in the DBS directory of SAPERION). It cannot be copied to a SQL
database. The name of each subdirectory consists of the table name (DDC file) and a sequential number.
During a query, SAPERION executes two queries internally and then combines both results (cursor).
3.2 Configuration
Activating dtSearch
To activate dtSearch as a full text engine, make the following entry in the PROGRAM.INI:
[Modules32]
Retrieval=DTSEAR32.RET
Definition and Configuration of Full Text Fields
After you have defined index fields of data type "full text", you can enter default values for extended
search conditions when you define query forms for these fields. To minimize the memory usage of the
search database, you can use dictionaries (see below) and set up an additional filter.
Entries in PROGRAM.INI
Section [DtSearch] contains settings for the full text database DtSearch.
Example
[DtSearch]
test=TRUE
Log=<Drive>:\<Path>\<File name of log file>
AccentSensitive=FALSE
MergeLimit=100
ShutDownAfterMerge=TRUE
[DtSearch] section
Parameter | Description
test | TRUE = Activates the superordinate SAPERION log
Log | Activates the dtSearch log
AccentSensitive | TRUE = Activates accent-insensitive creation of the full text index
MergeLimit | Maximum number of documents (Insert Index)
ShutDownAfterMerge | TRUE = Shutdown after merging the index. This switch is set to TRUE by default. If there are problems with the index, it can be set to FALSE.
3.3 Search with Operators
When the full text search is installed, you can combine your search criteria with the Boolean operators
AND, OR, NOT or the "word proximity" operator.
+ AND only finds documents which contain both desired words. The search criterion "Arch* AND
Construction" finds all documents whose indexing contains words starting with "Arch" and the
word "Construction".
+ OR finds all documents which contain at least one of the words (e.g. "Arch* OR Construction").
+ NOT finds documents that do not contain the word. "NOT Archive" therefore finds all documents
which do not contain the word "Archive".
+ The word proximity operator W/[n] finds documents in which the two words occur within n words of
each other. Example: For "Eva W/2 Smith", "Smith" must occur within two words of "Eva"; thus
"Eva Anna Smith" and "Eva Smith" are found, but "Eva Anna Theresa Smith" is not.
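The proximity rule can be made concrete with a short Python sketch. It follows the examples given above ("Eva Smith" and "Eva Anna Smith" match "Eva W/2 Smith", "Eva Anna Theresa Smith" does not) and therefore treats W/n as "at most n word positions apart". This is an illustration of the rule only, not dtSearch's implementation; the function name and the simple whitespace tokenization are assumptions.

```python
def within_words(text: str, first: str, second: str, n: int) -> bool:
    """True if `first` and `second` occur at most n word positions apart (case-insensitive)."""
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= n for i in first_positions for j in second_positions)


for sample in ("Eva Smith", "Eva Anna Smith", "Eva Anna Theresa Smith"):
    print(sample, "->", within_words(sample, "Eva", "Smith", 2))
```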
You can build more complex queries as required, for example "(Architecture OR Construction)
AND NOT Archive". This query finds all documents which contain the word "Architecture" or
"Construction" but do not contain the word "Archive".
Note: If you include several full text fields in your search, you have to connect them with an OR link.
An AND link is not possible for technical reasons.
3.4 Search with Wildcards
When the true full-text search is installed, you can use wildcards such as "?" (which represents exactly
one letter) and "*" (any number of letters) in the search criteria.
With "Arch*" as the search criterion, documents are found that contain words such as "Archive",
"archiving", "Architecture" etc.
Note: Searching with wildcards at the beginning of words only works if phonetic search is deactivated.
Uppercase and lowercase are not distinguished.
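The wildcard semantics described here ("?" for exactly one letter, "*" for any number of letters, no distinction between uppercase and lowercase) happen to match the shell-style patterns of Python's standard fnmatch module, so they can be tried out as follows. This is only a sketch of the matching rule against single words; it says nothing about how the full text engine evaluates wildcards internally, and the function name is an assumption.

```python
from fnmatch import fnmatchcase


def wildcard_match(pattern: str, word: str) -> bool:
    """Case-insensitive match where ? is exactly one character and * is any run of characters."""
    return fnmatchcase(word.lower(), pattern.lower())


for word in ("Archive", "archiving", "Architecture", "March"):
    print(word, "->", wildcard_match("Arch*", word))
print(wildcard_match("Arch?ve", "Archive"))
```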
3.5 Using Dictionaries
To reduce the memory requirements for the word lists in the search databases, you can use dictionaries
for full text searches. You can access these dictionaries using the [Options][Dictionaries] menu
command.
When a full text field is indexed, the system then inserts only those words into the full-text
database that are not contained in a stop word list (words to ignore). The system activates this
evaluation as soon as the "Stop words" dictionary contains entries.
Important: When using dtSearch, stop words must be defined in lower case. Capitalized stop words are
not evaluated.
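The effect of a stop word list on indexing can be sketched as follows in Python. The sketch is an illustration built on assumptions, not SAPERION's or dtSearch's implementation: it lowercases each word before the lookup, which is one plausible reading of why stop words have to be defined in lower case, and the function name and sample list are hypothetical.

```python
def words_to_index(text: str, stop_words: set[str]) -> list[str]:
    """Return the words of `text` that are not in the stop word list."""
    # Assumption: words are compared in lower case, so a capitalized
    # entry such as "The" in the stop word list would never match.
    return [w for w in text.split() if w.lower() not in stop_words]


stop_words = {"the", "of", "and"}
print(words_to_index("The archive of plans and drawings", stop_words))
```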
In addition, you can maintain a thesaurus (synonym dictionary) and list words with a similar meaning
under a "master" word. You can activate the "Thesaurus" option while defining index forms. The system
evaluates these synonyms during a query, so you must activate them in the query form.
Furthermore, depending on the full-text database used, the true full text search also includes
derivations and inflected forms of a word, so that you do not need to add them to the dictionary list.
3.6 Extended Search Options
The full text search offers more sophisticated search options.
Extended search options
Option | Description
Phonetic search | If activated, the sound of the words is compared.
Fuzzy search | If activated, a certain number of letters within a word may differ. The number can be defined directly after the fuzzy search option (normally one to two characters are reasonable). The fuzzy search option is very useful, e.g., for texts recognized by OCR.
Root word | With the "Root Word" option, variations of a word (plural, genitive, word combinations, etc.) are included. If you use this option to search for "Handle", you may find documents that contain only "Handbag". You should therefore first test this option to see whether the results are useful.
Natural language | This option allows you to formulate a query in plain English (e.g. "Please find documents which include the term hand").
Synonyms | If this option is activated, all terms from the synonym dictionary are considered automatically.
You can preset every single option in formulas, but you can also activate or deactivate them using the
right mouse button during a query. With the exception of the "Synonyms" option, the search result is
not always predictable.
4 Oracle Full Text
4.1 Introduction – Oracle Full Text
The Oracle full text option makes it possible to use Oracle for full text fields in SAPERION. In contrast
to external full text engines such as dtSearch, the full text is saved together with the index data in the
SQL database.
4.2 Setting up the Oracle Full Text
In order to use Oracle full text, the following conditions must be met:
Oracle
+ The Oracle full text option must be installed.
+ The ODBC user needs the following permissions on the database: CTXAPP and RESOURCE.
+ The ODBC user also needs the "Create Any Index" right.
SAPERION
+ The "Full text" box must be checked for the data source that you wish to use.
+ A full-text field must be created in the DDC file.
Important: Only one full text field can be used for each DDC. A linked full-text search (AND, OR, …)
works only if the Boolean operators are entered in the query form.
4.3 Searching using Wildcards
You can use wildcards at any position in the text. Uppercase and lowercase are not distinguished.
For Oracle full text, the character "%" must be used for the wildcard search (instead of "*").
5 SQL Server Full Text
5.1 Introduction – SQL Server Full Text
The SQL Server full text option allows you to use the Microsoft Search Service for full text fields in
SAPERION. Contrary to external full text engines such as dtSearch, the full text is saved in the SQL
database along with the index data.
5.2 Setting Up the SQL Server Full Text
Requirements
+ Full text indexing must be activated for the current database with the database command "EXEC
sp_fulltext_database 'enable'".
For more information, see http://msdn2.microsoft.com/de-de/library/ms190321.aspx
Note: If this command is executed when full-text catalogs already exist in the database, these
catalogs will be removed from the database.
Installation of the SQL Full Text
The full text search (the Microsoft Search Service) is provided automatically with the standard installation
of Microsoft SQL Server 2000 Standard or Enterprise edition. The service can also be installed later,
however.
Maintenance of the SQL Full Text
A full-text catalog is created for every DDC. Catalog names are made up of the prefix "CA_" and the name
of the DDC. SAPERION creates full-text catalogs with modification logs and the background refresh
option, which allows modifications to be transferred directly to the full-text index.
SQL Server tools, such as those in the Enterprise Manager, are used to maintain the full text. For
example, the full-text catalog can be repopulated and the refresh procedure can be modified. For a
detailed description of the possibilities, see the SQL Server documentation.
5.3 Using the SQL Full Text
In order to use the functions of the SQL full text, an index field must be defined as a full text field
(Text-Retrieval).
If you wish the relevance of a hit to be displayed in the result list, the SYSRELEVANCE field, which is of
the whole number type, must be defined.
To activate the SQL Server full text for an ODBC data source, select the "Full text" checkbox in the
"ODBC Data Source Properties" dialog.
Fig. 5–1: Activation of the option "Full text"
In the classic full text search, simple or complex linking of search terms can be accomplished with the
use of Boolean and proximity operators. Parenthetical expressions can be used to enter complex search
queries, e.g., (cheese OR sausage) AND (bread OR cold cuts).
Searching for Terms with Boolean Operators
Search terms can be linked with Boolean operators:
+ AND
This operator ensures that both search terms are contained in the found documents.
+ OR
With this operator, at least one of the entered search terms is contained in the found documents.
+ AND NOT
This operator ensures that the search terms following it are not contained in the found documents.
Operators of the same type are associative, and their order does not affect the search result. AND
operators are evaluated before OR operators.
Important: A linked full-text search (AND, OR, …) works only if the Boolean operators are entered in
the query form.
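The precedence rule (AND is evaluated before OR) determines how a query without parentheses is grouped. Python's own "and" and "or" operators follow the same precedence, which makes the rule easy to illustrate with a toy model in which a document is simply a set of words. The helper and sample data are hypothetical, and the sketch models the logic only, not SQL Server's query engine.

```python
def contains(document: set[str], term: str) -> bool:
    """True if the document (modelled as a set of lowercase words) contains the term."""
    return term.lower() in document


doc = {"cheese", "crackers"}

# "cheese OR sausage AND bread" groups as "cheese OR (sausage AND bread)".
unparenthesized = contains(doc, "cheese") or contains(doc, "sausage") and contains(doc, "bread")
grouped_and_first = contains(doc, "cheese") or (contains(doc, "sausage") and contains(doc, "bread"))
grouped_or_first = (contains(doc, "cheese") or contains(doc, "sausage")) and contains(doc, "bread")

print(unparenthesized, grouped_and_first, grouped_or_first)
```

If the other grouping is intended, parentheses are needed, as in the example (cheese OR sausage) AND (bread OR cold cuts).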
Searching for Terms Using Proximity Operators
The proximity operator NEAR allows searches in which the weighting of search results is influenced by
how close two search terms are located to one another in the text. Logically, the results correspond to
an AND search, i.e., even texts containing terms that are far from one another will be found in the data
record, but such hits are assigned a correspondingly low relevance.
Emphasis of Terms
Search terms can be weighted differently with the use of the ISABOUT key word. The weight is indicated
as a value between 0.0 and 1.0. Logically, the individual terms of the query are linked with OR.
Example
ISABOUT( Core weight (0.8), "Sap*" weight (0.2) )
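An ISABOUT expression of this shape can be assembled programmatically. The Python sketch below builds the string from term and weight pairs and rejects weights outside the range 0.0 to 1.0 stated above. The function name is hypothetical and the output is just the text of the search expression; how it is passed on to the database is outside the scope of the example.

```python
def build_isabout(weighted_terms: list[tuple[str, float]]) -> str:
    """Format (term, weight) pairs as an ISABOUT expression."""
    parts = []
    for term, weight in weighted_terms:
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for {term!r} must be between 0.0 and 1.0")
        # Prefix terms (ending in *) are quoted, as in the example above.
        rendered = f'"{term}"' if term.endswith("*") else term
        parts.append(f"{rendered} weight ({weight})")
    return "ISABOUT( " + ", ".join(parts) + " )"


print(build_isabout([("Core", 0.8), ("Sap*", 0.2)]))
```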
Search for Phrases
Phrase searches are searches for several words that follow one another in a specific order. The words
are entered enclosed in quotation marks.
Using Stop Word Lists
The stop word list used by MS Search is not the same as the list kept in SAPERION. This is because MS
Search is an independent service that may be used by other applications in the organization.
The stop word lists are named "Noise", while the file extension stands for the specific language
("Noise.deu" for German). The lists are contained in the "\Mssql\Ftdata\Sqlserver\Config" directory
and can be edited. For details on this topic, see the SQL Server documentation.
When the stop word list has been modified, the full text catalog must be rebuilt in order for the
modifications to take effect.
Search with Prefixes
Prefix searches allow you to search using the beginning of words. The beginning of the word
is placed in quotation marks, and an asterisk is set before the closing quotation mark. Prefix searches
can also be applied to phrases.
Examples
+ "Cor*" leads to Core, Core Server etc.
+ "Sa Ser *" leads to SAPERION Server, SAP Service etc.
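The two examples suggest that, in a phrase, each word is treated as a prefix. Under that assumption, the matching rule can be sketched in Python as follows; the function name and the whitespace tokenization are hypothetical, and the real evaluation is done by the SQL Server full text engine.

```python
def prefix_phrase_match(query: str, text: str) -> bool:
    """True if consecutive words in `text` start with the respective prefixes in `query`."""
    prefixes = query.replace("*", " ").lower().split()
    words = text.lower().split()
    if not prefixes:
        return False
    for start in range(len(words) - len(prefixes) + 1):
        window = words[start:start + len(prefixes)]
        if all(w.startswith(p) for w, p in zip(window, prefixes)):
            return True
    return False


print(prefix_phrase_match("Cor*", "Core Server"))
print(prefix_phrase_match("Sa Ser *", "SAPERION Server"))
print(prefix_phrase_match("Sa Ser *", "SAP Service"))
print(prefix_phrase_match("Sa Ser *", "Server SAP"))
```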