Full Text – Connection

Copyright © 2016 Lexmark. All rights reserved. Lexmark is a trademark of Lexmark International, Inc., registered in the U.S. and/or other countries. All other trademarks are the property of their respective owners. No part of this publication may be reproduced, stored, or transmitted in any form without the prior written permission of Lexmark.

Table of Contents

1 Introduction
1.1 Storing in the Full Text Database
1.2 Changing a Full Text Database
1.3 Increasing Performance
2 Using Full Text
2.1 Copying of Texts from an Application File in a Full Text Field
2.2 Displaying Hit Ratios in Result List
2.3 Displaying the Full Text Field in Result List
2.4 Highlighting of Hits in the Index Form
2.4.1 Characteristics of dtSearch
3 dtSearch
3.1 Setup of Full Text Databases
3.2 Configuration
3.2.1 Activating dtSearch
3.2.2 Definition and Configuration of Full Text Fields
3.2.3 Entries in PROGRAM.INI
3.3 Search with Operators
3.4 Search with Wildcards
3.5 Using Dictionaries
3.6 Extended Search Options
4 Oracle Full Text
4.1 Introduction – Oracle Full Text
4.2 Setting up the Oracle Full Text
4.3 Searching using Wildcards
5 SQL Server Full Text
5.1 Introduction – SQL Server Full Text
5.2 Setting Up the SQL Server Full Text
5.2.1 Requirements
5.2.2 Installation of the SQL Full Text
5.2.3 Maintenance of the SQL Full Text
5.3 Using the SQL Full Text
5.3.1 Searching for Terms with Boolean Operators
5.3.2 Searching for Terms Using Proximity Operators
5.3.3 Emphasis of Terms
5.3.4 Search for Phrases
5.3.5 Using Stop Word Lists
5.3.6 Search with Prefixes

1 Introduction

In a full text field, you can enter texts of almost unlimited length (up to 2 GB). You also have considerably extended search options compared to the search via entry fields. This gives you a convenient and powerful method to find documents quickly and easily, even in large and unstructured data volumes.

Full text fields offer a large set of functions and multiple configuration options. This chapter describes full text fields in depth.

Note: The full text function is NOT available for Linux or Solaris. Furthermore, SAPERION does not support Unicode languages (e.g. Chinese, Japanese, Korean) for full text databases. Even if the full text database itself supports these languages, SAPERION cannot make use of this support.

1.1 Storing in the Full Text Database

When you use full text fields, the information is not saved in the database tables of the search database, but word by word in special full text databases. When you enter a query, the full text database and the search database are searched separately. The individual search results are then combined and displayed as the overall result in the result list.

1.2 Changing a Full Text Database

Since the storage structures of the individual full text databases are normally quite different, a direct conversion is not possible.
As the import and export functions are limited, the full text database is changed by reorganizing the media. Depending on the full text database being used, the database cannot be exported during a full text search.

Important: You can normally change to another full text database only by reorganizing the media. The time this operation takes should not be underestimated.

1.3 Increasing Performance

The performance of full text queries can be increased with the parameter "MaxUIDsPerSearch" in the section [full text]. This parameter is only effective when external databases such as dtSearch are used.

    [full text]
    MaxUIDsPerSearch=500

The default value is 40; the maximum is 1000.

2 Using Full Text

To use the full text function, you have to define an index field as a full text field (Text-Retrieval).

Fig. 2–1: Definition of a full text field

Important: When you use SQL full text (MS-SQL, ORACLE), use only one full text field per DDC.

2.1 Copying of Texts from an Application File in a Full Text Field

In full text fields, you can use the special system variable "FileText" for default values. During indexing, it automatically copies the content of an application file into a full text field. SAPERION supports the following file types for this operation: BAT, CMD, CSV, DIF, DOC, DOT, HTM, HTML, INI, LOG, OBD, OBT, PDF, POT, PPT, RTF, SAM, SLK, TXT, WPF, WRI, XLS, XLT.

You can restrict the copy operation to particular file types by means of suitable filters. The following example copies text only from files with the file extension ".TXT":

    Pos(".txt" , ""+FileName)>0 ? FileText : ""

Note: Texts in a full text field are ANSI texts, according to the system's standard codepage. Thus, Russian texts can also be processed, for example.

2.2 Displaying Hit Ratios in Result List

When the full text search is installed and a full text field exists in the definition, you can add two additional variables to the result list.
These variables display the number of hits in a document (TRHits) or a graphical percentage of the document relevance (TRHitGraph). To calculate the percentage, the number of hits in a document is compared with the maximum number of hits among all found documents. Documents shown with 100% therefore have the highest hit count.

Characteristics of SQL full text

For SQL full text, the system field "SYSRELEVANCE" is required for the display of hit ratios (TRHits or TRHitGraph). If this field does not exist, hit ratios cannot be displayed. By default, the hit with the highest relevance is normalized to 100 and the other hits are scaled accordingly. If this normalization to 100 is not desired, set the following INI entry:

    [Compatibility]
    SQLfull textRanking=FALSE

2.3 Displaying the Full Text Field in Result List

Since the data in full text databases are only present word by word, they cannot be displayed in the result list. There is, however, a workaround that lets you display at least part of the full text field: define a so-called "mirror field" of data type "Character" with a suitable text length (e.g. 100 characters) in the DDC file. The only purpose of this field is to hold a copy of part of the full text field. If you enter, for example,

    Copy(@Comment,1,100)

as the default value of this field in the index form, the first 100 characters of the full text field "Comment" are synchronized automatically with the mirror field. Since you do not enter anything in this field yourself, it makes sense to hide it in the form.

2.4 Highlighting of Hits in the Index Form

You can display the index data of a document found via the full text search in the result list by clicking [Index]. For a full text search, all hits that match the search criteria are highlighted in the full text field. To restrict highlighting to full text fields, enter the following in the PROGRAM.INI or
the local ARCHIEF.INI of the respective client so that highlighting is only carried out in full text fields:

Example

    [Setup]
    OnlyFulltextWordMarking=TRUE

Highlighting of hits

Parameter                  Description
OnlyFulltextWordMarking    TRUE = Highlighting is only carried out in full text fields (default: FALSE)

Characteristics of dtSearch

In some cases, the highlighting of hits in dtSearch is not active. If you only require a simple search like with SQL full text, you can activate it via the following INI switch:

    [full text]
    SimpleLocationSearch=TRUE

3 dtSearch

3.1 Setup of Full Text Databases

When index data are saved, the content of a full text field is written to the medium completely. To achieve better performance, however, this content is written to the search database only word by word. SAPERION saves this word list after each word of the full text field has been checked against the dictionaries, where applicable, and the words have been filtered (see below).

By default, full text fields are quite demanding on memory (20 to 100 bytes per word). You should therefore enter a limited number of appropriate words during indexing, or set up dictionaries and filters so that only relevant entries are copied to the search database.

When full text is searched with dtSearch, the full text data are saved in a special full text database in separate sub-directories (in the DBS directory of SAPERION). They cannot be copied to a SQL database. The name of each sub-directory consists of the table name (DDC file) and a consecutive number. For each search, SAPERION executes two queries internally and then combines both results (cursor).
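The two-query behaviour described above can be pictured with a minimal Python sketch. It assumes that both queries return document identifiers and that combining the results means keeping only documents found by both queries; the source does not state the actual combination rule, and the names `combine_results`, `fulltext_hits` and `index_hits` are hypothetical.

```python
def combine_results(fulltext_hits, index_hits):
    """Combine hits from the full text database with hits from the search database.

    fulltext_hits: dict mapping document ID -> number of full text hits
    index_hits: iterable of document IDs matched by the ordinary index query
    Assumption: a document must appear in both result sets to be reported.
    """
    index_ids = set(index_hits)
    # Keep the order of the full text result and drop documents
    # that the index query did not return.
    return [
        (doc_id, hits)
        for doc_id, hits in fulltext_hits.items()
        if doc_id in index_ids
    ]


if __name__ == "__main__":
    fulltext = {"doc-1": 4, "doc-2": 1, "doc-3": 7}
    index = ["doc-3", "doc-1", "doc-9"]
    print(combine_results(fulltext, index))
```

Using a set for the index result keeps the membership test cheap even when the full text query returns many documents.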
3.2 Configuration

Activating dtSearch

To activate dtSearch as a full text engine, make the following entry in the PROGRAM.INI:

    [Modules32]
    Retrieval=DTSEAR32.RET

Definition and Configuration of Full Text Fields

Once you have defined index fields of data type "full text", you can enter default values for extended search conditions when you define query forms for these fields. To minimize the memory usage of the search database, you can use dictionaries (see below) and set up an additional filter.

Entries in PROGRAM.INI

The section [DtSearch] contains settings for the full text database dtSearch.

Example

    [DtSearch]
    test=TRUE
    Log=<Drive>:\<Path>\<File name of log file>
    AccentSensitive=FALSE
    MergeLimit=100
    ShutDownAfterMerge=TRUE

[DtSearch] section

Parameter             Description
test                  TRUE = Activates the superordinate SAPERION log
Log                   Activates the dtSearch log
AccentSensitive       TRUE = Activates accent-sensitive creation of the full text index
MergeLimit            Maximum number of documents (Insert Index)
ShutDownAfterMerge    TRUE = Shutdown after merging the index. This switch is set to TRUE by default. If there are problems with the index, it can be set to FALSE.

3.3 Search with Operators

When the full text search is installed, you can connect your search criteria with the Boolean operators AND, OR, NOT or with the "word proximity" operator.

+ AND only finds documents which contain both desired words. The search criterion "Arch* AND Construction" finds all documents whose indexing contains words starting with "Arch" and the word "Construction".
+ OR finds all documents which contain at least one of the words (e.g. "Arch* OR Construction").
+ NOT finds documents that do not contain the word. "NOT Archive" therefore finds all documents which do not contain the word "Archive".
+ The word proximity operator W/[n] finds documents in which the two words are separated by no more than a certain number of other words.
Example: For "Eva W/2 Smith", there may be at most two words between "Eva" and "Smith". Thus "Eva Anna Smith" and "Eva Smith" are found, but "Eva Anna Theresa Smith" is not.

You can structure more complicated queries as you wish, for example "(Architecture OR Construction) AND NOT Archive". This finds all documents which contain the word "Architecture" or "Construction" but do not contain the word "Archive".

Note: If you include several full text fields in your search, you have to connect them with an OR link. An AND link is not possible for technical reasons.

3.4 Search with Wildcards

When the true full text search is installed, you can use wildcards such as "?" (which represents exactly one letter) and "*" (any number of letters) in the search criteria. With "Arch*" as the search criterion, documents are found that contain words such as "Archive", "archiving", "Architecture" etc.

Note: Searching with wildcards at the beginning of words only works if phonetic search is deactivated. Uppercase and lowercase are not distinguished.

3.5 Using Dictionaries

To reduce the memory requirements for the word lists in the search databases, you can use dictionaries for full text searches. You can access these dictionaries using the [Options][Dictionaries] menu command. When a full text field is indexed, the system then inserts into the full text database only those words that are not contained in the stop word list (words to ignore). The system applies this check as soon as the "Stop words" dictionary contains entries.

Important: When using dtSearch, stop words must be defined in lower case. Capitalized stop words are not evaluated.

In addition, you can maintain a thesaurus (synonym dictionary) and group words with a similar meaning under a "master" word. You can activate the "Thesaurus" option while defining index forms. The system evaluates these synonyms during a query, so you must also activate them in the query form.
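The stop word filtering described in this section can be illustrated with a short Python sketch. It is not the SAPERION or dtSearch implementation; the function name `filter_stop_words` and the whitespace tokenization are assumptions. The sketch models the lower-case rule above by comparing each lower-cased word with the stop list exactly as written, so a capitalized stop list entry never matches.

```python
def filter_stop_words(text, stop_words):
    """Return the words of `text` that would be kept for the full text index.

    Assumption: words are separated by whitespace and compared in lower case.
    Stop list entries are used exactly as written, so an entry such as "And"
    never matches and is effectively ignored.
    """
    stop_set = set(stop_words)
    return [word for word in text.split() if word.lower() not in stop_set]


if __name__ == "__main__":
    # "And" is capitalized in the stop list and therefore has no effect.
    stop_list = ["the", "And", "of"]
    print(filter_stop_words("The archive of plans and drawings", stop_list))
```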
Furthermore, depending on the full text database used, the true full text search also includes derivations and inflected forms of a word, so you do not need to add them to the dictionary list.

3.6 Extended Search Options

The full text search gives you a number of more sophisticated search options.

Extended search options

Option              Description
Phonetic search     If activated, words are compared by their sound.
Fuzzy search        If activated, a certain number of letters within a word may differ. The number can be defined directly after the fuzzy search option (normally one to two characters are reasonable). The fuzzy search option is very useful, for example, for texts recognized by OCR.
Root word           With the "Root Word" option, variations of a word (plural, genitive, word combinations, etc.) are included. If you use this option to search for "Handle", you may find documents containing "Handbag". You should therefore first test this option to see whether the results are useful.
Natural language    This option allows you to formulate a query in plain English (e.g. "Please find documents which are including the term hand").
Synonyms            If this option is activated, all terms from the synonym dictionary are considered automatically.

You can preset every single option in formulas, but you can also activate or deactivate them using the right mouse button during a query. With the exception of the "Synonyms" option, the search result is not always predictable.

4 Oracle Full Text

4.1 Introduction – Oracle Full Text

The Oracle full text option makes it possible to use Oracle for full text fields in SAPERION. In contrast to external full text engines such as dtSearch, the full text is saved together with the index data in the SQL database.

4.2 Setting up the Oracle Full Text

In order to use Oracle full text, the following conditions must be met:

Oracle
+ The Oracle full text option must be installed.
+ The ODBC user needs the following permissions on the database: CTXAPP and RESOURCE.
+ The ODBC user also needs the right "Create Any Index".

SAPERION
+ The "Full text" box must be checked for the data source that you wish to use.
+ A full text field must be created in the DDC file.

Important: Only one full text field can be used for each DDC. A linked full text search (AND, OR, …) works only if the Boolean operators are entered in the query form.

4.3 Searching using Wildcards

You can use wildcards at any position in the text. Uppercase and lowercase are not distinguished. For Oracle full text, the character "%" must be used for the wildcard search (instead of "*").

5 SQL Server Full Text

5.1 Introduction – SQL Server Full Text

The SQL Server full text option allows you to use the Microsoft Search Service for full text fields in SAPERION. In contrast to external full text engines such as dtSearch, the full text is saved in the SQL database along with the index data.

5.2 Setting Up the SQL Server Full Text

Requirements

+ Full text indexing must be activated for the current database with the database command "EXEC sp_fulltext_database 'enable'". For more information, see http://msdn2.microsoft.com/de-de/library/ms190321.aspx

Note: If this command is executed when full text catalogs already exist in the database, these catalogs will be removed from the database.

Installation of the SQL Full Text

The full text search (the Microsoft Search Service) is provided automatically with the standard installation of Microsoft SQL Server 2000 Standard or Enterprise edition. The service can also be installed later.

Maintenance of the SQL Full Text

A full text catalog is created for every DDC. Catalog names are made up of the prefix "CA_" and the name of the DDC. SAPERION creates full text catalogs with modification logs and the option for refreshing in the background, which allows modifications to be transferred directly to the full text index.
SQL Server tools, such as those in the Enterprise Manager, are used to maintain the full text. For example, the full text catalog can be repopulated, or the procedure for refreshing can be modified. For a detailed description of the possibilities, see the SQL Server documentation.

5.3 Using the SQL Full Text

In order to use the functions of the SQL full text, an index field must be defined as a full text field (Text-Retrieval). If you want the relevance of a hit to be displayed in the result list, the SYSRELEVANCE field, which is of the whole number type, must be defined.

The SQL Server full text for an ODBC data source is activated in the "ODBC Data Source Properties" dialog by checking the "Full text" checkbox.

Fig. 5–1: Activation of the option "Full text"

In the classic full text search, search terms can be linked in simple or complex ways with Boolean and proximity operators. Parenthetical expressions can be used to enter complex search queries, e.g. (cheese OR sausage) AND (bread OR cold cuts).

Searching for Terms with Boolean Operators

Search terms can be linked with Boolean operators:

+ AND: This operator ensures that both search terms are contained in the found documents.
+ OR: With this operator, at least one of the entered search terms is contained in the found documents.
+ AND NOT: This operator ensures that the search terms following it are not contained in the found documents.

Operators of the same type are associative, and their order does not affect the search result. AND operators are evaluated before OR operators.

Important: A linked full text search (AND, OR, …) works only if the Boolean operators are entered in the query form.

Searching for Terms Using Proximity Operators

The proximity operator NEAR allows searches in which the weighting of search results is influenced by how close two search terms are located to one another in the text.
Logically, the results correspond to an AND search, i.e. data records in which the terms are far from one another are also found, but such hits are assigned a correspondingly low relevance.

Emphasis of Terms

Search terms can be weighted differently with the ISABOUT keyword. The weight is given as a value between 0.0 and 1.0. Logically, the individual terms of the query are linked with OR.

Example

    ISABOUT( Core weight (0.8), "Sap*" weight (0.2) )

Search for Phrases

Phrase searches are searches for several words that follow one another in a specific order. The words are entered enclosed in quotation marks.

Using Stop Word Lists

The stop word list used by MS Search is not the same as the list kept in SAPERION. This is because MS Search is an independent service that may be used by other applications in the organization. The stop word lists are named "Noise", while the file extension stands for the specific language ("Noise.deu" for German). The lists are contained in the "\Mssql\Ftdata\Sqlserver\Config" directory and can be edited. For details on this topic, see the SQL Server documentation. When the stop word list has been modified, the full text catalog must be rebuilt in order for the modifications to take effect.

Search with Prefixes

Prefix searches allow you to search using the beginning of words. The beginning of the word is placed in quotation marks, with an asterisk before the closing quotation mark. Prefix searches can also be applied to phrases.

Examples

+ "Cor*" leads to Core, Core Server etc.
+ "Sa Ser *" leads to SAPERION Server, SAP Service etc.
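The search expressions for emphasis, phrases and prefixes follow fixed patterns, which the following Python sketch assembles as plain strings. The helper names `phrase_term`, `prefix_term` and `isabout` are hypothetical, and the sketch only formats text for entry in a query form; how SAPERION passes the expression on to SQL Server is not covered here.

```python
def phrase_term(words):
    """Quote a phrase so the words must follow one another in this order."""
    return '"' + " ".join(words) + '"'


def prefix_term(prefix):
    """Quote a word beginning, with an asterisk before the closing quotation mark."""
    return '"' + prefix + '*"'


def isabout(weighted_terms):
    """Build an ISABOUT expression from (term, weight) pairs.

    Weights must lie between 0.0 and 1.0, as stated in the text.
    """
    parts = []
    for term, weight in weighted_terms:
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight out of range for {term!r}: {weight}")
        parts.append(f"{term} weight ({weight})")
    return "ISABOUT( " + ", ".join(parts) + " )"


if __name__ == "__main__":
    print(prefix_term("Cor"))
    print(phrase_term(["SAPERION", "Server"]))
    print(isabout([("Core", 0.8), (prefix_term("Sap"), 0.2)]))
```

Validating the weight range in the helper catches out-of-range values before the expression reaches the query form.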