Table of Contents
Full-Text Search
Get Started with Full-Text Search
Query with Full-Text Search
Search for Words Close to Another Word with NEAR
Limit Search Results with RANK
Improve the Performance of Full-Text Queries
Search Document Properties with Search Property Lists
Find Property Set GUIDs and Property Integer IDs for Search Properties
Create and Manage Full-Text Catalogs
Create and Manage Full-Text Indexes
Choose a Language When Creating a Full-Text Index
Populate Full-Text Indexes
Improve the Performance of Full-Text Indexes
Troubleshoot Full-Text Indexing
Back Up and Restore Full-Text Catalogs and Indexes
Configure and Manage Filters for Search
Configure and Manage Word Breakers and Stemmers for Search
View or Change Registered Filters and Word Breakers
Change the Word Breaker Used for US English and UK English
Revert the Word Breakers Used by Search to the Previous Version
Customize the Behavior of Word Breakers with a Custom Dictionary
Configure and Manage Stopwords and Stoplists for Full-Text Search
Configure and Manage Thesaurus Files for Full-Text Search
Manage and Monitor Full-Text Search for a Server Instance
Set the Service Account for the Full-text Filter Daemon Launcher
Upgrade Full-Text Search
Full-Text Search DDL, Functions, Stored Procedures, and Views
Use the Full-Text Indexing Wizard
Deprecated Full-Text Search Features in SQL Server 2016
Semantic Search
Install and Configure Semantic Search
Enable Semantic Search on Tables and Columns
Find Key Phrases in Documents with Semantic Search
Find Similar and Related Documents with Semantic Search
Manage and Monitor Semantic Search
Semantic Search DDL, Functions, Stored Procedures, and Views

Full-Text Search
3/24/2017

Full-Text Search in SQL Server and Azure SQL Database lets users and applications run full-text queries against character-based data in SQL Server tables.
Basic tasks
This topic provides an overview of Full-Text Search and describes its components and its architecture. If you prefer to get started right away, here are the basic tasks:
- Get Started with Full-Text Search
- Create and Manage Full-Text Catalogs
- Create and Manage Full-Text Indexes
- Populate Full-Text Indexes
- Query with Full-Text Search

NOTE: Full-Text Search is an optional component of the SQL Server Database Engine. If you didn't select Full-Text Search when you installed SQL Server, run SQL Server Setup again to add it.

Overview
A full-text index includes one or more character-based columns in a table. These columns can have any of the following data types: char, varchar, nchar, nvarchar, text, ntext, image, xml, or varbinary(max) and FILESTREAM. Each full-text index indexes one or more columns from the table, and each column can use a specific language.

Full-text queries perform linguistic searches against text data in full-text indexes by operating on words and phrases based on the rules of a particular language, such as English or Japanese. Full-text queries can include simple words and phrases or multiple forms of a word or phrase. A full-text query returns any documents that contain at least one match (also known as a hit). A match occurs when a target document contains all the terms specified in the full-text query and meets any other search conditions, such as the distance between the matching terms.

Full-Text Search queries
After columns have been added to a full-text index, users and applications can run full-text queries on the text in the columns.
These queries can search for any of the following:
- One or more specific words or phrases (simple term)
- A word or a phrase where the words begin with specified text (prefix term)
- Inflectional forms of a specific word (generation term)
- A word or phrase close to another word or phrase (proximity term)
- Synonymous forms of a specific word (thesaurus)
- Words or phrases using weighted values (weighted term)

Full-text queries are not case-sensitive. For example, searching for "Aluminum" or "aluminum" returns the same results.

Full-text queries use a small set of Transact-SQL predicates (CONTAINS and FREETEXT) and functions (CONTAINSTABLE and FREETEXTTABLE). However, the search goals of a given business scenario influence the structure of the full-text queries. For example:

e-business (searching for a product on a website):

SELECT product_id
FROM products
WHERE CONTAINS(product_description,
      '"Snap Happy 100EZ" OR FORMSOF(THESAURUS, "Snap Happy") OR "100EZ"')
  AND product_cost < 200;

Recruitment scenario (searching for job candidates that have experience working with SQL Server):

SELECT candidate_name, SSN
FROM candidates
WHERE CONTAINS(candidate_resume, '"SQL Server"')
  AND candidate_division = 'DBA';

For more information, see Query with Full-Text Search.

Compare Full-Text Search queries to the LIKE predicate
In contrast to full-text search, the LIKE Transact-SQL predicate works on character patterns only. Also, you cannot use the LIKE predicate to query formatted binary data. Furthermore, a LIKE query against a large amount of unstructured text data is much slower than an equivalent full-text query against the same data. A LIKE query against millions of rows of text data can take minutes to return, whereas a full-text query can take only seconds or less against the same data, depending on the number of rows that are returned.

Full-Text Search architecture
Full-text search architecture consists of the following processes:
- The SQL Server process (sqlservr.exe).
- The filter daemon host process (fdhost.exe).

For security reasons, filters are loaded by separate processes called the filter daemon hosts. The fdhost.exe processes are created by an FDHOST launcher service (MSSQLFDLauncher), and they run under the security credentials of the FDHOST launcher service account. Therefore, the FDHOST launcher service must be running for full-text indexing and full-text querying to work. For information about setting the service account for this service, see Set the Service Account for the Full-text Filter Daemon Launcher.

These two processes contain the components of the full-text search architecture. These components and their relationships are summarized in the following illustration. The components are described after the illustration.

SQL Server process
The SQL Server process uses the following components for full-text search:
- User tables. These tables contain the data to be full-text indexed.
- Full-text gatherer. The full-text gatherer works with the full-text crawl threads. It is responsible for scheduling and driving the population of full-text indexes, and also for monitoring full-text catalogs.
- Thesaurus files. These files contain synonyms of search terms. For more information, see Configure and Manage Thesaurus Files for Full-Text Search.
- Stoplist objects. Stoplist objects contain a list of common words that are not useful for the search. For more information, see Configure and Manage Stopwords and Stoplists for Full-Text Search.
- SQL Server query processor. The query processor compiles and executes SQL queries. If a SQL query includes a full-text search query, the query is sent to the Full-Text Engine, both during compilation and during execution. The query result is matched against the full-text index.
- Full-Text Engine. The Full-Text Engine in SQL Server is fully integrated with the query processor. The Full-Text Engine compiles and executes full-text queries.
  As part of query execution, the Full-Text Engine might receive input from the thesaurus and stoplist.

NOTE: In SQL Server 2008 and later versions, the Full-Text Engine resides in the SQL Server process, rather than in a separate service. Integrating the Full-Text Engine into the Database Engine improved full-text manageability, mixed-query optimization, and overall performance.

- Index writer (indexer). The index writer builds the structure that is used to store the indexed tokens.
- Filter daemon manager. The filter daemon manager is responsible for monitoring the status of the Full-Text Engine filter daemon host.

Filter Daemon Host process
The filter daemon host is a process that is started by the Full-Text Engine. It runs the following full-text search components, which are responsible for accessing, filtering, and word breaking data from tables, as well as for word breaking and stemming the query input.

The components of the filter daemon host are as follows:
- Protocol handler. This component pulls the data from memory for further processing and accesses data from a user table in a specified database. One of its responsibilities is to gather data from the columns being full-text indexed and pass it to the filter daemon host, which applies filtering and word breaking as required.
- Filters. Some data types require filtering before the data in a document can be full-text indexed, including data in varbinary, varbinary(max), image, or xml columns. The filter used for a given document depends on its document type. For example, different filters are used for Microsoft Word (.doc) documents, Microsoft Excel (.xls) documents, and XML (.xml) documents. The filter then extracts chunks of text from the document, removing embedded formatting and retaining the text and, potentially, information about the position of the text. The result is a stream of textual information. For more information, see Configure and Manage Filters for Search.
- Word breakers and stemmers.
  A word breaker is a language-specific component that finds word boundaries based on the lexical rules of a given language (word breaking). Each word breaker is associated with a language-specific stemmer component that conjugates verbs and performs inflectional expansions. At indexing time, the filter daemon host uses a word breaker and stemmer to perform linguistic analysis on the textual data from a given table column. The language that is associated with a table column in the full-text index determines which word breaker and stemmer are used for indexing the column. For more information, see Configure and Manage Word Breakers and Stemmers for Search.

Full-Text Search processing
Full-text search is powered by the Full-Text Engine. The Full-Text Engine has two roles: indexing support and querying support.

Full-Text indexing process
When a full-text population (also known as a crawl) is initiated, the Full-Text Engine pushes large batches of data into memory and notifies the filter daemon host. The host filters and word breaks the data and converts it into inverted word lists. The full-text search then pulls the converted data from the word lists, processes the data to remove stopwords, and persists the word lists for a batch into one or more inverted indexes.

When indexing data stored in a varbinary(max) or image column, the filter, which implements the IFilter interface, extracts text based on the specified file format for that data (for example, Microsoft Word). In some cases, the filter components require the varbinary(max) or image data to be written out to the filterdata folder, instead of being pushed into memory.

As part of processing, the gathered text data is passed through a word breaker to separate the text into individual tokens, or keywords. The language used for tokenization is specified at the column level, or can be identified within varbinary(max), image, or xml data by the filter component.
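The selection of a word breaker and stemmer by column language can be sketched in Python. This is a conceptual model under stated assumptions, not SQL Server's implementation. The LanguageComponents class, the COMPONENTS registry, the toy English word breaker, and the plural-stripping stemmer are all hypothetical. Real word breakers and stemmers apply far richer lexical rules. The sketch shows only two things: the column's LCID picks the components, and each token keeps a 1-based word offset.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class LanguageComponents:
    """Hypothetical pairing of a word breaker with its stemmer for one language."""
    word_breaker: Callable[[str], List[str]]
    stemmer: Callable[[str], str]


def english_word_breaker(text: str) -> List[str]:
    # Toy rule: a word is a run of ASCII letters or digits; case is ignored.
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def english_stemmer(token: str) -> str:
    # Toy inflectional rule: strip a trailing plural "s" from words longer than 3 letters.
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


# Hypothetical registry keyed by LCID. 1033 (U.S. English) and 2057
# (British English) are the LCIDs mentioned in this document.
COMPONENTS: Dict[int, LanguageComponents] = {
    1033: LanguageComponents(english_word_breaker, english_stemmer),
    2057: LanguageComponents(english_word_breaker, english_stemmer),
}


def analyze_column(text: str, lcid: int) -> List[Tuple[str, str, int]]:
    """Return (token, stem, occurrence) triples for one column value.

    The column's LCID selects the components; an unknown LCID raises KeyError.
    """
    parts = COMPONENTS[lcid]
    return [
        (token, parts.stemmer(token), occurrence)
        for occurrence, token in enumerate(parts.word_breaker(text), start=1)
    ]


print(analyze_column("Reflectors and Brackets", 2057))
# [('reflectors', 'reflector', 1), ('and', 'and', 2), ('brackets', 'bracket', 3)]
```

Stopwords such as "and" are still present at this stage. Removing them is the additional processing step described next.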
Additional processing may be performed to remove stopwords and to normalize tokens before they are stored in the full-text index or an index fragment.

When a population has completed, a final merge process is triggered that merges the index fragments together into one master full-text index. This improves query performance, because only the master index needs to be queried rather than a number of index fragments, and better scoring statistics may be used for relevance ranking.

Full-Text querying process
The query processor passes the full-text portions of a query to the Full-Text Engine for processing. The Full-Text Engine performs word breaking and, optionally, thesaurus expansions, stemming, and stopword (noise-word) processing. Then the full-text portions of the query are represented in the form of SQL operators, primarily as streaming table-valued functions (STVFs). During query execution, these STVFs access the inverted index to retrieve the correct results. The results are either returned to the client at this point, or they are further processed before being returned to the client.

Full-text index architecture
The information in full-text indexes is used by the Full-Text Engine to compile full-text queries that can quickly search a table for particular words or combinations of words. A full-text index stores information about significant words and their location within one or more columns of a database table. A full-text index is a special type of token-based functional index that is built and maintained by the Full-Text Engine for SQL Server. The process of building a full-text index differs from building other types of indexes. Instead of constructing a B-tree structure based on a value stored in a particular row, the Full-Text Engine builds an inverted, stacked, compressed index structure based on individual tokens from the text being indexed.
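The indexing and querying steps described above can be modeled with a small in-memory inverted index. This Python sketch is a simplification under stated assumptions, not how the Full-Text Engine stores data. The following are all hypothetical choices made for illustration:
- the tokenizer;
- the four-word STOPLIST, taken from the stopword examples in this document;
- the thesaurus dictionary;
- the rule that a document must contain every query term.

A stopword still consumes a word position. This is why "Tire" has occurrence 4 in the Fragment 1 example of the Document table.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Sample English stopwords taken from this document; a real stoplist is larger.
STOPLIST: Set[str] = {"a", "and", "is", "the"}

Posting = Tuple[int, int]  # (doc_id, occurrence)


def tokenize(text: str) -> List[str]:
    # Toy word breaker: runs of word characters, lowercased (matching ignores case).
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, List[Posting]]:
    """Map each keyword to its (doc_id, occurrence) postings, skipping stopwords."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for doc_id in sorted(docs):
        for occurrence, token in enumerate(tokenize(docs[doc_id]), start=1):
            # A stopword still consumes a word position, but it is not stored.
            if token not in STOPLIST:
                index[token].append((doc_id, occurrence))
    return dict(index)


def run_query(index: Dict[str, List[Posting]], text: str,
              thesaurus: Optional[Dict[str, Iterable[str]]] = None) -> Set[int]:
    """Return the doc IDs that contain every non-stopword query term or a synonym."""
    thesaurus = thesaurus or {}
    result: Optional[Set[int]] = None
    for term in tokenize(text):
        if term in STOPLIST:
            continue  # stopwords are removed from the query as well
        forms = {term, *thesaurus.get(term, ())}  # optional thesaurus expansion
        docs = {doc_id for form in forms for doc_id, _ in index.get(form, [])}
        result = docs if result is None else result & docs
    return result if result is not None else set()


titles = {
    1: "Crank Arm and Tire Maintenance",
    2: "Front Reflector Bracket and Reflector Assembly 3",
    3: "Front Reflector Bracket Installation",
}
index = build_inverted_index(titles)
print(index["reflector"])                              # [(2, 2), (2, 5), (3, 2)]
print(run_query(index, "front bracket"))               # {2, 3}
print(run_query(index, "wheel", {"wheel": ["tire"]}))  # {1}
```

Real CONTAINS conditions also support OR, NEAR, prefix terms, and weighting, all of which this sketch leaves out.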
The size of a full-text index is limited only by the available memory resources of the computer on which the instance of SQL Server is running.

Beginning in SQL Server 2008, the full-text indexes are integrated with the Database Engine, instead of residing in the file system as in previous versions of SQL Server. For a new database, the full-text catalog is now a virtual object that does not belong to any filegroup; it is merely a logical concept that refers to a group of the full-text indexes. Note, however, that when a SQL Server 2005 database is upgraded, a new filegroup is created for any full-text catalog that contains data files; for more information, see Upgrade Full-Text Search.

Only one full-text index is allowed per table. For a full-text index to be created on a table, the table must have a single, unique, non-null column. Columns of type char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary, and varbinary(max) can be indexed for full-text search. Creating a full-text index on a column whose data type is varbinary, varbinary(max), image, or xml requires that you specify a type column. A type column is a table column in which you store the file extension (.doc, .pdf, .xls, and so forth) of the document in each row.

Full-text index structure
A good understanding of the structure of a full-text index will help you understand how the Full-Text Engine works. This topic uses the following excerpt of the Document table in Adventure Works as an example table. This excerpt shows only two columns, the DocumentID column and the Title column, and three rows from the table. For this example, we will assume that a full-text index has been created on the Title column.
DocumentID | Title
1 | Crank Arm and Tire Maintenance
2 | Front Reflector Bracket and Reflector Assembly 3
3 | Front Reflector Bracket Installation

For example, the following table, which shows Fragment 1, depicts the contents of the full-text index created on the Title column of the Document table. Full-text indexes contain more information than is presented in this table. The table is a logical representation of a full-text index and is provided for demonstration purposes only. The rows are stored in a compressed format to optimize disk usage.

Notice that the data has been inverted from the original documents. Inversion occurs because the keywords are mapped to the document IDs. For this reason, a full-text index is often referred to as an inverted index.

Also notice that the keyword "and" has been removed from the full-text index. This is done because "and" is a stopword, and removing stopwords from a full-text index can lead to substantial savings in disk space, thereby improving query performance. For more information about stopwords, see Configure and Manage Stopwords and Stoplists for Full-Text Search.

Fragment 1
Keyword | ColId | DocId | Occurrence
Crank | 1 | 1 | 1
Arm | 1 | 1 | 2
Tire | 1 | 1 | 4
Maintenance | 1 | 1 | 5
Front | 1 | 2 | 1
Front | 1 | 3 | 1
Reflector | 1 | 2 | 2
Reflector | 1 | 2 | 5
Reflector | 1 | 3 | 2
Bracket | 1 | 2 | 3
Bracket | 1 | 3 | 3
Assembly | 1 | 2 | 6
3 | 1 | 2 | 7
Installation | 1 | 3 | 4

The Keyword column contains a representation of a single token extracted at indexing time. Word breakers determine what makes up a token.

The ColId column contains a value that corresponds to a particular column that is full-text indexed.

The DocId column contains values for an eight-byte integer that maps to a particular full-text key value in a full-text indexed table. This mapping is necessary when the full-text key is not an integer data type. In such cases, mappings between full-text key values and DocId values are maintained in a separate table called the DocId Mapping table.
To query for these mappings, use the sp_fulltext_keymappings system stored procedure. To satisfy a search condition, DocId values from the above table need to be joined with the DocId Mapping table to retrieve rows from the base table being queried. If the full-text key value of the base table is an integer type, the value directly serves as the DocId and no mapping is necessary. Therefore, using integer full-text key values can help optimize full-text queries.

The Occurrence column contains an integer value. For each DocId value, there is a list of occurrence values that correspond to the relative word offsets of the particular keyword within that DocId. Occurrence values are useful in determining phrase or proximity matches; for example, phrases have numerically adjacent occurrence values. They are also useful in computing relevance scores; for example, the number of occurrences of a keyword in a DocId may be used in scoring.

Full-text index fragments
The logical full-text index is usually split across multiple internal tables. Each internal table is called a full-text index fragment. Some of these fragments might contain newer data than others. For example, if a user updates the following row whose DocId is 3 and the table is auto change-tracked, a new fragment is created.

DocumentID | Title
3 | Rear Reflector

In the following example, which shows Fragment 2, the fragment contains newer data about DocId 3 compared to Fragment 1. Therefore, when the user queries for "Rear Reflector", the data from Fragment 2 is used for DocId 3. Each fragment is marked with a creation timestamp that can be queried by using the sys.fulltext_index_fragments catalog view.

Fragment 2
Keyword | ColId | DocId | Occurrence
Rear | 1 | 3 | 1
Reflector | 1 | 3 | 2

As Fragment 2 shows, full-text queries need to query each fragment internally and discard older entries. Therefore, too many full-text index fragments in the full-text index can lead to substantial degradation in query performance.
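The effect of fragments on queries can be modeled in Python using the Fragment 1 and Fragment 2 rows above. The sketch makes these assumptions:
- Each fragment carries an integer that stands in for its creation timestamp.
- For each DocId, only the rows from the newest fragment that contains that DocId are kept.

live_rows applies this rule at query time. master_merge applies the same rule once to produce a single fragment, which is the idea behind the master merge performed by REORGANIZE. phrase_matches finds phrases through numerically adjacent occurrence values. All names and the newest-fragment-wins rule are hypothetical. They model the logical behavior, not SQL Server's internal algorithm.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Row(NamedTuple):
    keyword: str
    col_id: int
    doc_id: int
    occurrence: int


class Fragment(NamedTuple):
    created: int      # stand-in for the fragment's creation timestamp
    rows: List[Row]


def live_rows(fragments: List[Fragment]) -> List[Row]:
    """For each DocId, keep only the rows from the newest fragment that holds it."""
    newest: Dict[int, int] = {}
    for frag in fragments:
        for row in frag.rows:
            newest[row.doc_id] = max(newest.get(row.doc_id, frag.created), frag.created)
    return [row for frag in fragments for row in frag.rows
            if frag.created == newest[row.doc_id]]


def master_merge(fragments: List[Fragment]) -> Fragment:
    """Collapse all fragments into one, dropping obsolete entries."""
    return Fragment(max(f.created for f in fragments), live_rows(fragments))


def phrase_matches(rows: List[Row], phrase: List[str]) -> Set[int]:
    """Return DocIds whose phrase words have numerically adjacent occurrences."""
    positions: Dict[Tuple[str, int], Set[int]] = {}
    for row in rows:
        positions.setdefault((row.keyword, row.doc_id), set()).add(row.occurrence)
    hits: Set[int] = set()
    for (keyword, doc_id), starts in positions.items():
        if keyword != phrase[0]:
            continue
        for start in starts:
            if all(start + i in positions.get((word, doc_id), set())
                   for i, word in enumerate(phrase)):
                hits.add(doc_id)
    return hits


# Fragment 1 and Fragment 2 from the tables above (ColId is 1 everywhere).
fragment_1 = Fragment(1, [Row(k, 1, d, o) for k, d, o in [
    ("Crank", 1, 1), ("Arm", 1, 2), ("Tire", 1, 4), ("Maintenance", 1, 5),
    ("Front", 2, 1), ("Front", 3, 1), ("Reflector", 2, 2), ("Reflector", 2, 5),
    ("Reflector", 3, 2), ("Bracket", 2, 3), ("Bracket", 3, 3),
    ("Assembly", 2, 6), ("3", 2, 7), ("Installation", 3, 4),
]])
fragment_2 = Fragment(2, [Row("Rear", 1, 3, 1), Row("Reflector", 1, 3, 2)])

print(phrase_matches(fragment_1.rows, ["Front", "Reflector"]))                      # {2, 3}
print(phrase_matches(live_rows([fragment_1, fragment_2]), ["Front", "Reflector"]))  # {2}
print(len(master_merge([fragment_1, fragment_2]).rows))                             # 12
```

Before the merge, every query has to pay the cost of live_rows across all fragments. After the merge, the single fragment can be searched directly.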
To reduce the number of fragments, reorganize the full-text catalog by using the REORGANIZE option of the ALTER FULLTEXT CATALOG Transact-SQL statement. This statement performs a master merge, which merges the fragments into a single larger fragment and removes all obsolete entries from the full-text index.

After being reorganized, the example index would contain the following rows:

Keyword | ColId | DocId | Occurrence
Crank | 1 | 1 | 1
Arm | 1 | 1 | 2
Tire | 1 | 1 | 4
Maintenance | 1 | 1 | 5
Front | 1 | 2 | 1
Rear | 1 | 3 | 1
Reflector | 1 | 2 | 2
Reflector | 1 | 2 | 5
Reflector | 1 | 3 | 2
Bracket | 1 | 2 | 3
Assembly | 1 | 2 | 6
3 | 1 | 2 | 7

Differences between full-text indexes and regular SQL Server indexes:

Full-text indexes | Regular SQL Server indexes
Only one full-text index allowed per table. | Several regular indexes allowed per table.
The addition of data to full-text indexes, called a population, can be requested through either a schedule or a specific request, or can occur automatically with the addition of new data. | Updated automatically when the data upon which they are based is inserted, updated, or deleted.
Grouped within the same database into one or more full-text catalogs. | Not grouped.

Full-Text search linguistic components and language support
Full-text search supports almost 50 diverse languages, such as English, Spanish, Chinese, Japanese, Arabic, Bengali, and Hindi. For a complete list of the supported full-text languages, see sys.fulltext_languages (Transact-SQL). Each of the columns contained in the full-text index is associated with a Microsoft Windows locale identifier (LCID) that equates to a language that is supported by full-text search. For example, LCID 1033 equates to U.S. English, and LCID 2057 equates to British English. For each supported full-text language, SQL Server provides linguistic components that support indexing and querying full-text data that is stored in that language.

Language-specific components include the following:
- Word breakers and stemmers.
  A word breaker finds word boundaries based on the lexical rules of a given language (word breaking). Each word breaker is associated with a stemmer that conjugates verbs for the same language. For more information, see Configure and Manage Word Breakers and Stemmers for Search.
- Stoplists. A system stoplist is provided that contains a basic set of stopwords (also known as noise words). A stopword is a word that does not help the search and is ignored by full-text queries. For example, for the English locale, words such as "a", "and", "is", and "the" are considered stopwords. Typically, you will need to configure one or more thesaurus files and stoplists. For more information, see Configure and Manage Stopwords and Stoplists for Full-Text Search.
- Thesaurus files. SQL Server also installs a thesaurus file for each full-text language, as well as a global thesaurus file. The installed thesaurus files are essentially empty, but you can edit them to define synonyms for a specific language or business scenario. By developing a thesaurus tailored to your full-text data, you can effectively broaden the scope of full-text queries on that data. For more information, see Configure and Manage Thesaurus Files for Full-Text Search.
- Filters (iFilters). Indexing a document in a varbinary(max), image, or xml data type column requires a filter to perform extra processing. The filter must be specific to the document type (.doc, .pdf, .xls, .xml, and so forth). For more information, see Configure and Manage Filters for Search.

Word breakers (and stemmers) and filters run in the filter daemon host process (fdhost.exe).

THIS TOPIC APPLIES TO: SQL Server (starting with 2008) Parallel Data Warehouse Azure SQL Database Azure SQL Data Warehouse

Get Started with Full-Text Search
4/7/2017

SQL Server databases are full-text enabled by default.
Before you can run full-text queries, however, you must create a full-text catalog and create a full-text index on the tables or indexed views you want to search.

Set up full-text search in two steps
There are two basic steps to set up full-text search:
1. Create a full-text catalog.
2. Create a full-text index on the tables or indexed views you want to search.

Each full-text index must belong to a full-text catalog. You can create a separate full-text catalog for each full-text index, or you can associate multiple full-text indexes with a given catalog. A full-text catalog is a virtual object and does not belong to any filegroup. The catalog is a logical concept that refers to a group of full-text indexes.

NOTE: These steps assume that you installed the optional Full-Text Search components when you installed SQL Server. If not, you have to run SQL Server Setup again to add them.

Set up full-text search with a wizard
To set up full-text search by using a wizard, see Use the Full-Text Indexing Wizard.

Set up full-text search with Transact-SQL
The following two-part example creates a full-text catalog named AdvWksDocFTCat on the AdventureWorks sample database and then creates a full-text index on the Document table in the sample database. This statement creates the full-text catalog in the default directory specified during SQL Server setup. The folder named AdvWksDocFTCat is in the default directory.

1. To create a full-text catalog named AdvWksDocFTCat, the example uses a CREATE FULLTEXT CATALOG statement:

USE AdventureWorks;
GO
CREATE FULLTEXT CATALOG AdvWksDocFTCat;

For more info, see Create and Manage Full-Text Catalogs.

2. Before you can create a full-text index on the Document table, ensure that the table has a unique, single-column, non-nullable index. The following CREATE INDEX statement creates a unique index, ui_ukDoc, on the DocumentID column of the Document table:

CREATE UNIQUE INDEX ui_ukDoc ON Production.Document(DocumentID);

3. After you have a unique key, you can create a full-text index on the Document table by using the following CREATE FULLTEXT INDEX statement:

CREATE FULLTEXT INDEX ON Production.Document
(
    Document                         --Full-text index column name
        TYPE COLUMN FileExtension    --Name of column that contains file type information
        Language 2057                --2057 is the LCID for British English
)
KEY INDEX ui_ukDoc                   --Unique index
ON AdvWksDocFTCat                    --Full-text catalog
WITH CHANGE_TRACKING AUTO;           --Population type
GO

The TYPE COLUMN defined in this example specifies the type column in the table that contains the type of the document in each row of the column 'Document' (which is of binary type). The type column stores the user-supplied file extension (".doc", ".xls", and so forth) of the document in a given row. The Full-Text Engine uses the file extension in a given row to invoke the correct filter to use for parsing the data in that row. After the filter has parsed the binary data of the row, the specified word breaker parses the content. (In this example, the word breaker for British English is used.) For more information, see Configure and Manage Filters for Search.

For more info, see Create and Manage Full-Text Indexes.

Choose options for a full-text index

Choose a language
For information about choosing the column language, see Choose a Language When Creating a Full-Text Index.

Choose a filegroup
The process of building a full-text index is fairly I/O intensive. In summary, it consists of reading data from SQL Server, and then propagating the filtered data to the full-text index. As a best practice, locate a full-text index in the database filegroup that is best for maximizing I/O performance, or locate the full-text indexes in a different filegroup on another volume.
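The TYPE COLUMN mechanism described in step 3 above can be sketched as follows. In that mechanism, the file extension stored in each row selects the filter, and the filter's text output is then word broken. The filter registry and both filters here are hypothetical stand-ins built on Python's standard library, and only .txt and .xml are handled. Real filters are IFilter components that run in the filter daemon host, and a real word breaker is language-specific rather than a single regular expression.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List


def plain_text_filter(data: bytes) -> str:
    # Hypothetical filter for plain text: just decode the bytes.
    return data.decode("utf-8")


def xml_filter(data: bytes) -> str:
    # Hypothetical XML filter: drop the markup and keep only text content.
    return " ".join(ET.fromstring(data).itertext())


# Hypothetical filter registry, keyed by the file extension stored in the type column.
FILTERS: Dict[str, Callable[[bytes], str]] = {
    ".txt": plain_text_filter,
    ".xml": xml_filter,
}


def filter_and_break(document: bytes, file_extension: str) -> List[str]:
    """Choose a filter from the row's type column, then word-break its output."""
    chosen = FILTERS.get(file_extension.lower())
    if chosen is None:
        raise ValueError(f"no filter registered for {file_extension!r}")
    text = chosen(document)
    return [token.lower() for token in re.findall(r"\w+", text)]


print(filter_and_break(b"<doc><title>Crank Arm</title><body>Tire care</body></doc>", ".xml"))
# ['crank', 'arm', 'tire', 'care']
```

A row whose extension has no registered filter cannot be processed, which is why the type column must hold a recognized file extension.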
Choose a full-text catalog
We recommend associating tables with the same update characteristics (such as a small number of changes versus a large number of changes, or tables that change frequently during a particular time of day) together under the same full-text catalog. By setting up full-text catalog population schedules, full-text indexes stay synchronous with the tables without adversely affecting the resource usage of the database server during periods of high database activity.

Consider the following guidelines:
- If you are indexing a table with millions of rows, assign the table to its own full-text catalog.
- Consider the amount of change occurring in the tables being full-text indexed, as well as the total number of rows. If the total number of rows being changed, together with the number of rows in the table present during the last full-text population, represents millions of rows, assign the table to its own full-text catalog.

Associate a unique index
Always select the smallest unique index available for your full-text unique key. (A 4-byte, integer-based index is optimal.) This significantly reduces the resources required by Microsoft Search service in the file system. If the primary key is large (over 100 bytes), consider choosing another unique index in the table (or creating another unique index) as the full-text unique key. Otherwise, if the full-text unique key size exceeds the maximum size allowed (900 bytes), full-text population will not be able to proceed.

Associate a stoplist
A stoplist is a list of stopwords, also known as noise words. A stoplist is associated with each full-text index, and the words in that stoplist are applied to full-text queries on that index. By default, the system stoplist is associated with a new full-text index. You can create and use your own stoplist too.
For example, the following CREATE FULLTEXT STOPLIST Transact-SQL statement creates a new full-text stoplist named myStoplist by copying from the system stoplist:

CREATE FULLTEXT STOPLIST myStoplist FROM SYSTEM STOPLIST;
GO

The following ALTER FULLTEXT STOPLIST Transact-SQL statement alters a stoplist named myStoplist, adding the word 'en', first for Spanish and then for French:

ALTER FULLTEXT STOPLIST myStoplist ADD 'en' LANGUAGE 'Spanish';
ALTER FULLTEXT STOPLIST myStoplist ADD 'en' LANGUAGE 'French';
GO

For more information, see Configure and Manage Stopwords and Stoplists for Full-Text Search.

Update a full-text index
Like regular SQL Server indexes, full-text indexes can be automatically updated as data is modified in the associated tables. This is the default behavior. Alternatively, you can keep your full-text indexes up to date manually, or at specified scheduled intervals. Populating a full-text index can be time-consuming and resource-intensive. Therefore, index updating is usually performed as an asynchronous process that runs in the background and keeps the full-text index up to date after modifications in the base table.

Updating a full-text index immediately after each change in the base table is also resource-intensive. Therefore, if you have a high update/insert/delete rate, you may experience some degradation in query performance. If this occurs, consider scheduling manual change tracking updates to keep up with the numerous changes from time to time, rather than competing with queries for resources.

For more info, see Populate Full-Text Indexes.

Next steps
After you set up SQL Server Full-Text Search, you're ready to run full-text queries. For more info, see Query with Full-Text Search.

Query with Full-Text Search
4/7/2017

Write full-text queries by using the full-text predicates CONTAINS and FREETEXT and the rowset-valued functions CONTAINSTABLE and FREETEXTTABLE with the SELECT statement.
This topic provides examples of each predicate and function and helps you choose the best one to use.
- Use CONTAINS and CONTAINSTABLE to match words and phrases.
- Use FREETEXT and FREETEXTTABLE to match the meaning, but not the exact wording.

Simple examples of each predicate and function

Example - CONTAINS
The following example finds all products with a price of $80.99 that contain the word "Mountain".

USE AdventureWorks2012
GO

SELECT Name, ListPrice
FROM Production.Product
WHERE ListPrice = 80.99
   AND CONTAINS(Name, 'Mountain')
GO

Example - FREETEXT
The following example searches for all documents that contain words related to vital, safety, components.

USE AdventureWorks2012
GO

SELECT Title
FROM Production.Document
WHERE FREETEXT (Document, 'vital safety components')
GO

Example - CONTAINSTABLE
The following example returns the description ID and description of all products for which the Description column contains the word "aluminum" near either the word "light" or the word "lightweight." Only rows with a rank value greater than 2 are returned.

USE AdventureWorks2012
GO

SELECT FT_TBL.ProductDescriptionID,
   FT_TBL.Description,
   KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL
   INNER JOIN CONTAINSTABLE (Production.ProductDescription,
      Description,
      '(light NEAR aluminum) OR (lightweight NEAR aluminum)'
   ) AS KEY_TBL
   ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
ORDER BY KEY_TBL.RANK DESC;
GO

Example - FREETEXTTABLE
The following example extends a FREETEXTTABLE query to return the highest ranked rows first and to add the ranking of each row to the select list. To specify the query, you must know that ProductDescriptionID is the unique key column for the ProductDescription table.
    USE AdventureWorks2012
    GO

    SELECT KEY_TBL.RANK, FT_TBL.Description
    FROM Production.ProductDescription AS FT_TBL
       INNER JOIN FREETEXTTABLE(Production.ProductDescription,
          Description,
          'perfect all-around bike') AS KEY_TBL
       ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
    ORDER BY KEY_TBL.RANK DESC
    GO

Here is an extension of the same query that returns only rows with a rank value of 10 or greater:

    USE AdventureWorks2012
    GO

    SELECT KEY_TBL.RANK, FT_TBL.Description
    FROM Production.ProductDescription AS FT_TBL
       INNER JOIN FREETEXTTABLE(Production.ProductDescription,
          Description,
          'perfect all-around bike') AS KEY_TBL
       ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
    WHERE KEY_TBL.RANK >= 10
    ORDER BY KEY_TBL.RANK DESC
    GO

Pick the best predicate or function

CONTAINS/CONTAINSTABLE and FREETEXT/FREETEXTTABLE are useful for different kinds of matching. The following table helps you choose the best predicate or function for your query. For examples, see Simple examples of each predicate and function and Examples of specific types of searches. Also see What you can search for.

Type of query
  CONTAINS/CONTAINSTABLE: Match single words and phrases with precise or fuzzy (less precise) matching.
  FREETEXT/FREETEXTTABLE: Match the meaning, but not the exact wording, of specified words, phrases, or sentences (the freetext string). Matches are generated if any term or form of any term is found in the full-text index of a specified column.

More query options
  CONTAINS/CONTAINSTABLE: You can specify the proximity of words within a certain distance of one another. You can return weighted matches. You can use logical operations to combine search conditions. For more info, see Use Boolean operators (AND, OR, and NOT) later in this topic.
  FREETEXT/FREETEXTTABLE: N/a

Compare predicates and functions

The predicates CONTAINS/FREETEXT and the rowset-valued functions CONTAINSTABLE/FREETEXTTABLE have different syntax and options. The following table helps you choose the best predicate or function for your query.
For examples, see Simple examples of each predicate and function and Examples of specific types of searches. Also see What you can search for.

Usage
  Predicates (CONTAINS/FREETEXT): Use the full-text predicates CONTAINS and FREETEXT in the WHERE or HAVING clause of a SELECT statement.
  Functions (CONTAINSTABLE/FREETEXTTABLE): Use the full-text functions CONTAINSTABLE and FREETEXTTABLE like a regular table name in the FROM clause of a SELECT statement.

More query options
  Predicates: You can combine them with any of the other Transact-SQL predicates, such as LIKE and BETWEEN. You can specify either a single column, a list of columns, or all columns in the table to be searched. Optionally, you can specify the language whose resources will be used by the full-text query for word breaking and stemming, thesaurus lookups, and noise-word removal.
  Functions: You have to specify the base table to search when you use either of these functions. As with the predicates, you can specify a single column, a list of columns, or all columns in the table to be searched, and optionally, the language whose resources will be used by the full-text query. Typically you have to join the results of CONTAINSTABLE or FREETEXTTABLE with the base table. To do this, you have to know the unique key column name. This column, which occurs in every full-text enabled table, is used to enforce unique rows for the table (the unique key column). For more info about the key column, see Create and Manage Full-Text Indexes.

Results
  Predicates: The CONTAINS and FREETEXT predicates return a TRUE or FALSE value that indicates whether a given row matches the full-text query. Matching rows are returned in the result set.
  Functions: These functions return a table of zero, one, or more rows that match the full-text query. The returned table contains only rows from the base table that match the selection criteria specified in the full-text search condition of the function. Queries that use one of these functions also return a relevance ranking value (RANK) and full-text key (KEY) for each row returned, as follows:
  KEY column. The KEY column returns unique values of the returned rows. The KEY column can be used to specify selection criteria.
  RANK column. The RANK column returns a rank value for each row that indicates how well the row matched the selection criteria. The higher the rank value of the text or document in a row, the more relevant the row is for the given full-text query. Note that different rows can be ranked identically.
  You can limit the number of matches to be returned by specifying the optional top_n_by_rank parameter. For more information, see Limit Search Results with RANK.

Additional options
  Predicates: You can use a four-part name in the CONTAINS or FREETEXT predicate to query full-text indexed columns of the target tables on a linked server. To prepare a remote server to receive full-text queries, create a full-text index on the target tables and columns on the remote server and then add the remote server as a linked server.
  Functions: N/a

More info
  Predicates: For more info about the syntax and arguments of these predicates, see CONTAINS and FREETEXT.
  Functions: For more info about the syntax and arguments of these functions, see CONTAINSTABLE and FREETEXTTABLE.

What you can search for

The following table describes the types of words and phrases that you can search for, and which predicates and functions support each one.

One or more specific words or phrases (simple term)
  Description: For example, "croissant" is a word, and "café au lait" is a phrase. Words and phrases such as these are called simple terms. In full-text search, a word (or token) is a string whose boundaries are identified by appropriate word breakers, following the linguistic rules of the specified language. A valid phrase consists of multiple words, with or without any punctuation marks between them. For more information, see Search for a specific word or phrase (Simple Term), later in this topic.
  Supported by: CONTAINS and CONTAINSTABLE look for an exact match for the phrase. FREETEXT and FREETEXTTABLE break up the phrase into separate words.

A word or a phrase where the words begin with specified text (prefix term)
  Description: For a single prefix term, any word starting with the specified term will be part of the result set. For example, the term "auto*" matches "automatic", "automobile", and so forth. For a phrase, each word within the phrase is considered to be a prefix term. For example, the term "auto tran*" matches "automatic transmission" and "automobile transducer", but it does not match "automatic motor transmission". A prefix term refers to a string that is affixed to the front of a word to produce a derivative word or an inflected form. For more information, see Search for a word with a prefix (Prefix Term), later in this topic.
  Supported by: CONTAINS and CONTAINSTABLE

Inflectional forms of a specific word (generation term - inflectional)
  Description: For example, search for the inflectional form of the word "drive". If various rows in the table include the words "drive", "drives", "drove", "driving", and "driven", all would be in the result set because each of these can be inflectionally generated from the word "drive". The inflectional forms are the different tenses and conjugations of a verb or the singular and plural forms of a noun. For more information, see Search for inflectional forms of a specific word (Generation Term), later in this topic.
  Supported by: FREETEXT and FREETEXTTABLE look for inflectional terms of all specified words by default. CONTAINS and CONTAINSTABLE support an optional INFLECTIONAL argument.

Synonymous forms of a specific word (generation term - thesaurus)
  Description: For example, if an entry, "{car, automobile, truck, van}", is added to a thesaurus, you can search for the thesaurus form of the word "car". All rows in the queried table that include the words "automobile", "truck", "van", or "car" appear in the result set because each of these words belongs to the synonym expansion set containing the word "car". A thesaurus defines user-specified synonyms for terms. For information about the structure of thesaurus files, see Configure and Manage Thesaurus Files for Full-Text Search.
  Supported by: FREETEXT and FREETEXTTABLE use the thesaurus by default. CONTAINS and CONTAINSTABLE support an optional THESAURUS argument.

A word or phrase close to another word or phrase (proximity term)
  Description: For example, you want to find the rows in which the word "ice" is near the word "hockey" or in which the phrase "ice skating" is near the phrase "ice hockey". A proximity term indicates words or phrases that are near to each other. You can also specify the maximum number of non-search terms that separate the first and last search terms. In addition, you can search for words or phrases in any order, or in the order in which you specify them. For more information, see Search for Words Close to Another Word with NEAR.
  Supported by: CONTAINS and CONTAINSTABLE

Words or phrases using weighted values (weighted term)
  Description: For example, in a query searching for multiple terms, you can assign each search word a weight value indicating its importance relative to the other words in the search condition. The results for this type of query return the most relevant rows first, according to the relative weight you have assigned to search words. The result sets contain documents or rows containing any of the specified terms (or content between them); however, some results will be considered more relevant than others because of the variation in the weighted values associated with different searched terms. A weighting value indicates the degree of importance for each word and phrase within a set of words and phrases. A weight value of 0.0 is the lowest, and a weight value of 1.0 is the highest. For more information, see Search for words or phrases using weighted values (Weighted Term), later in this topic.
  Supported by: CONTAINSTABLE

Examples of specific types of searches

Search for a specific word or phrase (Simple Term)

You can use CONTAINS, CONTAINSTABLE, FREETEXT, or FREETEXTTABLE to search a table for a specific phrase. For example, if you want to search the ProductReview table in the AdventureWorks2012 database to find all comments about a product with the phrase "learning curve", you could use the CONTAINS predicate as follows:

    USE AdventureWorks2012
    GO

    SELECT Comments
    FROM Production.ProductReview
    WHERE CONTAINS(Comments, '"learning curve"')
    GO

The search condition, in this case "learning curve", can be quite complex and can be composed of one or more terms.

Search for a word with a prefix (Prefix Term)

You can use CONTAINS or CONTAINSTABLE to search for words or phrases with a specified prefix. All entries in the column that contain text beginning with the specified prefix are returned. For example, to search for all rows that contain the prefix top-, as in topple, topping, and top, use a query like this:

    USE AdventureWorks2012
    GO

    SELECT Description, ProductDescriptionID
    FROM Production.ProductDescription
    WHERE CONTAINS (Description, '"top*"')
    GO

All text that matches the text specified before the asterisk (*) is returned. If the text and asterisk are not delimited by double quotation marks, as in CONTAINS (DESCRIPTION, 'top*'), full-text search does not consider the asterisk to be a wildcard.

When the prefix term is a phrase, each token making up the phrase is considered a separate prefix term. All rows that have words beginning with the prefix terms will be returned. For example, the prefix term "light bread*" will find rows with text of "light breaded", "lightly breaded", or "light bread", but it will not return "lightly toasted bread".
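The prefix-phrase rule above can be modeled in a few lines of Python. This is a simplified sketch, not SQL Server's implementation: it assumes a plain lowercase `\w+` regular-expression tokenizer instead of a language-specific word breaker, ignores stopwords, and the names `tokenize` and `prefix_phrase_match` are hypothetical. It checks whether some run of consecutive words in the text starts with each prefix of the term, in order.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a language-specific word breaker."""
    return re.findall(r"\w+", text.lower())


def prefix_phrase_match(prefix_term, text):
    """Approximate CONTAINS(col, '"<prefix_term>"') for a term such as "light bread*".

    Every token of the term is treated as a prefix, and the matching words must
    appear consecutively and in the same order in the text.
    """
    prefixes = tokenize(prefix_term)  # the trailing "*" is dropped by the tokenizer
    words = tokenize(text)
    n = len(prefixes)
    return any(
        all(word.startswith(prefix) for word, prefix in zip(words[i:i + n], prefixes))
        for i in range(len(words) - n + 1)
    )
```

With this model, "light bread*" matches "lightly breaded" but not "lightly toasted bread", and "auto tran*" matches "automobile transducer" but not "automatic motor transmission", as described above.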
Search for inflectional forms of a specific word (Generation Term)

You can use CONTAINS, CONTAINSTABLE, FREETEXT, or FREETEXTTABLE to search for all the different tenses and conjugations of a verb or both the singular and plural forms of a noun (an inflectional search), or for synonymous forms of a specific word (a thesaurus search).

The following example searches for any form of "foot" ("foot", "feet", and so on) in the Comments column of the ProductReview table in the AdventureWorks2012 database.

    USE AdventureWorks2012
    GO

    SELECT Comments, ReviewerName
    FROM Production.ProductReview
    WHERE CONTAINS (Comments, 'FORMSOF(INFLECTIONAL, "foot")')
    GO

Full-text search uses stemmers, which allow you to search for the different tenses and conjugations of a verb, or both the singular and plural forms of a noun. For more information about stemmers, see Configure and Manage Word Breakers and Stemmers for Search.

Search for words or phrases using weighted values (Weighted Term)

You can use CONTAINSTABLE to search for words or phrases and specify a weighting value. Weight, measured as a number from 0.0 through 1.0, indicates the importance of each word and phrase within a set of words and phrases. A weight of 0.0 is the lowest, and a weight of 1.0 is the highest.

The following example shows a query that searches for all customer addresses, using weights, in which any text beginning with the string "Bay" has either "Street" or "View". The results give a higher rank to those rows that contain more of the words specified.

    USE AdventureWorks2012
    GO

    SELECT AddressLine1, KEY_TBL.RANK
    FROM Person.Address AS Address
       INNER JOIN CONTAINSTABLE(Person.Address, AddressLine1,
          'ISABOUT ("Bay*", Street WEIGHT(0.9), View WEIGHT(0.1))'
       ) AS KEY_TBL
       ON Address.AddressID = KEY_TBL.[KEY]
    ORDER BY KEY_TBL.RANK DESC
    GO

A weighted term can be used in conjunction with any simple term, prefix term, generation term, or proximity term.
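To make the idea of a generation term concrete, the following Python sketch treats an inflectional or thesaurus search as expanding the query word into a set of alternative forms and then matching any one of them. The lookup tables `INFLECTION_SETS` and `THESAURUS_SETS` and the function names are hypothetical stand-ins: SQL Server actually uses language-specific stemmers and thesaurus files, which this sketch does not reproduce.

```python
import re

# Hypothetical stand-ins for a stemmer's output and a thesaurus expansion set.
INFLECTION_SETS = [
    {"foot", "feet"},
    {"drive", "drives", "drove", "driving", "driven"},
]
THESAURUS_SETS = [
    {"car", "automobile", "truck", "van"},
]


def expand(term, kind):
    """Return every form that a FORMSOF(kind, term) search would accept."""
    term = term.lower()
    sets = INFLECTION_SETS if kind == "INFLECTIONAL" else THESAURUS_SETS
    for forms in sets:
        if term in forms:
            return forms
    return {term}  # no entry: only the word itself matches


def formsof_match(kind, term, text):
    """True if any word of the text is one of the generated forms."""
    words = set(re.findall(r"\w+", text.lower()))
    return not words.isdisjoint(expand(term, kind))
```

For example, `formsof_match("INFLECTIONAL", "foot", "Comfortable for wide feet")` and `formsof_match("THESAURUS", "car", "We rented a van")` both return True, because each generated form is matched as a separate word.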
Use Boolean operators (AND, OR, and NOT)

The CONTAINS predicate and CONTAINSTABLE function use the same search conditions. Both support combining several search terms by using the Boolean operators AND, OR, and NOT to perform logical operations. You can use AND, for example, to find rows that contain both "latte" and "New York-style bagel". You can use AND NOT, for example, to find the rows that contain "bagel" but do not contain "cream cheese".

In contrast, FREETEXT and FREETEXTTABLE treat the Boolean terms as words to be searched.

For information about combining CONTAINS with other predicates that use the logical operators AND, OR, and NOT, see Search Condition (Transact-SQL).

Example

The following example uses the CONTAINS predicate to search for descriptions in which the description ID is not equal to 5 and the description contains both the word "aluminum" and the word "spindle". The search condition uses the AND Boolean operator. This example uses the ProductDescription table of the AdventureWorks2012 database.

    USE AdventureWorks2012
    GO

    SELECT Description
    FROM Production.ProductDescription
    WHERE ProductDescriptionID <> 5
       AND CONTAINS(Description, 'aluminum AND spindle')
    GO

More query options

When you write full-text queries, you can also specify the following options.

Language, with the LANGUAGE option. Many query terms depend heavily on word-breaker behavior. To ensure that you are using the correct word breaker (and stemmer) and thesaurus file, we recommend that you specify the LANGUAGE option. For more information, see Choose a Language When Creating a Full-Text Index.

Case sensitivity. Full-text search queries are case-insensitive. However, in Japanese, there are multiple phonetic orthographies in which the concept of orthographic normalization is akin to case insensitivity (for example, kana insensitivity). This type of orthographic normalization is not supported.

Stopwords.
When processing a full-text query, the Full-Text Engine discards stopwords (also called noise words) from the search criteria. Stopwords are words such as "a", "and", "is", or "the" that can occur frequently but that typically do not help when searching for particular text. Stopwords are listed in a stoplist. Each full-text index is associated with a specific stoplist, which determines what stopwords are omitted from the query or the index at indexing time. For more info, see Configure and Manage Stopwords and Stoplists for Full-Text Search.

Thesaurus. FREETEXT and FREETEXTTABLE queries use the thesaurus by default. CONTAINS and CONTAINSTABLE support an optional THESAURUS argument. For more info, see Configure and Manage Thesaurus Files for Full-Text Search.

Check the tokenization results

After you apply a given word breaker, thesaurus, and stoplist combination in a query, you can view the tokenization results by using the sys.dm_fts_parser dynamic management function. For more information, see sys.dm_fts_parser (Transact-SQL).

See Also
CONTAINS (Transact-SQL)
CONTAINSTABLE (Transact-SQL)
FREETEXT (Transact-SQL)
FREETEXTTABLE (Transact-SQL)
Create Full-Text Search Queries (Visual Database Tools)
Improve the Performance of Full-Text Queries

Search for Words Close to Another Word with NEAR
3/24/2017

You can use the proximity term NEAR in a CONTAINS predicate or CONTAINSTABLE function to search for words or phrases near one another.

Overview of NEAR

NEAR has the following features:
* You can specify the maximum number of non-search terms that separate the first and last search terms.
* You can search for words or phrases in any order, or you can search for words and phrases in a specific order.

You can specify the maximum number of non-search terms, or maximum distance, that separates the first and last search terms in order to constitute a match.
If you specify the maximum number of terms, you can also specify that matches must contain the search terms in the specified order.

To qualify as a match, a string of text must:
* Start with one of the specified search terms and end with one of the other specified search terms.
* Contain all of the specified search terms.
* Contain no more non-search terms (including stopwords) between the first and last search terms than the maximum distance, if a maximum distance is specified.

Syntax of NEAR

The basic syntax of NEAR is:

    NEAR (
    {
       search_term [ ,…n ]
       |
       (search_term [ ,…n ] ) [, <maximum_distance> [, <match_order> ] ]
    }
    )

For more info about the syntax, see CONTAINS (Transact-SQL).

Examples

Example 1

For example, you could search for 'John' within two terms of 'Smith', as follows:

    ... CONTAINS(column_name, 'NEAR((John, Smith), 2)')

Some examples of strings that match are "John Jacob Smith" and "Smith, John". The string "John Jones knows Fred Smith" contains three intervening non-search terms, so it is not a match.

To require that the terms be found in the specified order, you would change the example proximity term to NEAR((John, Smith), 2, TRUE). This searches for "John" within two terms of "Smith", but only when "John" precedes "Smith". In a language that reads from left to right, such as English, an example of a string that matches is "John Jacob Smith".

Note that for a language that reads from right to left, such as Arabic or Hebrew, the Full-Text Engine applies the specified terms in reverse order. Also, Object Explorer in SQL Server Management Studio automatically reverses the display order of words specified in right-to-left languages.

Example 2

The following example searches the Production.Document table of the AdventureWorks sample database for all document summaries that contain the word "reflector" in the same document as the word "bracket".
    SELECT DocumentNode, Title, DocumentSummary
    FROM Production.Document AS DocTable
       INNER JOIN CONTAINSTABLE(Production.Document, Document,
          'NEAR(bracket, reflector)'
       ) AS KEY_TBL
       ON DocTable.DocumentNode = KEY_TBL.[KEY]
    WHERE KEY_TBL.RANK > 50
    ORDER BY KEY_TBL.RANK DESC;
    GO

How maximum distance is measured

A specific maximum distance, such as 10 or 25, determines how many non-search terms, including stopwords, can occur between the first and last search terms in a given string. For example, NEAR((dogs, cats, "hunting mice"), 3) would return the following row, in which the total number of non-search terms is three ("enjoy", "but", and "avoid"):

    "Cats enjoy hunting mice, but avoid dogs."

The same proximity term would not return the following row, because the maximum distance is exceeded by the four non-search terms ("enjoy", "but", "usually", and "avoid"):

    "Cats enjoy hunting mice, but usually avoid dogs."

Combine NEAR with other terms

You can combine NEAR with some other terms. You can use AND (&), OR (|), or AND NOT (&!) to combine a custom proximity term with another custom proximity term, a simple term, or a prefix term. For example:

    CONTAINS(column_name, 'NEAR((term1, term2), 5) AND term3')
    CONTAINS(column_name, 'NEAR((term1, term2), 5) OR term3')
    CONTAINS(column_name, 'NEAR((term1, term2), 5) AND NOT term3')
    CONTAINS(column_name, 'NEAR((term1, term2), 5) AND NEAR((term3, term4), 2)')
    CONTAINS(column_name, 'NEAR((term1, term2), 5) OR NEAR((term3, term4), 2, TRUE)')

For example:

    CONTAINS(column_name, 'NEAR((term1, term2), 5, TRUE) AND term3')

You can't combine NEAR with a generation term (FORMSOF …) or a weighted term (ISABOUT …).

More info about proximity searches

Overlapping occurrences of search terms

All proximity searches look only for non-overlapping occurrences. Overlapping occurrences of search terms never qualify as matches.
For example, consider the following proximity term, which searches for "A" and "AA" in this order with a maximum distance of two terms:

    CONTAINS(column_name, 'NEAR((A, AA), 2, TRUE)')

The possible matches are "AAA", "A.AA", and "A..AA". Rows containing just "AA" would not match.

NOTE: You can specify terms that overlap, for example, NEAR("mountain bike", "bike trails") or NEAR((comfort*, comfortable), 5). Specifying overlapping terms increases the complexity of the query by increasing the possible match permutations. If you specify a large number of such overlapping terms, the query can run out of resources and fail. If this occurs, simplify the query and try again.

NEAR (regardless of whether a maximum distance is specified) indicates the logical distance between terms, rather than the absolute distance between them. For example, terms within different phrases or sentences within a paragraph are treated as farther apart than terms in the same phrase or sentence, regardless of their actual proximity, on the assumption that they are less related. Likewise, terms in different paragraphs are treated as being even farther apart. If a match spans the end of a sentence, paragraph, or chapter, the gap used for ranking a document is increased by 8, 128, or 1024, respectively.

Impact of proximity terms on ranking by the CONTAINSTABLE function

When NEAR is used in the CONTAINSTABLE function, the ranking of each document is affected by the number of hits in the document relative to its length, as well as by the distance between the first and last search terms in each of the hits. For a generic proximity term, if the matched search terms are more than 50 logical terms apart, the rank returned on a document is 0. For a custom proximity term that does not specify an integer as the maximum distance, a document that contains only hits whose gap is more than 100 logical terms will receive a ranking of 0.
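The matching rules in this topic (every search term must appear, occurrences must not overlap, the non-search terms between the first and last search terms, stopwords included, must not exceed the maximum distance, and the order can optionally be enforced) can be modeled with a brute-force Python sketch. It is an illustration under simplifying assumptions, not the Full-Text Engine's algorithm: it tokenizes with a lowercase `\w+` regular expression rather than a word breaker, ignores the larger logical gaps at sentence and paragraph boundaries, and does not compute rank. `near_match` and its parameters are hypothetical names.

```python
import itertools
import re


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a language-specific word breaker."""
    return re.findall(r"\w+", text.lower())


def occurrences(tokens, term):
    """All (start, end) token spans where a word or phrase term occurs."""
    words = tokenize(term)
    n = len(words)
    return [(i, i + n) for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def near_match(text, terms, max_distance=None, ordered=False):
    """Approximate NEAR((term1, term2, ...), max_distance, ordered) on one string."""
    tokens = tokenize(text)
    spans_per_term = [occurrences(tokens, term) for term in terms]
    if not all(spans_per_term):
        return False  # every search term must be present
    # Brute force: try every combination of one occurrence per term.
    for combo in itertools.product(*spans_per_term):
        # Ordered matches keep the given term order; otherwise sort by position.
        chosen = list(combo) if ordered else sorted(combo)
        # Overlapping occurrences never qualify; in ordered mode this check
        # also rejects any combination that is out of order.
        if any(nxt[0] < prev[1] for prev, nxt in zip(chosen, chosen[1:])):
            continue
        if max_distance is None:
            return True
        covered = chosen[-1][1] - chosen[0][0]
        search_tokens = sum(end - start for start, end in chosen)
        # Everything covered that is not a search token counts, stopwords included.
        if covered - search_tokens <= max_distance:
            return True
    return False
```

Run on the examples in this topic, the sketch reproduces the stated outcomes: "John Jacob Smith" and "Smith, John" match NEAR((John, Smith), 2), "John Jones knows Fred Smith" does not, only "John Jacob Smith" survives the ordered variant, and only the first "Cats enjoy hunting mice..." sentence matches NEAR((dogs, cats, "hunting mice"), 3). Enumerating every combination of occurrences grows quickly with the number of terms, which is acceptable for a sketch but not for an index.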
For more information about ranking of custom proximity searches, see Limit Search Results with RANK.

The transform noise words server option

The value of transform noise words affects how SQL Server treats stopwords if they are specified in proximity searches. For more information, see transform noise words Server Configuration Option.

See Also
CONTAINS (Transact-SQL)
CONTAINSTABLE (Transact-SQL)
Query with Full-Text Search

Limit Search Results with RANK
3/24/2017

The CONTAINSTABLE and FREETEXTTABLE functions return a column named RANK that contains ordinal values from 0 through 1000 (rank values). These values are used to rank the rows returned according to how well they match the selection criteria. The rank values indicate only a relative order of relevance of the rows in the result set, with a lower value indicating lower relevance. The actual values are unimportant and typically differ each time the query is run.

NOTE: The CONTAINS and FREETEXT predicates do not return any rank values.

The number of items matching a search condition is often very large. To prevent CONTAINSTABLE or FREETEXTTABLE queries from returning too many matches, use the optional top_n_by_rank parameter, which returns only a subset of rows. top_n_by_rank is an integer value, n, that specifies that only the n highest ranked matches are to be returned, in descending order. If top_n_by_rank is combined with other parameters, the query could return fewer rows than the number of rows that actually match all the predicates.

SQL Server orders the matches by rank and returns only up to the specified number of rows. This choice can result in a dramatic increase in performance. For example, a query that would normally return 100,000 rows from a table of one million rows is processed more quickly if only the top 100 rows are requested.
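The top_n_by_rank behavior can be pictured with a small Python sketch in which CONTAINSTABLE's output is a list of hypothetical (KEY, RANK) pairs. The sketch assumes, as described above, that the n highest-ranked rows are selected before any outer WHERE clause sees them, which is why combining top_n_by_rank with other predicates can return fewer rows than actually match all of them. The data and the function name are invented for illustration; `heapq.nlargest` is used because it avoids sorting the entire match list.

```python
import heapq

# Hypothetical CONTAINSTABLE output: (KEY, RANK) pairs for every matching row.
matches = [(1, 172), (2, 40), (3, 172), (4, 90), (5, 15)]


def top_n_by_rank(rows, n):
    """Keep only the n highest-ranked rows, in descending rank order."""
    return heapq.nlargest(n, rows, key=lambda row: row[1])


top = top_n_by_rank(matches, 3)
# An outer predicate (here: even KEY values) sees only the top n rows...
filtered_after_top_n = [row for row in top if row[0] % 2 == 0]
# ...whereas without the limit it would keep every even-keyed match.
filtered_without_limit = [row for row in matches if row[0] % 2 == 0]
```

Here the outer filter keeps one row after the limit, although two rows match both conditions.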
Examples of Using RANK to Limit Search Results

Example A: Searching for only the top three matches

The following example uses CONTAINSTABLE to return only the top three matches.

    USE AdventureWorks2012
    GO

    SELECT K.RANK, AddressLine1, City
    FROM Person.Address AS A
       INNER JOIN
       CONTAINSTABLE(Person.Address, AddressLine1,
          'ISABOUT ("des*", Rue WEIGHT(0.5), Bouchers WEIGHT(0.9))',
          3) AS K
       ON A.AddressID = K.[KEY]
    GO

Here is the result set.

    RANK   Address                    City
    ----   ------------------------   -------
    172    9005, rue des Bouchers     Paris
    172    5, rue des Bouchers        Orleans
    172    5, rue des Bouchers        Metz

    (3 row(s) affected)

Example B: Searching for the top five matches

The following example uses CONTAINSTABLE to return the description of the top five products where the Description column contains the word "aluminum" near either the word "light" or the word "lightweight".

    USE AdventureWorks2012
    GO

    SELECT FT_TBL.ProductDescriptionID,
       FT_TBL.Description,
       KEY_TBL.RANK
    FROM Production.ProductDescription AS FT_TBL
       INNER JOIN CONTAINSTABLE (Production.ProductDescription,
          Description,
          '(light NEAR aluminum) OR (lightweight NEAR aluminum)',
          5
       ) AS KEY_TBL
       ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
    GO

How Search Query Results Are Ranked

Full-text search in SQL Server can generate an optional score (or rank value) that indicates the relevance of the data returned by a full-text query. This rank value is calculated on every row and can be used as an ordering criterion to sort the result set of a given query by relevance. The rank values indicate only a relative order of relevance of the rows in the result set. The actual values are unimportant and typically differ each time the query is run. The rank value does not hold any significance across queries.

Statistics for Ranking

When an index is built, statistics are collected for use in ranking. The process of building a full-text catalog does not directly result in a single index structure.
Instead, the Full-Text Engine for SQL Server creates intermediate indexes as data is indexed. The Full-Text Engine then merges these indexes into a larger index as needed. This process can be repeated many times. The Full-Text Engine then conducts a "master merge" that combines all of the intermediate indexes into one large master index.

Statistics are collected at each intermediate index level. The statistics are merged when the indexes are merged. Some statistical values can only be generated during the master merging process.

While ranking a query result set, SQL Server uses statistics from the largest intermediate index. Which index is the largest depends on whether the intermediate indexes have been merged. As a result, ranking statistics can vary in accuracy if the intermediate indexes have not been merged. This explains why the same query can return different rank results over time as full-text indexed data is added, modified, and deleted, and as the smaller indexes are merged.

To minimize the size of the index and computational complexity, statistics are often rounded.

The list below includes some commonly used terms and statistical values that are important in calculating rank.

Property: A full-text indexed column of the row.

Document: The entity that is returned in queries. In SQL Server this corresponds to a row. A document can have multiple properties, just as a row can have multiple full-text indexed columns.

Index: A single inverted index of one or more documents. This may be entirely in memory or on disk. Many query statistics are relative to the individual index where the match occurred.

Full-Text Catalog: A collection of intermediate indexes treated as one entity for queries. Catalogs are the unit of organization visible to the SQL Server administrator.

Word, token, or item: The unit of matching in the full-text engine. Streams of text from documents are tokenized into words, or tokens, by language-specific word breakers.
Occurrence: The word offset in a document property as determined by the word breaker. The first word is at occurrence 1, the next at 2, and so on. To avoid false positives in phrase and proximity queries, end-of-sentence and end-of-paragraph boundaries introduce larger occurrence gaps.

TermFrequency: The number of times the key value occurs in a row.

IndexedRowCount: Total number of rows indexed. This is computed based on counts maintained in the intermediate indexes. This number can vary in accuracy.

KeyRowCount: Total number of rows in the full-text catalog that contain a given key.

MaxOccurrence: The largest occurrence stored in a full-text catalog for a given property in a row.

MaxQueryRank: The maximum rank, 1000, returned by the Full-Text Engine.

Rank Computation Issues

The process of computing rank depends on a number of factors.

Different language word breakers tokenize text differently. For example, the string "dog-house" could be broken into "dog" "house" by one word breaker and into "dog-house" by another. This means that matching and ranking will vary based on the language specified, because not only are the words different, but so is the document length. The document length difference can affect ranking for all queries.

Statistics such as IndexedRowCount can vary widely. For example, if a catalog has 2 billion rows in the master index and one new document is then indexed into an in-memory intermediate index, ranks for that document based on the number of documents in the in-memory index could be skewed compared with ranks for documents from the master index. For this reason, we recommend that after any population that results in a large number of rows being indexed or re-indexed, you merge the indexes into a master index by using the ALTER FULLTEXT CATALOG ... REORGANIZE Transact-SQL statement. The Full-Text Engine will also automatically merge the indexes based on parameters such as the number and size of intermediate indexes.
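The skew can be made concrete with the StatisticalWeight term of the CONTAINSTABLE ranking formula given later in this topic, log2((2 + IndexedRowCount) / KeyRowCount). The Python sketch below uses hypothetical counts and assumes, for illustration only, that a document's statistics come entirely from the index that holds it; the function name is also hypothetical.

```python
import math


def statistical_weight(indexed_row_count, key_row_count):
    """StatisticalWeight = log2((2 + IndexedRowCount) / KeyRowCount)."""
    return math.log2((2 + indexed_row_count) / key_row_count)


# Hypothetical: a key found in 1,000 of the 2 billion rows of the master index.
master_weight = statistical_weight(2_000_000_000, 1_000)

# Hypothetical: the same key in a one-document in-memory intermediate index.
fresh_weight = statistical_weight(1, 1)

print(f"master index: {master_weight:.2f}")  # about 20.93
print(f"fresh index:  {fresh_weight:.2f}")   # about 1.58
```

Under these assumptions the same key carries a weight roughly thirteen times larger in the master index, so ranks computed against the small intermediate index are not comparable until the indexes are merged.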
MaxOccurrence values are normalized into 1 of 32 ranges. This means, for example, that a document 50 words long is treated the same as a document 100 words long. Below is the table used for normalization. Because both document lengths fall in the range between the adjacent table values 32 and 128, they are effectively treated as having the same length, 128 (32 < docLength <= 128).

    { 16, 32, 128, 256, 512, 725, 1024, 1450, 2048, 2896, 4096, 5792, 8192, 11585,
      16384, 23170, 28000, 32768, 39554, 46340, 55938, 65536, 92681, 131072, 185363,
      262144, 370727, 524288, 741455, 1048576, 2097152, 4194304 };

Ranking of CONTAINSTABLE

CONTAINSTABLE ranking uses the following algorithm:

$$\text{StatisticalWeight} = \log_2\left(\frac{2 + \text{IndexedRowCount}}{\text{KeyRowCount}}\right)$$

$$\text{Rank} = \min\left(\text{MaxQueryRank},\ \frac{\text{HitCount} \times 16 \times \text{StatisticalWeight}}{\text{MaxOccurrence}}\right)$$

Phrase matches are ranked just like individual keys, except that KeyRowCount (the number of rows containing the phrase) is estimated and can be inaccurate and higher than the actual number.

Ranking of NEAR

CONTAINSTABLE supports querying for two or more search terms in proximity to each other by using the NEAR option. The rank value of each returned row is based on several parameters. One major ranking factor is the total number of matches (or hits) relative to the length of the document. Thus, for example, if a 100-word document and a 900-word document contain identical matches, the 100-word document is ranked higher.

The total length of each hit in a row also contributes to the ranking of that row, based on the distance between the first and last search terms of that hit. The smaller the distance, the more the hit contributes to the rank value of the row. If a full-text query does not specify an integer as the maximum distance, a document that contains only hits whose distances are greater than 100 logical terms apart will have a ranking of 0.

Ranking of ISABOUT

CONTAINSTABLE supports querying for weighted terms by using the ISABOUT option.
ISABOUT is a vector-space query in traditional information retrieval terminology. The default ranking algorithm is Jaccard, a widely known formula. The ranking is computed for each term in the query and then combined as follows:

ContainsRank = the same formula used for CONTAINSTABLE ranking of a single term (above).
Weight = the weight specified in the query for each term. The default weight is 1.

$$\text{WeightedSum} = \sum_{key=1}^{n} \text{ContainsRank}_{key} \cdot \text{Weight}_{key}$$

$$\text{Rank} = \frac{\text{MaxQueryRank} \cdot \text{WeightedSum}}{\sum_{key=1}^{n} \text{ContainsRank}_{key}^{2} + \sum_{key=1}^{n} \text{Weight}_{key}^{2} - \text{WeightedSum}}$$

Ranking of FREETEXTTABLE
FREETEXTTABLE ranking is based on the OKAPI BM25 ranking formula. FREETEXTTABLE queries add words to the query via inflectional generation (inflected forms of the original query words); these words are treated as separate words with no special relationship to the words from which they were generated. Synonyms generated from the Thesaurus feature are treated as separate, equally weighted terms. Each word in the query contributes to the rank.

$$\text{Rank} = \sum_{\text{terms in query}} w \cdot \frac{(k_1 + 1)\,tf}{K + tf} \cdot \frac{(k_3 + 1)\,qtf}{k_3 + qtf}$$

Where:
w is the Robertson-Sparck Jones weight. In simplified form, w is defined as:

$$w = \log_{10}\left(\frac{(r + 0.5)(N - R + r + 0.5)}{(R - r + 0.5)(n - r + 0.5)}\right)$$

N is the number of indexed rows for the property being queried.
n is the number of rows containing the word.
K is defined as:

$$K = k_1\left((1 - b) + b \cdot \frac{dl}{avdl}\right)$$

dl is the property length, in word occurrences.
avdl is the average length of the property being queried, in word occurrences.
k1, b, and k3 are the constants 1.2, 0.75, and 8.0, respectively.
tf is the frequency of the word in the queried property in a specific row.
qtf is the frequency of the term in the query.
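To make the CONTAINSTABLE and ISABOUT formulas concrete, here is a worked calculation with made-up statistics (not measurements from any real index). Assume IndexedRowCount = 1022 and a row whose MaxOccurrence (say 30) normalizes to 32. In that row the term bicycle occurs 4 times and appears in 32 rows overall. The term trail occurs 2 times and appears in 128 rows overall. The query uses ISABOUT weights 0.8 and 0.2.

```latex
\begin{aligned}
\text{StatisticalWeight}_{\text{bicycle}} &= \log_2\!\left(\frac{2 + 1022}{32}\right) = \log_2 32 = 5\\
\text{ContainsRank}_{\text{bicycle}} &= \min\!\left(1000,\ \frac{4 \times 16 \times 5}{32}\right) = 10\\
\text{StatisticalWeight}_{\text{trail}} &= \log_2\!\left(\frac{2 + 1022}{128}\right) = \log_2 8 = 3\\
\text{ContainsRank}_{\text{trail}} &= \min\!\left(1000,\ \frac{2 \times 16 \times 3}{32}\right) = 3\\
\text{WeightedSum} &= 10 \times 0.8 + 3 \times 0.2 = 8.6\\
\text{Rank}_{\text{ISABOUT}} &= \frac{1000 \times 8.6}{(10^2 + 3^2) + (0.8^2 + 0.2^2) - 8.6} = \frac{8600}{101.08} \approx 85.08
\end{aligned}
```

The value the engine actually returns may differ (for example, it may be rounded to an integer rank); the calculation only shows how the terms of the formulas combine.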
See Also
Query with Full-Text Search

Improve the Performance of Full-Text Queries
3/24/2017

The following recommendations can help improve the performance of full-text queries. The performance of full-text queries is also influenced by hardware resources, such as memory, disk speed, CPU speed, and machine architecture.

Defragment the index of the base table by using ALTER INDEX REORGANIZE.

Reorganize the full-text catalog by using ALTER FULLTEXT CATALOG REORGANIZE. Do this before performance testing, because running this statement causes a master merge of the full-text indexes in that catalog.

Restrict your choice of full-text key columns to a small column. Although a 900-byte column is supported, we recommend using a smaller key column in a full-text index; int and bigint provide the best performance. Using an integer full-text key avoids a join with the docid mapping table. Therefore, an integer full-text key improves query performance by an order of magnitude and improves crawl performance. Additional performance benefits might result if the full-text key is also the clustered index key.

Combine multiple CONTAINS predicates into one CONTAINS predicate. In SQL Server, you can specify a list of columns in the CONTAINS query.

If you require only full-text key or rank information, use CONTAINSTABLE or FREETEXTTABLE instead of CONTAINS or FREETEXT, respectively.

To limit results and increase performance, use the top_n_by_rank parameter of the FREETEXTTABLE and CONTAINSTABLE functions. top_n_by_rank returns only the most relevant hits. Use this parameter only if your business scenario does not require recalling all possible hits (that is, it does not require total recall).
NOTE Total recall is typically necessary for legal scenarios but might be less important than performance for business scenarios such as e-business.
Check the full-text query plan to make sure that the appropriate join plan is chosen. Use a join hint or query hint if you have to. If a parameter is used in the full-text query, the first-time value of the parameter determines the query plan. You can use the OPTIMIZE FOR query hint to force the query to compile with the value you want. This helps achieve a deterministic query plan and better performance.

Too many full-text index fragments in the full-text index can substantially degrade query performance. To reduce the number of fragments, reorganize the full-text catalog by using the REORGANIZE option of the ALTER FULLTEXT CATALOG Transact-SQL statement. This statement merges all the fragments into a single larger fragment and removes all obsolete entries from the full-text index.

In full-text search, logical operators specified in CONTAINSTABLE (AND, OR) can be implemented either as SQL joins or inside the full-text execution streaming table-valued functions (STVF). Typically, queries with only one type of logical operator are implemented purely by full-text execution, whereas queries that mix logical operators also involve SQL joins. Implementing a logical operator inside the full-text execution STVF uses special index properties that make it much faster than SQL joins. For this reason, we recommend that, where possible, you frame queries using only a single type of logical operator.

For applications that contain selective relational predicates, queries that combine selective relational predicates with unselective full-text predicates might perform best when they are written so that the query optimizer chooses the plan. This allows the query optimizer to decide whether it can exploit predicate or range pushdown to produce an effective query plan. This approach is simpler and often more efficient than indexing relational data as full-text data.
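The following sketch illustrates three of the recommendations above: a single CONTAINS predicate over a column list, CONTAINSTABLE with top_n_by_rank, and the OPTIMIZE FOR hint. It assumes a hypothetical table dbo.Articles with an integer full-text key ArticleID and full-text indexed columns Title and Body. The procedure name dbo.SearchArticles and the representative value N'bicycle' are also hypothetical.

```sql
-- Hypothetical table: dbo.Articles(ArticleID int key, Title, Body), with Title and Body full-text indexed.

-- One CONTAINS over a column list instead of
-- CONTAINS(Title, N'bicycle') OR CONTAINS(Body, N'bicycle').
SELECT ArticleID, Title
FROM dbo.Articles
WHERE CONTAINS((Title, Body), N'bicycle');
GO

-- Return only the 20 highest-ranked matches (top_n_by_rank = 20).
-- Lower-ranked matches are not recalled, so use this only when total recall is not required.
SELECT a.ArticleID, a.Title, ft.[RANK]
FROM CONTAINSTABLE(dbo.Articles, (Title, Body), N'bicycle', 20) AS ft
INNER JOIN dbo.Articles AS a
    ON a.ArticleID = ft.[KEY]
ORDER BY ft.[RANK] DESC;
GO

-- Compile the plan for a representative value instead of whichever value is passed first.
CREATE PROCEDURE dbo.SearchArticles
    @term nvarchar(4000)
AS
SELECT ArticleID, Title
FROM dbo.Articles
WHERE CONTAINS(Body, @term)
OPTION (OPTIMIZE FOR (@term = N'bicycle'));
GO
```

Choosing the OPTIMIZE FOR value is a judgment call: a value whose selectivity is typical of your workload usually gives a plan that suits most calls.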
Related Resources
SQL Server 2008 Full-Text Search: Internals and Enhancements

See Also
sys.dm_fts_memory_buffers (Transact-SQL)
sys.dm_fts_memory_pools (Transact-SQL)

Search Document Properties with Search Property Lists
3/24/2017

The content of document properties was previously indistinguishable from the content of the document body. This limitation restricted full-text queries to generic searches on whole documents. Now, however, you can configure a full-text index to support property-scoped searching on particular properties, such as Author and Title, for supported document types in a varbinary, varbinary(max) (including FILESTREAM), or image binary data column. This form of searching is known as property searching.

The associated filter (IFilter) determines whether property searching is possible on a specific type of document. For some document types, the associated IFilter extracts some or all of the properties defined for that type of document, as well as the content of the document body. You can configure a full-text index to support property searching only on properties that are extracted by an IFilter during full-text indexing. Among IFilters that extract a number of document properties are the IFilters for Microsoft Office document types (such as .docx, .xlsx, and .pptx). On the other hand, the XML IFilter does not emit properties.

How Full-Text Search Works with Search Properties
Internal Property IDs
When a property is registered for a search property list, the Full-Text Engine arbitrarily assigns it an internal property ID, which uniquely identifies the property within that particular search property list and is specific to that list. Therefore, if a property is added to multiple search property lists, its internal property ID is likely to differ between lists.
The internal property ID is an integer that uniquely identifies the property in that search property list. The following illustration shows a logical view of a search property list that specifies two properties, Title and Keywords. The property-list name for Keywords is "Tags". These properties belong to the same property set, whose GUID is F29F85E0-4FF9-1068-AB91-08002B27B3D9. The property integer identifiers are 2 for Title and 5 for Tags (Keywords). The Full-Text Engine arbitrarily maps each property to an internal property ID that is unique to the search property list. The internal property ID for the Title property is 1, and the internal property ID for the Tags property is 2.

The internal property ID is likely to be different from the property integer identifier of the property. If a given property is registered for multiple search property lists, a different internal property ID might be assigned for each search property list. For example, the internal property ID might be 4 in one search property list, 1 in another, 3 in another, and so on. In contrast, the property integer identifier is intrinsic to the property, and it remains the same wherever the property is used.

Indexing of Registered Properties
After a full-text index is associated with a search property list, the index must be repopulated to index property-specific search terms. During full-text indexing, the contents of all properties are stored in the full-text index along with other content. However, when indexing a search term found in a registered property, the full-text indexer also stores the corresponding internal property ID with the term. In contrast, if a property is not registered, it is stored in the full-text index as if it were part of the document body, and it has a value of zero for the internal property ID.
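To see the two identifiers side by side, you can query the catalog views sys.registered_search_property_lists and sys.registered_search_properties. In this sketch, property_int_id is the property integer identifier and property_id is the internal property ID assigned within each list; verify the column names against the catalog view reference for your SQL Server version. No particular list is assumed, so the query returns every registered property in the current database.

```sql
-- Lists every registered search property in the current database.
-- property_int_id: the property integer identifier (the same wherever the property is used).
-- property_id:     the internal property ID assigned within that search property list.
SELECT pl.name AS property_list,
       sp.property_name,
       sp.property_set_guid,
       sp.property_int_id,
       sp.property_id
FROM sys.registered_search_properties AS sp
INNER JOIN sys.registered_search_property_lists AS pl
    ON pl.property_list_id = sp.property_list_id
ORDER BY pl.name, sp.property_id;
```

If the same property is registered in two lists, expect two rows with the same property_set_guid and property_int_id but possibly different property_id values.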
The following illustration shows a logical view of how search terms appear in a full-text index that is associated with the search property list shown in the preceding illustration. A sample document, Document 1, contains three properties (Title, Author, and Keywords) as well as the document body. For the properties Title and Keywords, which are specified in the search property list, search terms are associated with their corresponding internal property IDs in the full-text index. In contrast, the content of the Author property is indexed as if it were part of the document body. This means that registering a property increases the size of the full-text index somewhat, depending on the amount of content stored in the property.

Search terms in the Title property ("Favorite," "Biking," and "Trails") are associated with the internal property ID assigned to Title for this index, 1. Search terms in the Keywords property ("biking" and "mountain") are associated with the internal property ID assigned to Tags for this index, 2. For search terms in the Author property ("Jane" and "Doe") and search terms in the document body, the internal property ID is 0.

Note that the term "biking" occurs in the Title property, in the Keywords (Tags) property, and in the document body. A property search for "biking" in the Title or Keywords (Tags) property would return this document in the results. A generic full-text query for "biking" would also return this document, just as if the index were not configured for property searching. A property search for "biking" in the Author property would not return this document.

A property-scoped full-text query uses the internal property IDs registered for the current search property list of the full-text index.
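One way to inspect this logical view is the dynamic management function sys.dm_fts_index_keywords_by_property, which returns property-related keywords from a full-text index together with the internal property ID under which each was stored. This is a minimal sketch: the table dbo.Documents is hypothetical and is assumed to have a full-text index associated with a search property list. Check the function's reference page for the exact column set in your version.

```sql
-- Hypothetical table dbo.Documents with a full-text index associated with a search property list.
-- Each row shows a keyword, the document it came from, and the internal property_id it is stored under.
SELECT kp.display_term,
       kp.document_id,
       kp.column_id,
       kp.property_id
FROM sys.dm_fts_index_keywords_by_property(DB_ID(), OBJECT_ID(N'dbo.Documents')) AS kp
WHERE kp.display_term = N'biking'
ORDER BY kp.document_id, kp.property_id;
```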
Impact of Enabling Property Searching
Configuring a full-text index to support searching on one or more properties increases the size of the index somewhat, depending on the number of properties you specify in your search property list and on the content of each property.

In testing typical corpuses of Microsoft Word, Excel, and PowerPoint documents, we configured a full-text index to index typical search properties. Indexing these properties increased the size of the full-text index by approximately 5 percent. We anticipate that this approximate size increase will be typical for most document corpuses. Ultimately, however, the size increase depends on the amount of property data in a given document corpus relative to the amount of overall data.

Creating a Search Property List and Enabling Property Search
Creating a Search Property List
To create a search property list with Transact-SQL
Use the CREATE SEARCH PROPERTY LIST (Transact-SQL) statement and provide at least a name for the list.
To create a search property list in Management Studio
1. In Object Explorer, expand the server.
2. Expand Databases, and then expand the database in which you want to create the search property list.
3. Expand Storage, and then right-click Search Property Lists.
4. Select New Search Property List.
5. Specify the property list name.
6. Optionally, specify someone else as the property list owner.
7. Select one of the following options:
   Create an empty search property list
   Create from an existing search property list
   For more information, see New Search Property List.
8. Click OK.

Adding Properties to a Search Property List
Property searching requires creating a search property list and specifying one or more properties that you want to make searchable. When you add a property to a search property list, the property is registered for that particular list.
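The two creation options described above (an empty list, or a list copied from an existing one) can be sketched in Transact-SQL as follows. Both list names are hypothetical.

```sql
-- Create an empty search property list (hypothetical name).
CREATE SEARCH PROPERTY LIST DocumentPropertyList;
GO

-- Create a new list that starts with the properties registered in an existing list.
CREATE SEARCH PROPERTY LIST ArchivePropertyList
    FROM DocumentPropertyList;
GO
```

Because internal property IDs are specific to each list, the copied list may assign different internal IDs to the same properties.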
To add a property to a search property list, you need the following values:

Property set GUID
Each search property belongs to a single property set that contains a group of related properties. Each property set is identified by a globally unique identifier (GUID).

Property integer identifier
Each search property has an identifier that is unique within the property set. For a given property, the identifier could be either an integer or a string; however, full-text search supports only integer identifiers.

Property name
This is the name that users will specify in full-text queries to search on the property. A property name can contain internal spaces. The maximum length is 256 characters. The property name can be any of the following:
   The Windows canonical name of the property, such as System.Author or System.Contact.HomeAddress.
   A user-friendly name that is easy for your users to remember. Some properties are associated with a well-known user-friendly name, such as "Author" or "Home Address," but you can specify whatever name is most appropriate for your users.
NOTE A given combination of property set GUID and property identifier must be unique in a given search property list. This means that you cannot add the same property more than once with different names or descriptions.

Property description (optional)
When adding a search property to a search property list, you can supply an optional description. For example, you might want to provide information about a property that is not evident from its name, or you might want to describe the property set of the property.

To obtain values for a search property list
See Find Property Set GUIDs and Property Integer IDs for Search Properties.
To add a property to a search property list with Transact-SQL
Use the ALTER SEARCH PROPERTY LIST (Transact-SQL) statement with the values that you obtained by using one of the methods described in the topic Find Property Set GUIDs and Property Integer IDs for Search Properties. The following example demonstrates the use of these values when adding a property to a search property list:

ALTER SEARCH PROPERTY LIST DocumentTablePropertyList
ADD 'Title'
WITH (
    PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9',
    PROPERTY_INT_ID = 2,
    PROPERTY_DESCRIPTION = 'System.Title - Title of the item.'
);

To add a property to a search property list in Management Studio
Use the Search Property List Properties dialog box to add and remove search properties. You can find Search Property Lists in Object Explorer under the Storage node of the associated database.

Associating a Search Property List with a Full-Text Index
For a full-text index to support property searching on the properties that are registered for a search property list, you need to associate the search property list with the index and repopulate the index. Repopulating the full-text index creates property-specific index entries for search terms in each of the registered properties.

As long as the full-text index remains associated with this search property list, full-text queries can use the PROPERTY option of the CONTAINS predicate to search on properties that are registered for that search property list.

If you change the search property list associated with a full-text index, the index must be rebuilt to bring it into a consistent state. The index is truncated immediately and is empty until the full population runs. For more information about when changing the search property list causes the index to be rebuilt, see "Remarks" in ALTER FULLTEXT INDEX (Transact-SQL).
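As a minimal sketch of the association and repopulation described above, the following statements attach a search property list to a full-text index, explicitly start a full population, and then run a property-scoped query. The table dbo.Documents, its key column DocumentID, and its full-text indexed varbinary(max) column DocContent are hypothetical. The list DocumentTablePropertyList is assumed to already contain the Title property, as in the ALTER SEARCH PROPERTY LIST example.

```sql
-- Hypothetical objects: table dbo.Documents (key DocumentID, full-text indexed varbinary(max) column DocContent)
-- and search property list DocumentTablePropertyList, which already registers 'Title'.

-- Associate the list with the full-text index, deferring the population...
ALTER FULLTEXT INDEX ON dbo.Documents
    SET SEARCH PROPERTY LIST DocumentTablePropertyList
    WITH NO POPULATION;
GO

-- ...then run a full population so that property-specific index entries are created.
ALTER FULLTEXT INDEX ON dbo.Documents START FULL POPULATION;
GO

-- After the population completes, search only within the Title property.
SELECT DocumentID
FROM dbo.Documents
WHERE CONTAINS(PROPERTY(DocContent, 'Title'), N'biking');
GO
```

Until the full population finishes, the index is empty, so property-scoped (and generic) queries against it return incomplete results.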
To associate a search property list with a full-text index with Transact-SQL
Use the ALTER FULLTEXT INDEX (Transact-SQL) statement with the SET SEARCH PROPERTY LIST = <property_list_name> clause.
To associate a search property list with a full-text index with Management Studio
Specify a value for Search Property List on the General page of the Full-Text Index Properties dialog box.

Querying Search Properties with CONTAINS
The basic CONTAINS syntax for a property-scoped full-text query is as follows:

SELECT column_name
FROM table_name
WHERE CONTAINS ( PROPERTY ( column_name, 'property_name' ), '<contains_search_condition>' )

For example, the following query searches on an indexed property, Title, in the Document column of the Production.Document table of the AdventureWorks database. The query returns only documents whose Title property contains the string Maintenance or Repair.

USE AdventureWorks
GO
SELECT Document FROM Production.Document
WHERE CONTAINS ( PROPERTY ( Document, 'Title' ), 'Maintenance OR Repair')
GO

This example assumes that the IFilter for the document extracts its Title property, that the Title property is added to the search property list, and that the search property list is associated with the full-text index.

Managing Search Property Lists
Viewing and Changing a Search Property List
To change a search property list with Transact-SQL
Use the ALTER SEARCH PROPERTY LIST (Transact-SQL) statement to add or remove search properties.
To view and change a search property list in Management Studio
1. In Object Explorer, expand the server.
2. Expand Databases, and then expand the database.
3. Expand Storage.
4. Expand Search Property Lists to display the search property lists.
5. Right-click the property list, and select Properties.
6. In the Search Property List Editor dialog box, use the Properties grid to add or remove search properties:
   a. To remove a document property, click the row header to the left of the property, and press DEL.
   b. To add a document property, click in the empty row at the bottom of the list, to the right of the *, and enter the values for the new property. For information about these values, see Search Property List Editor. For information about how to obtain these values for properties defined by Microsoft, see Find Property Set GUIDs and Property Integer IDs for Search Properties. For information about properties defined by an independent software vendor (ISV), see the documentation of that vendor.
7. Click OK.

Deleting a Search Property List
You cannot drop a property list from a database while the list is associated with any full-text index.
To delete a search property list with Transact-SQL
Use the DROP SEARCH PROPERTY LIST (Transact-SQL) statement.
To delete a search property list in Management Studio
1. In Object Explorer, expand the server.
2. Expand Databases, and then expand the database.
3. Expand Storage, and then expand the Search Property Lists node.
4. Right-click the property list that you want to delete, and click Delete.
5. Click OK.

See Also
Find Property Set GUIDs and Property Integer IDs for Search Properties
Configure and Manage Filters for Search

Find Property Set GUIDs and Property Integer IDs for Search Properties
3/24/2017

This topic discusses how to obtain the values that are required before you can add a property to a search property list and make it searchable by full-text search. These values include the property set GUID and property integer identifier of a document property.

Document properties that are extracted by IFilters from binary data (that is, from data stored in a varbinary, varbinary(max) (including FILESTREAM), or image data type column) can be made available for full-text search.
To make an extracted property searchable, the property must be manually added to a search property list. The search property list must also be associated with one or more full-text indexes. For more information, see Search Document Properties with Search Property Lists.

Before you can add an available property to a property list, you have to find two pieces of information about the property:
   The property set GUID of the property.
   The integer ID of the property.
(When you add a property to a property list, you also have to provide a name and description. However, you do not have to use the canonical name and description of the property.)

This topic describes the commonly used methods to find information about available properties, especially about properties that are defined by Microsoft. For information about properties that have been defined by a third party, refer to the third-party documentation or contact the vendor.

Finding Information about Widely Used, Well-Known Microsoft Properties
Microsoft defines hundreds of document properties for use in many contexts, but each file format uses only a small subset of the available properties. Among the frequently used Windows properties is a small set of generic properties. Some examples of well-known generic properties are shown in the following table. The table shows the well-known name, the Windows canonical name (from the property description published by Microsoft), the property set GUID, the property integer identifier, and a brief description.

WELL-KNOWN NAME | WINDOWS CANONICAL NAME | PROPERTY SET GUID | INTEGER ID | DESCRIPTION
Authors | System.Author | F29F85E0-4FF9-1068-AB91-08002B27B3D9 | 4 | Author or authors of a given item.
Tags | System.Keywords | F29F85E0-4FF9-1068-AB91-08002B27B3D9 | 5 | Set of keywords (also known as tags) assigned to the item.
Type | System.PerceivedType | 28636AA6-953D-11D2-B5D6-00C04FD918D0 | 9 | Perceived file type based on its canonical type.
Title | System.Title | F29F85E0-4FF9-1068-AB91-08002B27B3D9 | 2 | Title of the item. For example, the title of a document, the subject of a message, the caption of a photo, or the name of a music track.

To encourage consistency among file formats, Microsoft has identified subsets of frequently used, high-priority document properties for several categories of documents. These include communications, contacts, documents, music files, pictures, and videos. For more information about the top-ranked properties for each category, see system-defined properties for custom file formats in the Windows Search documentation.

A specific file format might implement properties of three types:
   Generic properties defined by Microsoft.
   Category-specific properties defined by Microsoft.
   Custom, application-specific properties defined by the software vendor.

Finding Information about Available Properties by using FILTDUMP.EXE
To learn what properties are discovered and extracted by an installed IFilter, you can install and run the filtdump.exe utility, which is part of the Microsoft Windows SDK.

You run filtdump.exe from the command prompt and provide a single argument. This argument is the name of an individual file that has a file type for which an IFilter is installed. The utility displays a list of all the properties discovered by the IFilter in the document, with their property set GUIDs, integer IDs, and additional information.

For information about installing this software, see Microsoft Windows SDK for Windows 7 and .NET Framework 4. After you download and install the SDK, look in the following folders for the filtdump.exe utility.
   For the 64-bit version, look in C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\x64.
   For the 32-bit version, look in C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin.
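Using the property set GUIDs and integer IDs from the table of well-known properties, the Tags and Type properties could be registered with ALTER SEARCH PROPERTY LIST. This sketch assumes a hypothetical search property list named PropertyList1. The user-friendly names and descriptions are free text of your choosing.

```sql
-- Hypothetical list PropertyList1; GUIDs and integer IDs come from the table of well-known properties.
ALTER SEARCH PROPERTY LIST PropertyList1
ADD 'Tags'
WITH (
    PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9',
    PROPERTY_INT_ID = 5,
    PROPERTY_DESCRIPTION = 'System.Keywords - set of keywords (tags) assigned to the item'
);
GO

ALTER SEARCH PROPERTY LIST PropertyList1
ADD 'Type'
WITH (
    PROPERTY_SET_GUID = '28636AA6-953D-11D2-B5D6-00C04FD918D0',
    PROPERTY_INT_ID = 9,
    PROPERTY_DESCRIPTION = 'System.PerceivedType - perceived file type'
);
GO
```

Each GUID and integer ID pair may appear only once per list, so rerunning these statements against the same list fails.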
Finding Values for a Search Property from a Windows Property Description
For a well-known Windows search property, you can obtain the information that you need from the formatID and propID attributes of the property description (propertyDescription). The following example shows the relevant part of a typical Microsoft property description, in this case, of the System.Author property. The formatID attribute specifies the property set GUID, F29F85E0-4FF9-1068-AB91-08002B27B3D9, and the propID attribute specifies the property integer ID, 4. Notice that the name attribute specifies the Windows canonical property name, System.Author. (This example omits portions of the property description that are not relevant.)

propertyDescription
   name = System.Author
   …
   formatID = F29F85E0-4FF9-1068-AB91-08002B27B3D9
   propID = 4
   …

For the complete description of this property, see System.Author in the Windows Search documentation. For a complete list of Windows properties, see Windows Properties, also in the Windows Search documentation.

Adding a Property to a Search Property List
The following example shows how to add a property to a search property list. The example uses an ALTER SEARCH PROPERTY LIST statement to add the System.Author property to a search property list named PropertyList1, and provides a user-friendly name for the property, Author.

ALTER SEARCH PROPERTY LIST PropertyList1
ADD 'Author'
WITH (
    PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9',
    PROPERTY_INT_ID = 4,
    PROPERTY_DESCRIPTION = 'System.Author - the author or authors of the item'
);
GO

For more information about creating a search property list and associating it with a full-text index, see Search Document Properties with Search Property Lists.

See Also
Search Document Properties with Search Property Lists
Configure and Manage Filters for Search

Create and Manage Full-Text Catalogs
3/31/2017

A full-text catalog is a logical container for a group of full-text indexes.
You have to create a full-text catalog before you can create a full-text index. A full-text catalog is a virtual object that does not belong to any filegroup.

Create a Full-Text Catalog
Create a full-text catalog with Transact-SQL
Use CREATE FULLTEXT CATALOG. For example:

USE AdventureWorks;
GO
CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT;
GO

Create a full-text catalog with Management Studio
1. In Object Explorer, expand the server, expand Databases, and expand the database in which you want to create the full-text catalog.
2. Expand Storage, and then right-click Full Text Catalogs.
3. Select New Full-Text Catalog.
4. In the New Full-Text Catalog dialog box, specify the information for the catalog that you are creating. For more information, see New Full-Text Catalog (General Page).
   NOTE Full-text catalog IDs begin at 00005 and are incremented by one for each new catalog created.
5. Click OK.

Get the properties of a full-text catalog
Use the Transact-SQL function FULLTEXTCATALOGPROPERTY to get the value of various properties related to full-text catalogs. For more info, see FULLTEXTCATALOGPROPERTY. For example, run the following query to get the number of full-text indexed items in the full-text catalog Catalog1.

USE <database>;
GO
SELECT FULLTEXTCATALOGPROPERTY('Catalog1', 'ItemCount');
GO

The following table lists the properties that are related to full-text catalogs. This information may be useful for administering and troubleshooting full-text search.

PROPERTY | DESCRIPTION
AccentSensitivity | Accent-sensitivity setting.
ImportStatus | Whether the full-text catalog is being imported.
IndexSize | Size of the full-text catalog in megabytes (MB).
ItemCount | Number of full-text indexed items currently in the full-text catalog.
MergeStatus | Whether a master merge is in progress.
PopulateCompletionAge | Difference in seconds between the completion of the last full-text index population and 01/01/1990 00:00:00.
PopulateStatus | Populate status.
NOTE This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
UniqueKeyCount | Number of unique keys in the full-text catalog.

Rebuild a full-text catalog
Run the Transact-SQL statement ALTER FULLTEXT CATALOG ... REBUILD, or do the following in SQL Server Management Studio (SSMS).
1. In SSMS, in Object Explorer, expand the server, expand Databases, and then expand the database that contains the full-text catalog that you want to rebuild.
2. Expand Storage, and then expand Full Text Catalogs.
3. Right-click the name of the full-text catalog that you want to rebuild, and select Rebuild.
4. To the question Do you want to delete the full-text catalog and rebuild it?, click OK.
5. In the Rebuild Full-Text Catalog dialog box, click Close.

Rebuild all full-text catalogs for a database
1. In SSMS, in Object Explorer, expand the server, expand Databases, and then expand the database that contains the full-text catalogs that you want to rebuild.
2. Expand Storage, and then right-click Full Text Catalogs.
3. Select Rebuild All.
4. To the question Do you want to delete all full-text catalogs and rebuild them?, click OK.
5. In the Rebuild All Full-Text Catalogs dialog box, click Close.

Remove a full-text catalog from a database
Run the Transact-SQL statement DROP FULLTEXT CATALOG, or do the following in SQL Server Management Studio (SSMS).
1. In SSMS, in Object Explorer, expand the server, expand Databases, and expand the database that contains the full-text catalog you want to remove.
2. Expand Storage, and expand Full Text Catalogs.
3. Right-click the full-text catalog that you want to remove, and then select Delete.
4. In the Delete Objects dialog box, click OK.
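A Transact-SQL sketch of the catalog tasks above, assuming the ftCatalog catalog from the CREATE FULLTEXT CATALOG example. It reads several of the properties listed in the table, rebuilds the catalog, and finally drops it. DROP FULLTEXT CATALOG fails while full-text indexes are still associated with the catalog, so drop those indexes first.

```sql
-- Assumes the ftCatalog catalog created with CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT.
SELECT FULLTEXTCATALOGPROPERTY('ftCatalog', 'ItemCount')      AS ItemCount,
       FULLTEXTCATALOGPROPERTY('ftCatalog', 'IndexSize')      AS IndexSizeMB,
       FULLTEXTCATALOGPROPERTY('ftCatalog', 'UniqueKeyCount') AS UniqueKeyCount,
       FULLTEXTCATALOGPROPERTY('ftCatalog', 'MergeStatus')    AS MergeStatus;
GO

-- Delete and rebuild the catalog, as the Rebuild command in SSMS does.
ALTER FULLTEXT CATALOG ftCatalog REBUILD;
GO

-- Remove the catalog; drop its full-text indexes first or this statement fails.
DROP FULLTEXT CATALOG ftCatalog;
GO
```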
Next step
Create and Manage Full-Text Indexes

Create and Manage Full-Text Indexes
3/24/2017

This topic describes how to create, populate, and manage full-text indexes in SQL Server.

Prerequisite - Create a full-text catalog
Before you can create a full-text index, you must have a full-text catalog. The catalog is a virtual container for one or more full-text indexes. For more info, see Create and Manage Full-Text Catalogs.

Create, alter, or drop a full-text index
Create a full-text index: CREATE FULLTEXT INDEX (Transact-SQL)
Alter a full-text index: ALTER FULLTEXT INDEX (Transact-SQL)
Drop a full-text index: DROP FULLTEXT INDEX (Transact-SQL)

Populate a full-text index
The process of creating and maintaining a full-text index is called a population (also known as a crawl). There are three types of full-text index population:
   Full population
   Population based on change tracking
   Incremental population based on a timestamp
For more info, see Populate Full-Text Indexes.

View the properties of a full-text index
View the properties of a full-text index with Transact-SQL
CATALOG OR DYNAMIC MANAGEMENT VIEW | DESCRIPTION
sys.fulltext_index_catalog_usages (Transact-SQL) | Returns a row for each full-text catalog to full-text index reference.
sys.fulltext_index_columns (Transact-SQL) | Contains a row for each column that is part of a full-text index.
sys.fulltext_index_fragments (Transact-SQL) | A full-text index uses internal tables called full-text index fragments to store the inverted index data. This view can be used to query the metadata about these fragments. This view contains a row for each full-text index fragment in every table that contains a full-text index.
sys.fulltext_indexes (Transact-SQL) | Contains a row per full-text index of a tabular object.
sys.dm_fts_index_keywords (Transact-SQL) | Returns information about the content of a full-text index for the specified table.
- sys.dm_fts_index_keywords_by_document (Transact-SQL): Returns information about the document-level content of a full-text index for the specified table. A given keyword can appear in several documents.
- sys.dm_fts_index_population (Transact-SQL): Returns information about the full-text index populations currently in progress.

View the properties of a full-text index with Management Studio

1. In Management Studio, in Object Explorer, expand the server.
2. Expand Databases, and then expand the database that contains the full-text index.
3. Expand Tables.
4. Right-click the table on which the full-text index is defined, select Full-Text index, and on the Full-Text index context menu, click Properties. This opens the Full-text index Properties dialog box.
5. In the Select a page pane, you can select any of the following pages:
   - General: Displays basic properties of the full-text index. These include several modifiable properties and a number of unchangeable properties, such as the database name, the table name, and the name of the full-text key column. The modifiable properties are:
     - Full-Text Index Stoplist
     - Full-Text Indexing Enabled
     - Change Tracking
     - Search Property List
     For more info, see Full-Text Index Properties (General Page).
   - Columns: Displays the table columns that are available for full-text indexing. The selected column or columns are full-text indexed. You can select as many of the available columns as you want to include in the full-text index. For more info, see Full-Text Index Properties (Columns Page).
   - Schedules: Use this page to create or manage schedules for a SQL Server Agent job that starts an incremental table population for the full-text index. For more info, see Populate Full-Text Indexes.
     NOTE
     After you exit the Full-Text Index Properties dialog box, any newly created schedule is associated with a SQL Server Agent job (Start Incremental Table Population on database_name.table_name).
6. Click OK to save any changes and exit the Full-text index Properties dialog box.

View the properties of indexed tables and columns

Several Transact-SQL functions, such as OBJECTPROPERTYEX, can be used to obtain the value of various full-text indexing properties. This information is useful for administering and troubleshooting full-text search.

The following list gives the full-text properties related to indexed tables and columns. Each entry shows the property name, the Transact-SQL function that returns it, and a description.

- FullTextTypeColumn (COLUMNPROPERTY): TYPE COLUMN in the table that holds the document type information of the column.
- IsFulltextIndexed (COLUMNPROPERTY): Whether a column has been enabled for full-text indexing.
- IsFulltextKey (INDEXPROPERTY): Whether the index is the full-text key for a table.
- TableFulltextBackgroundUpdateIndexOn (OBJECTPROPERTYEX): Whether a table has full-text background update indexing.
- TableFulltextCatalogId (OBJECTPROPERTYEX): Full-text catalog ID in which the full-text index data for the table resides.
- TableFulltextChangeTrackingOn (OBJECTPROPERTYEX): Whether a table has full-text change tracking enabled.
- TableFulltextDocsProcessed (OBJECTPROPERTYEX): Number of rows processed since the start of full-text indexing.
- TableFulltextFailCount (OBJECTPROPERTYEX): Number of rows that Full-Text Search did not index.
- TableFulltextItemCount (OBJECTPROPERTYEX): Number of rows that were successfully full-text indexed.
- TableFulltextKeyColumn (OBJECTPROPERTYEX): The column ID of the full-text unique key column.
- TableFullTextMergeStatus (OBJECTPROPERTYEX): Whether a table that has a full-text index is currently merging.
- TableFulltextPendingChanges (OBJECTPROPERTYEX): Number of pending change tracking entries to process.
- TableFulltextPopulateStatus (OBJECTPROPERTYEX): Population status of a full-text table.
- TableHasActiveFulltextIndex (OBJECTPROPERTYEX): Whether a table has an active full-text index.

Get info about the full-text key column

Typically, the results of the CONTAINSTABLE or FREETEXTTABLE rowset-valued functions need to be joined with the base table. In such cases, you need to know the unique key column name. You can check whether a given unique index is used as the full-text key, and you can obtain the identifier of the full-text key column.

Determine whether a given unique index is used as the full-text key column

Use a SELECT statement to call the INDEXPROPERTY function. In the function call, use the OBJECT_ID function to convert the name of the table (table_name) into the table ID, specify the name of a unique index for the table, and specify the IsFulltextKey index property, as follows:

    SELECT INDEXPROPERTY( OBJECT_ID('table_name'), 'index_name', 'IsFulltextKey' );

This statement returns 1 if the index is used to enforce uniqueness of the full-text key column and 0 if it is not.

Example

The following example checks whether the PK_Document_DocumentID index is used to enforce the uniqueness of the full-text key column:

    USE AdventureWorks;
    GO
    SELECT INDEXPROPERTY( OBJECT_ID('Production.Document'), 'PK_Document_DocumentID', 'IsFulltextKey' );

This example returns 1 if the PK_Document_DocumentID index is used to enforce uniqueness of the full-text key column. Otherwise, it returns 0 or NULL. NULL implies that you are using an invalid index name, the index name does not correspond to the table, the table does not exist, and so forth.

Find the identifier of the full-text key column

Each full-text enabled table has a column that is used to enforce unique rows for the table (the unique key column). The TableFulltextKeyColumn property, obtained from the OBJECTPROPERTYEX function, contains the column ID of the unique key column. To obtain this identifier, you can use a SELECT statement to call the OBJECTPROPERTYEX function.
Use the OBJECT_ID function to convert the name of the table (table_name) into the table ID and specify the TableFulltextKeyColumn property, as follows:

    SELECT OBJECTPROPERTYEX(OBJECT_ID( 'table_name'), 'TableFulltextKeyColumn' ) AS 'Column Identifier';

Examples

The following example returns the identifier of the full-text key column or NULL. NULL implies that you are using an invalid index name, the index name does not correspond to the table, the table does not exist, and so forth.

    USE AdventureWorks;
    GO
    SELECT OBJECTPROPERTYEX(OBJECT_ID('Production.Document'), 'TableFulltextKeyColumn');
    GO

The following example shows how to use the identifier of the unique key column to obtain the name of the column.

    USE AdventureWorks;
    GO
    DECLARE @key_column sysname;
    SET @key_column = COL_NAME(OBJECT_ID('Production.Document'),
        OBJECTPROPERTY(OBJECT_ID('Production.Document'), 'TableFulltextKeyColumn'));
    SELECT @key_column AS 'Unique Key Column';
    GO

This example returns a result set column named Unique Key Column, which displays a single row containing the name of the unique key column of the Document table, DocumentID. Note that if this query contained an invalid index name, the index name did not correspond to the table, the table did not exist, and so forth, it would return NULL.

Index varbinary(max) and xml columns

If a varbinary(max), varbinary, or xml column is full-text indexed, it can be queried using the full-text predicates (CONTAINS and FREETEXT) and functions (CONTAINSTABLE and FREETEXTTABLE), like any other full-text indexed column.

Index varbinary(max) or varbinary data

A single varbinary(max) or varbinary column can store many types of documents. SQL Server supports any document type for which a filter is installed and available in the operating system. The document type of each document is identified by the file extension of the document. For example, for a .doc file extension, full-text search uses the filter that supports Microsoft Word documents.
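The filter is chosen from the file extension, so it can help to check data before loading it. The following Python sketch is a hypothetical pre-flight check. It compares each row's type-column value with a set of supported extensions, which you might obtain by querying the sys.fulltext_document_types catalog view. Two assumptions are made for illustration: rows arrive as (full-text key, extension) pairs, and extensions match case-insensitively.

```python
from typing import Iterable, List, Optional, Tuple


def rows_without_filter(
    rows: Iterable[Tuple[int, Optional[str]]],
    supported_extensions: Iterable[str],
) -> List[int]:
    """Return the keys of rows whose type-column extension is missing or unsupported.

    rows: (full-text key, type-column value) pairs, for example (1, ".doc").
    supported_extensions: extensions such as ".doc" or ".pdf".
    Assumption: extensions are compared case-insensitively after trimming spaces.
    """
    supported = {ext.strip().lower() for ext in supported_extensions}
    missing = []
    for key, extension in rows:
        if extension is None or extension.strip().lower() not in supported:
            missing.append(key)
    return missing


if __name__ == "__main__":
    documents = [(1, ".doc"), (2, ".xyz"), (3, ".PDF"), (4, None)]
    print(rows_without_filter(documents, [".doc", ".pdf"]))  # [2, 4]
```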
For a list of available document types, query the sys.fulltext_document_types catalog view.

Note that the Full-Text Engine can leverage existing filters that are installed in the operating system. Before you can use operating-system filters, word breakers, and stemmers, you must load them in the server instance, as follows:

    EXEC sp_fulltext_service @action='load_os_resources', @value=1;

To create a full-text index on a varbinary(max) column, the Full-Text Engine needs access to the file extensions of the documents in the varbinary(max) column. This information must be stored in a table column, called a type column, that must be associated with the varbinary(max) column in the full-text index. When indexing a document, the Full-Text Engine uses the file extension in the type column to identify which filter to use.

Index xml data

An xml data type column stores only XML documents and fragments, and only the XML filter is used for the documents. Therefore, a type column is unnecessary. On xml columns, the full-text index indexes the content of the XML elements but ignores the XML markup. Attribute values are full-text indexed unless they are numeric values. Element tags are used as token boundaries. Well-formed XML or HTML documents and fragments containing multiple languages are supported. For more info about indexing and querying on an xml column, see Use Full-Text Search with XML Columns.

Disable or re-enable full-text indexing for a table

In SQL Server, all user-created databases are full-text enabled by default. Additionally, an individual table is automatically enabled for full-text indexing as soon as a full-text index is created on it and a column is added to the index. A table is automatically disabled for full-text indexing when the last column is dropped from its full-text index.

On a table that has a full-text index, you can manually disable or re-enable the table for full-text indexing using SQL Server Management Studio:

1. Expand the server group, expand Databases, and expand the database that contains the table you want to enable for full-text indexing.
2. Expand Tables, and right-click the table that you want to disable or re-enable for full-text indexing.
3. Select Full-Text index, and then click Disable Full-Text index or Enable Full-Text index.

Remove a full-text index from a table

1. In Object Explorer, right-click the table that has the full-text index that you want to delete.
2. Select Delete Full-Text index.
3. When prompted, click OK to confirm that you want to delete the full-text index.

Choose a Language When Creating a Full-Text Index
3/24/2017

When creating a full-text index, you need to specify a column-level language for the indexed column. Full-text queries on the column use the word breaker and stemmers of the specified language. There are a couple of things to consider when choosing the column language for a full-text index. These considerations relate to how your text is tokenized and then indexed by the Full-Text Engine.

NOTE
To specify a column-level language for a column of a full-text index, use the LANGUAGE language_term clause when specifying the column. For more information, see CREATE FULLTEXT INDEX (Transact-SQL) and ALTER FULLTEXT INDEX (Transact-SQL).

Language Support in Full-Text Search

This section provides an introduction to word breakers and stemmers, and discusses how full-text search uses the LCID of the column-level language.

Introduction to Word Breakers and Stemmers

SQL Server 2008 and later versions include a completely new family of word breakers and stemmers that are significantly better than those previously available in SQL Server.

NOTE
The Microsoft Natural Language Group (MS NLG) implemented and supports these new linguistic components.

The new word breakers provide the following benefits:

- Robustness: Testing has shown that the new word breakers are robust in high-pressure query environments.
- Security: The new word breakers are enabled by default in SQL Server thanks to security improvements in the linguistic components. We highly recommend that external components such as word breakers and filters be signed to improve the overall security and robustness of SQL Server. You can configure full-text search to verify that these components are signed as follows:

      EXEC sp_fulltext_service @action='verify_signature', @value=1;

- Quality: Word breakers have been redesigned, and testing has shown that the new word breakers provide better semantic quality than previous word breakers. This increases the recall accuracy.
- Coverage: Word breakers for a vast list of languages are included in SQL Server out of the box and enabled by default. For a list of the languages for which SQL Server includes a word breaker and stemmers, see sys.fulltext_languages (Transact-SQL).

How Full-Text Search Uses the Name of the Column-Level Language

When creating a full-text index, you need to specify a valid language name for each column. If a language name is valid but not returned by the sys.fulltext_languages (Transact-SQL) catalog view, full-text search falls back to the closest available language name of the same language family, if any. Otherwise, full-text search falls back to the Neutral word breaker. This fallback behavior might affect the recall accuracy. Therefore, we strongly recommend that you specify a valid and available language name for each column when creating a full-text index.

NOTE
The LCID is used against all data types eligible for full-text indexing (such as char or nchar). If the sort order of a char, varchar, or text column is set to a language different from the language identified by the LCID, the LCID is still used during full-text indexing and querying of those columns.

Word Breaking

A word breaker tokenizes the text being indexed on word boundaries, which are language specific. Therefore, word-breaking behavior differs among languages.
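The following toy Python sketch makes the point concrete. It splits the same text with two simplified breakers that differ only in whether a dash counts as a separator. It is not SQL Server's word-breaker implementation, and the separator sets are made-up assumptions. Real word breakers apply far richer, language-specific rules.

```python
import re
from typing import List


def toy_break(text: str, separators: str) -> List[str]:
    """Split text on whitespace and on any character listed in separators.

    Toy model only: real word breakers use language-specific rules.
    """
    pattern = "[\\s" + re.escape(separators) + "]+"
    return [token for token in re.split(pattern, text) if token]


if __name__ == "__main__":
    text = "e-mail, re-index"
    # Breaker A treats both the dash and the comma as word-break elements.
    print(toy_break(text, "-,"))  # ['e', 'mail', 're', 'index']
    # Breaker B treats only the comma as a word-break element.
    print(toy_break(text, ","))   # ['e-mail', 're-index']
```

If the index was built with breaker A but a query is parsed with breaker B, the term e-mail no longer lines up with the indexed tokens. That mismatch is the kind of unexpected result described next.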
If you use one language, x, to index several languages {x, y, z}, some of the behavior might cause unexpected results. For example, a dash (-) or a comma (,) might be a word-break element that is discarded in one language but not in another. Also, in rare cases, unexpected stemming behavior might occur because a given word might stem differently in different languages. For example, in the English language, word boundaries are typically white space or some form of punctuation. In other languages, such as German, words or characters may be combined together. Therefore, the column-level language that you choose should represent the language that you expect to be stored in rows of that column.

Western Languages

For the Western family of languages, if you are unsure which languages will be stored in a column or you expect more than one to be stored, a general workaround is to use the word breaker for the most complex language that might be stored in the column. For instance, you might expect to store English, Spanish, and German content in a single column. These three Western languages have very similar word-breaking patterns, with the German patterns being the most complex. Therefore, a good choice in this case would be the German word breaker, which should be able to process English and Spanish text correctly. In contrast, the English word breaker might not process German text perfectly because of German compound words.

Note that using the word breaker of the most complex language in a language family does not guarantee perfect indexing of every language in the family. Corner cases might exist in which the most complex word breaker cannot correctly handle text written in another language.

Non-Western Languages

For non-Western languages (such as Chinese, Japanese, Hindi, and so forth), the above workaround does not necessarily work, for linguistic reasons.
For non-Western languages, consider one of the following workarounds:

- For languages from different families: If a column might contain dramatically different languages, for example, Spanish and Japanese, consider storing the content of different languages in separate columns. This would allow you to use the language-specific word breaker for each column. If you choose this solution and you don't know the query language at query time, you might need to issue the query against both columns to ensure that the query finds the right row or document.
- For binary content (such as Microsoft Word documents): When the indexed content is of binary type, the full-text search filter that processes the textual content before sending it to the word breaker might honor specific language tags existing within the binary file. In this case, at indexing time, the filter emits the right LCID for a document or section of a document. The Full-Text Engine then calls the word breaker for the language with that LCID. However, after indexing multilanguage content, we recommend that you verify that the content was correctly indexed.
- For plain text content: When your content is plain text, you can convert it to the xml data type and add language tags that indicate the language corresponding to each specific document or document section. For this to work, however, you need to know the language before full-text indexing.

Stemming

An additional consideration when choosing your column-level language is stemming. Stemming in full-text queries is the process of searching for all stemmed (inflectional) forms of a word in a particular language. When you use a generic word breaker to process several languages, the stemming process works only for the language specified for the column, not for other languages in the column. For example, German stemmers do not work for English or Spanish (and so forth). This might affect your recall, depending on which language you choose at query time.
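The plain-text workaround above needs a way to add language tags. One approach is to wrap each section in an element that carries the standard xml:lang attribute and then store the result in an xml column. The following Python sketch builds such a document with the standard library. The element names doc and section are hypothetical. Whether your XML filter honors xml:lang is an assumption, so verify that the content was indexed as expected.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# Clark-notation name of the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tag_sections(sections: Iterable[Tuple[str, str]]) -> str:
    """Wrap (language tag, text) pairs in <section xml:lang="..."> elements.

    The language of each section must be known before full-text indexing.
    """
    root = ET.Element("doc")
    for language, text in sections:
        section = ET.SubElement(root, "section", {XML_LANG: language})
        section.text = text  # ElementTree escapes &, < and > when serializing
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(tag_sections([("es", "Manual de usuario"), ("ja", "ユーザーガイド")]))
```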
Effect of Column Type on Full-Text Search

Another consideration in language choice is related to how the data is represented. For data that is not stored in a varbinary(max) column, no special filtering is performed. Rather, the text is generally passed through the word breaking component as-is.

Also, word breakers are designed mainly to process written text. So, if your text contains any type of markup (such as HTML), you may not get great linguistic accuracy during indexing and search. In that case, you have two choices. The preferred method is simply to store the text data in a varbinary(max) column and to indicate its document type so that it can be filtered. If this is not an option, you may consider using the neutral word breaker and, if possible, adding markup data (such as 'br' in HTML) to your noise word lists.

NOTE
Language-based stemming does not come into play when you specify the neutral language.

Specifying a Non-default Column-Level Language in a Full-Text Query

By default, in SQL Server, full-text search parses the query terms using the language specified for each column that is included in the full-text clause. To override this behavior, specify a non-default language at query time. For supported languages whose resources are installed, the LANGUAGE language_term clause of a CONTAINS, CONTAINSTABLE, FREETEXT, or FREETEXTTABLE query can be used to specify the language used for word breaking, stemming, thesaurus, and stopword processing of the query terms.
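As an illustration, the following Python sketch builds a CONTAINS query that overrides the column-level language with the LANGUAGE clause. The helper is hypothetical. It accepts only an integer LCID, such as 1033 for English (the value used in the CREATE FULLTEXT INDEX example in Populate Full-Text Indexes) or 1031 for German. It doubles single quotes in the search condition. It assumes that the table and column names are trusted identifiers, so they are not escaped.

```python
def contains_query(table: str, key_column: str, column: str,
                   search_condition: str, lcid: int) -> str:
    """Build a CONTAINS query whose terms are processed with the given LCID.

    Assumption: table and column names are trusted identifiers (not escaped);
    only the search-condition literal is escaped, by doubling single quotes.
    """
    if not isinstance(lcid, int):
        raise TypeError("lcid must be an integer locale identifier")
    literal = "N'" + search_condition.replace("'", "''") + "'"
    return (f"SELECT {key_column} FROM {table} "
            f"WHERE CONTAINS({column}, {literal}, LANGUAGE {lcid});")


if __name__ == "__main__":
    # Parse the query term with the German word breaker and stemmer.
    print(contains_query("Production.Document", "DocumentID", "Document",
                         '"Reflektor"', 1031))
```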
See Also
CONTAINS (Transact-SQL)
CONTAINSTABLE (Transact-SQL)
Data Types (Transact-SQL)
FREETEXT (Transact-SQL)
FREETEXTTABLE (Transact-SQL)
Configure and Manage Filters for Search
sp_fulltext_service (Transact-SQL)
sys.fulltext_languages (Transact-SQL)
Configure and Manage Word Breakers and Stemmers for Search

Populate Full-Text Indexes
3/30/2017

Creating and maintaining a full-text index involves populating the index by using a process called a population (also known as a crawl).

Types of population

A full-text index supports the following types of population:

- Full population
- Automatic or manual population based on change tracking
- Incremental population based on a timestamp

Full population

During a full population, index entries are built for all the rows of the base table or indexed view. By default, SQL Server populates a new full-text index fully as soon as it is created.

On the one hand, a full population can consume a significant amount of resources. Therefore, when creating a full-text index during peak periods, it is often a best practice to delay the full population until an off-peak time, particularly if the base table of the full-text index is large. On the other hand, the full-text catalog to which the index belongs is not usable until all of its full-text indexes are populated.

To create a full-text index without populating it immediately, specify the CHANGE_TRACKING OFF, NO POPULATION clause in the CREATE FULLTEXT INDEX statement. The Full-Text Engine then doesn't populate the new full-text index until you execute an ALTER FULLTEXT INDEX statement using the START FULL POPULATION or START INCREMENTAL POPULATION clause. (If change tracking is enabled at creation, whether AUTO or MANUAL, the new index is fully populated immediately.)
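The ALTER FULLTEXT INDEX clauses mentioned above request either a full or an incremental population. Whether an incremental request really runs incrementally depends on the rules stated later in this topic, under Incremental population based on a timestamp. The following Python sketch restates those rules as one function and builds the matching statement. It is an aid to reasoning, not a model of the Full-Text Engine's internals, and the enum, function, and parameter names are hypothetical.

```python
from enum import Enum


class Population(Enum):
    FULL = "START FULL POPULATION"
    INCREMENTAL = "START INCREMENTAL POPULATION"


def effective_population(requested: Population,
                         has_timestamp_column: bool,
                         is_first_population: bool,
                         metadata_changed: bool) -> Population:
    """Return the kind of population that a request effectively performs."""
    if requested is Population.FULL:
        return Population.FULL
    # An incremental request behaves as a full population when the table has
    # no timestamp column, when it is the index's first population, or when
    # metadata affecting the full-text index changed since the last population.
    if not has_timestamp_column or is_first_population or metadata_changed:
        return Population.FULL
    return Population.INCREMENTAL


def alter_statement(table: str, population: Population) -> str:
    """Build the ALTER FULLTEXT INDEX statement for a trusted table name."""
    return f"ALTER FULLTEXT INDEX ON {table} {population.value};"


if __name__ == "__main__":
    kind = effective_population(Population.INCREMENTAL, has_timestamp_column=False,
                                is_first_population=False, metadata_changed=False)
    print(kind.name, "|", alter_statement("Production.Document", Population.INCREMENTAL))
```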
Example - Create a full-text index without running a full population

The following example creates a full-text index on the Production.Document table of the AdventureWorks sample database. This example uses WITH CHANGE_TRACKING OFF, NO POPULATION to delay the initial full population.

    CREATE UNIQUE INDEX ui_ukDoc ON Production.Document(DocumentID);
    CREATE FULLTEXT CATALOG AW_Production_FTCat;
    CREATE FULLTEXT INDEX ON Production.Document
    (
        Document                      --Full-text index column name
            TYPE COLUMN FileExtension --Name of column that contains file type information
            LANGUAGE 1033             --1033 is the LCID for the English language
    )
    KEY INDEX ui_ukDoc
    ON AW_Production_FTCat
    WITH CHANGE_TRACKING OFF, NO POPULATION;
    GO

Example - Run a full population on a table

The following example runs a full population on the Production.Document table of the AdventureWorks sample database.

    ALTER FULLTEXT INDEX ON Production.Document
    START FULL POPULATION;

Population based on change tracking

Optionally, you can use change tracking to maintain a full-text index after its initial full population. Change tracking has a small overhead, because SQL Server maintains a table that records changes to the base table since the last population. When you use change tracking, SQL Server maintains a record of the rows in the base table or indexed view that have been modified by updates, deletes, or inserts. Data changes made through WRITETEXT and UPDATETEXT are not reflected in the full-text index and are not picked up by change tracking.

NOTE
For tables containing a timestamp column, you can use incremental population instead of change tracking.

When you enable change tracking during index creation, SQL Server fully populates the new full-text index immediately after it is created. Thereafter, changes are tracked and propagated to the full-text index.

Enable change tracking

There are two types of change tracking:

- Automatic (CHANGE_TRACKING AUTO option).
  Automatic change tracking is the default behavior.
- Manual (CHANGE_TRACKING MANUAL option).

The type of change tracking determines how the full-text index is populated, as follows:

Automatic population

By default, or if you specify CHANGE_TRACKING AUTO, the Full-Text Engine uses automatic population on the full-text index. After the initial full population completes, changes are tracked as data is modified in the base table, and the tracked changes are propagated automatically. The full-text index is updated in the background, however, so propagated changes might not be reflected immediately in the index.

To start tracking changes with automatic population:

    CREATE FULLTEXT INDEX … WITH CHANGE_TRACKING AUTO
    ALTER FULLTEXT INDEX … SET CHANGE_TRACKING AUTO

Example - Alter a full-text index to use automatic change tracking

The following example changes the full-text index of the HumanResources.JobCandidate table of the AdventureWorks sample database to use change tracking with automatic population.

    USE AdventureWorks;
    GO
    ALTER FULLTEXT INDEX ON HumanResources.JobCandidate SET CHANGE_TRACKING AUTO;
    GO

Manual population

If you specify CHANGE_TRACKING MANUAL, the Full-Text Engine uses manual population on the full-text index. After the initial full population completes, changes are tracked as data is modified in the base table. However, they are not propagated to the full-text index until you execute an ALTER FULLTEXT INDEX … START UPDATE POPULATION statement. You can use SQL Server Agent to call this Transact-SQL statement periodically.

To start tracking changes with manual population:

    CREATE FULLTEXT INDEX … WITH CHANGE_TRACKING MANUAL
    ALTER FULLTEXT INDEX … SET CHANGE_TRACKING MANUAL

Example - Create a full-text index with manual change tracking

The following example creates a full-text index that will use change tracking with manual population on the HumanResources.JobCandidate table of the AdventureWorks sample database.
    USE AdventureWorks;
    GO
    CREATE UNIQUE INDEX ui_ukJobCand ON HumanResources.JobCandidate(JobCandidateID);
    CREATE FULLTEXT CATALOG ft AS DEFAULT;
    CREATE FULLTEXT INDEX ON HumanResources.JobCandidate(Resume)
        KEY INDEX ui_ukJobCand
        WITH CHANGE_TRACKING = MANUAL;
    GO

Example - Run a manual population

The following example runs a manual population on the change-tracked full-text index of the HumanResources.JobCandidate table of the AdventureWorks sample database.

    USE AdventureWorks;
    GO
    ALTER FULLTEXT INDEX ON HumanResources.JobCandidate START UPDATE POPULATION;
    GO

Disable change tracking

    CREATE FULLTEXT INDEX … WITH CHANGE_TRACKING OFF
    ALTER FULLTEXT INDEX … SET CHANGE_TRACKING OFF

Incremental population based on a timestamp

An incremental population is an alternative mechanism for manually populating a full-text index. If a table experiences a high volume of inserts, incremental population can be more efficient than manual population. You can run an incremental population for a full-text index that has CHANGE_TRACKING set to MANUAL or OFF.

Incremental population requires that the indexed table have a column of the timestamp data type. If a timestamp column does not exist, incremental population cannot be performed.

SQL Server uses the timestamp column to identify rows that have changed since the last population. The incremental population then updates the full-text index for rows added, deleted, or modified after the last population, or while the last population was in progress. At the end of a population, the Full-Text Engine records a new timestamp value. This value is the largest timestamp value that SQL Gatherer has found, and it is used when the next incremental population starts.

In some cases, the request for an incremental population results in a full population:

- A request for incremental population on a table without a timestamp column results in a full population operation.
- If the first population on a full-text index is an incremental population, it indexes all rows, making it equivalent to a full population.
- If any metadata that affects the full-text index for the table has changed since the last population, incremental population requests are implemented as full populations. This includes metadata changes caused by altering any column, index, or full-text index definitions.

Run an incremental population

To run an incremental population, execute an ALTER FULLTEXT INDEX statement using the START INCREMENTAL POPULATION clause.

Create or change a schedule for incremental population

1. In Management Studio, in Object Explorer, expand the server.
2. Expand Databases, and then expand the database that contains the full-text index.
3. Expand Tables. Right-click the table on which the full-text index is defined, select Full-Text index, and on the Full-Text index context menu, click Properties. This opens the Full-text index Properties dialog box.

   IMPORTANT
   If the base table or view does not contain a column of the timestamp data type, incremental population is not possible.

4. In the Select a page pane, select Schedules. Use this page to create or manage schedules for a SQL Server Agent job that starts an incremental table population on the base table or indexed view of the full-text index. The options are as follows:
   - To create a new schedule, click New. This opens the New Full-Text Indexing Table Schedule dialog box, where you can create a schedule. To save the schedule, click OK.

     IMPORTANT
     A SQL Server Agent job (Start Incremental Table Population on database_name.table_name) is associated with a new schedule after you exit the Full-Text Index Properties dialog box. If you create multiple schedules for the same full-text index, they all use the same job.

   - To change an existing schedule, select the existing schedule and click Edit. This opens the New Full-Text Indexing Table Schedule dialog box, where you can modify the schedule.
     NOTE
     For information about modifying a SQL Server Agent job, see Modify a Job.

   - To remove an existing schedule, select the existing schedule and click Delete.
5. Click OK.

Troubleshoot errors in a full-text population (crawl)

When an error occurs during a crawl, the Full-Text Search crawl logging facility creates and maintains a crawl log, which is a plain text file. Each crawl log corresponds to a particular full-text catalog. By default, crawl logs for a given instance (in this example, the default instance) are located in the %ProgramFiles%\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\LOG folder. Crawl log files use the following naming scheme:

    SQLFT<DatabaseID><FullTextCatalogID>.LOG[<n>]

The variable parts of the crawl log file name are the following:

- <DatabaseID>: The ID of a database, as a five-digit number with leading zeros.
- <FullTextCatalogID>: The full-text catalog ID, as a five-digit number with leading zeros.
- <n>: An integer that indicates that one or more crawl logs of the same full-text catalog exist.

For example, SQLFT0000500008.2 is the crawl log file for a database with database ID = 5 and full-text catalog ID = 8. The 2 at the end of the file name indicates that there are two crawl log files for this database/catalog pair.

See Also
sys.dm_fts_index_population (Transact-SQL)
Get Started with Full-Text Search
Create and Manage Full-Text Indexes
CREATE FULLTEXT INDEX (Transact-SQL)
ALTER FULLTEXT INDEX (Transact-SQL)

Improve the Performance of Full-Text Indexes
3/30/2017

This topic describes some of the common causes of poor performance for full-text indexes and queries. It also provides a few suggestions to mitigate these issues and improve performance.

Common causes of performance issues

Hardware resource issues

The performance of full-text indexing and full-text queries is influenced by hardware resources, such as memory, disk speed, CPU speed, and machine architecture.
The main cause of reduced full-text indexing performance is hardware-resource limits.

- CPU. If CPU usage by the filter daemon host process (fdhost.exe) or the SQL Server process (sqlservr.exe) is close to 100 percent, the CPU is the bottleneck.
- Memory. If there is a shortage of physical memory, memory might be the bottleneck.
- Disk. If the average disk-waiting queue length is more than two times the number of disk heads, there is a bottleneck on the disk. The primary workaround is to create full-text catalogs that are separate from the SQL Server database files and logs. Put the logs, database files, and full-text catalogs on separate disks. Installing faster disks and using RAID can also help improve indexing performance.

NOTE
Beginning in SQL Server 2008, the Full-Text Engine can use AWE memory because the Full-Text Engine is part of the sqlservr.exe process.

Full-text batching issues

If the system has no hardware bottlenecks, the indexing performance of full-text search mostly depends on the following:

- How long it takes SQL Server to create full-text batches.
- How quickly the filter daemon can consume those batches.

Full-text index population issues

- Type of population. Unlike full population, incremental, manual, and auto change tracking population are not designed to maximize hardware resources to achieve faster speed. Therefore, the tuning suggestions in this topic may not enhance performance for full-text indexing when it uses incremental, manual, or auto change tracking population.
- Master merge. When a population has completed, a final merge process is triggered that merges the index fragments together into one master full-text index. This improves query performance, since only the master index needs to be queried rather than a number of index fragments, and better scoring statistics may be used for relevance ranking.
However, the master merge can be I/O intensive, because large amounts of data must be written and read when index fragments are merged, though it does not block incoming queries.

Master merging a large amount of data can create a long-running transaction, delaying truncation of the transaction log during checkpoint. In this case, under the full recovery model, the transaction log might grow significantly. As a best practice, before reorganizing a large full-text index in a database that uses the full recovery model, ensure that your transaction log has sufficient space for a long-running transaction. For more information, see Manage the Size of the Transaction Log File.

Tune the performance of full-text indexes

To maximize the performance of your full-text indexes, implement the following best practices:

To use all CPU processors or cores to the maximum, set sp_configure 'max full-text crawl range' to the number of CPUs on the system. For information about this configuration option, see max full-text crawl range Server Configuration Option.
Make sure that the base table has a clustered index. Use an integer data type for the first column of the clustered index. Avoid using GUIDs in the first column of the clustered index. A multi-range population on a clustered index can produce the highest population speed. We recommend that the column serving as the full-text key be an integer data type.
Update the statistics of the base table by using the UPDATE STATISTICS statement. More importantly, update the statistics on the clustered index or the full-text key for a full population. This helps a multi-range population to generate good partitions on the table.
Before you perform a full population on a large multi-CPU computer, we recommend that you temporarily limit the size of the buffer pool by setting the max server memory value to leave enough memory for the fdhost.exe process and operating system use.
For more information, see "Estimate the memory requirements of the Filter Daemon Host process (fdhost.exe)," later in this topic.
If you use incremental population based on a timestamp column, build a secondary index on the timestamp column to improve the performance of incremental population.

Troubleshoot the performance of full populations

Review the full-text crawl logs

To help diagnose performance problems, look at the full-text crawl logs.

When an error occurs during a crawl, the Full-Text Search crawl logging facility creates and maintains a crawl log, which is a plain text file. Each crawl log corresponds to a particular full-text catalog. By default, crawl logs for a given instance (in this example, the default instance) are located in the %ProgramFiles%\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\LOG folder. Crawl log file names follow this naming scheme:

SQLFT<DatabaseID><FullTextCatalogID>.LOG[<n>]

The variable parts of the crawl log file name are the following:
<DatabaseID> - The ID of a database, as a five-digit number with leading zeros.
<FullTextCatalogID> - The full-text catalog ID, as a five-digit number with leading zeros.
<n> - An integer that indicates that one or more crawl logs of the same full-text catalog exist.

For example, SQLFT0000500008.2 is the crawl log file for a database with database ID = 5 and full-text catalog ID = 8. The 2 at the end of the file name indicates that there are two crawl log files for this database/catalog pair.

Check physical memory usage

During a full-text population, it is possible for fdhost.exe or sqlservr.exe to run low on memory or to run out of memory. If the full-text crawl log shows that fdhost.exe is being restarted often or that error code 8007008 is being returned, one of these processes is running out of memory. If fdhost.exe is producing dumps, particularly on large, multi-CPU computers, it might be running out of memory.
To get information about memory buffers used by a full-text crawl, see sys.dm_fts_memory_buffers (Transact-SQL).

The possible causes of low-memory or out-of-memory issues are the following:

Insufficient memory. If the amount of physical memory that is available during a full population is zero, the SQL Server buffer pool might be consuming most of the physical memory on the system. The sqlservr.exe process tries to grab all available memory for the buffer pool, up to the configured maximum server memory. If the max server memory allocation is too large, out-of-memory conditions and failure to allocate shared memory can occur for the fdhost.exe process. You can solve this problem by setting the max server memory value of the SQL Server buffer pool appropriately. For more information, see "Estimate the memory requirements of the Filter Daemon Host process (fdhost.exe)," later in this topic. Reducing the batch size used for full-text indexing may also help.
Memory contention. During a full-text population on a multi-CPU computer, contention for buffer pool memory can occur between fdhost.exe and sqlservr.exe. The resulting lack of shared memory causes batch retries, memory thrashing, and dumps by the fdhost.exe process.
Paging issues. An insufficient page-file size, such as on a system that has a small page file with restricted growth, can also cause fdhost.exe or sqlservr.exe to run out of memory. If the crawl logs do not indicate any memory-related failures, it is likely that performance is slow due to excessive paging.

Estimate the memory requirements of the Filter Daemon Host process (fdhost.exe)

The amount of memory required by the fdhost.exe process for a population depends mainly on the number of full-text crawl ranges it uses, the size of inbound shared memory (ISM), and the maximum number of ISM instances.
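As a rough illustration of how these three quantities combine, the following Python sketch encodes the estimates given in this topic: fdhost.exe memory as number_of_crawl_ranges * ism_size * max_outstanding_isms * 2 with the topic's default values, and the max server memory formulas for x86 and x64. The function and constant names are hypothetical. Like the topic's own formulas, it fixes ism_size at 8 MB for x64 (the real value can be 4, 8, or 16 MB), and when several full populations run at once it also subtracts the 500 MB allowance, which is an assumption rather than something the notes spell out. It only does arithmetic; it does not query the server.

```python
from typing import Sequence

# Sketch of the fdhost.exe and max server memory estimates. All values in MB.
# Names are hypothetical.

# Defaults from this topic. On x64, ism_size depends on physical memory
# (4, 8, or 16 MB); like the topic's formulas, this sketch assumes 8 MB.
DEFAULTS = {
    "x86": {"ism_size_mb": 1, "max_outstanding_isms": 25},
    "x64": {"ism_size_mb": 8, "max_outstanding_isms": 5},
}

# Estimated memory for other processes; increase it on busier systems.
OTHER_PROCESSES_MB = 500


def fdhost_memory_mb(crawl_ranges: int, platform: str = "x64") -> int:
    """F = number_of_crawl_ranges * ism_size * max_outstanding_isms * 2."""
    d = DEFAULTS[platform]
    return crawl_ranges * d["ism_size_mb"] * d["max_outstanding_isms"] * 2


def max_server_memory_mb(total_mb: int, fdhost_mbs: Sequence[int], platform: str = "x64") -> int:
    """M = T - sum(F_i) - 500 on x64; on x86, min(T, 2000) replaces T."""
    usable_mb = min(total_mb, 2000) if platform == "x86" else total_mb
    return usable_mb - sum(fdhost_mbs) - OTHER_PROCESSES_MB


if __name__ == "__main__":
    f = fdhost_memory_mb(8)              # 8 crawl ranges * 8 MB * 5 * 2
    m = max_server_memory_mb(8192, [f])  # 8 GB of RAM
    print(f, m)                          # prints 640 7052
```

With the inputs of the worked example in this topic (an x64 computer, 8 crawl ranges, 8192 MB of RAM), the sketch reproduces F = 640 and M = 7052. Passing one F value per concurrent full population mirrors note 1 of the formulas.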
The amount of memory (in bytes) consumed by the filter daemon host can be roughly estimated by using the following formula:

number_of_crawl_ranges * ism_size * max_outstanding_isms * 2

The default values of the variables in the preceding formula are as follows:

VARIABLE | DEFAULT VALUE
number_of_crawl_ranges | The number of CPUs
ism_size | 1 MB for x86 computers; 4 MB, 8 MB, or 16 MB for x64 computers, depending on the total physical memory
max_outstanding_isms | 25 for x86 computers; 5 for x64 computers

The following table presents guidelines about how to estimate the memory requirements of fdhost.exe. The formulas in this table use the following values:
F, which is an estimate of memory needed by fdhost.exe (in MB).
T, which is the total physical memory available on the system (in MB).
M, which is the optimal max server memory setting.

For essential information about the following formulas, see the notes that follow the table.

PLATFORM | ESTIMATING FDHOST.EXE MEMORY REQUIREMENTS IN MB (F, see note 1) | FORMULA FOR CALCULATING MAX SERVER MEMORY (M, see note 2)
x86 | F = Number of crawl ranges * 50 | M = minimum(T, 2000) - F - 500
x64 | F = Number of crawl ranges * 10 * 8 | M = T - F - 500

These per-range constants follow from the byte formula and the default values: 1 MB * 25 * 2 = 50 MB on x86, and 8 MB * 5 * 2 = 80 MB (written as 10 * 8) on x64.

Notes about the formulas
1. If multiple full populations are in progress, calculate the fdhost.exe memory requirements of each separately, as F1, F2, and so forth. Then calculate M as T - sigma(Fi).
2. 500 MB is an estimate of the memory required by other processes in the system. If the system is doing additional work, increase this value accordingly.
3. ism_size is assumed to be 8 MB for x64 platforms.

Example: Estimate the memory requirements of fdhost.exe

This example is for a 64-bit computer that has 8 GB of RAM and 4 dual-core processors. The first calculation estimates the memory needed by fdhost.exe (F). The number of crawl ranges is 8.

F = 8*10*8 = 640

The next calculation obtains the optimal value for max server memory (M). The total physical memory available on this system in MB (T) is 8192.
M = 8192-640-500 = 7052

Example: Setting max server memory

This example uses the sp_configure and RECONFIGURE Transact-SQL statements to set max server memory to the value calculated for M in the preceding example, 7052. Because max server memory is an advanced option, show advanced options is enabled first:

USE master;
GO
EXEC sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
EXEC sp_configure 'max server memory', 7052;
GO
RECONFIGURE;
GO

For more info about the server memory options, see Server Memory Server Configuration Options.

Check CPU usage

The performance of full populations is not optimal when the average CPU consumption is lower than about 30 percent. Here are some factors that affect CPU consumption.

High wait time for pages

To find out whether page wait time is high, run the following Transact-SQL statement:

SELECT TOP 10 * FROM sys.dm_os_wait_stats ORDER BY wait_time_ms DESC;

The following table describes the wait types of interest here.

WAIT TYPE | DESCRIPTION | POSSIBLE RESOLUTION
PAGEIO_LATCH_SH (_EX or _UP) | This could indicate an I/O bottleneck, in which case you would typically also see a high average disk-queue length. | Moving the full-text index to a different filegroup on a different disk could help reduce the I/O bottleneck.
PAGELATCH_EX (or _UP) | This could indicate a lot of contention among threads that are trying to write to the same database file. | Adding files to the filegroup on which the full-text index resides could help alleviate such contention.

For more info, see sys.dm_os_wait_stats (Transact-SQL).

Inefficiencies in scanning the base table

A full population scans the base table to produce batches. This table scanning could be inefficient in the following scenarios:
If the base table has a high percentage of out-of-row columns that are being full-text indexed, scanning the base table to produce batches might be the bottleneck. In this case, moving the smaller data in-row using varchar(max) or nvarchar(max) might help.
If the base table is very fragmented, scanning might be inefficient.
For information about computing out-of-row data and index fragmentation, see sys.dm_db_partition_stats (Transact-SQL) and sys.dm_db_index_physical_stats (Transact-SQL).

To reduce fragmentation, you can reorganize or rebuild the clustered index. For more information, see Reorganize and Rebuild Indexes.

Troubleshoot slow indexing of documents

NOTE This section describes an issue that only affects customers who index documents (such as Microsoft Word documents) in which other document types are embedded.

The Full-Text Engine uses two types of filters when it populates a full-text index: multithreaded filters and single-threaded filters. Some documents, such as Microsoft Word documents, are filtered using a multithreaded filter. Other documents, such as Adobe Acrobat Portable Document Format (PDF) documents, are filtered using a single-threaded filter.

For security reasons, filters are loaded by the filter daemon host processes. A server instance uses a multithreaded process for all multithreaded filters and a single-threaded process for all single-threaded filters. When a document that uses a multithreaded filter contains an embedded document that uses a single-threaded filter, the Full-Text Engine launches a single-threaded process for the embedded document. For example, on encountering a Word document that contains a PDF document, the Full-Text Engine uses the multithreaded process for the Word content and launches a single-threaded process for the PDF content.

A single-threaded filter might not work well in this environment, however, and could destabilize the filtering process. In certain circumstances where such embedding is common, destabilization might lead to crashes of the process. When this occurs, the Full-Text Engine re-routes any failed document (for example, a Word document that contains embedded PDF content) to the single-threaded filtering process. If re-routing occurs frequently, it degrades the performance of the full-text indexing process.
To work around this problem, mark the filter for the container document (the Word document, in this example) as a single-threaded filter. To mark a filter as a single-threaded filter, set the ThreadingModel registry value for the filter to Apartment Threaded. For information about single-threaded apartments, see the white paper Understanding and Using COM Threading Models.

See Also
Server Memory Server Configuration Options
max full-text crawl range Server Configuration Option
Populate Full-Text Indexes
Create and Manage Full-Text Indexes
sys.dm_fts_memory_buffers (Transact-SQL)
sys.dm_fts_memory_pools (Transact-SQL)
Troubleshoot Full-Text Indexing

Troubleshoot Full-Text Indexing
3/24/2017

Troubleshoot Full-Text Indexing Failures

While populating or maintaining a full-text index, the full-text indexer might fail to index one or more rows, for the reasons described below. These row-level errors do not prevent the population from completing. The indexer skips these rows, which means that you cannot query for content contained in these rows.

Indexing failures can occur when:
The indexer cannot find or load a filter or word breaker component. This failure can occur if the table row contains a document format or content in a language that has not been registered with the instance of SQL Server. This failure can also happen if the registered word breaker or filter component was not signed or failed signature verification when it was being loaded.
A component, such as a word breaker or filter, fails and returns an error to the indexer. This can happen if the document being indexed is corrupt and the filter is unable to extract text from the document. This can also occur when a component is unable to handle the content of a single row above a certain size, due to memory limits on the full-text filter daemon host (fdhost.exe).

For each row-level failure, the crawl log contains details on the reason for the failure.
The error counts are summarized at the end of a full or incremental population.

There are other failures that can impact the indexing process itself and prevent the population from completing:
The full-text index exceeds the limit for the number of rows that can be contained in a full-text catalog.
A clustered index or full-text key index on the table being indexed gets altered, dropped, or rebuilt.
A hardware failure or disk corruption results in the corruption of the full-text catalog.
A filegroup that contains the table being full-text indexed goes offline, or is made read-only.

You should view the crawl log at the end of any significant full-text index population operation, or when you find that a population did not complete.

Unsigned Components

By default, the full-text indexer requires the filters and word breakers that it loads to be signed. If they are not signed, which is sometimes the case when custom components are installed, you must configure the full-text indexer to ignore signature verification.

IMPORTANT Ignoring signature verification makes the instance of SQL Server less secure. We recommend that you sign any components that you implement or ensure that any components that you acquire are signed. For information about signing components, see sp_fulltext_service (Transact-SQL).

Full-Text Index in Inconsistent State after Transaction Log Restored

When restoring the transaction log of a database, you might see a warning indicating that the full-text index is not in a consistent state. This happens because the full-text index on a table was modified after the database was backed up. To bring the full-text index to a consistent state, you must run a full population (crawl) on the table. For more information, see Populate Full-Text Indexes.
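Because the troubleshooting steps above start from the crawl log, it can help to map a log file back to its database and catalog. The following Python sketch parses the SQLFT<DatabaseID><FullTextCatalogID>.LOG[<n>] naming scheme described in the crawl-log sections of this document. The helper names are hypothetical, it reads only file names (not log contents), and the exact set of suffix forms it accepts (".LOG", ".LOG<n>", ".LOG.<n>", and the ".<n>" form of the SQLFT0000500008.2 example) is an assumption.

```python
import re
from typing import NamedTuple, Optional

# SQLFT<DatabaseID><FullTextCatalogID>.LOG[<n>]: both IDs are five-digit,
# zero-padded numbers. The accepted suffix variants are an assumption.
_CRAWL_LOG_RE = re.compile(
    r"^SQLFT(\d{5})(\d{5})(?:\.LOG(?:\.?(\d+))?|\.(\d+))?$", re.IGNORECASE
)


class CrawlLogName(NamedTuple):
    database_id: int
    catalog_id: int
    n: Optional[int]  # trailing <n>, or None if the name has none


def parse_crawl_log_name(filename: str) -> Optional[CrawlLogName]:
    """Return the IDs encoded in a crawl log file name, or None if it does not match."""
    m = _CRAWL_LOG_RE.match(filename)
    if m is None:
        return None
    db_id, cat_id, n_after_log, n_plain = m.groups()
    n = n_after_log or n_plain
    return CrawlLogName(int(db_id), int(cat_id), int(n) if n else None)


def crawl_log_prefix(database_id: int, catalog_id: int) -> str:
    """Build the SQLFT<DatabaseID><FullTextCatalogID> prefix for one catalog."""
    return f"SQLFT{database_id:05d}{catalog_id:05d}"
```

To list the logs for one catalog, the database ID (from DB_ID()) and the catalog's fulltext_catalog_id (from sys.fulltext_catalogs) can be turned into a prefix and matched against the LOG folder, for example with pathlib.Path(log_dir).glob(crawl_log_prefix(5, 8) + "*"), where log_dir is a placeholder for the instance's log folder.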
See Also
ALTER FULLTEXT CATALOG (Transact-SQL)
Populate Full-Text Indexes

Back Up and Restore Full-Text Catalogs and Indexes
3/24/2017

THIS TOPIC APPLIES TO: SQL Server (starting with 2016), Azure SQL Database, Azure SQL Data Warehouse, Parallel Data Warehouse

This topic explains how to back up and restore full-text indexes created in SQL Server. In SQL Server, the full-text catalog is a logical concept and does not reside in a filegroup. Therefore, to back up a full-text catalog in SQL Server, you must identify every filegroup that contains a full-text index that belongs to the catalog. Then you must back up those filegroups, one by one.

IMPORTANT It is possible to import full-text catalogs when upgrading a SQL Server 2005 database. Each imported full-text catalog is a database file in its own filegroup. To back up an imported catalog, simply back up its filegroup. For more information, see Backing Up and Restoring Full-Text Catalogs, in SQL Server 2005 Books Online.

Backing Up the Full-Text Indexes of a Full-Text Catalog

Finding the Full-Text Indexes of a Full-Text Catalog

You can retrieve the properties of the full-text indexes by using the following SELECT statement, which selects columns from the sys.fulltext_indexes and sys.fulltext_catalogs catalog views.

USE AdventureWorks2012;
GO
DECLARE @TableID int;
SET @TableID = (SELECT OBJECT_ID('AdventureWorks2012.Production.Product'));
SELECT OBJECT_NAME(@TableID) AS table_name,
       i.is_enabled,
       i.change_tracking_state,
       i.has_crawl_completed,
       i.crawl_type,
       c.name AS fulltext_catalog_name
FROM sys.fulltext_indexes AS i
JOIN sys.fulltext_catalogs AS c
    ON i.fulltext_catalog_id = c.fulltext_catalog_id
WHERE i.object_id = @TableID; -- only the full-text index of the table named above
GO

Finding the Filegroup or File That Contains a Full-Text Index

When a full-text index is created, it is placed in one of the following locations:
A user-specified filegroup.
The same filegroup as the base table or view, for a nonpartitioned table.
The primary filegroup, for a partitioned table.
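Because the indexes of one catalog can live in several filegroups, backing up a catalog amounts to backing up the distinct set of filegroups that hold its indexes. The following Python sketch models the placement rules listed above to derive that set. The class and function names and the sample data are hypothetical; on a real database, the catalog-view queries in this topic (sys.fulltext_indexes, sys.fulltext_catalogs, sys.filegroups) give the actual answer, and the sketch only mirrors the rules.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FullTextIndexInfo:
    """Hypothetical summary of one full-text index and its base table."""
    table: str
    catalog: str
    base_filegroup: str                   # filegroup of the base table or view
    partitioned: bool = False             # is the base table partitioned?
    user_filegroup: Optional[str] = None  # filegroup specified when the index was created


def index_filegroup(ix: FullTextIndexInfo) -> str:
    """Placement rules: user-specified, else PRIMARY if partitioned, else the base table's."""
    if ix.user_filegroup is not None:
        return ix.user_filegroup
    if ix.partitioned:
        return "PRIMARY"
    return ix.base_filegroup


def filegroups_to_back_up(indexes: Iterable[FullTextIndexInfo], catalog: str) -> List[str]:
    """Every filegroup holding a full-text index of `catalog`, each listed once."""
    return sorted({index_filegroup(ix) for ix in indexes if ix.catalog == catalog})
```

For example, a catalog whose indexes sit on a nonpartitioned table in a DOCS filegroup, on a partitioned table, and on a table indexed into a user-specified FT_FG filegroup yields DOCS, FT_FG, and PRIMARY: three filegroup backups, even though the catalog itself is a single logical object.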
NOTE For information about creating a full-text index, see Create and Manage Full-Text Indexes and CREATE FULLTEXT INDEX (Transact-SQL).

To find the filegroup of the full-text index on a table or view, use the following query, where object_name is the name of the table or view:

SELECT f.name
FROM sys.filegroups AS f
JOIN sys.fulltext_indexes AS i
    ON f.data_space_id = i.data_space_id
WHERE i.object_id = OBJECT_ID('object_name');
GO

Backing Up the Filegroups That Contain Full-Text Indexes

After you find the filegroups that contain the indexes of a full-text catalog, you need to back up each of the filegroups. During the backup process, full-text catalogs may not be dropped or added.

The first backup of a filegroup must be a full file backup. After you have created a full file backup for a filegroup, you can back up only the changes in the filegroup by creating a series of one or more differential file backups that are based on the full file backup.

To back up files and filegroups
Back Up Files and Filegroups (SQL Server)
BACKUP (Transact-SQL)

Restoring a Full-Text Index

Restoring a backed-up filegroup restores the full-text index files, as well as the other files in the filegroup. By default, the filegroup is restored to the disk location on which the filegroup was backed up.

If a full-text indexed table was online and a population was running when the backup was created, the population is resumed after the restore.

To restore a filegroup
Restore Files and Filegroups (SQL Server)
Restore Files and Filegroups over Existing Files (SQL Server)
Restore Files to a New Location (SQL Server)
RESTORE (Transact-SQL)

See Also
Manage and Monitor Full-Text Search for a Server Instance
Upgrade Full-Text Search

Configure and Manage Filters for Search
3/24/2017

Indexing documents in a varbinary, varbinary(max), image, or xml data type column requires extra processing. This processing must be performed by a filter.
The filter extracts the textual information from the document (removing the formatting). The filter then sends the text to the word-breaker component for the language associated with the table column.

A given filter is specific to a given document type (.doc, .pdf, .xls, .xml, and so forth). These filters implement the IFilter interface. For more information about these document types, query the sys.fulltext_document_types catalog view.

Binary documents can be stored in a single varbinary(max) or image column. For each document, SQL Server chooses the correct filter based on the file extension. Because the file extension is not visible when the file is stored in a varbinary(max) or image column, the file extension (.doc, .xls, .pdf, and so forth) must be stored in a separate column in the table, called a type column. This type column can be of any character-based data type and contains the document file extension, such as .doc for a Microsoft Word document. In the Document table in Adventure Works, the Document column is of type varbinary(max), and the type column, FileExtension, is of type nvarchar(8).

NOTE A filter might be able to handle objects embedded in the parent object, depending on its implementation. However, SQL Server does not configure filters to follow links to other objects.

SQL Server installs its own XML and HTML filters. In addition, any filters for Microsoft proprietary formats (.doc, .xdoc, .ppt, and so on) that are already installed on the operating system are also loaded by SQL Server. To identify the filters that are currently loaded on an instance of SQL Server, use the sp_help_fulltext_system_components stored procedure, as follows:

EXEC sp_help_fulltext_system_components 'filter';

Before you can use filters for non-Microsoft formats, however, you must manually load them into the server instance. For information about installing additional filters, see View or Change Registered Filters and Word Breakers.
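To make the role of the type column concrete, the following Python sketch models the lookup described above: each row carries a type-column value, and a row can be filtered only if a filter is loaded for that extension. Rows without one correspond to the "cannot find or load a filter" row-level failure described under Troubleshoot Full-Text Indexing. The function name and the registered set are hypothetical, .pdf is deliberately left out to stand for a non-Microsoft format whose filter has not been loaded, and the case-insensitive comparison is this sketch's choice rather than documented SQL Server behavior.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical set of document types with a loaded filter. On a real instance,
# query sys.fulltext_document_types or run sp_help_fulltext_system_components 'filter'.
REGISTERED_TYPES: Set[str] = {".doc", ".xls", ".ppt", ".xml", ".html"}


def split_by_filter(
    rows: Iterable[Tuple[int, str]], registered: Set[str] = REGISTERED_TYPES
) -> Tuple[List[int], List[int]]:
    """Split (full-text key, type-column value) rows into keys with and without a filter.

    The comparison ignores case and surrounding spaces; that is a choice made
    by this sketch, not documented SQL Server behavior.
    """
    with_filter: List[int] = []
    without_filter: List[int] = []
    for key, extension in rows:
        if extension.strip().lower() in registered:
            with_filter.append(key)
        else:
            without_filter.append(key)
    return with_filter, without_filter
```

Rows that land in the second list are the ones a population would skip until a matching filter is installed and loaded, which is why the type column must hold a real extension such as .doc rather than an arbitrary label.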
To view the type column in an existing full-text index
sys.fulltext_index_columns (Transact-SQL)

See Also
sys.fulltext_index_columns (Transact-SQL)
FILESTREAM Compatibility with Other SQL Server Features

Configure and Manage Word Breakers and Stemmers for Search
3/31/2017

Word breakers and stemmers perform linguistic analysis on all full-text indexed data. Linguistic analysis does the following two things:
Find word boundaries (word-breaking). The word breaker identifies individual words by determining where word boundaries exist based on the lexical rules of the language. Each word (also known as a token) is inserted into the full-text index using a compressed representation to reduce its size.
Conjugate verbs (stemming). The stemmer generates inflectional forms of a particular word based on the rules of that language (for example, "running", "ran", and "runner" are various forms of the word "run").

Word breakers and stemmers are language specific

Word breakers and stemmers are language specific, and the rules for linguistic analysis differ for different languages. Language-specific word breakers make the resulting terms more accurate for that language.

To use the word breakers and stemmers provided for all the languages supported by SQL Server, you typically don't have to take any action.

Where there is a word breaker for the language family, but not for the specific sub-language, the major language is used. For example, the French word breaker is used to handle text that is French Canadian.

If no word breaker is available for a particular language, the neutral word breaker is used. With the neutral word breaker, words are broken at neutral characters such as spaces and punctuation marks.

Get the list of supported languages

To see the list of languages supported by SQL Server Full-Text Search, use the following Transact-SQL statement.
The presence of a language in this list indicates that word breakers are registered for the language.

SELECT * FROM sys.fulltext_languages;

Get the list of registered word breakers

For Full-Text Search to use the word breakers for a language, they must be registered. For registered word breakers, the associated linguistic resources, namely stemmers, noise words (stopwords), and thesaurus files, also become available to full-text indexing and querying operations.

To see the list of registered word breaker components, use the following statement.

EXEC sp_help_fulltext_system_components 'wordbreaker';
GO

For additional options and more info, see sp_help_fulltext_system_components (Transact-SQL).

If you add or remove a word breaker

If you add, remove, or alter a word breaker, you need to refresh the list of Microsoft Windows locale identifiers (LCIDs) that are supported for full-text indexing and querying. For more information, see View or Change Registered Filters and Word Breakers.

Set the default full-text language option

For a localized version of SQL Server, SQL Server Setup sets the default full-text language option to the language of the server if an appropriate match exists. For a non-localized version of SQL Server, the default full-text language option is English.

When you create or alter a full-text index, you can specify a different language for each full-text indexed column. If no language is specified for a column, the default is the value of the configuration option default full-text language.

NOTE All columns listed in a single full-text query function clause must use the same language, unless the LANGUAGE option is specified in the query. The language used for the full-text indexed column being queried determines the linguistic analysis performed on arguments of the full-text query predicates (CONTAINS and FREETEXT) and functions (CONTAINSTABLE and FREETEXTTABLE).
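The language rules in this topic (the column language or else the default full-text language, then the major language of a language family, then the neutral word breaker) can be sketched as a small resolution function. This is a hypothetical Python model, not SQL Server's internal algorithm: it assumes that "same language family" means the same primary language ID, which is the low 10 bits of a Windows LCID, and it picks the lowest matching LCID when several are registered. In the test data, 3084 is French (Canada), 1036 is French (France), 1033 is US English, and 0 is the neutral language.

```python
from typing import Iterable, Optional

NEUTRAL_LCID = 0  # LCID of the neutral word breaker


def primary_language_id(lcid: int) -> int:
    """The low 10 bits of the LANGID part of an LCID identify the language family."""
    return lcid & 0x3FF


def resolve_word_breaker_lcid(
    column_lcid: Optional[int], registered: Iterable[int], default_lcid: int
) -> int:
    """Pick the LCID whose word breaker would index a column (hypothetical model)."""
    available = set(registered)
    # 1. Column language if one was specified, else the default full-text language.
    lcid = default_lcid if column_lcid is None else column_lcid
    if lcid in available:
        return lcid
    # 2. A registered language of the same family (assumed: same primary language ID).
    family = sorted(
        r for r in available
        if r != NEUTRAL_LCID and primary_language_id(r) == primary_language_id(lcid)
    )
    if family:
        return family[0]
    # 3. Otherwise, fall back to the neutral word breaker.
    return NEUTRAL_LCID
```

Under these assumptions, a French Canadian column (3084) on an instance that registers only French (1036) resolves to 1036, matching the French Canadian example in this topic, and a column with no language specified simply takes the default full-text language.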
Choose the language for an indexed column

When creating a full-text index, we recommend that you specify a language for each indexed column. If a language is not specified for a column, the system default language is used. The language of a column determines which word breaker and stemmer are used for indexing that column. Also, the thesaurus file of that language will be used by full-text queries on the column.

There are a couple of things to consider when choosing the column language for creating a full-text index. These considerations relate to how your text is tokenized and then indexed by the Full-Text Engine. For more information, see Choose a Language When Creating a Full-Text Index.

To view the word breaker language of specific columns, run the following statement.

SELECT OBJECT_NAME(object_id) AS table_name,
       COL_NAME(object_id, column_id) AS column_name,
       language_id AS LCID
FROM sys.fulltext_index_columns;

For additional options and more info, see sys.fulltext_index_columns (Transact-SQL).

Troubleshoot word-breaking time-out errors

A word-breaking time-out error may occur in a variety of situations. For information about these situations and how to respond in each situation, see MSSQLSERVER_30053.

Info about the MSSQLSERVER_30053 error

PROPERTY | VALUE
Product Name | SQL Server
Event ID | 30053
Event Source | MSSQLSERVER
Component | SQLEngine
Symbolic Name | FTXT_QUERY_E_WORDBREAKINGTIMEOUT
Message Text | Word breaking timed out for the full-text query string. This can happen if the wordbreaker took a long time to process the full-text query string, or if a large number of queries are running on the server. Try running the query again under a lighter load.

Explanation

A word-breaking timeout error can occur in the following situations:
The word breaker for the query language is configured incorrectly; for example, its registry settings are incorrect.
The word breaker malfunctions for a specific query string.
The word breaker returns too much data for a specific query string.
Excess data is treated as a potential buffer overrun attack, and it shuts down the filter daemon process (fdhost.exe), which hosts the word-breaking services.
The filter daemon process configuration is incorrect. The most common configuration problems are password expiration or a domain policy that prevents the filter daemon account from logging on.
A very heavy query workload is running on the server instance; for example, the word breaker took a long time to process the full-text query string, or a large number of queries are running on the server. Note that this is the least likely cause.

User Action

Select the user action that is appropriate to the probable cause of the timeout, as follows:

PROBABLE CAUSE | USER ACTION
The word breaker for the query language is configured incorrectly. | If you are using a third-party word breaker, it might be incorrectly registered with the operating system. In this case, re-register the word breaker. For more information, see Revert the Word Breakers Used by Search to the Previous Version.
The word breaker malfunctions for a specific query string. | If the word breaker is supported by SQL Server, contact Microsoft Customer Service and Support.
The word breaker returns too much data for a specific query string. | If the word breaker is supported by SQL Server, contact Microsoft Customer Service and Support.
The filter daemon process configuration is incorrect. | Ensure that you are using the current password and that a domain policy is not preventing the filter daemon account from logging on.
A very heavy query workload is running on the server instance. | Try running the query again under a lighter load.

Understand the impact of updated word breakers

Each version of SQL Server typically includes new word breakers that have better linguistic rules and are more accurate than earlier word breakers.
Potentially, the new word breakers might behave slightly differently from the word breakers in full-text indexes that were imported from previous versions of SQL Server. This is significant if a full-text catalog was imported when a database was upgraded to the current version of SQL Server. One or more languages used by the full-text indexes in the full-text catalog might now be associated with new word breakers. For more information, see Upgrade Full-Text Search.

See Also
CREATE FULLTEXT INDEX (Transact-SQL)
ALTER FULLTEXT INDEX (Transact-SQL)
Configure and Manage Stopwords and Stoplists for Full-Text Search

View or Change Registered Filters and Word Breakers
3/24/2017

After any word breakers or filters are installed or uninstalled on a system, the changes do not automatically take effect on server instances. This topic describes how to view the currently registered word breakers or filters and how to register newly installed word breakers and filters on an instance of SQL Server.

To view a list of languages whose word breakers are currently registered
1. Use the sys.fulltext_languages catalog view, as follows:
SELECT * FROM sys.fulltext_languages;

To view a list of the filters that are currently registered
1. Use the sp_help_fulltext_system_components system stored procedure, as follows:
EXEC sp_help_fulltext_system_components 'filter';

To register newly installed word breakers and filters
1. Use the sp_fulltext_service system stored procedure to update the list of languages, as follows:
EXEC sp_fulltext_service 'update_languages';

To unregister uninstalled word breakers and filters
1. Use sp_fulltext_service to update the list of languages, as follows:
EXEC sp_fulltext_service 'update_languages';
2. Use sp_fulltext_service to restart the filter daemon host processes (fdhost.exe), as follows:
EXEC sp_fulltext_service 'restart_all_fdhosts';

To replace existing word breakers or filters when installing new ones
1.
When preparing to install a DLL file that contains new word breakers or filters, verify that it has a different file name from any of the existing DLL files installed on your server instance.
2. Copy the new DLL file into the directory containing the standard SQL Server DLL files for the server instance. The default location is:
C:\Program Files\Microsoft SQL Server\MSSQL.instance_name\MSSQL\Binn

IMPORTANT We highly recommend that you load only signed and verified components. Also, we recommend that you run the FDHOST Launcher (MSSQLFDLauncher) service with the least possible privileges.

3. Install the new word breaker or filters.
To install and load Microsoft Filter Pack IFilters, see How to register Microsoft Filter Pack IFilters with SQL Server.
4. Use sp_fulltext_service to load newly installed word breakers and filters in the server instance, as follows:
EXEC sp_fulltext_service @action='load_os_resources', @value=1;
5. Use sp_fulltext_service to update the list of languages, as follows:
EXEC sp_fulltext_service 'update_languages';
6. Restart the filter daemon host processes (fdhost.exe), using sp_fulltext_service as follows:
EXEC sp_fulltext_service 'restart_all_fdhosts';

See Also
Set the Service Account for the Full-text Filter Daemon Launcher
Configure and Manage Filters for Search
Configure and Manage Word Breakers and Stemmers for Search

Change the Word Breaker Used for US English and UK English
3/24/2017

SQL Server 2016 installs a new version (version 14.0.4999.1038) of the word breaker and stemmer for the English language, replacing the previous version of these components (version 12.0.6828.0). For information about the changed behavior of the new components, see Behavior Changes to Full-Text Search.

This topic describes how to switch from the new version of these components to the previous version, or to switch back from the previous version to the new version.
For cluster installations, make these changes on all the primary and passive nodes.

Previous versions of SQL Server used different word breakers, represented by different CLSIDs, for US English (LCID 1033) and UK English (LCID 2057). In this release, both LCIDs use the same components with the same CLSIDs, as shown in the following table:

LCID | Word breaker installed by previous versions (12.0.6828.0) | Stemmer installed by previous versions (12.0.6828.0) | Word breaker installed by this version (14.0.4999.1038) | Stemmer installed by this version (14.0.4999.1038)
1033 (US English) | 188D6CC5-CB03-4C01-912E-47D21295D77E | EEED4C20-7F1B-11CE-BE57-00AA0051FE20 | 9faed859-0b30-4434-ae65-412e14a16fb8 | e1e5ef84-c4a6-4e50-8188-99aef3de2659
2057 (UK English) | 173C97E2-AEBE-437C-9445-01B237ABF2F6 | D99F7670-7F1A-11CE-BE57-00AA0051FE20 | 9faed859-0b30-4434-ae65-412e14a16fb8 | e1e5ef84-c4a6-4e50-8188-99aef3de2659

The components described in this topic are DLL files that are installed in the MSSQL\Binn folder for the SQL Server instance. The full path is typically C:\Program Files\Microsoft SQL Server\<instance>\MSSQL\Binn. For more information about word breakers and stemmers, see Configure and Manage Word Breakers and Stemmers for Search.

Switching from the current English word breaker to the previous English word breakers

To switch from the current version of the US English word breaker to the previous version
1. In the registry, navigate to the following node: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\\MSSearch\CLSID.
2. Use the following steps to add new keys for the COM ClassIDs of the previous US English word breaker and stemmer interfaces for LCID 1033:
   a. Add a new key with the value {188D6CC5-CB03-4C01-912E-47D21295D77E} for the previous word breaker.
   b. Update the (Default) data of that key to langwrbk.dll.
   c. Add a new key with the value {EEED4C20-7F1B-11CE-BE57-00AA0051FE20} for the previous stemmer.
   d. Update the (Default) data of that key to infosoft.dll.
3. In the registry, navigate to the following node: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\\MSSearch\Language\enu.
4. Update the WBreakerClass key value to {188D6CC5-CB03-4C01-912E-47D21295D77E}.
5. Update the StemmerClass key value to {EEED4C20-7F1B-11CE-BE57-00AA0051FE20}.
6. Restart SQL Server.

To switch from the current version of the UK English word breaker to the previous version
1. In the registry, navigate to the following node: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\\MSSearch\CLSID.
2. Use the following steps to add new keys for the COM ClassIDs of the previous UK English word breaker and stemmer interfaces for LCID 2057:
   a. Add a new key with the value {173C97E2-AEBE-437C-9445-01B237ABF2F6} for the previous word breaker.
   b. Update the (Default) data of that key to langwrbk.dll.
   c. Add a new key with the value {D99F7670-7F1A-11CE-BE57-00AA0051FE20} for the previous stemmer.
   d. Update the (Default) data of that key to infosoft.dll.
3. In the registry, navigate to the following node: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\\MSSearch\Language\eng.
4. Update the WBreakerClass key value to {173C97E2-AEBE-437C-9445-01B237ABF2F6}.
5. Update the StemmerClass key value to {D99F7670-7F1A-11CE-BE57-00AA0051FE20}.
6. Restart SQL Server.

Switching back from the previous English word breakers to the current English word breaker

To switch back from the previous version of the US English word breaker to the current version
1. In the registry, navigate to the following node: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\\MSSearch\CLSID.
2. If the following keys do not exist, use the following steps to add new keys for the COM ClassIDs of the current US English word breaker and stemmer interfaces for LCID 1033:
   a. Add a new key with the value {9faed859-0b30-4434-ae65-412e14a16fb8} for the current word breaker.
   b. Update the (Default) data of that key to MsWb7.dll.
   c. Add a new key with the value {e1e5ef84-c4a6-4e50-8188-99aef3de2659} for the current stemmer.
   d. Update the (Default) data of that key to MsWb7.dll.
3. In the registry, navigate to the following node: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\\MSSearch\Language\enu.
4. Update the WBreakerClass key value to {9faed859-0b30-4434-ae65-412e14a16fb8}.
5. Update the StemmerClass key value to {e1e5ef84-c4a6-4e50-8188-99aef3de2659}.
6. Restart SQL Server.

To switch back from the previous version of the UK English word breaker to the current version
1. In the registry, navigate to the following node: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\\MSSearch\CLSID.
2. If the following keys do not exist, use the following steps to add new keys for the COM ClassIDs of the current UK English word breaker and stemmer interfaces for LCID 2057:
   a. Add a new key with the value {9faed859-0b30-4434-ae65-412e14a16fb8} for the current word breaker.
   b. Update the (Default) data of that key to MsWb7.dll.
   c. Add a new key with the value {e1e5ef84-c4a6-4e50-8188-99aef3de2659} for the current stemmer.
   d. Update the (Default) data of that key to MsWb7.dll.
3. In the registry, navigate to the following node: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\\MSSearch\Language\eng.
4. Update the WBreakerClass key value to {9faed859-0b30-4434-ae65-412e14a16fb8}.
5. Update the StemmerClass key value to {e1e5ef84-c4a6-4e50-8188-99aef3de2659}.
6. Restart SQL Server.

See Also
Revert the Word Breakers Used by Search to the Previous Version
Behavior Changes to Full-Text Search

Revert the Word Breakers Used by Search to the Previous Version
3/30/2017 • 12 min to read

SQL Server 2016 installs and enables a version of the word breakers and stemmers for all languages supported by Full-Text Search, with the exception of Korean.
This topic describes how to switch from this version of these components to the previous version, or to switch back from the previous version to the new version.

This topic does not discuss the following languages:
English. To revert or restore the English components, see Change the Word Breaker Used for US English and UK English.
Danish, Polish, and Turkish. The third-party word breakers for Danish, Polish, and Turkish that were included with previous releases of SQL Server have been replaced with Microsoft components.
Czech and Greek. There are new word breakers for Czech and Greek. Previous releases of SQL Server Full-Text Search did not support these two languages.
Korean. The word breaker and stemmer for the Korean language are not upgraded in this release.

For general information about word breakers and stemmers, see Configure and Manage Word Breakers and Stemmers for Search.

Overview of reverting and restoring word breakers and stemmers
The instructions for reverting and restoring word breakers and stemmers depend on the language. The following table summarizes the three sets of actions that may be required to revert to the previous version of the components.

Current file | Previous file | Number of affected languages | Action for files | Action for registry entries
NaturalLanguage6.dll | NaturalLanguage6.dll | 34 | Obtain and install a previous version of NaturalLanguage6.dll, overwriting the current version of the file. | No action required. The registry keys and values have not changed for this release.
(Other file name) | NaturalLanguage6.dll | 5 | Obtain and install a previous version of NaturalLanguage6.dll, overwriting the current version of the file. | Change a set of registry entries to specify the previous version of the components.
(Other file name) | (Other file name) | 6 | No action required. | Change a set of registry entries to specify the previous version of the components.
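The overview table reduces to a simple rule keyed on the current and previous file names. The following Python sketch encodes that rule for illustration only: it is derived from the three rows of the table, it is not a SQL Server tool or API, and the function name and returned action labels are hypothetical.

```python
# Hypothetical helper that encodes the three rows of the overview table.
NATURAL_LANGUAGE_DLL = "naturallanguage6.dll"


def revert_actions(current_file: str, previous_file: str) -> tuple:
    """Return (action for files, action for registry entries) for one language."""
    current_is_nl6 = current_file.lower() == NATURAL_LANGUAGE_DLL
    previous_is_nl6 = previous_file.lower() == NATURAL_LANGUAGE_DLL

    # Row 1: both versions live in NaturalLanguage6.dll, so only the file changes.
    if current_is_nl6 and previous_is_nl6:
        return ("overwrite NaturalLanguage6.dll", "none")
    # Row 2: only the previous version is NaturalLanguage6.dll, so both change.
    if previous_is_nl6:
        return ("overwrite NaturalLanguage6.dll", "change registry entries")
    # Row 3: neither version is NaturalLanguage6.dll; setup already copied both DLLs.
    if not current_is_nl6:
        return ("none", "change registry entries")
    # Current is NaturalLanguage6.dll but previous is not: no row covers this.
    raise ValueError("combination not covered by the overview table")


if __name__ == "__main__":
    print(revert_actions("NaturalLanguage6.dll", "NaturalLanguage6.dll"))
    print(revert_actions("MsWb7.dll", "NaturalLanguage6.dll"))
    print(revert_actions("MsWb70804.dll", "chsbrkr.dll"))
```

The file names in the usage lines come from the per-language tables later in this topic (German uses MsWb7.dll, Simplified Chinese uses MsWb70804.dll and chsbrkr.dll).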
SQL Server 2016 setup copies both the current and the previous versions of the components to the Binn folder.

WARNING: If you replace the current version of the file NaturalLanguage6.dll with a different version, the behavior of all the languages that use this file is affected.

The files described in this topic are DLL files that are installed in the MSSQL\Binn folder for the SQL Server instance. The full path is typically:
C:\Program Files\Microsoft SQL Server\<instance>\MSSQL\Binn

Languages for which the file name of both the current and previous word breaker is NaturalLanguage6.dll
For the languages in the following table, the file name of both the current and previous word breaker is NaturalLanguage6.dll. To revert or restore these components, you have to overwrite NaturalLanguage6.dll with a different version of the same file. You do not have to change any registry entries, because the registry entries have not changed for this release.

WARNING: If you replace the current version of the file NaturalLanguage6.dll with a different version, the behavior of all the languages that use this file is affected.

List of affected languages
Language | Abbreviation used in the registry | LCID
Bengali | ben | 1093
Bulgarian | bgr | 1026
Catalan | cat | 1027
Spanish | esn | 3082
French | fra | 1036
Gujarati | guj | 1095
Hebrew | heb | 1037
Hindi | hin | 1081
Croatian | hrv | 1050
Indonesian | ind | 1057
Icelandic | isl | 1039
Italian | ita | 1040
Kannada | kan | 1099
Lithuanian | lth | 1063
Latvian | lvi | 1062
Malayalam | mal | 1100
Marathi | mar | 1102
Malay | msl | 1086
Neutral | Neutral | 0000
Norwegian Bokmaal | nor | 1044
Punjabi | pan | 1094
Brazilian Portuguese | ptb | 1046
Portuguese | ptg | 2070
Romanian | rom | 1048
Slovak | sky | 1051
Slovenian | slv | 1060
Serbian - Cyrillic | srb | 3098
Serbian - Latin | srl | 2074
Swedish | sve | 1053
Tamil | tam | 1097
Telugu | tel | 1098
Ukrainian | ukr | 1058
Urdu | urd | 1056
Vietnamese | vit | 1066
The preceding table is sorted alphabetically on the Abbreviation column.
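The file-level part of the revert and restore procedures for these languages (back up the SQL Server 2016 NaturalLanguage6.dll, copy in the previous version, and later copy the backup back) can be scripted. The following Python sketch assumes you already have a previous NaturalLanguage6.dll copied from the Binn folder of a SQL Server 2008 or SQL Server 2008 R2 instance; the function names and folder arguments are hypothetical. It does not restart SQL Server, and it needs write access to the Binn folder.

```python
# Hypothetical helpers for the NaturalLanguage6.dll file swap; not SQL Server tooling.
import shutil
from pathlib import Path

DLL_NAME = "NaturalLanguage6.dll"


def revert_natural_language6(binn: Path, backup_dir: Path, previous_dll: Path) -> Path:
    """Back up the current DLL, then overwrite it with the previous version."""
    current = binn / DLL_NAME
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / DLL_NAME
    if backup.exists():
        # Refuse to overwrite an earlier backup of the SQL Server 2016 version.
        raise FileExistsError(f"backup already exists: {backup}")
    shutil.copy2(current, backup)        # back up the SQL Server 2016 version
    shutil.copy2(previous_dll, current)  # copy the previous version into Binn
    return backup


def restore_natural_language6(binn: Path, backup_dir: Path) -> None:
    """Copy the backed-up SQL Server 2016 DLL back into the Binn folder."""
    shutil.copy2(backup_dir / DLL_NAME, binn / DLL_NAME)
```

Because one DLL serves every language in this group, running either function affects all of them at once, as the warnings in this section state. Restart SQL Server after either operation.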
To revert to the previous components
1. Navigate to the Binn folder described above.
2. Back up the SQL Server 2016 version of NaturalLanguage6.dll to another location.
3. Copy the previous version of NaturalLanguage6.dll from the Binn folder of an instance of SQL Server 2008 R2 or SQL Server 2008 into the Binn folder of the SQL Server 2016 instance.
   WARNING: This change affects all the languages that use NaturalLanguage6.dll in both the current and previous versions.
4. Restart SQL Server.

To restore the current components
1. Navigate to the location where you backed up the SQL Server 2016 version of NaturalLanguage6.dll.
2. Copy the current version of NaturalLanguage6.dll from the backup location into the Binn folder of the SQL Server 2016 instance.
   WARNING: This change affects all the languages that use NaturalLanguage6.dll in both the current and previous versions.
3. Restart SQL Server.

Languages for which the file name of the previous word breaker only is NaturalLanguage6.dll
For the languages in the following table, the file name of the previous word breaker differs from the file name of the new version. The previous file name is NaturalLanguage6.dll. To revert to the previous version, you have to overwrite the current version of NaturalLanguage6.dll with an earlier version of the same file. You also have to change a set of registry entries to specify the previous or current version of the components.

WARNING: If you replace the current version of the file NaturalLanguage6.dll with a different version, the behavior of all the languages that use this file is affected.

List of affected languages
Language | Abbreviation used in the registry | LCID
Arabic | ara | 1025
German | deu | 1031
Japanese | jpn | 1041
Dutch | nld | 1043
Russian | rus | 1049
The preceding table is sorted alphabetically on the Abbreviation column.
Use the following instructions together with the list of values in the section File names and registry values for reverting and restoring word breakers and stemmers.

To revert to the previous components
1. Navigate to the Binn folder described above.
2. Do not remove the files for the current version of the components from the Binn folder.
3. Back up the SQL Server 2016 version of NaturalLanguage6.dll to another location.
4. Copy the previous version of NaturalLanguage6.dll from the Binn folder of an instance of SQL Server 2008 R2 or SQL Server 2008 into the Binn folder of the SQL Server 2016 instance.
   WARNING: This change affects all the languages that use NaturalLanguage6.dll in both the current and previous versions.
5. In the registry, navigate to the following node: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<InstanceRoot>\MSSearch\CLSID.
6. Use the following steps to add new keys for the COM ClassIDs of the previous word breaker and stemmer interfaces for the selected language:
   a. Add a new key with the value from the table for the previous word breaker.
   b. Update the (Default) data of that key to the file name of the previous word breaker from the table.
   c. If the selected language uses a stemmer, add a new key with the value from the table for the previous stemmer.
   d. If the selected language uses a stemmer, update the (Default) data of that key to the file name of the previous stemmer from the table.
7. In the registry, navigate to the following node: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<InstanceRoot>\MSSearch\Language\<language_key>. <language_key> represents the abbreviation for the language that is used in the registry; for example, "fra" for French and "esn" for Spanish.
8. Update the WBreakerClass key value to the value from the table for the previous word breaker.
9. If the selected language uses a stemmer, update the StemmerClass key value to the value from the table for the previous stemmer.
10. Restart SQL Server.

To restore the current components
1. Navigate to the location where you backed up the SQL Server 2016 version of NaturalLanguage6.dll.
2. Copy the current version of NaturalLanguage6.dll from the backup location into the Binn folder of the SQL Server 2016 instance.
   WARNING: This change affects all the languages that use NaturalLanguage6.dll in both the current and previous versions.
3. In the registry, navigate to the following node: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<InstanceRoot>\MSSearch\CLSID.
4. If the following keys do not exist, use the following steps to add new keys for the COM ClassIDs of the current word breaker and stemmer interfaces for the selected language:
   a. Add a new key with the value from the table for the current word breaker.
   b. Update the (Default) data of that key to the file name of the current word breaker from the table.
   c. If the selected language uses a stemmer, add a new key with the value from the table for the current stemmer.
   d. If the selected language uses a stemmer, update the (Default) data of that key to the file name of the current stemmer from the table.
5. In the registry, navigate to the following node: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<InstanceRoot>\MSSearch\Language\<language_key>. <language_key> represents the abbreviation for the language that is used in the registry; for example, "fra" for French and "esn" for Spanish.
6. Update the WBreakerClass key value to the value from the table for the current word breaker.
7. If the selected language uses a stemmer, update the StemmerClass key value to the value from the table for the current stemmer.
8. Restart SQL Server.

File names and registry values for reverting and restoring word breakers and stemmers
Use the following list of file names and registry entries together with the instructions in the preceding section.
Use the previous values to revert to the previous version, or use the current values to restore the current version of the components. The following list is sorted alphabetically on the abbreviation used for each language.

Arabic (ara), LCID 1025
Component | Word breaker | Stemmer
Previous CLSID | 7EFD3C7E-9E4B-4a93-9503-DECD74C0AC6D | 483B0283-25DB-4c92-9C15-A65925CB95CE
Previous file name | NaturalLanguage6.dll | NaturalLanguage6.dll
Current CLSID | 04b37e30-c9a9-4a7d-8f20-792fc87ddf71 | None
Current file name | MSWB7.dll | None

German (deu), LCID 1031
Component | Word breaker | Stemmer
Previous CLSID | 45EACA36-DBE9-4e4a-A26D-5C201902346D | 65170AE4-0AD2-4fa5-B3BA-7CD73E2DA825
Previous file name | NaturalLanguage6.dll | NaturalLanguage6.dll
Current CLSID | dfa00c33-bf19-482e-a791-3c785b0149b4 | 8a474d89-6e2f-419c-8dd5-9b50edc8c787
Current file name | MsWb7.dll | MsWb7.dll

Japanese (jpn), LCID 1041
Component | Word breaker | Stemmer
Previous CLSID | E1E8F15E-8BEC-45df-83BF-50FF84D0CAB5 | 3D5DF14F-649F-4cbc-853D-F18FEDE9CF5D
Previous file name | NaturalLanguage6.dll | NaturalLanguage6.dll
Current CLSID | 04096682-6ece-4e9e-90c1-52d81f0422ed | None
Current file name | MsWb70011.dll | None

Dutch (nld), LCID 1043
Component | Word breaker | Stemmer
Previous CLSID | 2C9F6BEB-C5B0-42b6-A5EE-84C24DC0D8EF | F7A465EE-13FB-409a-B878-195B420433AF
Previous file name | NaturalLanguage6.dll | NaturalLanguage6.dll
Current CLSID | 69483c30-a9af-4552-8f84-a0796ad5285b | CF923CB5-1187-43ab-B053-3E44BED65FFA
Current file name | MsWb7.dll | MsWb7.dll

Russian (rus), LCID 1049
Component | Word breaker | Stemmer
Previous CLSID | 2CB6CDA4-1C14-4392-A8EC-81EEF1F2E079 | E06A0DDD-E81A-4e93-8A8D-F386C3A1B670
Previous file name | NaturalLanguage6.dll | NaturalLanguage6.dll
Current CLSID | aaa3d3bd-6de7-4317-91a0-d25e7d3babc3 | d42c8b70-adeb-4b81-a52f-c09f24f77dfa
Current file name | MsWb7.dll | MsWb7.dll

Languages for which neither the previous nor the current file name is NaturalLanguage6.dll
For the languages in the following table, the file names of the previous word breakers and stemmers are different from the file names of the new versions. Neither the previous nor the current file name is NaturalLanguage6.dll. You do not have to replace any files, because SQL Server 2016 setup copies both the current and the previous versions of the components to the Binn folder. However, you have to change a set of registry entries to specify the previous or current version of the components.

List of affected languages
Language | Abbreviation used in the registry | LCID
Simplified Chinese | chs | 2052
Traditional Chinese | cht | 1028
Thai | tha | 1054
Chinese Traditional | zh-hk | 3076
Chinese Traditional | zh-mo | 5124
Chinese Simplified | zh-sg | 4100
The preceding table is sorted alphabetically on the Abbreviation column.

Use the following instructions together with the list of values in the section File names and registry values for reverting and restoring word breakers and stemmers.

To revert to the previous components
1. Do not remove the files for the current version of the components from the Binn folder.
2. In the registry, navigate to the following node: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<InstanceRoot>\MSSearch\CLSID.
3. Use the following steps to add new keys for the COM ClassIDs of the previous word breaker and stemmer interfaces for the selected language:
   a. Add a new key with the value from the table for the previous word breaker.
   b. Update the (Default) data of that key to the file name of the previous word breaker from the table.
   c. If the selected language uses a stemmer, add a new key with the value from the table for the previous stemmer.
   d. If the selected language uses a stemmer, update the (Default) data of that key to the file name of the previous stemmer from the table.
4. In the registry, navigate to the following node: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<InstanceRoot>\MSSearch\Language\<language_key>. <language_key> represents the abbreviation for the language that is used in the registry; for example, "fra" for French and "esn" for Spanish.
5. Update the WBreakerClass key value to the value from the table for the previous word breaker.
6. If the selected language uses a stemmer, update the StemmerClass key value to the value from the table for the previous stemmer.
7. Restart SQL Server.

To restore the current components
1. Do not remove the files for the previous version of the components from the Binn folder.
2. In the registry, navigate to the following node: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<InstanceRoot>\MSSearch\CLSID.
3. If the following keys do not exist, use the following steps to add new keys for the COM ClassIDs of the current word breaker and stemmer interfaces for the selected language:
   a. Add a new key with the value from the table for the current word breaker.
   b. Update the (Default) data of that key to the file name of the current word breaker from the table.
   c. If the selected language uses a stemmer, add a new key with the value from the table for the current stemmer.
   d. If the selected language uses a stemmer, update the (Default) data of that key to the file name of the current stemmer from the table.
4. In the registry, navigate to the following node: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<InstanceRoot>\MSSearch\Language\<language_key>. <language_key> represents the abbreviation for the language that is used in the registry; for example, "fra" for French and "esn" for Spanish.
5. Update the WBreakerClass key value to the value from the table for the current word breaker.
6. If the selected language uses a stemmer, update the StemmerClass key value to the value from the table for the current stemmer.
7. Restart SQL Server.
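The registry edits in these procedures follow the same pattern for every language: create a key named after each CLSID under MSSearch\CLSID with the DLL file name as its (Default) data, then point WBreakerClass (and StemmerClass, if the language uses a stemmer) under MSSearch\Language\<language_key> at those CLSIDs. The following Python sketch generates a .reg file for that pattern instead of editing the registry directly. It is an illustration under stated assumptions: the instance root passed in (for example, MSSQL13.MSSQLSERVER) is a hypothetical value that you must replace with your own, and the function names are made up for this example. Review the generated file and back up the registry before importing it.

```python
# Hypothetical .reg generator for the CLSID and Language registry steps.
import re

GUID_RE = re.compile(r"^[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}$")
BASE = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server"


def braced(clsid: str) -> str:
    """Validate an 8-4-4-4-12 CLSID and wrap it in braces for the registry."""
    clsid = clsid.strip("{}")
    if not GUID_RE.match(clsid):
        raise ValueError(f"not a CLSID: {clsid!r}")
    return "{" + clsid + "}"


def build_reg(instance_root, language_key, wb_clsid, wb_dll,
              stem_clsid=None, stem_dll=None):
    """Return .reg text that registers the CLSIDs and points the language at them."""
    if stem_clsid and not stem_dll:
        raise ValueError("a stemmer CLSID needs a stemmer file name")
    search = f"{BASE}\\{instance_root}\\MSSearch"
    lines = ["Windows Registry Editor Version 5.00", ""]
    # "Add a new key ... update the (Default) data" steps under MSSearch\CLSID.
    lines += [f"[{search}\\CLSID\\{braced(wb_clsid)}]", f'@="{wb_dll}"', ""]
    if stem_clsid:
        lines += [f"[{search}\\CLSID\\{braced(stem_clsid)}]", f'@="{stem_dll}"', ""]
    # "Update the WBreakerClass / StemmerClass key value" steps under Language\<key>.
    lines += [f"[{search}\\Language\\{language_key}]",
              f'"WBreakerClass"="{braced(wb_clsid)}"']
    if stem_clsid:
        lines.append(f'"StemmerClass"="{braced(stem_clsid)}"')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Revert Simplified Chinese (chs) using the previous values listed for it.
    print(build_reg("MSSQL13.MSSQLSERVER", "chs",
                    "12CE94A0-DEFB-11D2-B31D-00600893A857", "chsbrkr.dll"))
```

The braced() check also catches CLSIDs copied with missing hyphens. Files exported by Registry Editor with this header are UTF-16 encoded, so saving the text with open(path, "w", encoding="utf-16", newline="") is a cautious choice. SQL Server still has to be restarted after the import.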
File names and registry values for reverting and restoring word breakers and stemmers
Use the following list of file names and registry entries together with the instructions in the preceding section. Use the previous values to revert to the previous version, or use the current values to restore the current version of the components. The following list is sorted alphabetically on the abbreviation used for each language.

Simplified Chinese (chs), LCID 2052
Component | Word breaker
Previous CLSID | 12CE94A0-DEFB-11D2-B31D-00600893A857
Previous file name | chsbrkr.dll
Current CLSID | E0831C90-BAB0-4ca5-B9BD-EA254B538DAC
Current file name | MsWb70804.dll

Traditional Chinese (cht), LCID 1028
Component | Word breaker
Previous CLSID | 1680E7C3-9430-4A51-9B82-1E7E7AEE5258
Previous file name | chtbrkr.dll
Current CLSID | E9B1DF65-08F1-438b-8277-EF462B23A792
Current file name | MsWb70404.dll

Thai (tha), LCID 1054
Component | Word breaker | Stemmer
Previous CLSID | CCA22CF4-59FE-11D1-BBFF-00C04FB97FDA | CEDC01C7-59FE-11D1-BBFF-00C04FB97FDA
Previous file name | Thawbrkr.dll | Thawbrkr.dll
Current CLSID | F70C0935-6E9F-4ef1-9F06-7876536DB900 | None
Current file name | MsWb7001e.dll | None

Chinese Traditional (zh-hk), LCID 3076
Component | Word breaker
Previous CLSID | 1680E7C3-9430-4A51-9B82-1E7E7AEE5258
Previous file name | chtbrkr.dll
Current CLSID | E9B1DF65-08F1-438b-8277-EF462B23A792
Current file name | MsWb70404.dll

Chinese Traditional (zh-mo), LCID 5124
Component | Word breaker
Previous CLSID | 1680E7C3-9430-4A51-9B82-1E7E7AEE5258
Previous file name | chtbrkr.dll
Current CLSID | E9B1DF65-08F1-438b-8277-EF462B23A792
Current file name | MsWb70404.dll

Chinese Simplified (zh-sg), LCID 4100
Component | Word breaker
Previous CLSID | 12CE94A0-DEFB-11D2-B31D-00600893A857
Previous file name | chsbrkr.dll
Current CLSID | E0831C90-BAB0-4ca5-B9BD-EA254B538DAC
Current file name | MsWb70804.dll

See Also
Change the Word Breaker Used for US English and UK English
Behavior Changes to Full-Text Search

Customize the Behavior of Word Breakers with a Custom Dictionary
3/24/2017 • 1 min to read

You can customize the behavior of the word breaker for a particular language by creating a language-specific custom dictionary file. For example, you can prevent the word breaker from breaking certain terms or patterns. For more information, see the following SharePoint article: Create a custom dictionary (SharePoint Server 2010).

For SQL Server, place custom dictionary files in the following folder:
C:\Program Files\Microsoft SQL Server\<instance name>\MSSQL\Binn

After creating or changing custom dictionary files, restart the full-text filter daemon host processes with the following command:
EXEC sp_fulltext_service 'restart_all_fdhosts';

Configure and Manage Stopwords and Stoplists for Full-Text Search
3/24/2017 • 3 min to read

To prevent a full-text index from becoming bloated, SQL Server has a mechanism that discards commonly occurring strings that do not help the search. These discarded strings are called stopwords. During index creation, the Full-Text Engine omits stopwords from the full-text index. This means that full-text queries do not search on stopwords.

Stopwords. A stopword can be a word with meaning in a specific language. For example, in the English language, words such as "a", "and", "is", and "the" are left out of the full-text index because they are known to be useless to a search. A stopword can also be a token that does not have linguistic meaning.

Stoplists. Stopwords are managed in databases by using objects called stoplists. A stoplist is a list of stopwords that, when associated with a full-text index, is applied to full-text queries on that index.

Use an existing stoplist
You can use an existing stoplist in the following ways:
Use the system-supplied stoplist in the database.
   SQL Server ships with a system stoplist that contains the most commonly used stopwords for each supported language, that is, for every language that is associated with given word breakers by default. You can copy the system stoplist and customize your copy by adding and removing stopwords. The system stoplist is installed in the Resource database.
Use an existing custom stoplist from another database in the current server instance, and then add or drop stopwords as appropriate.

Create a new stoplist

Create a new stoplist with Transact-SQL
Use CREATE FULLTEXT STOPLIST (Transact-SQL).

Create a new stoplist with Management Studio
1. In Object Explorer, expand the server.
2. Expand Databases, and then expand the database in which you want to create the full-text stoplist.
3. Expand Storage, and then right-click Full-Text Stoplists.
4. Select New Full-Text Stoplist.
5. Enter the name of the new stoplist.
6. Optionally, specify someone else as the stoplist owner.
7. Select one of the following create stoplist options:
   Create an empty stoplist
   Create from the system stoplist
   Create from an existing full-text stoplist
   For more information, see New Full-Text Stoplist (General Page).
8. Click OK.

Use a stoplist in full-text queries
To use a stoplist in queries, you must associate it with a full-text index. You can attach a stoplist to a full-text index when you create the index, or you can alter the index later to add a stoplist.

Create a full-text index and associate a stoplist with it
Use CREATE FULLTEXT INDEX (Transact-SQL).

Associate or disassociate a stoplist with an existing full-text index
Use ALTER FULLTEXT INDEX (Transact-SQL).

Change the stopwords in a stoplist

Add or drop stopwords from a stoplist with Transact-SQL
Use ALTER FULLTEXT STOPLIST (Transact-SQL).

Add or drop stopwords from a stoplist with Management Studio
1. In Object Explorer, expand the server.
2. Expand Databases, and then expand the database.
3. Expand Storage, and then select Full-Text Stoplists.
4. Right-click the stoplist whose properties you want to change, and select Properties.
5. In the Full-Text Stoplist Properties dialog box:
   a. In the Action list box, select one of the following actions: Add stopword, Delete stopword, Delete all stopwords, or Clear stoplist.
   b. If the Stopword text box is enabled for the selected action, enter a single stopword. This stopword must be unique; that is, it must not already be in this stoplist for the language that you select.
   c. If the Full-text language list box is enabled for the selected action, select a language.
6. Click OK.

Manage stoplists and their usage
View all the stopwords in a stoplist: use sys.fulltext_stopwords (Transact-SQL).
Get info about all the stoplists in the current database: use sys.fulltext_stoplists (Transact-SQL) and sys.fulltext_stopwords (Transact-SQL).
View the tokenization result of a word breaker, thesaurus, and stoplist combination: use sys.dm_fts_parser (Transact-SQL).
Suppress an error message if stopwords cause a Boolean operation on a full-text query to fail: use the transform noise words Server Configuration Option.

More info about stopword position
Although the full-text index omits stopwords, it does take their position into account. For example, consider the phrase "Instructions are applicable to these Adventure Works Cycles models". The following table shows the position of each word in the phrase:

Word | Position
Instructions | 1
are | 2
applicable | 3
to | 4
these | 5
Adventure | 6
Works | 7
Cycles | 8
models | 9

The stopwords "are", "to", and "these", in positions 2, 4, and 5, are left out of the full-text index. However, their positional information is maintained, so the positions of the other words in the phrase are unaffected.

Upgrade noise words from SQL Server 2005
SQL Server 2005 noise words have been replaced by stopwords. When a database is upgraded from SQL Server 2005, the noise-word files are no longer used.
However, the noise-word files are stored in the FTDATA\FTNoiseThesaurusBak folder, and you can use them later when updating or building the corresponding stoplists. For information about upgrading noise-word files to stoplists, see Upgrade Full-Text Search.

Configure and Manage Thesaurus Files for Full-Text Search
4/7/2017 • 7 min to read

SQL Server Full-Text Search queries can search for synonyms of user-specified terms through the use of a Full-Text Search thesaurus. Each thesaurus defines a set of synonyms for a specific language. By developing a thesaurus tailored to your full-text data, you can effectively broaden the scope of full-text queries on that data.

Thesaurus matching occurs for all FREETEXT and FREETEXTABLE queries and for any CONTAINS and CONTAINSTABLE queries that specify the FORMSOF THESAURUS clause.

A Full-Text Search thesaurus is an XML text file.

What's in a thesaurus
Before full-text search queries can look for synonyms in a given language, you have to define thesaurus mappings (that is, synonyms) for that language. Each thesaurus must be manually configured to define the following:

Expansion set
An expansion set contains a group of synonyms, such as "writer", "author", and "journalist", that are substituted for one another by a full-text query. Queries that contain a match for any synonym in an expansion set are expanded to include every other synonym in the expansion set. For more information, see XML structure of an expansion set later in this topic.

Replacement set
A replacement set contains a text pattern to be replaced by a substitution set. For an example, see the section XML structure of a replacement set later in this topic.

Diacritics setting
For a given thesaurus, all search patterns are either sensitive or insensitive to diacritical marks such as a tilde (~), acute accent mark (´), or umlaut (¨) (that is, accent sensitive or accent insensitive).
For example, suppose you specify the pattern "café" to be replaced by other patterns in a full-text query. If the thesaurus is accent-insensitive, full-text search replaces the patterns "café" and "cafe". If the thesaurus is accent-sensitive, full-text search replaces only the pattern "café". By default, a thesaurus is accent-insensitive.

Default thesaurus files
SQL Server provides a set of XML thesaurus files, one for each supported language. These files are essentially empty: they contain only the top-level XML structure that is common to all SQL Server thesauruses and a commented-out sample thesaurus.

Location of thesaurus files
The default location of the thesaurus files is:
<SQL_Server_data_files_path>\MSSQL13.MSSQLSERVER\MSSQL\FTDATA\

This default location contains the following files:

Language-specific thesaurus files
Setup installs empty thesaurus files in the above location. A separate file is provided for each supported language. A system administrator can customize these files.
The default file names of the thesaurus files use the following format:
'ts' + <three-letter language-abbreviation> + '.xml'
The name of the thesaurus file for a given language is specified in the registry in the following value:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<instance-name>\MSSearch\<language-abbrev>

The global thesaurus file
An empty global thesaurus file, tsGlobal.xml.

Change the location of a thesaurus file
You can change the location and name of a thesaurus file by changing its registry key. For each language, the location of the thesaurus file is specified in the following value in the registry:
HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\<instance name>\MSSearch\Language\<language-abbreviation>\TsaurusFile
The global thesaurus file corresponds to the Neutral language with LCID 0. This value can be changed by administrators only.
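Two conventions described in this section are easy to check in code: the default thesaurus file name ('ts' + three-letter language abbreviation + '.xml') and the effect of the diacritics setting. The following Python sketch approximates accent-insensitive matching by removing Unicode combining marks after NFD decomposition. That approximation is an assumption for illustration: SQL Server's own accent-folding rules are not described in this topic and may differ, for example for characters that do not decompose. The function names are hypothetical, and the comparison is case-sensitive.

```python
# Hypothetical helpers illustrating thesaurus file naming and the diacritics setting.
import unicodedata


def default_thesaurus_name(language_abbrev: str) -> str:
    """Build the default file name: 'ts' + <language abbreviation> + '.xml'."""
    return "ts" + language_abbrev + ".xml"


def fold_accents(text: str) -> str:
    """Drop combining diacritical marks, so 'café' becomes 'cafe'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def pattern_matches(pattern: str, term: str, diacritics_sensitive: bool = False) -> bool:
    """Mimic the diacritics setting; accent-insensitive is the default."""
    if diacritics_sensitive:
        return pattern == term
    return fold_accents(pattern) == fold_accents(term)


if __name__ == "__main__":
    print(default_thesaurus_name("enu"))
    print(pattern_matches("café", "cafe"))
    print(pattern_matches("café", "cafe", diacritics_sensitive=True))
```

With the default setting, the pattern "café" matches both "café" and "cafe"; with diacritics_sensitive=True it matches only "café", mirroring the café example in this section.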
How full-text queries use the thesaurus
A thesaurus query uses both a language-specific thesaurus and the global thesaurus.

1. First, the query looks up the language-specific file and loads it for processing (unless it is already loaded). The query is expanded to include the language-specific synonyms specified by the expansion set and replacement set rules in the thesaurus file.
2. These steps are then repeated for the global thesaurus. However, if a term is already part of a match in the language-specific thesaurus file, the term is ineligible for matching in the global thesaurus.

Structure of a thesaurus file
Each thesaurus file defines an XML container whose ID is Microsoft Search Thesaurus, and a comment, <!-- … -->, that contains a sample thesaurus. The thesaurus is defined in a <thesaurus> element that contains samples of the child elements that define the diacritics setting, expansion sets, and replacement sets.

A typical empty thesaurus file contains the following XML text:

```xml
<XML ID="Microsoft Search Thesaurus">
<!--  Commented out
    <thesaurus xmlns="x-schema:tsSchema.xml">
        <diacritics_sensitive>0</diacritics_sensitive>
        <expansion>
            <sub>Internet Explorer</sub>
            <sub>IE</sub>
            <sub>IE5</sub>
        </expansion>
        <replacement>
            <pat>NT5</pat>
            <pat>W2K</pat>
            <sub>Windows 2012</sub>
        </replacement>
        <expansion>
            <sub>run</sub>
            <sub>jog</sub>
        </expansion>
    </thesaurus>
-->
</XML>
```

XML structure of an expansion set
Each expansion set is enclosed within an <expansion> element. Within this element, you specify one or more substitutions in a <sub> element. In the expansion set, you can specify a group of substitutions that are synonyms of each other.

For example, you can edit the expansion section to treat the substitutions "writer", "author", and "journalist" as synonyms. Full-text search queries that contain matches in one substitution are expanded to include all other substitutions specified in the expansion set.
Therefore, in the preceding example, when you issue a FORMSOF THESAURUS or a FREETEXT query for the word "author", full-text search also returns search results containing the words "writer" and "journalist". This is what the expansion set section would look like for the above example:

```xml
<expansion>
    <sub>writer</sub>
    <sub>author</sub>
    <sub>journalist</sub>
</expansion>
```

XML structure of a replacement set
Each replacement set is enclosed within a <replacement> element. Within this element, you can specify one or more patterns in <pat> elements and zero or more substitutions in <sub> elements, one per synonym. You can specify a pattern to be replaced by a substitution set. Patterns and substitutions can contain a word or a sequence of words. If there is no substitution specified for a pattern, it has the effect of removing the pattern from the user query.

For example, suppose you want queries for "Win8", the pattern, to be replaced by "Windows Server 2012" or "Windows 8.0", the substitutions. If you run a full-text query for "Win8", full-text search only returns search results containing "Windows Server 2012" or "Windows 8.0". It does not return results containing "Win8". This is because the pattern "Win8" has been "replaced" by the substitutions "Windows Server 2012" and "Windows 8.0". This is what the replacement set section would look like for the above example:

```xml
<replacement>
    <pat>Win8</pat>
    <sub>Windows Server 2012</sub>
    <sub>Windows 8.0</sub>
</replacement>
```

If you have two replacement sets with similar patterns being matched, the longer of the two takes precedence. For example, if you run a FORMSOF THESAURUS query for "Internet Explorer online community" and you have the following replacement sets, the "Internet Explorer" replacement set takes precedence over the "Internet" replacement set. The query will therefore be processed as "IE online community" or "IE 9 online community".
```xml
<replacement>
    <pat>Internet</pat>
    <sub>intranet</sub>
</replacement>
```

and

```xml
<replacement>
    <pat>Internet Explorer</pat>
    <sub>IE</sub>
    <sub>IE 9</sub>
</replacement>
```

XML structure of the diacritics setting
The diacritics setting of a thesaurus is specified in a single <diacritics_sensitive> element. This element contains an integer value that controls accent sensitivity, as follows:

| DIACRITICS SETTING | VALUE | XML |
|---|---|---|
| Accent insensitive | 0 | <diacritics_sensitive>0</diacritics_sensitive> |
| Accent sensitive | 1 | <diacritics_sensitive>1</diacritics_sensitive> |

NOTE: This setting can only be applied one time in the file, and it applies to all search patterns in the file. This setting cannot be specified for individual patterns.

Edit a thesaurus file
You can configure the thesaurus for a given language by editing its thesaurus file (an XML file). During setup, empty thesaurus files that contain only the <XML> container and a commented-out sample <thesaurus> element are installed. In order for full-text search queries that look for synonyms to work properly, you have to create an actual <thesaurus> element that defines a set of synonyms. You can define two forms of synonyms: expansion sets and replacement sets.

To edit a thesaurus file:
1. Open the thesaurus file in Notepad or another text editor.
2. If you are editing the thesaurus file for the first time, remove the following comment lines at the beginning and end of the file, respectively:
   <!--Commented out
   -->
3. Add, modify, or delete a replacement set or an expansion set.
4. Save the file and close Notepad.
5. Use sp_fulltext_load_thesaurus_file to load the content of the thesaurus file into tempdb, specifying the locale identifier (LCID) that corresponds to the language of the thesaurus file. For example, for the English thesaurus file, tsenu.xml, the corresponding LCID is 1033.
```sql
USE AdventureWorks;
EXEC sys.sp_fulltext_load_thesaurus_file 1033;
GO
```

Recommendations for editing thesaurus files
We recommend that entries in the thesaurus file contain no special characters. This is because word breakers have subtle behaviors with respect to special characters. If a thesaurus entry contains any special characters, word breakers used in combination with that entry can have subtle behavioral implications for a full-text query.

We recommend that <sub> entries contain no stopwords, since stopwords are omitted from the full-text index. Queries are expanded to include the <sub> entries from a thesaurus file, and if a <sub> entry contains stopwords, query size increases unnecessarily.

Restrictions for editing thesaurus files
The following restrictions apply to editing a thesaurus file:
- Only system administrators can update, modify, or delete thesaurus files.
- When editing thesaurus files using text editor tools, the files must be saved in Unicode format, and Byte Order Marks must be specified.
- Thesaurus entries cannot be empty or word break to an empty string.
- Phrases in the thesaurus file must be no longer than 512 characters.
- A thesaurus must not contain any duplicate entries among the <sub> entries of expansion sets and the <pat> elements of replacement sets.

See Also
CONTAINS (Transact-SQL)
CONTAINSTABLE (Transact-SQL)
FREETEXT (Transact-SQL)
FREETEXTTABLE (Transact-SQL)
sp_fulltext_load_thesaurus_file (Transact-SQL)
sys.dm_fts_parser (Transact-SQL)

Manage and Monitor Full-Text Search for a Server Instance
3/24/2017

Full-text administration for a server instance includes:
- System management tasks such as managing the FDHOST Launcher service (MSSQLFDLauncher), restarting the filter daemon host process if you change the service account credentials, configuring server-wide full-text properties, and backing up full-text catalogs.
  At the server level, for example, you can specify a default full-text language that differs from the default language of the server instance as a whole.
- Configuring full-text linguistic components (word breakers and stemmers, thesaurus files, and stopwords and stoplists).
- Configuring a user database for full-text search. This involves creating one or more full-text catalogs for the database and defining a full-text index on each table or indexed view on which you want to execute full-text queries.

Viewing or Changing Server Properties for Full-Text Search
You can view the full-text properties of an instance of SQL Server in SQL Server Management Studio.

To view and change server properties for full-text search:
1. In Object Explorer, right-click a server, and then click Properties.
2. In the Server Properties dialog box, click the Advanced page to view server information about full-text search. The full-text properties are as follows:

Default Full-Text Language
Specifies a default language for full-text indexed columns. Linguistic analysis of full-text indexed data is dependent on the language of the data. The default value of this option is the language of the server. For the language that corresponds to the displayed setting, see sys.fulltext_languages (Transact-SQL).

Full-Text Upgrade Option
This server property controls how full-text indexes are migrated when upgrading a database from SQL Server 2005 to a later version. This property applies to upgrading by attaching a database, restoring a database backup, restoring a file backup, or copying the database by using the Copy Database Wizard. The alternatives are as follows:

Import
Full-text catalogs are imported. Typically, import is significantly faster than rebuild. For example, when using only one CPU, import runs about 10 times faster than rebuild.
However, an imported full-text catalog does not use the new and enhanced word breakers introduced in SQL Server 2008, so you might want to rebuild your full-text catalogs eventually.

NOTE: Rebuild can run in multi-threaded mode, and if more than 10 CPUs are available, rebuild might run faster than import if you allow rebuild to use all of the CPUs.

If a full-text catalog is not available, the associated full-text indexes are rebuilt. This option is available for only SQL Server 2005 databases.

Rebuild
Full-text catalogs are rebuilt using the new and enhanced word breakers. Rebuilding indexes can take a while, and a significant amount of CPU and memory might be required after the upgrade.

Reset
Full-text catalogs are reset. SQL Server 2005 full-text catalog files are removed, but the metadata for full-text catalogs and full-text indexes is retained. After being upgraded, all full-text indexes are disabled for change tracking and crawls are not started automatically. The catalog will remain empty until you manually issue a full population, after the upgrade completes.

For information about choosing a full-text upgrade option, see Upgrade Full-Text Search.

NOTE: The full-text upgrade option can also be set by using the upgrade_option action of sp_fulltext_service.

Viewing Additional Full-Text Server Properties
Transact-SQL functions can be used to obtain the value of various server-level properties of full-text search. This information is useful for administering and troubleshooting full-text search.

The following table lists full-text properties of a SQL Server server instance and their related Transact-SQL functions.

| PROPERTY | DESCRIPTION | FUNCTION |
|---|---|---|
| IsFullTextInstalled | Whether the full-text component is installed with the current instance of SQL Server. | FULLTEXTSERVICEPROPERTY, SERVERPROPERTY |
| LoadOSResources | Whether operating system word breakers and filters are registered and used with this instance of SQL Server. | FULLTEXTSERVICEPROPERTY |
| VerifySignature | Specifies whether only signed binaries are loaded by the Full-Text Engine. | FULLTEXTSERVICEPROPERTY |

Monitoring Full-Text Search Activity
Several dynamic management views and functions are useful for monitoring full-text search activity on a server instance.

| TO VIEW | USE |
|---|---|
| Information about the full-text catalogs with in-progress population activity | sys.dm_fts_active_catalogs (Transact-SQL) |
| Current activity of a filter daemon host process | sys.dm_fts_fdhosts (Transact-SQL) |
| Information about in-progress index populations | sys.dm_fts_index_population (Transact-SQL) |
| Memory buffers in a memory pool that are used as part of a crawl or crawl range | sys.dm_fts_memory_buffers (Transact-SQL) |
| The shared memory pools available to the full-text gatherer component for a full-text crawl or a full-text crawl range | sys.dm_fts_memory_pools (Transact-SQL) |
| Information about each full-text indexing batch | sys.dm_fts_outstanding_batches (Transact-SQL) |
| Information about the specific ranges related to an in-progress population | sys.dm_fts_population_ranges (Transact-SQL) |

Set the Service Account for the Full-text Filter Daemon Launcher
3/24/2017

This topic describes how to set or change the service account for the SQL Full-text Filter Daemon Launcher service (MSSQLFDLauncher) by using SQL Server Configuration Manager. The default service account used by SQL Server setup is NT Service\MSSQLFDLauncher.

About the SQL Full-text Filter Daemon Launcher service
The SQL Full-text Filter Daemon Launcher service is used by SQL Server Full-Text Search to start the filter daemon host process, which handles full-text search filtering and word breaking. The Launcher service must be running to use full-text search.

The SQL Full-text Filter Daemon Launcher service is an instance-aware service that is associated with a specific instance of SQL Server.
The SQL Full-text Filter Daemon Launcher service propagates the service account information to each filter daemon host process that it launches.

Set the service account
1. On the Start menu, point to All Programs, expand Microsoft SQL Server 2016, and then click SQL Server 2016 Configuration Manager.
2. In SQL Server Configuration Manager, click SQL Server Services, right-click SQL Full-text Filter Daemon Launcher (instance name), and then click Properties.
3. Click the Log On tab of the dialog box, and then select or enter the account under which to run the processes that the SQL Full-text Filter Daemon Launcher service starts.
4. After you close the dialog box, click Restart to restart the SQL Full-text Filter Daemon Launcher service.

Troubleshoot the SQL Full-text Filter Daemon Launcher service if it doesn't start
If the SQL Full-text Filter Daemon Launcher service doesn't start, review the following possible causes:

Permissions issues
- The SQL Server service group does not have permission to start the SQL Full-text Filter Daemon Launcher service.
  Make sure the SQL Server service group has permissions to the SQL Full-text Filter Daemon Launcher service account. During the installation of SQL Server, the SQL Server service group is granted default permission to manage, query, and start the SQL Full-text Filter Daemon Launcher service. If SQL Server service group permissions to the SQL Full-text Filter Daemon Launcher service account have been removed after SQL Server installation, the SQL Full-text Filter Daemon Launcher service will not start, and full-text search will be disabled.
- The account used to log in to the service does not have privileges.
  You may be using an account that does not have login privileges on the computer where the server instance is installed. Verify that you are logging in with an account that has User rights and permissions on the local computer.
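When checking account-related causes, it can help to confirm from Transact-SQL which account the launcher is configured to use and whether it is currently running. This is a minimal sketch under stated assumptions: SQL Server 2008 R2 SP1 or later (where the sys.dm_server_services view is available), VIEW SERVER STATE permission, and a launcher service name that contains the text "Full-text". It only reports state; changing the account is still done in SQL Server Configuration Manager as described above.

```sql
-- Report the configured account and current status of this instance's
-- Full-text Filter Daemon Launcher service.
SELECT servicename,
       service_account,
       status_desc,
       startup_type_desc
FROM sys.dm_server_services
-- Assumption: the launcher appears with "Full-text" in its service name.
WHERE servicename LIKE N'%Full-text%';

-- Returns 1 if the full-text component is installed for this instance.
SELECT FULLTEXTSERVICEPROPERTY('IsFullTextInstalled') AS IsFullTextInstalled;
```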
Service account and password issues
- The user account or password of the service account is incorrect.
  In SQL Server 2016 Configuration Manager, make sure the service is using the correct service account and password.
- The password associated with the SQL Full-text Filter Daemon Launcher service account has expired.
  If you use a local user account for the SQL Full-text Filter Daemon Launcher service and the password expires, you have to:
  1. Set a new Windows password for the account.
  2. In SQL Server 2016 Configuration Manager, update the SQL Full-text Filter Daemon Launcher service to use the new password.

Named pipes configuration issues
- The SQL Full-text Filter Daemon Launcher service is not configured correctly.
  If named pipes functionality has been disabled on the local computer, or if SQL Server has been configured to use a named pipe other than the default named pipe, the SQL Full-text Filter Daemon Launcher service might not start.
- Another instance of the same named pipe is already running.
  The SQL Server service acts as a named pipe server for the SQL Full-text Filter Daemon Launcher service client. If the named pipe was already created by another process before SQL Server starts, an error will be logged in the SQL Server error log and the Windows Event Log, and full-text search will not be available. Determine what process or application is attempting to use the same named pipe and stop the application.

See Also
Managing Services How-to Topics (SQL Server Configuration Manager)
Upgrade Full-Text Search

Upgrade Full-Text Search
3/24/2017

Upgrading full-text search to SQL Server 2016 is done during setup and when database files and full-text catalogs from the earlier version of SQL Server are attached, restored, or copied using the Copy Database Wizard.

Upgrade a server instance
For an in-place upgrade, an instance of SQL Server 2016 is set up side-by-side with the old version of SQL Server, and data is migrated.
If the old version of SQL Server had full-text search installed, a new version of full-text search is automatically installed.

Side-by-side install means that each of the following components exists at the instance level of SQL Server.

Word breakers, stemmers, and filters
Each instance now uses its own set of word breakers, stemmers, and filters, rather than relying on the operating system version of these components. These components are also easier to register and configure at a per-instance level. For more information, see Configure and Manage Word Breakers and Stemmers for Search and Configure and Manage Filters for Search.

Filter daemon host
The full-text filter daemon hosts are processes that safely load and drive extensible external components used for index and query, such as word breakers, stemmers, and filters, without compromising the integrity of the Full-Text Engine. A server instance uses a multithreaded process for all multithreaded filters and a single-threaded process for all single-threaded filters.

NOTE: SQL Server 2008 introduced a service account for the FDHOST Launcher service (MSSQLFDLauncher). This service propagates the service account information to the filter daemon host processes of a specific instance of SQL Server. For information about setting the service account, see Set the Service Account for the Full-text Filter Daemon Launcher.

In SQL Server 2005, each full-text index resides in a full-text catalog that belongs to a filegroup, has a physical path, and is treated as a database file. In SQL Server 2008 and later versions, a full-text catalog is a logical or virtual object that contains a group of full-text indexes. Therefore, a new full-text catalog is not treated as a database file with a physical path. However, during upgrade of any full-text catalog that contains data files, a new filegroup is created on the same disk. This maintains the old disk I/O behavior after upgrade.
Any full-text index from that catalog is placed in the new filegroup if the root path exists. If the old full-text catalog path is invalid, the upgrade keeps the full-text index in the same filegroup as the base table or, for a partitioned table, in the primary filegroup.

Full-text upgrade options
When upgrading a server instance to SQL Server 2016, the user interface allows you to choose one of the following full-text upgrade options.

Import
Full-text catalogs are imported. Typically, import is significantly faster than rebuild. For example, when using only one CPU, import runs about 10 times faster than rebuild. However, an imported full-text catalog does not use the new word breakers installed with the latest version of SQL Server. To ensure consistency in query results, full-text catalogs have to be rebuilt.

NOTE: Rebuild can run in multi-threaded mode, and if more than 10 CPUs are available, rebuild might run faster than import if you allow rebuild to use all of the CPUs.

If a full-text catalog is not available, the associated full-text indexes are rebuilt. This option is available for only SQL Server 2005 databases.

For information about the impact of importing a full-text index, see "Considerations for Choosing a Full-Text Upgrade Option," later in this topic.

Rebuild
Full-text catalogs are rebuilt using the new and enhanced word breakers. Rebuilding indexes can take a while, and a significant amount of CPU and memory might be required after the upgrade.

Reset
Full-text catalogs are reset. When upgrading from SQL Server 2005, full-text catalog files are removed, but the metadata for full-text catalogs and full-text indexes is retained. After being upgraded, all full-text indexes are disabled for change tracking and crawls are not started automatically. The catalog will remain empty until you manually issue a full population, after the upgrade completes.
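For upgrades that happen by attaching, restoring, or copying a database, the same three choices are made through the upgrade_option action of sp_fulltext_service (see "To change full-text upgrade behavior on a server instance" later in this topic). The following is a minimal sketch that assumes the value mapping documented for sp_fulltext_service (0 = Rebuild, 1 = Reset, 2 = Import); confirm that mapping against the sp_fulltext_service reference for your version before running it. The second query is an assumed follow-up check for the Reset case: it lists the full-text indexes in the current database with their change-tracking state, because reset indexes stay empty until a full population is issued.

```sql
-- Select the full-text upgrade behavior for SQL Server 2005 databases that
-- are later attached, restored, or copied to this instance.
-- Assumed mapping: 0 = Rebuild, 1 = Reset, 2 = Import.
EXEC sp_fulltext_service @action = 'upgrade_option', @value = 0;  -- Rebuild
GO

-- In an upgraded database: list full-text indexes and their change-tracking state.
-- After a Reset, start a full population manually, for example:
--   ALTER FULLTEXT INDEX ON table_name START FULL POPULATION;
SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name,
       OBJECT_NAME(object_id)        AS table_name,
       is_enabled,
       change_tracking_state_desc
FROM sys.fulltext_indexes;
```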
Considerations for choosing a full-text upgrade option
When choosing the upgrade option for your upgrade, consider the following:

Do you require consistency in query results?
SQL Server 2016 installs new word breakers for use by Full-Text and Semantic Search. The word breakers are used both at indexing time and at query time. If you do not rebuild the full-text catalogs, your search results may be inconsistent. If you issue a full-text query that looks for a phrase that is broken differently by the word breaker in a previous version of SQL Server and the current word breaker, a document or row containing the phrase might not be retrieved. This is because the indexed phrases were broken using different logic than the query is using. The solution is to repopulate (rebuild) the full-text catalogs with the new word breakers so that index-time and query-time behavior are identical. You can choose the Rebuild option to accomplish this, or you can rebuild manually after choosing the Import option.

Were any full-text indexes built on integer full-text key columns?
Rebuilding performs internal optimizations that improve the query performance of the upgraded full-text index in some cases. Specifically, if you have full-text catalogs that contain full-text indexes for which the full-text key column of the base table is an integer data type, rebuilding achieves ideal performance of full-text queries after upgrade. In this case, we highly recommend that you use the Rebuild option.

NOTE: For full-text indexes in SQL Server 2016, we recommend that the column serving as the full-text key be an integer data type. For more information, see Improve the Performance of Full-Text Indexes.

What is the priority for getting your server instance online?
Importing or rebuilding during upgrade takes a lot of CPU resources, which delays getting the rest of the server instance upgraded and online.
If getting the server instance online as soon as possible is important and if you are willing to run a manual population after the upgrade, Reset is suitable.

Ensure consistent query results after importing a full-text index
If a full-text catalog was imported when upgrading a SQL Server 2005 database to SQL Server 2016, mismatches between the query and the full-text index content might occur because of differences in the behavior of the old and new word breakers. In this case, to guarantee a total match between queries and the full-text index content, choose one of the following options:
- Rebuild the full-text catalog that contains the full-text index (ALTER FULLTEXT CATALOG catalog_name REBUILD).
- Issue a FULL POPULATION on the full-text index (ALTER FULLTEXT INDEX ON table_name START FULL POPULATION).

For more information about word breakers, see Configure and Manage Word Breakers and Stemmers for Search.

Upgrade noise-word files to stoplists
When a database is upgraded to SQL Server 2016 from SQL Server 2005, the noise-word files are no longer used. However, the old noise-word files are stored in the FTDATA\FTNoiseThesaurusBak folder, and you can use them later when updating or building the corresponding SQL Server 2016 stoplists.

After upgrading from SQL Server 2005:
- If you never added, modified, or deleted any noise-word files in your installation of SQL Server 2005, the system stoplist should meet your needs.
- If your noise-word files were modified in SQL Server 2005, those modifications are lost during upgrade. To re-create those updates, you must manually re-create those modifications in the corresponding SQL Server 2016 stoplist. For more information, see ALTER FULLTEXT STOPLIST (Transact-SQL).
- If you do not want to apply any stopwords to your full-text indexes (for example, if you deleted or erased your noise-word files in your SQL Server 2005 installation), you must turn off the stoplist for each upgraded full-text index.
  Run the following Transact-SQL statement (replacing database_name with the name of the upgraded database and table_name with the name of the table):

```sql
USE database_name;
ALTER FULLTEXT INDEX ON table_name SET STOPLIST OFF;
GO
```

The STOPLIST OFF clause removes stopword filtering, and it triggers a population of the table, without filtering any words considered to be noise.

Backup and imported full-text catalogs
For full-text catalogs that are rebuilt or reset during upgrade (and for new full-text catalogs), the full-text catalog is a logical concept and does not reside in a filegroup. Therefore, to back up a full-text catalog in SQL Server 2016, you must identify every filegroup that contains a full-text index of the catalog and back each of them up, one by one. For more information, see Back Up and Restore Full-Text Catalogs and Indexes.

For full-text catalogs that have been imported from SQL Server 2005, the full-text catalog is still a database file in its own filegroup. The SQL Server 2005 backup process for full-text catalogs still applies, except that the MSFTESQL service does not exist in SQL Server 2016. For information about the SQL Server 2005 process, see Backing Up and Restoring Full-Text Catalogs in SQL Server 2005 Books Online.

Migrating full-text indexes when upgrading a database to SQL Server 2016
Database files and full-text catalogs from a previous version of SQL Server can be upgraded to an existing SQL Server 2016 server instance by using attach, restore, or the Copy Database Wizard. SQL Server 2005 full-text indexes, if any, are either imported, reset, or rebuilt. The upgrade_option server property controls which full-text upgrade option the server instance uses during these database upgrades.

After you attach, restore, or copy any SQL Server 2005 database to SQL Server 2016, the database becomes available immediately and is then automatically upgraded.
Depending on the amount of data being indexed, importing can take several hours, and rebuilding can take up to ten times longer. Note also that when the upgrade option is set to import, if a full-text catalog is not available, the associated full-text indexes are rebuilt.

To change full-text upgrade behavior on a server instance:
- Transact-SQL: Use the upgrade_option action of sp_fulltext_service.
- SQL Server Management Studio: Use the Full-Text Upgrade Option of the Server Properties dialog box. For more information, see Manage and Monitor Full-Text Search for a Server Instance.

Considerations for Restoring a SQL Server 2005 Full-Text Catalog to SQL Server 2016
One method of upgrading full-text data from a SQL Server 2005 database to SQL Server 2016 is to restore a full database backup to SQL Server 2016.

While importing a SQL Server 2005 full-text catalog, you can back up and restore the database and the catalog file. The behavior is the same as in SQL Server 2005:
- The full database backup will include the full-text catalog. To refer to the full-text catalog, use its SQL Server 2005 file name, sysft_+catalog-name.
- If the full-text catalog is offline, the backup will fail.

For more information about backing up and restoring SQL Server 2005 full-text catalogs, see Backing Up and Restoring Full-Text Catalogs and File Backup and Restore and Full-Text Catalogs in SQL Server 2005 Books Online.

When the database is restored on SQL Server 2016, a new database file will be created for the full-text catalog. The default name of this file is ftrow_catalog-name.ndf. For example, if your catalog-name is cat1, the default name of the SQL Server 2016 database file would be ftrow_cat1.ndf. But if the default name is already being used in the target directory, the new database file would be named ftrow_catalog-name{GUID}.ndf, where GUID is the Globally Unique Identifier of the new file.
After the catalogs have been imported, sys.database_files and sys.master_files are updated to remove the catalog entries, and the path column in sys.fulltext_catalogs is set to NULL.

To back up a database
Full Database Backups (SQL Server)
Transaction Log Backups (SQL Server) (full recovery model only)

To restore a database backup
Complete Database Restores (Simple Recovery Model)
Complete Database Restores (Full Recovery Model)

Example
The following example uses the MOVE clause in the RESTORE statement to restore a SQL Server 2005 database named ftdb1. The SQL Server 2005 database, log, and catalog files are moved to new locations on the SQL Server 2016 server instance, as follows:
- The database file, ftdb1.mdf, is moved to C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\ftdb1.mdf.
- The log file, ftdb1_log.ldf, is moved to a log directory on your log disk drive, log_drive:\log_directory\ftdb1_log.ldf.
- The catalog files that correspond to the sysft_cat90 catalog are moved to C:\temp. After the full-text indexes are imported, they will automatically be placed in a database file, C:\ftrow_sysft_cat90.ndf, and C:\temp will be deleted.

```sql
RESTORE DATABASE [ftdb1] FROM DISK = N'C:\temp\ftdb1.bak'
WITH FILE = 1,
    MOVE N'ftdb1' TO N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\ftdb1.mdf',
    MOVE N'ftdb1_log' TO N'log_drive:\log_directory\ftdb1_log.ldf',
    MOVE N'sysft_cat90' TO N'C:\temp';
```

Attaching a SQL Server 2005 database to SQL Server 2016
In SQL Server 2008 and later versions, a full-text catalog is a logical concept that refers to a group of full-text indexes. The full-text catalog is a virtual object that does not belong to any filegroup. However, when you attach a SQL Server 2005 database that contains full-text catalog files onto a SQL Server 2016 server instance, the catalog files are attached from their previous location along with the other database files, the same as in SQL Server 2005.
The state of each attached full-text catalog on SQL Server 2016 is the same as when the database was detached from SQL Server 2005. If any full-text index population was suspended by the detach operation, the population is resumed on SQL Server 2016, and the full-text index becomes available for full-text search.

If SQL Server 2016 cannot find a full-text catalog file or if the full-text file was moved during the attach operation without specifying a new location, the behavior depends on the selected full-text upgrade option:
- If the full-text upgrade option is Import or Rebuild, the attached full-text catalog is rebuilt.
- If the full-text upgrade option is Reset, the attached full-text catalog is reset.

For more information about detaching and attaching a database, see Database Detach and Attach (SQL Server), CREATE DATABASE (SQL Server Transact-SQL), sp_attach_db, and sp_detach_db (Transact-SQL).

See also
Get Started with Full-Text Search
Configure and Manage Word Breakers and Stemmers for Search
Configure and Manage Filters for Search

Full-Text Search DDL, Functions, Stored Procedures, and Views
3/24/2017

Lists the Transact-SQL statements and the SQL Server database objects that support full-text search, including the property search feature. This list does not include deprecated objects.

For the list of database objects that support semantic search, see Semantic Search DDL, Functions, Stored Procedures, and Views.
Transact-SQL Data Definition Language (DDL) Statements
- CREATE FULLTEXT CATALOG (Transact-SQL)
- CREATE FULLTEXT INDEX (Transact-SQL)
- CREATE FULLTEXT STOPLIST (Transact-SQL)
- CREATE SEARCH PROPERTY LIST (Transact-SQL)
- ALTER FULLTEXT CATALOG (Transact-SQL)
- ALTER FULLTEXT INDEX (Transact-SQL)
- ALTER FULLTEXT STOPLIST (Transact-SQL)
- ALTER SEARCH PROPERTY LIST (Transact-SQL)
- DROP FULLTEXT CATALOG (Transact-SQL)
- DROP FULLTEXT INDEX (Transact-SQL)
- DROP FULLTEXT STOPLIST (Transact-SQL)
- DROP SEARCH PROPERTY LIST (Transact-SQL)

System Predicates and Functions
- CONTAINS (Transact-SQL)
- CONTAINSTABLE (Transact-SQL)
- FREETEXT (Transact-SQL)
- FREETEXTTABLE (Transact-SQL)

System Metadata Functions
- COLUMNPROPERTY (Transact-SQL)
- FULLTEXTCATALOGPROPERTY (Transact-SQL)
- FULLTEXTSERVICEPROPERTY (Transact-SQL)
- INDEXPROPERTY (Transact-SQL)
- OBJECTPROPERTY (Transact-SQL)
- OBJECTPROPERTYEX (Transact-SQL)
- SERVERPROPERTY (Transact-SQL)

System Stored Procedures
- sp_fulltext_keymappings (Transact-SQL)
- sp_fulltext_load_thesaurus_file (Transact-SQL)
- sp_fulltext_pendingchanges (Transact-SQL)
- sp_fulltext_service (Transact-SQL)
- sp_help_fulltext_system_components (Transact-SQL)

System Views – Catalog Views
- sys.fulltext_catalogs (Transact-SQL)
- sys.fulltext_document_types (Transact-SQL)
- sys.fulltext_index_catalog_usages (Transact-SQL)
- sys.fulltext_index_columns (Transact-SQL)
- sys.fulltext_index_fragments (Transact-SQL)
- sys.fulltext_indexes (Transact-SQL)
- sys.fulltext_languages (Transact-SQL)
- sys.fulltext_stoplists (Transact-SQL)
- sys.fulltext_stopwords (Transact-SQL)
- sys.fulltext_system_stopwords (Transact-SQL)
- sys.registered_search_properties (Transact-SQL)
- sys.registered_search_property_lists (Transact-SQL)

System Views – Dynamic Management Views
- sys.dm_fts_active_catalogs (Transact-SQL)
- sys.dm_fts_fdhosts (Transact-SQL)
- sys.dm_fts_index_keywords (Transact-SQL)
- sys.dm_fts_index_keywords_by_document (Transact-SQL)
- sys.dm_fts_index_keywords_by_property (Transact-SQL)
- sys.dm_fts_index_population (Transact-SQL)
- sys.dm_fts_memory_buffers (Transact-SQL)
- sys.dm_fts_memory_pools (Transact-SQL)
- sys.dm_fts_outstanding_batches (Transact-SQL)
- sys.dm_fts_parser (Transact-SQL)
- sys.dm_fts_population_ranges (Transact-SQL)

Use the Full-Text Indexing Wizard
3/30/2017

The Full-Text Indexing Wizard in SQL Server Management Studio (SSMS) walks you through a series of steps designed to help you create a full-text index.

Create a Full-Text index
1. In Object Explorer, right-click the table on which you want to create a full-text index, point to Full-Text index, and then click Define Full-Text Index. This action launches the wizard in a separate window. Click Next.
2. Unique Index. Select an index from the drop-down list. The index must be a single-key-column, unique, non-nullable index. Select the smallest unique key index for the full-text unique key. For best performance, a clustered index is recommended.
3. Available Columns. Select the check box next to the name of each column that you want to include. Ineligible columns are greyed out, and their check boxes are disabled.
4. Language for Word Breaker. Select a language from the drop-down list. This choice is used to identify the correct word breakers for the index. SQL Server uses word breakers to identify word boundaries in the full-text indexed data.
5. Type Column. Select the name of the column that holds the document type of the column being full-text indexed.
NOTE: The Type Column is enabled only when the column selected under Available Columns is of type varbinary(max) or image.
6. Statistical Semantics. Select whether to enable semantic indexing for the selected column. For more information, see Semantic Search (SQL Server).
NOTES
If your selected language does not have an associated Semantic Language Model, the Statistical Semantics check box is not enabled.
If you select Statistical Semantics before selecting a language, the languages available in the drop-down list are restricted to those that have Semantic Language Model support.
Semantic Search is not available for Azure SQL Database. The Statistical Semantics option does not appear when you run this wizard against an Azure SQL Database.
7. Select the change tracking options.
   Automatically: Select this option to have the full-text index updated automatically as changes occur to the underlying data.
   Manually: Select this option if you do not want the full-text index to be updated automatically as changes occur to the underlying data. Changes to the underlying data are maintained; however, to apply them to the full-text index, you must start or schedule this process manually.
   Do not track changes: Select this option if you do not want the full-text index to be updated with changes to the underlying data.
8. Start full population when index is created (available only when you select Do not track changes). Select this option to start a full population when the wizard completes successfully. The population creates the full-text index structure in the catalog and fills it with full-text indexed data. Click Next.

Catalog, Index Filegroup and Stoplist
1. Select full-text catalog. Either select an existing catalog or create a new one.
   a. Select a catalog: Select a full-text catalog from the list. The default catalog for the database is selected by default. If no catalogs are available, the list is disabled, and the Create a new catalog check box is selected and disabled.
   b. Create a new catalog: Select this check box to create a new full-text catalog, and then specify:
      Name: Enter a name for the new full-text catalog.
      Set as default catalog: Select to make the catalog the default for this database.
      Accent sensitivity: Specify whether the new catalog is accent-sensitive or accent-insensitive. If the database is accent-sensitive, Sensitive is selected by default.
2. Select index filegroup. Specify the filegroup on which to create the full-text index. Select one of the following values:
   <default>: If the table or view is not partitioned, select to use the same filegroup as the underlying table or view. If the table or view is partitioned, the primary filegroup is used.
   PRIMARY: Select to use the primary filegroup for the new full-text index.
   user-specified default filegroup: If a user-defined default filegroup exists, select its name from the list to use that filegroup for the new full-text index.
3. Select full-text stoplist. Specify a stoplist to use for the full-text index, or disable stoplist use. Stopwords are managed in databases by using objects called stoplists. A stoplist is a list of stopwords that, when associated with a full-text index, is applied to full-text queries on that index. For more information, see Configure and Manage Stopwords and Stoplists for Full-Text Search. Select one of the following values:
   <system>: Select to use the system stoplist on the new full-text index. This is the default.
   <off>: Select to disable stoplists for the new full-text index.
   user-defined-stoplist-name: The list displays the name of each user-defined stoplist, if any, that has been created in the database. Select any user-defined stoplist to use for the new full-text index.
   Click Next.
4. Optionally (SQL Server only), define the population schedule. Indexing operations begin immediately unless they have been scheduled for future execution. Schedules are created immediately, although they do not run until their scheduled time.
   New Table Schedule: Define a population schedule for a table.
   New Catalog Schedule: Define a population schedule for a full-text catalog.
   Edit: Edit a schedule.
   Delete: Delete a schedule.
5. View or control the progress of the Full-Text Indexing Wizard.
   Stop: Interrupts the current operation and prevents subsequent full-text operations from being performed by the wizard during this session.
   Report: When all of the operations have finished, click this button to access a report on the operations performed. You can view the report, print it to a file, copy it to the clipboard, or email it.

Deprecated Full-Text Search Features in SQL Server 2016
3/30/2017

This topic describes the deprecated full-text search features still available in SQL Server. These features are scheduled to be removed in a future release. Do not use deprecated features in new applications.

Monitor your use of deprecated features by using the SQL Server:Deprecated Features object performance counter and trace events. For more information, see Use SQL Server Objects.

Features no longer supported

DEPRECATED FEATURE | REPLACEMENT | FEATURE NAME | FEATURE ID
FULLTEXTCATALOGPROPERTY property: LogSize | None. | FULLTEXTCATALOGPROPERTY('LogSize') | 211
FULLTEXTSERVICEPROPERTY properties: ConnectTimeout, DataTimeout | None. | FULLTEXTSERVICEPROPERTY('ConnectTimeout'), FULLTEXTSERVICEPROPERTY('DataTimeout') | 210, 209
sp_fulltext_catalog | CREATE FULLTEXT CATALOG, ALTER FULLTEXT CATALOG, DROP FULLTEXT CATALOG | sp_fulltext_catalog | 84
sp_fulltext_column, sp_fulltext_database, sp_fulltext_table | CREATE FULLTEXT INDEX, ALTER FULLTEXT INDEX, DROP FULLTEXT INDEX | sp_fulltext_column, sp_fulltext_database, sp_fulltext_table | 86, 87, 85
sp_help_fulltext_catalogs, sp_help_fulltext_catalog_components, sp_help_fulltext_catalogs_cursor, sp_help_fulltext_columns, sp_help_fulltext_columns_cursor, sp_help_fulltext_tables, sp_help_fulltext_tables_cursor | sys.fulltext_catalogs, sys.fulltext_index_columns, sys.fulltext_indexes | sp_help_fulltext_catalogs, sp_help_fulltext_catalog_components, sp_help_fulltext_catalogs_cursor, sp_help_fulltext_columns, sp_help_fulltext_columns_cursor, sp_help_fulltext_tables, sp_help_fulltext_tables_cursor | 88, 203, 90, 92, 93, 91, 89
sp_fulltext_service action values clean_up, connect_timeout, and data_timeout (return zero) | None. | sp_fulltext_service @action=clean_up, sp_fulltext_service @action=connect_timeout, sp_fulltext_service @action=data_timeout | 116, 117, 118
sys.dm_fts_active_catalogs columns: is_paused, previous_status, previous_status_description, row_count_in_thousands, status, status_description, worker_count | None. | dm_fts_active_catalogs.is_paused, dm_fts_active_catalogs.previous_status, dm_fts_active_catalogs.previous_status_description, dm_fts_active_catalogs.row_count_in_thousands, dm_fts_active_catalogs.status, dm_fts_active_catalogs.status_description, dm_fts_active_catalogs.worker_count | 218, 221, 222, 224, 219, 220, 223
sys.dm_fts_memory_buffers column: row_count | None. | dm_fts_memory_buffers.row_count | 225
sys.fulltext_catalogs columns: path, data_space_id, file_id | None. | fulltext_catalogs.path, fulltext_catalogs.data_space_id, fulltext_catalogs.file_id | 215, 216, 217

Features Not Supported in a Future Version of SQL Server

The following full-text search features are supported in the next version of SQL Server but will be removed in a later version. The specific version of SQL Server has not been determined.

The Feature name value appears in trace events as the ObjectName, and in performance counters and sys.dm_os_performance_counters as the instance name. The Feature ID value appears in trace events as the ObjectId.

DEPRECATED FEATURE | REPLACEMENT | FEATURE NAME | FEATURE ID
CONTAINS and CONTAINSTABLE generic NEAR operator: {<simple_term> | <prefix_term>} { { NEAR | ~ } {<simple_term> | <prefix_term>} } [...n] | The custom NEAR operator: NEAR( { {<simple_term> | <prefix_term>} [,…n] | ( {<simple_term> | <prefix_term>} [,…n] ) [, <maximum_distance> [, <match_order>] ] } ), where <maximum_distance> ::= {integer | MAX} and <match_order> ::= {TRUE | FALSE} | FULLTEXT_OLD_NEAR_SYNTAX | 247
CREATE FULLTEXT CATALOG option: IN PATH 'rootpath' | None. | CREATE FULLTEXT CATLOG IN PATH | 237
CREATE FULLTEXT CATALOG option: ON FILEGROUP filegroup | None. | None.* | None.*
DATABASEPROPERTYEX property: IsFullTextEnabled | None. | DATABASEPROPERTYEX('IsFullTextEnabled') | 202
sp_detach_db option: [ @keepfulltextindexfile = ] 'KeepFulltextIndexFile' | None. | sp_detach_db @keepfulltextindexfile | 226
sp_fulltext_service action value: resource_usage (has no function) | None. | sp_fulltext_service @action=resource_usage | 200

* The SQL Server:Deprecated Features object does not monitor occurrences of CREATE FULLTEXT CATALOG ON FILEGROUP filegroup.

Semantic Search (SQL Server)
3/24/2017

Statistical Semantic Search provides deep insight into unstructured documents stored in SQL Server databases by extracting and indexing statistically relevant key phrases. It then uses these key phrases to identify and index documents that are similar or related.

What can you do with Semantic Search?

Semantic search builds on the existing full-text search feature in SQL Server but enables new scenarios that extend beyond keyword searches. While full-text search lets you query the words in a document, semantic search lets you query the meaning of the document. Solutions that are now possible include automatic tag extraction, related content discovery, and hierarchical navigation across similar content. For example, you can query the index of key phrases to build the taxonomy for an organization or for a corpus of documents. Or, you can query the document similarity index to identify resumes that match a job description.

The following examples demonstrate the capabilities of Semantic Search. They also demonstrate the three Transact-SQL rowset functions that you use to query the semantic indexes and retrieve the results as structured data.

Find the key phrases in a document

The following query gets the key phrases that were identified in the sample document. It presents the results in descending order by the score that ranks the statistical significance of each key phrase. This query calls the SEMANTICKEYPHRASETABLE function.
-- The variable data types are assumptions; adjust them to match the Documents table.
DECLARE @Title NVARCHAR(260);
DECLARE @DocID INT;
SET @Title = 'Sample Document.docx';
SELECT @DocID = DocumentID FROM Documents WHERE DocumentTitle = @Title;
SELECT @Title AS Title, keyphrase, score
FROM SEMANTICKEYPHRASETABLE(Documents, *, @DocID)
ORDER BY score DESC;
GO

Find similar or related documents

The following query gets the documents that were identified as similar or related to the sample document. It presents the results in descending order by the score that ranks the similarity of the two documents. This query calls the SEMANTICSIMILARITYTABLE function.

-- The variable data types are assumptions; adjust them to match the Documents table.
DECLARE @Title NVARCHAR(260);
DECLARE @DocID INT;
SET @Title = 'Sample Document.docx';
SELECT @DocID = DocumentID FROM Documents WHERE DocumentTitle = @Title;
SELECT @Title AS SourceTitle, d.DocumentTitle AS MatchedTitle, d.DocumentID, sst.score
FROM SEMANTICSIMILARITYTABLE(Documents, *, @DocID) AS sst
INNER JOIN Documents AS d ON d.DocumentID = sst.matched_document_key
ORDER BY sst.score DESC;
GO

Find the key phrases that make documents similar or related

The following query gets the key phrases that make the two sample documents similar or related to one another. It presents the results in descending order by the score that ranks the weight of each key phrase. This query calls the SEMANTICSIMILARITYDETAILSTABLE function.

-- The variable data types are assumptions; adjust them to match the Documents table.
DECLARE @SourceTitle NVARCHAR(260), @MatchedTitle NVARCHAR(260);
DECLARE @SourceDocID INT, @MatchedDocID INT;
SET @SourceTitle = 'first.docx';
SET @MatchedTitle = 'second.docx';
SELECT @SourceDocID = DocumentID FROM Documents WHERE DocumentTitle = @SourceTitle;
SELECT @MatchedDocID = DocumentID FROM Documents WHERE DocumentTitle = @MatchedTitle;
SELECT @SourceTitle AS SourceTitle, @MatchedTitle AS MatchedTitle, keyphrase, score
FROM SEMANTICSIMILARITYDETAILSTABLE(Documents, DocumentContent, @SourceDocID, DocumentContent, @MatchedDocID)
ORDER BY score DESC;
GO

Store your documents in SQL Server

Before you can index documents with Semantic Search, you have to store the documents in a SQL Server database.

The FileTable feature in SQL Server makes unstructured files and documents first-class citizens of the relational database.
As a result, database developers can manipulate documents together with structured data in Transact-SQL set-based operations. For more information about the FileTable feature, see FileTables (SQL Server). For information about the FILESTREAM feature, which is another option for storing documents in the database, see FILESTREAM (SQL Server).

Related tasks
- Install and Configure Semantic Search: Describes the prerequisites for statistical semantic search and how to install or check them.
- Enable Semantic Search on Tables and Columns: Describes how to enable or disable statistical semantic indexing on selected columns that contain documents or text.
- Find Key Phrases in Documents with Semantic Search: Describes how to find the key phrases in documents or text columns that are configured for statistical semantic indexing.
- Find Similar and Related Documents with Semantic Search: Describes how to find similar or related documents or text values, and information about how they are similar or related, in columns that are configured for statistical semantic indexing.
- Manage and Monitor Semantic Search: Describes the process of semantic indexing and the tasks related to monitoring and managing the indexes.

Related content
- Semantic Search DDL, Functions, Stored Procedures, and Views: Lists the Transact-SQL statements and the SQL Server database objects added or changed to support statistical semantic search.

Install and Configure Semantic Search
3/24/2017

Describes the prerequisites for statistical semantic search and how to install or check them.

Install Semantic Search

Check whether Semantic Search is installed
Query the IsFullTextInstalled property of the SERVERPROPERTY (Transact-SQL) metadata function. A return value of 1 indicates that Full-Text Search and Semantic Search are installed; a return value of 0 indicates that they are not installed.
SELECT SERVERPROPERTY('IsFullTextInstalled');
GO

Install Semantic Search
To install Semantic Search, select Full-Text and Semantic Extractions for Search on the Features to Install page during SQL Server setup. Statistical Semantic Search depends on Full-Text Search; these two optional features of SQL Server are installed together.

Install the Semantic Language Statistics Database
Semantic Search has an additional external dependency called the semantic language statistics database. This database contains the statistical language models required by semantic search. A single semantic language statistics database contains the language models for all the languages that are supported for semantic indexing.

Check whether the Semantic Language Statistics Database is installed
Query the catalog view sys.fulltext_semantic_language_statistics_database (Transact-SQL). If the semantic language statistics database is installed and registered for the instance, the query results contain a single row of information about the database.

SELECT * FROM sys.fulltext_semantic_language_statistics_database;
GO

Install, attach, and register the Semantic Language Statistics Database
The semantic language statistics database is not installed by the SQL Server setup program. To set up the Semantic Language Statistics Database as a prerequisite for semantic indexing, do the following:
1. Install the semantic language statistics database.
   a. Locate the semantic language statistics database on the SQL Server installation media, or download it from the Web:
      - Locate the Windows installer package named SemanticLanguageDatabase.msi on the SQL Server installation media.
      - Download the installer package from the Microsoft® SQL Server® 2016 Semantic Language Statistics page on the Microsoft Download Center.
   b. Run the SemanticLanguageDatabase.msi Windows installer package to extract the database and log file. You can optionally change the destination directory.
      By default, the installer extracts the files to a folder named Microsoft Semantic Language Database in the Program Files folder. The MSI file contains a compressed database file and log file.
   c. Move the extracted database file and log file to a suitable location in the file system. If you leave the files in their default location, you cannot extract another copy of the database for another instance of SQL Server.
   IMPORTANT: When the semantic language statistics database is extracted, restricted permissions are assigned to the database file and log file in the default location in the file system. As a result, you may not have permission to attach the database if you leave it in the default location. If an error is raised when you try to attach the database, move the files, or check and fix file system permissions as appropriate.
2. Attach the semantic language statistics database.
   Attach the database to the instance of SQL Server by using Management Studio or by calling CREATE DATABASE (SQL Server Transact-SQL) with the FOR ATTACH syntax. For more information, see Database Detach and Attach (SQL Server).
   By default, the name of the database is semanticsdb. You can optionally give the database a different name when you attach it; you have to provide this name when you register the database in the next step.

CREATE DATABASE semanticsdb
    ON ( FILENAME = 'C:\Microsoft Semantic Language Database\semanticsdb.mdf' )
    LOG ON ( FILENAME = 'C:\Microsoft Semantic Language Database\semanticsdb_log.ldf' )
    FOR ATTACH;
GO

   This code sample assumes that you have moved the database from its default location to a new location.
3. Register the semantic language statistics database.
   Call the stored procedure sp_fulltext_semantic_register_language_statistics_db (Transact-SQL) and provide the name that you gave to the database when you attached it.
EXEC sp_fulltext_semantic_register_language_statistics_db @dbname = N'semanticsdb';
GO

Requirements and restrictions for the Semantic Language Statistics Database
- You can attach and register only one semantic language statistics database on an instance of SQL Server.
- Each instance of SQL Server on a single computer requires a separate physical copy of the semantic language statistics database. Attach one copy to each instance.
- You cannot detach a valid and registered semantic language statistics database and replace it with an arbitrary database that has the same name. Doing so causes active or future index populations to fail.
- The semantic language statistics database is read-only. You cannot customize this database. If you alter the content of the database in any way, the results of future semantic indexing are nondeterministic. To restore the original state of this data, drop the altered database, and then download and attach a new, unaltered copy of the database.
- It is possible to detach or drop the semantic language statistics database. If any active indexing operations hold read locks on the database, the detach or drop operation fails or times out. This is consistent with existing behavior. After the database is removed, semantic indexing operations fail.

Remove the Semantic Language Statistics Database

Unregister, detach, and remove the Semantic Language Statistics Database
1. Unregister the semantic language statistics database.
   Call the stored procedure sp_fulltext_semantic_unregister_language_statistics_db (Transact-SQL). You do not have to provide the name of the database, because an instance can have only one semantic language statistics database.

EXEC sp_fulltext_semantic_unregister_language_statistics_db;
GO

2. Detach the semantic language statistics database.
   Call the stored procedure sp_detach_db (Transact-SQL) and provide the name of the database.

USE master;
GO
EXEC sp_detach_db @dbname = N'semanticsdb';
GO

3. Remove the semantic language statistics database.
   After unregistering and detaching the database, you can simply delete the database file. There is no uninstall program, and there is no entry in Programs and Features in Control Panel.

Install optional support for newer document types

Install the latest filters for Microsoft Office and other Microsoft document types
SQL Server installs the latest Microsoft word breakers and stemmers, but does not install the latest filters for Microsoft Office documents and other Microsoft document types. These filters are required for indexing documents created with recent versions of Microsoft Office and other Microsoft applications. To download the latest filters, see Microsoft Office 2010 Filter Packs. (There does not appear to be a Filter Pack release for Office 2013 or Office 2016.)

Enable Semantic Search on Tables and Columns
3/24/2017

Describes how to enable or disable statistical semantic indexing on selected columns that contain documents or text.

Statistical Semantic Search uses the indexes that are created by Full-Text Search and creates additional indexes. Because of this dependency on full-text search, you create a new semantic index when you define a new full-text index or when you alter an existing full-text index. You can create a new semantic index by using Transact-SQL statements, or by using the Full-Text Indexing Wizard and other dialog boxes in SQL Server Management Studio, as described in this topic.

Create a semantic index

Requirements and restrictions for creating a semantic index
- You can create an index on any of the database objects that are supported for full-text indexing, including tables and indexed views.
- Before you can enable semantic indexing for specific columns, the following prerequisites must exist:
  - A full-text catalog must exist for the database.
  - The table must have a full-text index.
  - The selected columns must participate in the full-text index.
  You can create and enable all these requirements at the same time.
- You can create a semantic index on columns that have any of the data types that are supported for full-text indexing. For more information, see Create and Manage Full-Text Indexes.
- You can specify any document type that is supported for full-text indexing for varbinary(max) columns. For more information, see Determine which document types can be indexed in this topic.
- Semantic indexing creates two types of indexes for the columns that you select: an index of key phrases and an index of document similarity. You cannot select only one type of index or the other when you enable semantic indexing. However, you can query these two indexes independently. For more information, see Find Key Phrases in Documents with Semantic Search and Find Similar and Related Documents with Semantic Search.
- If you do not explicitly specify an LCID for a semantic index, only the primary language and its associated language statistics are used for semantic indexing.
- If you specify a language for a column for which the language model is not available, the creation of the index fails and returns an error message.

Create a semantic index when there is no full-text index
When you create a new full-text index with the CREATE FULLTEXT INDEX statement, you can enable semantic indexing at the column level by specifying the keyword STATISTICAL_SEMANTICS as part of the column definition. You can also enable semantic indexing when you use the Full-Text Indexing Wizard to create a new full-text index.

Create a new semantic index by using Transact-SQL
Call the CREATE FULLTEXT INDEX statement and specify STATISTICAL_SEMANTICS for each column on which you want to create a semantic index. For more information about all the options for this statement, see CREATE FULLTEXT INDEX (Transact-SQL).

Example 1: Create a unique index, full-text index, and semantic index
The following example creates a default full-text catalog, ft.
The example then creates a unique index on the JobCandidateID column of the HumanResources.JobCandidate table of the AdventureWorks2012 sample database. This unique index is required as the key index for a full-text index. The example then creates a full-text index and a semantic index on the Resume column.

CREATE FULLTEXT CATALOG ft AS DEFAULT;
GO

CREATE UNIQUE INDEX ui_ukJobCand
    ON HumanResources.JobCandidate (JobCandidateID);
GO

-- KEY INDEX takes the name of the unique index, not the name of the key column.
CREATE FULLTEXT INDEX ON HumanResources.JobCandidate
    (Resume LANGUAGE 1033 STATISTICAL_SEMANTICS)
    KEY INDEX ui_ukJobCand
    WITH STOPLIST = SYSTEM;
GO

Example 2: Create a full-text and semantic index on several columns with delayed index population
The following example creates a full-text catalog, documents_catalog, in the AdventureWorks2012 sample database. The example then creates a full-text index that uses this new catalog. The full-text index is created on the Title, DocumentSummary, and Document columns of the Production.Document table, while the semantic index is created only on the Document column. This full-text index uses the newly created full-text catalog and an existing unique key index, PK_Document_DocumentID. As recommended, this index key is created on an integer column, DocumentID. The example specifies the LCID for English, 1033, which is the language of the data in the columns.

This example also specifies that change tracking is off with no population. Later, during off-peak hours, the example uses an ALTER FULLTEXT INDEX statement to start a full population on the new index and enable automatic change tracking.
CREATE FULLTEXT CATALOG documents_catalog;
GO

CREATE FULLTEXT INDEX ON Production.Document
(
    Title LANGUAGE 1033,
    DocumentSummary LANGUAGE 1033,
    Document TYPE COLUMN FileExtension LANGUAGE 1033 STATISTICAL_SEMANTICS
)
KEY INDEX PK_Document_DocumentID
ON documents_catalog
WITH CHANGE_TRACKING OFF, NO POPULATION;
GO

Later, at an off-peak time, the index is populated:

ALTER FULLTEXT INDEX ON Production.Document SET CHANGE_TRACKING AUTO;
GO

Create a new semantic index by using SQL Server Management Studio
Run the Full-Text Indexing Wizard and enable Statistical Semantics on the Select Table Columns page for each column on which you want to create a semantic index. For more information, including how to start the Full-Text Indexing Wizard, see Use the Full-Text Indexing Wizard.

Create a semantic index when there is an existing full-text index
You can add semantic indexing when you alter an existing full-text index with the ALTER FULLTEXT INDEX statement. You can also add semantic indexing by using various dialog boxes in SQL Server Management Studio.

Add a semantic index by using Transact-SQL
Call the ALTER FULLTEXT INDEX statement with the options described below for each column on which you want to add a semantic index. For more information about all the options for this statement, see ALTER FULLTEXT INDEX (Transact-SQL).
Both full-text and semantic indexes are repopulated after a call to ALTER, unless you specify otherwise.
- To add full-text indexing only to a column, use the ADD syntax.
- To add both full-text and semantic indexing to a column, use the ADD syntax with the STATISTICAL_SEMANTICS option.
- To add semantic indexing to a column that is already enabled for full-text indexing, use the ALTER COLUMN column_name ADD STATISTICAL_SEMANTICS option.
You can add semantic indexing to only one column in a single ALTER statement.
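The three ADD variants listed above can be sketched as follows. The table dbo.Articles and its columns Abstract, Body, and Summary are hypothetical; the sketch assumes the table already has a full-text index that includes Summary but not Abstract or Body. Because an existing index cannot be altered while a population is in progress, each statement uses WITH NO POPULATION, and a single full population is started at the end.

```sql
-- Hypothetical table and columns, for illustration only.
-- Assumes dbo.Articles already has a full-text index that includes Summary.

-- 1. Add full-text indexing only: ADD without STATISTICAL_SEMANTICS.
ALTER FULLTEXT INDEX ON dbo.Articles
    ADD (Abstract LANGUAGE 1033)
    WITH NO POPULATION;
GO

-- 2. Add both full-text and semantic indexing to a new column.
ALTER FULLTEXT INDEX ON dbo.Articles
    ADD (Body LANGUAGE 1033 STATISTICAL_SEMANTICS)
    WITH NO POPULATION;
GO

-- 3. Add semantic indexing to a column that is already full-text indexed.
--    Only one column can gain semantic indexing per ALTER statement.
ALTER FULLTEXT INDEX ON dbo.Articles
    ALTER COLUMN Summary ADD STATISTICAL_SEMANTICS
    WITH NO POPULATION;
GO

-- Populate once, after all schema changes, for example at an off-peak time.
ALTER FULLTEXT INDEX ON dbo.Articles START FULL POPULATION;
GO
```

Deferring population this way trades immediate index freshness for fewer population cycles; if you omit WITH NO POPULATION, each statement triggers its own repopulation.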
Example: Add semantic indexing to a column that already has full-text indexing
The following example alters an existing full-text index on the Production.Document table in the AdventureWorks2012 sample database. The example adds a semantic index on the Document column, which already has a full-text index. The example specifies that the index will not be repopulated automatically.

ALTER FULLTEXT INDEX ON Production.Document
    ALTER COLUMN Document
    ADD STATISTICAL_SEMANTICS
    WITH NO POPULATION;
GO

Add a semantic index by using SQL Server Management Studio
You can change the columns that are enabled for semantic and full-text indexing on the Full-Text Index Columns page of the Full-Text Index Properties dialog box. For more information, see Manage Full-Text Indexes.

Alter a semantic index

Requirements and restrictions for altering an existing index
- You cannot alter an existing index while population of the index is in progress. For more information about monitoring the progress of index population, see Manage and Monitor Semantic Search.
- You cannot add indexing to a column, and alter or drop indexing for the same column, in a single call to the ALTER FULLTEXT INDEX statement.

Drop a semantic index
You can drop semantic indexing when you alter an existing full-text index with the ALTER FULLTEXT INDEX statement. You can also drop semantic indexing by using various dialog boxes in SQL Server Management Studio.

Drop a semantic index by using Transact-SQL
To drop semantic indexing only from a column or columns, call the ALTER FULLTEXT INDEX statement with the ALTER COLUMN column_name DROP STATISTICAL_SEMANTICS option. You can drop the indexing from multiple columns in a single ALTER statement.

USE database_name;
GO
ALTER FULLTEXT INDEX ON table_name
    ALTER COLUMN column_name
    DROP STATISTICAL_SEMANTICS;
GO

To drop both full-text and semantic indexing from a column, call the ALTER FULLTEXT INDEX statement with the DROP (column_name) option.

USE database_name;
GO
ALTER FULLTEXT INDEX ON table_name
    DROP (column_name);
GO

Drop a semantic index by using SQL Server Management Studio
You can change the columns that are enabled for semantic and full-text indexing on the Full-Text Index Columns page of the Full-Text Index Properties dialog box. For more information, see Manage Full-Text Indexes.

Requirements and restrictions for dropping a semantic index
- You cannot drop full-text indexing from a column while retaining semantic indexing. Semantic indexing depends on full-text indexing for document similarity results.
- You cannot specify the NO POPULATION option when you drop semantic indexing from the last column in a table for which semantic indexing was enabled. A population cycle is required to remove the results that were indexed previously.

Check whether semantic search is enabled on database objects

Is semantic search enabled for a database?
Query the IsFullTextEnabled property of the DATABASEPROPERTYEX (Transact-SQL) metadata function. A return value of 1 indicates that full-text search and semantic search are enabled for the database; a return value of 0 indicates that they are not enabled.

SELECT DATABASEPROPERTYEX('database_name', 'IsFullTextEnabled');
GO

Is semantic search enabled for a table?
Query the TableFullTextSemanticExtraction property of the OBJECTPROPERTYEX (Transact-SQL) metadata function. A return value of 1 indicates that semantic search is enabled for the table; a return value of 0 indicates that it is not enabled.

SELECT OBJECTPROPERTYEX(OBJECT_ID('table_name'), 'TableFullTextSemanticExtraction');
GO

Is semantic search enabled for a column?
To determine whether semantic search is enabled for a specific column, use any of the following methods:
- Query the StatisticalSemantics property of the COLUMNPROPERTY (Transact-SQL) metadata function. A return value of 1 indicates that semantic search is enabled for the column; a return value of 0 indicates that it is not enabled.
    SELECT COLUMNPROPERTY(OBJECT_ID('table_name'), 'column_name', 'StatisticalSemantics')
    GO

Query the catalog view sys.fulltext_index_columns (Transact-SQL) for the full-text index. A value of 1 in the statistical_semantics column indicates that the specified column is enabled for semantic indexing in addition to full-text indexing.

    SELECT * FROM sys.fulltext_index_columns WHERE object_id = OBJECT_ID('table_name')
    GO

In Object Explorer in Management Studio, right-click a column and select Properties. On the General page of the Column Properties dialog box, check the value of the Statistical Semantics property. A value of True indicates that the specified column is enabled for semantic indexing in addition to full-text indexing.

Determine what can be indexed for Semantic Search

Check which languages are supported for Semantic Search

IMPORTANT: Fewer languages are supported for semantic indexing than for full-text indexing. As a result, there may be columns that you can index for full-text search but not for semantic search.

Query the catalog view sys.fulltext_semantic_languages (Transact-SQL).

    SELECT * FROM sys.fulltext_semantic_languages
    GO

The following languages are supported for semantic indexing. This list represents the output of the catalog view sys.fulltext_semantic_languages (Transact-SQL), ordered by LCID.

LANGUAGE | LCID
German | 1031
English (US) | 1033
French | 1036
Italian | 1040
Portuguese (Brazil) | 1046
Russian | 1049
Swedish | 1053
English (UK) | 2057
Portuguese (Portugal) | 2070
Spanish | 3082

Determine which document types can be indexed

Query the catalog view sys.fulltext_document_types (Transact-SQL). If the document type that you want to index is not in the list of supported types, you may have to locate, download, and install additional filters. For more information, see View or Change Registered Filters and Word Breakers.
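The language and document-type checks above can be sketched as two queries. In this sketch, the extension '.docx' is only an example value, and 'table_name' is a placeholder as in the earlier queries.

The second query builds a per-column report:

- It joins sys.fulltext_index_columns to sys.columns to get column names.
- It left-joins sys.fulltext_semantic_languages on the LCID.
- It flags full-text indexed columns whose language does not support semantic indexing.

```sql
-- Is a filter registered for a given file type? ('.docx' is an example.)
-- No row returned means an additional filter may have to be installed.
SELECT document_type, path, version, manufacturer
FROM sys.fulltext_document_types
WHERE document_type = N'.docx';
GO

-- For each full-text indexed column of a table: its language, whether
-- semantic indexing is enabled, and whether the language supports it.
SELECT c.name AS column_name,
       fic.language_id,
       fic.statistical_semantics,
       CASE WHEN sl.lcid IS NULL THEN 0 ELSE 1 END AS language_supports_semantics
FROM sys.fulltext_index_columns AS fic
JOIN sys.columns AS c
    ON c.object_id = fic.object_id
   AND c.column_id = fic.column_id
LEFT JOIN sys.fulltext_semantic_languages AS sl
    ON sl.lcid = fic.language_id
WHERE fic.object_id = OBJECT_ID('table_name');
GO
```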
Best practice: Consider creating a separate filegroup for the full-text and semantic indexes

Consider creating a separate filegroup for the full-text and semantic indexes if disk space allocation is a concern. The semantic indexes are created in the same filegroup as the full-text index, and a fully populated semantic index may contain a large amount of data.

Issue: Searching on a specific column returns no results

Was an LCID for a Unicode-only language specified for a non-Unicode column? It is possible to enable semantic indexing on a non-Unicode column type with an LCID for a language that only has Unicode words, such as LCID 1049 for Russian. In this case, no results will ever be returned from the semantic indexes on this column.

Find Key Phrases in Documents with Semantic Search

Describes how to find the key phrases in documents or text columns that are configured for statistical semantic indexing.

Find the key phrases in documents with SEMANTICKEYPHRASETABLE

To identify the key phrases in specific documents, or to identify documents that contain specific key phrases, query the function semantickeyphrasetable (Transact-SQL).

SEMANTICKEYPHRASETABLE returns a table with zero, one, or more rows for the key phrases associated with columns in the specified table. This rowset function can be referenced in the FROM clause of a SELECT statement as if it were a regular table name.

NOTE: In this release, only single words are indexed for semantic search; multi-word phrases (n-grams) are not indexed. Also, different forms of the same word are indexed separately; for example, "computer" and "computers" are indexed separately.

For detailed information about the parameters required by the SEMANTICKEYPHRASETABLE function, and about the table of results that it returns, see semantickeyphrasetable (Transact-SQL).

IMPORTANT: The columns that you target must have full-text and semantic indexing enabled.
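The target column must have both full-text and semantic indexing enabled. A script can therefore check the StatisticalSemantics column property before calling SEMANTICKEYPHRASETABLE.

This is a sketch under stated assumptions: the table dbo.Articles, its Body column, and an int full-text key value are all hypothetical. As a precaution, the query runs through sp_executesql so that the SEMANTICKEYPHRASETABLE call is compiled only when the check succeeds.

```sql
-- Hypothetical key value from the full-text key column of dbo.Articles
-- (assumed here to be an int column).
DECLARE @ArticleId int = 1;

IF COLUMNPROPERTY(OBJECT_ID(N'dbo.Articles'), N'Body', 'StatisticalSemantics') = 1
BEGIN
    -- Top 10 key phrases for one document, highest score first.
    EXEC sys.sp_executesql
        N'SELECT TOP (10) KEYP_TBL.keyphrase, KEYP_TBL.score
          FROM SEMANTICKEYPHRASETABLE(dbo.Articles, Body, @key) AS KEYP_TBL
          ORDER BY KEYP_TBL.score DESC;',
        N'@key int',
        @key = @ArticleId;
END
ELSE
BEGIN
    -- COLUMNPROPERTY returns 0 when semantic indexing is not enabled,
    -- or NULL when the table or column does not exist.
    PRINT N'Semantic indexing is not enabled on dbo.Articles.Body.';
END
GO
```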
Example 1: Find the top key phrases in a specific document

The following example retrieves the top 10 key phrases from the document specified by the @DocumentId variable in the Document column of the Production.Document table of the AdventureWorks sample database. The @DocumentId variable represents a value from the key column of the full-text index.

    SELECT TOP(10) KEYP_TBL.keyphrase
    FROM SEMANTICKEYPHRASETABLE
        (
        Production.Document,
        Document,
        @DocumentId
        ) AS KEYP_TBL
    ORDER BY KEYP_TBL.score DESC;
    GO

The SEMANTICKEYPHRASETABLE function retrieves these results efficiently by using an index seek instead of a table scan.

Example 2: Find the top documents that contain a specific key phrase

The following example retrieves the top 25 documents that contain the key phrase “Bracket” from the Document column of the Production.Document table of the AdventureWorks sample database.

    SELECT TOP (25) DOC_TBL.DocumentID, DOC_TBL.DocumentSummary
    FROM Production.Document AS DOC_TBL
        INNER JOIN SEMANTICKEYPHRASETABLE
        (
        Production.Document,
        Document
        ) AS KEYP_TBL
        ON DOC_TBL.DocumentID = KEYP_TBL.document_key
    WHERE KEYP_TBL.keyphrase = 'Bracket'
    ORDER BY KEYP_TBL.score DESC;
    GO

Find Similar and Related Documents with Semantic Search

Describes how to find similar or related documents or text values, and information about how they are similar or related, in columns that are configured for statistical semantic indexing.

Find similar or related documents with SEMANTICSIMILARITYTABLE

To identify similar or related documents in a specific column, query the function semanticsimilaritytable (Transact-SQL).

SEMANTICSIMILARITYTABLE returns a table of zero, one, or more rows whose content in the specified column is semantically similar to the specified document. This rowset function can be referenced in the FROM clause of a SELECT statement like a regular table name.

You cannot query across columns for similar documents.
The SEMANTICSIMILARITYTABLE function only retrieves results from the same column that contains the source document; the source document itself is identified by the source_key argument.

For detailed information about the parameters required by the SEMANTICSIMILARITYTABLE function, and about the table of results that it returns, see semanticsimilaritytable (Transact-SQL).

IMPORTANT: The columns that you target must have full-text and semantic indexing enabled.

Example: Find the top documents that are similar to another document

The following example retrieves the top 10 candidates who are similar to the candidate specified by @CandidateID from the HumanResources.JobCandidate table in the AdventureWorks2012 sample database.

    SELECT TOP(10) KEY_TBL.matched_document_key AS Candidate_ID
    FROM SEMANTICSIMILARITYTABLE
        (
        HumanResources.JobCandidate,
        Resume,
        @CandidateID
        ) AS KEY_TBL
    ORDER BY KEY_TBL.score DESC;
    GO

Find info about how documents are similar or related with SEMANTICSIMILARITYDETAILSTABLE

To get information about the key phrases that make documents similar or related, query the function semanticsimilaritydetailstable (Transact-SQL).

SEMANTICSIMILARITYDETAILSTABLE returns a table of zero, one, or more rows of key phrases that are common to two documents (a source document and a matched document) whose content is semantically similar. This rowset function can be referenced in the FROM clause of a SELECT statement like a regular table name.

For detailed information about the parameters required by the SEMANTICSIMILARITYDETAILSTABLE function, and about the table of results that it returns, see semanticsimilaritydetailstable (Transact-SQL).

IMPORTANT: The columns that you target must have full-text and semantic indexing enabled.

Example: Find the top key phrases that are similar between documents

The following example retrieves the five key phrases that have the highest similarity score between the specified candidates in the HumanResources.JobCandidate table of the AdventureWorks2012 sample database.
    SELECT TOP(5) KEY_TBL.keyphrase, KEY_TBL.score
    FROM SEMANTICSIMILARITYDETAILSTABLE
        (
        HumanResources.JobCandidate,
        Resume, @CandidateID,
        Resume, @MatchedID
        ) AS KEY_TBL
    ORDER BY KEY_TBL.score DESC;
    GO

Manage and Monitor Semantic Search

Describes the process of semantic indexing and the tasks related to managing and monitoring the indexes.

Check the status of semantic indexing

Is the first phase of semantic indexing complete?

Query the dynamic management view sys.dm_fts_index_population (Transact-SQL), and check the status and status_description columns.

The first phase of indexing includes the population of the full-text keyword index and the semantic key phrase index, as well as the extraction of document similarity data.

    USE database_name
    GO

    SELECT * FROM sys.dm_fts_index_population WHERE table_id = OBJECT_ID('table_name')
    GO

Is the second phase of semantic indexing complete?

Query the dynamic management view sys.dm_fts_semantic_similarity_population (Transact-SQL), and check the status and status_description columns.

The second phase of indexing includes the population of the semantic document similarity index.

    USE database_name
    GO

    SELECT * FROM sys.dm_fts_semantic_similarity_population WHERE table_id = OBJECT_ID('table_name')
    GO

Check the size of the semantic indexes

What is the logical size of a semantic key phrase index or a semantic document similarity index?

Query the dynamic management view sys.dm_db_fts_index_physical_stats (Transact-SQL). The logical size is reported as a number of index pages.

    USE database_name
    GO

    SELECT * FROM sys.dm_db_fts_index_physical_stats WHERE object_id = OBJECT_ID('table_name')
    GO

What is the total size of the full-text and semantic indexes for a full-text catalog?

Query the IndexSize property of the FULLTEXTCATALOGPROPERTY (Transact-SQL) metadata function.
    SELECT FULLTEXTCATALOGPROPERTY('catalog_name', 'IndexSize')
    GO

How many items are indexed in the full-text and semantic indexes for a full-text catalog?

Query the ItemCount property of the FULLTEXTCATALOGPROPERTY (Transact-SQL) metadata function.

    SELECT FULLTEXTCATALOGPROPERTY('catalog_name', 'ItemCount')
    GO

Force the population of the semantic indexes

You can force the population of full-text and semantic indexes by using the START, STOP, PAUSE, or RESUME POPULATION clause, with the same syntax and behavior that is described for full-text indexes. For more information, see ALTER FULLTEXT INDEX (Transact-SQL) and Populate Full-Text Indexes.

Because semantic indexing depends on full-text indexing, semantic indexes are only populated when the associated full-text indexes are populated.

Example: Start a full population of full-text and semantic indexes

The following example starts a full population of both full-text and semantic indexes by altering an existing full-text index on the Production.Document table in the AdventureWorks2012 sample database.

    USE AdventureWorks2012
    GO

    ALTER FULLTEXT INDEX ON Production.Document
        START FULL POPULATION
    GO

Disable or re-enable semantic indexing

You can enable or disable full-text or semantic indexing by using the ENABLE/DISABLE clause, with the same syntax and behavior that is described for full-text indexes. For more information, see ALTER FULLTEXT INDEX (Transact-SQL).

When semantic indexing is disabled and suspended, queries over semantic data continue to work successfully and to return previously indexed data. This behavior is not consistent with the behavior of Full-Text Search.

    -- To disable semantic indexing on a table
    USE database_name
    GO

    ALTER FULLTEXT INDEX ON table_name DISABLE
    GO

    -- To re-enable semantic indexing on a table
    USE database_name
    GO

    ALTER FULLTEXT INDEX ON table_name ENABLE
    GO

About the phases of semantic indexing

Semantic Search indexes two kinds of data for each column on which it is enabled:

1. Key phrases
2. Document similarity

Semantic indexing occurs in two phases, in conjunction with full-text indexing:

Phase 1. The full-text keyword index and the semantic key phrase index are populated in parallel. The data required to index document similarity is also extracted at this time.

Phase 2. The semantic document similarity index is then populated. This index depends on both indexes that were populated in the preceding phase.

Issue: Semantic Indexes Are Not Populated

Are the associated full-text indexes populated?

Because semantic indexing depends on full-text indexing, semantic indexes are only populated when the associated full-text indexes are populated.

Are full-text search and semantic search properly installed and configured?

For more information, see Install and Configure Semantic Search.

Is the FDHOST service unavailable, or is there another condition that would cause full-text indexing to fail?

For more information, see Troubleshoot Full-Text Indexing.

Semantic Search DDL, Functions, Stored Procedures, and Views

Lists the Transact-SQL statements and the database objects that support statistical semantic search in SQL Server. For the list of statements and database objects that support full-text search, see Full-Text Search DDL, Functions, Stored Procedures, and Views.
Data Definition Language (DDL) Statements

OBJECT | MORE INFORMATION
ALTER FULLTEXT INDEX (Transact-SQL) | Enable Semantic Search on Tables and Columns
CREATE FULLTEXT INDEX (Transact-SQL) | Enable Semantic Search on Tables and Columns

System Functions

OBJECT | MORE INFORMATION
semantickeyphrasetable (Transact-SQL) | Find Key Phrases in Documents with Semantic Search
semanticsimilaritydetailstable (Transact-SQL) | Find Similar and Related Documents with Semantic Search
semanticsimilaritytable (Transact-SQL) | Find Similar and Related Documents with Semantic Search

System Metadata Functions

OBJECT | MORE INFORMATION
COLUMNPROPERTY (Transact-SQL) | Enable Semantic Search on Tables and Columns
DATABASEPROPERTYEX (Transact-SQL) | Enable Semantic Search on Tables and Columns
FULLTEXTCATALOGPROPERTY (Transact-SQL) | Manage and Monitor Semantic Search
INDEXPROPERTY (Transact-SQL) | Manage and Monitor Semantic Search
OBJECTPROPERTYEX (Transact-SQL) | Enable Semantic Search on Tables and Columns
SERVERPROPERTY (Transact-SQL) | Install and Configure Semantic Search

System Stored Procedures

OBJECT | MORE INFORMATION
sp_fulltext_semantic_register_language_statistics_db (Transact-SQL) | Install and Configure Semantic Search
sp_fulltext_semantic_unregister_language_statistics_db (Transact-SQL) | Install and Configure Semantic Search

Catalog Views

OBJECT | MORE INFORMATION
sys.fulltext_index_columns (Transact-SQL) | Manage and Monitor Semantic Search
sys.fulltext_semantic_language_statistics_database (Transact-SQL) | Install and Configure Semantic Search
sys.fulltext_semantic_languages (Transact-SQL) | Install and Configure Semantic Search

Dynamic Management Views

OBJECT | MORE INFORMATION
sys.dm_db_fts_index_physical_stats (Transact-SQL) | Manage and Monitor Semantic Search
sys.dm_fts_index_population (Transact-SQL) | Manage and Monitor Semantic Search
sys.dm_fts_semantic_similarity_population (Transact-SQL) | Manage and Monitor Semantic Search

See Also

Manage and Monitor Semantic Search