Download Full-Text Search | Microsoft Docs

Document related concepts

Entity–attribute–value model wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Relational model wikipedia , lookup

SQL wikipedia , lookup

Clusterpoint wikipedia , lookup

PL/SQL wikipedia , lookup

Database model wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Transcript
Table of Contents
Full text search
Get Started with Full-Text Search
Query with Full-Text Search
Search for Words Close to Another Word with NEAR
Limit Search Results with RANK
Improve the Performance of Full-Text Queries
Search Document Properties with Search Property Lists
Find Property Set GUIDs and Property Integer IDs for Search Properties
Create and Manage Full-Text Catalogs
Create and Manage Full-Text Indexes
Choose a Language When Creating a Full-Text Index
Populate Full-Text Indexes
Improve the Performance of Full-Text Indexes
Troubleshoot Full-Text Indexing
Back Up and Restore Full-Text Catalogs and Indexes
Configure and Manage Filters for Search
Configure and Manage Word Breakers and Stemmers for Search
View or Change Registered Filters and Word Breakers
Change the Word Breaker Used for US English and UK English
Revert the Word Breakers Used by Search to the Previous Version
Customize the Behavior of Word Breakers with a Custom Dictionary
Configure and Manage Stopwords and Stoplists for Full-Text Search
Configure and Manage Thesaurus Files for Full-Text Search
Manage and Monitor Full-Text Search for a Server Instance
Set the Service Account for the Full-text Filter Daemon Launcher
Upgrade Full-Text Search
Full-Text Search DDL, Functions, Stored Procedures, and Views
Use the Full-Text Indexing Wizard
Deprecated Full-Text Search Features in SQL Server 2016
Semantic Search
Install and Configure Semantic Search
Enable Semantic Search on Tables and Columns
Find Key Phrases in Documents with Semantic Search
Find Similar and Related Documents with Semantic Search
Manage and Monitor Semantic Search
Semantic Search DDL, Functions, Stored Procedures, and Views
Full-Text Search
3/24/2017 • 16 min to read • Edit Online
Full-Text Search in SQL Server and Azure SQL Database lets users and applications run full-text queries against
character-based data in SQL Server tables.
Basic tasks
This topic provides an overview of Full-Text Search and describes its components and its architecture. If you prefer
to get started right away, here are the basic tasks.
Get Started with Full-Text Search
Create and Manage Full-Text Catalogs
Create and Manage Full-Text Indexes
Populate Full-Text Indexes
Query with Full-Text Search
NOTE
Full-Text Search is an optional component of the SQL Server Database Engine. If you didn't select Full-Text Search when you
installed SQL Server, run SQL Server Setup again to add it.
Overview
A full-text index includes one or more character-based columns in a table. These columns can have any of the
following data types: char, varchar, nchar, nvarchar, text, ntext, image, xml, or varbinary(max) and
FILESTREAM. Each full-text index indexes one or more columns from the table, and each column can use a specific
language.
Full-text queries perform linguistic searches against text data in full-text indexes by operating on words and
phrases based on the rules of a particular language such as English or Japanese. Full-text queries can include
simple words and phrases or multiple forms of a word or phrase. A full-text query returns any documents that
contain at least one match (also known as a hit). A match occurs when a target document contains all the terms
specified in the full-text query, and meets any other search conditions, such as the distance between the matching
terms.
Full-Text Search queries
After columns have been added to a full-text index, users and applications can run full-text queries on the text in the
columns. These queries can search for any of the following:
One or more specific words or phrases (simple term)
A word or a phrase where the words begin with specified text (prefix term)
Inflectional forms of a specific word (generation term)
A word or phrase close to another word or phrase (proximity term)
Synonymous forms of a specific word (thesaurus)
Words or phrases using weighted values (weighted term)
Full-text queries are not case-sensitive. For example, searching for "Aluminum" or "aluminum" returns the
same results.
Full-text queries use a small set of Transact-SQL predicates (CONTAINS and FREETEXT) and functions
(CONTAINSTABLE and FREETEXTTABLE). However, the search goals of a given business scenario influence
the structure of the full-text queries. For example:
e-business—searching for a product on a website:
SELECT product_id
FROM products
WHERE CONTAINS(product_description, ”Snap Happy 100EZ” OR FORMSOF(THESAURUS,’Snap Happy’) OR ‘100EZ’)
AND product_cost < 200 ;
Recruitment scenario—searching for job candidates that have experience working with SQL Server:
SELECT candidate_name,SSN
FROM candidates
WHERE CONTAINS(candidate_resume,”SQL Server”) AND candidate_division =DBA;
For more information, see Query with Full-Text Search.
Compare Full-Text Search queries to the LIKE predicate
In contrast to full-text search, the LIKE Transact-SQL predicate works on character patterns only. Also, you cannot
use the LIKE predicate to query formatted binary data. Furthermore, a LIKE query against a large amount of
unstructured text data is much slower than an equivalent full-text query against the same data. A LIKE query against
millions of rows of text data can take minutes to return; whereas a full-text query can take only seconds or less
against the same data, depending on the number of rows that are returned.
Full-Text Search architecture
Full-text search architecture consists of the following processes:
The SQL Server process (sqlservr.exe).
The filter daemon host process (fdhost.exe).
For security reasons, filters are loaded by separate processes called the filter daemon hosts. The fdhost.exe
processes are created by an FDHOST launcher service (MSSQLFDLauncher), and they run under the security
credentials of the FDHOST launcher service account. Therefore, the FDHOST launcher service must be
running for full-text indexing and full-text querying to work. For information about setting the service
account for this service, see Set the Service Account for the Full-text Filter Daemon Launcher.
These two processes contain the components of the full-text search architecture. These components and
their relationships are summarized in the following illustration. The components are described after the
illustration.
SQL Server process
The SQL Server process uses the following components for full-text search:
User tables. These tables contain the data to be full-text indexed.
Full-text gatherer. The full-text gatherer works with the full-text crawl threads. It is responsible for
scheduling and driving the population of full-text indexes, and also for monitoring full-text catalogs.
Thesaurus files. These files contain synonyms of search terms. For more information, see Configure and
Manage Thesaurus Files for Full-Text Search.
Stoplist objects. Stoplist objects contain a list of common words that are not useful for the search. For
more information, see Configure and Manage Stopwords and Stoplists for Full-Text Search.
SQL Server query processor. The query processor compiles and executes SQL queries. If a SQL query
includes a full-text search query, the query is sent to the Full-Text Engine, both during compilation and
during execution. The query result is matched against the full-text index.
Full-Text Engine. The Full-Text Engine in SQL Server is fully integrated with the query processor. The FullText Engine compiles and executes full-text queries. As part of query execution, the Full-Text Engine might
receive input from the thesaurus and stoplist.
NOTE
In SQL Server 2008 and later versions, the Full-Text Engine resides in the SQL Server process, rather than in a
separate service. Integrating the Full-Text Engine into the Database Engine improved full-text manageability,
optimization of mixed query, and overall performance.
Index writer (indexer). The index writer builds the structure that is used to store the indexed tokens.
Filter daemon manager. The filter daemon manager is responsible for monitoring the status of the FullText Engine filter daemon host.
Filter Daemon Host process
The filter daemon host is a process that is started by the Full-Text Engine. It runs the following full-text search
components, which are responsible for accessing, filtering, and word breaking data from tables, as well as for word
breaking and stemming the query input.
The components of the filter daemon host are as follows:
Protocol handler. This component pulls the data from memory for further processing and accesses data
from a user table in a specified database. One of its responsibilities is to gather data from the columns being
full-text indexed and pass it to the filter daemon host, which will apply filtering and word breaker as
required.
Filters. Some data types require filtering before the data in a document can be full-text indexed, including
data in varbinary, varbinary(max), image, or xml columns. The filter used for a given document depends
on its document type. For example, different filters are used for Microsoft Word (.doc) documents, Microsoft
Excel (.xls) documents, and XML (.xml) documents. Then the filter extracts chunks of text from the document,
removing embedded formatting and retaining the text and, potentially, information about the position of the
text. The result is a stream of textual information. For more information, see Configure and Manage Filters
for Search.
Word breakers and stemmers. A word breaker is a language-specific component that finds word
boundaries based on the lexical rules of a given language (word breaking). Each word breaker is associated
with a language-specific stemmer component that conjugates verbs and performs inflectional expansions. At
indexing time, the filter daemon host uses a word breaker and stemmer to perform linguistic analysis on the
textual data from a given table column. The language that is associated with a table column in the full-text
index determines which word breaker and stemmer are used for indexing the column. For more information,
see Configure and Manage Word Breakers and Stemmers for Search.
Full-Text Search processing
Full-text search is powered by the Full-Text Engine. The Full-Text Engine has two roles: indexing support and
querying support.
Full-Text indexing process
When a full-text population (also known as a crawl) is initiated, the Full-Text Engine pushes large batches of data
into memory and notifies the filter daemon host. The host filters and word breaks the data and converts the
converted data into inverted word lists. The full-text search then pulls the converted data from the word lists,
processes the data to remove stopwords, and persists the word lists for a batch into one or more inverted indexes.
When indexing data stored in a varbinary(max) or image column, the filter, which implements the IFilter
interface, extracts text based on the specified file format for that data (for example, Microsoft Word). In some cases,
the filter components require the varbinary(max), or image data to be written out to the filterdata folder, instead
of being pushed into memory.
As part of processing, the gathered text data is passed through a word breaker to separate the text into individual
tokens, or keywords. The language used for tokenization is specified at the column level, or can be identified within
varbinary(max), image, or xml data by the filter component.
Additional processing may be performed to remove stopwords, and to normalize tokens before they are stored in
the full-text index or an index fragment.
When a population has completed, a final merge process is triggered that merges the index fragments together
into one master full-text index. This results in improved query performance since only the master index needs to be
queried rather than a number of index fragments, and better scoring statistics may be used for relevance ranking.
Full-Text querying process
The query processor passes the full-text portions of a query to the Full-Text Engine for processing. The Full-Text
Engine performs word breaking and, optionally, thesaurus expansions, stemming, and stopword (noise-word)
processing. Then the full-text portions of the query are represented in the form of SQL operators, primarily as
streaming table-valued functions (STVFs). During query execution, these STVFs access the inverted index to retrieve
the correct results. The results are either returned to the client at this point, or they are further processed before
being returned to the client.
Full-text index architecture
The information in full-text indexes is used by the Full-Text Engine to compile full-text queries that can quickly
search a table for particular words or combinations of words. A full-text index stores information about significant
words and their location within one or more columns of a database table. A full-text index is a special type of tokenbased functional index that is built and maintained by the Full-Text Engine for SQL Server. The process of building a
full-text index differs from building other types of indexes. Instead of constructing a B-tree structure based on a
value stored in a particular row, the Full-Text Engine builds an inverted, stacked, compressed index structure based
on individual tokens from the text being indexed. The size of a full-text index is limited only by the available
memory resources of the computer on which the instance of SQL Server is running.
Beginning in SQL Server 2008, the full-text indexes are integrated with the Database Engine, instead of residing in
the file system as in previous versions of SQL Server. For a new database, the full-text catalog is now a virtual
object that does not belong to any filegroup; it is merely a logical concept that refers to a group of the full-text
indexes. Note, however, that during upgrade of a SQL Server 2005 database, any full-text catalog that contains data
files, a new filegroup is created; for more information, see Upgrade Full-Text Search.
Only one full-text index is allowed per table. For a full-text index to be created on a table, the table must have a
single, unique nonnull column. You can build a full-text index on columns of type char, varchar, nchar, nvarchar,
text, ntext, image, xml, varbinary, and varbinary(max) can be indexed for full-text search. Creating a full-text
index on a column whose data type is varbinary, varbinary(max), image, or xml requires that you specify a type
column. A type column is a table column in which you store the file extension (.doc, .pdf, .xls, and so forth) of the
document in each row.
Full-text index structure
A good understanding of the structure of a full-text index will help you understand how the Full-Text Engine works.
This topic uses the following excerpt of the Document table in Adventure Works as an example table. This excerpt
shows only two columns, the DocumentID column and the Title column, and three rows from the table.
For this example, we will assume that a full-text index has been created on the Title column.
DOCUMENTID
TITLE
1
Crank Arm and Tire Maintenance
DOCUMENTID
TITLE
2
Front Reflector Bracket and Reflector Assembly 3
3
Front Reflector Bracket Installation
For example, the following table, which shows Fragment 1, depicts the contents of the full-text index created on the
Title column of the Document table. Full-text indexes contain more information than is presented in this table. The
table is a logical representation of a full-text index and is provided for demonstration purposes only. The rows are
stored in a compressed format to optimize disk usage.
Notice that the data has been inverted from the original documents. Inversion occurs because the keywords are
mapped to the document IDs. For this reason, a full-text index is often referred to as an inverted index.
Also notice that the keyword "and" has been removed from the full-text index. This is done because "and" is a
stopword, and removing stopwords from a full-text index can lead to substantial savings in disk space thereby
improving query performance. For more information about stopwords, see Configure and Manage Stopwords and
Stoplists for Full-Text Search.
Fragment 1
KEYWORD
COLID
DOCID
OCCURRENCE
Crank
1
1
1
Arm
1
1
2
Tire
1
1
4
Maintenance
1
1
5
Front
1
2
1
Front
1
3
1
Reflector
1
2
2
Reflector
1
2
5
Reflector
1
3
2
Bracket
1
2
3
Bracket
1
3
3
Assembly
1
2
6
3
1
2
7
Installation
1
3
4
The Keyword column contains a representation of a single token extracted at indexing time. Word breakers
determine what makes up a token.
The ColId column contains a value that corresponds to a particular column that is full-text indexed.
The DocId column contains values for an eight-byte integer that maps to a particular full-text key value in a fulltext indexed table. This mapping is necessary when the full-text key is not an integer data type. In such cases,
mappings between full-text key values and DocId values are maintained in a separate table called the DocId
Mapping table. To query for these mappings use the sp_fulltext_keymappings system stored procedure. To satisfy a
search condition, DocId values from the above table need to be joined with the DocId Mapping table to retrieve
rows from the base table being queried. If the full-text key value of the base table is an integer type, the value
directly serves as the DocId and no mapping is necessary. Therefore, using integer full-text key values can help
optimize full-text queries.
The Occurrence column contains an integer value. For each DocId value, there is a list of occurrence values that
correspond to the relative word offsets of the particular keyword within that DocId. Occurrence values are useful in
determining phrase or proximity matches, for example, phrases have numerically adjacent occurrence values. They
are also useful in computing relevance scores; for example, the number of occurrences of a keyword in a DocId may
be used in scoring.
Full-text index fragments
The logical full-text index is usually split across multiple internal tables. Each internal table is called a full-text index
fragment. Some of these fragments might contain newer data than others. For example, if a user updates the
following row whose DocId is 3 and the table is auto change-tracked, a new fragment is created.
DOCUMENTID
TITLE
3
Rear Reflector
In the following example, which shows Fragment 2, the fragment contains newer data about DocId 3 compared to
Fragment 1. Therefore, when the user queries for "Rear Reflector" the data from Fragment 2 is used for DocId 3.
Each fragment is marked with a creation timestamp that can be queried by using the sys.fulltext_index_fragments
catalog view.
Fragment 2
KEYWORD
COLID
DOCID
OCC
Rear
1
3
1
Reflector
1
3
2
As can be seen from Fragment 2, full-text queries need to query each fragment internally and discard older entries.
Therefore, too many full-text index fragments in the full-text index can lead to substantial degradation in query
performance. To reduce the number of fragments, reorganize the fulltext catalog by using the REORGANIZE option
of the ALTER FULLTEXT CATALOG Transact-SQL statement. This statement performs a master merge, which merges
the fragments into a single larger fragment and removes all obsolete entries from the full-text index.
After being reorganized, the example index would contain the following rows:
KEYWORD
COLID
DOCID
OCC
Crank
1
1
1
Arm
1
1
2
Tire
1
1
4
KEYWORD
COLID
DOCID
OCC
Maintenance
1
1
5
Front
1
2
1
Rear
1
3
1
Reflector
1
2
2
Reflector
1
2
5
Reflector
1
3
2
Bracket
1
2
3
Assembly
1
2
6
3
1
2
7
Differences between full-text indexes and regular SQL Server indexes:.
FULL-TEXT INDEXES
REGULAR SQL SERVER INDEXES
Only one full-text index allowed per table.
Several regular indexes allowed per table.
The addition of data to full-text indexes, called a population,
can be requested through either a schedule or a specific
request, or can occur automatically with the addition of new
data.
Updated automatically when the data upon which they are
based is inserted, updated, or deleted.
Grouped within the same database into one or more full-text
catalogs.
Not grouped.
Full-Text search linguistic components and language support
Full-text search supports almost 50 diverse languages, such as English, Spanish, Chinese, Japanese, Arabic, Bengali,
and Hindi. For a complete list of the supported full-text languages, see sys.fulltext_languages (Transact-SQL). Each
of the columns contained in the full-text index is associated with a Microsoft Windows locale identifier (LCID) that
equates to a language that is supported by full-text search. For example, LCID 1033 equates to U.S English, and
LCID 2057 equates to British English. For each supported full-text language, SQL Server provides linguistic
components that support indexing and querying full-text data that is stored in that language.
Language-specific components include the following:
Word breakers and stemmers. A word breaker finds word boundaries based on the lexical rules of a given
language (word breaking). Each word breaker is associated with a stemmer that conjugates verbs for the
same language. For more information, see Configure and Manage Word Breakers and Stemmers for Search.
Stoplists. A system stoplist is provided that contains a basic set stopwords (also known as noise words). A
stopword is a word that does not help the search and is ignored by full-text queries. For example, for the
English locale words such as "a", "and", "is", and "the" are considered stopwords. Typically, you will need to
configure one or more thesaurus files and stoplists. For more information, see Configure and Manage
Stopwords and Stoplists for Full-Text Search.
Thesaurus files. SQL Server also installs a thesaurus file for each full-text language, as well as a global
thesaurus file. The installed thesaurus files are essentially empty, but you can edit them to define synonyms
for a specific language or business scenario. By developing a thesaurus tailored to your full-text data, you
can effectively broaden the scope of full-text queries on that data. For more information, see Configure and
Manage Thesaurus Files for Full-Text Search.
Filters (iFilters). Indexing a document in a varbinary(max), image, or xml data type column requires a
filter to perform extra processing. The filter must be specific to the document type (.doc, .pdf, .xls, .xml, and so
forth). For more information, see Configure and Manage Filters for Search.
Word breakers (and stemmers) and filters run in the filter daemon host process (fdhost.exe).
THIS TOPIC APPLIES TO: SQL Server (starting with 2008)
Parallel Data Warehouse
Azure SQL Database
Azure SQL Data Warehouse
Get Started with Full-Text Search
4/7/2017 • 5 min to read • Edit Online
SQL Server databases are full-text enabled by default. Before you can run full-text queries, however, you must
create a full text catalog and create a full-text index on the tables or indexed views you want to search.
Set up full-text search in two steps
There are two basic steps to set up full-text search:
1. Create a full-text catalog.
2. Create a full-text index on tables or indexed view you want to search.
Each full-text index must belong to a full-text catalog. You can create a separate text catalog for each full-text index,
or you can associate multiple full-text indexes with a given catalog. A full-text catalog is a virtual object and does
not belong to any filegroup. The catalog is a logical concept that refers to a group of full-text indexes.
NOTE
These steps assume that you installed the optional Full-Text Search components when you installed SQL Server. If not, you
have to run SQL Server Setup again to add them.
Set up full-text search with a wizard
To set up full-text search by using a wizard, see Use the Full-Text Indexing Wizard.
Set up full-text search with Transact-SQL
The following two-part example creates a full-text catalog named AdvWksDocFTCat on the AdventureWorks sample
database and then creates a full-text index on the Document table in the sample database. This statement creates
the full-text catalog in the default directory specified during SQL Server setup. The folder named AdvWksDocFTCat is
in the default directory.
1. To create a full-text catalog named
statement:
AdvWksDocFTCat
, the example uses a CREATE FULLTEXT CATALOG
USE AdventureWorks;
GO
CREATE FULLTEXT CATALOG AdvWksDocFTCat;
For more info, see Create and Manage Full-Text Catalogs.
2. Before you can create a full-text index on the Document table, ensure that the table has a unique, singlecolumn, non-nullable index. The following CREATE INDEX statement creates a unique index, ui_ukDoc , on
the DocumentID column of the Document table:
CREATE UNIQUE INDEX ui_ukDoc ON Production.Document(DocumentID);
3. After you have a unique key, you can create a full-text index on the
CREATE FULLTEXT INDEX statement.
Document
table by using the following
CREATE FULLTEXT INDEX ON Production.Document
(
Document
--Full-text index column name
TYPE COLUMN FileExtension
--Name of column that contains file type information
Language 2057
--2057 is the LCID for British English
)
KEY INDEX ui_ukDoc ON AdvWksDocFTCat --Unique index
WITH CHANGE_TRACKING AUTO
--Population type;
GO
The TYPE COLUMN defined in this example specifies the type column in the table that contains the type of
the document in each row of the column 'Document' (which is of binary type). The type column stores the
user-supplied file extension - ".doc", ".xls", and so forth - of the document in a given row. The Full-Text
Engine uses the file extension in a given row to invoke the correct filter to use for parsing the data in that
row. After the filter has parsed the binary data of the row, the specified word breaker parses the content. (In
this example, the word breaker for British English is used.) For more information, see Configure and Manage
Filters for Search.
For more info, see Create and Manage Full-Text Indexes.
Choose options for a full-text index
Choose a language
For information about choosing the column language, see Choose a Language When Creating a Full-Text Index.
Choose a filegroup
The process of building a full-text index is fairly I/O intensive. In summary, it consists of reading data from SQL
Server, and then propagating the filtered data to the full-text index. As a best practice, locate a full-text index in the
database filegroup that is best for maximizing I/O performance or locate the full-text indexes in a different
filegroup on another volume.
Choose a full-text catalog
We recommend associating tables with the same update characteristics (such as small number of changes versus
large number of changes, or tables that change frequently during a particular time of day) together under the
same full-text catalog. By setting up full-text catalog population schedules, full-text indexes stay synchronous with
the tables without adversely affecting the resource usage of the database server during periods of high database
activity.
Consider the following guidelines:
If you are indexing a table with millions of rows, assign the table to its own full-text catalog.
Consider the amount of change occurring in the tables being full-text indexed, as well as the total number of
rows. If the total number of rows being changed, together with the number of rows in the table present
during the last full-text population, represents millions of rows, assign the table to its own full-text catalog.
Associate a unique index
Always select the smallest unique index available for your full-text unique key. (A 4-byte, integer-based index is
optimal.) This significantly reduces the resources required by Microsoft Search service in the file system. If the
primary key is large (over 100 bytes), consider choosing another unique index in the table (or creating another
unique index) as the full-text unique key. Otherwise, if the full-text unique key size exceeds the maximum size
allowed (900 bytes), full-text population will not be able to proceed.
Associate a stoplist
A stoplist is a list of stopwords, also known as noise words. A stoplist is associated with each full-text index, and the
words in that stoplist are applied to full-text queries on that index. By default, the system stoplist is associated with
a new full-text index. You can create and use your own stoplist too.
For example, the following CREATE FULLTEXT STOPLIST Transact-SQL statement creates a new full-text stoplist
named myStoplist by copying from the system stoplist:
CREATE FULLTEXT STOPLIST myStoplist FROM SYSTEM STOPLIST;
GO
The following ALTER FULLTEXT STOPLIST Transact-SQL statement alters a stoplist named myStoplist, adding the
word 'en', first for Spanish and then for French:
ALTER FULLTEXT STOPLIST myStoplist ADD 'en' LANGUAGE 'Spanish';
ALTER FULLTEXT STOPLIST myStoplist ADD 'en' LANGUAGE 'French';
GO
For more information, see Configure and Manage Stopwords and Stoplists for Full-Text Search.
Update a full-text index
Like regular SQL Server indexes, full-text indexes can be automatically updated as data is modified in the
associated tables. This is the default behavior. Alternatively, you can keep your full-text indexes up-to-date
manually, or at specified scheduled intervals. Populating a full-text index can be time-consuming and resourceintensive. Therefore, index updating is usually performed as an asynchronous process that runs in the background
and keeps the full-text index up to date after modifications in the base table.
Updating a full-text index immediately after each change in the base table is also resource-intensive. Therefore, if
you have a high update/insert/delete rate, you may experience some degradation in query performance. If this
occurs, consider scheduling manual change tracking updates to keep up with the numerous changes from time to
time, rather than competing with queries for resources.
For more info, see Populate Full-Text Indexes.
Next steps
After you set up SQL Server Full-Text Search, you're ready to run full-text queries. For more info, see Query with
Full-Text Search.
Query with Full-Text Search
4/7/2017 • 14 min to read • Edit Online
Write full-text queries by using the full-text predicates CONTAINS and FREETEXT and the rowset-valued
functions CONTAINSTABLE and FREETEXTTABLE with the SELECT statement. This topic provides examples of
each predicate and function and helps you choose the best one to use.
Use CONTAINS and CONTAINSTABLE to match words and phrases.
Use FREETEXT and FREETEXTTABLE to match the meaning, but not the exact wording.
Simple examples of each predicate and function
Example - CONTAINS
The following example finds all products with a price of
$80.99
that contain the word
"Mountain"
.
USE AdventureWorks2012
GO
SELECT Name, ListPrice
FROM Production.Product
WHERE ListPrice = 80.99
AND CONTAINS(Name, 'Mountain')
GO
Example - FREETEXT
The following example searches for all documents that contain words related to vital, safety, components.
USE AdventureWorks2012
GO
SELECT Title
FROM Production.Document
WHERE FREETEXT (Document, 'vital safety components')
GO
Example - CONTAINSTABLE
The following example returns the description ID and description of all products for which the Description
column contain the word "aluminum" near either the word "light" or the word "lightweight." Only rows with a rank
value of 2 or higher are returned.
USE AdventureWorks2012
GO
SELECT FT_TBL.ProductDescriptionID,
FT_TBL.Description,
KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL INNER JOIN
CONTAINSTABLE (Production.ProductDescription,
Description,
'(light NEAR aluminum) OR
(lightweight NEAR aluminum)'
) AS KEY_TBL
ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
ORDER BY KEY_TBL.RANK DESC;
GO
Example - FREETEXTTABLE
The following example extends a FREETEXTTABLE query to return the highest ranked rows first and to add the
ranking of each row to the select list. To specify the query, you must know that ProductDescriptionID is the
unique key column for the ProductDescription table.
USE AdventureWorks2012
GO
SELECT KEY_TBL.RANK, FT_TBL.Description
FROM Production.ProductDescription AS FT_TBL
INNER JOIN
FREETEXTTABLE(Production.ProductDescription, Description,
'perfect all-around bike') AS KEY_TBL
ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
ORDER BY KEY_TBL.RANK DESC
GO
Here is an extension of the same query that only returns rows with a rank value of 10 or greater:
USE AdventureWorks2012
GO
SELECT KEY_TBL.RANK, FT_TBL.Description
FROM Production.ProductDescription AS FT_TBL
INNER JOIN
FREETEXTTABLE(Production.ProductDescription, Description,
'perfect all-around bike') AS KEY_TBL
ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK >= 10
ORDER BY KEY_TBL.RANK DESC
GO
Pick the best predicate or function
/ CONTAINSTABLE and FREETEXT / FREETEXTTABLE are useful for different kinds of matching. The following
table helps you to choose the best predicate or function for your query.
CONTAINS
For examples, see Simple examples of each predicate and function and Examples of specific types of searches. Also
see What you can search for.
Type of query
CONTAINS/CONTAINSTABLE
FREETEXT/FREETEXTTABLE
Match single words and phrases with
precise or fuzzy (less precise) matching.
Match the meaning, but not the exact
wording, of specified words, phrases or
sentences (the freetext string).
Matches are generated if any term or
form of any term is found in the fulltext index of a specified column.
More query options
You can specify the proximity of words
within a certain distance of one
another.
N/a
You can return weighted matches.
You can use logical operation to
combine search conditions. For more
info, see Using Boolean operators
(AND, OR, and NOT) later in this topic.
Compare predicates and functions
The predicates CONTAINS / FREETEXT and the rowset-valued functions CONTAINSTABLE / FREETEXTTABLE have different
syntax and options. The following table helps you to choose the best predicate or function for your query.
For examples, see Simple examples of each predicate and function and Examples of specific types of searches. Also
see What you can search for.
PREDICATES
CONTAINS/FREETEXT
FUNCTIONS
CONTAINSTABLE/FREETEXTTABLE
Usage
Use the full-text predicates CONTAINS
and FREETEXT in the WHERE or
HAVING clause of a SELECT statement.
Use the full-text functions
CONTAINSTABLE and FREETEXTTABLE
functions like a regular table name in
the FROM clause of a SELECT
statement.
More query options
You can combine them with any of the
other Transact-SQL predicates, such as
LIKE and BETWEEN.
You have to specify the base table to
search when you use either of these
functions. As with the predicates, you
can specify a single column, a list of
columns, or all columns in the table to
be searched, and optionally, the
language whose resources will be used
by given full-text query.
You can specify either a single column,
a list of columns, or all columns in the
table to be searched.
Optionally, you can specify the
language whose resources will be used
by the full-text query for word breaking
and stemming, thesaurus lookups, and
noise-word removal.
Typically you have to join the results of
CONTAINSTABLE or FREETEXTTABLE
with the base table. To do this, you
have to know the unique key column
name. This column, which occurs in
every full-text enabled table, is used to
enforce unique rows for the table (the
uniquekey column). For more info
about the key column, see Create and
Manage Full-Text Indexes.
Results
PREDICATES
CONTAINS/FREETEXT
FUNCTIONS
CONTAINSTABLE/FREETEXTTABLE
The CONTAINS and FREETEXT
predicates return a TRUE or FALSE
value that indicates whether a given
row matches the full-text query.
Matching rows are returned in the
result set.
These functions return a table of zero,
one, or more rows that match the fulltext query. The returned table contains
only rows from the base table that
match the selection criteria specified in
the full-text search condition of the
function.
Queries that use one of these functions
also return a relevance ranking value
(RANK) and full-text key (KEY) for each
row returned, as follows
KEY column. The KEY column
returns unique values of the
returned rows. The KEY column
can be used to specify selection
criteria.
RANK column. The RANK
column returns a rank value for
each row that indicates how well
the row matched the selection
criteria. The higher the rank
value of the text or document in
a row, the more relevant the
row is for the given full-text
query. Note that different rows
can be ranked identically. You
can limit the number of matches
to be returned by specifying the
optional top_n_by_rank
parameter. For more
information, see Limit Search
Results with RANK.
Additional options
You can use a four-part name in the
CONTAINS or FREETEXT predicate to
query full-text indexed columns of the
target tables on a linked server. To
prepare a remote server to receive fulltext queries, create a full-text index on
the target tables and columns on the
remote server and then add the remote
server as a linked server.
N/a
More info
For more info about the syntax and
arguments of these predicates, see
CONTAINS and FREETEXT.
For more info about the syntax and
arguments of these functions, see
CONTAINSTABLE and FREETEXTTABLE.
What you can search for
The following table describes the types of words and phrases that you can search for.
QUERY-TERM FORM
DESCRIPTION
SUPPORTED BY
QUERY-TERM FORM
DESCRIPTION
SUPPORTED BY
One or more specific words or phrases
(simple term)
For example, "croissant" is a word, and
"café au lait" is a phrase. Words and
phrases such as these are called simple
terms.
CONTAINS and CONTAINSTABLE look
for an exact match for the phrase.
FREETEXT and FREETEXTTABLE break up
the phrase into separate words.
In full-text search, a word (or token) is a
string whose boundaries are identified
by appropriate word breakers, following
the linguistic rules of the specified
language. A valid phrase consists of
multiple words, with or without any
punctuation marks between them.
For more information, see Searching for
Specific word or Phrase (Simple Term),
later in this topic.
A word or a phrase where the words
begin with specified text
(prefix term)
For a single prefix term, any word
starting with the specified term will be
part of the result set. For example, the
term "auto" matches "automatic",
"automobile", and so forth.
CONTAINS and CONTAINSTABLE
For a phrase, each word within the
phrase is considered to be a prefix
term. For example, the term "auto
tran\" matches "automatic
transmission" and "automobile
transducer", but it does not match
"automatic motor transmission".
A prefix term refers to a string that is
affixed to the front of a word to
produce a derivative word or an
inflected form.
For more information, see Performing
Prefix Searches (Prefix Term), later in
this topic.
Inflectional forms of a specific word
(generation term - inflectional)
For example, search for the inflectional
form of the word "drive". If various rows
in the table include the words "drive",
"drives", "drove", "driving", and "driven",
all would be in the result set because
each of these can be inflectionally
generated from the word drive.
The inflectional forms are the different
tenses and conjugations of a verb or
the singular and plural forms of a noun.
For more information, see Searching for
the Inflectional Form of a Specific Word
(Generation Term), later in this topic.
FREETEXT and FREETEXTTABLE look for
inflectional terms of all specified words
by default.
CONTAINS and CONTAINSTABLE
support an optional INFLECTIONAL
argument.
QUERY-TERM FORM
DESCRIPTION
SUPPORTED BY
Synonymous forms of a specific word
(generation term - thesaurus)
For example, if an entry, "{car,
automobile, truck, van}", is added to a
thesaurus, you can search for the
thesaurus form of the word "car". All
rows in the table queried that include
the words "automobile", "truck", "van",
or "car", appear in the result set
because each of these words belong to
the synonym expansion set containing
the word "car".
FREETEXT and FREETEXTTABLE use the
thesaurus by default.
CONTAINS and CONTAINSTABLE
support an optional THESAURUS
argument.
A thesaurus defines user-specified
synonyms for terms.
For information about the structure of
thesaurus files, see Configure and
Manage Thesaurus Files for Full-Text
Search.
A word or phrase close to another
word or phrase
(proximity term)
For example, you want to find the rows
in which the word "ice" is near the word
"hockey" or in which the phrase "ice
skating" is near the phrase "ice hockey".
A proximity term indicates words or
phrases that are near to each other.,
You can also specify the maximum
number of non-search terms that
separate the first and last search terms.
In addition, you can search for words or
phrases in any order, or in the order in
which you specify them.
For more information, see Search for
Words Close to Another Word with
NEAR.
CONTAINS and CONTAINSTABLE
QUERY-TERM FORM
DESCRIPTION
SUPPORTED BY
Words or phrases using weighted
values
(weighted term)
For example, in a query searching for
multiple terms, you can assign each
search word a weight value indicating
its importance relative to the other
words in the search condition. The
results for this type of query return the
most relevant rows first, according to
the relative weight you have assigned
to search words. The result sets contain
documents or rows containing any of
the specified terms (or content between
them); however, some results will be
considered more relevant than others
because of the variation in the
weighted values associated with
different searched terms.
CONTAINSTABLE
A weighting value indicates the degree
of importance for each word and
phrase within a set of words and
phrases. A weight value of 0.0 is the
lowest, and a weight value of 1.0 is the
highest.
For more information, see Searching for
Words or Phrases Using Weighted
Values (Weighted Term), later in this
topic.
Examples of specific types of searches
Search for a specific word or phrase (Simple Term)
You can use CONTAINS, CONTAINSTABLE, FREETEXT, or FREETEXTTABLE to search a table for a specific phrase. For
example, if you want to search the ProductReview table in the AdventureWorks2012 database to find all
comments about a product with the phrase "learning curve", you could use the CONTAINS predicate as follows:
USE AdventureWorks2012
GO
SELECT Comments
FROM Production.ProductReview
WHERE CONTAINS(Comments, '"learning curve"')
GO
The search condition, in this case "learning curve", can be quite complex and can be composed of one or more
terms.
Search for a word with a prefix (Prefix Term)
You can use CONTAINS or CONTAINSTABLE to search for words or phrases with a specified prefix. All entries in
the column that contain text beginning with the specified prefix are returned. For example, to search for all rows
that contain the prefix top -, as in top``ple , top``ping , and top . The query looks like this:
USE AdventureWorks2012
GO
SELECT Description, ProductDescriptionID
FROM Production.ProductDescription
WHERE CONTAINS (Description, '"top*"' )
GO
All text that matches the text specified before the asterisk () is returned. If the text and asterisk are not delimited by
double quotation marks, as in `CONTAINS (DESCRIPTION, 'top')`, full-text search does not consider the asterisk to
be a wildcard..
When the prefix term is a phrase, each token making up the phrase is considered a separate prefix term. All rows
that have words beginning with the prefix terms will be returned. For example, the prefix term "light bread*" will
find rows with text of "light breaded," "lightly breaded," or "light bread," but it will not return "lightly toasted
bread".
Search for inflectional forms of a specific word (Generation Term)
You can use CONTAINS, CONTAINSTABLE, FREETEXT, or FREETEXTTABLE to search for all the different tenses and
conjugations of a verb or both the singular and plural forms of a noun (an inflectional search) or for synonymous
forms of a specific word (a thesaurus search).
The following example searches for any form of "foot" ("foot", "feet", and so on) in the
ProductReview table in the AdventureWorks database.
Comments
column of the
USE AdventureWorks2012
GO
SELECT Comments, ReviewerName
FROM Production.ProductReview
WHERE CONTAINS (Comments, 'FORMSOF(INFLECTIONAL, "foot")')
GO
Full-text search uses stemmers, which allow you to search for the different tenses and conjugations of a verb, or
both the singular and plural forms of a noun. For more information about stemmers, see Configure and Manage
Word Breakers and Stemmers for Search.
Search for words or phrases using weighted values (Weighted Term)
You can use CONTAINSTABLE to search for words or phrases and specify a weighting value. Weight, measured as
a number from 0.0 through 1.0, indicates the importance of each word and phrase within a set of words and
phrases. A weight of 0.0 is the lowest, and a weight of 1.0 is the highest.
The following example shows a query that searches for all customer addresses, using weights, in which any text
beginning with the string "Bay" has either "Street" or "View". The results give a higher rank to those rows that
contain more of the words specified.
USE AdventureWorks2012
GO
SELECT AddressLine1, KEY_TBL.RANK
FROM Person.Address AS Address INNER JOIN
CONTAINSTABLE(Person.Address, AddressLine1, 'ISABOUT ("Bay*",
Street WEIGHT(0.9),
View WEIGHT(0.1)
) ' ) AS KEY_TBL
ON Address.AddressID = KEY_TBL.[KEY]
ORDER BY KEY_TBL.RANK DESC
GO
A weighted term can be used in conjunction with any simple term, prefix term, generation term, or proximity term.
Use Boolean operators (AND, OR, and NOT)
The CONTAINS predicate and CONTAINSTABLE function use the same search conditions. Both support combining
several search terms by using Boolean operators - AND, OR, and NOT - to perform logical operations. You can use
AND, for example, to find rows that contain both "latte" and "New York-style bagel". You can use AND NOT, for
example, to find the rows that contain "bagel" but do not contain "cream cheese".
In contrast, FREETEXT and FREETEXTTABLE treat the Boolean terms as words to be searched.
For information about combining CONTAINS with other predicates that use the logical operators AND, OR, and
NOT, see Search Condition (Transact-SQL).
Example
The following example uses the CONTAINS predicate to search for descriptions in which the description ID is not
equal to 5 and the description contains both the word "Aluminum" and the word "spindle." The search condition
uses the AND Boolean operator. This example uses the ProductDescription table of the AdventureWorks2012
database.
USE AdventureWorks2012
GO
SELECT Description
FROM Production.ProductDescription
WHERE ProductDescriptionID <> 5 AND
CONTAINS(Description, 'aluminum AND spindle')
GO
More query options
When you write full-text queries, you can also specify the following options.
Language, with the LANGUAGE option. Many query terms depend heavily on word-breaker behavior. To
ensure that you are using the correct word breaker (and stemmer) and thesaurus file, we recommend that
you specify the LANGUAGE option. For more information, see Choose a Language When Creating a FullText Index.
Case sensitivity. Full-text search queries are case-insensitive. However, in Japanese, there are multiple
phonetic orthographies in which the concept of orthographic normalization is akin to case insensitivity (for
example, kana = insensitivity). This type of orthographic normalization is not supported.
Stopwords. When defining a full-text query, the Full-Text Engine discards stopwords (also called noise
words) from the search criteria. Stopwords are words such as "a," "and," "is," or "the," that can occur
frequently but that typically do not help when searching for particular text. Stopwords are listed in a
stoplist. Each full-text index is associated with a specific stoplist, which determines what stopwords are
omitted from the query or the index at indexing time. For more info, see Configure and Manage Stopwords
and Stoplists for Full-Text Search.
Thesaurus. FREETEXT and FREETEXTTABLE queries use the thesaurus by default. CONTAINS and
CONTAINSTABLE support an optional THESAURUS argument. For more info, see Configure and Manage
Thesaurus Files for Full-Text Search.
Check the tokenization results
After you apply a given word breaker, thesaurus, and stoplist combination in a query, you can view the
tokenization results by using the sys.dm_fts_parser dynamic management view. For more information, see
sys.dm_fts_parser (Transact-SQL).
See Also
CONTAINS (Transact-SQL)
CONTAINSTABLE (Transact-SQL)
FREETEXT (Transact-SQL)
FREETEXTTABLE (Transact-SQL)
Create Full-Text Search Queries (Visual Database Tools)
Improve the Performance of Full-Text Queries
Search for Words Close to Another Word with NEAR
3/24/2017 • 5 min to read • Edit Online
You can use the proximity term NEAR in a CONTAINS predicate or CONTAINSTABLE function to search for words
or phrases near one another.
Overview of NEAR
NEAR has the following features:
You can specify the maximum number of non-search terms that separate the first and last search terms.
You can search for words or phrases in any order, or you can search for words and phrases in a specific
order.
You can specify the maximum number of non-search terms, or maximum distance, that separates the first
and last search terms in order to constitute a match.
If you specify the maximum number of terms, you can also specify that matches must contain the search
terms in the specified order.
To qualify as a match, a string of text must:
Start with one of the specified search terms and end with the one of the other specified search terms.
Contain all of the specified search terms.
The number of non-search terms, including stopwords, that occur between the first and last search terms
must be less than or equal to the maximum distance, if the maximum distance is specified.
Syntax of NEAR
The basic syntax of NEAR is:
NEAR (
{
*search_term* [ ,…*n* ]
|
(*search_term* [ ,…*n* ] ) [, <maximum_distance> [, <match_order> ] ]
}
)
For more info about the syntax, see CONTAINS (Transact-SQL).
Examples
Example 1
For example, you could search for 'John' within two terms of 'Smith', as follows:
... CONTAINS(column_name, 'NEAR((John, Smith), 2)')
Some examples of strings that match are " John Jacob Smith " and " Smith, John ". The string "
John Jones knows Fred Smith " contains three intervening non-search terms, so it is not a match.
To require that the terms be found in the specified order, you would change the example proximity term to
NEAR((John, Smith),2, TRUE). This searches for " John " within two terms of " Smith " but only when " John "
precedes " Smith ". In a language that reads from left to right, such as English, an example of a string that matches
is " John Jacob Smith ".
Note that for a language that reads from right to left, such as Arabic or Hebrew, the Full-Text Engine applies the
specified terms in reverse order. Also, Object Explorer in SQL Server Management Studio automatically reverses
the display order of words specified in right-to-left languages.
Example 2
The following example searches the Production.Document table of the AdventureWorks sample database for all
document summaries that contain the word "reflector" in the same document as the word "bracket".
SELECT DocumentNode, Title, DocumentSummary
FROM Production.Document AS DocTable
INNER JOIN CONTAINSTABLE(Production.Document, Document,
'NEAR(bracket, reflector)' ) AS KEY_TBL
ON DocTable.DocumentNode = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 50
ORDER BY KEY_TBL.RANK DESC;
GO
How maximum distance is measured
A specific maximum distance, such as 10 or 25, determines how many non-search terms, including stopwords, can
occur between the first and last search terms in a given string. For example, NEAR((dogs, cats, "hunting mice"), 3)
would return the following row, in which the total number of non-search terms is three (" enjoy ", " but ", and "
avoid "):
" Cats
enjoy
hunting mice``, but avoid
dogs``.
"
The same proximity term would not return the following row, because the maximum distance is exceeded by the
four non-search terms (" enjoy ", " but ", " usually ", and " avoid "):
" Cats
enjoy
hunting mice``, but usually avoid
dogs``.
"
Combine NEAR with other terms
You can combine NEAR with some other terms. You can use AND (&), OR (|), or AND NOT (&!) to combine a
custom proximity term with another custom proximity term, a simple term, or a prefix term. For example:
CONTAINS('NEAR((term1,term2),5) AND term3')
CONTAINS('NEAR((term1,term2),5) OR term3')
CONTAINS('NEAR((term1,term2),5) AND NOT term3')
CONTAINS('NEAR((term1,term2),5) AND NEAR((term3,term4),2)')
CONTAINS('NEAR((term1,term2),5) OR NEAR((term3,term4),2, TRUE)')
For example,
CONTAINS(column_name, 'NEAR((term1, term2), 5, TRUE) AND term3')
You can't combine NEAR with a generation term (ISABOUT …) or a weighted term (FORMSOF …).
More info about proximity searches
Overlapping occurrences of search terms
All proximity searches always look for only non-overlapping occurrences. Overlapping occurrences of
search terms never qualify as matches. For example, consider the following proximity term, which searches "
A " and " AA " in this order with a maximum distance of two terms:
CONTAINS(column_name, 'NEAR((A,AA),2, TRUE')
The possible matches are as " AAA ", " A.AA ", and " A..AA ". Rows containing just " AA " would not match.
NOTE
You can specify terms that overlap, for example, NEAR("mountain bike", "bike trails") or
(NEAR(comfort*, comfortable), 5) . Specifying a overlapping terms increases the complexity of the query by
increasing the possible match permutations. If you specify a large number of such overlapping terms, the query can
run out of resources and fail. If this occurs, simplify the query and try again.
NEAR (regardless of whether a maximum distance is specified) indicates the logical distance between terms,
rather than the absolute distance between them. For example, terms within different phrases or sentences
within a paragraph are treated as farther apart than terms in the same phrase or sentence, regardless of
their actual proximity, on the assumption that they are less related. Likewise, terms in different paragraphs
are treated as being even farther apart. If a match spans the end of a sentence, paragraph, or chapter, the
gap used for ranking a document is increased by 8, 128, or 1024, respectively.
Impact of proximity terms on ranking by the CONTAINSTABLE function
When NEAR is used in the CONTAINSTABLE function, the number of hits in a document relative to its length
as well as the distance between the first and last search terms in each of the hits affects the ranking of each
document. For a generic proximity term, if the matched search terms are >50 logical terms apart, the rank
returned on a document is 0. For a custom proximity term that does not specify an integer as the maximum
distance, a document that contains only hits whose gap is >100 logical terms will receive a ranking of 0. For
more information about ranking of custom proximity searches, see Limit Search Results with RANK.
The transform noise words server option
The value of transform noise words impacts how SQL Server treats stopwords if they are specified in
proximity searches. For more information, see transform noise words Server Configuration Option.
See Also
CONTAINS (Transact-SQL)
CONTAINSTABLE (Transact-SQL)
Query with Full-Text Search
Limit Search Results with RANK
3/24/2017 • 8 min to read • Edit Online
The CONTAINSTABLE and FREETEXTTABLE functions return a column named RANK that contains ordinal values
from 0 through 1000 (rank values). These values are used to rank the rows returned according to how well they
match the selection criteria. The rank values indicate only a relative order of relevance of the rows in the result set,
with a lower value indicating lower relevance. The actual values are unimportant and typically differ each time the
query is run.
NOTE
The CONTAINS and FREETEXT predicates do not return any rank values.
The number of items matching a search condition is often very large. To prevent CONTAINSTABLE or
FREETEXTTABLE queries from returning too many matches, use the optional top_n_by_rank parameter, which
returns only a subset of rows. top_n_by_rank is an integer value, n, that specifies that only the n highest ranked
matches are to be returned, in descending order. If top_n_by_rank is combined with other parameters, the query
could return fewer rows than the number of rows that actually match all the predicates.
SQL Server orders the matches by rank and returns only up to the specified number of rows. This choice can result
in a dramatic increase in performance. For example, a query that would normally return 100,000 rows from a table
of one million rows are processed more quickly if only the top 100 rows are requested.
Examples of Using RANK to Limit Search Results
Example A: Searching for only the top three matches
The following example uses CONTAINSTABLE to return only the top three matches.
USE AdventureWorks2012
GO
SELECT K.RANK, AddressLine1, City
FROM Person.Address AS A
INNER JOIN
CONTAINSTABLE(Person.Address, AddressLine1, 'ISABOUT ("des*",
Rue WEIGHT(0.5),
Bouchers WEIGHT(0.9))',
3) AS K
ON A.AddressID = K.[KEY]
GO
Here is the result set.
RANK
----------172
172
172
Address
-------------------------------9005, rue des Bouchers
5, rue des Bouchers
5, rue des Bouchers
(3 row(s) affected)
Example B: Searching for the top ten matches
City
-----------------------------Paris
Orleans
Metz
The following example uses CONTAINSTABLE to return the description of the top 5 products where the
Description column contains the word "aluminum" near either the word "light" or the word "lightweight".
USE AdventureWorks2012
GO
SELECT FT_TBL.ProductDescriptionID,
FT_TBL.Description,
KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL INNER JOIN
CONTAINSTABLE (Production.ProductDescription,
Description,
'(light NEAR aluminum) OR
(lightweight NEAR aluminum)',
5
) AS KEY_TBL
ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
GO
How Search Query Results Are Ranked
Full-text search in SQL Server can generate an optional score (or rank value) that indicates the relevance of the
data returned by a full-text query. This rank value is calculated on every row and can be used as an ordering
criteria to sort the result set of a given query by relevance. The rank values indicate only a relative order of
relevance of the rows in the result set. The actual values are unimportant and typically differ each time the query is
run. The rank value does not hold any significance across queries.
Statistics for Ranking
When an index is built, statistics are collected for use in ranking. The process of building a full-text catalog does not
directly result in a single index structure. Instead, the Full-Text Engine for SQL Server creates intermediate indexes
as data is indexed. The Full-Text Engine then merges these indexes into a larger index as needed. This process can
be repeated many times. The Full-Text Engine then conducts a "master merge" that combines all of the
intermediate indexes into one large master index.
Statistics are collected at each intermediate index level. The statistics are merged when the indexes are merged.
Some statistical values can only be generated during the master merging process.
While ranking a query result set, SQL Server uses statistics from the largest intermediate index. This depends on
whether intermediate indexes have been merged or not. As a result, ranking statistics can vary in accuracy if the
intermediate indexes have not been merged. This explains why the same query can return different rank results
over time as full-text indexed data is added, modified, and deleted, and as the smaller indexes are merged.
To minimize the size of the index and computational complexity, statistics are often rounded.
The list below includes some commonly used terms and statistical values that are important in calculating rank.
Property
A full-text indexed column of the row.
Document
The entity that is returned in queries. In SQL Server this corresponds to a row. A document can have multiple
properties, just as a row can have multiple full-text indexed columns.
Index
A single inverted index of one or more documents. This may be entirely in memory or on disk. Many query
statistics are relative to the individual index where the match occurred.
Full-Text Catalog
A collection of intermediate indexes treated as one entity for queries. Catalogs are the unit of organization visible
to the SQL Server administrator.
Word, token or item
The unit of matching in the full-text engine. Streams of text from documents are tokenized into words, or tokens by
language-specific word breakers.
Occurrence
The word offset in a document property as determined by the word breaker. The first word is at occurrence 1, the
next at 2, and so on. In order to avoid false positives in phrase and proximity queries, end-of-sentence and end-ofparagraph introduce larger occurrence gaps.
TermFrequency
The number times the key value occurs in a row.
IndexedRowCount
Total number of rows indexed. This is computed, based on counts maintained in the intermediate indexes. This
number can vary in accuracy.
KeyRowCount
Total number of rows in the full-text catalog that contain a given key.
MaxOccurrence
The largest occurrence stored in a full-text catalog for a given property in a row.
MaxQueryRank
The maximum rank, 1000, returned by the Full-Text Engine.
Rank Computation Issues
The process of computing rank, depends on a number of factors. Different language word breakers tokenize text
differently. For example, the string "dog-house" could be broken into "dog" "house" by one word breaker and into
"dog-house" by another. This means that matching and ranking will vary based on the language specified, because
not only are the words different, but so is the document length. The document length difference can affect ranking
for all queries.
Statistics such as IndexRowCount can vary widely. For example, if a catalog has 2 billion rows in the master index,
then one new document is indexed into an in-memory intermediate index, and ranks for that document based on
the number of documents in the in-memory index could be skewed compared with ranks for documents from the
master index. For this reason, it is recommended that after any population that results in large number of rows
being indexed or re-indexed the indexes be merged into a master index using the ALTER FULLTEXT CATALOG ...
REORGANIZE Transact-SQL statement. The Full-Text Engine will also automatically merge the indexes based on
parameters such as the number and size of intermediate indexes.
MaxOccurrence values are normalized into 1 of 32 ranges. This means, for example, that a document 50 words
long is treated the same as a document 100 words long. Below is the table used for normalization. Because the
document lengths are in the range between adjacent table values 32 and 128, they are effectively treated as having
the same length, 128 (32 < docLength <= 128).
{ 16, 32, 128, 256, 512, 725, 1024, 1450, 2048, 2896, 4096, 5792, 8192, 11585,
16384, 23170, 28000, 32768, 39554, 46340, 55938, 65536, 92681, 131072, 185363,
262144, 370727, 524288, 741455, 1048576, 2097152, 4194304 };
Ranking of CONTAINSTABLE
CONTAINSTABLE ranking uses the following algorithm:
StatisticalWeight = Log2( ( 2 + IndexedRowCount ) / KeyRowCount )
Rank = min( MaxQueryRank, HitCount * 16 * StatisticalWeight / MaxOccurrence )
Phrase matches are ranked just like individual keys except that KeyRowCount (the number of rows containing the
phrase) is estimated and can be inaccurate and higher than the actual number.
Ranking of NEAR
CONTAINSTABLE supports querying for two or more search terms in proximity to each other by using the NEAR
option. The rank value of each returned row is based on several parameters. One major ranking factor is the total
number of matches (or hits) relative to the length of the document. Thus, for example, if a 100-word document and
a 900-word document contain identical matches, the 100-word document is ranked higher.
The total length of each hit in a row will also contribute to the ranking of that row based on the distance between
the first and last search terms of that hit. The smaller the distance, the more the hit contributes to the rank value of
the row. If a full-text query does not specify an integer as the maximum distance, a document that contains only
hits whose distances are greater than 100 logical terms apart, will have a ranking of 0.
Ranking of ISABOUT
CONTAINSTABLE supports querying for weighted terms by using the ISABOUT option. ISABOUT is a vector-space
query in traditional information retrieval terminology. The default ranking algorithm used is Jaccard, a widely
known formula. The ranking is computed for each term in the query and then combined as described below.
ContainsRank = same formula used for CONTAINSTABLE ranking of a single term (above).
Weight = the weight specified in the query for each term. Default weight is 1.
WeightedSum = Σ[key=1 to n] ContainsRankKey * WeightKey
Rank = ( MaxQueryRank * WeightedSum ) / ( ( Σ[key=1 to n] ContainsRankKey^2 )
+ ( Σ[key=1 to n] WeightKey^2 ) - ( WeightedSum ) )
Ranking of FREETEXTTABLE
FREETEXTTABLE ranking is based on the OKAPI BM25 ranking formula. FREETEXTTABLE queries will add words to
the query via inflectional generation (inflected forms of the original query words); these words are treated as
separate words with no special relationship to the words from which they were generated. Synonyms generated
from the Thesaurus feature are treated as separate, equally weighted terms. Each word in the query contributes to
the rank.
Rank = Σ[Terms in Query] w ( ( ( k1 + 1 ) tf ) / ( K + tf ) ) * ( ( k3 + 1 ) qtf / ( k3 + qtf ) ) )
Where:
w is the Robertson-Sparck Jones weight.
In simplified form, w is defined as:
w = log10 ( ( ( r + 0.5 ) * ( N – R + r + 0.5 ) ) / ( ( R – r + 0.5 ) * ( n – r + 0.5 ) )
N is the number of indexed rows for the property being queried.
n is the number of rows containing the word.
K is ( k1 * ( ( 1 – b ) + ( b * dl / avdl ) ) ).
dl is the property length, in word occurrences.
avdl is the average length of the property being queried, in word occurrences.
k1, b, and k3 are the constants 1.2, 0.75, and 8.0, respectively.
tf is the frequency of the word in the queried property in a specific row.
qtf is the frequency of the term in the query.
See Also
Query with Full-Text Search
Improve the Performance of Full-Text Queries
3/24/2017 • 2 min to read • Edit Online
The following is a list of recommendations that will help to improve the performance of full-text queries.
The performance of full-text queries is also influenced by hardware resources, such as memory, disk speed, CPU
speed, and machine architecture.
Defragment the index of the base table by using ALTER INDEX REORGANIZE.
Reorganize the full-text catalog by using ALTER FULLTEXT CATALOG REORGANIZE. Make sure that you do
this before performance testing because running this statement causes a master merge of the full-text
indexes in that catalog.
Restrict your choice of full-text key columns to a small column. Although a 900-byte column is supported,
we recommend using a smaller key column in a full-text index. int and bigint provide the best performance.
Using an integer full-text key avoids a join with the docid mapping table. Therefore, an integer full-text key
improves query performance by an order of magnitude and improves crawl performance. Additional
performance benefits might result if the full-text key is also the clustered index key.
Combine multiple CONTAINS predicates into one CONTAINS predicate. In SQL Server you can specify a list
of columns in the CONTAINS query.
If you only require full-text key or rank information, use CONTAINSTABLE or FREETEXTTABLE instead of
CONTAINS or FREETEXT, respectively.
To limit results and increase performance, use the top_n_by_rank parameter of the FREETEXTTABLE and
CONTAINSTABLE functions. top_n_by_rank allows you to recall only the most relevant hits. Use this
parameter only if your business scenario does not require recalling all possible hits (that is, it does not
require total recall).
NOTE
Total recall is typically necessary for legal scenarios but might be less important than performance for business
scenarios such as an e-business.
Check the full-text query plan to make sure that the appropriate join plan is chosen. Use a join hint or query
hint if you have to. If a parameter is used in the full-text query, the first-time value of the parameter
determines the query plan. You can use the OPTIMIZE FOR query hint to force the query to compile with the
value you want. This helps achieve a deterministic query plan and better performance.
Too many full-text index fragments in the full-text index, can lead to substantial degradation in query
performance. To reduce the number of fragments, reorganize the full-text catalog by using the REORGANIZE
option of the ALTER FULLTEXT CATALOG Transact-SQL statement. This statement essentially merges all the
fragments into a single larger fragment and removes all obsolete entries from the full-text index.
In full-text search, logical operators specified in CONTAINSTABLE (AND, OR) can be implemented either as
SQL joins or inside the full-text execution streaming table-valued functions (STVF). Typically, queries with
only one type of logical operators are implemented purely by full-text execution, whereas queries that mix
logical operators also possess SQL joins. Implementation of a logical operator inside the full-text execution
STVF uses some special index properties that make it much faster than SQL joins. For this reason, we
recommend that, where possible, you frame queries using only a single type of logical operator.
For applications that contain selective-relation predications, queries that use selective relational predicates
and unselective full-text predicates might perform best when they are written to use the query optimizer.
This allows the query optimizer to decide whether it can exploit predicate or range pushdown to produce an
effective query plan. This approach is simpler and often more efficient than indexing relational data as fulltext data.
Related Resources
SQL Server 2008 Full-Text Search: Internals and Enhancements
See Also
sys.dm_fts_memory_buffers (Transact-SQL)
sys.dm_fts_memory_pools (Transact-SQL)
Search Document Properties with Search Property
Lists
3/24/2017 • 10 min to read • Edit Online
The content of document properties was previously indistinguishable from the content of the document body. This
limitation restricted full-text queries to generic searches on whole documents. Now, however, you can configure a
full-text index to support property-scoped searching on particular properties, such as Author and Title, for
supported document types in a varbinary, varbinary(max) (including FILESTREAM), or image binary data
column. This form of searching is known as property searching.
The associated filter (IFilter) determines whether property searching is possible on a specific type of document. For
some document types, the associated IFilter extracts some or all of the properties defined for that type of
document, as well as the content of the document body. You can configure a full-text index to support property
searching only on properties that are extracted by an IFilter during full-text indexing. Among IFilters that extract a
number of document properties are the IFilters for Microsoft Office document types (such as .docx, .xlsx, and .pptx).
On the other hand, the XML IFilter does not emit properties.
How Full-Text Search Works with Search Properties
Internal Property IDs
The Full-Text Engine arbitrarily assigns each registered property an internal property ID, which uniquely identifies
the property in that particular search list and which is specific to that search property list. Therefore, if a property is
added to multiple search property lists, its internal property ID is likely to differ between different lists.
When a property is registered for a search list, the Full-Text Engine arbitrarily assigns an internal property ID to the
property. The internal property ID is an integer that uniquely identifies the property in that search property list.
The following illustration shows a logical view of a search property list that specifies two properties, Title and
Keywords. The property-list name for Keywords is "Tags". These properties belong to the same property set, whose
GUID is F29F85E0-4FF9-1068-AB91-08002B27B3D9. The property integer identifiers are 2 for Title and 5 for Tags
(Keywords). The Full-Text Engine arbitrarily maps each property to an internal property ID that is unique to the
search property list. The internal property ID for the Title property is 1, and the internal property ID for the Tags
property is 2.
The internal property ID is likely to be different from the property integer identifier of the property. If a given
property is registered for multiple search property lists, a different internal property ID might be assigned for each
search property list. For example, the internal property ID might be 4 in one search property list, 1 in another, 3 in
another, and so on. In contrast, the property integer identifier is intrinsic to the property, and it remains the same
wherever the property is used.
Indexing of Registered Properties
After a full-text index is associated with a search property list, the index must be repopulated to index propertyspecific search terms. During full-text indexing, the contents of all properties are stored in the full-text index along
with other content. However, when indexing a search term found in a registered property, the full-text indexer also
stores the corresponding internal property ID with the term. In contrast, if a property is not registered, it is stored in
the full-text index as if it were part of the document body, and it has a value of zero for the internal property ID.
The following illustration shows a logical view of how search terms appear in a full-text index that is associated
with the search property list shown in the preceding illustration. A sample document, Document 1 contains three
properties—Title, Author, and Keywords—as well as the document body. For the properties Title and Keywords,
which are specified in the search property list, search terms are associated with their corresponding internal
property IDs in the full-text index. In contrast, the content of the Author property is indexed as if it were part of the
document body. This means registering a property increases the size of the full-text index somewhat, depending on
the amount of content stored in the property.
Search terms in the Title property—"Favorite," "Biking," and "Trails"—are associated with the internal property ID
assigned to Title for this index, 1. Search terms in the Keywords property—"biking" and "mountain"—are
associated with the internal property ID assigned to Tags for this index, 2. For search terms n the Author property
—"Jane" and "Doe"—and search terms in the document body, the internal property ID is 0. Note that the term
"biking" occurs in the Title property, in the Keywords (Tags) property, and in the document body. A property search
for "biking" in the Title or Keywords (Tags) property would return this document in the results. A generic full-text
query for "biking" would also return this document, just as if the index were not configured for property searching.
A property search for "biking" in the Author property would not return this document.
A property-scoped full-text query uses the internal property IDs registered for the current search property list of
the full-text index.
Impact of Enabling Property Searching
Configuring a full-text index to support searching on one or more properties increases the size of the index
somewhat, depending on the number of properties you specify in your search property list and on the content of
each property.
In testing typical corpuses of Microsoft Word, Excel, and PowerPoint documents, we configured a full-text index to
index typical search properties. Indexing these properties increased the size of the full-text index size by
approximately 5 percent. We anticipate that this approximate size increase will to be typical for most document
corpuses. However, ultimately, the size increase will depend on the amount of property data in a given document
corpus relative to the amount of overall data.
Creating a Search Property List and Enabling Property Search
Creating a Search Property List
To create a search property list with Transact-SQL
Use the CREATE SEARCH PROPERTY LIST (Transact-SQL) statement and provide at least a name the list.
To c r e a t e a se a r c h p r o p e r t y l i st i n M a n a g e m e n t St u d i o
1. In Object Explorer, expand the server.
2. Expand Databases, and then expand the database in which you want to create the search property list.
3. Expand Storage, and then right-click Search Property Lists.
4. Select New Search Property List.
5. Specify the property list name.
6. Optionally, specify someone else as the property list owner.
7. Select one of the following options:
Create an empty search property list
Create from an existing search property list
For more information, see New Search Property List.
8. Click OK.
Adding Properties to a Search Property List
Property searching requires creating a search property list and specifying one or more properties that you want to
make searchable. When you add a property to a search property list, the property is registered for that particular
list. To add a property to a search property list you need the following values:
Property set GUID
Each search property belongs to single property set that contains a group of related properties. Each
property set is identified by a globally unique identifier (GUID).
Property integer identifier
Each search property possesses an identifier that is unique within the property set. Note that for a given
property, the identifier could be either an integer or a string, however full-text search supports only integer
identifiers.
Property name
This is the name that users will specify in full-text queries to search on the property. A property name can
contain internal spaces. The maximum length is 256 characters.
The property name can be any of the following:
The Windows canonical name of the property, such as System.Author or
System.Contact.HomeAddress.
A user-friendly name that is easy for your users to remember. Some properties are associated with a
well-known user-friendly name, such as "Author" or "Home Address," but you can specify whatever
name is most appropriate to your users.
NOTE
A given combination of property set GUID and property identifier must be unique in a given search property list. This
means that you cannot add the same property more than once with different names or descriptions.
Property description (optional)
When adding a search property to a search property list, you can supply an optional description. For
example, you might want to provide information about a property that is not evident from its name, or you
might want to describe the property set of the property.
To obtain values for a search property list
See Find Property Set GUIDs and Property Integer IDs for Search Properties.
To add a property to a search property list with Transact-SQL
Use the ALTER SEARCH PROPERTY LIST (Transact-SQL) statement with the values that you obtained by
using one of the methods described in the topic, Find Property Set GUIDs and Property Integer IDs for
Search Properties.
The following example demonstrates the use of these values when adding a property to a search property
list:
ALTER SEARCH PROPERTY LIST DocumentTablePropertyList
ADD 'Title'
WITH ( PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9', PROPERTY_INT_ID = 2,
PROPERTY_DESCRIPTION = 'System.Title - Title of the item.' );
To add a property to a search property list in Management Studio
Use the Search Property List Properties dialog box to add and remove search properties. You can find Search
Property Lists in Object Explorer under the Storage node of the associated database.
Associating a Search Property List with a Full-Text Index
For a full-text index to support property searching on the properties that are registered for a search property list,
you need to associate the search property list with the index and repopulate the index. Repopulating the full-text
index creates property-specific index entries for search terms in each of the registered properties.
As long as the full-text index remains associated with this search property list, full-text query can use the
PROPERTY option of the CONTAINS predicate to search on properties that are registered for that search property
list.
If you change the search property list associated with a full-text index, then the index must be rebuilt to bring it into
a consistent state. The index is truncated immediately and is empty until the full population runs. For more
information about when changing the search property list causes rebuilding the index, see "Remarks," in ALTER
FULLTEXT INDEX (Transact-SQL).
To associate a search property list with a full-text index with Transact-SQL
Use the ALTER FULLTEXT INDEX (Transact-SQL) statement with the
SET SEARCH PROPERTY LIST = <property_list_name> clause.
To associate a search property list with a full-text index with Management Studio
Specify a value for Search Property List on the General page of the Full-Text Index Properties dialog box.
Querying Search Properties with CONTAINS
The basic CONTAINS syntax for a property-scoped full-text query is as follows:
SELECT column_name FROM table_name
WHERE CONTAINS ( PROPERTY ( column_name, 'property_name' ), '<contains_search_condition>' )
For example, the following query searches on an indexed property, Title , in the Document column of the
Production.Document table of the AdventureWorks database. The query returns only documents whose Title
property contains the string Maintenance or Repair
USE AdventureWorks
GO
SELECT Document FROM Production.Document
WHERE CONTAINS ( PROPERTY ( Document, 'Title' ), 'Maintenance OR Repair')
GO
This example assumes that the IFilter for the document extracts its Title property, that the Title property is added to
the search property list, and that the search property list is associated with the full-text index.
Managing Search Property Lists
Viewing and Changing a Search Property List
To change a search property list with Transact-SQL
Use the ALTER SEARCH PROPERTY LIST (Transact-SQL) statement to add or remove search properties.
To v i e w a n d c h a n g e a se a r c h p r o p e r t y l i st i n M a n a g e m e n t St u d i o
1. In Object Explorer, expand the server.
2. Expand Databases, and then expand the database.
3. Expand Storage.
4. Expand Search Property Lists to display the search property lists.
5. Right-click the property list, and select Properties.
6. In the Search Property List Editor dialog box, use the Properties grid to add or remove search properties:
a. To remove a document property, click the row header to the left of the property, and press DEL .
b. To add a document property, click in the empty row at the bottom of the list, to the right of the \*,
and enter the values for the new property.
For information about these values, see Search Property List Editor. For information about how to
obtain these values for properties defined by Microsoft, see Find Property Set GUIDs and Property
Integer IDs for Search Properties. For information about properties defined by an independent
software vendor (ISV), see the documentation of that vendor.
7. Click OK.
Deleting a Search Property List
You cannot drop a property list from a database while the list is associated with any full-text index.
To delete a search property list with Transact-SQL
Use the DROP SEARCH PROPERTY LIST (Transact-SQL) statement.
To d e l e t e a se a r c h p r o p e r t y l i st i n M a n a g e m e n t St u d i o
1. In Object Explorer, expand the server.
2. Expand Databases, and then expand the database.
3. Expand Storage, and then expand the Search Property Lists node.
4. Right-click the property list that you want to delete, and click Delete.
5. Click OK.
See Also
Find Property Set GUIDs and Property Integer IDs for Search Properties
Configure and Manage Filters for Search
Find Property Set GUIDs and Property Integer IDs
for Search Properties
3/24/2017 • 4 min to read • Edit Online
This topic discusses how to obtain the values that are required before you can add a property to a search property
list and make it searchable by full-text search. These values include the property set GUID and property integer
identifier of a document property.
Document properties that are extracted by IFilters from binary data – that is, from data stored in a varbinary,
varbinary(max) (including FILESTREAM), or image data type column – can be made available for full-text
search. To make an extracted property searchable, the property must be manually added to a search property list.
The search property list must also be associated with one or more full-text indexes. For more information, see
Search Document Properties with Search Property Lists.
Before you can add an available property to a property list, you have to find 2 pieces of information about the
property:
The property set GUID of the property.
The integer ID of the property.
(When you add a property to a property list, you also have to provide a name and description. However you
do not have to use the canonical name and description of the property.)
This topic describes the commonly-used methods to find information about available properties, especially
about properties that are defined by Microsoft. For information about properties that have been defined by
a third party, refer to the third-party documentation or contact the vendor.
Finding Information about Widely Used, Well-Known Microsoft
Properties
Microsoft defines hundreds of document properties for use in many contexts, but only a small subset of the
available properties are used by each file format. Among the frequently used Windows properties is a small set of
generic properties. Some examples of well-known generic properties are shown in the following table. The table
shows the well-known name, the Windows canonical name (from the property description published by
Microsoft), the property set GUID, the property integer identifier, and a brief description.
WELL-KNOWN NAME
WINDOWS CANONICAL
NAME
Authors
PROPERTY SET GUID
INTEGER ID
DESCRIPTION
System.Author
F29F85E0-4FF91068-AB9108002B27B3D9
4
Author or authors of
a given item.
Tags
System.Keywords
F29F85E0-4FF91068-AB9108002B27B3D9
5
Set of keywords (also
known as tags)
assigned to the item.
Type
System.PerceivedTy
pe
28636AA6-953D11D2-B5D600C04FD918D0
9
Perceived file type
based on its canonical
type.
WELL-KNOWN NAME
WINDOWS CANONICAL
NAME
Title
System.Title
PROPERTY SET GUID
INTEGER ID
DESCRIPTION
F29F85E0-4FF91068-AB9108002B27B3D9
2
Title of the item. For
example, the title of a
document, the
subject of a message,
the caption of a
photo, or the name of
a music track.
To encourage consistency among file formats, Microsoft has identified subsets of frequently used, high-priority
document properties for several categories of documents. These include communications, contacts, documents,
music files, pictures, and videos. For more information about the top-ranked properties for each category, see
system-defined properties for custom file formats in the Windows Search documentation.
A specific file format might implement properties of three types:
Generic properties defined by Microsoft.
Category-specific properties defined by Microsoft.
Custom, application-specific properties defined by the software vendor.
Finding Information about Available Properties by using
FILTDUMP.EXE
To learn what properties are discovered and extracted by an installed IFilter, you can install and run the
filtdump.exe utility, which is part of the Microsoft Windows SDK.
You run filtdump.exe from the command prompt and provide a single argument. This argument is the name of
an individual file that has a file type for which an IFilter is installed. The utility displays a list of all the properties
discovered by the IFilter in the document, with their property set GUIDs, integer IDs, and additional information.
For information about installing this software, see Microsoft Windows SDK for Windows 7 and .NET Framework 4.
After you download and install the SDK, look in the following folders for the filtdump.exe utility.
For the 64-bit version, look in
C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\x64
For the 32-bit version, look in
C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin
.
.
Finding Values for a Search Property from a Windows Property
Description
For a well-known Windows search property, you can obtain the information that you need from the formatID and
propID attributes of the property description (propertyDescription).
The following example shows the relevant part of a typical Microsoft property description, in this case, of the
System.Author property. The formatID attribute specifies the property set GUID,
F29F85E0-4FF9-1068-AB91-08002B27B3D9 , and the propID attribute specifies the property integer ID, 4. Notice that
the name attribute specifies the Windows canonical property name, System.Author . (This example omits portions
of the property description that are not relevant.)
.
propertyDescription
name = System.Author
…
formatID = F29F85E0-4FF9-1068-AB91-08002B27B3D9
propID = 4
…
For the complete description of this property, see System.Author in the Windows Search documentation.
For a complete list of Windows properties, see Windows Properties, also in the Windows Search documentation.
Adding a Property to a Search Property List
The following example shows how to add a property to a search property list. The example uses an ALTER SEARCH
PROPERTY LIST statement to add the System.Author property to a search property list named PropertyList1 , and
provides a user friendly name for the property, Author .
ALTER SEARCH PROPERTY LIST PropertyList1
ADD 'Author'
WITH (
PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9',
PROPERTY_INT_ID = 4,
PROPERTY_DESCRIPTION = 'System.Author - the author or authors of the item'
)
GO
For more information about creating a search property list and associating it with a full-text index, see Search
Document Properties with Search Property Lists.
See Also
Search Document Properties with Search Property Lists
Configure and Manage Filters for Search
Create and Manage Full-Text Catalogs
3/31/2017 • 2 min to read • Edit Online
A full-text catalog is a logical container for a group of full-text indexes. You have to create a full-text catalog before
you can create a full-text index.
A full-text catalog is a virtual object that does not belong to any filegroup.
Create a Full-Text Catalog
Create a full-text catalog with Transact-SQL
Use CREATE FULLTEXT CATALOG. For example:
USE AdventureWorks;
GO
CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT;
GO
Create a full-text catalog with Management Studio
1. In Object Explorer, expand the server, expand Databases, and expand the database in which you want to
create the full-text catalog.
2. Expand Storage, and then right-click Full Text Catalogs.
3. Select New Full-Text Catalog.
4. In the New Full-Text Catalog dialog box, specify the information for the catalog that you are re-creating.
For more information, see New Full-Text Catalog (General Page).
NOTE
Full-text catalog IDs begin at 00005 and are incremented by one for each new catalog created.
5. Click OK.
Get the properties of a full-text catalog
Use the Transact-SQL function FULLTEXTCATALOGPROPERTY to get the value of various properties related to
full-text catalogs. For more info, see FULLTEXTCATALOGPROPERTY.
For example, run the following query to get the count of indexes in the full-text catalog
Catalog1
.
USE <database>;
GO
SELECT fulltextcatalogproperty('Catalog1', 'ItemCount');
GO
The following table lists the properties that are related to full-text catalogs. This information may be useful for
administering and troubleshooting full-text search.
PROPERTY
DESCRIPTION
AccentSensitivity
Accent-sensitivity setting.
ImportStatus
Whether the full-text catalog is being imported.
IndexSize
Size of the full-text catalog in megabytes (MB).
ItemCount
Number of full-text indexed items currently in the full-text
catalog.
MergeStatus
Whether a master merge is in progress.
PopulateCompletionAge
Difference in seconds between the completion of the last fulltext index population and 01/01/1990 00:00:00.
PopulateStatus
Populate status.
This feature will be removed in a future version of Microsoft
SQL Server. Avoid using this feature in new development
work, and plan to modify applications that currently use this
feature.
UniqueKeyCount
Number of unique keys in the full-text catalog.
Rebuild a full-text catalog
Run the Transact-SQL statement ALTER FULLTEXT CATALOG ... REBUILD, or do the following things in SQL Server
Management Studio (SSMS).
1. In SSMS, in Object Explorer, expand the server, expand Databases, and then expand the database that
contains the full-text catalog that you want to rebuild.
2. Expand Storage, and then expand Full Text Catalogs.
3. Right-click the name of the full-text catalog that you want to rebuild, and select Rebuild.
4. To the question Do you want to delete the full-text catalog and rebuild it?, click OK.
5. In the Rebuild Full-Text Catalog dialog box, click Close.
Rebuild all full-text catalogs for a database
1. In SSMS, in Object Explorer, expand the server, expand Databases, and then expand the database that
contains the full-text catalogs that you want to rebuild.
2. Expand Storage, and then right-click Full Text Catalogs.
3. Select Rebuild All.
4. To the question, Do you want to delete all full-text catalogs and rebuild them?, click OK.
5. In the Rebuild All Full-Text Catalogs dialog box, click Close.
Remove a full-text catalog from a database
Run the Transact-SQL statement DROP FULLTEXT CATALOG, or do the following things in SQL Server
Management Studio (SSMS).
1. In SSMS, in Object Explorer, expand the server, expand Databases, and expand the database that contains
the full-text catalog you want to remove.
2. Expand Storage, and expand Full Text Catalogs.
3. Right-click the full-text catalog that you want to remove, and then select Delete.
4. In the Delete Objects dialog box, click OK.
Next step
Create and Manage Full-Text Indexes
Create and Manage Full-Text Indexes
3/24/2017 • 8 min to read • Edit Online
This topic describes how to create, populate, and manage full-text indexes in SQL Server.
Prerequisite - Create a full-text catalog
Before you can create a full-text index, you have to have a full-text catalog. The catalog is a virtual container for
one or more full-text indexes. For more info, see Create and Manage Full-Text Catalogs.
Create, alter, or drop a full-text index
Create a full-text index
CREATE FULLTEXT INDEX (Transact-SQL)
Alter a full-text index
ALTER FULLTEXT INDEX (Transact-SQL)
Drop a full-text index
DROP FULLTEXT INDEX (Transact-SQL)
Populate a full-text index
The process of creating and maintaining a full-text index is called a population (also known as a crawl). There are
three types of full-text index population:
Full population
Population based on change tracking
Incremental population based on a timestamp.
For more info, see Populate Full-Text Indexes.
View the properties of a full-text index
View the properties of a full-text index with Transact-SQL
CATALOG OR DYNAMIC MANAGEMENT VIEW
DESCRIPTION
sys.fulltext_index_catalog_usages (Transact-SQL)
Returns a row for each full-text catalog to full-text index
reference.
sys.fulltext_index_columns (Transact-SQL)
Contains a row for each column that is part of a full-text
index.
sys.fulltext_index_fragments (Transact-SQL)
A fulltext index uses internal tables called full-text index
fragments to store the inverted index data. This view can be
used to query the metadata about these fragments. This view
contains a row for each full-text index fragment in every table
that contains a full-text index.
sys.fulltext_indexes (Transact-SQL)
Contains a row per full-text index of a tabular object.
CATALOG OR DYNAMIC MANAGEMENT VIEW
DESCRIPTION
sys.dm_fts_index_keywords (Transact-SQL)
Returns information about the content of a full-text index for
the specified table.
sys.dm_fts_index_keywords_by_document (Transact-SQL)
Returns information about the document-level content of a
full-text index for the specified table. A given keyword can
appear in several documents.
sys.dm_fts_index_population (Transact-SQL)
Returns information about the full-text index populations
currently in progress.
View the properties of a full-text index with Management Studio
1. In Management Studio, in Object Explorer, expand the server.
2. Expand Databases, and then expand the database that contains the full-text index.
3. Expand Tables.
4. Right-click the table on which the full-text index is defined, select Full-Text index, and on the Full-Text
index context menu, click Properties. This opens the Full-text index Properties dialog box.
5. In the Select a page pane, you can select any of the following pages:
PAGE
DESCRIPTION
General
Displays basic properties of the full-text index. These
include several modifiable properties and a number of
unchangeable properties such as database name, table
name, and the name of full-text key column. The
modifiable properties are:
Full-Text Index Stoplist
Full-Text Indexing Enabled
Change Tracking
Search Property List
For more info, see Full-Text Index Properties (General
Page).
Columns
Displays the table columns that are available for full-text
indexing. The selected column or columns are full-text
indexed. You can select as many of the available columns
as you want to include in the full-text index. For more
info, see Full-Text Index Properties (Columns Page).
Schedules
Use this page to create or manage schedules for a SQL
Server Agent job that starts an incremental table
population for the full-text index populations. For more
info, see Populate Full-Text Indexes.
Note: After you exit the Full-Text Index Properties
dialog box, any newly created schedule is associated with
a SQL Server Agent job (Start Incremental Table
Population on database_name.table_name).
6. Click OK. to save any changes and exit the Full-text index Properties dialog box.
View the properties of indexed tables and columns
Several Transact-SQL functions such as OBJECTPROPERTYEX can be used to obtain the value of various full-text
indexing properties. This information is useful for administering and troubleshooting full-text search.
The following table lists the full-text properties related to indexed tables and columns and their related TransactSQL functions.
PROPERTY
DESCRIPTION
FUNCTION
FullTextTypeColumn
TYPE COLUMN in the table that holds
the document type information of the
column.
COLUMNPROPERTY
IsFulltextIndexed
Whether a column has been enabled
for full-text indexing.
COLUMNPROPERTY
IsFulltextKey
Whether the index is the full-text key
for a table.
INDEXPROPERTY
TableFulltextBackgroundUpdateInd
exOn
Whether a table has full-text
background update indexing.
OBJECTPROPERTYEX
TableFulltextCatalogId
Full-text catalog ID in which the fulltext index data for the table resides.
OBJECTPROPERTYEX
TableFulltextChangeTrackingOn
Whether a table has full-text changetracking enabled.
OBJECTPROPERTYEX
TableFulltextDocsProcessed
Number of rows processed since the
start of full-text indexing.
OBJECTPROPERTYEX
TableFulltextFailCount
Number of rows Full-Text Search did
not index.
OBJECTPROPERTYEX
TableFulltextItemCount
Number of rows that were successfully
full-text indexed.
OBJECTPROPERTYEX
TableFulltextKeyColumn
The column ID of the full-text unique
key column.
OBJECTPROPERTYEX
TableFullTextMergeStatus
Whether a table that has a full-text
index is currently in merging.
OBJECTPROPERTYEX
TableFulltextPendingChanges
Number of pending change tracking
entries to process.
OBJECTPROPERTYEX
TableFulltextPopulateStatus
Population status of a full-text table.
OBJECTPROPERTYEX
TableHasActiveFulltextIndex
Whether a table has an active full-text
index.
OBJECTPROPERTYEX
Get info about the full-text key column
Typically, the result of CONTAINSTABLE or FREETEXTTABLE rowset-valued functions need to be joined with the
base table. In such cases, you need to know the unique key column name. You can inquire whether a given unique
index is used as the full-text key, and you can obtain the identifier of the full-text key column.
Determine whether a given unique index is used as the full-text key column
Use a SELECT statement to call the INDEXPROPERTY function. In the function call use the OBJECT_ID function to
convert the name of the table (table_name) into the table ID, specify the name of a unique index for the table, and
specify the IsFulltextKey index property, as follows:
SELECT INDEXPROPERTY( OBJECT_ID('table_name'), 'index_name', 'IsFulltextKey' );
This statement returns 1 if the index is used to enforce uniqueness of the full-text key column and 0 if it is not.
Example
The following example inquires whether the
the full-text key column, as follows:
PK_Document_DocumentID
index is used to enforce the uniqueness of
USE AdventureWorks
GO
SELECT INDEXPROPERTY ( OBJECT_ID('Production.Document'), 'PK_Document_DocumentID', 'IsFulltextKey' )
This example returns 1 if the PK_Document_DocumentID index is used to enforce uniqueness of the full-text key
column. Otherwise, it returns 0 or NULL. NULL implies you are using an invalid index name, the index name does
not correspond to the table, the table does not exist, or so forth.
Find the identifier of the full-text key column
Each full-text enabled table has a column that is used to enforce unique rows for the table (the uniquekey
column). The TableFulltextKeyColumn property, obtained from the OBJECTPROPERTYEX function, contains the
column ID of the unique key column.
To obtain this identifier, you can use a SELECT statement to call the OBJECTPROPERTYEX function. Use the
OBJECT_ID function to convert the name of the table (table_name) into the table ID and specify the
TableFulltextKeyColumn property, as follows:
SELECT OBJECTPROPERTYEX(OBJECT_ID( 'table_name'), 'TableFulltextKeyColumn' ) AS 'Column Identifier';
Examples
The following example returns the identifier of the full-text key column or NULL. NULL implies that you are using
an invalid index name, the index name does not correspond to the table, the table does not exist, or so forth.
USE AdventureWorks;
GO
SELECT OBJECTPROPERTYEX(OBJECT_ID('Production.Document'), 'TableFulltextKeyColumn');
GO
The following example shows how to use the identifier of the unique key column to obtain the name of the
column.
USE AdventureWorks;
GO
DECLARE @key_column sysname
SET @key_column = Col_Name(Object_Id('Production.Document'),
ObjectProperty(Object_id('Production.Document'),
'TableFulltextKeyColumn')
)
SELECT @key_column AS 'Unique Key Column';
GO
This example returns a result set column named Unique Key Column , which displays a single row containing the
name of the unique key column of the Document table, DocumentID. Note that if this query contained an invalid
index name, the index name did not correspond to the table, the table did not exist, and so forth, it would return
NULL.
Index varbinary(max) and xml columns
If a varbinary(max), varbinary, or xml column is full-text indexed, it can be queried using the full-text predicates
(CONTAINS and FREETEXT) and functions (CONTAINSTABLE and FREETEXTTABLE), like any other full-text indexed
column.
Index varbinary(max) or varbinary data
A single varbinary(max) or varbinary column can store many types of documents. SQL Server supports any
document type for which a filter is installed and available in the operative system. The document type of each
document is identified by the file extension of the document. For example, for a .doc file extension, full-text search
uses the filter that supports Microsoft Word documents. For a list of available document types, query the
sys.fulltext_document_types catalog view.
Note that the Full-Text Engine can leverage existing filters that are installed in the operating system. Before you
can use operating-system filters, word breakers, and stemmers, you must load them in the server instance, as
follows:
EXEC sp_fulltext_service @action='load_os_resources', @value=1
To create a full-text index on a varbinary(max) column, the Full-Text Engine needs access to the file extensions of
the documents in the varbinary(max) column. This information must be stored in a table column, called a type
column, that must be associated with the varbinary(max) column in the full-text index. When indexing a
document, the Full-Text Engine uses the file extension in the type column to identify which filter to use.
Index xml data
An xml data type column stores only XML documents and fragments, and only the XML filter is used for the
documents. Therefore, a type column is unnecessary. On xml columns, the full-text index indexes the content of
the XML elements, but ignores the XML markup. Attribute values are full-text indexed unless they are numeric
values. Element tags are used as token boundaries. Well-formed XML or HTML documents and fragments
containing multiple languages are supported.
For more info about indexing and querying on an xml column, see Use Full-Text Search with XML Columns.
Disable or re-enable tull-text indexing for a table
In SQL Server, all user-created databases are full-text enabled by default. Additionally, an individual table is
automatically enabled for full-text indexing as soon as a full-text index is created on it and a column is added to
the index. A table is automatically disabled for full-text indexing when the last column is dropped from its full-text
index.
On a table that has a full-text index, you can manually disable or re-enable a table for full-text indexing using SQL
Server Management Studio.
1. Expand the server group, expand Databases, and expand the database that contains the table you want to
enable for full-text indexing.
2. Expand Tables, and right-click the table that you want to disable or re-enable for full-text indexing.
3. Select Full-Text index, and then click Disable Full-Text index or Enable Full-Text index.
Remove a full-text index from a table
1. In Object Explorer, right-click the table that has the full-text index that you want to delete.
2. Select Delete Full-Text index.
3. When prompted, click OK to confirm that you want to delete the full-text index.
Choose a Language When Creating a Full-Text Index
3/24/2017 • 7 min to read • Edit Online
When creating a full-text index, you need to specify a column-level language for the indexed column. The word
breaker and stemmers of the specified language will be used by full-text queries on the column. There are a couple
of things to consider when choosing the column language when creating a full-text index. These considerations
relate to how your text is tokenized and then indexed by Full-Text Engine.
NOTE
To specify a column-level language for a column of full-text index, use the LANGUAGE language_term clause when
specifying the column. For more information, see CREATE FULLTEXT INDEX (Transact-SQL) and ALTER FULLTEXT INDEX
(Transact-SQL).
Language Support in Full-Text Search
This section provides an introduction to word breakers and stemmers, and discusses how full-text search uses the
LCID of the column-level language.
Introduction to Word Breakers and Stemmers
SQL Server 2008 and later versions include a complete new family of word breakers and stemmers that are
significantly better than those previously available in SQL Server.
NOTE
The Microsoft Natural Language Group (MS NLG) implemented and supports these new linguistic components.
The new word breakers provide the following benefits:
Robustness
Testing has shown that the new word breakers are robust in high-pressure query environments.
Security
The new word breakers are enabled by default in SQL Server thanks to security improvements in linguistic
components. We highly recommend that external components such as word breakers and filters be signed
to improve the overall security and robustness of SQL Server. You can configure full-text to verify that these
components are signed as follows:
EXEC sp_fulltext_service 'verify_signature';
Quality
Word breakers have been redesigned, and testing has shown that the new word breakers provide better
semantic quality than previous word breakers. This increases the recall accuracy.
Coverage for a vast list of languages, word breakers are included in SQL Server out of the box and enabled
by default .
For a list of the languages for which SQL Server includes a word breaker and stemmers, see
sys.fulltext_languages (Transact-SQL).
How Full-Text Search Uses the Name of the Column-Level Language
When creating a full-text index, you need to specify a valid language name for each column. If a language name is
valid but not returned by the sys.fulltext_languages (Transact-SQL) catalog view, full-text search falls back to the
closest available language name of the same language family, if any. Otherwise, full-text search falls back to the
Neutral word breaker. This fall-back behavior might affect the recall accuracy. Therefore we strongly recommend
that you specify a valid and available language name for each column when creating a full-text index.
NOTE
The LCID is used against all data types eligible for full-text indexing (such as char or nchar). If you have the sort order of a
char, varchar, or text type column set to a language setting different from the language identified by the LCID, the LCID is
used anyway during full-text indexing and querying of those columns.
Word Breaking
A word breaker tokenizes the text being indexed on word boundaries, which are language specific. Therefore,
word-breaking behavior differs among different languages. If you use one language, x, to index a number of
languages {x, y, and ,z}, some of the behavior might cause unexpected results. For example, a dash (-) or a comma
(,) might be a word-break element that will be thrown away in one language but not in another. Also rarely
unexpected stemming behavior might occur because a given word might stem differently in different language.
For example, in the English language, word boundaries are typically white space or some form of punctuation. In
other languages, such as German, words or characters may be combined together. Therefore, the column-level
language that you choose should represent the language that you expect will be stored in rows of that column.
Western Languages
For the Western family of languages, if you are unsure which languages will be stored in a column or you expect
more than one to be stored, a general workaround is to use the word breaker for the most complex language that
might be stored in the column. For instance, you might expect to store English, Spanish and German content in a
single column. These three Western languages possess very similar word-breaking patterns, with the German
patterns being the most complex. Therefore, a good choice is this case would be to use the German word breaker,
which should be able to process English and Spanish text correctly. In contrast, the English word breaker might not
process German text perfectly because of the compound words of German.
Note that using the word breaker of the most complex language in a language family does not guarantee perfect
indexing of every language in the family. Corner cases might exist in which the most complex word breaker cannot
correctly handle text written in another language.
Non Western Languages
For non Western languages (such as Chinese, Japanese, Hindi, and so forth) the above workaround does not
necessarily work, for linguistic reasons. For non Western languages, consider one of the following workarounds:
For languages from different families
If a column might contain dramatically different languages, for example, Spanish and Japanese, consider
storing the content of different languages in separate columns. This would allow you to use the languagespecific word breaker for each column. If you choose this solution and you don't know the query language
at query time, you might need to issue the query against both columns to ensure that the query finds the
right row or document.
For Binary content (such as Microsoft Word documents)
When the indexed content is of binary type, the full-text search filter that processes the textual content
before sending it to the word breaker might honor specific language tags existing within the binary file. In
this case, at indexing time, the filter will emit the right LCID for a document or section of a document. The
Full-Text Engine will then call the word breaker for the language with that LCID. However, after indexing
multi language content, we recommend that you verify that the content was correctly indexed.
For plain text content
When your content is plain text, you can convert it to the xml data type and add language tags that indicate
the language corresponding to each specific document or document section. For this to work, however, you
need to know the language before full-text indexing.
Stemming
An additional consideration when choosing your column-level language is stemming. Stemming in full-text
queries is the process of searching for all stemmed (inflectional) forms of a word in a particular language. When
you use a generic word breaker to process several languages, the stemming process works only for the language
specified for the column, not for other languages in the column. For example, German stemmers do not work for
English or Spanish (and so forth). This might affect your recall depending of which language you choose at query
time.
Effect of Column Type on Full-Text Search
Another consideration in language choice is related to how the data is represented. For data that is not stored in
varbinary(max) column, no special filtering is performed. Rather, the text is generally passed through the word
breaking component as-is.
Also, word breakers are designed mainly to process written text. So, if you have any type of markup (such as
HTML) on your text, you may not get great linguistic accuracy during indexing and search. In that case, you have
two choices—the preferred method is simply to store the text data in varbinary(max) column, and to indicate its
document type so it may be filtered. If this is not an option, you may consider using the neutral word breaker and,
if possible, adding markup data (such as 'br' in HTML) to your noise word lists.
NOTE
Language based stemming does not come into play when you specify the neutral language.
Specifying a Non-default Column-Level Language in a Full-Text Query
By default, in SQL Server, full-text search will parse the query terms using the language specified for each column
that is included in the full-text clause. To override this behavior, specify a nondefault language at query time. For
supported languages whose resources are installed, the LANGUAGE language_term clause of a CONTAINS,
CONTAINSTABLE, FREETEXT, or FREETEXTTABLE query can be used to specify the language used for word
breaking, stemming, thesaurus, and stopword processing of the query terms.
See Also
CONTAINS (Transact-SQL)
CONTAINSTABLE (Transact-SQL)
Data Types (Transact-SQL)
FREETEXT (Transact-SQL)
FREETEXTTABLE (Transact-SQL)
Configure and Manage Filters for Search
sp_fulltext_service (Transact-SQL)
sys.fulltext_languages (Transact-SQL)
Configure and Manage Word Breakers and Stemmers for Search
Populate Full-Text Indexes
3/30/2017 • 8 min to read • Edit Online
Creating and maintaining a full-text index involves populating the index by using a process called a population
(also known as a crawl).
Types of population
A full-text index supports the following types of population:
Full population
Automatic or manual population based on change tracking
Incremental population based on a timestamp
Full population
During a full population, index entries are built for all the rows of a table or indexed view. A full population of a
full-text index, builds index entries for all the rows of the base table or indexed view.
By default, SQL Server populates a new full-text index fully as soon as it is created.
On the one hand, a full population can consume a significant amount of resources. Therefore, when creating a
full-text index during peak periods, it is often a best practice to delay the full population until an off-peak time,
particularly if the base table of an full-text index is large.
On the other hand, the full-text catalog to which the index belongs is not usable until all of its full-text indexes
are populated.
To create a full-text index without populating it immediately, specify the CHANGE_TRACKING OFF, NO POPULATION
clause in the CREATE FULLTEXT INDEX statement. If you specify CHANGE_TRACKING MANUAL , the Full-Text Engine
doesn't populate the new full-text index until you execute an ALTER FULLTEXT INDEX statement using the
START FULL POPULATION or START INCREMENTAL POPULATION clause.
Example - Create a full-text index without running a full population
The following example creates a full-text index on the Production.Document table of the AdventureWorks sample
database. This example uses WITH CHANGE_TRACKING OFF, NO POPULATION to delay the initial full population.
CREATE UNIQUE INDEX ui_ukDoc ON Production.Document(DocumentID);
CREATE FULLTEXT CATALOG AW_Production_FTCat;
CREATE FULLTEXT INDEX ON Production.Document
(
Document
--Full-text index column name
TYPE COLUMN FileExtension
--Name of column that contains file type information
Language 1033
--1033 is LCID for the English language
)
KEY INDEX ui_ukDoc
ON AW_Production_FTCat
WITH CHANGE_TRACKING OFF, NO POPULATION;
GO
Example - Run a full population on a table
The following example runs a full population on the
database.
Production.Document
table of the
AdventureWorks
sample
ALTER FULLTEXT INDEX ON Production.Document
START FULL POPULATION;
Population based on change tracking
Optionally, you can use change tracking to maintain a full-text index after its initial full population. There is a small
overhead associated with change tracking because SQL Server maintains a table in which it tracks changes to the
base table since the last population. When you use change tracking, SQL Server maintains a record of the rows in
the base table or indexed view that have been modified by updates, deletes, or inserts. Data changes made
through WRITETEXT and UPDATETEXT are not reflected in the full-text index, and are not picked up with change
tracking.
NOTE
For tables containing a timestamp column, you can use incremental population instead of change tracking.
When you enable change tracking during index creation, SQL Server fully populates the new full-text index
immediately after it is created. Thereafter, changes are tracked and propagated to the full-text index.
Enable change tracking
There are two types of change tracking:
Automatic ( CHANGE_TRACKING AUTO option). Automatic change tracking is the default behavior.
Manual ( CHANGE_TRACKING MANUAL option).
The type of change tracking determines how the full-text index is populated, as follows:
Automatic population
By default, or if you specify CHANGE_TRACKING AUTO , the Full-Text Engine uses automatic population on the
full-text index. After the initial full population completes, changes are tracked as data is modified in the
base table, and the tracked changes are propagated automatically. The full-text index is updated in the
background, however, so propagated changes might not be reflected immediately in the index.
To start tracking changes with automatic population
CREATE FULLTEXT INDEX … WITH CHANGE_TRACKING AUTO
ALTER FULLTEXT INDEX … SET CHANGE_TRACKING AUTO
Example - Alter a full-text index to use automatic change tracking
The following example changes the full-text index of the HumanResources.JobCandidate table of the
AdventureWorks sample database to use change tracking with automatic population.
USE AdventureWorks;
GO
ALTER FULLTEXT INDEX ON HumanResources.JobCandidate SET CHANGE_TRACKING AUTO;
GO
Manual population
If you specify CHANGE_TRACKING MANUAL, the Full-Text Engine uses manual population on the full-text
index. After the initial full population completes, changes are tracked as data is modified in the base table.
However, they are not propagated to the full-text index until you execute an ALTER FULLTEXT INDEX …
START UPDATE POPULATION statement. You can use SQL Server Agent to call this Transact-SQL statement
periodically.
To start tracking changes with manual population
CREATE FULLTEXT INDEX … WITH CHANGE_TRACKING MANUAL
ALTER FULLTEXT INDEX … SET CHANGE_TRACKING MANUAL
Example - Create a full-text index with manual change tracking
The following example creates a full-text index that will use change tracking with manual population on the
HumanResources.JobCandidate table of the AdventureWorks sample database.
USE AdventureWorks;
GO
CREATE UNIQUE INDEX ui_ukJobCand ON HumanResources.JobCandidate(JobCandidateID);
CREATE FULLTEXT CATALOG ft AS DEFAULT;
CREATE FULLTEXT INDEX ON HumanResources.JobCandidate(Resume)
KEY INDEX ui_ukJobCand
WITH CHANGE_TRACKING=MANUAL;
GO
Example - Run a manual population
The following example runs a manual population on the change-tracked full-text index of the
HumanResources.JobCandidate table of the AdventureWorks sample database.
USE AdventureWorks;
GO
ALTER FULLTEXT INDEX ON HumanResources.JobCandidate START UPDATE POPULATION;
GO
Disable change tracking
CREATE FULLTEXT INDEX … WITH CHANGE_TRACKING OFF
ALTER FULLTEXT INDEX … SET CHANGE_TRACKING OFF
Incremental population based on a timestamp
An incremental population is an alternative mechanism for manually populating a full-text index. If a table
experiences a high volume of inserts, using incremental population can be more efficient that using manual
population.
You can run an incremental population for a full-text index that has CHANGE_TRACKING set to MANUAL or OFF.
The requirement for incremental population is that the indexed table must have a column of the timestamp data
type. If a timestamp column does not exist, incremental population cannot be performed.
SQL Server uses the timestamp column to identify rows that have changed since the last population. The
incremental population then updates the full-text index for rows added, deleted, or modified after the last
population, or while the last population was in progress. At the end of a population, the Full-Text Engine records a
new timestamp value. This value is the largest timestamp value that SQL Gatherer has found. This value will be
used when the next incremental population starts.
In some cases, the request for an incremental population results in a full population.
A request for incremental population on a table without a timestamp column results in a full population
operation.
If the first population on a full-text index is an incremental population, it indexes all rows, making it equivalent
to a full population.
If any metadata that affects the full-text index for the table has changed since the last population, incremental
population requests are implemented as full populations. This includes metadata changes caused by altering
any column, index, or full-text index definitions.
Run an incremental population
To run an incremental population, execute an
START INCREMENTAL POPULATION clause.
ALTER FULLTEXT INDEX
statement using the
Create or change a schedule for incremental population
1. In Management Studio, in Object Explorer, expand the server.
2. Expand Databases, and then expand the database that contains the full-text index.
3. Expand Tables.
Right-click the table on which the full-text index is defined, select Full-Text index, and on the Full-Text
index context menu, click Properties. This opens the Full-text index Properties dialog box.
IMPORTANT
If the base table or view does not contain a column of the timestamp data type, incremental population is not
possible.
4. In the Select a page pane, select Schedules.
Use this page to create or manage schedules for a SQL Server Agent job that starts an incremental table
population on the base table or indexed view of the full-text index.
The options are as follows:
To create a new schedule, click New.
This opens the New Full-Text Indexing Table Schedule dialog box, where you can create a
schedule. To save the schedule, click OK.
IMPORTANT
A SQL Server Agent job (Start Incremental Table Population on database_name.table_name) is associated
with a new schedule after you exit the Full-Text Index Properties dialog box. If you create multiple
schedules for the same full-text index, they all use the same job.
To change an existing schedule, select the existing schedule and click Edit.
This opens the New Full-Text Indexing Table Schedule dialog box, where you can modify the
schedule.
NOTE
For information about modifying a SQL Server Agent job, see Modify a Job.
To remove an existing schedule, select the existing schedule and click Delete.
5. Click OK.
Troubleshoot errors in a full-text population (crawl)
When an error occurs during a crawl, the Full-Text Search crawl logging facility creates and maintains a crawl log,
which is a plain text file. Each crawl log corresponds to a particular full-text catalog. By default, crawl logs for a
given instance (in this example, the default instance) are located in
%ProgramFiles%\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\LOG folder.
The crawl log file follows the following naming scheme:
SQLFT<DatabaseID><FullTextCatalogID>.LOG[<n>]
The variable parts of the crawl log file name are the following.
<DatabaseID> - The ID of a database. <dbid> is a five digit number with leading zeros.
<FullTextCatalogID> - Full-text catalog ID. <catid> is a five digit number with leading zeros.
<n> - Is an integer that indicates one or more crawl logs of the same full-text catalog exist.
For example, SQLFT0000500008.2 is the crawl log file for a database with database ID = 5, and full-text
catalog ID = 8. The 2 at the end of the file name indicates that there are two crawl log files for this
database/catalog pair.
See Also
sys.dm_fts_index_population (Transact-SQL)
Get Started with Full-Text Search
Create and Manage Full-Text Indexes
CREATE FULLTEXT INDEX (Transact-SQL)
ALTER FULLTEXT INDEX (Transact-SQL)
Improve the Performance of Full-Text Indexes
3/30/2017 • 11 min to read • Edit Online
This topic describes some of the common causes of poor performance for full-text indexes and queries. It also
provides a few suggestions to mitigate these issues and improve performance.
Common causes of performance issues
Hardware resource issues
The performance of full-text indexing and full-text queries is influenced by hardware resources, such as memory,
disk speed, CPU speed, and machine architecture.
The main cause for reduced full-text indexing performance is hardware-resource limits.
CPU. If CPU usage by the filter daemon host process (fdhost.exe) or the SQL Server process (sqlservr.exe) is
close to 100 percent, the CPU is the bottleneck.
Memory. If there is a shortage of physical memory, memory might be the bottleneck.
Disk. If the average disk-waiting queue length is more than two times the number of disk heads, there is a
bottleneck on the disk. The primary workaround is to create full-text catalogs that are separate from the SQL
Server database files and logs. Put the logs, database files, and full-text catalogs on separate disks. Installing
faster disks and using RAID can also help improve indexing performance.
NOTE
Beginning in SQL Server 2008, the Full-Text Engine can use AWE memory because the Full-Text Engine is part of the
sqlservr.exe process.
Full-text batching issues
If the system has no hardware bottlenecks, the indexing performance of full-text search mostly depends on the
following:
How long it takes SQL Server to create full-text batches.
How quickly the filter daemon can consume those batches.
Full-text index population issues
Type of population. Unlike full population, incremental, manual, and auto change tracking population are
not designed to maximize hardware resources to achieve faster speed. Therefore, the tuning suggestions in
this topic may not enhance performance for full-text indexing when it uses incremental, manual, or auto
change tracking population.
Master merge. When a population has completed, a final merge process is triggered that merges the index
fragments together into one master full-text index. This results in improved query performance since only
the master index needs to be queried rather than a number of index fragments, and better scoring statistics
may be used for relevance ranking. However the master merge can be I/O intensive, because large amounts
of data must be written and read when index fragments are merged, though it does not block incoming
queries.
Master merging a large amount of data can create a long running transaction, delaying truncation of the
transaction log during checkpoint. In this case, under the full recovery model, the transaction log might grow
significantly. As a best practice, before reorganizing a large full-text index in a database that uses the full
recovery model, ensure that your transaction log contains sufficient space for a long-running transaction.
For more information, see Manage the Size of the Transaction Log File.
Tune the performance of full-text indexes
To maximize the performance of your full-text indexes, implement the following best practices:
To use all CPU processors or cores to the maximum, set sp_configure 'max full-text crawl range' to the
number of CPUs on the system. For information about this configuration option, see max full-text crawl
range Server Configuration Option.
Make sure that the base table has a clustered index. Use an integer data type for the first column of the
clustered index. Avoid using GUIDs in the first column of the clustered index. A multi-range population on a
clustered index can produce the highest population speed. We recommend that the column serving as the
full-text key be an integer data type.
Update the statistics of the base table by using the UPDATE STATISTICS statement. More important, update
the statistics on the clustered index or the full-text key for a full population. This helps a multi-range
population to generate good partitions on the table.
Before you perform a full population on a large multi-CPU computer, we recommend that you temporarily
limit the size of the buffer pool by setting the max server memory value to leave enough memory for the
fdhost.exe process and operating system use. For more information, see "Estimating the Memory
Requirements of the Filter Daemon Host Process (fdhost.exe)," later in this topic.
If you use incremental population based on a timestamp column, build a secondary index on the timestamp
column to improve the performance of incremental population.
Troubleshoot the performance of full populations
Review the full-text crawl logs
To help diagnose performance problems, look at the full-text crawl logs.
When an error occurs during a crawl, the Full-Text Search crawl logging facility creates and maintains a crawl log,
which is a plain text file. Each crawl log corresponds to a particular full-text catalog. By default, crawl logs for a
given instance (in this example, the default instance) are located in
%ProgramFiles%\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\LOG folder.
The crawl log file follows the following naming scheme:
SQLFT<DatabaseID\><FullTextCatalogID\>.LOG[<n\>]
The variable parts of the crawl log file name are the following.
<DatabaseID> - The ID of a database. <dbid> is a five digit number with leading zeros.
<FullTextCatalogID> - Full-text catalog ID. <catid> is a five digit number with leading zeros.
<n> - Is an integer that indicates one or more crawl logs of the same full-text catalog exist.
For example, SQLFT0000500008.2 is the crawl log file for a database with database ID = 5, and full-text catalog
ID = 8. The 2 at the end of the file name indicates that there are two crawl log files for this database/catalog
pair.
Check physical memory usage
During a full-text population, it is possible for fdhost.exe or sqlservr.exe to run low on memory or to run out of
memory.
If the full-text crawl log shows that fdhost.exe is being restarted often or that error code 8007008 is being
returned it means one of these processes is running out of memory.
If fdhost.exe is producing dumps, particularly on large, multi-CPU computers, it might be running out of
memory.
To get information about memory buffers used by a full-text crawl, see sys.dm_fts_memory_buffers
(Transact-SQL).
The possible causes of low memory or out of memory issues are the following:
Insufficient memory. If the amount of physical memory that is available during a full population is zero,
the SQL Server buffer pool might be consuming most of the physical memory on the system.
The sqlservr.exe process tries to grab all available memory for the buffer pool, up to the configured
maximum server memory. If the max server memory allocation is too large, out-of-memory conditions
and failure to allocate shared memory can occur for the fdhost.exe process.
You can solve this problem by setting the max server memory value of the SQL Server buffer pool
appropriately. For more information, see "Estimating the Memory Requirements of the Filter Daemon Host
Process (fdhost.exe)," later in this topic. Reducing the batch size used for full-text indexing may also help.
Memory contention. During a full-text population on a multi-CPU computer, contention for the buffer pool
memory can occur between fdhost.exe or sqlservr.exe. The resulting lack of shared memory causes batch
retries, memory thrashing, and dumps by the fdhost.exe process.
Paging issues. Insufficient page-file size, such as on a system that has a small page file with restricted
growth, can also cause the fdhost.exe or sqlservr.exe to run out of memory. If the crawl logs do not indicate
any memory-related failures, it is likely that performance is slow due to excessive paging.
Estimate the memory requirements of the Filter Daemon Host process (fdhost.exe )
The amount of memory required by the fdhost.exe process for a population depends mainly on the number of fulltext crawl ranges it uses, the size of inbound shared memory (ISM), and the maximum number of ISM instances.
The amount of memory (in bytes) consumed by the filter daemon host can be roughly estimated by using the
following formula:
number_of_crawl_ranges * ism_size * max_outstanding_isms * 2
The default values of the variables in the preceding formula are as follows:
VARIABLE
DEFAULT VALUE
number_of_crawl_ranges
The number of CPUs
ism_size
1 MB for x86 computers
4 MB, 8 MB, or 16MB for x64 computers, depending on the
total physical memory
max_outstanding_isms
25 for x86 computers
5 for x64 computers
The following table presents guidelines about how to estimate the memory requirements of fdhost.exe. The
formulas in this table use the following values:
F, which is an estimate of memory needed by fdhost.exe (in MB).
T, which is the total physical memory available on the system (in MB).
M, which is the optimal max server memory setting.
For essential information about the following formulas, see the notes that follow the table.
PLATFORM
ESTIMATING FDHOST.EXE MEMORY
REQUIREMENTS IN MB—F^1
FORMULA FOR CALCULATING MAX SERVER
MEMORY—M^2
x86
F = Number of crawl ranges * 50
M =minimum(T, 2000) – F – 500
x64
F = Number of crawl ranges * 10 * 8
M = T – F – 500
Notes about the formulas
1. If multiple full populations are in progress, calculate the fdhost.exe memory requirements of each separately, as
F1, F2, and so forth. Then calculate M as T– sigma(Fi).
2. 500 MB is an estimate of the memory required by other processes in the system. If the system is doing
additional work, increase this value accordingly.
3. .ism_size is assumed to be 8 MB for x64 platforms.
Example: Estimate the memory requirements of fdhost.exe
This example is for an 64-bit computer that has 8GM of RAM and 4 dual core processors. The first
calculation estimates of memory needed by fdhost.exe—F. The number of crawl ranges is 8 .
F = 8*10*8=640
The next calculation obtains the optimal value for max server memory—M. The total physical memory
available on this system in MB—T—is 8192 .
M = 8192-640-500=7052
Example: Setting max server memory
This example uses the sp_configure and RECONFIGURE Transact-SQL statements to set max server
memory to the value calculated for M in the preceding example, 7052 :
USE master;
GO
EXEC sp_configure 'max server memory', 7052;
GO
RECONFIGURE;
GO
For more info about the server memory options, see Server Memory Server Configuration Options.
Check CPU usage
The performance of full populations is not optimal when the average CPU consumption is lower than about 30
percent. Here are some factors that affect CPU consumption.
High wait time for pages
To find out whether a page wait time is high, run the following Transact-SQL statement:
Execute SELECT TOP 10 * FROM sys.dm_os_wait_stats ORDER BY wait_time_ms DESC;
The following table describes the wait types of interest here.
WAIT TYPE
DESCRIPTION
POSSIBLE RESOLUTION
PAGEIO_LATCH_SH (_EX or _UP)
This could indicate an IO bottleneck,
in which case you would typically also
see a high average disk-queue
length.
Moving the full-text index to a
different filegroup on a different disk
could help reduce the IO bottleneck.
PAGELATCH_EX (or _UP)
This could indicate a lot of contention
among threads that are trying to
write to the same database file.
Adding files to the filegroup on which
the fulltext index resides could help
alleviate such contention.
For more info, see sys.dm_os_wait_stats (Transact-SQL).
Inefficiencies in scanning the base table
A full population scans the base table to produce batches. This table scanning could be inefficient in the
following scenarios:
If the base table has a high percentage of out-of-row columns that are being full-text indexed,
scanning the base table to produce batches might be the bottleneck. In this case, moving the smaller
data in-row using varchar(max) or nvarchar(max) might help.
If the base table is very fragmented, scanning might be inefficient. For information about computing
out-of-row data and index fragmentation, see sys.dm_db_partition_stats (Transact-SQL) and
sys.dm_db_index_physical_stats (Transact-SQL).
To reduce fragmentation, you can reorganize or rebuild the clustered index. For more information,
see Reorganize and Rebuild Indexes.
Troubleshoot slow indexing of documents
NOTE
This section describes an issue that only affects customers who index documents (such as Microsoft Word documents) in
which other document types are embedded.
The Full-Text Engine uses two types of filters when it populates a full-text index: multithreaded filters and singlethreaded filters.
Some documents, such as Microsoft Word documents, are filtered using a multithreaded filter.
Other documents, such as Adobe Acrobat Portable Document Format (PDF) documents, are filtered using a
single-threaded filter.
For security reasons, filters are loaded by the filter daemon host processes. A server instance uses a
multithreaded process for all multithreaded filters and a single-threaded process for all single-threaded
filters. When a document that uses a multithreaded filter contains an embedded document that uses a
single-threaded filter, the Full-Text Engine launches a single-threaded process for the embedded document.
For example, on encountering a Word document that contains a PDF document, the Full-Text Engine uses
the multithreaded process for the Word content and launches a single-threaded process for the PDF content.
A single-threaded filter might not work well in this environment, however, and could destabilize the filtering
process. In certain circumstances where such embedding is common, destabilization might lead to crashes
of the process. When this occurs, the Full-Text Engine re-routes any failed document - for example, a Word
document that contains embedded PDF content - to the single-threaded filtering process. If re-routing
occurs frequently, it results in performance degradation of the full-text indexing process.
To work around this problem, mark the filter for the container document (the Word document, in this example) as a
single-threaded filter. To mark a filter as a single-threaded filter, set the ThreadingModel registry value for the
filter to Apartment Threaded. For information about single-threaded apartments, see the white paper
Understanding and Using COM Threading Models.
See Also
Server Memory Server Configuration Options
max full-text crawl range Server Configuration Option
Populate Full-Text Indexes
Create and Manage Full-Text Indexes
sys.dm_fts_memory_buffers (Transact-SQL)
sys.dm_fts_memory_pools (Transact-SQL)
Troubleshoot Full-Text Indexing
Troubleshoot Full-Text Indexing
3/24/2017 • 2 min to read • Edit Online
Troubleshoot Full-Text Indexing Failures
While populating or maintaining a full-text index, the full-text indexer, for reasons described below, might fail to
index one or more rows. These row-level errors do not prevent the population from completing. The indexer skips
these rows, which means that you are not able to query for content contained in these rows.
Indexing failures can occur when:
The indexer cannot find or load a filter or word breaker component. This failure can occur if the table row
contains a document format or content in a language that has not been registered with the instance of SQL
Server. This failure can also happen if the registered word breaker or filter component was not signed or
failed signature verification when it was being loaded.
A component, such as a word breaker or filter, fails and returns an error to the indexer. This can happen if
the document being indexed is corrupt and the filter is unable to extract text from the document. This can
also occur when a component is unable to handle the content of a single row above a certain size, due to
memory limits on the full-text filter daemon host (fdhost.exe).
For each row-level failure, the crawl log contains details on the reason for the failure. The error counts are
summarized at the end of a full or incremental population.
There are other failures that can impact the indexing process itself and prevent the population from
completing:
The full-text index exceeds the limit for the number of rows that can be contained in a full-text catalog.
A clustered index or full-text key index on the table being indexed gets altered, dropped, or rebuilt.
A hardware failure or disk corruption results in the corruption of the full-text catalog.
A file group that contains the table being full-text indexed goes offline, or is made read-only.
You should view the crawl log at the end of any significant full-text index population operation, or when you
find that a population did not complete.
Unsigned Components
By default, the full-text indexer requires the filters and word breakers that it loads to be signed. If they are not
signed, which is the case sometimes when custom components are installed, you must configure the full-text
indexer to ignore signature verification.
IMPORTANT
Ignoring signature verification makes the instance of SQL Server less secure. We recommend that you sign any components
that you implement or ensure that any components that you acquire are signed. For information about signing components,
see sp_fulltext_service (Transact-SQL).
Full-Text Index in Inconsistent State after Transaction Log Restored
When restoring the transaction log of a database, you might see a warning indicating that the full-text index is not
in a consistent state. The reason for this is that the full-text index on a table was modified after the database was
backed up. To bring the full-text index to a consistent state, you must run a full population (crawl) on the table. For
more information, see Populate Full-Text Indexes.
See Also
ALTER FULLTEXT CATALOG (Transact-SQL)
Populate Full-Text Indexes
Back Up and Restore Full-Text Catalogs and Indexes
3/24/2017 • 2 min to read • Edit Online
THIS TOPIC APPLIES TO:
SQL Server (starting with 2016)
Warehouse
Parallel Data Warehouse
Azure SQL Database
Azure SQL Data
This topic explains how to back up and restore full-text indexes created in SQL Server. In SQL Server, the full-text
catalog is a logical concept and does not reside in a filegroup. Therefore, to back up a full-text catalog in SQL
Server, you must identify every filegroup that contains a full-text index that belongs to the catalog. Then you must
back up those filegroups, one by one.
IMPORTANT
It is possible to import full-text catalogs when upgrading a SQL Server 2005 database. Each imported full-text catalog is a
database file in its own filegroup. To back up an imported catalog, simply back up its filegroup. For more information, see
Backing Up and Restoring Full-Text Catalogs, in SQL Server 2005 Books Online.
Backing Up the Full-Text Indexes of a Full-Text Catalog
Finding the Full-Text Indexes of a Full-Text Catalog
You can retrieve the properties of the full-text indexes by using the following SELECT statement, which selects
columns from the sys.fulltext_indexes and sys.fulltext_catalogs catalog views.
USE AdventureWorks2012;
GO
DECLARE @TableID int;
SET @TableID = (SELECT OBJECT_ID('AdventureWorks2012.Production.Product'));
SELECT object_name(@TableID), i.is_enabled, i.change_tracking_state,
i.has_crawl_completed, i.crawl_type, c.name as fulltext_catalog_name
FROM sys.fulltext_indexes i, sys.fulltext_catalogs c
WHERE i.fulltext_catalog_id = c.fulltext_catalog_id;
GO
Finding the Filegroup or File That Contains a Full-Text Index
When a full-text index is created, it is placed in one of the following locations:
A user-specified filegroup.
The same filegroup as base table or view, for a nonpartitioned table.
The primary filegroup, for a partitioned table.
NOTE
For information about creating a full-text index, see Create and Manage Full-Text Indexes and CREATE FULLTEXT INDEX
(Transact-SQL).
To find the filegroup of full-text index on a table or view, use the following query, where object_name is the name
of the table or view:
SELECT name FROM sys.filegroups f, sys.fulltext_indexes i
WHERE f.data_space_id = i.data_space_id
and i.object_id = object_id('object_name');
GO
Backing Up the Filegroups That Contain Full-Text Indexes
After you find the filegroups that contain the indexes of a full-text catalog, you need back up each of the filegroups.
During the backup process, full-text catalogs may not be dropped or added.
The first backup of a filegroup must be a full file backup. After you have created a full file backup for a filegroup,
you could back up only the changes in a filegroup by creating a series of one or more differential file backups that
are based on the full file backup.
To back up files and filegroups
Back Up Files and Filegroups (SQL Server)
BACKUP (Transact-SQL)
Restoring a Full-Text Index
Restoring a backed-up filegroup restores the full-text index files, as well as the other files in the filegroup. By
default, the filegroup is restored to the disk location on which the filegroup was backed up.
If a full-text indexed table was online and a population was running when the backup was created, the population is
resumed after the restore.
To restore a filegroup
Restore Files and Filegroups (SQL Server)
Restore Files and Filegroups over Existing Files (SQL Server)
Restore Files to a New Location (SQL Server)
RESTORE (Transact-SQL)
See Also
Manage and Monitor Full-Text Search for a Server Instance
Upgrade Full-Text Search
Configure and Manage Filters for Search
3/24/2017 • 1 min to read • Edit Online
Indexing documents in an varbinary, varbinary(max), image, or xml data type column requires extra
processing. This processing must be performed by a filter. The filter extracts the textual information from the
document (removing the formatting). The filter then sends the text to the word-breaker component for the
language associated with the table column.
A given filter is specific to a given document type (.doc, .pdf, .xls, .xml, and so forth). These filters implement the
IFilter interface. For more information about these document types, query the sys.fulltext_document_types
catalog view.
Binary documents can be stored in a single varbinary(max) or image column. For each document, SQL Server
chooses the correct filter based on the file extension. Because the file extension is not visible when the file is
stored in a varbinary(max) or image column, the file extension (.doc, .xls, .pdf, and so forth) must be stored in a
separate column in the table, called a type column. This type column can be of any character-based data type and
contains the document file extension, such as .doc for a Microsoft Word document. In the Document table in
Adventure Works, the Document column is of type varbinary(max), and the type column, FileExtension, is of
type nvarchar(8).
NOTE
A filter might be able to handle objects embedded in the parent object, depending on its implementation. However, SQL
Server does not configure filters to follow links to other objects.
SQL Server installs its own XML and HTML filters. In addition, any filters for Microsoft proprietary formats (.doc,
.xdoc, .ppt and so on) that are already installed on the operating system are also loaded by SQL Server. To
identify the filters that are currently loaded on an instance of SQL Server, use the
sp_help_fulltext_system_components stored procedure, as follows:
EXEC sp_help_fulltext_system_components 'filter';
Before you can use filters for non Microsoft formats, however, you must manually load them into the server
instance. For information about installing additional filters, see View or Change Registered Filters and Word
Breakers.
To view the type column in an existing full-text index
sys.fulltext_index_columns (Transact-SQL)
See Also
sys.fulltext_index_columns (Transact-SQL)
FILESTREAM Compatibility with Other SQL Server Features
Configure and Manage Word Breakers and
Stemmers for Search
3/31/2017 • 6 min to read • Edit Online
Word breakers and stemmers perform linguistic analysis on all full-text indexed data. Linguistic analysis does the
following two things:
Find word boundaries (word-breaking). The word breaker identifies individual words by determining
where word boundaries exist based on the lexical rules of the language. Each word (also known as a token)
is inserted into the full-text index using a compressed representation to reduce its size.
Conjugate verbs (stemming). The stemmer generates inflectional forms of a particular word based on
the rules of that language (for example, "running", "ran", and "runner" are various forms of the word
"run").
Word breakers and stemmers are language specific
Word breakers and stemmers are language specific, and the rules for linguistic analysis differ for different
languages. Language-specific word breakers make the resulting terms more accurate for that language.
To use the word breakers and stemmers provided for all the languages supported by SQL Server, you typically
don't have to take any action.
Where there is a word breaker for the language family, but not for the specific sub-language, the major
language is used. For example, the French word breaker is used to handle text that is French Canadian.
If no word breaker is available for a particular language, the neutral word breaker is used. With the neutral
word breaker, words are broken at neutral characters such as spaces and punctuation marks.
Get the list of supported languages
To see the list of languages supported by SQL Server Full-Text Search, use the following Transact-SQL statement.
The presence of a language in this list indicates that word breakers are registered for the language.
SELECT * FROM sys.fulltext_languages
Get the list of registered word breakers
For Full-Text Search to use the word breakers for a language, they must be registered. For registered word
breakers, associated linguistic resources - stemmers, noise words (stopwords), and thesaurus files - also become
available to full-text indexing and querying operations.
To see the list of registered word breaker components, use the following statement.
EXEC sp_help_fulltext_system_components 'wordbreaker';
GO
For additional options and more info, see sp_help_fulltext_system_components (Transact-SQL).
If you add or remove a word breaker
If you add, remove, or alter a word breaker, you need to refresh the list of Microsoft Windows locale identifiers
(LCIDs) that are supported for full-text indexing and querying. For more information, see View or Change
Registered Filters and Word Breakers.
Set the default full-text language option
For a localized version of SQL Server, SQL Server Setup sets the default full-text language option to the
language of the server if an appropriate match exists. For a non-localized version of SQL Server, the default fulltext language option is English.
When you create or alter a full-text index, you can specify a different language for each full-text indexed column.
If no language is specified for a column, the default is the value of the configuration option default full-text
language.
NOTE
All columns listed in a single full-text query function clause must use the same language, unless the LANGUAGE option is
specified in the query. The language used for the full-text indexed column being queried determines the linguistic analysis
performed on arguments of the full-text query predicates (CONTAINS and FREETEXT) and functions (CONTAINSTABLE and
FREETEXTTABLE).
Choose the language for an indexed column
When creating a full-text index, we recommend that you specify a language for each indexed column. If a
language is not specified for a column, the system default language is used. The language of a column
determines which word breaker and stemmer are used for indexing that column. Also, the thesaurus file of that
language will be used by full-text queries on the column.
There are a couple of things to consider when choosing the column language for creating a full-text index. These
considerations relate to how your text is tokenized and then indexed by Full-Text Engine. For more information,
see Choose a Language When Creating a Full-Text Index.
To view the word breaker language of specific columns, run the following statement.
SELECT 'language_id' AS "LCID" FROM sys.fulltext_index_columns;
For additional options and more info, see sys.fulltext_index_columns (Transact-SQL).
Troubleshoot word-breaking time-out errors
A word-breaking time-out error may occur in a variety of situations. or information about these situations and
how to respond in each situation, see MSSQLSERVER_30053.
Info about the MSSQLSERVER_30053 error
PROPERTY
VALUE
Product Name
SQL Server
Event ID
30053
Event Source
MSSQLSERVER
Component
SQLEngine
PROPERTY
VALUE
Symbolic Name
FTXT_QUERY_E_WORDBREAKINGTIMEOUT
Message Text
Word breaking timed out for the full-text query string. This
can happen if the wordbreaker took a long time to process
the full-text query string, or if a large number of queries are
running on the server. Try running the query again under a
lighter load.
Explanation
A word-breaking timeout error can occur in the following situations:
The word breaker for the query language is configured incorrectly; for example, its registry settings are
incorrect.
The word breaker malfunctions for a specific query string.
The word breaker returns too much data for a specific query string. Excess data is treated as a potential
buffer overrun attack, and shuts down the filter daemon process (fdhost.exe), which hosts the wordbreaking services.
The filter daemon process configuration is incorrect.
The most common configuration problems are password expiration or a domain policy that prevents the
filter daemon account from logging on.
A very heavy query workload is running on the server instance; for example, the word-breaker took a long
time to process the full-text query string, or a large number of queries are running on the server. Note that
this is the least likely cause.
User Action
Select the user action that is appropriate to the probable cause of the timeout, as follows:
PROBABLE CAUSE
USER ACTION
The word breaker for the query language is configured
incorrectly.
If you are using a third-party word breaker it might be
incorrectly registered with the operating system. In this case,
re-register the word breaker. For more information, see
Revert the Word Breakers Used by Search to the Previous
Version.
The word breaker malfunctions for a specific query string.
If the word breaker is supported by SQL Server, contact
Microsoft Customer Service and Support.
The word breaker returns too much data for a specific query
string.
If the word breaker is supported by SQL Server, contact
Microsoft Customer Service and Support.
The filter daemon process configuration is incorrect.
Ensure that you are using the current password and that a
domain policy is not preventing the filter daemon account
from logging on.
A very heavy query workload is running on the server
instance.
Try running the query again under a lighter load.
Understand the impact of updated word breakers
Each version of SQL Server typically includes new word breakers that have better linguistic rules and are more
accurate than earlier word breakers. Potentially, the new word breakers might behave slightly differently from
the word breakers in full-text indexes that were imported from previous versions of SQL Server.
This is significant if a full-text catalog was imported when a database was upgraded to the current version of SQL
Server. One or more languages used by the full-text indexes in the full-text catalog might now be associated with
new word breakers. For more information, see Upgrade Full-Text Search.
See Also
CREATE FULLTEXT INDEX (Transact-SQL)
ALTER FULLTEXT INDEX (Transact-SQL)
Configure and Manage Stopwords and Stoplists for Full-Text Search
View or Change Registered Filters and Word
Breakers
3/24/2017 • 1 min to read • Edit Online
After any word breakers or filters are installed or uninstalled on a system, the changes do not automatically take
effect on server instances. This topic describes how to view the currently registered word breaker or filters and
how to register newly installed word breakers and filters on an instance of SQL Server.
To view a list of languages whose word breakers are currently registered
1. Use the sys.fulltext_languages catalog view, as follows:
SELECT * FROM sys.fulltext_languages;
To view a list of the filters that are currently registered
1. Use the sp_help_fulltext_system_components system stored procedure, as follows:
EXEC sp_help_fulltext_system_components 'filter';
To register newly installed word breakers and filters
1. Use the sp_fulltext_service system stored procedure to update the list of languages, as follows:
exec sp_fulltext_service 'update_languages';
To unregister uninstalled word breakers and filters
1. Use the sp_fulltext_service to update the list of languages, as follows:
exec sp_fulltext_service 'update_languages'
2. Use the sp_fulltext_service to restart the filter daemon host processes (fdhost.exe), as follows:
exec sp_fulltext_service 'restart_all_fdhosts';
To replace existing word breakers or filters when installing new ones
1. When preparing to install a DLL file that contains new word breakers or filters, verify that it has a different
filename from any of the existing DLL files installed on your server instance.
2. Copy the new DLL file into the directory containing the standard SQL Server DLL files for the server
instance. The default location is:
C:\Program Files\Microsoft SQL Server\MSSQL.instance_name\MSSQL\Binn
IMPORTANT
We highly recommend that you load only signed and verified components. Also, we recommend that you run the
FDHOST Launcher (MSSQLFDLauncher) Service with the least possible privileges.
3. Install the new word breaker or filters.
To install and load Microsoft Filter Pack IFilters
How to register Microsoft Filter Pack IFilters with SQL Server
4. Use sp_fulltext_service to load newly installed word breakers and filters in the server instance, as follows:
EXEC sp_fulltext_service @action='load_os_resources', @value=1;
5. Use sp_fulltext_service to update the list of languages, as follows:
EXEC sp_fulltext_service 'update_languages';
6. Restart the filter daemon host processes (fdhost.exe), using sp_fulltext_service as follows:
EXEC sp_fulltext_service 'restart_all_fdhosts';
See Also
Set the Service Account for the Full-text Filter Daemon Launcher
Configure and Manage Filters for Search
Configure and Manage Word Breakers and Stemmers for Search
Change the Word Breaker Used for US English and
UK English
3/24/2017 • 3 min to read • Edit Online
SQL Server 2016 installs a new version (version 14.0.4999.1038) of the word breaker and stemmer for the English
language, replacing the previous version of these components (version 12.0.6828.0). For information about the
changed behavior of the new components, see Behavior Changes to Full-Text Search. This topic describes how to
switch from the new version of these components to the previous version, or to switch back from the previous
version to the new version. For cluster installations, these changes should be made on all the primary and passive
nodes.
Previous versions of SQL Server used different word breakers represented by different CLSIDs for US English (LCID
1033) and UK English (LCID 2057). In this release, both LCIDs use the same components with the same CLSIDs, as
shown in the following table:
WORD BREAKER
INSTALLED BY
PREVIOUS VERSIONS
WORD BREAKER
INSTALLED BY THIS
VERSION
LCID
VERSION 12.0.6828.0
STEMMER INSTALLED
BY PREVIOUS
VERSIONS
1033
(US English)
188D6CC5-CB034C01-912E47D21295D77E
EEED4C20-7F1B11CE-BE5700AA0051FE20
9faed859-0b304434-ae65412e14a16fb8
e1e5ef84-c4a6-4e508188-99aef3de2659
2057
(UK English)
173C97E2-AEBE437C-944501B237ABF2F6
D99F7670-7F1A11CE-BE5700AA0051FE20
9faed859-0b304434-ae65412e14a16fb8
e1e5ef84-c4a6-4e508188-99aef3de2659
VERSION
14.0.4999.1038
STEMMER INSTALLED
BY THIS VERSION
The components described in this topic are DLL files that are installed in the MSSQL\Binn folder for the SQL Server
instance. The full path is typically C:\Program Files\Microsoft SQL Server\<instance>\MSSQL\Binn .
For more information about word breakers and stemmers, see Configure and Manage Word Breakers and
Stemmers for Search.
Switching from the current English word breaker to the previous
English word breakers
To switch from the current version of the US English word breaker to the previous version
1. In the registry, navigate to the following node:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\\MSSearch\CLSID.
2. Use the following steps to add new keyS for the COM ClassIDs for the previous US English word breaker
and stemmer interfaces for LCID 1033:
a. Add a new key with the value {188D6CC5-CB03-4C01-912E-47D21295D77E} for the previous word
breaker.
b. Update the (Default) data of that key value to langwrbk.dll.
c. Add a new key with the value {EEED4C20-7F1B-11CE-BE57-00AA0051FE20} for the previous
stemmer.
d. Update the (Default) data of that key value to infosoft.dll.
3. In the registry, navigate to the following node:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\\MSSearch\Language\enu.
4. Update the WBreakerClass key value to {188D6CC5-CB03-4C01-912E-47D21295D77E}.
5. Update the StemmerClass key value to {EEED4C20-7F1B-11CE-BE57-00AA0051FE20}.
6. Restart SQL Server.
To switch from the current version of the UK English word breaker to the previous version
1. In the registry, navigate to the following node:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\\MSSearch\CLSID.
2. Use the following steps to add a new key for the COM ClassIDs for the previous UK English word breaker
and stemmer interfaces for LCID 2057:
a. Add a new key with the value {173C97E2-AEBE-437C-9445-01B237ABF2F6} for the previous word
breaker.
b. Update the (Default) data of that key value to langwrbk.dll.
c. Add a new key with the value {D99F7670-7F1A-11CE-BE57-00AA0051FE20} for the previous
stemmer.
d. Update the (Default) data of that key value to infosoft.dll.
3. In the registry, navigate to the following node:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\\MSSearch\Language\eng.
4. Update the WBreakerClass key value to {173C97E2-AEBE-437C-9445-01B237ABF2F6}.
5. Update the StemmerClass key value to {D99F7670-7F1A-11CE-BE57-00AA0051FE20}.
6. Restart SQL Server.
Switching back from the previous English word breakers to the current
English word breaker
To switch back from the previous version of the US English word breaker to the current version
1. In the registry, navigate to the following node:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\\MSSearch\CLSID.
2. If the following keys do not exist, then use the following steps to add a new key for the COM ClassIDs for the
current US English word breaker and stemmer interfaces for LCID 1033:
a. Add a new key with the value {9faed859-0b30-4434-ae65-412e14a16fb8} for the current word
breaker.
b. Update the (Default) data of that key value to MsWb7.dll.
c. Add a new key with the value {e1e5ef84-c4a6-4e50-8188-99aef3de2659} for the current stemmer.
d. Update the (Default) data of that key value to MsWb7.dll.
3. In the registry, navigate to the following node:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\\MSSearch\Language\eng.
4. Update the WBreakerClass key value to {9faed859-0b30-4434-ae65-412e14a16fb8}.
5. Update the StemmerClass key value to {e1e5ef84-c4a6-4e50-8188-99aef3de2659}.
6. Restart SQL Server.
To switch back from the previous version of the UK English word breaker to the current version
1. In the registry, navigate to the following node:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\\MSSearch\CLSID.
2. If the following keys do not exist, then use the following steps to add a new key for the COM ClassIDs for the
current UK English word breaker and stemmer interfaces for LCID 2057:
a. Add a new key with the value {9faed859-0b30-4434-ae65-412e14a16fb8} for the current word
breaker.
b. Update the (Default) data of that key value to MsWb7.dll.
c. Add a new key with the value {e1e5ef84-c4a6-4e50-8188-99aef3de2659} for the current stemmer.
d. Update the (Default) data of that key value to MsWb7.dll.
3. In the registry, navigate to the following node:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\\MSSearch\Language\eng.
4. Update the WBreakerClass key value to {9faed859-0b30-4434-ae65-412e14a16fb8}.
5. Update the StemmerClass key value to {e1e5ef84-c4a6-4e50-8188-99aef3de2659}.
6. Restart SQL Server.
See Also
Revert the Word Breakers Used by Search to the Previous Version
Behavior Changes to Full-Text Search
Revert the Word Breakers Used by Search to the
Previous Version
3/30/2017 • 12 min to read • Edit Online
SQL Server 2016 installs and enables a version of the word breakers and stemmers for all languages supported by
Full-Text Search with the exception of Korean. This topic describes how to switch from this version of these
components to the previous version, or to switch back from the previous version to the new version.
This topic does not discuss the following languages:
English. To revert or restore the English components, see Change the Word Breaker Used for US English
and UK English.
Danish, Polish, and Turkish. The third-party word breakers for Danish, Polish, and Turkish that were
included with previous releases of SQL Server have been replaced with Microsoft components.
Czech and Greek. There are new word breakers for Czech and Greek. Previous releases of SQL Server FullText Search did not include support for these two languages.
Korean. The word breaker and stemmer for the Korean language are not upgraded in this release.
For general information about word breakers and stemmers, see Configure and Manage Word Breakers and
Stemmers for Search.
Overview of reverting and restoring word breakers and stemmers
The instructions for reverting and restoring word breakers and stemmers depend on the language. The following
table summarizes the 3 sets of actions that may be required to revert to the previous version of the components.
CURRENT FILE
PREVIOUS FILE
NUMBER OF AFFECTED
LANGUAGES
NaturalLanguage6.dll
NaturalLanguage6.dll
34
ACTION FOR FILES
ACTION FOR REGISTRY
ENTRIES
Obtain and install a
previous version of
NaturalLanguage6.dll,
overwriting the
current version of the
file.
No action required.
The registry keys and
values have not
changed for this
release.
(Other file name)
NaturalLanguage6.dll
5
Obtain and install a
previous version of
NaturalLanguage6.dll,
overwriting the
current version of the
file.
Change a set of
registry entries to
specify the previous
version of the
components.
(Other file name)
(Other file name)
6
No action required.
Change a set of
registry entries to
specify the previous
version of the
components.
SQL Server 2016
setup copies both the
current and the
previous versions of
the components to
the Binn folder.
WARNING
If you replace the current version of the file NaturalLanguage6.dll with a different version, then the behavior of all the
languages that use this file is affected.
The files described in this topic are DLL files that are installed in the
The full path is typically the following path:
MSSQL\Binn
folder for the SQL Server instance.
C:\Program Files\Microsoft SQL Server\<instance>\MSSQL\Binn
Languages for which the file name of both the current and previous
word breaker is NaturalLanguage6.dll
For the languages in the following table, the file name of both the current and previous word breaker is
NaturalLanguage6.dll. To revert or restore these components, you have to overwrite NaturalLanguage6.dll with a
different version of the same file. You do not have to change any registry entries, because the registry entries have
not changed for this release.
WARNING
If you replace the current version of the file NaturalLanguage6.dll with a different version, then the behavior of all the
languages that use this file is affected.
List of affected languages
LANGUAGE
ABBREVIATION
USED IN THE
REGISTRY
LCID
Bengali
ben
1093
Bulgarian
bgr
1026
Catalan
cat
1027
Spanish
esn
3082
French
fra
1036
Gujarati
guj
1095
Hebrew
heb
1037
Hindi
hin
1081
Croatian
hrv
1050
Indonesian
ind
1057
Icelandic
isl
1039
Italian
ita
1040
LANGUAGE
ABBREVIATION
USED IN THE
REGISTRY
LCID
Kannada
kan
1099
Lithuanian
lth
1063
Latvian
lvi
1062
Malayalam
mal
1100
Marathi
mar
1102
Malay
msl
1086
Neutral
Neutral
0000
Norwegial Bokmaal
nor
1044
Punjabi
pan
1094
Brazilian Portuguese
ptb
1046
Portuguese
ptg
2070
Romanian
rom
1048
Slovak
sky
1051
Slovenian
slv
1060
Serbian - Cyrillic
srb
3098
Serbian - Latin
srl
2074
Swedish
sve
1053
Tamil
tam
1097
Telugu
tel
1098
Ukrainian
ukr
1058
Urdu
urd
1056
Vietnamese
vit
1066
The preceding table is sorted alphabetically on the Abbreviation column.
To revert to the previous components
1. Navigate to the Binn folder described above.
2. Back up the SQL Server 2016 version of NaturalLanguage6.dll to another location.
3. Copy the previous version of NaturalLanguage6.dll from the Binn folder of an instance of SQL Server 2008
R2 or SQL Server 2008 into the Binn folder of the SQL Server 2016 instance.
WARNING
This change affects all the languages that use NaturalLanguage6.dll in both the current and previous version.
4. Restart SQL Server.
To restore the current components
1. Navigate to the location where you backed up the SQL Server 2016 version of NaturalLanguage6.dll.
2. Copy the current version of NaturalLanguage6.dll from the backup location into the Binn folder of the SQL
Server 2016 instance.
WARNING
This change affects all the languages that use NaturalLanguage6.dll in both the current and previous version.
3. Restart SQL Server.
Languages for which the file name of the previous word breaker only is
NaturalLanguage6.dll
For the languages in the following table, the file name of the previous word breaker is different from the file name
of the new version. The previous file name is NaturalLanguage6.dll. To revert to the previous version, you have to
overwrite the current version of NaturalLanguage6.dll with an earlier version of the same file. You also have to
change a set of registry entries to specify the previous or current version of the components.
WARNING
If you replace the current version of the file NaturalLanguage6.dll with a different version, then the behavior of all the
languages that use this file is affected.
List of affected languages
LANGUAGE
ABBREVIATION
USED IN THE
REGISTRY
LCID
Arabic
ara
1025
German
deu
1031
Japanese
jpn
1041
Dutch
nld
1043
Russian
rus
1049
The preceding table is sorted alphabetically on the Abbreviation column.
Use the following instructions together with the list of values in the section File names and registry values for
reverting and restoring word breakers and stemmers.
To revert to the previous components
1. Navigate to the Binn folder described above.
2. Do not remove the files for the current version of the components from the Binn folder.
3. Back up the SQL Server 2016 version of NaturalLanguage6.dll to another location.
4. Copy the previous version of NaturalLanguage6.dll from the Binn folder of an instance of SQL Server 2008
R2 or SQL Server 2008 into the Binn folder of the SQL Server 2016 instance.
WARNING
This change affects all the languages that use NaturalLanguage6.dll in both the current and previous version.
5. In the registry, navigate to the following node:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL
Server<InstanceRoot>\MSSearch\CLSID.
6. Use the following steps to add new keys for the COM ClassIDs for the previous word breaker and stemmer
interfaces for the selected language:
a. Add a new key with the value from the table for the previous word breaker.
b. Update the (Default) data of that key value to the file name of the previous word breaker from the
table.
c. If the selected language uses a stemmer, then add a new key with the value from the table for the
previous stemmer.
d. If the selected language uses a stemmer, then update the (Default) data of that key value to the file
name of the previous stemmer from the table.
7. In the registry, navigate to the following node:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL
Server<InstanceRoot>\MSSearch\Language<language_key>. represents the abbreviation for the
language that is used in the registry; for example, "fra" for French and "esn" for Spanish.
8. Update the WBreakerClass key value to the value from the table for the current word breaker.
9. If the selected language uses a stemmer, then update the StemmerClass key value to the value from the
table for the current stemmer.
10. Restart SQL Server.
To restore the current components
1. Navigate to the location where you backed up the SQL Server 2016 version of NaturalLanguage6.dll.
2. Copy the current version of NaturalLanguage6.dll from the backup location into the Binn folder of the SQL
Server 2016 instance.
WARNING
This change affects all the languages that use NaturalLanguage6.dll in both the current and previous version.
3. In the registry, navigate to the following node:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL
Server<InstanceRoot>\MSSearch\CLSID.
4. If the following keys do not exist, then use the following steps to add new keys for the COM ClassIDs for the
current word breaker and stemmer interfaces for the selected language:
a. Add a new key with the value from the table for the current word breaker.
b. Update the (Default) data of that key value to the file name of the current word breaker from the
table.
c. If the selected language uses a stemmer, then add a new key with the value from the table for the
current stemmer.
d. If the selected language uses a stemmer, then update the (Default) data of that key value to the file
name of the current stemmer from the table.
5. In the registry, navigate to the following node:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL
Server<InstanceRoot>\MSSearch\Language<language_key>. represents the abbreviation for the
language that is used in the registry; for example, "fra" for French and "esn" for Spanish.
6. Update the WBreakerClass key value to the value from the table for the previous word breaker.
7. If the selected language uses a stemmer, then update the StemmerClass key value to the value from the
table for the previous stemmer.
8. Restart SQL Server.
File names and registry values for reverting and restoring word breakers and stemmers
Use the following list of file names and registry entries together with the instructions in the preceding section. Use
the previous values to revert to the previous version, or use the current values to restore the current version of the
components.
The following listed is sorted alphabetically on the abbreviation used for each language.
Arabic (ara), LCID 1025
COMPONENT
WORD BREAKER
STEMMER
Previous CLSID
7EFD3C7E-9E4B-4a93-9503DECD74C0AC6D
483B0283-25DB-4c92-9C15A65925CB95CE
Previous file name
NaturalLanguage6.dll
NaturalLanguage6.dll
Current CLSID
04b37e30-c9a9-4a7d-8f20792fc87ddf71
None
Current file name
MSWB7.dll
None
COMPONENT
WORD BREAKER
STEMMER
Previous CLSID
45EACA36-DBE9-4e4a-A26D5C201902346D
65170AE4-0AD2-4fa5-B3BA7CD73E2DA825
Previous file name
NaturalLanguage6.dll
NaturalLanguage6.dll
German (deu), LCID 1031
COMPONENT
WORD BREAKER
STEMMER
Current CLSID
dfa00c33-bf19-482e-a7913c785b0149b4
8a474d89-6e2f-419c-8dd59b50edc8c787
Current file name
MsWb7.dll
MsWb7.dll
COMPONENT
WORD BREAKER
STEMMER
Previous CLSID
E1E8F15E-8BEC-45df-83BF50FF84D0CAB5
3D5DF14F-649F-4cbc-853DF18FEDE9CF5D
Previous file name
NaturalLanguage6.dll
NaturalLanguage6.dll
Current CLSID
04096682-6ece-4e9e-90c152d81f0422ed
None
Current file name
MsWb70011.dll
None
COMPONENT
WORD BREAKER
STEMMER
Previous CLSID
2C9F6BEB-C5B0-42b6-A5EE84C24DC0D8EF
F7A465EE-13FB-409a-B878195B420433AF
Previous file name
NaturalLanguage6.dll
NaturalLanguage6.dll
Current CLSID
69483c30-a9af-4552-8f84a0796ad5285b
CF923CB5-1187-43ab-B0533E44BED65FFA
Current file name
MsWb7.dll
MsWb7.dll
COMPONENT
WORD BREAKER
STEMMER
Previous CLSID
2CB6CDA4-1C14-4392-A8EC81EEF1F2E079
E06A0DDD-E81A-4e93-8A8DF386C3A1B670
Previous file name
NaturalLanguage6.dll
NaturalLanguage6.dll
Current CLSID
aaa3d3bd-6de7-4317-91a0d25e7d3babc3
d42c8b70-adeb-4b81-a52fc09f24f77dfa
Current file name
MsWb7.dll
MsWb7.dll
Japanese (jpn), LCID 1041
Dutch (nld), LCID 1043
Russian (rus), LCID 1049
Languages for which neither the previous nor the current file name is
NaturalLanguage6.dll
For the languages in the following table, the file names of the previous word breakers and stemmers are different
from the file names of the new versions. Neither the previous nor the current file name is NaturalLanguage6.dll.
You do not have to replace any files, because SQL Server 2016 setup copies both the current and the previous
versions of the components to the Binn folder. However you have to change a set of registry entries to specify the
previous or current version of the components.
List of affected languages
LANGUAGE
ABBREVIATION
USED IN THE
REGISTRY
LCID
Simplified Chinese
chs
2052
Traditional Chinese
cht
1028
Thai
tha
1054
Chinese Traditional
zh-hk
3076
Chinese Traditional
zh-mo
5124
Chinese Simplified
zh-sg
4100
The preceding table is sorted alphabetically on the Abbreviation column.
Use the following instructions together with the list of values in the section File names and registry values for
reverting and restoring word breakers and stemmers.
To revert to the previous components
1. Do not remove the files for the current version of the components from the Binn folder.
2. In the registry, navigate to the following node:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL
Server<InstanceRoot>\MSSearch\CLSID.
3. Use the following steps to add new keys for the COM ClassIDs for the previous word breaker and stemmer
interfaces for the selected language:
a. Add a new key with the value from the table for the previous word breaker.
b. Update the (Default) data of that key value to the file name of the previous word breaker from the
table.
c. If the selected language uses a stemmer, then add a new key with the value from the table for the
previous stemmer.
d. If the selected language uses a stemmer, then update the (Default) data of that key value to the file
name of the previous stemmer from the table.
4. In the registry, navigate to the following node:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL
Server<InstanceRoot>\MSSearch\Language<language_key>. represents the abbreviation for the
language that is used in the registry; for example, "fra" for French and "esn" for Spanish.
5. Update the WBreakerClass key value to the value from the table for the current word breaker.
6. If the selected language uses a stemmer, then update the StemmerClass key value to the value from the
table for the current stemmer.
7. Restart SQL Server.
To restore the previous components
1. Do not remove the files for the previous version of the components from the Binn folder.
2. In the registry, navigate to the following node:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL
Server<InstanceRoot>\MSSearch\CLSID.
3. If the following keys do not exist, then use the following steps to add new keys for the COM ClassIDs for the
current word breaker and stemmer interfaces for the selected language:
a. Add a new key with the value from the table for the current word breaker.
b. Update the (Default) data of that key value to the file name of the current word breaker from the
table.
c. If the selected language uses a stemmer, then add a new key with the value from the table for the
current stemmer.
d. If the selected language uses a stemmer, then update the (Default) data of that key value to the file
name of the current stemmer from the table.
4. In the registry, navigate to the following node:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL
Server<InstanceRoot>\MSSearch\Language<language_key>. represents the abbreviation for the
language that is used in the registry; for example, "fra" for French and "esn" for Spanish.
5. Update the WBreakerClass key value to the value from the table for the previous word breaker.
6. If the selected language uses a stemmer, then update the StemmerClass key value to the value from the
table for the previous stemmer.
7. Restart SQL Server.
File names and registry values for reverting and restoring word breakers and stemmers
Use the following list of file names and registry entries together with the instructions in the preceding section. Use
the previous values to revert to the previous version, or use the current values to restore the current version of the
components.
The following listed is sorted alphabetically on the abbreviation used for each language.
Simplified Chinese (chs), LCID 2052
COMPONENT
WORD BREAKER
Previous CLSID
12CE94A0-DEFB-11D2-B31D-00600893A857
Previous file name
chsbrkr.dll
Current CLSID
E0831C90-BAB0-4ca5-B9BD-EA254B538DAC
Current file name
MsWb70804.dll
Traditional Chinese (cht), LCID 1028
COMPONENT
WORD BREAKER
Previous CLSID
1680E7C3-9430-4A51-9B82-1E7E7AEE5258
COMPONENT
WORD BREAKER
Previous file name
chtbrkr.dll
Current CLSID
E9B1DF65-08F1-438b-8277-EF462B23A792
Current file name
MsWb70404.dll
Thai (tha), LCID 1054
COMPONENT
WORD BREAKER
STEMMER
Previous CLSID
CCA22CF4-59FE-11D1-BBFF00C04FB97FDA
CEDC01C7-59FE-11D1-BBFF00C04FB97FDA
Previous file name
Thawbrkr.dll
Thawbrkr.dll
Current CLSID
F70C0935-6E9F-4ef1-9F067876536DB900
None
Current file name
MsWb7001e.dll
None
Chinese Traditional (zh-hk), LCID 3076
COMPONENT
WORD BREAKER
Previous CLSID
1680E7C3-9430-4A51-9B82-1E7E7AEE5258
Previous file name
chtbrkr.dll
Current CLSID
E9B1DF65-08F1-438b-8277-EF462B23A792
Current file name
MsWb70404.dll
Chinese Traditional (zh-mo), LCID 5124
COMPONENT
WORD BREAKER
Previous CLSID
1680E7C3-9430-4A51-9B82-1E7E7AEE5258
Previous file name
chtbrkr.dll
Current CLSID
E9B1DF65-08F1-438b-8277-EF462B23A792
Current file name
MsWb70404.dll
Chinese Simplified (zh-sg), LCID 4100
COMPONENT
WORD BREAKER
Previous CLSID
12CE94A0-DEFB-11D2-B31D-00600893A857
Previous file name
chsbrkr.dll
COMPONENT
WORD BREAKER
Current CLSID
E0831C90-BAB0-4ca5-B9BD-EA254B538DAC
Current file name
MsWb70804.dll
See Also
Change the Word Breaker Used for US English and UK English
Behavior Changes to Full-Text Search
Customize the Behavior of Word Breakers with a
Custom Dictionary
3/24/2017 • 1 min to read • Edit Online
You can customize the behavior of the word breaker for a particular language by creating a language-specific
custom dictionary file. For example, you can prevent the word breaker from breaking certain terms or patterns.
For more information, see the following SharePoint article:
Create a custom dictionary (SharePoint Server 2010)
For SQL Server, place custom dictionary files in the following folder:
C:\Program Files\Microsoft SQL Server\<instance name>\MSSQL\Binn
After creating or changing custom dictionary files, restart the SQL Full-text Daemon Launcher with the following
command:
exec sp_fulltext_service 'restart_all_fdhosts'
Configure and Manage Stopwords and Stoplists for
Full-Text Search
3/24/2017 • 3 min to read • Edit Online
To prevent a full-text index from becoming bloated, SQL Server has a mechanism that discards commonly
occurring strings that do not help the search. These discarded strings are called stopwords. During index creation,
the Full-Text Engine omits stopwords from the full-text index. This means that full-text queries will not search on
stopwords.
Stopwords. A stopword can be a word with meaning in a specific language. For example, in the English language,
words such as "a," "and," "is," and "the" are left out of the full-text index since they are known to be useless to a
search. A stopword can also be a token that does not have linguistic meaning.
Stoplists. Stopwords are managed in databases using objects called stoplists. A stoplist is a list of stopwords that,
when associated with a full-text index, is applied to full-text queries on that index.
Use an existing stoplist
You can use an existsing stoplist in the following ways:
Use the system-supplied stoplist in the database. SQL Server ships with a system stoplist that contains the
most commonly used stopwords for each supported language, that is for every language associated with
given word breakers by default. You can copy the system stoplist and customize your copy by adding and
removing stopwords.
The system stoplist is installed in the Resource database.
Use an existing custom stoplist from another database in the current server instance, then add or drop
stopwords as appropriate.
Create a new stoplist
Create a new stoplist with Transact-SQL
Use CREATE FULLTEXT STOPLIST.
Create a new stoplist with Management Studio
1. In Object Explorer, expand the server.
2. Expand Databases, and then expand the database in which you want to create the full-text stoplist.
3. Expand Storage, and then right-click Full-Text Stoplists.
4. Select New Full-Text Stoplist.
5. Enter your new stoplist's name.
6. Optionally, specify someone else as the stoplist owner.
7. Select one of the following create stoplist options:
Create an empty stoplist
Create from the system stoplist
Create from an existing full-text stoplist
For more information, see New Full-Text Stoplist (General Page).
8. Click OK.
Use a stoplist in full-text queries
To use a stoplist in queries, you must associate it with a full-text index. You can attach a stoplist to a full-text index
when you create the index, or you can alter the index later to add a stoplist.
Create a full-text index and associate a stoplist with it
Use CREATE FULLTEXT INDEX (Transact-SQL).
Associate or disassociate a stoplist with an existing full-text index
Use ALTER FULLTEXT INDEX (Transact-SQL).
Change the stopwords in a stoplist
Add or drop stopwords from a stoplist with Transact-SQL
Use ALTER FULLTEXT STOPLIST (Transact-SQL).
Add or drop stopwords from a stoplist with Management Studio
1. In Object Explorer, expand the server.
2. Expand Databases, and then expand the database.
3. Expand Storage, and then select Full Text Stoplists.
4. Right-click the stoplist whose properties you want to change, and select Properties.
5. In the Full-Text Stoplist Properties dialog box:
a. In the Action list box, select one of the following actions: Add stopword, Delete stopword,
Delete all stopwords, or Clear stoplist.
b. If the Stopword text box is enabled for the selected action, enter a single stopword. This stopword
must be unique; that is, not yet in this stoplist for the language that you select.
c. If the Full-text language list box is enabled for the selected action, select a language.
6. Click OK.
Manage stoplists and their usage
View all the stopwords in a stoplist
Use sys.fulltext_stopwords (Transact-SQL).
Get info about all the stoplists in the current database
Use sys.fulltext_stoplists (Transact-SQL) and sys.fulltext_stopwords (Transact-SQL).
View the tokenization result of a word breaker, thesaurus, and stoplist combination
Use sys.dm_fts_parser (Transact-SQL).
Suppress an error message if stopwords cause a Boolean operation on a full-text query to fail
Use the transform noise words Server Configuration Option.
More info about stopword position
Although it ignores the inclusion of stopwords, the full-text index does take into account their position. For
example, consider the phrase, "Instructions are applicable to these Adventure Works Cycles models". The
following table depicts the position of the words in the phrase:
WORD
POSITION
Instructions
1
are
2
applicable
3
to
4
these
5
Adventure
6
Works
7
Cycles
8
models
9
The stopwords "are", "to", and "these" that are in positions 2, 4, and 5 are left out of the full-text index. However,
their positional information is maintained, thereby leaving the position of the other words in the phrase
unaffected.
Upgrade noise words from SQL Server 2005
SQL Server 2005 noise words have been replaced by stopwords. When a database is upgraded from SQL Server
2005, the noise-word files are no longer used. However, the noise-word files are stored in the FTDATA\
FTNoiseThesaurusBak folder, and you can use them later when updating or building the corresponding stoplists.
For information about upgrading noise-word files to stoplists, see Upgrade Full-Text Search.
Configure and Manage Thesaurus Files for Full-Text
Search
4/7/2017 • 7 min to read • Edit Online
SQL Server Full-Text Search queries can search for synonyms of user-specified terms through the use of a Full-Text
Search thesaurus. Each thesaurus defines a set of synonyms for a specific language. By developing a thesaurus tailored
to your full-text data, you can effectively broaden the scope of full-text queries on that data.
Thesaurus matching occurs for all FREETEXT and FREETEXTABLE queries and for any CONTAINS and CONTAINSTABLE
queries that specify the FORMSOF THESAURUS clause.
A Full-Text Search thesaurus is an XML text file.
What's in a thesaurus
Before full-text search queries can look for synonyms in a given language, you have to define thesaurus mappings (that
is, synonyms) for that language. Each thesaurus must be manually configured to define the following:
Expansion set
An expansion set contains a group of synonyms such as "writer", "author", and "journalist" that are substituted
for one another by a full-text query. Queries that contain a match for any synonym in an expansion set are
expanded to include every other synonym in the expansion set.
For more information, see XML Structure of an Expansion Set later in this topic.
Replacement set
A replacement set contains a text pattern to be replaced by a substitution set. For an example, see the section
XML Structure of a Replacement Set later in this topic.
Diacritics setting
For a given thesaurus, all search patterns are either sensitive or insensitive to diacritical marks such as a tilde (~),
acute accent mark (´), or umlaut (¨) (that is, accent sensitive or accent insensitive). For example, suppose you
specify the pattern "café" to be replaced by other patterns in a full-text query. If the thesaurus is accentinsensitive, full-text search replaces the patterns "café" and "cafe". If the thesaurus is accent-sensitive, full-text
search replaces only the pattern "café". By default, a thesaurus is accent-insensitive.
Default thesaurus files
SQL Server provides a set of XML thesaurus files, one for each supported language. These files are essentially empty.
They contain only the top-level XML structure that is common to all SQL Server thesauruses and a commented-out
sample thesaurus.
Location of thesaurus files
The default location of the thesaurus files is:
<SQL_Server_data_files_path>\MSSQL13.MSSQLSERVER\MSSQL\FTDATA\
This default location contains the following files:
Language-specific thesaurus files
Setup installs empty thesaurus files in the above location. A separate file is provided for each supported
language. A system administrator can customize these files.
The default file names of the thesaurus files use following format:
'ts' + <three-letter language-abbreviation> + '.xml'
The name of the thesaurus file for a given language is specified in the registry in the following value:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<instance-name>\MSSearch\<language-abbrev>
The global thesaurus file
An empty global thesaurus file, tsGlobal.xml.
Change the location of a thesaurus file
You can change the location and names of a thesaurus file by changing its registry key. For each language, the location
of the thesaurus file is specified in the following value in the registry:
HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\<instance name>\MSSearch\Language\<language-abbreviation>\TsaurusFile
The global thesaurus file corresponds to the Neutral language with LCID 0. This value can be changed by administrators
only.
How full-text queries use the thesaurus
A thesaurus query uses both a language-specific thesaurus and the global thesaurus.
1. First, the query looks up the language-specific file and loads it for processing (unless it is already loaded). The query
is expanded to include the language-specific synonyms specified by the expansion set and replacement set rules in
the thesaurus file.
2. These steps are then repeated for the global thesaurus. However, if a term is already part of a match in the language
specific thesaurus file, the term is ineligible for matching in the global thesaurus.
Structure of a thesaurus file
Each thesaurus file defines an XML container whose ID is Microsoft Search Thesaurus , and a comment, <!-- … --> ,
that contains a sample thesaurus. The thesaurus is defined in a <thesaurus> element that contains samples of the child
elements that define the diacritics setting, expansion sets, and replacement sets.
A typical empty thesaurus file contains the following XML text:
<XML ID="Microsoft Search Thesaurus">
<!-- Commented out
<thesaurus xmlns="x-schema:tsSchema.xml">
<diacritics_sensitive>0</diacritics_sensitive>
<expansion>
<sub>Internet Explorer</sub>
<sub>IE</sub>
<sub>IE5</sub>
</expansion>
<replacement>
<pat>NT5</pat>
<pat>W2K</pat>
<sub>Windows 2012</sub>
</replacement>
<expansion>
<sub>run</sub>
<sub>jog</sub>
</expansion>
</thesaurus>
-->
</XML>
XML structure of an expansion set
Each expansion set is enclosed within an <expansion> element. Within this element, you specify one or more
substitutions in a <sub> element. In the expansion set, you can specify a group of substitutions that are synonyms of
each other.
For example, you can edit the expansion section to treat the substitutions "writer", "author", and "journalist" as
synonyms. full-text search queries that contain matches in one substitution are expanded to include all other
substitutions specified in the expansion set. Therefore, in the preceding example, when you issue a FORMS OF
THESAURUS or a FREETEXT query for the word "author", full-text search also returns search results containing the
words "writer" and "journalist".
This is what the expansion set section would look like for the above example:
<expansion>
<sub>writer</sub>
<sub>author</sub>
<sub>journalist</sub>
</expansion>
XML structure of a replacement set
Each replacement set is enclosed within a <replacement> element. Within this element you can specify one or more
patterns in a <pat> element and zero or more substitutions in <sub> elements, one per synonym. You can specify a
pattern to be replaced by a substitution set. Patterns and substitutions can contain a word, or a sequence of words. If
there is no substitution specified for a pattern, it has the effect of removing the pattern from the user query.
For example, suppose you want queries for "Win8", the pattern, to be replaced by "Windows Server 2012" or "Windows
8.0", the substitutions. If you run a full-text query for "Win8", full-text search only returns search results containing
"Windows Server 2012" or "Windows 8.0". It does not return results containing "Win8". This is because the pattern
"Win8" has been "replaced" by the patterns "Windows Server 2012" and "Windows 8.0".
This is what the replacement set section would look like for the above example:
<replacement>
<pat>Win8</pat>
<sub>Windows Server 2012</sub>
<sub>Windows 8.0</sub>
</replacement>
If you have two replacement sets with similar patterns being matched, the longer of the two takes precedence. For
example, if you run a FORMS OF THESAURUS query for "Internet Explorer online community" and you have the
following replacement sets, the "Internet Explorer" replacement set takes precedence over the "Internet" replacement
set. The query will therefore be processed as "IE online community" or "IE 9 online community".
<replacement>
<pat>Internet</pat>
<sub>intranet</sub>
</replacement>
and
<replacement>
<pat>Internet Explorer</pat>
<sub>IE</sub>
<sub>IE 9</sub>
</replacement>
XML structure of the diacritics setting
The diacritics setting of a thesaurus is specified in a single
integer value that controls accent sensitivity, as follows:
<diacritics_sensitive>
element. This element contains an
DIACRITICS SETTING
VALUE
XML
Accent insensitive
0
<diacritics_sensitive>0</diacritics_sensitive>
Accent sensitive
1
<diacritics_sensitive>1</diacritics_sensitive>
NOTE
This setting can only be applied one time in the file, and it applies to all search patterns in the file. This setting cannot be specified
for individual patterns.
Edit a thesaurus file
You can configure the thesaurus for a given language by editing its thesaurus file (an XML file). During setup, empty
thesaurus files that contain only the <xml> container and a commented-out sample <thesaurus > element are installed.
In order for full-text search queries that look for synonyms to work properly, you have to create an actual <thesaurus >
element that defines a set of synonyms. You can define two forms of synonyms, expansion sets and replacement sets.
Edit a thesaurus file
1. Open the thesaurus file in Notepad or another text editor.
2. If you are editing the thesaurus file for the first time, remove the following comment lines at the beginning and
end of the file, respectively:
<!--Commented out
-->
3. Add, modify, or delete a replacement set or an expansion set.
4. Save the file and close Notepad.
5. Use sp_fulltext_load_thesaurus_file to load the content of the thesaurus file into tempdb, specifying the local
identifier (LCID) that corresponds to the language of the thesaurus file. For example, for the English thesaurus
file, tsenu.xml, the corresponding LCID is 1033.
USE AdventureWorks;
EXEC sys.sp_fulltext_load_thesaurus_file 1033;
GO
Recommendations for editing thesaurus files
We recommend that entries in the thesaurus file contain no special characters. This is because word breakers have
subtle behaviors with respect to special characters. If a thesaurus entry contains any special characters, word breakers
used in combination with that entry can have subtle behavioral implications for a full-text query.
We recommend that <sub> entries contain no stopwords since stopwords are omitted from the full-text index. Queries
are expanded to include the <sub> entries from a thesaurus file, and if a <sub> entry contains stopwords, query size
increases unnecessarily.
Restrictions for editing thesaurus files
The following restrictions apply to editing a thesaurus file:
Only system administrators can update, modify, or delete thesaurus files.
When editing thesaurus files using text editor tools, the files must be saved in Unicode format, and Byte Order
Marks must be specified.
Thesaurus entries cannot be empty or word break to an empty string.
Phrases in the thesaurus file must be no longer than 512 characters.
A thesaurus must not contain any duplicate entries among the
elements of replacement sets.
See Also
CONTAINS (Transact-SQL)
CONTAINSTABLE (Transact-SQL)
FREETEXT (Transact-SQL)
FREETEXTTABLE (Transact-SQL)
sp_fulltext_load_thesaurus_file (Transact-SQL)
sys.dm_fts_parser (Transact-SQL)
<sub>
entries of expansion sets and the
<pat>
Manage and Monitor Full-Text Search for a Server
Instance
3/24/2017 • 3 min to read • Edit Online
Full-text administration for a server instance includes:
System management tasks such as managing the FDHOST Launcher service (MSSQLFDLauncher), restarting
filter daemon host process if you change the service account credentials, configuring server-wide full-text
properties, and backing up full-text catalogs. At the server level, for example, you can specify a default fulltext language that differs from the default language of the server instance as a whole.
Configuring full-text linguistic components (word breakers and stemmers, thesaurus file, and stopwords
and stoplists).
Configuring a user database for full-text search. This involves creating one or more full-text catalogs for the
database and defining a full-text index on each table or indexed view on which you want to execute full-text
queries.
Viewing or Changing Server Properties for Full-Text Search
You can view the full-text properties of an instance of SQL Server in SQL Server Management Studio.
To view and change server properties for full-text search
1. In Object Explorer, right-click a server, and then click Properties.
2. In the Server Properties dialog box, click the Advanced page to view server information about full-text
search. The full-text properties are as follows:
Default Full-Text Language
Specifies a default language for full-text indexed columns. Linguistic analysis of full-text indexed data
is dependent on the language of the data. The default value of this option is the language of the
server. For the language that corresponds to the displayed setting, see sys.fulltext_languages
(Transact-SQL).
Full-Text Upgrade Option
This server property controls how full-text indexes are migrated when upgrading a database from
SQL Server 2005 to a later version. This property applies to upgrading by attaching a database,
restoring a database backup, restoring a file backup, or copying the database by using the Copy
Database Wizard.
The alternatives are as follows:
Import
Full-text catalogs are imported. Typically, import is significantly faster than rebuild. For example,
when using only one CPU, import runs about 10 times faster than rebuild. However, an imported fulltext catalog does not use the new and enhanced word breakers introduced in SQL Server 2008, so
you might want to rebuild your full-text catalogs eventually.
NOTE
Rebuild can run in multi-threaded mode, and if more than 10 CPUs are available, rebuild might run faster
than import if you allow rebuild to use all of the CPUs.
If a full-text catalog is not available, the associated full-text indexes are rebuilt. This option is available
for only SQL Server 2005 databases.
Rebuild
Full-text catalogs are rebuilt using the new and enhanced word breakers. Rebuilding indexes can take
awhile, and a significant amount of CPU and memory might be required after the upgrade.
Reset
Full-text catalogs are reset. SQL Server 2005 full-text catalog files are removed, but the metadata for
full-text catalogs and full-text indexes is retained. After being upgraded, all full-text indexes are
disabled for change tracking and crawls are not started automatically. The catalog will remain empty
until you manually issue a full population, after the upgrade completes.
For information about choosing a full-text upgrade option, see full-Upgrade Full-Text Search.
NOTE
The full-text upgrade option can also be set by using the sp_fulltext_serviceupgrade_option action.
Viewing Additional Full-Text Server Properties
Transact-SQL functions can be used to obtain the value of various server-level properties of full-text search. This
information is useful for administrating and troubleshooting full-text search.
The following table lists full-text properties of a SQL Server server instance and their related Transact-SQL
functions.
PROPERTY
DESCRIPTION
FUNCTION
IsFullTextInstalled
Whether the full-text component is
installed with the current instance of
SQL Server.
FULLTEXTSERVICEPROPERTY
LoadOSResources
Whether operating system word
breakers and filters are registered and
used with this instance of SQL Server.
FULLTEXTSERVICEPROPERTY
VerifySignature
Specifies whether only signed binaries
are loaded by the Full-Text Engine.
FULLTEXTSERVICEPROPERTY
SERVERPROPERTY
Monitoring Full-Text Search Activity
Several dynamic management views and functions are useful monitoring full-text search activity on a server
instance.
To view information about the full-text catalogs with in-progress population activity
sys.dm_fts_active_catalogs (Transact-SQL)
To view current activity of a filter daemon host process
sys.dm_fts_fdhosts (Transact-SQL)
To view information about in-progress index populations
sys.dm_fts_index_population (Transact-SQL)
To view memory buffers in a memory pool that are used as part of a crawl or crawl range.
sys.dm_fts_memory_buffers (Transact-SQL)
To view the shared memory pools available to the full-text gatherer component for a full-text
crawl or a full-text crawl range
sys.dm_fts_memory_pools (Transact-SQL)
To view information about each full-text indexing batch
sys.dm_fts_outstanding_batches (Transact-SQL)
To view information about the specific ranges related to an in-progress population
sys.dm_fts_population_ranges (Transact-SQL)
Set the Service Account for the Full-text Filter
Daemon Launcher
3/24/2017 • 3 min to read • Edit Online
This topic describes how to set or change the service account for the SQL Full-text Filter Daemon Launcher service
(MSSQLFDLauncher) by using SQL Server Configuration Manager. The default service account used by SQL Server
setup is NT Service\MSSQLFDLauncher .
About the SQL Full-text Filter Daemon Launcher service
The SQL Full-text Filter Daemon Launcher service is used by SQL Server Full-Text Search to start the filter daemon
host process, which handles full-text search filtering and word breaking. The Launcher service must be running to
use full-text search.
The SQL Full-text Filter Daemon Launcher service is an instance-aware service that is associated with a specific
instance of SQL Server. The SQL Full-text Filter Daemon Launcher service propagates the service account
information to each filter daemon host process that it launches.
Set the service account
1. On the Start menu, point to All Programs, expand Microsoft SQL Server 2016, and then click SQL Server
2016 Configuration Manager.
2. In SQL Server Configuration Manager, click SQL Server Services, right-click SQL Full-text Filter
Daemon Launcher (instance name), and then click Properties.
3. Click the Log On tab of the dialog box, and then select or enter the account under which to run the
processes that the SQL Full-text Filter Daemon Launcher service starts.
4. After you close the dialog box, click Restart to restart the SQL Full-text Filter Daemon Launcher service.
Troubleshoot the SQL Full-text Filter Daemon Launcher service if it
doesn't start
If the SQL Full-text Filter Daemon Launcher service doesn't start, review the following possible causes:
Permissions issues
The SQL Server service group does not have permission to start SQL Full-text Filter Daemon Launcher
service.
Make sure the SQL Server service group has permissions to the SQL Full-text Filter Daemon Launcher
service account. During the installation of SQL Server, the SQL Server service group is granted default
permission to manage, query, and start the SQL Full-text Filter Daemon Launcher service. If SQL Server
service group permissions to the SQL Full-text Filter Daemon Launcher service account have been removed
after SQL Server installation, the SQL Full-text Filter Daemon Launcher service will not start, and full-text
search will be disabled.
The account used to log in to the service does not have privileges.
You may be using an account that does not have login privileges on the computer where the server instance
is installed. Verify that you are logging in with an account that has User rights and permissions on the local
computer.
Service account and password issues
The user account or password of the service account is incorrect.
In SQL Server 2016 Configuration Manager, make sure the service is using the correct service account and
password.
The password associated with the SQL Full-text Filter Daemon Launcher service account has expired.
If you use a local user account for the SQL Full-text Filter Daemon Launcher service and the password
expires, you have to:
1. Set a new Windows password for the account.
2. In SQL Server 2016 Configuration Manager, update the SQL Full-text Filter Daemon Launcher service
to use the new password.
Named pipes configuration issues
The SQL Full-text Filter Daemon Launcher service is not configured correctly.
If named pipes functionality has been disabled on the local computer, or if SQL Server has been configured
to use a named pipe other than the default named pipe, the SQL Full-text Filter Daemon Launcher service
might not start.
Another instance of the same named pipe is already running.
The SQL Server service acts as a named pipe server for the SQL Full-text Filter Daemon Launcher service
client. If the named pipe was already created by another process before SQL Server starts, an error will be
logged in the SQL Server error log and the Windows Event Log, and full-text search will not be available.
Determine what process or application is attempting to use the same named pipe and stop the application.
See Also
Managing Services How-to Topics (SQL Server Configuration Manager)
Upgrade Full-Text Search
Upgrade Full-Text Search
3/24/2017 • 11 min to read • Edit Online
Upgrading full-text search to SQL Server 2016 is done during setup and when database files and full-text catalogs
from the earlier version of SQL Server are attached, restored, or copied using the Copy Database Wizard.
Upgrade a server instance
For an in-place upgrade, an instance of SQL Server 2016 is set up side-by-side with the old version of SQL Server,
and data is migrated. If the old version of SQL Server had full-text search installed, a new version of full-text
search is automatically installed. Side-by-side install means that each of the following components exists at the
instance-level of SQL Server.
Word breakers, stemmers, and filters
Each instance now uses its own set of word breakers, stemmers, and filters, rather than relying on the operating
system version of these components. These components are also easier to register and configure at a perinstance level. For more information, see Configure and Manage Word Breakers and Stemmers for Search and
Configure and Manage Filters for Search.
Filter daemon host
The full-text filter daemon hosts are processes that safely load and drive extensible external components used for
index and query, such as word breakers, stemmers, and filters, without compromising the integrity of the Full-Text
Engine. A server instance uses a multithreaded process for all multithreaded filters and a single-threaded process
for all single-threaded filters.
NOTE
SQL Server 2008 introduced a service account for the FDHOST Launcher service (MSSQLFDLauncher). This service
propagates the service account information to the filter daemon host processes of a specific instance of SQL Server. For
information about setting the service account, see Set the Service Account for the Full-text Filter Daemon Launcher.
In SQL Server 2005, each full-text index resides in a full-text catalog that belongs to a filegroup, has a physical
path, and is treated as a database file. In SQL Server 2008 and later versions, a full-text catalog is a logical or
virtual object that contains a group of full-text indexes. Therefore, a new full-text catalog is not treated as a
database file with a physical path. However, during upgrade of any full-text catalog that contains data files, a new
filegroup is created on same disk. This maintains the old disk I/O behavior after upgrade. Any full-text index from
that catalog is placed in the new filegroup if the root path exists. If the old full-text catalog path is invalid, the
upgrade keeps the full-text index in the same filegroup as the base table or, for a partitioned table, in the primary
filegroup.
Full-text upgrade options
When upgrading a server instance to SQL Server 2016, the user interface allows you to choose one of the
following full-text upgrade options.
Import
Full-text catalogs are imported. Typically, import is significantly faster than rebuild. For example, when using only
one CPU, import runs about 10 times faster than rebuild. However, an imported full-text catalog does not use the
new word breakers installed with the latest version of SQL Server. To ensure consistency in query results, full-text
catalogs have to be rebuilt.
NOTE
Rebuild can run in multi-threaded mode, and if more than 10 CPUs are available, rebuild might run faster than import if
you allow rebuild to use all of the CPUs.
If a full-text catalog is not available, the associated full-text indexes are rebuilt. This option is available for only
SQL Server 2005 databases.
For information about the impact of importing full-text index, see "Considerations for Choosing a Full-Text
Upgrade Option," later in this topic.
Rebuild
Full-text catalogs are rebuilt using the new and enhanced word breakers. Rebuilding indexes can take a while, and
a significant amount of CPU and memory might be required after the upgrade.
Reset
Full-text catalogs are reset. When upgrading from SQL Server 2005, full-text catalog files are removed, but the
metadata for full-text catalogs and full-text indexes is retained. After being upgraded, all full-text indexes are
disabled for change tracking and crawls are not started automatically. The catalog will remain empty until you
manually issue a full population, after the upgrade completes.
Considerations for choosing a full-text upgrade option
When choosing the upgrade option for your upgrade, consider the following:
Do you require consistency in query results?
SQL Server 2016 installs new word breakers for use by Full-Text and Semantic Search. The word breakers
are used both at indexing time and at query time. If you do not rebuild the full-text catalogs, your search
results may be inconsistent. If you issue a full-text query that looks for a phrase that is broken differently
by the word breaker in a previous version of SQL Server and the current word breaker, a document or row
containing the phrase might not be retrieved. This is because the indexed phrases were broken using
different logic than the query is using. The solution is to repopulate (rebuild) the full-text catalogs with the
new word breakers so that index time and query time behavior are identical. You can choose the Rebuild
option to accomplish this, or you can rebuild manually after choosing the Import option.
Were any full-text indexes built on integer full-text key columns?
Rebuilding performs internal optimizations that improve the query performance of the upgraded full-text
index in some cases. Specifically, if you have full-text catalogs that contain full-text indexes for which the
full-text key column of the base table is an integer data type, rebuilding achieves ideal performance of fulltext queries after upgrade. In this case, we highly recommend you to use the Rebuild option.
NOTE
For full-text indexes in SQL Server 2016, we recommend that the column serving as the full-text key be an integer
data type. For more information, see Improve the Performance of Full-Text Indexes.
What is the priority for getting your server instance online?
Importing or rebuilding during upgrade takes a lot of CPU resources, which delays getting the rest of the
server instance upgraded and online. If getting the server instance online as soon as possible is important
and if you are willing to run a manual population after the upgrade, Reset is suitable.
Ensure consistent query results after importing a full-text index
If a full-text catalog was imported when upgrading a SQL Server 2005 database to SQL Server 2016, mismatches
between the query and the full-text index content might occur because of differences in the behavior of the old
and new word breakers. In this case, to guarantee a total match between queries and the full-text index content,
choose one of the following options:
Rebuild the full-text catalog that contains the full-text index (ALTER FULLTEXT CATALOGcatalog_name
REBUILD)
Issue a FULL POPULATION on the full-text index (ALTER FULLTEXT INDEX ON table_name START FULL
POPULATION).
For more information about word breakers, see Configure and Manage Word Breakers and Stemmers for
Search.
Upgrade noise-word files to stoplists
When a database is upgraded to SQL Server 2016 from SQL Server 2005, the noise-word files are no longer used.
However, the old noise-word files are stored in the FTDATA\ FTNoiseThesaurusBak folder, and you can use them
later when updating or building the corresponding SQL Server 2016 stoplists.
After upgrading from SQL Server 2005:
If you never added, modified, or deleted any noise-word files in your installation of SQL Server 2005, the
system stoplist should meet your needs.
If your noise-word files were modified in SQL Server 2005, those modifications are lost during upgrade. To
re-create those updates, you must manually recreate those modifications in the corresponding SQL Server
2008 stoplist. For more information, see ALTER FULLTEXT STOPLIST (Transact-SQL).
If you do not want to apply any stopwords to your full-text indexes (for example, if you deleted or erased
your noise-word files in your SQL Server 2005 installation), you must turn off the stoplist for each
upgraded full-text index. Run the following Transact-SQL statement (replacing database with the name of
the upgraded database and table with the name of the table):
Use database;
ALTER FULLTEXT INDEX ON table
SET STOPLIST OFF;
GO
The STOPLIST OFF clause removes stop-word filtering, and it will trigger a population of the table, without
filtering any words considered to be noise.
Backup and imported full-text catalogs
For full-text catalogs that are rebuilt or reset during upgrade (and for new full-text catalogs), the fulltext catalog is
a logical concept and does not reside in a filegroup. Therefore, to back up a full-text catalog in SQL Server 2016,
you must identify every filegroup that contains a full-text index of the catalog and back each of them up, one by
one. For more information, see Back Up and Restore Full-Text Catalogs and Indexes.
For full-text catalogs that have been imported from SQL Server 2005, the full-text catalog is still a database file in
its own filegroup. The SQL Server 2005 backup process for full-text catalogs still applies except that the
MSFTESQL service does not exist in SQL Server 2016. For information about the SQL Server 2005 process, see
Backing Up and Restoring Full-Text Catalogs in SQL Server 2005 Books Online.
Migrating full-text indexes when upgrading a database to SQL Server
2016
Database files and full-text catalogs from a previous version of SQL Server can be upgraded to an existing SQL
Server 2016 server instance by using attach, restore, or the Copy Database Wizard. SQL Server 2005 full-text
indexes, if any, are either imported, reset, or rebuilt. The upgrade_option server property controls which full-text
upgrade option the server instance uses during these database upgrades.
After you attach, restore, or copy any SQL Server 2005 database to SQL Server 2016, the database becomes
available immediately and is then automatically upgraded. Depending the amount of data being indexed,
importing can take several hours, and rebuilding can take up to ten times longer. Note also that when the
upgrade option is set to import, if a full-text catalog is not available, the associated full-text indexes are rebuilt.
To change full-text upgrade behavior on a server instance
Transact-SQL: Use the upgrade_option action of sp_fulltext_service
SQL Server Management Studio : Use the Full-Text Upgrade Option of the Server Properties dialog
box. For more information, see Manage and Monitor Full-Text Search for a Server Instance.
Considerations for Restoring a SQL Server 2005 Full-Text Catalog to
SQL Server 2016
One method of upgrading fulltext data from a SQL Server 2005 database to SQL Server 2016 is to restore a full
database backup to SQL Server 2016.
While importing a SQL Server 2005 full-text catalog, you can back up and restore the database and the catalog
file. The behavior is the same as in SQL Server 2005:
The full database backup will include the full-text catalog. To refer to the full-text catalog, use its SQL Server
2005 file name, sysft_+catalog-name.
If the full-text catalog is offline, the backup will fail.
For more information about backing up and restoring SQL Server 2005 full-text catalogs, see Backing Up
and Restoring Full-Text Catalogs and File Backup and Restore and Full-Text Catalogsin SQL Server 2005
Books Online.
When the database is restored on SQL Server 2016, a new database file will be created for the full-text
catalog. The default name of this file is ftrow_catalog-name.ndf. For example, if you catalog-name is cat1 ,
the default name of the SQL Server 2016 database file would be ftrow_cat1.ndf . But if the default name is
already being used in the target directory, the new database file would be named ftrow_ catalog-name {
GUID }.ndf , where GUID is the Globally Unique Identifier of the new file.
After the catalogs have been imported, the sys.database_files and sys.master_files are updated to
remove the catalog entries and the path column in sys.fulltext_catalogs is set to NULL.
To back up a database
Full Database Backups (SQL Server)
Transaction Log Backups (SQL Server) (full recovery model only)
To restore a database backup
Complete Database Restores (Simple Recovery Model)
Complete Database Restores (Full Recovery Model)
Example
The following example uses the MOVE clause in the RESTORE statement, to restore a SQL Server 2005 database
named ftdb1 . The SQL Server 2005 database, log, and catalog files are moved to new locations on the SQL
Server 2016 server instance, as follows:
The database file,
ftdb1.mdf
, is moved to
C:\Program Files\Microsoft SQL Server\MSSQL.1MSSQL13.MSSQLSERVER\MSSQL\DATA\ftdb1.mdf
The log file,
ftdb1_log.ldf
\ftdb1_log.ldf
, is moved to a log directory on your log disk drive, log_drive
.
:\
log_directory
.
The catalog files that correspond to the sysft_cat90 catalog are moved to C:\temp . After the full-text
indexes are imported, they will automatically be placed in a database file, C:\ftrow_sysft_cat90.ndf, and the
C:\temp will be deleted.
RESTORE DATABASE [ftdb1] FROM DISK = N'C:\temp\ftdb1.bak' WITH FILE = 1,
MOVE N'ftdb1' TO N'C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\ftdb1.mdf',
MOVE N'ftdb1_log' TO N'log_drive:\log_directory\ftdb1_log.ldf',
MOVE N'sysft_cat90' TO N'C:\temp';
Attaching a SQL Server 2005 database to SQL Server 2016
In SQL Server 2008 and later versions, a full-text catalog is a logical concept that refers to a group of full-text
indexes. The full-text catalog is a virtual object that does not belong to any filegroup. However, when you attach a
SQL Server 2005 database that contains full-text catalog files onto a SQL Server 2016 server instance, the catalog
files are attached from their previous location along with the other database files, the same as in SQL Server
2005.
The state of each attached full-text catalog on SQL Server 2016 is the same as when the database was detached
from SQL Server 2005. If any full-text index population was suspended by the detach operation, the population is
resumed on SQL Server 2016, and the full-text index becomes available for full-text search.
If SQL Server 2016 cannot find a full-text catalog file or if the full-text file was moved during the attach operation
without specifying a new location, the behavior depends on the selected full-text upgrade option. If the full-text
upgrade option is Import or Rebuild, the attached full-text catalog is rebuilt. If the full-text upgrade option is
Reset, the attached full-text catalog is reset.
For more information about detaching and attaching a database, see Database Detach and Attach (SQL Server),
CREATE DATABASE (SQL Server Transact-SQL), sp_attach_db, and sp_detach_db (Transact-SQL).
See also
Get Started with Full-Text Search
Configure and Manage Word Breakers and Stemmers for Search
Configure and Manage Filters for Search
Full-Text Search DDL, Functions, Stored Procedures,
and Views
3/24/2017 • 1 min to read • Edit Online
Lists the Transact-SQL statements and the SQL Server database objects that support full-text search, including the
property search feature.
This list does not include deprecated objects.
For the list of database objects that support semantic search, see Semantic Search DDL, Functions, Stored
Procedures, and Views.
Transact-SQL Data Definition Language (DDL) Statements
CREATE FULLTEXT CATALOG (Transact-SQL)
CREATE FULLTEXT INDEX (Transact-SQL)
CREATE FULLTEXT STOPLIST (Transact-SQL)
CREATE SEARCH PROPERTY LIST (Transact-SQL)
ALTER FULLTEXT CATALOG (Transact-SQL)
ALTER FULLTEXT INDEX (Transact-SQL)
ALTER FULLTEXT STOPLIST (Transact-SQL)
ALTER SEARCH PROPERTY LIST (Transact-SQL)
DROP FULLTEXT CATALOG (Transact-SQL)
DROP FULLTEXT INDEX (Transact-SQL)
DROP FULLTEXT STOPLIST (Transact-SQL)
DROP SEARCH PROPERTY LIST (Transact-SQL)
System Predicates and Functions
CONTAINS (Transact-SQL)
CONTAINSTABLE (Transact-SQL)
FREETEXT (Transact-SQL)
FREETEXTTABLE (Transact-SQL)
System Metadata Functions
COLUMNPROPERTY (Transact-SQL)
FULLTEXTCATALOGPROPERTY (Transact-SQL)
FULLTEXTSERVICEPROPERTY (Transact-SQL)
INDEXPROPERTY (Transact-SQL)
OBJECTPROPERTY (Transact-SQL)
OBJECTPROPERTYEX (Transact-SQL)
SERVERPROPERTY (Transact-SQL)
System Stored Procedures
sp_fulltext_keymappings (Transact-SQL)
sp_fulltext_load_thesaurus_file (Transact-SQL)
sp_fulltext_pendingchanges (Transact-SQL)
sp_fulltext_service (Transact-SQL)
sp_help_fulltext_system_components (Transact-SQL)
System Views – Catalog Views
sys.fulltext_catalogs (Transact-SQL)
sys.fulltext_document_types (Transact-SQL)
sys.fulltext_index_catalog_usages (Transact-SQL)
sys.fulltext_index_columns (Transact-SQL)
sys.fulltext_index_fragments (Transact-SQL)
sys.fulltext_indexes (Transact-SQL)
sys.fulltext_languages (Transact-SQL)
sys.fulltext_stoplists (Transact-SQL)
sys.fulltext_stopwords (Transact-SQL)
sys.fulltext_system_stopwords (Transact-SQL)
sys.registered_search_properties (Transact-SQL)
sys.registered_search_property_lists (Transact-SQL)
System Views – Dynamic Management Views
sys.dm_fts_active_catalogs (Transact-SQL)
sys.dm_fts_fdhosts (Transact-SQL)
sys.dm_fts_index_keywords (Transact-SQL)
sys.dm_fts_index_keywords_by_document (Transact-SQL)
sys.dm_fts_index_keywords_by_property (Transact-SQL)
sys.dm_fts_index_population (Transact-SQL)
sys.dm_fts_memory_buffers (Transact-SQL)
sys.dm_fts_memory_pools (Transact-SQL)
sys.dm_fts_outstanding_batches (Transact-SQL)
sys.dm_fts_parser (Transact-SQL)
sys.dm_fts_population_ranges (Transact-SQL)
Use the Full-Text Indexing Wizard
3/30/2017 • 4 min to read • Edit Online
The Full-Text Indexing Wizard in SSMS walks you through a series of steps designed to help you create a full-text
index.
Create a Full-Text index
1. In Object Explorer, right-click the table on which you want to create a full-text index, point to Full-Text
index, and then click Define Full-Text Index. This action launches the Wizard in a separate window. Click
Next
2. Unique Index. Select an index from the drop down list. The index must be a single-key-column, unique,
non-nullable index. Select the smallest unique key index for the full-text unique key. For best performance, a
clustered index is recommended.
3. Available Columns. Check the box next to all column names for columns you want to include. check box
next to the column name. Ineligible columns are greyed out and their check boxes disabled.
4. Language for Word Breaker. Select a language from the drop-down list. This choice will be used to
identify the correct word breakers for the index. SQL Server uses word breakers to identify word boundaries
in the full-text indexed data.
5. Type Column. Select the name of the column that holds the document type of column being full-text
indexed.
NOTE: The Type Column is enabled only when the column named in the Available Columns column
is of type varbinary(max) or image.
6. Statistical Semantics. Select whether to enable semantic indexing for the selected column. For more
information, see Semantic Search (SQL Server).
NOTES
If your selected language does not have an associated Semantic Language Model, then the Statistical
Semantics checkbox is not enabled. If you select Statistical Semantics prior to selecting a Language, the
languages available in the drop-down combo box will be restricted to those for which there is Semantic
Language Model support.
Semantic Search is not available for Azure SQL Database. The Statistical Semantics option does not appear
when running this Wizard on an Azure SQL Database.
1. Select the change tracking options.
Automatically
Select this radio button to have the full-text index updated automatically as changes occur to the underlying
data.
Manually
Select this radio button if you do not want the full-text index to be updated automatically as changes occur
to the underlying data. Changes to the underlying data are maintained. However, to apply the changes to
the full-text index you must start or schedule this process manually.
Do not track changes
Select this radio button if you do not want the full-text index to be updated with changes to the underlying
data.
2. Start full population when index is created (Available only when you Do not track changes).
Select this radio button to kick off a full population at the successful completion of this wizard. This will
consist of creating the full-text index structure in the catalog and populating it with full-text indexed data.
Click Next
Catalog, Index Filegroup and Stoplist
1. Select full-text catalog
Select a catalog: Select a full-text catalog from the list. The default catalog for the database will be the
selected item by default in the list. If no catalogs are available, the list will be disabled, and the Create a new
catalog checkbox will be checked and disabled.
OR
a. Create a new catalog
b. Select full-text catalog.
a. Name
Enter a name for your new full-text catalog.
b. Set as default catalog
Select to make the catalog the default for this database.
c. Accent sensitivity
Specify whether the new catalog will be accent-sensitive or accent-insensitive. If the database is accentsensitive, Sensitive is selected by default.
d. Select index filegroup
Specify the filegroup on which to create the full-text index.
e. Select a value:
VALUE
DESCRIPTION
If the table or view is not partitioned, select to use the
same filegroup as the underlying table or view. If the table
or view is partitioned, the primary filegroup is used
PRIMARY
Select to use the primary filegroup for the new full-text
index.
user-specified default filegroup
If a user-defined default stoplist exists, select its name
from the list to use that filegroup for the new full-text
index.
a. Select full-text stoplist
Specify a stoplist to use for the full-text index, or disable stoplist use.
Stopwords are managed in databases using objects called stoplists. A stoplist is a list of stopwords
that, when associated with a full-text index, is applied to full-text queries on that index. For more
information, see Configure and Manage Stopwords and Stoplists for Full-Text Search.
Select one of the following values:
VALUE
DESCRIPTION
Select to use the system stoplist on the new full-text
index. This is the default
Select to disable stoplists for the new full-text index.
user-defined-stoplist-name
The list displays the name of each user-defined stoplist, if
any, that has been created on the database. Select any
user-defined stoplist to use for the new full-text index.
Click Next
2. Optionally, SQL Server only, define the population schedule. Indexing operations will begin immediately
unless they have been scheduled for future execution. Schedules will be created immediately, although they
will not run until their scheduled time.
New Table Schedule
Define a population schedule for a table.
New Catalog Schedule
Define a population schedule for a full-text catalog.
Edit
Edit a schedule.
Delete
Delete a schedule.
3. View or control the progress of the Full-Text Indexing Wizard.
Stop
Interrupts the current operation and prevents subsequent full-text operations from being performed by the
wizard during this session.
Report
When all of the operations have finished executing, click this button to access a report on the operations
performed. You can view the report, print it to a file, copy it to the clipboard, or e-mail the report.
Deprecated Full-Text Search Features in SQL Server
2016
3/30/2017 • 1 min to read • Edit Online
This topic describes the deprecated full-text search features still available in SQL Server. These features are
scheduled to be removed in a future release. Do not use deprecated features in new applications.
Monitor your use of deprecated features by using the SQL Server:Deprecated Features object performance
counter and trace events. For more information, see Use SQL Server Objects.
Features no longer supported
DEPRECATED FEATURE
REPLACEMENT
FEATURE NAME
FEATURE ID
FULLTEXTCATALOGPROPER
TY property: LogSize
None.
FULLTEXTCATALOGPROPER
TY('LogSize')
211
FULLTEXTSERVICEPROPERTY
property:
None.
FULLTEXTSERVICEPROPERTY
('ConnectTimeout')
210
209
ConnectTimeout
FULLTEXTSERVICEPROPERTY
('DataTimeout')
DataTimeout
sp_fulltext_catalog
CREATE FULL CATALOG
sp_fulltext_catalog
84
ALTER FULLTEXT CATALOG
DROP FULLTEXT CATALOG
sp_fulltext_column
CREATE FULL INDEX
sp_fulltext_column
86
sp_fulltext_database
ALTER FULLTEXT INDEX
sp_fulltext_database
87
sp_fulltext_table
DROP FULLTEXT INDEX
sp_fulltext_table
85
sp_help_fulltext_catalogs
sys.fulltext_catalogs
sp_help_fulltext_catalogs
88
sp_help_fulltext_catalog_com
ponents
sys.fulltext_index_columns
sp_help_fulltext_catalog_com
ponents
203
sys.fulltext_indexes
90
sp_help_fulltext_catalogs_cur
sor
sp_help_fulltext_catalogs_cur
sor
92
sp_help_fulltext_columns
sp_help_fulltext_columns
93
sp_help_fulltext_columns_cur
sor
sp_help_fulltext_columns_cur
sor
91
sp_help_fulltext_tables
sp_help_fulltext_table
sp_help_fulltext_tables_curso
r
sp_help_fulltext_tables_curso
r
89
DEPRECATED FEATURE
REPLACEMENT
FEATURE NAME
FEATURE ID
sp_fulltext_service action
values: clean_up,
connect_timeout, and
data_timeout return zero
None
sp_fulltext_service
@action=clean_up<br />
sp_fulltext_service
@action=connect_timeout<
br />
sp_fulltext_service
@action=data_timeout
116
dm_fts_active_catalogs.is_pa
used
218
sys.dm_fts_active_catalogs
columns:
None.
117
118
221
is_paused
dm_fts_active_catalogs.previ
ous_status
222
previous_status
dm_fts_active_catalogs.previ
ous_status_description
previous_status_description
224
219
row_count_in_thousands
dm_fts_active_catalogs.row_c
ount_in_thousands
220
dm_fts_active_catalogs.status
223
status
status_description
dm_fts_active_catalogs.status
_description
worker_count
dm_fts_active_catalogs.worke
r_count
sys.dm_fts_memory_buffers
column:
None.
dm_fts_memory_buffers.row_
count
225
None.
fulltext_catalogs.path
215
fulltext_catalogs.data_space_i
d
216
row_count
sys.fulltext_catalogs columns:
path
data_space_id
217
fulltext_catalogs.file_id
file_id columns
Features Not Supported in a Future Version of SQL Server
The following full-text search features are supported in the next version of SQL Server, but will be removed in a
later version. The specific version of SQL Server has not been determined.
The Feature name value appears in trace events as the ObjectName and in performance counters and
sys.dm_os_performance_counters as the instance name. The Feature ID value appears in trace events as the
ObjectId.
DEPRECATED FEATURE
REPLACEMENT
FEATURE NAME
FEATURE ID
CONTAINS and
CONTAINSTABLE generic
NEAR operator:
The custom NEAR operator:
FULLTEXT_OLD_NEAR_SYNT
AX
247
{|}
{ { | } [ ,…n ]
{
| ( { | } [,…n] )
{ { NEAR | ~ } { | } } [...n]
[, [,] ]
}
}
CREATE FULLTEXT CATLOG
IN PATH
237
NEAR(
)
::= {integer | MAX}
::= {TRUE | FALSE}
CREATE FULLTEXT CATALOG
option:
None.
None.*
None.*
IN PATH 'rootpath'
ON FILEGROUP filegroup
DATABASEPROPERTYEX
property: IsFullTextEnabled
None.
DATABASEPROPERTYEX('IsF
ullTextEnabled')
202
sp_detach_db option:
None.
sp_detach_db
@keepfulltextindexfile
226
None
sp_fulltext_service
@action=resource_usage
200
[ @keepfulltextindexfile = ]
'KeepFulltextIndexFile'
sp_fulltext_service action
values: resource_usage has
no function.
The **SQL Server:Deprecated Features* object does not monitor occurrences of CREATE FULLTEXT CATLOG ON
FILEGROUP filegroup.
Semantic Search (SQL Server)
3/24/2017 • 3 min to read • Edit Online
Statistical Semantic Search provides deep insight into unstructured documents stored in SQL Server databases by
extracting and indexing statistically relevant key phrases. Then it uses these key phrases to identify and index
documents that are similar or related.
What can you do with Semantic Search?
Semantic search builds upon the existing full-text search feature in SQL Server, but enables new scenarios that
extend beyond keyword searches. While full-text search lets you query the words in a document, semantic search
lets you query the meaning of the document. Solutions that are now possible include automatic tag extraction,
related content discovery, and hierarchical navigation across similar content. For example, you can query the index
of key phrases to build the taxonomy for an organization, or for a corpus of documents. Or, you can query the
document similarity index to identify resumes that match a job description.
The following examples demonstrate the capabilities of Semantic Search. At the same time these examples
demonstrate the three Transact-SQL rowset functions that you use to query the semantic indexes and retrieve the
results as structured data.
Find the key phrases in a document
The following query gets the key phrases that were identified in the sample document. It presents the results in
descending order by the score that ranks the statistical significance of each key phrase.
This query calls the semantickeyphrasetable function.
SET @Title = 'Sample Document.docx'
SELECT @DocID = DocumentID
FROM Documents
WHERE DocumentTitle = @Title
SELECT @Title AS Title, keyphrase, score
FROM SEMANTICKEYPHRASETABLE(Documents, *, @DocID)
ORDER BY score DESC
Find similar or related documents
The following query gets the documents that were identified as similar or related to the sample document. It
presents the results in descending order by the score that ranks the similarity of the two documents.
This query calls the semanticsimilaritytable function.
SET @Title = 'Sample Document.docx'
SELECT @DocID = DocumentID
FROM Documents
WHERE DocumentTitle = @Title
SELECT @Title AS SourceTitle, DocumentTitle AS MatchedTitle,
DocumentID, score
FROM SEMANTICSIMILARITYTABLE(Documents, *, @DocID)
INNER JOIN Documents ON DocumentID = matched_document_key
ORDER BY score DESC
Find the key phrases that make documents similar or related
The following query gets the key phrases that make the two sample documents similar or related to one another. It
presents the results in descending order by the score that ranks the weight of each key phrase.
This query calls the semanticsimilaritydetailstable function.
SET @SourceTitle = 'first.docx'
SET @MatchedTitle = 'second.docx'
SELECT @SourceDocID = DocumentID FROM Documents WHERE DocumentTitle = @SourceTitle
SELECT @MatchedDocID = DocumentID FROM Documents WHERE DocumentTitle = @MatchedTitle
SELECT @SourceTitle AS SourceTitle, @MatchedTitle AS MatchedTitle, keyphrase, score
FROM semanticsimilaritydetailstable(Documents, DocumentContent,
@SourceDocID, DocumentContent, @MatchedDocID)
ORDER BY score DESC
Store your documents in SQL Server
Before you can index documents with Semantic Search, you have to store the documents in a SQL Server database.
The FileTable feature in SQL Server makes unstructured files and documents first-class citizens of the relational
database. As a result, database developers can manipulate documents together with structured data in TransactSQL set-based operations.
For more info about the FileTable feature, see FileTables (SQL Server). For info about the FILESTREAM feature,
which is another option for storing documents in the database, see FILESTREAM (SQL Server).
Related tasks
Install and Configure Semantic Search
Describes the prerequisites for statistical semantic search and how to install or check them.
Enable Semantic Search on Tables and Columns
Describes how to enable or disable statistical semantic indexing on selected columns that contain documents or
text.
Find Key Phrases in Documents with Semantic Search
Describes how to find the key phrases in documents or text columns that are configured for statistical semantic
indexing.
Find Similar and Related Documents with Semantic Search
Describes how to find similar or related documents or text values, and information about how they are similar or
related, in columns that are configured for statistical semantic indexing.
Manage and Monitor Semantic Search
Describes the process of semantic indexing and the tasks related to monitoring and managing the indexes.
Related content
Semantic Search DDL, Functions, Stored Procedures, and Views
Lists the Transact-SQL statements and the SQL Server database objects added or changed to support statistical
semantic search.
Install and Configure Semantic Search
3/24/2017 • 5 min to read • Edit Online
Describes the prerequisites for statistical semantic search and how to install or check them.
Install Semantic Search
Check whether Semantic Search is installed
Query the IsFullTextInstalled property of the SERVERPROPERTY (Transact-SQL) metadata function.
A return value of 1 indicates that Full-Text Search and Semantic Search are installed; a return value of 0 indicates
that they are not installed.
SELECT SERVERPROPERTY('IsFullTextInstalled');
GO
Install Semantic Search
To install Semantic Search, select Full-Text and Semantic Extractions for Search on the Features to Install
page during SQL Server setup.
Statistical Semantic Search depends on Full-Text Search. These two optional features of SQL Server are installed
together.
Install the Semantic Language Statistics Database
Semantic Search has an additional external dependency that is called the semantic language statistics database.
This database contains the statistical language models required by semantic search. A single semantic language
statistics database contains the language models for all the languages that are supported for semantic indexing.
Check whether the Semantic Language Statistics Database is installed
Query the catalog view sys.fulltext_semantic_language_statistics_database (Transact-SQL).
If the semantic language statistics database is installed and registered for the instance, then the query results
contain a single row of information about the database.
SELECT * FROM sys.fulltext_semantic_language_statistics_database;
GO
Install, attach, and register the Semantic Language Statistics Database
The semantic language statistics database is not installed by the SQL Server setup program. To set up the
Semantic Language Statistics database as a prerequisite for semantic indexing, do the following things:
1. Install the semantic language statistics database.
1. Locate the semantic language statistics database on the SQL Server installation media or download it from
the Web.
a. Locate the Windows installer package named SemanticLanguageDatabase.msi on the SQL Server
installation media.
b. Download the installer package from the Microsoft® SQL Server® 2016 Semantic Language
Statistics page on the Microsoft Download Center.
1. Run the SemanticLanguageDatabase.msi Windows installer package to extract the database and log file.
You can optionally change the destination directory. By default, the installer extracts the files to a folder
named Microsoft Semantic Language Database in the Program Files folder. The MSI file contains a
compressed database file and log file.
2. Move the extracted database file and log file to a suitable location in the file system.
If you leave the files in their default location, it will not be possible to extract another copy of the database
for another instance of SQL Server.
IMPORTANT
When the semantic language statistics database is extracted, restricted permissions are assigned to the database file
and log file in the default location in the file system. As a result, you may not have permission to attach the database
if you leave it in the default location. If an error is raised when you try to attach the database, move the files, or
check and fix file system permissions as appropriate.
2. Attach the semantic language statistics database.
Attach the database to the instance of SQL Server by using Management Studio or by calling CREATE
DATABASE (SQL Server Transact-SQL) with the FOR ATTACH syntax. For more information, see Database
Detach and Attach (SQL Server).
By default, the name of the database is semanticsdb. You can optionally give the database a different
name when you attach it. You have to provide this name when you register the database in the subsequent
step.
CREATE DATABASE semanticsdb
ON ( FILENAME = 'C:\Microsoft Semantic Language Database\semanticsdb.mdf' )
LOG ON ( FILENAME = 'C:\Microsoft Semantic Language Database\semanticsdb_log.ldf' )
FOR ATTACH;
GO
This code sample assumes that you have moved the database from its default location to a new location.
3. Register the semantic language statistics database.
Call the stored procedure sp_fulltext_semantic_register_language_statistics_db (Transact-SQL) and provide the
name that you gave to the database when you attached it.
EXEC sp_fulltext_semantic_register_language_statistics_db @dbname = N'semanticsdb';
GO
Requirements and restrictions for the Semantic Language Statistics
Database
You can only attach and register one semantic language statistics database on an instance of SQL Server.
Each instance of SQL Server on a single computer requires a separate physical copy of the semantic
language statistics database. Attach one copy to each instance.
You cannot detach a valid and registered semantic language statistics database and replace it with an
arbitrary database that has the same name. Doing so will cause active or future index populations to fail.
The semantic language statistics database is read-only. You cannot customize this database. If you alter the
content of the database in any way, the results for future semantic indexing are indeterministic. To restore
the original state of this data, you can drop the altered database, and download and attach a new and
unaltered copy of the database.
It is possible to detach or drop the semantic language statistics database. If there are any active indexing
operations that have read locks on the database, then the detach or drop operation will fail or time out. This
is consistent with existing behavior. After the database is removed, semantic indexing operations will fail.
Remove the Semantic Language Statistics Database
Unregister, detach, and remove the Semantic Language Statistics Database
1. Unregister the semantic language statistics database.
Call the stored procedure sp_fulltext_semantic_unregister_language_statistics_db (Transact-SQL). You do not have
to provide the name of the database since an instance can have only one semantic language statistics database.
EXEC sp_fulltext_semantic_unregister_language_statistics_db;
GO
2. Detach the semantic language statistics database.
Call the stored procedure sp_detach_db (Transact-SQL) and provide the name of the database.
USE master;
GO
EXEC sp_detach_db @dbname = N'semanticsdb';
GO
3. Remove the semantic language statistics database.
After unregistering and detaching the database, you can simply delete the database file. There is no uninstall
program and there is no entry in Programs and Features in the Control Panel.
Install optional support for newer document types
Install the latest filters for Microsoft Office and other Microsoft document types
SQL Server installs the latest Microsoft word breakers and stemmers, but does not install the latest filters for
Microsoft Office documents and other Microsoft document types. These filters are required for indexing
documents created with recent versions of Microsoft Office and other Microsoft applications. To download the
latest filters, see Microsoft Office 2010 Filter Packs. (There does not appear to be a Filter Pack release for Office
2013 or Office 2016.)
Enable Semantic Search on Tables and Columns
3/24/2017 • 9 min to read • Edit Online
Describes how to enable or disable statistical semantic indexing on selected columns that contain documents or
text.
Statistical Semantic Search uses the indexes that are created by Full-Text Search, and creates additional indexes. As
a result of this dependency on full-text search, you create a new semantic index when you define a new full-text
index, or when you alter an existing full-text index. You can create a new semantic index by using Transact-SQL
statements, or by using the Full-Text Indexing Wizard and other dialog boxes in SQL Server Management Studio,
as described in this topic.
Create a semantic index
Requirements and restrictions for creating a semantic index
You can create an index on any of the database objects that are supported for full-text indexing, including
tables and indexed views.
Before you can enable semantic indexing for specific columns, the following prerequisites must exist:
A full-text catalog must exist for the database.
The table must have a full-text index.
The selected columns must participate in the full-text index.
You can create and enable all these requirements at the same time.
You can create a semantic index on columns that have any of the data types that are supported for full-text
indexing. For more information, see Create and Manage Full-Text Indexes.
You can specify any document type that is supported for full-text indexing for varbinary(max) columns.
For more information, see How To: Determine Which Document Types Can Be Indexed in this topic.
Semantic indexing creates two types of indexes for the columns that you select – an index of key phrases,
and an index of document similarity. You cannot select only one type of index or the other when you enable
semantic indexing. However you can query these two indexes independently. For more information, see
Find Key Phrases in Documents with Semantic Search and Find Similar and Related Documents with
Semantic Search.
If you do not explicitly specify an LCID for a semantic index, then only the primary language and its
associated language statistics are used for semantic indexing.
If you specify a language for a column for which the language model is not available, the creation of the
index fails and returns an error message.
Create a semantic index when there is no full-text index
When you create a new full-text index with the CREATE FULLTEXT INDEX statement, you can enable semantic
indexing at the column level by specifying the keyword STATISTICAL_SEMANTICS as part of the column
definition. You can also enable semantic indexing when you use the Full-Text Indexing Wizard to create a new fulltext index.
Create a new semantic index by using Transact-SQL
Call the CREATE FULLTEXT INDEX statement and specify STATISTICAL_SEMANTICS for each column on which
you want to create a semantic index. For more information about all the options for this statement, see CREATE
FULLTEXT INDEX (Transact-SQL).
Example 1: Create a unique index, full-text index, and semantic index
The following example creates a default full-text catalog, ft. The example then creates a unique index on the
JobCandidateID column of the HumanResources.JobCandidate table of the AdventureWorks2012 sample
database. This unique index is required as the key column for a full-text index. The example then creates a full-text
index and a semantic index on the Resume column.
CREATE FULLTEXT CATALOG ft AS DEFAULT
GO
CREATE UNIQUE INDEX ui_ukJobCand
ON HumanResources.JobCandidate(JobCandidateID)
GO
CREATE FULLTEXT INDEX ON HumanResources.JobCandidate
(Resume
Language 1033
Statistical_Semantics
)
KEY INDEX JobCandidateID
WITH STOPLIST = SYSTEM
GO
Example 2: Create a full-text and semantic index on several columns with delayed index population
The following example creates a full-text catalog, documents_catalog, in the AdventureWorks2012 sample
database. The example then creates a full-text index that uses this new catalog. The full-text index is created on the
Title, DocumentSummary, and Document columns of the Production.Document table, while the semantic
index is only created on the Document column. This full-text index uses the newly-created full-text catalog and an
existing unique key index, PK_Document_DocumentID. As recommended, this index key is created on an integer
column, DocumentID. The example specifies the LCID for English, 1033, which is the language of the data in the
columns.
This example also specifies that change tracking is off with no population. Later, during off-peak hours, the
example uses an ALTER FULLTEXT INDEX statement to start a full population on the new index and enable
automatic change tracking.
CREATE FULLTEXT CATALOG documents_catalog
GO
CREATE FULLTEXT INDEX ON Production.Document
(
Title
Language 1033,
DocumentSummary
Language 1033,
Document
TYPE COLUMN FileExtension
Language 1033
Statistical_Semantics
)
KEY INDEX PK_Document_DocumentID
ON documents_catalog
WITH CHANGE_TRACKING OFF, NO POPULATION
GO
Later, at an off-peak time, the index is populated:
ALTER FULLTEXT INDEX ON Production.Document SET CHANGE_TRACKING AUTO
GO
Create a new semantic index by using SQL Server Management Studio
Run the Full-Text Indexing Wizard and enable Statistical Semantics on the Select Table Columns page for each
column on which you want to create a semantic index. For more information, including information about how to
start the Full-Text Indexing Wizard, see Use the Full-Text Indexing Wizard.
Create a semantic index when there is an existing full-text index
You can add semantic indexing when you alter an existing full-text index with the ALTER FULLTEXT INDEX
statement. You can also add semantic indexing by using various dialog boxes in SQL Server Management Studio.
Add a semantic index by using Transact-SQL
Call the ALTER FULLTEXT INDEX statement with the options described below for each column on which you want
to add a semantic index. For more information about all the options for this statement, see ALTER FULLTEXT INDEX
(Transact-SQL).
Both full-text and semantic indexes are repopulated after a call to ALTER, unless you specify otherwise.
To add full-text indexing only to a column, use the ADD syntax.
To add both full-text and semantic indexing to a column, use the ADD syntax with the
STATISTICAL_SEMANTICS option.
To add semantic indexing to a column that is already enabled for full-text indexing, use the ADD
STATISTICAL_SEMANTICS option. You can only add semantic indexing to one column in a single ALTER
statement.
Example: Add semantic indexing to a column that already has full-text indexing
The following example alters an existing full-text index on Production.Document table in
AdventureWorks2012 sample database. The example adds a semantic index on the Document column of
the Production.Document table, which already has a full-text index. The example specifies that the index
will not be repopulated automatically.
ALTER FULLTEXT INDEX ON Production.Document
ALTER COLUMN Document
ADD Statistical_Semantics
WITH NO POPULATION
GO
Add a semantic index by using SQL Server Management Studio
You can change the columns that are enabled for semantic and full-text indexing on the Full-Text Index Columns
page of the Full-Text Index Properties dialog box. For more information, see Manage Full-Text Indexes.
Alter a semantic index
Requirements and restrictions for altering an existing index
You cannot alter an existing index while population of the index is in progress. For more information on
monitoring the progress of index population, see Manage and Monitor Semantic Search.
You cannot add indexing to a column, and alter or drop indexing for the same column, in a single call to the
ALTER FULLTEXT INDEX statement.
Drop a semantic index
You can drop semantic indexing when you alter an existing full-text index with the ALTER FULLTEXT INDEX
statement. You can also drop semantic indexing by using various dialog boxes in SQL Server Management Studio.
Drop a semantic index by using Transact-SQL
To drop semantic indexing only from a column or columns, call the ALTER FULLTEXT INDEX statement with the
ALTER COLUMNcolumn_nameDROP STATISTICAL_SEMANTICS option. You can drop the indexing from
multiple columns in a single ALTER statement.
USE database_name
GO
ALTER FULLTEXT INDEX
ALTER COLUMN column_name
DROP STATISTICAL_SEMANTICS
GO
To drop both full-text and semantic indexing from a column, call the ALTER FULLTEXT INDEX statement with the
ALTER COLUMNcolumn_nameDROP option.
USE database_name
GO
ALTER FULLTEXT INDEX
ALTER COLUMN column_name
DROP
GO
Drop a semantic index by using SQL Server Management Studio
You can change the columns that are enabled for semantic and full-text indexing on the Full-Text Index Columns
page of the Full-Text Index Properties dialog box. For more information, see Manage Full-Text Indexes.
Requirements and restrictions for dropping a semantic index
You cannot drop full-text indexing from a column while retaining semantic indexing. Semantic indexing
depends on full-text indexing for document similarity results.
You cannot specify the NO POPULATION option when you drop semantic indexing from the last column in
a table for which semantic indexing was enabled. A population cycle is required to remove the results that
were indexed previously.
Check whether semantic search is enabled on database objects
Is semantic search enabled for a database?
Query the IsFullTextEnabled property of the DATABASEPROPERTYEX (Transact-SQL) metadata function.
A return value of 1 indicates that full-text search and semantic search are enabled for the database; a return value
of 0 indicates that they are not enabled.
SELECT DATABASEPROPERTYEX('database_name', 'IsFullTextEnabled')
GO
Is semantic search enabled for a table?
Query the TableFullTextSemanticExtraction property of the OBJECTPROPERTYEX (Transact-SQL) metadata
function.
A return value of 1 indicates that semantic search is enabled for the table; a return value of 0 indicates that it is not
enabled.
SELECT OBJECTPROPERTYEX(OBJECT_ID('table_name'), 'TableFullTextSemanticExtraction')
GO
Is semantic search enabled for a column?
To determine whether semantic search is enabled for a specific column:
Query the StatisticalSemantics property of the COLUMNPROPERTY (Transact-SQL) metadata function.
A return value of 1 indicates that semantic search is enabled for the column; a return value of 0 indicates
that it is not enabled.
SELECT COLUMNPROPERTY(OBJECT_ID('table_name'), 'column_name', 'StatisticalSemantics')
GO
Query the catalog view sys.fulltext_index_columns (Transact-SQL) for the full-text index.
A value of 1 in the statistical_semantics column indicates that the specified column is enabled for
semantic indexing in addition to full-text indexing.
SELECT * FROM sys.fulltext_index_columns WHERE object_id = OBJECT_ID('table_name')
GO
In Object Explorer in Management Studio, right-click on a column and select Properties. On the General
page of the Column Properties dialog box, check the value of the Statistical Semantics property.
A value of True indicates that the specified column is enabled for semantic indexing in addition to full-text
indexing.
Determine what can be indexed for Semantic Search
Check which languages are supported for Semantic Search
IMPORTANT
Fewer languages are supported for semantic indexing than for full-text indexing. As a result, there may be columns that you
can index for full-text search, but not for semantic search.
Query the catalog view sys.fulltext_semantic_languages (Transact-SQL).
SELECT * FROM sys.fulltext_semantic_languages
GO
The following languages are supported for semantic indexing. This list represents the output of the catalog view
sys.fulltext_semantic_languages (Transact-SQL), ordered by LCID.
LANGUAGE
LCID
German
1031
English (US)
1033
French
1036
Italian
1040
Portuguese (Brazil)
1046
Russian
1049
Swedish
1053
English (UK)
2057
Portuguese (Portugal)
2070
Spanish
3082
Determine which document types can be indexed
Query the catalog view sys.fulltext_document_types (Transact-SQL).
If the document type that you want to index is not in the list of supported types, then you may have to locate,
download, and install additional filters. For more information, see View or Change Registered Filters and Word
Breakers.
Best practice: Consider creating a separate filegroup for the full-text
and semantic indexes
Consider creating a separate filegroup for the full-text and semantic indexes if disk space allocation is a concern.
The semantic indexes are created in the same filegroup as the full-text index. A fully populated semantic index may
contain large amount of data.
Issue: Searching on specific column returns no results
Was a non-Unicode LCID specified for a Unicode language?
It is possible to enable semantic indexing on a non-Unicode column type with an LCID for a language that only has
Unicode words, such as LCID 1049 for Russian. In this case, no results will ever be returned from the semantic
indexes on this column.
Find Key Phrases in Documents with Semantic Search
3/24/2017 • 1 min to read • Edit Online
Describes how to find the key phrases in documents or text columns that are configured for statistical semantic
indexing.
Find the key phrases in documents with SEMANTICKEYPHRASETABLE
To identify the key phrases in specific documents, or to identify documents that contain specific key phrases, query
the function semantickeyphrasetable (Transact-SQL).
SEMANTICKEYPHRASETABLE returns a table with zero, one, or more rows for those key phrases associated with
columns in the specified table. This rowset function can be referenced in the FROM clause of a SELECT statement
as if it were a regular table name.
NOTE
In this release, only single words are indexed for semantic search; multi-word phrases (ngrams) are not indexed. Also, various
forms of the same word are indexed separately; for example, "computer" and "computers" are indexed separately.
For detailed information about the parameters required by the SEMANTICKEYPHRASETABLE function, and about
the table of results that it returns, see semantickeyphrasetable (Transact-SQL).
IMPORTANT
The columns that you target must have full-text and semantic indexing enabled.
Example 1: Find the top key phrases in a specific document
The following example retrieves the top 10 key phrases from the document specified by the @DocumentId
variable in the Document column of the Production.Document table of the AdventureWorks sample database. The
@DocumentId variable represents a value from the key column of the full-text index.
SELECT TOP(10) KEYP_TBL.keyphrase
FROM SEMANTICKEYPHRASETABLE
(
Production.Document,
Document,
@DocumentId
) AS KEYP_TBL
ORDER BY KEYP_TBL.score DESC;
GO
The SEMANTICKEYPHRASETABLE function retrieves these results efficiently by using an index seek instead of a
table scan.
Example 2: Find the top documents that contain a specific key phrase
The following example retrieves the top 25 documents that contain the key phrase “Bracket” from the Document
column of the Production.Document table of the AdventureWorks sample database.
SELECT TOP (25) DOC_TBL.DocumentID, DOC_TBL.DocumentSummary
FROM Production.Document AS DOC_TBL
INNER JOIN SEMANTICKEYPHRASETABLE
(
Production.Document,
Document
) AS KEYP_TBL
ON DOC_TBL.DocumentID = KEYP_TBL.document_key
WHERE KEYP_TBL.keyphrase = 'Bracket'
ORDER BY KEYP_TBL.Score DESC;
GO
Find Similar and Related Documents with Semantic
Search
3/24/2017 • 1 min to read • Edit Online
Describes how to find similar or related documents or text values, and information about how they are similar or
related, in columns that are configured for statistical semantic indexing.
Find similar or related documents with SEMANTICSIMILARITYTABLE
To identify similar or related documents in a specific column, query the function semanticsimilaritytable (TransactSQL).
SEMANTICSIMILARITYTABLE returns a table of zero, one, or more rows whose content in the specified column is
semantically similar to the specified document. This rowset function can be referenced in the FROM clause of a
SELECT statement like a regular table name.
You cannot query across columns for similar documents. The SEMANTICSIMILARITYTABLE function only
retrieves results from the same column as the source column, which is identified by the source_key argument.
For detailed information about the parameters required by the SEMANTICSIMILARITYTABLE function, and about
the table of results that it returns, see semanticsimilaritytable (Transact-SQL).
IMPORTANT
The columns that you target must have full-text and semantic indexing enabled.
Example: Find the top documents that are similar to another document
The following example retrieves the top 10 candidates who are similar to the candidate specified by @CandidateID
from the HumanResources.JobCandidate table in the AdventureWorks2012 sample database.
SELECT TOP(10) KEY_TBL.matched_document_key AS Candidate_ID
FROM SEMANTICSIMILARITYTABLE
(
HumanResources.JobCandidate,
Resume,
@CandidateID
) AS KEY_TBL
ORDER BY KEY_TBL.score DESC;
GO
Find info about how documents are similar or related with
SEMANTICSIMILARITYDETAILSTABLE
To get information about the key phrases that make documents similar or related, you can query the function
semanticsimilaritydetailstable (Transact-SQL).
SEMANTICSIMILARITYDETAILSTABLE returns a table of zero, one, or more rows of key phrases common across
two documents (a source document and a matched document) whose content is semantically similar. This rowset
function can be referenced in the FROM clause of a SELECT statement like a regular table name.
For detailed information about the parameters required by the SEMANTICSIMILARITYDETAILSTABLE function,
and about the table of results that it returns, see semanticsimilaritydetailstable (Transact-SQL).
IMPORTANT
The columns that you target must have full-text and semantic indexing enabled.
Example: Find the top key phrases that are similar between documents
The following example retrieves the 5 key phrases that have the highest similarity score between the specified
candidates in HumanResources.JobCandidate table of the AdventureWorks2012 sample database.
SELECT TOP(5) KEY_TBL.keyphrase, KEY_TBL.score
FROM SEMANTICSIMILARITYDETAILSTABLE
(
HumanResources.JobCandidate,
Resume, @CandidateID,
Resume, @MatchedID
) AS KEY_TBL
ORDER BY KEY_TBL.score DESC;
GO
Manage and Monitor Semantic Search
3/24/2017 • 3 min to read • Edit Online
Describes the process of semantic indexing and the tasks related to managing and monitoring the indexes.
Check the status of semantic indexing
Is the first phase of semantic indexing complete?
Query the dynamic management view, sys.dm_fts_index_population (Transact-SQL), and check the status and
status_description columns.
The first phase of indexing includes the population of the full-text keyword index and the semantic key phrase
index, as well as the extraction of document similarity data.
USE database_name
GO
SELECT * FROM sys.dm_fts_index_population WHERE table_id = OBJECT_ID('table_name')
GO
Is the second phase of semantic indexing complete?
Query the dynamic management view, sys.dm_fts_semantic_similarity_population (Transact-SQL), and check the
status and status_description columns..
The second phase of indexing includes the population of the semantic document similarity index.
USE database_name
GO
SELECT * FROM sys.dm_fts_semantic_similarity_population WHERE table_id = OBJECT_ID('table_name')
GO
Check the size of the semantic indexes
What is the logical size of a semantic key phrase index or a semantic document similarity index?
Query the dynamic management view, sys.dm_db_fts_index_physical_stats (Transact-SQL).
The logical size is displayed in number of index pages.
USE database_name
GO
SELECT * FROM sys.dm_db_fts_index_physical_stats WHERE object_id = OBJECT_ID('table_name')
GO
What is the total size of the full-text and semantic indexes for a full-text catalog?
Query the IndexSize property of the FULLTEXTCATALOGPROPERTY (Transact-SQL) metadata function.
SELECT FULLTEXTCATALOGPROPERTY('catalog_name', 'IndexSize')
GO
How many items are indexed in the full-text and semantic indexes for a full-text catalog?
Query the ItemCount property of the FULLTEXTCATALOGPROPERTY (Transact-SQL) metadata function.
SELECT FULLTEXTCATALOGPROPERTY('catalog_name', 'ItemCount')
GO
Force the population of the semantic indexes
You can force the population of full-text and semantic indexes by using the START/STOP/PAUSE or RESUME
POPULATION clause with the same syntax and behavior that is described for full-text indexes. For more
information, see ALTER FULLTEXT INDEX (Transact-SQL) and Populate Full-Text Indexes.
Since semantic indexing is dependent on full-text indexing, semantic indexes are only populated when the
associated full-text indexes are populated.
Example: Start a full population of full-text and semantic indexes
The following example starts full population of both full-text and semantic indexes by altering an existing full-text
index on the Production.Document table in the AdventureWorks2012 sample database.
USE AdventureWorks2012
GO
ALTER FULLTEXT INDEX ON Production.Document
START FULL POPULATION
GO
Disable or re-enable semantic indexing
You can enable or disable full-text or semantic indexing by using the ENABLE/DISABLE clause with the same
syntax and behavior that is described for full-text indexes. For more information, see ALTER FULLTEXT INDEX
(Transact-SQL).
When semantic indexing is disabled and suspended, queries over semantic data continue to work successfully and
to return previously indexed data. This behavior is not consistent with the behavior of Full-Text Search.
-- To disable semantic indexing on a table
USE database_name
GO
ALTER FULLTEXT INDEX ON table_name DISABLE
GO
-- To re-enable semantic indexing on a table
USE database_name
GO
ALTER FULLTEXT INDEX ON table_name ENABLE
GO
About the phases of semantic indexing
Semantic Search indexes two kinds of data for each column on which it is enabled:
1. Key phrases
2. Document similarity
Semantic indexing occurs in two phases, in conjunction with full-text indexing:
3. Phase 1. The full-text keyword index and the semantic key phrase index are populated in parallel at the
same time. The data required to index document similarity is also extracted at this time.
4. Phase 2. The semantic document similarity index is then populated. This index depends on both indexes
that were populated in the preceding phase.
Issue: Semantic Indexes Are Not Populated
Are the associated full-text indexes populated?
Since semantic indexing is dependent on full-text indexing, semantic indexes are only populated when the
associated full-text indexes are populated.
Are full-text search and semantic search properly installed and configured?
For more information, see Install and Configure Semantic Search.
Is the FDHOST service not available, or is there another condition that would cause full-text indexing to fail?
For more information, see Troubleshoot Full-Text Indexing.
Semantic Search DDL, Functions, Stored Procedures,
and Views
3/24/2017 • 1 min to read • Edit Online
Lists the Transact-SQL statements and the database objects that support statistical semantic search in SQL Server.
For the list of statements and database objects that support full-text search, see Full-Text Search DDL, Functions,
Stored Procedures, and Views.
Data Definition Language (DDL) Statements
OBJECT
MORE INFORMATION
ALTER FULLTEXT INDEX (Transact-SQL)
Enable Semantic Search on Tables and Columns
CREATE FULLTEXT INDEX (Transact-SQL)
Enable Semantic Search on Tables and Columns
System Functions
OBJECT
MORE INFORMATION
semantickeyphrasetable (Transact-SQL)
Find Key Phrases in Documents with Semantic Search
semanticsimilaritydetailstable (Transact-SQL)
Find Similar and Related Documents with Semantic Search
semanticsimilaritytable (Transact-SQL)
Find Similar and Related Documents with Semantic Search
System Metadata Functions
OBJECT
MORE INFORMATION
COLUMNPROPERTY (Transact-SQL)
Enable Semantic Search on Tables and Columns
DATABASEPROPERTYEX (Transact-SQL)
Enable Semantic Search on Tables and Columns
FULLTEXTCATALOGPROPERTY (Transact-SQL)
Manage and Monitor Semantic Search
INDEXPROPERTY (Transact-SQL)
Manage and Monitor Semantic Search
OBJECTPROPERTYEX (Transact-SQL)
Enable Semantic Search on Tables and Columns
SERVERPROPERTY (Transact-SQL)
Install and Configure Semantic Search
System Stored Procedures
OBJECT
MORE INFORMATION
sp_fulltext_semantic_register_language_statistics_db (TransactSQL)
Install and Configure Semantic Search
sp_fulltext_semantic_unregister_language_statistics_db
(Transact-SQL)
Install and Configure Semantic Search
Catalog Views
OBJECT
MORE INFORMATION
sys.fulltext_index_columns (Transact-SQL)
Manage and Monitor Semantic Search
sys.fulltext_semantic_language_statistics_database (TransactSQL)
Install and Configure Semantic Search
sys.fulltext_semantic_languages (Transact-SQL)
Install and Configure Semantic Search
Dynamic Management Views
OBJECT
MORE INFORMATION
sys.dm_db_fts_index_physical_stats (Transact-SQL)
Manage and Monitor Semantic Search
sys.dm_fts_index_population (Transact-SQL)
Manage and Monitor Semantic Search
sys.dm_fts_semantic_similarity_population (Transact-SQL)
Manage and Monitor Semantic Search
See Also
Manage and Monitor Semantic Search