* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Juan Soto - Adolfo J. Socorro | Databases, Data Analysis
Oracle Database wikipedia , lookup
Concurrency control wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Microsoft Access wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Relational model wikipedia , lookup
Introduction to Full-Text Searching in SQL Server 2012 Adolfo J. Socorro, Ph.D. IT Impact, Inc. [email protected] Outline  What can we do with FTS?  How to install FTS  FTS components  Creating FTS indexes  How to query with FTS  FILESTREAM and FileTable FTS Basics FTS allows searching against character-based data  char  ntext  varchar  image  nchar  xml  nvarchar  varbinary  text  varbinary(max) Search Functionality Specific words or phrases • “hotel” => “hotel” Prefixes • “fan” => “fantastic”, “fantasy” • “local store” => “locally stored” Inflectional forms • “minimized” => “minimizing”, “minimise” Search Functionality Proximity • “search,query” => “query to perform search” Synonyms • “folder” => “directory” Weighted Values A First Look  Let’s run some simple examples to get a feel for FTS! LIKE vs FTS  LIKE works on character patterns only  Cannot use the LIKE predicate to query formatted binary data  FTS is much faster against large amounts of unstructured text data Supported SQL Server Editions  Enterprise  Business Intelligence  Standard  Web  Express with Advanced Services Available since at least SQL Server 2000 FTS Components Word Breaker Thesaurus Stemmer Stoplists Filters Property Lists Language Support  50+ languages  Language-specific components  Word breakers and stemmers  Stoplists  Thesaurus files How to Install Default FTS Language FTS Indexes  One index per table or indexed view  Must have a unique, single-column, nonnullable index on the table  Grouped within the same database into one or more full-text catalogs (“containers”) Full-Text Catalogs A logical construct A way to manage FT indexes together Index Population  Population: indexes the addition of data to full-text Automatic Manual • On Request • Scheduled Steps to Setup an Index on a Table Create Full-Text Catalog For Each Column to Index • Indicate language • Indicate document type * Choose Change-Tracking Mechanism Full-Text Index Wizard Example: Create Catalog and Index CONTAINS  Precise or prefix matches to single words and phrases  Proximity matches  Logical operations between conditions: AND, OR, AND NOT  Optional use of inflectional forms and thesaurus FREETEXT  Matching the meaning, but not the exact wording, of specified words or phrases  Always uses inflectional forms and thesaurus CONTAINSTABLE AND FREETEXTTABLE  Return a relevance ranking value (RANK) and full-text key (KEY) for each row  The actual RANK values are unimportant and typically differ each time the query is run  ISABOUT/WEIGHT influence the ranking in CONTAINSTABLE Example: Queries Stoplists A mechanism to discard commonly occurring strings that do not help the search a is the by and … Thesaurus  Nicknames: Robert/Bob  Common misspellings: calendar/calender  Homophones: Geoff/Jeff  Technical terms: proc/procedure Very powerful if you log searches and learn what users are commonly searching for Thesaurus  One file per language Expansions Replacements “bike” in addition to “bicycle” “calendar” instead of “calender” Filters  Extract textual information from the document (removing the formatting)  Send the text to the word-breaker component for the language associated with the column  Need to manually install Office 2010 and PDF filters Example: FTS Components Where to Store Large Objects? Database File System Why Store in the Database?  Integrating unstructured data into the relational database provides significant benefits:  Integrated storage and data management capabilities (e.g., backup)  Ease of administration and policy management  Full-text search FILESTREAM A database/file system hybrid  FILESTREAM is an attribute that can be assigned to a varbinary(max) column  Allows  Not storing BLOB data in the file system restricted to the 2 GB limit SQL Server imposes on BLOBs FILESTREAM  SQL Server buffer pool is not used  Isolation semantics are governed by Database Engine transaction isolation levels Steps to FILESTREAM Enable at OS level Configure at instance level Create a filegroup Add a file to the filegroup • Indicate root folder OS-level Configuration of FILESTREAM Instance-level Configuration of FILESTREAM Example: FILESTREAM FILESTREAM  All data access must be transactional  Must  Do use specific APIs for file I/O not edit the files directly! When to Use FILESTREAM  Objects that are being stored are, on average, larger than 1 MB  Store smaller objects in the database  Fast  You read access is important are using a middle tier for application logic FileTables A special, fixed-schema kind of table  Builds on top of existing FILESTREAM capabilities  Store files and documents in in the database, but access them from Windows applications as if they were stored in the file system (WIN32 API) FileTables  Hierarchical  Includes namespace file system properties as columns  Preserves full file names  Non-transactional access through the FS FileTables  Calls to create or change a file or directory through the Windows share are intercepted by a SQL Server component and reflected in the corresponding relational data in the FileTable Example: FTS over FileTables FileTables vs FILESTREAM  File and directory hierarchy maintained in the database  Windows application compatibility  Relational  Both access to file attributes are available in all editions Wrap Up  Advanced searching on character-based data, including documents  FTS setup, components, and queries  FILESTREAM  FileTables Other Topics  Document-property  Semantic search search  Optimizations  Query plans and execution traces References  Posts  http://www.sqlskills.com/blogs/bobb/  Blog:  SQL Server FTS Team Blog http://blogs.msdn.com/b/sqlfts  SQL  and presentations by Bob Beauchemin Server 2012 Books Online http://msdn.microsoft.com/enus/library/cc645577(SQL.110).aspx Filter Packs  Adobe  http://www.adobe.com/support/downloads/thankyo u.jsp?ftpID=4025&fileID=3941  Office  PDF Filter 2010 Filters http://www.microsoft.com/enus/download/details.aspx?id=17062
 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
									 
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                             
                                            