THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

The Bit Bucket (Greg Low): IDisposable

Ramblings of Greg Low (SQL Server MVP, MCM and Microsoft RD) - SQL Down Under

Modifying the Thesaurus in Full Text Search in SQL Server 2008

While I was prepping my full-text search session for TechEd Australia today, I decided to modify the thesaurus. I found the discussion in books online a bit confusing regarding the locations of the files involved.

What threw me was it said the default location of the thesaurus files was:

SQL_Server_install_path\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\FTDATA\

I went there and couldn't find anything. I worked out that for ENU language, my file should be called tsENU.xml. I searched for that and found it in:

C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Binn\Templates

I modified that and got nowhere. Turns out that as the name implies, this is where the templates for the files live. I then tried creating the missing folder under the stated default location and also got nowhere. Turns out that it's actually related to your data directory location. Where I needed to put it was:

C:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\FTData 

because I'd used C:\SQLData as my data folder. So it's not the SQL Server install path, it's the data file folder. Anyway, that file contains the following XML:

<XML ID="Microsoft Search Thesaurus">

<!--  Commented out

    <thesaurus xmlns="x-schema:tsSchema.xml">
<diacritics_sensitive>0</diacritics_sensitive>
        <expansion>
            <sub>Internet Explorer</sub>
            <sub>IE</sub>
            <sub>IE5</sub>
        </expansion>
        <replacement>
            <pat>NT5</pat>
            <pat>W2K</pat>
            <sub>Windows 2000</sub>
        </replacement>
        <expansion>
            <sub>run</sub>
            <sub>jog</sub>
        </expansion>
    </thesaurus>
-->
</XML>

The template file has three examples commented out. The diacritics entry relates to whether or not accents matter ie: in words like cafe. Expansion entries are used to provide synonyms. In this case, if you search for any one of "Internet Explorer", "IE" and "IE5", it will search for all of them. Replacement is used to force an alternate lookup. In the example above, if you search for either "NT5" or "W2K", it will search for "Windows 2000" instead.

After modifying the correct file, I then caused it to be loaded by specifying:

EXEC sys.sp_fulltext_load_thesaurus_file 1033 

and then I was in working fine with queries like:

SELECT MessageID,Description

FROM dbo.Messages

WHERE CONTAINS(Description,'FORMSOF(THESAURUS,punter)')

ORDER BY MessageID;

Hope this helps someone get started with it.

 

 

 

 

Published Wednesday, August 13, 2008 7:47 PM by Greg Low

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Microslave said:

Thanks a lot for the hint about the ambiguous information in BOL concerning the location of the thesaurus configuration file! I DID find the corresponding file in the default installation path (for my German locale that was "tsdeu.xml") and modified it. After some hours of trying, restarting, searching the web, modifying my queries, creating new tables and sample queries, I was finally desperately lost until I found this information. I then switched to my Data Folder (d:\SQL2008\Data\MSSQL10.MSSQLSERVER\MSSQL\FTData\) and modified the correct file this time and it worked like a charm (well after some timeout errors after trying to update thesaurus configuration with sys.sp_fulltext_load_thesaurus_file).

Sometimes you're completely on the wrong track (or in the wrong folder at least), so I learned that one must indeed sometimes question the infos in BOL, although this stays #1 SQL Server resource for me! So thanks again, for sharing your knowledge. :-)

September 2, 2010 2:09 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Tags

No tags have been created or used yet.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement