Exact Synergy Enterprise   
 

WordIndex - Background process

Introduction 

Enterprise search is an advanced functionality that ranks content stored in requests, documents, and document attachments by relevance. The following are some examples of the criteria used to rank content:

  • The number of times the search term exists in the requests, documents, or attachments.
  • The number of times the document is referenced.
  • The number of references that exist to other information carriers.
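As a rough illustration of how such criteria could combine into a single rank, the sketch below scores three invented items. The weights, the field names, and the linear formula are assumptions made for this example only; this document does not specify the formula that Exact Synergy Enterprise actually uses.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    term_hits: int       # times the search term occurs in the item
    referenced_by: int   # times other items refer to this item
    references_out: int  # references from this item to other information carriers


def relevance(item, w_hits=1.0, w_in=2.0, w_out=0.5):
    """Hypothetical linear score; the weights are illustrative only."""
    return (w_hits * item.term_hits
            + w_in * item.referenced_by
            + w_out * item.references_out)


def rank(items):
    """Return items sorted from most to least relevant."""
    return sorted(items, key=relevance, reverse=True)


items = [
    Item("request A", term_hits=3, referenced_by=0, references_out=1),
    Item("document B", term_hits=1, referenced_by=4, references_out=0),
    Item("attachment C", term_hits=2, referenced_by=1, references_out=2),
]
ranked = [item.name for item in rank(items)]
```

With these assumed weights, an item that is referenced often can outrank an item that merely contains the search term more times.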

The content of attachments with the following extensions can be searched: .doc, .docx, .ppt, .pptx, and .pdf.

Apart from relevance ranking, the Enterprise search functionality also incorporates:

  • Synonym search: This is based on a self-defined list of synonyms that are linked to each other. For example, if "proposal" = "quotation", searching for the term "proposal" will also return information in which "quotation" is used.

  • Do you mean: When the search term is misspelt, the system looks for better alternatives in the database and proposes one as the term to be used. For example, searching for "qoutation" will result in the system proposing "quotation" instead.

  • Expertise search: When using the resource search field, you can search for a resource and the documents that this particular resource has written. For example, searching for "SQL 2005" will give you a list of the relevant resources that have written on this topic.
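The first two features can be pictured with a small sketch. It is not the product's implementation: the in-memory synonym dictionary, the function names, and the use of Python's difflib for fuzzy matching are all assumptions chosen to keep the example short.

```python
import difflib

# Hypothetical synonym list; in the product these pairs are user-defined.
SYNONYMS = {"proposal": {"quotation"}, "quotation": {"proposal"}}


def expand_with_synonyms(term, synonyms=SYNONYMS):
    """Return the search term plus any synonyms linked to it."""
    term = term.lower()
    return {term} | synonyms.get(term, set())


def do_you_mean(term, known_words):
    """Suggest the closest known word for a misspelt term, or None."""
    term = term.lower()
    if term in known_words:
        return None  # spelt correctly, nothing to propose
    matches = difflib.get_close_matches(term, known_words, n=1)
    return matches[0] if matches else None


known_words = ["quotation", "proposal", "request"]
search_terms = expand_with_synonyms("proposal")
suggestion = do_you_mean("qoutation", known_words)
```

Here `search_terms` contains both "proposal" and "quotation", and `suggestion` is the known word closest to the misspelt input.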

Compared to the .ASP version, this functionality is a replacement for the previous method of indexing via ExactFulltext.exe. New tables have been introduced and these tables need to be populated. There can be many requests in an implementation, but what is important is that the WordIndex process is performed in the most efficient manner.

This document describes the steps required to populate the full-text tables for the search engine. Two background jobs should be created: one to index the documents, requests, and attachments, and another to optimize and enable the Enterprise search functionality.

Technical information 

The following tables are involved in the Enterprise search functionality:

  • Words

  • WordReferences

  • WordSynonyms

In the first two tables, the words are linked to the GUIDs of the documents, document attachments, and requests in which they are found, including the data used for calculating the relevance. When a user searches for certain words in Exact Synergy Enterprise, the words are looked up in these tables and the results are shown to the user. The WordSynonyms table stores the user-defined synonyms, which are defined via System/Setup/Full-text index/Synonyms.

In addition, an entry is made in the BacoSettings table after each run, which states when the job was last run. When the job is started again, it indexes only documents that were created or last modified after that time.
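The incremental behavior can be sketched as a simple filter on the modification time. The tuple layout and the function name are hypothetical, and the stored time is passed in directly rather than read from BacoSettings.

```python
from datetime import datetime


def select_for_indexing(entities, last_run):
    """Return entities created or modified after the previous run.

    `entities` is a list of (name, modified) tuples; `last_run` is the
    time recorded by the previous run, or None if the job never ran.
    """
    if last_run is None:
        return list(entities)
    return [entity for entity in entities if entity[1] > last_run]


last_run = datetime(2016, 1, 28, 2, 0)
entities = [
    ("document 1", datetime(2016, 1, 27, 9, 30)),   # unchanged since last run
    ("document 2", datetime(2016, 1, 28, 14, 15)),  # modified after last run
]
to_index = select_for_indexing(entities, last_run)
```

Only "document 2" is selected, because it was modified after the recorded time.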

In this example, assume that Exact Synergy Enterprise is installed in the folder E:\Program Files\Exact Software\Exact Synergy. The database server name is "SYNServer", the name of the database is "SynergyNET", and the virtual directory is also named "SynergyNET".

The old syntax is as follows:

E:\Program Files\Exact Software\Exact Synergy\bin\ExactFullText.exe /S:SYNServer /D:SynergyNET

The new syntax is as follows:

Exact.Process.exe /DBCONFIG:<virtualdirectory> /ASSEMBLY:<bgJob Assembly> /CLASS:<bgJob Class>

This results in:

E:\Program Files\Exact Software\Exact Synergy\bin\Exact.Process.exe /DBCONFIG:SynergyNET /ASSEMBLY:Exact.WordIndex /CLASS:IndexDocuments

iFilters
To index attachments, you can use iFilters. An iFilter must be installed for every extension on the server on which the full text background job is started. For more information about iFilters, see http://www.ifilter.org/ and http://blogs.adobe.com/pdfitmatters/2008/12/adobe_pdf_ifilter_9_for_64bit.html.

Installation of Office iFilters
Microsoft Office iFilters are installed by default when you install one of the Microsoft Office packages like Microsoft Word, Microsoft Excel, or Microsoft PowerPoint. The Exact.Jobs.SysHrMail.dll background job requires Microsoft Excel to create exports in the Microsoft Excel format. If this job is configured with Microsoft Excel, the iFilters will be available for the full text background process.

Installation of PDF iFilters
iFilters for PDF can be downloaded from the Adobe website. Adobe has iFilters for 32-bit and 64-bit systems. The 32-bit version of PDF iFilter 9 is installed automatically with Acrobat 9 and Reader 9. The 64-bit version must be downloaded separately.

Configuration of iFilters
A path setting to the installation directory of Adobe is needed. 
Path settings can be configured via Start, Run, Control Panel, System, Advanced System Settings, Environment Variables, System variables, and Path.

Parameters

The WordIndex background process supports the following parameters. The command line parameters are case sensitive.

/Abort
Aborts all the running WordIndex processes for this database.


E:\Synergy\bin\Exact.Process.exe /DBCONFIG:Synergy /ASSEMBLY:Exact.WordIndex /CLASS:IndexDocuments /Abort:1

 

/Filter
The filter can be used to index a specific set of entities. For example:

     /Filter:"bd.ModifiedDate>2004-01-01"

     or

     /Filter:"Absences.HID=45662"

 

/CacheWords
With this option, the process caches all the existing words. Caching speeds up the process only if a lot of entities must be processed, so do not use this option if there are only a few. New words are not dynamically added to the cache. Therefore, when you start building the WordIndex, it is better to stop the process after some entities (e.g. 1,000) are done and then restart it, so that the words that were just inserted are cached.
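The advice to restart after a first batch follows from the cache being filled once and not updated. The simulation below illustrates that reasoning only. The WordStore class is an in-memory stand-in, not the real Words table, and the roundtrip counting is a simplification that ignores the cost of loading the cache.

```python
class WordStore:
    """In-memory stand-in for the Words table; counts simulated lookups."""

    def __init__(self, words):
        self.ids = {word: n for n, word in enumerate(words, start=1)}
        self.roundtrips = 0

    def get_or_insert(self, word):
        self.roundtrips += 1  # one simulated database roundtrip
        if word not in self.ids:
            self.ids[word] = len(self.ids) + 1
        return self.ids[word]


def index_words(words, store, use_cache):
    """Resolve word IDs, optionally through a cache loaded once at start."""
    cache = dict(store.ids) if use_cache else {}
    for word in words:
        if word in cache:
            continue  # served from the startup cache, no roundtrip
        store.get_or_insert(word)  # new words are not added to the cache
    return store.roundtrips


words = ["quotation", "proposal", "quotation"]
store = WordStore([])
first_run = index_words(words, store, use_cache=True)   # cache starts empty
store.roundtrips = 0
second_run = index_words(words, store, use_cache=True)  # cache now has the words
```

In the first run every word, including the repeated one, costs a roundtrip because the cache was empty at start. After the simulated restart, all three lookups are served from the cache.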

 

/Bulk
This option enables “bulk inserts”. This method might speed up the process because it decreases the number of roundtrips to the database server.
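To see why batching reduces roundtrips, the sketch below counts the insert calls needed for the same rows with and without batching. The batch size of 500 and the row layout are assumptions for the example; the batch size used by the WordIndex process is not stated in this document.

```python
def batches(rows, size):
    """Yield consecutive slices of `rows` with at most `size` items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def count_roundtrips(rows, batch_size):
    """Number of simulated insert calls needed to send all rows."""
    return sum(1 for _ in batches(rows, batch_size))


rows = [("word", n) for n in range(10_000)]
single = count_roundtrips(rows, 1)    # one insert call per row
bulk = count_roundtrips(rows, 500)    # 500 rows per insert call (assumed size)
```

For 10,000 rows, the row-by-row approach needs 10,000 calls, while batches of 500 need 20.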

 

/Selection
0=Timestamp
1=All
The entities are always processed in the order of timestamps. "All" means that all entities will be processed, while "Timestamp" means that the process will continue from where it last finished. The timestamp of the last processed entity is by default stored in the setting "FullTextDocTimestamp". A different setting name can be specified with the command line option /TimeStampSetting.

 

/MaxEntities
This specifies the number of entities to process.

 

/MaxMinutes
This specifies the number of minutes the process may take to execute. The process will automatically terminate after the time has elapsed. Note that the program might run a bit longer than specified because it ends only after all the insert and update queues are empty.
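The interplay of the two limits and the queue flush can be sketched as a simulated loop. All timings are invented (one minute per entity, 0.1 minute per queued flush) and the function is hypothetical. The point is only that the limits stop new work while queued inserts are still written.

```python
from collections import deque


def run(entities, max_entities=None, max_minutes=None, minutes_per_entity=1.0):
    """Simulate a run; returns (entities processed, total simulated minutes)."""
    queue = deque()
    elapsed = 0.0
    processed = 0
    for entity in entities:
        if max_entities is not None and processed >= max_entities:
            break
        if max_minutes is not None and elapsed >= max_minutes:
            break
        queue.append(entity)      # result waits in the insert queue
        elapsed += minutes_per_entity
        processed += 1
    # The limits stop new work, but queued inserts are still flushed.
    while queue:
        queue.popleft()
        elapsed += 0.1            # assumed cost of one flush
    return processed, elapsed


by_time = run(range(100), max_minutes=5)
by_count = run(range(100), max_entities=3)
```

With a five-minute limit, five entities are processed and the total simulated time ends slightly above five minutes because of the final flush.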

 

/Descending
This specifies that the entities must be processed in descending order.

 

/TimeStampSetting
Use this option to specify a setting name that will be used to save the timestamp of the last processed entity. This option is very useful if you want to run more instances of the WordIndex process. Every instance can process a different range (use a filter) and save the state as a different setting.

 

/Reindex
For the background job IndexAttachments, the parameter 'Reindex' is available. It takes an integer value and is used to repair existing data. This parameter:

  • fixes existing documents whose attachments were deleted but whose word references still exist.
  • repairs existing attachment indexes so that they include the attachment ID in the new column called SubEntity.

The value assigned to the parameter determines how many existing attachment word references are reindexed per execution of the background job. If no value is specified, reindexing is never executed. The value is left to the user because the appropriate number depends on the capacity of the user's machine.

When this parameter is defined, the following will be done:

  • Retrieve the value of ValueType column in BacoSettings table for the SettingName = 'WordReindex'.
  • If the value of the setting is 1, reindex will proceed for the number of records defined by the 'Reindex' parameter. The reindex process will first delete the number of records (defined by the 'Reindex' parameter), and then reindex accordingly. If the deletion returns 0 records, then the ValueType for the setting 'WordReindex' will be set to 0.
  • If the value of the setting is 0, the repair process has already been completed. Therefore, nothing will be done even if the 'Reindex' parameter is defined.
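The steps above can be sketched as a small state machine. The dictionary standing in for BacoSettings, the list of stale references, and the function name are hypothetical. The actual reindexing of the deleted records is represented only by the returned count.

```python
def reindex_step(settings, stale_refs, reindex):
    """Run one simulated repair pass; returns the number of records handled.

    `settings` stands in for BacoSettings, `stale_refs` for the attachment
    word references that still need repair, and `reindex` for /Reindex.
    """
    if not reindex:
        return 0                      # no value given: never reindex
    if settings.get("WordReindex") != 1:
        return 0                      # repair already completed
    batch = stale_refs[:reindex]
    del stale_refs[:reindex]          # delete up to `reindex` records
    if not batch:
        settings["WordReindex"] = 0   # nothing deleted: mark repair done
    return len(batch)                 # these records would then be reindexed


settings = {"WordReindex": 1}
stale = list(range(6000))
handled = [reindex_step(settings, stale, 2500) for _ in range(4)]
```

With 6,000 stale records and /Reindex:2500, three runs do the repair work and the fourth run finds nothing to delete, which switches the setting to 0.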

Note that:

  • In order for the reindex to work, the parameter /TimeStampSetting must not be defined. The background job will then loop through all the attachments in order to reindex accordingly.
  • Since this was introduced in product update 247, users who started using the application from product update 247 onwards will not be affected, as there is no existing data to repair.
  • Due to the deletion of records in the repair process, the indexes might be badly fragmented. This will enlarge the index sizes substantially and result in poor query performance as well. Thus, it is recommended to rebuild all indexes in the WordReferences table once the repair has been completed (the ValueType of Setting 'WordReindex' is 0). In the case of increasing index sizes, rebuilding of indexes can be done even if the repair process has not been completed.

Syntax

Background job for documents:

Syntax:
"<Installation directory>\bin\Exact.Process.exe" /DBCONFIG:<database virtualdirectory entry db.config> /ASSEMBLY:Exact.WordIndex /CLASS:IndexDocuments /Descending:0 /TimeStampSetting:FulltextDocuments /Selection:0

Syntax example:
"C:\Program Files\Exact Software\SYNNET\bin\Exact.Process.exe" /DBCONFIG:SYNNET390 /ASSEMBLY:Exact.WordIndex /CLASS:IndexDocuments /Descending:0 /TimeStampSetting:FulltextDocuments /Selection:0

Background job for requests:

Syntax:
"<Installation directory>\bin\Exact.Process.exe" /DBCONFIG:<database virtualdirectory entry db.config> /ASSEMBLY:Exact.WordIndex /CLASS:IndexRequests /Descending:0 /TimeStampSetting:FulltextRequests /Selection:0

Syntax example:
"C:\Program Files\Exact Software\SYNNET\bin\Exact.Process.exe" /DBCONFIG:SYNNET390 /ASSEMBLY:Exact.WordIndex /CLASS:IndexRequests /Descending:0 /TimeStampSetting:FulltextRequests /Selection:0

Note: The WordIndex background process will only index requests that do not have any expiry date or entitlement created. Hence, only requests with the "Absences.Buildup = 0" clause will be included in the WordIndex background process.

Background job for attachments:

Syntax:
"<Installation directory>\bin\Exact.Process.exe" /DBCONFIG:<database virtualdirectory entry db.config> /ASSEMBLY:Exact.WordIndex /CLASS:IndexAttachments /Descending:0 /TimeStampSetting:FulltextAttachments /Selection:0 /Reindex:2500

Syntax example:
"C:\Program Files\Exact Software\SYNNET\bin\Exact.Process.exe" /DBCONFIG:SYNNET390 /ASSEMBLY:Exact.WordIndex /CLASS:IndexAttachments /Descending:0 /TimeStampSetting:FulltextAttachments /Selection:0 /Reindex:2500

         

Background job for request attachments:

Syntax:
"<Installation directory>\bin\Exact.Process.exe" /DBCONFIG:<database virtualdirectory entry db.config> /ASSEMBLY:Exact.WordIndex /CLASS:IndexReqAttachments /Descending:0 /TimeStampSetting:FulltextReqAttachments /Selection:0

Syntax example:
"C:\Program Files\Exact Software\SYNNET\bin\Exact.Process.exe" /DBCONFIG:SYNNET390 /ASSEMBLY:Exact.WordIndex /CLASS:IndexReqAttachments /Descending:0 /TimeStampSetting:FulltextReqAttachments /Selection:0
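Because the four command lines differ only in the class name, the timestamp setting, and the optional /Reindex value, they can be generated from one table. The helper below is a hypothetical convenience, not part of the product. It reuses the example installation directory and db.config entry from this section and prints the command lines without running them.

```python
import subprocess

# Class name and timestamp setting for each job, as listed in this section.
JOBS = {
    "IndexDocuments": "FulltextDocuments",
    "IndexRequests": "FulltextRequests",
    "IndexAttachments": "FulltextAttachments",
    "IndexReqAttachments": "FulltextReqAttachments",
}


def build_args(install_dir, db_config, job_class, reindex=None):
    """Return the argument list for one WordIndex background job."""
    args = [
        install_dir + r"\bin\Exact.Process.exe",
        "/DBCONFIG:" + db_config,
        "/ASSEMBLY:Exact.WordIndex",
        "/CLASS:" + job_class,
        "/Descending:0",
        "/TimeStampSetting:" + JOBS[job_class],
        "/Selection:0",
    ]
    if reindex is not None:
        args.append("/Reindex:%d" % reindex)
    return args


install_dir = r"C:\Program Files\Exact Software\SYNNET"
for job_class in JOBS:
    reindex = 2500 if job_class == "IndexAttachments" else None
    args = build_args(install_dir, "SYNNET390", job_class, reindex)
    # list2cmdline quotes the executable path because it contains spaces.
    print(subprocess.list2cmdline(args))
```

Keeping the parameter names in one place also helps with the case sensitivity of the command line parameters, since each name is typed only once.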

Main Category: Support Product Know How
Category: On-line help files
Sub category: Details
Document Type: Online help main
Security level: All - 0
Document ID: 16.108.779
Date: 28-01-2016