Document Reindexing Logic (Modified Dates)

To prevent documents from being re-indexed needlessly, the engine determines whether a document has changed and re-indexes it only if it has.

Note: in Search Lite, re-indexing is not automatic; you must run the indexer, either through the SearchResult control or programmatically.

Whether a document has changed is determined by its modified date and its content size.

Modified Date

Any document whose 'modified date' is older than the date it was last indexed will not be re-indexed. To override this default behavior, set IgnoreLastModifiedDate to true in the configuration.
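The rule amounts to a single date comparison. The Python sketch below is illustrative only: the engine's own implementation language is not stated here, `modified_date_allows_reindex` is a hypothetical helper rather than part of the Search Lite API, and dates are assumed to be compared at day granularity.

```python
from datetime import date


def modified_date_allows_reindex(modified: date, last_indexed: date,
                                 ignore_last_modified_date: bool = False) -> bool:
    """Hypothetical check mirroring the 'Modified Date' rule.

    A document whose modified date is older than its last index date is
    skipped, unless IgnoreLastModifiedDate is set.
    """
    if ignore_last_modified_date:
        return True
    # Equal dates are not "older", so such a document still qualifies.
    return modified >= last_indexed


# Unchanged since the last index run: skipped by default...
print(modified_date_allows_reindex(date(2024, 1, 1), date(2024, 3, 1)))        # False
# ...but allowed when IgnoreLastModifiedDate is true.
print(modified_date_allows_reindex(date(2024, 1, 1), date(2024, 3, 1), True))  # True
```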

Content Size

Ideally, the last-modified date of a document would always be used to identify whether it has changed; however, web servers do not always return this information. Furthermore, dynamic pages generally have no meaningful 'last-modified' date, because their content is generated when they are requested.

The byte size of a document can, however, be used to identify change with reasonable accuracy. If a document's size differs from its size when it was last indexed, it has definitely changed. It is also possible, though, for a document to change while keeping the same size (for example, when exactly as many characters are deleted as are added), so this method is not 100% accurate.

Even so, it may be worth trading some accuracy for performance by setting UseFileSizeToIdentifyChange to true in the configuration.
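The trade-off can be illustrated with a small Python sketch. It assumes 'content size' means the number of bytes in the response body; `size_indicates_change` is a hypothetical helper, not part of the product.

```python
def size_indicates_change(current_size: int, last_indexed_size: int) -> bool:
    """Hypothetical size-based change test: a different byte count means the
    document has definitely changed; an equal count is treated as unchanged."""
    return current_size != last_indexed_size


old = "Hello world".encode("utf-8")
new = "Hello World".encode("utf-8")  # content changed, byte count did not

print(size_indicates_change(len(new), len(old)))          # False: the edit is missed
print(size_indicates_change(len(old + b"!"), len(old)))   # True: size grew by one byte
```

The first call shows the inaccuracy described above: changing a single letter leaves the byte count, and therefore the size test, unchanged.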

Indexing Interval

Further control over re-indexing can be exercised through the IndexingInterval configuration setting, which specifies the number of days that must pass before a document will be re-indexed, regardless of whether it was modified. The diagrams below explain the relationship between document modified dates, IndexingInterval, and when a document will be indexed.
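As a worked example of the interval rule, the Python sketch below counts whole calendar days between the last index date and today. The day-counting method and the `interval_elapsed` helper are assumptions for illustration, not the engine's documented behavior.

```python
from datetime import date


def interval_elapsed(last_indexed: date, today: date, indexing_interval: int) -> bool:
    """Hypothetical IndexingInterval check: at least `indexing_interval` whole
    days must separate the last index date from today."""
    days_since_indexed = (today - last_indexed).days
    return days_since_indexed >= indexing_interval


last = date(2024, 3, 1)
print(interval_elapsed(last, date(2024, 3, 5), 7))  # False: only 4 days have passed
print(interval_elapsed(last, date(2024, 3, 8), 7))  # True: exactly 7 days
print(interval_elapsed(last, date(2024, 3, 1), 0))  # True: IndexingInterval=0
```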

Reindexing Conditions

For a document to be re-indexed, all of the following conditions must be met:

a) The number of days between the date the document was last indexed and today must be equal to or greater than IndexingInterval (e.g. this condition is ALWAYS met if IndexingInterval=0).

b) IgnoreLastModifiedDate must be true, OR the document's "modified date" must not be older than the document's last index date (if the two are equal, the document will be indexed); i.e. the document must have changed since it was last indexed.

c) UseFileSizeToIdentifyChange must be false, OR the document's "content size" must differ from its size when it was last indexed.
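The three conditions combine with a logical AND. The Python sketch below reads condition c) as requiring the size to differ, since an unchanged size is what signals "no change". The classes and `should_reindex` are hypothetical stand-ins for whatever the engine stores internally; only the setting names come from this page.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class IndexerConfig:
    # Hypothetical mirror of the configuration settings described above.
    indexing_interval: int = 0
    ignore_last_modified_date: bool = False
    use_file_size_to_identify_change: bool = False


@dataclass
class IndexedDocument:
    # Hypothetical record of what the indexer knows about one document.
    modified_date: date
    content_size: int
    last_indexed_date: date
    last_indexed_size: int


def should_reindex(doc: IndexedDocument, config: IndexerConfig, today: date) -> bool:
    # a) enough days have passed since the document was last indexed
    interval_ok = (today - doc.last_indexed_date).days >= config.indexing_interval
    # b) modified date is not older than the last index date (equal counts)
    modified_ok = (config.ignore_last_modified_date
                   or doc.modified_date >= doc.last_indexed_date)
    # c) content size differs from the size recorded at the last index
    size_ok = (not config.use_file_size_to_identify_change
               or doc.content_size != doc.last_indexed_size)
    # All three conditions must hold.
    return interval_ok and modified_ok and size_ok


doc = IndexedDocument(modified_date=date(2024, 3, 10), content_size=2048,
                      last_indexed_date=date(2024, 3, 1), last_indexed_size=2048)
today = date(2024, 3, 12)

print(should_reindex(doc, IndexerConfig(), today))                       # True
print(should_reindex(doc, IndexerConfig(indexing_interval=14), today))   # False: 11 days < 14
print(should_reindex(doc, IndexerConfig(use_file_size_to_identify_change=True), today))  # False: same size
```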

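Note: if the server does not supply a last-modified date in its response, the indexer uses today's date instead. Since today's date can never be older than a document's last index date, such a document always satisfies condition (b).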
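A sketch of this fallback follows, assuming the modified date comes from the standard HTTP `Last-Modified` response header (this page does not say which header the indexer reads). `modified_date_from_headers` is hypothetical, and the plain dictionary lookup is case-sensitive, which a real HTTP client's header collection may not be.

```python
from datetime import date
from email.utils import parsedate_to_datetime
from typing import Mapping


def modified_date_from_headers(headers: Mapping[str, str], today: date) -> date:
    """Return the modified date from a Last-Modified header, falling back to
    `today` when the header is missing or cannot be parsed."""
    value = headers.get("Last-Modified")
    if not value:
        return today
    try:
        return parsedate_to_datetime(value).date()
    except (TypeError, ValueError):
        # Older Pythons raise TypeError on bad input; 3.10+ raises ValueError.
        return today


today = date(2024, 3, 12)
print(modified_date_from_headers({"Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"}, today))  # 2015-10-21
print(modified_date_from_headers({}, today))  # 2024-03-12
```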