The index format achieves greater efficiency during indexing, in part by minimizing the amount of redundant optimization performed. In v3 the index was always optimized for searching, whereas from v2010 onward the index should be optimized after batch operations have been performed (e.g. after imports, adds and deletes). Optimization usually takes seconds to minutes, depending on the index size. The UIs have an Optimize form; to optimize programmatically, call DocumentIndex.Optimize(), e.g.
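// Import a batch, then optimize once before closing the index.
DocumentIndex idx = new DocumentIndex(config);
idx.Import(...);
idx.Optimize();
idx.Close();

// Optimize an existing index without importing anything.
DocumentIndex idx = new DocumentIndex(config);
idx.Optimize();
idx.Close();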
Optimization can improve search performance by a factor of ~5 and has no effect on future indexing performance.
Because of the way Windows Explorer works, and because the search engine writes and deletes many files during indexing, it is advisable to close all instances of Windows Explorer while indexing. An instance of Explorer that has been used to view the index directory can still have a substantial negative impact on indexing speed, even after it has been pointed to a different directory.
To ensure optimum search results and performance you should index only those files that you wish to be searchable.
For example, if you have a very large repository of observational data (e.g. text files filled with records from flight data, weather data, etc. that contain many different 'word' strings), then indexing it may slow down certain searches (wildcards) and any index change operations (adding, removing, etc.). Although Search can handle this type of data, unnecessary indexing should be avoided where possible.
This also applies to binary files. Binary file types are ignored by default, but only if the server sends the correct MIME type with the response. E.g. if the server response for a .wmv file is "text/plain", the binary content will be parsed as text, which will mostly be garbage and will fill the index unnecessarily.
The indexer deliberately includes everything it finds (except for a specific stop-list) because it is dangerous to use heuristics to identify words which would 'never be searched' and therefore shouldn't be indexed. E.g. company or product names with symbols in them shouldn't be ignored.
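One way to catch the MIME type problem described above before it reaches the index is to compare a resource's file extension against the content type the server reports. The following is a minimal sketch only: the MimeCheck class, the IsBinaryServedAsText method and the extension list are hypothetical helpers for illustration, not part of the search engine's API, and how the URL and content type are obtained is left to the caller.

```csharp
using System;
using System.Collections.Generic;

public static class MimeCheck
{
    // Illustrative list only (an assumption); extend it to match the
    // binary formats that actually occur on your own site.
    private static readonly HashSet<string> BinaryExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".wmv", ".avi", ".mp3", ".mp4", ".zip", ".exe", ".jpg", ".png"
        };

    // Returns true when a URL with a known binary extension is reported
    // by the server with a text/* content type, e.g. .wmv as "text/plain".
    public static bool IsBinaryServedAsText(string url, string contentType)
    {
        if (string.IsNullOrEmpty(url) || string.IsNullOrEmpty(contentType))
        {
            return false; // not enough information to judge
        }

        // Drop any query string or fragment before looking for the extension.
        int cut = url.IndexOfAny(new[] { '?', '#' });
        string path = cut >= 0 ? url.Substring(0, cut) : url;

        // The extension is whatever follows the last dot in the final path segment.
        int slash = path.LastIndexOf('/');
        int dot = path.LastIndexOf('.');
        string ext = dot > slash ? path.Substring(dot) : string.Empty;

        return BinaryExtensions.Contains(ext)
            && contentType.TrimStart().StartsWith("text/", StringComparison.OrdinalIgnoreCase);
    }
}
```

A check like MimeCheck.IsBinaryServedAsText("http://example.com/clip.wmv", "text/plain") flags the case from the example above, so the server configuration can be corrected or the resource excluded from indexing. Because it only inspects the extension and the reported content type, it does not filter individual words and so leaves the indexer's include-everything behaviour untouched.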
Note: Any change to CreateForwardIndex will require the index to be deleted and recreated.