|
Rank: Advanced Member
Groups: Registered
Joined: 9/1/2010 Posts: 136
|
I am entering "blessing others" in my search box with double quotes before and after (an exact phrase search). I am getting a list of results that contain the word "blessing" but not "others." I looked in the text of the file for the first hit, and it does not contain the word "others."
I found a stoplist file that contains the word "others" but I don't see how it is being used by MyKeyotiResultsControl.SearchAgent.Search. I have verified that the text being passed to SearchAgent.Search is "blessing others" including the quotes.
How is a stoplist being applied to the searching, and is there a way to not use it for exact phrase searching?
|
|
Rank: Advanced Member
Groups: Administrators, Registered
Joined: 8/13/2004 Posts: 2,669 Location: Canada
|
The stoplist is used during indexing - any words on the stoplist are not forward indexed, which saves index size. At search time stoplist words are ignored. Since the point of the stoplist is to avoid putting words in the index, you can't be selective about when to apply it.
You could remove 'others' from the stoplist, and reindex everything. To do that just edit the stoplist.txt in a text editor. Just remember that whenever you create a fresh index, the default stoplist.txt is generated.
Frankly, unless your index is really big (like hundreds of thousands or millions of docs) you probably don't need to worry about using a stoplist at all and can just delete all of the words in it.
Best
Jim
-your feedback is helpful to other users, thank you!
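To make the behaviour described above concrete, here is a minimal toy model in Python. It is not Keyoti's implementation: the sample stoplist, the documents, and the function names `tokenize`, `build_index`, and `phrase_search` are all made up for illustration. It only assumes what the post states, namely that stoplist words are skipped when indexing and dropped from the query when searching.

```python
# Toy model only: a hypothetical stoplist and tiny in-memory index,
# not the real default stoplist.txt or the SearchUnit index format.
STOPLIST = {"others", "the", "and"}


def tokenize(text):
    """Split on whitespace, strip surrounding punctuation/quotes, lowercase."""
    return [w.strip('.,"!?').lower() for w in text.split()]


def build_index(docs, stoplist):
    """Map each non-stoplist word to {doc_id: [word positions]}."""
    index = {}
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            if word in stoplist:
                continue  # stoplist words never reach the index
            index.setdefault(word, {}).setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, phrase, stoplist):
    """Return ids of docs matching the phrase after dropping stoplist words."""
    terms = [(i, w) for i, w in enumerate(tokenize(phrase)) if w not in stoplist]
    if not terms:
        return []  # the whole query consisted of stoplist words
    first_offset, first_word = terms[0]
    hits = []
    for doc_id, positions in index.get(first_word, {}).items():
        for start in positions:
            # every remaining term must sit at its expected relative position
            if all(start + (off - first_offset) in index.get(w, {}).get(doc_id, [])
                   for off, w in terms[1:]):
                hits.append(doc_id)
                break
    return sorted(hits)


docs = {1: "Blessing the children", 2: "Blessing others daily"}
with_stoplist = build_index(docs, STOPLIST)
without_stoplist = build_index(docs, set())

# Stoplist used for indexing and searching: the phrase collapses to "blessing".
print(phrase_search(with_stoplist, '"blessing others"', STOPLIST))     # [1, 2]
# Index built without a stoplist, but the search side still applies one.
print(phrase_search(without_stoplist, '"blessing others"', STOPLIST))  # [1, 2]
# No stoplist on either side: only the true phrase match remains.
print(phrase_search(without_stoplist, '"blessing others"', set()))     # [2]
```

The middle case is the one to watch for: in this model, clearing the stoplist only on the indexing side is not enough, because the query still loses the word before it is matched.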
|
|
Rank: Advanced Member
Groups: Registered
Joined: 9/1/2010 Posts: 136
|
Thank you for the clarification. I'm of the same mind as your last suggestion, to remove all stoplist words. I don't want to jeopardize good results for phrase searching, and we're only dealing with tens of thousands of documents, so we should be fine with the size of the index.
Thanks again for your help!
|
|
Rank: Advanced Member
Groups: Registered
Joined: 9/1/2010 Posts: 136
|
In my app that uses PreloadedDocument to generate the index, is there a way to eliminate the stoplist?
With my plugin (the other index), I can delete all the contents of the stoplist.txt file before the documents are imported to generate the index.
|
|
Rank: Advanced Member
Groups: Administrators, Registered
Joined: 8/13/2004 Posts: 2,669 Location: Canada
|
Yep, see Clear Programmatically -> http://keyoti.com/produc...rGuide/Stop%20Lists.htm
|
|
Rank: Advanced Member
Groups: Registered
Joined: 9/1/2010 Posts: 136
|
Thank you for the link. I added the statement to my code as follows:
Dim ProdConfig As New Configuration
ProdConfig.IndexDirectory = IndexFolderPath
ProdConfig.Logging = True
Dim ConfigMgr As ConfigurationManager = New ConfigurationManager(IndexFolderPath)
ConfigMgr.SaveSettings(ProdConfig)
Dim ProdIndex As New Index.DocumentIndex(ProdConfig)
ProdConfig.StopWords.Clear()
I regenerated the index and it executed successfully. The index folder is slightly larger than it was before. There is still a stoplist.txt file with words in it, but I assume that the StopWords.Clear function only clears what's in memory.
However, when I do a sample search ("blessing others" including the quotes), the search results list appears to be ignoring the word "others" (which is included in the default stop list). I searched through all the data that is going into the index with another program, and that exact phrase doesn't occur at all. The phrase "bless others" does occur once, so I would expect there to be one search result.
Am I doing something wrong? Do you have any other suggestions?
Thanks!
|
|
Rank: Advanced Member
Groups: Administrators, Registered
Joined: 8/13/2004 Posts: 2,669 Location: Canada
|
In the codebehind of the search page, try this in Page_Load:
SearchResult1.Configuration.StopWords.Clear()
An alternative to this would be to write an empty stoplist.txt to the index directory when the index is first created:
File.Delete(Path.Combine(ProdConfig.IndexDirectory, "stoplist.txt"))
File.Create(Path.Combine(ProdConfig.IndexDirectory, "stoplist.txt")).Close()
Something like that; I'm not sure how Delete handles the situation where the file doesn't exist already.
Jim
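The "write an empty stoplist.txt" idea can also be done as a single create-or-truncate step, which sidesteps the question of whether the file already exists. The sketch below shows the idea in Python rather than .NET, purely as an illustration; `reset_stoplist` is a hypothetical helper name, and the only thing taken from the thread is the file name stoplist.txt inside the index directory.

```python
import os


def reset_stoplist(index_directory, filename="stoplist.txt"):
    """Create the stoplist file if it is missing, or empty it if it exists."""
    path = os.path.join(index_directory, filename)
    # Mode "w" creates the file when absent and truncates it when present,
    # so no separate delete step (or existence check) is needed.
    with open(path, "w", encoding="utf-8"):
        pass
    return path
```

In .NET, File.WriteAllText with an empty string gives the same create-or-overwrite behaviour in one call. For what it's worth, File.Delete is documented as not throwing when the file is missing, so the two-step version should also be safe as long as the index directory itself exists.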
|
|
Rank: Advanced Member
Groups: Registered
Joined: 9/1/2010 Posts: 136
|
That's what I decided to do. After the index is created, I delete the stoplist.txt file and recreate it without contents.
When I tried a sample search in the Index Manager Tool, it appeared that it was using the default stoplist, because it gave me a message about "others" being a common word when I tried "blessing others." That made me think that the stoplist is being used both in the creation of the index and in the search results.
|
|