Stoplists - SearchUnit - Forum

DMacy
#1 Posted : Thursday, December 11, 2014 12:01:56 PM
Rank: Advanced Member

Groups: Registered

Joined: 9/1/2010
Posts: 133
I am entering "blessing others" in my search box, with double quotes before and after (an exact phrase search). I am getting a list of results that contain the word "blessing" but not "others." I looked in the text of the file for the first hit, and it does not contain the word "others."

I found a stoplist file that contains the word "others" but I don't see how it is being used by MyKeyotiResultsControl.SearchAgent.Search. I have verified that the text being passed to SearchAgent.Search is "blessing others" including the quotes.

How is a stoplist being applied to the searching, and is there a way to not use it for exact phrase searching?
Jim
#2 Posted : Thursday, December 11, 2014 1:17:43 PM
Rank: Advanced Member

Groups: Administrators, Registered

Joined: 8/13/2004
Posts: 2,667
Location: Canada
The stoplist is used during indexing - any words on the stoplist are not forward indexed, which saves index size. At search time stoplist words are ignored.

Since the point of the stoplist is to avoid putting words in the index, you can't be selective about when to apply it.
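To make the effect on phrase searches concrete, here is a minimal toy sketch in Python. It is not SearchUnit's implementation; the `STOPLIST` contents, `tokenize`, and `phrase_matches` are hypothetical names made up for illustration. It only shows why a quoted phrase containing a stoplist word can match documents that hold just the remaining word.

```python
import re

# Hypothetical sample stoplist for illustration; not the product's real default list.
STOPLIST = {"others", "the", "and"}

def tokenize(text, stoplist):
    """Lowercase the text, split it into words, and drop any word on the stoplist."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in stoplist]

def phrase_matches(doc_text, phrase, stoplist):
    """Return True if the filtered phrase occurs as a consecutive run in the filtered document."""
    doc = tokenize(doc_text, stoplist)
    query = tokenize(phrase, stoplist)
    if not query:
        return False
    n = len(query)
    return any(doc[i:i + n] == query for i in range(len(doc) - n + 1))

doc = "A blessing is something you share."

# With the stoplist, the phrase shrinks to just "blessing", so the document matches.
print(phrase_matches(doc, '"blessing others"', STOPLIST))  # True

# With an empty stoplist, both words are required in sequence, so it does not match.
print(phrase_matches(doc, '"blessing others"', set()))     # False
```

In this toy model the stoplist word vanishes from both the document and the query, which is the same symptom described in post #1.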

You could remove 'others' from the stoplist, and reindex everything. To do that just edit the stoplist.txt in a text editor. Just remember that whenever you create a fresh index, the default stoplist.txt is generated.

Frankly, unless your index is really big (like hundreds of thousands or millions of docs) you probably don't need to worry about using a stoplist at all and can just delete all of the words in it.

Best
Jim

-your feedback is helpful to other users, thank you!

DMacy
#3 Posted : Thursday, December 11, 2014 1:36:12 PM
Rank: Advanced Member

Groups: Registered

Joined: 9/1/2010
Posts: 133
Thank you for the clarification. I'm of the same mind as your last suggestion, to remove all stoplist words. I don't want to jeopardize good results for phrase searching, and we're only dealing with tens of thousands of documents, so we should be fine with the size of the index.

Thanks again for your help!
DMacy
#4 Posted : Thursday, December 11, 2014 1:49:57 PM
Rank: Advanced Member

Groups: Registered

Joined: 9/1/2010
Posts: 133
In my app that uses PreloadedDocument to generate the index, is there a way to eliminate the stoplist?

With my plugin (the other index), I can delete all the contents of the stoplist.txt file before the documents are imported to generate the index.
Jim
#5 Posted : Thursday, December 11, 2014 3:08:20 PM
Rank: Advanced Member

Groups: Administrators, Registered

Joined: 8/13/2004
Posts: 2,667
Location: Canada
Yep, see Clear Programmatically -> http://keyoti.com/produc...rGuide/Stop%20Lists.htm


DMacy
#6 Posted : Thursday, December 11, 2014 4:28:58 PM
Rank: Advanced Member

Groups: Registered

Joined: 9/1/2010
Posts: 133
Thank you for the link. I added the statement to my code as follows:

' Point the configuration at the index folder and turn on logging
Dim ProdConfig As New Configuration
ProdConfig.IndexDirectory = IndexFolderPath
ProdConfig.Logging = True

' Save the settings to the index folder
Dim ConfigMgr As ConfigurationManager = New ConfigurationManager(IndexFolderPath)
ConfigMgr.SaveSettings(ProdConfig)

Dim ProdIndex As New Index.DocumentIndex(ProdConfig)

' Statement added from the linked page
ProdConfig.StopWords.Clear()

I regenerated the index and it executed successfully. The index folder is slightly larger than it was before. There is still a stoplist.txt file with words in it, but I assume that the StopWords.Clear function only clears what's in memory.

However, when I do a sample search ("blessing others" including the quotes), the search results list appears to be ignoring the word "others" (which is included in the default stop list). I searched through all the data that is going into the index with another program, and that exact phrase doesn't occur at all. The phrase "bless others" does occur once, so I would expect there to be one search result.

Am I doing something wrong? Do you have any other suggestions?

Thanks!
Jim
#7 Posted : Thursday, December 11, 2014 7:22:31 PM
Rank: Advanced Member

Groups: Administrators, Registered

Joined: 8/13/2004
Posts: 2,667
Location: Canada
In the code-behind of the search page, try this (in Page_Load):

SearchResult1.Configuration.StopWords.Clear()

An alternative would be to write an empty stoplist.txt to the index directory when the index is first created:

File.Delete(Path.Combine(ProdConfig.IndexDirectory, "stoplist.txt"))
File.Create(Path.Combine(ProdConfig.IndexDirectory, "stoplist.txt")).Close()

Something like that; I'm not sure how Delete handles the situation where the file doesn't already exist.
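On the open question above: .NET's File.Delete is documented not to throw when the target file is missing (it does throw if the directory itself does not exist), and File.WriteAllText with an empty string would create or overwrite the file in a single call. The same "create or truncate" idea is sketched below in Python as a language-neutral illustration. The `write_empty_stoplist` helper and the temporary folder are hypothetical stand-ins, not part of SearchUnit.

```python
from pathlib import Path
import tempfile

def write_empty_stoplist(index_dir):
    """Create stoplist.txt in index_dir if missing, or truncate it if present."""
    path = Path(index_dir) / "stoplist.txt"
    path.write_text("")  # write mode creates the file or replaces its contents
    return path

# A temporary folder stands in for the real index directory (an assumption).
with tempfile.TemporaryDirectory() as index_dir:
    p = write_empty_stoplist(index_dir)   # file did not exist yet: created empty
    p.write_text("others\n")              # simulate a regenerated default stoplist
    p = write_empty_stoplist(index_dir)   # file exists: truncated to zero bytes
    print(p.stat().st_size)               # 0
```

Writing the empty file directly avoids the separate delete step, so there is no missing-file case to handle.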

Jim


DMacy
#8 Posted : Thursday, December 11, 2014 8:58:56 PM
Rank: Advanced Member

Groups: Registered

Joined: 9/1/2010
Posts: 133
That's what I decided to do. After the index is created, I delete the stoplist.txt file and recreate it without contents.

When I tried a sample search in the Index Manager Tool, it appeared that it was using the default stoplist, because it gave me a message about "others" being a common word when I tried "blessing others." That made me think that the stoplist is being used both in the creation of the index and in the search results.
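The observation that the stoplist seems to act on both sides can be illustrated with a small toy model in Python. This is a sketch under assumptions, not SearchUnit's code: `build_index`, `phrase_search`, and the sample documents are hypothetical. It shows that an index built without a stoplist still returns loose matches if the query is filtered through one.

```python
import re

def words(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(docs, index_stoplist):
    """Map each document id to its word list, omitting index-time stoplist words."""
    return {doc_id: [w for w in words(text) if w not in index_stoplist]
            for doc_id, text in docs.items()}

def phrase_search(index, phrase, query_stoplist):
    """Return ids of documents containing the query words as a consecutive run."""
    query = [w for w in words(phrase) if w not in query_stoplist]
    n = len(query)
    hits = []
    for doc_id, toks in index.items():
        if n and any(toks[i:i + n] == query for i in range(len(toks) - n + 1)):
            hits.append(doc_id)
    return hits

# Hypothetical sample documents.
docs = {1: "Count every blessing.", 2: "We are blessing others today."}

# Index built with an empty stoplist, so "others" is present in the index.
full_index = build_index(docs, index_stoplist=set())

# Query side still filters "others": the phrase degrades to "blessing".
print(phrase_search(full_index, '"blessing others"', {"others"}))  # [1, 2]

# Both sides cleared: only the document with the exact phrase matches.
print(phrase_search(full_index, '"blessing others"', set()))       # [2]
```

In this model, clearing the stoplist only at index time is not enough; the search side has to use the same empty list for exact phrases to behave as expected.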

Copyright © 2002- Keyoti Inc.