|
Rank: Advanced Member
Groups: Registered
Joined: 9/1/2010 Posts: 136
|
I am getting duplicate search hits for one of my search tests. In fact, I'm getting multiple instances of duplicates. I've looked at the generation of the index (which is done by a VB.NET program using PreloadedDocument objects to add to the index), and I don't think my code is adding duplicate documents. I've saved a CSV file of the custom data for the search hits immediately after they've been retrieved from SearchAgent.Search, and it contains the duplicates.
My search term contains three words, and the ImpliedLogicOperator is And.
Why would SearchUnit return multiple ResultItem objects that are identical (that point to the same document in the index)? Is this expected at times, so that I need to deal with it in my code?
Thanks!
Dan
|
|
Rank: Advanced Member
Groups: Administrators, Registered
Joined: 8/13/2004 Posts: 2,669 Location: Canada
|
Definitely not to be expected. Do the results have identical URLs, with the same protocol and the same case? There's a config option for whether URLs are treated as case sensitive or not.
Jim
-your feedback is helpful to other users, thank you!
|
|
Rank: Advanced Member
Groups: Registered
Joined: 9/1/2010 Posts: 136
|
The config option URLCaseSensitive is false. The URLs are identical for the results that are being repeated three times in a row. Can I send you an email with steps to show you an example of what I'm talking about?
Dan
|
|
Rank: Advanced Member
Groups: Administrators, Registered
Joined: 8/13/2004 Posts: 2,669 Location: Canada
|
Of course, please do.
|
|
Rank: Advanced Member
Groups: Administrators, Registered
Joined: 8/13/2004 Posts: 2,669 Location: Canada
|
Thanks for sending me the info. I'm not really sure what is causing it. I see you are grouping by Book. How are you doing that? Have you tried searching with our index management tool to see if you still get duplicates? I'm wondering if the processing you're doing for the grouping is causing the problem.
Have you tried optimizing the index? That could help because it consolidates index slices and deletions.
Best, Jim
|
|
Rank: Advanced Member
Groups: Registered
Joined: 9/1/2010 Posts: 136
|
Yes, I'm grouping by book, and I suspected my code first. But I looped through the Keyoti.SearchEngine.Search.SearchResult object's result items immediately after returning, and it listed three result items with the same URL.
The index is optimized.
I have tried your index management tool, but with only a sample of 25 results returned, it doesn't hit this case, since there are thousands of results for the search terms I gave you to try.
I'll be glad to share more code with you or even the index, though it's about 1 GB. Let me know how I can help.
Best regards, Dan
|
|
Rank: Advanced Member
Groups: Administrators, Registered
Joined: 8/13/2004 Posts: 2,669 Location: Canada
|
Quote: I have tried your index management tool, but with only a sample of 25 results returned, it doesn't hit this case, since there are thousands of results for the search terms I gave you to try.
True, but if you use a specific phrase (in quotes) taken directly from the text, you should be able to narrow it down to one page. However, if you're seeing it in the SearchResult object 3 times, then it'll probably be the same in Index Manager. If you could show the code you're using for indexing, that may help me; I will try to reproduce it here. My only thought initially: could something about the URL (special characters, for example) be preventing the engine from seeing that it has already indexed that URL?
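One way to test the "special chars" idea outside the engine is to compare the raw URL strings after a rough normalization. The following is only a sketch in Python, not the engine's own comparison logic: the helper names `canonical_url` and `hidden_differences` are hypothetical, and the normalization steps (percent-decoding, Unicode NFC, trimming, lower-casing) are assumptions about what might make two URLs look identical while differing as strings.

```python
import unicodedata
from urllib.parse import unquote


def canonical_url(url):
    """Rough canonical form: trim, decode percent-escapes, apply NFC, lower-case.

    This is an assumed normalization for diagnosis only; it does not claim to
    mirror how the search engine compares URLs.
    """
    decoded = unquote(url.strip())
    return unicodedata.normalize("NFC", decoded).lower()


def hidden_differences(urls):
    """Return groups of distinct raw URLs that share one canonical form."""
    groups = {}
    for url in urls:
        groups.setdefault(canonical_url(url), set()).add(url)
    # Keep only canonical forms reached by more than one distinct raw string.
    return {key: sorted(raw) for key, raw in groups.items() if len(raw) > 1}


if __name__ == "__main__":
    # Hypothetical sample URLs, not taken from the real index.
    sample = [
        "http://example.com/My%20Book",
        "http://example.com/my book",
        "http://example.com/other",
    ]
    for canonical, raw_forms in hidden_differences(sample).items():
        print(canonical, "<-", raw_forms)
```

If this reports no groups for the URLs in question, the strings really are identical and the cause is more likely elsewhere.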
|
|
Rank: Advanced Member
Groups: Registered
Joined: 9/1/2010 Posts: 136
|
I assume you mean my VB.NET code that is generating the index. I'll send it by email.
I added a log entry to a CSV file I generate each time the index is generated, and it adds a line to the file every time a new PreloadedDocument is added. I searched that file and could only find one instance of the URL that is generating three hits.
Best regards, Dan
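The log check described above can also be automated rather than done by searching the file by hand. This is a minimal sketch in Python, assuming the CSV log has a header row and a column named `Url`; both the column name and the helper names are hypothetical and would need adjusting to the real log layout. URLs are compared case-insensitively, matching the URLCaseSensitive = false setting mentioned earlier in the thread.

```python
import csv
import io
from collections import Counter


def count_logged_urls(csv_text, url_column="Url"):
    """Count how often each URL appears in the log text.

    URLs are trimmed and lower-cased before counting, so entries that differ
    only by case or surrounding whitespace are counted together.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        counts[row[url_column].strip().lower()] += 1
    return counts


def repeated_urls(csv_text, url_column="Url"):
    """Return only the URLs that were logged more than once."""
    counts = count_logged_urls(csv_text, url_column)
    return {url: n for url, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical log content; a real run would read the generated CSV file.
    sample_log = (
        "Url,Title\n"
        "http://example.com/a,A\n"
        "HTTP://EXAMPLE.COM/a ,A again\n"
        "http://example.com/b,B\n"
    )
    print(repeated_urls(sample_log))
```

An empty result for the real log would support the claim that each document is added only once during index generation.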
|
|
Rank: Advanced Member
Groups: Administrators, Registered
Joined: 8/13/2004 Posts: 2,669 Location: Canada
|
Quote: I searched that file and could only find one instance of the URL that is generating three hits.
If you are creating an index from scratch, is what you said still true?
|
|
Rank: Advanced Member
Groups: Registered
Joined: 9/1/2010 Posts: 136
|
Yes, I'm creating the index from scratch every time. And I optimize it at the end each time also. We are never adding new documents to it after it's originally generated.
Is there a way I could upload a ZIP or 7Z file of the index folder? Would that help?
Best regards, Dan
|
|
Rank: Advanced Member
Groups: Administrators, Registered
Joined: 8/13/2004 Posts: 2,669 Location: Canada
|
Hopefully you'll get an email sharing our Dropbox folder; you should be able to upload it there, Dan. It shouldn't matter, but have you checked whether you're adding the document more than once?
Thanks, Jim
|
|
Rank: Advanced Member
Groups: Registered
Joined: 9/1/2010 Posts: 136
|
Thanks, Jim. I'll upload a ZIP file shortly.
Yes, I add a record (line) to a log file each time a new PreloadedDocument is added to the index. I've searched that log file for the article in question, and it only appears once.
Best regards, Dan
|
|