
Duplicate results - SearchUnit - Forum

DMacy
#1 Posted : Thursday, August 10, 2017 9:53:20 PM
Rank: Advanced Member

Groups: Registered

Joined: 9/1/2010
Posts: 133
I am getting duplicate search hits for one of my search tests. In fact, I'm getting multiple instances of duplicates. I've looked at the generation of the index (which is done by a VB.NET program using PreloadedDocument objects to add to the index), and I don't think my code is adding duplicate documents. I've saved a CSV file of the custom data for the search hits immediately after they've been retrieved from SearchAgent.Search, and it contains the duplicates.

My search term contains three words, and the ImpliedLogicOperator is And.

Why would SearchUnit return multiple ResultItem objects that are identical (that is, point to the same document in the index)? Is this expected at times, so that I need to deal with it in my code?

Thanks!

Dan
Jim
#2 Posted : Friday, August 11, 2017 10:28:11 PM
Rank: Advanced Member

Groups: Administrators, Registered

Joined: 8/13/2004
Posts: 2,667
Location: Canada
Definitely not to be expected. Do the results have identical URLs: same protocol, same case? There's a config option for whether URLs are treated as case sensitive or not.
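To check whether two URLs that look the same differ only by protocol or case, something along these lines may help. It is a minimal sketch in Python (not VB.NET, and not part of the SearchUnit API); `url_key` and `explain_difference` are hypothetical helper names, and the `case_sensitive` flag only imitates the idea of a URL case-sensitivity setting.

```python
from urllib.parse import urlsplit


def url_key(url: str, case_sensitive: bool = False) -> str:
    """Build a comparison key for a URL (hypothetical helper)."""
    parts = urlsplit(url.strip())
    # Scheme and host are case-insensitive in URLs generally.
    scheme = parts.scheme.lower()
    host = parts.netloc.lower()
    # Path and query are folded to lower case only when asked to ignore case.
    path = parts.path if case_sensitive else parts.path.lower()
    query = parts.query if case_sensitive else parts.query.lower()
    return f"{scheme}://{host}{path}?{query}"


def explain_difference(a: str, b: str) -> str:
    """Describe how two URL strings relate to each other."""
    if a == b:
        return "identical"
    if urlsplit(a).scheme.lower() != urlsplit(b).scheme.lower():
        return "protocol differs"
    if url_key(a) == url_key(b):
        return "case differs"
    return "different URLs"
```

If every repeated pair comes back as "identical", protocol and case can be ruled out as the cause.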

Jim
-your feedback is helpful to other users, thank you!


DMacy
#3 Posted : Friday, August 11, 2017 10:40:10 PM
The config option URLCaseSensitive is false. The URLs are identical for the results that are being repeated three times in a row. Can I send you an email with steps to show you an example of what I'm talking about?

Dan
Jim
#4 Posted : Saturday, August 12, 2017 4:16:44 AM
Of course, please do.


Jim
#5 Posted : Saturday, August 12, 2017 7:29:39 PM
Thanks for sending me the info.

I’m not really sure what is causing it.

I see you are grouping by Book. How are you doing that?

Have you tried searching with our index management tool to see if you still get duplicates? I'm wondering if the processing you're doing for the grouping is causing the problem.

Have you tried Optimizing the index? That could help because it consolidates index slices and deletions.

Best
Jim


DMacy
#6 Posted : Monday, August 14, 2017 4:24:46 PM
Yes, I'm grouping by book, and I suspected my code first. But I looped through the Keyoti.SearchEngine.Search.SearchResult object's result items immediately after returning, and it listed three result items with the same URL.
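For anyone wanting to run the same check, here is a rough sketch of the idea in Python rather than VB.NET. It assumes the URLs have already been copied out of the result items into a plain list, in result order; `duplicate_positions` is a hypothetical helper, not a SearchUnit call.

```python
from collections import defaultdict


def duplicate_positions(urls, case_sensitive=False):
    """Map each repeated URL to the list of positions where it occurs.

    `urls` is assumed to be a plain list of URL strings taken from the
    search results, in the order they were returned.
    """
    positions = defaultdict(list)
    for index, url in enumerate(urls):
        # Fold case unless the comparison is meant to be case sensitive.
        key = url if case_sensitive else url.lower()
        positions[key].append(index)
    # Keep only URLs that occur more than once.
    return {key: where for key, where in positions.items() if len(where) > 1}
```

Consecutive positions for the same URL (for example 1, 2, 3) would match the "three times in a row" pattern described earlier in the thread.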

The index is optimized.

I have tried your index management tool, but with only a sample of 25 results returned, it doesn't hit this case, since there are thousands of results for the search terms I gave you to try.

I'll be glad to share more code with you or even the index, though it's about 1 GB. Let me know how I can help.

Best regards,
Dan
Jim
#7 Posted : Monday, August 14, 2017 7:30:16 PM
Quote:

I have tried your index management tool, but with only a sample of 25 results returned, it doesn't hit this case, since there are thousands of results for the search terms I gave you to try.


True, but if you use a specific phrase (in quotes) taken directly from the text, you should be able to narrow it down to one page. However, if you're seeing it in the SearchResult object 3 times, then it'll probably be the same in Index Manager.

If you could show the code you're using for indexing, that may help me; I will try to reproduce it here. My only initial thought is that something about the URL (special characters, for example) could be preventing the engine from seeing that it has already indexed that URL.
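One way to look for unusual characters in a URL is to list every character outside the printable ASCII range. This is only an illustrative Python sketch; `suspicious_chars` is a hypothetical helper, and the choice of "suspicious" range (anything below `!` or above `~`) is an assumption, not a rule taken from the search engine.

```python
import unicodedata


def suspicious_chars(url: str):
    """Return (position, code point, name) for each unusual character.

    "Unusual" here means whitespace, control characters, or anything
    outside printable ASCII; this threshold is an assumption.
    """
    found = []
    for index, ch in enumerate(url):
        if ord(ch) < 33 or ord(ch) > 126:
            # Control characters have no Unicode name, so supply a default.
            name = unicodedata.name(ch, "UNNAMED")
            found.append((index, f"U+{ord(ch):04X}", name))
    return found
```

An empty list means the URL contains only printable ASCII characters.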


DMacy
#8 Posted : Monday, August 14, 2017 7:38:14 PM
I assume you mean my VB.NET code that is generating the index. I'll send it by email.

I added a log entry to a CSV file I generate each time the index is generated, and it adds a line to the file every time a new PreloadedDocument is added. I searched that file and could only find one instance of the URL that is generating three hits.
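A log check like this can also be scripted instead of searched by hand. The sketch below is in Python and assumes the log is a CSV with a header row and a URL column named `Url`; that column name is hypothetical, since the actual log layout is not shown here.

```python
import csv
import io
from collections import Counter


def count_urls_in_log(csv_text: str, url_column: str = "Url") -> Counter:
    """Count how many times each URL appears in an indexing log.

    `csv_text` is the full text of the log; `url_column` is an assumed
    header name and should be changed to match the real file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Trim and lower-case so case or stray spaces do not hide repeats.
    return Counter(row[url_column].strip().lower() for row in reader)


def urls_logged_more_than_once(csv_text: str, url_column: str = "Url"):
    """Return only the URLs that were logged more than once."""
    counts = count_urls_in_log(csv_text, url_column)
    return {url: n for url, n in counts.items() if n > 1}
```

If `urls_logged_more_than_once` returns an empty dictionary for the whole log, no document was added twice according to the log.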

Best regards,
Dan
Jim
#9 Posted : Monday, August 14, 2017 11:11:12 PM
Quote:

I searched that file and could only find one instance of the URL that is generating three hits.


If you are creating an index from scratch, is what you said still true?


DMacy
#10 Posted : Tuesday, August 15, 2017 3:13:09 AM
Yes, I'm creating the index from scratch every time. And I optimize it at the end each time also. We are never adding new documents to it after it's originally generated.

Is there a way I could upload a ZIP or 7Z file of the index folder? Would that help?

Best regards,
Dan
Jim
#11 Posted : Tuesday, August 15, 2017 7:28:25 PM
Hopefully you'll get an email sharing our drop box folder; you should be able to upload it there, Dan.

It shouldn't matter, but have you checked if you're adding the document more than once?

Thanks
Jim


DMacy
#12 Posted : Tuesday, August 15, 2017 7:41:42 PM
Thanks, Jim. I'll upload a ZIP file shortly.

Yes, I add a record (line) to a log file each time a new PreloadedDocument is added to the index. I've searched that log file for the article in question, and it only appears once.

Best regards,
Dan

Copyright © 2002- Keyoti Inc.