
MSSQL Indexing Limits - SearchUnit - Forum

jkagerer
#1 Posted : Friday, September 5, 2014 7:08:31 PM
Rank: Member

Groups: Registered

Joined: 9/4/2014
Posts: 20
Hello, I've been using Keyoti Search for some time now with very good results. However, we now need to add a DB source with 1.8 million records to our search results.

I began the indexing with no paging in my query, and it successfully imported 573,500 items.

When I attempted to use paging, the Management Tool threw an exception. I believe this was caused by our Unique Field index starting at 8700 instead of 0.

My second attempt was to use paging but add 8700 to the page indexes. This ran without throwing any exceptions, but my first try at indexing 500 more pages didn't add any items to my results.

My third attempt was to add 573,500 to my paging index. I let the import run for 5,000 records. Again, this did not add any more items to my result set; it is still at 573,500 items.

Is there a limit to the number of items we can index? Our Index Text for each item is around 100 characters, so not very large.

Any help with this is appreciated.
Joe
Jim
#2 Posted : Friday, September 5, 2014 7:56:10 PM
Rank: Advanced Member

Groups: Administrators, Registered

Joined: 8/13/2004
Posts: 2,667
Location: Canada
Hi Joe, I'm looking into it, but can I quickly ask which version you are using please?

Jim

-your feedback is helpful to other users, thank you!


Jim
#3 Posted : Friday, September 5, 2014 8:21:05 PM
Rank: Advanced Member

Groups: Administrators, Registered

Joined: 8/13/2004
Posts: 2,667
Location: Canada
Joe, let me address some points please.

1. Yes, you are right: if you use the paging query from our Help,

SELECT * FROM Table1 WHERE id >= {0} AND id < {1}

then {0} is going to be zero-based, and your idea to add your starting value (8700) to it is correct. So that should have worked (see the sketch after point 3 below).

2. The purpose of paging is only to address the issue of moving a large data set from the DB to the indexer machine and holding it in memory. But since your initial attempt managed to index a third of your records without it, my guess is that paging isn't required.
Imagine trying to index a 20GB data set in one go: it would all have to be sent down the wire and held in memory before indexing could start.
Your index (assuming the chars are stored as 8 bytes each) is only about 120MB.

3. Given what I said above, and the fact that it got to 573,500 items with or without paging, I think something else is wrong, unrelated to paging. And to answer your question: there are no limits beyond what is practically possible with the hardware, so it should be able to handle your index OK.
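
To make that concrete, here is a rough sketch (illustration only, not the Management Tool's actual code) of how the zero-based {0}/{1} placeholders expand page by page when the key starts at 8700; the page size used here is just an example value.

using System;

class PagingSketch
{
    static void Main()
    {
        // The Help's paging query, with the table's starting id (8700) added in.
        // The string.Format substitution below only illustrates the idea of the
        // zero-based {0}/{1} placeholders; it is not SearchUnit's internal code.
        string template = "SELECT * FROM Table1 WHERE id >= (8700 + {0}) AND id < (8700 + {1})";
        int pageSize = 500; // hypothetical page size, chosen only for this example

        for (int page = 0; page < 3; page++)
        {
            int lower = page * pageSize;       // {0} is zero based
            int upper = (page + 1) * pageSize; // {1} is the exclusive upper bound
            Console.WriteLine(string.Format(template, lower, upper));
        }
        // Prints:
        // SELECT * FROM Table1 WHERE id >= (8700 + 0) AND id < (8700 + 500)
        // SELECT * FROM Table1 WHERE id >= (8700 + 500) AND id < (8700 + 1000)
        // SELECT * FROM Table1 WHERE id >= (8700 + 1000) AND id < (8700 + 1500)
    }
}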

Questions for you:
When it stopped at 573,500, was that because you stopped it? Did it just seem like it had stalled?

If it stopped itself, then it would be useful to enable logging in the configuration and run it again, then send me all of the .txt files (via support at keyoti.com).

Best
Jim



-your feedback is helpful to other users, thank you!


jkagerer
#4 Posted : Friday, September 5, 2014 8:24:16 PM
Rank: Member

Groups: Registered

Joined: 9/4/2014
Posts: 20
Thank you for your quick reply; I have some positive results.

My Keyoti4.SearchEngine.Web.dll is Product Version 2012.5.13.424. I'm using the Professional License with a Plugin DLL and Custom Data.

Using paging on my 4th attempt has now increased my results list. I have now indexed 575,995 items.

My PartItemId index starts at 8700 and I had 573,500 results, so I had to start my paging at "where PartItemId >= (582200 + {0}) AND PartItemId < (582200 + {1})".

I ran this query and imported 2500 more records. I will run the new query over the weekend and see how far we get.
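
For reference, the 582200 constant is just the starting PartItemId plus the number of items already indexed; a quick sketch of that arithmetic (values taken from this thread):

using System;

// Resume offset used in the query above; values come from this thread.
int firstPartItemId = 8700;   // lowest PartItemId in the table
int alreadyIndexed = 573500;  // items indexed before paging was resumed
int resumeAt = firstPartItemId + alreadyIndexed;
Console.WriteLine(resumeAt);  // 582200, the constant added to {0} and {1}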

Joe
jkagerer
#5 Posted : Friday, September 5, 2014 8:31:06 PM
Rank: Member

Groups: Registered

Joined: 9/4/2014
Posts: 20
Hi Jim, Thanks for your reply.

You asked: "When it stopped at 573,500, was that because you stopped it? Did it just seem like it had stalled?"


The import had been running for more than 12 hours; my estimate was that it would take less than 7 hours to complete. I watched it for a while to see if it would increase. It seemed like it was stalled, so I stopped it.

I will turn on logging when I start it again this evening.
Thanks again for your help on this. Keyoti seems to be very quick with 500,000 results.

Joe
Jim
#6 Posted : Friday, September 5, 2014 8:47:20 PM
Rank: Advanced Member

Groups: Administrators, Registered

Joined: 8/13/2004
Posts: 2,667
Location: Canada
Thanks for your replies. Logging will slow it down, by the way.

It has to periodically merge index files together, so my guess is that it was probably doing this, which is why it appeared stalled. It may just be that it needs more time.

On the subject of speed, one thing that can negatively affect it is unique 'words'. For example, if your DB records have fields like times or dates, or usernames, or other things that are unique, then they can bloat the lexicon. If each of your 1.8 million records has a unique word in it, the lexicon will be much bigger and will take longer to work with.

This is from the Help, on Optimization:

Windows Explorer

Due to the way Windows Explorer works, and the fact that the search engine writes/deletes lots of files during indexation, it is advisable to close all instances of Windows Explorer. If an instance of Explorer has been used to view the index directory and then is pointed to a different directory, it can still have a substantial negative impact on indexing speed.

Minimizing Unnecessary Indexing

To ensure optimum search results and performance, you should index only those files that you wish to be searchable.
For example, if you have a very large repository of observational data (e.g. text files filled with records from flight or weather data, etc., that contain many different 'word' strings), then indexing it may cause slowdowns for certain searches (wildcards) and for any index change operations (adding/removing, etc.). Although Search can handle this type of data, any unnecessary indexing should be avoided if possible.
Hint: the Configuration.IndexNumbers property can also be set to false so that numbers are not indexed or searchable.

This also applies to binary files. Although binary file types are ignored by default, this only happens if the server sends the correct mime-type with the response. E.g. if the server response for a .wmv file is "text/plain", then the binary content will be parsed as text, which will mostly be garbage and will fill the index unnecessarily. The indexer deliberately includes everything it finds (except for a specific stop-list) because it is dangerous to use heuristics to identify words which would 'never be searched' and therefore shouldn't be indexed; e.g. company/product names with symbols in them shouldn't be ignored.

Forward Index
By default Search will create a Forward Index which allows for result summary generation and result preview features. To speed up indexing, especially with large indexes, you can set Config.CreateForwardIndex=false. This will mean you can't use Config.ResultSummaryType=Dynamic or SearchAgent.GetDocumentText (ResultPreview). StaticBodyStart will be automatically used for ResultSummaryType and can be changed to StaticMetaDescription manually.
Note: Any change to CreateForwardIndex will require the index to be deleted and recreated.
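
As a rough sketch of applying those optimization settings in code (the property names are taken from the Help text above, but how the Configuration object is created is an assumption here, so check the documentation for the exact API):

// Sketch only: IndexNumbers and CreateForwardIndex are named in the Help above;
// the construction of the Configuration object is an assumption for illustration,
// and this requires a reference to the SearchUnit assembly.
var config = new Configuration();

config.IndexNumbers = false;        // don't index/search numbers (per the Hint above)
config.CreateForwardIndex = false;  // speeds up indexing of a large index, but disables
                                    // ResultSummaryType=Dynamic and result previews;
                                    // StaticBodyStart is then used automatically

// Reminder from the Help: changing CreateForwardIndex requires the index to be
// deleted and recreated.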



Also, we can discuss keeping the index up to date: you probably won't want to reindex everything each time, so you'll want to update the index as CRUD operations occur...

Jim

-your feedback is helpful to other users, thank you!


jkagerer
#7 Posted : Monday, September 8, 2014 7:05:17 PM
Rank: Member

Groups: Registered

Joined: 9/4/2014
Posts: 20
Jim,
Thanks for your help.
On Thursday and Friday I was able to index 600,000 DB records. On Friday evening I restarted the indexing and it ran for 60 hours, but it only indexed 250,000 new records.

I assume that as the index gets larger, the indexing slows down. Is this correct?
Joe
Jim
#8 Posted : Tuesday, September 9, 2014 3:09:59 PM
Rank: Advanced Member

Groups: Administrators, Registered

Joined: 8/13/2004
Posts: 2,667
Location: Canada
Yes, it does slow down as it gets larger and has to merge larger files. One factor in that is the lexicon size, as I mentioned above.

-your feedback is helpful to other users, thank you!

