
Performance - Thesaurus Desktop .NET - Forum

lenwhite
#1 Posted : Thursday, January 25, 2007 1:37:04 PM
Rank: Member

Groups: Registered

Joined: 9/13/2006
Posts: 52
I have a batch process where I query GetRelatedWords, each time taking the first word in the returned array. I am interested only in the first word, in an effort to get something I can treat as a stem. This is performed over large amounts of data, and performance has become an issue.

Profiling my application shows that most of the time is spent comparing strings for equality inside your GetRelatedWords method. Is there anything I can do to speed this up? Or given what I am doing, is there a way that the spell checker component could be used instead? I am using the .NET 1.1 version but my application runs on .NET 2.0 if that makes a difference.

Thanks for any assistance.
Jim
#2 Posted : Thursday, January 25, 2007 4:06:59 PM
Rank: Advanced Member

Groups: Administrators, Registered

Joined: 8/13/2004
Posts: 2,667
Location: Canada
Hi Len, if you really want stems, then what I'd suggest is the following (and maybe we should charge for this!)

Download our search engine product http://keyoti.com/products/search/dotnetweb/ and install it. Inside you'll find a DLL called Keyoti.Text.LemmaGenerator.dll. You can use this to generate stems via its class:

Keyoti.Text.LemmaGenerator.Lemmas

which has a method

public string[] GetLemmas(string word)
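
For a batch job like the one in post #1, a call like this could be wrapped in a small cache so that each distinct word is looked up only once. This is a minimal sketch, not Keyoti's code: `LemmaLookup` and `StemCache` are hypothetical names, the delegate merely stands in for `GetLemmas`, and falling back to the input word when no lemma comes back is an assumption about how unknown words should be treated.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical delegate with the same shape as GetLemmas(string word).
public delegate string[] LemmaLookup(string word);

// Remembers the first lemma returned for each distinct word, so repeated
// words in a batch cost one dictionary probe instead of one library call.
public sealed class StemCache
{
    private readonly LemmaLookup lookup;
    private readonly Dictionary<string, string> stems =
        new Dictionary<string, string>();

    public StemCache(LemmaLookup lookup)
    {
        if (lookup == null) throw new ArgumentNullException("lookup");
        this.lookup = lookup;
    }

    public string GetStem(string word)
    {
        string stem;
        if (stems.TryGetValue(word, out stem)) return stem;

        string[] lemmas = lookup(word);
        // Assumption: if nothing is returned, treat the word as its own stem.
        stem = (lemmas != null && lemmas.Length > 0) ? lemmas[0] : word;
        stems[word] = stem;
        return stem;
    }
}
```

Assuming `lemmas` is an instance of `Keyoti.Text.LemmaGenerator.Lemmas` (how it is constructed is not shown in this thread), the cache would be created with `new StemCache(lemmas.GetLemmas)` and then queried with `GetStem(word)` inside the batch loop.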


_Technically_ speaking you don't need to license this DLL or class - it will run without serials or keys.

I'd recommend this over the thesaurus's GetRelatedWords for two reasons.

1. The thesaurus only returns stems for words it _knows_. It doesn't know the whole English language because not every word has a synonym (and its point is to return words it does know synonyms for). The search engine version, on the other hand, has a 110K word lexicon (about twice the size).

2. I can't guarantee it'll run faster, but the data in the thesaurus is not optimized for looking up lemmas (shared stems); it's optimized for looking up synonyms, whereas the lemma generator is optimized for its sole purpose.

If you don't want to get involved in that, then a quick search at codeproject will give you a stemmer class. What you'll also need is a word list, which may be trickier to license.

Hope that's helpful!
Jim
-your feedback is helpful to other users, thank you!


lenwhite
#3 Posted : Sunday, June 24, 2007 10:36:43 PM
The lemma generator solved my problem. Its performance is tremendous. Thanks.

I now have new features that do require synonyms, and the thesaurus is dramatically slower than the lemma generator. Is there anything I can do to speed it up?
Jim
#4 Posted : Monday, June 25, 2007 2:20:06 PM
Hi Len,
It's always been fast enough for its purpose (backing a UI, creating synonyms for 1 word at a time) - so no, there's not really anything you can do.

I assume you need synonyms for lots of words at once, rather than just one at a time like us?

Jim


lenwhite
#5 Posted : Monday, June 25, 2007 2:32:35 PM
Yes, that is correct. But what actually gave rise to the issue is something closer to the intended purpose. I have a ListView control where, as the user moves through it, a separate list of synonyms is updated. It is much slower than it should be. My profiler reports that the call to GetAllSynonyms accounts for approximately 80% of the processor time. It is fine if I have the user right-click to bring up a list of synonyms, but the user is slowed way down if they scroll through the ListView control.
Jim
#6 Posted : Monday, June 25, 2007 3:17:20 PM
I can imagine if your user is flipping through items in a list it could make the UI less responsive.

A couple of tips:
1. GetAllSynonyms loads the resource file if it's not already loaded - so don't create a new ThesaurusEngine unnecessarily
2. Use a new thread to call GetAllSynonyms
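
The two tips above could be combined roughly as follows. This is only a sketch under stated assumptions: `SynonymLookup`, `SynonymsReady` and `BackgroundSynonyms` are hypothetical names, the delegate stands in for a `GetAllSynonyms` call on one long-lived ThesaurusEngine (its real signature may differ), and the lock assumes the engine is not known to be safe for concurrent calls. Dropping results for requests the user has already scrolled past is an extra step beyond the tips, added because ListView selections can change faster than lookups finish.

```csharp
using System;
using System.Threading;

// Hypothetical stand-ins; the real thesaurus signature may differ.
public delegate string[] SynonymLookup(string word);
public delegate void SynonymsReady(string word, string[] synonyms);

public sealed class BackgroundSynonyms
{
    private readonly SynonymLookup lookup;   // bound once to ONE shared engine
    private readonly object gate = new object();
    private int latestRequest;               // id of the newest request

    public BackgroundSynonyms(SynonymLookup lookup)
    {
        if (lookup == null) throw new ArgumentNullException("lookup");
        this.lookup = lookup;
    }

    // Queues the lookup on a thread-pool thread and returns immediately,
    // so the calling (UI) thread is never blocked by the thesaurus.
    public void Request(string word, SynonymsReady onReady)
    {
        int id = Interlocked.Increment(ref latestRequest);
        ThreadPool.QueueUserWorkItem(delegate(object state)
        {
            string[] result;
            // Assumption: serialize calls in case the engine is not thread-safe.
            lock (gate) { result = lookup(word); }

            // Report only if no newer request arrived in the meantime.
            if (id == Interlocked.CompareExchange(ref latestRequest, 0, 0))
                onReady(word, result);
        });
    }
}
```

The callback runs on a thread-pool thread, so a Windows Forms application would need to marshal back to the UI thread (for example with `Control.BeginInvoke`) before touching the synonym list. The important part of tip 1 is that the `BackgroundSynonyms` object, and the engine behind it, is created once and kept for the lifetime of the form rather than per selection change.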

Jim


lenwhite
#7 Posted : Monday, June 25, 2007 4:43:02 PM
You nailed it! I was creating a ThesaurusEngine each time.

Performance is great now.

Thanks!
lenwhite
#8 Posted : Saturday, November 1, 2008 8:16:08 PM
Hi Jim. I've been using your lemma generator DLL for some time. I have a new computer and need to set it up. It appears that the licensing for the product has changed since then. What do I need to do to use it?

