|
Rank: Member
Groups: Registered
Joined: 9/13/2006 Posts: 52
|
I have a batch process where I query GetRelatedWords, each time taking the first word in the returned array. I am interested only in the first word, in an effort to get something I can treat as a stem. This is performed over large amounts of data, and performance has become an issue.
Profiling my application shows that most of the time is spent comparing strings for equality inside your GetRelatedWords method. Is there anything I can do to speed this up? Or given what I am doing, is there a way that the spell checker component could be used instead? I am using the .NET 1.1 version but my application runs on .NET 2.0 if that makes a difference.
Thanks for any assistance.
|
|
Rank: Advanced Member
Groups: Administrators, Registered
Joined: 8/13/2004 Posts: 2,669 Location: Canada
|
Hi Len, if you really want stems, then what I'd suggest is the following (and maybe we should charge for this!)
Download our search engine product http://keyoti.com/products/search/dotnetweb/ and install it. Inside you'll find a DLL called Keyoti.Text.LemmaGenerator.dll. You can use this to generate stems via its class -
Keyoti.Text.LemmaGenerator.Lemmas
which has a method
public string[] GetLemmas(string word)
_Technically_ speaking you don't need to license this DLL or class - it will run without serials or keys.
I'd recommend this over the thesaurus's GetRelatedWords for two reasons.
1. The thesaurus only returns stems for words it _knows_. It doesn't know the whole English language because not every word has a synonym (and its point is to return words it does know synonyms for). The search engine version, on the other hand, has a 110K word lexicon (about twice the size).
2. I can't guarantee it'll run faster, but the data in the thesaurus is not optimized for looking up lemmas (shared stems), it's optimized for looking up synonyms - whereas the lemma generator is optimized for its sole purpose.
If you don't want to get involved in that, then a quick search at codeproject will give you a stemmer class. What you'll also need is a word-list, which may be more tricky to license.
Hope that's helpful!
Jim -your feedback is helpful to other users, thank you!
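For a batch job like the one described, where the same words tend to recur, the first-lemma lookup could also be memoized so that each distinct word reaches the lemma generator only once. What follows is a minimal C# sketch, not Keyoti code: the `LemmaLookup` delegate and the `StemCache` class are hypothetical names introduced for illustration, and the delegate is assumed to be bound to the `GetLemmas(string word)` method quoted above. How `Keyoti.Text.LemmaGenerator.Lemmas` is constructed is not shown in this thread, so that wiring is left out. Falling back to the word itself when no lemma comes back is also an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical delegate (an assumption, not part of the Keyoti API).
// It is meant to be bound to the GetLemmas(string word) method.
public delegate string[] LemmaLookup(string word);

// Illustrative helper: remembers the first lemma returned for each word.
public class StemCache
{
    private readonly LemmaLookup lookup;
    private readonly Dictionary<string, string> cache =
        new Dictionary<string, string>();

    public StemCache(LemmaLookup lookup)
    {
        if (lookup == null) throw new ArgumentNullException("lookup");
        this.lookup = lookup;
    }

    // Returns the first lemma for the word, or the word itself
    // when the lookup returns nothing (an assumed fallback).
    public string GetStem(string word)
    {
        string stem;
        if (cache.TryGetValue(word, out stem))
        {
            return stem; // already resolved, no second lookup
        }
        string[] lemmas = lookup(word);
        stem = (lemmas != null && lemmas.Length > 0) ? lemmas[0] : word;
        cache[word] = stem;
        return stem;
    }
}
```

The cache is not thread-safe, and it only pays off when the input repeats words; for mostly unique input it just adds memory.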
|
|
Rank: Member
Groups: Registered
Joined: 9/13/2006 Posts: 52
|
The lemma generator solved my problem. Its performance is tremendous. Thanks.
I now have new features that do require synonyms. The thesaurus is dramatically slower than the lemma generator. Is there anything I can do to speed it up?
|
|
Rank: Advanced Member
Groups: Administrators, Registered
Joined: 8/13/2004 Posts: 2,669 Location: Canada
|
Hi Len, it's always been fast enough for its purpose (backing a UI, creating synonyms for 1 word at a time) - so no, there's not really anything you can do. I assume you need synonyms for lots of words at once, rather than just one at a time as we do? Jim
|
|
Rank: Member
Groups: Registered
Joined: 9/13/2006 Posts: 52
|
Yes, that is correct. But actually what gave rise to the issue is something closer to the intended purpose. I have a ListView control where, as the user moves through it, a separate list of synonyms is updated. It is much slower than it should be. My profiler reports that the call to GetAllSynonyms accounts for approximately 80% of the processor time. It is fine if I have the user right-click to bring up a list of synonyms, but the user is slowed way down if they scroll through the ListView control.
|
|
Rank: Advanced Member
Groups: Administrators, Registered
Joined: 8/13/2004 Posts: 2,669 Location: Canada
|
I can imagine that if your user is flipping through items in a list, it could make the UI less responsive. A couple of tips:
1. GetAllSynonyms loads the resource file if it's not already loaded, so don't create a new ThesaurusEngine unnecessarily.
2. Use a new thread to call GetAllSynonyms.
Jim
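Both tips can be combined in one small helper. The following is a rough C# sketch under stated assumptions, not Keyoti code: `SynonymLookup`, `SynonymsReady` and `BackgroundSynonyms` are hypothetical names, and the lookup delegate is assumed to be bound once to the GetAllSynonyms method of a single long-lived ThesaurusEngine. The real constructor, signature and return type are not shown in this thread. The lock assumes the engine should not be called from two threads at once, which is unconfirmed.

```csharp
using System;
using System.Threading;

// Hypothetical delegate types (assumptions): the real GetAllSynonyms
// signature and return type are not shown in this thread.
public delegate string[] SynonymLookup(string word);
public delegate void SynonymsReady(string word, string[] synonyms);

public class BackgroundSynonyms
{
    private readonly SynonymLookup lookup; // bound once to one shared engine
    private readonly object gate = new object();
    private int latestRequest;

    public BackgroundSynonyms(SynonymLookup lookup)
    {
        if (lookup == null) throw new ArgumentNullException("lookup");
        this.lookup = lookup;
    }

    // Runs the lookup on a thread-pool thread so the caller is not blocked.
    // Results for requests that have since been superseded are dropped.
    public void Request(string word, SynonymsReady onReady)
    {
        int id = Interlocked.Increment(ref latestRequest);
        ThreadPool.QueueUserWorkItem(delegate(object state)
        {
            string[] result;
            lock (gate) // serialize calls into the shared engine
            {
                result = lookup(word);
            }
            // Atomic read of the newest request id.
            if (id == Interlocked.CompareExchange(ref latestRequest, 0, 0))
            {
                onReady(word, result);
            }
        });
    }
}
```

The callback runs on a pool thread, so a Windows Forms application would need to marshal back to the UI thread (for example with Control.BeginInvoke) before touching the synonym list.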
|
|
Rank: Member
Groups: Registered
Joined: 9/13/2006 Posts: 52
|
You nailed it! I was creating a ThesaurusEngine each time.
Performance is great now.
Thanks!
|
|
Rank: Member
Groups: Registered
Joined: 9/13/2006 Posts: 52
|
Hi Jim. I've been using your lemma generator dll for some time. I have a new computer and need to set it up. It appears that the licensing for the product has changed since then. What do I need to do to use it?
|
|