Indexing documents stored as binary fields in a database?
https://keyoti.com/kb/Default.aspx?ToDo=view&questId=183&catId=54

If you have documents (PDF, Word, RTF, etc.) stored as binary fields in a database, the database importer will not index their contents automatically. To index them, you need to write some code that extracts the text from each document.
 
One way to do this is to use our 'custom dataset provider' feature, in which you write a DLL that returns a DataSet containing the plain text to index. Your DLL accesses the database and pulls the binary document images (plus any other fields you need), then sends each image to our parsers as a stream, and the parsers turn it into plain text.
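
As a rough illustration of that flow, here is a minimal sketch of the part of such a DLL that builds the DataSet. It is not the actual provider interface (that is described in the user guide linked below). The class name, the column names (Id, Title, FileType, DocumentBinary, Content) and the extractText delegate are all assumptions made for the example; the delegate stands in for whichever parser call you use (for example the parser's ReadText method), whose exact signature is not shown on this page.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical helper class, not part of the product API.
public static class BinaryDocumentDataSetBuilder
{
    // Reads rows that hold binary documents and returns a DataSet of plain text.
    // 'extractText' is a placeholder for the parser call: it receives the
    // document stream and its file type, and returns the plain text.
    public static DataSet BuildPlainTextDataSet(
        IDataReader reader,
        Func<Stream, string, string> extractText)
    {
        var table = new DataTable("Documents");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Title", typeof(string));
        table.Columns.Add("Content", typeof(string));

        while (reader.Read())
        {
            // Assumed column name; skip rows with no document stored.
            object raw = reader["DocumentBinary"];
            if (raw is DBNull)
            {
                continue;
            }

            string fileType = (string)reader["FileType"]; // assumed column
            using (var stream = new MemoryStream((byte[])raw))
            {
                // Hand the binary image to the parser as a stream.
                string text = extractText(stream, fileType);
                table.Rows.Add(reader["Id"], reader["Title"], text);
            }
        }

        var dataSet = new DataSet();
        dataSet.Tables.Add(table);
        return dataSet;
    }
}
```

In a real provider, the reader would typically come from a SqlCommand.ExecuteReader() call against your documents table, and the delegate would pick the right parser for each file type.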
 
There are other ways, but this one is the most direct and the most flexible.
 
The section at the bottom of this page explains how to use a 'custom dataset provider':
http://keyoti.com/products/search/dotNetWeb/Help2010/UserGuide/Importing.htm
 
Here are the API docs for the parsers (use the ReadText method to get plain text):
 
 
Word:
 
 
 
For assistance, please email support@keyoti.com and describe your situation along with any parts of this that are unclear.
 
 
