Plug-in Example - Creating a custom document parser which allows meta tags to be searched. (C#)
https://keyoti.com/kb/Default.aspx?ToDo=view&questId=190&catId=54

To demonstrate how to write and use a plug-in, beyond what the Help CHM already describes, we will build a plug-in that provides a customized HTML document parser.  The parser adds the text of the document's meta tags to the document text so that it can be searched, which v3.1 does not currently do.
 
The new HTML parser class will be a sub-class of our default HtmlDocumentParser, and it will be used via our plug-in system ("Central Events").  It may help to familiarise yourself with plug-ins first by reading http://keyoti.com/products/search/dotNetWeb/Help2010/UserGuide/Central%20Event%20System%20-%20Plug-ins/Introduction.htm
 
 
To use the new parser, you will first need to create the plug-in DLL, and then configure it to be used with the index.
 
1. Compile the Plugin project, and make a note of the path to the DLL it creates.
 
 
2. Open the Configuration window (e.g. under Visual Studio, right click on the SearchResult control):
 
i. Paste the path to the new plugin DLL in the "EventHandlerAssemblyPath" field. 
 
ii. Enable logging (check the "Logging" field) - this will be useful for any debugging if necessary.
 
iii. Click OK to close the window.
 
 
3. If the index has already been built, a document will only be reindexed if its content has changed.  Either change the content of the documents, or delete the current index directory to force a fresh build.
 
 
4. Build the index, as usual.
 
 
5. The plugin should now have been used, and the meta tags indexed along with the rest of each document.  Try a search on one of the meta words.  If it doesn't find any results, check the CentralEventDispatcher.txt log file in the index directory for info.  It should have lines like:
 
06/17/2008 20:26 Success, initialized external event handler assembly @ C:\Program Files\Keyoti Inc\Search for ASP.NET v3\Demos\VS2005\Plugin\bin\Debug\Plugin.dll
 
Also check the Plug-in.txt log file, which when working properly will have lines like:
 
06/17/2008 20:26 Initialized
06/17/2008 20:27 Created extended provider
06/17/2008 20:27 Created custom parser for text/html
06/17/2008 20:27 Meta description:test123 contents3245
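
If you would rather check the logs from code than by eye, a minimal sketch along these lines may help.  It only looks for a piece of text in a log file; the class and method names (PluginLogCheck, ContainsLine) are hypothetical and not part of the product, and the caller supplies the full path to the log.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration only; not part of the Keyoti API.
public static class PluginLogCheck
{
    // Returns true if any line of the log file contains the given text.
    // Returns false if the file does not exist (e.g. logging is not enabled).
    public static bool ContainsLine(string logPath, string marker)
    {
        if (!File.Exists(logPath))
        {
            return false;
        }
        return File.ReadLines(logPath)
                   .Any(line => line.Contains(marker));
    }
}
```

For example, calling ContainsLine with the path to CentralEventDispatcher.txt and the text "Success, initialized external event handler assembly" indicates whether the dispatcher reported loading the plug-in DLL.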
 
Note: once the plugin is loaded by a process (e.g. by Visual Studio in the designer), the DLL cannot be replaced while that process is running.  Any changes you make to the plugin code will therefore require the process to be stopped first (e.g. Visual Studio to be closed and reopened).
 
The parser itself is in the ExtendedHtmlDocumentParser class, and it is fairly simple.  When the object needs to read the document text, it first copies the document into a string and scans that string for meta tags.  It then adds the meta tag contents to the document text, and lets the base class (HtmlDocumentParser) parse this modified text.
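
The approach described above (copy the document into a string, find the meta tags, add their contents to the document text) can be sketched roughly as follows.  This is a standalone illustration, not the actual ExtendedHtmlDocumentParser source.  The class MetaTagText and its methods are hypothetical names, it calls no Keyoti API, and limiting it to the "description" and "keywords" meta tags is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the Keyoti API.
public static class MetaTagText
{
    // Matches a whole <meta ...> tag.
    private static readonly Regex MetaTag =
        new Regex(@"<meta\b[^>]*>", RegexOptions.IgnoreCase);

    // Matches attribute pairs written as key="value" or key='value'.
    private static readonly Regex Attribute =
        new Regex(@"([\w-]+)\s*=\s*(""([^""]*)""|'([^']*)')");

    // Returns the content of each "description" or "keywords" meta tag
    // (assumption: these are the only meta tags worth indexing).
    public static List<string> ExtractContents(string html)
    {
        var contents = new List<string>();
        foreach (Match tag in MetaTag.Matches(html))
        {
            string name = "";
            string content = "";
            foreach (Match attr in Attribute.Matches(tag.Value))
            {
                string key = attr.Groups[1].Value.ToLowerInvariant();
                // Group 3 holds a double-quoted value, group 4 a single-quoted one.
                string value = attr.Groups[3].Success
                    ? attr.Groups[3].Value
                    : attr.Groups[4].Value;
                if (key == "name") name = value.ToLowerInvariant();
                else if (key == "content") content = value;
            }
            if ((name == "description" || name == "keywords") && content.Length > 0)
            {
                contents.Add(content);
            }
        }
        return contents;
    }

    // Returns the HTML with the meta contents added as visible text, placed
    // just before </body> when present, otherwise at the end of the document.
    public static string AppendMetaText(string html)
    {
        List<string> contents = ExtractContents(html);
        if (contents.Count == 0)
        {
            return html;
        }
        string block = "<div>" + string.Join(" ", contents) + "</div>";
        int bodyEnd = html.LastIndexOf("</body>", StringComparison.OrdinalIgnoreCase);
        return bodyEnd >= 0 ? html.Insert(bodyEnd, block) : html + block;
    }
}
```

In a real plug-in, the sub-class would apply something like AppendMetaText to its copy of the document and then hand the result to the base parser.  A regular expression is used here only to keep the sketch short; it ignores unquoted attribute values and meta tags inside comments.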
 
 
Please email support if you have any questions.
