KB: How to index meta-tags (C#)

https://keyoti.com/kb/Default.aspx?ToDo=view&questId=193&catId=54

At the time of writing, the search engine does not index meta tags. It indexes only the document content that the user can see. To index and search meta tags, you can write a plug-in (see the attached "CustomParser-MetaTag-Plugin-C#.zip").

The plug-in uses a new parser class. This class is a subclass of our default HtmlDocumentParser, and it is hooked in through our plug-in system ("Central Events"). If you are not yet familiar with plug-ins, it may help to read this article first. The attached project is based on it.

To use the new parser, first build the plug-in DLL, then configure the index to use it:

1. Compile the Plugin project, and make a note of the path to the DLL it creates.

2. Open the Configuration window (e.g. in Visual Studio, right-click the SearchResult control), then:

i. Paste the path to the new plug-in DLL into the "EventHandlerAssemblyPath" field.

ii. Enable logging (check the "Logging" field). The logs are useful if you need to debug.

iii. Click OK to close the window.

3. If the index has already been built, a document is reindexed only when its content changes. Either modify the documents you want reindexed, or delete the current index directory to force a fresh build.

4. Build the index, as usual.

5. The plug-in should now have been used, so the meta tags should be indexed as well. Try searching for a word that appears in one of the meta tags. If the search returns no results, check the CentralEventDispatcher.txt log file in the index directory. It should contain lines like:

06/17/2008 20:26 Success, initialized external event handler assembly @ C:\Program Files\Keyoti Inc\Search for ASP.NET v3\Demos\VS2005\Plugin\bin\Debug\Plugin.dll

Also check the Plug-in.txt log file. When the plug-in is working properly, it will contain lines like:

06/17/2008 20:26 Initialized
06/17/2008 20:27 Created extended provider
06/17/2008 20:27 Created custom parser for text/html
06/17/2008 20:27 Meta description:test123 contents3245
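
You can also script these log checks. The sketch below is a hypothetical helper (PluginLogCheck is not part of the product). It looks for the marker text from the sample lines above, so it assumes your version logs the same messages. The location of Plug-in.txt is not stated here, so the helper takes both log paths as parameters.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of the product): reports which expected
// log messages are missing. Marker text is copied from the sample log lines;
// adjust it if your version words its messages differently.
public static class PluginLogCheck
{
    private static readonly string[] DispatcherMarkers =
    {
        "Success, initialized external event handler assembly",
    };

    private static readonly string[] PluginMarkers =
    {
        "Initialized",
        "Created custom parser for text/html",
    };

    // Returns one "file: marker" entry per marker not found in its log file.
    // A log file that does not exist counts as missing all of its markers.
    public static List<string> FindMissing(string dispatcherLogPath, string pluginLogPath)
    {
        var missing = new List<string>();
        missing.AddRange(Missing(dispatcherLogPath, DispatcherMarkers));
        missing.AddRange(Missing(pluginLogPath, PluginMarkers));
        return missing;
    }

    private static IEnumerable<string> Missing(string path, string[] markers)
    {
        string[] lines = File.Exists(path) ? File.ReadAllLines(path) : new string[0];
        return markers
            .Where(marker => !lines.Any(line => line.Contains(marker)))
            .Select(marker => Path.GetFileName(path) + ": " + marker);
    }
}
```

An empty result means every expected message was found. Any entry names the log file and the message that never appeared. That usually points to either the assembly path (CentralEventDispatcher.txt) or the plug-in itself (Plug-in.txt).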

Note: once the plug-in DLL has been loaded (e.g. by Visual Studio in the designer), you must stop that process (e.g. close and reopen Visual Studio) before any code changes take effect. The DLL cannot be replaced while another process has it loaded.
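
Step 3 offers deleting the index directory as an alternative way to force a fresh build, and you can script this too. The sketch below is a minimal, hypothetical helper (IndexReset is not part of the product). The path you pass is whatever index directory your configuration uses. On Windows, the deletion can throw if another process still has a file in that directory open.

```csharp
using System.IO;

// Hypothetical helper (not part of the product): deletes the index directory
// so that the next build starts from an empty index.
public static class IndexReset
{
    // Returns true if a directory was deleted, false if none existed.
    // Directory.Delete can throw IOException (on Windows) if another
    // process still has a file in the directory open.
    public static bool DeleteIndexDirectory(string indexDirectory)
    {
        if (!Directory.Exists(indexDirectory)) return false;
        Directory.Delete(indexDirectory, recursive: true);
        return true;
    }
}
```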

The parser itself, in the ExtendedHtmlDocumentParser class, is fairly simple. When it needs to read the document text, it first copies the document into a string and scans that string for meta tags. It then adds the meta tag contents to the document text. Finally, it lets the base class, HtmlDocumentParser, parse the modified text.
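
The string-rewriting step can be sketched on its own. This is not the code from the attached project and does not call the Keyoti API. MetaTagAppender and its methods are hypothetical names. The sketch assumes that inserting the meta contents as a paragraph before </body> is an acceptable way to make them visible to a parser that indexes only displayed text. It uses regular expressions, which handle ordinary name/content meta tags but are not a full HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the Keyoti API) that mimics the parser's
// text-rewriting step. It copies the HTML, pulls out selected meta tag contents,
// and adds them to the text so that a visible-text parser will index them.
public static class MetaTagAppender
{
    // Matches a whole <meta ...> tag; Singleline lets a tag span lines.
    private static readonly Regex MetaTag = new Regex(
        @"<meta\b[^>]*>", RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches name="..." or content='...' inside one tag.
    // Group 2 holds a double-quoted value, group 3 a single-quoted one.
    private static readonly Regex Attribute = new Regex(
        @"\b(name|content)\s*=\s*(?:""([^""]*)""|'([^']*)')", RegexOptions.IgnoreCase);

    // Returns the decoded content of every meta tag whose name is in `names`
    // (compared case-insensitively), in document order.
    public static List<string> ExtractContents(string html, IEnumerable<string> names)
    {
        var wanted = new HashSet<string>(names, StringComparer.OrdinalIgnoreCase);
        var results = new List<string>();
        foreach (Match tag in MetaTag.Matches(html))
        {
            string name = null, content = null;
            foreach (Match attr in Attribute.Matches(tag.Value))
            {
                string value = attr.Groups[2].Success ? attr.Groups[2].Value : attr.Groups[3].Value;
                if (string.Equals(attr.Groups[1].Value, "name", StringComparison.OrdinalIgnoreCase))
                    name = value;
                else
                    content = value;
            }
            if (name != null && content != null && wanted.Contains(name))
                results.Add(WebUtility.HtmlDecode(content));
        }
        return results;
    }

    // Returns a copy of `html` with the selected meta contents inserted as a
    // paragraph just before </body>, or appended at the end if there is no </body>.
    public static string AppendMetaContents(string html, IEnumerable<string> names)
    {
        List<string> contents = ExtractContents(html, names);
        if (contents.Count == 0) return html;

        // Re-encode so characters such as < or & remain valid HTML text.
        string paragraph = "<p>" + WebUtility.HtmlEncode(string.Join(" ", contents)) + "</p>";
        int bodyEnd = html.LastIndexOf("</body>", StringComparison.OrdinalIgnoreCase);
        return bodyEnd >= 0 ? html.Insert(bodyEnd, paragraph) : html + paragraph;
    }
}
```

For example, `MetaTagAppender.AppendMetaContents(html, new[] { "description", "keywords" })` would add the description and keywords text while leaving tags such as robots directives out of the index. In a real plug-in, a call like this would presumably run inside the subclass just before the modified text is handed to the base parser.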