How To Add A Custom Or New Document Parser

Using the Central Event System, additional or replacement document parsers can be added for use by the indexer.

There are two ways to use the event system: directly or via a plug-in. When code creates an index programmatically (i.e. direct use of the search API), the event handlers below can be attached to the configuration object. If a UI creates the index (i.e. Index Manager Tool, Web Admin or Windows Service), the event handlers must be attached via a plug-in (see the plug-in section for more information).

The following assumes that indexing is programmatic, and that an object named "configuration" exists and was used to create the DocumentIndex, e.g.

Keyoti.SearchEngine.Configuration configuration = new Keyoti.SearchEngine.Configuration();
configuration.IndexDirectory = "...some path...";
DocumentIndex documentIndex = new DocumentIndex(configuration);

Step 1.

Attach to the NeedObject event from the Central Event System.

configuration.CentralEventDispatcher.NeedObject +=
	new Keyoti.SearchEngine.Events.NeedObjectEventHandler(CentralEventDispatcher_NeedObject);

Step 2.

Handle the event and, when a ParserProvider is requested, replace it with a custom subclass of ParserProvider.

void CentralEventDispatcher_NeedObject(object sender, Keyoti.SearchEngine.Events.NeedObjectEventArgs e)
{
	if (e.RequiredObject is Keyoti.SearchEngine.Documents.ParserProvider)
	{
		e.RequiredObject = new ExtendedParserProvider(e.Configuration);
	}
}

Step 3.

Write the ExtendedParserProvider class, which returns the custom parser when the appropriate MIME type is encountered and otherwise defers to the base class.

public class ExtendedParserProvider : ParserProvider
{

    public ExtendedParserProvider(Configuration c) : base(c){}

    public override Parser GetParser(string mimeType)
    {
        if(mimeType=="application/my.file.type")
        {
            return new QQQDocumentParser(Configuration);
        }
        else
        {
            return base.GetParser(mimeType);
        }
    }
}

Step 4.

If the MIME type is not known by the search engine, associate it with the file extension.

For example, to handle a previously unhandled file type with the file extension .qqq, add it to the FileTypesSettings property of the Configuration.

E.g. programmatically (the same can be achieved in the visual configuration editors, or in the Configuration.xml file in the index directory):

configuration.FileTypesSettings.Add("QQQ", "application/my.file.type");

This associates the new MIME type application/my.file.type with any file that has the extension .qqq (note that the extension must be added in uppercase).
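
Because the extension must be registered in uppercase, it may help to normalize it before adding it. The following is a minimal sketch that uses only standard .NET calls; the ExtensionKey class and its ToKey method are hypothetical names introduced for illustration and are not part of the search engine API.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the search engine API): derives the
// uppercase extension key, without the leading dot, from a file name
// ("report.qqq"), a dotted extension (".qqq") or a bare extension ("qqq").
public static class ExtensionKey
{
    public static string ToKey(string fileNameOrExtension)
    {
        if (fileNameOrExtension == null)
        {
            throw new ArgumentNullException(nameof(fileNameOrExtension));
        }

        // Path.GetExtension returns ".qqq" for "report.qqq" and for ".qqq",
        // and an empty string when the input contains no dot.
        string extension = Path.GetExtension(fileNameOrExtension);
        if (extension.Length == 0)
        {
            // Treat dot-less input as a bare extension such as "qqq".
            extension = fileNameOrExtension;
        }

        return extension.TrimStart('.').ToUpperInvariant();
    }
}
```

With this helper, the call from this step could be written as configuration.FileTypesSettings.Add(ExtensionKey.ToKey(".qqq"), "application/my.file.type");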

Step 5.

Add the QQQDocumentParser class, which overrides the Read method. The method below takes the document's text content as a string (extracting that text from the stream is up to the implementer) and passes it to base.Read, which returns the required DocumentText object.

e.g.

public class QQQDocumentParser : Keyoti.SearchEngine.Documents.TxtDocumentParser
{

	public QQQDocumentParser(Configuration c) : base(c) { }

	public override Keyoti.SearchEngine.Documents.DocumentText Read(System.IO.Stream stream, Uri uri, System.Text.Encoding encoding)
	{
		//stream is the document as a stream (i.e. as it is returned from the server)
		//uri is the document's Uri
		//encoding is the response encoding from the server
		//the stream should be read, and the plain text from it placed in the "documentBody" variable
		string documentBody = "this is test content, real content comes from user created parser.";
		return base.Read(documentBody, uri, encoding);
	}
}
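
The placeholder string above stands in for text that a real parser would extract from the stream. As a minimal sketch, assuming the custom file type is already plain text in the response encoding, the stream could be read with a standard StreamReader. The StreamText class and its ReadAll method are hypothetical names, and the UTF-8 fallback is an assumption of this sketch rather than documented search engine behavior.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper (not part of the search engine API): reads an entire
// stream into a string using the supplied encoding.
public static class StreamText
{
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        // Assumption of this sketch: fall back to UTF-8 when no encoding is given.
        Encoding effective = encoding ?? Encoding.UTF8;

        // leaveOpen: true, so the caller keeps ownership of the stream.
        using (var reader = new StreamReader(stream, effective, true, 1024, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Inside Read, the placeholder assignment could then become string documentBody = StreamText.ReadAll(stream, encoding); binary formats would instead need format-specific text extraction before calling base.Read.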