Programmatic Importing & Indexing

This page describes three methods of indexing: importing an entire source, adding one document at a time (incremental indexing), and adding data directly to the index as strings.

Importing An Entire Source

'Importing' a website, file-system folder, database or DataSet means that the indexer scans for all available documents, pages or data and indexes everything that matches the import criteria. Reimporting causes the indexer to rescan the source for changes where possible; otherwise it reindexes everything. To import programmatically, use the appropriate Import method in DocumentIndex.

Also see here for a complete code example (Form)

More information on import parameters.

C#
DocumentIndex documentIndex = new DocumentIndex(configuration);
//import a website
documentIndex.ImportWebsite( startURL );
//or like this
documentIndex.Import(new WebsiteBasedIndexableSourceRecord( startURL, pathMatchesToBeIgnored, pathMatchesToBeIncluded));

//or import a file system folder
string localFolderPath = @"C:\inetpub\wwwroot";
string virtualPath = "http://localhost/";
ArrayList targetMatchList = null, ignoreMatchList = null;
bool recurseSubFolders = true;
documentIndex.ImportFileSystemFolder(localFolderPath, virtualPath, targetMatchList, ignoreMatchList, recurseSubFolders);

//or import a database
documentIndex.ImportDatabase(sourceType, connectionString, sqlQuery, uniqueColumnName, resultUrlFormat);

//or import a DataSet (from an assembly)
documentIndex.ImportCustomDataSet(assemblyFilePath, fullClassName, uniqueColumnName, resultUrlFormat);

documentIndex.Close();

VB.NET
Dim documentIndex As New DocumentIndex(configuration)
'import a website
documentIndex.ImportWebsite( startURL )
'or like this
documentIndex.Import(new WebsiteBasedIndexableSourceRecord( startURL, pathMatchesToBeIgnored, pathMatchesToBeIncluded))

'or import a file system folder
documentIndex.ImportFileSystemFolder(localFolderPath, virtualPath, targetMatchList, ignoreMatchList, recurseSubFolders)
'or import a database
documentIndex.ImportDatabase(sourceType, connectionString, sqlQuery, uniqueColumnName, resultUrlFormat)
'or import a DataSet (from an assembly)
documentIndex.ImportCustomDataSet(assemblyFilePath, fullClassName, uniqueColumnName, resultUrlFormat)
documentIndex.Close()

To reimport the index, use:

documentIndex.ReimportIndexableSources()

To reindex one specific source, obtain the IndexableSourceRecord from:

documentIndex.GetIndexableSourceRecords()

and pass the IndexableSourceRecord to

documentIndex.Import(sourceRecordFromTheList)

Adding One Document

Instead of importing an entire source, it is possible to add documents/data to the index incrementally. This is ideal for updating the index as documents are created/uploaded.

C#
DocumentIndex documentIndex = new DocumentIndex(configuration);
try{
	documentIndex.AddDocument(new Document("http://some/URL/document", configuration));
} finally {
	documentIndex.Close();
}

VB.NET
Dim documentIndex As DocumentIndex = New DocumentIndex(configuration)
Try
	documentIndex.AddDocument(new Document("http://some/URL/document", configuration))
Finally 
	documentIndex.Close()
End Try

Note that AddDocument may take a non-trivial amount of time to complete. The actual time depends on many factors, including machine load, document size and type, index size, and whether the index is due for optimization. It is therefore not advisable to call it directly in web applications, because the web page doing the indexing will not return to the user until AddDocument has finished.

Asynchronous Adding (.NET 2 up)

Adding to the index asynchronously allows your code to return immediately (e.g. so that a web application's upload document page returns immediately), while the document is queued to be added to the index in the background as soon as possible. To do this, use the AsynchronousQueue class (in namespace Keyoti.SearchEngine.Index), which queues AddDocument operations and calls them in their original order. AsynchronousQueue uses its own instance of DocumentIndex, and creates and closes that instance as necessary. It is therefore important not to have another instance of DocumentIndex open on the same index directory while there are items in the queue.

C#
//...this code could be called in a button event handler in a web page for example

EventHandler finished = delegate(object sender, EventArgs e)
{
	//at this point the index directory is unlocked and there are no more items pending adding to the index.
};

AsynchronousQueue.QueueForIndexing(new Document("http://someURL/somepage.aspx", Configuration), finished);
AsynchronousQueue.QueueForIndexing(new Document("http://someURL/somepage2.aspx", Configuration), finished);

VB.NET
Private Sub MyFunc()
	'...this code could be called in a button event handler in a web page for example
	Dim finished As EventHandler = AddressOf Me.OnFinished
	AsynchronousQueue.QueueForIndexing(New Document("http://someURL/somepage.aspx", Configuration), finished)
	AsynchronousQueue.QueueForIndexing(New Document("http://someURL/somepage2.aspx", Configuration), finished)
End Sub

Private Sub OnFinished(ByVal sender As Object, ByVal e As EventArgs)
	'at this point the index directory is unlocked and there are no more items pending adding to the index.
End Sub

Removing One Document

Use the RemoveDocument method in DocumentIndex to remove a document from the index. The document URL must exactly match the URL already in the index. Pay particular attention to trailing slashes (e.g. http://localhost/) and ensure that any spaces are encoded as %20.
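
Because the match must be exact, it can help to canonicalize a URL before passing it to RemoveDocument. The following is a minimal sketch, assuming the index stores URLs in the form that System.Uri.AbsoluteUri produces (the DB example further down this page uses AbsoluteUri, but that this holds for every source is an assumption, so compare against URLs actually in your index). UrlHelper and ToIndexUrl are hypothetical names and are not part of the library.

```csharp
using System;

// Hypothetical helper, not part of the search library.
public static class UrlHelper
{
    // Returns the canonical absolute form of a URL: spaces are escaped
    // as %20 and a host-only URL gains its trailing slash.
    public static string ToIndexUrl(string url)
    {
        return new Uri(url).AbsoluteUri;
    }
}

public static class Program
{
    public static void Main()
    {
        // Host-only URL: the trailing slash is added.
        Console.WriteLine(UrlHelper.ToIndexUrl("http://localhost"));
        // Spaces in the path are escaped as %20.
        Console.WriteLine(UrlHelper.ToIndexUrl("http://localhost/my docs/a file.pdf"));

        // With an open DocumentIndex, the result would be used like this:
        // documentIndex.RemoveDocument(new Document(UrlHelper.ToIndexUrl(url), configuration));
    }
}
```

The commented line shows where the canonical URL would be passed in; it follows the same RemoveDocument(new Document(url, configuration)) pattern used elsewhere on this page.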

Asynchronous Remove (.NET 2 up)

Removing from the index asynchronously allows your code to return immediately (e.g. so that a web application's delete document page returns immediately), while the document is queued to be removed from the index in the background as soon as possible. To do this, use the AsynchronousQueue class (in namespace Keyoti.SearchEngine.Index), which queues RemoveDocument operations and calls them in their original order. AsynchronousQueue uses its own instance of DocumentIndex, and creates and closes that instance as necessary. It is therefore important not to have another instance of DocumentIndex open on the same index directory while there are items in the queue.

This is the same queue that the asynchronous adding example uses, so add and remove operations can be mixed.

C#
//...this code could be called in a button event handler in a web page for example

EventHandler finished = delegate(object sender, EventArgs e)
{
	//at this point the index directory is unlocked and there are no more items pending in the queue.
};

AsynchronousQueue.QueueForRemoval(new Document("http://someURL/somepage.aspx", Configuration), finished);
AsynchronousQueue.QueueForRemoval(new Document("http://someURL/somepage2.aspx", Configuration), finished);

VB.NET
Private Sub MyFunc()
	'...this code could be called in a button event handler in a web page for example
	Dim finished As EventHandler = AddressOf Me.OnFinished
	AsynchronousQueue.QueueForRemoval(New Document("http://someURL/somepage.aspx", Configuration), finished)
	AsynchronousQueue.QueueForRemoval(New Document("http://someURL/somepage2.aspx", Configuration), finished)
End Sub

Private Sub OnFinished(ByVal sender As Object, ByVal e As EventArgs)
	'at this point the index directory is unlocked and there are no more items pending in the queue.
End Sub

Removing A 'Document' That Originated In A DB

When a row is imported from a DB, we create our own URI for it. To delete that row/document, you need to recreate the URI.

C#
IndexableSourceUri uri = new IndexableSourceUri(1, "d4", "col1");
//where 1 is the IndexableSource ID (see below)
//"d4" is the value in the unique field, that identifies the row to delete
//"col1" is the name of the unique field

documentIndex.RemoveDocument(new Document(uri.UriInstance.AbsoluteUri, Configuration));

In the above, the data was originally imported from a query that returns:

col1    data
-------------
a1      blah
b2      some
c3      empty
d4      more

so the code removes the last row (d4) from the index. The indexable source ID can be obtained with code like this:

C#
ArrayList recs = documentIndex.GetIndexableSourceRecords();
var sourceId = ((IndexableSourceRecord)recs[0]).ID;

This assumes that the first record is the one you need. Otherwise, iterate through 'recs' and check the Query or Location properties to find the right one.
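
The lookup by Location mentioned above can be sketched as a simple loop. This is only a sketch under stated assumptions: SourceRecordStub is a hypothetical stand-in that mirrors just the ID and Location properties so the snippet runs without the library, Location is assumed to be a string and ID an int, and FindSourceId is a made-up helper name. In real code the list would come from documentIndex.GetIndexableSourceRecords() and hold IndexableSourceRecord objects.

```csharp
using System;
using System.Collections;

// Hypothetical stand-in for IndexableSourceRecord (property types are assumed).
public class SourceRecordStub
{
    public int ID;
    public string Location;

    public SourceRecordStub(int id, string location)
    {
        ID = id;
        Location = location;
    }
}

public static class SourceLookup
{
    // Returns the ID of the first record whose Location matches exactly,
    // or -1 if no record matches.
    public static int FindSourceId(ArrayList recs, string location)
    {
        foreach (object item in recs)
        {
            SourceRecordStub rec = item as SourceRecordStub;
            if (rec != null && rec.Location == location)
            {
                return rec.ID;
            }
        }
        return -1;
    }

    public static void Main()
    {
        // Sample data standing in for the list returned by the index.
        ArrayList recs = new ArrayList();
        recs.Add(new SourceRecordStub(1, "http://localhost/"));
        recs.Add(new SourceRecordStub(2, @"C:\inetpub\wwwroot"));

        Console.WriteLine(SourceLookup.FindSourceId(recs, @"C:\inetpub\wwwroot"));
    }
}
```

Returning -1 for "not found" is a design choice of this sketch; the ID found this way would be the first argument to IndexableSourceUri in the removal code above.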

Adding Data Directly As Strings

It is possible to add 'documents' to the index that are defined by strings only. In other words, data can be indexed without actually residing in a document, page or database.

To do this, use the PreloadedDocument class, a simple class to which you pass the 'URI' that will identify the indexed data/document, along with its title, text and custom data, all as strings.

C#
documentIndex.AddDocument(new PreloadedDocument(new Uri(uri), title, text, summary, null, null, null, customData, configuration));

VB.NET
documentIndex.AddDocument(new PreloadedDocument(new Uri(uri), title, text, summary, Nothing, Nothing, Nothing, customData, configuration))

Where:
- 'uri' is the real or fictitious URI of the 'document'; it can point to an actual document or just serve as an arbitrary identifier for the indexed data
- 'title' is the title of the document as a string, searchable by the user
- 'text' is the text body, also searchable by the user
- 'summary' is used for the result summary if a 'static' summary type is selected in the configuration (otherwise the result summary is generated from the text content based on hits)
- The three null/Nothing arguments are, respectively, the content category list, the location category name and the security group list (please see the API docs)
- 'customData' is any CustomData to be added to the document record
- 'configuration' is the usual configuration object, as was used to create the DocumentIndex

Removing A PreloadedDocument

To remove a 'document' added with PreloadedDocument, use documentIndex.RemoveDocument, passing in the same Uri that the document was created with.