KB: How can I reindex (build and/or crawl) a website every time a certain web page loads? (C#)

https://keyoti.com/kb/Default.aspx?ToDo=view&questId=261&catId=54

(For details on programmatic indexing, please see the corresponding page in the documentation.)

Customers using hosting services often cannot install Windows Services, which rules out our Windows Service for automated reindexing. Some hosting services can request specified URLs at set intervals. Even if yours cannot, it may be possible to set up a simple program on a client machine that requests a specific URL at intervals.
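If your host cannot call a URL on a schedule, a small program on any always-on machine can do it instead. The class below is a minimal sketch and is not part of SearchUnit. ReindexPinger, PingOnceAsync and RunAsync are hypothetical names. It assumes .NET Framework 4.5 or later (or .NET Core) so that HttpClient is available.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical client-side helper (not a Keyoti API): requests the reindex page at a fixed interval.
public static class ReindexPinger
{
    private static readonly HttpClient Client = new HttpClient { Timeout = TimeSpan.FromSeconds(30) };

    // Sends one GET request; returns true if the server answered with a 2xx status.
    public static async Task<bool> PingOnceAsync(string url)
    {
        try
        {
            using (HttpResponseMessage response = await Client.GetAsync(url))
            {
                return response.IsSuccessStatusCode;
            }
        }
        catch (HttpRequestException)
        {
            return false; // server unreachable; the next interval will try again
        }
        catch (TaskCanceledException)
        {
            return false; // request timed out
        }
    }

    // Requests the URL once per 'interval' until 'token' is cancelled.
    public static async Task RunAsync(string url, TimeSpan interval, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            bool ok = await PingOnceAsync(url);
            Console.WriteLine("{0:u} {1} -> {2}", DateTime.UtcNow, url, ok ? "OK" : "failed");
            try
            {
                await Task.Delay(interval, token);
            }
            catch (TaskCanceledException)
            {
                break; // cancelled while waiting for the next interval
            }
        }
    }
}
```

For example, a console program's Main could call `ReindexPinger.RunAsync("http://www.example.com/RunCrawlBuild/Default.aspx", TimeSpan.FromHours(24), CancellationToken.None).Wait();`. The URL is a placeholder for wherever you deploy the page. Another option is to schedule a single PingOnceAsync call with Windows Task Scheduler, so the program does not have to keep running.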

This page starts an asynchronous process to crawl and build the index. It runs asynchronously because the work takes more than a few seconds to complete. When the page is requested, a blank page is shown while the crawl and build run in the background.
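The page uses a delegate's BeginInvoke to run the crawl on a thread-pool thread. The sketch below shows the same fire-and-forget idea with Task.Run instead. This is a swapped-in technique that requires .NET Framework 4.5 or later. It also passes any exception to a callback rather than letting it escape on a background thread. BackgroundStarter and the onError hook are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: starts 'work' (e.g. CrawlBuild) in the background and returns at once.
public static class BackgroundStarter
{
    // The returned task completes after 'work' has finished and any error has been reported.
    // Callers such as Page_Load can ignore it; it is returned mainly so the behaviour can be tested.
    public static Task Start(Action work, Action<Exception> onError)
    {
        return Task.Run(work).ContinueWith(t =>
        {
            if (t.IsFaulted)
            {
                // Observe the failure and hand the original exception to the caller's logger.
                onError(t.Exception.GetBaseException());
            }
        }, TaskScheduler.Default);
    }
}
```

In a real page, work would be CrawlBuild and onError could write to a log. Work started this way still ends if ASP.NET recycles the application. On .NET Framework 4.5.2 and later, System.Web.Hosting.HostingEnvironment.QueueBackgroundWorkItem registers background work with ASP.NET so it can delay shutdown briefly while that work finishes.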

In Page_Load, the code checks a flag in the Application object to see whether a crawl/build is already running, and starts one if not. Because the page itself is blank, you cannot tell from the page whether a crawl/build is in progress. If you reload the page, it will either start a new crawl/build or do nothing (if one is already running). To see what is happening, watch the CPU usage on the server. You can also add

Configuration.logging=true;

which writes log files describing the crawl/build activity to the index directory.
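The "already running?" test has a race if the flag is read in one step and set in another: two requests arriving at nearly the same moment could both start a crawl. One alternative to an Application flag is a static guard that switches from idle to running in a single atomic step with Interlocked.CompareExchange. CrawlBuildGuard is a hypothetical name. Like Application state, it only protects a single application instance, not a web farm.

```csharp
using System.Threading;

// Hypothetical guard (not a Keyoti API): allows only one crawl/build at a time in this process.
public static class CrawlBuildGuard
{
    // 0 = idle, 1 = running
    private static int running;

    // Atomically switches idle -> running; returns false if a run is already in progress.
    public static bool TryBegin()
    {
        return Interlocked.CompareExchange(ref running, 1, 0) == 0;
    }

    // Marks the run as finished; call this from a finally block.
    public static void End()
    {
        Interlocked.Exchange(ref running, 0);
    }

    public static bool IsRunning
    {
        get { return Volatile.Read(ref running) == 1; }
    }
}
```

With this guard, Page_Load would call `if (CrawlBuildGuard.TryBegin()) CrawlBuildAsync();` and the finally block in CrawlBuild would call `CrawlBuildGuard.End();`.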

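There are two settings you will need to change for your setup: the start URL (startUrl in CrawlBuild) and the index directory (Configuration.IndexDirectory in Page_Load).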

[Please remember to give the ASPNET user permission to write to the index directory (e.g. cacls c:\inetpub\wwwroot\idxdir /E /G ASPNET:F)]
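To confirm that the permission change took effect, you could run a quick write test under the account ASP.NET runs as. The helper below is a hypothetical sketch (IndexDirectoryCheck.CanWrite is not a SearchUnit API). It creates and then deletes a uniquely named temporary file in the directory and reports whether both steps succeeded.

```csharp
using System;
using System.IO;

// Hypothetical diagnostic: can the current process create and delete files in 'directory'?
public static class IndexDirectoryCheck
{
    public static bool CanWrite(string directory)
    {
        string probe = Path.Combine(directory, "write-test-" + Guid.NewGuid().ToString("N") + ".tmp");
        try
        {
            File.WriteAllText(probe, "test");
            File.Delete(probe);
            return true;
        }
        catch (UnauthorizedAccessException)
        {
            return false; // the account lacks write or delete rights
        }
        catch (IOException)
        {
            return false; // includes DirectoryNotFoundException and other I/O failures
        }
    }
}
```

For example, a test page could call `Response.Write(IndexDirectoryCheck.CanWrite(@"C:\Inetpub\wwwroot\RunCrawlBuild\IndexDirectory"));`, passing the same path you assign to Configuration.IndexDirectory.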

ASPX (Default.aspx)

<%@ Page language="c#" Codefile="Default.aspx.cs" AutoEventWireup="false" Inherits="RunCrawlBuild._Default" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" >
<HTML>
<HEAD>
<title>Default</title>
<meta name="GENERATOR" Content="Microsoft Visual Studio .NET 7.1">
<meta name="CODE_LANGUAGE" Content="C#">
<meta name="vs_defaultClientScript" content="JavaScript">
<meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
</HEAD>
<body>
    <form id="form1" runat="server">
    <div>

    </div>
    </form>
</body>
</html>

CODEBEHIND (Default.aspx.cs)

using System;
using System.Collections;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Web;
using System.Web.SessionState;
using System.Web.UI;
using System.Runtime.Remoting.Messaging;
using System.Web.UI.WebControls;
using System.Web.UI.HtmlControls;
using Keyoti.SearchEngine.Search;
using Keyoti.SearchEngine;
using Keyoti.SearchEngine.DataAccess.IndexableSourceRecords;

namespace RunCrawlBuild
{
    /// <summary>
    /// Starts a background crawl and index build each time the page is requested.
    /// </summary>
    public partial class _Default : System.Web.UI.Page
    {
        private delegate void CrawlBuildDelegate();

        Configuration Configuration = new Configuration(); //this is required in versions 3 onwards, and should be removed in older versions

        // Runs CrawlBuild on a thread-pool thread so the page returns immediately.
        private IAsyncResult CrawlBuildAsync()
        {
            CrawlBuildDelegate crawlBuild = new CrawlBuildDelegate(CrawlBuild);
            return crawlBuild.BeginInvoke(new AsyncCallback(MyCallback), null);
        }

        private void MyCallback(IAsyncResult ar)
        {
            AsyncResult aResult = (AsyncResult)ar;
            CrawlBuildDelegate crawlBuild = (CrawlBuildDelegate)aResult.AsyncDelegate;
            try
            {
                // EndInvoke rethrows any exception raised inside CrawlBuild.
                crawlBuild.EndInvoke(ar);
            }
            catch (Exception ex)
            {
                // An exception escaping on a thread-pool thread would stop the worker process.
                System.Diagnostics.Trace.WriteLine("CrawlBuild failed: " + ex);
            }
        }

        void CrawlBuild()
        {
            try
            {
                Keyoti.SearchEngine.Index.DocumentIndex webSiteSpider = new Keyoti.SearchEngine.Index.DocumentIndex(Configuration);

                //use these lines to import a particular website
                string startUrl = "http://localhost/"; // change this to the site you want to index
                webSiteSpider.Import(new Keyoti.SearchEngine.DataAccess.IndexableSourceRecords.WebsiteBasedIndexableSourceRecord(startUrl));

                //use these lines instead to refresh existing sources
                /*ArrayList existingSources = webSiteSpider.GetIndexableSourceRecords();
                foreach (IndexableSourceRecord record in existingSources)
                {
                    webSiteSpider.Import(record);
                }*/

                webSiteSpider.Close();
            }
            finally
            {
                // Clear the flag so that a later request can start a new crawl/build.
                Application["runningCB"] = false;
            }
        }

        private void Page_Load(object sender, System.EventArgs e)
        {
            Configuration.IndexDirectory = @"C:\Inetpub\wwwroot\RunCrawlBuild\IndexDirectory"; // change this to your index directory

            // Check and set the flag under the Application lock so two simultaneous requests cannot both start a crawl/build.
            bool start = false;
            Application.Lock();
            try
            {
                object running = Application["runningCB"];
                if (running == null || !(bool)running)
                {
                    Application["runningCB"] = true;
                    start = true;
                }
            }
            finally
            {
                Application.UnLock();
            }

            if (start)
                CrawlBuildAsync();
        }

        #region Web Form Designer generated code
        override protected void OnInit(EventArgs e)
        {
            // CODEGEN: This call is required by the ASP.NET Web Form Designer.
            InitializeComponent();
            base.OnInit(e);
        }

        /// <summary>
        /// Required method for Designer support - do not modify
        /// the contents of this method with the code editor.
        /// </summary>
        private void InitializeComponent()
        {
            this.Load += new System.EventHandler(this.Page_Load);
        }
        #endregion
    }
}

With this page set up on the server, each request for it will start a crawl and build of the site 'http://localhost/' (unless one is already running). The start URL can, of course, be changed to any URL.

If you are not sure that it is working, or need help, please just email support@keyoti.com.