Indexing meta data with document content in a document management system (C#, MVC)
https://keyoti.com/kb/Default.aspx?ToDo=view&questId=275&catId=54

Typically the SearchUnit indexer indexes sources such as the web (by crawling), databases, and the filesystem. However, it cannot automatically merge meta-data from a database with the documents that the meta-data describes.

To do that we need to write a little code that pulls the data together and applies it as needed.

There is a complete project here for MVC; however, it applies equally to Web Forms or non-web projects. In the project there are two areas to focus on: “SearchIndexController.cs”, which does the indexing (for brevity the controller contains the actual logic), and Views\SearchIndex\Index.cshtml, which holds the view. Everything else in the project is regular plumbing code.

Also, for simplicity this project indexes documents every time the page loads; in reality you would probably want to index when documents are added or changed. If you do index from a web page, it is better to index the documents asynchronously.
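One way to move indexing off the request thread is sketched below. This is only an illustration, not part of the sample project: BackgroundIndexer and TryStart are hypothetical names, and indexWork stands for whatever method holds your indexing logic. The guard simply skips a request if a previous indexing run is still in progress.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of SearchUnit or the sample project).
public static class BackgroundIndexer
{
    // 0 = idle, 1 = an indexing run is in progress.
    private static int running;

    // Starts indexWork on a thread-pool thread unless a run is already active.
    // Returns the running task, or null if the request was skipped.
    public static Task TryStart(Action indexWork)
    {
        if (Interlocked.CompareExchange(ref running, 1, 0) != 0)
            return null;

        return Task.Run(() =>
        {
            try { indexWork(); }
            finally { Interlocked.Exchange(ref running, 0); }
        });
    }
}
```

A controller action could then call BackgroundIndexer.TryStart(SomeIndexingMethod) and return immediately. Note that work started this way can be cut short if the application pool recycles; on .NET Framework 4.5.2 or later, HostingEnvironment.QueueBackgroundWorkItem may be a better fit for that reason.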

Indexing using meta-data

In this example a simple List<> is used to hold the meta-data, but it could also have come from a database.
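The DocumentMetaData type belongs to the sample project and is not listed in this article. A minimal sketch of what it might look like follows; the property names are taken from the object initializers in the controller code, while the string types are an assumption.

```csharp
// Hypothetical shape of the DocumentMetaData type used by the controller.
// Property names match those used in the article; the types are assumed.
public class DocumentMetaData
{
    public string Author { get; set; }
    public string Url { get; set; }
    public string Type { get; set; }
    public string Description { get; set; }
    public string Title { get; set; }
}
```

The controller action that fills a list of these objects and indexes them is: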

public ActionResult Index()
{
    string appURL = string.Format("{0}://{1}{2}", Request.Url.Scheme, Request.Url.Authority, Url.Content("~"));

    /* NOTE this sample depends upon external URLs, please double check the URLs are accessible before attempting to index them. */

    //Create some meta data that we want to index - typically this would come from a database.
    var documentsToIndex = new List<DocumentMetaData>();
    documentsToIndex.Add(new DocumentMetaData {
        Author = "Crowd",
        Url = "https://en.wikipedia.org/wiki/John_Smith",
        Type = "Web page",
        Description = "Wikipedia page about John Smith",
        Title = "Wikipedia article about John Smith" });

    documentsToIndex.Add(new DocumentMetaData {
        Author = "Unknown Author",
        //appURL already ends with "/" (from Url.Content("~")), so trim it to avoid a double slash.
        Url = appURL.TrimEnd('/') + "/docs/1.pdf",
        Type = "PDF",
        Description = "PDF about the explorer, John Smith",
        Title = "PDF about John Smith" });

    //Create a configuration object using the index directory path where we want to store the index files.
    var config = new Keyoti.SearchEngine.Configuration { IndexDirectory = System.Web.Hosting.HostingEnvironment.MapPath("~/App_Data/Index") };

    //Force the documents to be reindexed even if they haven't changed, just for testing.
    config.IgnoreLastModifiedDate = true;
    config.UseFileSizeToIdentifyChange = false;

    DocumentIndex documentIndex = null;
    try
    {
        documentIndex = new DocumentIndex(config);

        //Iterate our meta data, and index the documents
        foreach (DocumentMetaData dm in documentsToIndex)
        {
            var doc = new Keyoti.SearchEngine.Documents.Document(dm.Url, config);

            var dt = doc.ReadText();
            //Add our meta data to the indexed content
            dt.AppendText(dm.Author + " ", config);
            dt.AppendText(dm.Description + " ", config);
            doc.Title = dm.Title;

            //Add extra data to the CustomData so we can use it as the results are generated
            doc.CustomData = "Title=" + dm.Title + "&Type=" + dm.Type;

            documentIndex.AddDocument(doc);
        }
    }
    finally
    {
        //Always close the index, even if indexing a document throws.
        if (documentIndex != null)
            documentIndex.Close();
    }

    return View();
}

In the code above the meta-data is appended to the indexed text using:

var doc = new Keyoti.SearchEngine.Documents.Document(dm.Url, config);

var dt = doc.ReadText();
//Add our meta data to the indexed content
dt.AppendText(dm.Author+" ", config);
dt.AppendText(dm.Description+" ", config);

We also want to use the title from the meta-data:

doc.Title = dm.Title;

By setting the .CustomData property to a string formatted like a URL query string (name=value pairs joined with &), we can pull out the file type (Type) when the results are shown:

doc.CustomData="Title="+dm.Title+"&Type="+dm.Type;
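Because the string uses & and = as separators, a title or type that itself contains one of those characters could be split incorrectly. A minimal sketch of one way to guard against that is to percent-encode each value before concatenating. CustomDataFormatter is a hypothetical helper, and the sketch assumes SearchUnit decodes percent-encoded values when it reads CustomData back, which is worth verifying against your version.

```csharp
using System;

// Hypothetical helper (not part of SearchUnit).
public static class CustomDataFormatter
{
    // Builds "Title=...&Type=..." with each value percent-encoded, so that
    // '&' or '=' inside a value cannot be mistaken for a separator.
    public static string Build(string title, string type)
    {
        return "Title=" + Uri.EscapeDataString(title ?? "")
             + "&Type=" + Uri.EscapeDataString(type ?? "");
    }
}
```

In the indexing loop this would be used as doc.CustomData = CustomDataFormatter.Build(dm.Title, dm.Type);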

 

To show the file type in the results, the result template needs to be customized:

<div id="sew_searchResultControl">
    <div id="sew_resultHeader">
    </div>
    <div id="sew_resultList">

        <div id="sew_resultItemTEMPLATE" class="sew_resultItem">
            <span class="sew_resultItemLink"><a href="${UriStringWithKeywords}">${Title}</a></span>
            <span class="sew_resultItemSummary">${Summary}</span>
            <span class="sew_previewResultWrapper">
                <img alt="Click to preview the document text" src="/Keyoti_SearchEngine_Web_Common/ResultPreview_Expander_Closed.png"
                     onclick="keyotiSearchResultPreviewer.toggleResultPreview(this,
                    '${UriStringAsStored}',
                    '/Keyoti_SearchEngine_Web_Common/ResultPreview_Expander_Closed.png',
                    '/Keyoti_SearchEngine_Web_Common/ResultPreview_Expander_Opened.png')" />
                <span class="sew_previewResultContent">Loading document...</span>
            </span>
            <div style="clear:both; height:1px;"></div>
            <span class="sew_resultItemURL">${UriString}</span>
            <span class="sew_location">${Location}</span>
            <span class="sew_location">${Content}</span>

            <span class="${CustomDataDictionary.TypeDisplayClass}">File type: ${CustomDataDictionary.Type}</span>
        </div>

    </div>

    <div id="sew_resultFooter"></div>

</div>
