
PDF Document title - SearchUnit - Forum

Ankur Jaini
#1 Posted : Saturday, March 22, 2014 8:35:54 AM
Rank: Member

Groups: Registered

Joined: 3/22/2014
Posts: 18
Hi,
I am trying to index PDF documents from my NAS device. Indexing works properly, but I want the results to show each document's title. I have set the Configuration option "PDF Document Title" = DocumentTitleField. My documents on the NAS have titles like "SanMacros(22-01-2013)", but when I search, the results show titles like "Layout1", "Layout2" and so on instead of "SanMacros(22-01-2013)". Please help me find a solution. Waiting for your reply. Thanks

Sr.Software Developer
Sublime IT Solutions.
Jim
#2 Posted : Monday, March 24, 2014 4:28:09 AM
Rank: Advanced Member

Groups: Administrators, Registered

Joined: 8/13/2004
Posts: 2,669
Location: Canada
Hi Ankur, would you be able to send one of the PDFs where this happens to me via support at keyoti.com please?

Thanks
Jim

-your feedback is helpful to other users, thank you!



Jim
#3 Posted : Friday, March 28, 2014 6:03:32 PM
Rank: Advanced Member

Groups: Administrators, Registered

Joined: 8/13/2004
Posts: 2,669
Location: Canada
Thanks - looks like we're picking up the wrong /Title tag in the document.

I'm going to email you directly about this with regards to a solution.

Jim

-your feedback is helpful to other users, thank you!



Ankur Jaini
#4 Posted : Friday, March 28, 2014 7:50:43 PM
Rank: Member

Groups: Registered

Joined: 3/22/2014
Posts: 18
Thanks a ton, Jim. I'm looking forward to your suggestions regarding this problem.

Sr.Software Developer
Sublime IT Solutions.
Jim
#5 Posted : Saturday, March 29, 2014 3:22:10 AM
Rank: Advanced Member

Groups: Administrators, Registered

Joined: 8/13/2004
Posts: 2,669
Location: Canada
You're welcome - please download here http://keyoti.com/downlo...2012-PDFTitleChoice.msi

The actual issue was that your PDF has 2 types of title meta data in it - so you can now specify which to use. To do that you need to set PdfDocumentTitleSource to PdfDocumentTitleSource.DocumentTitleFieldPreferNonDC in Configuration.

You can edit the configuration in index manager or as described here http://keyoti.com/produc...Guide/Configuration.htm


Regarding your other question, about how to search by creation date:

To access the creation date you will need a plug-in (don't worry, they're pretty easy). The plug-in reads the metadata from the PDF during indexing and gets the creation date; you can then do what you need with it (e.g. append it to the body text so that it is searchable, or store it in Custom Data so you can sort by it, as described here http://keyoti.com/produc...ith%20Custom%20Data.htm)


Eg. a plugin that would get the creation date and use it as Custom Data (so you can order by date)

Code:

using System;
using System.Collections.Generic;
using System.Text;
using Keyoti.SearchEngine.Events;
using Keyoti.SearchEngine.Documents;
using Keyoti.SearchEngine.DataAccess;
using Keyoti.SearchEngine.Index.IndexableSources;
using System.Collections;
using Keyoti.SearchEngine.DataAccess.IndexableSourceRecords;

namespace Keyoti.SearchEngine
{
    public class ExternalEventHandler
    {
        IEventDispatcher dispatcher;
        Configuration conf;

        /// <summary>
        /// New, attaches event handlers.
        /// </summary>
        /// <param name="dispatcher">The object which fires events that this plug-in handles.</param>
        /// <param name="conf">The engine configuration.</param>
        public ExternalEventHandler(IEventDispatcher dispatcher, Configuration conf)
        {
            Keyoti.SearchEngine.DataAccess.Log.WriteLogEntry("Plug-in Template Project", "Initialized", conf);
            dispatcher.Action += new ActionEventHandler(dispatcher_Action);
            dispatcher.NeedObject += new NeedObjectEventHandler(dispatcher_NeedObject);
            this.dispatcher = dispatcher;
            this.conf = conf;
           
        }

        /// <summary>
        /// Removes the handlers, so that the object can be disposed of. 
        /// </summary>
        /// <remarks>This is called when the DLL is unloaded.</remarks>
        public void DetachHandlers()
        {
            if (dispatcher != null)
            {
                dispatcher.Action -= new ActionEventHandler(dispatcher_Action);
                dispatcher.NeedObject -= new NeedObjectEventHandler(dispatcher_NeedObject);
            }


        }

        /// <summary>
        /// Handles ACTION events.  This method is called as the engine performs various actions.
        /// </summary>
        public void dispatcher_Action(object sender, ActionEventArgs e)
        {
            //Log everything - comment this line after debugging to optimize speed.
            Keyoti.SearchEngine.DataAccess.Log.WriteLogEntry("Plug-in Template Project", e.ActionData.Name.ToString(), conf);


            try
            {

                /* Some examples
                 * Also see Help for details on what data is available for different events */
                switch (e.ActionData.Name)
                {
                   
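                    case ActionName.ReadingText:

                        DocumentText documentText = e.ActionData.Data as DocumentText;
                        if (documentText == null) break;

                        // Raw PDF creation date, e.g. "D:20130111170238-06'00'" (D:YYYYMMDDHHmmSSOHH'mm').
                        // Documents without this meta entry are skipped rather than throwing.
                        object rawCreateDate = documentText.MetaData["CreateDate"];
                        if (rawCreateDate != null)
                        {
                            string creationDate = rawCreateDate.ToString();
                            documentText.MetaCustomData = creationDate;
                        }
                        break;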
                   

                }

            }
            catch (Exception ex)
            {
                Keyoti.SearchEngine.DataAccess.Log.WriteLogEntry("Plug-in Template Project", "Exception: "+ex.ToString(), conf);
            }
        }

        /// <summary>
        /// Handles need object events.  This method is called as the engine requires new instances of various classes.
        /// </summary>
        public void dispatcher_NeedObject(object sender, NeedObjectEventArgs e)
        {
            //leave blank unless needed, will use default objects.
        }

    }
}



Or to make it append the date change

Code:

documentText.MetaCustomData = creationDate;


to

Code:

documentText.AppendText("Created on: "+creationDate, conf);



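Just remember that the creation date arrives in the PDF format D:YYYYMMDDHHmmSSOHH'mm', which DateTime.Parse will not accept as-is, so you might want to convert it first (for example with DateTime.ParseExact on the date portion) to make it something more user friendly.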
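
For anyone who wants the time and UTC offset as well as the date, here is a minimal sketch of converting the raw value in plain .NET. It assumes the full D:YYYYMMDDHHmmSSOHH'mm' shape quoted above (shorter PDF date forms are simply rejected), and the class and method names are hypothetical, not part of the SearchUnit API.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of Keyoti SearchUnit.
public static class PdfDateSketch
{
    // Tries to parse a PDF date such as "D:20130111170238-06'00'".
    // Returns false when the input does not have the assumed shape.
    public static bool TryParsePdfDate(string raw, out DateTimeOffset result)
    {
        result = default(DateTimeOffset);
        if (string.IsNullOrEmpty(raw) || raw.Length < 16 ||
            !raw.StartsWith("D:", StringComparison.Ordinal))
            return false;

        // 14 digits after the "D:" prefix: yyyyMMddHHmmss
        DateTime local;
        if (!DateTime.TryParseExact(raw.Substring(2, 14), "yyyyMMddHHmmss",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out local))
            return false;

        // Whatever follows is the zone: "-06'00'", "+01'00'", "Z", or nothing.
        TimeSpan offset = TimeSpan.Zero;
        string zone = raw.Substring(16);
        if (zone.Length >= 3 && (zone[0] == '+' || zone[0] == '-'))
        {
            int hours;
            int minutes = 0;
            if (!int.TryParse(zone.Substring(1, 2), NumberStyles.None,
                    CultureInfo.InvariantCulture, out hours))
                return false;
            if (zone.Length >= 6 &&
                !int.TryParse(zone.Substring(4, 2), NumberStyles.None,
                    CultureInfo.InvariantCulture, out minutes))
                return false;
            if (minutes > 59) return false;

            offset = new TimeSpan(hours, minutes, 0);
            if (offset > TimeSpan.FromHours(14)) return false; // DateTimeOffset limit
            if (zone[0] == '-') offset = offset.Negate();
        }

        try
        {
            result = new DateTimeOffset(local, offset);
            return true;
        }
        catch (ArgumentOutOfRangeException)
        {
            return false; // date plus offset falls outside the supported range
        }
    }
}
```

The Try pattern is used so that documents without a usable creation date can be skipped without relying on exceptions during indexing.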

If you'd like my actual plugin project let me know.





Best
Jim

-your feedback is helpful to other users, thank you!



Ankur Jaini
#6 Posted : Saturday, March 29, 2014 5:23:25 AM
Rank: Member

Groups: Registered

Joined: 3/22/2014
Posts: 18
Thanks a ton, Jim. This looks really helpful, and I hope it will solve my issues. I will let you know if it works. Thanks once again.

Sr.Software Developer
Sublime IT Solutions.
Ankur Jaini
#7 Posted : Saturday, March 29, 2014 5:36:25 AM
Rank: Member

Groups: Registered

Joined: 3/22/2014
Posts: 18
One more thing: where should PdfDocumentTitleSource.DocumentTitleFieldPreferNonDC be set, in the configuration.xml file in the index directory, or do I have to set it programmatically?

Sr.Software Developer
Sublime IT Solutions.
Ankur Jaini
#8 Posted : Saturday, March 29, 2014 5:48:05 AM
Rank: Member

Groups: Registered

Joined: 3/22/2014
Posts: 18
Ahh, I got it. There is now an option in Index Manager called DocumentTitleFieldPreferNonDC.

Sr.Software Developer
Sublime IT Solutions.
Ankur Jaini
#9 Posted : Saturday, March 29, 2014 6:06:43 AM
Rank: Member

Groups: Registered

Joined: 3/22/2014
Posts: 18
Jim, will you please elaborate on how to use this plugin? The title issue is resolved by the solution you gave me, thanks a ton for that. The only issue that remains is sorting by created date. Will you please mail me a complete example for my requirement? You also have the link to the NAS device from which I want to index my files. It's a humble request, and I will be very thankful if you do this so that I can implement it in my application. Thanks a lot.

Sr.Software Developer
Sublime IT Solutions.
Ankur Jaini
#10 Posted : Saturday, March 29, 2014 7:08:10 AM
Rank: Member

Groups: Registered

Joined: 3/22/2014
Posts: 18
When you are available, please ping me at sublimeitsolution13 on Skype. Please help with this solution as well.

Sr.Software Developer
Sublime IT Solutions.
Ankur Jaini
#11 Posted : Saturday, March 29, 2014 8:03:13 AM
Rank: Member

Groups: Registered

Joined: 3/22/2014
Posts: 18
Hi Jim, I have mailed you my test code. Will you please make some changes in it to solve the create date issue? There are 2 pages in it, SearchPage and ResultPage: the result page shows the results, and the search page just sends the search keyword to it. I hope you don't mind doing that, because I know it's like playing with toys for you :)

Sr.Software Developer
Sublime IT Solutions.
Ankur Jaini
#12 Posted : Saturday, March 29, 2014 9:02:10 AM
Rank: Member

Groups: Registered

Joined: 3/22/2014
Posts: 18
Jim, I have created a plugin with the code you gave, but it shows all results in a random order. I want them in a proper descending or ascending order. When we index docs by modified date, all records are arranged in descending order automatically. I don't want to give the user a sorting option; I want all docs arranged in descending order by default. Is that possible?

Sr.Software Developer
Sublime IT Solutions.
Ankur Jaini
#13 Posted : Saturday, March 29, 2014 12:52:14 PM
Rank: Member

Groups: Registered

Joined: 3/22/2014
Posts: 18
Please reply soon.

Sr.Software Developer
Sublime IT Solutions.
Ankur Jaini
#14 Posted : Saturday, March 29, 2014 3:02:55 PM
Rank: Member

Groups: Registered

Joined: 3/22/2014
Posts: 18
Also, the example given at this link http://keyoti.com/produc...ith%20Custom%20Data.htm
is not working properly either; it shows results in a random order.

Sr.Software Developer
Sublime IT Solutions.
Jim
#15 Posted : Saturday, March 29, 2014 11:29:40 PM
Rank: Advanced Member

Groups: Administrators, Registered

Joined: 8/13/2004
Posts: 2,669
Location: Canada
Thanks for emailing your prototype project, that's appreciated. So you got the plugin working fine and the meta custom data was set to the create date. The only things missing were parsing the PDF date to a DateTime and, I think, realizing that 'Sr1' in our article corresponds to 'SearchResult1' in your code.


Here's the complete result page codebehind

Code:

using System;
using System.Collections.Generic;
using System.Text;
using Keyoti.SearchEngine.Events;
using Keyoti.SearchEngine.Documents;
using Keyoti.SearchEngine.DataAccess;
using Keyoti.SearchEngine.Index.IndexableSources;
using System.Collections;
using Keyoti.SearchEngine.DataAccess.IndexableSourceRecords;
using Keyoti.SearchEngine;
using System.Web.UI.WebControls;


public partial class ResultPage : System.Web.UI.Page
{
    bool sortByDate = false;

    SortDirection sortDirection = SortDirection.Descending;
   
    protected void Page_Load(object sender, EventArgs e)
    {
        SearchResult1.FilterLoadLevel = Keyoti.SearchEngine.Search.FilterLoadLevel.Everything;
        SearchResult1.ItemCreated += Sr1_ItemCreated;

        SearchResult1.Configuration.CentralEventDispatcher.Action += new Keyoti.SearchEngine.Events.ActionEventHandler(CentralEventDispatcher_Action);
    }
    protected void sortDownBT_Click(object sender, EventArgs e)
    {

        sortByDate = true;

        sortDirection = SortDirection.Descending;

        SearchResult1.InvalidateChildControlHierarchy();

    }

    protected void sortUpBT_Click(object sender, EventArgs e)
    {

        sortByDate = true;

        sortDirection = SortDirection.Ascending;

        SearchResult1.InvalidateChildControlHierarchy();

    }
    protected void Sr1_ItemCreated(object sender, Keyoti.SearchEngine.Web.SearchResultItemEventArgs e)
    {

        if (e.Item is Keyoti.SearchEngine.Web.ResultItem)
        {

            Label dateLabel = e.Item.FindControl("dateLabel") as Label;

            if (dateLabel != null)
            {

                try
                {
                    DateTime docDate = ParsePDFMetaDate((e.Data as Keyoti.SearchEngine.Search.ResultItem).DocumentRecord.CustomData);

                    dateLabel.Text = docDate.ToLongDateString();

                }

                catch (FormatException)
                {

                    //wasn't a date

                }

            }

        }

    }

    private static DateTime ParsePDFMetaDate(string rawString)
    {
        //"D:20130114062009-06'00
        string metaDate = rawString.Substring(2, 8);
        DateTime docDate = DateTime.ParseExact(metaDate, "yyyyMMdd", System.Globalization.CultureInfo.InvariantCulture);

        return docDate;
    }
    void CentralEventDispatcher_Action(object sender, Keyoti.SearchEngine.Events.ActionEventArgs e)
    {

        if (sortByDate && e.ActionData.Name == Keyoti.SearchEngine.Events.ActionName.ResultItemsFinalized)
        {

            Keyoti.SearchEngine.Utils.ResultItemList resultItems = e.ActionData.Data as Keyoti.SearchEngine.Utils.ResultItemList;

            resultItems.Sort(new DocumentDateComparer(sortDirection));

        }

    }

    class DocumentDateComparer : IComparer<Keyoti.SearchEngine.Search.ResultItem>
    //NOTE: .NET1 users, the above line should be "class DocumentDateComparer : IComparer"
    {

        SortDirection sortDirection;

        public DocumentDateComparer(SortDirection sortDirection)
        {

            this.sortDirection = sortDirection;

        }


        public int Compare(Keyoti.SearchEngine.Search.ResultItem x, Keyoti.SearchEngine.Search.ResultItem y)
        //NOTE: .NET1 users, the above line should be "public int Compare(object x, object y)"
        {

            string xData = (x as Keyoti.SearchEngine.Search.ResultItem).DocumentRecord.CustomData;

            string yData = (y as Keyoti.SearchEngine.Search.ResultItem).DocumentRecord.CustomData;

            if (xData.Length == 0 && yData.Length > 0) return 1;

            if (yData.Length == 0 && xData.Length > 0) return -1;

            if (xData.Length == 0 && yData.Length == 0) return 0;

            DateTime xDate, yDate;

            try
            {

                xDate = ParsePDFMetaDate(xData); //DateTime.Parse(xData);

            }

            catch (FormatException)
            {

                //not a date, so make it historic so that it ends up lower in the sort order

                xDate = DateTime.MinValue;

            }

            try
            {

                yDate = ParsePDFMetaDate(yData); //DateTime.Parse(yData);

            }

            catch (FormatException)
            {

                //not a date, so make it historic so that it ends up lower in the sort order

                yDate = DateTime.MinValue;

            }

            if (sortDirection == SortDirection.Descending)

                return yDate.CompareTo(xDate);

            else

                return xDate.CompareTo(yDate);

        }

    }

}



I also added dateLabel to the ResultItem template, but you don't have to do that.

Code:

<ResultItemTemplate>
                <div class="SEResultItem">
                    <table border="0">
                        <tr>
                            <td class="SEResultItemLink"
                                style="font-family:sans-serif, Verdana, Arial, Helvetica; font-size:10pt; ">
                                <a href="<%# Container.Uri %>"><%# Container.Title %></a>
                                <asp:Label runat="server" ID="dateLabel" />
                            </td>



To have it sort by date without clicking a button, just set sortByDate to true in the declaration instead of false:

bool sortByDate = true;
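
To see the ordering rule on its own, outside the SearchUnit types, here is a minimal sketch of the same comparison applied to plain strings. It assumes the custom data holds either a raw PDF date ("D:YYYYMMDD...") or something unusable; the class name is hypothetical, and undated entries are placed last in both directions, which is a design choice of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical stand-in for the comparer above, working on raw custom-data strings.
public class CustomDataDateComparer : IComparer<string>
{
    private readonly bool descending;

    public CustomDataDateComparer(bool descending)
    {
        this.descending = descending;
    }

    // Reads the yyyyMMdd portion after the "D:" prefix, if present.
    private static bool TryGetDate(string raw, out DateTime date)
    {
        date = DateTime.MinValue;
        return raw != null && raw.Length >= 10 &&
               raw.StartsWith("D:", StringComparison.Ordinal) &&
               DateTime.TryParseExact(raw.Substring(2, 8), "yyyyMMdd",
                   CultureInfo.InvariantCulture, DateTimeStyles.None, out date);
    }

    public int Compare(string x, string y)
    {
        DateTime xDate, yDate;
        bool xOk = TryGetDate(x, out xDate);
        bool yOk = TryGetDate(y, out yDate);

        if (!xOk && !yOk) return 0;
        if (!xOk) return 1;   // undated entries go after dated ones
        if (!yOk) return -1;

        return descending ? yDate.CompareTo(xDate) : xDate.CompareTo(yDate);
    }
}
```

Calling `items.Sort(new CustomDataDateComparer(true))` on a `List<string>` of custom-data values puts the newest dated entries first and leaves empty or malformed entries at the end.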

Working project will be emailed to you.

Best
Jim



-your feedback is helpful to other users, thank you!



Ankur Jaini
#16 Posted : Monday, March 31, 2014 5:34:28 AM
Rank: Member

Groups: Registered

Joined: 3/22/2014
Posts: 18
Thanks a lot, Jim. I appreciate your efforts, and your support is commendable. It works exactly as I want. Thanks once again. I am very happy; your product and your support are amazing. Please share the price catalog for the search engine with me.

Sr.Software Developer
Sublime IT Solutions.
Jim
#17 Posted : Monday, March 31, 2014 5:43:27 AM
Rank: Advanced Member

Groups: Administrators, Registered

Joined: 8/13/2004
Posts: 2,669
Location: Canada
You're welcome - glad to help.

Licensing -> http://keyoti.com/produc...otNetWeb/licensing.html

Jim

-your feedback is helpful to other users, thank you!



Ankur Jaini
#18 Posted : Monday, March 31, 2014 6:38:53 AM
Rank: Member

Groups: Registered

Joined: 3/22/2014
Posts: 18
Your code shows an error when I reindex my files. The error is: "startIndex cannot be larger than length of string.
Parameter name: startIndex"

Sr.Software Developer
Sublime IT Solutions.
Ankur Jaini
#19 Posted : Monday, March 31, 2014 10:12:11 AM
Rank: Member

Groups: Registered

Joined: 3/22/2014
Posts: 18
Jim, the test key is not working on the server. For how long does a test key run on a live server?

Sr.Software Developer
Sublime IT Solutions.
Jim
#20 Posted : Monday, March 31, 2014 4:06:08 PM
Rank: Advanced Member

Groups: Administrators, Registered

Joined: 8/13/2004
Posts: 2,669
Location: Canada
Ankur,

The string problem is likely here
Code:

private static DateTime ParsePDFMetaDate(string rawString)
    {
        //"D:20130114062009-06'00
        string metaDate = rawString.Substring(2, 8);
        DateTime docDate = DateTime.ParseExact(metaDate, "yyyyMMdd", System.Globalization.CultureInfo.InvariantCulture);

        return docDate;
    }


Needs to change to


Code:

private static DateTime ParsePDFMetaDate(string rawString)
    {
        // Substring(2, 8) needs at least 10 characters, e.g. "D:20130114062009-06'00"
        if (rawString != null && rawString.Length >= 10 && rawString.StartsWith("D:"))
        {
            string metaDate = rawString.Substring(2, 8);
            return DateTime.ParseExact(metaDate, "yyyyMMdd", System.Globalization.CultureInfo.InvariantCulture);
        }

        // No PDF meta date on this document, so fall back to today.
        return DateTime.Now;
    }


This is because some documents won't have a PDF meta date, so just return today for those.
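
One side effect of falling back to today is that undated documents look like the newest ones when sorting in descending order. If that matters, a variant that returns null and leaves the decision to the caller is one option. The following is only a sketch under that assumption, with a hypothetical class name, and is not the code that was emailed.

```csharp
using System;
using System.Globalization;

// Hypothetical alternative to ParsePDFMetaDate that reports "no date" explicitly.
public static class PdfMetaDate
{
    // Returns the date portion of a value like "D:20130114062009-06'00",
    // or null when the value is missing, too short, or not a valid date.
    public static DateTime? ParseOrNull(string rawString)
    {
        if (rawString == null || rawString.Length < 10 ||
            !rawString.StartsWith("D:", StringComparison.Ordinal))
            return null;

        DateTime parsed;
        bool ok = DateTime.TryParseExact(rawString.Substring(2, 8), "yyyyMMdd",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);

        return ok ? parsed : (DateTime?)null;
    }
}
```

A caller can then show an empty label for a null result, or substitute DateTime.MinValue so that undated documents sink to the bottom of a newest-first list.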

As for the test keys, they work for 30 days. Did you install Pro or Lite?

You can get a new key here
http://keyoti.com/produc...ult.aspx?SKU=SEWEBP.NET

Jim

-your feedback is helpful to other users, thank you!


