Exclude files from indexing - SearchUnit - Forum

agendlin
#1 Posted : Wednesday, March 12, 2014 8:58:07 PM
Rank: Member

Groups: Registered

Joined: 2/23/2014
Posts: 24
Hi Jim,
How can I avoid indexing files with extensions like .js, .mdb and .dll? Is there a configuration setting I can use to exclude these file types from indexing? The plugin that I use to create lastModifiedDate crashed for some reason when trying to add the date from the HttpWebResponse to the custom data.

Thanks,
Alex

Jim
#2 Posted : Wednesday, March 12, 2014 9:44:18 PM
Rank: Advanced Member

Groups: Administrators, Registered

Joined: 8/13/2004
Posts: 2,667
Location: Canada
Sorry, I missed your other post where you mentioned a problem with this line

lastResponseLastModified = (CType(e.ActionData.Data, HttpWebResponse)).LastModified

Can I see what your code is currently for the plugin, specifically how you're working with the date - is it still the same as above?

I think it would be good to solve the problem at the plugin, rather than avoiding the files which cause the problem - just so that your code is more robust in case in the future a file comes up that you haven't avoided.


But, to answer your question, the web crawler should automatically ignore resource file types, .js, images etc.

For a filesystem import you can specify what to import and what not to import (see File-system Document Store)
http://keyoti.com/produc...UserGuide/Importing.htm


Best
Jim




-your feedback is helpful to other users, thank you!



agendlin
#3 Posted : Wednesday, March 12, 2014 10:20:55 PM
OK, here is my plugin. I tried to add logic so that MetaCustomData is not set for unwanted files.

Public Sub CentralEventDispatcher_Action(ByVal sender As Object, ByVal e As ActionEventArgs)
    Try
        'Log everything - comment this line after debugging to optimize speed.
        'DataAccess.Log.WriteLogEntry("PluginProject", "Ready", conf)

        If e.ActionData.Name = ActionName.ResponseFromServerReceived Then
            lastResponseURI = CType(e.ActionData.Data, Net.HttpWebResponse).ResponseUri
            lastResponseLastModified = (CType(e.ActionData.Data, Net.HttpWebResponse)).LastModified
        End If

        If e.ActionData.Name = ActionName.ReadText Then
            If lastResponseURI.ToString.IndexOf(".dll", System.StringComparison.OrdinalIgnoreCase) = -1 And _
               lastResponseURI.ToString.IndexOf(".css", System.StringComparison.OrdinalIgnoreCase) = -1 And _
               lastResponseURI.ToString.IndexOf(".js", System.StringComparison.OrdinalIgnoreCase) = -1 And _
               lastResponseURI.ToString.IndexOf(".jpg", System.StringComparison.OrdinalIgnoreCase) = -1 And _
               lastResponseURI.ToString.IndexOf(".gif", System.StringComparison.OrdinalIgnoreCase) = -1 And _
               lastResponseURI.ToString.IndexOf(".png", System.StringComparison.OrdinalIgnoreCase) = -1 And _
               lastResponseURI.ToString.IndexOf(".xml", System.StringComparison.OrdinalIgnoreCase) = -1 And _
               lastResponseURI.ToString.IndexOf(".rpt", System.StringComparison.OrdinalIgnoreCase) = -1 And _
               lastResponseURI.ToString.IndexOf(".ini", System.StringComparison.OrdinalIgnoreCase) = -1 And _
               lastResponseURI.ToString.IndexOf(".ssi", System.StringComparison.OrdinalIgnoreCase) = -1 And _
               lastResponseURI.ToString.IndexOf(".swf", System.StringComparison.OrdinalIgnoreCase) = -1 Then

                If lastResponseURI = CType(sender, Document).Uri Then
                    CType(e.ActionData.Data, DocumentText).MetaCustomData = lastResponseLastModified.ToShortDateString()
                End If

            End If
        End If

    Catch ex As Exception
        Throw New Exception("Source Description: " & ex.Message & ControlChars.CrLf & _
                            "Source Trace: " & ex.StackTrace)
    End Try
End Sub

thanks,
Alex
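
A note on the extension test in the plugin above: IndexOf matches anywhere in the URL, so ".js" also matches ".json" and anything in the query string. A minimal sketch of a tighter check, assuming only the standard .NET Uri and Path classes. The module name, the function name IsExcludedResource and the extension list are illustrative and not part of SearchUnit:

```vbnet
Imports System
Imports System.IO

Module ExtensionFilter

    ' Illustrative list, copied from the extensions tested in the plugin.
    Private ReadOnly Excluded As String() = {".dll", ".css", ".js", ".jpg", ".gif", _
                                             ".png", ".xml", ".rpt", ".ini", ".ssi", ".swf"}

    ' Returns True when the path part of the URL (query string ignored)
    ' ends in one of the excluded extensions, compared case-insensitively.
    Public Function IsExcludedResource(ByVal address As Uri) As Boolean
        If address Is Nothing Then Return False

        ' AbsolutePath drops the query string, so "?file=x.js" cannot match.
        Dim ext As String = Path.GetExtension(address.AbsolutePath)

        Return Array.Exists(Excluded, _
            Function(x) String.Equals(x, ext, StringComparison.OrdinalIgnoreCase))
    End Function

End Module
```

With this sketch, IsExcludedResource(New Uri("http://example.com/scripts/app.js")) is True, while a URL such as http://example.com/data.json?file=x.js gives False, where the IndexOf chain would have skipped it.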
Jim
#4 Posted : Wednesday, March 12, 2014 10:47:31 PM
Thanks, what was the exception? If it's a null reference, which is possible for certain file types, then the solution would be to check that lastResponseLastModified is not Nothing before using it.

If lastResponseURI = CType(sender, Document).Uri And Not lastResponseLastModified Is Nothing Then
    CType(e.ActionData.Data, DocumentText).MetaCustomData = lastResponseLastModified.ToShortDateString()
End If




agendlin
#5 Posted : Thursday, March 13, 2014 2:49:17 PM
Hi Jim,
I still get this error: "Object reference not set to an instance of an object", even though I now check for lastResponseLastModified being Nothing.

Thanks,
Alex

Jim
#6 Posted : Thursday, March 13, 2014 4:24:19 PM
Alex, on which line? Perhaps it's actually lastResponseURI that is Nothing?

Thanks
Jim



agendlin
#7 Posted : Thursday, March 13, 2014 5:04:56 PM
No, it was e.ActionData.Data that was Nothing. I actually got it to work, but now I get this:
Exception: A fatal exception has prevented the search engine from continuing. counter: 2000

It's getting frustrating. I will disconnect the plugin to see whether this is a plugin issue.
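
For the case where e.ActionData.Data is Nothing, one way to avoid the null reference is to guard the cast itself. This is a minimal sketch using only standard .NET types. The module name and the helper name TryGetLastModified are hypothetical, and it assumes the event payload is passed in as a plain Object:

```vbnet
Imports System
Imports System.Net

Module ResponseGuard

    ' Returns the response's LastModified date when data is an HttpWebResponse.
    ' Returns an empty Nullable when data is Nothing or some other type,
    ' because TryCast yields Nothing instead of throwing.
    Public Function TryGetLastModified(ByVal data As Object) As DateTime?
        Dim response As HttpWebResponse = TryCast(data, HttpWebResponse)
        If response Is Nothing Then Return Nothing
        Return response.LastModified
    End Function

End Module
```

In the handler, the result of TryGetLastModified(e.ActionData.Data) could be stored in a field declared As DateTime? and tested with HasValue before MetaCustomData is set. A plain DateTime field is a value type and can never be Nothing, so the Nullable makes the "no date recorded" state explicit.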

Jim
#8 Posted : Thursday, March 13, 2014 5:18:07 PM
The fatal exception probably means that your filesystem import has the wrong parameters (the virtual folder is wrong), or that the server returns an error when the import tries to index the files.

Can you enable logging in the configuration, try your import again, and then send all the .txt log files (there should be several) to me in a ZIP via support at keyoti.com?

Thanks
Jim



Jim
#9 Posted : Thursday, March 13, 2014 10:18:36 PM
Thanks for the files - so there are server errors when it makes requests for files like

http://slwebdev.xxxx/_vti_cnf/Global.asax
and other resource type files, dlls etc.

Each of these errors is counted, and when the count reaches 2000 the import stops. You can change that limit in the configuration: scan it for the number 2000 and increase it to whatever you like. It's only there as a sanity check.

A better approach, though, is to narrow down what you want to import using the target match list in the import params (click 'More'). Add items like .doc, .pdf and .aspx.

One other thing to note: ASPX pages don't always import very well from a filesystem import, because GET parameters don't exist for files on disk. If you have links like default.aspx?id=1234 then you'll need a website import to pick them up.

Jim








agendlin
#10 Posted : Friday, March 14, 2014 2:27:50 AM
Thanks for your reply, that was really helpful. Here is another error message:

Exception: The file-system import operation had 474 problems reading documents, double check that the virtual folder parameter correctly points to the local folder.

Is this the same thing as a fatal error, or is it just for information?

Thanks,
Alex

Jim
#11 Posted : Friday, March 14, 2014 2:52:43 AM
It's just for information. If only 474 files were being imported, then it's a problem, but if you imported 1000 files, it's probably fine. You can always check the Reader.txt log file: it lists the server errors, and you can try any suspicious URL in a browser.




Copyright © 2002- Keyoti Inc.