Keyoti SearchUnit API Docs
WebSiteSpider Class
Namespace: Keyoti.SearchEngine.Index.IndexableSources
Keyoti SearchUnit v6
Crawls websites to find new links.
Declaration Syntax
C#
public class WebSiteSpider
Visual Basic
Public Class WebSiteSpider
Members
Member / Description
WebSiteSpider(Configuration)
Creates a new instance.

AddedDocument
Fired when a document has been added to the index.

AddingDocument
Fired when a document is about to be added to the index.

CancelCrawl()
Cancels the crawl.

Close()
Closes the index.

Configuration
Gets the instance of the Configuration class that holds the settings to be used.

Crawl() Obsolete.
Starts the spidering process, which will recrawl all the existing documents for new links.

Crawl(String, Boolean) Obsolete.
Starts the spidering process, which will recrawl all the existing documents for new links.

Crawl(String, Boolean, WebsiteBasedIndexableSourceRecord) Obsolete.
Starts the spidering process, which will recrawl all the existing documents for new links.

Crawl(ArrayList) Obsolete.
Crawls all URL strings in urlList, adds new document URLs (including those from urlList) to the 'database', and returns a list of newly found URLs.

Crawl(ArrayList, ArrayList) Obsolete.
Crawls all URL strings in urlList, adds new document URLs (including those from urlList) to the 'database', and returns a list of newly found URLs.

Crawl(ArrayList, ArrayList, WebsiteBasedIndexableSourceRecord) Obsolete.
Crawls all URL strings in urlList, adds new document URLs (including those from urlList) to the 'database', and returns a list of newly found URLs.

CreateNewDocument(DocumentRecord)
Called whenever a Document object is created. Override this method to use Document subclasses.

DataAccess
The data-access layer.

DiscoverLinkedURLs(ArrayList, ArrayList, ArrayList)
Crawls all URLs in urlList and returns a list of all newly found URLs (whether or not they are known to the document index, and not including the URLs in urlList).

DiscoverLinkedURLs(ArrayList, ArrayList, ArrayList, ArrayList)
Crawls all URLs in urlList and adds the URLs found to the discoveredURLs list (whether or not they are known to the document index, and not including the URLs in urlList).

DiscoverLinkedURLs(ArrayList, ArrayList, ArrayList, ArrayList, ArrayList, Boolean, ArrayList, WebSiteSpider.DiscoveredDocumentHandler)
Crawls all URLs in urlList and adds the URLs found to the discoveredURLs list (whether or not they are known to the document index, and not including the URLs in urlList).

DocumentShouldBeCrawled(Document) Obsolete.
Called from Crawl. Checks whether the current document's type is specified in the config file as one to be crawled, and returns true if so. (The file types from the config file are stored in a class member when the object is instantiated.)

Equals(Object)
Determines whether the specified Object is equal to the current Object.
(Inherited from Object.)
Finalize()
Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection.
(Inherited from Object.)
ForceHttps(Uri)
Converts an HTTP Uri to HTTPS.

GetHashCode()
Serves as a hash function for a particular type.
(Inherited from Object.)
GetIndexedDocuments() Obsolete.
Gets all the indexed documents from the document table as an ArrayList of Document objects.

GetType()
Gets the type of the current instance.
(Inherited from Object.)
IsDocumentToBeCrawled(Document, Boolean, Boolean, String, ArrayList, ArrayList)
Called during a crawl. Checks whether the current document's type is specified in the config file as one to be crawled, and returns true if so. (The file types from the config file are stored in a class member when the object is instantiated.)

MemberwiseClone()
Creates a shallow copy of the current Object.
(Inherited from Object.)
NewLinkNo
Number of new links found so far.

OnReaderExceptionOccurred(ReaderException, Uri)
Called whenever a ReaderException is caught.

Open()
Opens the index for use.

PrefilterDocumentToCrawl(Document)
Called during a crawl to determine whether the document might be crawled. Returns false if the document will not be crawled based on its URL.

ProcessedLinkNo
The number of processed links.

ReaderExceptionOccurred
Fired whenever the (web) reader encounters a network exception, e.g. a 404 (this can be useful for identifying dead links).

ToString()
Returns a string that represents the current object.
(Inherited from Object.)
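Example (illustrative only)
The member list describes ForceHttps(Uri) as converting an HTTP Uri to HTTPS. To give a feel for what such a conversion involves, here is a minimal standalone sketch built on System.UriBuilder. It is not Keyoti's implementation: the class name HttpsSketch, the method name ToHttps, and the decision to keep an explicit non-default port are all assumptions made for illustration.

```csharp
using System;

public static class HttpsSketch
{
    // Hypothetical helper, not part of SearchUnit:
    // returns an https version of an absolute http Uri.
    public static Uri ToHttps(Uri uri)
    {
        if (uri == null) throw new ArgumentNullException(nameof(uri));

        // Leave anything that is not plain http (https, ftp, file, ...) untouched.
        if (uri.Scheme != Uri.UriSchemeHttp) return uri;

        var builder = new UriBuilder(uri)
        {
            Scheme = Uri.UriSchemeHttps,
            // -1 tells UriBuilder to use the default port of the new scheme.
            // An explicit non-default port (e.g. 8080) is kept (an assumption).
            Port = uri.IsDefaultPort ? -1 : uri.Port
        };
        return builder.Uri;
    }
}
```

Resetting the port matters because UriBuilder copies port 80 from the original address; without the reset, the result would be an https address that still points at port 80.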
Remarks
Before any operation (such as Crawl or GetIndexedDocuments) can be performed, the index must be opened with Open; otherwise a KeyotiFatalException is thrown. The index should be closed afterwards with Close.
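Example (illustrative only)
The open-then-close requirement above can be wrapped in a small helper so that Close still runs when the work in between throws. This is a sketch, not part of SearchUnit: IndexSession and Run are hypothetical names, and the commented usage assumes an existing Configuration instance held in a variable named configuration.

```csharp
using System;

public static class IndexSession
{
    // Hypothetical helper: calls open, then work, and always calls close
    // afterwards, even if work throws. If open itself throws, close is skipped.
    public static void Run(Action open, Action work, Action close)
    {
        if (open == null) throw new ArgumentNullException(nameof(open));
        if (work == null) throw new ArgumentNullException(nameof(work));
        if (close == null) throw new ArgumentNullException(nameof(close));

        open();
        try
        {
            work();
        }
        finally
        {
            close();
        }
    }
}

// Possible usage with a spider (assumes `configuration` already exists):
//
//   var spider = new WebSiteSpider(configuration);
//   IndexSession.Run(
//       () => spider.Open(),
//       () => { /* crawl or link-discovery calls go here */ },
//       () => spider.Close());
```

The lambdas are used instead of passing spider.Open directly so the sketch does not depend on the return types of Open and Close, which this page does not state.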
Inheritance Hierarchy
Object
WebSiteSpider

Assembly: Keyoti4.SearchEngine.Core (Module: Keyoti4.SearchEngine.Core.dll) Version: 2022.8.22.610