WebSiteSpider Class

Declaration Syntax

C#

public class WebSiteSpider

Visual Basic

Public Class WebSiteSpider

Members

Member	Description
WebSiteSpider(Configuration)	Creates a new instance.
AddedDocument	Fired when a document has been added to the index.
AddingDocument	Fired when a document is about to be added to the index.
CancelCrawl()	Cancels the crawl.
Close()	Closes the index.
Configuration	Gets the instance of the Configuration class that holds the settings to be used.
Crawl()	Obsolete. Starts the spidering process, which recrawls all existing documents for new links.
Crawl(String, Boolean)	Obsolete. Starts the spidering process, which recrawls all existing documents for new links.
Crawl(String, Boolean, WebsiteBasedIndexableSourceRecord)	Obsolete. Starts the spidering process, which recrawls all existing documents for new links.
Crawl(ArrayList)	Obsolete. Crawls all URL strings in urlList, adds new document URLs (including those from urlList) to the 'database', and returns a list of newly found URLs.
Crawl(ArrayList, ArrayList)	Obsolete. Crawls all URL strings in urlList, adds new document URLs (including those from urlList) to the 'database', and returns a list of newly found URLs.
Crawl(ArrayList, ArrayList, WebsiteBasedIndexableSourceRecord)	Obsolete. Crawls all URL strings in urlList, adds new document URLs (including those from urlList) to the 'database', and returns a list of newly found URLs.
CreateNewDocument(DocumentRecord)	Called whenever a Document object is created; override this method to use Document subclasses.
DataAccess	The data-access layer.
DiscoverLinkedURLs(ArrayList, ArrayList, ArrayList)	Crawls all URLs in urlList and returns a list of all newly found URLs, whether or not they are known to the document index, excluding the URLs in urlList.
DiscoverLinkedURLs(ArrayList, ArrayList, ArrayList, ArrayList)	Crawls all URLs in urlList and adds the found URLs to the discoveredURLs list, whether or not they are known to the document index, excluding the URLs in urlList.
DiscoverLinkedURLs(ArrayList, ArrayList, ArrayList, ArrayList, ArrayList, Boolean, ArrayList, WebSiteSpider.DiscoveredDocumentHandler)	Crawls all URLs in urlList and adds the found URLs to the discoveredURLs list, whether or not they are known to the document index, excluding the URLs in urlList.
DocumentShouldBeCrawled(Document)	Obsolete. Called from Crawl; returns true if the current document's type is specified in the config file as one to be crawled. (The file types from the config file are stored in a class member when the object is instantiated.)
Equals(Object)	Determines whether the specified Object is equal to the current Object. (Inherited from Object.)
Finalize()	Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.)
ForceHttps(Uri)	Converts an HTTP Uri to HTTPS.
GetHashCode()	Serves as a hash function for a particular type. (Inherited from Object.)
GetIndexedDocuments()	Obsolete. Gets all the indexed documents from the document table, as an ArrayList of Document objects.
GetType()	Gets the type of the current instance. (Inherited from Object.)
IsDocumentToBeCrawled(Document, Boolean, Boolean, String, ArrayList, ArrayList)	Called during a crawl; returns true if the current document's type is specified in the config file as one to be crawled. (The file types from the config file are stored in a class member when the object is instantiated.)
MemberwiseClone()	Creates a shallow copy of the current Object. (Inherited from Object.)
NewLinkNo	The number of new links found so far.
OnReaderExceptionOccurred(ReaderException, Uri)	Called whenever a ReaderException is caught.
Open()	Opens the index for use.
PrefilterDocumentToCrawl(Document)	Called during a crawl to determine whether the document might be crawled. Returns false if the document will not be crawled based on its URL.
ProcessedLinkNo	The number of processed links.
ReaderExceptionOccurred	Fired whenever the (web) reader encounters a network exception, e.g. a 404 response (this can be useful for identifying dead links).
ToString()	Returns a string that represents the current object. (Inherited from Object.)
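ForceHttps(Uri) is described only as converting an HTTP Uri to HTTPS. The sketch below is a self-contained approximation of that conversion using System.UriBuilder, useful for reasoning about what such a conversion involves. HttpsSketch and ToHttps are hypothetical names, not part of the documented API, and the choices to leave non-HTTP schemes untouched and to map the default port 80 to the HTTPS default are assumptions, not confirmed behavior of WebSiteSpider.

```csharp
using System;

// Hypothetical helper, NOT the library's ForceHttps implementation:
// a standalone approximation of an HTTP-to-HTTPS Uri conversion.
public static class HttpsSketch
{
    // Returns an HTTPS version of an HTTP Uri; other schemes are returned unchanged (assumption).
    public static Uri ToHttps(Uri uri)
    {
        if (uri == null) throw new ArgumentNullException(nameof(uri));

        // Only plain HTTP is rewritten; https, ftp, file, etc. pass through as-is.
        if (uri.Scheme != Uri.UriSchemeHttp) return uri;

        // UriBuilder copies host, port, path, query and fragment from the original Uri.
        var builder = new UriBuilder(uri) { Scheme = Uri.UriSchemeHttps };

        // If the original used HTTP's default port (80), reset the port so the
        // result uses HTTPS's default (443); an explicit non-default port is kept.
        if (uri.IsDefaultPort) builder.Port = -1;

        return builder.Uri;
    }
}
```

For example, http://example.com/path?q=1 becomes https://example.com/path?q=1, while http://example.com:8080/x keeps its explicit port and becomes https://example.com:8080/x.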

Remarks

Before any operation (such as Crawl or GetIndexedDocuments) can be performed, the index must be opened with Open; otherwise a KeyotiFatalException is thrown. The index should be closed afterwards with Close.
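With the real class, the documented open/use/close rule maps onto the usual try/finally shape: call spider.Open(), do the work inside try, and call spider.Close() in finally so the index is closed even if the work throws. Because the constructor needs a Configuration whose setup is not described here, the runnable sketch below uses a hypothetical stand-in, FakeIndex, that only mimics the documented contract. FakeIndex, IndexUsage and WithOpenIndex are illustrative names, and InvalidOperationException stands in for KeyotiFatalException.

```csharp
using System;

// Hypothetical stand-in, NOT the Keyoti WebSiteSpider: it only mimics the
// documented rule that operations require Open() first and Close() afterwards.
public sealed class FakeIndex
{
    public bool IsOpen { get; private set; }
    public int CloseCount { get; private set; }

    public void Open() => IsOpen = true;

    public void Close()
    {
        IsOpen = false;
        CloseCount++;
    }

    // Stand-in for an operation such as Crawl. The real class throws
    // KeyotiFatalException; InvalidOperationException is used here instead.
    public string Crawl()
    {
        if (!IsOpen) throw new InvalidOperationException("Index must be opened with Open() first.");
        return "crawled";
    }
}

public static class IndexUsage
{
    // Opens the index, runs the work, and always closes the index,
    // even when the work throws.
    public static T WithOpenIndex<T>(FakeIndex index, Func<FakeIndex, T> work)
    {
        index.Open();
        try
        {
            return work(index);
        }
        finally
        {
            index.Close();
        }
    }
}
```

Wrapping the pattern in a helper keeps the Close call in one place, so callers cannot forget it on an exception path.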

Inheritance Hierarchy

Object
	WebSiteSpider
