Crawls web-sites to find new links.

C#: public class WebSiteSpider
Visual Basic: Public Class WebSiteSpider
Visual C++: public ref class WebSiteSpider
F#: type WebSiteSpider = class end

Member | Description |
---|---|
WebSiteSpider(Configuration) | Creates a new instance. |
AddedDocument | Fired when a document has been added to the index. |
AddingDocument | Fired when a document is about to be added to the index. |
CancelCrawl() | Cancels the crawl. |
Close() | Closes the index. |
Configuration | Gets the instance of the Configuration class that holds the settings to be used. |
Crawl() | Obsolete. Starts the spidering process, which will recrawl all the existing documents for new links. |
Crawl(String, Boolean) | Obsolete. Starts the spidering process, which will recrawl all the existing documents for new links. |
Crawl(String, Boolean, WebsiteBasedIndexableSourceRecord) | Obsolete. Starts the spidering process, which will recrawl all the existing documents for new links. |
Crawl(ArrayList) | Obsolete. Crawls all URL strings in urlList, adds new document URLs (including those from urlList) to the 'database', and returns a list of newly found URLs. |
Crawl(ArrayList, ArrayList) | Obsolete. Crawls all URL strings in urlList, adds new document URLs (including those from urlList) to the 'database', and returns a list of newly found URLs. |
Crawl(ArrayList, ArrayList, WebsiteBasedIndexableSourceRecord) | Obsolete. Crawls all URL strings in urlList, adds new document URLs (including those from urlList) to the 'database', and returns a list of newly found URLs. |
CreateNewDocument(DocumentRecord) | Called whenever a Document object is created. Override this method to use Document subclasses. |
DataAccess | The data-access layer. |
DiscoverLinkedURLs(ArrayList, ArrayList, ArrayList) | Crawls all URLs in urlList and returns a list of all newly found URLs (whether or not they are known to the document index, and not including URLs in urlList). |
DiscoverLinkedURLs(ArrayList, ArrayList, ArrayList, ArrayList) | Crawls all URLs in urlList and adds found URLs to the discoveredURLs list (whether or not they are known to the document index, and not including URLs in urlList). |
DiscoverLinkedURLs(ArrayList, ArrayList, ArrayList, ArrayList, ArrayList, Boolean, ArrayList, WebSiteSpider.DiscoveredDocumentHandler) | Crawls all URLs in urlList and adds found URLs to the discoveredURLs list (whether or not they are known to the document index, and not including URLs in urlList). |
DocumentShouldBeCrawled(Document) | Obsolete. Called from Crawl. Checks whether the current document type is specified in the config file as one to be crawled, and returns true if so. (File types from the config file are stored in a class member when the object is instantiated.) |
Equals(Object) | (Inherited from Object.) |
Finalize() | (Inherited from Object.) |
GetHashCode() | (Inherited from Object.) |
GetIndexedDocuments() | Obsolete. Gets all the indexed documents from the document table, as an ArrayList of Document objects. |
GetType() | (Inherited from Object.) |
IsDocumentToBeCrawled(Document, Boolean, Boolean, String, ArrayList, ArrayList) | Called during crawl. Checks whether the current document type is specified in the config file as one to be crawled, and returns true if so. (File types from the config file are stored in a class member when the object is instantiated.) |
MemberwiseClone() | (Inherited from Object.) |
NewLinkNo | Number of new links found so far. |
OnReaderExceptionOccurred(ReaderException, Uri) | Called whenever a ReaderException is caught. |
Open() | Opens the index for use. |
PrefilterDocumentToCrawl(Document) | Called during crawl to identify whether the document might be crawled. Returns false if the document won't be crawled based on its URL. |
ProcessedLinkNo | The number of processed links. |
ReaderExceptionOccurred | Fired whenever the (web) reader encounters a network exception, e.g. a 404 (this can be useful for identifying dead links). |
ToString() | (Inherited from Object.) |
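
The DiscoverLinkedURLs descriptions above state a contract: links found on the pages in urlList are reported whether or not the index already knows them, and the URLs in urlList itself are left out. The following is a minimal sketch of that contract only, not Keyoti's implementation. The class `LinkDiscoverySketch`, the method `Discover`, and the `getLinks` delegate (a stand-in for fetching a page and extracting its links) are hypothetical names introduced for illustration, and the duplicate filtering is an assumption that the documentation does not confirm.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical model of the documented DiscoverLinkedURLs contract.
// It does not call into Keyoti4.SearchEngine.Core.
public static class LinkDiscoverySketch
{
    // Returns every URL linked from the pages in urlList, excluding the URLs
    // in urlList itself and without consulting any index of known documents.
    // getLinks is an assumed stand-in for "download the page, extract links".
    public static ArrayList Discover(
        ArrayList urlList,
        Func<string, IEnumerable<string>> getLinks)
    {
        // URLs we were asked to crawl; these must not appear in the result.
        var seeds = new HashSet<string>(StringComparer.Ordinal);
        foreach (string url in urlList)
        {
            seeds.Add(url);
        }

        // Tracks links already reported (assumption: duplicates are dropped).
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var discovered = new ArrayList();

        foreach (string url in urlList)
        {
            foreach (string link in getLinks(url))
            {
                if (!seeds.Contains(link) && seen.Add(link))
                {
                    discovered.Add(link);
                }
            }
        }
        return discovered;
    }
}
```

ArrayList is used here only because the documented overloads take ArrayList parameters. The meaning of the additional ArrayList arguments in the real overloads is not described on this page, so the sketch does not model them.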

Before any operation (Crawl, GetIndexedDocuments) can be performed, the index must
be opened with Open; otherwise a KeyotiFatalException is thrown. The index should be closed afterwards with Close.
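
One way to honor the open-then-close requirement is a try/finally wrapper, so that Close still runs when the crawl fails. The sketch below assumes nothing about the Keyoti assembly beyond the parameterless Open() and Close() members listed above. `IIndexSession`, `SpiderLifecycle`, and `RecordingSession` are hypothetical stand-ins so that the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in mirroring only the Open()/Close() members that the
// member table lists for WebSiteSpider.
public interface IIndexSession
{
    void Open();
    void Close();
}

public static class SpiderLifecycle
{
    // Opens the index, runs the supplied work, and always closes the index,
    // even if the work throws. Open() sits outside the try block so that a
    // failed Open() is not followed by a Close() on an unopened index.
    public static void RunWithOpenIndex(IIndexSession session, Action work)
    {
        session.Open();
        try
        {
            work();
        }
        finally
        {
            session.Close();
        }
    }
}

// Fake session that records the call order, used only to show the pattern.
public sealed class RecordingSession : IIndexSession
{
    public List<string> Calls { get; } = new List<string>();
    public void Open() { Calls.Add("Open"); }
    public void Close() { Calls.Add("Close"); }
}
```

With the real class, the same shape would presumably be `spider.Open();` followed by the crawl inside `try` and `spider.Close();` inside `finally`. How the Configuration instance passed to the constructor is obtained is not covered on this page.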

Object
  WebSiteSpider

Assembly: Keyoti4.SearchEngine.Core (Module: Keyoti4.SearchEngine.Core.dll) Version: 2015.6.15.120