Keyoti SearchUnit API Docs
Parser Class
API Documentation > Keyoti.SearchEngine.Documents > Parser
Keyoti SearchUnit v6
Abstract base class for document parsers.
Declaration Syntax
C#
public abstract class Parser
Visual Basic
Public MustInherit Class Parser
Members
Member / Description
Parser(Configuration)
Configuration
Gets the instance of the Configuration class that holds the settings to be used.

CopyStream(Stream)
Encoding
The character encoding used in the document Stream, if applicable.

Equals(Object)
Determines whether the specified Object is equal to the current Object.
(Inherited from Object.)
Finalize()
Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection.
(Inherited from Object.)
FindUrlsInPlainText(String)
GetHashCode()
Serves as a hash function for a particular type.
(Inherited from Object.)
GetHiddenFooter(Uri, String, String)
Creates a footer with additional (hidden) indexed text, based on the Uri, title, meta tags, etc.

GetNextWord(String)
Returns the next 'word' in rawBody. The method is iterative: each subsequent call moves to the next consecutive word.

GetType()
Gets the type of the current instance.
(Inherited from Object.)
GetWordsInUri(Uri)
Returns the words found in the Uri, as strings in an ArrayList.

IsCurrentWordInTitle()
Returns whether the word last returned by GetNextWord is part of the title.

IsInIgnoredRegion(ArrayList)
Determines whether the current word (at wordStart) is in an ignored region.

IsStreamNeeded() Obsolete.
Indicates whether the parser needs a stream to be passed to it in order to perform a ReadText or ReadLinks operation.

MemberwiseClone()
Creates a shallow copy of the current Object.
(Inherited from Object.)
ParseWords(String, ArrayList, WordCollection, StringBuilder, ArrayList)
Parses rawBody into discrete Word objects and places them in readDocumentWords.

PreprocessBreakChunk(String)
Applies any required processing to a chunk of text that typically forms either a word or whitespace block.

ProcessWordsToFinalIndexedList(WordCollection, Boolean)
Processes the list of all words found in the document and returns the list of words that should be indexed.

ProcessWordsToFinalIndexedList(WordCollection, Boolean, ArrayList)
Processes the list of all words found in the document and returns the list of words that should be indexed.

Read(Stream, Document, Encoding)
Reads a document and returns an object holding its text and any links.

ReadLinks(Stream, Encoding) Obsolete.
Reads links to other pages.

ReadText(Stream, Uri, Encoding) Obsolete.
Reads text and returns a list of words and the title.

ResetWordPointers()
Resets the current word being processed.

ToString()
Returns a string that represents the current object.
(Inherited from Object.)
TruncateWordWithRepeatedChar(String)
Removes repeated non-letters from word.

WordEnd
The current word's end.

WordStart
The current word's start.
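
The word-iteration members listed above (GetNextWord, WordStart, WordEnd, ResetWordPointers) suggest a cursor-style scanner. The following is a minimal, self-contained sketch of that pattern only, not SearchUnit's implementation. The class name SimpleWordScanner, the letter-or-digit word rule, the null return at the end of the text, and the exclusive end index are all assumptions made for illustration.

```csharp
using System;

// Hypothetical stand-in that illustrates cursor-style word iteration.
// It does not derive from Keyoti's Parser and makes no claim about its internals.
public class SimpleWordScanner
{
    // Start index (inclusive) of the word last returned; -1 before any word is found.
    public int WordStart { get; private set; } = -1;

    // End index (exclusive) of the word last returned; -1 before any word is found.
    public int WordEnd { get; private set; } = -1;

    // Index at which the next scan begins.
    private int position;

    // Returns the next run of letters/digits in rawBody, or null when none remain.
    // Assumption: a "word" is a maximal run of letter-or-digit characters.
    public string GetNextWord(string rawBody)
    {
        if (rawBody == null) throw new ArgumentNullException(nameof(rawBody));

        int i = position;

        // Skip separators (anything that is not a letter or digit).
        while (i < rawBody.Length && !char.IsLetterOrDigit(rawBody[i])) i++;
        if (i >= rawBody.Length)
        {
            position = i;
            return null;
        }

        // Consume the word itself.
        int start = i;
        while (i < rawBody.Length && char.IsLetterOrDigit(rawBody[i])) i++;

        WordStart = start;
        WordEnd = i;
        position = i;
        return rawBody.Substring(start, i - start);
    }

    // Moves the cursor back to the beginning so iteration can start over.
    public void ResetWordPointers()
    {
        position = 0;
        WordStart = -1;
        WordEnd = -1;
    }
}
```

In this sketch a caller would invoke GetNextWord repeatedly with the same rawBody until it returns null, reading WordStart and WordEnd after each call, and call ResetWordPointers before scanning a new document.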

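GetWordsInUri is described above as returning the words in a Uri as strings in an ArrayList. As a rough illustration of what such a split might look like, the sketch below breaks the host and path of a System.Uri on non-alphanumeric characters. The helper name UriWords.Split and the splitting rule are hypothetical; the rules SearchUnit actually applies are not stated in this page.

```csharp
using System;
using System.Collections;

// Hypothetical helper, not part of SearchUnit.
public static class UriWords
{
    // Splits the host and path of an absolute Uri into alphanumeric words.
    // Assumption: scheme, query string and fragment are ignored.
    public static ArrayList Split(Uri uri)
    {
        if (uri == null) throw new ArgumentNullException(nameof(uri));

        var words = new ArrayList();
        string text = uri.Host + Uri.UnescapeDataString(uri.AbsolutePath);

        int start = -1; // start of the word being read, or -1 when between words
        for (int i = 0; i <= text.Length; i++)
        {
            bool inWord = i < text.Length && char.IsLetterOrDigit(text[i]);
            if (inWord && start < 0)
            {
                start = i;
            }
            else if (!inWord && start >= 0)
            {
                words.Add(text.Substring(start, i - start));
                start = -1;
            }
        }
        return words;
    }
}
```

With this rule, http://www.example.com/product-list/blue_widgets.html splits into www, example, com, product, list, blue, widgets and html. A real indexer would probably filter out tokens such as www or the file extension, which this sketch does not attempt.
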
Inheritance Hierarchy

Assembly: Keyoti4.SearchEngine.Core (Module: Keyoti4.SearchEngine.Core.dll) Version: 2022.8.22.610