Abstract base class for document parsers.
C#: public abstract class Parser
Visual Basic: Public MustInherit Class Parser
Member | Description |
---|---|
Parser(Configuration) | |
Configuration | Gets the instance of the Configuration class that holds the settings to be used. |
CopyStream(Stream) | |
Encoding | The character encoding used in the document Stream, if applicable. |
Equals(Object) | (Inherited from Object.) |
Finalize() | Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object.) |
FindUrlsInPlainText(String) | |
GetHashCode() | Serves as a hash function for a particular type. (Inherited from Object.) |
GetHiddenFooter(Uri, String, String) | Creates a footer with additional (hidden) indexed text, based on the Uri, title, meta tags, etc. |
GetNextWord(String) | Returns the next 'word' in rawBody. The method is iterative: each subsequent call moves to the next consecutive word. |
GetType() | Gets the type of the current instance. (Inherited from Object.) |
GetWordsInUri(Uri) | Returns the words found in the Uri, as strings in an ArrayList. |
IsCurrentWordInTitle() | Returns whether the word last returned by GetNextWord is part of the title. |
IsInIgnoredRegion(ArrayList) | Determines whether the current word (at wordStart) is in an ignored region. |
IsStreamNeeded() | Obsolete. Indicates whether the parser needs a stream passed to it in order to perform a ReadText or ReadLinks operation. |
MemberwiseClone() | Creates a shallow copy of the current Object. (Inherited from Object.) |
ParseWords(String, ArrayList, WordCollection, StringBuilder, ArrayList) | Parses rawBody into discrete Word objects and places them in readDocumentWords. |
PreprocessBreakChunk(String) | Applies any required processing to a chunk of text that typically forms either a word or a whitespace block. |
ProcessWordsToFinalIndexedList(WordCollection, Boolean) | Processes the list of all words found in the document and returns a list of the words that should be indexed. |
ProcessWordsToFinalIndexedList(WordCollection, Boolean, ArrayList) | Processes the list of all words found in the document and returns a list of the words that should be indexed. |
Read(Stream, Document, Encoding) | Reads a document and returns an object holding its text and any links. |
ReadLinks(Stream, Encoding) | Obsolete. Reads links to other pages. |
ReadText(Stream, Uri, Encoding) | Obsolete. Reads text and returns a list of words and the title. |
ResetWordPointers() | Resets the current word being processed. |
ToString() | Returns a string that represents the current object. (Inherited from Object.) |
TruncateWordWithRepeatedChar(String) | Removes repeated non-letters from word. |
WordEnd | The current word's end. |
WordStart | The current word's start. |
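
The iterative word-reading members listed above (GetNextWord, WordStart, WordEnd, ResetWordPointers, IsInIgnoredRegion) describe a cursor that moves through the raw text one word at a time. The following is a minimal standalone sketch of that pattern, not the Keyoti implementation. The class name WordCursor, the method TryGetNextWord, the rule that a word is a run of letters or digits, and the tuple-based region list are all assumptions made for illustration; the real Parser takes an ArrayList and may define words and regions differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Keyoti4.SearchEngine.Core.
public sealed class WordCursor
{
    private readonly string _text;

    public WordCursor(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        _text = text;
    }

    // Index of the first character of the word last returned (-1 before any word is read).
    public int WordStart { get; private set; } = -1;

    // Index one past the last character of the word last returned.
    public int WordEnd { get; private set; } = 0;

    // Assumption: a "word" is a maximal run of letters or digits.
    // Each call continues from where the previous one stopped.
    public bool TryGetNextWord(out string word)
    {
        int i = WordEnd;

        // Skip separators (whitespace, punctuation) before the next word.
        while (i < _text.Length && !char.IsLetterOrDigit(_text[i])) i++;

        if (i == _text.Length)
        {
            word = string.Empty;   // no more words; pointers are left unchanged
            return false;
        }

        int start = i;
        while (i < _text.Length && char.IsLetterOrDigit(_text[i])) i++;

        WordStart = start;
        WordEnd = i;
        word = _text.Substring(start, i - start);
        return true;
    }

    // Moves the cursor back to the beginning of the text.
    public void ResetWordPointers()
    {
        WordStart = -1;
        WordEnd = 0;
    }

    // Assumption: regions are half-open [Start, End) character ranges.
    // Returns true when the current word starts inside any of them.
    public bool IsInIgnoredRegion(IEnumerable<(int Start, int End)> regions)
    {
        foreach (var region in regions)
        {
            if (WordStart >= region.Start && WordStart < region.End) return true;
        }
        return false;
    }
}
```

A caller would typically loop with `while (cursor.TryGetNextWord(out var word))`, check `IsInIgnoredRegion` for each word, and call `ResetWordPointers` before rereading the same text.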
Assembly: Keyoti4.SearchEngine.Core (Module: Keyoti4.SearchEngine.Core.dll) Version: 2022.8.22.610