Parses HTML documents for words and links.
Declaration Syntax
Language | Declaration |
---|---|
C# | public class HtmlDocumentParser : Parser |
Visual Basic | Public Class HtmlDocumentParser Inherits Parser |
Visual C++ | public ref class HtmlDocumentParser : public Parser |
F# | type HtmlDocumentParser = class inherit Parser end |
Members
Member | Description |
---|---|
HtmlDocumentParser(Configuration) | Creates a new instance of HtmlDocumentParser. |
Configuration | Gets the instance of the Configuration class that holds the settings to be used. (Inherited from Parser.) |
CopyStream(Stream) | (Inherited from Parser.) |
DeriveEncoding(Stream) | Tries to find the encoding of an HTML file from the Content-Type meta tag. |
DeriveTitleFromDocument(String) | Attempts to return the title of the document, based on documentBody. |
Encoding | The character encoding used in the document Stream, if applicable. (Inherited from Parser.) |
Equals(Object) | (Inherited from Object.) |
Finalize() | (Inherited from Object.) |
FindIgnoreRegions(String) | Finds all ignore regions in documentBody. |
GetFilenameFooter(Uri) | Creates a footer with filename information from the Uri. (Inherited from Parser.) |
GetHashCode() | (Inherited from Object.) |
GetNextWord(String) | Returns the next 'word' in rawBody. The method is iterative, so subsequent calls move to consecutive words. (Overrides Parser.GetNextWord(String).) |
GetType() | (Inherited from Object.) |
GetWordsInUri(Uri) | Returns the words found in the Uri, as strings in an ArrayList. (Inherited from Parser.) |
IsCurrentWordInTitle() | Whether the word last returned by GetNextWord is in the title. (Overrides Parser.IsCurrentWordInTitle().) |
IsInIgnoredRegion(ArrayList) | Determines whether the current word (at wordStart) is in an ignored region. (Inherited from Parser.) |
IsStreamNeeded() | Obsolete. Whether the parser needs a stream to be passed to it in order to perform a ReadText or ReadLinks operation. (Inherited from Parser.) |
MemberwiseClone() | (Inherited from Object.) |
ParseWords(String, ArrayList, WordCollection, StringBuilder, ArrayList) | Parses rawBody into discrete Word objects and places them in readDocumentWords. (Inherited from Parser.) |
PreprocessBreakChunk(String) | Applies any required processing to a chunk of text that typically forms either a word or a whitespace block. (Inherited from Parser.) |
ProcessWordsToFinalIndexedList(WordCollection, Boolean) | Processes the list of all words found in the document and returns a list that should be indexed. (Inherited from Parser.) |
ProcessWordsToFinalIndexedList(WordCollection, Boolean, ArrayList) | Processes the list of all words found in the document and returns a list that should be indexed. (Inherited from Parser.) |
Read(Stream, Uri, Encoding) | Reads a document and returns an object holding its text and any links. (Overrides Parser.Read(Stream, Uri, Encoding).) |
ReadDocumentContent(Stream, Encoding) | Returns the string read from 'stream'. |
ReadLinks(Stream, Encoding) | Obsolete. Reads links to other pages. (Inherited from Parser.) |
ReadMetaTable(String) | Reads the meta tags for a document. |
ReadText(Stream, Uri, Encoding) | Obsolete. Reads text and returns the list of words and the title. (Inherited from Parser.) |
ResetWordPointers() | Resets the current word being processed. (Inherited from Parser.) |
ToString() | (Inherited from Object.) |
TruncateWordWithRepeatedChar(String) | Removes repeated non-letters from word. (Inherited from Parser.) |
WordEnd | The current word's end. (Inherited from Parser.) |
WordStart | The current word's start. (Inherited from Parser.) |
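Examples
The DeriveEncoding entry above says the encoding is looked up from the Content-Type meta tag. The following is a minimal, self-contained sketch of that general technique only; it is not Keyoti's implementation and does not call the library. The class name MetaCharsetSniffer, its method signature, the fallback parameter, and the 4096-byte read limit are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, NOT part of the Keyoti API: sniffs a charset name
// from an HTML meta tag near the start of a stream.
public static class MetaCharsetSniffer
{
    // Matches charset=... inside a <meta ...> tag, for example
    // <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    // or the shorter <meta charset="utf-8">.
    private static readonly Regex CharsetPattern = new Regex(
        @"<meta\b[^>]*\bcharset\s*=\s*[""']?\s*([A-Za-z0-9_\-:.]+)",
        RegexOptions.IgnoreCase);

    public static Encoding DeriveEncoding(Stream stream, Encoding fallback)
    {
        // Read only the first few KB (assumed limit); the meta tag is
        // normally near the top. One Read call is enough for a sketch,
        // although some streams may return fewer bytes than requested.
        byte[] buffer = new byte[4096];
        int read = stream.Read(buffer, 0, buffer.Length);

        // Rewind when possible so the caller can still read the whole document.
        if (stream.CanSeek)
        {
            stream.Seek(-read, SeekOrigin.Current);
        }

        // Decode as ASCII: bytes above 127 become '?', which does not
        // affect matching the ASCII markup we are looking for.
        string head = Encoding.ASCII.GetString(buffer, 0, read);

        Match match = CharsetPattern.Match(head);
        if (!match.Success)
        {
            return fallback;
        }

        try
        {
            return Encoding.GetEncoding(match.Groups[1].Value);
        }
        catch (ArgumentException)
        {
            // Unknown charset name.
            return fallback;
        }
        catch (NotSupportedException)
        {
            // Charset name is known but not supported on this platform.
            return fallback;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        string html = "<html><head><meta http-equiv=\"Content-Type\" " +
                      "content=\"text/html; charset=utf-8\"></head></html>";

        using (var stream = new MemoryStream(Encoding.ASCII.GetBytes(html)))
        {
            Encoding encoding = MetaCharsetSniffer.DeriveEncoding(stream, Encoding.ASCII);
            Console.WriteLine(encoding.WebName);
        }
    }
}
```

Passing an explicit fallback keeps the caller in control when no meta tag is present or the charset name is unrecognized. How HtmlDocumentParser itself behaves in those cases is not stated in this reference.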
Inheritance Hierarchy
Object
  Parser
    HtmlDocumentParser
      DataSetRecordParser
Assembly: Keyoti4.SearchEngine.Core (Module: Keyoti4.SearchEngine.Core.dll) Version: 2015.6.15.120