
markup-parser
This set of C# classes parses "HTML-like" markup. The classes are intended to be inherited from rather than used directly.
Class Descriptions
Tag
The simplest possible form of a tag. Includes the tag name, and a list of attributes.
PrimitiveTag
LiteralPrimitiveTag, CommentPrimitiveTag
A low-level representation of a tag. In HTML, it corresponds to a piece of text contained between < and >. Literals (text outside of < and >) and comments are also treated as tags.
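To make the three kinds of primitive concrete, here is a minimal sketch of how markup could be split into tags, literals, and comments. It is not the library's Parser: SketchKind, SketchPrimitive, and SketchTokenizer are hypothetical names invented for this illustration, and the sketch assumes comments use the HTML <!-- ... --> form.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration; not the library's actual types.
public enum SketchKind { Tag, Literal, Comment }

public sealed class SketchPrimitive
{
    public SketchKind Kind { get; }
    public string Text { get; }

    public SketchPrimitive(SketchKind kind, string text)
    {
        Kind = kind;
        Text = text;
    }
}

public static class SketchTokenizer
{
    // Splits markup into tags, comments, and literals, in source order.
    public static List<SketchPrimitive> Split(string source)
    {
        var result = new List<SketchPrimitive>();
        int pos = 0;
        while (pos < source.Length)
        {
            int open = source.IndexOf('<', pos);
            if (open < 0)
            {
                // No more tags: the rest of the input is one literal.
                result.Add(new SketchPrimitive(SketchKind.Literal, source.Substring(pos)));
                break;
            }
            if (open > pos)
            {
                // Text before the next '<' is a literal.
                result.Add(new SketchPrimitive(SketchKind.Literal, source.Substring(pos, open - pos)));
            }

            // Assumption: comments are delimited by "<!--" and "-->".
            bool isComment = string.CompareOrdinal(source, open, "<!--", 0, 4) == 0;
            string terminator = isComment ? "-->" : ">";
            int searchFrom = isComment ? open + 4 : open + 1;
            int close = source.IndexOf(terminator, searchFrom, StringComparison.Ordinal);
            if (close < 0)
            {
                // Unterminated tag or comment: keep the remainder as a literal.
                result.Add(new SketchPrimitive(SketchKind.Literal, source.Substring(open)));
                break;
            }

            int end = close + terminator.Length;
            result.Add(new SketchPrimitive(
                isComment ? SketchKind.Comment : SketchKind.Tag,
                source.Substring(open, end - open)));
            pos = end;
        }
        return result;
    }
}
```

With this sketch, SketchTokenizer.Split("a<b>c<!-- x -->") yields four items: a literal "a", a tag "<b>", a literal "c", and a comment "<!-- x -->".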
ParserState
Container class that encapsulates the raw input (as a string), the output (as a list of PrimitiveTags), and includes methods for navigating the input.
Parser
This class encapsulates most of the parsing logic. Given a ParserState (which contains the raw markup as a string), it produces a collection of PrimitiveTags.
HierarchyNode
LiteralNode, CommentNode, RootNode
A higher level representation of a tag. At this level, the document hierarchy is maintained as a collection of child nodes.
TagConverter
Given a Tag, produces a HierarchyNode. Can also identify tags which automatically close themselves (like <br> in HTML).
The IsSelfClosingMethod attribute identifies which methods of the TagConverter class are used to determine whether or not a tag is self closing. The GetHierarchyNodeMethod attribute is used to determine which method is used to create a HierarchyNode from a given Tag.
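The following sketch shows one way a marker attribute can select which method of a derived converter decides whether a tag is self-closing. It is an assumption-laden illustration, not the library's code: the attribute is redefined locally, SketchTagConverter, HtmlLikeConverter, and ConverterInspector are hypothetical names, the method takes a plain tag name rather than a Tag, and the reflection lookup is only a guess at how such attributes might be consumed.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in: the name mirrors the README, but this definition is assumed.
[AttributeUsage(AttributeTargets.Method)]
public sealed class IsSelfClosingMethodAttribute : Attribute { }

// Hypothetical base converter; by default no tag closes itself.
public class SketchTagConverter
{
    [IsSelfClosingMethod]
    public virtual bool IsSelfClosing(string tagName)
    {
        return false;
    }
}

// A derived converter supplies the markup-specific rule.
public class HtmlLikeConverter : SketchTagConverter
{
    [IsSelfClosingMethod]
    public override bool IsSelfClosing(string tagName)
    {
        return tagName == "br" || tagName == "hr" || tagName == "img";
    }
}

public static class ConverterInspector
{
    // Finds the first public instance method carrying the marker attribute
    // and invokes it. Throws InvalidOperationException if none is marked.
    public static bool IsSelfClosing(SketchTagConverter converter, string tagName)
    {
        MethodInfo method = converter.GetType()
            .GetMethods(BindingFlags.Public | BindingFlags.Instance)
            .First(m => m.GetCustomAttribute<IsSelfClosingMethodAttribute>() != null);
        return (bool)method.Invoke(converter, new object[] { tagName });
    }
}
```

Here ConverterInspector.IsSelfClosing(new HtmlLikeConverter(), "br") returns true, while the same call with "div" returns false. Marking methods with attributes keeps the choice of method names up to the derived class.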
DocumentHierarchyCreator
Traverses a list of primitive tags, and creates a tree of HierarchyNodes. Uses the TagConverter class to create the HierarchyNodes.
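As a rough illustration of turning a flat list into a tree, the sketch below uses a stack of open elements. It is not the library's DocumentHierarchyCreator: SketchNode and SketchHierarchyBuilder are hypothetical names, tokens are plain strings instead of PrimitiveTags, and the self-closing check is passed in as a delegate where the library uses a TagConverter. Closing tags are not matched by name, and a stray closing tag at the top level is dropped, so this is deliberately naive.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical node type for illustration; not the library's HierarchyNode.
public sealed class SketchNode
{
    public string Text { get; }
    public string CloseText { get; set; } = "";
    public List<SketchNode> Children { get; } = new List<SketchNode>();

    public SketchNode(string text)
    {
        Text = text;
    }

    // Renders this node, then its children depth first, then any closing tag.
    public string Render()
    {
        var sb = new StringBuilder(Text);
        foreach (SketchNode child in Children)
        {
            sb.Append(child.Render());
        }
        sb.Append(CloseText);
        return sb.ToString();
    }
}

public static class SketchHierarchyBuilder
{
    public static SketchNode Build(IEnumerable<string> tokens, Func<string, bool> isSelfClosing)
    {
        var root = new SketchNode("");
        var open = new Stack<SketchNode>();
        open.Push(root);

        foreach (string token in tokens)
        {
            if (token.StartsWith("</", StringComparison.Ordinal))
            {
                // Closing tag: record it on the innermost open element and
                // step back to its parent. The root itself is never popped.
                if (open.Count > 1)
                {
                    open.Pop().CloseText = token;
                }
                continue;
            }

            var node = new SketchNode(token);
            open.Peek().Children.Add(node);

            // Only opening tags can have children; literals and comments cannot.
            bool isElement = token.StartsWith("<", StringComparison.Ordinal)
                && !token.StartsWith("<!--", StringComparison.Ordinal);
            if (isElement && !isSelfClosing(token))
            {
                open.Push(node);
            }
        }
        return root;
    }
}
```

Building from the tokens "<p>", "a", "<br>", "b", "</p>" with a delegate that treats "<br>" as self-closing gives a root with one child whose three children are "a", "<br>", and "b". Calling Render() on that root returns "<p>a<br>b</p>".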
Workflow
Instantiate a ParserState object, and set its Source property to the markup to be parsed:
ParserState state = new ParserState();
state.Source = markupToParse;
Instantiate a Parser object, passing in the ParserState created earlier:
Parser parser = new Parser(state);
Call the Parse() method, and store the results:
Collection<PrimitiveTag> tags = parser.Parse();
Instantiate a TagConverter and DocumentHierarchyCreator:
TagConverter converter = new TagConverter();
DocumentHierarchyCreator hierarchyCreator = new DocumentHierarchyCreator();
Call the BuildHierarchy method of the DocumentHierarchyCreator, passing in the list of PrimitiveTags and the TagConverter:
HierarchyNode root = hierarchyCreator.BuildHierarchy(tags.ToArray<PrimitiveTag>(), converter);
At this point, root contains the parsed document as a tree. Calling its Render() method converts the tree back into a string:
string output = root.Render();