Class HTMLPageParser

java.lang.Object
com.opensymphony.module.sitemesh.parser.HTMLPageParser
All Implemented Interfaces:
PageParser
Direct Known Subclasses:
DivExtractingPageParser

public class HTMLPageParser extends Object implements PageParser

Builds an HTMLPage object from an HTML document. This behaves similarly to FastPageParser, but it is a complete rewrite that makes it simpler to add custom features such as extraction and transformation of elements.

To customize the rules used, extend this class and override the addUserDefinedRules() method.
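
The extension point described above follows the template-method pattern: the base class registers its own rules and then calls a protected hook that subclasses override. The sketch below illustrates only that pattern. SketchState, SketchParser, and DivAwareSketchParser are hypothetical stand-ins, not SiteMesh classes, and the rule names are placeholders; the real hook is addUserDefinedRules(State html, PageBuilder page).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the parser state that collects rules.
class SketchState {
    final List<String> rules = new ArrayList<>();

    void addRule(String ruleName) {
        rules.add(ruleName);
    }
}

// Hypothetical stand-in for the base parser: registers core rules,
// then gives subclasses a chance to add their own.
class SketchParser {
    final SketchState buildRules() {
        SketchState state = new SketchState();
        state.addRule("head"); // placeholder core rule
        state.addRule("body"); // placeholder core rule
        addUserDefinedRules(state); // hook for subclasses
        return state;
    }

    // Does nothing in this sketch; subclasses override it.
    protected void addUserDefinedRules(SketchState state) {
    }
}

// Hypothetical subclass that contributes one extra rule through the hook.
class DivAwareSketchParser extends SketchParser {
    @Override
    protected void addUserDefinedRules(SketchState state) {
        state.addRule("div"); // placeholder custom rule
    }
}
```

Calling new DivAwareSketchParser().buildRules() yields a state holding the two core rules followed by the custom one, while the base class yields only the core rules.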

Author:
Joe Walnes
  • Constructor Details

    • HTMLPageParser

      public HTMLPageParser()
  • Method Details

    • parse

      public Page parse(char[] buffer) throws IOException
      Description copied from interface: PageParser
      Parse the given buffer into a Page object.
      Specified by:
      parse in interface PageParser
      Parameters:
      buffer - The buffer for the page.
      Returns:
      The parsed page
      Throws:
      IOException - if an error occurs
    • parse

      public Page parse(SitemeshBuffer buffer) throws IOException
      Description copied from interface: PageParser
      Parse the given buffer into a Page object. DefaultSitemeshBuffer is the appropriate SitemeshBuffer implementation to pass in.
      Specified by:
      parse in interface PageParser
      Parameters:
      buffer - The buffer for the page.
      Returns:
      The parsed page
      Throws:
      IOException - if an error occurs
    • addUserDefinedRules

      protected void addUserDefinedRules(State html, PageBuilder page)
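      Adds the user-defined rules. Subclasses override this method to register custom rules.
      Parameters:
      html - the HTML parser state to which rules are added
      page - the builder for the page being parsed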
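
The two parse overloads listed above accept either a raw char[] or a buffer object. One common way to arrange such a pair is for the array overload to wrap its argument and delegate to the buffer overload. The sketch below shows that arrangement with hypothetical stand-in types (SketchBuffer, ArraySketchBuffer, OverloadSketchParser). It is an assumption about the design rather than a description of the SiteMesh source, and title extraction stands in for real parsing.

```java
// Hypothetical stand-in for a buffer abstraction over the page characters.
interface SketchBuffer {
    char[] getCharArray();

    int getBufferLength();
}

// Hypothetical array-backed buffer implementation.
class ArraySketchBuffer implements SketchBuffer {
    private final char[] chars;

    ArraySketchBuffer(char[] chars) {
        this.chars = chars;
    }

    public char[] getCharArray() {
        return chars;
    }

    public int getBufferLength() {
        return chars.length;
    }
}

// Hypothetical parser with two entry points that share one implementation.
class OverloadSketchParser {
    // Convenience overload: wrap the array and delegate.
    String parse(char[] buffer) {
        return parse(new ArraySketchBuffer(buffer));
    }

    // Returns the text between <title> and </title>, or "" if absent.
    String parse(SketchBuffer buffer) {
        String html = new String(buffer.getCharArray(), 0, buffer.getBufferLength());
        int start = html.indexOf("<title>");
        int end = html.indexOf("</title>");
        if (start < 0 || end < start) {
            return "";
        }
        return html.substring(start + "<title>".length(), end);
    }
}
```

With this arrangement both entry points return the same result for the same characters, so callers can pick whichever form of input they already hold.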