Class HTMLScanner

java.lang.Object
org.ccil.cowan.tagsoup.HTMLScanner
All Implemented Interfaces:
Scanner, Locator

public class HTMLScanner extends Object implements Scanner, Locator
This class implements a table-driven scanner for HTML that tolerates many kinds of defects in its input. It implements the Scanner interface: the scanner reads characters from a Reader and reports lexical events to a ScanHandler.
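To illustrate what "table-driven" means, here is a minimal sketch. It is not TagSoup's actual implementation. The scanner's behavior lives in a lookup table indexed by the current state and by the class of the next input character, and each cell names an action plus the next state. The names `MiniScanner`, `MiniHandler` and `events`, the four states, and the event strings are all hypothetical. `MiniHandler` stands in for ScanHandler. Only text, start tags and end tags are recognized, attributes are skipped, and a stray `<` is recovered as ordinary text.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class MiniScannerDemo {
    // Hypothetical event sink standing in for TagSoup's ScanHandler.
    interface MiniHandler {
        void text(String s);
        void startTag(String name);
        void endTag(String name);
    }

    static class MiniScanner {
        // States (rows of the table).
        static final int TEXT = 0, LT = 1, NAME = 2, REST = 3;
        // Character classes (columns of the table).
        static final int C_LT = 0, C_GT = 1, C_SLASH = 2, C_LETTER = 3, C_OTHER = 4;
        // Actions performed on a transition.
        static final int A_NONE = 0, A_TEXT = 1, A_LT_LT = 2, A_LT_TEXT = 3,
                A_START = 4, A_END = 5, A_NAME = 6, A_TAG = 7;

        // TABLE[state][charClass] = {action, nextState}
        static final int[][][] TABLE = {
            //            '<'              '>'                '/'              letter/digit     other
            /* TEXT */ {{A_NONE, LT},   {A_TEXT, TEXT},    {A_TEXT, TEXT},  {A_TEXT, TEXT},  {A_TEXT, TEXT}},
            /* LT   */ {{A_LT_LT, LT},  {A_LT_TEXT, TEXT}, {A_END, NAME},   {A_START, NAME}, {A_LT_TEXT, TEXT}},
            /* NAME */ {{A_NONE, REST}, {A_TAG, TEXT},     {A_NONE, REST},  {A_NAME, NAME},  {A_NONE, REST}},
            /* REST */ {{A_NONE, REST}, {A_TAG, TEXT},     {A_NONE, REST},  {A_NONE, REST},  {A_NONE, REST}},
        };

        private final StringBuilder text = new StringBuilder();
        private final StringBuilder name = new StringBuilder();
        private boolean isEnd;

        static int classify(int c) {
            if (c == '<') return C_LT;
            if (c == '>') return C_GT;
            if (c == '/') return C_SLASH;
            if (Character.isLetterOrDigit(c)) return C_LETTER;
            return C_OTHER;
        }

        void scan(Reader r, MiniHandler h) throws IOException {
            int state = TEXT;
            int c;
            while ((c = r.read()) != -1) {
                int[] cell = TABLE[state][classify(c)];
                switch (cell[0]) {
                    case A_TEXT: text.append((char) c); break;
                    case A_LT_LT: text.append('<'); break;              // first '<' was not a tag
                    case A_LT_TEXT: text.append('<').append((char) c); break;
                    case A_START: flushText(h); isEnd = false; name.append((char) c); break;
                    case A_END: flushText(h); isEnd = true; break;
                    case A_NAME: name.append((char) c); break;
                    case A_TAG: emitTag(h); break;
                    default: break;                                      // A_NONE: skip the character
                }
                state = cell[1];
            }
            // End of input is handled outside the table: recover any construct left open.
            if (state == LT) text.append('<');
            if (state == NAME || state == REST) emitTag(h);
            flushText(h);
        }

        private void emitTag(MiniHandler h) {
            if (name.length() > 0) {                                     // "</>" yields no event
                if (isEnd) h.endTag(name.toString()); else h.startTag(name.toString());
            }
            name.setLength(0);
        }

        private void flushText(MiniHandler h) {
            if (text.length() > 0) h.text(text.toString());
            text.setLength(0);
        }
    }

    // Collects the events for a string as "kind:value" entries.
    static List<String> events(String html) {
        List<String> out = new ArrayList<>();
        MiniHandler h = new MiniHandler() {
            public void text(String s) { out.add("text:" + s); }
            public void startTag(String n) { out.add("start:" + n); }
            public void endTag(String n) { out.add("end:" + n); }
        };
        try {
            new MiniScanner().scan(new StringReader(html), h);
        } catch (IOException e) {
            throw new UncheckedIOException(e);                           // StringReader does not fail
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [start:p, text:1 < 2, start:br, text:ok, end:p]
        System.out.println(events("<p>1 < 2<br/>ok</p"));
    }
}
```

The defects in the sample input, an unescaped `<` and an end tag cut off by end of input, do not stop the sketch. The stray `<` is reported as text, and the unterminated `</p` is still reported as an end tag.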
  • Constructor Details

    • HTMLScanner

      public HTMLScanner()
  • Method Details

    • getLineNumber

      public int getLineNumber()
      Specified by:
      getLineNumber in interface Locator
    • getColumnNumber

      public int getColumnNumber()
      Specified by:
      getColumnNumber in interface Locator
    • getPublicId

      public String getPublicId()
      Specified by:
      getPublicId in interface Locator
    • getSystemId

      public String getSystemId()
      Specified by:
      getSystemId in interface Locator
    • resetDocumentLocator

      public void resetDocumentLocator(String publicid, String systemid)
      Reset the document locator, supplying the public ID and system ID.
      Specified by:
      resetDocumentLocator in interface Scanner
      Parameters:
      publicid - Public id
      systemid - System id
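      The four Locator getters report the scanner's current position, and resetDocumentLocator starts a new document. The bookkeeping involved can be sketched as follows. The sketch assumes 1-based line and column numbers (the SAX convention) and assumes that CR, LF and CRLF each end exactly one line; TagSoup's own counting rules may differ. `PositionTracker`, `reset` and `advance` are hypothetical names, and the system ID in `main` is a made-up example. Only `org.xml.sax.Locator` is the real interface.

```java
import org.xml.sax.Locator;

class LocatorDemo {
    // Hypothetical position tracker; not TagSoup's code.
    static class PositionTracker implements Locator {
        private String publicId, systemId;
        private int line = 1, column = 1;
        private boolean lastWasCR;

        // Analogous to resetDocumentLocator(publicid, systemid): new ids, position back to 1:1.
        void reset(String publicId, String systemId) {
            this.publicId = publicId;
            this.systemId = systemId;
            line = 1;
            column = 1;
            lastWasCR = false;
        }

        // Move past one character; CR, LF and CRLF each end exactly one line.
        void advance(char c) {
            if (c == '\n' && lastWasCR) {   // LF completing a CRLF pair: already counted
                lastWasCR = false;
                return;
            }
            lastWasCR = (c == '\r');
            if (c == '\n' || c == '\r') {
                line++;
                column = 1;
            } else {
                column++;
            }
        }

        @Override public String getPublicId() { return publicId; }
        @Override public String getSystemId() { return systemId; }
        @Override public int getLineNumber() { return line; }
        @Override public int getColumnNumber() { return column; }
    }

    public static void main(String[] args) {
        PositionTracker loc = new PositionTracker();
        loc.reset(null, "file:///tmp/example.html");
        for (char c : "<p>\r\nab\ncd".toCharArray()) loc.advance(c);
        // prints file:///tmp/example.html 3:3
        System.out.println(loc.getSystemId() + " " + loc.getLineNumber() + ":" + loc.getColumnNumber());
    }
}
```

      After reading `cd` on the third line, the reported column (3) is that of the next character. This follows the SAX recommendation to report the position just past the text of the current event.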
    • scan

      public void scan(Reader r0, ScanHandler h) throws IOException, SAXException
      Scan HTML source, reporting lexical events.
      Specified by:
      scan in interface Scanner
      Parameters:
      r0 - Reader that provides characters
      h - ScanHandler that accepts lexical events.
      Throws:
      IOException
      SAXException
    • startCDATA

      public void startCDATA()
      A callback for the ScanHandler that allows it to force the lexer state to CDATA content (no markup is recognized except the end tag of the current element).
      Specified by:
      startCDATA in interface Scanner
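      In CDATA mode, the only markup the scanner looks for is the current element's end tag. The sketch below is an assumption, not TagSoup's code. It locates where such content ends, counting a `</` as the end only when it is followed by the element's name (compared case-insensitively) and then by `>`, `/` or whitespace. This is similar to the rule HTML5 uses for raw text elements. Every other `<` or `</`, including a `</div>` inside a script string, stays content. `CdataDemo`, `cdataEnd` and `isTagBoundary` are hypothetical names.

```java
class CdataDemo {
    // Hypothetical helper: index where CDATA-mode content ends, i.e. the start of the first
    // "</name" followed by '>', '/' or whitespace (name compared case-insensitively),
    // or s.length() if the content runs to the end of the input.
    static int cdataEnd(String s, int from, String name) {
        int i = from;
        while ((i = s.indexOf("</", i)) >= 0) {
            int nameEnd = i + 2 + name.length();
            if (s.regionMatches(true, i + 2, name, 0, name.length())
                    && nameEnd < s.length()
                    && isTagBoundary(s.charAt(nameEnd))) {
                return i;
            }
            i += 2; // not our end tag, so this "</" is ordinary content
        }
        return s.length();
    }

    static boolean isTagBoundary(char c) {
        return c == '>' || c == '/' || Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        String html = "<script>if (a </b) x = \"</div>\";</SCRIPT>after";
        int start = html.indexOf('>') + 1;          // content begins after "<script>"
        int end = cdataEnd(html, start, "script");
        // prints if (a </b) x = "</div>";
        System.out.println(html.substring(start, end));
    }
}
```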
    • main

      public static void main(String[] argv) throws IOException, SAXException
      Test procedure. Reads HTML from the standard input and writes PYX to the standard output.
      Throws:
      IOException
      SAXException
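      PYX is a line-oriented notation in which each markup event becomes one line, and the first character of each line gives the event type. `(` marks a start tag and `A` an attribute (name, a space, then the value). `)` marks an end tag and `-` character data, with newlines in data written as the two characters `\n`. The hypothetical `PyxSketch` class below shows this mapping for a few events. It is not TagSoup's PYX writer, and it omits other record types, such as processing instructions, as well as any further escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PyxSketch {
    private final StringBuilder out = new StringBuilder();

    // "(name" followed by one "Aname value" line per attribute.
    void startElement(String name, Map<String, String> attrs) {
        out.append('(').append(name).append('\n');
        for (Map.Entry<String, String> a : attrs.entrySet()) {
            out.append('A').append(a.getKey()).append(' ').append(escape(a.getValue())).append('\n');
        }
    }

    void endElement(String name) {
        out.append(')').append(name).append('\n');
    }

    void characters(String text) {
        out.append('-').append(escape(text)).append('\n');
    }

    // One record per line, so an embedded newline becomes the two characters '\' 'n'.
    private static String escape(String s) {
        return s.replace("\n", "\\n");
    }

    @Override
    public String toString() {
        return out.toString();
    }

    public static void main(String[] args) {
        PyxSketch pyx = new PyxSketch();
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("href", "index.html");
        pyx.startElement("a", attrs);
        pyx.characters("Home\npage");
        pyx.endElement("a");
        System.out.print(pyx);
    }
}
```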