Package org.ccil.cowan.tagsoup
Class HTMLScanner
- java.lang.Object
  - org.ccil.cowan.tagsoup.HTMLScanner
- All Implemented Interfaces:
Scanner, org.xml.sax.Locator
public class HTMLScanner extends java.lang.Object implements Scanner, org.xml.sax.Locator
This class implements a table-driven scanner for HTML that tolerates many defects in its input. It implements the Scanner interface, which accepts a Reader object to fetch characters from and a ScanHandler object to report lexical events to.
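To make the term "table-driven" concrete, here is a minimal sketch of the general technique, not TagSoup's actual state table. The class name TinyTableScanner, its two states, and the event strings are all hypothetical; the real scanner reports events to a ScanHandler rather than collecting strings.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: these names and this table are not TagSoup's.
class TinyTableScanner {
    // Lexer states.
    static final int S_TEXT = 0, S_TAG = 1;
    // Character classes.
    static final int C_LT = 0, C_GT = 1, C_OTHER = 2;
    // Actions.
    static final int A_SAVE = 0, A_EMIT_TEXT = 1, A_EMIT_TAG = 2;

    // TABLE[state][charClass] = {action, nextState}
    static final int[][][] TABLE = {
        /* S_TEXT */ { {A_EMIT_TEXT, S_TAG}, {A_SAVE, S_TEXT}, {A_SAVE, S_TEXT} },
        /* S_TAG  */ { {A_SAVE, S_TAG}, {A_EMIT_TAG, S_TEXT}, {A_SAVE, S_TAG} },
    };

    static int classify(int ch) {
        if (ch == '<') return C_LT;
        if (ch == '>') return C_GT;
        return C_OTHER;
    }

    // Reads characters from r and returns the lexical events as strings.
    static List<String> scan(Reader r) {
        List<String> events = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        int state = S_TEXT;
        try {
            int ch;
            while ((ch = r.read()) != -1) {
                // One table lookup decides both the action and the next state.
                int[] entry = TABLE[state][classify(ch)];
                switch (entry[0]) {
                    case A_SAVE:
                        buf.append((char) ch);
                        break;
                    case A_EMIT_TEXT:
                        if (buf.length() > 0) events.add("TEXT " + buf);
                        buf.setLength(0);
                        break;
                    case A_EMIT_TAG:
                        events.add("TAG " + buf);
                        buf.setLength(0);
                        break;
                }
                state = entry[1];
            }
        } catch (IOException e) {
            // Wrapped only to keep this sketch free of checked exceptions.
            throw new UncheckedIOException(e);
        }
        // Defect tolerance: at end of input, flush pending data instead of failing.
        if (buf.length() > 0) {
            events.add((state == S_TAG ? "TAG " : "TEXT ") + buf);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String e : scan(new StringReader("<p>Hi <b>there"))) {
            System.out.println(e);
        }
    }
}
```

Because the behavior lives in the table, tolerating a new kind of malformed input means adding or changing a table entry, not rewriting the control flow.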
-
-
Constructor Summary
Constructors
HTMLScanner()
-
Method Summary
int getColumnNumber()
int getLineNumber()
java.lang.String getPublicId()
java.lang.String getSystemId()
static void main(java.lang.String[] argv): Test procedure.
void resetDocumentLocator(java.lang.String publicid, java.lang.String systemid): Reset document locator, supplying systemid and publicid.
void scan(java.io.Reader r0, ScanHandler h): Scan HTML source, reporting lexical events.
void startCDATA(): A callback for the ScanHandler that allows it to force the lexer state to CDATA content (no markup is recognized except the end of element).
-
-
-
Method Detail
-
getLineNumber
public int getLineNumber()
- Specified by:
getLineNumber in interface org.xml.sax.Locator
-
getColumnNumber
public int getColumnNumber()
- Specified by:
getColumnNumber in interface org.xml.sax.Locator
-
getPublicId
public java.lang.String getPublicId()
- Specified by:
getPublicId in interface org.xml.sax.Locator
-
getSystemId
public java.lang.String getSystemId()
- Specified by:
getSystemId in interface org.xml.sax.Locator
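The four accessors above come from org.xml.sax.Locator. As a rough sketch of how a scanner can back them while consuming a Reader, consider the hypothetical CountingLocator below. The counting conventions (lines start at 1, column counts characters consumed on the current line, only '\n' ends a line) and the reset behavior are assumptions made for this example, not a description of HTMLScanner's internals.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import org.xml.sax.Locator;

// Hypothetical class for illustration; not TagSoup's implementation.
class CountingLocator implements Locator {
    private final Reader in;
    private String publicId;
    private String systemId;
    private int line = 1;    // assumption: first line is 1
    private int column = 0;  // assumption: characters consumed on the current line

    CountingLocator(Reader in) {
        this.in = in;
    }

    // Mirrors the shape of resetDocumentLocator(publicid, systemid);
    // resetting the position counters here is an assumption of this sketch.
    void resetDocumentLocator(String publicId, String systemId) {
        this.publicId = publicId;
        this.systemId = systemId;
        this.line = 1;
        this.column = 0;
    }

    // Reads one character and updates the position. Only '\n' is treated as a
    // line break; '\r' handling is deliberately left out of this sketch.
    int read() {
        try {
            int ch = in.read();
            if (ch == '\n') {
                line++;
                column = 0;
            } else if (ch != -1) {
                column++;
            }
            return ch;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override public int getLineNumber() { return line; }
    @Override public int getColumnNumber() { return column; }
    @Override public String getPublicId() { return publicId; }
    @Override public String getSystemId() { return systemId; }

    public static void main(String[] args) {
        CountingLocator loc = new CountingLocator(new StringReader("ab\ncd"));
        loc.resetDocumentLocator(null, "doc.html");
        while (loc.read() != -1) {
            // consume all input
        }
        System.out.println(loc.getSystemId() + ":" + loc.getLineNumber()
                + ":" + loc.getColumnNumber());
    }
}
```

A handler that receives such a Locator can attach a line and column to each diagnostic it reports.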
-
resetDocumentLocator
public void resetDocumentLocator(java.lang.String publicid, java.lang.String systemid)
Reset document locator, supplying systemid and publicid.
- Specified by:
resetDocumentLocator in interface Scanner
- Parameters:
systemid - System id
publicid - Public id
-
scan
public void scan(java.io.Reader r0, ScanHandler h) throws java.io.IOException, org.xml.sax.SAXException
Scan HTML source, reporting lexical events.
-
startCDATA
public void startCDATA()
A callback for the ScanHandler that allows it to force the lexer state to CDATA content (no markup is recognized except the end of element).
- Specified by:
startCDATA in interface Scanner
-
main
public static void main(java.lang.String[] argv) throws java.io.IOException, org.xml.sax.SAXException
Test procedure. Reads HTML from the standard input and writes PYX to the standard output.
- Throws:
java.io.IOException
org.xml.sax.SAXException
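For readers unfamiliar with PYX, the sketch below shows the general shape of the notation: one event per line, identified by its first character. It assumes the conventional PYX prefixes ("(" for a start tag, "A" for an attribute, "-" for character data, ")" for an end tag) and backslash escaping of newlines, tabs, and backslashes. The class PyxSketch and its methods are hypothetical helpers; the exact lines that main emits for a given input may differ.

```java
// Hypothetical helper for illustration; not part of TagSoup.
class PyxSketch {
    // PYX is line-oriented, so newlines, tabs, and backslashes in data are escaped.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\n", "\\n").replace("\t", "\\t");
    }

    static String startTag(String name) {
        return "(" + name;
    }

    static String attribute(String name, String value) {
        return "A" + name + " " + escape(value);
    }

    static String text(String data) {
        return "-" + escape(data);
    }

    static String endTag(String name) {
        return ")" + name;
    }

    public static void main(String[] args) {
        // Events for: <a href="x.html">Hi, then a newline, then there</a>
        System.out.println(startTag("a"));
        System.out.println(attribute("href", "x.html"));
        System.out.println(text("Hi\nthere"));
        System.out.println(endTag("a"));
    }
}
```

Since each event occupies exactly one line, PYX output is easy to inspect or compare with line-oriented tools, which suits a test procedure.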
-
-