Package org.ccil.cowan.tagsoup
Class HTMLScanner
java.lang.Object
org.ccil.cowan.tagsoup.HTMLScanner
This class implements a table-driven scanner for HTML that tolerates many
kinds of defects in its input. It implements the Scanner interface, which
accepts a Reader object to fetch characters from and a ScanHandler object to
report lexical events to.
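To illustrate what "table-driven" means here, the following is a minimal sketch of the general technique, not the actual TagSoup state table. The class name TableScannerSketch, its two states, its three character classes, and the "text:"/"tag:" event strings are all hypothetical. Each table entry pairs an action with a next state, so the scanning loop itself contains no HTML-specific logic, and defects such as a stray '>' in text or an unclosed tag at end of input simply fall through to a permissive table entry.

```java
import java.util.ArrayList;
import java.util.List;

final class TableScannerSketch {
    // Scanner states (hypothetical; a real HTML scanner has many more).
    static final int TEXT = 0, TAG = 1;
    // Character classes used as the second table index.
    static final int LT = 0, GT = 1, OTHER = 2;
    // Actions the table can request.
    static final int SAVE = 0, END_TEXT = 1, END_TAG = 2;

    // TABLE[state][charClass] = {action, nextState}
    static final int[][][] TABLE = {
        /* TEXT */ { {END_TEXT, TAG}, {SAVE, TEXT}, {SAVE, TEXT} },
        /* TAG  */ { {SAVE, TAG},     {END_TAG, TEXT}, {SAVE, TAG} },
    };

    static int classify(char c) {
        return c == '<' ? LT : c == '>' ? GT : OTHER;
    }

    // Scans the input and returns events of the form "text:..." or "tag:...".
    static List<String> scan(String input) {
        List<String> events = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        int state = TEXT;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            int[] entry = TABLE[state][classify(c)];
            switch (entry[0]) {
                case SAVE:     buf.append(c); break;
                case END_TEXT: flush(events, "text", buf); break;
                case END_TAG:  flush(events, "tag", buf); break;
                default: throw new IllegalStateException("unknown action");
            }
            state = entry[1];
        }
        // At end of input, report whatever is buffered instead of failing.
        flush(events, state == TAG ? "tag" : "text", buf);
        return events;
    }

    // Emits the buffered characters as one event, skipping empty buffers.
    static void flush(List<String> events, String kind, StringBuilder buf) {
        if (buf.length() > 0) {
            events.add(kind + ":" + buf);
            buf.setLength(0);
        }
    }

    public static void main(String[] args) {
        for (String event : scan("a<b>c>d<e")) {
            System.out.println(event);
        }
    }
}
```

Adding a new lexical rule in this style means adding a row or column to the table rather than changing the loop, which is the usual reason for choosing a table-driven design.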
-
Constructor Summary
Constructors
HTMLScanner()
Method Summary
Modifier and Type  Method  Description
int  getColumnNumber()
int  getLineNumber()
String  getPublicId()
String  getSystemId()
static void  main  Test procedure.
void  resetDocumentLocator(String publicid, String systemid)  Reset document locator, supplying systemid and publicid.
void  scan(Reader r0, ScanHandler h)  Scan HTML source, reporting lexical events.
void  startCDATA()  A callback for the ScanHandler that allows it to force the lexer state to CDATA content (no markup is recognized except the end of the element).
-
Constructor Details
-
HTMLScanner
public HTMLScanner()
-
Method Details
-
getLineNumber
public int getLineNumber()
Specified by: getLineNumber in interface Locator
-
getColumnNumber
public int getColumnNumber()
Specified by: getColumnNumber in interface Locator
-
getPublicId
public String getPublicId()
Specified by: getPublicId in interface Locator
-
getSystemId
public String getSystemId()
Specified by: getSystemId in interface Locator
-
resetDocumentLocator
public void resetDocumentLocator(String publicid, String systemid)
Reset document locator, supplying systemid and publicid.
Specified by: resetDocumentLocator in interface Scanner
Parameters:
publicid - Public id
systemid - System id
-
scan
public void scan(Reader r0, ScanHandler h) throws IOException, SAXException
Scan HTML source, reporting lexical events.
Specified by: scan in interface Scanner
Parameters:
r0 - Reader that provides characters
h - ScanHandler that accepts lexical events.
Throws:
IOException
SAXException
-
startCDATA
public void startCDATA()
A callback for the ScanHandler that allows it to force the lexer state to CDATA content (no markup is recognized except the end of the element).
Specified by: startCDATA in interface Scanner
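The idea of CDATA content can be sketched as follows. This is an illustration under assumptions, not the scanner's actual logic: the class CdataModeSketch and the method rawContent are hypothetical names, and the sketch assumes the content ends at the first case-insensitive occurrence of "</" followed by the element name. Everything before that point, including '<' characters and things that look like tags, is treated as plain text.

```java
final class CdataModeSketch {
    /**
     * Returns the raw text starting at index 'from' and running up to the
     * end tag of 'element', or to the end of input if no end tag is found.
     */
    static String rawContent(String input, int from, String element) {
        String endTag = "</" + element;
        for (int i = from; i + endTag.length() <= input.length(); i++) {
            // Case-insensitive comparison, since HTML tag names ignore case.
            if (input.regionMatches(true, i, endTag, 0, endTag.length())) {
                return input.substring(from, i);
            }
        }
        // Unclosed element: tolerate the defect and take the rest as content.
        return input.substring(from);
    }

    public static void main(String[] args) {
        String html = "<script>if (a < b) x = \"<b>\";</SCRIPT>tail";
        // Content begins right after the 8-character "<script>" start tag.
        System.out.println(rawContent(html, 8, "script"));
    }
}
```

A handler would typically request this mode after seeing the start tag of an element such as script or style, whose content must not be interpreted as markup.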
-
main
Test procedure. Reads HTML from the standard input and writes PYX to the standard output.
Throws:
IOException
SAXException
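For readers unfamiliar with PYX, it is a line-oriented notation in which the first character of each line gives the event type: "(" for a start tag, "A" for an attribute, "-" for character data, and ")" for an end tag. The sketch below shows how such lines could be produced. It is not the writer used by this class: PyxSketch and its methods are hypothetical names, and the escaping of backslash, newline, and tab is an assumption based on the usual PYX conventions.

```java
final class PyxSketch {
    // "(" introduces a start tag.
    static String startTag(String name) {
        return "(" + name + "\n";
    }

    // "A" introduces an attribute: name, one space, then the value.
    static String attribute(String name, String value) {
        return "A" + name + " " + escape(value) + "\n";
    }

    // "-" introduces character data.
    static String text(String data) {
        return "-" + escape(data) + "\n";
    }

    // ")" introduces an end tag.
    static String endTag(String name) {
        return ")" + name + "\n";
    }

    // Keeps each event on one line; backslash is escaped first so that
    // the escapes added for newline and tab are not escaped again.
    static String escape(String s) {
        return s.replace("\\", "\\\\")
                .replace("\n", "\\n")
                .replace("\t", "\\t");
    }

    public static void main(String[] args) {
        // Events for the fragment <p class="x">hi</p>
        System.out.print(startTag("p"));
        System.out.print(attribute("class", "x"));
        System.out.print(text("hi"));
        System.out.print(endTag("p"));
    }
}
```

Because every event occupies exactly one line, PYX output is easy to inspect and to process with line-oriented tools, which makes it a convenient format for a test procedure.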
-