public class Jsoup extends Object
| Modifier and Type | Method and Description |
|---|---|
| static String | clean(String bodyHtml, String baseUri, Whitelist whitelist)<br>Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes. |
| static String | clean(String bodyHtml, String baseUri, Whitelist whitelist, Document.OutputSettings outputSettings)<br>Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes. |
| static String | clean(String bodyHtml, Whitelist whitelist)<br>Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes. |
| static boolean | isValid(String bodyHtml, Whitelist whitelist)<br>Test if the input HTML has only tags and attributes allowed by the Whitelist. |
| static Document | parse(File in, String charsetName)<br>Parse the contents of a file as HTML. |
| static Document | parse(File in, String charsetName, String baseUri)<br>Parse the contents of a file as HTML. |
| static Document | parse(InputStream in, String charsetName, String baseUri)<br>Read an input stream, and parse it to a Document. |
| static Document | parse(InputStream in, String charsetName, String baseUri, Parser parser)<br>Read an input stream, and parse it to a Document. |
| static Document | parse(String html)<br>Parse HTML into a Document. |
| static Document | parse(String html, String baseUri)<br>Parse HTML into a Document. |
| static Document | parse(String html, String baseUri, Parser parser)<br>Parse HTML into a Document, using the provided Parser. |
| static Document | parseBodyFragment(String bodyHtml)<br>Parse a fragment of HTML, with the assumption that it forms the body of the HTML. |
| static Document | parseBodyFragment(String bodyHtml, String baseUri)<br>Parse a fragment of HTML, with the assumption that it forms the body of the HTML. |
| static Document | parseXML(InputStream in, String charsetName)<br>Parse XML into a Document. |
| static Document | parseXML(InputStream in, String charsetName, String baseUri)<br>Parse XML into a Document. |
| static Document | parseXML(String xml)<br>Parse XML into a Document. |
| static Document | parseXML(String xml, String baseUri)<br>Parse XML into a Document. |
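
Several of the methods listed here accept a baseUri that is used to resolve relative URLs to absolute URLs. As a rough illustration of what that resolution means, the sketch below uses only the JDK's java.net.URI; it is not this library's internal logic, and the class name BaseUriSketch, the helper absolutize, and the example.com URLs are hypothetical names chosen for the example.

```java
import java.net.URI;

class BaseUriSketch {
    // Hypothetical helper: resolve a link found in a document against
    // the URI the document was retrieved from (RFC 3986 resolution).
    static String absolutize(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/docs/index.html";

        // A path-relative link replaces the last segment of the base path.
        System.out.println(absolutize(base, "guide.html"));
        // prints "http://example.com/docs/guide.html"

        // A root-relative link keeps only the scheme and host of the base.
        System.out.println(absolutize(base, "/img/logo.png"));
        // prints "http://example.com/img/logo.png"

        // An already absolute link is returned unchanged.
        System.out.println(absolutize(base, "http://other.example/x"));
        // prints "http://other.example/x"
    }
}
```

Without a base URI there is nothing to resolve a link such as guide.html against, which is why the overloads that take no baseUri depend on the document itself declaring a <base href> tag.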
public static Document parse(String html, String baseUri)
Parse HTML into a Document.
Parameters:
html - HTML to parse
baseUri - The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs when they occur before the HTML declares a <base href> tag.

public static Document parse(String html, String baseUri, Parser parser)
Parse HTML into a Document, using the provided Parser.
Parameters:
html - HTML to parse
baseUri - The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs when they occur before the HTML declares a <base href> tag.
parser - alternate parser to use.

public static Document parse(String html)
Parse HTML into a Document. As no base URI is specified, absolute URL detection relies on the HTML including a <base href> tag.
Parameters:
html - HTML to parse
See also:
parse(String, String)

public static Document parseXML(String xml, String baseUri)
Parse XML into a Document.
Parameters:
xml - XML to parse
baseUri - The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs when they occur before the HTML declares a <base href> tag.

public static Document parseXML(String xml)
Parse XML into a Document.
Parameters:
xml - XML to parse

public static Document parseXML(InputStream in, String charsetName, String baseUri) throws IOException
Parse XML into a Document.
Parameters:
in - input stream to read. Make sure to close it after parsing.
charsetName - (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
baseUri - The URL where the HTML was retrieved from, to resolve relative links against.
Throws:
IOException - if the file could not be found, or read, or if the charsetName is invalid.

public static Document parseXML(InputStream in, String charsetName) throws IOException
Parse XML into a Document.
Parameters:
in - input stream to read. Make sure to close it after parsing.
charsetName - (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
Throws:
IOException - if the file could not be found, or read, or if the charsetName is invalid.

public static Document parse(File in, String charsetName, String baseUri) throws IOException
Parse the contents of a file as HTML.
Parameters:
in - file to load HTML from
charsetName - (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
baseUri - The URL where the HTML was retrieved from, to resolve relative links against.
Throws:
IOException - if the file could not be found, or read, or if the charsetName is invalid.

public static Document parse(File in, String charsetName) throws IOException
Parse the contents of a file as HTML.
Parameters:
in - file to load HTML from
charsetName - (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
Throws:
IOException - if the file could not be found, or read, or if the charsetName is invalid.
See also:
parse(File, String, String)

public static Document parse(InputStream in, String charsetName, String baseUri) throws IOException
Read an input stream, and parse it to a Document.
Parameters:
in - input stream to read. Make sure to close it after parsing.
charsetName - (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
baseUri - The URL where the HTML was retrieved from, to resolve relative links against.
Throws:
IOException - if the file could not be found, or read, or if the charsetName is invalid.

public static Document parse(InputStream in, String charsetName, String baseUri, Parser parser) throws IOException
Read an input stream, and parse it to a Document.
Parameters:
in - input stream to read. Make sure to close it after parsing.
charsetName - (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
baseUri - The URL where the HTML was retrieved from, to resolve relative links against.
parser - alternate parser to use.
Throws:
IOException - if the file could not be found, or read, or if the charsetName is invalid.

public static Document parseBodyFragment(String bodyHtml, String baseUri)
Parse a fragment of HTML, with the assumption that it forms the body of the HTML.
Parameters:
bodyHtml - body HTML fragment
baseUri - URL to resolve relative URLs against.
See also:
Document.body()

public static Document parseBodyFragment(String bodyHtml)
Parse a fragment of HTML, with the assumption that it forms the body of the HTML.
Parameters:
bodyHtml - body HTML fragment
See also:
Document.body()

public static String clean(String bodyHtml, String baseUri, Whitelist whitelist)
Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.
Parameters:
bodyHtml - input untrusted HTML (body fragment)
baseUri - URL to resolve relative URLs against
whitelist - white-list of permitted HTML elements
See also:
Cleaner.clean(Document)

public static String clean(String bodyHtml, Whitelist whitelist)
Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.
Parameters:
bodyHtml - input untrusted HTML (body fragment)
whitelist - white-list of permitted HTML elements
See also:
Cleaner.clean(Document)

public static String clean(String bodyHtml, String baseUri, Whitelist whitelist, Document.OutputSettings outputSettings)
Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.
Parameters:
bodyHtml - input untrusted HTML (body fragment)
baseUri - URL to resolve relative URLs against
whitelist - white-list of permitted HTML elements
outputSettings - document output settings; use to control pretty-printing and entity escape modes
See also:
Cleaner.clean(Document)

public static boolean isValid(String bodyHtml, Whitelist whitelist)
Test if the input HTML has only tags and attributes allowed by the Whitelist.
Parameters:
bodyHtml - HTML to test
whitelist - whitelist to test against
See also:
clean(String, Whitelist)

Copyright © 1998–2018 iText Group NV. All rights reserved.