Interface LinkExtractorParser
public interface LinkExtractorParser
Interface specifying the contract of a content parser that extracts links.
- Since:
3.0
Method Summary
Modifier and Type | Method | Description
abstract Iterator<URL> | getEmbeddedResourceURLs(String userAgent, Array<byte> responseData, URL baseUrl, String encoding) | Get the URLs for all the resources that a browser would automatically download following the download of the content, that is: images, stylesheets, javascript files, applets, etc...
abstract boolean | isReusable() | -
-
Method Detail
-
getEmbeddedResourceURLs
abstract Iterator<URL> getEmbeddedResourceURLs(String userAgent, Array<byte> responseData, URL baseUrl, String encoding)
Get the URLs for all the resources that a browser would automatically download following the download of the content, that is: images, stylesheets, javascript files, applets, etc...
URLs should not appear twice in the returned iterator.
Malformed URLs can be reported to the caller by having the Iterator return the corresponding URL String. Overall problems parsing the HTML should be reported by throwing an HTMLParseException.
- Parameters:
userAgent - User Agent
responseData - Response data
baseUrl - Base URL from which the HTML code was obtained
encoding - Charset
- Returns:
an Iterator for the resource URLs
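The contract described above can be illustrated with a minimal sketch. It is not the library's own parser: the nested LinkExtractorParser interface is a local stand-in that mirrors the documented signature (assuming Array<byte> corresponds to Java's byte[]), and RegexParser is a hypothetical implementation that uses a simple regular expression rather than a real HTML parser. For brevity it skips links that cannot be resolved instead of reporting them to the caller, and it declares no checked exceptions.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexLinkExtractorSketch {

    // Local stand-in for the documented interface (assumption: byte[] for Array<byte>).
    interface LinkExtractorParser {
        Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] responseData,
                                              URL baseUrl, String encoding);
        boolean isReusable();
    }

    // Hypothetical implementation: finds src/href attributes on img, script and link tags.
    static class RegexParser implements LinkExtractorParser {
        private static final Pattern SRC_OR_HREF = Pattern.compile(
                "<(?:img|script|link)\\b[^>]*?\\b(?:src|href)\\s*=\\s*[\"']([^\"']+)[\"']",
                Pattern.CASE_INSENSITIVE);

        @Override
        public Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] responseData,
                                                     URL baseUrl, String encoding) {
            // Decode the raw response with the charset named by the caller.
            String html = new String(responseData, Charset.forName(encoding));
            List<URL> result = new ArrayList<>();
            // De-duplicate on the string form; URL.equals may trigger DNS lookups.
            Set<String> seen = new HashSet<>();
            Matcher m = SRC_OR_HREF.matcher(html);
            while (m.find()) {
                try {
                    // Resolve relative links against the base URL of the page.
                    URL resolved = baseUrl.toURI().resolve(m.group(1)).toURL();
                    if (seen.add(resolved.toString())) {
                        result.add(resolved);
                    }
                } catch (URISyntaxException | IllegalArgumentException | MalformedURLException e) {
                    // Simplification for this sketch: unresolvable links are skipped.
                }
            }
            return result.iterator();
        }

        @Override
        public boolean isReusable() {
            // This sketch keeps no per-call state, so one instance can serve many calls.
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><img src=\"a.png\"><script src=\"/js/app.js\"></script>"
                + "<link rel=\"stylesheet\" href=\"style.css\"><img src=\"a.png\"></html>";
        URL base = URI.create("http://example.com/dir/page.html").toURL();
        Iterator<URL> it = new RegexParser().getEmbeddedResourceURLs(
                "example-agent", html.getBytes(StandardCharsets.UTF_8), base, "UTF-8");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

In this sketch the repeated a.png reference is returned only once, which matches the requirement that URLs should not appear twice in the returned iterator. A production parser would more likely rely on a proper HTML parser than on a regular expression.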
-
isReusable
abstract boolean isReusable()