Package com.panforge.robotstxt
Interface RobotsTxt
-
public interface RobotsTxt

Represents the access policy from a single "robots.txt" file. Use read(java.io.InputStream) to read and parse robots.txt.
-
-
Method Summary
All Methods · Static Methods · Instance Methods · Abstract Methods · Default Methods · Deprecated Methods

default Grant      ask(String userAgent, String path)
    Asks for a grant.
Integer            getCrawlDelay()
    Deprecated. Use ask(java.lang.String, java.lang.String) to get a Grant from which Grant.getCrawlDelay() may be invoked.
List<String>       getDisallowList(String userAgent)
    Gets a list of disallowed resources.
String             getHost()
    Gets the host.
List<String>       getSitemaps()
    Gets site maps.
boolean            query(String userAgent, String path)
    Checks access to the given HTTP path.
static RobotsTxt   read(InputStream input)
    Reads and parses robots.txt content from an input stream.
-
-
-
Method Detail
-
query
boolean query(String userAgent, String path)

Checks access to the given HTTP path.

Parameters:
    userAgent - user agent used to evaluate authorization
    path - path to access
Returns:
    true if access to the requested path is allowed
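A minimal usage sketch combining read(InputStream) and query(). The robots.txt content, the "MyBot" user agent, and the paths are hypothetical; the example assumes the library jar is on the classpath:

```java
import com.panforge.robotstxt.RobotsTxt;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class QueryExample {
    public static void main(String[] args) throws IOException {
        // Hypothetical robots.txt content, for illustration only.
        String content = "User-agent: *\nDisallow: /private/\n";
        RobotsTxt robotsTxt = RobotsTxt.read(
            new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)));

        // "/index.html" matches no Disallow rule, so access should be granted.
        boolean allowed = robotsTxt.query("MyBot", "/index.html");
        // "/private/data.html" matches "Disallow: /private/", so access should be denied.
        boolean denied = !robotsTxt.query("MyBot", "/private/data.html");
        System.out.println(allowed + " " + denied);
    }
}
```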
-
ask
default Grant ask(String userAgent, String path)

Asks for a grant.

Parameters:
    userAgent - user agent used to evaluate authorization
    path - path to access
Returns:
    grant (never null)
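A sketch of ask(), which is the recommended replacement for the deprecated getCrawlDelay(). The robots.txt content and the "MyBot" user agent are hypothetical:

```java
import com.panforge.robotstxt.Grant;
import com.panforge.robotstxt.RobotsTxt;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class AskExample {
    public static void main(String[] args) throws IOException {
        // Hypothetical robots.txt content with a crawl delay, for illustration only.
        String content = "User-agent: *\nDisallow: /private/\nCrawl-delay: 10\n";
        RobotsTxt robotsTxt = RobotsTxt.read(
            new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)));

        // ask() never returns null, even for a disallowed path.
        Grant grant = robotsTxt.ask("MyBot", "/private/data.html");
        // Per the deprecation note on getCrawlDelay(), the delay now lives on the Grant.
        Integer delay = grant.getCrawlDelay();
        System.out.println(delay);
    }
}
```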
-
getCrawlDelay
@Deprecated
Integer getCrawlDelay()

Deprecated. Use ask(java.lang.String, java.lang.String) to get a Grant from which Grant.getCrawlDelay() may be invoked.

Gets the crawl delay.

Returns:
    crawl delay in seconds, or 0 if no delay is declared
-
getHost
String getHost()

Gets the host.

Returns:
    host, or null if no host is declared
-
getDisallowList
List<String> getDisallowList(String userAgent)

Gets a list of disallowed resources.

Parameters:
    userAgent - user agent
Returns:
    list of disallowed resources
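A sketch of listing the Disallow clauses that apply to a given user agent. The robots.txt content and the "MyBot" user agent are hypothetical:

```java
import com.panforge.robotstxt.RobotsTxt;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class DisallowListExample {
    public static void main(String[] args) throws IOException {
        // Hypothetical robots.txt with two Disallow rules, for illustration only.
        String content = "User-agent: *\nDisallow: /private/\nDisallow: /tmp/\n";
        RobotsTxt robotsTxt = RobotsTxt.read(
            new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)));

        // The list reflects the Disallow clauses matching the given user agent.
        List<String> disallowed = robotsTxt.getDisallowList("MyBot");
        System.out.println(disallowed);
    }
}
```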
-
read
static RobotsTxt read(InputStream input)
                throws IOException

Reads and parses robots.txt content from the given input stream.

Parameters:
    input - stream of content
Returns:
    parsed robots.txt object
Throws:
    IOException - if unable to read the content
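A sketch of read() followed by the accessor methods. In practice the InputStream would typically come from an HTTP response; here a ByteArrayInputStream stands in, and the host and sitemap values are hypothetical:

```java
import com.panforge.robotstxt.RobotsTxt;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ReadExample {
    public static void main(String[] args) throws IOException {
        // Hypothetical robots.txt declaring a host and a sitemap, for illustration only.
        String content = "User-agent: *\n"
            + "Disallow: /private/\n"
            + "Host: example.com\n"
            + "Sitemap: https://example.com/sitemap.xml\n";
        // read() accepts any InputStream and returns the parsed policy object.
        RobotsTxt robotsTxt = RobotsTxt.read(
            new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)));

        System.out.println(robotsTxt.getHost());     // declared host, if any
        System.out.println(robotsTxt.getSitemaps()); // declared sitemap URLs
    }
}
```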
-
-