Interface SeedUrlConfiguration.Builder

    • Method Detail

      • seedUrls

        SeedUrlConfiguration.Builder seedUrls(Collection<String> seedUrls)

        The list of seed or starting point URLs of the websites you want to crawl.

        The list can include a maximum of 100 seed URLs.

        Parameters:
        seedUrls - The seed (starting point) URLs of the websites you want to crawl. The list can include a maximum of 100 seed URLs.

        Returns:
        A reference to this object so that method calls can be chained together.
      • seedUrls

        SeedUrlConfiguration.Builder seedUrls(String... seedUrls)

        The list of seed or starting point URLs of the websites you want to crawl.

        The list can include a maximum of 100 seed URLs.

        Parameters:
        seedUrls - The seed (starting point) URLs of the websites you want to crawl. The list can include a maximum of 100 seed URLs.

        Returns:
        A reference to this object so that method calls can be chained together.
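
The 100-URL limit applies to both seedUrls overloads. This page does not say whether the builder checks the limit on the client side, so the following is only a minimal sketch of a pre-check that an application could run before calling either overload. The class SeedUrlLimit and the helper checkedSeedUrls are hypothetical names introduced for illustration; they are not part of the SDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical helper, not part of the SDK: enforces the documented
// maximum of 100 seed URLs before the list is handed to the builder.
class SeedUrlLimit {
    static final int MAX_SEED_URLS = 100; // limit stated in the method description

    // Mirrors the Collection<String> overload.
    static List<String> checkedSeedUrls(Collection<String> seedUrls) {
        if (seedUrls.size() > MAX_SEED_URLS) {
            throw new IllegalArgumentException(
                "Expected at most " + MAX_SEED_URLS + " seed URLs, got " + seedUrls.size());
        }
        // Return a copy so later changes to the caller's collection have no effect.
        return new ArrayList<>(seedUrls);
    }

    // Mirrors the String... overload.
    static List<String> checkedSeedUrls(String... seedUrls) {
        return checkedSeedUrls(Arrays.asList(seedUrls));
    }

    public static void main(String[] args) {
        List<String> urls = checkedSeedUrls("https://abc.example.com", "https://docs.example.com");
        System.out.println(urls.size() + " seed URLs accepted");
    }
}
```

The returned list could then be passed to seedUrls(Collection<String>) on the builder.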
      • webCrawlerMode

        SeedUrlConfiguration.Builder webCrawlerMode(String webCrawlerMode)

        The crawl mode, which controls how far the crawler ranges from the seed URLs. Choose one of the following modes:

        • HOST_ONLY: crawl only the host names of the seed URLs. For example, if the seed URL is "abc.example.com", then only URLs with host name "abc.example.com" are crawled.

        • SUBDOMAINS: crawl the host names of the seed URLs and their subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled.

        • EVERYTHING: crawl the host names of the seed URLs, their subdomains, and other domains that the web pages link to.

        The default mode is HOST_ONLY.

        Parameters:
        webCrawlerMode - The crawl mode, given as a string: HOST_ONLY, SUBDOMAINS, or EVERYTHING, as described above. The default mode is HOST_ONLY.

        Returns:
        A reference to this object so that method calls can be chained together.
        See Also:
        WebCrawlerMode
      • webCrawlerMode

        SeedUrlConfiguration.Builder webCrawlerMode(WebCrawlerMode webCrawlerMode)

        The crawl mode, which controls how far the crawler ranges from the seed URLs. Choose one of the following modes:

        • HOST_ONLY: crawl only the host names of the seed URLs. For example, if the seed URL is "abc.example.com", then only URLs with host name "abc.example.com" are crawled.

        • SUBDOMAINS: crawl the host names of the seed URLs and their subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled.

        • EVERYTHING: crawl the host names of the seed URLs, their subdomains, and other domains that the web pages link to.

        The default mode is HOST_ONLY.

        Parameters:
        webCrawlerMode - The crawl mode, given as a WebCrawlerMode enum value: HOST_ONLY, SUBDOMAINS, or EVERYTHING, as described above. The default mode is HOST_ONLY.

        Returns:
        A reference to this object so that method calls can be chained together.
        See Also:
        WebCrawlerMode
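
To make the three modes concrete, here is a rough sketch of how the host-name rules described for webCrawlerMode could be expressed as a predicate. It is an illustration of the documented behavior only, not the crawler's actual implementation. The class CrawlScope, its nested Mode enum, and the inScope method are hypothetical names (Mode is a local stand-in, not the SDK's WebCrawlerMode). The sketch also assumes that a candidate host was discovered through a link on a crawled page, so EVERYTHING accepts any host.

```java
import java.util.Locale;

// Hypothetical illustration of the documented crawl modes; not SDK code.
class CrawlScope {
    // Local stand-in for the three documented mode names.
    enum Mode { HOST_ONLY, SUBDOMAINS, EVERYTHING }

    // Returns true if candidateHost would be in scope for the given seed host.
    static boolean inScope(String seedHost, String candidateHost, Mode mode) {
        // Host names are case-insensitive, so compare in lower case.
        String seed = seedHost.toLowerCase(Locale.ROOT);
        String candidate = candidateHost.toLowerCase(Locale.ROOT);
        switch (mode) {
            case HOST_ONLY:
                // Only the exact seed host name.
                return candidate.equals(seed);
            case SUBDOMAINS:
                // The seed host name itself, or any host name nested under it.
                return candidate.equals(seed) || candidate.endsWith("." + seed);
            case EVERYTHING:
                // Assumption: the candidate was reached through a link, so any host qualifies.
                return true;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        String seed = "abc.example.com";
        System.out.println(inScope(seed, "abc.example.com", Mode.HOST_ONLY));     // true
        System.out.println(inScope(seed, "a.abc.example.com", Mode.HOST_ONLY));   // false
        System.out.println(inScope(seed, "a.abc.example.com", Mode.SUBDOMAINS));  // true
        System.out.println(inScope(seed, "other.example.org", Mode.SUBDOMAINS));  // false
        System.out.println(inScope(seed, "other.example.org", Mode.EVERYTHING));  // true
    }
}
```

The suffix check includes the leading dot so that a host such as "xabc.example.com" is not mistaken for a subdomain of "abc.example.com".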