Interface UrlPool

    public interface UrlPool

    A UrlPool contains many UrlCaches. URLs added to the pool are processed in crawl loops.

    The UrlCaches in a UrlPool have different priorities: there is a real-time cache, a delay cache, ordered caches, and unordered caches.
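
    The priority relationship between the caches can be sketched roughly as follows. This is a simplified, hypothetical model rather than the actual UrlPool implementation: the class PoolOrderSketch, its fields, and the next() method are assumptions made for illustration, and each cache is reduced to a plain FIFO queue of strings. The consumption order (real-time, then delay, then ordered, then unordered) is inferred from the priorities documented on this page.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical, simplified model of a pool; not the real UrlPool API. */
class PoolOrderSketch {
    // Stand-ins for the caches, each reduced to a FIFO queue of URL strings.
    final Queue<String> realTime = new ArrayDeque<>();
    final Queue<String> delayed = new ArrayDeque<>();        // delay handling omitted here
    final List<Queue<String>> ordered = new ArrayList<>();   // index 0 = highest priority
    final List<Queue<String>> unordered = new ArrayList<>();

    /** Returns the next URL by priority, or null when every cache is empty. */
    String next() {
        if (!realTime.isEmpty()) return realTime.poll();
        if (!delayed.isEmpty()) return delayed.poll();
        for (Queue<String> q : ordered) if (!q.isEmpty()) return q.poll();
        for (Queue<String> q : unordered) if (!q.isEmpty()) return q.poll();
        return null;
    }

    /** Analogous to hasMore(): true if any cache still holds an item. */
    boolean hasMore() {
        if (!realTime.isEmpty() || !delayed.isEmpty()) return true;
        for (Queue<String> q : ordered) if (!q.isEmpty()) return true;
        for (Queue<String> q : unordered) if (!q.isEmpty()) return true;
        return false;
    }
}
```

    In this sketch a URL placed in the unordered list is only returned once the real-time, delay, and ordered queues are all empty.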

    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      public class UrlPool.Companion
    • Method Detail

      • removeDeceased

         abstract Unit removeDeceased()

        Remove deceased URLs, such as URLs that are past their deadline.
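
        One way such a cleanup could work is sketched below. The Entry class, its deadline field, and the static removeDeceased helper are hypothetical names introduced for this example; the real interface method takes no arguments and returns Unit, and how it decides that a URL is deceased may differ.

```java
import java.time.Instant;
import java.util.Queue;

class DeceasedSketch {
    /** Hypothetical URL entry; the deadline field is an assumption of this sketch. */
    static final class Entry {
        final String url;
        final Instant deadline;

        Entry(String url, Instant deadline) {
            this.url = url;
            this.deadline = deadline;
        }
    }

    /** Removes every entry whose deadline is before now and returns how many were removed. */
    static int removeDeceased(Queue<Entry> cache, Instant now) {
        int before = cache.size();
        cache.removeIf(e -> e.deadline.isBefore(now));
        return before - cache.size();
    }
}
```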

      • hasMore

         abstract Boolean hasMore()

        Check whether there are more items in the URL pool.

      • getRealTimeCache

         abstract UrlCache getRealTimeCache()

        The real-time URL cache, whose URLs have the highest priority of all.

      • getDelayCache

         abstract Queue<DelayUrl> getDelayCache()

        An unbounded queue of delayed URLs, in which an element can only be taken once its delay has expired.

        The delay cache has a higher priority than all ordered caches and is usually used for retrying tasks.
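
        The behavior described above resembles that of java.util.concurrent.DelayQueue, so a retry queue can be sketched with it. Whether the real delay cache is backed by DelayQueue is an assumption; DelayedUrl below is a hypothetical stand-in for DelayUrl, and pollExpired is a helper invented for this example.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

class DelayCacheSketch {
    /** Hypothetical stand-in for DelayUrl: a URL that becomes available after a delay. */
    static final class DelayedUrl implements Delayed {
        final String url;
        final long readyAtNanos;

        DelayedUrl(String url, long delayMillis) {
            this.url = url;
            this.readyAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            // Remaining delay; zero or negative means the element has expired.
            return unit.convert(readyAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    /** Returns the URL of the first expired element, or null if none has expired yet. */
    static String pollExpired(DelayQueue<DelayedUrl> queue) {
        DelayedUrl head = queue.poll(); // non-blocking; null while the head's delay is pending
        return head == null ? null : head.url;
    }
}
```

        A failed task could be re-queued with, for example, a delay of a few seconds; it stays invisible to pollExpired until that delay has elapsed.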

      • getUnorderedCaches

         abstract List<UrlCache> getUnorderedCaches()

        The unordered URL caches. Tasks in unordered caches have the lowest priority of all.

      • getLowestCache

         abstract UrlCache getLowestCache()

        A shortcut to the cache with the lowest priority in the ordered caches.

      • getLower5Cache

         abstract UrlCache getLower5Cache()

        A shortcut to the cache whose priority is 5 levels lower than the normal cache in the ordered caches.

      • getLower4Cache

         abstract UrlCache getLower4Cache()

        A shortcut to the cache whose priority is 4 levels lower than the normal cache in the ordered caches.

      • getLower3Cache

         abstract UrlCache getLower3Cache()

        A shortcut to the cache whose priority is 3 levels lower than the normal cache in the ordered caches.

      • getLower2Cache

         abstract UrlCache getLower2Cache()

        A shortcut to the cache whose priority is 2 levels lower than the normal cache in the ordered caches.

      • getLowerCache

         abstract UrlCache getLowerCache()

        A shortcut to the cache whose priority is 1 level lower than the normal cache in the ordered caches.

      • getNormalCache

         abstract UrlCache getNormalCache()

        A shortcut to the cache that has the default priority in the ordered caches.

      • getHigherCache

         abstract UrlCache getHigherCache()

        A shortcut to the cache whose priority is 1 level higher than the normal cache in the ordered caches.

      • getHigher2Cache

         abstract UrlCache getHigher2Cache()

        A shortcut to the cache whose priority is 2 levels higher than the normal cache in the ordered caches.

      • getHigher3Cache

         abstract UrlCache getHigher3Cache()

        A shortcut to the cache whose priority is 3 levels higher than the normal cache in the ordered caches.

      • getHigher4Cache

         abstract UrlCache getHigher4Cache()

        A shortcut to the cache whose priority is 4 levels higher than the normal cache in the ordered caches.

      • getHigher5Cache

         abstract UrlCache getHigher5Cache()

        A shortcut to the cache whose priority is 5 levels higher than the normal cache in the ordered caches.

      • getHighestCache

         abstract UrlCache getHighestCache()

        A shortcut to the cache with the highest priority in the ordered caches.
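
        Taken together, the shortcut getters above suggest thirteen ordered levels, from lowest through normal to highest. The sketch below models that reading with a hypothetical Level enum and one FIFO queue per level; the enum, its constant names, and the assumption that the highest and lowest caches are distinct from the higher5 and lower5 caches are all illustrative and not taken from the library.

```java
import java.util.ArrayDeque;
import java.util.EnumMap;
import java.util.Map;
import java.util.Queue;

class OrderedCachesSketch {
    /** Hypothetical level names mirroring the getters, declared from highest to lowest. */
    enum Level {
        HIGHEST, HIGHER5, HIGHER4, HIGHER3, HIGHER2, HIGHER,
        NORMAL,
        LOWER, LOWER2, LOWER3, LOWER4, LOWER5, LOWEST
    }

    // EnumMap iterates in declaration order, i.e. from HIGHEST down to LOWEST.
    final Map<Level, Queue<String>> caches = new EnumMap<>(Level.class);

    OrderedCachesSketch() {
        for (Level level : Level.values()) caches.put(level, new ArrayDeque<>());
    }

    /** Adds a URL to the cache of the given level. */
    void add(Level level, String url) {
        caches.get(level).add(url);
    }

    /** Returns the next URL from the highest non-empty level, or null if all are empty. */
    String next() {
        for (Queue<String> q : caches.values()) {
            String url = q.poll();
            if (url != null) return url;
        }
        return null;
    }
}
```

        With this layout, a URL added at HIGHER2 is returned before one added at NORMAL, which in turn is returned before one added at LOWER, regardless of insertion order.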