Package ai.platon.pulsar.common.collect
Class UrlFeeder
-
All Implemented Interfaces: kotlin.collections.Iterable
public final class UrlFeeder implements Iterable<UrlAware>
The URL feeder collects URLs from the URL pool and feeds them to the crawlers.
The feeder collects URLs using DataCollectors; each DataCollector collects URLs from exactly one UrlCache.
The user can register multiple UrlCaches and DataCollectors for different types of tasks.
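The collector model described above can be illustrated with a minimal sketch. It does not use the real Pulsar classes: `SketchCollector`, `SketchFeeder`, `collectTo`, and `drain` are hypothetical stand-ins, and the rule that a lower priority value is served first is an assumption made only for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a DataCollector bound to exactly one cache.
class SketchCollector {
    final String name;
    final int priority;         // assumption: a lower value is served first
    final Queue<String> cache;  // stands in for exactly one UrlCache

    SketchCollector(String name, int priority, List<String> urls) {
        this.name = name;
        this.priority = priority;
        this.cache = new ArrayDeque<>(urls);
    }

    // Moves at most one URL from the cache into the sink; returns the number moved.
    int collectTo(List<String> sink) {
        String url = cache.poll();
        if (url == null) return 0;
        sink.add(url);
        return 1;
    }
}

// Hypothetical stand-in for the feeder: it owns the collectors and asks them in priority order.
class SketchFeeder {
    private final List<SketchCollector> collectors = new ArrayList<>();

    SketchFeeder addCollector(SketchCollector collector) {
        collectors.add(collector);
        // The sort is stable, so collectors with equal priority keep registration order.
        collectors.sort(Comparator.comparingInt(c -> c.priority));
        return this;
    }

    // Drains every cache, always trying the best-priority non-empty collector first.
    List<String> drain() {
        List<String> fed = new ArrayList<>();
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (SketchCollector c : collectors) {
                if (c.collectTo(fed) > 0) {
                    progressed = true;
                    break;
                }
            }
        }
        return fed;
    }
}
```

Registering one collector per task type keeps urgent and routine URLs in separate caches, so the order in which they reach the crawlers depends only on collector priority.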
-
-
Field Summary
Fields
Modifier and Type                                          Field              Description
private final Collection<PriorityDataCollector<UrlAware>>  openCollectors
private final List<PriorityDataCollector<UrlAware>>        collectors
private final ConcurrentLoadingIterable<UrlAware>          loadingIterable
private final Integer                                      cacheSize
private final Integer                                      size
private final Integer                                      estimatedSize
private final String                                       abstract
private final String                                       report
private final UrlPool                                      urlPool
private final Integer                                      lowerCacheSize
private final Boolean                                      enableDefaults
-
Method Summary
-
-
Method Detail
-
getOpenCollectors
final Collection<PriorityDataCollector<UrlAware>> getOpenCollectors()
-
getCollectors
final List<PriorityDataCollector<UrlAware>> getCollectors()
-
getLoadingIterable
final ConcurrentLoadingIterable<UrlAware> getLoadingIterable()
Returns the concurrent loading iterable, which loads URLs from the URL pool and feeds them to the crawlers.
-
getCacheSize
final Integer getCacheSize()
-
getEstimatedSize
final Integer getEstimatedSize()
-
getAbstract
final String getAbstract()
-
getUrlPool
final UrlPool getUrlPool()
-
getLowerCacheSize
final Integer getLowerCacheSize()
-
getEnableDefaults
final Boolean getEnableDefaults()
-
isNotEmpty
final Boolean isNotEmpty()
-
addFirst
final Unit addFirst(UrlAware url)
Adds a hyperlink to the very beginning of the fetch queue, so it will be served first.
-
addLast
final Unit addLast(UrlAware url)
Adds a hyperlink to the end of the fetch queue, so it will be served last.
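The ordering contract of addFirst and addLast can be pictured with a double-ended queue. The sketch below is illustrative only: `FetchQueueSketch` and its `next` method are hypothetical, URLs are modelled as plain strings rather than UrlAware, and the real UrlFeeder may route these calls through the URL pool in a different way.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the fetch queue ordering; not the real UrlFeeder.
class FetchQueueSketch {
    private final Deque<String> queue = new ArrayDeque<>();

    // The URL is placed at the head, so it is served before everything already queued.
    void addFirst(String url) {
        queue.addFirst(url);
    }

    // The URL is placed at the tail, so it is served after everything already queued.
    void addLast(String url) {
        queue.addLast(url);
    }

    // Returns the next URL to serve, or null when the queue is empty.
    String next() {
        return queue.pollFirst();
    }
}
```

In this model, a URL passed to addFirst after several addLast calls is still the first one returned by next.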
-
estimatedOrder
final Integer estimatedOrder(Integer priority)
Estimates the position in the fetch order that the next task would get if it were added with the given priority.
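One plausible reading of this estimate is "how many queued URLs would be served before a new task with this priority". The sketch below implements only that reading. `QueuedGroup` and `OrderEstimateSketch` are hypothetical, and the rules that a lower priority value is served first and that a new task waits behind queued tasks of equal priority are assumptions; the actual computation in UrlFeeder may differ.

```java
import java.util.List;

// Hypothetical summary of one collector: its priority and how many URLs it holds.
class QueuedGroup {
    final int priority;  // assumption: a lower value is served first
    final int size;

    QueuedGroup(int priority, int size) {
        this.priority = priority;
        this.size = size;
    }
}

class OrderEstimateSketch {
    // Counts the URLs in groups whose priority value is at or below the given one,
    // i.e. the URLs assumed to be served ahead of a newly added task.
    static int estimatedOrder(List<QueuedGroup> groups, int priority) {
        int ahead = 0;
        for (QueuedGroup g : groups) {
            if (g.priority <= priority) {
                ahead += g.size;
            }
        }
        return ahead;
    }
}
```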
-
addDefaultCollectors
final UrlFeeder addDefaultCollectors()
-
addCollector
final UrlFeeder addCollector(PriorityDataCollector<UrlAware> collector)
-
addCollectors
final UrlFeeder addCollectors(Iterable<PriorityDataCollector<UrlAware>> collectors)
-
findByName
final List<PriorityDataCollector<UrlAware>> findByName(String name)
-
findByName
final List<PriorityDataCollector<UrlAware>> findByName(Iterable<String> names)
-
findByName
final List<PriorityDataCollector<UrlAware>> findByName(Regex regex)
-
findByNameLike
final List<PriorityDataCollector<UrlAware>> findByNameLike(String name)
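The four lookup methods differ only in how a collector name is matched. The sketch below shows one way such lookups could relate to each other, operating on plain name strings. `CollectorLookupSketch` is hypothetical, and treating findByNameLike as a substring match is an assumption, since the page does not document its semantics.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical lookup helpers over collector names; not the real UrlFeeder.
class CollectorLookupSketch {
    // Exact match on a single name.
    static List<String> findByName(List<String> all, String name) {
        List<String> found = new ArrayList<>();
        for (String n : all) {
            if (n.equals(name)) found.add(n);
        }
        return found;
    }

    // Exact match against any of several names.
    static List<String> findByName(List<String> all, Collection<String> names) {
        List<String> found = new ArrayList<>();
        for (String n : all) {
            if (names.contains(n)) found.add(n);
        }
        return found;
    }

    // The whole name must match the regular expression.
    static List<String> findByName(List<String> all, Pattern regex) {
        List<String> found = new ArrayList<>();
        for (String n : all) {
            if (regex.matcher(n).matches()) found.add(n);
        }
        return found;
    }

    // Assumption: "like" means the name contains the given text anywhere.
    static List<String> findByNameLike(List<String> all, String text) {
        return findByName(all, Pattern.compile(".*" + Pattern.quote(text) + ".*"));
    }
}
```

Every variant returns a list, so a lookup that matches nothing yields an empty list rather than null in this sketch.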
-
remove
final Boolean remove(DataCollector<UrlAware> collector)
-
removeAll
final Boolean removeAll(Collection<DataCollector<UrlAware>> collectors)
-