Class AbstractSpout
java.lang.Object
org.apache.storm.topology.base.BaseComponent
org.apache.storm.topology.base.BaseRichSpout
com.digitalpebble.stormcrawler.persistence.AbstractQueryingSpout
com.digitalpebble.stormcrawler.opensearch.persistence.AbstractSpout
- All Implemented Interfaces:
Serializable, org.apache.storm.spout.ISpout, org.apache.storm.topology.IComponent, org.apache.storm.topology.IRichSpout
- Direct Known Subclasses:
AggregationSpout
Nested Class Summary
Nested classes/interfaces inherited from class com.digitalpebble.stormcrawler.persistence.AbstractQueryingSpout
AbstractQueryingSpout.InProcessMap<K extends Object, V extends Object>
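The inherited field names `beingProcessed` and `StatusTTLPurgatory` suggest that `InProcessMap` tracks URLs that have been emitted but not yet acked or failed, and may keep removed keys in a "purgatory" for a while so a URL is not re-emitted before its status update is visible in the index. That reading is an assumption. The sketch below models the idea with a hypothetical `TtlPurgatoryMap` (not a StormCrawler class) and an explicit clock instead of a cache library.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not StormCrawler's InProcessMap: a map whose removed keys
 * stay "in purgatory" for ttlMillis, during which containsKey() still reports them.
 */
class TtlPurgatoryMap<K, V> {
    private final Map<K, V> live = new HashMap<>();
    // key -> time (ms) at which it was removed from the live map
    private final Map<K, Long> purgatory = new HashMap<>();
    private final long ttlMillis;

    TtlPurgatoryMap(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    void put(K key, V value) {
        purgatory.remove(key);
        live.put(key, value);
    }

    /** Removes a key from the live map but remembers when it was removed. */
    V remove(K key, long nowMillis) {
        V previous = live.remove(key);
        if (previous != null) {
            purgatory.put(key, nowMillis);
        }
        return previous;
    }

    /** True if the key is live, or was removed less than ttlMillis ago. */
    boolean containsKey(K key, long nowMillis) {
        if (live.containsKey(key)) {
            return true;
        }
        Long removedAt = purgatory.get(key);
        if (removedAt == null) {
            return false;
        }
        if (nowMillis - removedAt < ttlMillis) {
            return true;
        }
        purgatory.remove(key); // TTL expired: forget the key
        return false;
    }

    public static void main(String[] args) {
        TtlPurgatoryMap<String, String> inProcess = new TtlPurgatoryMap<>(30_000);
        inProcess.put("https://example.com/", "example.com");
        inProcess.remove("https://example.com/", 0);
        System.out.println(inProcess.containsKey("https://example.com/", 10_000)); // true
        System.out.println(inProcess.containsKey("https://example.com/", 40_000)); // false
    }
}
```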
Field Summary
Fields
Modifier and Type | Field | Description
 | bucketSortField |
protected static org.opensearch.client.RestHighLevelClient | client |
 | filterQueries | Query to use as a positive filter, set by es.status.filterQuery.
protected String | indexName |
protected String | logIdprefix | Used to distinguish between instances in the logs.
protected int | maxBucketNum |
protected int | maxURLsPerBucket |
protected static final String | OSBoltType |
protected static final String | OSStatusBucketFieldParamName | Field name to use for aggregating.
protected static final String | OSStatusBucketSortFieldParamName | Field name to use for sorting the URLs within a bucket, not used if empty or null.
protected static final String | OSStatusFilterParamName |
protected static final String | OSStatusGlobalSortFieldParamName | Field name to use for sorting the buckets, not used if empty or null.
protected static final String | OSStatusIndexNameParamName |
protected static final String | OSStatusMaxBucketParamName |
protected static final String | OSStatusMaxURLsParamName |
protected static final String | OSStatusQueryTimeoutParamName |
protected String | partitionField | Field name used for field collapsing, e.g. key.
protected Date | queryDate |
protected int | queryTimeout |
protected int | shardID | When using multiple instances, each one is in charge of a specific shard; useful when sharding based on host or domain to guarantee a good mix of URLs.
protected String | totalSortField |
Fields inherited from class com.digitalpebble.stormcrawler.persistence.AbstractQueryingSpout
_collector, beingProcessed, buffer, eventCounter, isInQuery, lastTimeResetToNOW, maxDelayBetweenQueries, minDelayBetweenQueries, queryTimes, resetFetchDateAfterNSecs, resetFetchDateParamName, StatusMaxDelayParamName, StatusMinDelayParamName, StatusTTLPurgatory
Constructor Summary
Constructors
Constructor | Description
AbstractSpout() |
Method Summary
Modifier and Type | Method | Description
void | ack(Object msgId) |
protected final boolean | addHitToBuffer(org.opensearch.search.SearchHit hit) |
void | close() |
void | fail(Object msgId) |
protected final Metadata | fromKeyValues(Map<String, Object> keyValues) |
void | open(Map<String, Object> stormConf, org.apache.storm.task.TopologyContext context, org.apache.storm.spout.SpoutOutputCollector collector) |
protected abstract void | populateBuffer() | Builds a query and uses it to retrieve the results from OpenSearch.
Methods inherited from class com.digitalpebble.stormcrawler.persistence.AbstractQueryingSpout
activate, deactivate, declareOutputFields, getTimeLastQuerySent, markQueryReceivedNow, nextTuple
Methods inherited from class org.apache.storm.topology.base.BaseComponent
getComponentConfiguration
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.storm.topology.IComponent
getComponentConfiguration
Field Details

OSBoltType

OSStatusIndexNameParamName

OSStatusBucketFieldParamName
Field name to use for aggregating.

OSStatusMaxBucketParamName

OSStatusMaxURLsParamName

OSStatusBucketSortFieldParamName
Field name to use for sorting the URLs within a bucket, not used if empty or null.

OSStatusGlobalSortFieldParamName
Field name to use for sorting the buckets, not used if empty or null.

OSStatusFilterParamName

OSStatusQueryTimeoutParamName

filterQueries
Query to use as a positive filter, set by es.status.filterQuery.
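A positive filter keeps only the status documents that match it. As a rough local analogue (not the OpenSearch query DSL), the hypothetical `PositiveFilterSketch` below treats each configured filter as a predicate over a document's fields and keeps the documents that match all of them; combining several filters with AND is an assumption here. With no filters, every document is kept.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical illustration of positive filtering; documents are plain field maps. */
class PositiveFilterSketch {
    /** Keeps only the documents that satisfy every filter (assumed AND semantics). */
    static List<Map<String, String>> apply(List<Map<String, String>> docs,
                                           List<Predicate<Map<String, String>>> filters) {
        return docs.stream()
                .filter(doc -> filters.stream().allMatch(f -> f.test(doc)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("url", "https://a.example/", "status", "DISCOVERED"),
                Map.of("url", "https://b.example/", "status", "FETCHED"));
        // roughly analogous to a filter query such as status:DISCOVERED
        Predicate<Map<String, String>> discovered = d -> "DISCOVERED".equals(d.get("status"));
        apply(docs, List.of(discovered))
                .forEach(d -> System.out.println(d.get("url"))); // https://a.example/
    }
}
```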

indexName

client
protected static org.opensearch.client.RestHighLevelClient client

shardID
protected int shardID
When using multiple instances, each one is in charge of a specific shard. This is useful when sharding based on host or domain, to guarantee a good mix of URLs.
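One way to picture per-instance shards: if the status index is routed by host, every URL of a host lands on the same shard, and with one spout instance per shard each instance queries only its own shard. The hypothetical `ShardAssignmentSketch` below assumes exactly that setup. `shardFor` uses a plain hash modulo for illustration only (OpenSearch applies its own hash function to the routing value), and the `_shards:<id>` search preference is shown as the kind of value a per-shard query could use, not as what `AbstractSpout` necessarily sends.

```java
import java.net.URI;

/** Hypothetical sketch of per-instance shard assignment; not StormCrawler code. */
class ShardAssignmentSketch {
    /** Shard a URL would map to if routing were hash(host) mod numShards (illustrative only). */
    static int shardFor(String url, int numShards) {
        String host = URI.create(url).getHost();
        return Math.floorMod(host.hashCode(), numShards);
    }

    /** Assumes one spout instance per shard, so the task index doubles as the shard ID. */
    static int shardIdFor(int taskIndex, int numInstances, int numShards) {
        if (numInstances != numShards) {
            throw new IllegalArgumentException(
                    "expected one instance per shard: " + numInstances + " vs " + numShards);
        }
        return taskIndex;
    }

    /** A search preference string restricting a query to a single shard. */
    static String preferenceFor(int shardId) {
        return "_shards:" + shardId;
    }

    public static void main(String[] args) {
        int numShards = 4;
        int shardId = shardIdFor(2, 4, numShards);
        System.out.println(preferenceFor(shardId)); // _shards:2
        // all URLs of one host map to the same shard, hence to the same spout instance
        System.out.println(shardFor("https://example.com/a", numShards)
                == shardFor("https://example.com/b", numShards)); // true
    }
}
```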

logIdprefix
Used to distinguish between instances in the logs.

partitionField
Field name used for field collapsing, e.g. key.
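Field collapsing returns at most one top hit per distinct value of the collapse field, so collapsing on a field such as key (for example a host) spreads the results of a single query across many hosts. The hypothetical `CollapseSketch` below emulates that locally over hits that are already sorted by relevance or date; it is an illustration, not how OpenSearch implements collapsing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical local emulation of field collapsing; not the OpenSearch implementation. */
class CollapseSketch {
    static final class Hit {
        final String url;
        final String key; // value of the partition field, e.g. the host

        Hit(String url, String key) {
            this.url = url;
            this.key = key;
        }
    }

    /** Keeps the first (best-ranked) hit for each distinct key, preserving order. */
    static List<Hit> collapse(List<Hit> sortedHits) {
        Map<String, Hit> firstPerKey = new LinkedHashMap<>();
        for (Hit hit : sortedHits) {
            firstPerKey.putIfAbsent(hit.key, hit);
        }
        return new ArrayList<>(firstPerKey.values());
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("https://a.example/1", "a.example"),
                new Hit("https://a.example/2", "a.example"),
                new Hit("https://b.example/1", "b.example"));
        for (Hit h : collapse(hits)) {
            System.out.println(h.url); // https://a.example/1, then https://b.example/1
        }
    }
}
```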

maxURLsPerBucket
protected int maxURLsPerBucket

maxBucketNum
protected int maxBucketNum

bucketSortField

totalSortField
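Going by the field descriptions, the aggregation settings fit together roughly like this: due URLs are grouped into buckets by the aggregation field, at most maxURLsPerBucket URLs are kept per bucket (sorted by the in-bucket sort field), and at most maxBucketNum buckets are kept (sorted by the bucket sort field). The hypothetical `BucketSelectionSketch` below emulates that in memory. It assumes a single `nextFetchDate` value stands in for both sort fields, that a bucket is ranked by its earliest entry, and that `queryDate` acts as a due-date cutoff; the real spout delegates this work to an OpenSearch query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory emulation of bucketed URL selection; not StormCrawler code. */
class BucketSelectionSketch {
    static final class Status {
        final String url;
        final String key;         // bucket field, e.g. host
        final long nextFetchDate; // stand-in for both sort fields

        Status(String url, String key, long nextFetchDate) {
            this.url = url;
            this.key = key;
            this.nextFetchDate = nextFetchDate;
        }
    }

    static List<String> select(List<Status> all, long queryDate,
                               int maxBucketNum, int maxURLsPerBucket) {
        if (maxBucketNum <= 0 || maxURLsPerBucket <= 0) {
            return List.of();
        }
        // 1. keep only URLs due at queryDate, grouped by bucket key
        Map<String, List<Status>> buckets = new LinkedHashMap<>();
        for (Status s : all) {
            if (s.nextFetchDate <= queryDate) {
                buckets.computeIfAbsent(s.key, k -> new ArrayList<>()).add(s);
            }
        }
        // 2. sort within each bucket and keep its first maxURLsPerBucket entries
        Comparator<Status> byDate = Comparator.comparingLong(s -> s.nextFetchDate);
        List<List<Status>> trimmed = new ArrayList<>();
        for (List<Status> bucket : buckets.values()) {
            bucket.sort(byDate);
            trimmed.add(bucket.subList(0, Math.min(maxURLsPerBucket, bucket.size())));
        }
        // 3. rank buckets by their earliest entry and keep maxBucketNum of them
        trimmed.sort(Comparator.comparingLong(b -> b.get(0).nextFetchDate));
        List<String> urls = new ArrayList<>();
        for (List<Status> bucket : trimmed.subList(0, Math.min(maxBucketNum, trimmed.size()))) {
            for (Status s : bucket) {
                urls.add(s.url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        List<Status> all = List.of(
                new Status("https://a.example/1", "a.example", 10),
                new Status("https://a.example/2", "a.example", 5),
                new Status("https://a.example/3", "a.example", 7),
                new Status("https://b.example/1", "b.example", 20),
                new Status("https://c.example/1", "c.example", 1),
                new Status("https://d.example/1", "d.example", 99)); // not yet due
        System.out.println(select(all, 50, 2, 2));
        // [https://c.example/1, https://a.example/2, https://a.example/3]
    }
}
```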

queryDate

queryTimeout
protected int queryTimeout

Constructor Details

AbstractSpout
public AbstractSpout()

Method Details

open
public void open(Map<String, Object> stormConf, org.apache.storm.task.TopologyContext context, org.apache.storm.spout.SpoutOutputCollector collector)
Specified by: open in interface org.apache.storm.spout.ISpout
Overrides: open in class AbstractQueryingSpout
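`open` receives the Storm configuration as a `Map<String, Object>`, in which numeric settings read from YAML may arrive as `Integer`, `Long` or even `String`. The hypothetical `ConfSketch.getInt` below shows one defensive way to read such a setting with a default, similar in spirit to StormCrawler's own configuration helpers. The key names and defaults in `main` are placeholders, not the actual values of the `OSStatus*ParamName` constants.

```java
import java.util.Map;

/** Hypothetical helper for reading numeric settings from a Storm-style config map. */
class ConfSketch {
    static int getInt(Map<String, Object> conf, String key, int defaultValue) {
        Object value = conf.get(key);
        if (value == null) {
            return defaultValue; // setting absent
        }
        if (value instanceof Number) {
            return ((Number) value).intValue(); // YAML numbers may be Integer or Long
        }
        return Integer.parseInt(value.toString().trim()); // e.g. "10" given as a string
    }

    public static void main(String[] args) {
        // placeholder key names, standing in for the OSStatus*ParamName constants
        Map<String, Object> stormConf = Map.of(
                "example.status.max.buckets", 10L,
                "example.status.max.urls.per.bucket", "2");
        int maxBucketNum = getInt(stormConf, "example.status.max.buckets", 50);
        int maxURLsPerBucket = getInt(stormConf, "example.status.max.urls.per.bucket", 1);
        int queryTimeout = getInt(stormConf, "example.status.query.timeout", -1);
        System.out.println(maxBucketNum + " " + maxURLsPerBucket + " " + queryTimeout); // 10 2 -1
    }
}
```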

populateBuffer
protected abstract void populateBuffer()
Builds a query and uses it to retrieve the results from OpenSearch.
Specified by: populateBuffer in class AbstractQueryingSpout
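`populateBuffer` is the hook that concrete subclasses such as `AggregationSpout` implement: presumably the base class decides when a new query is needed, and the subclass builds and runs it, filling the shared buffer. The self-contained sketch below mirrors that template-method split with hypothetical stand-in classes (`QueryingSpoutSketch`, `FixedResultSpout`); the real `AbstractQueryingSpout` also throttles queries and tracks in-flight URLs, which is omitted here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for the querying-spout base class; not the StormCrawler API. */
abstract class QueryingSpoutSketch {
    protected final Deque<String> buffer = new ArrayDeque<>();

    /** Subclasses build a query and add its results to the buffer. */
    protected abstract void populateBuffer();

    /** Returns the next URL to emit, querying again only when the buffer is empty. */
    String nextUrl() {
        if (buffer.isEmpty()) {
            populateBuffer();
        }
        return buffer.poll(); // null if the query returned nothing
    }
}

/** Hypothetical subclass whose "query" just returns a fixed list of URLs. */
class FixedResultSpout extends QueryingSpoutSketch {
    private final List<String> results;
    int queries = 0;

    FixedResultSpout(List<String> results) {
        this.results = results;
    }

    @Override
    protected void populateBuffer() {
        queries++;
        buffer.addAll(results);
    }

    public static void main(String[] args) {
        FixedResultSpout spout = new FixedResultSpout(
                List.of("https://a.example/", "https://b.example/"));
        System.out.println(spout.nextUrl()); // https://a.example/ (triggers the first query)
        System.out.println(spout.nextUrl()); // https://b.example/ (served from the buffer)
        System.out.println(spout.queries);   // 1
    }
}
```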

addHitToBuffer
protected final boolean addHitToBuffer(org.opensearch.search.SearchHit hit)
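`addHitToBuffer` returns a `boolean`. Given the inherited `beingProcessed` and `buffer` fields, a plausible reading is that it reports whether the hit was actually added, skipping URLs that are already in flight or already buffered. That reading is an assumption; the hypothetical `BufferSketch` below models it with URL strings in place of `SearchHit` objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of de-duplicated buffering; URLs stand in for SearchHit objects. */
class BufferSketch {
    final Set<String> beingProcessed = new HashSet<>(); // emitted, awaiting ack/fail
    final Deque<String> buffer = new ArrayDeque<>();

    /** Adds the URL unless it is already in flight or already buffered. */
    boolean addHitToBuffer(String url) {
        // Deque.contains is a linear scan, acceptable for a sketch
        if (beingProcessed.contains(url) || buffer.contains(url)) {
            return false;
        }
        buffer.add(url);
        return true;
    }

    public static void main(String[] args) {
        BufferSketch b = new BufferSketch();
        b.beingProcessed.add("https://a.example/");
        System.out.println(b.addHitToBuffer("https://a.example/")); // false: already in flight
        System.out.println(b.addHitToBuffer("https://b.example/")); // true
        System.out.println(b.addHitToBuffer("https://b.example/")); // false: already buffered
    }
}
```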

fromKeyValues
protected final Metadata fromKeyValues(Map<String, Object> keyValues)
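`fromKeyValues` turns a key/value map from a status document into a StormCrawler `Metadata` object. Assuming each value is either a single string or a list of strings (metadata fields can be multi-valued), the conversion can be pictured with the hypothetical `MetadataSketch` below, where a plain `Map<String, List<String>>` stands in for `Metadata`. How the real method locates the metadata within the document is not shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical conversion of document key/values into multi-valued metadata. */
class MetadataSketch {
    static Map<String, List<String>> fromKeyValues(Map<String, Object> keyValues) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : keyValues.entrySet()) {
            Object value = entry.getValue();
            if (value == null) {
                continue; // no value: skip the key
            }
            List<String> values = metadata.computeIfAbsent(entry.getKey(), k -> new ArrayList<>());
            if (value instanceof List) {
                for (Object v : (List<?>) value) {
                    values.add(String.valueOf(v)); // multi-valued field
                }
            } else {
                values.add(String.valueOf(value)); // single value
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("depth", "1");
        source.put("url.path", List.of("https://a.example/", "https://a.example/x"));
        System.out.println(fromKeyValues(source));
        // {depth=[1], url.path=[https://a.example/, https://a.example/x]}
    }
}
```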

ack
public void ack(Object msgId)
Specified by: ack in interface org.apache.storm.spout.ISpout
Overrides: ack in class AbstractQueryingSpout

fail
public void fail(Object msgId)
Specified by: fail in interface org.apache.storm.spout.ISpout
Overrides: fail in class AbstractQueryingSpout

close
public void close()
Specified by: close in interface org.apache.storm.spout.ISpout
Overrides: close in class org.apache.storm.topology.base.BaseRichSpout