java.lang.Object
  org.apache.storm.topology.base.BaseComponent
    org.apache.storm.topology.base.BaseRichSpout
      com.digitalpebble.stormcrawler.persistence.AbstractQueryingSpout
        com.digitalpebble.stormcrawler.opensearch.persistence.AbstractSpout
All Implemented Interfaces:
Serializable, org.apache.storm.spout.ISpout, org.apache.storm.topology.IComponent, org.apache.storm.topology.IRichSpout
Direct Known Subclasses:
AggregationSpout

public abstract class AbstractSpout extends AbstractQueryingSpout
  • Field Details

    • OSBoltType

      protected static final String OSBoltType
    • OSStatusIndexNameParamName

      protected static final String OSStatusIndexNameParamName
    • OSStatusBucketFieldParamName

      protected static final String OSStatusBucketFieldParamName
      Field name to use for aggregating.
    • OSStatusMaxBucketParamName

      protected static final String OSStatusMaxBucketParamName
    • OSStatusMaxURLsParamName

      protected static final String OSStatusMaxURLsParamName
    • OSStatusBucketSortFieldParamName

      protected static final String OSStatusBucketSortFieldParamName
      Field name to use for sorting the URLs within a bucket; not used if empty or null.
    • OSStatusGlobalSortFieldParamName

      protected static final String OSStatusGlobalSortFieldParamName
      Field name to use for sorting the buckets; not used if empty or null.
    • OSStatusFilterParamName

      protected static final String OSStatusFilterParamName
    • OSStatusQueryTimeoutParamName

      protected static final String OSStatusQueryTimeoutParamName
    • filterQueries

      protected List<String> filterQueries
      Queries to use as positive filters, set by es.status.filterQuery.
    • indexName

      protected String indexName
    • client

      protected static org.opensearch.client.RestHighLevelClient client
    • shardID

      protected int shardID
      When using multiple instances, each one is in charge of a specific shard. This is useful when sharding by host or domain, as it guarantees a good mix of URLs.
    • logIdprefix

      protected String logIdprefix
      Used to distinguish between instances in the logs.
    • partitionField

      protected String partitionField
      Field name used for field collapsing, e.g. key.
    • maxURLsPerBucket

      protected int maxURLsPerBucket
    • maxBucketNum

      protected int maxBucketNum
    • bucketSortField

      protected List<String> bucketSortField
    • totalSortField

      protected String totalSortField
    • queryDate

      protected Date queryDate
    • queryTimeout

      protected int queryTimeout
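
The fields above are typically filled from the topology configuration when the spout is opened. The following is a minimal sketch of that pattern in plain Java, not the class's actual code. The class `SpoutSettingsSketch`, its key names, and its default values are assumptions made for illustration; the real keys are the values of the `OSStatus...ParamName` constants, which this page does not list. Using the task index as the shard ID, and -1 for a single instance, is likewise an assumption.

```java
import java.util.Map;

/** Hypothetical settings holder; not part of StormCrawler. */
class SpoutSettingsSketch {
    // Assumed key names, for illustration only. The real keys are the values
    // of the OSStatus...ParamName constants, which this page does not list.
    static final String MAX_BUCKETS_KEY = "example.status.max.buckets";
    static final String MAX_URLS_KEY = "example.status.max.urls.per.bucket";
    static final String BUCKET_FIELD_KEY = "example.status.bucket.field";

    final int maxBucketNum;
    final int maxURLsPerBucket;
    final String partitionField;
    final int shardID;

    private SpoutSettingsSketch(int maxBucketNum, int maxURLsPerBucket,
                                String partitionField, int shardID) {
        this.maxBucketNum = maxBucketNum;
        this.maxURLsPerBucket = maxURLsPerBucket;
        this.partitionField = partitionField;
        this.shardID = shardID;
    }

    /** Reads the settings from a Storm-style configuration map. */
    static SpoutSettingsSketch fromConf(Map<String, Object> conf,
                                        int taskIndex, int totalTasks) {
        // Default values here are assumptions, not the library's defaults.
        int buckets = getInt(conf, MAX_BUCKETS_KEY, 10);
        int urls = getInt(conf, MAX_URLS_KEY, 1);
        String field = getString(conf, BUCKET_FIELD_KEY, "key");
        // Assumption: with several instances, each one takes the shard whose
        // number matches its task index; -1 means "no shard restriction".
        int shard = totalTasks > 1 ? taskIndex : -1;
        return new SpoutSettingsSketch(buckets, urls, field, shard);
    }

    /** Accepts numbers or numeric strings, falling back to the default. */
    static int getInt(Map<String, Object> conf, String key, int defaultValue) {
        Object v = conf.get(key);
        if (v instanceof Number) {
            return ((Number) v).intValue();
        }
        if (v instanceof String) {
            return Integer.parseInt(((String) v).trim());
        }
        return defaultValue;
    }

    /** Returns the value as a string, or the default if the key is absent. */
    static String getString(Map<String, Object> conf, String key, String defaultValue) {
        Object v = conf.get(key);
        return v == null ? defaultValue : v.toString();
    }
}
```

Calling `SpoutSettingsSketch.fromConf(conf, taskIndex, totalTasks)` once at start-up keeps all parsing in one place, so a missing or string-typed value is handled the same way for every setting.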
  • Constructor Details

    • AbstractSpout

      public AbstractSpout()
  • Method Details

    • open

      public void open(Map<String,Object> stormConf, org.apache.storm.task.TopologyContext context, org.apache.storm.spout.SpoutOutputCollector collector)
      Specified by:
      open in interface org.apache.storm.spout.ISpout
      Overrides:
      open in class AbstractQueryingSpout
    • populateBuffer

      protected abstract void populateBuffer()
      Builds a query and uses it to retrieve the results from OpenSearch.
      Specified by:
      populateBuffer in class AbstractQueryingSpout
    • addHitToBuffer

      protected final boolean addHitToBuffer(org.opensearch.search.SearchHit hit)
    • fromKeyValues

      protected final Metadata fromKeyValues(Map<String,Object> keyValues)
    • ack

      public void ack(Object msgId)
      Specified by:
      ack in interface org.apache.storm.spout.ISpout
      Overrides:
      ack in class AbstractQueryingSpout
    • fail

      public void fail(Object msgId)
      Specified by:
      fail in interface org.apache.storm.spout.ISpout
      Overrides:
      fail in class AbstractQueryingSpout
    • close

      public void close()
      Specified by:
      close in interface org.apache.storm.spout.ISpout
      Overrides:
      close in class org.apache.storm.topology.base.BaseRichSpout
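
The methods above suggest a buffer-based life cycle: populateBuffer fills a buffer from query results, hits are added one at a time, and ack or fail is called once a URL has been handled. The following is a minimal sketch of that pattern in plain Java, without Storm or OpenSearch on the classpath. `QueryingSpoutSketch`, `FixedListSpout`, `addToBuffer`, and the in-flight set are hypothetical names, and the rule that a URL awaiting ack or fail is not buffered again is an assumption rather than documented behaviour of AbstractSpout.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Hypothetical stand-in for a querying spout; not the StormCrawler class. */
abstract class QueryingSpoutSketch {
    private final Queue<String> buffer = new ArrayDeque<>();
    private final Set<String> inFlight = new HashSet<>();

    /** Subclasses run their query and call addToBuffer for each hit. */
    protected abstract void populateBuffer();

    /** Returns false if the URL is already buffered or awaiting ack/fail. */
    protected final boolean addToBuffer(String url) {
        if (inFlight.contains(url) || buffer.contains(url)) {
            return false;
        }
        return buffer.add(url);
    }

    /** Emits one URL, refilling the buffer first if it is empty. */
    public String nextTuple() {
        if (buffer.isEmpty()) {
            populateBuffer();
        }
        String url = buffer.poll();
        if (url != null) {
            inFlight.add(url);
        }
        return url;
    }

    /** In this sketch, ack and fail both release the URL for later queries. */
    public void ack(Object msgId) {
        inFlight.remove(msgId);
    }

    public void fail(Object msgId) {
        inFlight.remove(msgId);
    }
}

/** Toy subclass whose "query" always returns the same list of URLs. */
class FixedListSpout extends QueryingSpoutSketch {
    private final List<String> urls;

    FixedListSpout(List<String> urls) {
        this.urls = urls;
    }

    @Override
    protected void populateBuffer() {
        for (String url : urls) {
            addToBuffer(url);
        }
    }
}
```

Keeping the in-flight check inside a final helper means every subclass gets the same de-duplication, which may be why addHitToBuffer is declared final in the real class; that reading is a guess.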