Class JSONURLFilterWrapper

All Implemented Interfaces:
Configurable

public class JSONURLFilterWrapper extends URLFilter
Wraps a URLFilter whose resources are kept in a JSON file that can be stored in Elasticsearch (ES). The benefit is that the resources can be modified and refreshed automatically, without recompiling the jar or restarting the topology. The connection to ES is set up through the Storm config and uses a new bolt type, 'config'.

The delegate filter is configured in urlfilters.json as usual, nested under the wrapper's "delegate" parameter:

  {
     "class": "com.digitalpebble.stormcrawler.elasticsearch.filtering.JSONURLFilterWrapper",
     "name": "ESFastURLFilter",
     "params": {
         "refresh": "60",
         "delegate": {
             "class": "com.digitalpebble.stormcrawler.filtering.regex.FastURLFilter",
             "params": {
                 "file": "fast.urlfilter.json"
             }
         }
     }
  }
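The refresh-and-delegate mechanism described above can be illustrated with a small, self-contained Java sketch. It is not the StormCrawler implementation. The class RefreshingFilterSketch, the host deny-list delegate, the Supplier standing in for the ES "config" lookup, and the reading of "refresh" as a period in seconds are all assumptions made for illustration.

```java
import java.net.URI;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch: a wrapper that periodically rebuilds its delegate from an
// external resource source (a Supplier standing in for the ES "config" index).
public class RefreshingFilterSketch implements AutoCloseable {

    // Hypothetical delegate: rejects URLs whose host is on a deny list.
    static final class HostDenyFilter {
        private final Set<String> deniedHosts;

        HostDenyFilter(Set<String> deniedHosts) {
            this.deniedHosts = deniedHosts;
        }

        // Returns null to reject the URL, or the URL unchanged to keep it.
        // URI.create throws IllegalArgumentException on malformed input.
        String filter(String url) {
            String host = URI.create(url).getHost();
            return deniedHosts.contains(host) ? null : url;
        }
    }

    private final Supplier<Set<String>> resourceSource;
    private final AtomicReference<HostDenyFilter> delegate = new AtomicReference<>();
    private final ScheduledExecutorService scheduler;

    // Assumption: refreshSeconds plays the role of the "refresh" parameter.
    public RefreshingFilterSketch(Supplier<Set<String>> resourceSource, long refreshSeconds) {
        this.resourceSource = resourceSource;
        reload(); // load the resources once before any URL is filtered
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "filter-refresh");
            t.setDaemon(true); // do not keep the JVM alive just for refreshing
            return t;
        });
        scheduler.scheduleAtFixedRate(this::reload, refreshSeconds, refreshSeconds, TimeUnit.SECONDS);
    }

    // Builds a fresh delegate from the latest resources and swaps it in atomically,
    // so concurrent filter() calls always see one complete set of rules.
    public void reload() {
        delegate.set(new HostDenyFilter(Set.copyOf(resourceSource.get())));
    }

    public String filter(String url) {
        return delegate.get().filter(url);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        AtomicReference<Set<String>> store = new AtomicReference<>(Set.of("spam.example"));
        try (RefreshingFilterSketch f = new RefreshingFilterSketch(store::get, 60)) {
            System.out.println(f.filter("https://spam.example/page")); // null: host is denied
            store.set(Set.of()); // simulate editing the resource stored in ES
            f.reload();          // normally triggered by the scheduler
            System.out.println(f.filter("https://spam.example/page")); // kept after reload
        }
    }
}
```

Swapping an immutable delegate through an AtomicReference is one way to let the rules change at runtime without locking on every call; the real wrapper may use a different mechanism.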
 
The resource file can be pushed to ES with:
  curl -XPUT 'localhost:9200/config/config/fast.urlfilter.json?pretty' -H 'Content-Type: application/json' -d @fast.urlfilter.json
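For deployments that prefer to push the resource from Java rather than with curl, the request can be reproduced with the JDK's built-in HTTP client (Java 11+). This is a sketch that mirrors the curl command exactly (same host, index path, document id and header); the class name PushResource and the helper buildRequest are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical Java equivalent of the curl command: PUT the JSON resource file
// into ES under the id "fast.urlfilter.json".
public class PushResource {

    // Builds the request without sending it, so it can be inspected or reused.
    static HttpRequest buildRequest(String esBase, String resourceName, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(esBase + "/config/config/" + resourceName + "?pretty"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String resourceName = "fast.urlfilter.json";
        // Reads the resource file from the current working directory.
        String json = Files.readString(Path.of(resourceName));
        HttpRequest request = buildRequest("http://localhost:9200", resourceName, json);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Separating request construction from sending keeps the URL and headers easy to check without a running ES node.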
 
  • Constructor Details

    • JSONURLFilterWrapper

      public JSONURLFilterWrapper()
  • Method Details

    • configure

      public void configure(@NotNull Map<String,Object> stormConf, @NotNull com.fasterxml.jackson.databind.JsonNode filterParams)
    • filter

      public @Nullable String filter(@Nullable URL sourceUrl, @Nullable Metadata sourceMetadata, @NotNull String urlToFilter)
      Specified by:
      filter in class URLFilter
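The @Nullable return type of filter means a caller must be ready for null, which signals that the URL is to be discarded; a non-null result is the URL to keep, possibly rewritten. The sketch below shows how a caller might apply several such filters to a page's outlinks. It is self-contained, so the interface SimpleUrlFilter is a hypothetical stand-in for URLFilter with a plain Map in place of Metadata, and the chaining loop (pass each result to the next filter, stop at the first null) is an assumption about how filters are combined.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FilterContractSketch {

    // Hypothetical stand-in for URLFilter.filter(...): a plain Map replaces
    // StormCrawler's Metadata so the sketch compiles on its own.
    @FunctionalInterface
    interface SimpleUrlFilter {
        // Returns null to drop the URL, otherwise the (possibly rewritten) URL to keep.
        String filter(URL sourceUrl, Map<String, List<String>> sourceMetadata, String urlToFilter);
    }

    // Applies the filters in order to each outlink: a null result removes the link,
    // a non-null result is handed to the next filter.
    static List<String> applyFilters(URL source, Map<String, List<String>> md,
                                     List<String> outlinks, List<SimpleUrlFilter> filters) {
        List<String> kept = new ArrayList<>();
        for (String link : outlinks) {
            String current = link;
            for (SimpleUrlFilter f : filters) {
                current = f.filter(source, md, current);
                if (current == null) {
                    break; // rejected: later filters are not consulted
                }
            }
            if (current != null) {
                kept.add(current);
            }
        }
        return kept;
    }

    public static void main(String[] args) throws MalformedURLException {
        URL source = URI.create("https://example.com/").toURL();
        SimpleUrlFilter dropPdf = (s, m, u) -> u.endsWith(".pdf") ? null : u;
        SimpleUrlFilter stripFragment = (s, m, u) -> u.contains("#") ? u.substring(0, u.indexOf('#')) : u;
        List<String> result = applyFilters(source, Map.of(),
                List.of("https://example.com/a#top", "https://example.com/doc.pdf"),
                List.of(dropPdf, stripFragment));
        System.out.println(result); // [https://example.com/a]
    }
}
```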