Class RedirectionBolt

java.lang.Object
org.apache.storm.topology.base.BaseComponent
org.apache.storm.topology.base.BaseRichBolt
com.digitalpebble.stormcrawler.tika.RedirectionBolt
All Implemented Interfaces:
Serializable, org.apache.storm.task.IBolt, org.apache.storm.topology.IComponent, org.apache.storm.topology.IRichBolt

public class RedirectionBolt extends org.apache.storm.topology.base.BaseRichBolt
Routes a document to Tika only if it has not already been parsed by another parser. Tuples that still need parsing are emitted on a stream named 'tika'; tuples that were already parsed are passed on through the default stream.

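Remember to set the following property in the crawler configuration, so that JSoupParserBolt passes non-HTML documents on instead of treating them as errors: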

   jsoup.treat.non.html.as.error: false
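
If the configuration is assembled in code rather than in a configuration file, the same property can be set as an ordinary map entry. This is a minimal sketch that uses a plain `HashMap` as a stand-in for Storm's `Config` (which is itself a `HashMap<String, Object>`); the `JsoupConfigSketch` class and the `crawlerConf` method are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of StormCrawler.
final class JsoupConfigSketch {

    // Builds a configuration map carrying the property named on this page.
    static Map<String, Object> crawlerConf() {
        Map<String, Object> conf = new HashMap<>();
        // Property name taken from this page; 'false' lets non-HTML documents
        // continue downstream so that the shunt can redirect them to Tika.
        conf.put("jsoup.treat.non.html.as.error", false);
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(crawlerConf());
    }
}
```

With a real topology, the same `put` call would be made on the `Config` instance that is submitted with the topology.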
 
Use in your topologies as follows:
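 // JSoup gets the first chance to parse each document
 builder.setBolt("jsoup", new JSoupParserBolt()).localOrShuffleGrouping("sitemap");

 // the shunt sends documents that are still unparsed to its 'tika' stream
 builder.setBolt("shunt", new RedirectionBolt()).localOrShuffleGrouping("jsoup");

 // Tika subscribes only to the 'tika' stream of the shunt
 builder.setBolt("tika", new ParserBolt()).localOrShuffleGrouping("shunt", "tika");

 // the indexer receives the default streams of both the shunt and Tika
 builder.setBolt("indexer", new IndexingBolt(), numWorkers)
         .localOrShuffleGrouping("shunt").localOrShuffleGrouping("tika");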
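
The routing rule described above can be sketched in plain Java, without the Storm dependency. This is an illustration and not the bolt's actual source: the `RedirectionSketch` class, the `chooseStream` helper, and the `parsed.by` metadata key are assumptions made for the example. Only the 'tika' stream name comes from this page; "default" is the id of Storm's default stream.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the decision made inside execute().
final class RedirectionSketch {

    static final String DEFAULT_STREAM = "default";
    static final String TIKA_STREAM = "tika";

    // Assumption: an upstream parser records its name under this metadata key.
    static final String PARSED_BY_KEY = "parsed.by";

    // Returns the stream a document should be emitted on.
    static String chooseStream(Map<String, String> metadata) {
        String parsedBy = metadata.get(PARSED_BY_KEY);
        boolean alreadyParsed = parsedBy != null && !parsedBy.trim().isEmpty();
        // Already parsed: pass through. Otherwise: hand over to Tika.
        return alreadyParsed ? DEFAULT_STREAM : TIKA_STREAM;
    }

    public static void main(String[] args) {
        Map<String, String> html = new HashMap<>();
        html.put(PARSED_BY_KEY, "JSoupParserBolt");
        Map<String, String> pdf = new HashMap<>(); // no parser has touched it

        System.out.println("html -> " + chooseStream(html));
        System.out.println("pdf  -> " + chooseStream(pdf));
    }
}
```

In the topology shown above, a tuple sent to the default stream would reach the indexer directly, while a tuple sent to 'tika' would first pass through the `ParserBolt`.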
 
  • Constructor Summary

    Constructors
    Constructor          Description
    RedirectionBolt()
  • Method Summary

    Modifier and Type    Method    Description
    void                 declareOutputFields(org.apache.storm.topology.OutputFieldsDeclarer declarer)
    void                 execute(org.apache.storm.tuple.Tuple tuple)
    void                 prepare(Map conf, org.apache.storm.task.TopologyContext context, org.apache.storm.task.OutputCollector collector)

    Methods inherited from class org.apache.storm.topology.base.BaseRichBolt

    cleanup

    Methods inherited from class org.apache.storm.topology.base.BaseComponent

    getComponentConfiguration

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.apache.storm.topology.IComponent

    getComponentConfiguration
  • Constructor Details

    • RedirectionBolt

      public RedirectionBolt()
  • Method Details

    • prepare

      public void prepare(Map conf, org.apache.storm.task.TopologyContext context, org.apache.storm.task.OutputCollector collector)
    • execute

      public void execute(org.apache.storm.tuple.Tuple tuple)
    • declareOutputFields

      public void declareOutputFields(org.apache.storm.topology.OutputFieldsDeclarer declarer)