Class DecisionTableOptimizedAlgorithm

  • All Implemented Interfaces:
    IDecisionTableAlgorithm

    public class DecisionTableOptimizedAlgorithm
    extends Object
    implements IDecisionTableAlgorithm
    The basic algorithm for decision table (DT) evaluation is straightforward (consider a table with conditions and actions as columns and rules as rows; remember that OpenL Tablets allows both this and the transposed orientation):
    1. For each rule (row), from the top to the bottom of the table, evaluate the conditions from left to right.
    2. If all conditions are true, execute all the actions in the rule from left to right; if any condition is false, stop evaluating conditions and go to the next rule.
    3. If an action is a non-empty return action, return its value (this stops the evaluation).
    4. If no rules are left, return null.
    The logic of the algorithm must be kept intact in all optimization implementations, unless some permutations are explicitly allowed.
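    The four steps above can be sketched as a small, unoptimized evaluator. This is an illustrative model only, not the OpenL Tablets implementation; the Rule and BasicDtEvaluator names and the predicate/supplier representation are assumptions made for the sketch.

    ```java
    import java.util.List;
    import java.util.function.Predicate;
    import java.util.function.Supplier;

    // Hypothetical minimal model of a decision table rule: a list of
    // condition checks plus a list of actions (names are illustrative only).
    class Rule<R> {
        final List<Predicate<Object>> conditions; // evaluated left to right
        final List<Supplier<R>> actions;          // executed left to right

        Rule(List<Predicate<Object>> conditions, List<Supplier<R>> actions) {
            this.conditions = conditions;
            this.actions = actions;
        }
    }

    class BasicDtEvaluator {
        // Steps 1-4 of the basic algorithm, with no optimization applied.
        static <R> R evaluate(List<Rule<R>> rules, Object target) {
            for (Rule<R> rule : rules) {                         // 1. top to bottom
                boolean allTrue = true;
                for (Predicate<Object> cond : rule.conditions) { // left to right
                    if (!cond.test(target)) {
                        allTrue = false;                         // 2. stop, next rule
                        break;
                    }
                }
                if (allTrue) {
                    for (Supplier<R> action : rule.actions) {
                        R result = action.get();
                        if (result != null) return result;       // 3. non-empty return action
                    }
                }
            }
            return null;                                         // 4. no rules left
        }
    }
    ```

    Any optimization discussed below must produce the same results as this naive loop.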

    The goal of optimization is to decrease the number of condition checks required to determine which rule actions need to be executed. Action optimizations are not considered, even though some improvements could be possible. Sometimes the best results may be achieved by changing the order of conditions (columns) and rules (rows), but such approaches change the algorithm's logic and must be used with caution. We are not going to implement them at this point, but we will give users some guidelines on how to re-arrange rule tables to achieve better performance.

    Out of the class of optimization algorithms that do not change the order of conditions or rules, we are going to consider those that optimize condition checking for one condition (column) at a time. In a decision table with multiple conditions, the algorithm will create a tree of optimized nodes. The time to traverse the tree will equal the sum of the times required to calculate each node.

    The optimization algorithms that deal with single condition can be classified as follows:

  • Condition Sharing algorithm.
    Merge all the rows that share the same condition data into one node, then calculate the condition only once for all rules that share it. The algorithm can be extended to achieve even better results if we allow the pseudo-data keyword else. The advantage of this algorithm is its universal applicability: it does not depend on the nature of the condition expression. The disadvantage is its low performance improvement: the expected average performance is n/s, where n is the total number of conditions and s = n/u, where u is the number of unique conditions. The more unique conditions the column has, the smaller the performance advantage of this method.
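    A minimal sketch of the grouping idea, under the assumption that condition data can be compared for equality; the ConditionSharingNode name and its methods are hypothetical, not the OpenL API.

    ```java
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.function.BiPredicate;

    // Illustrative sketch: group rule rows by identical condition data so
    // the condition is evaluated once per unique value, not once per rule.
    class ConditionSharingNode {
        // Shared condition parameter value -> indices of rules that use it.
        private final Map<Object, List<Integer>> groups = new LinkedHashMap<>();

        void addRule(int ruleIndex, Object conditionParam) {
            groups.computeIfAbsent(conditionParam, k -> new ArrayList<>()).add(ruleIndex);
        }

        // Evaluate the condition once per unique parameter value; every rule
        // in a matching group becomes a candidate for firing.
        List<Integer> candidateRules(Object testedValue, BiPredicate<Object, Object> condition) {
            List<Integer> result = new ArrayList<>();
            for (Map.Entry<Object, List<Integer>> e : groups.entrySet()) {
                if (condition.test(testedValue, e.getKey())) {
                    result.addAll(e.getValue());
                }
            }
            Collections.sort(result); // preserve the original rule order
            return result;
        }
    }
    ```

    With u unique values in a column of n rules, the condition predicate runs u times instead of n, which is exactly the n/s = u ratio described above.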

  • Indexing
    Calculate the condition input value and determine which rules to fire using some kind of index. The performance is determined by the index speed. For example, if the index is implemented as a HashMap (for equality checks), the performance is expected to be constant; if as a TreeMap, log(u), where u is the number of unique conditions. Interestingly, for indexing algorithms, the larger the number of shared or empty conditions, the smaller the performance improvement, which is quite the opposite of the Condition Sharing algorithm.
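    An equality index of this kind might be sketched as follows; the EqualityIndex name and the handling of empty conditions are assumptions for illustration, not the actual optimizer's data structure.

    ```java
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Illustrative equality index: maps each unique condition value to the
    // ordered rule indices that use it, so rule selection costs one HashMap
    // lookup instead of n condition checks.
    class EqualityIndex {
        private final Map<Object, List<Integer>> index = new HashMap<>();
        // Rules with an empty condition cell match any tested value.
        private final List<Integer> emptyConditionRules = new ArrayList<>();

        void addRule(int ruleIndex, Object conditionValue) {
            if (conditionValue == null) {
                emptyConditionRules.add(ruleIndex);
            } else {
                index.computeIfAbsent(conditionValue, k -> new ArrayList<>()).add(ruleIndex);
            }
        }

        List<Integer> rulesFor(Object testedValue) {
            List<Integer> result = new ArrayList<>(index.getOrDefault(testedValue, List.of()));
            result.addAll(emptyConditionRules); // empty conditions always match
            Collections.sort(result);           // keep the original rule order
            return result;
        }
    }
    ```

    Note how every empty-condition rule must be merged back into each lookup result; this is why many shared or empty conditions erode the benefit of indexing.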

    Applicability of the Algorithms

    Both algorithms are based on a single condition (column). Both work only if the rule condition expression does not change its value during the course of DT evaluation; in other words, the Decision Table rules do not change the attributes that participate in condition evaluation. For indexing, an additional requirement applies: the index value must be known at compile time. This excludes conditions with dynamic formulas as cell values.

    If a Decision Table does not conform to these assumptions, you should not use optimizations. It is also recommended that you take another look at your design and ask yourself: was it really necessary to produce such twisted logic?

    Explicit Indexing Optimization

    Generally speaking, it would be nice to have the system automatically apply optimizations when appropriate, wouldn't it? In reality, there are always cases where one does not want optimization to happen for some reason, for example when a condition calls a function with side effects. This would require us to provide some facility to suppress optimization at the column and/or table level, and this might unnecessarily complicate the DT structure.

    Fortunately, we were lucky to come up with an approach that gives the developer explicit control over (indexing) optimization while at the same time reducing the total amount of code in the condition. To understand the approach, one needs to realize that:
    a) indexing is possible only for some very well-defined operations, like equality or range checks,
    and
    b) the index has to be calculated in advance.

    For example, suppose we have the condition

    driver.type == type

    where driver.type is the tested value and type is the rule condition parameter against which the tested value is checked. We could parse the code and figure this out automatically, but we decided to simplify our task by putting a bit more responsibility (and control too) into the developer's hands. If the condition expression is just

    driver.type

    the parser will easily recognize that it does not depend on the rule parameters at all. This can be used as a hint that the condition needs to be optimized. The structure of the parameters will determine the type of index, using this table:

    Expr Type      | N Params | Param Types                    | Condition                                          | Index Performance
    any T x        | 1        | T value                        | x == value                                         | Constant (HashMap)
    any T x        | 1        | T[] ary                        | contains(ary, x)                                   | Constant (HashMap)
    Comparable T x | 2        | T min, T max                   | min <= x && x < max                                | log(n) (TreeMap)
    any T x        | 2        | <in|not in> Enum isIn, T[] ary | isIn == in ? contains(ary, x) : !contains(ary, x)  | Constant (HashMap)
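    The range row of the table (T min, T max with a TreeMap) can be illustrated with a small sketch built on java.util.TreeMap.floorEntry; the RangeIndex class and its methods are hypothetical names for this example, and it assumes non-overlapping ranges in the condition column.

    ```java
    import java.util.Map;
    import java.util.TreeMap;

    // Illustrative range index: intervals [min, max) keyed by their lower
    // bound in a TreeMap, giving a log(n) lookup via floorEntry.
    class RangeIndex {
        private static final class Interval {
            final int min, max, ruleIndex; // half-open interval [min, max)
            Interval(int min, int max, int ruleIndex) {
                this.min = min;
                this.max = max;
                this.ruleIndex = ruleIndex;
            }
        }

        private final TreeMap<Integer, Interval> byMin = new TreeMap<>();

        // Assumes the ranges do not overlap, as in a typical range column.
        void addRule(int ruleIndex, int min, int max) {
            byMin.put(min, new Interval(min, max, ruleIndex));
        }

        // Returns the matching rule index, or -1 if no range contains x.
        int ruleFor(int x) {
            Map.Entry<Integer, Interval> e = byMin.floorEntry(x); // greatest min <= x
            if (e != null && x < e.getValue().max) {
                return e.getValue().ruleIndex;
            }
            return -1;
        }
    }
    ```

    A single floorEntry call replaces the per-rule min <= x && x < max checks, which is where the log(n) figure in the table comes from.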

    The OpenL Tablets Decision Table Optimizer will automatically recognize these conditions and create indexes for them.
    The advantages of the suggested approach are summarized here:

    • there is less code to type, and therefore less to read and fewer places to make typos, etc.
    • there is less to compile and parse, so there is less work for the compiler or optimizer to determine the programmer's intentions, which means better performance and fewer errors
    • the optimization algorithm is easy to turn on or off for any condition, and it is easy to predict what kind of optimization will be used for a particular kind of data, so there will not be any "black box magic". Eventually we are going not only to publish all the optimization algorithms, but also to provide formula-based estimates for their expected performance. This will provide the basis for static compile-time performance analysis.

Author:
sshor