Class DecisionTableOptimizedAlgorithm
- java.lang.Object
  - org.openl.rules.dt.algorithm.DecisionTableOptimizedAlgorithm
- All Implemented Interfaces: IDecisionTableAlgorithm
public class DecisionTableOptimizedAlgorithm extends Object implements IDecisionTableAlgorithm
The basic algorithm for decision table (DT) evaluation is straightforward. Consider a table with conditions and actions as columns and rules as rows (recall that OpenL Tablets allows both this and the transposed orientation):
- For each rule (row), from the top to the bottom of the table, evaluate the conditions from left to right.
- If all conditions are true, execute all the actions in the rule from left to right; if any condition is false, stop evaluating conditions and go to the next rule.
- If the action is a non-empty return action, return the value of the action (this stops the evaluation).
- If no rules are left, return null.
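The steps above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the `Rule` class, the `when`/`then` helpers, and the convention that an empty action yields `null` are hypothetical and are not part of the OpenL Tablets API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical, simplified model of a decision table; not the OpenL Tablets API. */
final class NaiveDecisionTableSketch {

    /** One rule (row): conditions and actions are kept in left-to-right order. */
    static final class Rule<I, R> {
        final List<Predicate<I>> conditions = new ArrayList<>();
        final List<Function<I, R>> actions = new ArrayList<>();

        Rule<I, R> when(Predicate<I> condition) { conditions.add(condition); return this; }
        Rule<I, R> then(Function<I, R> action) { actions.add(action); return this; }
    }

    /** Evaluates rules top to bottom and returns the first non-null action value, or null. */
    static <I, R> R evaluate(List<Rule<I, R>> rules, I input) {
        for (Rule<I, R> rule : rules) {
            if (!allConditionsHold(rule, input)) {
                continue; // a condition was false: go to the next rule
            }
            for (Function<I, R> action : rule.actions) {
                R value = action.apply(input);
                if (value != null) {
                    return value; // a non-empty return action stops the evaluation
                }
            }
        }
        return null; // no rules left
    }

    /** Checks conditions left to right and stops at the first false one. */
    static <I, R> boolean allConditionsHold(Rule<I, R> rule, I input) {
        for (Predicate<I> condition : rule.conditions) {
            if (!condition.test(input)) {
                return false;
            }
        }
        return true;
    }

    /** Example table with two rules that classify a driver by age. */
    static String classify(int age) {
        List<Rule<Integer, String>> rules = new ArrayList<>();
        rules.add(new Rule<Integer, String>().when(a -> a < 25).then(a -> "young"));
        rules.add(new Rule<Integer, String>().when(a -> a >= 25).when(a -> a < 65).then(a -> "standard"));
        return evaluate(rules, age);
    }
}
```

In this sketch every call walks the rules from the top, so the number of condition checks grows with the number of rules; that is the cost the optimizations discussed here try to reduce.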
The goal of the optimizations is to decrease the number of condition checks required to determine which rule actions need to be executed. Action optimizations are not considered, even though some improvements could be possible. Sometimes the best results may be achieved by changing the order of conditions (columns) and rules (rows), but these approaches change the algorithm's logic and must be used with caution. We are not going to implement these approaches at this point, but will give some guidelines to users on how to rearrange rule tables to achieve better performance.
Out of the class of optimization algorithms that do not change the order of conditions or rules, we are going to consider the ones that optimize condition checking in one condition (column) at a time. In a decision table with multiple conditions, the algorithm will create a tree of optimized nodes. The time needed to traverse the tree equals the sum of the times required to calculate each node.
The optimization algorithms that deal with a single condition can be classified as follows:
- Condition Sharing algorithm.
Merge all the rows that share the same condition data into one node. Then calculate the condition only once for all rules that share the same condition. The algorithm can be extended to achieve even better results if we allow the pseudo-data keyword else. The advantage of this algorithm is its universal applicability: it does not depend on the nature of the condition expression. The disadvantage is its low performance improvement: its expected average performance is n/s, where n is the total number of conditions and s = n/u, where u is the number of unique conditions. We see that the more unique conditions the column has, the smaller the performance advantage of this method.
- Indexing.
Calculate the condition input value and determine which rules to fire using some kind of index. The performance is determined by the index speed. For example, if the index is implemented as a HashMap (for equality checks), the performance is expected to be constant; if it is implemented as a TreeMap, the performance is expected to be log(u), where u is the number of unique conditions. Interestingly, for indexing algorithms, the greater the number of shared or empty conditions, the smaller the performance improvement, which is quite the opposite of the Condition Sharing algorithm.
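The Condition Sharing idea can be sketched as follows. This is a minimal illustration, not the OpenL Tablets implementation: the class name, the `mergeRows` and `evaluateColumn` helpers, and the sample column are all hypothetical. Rows with equal condition data are merged into one node ahead of time, and the condition is then evaluated once per node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

/** Hypothetical sketch of condition sharing for one column; not the OpenL Tablets implementation. */
final class ConditionSharingSketch {

    /** Preparation step: merge rows with equal condition data into one node (data -> rule indexes). */
    static <P> Map<P, List<Integer>> mergeRows(List<P> ruleParams) {
        Map<P, List<Integer>> nodes = new LinkedHashMap<>();
        for (int rule = 0; rule < ruleParams.size(); rule++) {
            nodes.computeIfAbsent(ruleParams.get(rule), p -> new ArrayList<>()).add(rule);
        }
        return nodes;
    }

    /** Evaluation step: the condition runs once per node and the result is copied to each rule in it. */
    static <X, P> boolean[] evaluateColumn(X tested, int ruleCount,
                                           Map<P, List<Integer>> nodes,
                                           BiPredicate<X, P> condition) {
        boolean[] holds = new boolean[ruleCount];
        for (Map.Entry<P, List<Integer>> node : nodes.entrySet()) {
            boolean value = condition.test(tested, node.getKey());
            for (int rule : node.getValue()) {
                holds[rule] = value;
            }
        }
        return holds;
    }

    /** Demo: counts how often the condition is actually evaluated for a six-rule column. */
    static int countEvaluations(String tested) {
        List<String> column = Arrays.asList("car", "truck", "car", "truck", "bus", "car");
        int[] calls = {0};
        evaluateColumn(tested, column.size(), mergeRows(column), (x, p) -> {
            calls[0]++;
            return x.equals(p);
        });
        return calls[0];
    }
}
```

With six rules and three unique values in the sample column, the condition is evaluated three times instead of six, which matches the observation that the gain shrinks as the number of unique conditions grows.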
Applicability of the Algorithms
Both algorithms are single-condition (column) based. Both work only if the rule condition expression does not change its value during the course of DT evaluation. In other words, the Decision Table rules must not change attributes that participate in condition evaluation. Indexing has an additional requirement: the index value must be known at compile time. This excludes conditions with dynamic formulas as cell values.
If a Decision Table does not conform to these assumptions, you should not use optimizations. It is also recommended that you take another look at your design and ask yourself: was it really necessary to produce such twisted logic?
Explicit Indexing Optimization
Generally speaking, it would be nice to have the system apply optimizations automatically when appropriate, wouldn't it? In reality, there are always cases where one does not want optimization to happen for some reason, for example when a condition calls a function with side effects. This would require us to provide some facility to suppress optimization at the column and/or table level, and this might unnecessarily complicate the DT structure.
Fortunately, we came up with an approach that gives the developer explicit control over (indexing) optimization while at the same time reducing the total amount of code in the condition. To understand the approach, one needs to realize that
a) indexing is possible only for some very well-defined operations, like equality or range checks
and
b) the index has to be calculated in advance.
For example, we have the condition
driver.type == type
where driver.type is the tested value and type is the rule condition parameter against which the tested value is checked. We could parse the code and figure it out automatically, but we decided to simplify our task by putting a bit more responsibility (and control too) into the developer's hands. If the condition expression is just driver.type, the parser will easily recognize that it does not depend on rule parameters at all. This can be used as a hint that the condition needs to be optimized. The structure of the parameters determines the type of index, according to this table:

Expr Type        N Params  Param Type                       Condition                                           Index Performance
any T x          1         T value                          x == value                                          Constant (HashMap) performance
any T x          1         T[] ary                          contains(ary, x)                                    Constant (HashMap) performance
Comparable T x   2         T min, T max                     min <= x && x < max                                 log(n) (TreeMap) performance
any T x          2         <in|not in> Enum isIn, T[] ary   isIn == in ? contains(ary, x) : !contains(ary, x)   Constant (HashMap) performance

The OpenL Tablets Decision Table Optimizer will automatically recognize these conditions and create indexes for them.
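As a rough illustration of the first and third rows of the table, the sketch below builds a HashMap index for an equality condition and a TreeMap index for a half-open range condition. All names here are hypothetical and the real optimizer may be structured quite differently; empty conditions and the array and in/not-in variants are left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the two index kinds; names and structure are assumptions. */
final class ConditionIndexSketch {

    /** Equality index (x == value): condition value -> rule indexes in table order. */
    static <T> Map<T, List<Integer>> buildEqualityIndex(List<T> values) {
        Map<T, List<Integer>> index = new HashMap<>();
        for (int rule = 0; rule < values.size(); rule++) {
            index.computeIfAbsent(values.get(rule), v -> new ArrayList<>()).add(rule);
        }
        return index;
    }

    static <T> List<Integer> lookupEqual(Map<T, List<Integer>> index, T x) {
        return index.getOrDefault(x, Collections.<Integer>emptyList());
    }

    /** Range index (min <= x && x < max): each boundary starts an interval listing its matching rules. */
    static <T extends Comparable<T>> TreeMap<T, List<Integer>> buildRangeIndex(List<T> mins, List<T> maxs) {
        TreeMap<T, List<Integer>> index = new TreeMap<>();
        for (T boundary : mins) index.putIfAbsent(boundary, new ArrayList<>());
        for (T boundary : maxs) index.putIfAbsent(boundary, new ArrayList<>());
        for (int rule = 0; rule < mins.size(); rule++) {
            // Register the rule on every elementary interval inside [min, max); assumes min <= max.
            for (List<Integer> rules : index.subMap(mins.get(rule), true, maxs.get(rule), false).values()) {
                rules.add(rule);
            }
        }
        return index;
    }

    static <T extends Comparable<T>> List<Integer> lookupRange(TreeMap<T, List<Integer>> index, T x) {
        Map.Entry<T, List<Integer>> interval = index.floorEntry(x); // logarithmic lookup
        return interval == null ? Collections.<Integer>emptyList() : interval.getValue();
    }

    /** Demo column "driver.type" for three rules; a real index would be built once, not per call. */
    static List<Integer> rulesForType(String type) {
        return lookupEqual(buildEqualityIndex(Arrays.asList("car", "truck", "car")), type);
    }

    /** Demo ranges [0,18), [18,65), [16,25) for rules 0, 1, 2. */
    static List<Integer> rulesForAge(int age) {
        List<Integer> mins = Arrays.asList(0, 18, 16);
        List<Integer> maxs = Arrays.asList(18, 65, 25);
        return lookupRange(buildRangeIndex(mins, maxs), age);
    }
}
```

Both lookups return rule indexes in table order, so the result can be used as the initial rule set without re-sorting.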
The advantages of the suggested approach are summarized here:
- There is less code to type and therefore less to read, fewer places to make typos, etc.
- There is less to compile and parse, and less work for the compiler or optimizer to determine the programmer's intentions, which means better performance and fewer errors.
- The optimization algorithm is easy to turn on or off for any condition, and it is easy to predict what kind of optimization will be used for a particular kind of data; there is no "black box magic". Eventually we are going not only to publish all the optimization algorithms, but also to provide formula-based estimates for their expected performance. This will provide the basis for static compile-time performance analysis.
- Author: sshor
Method Summary
- IIntIterator checkedRules(Object target, Object[] params, IRuntimeEnv env)
  This method produces the iterator over the set of rules in DT.
- void cleanParamValuesForIndexedConditions()
  Clears condition's param values.
- static IConditionEvaluator makeEvaluator(ICondition condition, IOpenClass conditionMethodType, IBindingContext bindingContext)
Method Detail
-
makeEvaluator
public static IConditionEvaluator makeEvaluator(ICondition condition, IOpenClass conditionMethodType, IBindingContext bindingContext)
-
cleanParamValuesForIndexedConditions
public void cleanParamValuesForIndexedConditions()
Clears the condition's param values. Memory optimization: clear the condition values, because these values will be used in the index (only if the condition is not used).
- Specified by:
cleanParamValuesForIndexedConditions in interface IDecisionTableAlgorithm
-
checkedRules
public IIntIterator checkedRules(Object target, Object[] params, IRuntimeEnv env)
This method produces the iterator over the set of rules in DT. It has to retain the order of the rules. An optimized algorithm has 2 distinct steps:
1) Create the initial discriminated rule set using indexing on the initial conditions.
2) Iterate over the initial set using the remaining conditions as selectors; a non-optimized algorithm uses the whole rule set as the initial set.
Performance. From the algorithm definition it is clear that step 1 is performed at constant or near-constant speed with regard to the number of rules. The performance of step 2 depends largely on the size of the resulting rule set. The order of the initial indexed conditions does not seem to affect performance much (//TODO this statement needs verification)
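The two steps can be illustrated with a small, self-contained sketch. It does not use the real OpenL Tablets types: `java.util.PrimitiveIterator.OfInt` stands in for `IIntIterator`, and the two-column table (an indexed equality condition on a vehicle type and a non-indexed `age >= minAge` condition) is a made-up example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

/** Hypothetical two-step rule selection; PrimitiveIterator.OfInt stands in for IIntIterator. */
final class CheckedRulesSketch {

    // Made-up table: column 1 is "type == TYPE[rule]", column 2 is "age >= MIN_AGE[rule]".
    static final String[] TYPE = {"car", "truck", "car", "car"};
    static final int[] MIN_AGE = {18, 21, 25, 0};

    /** Built once, ahead of evaluation: type -> rule indexes in table order. */
    static final Map<String, List<Integer>> TYPE_INDEX = buildIndex();

    static Map<String, List<Integer>> buildIndex() {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int rule = 0; rule < TYPE.length; rule++) {
            index.computeIfAbsent(TYPE[rule], t -> new ArrayList<>()).add(rule);
        }
        return index;
    }

    /** Step 1: index lookup gives the initial set. Step 2: remaining conditions act as selectors. */
    static PrimitiveIterator.OfInt checkedRules(String type, int age) {
        List<Integer> initial = TYPE_INDEX.getOrDefault(type, Collections.<Integer>emptyList());
        return initial.stream()
                .mapToInt(Integer::intValue)
                .filter(rule -> age >= MIN_AGE[rule])
                .iterator();
    }

    /** Non-optimized variant: the whole rule set is the initial set and every condition is a selector. */
    static PrimitiveIterator.OfInt checkedRulesUnoptimized(String type, int age) {
        return IntStream.range(0, TYPE.length)
                .filter(rule -> TYPE[rule].equals(type) && age >= MIN_AGE[rule])
                .iterator();
    }

    /** Helper that collects an iterator into a list so results are easy to compare. */
    static List<Integer> drain(PrimitiveIterator.OfInt rules) {
        List<Integer> out = new ArrayList<>();
        while (rules.hasNext()) {
            out.add(rules.nextInt());
        }
        return out;
    }
}
```

Both variants yield the same rule indexes in table order; the indexed one simply starts from a smaller initial set, which is why the cost of step 2 depends on how many rules survive step 1.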
- Specified by:
checkedRules in interface IDecisionTableAlgorithm
- Returns:
- an integer iterator over rule indexes.
-
-