public interface IndexMappingConverter
Converts bins encoded using an IndexMapping to bins
encoded using another one.

| Modifier and Type | Method and Description |
|---|---|
| void | convertAscendingIterator(java.util.Iterator<Bin> inBins, BinAcceptor outBins): Converts bins. |
| static IndexMappingConverter | distributingUniformly(IndexMapping inMapping, IndexMapping outMapping): Returns a converter that uniformly distributes the count of a bin to the overlapping bins of the new mapping based on the shares of the initial bin that the new bins cover. |

void convertAscendingIterator(java.util.Iterator<Bin> inBins, BinAcceptor outBins)
Parameters:
inBins - an ascending iterator, that is, an iterator that returns bins whose indexes are sorted in ascending order
outBins - a consumer that is fed the converted bins
Throws:
java.lang.IllegalArgumentException - if the provided iterator is not ascending

static IndexMappingConverter distributingUniformly(IndexMapping inMapping, IndexMapping outMapping)
This conversion method is not the one that minimizes the relative accuracy of the quantiles
that are computed from the resulting bins. For instance, transferring the full count of a bin
of the initial mapping to the single bin of the new mapping that overlaps IndexMapping.value(int) of the initial mapping allows computing more accurate quantiles.
However, this method produces better-looking histograms and avoids conversion artifacts that
would cause empty bins or bins with counts that are excessively high relative to their
neighbors'.
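To illustrate the idea of distributing a count by shares, here is a minimal, self-contained sketch. It is not the library's implementation: the class and method names are hypothetical, bins are modeled as plain value intervals rather than IndexMapping indexes, and the share of the initial bin covered by each new bin is assumed to be measured linearly in value space.

```java
final class UniformDistributionSketch {

    /**
     * Splits the count of one input bin [inLower, inUpper) across output bins.
     * Output bin j covers [outBounds[j], outBounds[j + 1]); outBounds must be ascending.
     * Each output bin receives the fraction of the input bin's width that it overlaps
     * (assumption of this sketch: the share is linear in value space).
     */
    static double[] distribute(double inLower, double inUpper, double count, double[] outBounds) {
        double[] outCounts = new double[outBounds.length - 1];
        double inWidth = inUpper - inLower;
        for (int j = 0; j < outCounts.length; j++) {
            double overlap =
                Math.min(inUpper, outBounds[j + 1]) - Math.max(inLower, outBounds[j]);
            if (overlap > 0) {
                outCounts[j] = count * overlap / inWidth;
            }
        }
        return outCounts;
    }

    public static void main(String[] args) {
        // An input bin covering [1, 3) with count 10, spread over three output bins.
        double[] counts = distribute(1.0, 3.0, 10.0, new double[] {0.0, 1.5, 2.5, 4.0});
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```

In this sketch every output bin that overlaps the input bin receives a non-zero count, which is why this style of conversion does not leave empty bins between populated ones.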
If \(\alpha_i\) is the relative accuracy of the initial mapping inMapping,
\(\alpha_o\) the relative accuracy of the new mapping outMapping, and assuming that the
initial bins are not themselves resulting from a conversion (in which case \(\alpha_i\) needs
to be adjusted to be the effective relative accuracy of the initial bins), the effective
relative accuracy of the quantiles that are computed from the bins that result from this
conversion method is upper-bounded by \(\alpha =
\frac{(1+\alpha_i)(1+\alpha_o)}{1-\alpha_i}-1\). If relative accuracies are small, this is
approximately \(\alpha \approx 2\alpha_i+\alpha_o\).
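The bound can be evaluated directly from the formula above. The following is a small sketch with hypothetical class and method names (they are not part of the documented API); the range check on the inputs is an assumption of the sketch.

```java
final class ConversionAccuracyBound {

    /** Upper bound (1 + alphaIn)(1 + alphaOut) / (1 - alphaIn) - 1 from the formula above. */
    static double effectiveRelativeAccuracy(double alphaIn, double alphaOut) {
        // Assumption of this sketch: relative accuracies lie in [0, 1).
        if (!(alphaIn >= 0 && alphaIn < 1) || !(alphaOut >= 0 && alphaOut < 1)) {
            throw new IllegalArgumentException("relative accuracies must be in [0, 1)");
        }
        return (1 + alphaIn) * (1 + alphaOut) / (1 - alphaIn) - 1;
    }

    /** First-order approximation 2 * alphaIn + alphaOut, valid for small accuracies. */
    static double firstOrderApproximation(double alphaIn, double alphaOut) {
        return 2 * alphaIn + alphaOut;
    }

    public static void main(String[] args) {
        double alphaIn = 0.02;  // example value, chosen for illustration
        double alphaOut = 0.01; // example value, chosen for illustration
        System.out.println("exact bound:   " + effectiveRelativeAccuracy(alphaIn, alphaOut));
        System.out.println("approximation: " + firstOrderApproximation(alphaIn, alphaOut));
    }
}
```

The exact bound is never smaller than the approximation, since their difference, once multiplied by \(1-\alpha_i\), equals \(2\alpha_i(\alpha_i+\alpha_o) \geq 0\).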
That is because this conversion method causes an input data point to be spread over the full width of a bin of the initial mapping, hence a multiplicative shift of up to \(\gamma_i = \frac{1+\alpha_i}{1-\alpha_i}\) to the right, and down to \(\frac{1}{\gamma_i}\) to the left. In addition, because of the relative error induced by the new mapping, transferring counts to the new mapping will cause an additional multiplicative shift of up to \(1+\alpha_o\) to the right, and down to \(1-\alpha_o\) to the left. Therefore, the resulting relative error is up to \(\alpha = \gamma_i(1+\alpha_o)-1\) to the right and up to \(\alpha' = 1-\frac{1-\alpha_o}{\gamma_i}\) to the left. Because \(\alpha-\alpha' = \frac{1}{\gamma_i}((\gamma_i^2-1)\alpha_o+(\gamma_i-1)^2) \geq 0\) (given that \(\gamma_i \geq 1\) and \(\alpha_o \geq 0\)), the resulting relative error is upper-bounded by \(\alpha = \gamma_i(1+\alpha_o)-1 = \frac{(1+\alpha_i)(1+\alpha_o)}{1-\alpha_i}-1\).
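As a worked illustration of the derivation above, take \(\alpha_i = \alpha_o = 0.01\) (values chosen here for the example only):

\[
\gamma_i = \frac{1.01}{0.99} \approx 1.0202, \qquad
\alpha = \gamma_i \times 1.01 - 1 \approx 0.0304, \qquad
\alpha' = 1 - \frac{0.99}{\gamma_i} \approx 0.0296
\]

The error to the right is the larger of the two, and it is close to the first-order estimate \(2\alpha_i+\alpha_o = 0.03\).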
In other words, this conversion method causes a single point to be spread over the full width of a bin of the initial mapping, inducing a relative error up to approximately \(2\alpha_i\). In addition, the allocation of counts to the bins of the new mapping causes a relative error that is up to approximately \(\alpha_o\). Informally, here is what can happen in the worst case:
    single input value:                                     x
    initial mapping:                      -|-------o-------|-------o-------|-------o-------|
    max (q_1) after bin encoding (1):                              x
    count spreading over full bin (2):                     [---------------]
    new mapping:                          |---o---|---o---|---o---|---o---|---o---|---o---|-
    non-empty bins after conversion (3):                      o       o       o
    max after conversion:                                                     x

The resulting value at quantile \(1\) (i.e., the maximum value) is shifted by \(\alpha_i\) because of (1), an additional \(\alpha_i\) because of (2) and \(\alpha_o\) because of (3).