Class TransformationStrategies
- java.lang.Object
  - it.unimi.dsi.bits.TransformationStrategies
public class TransformationStrategies extends Object
A class providing static methods and objects that do useful things with transformation strategies. This class provides several transformation strategies that turn strings or other objects into bit vectors. The transformations might optionally be:
- Lexicographical: for objects based on bytes or characters, such as strings and byte arrays, this means that the first bit of the bit vector is the most significant bit of the first byte or character, and so on. In other words, the lexicographical order between bit vectors reflects the lexicographical byte-by-byte, char-by-char, etc. order. This property is necessary for some kinds of static structures that depend on it, but it has some computational cost: after packing bytes or chars into a long, we need to reverse the bit order of each piece.
- Prefix-free: no two bit vectors returned by the transformation on two different objects are comparable in prefix order, that is, neither is a prefix of the other. Again, this might require additional space, either linear (e.g., prefixFree()) or constant (e.g., prefixFreeIso()).
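The following sketch makes the lexicographical property concrete. It reads bit i of a byte array in two ways: most significant bit first within each byte, as described above, and least significant bit first, as a packing that skips the bit reversal would. It illustrates the idea under stated assumptions and is not the library's code. The class and method names are hypothetical. Only the first reading agrees with unsigned byte-by-byte order.

```java
import java.util.Arrays;

// Illustrative sketch, not the dsiutils implementation; all names are hypothetical.
final class LexBitOrderSketch {
    /** Bit i, reading each byte most significant bit first (lexicographical reading). */
    static boolean lexBit(byte[] a, long i) {
        return ((a[(int) (i >>> 3)] >>> (7 - (i & 7))) & 1) != 0;
    }

    /** Bit i, reading each byte least significant bit first (a raw reading). */
    static boolean rawBit(byte[] a, long i) {
        return ((a[(int) (i >>> 3)] >>> (i & 7)) & 1) != 0;
    }

    /** Compares the two bit sequences lexicographically (0 < 1, a proper prefix comes first). */
    static int compare(byte[] a, byte[] b, boolean lexicographical) {
        long n = Math.min(a.length, b.length) * 8L;
        for (long i = 0; i < n; i++) {
            boolean x = lexicographical ? lexBit(a, i) : rawBit(a, i);
            boolean y = lexicographical ? lexBit(b, i) : rawBit(b, i);
            if (x != y) return x ? 1 : -1;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] a = { 0x01 }; // 00000001
        byte[] b = { 0x02 }; // 00000010
        System.out.println(Arrays.compareUnsigned(a, b) < 0); // true: a < b as unsigned bytes
        System.out.println(compare(a, b, true));              // -1: MSB-first reading agrees
        System.out.println(compare(a, b, false));             // 1: LSB-first reading disagrees
    }
}
```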
As a general rule, transformations without additional naming are lexicographical. Transformations that generate prefix-free bit vectors are marked as such. Plain transformations that do not provide any guarantee are called raw; they should be used only when performance is the main concern and the two properties above are not relevant.
- See Also:
TransformationStrategy
-
Constructor Summary
- TransformationStrategies()
-
Method Summary
- static TransformationStrategy<byte[]> byteArray(): A lexicographical transformation from byte arrays to bit vectors.
- static TransformationStrategy<Long> fixedLong(): A transformation from longs to bit vectors that returns a fixed-size Long.SIZE-bit vector.
- static <T extends BitVector> TransformationStrategy<T> identity(): A trivial transformation for data already in BitVector form.
- static <T extends CharSequence> TransformationStrategy<T> iso(): A trivial transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation.
- static <T extends BitVector> TransformationStrategy<T> prefixFree(): A transformation from bit vectors to bit vectors that guarantees that its results are prefix-free.
- static <T extends CharSequence> TransformationStrategy<T> prefixFreeIso(): A trivial transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation and completes the representation with an ASCII NUL to guarantee lexicographical ordering and prefix-freeness.
- static <T extends CharSequence> TransformationStrategy<T> prefixFreeUtf16(): A trivial transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation and completes the representation with a NUL to guarantee lexicographical ordering and prefix-freeness.
- static <T extends CharSequence> TransformationStrategy<T> prefixFreeUtf32(): A transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs, concatenates the bits of the UTF-32 representation and completes the representation with a NUL to guarantee lexicographical ordering and prefix-freeness.
- static TransformationStrategy<byte[]> rawByteArray(): A trivial, high-performance, raw transformation from byte arrays to bit vectors that simply concatenates the bytes of the array.
- static TransformationStrategy<Long> rawFixedLong(): A trivial, high-performance, raw transformation from longs to bit vectors that returns a fixed-size Long.SIZE-bit vector.
- static <T extends CharSequence> TransformationStrategy<T> rawIso(): A trivial, high-performance, raw transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation.
- static <T extends CharSequence> TransformationStrategy<T> rawUtf16(): A trivial, high-performance, raw transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation.
- static <T extends CharSequence> TransformationStrategy<T> rawUtf32(): A trivial raw transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs and concatenates the bits of the UTF-32 representation.
- static <T extends CharSequence> TransformationStrategy<T> utf16(): A trivial transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation.
- static <T extends CharSequence> TransformationStrategy<T> utf32(): A transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs and concatenates the bits of the UTF-32 representation.
- static <T> Iterable<BitVector> wrap(Iterable<T> iterable, TransformationStrategy<? super T> transformationStrategy): Wraps a given iterable, returning an iterable that contains bit vectors.
- static <T> Iterator<BitVector> wrap(Iterator<T> iterator, TransformationStrategy<? super T> transformationStrategy): Wraps a given iterator, returning an iterator that emits bit vectors.
- static <T> List<BitVector> wrap(List<T> list, TransformationStrategy<? super T> transformationStrategy): Wraps a given list, returning a list that contains bit vectors.
-
Method Detail
-
identity
public static <T extends BitVector> TransformationStrategy<T> identity()
A trivial transformation for data already in BitVector form.
-
rawUtf32
public static <T extends CharSequence> TransformationStrategy<T> rawUtf32()
A trivial raw transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs and concatenates the bits of the UTF-32 representation.
Warning: this transformation is not lexicographic.
-
utf32
public static <T extends CharSequence> TransformationStrategy<T> utf32()
A transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs and concatenates the bits of the UTF-32 representation.
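Decoding surrogate pairs matters for ordering. UTF-16 code-unit order and code-point order disagree when a supplementary character, stored as a surrogate pair, is compared with a BMP character above U+DFFF. The sketch below models both orders. It uses plain boolean arrays in place of BitVector, an assumption made so it runs without the library, and the names are hypothetical.

```java
// Illustrative sketch, not the dsiutils implementation; names are hypothetical
// and boolean[] stands in for a bit vector.
final class Utf32OrderSketch {
    /** Concatenates the 16 bits of each UTF-16 char, most significant bit first. */
    static boolean[] utf16Bits(CharSequence s) {
        boolean[] bits = new boolean[s.length() * 16];
        for (int i = 0; i < s.length(); i++)
            for (int j = 0; j < 16; j++)
                bits[i * 16 + j] = ((s.charAt(i) >>> (15 - j)) & 1) != 0;
        return bits;
    }

    /** Decodes surrogate pairs, then concatenates 32 bits per code point, most significant bit first. */
    static boolean[] utf32Bits(CharSequence s) {
        int[] codePoints = s.codePoints().toArray();
        boolean[] bits = new boolean[codePoints.length * 32];
        for (int i = 0; i < codePoints.length; i++)
            for (int j = 0; j < 32; j++)
                bits[i * 32 + j] = ((codePoints[i] >>> (31 - j)) & 1) != 0;
        return bits;
    }

    /** Lexicographical comparison of bit sequences (0 < 1, a proper prefix comes first). */
    static int compare(boolean[] x, boolean[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++)
            if (x[i] != y[i]) return x[i] ? 1 : -1;
        return Integer.compare(x.length, y.length);
    }

    public static void main(String[] args) {
        String halfwidth = "\uFF61";   // U+FF61, a single UTF-16 char
        String emoji = "\uD83D\uDE00"; // U+1F600, a surrogate pair
        // Code-unit order: 0xFF61 > 0xD83D, so the BMP character sorts last.
        System.out.println(compare(utf16Bits(halfwidth), utf16Bits(emoji))); // 1
        // Code-point order: U+FF61 < U+1F600.
        System.out.println(compare(utf32Bits(halfwidth), utf32Bits(emoji))); // -1
    }
}
```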
-
prefixFreeUtf32
public static <T extends CharSequence> TransformationStrategy<T> prefixFreeUtf32()
A transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs, concatenates the bits of the UTF-32 representation and completes the representation with a NUL to guarantee lexicographical ordering and prefix-freeness.
Note that strings provided to this strategy must not contain NULs.
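Here is why a terminating NUL yields prefix-freeness. Every vector ends with 32 zero bits, and no code point inside the string is zero. So when a shorter string is extended by a longer one, the shorter string's terminator lines up with a nonzero code point of the longer one, and the two vectors differ there. The sketch below checks this for "ab" and "abc". The names are hypothetical, and boolean[] stands in for a bit vector.

```java
// Illustrative sketch, not the dsiutils implementation; names are hypothetical
// and boolean[] stands in for a bit vector.
final class NulTerminatedUtf32Sketch {
    /** 32 bits per code point, most significant bit first, then 32 zero bits for the NUL terminator. */
    static boolean[] prefixFreeUtf32Bits(CharSequence s) {
        if (s.chars().anyMatch(c -> c == 0))
            throw new IllegalArgumentException("input must not contain NULs");
        int[] codePoints = s.codePoints().toArray();
        boolean[] bits = new boolean[(codePoints.length + 1) * 32]; // last 32 entries stay false
        for (int i = 0; i < codePoints.length; i++)
            for (int j = 0; j < 32; j++)
                bits[i * 32 + j] = ((codePoints[i] >>> (31 - j)) & 1) != 0;
        return bits;
    }

    /** True if p is a prefix of x. */
    static boolean isPrefix(boolean[] p, boolean[] x) {
        if (p.length > x.length) return false;
        for (int i = 0; i < p.length; i++)
            if (p[i] != x[i]) return false;
        return true;
    }

    /** Lexicographical comparison (0 < 1, a proper prefix comes first). */
    static int compare(boolean[] x, boolean[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++)
            if (x[i] != y[i]) return x[i] ? 1 : -1;
        return Integer.compare(x.length, y.length);
    }

    public static void main(String[] args) {
        boolean[] ab = prefixFreeUtf32Bits("ab"), abc = prefixFreeUtf32Bits("abc");
        System.out.println(isPrefix(ab, abc));    // false: the terminator differs from 'c'
        System.out.println(compare(ab, abc) < 0); // true: "ab" still sorts before "abc"
    }
}
```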
-
rawUtf16
public static <T extends CharSequence> TransformationStrategy<T> rawUtf16()
A trivial, high-performance, raw transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation.
Warning: this transformation is not lexicographic.
Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
-
utf16
public static <T extends CharSequence> TransformationStrategy<T> utf16()
A trivial transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation.
Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
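The adaptor warning can be illustrated with a minimal lazy view that reads the underlying CharSequence on every access instead of copying it. The class below is hypothetical: its getBoolean and length names are chosen for illustration, and it only models the behavior described in the warning. Mutating a StringBuilder behind the view changes what the view returns.

```java
// Illustrative sketch, not the dsiutils implementation; Utf16BitView is hypothetical.
final class Utf16ViewSketch {
    /** A lazy bit view: reads the underlying sequence on every access and copies nothing. */
    static final class Utf16BitView {
        private final CharSequence s;

        Utf16BitView(CharSequence s) { this.s = s; }

        long length() { return 16L * s.length(); }

        /** Bit index of the concatenated UTF-16 chars, most significant bit of each char first. */
        boolean getBoolean(long index) {
            char c = s.charAt((int) (index >>> 4));
            return ((c >>> (15 - (index & 15))) & 1) != 0;
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("A"); // 'A' = 0x0041
        Utf16BitView v = new Utf16BitView(sb);
        System.out.println(v.getBoolean(15));     // true: lowest bit of 0x0041
        sb.setCharAt(0, 'B');                      // 'B' = 0x0042, changed behind the view
        System.out.println(v.getBoolean(15));     // false: the view reflects the new content
        sb.append('C');
        System.out.println(v.length());           // 32: the length changed as well
    }
}
```

If the input may change, copying it first (for instance with toString()) avoids the problem at the cost of an allocation.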
-
prefixFreeUtf16
public static <T extends CharSequence> TransformationStrategy<T> prefixFreeUtf16()
A trivial transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation and completes the representation with a NUL to guarantee lexicographical ordering and prefix-freeness.
Note that strings provided to this strategy must not contain NULs.
Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
-
rawIso
public static <T extends CharSequence> TransformationStrategy<T> rawIso()
A trivial, high-performance, raw transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation.
Warning: this transformation is not lexicographic.
Note that this transformation is sensible only for strings that are known to contain only characters in the ISO-8859-1 charset.
Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
-
iso
public static <T extends CharSequence> TransformationStrategy<T> iso()
A trivial transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation.
Note that this transformation is sensible only for strings that are known to contain only characters in the ISO-8859-1 charset.
Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
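A quick way to see why the ISO strategies are sensible only for ISO-8859-1 text: keeping just the lower eight bits maps distinct characters to the same byte. The sketch below models the truncation with a plain byte array, since bit order is irrelevant to the collision. The guard isIso88591 is a hypothetical helper one might call before choosing these strategies.

```java
import java.util.Arrays;

// Illustrative sketch, not the dsiutils implementation; names are hypothetical.
final class IsoTruncationSketch {
    /** Keeps the lower eight bits of each UTF-16 char (a narrowing cast to byte). */
    static byte[] lowerEightBits(CharSequence s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) out[i] = (byte) s.charAt(i);
        return out;
    }

    /** True if every char fits in ISO-8859-1, i.e., is at most U+00FF. */
    static boolean isIso88591(CharSequence s) {
        return s.chars().allMatch(c -> c <= 0xFF);
    }

    public static void main(String[] args) {
        String ascii = "A";         // U+0041
        String latinExt = "\u0141"; // U+0141, outside ISO-8859-1; its low byte is 0x41
        System.out.println(Arrays.equals(lowerEightBits(ascii), lowerEightBits(latinExt))); // true: collision
        System.out.println(isIso88591(latinExt)); // false
    }
}
```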
-
prefixFreeIso
public static <T extends CharSequence> TransformationStrategy<T> prefixFreeIso()
A trivial transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation and completes the representation with an ASCII NUL to guarantee lexicographical ordering and prefix-freeness.
Note that this transformation is sensible only for strings that are known to contain only characters in the ISO-8859-1 charset, and that strings provided to this strategy must not contain ASCII NULs.
Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
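The restriction on NULs is what keeps the terminator meaningful. The sketch below uses hypothetical names and works with bytes rather than a bit vector, which is enough because byte-level prefixes are also bit-level prefixes. It shows that a string with an embedded NUL encodes to a vector that has the encoding of a shorter string as a prefix.

```java
import java.util.Arrays;

// Illustrative sketch, not the dsiutils implementation; names are hypothetical.
final class EmbeddedNulSketch {
    /** Lower eight bits of each char, followed by a terminating NUL byte. */
    static byte[] nulTerminatedIso(CharSequence s) {
        byte[] out = new byte[s.length() + 1]; // the last byte stays 0 (NUL)
        for (int i = 0; i < s.length(); i++) out[i] = (byte) s.charAt(i);
        return out;
    }

    /** True if p is a prefix of x. */
    static boolean isPrefix(byte[] p, byte[] x) {
        return p.length <= x.length && Arrays.equals(p, 0, p.length, x, 0, p.length);
    }

    public static void main(String[] args) {
        // Without embedded NULs, the terminator prevents prefixes.
        System.out.println(isPrefix(nulTerminatedIso("a"), nulTerminatedIso("ab")));        // false
        // With an embedded NUL, the encoding of "a" is a prefix of that of "a\0b".
        System.out.println(isPrefix(nulTerminatedIso("a"), nulTerminatedIso("a\u0000b")));  // true
    }
}
```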
-
rawByteArray
public static TransformationStrategy<byte[]> rawByteArray()
A trivial, high-performance, raw transformation from byte arrays to bit vectors that simply concatenates the bytes of the array.
Warning: this transformation is not lexicographic.
Warning: bit vectors returned by this strategy are adaptors around the original array. If the array changes while the bit vector is being accessed, the results will be unpredictable.
- See Also:
TransformationStrategies
-
byteArray
public static TransformationStrategy<byte[]> byteArray()
A lexicographical transformation from byte arrays to bit vectors.
Warning: bit vectors returned by this strategy are adaptors around the original array. If the array changes while the bit vector is being accessed, the results will be unpredictable.
- See Also:
TransformationStrategies
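The introduction notes that lexicographical transformations must reverse the bit order of each byte after packing bytes into a long. The sketch below models why. It assumes, for illustration, that bit i of a vector is stored at bit i mod 64 (counting from the least significant bit) of word i / 64. It is not the library's code, and packRaw and packLexicographical are hypothetical names.

```java
// Illustrative sketch under an assumed word layout; not the dsiutils implementation.
final class PackedByteArraySketch {
    /** Packs up to eight bytes into a long, byte k in bits 8k..8k+7, without reversal. */
    static long packRaw(byte[] a, int from) {
        long word = 0;
        for (int k = 0; k < 8 && from + k < a.length; k++)
            word |= (a[from + k] & 0xFFL) << (8 * k);
        return word;
    }

    /** Same packing, but each byte is bit-reversed so bit 0 of the word is the MSB of the first byte. */
    static long packLexicographical(byte[] a, int from) {
        long word = 0;
        for (int k = 0; k < 8 && from + k < a.length; k++) {
            long reversed = Integer.reverse(a[from + k] & 0xFF) >>> 24; // reverse the 8 bits of the byte
            word |= reversed << (8 * k);
        }
        return word;
    }

    /** Bit i of the vector under the assumed layout: bit (i mod 64) of the word. */
    static boolean bit(long word, int i) {
        return ((word >>> i) & 1) != 0;
    }

    public static void main(String[] args) {
        byte[] a = { (byte) 0x80, 0x01 }; // 10000000 00000001
        long raw = packRaw(a, 0), lex = packLexicographical(a, 0);
        System.out.println(bit(lex, 0) + " " + bit(raw, 0));   // true false: only lex starts with the MSB of 0x80
        System.out.println(bit(lex, 15) + " " + bit(raw, 15)); // true false
    }
}
```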
-
wrap
public static <T> Iterator<BitVector> wrap(Iterator<T> iterator, TransformationStrategy<? super T> transformationStrategy)
Wraps a given iterator, returning an iterator that emits bit vectors.
- Parameters:
iterator - an iterator.
transformationStrategy - a strategy to transform the objects returned by iterator.
- Returns:
an iterator that emits the content of iterator passed through transformationStrategy.
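Conceptually, wrapping an iterator means applying the strategy lazily to each element as it is requested. In the minimal sketch below, java.util.function.Function stands in for TransformationStrategy and a '0'/'1' string stands in for a bit vector. Both substitutions are assumptions made so the example runs without the library.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch; wrap here is a hypothetical stand-in, not the library method.
final class WrapIteratorSketch {
    /** Returns an iterator that applies the transformation lazily, one element per next() call. */
    static <T, R> Iterator<R> wrap(Iterator<T> iterator, Function<? super T, ? extends R> transformation) {
        return new Iterator<R>() {
            @Override public boolean hasNext() { return iterator.hasNext(); }
            @Override public R next() { return transformation.apply(iterator.next()); }
        };
    }

    public static void main(String[] args) {
        Function<Long, String> bits = Long::toBinaryString; // stand-in for a bit-vector transformation
        Iterator<String> it = wrap(List.of(1L, 2L, 5L).iterator(), bits);
        while (it.hasNext()) System.out.println(it.next()); // 1, 10, 101
    }
}
```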
-
wrap
public static <T> Iterable<BitVector> wrap(Iterable<T> iterable, TransformationStrategy<? super T> transformationStrategy)
Wraps a given iterable, returning an iterable that contains bit vectors.
- Parameters:
iterable - an iterable.
transformationStrategy - a strategy to transform the objects contained in iterable.
- Returns:
an iterable that has the content of iterable passed through transformationStrategy.
-
wrap
public static <T> List<BitVector> wrap(List<T> list, TransformationStrategy<? super T> transformationStrategy)
Wraps a given list, returning a list that contains bit vectors.
- Parameters:
list - a list.
transformationStrategy - a strategy to transform the objects contained in list.
- Returns:
a list that has the content of list passed through transformationStrategy.
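For lists, one natural design is a read-only view that transforms elements on access, so no bit vectors are materialized up front. The sketch below follows that design, with Function standing in for TransformationStrategy and '0'/'1' strings standing in for bit vectors, both as assumptions. This page does not say whether the library's wrap returns a view or a copy.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch; wrap here is a hypothetical stand-in, not the library method.
final class WrapListSketch {
    /** A read-only view that transforms elements on access; nothing is copied up front. */
    static <T, R> List<R> wrap(List<T> list, Function<? super T, ? extends R> transformation) {
        return new AbstractList<R>() {
            @Override public R get(int index) { return transformation.apply(list.get(index)); }
            @Override public int size() { return list.size(); }
        };
    }

    public static void main(String[] args) {
        List<Long> numbers = new ArrayList<>(List.of(3L, 4L));
        List<String> view = wrap(numbers, Long::toBinaryString);
        System.out.println(view); // [11, 100]
        numbers.add(6L);
        System.out.println(view); // [11, 100, 110]: the view tracks the backing list
    }
}
```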
-
prefixFree
public static <T extends BitVector> TransformationStrategy<T> prefixFree()
A transformation from bit vectors to bit vectors that guarantees that its results are prefix-free. More precisely, we map 0 to 10 and 1 to 11, and we append a 0 at the end of every vector.
Warning: bit vectors returned by this strategy are adaptors around the original bit vector. If the original bit vector changes while the returned bit vector is being accessed, the results will be unpredictable.
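The mapping just described can be checked on small inputs. The sketch below writes bits as '0'/'1' characters for readability, which is an assumption, and the name encode is hypothetical. Every original bit becomes a pair starting with 1, and the terminator is a lone 0 at a pair boundary, so a reader always knows where a vector ends. The price is roughly doubling the length, which is the linear space overhead mentioned in the introduction.

```java
// Illustrative sketch of the 0 -> 10, 1 -> 11, trailing 0 mapping; not the library's code.
final class PrefixFreeEncodingSketch {
    /** Maps each 0 to "10" and each 1 to "11", then appends a final "0". */
    static String encode(String bits) {
        StringBuilder out = new StringBuilder(2 * bits.length() + 1);
        for (int i = 0; i < bits.length(); i++)
            out.append(bits.charAt(i) == '0' ? "10" : "11");
        return out.append('0').toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("01"));  // 10110
        System.out.println(encode("011")); // 1011110
        // "01" is a prefix of "011", but their encodings are not prefixes of each other.
        System.out.println(encode("011").startsWith(encode("01"))); // false
    }
}
```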
-
fixedLong
public static TransformationStrategy<Long> fixedLong()
A transformation from longs to bit vectors that returns a fixed-size Long.SIZE-bit vector. Note that the first bit of each bit vector is the most significant bit of the underlying long integer, so lexicographical and numerical order coincide for nonnegative numbers.
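The sketch below shows where the order guarantee stops. Bits are written most significant bit first as a 64-character '0'/'1' string, which is a stand-in for a bit vector, and msbFirst is a hypothetical name. Negative numbers have their sign bit set, so they sort after all nonnegative ones. Flipping the sign bit first (x ^ Long.MIN_VALUE) is a standard trick for extending the order to all longs. This page does not say whether any strategy does so.

```java
// Illustrative sketch; msbFirst is hypothetical and a String stands in for a bit vector.
final class FixedLongOrderSketch {
    /** The 64 bits of x, most significant bit first, as a string of '0'/'1'. */
    static String msbFirst(long x) {
        String s = Long.toBinaryString(x);
        return "0".repeat(Long.SIZE - s.length()) + s;
    }

    public static void main(String[] args) {
        // Nonnegative values: lexicographical order matches numerical order.
        System.out.println(msbFirst(5).compareTo(msbFirst(300)) < 0); // true
        // Across signs it does not: -1 has its sign bit set, so it sorts after 1.
        System.out.println(msbFirst(-1).compareTo(msbFirst(1)) > 0);  // true
        // Flipping the sign bit restores numerical order for all longs.
        System.out.println(msbFirst(-1 ^ Long.MIN_VALUE).compareTo(msbFirst(1 ^ Long.MIN_VALUE)) < 0); // true
    }
}
```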
-
rawFixedLong
public static TransformationStrategy<Long> rawFixedLong()
A trivial, high-performance, raw transformation from longs to bit vectors that returns a fixed-size Long.SIZE-bit vector.