Class XXHash


  • public class XXHash
    extends java.lang.Object
    Native bindings to xxhash.

    xxHash is an extremely fast hash algorithm, running at RAM speed limits. It also passes all tests from the SMHasher suite.

    A 64-bit version, named XXH64, has been available since r35. It offers much better speed, but only for 64-bit applications.

    Streaming

    Streaming functions generate the xxHash value from incremental input. This method is slower than single-call functions because of state management. For small inputs, prefer XXH32 and XXH64, which are better optimized.

    The XXH state must first be allocated, using XXH32_createState.

    Start a new hash by initializing the state with a seed, using XXH32_reset.

    Then, feed the hash state by calling XXH32_update as many times as necessary. The input must be allocated and readable. The function returns an error code, with 0 meaning OK and any other value meaning there is an error.

    Finally, a hash value can be produced at any time by calling XXH32_digest. This function returns the 32-bit hash as an int.

    It's still possible to continue inserting input into the hash state after a digest, and to generate new hash values later on by calling XXH32_digest again.

    When done, release the state using XXH32_freeState.

    Example code for incrementally hashing a file:

    
     #include <assert.h>
     #include <stdio.h>
     #include <xxhash.h>
     #define BUFFER_SIZE 256
     
     // Note: XXH64 and XXH3 use the same interface.
     XXH32_hash_t
     hashFile(FILE* stream)
     {
         XXH32_state_t* state;
         unsigned char buf[BUFFER_SIZE];
         size_t amt;
         XXH32_hash_t hash;
     
         state = XXH32_createState();       // Create a state
         assert(state != NULL);             // Error check here
         XXH32_reset(state, 0xbaad5eed);    // Reset state with our seed
         while ((amt = fread(buf, 1, sizeof(buf), stream)) != 0) {
             XXH32_update(state, buf, amt); // Hash the file in chunks
         }
         hash = XXH32_digest(state);        // Finalize the hash
         XXH32_freeState(state);            // Clean up
         return hash;
     }
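
    The streaming steps above can be illustrated without native code. The following is a minimal pure-Java sketch of XXH32's streaming state, written only as an illustration. It is not the LWJGL binding, which calls the native library through XXH32_createState, XXH32_reset, XXH32_update, XXH32_digest and XXH32_freeState. The class name Xxh32Sketch and its methods are hypothetical. The sketch shows where the streaming overhead comes from: a 16-byte buffer and four lane accumulators must be carried between calls. It also shows why a digest can be taken at any time without disturbing the state.

```java
/**
 * Hypothetical pure-Java sketch of XXH32 streaming, for illustration only.
 * It is NOT the LWJGL binding, which delegates to the native xxHash library.
 */
final class Xxh32Sketch {
    // The five 32-bit primes used by XXH32.
    private static final int P1 = 0x9E3779B1, P2 = 0x85EBCA77, P3 = 0xC2B2AE3D,
                             P4 = 0x27D4EB2F, P5 = 0x165667B1;

    private final int seed;
    private int v1, v2, v3, v4;              // four lane accumulators
    private final byte[] mem = new byte[16]; // bytes not yet forming a full 16-byte stripe
    private int memSize;
    private long totalLen;

    Xxh32Sketch(int seed) {
        this.seed = seed;
        reset();
    }

    /** Like XXH32_reset: re-seeds the accumulators and discards buffered input. */
    void reset() {
        v1 = seed + P1 + P2;
        v2 = seed + P2;
        v3 = seed;
        v4 = seed - P1;
        memSize = 0;
        totalLen = 0;
    }

    private static int round(int acc, int lane) {
        return Integer.rotateLeft(acc + lane * P2, 13) * P1;
    }

    // Reads 4 bytes as a little-endian int, as XXH32 does on every platform.
    private static int le32(byte[] b, int i) {
        return (b[i] & 0xFF) | (b[i + 1] & 0xFF) << 8
             | (b[i + 2] & 0xFF) << 16 | (b[i + 3] & 0xFF) << 24;
    }

    private void stripe(byte[] b, int i) {
        v1 = round(v1, le32(b, i));
        v2 = round(v2, le32(b, i + 4));
        v3 = round(v3, le32(b, i + 8));
        v4 = round(v4, le32(b, i + 12));
    }

    /** Like XXH32_update: consumes len bytes of in, starting at off. */
    void update(byte[] in, int off, int len) {
        totalLen += len;
        int end = off + len;
        if (memSize + len < 16) {            // still no full stripe: just buffer it
            System.arraycopy(in, off, mem, memSize, len);
            memSize += len;
            return;
        }
        if (memSize > 0) {                   // complete the buffered stripe first
            int fill = 16 - memSize;
            System.arraycopy(in, off, mem, memSize, fill);
            stripe(mem, 0);
            off += fill;
            memSize = 0;
        }
        for (; off + 16 <= end; off += 16) {
            stripe(in, off);
        }
        memSize = end - off;                 // keep the tail (< 16 bytes) for later
        System.arraycopy(in, off, mem, 0, memSize);
    }

    /** Like XXH32_digest: reads the state but never modifies it. */
    int digest() {
        int h = totalLen >= 16
                ? Integer.rotateLeft(v1, 1) + Integer.rotateLeft(v2, 7)
                  + Integer.rotateLeft(v3, 12) + Integer.rotateLeft(v4, 18)
                : seed + P5;
        h += (int) totalLen;
        int i = 0;
        for (; i + 4 <= memSize; i += 4) {
            h = Integer.rotateLeft(h + le32(mem, i) * P3, 17) * P4;
        }
        for (; i < memSize; i++) {
            h = Integer.rotateLeft(h + (mem[i] & 0xFF) * P5, 11) * P1;
        }
        h ^= h >>> 15;                       // final avalanche
        h *= P2;
        h ^= h >>> 13;
        h *= P3;
        h ^= h >>> 16;
        return h;
    }

    /** One-shot convenience, analogous to XXH32(input, seed). */
    static int hash(byte[] data, int seed) {
        Xxh32Sketch s = new Xxh32Sketch(seed);
        s.update(data, 0, data.length);
        return s.digest();
    }
}
```

    If the same bytes are fed in any chunking, for example 7 bytes at a time, the result matches hashing them in one call. For an empty input with seed 0, the sketch yields 0x02CC5D05, which is the reference XXH32 value.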

    Canonical representation

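    The default return values from XXH functions are unsigned 32-bit and 64-bit integers. This is the simplest and fastest format for further post-processing. These Java bindings return them as int and long, so interpret them as unsigned where it matters, e.g. with Integer.toUnsignedString or Long.toUnsignedString.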

    However, this leaves open the question of byte-level order, since little-endian and big-endian conventions store the same number differently.

    The canonical representation settles this issue by mandating big-endian convention, the same convention as human-readable numbers (large digits first).

    When writing hash values to storage, sending them over a network, or printing them, it's highly recommended to use the canonical representation to ensure portability across a wider range of systems, present and future.
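
    To illustrate the byte-order issue, the following stdlib-only Java sketch lays out a 32-bit hash in big-endian order, as the canonical form does. It then compares that layout with the little-endian one a native int has on machines such as x86. The class and method names are hypothetical. The bindings' own functions are XXH32_canonicalFromHash and XXH32_hashFromCanonical, which operate on XXH32Canonical structs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helpers illustrating big-endian ("canonical") hash layout. */
final class CanonicalSketch {
    /** Most significant byte first, like XXH32_canonicalFromHash. */
    static byte[] canonicalFromHash(int hash) {
        return ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(hash).array();
    }

    /** Inverse of canonicalFromHash, like XXH32_hashFromCanonical. */
    static int hashFromCanonical(byte[] canonical) {
        return ByteBuffer.wrap(canonical).order(ByteOrder.BIG_ENDIAN).getInt();
    }

    /** How the same value is laid out in memory on a little-endian machine. */
    static byte[] littleEndianBytes(int hash) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(hash).array();
    }

    /** Lowercase hex of the bytes, in order. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }
}
```

    For the hash 0x02CC5D05, the canonical bytes print as "02cc5d05", the same digits as the number itself. The little-endian layout is 05 5d cc 02, which is why storing or sending raw native integers is not portable.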

    XXH3

    XXH3 is a more recent hash algorithm featuring:

    • Improved speed for both small and large inputs
    • True 64-bit and 128-bit outputs
    • SIMD acceleration
    • Improved 32-bit viability

    Speed analysis methodology is explained here:

    https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html

    Compared to XXH64, expect XXH3 to run approximately 2x faster on large inputs and more than 3x faster on small ones; exact differences vary by platform.

    XXH3's speed benefits greatly from SIMD and 64-bit arithmetic, but does not require them. Any 32-bit or 64-bit target that can run XXH32 smoothly can run XXH3 at competitive speed, even without vector support. Further details are explained in the implementation.

    Optimized implementations are provided for AVX512, AVX2, SSE2, NEON, POWER8, ZVector and scalar targets. This can be controlled via the XXH_VECTOR macro.

    XXH3 implementation is portable:

    • it has a generic C90 formulation that can be compiled on any platform,
    • all implementations generate exactly the same hash value on all platforms,
    • starting from v0.8.0, it's also labelled "stable", meaning that any future version will also generate the same hash value.

    XXH3 offers 2 variants, _64bits and _128bits. When only 64 bits are needed, prefer the _64bits variant: it performs less mixing, which makes it faster on small inputs. It's also generally simpler to manipulate a scalar return type than a struct.

    The API supports one-shot hashing, streaming mode, and custom secrets.

    *_withSecretandSeed()

    These variants generate hash values using either seed for "short" keys (< XXH3_MIDSIZE_MAX = 240 bytes) or secret for "large" keys (≥ XXH3_MIDSIZE_MAX).

    This generally improves speed compared to _withSeed() or _withSecret(). _withSeed() has to generate the secret on the fly for "large" keys; this is fast, but the cost can be noticeable for "not so large" keys (< 1 KB). _withSecret() has to generate the masks on the fly for "small" keys, which requires more instructions than the _withSeed() variants. The _withSecretandSeed() variants therefore combine the best of both worlds.

    When the secret has been generated by XXH3_generateSecret_fromSeed, this variant produces exactly the same results as the _withSeed() variant. It therefore offers a pure speed benefit on "large" input, by skipping the need to regenerate the secret for every large input.

    Another usage scenario is to hash the secret to a 64-bit value, for example with XXH3_64bits. That value then becomes the seed, and both the seed and the secret are employed in _withSecretandSeed(). On top of speed, an added benefit is that each bit in the secret has a 50% chance to flip each bit in the output, via its impact on the seed. This is not guaranteed when using the secret directly in "small data" scenarios, because only portions of the secret are employed for small data.

    • Method Detail

      • nXXH32

        public static int nXXH32(long input,
                                 long length,
                                 int seed)
        Parameters:
        length - the length of input, in bytes
      • XXH32

        public static int XXH32(@Nullable
                                java.nio.ByteBuffer input,
                                int seed)
        Calculates the 32-bit hash of input using xxHash32.

        Speed on Core 2 Duo @ 3 GHz (single thread, SMHasher benchmark): 5.4 GB/s

        The memory between input and input + length must be valid, readable, contiguous memory. However, if length is 0, input may be NULL.

        Parameters:
        input - the block of data to be hashed, at least length bytes in size
        seed - the 32-bit seed to alter the hash's output predictably
        Returns:
        the calculated 32-bit hash value
      • nXXH32_createState

        public static long nXXH32_createState()
        Unsafe version of: XXH32_createState()
      • XXH32_createState

        @Nullable
        public static XXH32State XXH32_createState()
        Allocates an XXH32_state_t.

        Must be freed with XXH32_freeState.

        LWJGL note: This function delegates to the memory allocator configured by LWJGL.

        Returns:
        an allocated XXH32_state_t on success, NULL on failure
      • XXH32_freeState

        public static int XXH32_freeState(XXH32State statePtr)
        Frees an XXH32_state_t.

        Must be allocated with XXH32_createState.

        Parameters:
        statePtr - the state to free
      • XXH32_copyState

        public static void XXH32_copyState(XXH32State dst_state,
                                           XXH32State src_state)
        Copies one XXH32_state_t to another.

        dst_state and src_state must not be NULL and must not overlap.

        Parameters:
        dst_state - the state to copy to
        src_state - the state to copy from
      • XXH32_reset

        public static int XXH32_reset(XXH32State statePtr,
                                      int seed)
        Resets an XXH32_state_t to begin a new hash.

        This function resets and seeds a state. Call it before XXH32_update.

        Parameters:
        statePtr - the state struct to reset
        seed - the 32-bit seed to alter the hash result predictably
        Returns:
        OK on success, ERROR on failure
      • XXH32_update

        public static int XXH32_update(XXH32State statePtr,
                                       @Nullable
                                       java.nio.ByteBuffer input)
        Consumes a block of input to an XXH32_state_t.

        Call this to incrementally consume blocks of data.

        The memory between input and input + length must be valid, readable, contiguous memory. However, if length is 0, input may be NULL.

        Parameters:
        statePtr - the state struct to update
        input - the block of data to be hashed, at least length bytes in size
        Returns:
        OK on success, ERROR on failure
      • XXH32_digest

        public static int XXH32_digest(XXH32State statePtr)
        Returns the calculated hash value from an XXH32_state_t.

        Calling XXH32_digest() will not affect statePtr, so you can update, digest, and update again.

        Parameters:
        statePtr - the state struct to calculate the hash from
        Returns:
        the calculated xxHash32 value from that state
      • XXH32_canonicalFromHash

        public static void XXH32_canonicalFromHash(XXH32Canonical dst,
                                                   int hash)
        Converts an XXH32_hash_t to a big endian XXH32_canonical_t.
        Parameters:
        dst - the XXH32_canonical_t to store the result in
        hash - the XXH32_hash_t to be converted
      • XXH32_hashFromCanonical

        public static int XXH32_hashFromCanonical(XXH32Canonical src)
        Converts an XXH32_canonical_t to a native XXH32_hash_t.
        Parameters:
        src - the XXH32_canonical_t to convert
      • nXXH64

        public static long nXXH64(long input,
                                  long length,
                                  long seed)
        Parameters:
        length - the length of input, in bytes
      • XXH64

        public static long XXH64(@Nullable
                                 java.nio.ByteBuffer input,
                                 long seed)
        Calculates the 64-bit hash of input using xxHash64.

        This function usually runs faster on 64-bit systems, but slower on 32-bit systems.

        The memory between input and input + length must be valid, readable, contiguous memory. However, if length is 0, input may be NULL.

        Parameters:
        input - the block of data to be hashed, at least length bytes in size
        seed - the 64-bit seed to alter the hash's output predictably
        Returns:
        the calculated 64-bit hash
      • nXXH64_createState

        public static long nXXH64_createState()
        Unsafe version of: XXH64_createState()
      • XXH64_createState

        @Nullable
        public static XXH64State XXH64_createState()
        Allocates an XXH64_state_t.

        Must be freed with XXH64_freeState.

        LWJGL note: This function delegates to the memory allocator configured by LWJGL.

        Returns:
        an allocated XXH64_state_t on success, NULL on failure
      • XXH64_freeState

        public static int XXH64_freeState(XXH64State statePtr)
        Frees an XXH64_state_t.

        Must be allocated with XXH64_createState.

        Parameters:
        statePtr - the state to free
      • XXH64_copyState

        public static void XXH64_copyState(XXH64State dst_state,
                                           XXH64State src_state)
        Copies one XXH64_state_t to another.

        dst_state and src_state must not be NULL and must not overlap.

        Parameters:
        dst_state - the state to copy to
        src_state - the state to copy from
      • XXH64_reset

        public static int XXH64_reset(XXH64State statePtr,
                                      long seed)
        Resets an XXH64_state_t to begin a new hash.

        This function resets and seeds a state. Call it before XXH64_update.

        Parameters:
        statePtr - the state struct to reset
        seed - the 64-bit seed to alter the hash result predictably
        Returns:
        OK on success, ERROR on failure
      • XXH64_update

        public static int XXH64_update(XXH64State statePtr,
                                       @Nullable
                                       java.nio.ByteBuffer input)
        Consumes a block of input to an XXH64_state_t.

        Call this to incrementally consume blocks of data.

        The memory between input and input + length must be valid, readable, contiguous memory. However, if length is 0, input may be NULL.

        Parameters:
        statePtr - the state struct to update
        input - the block of data to be hashed, at least length bytes in size
        Returns:
        OK on success, ERROR on failure
      • XXH64_digest

        public static long XXH64_digest(XXH64State statePtr)
        Returns the calculated hash value from an XXH64_state_t.

        Calling XXH64_digest() will not affect statePtr, so you can update, digest, and update again.

        Parameters:
        statePtr - the state struct to calculate the hash from
        Returns:
        the calculated xxHash64 value from that state
      • XXH64_canonicalFromHash

        public static void XXH64_canonicalFromHash(XXH64Canonical dst,
                                                   long hash)
        Converts an XXH64_hash_t to a big endian XXH64_canonical_t.
        Parameters:
        dst - the XXH64_canonical_t to store the result in
        hash - the XXH64_hash_t to be converted
      • XXH64_hashFromCanonical

        public static long XXH64_hashFromCanonical(XXH64Canonical src)
        Converts an XXH64_canonical_t to a native XXH64_hash_t.
        Parameters:
        src - the XXH64_canonical_t to convert
      • XXH3_64bits

        public static long XXH3_64bits(java.nio.ByteBuffer data)
        Default 64-bit variant, using default secret and default seed of 0.

        It's the fastest variant.

      • XXH3_64bits_withSeed

        public static long XXH3_64bits_withSeed(java.nio.ByteBuffer data,
                                                long seed)
        This variant generates a custom secret on the fly, based on the default secret altered using the seed value.

        While this operation is decently fast, note that it's not completely free. Note that seed == 0 produces the same results as XXH3_64bits.

      • XXH3_64bits_withSecret

        public static long XXH3_64bits_withSecret(java.nio.ByteBuffer data,
                                                  java.nio.ByteBuffer secret)
        It's possible to provide any blob of bytes as a "secret" to generate the hash. This makes it more difficult for an external actor to prepare an intentional collision. The main condition is that secretSize must be large enough (≥ XXH3_SECRET_SIZE_MIN).

        However, the quality of the secret impacts the dispersion of the hash algorithm. Therefore, the secret must look like a bunch of random bytes. Avoid "trivial" or structured data such as repeated sequences or a text document. Whenever in doubt about the "randomness" of the blob of bytes, consider employing XXH3_generateSecret instead. It will generate a proper high-entropy secret derived from the blob of bytes. Another advantage of using XXH3_generateSecret() is that it guarantees that all bits within the initial blob of bytes will impact every bit of the output. This is not necessarily the case when using the blob of bytes directly because, when hashing small inputs, only a portion of the secret is employed.
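
        For illustration only, here is a Java sketch that prepares a random-looking secret as a direct ByteBuffer. LWJGL bindings operate on off-heap, direct buffers. The sizes 136 and 192 are taken from the XXH3_SECRET_SIZE_MIN and XXH3_SECRET_DEFAULT_SIZE values in xxhash.h and are treated here as assumptions, so prefer the binding's own constants. The class and method names are hypothetical. A buffer built this way could then be passed as the secret argument of XXH3_64bits_withSecret.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;

/** Hypothetical helper, for illustration only: builds a random-looking XXH3 secret. */
final class SecretSketch {
    // Values of XXH3_SECRET_SIZE_MIN and XXH3_SECRET_DEFAULT_SIZE in xxhash.h v0.8.x (assumed).
    static final int XXH3_SECRET_SIZE_MIN = 136;
    static final int XXH3_SECRET_DEFAULT_SIZE = 192;

    static ByteBuffer randomSecret(int size) {
        if (size < XXH3_SECRET_SIZE_MIN) {
            throw new IllegalArgumentException(
                "secret must be at least " + XXH3_SECRET_SIZE_MIN + " bytes, got " + size);
        }
        byte[] bytes = new byte[size];
        new SecureRandom().nextBytes(bytes);                  // high-entropy content
        ByteBuffer secret = ByteBuffer.allocateDirect(size);  // off-heap buffer for native calls
        secret.put(bytes);
        secret.flip();                                        // position 0, remaining() == size
        return secret;
    }
}
```

        The secret changes the resulting hash values. A randomly generated secret must therefore be stored and reused wherever hashes are compared, because generating a fresh one per run makes earlier hashes unreproducible.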

      • nXXH3_createState

        public static long nXXH3_createState()
      • XXH3_createState

        @Nullable
        public static XXH3State XXH3_createState()
      • nXXH3_freeState

        public static int nXXH3_freeState(long statePtr)
      • XXH3_freeState

        public static int XXH3_freeState(XXH3State statePtr)
      • nXXH3_copyState

        public static void nXXH3_copyState(long dst_state,
                                           long srct_state)
      • XXH3_copyState

        public static void XXH3_copyState(XXH3State dst_state,
                                          XXH3State srct_state)
      • XXH3_64bits_reset

        public static int XXH3_64bits_reset(XXH3State statePtr)
        Initialize with default parameters.

        Result will be equivalent to XXH3_64bits.

      • XXH3_64bits_reset_withSeed

        public static int XXH3_64bits_reset_withSeed(XXH3State statePtr,
                                                     long seed)
        Generate a custom secret from seed, and store it into state.

        Digest will be equivalent to XXH3_64bits_withSeed.

      • XXH3_64bits_reset_withSecret

        public static int XXH3_64bits_reset_withSecret(XXH3State statePtr,
                                                       java.nio.ByteBuffer secret)
        secret is referenced, and must outlive the hash streaming session.

        As with the one-shot API, secretSize must be ≥ XXH3_SECRET_SIZE_MIN. The quality of the produced hash values depends on the secret's entropy, so the secret's content should look like a bunch of random bytes. When in doubt about the randomness of a candidate secret, consider employing XXH3_generateSecret instead (see below).

      • nXXH3_64bits_update

        public static int nXXH3_64bits_update(long statePtr,
                                              long input,
                                              long length)
      • XXH3_64bits_update

        public static int XXH3_64bits_update(XXH3State statePtr,
                                             java.nio.ByteBuffer input)
      • nXXH3_64bits_digest

        public static long nXXH3_64bits_digest(long statePtr)
      • XXH3_64bits_digest

        public static long XXH3_64bits_digest(XXH3State statePtr)
      • nXXH3_128bits

        public static void nXXH3_128bits(long data,
                                         long len,
                                         long __result)
      • XXH3_128bits

        public static XXH128Hash XXH3_128bits(java.nio.ByteBuffer data,
                                              XXH128Hash __result)
      • nXXH3_128bits_withSeed

        public static void nXXH3_128bits_withSeed(long data,
                                                  long len,
                                                  long seed,
                                                  long __result)
      • XXH3_128bits_withSeed

        public static XXH128Hash XXH3_128bits_withSeed(java.nio.ByteBuffer data,
                                                       long seed,
                                                       XXH128Hash __result)
      • nXXH3_128bits_withSecret

        public static void nXXH3_128bits_withSecret(long data,
                                                    long len,
                                                    long secret,
                                                    long secretSize,
                                                    long __result)
      • XXH3_128bits_withSecret

        public static XXH128Hash XXH3_128bits_withSecret(java.nio.ByteBuffer data,
                                                         java.nio.ByteBuffer secret,
                                                         XXH128Hash __result)
      • nXXH3_128bits_reset

        public static int nXXH3_128bits_reset(long statePtr)
      • XXH3_128bits_reset

        public static int XXH3_128bits_reset(XXH3State statePtr)
      • nXXH3_128bits_reset_withSeed

        public static int nXXH3_128bits_reset_withSeed(long statePtr,
                                                       long seed)
      • XXH3_128bits_reset_withSeed

        public static int XXH3_128bits_reset_withSeed(XXH3State statePtr,
                                                      long seed)
      • nXXH3_128bits_reset_withSecret

        public static int nXXH3_128bits_reset_withSecret(long statePtr,
                                                         long secret,
                                                         long secretSize)
      • XXH3_128bits_reset_withSecret

        public static int XXH3_128bits_reset_withSecret(XXH3State statePtr,
                                                        java.nio.ByteBuffer secret)
      • nXXH3_128bits_update

        public static int nXXH3_128bits_update(long statePtr,
                                               long input,
                                               long length)
      • XXH3_128bits_update

        public static int XXH3_128bits_update(XXH3State statePtr,
                                              java.nio.ByteBuffer input)
      • nXXH3_128bits_digest

        public static void nXXH3_128bits_digest(long statePtr,
                                                long __result)
      • XXH128_isEqual

        public static boolean XXH128_isEqual(XXH128Hash h1,
                                             XXH128Hash h2)
        Returns true if h1 and h2 are equal, false otherwise.
      • XXH128_cmp

        public static int XXH128_cmp(java.nio.ByteBuffer h128_1,
                                     java.nio.ByteBuffer h128_2)
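        Compares two 128-bit hashes. Returns a positive value if h128_1 > h128_2, 0 if they are equal, and a negative value if h128_1 < h128_2. This comparator is compatible with stdlib's qsort()/bsearch().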
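
        As an illustration, here is a Java sketch of the same ordering. In the reference xxhash.h, XXH128_cmp compares the high 64 bits first and the low 64 bits second, both as unsigned integers. Hash128 is a hypothetical stand-in for the binding's XXH128Hash struct. Arrays.sort and Arrays.binarySearch play the roles of qsort() and bsearch().

```java
import java.util.Comparator;

/** Hypothetical stand-in for an XXH128_hash_t value: two unsigned 64-bit halves. */
final class Hash128 {
    final long low64;
    final long high64;

    Hash128(long low64, long high64) {
        this.low64 = low64;
        this.high64 = high64;
    }

    /** Mirrors XXH128_cmp: high64 first, then low64, both compared as unsigned. */
    static final Comparator<Hash128> ORDER = (x, y) -> {
        int c = Long.compareUnsigned(x.high64, y.high64);
        return c != 0 ? c : Long.compareUnsigned(x.low64, y.low64);
    };
}
```

        Comparing as unsigned matters. A high64 of 0xFFFFFFFFFFFFFFFF is negative as a Java long, yet it must sort after every other value.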
      • nXXH128_canonicalFromHash

        public static void nXXH128_canonicalFromHash(long dst,
                                                     long hash)
      • nXXH128_hashFromCanonical

        public static void nXXH128_hashFromCanonical(long src,
                                                     long __result)
      • XXH3_INITSTATE

        public static void XXH3_INITSTATE(XXH3State statePtr)
        Initializes a stack-allocated XXH3_state_t.

        When the XXH3_state_t structure is merely emplaced on the stack, it should be initialized with XXH3_INITSTATE() or a memset() if its first reset uses XXH3_NNbits_reset_withSeed(). This initialization can be omitted if the first reset uses default or _withSecret mode. This operation isn't necessary when the state is created with XXH3_createState.

        Note that this doesn't prepare the state for a streaming operation; it's still necessary to use XXH3_NNbits_reset*() afterwards.

      • XXH128

        public static XXH128Hash XXH128(java.nio.ByteBuffer data,
                                        long seed,
                                        XXH128Hash __result)
        Simple alias to pre-selected XXH3_128bits variant.
      • XXH3_generateSecret

        public static int XXH3_generateSecret(java.nio.ByteBuffer secretBuffer,
                                              @Nullable
                                              java.nio.ByteBuffer customSeed)
        Derives a high-entropy secret from any user-defined content, named customSeed.

        The generated secret can be used in combination with *_withSecret() functions. The _withSecret() variants are useful to provide a higher level of protection than 64-bit seed, as it becomes much more difficult for an external actor to guess how to impact the calculation logic.

        The function accepts as input a custom seed of any length and any content, and derives from it a high-entropy secret of length secretSize into an already allocated buffer secretBuffer. secretSize must be ≥ XXH3_SECRET_SIZE_MIN.

        The generated secret can then be used with any *_withSecret() variant. Functions XXH3_128bits_withSecret, XXH3_64bits_withSecret, XXH3_128bits_reset_withSecret and XXH3_64bits_reset_withSecret are part of this list. They all accept a secret parameter which must be large enough for implementation reasons (≥ XXH3_SECRET_SIZE_MIN) and feature very high entropy (consist of random-looking bytes). These conditions can be a high bar to meet, so XXH3_generateSecret() can be employed to ensure proper quality.

        customSeed can be anything. It can have any size, even small ones, and its content can be anything, even "poor entropy" sources such as a bunch of zeroes. The resulting secret will nonetheless provide all required qualities.

        When customSeedSize > 0, supplying NULL as customSeed is undefined behavior.

      • XXH3_generateSecret_fromSeed

        public static void XXH3_generateSecret_fromSeed(java.nio.ByteBuffer secretBuffer,
                                                        long seed)
        Generate the same secret as the _withSeed() variants.

        The resulting secret has a length of XXH3_SECRET_DEFAULT_SIZE (necessarily). secretBuffer must be already allocated, of size at least XXH3_SECRET_DEFAULT_SIZE bytes.

        The generated secret can be used in combination with *_withSecret() and _withSecretandSeed() variants. This generator is notably useful in combination with _withSecretandSeed(), as a way to emulate a faster _withSeed() variant.

      • nXXH3_64bits_withSecretandSeed

        public static long nXXH3_64bits_withSecretandSeed(long data,
                                                          long len,
                                                          long secret,
                                                          long secretSize,
                                                          long seed)
      • XXH3_64bits_withSecretandSeed

        public static long XXH3_64bits_withSecretandSeed(@Nullable
                                                         java.nio.ByteBuffer data,
                                                         java.nio.ByteBuffer secret,
                                                         long seed)
      • nXXH3_128bits_withSecretandSeed

        public static void nXXH3_128bits_withSecretandSeed(long data,
                                                           long len,
                                                           long secret,
                                                           long secretSize,
                                                           long seed,
                                                           long __result)
      • XXH3_128bits_withSecretandSeed

        public static XXH128Hash XXH3_128bits_withSecretandSeed(@Nullable
                                                                java.nio.ByteBuffer data,
                                                                java.nio.ByteBuffer secret,
                                                                long seed,
                                                                XXH128Hash __result)
      • nXXH3_64bits_reset_withSecretandSeed

        public static int nXXH3_64bits_reset_withSecretandSeed(long statePtr,
                                                               long secret,
                                                               long secretSize,
                                                               long seed64)
      • XXH3_64bits_reset_withSecretandSeed

        public static int XXH3_64bits_reset_withSecretandSeed(XXH3State statePtr,
                                                              java.nio.ByteBuffer secret,
                                                              long seed64)
      • nXXH3_128bits_reset_withSecretandSeed

        public static int nXXH3_128bits_reset_withSecretandSeed(long statePtr,
                                                                long secret,
                                                                long secretSize,
                                                                long seed64)
      • XXH3_128bits_reset_withSecretandSeed

        public static int XXH3_128bits_reset_withSecretandSeed(XXH3State statePtr,
                                                               java.nio.ByteBuffer secret,
                                                               long seed64)