Class BaseChunkForwardIndexWriter

  • All Implemented Interfaces:
    Closeable, AutoCloseable
    Direct Known Subclasses:
    FixedByteChunkForwardIndexWriter, VarByteChunkForwardIndexWriter

    public abstract class BaseChunkForwardIndexWriter
    extends Object
    implements Closeable
    Base implementation of a chunk-based raw (non-dictionary-encoded) forward index writer in which each chunk contains a fixed number of docs.

    The layout of the file is as follows:

    • Header Section
      • File format version (int)
      • Total number of chunks (int)
      • Number of docs per chunk (int)
      • Size of entry in bytes (int)
      • Total number of docs (int)
      • Compression type enum value (int)
      • Start offset of data header (int)
      • Data header (start offsets for all chunks)
        • For version 2, offset is stored as int
        • For version 3 onwards, offset is stored as long
    • Individual Chunks
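To make the layout above concrete, here is a minimal sketch that builds the header for versions 2 and later in an in-memory buffer. The class `ChunkHeaderLayoutSketch` and its methods are hypothetical and not part of this API. The ceiling division used for the chunk count and the default big-endian byte order of `ByteBuffer` are assumptions, as is treating the data-header start offset as the position immediately after the seven int fields.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of the documented header layout (versions 2+ only).
final class ChunkHeaderLayoutSketch {
  // Seven int fields precede the data header: version, chunk count, docs per chunk,
  // entry size, total docs, compression type, and the data-header start offset.
  static final int FIXED_FIELD_COUNT = 7;

  // Assumption: the last chunk may be partially filled, so round up.
  static int numChunks(int totalDocs, int numDocsPerChunk) {
    return (totalDocs + numDocsPerChunk - 1) / numDocsPerChunk;
  }

  // Version 2 stores each chunk offset as an int; version 3 onwards as a long.
  static int offsetEntrySize(int version) {
    return version >= 3 ? Long.BYTES : Integer.BYTES;
  }

  static int headerSize(int version, int numChunks) {
    return FIXED_FIELD_COUNT * Integer.BYTES + numChunks * offsetEntrySize(version);
  }

  static ByteBuffer buildHeader(int version, int numChunks, int numDocsPerChunk,
      int sizeOfEntry, int totalDocs, int compressionTypeValue) {
    ByteBuffer header = ByteBuffer.allocate(headerSize(version, numChunks));
    header.putInt(version);
    header.putInt(numChunks);
    header.putInt(numDocsPerChunk);
    header.putInt(sizeOfEntry);
    header.putInt(totalDocs);
    header.putInt(compressionTypeValue);
    // Assumption: the data header begins right after this field.
    header.putInt(header.position() + Integer.BYTES);
    // The chunk offsets stay zeroed here; they are filled in as chunks are written.
    header.rewind();
    return header;
  }

  public static void main(String[] args) {
    int chunks = numChunks(3500, 1000);
    ByteBuffer header = buildHeader(3, chunks, 1000, 8, 3500, 0);
    System.out.println("chunks=" + chunks + ", header bytes=" + header.capacity());
  }
}
```

The only version-dependent part of the header is the width of each chunk offset, so the header size grows by 4 bytes per chunk in version 2 and by 8 bytes per chunk from version 3 onwards.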
    • Field Detail

      • _chunkBuffer

        protected final ByteBuffer _chunkBuffer
      • _compressedBuffer

        protected final ByteBuffer _compressedBuffer
      • _chunkSize

        protected int _chunkSize
      • _dataOffset

        protected long _dataOffset
    • Constructor Detail

      • BaseChunkForwardIndexWriter

        protected BaseChunkForwardIndexWriter(File file,
                                              ChunkCompressionType compressionType,
                                              int totalDocs,
                                              int numDocsPerChunk,
                                              long chunkSize,
                                              int sizeOfEntry,
                                              int version,
                                              boolean fixed)
                                       throws IOException
        Constructor for the class.
        Parameters:
        file - Data file to write into
        compressionType - Type of compression
        totalDocs - Total docs to write
        numDocsPerChunk - Number of docs per data chunk
        chunkSize - Size of chunk
        sizeOfEntry - Size of entry in bytes; for the variable-byte implementation, the maximum entry size
        version - File format version
        fixed - if the data type is fixed width (required for version validation)
        Throws:
        IOException - if the file isn't found or can't be mapped
    • Method Detail

      • writeChunk

        protected void writeChunk()
        Helper method to compress and write the current chunk.
        • The chunk header is of fixed size, so any remaining offsets are filled in for partially filled chunks.
        • Compresses (if required) and writes the chunk to the data file.
        • Updates the header with the current chunk's offset.
        • Clears the buffers so that they can be reused.
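The write, record-offset, and clear steps can be sketched roughly as follows. This is not the class's actual implementation: `WriteChunkSketch` and all of its members are hypothetical, a `ByteArrayOutputStream` stands in for the data file, a list stands in for the data header, and compression is replaced by a pass-through copy. Filling out the fixed-size chunk header for partially filled chunks is omitted.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the flush cycle described above.
final class WriteChunkSketch {
  private final ByteBuffer chunkBuffer;
  // Stand-in for the data file.
  private final ByteArrayOutputStream dataFile = new ByteArrayOutputStream();
  // Stand-in for the data header (one start offset per chunk).
  private final List<Long> chunkOffsets = new ArrayList<>();
  private long dataOffset;

  WriteChunkSketch(int chunkSize, long firstChunkOffset) {
    chunkBuffer = ByteBuffer.allocate(chunkSize);
    dataOffset = firstChunkOffset;
  }

  // Appends raw bytes to the chunk currently being built.
  void put(byte[] value) {
    chunkBuffer.put(value);
  }

  void writeChunk() {
    // Switch the buffer from writing to reading.
    chunkBuffer.flip();
    byte[] payload = new byte[chunkBuffer.remaining()];
    chunkBuffer.get(payload);

    // Pass-through in place of compression (assumption: no compression configured).
    dataFile.write(payload, 0, payload.length);

    // Record where this chunk starts, then advance to the next chunk's start.
    chunkOffsets.add(dataOffset);
    dataOffset += payload.length;

    // Reset the buffer so the next chunk can reuse it.
    chunkBuffer.clear();
  }

  List<Long> chunkOffsets() {
    return chunkOffsets;
  }

  long dataOffset() {
    return dataOffset;
  }

  int bytesWritten() {
    return dataFile.size();
  }
}
```

In this sketch the running offset advances by the number of bytes actually written, so with compression enabled the recorded offsets would track compressed sizes rather than the raw chunk size.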