public class VLSNBucket
extends Object
A VLSNBucket instance represents a set of VLSN->LSN mappings. Buckets are
usually not updated, except at times when the replication stream may have
been reduced in size, by log cleaning or syncup. The VLSNBuckets in the
VLSNIndex's VLSNTracker are written to disk and are persistent. There are
also VLSNBuckets in the temporary recovery-time tracker that are used for
collecting mappings found in the log during recovery.
VLSNBuckets only hold mappings from a single log file. A single log file
may be mapped by multiple VLSNBuckets though.
As a tradeoff in space vs time, a VLSNBucket only stores a sparse set of
mappings and the caller must use a VLSNReader to scan the log file and
find any log entries not mapped directly by the bucket. In addition,
the VLSN itself is not stored. Only the offset portion of the LSN is
stored, and the VLSN is inferred from the bucket's stride.
For example, suppose a node had these VLSN->LSN mappings:
VLSN   LSN (file/offset)
9      10/100
10     10/110
11     10/120
12     10/130
13     10/140
14     11/100
15     11/120
The mappings in file 10 could be represented by a VLSNBucket with
a stride of 4. That means the bucket would hold the mappings for
9 10/100,
13 10/140
Since the target log file number and the stride are known, the mappings
can be represented by the offsets alone, in this array: {100, 140}, rather
than by storing the whole LSN.
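The stride arithmetic above can be sketched as follows. This is a minimal illustration, not the actual VLSNBucket code: the class name, its fields and its methods are hypothetical, and VLSNs and offsets are modeled as plain long values.

```java
/** Illustrative only: a hypothetical stand-in, not the real VLSNBucket. */
final class StrideBucketSketch {
    private final long fileNumber;  // the single log file this bucket maps
    private final long firstVlsn;   // VLSN whose offset is stored in offsets[0]
    private final int stride;       // VLSN distance between stored offsets
    private final long[] offsets;   // file offsets only; the VLSNs are implied

    StrideBucketSketch(long fileNumber, long firstVlsn, int stride, long[] offsets) {
        this.fileNumber = fileNumber;
        this.firstVlsn = firstVlsn;
        this.stride = stride;
        this.offsets = offsets.clone();
    }

    /** The log file that every offset in this bucket refers to. */
    long fileNumber() {
        return fileNumber;
    }

    /** VLSN implied by a slot in the offset array. */
    long vlsnAt(int index) {
        return firstVlsn + (long) index * stride;
    }

    /** True if the VLSN falls on a stride boundary inside the offset array. */
    boolean isMapped(long vlsn) {
        long delta = vlsn - firstVlsn;
        return delta >= 0 && delta % stride == 0 && delta / stride < offsets.length;
    }

    /** Offset of a directly mapped VLSN; other VLSNs need a log scan. */
    long offsetOf(long vlsn) {
        if (!isMapped(vlsn)) {
            throw new IllegalArgumentException("VLSN " + vlsn + " is not mapped directly");
        }
        return offsets[(int) ((vlsn - firstVlsn) / stride)];
    }
}
```

With the file 10 example, new StrideBucketSketch(10, 9, 4, new long[] {100, 140}) maps VLSN 9 to offset 100 and VLSN 13 to offset 140, while VLSNs 10 through 12 are not mapped directly and would have to be found by scanning the log.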
Each bucket can also provide the mapping for the first and last VLSN it
covers, even if the last VLSN does not fall on a stride boundary. This is
done to support forward and backward scanning. In the example above, file 10
ends at VLSN 13, which happens to fall on the stride. Had VLSNs 14 and 15
instead been logged in file 10 at offsets 150 and 160, the completed bucket
could provide 9->10/100, 13->10/140, and 15->10/160, even though 15 is not
a stride's worth away from 13.
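One way to picture how the first and last mappings support scanning in both directions is the sketch below. It is a guess at the general idea, not the library's implementation: the class, field and method names are hypothetical, the offset array is assumed to have no holes, and it uses the values 9->10/100, 13->10/140 and 15->10/160 quoted above.

```java
/** Illustrative only: hypothetical names, and no holes in the offset array. */
final class BucketBoundsSketch {
    private final long firstVlsn;
    private final long lastVlsn;
    private final long lastOffset;  // offset of lastVlsn, kept even when off-stride
    private final int stride;
    private final long[] offsets;   // offsets for firstVlsn, firstVlsn + stride, ...

    BucketBoundsSketch(long firstVlsn, long lastVlsn, long lastOffset,
                       int stride, long[] offsets) {
        this.firstVlsn = firstVlsn;
        this.lastVlsn = lastVlsn;
        this.lastOffset = lastOffset;
        this.stride = stride;
        this.offsets = offsets.clone();
    }

    /** Offset of the closest mapped VLSN <= target: where a forward scan starts. */
    long floorOffset(long target) {
        if (target < firstVlsn) {
            throw new IllegalArgumentException("target is below the bucket's range");
        }
        if (target >= lastVlsn) {
            return lastOffset;
        }
        long index = Math.min((target - firstVlsn) / stride, offsets.length - 1);
        return offsets[(int) index];
    }

    /** Offset of the closest mapped VLSN >= target: where a backward scan starts. */
    long ceilingOffset(long target) {
        if (target > lastVlsn) {
            throw new IllegalArgumentException("target is above the bucket's range");
        }
        if (target <= firstVlsn) {
            return offsets[0];
        }
        // Round up to the next stride slot, falling back to the last mapping.
        long index = (target - firstVlsn + stride - 1) / stride;
        if (index < offsets.length && firstVlsn + index * stride <= lastVlsn) {
            return offsets[(int) index];
        }
        return lastOffset;
    }
}
```

In this sketch, a forward scan for VLSN 14 would start at offset 140 (the mapping for 13), and a backward scan for VLSN 14 would start at offset 160 (the mapping for 15), which is only possible because the off-stride last mapping is retained.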
Because registering a VLSN->LSN mapping is done outside the log write latch,
any inserts into the VLSNBucket may not be in order. However, when any
VLSN is registered, we can assume that all VLSNs < that value do exist in
the log. It's just an accident of timing that they haven't yet been
registered. Note that out of order inserts into the buckets can create holes
in the bucket's offset array, or cause the array to be shorter than
anticipated.
For example, assuming the stride of 4 used earlier, if VLSNs are inserted
into the bucket in the order 9, 15, we'll actually only keep an offset array
of size 1, because only VLSN 9 falls on a stride boundary. We have to be able
to handle holes in the bucket, and can't count on filling them in when the
lagging VLSN arrives, because it is possible that a reading thread will
access the bucket before the laggard inserter arrives, or that the bucket
might be flushed to disk and become immutable.
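The effect of out-of-order inserts can be illustrated with a small sketch. The policy shown here (dropping any VLSN at or below the last one registered, and padding skipped stride slots with a sentinel) is an assumption made for illustration, not a description of the real VLSNBucket code; the class name, the NO_OFFSET value and the method names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: one plausible policy for out-of-order inserts. */
final class OutOfOrderBucketSketch {
    static final long NO_OFFSET = -1;  // hypothetical sentinel marking a hole

    private final int stride;
    private final List<Long> offsets = new ArrayList<>();
    private long firstVlsn = -1;
    private long lastVlsn = -1;
    private long lastOffset = NO_OFFSET;

    OutOfOrderBucketSketch(int stride) {
        this.stride = stride;
    }

    /** Registers a mapping; returns false when a lagging VLSN is dropped. */
    boolean put(long vlsn, long offset) {
        if (firstVlsn < 0) {
            firstVlsn = vlsn;  // assumes the first insert is the lowest VLSN
        } else if (vlsn <= lastVlsn) {
            return false;      // laggard: the bucket has already moved past it
        }
        long delta = vlsn - firstVlsn;
        if (delta % stride == 0) {
            int index = (int) (delta / stride);
            while (offsets.size() < index) {
                offsets.add(NO_OFFSET);  // pad stride slots that were skipped
            }
            offsets.add(offset);
        }
        lastVlsn = vlsn;       // the last mapping is tracked even when off-stride
        lastOffset = offset;
        return true;
    }

    int offsetCount() {
        return offsets.size();
    }

    boolean isHole(int index) {
        return offsets.get(index) == NO_OFFSET;
    }

    long lastOffset() {
        return lastOffset;
    }
}
```

Under this policy, inserting VLSNs 9 and 15 with a stride of 4 leaves an offset array of size 1, and a later insert of VLSN 13 is dropped. Inserting 9 and then 17 leaves a hole at the slot where 13 would have gone, which readers must be prepared to skip.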