public final class MajorCompactionTask extends Object implements CompactionTask
Removes tombstones from the log and combines neighboring Segments to reclaim disk space.
Major compaction is a more heavyweight compaction task which is responsible both for removing tombstone
entries from the log and combining groups of neighboring log Segments together.
Combining segments
As entries are written to the log and the log rolls over to new segments, entries are compacted out of individual
segments by MinorCompactionTasks. However, the minor compaction process only rewrites individual segments
and doesn't combine them, which would result in an ever-growing number of open file pointers. During major compaction,
the major compaction task rewrites groups of segments provided by the MajorCompactionManager. For each group
of segments, a single compact segment will be created with the same version and starting index as
the first segment in the group. All entries from all segments in the group that haven't been
released will then be written to the new compact segment.
Once the rewrite is complete, the compact segment will be locked and the set of old segments deleted.
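The combining step above can be sketched in miniature. This is an illustrative model only: Entry, Segment, and combine are hypothetical stand-ins for this sketch, not Copycat's actual API. It shows the essential rule that a group of neighboring segments is rewritten into one compact segment that inherits the first segment's starting index and receives only the entries that have not been released:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of combining a group of segments into one compact
// segment. Entry, Segment, and combine() are hypothetical stand-ins
// for illustration, not Copycat's actual API.
public class CombineSketch {
    // A log entry with its index, value, and whether the state
    // machine has released it for compaction.
    record Entry(long index, String value, boolean released) {}

    static class Segment {
        final long firstIndex;
        final List<Entry> entries = new ArrayList<>();
        Segment(long firstIndex) { this.firstIndex = firstIndex; }
    }

    // Rewrite a group of neighboring segments into a single compact
    // segment with the same starting index as the first segment in
    // the group, copying only entries that have not been released.
    static Segment combine(List<Segment> group) {
        Segment compact = new Segment(group.get(0).firstIndex);
        for (Segment segment : group) {
            for (Entry entry : segment.entries) {
                if (!entry.released()) {
                    compact.entries.add(entry);
                }
            }
        }
        return compact;
    }
}
```

In the real task, the old segments would only be deleted after the compact segment is durably written and locked, so a crash mid-rewrite leaves the originals intact.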
Removing tombstones
Tombstones are entries in the log which amount to state changes that remove state. That is,
tombstones are an indicator that some set of prior entries no longer contribute to the state of the system. Thus,
it is critical that tombstones remain in the log as long as any prior related entries do. If a tombstone is removed
from the log before its prior related entries, rebuilding state from the log will result in inconsistencies.
A significant objective of the major compaction task is to remove tombstones from the log in a manner that ensures
failures before, during, or after the compaction task will not result in inconsistencies when state is rebuilt from
the log. In order to ensure tombstones are removed only after any prior related entries, the major compaction
task simply compacts segments in sequential order from the Segment.firstIndex() of the first segment to the
Segment.lastIndex() of the last segment. This ensures that if a failure occurs during the compaction process,
only entries earlier in the log will have been removed, and potential tombstones which erase the state of those entries
will remain.
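The ordering guarantee can be made concrete with a small sketch. SegmentDescriptor and compactionOrder are hypothetical names for illustration; the point is only that groups are processed strictly by ascending first index, so a failure leaves a compacted prefix of the log while every later tombstone survives:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the sequential ordering guarantee: segments are compacted
// strictly in order of their first index. SegmentDescriptor and
// compactionOrder() are hypothetical names for illustration.
public class SequentialOrderSketch {
    record SegmentDescriptor(long firstIndex, long lastIndex) {}

    // Return segments in the order the task would compact them:
    // ascending by first index, from the start of the log forward.
    static List<SegmentDescriptor> compactionOrder(List<SegmentDescriptor> segments) {
        return segments.stream()
            .sorted(Comparator.comparingLong(SegmentDescriptor::firstIndex))
            .collect(Collectors.toList());
    }
}
```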
Nevertheless, there are some significant potential race conditions that must be considered in the implementation of
major compaction. The major compaction task assumes that state machines will always release related entries
in monotonically increasing order. That is, if a state machine receives a Commit
remove 1 that deletes the state of a prior Commit set 1, the state machine will call
Commit.close() on the set 1 commit before releasing the remove 1 commit. But even if applications
release entries from the log in monotonic order, and the major compaction task compacts segments in sequential order,
inconsistencies can still arise. Consider the following history:
1. set 1 is written at index 1 in segment 1
2. remove 1 is written at index 12345 in segment 8
3. The major compaction task rewrites segment 1, resulting in set 1 at index 1 in the rewritten version of segment 1
4. The state machine releases remove 1 at index 12345 in segment 8, which the compaction task has yet to compact
5. The major compaction task rewrites segments 2 through 8, removing tombstone entry 12345 during the process
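The monotonic-release assumption can be expressed as a small invariant check. ReleaseTracker is a hypothetical helper written for this sketch, not part of Copycat; it simply records the order in which entry indexes are released and verifies that no entry is released before an earlier related one:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the release-ordering contract the major compaction task
// assumes. ReleaseTracker is a hypothetical helper for illustration.
public class ReleaseOrderSketch {
    static class ReleaseTracker {
        private final List<Long> released = new ArrayList<>();

        // Record that the state machine released the entry at index.
        void release(long index) {
            released.add(index);
        }

        // True if entries were released in monotonically increasing
        // index order, i.e. prior related entries were released
        // before the tombstones that erase them.
        boolean isMonotonic() {
            for (int i = 1; i < released.size(); i++) {
                if (released.get(i) < released.get(i - 1)) return false;
            }
            return true;
        }
    }
}
```

A state machine honoring the contract would close the set 1 commit at index 1 before releasing the remove 1 commit at index 12345; the reverse order violates the assumption.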
In the scenario above, the resulting log contains set 1 but not remove 1. If those entries were replayed
as Commits to a state machine, the result would be an inconsistent state. Worse yet, not only is this server's state
incorrect, but it will be inconsistent with other servers which are likely to have correctly removed both entry
1 and entry 12345 during major compaction.
In order to prevent such a scenario from occurring, the major compaction task takes an immutable snapshot of the state of offsets underlying all the segments to be compacted prior to rewriting any entries. This ensures that any entries released after the start of rewriting segments will not be considered for compaction during the execution of this task.
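The snapshot guard can be sketched as follows. The names here (snapshotReleased, shouldCompact) are hypothetical; the sketch only illustrates the rule that the set of released offsets is frozen before any rewriting begins, so a release that arrives mid-task cannot cause an entry to be dropped by this run:

```java
import java.util.Set;

// Sketch of the snapshot guard: before rewriting, the task captures an
// immutable copy of the released offsets. Entries released after the
// snapshot are still copied forward and only reclaimed by a later
// compaction. Method names are hypothetical, for illustration only.
public class SnapshotSketch {
    // Take an immutable snapshot of the currently released offsets.
    // Set.copyOf returns an unmodifiable copy, so later additions to
    // the live set are invisible to this compaction run.
    static Set<Long> snapshotReleased(Set<Long> liveReleasedOffsets) {
        return Set.copyOf(liveReleasedOffsets);
    }

    // Only offsets released before the snapshot was taken are
    // eligible for removal during this task.
    static boolean shouldCompact(long offset, Set<Long> snapshot) {
        return snapshot.contains(offset);
    }
}
```

Under this rule, the remove 1 tombstone released after the task began would survive this major compaction and be removed only by a subsequent one, after set 1 is already gone.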