public class Log extends Object implements AutoCloseable
The log is the primary vehicle for storing state changes and managing replication in Raft. The log is used to verify consistency between members and manage cluster configurations, client sessions, state machine operations, and other tasks.
State changes are written to the log as Entry objects. Each entry is associated with an index and
term. The index is a 1-based entry index from the start of the log. The term is used for
various consistency checks in the Raft algorithm. Raft guarantees that a committed entry at any
index i that has term t will also be present in the logs on all other servers in the cluster at the
same index i with term t. However, note that log compaction may break this contract. With log compaction
taken into account, it's more accurate to say that if committed entry i is present in another server's log, that
entry has term t and the same value.
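The weakened guarantee can be illustrated with a minimal sketch. The class below is hypothetical and not part of Copycat: it models a log as a map from index to term, treats a compacted index as simply absent, and accepts an index either when it is absent or when it holds the expected term.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of the "present implies same term" check; not Copycat code. */
final class LogMatchingSketch {
    // index -> term for entries still present; compacted indices are absent from the map
    private final Map<Long, Long> terms = new HashMap<>();

    /** Records that the entry at the given index was written in the given term. */
    void put(long index, long term) {
        terms.put(index, term);
    }

    /**
     * Returns true if the entry at the index is absent (for example, compacted)
     * or present with exactly the expected term.
     */
    boolean consistentWith(long index, long expectedTerm) {
        Long term = terms.get(index);
        return term == null || term == expectedTerm;
    }
}
```

Under this model, a missing entry never contradicts the leader's log, while a present entry with a different term does.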
Entries are written to the log via the append(Entry) method. When an entry is appended, it's written to the
next sequential index in the log after lastIndex(). Entries can be created from a typed entry pool with the
create(Class) method.
    long index;
    try (CommandEntry entry = log.create(CommandEntry.class)) {
      entry.setTerm(2)
        .setSequence(5)
        .setCommand(new PutCommand());
      index = log.append(entry);
    }
Entries are appended to Segments in the log. Segments are individual file-based or memory-based
groups of sequential entries. Each segment has a fixed capacity in terms of either number of entries or size in
bytes. Once the capacity of a segment has been reached, the log rolls over to a new segment for the next entry that's
appended.
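As a rough illustration of rollover, the following sketch keeps segments as in-memory lists with a capacity measured in entries only. The class name, the `String` entry type, and the entry-count capacity are assumptions made for the example; they are not Copycat's Segment implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of a segmented log; not Copycat's Segment class. */
final class SegmentRolloverSketch {
    private final int maxEntries;
    private final List<List<String>> segments = new ArrayList<>();

    SegmentRolloverSketch(int maxEntries) {
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        this.maxEntries = maxEntries;
        segments.add(new ArrayList<>());
    }

    /** Appends an entry, rolling over to a new segment when the current one is full. */
    long append(String entry) {
        List<String> current = segments.get(segments.size() - 1);
        if (current.size() == maxEntries) {
            current = new ArrayList<>();
            segments.add(current);
        }
        current.add(entry);
        return lastIndex();
    }

    /** Returns the 1-based index of the last entry, or 0 if nothing was appended. */
    long lastIndex() {
        // Every segment except the last one is full.
        return (long) (segments.size() - 1) * maxEntries
            + segments.get(segments.size() - 1).size();
    }

    int segmentCount() {
        return segments.size();
    }
}
```

With a capacity of two entries, five appends in this model occupy three segments, and the returned indices run from 1 to 5.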
Internally, each segment maintains an in-memory index of entries. The index stores the offset and position of each
entry within the segment's internal Buffer. For entries that are appended to the
log sequentially, the index has an O(1) lookup time. For instances where entries in a segment have been skipped (due
to log compaction), the lookup time is O(log n) due to binary search. However, due to the nature of the Raft
consensus algorithm, readers should typically benefit from O(1) lookups.
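The two lookup paths can be sketched as follows. This is a simplified, hypothetical index that stores sorted entry offsets and their byte positions in plain arrays; the real segment index layout is not described here, so the structure is an assumption.

```java
import java.util.Arrays;

/** Hypothetical offset index; illustrates O(1) versus O(log n) lookups only. */
final class OffsetIndexSketch {
    private final int[] offsets;    // sorted offsets of the entries present in the segment
    private final long[] positions; // byte position of each entry, parallel to offsets

    OffsetIndexSketch(int[] offsets, long[] positions) {
        if (offsets.length != positions.length) {
            throw new IllegalArgumentException("offsets and positions must have equal length");
        }
        this.offsets = offsets.clone();
        this.positions = positions.clone();
    }

    /** Returns the byte position of the entry at the given offset, or -1 if it is missing. */
    long position(int offset) {
        // Fast path: if no earlier entry was skipped, offset n is stored in slot n.
        if (offset >= 0 && offset < offsets.length && offsets[offset] == offset) {
            return positions[offset];
        }
        // Slow path: entries were skipped, so fall back to binary search.
        int slot = Arrays.binarySearch(offsets, offset);
        return slot >= 0 ? positions[slot] : -1;
    }
}
```

The fast path relies on the offsets being sorted and unique: if slot n holds offset n, every earlier slot must be dense as well.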
In order to prevent exhausting disk space, the log manages a set of background threads that periodically rewrite and
combine segments to free disk space. This is known as log compaction. As entries are committed to the log and applied
to the Raft state machine as Commit objects, state machines release(long)
entries that no longer apply to the state machine state. Internally, each log Segment maintains a compact
BitArray to track the liveness of entries. When an entry is released, the entry's
offset is set in the bit array for the associated segment. The bit array represents the state of entries waiting to
be compacted from the log.
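A minimal sketch of this bookkeeping is shown below. It uses `java.util.BitSet` as a stand-in for Copycat's BitArray, and the class and method names are hypothetical.

```java
import java.util.BitSet;

/** Hypothetical per-segment release tracker; java.util.BitSet stands in for BitArray. */
final class ReleaseTrackerSketch {
    private final long firstIndex; // index of the first entry in the segment
    private final int count;       // number of indices covered by the segment
    private final BitSet released = new BitSet();

    ReleaseTrackerSketch(long firstIndex, int count) {
        this.firstIndex = firstIndex;
        this.count = count;
    }

    /** Marks the entry at the given index as released by setting its offset bit. */
    void release(long index) {
        released.set(offset(index));
    }

    /** Returns true if the entry has not been released. */
    boolean isLive(long index) {
        return !released.get(offset(index));
    }

    /** Returns the number of released entries waiting to be compacted. */
    int releasedCount() {
        return released.cardinality();
    }

    private int offset(long index) {
        long offset = index - firstIndex;
        if (offset < 0 || offset >= count) {
            throw new IndexOutOfBoundsException("index " + index + " is outside the segment");
        }
        return (int) offset;
    }
}
```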
As entries are written to the log, segments reach their capacity and the log rolls over into new segments. Once a
segment is full and all of its entries have been committed, indicating they cannot be removed,
the segment becomes eligible for compaction. Log compaction processes come in two forms:
Compaction.MINOR and
Compaction.MAJOR, which can be configured in the Storage
configuration. Minor and major compaction serve to remove normal entries and tombstones from the log respectively.
Minor compaction is the more frequent and lightweight process. Periodically, according to the configured
Storage.minorCompactionInterval(), a background thread will evaluate the log for minor compaction. The minor
compaction process iterates through segments and selects compactable segments based on the ratio of entries that have
been released. Minor compaction is generational. The
MinorCompactionManager is more likely to select segments that haven't
yet been compacted than ones that have. Once a set of segments has been selected for compaction, a
MinorCompactionTask rewrites each segment without its released entries.
This rewriting results in a segment with missing entries, and Copycat's Raft implementation accounts for that. For
instance, a segment with entries {1, 2, 3} can become {1, 3} after entry 2 is released and the segment is
compacted, and any attempt to read entry 2 will result in a null entry.
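The rewrite step can be sketched in a few lines. The example models a segment as a sorted map from index to value and the released entries as a set of indices; both representations, and the class name, are assumptions made for illustration rather than MinorCompactionTask's actual logic.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of rewriting a segment without its released entries. */
final class MinorCompactionSketch {
    /** Returns a copy of the segment that omits every released index. */
    static TreeMap<Long, String> rewrite(Map<Long, String> segment, Set<Long> released) {
        TreeMap<Long, String> compacted = new TreeMap<>();
        for (Map.Entry<Long, String> entry : segment.entrySet()) {
            if (!released.contains(entry.getKey())) {
                compacted.put(entry.getKey(), entry.getValue());
            }
        }
        return compacted;
    }
}
```

Rewriting a segment holding indices 1, 2, and 3 with index 2 released yields a map containing only 1 and 3, and a lookup of index 2 in the result returns null.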
However, note that minor compaction only applies to non-tombstone entries. Tombstones are entries that represent the removal of state from the system, and that requires a more careful and costly compaction process to ensure consistency in the event of a failure. Consider a state machine with the following two commands in the log:
put 1
remove 1
If the first command is written to segment 1, and the second command is written to segment 2,
compacting segment 2 (minor compaction may compact segments in any order) without removing the first command
from segment 1 would effectively result in the undoing of the remove 1 command. If the remove 1
command is removed from the log before put 1, a restart and replay of the log will result in the application of
put 1 to the state machine, but not remove 1, thus resulting in an inconsistent state machine state.
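The hazard is easy to reproduce with a toy replay loop. The sketch below is hypothetical: it treats the state machine as a set of keys and commands as the strings `put <key>` and `remove <key>`, which is an assumption made purely for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical replay of put/remove commands against a set-based state machine. */
final class TombstoneReplaySketch {
    /** Applies each command in order and returns the resulting set of keys. */
    static Set<String> replay(List<String> commands) {
        Set<String> state = new HashSet<>();
        for (String command : commands) {
            String[] parts = command.split(" ");
            if (parts[0].equals("put")) {
                state.add(parts[1]);
            } else if (parts[0].equals("remove")) {
                state.remove(parts[1]);
            }
        }
        return state;
    }

    public static void main(String[] args) {
        // Full log: the remove cancels the put, so the state ends up empty.
        System.out.println(replay(Arrays.asList("put 1", "remove 1")));
        // Tombstone compacted before the put: key 1 reappears after replay.
        System.out.println(replay(Arrays.asList("put 1")));
    }
}
```

The second replay shows why a tombstone may only be removed once the entries it cancels are gone as well.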
As entries are removed from the log during minor and major compaction, log segment files begin to shrink. To avoid keeping a large number of file pointers open, Copycat needs a mechanism to combine segments as disk space is freed. To that end, as the major compaction process iterates through the set of committed segments and rewrites live entries, it combines multiple segments up to the configured segment capacity. When a segment becomes full during major compaction, the compaction process rolls over to a new segment and continues compaction. This results in a significantly smaller number of files.
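The combining step can be sketched as repacking live entries into as few segments as the capacity allows. The example below is a hypothetical model in which each segment is a list of live entry indices and capacity is counted in entries; it is not the compaction code itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of combining sparse segments up to a fixed capacity. */
final class SegmentCombineSketch {
    /** Repacks the live entries of the given segments, in order, into full segments. */
    static List<List<Long>> combine(List<List<Long>> segments, int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        List<List<Long>> combined = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (List<Long> segment : segments) {
            for (Long index : segment) {
                if (current.size() == capacity) {
                    // The current output segment is full: roll over to a new one.
                    combined.add(current);
                    current = new ArrayList<>();
                }
                current.add(index);
            }
        }
        if (!current.isEmpty()) {
            combined.add(current);
        }
        return combined;
    }
}
```

For example, three sparse segments holding indices {1}, {4, 5}, and {9} combine into two segments, {1, 4, 5} and {9}, when the capacity is three entries.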
| Modifier and Type | Method and Description |
|---|---|
| `long` | `append(io.atomix.copycat.server.storage.entry.Entry entry)` Appends an entry to the log. |
| `void` | `close()` Closes the log. |
| `Log` | `commit(long index)` Commits entries up to the given index to the log. |
| `Compactor` | `compactor()` Returns the log compactor. |
| `boolean` | `contains(long index)` Returns a boolean value indicating whether the log contains a live entry at the given index. |
| `<T extends io.atomix.copycat.server.storage.entry.Entry<T>> T` | `create(Class<T> type)` Creates a new log entry. |
| `long` | `firstIndex()` Returns the log's current first index. |
| `void` | `flush()` Flushes the log to disk. |
| `<T extends io.atomix.copycat.server.storage.entry.Entry> T` | `get(long index)` Gets an entry from the log at the given index. |
| `boolean` | `isClosed()` Returns a boolean value indicating whether the log is closed. |
| `boolean` | `isEmpty()` Returns a boolean value indicating whether the log is empty. |
| `boolean` | `isOpen()` Returns a boolean value indicating whether the log is open. |
| `long` | `lastIndex()` Returns the index of the last entry in the log. |
| `long` | `length()` Returns the number of entries in the log. |
| `long` | `nextIndex()` Returns the next index in the log. |
| `Log` | `release(long index)` Releases the entry at the given index. |
| `Serializer` | `serializer()` Returns the log entry serializer. |
| `long` | `size()` Returns the total size of all segments of the log on disk in bytes. |
| `Log` | `skip(long entries)` Skips the given number of entries. |
| `long` | `term(long index)` Returns the term for the entry at the given index. |
| `String` | `toString()` |
| `Log` | `truncate()` Truncates the log. |
| `Log` | `truncate(long index)` Truncates the log up to the given index. |

public Compactor compactor()
Returns the log compactor.

public Serializer serializer()
Returns the log entry serializer.

public boolean isOpen()
Returns a boolean value indicating whether the log is open.

public boolean isEmpty()
Returns a boolean value indicating whether the log is empty.
Throws:
IllegalStateException - If the log is not open.

public long size()
Returns the total size of all segments of the log on disk in bytes.
Returns:
The total size of all segments of the log in bytes.
Throws:
IllegalStateException - If the log is not open.

public long length()
Returns the number of entries in the log.
The length is the total number of entries represented by the log on disk. This includes entries
that have been compacted from the log. So, in that sense, the length represents the total range of indexes.
Throws:
IllegalStateException - If the log is not open.

public long firstIndex()
Returns the log's current first index.
If no entries have been written to the log then the first index will be 0. If the log contains entries then
the first index will be 1.
Returns:
The log's current first index, or 0 if the log is empty.
Throws:
IllegalStateException - If the log is not open.

public long lastIndex()
Returns the index of the last entry in the log.
If no entries have been written to the log then the last index will be 0.
Returns:
The index of the last entry in the log, or 0 if the log is empty.
Throws:
IllegalStateException - If the log is not open.

public long nextIndex()
Returns the next index in the log.

public <T extends io.atomix.copycat.server.storage.entry.Entry<T>> T create(Class<T> type)
Creates a new log entry.
Users should ensure that the returned Entry is closed once the write is complete. Closing the entry will
result in its contents being persisted to the log. Only a single Entry instance may be open via this
method at any given time.
Parameters:
type - The entry type.
Throws:
IllegalStateException - If the log is not open.
NullPointerException - If the type is null.

public long append(io.atomix.copycat.server.storage.entry.Entry entry)
Appends an entry to the log.
Parameters:
entry - The entry to append.
Throws:
IllegalStateException - If the log is not open.
NullPointerException - If entry is null.
IndexOutOfBoundsException - If the entry's index does not match the expected next log index.

public long term(long index)
Returns the term for the entry at the given index.
This method provides a more efficient means of reading an entry term without deserializing the entire entry. Servers should use this method when performing consistency checks that don't require reading the full entry object. Terms can typically be read in O(1) time with no disk access on segments that haven't been compacted.
If the given index is outside of the bounds of the log then an IndexOutOfBoundsException will be thrown. If
the entry at the given index has been compacted then the returned entry will be null.
Parameters:
index - The index for which to return the term.

public <T extends io.atomix.copycat.server.storage.entry.Entry> T get(long index)
Gets an entry from the log at the given index.
If the given index is outside of the bounds of the log then an IndexOutOfBoundsException will be thrown. If
the entry at the given index has been compacted then the returned entry will be null.
Entries returned by this method are pooled and reference counted.
In order to ensure the entry is released back to the internal entry pool call Entry.close() or load the
entry in a try-with-resources statement.
    try (RaftEntry entry = log.get(123)) {
      // Do some stuff...
    }
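Parameters:
index - The index of the entry to get.
Returns:
The entry at the given index, or null if the entry doesn't exist.
Throws:
IllegalStateException - If the log is not open.
IndexOutOfBoundsException - If the given index is not within the bounds of the log.

public boolean contains(long index)
Returns a boolean value indicating whether the log contains a live entry at the given index.
Parameters:
index - The index to check.
Throws:
IllegalStateException - If the log is not open.

public Log release(long index)
Releases the entry at the given index.
Parameters:
index - The index of the entry to release.
Throws:
IllegalStateException - If the log is not open.
IndexOutOfBoundsException - If the given index is not within the bounds of the log.

public Log commit(long index)
Commits entries up to the given index to the log.
Parameters:
index - The index up to which to commit entries.
Throws:
IllegalStateException - If the log is not open.

public Log skip(long entries)
Skips the given number of entries.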
This method essentially advances the log's lastIndex() without writing any entries at the interim
indices. Note that calling Loggable#truncate() after skip() will result in the skipped entries
being partially or completely reverted.
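The interaction between skipping and truncating can be illustrated with a small in-memory model. The class below is hypothetical, and it assumes that truncating at an index discards everything after that index; that reading of truncate(long) is an assumption, not a statement about Copycat's implementation.

```java
import java.util.TreeMap;

/** Hypothetical in-memory model of skip and truncate; not Copycat's Log class. */
final class SkipTruncateSketch {
    private final TreeMap<Long, String> entries = new TreeMap<>();
    private long lastIndex; // 0 while the log is empty

    /** Appends a value at the next sequential index and returns that index. */
    long append(String value) {
        entries.put(++lastIndex, value);
        return lastIndex;
    }

    /** Advances the last index without writing entries at the skipped indices. */
    void skip(long count) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be at least 1");
        }
        lastIndex += count;
    }

    /** Assumed semantics: discards all entries and skipped indices after the given index. */
    void truncate(long index) {
        entries.tailMap(index, false).clear();
        lastIndex = Math.min(lastIndex, index);
    }

    long lastIndex() {
        return lastIndex;
    }

    /** Returns the value at the index, or null if it was skipped or truncated. */
    String get(long index) {
        return entries.get(index);
    }
}
```

In this model, appending one entry, skipping three, and appending another leaves entries at indices 1 and 5 only; truncating at index 2 then drops the last entry and reverts two of the three skipped indices.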
Parameters:
entries - The number of entries to skip.
Throws:
IllegalStateException - If the log is not open.
IllegalArgumentException - If the number of entries is less than 1.
IndexOutOfBoundsException - If skipping the given number of entries places the index out of the bounds of the log.

public Log truncate()
Truncates the log.
Throws:
IllegalStateException - If the log is not open.

public Log truncate(long index)
Truncates the log up to the given index.
Parameters:
index - The index at which to truncate the log.
Throws:
IllegalStateException - If the log is not open.
IndexOutOfBoundsException - If the given index is not within the bounds of the log.

public void flush()
Flushes the log to disk.
Throws:
IllegalStateException - If the log is not open.

public void close()
Closes the log.
Specified by:
close in interface AutoCloseable
Throws:
IllegalStateException - If the log is not open.

public boolean isClosed()
Returns a boolean value indicating whether the log is closed.
Copyright © 2013–2016. All rights reserved.