public class DataEraser extends StoppableThread implements EnvConfigObserver
RepParams.MIN_VLSN_INDEX_SIZE),
- When the files do not contain any VLSNs and therefore the VLSNIndex
cannot be truncated at any specific point (this is a corner case).
- A cycle is suspended because the JE Environment is closed and not
all files were erased. The cycle will be resumed at the next
Environment open, if the cycle end time has not passed.
- An incomplete cycle is resumed at startup. The cycle was incomplete
at the last Environment close, and its end time has not yet passed.
- An incomplete cycle cannot be resumed at startup because its end
time has now passed. A fresh cycle is then started.
All messages start with ERASER and include the cycle start/end times, the
files processed, and the files that are yet to be processed.
Stats
=====
Stat Group: Eraser
eraserCycleStart - Erasure cycle start time (UTC).
eraserCycleEnd - Erasure cycle end time (UTC).
eraserFilesRemaining - Number of files still to be processed in erasure
cycle.
eraserFilesErased - Number of files erased by overwriting obsolete entries.
eraserFilesDeleted - Number of reserved files deleted by the eraser.
eraserFilesAlreadyDeleted - Number of reserved files deleted coincidentally
by the cleaner.
eraserFSyncs - Number of fsyncs performed by the eraser.
eraserReads - Number of file reads performed by the eraser.
eraserReadBytes - Number of bytes read by the eraser.
eraserWrites - Number of file writes performed by the eraser.
eraserWriteBytes - Number of bytes written by the eraser.
Erasing data at the end of the log
==================================
When writing stops, the user data at the tail end of the log may need
erasure at some point. This would happen if a table is dropped and then
there is no writing for a long time (or very little writing). This is a
corner case and not one we expect in production, but it can happen in
production and it certainly happens in tests.
Currently JE does not allow cleaning of any file in the recovery interval.
This is to ensure that recovery works, of course. The same restriction
probably applies to erasure. If we were to treat any slot referencing an
erased LSN as if the slot were deleted, we might get recovery to work.
But this would be complex to analyze and test thoroughly, especially for IN
replay. Therefore if we need to erase items in the recovery interval, we
would need to detect this situation and force a checkpoint before erasing.
In addition, erasing all entries at the end of the log would mean that
the VLSNIndex must be completely truncated, i.e., made empty. This
means the node could not be used as a feeder when functioning as a master,
and could not perform syncup when functioning as a replica. Rather than
emptying the VLSNIndex completely we could log a benign replicated entry
so there is at least one entry. But that doesn't address the broader
problem of syncup. Perhaps this doesn't matter when writing has stopped
for an extended period because the replicas will be up-to-date. And
perhaps network restore is fine for other unusual cases. But it is a risk.
Right now we always leave at least 1,000 VLSNs in the VLSNIndex to guard
against problems, but we don't really know what problems might arise. So
it would take some work to figure this out and test it.
Therefore, we simply do not erase obsolete data when it appears at the tail
end of the log. One way to explain this is to say that we can't erase
data at the very end of the transaction log, because this would prevent
recovery in certain situations. We do log a warning message in this
situation.
Aborting Erasure
================
Erasure of a file is aborted in the following cases.
- The file is needed for a backup or network restore. In both cases, it is
DbBackup.startBackup that aborts the erasure. The aborted file will then
be protected and won't be selected again for erasure until the backup or
network restore is finished.
- The extinction filter returns MAYBE_EXTINCT. This is due to a temporary
situation at startup time, while NoSQL DB (or another app) has not yet
initialized its metadata (table metadata in the case of NoSQL DB). It
will be retried repeatedly until this no longer occurs.
Discarded idea: We could add a way to know that the filter is fully
initialized and the eraser thread could delay starting until then. But
we would still have to abort if MAYBE_EXTINCT is returned after that
point. So for simplicity we just do the abort.
Reserved Files, VLSNIndex, Cleaning
===================================
- Reserved files are an exception. Because an erased file cannot be used for
replication, reserved files are deleted rather than erased.
Therefore, reserved files are never older than N*2 days.
- Before erasing a file covered by the VLSNIndex, we truncate the
VLSNIndex to remove the file from its range. Because the VLSNIndex range
never retreats (only advances), files protected by the VLSNIndex will
never have been erased.
- However, other files may be erased and subsequently become protected. This
is OK, because such protection only needs to guarantee that files are not
changed while they are protected. These include:
- Backup and network restore.
- DiskOrderedCursor and Database.count.
- Cleaning is not coordinated with erasure (except for the treatment of
reserved files discussed above). A given file may be erased and cleaned
concurrently, and this should not cause problems. This is a waste of
resources, but since it is unlikely we do not try to detect it or
optimize for it. Such coordination would add a lot of complexity.
Throttling
==========
Throttling is performed by estimating the total amount of work in the
cycle at the beginning of the cycle, dividing up the total cycle time
by this work amount, and throttling (waiting) at various points in
order to spread the work out fairly evenly over the cycle period.
The WorkThrottle class calculates the wait time based on work
done so far.
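The proportional-wait calculation described above can be sketched roughly as follows. The class name, fields, and method signature here are illustrative assumptions, not JE's actual WorkThrottle implementation:

```java
// Hypothetical sketch of a WorkThrottle-style calculation: given a total
// work estimate and a cycle duration, compute how long to wait so that
// elapsed time stays proportional to the fraction of work completed.
public class WorkThrottleSketch {
    private final long totalWork; // estimated work units for the cycle
    private final long cycleMs;   // total cycle duration in millis
    private final long startMs;   // cycle start time in millis
    private long workDone;

    public WorkThrottleSketch(long totalWork, long cycleMs, long startMs) {
        this.totalWork = totalWork;
        this.cycleMs = cycleMs;
        this.startMs = startMs;
    }

    /**
     * Records completed work units and returns how long to wait (in
     * millis) so the work is spread evenly over the cycle period.
     */
    public long addWork(long units, long nowMs) {
        workDone += units;
        // Time by which 'workDone' units *should* have been finished.
        long targetElapsed =
            (long) ((double) workDone / totalWork * cycleMs);
        long actualElapsed = nowMs - startMs;
        // If we are ahead of schedule, wait out the difference;
        // if behind, do not try to catch up (return zero).
        return Math.max(0, targetElapsed - actualElapsed);
    }
}
```

Note that when the caller is behind schedule the sketch returns a zero wait rather than speeding up, matching the no-catch-up policy discussed below.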
It is possible that we cannot complete the work within the cycle period,
for example, because a node is down for an extended period. In such
cases we do _not_ attempt to catch up by speeding up the rate of work,
since this could cause performance spikes. Instead we intentionally
overestimate the amount of work, leaving spare time to account for such
problems. In the end, if erasure of all selected files cannot be
completed by the end of the cycle, the cycle will be aborted and a new
cycle is started with a recalculated set of files. This is acceptable
behavior in unusual conditions.
Note: Such problems could be addressed differently by integrating
with the TaskCoordinator. In that case we would use a different
approach entirely: we would simply work at the maximum rate allowed by
the TaskCoordinator. But for this to work well, other JE components would
also need to be integrated with the TaskCoordinator. So for now we simply
perform work at a fixed rate within each cycle.
Before the cycle starts we have to open each file to get its creation
time, which has a cost. Throttling for that one-time task is performed
separately in startCycle(). The remaining time in the cycle is also
allocated by startCycle(), which initializes cycleThrottle
and related fields. Work is divided as described below.
For each file to be erased there are several components of work:
1. We may have to read files, or parts of files, for two reasons that
are in addition to the file erasure process itself:
a. We may have to read the file to find its last VLSN, to determine
where to truncate the VLSNIndex. In the worst case scenario we have
to do this for every file, but it is much more likely that it will
only have to be done for a small fraction of the files, and only a
fraction (the end) of each file will normally be read. In addition,
each time we truncate the VLSNIndex we perform an fsync; however,
in the normal case we do this only once per cycle.
b. We read a file redundantly when erasure of a file is aborted and
restarted later. See Aborting Erasure above.
2. Read through the file and overwrite the type byte for each entry
that should be erased. Since we know the length of each file, the read
cost is known. We don't know how many erasures will take place, but the
worst case is that every entry is erased.
Note that reserved files are simply deleted rather than erased, which
is cheaper than erasure. So when there are reserved files, we will
overestimate the amount of work.
3. Before any overwriting, at the time we determine that at least one entry
must be erased, touch the file and perform an fsync to ensure that the
lastModifiedTime is updated persistently. The fsync is assumed to be
expensive.
4. After overwriting all type bytes, perform a second fsync to make the
type changes persistent. The fsync is assumed to be expensive.
5. Overwrite the item in each entry that was erased. This cost is
unknown, but the worst case is that every entry is erased.
6. Perform the third and final fsync to make the erasure persistent. The
fsync is assumed to be expensive.
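The per-file sequence in steps 2 through 6 might look roughly like the following java.nio sketch. This is a simplified illustration only: the LOG_ERASED byte value, the offset bookkeeping, and the method names are assumptions, the scan that finds the entries to erase is omitted, and JE's real log format is more involved:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.FileTime;
import java.util.List;

public class EraseSequenceSketch {
    // Assumed placeholder value; not JE's actual LOG_ERASED type code.
    static final byte LOG_ERASED_TYPE = (byte) 0xFF;

    /**
     * entryOffsets: file offsets of the type byte of each entry to erase.
     * itemRanges: {offset, length} pairs of the item data to zero out.
     */
    static void eraseFile(Path file, List<Long> entryOffsets,
                          List<long[]> itemRanges) throws IOException {
        try (FileChannel ch = FileChannel.open(
                file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Step 3: touch the file and fsync so that the updated
            // lastModifiedTime is persistent before any overwriting.
            Files.setLastModifiedTime(
                file, FileTime.fromMillis(System.currentTimeMillis()));
            ch.force(true);
            // Step 2: overwrite the type byte of each entry to erase.
            ByteBuffer type = ByteBuffer.wrap(new byte[] {LOG_ERASED_TYPE});
            for (long off : entryOffsets) {
                type.rewind();
                ch.write(type, off);
            }
            // Step 4: second fsync to make the type changes persistent.
            ch.force(true);
            // Step 5: zero out the item in each erased entry.
            for (long[] range : itemRanges) {
                ch.write(ByteBuffer.wrap(new byte[(int) range[1]]),
                         range[0]);
            }
            // Step 6: third and final fsync to make the erasure persistent.
            ch.force(true);
        }
    }
}
```

The three `force(true)` calls correspond to the three fsyncs assumed to be expensive in the work estimate below.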
Work units are defined as follows.
- For component 1 we assign a work unit to each byte read (the file
length). This is a very large overestimate and is intended to
account for processing delays, such as when a node is down.
- For component 2 we also assign a work unit to each byte read (the file
length). The overwrite of type bytes is variable and included in this
cost for simplicity.
- For components 3, 4, 5 and 6 together we also assign the file length as
a very rough estimate, and this work is divided between these components
as follows:
18% for step 3
16% for step 4
50% for step 5
16% for step 6
Therefore the total amount of work is simply three times the length of the
files to be erased.
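The arithmetic above reduces to a one-line estimate; the class and method names here are hypothetical:

```java
// Illustrative only: per the work-unit scheme above, each file
// contributes one work unit per byte for component 1 (VLSN scan),
// one for component 2 (read plus type-byte overwrite), and one
// shared by components 3-6 (fsyncs and item overwrites) -- i.e.,
// three times its length.
public final class EraserWorkEstimate {
    public static long totalWork(long[] fileLengths) {
        long sum = 0;
        for (long len : fileLengths) {
            sum += 3 * len;
        }
        return sum;
    }
}
```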
Other
=====
- To support network restore, before erasing a file we remove its cached
info (which includes a checksum) from the response cache in LogFileFeeder.
See RepImpl.clearedCachedFileChecksum(String).
- Prohibiting erasure of protected files prevents changes to the file while
it is being read by the protecting entity. Even so, if we read from the
log buffer cache or tip cache, we could also get different results than
reading directly from the file. To be safe, erasure could clear any
cached data for the file. But is this necessary? No, because reading
from the tip cache and log buffer cache is done only by feeders, and
the files read by feeders are protected from erasure.
- BtreeVerifier, when configured to read LNs, checks for LSN references to
erased entries since it simply reads the LNs via a cursor. If the LN has
been erased and it is not extinct, the environment is invalidated as
usual.
- The checksum for the LOG_ERASED type cannot be verified, and its
LogEntryHeader.hasChecksum() method will return false. Because
an entry may be erased in the middle of checksum calculation, the header
may have to be re-read from disk in rare cases. See
ChecksumValidator.validate(long, long).
- The LOG_ERASED type is not counted as an LN or IN, which could throw off
utilization counts. This may only impact tests and debugging. A thorough
analysis of this issue has not been performed.

Constructor Summary
===================
| Constructor and Description |
|---|
| DataEraser(EnvironmentImpl envImpl) |

Method Summary
==============
| Modifier and Type | Method and Description |
|---|---|
| void | abortErase(FileProtector.ProtectedFileSet fileSet) - Used to ensure that erasure of a file stops before copying that file during a backup or network restore. |
| void | envConfigUpdate(DbConfigManager configManager, EnvironmentMutableConfig ignore) - Notifies the observer that one or more mutable properties have been changed. |
| Logger | getLogger() |
| int | initiateSoftShutdown() - Threads that use shutdownThread() must define this method. |
| boolean | isEntryErased(long lsn) - Returns whether the log entry at the given LSN has been erased. |
| StatGroup | loadStats(StatsConfig config) |
| void | run() |
| void | startThread() |

Constructor Detail
==================
public DataEraser(EnvironmentImpl envImpl)
Method Detail
=============
public void envConfigUpdate(DbConfigManager configManager,
                            EnvironmentMutableConfig ignore)
    Notifies the observer that one or more mutable properties have been
    changed.
    envConfigUpdate in interface EnvConfigObserver

public StatGroup loadStats(StatsConfig config)

public Logger getLogger()
    getLogger in class StoppableThread

public int initiateSoftShutdown()
    Threads that use shutdownThread() must define this method.
    initiateSoftShutdown in class StoppableThread

public void startThread()

public void abortErase(FileProtector.ProtectedFileSet fileSet)
    Used to ensure that erasure of a file stops before copying that file
    during a backup or network restore.
    Parameters:
        fileSet - erasure of the current file is aborted if the current
        file is protected by this protected file set. The given fileSet
        must be protected at the time this method is called.
    Throws:
        EraserAbortException - if we can't abort erasure of a target
        file within EnvironmentParams.ERASE_ABORT_TIMEOUT. The timeout
        is long, so this should not happen unless the eraser thread is
        wedged or starved.

public boolean isEntryErased(long lsn)
    Returns whether the log entry at the given LSN has been erased.
Copyright © 2024. All rights reserved.