- All Implemented Interfaces:
- EnvConfigObserver
public class Evictor
extends Object
implements EnvConfigObserver
Overview
--------
The Evictor is responsible for managing the JE cache. The cache is
actually a collection of in-memory btree nodes, implemented by the
com.sleepycat.je.dbi.INList class. A subset of the nodes in the INList
are candidates for eviction. This subset is tracked in one or more
LRULists, which are maintained by the Evictor. When a node is evicted,
it is detached from its containing BTree and then removed from the INList
and from its containing LRUList. Once all references to an evicted node
are removed, it can be GC'd by the JVM.
The Evictor owns a pool of threads that are available to handle eviction
tasks. The eviction pool is a standard java.util.concurrent thread pool,
and can be mutably configured in terms of core threads, max threads, and
keepalive times.
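As an illustration of what "mutably configured" can mean for a standard java.util.concurrent pool, here is a minimal sketch. The class name EvictionPoolSketch, its methods, the bounded queue, and the discard policy are all assumptions made for the example; they are not taken from the JE source.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper; class and method names are illustrative, not JE's.
final class EvictionPoolSketch {
    private final ThreadPoolExecutor pool;

    EvictionPoolSketch(int coreThreads, int maxThreads, long keepAliveMs) {
        // Assumption: a tiny bounded queue plus a discard policy, so that a
        // saturated pool drops extra eviction requests instead of blocking.
        pool = new ThreadPoolExecutor(
            coreThreads, maxThreads, keepAliveMs, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(1),
            new ThreadPoolExecutor.DiscardPolicy());
    }

    // Resize a live pool. The setters reject core > max, so the order of the
    // two calls depends on whether the pool is growing or shrinking.
    void reconfigure(int coreThreads, int maxThreads, long keepAliveMs) {
        if (maxThreads >= pool.getMaximumPoolSize()) {
            pool.setMaximumPoolSize(maxThreads);
            pool.setCorePoolSize(coreThreads);
        } else {
            pool.setCorePoolSize(coreThreads);
            pool.setMaximumPoolSize(maxThreads);
        }
        pool.setKeepAliveTime(keepAliveMs, TimeUnit.MILLISECONDS);
    }

    int coreThreads() { return pool.getCorePoolSize(); }
    int maxThreads() { return pool.getMaximumPoolSize(); }
    long keepAliveMs() { return pool.getKeepAliveTime(TimeUnit.MILLISECONDS); }
    void shutdown() { pool.shutdown(); }
}
```

The sketch assumes callers always pass coreThreads <= maxThreads; ThreadPoolExecutor throws IllegalArgumentException otherwise.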
Eviction is carried out by three types of threads:
1. An application thread, in the course of doing critical eviction.
2. Daemon threads, such as the cleaner or INCompressor, in the course of
doing their respective duties.
3. Eviction pool threads.
Memory consumption is tracked by the MemoryBudget. The Arbiter, which is
also owned by the Evictor, is used to query the MemoryBudget and determine
whether eviction is actually needed, and if so, how many bytes should be
evicted by an evicting thread.
Multiple threads can do eviction concurrently. As a result, it's important
that eviction is both thread safe and as parallel as possible. Memory
thresholds are generally accounted for in an unsynchronized fashion, and are
seen as advisory. The only point of true synchronization is around the
selection of a node for eviction. The act of eviction itself can be done
concurrently.
The eviction method is not reentrant, and a simple concurrent hash map
of threads is used to prevent recursive calls.
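One way such a guard could look is sketched below. ReentrancyGuardSketch and runOnce are hypothetical names, and the sketch uses a concurrent set view (ConcurrentHashMap.newKeySet) rather than whatever map layout the Evictor actually uses.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard; not the Evictor's actual fields or method names.
final class ReentrancyGuardSketch {
    // Threads that are currently inside an eviction run.
    private final Set<Thread> active = ConcurrentHashMap.newKeySet();

    // Runs the task unless the calling thread is already evicting.
    // Returns false when the call was recursive and therefore skipped.
    boolean runOnce(Runnable evictionRun) {
        Thread self = Thread.currentThread();
        if (!active.add(self)) {
            return false;
        }
        try {
            evictionRun.run();
            return true;
        } finally {
            active.remove(self);
        }
    }
}
```

Because each thread only ever adds and removes itself, no locking beyond the concurrent set is needed.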
Details on the implementation of the LRU-based eviction policy
--------------------------------------------------------------
------------------
Data structures
------------------
An LRU eviction policy is approximated by one or more LRULists. An LRUList
is a doubly linked list consisting of BTree nodes. If a node participates
in an LRUList, then whenever it is accessed, it moves to the "back" of the
list. When eviction is needed, the evictor evicts the nodes at the "front"
of the LRULists.
An LRUList is implemented as 2 IN references: a "front" ref pointing to the
IN at the front of the list and a "back" ref, pointing to the IN at the back
of the list. In addition, each IN has "nextLRUNode" and "prevLRUNode" refs
for participating in an LRUList. This implementation works because an IN can
belong to at most 1 LRUList at a time. Furthermore, it is the responsibility
of the Evictor to know which LRUList a node belongs to at any given time
(more on this below). As a result, each LRUList can assume that a node will
either not be in any list at all, or will belong to "this" list. This way,
membership of a node in an LRUList can be tested by just checking that
either the nextLRUNode or prevLRUNode field of the node is non-null.
The operations on an LRUList are:
- addBack(IN) :
Insert an IN at the back of the list. Assert that the node does not belong
to an LRUList already.
- addFront(IN) :
Insert an IN at the front of the list. Assert that the node does not belong
to an LRUList already.
- moveBack(IN) :
Move an IN to the back of the list, if it is in the list already. Noop
if the node is not in the list.
- moveFront(IN) :
Move an IN to the front of the list, if it is in the list already. Noop
if the node is not in the list.
- removeFront() :
Remove the IN at the front of the list and return it to the caller.
Return null if the list is empty.
- remove(IN) :
Remove the IN from the list, if it is there. Return true if the node was
in the list, false otherwise.
- contains(IN):
Return true if the node is contained in the list, false otherwise.
All of the above methods are synchronized on the LRUList object. This may
create a synchronization bottleneck. To alleviate this, the Evictor uses
multiple LRULists, which taken together comprise a logical LRU list, called
an LRUSet. The number of LRULists per LRUSet (numLRULists) is fixed and
determined by a config parameter (max of 64). The LRULists are stored in
an array whose length is numLRULists.
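The LRUList operations listed above might be sketched as follows. LruNode stands in for an IN and models only the two link fields. One detail is an assumption of this sketch: the front node's prev link and the back node's next link point to the node itself, so that a node in a single-element list still has non-null links and the membership test described above keeps working. The sketch also throws where the text says "assert".

```java
// Hypothetical stand-in for an IN; only the LRU links are modeled.
final class LruNode {
    LruNode nextLRUNode;
    LruNode prevLRUNode;
}

final class LRUListSketch {
    private LruNode front;
    private LruNode back;
    private int size;

    synchronized boolean contains(LruNode n) {
        return n.nextLRUNode != null;
    }

    synchronized void addBack(LruNode n) {
        if (contains(n)) throw new IllegalStateException("already in an LRUList");
        n.nextLRUNode = n;                 // back node links to itself
        if (back == null) {
            n.prevLRUNode = n;             // sole node is also the front
            front = n;
        } else {
            n.prevLRUNode = back;
            back.nextLRUNode = n;
        }
        back = n;
        size++;
    }

    synchronized void addFront(LruNode n) {
        if (contains(n)) throw new IllegalStateException("already in an LRUList");
        n.prevLRUNode = n;                 // front node links to itself
        if (front == null) {
            n.nextLRUNode = n;             // sole node is also the back
            back = n;
        } else {
            n.nextLRUNode = front;
            front.prevLRUNode = n;
        }
        front = n;
        size++;
    }

    synchronized boolean remove(LruNode n) {
        if (!contains(n)) return false;
        LruNode prev = n.prevLRUNode;
        LruNode next = n.nextLRUNode;
        boolean isFront = (prev == n);
        boolean isBack = (next == n);
        if (isFront && isBack) {
            front = null;
            back = null;
        } else if (isFront) {
            front = next;
            next.prevLRUNode = next;
        } else if (isBack) {
            back = prev;
            prev.nextLRUNode = prev;
        } else {
            prev.nextLRUNode = next;
            next.prevLRUNode = prev;
        }
        n.nextLRUNode = null;
        n.prevLRUNode = null;
        size--;
        return true;
    }

    synchronized LruNode removeFront() {
        LruNode n = front;
        if (n != null) remove(n);
        return n;
    }

    // Both moves are noops when the node is not in the list.
    synchronized void moveBack(LruNode n) { if (remove(n)) addBack(n); }
    synchronized void moveFront(LruNode n) { if (remove(n)) addFront(n); }
    synchronized int size() { return size; }
}
```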
The Evictor actually maintains 2 LRUSets: priority-1 and priority-2.
Within an LRUSet, the nodeId is used to assign a node to an LRUList: a
node with id N goes to the (N % numLRULists)-th list. In addition, each
node has a flag (isInPri2LRU) to identify which LRUSet it belongs to.
This way, the Evictor knows which LRUList a node should belong to, and
accesses the appropriate LRUList instance when it needs to add/remove/move
a node within the LRU.
Access to the isInPri2LRU flag is synchronized via the SH/EX node latch.
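The routing rule above (flag selects the LRUSet, nodeId modulo numLRULists selects the list) can be sketched in a few lines. LruSetRoutingSketch and its methods are hypothetical; the sketch assumes non-negative node ids and at least one list per set.

```java
// Hypothetical helper; illustrates the routing rule only.
final class LruSetRoutingSketch {
    private final int numLRULists;

    LruSetRoutingSketch(int numLRULists) {
        // The upper bound of 64 comes from the text; the lower bound is assumed.
        if (numLRULists < 1 || numLRULists > 64) {
            throw new IllegalArgumentException("numLRULists must be in 1..64");
        }
        this.numLRULists = numLRULists;
    }

    // A node with id N goes to the (N % numLRULists)-th list.
    int listIndex(long nodeId) {
        return (int) (nodeId % numLRULists);
    }

    // The isInPri2LRU flag selects the set; the id selects the list in it.
    String locate(long nodeId, boolean isInPri2LRU) {
        return (isInPri2LRU ? "pri2" : "pri1") + "[" + listIndex(nodeId) + "]";
    }
}
```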
When there is no off-heap cache configured, the priority-1 LRU is the
"mixed" one and the priority-2 LRU is the "dirty" one. When there is an
off-heap cache configured, the priority-1 LRU is the "normal" one and the
priority-2 LRU is the "level-2" one.
Justification for the mixed and dirty LRUSets: We would like to keep dirty
INs in memory as much as possible to achieve "write absorption". Ideally,
dirty INs should be logged by the checkpointer only. So, we would like to
have the option in the Evictor to choose a clean IN to evict over a dirty
IN, even if the dirty IN is colder than the clean IN. In this mode, having
a single LRUSet will not perform very well in the situation when most (or
a lot) of the INs are dirty (because each time we get a dirty IN from an
LRUList, we would have to put it back to the list and try another IN until
we find a clean one, thus spending a lot of CPU time trying to select an
eviction target).
Justification for the normal and level-2 LRUSets: With an off-heap cache,
if level-2 INs were not treated specially, the main cache evictor may run
out of space and (according to LRU) evict a level 2 IN, even though the IN
references off-heap BINs (which will also be evicted). The problem is that
we really don't want to evict the off-heap BINs (or their LNs) when the
off-heap cache is not full. Therefore we only evict level-2 INs with
off-heap children when there are no other nodes that can be evicted. A
level-2 IN is moved to the priority-2 LRUSet when it is encountered by the
evictor in the priority-1 LRUSet.
Within each LRUSet, picking an LRUList to evict from is done in a round-
robin fashion. To this end, the Evictor maintains 2 int counters:
nextPri1LRUList and nextPri2LRUList. To evict from the priority-1 LRUSet, an
evicting thread picks the (nextPri1LRUList % numLRULists)-th list, and
then increments nextPri1LRUList. Similarly, to evict from the priority-2
LRUSet, an evicting thread picks the (nextPri2LRUList % numLRULists)-th
list, and then increments nextPri2LRUList. This does not have to be done in
a synchronized way.
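A minimal sketch of the round-robin pick for one LRUSet follows. The counter is a plain unsynchronized int, as described above: a lost update under contention only means two threads pick the same list once. Using Math.floorMod, so that the index stays valid after the counter overflows, is a choice made for this sketch and is not taken from the JE source.

```java
// Hypothetical helper; models the priority-1 counter only.
final class RoundRobinSketch {
    private final int numLRULists;
    private int nextPri1LRUList;   // deliberately not synchronized

    RoundRobinSketch(int numLRULists, int initialCounter) {
        this.numLRULists = numLRULists;
        this.nextPri1LRUList = initialCounter;
    }

    // Returns the index of the list to evict from, then advances the counter.
    int pickPri1List() {
        int index = Math.floorMod(nextPri1LRUList, numLRULists);
        nextPri1LRUList++;
        return index;
    }
}
```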
A new flag (called hasCachedChildren) is added to each IN to indicate
whether the IN has cached children or not. This flag is used and maintained
for upper INs (UINs) only. The need for this flag is explained below.
Access to this flag is synchronized via the SH/EX node latch.
---------------------------------------------------------------------------
LRUSet management: adding/removing/moving INs in/out of/within the LRUSets
---------------------------------------------------------------------------
We don't want to track upper IN (UIN) nodes that have cached children.
There are 2 reasons for this: (a) we cannot evict UINs with cached children
(the children must be evicted first) and (b) UINs will normally have high
access rate, and would add a lot of CPU overhead if they were tracked.
The hasCachedChildren flag is used as a quick way to determine whether a
UIN has cached children or not.
Adding a node to the LRU.
-------------------------
An IN N is added to an LRUSet via one of the following Evictor methods:
addBack(IN), addFront(IN), pri2AddBack(IN), or pri2AddFront(IN). The
first 2 add the node to the priority-1 LRUSet and set its isInPri2LRU flag
to false. The last 2 add the node to the priority-2 LRUSet and set its
isInPri2LRU flag to true.
Note: DINs and DBINs are never added to the LRU.
A node N is added to the LRU in the following situations:
1. N is fetched into memory from the log. Evictor.addBack(N) is called
inside IN.postfetchInit() (just before N is connected to its parent).
2. N is a brand new node created during a split, and either N is a BIN or
N does not get any cached children from its split sibling.
Evictor.addFront(N) is called if N is a BIN and the cachemode is
MAKE_COLD or EVICT_BIN. Otherwise, Evictor.addBack(N) is called.
3. N is a UIN that is being split, and before the split it had cached
children, but all its cached children have now moved to its newly
created sibling. Evictor.addBack(N) is called in this case.
4. N is a UIN that loses its last cached child (either because the child is
evicted or it is deleted). Evictor.addBack(N) is called inside
IN.setTarget() if the target is null, N is a UIN, N's hasCachedChildren
flag is true, and, after the target is set to null, N has no remaining
cached children.
5. N is the 1st BIN in a brand new tree. In this case, Evictor.addBack(N)
is called inside Tree.findBinForInsert().
6. N is a node visited during IN.rebuildINList() and N is either a BIN or
a UIN with no cached children.
7. An evicting thread T removes N from the LRU, but after T EX-latches N,
it determines that N is not evictable or should not be evicted, and
should be put back in the LRU. T puts N back to the LRU using one of
the above 4 methods (for details, read about the eviction processing
below), but ONLY IF (a) N is still in the INList, and (b) N is not in
the LRU already.
Case (b) can happen if N is a UIN and after T removed N from the LRU
but before T could latch N, another thread T1 added a child to N and
removed that child. Thus, by item 4 above, T1 adds N back to the LRU.
Furthermore, since N is now back in the LRU, case (a) can now happen
as well if another thread can evict N before T latches it.
8. When the checkpointer (or any other thread/operation) cleans a dirty IN,
it must move it from the priority-2 LRUSet (if there) to the priority-1
one. This is done via the Evictor.moveToPri1LRU(N) method: If the
isInPri2LRU flag of N is true, LRUList.remove(N) is called to remove
the node from the priority-2 LRUSet. If N was indeed in the priority-2
LRUSet (i.e., LRUList.remove() returns true), addBack(N) is called to
put it in the priority-1 LRUSet.
By moving N to the priority-1 LRUSet only after atomically removing it
from the priority-2 LRUSet and checking that it was indeed there, we
prevent N from being added into the LRU if N has been or would be removed
from the LRU by a concurrently running evicting thread.
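The remove-then-add pattern of case 8 can be sketched as below. Pri1MoveSketch models each LRUSet as a single deque guarded by one lock, which is a simplification; the nested IN class and the helper methods are hypothetical. The point of the sketch is that the node is added to priority-1 only when the removal from priority-2 actually succeeded.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model: one deque per LRUSet, one lock for both.
final class Pri1MoveSketch {
    static final class IN {
        boolean isInPri2LRU;
    }

    private final Deque<IN> pri1 = new ArrayDeque<>();
    private final Deque<IN> pri2 = new ArrayDeque<>();

    synchronized void pri2AddBack(IN n) {
        n.isInPri2LRU = true;
        pri2.addLast(n);
    }

    // What an evicting thread does: take the front node without latching it.
    // The flag is left untouched, so it can be true for a node in no list.
    synchronized IN pri2RemoveFront() {
        return pri2.pollFirst();
    }

    // Called when a dirty node has been cleaned (caller holds the EX latch).
    synchronized void moveToPri1LRU(IN n) {
        if (n.isInPri2LRU && pri2.remove(n)) {
            n.isInPri2LRU = false;
            pri1.addLast(n);
        }
    }

    synchronized boolean inPri1(IN n) { return pri1.contains(n); }
    synchronized boolean inPri2(IN n) { return pri2.contains(n); }
}
```

If an evicting thread has already taken the node with pri2RemoveFront(), moveToPri1LRU() leaves it out of both sets, and the evicting thread decides its fate.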
In cases 2, 3, 4, 5, 7, and 8 N is EX-latched. In case 1, the node is not
latched, but it is inaccessible by any other threads because it is not
connected to its parent yet and the parent is EX-latched (but N has already
been inserted in the INList; can this create any problems?). In case
6 there is only one thread running. So, in all cases it's ok to set the
isInPri2LRU flag of the node.
Question: can a thread T try to add a node N, seen as a Java obj instance,
into the LRU, while N is already there? I believe not, and LRUList addBack()
and addFront() methods assert that this cannot happen. In cases 1, 2, and 5
above N is newly created node, so it cannot be in the LRU already. In cases
3 and 4, N is a UIN that has cached children, so it cannot be in the LRU.
In case 6 there is only 1 thread. Finally, in cases 7 and 8, T checks that
N is not in the LRU before attempting to add it (and the situation cannot
change between this check and the insertion into the LRU because N is
EX-latched).
Question: can a thread T try to add a node N, seen as a logical entity
represented by its nodeId, into the LRU, while N is already there?
Specifically, (a) can two Java instances, N1 and N2, of the same node
N exist in memory at the same time, and (b) while N1 is in the LRU, can
a thread T try to add N2 in the LRU? The answer to (a) is "yes", and as
far as I can think, the answer to (b) is "no", but there is no explicit
check in the code for this. Consider the following sequence of events:
Initially only N1 is in memory and in the LRU. An evicting thread T1
removes N1 from the LRU, thread T2 adds N1 in the LRU, thread T3 removes
N1 from the LRU and actually evicts it, thread T4 fetches N from the log,
thus creating instance N2 and adding N2 to the LRU, thread T1 finally
EX-latches N1 and has to decide what to do with it. The check in case
7a above makes sure that N1 will not go back to the LRU. In fact the
same check makes sure that N1 will not be evicted (i.e., logged, if
dirty). T1 will just skip N1, thus allowing it to be GCed.
Removing a node from the LRU
----------------------------
A node is removed from the LRU when it is selected as an eviction target
by an evicting thread. The thread chooses an LRUList to evict from
and calls removeFront() on it. The node is not latched when it is removed
from the LRU in this case. The evicting thread is going to EX-latch the
node shortly after the removal. But as explained earlier, between
the removal and the latching, another thread may put the node back to the
LRU, and as a result, another thread may also choose the same node for
eviction. The node may also be detached from the BTree, or its database
closed, or deleted.
A node may also be removed from the LRU by a non-evicting thread. This
is done via the Evictor.remove(IN) method. The method checks the node's
isInPri2LRU flag to determine which LRUSet the node belongs to (if any)
and then calls LRUList.remove(N). The node must be at least SH latched
when the method is called. The method is a noop if the node is not in the
LRU. The node may not belong to any LRUList, because it has been selected
for eviction by another thread (and thus removed from LRU), but the
evicting thread has not yet latched the node. There are 3 cases (listed
below) where Evictor.remove(N) is called. In the first two cases
Evictor.remove(N) is invoked from INList.removeInternal(N). This makes
sure that N is removed from the LRU whenever it is removed from the
INList (to guarantee that the nodes in the LRU are always a subset of
the nodes in the INList).
1. When a tree branch containing N gets detached from its tree. In this
case, INList.remove(N) is invoked inside accountForSubtreeRemoval() or
accountForDeferredWriteSubtreeRemoval().
2. When the database containing N gets deleted or truncated. In this case,
INList.iter.remove() is called via DatabaseImpl.startDbExtinction().
3. N is a UIN with no cached children (hasCachedChildren flag is false)
and a new child for N is fetched. The call to Evictor.remove(N) is
done inside IN.setTarget().
Moving a node within the LRU
----------------------------
A node N is moved within its containing LRUList (if any) via the Evictor
moveBack(IN) and moveFront(IN) methods. The methods check the isInPri2LRU
flag of the node to determine the LRUSet the node belongs to and then move
the node to the back or to the front of the LRUList. The node will be at
least SH latched when these methods are called. Normally, the IN will be
in an LRUList. However, it may not belong to any LRUList, because it has
been selected for eviction by another thread (and thus removed from LRU),
but the evicting thread has not yet EX-latched the node. In this case,
these methods are noops. The methods are called in the following
situations:
1. N is latched with cachemode DEFAULT, KEEP_HOT, or EVICT_LN and N is a
BIN or a UIN with no cached children (the hasCachedChildren flag is
used to check if the UIN has cached children, so we don't need to
iterate over all of the node's child entries). In this case,
Evictor.moveBack(N) is called.
2. N is latched with cachemode MAKE_COLD or EVICT_BIN and N is a BIN.
In this case, Evictor.moveFront(N) is called.
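The two situations above amount to a small decision table, sketched here. The Mode enum lists only the cache modes named in this section plus a catch-all OTHER value (an assumption of the sketch), and AccessMoveSketch and Move are hypothetical names.

```java
// Hypothetical decision helper for the move made when a node is latched.
final class AccessMoveSketch {
    enum Mode { DEFAULT, KEEP_HOT, EVICT_LN, MAKE_COLD, EVICT_BIN, OTHER }
    enum Move { BACK, FRONT, NONE }

    static Move onAccess(Mode mode, boolean isBIN, boolean hasCachedChildren) {
        switch (mode) {
            case DEFAULT:
            case KEEP_HOT:
            case EVICT_LN:
                // A BIN, or a UIN without cached children, moves to the back.
                return (isBIN || !hasCachedChildren) ? Move.BACK : Move.NONE;
            case MAKE_COLD:
            case EVICT_BIN:
                // Only BINs are moved to the front (evicted soonest).
                return isBIN ? Move.FRONT : Move.NONE;
            default:
                return Move.NONE;
        }
    }
}
```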
-------------------
Eviction Processing
-------------------
A thread can initiate eviction by invoking the Evictor.doEviction() method.
This method implements an "eviction run". An eviction run consists of a
number of "eviction passes", where each pass is given as input a maximum
number of bytes to evict. An eviction pass is implemented by the
Evictor.evictBatch() method.
Inside Evictor.evictBatch(), an evicting thread T:
1. Picks the priority-1 LRUSet initially as the "current" LRUSet to be
processed,
2. Initializes the max number of nodes to be processed per LRUSet to the
current size of the priority-1 LRUSet,
3. Executes the following loop:
3.1. Picks a non-empty LRUList from the current LRUSet in a round-robin
fashion, as explained earlier, and invokes LRUList.removeFront() to
remove the node N at the front of the list. N becomes the current
eviction target.
3.2. If the DB node N belongs to has been deleted or closed, skips this node,
i.e., leaves N outside the LRU and goes to 3.4.
3.3. Calls processTarget(N) (see below)
3.4. If the current LRUSet is the priority-1 one and the number of target nodes
processed reaches the max number allowed, the priority-2 LRUSet becomes
the current one, the max number of nodes to be processed per LRUSet is
set to the current size of the priority-2 LRUSet, and the number of
nodes processed is reset to 0.
3.5. Breaks the loop if the max number of bytes to evict during this pass
has been reached, or memConsumption is less than (maxMemory - M) (where
M is a config param), or the number of nodes that have been processed
in the current LRUSet reaches the max allowed.
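The control flow of an eviction pass can be sketched as follows, under several simplifying assumptions: each LRUSet is a single deque instead of an array of LRULists, the memory-threshold check of step 3.5 is omitted, skipped nodes count toward the per-set node limit, and processTarget is passed in as a function returning the bytes it evicted. EvictBatchSketch and Target are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.ToLongFunction;

// Hypothetical model of one eviction pass.
final class EvictBatchSketch {
    static final class Target {
        final long bytes;
        final boolean dbGone;   // DB deleted or closed
        Target(long bytes, boolean dbGone) {
            this.bytes = bytes;
            this.dbGone = dbGone;
        }
    }

    static long evictBatch(Deque<Target> pri1, Deque<Target> pri2,
                           long maxBytes,
                           ToLongFunction<Target> processTarget) {
        Deque<Target> current = pri1;            // step 1
        int maxNodes = pri1.size();              // step 2
        int processed = 0;
        long evicted = 0;
        for (;;) {                               // step 3
            Target n = current.pollFirst();      // 3.1
            if (n != null) {
                processed++;
                if (!n.dbGone) {                 // 3.2: skip, leave outside LRU
                    evicted += processTarget.applyAsLong(n);   // 3.3
                }
            }
            boolean exhausted = (n == null) || processed >= maxNodes;
            if (exhausted && current == pri1) {  // 3.4: switch to priority-2
                current = pri2;
                maxNodes = pri2.size();
                processed = 0;
                exhausted = (maxNodes == 0);
            }
            if (evicted >= maxBytes || exhausted) {   // 3.5
                break;
            }
        }
        return evicted;
    }
}
```

Because the node limit is fixed from the set's size at the time it becomes current, the loop terminates even when processTarget puts nodes back into the set being scanned.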
--------------------------
The processTarget() method
--------------------------
This method is called after a node N has been selected for eviction (and as
a result, removed from the LRU). The method EX-latches N and determines
whether it can/should really be evicted, and if not what is the appropriate
action to be taken by the evicting thread. Before returning, the method
unlatches N. Finally, it returns the number of bytes evicted (if any).
If a decision is taken to evict N or mutate it to a BINDelta, N must first
be unlatched and its parent must be searched within the tree. During this
search, many things can happen to the unlatched N, and as a result, after
the parent is found and N is relatched, processTarget() calls itself
recursively to re-consider all the possible actions for N.
Let T be an evicting thread running processTarget() to determine what to do
with a target node N. The following is the list of possible outcomes:
1. SKIP - Do nothing with N if:
(a) N is in the LRU. This can happen if N is a UIN and while it is
unlatched by T, other threads fetch one or more of N's children,
but then all of N's children are removed again, thus causing N to
be put back to the LRU.
(b) N is not in the INList. Given that N can be put back to the LRU while
it is unlatched by T, it can also be selected as an eviction target
by another thread and actually be evicted.
(c) N is a UIN with cached children. N could have acquired children
after the evicting thread removed it from the LRU, but before the
evicting thread could EX-latch it.
(d) N is the root of the DB naming tree or the DB mapping tree.
(e) N is dirty, but the DB is read-only.
(f) N's environment used a shared cache and the environment has been
closed or invalidated.
(g) If a decision was taken to evict or mutate N, but the tree search
(using N's keyId) to find N's parent, failed to find the parent, or
N itself. This can happen if during the search, N was evicted by
another thread, or a branch containing N was completely removed
from the tree.
2. PUT BACK - Put N to the back of the LRUSet it last belonged to, if:
(a) It is a BIN that was last accessed with KEEP_HOT cache mode.
(b) N has an entry with a NULL LSN and a null target.
3. PARTIAL EVICT - perform partial eviction on N, if none of the cases
listed above is true. Currently, partial eviction applies to BINs only
and involves the eviction (stripping) of evictable LNs. If a cached LN
is not evictable, the whole BIN is not evictable as well. Currently,
only MapLNs may be non-evictable (see MapLN.isEvictable()).
After partial eviction is performed the following outcomes are possible:
4. STRIPPED PUT BACK - Put N to the back of the LRUSet it last belonged to,
if partial eviction did evict any bytes, and N is not a BIN in EVICT_BIN
or MAKE_COLD cache mode.
5. PUT BACK - Put N to the back of the LRUSet it last belonged to, if
no bytes were stripped, but partial eviction determined that N is not
evictable.
6. MUTATE - Mutate N to a BINDelta, if none of the above apply and N is a
BIN that can be mutated.
7. MOVE DIRTY TO PRI-2 LRU - Move N to the front of the priority-2 LRUSet,
if none of the above apply and N is a dirty node that last belonged to
the priority-1 LRUSet, and a dirty LRUSet is used (meaning that no
off-heap cache is configured).
8. MOVE LEVEL-2 TO PRI-2 LRU - Move N to the front of the priority-2 LRUSet,
if none of the above apply and N is a level-2 node with off-heap BINs
that last belonged to the priority-1 LRUSet.
9. EVICT - Evict N if none of the above apply.
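The order in which the outcomes above are considered can be sketched as a single decision function. Everything here is hypothetical: the State fields are flattened booleans standing in for checks the real method performs on a latched node, the result of partial eviction is supplied as input instead of being computed, and case 1(g) (failed parent search) is left out.

```java
// Hypothetical decision table for the outcomes of processTarget().
final class ProcessTargetSketch {
    enum Outcome {
        SKIP, PUT_BACK, STRIPPED_PUT_BACK, MUTATE,
        MOVE_DIRTY_TO_PRI2, MOVE_LEVEL2_TO_PRI2, EVICT
    }

    static final class State {
        boolean inLRU, inINList = true, uinWithCachedChildren;
        boolean namingOrMappingRoot, dirty, dbReadOnly, sharedEnvClosed;
        boolean isBIN, keepHot, coldMode, hasNullLsnAndNullTarget;
        long strippedBytes;              // result of partial eviction
        boolean notEvictableAfterStrip;  // e.g. a non-evictable MapLN remains
        boolean canMutateToDelta;
        boolean lastInPri1 = true, dirtyLRUInUse = true;
        boolean level2WithOffHeapBINs;
    }

    static Outcome decide(State s) {
        // 1. SKIP
        if (s.inLRU || !s.inINList || s.uinWithCachedChildren
                || s.namingOrMappingRoot || (s.dirty && s.dbReadOnly)
                || s.sharedEnvClosed) {
            return Outcome.SKIP;
        }
        // 2. PUT BACK
        if ((s.isBIN && s.keepHot) || s.hasNullLsnAndNullTarget) {
            return Outcome.PUT_BACK;
        }
        // 3. Partial eviction has run; 4 and 5 look at its result.
        if (s.strippedBytes > 0 && !(s.isBIN && s.coldMode)) {
            return Outcome.STRIPPED_PUT_BACK;
        }
        if (s.strippedBytes == 0 && s.notEvictableAfterStrip) {
            return Outcome.PUT_BACK;
        }
        // 6. MUTATE
        if (s.isBIN && s.canMutateToDelta) {
            return Outcome.MUTATE;
        }
        // 7. and 8. Moves to the priority-2 LRUSet
        if (s.dirty && s.lastInPri1 && s.dirtyLRUInUse) {
            return Outcome.MOVE_DIRTY_TO_PRI2;
        }
        if (s.level2WithOffHeapBINs && s.lastInPri1) {
            return Outcome.MOVE_LEVEL2_TO_PRI2;
        }
        // 9. EVICT
        return Outcome.EVICT;
    }
}
```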
-------
TODO:
-------
1. Decide what to do about assertions (keep, remove, convert to JE
exceptions, convert to DEBUG-only expensive checks).