public class ImmutableExternalPrefixMap extends AbstractPrefixMap implements Serializable
See also: Serialized Form

| Modifier and Type | Field and Description |
|---|---|
| protected long[][] | blockOffset: A big array parallel to blockStart giving the offset in blocks in the dump file of the corresponding word in blockStart. |
| protected long | blockSize: The block size of this map (in bits). |
| protected long[][] | blockStart: The index of the first word in each block, plus an additional entry containing size. |
| protected Char2IntOpenHashMap | char2symbol: A map from characters to symbols of the coder. |
| protected Decoder | decoder: A decoder used to read data from the dump stream. |
| protected InputBitStream | dumpStream: A reference to the dump stream. |
| protected ImmutableBinaryTrie<CharSequence> | intervalApproximator: The in-memory data structure used to approximate intervals. |
| protected boolean | iteratorIsUsable: If true, the creation of the last DumpStreamIterator was not followed by a call to any get method. |
| protected boolean | selfContained: Whether this map is self-contained. |
| static long | serialVersionUID |
| protected int | size: The number of terms in this map. |
| static int | STD_BLOCK_SIZE: The standard block size (in bytes). |
| protected char[] | symbol2char: A map (given by an array) from symbols in the coder to characters. |
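The descriptions of blockStart and blockOffset suggest how a term index can be mapped to a position in the dump file. The following is a minimal sketch of that idea, not the class's actual code. It assumes plain long[] arrays instead of big arrays, a hypothetical block size of 8192 bits, and that a block's bit position is its offset in blocks times the block size in bits. The names BlockLookup, blockOf, and bitPosition are made up for illustration.

```java
import java.util.Arrays;

public class BlockLookup {
    /**
     * Returns the position in blockStart of the block holding the given term.
     * blockStart[i] is the index of the first term of the i-th block; the last
     * element is the total number of terms.
     */
    static int blockOf(long[] blockStart, long termIndex) {
        long size = blockStart[blockStart.length - 1];
        if (termIndex < 0 || termIndex >= size) {
            throw new IndexOutOfBoundsException("No such term: " + termIndex);
        }
        // Search only the real block starts, excluding the trailing size entry.
        int pos = Arrays.binarySearch(blockStart, 0, blockStart.length - 1, termIndex);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the block we
        // want is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Assumed layout: bit position = offset in blocks * block size in bits. */
    static long bitPosition(long[] blockStart, long[] blockOffset,
                            long blockSizeInBits, long termIndex) {
        return blockOffset[blockOf(blockStart, termIndex)] * blockSizeInBits;
    }

    public static void main(String[] args) {
        long[] blockStart = {0, 4, 9, 12};  // three blocks, 12 terms in total
        long[] blockOffset = {0, 1, 3};     // the jump from 1 to 3 models an overflow
        long blockSizeInBits = 8192;        // hypothetical value

        System.out.println(blockOf(blockStart, 5));                                   // 1
        System.out.println(bitPosition(blockStart, blockOffset, blockSizeInBits, 5)); // 8192
        System.out.println(bitPosition(blockStart, blockOffset, blockSizeInBits, 11)); // 24576
    }
}
```

The trailing size entry lets the last block be treated like any other when computing how many terms a block holds, which is presumably why blockStart carries it.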
Inherited fields: list, prefixMap, rangeMap, defRetValue

| Constructor and Description |
|---|
| ImmutableExternalPrefixMap(Iterable<? extends CharSequence> terms): Creates an external prefix map with block size STD_BLOCK_SIZE. |
| ImmutableExternalPrefixMap(Iterable<? extends CharSequence> terms, CharSequence dumpStreamFilename): Creates an external prefix map with block size STD_BLOCK_SIZE and specified dump stream. |
| ImmutableExternalPrefixMap(Iterable<? extends CharSequence> terms, int blockSizeInBytes): Creates an external prefix map with specified block size. |
| ImmutableExternalPrefixMap(Iterable<? extends CharSequence> terms, int blockSizeInBytes, CharSequence dumpStreamFilename): Creates an external prefix map with specified block size and dump stream. |

| Modifier and Type | Method and Description |
|---|---|
| boolean | containsKey(Object term) |
| LongInterval | getInterval(CharSequence prefix): Returns the range of strings having a given prefix. |
| long | getLong(Object o) |
| protected MutableString | getTerm(long index, MutableString s): Writes a string specified by index into a MutableString. |
| ObjectIterator<CharSequence> | iterator(): Returns an iterator over the map. |
| static void | main(String[] arg) |
| void | setDumpStream(CharSequence dumpStreamFilename): Sets the dump stream of this external prefix map to a given filename. |
| void | setDumpStream(InputBitStream dumpStream): Sets the dump stream of this external prefix map to a given input bit stream. |
| int | size() |
| long | size64() |
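
To illustrate what "the range of strings having a given prefix" means for getInterval, here is a small in-memory sketch over a sorted list. It does not use this class or its dump stream. The names PrefixInterval, interval, and lowerBound are hypothetical, and returning a two-element array (or null for an empty range) is a simplification of this sketch, whereas the real method returns a LongInterval.

```java
import java.util.Arrays;
import java.util.List;

public class PrefixInterval {
    /**
     * Returns {left, right}, the inclusive index range of the terms starting
     * with prefix, or null if no term has that prefix. The list must be sorted
     * according to String.compareTo.
     */
    static long[] interval(List<String> sortedTerms, String prefix) {
        int left = lowerBound(sortedTerms, prefix);
        if (left == sortedTerms.size() || !sortedTerms.get(left).startsWith(prefix)) {
            return null;
        }
        // Terms sharing a prefix are contiguous in sorted order, so we can
        // binary-search for the first index after left that lacks the prefix.
        int lo = left, hi = sortedTerms.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTerms.get(mid).startsWith(prefix)) lo = mid + 1;
            else hi = mid;
        }
        return new long[] {left, lo - 1};
    }

    /** Index of the first term that is greater than or equal to key. */
    static int lowerBound(List<String> terms, String key) {
        int lo = 0, hi = terms.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (terms.get(mid).compareTo(key) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("car", "card", "care", "cat", "dog");
        System.out.println(Arrays.toString(interval(terms, "car"))); // [0, 2]
        System.out.println(Arrays.toString(interval(terms, "d")));   // [4, 4]
        System.out.println(Arrays.toString(interval(terms, "x")));   // null
    }
}
```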
Inherited methods: list, prefixMap, rangeMap; clear, defaultReturnValue, get, put, remove, removeLong; clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait

public static final long serialVersionUID
public static final int STD_BLOCK_SIZE
protected final ImmutableBinaryTrie<CharSequence> intervalApproximator
protected final long blockSize
protected final Decoder decoder
protected final char[] symbol2char
protected final Char2IntOpenHashMap char2symbol
protected final int size
protected final long[][] blockStart
The index of the first word in each block, plus an additional entry containing size.
protected final long[][] blockOffset
A big array parallel to blockStart giving the offset in blocks in the dump file of the corresponding word in blockStart. If there are no overflows, this will just be an initial segment of the natural numbers, but overflows cause jumps.
protected final boolean selfContained
protected transient boolean iteratorIsUsable
If true, the creation of the last DumpStreamIterator was not followed by a call to any get method.
protected transient InputBitStream dumpStream
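
Because dumpStream is declared transient, it is not part of the serialized form, which is consistent with the setDumpStream methods on this page being meant for use after deserialisation. The sketch below shows the general Java mechanism with standard library classes only. ExternalMap, setDump, and roundTrip are hypothetical stand-ins, and a plain InputStream replaces InputBitStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientStreamDemo {
    /** Hypothetical stand-in for a map that keeps its data in an external stream. */
    static class ExternalMap implements Serializable {
        private static final long serialVersionUID = 1L;
        final int size;
        transient InputStream dump; // skipped by serialization, null after readObject

        ExternalMap(int size, InputStream dump) {
            this.size = size;
            this.dump = dump;
        }

        /** Re-attaches the external data, closing the old stream if present. */
        void setDump(InputStream newDump) throws IOException {
            if (dump != null) dump.close();
            dump = newDump;
        }
    }

    /** Serializes and deserializes the map in memory. */
    static ExternalMap roundTrip(ExternalMap map) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(map);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (ExternalMap) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = {42};
        ExternalMap copy = roundTrip(new ExternalMap(3, new ByteArrayInputStream(data)));
        System.out.println(copy.size);         // 3
        System.out.println(copy.dump == null); // true
        copy.setDump(new ByteArrayInputStream(data));
        System.out.println(copy.dump.read());  // 42
    }
}
```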

public ImmutableExternalPrefixMap(Iterable<? extends CharSequence> terms, int blockSizeInBytes, CharSequence dumpStreamFilename) throws IOException
Creates an external prefix map with specified block size and dump stream.
This constructor does not assume that CharSequence instances returned by terms.iterator() will be distinct. Thus, it can be safely used with FileLinesCollection.
Parameters:
terms - an iterable whose iterator will enumerate the terms for the map in lexicographical order.
blockSizeInBytes - the block size (in bytes).
dumpStreamFilename - the name of the dump stream, or null for a self-contained map.
Throws:
IOException

public ImmutableExternalPrefixMap(Iterable<? extends CharSequence> terms, CharSequence dumpStreamFilename) throws IOException
Creates an external prefix map with block size STD_BLOCK_SIZE and specified dump stream.
This constructor does not assume that CharSequence instances returned by terms.iterator() will be distinct. Thus, it can be safely used with FileLinesCollection.
Parameters:
terms - an iterable whose iterator will enumerate the terms for the map in lexicographical order.
dumpStreamFilename - the name of the dump stream, or null for a self-contained map.
Throws:
IOException

public ImmutableExternalPrefixMap(Iterable<? extends CharSequence> terms, int blockSizeInBytes) throws IOException
Creates an external prefix map with specified block size.
This constructor does not assume that CharSequence instances returned by terms.iterator() will be distinct. Thus, it can be safely used with FileLinesCollection.
Parameters:
terms - an iterable whose iterator will enumerate the terms for the map in lexicographical order.
blockSizeInBytes - the block size (in bytes).
Throws:
IOException

public ImmutableExternalPrefixMap(Iterable<? extends CharSequence> terms) throws IOException
Creates an external prefix map with block size STD_BLOCK_SIZE.
This constructor does not assume that strings returned by terms.iterator() will be distinct. Thus, it can be safely used with FileLinesCollection.
Parameters:
terms - an iterable whose iterator will enumerate the terms for the map in lexicographical order.
Throws:
IOException

public void setDumpStream(CharSequence dumpStreamFilename) throws FileNotFoundException
This method sets the dump file used by this map, and should only be called after deserialisation, providing exactly the file generated at creation time. Essentially anything can happen if you do not follow the rules.
Note that this method will attempt to close the old stream, if present.
Parameters:
dumpStreamFilename - the name of the dump file.
Throws:
FileNotFoundException
See also:
setDumpStream(InputBitStream)

public void setDumpStream(InputBitStream dumpStream)
This method sets the dump file used by this map, and should only be called after deserialisation, providing a repositionable stream containing exactly the file generated at creation time. Essentially anything can happen if you do not follow the rules.
Using this method you can load an external prefix map into core memory, retaining the compactness of the data structure while gaining much more speed.
Note that this method will attempt to close the old stream, if present.
Parameters:
dumpStream - a repositionable input bit stream containing exactly the dump stream generated at creation time.
See also:
setDumpStream(CharSequence)

public LongInterval getInterval(CharSequence prefix)
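Description copied from class: AbstractPrefixMap
Returns the range of strings having a given prefix.
Specified by:
getInterval in class AbstractPrefixMap
Parameters:
prefix - a prefix.

protected MutableString getTerm(long index, MutableString s)
Description copied from class: AbstractPrefixMap
Writes a string specified by index into a MutableString.
Specified by:
getTerm in class AbstractPrefixMap
Parameters:
index - the index of a string.
s - a mutable string.
Returns:
string.

public boolean containsKey(Object term)
Specified by:
containsKey in interface Function<CharSequence,Long>

public long getLong(Object o)
Specified by:
getLong in interface Object2LongFunction<CharSequence>

public ObjectIterator<CharSequence> iterator()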
The iterator returned by this method scans the dump stream directly.
Note that the returned iterator uses the same stream as all get methods. Calling such methods while the iterator is being used will produce an IllegalStateException.
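
The interaction between the iterator and the get methods can be pictured with a small guard flag, in the spirit of the iteratorIsUsable field. This is only a sketch of the pattern under stated assumptions, not the class's implementation. SharedCursorList is hypothetical, an int cursor stands in for the position of the shared dump stream, and the exception is raised on the next use of the iterator after a get call.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class SharedCursorDemo {
    /** Hypothetical list whose iterator and get method share one cursor. */
    static class SharedCursorList implements Iterable<String> {
        private final List<String> terms;
        private int cursor;               // stands in for the shared stream position
        private boolean iteratorIsUsable; // cleared by any call to get

        SharedCursorList(List<String> terms) {
            this.terms = terms;
        }

        String get(int index) {
            iteratorIsUsable = false; // the shared cursor is about to move
            cursor = index;
            return terms.get(cursor);
        }

        @Override
        public Iterator<String> iterator() {
            cursor = 0;
            iteratorIsUsable = true;
            return new Iterator<String>() {
                @Override
                public boolean hasNext() {
                    check();
                    return cursor < terms.size();
                }

                @Override
                public String next() {
                    check();
                    if (cursor >= terms.size()) throw new NoSuchElementException();
                    return terms.get(cursor++);
                }

                // Fails if a get call moved the cursor since the iterator was created.
                private void check() {
                    if (!iteratorIsUsable) {
                        throw new IllegalStateException("get was called during iteration");
                    }
                }
            };
        }
    }

    public static void main(String[] args) {
        SharedCursorList list = new SharedCursorList(Arrays.asList("a", "b", "c"));
        Iterator<String> it = list.iterator();
        System.out.println(it.next()); // a
        list.get(2);                   // invalidates the iterator
        try {
            it.next();
        } catch (IllegalStateException e) {
            System.out.println("iterator is no longer usable");
        }
    }
}
```

A single boolean is enough here because only the most recently created iterator can be valid; a get call simply marks it stale instead of trying to restore the cursor.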
public int size()
public static void main(String[] arg) throws ClassNotFoundException, IOException, JSAPException, SecurityException, NoSuchMethodException