public abstract class FileVec extends ByteVec

Nested classes/interfaces inherited from class water.fvec.Vec:
Vec.CollectDomain, Vec.VectorGroup, Vec.Writer

| Modifier and Type | Field and Description |
|---|---|
| `int` | `_chunkSize` |
| `int` | `_log2ChkSize` |
| `static int` | `DFLT_CHUNK_SIZE`: Default Chunk size in bytes, useful when breaking up large arrays into "bite-sized" chunks. |
| `static int` | `DFLT_LOG2_CHUNK_SIZE`: Log-2 of Chunk size. |
Fields inherited from class water.fvec.Vec:
_espc, KEY_PREFIX_LEN, PERCENTILES, T_BAD, T_ENUM, T_NUM, T_STR, T_TIME, T_TIMELAST, T_UUID

| Modifier | Constructor and Description |
|---|---|
| `protected` | `FileVec(Key key, long len, byte be)` |
| `protected` | `FileVec(Key key, long len, byte be, int chunkSize)` |
| Modifier and Type | Method and Description |
|---|---|
| `long` | `byteSize()`: Size of vector data. |
| `static int` | `calcOptimalChunkSize(long totalSize, int numCols)`: Calculates safe and hopefully optimal chunk sizes. |
| `protected Value` | `chunkIdx(int cidx)`: Get a Chunk's Value by index. |
| `static long` | `chunkOffset(Key ckey)`: Convert a chunk-key to a file offset. |
| `long` | `length()`: Number of elements in the vector; returned as a `long` instead of an `int` because Vecs support more than 2^32 elements. |
| `int` | `nChunks()`: Number of chunks, returned as an `int`; the chunk count is limited by the max size of a Java `long[]`. |
| `boolean` | `writable()`: Default read/write behavior for Vecs. |
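The offset and count arithmetic implied by `chunkOffset()` and `nChunks()` can be sketched as below. This is an illustrative model, not the H2O source: the class and method bodies here are hypothetical, and only the power-of-two chunk size (the `_log2ChkSize`/`_chunkSize` pair) is taken from the fields documented above.

```java
// Hypothetical sketch of file-backed chunk geometry (not H2O source code).
// Assumes a power-of-two chunk size, as FileVec's _log2ChkSize field suggests.
public class ChunkGeometry {
    static final int LOG2_CHUNK_SIZE = 22;            // assumed: 2^22 = 4194304 bytes
    static final int CHUNK_SIZE = 1 << LOG2_CHUNK_SIZE;

    // File offset of the first byte of chunk `cidx`.
    static long chunkOffset(int cidx) {
        return (long) cidx << LOG2_CHUNK_SIZE;
    }

    // Number of chunks needed to cover `len` bytes (the last chunk may be short).
    static int nChunks(long len) {
        return (int) ((len + CHUNK_SIZE - 1) >> LOG2_CHUNK_SIZE);
    }

    public static void main(String[] args) {
        long len = 10_000_000L;                       // a ~10 MB file
        System.out.println("chunks = " + nChunks(len));              // 3
        System.out.println("offset of chunk 2 = " + chunkOffset(2)); // 8388608
    }
}
```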
Methods inherited from class water.fvec.ByteVec:
chunkForChunkIdx, getFirstBytes, isInt, naCnt, openStream

Methods inherited from class water.fvec.Vec:
adaptTo, align, at, at16h, at16l, at8, atStr, base, bins, cardinality, checksum_impl, chunkForRow, chunkKey, chunkKey, domain, equals, factor, get_espc, get_type, getVecKey, group, hashCode, isBad, isConst, isEnum, isNA, isNumeric, isString, isTime, isUUID, lazy_bins, makeCon, makeCon, makeCon, makeCons, makeCopy, makeRand, makeRepSeq, makeSeq, makeSeq, makeVec, makeZero, makeZero, makeZero, makeZeros, makeZeros, max, maxs, mean, min, mins, newKey, ninfs, nzCnt, open, pctiles, pinfs, postWrite, preWriting, remove_impl, set, set, set, set, setDomain, sigma, startRollupStats, startRollupStats, stride, toEnum, toString, toStringVec

Methods inherited from superclasses:
clone, frozenType, read_impl, read, readExternal, readJSON_impl, readJSON, toJsonString, write_impl, write, writeExternal, writeHTML_impl, writeHTML, writeJSON_impl, writeJSON

public static final int DFLT_LOG2_CHUNK_SIZE
public static final int DFLT_CHUNK_SIZE
public final int _log2ChkSize
public int _chunkSize
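The two default-size fields are related by a power of two: elsewhere on this page the default chunk size is given as 4194304 bytes, which is 2^22. A minimal illustration of that relation, with the concrete values assumed rather than read from the H2O source:

```java
// Hypothetical illustration (values assumed from 4194304 = 2^22 above):
// the chunk-size field should equal 1 shifted left by its log-2 counterpart.
public class ChunkSizeFields {
    static final int DFLT_LOG2_CHUNK_SIZE = 22;                // assumed
    static final int DFLT_CHUNK_SIZE = 1 << DFLT_LOG2_CHUNK_SIZE;

    public static void main(String[] args) {
        System.out.println(DFLT_CHUNK_SIZE);                   // 4194304
    }
}
```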
protected FileVec(Key key, long len, byte be)
protected FileVec(Key key, long len, byte be, int chunkSize)
public long length()

Number of elements in the vector; returned as a long instead of an int because Vecs support more than 2^32 elements. Overridden by subclasses that compute length in an alternative way, such as file-backed Vecs.

Overrides: length in class Vec

public int nChunks()
Number of chunks, returned as an int; the chunk count is limited by the max size of a Java long[]. Overridden by subclasses that compute chunks in an alternative way, such as file-backed Vecs.

Overrides: nChunks in class Vec

public boolean writable()
Default read/write behavior for Vecs.

Overrides: writable in class Vec

public long byteSize()
public static long chunkOffset(Key ckey)
protected Value chunkIdx(int cidx)

Get a Chunk's Value by index; essentially the index-to-key map plus a DKV.get(). Warning: this pulls the data locally; using this call on every Chunk index on the same node will probably trigger an OOM!

Overrides: chunkIdx in class Vec

public static int calcOptimalChunkSize(long totalSize, int numCols)
Calculates safe and hopefully optimal chunk sizes:

- Very small data (< 128K per proc): uses the default chunk size, and all data will be in one chunk.
- Small data: data is partitioned into at least 4 chunks per core, to help keep all cores loaded.
- Default: chunks are DFLT_CHUNK_SIZE (4194304 bytes).
- Large data: if the data would create more than 4M keys per node, chunk sizes larger than DFLT_CHUNK_SIZE are issued. Too many keys can create enough overhead to blow out memory during large-data parsing; # keys = (parseSize / chunkSize) * numCols. The key limit of 2M is a guessed "reasonable" number.

Parameters:
totalSize - parse size in bytes (across all files to be parsed)
numCols - number of columns expected in the dataset
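The key-count arithmetic described above can be worked through in a short sketch. This is not the H2O implementation: the class, method names, and the doubling strategy are hypothetical; only the formula # keys = (parseSize / chunkSize) * numCols, the 4194304-byte default, and the ~2M key budget come from the text.

```java
// Hypothetical sketch (not the H2O implementation) of the key-count
// arithmetic: keys ~= (parseSize / chunkSize) * numCols. When the default
// 4 MB chunk size would exceed a ~2M key budget, grow the chunk size.
public class KeyBudget {
    static final long DFLT_CHUNK_SIZE = 4L << 20;   // 4194304 bytes
    static final long KEY_LIMIT = 2L << 20;         // ~2M keys, per the doc

    static long keyCount(long parseSize, long chunkSize, int numCols) {
        return (parseSize / chunkSize) * numCols;
    }

    // Smallest power-of-two multiple of the default chunk size that keeps
    // the estimated key count within the budget (doubling is an assumption).
    static long pickChunkSize(long parseSize, int numCols) {
        long cs = DFLT_CHUNK_SIZE;
        while (keyCount(parseSize, cs, numCols) > KEY_LIMIT)
            cs <<= 1;
        return cs;
    }

    public static void main(String[] args) {
        long parseSize = 1L << 40;                  // 1 TB across all files
        int numCols = 100;
        // Default chunks: (2^40 / 2^22) * 100 = 26,214,400 keys -- too many.
        System.out.println(keyCount(parseSize, DFLT_CHUNK_SIZE, numCols));
        // A 64 MB chunk size brings the estimate under the 2M budget.
        System.out.println(pickChunkSize(parseSize, numCols));
    }
}
```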