package io
Type Members
- case class BacktrackingException(position: Long, maxBacktrackLength: Int) extends ThinException with Product with Serializable
There is a finite limit to the distance one can backtrack, which is given by the implementation's use of a finite array of fixed-size buckets. If a vast amount of data is read in, and it is larger than the ultimate backtrack limit (number of buckets times bucket size), then data buckets will effectively spill off into parser history, and the ability to backtrack to the points in the data they stored is lost along with them. Any time there is a point of uncertainty to which the parser could backtrack, and the parser then advances through more data than the backtracking maximum limit, the ability to backtrack to that point of uncertainty is lost. This is detected, and is a fatal error (Runtime SDE).
This situation is easiest to envision if BLOB objects are involved, but is not BLOB specific. A format with a choice, then in the first branch of the choice, a group and/or array of small data items, ultimately totaling in size to more than the backtrack limit, and then a branch failure, will cause this backtracking limit error with no BLOBs being used.
This module of code creates no limit requiring any data item to fit within JVM byte-array maximums, nor even within the JVM memory footprint. Only a limitation on backtracking distance is created.
It is also worth noting that it is not the act of backtracking that directly causes an error. The error occurs only when someone tries to read from a bucket that has already been released.
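The spill-and-release behavior described above can be sketched with a toy model. This is illustrative only, under assumed names (`BucketPool` is hypothetical, not Daffodil's actual internals): a map of bucket index to byte array, where buckets beyond the retention limit are dropped, and reading from a dropped bucket is the fatal condition.

```scala
import scala.collection.mutable

// Hypothetical sketch of the fixed-bucket model described above; these names
// are illustrative, not Daffodil's actual classes.
final class BucketPool(numBuckets: Int, bucketSize: Int) {
  private val buckets = mutable.Map[Long, Array[Byte]]()
  private var newestBucket: Long = -1L

  def write(bytePos: Long, b: Byte): Unit = {
    val idx = bytePos / bucketSize
    if (idx > newestBucket) {
      newestBucket = idx
      buckets(idx) = new Array[Byte](bucketSize)
      // Spill: buckets beyond the backtrack limit fall off into history.
      buckets.keys.toList.filter(_ <= idx - numBuckets).foreach(buckets.remove)
    }
    buckets(idx)((bytePos % bucketSize).toInt) = b
  }

  // Reading from a released bucket is the fatal condition (Runtime SDE).
  def read(bytePos: Long): Byte =
    buckets.get(bytePos / bucketSize) match {
      case Some(arr) => arr((bytePos % bucketSize).toInt)
      case None => throw new IllegalStateException(s"backtrack limit exceeded at byte $bytePos")
    }
}
```

With 2 buckets of 4 bytes, writing 12 bytes releases the first bucket, so reading byte 0 again fails while byte 11 is still reachable.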
- class BitOrderChangeException extends Exception
Thrown to indicate that the bitOrder changed, but not on a byte boundary.
Must be caught at a higher level and turned into a Runtime SDE where we have the context to do so.
All calls to setFinished should, somewhere, be surrounded by a catch of this.
- class BucketingInputSource extends InputSource
Implements the InputSource interface, reading data from a generic java.io.InputStream and storing the data in buckets of a defined size. Buckets are freed when no "locks" exist inside the bucket, to minimize memory usage. Note that "locks" in this sense are the InputSource locks on bytePosition and are not about synchronization. This is more of a reference count, used to determine which buckets are no longer needed and can be freed when the count goes to zero.
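A minimal sketch of the reference-count idea described above, under assumed names (`BucketRefCounts` is hypothetical, not the actual BucketingInputSource internals): each lock on a byte position pins the bucket containing it, and a bucket becomes freeable when its count returns to zero.

```scala
import scala.collection.mutable

// Illustrative reference counting of buckets; names are hypothetical.
final class BucketRefCounts(bucketSize: Int) {
  private val counts = mutable.Map[Long, Int]().withDefaultValue(0)

  // A "lock" on a byte position pins the bucket containing it.
  def lock(bytePos: Long): Unit = counts(bytePos / bucketSize) += 1

  // When the count returns to zero the bucket is no longer needed.
  def unlock(bytePos: Long): Unit = {
    val idx = bytePos / bucketSize
    counts(idx) -= 1
    if (counts(idx) == 0) counts.remove(idx)
  }

  def isPinned(bucketIdx: Long): Boolean = counts(bucketIdx) > 0
}
```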
- class ByteArrayOutputStreamWithGetBuf extends ByteArrayOutputStream
This simple extension just gives us a public method for access to the underlying byte array. That way we don't have to make a copy just to access the bytes.
- class ByteBufferInputSource extends InputSource
Wraps a java.nio.ByteBuffer in an InputSource.
When an instance of this class is created, it creates a read-only copy of the ByteBuffer. The current position of the ByteBuffer is considered index 0. For example, if the passed-in ByteBuffer had position 2, calling setPosition(0) would reset the ByteBuffer back to position 2. The limit of the ByteBuffer is considered the end of data.
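The position remapping described above can be sketched with plain java.nio (the class name `ByteBufferSource` here is illustrative): the buffer's position at construction time becomes index 0.

```scala
import java.nio.ByteBuffer

// Illustrative sketch of the position-remapping described above: the buffer's
// position at wrap time becomes index 0. Not the actual Daffodil class.
final class ByteBufferSource(bb: ByteBuffer) {
  private val buf = bb.asReadOnlyBuffer() // defensive read-only copy
  private val startingPos = buf.position()

  // Index 0 maps back to the buffer's position at construction time.
  def setPosition(pos0b: Int): Unit = buf.position(startingPos + pos0b)
  def get(): Byte = buf.get()
  def bytesRemaining: Int = buf.remaining()
}
```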
- sealed trait DOSState extends AnyRef
- class DataDumper extends AnyRef
Hex/Bits and text dump formats for debug/trace purposes.
By definition this is a dump, so it doesn't know much about where the fields in the data are. (To do that you'd need a format description language, like DFDL, but this is here to help debug DFDL descriptions, so it really cannot exploit any information about the data format.)
- trait DataInputStream extends DataStreamCommon
- trait DataInputStreamImplMixin extends DataInputStream with DataStreamCommonImplMixin with LocalBufferMixin
- trait DataOutputStream extends DataStreamCommon
There is an asymmetry between DataInputStream and DataOutputStream with respect to the positions and limits in the bit stream.
For the DataInputStream, we have the concept of the current bitPos0b, and optionally there may be a bound called bitLimit0b. There are 1b variants of these.
For parsing, these are always absolute values; that is, they contain bit positions relative to the ultimate start of the input stream where parsing began.
For DataOutputStream, we have slightly different concepts.
There are absolute and relative variants. The absolute bitPos0b, or absBitPos0b, is symmetric to the parser's bitPos0b. It is the position relative to the ultimate start of the output stream.
However, we often do not know this value. So the UState and DataOutputStream have a maybeAbsBitPos0b which can be MaybeULong.Nope if the value isn't known.
In addition we have the relative or relBitPos0b. This is relative to the start of whatever buffer we are doing unparsing into.
When unparsing, we often have to unparse into a buffer where the ultimate actual absolute position isn't yet known, but we have to do the unparsing anyway, for example so that we can measure exactly how long something is.
Conversely, sometimes we simply must have the absolute output bit position, for example, when computing the number of bits to insert to achieve the required alignment.
Hence we have relBitPos0b - always known and is a value >= 0, and we have maybeAbsBitPos0b which is a MaybeULong. If known it is >=0.
Corresponding to bit position we have bit limit, which is measured in the same 0b or 1b units, but is *always* a maybe type, because even in the case where we know the absolute position, we still may or may not have any limit in place. Hence the UState and DataOutputStream have a maybeRelBitLimit0b and a maybeAbsBitLimit0b.
One invariant is this: when the absolute bit pos is known, then it is the same as the relative bit pos. Similarly when the absolute bit limit is known, then the relative bit limit is known and is equal.
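One concrete reason the absolute position matters, per the alignment remark above, is that the number of fill bits depends on the absolute bit position and cannot be derived from a relative one when the buffer's starting offset is unknown. A minimal sketch (the function name is illustrative):

```scala
// Number of bits to insert so the next item starts on the given alignment.
// This needs the *absolute* bit position: a relative position would give a
// wrong answer whenever the buffer's starting offset is not itself aligned.
def alignmentFillBits(absBitPos0b: Long, alignmentInBits: Long): Long = {
  val rem = absBitPos0b % alignmentInBits
  if (rem == 0) 0 else alignmentInBits - rem
}
```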
- trait DataOutputStreamImplMixin extends DataStreamCommonState with DataOutputStream with DataStreamCommonImplMixin with LocalBufferMixin
- trait DataStreamCommon extends AnyRef
This is an interface trait, and it defines methods shared by both DataInputStream and DataOutputStream.
Implementation (partial) is in DataStreamCommonImplMixin.
- trait DataStreamCommonImplMixin extends DataStreamCommon
Shared by both DataInputStream and DataOutputStream implementations
- trait DataStreamCommonState extends AnyRef
- class DirectOrBufferedDataOutputStream extends DataOutputStreamImplMixin
To support dfdl:outputValueCalc, we must suspend output. This is done by taking the current "direct" output, and splitting it into a still direct part, and a following buffered output.
The direct part waits for the OVC calculation to complete, when that is written, it is finished and collapses into the following, which was buffered, but becomes direct as a result of this collapsing.
Hence, most output will be to direct data output streams; some, while an OVC is pending, will be buffered, but this is eliminated as soon as possible.
A Buffered DOS can be finished or not. Not finished means that it might still be appended to. Not concurrently, but by other code invoked from this thread of control (which might traverse different co-routine "stack" threads, but it's still one thread of control).
Finished means that the Buffered DOS can never be appended to again.
Has two modes of operation, buffering or direct. When buffering, all output goes into a buffer. When direct, all output goes into a "real" DataOutputStream.
The isLayer parameter defines whether or not this instance originated from a layer. This is important to specify because this class is responsible for closing the associated Java OutputStream, ultimately being written to the underlying DataOutputStream. However, if the DataOutputStream is not related to a layer, the associated Java OutputStream came from the user and it is the user's responsibility to close it. The isLayer flag indicates which streams should be closed or not.
chunkSizeInBytes is used when the buffered output stream is using a file as its buffer. This is the size of chunks that will be read into memory before being written to the direct output stream.
maxBufferSizeInByte is the size that the ByteArrayOutputStream will grow to before switching over to a FileOutputStream.
tempDirPath is the path where temporary files will be created when switching to a file-based buffer.
maybeExistingFile is used in the case of blob files, where we already have an existing file containing the data. This is the path to said file.
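The split-and-collapse behavior can be illustrated with a toy model, greatly simplified relative to the real class (the name `ToyDOS` and its methods are hypothetical): output is buffered while an OVC is pending, and the buffer collapses into the direct stream once the OVC value is written.

```scala
import java.io.ByteArrayOutputStream

// Toy sketch of the direct/buffered split-and-collapse described above.
final class ToyDOS(underlying: ByteArrayOutputStream) {
  private var buffered: Option[ByteArrayOutputStream] = None

  def write(bytes: Array[Byte]): Unit = buffered match {
    case Some(buf) => buf.write(bytes)        // suspended: output is buffered
    case None      => underlying.write(bytes) // direct
  }

  // Suspend direct output while an OVC value is pending.
  def suspend(): Unit = buffered = Some(new ByteArrayOutputStream)

  // The OVC is resolved: write it directly, then collapse the buffered
  // output into the direct stream, which becomes direct again.
  def finishSuspension(ovcValue: Array[Byte]): Unit = {
    underlying.write(ovcValue)
    buffered.foreach(buf => underlying.write(buf.toByteArray))
    buffered = None
  }
}
```

Writing "a", suspending, writing "c", then resolving the OVC with "b" yields "abc" in the direct stream.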
- class ExplicitLengthLimitingStream extends FilterInputStream
This class can be used with any InputStream to restrict what is read from it to N bytes.
This can be used to forcibly stop consumption of data from a stream at a length obtained explicitly.
Thread safety: This is inherently stateful - so not thread safe to use this object from more than one thread.
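A minimal sketch of such a length-limiting filter, assuming only java.io (the class name `LimitNBytes` is illustrative; this is not Daffodil's actual implementation):

```scala
import java.io.{ FilterInputStream, InputStream }

// Minimal sketch of a FilterInputStream that stops after `limit` bytes.
final class LimitNBytes(in: InputStream, limit: Long) extends FilterInputStream(in) {
  private var consumed = 0L

  override def read(): Int =
    if (consumed >= limit) -1
    else {
      val b = in.read()
      if (b != -1) consumed += 1
      b
    }

  override def read(buf: Array[Byte], off: Int, len: Int): Int =
    if (consumed >= limit) -1
    else {
      // Never ask the underlying stream for more than the remaining budget.
      val n = in.read(buf, off, math.min(len.toLong, limit - consumed).toInt)
      if (n > 0) consumed += n
      n
    }
}
```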
- class FileIOException extends Exception
- trait FormatInfo extends AnyRef
Abstract interface to obtain format properties or values derived from properties.
This includes anything the I/O layer needs, which includes properties that can be runtime-valued expressions, or that depend on such.
By passing in an object that provides quick access to these, we avoid the need to have setters/getters that call setters that change state in the I/O layer.
- abstract class InputSource extends AnyRef
The InputSource class is really just a mechanism to provide bytes to an InputSourceDataInputStream, which does the heavy lifting of converting bits/bytes to numbers and characters. This class does not need to know anything about bits; it is purely byte centric. One core difference between this and an InputStream is that it must have the capability to backtrack to arbitrary points in the InputStream's history. To aid in this, methods are called to let the InputSource know which byte positions we might need to backtrack to, which can allow it to free data that is no longer needed. One can almost think of this as an InputStream that supports multiple marks with random access.
- final class InputSourceDataInputStream extends DataInputStreamImplMixin
Realization of the DataInputStream API
Underlying representation is an InputSource containing all input data.
- class InputSourceDataInputStreamCharIterator extends CharIterator
- class InputSourceDataInputStreamCharIteratorState extends AnyRef
- class InputStreamReadZeroError extends UnsuppressableException
Thrown in the specific case where a java.io.InputStream is not properly implemented and is returning 0 from the read(buf, off, len) call when len > 0.
This non-blocking behavior is not supported by java.io.InputStream's contract and Daffodil depends on InputStreams having only blocking behavior.
To properly place blame for this error on the InputStream (and not Daffodil's I/O layer built on top of it) we throw this very specific, informative exception in this case.
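The broken behavior described above is easy to detect with a small guard wrapper. This is a hedged sketch (the class name `ZeroReadDetectingStream` is hypothetical, not Daffodil's actual mechanism):

```scala
import java.io.{ FilterInputStream, InputStream }

// Sketch of detecting the contract violation described above: read(buf, off,
// len) returning 0 when len > 0 violates InputStream's blocking contract.
final class ZeroReadDetectingStream(in: InputStream) extends FilterInputStream(in) {
  override def read(buf: Array[Byte], off: Int, len: Int): Int = {
    val n = in.read(buf, off, len)
    if (n == 0 && len > 0)
      throw new IllegalStateException(
        "InputStream returned 0 from read(buf, off, len) with len > 0")
    n
  }
}
```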
- class LayerBoundaryMarkInsertingJavaOutputStream extends FilterOutputStream
- abstract class LocalBuffer[T <: Buffer] extends AnyRef
- trait LocalBufferMixin extends AnyRef
Warning: Only mix this into thread-local state objects. If mixed into a regular class this will end up sharing the local stack object across threads, which is a very bad idea (not thread safe).
- final class MarkState extends DataStreamCommonState with Mark
The state that must be saved and restored by mark/reset calls
- class RegexLimitingStream extends InputStream
Can be used with any InputStream to restrict what is read from it to stop before a particular regex match.
The regex must have a finite maximum length match string.
This can be used to forcibly stop consumption of data from a stream at a length obtained from a delimiter that is described using a regex.
The delimiter matching the regex is consumed from the underlying stream (if found), and the underlying stream is left positioned at the byte after the regex match string.
IMPORTANT: The delimiter regex cannot contain any Capturing Groups! Use (?: ... ) which is non-capturing, instead of regular ( ... ). For example: this regex matches CRLF not followed by tab or space:
"""\r\n(?!(?:\t|\ ))"""
Notice use of the ?: to avoid a capture group around the alternatives of tab or space.
Thread safety: This is inherently stateful - so not thread safe to use this object from more than one thread.
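The capturing-group restriction can be checked directly with java.util.regex: the example regex from above has zero capture groups thanks to `(?: ... )`. (The val/def names here are illustrative.)

```scala
import java.util.regex.Pattern

// The CRLF-not-followed-by-tab-or-space delimiter from the example above.
// (?: ... ) keeps the alternation non-capturing, so groupCount is 0.
val crlfDelim = Pattern.compile("""\r\n(?!(?:\t|\ ))""")

def findsDelim(s: String): Boolean = crlfDelim.matcher(s).find()
```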
- class StreamIterator[T] extends Iterator[T]
- final class StringDataInputStreamForUnparse extends DataInputStreamImplMixin
When unparsing, we reuse all the DFA logic to identify delimiters within the data that need to be escaped, so we need to treat the string data being unparsed as a DataInputStream.
- trait ThreadCheckMixin extends AnyRef
Mixin for classes that are supposed to exist 1-to-1 with threads, such as DataInputStream and DataOutputStream derived classes.
- sealed abstract class ZeroLengthStatus extends AnyRef
Value Members
- object BoundaryMarkLimitingStream
Can be used with any InputStream to restrict what is read from it to stop before a boundary mark string.
The boundary mark string is exactly that, a string of characters. Not a regex, nor anything involving DFDL Character Entities or Character Class Entities. (No %WSP; no %NL; )
This can be used to forcibly stop consumption of data from a stream at a length obtained from a delimiter.
The boundary mark string is consumed from the underlying stream (if found), and the underlying stream is left positioned at the byte after the boundary mark string.
Thread safety: This is inherently stateful - so not thread safe to use this object from more than one thread.
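The intended behavior can be sketched in a non-streaming, buffer-everything way (unlike the real class; the function name is illustrative): data before the mark is returned, the mark itself is consumed, and what follows represents where the underlying stream would be left.

```scala
// Simplified, non-streaming sketch of the boundary-mark behavior described
// above. Not the actual BoundaryMarkLimitingStream implementation.
def splitAtBoundaryMark(data: Array[Byte], mark: Array[Byte]): (Array[Byte], Array[Byte]) = {
  // ISO-8859-1 maps bytes 1:1 to chars, so String.indexOf finds the mark.
  val s = new String(data, "ISO-8859-1")
  val m = new String(mark, "ISO-8859-1")
  val i = s.indexOf(m)
  if (i < 0) (data, Array.empty[Byte])
  else (data.take(i), data.drop(i + mark.length)) // the mark itself is consumed
}
```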
- object DataInputStream
This trait defines the low level API called by Daffodil's Parsers.
It has features to support
- backtracking
- regex pattern matching using Java Pattern regexes (for lengthKind pattern and pattern asserts)
- character-by-character access as needed by our DFA delimiter/escaping
- very efficient access to small binary data (64-bits or smaller)
- alignment and skipping
- encodingErrorPolicy 'error' and 'replace'
- convenient use of zero-based values because java/scala APIs for I/O are all zero-based
- convenient use of 1-based values because DFDL is 1-based, so debug/trace and such all want to be 1-based values.
A goal is that this API does not allocate objects as I/O operations are performed unless boxed objects are being returned. For example getSignedLong(...) should not allocate anything per call; however, getSignedBigInt(...) does, because a BigInt is a heap-allocated object.
Internal buffers and such may be dropped/resized/reallocated as needed during method calls. The point is not that you never allocate. It's that the per-I/O operation overhead does not require object allocation for every data-accessing method call.
Similarly, text data can be retrieved into a char buffer, and the char buffer can provide a limit on size (available capacity of the char buffer) in characters. The text can be examined in the char buffer, or a string can be created from the char buffer's contents when needed.
This API is very stateful, and not-thread-safe i.e., each thread must have its own object. Some of this is inherent in this API style, and some is inherited from the underlying objects this API uses (such as CharsetDecoder).
This API is also intended to support some very highly optimized implementations. For example, if the schema is all text, and the encoding is known to be iso-8859-1, then there is no notion of a decode error, and every byte value, extended to a Char value, *is* the Unicode codepoint. No decoder needs to be used in this case, and this API becomes a quite thin layer on top of a java.io.BufferedInputStream.
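The iso-8859-1 claim above is easy to verify against the JDK's own decoder (the function name is illustrative):

```scala
import java.nio.charset.StandardCharsets

// For ISO-8859-1, zero-extending each byte to a Char yields exactly the
// Unicode code point, so no CharsetDecoder is needed.
def fastIso88591Decode(bytes: Array[Byte]): String =
  new String(bytes.map(b => (b & 0xFF).toChar))
```

This agrees with the JDK decoder for all 256 byte values.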
Terminology:
Available Data - this is the data that is between the current bit position, and some limit. The limit can either be set (via setBitLimit calls), or it can be limited by tunable values, or implementation-specific upper limits, or it can simply be the end of the data stream.
Different kinds of DataInputStreams can have different limits. For example, a File-based DataInputStream may have no limit on the forward speculation distance, because the file can be randomly accessed if necessary. Contrast that with a data stream directly connected to a network socket, which may have an upper limit on the amount of data it is willing to buffer.
None of this is a commitment that this API will in fact have multiple specialized implementations. It's just a possibility for the future.
Implementation Note: It is the implementation of this interface which implements the Bucket Algorithm as described on the Daffodil Wiki. All of that bucket stuff is beneath this API.
In general, this API tries to return a value rather than throw exceptions whenever the behavior may be very common. This leaves it up to the caller to decide whether or not to throw an exception, and avoids the overhead of try-catch blocks. The exceptions to this rule are the methods that involve character decoding for textual data. These methods may throw CharacterCodingException when the encoding error policy is 'error'.
- object DirectOrBufferedDataOutputStream
- object FastAsciiToUnicodeConverter
Highly optimized converter for Ascii to Unicode
- object InputSourceDataInputStream
Factory for creating this type of DataInputStream
Provides only core input sources to avoid making any assumptions about the incoming data (i.e. should a File be mapped to a ByteBuffer or be streamed as an InputStream). The user knows better than us, so have them make the decision.
- object Utils
- object ZeroLengthStatus