Class CharsetDecoder
public abstract class CharsetDecoder extends Object
The input byte sequence is wrapped by a
ByteBuffer and the output character sequence is a
CharBuffer. A decoder instance should be used in
the following sequence, which is referred to as a decoding operation:
- invoking the reset method to reset the decoder if the decoder has been used;
- invoking the decode method until no additional input is needed; the endOfInput parameter must be set to false, the input buffer must be filled, and the output buffer must be flushed between invocations;
- invoking the decode method for the last time, with the endOfInput parameter set to true;
- invoking the flush method to flush the output.
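The four steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name DecodingOperationSketch, the helper names, the choice of UTF-8, and the buffer sizes are assumptions made for the example, and each input chunk is assumed to fit in the input buffer.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

final class DecodingOperationSketch {

    // Decodes UTF-8 input that arrives in chunks.
    // Assumption: every chunk fits into the free space of the 64-byte input buffer.
    static String decodeChunks(byte[][] chunks) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.allocate(64);
        CharBuffer out = CharBuffer.allocate(16);
        StringBuilder text = new StringBuilder();
        try {
            decoder.reset();                               // step 1: reset
            for (byte[] chunk : chunks) {
                in.put(chunk);                             // fill the input buffer
                in.flip();
                decodeStep(decoder, in, out, text, false); // step 2: more input may follow
                in.compact();                              // keep bytes of an incomplete sequence
            }
            in.flip();
            decodeStep(decoder, in, out, text, true);      // step 3: last decode call
            while (decoder.flush(out).isOverflow()) {      // step 4: flush
                drain(out, text);
            }
            drain(out, text);
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Calls decode until it reports UNDERFLOW, emptying the output buffer between calls.
    private static void decodeStep(CharsetDecoder decoder, ByteBuffer in, CharBuffer out,
                                   StringBuilder text, boolean endOfInput)
            throws CharacterCodingException {
        while (true) {
            CoderResult result = decoder.decode(in, out, endOfInput);
            drain(out, text);
            if (result.isUnderflow()) {
                return;
            }
            if (result.isError()) {
                result.throwException();
            }
            // Otherwise the result is OVERFLOW: the output was drained, so try again.
        }
    }

    // Moves the decoded characters into the StringBuilder and empties the buffer.
    private static void drain(CharBuffer out, StringBuilder text) {
        out.flip();
        text.append(out);
        out.clear();
    }
}
```

Compacting the input buffer between chunks keeps the bytes of a multi-byte sequence that was split across two chunks, so the next decode call can complete it.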
The decode method will
convert as many bytes as possible, and the process won't stop until the input
bytes have run out, the output buffer has been filled or some error has
happened. A CoderResult instance will be returned to
indicate the stop reason, and the invoker can identify the result and choose
further action, which includes filling the input buffer, flushing the output
buffer or recovering from an error and trying again.
There are two common decoding errors. One is malformed input, which is reported when the input byte sequence is illegal for the current charset. The other is unmappable character, which is reported when a legal input byte sequence cannot be mapped to its Unicode character equivalent.
Both errors can be handled in three ways. The default is to report the
error to the invoker through a CoderResult instance; the
alternatives are to ignore it or to replace the erroneous input with the
replacement string. The replacement string is "�" by default and can be
changed by invoking the replaceWith method. The
invoker of this decoder chooses how each error type is handled by passing a
CodingErrorAction instance to the
onMalformedInput and
onUnmappableCharacter
methods.
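As a hedged illustration of the three error actions, the sketch below decodes the same bytes as UTF-8 under a caller-supplied CodingErrorAction. The class and method names are hypothetical, and returning null for a reported error is a choice made only to keep the example short.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class ErrorActionSketch {

    // Decodes the bytes as UTF-8, applying the same action to both error types.
    // Returns null when the action is REPORT and an error is encountered.
    static String decodeUtf8(byte[] bytes, CodingErrorAction action) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null; // the error was reported to the invoker
        }
    }
}
```

For the input bytes 'a', 0xFF, 'b' (0xFF is never valid in UTF-8), REPLACE yields "a�b", IGNORE yields "ab", and REPORT ends in an exception, which this sketch turns into null.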
This is an abstract class and encapsulates many common operations of the
decoding process for all charsets. Decoders for a specific charset should
extend this class and need only to implement the
decodeLoop method for the basic
decoding. If a subclass maintains an internal state, it should override the
implFlush method and the
implReset method in addition.
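A minimal subclass might look like the sketch below. It is an assumption-laden example rather than a real charset implementation: the class name SingleByteDecoderSketch is hypothetical, it borrows the existing ISO-8859-1 Charset object instead of defining its own, and it keeps no internal state, so implFlush and implReset are not overridden.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

final class SingleByteDecoderSketch extends CharsetDecoder {

    SingleByteDecoderSketch() {
        // One character per input byte, both on average and at most.
        super(StandardCharsets.ISO_8859_1, 1.0f, 1.0f);
    }

    @Override
    protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
        while (in.hasRemaining()) {
            if (!out.hasRemaining()) {
                return CoderResult.OVERFLOW;    // output buffer is full
            }
            out.put((char) (in.get() & 0xFF));  // byte value maps directly to a code point
        }
        return CoderResult.UNDERFLOW;           // all input consumed
    }

    // Convenience wrapper used only by this example.
    static String decodeAll(byte[] bytes) {
        try {
            return new SingleByteDecoderSketch().decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because decodeLoop checks the output space before consuming each byte, the input position always points at the first undecoded byte when OVERFLOW is returned.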
This class is not thread-safe.
- See Also:
Charset, CharsetEncoder
-
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | CharsetDecoder(Charset charset, float averageCharsPerByte, float maxCharsPerByte) | Constructs a new CharsetDecoder using the given Charset, average number and maximum number of characters created by this decoder for one input byte, and the default replacement string "�".
-
Method Summary
Modifier and Type | Method | Description
float | averageCharsPerByte() | Returns the average number of characters created by this decoder for a single input byte.
Charset | charset() | Returns the Charset which this decoder uses.
CharBuffer | decode(ByteBuffer in) | This is a facade method for the decoding operation.
CoderResult | decode(ByteBuffer in, CharBuffer out, boolean endOfInput) | Decodes bytes starting at the current position of the given input buffer, and writes the equivalent character sequence into the given output buffer from its current position.
protected abstract CoderResult | decodeLoop(ByteBuffer in, CharBuffer out) | Decodes bytes into characters.
Charset | detectedCharset() | Gets the charset detected by this decoder; this method is optional.
CoderResult | flush(CharBuffer out) | Flushes this decoder.
protected CoderResult | implFlush(CharBuffer out) | Flushes this decoder.
protected void | implOnMalformedInput(CodingErrorAction newAction) | Notifies that this decoder's CodingErrorAction specified for malformed input error has been changed.
protected void | implOnUnmappableCharacter(CodingErrorAction newAction) | Notifies that this decoder's CodingErrorAction specified for unmappable character error has been changed.
protected void | implReplaceWith(String newReplacement) | Notifies that this decoder's replacement has been changed.
protected void | implReset() | Reset this decoder's charset related state.
boolean | isAutoDetecting() | Indicates whether this decoder implements an auto-detecting charset.
boolean | isCharsetDetected() | Indicates whether this decoder has detected a charset; this method is optional.
CodingErrorAction | malformedInputAction() | Returns this decoder's CodingErrorAction when malformed input occurred during the decoding process.
float | maxCharsPerByte() | Returns the maximum number of characters which can be created by this decoder for one input byte, must be positive.
CharsetDecoder | onMalformedInput(CodingErrorAction newAction) | Sets this decoder's action on malformed input errors.
CharsetDecoder | onUnmappableCharacter(CodingErrorAction newAction) | Sets this decoder's action on unmappable character errors.
String | replacement() | Returns the replacement string, which is never null or empty.
CharsetDecoder | replaceWith(String replacement) | Sets the new replacement string.
CharsetDecoder | reset() | Resets this decoder.
CodingErrorAction | unmappableCharacterAction() | Returns this decoder's CodingErrorAction when an unmappable character error occurred during the decoding process.
-
Constructor Details
-
CharsetDecoder
Constructs a new CharsetDecoder using the given Charset, average number and maximum number of characters created by this decoder for one input byte, and the default replacement string "�".
- Parameters:
charset - the Charset to be used by this decoder.
averageCharsPerByte - the average number of characters created by this decoder for one input byte, must be positive.
maxCharsPerByte - the maximum number of characters created by this decoder for one input byte, must be positive.
- Throws:
IllegalArgumentException - if averageCharsPerByte <= 0 || maxCharsPerByte <= 0 || averageCharsPerByte > maxCharsPerByte.
-
-
Method Details
-
averageCharsPerByte
public final float averageCharsPerByte()
Returns the average number of characters created by this decoder for a single input byte.
-
charset
Returns the Charset which this decoder uses.
-
decode
This is a facade method for the decoding operation.
This method decodes the remaining byte sequence of the given byte buffer into a new character buffer. It performs a complete decoding operation: it first resets the decoder, then decodes, and finally flushes.
This method should not be invoked while another decode operation is ongoing.
- Parameters:
in - the input buffer.
- Returns:
- a new CharBuffer containing the characters produced by this decoding operation. The buffer's limit will be the position of the last character in the buffer, and the position will be zero.
- Throws:
IllegalStateException - if another decoding operation is ongoing.
MalformedInputException - if an illegal input byte sequence for this charset was encountered, and the action for malformed error is CodingErrorAction.REPORT.
UnmappableCharacterException - if a legal but unmappable input byte sequence for this charset was encountered, and the action for unmappable character error is CodingErrorAction.REPORT. Unmappable means the byte sequence at the input buffer's current position cannot be mapped to a Unicode character sequence.
CharacterCodingException - if another exception happened during the decode operation.
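A short sketch of the facade method in use follows. The class name FacadeDecodeSketch is hypothetical, and wrapping the checked exception in an UncheckedIOException is just one possible way to surface a reported error.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

final class FacadeDecodeSketch {

    // Decodes a complete UTF-8 byte array in one call.
    // The decoder's default action is REPORT, so bad input ends in an exception.
    static CharBuffer decodeUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        try {
            // Resets, decodes with endOfInput set to true, and flushes in a single step.
            return decoder.decode(ByteBuffer.wrap(bytes));
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned buffer is ready for reading: its position is zero, and remaining() equals the number of decoded characters.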
-
decode
Decodes bytes starting at the current position of the given input buffer, and writes the equivalent character sequence into the given output buffer from its current position.
The buffers' positions will be changed by the reading and writing operations, but their limits and marks will be kept intact.
A CoderResult instance will be returned according to the following rules:
- CoderResult.OVERFLOW indicates that, even though not all of the input has been processed, the output buffer has reached its capacity. When this result is returned, this method should be called once more with an out argument that has not already been filled.
- CoderResult.UNDERFLOW indicates that as many bytes as possible in the input buffer have been decoded. If there is no further input and no remaining bytes in the input buffer, then this operation may be regarded as complete. Otherwise, this method should be called once more with additional input.
- A malformed input result indicates that a malformed input error has been encountered; the erroneous bytes start at the input buffer's position, and their number is given by the result's length. This kind of result can be returned only if the malformed action is CodingErrorAction.REPORT.
- An unmappable character result indicates that an unmappable character error has been encountered; the erroneous bytes start at the input buffer's position, and their number is given by the result's length. This kind of result can be returned only if the unmappable character action is CodingErrorAction.REPORT.
The endOfInput parameter indicates that the invoker cannot provide further input. This parameter is true if and only if the bytes in the current input buffer are all of the input for this decoding operation. Note that it is common, and not an error, for the invoker to pass false and then be unable to provide more input. Passing true in several consecutive invocations, however, may cause an error, because any remaining input is then treated as malformed input.
This method invokes the decodeLoop method to implement the basic decode logic for a specific charset.
- Parameters:
in - the input buffer.
out - the output buffer.
endOfInput - true if all the input bytes have been provided.
- Returns:
- a CoderResult instance which indicates the reason of termination.
- Throws:
IllegalStateException - if decoding has started or no more input is needed in this decoding process.
CoderMalfunctionError - if the decodeLoop method threw a BufferUnderflowException or BufferOverflowException.
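To illustrate how an error result relates to the input buffer, the sketch below decodes with the default REPORT action and returns where the erroneous bytes start and how many there are. The class name CoderResultSketch and the int-array return shape are assumptions made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

final class CoderResultSketch {

    // Returns {position of the erroneous bytes, number of erroneous bytes},
    // or null when the whole input decodes without an error.
    static int[] firstError(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder(); // actions default to REPORT
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never produces more characters than input bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(bytes.length);
        CoderResult result = decoder.decode(in, out, true);
        if (result.isError()) {
            // The input position is left at the start of the erroneous bytes.
            return new int[] {in.position(), result.length()};
        }
        return null;
    }
}
```

A caller that wants to recover could skip result.length() bytes from the reported position and invoke decode again.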
-
decodeLoop
Decodes bytes into characters. This method is called by the decode method.
This method implements the essential decoding operation, and it won't stop decoding until all the input bytes are read, the output buffer is filled, or some exception is encountered. Then it returns a CoderResult object indicating the result of the current decoding operation. The rules to construct the CoderResult are the same as for decode. When an exception is encountered in the decoding operation, most implementations of this method return a relevant result object to the decode method, while some performance-optimized implementations may handle the exception and implement the error action themselves.
The buffers are scanned from their current positions, and their positions will be modified accordingly, while their marks and limits will be intact. At most in.remaining() bytes will be read, and at most out.remaining() characters will be written.
Note that some implementations may pre-scan the input buffer and return CoderResult.UNDERFLOW until they receive sufficient input.
- Parameters:
in - the input buffer.
out - the output buffer.
- Returns:
- a CoderResult instance indicating the result.
-
detectedCharset
Gets the charset detected by this decoder; this method is optional.
If implementing an auto-detecting charset, then this decoder returns the detected charset from this method when it is available. The returned charset will be the same for the rest of the decode operation.
If insufficient bytes have been read to determine the charset, an IllegalStateException will be thrown.
The default implementation always throws UnsupportedOperationException, so it should be overridden by a subclass if needed.
- Returns:
- the charset detected by this decoder, or null if it is not yet determined.
- Throws:
UnsupportedOperationException - if this decoder does not implement an auto-detecting charset.
IllegalStateException - if insufficient bytes have been read to determine the charset.
-
flush
Flushes this decoder. This method will call implFlush. Some decoders may need to write some characters to the output buffer when they have read all input bytes; subclasses can override implFlush to perform the writing operation.
The number of characters written won't be larger than out.remaining(). If a decoder wants to write more characters than the output buffer's remaining space allows, then CoderResult.OVERFLOW will be returned, and this method must be called again with a character buffer that has more remaining space. Otherwise this method will return CoderResult.UNDERFLOW, which means one decoding process has been completed successfully.
During the flush, the output buffer's position will be changed accordingly, while its mark and limit will be intact.
- Parameters:
out - the given output buffer.
- Returns:
CoderResult.UNDERFLOW or CoderResult.OVERFLOW.
- Throws:
IllegalStateException - if this decoder isn't already flushed or at end of input.
-
implFlush
Flushes this decoder. The default implementation does nothing and always returns CoderResult.UNDERFLOW; this method can be overridden if needed.
- Parameters:
out - the output buffer.
- Returns:
CoderResult.UNDERFLOW or CoderResult.OVERFLOW.
-
implOnMalformedInput
Notifies that this decoder's CodingErrorAction specified for malformed input error has been changed. The default implementation does nothing; this method can be overridden if needed.
- Parameters:
newAction - the new action.
-
implOnUnmappableCharacter
Notifies that this decoder's CodingErrorAction specified for unmappable character error has been changed. The default implementation does nothing; this method can be overridden if needed.
- Parameters:
newAction - the new action.
-
implReplaceWith
Notifies that this decoder's replacement has been changed. The default implementation does nothing; this method can be overridden if needed.
- Parameters:
newReplacement - the new replacement string.
-
implReset
protected void implReset()
Resets this decoder's charset-related state. The default implementation does nothing; this method can be overridden if needed.
-
isAutoDetecting
public boolean isAutoDetecting()
Indicates whether this decoder implements an auto-detecting charset.
- Returns:
true if this decoder implements an auto-detecting charset.
-
isCharsetDetected
public boolean isCharsetDetected()
Indicates whether this decoder has detected a charset; this method is optional.
If this decoder implements an auto-detecting charset, then this method may start to return true during a decoding operation to indicate that a charset has been detected in the input bytes and that the charset can be retrieved by invoking the detectedCharset method.
Note that a decoder that implements an auto-detecting charset may still succeed in decoding a portion of the given input even when it is unable to detect the charset. For this reason users should be aware that a false return value does not indicate that no decoding took place.
The default implementation always throws an UnsupportedOperationException; it should be overridden by a subclass if needed.
- Returns:
true if this decoder has detected a charset.
- Throws:
UnsupportedOperationException - if this decoder doesn't implement an auto-detecting charset.
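Because isCharsetDetected and detectedCharset throw UnsupportedOperationException by default, a caller would typically guard them with isAutoDetecting. The following is a minimal sketch of that guard; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

final class AutoDetectSketch {

    // Returns the detected charset when one is available,
    // otherwise the charset the decoder was created for.
    static Charset effectiveCharset(CharsetDecoder decoder) {
        if (decoder.isAutoDetecting() && decoder.isCharsetDetected()) {
            return decoder.detectedCharset();
        }
        return decoder.charset();
    }
}
```

The short-circuit evaluation matters here: isCharsetDetected is only reached when the decoder reports that it implements an auto-detecting charset.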
-
malformedInputAction
Returns the CodingErrorAction this decoder applies when malformed input occurs during the decoding process.
-
maxCharsPerByte
public final float maxCharsPerByte()
Returns the maximum number of characters which can be created by this decoder for one input byte; the value is always positive.
-
onMalformedInput
Sets this decoder's action on malformed input errors. This method will call the implOnMalformedInput method with the given new action as argument.
- Parameters:
newAction - the new action on malformed input error.
- Returns:
- this decoder.
- Throws:
IllegalArgumentException - if newAction == null.
-
onUnmappableCharacter
Sets this decoder's action on unmappable character errors. This method will call the implOnUnmappableCharacter method with the given new action as argument.
- Parameters:
newAction - the new action on unmappable character error.
- Returns:
- this decoder.
- Throws:
IllegalArgumentException - if newAction == null.
-
replacement
Returns the replacement string, which is never null or empty.
-
replaceWith
Sets the new replacement string. This method first checks the given replacement's validity, then changes the replacement value, and finally calls the implReplaceWith method with the given new replacement as argument.
- Parameters:
replacement - the replacement string; it cannot be null or empty, and its length cannot exceed maxCharsPerByte().
- Returns:
- this decoder.
- Throws:
IllegalArgumentException - if the given replacement does not satisfy the requirements mentioned above.
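A small sketch of a custom replacement string is shown below. The class and method names are hypothetical. The example assumes a UTF-8 decoder whose maxCharsPerByte() is 1.0, as in the standard JDK implementation, so the replacement must be a single character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class ReplacementSketch {

    // Decodes UTF-8 bytes, substituting the given replacement for erroneous input.
    // replaceWith throws IllegalArgumentException for a null, empty, or too-long string.
    static String decodeReplacing(byte[] bytes, String replacement) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(replacement);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE, but the facade method declares it.
            throw new IllegalStateException(e);
        }
    }
}
```

The replacement only takes effect when the corresponding action is REPLACE; with REPORT or IGNORE it is never written.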
-
reset
Resets this decoder. This method resets the internal state, and then calls implReset() to reset any state related to the specific charset.
-
unmappableCharacterAction
Returns the CodingErrorAction this decoder applies when an unmappable character error occurs during the decoding process.
-