Class CharsetEncoder
public abstract class CharsetEncoder extends Object
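A CharsetEncoder converts a sequence of 16-bit Java characters into a sequence of bytes in a specific charset. The input character sequence is a CharBuffer and the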
output byte sequence is a ByteBuffer.
Use encode(CharBuffer) to encode an entire CharBuffer to a
new ByteBuffer, or encode(CharBuffer, ByteBuffer, boolean) for more
control. When using the latter method, the entire operation proceeds as follows:
- Invoke reset() to reset the encoder if this instance has been used before.
- Invoke encode with the endOfInput parameter set to false as long as more input may still be supplied (the return value signals whether more input is needed). Between invocations, refill the input buffer and drain the output buffer as necessary.
  The encode method converts as many characters as possible and stops only when the input buffer has been exhausted, the output buffer has been filled, or an error has occurred. It returns a CoderResult instance indicating the current state; depending on that result, the caller should supply more input, drain the output buffer, or recover from the error, and then try again.
- Invoke encode one last time with endOfInput set to true.
- Invoke flush(ByteBuffer) to flush any remaining output.
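The four steps above can be sketched as a single loop. This is a minimal sketch, assuming the input arrives as a list of strings (a stand-in for a real source such as a Reader) and that output is collected in a ByteArrayOutputStream; StreamingEncodeDemo, encodeChunks, and drain are hypothetical names. Characters left unconsumed after an UNDERFLOW (for example, the first half of a surrogate pair split across chunks) are carried over into the next input buffer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.util.List;

final class StreamingEncodeDemo {

    // Hypothetical helper: encodes a sequence of input chunks with one encoder,
    // following reset -> encode(..., false)* -> encode(..., true) -> flush.
    static byte[] encodeChunks(CharsetEncoder encoder, List<String> chunks)
            throws CharacterCodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        ByteBuffer out = ByteBuffer.allocate(8);   // deliberately small to exercise OVERFLOW
        CharBuffer in = CharBuffer.allocate(0);

        encoder.reset();                                        // step 1
        for (int i = 0; i <= chunks.size(); i++) {
            boolean endOfInput = (i == chunks.size());
            if (!endOfInput) {
                // Append the next chunk after any characters left over from the last call.
                String next = chunks.get(i);
                in = CharBuffer.allocate(in.remaining() + next.length()).put(in).put(next);
                in.flip();
            }
            while (true) {                                      // steps 2 and 3
                CoderResult cr = encoder.encode(in, out, endOfInput);
                if (cr.isOverflow()) {
                    drain(out, sink);                           // make room, then retry
                } else if (cr.isUnderflow()) {
                    break;                                      // needs more input, or done
                } else {
                    cr.throwException();                        // error reported under REPORT
                }
            }
        }
        while (encoder.flush(out).isOverflow()) {               // step 4
            drain(out, sink);
        }
        drain(out, sink);
        return sink.toByteArray();
    }

    // Copies the bytes written so far into the sink and clears the buffer for reuse.
    private static void drain(ByteBuffer out, ByteArrayOutputStream sink) {
        out.flip();
        sink.write(out.array(), out.arrayOffset() + out.position(), out.remaining());
        out.clear();
    }
}
```

For well-formed input the result matches encoding the concatenated text in one go, even when a surrogate pair straddles two chunks.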
There are two classes of encoding error: malformed input signifies that the input character sequence is not legal, while unmappable character signifies that the input is legal but cannot be mapped to a byte sequence (because the charset cannot represent the character, for example).
Errors can be handled in three ways. The default is to
report the error to the caller. The alternatives are to
ignore the error or replace
the problematic input with the byte sequence returned by replacement(). The disposition
for each of the two kinds of error can be set independently using the onMalformedInput(java.nio.charset.CodingErrorAction)
and onUnmappableCharacter(java.nio.charset.CodingErrorAction) methods.
The default replacement bytes depend on the charset but can be overridden using the
replaceWith(byte[]) method.
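As an illustration of the three dispositions, the sketch below encodes text with the JDK's US-ASCII encoder, which cannot represent 'é'. ErrorActionDemo and encodeWith are hypothetical names, and the replacement is overridden to '*' via replaceWith purely to make the substitution visible.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnmappableCharacterException;

final class ErrorActionDemo {

    // Encodes text as US-ASCII using the given action for unmappable characters.
    // The replacement is overridden to '*' (the default for this encoder is '?').
    // Returns the encoded text decoded back to a String, or "<unmappable>" if REPORT threw.
    static String encodeWith(CodingErrorAction action, String text) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onUnmappableCharacter(action)
                .replaceWith(new byte[] { (byte) '*' });
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(text));
            return StandardCharsets.US_ASCII.decode(bytes).toString();
        } catch (UnmappableCharacterException e) {
            return "<unmappable>";
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);   // e.g. malformed input; not expected here
        }
    }
}
```

With the input "café", REPORT yields "<unmappable>", IGNORE yields "caf", and REPLACE yields "caf*".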
This class is abstract and encapsulates many common operations of the
encoding process for all charsets. Encoders for a specific charset should
extend this class and need only implement the
encodeLoop method for basic
encoding. If a subclass maintains internal state, it should also override the
implFlush and implReset methods.
This class is not thread-safe.
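To make the subclassing contract concrete, here is a minimal sketch of a stateless encoder that implements only encodeLoop. ToyLatin1Encoder is hypothetical (not a JDK class); it passes StandardCharsets.ISO_8859_1 to the constructor purely as the Charset object it reports, and it simplifies surrogate handling as noted in the comments.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stateless encoder: U+0000..U+00FF map to the byte with the same value.
// Any other char, including each half of a surrogate pair, is reported as unmappable
// with length 1 (a production encoder would normally treat a valid pair as one unit).
final class ToyLatin1Encoder extends CharsetEncoder {

    ToyLatin1Encoder() {
        // ISO-8859-1 is used only as the Charset object this encoder reports.
        super(StandardCharsets.ISO_8859_1, 1.0f, 1.0f);
    }

    @Override
    protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
        while (in.hasRemaining()) {
            char c = in.get(in.position());       // peek; consume only after success
            if (c > 0xFF) {
                // Leave in.position() at the bad char; encode() applies the error action.
                return CoderResult.unmappableForLength(1);
            }
            if (!out.hasRemaining()) {
                return CoderResult.OVERFLOW;       // caller must drain the output buffer
            }
            out.put((byte) c);
            in.position(in.position() + 1);
        }
        return CoderResult.UNDERFLOW;              // all available input consumed
    }

    // Usage helper: encodes text, replacing unmappable chars with the default '?'.
    static byte[] encodeReplacing(String text) {
        CharsetEncoder encoder = new ToyLatin1Encoder()
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            ByteBuffer bb = encoder.encode(CharBuffer.wrap(text));
            byte[] result = new byte[bb.remaining()];
            bb.get(result);
            return result;
        } catch (CharacterCodingException e) {
            throw new IllegalStateException("not expected with REPLACE", e);
        }
    }
}
```

Because the error action is applied by encode rather than by encodeLoop, the subclass works unchanged with REPORT, IGNORE, or REPLACE; encoding "naïve €" with REPLACE yields seven bytes, the last being '?'.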
- See Also:
Charset, CharsetDecoder
Constructor Summary
- protected CharsetEncoder(Charset cs, float averageBytesPerChar, float maxBytesPerChar): Constructs a new CharsetEncoder using the given parameters and the replacement byte array { (byte) '?' }.
- protected CharsetEncoder(Charset cs, float averageBytesPerChar, float maxBytesPerChar, byte[] replacement): Constructs a new CharsetEncoder using the given Charset, replacement byte array, and the average and maximum number of bytes created by this encoder for one input character.
Method Summary
- float averageBytesPerChar(): Returns the average number of bytes created by this encoder for a single input character.
- boolean canEncode(char c): Tests whether the given character can be encoded by this encoder.
- boolean canEncode(CharSequence sequence): Tests whether the given CharSequence can be encoded by this encoder.
- Charset charset(): Returns the Charset that this encoder uses.
- ByteBuffer encode(CharBuffer in): A facade method that performs a complete encoding operation in one call.
- CoderResult encode(CharBuffer in, ByteBuffer out, boolean endOfInput): Encodes characters starting at the current position of the given input buffer, and writes the equivalent byte sequence into the given output buffer from its current position.
- protected abstract CoderResult encodeLoop(CharBuffer in, ByteBuffer out): Encodes characters into bytes.
- CoderResult flush(ByteBuffer out): Flushes this encoder.
- protected CoderResult implFlush(ByteBuffer out): Flushes this encoder.
- protected void implOnMalformedInput(CodingErrorAction newAction): Notifies that this encoder's CodingErrorAction for malformed input errors has been changed.
- protected void implOnUnmappableCharacter(CodingErrorAction newAction): Notifies that this encoder's CodingErrorAction for unmappable-character errors has been changed.
- protected void implReplaceWith(byte[] newReplacement): Notifies that this encoder's replacement has been changed.
- protected void implReset(): Resets this encoder's charset-related state.
- boolean isLegalReplacement(byte[] replacement): Tests whether the given argument is legal as this encoder's replacement byte array.
- CodingErrorAction malformedInputAction(): Returns this encoder's CodingErrorAction for malformed input errors encountered during encoding.
- float maxBytesPerChar(): Returns the maximum number of bytes this encoder can create for one input character; this value is always positive.
- CharsetEncoder onMalformedInput(CodingErrorAction newAction): Sets this encoder's action on malformed input errors.
- CharsetEncoder onUnmappableCharacter(CodingErrorAction newAction): Sets this encoder's action on unmappable-character errors.
- byte[] replacement(): Returns the replacement byte array, which is never null or empty.
- CharsetEncoder replaceWith(byte[] replacement): Sets the new replacement value.
- CharsetEncoder reset(): Resets this encoder.
- CodingErrorAction unmappableCharacterAction(): Returns this encoder's CodingErrorAction for unmappable-character errors encountered during encoding.
Constructor Details
CharsetEncoder
protected CharsetEncoder(Charset cs, float averageBytesPerChar, float maxBytesPerChar)
Constructs a new CharsetEncoder using the given parameters and the replacement byte array { (byte) '?' }.
CharsetEncoder
protected CharsetEncoder(Charset cs, float averageBytesPerChar, float maxBytesPerChar, byte[] replacement)
Constructs a new CharsetEncoder using the given Charset, replacement byte array, and the average and maximum number of bytes created by this encoder for one input character.
- Parameters:
cs - the Charset to be used by this encoder.
averageBytesPerChar - the average number of bytes created by this encoder for a single input character; must be positive.
maxBytesPerChar - the maximum number of bytes this encoder can create for a single input character; must be positive.
replacement - the replacement byte array; it must not be null or empty, its length must not exceed maxBytesPerChar, and it must be a legal replacement as determined by isLegalReplacement.
- Throws:
IllegalArgumentException - if any parameter is invalid.
Method Details
averageBytesPerChar
public final float averageBytesPerChar()
Returns the average number of bytes created by this encoder for a single input character.
canEncode
public boolean canEncode(char c)
Tests whether the given character can be encoded by this encoder.
Note that this method may change the internal state of this encoder, so it must not be called while another encoding operation is in progress; otherwise it throws an IllegalStateException.
- Throws:
IllegalStateException - if another encoding operation is in progress.
canEncode
Tests whether the given CharSequence can be encoded by this encoder.
Note that this method may change the internal state of this encoder, so it must not be called while another encoding operation is in progress; otherwise it throws an IllegalStateException.
- Throws:
IllegalStateException - if another encoding operation is in progress.
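A short sketch of both canEncode overloads, using the JDK's StandardCharsets encoders; CanEncodeDemo, unencodable, and allEncodable are hypothetical names. Each call is made while no encoding operation is in progress, as the note above requires. The per-char overload sees each half of a surrogate pair on its own, so for supplementary characters the CharSequence overload is the better check.

```java
import java.nio.charset.CharsetEncoder;

final class CanEncodeDemo {

    // Collects the chars of text that the encoder reports it cannot encode,
    // testing one char at a time with canEncode(char).
    static String unencodable(CharsetEncoder encoder, String text) {
        StringBuilder bad = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!encoder.canEncode(c)) {
                bad.append(c);
            }
        }
        return bad.toString();
    }

    // Whole-sequence check with canEncode(CharSequence); surrogate pairs are seen together.
    static boolean allEncodable(CharsetEncoder encoder, CharSequence text) {
        return encoder.canEncode(text);
    }
}
```

With a US-ASCII encoder, unencodable(ascii, "naïve café") returns "ïé"; with a UTF-8 encoder, an emoji passes allEncodable even though each of its two chars fails the per-char test.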
charset
Returns the Charset that this encoder uses.
encode
This is a facade method for the encoding operation. It encodes the remaining characters of the given character buffer into a new byte buffer, performing a complete encoding operation: it first resets the encoder, then encodes, and finally flushes.
This method should not be invoked while another encoding operation is in progress.
- Parameters:
in - the input buffer.
- Returns:
a new ByteBuffer containing the bytes produced by this encoding operation. The buffer's position is zero and its limit is just past the last byte written.
- Throws:
IllegalStateException - if another encoding operation is in progress.
MalformedInputException - if an illegal input character sequence for this charset is encountered and the action for malformed input errors is CodingErrorAction.REPORT.
UnmappableCharacterException - if a legal but unmappable input character sequence for this charset is encountered and the action for unmappable-character errors is CodingErrorAction.REPORT. Unmappable means that the Unicode character sequence at the input buffer's current position cannot be mapped to an equivalent byte sequence.
CharacterCodingException - if any other coding error occurs during the encoding operation.
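A small sketch of the one-shot method, using the JDK's StandardCharsets encoders; OneShotEncodeDemo and encodedLength are hypothetical names. It relies on the behavior described above: the returned buffer is ready to read from position zero, and malformed or unmappable input raises the corresponding exception while the action is REPORT (the default for a freshly created encoder).

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.MalformedInputException;
import java.nio.charset.UnmappableCharacterException;

final class OneShotEncodeDemo {

    // Encodes all of text with encoder.encode(CharBuffer), which resets, encodes, and flushes.
    // Returns the byte count, -1 for an unmappable character, or -2 for malformed input
    // (both are reported because the encoder's actions are left at REPORT).
    static int encodedLength(CharsetEncoder encoder, String text) {
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(text));
            // Ready for reading: position is 0 and limit is just past the last byte.
            return bytes.limit() - bytes.position();
        } catch (UnmappableCharacterException e) {
            return -1;
        } catch (MalformedInputException e) {
            return -2;
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the method resets the encoder before encoding, one instance can be passed to encodedLength repeatedly, even after a previous call failed.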
encode
Encodes characters starting at the current position of the given input buffer and writes the equivalent byte sequence into the given output buffer starting at its current position. The buffers' positions are advanced by the reading and writing operations, but their limits and marks are left intact.
A CoderResult instance is returned according to the following rules:
- A malformed-input result indicates that a malformed input error was encountered; the erroneous characters start at the input buffer's position, and their number is given by the result's length. This kind of result can be returned only if the malformed-input action is CodingErrorAction.REPORT.
- CoderResult.UNDERFLOW indicates that as many characters as possible in the input buffer have been encoded. If there is no further input and no characters are left in the input buffer, the task is complete. Otherwise, the client should call this method again, supplying more input characters.
- CoderResult.OVERFLOW indicates that the output buffer has been filled while some characters remain in the input buffer. This method should be invoked again with a non-full output buffer.
- An unmappable-character result indicates that an unmappable character error was encountered; the erroneous characters start at the input buffer's position, and their number is given by the result's length. This kind of result can be returned only if the unmappable-character action is CodingErrorAction.REPORT.
The endOfInput parameter indicates whether the invoker can provide further input. It is true if and only if the characters in the current input buffer are all of the remaining input for this encoding operation. It is common, and not an error, for the invoker to pass false and then find that no more input is available; however, passing true in several consecutive invocations may cause an error, because any incomplete input left in the buffer is then treated as malformed input.
This method invokes the encodeLoop method to implement the basic encoding logic for a specific charset.
- Parameters:
in - the input buffer.
out - the output buffer.
endOfInput - true if all the input characters have been provided.
- Returns:
a CoderResult instance indicating the result.
- Throws:
IllegalStateException - if the encoding operation has already started or no more input is needed in this encoding process.
CoderMalfunctionError - if the encodeLoop method threw a BufferUnderflowException or BufferOverflowException.
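The result rules above can be observed by pushing input through a deliberately small output buffer. In this sketch, CoderResultDemo and trace are hypothetical names; the JDK's US-ASCII encoder is used with its default REPORT actions, and endOfInput is true on every call because the whole input is already in the buffer.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

final class CoderResultDemo {

    // Encodes text as US-ASCII through a fixed 4-byte output buffer and records one letter
    // per call: O = OVERFLOW, U = UNDERFLOW, M = malformed input, X = unmappable character.
    static String trace(String text) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(4);
        StringBuilder log = new StringBuilder();
        while (true) {
            CoderResult cr = encoder.encode(in, out, true);
            if (cr.isOverflow()) {
                log.append('O');
                out.clear();            // stands in for writing the 4 bytes somewhere
            } else if (cr.isUnderflow()) {
                log.append('U');
                return log.toString();
            } else {
                // Error: the bad input starts at in.position() and spans cr.length() chars.
                log.append(cr.isMalformed() ? 'M' : 'X');
                return log.toString();
            }
        }
    }
}
```

Ten ASCII characters produce "OOU" (4, 4, then 2 bytes), "abé" produces "X" after writing the first two bytes, and a lone low surrogate produces "M".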
encodeLoop
Encodes characters into bytes. This method is called by encode.
This method implements the essential encoding operation. It keeps encoding until all the input characters have been read, the output buffer has been filled, or an error is encountered, and then returns a CoderResult object indicating the result of the current encoding operation. The rules for constructing the CoderResult are the same as for encode. When an error is encountered, most implementations of this method return a corresponding result object to the encode method; subclasses may instead handle the error and implement the error action themselves.
The buffers are scanned from their current positions, and their positions are advanced accordingly, while their marks and limits remain intact. At most in.remaining() characters are read and at most out.remaining() bytes are written.
Note that some implementations may pre-scan the input buffer and return CoderResult.UNDERFLOW until they receive sufficient input.
- Parameters:
in - the input buffer.
out - the output buffer.
- Returns:
a CoderResult instance indicating the result.
flush
Flushes this encoder.
This method calls implFlush. Some encoders may need to write bytes to the output buffer after they have read all of the input characters; subclasses can override implFlush to perform this write.
No more than out.remaining() bytes are written. If the encoder needs to write more bytes than the output buffer has room for, CoderResult.OVERFLOW is returned, and this method must be called again with a byte buffer that has free space. Otherwise this method returns CoderResult.UNDERFLOW, which means that the encoding operation has completed successfully.
During the flush, the output buffer's position is advanced accordingly, while its mark and limit remain intact.
- Parameters:
out - the given output buffer.
- Returns:
CoderResult.UNDERFLOW or CoderResult.OVERFLOW.
- Throws:
IllegalStateException - if this encoder has neither already been flushed nor reached the end of input.
implFlush
Flushes this encoder. The default implementation does nothing and always returns CoderResult.UNDERFLOW; this method can be overridden if needed.
- Parameters:
out - the output buffer.
- Returns:
CoderResult.UNDERFLOW or CoderResult.OVERFLOW.
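A stateful encoder is where implFlush and implReset matter. The sketch below is a hypothetical shift-state scheme (loosely in the spirit of designs that switch modes with SO/SI bytes, but not any real charset): chars U+0080..U+00FF are written in a "shifted" mode, and implFlush returns to the unshifted state at the end of output. ShiftEncoder and encodeToArray are illustrative names, and ISO_8859_1 is used only as the reported Charset object.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical shift-state encoder, not a real charset: U+0000..U+007F are written as-is;
// U+0080..U+00FF are written as (c - 0x80) in "shifted" mode, entered with SO (0x0E) and
// left with SI (0x0F). Simplification: replacement bytes are not shift-aware, so this
// sketch assumes the default REPORT actions.
final class ShiftEncoder extends CharsetEncoder {
    private static final byte SO = 0x0E;
    private static final byte SI = 0x0F;

    private boolean shifted;   // charset-specific state, cleared by implReset()

    ShiftEncoder() {
        // A char may need a shift byte plus a data byte, hence maxBytesPerChar = 2.
        super(StandardCharsets.ISO_8859_1, 1.0f, 2.0f);
    }

    @Override
    protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
        while (in.hasRemaining()) {
            char c = in.get(in.position());
            if (c > 0xFF) {
                return CoderResult.unmappableForLength(1);
            }
            boolean wantShift = c >= 0x80;
            int needed = (wantShift != shifted) ? 2 : 1;
            if (out.remaining() < needed) {
                return CoderResult.OVERFLOW;      // nothing consumed for this char
            }
            if (wantShift != shifted) {
                out.put(wantShift ? SO : SI);
                shifted = wantShift;
            }
            out.put((byte) (wantShift ? c - 0x80 : c));
            in.position(in.position() + 1);
        }
        return CoderResult.UNDERFLOW;
    }

    @Override
    protected CoderResult implFlush(ByteBuffer out) {
        if (shifted) {
            if (!out.hasRemaining()) {
                return CoderResult.OVERFLOW;      // flush() must be called again
            }
            out.put(SI);                          // end the output in the initial state
            shifted = false;
        }
        return CoderResult.UNDERFLOW;
    }

    @Override
    protected void implReset() {
        shifted = false;                          // forget any abandoned operation's mode
    }

    // Usage helper: encode(CharBuffer) resets, encodes, and flushes in one call.
    static byte[] encodeToArray(String text) {
        try {
            ByteBuffer bb = new ShiftEncoder().encode(CharBuffer.wrap(text));
            byte[] result = new byte[bb.remaining()];
            bb.get(result);
            return result;
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Encoding "aé" yields 0x61, 0x0E, 0x69, 0x0F: the final SI comes from implFlush, not from encodeLoop, because only the flush step knows that no more input will follow.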
implOnMalformedInput
Notifies that this encoder's CodingErrorAction specified for malformed input errors has been changed. The default implementation does nothing; this method can be overridden if needed.
- Parameters:
newAction - the new action.
implOnUnmappableCharacter
Notifies that this encoder's CodingErrorAction specified for unmappable-character errors has been changed. The default implementation does nothing; this method can be overridden if needed.
- Parameters:
newAction - the new action.
implReplaceWith
protected void implReplaceWith(byte[] newReplacement)
Notifies that this encoder's replacement has been changed. The default implementation does nothing; this method can be overridden if needed.
- Parameters:
newReplacement - the new replacement byte array.
implReset
protected void implReset()
Resets this encoder's charset-related state. The default implementation does nothing; this method can be overridden if needed.
isLegalReplacement
public boolean isLegalReplacement(byte[] replacement)
Tests whether the given argument is legal as this encoder's replacement byte array. The given byte array is legal if and only if it can be decoded into characters.
malformedInputAction
Returns this encoder's CodingErrorAction for malformed input errors encountered during encoding.
maxBytesPerChar
public final float maxBytesPerChar()
Returns the maximum number of bytes this encoder can create for one input character. This value is always positive.
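One common use of maxBytesPerChar() is sizing an output buffer large enough that encodeLoop can never report OVERFLOW. This is a minimal sketch under one stated assumption: the encoder writes nothing extra during flush (a stateful encoder that writes trailing bytes in implFlush would need a larger bound). BufferSizingDemo and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

final class BufferSizingDemo {

    // Upper bound on the bytes produced while encoding `chars` characters, derived from
    // maxBytesPerChar(). Assumption: the encoder emits no extra bytes in flush().
    static int worstCaseBytes(CharsetEncoder encoder, int chars) {
        return (int) Math.ceil(chars * (double) encoder.maxBytesPerChar());
    }

    // Encodes text into one buffer sized by worstCaseBytes, so OVERFLOW is not expected;
    // returns the number of bytes actually written.
    static int encodeIntoWorstCaseBuffer(CharsetEncoder encoder, String text) {
        ByteBuffer out = ByteBuffer.allocate(worstCaseBytes(encoder, text.length()));
        encoder.reset();
        CoderResult cr = encoder.encode(CharBuffer.wrap(text), out, true);
        if (cr.isUnderflow()) {
            cr = encoder.flush(out);
        }
        if (!cr.isUnderflow()) {
            throw new IllegalStateException("unexpected result: " + cr);
        }
        return out.position();
    }
}
```

The worst-case bound can waste space for mostly-ASCII text; averageBytesPerChar() is better suited to an initial guess that is grown when OVERFLOW is returned.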
onMalformedInput
Sets this encoder's action on malformed input errors. This method calls the implOnMalformedInput method with the given new action as its argument.
- Parameters:
newAction - the new action on malformed input errors.
- Returns:
this encoder.
- Throws:
IllegalArgumentException - if newAction is null.
onUnmappableCharacter
Sets this encoder's action on unmappable-character errors. This method calls the implOnUnmappableCharacter method with the given new action as its argument.
- Parameters:
newAction - the new action on unmappable-character errors.
- Returns:
this encoder.
- Throws:
IllegalArgumentException - if newAction is null.
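Since both setters return this encoder, they can be chained, and the corresponding getters report the configured actions. A minimal sketch using the JDK's UTF-8 encoder; ActionSettersDemo, lenientUtf8, and rejectsNullAction are hypothetical names.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class ActionSettersDemo {

    // Configures both error actions in one chained expression; each setter returns this encoder.
    static CharsetEncoder lenientUtf8() {
        return StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
    }

    // Returns true if a null action is rejected with IllegalArgumentException, as documented.
    static boolean rejectsNullAction() {
        try {
            StandardCharsets.UTF_8.newEncoder().onMalformedInput(null);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }
}
```

A fresh encoder reports CodingErrorAction.REPORT from both getters until one of these setters is called.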
replacement
public final byte[] replacement()
Returns the replacement byte array, which is never null or empty.
replaceWith
Sets the new replacement value. This method first checks the validity of the given replacement, then changes the replacement value, and finally calls the implReplaceWith method with the new replacement as its argument.
- Parameters:
replacement - the replacement byte array; it must not be null or empty, its length must not exceed maxBytesPerChar, and it must be a legal replacement as determined by isLegalReplacement(byte[] replacement).
- Returns:
this encoder.
- Throws:
IllegalArgumentException - if the given replacement does not satisfy the requirements above.
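The validation performed by replaceWith can be exercised directly. In this sketch, ReplacementDemo, tryReplaceWith, and encodeWithDashes are hypothetical names; it uses the JDK's UTF-8 and US-ASCII encoders. A byte 0xFF cannot be decoded as UTF-8, so it is not a legal replacement, and a two-byte replacement exceeds US-ASCII's maxBytesPerChar of 1.

```java
import java.io.UncheckedIOException;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class ReplacementDemo {

    // Attempts to install a replacement. replaceWith checks null/empty, the maxBytesPerChar
    // length limit, and legality (isLegalReplacement), throwing IllegalArgumentException.
    static boolean tryReplaceWith(CharsetEncoder encoder, byte[] replacement) {
        try {
            encoder.replaceWith(replacement);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Encodes text as UTF-8, substituting "--" for malformed input such as a lone surrogate.
    static String encodeWithDashes(String text) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE);
        if (!tryReplaceWith(encoder, new byte[] { (byte) '-', (byte) '-' })) {
            throw new IllegalStateException("replacement rejected");
        }
        try {
            return StandardCharsets.UTF_8.decode(encoder.encode(CharBuffer.wrap(text))).toString();
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, a string containing an unpaired low surrogate between "a" and "b" encodes to "a--b".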
reset
Resets this encoder. This method resets the internal state and then calls implReset() to reset any state specific to the charset.
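When the three-argument API is used, an encoder that has completed an operation must be reset before it is reused. This sketch assumes the state checks described under encode and flush, under which calling encode on an already flushed encoder throws IllegalStateException. ResetDemo, encodeOnce, and totalBytes are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

final class ResetDemo {

    // One complete operation with the three-argument API: encode(..., true), then flush.
    // Assumes the output fits in a buffer sized from maxBytesPerChar().
    static byte[] encodeOnce(CharsetEncoder encoder, String text) {
        ByteBuffer out = ByteBuffer.allocate(
                (int) Math.ceil(text.length() * (double) encoder.maxBytesPerChar()));
        CoderResult cr = encoder.encode(CharBuffer.wrap(text), out, true);
        if (cr.isUnderflow()) {
            cr = encoder.flush(out);
        }
        if (!cr.isUnderflow()) {
            throw new IllegalArgumentException("could not encode: " + cr);
        }
        out.flip();
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    // Reuses one encoder for several strings, calling reset() before each operation.
    static int totalBytes(CharsetEncoder encoder, String... texts) {
        int total = 0;
        for (String text : texts) {
            encoder.reset();   // required before reusing a flushed encoder
            total += encodeOnce(encoder, text).length;
        }
        return total;
    }
}
```

Omitting the reset() call in totalBytes would make the second iteration fail, since the encoder would still be in its flushed state.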
unmappableCharacterAction
Returns this encoder's CodingErrorAction for unmappable-character errors encountered during encoding.