Package com.helger.commons.charset
Enum EUnicodeBOM
java.lang.Object
  java.lang.Enum<EUnicodeBOM>
    com.helger.commons.charset.EUnicodeBOM
- All Implemented Interfaces:
- Serializable, Comparable<EUnicodeBOM>
public enum EUnicodeBOM extends Enum<EUnicodeBOM>
Defines the most common Byte Order Marks (BOMs) for Unicode-encoded text files.
Source: http://de.wikipedia.org/wiki/Byte_Order_Mark
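Important: BOMs with more bytes must come first to avoid false detections, because a shorter BOM can be a prefix of a longer one (the UTF-16 Little Endian BOM FF FE is also the start of the UTF-32 Little Endian BOM FF FE 00 00).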
Note: SCSU = A Standard Compression Scheme for Unicode: http://www.unicode.org/reports/tr6/
Note: BOCU = Binary Ordered Compression for Unicode
- Author:
- Philip Helger
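The ordering rule in the class description can be illustrated with a small standalone sketch. It does not use or reproduce the EUnicodeBOM implementation; the class `BomOrderSketch`, its tables, and the `firstMatch` helper are hypothetical names introduced only for this illustration. The sketch checks the same input against three well-known BOMs, once longest-first and once shortest-first.

```java
/** Standalone illustration of why longer BOMs must be tested first. */
final class BomOrderSketch {
  // Hypothetical table, ordered from the longest BOM to the shortest.
  static final String[] NAMES = {"UTF-32LE", "UTF-8", "UTF-16LE"};
  static final byte[][] BOMS = {
    {(byte) 0xFF, (byte) 0xFE, 0x00, 0x00},
    {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF},
    {(byte) 0xFF, (byte) 0xFE},
  };

  /** True if data begins with prefix; null or too-short data never matches. */
  static boolean startsWith(byte[] data, byte[] prefix) {
    if (data == null || data.length < prefix.length) {
      return false;
    }
    for (int i = 0; i < prefix.length; i++) {
      if (data[i] != prefix[i]) {
        return false;
      }
    }
    return true;
  }

  /** Returns the name of the first BOM that matches in the chosen order, or null. */
  static String firstMatch(byte[] data, boolean longestFirst) {
    for (int k = 0; k < BOMS.length; k++) {
      int i = longestFirst ? k : BOMS.length - 1 - k;
      if (startsWith(data, BOMS[i])) {
        return NAMES[i];
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // UTF-32LE BOM followed by the letter 'A' encoded in UTF-32LE.
    byte[] data = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x00, 0x41, 0x00, 0x00, 0x00};
    System.out.println(firstMatch(data, true));  // prints UTF-32LE
    System.out.println(firstMatch(data, false)); // prints UTF-16LE (wrong detection)
  }
}
```

With the shortest-first order the two-byte UTF-16 Little Endian BOM matches before the four-byte UTF-32 Little Endian BOM is ever tried, which is the wrong detection the note warns about.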
Enum Constant Summary
Enum Constant: Description
BOM_BOCU_1: BOCU
BOM_BOCU_1_ALT2: BOCU
BOM_GB_18030: GB 18030
BOM_SCSU: SCSU - Single-byte mode Quote Unicode
BOM_SCSU_TO_UCS: SCSU - Single-byte mode Change to Unicode
BOM_SCSU_W0_TO_FE80: SCSU - Single-byte mode Define dynamic window 0 to 0xFE80
BOM_SCSU_W1_TO_FE80: SCSU - Single-byte mode Define dynamic window 1 to 0xFE80
BOM_SCSU_W2_TO_FE80: SCSU - Single-byte mode Define dynamic window 2 to 0xFE80
BOM_SCSU_W3_TO_FE80: SCSU - Single-byte mode Define dynamic window 3 to 0xFE80
BOM_SCSU_W4_TO_FE80: SCSU - Single-byte mode Define dynamic window 4 to 0xFE80
BOM_SCSU_W5_TO_FE80: SCSU - Single-byte mode Define dynamic window 5 to 0xFE80
BOM_SCSU_W6_TO_FE80: SCSU - Single-byte mode Define dynamic window 6 to 0xFE80
BOM_SCSU_W7_TO_FE80: SCSU - Single-byte mode Define dynamic window 7 to 0xFE80
BOM_UTF_1: UTF-1
BOM_UTF_16_BIG_ENDIAN: UTF-16 Big Endian
BOM_UTF_16_LITTLE_ENDIAN: UTF-16 Little Endian
BOM_UTF_32_BIG_ENDIAN: UTF-32 Big Endian
BOM_UTF_32_LITTLE_ENDIAN: UTF-32 Little Endian
BOM_UTF_7: UTF-7
BOM_UTF_7_ALT2: UTF-7
BOM_UTF_7_ALT3: UTF-7
BOM_UTF_7_ALT4: UTF-7
BOM_UTF_8: UTF-8
BOM_UTF_EBCDIC: UTF-EBCDIC
-
Method Summary
Modifier and Type, Method: Description
byte[] getAllBytes()
int getByteCount()
Charset getCharset()
String getCharsetName()
static EUnicodeBOM getFromBytesOrNull(byte[] aBytes): Find the BOM that is matching the passed byte array.
static int getMaximumByteCount()
boolean hasCharset()
boolean isPresent(byte[] aBytes): Check if the passed byte array starts with this BOM's bytes.
static EUnicodeBOM valueOf(String name): Returns the enum constant of this type with the specified name.
static EUnicodeBOM[] values(): Returns an array containing the constants of this enum type, in the order they are declared.
-
-
-
Enum Constant Detail
-
BOM_UTF_32_BIG_ENDIAN
public static final EUnicodeBOM BOM_UTF_32_BIG_ENDIAN
UTF-32 Big Endian
-
BOM_UTF_32_LITTLE_ENDIAN
public static final EUnicodeBOM BOM_UTF_32_LITTLE_ENDIAN
UTF-32 Little Endian
-
BOM_UTF_7
public static final EUnicodeBOM BOM_UTF_7
UTF-7
-
BOM_UTF_7_ALT2
public static final EUnicodeBOM BOM_UTF_7_ALT2
UTF-7
-
BOM_UTF_7_ALT3
public static final EUnicodeBOM BOM_UTF_7_ALT3
UTF-7
-
BOM_UTF_7_ALT4
public static final EUnicodeBOM BOM_UTF_7_ALT4
UTF-7
-
BOM_UTF_EBCDIC
public static final EUnicodeBOM BOM_UTF_EBCDIC
UTF-EBCDIC
-
BOM_BOCU_1_ALT2
public static final EUnicodeBOM BOM_BOCU_1_ALT2
BOCU
-
BOM_GB_18030
public static final EUnicodeBOM BOM_GB_18030
GB 18030
-
BOM_UTF_8
public static final EUnicodeBOM BOM_UTF_8
UTF-8
-
BOM_UTF_1
public static final EUnicodeBOM BOM_UTF_1
UTF-1
-
BOM_BOCU_1
public static final EUnicodeBOM BOM_BOCU_1
BOCU
-
BOM_SCSU
public static final EUnicodeBOM BOM_SCSU
SCSU - Single-byte mode Quote Unicode
-
BOM_SCSU_TO_UCS
public static final EUnicodeBOM BOM_SCSU_TO_UCS
SCSU - Single-byte mode Change to Unicode
-
BOM_SCSU_W0_TO_FE80
public static final EUnicodeBOM BOM_SCSU_W0_TO_FE80
SCSU - Single-byte mode Define dynamic window 0 to 0xFE80
-
BOM_SCSU_W1_TO_FE80
public static final EUnicodeBOM BOM_SCSU_W1_TO_FE80
SCSU - Single-byte mode Define dynamic window 1 to 0xFE80
-
BOM_SCSU_W2_TO_FE80
public static final EUnicodeBOM BOM_SCSU_W2_TO_FE80
SCSU - Single-byte mode Define dynamic window 2 to 0xFE80
-
BOM_SCSU_W3_TO_FE80
public static final EUnicodeBOM BOM_SCSU_W3_TO_FE80
SCSU - Single-byte mode Define dynamic window 3 to 0xFE80
-
BOM_SCSU_W4_TO_FE80
public static final EUnicodeBOM BOM_SCSU_W4_TO_FE80
SCSU - Single-byte mode Define dynamic window 4 to 0xFE80
-
BOM_SCSU_W5_TO_FE80
public static final EUnicodeBOM BOM_SCSU_W5_TO_FE80
SCSU - Single-byte mode Define dynamic window 5 to 0xFE80
-
BOM_SCSU_W6_TO_FE80
public static final EUnicodeBOM BOM_SCSU_W6_TO_FE80
SCSU - Single-byte mode Define dynamic window 6 to 0xFE80
-
BOM_SCSU_W7_TO_FE80
public static final EUnicodeBOM BOM_SCSU_W7_TO_FE80
SCSU - Single-byte mode Define dynamic window 7 to 0xFE80
-
BOM_UTF_16_BIG_ENDIAN
public static final EUnicodeBOM BOM_UTF_16_BIG_ENDIAN
UTF-16 Big Endian
-
BOM_UTF_16_LITTLE_ENDIAN
public static final EUnicodeBOM BOM_UTF_16_LITTLE_ENDIAN
UTF-16 Little Endian
-
-
Method Detail
-
values
public static EUnicodeBOM[] values()
Returns an array containing the constants of this enum type, in the order they are declared. This method may be used to iterate over the constants as follows:
    for (EUnicodeBOM c : EUnicodeBOM.values())
        System.out.println(c);
- Returns:
- an array containing the constants of this enum type, in the order they are declared
-
valueOf
public static EUnicodeBOM valueOf(String name)
Returns the enum constant of this type with the specified name. The string must match exactly an identifier used to declare an enum constant in this type. (Extraneous whitespace characters are not permitted.)
- Parameters:
- name - the name of the enum constant to be returned.
- Returns:
- the enum constant with the specified name
- Throws:
- IllegalArgumentException - if this enum type has no constant with the specified name
- NullPointerException - if the argument is null
-
getAllBytes
@Nonnull @Nonempty @ReturnsMutableCopy public byte[] getAllBytes()
- Returns:
- A copy of the byte array that identifies this BOM.
-
getByteCount
@Nonnegative public int getByteCount()
- Returns:
- The number of bytes defining this BOM
-
isPresent
public boolean isPresent(@Nullable byte[] aBytes)
Check if the passed byte array starts with this BOM's bytes.
- Parameters:
- aBytes - The byte array to search for a BOM. May be null or empty.
- Returns:
- true if the passed byte array starts with this BOM, false otherwise.
-
getCharsetName
@Nullable public String getCharsetName()
- Returns:
- The name of the charset. This may be null if no known charset exists for Java. This string may be present, even if getCharset() returns null. To support e.g. "utf-7" you need to add additional JAR files.
-
getCharset
@Nullable public Charset getCharset()
- Returns:
- The charset matching this BOM. May be null if the charset is not part of the Sun JDK or there is not even a defined charset.
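The gap between a known charset name and an available Charset can be sketched with plain JDK calls. This is an assumption about how such a lookup might behave, not the code behind getCharset(); the class `CharsetLookupSketch` and the method `charsetOrNull` are hypothetical names used only here.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

/** Standalone illustration: a charset name may exist without a usable Charset. */
final class CharsetLookupSketch {
  /**
   * Resolves a charset name to a Charset, or returns null when the name is
   * null, syntactically illegal, or not supported by the running JVM.
   */
  static Charset charsetOrNull(String name) {
    if (name == null) {
      return null;
    }
    try {
      return Charset.forName(name);
    } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
      // The name is known to the caller, but this JVM has no provider for it.
      return null;
    }
  }
}
```

A caller that receives a non-null name but a null Charset can report the name in an error message, which is more helpful than reporting an unknown encoding.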
-
hasCharset
public boolean hasCharset()
- Returns:
- true if this BOM has an assigned charset, false if not.
- Since:
- 9.0.0
- See Also:
- getCharset()
-
getMaximumByteCount
@Nonnegative public static int getMaximumByteCount()
- Returns:
- The maximum number of bytes a BOM may have.
-
getFromBytesOrNull
@Nullable public static EUnicodeBOM getFromBytesOrNull(@Nullable byte[] aBytes)
Find the BOM that is matching the passed byte array.
- Parameters:
- aBytes - The bytes to be checked for the BOM. May be null. To check all BOMs, this array must have at least 4 (=getMaximumByteCount()) bytes.
- Returns:
- null if the passed bytes do not resemble a BOM.
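A typical use of BOM detection is to strip the BOM and decode the remaining bytes with the matching charset. The following is a minimal standalone sketch of that idea using only the JDK. It does not call EUnicodeBOM, it covers only the three BOMs whose charsets are in StandardCharsets (so UTF-32 input is out of scope here), and `BomDecodeSketch`, `findBom`, and `decode` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Standalone illustration: detect a BOM, skip it, decode the rest. */
final class BomDecodeSketch {
  // Hypothetical tables; entry i of BOMS belongs to entry i of CHARSETS.
  private static final byte[][] BOMS = {
    {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF},
    {(byte) 0xFE, (byte) 0xFF},
    {(byte) 0xFF, (byte) 0xFE},
  };
  private static final Charset[] CHARSETS = {
    StandardCharsets.UTF_8, StandardCharsets.UTF_16BE, StandardCharsets.UTF_16LE,
  };

  /** Returns the table index of the BOM that data starts with, or -1 if none. */
  static int findBom(byte[] data) {
    if (data == null) {
      return -1;
    }
    for (int i = 0; i < BOMS.length; i++) {
      byte[] bom = BOMS[i];
      if (data.length >= bom.length
          && Arrays.equals(Arrays.copyOf(data, bom.length), bom)) {
        return i;
      }
    }
    return -1;
  }

  /** Decodes non-null data, skipping a leading BOM; uses fallback if no BOM is found. */
  static String decode(byte[] data, Charset fallback) {
    int i = findBom(data);
    if (i < 0) {
      return new String(data, fallback);
    }
    int skip = BOMS[i].length;
    return new String(data, skip, data.length - skip, CHARSETS[i]);
  }
}
```

When reading from a stream instead of an array, the same idea applies if at least as many leading bytes are buffered as the longest BOM in the table.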
-
-