Class StringUtil

java.lang.Object
org.apache.poi.util.StringUtil

@Internal public final class StringUtil extends Object
Collection of string-handling utilities.
  • Field Details

    • UTF16LE

      public static final Charset UTF16LE
    • UTF8

      public static final Charset UTF8
    • WIN_1252

      public static final Charset WIN_1252
  • Method Details

    • setMaxRecordLength

      public static void setMaxRecordLength(int length)
      Parameters:
      length - the max record length allowed for StringUtil
    • getMaxRecordLength

      public static int getMaxRecordLength()
      Returns:
      the max record length allowed for StringUtil
    • getFromUnicodeLE

      public static String getFromUnicodeLE(byte[] string, int offset, int len) throws ArrayIndexOutOfBoundsException, IllegalArgumentException
      Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it.

      { 0x16, 0x00 } -> 0x16

      Parameters:
      string - the byte array to be converted
      offset - the initial offset into the byte array. It is assumed that string[offset] and string[offset + 1] contain the first 16-bit Unicode character.
      len - the length of the final string, in characters
      Returns:
      the converted string, never null.
      Throws:
      ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
      IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
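      The decoding contract above can be illustrated with a minimal JDK-only sketch, assuming the input is plain UTF-16LE code units. The class Utf16LeDecodeSketch and its methods are hypothetical stand-ins, not POI's implementation; they reproduce the documented bounds checks and the { 0x16, 0x00 } -> 0x16 example, and the empty-array handling of the one-argument variant is an assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only sketch of the documented getFromUnicodeLE contract; not POI's code.
final class Utf16LeDecodeSketch {
    private Utf16LeDecodeSketch() {}

    // Decodes len UTF-16LE characters starting at offset, with the documented bounds checks.
    static String fromUnicodeLE(byte[] string, int offset, int len) {
        if (offset < 0 || offset >= string.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset " + offset);
        }
        // Each character occupies two bytes, so len characters need 2 * len bytes.
        if (len < 0 || (string.length - offset) / 2 < len) {
            throw new IllegalArgumentException("Illegal length " + len);
        }
        return new String(string, offset, len * 2, StandardCharsets.UTF_16LE);
    }

    // Whole-array variant; assumption: an empty array yields "".
    static String fromUnicodeLE(byte[] string) {
        if (string.length == 0) {
            return "";
        }
        return fromUnicodeLE(string, 0, string.length / 2);
    }
}
```

      For example, fromUnicodeLE(new byte[]{'H', 0, 'i', 0}, 2, 1) starts at the second byte pair and returns "i".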
    • getFromUnicodeLE

      public static String getFromUnicodeLE(byte[] string)
      Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it.

      { 0x16, 0x00 } -> 0x16

      Parameters:
      string - the byte array to be converted
      Returns:
      the converted string, never null
    • getToUnicodeLE

      public static byte[] getToUnicodeLE(String string)
      Converts a String to 16-bit Unicode characters in little-endian format.
      Parameters:
      string - the string to convert
      Returns:
      the byte array of 16-bit Unicode characters
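      The JDK's UTF-16LE charset produces the same layout, two bytes per character with the low byte first and no byte-order mark. The sketch below is a hypothetical stand-in for getToUnicodeLE, not POI's implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only sketch of getToUnicodeLE; not POI's code.
final class Utf16LeEncodeSketch {
    private Utf16LeEncodeSketch() {}

    // Encodes each UTF-16 code unit as two bytes, low byte first; UTF_16LE writes no BOM.
    static byte[] toUnicodeLE(String string) {
        return string.getBytes(StandardCharsets.UTF_16LE);
    }
}
```

      For instance, "\u20AC" (the euro sign) encodes to { 0xAC, 0x20 }.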
    • getFromCompressedUnicode

      public static String getFromCompressedUnicode(byte[] string, int offset, int len)
      Reads 8-bit data (in the ISO-8859-1 codepage) into a (Unicode) Java String. (In Excel terms, reads compressed 8-bit Unicode as a string.)
      Parameters:
      string - the byte array to read
      offset - the offset at which to start reading
      len - the number of bytes to read
      Returns:
      a String generated by decoding the byte array as ISO-8859-1
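      "Compressed" Unicode works because ISO-8859-1 maps every byte 0x00-0xFF to the character with the same code value. A minimal sketch of that mapping, using a hypothetical class rather than POI's code:

```java
// Hypothetical sketch of ISO-8859-1 ("compressed unicode") decoding; not POI's code.
final class CompressedUnicodeSketch {
    private CompressedUnicodeSketch() {}

    // Each byte becomes one char with the same value (0x00-0xFF).
    static String fromCompressedUnicode(byte[] string, int offset, int len) {
        char[] chars = new char[len];
        for (int i = 0; i < len; i++) {
            chars[i] = (char) (string[offset + i] & 0xFF); // mask to avoid sign extension
        }
        return new String(chars);
    }
}
```

      The result matches new String(string, offset, len, StandardCharsets.ISO_8859_1).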
    • getFromCompressedUTF8

      public static String getFromCompressedUTF8(byte[] string, int offset, int len)
      Reads 8-bit data (in the UTF-8 codepage) into a (Unicode) Java String. (In Excel terms, reads compressed 8-bit Unicode as a string.)
      Parameters:
      string - the byte array to read
      offset - the offset at which to start reading
      len - the number of bytes to read
      Returns:
      a String generated by decoding the byte array as UTF-8
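      Unlike the ISO-8859-1 variant, UTF-8 can spend several bytes on one character, so len counts bytes rather than characters. The hypothetical sketch below decodes with the JDK charset; the test contrasts the same two bytes read as UTF-8 and as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only sketch of getFromCompressedUTF8; not POI's code.
final class CompressedUtf8Sketch {
    private CompressedUtf8Sketch() {}

    // Decodes len bytes starting at offset as UTF-8.
    static String fromCompressedUTF8(byte[] string, int offset, int len) {
        return new String(string, offset, len, StandardCharsets.UTF_8);
    }
}
```

      The bytes { 0xC3, 0xA9 } decode to the single character U+00E9 as UTF-8, but to two characters (U+00C3, U+00A9) as ISO-8859-1.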
    • readCompressedUnicode

      public static String readCompressedUnicode(LittleEndianInput in, int nChars)
      Parameters:
      in - the stream to read from
      nChars - the number of characters to read
      Returns:
      the string decoded from ISO-8859-1 bytes
    • readUnicodeString

      public static String readUnicodeString(LittleEndianInput in)
      The input in is expected to contain:
      1. ushort nChars
      2. byte is16BitFlag
      3. byte[]/char[] characterData
      For this encoding, the is16BitFlag is always present even if nChars==0.

      This structure is also known as an XLUnicodeString.
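      The layout described above can be parsed as sketched below. Assumptions: a little-endian java.nio.ByteBuffer stands in for POI's LittleEndianInput, and bit 0 of is16BitFlag selects 16-bit character data (as in the fHighByte bit of the MS-XLS XLUnicodeString). The class and method names are hypothetical; readFlagAndData also illustrates the readUnicodeString(LittleEndianInput, int) variant, where nChars is stored elsewhere.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of reading an XLUnicodeString; ByteBuffer stands in for LittleEndianInput.
final class XLUnicodeStringReader {
    private XLUnicodeStringReader() {}

    static String readUnicodeString(ByteBuffer in) {
        in.order(ByteOrder.LITTLE_ENDIAN);
        int nChars = Short.toUnsignedInt(in.getShort()); // 1. ushort nChars
        return readFlagAndData(in, nChars);
    }

    // Reads the flag and data when nChars is supplied by the caller.
    static String readFlagAndData(ByteBuffer in, int nChars) {
        in.order(ByteOrder.LITTLE_ENDIAN);
        byte flag = in.get();                 // 2. is16BitFlag, present even if nChars == 0
        boolean is16Bit = (flag & 0x01) != 0; // assumption: bit 0 selects 16-bit data
        StringBuilder sb = new StringBuilder(nChars);
        for (int i = 0; i < nChars; i++) {    // 3. characterData
            if (is16Bit) {
                sb.append(in.getChar());             // two bytes, low byte first
            } else {
                sb.append((char) (in.get() & 0xFF)); // one ISO-8859-1 byte
            }
        }
        return sb.toString();
    }
}
```

      Note that even an empty string consumes three bytes: two for nChars and one for the flag.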

    • readUnicodeString

      public static String readUnicodeString(LittleEndianInput in, int nChars)
      The input in is expected to contain:
      1. byte is16BitFlag
      2. byte[]/char[] characterData
      For this encoding, the is16BitFlag is always present even if nChars==0.
      This method should be used when the nChars field is not stored as a ushort immediately before the is16BitFlag. Otherwise, readUnicodeString(LittleEndianInput) can be used.
    • writeUnicodeString

      public static void writeUnicodeString(LittleEndianOutput out, String value)
      The output out will receive:
      1. ushort nChars
      2. byte is16BitFlag
      3. byte[]/char[] characterData
      For this encoding, the is16BitFlag is always present even if nChars==0.
    • writeUnicodeStringFlagAndData

      public static void writeUnicodeStringFlagAndData(LittleEndianOutput out, String value)
      The output out will receive:
      1. byte is16BitFlag
      2. byte[]/char[] characterData
      For this encoding, the is16BitFlag is always present even if nChars==0.
      This method should be used when the nChars field is not stored as a ushort immediately before the is16BitFlag. Otherwise, writeUnicodeString(LittleEndianOutput, String) can be used.
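      The write side mirrors the read side. The sketch below is hypothetical: a ByteArrayOutputStream stands in for LittleEndianOutput, and it assumes that 16-bit data is chosen only when some character does not fit in one byte (0x00-0xFF), with the flag set to 1 in that case.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of writing an XLUnicodeString; not POI's code.
final class XLUnicodeStringWriter {
    private XLUnicodeStringWriter() {}

    static void writeUnicodeString(ByteArrayOutputStream out, String value) {
        int nChars = value.length();
        if (nChars > 0xFFFF) {
            throw new IllegalArgumentException("nChars does not fit in a ushort: " + nChars);
        }
        out.write(nChars & 0xFF);         // 1. ushort nChars, low byte first
        out.write((nChars >>> 8) & 0xFF); //    then the high byte
        writeFlagAndData(out, value);
    }

    static void writeFlagAndData(ByteArrayOutputStream out, String value) {
        boolean is16Bit = hasMultibyte(value);
        out.write(is16Bit ? 0x01 : 0x00); // 2. is16BitFlag, written even for ""
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            out.write(c & 0xFF);          // 3. low byte (the whole char when compressed)
            if (is16Bit) {
                out.write(c >>> 8);       //    high byte only for 16-bit data
            }
        }
    }

    // Assumption: a char is "multibyte" if it lies outside 0x00-0xFF.
    private static boolean hasMultibyte(String value) {
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0xFF) {
                return true;
            }
        }
        return false;
    }
}
```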
    • getEncodedSize

      public static int getEncodedSize(String value)
      Returns:
      the number of bytes that would be written by writeUnicodeString(LittleEndianOutput, String)
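      Given that layout, the encoded size follows from simple arithmetic: 2 bytes for nChars, 1 flag byte, then 1 or 2 bytes per character. The sketch is hypothetical and assumes the same "multibyte means outside 0x00-0xFF" rule used to pick 16-bit storage.

```java
// Hypothetical sketch of the size arithmetic for an XLUnicodeString; not POI's code.
final class EncodedSizeSketch {
    private EncodedSizeSketch() {}

    static int encodedSize(String value) {
        boolean multibyte = false;
        for (int i = 0; i < value.length() && !multibyte; i++) {
            multibyte = value.charAt(i) > 0xFF; // assumption: 16-bit storage needed
        }
        // 2 bytes nChars + 1 flag byte + 1 or 2 bytes per character
        return 2 + 1 + value.length() * (multibyte ? 2 : 1);
    }
}
```

      For example, "Hi" needs 5 bytes, while "a\u20AC" needs 7 because the euro sign forces both characters into 16-bit storage.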
    • putCompressedUnicode

      public static void putCompressedUnicode(String input, byte[] output, int offset)
      Takes a Unicode (Java) string and writes it as 8-bit data (in the ISO-8859-1 codepage) into the supplied byte array. (In Excel terms, writes compressed 8-bit Unicode.)
      Parameters:
      input - the String containing the data to be written
      output - the byte array to which the data is to be written
      offset - the offset into the byte array at which writing starts
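      A minimal JDK-only sketch of writing compressed data into an existing array, using a hypothetical class. It encodes with ISO-8859-1, so characters outside that codepage become '?' (the JDK's replacement byte for this charset); whether POI behaves identically for such characters is an assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of putCompressedUnicode(String, byte[], int); not POI's code.
final class PutCompressedSketch {
    private PutCompressedSketch() {}

    static void putCompressedUnicode(String input, byte[] output, int offset) {
        // One byte per char; unmappable chars are replaced with '?'.
        byte[] bytes = input.getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(bytes, 0, output, offset, bytes.length);
    }
}
```

      The output array must have room for input.length() bytes starting at offset; otherwise System.arraycopy throws IndexOutOfBoundsException.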
    • putCompressedUnicode

      public static void putCompressedUnicode(String input, LittleEndianOutput out)
    • putUnicodeLE

      public static void putUnicodeLE(String input, byte[] output, int offset)
      Takes a Unicode string and writes it as little-endian bytes (most significant byte last) into the supplied byte array. (In Excel terms, writes uncompressed Unicode.)
      Parameters:
      input - the String containing the Unicode data to be written
      output - the byte array to hold the uncompressed Unicode; it should have room for twice the length of the String, starting at offset
      offset - the offset to start writing into the byte array
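      "Most significant byte last" can be spelled out with explicit byte arithmetic. The class below is a hypothetical illustration, not POI's implementation.

```java
// Hypothetical sketch of putUnicodeLE(String, byte[], int); not POI's code.
final class PutUnicodeLESketch {
    private PutUnicodeLESketch() {}

    static void putUnicodeLE(String input, byte[] output, int offset) {
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            output[offset + 2 * i] = (byte) c;             // low byte first
            output[offset + 2 * i + 1] = (byte) (c >>> 8); // high byte last
        }
    }
}
```

      Writing "\u20ACA" at offset 1 of a 5-byte array yields { 0x00, 0xAC, 0x20, 0x41, 0x00 }.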
    • putUnicodeLE

      public static void putUnicodeLE(String input, LittleEndianOutput out)
    • readUnicodeLE

      public static String readUnicodeLE(LittleEndianInput in, int nChars)
    • getPreferredEncoding

      public static String getPreferredEncoding()
      Returns:
      the encoding we want to use, currently hardcoded to ISO-8859-1
    • hasMultibyte

      public static boolean hasMultibyte(String value)
      Checks whether the string contains a multibyte character.
      Parameters:
      value - the string to check
      Returns:
      true if the string contains at least one multibyte character
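      A sketch of such a check, assuming "multibyte" means a character outside the 8-bit range 0x00-0xFF (the characters that cannot be stored as compressed ISO-8859-1 data). The class name and the null handling are hypothetical assumptions.

```java
// Hypothetical sketch of hasMultibyte; not POI's code.
final class MultibyteSketch {
    private MultibyteSketch() {}

    static boolean hasMultibyte(String value) {
        if (value == null) {
            return false; // assumption: null contains no multibyte characters
        }
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0xFF) { // does not fit in a single byte
                return true;
            }
        }
        return false;
    }
}
```

      Under this rule "caf\u00E9" is not multibyte (U+00E9 fits in one byte), while "\u20AC" is.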
    • startsWithIgnoreCase

      public static boolean startsWithIgnoreCase(String haystack, String prefix)
      Tests if the string starts with the specified prefix, ignoring case.
    • endsWithIgnoreCase

      public static boolean endsWithIgnoreCase(String haystack, String suffix)
      Tests if the string ends with the specified suffix, ignoring case.
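      Both checks can be expressed with the JDK's String.regionMatches(boolean ignoreCase, ...), which returns false when the region falls outside either string. This is a hypothetical sketch that assumes non-null arguments.

```java
// Hypothetical sketch of startsWithIgnoreCase / endsWithIgnoreCase; not POI's code.
final class IgnoreCaseSketch {
    private IgnoreCaseSketch() {}

    static boolean startsWithIgnoreCase(String haystack, String prefix) {
        return haystack.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    static boolean endsWithIgnoreCase(String haystack, String suffix) {
        // A negative start (suffix longer than haystack) makes regionMatches return false.
        int start = haystack.length() - suffix.length();
        return haystack.regionMatches(true, start, suffix, 0, suffix.length());
    }
}
```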
    • toLowerCase

      @Internal public static String toLowerCase(char c)
    • toUpperCase

      @Internal public static String toUpperCase(char c)
    • isUpperCase

      @Internal public static boolean isUpperCase(char c)
    • mapMsCodepointString

      public static String mapMsCodepointString(String string)
      Some strings may contain characters encoded in the Unicode Private Use Area. Currently, the characters of the symbol fonts are mapped to the corresponding characters in the normal Unicode range.
      Parameters:
      string - the original string
      Returns:
      the string with mapped characters
    • join

      @Internal public static String join(Object[] array, String separator)
    • join

      @Internal public static String join(Object[] array)
    • join

      @Internal public static String join(String separator, Object... array)
    • countMatches

      public static int countMatches(CharSequence haystack, char needle)
      Counts the number of occurrences of needle in haystack. Has the same signature as org.apache.commons.lang3.StringUtils#countMatches.
      Parameters:
      haystack - the CharSequence to check, may be null
      needle - the character to count the quantity of
      Returns:
      the number of occurrences, 0 if the CharSequence is null
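      The documented behavior amounts to a single pass over the sequence. A minimal sketch with a hypothetical class name:

```java
// Hypothetical sketch of countMatches(CharSequence, char); not POI's code.
final class CountMatchesSketch {
    private CountMatchesSketch() {}

    static int countMatches(CharSequence haystack, char needle) {
        if (haystack == null) {
            return 0; // documented: 0 for a null CharSequence
        }
        int count = 0;
        for (int i = 0; i < haystack.length(); i++) {
            if (haystack.charAt(i) == needle) {
                count++;
            }
        }
        return count;
    }
}
```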
    • getFromUnicodeLE0Terminated

      public static String getFromUnicodeLE0Terminated(byte[] string, int offset, int len) throws ArrayIndexOutOfBoundsException, IllegalArgumentException
      Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it. Scans the byte array for two consecutive 0 bytes and returns the string before them.

      #61881: some programs also write the 0-termination at the beginning of the string. In that case, the method checks whether the next two bytes contain a valid ASCII character and corrects the _recdata with a '?' character.

      Parameters:
      string - the byte array to be converted
      offset - the initial offset into the byte array. It is assumed that string[offset] and string[offset + 1] contain the first 16-bit Unicode character.
      len - the maximum length of the final string
      Returns:
      the converted string, never null.
      Throws:
      ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
      IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
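      The core scan can be sketched as follows: walk the byte pairs and stop at the first 0x0000 code unit or after len characters. This hypothetical sketch keeps the documented bounds checks but deliberately omits the #61881 handling of a leading terminator, so it is a simplification, not POI's implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified sketch of getFromUnicodeLE0Terminated (no #61881 handling).
final class ZeroTerminatedSketch {
    private ZeroTerminatedSketch() {}

    static String fromUnicodeLE0Terminated(byte[] string, int offset, int len) {
        if (offset < 0 || offset >= string.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset " + offset);
        }
        if (len < 0 || (string.length - offset) / 2 < len) {
            throw new IllegalArgumentException("Illegal length " + len);
        }
        int chars = 0;
        // Stop at a 0x0000 code unit or once len characters have been read.
        while (chars < len) {
            int i = offset + 2 * chars;
            if (string[i] == 0 && string[i + 1] == 0) {
                break;
            }
            chars++;
        }
        return new String(string, offset, 2 * chars, StandardCharsets.UTF_16LE);
    }
}
```

      With the bytes { 'A', 0, 'B', 0, 0, 0, 'C', 0 }, offset 0 and len 4, the scan stops at the terminator and returns "AB".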
    • length

      public static int length(CharSequence cs)
      Gets a CharSequence's length, or 0 if the CharSequence is null. Copied from commons-lang3.
      Parameters:
      cs - a CharSequence or null
      Returns:
      CharSequence length or 0 if the CharSequence is null.
    • isBlank

      public static boolean isBlank(CharSequence cs)

      Checks if a CharSequence is empty (""), null or whitespace only.

      Whitespace is defined by Character.isWhitespace(char).

       StringUtil.isBlank(null)      = true
       StringUtil.isBlank("")        = true
       StringUtil.isBlank(" ")       = true
       StringUtil.isBlank("bob")     = false
       StringUtil.isBlank("  bob  ") = false
       
      Copied from commons-lang3.
      Parameters:
      cs - the CharSequence to check, may be null
      Returns:
      true if the CharSequence is null, empty or whitespace only
    • isNotBlank

      public static boolean isNotBlank(CharSequence cs)

      Checks if a CharSequence is not empty (""), not null and not whitespace only.

      Whitespace is defined by Character.isWhitespace(char).

       StringUtil.isNotBlank(null)      = false
       StringUtil.isNotBlank("")        = false
       StringUtil.isNotBlank(" ")       = false
       StringUtil.isNotBlank("bob")     = true
       StringUtil.isNotBlank("  bob  ") = true
       
      Copied from commons-lang3.
      Parameters:
      cs - the CharSequence to check, may be null
      Returns:
      true if the CharSequence is not empty and not null and not whitespace only
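      The null-safe length, isBlank and isNotBlank behave as the example tables above show. A compact sketch in the commons-lang3 style, with a hypothetical class name:

```java
// Hypothetical sketch of length / isBlank / isNotBlank; not POI's code.
final class BlankSketch {
    private BlankSketch() {}

    static int length(CharSequence cs) {
        return cs == null ? 0 : cs.length();
    }

    // Blank means null, empty, or only Character.isWhitespace characters.
    static boolean isBlank(CharSequence cs) {
        int len = length(cs);
        for (int i = 0; i < len; i++) {
            if (!Character.isWhitespace(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    static boolean isNotBlank(CharSequence cs) {
        return !isBlank(cs);
    }
}
```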
    • repeat

      public static String repeat(char ch, int repeat)

      Returns padding consisting of the specified character repeated the given number of times.

       StringUtil.repeat('e', 0)  = ""
       StringUtil.repeat('e', 3)  = "eee"
       StringUtil.repeat('e', -2) = ""
       

      Note: this method does not support padding with Unicode Supplementary Characters as they require a pair of chars to be represented.

      Copied from commons-lang3.
      Parameters:
      ch - character to repeat
      repeat - number of times to repeat char, negative treated as zero
      Returns:
      String with repeated character
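      The examples above can be reproduced by filling a char array, treating negative counts as zero. The class name is hypothetical; like the documented method, it repeats a single char, so it cannot produce supplementary characters.

```java
import java.util.Arrays;

// Hypothetical sketch of repeat(char, int); not POI's code.
final class RepeatSketch {
    private RepeatSketch() {}

    static String repeat(char ch, int repeat) {
        if (repeat <= 0) {
            return ""; // negative counts are treated as zero
        }
        char[] buf = new char[repeat];
        Arrays.fill(buf, ch);
        return new String(buf);
    }
}
```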