Class NlpUtils


  • public final class NlpUtils
    extends java.lang.Object
    Utility functions for processing strings and characters in NLP problems.
    • Method Summary

      Modifier and Type Method Description
      static boolean isControl(char c)
      Check whether a character is considered a control character.
      static boolean isPunctuation(char c)
      Check whether a character is considered punctuation.
      static boolean isWhiteSpace(char c)
      Check whether a character is considered whitespace.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • isWhiteSpace

        public static boolean isWhiteSpace(char c)
        Check whether a character is considered whitespace.

        Tab, newline, and Unicode space characters are all considered whitespace.

        Parameters:
        c - input character to be checked.
        Returns:
        whether the character is considered whitespace
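The rule described above can be illustrated with a minimal sketch. This is not the NlpUtils source: the class name `WhitespaceSketch` is hypothetical, and it assumes that "Unicode space characters" means the Unicode space-separator category (Zs), which may differ from what the library actually checks.

```java
// Hypothetical sketch of the documented whitespace rule; not the library's implementation.
final class WhitespaceSketch {
    private WhitespaceSketch() {}

    /** Returns true for tab, newline, and characters in the Unicode Zs category. */
    static boolean isWhiteSpace(char c) {
        if (c == '\t' || c == '\n') {
            return true;
        }
        // Assumption: "Unicode space characters" maps to SPACE_SEPARATOR (Zs),
        // which includes ' ', U+00A0 (no-break space), and U+3000 (ideographic space).
        return Character.getType(c) == Character.SPACE_SEPARATOR;
    }

    public static void main(String[] args) {
        System.out.println(isWhiteSpace('\t'));     // true
        System.out.println(isWhiteSpace('\u3000')); // true
        System.out.println(isWhiteSpace('a'));      // false
    }
}
```

A caller would typically use such a check to decide where to split text into tokens.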
      • isControl

        public static boolean isControl(char c)
        Check whether a character is considered a control character.

        Tab, newline, and ISO control characters are all considered control characters.

        Parameters:
        c - input character to be checked.
        Returns:
        whether the character is considered a control character
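As a rough illustration of the description above, the sketch below relies on the standard `Character.isISOControl`, whose range (U+0000 to U+001F and U+007F to U+009F) already contains tab and newline. The class name `ControlSketch` is hypothetical, and reading the description as referring to ISO control characters is an assumption; the library's own logic may differ.

```java
// Hypothetical sketch of the documented control-character rule; not the library's implementation.
final class ControlSketch {
    private ControlSketch() {}

    /** Returns true for ISO control characters, which include tab and newline. */
    static boolean isControl(char c) {
        // Assumption: the documented rule corresponds to the JDK's ISO control range.
        return Character.isISOControl(c);
    }

    public static void main(String[] args) {
        System.out.println(isControl('\t'));     // true
        System.out.println(isControl('\u007f')); // true (DEL)
        System.out.println(isControl('a'));      // false
    }
}
```

Under this reading, tab and newline satisfy both the whitespace and the control descriptions, so code that cares about the difference would need to decide which check to apply first.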
      • isPunctuation

        public static boolean isPunctuation(char c)
        Check whether a character is considered punctuation.

        All ASCII characters that are neither letters nor digits are treated as punctuation. Characters such as "^", "$", and "`" are not in the Unicode Punctuation class, but they are treated as punctuation anyway, for consistency.

        Parameters:
        c - input character to be checked.
        Returns:
        whether the character is considered punctuation
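The punctuation rule can be sketched as follows. This is only an illustration: the class name `PunctuationSketch` is hypothetical, and two details are assumptions rather than documented behavior. First, the ASCII check is limited to the printable ranges (so space and control characters are excluded), similar to the convention used by BERT-style basic tokenizers. Second, non-ASCII characters fall back to the Unicode punctuation categories.

```java
// Hypothetical sketch of the documented punctuation rule; not the library's implementation.
final class PunctuationSketch {
    private PunctuationSketch() {}

    static boolean isPunctuation(char c) {
        // Assumption: printable ASCII that is neither a letter nor a digit:
        // '!'..'/' (33-47), ':'..'@' (58-64), '['..'`' (91-96), '{'..'~' (123-126).
        if ((c >= 33 && c <= 47) || (c >= 58 && c <= 64)
                || (c >= 91 && c <= 96) || (c >= 123 && c <= 126)) {
            return true;
        }
        // Assumption: everything else is judged by its Unicode general category (P*).
        switch (Character.getType(c)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isPunctuation('^'));      // true, although '^' is a Unicode symbol
        System.out.println(isPunctuation('\u3002')); // true (ideographic full stop)
        System.out.println(isPunctuation('a'));      // false
    }
}
```

The explicit ASCII ranges are what make "^", "$", and "`" count as punctuation, since a check based only on Unicode categories would classify them as symbols.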