Class PlingStemmer

java.lang.Object
com.ethlo.zally.rules.common.PlingStemmer

public class PlingStemmer extends Object
Copyright 2016 Fabian M. Suchanek

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

The PlingStemmer stems an English noun (plural or singular) to its singular form. It handles irregular forms such as "firemen"->"fireman", it knows Latin and Greek plurals such as "appendices"->"appendix" and "automata"->"automaton", and yes, it was a lot of work to compile these exceptions. Examples:

 System.out.println(PlingStemmer.stem("boy"));          // boy
 System.out.println(PlingStemmer.stem("boys"));         // boy
 System.out.println(PlingStemmer.stem("biophysics"));   // biophysics
 System.out.println(PlingStemmer.stem("automata"));     // automaton
 System.out.println(PlingStemmer.stem("genus"));        // genus
 System.out.println(PlingStemmer.stem("emus"));         // emu
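
The fields of this class point to an approach built on a map of irregular plurals plus word lists grouped by suffix change. A heavily simplified sketch of that idea follows. Everything in it is hypothetical: MiniPluralStemmer, its tiny word lists and its rule order are assumptions for illustration, not PlingStemmer's actual data or algorithm. It needs Java 9 or later for Map.of and Set.of.

```java
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, heavily simplified exception-list stemmer.
 * The word lists are tiny samples, not PlingStemmer's data,
 * and the order of the checks in stem() is an assumption.
 */
class MiniPluralStemmer {

    // Irregular plurals mapped to their singular (sample entries only).
    static final Map<String, String> IRREGULAR = Map.of(
            "firemen", "fireman",
            "children", "child",
            "mice", "mouse");

    // Plurals in "-ices" whose singular ends in "-ix" (sample entries).
    static final Set<String> IX_ICES = Set.of("appendices", "matrices");

    // Plurals in "-a" whose singular ends in "-on" (sample entries).
    static final Set<String> ON_A = Set.of("automata", "phenomena", "criteria");

    // Words ending in "-ics" that have no noun form without the "s".
    static final Set<String> ICS = Set.of("biophysics", "aerobics");

    // Words ending in "-us" that are already singular (sample entries).
    static final Set<String> US_SINGULAR = Set.of("genus", "virus", "bus");

    /** Removes as many trailing characters as the suffix is long. */
    static String cut(String s, String suffix) {
        return s.substring(0, s.length() - suffix.length());
    }

    /** Irregular map, then already-singular words, then suffix lists, then fallbacks. */
    static String stem(String s) {
        if (IRREGULAR.containsKey(s)) return IRREGULAR.get(s);
        if (ICS.contains(s) || US_SINGULAR.contains(s)) return s;
        if (IX_ICES.contains(s)) return cut(s, "ices") + "ix";
        if (ON_A.contains(s)) return cut(s, "a") + "on";
        // Generic fallback rules (assumed, much cruder than a real stemmer).
        if (s.endsWith("ies") && s.length() > 4) return cut(s, "ies") + "y";
        if (s.endsWith("s") && !s.endsWith("ss")) return cut(s, "s");
        return s;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"boy", "boys", "biophysics", "automata",
                                       "genus", "emus", "firemen", "appendices"}) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

For these eight words, main prints the same singular forms as the examples above, but only because the word lists were chosen to cover them; any word outside the lists falls through to the crude "-ies"/"-s" fallback rules.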

Some word forms can be either plural or singular. Examples include "physics" (the science, or the plural of "physic", the medicine), "quarters" (the housing, or the plural of "quarter", 1/4) and "people" (the singular of "peoples", or the plural of "person"). In these cases, the stemmer assumes the word is a plural form and returns the singular form. The methods isPlural, isSingular and isSingularAndPlural can be used to tell these cases apart.
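
The three predicates can be layered on top of any stem function plus a list of ambiguous forms. The sketch below follows the documented behavior of isPlural (the word is plural if stemming alters it) and uses a singAndPlur-style set for the other two. The class NumberPredicates, the bodies of isSingular and isSingularAndPlural, and the toy data are assumptions, not the library's code.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical plural/singular predicates over an arbitrary stem function. */
final class NumberPredicates {

    private final UnaryOperator<String> stem;
    private final Set<String> singAndPlur; // forms that can be read either way

    NumberPredicates(UnaryOperator<String> stem, Set<String> singAndPlur) {
        this.stem = stem;
        this.singAndPlur = singAndPlur;
    }

    /** Plural if stemming changes the word. */
    boolean isPlural(String s) {
        return !stem.apply(s).equals(s);
    }

    /** Singular if stemming leaves it unchanged, or if it is a listed ambiguous form. */
    boolean isSingular(String s) {
        return singAndPlur.contains(s) || !isPlural(s);
    }

    /** Both readings are possible only for listed ambiguous forms. */
    boolean isSingularAndPlural(String s) {
        return singAndPlur.contains(s);
    }

    public static void main(String[] args) {
        // Toy stem function: ambiguous forms are read as plurals, as described above.
        Map<String, String> toy = Map.of(
                "physics", "physic",
                "quarters", "quarter",
                "people", "person",
                "boys", "boy");
        NumberPredicates p = new NumberPredicates(
                w -> toy.getOrDefault(w, w),
                Set.of("physics", "quarters", "people"));
        for (String w : new String[] {"physics", "boys", "boy"}) {
            System.out.printf("%-8s plural=%b singular=%b both=%b%n",
                    w, p.isPlural(w), p.isSingular(w), p.isSingularAndPlural(w));
        }
    }
}
```

With this toy data, "physics" reports true for all three predicates, "boys" is plural only and "boy" is singular only.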

There is no guarantee that the stemmer correctly stems every plural word or leaves every singular word untouched, let alone that it treats an ambiguous word form the way the user expects.

The PlingStemmer uses material from WordNet.

It requires the class FinalSet from the Java Tools.

  • Field Summary

    Fields
    Modifier and Type          Field             Description
    static Set<String>         category00        Words that do not have a distinct plural form (like "atlas" etc.)
    static Set<String>         categoryCHE_CHES  Words that change from "-che" to "-ches" (like "brioche" etc.), listed in their plural forms
    static Set<String>         categoryEX_ICES   Words that change from "-ex" to "-ices" (like "index" etc.), listed in their plural forms
    static Set<String>         categoryICS       Words that end with "-ics" and do not exist as nouns without the 's' (like "aerobics" etc.)
    static Set<String>         categoryIE_IES    Words that change from "-ie" to "-ies" (like "auntie" etc.), listed in their plural forms
    static Set<String>         categoryIS_ES     Words that change from "-is" to "-es" (like "axis" etc.), listed in their plural forms
    static Set<String>         categoryIX_ICES   Words that change from "-ix" to "-ices" (like "appendix" etc.), listed in their plural forms
    static Set<String>         categoryO_I       Words that change from "-o" to "-i" (like "libretto" etc.), listed in their plural forms
    static Set<String>         categoryOE_OES    Words that change from "-oe" to "-oes" (like "toe" etc.), listed in their plural forms
    static Set<String>         categoryON_A      Words that change from "-on" to "-a" (like "phenomenon" etc.), listed in their plural forms
    static Set<String>         categorySE_SES    Words that end in "-se" in their plural forms (like "nurse" etc.)
    static Set<String>         categorySSE_SSES  Words that change from "-sse" to "-sses" (like "finesse" etc.), listed in their plural forms
    static Set<String>         categoryU_US      Words that change from "-u" to "-us" (like "emu" etc.), listed in their plural forms
    static Set<String>         categoryUM_A      Words that change from "-um" to "-a" (like "curriculum" etc.), listed in their plural forms
    static Set<String>         categoryUS_I      Words that change from "-us" to "-i" (like "fungus" etc.), listed in their plural forms
    static Map<String,String>  irregular         Maps irregular Germanic English plural nouns to their singular form
    static Set<String>         singAndPlur       Contains word forms that can either be plural or singular
  • Constructor Summary

    Constructors
    Constructor     Description
    PlingStemmer()
  • Method Summary

    Modifier and Type  Method                         Description
    static String      cut(String s, String suffix)   Cuts a suffix from a string, i.e. removes as many trailing characters as the suffix is long
    static boolean     isPlural(String s)             Tells whether a word form is plural.
    static boolean     isSingular(String s)           Tells whether a word form is singular.
    static boolean     isSingularAndPlural(String s)  Tells whether a word form is the singular form of one word and at the same time the plural form of another.
    static void        main(String[] argv)            Test routine
    static boolean     noLatin(String s)              Returns true if a word is probably not Latin
    static String      stem(String s)                 Stems an English noun to its singular form

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • categorySE_SES

      public static Set<String> categorySE_SES
      Words that end in "-se" in their plural forms (like "nurse" etc.)
    • category00

      public static Set<String> category00
      Words that do not have a distinct plural form (like "atlas" etc.)
    • categoryUM_A

      public static Set<String> categoryUM_A
      Words that change from "-um" to "-a" (like "curriculum" etc.), listed in their plural forms
    • categoryON_A

      public static Set<String> categoryON_A
      Words that change from "-on" to "-a" (like "phenomenon" etc.), listed in their plural forms
    • categoryO_I

      public static Set<String> categoryO_I
      Words that change from "-o" to "-i" (like "libretto" etc.), listed in their plural forms
    • categoryUS_I

      public static Set<String> categoryUS_I
      Words that change from "-us" to "-i" (like "fungus" etc.), listed in their plural forms
    • categoryIX_ICES

      public static Set<String> categoryIX_ICES
      Words that change from "-ix" to "-ices" (like "appendix" etc.), listed in their plural forms
    • categoryIS_ES

      public static Set<String> categoryIS_ES
      Words that change from "-is" to "-es" (like "axis" etc.), listed in their plural forms
    • categoryOE_OES

      public static Set<String> categoryOE_OES
      Words that change from "-oe" to "-oes" (like "toe" etc.), listed in their plural forms
    • categoryEX_ICES

      public static Set<String> categoryEX_ICES
      Words that change from "-ex" to "-ices" (like "index" etc.), listed in their plural forms
    • categoryU_US

      public static Set<String> categoryU_US
      Words that change from "-u" to "-us" (like "emu" etc.), listed in their plural forms
    • categorySSE_SSES

      public static Set<String> categorySSE_SSES
      Words that change from "-sse" to "-sses" (like "finesse" etc.), listed in their plural forms
    • categoryCHE_CHES

      public static Set<String> categoryCHE_CHES
      Words that change from "-che" to "-ches" (like "brioche" etc.), listed in their plural forms
    • categoryICS

      public static Set<String> categoryICS
      Words that end with "-ics" and do not exist as nouns without the 's' (like "aerobics" etc.)
    • categoryIE_IES

      public static Set<String> categoryIE_IES
      Words that change from "-ie" to "-ies" (like "auntie" etc.), listed in their plural forms
    • irregular

      public static Map<String,String> irregular
      Maps irregular Germanic English plural nouns to their singular form
    • singAndPlur

      public static Set<String> singAndPlur
      Contains word forms that can either be plural or singular
  • Constructor Details

    • PlingStemmer

      public PlingStemmer()
  • Method Details

    • isPlural

      public static boolean isPlural(String s)
      Tells whether a word form is plural. This method simply checks whether the stem method alters the word.
    • isSingular

      public static boolean isSingular(String s)
      Tells whether a word form is singular. Note that a word can be both plural and singular.
    • isSingularAndPlural

      public static boolean isSingularAndPlural(String s)
      Tells whether a word form is the singular form of one word and at the same time the plural form of another.
    • cut

      public static String cut(String s, String suffix)
      Cuts a suffix from a string, i.e. removes as many trailing characters as the suffix is long
    • noLatin

      public static boolean noLatin(String s)
      Returns true if a word is probably not Latin
    • stem

      public static String stem(String s)
      Stems an English noun to its singular form
    • main

      public static void main(String[] argv) throws Exception
      Test routine
      Throws:
      Exception
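
The cut and noLatin helpers are small enough to sketch. The following is an illustrative guess, not the class's actual source: cut follows the documented behavior of removing as many trailing characters as the suffix is long, while the class SuffixHelpers and the letter list used by noLatin are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helpers mirroring the documented cut and noLatin methods. */
final class SuffixHelpers {

    /**
     * Removes suffix.length() characters from the end of s. Only the length
     * of the suffix matters; this sketch does not check that s ends with it.
     */
    static String cut(String s, String suffix) {
        return s.substring(0, s.length() - suffix.length());
    }

    /**
     * Assumed heuristic: a word is probably not Latin if it contains letters
     * that are absent or rare in classical Latin (j, k, w, y, z) or the
     * English-looking digraphs "sh" and "ou". The list is an illustrative guess.
     */
    static boolean noLatin(String s) {
        String lower = s.toLowerCase(Locale.ROOT);
        for (char c : new char[] {'j', 'k', 'w', 'y', 'z'}) {
            if (lower.indexOf(c) >= 0) return true;
        }
        return lower.contains("sh") || lower.contains("ou");
    }

    public static void main(String[] args) {
        System.out.println(cut("curricula", "a") + "um"); // curriculum
        System.out.println(cut("fungi", "i") + "us");     // fungus
        System.out.println(noLatin("octopus"));           // false
        System.out.println(noLatin("walrus"));            // true
    }
}
```

Appending a replacement ending after cut is one way a plural listed in a category such as categoryUM_A or categoryUS_I could be mapped back to its singular; whether PlingStemmer does exactly this is an assumption.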