Class PlingStemmer


  • public class PlingStemmer
    extends Object
    Copyright 2016 Fabian M. Suchanek

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

    The PlingStemmer stems an English noun (plural or singular) to its singular form. It deals with irregular compounds like "firemen"->"fireman", it knows Latin and Greek plurals like "appendices"->"appendix" and "automata"->"automaton", and yes, it was a lot of work to compile these exceptions. Examples:

     System.out.println(PlingStemmer.stem("boy"));
     ----> boy
     System.out.println(PlingStemmer.stem("boys"));
     ----> boy
     System.out.println(PlingStemmer.stem("biophysics"));
     ----> biophysics
     System.out.println(PlingStemmer.stem("automata"));
     ----> automaton
     System.out.println(PlingStemmer.stem("genus"));
     ----> genus
     System.out.println(PlingStemmer.stem("emus"));
     ----> emu
     

    There are a number of word forms that can be either plural or singular. Examples include "physics" (the science, or the plural of "physic", the medicine), "quarters" (the housing, or the plural of "quarter", 1/4) and "people" (the singular of "peoples", or the plural of "person"). In these cases, the stemmer assumes the word is a plural form and returns the singular form. The methods isPlural, isSingular and isSingularAndPlural can be used to differentiate the cases.
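
    The relationship between the three predicates can be sketched as follows. This is a minimal illustration, not the PlingStemmer source: the class name AmbiguitySketch, the toy lookup table and the toy set are hypothetical, and the bodies of isSingular and isSingularAndPlural are assumptions derived from the method descriptions.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in used for illustration only; not the PlingStemmer source. */
class AmbiguitySketch {
    // Toy lookup standing in for the real stem method (assumption).
    static final Map<String, String> TOY_STEMS = Map.of(
            "boys", "boy",
            "physics", "physic",
            "quarters", "quarter",
            "people", "person");

    // Toy counterpart of singAndPlur: forms that are both singular and plural.
    static final Set<String> TOY_SING_AND_PLUR = Set.of("physics", "quarters", "people");

    // Ambiguous forms are treated as plurals and mapped to a singular.
    static String stem(String s) {
        return TOY_STEMS.getOrDefault(s, s);
    }

    // A form counts as plural if stemming alters it.
    static boolean isPlural(String s) {
        return !s.equals(stem(s));
    }

    // Assumed: singular if stemming leaves it unchanged, or if it is a known ambiguous form.
    static boolean isSingular(String s) {
        return TOY_SING_AND_PLUR.contains(s) || !isPlural(s);
    }

    // Assumed: a plain membership test on the set of ambiguous forms.
    static boolean isSingularAndPlural(String s) {
        return TOY_SING_AND_PLUR.contains(s);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"boy", "boys", "people"}) {
            System.out.println(w + ": plural=" + isPlural(w)
                    + ", singular=" + isSingular(w)
                    + ", both=" + isSingularAndPlural(w));
        }
    }
}
```

    Under these assumptions an ambiguous form such as "people" is reported as plural and as singular at the same time, which is why a caller that needs to tell the cases apart would check isSingularAndPlural first.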

    It cannot be guaranteed that the stemmer correctly stems a plural word or correctly ignores a singular word, let alone that it treats an ambiguous word form in the way the user expects.

    The PlingStemmer uses material from WordNet.

    It requires the class FinalSet from the Java Tools.

    • Field Detail

      • categorySE_SES

        public static Set<String> categorySE_SES
        Words that change from "-se" to "-ses" (like "nurse" etc.), listed in their plural forms
      • category00

        public static Set<String> category00
        Words that do not have a distinct plural form (like "atlas" etc.)
      • categoryUM_A

        public static Set<String> categoryUM_A
        Words that change from "-um" to "-a" (like "curriculum" etc.), listed in their plural forms
      • categoryON_A

        public static Set<String> categoryON_A
        Words that change from "-on" to "-a" (like "phenomenon" etc.), listed in their plural forms
      • categoryO_I

        public static Set<String> categoryO_I
        Words that change from "-o" to "-i" (like "libretto" etc.), listed in their plural forms
      • categoryUS_I

        public static Set<String> categoryUS_I
        Words that change from "-us" to "-i" (like "fungus" etc.), listed in their plural forms
      • categoryIX_ICES

        public static Set<String> categoryIX_ICES
        Words that change from "-ix" to "-ices" (like "appendix" etc.), listed in their plural forms
      • categoryIS_ES

        public static Set<String> categoryIS_ES
        Words that change from "-is" to "-es" (like "axis" etc.), listed in their plural forms
      • categoryOE_OES

        public static Set<String> categoryOE_OES
        Words that change from "-oe" to "-oes" (like "toe" etc.), listed in their plural forms
      • categoryEX_ICES

        public static Set<String> categoryEX_ICES
        Words that change from "-ex" to "-ices" (like "index" etc.), listed in their plural forms
      • categoryU_US

        public static Set<String> categoryU_US
        Words that change from "-u" to "-us" (like "emu" etc.), listed in their plural forms
      • categorySSE_SSES

        public static Set<String> categorySSE_SSES
        Words that change from "-sse" to "-sses" (like "finesse" etc.), listed in their plural forms
      • categoryCHE_CHES

        public static Set<String> categoryCHE_CHES
        Words that change from "-che" to "-ches" (like "brioche" etc.), listed in their plural forms
      • categoryICS

        public static Set<String> categoryICS
        Words that end with "-ics" and do not exist as nouns without the 's' (like "aerobics" etc.)
      • categoryIE_IES

        public static Set<String> categoryIE_IES
        Words that change from "-ie" to "-ies" (like "auntie" etc.), listed in their plural forms
      • irregular

        public static Map<String,String> irregular
        Maps irregular Germanic English plural nouns to their singular form
      • singAndPlur

        public static Set<String> singAndPlur
        Contains word forms that can either be plural or singular
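
    One way such category sets could drive stemming is sketched below: the word is looked up in a set of plural forms, and the plural suffix named by the category is swapped for the singular one. This is a toy illustration under stated assumptions, not the PlingStemmer implementation. The class CategorySketch, its helper swap, the tiny sets, the order of the checks and the naive fallback rule are all hypothetical, and the sketch covers only a handful of words.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in used for illustration only; not the PlingStemmer source. */
class CategorySketch {
    // Tiny stand-ins for the category fields; entries are plural forms.
    static final Set<String> UM_A = Set.of("curricula");
    static final Set<String> ON_A = Set.of("phenomena", "automata");
    static final Set<String> IX_ICES = Set.of("appendices");
    static final Set<String> U_US = Set.of("emus");
    // Stand-in for category00: words without a distinct plural form.
    static final Set<String> NO_PLURAL = Set.of("atlas");
    // Stand-in for the irregular map: plural form to singular form.
    static final Map<String, String> IRREGULAR = Map.of("men", "man", "feet", "foot");

    // Replaces a trailing plural suffix with the matching singular suffix.
    static String swap(String s, String pluralSuffix, String singularSuffix) {
        return s.substring(0, s.length() - pluralSuffix.length()) + singularSuffix;
    }

    // The order of the checks is an assumption made for this sketch.
    static String stem(String s) {
        String singular = IRREGULAR.get(s);
        if (singular != null) return singular;
        if (NO_PLURAL.contains(s)) return s;
        if (UM_A.contains(s)) return swap(s, "a", "um");
        if (ON_A.contains(s)) return swap(s, "a", "on");
        if (IX_ICES.contains(s)) return swap(s, "ices", "ix");
        if (U_US.contains(s)) return swap(s, "us", "u");
        // Deliberately naive fallback: drop a single trailing "s" (but not "ss").
        if (s.endsWith("s") && !s.endsWith("ss")) return s.substring(0, s.length() - 1);
        return s;
    }
}
```

    With these toy sets, CategorySketch.stem("automata") yields "automaton" and CategorySketch.stem("atlas") leaves the word unchanged. The naive fallback would mis-stem many real words, which is the gap the full exception lists are meant to close.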
    • Constructor Detail

      • PlingStemmer

        public PlingStemmer()
    • Method Detail

      • isPlural

        public static boolean isPlural(String s)
        Tells whether a word form is plural. This method just checks whether the stem method alters the word.
      • isSingular

        public static boolean isSingular(String s)
        Tells whether a word form is singular. Note that a word can be both plural and singular.
      • isSingularAndPlural

        public static boolean isSingularAndPlural(String s)
        Tells whether a word form is the singular form of one word and at the same time the plural form of another.
      • cut

        public static String cut(String s,
                                 String suffix)
        Cuts a suffix from a string, that is, removes as many characters from the end of the string as the suffix has.
      • noLatin

        public static boolean noLatin(String s)
        Returns true if a word is probably not Latin
      • stem

        public static String stem(String s)
        Stems an English noun (plural or singular) to its singular form.
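
    The description of cut suggests that it drops a number of trailing characters equal to the length of the suffix, without checking that the string actually ends with that suffix. A minimal sketch under that reading follows; the class name CutSketch is hypothetical and the method body is an assumption, not the library source.

```java
/** Hypothetical stand-in used for illustration only; not the PlingStemmer source. */
class CutSketch {
    // Drops suffix.length() trailing characters from s.
    // Assumption: no check that s really ends with suffix.
    static String cut(String s, String suffix) {
        return s.substring(0, s.length() - suffix.length());
    }

    public static void main(String[] args) {
        // Typical use: cut the plural ending, then append the singular one.
        System.out.println(cut("appendices", "ices") + "ix");
        System.out.println(cut("firemen", "men") + "man");
    }
}
```

    In this sketch a suffix longer than the string makes substring throw a StringIndexOutOfBoundsException, so a caller would normally test s.endsWith(suffix) before cutting.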