Class PlingStemmer
- java.lang.Object
  - com.ethlo.zally.rules.common.PlingStemmer
public class PlingStemmer extends Object
Copyright 2016 Fabian M. Suchanek
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
The PlingStemmer stems an English noun (plural or singular) to its singular form. It handles irregular forms such as "firemen" -> "fireman" as well as classical plurals such as "appendices" -> "appendix" (and yes, compiling these exceptions was a lot of work). Examples:
System.out.println(PlingStemmer.stem("boy"));        // boy
System.out.println(PlingStemmer.stem("boys"));       // boy
System.out.println(PlingStemmer.stem("biophysics")); // biophysics
System.out.println(PlingStemmer.stem("automata"));   // automaton
System.out.println(PlingStemmer.stem("genus"));      // genus
System.out.println(PlingStemmer.stem("emus"));       // emu
Some word forms can be either plural or singular. Examples include "physics" (the science, or the plural of "physic", the medicine), "quarters" (housing, or the plural of "quarter", one fourth) and "people" (the singular of "peoples", or the plural of "person"). In these cases, the stemmer assumes the word is a plural form and returns the singular form. The methods isPlural, isSingular and isSingularAndPlural can be used to distinguish these cases.
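The behavior described above can be illustrated with a deliberately tiny stemmer. This is a minimal sketch under stated assumptions, not PlingStemmer's actual code: the class name ToyPluralStemmer, the check order (irregular map, then unchanged forms, then one classical category, then a default "-s" rule) and all word lists are hypothetical samples. It assumes lowercase input.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical toy stemmer illustrating the idea; NOT PlingStemmer's implementation.
// Assumes lowercase input; the word lists are tiny illustrative samples.
class ToyPluralStemmer {
    // 1. Irregular plurals mapped directly to their singulars.
    static final Map<String, String> IRREGULAR = Map.of(
            "firemen", "fireman",
            "children", "child");

    // 2. Forms returned unchanged although they end in "s".
    static final Set<String> UNCHANGED = Set.of("genus", "biophysics", "series");

    // 3. One classical category, listed in plural form: "-a" becomes "-on".
    static final Set<String> ON_A = Set.of("automata", "phenomena", "criteria");

    static String stem(String s) {
        String irregular = IRREGULAR.get(s);
        if (irregular != null) {
            return irregular;
        }
        if (UNCHANGED.contains(s)) {
            return s;
        }
        if (ON_A.contains(s)) {
            return s.substring(0, s.length() - 1) + "on";
        }
        // 4. Default rule: drop a final "s", but leave words ending in "ss" alone.
        if (s.endsWith("s") && !s.endsWith("ss")) {
            return s.substring(0, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        String[] words = {"boy", "boys", "biophysics", "automata", "genus", "emus", "firemen"};
        for (String w : words) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Ordering matters in a scheme like this: consulting the exception lists before the default rule is what keeps "genus" from becoming "genu".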
There is no guarantee that the stemmer correctly stems every plural word or leaves every singular word unchanged, let alone that it treats an ambiguous word form the way the user expects.
The PlingStemmer uses material from WordNet.
It requires the class FinalSet from the Java Tools.
Field Summary
Modifier and Type | Field | Description
static Set<String> | category00 | Words that do not have a distinct plural form (like "atlas" etc.)
static Set<String> | categoryCHE_CHES | Words that change from "-che" to "-ches" (like "brioche" etc.), listed in their plural forms
static Set<String> | categoryEX_ICES | Words that change from "-ex" to "-ices" (like "index" etc.), listed in their plural forms
static Set<String> | categoryICS | Words that end with "-ics" and do not exist as nouns without the 's' (like "aerobics" etc.)
static Set<String> | categoryIE_IES | Words that change from "-ie" to "-ies" (like "auntie" etc.), listed in their plural forms
static Set<String> | categoryIS_ES | Words that change from "-is" to "-es" (like "axis" etc.), listed in their plural forms
static Set<String> | categoryIX_ICES | Words that change from "-ix" to "-ices" (like "appendix" etc.), listed in their plural forms
static Set<String> | categoryO_I | Words that change from "-o" to "-i" (like "libretto" etc.), listed in their plural forms
static Set<String> | categoryOE_OES | Words that change from "-oe" to "-oes" (like "toe" etc.), listed in their plural forms
static Set<String> | categoryON_A | Words that change from "-on" to "-a" (like "phenomenon" etc.), listed in their plural forms
static Set<String> | categorySE_SES | Words that end in "-se" in their plural forms (like "nurse" etc.)
static Set<String> | categorySSE_SSES | Words that change from "-sse" to "-sses" (like "finesse" etc.), listed in their plural forms
static Set<String> | categoryU_US | Words that change from "-u" to "-us" (like "emu" etc.), listed in their plural forms
static Set<String> | categoryUM_A | Words that change from "-um" to "-a" (like "curriculum" etc.), listed in their plural forms
static Set<String> | categoryUS_I | Words that change from "-us" to "-i" (like "fungus" etc.), listed in their plural forms
static Map<String,String> | irregular | Maps irregular Germanic English plural nouns to their singular form
static Set<String> | singAndPlur | Contains word forms that can be either plural or singular
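Most of the category fields above pair a list of plural forms with a suffix rewrite. One way to use such lists is a small rule table. The following is a hypothetical sketch: the class CategoryRules, its Rule type and the sample memberships are illustrative and not taken from PlingStemmer; lowercase input is assumed.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of table-driven category rules; NOT PlingStemmer's code.
class CategoryRules {
    // One category: plural forms that share a suffix rewrite.
    static final class Rule {
        final Set<String> plurals;   // members, listed in their plural forms
        final String pluralSuffix;   // suffix removed from the plural, e.g. "a"
        final String singularSuffix; // suffix appended to form the singular, e.g. "um"

        Rule(Set<String> plurals, String pluralSuffix, String singularSuffix) {
            this.plurals = plurals;
            this.pluralSuffix = pluralSuffix;
            this.singularSuffix = singularSuffix;
        }
    }

    // Tiny illustrative samples; comments name the field each rule imitates.
    static final List<Rule> RULES = List.of(
            new Rule(Set.of("curricula", "bacteria"), "a", "um"),     // like categoryUM_A
            new Rule(Set.of("phenomena", "automata"), "a", "on"),     // like categoryON_A
            new Rule(Set.of("appendices", "matrices"), "ices", "ix"), // like categoryIX_ICES
            new Rule(Set.of("indices", "vortices"), "ices", "ex"),    // like categoryEX_ICES
            new Rule(Set.of("fungi", "cacti"), "i", "us"),            // like categoryUS_I
            new Rule(Set.of("crises", "analyses"), "es", "is"));      // like categoryIS_ES

    // Returns the singular if the word is listed in some category, otherwise empty.
    static Optional<String> applyCategories(String word) {
        for (Rule r : RULES) {
            if (r.plurals.contains(word)) {
                String base = word.substring(0, word.length() - r.pluralSuffix.length());
                return Optional.of(base + r.singularSuffix);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"indices", "appendices", "curricula", "fungi", "boys"}) {
            System.out.println(w + " -> " + applyCategories(w).orElse("(no category)"));
        }
    }
}
```

Membership lists are needed because the suffix alone is ambiguous: "indices" and "appendices" both end in "-ices" but map back to "-ex" and "-ix" respectively.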
Constructor Summary
Constructor
PlingStemmer()
Method Summary
Modifier and Type | Method | Description
static String | cut(String s, String suffix) | Cuts a suffix from a string, that is, removes as many trailing characters as the suffix contains
static boolean | isPlural(String s) | Tells whether a word form is plural.
static boolean | isSingular(String s) | Tells whether a word form is singular.
static boolean | isSingularAndPlural(String s) | Tells whether a word form is the singular form of one word and at the same time the plural form of another.
static void | main(String[] argv) | Test routine
static boolean | noLatin(String s) | Returns true if a word is probably not Latin
static String | stem(String s) | Stems an English noun
Field Detail
categorySE_SES
public static Set<String> categorySE_SES
Words that end in "-se" in their plural forms (like "nurse" etc.)
-
category00
public static Set<String> category00
Words that do not have a distinct plural form (like "atlas" etc.)
-
categoryUM_A
public static Set<String> categoryUM_A
Words that change from "-um" to "-a" (like "curriculum" etc.), listed in their plural forms
-
categoryON_A
public static Set<String> categoryON_A
Words that change from "-on" to "-a" (like "phenomenon" etc.), listed in their plural forms
-
categoryO_I
public static Set<String> categoryO_I
Words that change from "-o" to "-i" (like "libretto" etc.), listed in their plural forms
-
categoryUS_I
public static Set<String> categoryUS_I
Words that change from "-us" to "-i" (like "fungus" etc.), listed in their plural forms
-
categoryIX_ICES
public static Set<String> categoryIX_ICES
Words that change from "-ix" to "-ices" (like "appendix" etc.), listed in their plural forms
-
categoryIS_ES
public static Set<String> categoryIS_ES
Words that change from "-is" to "-es" (like "axis" etc.), listed in their plural forms
-
categoryOE_OES
public static Set<String> categoryOE_OES
Words that change from "-oe" to "-oes" (like "toe" etc.), listed in their plural forms
-
categoryEX_ICES
public static Set<String> categoryEX_ICES
Words that change from "-ex" to "-ices" (like "index" etc.), listed in their plural forms
-
categoryU_US
public static Set<String> categoryU_US
Words that change from "-u" to "-us" (like "emu" etc.), listed in their plural forms
-
categorySSE_SSES
public static Set<String> categorySSE_SSES
Words that change from "-sse" to "-sses" (like "finesse" etc.), listed in their plural forms
-
categoryCHE_CHES
public static Set<String> categoryCHE_CHES
Words that change from "-che" to "-ches" (like "brioche" etc.), listed in their plural forms
-
categoryICS
public static Set<String> categoryICS
Words that end with "-ics" and do not exist as nouns without the 's' (like "aerobics" etc.)
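A list like this is needed because a naive "drop the final s" rule damages these nouns. The hypothetical sketch below contrasts the two approaches; IcsGuard and its sample list are illustrative and not PlingStemmer's data.

```java
import java.util.Set;

// Hypothetical sketch of why an "-ics" exception list helps; NOT PlingStemmer's code.
class IcsGuard {
    // Sample "-ics" nouns with no singular "-ic" noun (tiny illustrative subset).
    static final Set<String> ICS_NOUNS = Set.of("aerobics", "athletics", "linguistics");

    // Naive rule: drop any final "s".
    static String naiveStem(String s) {
        return s.endsWith("s") ? s.substring(0, s.length() - 1) : s;
    }

    // Guarded rule: leave listed "-ics" nouns untouched, otherwise use the naive rule.
    static String guardedStem(String s) {
        return ICS_NOUNS.contains(s) ? s : naiveStem(s);
    }

    public static void main(String[] args) {
        System.out.println(naiveStem("aerobics"));   // aerobic (wrong for the noun)
        System.out.println(guardedStem("aerobics")); // aerobics
        System.out.println(guardedStem("comics"));   // comic ("comic" is a real noun)
    }
}
```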
-
categoryIE_IES
public static Set<String> categoryIE_IES
Words that change from "-ie" to "-ies" (like "auntie" etc.), listed in their plural forms
-
irregular
public static Map<String,String> irregular
Maps irregular Germanic English plural nouns to their singular form
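Irregular Germanic plurals follow no suffix pattern, so a direct lookup table is the natural structure for them. A minimal sketch, assuming a hypothetical class IrregularPlurals with a small sample map (the contents of PlingStemmer's real map are not reproduced here):

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of an irregular-plural lookup; the entries are a small sample.
class IrregularPlurals {
    static final Map<String, String> IRREGULAR = Map.of(
            "men", "man",
            "firemen", "fireman",
            "women", "woman",
            "children", "child",
            "feet", "foot",
            "teeth", "tooth",
            "geese", "goose",
            "mice", "mouse");

    // Exact-match lookup: returns the singular for a listed plural, otherwise empty.
    static Optional<String> lookup(String plural) {
        return Optional.ofNullable(IRREGULAR.get(plural));
    }

    public static void main(String[] args) {
        for (String w : new String[] {"geese", "firemen", "boys"}) {
            System.out.println(w + " -> " + lookup(w).orElse("(not irregular)"));
        }
    }
}
```

An exact-match table only covers the forms it lists. Generalizing to compounds by matching a "-men" suffix is tempting, but it would also turn "specimen" into "speciman", which is why this sketch lists "firemen" explicitly.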
Method Detail
isPlural
public static boolean isPlural(String s)
Tells whether a word form is plural. This method simply checks whether the stem method alters the word.
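The rule "a form is plural if stemming changes it" can be sketched as follows. The stem() here is a hypothetical toy stand-in (strip a final "s" except for a few listed words), not PlingStemmer.stem(); only the isPlural logic mirrors the description.

```java
import java.util.Set;

// Hypothetical sketch: isPlural defined by whether stemming changes the word.
// The stem() below is a toy stand-in, NOT PlingStemmer.stem().
class PluralCheck {
    // Words the toy stemmer must leave alone although they end in "s".
    static final Set<String> UNCHANGED = Set.of("genus", "biophysics", "series");

    static String stem(String s) {
        if (UNCHANGED.contains(s) || !s.endsWith("s") || s.endsWith("ss")) {
            return s;
        }
        return s.substring(0, s.length() - 1);
    }

    // A form counts as plural exactly when stem() alters it.
    static boolean isPlural(String s) {
        return !s.equals(stem(s));
    }

    public static void main(String[] args) {
        for (String w : new String[] {"boys", "boy", "genus", "glass"}) {
            System.out.println(w + " plural? " + isPlural(w));
        }
    }
}
```

A consequence of this design is that isPlural is only as accurate as stem: any plural the stemmer fails to change is reported as not plural.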
-
isSingular
public static boolean isSingular(String s)
Tells whether a word form is singular. Note that a word can be both plural and singular.
-
isSingularAndPlural
public static boolean isSingularAndPlural(String s)
Tells whether a word form is the singular form of one word and at the same time the plural form of another.
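The three predicates can be related through a list of ambiguous forms such as singAndPlur. The sketch below assumes, without the source confirming it, that isSingularAndPlural is a membership test on such a list and that isSingular returns true for ambiguous forms and for forms that stemming leaves unchanged. The class AmbiguityCheck, its toy stem() and the sample words are hypothetical.

```java
import java.util.Set;

// Hypothetical sketch; the predicate definitions are assumptions, NOT PlingStemmer's code.
class AmbiguityCheck {
    // Sample forms that are the singular of one word and the plural of another.
    static final Set<String> SING_AND_PLUR = Set.of("physics", "quarters");

    // Toy stand-in for stem(): drops a final "s" unless the word ends in "ss".
    static String stem(String s) {
        return (s.endsWith("s") && !s.endsWith("ss")) ? s.substring(0, s.length() - 1) : s;
    }

    static boolean isPlural(String s) {
        return !s.equals(stem(s));
    }

    // Assumed: ambiguity is decided by membership in the list.
    static boolean isSingularAndPlural(String s) {
        return SING_AND_PLUR.contains(s);
    }

    // Assumed: a form is singular if it is ambiguous or if stemming leaves it unchanged.
    static boolean isSingular(String s) {
        return isSingularAndPlural(s) || !isPlural(s);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"physics", "boys", "boy"}) {
            System.out.println(w + ": plural=" + isPlural(w)
                    + ", singular=" + isSingular(w)
                    + ", both=" + isSingularAndPlural(w));
        }
    }
}
```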
-
cut
public static String cut(String s, String suffix)
Cuts a suffix from a string, that is, removes as many trailing characters as the suffix contains.
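Read literally, this description means cut removes suffix.length() characters from the end of s without checking that s actually ends with suffix. A minimal sketch of that reading, plus a guarded variant for comparison (CutDemo and cutIfPresent are hypothetical names, not part of PlingStemmer):

```java
// Hypothetical sketch of the documented cut() behavior, plus a guarded variant.
class CutDemo {
    // Removes as many trailing characters as suffix has; the suffix text itself is not checked.
    // Throws StringIndexOutOfBoundsException if suffix is longer than s.
    static String cut(String s, String suffix) {
        return s.substring(0, s.length() - suffix.length());
    }

    // Hypothetical safer variant: only cuts when s really ends with suffix.
    static String cutIfPresent(String s, String suffix) {
        return s.endsWith(suffix) ? cut(s, suffix) : s;
    }

    public static void main(String[] args) {
        System.out.println(cut("curricula", "a") + "um"); // curriculum
        System.out.println(cut("boxes", "zz"));           // box (length-based, suffix not checked)
        System.out.println(cutIfPresent("boxes", "zz"));  // boxes
    }
}
```

The unchecked version is cheap and adequate when the caller has already tested endsWith before cutting, which is presumably how a stemmer would call it.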
-
noLatin
public static boolean noLatin(String s)
Returns true if a word is probably not Latin.