Class PlingStemmer
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
The PlingStemmer stems an English noun (plural or singular) to its singular form. It deals with "firemen"->"fireman", it knows Latin and Greek plurals like "appendices"->"appendix", and yes, it was a lot of work to compile these exceptions. Examples:
System.out.println(PlingStemmer.stem("boy"));
----> boy
System.out.println(PlingStemmer.stem("boys"));
----> boy
System.out.println(PlingStemmer.stem("biophysics"));
----> biophysics
System.out.println(PlingStemmer.stem("automata"));
----> automaton
System.out.println(PlingStemmer.stem("genus"));
----> genus
System.out.println(PlingStemmer.stem("emus"));
----> emu
There are a number of word forms that can be either plural or singular. Examples include "physics" (the science, or the plural of the medicine "physic"), "quarters" (the housing, or the plural of "quarter", 1/4) and "people" (the singular of "peoples", or the plural of "person"). In these cases, the stemmer assumes the word is a plural form and returns the singular form. The methods isPlural, isSingular and isSingularAndPlural can be used to differentiate the cases.
It cannot be guaranteed that the stemmer correctly stems a plural word or correctly ignores a singular word -- let alone that it treats an ambiguous word form in the way expected by the user.
The PlingStemmer uses material from WordNet.
It requires the class FinalSet from the Java Tools.
-
Field Summary
Fields
Field: Description
category00: Words that do not have a distinct plural form (like "atlas" etc.)
categoryCHE_CHES: Words that change from "-che" to "-ches" (like "brioche" etc.), listed in their plural forms
categoryEX_ICES: Words that change from "-ex" to "-ices" (like "index" etc.), listed in their plural forms
categoryICS: Words that end with "-ics" and do not exist as nouns without the 's' (like "aerobics" etc.)
categoryIE_IES: Words that change from "-ie" to "-ies" (like "auntie" etc.), listed in their plural forms
categoryIS_ES: Words that change from "-is" to "-es" (like "axis" etc.), listed in their plural forms
categoryIX_ICES: Words that change from "-ix" to "-ices" (like "appendix" etc.), listed in their plural forms
categoryO_I: Words that change from "-o" to "-i" (like "libretto" etc.), listed in their plural forms
categoryOE_OES: Words that change from "-oe" to "-oes" (like "toe" etc.), listed in their plural forms
categoryON_A: Words that change from "-on" to "-a" (like "phenomenon" etc.), listed in their plural forms
categorySE_SES: Words that end in "-se" in their plural forms (like "nurse" etc.)
categorySSE_SSES: Words that change from "-sse" to "-sses" (like "finesse" etc.), listed in their plural forms
categoryU_US: Words that change from "-u" to "-us" (like "emu" etc.), listed in their plural forms
categoryUM_A: Words that change from "-um" to "-a" (like "curriculum" etc.), listed in their plural forms
categoryUS_I: Words that change from "-us" to "-i" (like "fungus" etc.), listed in their plural forms
irregular: Maps irregular Germanic English plural nouns to their singular form
singAndPlur: Contains word forms that can either be plural or singular
-
Constructor Summary
Constructors
PlingStemmer()
-
Method Summary
Modifier and Type, Method: Description
static String cut: Cuts a suffix from a string (that is the number of chars given by the suffix)
static boolean isPlural: Tells whether a word form is plural.
static boolean isSingular(String s): Tells whether a word form is singular.
static boolean isSingularAndPlural: Tells whether a word form is the singular form of one word and at the same time the plural form of another.
static void main: Test routine
static boolean noLatin: Returns true if a word is probably not Latin
static String stem: Stems an English noun
-
Field Details
-
categorySE_SES
Words that end in "-se" in their plural forms (like "nurse" etc.) -
category00
Words that do not have a distinct plural form (like "atlas" etc.) -
categoryUM_A
Words that change from "-um" to "-a" (like "curriculum" etc.), listed in their plural forms -
categoryON_A
Words that change from "-on" to "-a" (like "phenomenon" etc.), listed in their plural forms -
categoryO_I
Words that change from "-o" to "-i" (like "libretto" etc.), listed in their plural forms -
categoryUS_I
Words that change from "-us" to "-i" (like "fungus" etc.), listed in their plural forms -
categoryIX_ICES
Words that change from "-ix" to "-ices" (like "appendix" etc.), listed in their plural forms -
categoryIS_ES
Words that change from "-is" to "-es" (like "axis" etc.), listed in their plural forms -
categoryOE_OES
Words that change from "-oe" to "-oes" (like "toe" etc.), listed in their plural forms -
categoryEX_ICES
Words that change from "-ex" to "-ices" (like "index" etc.), listed in their plural forms -
categoryU_US
Words that change from "-u" to "-us" (like "emu" etc.), listed in their plural forms -
categorySSE_SSES
Words that change from "-sse" to "-sses" (like "finesse" etc.), listed in their plural forms -
categoryCHE_CHES
Words that change from "-che" to "-ches" (like "brioche" etc.), listed in their plural forms -
categoryICS
Words that end with "-ics" and do not exist as nouns without the 's' (like "aerobics" etc.) -
categoryIE_IES
Words that change from "-ie" to "-ies" (like "auntie" etc.), listed in their plural forms -
irregular
Maps irregular Germanic English plural nouns to their singular form -
singAndPlur
Contains word forms that can either be plural or singular
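The field descriptions above suggest a lookup-then-rewrite scheme: a plural form is looked up in a category set, and the category determines which ending is swapped in. The following is a minimal sketch of that idea, not the PlingStemmer source. The class name, the set contents, the order of the checks and the fallback rule are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical miniature of a category-based stemmer; not the real PlingStemmer. */
class CategoryStemSketch {
    // Tiny illustrative stand-ins for the category sets, listed in plural form.
    static final Set<String> UM_A = Set.of("curricula", "bacteria");
    static final Set<String> ON_A = Set.of("phenomena", "automata");
    static final Set<String> US_I = Set.of("fungi", "cacti");
    // Stand-in for category00: words without a distinct plural form.
    static final Set<String> UNCHANGED = Set.of("atlas");
    // Stand-in for the irregular map: plural form -> singular form.
    static final Map<String, String> IRREGULAR = Map.of("men", "man", "feet", "foot");

    static String stem(String s) {
        if (UNCHANGED.contains(s)) return s;
        String singular = IRREGULAR.get(s);
        if (singular != null) return singular;
        // Each category replaces the last character with its singular ending.
        if (UM_A.contains(s)) return s.substring(0, s.length() - 1) + "um";
        if (ON_A.contains(s)) return s.substring(0, s.length() - 1) + "on";
        if (US_I.contains(s)) return s.substring(0, s.length() - 1) + "us";
        // Assumed fallback: strip a plain trailing "s".
        if (s.length() > 1 && s.endsWith("s")) return s.substring(0, s.length() - 1);
        return s;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"curricula", "phenomena", "fungi", "atlas", "men", "boys"}) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Listing the sets in plural form, as the descriptions state, means a single membership test on the input decides the category; no trial de-inflection is needed.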
-
-
Constructor Details
-
PlingStemmer
public PlingStemmer()
-
-
Method Details
-
isPlural
Tells whether a word form is plural. This method just checks whether the stem method alters the word -
isSingular
Tells whether a word form is singular. Note that a word can be both plural and singular -
isSingularAndPlural
Tells whether a word form is the singular form of one word and at the same time the plural form of another. -
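The three predicates can be read as simple combinations of the stem method and the singAndPlur set. The sketch below shows one plausible arrangement, assuming that isPlural compares a word with its stem (as stated above) and that ambiguous forms are looked up in a set. The class name, the toy stem method and the set contents are hypothetical stand-ins, not the library's code.

```java
import java.util.Set;

/** Hypothetical sketch of the plural/singular predicates; not the real PlingStemmer. */
class PluralPredicateSketch {
    // Illustrative stand-in for singAndPlur.
    static final Set<String> SING_AND_PLUR = Set.of("physics", "quarters", "people");

    // Toy stemmer (assumption): one irregular form plus trailing-"s" stripping.
    static String stem(String s) {
        if (s.equals("people")) return "person";
        if (s.length() > 1 && s.endsWith("s")) return s.substring(0, s.length() - 1);
        return s;
    }

    // A word counts as plural if stemming changes it.
    static boolean isPlural(String s) {
        return !s.equals(stem(s));
    }

    // A word is ambiguous if it is listed as both singular and plural.
    static boolean isSingularAndPlural(String s) {
        return SING_AND_PLUR.contains(s);
    }

    // A word is singular if it is ambiguous or if stemming leaves it unchanged.
    static boolean isSingular(String s) {
        return isSingularAndPlural(s) || !isPlural(s);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"boy", "boys", "physics"}) {
            System.out.println(w + ": plural=" + isPlural(w)
                    + ", singular=" + isSingular(w)
                    + ", both=" + isSingularAndPlural(w));
        }
    }
}
```

Under these assumptions an ambiguous form such as "physics" reports true for all three predicates, which matches the note that a word can be both plural and singular.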
cut
Cuts a suffix from a string, that is, removes as many characters from the end of the string as the suffix has -
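A minimal sketch of such a helper follows. The parameter list is not shown on this page, so the two-argument signature (string, suffix) and the class name are assumptions. Only the length of the suffix is used, so the caller is expected to have checked beforehand that the string really ends with it.

```java
/** Hypothetical sketch of a suffix-cutting helper; the signature is assumed. */
class CutSketch {
    // Removes suffix.length() characters from the end of s.
    // Does not verify that s actually ends with suffix.
    static String cut(String s, String suffix) {
        return s.substring(0, s.length() - suffix.length());
    }

    public static void main(String[] args) {
        String plural = "curricula";
        if (plural.endsWith("a")) {
            // Swap the plural ending for the singular one.
            System.out.println(cut(plural, "a") + "um");
        }
    }
}
```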
noLatin
Returns true if a word is probably not Latin -
stem
Stems an English noun -
main
Test routine
Throws:
Exception
-