public class Params extends Object
Parameters for GenericAnalyzer, GenericFilter, and GenericTokenizer.
The parameters may be of the following types:

| Parameter type | Java type |
|---|---|
| text:TypeString | String |
| text:TypeSet | org.apache.lucene.analysis.util.CharArraySet |
| text:TypeFile | java.io.FileReader |
| text:TypeInt | int |
| text:TypeBoolean | boolean |
| text:TypeAnalyzer | org.apache.lucene.analysis.Analyzer |
| text:TypeTokenStream | org.apache.lucene.analysis.TokenStream |
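As a rough illustration of how a lexical text:paramValue relates to the Java types listed above, the sketch below converts values for the types that need only the JDK. The class `ParamValueSketch`, its methods, and the use of a plain `Set<String>` as a stand-in for Lucene's `CharArraySet` are hypothetical; this is not how Jena itself performs the conversion. text:TypeFile, text:TypeAnalyzer, and text:TypeTokenStream are deliberately left out.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Jena: converts a lexical text:paramValue into
// the Java type listed for its parameter type (JDK-only types).
final class ParamValueSketch {
    private ParamValueSketch() {}

    // paramType is the local name used in the assembler, e.g. "TypeInt".
    static Object convert(String paramType, String lexical) {
        switch (paramType) {
            case "TypeString":
                return lexical;
            case "TypeInt":
                // Boxed as Integer; the target constructor parameter type is the primitive int.
                return Integer.parseInt(lexical);
            case "TypeBoolean":
                // Reject anything but "true"/"false" rather than silently returning false.
                if (lexical.equals("true") || lexical.equals("false")) {
                    return Boolean.valueOf(lexical);
                }
                throw new IllegalArgumentException("not a boolean: " + lexical);
            default:
                // TypeSet uses convertSet; the remaining types are outside this sketch.
                throw new IllegalArgumentException("unsupported in this sketch: " + paramType);
        }
    }

    // Stand-in for CharArraySet: an unmodifiable Set<String> keeping insertion order.
    static Set<String> convertSet(List<String> values) {
        return Collections.unmodifiableSet(new LinkedHashSet<>(values));
    }
}
```

Strict parsing of booleans is a design choice here: `Boolean.parseBoolean` alone would turn a typo such as "ture" into `false` without any error.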
The provided types cover the vast majority of cases. Although the list is not exhaustive, it is a simple matter to create a wrapper Analyzer, Filter, or Tokenizer that reads a file containing whatever information is needed to initialize parameters of other types.
For example, org.apache.lucene.analysis.ja.JapaneseAnalyzer
has a constructor with four parameters: a UserDictionary,
a CharArraySet, a JapaneseTokenizer.Mode, and a
Set<String>. A simple wrapper can read the values needed for
the parameters whose types this extension does not support,
construct the required instances, and then instantiate the
JapaneseAnalyzer.
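The file-reading half of such a wrapper might look like the sketch below. It assumes the wrapper receives a `java.io.Reader` (a text:TypeFile parameter supplies a `FileReader`) and that the file uses `java.util.Properties` syntax. The class `JapaneseWrapperConfig` and the keys `mode`, `stopwords`, and `stoptags` are hypothetical; the user dictionary and the actual `JapaneseAnalyzer` construction are omitted so that the sketch needs only the JDK.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical settings reader for a wrapper around JapaneseAnalyzer. The wrapper
// would obtain the Reader from a text:TypeFile parameter (a java.io.FileReader).
final class JapaneseWrapperConfig {
    final String mode;            // would be converted to a JapaneseTokenizer.Mode
    final Set<String> stopwords;  // would be copied into a CharArraySet
    final Set<String> stoptags;   // would be passed as the Set<String> parameter

    private JapaneseWrapperConfig(String mode, Set<String> stopwords, Set<String> stoptags) {
        this.mode = mode;
        this.stopwords = stopwords;
        this.stoptags = stoptags;
    }

    // Reads key=value lines (java.util.Properties syntax); lists are comma-separated.
    static JapaneseWrapperConfig read(Reader in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        String mode = props.getProperty("mode");
        if (mode == null) {
            throw new IllegalArgumentException("missing 'mode' setting");
        }
        return new JapaneseWrapperConfig(
                mode.trim(),
                splitList(props.getProperty("stopwords", "")),
                splitList(props.getProperty("stoptags", "")));
    }

    // Splits "a, b ,c" into {"a", "b", "c"}, dropping empty entries.
    private static Set<String> splitList(String value) {
        Set<String> result = new LinkedHashSet<>();
        for (String item : value.split(",")) {
            String trimmed = item.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return Collections.unmodifiableSet(result);
    }
}
```

The wrapper's own constructor would then call `read` on the supplied reader and pass the converted values to the `JapaneseAnalyzer` constructor.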
Adding a custom Analyzer, such as the wrapper described above, is a matter of putting the Analyzer class, together with any associated filters, tokenizers, and so on, on Jena's classpath, usually in a jar. All of the Analyzers, Filters, and Tokenizers included in the Lucene distribution bundled with Jena are also available through the generic classes.
Each parameter object is specified with:
- text:paramName, which may be used to document which parameter is represented;
- text:paramType, which is one of text:TypeString, text:TypeSet, text:TypeFile, text:TypeInt, text:TypeBoolean, text:TypeAnalyzer, or text:TypeTokenStream.
A parameter of type text:TypeSet must have a text:paramValue that is a list of zero or
more Strings.
A parameter of type text:TypeString, text:TypeFile,
text:TypeBoolean, text:TypeInt, or text:TypeAnalyzer
must have a single text:paramValue of the appropriate type.
A parameter of type text:TypeTokenStream has no text:paramValue;
it marks where the TokenStream parameter of a Filter occurs.
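The rules above can be expressed as a small check. In this sketch a parameter is modelled by the local name of its text:paramType and by its text:paramValue: `null` when absent, a `java.util.List` for an RDF list, and any other object for a single value. The class `ParamSpecCheck` and this representation are hypothetical, not Jena's API, and only the presence and shape of the value are checked, not its datatype.

```java
import java.util.List;

// Hypothetical check of a parameter description against the text:paramValue rules.
final class ParamSpecCheck {
    private ParamSpecCheck() {}

    // Returns null if the description is valid, otherwise a description of the problem.
    static String check(String paramType, Object value) {
        switch (paramType) {
            case "TypeSet":
                // A list of zero or more strings.
                if (!(value instanceof List)) {
                    return "TypeSet needs a list of zero or more strings";
                }
                for (Object item : (List<?>) value) {
                    if (!(item instanceof String)) {
                        return "TypeSet list items must be strings";
                    }
                }
                return null;
            case "TypeString":
            case "TypeFile":
            case "TypeBoolean":
            case "TypeInt":
            case "TypeAnalyzer":
                // Exactly one value; its datatype is not checked in this sketch.
                if (value == null || value instanceof List) {
                    return paramType + " needs exactly one paramValue";
                }
                return null;
            case "TypeTokenStream":
                // Only marks the position of the TokenStream argument.
                return value == null ? null : "TypeTokenStream takes no paramValue";
            default:
                return "unknown paramType " + paramType;
        }
    }
}
```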
Examples:
```turtle
text:map (
    [ text:field "text" ;
      text:predicate rdfs:label ;
      text:analyzer [
          a text:GenericAnalyzer ;
          text:class "org.apache.lucene.analysis.en.EnglishAnalyzer" ;
          text:params (
              [ text:paramName "stopwords" ;
                text:paramType text:TypeSet ;
                text:paramValue ("the" "a" "an") ]
              [ text:paramName "stemExclusionSet" ;
                text:paramType text:TypeSet ;
                text:paramValue ("ing" "ed") ]
          )
      ]
    ]
) .
```
```turtle
[] a text:TextIndexLucene ;
    text:defineFilters (
        text:filter [
            a text:GenericFilter ;
            text:class "fi.finto.FoldingFilter" ;
            text:params (
                [ text:paramName "source" ;
                  text:paramType text:TypeTokenStream ]
                [ text:paramName "whitelisted" ;
                  text:paramType text:TypeSet ;
                  text:paramValue ("ç") ]
            )
        ]
    ) .
```
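Each example names a class with text:class and lists its arguments with text:params. One common way to turn such a description into an object is to look up a public constructor by the Java types of its arguments and invoke it reflectively. The sketch below shows that idea with JDK classes only, assuming the order of text:params is the constructor's parameter order. The class `ReflectiveConstructSketch` and its nested `Arg` are hypothetical and do not describe Jena's implementation. The sketch also shows why the type list distinguishes `int` and `boolean` from their wrapper classes: `Class.getConstructor` matches parameter types exactly.

```java
import java.lang.reflect.Constructor;
import java.util.List;

// Hypothetical sketch: instantiate a class from its name and an ordered list of
// (declared parameter type, value) pairs.
final class ReflectiveConstructSketch {
    private ReflectiveConstructSketch() {}

    static final class Arg {
        final Class<?> type;   // declared constructor parameter type, e.g. int.class
        final Object value;    // argument value, e.g. Integer 16 for an int parameter

        Arg(Class<?> type, Object value) {
            this.type = type;
            this.value = value;
        }
    }

    static Object instantiate(String className, List<Arg> args) throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        Class<?>[] types = new Class<?>[args.size()];
        Object[] values = new Object[args.size()];
        for (int i = 0; i < args.size(); i++) {
            types[i] = args.get(i).type;
            values[i] = args.get(i).value;
        }
        // getConstructor matches the parameter types exactly and in order;
        // newInstance unboxes wrapper values for primitive parameters.
        Constructor<?> ctor = clazz.getConstructor(types);
        return ctor.newInstance(values);
    }
}
```

For example, `instantiate("java.util.ArrayList", List.of(new Arg(int.class, 16)))` finds the `ArrayList(int)` constructor, whereas the same call with `Integer.class` fails with `NoSuchMethodException`.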
| Modifier and Type | Field and Description |
|---|---|
| static String | TYPE_ANALYZER |
| static String | TYPE_BOOL |
| static String | TYPE_FILE |
| static String | TYPE_INT |
| static String | TYPE_SET |
| static String | TYPE_STRING |
| static String | TYPE_TOKENSTREAM |

| Constructor and Description |
|---|
| Params() |
public static final String TYPE_ANALYZER
public static final String TYPE_BOOL
public static final String TYPE_FILE
public static final String TYPE_INT
public static final String TYPE_SET
public static final String TYPE_STRING
public static final String TYPE_TOKENSTREAM
Licenced under the Apache License, Version 2.0