Class GenericTokenizerAssembler
- java.lang.Object
  - org.apache.jena.assembler.assemblers.AssemblerBase
    - org.apache.jena.query.text.assembler.GenericTokenizerAssembler
All Implemented Interfaces:
Assembler
public class GenericTokenizerAssembler extends AssemblerBase
Creates generic tokenizers given a fully qualified Class name and a list of parameters for a constructor of the Class.
The parameters may be of the following types:
- text:TypeString: String
- text:TypeSet: org.apache.lucene.analysis.util.CharArraySet
- text:TypeFile: java.io.FileReader
- text:TypeInt: int
- text:TypeBoolean: boolean
- text:TypeAnalyzer: org.apache.lucene.analysis.Analyzer
Although the list of types is not exhaustive, it is a simple matter to create a wrapper Analyzer that reads a file with information that can be used to initialize any sort of parameters that may be needed for a given Analyzer. The provided types cover the vast majority of cases.
For example, org.apache.lucene.analysis.ja.JapaneseAnalyzer has a constructor with 4 parameters: a UserDict, a CharArraySet, a JapaneseTokenizer.Mode, and a Set<String>. So a simple wrapper can extract the values needed for the various parameters with types not available in this extension, construct the required instances, and instantiate the JapaneseAnalyzer.
Adding custom Analyzers such as the above wrapper analyzer is a simple matter of adding the Analyzer class, and any associated filters, tokenizers and so on, to the classpath for Jena, usually in a jar. Of course, all of the Analyzers that are included in the Lucene distribution bundled with Jena are available as generic Analyzers as well.
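The correspondence between the text:Type* names and Java constructor argument types listed above can be pictured as a simple lookup. The following is a minimal sketch only, not Jena's implementation: the class ParamTypeSketch and its method convertScalar are hypothetical names introduced for illustration, and only the three scalar types that need no Lucene classes are converted.

```java
import java.util.Map;

/** Hypothetical helper, NOT part of Jena: mirrors the type list in this description. */
final class ParamTypeSketch {

    // paramType name -> Java type name, copied from the list in the class description.
    static final Map<String, String> JAVA_TYPES = Map.of(
            "text:TypeString", "java.lang.String",
            "text:TypeSet", "org.apache.lucene.analysis.util.CharArraySet",
            "text:TypeFile", "java.io.FileReader",
            "text:TypeInt", "int",
            "text:TypeBoolean", "boolean",
            "text:TypeAnalyzer", "org.apache.lucene.analysis.Analyzer");

    /**
     * Converts a lexical value into a Java object for the scalar types only.
     * Set, File and Analyzer parameters are left out of this sketch on purpose.
     */
    static Object convertScalar(String paramType, String lexical) {
        switch (paramType) {
            case "text:TypeString":
                return lexical;
            case "text:TypeInt":
                return Integer.valueOf(lexical);
            case "text:TypeBoolean":
                return Boolean.valueOf(lexical);
            default:
                throw new IllegalArgumentException(
                        "Not a scalar type handled by this sketch: " + paramType);
        }
    }
}
```

For instance, convertScalar("text:TypeInt", "3") yields an Integer suitable for a constructor parameter declared as int.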
Each parameter object is specified with:
- an optional text:paramName that may be used to document which parameter is represented
- a text:paramType which is one of: text:TypeString, text:TypeSet, text:TypeFile, text:TypeInt, text:TypeBoolean, text:TypeAnalyzer
- a text:paramValue which is an xsd:string, xsd:boolean, xsd:int, or resource
A parameter of type text:TypeSet must have a list of zero or more Strings.
A parameter of type text:TypeString, text:TypeFile, text:TypeBoolean, text:TypeInt or text:TypeAnalyzer must have a single text:paramValue of the appropriate type.
Examples:
    <#indexLucene> a text:TextIndexLucene ;
        text:directory <file:Lucene> ;
        text:entityMap <#entMap> ;
        text:defineAnalyzers (
            [text:addLang "sa-x-iast" ;
             text:analyzer [ . . . ]]
            [text:defineAnalyzer <#foo> ;
             text:analyzer [ . . . ]]
            [text:defineTokenizer <#bar> ;
             text:tokenizer [
                 a text:GenericTokenizer ;
                 text:class "org.apache.lucene.analysis.ngram.NGramTokenizer" ;
                 text:params (
                     [ text:paramName "minGram" ;
                       text:paramType text:TypeInt ;
                       text:paramValue 3 ]
                     [ text:paramName "maxGram" ;
                       text:paramType text:TypeInt ;
                       text:paramValue 7 ]
                 )
             ]
            ]
        )
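The idea of building an object from a class name plus an ordered list of typed constructor parameters can be sketched with plain Java reflection. This is an illustration of the general technique under stated assumptions, not the code this assembler actually runs: ReflectiveFactorySketch and instantiate are hypothetical names, and java.lang.StringBuilder stands in for a Lucene class so the sketch runs without Lucene on the classpath.

```java
import java.lang.reflect.Constructor;

/** Hypothetical sketch, NOT Jena's implementation. */
final class ReflectiveFactorySketch {

    /**
     * Loads className, looks up the public constructor whose signature matches
     * paramTypes exactly, and invokes it with paramValues.
     */
    static Object instantiate(String className, Class<?>[] paramTypes, Object[] paramValues) {
        try {
            Class<?> clazz = Class.forName(className);
            Constructor<?> ctor = clazz.getConstructor(paramTypes);
            return ctor.newInstance(paramValues);
        } catch (ReflectiveOperationException e) {
            // Wrap class-not-found, no-such-constructor and invocation failures alike.
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a tokenizer: StringBuilder has a public (int) constructor.
        Object sb = instantiate("java.lang.StringBuilder",
                new Class<?>[] { int.class }, new Object[] { 16 });
        System.out.println(sb.getClass().getName());

        // With Lucene on the classpath, the two text:TypeInt parameters in the
        // example above would presumably correspond to a call such as:
        //   instantiate("org.apache.lucene.analysis.ngram.NGramTokenizer",
        //           new Class<?>[] { int.class, int.class }, new Object[] { 3, 7 });
    }
}
```

Note that the order of the entries in text:params matters in such a scheme, because the constructor is selected by its positional signature; text:paramName is only documentation.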
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
static class    GenericTokenizerAssembler.TokenizerSpec
Field Summary
Fields inherited from interface org.apache.jena.assembler.Assembler
content, defaultModel, documentManager, fileManager, general, infModel, locationMapper, memoryModel, modelSource, ontModel, ontModelSpec, prefixMapping, reasonerFactory, ruleSet, unionModel
Constructor Summary
Constructors
Constructor    Description
GenericTokenizerAssembler()
Method Summary
Modifier and Type    Method    Description
GenericTokenizerAssembler.TokenizerSpec    open(Assembler a, Resource root, Mode mode)
Methods inherited from class org.apache.jena.assembler.assemblers.AssemblerBase
getOptionalClassName, getRequiredResource, open, open, openModel, openModel
Method Detail
open
public GenericTokenizerAssembler.TokenizerSpec open(Assembler a, Resource root, Mode mode)
- Specified by: open in interface Assembler
- Specified by: open in class AssemblerBase