public interface DocnoMapping
Interface for an object that maintains a bidirectional mapping between docids and docnos. A docid is a globally-unique String identifier for a document in the collection. For many types of information retrieval algorithms, documents in the collection must be sequentially numbered; thus, each document in the collection must be assigned a unique integer identifier, which is its docno. Typically, the docid/docno mappings are stored in a mappings file, which is loaded into memory by concrete objects implementing this interface.
Unless there are compelling reasons otherwise, it is preferable to start numbering docnos from one instead of zero. This is because zero cannot be represented in many common compression schemes that are used in information retrieval (e.g., Golomb codes).
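To make the zero restriction concrete, here is a minimal sketch of Elias gamma coding, a different but closely related positive-integer code used for postings compression. It is swapped in for the Golomb codes named above only because it is shorter to show. The class name `EliasGammaSketch` and the bit-string representation are assumptions made for illustration; real index code would write packed bits.

```java
/** Hypothetical illustration only; not part of the DocnoMapping API. */
final class EliasGammaSketch {
  /**
   * Encodes a positive integer as a string of '0'/'1' characters:
   * (bit length - 1) zeros, followed by the binary form of x.
   */
  static String encode(int x) {
    if (x < 1) {
      // Zero has no binary form with a leading 1 bit, so it has no codeword.
      throw new IllegalArgumentException("gamma code is undefined for " + x);
    }
    String binary = Integer.toBinaryString(x);
    StringBuilder out = new StringBuilder();
    for (int i = 1; i < binary.length(); i++) {
      out.append('0'); // unary length prefix
    }
    return out.append(binary).toString();
  }
}
```

With docnos that start at one, every docno (and every gap between consecutive docnos in a sorted postings list) is at least one, so a code like this can represent it without an offset.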
| Modifier and Type | Interface and Description |
|---|---|
| static interface | DocnoMapping.Builder: Interface for an object that constructs a DocnoMapping. |
| static class | DocnoMapping.BuilderUtils |
| static class | DocnoMapping.DefaultBuilderOptions |
| Modifier and Type | Method and Description |
|---|---|
| DocnoMapping.Builder | getBuilder(): Returns the builder for this mapping. |
| String | getDocid(int docno): Returns the docid for a particular docno. |
| int | getDocno(String docid): Returns the docno for a particular docid. |
| void | loadMapping(org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.FileSystem fs): Loads a mapping file. |
int getDocno(String docid)
Parameters: docid - the docid

String getDocid(int docno)
Parameters: docno - the docno

void loadMapping(org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.FileSystem fs) throws IOException
Parameters: path - path to the mappings file; fs - reference to the FileSystem
Throws: IOException

DocnoMapping.Builder getBuilder()
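As a rough illustration of the bidirectional lookup that getDocno and getDocid describe, the sketch below keeps docids in a sorted array and numbers them from one. It is not an implementation of this interface: the class name `SortedArrayDocnoMapping`, the constructor, the `size()` helper, and the return values for unknown inputs (-1 and null) are all assumptions, and loading from a Hadoop FileSystem is left out so the example stays self-contained.

```java
import java.util.Arrays;
import java.util.Collection;

/** Hypothetical in-memory mapping; not part of the documented API. */
final class SortedArrayDocnoMapping {
  // Docids in sorted order; docids[i] holds the docid for docno i + 1.
  private final String[] docids;

  SortedArrayDocnoMapping(Collection<String> ids) {
    // Drop duplicates and sort so each docid gets exactly one docno.
    this.docids = ids.stream().distinct().sorted().toArray(String[]::new);
  }

  /** Returns the 1-based docno, or -1 if the docid is unknown (assumed convention). */
  int getDocno(String docid) {
    int pos = Arrays.binarySearch(docids, docid);
    return pos < 0 ? -1 : pos + 1;
  }

  /** Returns the docid, or null if the docno is out of range (assumed convention). */
  String getDocid(int docno) {
    return (docno < 1 || docno > docids.length) ? null : docids[docno - 1];
  }

  /** Number of distinct documents in the mapping. */
  int size() {
    return docids.length;
  }
}
```

Storing the docids in sorted order means a single array serves both directions: docno-to-docid is an index lookup, and docid-to-docno is a binary search, so no separate hash map is needed.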
Copyright © 2015. All rights reserved.