| Package | Description |
|---|---|
| edu.umd.cloud9.collection | Base classes and interfaces for working with specific document collections. |
| edu.umd.cloud9.collection.aquaint2 | Provides classes for working with the AQUAINT-2 collection. |
| edu.umd.cloud9.collection.clue | Provides classes for working with the ClueWeb09 collection. |
| edu.umd.cloud9.collection.line | |
| edu.umd.cloud9.collection.medline | Provides classes for working with MEDLINE citations in XML format (particularly, for the TREC 2004-5 genomics tracks). |
| edu.umd.cloud9.collection.trec | Provides classes for working with the TREC collection (particularly disks 4 and 5). |
| edu.umd.cloud9.collection.trecweb | Provides classes for working with the GOV2 collection. |
| edu.umd.cloud9.collection.wikipedia | Provides classes for working with Wikipedia XML dumps. |
| edu.umd.cloud9.collection.wikipedia.language | Provides language-dependent classes for working with Wikipedia XML dumps. |
| edu.umd.cloud9.webgraph.data | |

| Modifier and Type | Interface and Description |
|---|---|
| interface | DocumentForwardIndex<T extends Indexable>: Interface for a document forward index. |
| class | IndexableFileInputFormat<K,V extends Indexable>: Abstract class representing a FileInputFormat for Indexable objects (org.apache.hadoop.mapreduce API). |
| class | IndexableFileInputFormatOld<K,V extends Indexable> |

| Modifier and Type | Class and Description |
|---|---|
| class | WebDocument |

| Modifier and Type | Method and Description |
|---|---|
| void | ExtractHTMLFieldCollection.MyMapper.map(org.apache.hadoop.io.LongWritable key, Indexable doc, org.apache.hadoop.mapreduce.Mapper.Context context) |

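
The map method listed above accepts any Indexable value, which is why the document classes on this page can share one processing path. The following is a minimal sketch of that pattern and not Cloud9 code: the nested `Indexable` interface is a hypothetical stand-in for edu.umd.cloud9.collection.Indexable, its `getDocid()` and `getContent()` accessors are assumptions to be checked against the real interface, and `SimpleDoc`, `describe`, and `describeAll` are names invented for illustration. Hadoop is deliberately left out so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexableSketch {
    // Hypothetical stand-in for edu.umd.cloud9.collection.Indexable.
    // The accessor names are assumptions; verify them against the real interface.
    interface Indexable {
        String getDocid();
        String getContent();
    }

    // Hypothetical document type playing the role of TextDocument, TrecDocument, etc.
    static final class SimpleDoc implements Indexable {
        private final String docid;
        private final String content;

        SimpleDoc(String docid, String content) {
            this.docid = docid;
            this.content = content;
        }

        @Override
        public String getDocid() {
            return docid;
        }

        @Override
        public String getContent() {
            return content;
        }
    }

    // Formats one record as "docid<TAB>content length" using only the interface,
    // so it works for any implementing document class.
    static String describe(Indexable doc) {
        return doc.getDocid() + "\t" + doc.getContent().length();
    }

    // Applies describe to every document, regardless of its concrete type.
    static List<String> describeAll(List<? extends Indexable> docs) {
        List<String> lines = new ArrayList<>();
        for (Indexable doc : docs) {
            lines.add(describe(doc));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Indexable> docs = new ArrayList<>();
        docs.add(new SimpleDoc("doc-1", "hello world"));
        docs.add(new SimpleDoc("doc-2", ""));
        for (String line : describeAll(docs)) {
            System.out.println(line);
        }
    }
}
```

In a real job the body of `describe` would sit inside a Mapper whose input value type is Indexable, with the input format supplying the concrete document class.
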
| Modifier and Type | Class and Description |
|---|---|
| class | Aquaint2Document |

| Modifier and Type | Class and Description |
|---|---|
| class | ClueWarcRecord |

| Modifier and Type | Class and Description |
|---|---|
| class | TextDocument: Object representing a simple document. |

| Modifier and Type | Class and Description |
|---|---|
| class | MedlineCitation: Object representing a MEDLINE citation. |

| Modifier and Type | Class and Description |
|---|---|
| class | TrecDocument: Object representing a TREC document. |

| Modifier and Type | Class and Description |
|---|---|
| class | TrecWebDocument |

| Modifier and Type | Class and Description |
|---|---|
| class | WikipediaPage: A page from Wikipedia. |

| Modifier and Type | Class and Description |
|---|---|
| class | ArabicWikipediaPage: An Arabic page from Wikipedia. |
| class | ChineseWikipediaPage: A Chinese page from Wikipedia. |
| class | CzechWikipediaPage: A Czech page from Wikipedia. |
| class | EnglishWikipediaPage: An English page from Wikipedia. |
| class | GermanWikipediaPage: A German page from Wikipedia. |
| class | SpanishWikipediaPage: A Spanish page from Wikipedia. |
| class | SwedishWikipediaPage: A Swedish page from Wikipedia. |
| class | TurkishWikipediaPage: A Turkish page from Wikipedia. |

| Modifier and Type | Class and Description |
|---|---|
| class | IndexableAnchorText: An Indexable implementation for anchor text/web graph collections, used in generating ForwardIndex. |

Copyright © 2015. All rights reserved.