| Class | Description |
|---|---|
| CountWikipediaPages | Tool for counting the number of pages in a particular Wikipedia XML dump file. |
| DumpWikipediaToPlainText | Tool that takes a Wikipedia XML dump file and writes the articles to a flat text file (article title and content, separated by a tab). |
| LookupWikipediaArticle | Tool for providing command-line access to page titles given either a docno or a docid. |
| RepackWikipedia | Tool for repacking Wikipedia XML dumps into SequenceFiles. |
| WikipediaDocnoMapping | Provides a mapping between Wikipedia internal ids (docids) and sequentially-numbered ints (docnos). |
| WikipediaDocnoMappingBuilder | Tool for building the mapping between Wikipedia internal ids (docids) and sequentially-numbered ints (docnos). |
| WikipediaForwardIndex | Forward index for Wikipedia collections. |
| WikipediaForwardIndexBuilder | Tool for building a document forward index for Wikipedia. |
| WikipediaPage | A page from Wikipedia. |
| WikipediaPage.Link | |
| WikipediaPageInputFormat | Hadoop InputFormat for processing Wikipedia pages from the XML dumps. |
| WikipediaPageInputFormat.WikipediaPageRecordReader | |
| WikipediaPageInputFormatOld | Hadoop InputFormat for processing Wikipedia pages from the XML dumps. |
| WikipediaPageInputFormatOld.WikipediaPageRecordReader | Hadoop RecordReader for reading Wikipedia pages from the XML dumps. |
| WikipediaPagesBz2InputStream | Class for working with bz2-compressed Wikipedia article dump files on local disk. |

Provides classes for working with Wikipedia XML dumps.
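
The docid-to-docno mapping described for WikipediaDocnoMapping can be illustrated with a minimal, self-contained sketch. This is not the Cloud9 implementation: the class name `DocnoMappingSketch`, its method names, the 1-based docno numbering, and the choice to number docids in sorted order are all assumptions made for the example. It only shows the general idea of giving sparse internal ids dense, sequential numbers and looking them up in both directions.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is NOT the Cloud9 WikipediaDocnoMapping class.
final class DocnoMappingSketch {
    // Docids kept in sorted order; a docid's docno is its 1-based position here
    // (1-based numbering is an assumption of this sketch).
    private final int[] docids;

    DocnoMappingSketch(int[] unsortedDocids) {
        // Copy so the caller's array is not modified, then sort for binary search.
        // The sketch assumes the docids are distinct.
        docids = unsortedDocids.clone();
        Arrays.sort(docids);
    }

    // Returns the docno for a docid, or -1 if the docid is not in the mapping.
    int getDocno(int docid) {
        int pos = Arrays.binarySearch(docids, docid);
        return pos < 0 ? -1 : pos + 1;
    }

    // Returns the docid for a docno, or -1 if the docno is out of range.
    int getDocid(int docno) {
        return (docno < 1 || docno > docids.length) ? -1 : docids[docno - 1];
    }

    public static void main(String[] args) {
        // Made-up docids, used only to exercise the sketch.
        DocnoMappingSketch mapping = new DocnoMappingSketch(new int[] {290, 12, 39, 25});
        System.out.println("docno of docid 39: " + mapping.getDocno(39));
        System.out.println("docid of docno 1: " + mapping.getDocid(1));
    }
}
```

Keeping the docids in a sorted array makes the docno-to-docid direction a constant-time array access and the docid-to-docno direction a binary search, with no per-entry object overhead.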
Copyright © 2015. All rights reserved.