| Class | Description |
|---|---|
| CountMedlineCitations | Simple demo program that counts all the documents in the TREC collection. |
| MedlineCitation | Object representing a MEDLINE citation. |
| MedlineCitationInputFormat | Hadoop InputFormat for processing the MEDLINE citations in XML format (new API). |
| MedlineCitationInputFormat.MedlineCitationRecordReader | |
| MedlineCitationInputFormatOld | Hadoop InputFormat for processing the MEDLINE citations in XML format (old API). |
| MedlineCitationInputFormatOld.MedlineCitationRecordReader | Hadoop RecordReader for reading MEDLINE citations in XML format. |
| MedlineDocnoMapping | Object that maps between MEDLINE docids (PMIDs) and docnos (sequentially-numbered ints). |
| MedlineDocnoMappingBuilder | Tool that builds the mapping from MEDLINE docids (PMIDs) to docnos (sequentially-numbered ints). |

Provides classes for working with MEDLINE citations in XML format (particularly, for the TREC 2004-5 genomics tracks). The TREC 2004 and TREC 2005 genomics tracks used a 10-year subset of MEDLINE totaling 4,591,008 records (citations); this is commonly called the MEDLINE04 collection. These classes are designed to work with the XML-formatted version of the distribution, which comes in five different files:
Here are the two steps for preparing the collection for processing with Hadoop:
NumberMedlineCitations accomplishes this. Here is a sample invocation:

    hadoop jar cloud9.jar edu.umd.cloud9.collection.medline.NumberMedlineCitations \
      /umd/collections/medline04.raw/ \
      /user/jimmylin/medline-docid-tmp \
      /user/jimmylin/docno.mapping 100

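To make the docid-to-docno idea concrete, here is a minimal plain-Java sketch. It is not the Cloud9 implementation: the class `PmidDocnoSketch`, its method names, the choice to start docnos at 1, and the sample PMIDs are all assumptions made for illustration. It sorts the PMIDs once and then resolves lookups with a binary search.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Cloud9): maps PMIDs to sequential docnos. */
final class PmidDocnoSketch {
    private final int[] sortedPmids;

    PmidDocnoSketch(int[] pmids) {
        // Sort and de-duplicate so that docnos follow ascending PMID order.
        this.sortedPmids = Arrays.stream(pmids).sorted().distinct().toArray();
    }

    /** Returns the docno for a PMID (assumed to start at 1), or -1 if unknown. */
    int getDocno(int pmid) {
        int idx = Arrays.binarySearch(sortedPmids, pmid);
        return idx < 0 ? -1 : idx + 1;
    }

    /** Returns the PMID stored for a 1-based docno. */
    int getDocid(int docno) {
        return sortedPmids[docno - 1];
    }

    /** Number of distinct PMIDs in the mapping. */
    int size() {
        return sortedPmids.length;
    }

    public static void main(String[] args) {
        // Made-up PMIDs, used only to exercise the sketch.
        PmidDocnoSketch mapping =
            new PmidDocnoSketch(new int[] {15133816, 10605436, 12345678});
        System.out.println(mapping.getDocno(10605436)); // 1
        System.out.println(mapping.getDocid(3));        // 15133816
    }
}
```

A sorted `int[]` keeps the memory footprint small for millions of records, at the cost of an O(log n) lookup in each direction.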
After the corpus has been prepared, it is ready for processing with Hadoop. The class DemoCountMedlineCitations is a simple demo program that counts all documents in the collection. It provides a skeleton for MapReduce programs that process the collection. Here is a sample invocation:

    hadoop jar cloud9.jar edu.umd.cloud9.collection.medline.DemoCountMedlineCitations \
      /umd/collections/medline04.raw/ \
      /user/jimmylin/count-tmp \
      /user/jimmylin/docno.mapping 100

The output key-value pairs in this sample program are the docid to docno mappings.
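The demo relies on the package's InputFormat to deliver one citation at a time. As a rough illustration of that record-splitting idea outside Hadoop, the sketch below pulls the PMID out of each `<MedlineCitation>` element in an XML string. It is an assumption-laden stand-in rather than the Cloud9 code: `CitationSplitSketch` and `extractPmids` are hypothetical names, and it assumes that each record is a single `<MedlineCitation>` element whose first `<PMID>` is the citation's own identifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Cloud9): finds citation records in XML text. */
final class CitationSplitSketch {
    // One record = one <MedlineCitation ...> ... </MedlineCitation> element.
    // The [\s>] after the tag name prevents a match on <MedlineCitationSet>.
    private static final Pattern RECORD =
        Pattern.compile("<MedlineCitation[\\s>].*?</MedlineCitation>", Pattern.DOTALL);

    // Captures the digits inside a <PMID> element, with or without attributes.
    private static final Pattern PMID =
        Pattern.compile("<PMID[^>]*>\\s*(\\d+)\\s*</PMID>");

    /** Returns the first PMID of every record; records without a PMID are skipped. */
    static List<String> extractPmids(String xml) {
        List<String> pmids = new ArrayList<>();
        Matcher record = RECORD.matcher(xml);
        while (record.find()) {
            Matcher pmid = PMID.matcher(record.group());
            if (pmid.find()) {
                pmids.add(pmid.group(1));
            }
        }
        return pmids;
    }
}
```

Calling `CitationSplitSketch.extractPmids(xml).size()` then gives a citation count for a small in-memory sample. Regex matching is only suitable for such samples; a real job would stream the files through the InputFormat instead.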
Copyright © 2015. All rights reserved.