Package org.apache.poi.xwpf.extractor
Class XWPFWordExtractor
- java.lang.Object
  - org.apache.poi.xwpf.extractor.XWPFWordExtractor
All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable, POITextExtractor, POIXMLTextExtractor
public class XWPFWordExtractor extends java.lang.Object implements POIXMLTextExtractor
Helper class to extract text from an OOXML Word file.
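The following is a minimal sketch of typical usage. It assumes Apache POI's `poi-ooxml` artifact (5.x) is on the classpath. The class `DocxTextDemo` and its methods `extractText` and `demo` are hypothetical names. So that the demo runs without an input file, `demo` first writes a one-paragraph sample .docx to a temporary file and then reads it back through `XWPFDocument` and `XWPFWordExtractor.getText()`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class DocxTextDemo {

    /** Extracts all text from the Word file at {@code path}. */
    static String extractText(Path path) {
        // try-with-resources closes the extractor, then the document, then the stream
        try (InputStream in = Files.newInputStream(path);
             XWPFDocument doc = new XWPFDocument(in);
             XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            return extractor.getText();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical helper: writes a one-paragraph .docx, extracts it, then deletes it. */
    static String demo(String text) {
        try {
            Path path = Files.createTempFile("docx-demo", ".docx");
            try {
                try (XWPFDocument doc = new XWPFDocument();
                     OutputStream out = Files.newOutputStream(path)) {
                    doc.createParagraph().createRun().setText(text);
                    doc.write(out);
                }
                return extractText(path);
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo("Hello from XWPFWordExtractor"));
    }
}
```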
-
Field Summary
static java.util.List<XWPFRelation> SUPPORTED_TYPES
-
Constructor Summary
XWPFWordExtractor(OPCPackage container)
XWPFWordExtractor(XWPFDocument document)
-
Method Summary
void appendBodyElementText(java.lang.StringBuilder text, IBodyElement e)
void appendParagraphText(java.lang.StringBuilder text, XWPFParagraph paragraph)
XWPFDocument getDocument()
    Returns the opened document.
XWPFDocument getFilesystem()
java.lang.String getText()
    Retrieves all the text from the document.
boolean isCloseFilesystem()
void setCloseFilesystem(boolean doCloseFilesystem)
void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
    Sets whether phonetic runs are concatenated during extraction.
void setFetchHyperlinks(boolean fetch)
    Sets whether hyperlink targets are also fetched when extracting text. By default, only the hyperlink label is output.
-
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.poi.ooxml.extractor.POIXMLTextExtractor
checkMaxTextSize, close, getCoreProperties, getCustomProperties, getExtendedProperties, getMetadataTextExtractor, getPackage
-
Field Detail
-
SUPPORTED_TYPES
public static final java.util.List<XWPFRelation> SUPPORTED_TYPES
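One possible use of the field is sketched below. It assumes that each entry's `getContentType()` (inherited from `POIXMLRelation`) is the content type of a main document part that the extractor accepts. The class `SupportedTypesDemo` and its method `isSupportedContentType` are hypothetical names.

```java
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFRelation;

public class SupportedTypesDemo {

    /** Returns true if the given main-part content type matches an entry in SUPPORTED_TYPES. */
    static boolean isSupportedContentType(String contentType) {
        if (contentType == null) {
            return false;
        }
        for (XWPFRelation relation : XWPFWordExtractor.SUPPORTED_TYPES) {
            if (contentType.equals(relation.getContentType())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Print every content type the extractor lists as supported
        for (XWPFRelation relation : XWPFWordExtractor.SUPPORTED_TYPES) {
            System.out.println(relation.getContentType());
        }
    }
}
```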
-
Constructor Detail
-
XWPFWordExtractor
public XWPFWordExtractor(OPCPackage container) throws java.io.IOException
Throws:
java.io.IOException
-
XWPFWordExtractor
public XWPFWordExtractor(XWPFDocument document)
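The `OPCPackage` constructor can be used without creating an `XWPFDocument` first, as sketched below. This assumes `poi-ooxml` 5.x and uses `OPCPackage.open(File, PackageAccess)` to open the file read-only. `OpcExtractDemo`, `extract`, and `demo` are hypothetical names, and `demo` writes its own temporary sample file so that the sketch runs without input.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class OpcExtractDemo {

    /** Opens {@code file} read-only as an OPC package and extracts its text. */
    static String extract(File file) {
        try (OPCPackage pkg = OPCPackage.open(file, PackageAccess.READ);
             XWPFWordExtractor extractor = new XWPFWordExtractor(pkg)) {
            return extractor.getText();
        } catch (InvalidFormatException e) {
            throw new IllegalArgumentException("Not a valid OOXML package: " + file, e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical helper: writes a one-paragraph .docx, then extracts it via OPCPackage. */
    static String demo(String text) {
        try {
            Path path = Files.createTempFile("opc-demo", ".docx");
            try {
                try (XWPFDocument doc = new XWPFDocument();
                     OutputStream out = Files.newOutputStream(path)) {
                    doc.createParagraph().createRun().setText(text);
                    doc.write(out);
                }
                return extract(path.toFile());
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo("Opened via OPCPackage"));
    }
}
```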
-
Method Detail
-
setFetchHyperlinks
public void setFetchHyperlinks(boolean fetch)
Sets whether hyperlink targets should also be fetched when extracting text. By default, only the hyperlink label is output, not the target.
Parameters:
fetch - true to also include hyperlink targets in the extracted text
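The sketch below shows where the setting is applied, assuming `poi-ooxml` 5.x. `HyperlinkTextDemo` and `extract` are hypothetical names. The sample document contains no hyperlinks, so the sketch does not show the exact format in which a target is appended, because this page does not document that format.

```java
import java.io.IOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class HyperlinkTextDemo {

    /** Extracts text, optionally including hyperlink targets as well as their labels. */
    static String extract(XWPFDocument doc, boolean fetchHyperlinks) {
        // Left unclosed on purpose: with the default setCloseFilesystem(true),
        // close() would also release the caller's document package.
        XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
        extractor.setFetchHyperlinks(fetchHyperlinks);
        return extractor.getText();
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = new XWPFDocument()) {
            doc.createParagraph().createRun().setText("Label text");
            System.out.print(extract(doc, false)); // labels only (the default)
            System.out.print(extract(doc, true));  // labels plus any hyperlink targets
        }
    }
}
```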
-
setConcatenatePhoneticRuns
public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
Sets whether phonetic runs should be concatenated during extraction. Default is true.
Parameters:
concatenatePhoneticRuns - whether phonetic runs should be concatenated
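The sketch below shows how to switch the default off, assuming `poi-ooxml` 5.x. `PhoneticRunsDemo` and `extractWithoutPhoneticConcatenation` are hypothetical names. The assumption is that the setting matters only for runs that carry phonetic guide (ruby) text, as used in East Asian scripts. The sample has no such runs, so its output is the same either way; the sketch only shows where the call goes.

```java
import java.io.IOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class PhoneticRunsDemo {

    /** Extracts text with phonetic-run concatenation switched off (the default is on). */
    static String extractWithoutPhoneticConcatenation(XWPFDocument doc) {
        // Left unclosed on purpose: with the default setCloseFilesystem(true),
        // close() would also release the caller's document package.
        XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
        extractor.setConcatenatePhoneticRuns(false);
        return extractor.getText();
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = new XWPFDocument()) {
            doc.createParagraph().createRun().setText("No phonetic guides here");
            System.out.print(extractWithoutPhoneticConcatenation(doc));
        }
    }
}
```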
-
getText
public java.lang.String getText()
Description copied from interface: POITextExtractor
Retrieves all the text from the document. How cells, paragraphs, etc. are separated in the text is implementation-specific; see the javadocs for a specific project for details.
Specified by:
getText in interface POITextExtractor
Returns:
All the text from the document
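Because the separators are implementation-specific, the sketch below checks only what `getText()` includes and in what order, not its exact layout. It assumes `poi-ooxml` 5.x. `GetTextDemo` and `buildAndExtract` are hypothetical names. The in-memory document holds a paragraph followed by a one-row, two-column table.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;

public class GetTextDemo {

    /** Builds a paragraph followed by a 1x2 table, then returns getText() for it. */
    static String buildAndExtract() {
        try (XWPFDocument doc = new XWPFDocument()) {
            doc.createParagraph().createRun().setText("Intro paragraph");
            XWPFTable table = doc.createTable(1, 2);
            table.getRow(0).getCell(0).setText("Cell A");
            table.getRow(0).getCell(1).setText("Cell B");
            // Extractor left unclosed; the document's own close() releases the package
            return new XWPFWordExtractor(doc).getText();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(buildAndExtract());
    }
}
```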
-
appendBodyElementText
public void appendBodyElementText(java.lang.StringBuilder text, IBodyElement e)
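This page gives no description for `appendBodyElementText`. The sketch below assumes it appends the text of one top-level body element (paragraph, table, etc.) to the given builder. On that assumption, it collects text per element instead of one combined string. It requires `poi-ooxml` 5.x, and `BodyElementDemo` and `textPerBodyElement` are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class BodyElementDemo {

    /** Returns the extracted text of each top-level body element, in document order. */
    static List<String> textPerBodyElement(XWPFDocument doc) {
        XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
        List<String> parts = new ArrayList<>();
        for (IBodyElement element : doc.getBodyElements()) {
            StringBuilder text = new StringBuilder();
            extractor.appendBodyElementText(text, element);
            parts.add(text.toString());
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = new XWPFDocument()) {
            doc.createParagraph().createRun().setText("First");
            doc.createTable(1, 1).getRow(0).getCell(0).setText("In a table");
            for (String part : textPerBodyElement(doc)) {
                System.out.println("[" + part + "]");
            }
        }
    }
}
```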
-
appendParagraphText
public void appendParagraphText(java.lang.StringBuilder text, XWPFParagraph paragraph)
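The sketch below calls `appendParagraphText` on body-level paragraphs only, which skips table content. It assumes `poi-ooxml` 5.x and that `XWPFDocument.getParagraphs()` returns top-level paragraphs but not those inside table cells. `ParagraphTextDemo` and `paragraphsOnly` are hypothetical names, and the newline between paragraphs is a separator chosen for this sketch.

```java
import java.io.IOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class ParagraphTextDemo {

    /** Extracts only top-level paragraphs; tables in the body are skipped. */
    static String paragraphsOnly(XWPFDocument doc) {
        XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
        StringBuilder text = new StringBuilder();
        // getParagraphs() lists body-level paragraphs, not those inside table cells
        for (XWPFParagraph paragraph : doc.getParagraphs()) {
            extractor.appendParagraphText(text, paragraph);
            text.append('\n'); // newline separator chosen for this sketch
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = new XWPFDocument()) {
            doc.createParagraph().createRun().setText("Body text");
            doc.createTable(1, 1).getRow(0).getCell(0).setText("Table text");
            System.out.print(paragraphsOnly(doc));
        }
    }
}
```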
-
getDocument
public XWPFDocument getDocument()
Description copied from interface: POIXMLTextExtractor
Returns the opened document.
Specified by:
getDocument in interface POITextExtractor
Specified by:
getDocument in interface POIXMLTextExtractor
Returns:
the opened document
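`getDocument()` is most useful when the extractor was built from an `OPCPackage`, so the caller never held the `XWPFDocument` directly. The sketch below uses it to list paragraph texts. It assumes `poi-ooxml` 5.x, and `GetDocumentDemo` and `paragraphTexts` are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class GetDocumentDemo {

    /** Uses getDocument() to reach the parsed document behind an extractor. */
    static List<String> paragraphTexts(XWPFWordExtractor extractor) {
        List<String> texts = new ArrayList<>();
        for (XWPFParagraph paragraph : extractor.getDocument().getParagraphs()) {
            texts.add(paragraph.getText());
        }
        return texts;
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = new XWPFDocument()) {
            doc.createParagraph().createRun().setText("Alpha");
            doc.createParagraph().createRun().setText("Beta");
            System.out.println(paragraphTexts(new XWPFWordExtractor(doc)));
        }
    }
}
```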
-
setCloseFilesystem
public void setCloseFilesystem(boolean doCloseFilesystem)
Specified by:
setCloseFilesystem in interface POITextExtractor
Parameters:
doCloseFilesystem - true (default) if the underlying resources/filesystem should be closed on POITextExtractor.close()
-
isCloseFilesystem
public boolean isCloseFilesystem()
Specified by:
isCloseFilesystem in interface POITextExtractor
Returns:
true if the resources/filesystem should be closed on POITextExtractor.close()
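When the caller owns the document and wants to keep using it, the extractor can still be closed through try-with-resources if `setCloseFilesystem(false)` is called first, as sketched below. This assumes `poi-ooxml` 5.x. `KeepOpenDemo` and `extractKeepingDocumentOpen` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class KeepOpenDemo {

    /** Extracts text and closes the extractor, but leaves the caller's document open. */
    static String extractKeepingDocumentOpen(XWPFDocument doc) {
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            extractor.setCloseFilesystem(false); // close() now leaves the package alone
            return extractor.getText();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = new XWPFDocument()) {
            doc.createParagraph().createRun().setText("First pass");
            System.out.print(extractKeepingDocumentOpen(doc));
            // The document is still usable after the extractor was closed
            doc.createParagraph().createRun().setText("Second pass");
            System.out.print(extractKeepingDocumentOpen(doc));
        }
    }
}
```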
-
getFilesystem
public XWPFDocument getFilesystem()
Specified by:
getFilesystem in interface POITextExtractor
Returns:
The underlying resources/filesystem