Class XWPFWordExtractor

java.lang.Object
org.apache.poi.xwpf.extractor.XWPFWordExtractor
All Implemented Interfaces:
Closeable, AutoCloseable, POITextExtractor, POIXMLTextExtractor

public class XWPFWordExtractor extends Object implements POIXMLTextExtractor
Helper class to extract text from an OOXML Word (.docx) file.
  • Method Details

    • setFetchHyperlinks

      public void setFetchHyperlinks(boolean fetch)
      Sets whether hyperlink targets are fetched along with the text content. By default, only the hyperlink label is output, not the link target.
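As a rough illustration of this flag, the sketch below builds a small document in memory and extracts it twice, once with each setting. The class name `HyperlinkExtractionSketch`, the helper `extract`, and the sample URL are hypothetical. It also assumes a recent Apache POI release in which `XWPFParagraph.createHyperlinkRun(String)` is available.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFHyperlinkRun;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

// Hypothetical demo class; not part of Apache POI.
final class HyperlinkExtractionSketch {

    /** Builds a one-paragraph in-memory document and extracts its text. */
    static String extract(boolean fetchHyperlinks) {
        XWPFDocument doc = new XWPFDocument();
        XWPFParagraph paragraph = doc.createParagraph();
        paragraph.createRun().setText("See the ");

        // Assumption: createHyperlinkRun(String) exists in the POI version in use.
        XWPFHyperlinkRun link = paragraph.createHyperlinkRun("https://example.org/");
        link.setText("project site");

        // Closing the extractor also releases the document by default.
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            extractor.setFetchHyperlinks(fetchHyperlinks);
            return extractor.getText();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("labels only: " + extract(false));
        System.out.println("with links:  " + extract(true));
    }
}
```

With the flag left at `false`, only the label text should appear in the result. With `true`, the link target is expected to follow the label, though the exact formatting may vary between POI versions.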
    • setConcatenatePhoneticRuns

      public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
      Sets whether phonetic runs are concatenated into the extracted text. The default is true.
      Parameters:
      concatenatePhoneticRuns - If phonetic runs should be concatenated
    • getText

      public String getText()
      Description copied from interface: POITextExtractor
      Retrieves all the text from the document. How cells, paragraphs, etc. are separated in the text is implementation-specific; see the javadocs for a specific project for details.
      Specified by:
      getText in interface POITextExtractor
      Returns:
      All the text from the document
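A minimal usage sketch for `getText()`, assuming the extractor is created from an already opened `XWPFDocument`. The class name `ExtractTextSketch` and its helpers are hypothetical. The demo builds its document in memory so it stays self-contained; in practice the document would more likely be opened from a stream, for example with `new XWPFDocument(inputStream)`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

// Hypothetical demo class; not part of Apache POI.
final class ExtractTextSketch {

    /** Extracts all text; closing the extractor also releases the document by default. */
    static String extractText(XWPFDocument doc) throws IOException {
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            return extractor.getText();
        }
    }

    /** Builds a two-paragraph document in memory and returns its extracted text. */
    static String demo() {
        XWPFDocument doc = new XWPFDocument();
        doc.createParagraph().createRun().setText("First paragraph");
        doc.createParagraph().createRun().setText("Second paragraph");
        try {
            return extractText(doc);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

The extractor is used in a try-with-resources block because it implements `Closeable`. The separator placed between paragraphs is left to the implementation, so the sketch does not rely on it.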
    • appendBodyElementText

      public void appendBodyElementText(StringBuilder text, IBodyElement e)
    • appendParagraphText

      public void appendParagraphText(StringBuilder text, XWPFParagraph paragraph)
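The page gives no description for `appendParagraphText`. Judging by its signature, it appends the text of one paragraph to a caller-supplied `StringBuilder`. The sketch below relies on that reading to join paragraphs with a custom separator. The class `ParagraphTextSketch` and the helper `joinParagraphs` are hypothetical, and `getParagraphs()` is assumed to return the body paragraphs outside of tables.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

// Hypothetical demo class; not part of Apache POI.
final class ParagraphTextSketch {

    /** Appends each body paragraph to one builder, with a separator between them. */
    static String joinParagraphs(XWPFWordExtractor extractor, String separator) {
        StringBuilder text = new StringBuilder();
        for (XWPFParagraph paragraph : extractor.getDocument().getParagraphs()) {
            if (text.length() > 0) {
                text.append(separator);
            }
            extractor.appendParagraphText(text, paragraph);
        }
        return text.toString();
    }

    /** Builds a two-paragraph document in memory and joins its paragraphs. */
    static String demo() {
        XWPFDocument doc = new XWPFDocument();
        doc.createParagraph().createRun().setText("alpha");
        doc.createParagraph().createRun().setText("beta");
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            return joinParagraphs(extractor, " | ");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Passing the extractor in, rather than the document, leaves the lifecycle with the caller and keeps any options set on the extractor in effect for each paragraph.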
    • getDocument

      public XWPFDocument getDocument()
      Description copied from interface: POIXMLTextExtractor
      Returns the opened document.
      Specified by:
      getDocument in interface POITextExtractor
      Specified by:
      getDocument in interface POIXMLTextExtractor
      Returns:
      the opened document
    • setCloseFilesystem

      public void setCloseFilesystem(boolean doCloseFilesystem)
      Specified by:
      setCloseFilesystem in interface POITextExtractor
      Parameters:
      doCloseFilesystem - true (default), if underlying resources/filesystem should be closed on POITextExtractor.close()
    • isCloseFilesystem

      public boolean isCloseFilesystem()
      Specified by:
      isCloseFilesystem in interface POITextExtractor
      Returns:
      true, if resources/filesystem should be closed on POITextExtractor.close()
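One situation where `setCloseFilesystem(false)` may be useful is when the caller wants to keep working with the document after extraction. The following is a sketch under that assumption, with a hypothetical class `KeepDocumentOpenSketch`. The document is owned by the caller, and closing the extractor is expected to leave it open.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

// Hypothetical demo class; not part of Apache POI.
final class KeepDocumentOpenSketch {

    /** Extracts text without letting the extractor close the caller's document. */
    static String extractKeepingDocumentOpen(XWPFDocument doc) throws IOException {
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            extractor.setCloseFilesystem(false);
            return extractor.getText();
        }
    }

    /** Returns the paragraph count after editing the document post-extraction. */
    static int demo() {
        // The caller owns the document and closes it here.
        try (XWPFDocument doc = new XWPFDocument()) {
            doc.createParagraph().createRun().setText("Still open");
            extractKeepingDocumentOpen(doc);

            // The document is still usable after the extractor was closed.
            doc.createParagraph().createRun().setText("Added after extraction");
            return doc.getParagraphs().size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("paragraphs: " + demo());
    }
}
```

With the default setting of `true`, the extractor would release the underlying resources itself, so a separate close of the document would not be needed.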
    • getFilesystem

      public XWPFDocument getFilesystem()
      Specified by:
      getFilesystem in interface POITextExtractor
      Returns:
      The underlying resources/filesystem