Class XHTMLImporterImpl

java.lang.Object
org.docx4j.convert.in.xhtml.XHTMLImporterImpl
All Implemented Interfaces:
XHTMLImporter

public class XHTMLImporterImpl extends Object implements XHTMLImporter
Converts XHTML + CSS to WordML content. Can convert an entire document, or a fragment consisting of one or more block-level objects.

Your XHTML must be well-formed XML.

For usage examples, see org.docx4j.samples/XHTMLImportFragment and XHTMLImportDocument.

For best results, be sure to include src/main/resources on your classpath.

Includes support for:
- paragraph and run formatting
- tables
- images
- lists (ordered, unordered)

People complain that flying-saucer is slow (due to DTD-related network lookups). See http://stackoverflow.com/questions/5431646/is-there-any-way-improve-the-performance-of-flyingsaucer

Looking at FSEntityResolver, the problem is that there is no longer a resources/schema dir which can be put on the classpath. Once this problem is fixed, things work better.

TODO:
- insert, delete
- space-before, space-after unrecognized CSS property
Since:
2.8
Author:
jharrop
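
The class description requires well-formed XML input. A minimal sketch of a pre-flight check, using only the JDK's built-in parser, is shown here. The class name WellFormedCheck and the method isWellFormed are hypothetical helpers for illustration and are not part of docx4j.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of docx4j): checks that a string parses as XML.
class WellFormedCheck {

    static boolean isWellFormed(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xhtml)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // Any parse failure means the input is not well-formed XML.
            return false;
        }
    }

    public static void main(String[] args) {
        // Self-closed <br/> is well formed; a bare <br> is not.
        System.out.println(isWellFormed("<div><p>Hello<br/></p></div>"));
        System.out.println(isWellFormed("<div><p>Hello<br></p></div>"));
    }
}
```

A check like this could run before calling one of the convert methods, so that malformed input is rejected with a clear message rather than failing inside the importer.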
  • Field Details

    • log

      public static org.slf4j.Logger log
    • wordMLPackage

      protected org.docx4j.openpackaging.packages.WordprocessingMLPackage wordMLPackage
  • Constructor Details

    • XHTMLImporterImpl

      public XHTMLImporterImpl(org.docx4j.openpackaging.packages.WordprocessingMLPackage wordMLPackage)
  • Method Details

    • getMathXSLT

      public static Templates getMathXSLT() throws IOException, TransformerConfigurationException
      Throws:
      IOException
      TransformerConfigurationException
    • setHyperlinkStyle

      public void setHyperlinkStyle(String hyperlinkStyleID)
      Configures how the importer styles hyperlinks. If hyperlinkStyleID is set to null, hyperlinks are styled using just the CSS. This is the default behavior. If hyperlinkStyleID is set to "someWordHyperlinkStyleName", that style is used. The default Word hyperlink style name is "Hyperlink". It is currently your responsibility to define that style in your styles definition part.
      Specified by:
      setHyperlinkStyle in interface XHTMLImporter
      Parameters:
      hyperlinkStyleID - The style to use for hyperlinks (eg Hyperlink)
    • setXHTMLImageHandler

      public void setXHTMLImageHandler(XHTMLImageHandler xHTMLImageHandler)
      Sets your own implementation of the XHTMLImageHandler interface, if you have one which you'd like to use.
    • setMaxWidth

      public void setMaxWidth(int maxWidth, String tableStyle)
      Description copied from interface: XHTMLImporter
      Set the maximum width available (in twips); useful for scaling bare images if they are to go in a table cell.
      Also set table style if images are really to go in a table cell (needed to remove table style margins from final width).
      Specified by:
      setMaxWidth in interface XHTMLImporter
      Parameters:
      tableStyle - can be null
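
Since maxWidth is expressed in twips, a small conversion helper can be handy. The sketch below assumes the standard definition of a twip (1/20 of a point, so 1440 per inch). The class Twips and its methods are hypothetical and not part of docx4j.

```java
// Hypothetical helper (not part of docx4j): unit conversions for twips.
class Twips {

    static final int PER_INCH = 1440; // 1 inch = 72 pt, 1 pt = 20 twips
    static final int PER_POINT = 20;

    static int fromInches(double inches) {
        return (int) Math.round(inches * PER_INCH);
    }

    static int fromPoints(double points) {
        return (int) Math.round(points * PER_POINT);
    }

    // Width left for content once the page margins are removed, all in twips.
    static int availableWidth(int pageWidth, int leftMargin, int rightMargin) {
        return pageWidth - leftMargin - rightMargin;
    }

    public static void main(String[] args) {
        // A4 page (11906 twips wide) with one-inch margins on each side.
        System.out.println(availableWidth(11906, fromInches(1), fromInches(1)));
    }
}
```

The resulting value is the kind of number you might pass as maxWidth, with tableStyle left null when the images are not going into a table cell.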
    • setDivHandler

      public void setDivHandler(DivHandler divHandler)
    • getListHelper

      protected ListHelper getListHelper()
    • getTableHelper

      protected TableHelper getTableHelper()
    • getRenderer

      public DocxRenderer getRenderer()
      Returns:
      the renderer
    • setRenderer

      public void setRenderer(DocxRenderer renderer)
      Parameters:
      renderer - the renderer to set
    • addFontMapping

      public static void addFontMapping(String cssFontFamily, org.docx4j.wml.RFonts rFonts)
      Map a font family, for example "Century Gothic" in:
      font-family:"Century Gothic", Helvetica, Arial, sans-serif;
      to a w:rFonts object, for example:
      <w:rFonts w:ascii="Arial Black" w:hAnsi="Arial Black"/>
      Assuming style font-family:"Century Gothic", Helvetica, Arial, sans-serif; the first font family for which there is a mapping is the one which will be used.
      xhtml-renderer's CSSName defaults font-family: serif
      It is your responsibility to ensure a suitable font is available on the target system (or embedded in the docx package). If we (eventually) support CSS @font-face, docx4j could do that for you (at least for font formats we can convert to something embeddable).
      You should set these up once, for all your subsequent imports, since some stuff is cached and currently won't get updated if you add fonts later.
      Since:
      3.0
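
The "first font family for which there is a mapping" rule can be sketched as follows. This illustrates the selection logic only and is not docx4j's implementation: the class FontFamilyResolver is hypothetical, and the case-insensitive matching and quote stripping are assumptions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch (not docx4j code): pick the first mapped font family.
class FontFamilyResolver {

    private final Map<String, String> mappings = new HashMap<>();

    void addMapping(String cssFontFamily, String wordFont) {
        mappings.put(normalise(cssFontFamily), wordFont);
    }

    // fontFamilyValue is the CSS value, e.g. "Century Gothic", Helvetica, Arial
    Optional<String> resolve(String fontFamilyValue) {
        for (String candidate : fontFamilyValue.split(",")) {
            String mapped = mappings.get(normalise(candidate));
            if (mapped != null) {
                return Optional.of(mapped); // first family with a mapping wins
            }
        }
        return Optional.empty();
    }

    // Assumption: names are compared trimmed, unquoted and case-insensitively.
    private static String normalise(String name) {
        return name.trim().replaceAll("^[\"']|[\"']$", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        FontFamilyResolver resolver = new FontFamilyResolver();
        resolver.addMapping("Arial", "Arial");
        resolver.addMapping("Century Gothic", "Arial Black");
        System.out.println(
            resolver.resolve("\"Century Gothic\", Helvetica, Arial, sans-serif"));
    }
}
```

With both mappings registered, "Century Gothic" is chosen because it appears first in the font-family list. If only the Arial mapping existed, the lookup would skip ahead to Arial.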
    • addFontMapping

      public static void addFontMapping(String cssFontFamily, String font)
    • setRunFormatting

      public void setRunFormatting(FormattingOption runFormatting)
      Specified by:
      setRunFormatting in interface XHTMLImporter
      Parameters:
      runFormatting - the runFormatting to set
    • setParagraphFormatting

      public void setParagraphFormatting(FormattingOption paragraphFormatting)
      Specified by:
      setParagraphFormatting in interface XHTMLImporter
      Parameters:
      paragraphFormatting - the paragraphFormatting to set
    • setTableFormatting

      public void setTableFormatting(FormattingOption tableFormatting)
      Specified by:
      setTableFormatting in interface XHTMLImporter
      Parameters:
      tableFormatting - the tableFormatting to set
    • getTableFormatting

      protected FormattingOption getTableFormatting()
    • setCssWhiteList

      @Deprecated public static void setCssWhiteList(Set<String> cssWhiteList)
      Deprecated.
      If the CSS white list is non-null, a CSS property will only be honoured if it is on the list. Useful where suitable default values aren't being provided via
      Parameters:
      cssWhiteList - the cssWhiteList to set
    • getBookmarkIdLast

      public AtomicInteger getBookmarkIdLast()
      Specified by:
      getBookmarkIdLast in interface XHTMLImporter
    • setBookmarkIdNext

      public void setBookmarkIdNext(AtomicInteger val)
    • convert

      public List<Object> convert(File file, String baseUrl) throws org.docx4j.openpackaging.exceptions.Docx4JException
      Convert the well formed XHTML contained in file to a list of WML objects.
      Specified by:
      convert in interface XHTMLImporter
      Parameters:
      file - the file containing the well-formed XHTML
      baseUrl - the base URL used to resolve relative references in the XHTML
      Returns:
      a list of WML objects
      Throws:
      org.docx4j.openpackaging.exceptions.Docx4JException
    • convert

      public List<Object> convert(InputSource is, String baseUrl) throws org.docx4j.openpackaging.exceptions.Docx4JException
      Convert the well formed XHTML from the specified SAX InputSource
      Specified by:
      convert in interface XHTMLImporter
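      Parameters:
      is - the SAX InputSource supplying the well-formed XHTML
      baseUrl - the base URL used to resolve relative references in the XHTML
      Returns:
      a list of WML objects
      Throws:
      org.docx4j.openpackaging.exceptions.Docx4JException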
    • convertMHT

      public List<Object> convertMHT(InputStream is, String baseUrl) throws org.docx4j.openpackaging.exceptions.Docx4JException
      Throws:
      org.docx4j.openpackaging.exceptions.Docx4JException
    • convert

      public List<Object> convert(InputStream is, String baseUrl) throws org.docx4j.openpackaging.exceptions.Docx4JException
      Specified by:
      convert in interface XHTMLImporter
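      Parameters:
      is - the input stream supplying the well-formed XHTML
      baseUrl - the base URL used to resolve relative references in the XHTML
      Returns:
      a list of WML objects
      Throws:
      org.docx4j.openpackaging.exceptions.Docx4JException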
    • convert

      public List<Object> convert(Node node, String baseUrl) throws org.docx4j.openpackaging.exceptions.Docx4JException
      Specified by:
      convert in interface XHTMLImporter
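      Parameters:
      node - the DOM node containing the well-formed XHTML
      baseUrl - the base URL used to resolve relative references in the XHTML
      Returns:
      a list of WML objects
      Throws:
      org.docx4j.openpackaging.exceptions.Docx4JException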
    • convert

      public List<Object> convert(Reader reader, String baseUrl) throws org.docx4j.openpackaging.exceptions.Docx4JException
      Specified by:
      convert in interface XHTMLImporter
      Parameters:
      reader - the reader supplying the well-formed XHTML
      baseUrl - the base URL used to resolve relative references in the XHTML
      Returns:
      a list of WML objects
      Throws:
      org.docx4j.openpackaging.exceptions.Docx4JException
    • convert

      public List<Object> convert(URL url) throws org.docx4j.openpackaging.exceptions.Docx4JException
      Convert the well formed XHTML found at the specified URI to a list of WML objects.
      Specified by:
      convert in interface XHTMLImporter
      Parameters:
      url - the location of the well-formed XHTML
      Returns:
      a list of WML objects
      Throws:
      org.docx4j.openpackaging.exceptions.Docx4JException
    • convert

      public List<Object> convert(String content, String baseUrl) throws org.docx4j.openpackaging.exceptions.Docx4JException
      Convert the well formed XHTML contained in the string to a list of WML objects.
      Specified by:
      convert in interface XHTMLImporter
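      Parameters:
      content - the well-formed XHTML, as a string
      baseUrl - the base URL used to resolve relative references in the XHTML
      Returns:
      a list of WML objects
      Throws:
      org.docx4j.openpackaging.exceptions.Docx4JException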
    • getCascadedProperties

      public Map<String,com.openhtmltopdf.css.parser.PropertyValue> getCascadedProperties(com.openhtmltopdf.css.style.CalculatedStyle cs)
    • getLengthPrimitiveType

      public static short getLengthPrimitiveType(com.openhtmltopdf.css.style.FSDerivedValue val)
    • getContentContextStack

      protected LinkedList<org.docx4j.wml.ContentAccessor> getContentContextStack()
    • getSequenceCounters

      public Map<String,Integer> getSequenceCounters()
      Get the current numbers of SEQ fields, used in image captions. Typically you'd use this if you are importing multiple times into a single docx (as for example, OpenDoPE does).
      Specified by:
      getSequenceCounters in interface XHTMLImporter
      Returns:
      the current SEQ field numbers, keyed by sequence name
    • setSequenceCounters

      public void setSequenceCounters(Map<String,Integer> sequenceCounters)
      Set the last used numbers of SEQ fields, used in image captions. Key is sequence name. The default is "Figure", but you can also use others (matching value of @sequence).
      Specified by:
      setSequenceCounters in interface XHTMLImporter
      Parameters:
      sequenceCounters - the last used number for each sequence, keyed by sequence name
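
The getter and setter above exchange a plain Map from sequence name to last used number. A minimal sketch of how such a map carries numbering from one import to the next follows. The class SeqCounters and the method next are hypothetical and only illustrate the "last used number" convention, not how docx4j updates the map internally.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not docx4j code): a "last used number" counter map.
class SeqCounters {

    // Returns the next number for the sequence and records it as last used.
    static int next(Map<String, Integer> counters, String sequenceName) {
        return counters.merge(sequenceName, 1, Integer::sum);
    }

    public static void main(String[] args) {
        // Counters as they might be after a first import that used two figures.
        Map<String, Integer> counters = new HashMap<>();
        counters.put("Figure", 2);

        // A second import continues from the stored value.
        System.out.println(next(counters, "Figure"));
        // A sequence that has not been used yet starts at 1.
        System.out.println(next(counters, "Table"));
    }
}
```

In practice you would read the map from one importer with getSequenceCounters and hand it to the next with setSequenceCounters, so that captions keep counting across imports into the same docx.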
    • getPPr

      protected org.docx4j.wml.PPr getPPr(com.openhtmltopdf.render.BlockBox blockBox, Map<String,com.openhtmltopdf.css.parser.PropertyValue> cssMap)
    • isBidi

      protected boolean isBidi(String pText)
    • populatePPr

      protected void populatePPr(org.docx4j.wml.PPr pPr, com.openhtmltopdf.layout.Styleable blockBox, Map<String,com.openhtmltopdf.css.parser.PropertyValue> cssMap)
    • getStyleByIdOrName

      protected org.docx4j.wml.Style getStyleByIdOrName(String... parameters)
      If one parameter is passed, search for the style by id (1st parameter); if no style with that id is found, search by name (also 1st parameter).
      If two parameters are passed, search by id (1st parameter); if no style with that id is found, search by name (2nd parameter).
      Any further parameters are ignored.
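
The lookup order described above can be sketched like this. It is an illustration over a hypothetical Style type holding just an id and a name, not the protected docx4j method itself, which works against the package's style definitions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not docx4j code): id-first, then name, style lookup.
class StyleLookup {

    static class Style {
        final String id;
        final String name;

        Style(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    static Style getStyleByIdOrName(List<Style> styles, String... parameters) {
        if (parameters == null || parameters.length == 0) {
            return null;
        }
        String id = parameters[0];
        // With one parameter the same value doubles as the name to search for.
        String name = parameters.length > 1 ? parameters[1] : parameters[0];

        for (Style style : styles) {
            if (style.id.equals(id)) {
                return style; // a match on id takes priority
            }
        }
        for (Style style : styles) {
            if (style.name.equals(name)) {
                return style; // fall back to a match on name
            }
        }
        return null; // parameters beyond the second are never consulted
    }

    public static void main(String[] args) {
        List<Style> styles = Arrays.asList(
            new Style("Heading1", "heading 1"),
            new Style("Hyperlink", "Hyperlink"));
        System.out.println(getStyleByIdOrName(styles, "heading 1").id);
    }
}
```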
    • getLocalIndentation

      protected int getLocalIndentation(com.openhtmltopdf.layout.Styleable styleable)
      Inside a list item, get the contribution of any div.
    • getAncestorIndentation

      protected int getAncestorIndentation(com.openhtmltopdf.layout.Styleable styleable)
    • setBookmarkNamePrefix

      public void setBookmarkNamePrefix(String bookmarkNamePrefix)
      The prefix (if any) to be added to bookmark names generated during this run. Useful for preventing name collisions when importing multiple fragments into a single docx.
      Parameters:
      bookmarkNamePrefix - the prefix to add to generated bookmark names