Class Xml10FilterReader

  • All Implemented Interfaces:
    Closeable, AutoCloseable, Readable

    public class Xml10FilterReader
    extends FilterReader
    A FilterReader that skips characters that are invalid in XML version 1.0. According to http://www.w3.org/TR/xml, the valid Unicode characters for XML version 1.0 are #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. In other words, any Unicode character, excluding the surrogate blocks, FFFE, and FFFF.

    More details on the blog
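
    The ranges listed in the class description can be written as a single predicate over Unicode code points. The following is a minimal sketch, not code taken from this class: the names Xml10Chars and isValid are hypothetical and used only for illustration.

```java
/** Hypothetical helper for illustration; not part of Xml10FilterReader. */
final class Xml10Chars {

    private Xml10Chars() {
    }

    /**
     * Returns true if the code point falls in one of the ranges
     * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
     */
    static boolean isValid(int codePoint) {
        return codePoint == 0x9
                || codePoint == 0xA
                || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }
}
```

    Note that the predicate works on code points. A Java char is a UTF-16 code unit, so a character in [#x10000-#x10FFFF] reaches a Reader as a surrogate pair rather than as a single char.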

    • Constructor Detail

      • Xml10FilterReader

        public Xml10FilterReader​(Reader in)
        Creates a filter reader that skips invalid XML characters.
        Parameters:
        in - original reader
    • Method Detail

      • read

        public int read​(char[] cbuf,
                        int off,
                        int len)
                 throws IOException
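        In Reader, every other read overload delegates to this method, so overriding this one is sufficient. Note that FilterReader itself overrides the single-character read() to call the underlying reader directly, so that overload bypasses this method unless it is also overridden.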

        To skip invalid characters, this method shifts the valid chars to the left within the buffer and returns a count that is lower than the one returned by the underlying read call. As a result, the buffer positions after the last valid character hold unused chars.

        Overrides:
        read in class FilterReader
        Returns:
        The number of valid characters read, or -1 if the end of the underlying reader was reached.
        Throws:
        IOException
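
        The shift-left behaviour described for read can be sketched as follows. This is an illustration under stated assumptions, not the actual source of Xml10FilterReader: the class SkippingReaderSketch and its helpers isAllowed and filter are hypothetical. The sketch checks each UTF-16 code unit on its own and lets surrogate code units through so that supplementary characters survive, and it reads again when a whole chunk is invalid instead of returning 0. Both choices are assumptions of the sketch.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Illustrative sketch only; not the actual Xml10FilterReader source. */
class SkippingReaderSketch extends FilterReader {

    SkippingReaderSketch(Reader in) {
        super(in);
    }

    // Assumption: surrogate code units pass through unchecked, so lone
    // surrogates are not detected by this simplified test.
    static boolean isAllowed(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || Character.isSurrogate(c)
                || (c >= 0xE000 && c <= 0xFFFD);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int read = super.read(cbuf, off, len);
            if (read == -1) {
                return -1; // end of the underlying reader
            }
            int pos = off; // next write position for a valid char
            for (int i = off; i < off + read; i++) {
                if (isAllowed(cbuf[i])) {
                    cbuf[pos++] = cbuf[i]; // shift the valid char to the left
                }
            }
            int valid = pos - off;
            if (valid > 0) {
                return valid; // chars after off + valid are unused
            }
            // The whole chunk was invalid: read again instead of returning 0.
        }
    }

    // FilterReader.read() calls in.read() directly, so route it through the filter.
    @Override
    public int read() throws IOException {
        char[] one = new char[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : one[0];
    }

    /** Hypothetical convenience method: filters a whole string through the reader. */
    static String filter(String input) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8];
        try (Reader reader = new SkippingReaderSketch(new StringReader(input))) {
            int n;
            while ((n = reader.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n); // use only the first n chars of the buffer
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

        Callers must use only the returned count, as filter does with sb.append(buf, 0, n), because the chars left after the last valid character are stale.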