Class LineReader

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    @LimitedPrivate("MapReduce")
    @Unstable
    public class LineReader
    extends Object
    implements Closeable
    A class that reads lines from an input stream. Depending on the constructor used, lines are terminated by either:
    • one of the following: '\n' (LF), '\r' (CR), or '\r\n' (CR+LF); or
    • a custom byte sequence delimiter.
    In both cases, EOF also terminates an otherwise unterminated line.
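To make the default termination rule concrete, here is a small stdlib-only model of it. It is a hypothetical sketch, not Hadoop's implementation: the class `DefaultTerminatorSketch` and its `splitLines` method are illustrative names, and the model works on an in-memory byte array rather than a stream. It shows that CR+LF counts as a single terminator, that a lone CR or a lone LF each end a line, and that EOF ends a final unterminated line.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, stdlib-only model of the default line-termination rule:
 * LF, CR, or CR+LF ends a line, and EOF ends a final unterminated line.
 * This is an illustration, not Hadoop's implementation.
 */
public class DefaultTerminatorSketch {

    /** Splits bytes into lines; the terminator itself is not included. */
    static List<String> splitLines(byte[] data) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < data.length) {
            byte b = data[i];
            if (b == '\n' || b == '\r') {
                lines.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                // Treat CR immediately followed by LF as one terminator.
                if (b == '\r' && i + 1 < data.length && data[i + 1] == '\n') {
                    i++;
                }
                start = i + 1;
            }
            i++;
        }
        // EOF terminates an otherwise unterminated line.
        if (start < data.length) {
            lines.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] input = "a\r\nb\rc\nd".getBytes(StandardCharsets.UTF_8);
        System.out.println(splitLines(input)); // [a, b, c, d]
    }
}
```

In this model, two consecutive terminators such as "\n\n" produce an empty line between them.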
    • Constructor Summary

      Constructors
      LineReader(InputStream in)
          Create a line reader that reads from the given stream using the default buffer size (64 KB).
      LineReader(InputStream in, byte[] recordDelimiterBytes)
          Create a line reader that reads from the given stream using the default buffer size and a custom delimiter given as a byte array.
      LineReader(InputStream in, int bufferSize)
          Create a line reader that reads from the given stream using the given buffer size.
      LineReader(InputStream in, int bufferSize, byte[] recordDelimiterBytes)
          Create a line reader that reads from the given stream using the given buffer size and a custom delimiter given as a byte array.
      LineReader(InputStream in, org.apache.hadoop.conf.Configuration conf)
          Create a line reader that reads from the given stream using the buffer size set by io.file.buffer.size in the given Configuration.
      LineReader(InputStream in, org.apache.hadoop.conf.Configuration conf, byte[] recordDelimiterBytes)
          Create a line reader that reads from the given stream using the buffer size set by io.file.buffer.size in the given Configuration and a custom delimiter given as a byte array.
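The buffer size chosen by these constructors (the 64 KB default, an explicit bufferSize, or io.file.buffer.size from the Configuration) controls how many bytes are fetched from the stream per read; with the default terminators it should not change where lines end. The sketch below is a hypothetical stdlib-only model, not Hadoop code: `BufferedSplitSketch.readAllLines` is an illustrative name. It reads through a fixed-size buffer and carries a "previous byte was CR" flag across refills, so a CR+LF pair split between two reads is still treated as one terminator.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not Hadoop code): split a stream on LF, CR, or CR+LF
 * while reading it through a buffer of the given size.
 */
public class BufferedSplitSketch {

    static List<String> readAllLines(InputStream in, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        List<String> lines = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        boolean prevWasCR = false; // survives across buffer refills
        try {
            int n;
            while ((n = in.read(buffer, 0, bufferSize)) != -1) {
                for (int i = 0; i < n; i++) {
                    byte b = buffer[i];
                    if (prevWasCR && b == '\n') {
                        // LF completing a CR+LF pair, possibly split across two reads.
                        prevWasCR = false;
                        continue;
                    }
                    prevWasCR = (b == '\r');
                    if (b == '\r' || b == '\n') {
                        lines.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
                        current.reset();
                    } else {
                        current.write(b);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // EOF terminates an otherwise unterminated line.
        if (current.size() > 0) {
            lines.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "first\r\nsecond\nthird".getBytes(StandardCharsets.UTF_8);
        // With size 6 the CR+LF pair straddles the first two reads.
        for (int size : new int[] {1, 6, 64 * 1024}) {
            // each iteration prints [first, second, third]
            System.out.println(size + " -> " + readAllLines(new ByteArrayInputStream(data), size));
        }
    }
}
```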
    • Constructor Detail

      • LineReader

        public LineReader(InputStream in)
        Create a line reader that reads from the given stream using the default buffer size (64 KB).
        Parameters:
        in - The input stream
        Throws:
        IOException
      • LineReader

        public LineReader(InputStream in,
                          int bufferSize)
        Create a line reader that reads from the given stream using the given buffer size.
        Parameters:
        in - The input stream
        bufferSize - Size of the read buffer
        Throws:
        IOException
      • LineReader

        public LineReader(InputStream in,
                          org.apache.hadoop.conf.Configuration conf)
                   throws IOException
        Create a line reader that reads from the given stream using the buffer size set by io.file.buffer.size in the given Configuration.
        Parameters:
        in - input stream
        conf - configuration
        Throws:
        IOException
      • LineReader

        public LineReader(InputStream in,
                          byte[] recordDelimiterBytes)
        Create a line reader that reads from the given stream using the default buffer size and a custom delimiter given as a byte array.
        Parameters:
        in - The input stream
        recordDelimiterBytes - The delimiter
      • LineReader

        public LineReader(InputStream in,
                          int bufferSize,
                          byte[] recordDelimiterBytes)
        Create a line reader that reads from the given stream using the given buffer size and a custom delimiter given as a byte array.
        Parameters:
        in - The input stream
        bufferSize - Size of the read buffer
        recordDelimiterBytes - The delimiter
        Throws:
        IOException
      • LineReader

        public LineReader(InputStream in,
                          org.apache.hadoop.conf.Configuration conf,
                          byte[] recordDelimiterBytes)
                   throws IOException
        Create a line reader that reads from the given stream using the buffer size set by io.file.buffer.size in the given Configuration and a custom delimiter given as a byte array.
        Parameters:
        in - input stream
        conf - configuration
        recordDelimiterBytes - The delimiter
        Throws:
        IOException
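With any of the recordDelimiterBytes constructors, only the given byte sequence ends a record: LF and CR become ordinary data bytes, and EOF still ends a final unterminated record. The following hypothetical model (not Hadoop code; `CustomDelimiterSketch.splitRecords` is an illustrative name) shows that rule on an in-memory array, scanning left to right for non-overlapping occurrences of the delimiter.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stdlib-only model (not Hadoop code) of custom-delimiter mode:
 * only the exact delimiter sequence ends a record, so LF and CR are data.
 */
public class CustomDelimiterSketch {

    static List<String> splitRecords(byte[] data, byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i <= data.length - delimiter.length) {
            if (matchesAt(data, i, delimiter)) {
                records.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                i += delimiter.length; // skip past the delimiter
                start = i;
            } else {
                i++;
            }
        }
        // EOF terminates an otherwise unterminated record.
        if (start < data.length) {
            records.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
        return records;
    }

    /** True if delimiter occurs in data starting at pos. */
    private static boolean matchesAt(byte[] data, int pos, byte[] delimiter) {
        for (int k = 0; k < delimiter.length; k++) {
            if (data[pos + k] != delimiter[k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] delimiter = "||".getBytes(StandardCharsets.UTF_8);
        byte[] data = "a\nb||c||d".getBytes(StandardCharsets.UTF_8);
        // The embedded LF stays inside the first record.
        System.out.println(splitRecords(data, delimiter).size()); // 3
    }
}
```

Against a real stream, the corresponding reader would be built with a constructor such as `new LineReader(in, "||".getBytes(StandardCharsets.UTF_8))`.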
    • Method Detail

      • readLine

        public int readLine(org.apache.hadoop.io.Text str,
                            int maxLineLength,
                            int maxBytesToConsume)
                     throws IOException
        Read one line from the InputStream into the given Text.
        Parameters:
        str - the object to store the given line (without newline)
        maxLineLength - the maximum number of bytes to store into str; the rest of the line is silently discarded.
        maxBytesToConsume - the maximum number of bytes to consume in this call. This is only a hint, because a line is allowed to cross this threshold; the call can potentially overshoot by as much as one buffer length.
        Returns:
        the number of bytes read including the (longest) newline found.
        Throws:
        IOException - if the underlying stream throws
      • readLine

        public int readLine(org.apache.hadoop.io.Text str,
                            int maxLineLength)
                     throws IOException
        Read one line from the InputStream into the given Text.
        Parameters:
        str - the object to store the given line
        maxLineLength - the maximum number of bytes to store into str.
        Returns:
        the number of bytes read including the newline
        Throws:
        IOException - if the underlying stream throws
      • readLine

        public int readLine(org.apache.hadoop.io.Text str)
                     throws IOException
        Read one line from the InputStream into the given Text.
        Parameters:
        str - the object to store the given line
        Returns:
        the number of bytes read including the newline
        Throws:
        IOException - if the underlying stream throws
      • getBufferPosn

        protected int getBufferPosn()
      • getBufferSize

        protected int getBufferSize()
      • unsetNeedAdditionalRecordAfterSplit

        protected void unsetNeedAdditionalRecordAfterSplit()
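The readLine contract documented above (store at most maxLineLength bytes, silently discard the rest of the line, and return the total number of bytes consumed including the terminator) can be modeled with the standard library alone. This is a hypothetical sketch, not Hadoop code: `ReadLineContractSketch` reads from a byte array, uses a ByteArrayOutputStream as a stand-in for org.apache.hadoop.io.Text, handles only the default terminators, and does not model the maxBytesToConsume hint. Like LineReader, it returns 0 once the input is exhausted, which is what the usual read loop tests for.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical stdlib-only model (not Hadoop code) of the readLine contract,
 * with ByteArrayOutputStream standing in for org.apache.hadoop.io.Text.
 */
public class ReadLineContractSketch {
    private final byte[] data;
    private int pos = 0;

    ReadLineContractSketch(byte[] data) {
        this.data = data;
    }

    /** Returns bytes consumed including the terminator; 0 means end of input. */
    int readLine(ByteArrayOutputStream str, int maxLineLength) {
        str.reset(); // the holder is cleared before each line
        int start = pos;
        while (pos < data.length) {
            byte b = data[pos++];
            if (b == '\n') {
                break;
            }
            if (b == '\r') {
                if (pos < data.length && data[pos] == '\n') {
                    pos++; // consume the LF of a CR+LF pair
                }
                break;
            }
            if (str.size() < maxLineLength) {
                str.write(b); // keep at most maxLineLength bytes...
            }                 // ...and silently drop the rest of the line
        }
        return pos - start;
    }

    /** In this model, the one-argument form never truncates. */
    int readLine(ByteArrayOutputStream str) {
        return readLine(str, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        ReadLineContractSketch reader = new ReadLineContractSketch(
                "abcdef\r\nxy".getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int consumed;
        // Keep reading until readLine reports that nothing was consumed.
        while ((consumed = reader.readLine(line, 3)) > 0) {
            // prints "8 abc", then "2 xy"
            System.out.println(consumed + " " + new String(line.toByteArray(), StandardCharsets.US_ASCII));
        }
    }
}
```

The first call returns 8 even though only "abc" is stored: the six line bytes plus the two-byte CR+LF terminator are all consumed.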