Class NestedColumnReader<BufferType>

java.lang.Object
io.trino.parquet.reader.AbstractColumnReader<BufferType>
io.trino.parquet.reader.NestedColumnReader<BufferType>
All Implemented Interfaces:
ColumnReader

public class NestedColumnReader<BufferType> extends AbstractColumnReader<BufferType>
This class works similarly to FlatColumnReader. The difference is that the resulting number of values might be (and usually is) different from the number of chunks. Therefore the output buffers are dynamically sized and some repetition/definition level logic is added. This reader is universal, i.e. it will properly read flat data, yet flat readers are preferred due to better performance.

Brief explanation of reading repetition and definition levels:
A repetition level equal to 0 means that we should start a new row, i.e. a new set of values. Any other value means that we continue adding to the current row.
The following data (containing 3 rows):
repetition levels: 0,1,1,0,0,1,[0] (last 0 implicit)
values: 1,2,3,4,5,6
will result in sets of (1,2,3), (4), (5,6).
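
The grouping rule above can be sketched in a few lines. This is only an illustration, not the reader's actual implementation: the class RepetitionLevelRows and the method groupIntoRows are hypothetical names, and the sketch assumes one repetition level per value, as in the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; it is not part of io.trino.parquet.reader.
class RepetitionLevelRows {
    // Starts a new row whenever the repetition level is 0 and appends to the
    // current row for any other level.
    static List<List<Integer>> groupIntoRows(int[] repetitionLevels, int[] values) {
        if (repetitionLevels.length != values.length) {
            throw new IllegalArgumentException("expected one repetition level per value");
        }
        List<List<Integer>> rows = new ArrayList<>();
        List<Integer> current = null;
        for (int i = 0; i < values.length; i++) {
            if (repetitionLevels[i] == 0) {
                current = new ArrayList<>();
                rows.add(current);
            }
            if (current == null) {
                throw new IllegalArgumentException("first repetition level must be 0");
            }
            current.add(values[i]);
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] repetitionLevels = {0, 1, 1, 0, 0, 1};
        int[] values = {1, 2, 3, 4, 5, 6};
        // prints [[1, 2, 3], [4], [5, 6]]
        System.out.println(groupIntoRows(repetitionLevels, values));
    }
}
```

The end of the input plays the role of the implicit trailing 0: the last row is closed simply because there is nothing left to read.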

The biggest complication here is that in order to know whether the n-th value is the last in a row we need to check the (n+1)-th repetition level. So if the page has n values we need to wait for the beginning of the next page to figure out whether the row is finished or contains additional values. The above example split into 3 pages would look like:
repetition levels: 0,1 | 1,0 | 0,1
values: 1,2 | 3,4 | 5,6
Reading the first page will only tell us that the first row starts with values (1,2), but we need to wait for another page to figure out that it contains another value (3). After reading the start of the next row (4) from page 2, we still need to read page 3 just to find out that its first repetition level is 0 and that row is already over.
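
One way to picture the page-boundary problem is a small stateful assembler that keeps the unfinished row between pages. This is a sketch under assumptions, not the code used by this class: PagedRowAssembler, readPage and finish are hypothetical names, and each page is assumed to carry one repetition level per value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; it is not part of io.trino.parquet.reader.
class PagedRowAssembler {
    // Row that has been started but whose end has not been observed yet.
    private List<Integer> pendingRow;

    // Consumes one page and returns only the rows that are known to be complete,
    // i.e. rows followed by a repetition level of 0.
    List<List<Integer>> readPage(int[] repetitionLevels, int[] values) {
        List<List<Integer>> completed = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            if (repetitionLevels[i] == 0) {
                if (pendingRow != null) {
                    completed.add(pendingRow);
                }
                pendingRow = new ArrayList<>();
            }
            if (pendingRow == null) {
                throw new IllegalStateException("data must start with repetition level 0");
            }
            pendingRow.add(values[i]);
        }
        return completed;
    }

    // Called when no pages are left: the implicit trailing 0 closes the last row.
    List<List<Integer>> finish() {
        List<List<Integer>> completed = new ArrayList<>();
        if (pendingRow != null) {
            completed.add(pendingRow);
            pendingRow = null;
        }
        return completed;
    }

    public static void main(String[] args) {
        PagedRowAssembler assembler = new PagedRowAssembler();
        System.out.println(assembler.readPage(new int[] {0, 1}, new int[] {1, 2})); // prints []
        System.out.println(assembler.readPage(new int[] {1, 0}, new int[] {3, 4})); // prints [[1, 2, 3]]
        System.out.println(assembler.readPage(new int[] {0, 1}, new int[] {5, 6})); // prints [[4]]
        System.out.println(assembler.finish());                                     // prints [[5, 6]]
    }
}
```

Note that the first page yields no rows at all, and row (4) only becomes available once the first level of the third page has been seen.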

Definition levels encode one of 3 options:
- value exists and is non-null (level = maxDef)
- value is null (level = maxDef - 1)
- there is no value (level < maxDef - 1)
For non-nullable (REQUIRED) fields the (level = maxDef - 1) condition means a non-existing value as well.
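
The three cases can be written as a single classification function. The sketch below is illustrative only: DefinitionLevels, Kind and classify are hypothetical names, and the required flag stands for a non-nullable (REQUIRED) field.

```java
// Hypothetical illustration class; it is not part of io.trino.parquet.reader.
class DefinitionLevels {
    enum Kind { VALUE, NULL, ABSENT }

    // Maps a definition level to one of the three options described above.
    static Kind classify(int definitionLevel, int maxDef, boolean required) {
        if (definitionLevel == maxDef) {
            return Kind.VALUE;
        }
        // A REQUIRED field cannot be null, so maxDef - 1 means "no value" for it.
        if (definitionLevel == maxDef - 1 && !required) {
            return Kind.NULL;
        }
        return Kind.ABSENT;
    }

    public static void main(String[] args) {
        System.out.println(classify(2, 2, false)); // prints VALUE
        System.out.println(classify(1, 2, false)); // prints NULL
        System.out.println(classify(0, 2, false)); // prints ABSENT
        System.out.println(classify(1, 2, true));  // prints ABSENT
    }
}
```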

Quick example (maxDef level is 2):
Read 3 rows out of:
repetition levels: 0,1,1,0,0,1,0,...
definition levels: 0,1,2,1,0,2,...
values: 1,2,3,4,5,6,...
Resulting buffer: n,3,n,6
that is later translated to (n,3),(n),(6), where n = null
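
The quick example can be reproduced with a short sketch that combines both kinds of levels. It is an illustration under assumptions rather than the reader's real logic: QuickExample and readRows are hypothetical names, the field is assumed to be nullable, and values are indexed by position to mirror the notation of the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; it is not part of io.trino.parquet.reader.
class QuickExample {
    // Reads up to rowsToRead rows. A definition level of maxDef yields the value,
    // maxDef - 1 yields null, and anything lower contributes nothing to the row.
    static List<List<Integer>> readRows(
            int[] repetitionLevels, int[] definitionLevels, int[] values, int maxDef, int rowsToRead) {
        List<List<Integer>> rows = new ArrayList<>();
        List<Integer> current = null;
        for (int i = 0; i < repetitionLevels.length; i++) {
            if (repetitionLevels[i] == 0) {
                if (rows.size() == rowsToRead) {
                    break; // this 0 starts a row we were not asked to read
                }
                current = new ArrayList<>();
                rows.add(current);
            }
            if (current == null) {
                throw new IllegalArgumentException("first repetition level must be 0");
            }
            if (definitionLevels[i] == maxDef) {
                current.add(values[i]);
            } else if (definitionLevels[i] == maxDef - 1) {
                current.add(null);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] repetitionLevels = {0, 1, 1, 0, 0, 1, 0};
        int[] definitionLevels = {0, 1, 2, 1, 0, 2};
        int[] values = {1, 2, 3, 4, 5, 6};
        // prints [[null, 3], [null], [6]]
        System.out.println(readRows(repetitionLevels, definitionLevels, values, 2, 3));
    }
}
```

The seventh repetition level is read only to learn that the third row has ended, which is why the sketch checks it before touching the definition levels.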