Class ParquetPageSourceFactory
java.lang.Object
io.trino.plugin.hive.parquet.ParquetPageSourceFactory
- All Implemented Interfaces:
HivePageSourceFactory
-
Nested Class Summary
Nested Classes -
Field Summary
Fields
Modifier and Type   Field   Description
static final HiveColumnHandle PARQUET_ROW_INDEX_COLUMN
    If this object is passed as one of the columns for createPageSource, it will be populated as an additional column containing the index of each row read.
-
Constructor Summary
Constructors
Constructor   Description
ParquetPageSourceFactory(TrinoFileSystemFactory fileSystemFactory, FileFormatDataSourceStats stats, ParquetReaderConfig config, HiveConfig hiveConfig)
-
Method Summary
Modifier and Type   Method   Description
static ParquetDataSource createDataSource(TrinoInputFile inputFile, OptionalLong estimatedFileSize, ParquetReaderOptions options, AggregatedMemoryContext memoryContext, FileFormatDataSourceStats stats)
static ReaderPageSource createPageSource(TrinoInputFile inputFile, long start, long length, List<HiveColumnHandle> columns, List<TupleDomain<HiveColumnHandle>> disjunctTupleDomains, boolean useColumnNames, org.joda.time.DateTimeZone timeZone, FileFormatDataSourceStats stats, ParquetReaderOptions options, Optional<ParquetWriteValidation> parquetWriteValidation, int domainCompactionThreshold, OptionalLong estimatedFileSize)
    This method is available for other callers to use directly.
Optional<ReaderPageSource> createPageSource(ConnectorSession session, Location path, long start, long length, long estimatedFileSize, long fileModifiedTime, Schema schema, List<HiveColumnHandle> columns, TupleDomain<HiveColumnHandle> effectivePredicate, Optional<AcidInfo> acidInfo, OptionalInt bucketNumber, boolean originalFile, AcidTransaction transaction)
static ConnectorPageSource createParquetPageSource(List<HiveColumnHandle> baseColumns, org.apache.parquet.schema.MessageType fileSchema, org.apache.parquet.io.MessageColumnIO messageColumn, boolean useColumnNames, ParquetPageSourceFactory.ParquetReaderProvider parquetReaderProvider)
static Optional<org.apache.parquet.schema.Type> getColumnType(HiveColumnHandle column, org.apache.parquet.schema.MessageType messageType, boolean useParquetColumnNames)
static Optional<org.apache.parquet.schema.MessageType> getParquetMessageType(List<HiveColumnHandle> columns, boolean useColumnNames, org.apache.parquet.schema.MessageType fileSchema)
static TupleDomain<org.apache.parquet.column.ColumnDescriptor> getParquetTupleDomain(Map<List<String>, org.apache.parquet.column.ColumnDescriptor> descriptorsByPath, TupleDomain<HiveColumnHandle> effectivePredicate, org.apache.parquet.schema.MessageType fileSchema, boolean useColumnNames)
static boolean stripUnnecessaryProperties(String serializationLibraryName)
-
Field Details
-
PARQUET_ROW_INDEX_COLUMN
If this object is passed as one of the columns for createPageSource, it will be populated as an additional column containing the index of each row read.
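The sentinel-column idea described for this field can be illustrated with a small, self-contained sketch. It is a toy model only and does not use Trino's classes: the class RowIndexColumnSketch, the constant ROW_INDEX_COLUMN, and the read method are hypothetical names invented for illustration, and plain strings stand in for HiveColumnHandle.

```java
import java.util.ArrayList;
import java.util.List;

public class RowIndexColumnSketch {
    // Hypothetical sentinel standing in for PARQUET_ROW_INDEX_COLUMN.
    static final String ROW_INDEX_COLUMN = "$row_index";

    // Projects the requested columns out of a toy in-memory "file".
    // When the sentinel appears among the requested columns, the matching
    // output position is filled with the index of the row being read.
    static List<List<Object>> read(List<String> fileColumns, List<List<Object>> rows, List<String> requested) {
        List<List<Object>> result = new ArrayList<>();
        for (int rowIndex = 0; rowIndex < rows.size(); rowIndex++) {
            List<Object> out = new ArrayList<>();
            for (String column : requested) {
                if (column.equals(ROW_INDEX_COLUMN)) {
                    // Sentinel column: emit the row index instead of file data.
                    out.add((long) rowIndex);
                } else {
                    int position = fileColumns.indexOf(column);
                    if (position < 0) {
                        throw new IllegalArgumentException("Unknown column: " + column);
                    }
                    out.add(rows.get(rowIndex).get(position));
                }
            }
            result.add(out);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> fileColumns = List.of("name", "qty");
        List<List<Object>> rows = List.of(List.of("a", 10), List.of("b", 20));
        // Request one data column plus the row-index sentinel.
        System.out.println(read(fileColumns, rows, List.of("name", ROW_INDEX_COLUMN)));
    }
}
```

In this sketch the caller opts in simply by including the sentinel in the requested column list; columns that are not requested, such as qty above, are never materialized.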
-
-
Constructor Details
-
ParquetPageSourceFactory
@Inject public ParquetPageSourceFactory(TrinoFileSystemFactory fileSystemFactory, FileFormatDataSourceStats stats, ParquetReaderConfig config, HiveConfig hiveConfig)
-
-
Method Details
-
stripUnnecessaryProperties
-
createPageSource
public Optional<ReaderPageSource> createPageSource(ConnectorSession session, Location path, long start, long length, long estimatedFileSize, long fileModifiedTime, Schema schema, List<HiveColumnHandle> columns, TupleDomain<HiveColumnHandle> effectivePredicate, Optional<AcidInfo> acidInfo, OptionalInt bucketNumber, boolean originalFile, AcidTransaction transaction)
- Specified by:
createPageSource in interface HivePageSourceFactory
-
createPageSource
public static ReaderPageSource createPageSource(TrinoInputFile inputFile, long start, long length, List<HiveColumnHandle> columns, List<TupleDomain<HiveColumnHandle>> disjunctTupleDomains, boolean useColumnNames, org.joda.time.DateTimeZone timeZone, FileFormatDataSourceStats stats, ParquetReaderOptions options, Optional<ParquetWriteValidation> parquetWriteValidation, int domainCompactionThreshold, OptionalLong estimatedFileSize)
This method is available for other callers to use directly.
-
createDataSource
public static ParquetDataSource createDataSource(TrinoInputFile inputFile, OptionalLong estimatedFileSize, ParquetReaderOptions options, AggregatedMemoryContext memoryContext, FileFormatDataSourceStats stats) throws IOException
- Throws:
IOException
-
getParquetMessageType
public static Optional<org.apache.parquet.schema.MessageType> getParquetMessageType(List<HiveColumnHandle> columns, boolean useColumnNames, org.apache.parquet.schema.MessageType fileSchema)
-
getColumnType
public static Optional<org.apache.parquet.schema.Type> getColumnType(HiveColumnHandle column, org.apache.parquet.schema.MessageType messageType, boolean useParquetColumnNames)
-
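The signature above documents no behavior, so the following is only a sketch of what a flag such as useParquetColumnNames commonly selects between: resolving a column by its name or by its ordinal position in the file schema, with an empty Optional when nothing matches. Everything here is an assumption made for illustration: the class ColumnLookupSketch and the method findField are hypothetical, strings stand in for Parquet field types, and the case-insensitive name match is a guess rather than confirmed behavior.

```java
import java.util.List;
import java.util.Optional;

public class ColumnLookupSketch {
    // Resolves a table column against the fields of a toy file schema.
    // columnName and ordinal stand in for what a column handle would carry.
    static Optional<String> findField(String columnName, int ordinal, List<String> fileFields, boolean useColumnNames) {
        if (useColumnNames) {
            // Name-based lookup; ignoring case is an assumption of this sketch.
            return fileFields.stream()
                    .filter(field -> field.equalsIgnoreCase(columnName))
                    .findFirst();
        }
        // Position-based lookup: the column's ordinal indexes into the file schema.
        if (ordinal >= 0 && ordinal < fileFields.size()) {
            return Optional.of(fileFields.get(ordinal));
        }
        // The file has fewer fields than the table has columns.
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> fileFields = List.of("id", "Name");
        System.out.println(findField("name", 0, fileFields, true));
        System.out.println(findField("name", 0, fileFields, false));
        System.out.println(findField("extra", 2, fileFields, false));
    }
}
```

Returning Optional rather than throwing lets a caller treat a column that is absent from an older file as missing data instead of an error, which matches the Optional return types used throughout this class.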
getParquetTupleDomain
public static TupleDomain<org.apache.parquet.column.ColumnDescriptor> getParquetTupleDomain(Map<List<String>, org.apache.parquet.column.ColumnDescriptor> descriptorsByPath, TupleDomain<HiveColumnHandle> effectivePredicate, org.apache.parquet.schema.MessageType fileSchema, boolean useColumnNames)
-
createParquetPageSource
public static ConnectorPageSource createParquetPageSource(List<HiveColumnHandle> baseColumns, org.apache.parquet.schema.MessageType fileSchema, org.apache.parquet.io.MessageColumnIO messageColumn, boolean useColumnNames, ParquetPageSourceFactory.ParquetReaderProvider parquetReaderProvider) throws IOException
- Throws:
IOException
-