Package io.trino.plugin.hive.parquet

Class ParquetPageSourceFactory

java.lang.Object
    io.trino.plugin.hive.parquet.ParquetPageSourceFactory

All Implemented Interfaces:
    HivePageSourceFactory
Nested Class Summary

Nested Classes:
    ParquetPageSourceFactory.ParquetReaderProvider
Field Summary

Fields:
    static final HiveColumnHandle PARQUET_ROW_INDEX_COLUMN
        If this object is passed as one of the columns for createPageSource, it will be populated as an additional column containing the index of each row read.
Constructor Summary

Constructors:
    ParquetPageSourceFactory(TrinoFileSystemFactory fileSystemFactory, FileFormatDataSourceStats stats, ParquetReaderConfig config, HiveConfig hiveConfig)
Method Summary

Methods:
    static ReaderPageSource createPageSource(TrinoInputFile inputFile, long start, long length, List<HiveColumnHandle> columns, TupleDomain<HiveColumnHandle> effectivePredicate, boolean useColumnNames, org.joda.time.DateTimeZone timeZone, FileFormatDataSourceStats stats, ParquetReaderOptions options, Optional<ParquetWriteValidation> parquetWriteValidation, int domainCompactionThreshold)
        This method is available for other callers to use directly.

    Optional<ReaderPageSource> createPageSource(ConnectorSession session, Location path, long start, long length, long estimatedFileSize, Properties schema, List<HiveColumnHandle> columns, TupleDomain<HiveColumnHandle> effectivePredicate, Optional<AcidInfo> acidInfo, OptionalInt bucketNumber, boolean originalFile, AcidTransaction transaction)

    static ConnectorPageSource createParquetPageSource(List<HiveColumnHandle> baseColumns, org.apache.parquet.schema.MessageType fileSchema, org.apache.parquet.io.MessageColumnIO messageColumn, boolean useColumnNames, ParquetPageSourceFactory.ParquetReaderProvider parquetReaderProvider)

    static Optional<org.apache.parquet.internal.filter2.columnindex.ColumnIndexStore> getColumnIndexStore(ParquetDataSource dataSource, org.apache.parquet.hadoop.metadata.BlockMetaData blockMetadata, Map<List<String>, org.apache.parquet.column.ColumnDescriptor> descriptorsByPath, TupleDomain<org.apache.parquet.column.ColumnDescriptor> parquetTupleDomain, ParquetReaderOptions options)

    static Optional<org.apache.parquet.schema.Type> getColumnType(HiveColumnHandle column, org.apache.parquet.schema.MessageType messageType, boolean useParquetColumnNames)

    static Optional<org.apache.parquet.schema.MessageType> getParquetMessageType(List<HiveColumnHandle> columns, boolean useColumnNames, org.apache.parquet.schema.MessageType fileSchema)

    static TupleDomain<org.apache.parquet.column.ColumnDescriptor> getParquetTupleDomain(Map<List<String>, org.apache.parquet.column.ColumnDescriptor> descriptorsByPath, TupleDomain<HiveColumnHandle> effectivePredicate, org.apache.parquet.schema.MessageType fileSchema, boolean useColumnNames)

    static Properties stripUnnecessaryProperties(Properties schema)
Field Details

PARQUET_ROW_INDEX_COLUMN

public static final HiveColumnHandle PARQUET_ROW_INDEX_COLUMN

If this object is passed as one of the columns for createPageSource, it will be populated as an additional column containing the index of each row read.
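The contract of this field can be illustrated with a small, self-contained sketch in plain Java (no Trino types; the class and method names below are hypothetical): a reader that, alongside its data columns, fills one extra column with the file-level index of each row it returns.

```java
import java.util.ArrayList;
import java.util.List;

public class RowIndexSketch {
    // Given the file-level index of the first row returned and the number of
    // rows read, produce the values that a row-index column analogous to
    // PARQUET_ROW_INDEX_COLUMN would contain for that batch of rows.
    static List<Long> rowIndexColumn(long firstRowIndex, int rowCount) {
        List<Long> indexes = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            indexes.add(firstRowIndex + i);
        }
        return indexes;
    }

    public static void main(String[] args) {
        // A page of 4 rows whose first row is row 100 of the file
        System.out.println(rowIndexColumn(100, 4)); // [100, 101, 102, 103]
    }
}
```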
Constructor Details

ParquetPageSourceFactory

@Inject
public ParquetPageSourceFactory(TrinoFileSystemFactory fileSystemFactory,
        FileFormatDataSourceStats stats,
        ParquetReaderConfig config,
        HiveConfig hiveConfig)
Method Details

stripUnnecessaryProperties

public static Properties stripUnnecessaryProperties(Properties schema)
createPageSource

public Optional<ReaderPageSource> createPageSource(ConnectorSession session,
        Location path,
        long start,
        long length,
        long estimatedFileSize,
        Properties schema,
        List<HiveColumnHandle> columns,
        TupleDomain<HiveColumnHandle> effectivePredicate,
        Optional<AcidInfo> acidInfo,
        OptionalInt bucketNumber,
        boolean originalFile,
        AcidTransaction transaction)

Specified by:
    createPageSource in interface HivePageSourceFactory
createPageSource

public static ReaderPageSource createPageSource(TrinoInputFile inputFile,
        long start,
        long length,
        List<HiveColumnHandle> columns,
        TupleDomain<HiveColumnHandle> effectivePredicate,
        boolean useColumnNames,
        org.joda.time.DateTimeZone timeZone,
        FileFormatDataSourceStats stats,
        ParquetReaderOptions options,
        Optional<ParquetWriteValidation> parquetWriteValidation,
        int domainCompactionThreshold)

This method is available for other callers to use directly.
getParquetMessageType

public static Optional<org.apache.parquet.schema.MessageType> getParquetMessageType(List<HiveColumnHandle> columns,
        boolean useColumnNames,
        org.apache.parquet.schema.MessageType fileSchema)
getColumnType

public static Optional<org.apache.parquet.schema.Type> getColumnType(HiveColumnHandle column,
        org.apache.parquet.schema.MessageType messageType,
        boolean useParquetColumnNames)
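The effect of the useParquetColumnNames flag can be sketched with a simplified, self-contained stand-in in plain Java (no Trino or Parquet types; names and behavior here are illustrative assumptions, not the actual implementation): resolve a requested column against the file's fields either by case-insensitive name or by ordinal position, returning empty when the column is absent from the file.

```java
import java.util.List;
import java.util.Optional;

public class ColumnResolutionSketch {
    // Simplified stand-in for resolving a Hive column against a Parquet file
    // schema: match by (case-insensitive) name when useNames is true, or fall
    // back to the column's ordinal position in the file otherwise. Returns
    // Optional.empty() when the column is not present in this file.
    static Optional<String> resolve(List<String> fileFields, String name, int ordinal, boolean useNames) {
        if (useNames) {
            return fileFields.stream()
                    .filter(field -> field.equalsIgnoreCase(name))
                    .findFirst();
        }
        if (ordinal >= 0 && ordinal < fileFields.size()) {
            return Optional.of(fileFields.get(ordinal));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> schema = List.of("ID", "name", "ts");
        System.out.println(resolve(schema, "id", 2, true));      // Optional[ID]
        System.out.println(resolve(schema, "id", 2, false));     // Optional[ts]
        System.out.println(resolve(schema, "missing", 9, true)); // Optional.empty
    }
}
```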
getColumnIndexStore

public static Optional<org.apache.parquet.internal.filter2.columnindex.ColumnIndexStore> getColumnIndexStore(ParquetDataSource dataSource,
        org.apache.parquet.hadoop.metadata.BlockMetaData blockMetadata,
        Map<List<String>, org.apache.parquet.column.ColumnDescriptor> descriptorsByPath,
        TupleDomain<org.apache.parquet.column.ColumnDescriptor> parquetTupleDomain,
        ParquetReaderOptions options)
getParquetTupleDomain

public static TupleDomain<org.apache.parquet.column.ColumnDescriptor> getParquetTupleDomain(Map<List<String>, org.apache.parquet.column.ColumnDescriptor> descriptorsByPath,
        TupleDomain<HiveColumnHandle> effectivePredicate,
        org.apache.parquet.schema.MessageType fileSchema,
        boolean useColumnNames)
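The general idea of translating an engine-side predicate into a file-side one can be sketched with a simplified, self-contained example in plain Java (the Map-based "domain" below is an illustrative assumption, not Trino's TupleDomain): keep only the conjuncts whose column actually has a descriptor in the file, dropping the rest, since they cannot be pushed down into the Parquet reader.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class TupleDomainSketch {
    // Simplified stand-in for getParquetTupleDomain: retain only the predicate
    // entries whose column is present in the file's set of columns; entries on
    // columns the file does not contain are dropped from the pushed-down filter.
    static Map<String, String> pruneToFileColumns(Map<String, String> predicate, Set<String> fileColumns) {
        Map<String, String> pruned = new LinkedHashMap<>();
        predicate.forEach((column, domain) -> {
            if (fileColumns.contains(column)) {
                pruned.put(column, domain);
            }
        });
        return pruned;
    }

    public static void main(String[] args) {
        // "added_later" does not exist in this file, so its conjunct is dropped
        Map<String, String> predicate = Map.of("id", "= 7", "added_later", "> 0");
        System.out.println(pruneToFileColumns(predicate, Set.of("id", "name")));
    }
}
```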
createParquetPageSource

public static ConnectorPageSource createParquetPageSource(List<HiveColumnHandle> baseColumns,
        org.apache.parquet.schema.MessageType fileSchema,
        org.apache.parquet.io.MessageColumnIO messageColumn,
        boolean useColumnNames,
        ParquetPageSourceFactory.ParquetReaderProvider parquetReaderProvider)
        throws IOException

Throws:
    IOException