Class SchemaConformingTransformer
- java.lang.Object
- org.apache.pinot.segment.local.recordtransformer.SchemaConformingTransformer
- All Implemented Interfaces: Serializable, RecordTransformer
public class SchemaConformingTransformer extends Object implements RecordTransformer
This transformer transforms records with varying keys such that they can be stored in a table with a fixed schema. Since these records have varying keys, it is impractical to store each field in its own table column. At the same time, most (if not all) fields may be important to the user, so we should not drop any field unnecessarily. So this transformer primarily takes record-fields that don't exist in the schema and stores them in a type of catchall field.
For example, consider this record:
{
  "timestamp": 1687786535928,
  "hostname": "host1",
  "HOSTNAME": "host1",
  "level": "INFO",
  "message": "Started processing job1",
  "tags": {
    "platform": "data",
    "service": "serializer",
    "params": {
      "queueLength": 5,
      "timeout": 299,
      "userData_noIndex": { "nth": 99 }
    }
  }
}
And let's say the table's schema contains these fields:
- timestamp
- hostname
- level
- message
- tags.platform
- tags.service
- indexableExtras
- unindexableExtras
Without this transformer, the entire "tags" field would be dropped when storing the record in the table. However, with this transformer, the record would be transformed into the following:
{
  "timestamp": 1687786535928,
  "hostname": "host1",
  "level": "INFO",
  "message": "Started processing job1",
  "tags.platform": "data",
  "tags.service": "serializer",
  "indexableExtras": {
    "tags": {
      "params": { "queueLength": 5, "timeout": 299 }
    }
  },
  "unindexableExtras": {
    "tags": {
      "userData_noIndex": { "nth": 99 }
    }
  }
}
Notice that the transformer:
- Flattens nested fields which exist in the schema, like "tags.platform"
- Drops fields that are listed in the config option "fieldPathsToDrop"; "HOSTNAME" is dropped here only because it is listed there
- Moves fields which don't exist in the schema and have the suffix "_noIndex" into the "unindexableExtras" field (the field name is configurable)
- Moves any remaining fields which don't exist in the schema into the "indexableExtras" field (the field name is configurable)
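The routing rules in the list above can be sketched in plain Java. This is a minimal illustration, not Pinot's implementation: the class `ExtrasRoutingSketch`, its `Destination` enum, and both methods are hypothetical names, records are modeled as nested `Map`s rather than `GenericRow`s, and the order in which the checks are applied (drop list, then schema, then suffix) is an assumption. The sketch only decides where each leaf field would go; it does not build the output record.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative stand-in only; none of these names exist in Pinot.
class ExtrasRoutingSketch {
  enum Destination { SCHEMA_COLUMN, DROPPED, UNINDEXABLE_EXTRAS, INDEXABLE_EXTRAS }

  // Flattens nested maps into dotted leaf paths,
  // e.g. {"tags": {"platform": "data"}} becomes {"tags.platform": "data"}.
  static Map<String, Object> flatten(Map<String, Object> record) {
    Map<String, Object> out = new LinkedHashMap<>();
    flattenInto("", record, out);
    return out;
  }

  @SuppressWarnings("unchecked")
  private static void flattenInto(String prefix, Map<String, Object> node, Map<String, Object> out) {
    for (Map.Entry<String, Object> e : node.entrySet()) {
      String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
      if (e.getValue() instanceof Map) {
        flattenInto(path, (Map<String, Object>) e.getValue(), out);
      } else {
        out.put(path, e.getValue());
      }
    }
  }

  // Decides where a single leaf path would end up.
  // Assumption: paths to drop are matched exactly against the leaf path.
  static Destination classify(String path, Set<String> schemaFields,
      Set<String> fieldPathsToDrop, String unindexableSuffix) {
    if (fieldPathsToDrop.contains(path)) {
      return Destination.DROPPED;
    }
    if (schemaFields.contains(path)) {
      return Destination.SCHEMA_COLUMN;
    }
    // A suffixed key sends its whole subtree to the unindexable extras field.
    for (String segment : path.split("\\.")) {
      if (segment.endsWith(unindexableSuffix)) {
        return Destination.UNINDEXABLE_EXTRAS;
      }
    }
    return Destination.INDEXABLE_EXTRAS;
  }
}
```

With the example schema, a drop list containing only "HOSTNAME", and the suffix "_noIndex", this sketch classifies "tags.platform" as a schema column, "tags.params.queueLength" as indexable extras, and "tags.params.userData_noIndex.nth" as unindexable extras.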
The "unindexableExtras" field allows the transformer to separate fields which don't need indexing (because they are only retrieved, not searched) from those that do. The transformer also has other configuration options specified in SchemaConformingTransformerConfig.
- See Also:
- Serialized Form
Constructor Summary
Constructors
- SchemaConformingTransformer(TableConfig tableConfig, Schema schema)
Method Summary
Modifier and Type, Method, Description
- boolean isNoOp(): Returns true if the transformer is no-op (can be skipped), false otherwise.
- GenericRow transform(GenericRow record): Transforms a record based on some custom rules.
- static void validateSchema(Schema schema, SchemaConformingTransformerConfig transformerConfig): Validates the schema against the given transformer's configuration.
Constructor Detail
SchemaConformingTransformer
public SchemaConformingTransformer(TableConfig tableConfig, Schema schema)
Method Detail
validateSchema
public static void validateSchema(@Nonnull Schema schema, @Nonnull SchemaConformingTransformerConfig transformerConfig)
Validates the schema against the given transformer's configuration.
isNoOp
public boolean isNoOp()
Description copied from interface: RecordTransformer
Returns true if the transformer is no-op (can be skipped), false otherwise.
- Specified by: isNoOp in interface RecordTransformer
transform
@Nullable public GenericRow transform(GenericRow record)
Description copied from interface: RecordTransformer
Transforms a record based on some custom rules.
- Specified by: transform in interface RecordTransformer
- Parameters: record - Record to transform
- Returns: Transformed record, or null if the record does not follow certain rules.
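Because transform may return null and isNoOp signals that a transformer can be skipped, callers need to handle both cases. The sketch below shows one way a caller might do so. It is an assumption-laden illustration rather than Pinot code: `RowTransformer` is a hypothetical stand-in that mirrors only the two methods documented here, `Map<String, Object>` stands in for `GenericRow`, and the default `isNoOp` body is a convenience of the sketch.

```java
import java.util.List;
import java.util.Map;

// Illustrative stand-in only; these names are not part of Pinot.
class TransformPipelineSketch {
  // Hypothetical interface mirroring isNoOp() and transform(...) as documented above.
  interface RowTransformer {
    // Defaulting to false is a choice made for this sketch.
    default boolean isNoOp() {
      return false;
    }

    // May return null when the record does not follow the transformer's rules.
    Map<String, Object> transform(Map<String, Object> record);
  }

  // Applies the transformers in order, skipping no-op ones.
  // Returns null as soon as any transformer rejects the record.
  static Map<String, Object> apply(List<RowTransformer> transformers, Map<String, Object> record) {
    Map<String, Object> current = record;
    for (RowTransformer transformer : transformers) {
      if (transformer.isNoOp()) {
        continue;
      }
      current = transformer.transform(current);
      if (current == null) {
        return null;
      }
    }
    return current;
  }
}
```

A caller would treat a null result from `apply` as "skip this record" and would never invoke `transform` on a transformer that reports itself as no-op.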