Default format provider implementation, based on the open-source library versions that Chronon supports by default.
Trait that tracks the table format used by a Chronon dataset, along with utility methods that help retrieve the table's metadata and configure the table appropriately at creation time.
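A minimal Python sketch of such a trait, modeled as an abstract base class. The names `Format`, `HiveFormat`, `DeltaFormat`, `table_format_clause` and `create_table_sql` are hypothetical and do not come from the Chronon codebase; the sketch covers only the creation-time configuration part, where each format supplies the DDL fragment that makes Spark SQL create the table in that format.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class Format(ABC):
    """Hypothetical analogue of the table-format trait: one object per format."""

    @abstractmethod
    def name(self) -> str:
        """Short identifier for the format."""

    @abstractmethod
    def table_format_clause(self) -> str:
        """DDL fragment that makes CREATE TABLE use this format."""

    def create_table_sql(self, table: str, columns: List[Tuple[str, str]]) -> str:
        # Shared helper: the column list is format-agnostic, only the
        # trailing clause differs between formats.
        cols = ", ".join(f"{name} {typ}" for name, typ in columns)
        return f"CREATE TABLE {table} ({cols}) {self.table_format_clause()}"


class HiveFormat(Format):
    def name(self) -> str:
        return "hive"

    def table_format_clause(self) -> str:
        # Hive-style table stored as Parquet files.
        return "STORED AS PARQUET"


class DeltaFormat(Format):
    def name(self) -> str:
        return "delta"

    def table_format_clause(self) -> str:
        # Data source table backed by Delta Lake.
        return "USING DELTA"
```

Keeping the DDL fragment on the format object means callers that create tables never branch on the format themselves.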
The purpose of LogFlattenerJob is to unpack serialized Avro data from online requests, flatten each field (both keys and values) into its own column, and save the result to an offline "flattened" log table.
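A minimal sketch of flattening a single logged request, assuming the key and value payloads arrive as two separately serialized byte strings. The decoder is passed in as a callable so that JSON can stand in for Avro in this self-contained example; `flatten_logged_request` is a hypothetical name, and raising on a key/value name collision is an assumption of the sketch, not documented Chronon behavior.

```python
from typing import Any, Callable, Dict

# Turns one serialized payload into a field-name -> value mapping.
# In the real job this would be an Avro decoder; here it is pluggable.
Decoder = Callable[[bytes], Dict[str, Any]]


def flatten_logged_request(key_bytes: bytes, value_bytes: bytes,
                           decode: Decoder) -> Dict[str, Any]:
    """Decode keys and values and merge them into one flat row (one column per field)."""
    keys = decode(key_bytes)
    values = decode(value_bytes)
    row: Dict[str, Any] = dict(keys)
    for name, value in values.items():
        if name in row:
            # Assumption: a value sharing a key's name is ambiguous, so reject it.
            raise ValueError(f"column name collision: {name}")
        row[name] = value
    return row
```

For example, with `json.loads` as the decoder, a key payload `{"user_id": "u1"}` and a value payload `{"clicks_7d": 3}` flatten to the row `{"user_id": "u1", "clicks_7d": 3}`.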
Steps:
1. Determine the unfilled range and pull the raw logs for it from the partitioned log table.
2. Fetch the joinCodecs for every unique schema_hash present in the logs.
3. Build a merged schema from all schema versions; this is used as the output schema.
4. Unpack each row so that it adheres to the output schema.
5. Save the schema info in the flattened log table's properties (cumulatively).
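Steps 1 to 4 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the job's implementation: partitions are date strings, each decoded row is a dict carrying its `schema_hash` and its `fields`, and the hypothetical `schemas_by_hash` dictionary stands in for the joinCodecs fetched in step 2. All function names are hypothetical, and treating a column that appears with two different types as an error is a choice made for the sketch.

```python
from typing import Any, Dict, List, Optional, Tuple

Schema = List[Tuple[str, str]]  # ordered (column name, type name) pairs


def unfilled_partitions(raw: List[str], flattened: List[str]) -> List[str]:
    """Step 1: partitions present in the raw log table but not yet flattened."""
    done = set(flattened)
    return sorted(p for p in raw if p not in done)


def merge_schemas(schemas: List[Schema]) -> Schema:
    """Step 3: union of columns across schema versions, in first-seen order."""
    merged: Dict[str, str] = {}  # dicts preserve insertion order (Python 3.7+)
    for schema in schemas:
        for name, typ in schema:
            if merged.get(name, typ) != typ:
                raise ValueError(f"type conflict for {name}: {merged[name]} vs {typ}")
            merged.setdefault(name, typ)
    return list(merged.items())


def conform(fields: Dict[str, Any], schema: Schema) -> List[Optional[Any]]:
    """Step 4: lay out a decoded row in merged-schema order; missing columns become None."""
    return [fields.get(name) for name, _ in schema]


def flatten_partition(rows: List[Dict[str, Any]],
                      schemas_by_hash: Dict[str, Schema]) -> Tuple[Schema, List[List[Optional[Any]]]]:
    """Steps 2-4 for one partition: look up each unique schema, merge, then conform rows."""
    hashes = sorted({r["schema_hash"] for r in rows})  # sorted for a deterministic column order
    merged = merge_schemas([schemas_by_hash[h] for h in hashes])
    return merged, [conform(r["fields"], merged) for r in rows]
```

Rows logged under an older schema version simply get `None` in columns that only newer versions define, which is what lets every version share one output table.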
Dynamically provide the read / write table format depending on the table name. This supports reading and writing tables with heterogeneous formats, and lets users override the default by specifying a custom format provider if needed. This is useful, for example, when using library versions other than those the Chronon project supports (e.g. a newer Delta Lake), or when applying custom internal company logic / checks.
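A minimal sketch of a per-table format provider and a custom override, under these assumptions: formats are plain strings, and a dictionary stands in for whatever table-metadata lookup a real provider would perform. `FormatProvider`, `DefaultFormatProvider`, `PrefixOverrideProvider`, `read_format` and `write_format` are hypothetical names chosen for illustration, not Chronon's actual API.

```python
from abc import ABC, abstractmethod
from typing import Dict


class FormatProvider(ABC):
    """Hypothetical interface: resolve a table's format from its name."""

    @abstractmethod
    def read_format(self, table_name: str) -> str: ...

    @abstractmethod
    def write_format(self, table_name: str) -> str: ...


class DefaultFormatProvider(FormatProvider):
    def __init__(self, known_formats: Dict[str, str], default: str = "hive"):
        # Stand-in for looking up each table's format in the catalog/metadata.
        self.known_formats = known_formats
        self.default = default

    def read_format(self, table_name: str) -> str:
        return self.known_formats.get(table_name, self.default)

    def write_format(self, table_name: str) -> str:
        # Write in the same format the table is already stored in.
        return self.read_format(table_name)


class PrefixOverrideProvider(FormatProvider):
    """Custom provider: tables under one database use Delta, everything else is delegated."""

    def __init__(self, fallback: FormatProvider, prefix: str):
        self.fallback = fallback
        self.prefix = prefix

    def read_format(self, table_name: str) -> str:
        if table_name.startswith(self.prefix):
            return "delta"
        return self.fallback.read_format(table_name)

    def write_format(self, table_name: str) -> str:
        if table_name.startswith(self.prefix):
            return "delta"
        return self.fallback.write_format(table_name)
```

For example, wrapping `DefaultFormatProvider({"db.events": "delta"})` in `PrefixOverrideProvider(..., "lake.")` resolves `lake.t` to Delta through the override while `db.other` still falls back to the default `hive`. Delegating to a fallback keeps a custom provider small: it only encodes the tables whose handling differs.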