public class HoodieMultiWriterTestSuiteJob
extends Object
Multi-writer test suite job to assist in testing multi-writer scenarios. This job spins up one thread per writer, as per the configuration.
Three params are of interest to this job, in addition to the regular HoodieTestSuiteJob params:
--input-base-paths "base_path/input1,base_path/input2"
--props-paths "file:/props_path/multi-writer-1.properties,file:/props_path/multi-writer-2.properties"
--workload-yaml-paths "file:/some_path/multi-writer-1-ds.yaml,file:/some_path/multi-writer-2-sds.yaml"
Each of these should have the same number of comma-separated entries. Each writer generates data in its corresponding input-base-path, and each writer takes in its own properties file and the respective yaml file.
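The three params above are parallel comma-separated lists, one entry per writer. A minimal sketch of how such lists might be validated and counted (class and method names here are illustrative, not the actual Hudi implementation):

```java
import java.util.Arrays;
import java.util.List;

public class MultiWriterConfigSketch {

  // Split a comma-separated param value into its per-writer entries.
  static List<String> split(String csv) {
    return Arrays.asList(csv.split(","));
  }

  // All three params must have the same number of entries;
  // the job then starts one writer thread per entry.
  static int writerCount(String inputBasePaths, String propsPaths, String yamlPaths) {
    List<String> inputs = split(inputBasePaths);
    List<String> props = split(propsPaths);
    List<String> yamls = split(yamlPaths);
    if (inputs.size() != props.size() || inputs.size() != yamls.size()) {
      throw new IllegalArgumentException(
          "input-base-paths, props-paths and workload-yaml-paths must have the same number of entries");
    }
    return inputs.size();
  }

  public static void main(String[] args) {
    int writers = writerCount(
        "base_path/input1,base_path/input2",
        "file:/props_path/multi-writer-1.properties,file:/props_path/multi-writer-2.properties",
        "file:/some_path/multi-writer-1-ds.yaml,file:/some_path/multi-writer-2-sds.yaml");
    System.out.println(writers); // prints 2: one thread per writer
  }
}
```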
Common tests:
Writer 1: DeltaStreamer ingesting data into partitions 0 to 10; Writer 2: Spark datasource ingesting data into partitions 100 to 110.
Multiple Spark datasource writers, each writing to an exclusive set of partitions.
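The common tests above avoid write conflicts by giving each writer a disjoint partition range. A hypothetical helper illustrating that layout (the helper is for illustration only and is not part of the test suite):

```java
public class ExclusivePartitions {

  // Writer i gets partitions [i * width, i * width + width - 1],
  // so no two writers ever touch the same partition.
  static int[] rangeForWriter(int writerIndex, int width) {
    int start = writerIndex * width;
    return new int[] { start, start + width - 1 };
  }

  public static void main(String[] args) {
    // With width = 11, writer 0 covers 0..10 (as in the DeltaStreamer example)
    // and writer 1 covers the disjoint range 11..21.
    int[] w0 = rangeForWriter(0, 11);
    int[] w1 = rangeForWriter(1, 11);
    System.out.println(w0[0] + ".." + w0[1] + " / " + w1[0] + ".." + w1[1]); // prints 0..10 / 11..21
  }
}
```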
Example command:
spark-submit
--packages org.apache.spark:spark-avro_2.11:2.4.0
--conf spark.task.cpus=3
--conf spark.executor.cores=3
--conf spark.task.maxFailures=100
--conf spark.memory.fraction=0.4
--conf spark.rdd.compress=true
--conf spark.kryoserializer.buffer.max=2000m
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer
--conf spark.memory.storageFraction=0.1
--conf spark.shuffle.service.enabled=true
--conf spark.sql.hive.convertMetastoreParquet=false
--conf spark.driver.maxResultSize=12g
--conf spark.executor.heartbeatInterval=120s
--conf spark.network.timeout=600s
--conf spark.yarn.max.executor.failures=10
--conf spark.sql.catalogImplementation=hive
--conf spark.driver.extraClassPath=/var/demo/jars/*
--conf spark.executor.extraClassPath=/var/demo/jars/*
--class org.apache.hudi.integ.testsuite.HoodieMultiWriterTestSuiteJob /opt/hudi-integ-test-bundle-0.11.0-SNAPSHOT.jar
--source-ordering-field test_suite_source_ordering_field
--use-deltastreamer
--target-base-path /user/hive/warehouse/hudi-integ-test-suite/output
--input-base-paths "/user/hive/warehouse/hudi-integ-test-suite/input1,/user/hive/warehouse/hudi-integ-test-suite/input2"
--target-table hudi_table
--props-paths "multi-writer-1.properties,multi-writer-2.properties"
--schemaprovider-class org.apache.hudi.integ.testsuite.schema.TestSuiteFileBasedSchemaProvider
--source-class org.apache.hudi.utilities.sources.AvroDFSSource --input-file-size 125829120
--workload-yaml-paths "file:/opt/multi-writer-1-ds.yaml,file:/opt/multi-writer-2-sds.yaml"
--workload-generator-classname org.apache.hudi.integ.testsuite.dag.WorkflowDagGenerator
--table-type COPY_ON_WRITE --compact-scheduling-minshare 1
--input-base-path "dummyValue"
--workload-yaml-path "dummyValue"
--props "dummyValue"
--use-hudi-data-to-generate-updates
The example command above works with the Docker demo setup.