Package io.openlineage.spark.agent.util
Class PlanUtils
java.lang.Object
io.openlineage.spark.agent.util.PlanUtils
Utility functions for traversing a LogicalPlan.
Constructor Summary
Constructors: PlanUtils()
Method Summary
Modifier and Type | Method | Description
static io.openlineage.client.OpenLineage.DatasourceDatasetFacet | datasourceFacet(io.openlineage.client.OpenLineage openLineage, String namespaceUri) | Construct an OpenLineage.DatasourceDatasetFacet given a namespace for the datasource.
static List<org.apache.hadoop.fs.Path> | findRDDPaths(List<org.apache.spark.rdd.RDD<?>> fileRdds) | Given a list of RDDs, collects the list of data location directories.
static org.apache.hadoop.fs.Path | getDirectoryPath(org.apache.hadoop.fs.Path p, org.apache.hadoop.conf.Configuration hadoopConf) |
static org.apache.hadoop.fs.Path | getDirectoryPathOl(org.apache.hadoop.fs.Path p, org.apache.hadoop.conf.Configuration hadoopConf) |
static <T,D> io.openlineage.spark.agent.util.OpenLineageAbstractPartialFunction<T, Collection<D>> | merge(Collection<? extends scala.PartialFunction<T, ? extends Collection<D>>> fns) | Given a list of PartialFunctions, merges them into a single function that tests the input against each function in turn until a match is found or PartialFunction$.empty() is returned.
static String | namespaceUri(URI outputPath) |
static io.openlineage.client.OpenLineage.ParentRunFacet | parentRunFacet(UUID parentRunId, String parentJob, String parentJobNamespace) | Construct an OpenLineage.ParentRunFacet given the parent job's parentRunId, job name, and namespace.
static <T,D> List<T> | safeApply(pfn, x) | An apply implementation that should never throw an error or exception.
static boolean | safeIsDefinedAt(scala.PartialFunction pfn, Object x) | An isDefinedAt implementation that should never throw an error or exception.
static boolean | safeIsInstanceOf(Object instance, String classCanonicalName) | An instanceof-like check that does not fail when the class is missing.
static io.openlineage.client.OpenLineage.SchemaDatasetFacet | schemaFacet(io.openlineage.client.OpenLineage openLineage, org.apache.spark.sql.types.StructType structType) | Given a schema, construct a valid OpenLineage.SchemaDatasetFacet.
static org.apache.spark.sql.types.StructType | toStructType(List<org.apache.spark.sql.catalyst.expressions.Attribute> attributes) | Given a list of attributes, constructs a valid StructType.
Constructor Details
PlanUtils
public PlanUtils()
Method Details
merge
public static <T,D> io.openlineage.spark.agent.util.OpenLineageAbstractPartialFunction<T,Collection<D>> merge(Collection<? extends scala.PartialFunction<T, ? extends Collection<D>>> fns)
Given a list of PartialFunctions, merges them into a single function that tests the input against each function in turn until a match is found or PartialFunction$.empty() is returned.
Type Parameters: T, D
Parameters: fns
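The first-match behaviour described for merge can be illustrated with a minimal, library-free sketch. Everything here is an assumption made for illustration: the Partial interface is a hypothetical stand-in for scala.PartialFunction, MergeSketch is not part of OpenLineage, and returning an empty collection when nothing matches is this sketch's choice rather than confirmed library behaviour.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical stand-in for scala.PartialFunction, used only in this sketch. */
interface Partial<T, R> {
    boolean isDefinedAt(T input);
    R apply(T input);
}

final class MergeSketch {
    /** Builds a Partial from a guard predicate and a body function. */
    static <T, R> Partial<T, R> of(Predicate<T> guard, Function<T, R> body) {
        return new Partial<T, R>() {
            @Override public boolean isDefinedAt(T input) { return guard.test(input); }
            @Override public R apply(T input) { return body.apply(input); }
        };
    }

    /** Returns a function that delegates to the first member defined at the input. */
    static <T, D> Partial<T, Collection<D>> merge(
            Collection<? extends Partial<T, ? extends Collection<D>>> fns) {
        return new Partial<T, Collection<D>>() {
            @Override
            public boolean isDefinedAt(T input) {
                for (Partial<T, ? extends Collection<D>> fn : fns) {
                    if (fn.isDefinedAt(input)) {
                        return true;
                    }
                }
                return false;
            }

            @Override
            public Collection<D> apply(T input) {
                for (Partial<T, ? extends Collection<D>> fn : fns) {
                    if (fn.isDefinedAt(input)) {
                        // First match wins; later functions are not consulted.
                        return new ArrayList<>(fn.apply(input));
                    }
                }
                // Assumption of this sketch: no match yields an empty result.
                return Collections.emptyList();
            }
        };
    }

    public static void main(String[] args) {
        Partial<Object, List<String>> strings =
                of(x -> x instanceof String, x -> List.of("string:" + x));
        Partial<Object, List<String>> ints =
                of(x -> x instanceof Integer, x -> List.of("int:" + x));
        List<Partial<Object, List<String>>> fns = List.of(strings, ints);
        Partial<Object, Collection<String>> merged = merge(fns);
        System.out.println(merged.apply("a"));
        System.out.println(merged.apply(7));
        System.out.println(merged.isDefinedAt(1.5));
    }
}
```

The order of fns matters in this sketch: when two functions are defined at the same input, only the earlier one is applied.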
schemaFacet
public static io.openlineage.client.OpenLineage.SchemaDatasetFacet schemaFacet(io.openlineage.client.OpenLineage openLineage, org.apache.spark.sql.types.StructType structType)
Given a schema, construct a valid OpenLineage.SchemaDatasetFacet.
Parameters: structType
toStructType
public static org.apache.spark.sql.types.StructType toStructType(List<org.apache.spark.sql.catalyst.expressions.Attribute> attributes)
Given a list of attributes, constructs a valid StructType.
Parameters: attributes
namespaceUri
public static String namespaceUri(URI outputPath)
datasourceFacet
public static io.openlineage.client.OpenLineage.DatasourceDatasetFacet datasourceFacet(io.openlineage.client.OpenLineage openLineage, String namespaceUri)
Construct an OpenLineage.DatasourceDatasetFacet given a namespace for the datasource.
Parameters: namespaceUri
parentRunFacet
public static io.openlineage.client.OpenLineage.ParentRunFacet parentRunFacet(UUID parentRunId, String parentJob, String parentJobNamespace)
Construct an OpenLineage.ParentRunFacet given the parent job's parentRunId, job name, and namespace.
Parameters: parentRunId, parentJob, parentJobNamespace
getDirectoryPathOl
public static org.apache.hadoop.fs.Path getDirectoryPathOl(org.apache.hadoop.fs.Path p, org.apache.hadoop.conf.Configuration hadoopConf)
getDirectoryPath
public static org.apache.hadoop.fs.Path getDirectoryPath(org.apache.hadoop.fs.Path p, org.apache.hadoop.conf.Configuration hadoopConf)
findRDDPaths
public static List<org.apache.hadoop.fs.Path> findRDDPaths(List<org.apache.spark.rdd.RDD<?>> fileRdds)
Given a list of RDDs, collects the list of data location directories. For each RDD, the parent directory is taken, and the list of distinct locations is returned.
Parameters: fileRdds
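The "take the parent directory, then keep distinct locations" step can be sketched without Spark or Hadoop on the classpath. This is only an illustration under stated assumptions: java.nio.file.Path stands in for org.apache.hadoop.fs.Path, the input is already a list of file paths rather than RDDs, and ParentDirectoriesSketch and distinctParentDirectories are hypothetical names that do not exist in OpenLineage.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

final class ParentDirectoriesSketch {
    /** Maps each file to its parent directory and removes duplicates, keeping first-seen order. */
    static List<Path> distinctParentDirectories(List<Path> files) {
        return files.stream()
                .map(Path::getParent)      // parent directory of each file
                .filter(Objects::nonNull)  // a bare root or single-name path has no parent
                .distinct()                // one entry per directory
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Path> files = List.of(
                Paths.get("/data/events/part-0000.parquet"),
                Paths.get("/data/events/part-0001.parquet"),
                Paths.get("/data/users/part-0000.parquet"));
        System.out.println(distinctParentDirectories(files));
    }
}
```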
safeIsInstanceOf
public static boolean safeIsInstanceOf(Object instance, String classCanonicalName)
An instanceof-like check that does not fail when the class is missing.
Parameters: instance, classCanonicalName
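One way such a check can be written is to resolve the class by name and treat a lookup failure as "not an instance". The sketch below illustrates that idea and is not the library's implementation; SafeInstanceOfSketch is a hypothetical name, and it assumes the name passed in is one that Class.forName can resolve (for nested classes that is the binary name, which differs from the canonical name).

```java
final class SafeInstanceOfSketch {
    /**
     * Returns true only if the named class can be loaded and the object is an instance of it.
     * A missing or unloadable class yields false instead of an exception.
     */
    static boolean safeIsInstanceOf(Object instance, String className) {
        try {
            // initialize=false: resolve the class without running its static initializers.
            Class<?> type = Class.forName(
                    className, false, SafeInstanceOfSketch.class.getClassLoader());
            return type.isInstance(instance); // false for a null instance
        } catch (ClassNotFoundException | LinkageError e) {
            // The class (or one of its dependencies) is not on the classpath.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(safeIsInstanceOf("text", "java.lang.CharSequence"));
        System.out.println(safeIsInstanceOf("text", "com.example.DoesNotExist"));
    }
}
```

A check of this shape lets code test for optional types by name without a compile-time dependency on them.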
safeIsDefinedAt
public static boolean safeIsDefinedAt(scala.PartialFunction pfn, Object x)
An isDefinedAt implementation that should never throw an error or exception.
Parameters: pfn, x
safeApply
An apply implementation that should never throw an error or exception.
Parameters: pfn, x
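The "never throw" contract of safeIsDefinedAt and safeApply can be illustrated with a small wrapper that catches Throwable and falls back to a neutral value. This is a sketch under assumptions, not the library code: PartialFn is a hypothetical stand-in for scala.PartialFunction, SafeCallSketch is an illustrative name, and the fallbacks (false and an empty list) are choices made for this example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for scala.PartialFunction, used only in this sketch. */
interface PartialFn<D, R> {
    boolean isDefinedAt(D x);
    R apply(D x);
}

final class SafeCallSketch {
    /** Reports false instead of propagating anything thrown by isDefinedAt. */
    static <D> boolean safeIsDefinedAt(PartialFn<D, ?> pfn, D x) {
        try {
            return pfn.isDefinedAt(x);
        } catch (Throwable t) {
            // Throwable covers both exceptions and errors such as NoClassDefFoundError.
            return false;
        }
    }

    /** Returns an empty list instead of propagating anything thrown by apply. */
    static <T, D> List<T> safeApply(PartialFn<D, ? extends Collection<T>> pfn, D x) {
        try {
            return new ArrayList<>(pfn.apply(x));
        } catch (Throwable t) {
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        PartialFn<String, List<Integer>> length = new PartialFn<String, List<Integer>>() {
            @Override public boolean isDefinedAt(String x) { return !x.isEmpty(); }
            @Override public List<Integer> apply(String x) { return List.of(x.length()); }
        };
        System.out.println(safeIsDefinedAt(length, "abc"));
        System.out.println(safeApply(length, "abc"));
        // A null input makes isDefinedAt throw; the wrapper reports false instead.
        System.out.println(safeIsDefinedAt(length, null));
    }
}
```

Catching Throwable is unusually broad for application code; it fits here only because the stated goal is that these helpers never fail, even when a class is missing at runtime.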