Class PlanUtils

java.lang.Object
io.openlineage.spark.agent.util.PlanUtils

public class PlanUtils extends Object
Utility functions for traversing a LogicalPlan.
  • Constructor Summary

    Constructors
    Constructor
    Description
     
  • Method Summary

    Modifier and Type
    Method
    Description
    static io.openlineage.client.OpenLineage.DatasourceDatasetFacet
    datasourceFacet(io.openlineage.client.OpenLineage openLineage, String namespaceUri)
    Construct an OpenLineage.DatasourceDatasetFacet given a namespace for the datasource.
    static List<org.apache.hadoop.fs.Path>
    findRDDPaths(List<org.apache.spark.rdd.RDD<?>> fileRdds)
    Given a list of RDDs, collects the list of data location directories.
    static org.apache.hadoop.fs.Path
    getDirectoryPath(org.apache.hadoop.fs.Path p, org.apache.hadoop.conf.Configuration hadoopConf)
     
    static org.apache.hadoop.fs.Path
    getDirectoryPathOl(org.apache.hadoop.fs.Path p, org.apache.hadoop.conf.Configuration hadoopConf)
     
    static <T, D> io.openlineage.spark.agent.util.OpenLineageAbstractPartialFunction<T,Collection<D>>
    merge(Collection<? extends scala.PartialFunction<T,? extends Collection<D>>> fns)
    Given a list of PartialFunctions, merge them to produce a single function that tests the input against each function one by one until a match is found or PartialFunction$.empty() is returned.
    static String
    namespaceUri(URI outputPath)
     
    static io.openlineage.client.OpenLineage.ParentRunFacet
    parentRunFacet(UUID parentRunId, String parentJob, String parentJobNamespace)
    Construct an OpenLineage.ParentRunFacet given the parent job's run ID, job name, and namespace.
    static <T, D> List<T>
    safeApply(scala.PartialFunction<D,List<T>> pfn, D x)
    An apply implementation that should never throw an error or exception.
    static boolean
    safeIsDefinedAt(scala.PartialFunction pfn, Object x)
    An isDefinedAt implementation that should never throw an error or exception.
    static boolean
    safeIsInstanceOf(Object instance, String classCanonicalName)
    An instanceof-like check that does not fail when the named class is missing.
    static io.openlineage.client.OpenLineage.SchemaDatasetFacet
    schemaFacet(io.openlineage.client.OpenLineage openLineage, org.apache.spark.sql.types.StructType structType)
    Given a schema, construct a valid OpenLineage.SchemaDatasetFacet.
    static org.apache.spark.sql.types.StructType
    toStructType(List<org.apache.spark.sql.catalyst.expressions.Attribute> attributes)
    Given a list of attributes, constructs a valid StructType.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • PlanUtils

      public PlanUtils()
  • Method Details

    • merge

      public static <T, D> io.openlineage.spark.agent.util.OpenLineageAbstractPartialFunction<T,Collection<D>> merge(Collection<? extends scala.PartialFunction<T,? extends Collection<D>>> fns)
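      Given a list of PartialFunctions, merge them to produce a single function that tests the input against each function one by one until a match is found or PartialFunction$.empty() is returned.
      Type Parameters:
      T - the input type accepted by the partial functions
      D - the element type of the collections the partial functions return
      Parameters:
      fns - the partial functions to merge
      Returns:
      a single partial function that tries each function in fns in turn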
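
The merge behaviour described above can be sketched with the JDK alone. This is only an illustration, not the library's code: the `Partial` interface is a hypothetical stand-in for scala.PartialFunction, `MergeSketch` and its `of` helper are invented names, and throwing when nothing matches is an assumption about how an empty partial function behaves.

```java
import java.util.Collection;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical stand-in for scala.PartialFunction, so the sketch needs only the JDK.
interface Partial<T, R> {
  boolean isDefinedAt(T x);

  R apply(T x);
}

class MergeSketch {
  // Hypothetical helper: build a Partial from a guard and a body.
  static <T, R> Partial<T, R> of(Predicate<T> guard, Function<T, R> body) {
    return new Partial<T, R>() {
      @Override
      public boolean isDefinedAt(T x) {
        return guard.test(x);
      }

      @Override
      public R apply(T x) {
        return body.apply(x);
      }
    };
  }

  // Combine several partial functions into one that delegates to the
  // first function defined at the input.
  static <T, D> Partial<T, Collection<D>> merge(
      Collection<? extends Partial<T, ? extends Collection<D>>> fns) {
    return new Partial<T, Collection<D>>() {
      @Override
      public boolean isDefinedAt(T x) {
        for (Partial<T, ? extends Collection<D>> fn : fns) {
          if (fn.isDefinedAt(x)) {
            return true;
          }
        }
        return false;
      }

      @Override
      public Collection<D> apply(T x) {
        for (Partial<T, ? extends Collection<D>> fn : fns) {
          if (fn.isDefinedAt(x)) {
            Collection<D> result = fn.apply(x);
            return result;
          }
        }
        // Assumption: with no match, act like an empty partial function.
        throw new IllegalArgumentException("No function defined at " + x);
      }
    };
  }
}
```

Callers of such a merged function would normally check `isDefinedAt` before calling `apply`, which is the usual contract for partial functions.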
    • schemaFacet

      public static io.openlineage.client.OpenLineage.SchemaDatasetFacet schemaFacet(io.openlineage.client.OpenLineage openLineage, org.apache.spark.sql.types.StructType structType)
      Given a schema, construct a valid OpenLineage.SchemaDatasetFacet.
      Parameters:
      structType -
      Returns:
    • toStructType

      public static org.apache.spark.sql.types.StructType toStructType(List<org.apache.spark.sql.catalyst.expressions.Attribute> attributes)
      Given a list of attributes, constructs a valid StructType.
      Parameters:
      attributes -
      Returns:
    • namespaceUri

      public static String namespaceUri(URI outputPath)
    • datasourceFacet

      public static io.openlineage.client.OpenLineage.DatasourceDatasetFacet datasourceFacet(io.openlineage.client.OpenLineage openLineage, String namespaceUri)
      Construct an OpenLineage.DatasourceDatasetFacet given a namespace for the datasource.
      Parameters:
      namespaceUri -
      Returns:
    • parentRunFacet

      public static io.openlineage.client.OpenLineage.ParentRunFacet parentRunFacet(UUID parentRunId, String parentJob, String parentJobNamespace)
      Construct an OpenLineage.ParentRunFacet given the parent job's run ID, job name, and namespace.
      Parameters:
      parentRunId -
      parentJob -
      parentJobNamespace -
      Returns:
    • getDirectoryPathOl

      public static org.apache.hadoop.fs.Path getDirectoryPathOl(org.apache.hadoop.fs.Path p, org.apache.hadoop.conf.Configuration hadoopConf)
    • getDirectoryPath

      public static org.apache.hadoop.fs.Path getDirectoryPath(org.apache.hadoop.fs.Path p, org.apache.hadoop.conf.Configuration hadoopConf)
    • findRDDPaths

      public static List<org.apache.hadoop.fs.Path> findRDDPaths(List<org.apache.spark.rdd.RDD<?>> fileRdds)
      Given a list of RDDs, collects the list of data location directories. For each RDD, the parent directory is taken, and the list of distinct locations is returned.
      Parameters:
      fileRdds -
      Returns:
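
The "parent directory, then distinct" step can be illustrated without Spark or Hadoop. The sketch below is an assumption-laden analogy: it uses java.nio.file.Path instead of org.apache.hadoop.fs.Path, starts from file paths rather than RDDs, and `ParentDirsSketch` and `distinctParents` are hypothetical names.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ParentDirsSketch {
  // Take the parent directory of each file and return the distinct
  // directories in first-seen order.
  static List<Path> distinctParents(List<Path> files) {
    Set<Path> dirs = new LinkedHashSet<>();
    for (Path file : files) {
      Path parent = file.getParent();
      // A path with no parent (for example a bare file name) is skipped.
      if (parent != null) {
        dirs.add(parent);
      }
    }
    return new ArrayList<>(dirs);
  }
}
```

Two part files under the same directory therefore collapse into a single location in the result.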
    • safeIsInstanceOf

      public static boolean safeIsInstanceOf(Object instance, String classCanonicalName)
      An instanceof-like check that does not fail when the named class is missing.
      Parameters:
      instance -
      classCanonicalName -
      Returns:
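
One way such a check could work is to resolve the class by name and treat a missing class as "not an instance". The sketch below shows that idea only; it is not taken from the library, `SafeInstanceOfSketch` is a hypothetical name, and returning false for a null instance is an assumption.

```java
class SafeInstanceOfSketch {
  // Returns true only if the named class can be loaded and the instance
  // is assignable to it; a missing class yields false instead of an error.
  static boolean safeIsInstanceOf(Object instance, String classCanonicalName) {
    if (instance == null) {
      return false; // assumption: null is never an instance of anything
    }
    try {
      // Note: Class.forName expects a binary name, which matches the
      // canonical name for top-level classes but not for nested ones.
      Class<?> candidate = Class.forName(classCanonicalName);
      return candidate.isInstance(instance);
    } catch (ClassNotFoundException | LinkageError e) {
      // The class is not on the classpath or could not be linked.
      return false;
    }
  }
}
```

This pattern lets code test for types from optional dependencies without referencing them directly, so it still loads when those dependencies are absent.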
    • safeIsDefinedAt

      public static boolean safeIsDefinedAt(scala.PartialFunction pfn, Object x)
      An isDefinedAt implementation that should never throw an error or exception.
      Parameters:
      pfn -
      x -
      Returns:
    • safeApply

      public static <T, D> List<T> safeApply(scala.PartialFunction<D,List<T>> pfn, D x)
      An apply implementation that should never throw an error or exception.
      Parameters:
      pfn -
      x -
      Returns:
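
The "never throw" contract of safeIsDefinedAt and safeApply can be sketched as wrappers that catch any Throwable and fall back to a neutral value. The fallback values here (false and an empty list) are assumptions, since this page does not state what is returned on failure; `GuardedFn` is a hypothetical stand-in for scala.PartialFunction and `SafeCallSketch` is an invented name.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for scala.PartialFunction, so the sketch needs only the JDK.
interface GuardedFn<D, R> {
  boolean isDefinedAt(D x);

  R apply(D x);
}

class SafeCallSketch {
  // Treat any failure while probing the function as "not defined".
  static <D> boolean safeIsDefinedAt(GuardedFn<D, ?> pfn, D x) {
    try {
      return pfn.isDefinedAt(x);
    } catch (Throwable t) {
      // Catches both Exception and Error subclasses.
      return false; // assumption: failure means "not defined"
    }
  }

  // Treat any failure while applying the function as "no results".
  static <T, D> List<T> safeApply(GuardedFn<D, List<T>> pfn, D x) {
    try {
      return pfn.apply(x);
    } catch (Throwable t) {
      return Collections.emptyList(); // assumption: failure yields an empty list
    }
  }
}
```

Swallowing Throwable is a deliberate trade-off in this sketch: it keeps a faulty function from interrupting the caller, at the cost of hiding the failure unless it is logged.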