Class SparkOpenLineageExtensionVisitor

java.lang.Object
io.openlineage.spark.shade.extension.v1.lifecycle.plan.SparkOpenLineageExtensionVisitor

public final class SparkOpenLineageExtensionVisitor extends Object
A visitor that wraps the method calls used to handle input and output lineage in Spark jobs, as defined by the OpenLineage-Spark extension.

The OpenLineage-Spark library calls these wrapper methods through reflection to extract lineage information from Spark's LogicalPlan and related components. The visitor handles several types of lineage nodes (LineageRelationProvider, LineageRelation, InputLineageNode, and OutputLineageNode) and converts the extracted lineage into a serialized Map<String,Object> suitable for lineage tracking.
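To illustrate what "uses reflection to access these wrapper methods" means in practice, here is a minimal sketch that looks up isDefinedAt(Object) and apply(Object, String) by name at runtime and calls them. It does not use the real SparkOpenLineageExtensionVisitor: StandInVisitor, visitReflectively, the placeholder type check, and the "node"/"event" map keys are all hypothetical, chosen only so the example runs on its own.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

class ReflectiveVisitorDemo {

  // Hypothetical stand-in for SparkOpenLineageExtensionVisitor. It copies only the
  // documented method shapes; the type check and map keys are placeholders.
  public static final class StandInVisitor {
    public boolean isDefinedAt(Object lineageNode) {
      return lineageNode instanceof CharSequence; // placeholder "supported type" check
    }

    public Map<String, Object> apply(Object lineageNode, String sparkListenerEventName) {
      Map<String, Object> result = new HashMap<>();
      result.put("node", lineageNode.toString());  // hypothetical key
      result.put("event", sparkListenerEventName); // hypothetical key
      return result;
    }
  }

  // Resolves isDefinedAt/apply by name at runtime, so the caller needs no
  // compile-time reference to the visitor's class. Returns null for unsupported nodes.
  @SuppressWarnings("unchecked")
  static Map<String, Object> visitReflectively(Object visitor, Object node, String eventName) {
    try {
      Class<?> cls = visitor.getClass();
      Method isDefinedAt = cls.getMethod("isDefinedAt", Object.class);
      boolean supported = (Boolean) isDefinedAt.invoke(visitor, node);
      if (!supported) {
        return null;
      }
      Method apply = cls.getMethod("apply", Object.class, String.class);
      return (Map<String, Object>) apply.invoke(visitor, node, eventName);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("reflective call failed", e);
    }
  }

  public static void main(String[] args) {
    Object visitor = new StandInVisitor();
    System.out.println(visitReflectively(visitor, "orders_table", "SparkListenerSQLExecutionEnd"));
    System.out.println(visitReflectively(visitor, 42, "SparkListenerSQLExecutionEnd")); // null
  }
}
```

Checking isDefinedAt before apply mirrors the split on this page: the cheap type test decides whether the node is handled at all, and only then is the serializing call made.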

  • Constructor Details

    • SparkOpenLineageExtensionVisitor

      public SparkOpenLineageExtensionVisitor()
  • Method Details

    • isDefinedAt

      public boolean isDefinedAt(Object lineageNode)
      Determines whether the given lineageNode is of a type this visitor can process. Specifically, it checks whether the object is an instance of LineageRelationProvider, LineageRelation, InputLineageNode, or OutputLineageNode.
      Parameters:
      lineageNode - the node representing a lineage component
      Returns:
      true if the node is of a supported type, false otherwise
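      The rule described for isDefinedAt amounts to an instanceof check against the four supported types. The sketch below assumes empty, hypothetical marker interfaces in place of the real extension interfaces (which carry methods of their own); it only demonstrates the check, not the library's implementation.

```java
// Hypothetical marker interfaces standing in for the OpenLineage-Spark extension
// types; the real ones are richer and live in the extension's own artifact.
class IsDefinedAtSketch {
  interface LineageRelationProvider {}
  interface LineageRelation {}
  interface InputLineageNode {}
  interface OutputLineageNode {}

  // Mirrors the documented rule: true only for the four supported node types.
  // instanceof is false for null, so a null node is reported as unsupported.
  static boolean isDefinedAt(Object lineageNode) {
    return lineageNode instanceof LineageRelationProvider
        || lineageNode instanceof LineageRelation
        || lineageNode instanceof InputLineageNode
        || lineageNode instanceof OutputLineageNode;
  }

  public static void main(String[] args) {
    Object input = new InputLineageNode() {};
    System.out.println(isDefinedAt(input));        // true
    System.out.println(isDefinedAt("not a node")); // false
    System.out.println(isDefinedAt(null));         // false
  }
}
```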
    • apply

      public Map<String,Object> apply(Object lineageNode, String sparkListenerEventName, Object sqlContext, Object parameters)
      Applies the visitor to a LineageRelationProvider, extracting lineage information such as the DatasetIdentifier from the provided lineageNode.
      Parameters:
      lineageNode - the lineage node to process
      sparkListenerEventName - the name of the Spark listener event
      sqlContext - the SQL context of the current Spark execution
      parameters - additional parameters relevant to the lineage extraction
      Returns:
      a map containing lineage information in a serialized format
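      As a rough illustration of the four-argument form, the sketch below delegates to a provider and flattens the resulting dataset identifier into a map. Everything here is an assumption for illustration: the simplified DatasetIdentifier (namespace plus name), the getLineageDatasetIdentifier signature on the hypothetical LineageRelationProvider, and the "namespace"/"name" keys are not taken from the real library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ProviderApplySketch {
  // Hypothetical, simplified dataset identifier: a namespace plus a name.
  static final class DatasetIdentifier {
    final String namespace;
    final String name;

    DatasetIdentifier(String namespace, String name) {
      this.namespace = namespace;
      this.name = name;
    }
  }

  // Hypothetical provider contract: resolves the dataset for a listener event,
  // receiving the SQL context and parameters as untyped Objects.
  interface LineageRelationProvider {
    DatasetIdentifier getLineageDatasetIdentifier(
        String sparkListenerEventName, Object sqlContext, Object parameters);
  }

  // Sketch of the four-argument apply: delegate to the provider, then flatten the
  // identifier into a Map so the caller depends on no extension types.
  static Map<String, Object> apply(
      Object lineageNode, String sparkListenerEventName, Object sqlContext, Object parameters) {
    if (!(lineageNode instanceof LineageRelationProvider)) {
      throw new IllegalArgumentException("not a LineageRelationProvider: " + lineageNode);
    }
    DatasetIdentifier id = ((LineageRelationProvider) lineageNode)
        .getLineageDatasetIdentifier(sparkListenerEventName, sqlContext, parameters);
    Map<String, Object> result = new LinkedHashMap<>();
    result.put("namespace", id.namespace); // hypothetical key names
    result.put("name", id.name);
    return result;
  }

  public static void main(String[] args) {
    LineageRelationProvider provider =
        (event, sqlContext, parameters) -> new DatasetIdentifier("s3://bucket", "warehouse/orders");
    System.out.println(apply(provider, "SparkListenerSQLExecutionEnd", null, null));
    // {namespace=s3://bucket, name=warehouse/orders}
  }
}
```

      Passing sqlContext and parameters as plain Object keeps the signature free of Spark types, which is consistent with the untyped Object parameters documented for this method.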
    • apply

      public Map<String,Object> apply(Object lineageNode, String sparkListenerEventName)
      Applies the visitor to a LineageRelation, InputLineageNode, or OutputLineageNode, extracting and serializing the relevant lineage information.
      Parameters:
      lineageNode - the lineage node to process
      sparkListenerEventName - the name of the Spark listener event
      Returns:
      a map containing the serialized lineage data
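      The two-argument form dispatches on the node's type. The following is a minimal sketch under stated assumptions: LineageRelation, InputLineageNode, and OutputLineageNode are hypothetical, simplified stand-ins returning dataset names as strings, and the "dataset"/"inputs"/"outputs" keys and the CopyJob example node are invented for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NodeApplySketch {
  // Hypothetical, simplified stand-ins for the extension node types.
  interface LineageRelation { String datasetName(); }
  interface InputLineageNode { List<String> inputDatasets(String sparkListenerEventName); }
  interface OutputLineageNode { List<String> outputDatasets(String sparkListenerEventName); }

  // Sketch of the two-argument apply. Separate if-checks (not else-if) let a node
  // that is both an input and an output contribute both sides of its lineage.
  static Map<String, Object> apply(Object lineageNode, String sparkListenerEventName) {
    Map<String, Object> result = new LinkedHashMap<>();
    if (lineageNode instanceof LineageRelation) {
      result.put("dataset", ((LineageRelation) lineageNode).datasetName());
    }
    if (lineageNode instanceof InputLineageNode) {
      result.put("inputs",
          ((InputLineageNode) lineageNode).inputDatasets(sparkListenerEventName));
    }
    if (lineageNode instanceof OutputLineageNode) {
      result.put("outputs",
          ((OutputLineageNode) lineageNode).outputDatasets(sparkListenerEventName));
    }
    if (result.isEmpty()) {
      throw new IllegalArgumentException("unsupported lineage node: " + lineageNode);
    }
    return result;
  }

  // Hypothetical example node that reads one dataset and writes another.
  static final class CopyJob implements InputLineageNode, OutputLineageNode {
    public List<String> inputDatasets(String event) { return Arrays.asList("raw.events"); }
    public List<String> outputDatasets(String event) { return Arrays.asList("curated.events"); }
  }

  public static void main(String[] args) {
    System.out.println(apply(new CopyJob(), "SparkListenerSQLExecutionEnd"));
    // {inputs=[raw.events], outputs=[curated.events]}
  }
}
```

      Whether the real visitor merges both sides for a node that implements both interfaces is not stated on this page; the separate checks here are a design choice of the sketch.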