Class ProtoCoder<T extends com.google.protobuf.Message>

  • Type Parameters:
    T - the Protocol Buffers Message handled by this Coder.
    All Implemented Interfaces:
    java.io.Serializable
    Direct Known Subclasses:
    DynamicProtoCoder

    public class ProtoCoder<T extends com.google.protobuf.Message>
    extends org.apache.beam.sdk.coders.CustomCoder<T>
    A Coder using Google Protocol Buffers binary format. ProtoCoder supports both Protocol Buffers syntax versions 2 and 3.

    To learn more about Protocol Buffers, visit: https://developers.google.com/protocol-buffers

    ProtoCoder is registered in the global CoderRegistry as the default Coder for any Message object. Custom message extensions are also supported, but they must be registered on a particular ProtoCoder instance, and that instance must be set as the coder of each PCollection that needs the extensions:

    
     import MyProtoFile;
     import MyProtoFile.MyMessage;
    
     Coder<MyMessage> coder = ProtoCoder.of(MyMessage.class).withExtensionsFrom(MyProtoFile.class);
     PCollection<MyMessage> records = input.apply(...).setCoder(coder);
     

    Versioning

    ProtoCoder supports both versions 2 and 3 of the Protocol Buffers syntax. However, the Java runtime version of the com.google.protobuf library must exactly match the version of protoc that was used to produce the JAR files containing the compiled .proto messages.

    For more information, see the Protocol Buffers documentation.

    ProtoCoder and Determinism

    In general, Protocol Buffers messages can be encoded deterministically within a single pipeline as long as:

    • The encoded messages (and any transitively linked messages) do not use map fields.
    • Every Java VM that encodes or decodes the messages uses the same runtime version of the Protocol Buffers library and the same compiled .proto file JAR.
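
Protocol Buffers does not define a serialization order for map entries, which is why the first condition excludes map fields. The sketch below illustrates the effect with a toy length-prefixed encoding built only on the Java standard library. It is not the Protocol Buffers wire format, and the names MapOrderSketch, encode, insertedAThenB, and insertedBThenA are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.nio.charset.StandardCharsets;

/** Toy illustration only: this is NOT the Protocol Buffers wire format. */
final class MapOrderSketch {

  /** Writes each entry as (key length, key bytes, one-byte value) in the map's iteration order. */
  static byte[] encode(Map<String, Integer> fields) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (Map.Entry<String, Integer> entry : fields.entrySet()) {
      byte[] key = entry.getKey().getBytes(StandardCharsets.UTF_8);
      out.write(key.length);
      out.write(key, 0, key.length);
      out.write(entry.getValue());
    }
    return out.toByteArray();
  }

  /** A map whose iteration order is a, then b. */
  static Map<String, Integer> insertedAThenB() {
    Map<String, Integer> map = new LinkedHashMap<>();
    map.put("a", 1);
    map.put("b", 2);
    return map;
  }

  /** An equal map whose iteration order is b, then a. */
  static Map<String, Integer> insertedBThenA() {
    Map<String, Integer> map = new LinkedHashMap<>();
    map.put("b", 2);
    map.put("a", 1);
    return map;
  }

  public static void main(String[] args) {
    Map<String, Integer> first = insertedAThenB();
    Map<String, Integer> second = insertedBThenA();

    // The maps are equal as values, yet their encodings differ byte for byte.
    System.out.println(first.equals(second)); // prints true
    System.out.println(Arrays.equals(encode(first), encode(second))); // prints false

    // Forcing a canonical key order makes the toy encoding deterministic again.
    System.out.println(
        Arrays.equals(encode(new TreeMap<>(first)), encode(new TreeMap<>(second)))); // prints true
  }
}
```

Any operation that compares encoded bytes would treat the two equal maps above as different values, which is the kind of mismatch a deterministic coder must rule out.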

    ProtoCoder and Encoding Stability

    When changing Protocol Buffers messages, follow the rules in the Protocol Buffers language guide for the syntax your message type uses (proto2 or proto3). Following these rules ensures that old encoded data can still be read by new versions of the code.

    Generally, any change to the message type, registered extensions, runtime library, or compiled proto JARs may change the encoding. Thus even if both the original and updated messages can be encoded deterministically within a single job, these deterministic encodings may not be the same across jobs.

    See Also:
    Serialized Form
    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.beam.sdk.coders.Coder

        org.apache.beam.sdk.coders.Coder.Context, org.apache.beam.sdk.coders.Coder.NonDeterministicException
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static long serialVersionUID  
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      protected ProtoCoder​(java.lang.Class<T> protoMessageClass, java.util.Set<java.lang.Class<?>> extensionHostClasses)
      Protected constructor, intended for subclasses such as DynamicProtoCoder. Use of(...) to obtain instances.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      T decode​(java.io.InputStream inStream)  
      T decode​(java.io.InputStream inStream, org.apache.beam.sdk.coders.Coder.Context context)  
      void encode​(T value, java.io.OutputStream outStream)  
      void encode​(T value, java.io.OutputStream outStream, org.apache.beam.sdk.coders.Coder.Context context)  
      boolean equals​(@Nullable java.lang.Object other)  
      static org.apache.beam.sdk.coders.CoderProvider getCoderProvider()
      Returns a CoderProvider which uses the ProtoCoder for proto messages.
      java.util.Set<java.lang.Class<?>> getExtensionHosts()  
      com.google.protobuf.ExtensionRegistry getExtensionRegistry()
      Returns the ExtensionRegistry listing all known Protocol Buffers extension messages to T registered with this ProtoCoder.
      java.lang.Class<T> getMessageType()
      Returns the Protocol Buffers Message type this ProtoCoder supports.
      protected com.google.protobuf.Parser<T> getParser()
      Get the memoized Parser, possibly initializing it lazily.
      int hashCode()  
      static <T extends com.google.protobuf.Message> ProtoCoder<T> of(java.lang.Class<T> protoMessageClass)
      Returns a ProtoCoder for the given Protocol Buffers Message.
      static <T extends com.google.protobuf.Message> ProtoCoder<T> of(org.apache.beam.sdk.values.TypeDescriptor<T> protoMessageType)
      Returns a ProtoCoder for the Protocol Buffers Message indicated by the given TypeDescriptor.
      void verifyDeterministic()  
      ProtoCoder<T> withExtensionsFrom​(java.lang.Class<?>... moreExtensionHosts)
      ProtoCoder<T> withExtensionsFrom​(java.lang.Iterable<java.lang.Class<?>> moreExtensionHosts)
      Returns a ProtoCoder like this one, but with the extensions from the given classes registered.
      • Methods inherited from class org.apache.beam.sdk.coders.CustomCoder

        getCoderArguments
      • Methods inherited from class org.apache.beam.sdk.coders.Coder

        consistentWithEquals, getEncodedElementByteSize, getEncodedTypeDescriptor, isRegisterByteSizeObserverCheap, registerByteSizeObserver, structuralValue, verifyDeterministic, verifyDeterministic
      • Methods inherited from class java.lang.Object

        clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • ProtoCoder

        protected ProtoCoder​(java.lang.Class<T> protoMessageClass,
                             java.util.Set<java.lang.Class<?>> extensionHostClasses)
        Protected constructor, intended for subclasses such as DynamicProtoCoder. Use of(...) to obtain instances.
    • Method Detail

      • of

        public static <T extends com.google.protobuf.Message> ProtoCoder<T> of​(java.lang.Class<T> protoMessageClass)
        Returns a ProtoCoder for the given Protocol Buffers Message.
      • of

        public static <T extends com.google.protobuf.Message> ProtoCoder<T> of​(org.apache.beam.sdk.values.TypeDescriptor<T> protoMessageType)
        Returns a ProtoCoder for the Protocol Buffers Message indicated by the given TypeDescriptor.
      • withExtensionsFrom

        public ProtoCoder<T> withExtensionsFrom​(java.lang.Iterable<java.lang.Class<?>> moreExtensionHosts)
        Returns a ProtoCoder like this one, but with the extensions from the given classes registered.

        Each of the extension host classes must be a class automatically generated by the Protocol Buffers compiler, protoc, that contains messages.

        Does not modify this object.

      • encode

        public void encode​(T value,
                           java.io.OutputStream outStream)
                    throws java.io.IOException
        Specified by:
        encode in class org.apache.beam.sdk.coders.Coder<T extends com.google.protobuf.Message>
        Throws:
        java.io.IOException
      • encode

        public void encode​(T value,
                           java.io.OutputStream outStream,
                           org.apache.beam.sdk.coders.Coder.Context context)
                    throws java.io.IOException
        Overrides:
        encode in class org.apache.beam.sdk.coders.Coder<T extends com.google.protobuf.Message>
        Throws:
        java.io.IOException
      • decode

        public T decode​(java.io.InputStream inStream)
                 throws java.io.IOException
        Specified by:
        decode in class org.apache.beam.sdk.coders.Coder<T extends com.google.protobuf.Message>
        Throws:
        java.io.IOException
      • decode

        public T decode​(java.io.InputStream inStream,
                        org.apache.beam.sdk.coders.Coder.Context context)
                 throws java.io.IOException
        Overrides:
        decode in class org.apache.beam.sdk.coders.Coder<T extends com.google.protobuf.Message>
        Throws:
        java.io.IOException
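
As a rough illustration of how the encode and decode overloads without a Context fit together, the sketch below round-trips one message through in-memory streams. It assumes that the Beam protobuf extension and protobuf-java are on the classpath and that ProtoCoder lives in the org.apache.beam.sdk.extensions.protobuf package. The well-known StringValue message stands in for your own message type, and ProtoCoderRoundTrip and roundTrip are hypothetical names.

```java
import com.google.protobuf.StringValue;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.beam.sdk.extensions.protobuf.ProtoCoder;

/** Sketch: encode a message to bytes and decode it back with the same ProtoCoder. */
final class ProtoCoderRoundTrip {

  static StringValue roundTrip(StringValue value) {
    // Assumption: StringValue stands in for any generated Message class.
    ProtoCoder<StringValue> coder = ProtoCoder.of(StringValue.class);
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      coder.encode(value, out);
      return coder.decode(new ByteArrayInputStream(out.toByteArray()));
    } catch (IOException e) {
      // In-memory streams should not fail; rethrow unchecked to keep the sketch short.
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    StringValue original = StringValue.newBuilder().setValue("hello").build();
    StringValue copy = roundTrip(original);
    System.out.println(original.equals(copy));
  }
}
```

Both calls use the same overload family, so the bytes written by encode are in the form that the matching decode expects. Mixing an overload that takes a Context with one that does not is outside the scope of this sketch.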
      • equals

        public boolean equals​(@Nullable java.lang.Object other)
        Overrides:
        equals in class java.lang.Object
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object
      • verifyDeterministic

        public void verifyDeterministic()
                                 throws org.apache.beam.sdk.coders.Coder.NonDeterministicException
        Overrides:
        verifyDeterministic in class org.apache.beam.sdk.coders.CustomCoder<T extends com.google.protobuf.Message>
        Throws:
        org.apache.beam.sdk.coders.Coder.NonDeterministicException
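
The following sketch shows one way to call verifyDeterministic and turn its outcome into a boolean. It assumes that the Beam protobuf extension and protobuf-java are on the classpath and that ProtoCoder lives in the org.apache.beam.sdk.extensions.protobuf package. DeterminismCheck and isDeterministic are hypothetical names. Given the map-field rule described under "ProtoCoder and Determinism", a message with a map field, such as the well-known Struct type, would be expected to fail the check, while a simple message such as StringValue would be expected to pass.

```java
import com.google.protobuf.Message;
import com.google.protobuf.StringValue;
import com.google.protobuf.Struct;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.extensions.protobuf.ProtoCoder;

/** Sketch: report whether the ProtoCoder for a message type passes verifyDeterministic. */
final class DeterminismCheck {

  static <T extends Message> boolean isDeterministic(Class<T> messageType) {
    try {
      ProtoCoder.of(messageType).verifyDeterministic();
      return true;
    } catch (Coder.NonDeterministicException e) {
      // The exception message explains why the coder was rejected.
      return false;
    }
  }

  public static void main(String[] args) {
    // StringValue has a single scalar field; Struct declares a map field.
    System.out.println("StringValue: " + isDeterministic(StringValue.class));
    System.out.println("Struct: " + isDeterministic(Struct.class));
  }
}
```

A check like this is most useful before relying on encoded bytes for equality, for example when a message type is used as a grouping key.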
      • getMessageType

        public java.lang.Class<T> getMessageType()
        Returns the Protocol Buffers Message type this ProtoCoder supports.
      • getExtensionHosts

        public java.util.Set<java.lang.Class<?>> getExtensionHosts()
      • getExtensionRegistry

        public com.google.protobuf.ExtensionRegistry getExtensionRegistry()
        Returns the ExtensionRegistry listing all known Protocol Buffers extension messages to T registered with this ProtoCoder.
      • getParser

        protected com.google.protobuf.Parser<T> getParser()
        Get the memoized Parser, possibly initializing it lazily.
      • getCoderProvider

        public static org.apache.beam.sdk.coders.CoderProvider getCoderProvider()
        Returns a CoderProvider which uses the ProtoCoder for proto messages.

        This method is invoked reflectively from DefaultCoder.