This package provides a KCL application which implements the multi language protocol. The multi language protocol
defines a system for communication between a KCL multi-lang application and another process (referred to as the
"child process") over STDIN and STDOUT of the child process. The units of communication are JSON messages which
represent the actions the receiving entity should perform. The child process is responsible for reacting
appropriately to four different messages: initialize, processRecords, checkpoint, and shutdown. The KCL multi-lang
app is responsible for reacting appropriately to two messages generated by the child process: status and checkpoint.
Action messages sent to child process
{ "action" : "initialize",
"shardId" : "string"
}
{ "action" : "processRecords",
"records" : [{ "data" : "<base64encoded_string>",
"partitionKey" : "<partition key>",
"sequenceNumber" : "<sequence number>"
}] // a list of records
}
{ "action" : "checkpoint",
"checkpoint" : "<sequence number>",
"error" : "<NameOfException>"
}
{ "action" : "shutdown",
"reason" : "<TERMINATE|ZOMBIE>"
}
Action messages sent to KCL by the child process
{ "action" : "checkpoint",
"checkpoint" : "<sequenceNumberToCheckpoint>"
}
{ "action" : "status",
"responseFor" : "<nameOfAction>"
}
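As a rough illustration of the message shapes above, the following Python sketch builds the two messages a child
process sends and parses an incoming one. The helper names (status_message, checkpoint_message, parse_message) are
hypothetical and not part of the KCL; only the JSON field names come from the samples above.

```python
import json


def status_message(action_name):
    """Build the single-line status message acknowledging `action_name`."""
    return json.dumps({"action": "status", "responseFor": action_name})


def checkpoint_message(sequence_number):
    """Build the single-line checkpoint request for `sequence_number`."""
    return json.dumps({"action": "checkpoint", "checkpoint": sequence_number})


def parse_message(line):
    """Parse one line received on STDIN into a dict (hypothetical helper)."""
    message = json.loads(line)
    if "action" not in message:
        raise ValueError("message has no 'action' field")
    return message
```

json.dumps emits no line breaks by default, so each message fits on one line as the protocol requires.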
High Level Description Of Protocol
The child process will be started by the KCL multi-lang application. There will be one child process for each shard
that this worker is assigned to. The multi-lang app will send an initialize, processRecords, or shutdown message upon
invocation of its corresponding methods. Each message will be on a single line, and messages will be separated by
newlines. The child process is expected to read these messages off its STDIN line by line. The child process must
respond over its STDOUT with a status message indicating that it has finished performing the most recent action. The
multi-lang daemon will not begin to send another message until it has received the response for the previous message.
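The request/response cycle described above could be sketched in Python roughly as follows. This is a minimal sketch,
not the KCL's own client code: the function name run and the handle callback are assumptions made for illustration,
and the streams are passed in as parameters so the loop can be exercised without a real parent process.

```python
import io
import json


def run(stdin, stdout, handle):
    """Read one action per line, handle it, then acknowledge it with a status message."""
    while True:
        line = stdin.readline()
        if not line:              # EOF: the parent closed our STDIN
            break
        if not line.strip():      # ignore blank lines (an assumption, not required by the protocol)
            continue
        message = json.loads(line)
        handle(message)           # hypothetical callback supplied by the application
        reply = {"action": "status", "responseFor": message["action"]}
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()            # the daemon waits for this line before sending the next action
```

In a real child process this would be called as run(sys.stdin, sys.stdout, handler). Flushing after every write
matters because the daemon does not send the next message until it has read the status line.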
Checkpointing Behavior
The child process may send a checkpoint message at any time after receiving a processRecords or shutdown action and
before sending the corresponding status message back to the processor. After sending a checkpoint message over STDOUT,
the child process is expected to immediately begin to read its STDIN, waiting for the checkpoint result message from
the KCL multi-lang processor.
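A single checkpoint exchange might look like the Python sketch below. The function name checkpoint is hypothetical,
and the sketch assumes that a missing or null "error" field in the result message means the checkpoint succeeded;
the text above does not spell that out.

```python
import io
import json


def checkpoint(stdin, stdout, sequence_number):
    """Send a checkpoint request, then block on STDIN for the result.

    Returns the error name from the result message, or None if no error was
    reported (assumption: an absent or null "error" field means success).
    """
    request = {"action": "checkpoint", "checkpoint": sequence_number}
    stdout.write(json.dumps(request) + "\n")
    stdout.flush()
    line = stdin.readline()       # the next line should be the checkpoint result
    if not line:
        raise EOFError("STDIN closed while waiting for the checkpoint result")
    result = json.loads(line)
    if result.get("action") != "checkpoint":
        raise RuntimeError("expected a checkpoint result, got %r" % result.get("action"))
    return result.get("error")
```

Because the daemon sends nothing else until it receives the pending status message, the line read here should be the
checkpoint result rather than a new action.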
Protocol From Child Process Perspective
Initialize
- Read an "initialize" action from STDIN
- Perform initialization steps
- Write "status" message to STDOUT to indicate you are done.
- Begin reading line from STDIN to receive next action
ProcessRecords
- Read a "processRecords" action from STDIN
- Perform processing tasks (you may write a checkpoint message at any time)
- Write "status" message to STDOUT to indicate you are done.
- Begin reading line from STDIN to receive next action
Shutdown
- Read a "shutdown" action from STDIN
- Perform shutdown tasks (you may write a checkpoint message at any time)
- Write "status" message to STDOUT to indicate you are done.
- Begin reading line from STDIN to receive next action
Checkpoint
- Read a "checkpoint" action from STDIN
- Decide whether to checkpoint again based on whether there is an error or not.
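The decision in the last step could be sketched in Python as below. Everything here is an assumption made for
illustration: the function name checkpoint_with_retry, the send_checkpoint callable (which returns an error name or
None), the caller-supplied set of retryable error names, and the attempt limit. The protocol itself does not
prescribe a retry policy.

```python
def checkpoint_with_retry(send_checkpoint, sequence_number, retryable, max_attempts=3):
    """Checkpoint again only while the reported error is one the caller considers retryable.

    `send_checkpoint` is a hypothetical callable that performs one checkpoint
    exchange and returns the error name, or None on success.
    Returns None on success, otherwise the last error name seen.
    """
    error = None
    for _ in range(max_attempts):
        error = send_checkpoint(sequence_number)
        if error is None or error not in retryable:
            break                 # success, or an error that retrying will not fix
    return error
```

The caller decides which error names are worth retrying; an error outside that set is returned after the first
attempt so the child process can move on and send its status message.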
Base 64 Encoding
The "data" field of the processRecords action message is an array of arbitrary bytes. To send this in a JSON string
we apply base 64 encoding which transforms the byte array into a string (specifically this string doesn't have JSON
special symbols or new lines in it). The multi-lang processor will use the Jackson library which uses a variant of
MIME called MIME_NO_LINEFEEDS (see Jackson doc for more details). MIME is the basis of most base64 encoding variants,
including RFC 3548, which is the standard used by Python's base64 module.
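On the child-process side, recovering the original bytes might look like this in Python. The function name
decode_records is hypothetical; base64.b64decode is the standard-library call, and the field names follow the
processRecords sample above.

```python
import base64


def decode_records(message):
    """Return the raw bytes of every record in a parsed processRecords message."""
    return [base64.b64decode(record["data"]) for record in message["records"]]
```

The result is a list of bytes objects; interpreting them (for example as UTF-8 text) is up to the application, since
the "data" field carries arbitrary bytes.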