Wire Protocol
Overview
This document describes the Citrusleaf wire protocol. The protocol is a purpose-built, binary-oriented protocol for getting and setting values in a Citrusleaf cluster.
While Citrusleaf provides example clients in a number of languages, the primary supported interface will be this protocol. Customers are encouraged to modify the provided clients to suit their needs and environment. Citrusleaf will extend this protocol and add services and features. These features will be added in a backward-compatible way.
The protocol is a self-parsing binary protocol layered over TCP. Self-parsing is the method of using length specifiers in such a way that new information – fields, operations – can be added to the protocol, but old clients and servers can still parse the data stream effectively, responding with errors when they see messages they don't understand.
Citrusleaf Protocol Header
The protocol begins with a fixed header section, which contains three fields in this order: "version", "type", and "length". Two message types are currently supported: "Citrusleaf Info" and "Citrusleaf Message". Depending on the "type", the appropriate message payload follows the fixed header section.
"Citrusleaf Info" is a simple, string-oriented message-type that allows a client to check the status of a cluster node, get information about the capabilities of the node, and explore the cluster.
"Citrusleaf Message" (AS_MSG) is a high-performance, efficient, binary-oriented message for getting and setting data within the cluster. The protocol supports pipelining but not out-of-order responses.
Where byte order matters, network byte order (big-endian) always applies.
| offset | meaning | length (in bytes) |
|---|---|---|
| 0 | version - current version is '2' | 1 |
| 1 | type - currently defined types are 1 ("Citrusleaf Info") and 3 ("Citrusleaf Message") | 1 |
| 2 | length - network byte ordering, number of bytes to follow | 6 |
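As a minimal sketch of the layout in the table above, the 8-byte header can be treated as one big-endian 64-bit word: the version in the top byte, the type in the next byte, and the length in the low 48 bits. The function and constant names here are illustrative assumptions, not part of any Citrusleaf client.

```python
import struct

# Values taken from the header table; the constant names are hypothetical.
PROTO_VERSION = 2
PROTO_TYPE_INFO = 1
PROTO_TYPE_MESSAGE = 3


def pack_proto_header(msg_type, length, version=PROTO_VERSION):
    """Build the 8-byte fixed header: version (1), type (1), length (6)."""
    if not 0 <= length < (1 << 48):
        raise ValueError("length must fit in 6 bytes")
    # Network byte order: pack everything as a single big-endian 64-bit word.
    return struct.pack(">Q", (version << 56) | (msg_type << 48) | length)


def unpack_proto_header(header):
    """Return (version, type, length) from the first 8 bytes of a message."""
    (word,) = struct.unpack(">Q", header[:8])
    return word >> 56, (word >> 48) & 0xFF, word & 0xFFFFFFFFFFFF
```

A reader would typically read exactly 8 bytes from the socket, unpack them, and then read `length` more bytes to obtain the payload.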
Citrusleaf Info – (type 1)
This simple but critical message type allows clients to determine the capabilities of each node in the Citrusleaf cluster using a simple name-value pattern.
Besides providing version and feature information, an "info" request enables a client to discover all nodes in the cluster. This allows a client to continually adjust to changes in cluster membership without requiring application-level configuration changes.
The message consists of name-value pairs. All values are case-sensitive and UTF-8 encoded, although every effort will be made to keep the names and values ASCII (seven-bit safe). If the value represents a list, the common separator is a semicolon (';'). The table below lists examples of the currently supported name-value pairs.
Example name/value
| name | value(s) |
|---|---|
| build | Aerospike-0.9-nnnn (where 'nnnn' is the build number) |
| node | Unique string representing this node. Currently a 64-bit hexadecimal number. |
| services | Semicolon-delimited list of addresses where other nodes are found. |
| statistics | Semicolon-delimited list of name=value statistics for this node. |
| replicas-read | List of read replicas hosted by this node. List of form: namespace:partition_id; |
| replicas-write | List of write replicas hosted by this node. List of form: namespace:partition_id; |
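To illustrate the semicolon-separated list convention, here is a small sketch that splits a replica list of the form `namespace:partition_id;` into pairs. The function name and the sample value `test:0;test:1;` are made up for illustration, and the sketch assumes the partition id is a decimal integer, which the text above does not state.

```python
def parse_replicas(value):
    """Split 'namespace:partition_id;...' into (namespace, partition_id) pairs."""
    replicas = []
    for item in value.split(";"):
        if not item:
            continue  # a trailing ';' leaves an empty last item
        # Split on the last ':' so the namespace is everything before it.
        namespace, _, partition_id = item.rpartition(":")
        replicas.append((namespace, int(partition_id)))  # assumes decimal ids
    return replicas


# Hypothetical sample value, not captured from a real node.
print(parse_replicas("test:0;test:1;"))
```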
Citrusleaf Message (type 3 – AS_MSG)
A Citrusleaf Message is the basic message structure used to request an object read/write, as well as functionalities such as scan. The client sends a request, and the server sends a response.
The request message contains a small, fixed size header, followed by a variable number of "fields" (See "Fields" section), and then followed by a variable number of operations (See "Operations" section below). In case of future expansion, the 'header_size' field will increase, and extra header values will be added to the end of the fixed header.
Citrusleaf Message: Header Section
| offset | name | size | description |
|---|---|---|---|
| 0 | header_size | 1 byte | number of bytes in this header. Currently 22. |
| 1 | info1 | 1 byte | bitfield of READ flags |
| 2 | info2 | 1 byte | bitfield of WRITE flags |
| 3 | info3 | 1 byte | bitfield of RESPONSE flags |
| 4 | unused | 1 byte | an unused byte |
| 5 | result_code | 1 byte | on a response, whether the request succeeded or failed. 0 on requests. |
| 6 | generation | 4 bytes | on request, apply this transaction only if the generation matches. On response, the current generation of this row. |
| 10 | expiration | 4 bytes | on request, set the expiration of this row to this number of seconds in the future (from now). 0 means no expiration. |
| 14 | transaction ttl | 4 bytes | on request, set transaction ttl |
| 18 | n_field | 2 bytes | number of fields to follow in the data payload, which will be first |
| 20 | n_ops | 2 bytes | number of operations to follow in the data payload, which will follow the fields |
| 22 | data[] | variable | data contains first the fields, then the ops |
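The 22-byte layout above maps directly onto a fixed struct format. The following is a sketch only, assuming all multi-byte values are big-endian as stated earlier; the helper names are hypothetical.

```python
import struct
from collections import namedtuple

# header_size, info1, info2, info3, unused, result_code: 1 byte each,
# generation, expiration, transaction ttl: 4 bytes each,
# n_field, n_ops: 2 bytes each.
MSG_HEADER = struct.Struct(">BBBBBBIIIHH")

MsgHeader = namedtuple(
    "MsgHeader",
    "header_size info1 info2 info3 unused result_code "
    "generation expiration transaction_ttl n_field n_ops",
)


def pack_msg_header(info1=0, info2=0, info3=0, generation=0,
                    expiration=0, transaction_ttl=0, n_field=0, n_ops=0):
    """Build a request header; result_code and the unused byte are 0."""
    return MSG_HEADER.pack(MSG_HEADER.size, info1, info2, info3, 0, 0,
                           generation, expiration, transaction_ttl,
                           n_field, n_ops)


def unpack_msg_header(data):
    """Decode the fixed header fields from the start of a message payload."""
    return MsgHeader._make(MSG_HEADER.unpack_from(data))
```

When parsing, it is safer to start reading `data[]` at offset `header_size` rather than at a hard-coded 22, so that a future, longer header is skipped correctly.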
The 'info1' field is a set of flags specifying the overall type of the operation. These flags refer to the entire transaction, while there may be different operations on each bin.
| value | name | description |
|---|---|---|
| 1 | CL_MSG_INFO1_READ | contains a read operation |
| 2 | CL_MSG_INFO1_GET_ALL | get all bins data |
| 4 | CL_MSG_INFO1_GET_ALL_NODATA | get all bins WITHOUT data |
| 8 | CL_MSG_INFO1_VERIFY | Verify get transactions includes data |
| 16 | CL_MSG_INFO1_XDR | operation is performed by XDR |
| 32 | CL_MSG_INFO1_NOBINDATA | do not read the bin information |
The 'info2' field is a set of flags specifying the overall type of the operation. These flags refer to the entire transaction, while there may be different operations on each bin.
| value | name | description |
|---|---|---|
| 1 | CL_MSG_INFO2_WRITE | contains a write operation |
| 2 | CL_MSG_INFO2_DELETE | delete the record |
| 4 | CL_MSG_INFO2_GENERATION | pay attention to the generation |
| 8 | CL_MSG_INFO2_GENERATION_GT | apply write if new generation >= old, good for restore |
| 16 | CL_MSG_INFO2_GENERATION_DUP | if a generation collision, create a duplicate |
| 32 | CL_MSG_INFO2_WRITE_UNIQUE | write only if it doesn't exist |
| 64 | CL_MSG_INFO2_WRITE_BINUNIQUE | write only if the bin doesn't exist already |
The 'info3' field is a set of flags specifying the overall type of the operation. These flags refer to the entire transaction, while there may be different operations on each bin.
| value | name | description |
|---|---|---|
| 1 | CL_MSG_INFO3_LAST | this is the last of a multi-part message |
| 2 | CL_MSG_INFO3_TRACE | apply server trace logging for this transaction |
Note that in many cases the read and write flags must be combined with other flags. For example, if a record is to be deleted, the write bit must also be set. Similarly, for a 'get all' operation, the 'read' bit must be set.
A single request can contain both read and write operations. The writes will be applied before the read operations.
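The flag combinations described above can be sketched with plain bitwise operations. The flag values come from the tables; the helper functions are hypothetical and check only the two rules stated in the text.

```python
# Flag values from the info1 and info2 tables.
CL_MSG_INFO1_READ = 1
CL_MSG_INFO1_GET_ALL = 2
CL_MSG_INFO2_WRITE = 1
CL_MSG_INFO2_DELETE = 2


def get_all_flags():
    """info1 for a 'get all' request: GET_ALL must be combined with READ."""
    return CL_MSG_INFO1_READ | CL_MSG_INFO1_GET_ALL


def delete_flags():
    """info2 for a delete request: DELETE must be combined with WRITE."""
    return CL_MSG_INFO2_WRITE | CL_MSG_INFO2_DELETE


def check_flags(info1, info2):
    """Return a list of violations of the two combination rules."""
    problems = []
    if info1 & CL_MSG_INFO1_GET_ALL and not info1 & CL_MSG_INFO1_READ:
        problems.append("GET_ALL requires READ")
    if info2 & CL_MSG_INFO2_DELETE and not info2 & CL_MSG_INFO2_WRITE:
        problems.append("DELETE requires WRITE")
    return problems
```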
Citrusleaf Message: Fields
Fields are per-request information that identifies the record to be operated on.
Typical values are the namespace, the key (or digest), the set, and other information to determine the record. Generally, multiple fields are required to uniquely identify a (list of) record(s).
| offset | name | size | description |
|---|---|---|---|
| 0 | size | 4 | size of data to follow |
| 4 | field_type | 1 | type of data - the different types allowed are described below |
| 5 | data | size-1 | data |
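As a sketch of this layout, the `size` value covers the one-byte `field_type` plus the data, which is why the data length is `size-1`. The function name is an assumption for illustration.

```python
import struct


def pack_field(field_type, data):
    """Encode one field: 4-byte size, 1-byte type, then the raw data."""
    # size counts everything after the size field: the type byte plus the data.
    return struct.pack(">IB", len(data) + 1, field_type) + data


# Field type 0 carries the namespace; "test" is a made-up namespace name.
encoded = pack_field(0, b"test")
```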
Allowed "field" types
| value | name | description |
|---|---|---|
| 0 | CL_MSG_FIELD_TYPE_NAMESPACE | namespace |
| 1 | CL_MSG_FIELD_TYPE_SET | a particular "set" within the namespace |
| 2 | CL_MSG_FIELD_TYPE_KEY | the key |
| 4 | CL_MSG_FIELD_TYPE_DIGEST_RIPE | the RIPEMD160 digest representing the key (20 bytes) |
| 6 | CL_MSG_FIELD_TYPE_DIGEST_RIPE_ARRAY | an array of digests |
| 7 | CL_MSG_FIELD_TYPE_TRID | transaction id |
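Because every field carries its own length, a reader can step over field types it does not recognize, which is what makes the protocol self-parsing. The sketch below walks a buffer of fields under that assumption; the function name and the short labels in `FIELD_NAMES` are illustrative.

```python
import struct

# Type values from the table above; the short names are for display only.
FIELD_NAMES = {
    0: "namespace",
    1: "set",
    2: "key",
    4: "digest_ripe",
    6: "digest_ripe_array",
    7: "trid",
}


def iter_fields(buf, n_field, offset=0):
    """Yield (field_type, data) for each of n_field fields in buf."""
    for _ in range(n_field):
        size, field_type = struct.unpack_from(">IB", buf, offset)
        end = offset + 4 + size  # size counts the type byte plus the data
        if size < 1 or end > len(buf):
            raise ValueError("truncated or malformed field")
        yield field_type, buf[offset + 5:end]
        offset = end  # the length prefix lets us skip any field type


def known_fields(buf, n_field):
    """Collect recognized fields by name, silently skipping unknown types."""
    return {FIELD_NAMES[t]: d for t, d in iter_fields(buf, n_field)
            if t in FIELD_NAMES}
```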
Citrusleaf Message: Operations
Operations describe the actions that are to be taken on the specified bin(s) within the identified record.
Operations have the following format:
| offset | name | size | description |
|---|---|---|---|
| 0 | size | 4 | number of bytes to follow |
| 4 | op | 1 | operation to apply - allowed types are described below |
| 5 | bin_data_type | 1 | type of data to follow |
| 6 | binname length (N) | 1 | size of bin name to follow |
| 7 | binname | N | bin name in UTF-8 |
| 7+N | data | size-(N+3) | bin data according to the bin_data_type |
Allowed operations
| value | name | description |
|---|---|---|
| 1 | CL_MSG_OP_READ | read the value in question |
| 2 | CL_MSG_OP_WRITE | write the value in question |
| 3 | CL_MSG_OP_WRITE_UNIQUE | write a namespace-wide unique value |
| 5 | CL_MSG_OP_ADD | In case of integer, add to existing value |
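An operation can be encoded the same way as a field. In this sketch, `size` covers the op byte, the data type byte, the bin name length byte, the bin name, and the data, matching the `size-(N+3)` data length in the format table. The `bin_data_type` values are not listed in this document, so the values used here are placeholders, and the function name is hypothetical.

```python
import struct

# Operation codes from the table above.
CL_MSG_OP_READ = 1
CL_MSG_OP_WRITE = 2


def pack_op(op, bin_name, bin_data_type=0, data=b""):
    """Encode one operation: size, op, data type, name length, name, data."""
    name = bin_name.encode("utf-8")
    if len(name) > 255:
        raise ValueError("bin name length must fit in one byte")
    # size counts the 3 one-byte members, the bin name, and the data.
    size = 3 + len(name) + len(data)
    return struct.pack(">IBBB", size, op, bin_data_type, len(name)) + name + data


# Read the bin "a": no data follows; bin_data_type 0 is a placeholder value.
read_op = pack_op(CL_MSG_OP_READ, "a")
```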

