citrusleaf the noSQL database for the enterprise

Wire Protocol

Overview

This document describes the Citrusleaf wire protocol. The protocol is a purpose-built, binary-oriented protocol for getting and setting values in a Citrusleaf cluster.

While Citrusleaf provides example clients in a number of languages, the primary supported interface is this protocol. Customers are encouraged to modify the provided clients to suit their needs and environment. Citrusleaf will extend this protocol with new services and features, and these will be added in a backward-compatible way.

The protocol is a self-parsing binary protocol layered over TCP. Self-parsing is the method of using length specifiers in such a way that new information – fields, operations – can be added to the protocol, but old clients and servers can still parse the data stream effectively, responding with errors when they see messages they don't understand.

Citrusleaf Protocol Header

Every message begins with a fixed header section, which contains three fields: "version", "type", and "length". Two message types are currently supported: "Citrusleaf Info" and "Citrusleaf Message". Depending on the "type", the appropriate message payload follows the fixed header section.

"Citrusleaf Info" is a simple, string-oriented message-type that allows a client to check the status of a cluster node, get information about the capabilities of the node, and explore the cluster.

"Citrusleaf Message" (AS_MSG) is a high-performance, efficient, binary-oriented message for getting and setting data within the cluster. The protocol supports pipelining but not out-of-order responses.

Where byte order matters, multi-byte values always use network byte order (big-endian).

offset  length (bytes)  meaning
0       1               version; the current version is 2
1       1               type; currently defined types are 1 ("Citrusleaf Info") and 3 ("Citrusleaf Message")
2       6               length; number of bytes to follow, in network byte order
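
The header layout above can be sketched in Python as follows. This is a minimal illustration, not a reference client: the function names are hypothetical, the length is assumed to be an unsigned integer, and the payload bytes in the usage lines are placeholders rather than a real request.

```python
import io

PROTO_VERSION = 2
PROTO_TYPE_INFO = 1      # "Citrusleaf Info"
PROTO_TYPE_MESSAGE = 3   # "Citrusleaf Message"
PROTO_HEADER_SIZE = 8    # 1 (version) + 1 (type) + 6 (length)


def pack_proto_header(msg_type, payload_len):
    """Build the 8-byte fixed header for a payload of payload_len bytes."""
    if not 0 <= payload_len < (1 << 48):
        raise ValueError("length must fit in 6 bytes")
    # The 6-byte length is written in network byte order (big-endian).
    return bytes([PROTO_VERSION, msg_type]) + payload_len.to_bytes(6, "big")


def unpack_proto_header(buf):
    """Split the fixed header into (version, type, length)."""
    if len(buf) < PROTO_HEADER_SIZE:
        raise ValueError("short header")
    return buf[0], buf[1], int.from_bytes(buf[2:8], "big")


def read_message(stream):
    """Read one framed message from a binary file-like object."""
    version, msg_type, length = unpack_proto_header(stream.read(PROTO_HEADER_SIZE))
    payload = stream.read(length)
    if len(payload) != length:
        raise ValueError("truncated payload")
    return version, msg_type, payload


# Usage: frame a placeholder payload and read it back from an in-memory stream.
framed = pack_proto_header(PROTO_TYPE_INFO, 5) + b"hello"
print(read_message(io.BytesIO(framed)))
```

Because the length is read before the payload, a receiver can skip a message of a type it does not understand and stay synchronized with the stream. With a TCP socket, `sock.makefile("rb")` would provide a suitable stream object.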

Citrusleaf Info – (type 1)

This simple but critical message type allows clients to determine the capabilities of each node in the Citrusleaf cluster using a simple name-value pattern.

Besides providing version and feature information, the "info" request enables a client to discover all nodes in the cluster. This allows a client to continually adjust to changes in cluster membership without requiring application-level configuration changes.

The message is composed of name-value pairs. All values are case-sensitive and UTF-8 encoded, although every effort will be made to keep the names and values ASCII (seven-bit safe). If the value represents a list, the usual separator is a semicolon (';'). Examples of the currently supported name-value pairs are listed below.

Example name/value

name            value(s)
build           Aerospike-0.9-nnnn (where 'nnnn' is the build number)
node            Unique string representing this node. Currently a 64-bit hexadecimal number.
services        Semicolon-delimited list of addresses where other nodes are found.
statistics      Semicolon-delimited list of statistics for this node.
replicas-read   List of read replicas hosted by this node. List of form: namespace:partition_id;
replicas-write  List of write replicas hosted by this node. List of form: namespace:partition_id;
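
As a rough sketch of how a client might split these list values, the following assumes that the value string has already been extracted from the info response (the framing of names and values inside the payload is not covered here). The helper names, the namespace "test", and the sample addresses are hypothetical.

```python
def parse_list(value):
    """Split a semicolon-delimited info value, ignoring a trailing separator."""
    return [item for item in value.split(";") if item]


def parse_replicas(value):
    """Parse 'namespace:partition_id;' entries into (namespace, partition_id)."""
    replicas = []
    for item in parse_list(value):
        namespace, sep, partition_id = item.rpartition(":")
        if not sep:
            raise ValueError("expected namespace:partition_id, got %r" % item)
        replicas.append((namespace, int(partition_id)))
    return replicas


# Usage with made-up values.
print(parse_list("10.0.0.1:3000;10.0.0.2:3000"))
print(parse_replicas("test:0;test:1;test:17;"))
```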

Citrusleaf Message (type 3 – AS_MSG)

A Citrusleaf Message is the basic message structure used to request an object read or write, as well as other functions such as scans. The client sends a request, and the server sends a response.

The request message contains a small, fixed-size header, followed by a variable number of "fields" (see the "Fields" section below), and then a variable number of operations (see the "Operations" section below). In case of future expansion, the value of the 'header_size' field will increase, and extra header values will be added to the end of the fixed header.

Citrusleaf Message: Header Section

offset  name             size      description
0       header_size      1 byte    number of bytes in this header; currently 22
1       info1            1 byte    bitfield of READ flags
2       info2            1 byte    bitfield of WRITE flags
3       info3            1 byte    bitfield of RESPONSE flags
4       unused           1 byte    an unused byte
5       result_code      1 byte    on a response, whether the request succeeded or failed; 0 on requests
6       generation       4 bytes   on request, apply this transaction only if the generation matches; on response, the current generation of this row
10      expiration       4 bytes   on request, set the expiration of this row to this number of seconds in the future (from now); 0 means no expiration
14      transaction ttl  4 bytes   on request, set transaction ttl
18      n_field          2 bytes   number of fields to follow in the data payload, which will be first
20      n_ops            2 bytes   number of operations to follow in the data payload, which will follow the fields
22      data[]           variable  contains first the fields, then the ops
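
One way to express this 22-byte header is with Python's `struct` module. The sketch below assumes every integer is unsigned and big-endian. The function and tuple names are hypothetical, and `transaction_ttl` stands in for the "transaction ttl" entry.

```python
import struct
from collections import namedtuple

# 6 single bytes, 3 four-byte integers, 2 two-byte integers = 22 bytes.
MSG_HEADER_FORMAT = ">BBBBBBIIIHH"
MSG_HEADER_SIZE = struct.calcsize(MSG_HEADER_FORMAT)

MsgHeader = namedtuple(
    "MsgHeader",
    "header_size info1 info2 info3 unused result_code "
    "generation expiration transaction_ttl n_field n_ops",
)


def pack_msg_header(info1=0, info2=0, info3=0, generation=0,
                    expiration=0, transaction_ttl=0, n_field=0, n_ops=0):
    """Build a request header; unused and result_code are 0 on requests."""
    return struct.pack(MSG_HEADER_FORMAT, MSG_HEADER_SIZE, info1, info2, info3,
                       0, 0, generation, expiration, transaction_ttl,
                       n_field, n_ops)


def unpack_msg_header(buf):
    """Return (header, data), where data holds the fields and then the ops."""
    header = MsgHeader._make(struct.unpack_from(MSG_HEADER_FORMAT, buf))
    # Skip by header_size rather than a fixed 22, so that a longer header
    # from a newer peer can still be stepped over.
    return header, buf[header.header_size:]


# Usage: a header announcing two fields and one op, followed by dummy data.
header, data = unpack_msg_header(pack_msg_header(info1=1, n_field=2, n_ops=1) + b"xyz")
print(header.header_size, header.n_field, header.n_ops, data)
```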

The 'info1' field is a set of flags describing the read side of the transaction. These flags apply to the entire transaction, whereas each bin may carry its own operation.

value  name                         description
1      CL_MSG_INFO1_READ            contains a read operation
2      CL_MSG_INFO1_GET_ALL         get the data of all bins
4      CL_MSG_INFO1_GET_ALL_NODATA  get all bins WITHOUT data
8      CL_MSG_INFO1_VERIFY          verify is a get transaction that includes data
16     CL_MSG_INFO1_XDR             operation is performed by XDR
32     CL_MSG_INFO1_NOBINDATA       do not read the bin information

The 'info2' field is a set of flags describing the write side of the transaction. These flags apply to the entire transaction, whereas each bin may carry its own operation.

value  name                          description
1      CL_MSG_INFO2_WRITE            contains a write operation
2      CL_MSG_INFO2_DELETE           delete the record
4      CL_MSG_INFO2_GENERATION       pay attention to the generation
8      CL_MSG_INFO2_GENERATION_GT    apply write if new generation >= old, good for restore
16     CL_MSG_INFO2_GENERATION_DUP   if a generation collision, create a duplicate
32     CL_MSG_INFO2_WRITE_UNIQUE     write only if it doesn't exist
64     CL_MSG_INFO2_WRITE_BINUNIQUE  write only if the bin doesn't exist already

The 'info3' field is a set of response flags. These flags apply to the entire transaction, whereas each bin may carry its own operation.

value  name                description
1      CL_MSG_INFO3_LAST   this is the last of a multi-part message
2      CL_MSG_INFO3_TRACE  apply server trace logging for this transaction

Note that in many cases the read and write flags must be combined with other flags. For example, if a record is to be deleted, the write bit must also be set. Similarly, for a 'get all' operation, the 'read' bit must be set.
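
The two combinations mentioned in the note above can be written as bitwise ORs. The sketch below uses the flag values from the tables. The helper function names are hypothetical, and pairing the generation check with a delete is only an illustration of adding a further flag.

```python
CL_MSG_INFO1_READ = 1
CL_MSG_INFO1_GET_ALL = 2

CL_MSG_INFO2_WRITE = 1
CL_MSG_INFO2_DELETE = 2
CL_MSG_INFO2_GENERATION = 4


def info_flags_for_get_all():
    """Return (info1, info2) for a 'get all': the read bit must accompany it."""
    return CL_MSG_INFO1_READ | CL_MSG_INFO1_GET_ALL, 0


def info_flags_for_delete(check_generation=False):
    """Return (info1, info2) for a delete: the write bit must accompany it."""
    info2 = CL_MSG_INFO2_WRITE | CL_MSG_INFO2_DELETE
    if check_generation:
        info2 |= CL_MSG_INFO2_GENERATION
    return 0, info2


print(info_flags_for_get_all())
print(info_flags_for_delete())
```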

A single request can contain both read and write operations. The writes are applied before the reads.

Citrusleaf Message: Fields

Fields are per-request information that identifies the record to be operated on. Typical values are the namespace, the key (or digest), the set, and other information to determine the record. Generally multiple fields are required to uniquely identify a (list of) record(s).

offset  name        size    description
0       size        4       number of bytes to follow (field_type plus data)
4       field_type  1       type of data; the allowed types are described below
5       data        size-1  data

Allowed "field" types

value  name                                 description
0      CL_MSG_FIELD_TYPE_NAMESPACE          namespace
1      CL_MSG_FIELD_TYPE_SET                a particular "set" within the namespace
2      CL_MSG_FIELD_TYPE_KEY                the key
4      CL_MSG_FIELD_TYPE_DIGEST_RIPE        the RIPEMD160 digest representing the key (20 bytes)
6      CL_MSG_FIELD_TYPE_DIGEST_RIPE_ARRAY  an array of digests
7      CL_MSG_FIELD_TYPE_TRID               transaction id
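
A field can be encoded and decoded along the following lines. This is a sketch under stated assumptions: the size is treated as an unsigned big-endian integer, the function names are hypothetical, and the namespace "test" and set "users" are made-up values passed as raw UTF-8 bytes. How the data of a key field is laid out is not specified above, so it is not shown.

```python
import struct

CL_MSG_FIELD_TYPE_NAMESPACE = 0
CL_MSG_FIELD_TYPE_SET = 1


def pack_field(field_type, data):
    """Encode one field; size counts the field_type byte plus the data."""
    return struct.pack(">IB", len(data) + 1, field_type) + data


def unpack_fields(buf, n_field):
    """Decode n_field fields; return them with the offset where the ops begin."""
    fields = []
    offset = 0
    for _ in range(n_field):
        size, field_type = struct.unpack_from(">IB", buf, offset)
        end = offset + 4 + size
        if size < 1 or end > len(buf):
            raise ValueError("bad field size")
        # Unknown field types are still parsed correctly because the size
        # says how far to advance.
        fields.append((field_type, buf[offset + 5:end]))
        offset = end
    return fields, offset


# Usage: two fields followed by bytes that would belong to the operations.
payload = (pack_field(CL_MSG_FIELD_TYPE_NAMESPACE, b"test")
           + pack_field(CL_MSG_FIELD_TYPE_SET, b"users")
           + b"ops...")
fields, ops_offset = unpack_fields(payload, 2)
print(fields, payload[ops_offset:])
```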

Citrusleaf Message: Operations

Operations describe the actions that are to be taken on the specified bin(s) within the identified record.

Operations have the following format:

offset  name                size        description
0       size                4           number of bytes to follow
4       op                  1           operation to apply; the allowed operations are described below
5       bin_data_type       1           type of data to follow
6       binname length (N)  1           size of the bin name to follow
7       binname             N           bin name in UTF-8
7+N     data                size-(N+3)  bin data according to the bin_data_type

Allowed operations

value  name                    description
1      CL_MSG_OP_READ          read the value in question
2      CL_MSG_OP_WRITE         write the value in question
3      CL_MSG_OP_WRITE_UNIQUE  write a namespace-wide unique value
5      CL_MSG_OP_ADD           In case of integer, add to existing value
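
The operation layout can be sketched in the same style. The function names are hypothetical and the size is assumed to be unsigned and big-endian. The bin_data_type values are not listed in this document, so `PLACEHOLDER_DATA_TYPE` is an invented value used only to fill the byte, and the bin name "bin" and its data are made up.

```python
import struct

CL_MSG_OP_READ = 1
CL_MSG_OP_WRITE = 2

PLACEHOLDER_DATA_TYPE = 1  # hypothetical; real type codes are not given here


def pack_op(op, bin_data_type, bin_name, data=b""):
    """Encode one operation on a single bin."""
    name = bin_name.encode("utf-8")
    if len(name) > 255:
        raise ValueError("bin name length must fit in one byte")
    # size counts op, bin_data_type, name length, the name, and the data.
    size = 3 + len(name) + len(data)
    return struct.pack(">IBBB", size, op, bin_data_type, len(name)) + name + data


def unpack_op(buf, offset=0):
    """Decode one operation; return it with the offset of the next one."""
    size, op, bin_data_type, name_len = struct.unpack_from(">IBBB", buf, offset)
    data_start = offset + 7 + name_len
    end = offset + 4 + size
    if size < 3 + name_len or end > len(buf):
        raise ValueError("bad operation size")
    name = buf[offset + 7:data_start].decode("utf-8")
    return (op, bin_data_type, name, buf[data_start:end]), end


# Usage: a write followed by a read of the same bin.
ops = (pack_op(CL_MSG_OP_WRITE, PLACEHOLDER_DATA_TYPE, "bin", b"*")
       + pack_op(CL_MSG_OP_READ, PLACEHOLDER_DATA_TYPE, "bin"))
first, next_offset = unpack_op(ops)
second, _ = unpack_op(ops, next_offset)
print(first, second)
```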