Technology

Citrusleaf is a next-generation system for data storage and access.  It leverages distributed systems technologies to ensure linear scalability and high availability in the face of component failures. Citrusleaf is built as a true shared-nothing service using a pure distributed-systems methodology. Citrusleaf technology has the following key benefits:

  • Distributed consensus and automatic data rebalancing algorithms provide robust self-management of the system during failures.
  • High performance transaction support guarantees data consistency.
  • Support for SSD storage devices results in the lowest possible cost-per-byte.
  • A light-weight data model enables agile application development.
  • An optimized data flow model results in fewer network hops between the application and data storage.
  • Automatic backup and restore protects against the catastrophic loss of data.
  • Built-in load balancing enables clients to dynamically adapt to changes in cluster state.

More detail on each of the above is provided below.

Self Managing Cluster

The set of nodes in a Citrusleaf cluster can change dynamically as new nodes are added and existing nodes are removed, whether through failure or for maintenance. To ensure robust scalability and availability, the cluster is self-managing. To understand how, let's look at the nodes in a little more detail.

Each node runs a copy of the Citrusleaf software on a commodity 64-bit Linux server (no special hardware is required). A Citrusleaf cluster node has three critical functions: storing data, participating in the cluster's distributed consensus system, and migrating data to other nodes as necessary.  Brief notes on each function follow.

Data Storage

Every node stores a roughly equal share of the total set of data; the precise share varies based on data retention policies and other parameters.  Operations on any particular data element are either satisfied locally or transparently routed to the correct node via the internal cluster interconnect.

Distributed Consensus

All of the nodes participate in a distributed consensus algorithm, which is used to ensure agreement on a minimal amount of critical shared state.  The most critical part of this shared state is the list of nodes that are participating in the cluster.  Consequently, every time a node arrives or departs, the consensus algorithm runs to ensure that agreement is reached.  This process takes a fraction of a second.

Data Rebalancing

Once a node has been added to or removed from a cluster, the data needs to be rebalanced among the participating nodes.  Rebalancing is conducted across the internal cluster interconnect, ensures that query volume is distributed evenly across all nodes, and tolerates node failures that occur during the rebalancing itself.  This provides Citrusleaf's most powerful feature: scalability in both performance and capacity can be achieved entirely horizontally.  Furthermore, the system is designed to be continuously available, so the cluster keeps serving requests while data is being rebalanced.

High Performance Transactions

Citrusleaf is not intended to replace your relational database systems.  It is intended to outperform them by an order of magnitude in the environments where you need that performance most, such as the high volume of frequently updated data that drives the front end of your business.  Citrusleaf is optimized to work with the latest in storage and database technology to squeeze out as much transaction throughput as possible while still guaranteeing adequate consistency to make application development easy.

Citrusleaf makes strict guarantees about the atomicity of operations: each operation on a record is applied atomically and completely.  For example, a read from or a write to multiple bins in a record is guaranteed a consistent view of the record.  After a write is completely applied and the client is notified of success, all subsequent read requests are guaranteed to find the newly written data: there is no possibility of reading stale data.  When a read and a write operation for a record are pending simultaneously, they will be internally serialized before completion, but their precise ordering is not guaranteed.  Finally, in most data storage systems, reading a data element, modifying it, and then writing it back exposes a race condition that could lead to data corruption.  Citrusleaf, however, supports atomic conditional operations, making the very common read-modify-write cycle safe -- without the often-crippling overhead of explicit locking.
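
To make the read-modify-write point concrete, here is a minimal sketch of one common way a conditional write can work: the store keeps a version counter per record and accepts a write only if the counter still matches what the writer read. This is an in-memory toy, not the Citrusleaf client API; `ToyStore`, `put_if_generation`, `GenerationMismatch`, and `increment` are hypothetical names, and the internal lock merely stands in for the serialization a server would perform.

```python
import threading


class GenerationMismatch(Exception):
    """Raised when a conditional write finds the record already changed."""


class ToyStore:
    """In-memory stand-in for a record store (hypothetical, for illustration)."""

    def __init__(self):
        self._records = {}             # key -> (generation, bins)
        self._lock = threading.Lock()  # stands in for server-side serialization

    def get(self, key):
        """Return (generation, copy of bins); a missing record is generation 0."""
        with self._lock:
            generation, bins = self._records.get(key, (0, {}))
            return generation, dict(bins)

    def put_if_generation(self, key, bins, expected_generation):
        """Write only if nobody has modified the record since it was read."""
        with self._lock:
            generation, _ = self._records.get(key, (0, {}))
            if generation != expected_generation:
                raise GenerationMismatch(key)
            self._records[key] = (generation + 1, dict(bins))


def increment(store, key, bin_name):
    """Read-modify-write that retries on conflict instead of holding a lock."""
    while True:
        generation, bins = store.get(key)
        bins[bin_name] = bins.get(bin_name, 0) + 1
        try:
            store.put_if_generation(key, bins, generation)
            return bins[bin_name]
        except GenerationMismatch:
            continue  # another writer got in first: re-read and try again


store = ToyStore()
for _ in range(3):
    increment(store, "alovelace", "failedLogins")
print(store.get("alovelace"))
```

The caller never holds a lock across the read and the write; a conflicting update is detected at write time and the cycle is simply repeated.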

Lowest Cost Storage

Citrusleaf supports data stored in RAM or directly attached storage. The number of nodes and amount of storage in each node must be sized based on the number of objects and replication factor. As the amount of memory in a node increases, the size of the cluster backplane network interface must also increase.

SSDs are the practical choice for locally attached storage. Citrusleaf has integrated with and validated the Intel X25-E. Citrusleaf's internal software methodology works best with the latency patterns inherent in enterprise SSDs.

Choosing hardware for the best ROI means weighing the hardware you already have against your data load. Citrusleaf can be installed and tried on any Linux server, but please feel free to contact us for help with larger-scale planning or regarding support for other advanced storage technologies.

Data Model

The Citrusleaf data model eschews the traditional schema-based relational database model in favor of a much more lightweight approach.  Let's see how it works.

In Citrusleaf, all data is aggregated into policy containers called namespaces, one or more of which are configured when the cluster is started (see here for more information on cluster startup). Any piece of data in a namespace is uniquely identified by its ''set'' and its ''key''.  A key is a reference to a piece of data: common keys include usernames and session identifiers.  A set is a grouping of common keys.  A key is unique within a set, but the same key could be reused in different sets.

The data referenced by the combination of a set and a key is called a ''record,'' and is organized as a collection of ''bins,'' which are just named values.  In some sense, a bin is analogous to a database column, and all the bins in a record comprise a row.  For example, the record described above (corresponding to the username `alovelace` in the set `Users`) contains six bins.

The contents of bins, or ''values,'' are typed.  The record above illustrates this: some bins contain strings (`username`, `passwordHash`, `password`), some contain integers (`failedLogins`), others contain timestamps (`birthdate`), and still others contain binary data (`picture`).  The available types correspond directly to the most common data types used in software development, and are mapped by the Citrusleaf library directly into the native representation of the calling language where possible.

Citrusleaf's data model is completely schema-free.  If you decide that you need a new set or bin to support a feature, just start using it. It's that simple!
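
The namespace, set, key, and bin vocabulary can be pictured with plain dictionaries. The sketch below is only a mental model, not the Citrusleaf client API: the `put` and `get` helpers, the namespace name `example_ns`, the `Sessions` set, the `lastLogin` and `token` bins, and all bin values other than the birthdate are made up for illustration.

```python
from datetime import datetime

# namespace -> {(set, key) -> record}; a record maps bin names to typed values.
namespaces = {"example_ns": {}}


def put(namespace, set_name, key, bins):
    """Create the record if needed and merge in the given bins."""
    record = namespaces[namespace].setdefault((set_name, key), {})
    record.update(bins)


def get(namespace, set_name, key):
    """Return the record for (set, key), or None if it does not exist."""
    return namespaces[namespace].get((set_name, key))


# The six bins named in the text, with placeholder values of the matching types.
put("example_ns", "Users", "alovelace", {
    "username": "alovelace",             # string
    "passwordHash": "placeholder-hash",  # string
    "password": "placeholder",           # string
    "failedLogins": 0,                   # integer
    "birthdate": datetime(1815, 12, 10), # timestamp
    "picture": b"\x00\x01\x02",          # binary data
})

# Schema-free: a new bin or a new set needs no declaration, just use it.
put("example_ns", "Users", "alovelace", {"lastLogin": datetime(2010, 1, 1)})

# The same key in a different set refers to a different record.
put("example_ns", "Sessions", "alovelace", {"token": "placeholder-token"})

print(sorted(get("example_ns", "Users", "alovelace")))
```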

Data Flow Model

The following figure describes the data flow in a Citrusleaf system.

As we mentioned earlier, all data in Citrusleaf is aggregated into namespaces.  Apart from being used to isolate unrelated application data from each other, namespaces are also used to control retention and reliability requirements for a given set of data.  For example, one of the most important policies is the ''replication factor,'' which controls the number of stored copies of every piece of data; this lets you trade increased storage requirements for improved resiliency to simultaneous hardware failures.
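
The storage side of that trade-off is simple arithmetic. The following back-of-the-envelope sketch assumes every copy costs the same number of bytes and that data is spread evenly across nodes; it ignores index overhead and free-space headroom, and the function names and sample figures are illustrative only.

```python
def total_storage_bytes(num_objects, avg_object_bytes, replication_factor):
    """Raw bytes needed to hold every copy of every object."""
    return num_objects * avg_object_bytes * replication_factor


def storage_per_node_bytes(num_objects, avg_object_bytes,
                           replication_factor, num_nodes):
    """Per-node share of the raw total, rounded up to a whole byte."""
    total = total_storage_bytes(num_objects, avg_object_bytes, replication_factor)
    return -(-total // num_nodes)  # ceiling division on integers


# Hypothetical workload: 100 million objects of about 1,000 bytes on 4 nodes.
for rf in (1, 2, 3):
    per_node = storage_per_node_bytes(100_000_000, 1_000, rf, 4)
    print(f"replication factor {rf}: {per_node / 1e9:.0f} GB per node")
```

Going from a replication factor of 2 to 3 in this example raises the per-node requirement from 50 GB to 75 GB, which is the price of keeping one more copy of every record.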

In a Citrusleaf cluster, the contents of a namespace are partitioned by key value and the associated records are spread across every node in the cluster.  This virtual partitioning is entirely automatic and is completely invisible to the client. Therefore, when client libraries make calls using a lightweight Client API to the Citrusleaf cluster, any node can take requests for any piece of data.
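
A minimal sketch of key-based partitioning, under assumptions that are not taken from Citrusleaf itself: the hash function (SHA-256), the partition count (4096), and the round-robin assignment of partitions to nodes are all illustrative choices. The point is only that a record's location follows deterministically from its set and key, so the caller never has to choose a node.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4096  # assumed value, chosen for illustration


def partition_id(set_name, key):
    """Hash the set and key to a stable partition number."""
    digest = hashlib.sha256(f"{set_name}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def build_partition_map(nodes):
    """Assign partitions to nodes round-robin (a toy assignment policy)."""
    ordered = sorted(nodes)
    return {pid: ordered[pid % len(ordered)] for pid in range(NUM_PARTITIONS)}


def owner(partition_map, set_name, key):
    """Look up which node holds the record for (set, key)."""
    return partition_map[partition_id(set_name, key)]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
partition_map = build_partition_map(nodes)
print(owner(partition_map, "Users", "alovelace"))
print(Counter(partition_map.values()))
```

Round-robin keeps the sketch short, but it reshuffles most partitions whenever the node list changes; a production rebalancing scheme would aim to move far less data.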

If a cluster node receives a request for a piece of data it does not have locally, it satisfies the request by internally proxying it: the node fetches the data from the real owner over the internal cluster interconnect and then replies to the client directly. The Citrusleaf protocol also caches the latest known locations of client-requested data in the client library itself, minimizing the number of network hops required to respond to a client request.
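
The effect of proxying and client-side location caching on hop count can be shown with a toy model. Everything here is hypothetical (`ToyCluster`, `ToyClient`, the node names, and the hop accounting); it is not the Citrusleaf protocol, only an illustration of why remembering the owner saves a hop on later requests.

```python
class ToyCluster:
    """Nodes hold records locally and proxy reads for records they lack."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}  # node -> local records
        self.owner_of = {}                              # key -> owning node

    def put(self, key, value, owner):
        self.nodes[owner][key] = value
        self.owner_of[key] = owner

    def get_via(self, contact, key):
        """Serve a read sent to `contact`; return (value, owner, hops)."""
        owner = self.owner_of[key]
        if owner == contact:
            return self.nodes[contact][key], owner, 1  # served locally
        # `contact` proxies to the owner over the cluster interconnect.
        return self.nodes[owner][key], owner, 2


class ToyClient:
    """Client that remembers where each key was last found."""

    def __init__(self, cluster, default_node):
        self.cluster = cluster
        self.default_node = default_node
        self.location_cache = {}  # key -> last known owner

    def get(self, key):
        contact = self.location_cache.get(key, self.default_node)
        value, owner, hops = self.cluster.get_via(contact, key)
        self.location_cache[key] = owner  # go straight to the owner next time
        return value, hops


cluster = ToyCluster(["node-a", "node-b"])
cluster.put("alovelace", {"failedLogins": 0}, owner="node-b")
client = ToyClient(cluster, default_node="node-a")
first = client.get("alovelace")   # sent to node-a, proxied to node-b
second = client.get("alovelace")  # sent directly to node-b
print(first[1], second[1])
```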

Backup and Restore

Citrusleaf provides automatic backup and restore. Since the system already implements sophisticated data rebalancing as part of its runtime operation, taking a backup for a hot standby is merely a matter of changing a configuration parameter. Similarly, restoring from a backup can be done by changing a configuration parameter. Please note that restoring from a previous backup should only be done to recover from a catastrophic failure. Most everyday hardware failures are handled natively by the Citrusleaf system and do not require any operator involvement.

Built-in Load Balancing

Applications link against the Citrusleaf client library, which exposes a common set of interfaces for data insertion, retrieval, modification, and deletion.  This library serves a similar purpose to the ODBC and driver layers used to communicate with relational databases, connecting the application to the Citrusleaf cluster as a whole.

The client library has two main jobs: implementing the Citrusleaf TCP protocol and routing each request to the "best" cluster node. It maintains an efficient TCP connection pool to the cluster nodes and knows where individual data elements are stored. Most importantly, the client library dynamically tracks the size and state of the cluster, so no reconfiguration is necessary when cluster nodes are added or removed.

While it is extremely easy to implement applications to use the client libraries (described here), Citrusleaf also exposes a REST API for use with the Cloud Service.
