technology
ACID Compliance
The Citrusleaf database platform is unique in its ability to deliver extremely high throughput of ACID transactions using commodity hardware.
Citrusleaf is intended to outperform traditional databases by an order of magnitude in the mission-critical environments where you need that performance most: e.g., the high volume of frequently updated data that drives the front end of your business. Citrusleaf is optimized to work with the latest in storage and database technology to squeeze as much transaction throughput as possible while still guaranteeing strong consistency (ACID) to make application development easy.
Atomicity
For read/write operations on a single record, Citrusleaf makes strict guarantees about the atomicity of these operations:
- Each operation on a record is applied atomically and completely. For example, a read from or a write to multiple bins in a record is guaranteed a consistent view of the record.
- After a write is completely applied and the client is notified of success, all subsequent read requests are guaranteed to find the newly written data: there is no possibility of reading stale data. Therefore, Citrusleaf transactions provide immediate consistency.
In addition to single record operations, Citrusleaf supports distributed multi-key transactions using a simple and fast iteration interface where a client can simply request all or part of the data in a particular set. This mechanism is currently used for database backup and basic analytics on the data and delivers extremely high throughput. Single key transactions are serialized with respect to a multi-key transaction but two multi-key operations may not be serialized with each other.
Consistency
For operations on single keys, Citrusleaf provides provides immediate consistency using synchronous replication.
Multi-key transactions are implemented as a sequence of single key operations and do not hold record locks except for the time required to read a clean copy. Thus the multi-key transaction provides a consistent snapshot of the data in the database (i.e., no "dirty reads" are done).
Citrusleaf's support for relaxing consistency models gives operators the ability to maintain high performance during the cluster recovery phase after node failure. E.g., read and write transactions to records that have unmerged duplicates in the cluster can be sped up by bypassing the duplicate merge phase.
In the presence of failures, the cluster can run in one of two modes - Partition Tolerant or High Consistency. The difference between the two modes is seen only for a brief period during cluster recovery after a node failure. Citrusleaf is highly consistent when the cluster does not split.
Partition Tolerant Mode
In Partition Tolerant mode, when a cluster splits, each faction of the cluster continues operating. One faction - or the other - may not have all of the data, so an application reading data may have successful transactions stating that data is not found in the cluster. Each faction will being the process of obeying the replication factor rules, thus replicating data, and may accept writes from clients. Application servers which read from the other faction will not see the applied writes, and may write to the same primary keys. If, at a later point, the factions rejoin, data which has been written in both factions will be detected as inconsistent. Two policies may be followed. Either Citrusleaf will 'auto-merge' the data by favoring the last write (the write with the latest server timestamp), or both copies of the data will be retained. If two copies - versions - of the data are available in the cluster, a read of this value will return both versions, allowing the application to resolve the inconsistency. The client application - the only entity with knowledge of how to resolve these differences - must then re-write the data in a consistent fashion.
High Consistency Mode
In High Consistency mode, when the cluster splits, the minority quorum could be made to halt. This action prevents any client from receiving inconsistent data, but will reduce availability.
Isolation
Citrusleaf implements distributed isolation techniques consisting of latches and short-term record locks to ensure isolation between multiple transactions. Therefore, when a read and a write operation for a record are pending simultaneously, they will be internally serialized before completion, though their precise ordering is not guaranteed.
For enabling simple multi-record transactions, Citrusleaf supports an optimistic concurrency control scheme based on atomic conditional operations (CAS - Check and Set), making the very common read-modify-write cycle safe -- without the often-crippling overhead of explicit locking. In many simplistic data storage systems, reading a data element, modifying it, and then writing it back exposes a race condition that could lead to data corruption during highly concurrent access to the data item.
For multi-key operations, one of the cluster nodes anchors the iteration operation and requests data from all the other nodes in parallel. Snapshots are taken of the indexes at various points to allow minimal lock hold times. As data is retrieved in parallel from the working nodes, it is forwarded to the client, with client flow control exerting backpressure on the distributed transaction.
Any client-server system suffers from the client being potentially disconnected from the server at any time. This can result in the client being unable to distinguish whether a transaction in flight has been applied or not. Recovery from this situation may be quite complex. Citrusleaf supports mechanisms for retrying writes and using a client generated unique persistent transaction identifier that enables clients to properly determine if the transaction has completed properly.
Durability
In order to keep your data always available, Citrusleaf provides multi-server replication of your data. The cluster is configured to contain multiple namespaces (like 'databases' within an RDBMs), and each namespace contains configuration of the storage system, and storage policies, for the data contained in that namespace.
The basic mechanisms for providing durability in Citrusleaf consists of replication to multiple nodes using both DRAM and persistent storage. Therefore, every transaction update is written to multiple locations on the cluster before returning to the client. For example, in a persistent namespace that stores data in both DRAM and disk with a replication factor 2, the record is stored in four locations, two copies in DRAM on two nodes and two additional copies on disk.
Citrusleaf is resilient to simultaneous hardware failures.
In the presence of node failures, clients are able to seamlessly retrieve one of the copies of the data from the cluster with no special effort. This is because, in a Citrusleaf cluster, the virtual partitioning and distribution of data within the cluster is completely invisible to the client. Therefore, when client libraries make calls using a lightweight Client API to the Citrusleaf cluster, any node can take requests for any piece of data.
If a cluster node receives a request for a piece of data it does not have locally, it satisfies the request by internally proxying the request, fetching the data from the real owner using the internal cluster interconnect and subsequently replying to the client directly. The Citrusleaf client-server protocol also implements caching of latest known locations of client requested data in the client library itself to minimize the number of network hops required to respond to a client request.
During the period immediately after a cluster node has been added or removed, the Citrusleaf cluster automatically transfers data between the nodes to rebalance and achieve data availability. During this time, Citrusleaf's internal "proxy" transaction tracking allows high-consistency to be achieved by applying reads and writes to the cluster nodes which have the data, even if the data is in motion.
Citrusleaf's backup and restore enables offsite data storage and cross data center portability.
Citrusleaf provides online backup and restore, which, as the name indicates, can be applied while the cluster is in operation. Even though data replication will solve most real-world data center availability issues, an essential tool of any database administrator is the ability to run backup and restore. A Citrusleaf cluster has the ability to iterate all data within a namespace (similar to a map/reduce). The backup and restore tools are typically run on a maintenance machine with a large amount of inexpensive, standard rotational disk.
Citrusleaf backup and restore tools are made available with full source. The file format in use is optimized for high speed but uses an ASCII format, allowing an operator to validate the data inserted into the cluster, and use standard scripts to move data from one datastore to another. The backup tool splits the backup into multiple files to allow restores to occur in parallel from multiple machines in the case of needing a very rapid response to a catastorphic failure event.

