Rick Grehan
Contributing Editor

Review: Aerospike kicks scale-out NoSQL into high gear

reviews
Mar 18, 2015 | 12 mins
Data Management | Databases | NoSQL Databases

Aerospike Server leverages memory and SSDs to bring extremely high throughput to a flexible, scalable key-value database

In this new world of scale-out, clustered NoSQL databases that process thousands — even millions — of transactions per second, good cache management is critical for good performance. Service I/O requests from RAM whenever you can; service I/O requests from spinning disks only when you must.

Aerospike takes this principle to its logical extreme (service all requests from RAM) and even goes a step further: By leveraging the benefits of SSDs, Aerospike amplifies the performance and scalability of an otherwise all-RAM database. Aerospike does not completely ignore the utility of spinning disks, but “traditional” hard disks do not participate in the moment-to-moment activity of an Aerospike cluster. Instead, they serve as backup media for data protection in the event of catastrophic failure.

Aerospike is an open source, in-memory NoSQL database that is fundamentally a key-value store. It’s named after a kind of rocket engine whose principal advantage is that it uses less fuel at lower altitudes, as compared to a conventional engine. There are currently no aerospike rocket engines in use; the same is not true of the database.

Aerospike is available in a free community edition and in a paid-for enterprise edition. The cost of the enterprise edition depends on the amount of data stored. (For details, see Aerospike’s products page.) The main difference between the two editions, other than the community edition’s lack of 24/7 customer support, is the cross-data center replication provided in the enterprise edition, a process that executes in parallel with the core Aerospike engine to replicate data between clusters.


The Aerospike Management Console dashboard provides an overview of cluster statistics, real-time displays of read and write throughput, a detailed view of individual nodes, and a view into cluster namespaces.

Inside Aerospike

To help new users get up to speed, Aerospike’s online documentation provides a handy list of the system’s key elements and matches them to their roughly corresponding equivalents in the RDBMS world. The topmost organizing component is a namespace, which is analogous to a database. Within a namespace, data is contained in sets (analogous to tables) and, within sets, records (analogous to rows). A record is composed of bins (think columns), where a bin is a key-value pair: the key is the bin’s name and the value is its content. A bin’s content can be any of the data types that Aerospike supports: integers, strings, or byte arrays (effectively, a blob). A bin can also hold one of the two complex data types, List and Map, in addition to serialized data from the native format of Java, C#, Python, or Ruby.

Aerospike is described as “schemaless” in that sets and bins can be created on the fly; neither need be defined at database creation time. So, two records in the same set might be composed of a completely different set of bins.
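To make the hierarchy concrete, here is a minimal sketch of the namespace, set, record, and bin nesting as plain Python dictionaries. It is a toy model for illustration only, not the Aerospike client API; the `Namespace` class and its `put` and `get` methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# A record is a collection of bins: bin name -> bin value.
Record = Dict[str, Any]


@dataclass
class Namespace:
    """Toy stand-in for a namespace: set name -> primary key -> record."""
    name: str
    sets: Dict[str, Dict[Any, Record]] = field(default_factory=dict)

    def put(self, set_name: str, key: Any, bins: Record) -> None:
        # Sets and bins are created on first use; nothing is declared up front.
        records = self.sets.setdefault(set_name, {})
        records.setdefault(key, {}).update(bins)

    def get(self, set_name: str, key: Any) -> Record:
        return self.sets[set_name][key]


ns = Namespace("test")
ns.put("users", "alice", {"age": 31, "tags": ["admin", "ops"]})
# A second record in the same set with a completely different set of bins.
ns.put("users", "bob", {"email": "bob@example.com"})
```

Both records live in the same set even though they share no bins, which is what “schemaless” means here.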

Every record is indexed by a primary key, which is specified when the record is created. The key is usually a string, but can also be an integer, a byte array, or a serialized value. The primary key is used in conjunction with a hashing algorithm (described later) to distribute data among cluster members (commonly called “sharding”). Secondary indexes can be defined on bins and serve as the mechanism to perform queries.

Aerospike always keeps all indexes in RAM. This is true whether the database is running in memory only, or as what Aerospike calls a “hybrid” database — that is, a combination of RAM and SSDs. In a hybrid system, if a namespace is configured to use flash storage, the actual data of the database is kept on the SSDs in a log-structured file system. Aerospike hot-rods flash access by accessing the raw blocks of the SSDs directly, rather than through the native OS’s filesystem.

In addition, the keys themselves are never kept in the index; only their hashed values, generated by the RIPEMD-160 algorithm. The advantage of this scheme is that the index size is immune to variations in key sizes. Each hash entry (adding necessary overhead) is 64 bytes. It is therefore straightforward to determine how many keys will fit in a given amount of RAM. Nevertheless, this sounds risky, as it depends on no two keys hashing to the same value. The Aerospike engineers claim that the RIPEMD-160 algorithm (which produces 160-bit hashes) has “no known collisions.” Even if collisions can happen, they would be so rare that one might actually go unnoticed. Note too that a collision wouldn’t corrupt existing data; it would simply cause a new piece of data’s primary key to be rejected.
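The sizing arithmetic is simple enough to sketch. The snippet below uses the 64-byte entry size and 160-bit hash width quoted above; the function names are mine, the collision estimate is the standard birthday-bound approximation (it assumes the hash behaves like a uniform random function), and replication overhead is ignored.

```python
ENTRY_BYTES = 64   # size of one index entry, as quoted in the review
HASH_BITS = 160    # width of the hashed key
GIB = 2 ** 30


def index_bytes(num_records: int) -> int:
    """RAM needed to index num_records, regardless of key length."""
    return num_records * ENTRY_BYTES


def records_per_ram(ram_bytes: int) -> int:
    """How many index entries fit in ram_bytes."""
    return ram_bytes // ENTRY_BYTES


def collision_probability(num_records: int) -> float:
    """Birthday-bound estimate: number of key pairs divided by 2**160."""
    pairs = num_records * (num_records - 1) // 2
    return pairs / 2 ** HASH_BITS


print(records_per_ram(GIB))              # 16777216 entries per GiB
print(index_bytes(1_000_000_000) / GIB)  # about 59.6 GiB for a billion records
print(collision_probability(10 ** 12))   # vanishingly small even for a trillion keys
```

Under these assumptions, even a trillion records give a collision probability far below anything worth worrying about.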

Aerospike is ACID compliant, with read-committed isolation level on I/O transactions. This means that reads during a write transaction might return “old” or “new” data, but once the transaction completes, all reads will return only new data. Data protection is enhanced by synchronous data replication among the cluster nodes.

Aerospike clusters and partitions

Nodes within an Aerospike cluster are symmetric: a given I/O request can be sent to any node. However, as data is distributed via sharding, an I/O request for a specific record will be satisfied most promptly by the node responsible for that record’s shard partition. Because Aerospike client software keeps itself abreast of the cluster’s partition map automatically, virtually all single-record requests are sent directly to the proper node.
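The client-side routing idea can be sketched as follows. Several details are assumptions for illustration: the count of 4096 partitions, the round-robin assignment of partitions to nodes, and SHA-1 standing in for the 160-bit hash (chosen only because it is always available in Python’s `hashlib`). None of the function names come from the Aerospike client.

```python
import hashlib

N_PARTITIONS = 4096  # assumed partition count, for illustration


def partition_id(set_name: str, key: str) -> int:
    """Hash the key and reduce the digest to a partition number."""
    # SHA-1 is a stand-in here; it is not the hash the database uses.
    digest = hashlib.sha1(f"{set_name}:{key}".encode()).digest()
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


def build_partition_map(nodes):
    """Hypothetical assignment: spread partitions over nodes round-robin."""
    return {pid: nodes[pid % len(nodes)] for pid in range(N_PARTITIONS)}


def route(partition_map, set_name: str, key: str) -> str:
    """Pick the node that owns the record's partition."""
    return partition_map[partition_id(set_name, key)]


nodes = ["node-a", "node-b", "node-c"]
pmap = build_partition_map(nodes)
print(route(pmap, "users", "alice"))
```

Because the client holds the map, it can send a single-record request straight to the owning node instead of bouncing through an intermediary. When membership changes, only the map needs to be rebuilt and redistributed.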

If a new node enters the cluster, or an existing node drops out, the cluster’s members communicate with one another and manage rebalancing activities invisibly. (Aerospike uses the Paxos algorithm for arriving at a cluster consensus for any cluster state changes.) Rebalancing operations occur in the background, and Aerospike allows you to configure the rebalancing priority, so you can tune the level of impact rebalancing might have on database transaction performance.

An Aerospike cluster is highly resilient and will continue servicing requests even if a node drops out. Consequently, you can upgrade a live cluster without interrupting clients: simply “walk” around the cluster — removing, upgrading, and returning each node — and the cluster will adjust itself accordingly. The same idea holds if a node must be temporarily removed from the cluster to service failed hardware, such as when replacing a bad SSD. When the node is returned to the cluster, the cluster will re-adjust itself automatically.

Aerospike is principally an in-memory database; memory is a precious and limited resource. As one means of managing that resource, all records written within a namespace are tagged with a configurable “time to live” value. A record that has exceeded its time to live will be automatically deleted, a process that Aerospike refers to as “eviction.” Note that a record’s time to live is reset whenever the record is modified.

In addition, when a namespace is created, the administrator specifies a maximum amount of RAM or SSD available to that namespace. The administrator can also specify a “high water mark” as a percentage of that allocated space. When the high water mark is exceeded, Aerospike will begin deleting records closest to their eviction time.

Sets have a separate eviction configuration, and you can specify the maximum number of records allowed in a set. As with the namespace high water mark, Aerospike will delete records when the number is exceeded, starting with those closest to their eviction time.
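The interplay of time to live and the high water mark can be modeled in a few lines. This is a toy sketch under loud assumptions: capacity is counted in records rather than bytes of RAM or SSD, time is passed in explicitly as an integer, and the class and method names are hypothetical.

```python
class ToyNamespace:
    """Toy model of TTL expiry plus high-water-mark eviction."""

    def __init__(self, max_records: int, high_water_pct: int, default_ttl: int):
        self.high_water = max_records * high_water_pct // 100
        self.default_ttl = default_ttl
        self.records = {}  # key -> (expires_at, bins)

    def put(self, key, bins, now: int, ttl=None):
        # Every write (re)sets the record's expiry time.
        ttl = self.default_ttl if ttl is None else ttl
        self.records[key] = (now + ttl, bins)
        self._evict_if_needed()

    def expire(self, now: int):
        """Delete every record whose time to live has elapsed."""
        for key in [k for k, (exp, _) in self.records.items() if exp <= now]:
            del self.records[key]

    def _evict_if_needed(self):
        """Above the high water mark, delete the records closest to expiring."""
        while len(self.records) > self.high_water:
            victim = min(self.records, key=lambda k: self.records[k][0])
            del self.records[victim]


ns = ToyNamespace(max_records=10, high_water_pct=50, default_ttl=100)
for i in range(6):
    ns.put(f"k{i}", {"n": i}, now=i)   # the sixth write pushes k0 out
ns.put("k1", {"n": 1}, now=50)         # modifying k1 resets its time to live
ns.expire(now=103)                     # k2 and k3 have now outlived their TTL
print(sorted(ns.records))              # ['k1', 'k4', 'k5']
```

The record written first is the first to go when space runs short, unless a later write has refreshed its time to live.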

Aerospike queries and scans

Aerospike’s queries rely on secondary indexes. You create a secondary index by specifying a bin, and the Aerospike engine builds the index asynchronously, populated by the specific bin’s values. (The Aerospike client library gives you the option of making the index creation a synchronous operation, should your application require it.) Currently, secondary indexes support only string and integer values. In addition, querying is really filtering, as you can define only two principal sorts of queries: equality and range.

To keep traffic between cluster and clients to a minimum, you can specify the subset of bins that a query returns (analogous to the column names in an SQL SELECT statement).
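The two query shapes, equality and range, along with bin projection, can be illustrated with a toy secondary index. This is a sketch, not Aerospike’s implementation: the class and function names are hypothetical, and the range lookup scans every indexed value, where a real index would use an ordered structure.

```python
from collections import defaultdict


class ToySecondaryIndex:
    """Maps the values of one bin to the primary keys that hold them."""

    def __init__(self, bin_name: str):
        self.bin_name = bin_name
        self.entries = defaultdict(set)  # bin value -> primary keys

    def add(self, key, record):
        # Records that lack the bin are simply not indexed.
        if self.bin_name in record:
            self.entries[record[self.bin_name]].add(key)

    def equal(self, value):
        return set(self.entries.get(value, ()))

    def between(self, low, high):
        """Inclusive range filter over the indexed values."""
        return {k for v, keys in self.entries.items()
                if low <= v <= high for k in keys}


def query(records, keys, bins=None):
    """Fetch the matching records, optionally returning only some bins."""
    out = {}
    for k in sorted(keys):
        rec = records[k]
        out[k] = rec if bins is None else {b: rec[b] for b in bins if b in rec}
    return out


records = {
    "alice": {"age": 31, "city": "Oslo"},
    "bob": {"age": 45, "city": "Lima"},
    "carol": {"city": "Oslo"},  # no "age" bin, so never indexed
}
age_idx = ToySecondaryIndex("age")
for key, rec in records.items():
    age_idx.add(key, rec)

print(query(records, age_idx.between(30, 40), bins=["city"]))
# {'alice': {'city': 'Oslo'}}
```

The `bins` argument plays the role of the column list in an SQL SELECT: only the requested bins travel back to the caller.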

An Aerospike UDF (user defined function) is the database’s counterpart to a relational database’s stored procedure. You write UDFs in the Lua language, and they are deployed on the Aerospike cluster. Aerospike recognizes two kinds of UDFs: record and stream. A record UDF is applied to records fetched by a query. Typically, you define the query to fetch the subset of records to be operated on, and the UDF “transforms” each record in that subset. So, you could use a query and UDF combo to update a specific bin in the fetched records, or even delete records that meet some sort of UDF-determined criterion (a “culling” operation).
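The transform-or-cull pattern looks roughly like this. Real UDFs are written in Lua and run on the cluster; Python is used here only as a stand-in, and the convention that returning `None` deletes the record is my own assumption for the sketch, as are the function names.

```python
def apply_record_udf(records, keys, udf):
    """Apply udf to each selected record; a None result deletes the record."""
    for key in list(keys):
        result = udf(records[key])
        if result is None:
            del records[key]       # the "culling" case
        else:
            records[key] = result  # the "transform" case


def bump_or_cull(rec):
    """Hypothetical UDF: drop records never visited, else count a visit."""
    if rec.get("visits", 0) == 0:
        return None
    return {**rec, "visits": rec["visits"] + 1}


records = {
    "alice": {"visits": 3},
    "bob": {"visits": 0},
    "carol": {"visits": 7},
}
# Pretend a query selected alice and bob; carol is left untouched.
apply_record_udf(records, ["alice", "bob"], bump_or_cull)
print(records)  # {'alice': {'visits': 4}, 'carol': {'visits': 7}}
```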

Stream UDFs, which are also applied to the results of a query, are used to calculate aggregate values, and work along the lines of MapReduce operations (made popular by Hadoop). The stream UDF API provides map() and reduce() methods, where the map() method will map (transform) the value taken from each record in the stream to another value. The reduce() method reduces (aggregates) those values into a single value, typically the result (though such UDFs can be “chained” so that the output of one reduce() step might become the input of the next stage’s map() operation).
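The map-then-reduce flow can be sketched with Python’s built-in `map` and `functools.reduce`. Again, actual stream UDFs are Lua functions executed on the cluster; the `aggregate` helper and the sample records below are hypothetical and only show the shape of the computation.

```python
from functools import reduce


def aggregate(stream, map_fn, reduce_fn):
    """Map each record to a value, then fold the values into one result."""
    return reduce(reduce_fn, map(map_fn, stream))


def to_partial(rec):
    """map step: turn a record into a one-entry partial total."""
    return {rec["city"]: rec["sales"]}


def merge(a, b):
    """reduce step: combine two partial totals into one."""
    merged = dict(a)
    for city, total in b.items():
        merged[city] = merged.get(city, 0) + total
    return merged


records = [
    {"city": "Oslo", "sales": 10},
    {"city": "Lima", "sales": 7},
    {"city": "Oslo", "sales": 5},
]
print(aggregate(records, to_partial, merge))  # {'Oslo': 15, 'Lima': 7}
```

Because `merge` accepts and returns the same shape, partial results computed on different nodes could be combined in any order, which is what makes this style of aggregation a natural fit for a cluster.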

Finally, bulk operations on a namespace or a set are performed by Aerospike scans. Unlike a query, a scan does not require a secondary index. There are two kinds of Aerospike scan operations. A read-only scan will read all or a specified subset of the records in a set or namespace. A read-write scan will read all the records in a set or namespace, and update specific records in the result set via a record UDF. Because read-write scans employ UDFs, they execute “closer to the data” and are therefore more efficient than fetching each record to the client for updating. Scans are entirely asynchronous; they run in the background and are designed so that their execution has minimal effects on database transactions.

Managing Aerospike

Once you’ve got an Aerospike cluster installed, you can monitor its health and manage its moving parts using the Aerospike Web-based management console (which is a separate download). The console — running in a browser — communicates with a separate process that connects to any cluster node.

As with the database, the management console comes in community and enterprise editions, and the enterprise edition has capabilities not found in the community version. For example, the enterprise edition has cross data center management and provides tools for cluster backup and restore; you’ll find neither in the community edition.

The management console’s interface is easily navigated, consisting of several pages, each with its own view or function. The Dashboard page displays namespace information, the cluster’s disk and RAM usage, node membership, and read and write throughput for individual nodes. The Statistics page, as its name implies, displays overall statistics for nodes, namespaces, and secondary indexes. On the Definitions page, you can create secondary indexes within namespaces, or view information about and properties of UDFs defined in each namespace. Finally, the Jobs page is where you go to track the scans that are currently executing on the cluster.

Aerospike documentation

Aerospike’s documentation — all online — is generally adequate. I found plenty of pages of descriptive materials, with example code sprinkled throughout. However, parts of the documentation were confusing, and some sections appeared to be missing. For example, the documentation page for scans mentions both a “Manage Scans” section and a “Scan Developer Guide,” neither of which I was able to locate.

The Java client download (Java being the language I chose for experimenting with the database) is accompanied by a Swing-based application for exercising a variety of example database operations. It provides a good starting point for developing Java-based Aerospike applications. Of course, the download also provided the client *.jar files, which I was easily able to load into a Groovy test project.

Built for speed

While Aerospike’s documentation directed to RDBMS programmers is helpful, its support for a descriptive query language is underdeveloped. Although a command-line tool with support for such a language does exist — it is AQL, for Aerospike Query Language — it is still a work in progress. One certainly hopes that development continues in that direction, as it offers the promise of a unified language for working with Aerospike, without having to resort to specific language APIs.

Nevertheless, Aerospike has a good feel to it. The primary aspect of its implementation, the close coupling of RAM and SSDs, is a sensible step that some database product was bound to make sooner or later, given the declining cost and rising densities of both those technologies. And Aerospike has shown some remarkably good performance numbers. A recent posting by Intel suggests that a single Aerospike node can achieve 1 million transactions per second using SSD for the database store.

Aerospike is easy to install, and it provides near hands-free management and configuration. Its data architecture is easy to conceptualize, but not so simple as to be inflexible. Above all, it is designed to be fast. Whereas many of the distributed NoSQL databases are written in Java, Aerospike is written in carefully tuned C. If your database application needs guaranteed high-transaction throughput, you should take a look at Aerospike.