Another Go binding for LMDB

In the series on designing and building a distributed concurrent editor, I’ve been using bbolt as the embedded key-value store. bbolt is inspired by LMDB, which is written in C, is extremely fast, is very widely used, and has been around for a fair while now.

LMDB makes read-write transactions fully serialised (literally, only one can happen at a time). When a read-only transaction starts, it works on a snapshot of the database that contains every committed read-write transaction up to the moment the read-only transaction started. The read-only transaction continues to see this snapshot even if other read-write transactions start and commit concurrently with the read-only transaction. These are very nice and easy to work with semantics: you never have to worry about seeing data from uncommitted transactions, you always see a consistent snapshot, and you don’t have to worry about weird interactions between concurrent read-write transactions because there are none.

Several years ago, I tweaked an existing Go binding for LMDB and extended it a bit. I recently went back and had a look at that code, and the usual thing happened that happens when you look at your own code from more than 6 months ago: “Which idiot wrote this? This code is terrible. I would never write it like this today”. Etc. So, using the actors framework I wrote about last week, I decided to write a new Go binding for LMDB.

The most comprehensive existing Go binding I came across is Bryan Matsuo’s. However, even that is several years bit-rotten: the tests don’t pass without a bit of tweaking, and although he’s clearly gone to some trouble to minimise unnecessary copying of data between Go land and C land, on close inspection, the C code does violate the Cgo contract about not storing Go pointers in C memory. I suspect the code was written before that detail of Cgo was formalised anyway.

All of the existing bindings that I could find (including my previous one) are low-level bindings: they expose the LMDB API fairly faithfully. That has pros and cons: you get a lot of flexibility, but you also have to rebuild a lot of the things that these days you might expect to get for free – particularly things you get for free with bbolt. So I wanted this new binding to be pretty high-level: in scenarios where the ideal behaviour is both fairly obvious and achievable, the binding should take care of things for you, especially when the result is an API and semantics that are somewhat more idiomatic and user-friendly.

With LMDB you get some stuff similar to bbolt:

  • keys and values are byte arrays.
  • normal get and put APIs.
  • keys are sorted lexically (though there are flags to change that to numerically sorted keys).
  • you can use cursors to move about, based on the key sorting.
  • many concurrent readers are supported, but just one writer at a time.

But you also get some bits I wanted to hide:

  • if the database file gets full, you get an error. There’s an API to increase the size, but it’s up to you to call it.
  • the cursors API is a little weird and tersely documented.
  • once you start a read-write transaction, you need to stay on the same OS thread, which in Go means using runtime.LockOSThread.

Additionally, committing lots of small read-write transactions can be slower than you might expect, because each commit pays a fixed cost. This can be alleviated by batching read-write transactions together where possible, thus amortising the cost of the commit.

An actor works quite nicely for the read-write side: it can take care of the batching and auto-resizing, and the server-side goroutine can lock itself to an OS thread. Read-only transactions can hang off the same struct for a clean API design, but they don’t talk to the actor at all, other than using a RW-mutex to make sure there’s no resize going on and so it’s safe to start.

I’ve tried not to hide too much of LMDB, but there are a number of features I’ve decided not to support as they just don’t fit with the API design. I don’t particularly want to duplicate the LMDB documentation (except where it’s really necessary to improve it), so I frequently refer back to the upstream API docs.

It’s all done, and it seems to work. I’ve written what I think is a fairly reasonable soak test: i.e. a test that runs for as long as you want, and stresses the API and concurrency. Nearly all of the rest of the API is covered by tests too.

I am aware that there is a degree of competitiveness between LMDB and other designs for embedded key-value stores: LMDB is B+tree based and uses mmap, while many other key-value stores use log-structured merge-trees. If you go searching, you’ll easily find some rivalry between these two camps, and evidence that performing benchmarks which are meaningful and make best use of all these designs is tricky, so it’s easy to draw conclusions from unsafe data. There’s also a very recent paper, Are You Sure You Want to Use MMAP in Your Database Management System?, which explains that once your dataset gets bigger than your RAM, relying on the OS to page data in and out (as is the case when using mmap) is almost certainly a bad idea. So, I’m not at all suggesting that LMDB (or bbolt) is the last word, or even the best option, in embedded key-value database designs. But I’ve used them a fair amount over the years with success.