Let's build! A distributed, concurrent editor: Part 6 - Testing

So far, I’ve defined some semantics for this distributed, concurrent editor. I’ve defined a protocol, built a really basic browser-based client in TypeScript, and a server in Go. But I’ve not written a single test. This must be rectified: some of the algorithms are not trivial or obviously correct, particularly those that deal with calculating the document state when reading from disk.

I’ve built the server using actors, and each document is managed by its own actor. The actor provides a nice API to test against. All it depends on is the disk-store, for which I’m using bbolt. Because this is an embedded key-value store, using it in tests is very easy: just create a fresh database file, use it, and make sure you delete it at the end. No need for horrendous choreography to spin up fresh PostgreSQL or MySQL databases, no maddening complexity of trying to rendezvous on some fresh network port, and no need for mocks. Testing nirvana.
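To make that concrete, here’s a minimal sketch of what per-test database setup can look like with bbolt (the helper name is mine, not the actual code from this series; it leans on Go’s t.TempDir and t.Cleanup to handle the deletion):

import (
   "path/filepath"
   "testing"

   bolt "go.etcd.io/bbolt"
)

// newTestDB opens a fresh bbolt database in a per-test temporary
// directory. t.TempDir is deleted automatically when the test ends,
// so the file needs no manual cleanup; we just close the handle.
func newTestDB(t *testing.T) *bolt.DB {
   t.Helper()
   db, err := bolt.Open(filepath.Join(t.TempDir(), "test.db"), 0o600, nil)
   if err != nil {
      t.Fatal(err)
   }
   t.Cleanup(func() { _ = db.Close() })
   return db
}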

I’ve written a little documentTester struct. This has a few fields and methods to help support the tests: the ability to create a fresh database, spawn an actor manager, spawn a document, and shut everything down. With that in place, I can start writing some tests that verify an empty document is indeed empty.

func TestListenEmptyDocument(t *testing.T) {
   documentTester := newDocumentTester(t, seed, plainManagerSpawnFun)
   defer documentTester.close()
   documentTester.spawnDocument(documentTester.randomWord(false))

   listener := documentTester.newListeningClient()
   str, ok := listener.nextDocumentRendering()
   documentTester.is.True(ok)
   documentTester.is.Equal(str, "") // it's a new document so it must be empty

   documentTester.closeDocument()
   _, ok = listener.nextDocumentRendering()
   documentTester.is.True(!ok) // listener should have observed death of document
}

This checks that I can create a new document, I can subscribe to it (listening to changes to the document), and as soon as I subscribe I should be given the current state of the document, which should be the empty string. It also checks that once the document actor has been terminated, the subscription is cancelled. This is all machinery that the WebSocket code relies on: as soon as a browser-client connects, I ensure the correct document actor is running, and subscribe to changes to the document. That subscription should immediately be sent the current state of the document so it can be sent down to the browser.

I’ve chosen to use Mat Ryer’s is test assertions library, rather than the more normal (I think) testify. So that’s why some of the assertions look a little different if you’re used to testify. Mainly, I wanted to try something different. I don’t think there’s anything too wrong with testify, but its API is bonkersly huge.

As this test shows, the documentTester has a newListeningClient method that allows me to subscribe to the document actor and listen for updates. That listener has its own, independent representation of the document, built purely from the updates it has received from the document actor. I can test that this document, constructed from updates received from the actor, matches the document that the test thinks should have been constructed.
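The tests above pin down the listener’s shape: nextDocumentRendering returns the next rendering of the document, and reports false once the document actor has died. A plausible sketch (hypothetical; the real implementation lives inside the test machinery) is just a channel that the subscription feeds and the actor’s death closes:

// listeningClient is a hypothetical sketch: the subscription pushes each
// new rendering of the document onto the channel, and the channel is
// closed when the document actor terminates.
type listeningClient struct {
   renderings chan string
}

// nextDocumentRendering blocks until the next rendering arrives. The
// boolean is false once the channel is closed, i.e. the document died.
func (l *listeningClient) nextDocumentRendering() (string, bool) {
   str, ok := <-l.renderings
   return str, ok
}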

The documentTester also has a newMutatingClient method which allows me to create a document and mutate it in the test code, and then send those mutations up to the document actor. This mutating client models a document as a list of words, just like the browser-client and like the server. Its API allows for the document to be modified: existing words can be edited or deleted, new words can be added, and all such changes can be sent to the document actor. I can use these methods to make basic changes to the document and check the listener receives updates that result in the expected document; the document can be rendered to a simple string by joining together all of its words. These strings can be tested for equality.

func TestSingleMutationToRoot(t *testing.T) {
   documentTester := newDocumentTester(t, seed, plainManagerSpawnFun)
   defer documentTester.close()
   documentTester.spawnDocument(documentTester.randomWord(false))

   mutator := documentTester.newMutatingClientWithEmptyRoot()
   mutator.unsentGeneration.editExistingWord()
   expected := mutator.unsentGeneration.String()
   mutator.unsentGeneration.send()

   listener := documentTester.newListeningClient()
   got, ok := listener.nextDocumentRendering()
   documentTester.is.True(ok)
   documentTester.log.Debug().Str("expected", expected).Str("received", got).Send()
   documentTester.is.Equal(got, expected) // we created the listener AFTER sending the mutation. So we should definitely observe the mutation

   documentTester.closeDocument()
   _, ok = listener.nextDocumentRendering()
   documentTester.is.True(!ok) // listener should have observed death of document
}

What is this unsentGeneration thing and this generation type? Back in episode 2, where I work on the client-server protocol, and again in episode 4, where I decide what I want to write to disk, I discuss different ways of modelling how the document evolves over time. In this test code, I take the view that it’s best to keep the code simple and obvious at the expense of efficiency. A generation is a snapshot of the entire document, and I maintain a list of generations showing how the document evolved. I can use this to look back at the previous generation and see what I expect the document to be if I send an undo message to the document actor; equivalently for redo. The unsent generation is a copy of the current generation, but potentially with modifications that are yet to be sent to the document actor. Once they get sent, the unsent generation gets moved appropriately into the list of generations, and a new unsent generation gets created based on what’s just been sent.
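As a sketch of that data model (field names other than orderedWords are illustrative, and in the real code send hangs off the generation itself rather than the client), undo and redo amount to moving a cursor along the list of snapshots, and sending after an undo truncates the redo tail:

import "strings"

// generation is a snapshot of the whole document: just an ordered list
// of words, rendered by joining them together.
type generation struct {
   orderedWords []string
}

func (g *generation) String() string {
   return strings.Join(g.orderedWords, " ")
}

func (g *generation) clone() *generation {
   return &generation{orderedWords: append([]string(nil), g.orderedWords...)}
}

// mutatingClient (sketch): generations holds every snapshot sent so far,
// current is the index the document actor should currently be showing,
// and unsentGeneration accumulates edits that haven't been sent yet.
type mutatingClient struct {
   generations      []*generation
   current          int
   unsentGeneration *generation
}

// send ships the unsent edits to the document actor (elided here). Any
// generations beyond current are discarded: sending after an undo
// overwrites the redo history, as TestUndoAndOverwrite exercises.
func (m *mutatingClient) send() {
   m.generations = append(m.generations[:m.current+1], m.unsentGeneration)
   m.current = len(m.generations) - 1
   m.unsentGeneration = m.generations[m.current].clone()
}

// expectedAfterUndo is what the listener should observe once an undo
// message has been processed: the previous snapshot, rendered.
func (m *mutatingClient) expectedAfterUndo() string {
   return m.generations[m.current-1].String()
}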

I can now write tests that check that when I send undo and redo messages, the document formed from the updates the listener receives matches the document the mutating client thinks it’s created:

func TestUndoAndOverwrite(t *testing.T) {
   documentTester := newDocumentTester(t, seed, plainManagerSpawnFun)
   defer documentTester.close()
   documentTester.spawnDocument(documentTester.randomWord(false))

   mutator := documentTester.newMutatingClientWithEmptyRoot()
   // mutate the root word twice
   mutator.unsentGeneration.editExistingWord()
   expected := mutator.unsentGeneration.String()
   mutator.unsentGeneration.send()
   mutator.unsentGeneration.editExistingWord()
   mutator.unsentGeneration.send()
   // undo twice, and redo the first change.
   mutator.sendUndo()
   mutator.sendUndo()
   mutator.sendRedo()
   // mutate again, then undo it
   mutator.unsentGeneration.editExistingWord()
   mutator.unsentGeneration.send()
   mutator.sendUndo()

   listener := documentTester.newListeningClient()
   got, ok := listener.nextDocumentRendering()
   documentTester.is.True(ok)
   documentTester.log.Debug().Str("expected", expected).Str("received", got).Send()
   documentTester.is.Equal(got, expected)
}

Randomised sequences of events

One of the “fun” challenges of programs that deal with event streams is that bugs often result from a particular order of events. Some sequence of events, overlooked when designing the system, causes the system to misbehave. If I only write bog-standard unit tests, I can end up merely verifying the effect of a single event at a time, or of really short sequences of events, as demonstrated above. Whilst each individual event might be handled correctly (at least as far as the limited unit test is concerned), a longer sequence of events could be handled incorrectly. Ideally, for any set of events, you want to test every permutation. But that quickly becomes intractable, because the number of permutations of a set of items is the factorial of the number of items: 10 events already gives over 3.6 million permutations, which could take a while to test.

What I tend to do in these situations is to generate a stream of random events. Yes, given enough time, it will generate every possible permutation of events, though extremely inefficiently. But I think what’s more important is that it doesn’t spend hours exploring a large number of permutations which all share a prefix of events that guarantees no error will occur. It’s quite inviting to construct the permutations depth-first, but that means you have to go through a lot of the permutations before the first items change. You could generate them all and then shuffle them randomly before “running” any of them, but you can spend a lot of time (and memory) generating all those permutations. If you want to, you can structure the test code in such a way as to make it easy to switch between a randomly-generated stream of events and a permutation, as sketched below.
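One way to structure that switch (hypothetical; this isn’t the code from this series) is to hide the choice of the next event behind a small interface, so the same soak loop can be driven by a seeded RNG or by a single pre-computed permutation:

import "math/rand"

// eventSource picks the next event: an integer in [0, max).
type eventSource interface {
   next(max int) int
}

// randomSource draws choices from a seeded RNG: an unbounded stream.
type randomSource struct{ rng *rand.Rand }

func (r *randomSource) next(max int) int { return r.rng.Intn(max) }

// permutationSource replays one fixed, pre-computed sequence of choices.
type permutationSource struct {
   choices []int
   idx     int
}

func (p *permutationSource) next(max int) int {
   c := p.choices[p.idx] % max
   p.idx++
   return c
}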

It’s important that the tests are still repeatable and deterministic, even when randomly generating events. For this, I create a new random number generator and seed it either with a value given on the command line or with the current Unix time, and I make sure I print the seed to the log. Then, if an error does occur, I can re-run the test with the exact same seed, the random number generator will generate the same sequence of numbers, which will lead to the same sequence of events, and the bug should continue to manifest itself until it’s fixed. I’ve already been using this random number generator in the tests covered so far. For example, the editExistingWord method uses it to pick which word in the document to edit, and what to change it to. The behaviour of the system should be the same no matter which word is edited or what that word is changed to. I believe tests that are precise about which properties they are testing for, and which properties they are agnostic about, lead to a more robust system: I would consider these tests less useful if they hard-coded which word is being edited or what the word is being changed to.
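The seeding pattern is short enough to sketch in full (the -seed flag name is mine; the real tests pass a seed into newDocumentTester):

import (
   "flag"
   "math/rand"
   "testing"
   "time"
)

var seedFlag = flag.Int64("seed", 0, "RNG seed for the tests (0 means use the current time)")

// newSeededRNG builds a deterministic RNG from the command-line seed,
// falling back to the current Unix time. The seed is always logged so a
// failing run can be reproduced exactly.
func newSeededRNG(t *testing.T) *rand.Rand {
   seed := *seedFlag
   if seed == 0 {
      seed = time.Now().Unix()
   }
   t.Logf("using RNG seed %d", seed)
   return rand.New(rand.NewSource(seed))
}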

For this style of longer-running randomised tests (I tend to call these soak tests), I think it’s preferable to validate the state of the system after every event, or as close to that as possible. That way, if an error does crop up, I should be able to find it quickly and not have to wade back through hundreds of events wondering which one caused the problem.
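Concretely, the validateExpected call that appears throughout the soak test below presumably does something like this (a guess based on the earlier tests; currentGeneration is a hypothetical accessor that accounts for undos and redos):

// validateExpected compares the mutator's idea of the document with the
// next rendering the listener receives from the document actor.
func (self *soaker) validateExpected() {
   expected := self.mutator.currentGeneration().String()
   got, ok := self.listener.nextDocumentRendering()
   self.is.True(ok)
   self.is.Equal(got, expected)
}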

Given the machinery already built, my soak test isn’t too long. The guts of it is this:

func (self *soaker) performRandomAction(log zerolog.Logger) {
   n := self.rng.Intn(10)
   switch {
   case n == 0:
      log.Trace().Msg("restarting document")

      self.closeDocument()

      self.spawnDocument(self.documentName)
      self.mutator.documentClient = self.documentClientFactory.NewClient()
      self.listener = self.newListeningClient()
      self.validateExpected()
      return

   case n <= 2 && self.mutator.canUndo():
      log.Trace().Msg("undo")
      self.mutator.sendUndo()
      self.validateExpected()
      return

   case n <= 4 && self.mutator.canRedo():
      log.Trace().Msg("redo")
      self.mutator.sendRedo()
      self.validateExpected()
      return

   default:
      log.Trace().Msg("mutating words")
      for mutations := self.rng.Intn(5); mutations >= 0; mutations-- {
         action := self.rng.Intn(5)
         switch {
         case action == 0:
            self.mutator.unsentGeneration.addWord()
         case action == 1 && len(self.mutator.unsentGeneration.orderedWords) > 1:
            self.mutator.unsentGeneration.deleteWord()
         default:
            self.mutator.unsentGeneration.editExistingWord()
         }
      }
      self.mutator.unsentGeneration.send()
      self.validateExpected()
      return
   }
}

I do have to hard-code the probability of each event type. I could either try to make weird and bizarre sequences of events more likely (which might expose more corner cases), or I could try to make the sequence of events kinda similar to what a user might do. Here I think I’ve gone for the latter: about 50% of the time the document’s going to get edited – a few changes to a few words. About 20% of the time the event will be an undo, and 20% of the time it’ll be a redo. The remaining 10% of the time will terminate the document actor, spawn a new one (which will read the list of events back in from disk), and create a new listener. After each event, I validate that the document as the mutator believes it is matches the document as the listener believes it is.

Because every actor has a single mailbox (a single linear queue of messages), it doesn’t matter whether these events come from one client or several: all the events get added to the same mailbox, so it makes no difference to the document actor. The mutating client does, though, make sure that when it sends an update, the updated words all carry a version number high enough to guarantee the update takes effect. I should probably extend this test with the ability to send updates with lower version numbers, to simulate several users editing the same document at the same time and occasionally colliding when editing the same word.

Running all of these tests with code coverage turned on shows I’m hitting around 90% of document.go, db.go, frame.go, and wordmanager.go, which seems quite good. I’m not testing any part of the document registry though, so at some point I should add some tests for that. Code coverage metrics are a minimal measurement: they show me which bits of my code definitely need more tests; but even 100% code coverage is far from sufficient, which is kinda the point of this whole exercise: bugs can manifest from the order of events, and you could get to 100% code coverage just by writing unit tests and never uncover such bugs.
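For reference, turning coverage on is just the standard Go tooling, nothing bespoke:

go test -coverprofile=cover.out ./...
go tool cover -html=cover.out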


I do want to adapt this soak test so that it can be used with fuzz testing: my plan is either for the fuzzer to provide the seed for the random number generator, or for the fuzzer to provide a slice of bytes, with the test grabbing numbers from that slice in place of grabbing numbers from the random number generator. However, Go’s support for fuzz testing is arriving in version 1.18, which is expected at some point in February 2022, a month away. The fuzzer is meant to be coverage-guided: i.e. it’s meant to try to vary the test input to provoke more code to be run. Whether that’s just “does this line of code get run?” or “is this particular path through the control-flow graph run?” I don’t yet know. The latter would be much more powerful, because that would enable it to explore different sequences of events. But until Go 1.18 is released, I think I’ll have to put this series on hold.
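As a sketch of the first option, a fuzz target could end up looking something like this (assuming Go 1.18’s testing.F API as currently proposed, and assuming the existing newDocumentTester helper can take the fuzzed seed, as its signature suggests):

// FuzzSoak lets the fuzzer choose the RNG seed; everything downstream of
// the seed is deterministic, so an interesting input can be replayed.
func FuzzSoak(f *testing.F) {
   f.Add(int64(1)) // seed corpus entry
   f.Fuzz(func(t *testing.T, seed int64) {
      documentTester := newDocumentTester(t, seed, plainManagerSpawnFun)
      defer documentTester.close()
      // ...run the soak loop, driven by an RNG seeded with seed...
   })
}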

Last week, when I wrote about my binding to LMDB, I mentioned there’s a soak test in there too. It doesn’t benefit from quite the same level of machinery and framework as I’ve built here, but the principle is the same.