Let's build! A distributed, concurrent editor: Part 3 - Client

In last week’s slightly mammoth instalment, I explored how difficult it is to come up with sane semantics for concurrent editing, particularly when you have to support undo and redo features. Concurrent modification by multiple users, and the fact that changes by those users don’t instantaneously update all clients and the server, cause different users (and the server) to see the same events in different orders. Making sure that everyone ends up with the same document in front of them is not trivial; there are trade-offs everywhere. By the end of the article, I had a fairly simple protocol defined, a general sketch of how communication is going to work between server and clients, and a reasonable understanding of the semantics I’m hoping to achieve. They’re not perfect by any means, but I hope they’re sensible enough to be understandable and usable without causing surprise and frustration. And I’m not sure perfect is at all possible.

This week, I want to implement the client. The client and the server will share some algorithms, but the client is in general simpler:

  • The client only talks to one server, but the server talks to many clients.
  • The server needs to persist the document to disk in some way. The client doesn’t have any such concerns.
  • The server needs to persist and understand the history of the document so that it can calculate the effect of undo/redo messages. The client has no memory of the history of the document.

I really dislike programming browsers. These days I don’t mind the language too much: it’s perfectly possible to be productive in JavaScript, or any language that compiles to it, though I do feel JavaScript has become far too much of a kitchen-sink language. But, I find using the APIs that browsers provide to be a very frustrating experience. There’s more than enough pointless rabid ranting on the internet these days and I see no value in adding to it. So for that reason I’m going to say very little about programming the browser, and focus only on the parts of the client that don’t really use browser APIs.

Overview

  • The client is going to maintain a WebSocket connection to the server. Whenever that drops, for whatever reason, the client needs to repeatedly try to reconnect, and when it does connect, it’ll need to resync by some means.

  • The user will type, and I need to understand which word the user is altering, or creating. These modified or new words will be sent to the server over this WebSocket. I will not wait for any sort of reply from the server to these updates.

  • The server will send, unprompted, updates to the client. These updates will contain changes to the document that other users are making (also, the updates that result from the user pressing undo and redo, as the server is the only one that knows the history of the document and is therefore responsible for calculating the effect of an undo or redo message).

Last week, I really focused on the conflicts that can happen at the server, when it receives updates from several clients. But these conflicts can happen at the client too, between updates the client receives from the server, and updates the client receives from the user typing. For example, it could receive an update from the server for a word which the user has already updated (and maybe even sent to the server). In part 2, I introduced the idea of a version for each word. This comes to my rescue here too: the client, when it encounters this sort of situation, can use the version to decide which version of the word is the winner. Critically, all clients and the server will make the same decision, regardless of the order in which they receive the two versions of the same word. Other fun examples include that the client can delete a word completely, but some other user can update it, which can lead to the client receiving an update for a word which it thinks is deleted. And vice versa.

URL structure

The server is going to serve various bits of HTML, JavaScript and CSS. Those files will appear directly under / so that the client can be presented with the /index.html as easily as possible. There will be a WebSocket URL for each document: selecting which document is being edited is part of the URL of the WebSocket, rather than part of the protocol within the WebSocket. So if the user wants to change which document they’re editing, they disconnect from one WebSocket, and connect to another. The WebSockets will be available under /documents/$documentName.
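
As a concrete (assumed) example of what that looks like from the client’s side, with a hypothetical document name, the URL handed to the WebSocketManager below might be built like this:

const documentName = "shopping-list"; // hypothetical document name
const scheme = window.location.protocol === "https:" ? "wss" : "ws";
const url = `${scheme}://${window.location.host}/documents/${encodeURIComponent(documentName)}`;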

WebSocketManager

The WebSocketManager is stateful. It tries to maintain a WebSocket connection, and if it closes for some reason then it will automatically try to reconnect. It needs to be a bit careful because there are races even here: it could be that Alice tries to change the document she’s editing at the same time as the previous connection becomes established, and so I need to be careful to ensure that when I receive an event from a WebSocket, I only fire off my own callbacks if the WebSocket that just emitted the event is the current WebSocket.

export class WebSocketManager {
   private currentUrl: string = "";
   private sock?: WebSocket;

   onopen: (url: string, socket: WebSocket) => void = (url, sock) => { };
   onclose: (url: string) => void = (url) => { };
   ondata: (url: string, socket: WebSocket, msg: MessageEvent) => void = (url, sock, msg) => { };

   // other methods

   reconnect = (url: string): void => {
      // The user may have switched documents since this reconnect was scheduled.
      if (url !== this.currentUrl) {
         return;
      }
      const sock = new WebSocket(url);
      this.sock = sock;

      sock.onclose = (event: Event) => {
         // Ignore events from a socket that is no longer the current one.
         if (sock !== this.sock) {
            return;
         }
         this.sock = undefined;
         this.onclose(url);
         // Try again in a second, unless a new socket has appeared by then.
         setTimeout(() => {
            if (this.sock !== undefined) {
               return;
            }
            this.reconnect(url);
         }, 1000);
      };

      sock.onmessage = (event: MessageEvent) => {
         if (sock !== this.sock) {
            sock.close();
            return;
         }
         this.ondata(url, sock, event);
      };

      // other sock event handlers
   }

   send = (msg: ArrayBufferView): void => {
      // Fire-and-forget: if the socket isn't open, the message is simply dropped.
      const sock = this.sock;
      if (sock && sock.readyState === WebSocket.OPEN) {
         sock.send(msg);
      }
   }
}

Whenever I get an event off sock I check to see whether sock === this.sock, just in case the sock’s event has been buffered for some reason and the desired socket has changed in the meantime.

Also shown is how I do the re-connection: again, check to see if the current socket is the desired socket. If so, fire off my own callbacks, and then set a timer to attempt to reconnect in a second. And that also has to carry the current URL for the WebSocket so that if the user changes the document that’s being edited before the timer expires, I can detect that and make sure I don’t incorrectly connect to the old URL.
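
One of the elided "other methods" presumably handles switching documents; a rough sketch of what that could look like (the name connect and the details here are my assumptions, not the real code) is:

   connect = (url: string): void => {
      // Record the new target first: any buffered event or pending reconnect
      // timer belonging to the old socket will now fail its checks and bail out.
      this.currentUrl = url;
      const old = this.sock;
      this.sock = undefined;
      old?.close();
      this.reconnect(url);
   }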

As already mentioned, client sending to the server is fire-and-forget: there’s no attempt to queue sends if the socket isn’t connected. Instead, when a connection is finally established, the server will send the entire document at its current version to the client, and the client will concurrently send the entire document as it knows it to the server. Because both client and server will use the same algorithms to ignore different versions of the same word, they will both synchronise and end up with the same document.

Document Model

Last week I decided that I was going to model the document as a linked list of words, at least in the protocol. I’ll do that in the client too. However, in the protocol, the pointers to the Next and Previous are encoded as the WordId of the next and previous word. In the client, it would be nicer if I had real pointers to the actual words. This will make it a little easier to walk over the list of words.

The IWord interface that bebop generates is simple and useful, so I can wrap it lightly to create a Word class that I’ll use for modelling words:

import * as edist from './protocol';

export class Word {
   readonly document: Document;
   readonly wordId: edist.IWordId;

   version: edist.IVersion;
   letters: string = "";
   previous?: Word;
   next?: Word;

   isDirty: boolean = false;

   constructor(document: Document, iword?: edist.IWord) {
      this.document = document;
      if (iword) {
         // Word received from the server: copy its identity and content.
         this.wordId = iword.wordId;
         this.version = iword.version;
         this.letters = iword.letters;
      } else {
         // Word created by the local user: fresh wordId, initial version, and
         // dirty so that it goes to the server in the next batch of updates.
         this.wordId = document.nextWordId(this);
         this.version = { counter: 1n, clientId: document.clientId };
         this.isDirty = true;
         document.setDirty();
      }
      document.words.set(wordIdToStr(this.wordId), this);
   }

   serialize = (): edist.IWord => {
      return {
         wordId: this.wordId,
         version: { counter: this.version.counter, clientId: this.version.clientId },
         letters: this.letters,
         links: {
            previous: this.previous?.wordId,
            next: this.next?.wordId,
         },
      };
   }

   bump = (): void => {
      if (this.isDirty) {
         return;
      }
      this.isDirty = true;
      this.document.setDirty();
      versionBump(this.document.clientId, this.version);
   }
}

The constructor needs to cope with two different scenarios:

  1. The local user is creating this word. In this case, there will be no IWord from the WebSocket, and so the Word must be given a brand new wordId, an initial version, and it must be marked as dirty so that it gets sent up to the server in the next batch of updates.
  2. This word is not already known to the client, and has been received via the WebSocket. So I must copy the wordId, version and letters from the IWord. The next and previous fields are not set up by the constructor; they’ll get sorted out elsewhere.

The serialize method takes a snapshot of the Word, ready to be encoded to binary by bebop and sent over the WebSocket. It’s safer to take a snapshot than to hold on to the Word object itself: the state of the Word could change between the call to serialize and the message actually being sent, which could mean the wrong thing goes over the socket. Taking a snapshot guarantees that what is sent is exactly the state of the object at the time when serialize is called.

The bump method is called whenever the local user changes a word. It makes sure the word and the document are both marked as dirty so that they will soon be sent over the WebSocket. If the Word is transitioning from non-dirty to dirty, then the version gets bumped. I showed version-bumping code in Go last time. Here’s exactly the same thing in TypeScript:

function versionBump(clientId: bigint, version: edist.IVersion): void {
    version.counter += 1n;
    version.clientId = clientId;
}

In the discussion last week about Ordering Edits I wanted to make sure that changes that travel slowly over a bad network connection are understood to come from the past, and are ignored by the server, even though they may arrive at the server after more recently authored changes. The version achieves that, but it’s also important to only bump the version once as the Word transitions from non-dirty to dirty. As the discussion on Lamport Clocks said, bumping the version is really associated with the send event, and not the edit itself.

The Document

The Document class does all the rest of the work. The bulk of it is either receiving events from the browser and figuring out what changes those events cause to the document and its Words, or is rendering the document’s Words and updating the DOM. Both of these sections of code involve using browser APIs, which upset me, so I will say nothing further about them. There’s a little fun to be had when the user presses space and splits one word into two, or when the user presses delete and removes the space between two words. But it’s not mind-blowing stuff. That code also identifies when the user presses undo (Ctrl-Z) or redo (Shift-Ctrl-Z) and sends off appropriate messages to the server.
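
Without getting into the browser events themselves, here’s a rough, model-level sketch of the word-splitting case (a hypothetical Document method: the name splitWord and the exact pointer shuffling are my guesses, not the actual code):

   splitWord = (word: Word, at: number): Word => {
      // Brand new word: the constructor gives it a fresh wordId and marks it dirty.
      const right = new Word(this);
      right.letters = word.letters.slice(at);
      right.previous = word;
      right.next = word.next;
      if (word.next) {
         // The old neighbour's previous pointer changes, so it must be re-sent too.
         word.next.previous = right;
         word.next.bump();
      }
      word.letters = word.letters.slice(0, at);
      word.next = right;
      word.bump();
      return right;
   }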

As I’ve already talked a little about marking words as dirty, let’s finish that off:

import * as websocket from './websocket';

export type WordIdString = string;

function wordIdToStr(wordId: edist.IWordId): WordIdString {
    return ('' + wordId.clientId).padStart(10, '0')
        + ':'
        + ('' + wordId.counter).padStart(10, '0')
        + ':nid';
}

export class Document {
   readonly clientId: bigint;
   readonly websocketManager: websocket.WebSocketManager;
   readonly root: Word;
   readonly words: Map<WordIdString, Word>;
   private isDirty: boolean = false;

   setDirty = (): void => {
      if (this.isDirty) {
         return;
      }
      this.isDirty = true;
      setTimeout(this.flush, 100);
   }

   flush = (): void => {
      const words = this.dirty();
      if (words.length === 0) {
         return;
      }
      this.websocketManager.send(edist.Message.encode({ updates: words }));
      this.gc();
   }

   dirty = (): Array<edist.IWord> => {
      const words: Array<edist.IWord> = [];
      for (let word of this.words.values()) {
         if (word.isDirty) {
            words.push(word.serialize());
            word.isDirty = false;
         }
      }
      this.isDirty = false;
      return words;
   }

   gc = (): void => {
      // Walk the linked list from the root, recording every reachable word,
      // then drop from the Map anything that is no longer reachable.
      const reachable = new Set<WordIdString>();
      const worklist: Word[] = [this.root];
      for (let cur = worklist.shift(); cur; cur = worklist.shift()) {
         reachable.add(wordIdToStr(cur.wordId));
         if (cur.next) {
            worklist.push(cur.next);
         }
      }
      this.words.forEach((word, wordIdStr) => {
         if (!reachable.has(wordIdStr)) {
            this.words.delete(wordIdStr);
         }
      });
   }

   markAllDirty = (): void => {
      for (let word of this.words.values()) {
         word.isDirty = true;
      }
      this.setDirty();
   }

   // etc

The Document keeps track of the root (or left-most) word; this is always the start of the document. It also has a Map<WordIdString, Word> which is very handy for being able to cope with words which may be detached from the linked list, but still need to be kept track of. JavaScript’s Map doesn’t support calling an Equals or HashCode method on keys (like Java does), and JavaScript doesn’t support structural equality like Go does. So I see no alternative than to have to convert the WordIds into strings so I can use them as keys in the Map. Bleugh.
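
To illustrate the problem (assuming a WordId is just a clientId and counter pair, as the padding code above suggests):

// Map keys are compared by reference, not by structure:
const m = new Map<edist.IWordId, string>();
const id: edist.IWordId = { clientId: 1n, counter: 1n };
m.set(id, "hello");
m.get(id);                             // "hello": same object reference
m.get({ clientId: 1n, counter: 1n });  // undefined: structurally equal, but a different object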

I’ve decided to try to batch together updates that get sent to the server. This may prove to be a bad idea after user testing: for example it will mean that for a sequence of changes to a single word, the intermediate states of the word will be lost, and only the final state will be communicated to the server. The benefit could be lower load on the server, which will allow the server to scale to more users and more documents. The downside is that undo/redo will also jump over those intermediate states. This could be a problem for the user: a single press of undo could cause several changes to be undone at once. Time and testing will tell. So, when the document is marked as dirty, I set off a timer that calls flush 100ms later. All changes that happen within that 100ms will be part of the same batch of updates.

When that timer fires, I find all the words that are marked dirty, and I set their dirty flags back to false: I am not waiting for any acknowledgement from the server and I want to send these updates to the server only once. Once sent, I call the gc method, which does garbage collection. This ensures the Map mentioned previously only contains words which are still part of the document. I could call gc before finding the dirty words, but the risk there is that if a batch contains both an edit of a word and the deletion of that word (which really means updates to the next and previous pointers of the deleted word’s neighbours), then running gc first would cause the update of the deleted word to be lost, as that word is detached from the document. It’s probably not a huge deal, but it seems better to me to make sure the final state of the word is recorded before it is deleted.

Finally, a few times now I’ve talked about (re-)syncing when the WebSocket connection is established. This is the purpose of markAllDirty: it just marks all the words as dirty (without bumping them), and makes sure the document itself is dirty, which starts the timer that calls flush, which in turn makes sure all the local words get sent up to the server. At the same time, as soon as the connection is established, the server will be sending to the client the current document as the server understands it.
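
To make that wiring concrete, here’s a rough sketch of the glue (assumed names: doc is the Document instance and manager the WebSocketManager; it also assumes the manager sets binaryType to "arraybuffer" on its sockets, and that the bebop-generated Message has a matching decode). The applyUpdates method is covered in the next section:

manager.onopen = (url: string, sock: WebSocket) => {
   // Resync: push the whole local document up; the server concurrently sends its copy down.
   doc.markAllDirty();
};

manager.ondata = (url: string, sock: WebSocket, msg: MessageEvent) => {
   const message = edist.Message.decode(new Uint8Array(msg.data as ArrayBuffer));
   if (message.updates) {
      doc.applyUpdates(message.updates);
   }
};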

Receiving updates

So then the other part of the puzzle is how the client receives and processes updates from the server. There is just one method for this:

   applyUpdates = (updates: Array<edist.IWord>): void => {
      // First pass: decide which updates to accept, creating any unknown words.
      const filteredUpdates: Array<[edist.IWord, Word]> = [];
      updates.forEach((iword) => {
         const wordIdStr = wordIdToStr(iword.wordId);
         const word = this.words.get(wordIdStr);
         if (word === undefined) {
            filteredUpdates.push([iword, new Word(this, iword)]);
         } else if (versionIsGreaterThan(iword.version, word.version)) {
            filteredUpdates.push([iword, word]);
         }
      });

      // Second pass: every word now exists, so the links can be resolved safely.
      filteredUpdates.forEach(([iword, word]: [edist.IWord, Word]) => {
         word.version = iword.version;
         word.letters = iword.letters;

         if (iword.links.previous) {
            word.previous = this.words.get(wordIdToStr(iword.links.previous));
         } else {
            word.previous = undefined;
         }

         if (iword.links.next) {
            word.next = this.words.get(wordIdToStr(iword.links.next));
         } else {
            word.next = undefined;
         }
      });
   }

The updates that come in from the server could contain words that the client doesn’t know about. So I use that words Map to identify whether or not each word is already known. If not, I create the new word and add it to a list of valid updates that need a 2nd round of processing. If the word is already known, I test to see if its version is greater than the local version; if it’s not, then I ignore the update. If all clients and the server behave the same in this regard, then it guarantees they will all arrive at the same document, regardless of the order in which they receive updates.
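
versionIsGreaterThan isn’t shown here; what matters is that every client and the server implement exactly the same comparison. A plausible sketch, assuming the counter is compared first and the clientId is used as the tie-break:

function versionIsGreaterThan(a: edist.IVersion, b: edist.IVersion): boolean {
    // Higher counter wins; on a tie, fall back to the clientId so that everyone
    // picks the same winner regardless of the order in which updates arrive.
    if (a.counter !== b.counter) {
        return a.counter > b.counter;
    }
    return a.clientId > b.clientId;
}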

I need to process the valid updates in two passes because of wanting to get the previous and next fields sorted out. There is no guarantee made on the order of updates within a message from the server. So if there are several new words in that update which are neighbours of each other (for example, another user added a whole sentence), I can’t link together the words via the previous and next fields until I’ve first created all the words. By the time of the 2nd pass, all the words exist and will be in the words Map, I’m only dealing with valid updates, and so I can copy over the version and letters, and sort out the previous and next fields.

It’s entirely possible that due to local edits, a client can receive updates from the server that create new words, but those words are not part of the linked list at all. They’ll still go into the Document’s words Map, but they’ll be tidied up the next time gc runs.

Miscellaneous bits

That’s pretty much it. There’s a main file which is the entry point. It looks up various elements from the HTML, attaches event listeners, creates the WebSocketManager, and gets everything booted and working.

I’m using parcel to compile the TypeScript to JavaScript and link and bundle it all together. I absolutely don’t know what I’m doing in this area, but I spent a couple of hours trying to get tsc, babeljs and webpack to work and gave up in frustration. The state of those tool chains seems absurd to me. Parcel appears to have some bugs, but it did at least work first time for me.

It should all be fairly usable by this point, so if you feel inclined, try it out.


Next week, I think I’ll cover how the server writes updates to disk and manages undo/redo messages and calculations. It’s rather more involved than you might at first think, but kinda elegant!