Reactionary Visions

Fri, 13 Mar 2015

Paxos notes

I've been doing a lot of reading on Paxos lately. There are many papers to read. Some of them are very good, some of them are less so. Paxos is not very complex - there are only four messages types in total in the classic Paxos case (though most implementations will end up with at least six), but it is quite subtle. Very few of the papers speak at all about why things need to be written to disk, or when, for example. The following are some notes of mine, which might be useful to others.

I'm not going to attempt to describe Paxos (though I may do so by accident). The most succinct description I've come across is the first paragraph of the Paxos Made Practical paper by Robbert van Renesse:

Paxos is a simple protocol that a group of machines in a distributed system can use to agree on a value proposed by a member of the group. If it terminates, the protocol reaches consensus even if the network was unreliable and multiple machines simultaneously tried to propose different values. The basic idea is that each proposal has a unique number. Higher numbered proposals override lower-numbered ones. However, a "proposer" machine must notify the group of its proposal number before proposing a particular value. If, after hearing from a majority of the group, the proposer learns one or more values from previous proposals, it must re-use the same value as the highest-numbered previous proposal. Otherwise, the proposer can select any value to propose.

Before going further, you probably need to have read the Paxos Made Simple paper and probably the Wikipedia page on Paxos too.

Far too many papers choose different terminology for the different phases of the protocol. Thus in the following I'm just going to call them phase 1 and phase 2. Phase 1 is the one where: (1a) a proposer invents a ballot number and sends that to a majority of the acceptors; (1b) each acceptor replies iff the ballot number from (1a) is greater than any ballot number it's previously seen in a (1a) message, and that reply contains the ballot number and value of any (2b) message. Phase 2 is the one where: (2a) a proposer asks the majority of acceptors to accept a value with a ballot number; (2b) each acceptor replies (and accepts the value) iff the ballot number is greater or equal to the maximum ballot number it has seen in a (1a) message.

These papers normally talk about being able to tolerate failure of up to F machines (actually, acceptors). For this to be possible, Paxos still requires the availability of a majority of the original set of acceptors. So that means a total of 2F + 1 acceptors (if you have 2F + 1 machines, then a majority of them is F+1 machines, hence F of them can still fail and you have access to a majority of the original set of machines). The importance of a majority is that if you randomly choose two sets of F+1 machines, there will be at least one machine in common (intersection is never empty). The whole point of ballot numbers (and the way they're constructed such that they can never collide) is so that at least one machine in any two majority sets of machines will be able to correctly order any two different ballot numbers. Thus the point of phase 1 is to figure out if your ballot is currently the maximum known ballot number to each acceptor, and if it is, what value you're allowed to present for phase 2.

In the original papers, acceptors only ever reply iff the ballot number from the proposer meets certain requirements. If it does not, the acceptors are silent and the proposer is meant to determine through some sort of timeout that their message has failed. However, there's no need to implement it like this - several systems have the acceptors actively send back nack messages to the proposer. Paxos will work perfectly well on unreliable communication channels, which means it's fine over UDP. However, UDP frequently doesn't work in the cloud due to cloud providers breaking PMTU discovery and not supporting fragmentation. In such an environment, Paxos will not violate its properties, but you might find nothing makes any progress. If none of that applies to you and so you use UDP then you may well need to implement the timeouts too, in case the nacks go missing (if you choose to use that nacks too) and you can't detect that loss. If you're using TCP then you might decide you can just rely on the nacks (application-layer nacks, not TCP), not bother with timeouts, and also watch for the TCP connection dropping. The argument against timeouts is that the machine with whom you're communicating might just be under heavy load. But then again, is that acceptable for the service you're trying to build?

Paxos ensures that once a majority of the acceptors have accepted a value (by accepted, I mean an acceptor has received a phase 2a message which meets the requirements such that it intends to reply with a 2b message), it is impossible to get a majority of the acceptors to accept a different value. This, in combination with the construction of ballot numbers means that if any two acceptors have accepted a value for the same ballot, it is the same value. An entire instance of Paxos creates consensus in perpetuity on one value only. Normally you want to create a stream of values, so you need to run lots of rounds. How you construct your stream is up to you, but simply name-spacing each instance by the event log ID works just fine.

If in phase 1 you established your ballot number is the greatest ballot number ever and then you were just allowed to pick any old value, then it should be clear that anyone else could come along later, pick an even greater ballot number, and change the accepted value. So this is why phase 1b includes the value and ballot number for the most recent 2b message the acceptor has sent. The proposer has to receive 2b messages from a majority of acceptors before it can make further progress. Now let's pretend that each acceptor actually sends back all the ballot numbers and values for all 2b messages its ever sent as part of this instance. The proposer now has the complete history of all values accepted from a majority of acceptors. These histories can be different for each acceptor, though as said above, where any two acceptors accepted a value for the same ballot, the value will be the same. You can therefore create a list of all the accepted values, with the ballot numbers as indices; there may well be gaps in this list. What should the proposer now do? All the proposer can do is to cause the acceptors to append to this imaginary list - the acceptors will only act on a greater-than-ever-before ballot number, which means appending to the end of our imaginary list. We want future proposers to be forced to continue our work rather than diverge. If, for example, we should force the proposer to send a 2a message with the earliest accepted value then that does not cause more acceptors to agree on what their earliest accepted value is. So the only other sane choice is to force the proposer to send a 2a message with the value of the greatest ballot number it knows of that's been accepted. This can't reduce the spread of this particular value: even if the proposer dies right here, the acceptors haven't lost anything. It can increase the spread of this value though by passing this value to acceptors who haven't previously accepted any value for this ballot number (and because we got a 1b reply from that acceptor, we know that our current ballot number is acceptable to that acceptor; from this point, we can realise that we're most likely to make progress if the majority we send our 2a message to is the same as the majority we sent our 1a message to).

Once a value has been accepted by a majority of acceptors, any further valid (by valid, I mean it does not get ignored, or cause a nack) 1a message from a proposer will guarantee the accepted value is returned in the 1b message and must be chosen again by the proposer in its next 2a message (by the definition of a majority, it is not possible for a different value to have been accepted at the same time by a different majority (even with a higher ballot number)). Once a value is accepted by a majority of acceptors, at least one member of that majority is guaranteed to be in any other majority, and will thus present its accepted value in any 1b messages it sends. Acceptors are considered single-threaded in their dealings with Paxos. So when an acceptor is dealing with a valid 2a message and sending a 2b response, it is not simultaneously processing 1a messages. Thus if an acceptor receives a valid 2a message and accepts that value, some other proposer may be sending phase 1a messages or even different 2a messages to this same acceptor, but they have to wait their turn. In both cases: if some 1a or 2a message arrives afterwards, they are ignored (or a nack sent) if their ballot number is too low, otherwise the 1a will elicit a response (1b) containing the newly accepted value. However, the 2a cannot be valid. This is because a proposer would only send a 2a if it got a 1b back, which implies its ballot number is the greatest. But this acceptor has just accepted a value, implying the accepted value's ballot number must be greater still. Thus in any interleaving involving multiple proposers and an acceptor, the only value acceptable will be with the ballot number of the most recently issued 1b message (or put another way, a 2a will only be accepted from proposer ρ if the previous valid message the acceptor received was a 1a from proposer ρ (other than the special case of the first ballot number where phase 1 isn't necessary - covered below)).

All this talk of majorities is often confusing in combination with failures. The simplest formulation is that a proposer should send the 1a and 2a messages to all acceptors, and can make progress when it receives 1b and 2b message responses (respectively) from a majority of them. This is the simplest way of dealing with the possibility of failures. However, it's frequently a good idea to optimise for the common case, which is when failures don't occur. Thus you can pick your majority of acceptors at the start (perhaps randomly, perhaps not) and communicate with just them, sending your 1a and 2a messages just to them, and waiting for all of them to reply. But what then happens if any of them fail? You're now not talking to a majority. The simplest choice here is to abandon the current ballot, pick a fresh majority (this is a majority of the original 2F+1 acceptors - don't reduce the size of your machines just because of a failure; you should only reduce the size of your machines when you know a machine is not going to come back - covered below), and start from phase 1a with an increased ballot number.

When a proposer receives 2b messages back from a majority of acceptors, it knows the value chosen is never going to change for this particular instance of Paxos. The proposer can then disseminate this information as it chooses (this is often called phase 3/learning phase). If you plan it so, you can have the acceptors send their 2b message to not-just-the-proposer: that way, several parties can learn at the same time that a value has been chosen, without needing the extra hop of going through the proposer. You'll need to deal with some learners dying, whilst others don't, and the need to re-synchronise what's been learnt. The re-synchronising will however be easy because you won't have any conflicts - Paxos guarantees us that. So it should just be adding everything that's been learnt together from all available sources.

The proposer only gets to propose its own value in a 2a message when there is no information returned in the 1b messages from a majority of acceptors. Thus if a proposer is trying to get a particular value added to an event log, it may have to take part in several instances of Paxos before it finds one where its able to get 2a messages to a majority of acceptors for the first ballot number of that instance. Note that in light of failures of acceptors it should not immediately skip to the next instance: it could be that it was able to get its value to some acceptors (albeit not a majority), and some other proposer just happened to pick some of those acceptors in its own majority, and succeeded in spreading that value to a majority. It should only move on to a new Paxos instance if it has learnt it has lost the current instance.

In this light, if the proposer fails after having sent some 2a messages, you have no knowledge as to what value is going to be agreed on by later ballots. If the proposer did manage to get 2a messages to the majority then yes, you have consensus. But if the 2a messages didn't reach a majority, then a different proposer can happen to choose a majority not including any of the previous acceptors, and get a totally different value accepted. Or it can choose a majority which has some acceptors in common with the original 2a messages, and thus complete the instance with the value as originally sent. So you can't assume anything about what will happen in an instance when a proposer dies.

If you happen to construct your system such that you know a particular instance will only ever be started by a particular proposer, then that proposer doesn't need to bother with phase 1 at all - it can start with a phase 2a message (you're guaranteeing there's no history of this instance of Paxos for the proposer to learn through phase 1). Of course, again in light of failures of acceptors it may need to revert to phase 1, but in the common case (no failures), this is a useful optimisation that can halve the number of messages.

The original Paxos papers talk about having the acceptors write their state to disk, though it's not really explained why. If you do have the acceptors write state to disk then it means they can be restarted and continue where they left off - they'll have to read their state off disk and send some more messages, perhaps redundantly, and so your proposers will have to make sure they can handle messages they receive (i.e. 1b and 2b messages) idempotently, but you've probably done that anyway. But for this to work also requires that the restart of the acceptors isn't detected as a failure by the proposers, which may mean you're using UDP rather than TCP, which means you've probably gone down the timeout route. All of this means performance is unlikely to be amazing: the extra fsyncs necessary is going to hurt, the timeouts may have to be fairly generous (and how on earth do you decide what amount of time is allowed for an acceptor to restart without considering that a failure?), and detecting actual failures is going to be more complex.

If you decide to use TCP instead, and you decided that a TCP connection dropping indicates a failure at the other end, then it means that an acceptor being restarted will be considered a failure of an acceptor. In which case, it doesn't matter if that acceptor loses its state. Consequently, the acceptors don't need to write anything to disk. To see this, consider the following: due to design, or some programming bug, you have always chosen the exact same F+1 as your majority of acceptors. They have never crashed, and so they contain all the information to date about every instance of Paxos you've run. The remaining F acceptors contain nothing. Suddenly there's a power failure, and F of those chosen F+1 acceptors die. But Paxos can handle the death of up to F acceptors, so you're still fine. But now you have just F+1 acceptors left, and those F+1 acceptors are your only choice, ongoing, for the majority of acceptors. Crucially, that includes the 1 surviving acceptor from the original majority that has all the state. So nothing has been lost, even without writing anything to disk.

Now yes, if you need to be able to power down the entire system and then resume where you left off then you're going to have to write to disk. But that may still be outside of Paxos rather than within. For example, if you're using Paxos to achieve consensus on some stream of commands then the client which is submitting the command doesn't return until: 1) Paxos has successfully added the command to the stream (i.e. the client, playing the role of proposer, has had 2b messages back from a majority of acceptors for an instance of Paxos in which the proposer was able to pick its own value (command) for the 2a message); 2) the command has been processed by some later step (command processor) and some result returned. Now if the acceptors are co-located with the command processor and you need to turn everything off then does it matter if the stream of commands is lost? The client that submitted the command is just going to get an error, even if its command is eventually processed some time later, so do we really care if that command is lost?

It depends. Presumably the state of the command processors is written to disk every time the state changes, and if you have several of these processors then they could run at different speeds. So it's then a matter of: how do you re-synchronise the state of these command processors? Some of the papers show that you could use Paxos for this, which is true, but then you may need Paxos to maintain quite a history and add other messages to the acceptors so that you can efficiently catch up. Or you could build this re-synchronisation through other means, outside of Paxos, and just keep Paxos for speedy fault-tolerant non-blocking consensus.

Update (13th March 2015): There's a case I missed here. Consider that paxos isn't writing to disk. Acceptors send out their 2b messages to learners. Only 1 learner receives at least F+1 2b messages before all the acceptors die. That 1 learner acts on the 2bs, proceeding in some way (which does involve writing to disk) before it too dies. At this point, the system is blocked because >F acceptors have died, and indeed the entire system is now turned off. Everyone now comes back up, except for the one node that received all the 2bs. Consequently, knowledge of this paxos instance is lost, but the system is operational as <F acceptors are dead. If that one node ever comes back, you have divergence. Even worse, that one node that acted on the 2bs may have done enough work to mean the result of that instance became visible to the outside world.

So how do you actually deal with intentional removal (or addition) of acceptors? One simple idea is that the set of identities of acceptors amounts to a topology, and this is versioned (ver n). So if you want to add or remove an acceptor then you calculate what the new topology is, give it an increased version (ver n+1), and now run a Paxos instance proposing achieving consensus on what ver n+1 of the topology really is (I think you want to do this Paxos instance with the new topology rather than the old). All instances of Paxos reference the topology version. In a Paxos instance, once a value has been accepted by the majority, you cannot change the topology for that instance. If you increased the number of acceptors then you could render the old majority so small that none of them are chosen in a later ballot, thus allowing the accepted value to be changed. If you decreased the number of acceptors then you could remove enough of the old majority such that a new majority from the new topology need not include any of the original majority, and again the accepted value could change. Thus the easiest is simply that any 1a message would have to carry the topology version, and once an acceptor has received a valid 1a message, the topology version for that Paxos instance is fixed. So then if an acceptor receives a 1a or 2a message for that instance which has the wrong topology number, it will issue a nack, indicating the correct topology number, hopefully forcing the proposer to discover the correct new topology. That does mean that if >F acceptors fail, a Paxos instance can just get stuck. This, you'll just have to deal with explicitly, having everyone detect this case and completely aborting the instance.

There are further complications here too. Consider the earlier case where the same majority of F+1 kept getting chosen, and then F of them died, leaving just 1 acceptor with all the information. If a topology change now happens, removing all the failed acceptors then all the information held by this special acceptor is certainly not held by a majority any more, and is in danger of being lost. So historical instances of Paxos must be made read only, and before the topology change is agreed, it may be essential to duplicate or otherwise make safe such data.

Hopefully the above is useful to some people, not just myself. I've struggled to find much information regarding the various approaches and tradeoffs when implementing Paxos. There are various papers such as Paxos Made Live which are certainly worth reading. But they don't seem to cover much of what I've tried to cover above. Such papers tend to record their concrete implementations rather than considering various different uses of Paxos and what the consequences are. Comments and corrections are very welcome - please let me know.

posted at: 21:30 | path: / | permanent link to this entry

Sat, 17 Jan 2015

Free speech values

In the aftermath of the attacks on Charlie Hebdo in Paris, there has been some high quality thinking and writing. There's also been some really stupid things said, from the usual protagonists. It's an interesting facilitation that the internet now provides: as I no longer watch any news on TV (in fact I don't watch any news at all), nor subscribe to any newspaper, I'm used to reading articles from a wide range of sources. Equally, it's much easier for me to avoid opinion I disagree (right-wing press) with or trivialised dumbed-down reporting (e.g. BBC news). Because of this ease of reading what you want to (in both the good and bad sense), I thought a lot of the reaction was measured and sensible. Turns out I was just unaware of most of the reaction going on.

Anyway there seems to be virtually nothing left to say on this, so this post is really little more than a bunch of links to pieces I thought were thoughtful and well written.

I am an agnostic. I personally don't believe in any religion but I accept I can't prove that every religion is false and so I may be wrong. I tend to treat the beliefs of any religion as arbitrary and abstract ideas. Thus one place to start is the acknowledgement that the laws of any country or civilisation are as arbitrary and ad-hoc as the rules or teachings of any religion. They are just things that people choose to believe in, or follow, or not violate. In the UK, some of our law is based in Christianity (e.g. thou shalt not murder - though I've no idea whether ideas like that actually predate Christianity; I wouldn't be surprised if they do) though other parts of Christianity are not encoded in law (adultery is not illegal, for example).

Many have careless labelled these attacks as an attack of free speech and thus that the reaction is about defending free speech. As such, much has been written about how it's possible to defend the right Charlie Hebdo has to publish anything they want, even if it's offensive, even whilst criticising their choice of content.

Freedom of speech is, I believe, essential for any sort of democracy to work. This doesn't suggest that if you have freedom of speech then you have a democracy (I don't believe in the UK, or arguably anywhere in Europe or north America there is functioning democracy; merely systemic corporate dictatorships disguised by corrupt elected faux-representatives), but without freedom of speech, you certainly don't have any ability to transparently hold people to account and thus corruption and abuse will obviously take hold. But freedom of speech is a choice, it's not something axiomatic to existence; it's something many people have chosen to attach huge importance to and as such will defend. But just because there are widely agreed reasons to defend the concept of freedom of speech doesn't mean that it's self-evidently a "better" idea than not having freedom of speech. To judge something as "better" than something else requires all manner of discussion of the state of human existence. Consequently, criticising people for not holding freedom of speech in the same regard as we claim to do is no different from criticising people for their choice of clothing, or housing, or diet, or career, or religious views.

Of course in the UK, we don't have freedom of speech as an absolute concept nor does most of Europe. In the UK, "incitement to racial hatred" was established as an offence by the provisions of §§ 17-29 of the Public Order Act 1986, and The Criminal Justice and Public Order Act 1994 made publication of material that incited racial hatred an arrestable offence. It's not difficult to find stories of people getting arrested for saying things on twitter, the easiest is of course making bomb threats. UKIP (an extremist right wing political party in the UK) even managed to get police to visit a man who'd posted a series of tweets fact-checking UKIP policies. Thankfully, he wasn't arrested.

The USA appears to hold freedom of speech as a much more inviolable concept. For example:

The ACLU vigorously defends the right of neo-Nazis to march through a community filled with Holocaust survivors in Skokie, Illinois, but does not join the march; they instead vocally condemn the targeted ideas as grotesque while defending the right to express them.

But whilst the outpouring in Paris and the crowds gathered as a statement of unity were warmly defiant, it is somewhat likely that rather more than physical violence that was being defied, and more than freedom of speech defended by the crowd. As Brian Klug wrote:

Here is a thought experiment: Suppose that while the demonstrators stood solemnly at Place de la Republique the other night, holding up their pens and wearing their “je suis charlie” badges, a man stepped out in front brandishing a water pistol and wearing a badge that said “je suis cherif” (the first name of one of the two brothers who gunned down the Charlie Hebdo staff). Suppose he was carrying a placard with a cartoon depicting the editor of the magazine lying in a pool of blood, saying, “Well I’ll be a son of a gun!” or “You’ve really blown me away!” or some such witticism. How would the crowd have reacted? Would they have laughed? Would they have applauded this gesture as quintessentially French? Would they have seen this lone individual as a hero, standing up for liberty and freedom of speech? Or would they have been profoundly offended? And infuriated. And then what? Perhaps many of them would have denounced the offender, screaming imprecations at him. Some might have thrown their pens at him. One or two individuals — two brothers perhaps — might have raced towards him and (cheered on by the crowd) attacked him with their fists, smashing his head against the ground. All in the name of freedom of expression. He would have been lucky to get away with his life.

It seems that some things you can say and governments will try and protect you. It would appear in much of Europe that blaspheming Islam is legally OK. But, as noted above, saying other things will get you arrested. The French "comedian" Dieudonné was arrested just 48 hours after the march through Paris on charges of "defending terrorism". Whilst not arrested, the UK Liberal Democrat MP David ward tweeted "Je suis Palestinian" during the Paris marches and criticised the presence of the Israeli prime minister, Netanyahu, subsequently eliciting a complaint from the Israeli ambassador to Britain. Of course the "world leaders" who gathered in Paris have a wonderful record themselves on "protecting" free speech.

Charlie Hebdo did not practise satire by mocking select, specific targets, nor did they mock all major religions equally. It's been widely reported that they sacked a cartoonist for making an anti-Semitic remark and that:

Jyllands-Posten, the Danish newspaper that published caricatures of the Prophet in 2005, reportedly rejected cartoons mocking Christ because they would "provoke an outcry" and proudly declared it would "in no circumstances ... publish Holocaust cartoons".

But of course it comes down to the content of the publication. In this case the cartoons exist to ridicule, make fun of and offend members of one of the world's largest religions, Islam, by mocking their prophet Mohammed. As Amanda Taub writes:

Dalia Mogahed, the Director of Research at the Institute for Social Policy and Understanding, explained that Mohammed is a beloved figure to Muslims, and "it is a human impulse to want to protect what's sacred to you".

Mogahed compared the cartoons to the issue of flag-burning in the United States, noting that a majority of Americans favour a constitutional amendment to ban flag-burning for similar reasons: the flag is an important symbol of a national identity, and many Americans see flag-burning as an attack on that identity, or even on the country itself. That's not extremism or backwardness; it's about protecting something you cherish.

In any large group of people, there will be the vast majority of sound mind and thought, and a small minority who are not. This is just the fact that all over the earth, humans are not all the same: there is some variance in health, intelligence, and every other aspect of what a human is. Any large sampling of humans will show the same set of variations. So if you offend a huge group of people, you will offend tall people and short people, rich people and poor people, fat people and thin people, violent people and peaceful people. Unsurprisingly, it would appear that the background of these killers suggests there is little to do with Islam there, and more to do with the their upbringing, family, education and integration with society.

Thus even if you feel Charlie Hebdo's publications of the cartoons served some purpose (given their biased choice of target, their purpose does not seem to be an exercise in itself of freedom of speech), it should be obvious that by offending so many people, they were placing themselves in danger. The same is true of any sustained, systemic, deliberate offence to any of this planet's major religions, races, nationalities or any other grouping of humans which share values. So it becomes a balancing act between how much do you believe in the message you're publishing versus the risk you're putting yourself in. You can view the actions of Edward Snowden in this same context: he felt that the message he was delivering on the abuses of surveillance power carried out by governments across the world outweighed the significant danger he was putting himself in, and so both delivered the message and accepted the need to flee from his country, probably never to return, in fear of the consequences of his actions.

Thankfully, throughout history, there have been people who have chosen to put themselves in the path of great harm (often losing their lives as a result) in order to report, expose, document and publicise matters which the wider public needed to know. Governments, monarchies and empires have crumbled when faced with popular revolt.

So freedom of speech requires consideration. It is perfectly reasonable not to say something because you anticipate you won't enjoy the consequences. Most of us do not conduct our lives by going around saying anything and everything we want to our friends and family: if we did, we'd rapidly lose a lot of friends. The expression "biting your tongue" exists for a reason. Equally, it's perfectly reasonable for a news outlet to decide not to re-publish the Charlie Hebdo cartoons if they fear a violent response that they suspect the local police forces cannot prevent; not to mention just not wanting to offend so many people.

I view as daft the idea that people should never choose not to publish something out of fear. People absolutely should choose not to publish, if they feel the risk to the things they hold dear is not outweighed by the message they're delivering. Everything in life is a trade-off and every action has consequences. Whilst I agree with the right to free speech, that does not imply saying anything you like is free of consequences. If it were, it would require that words have no meaning, and subsequently all communication is void: if anything you say has no consequence then you can say nothing.

I am certainly not suggesting the murders were in any way justified, or that Islam or any other religion or collection of humans should be beyond criticism or even ridicule. At the end of the day, no human is perfect, and as such we can all benefit from a thorough dose of criticism once in a while. Every article I've linked to in this post repeats that such violence, regardless of the provocation, is abhorrent, and I agree with that: murder is never an acceptable response to any drawing, written or spoken word. But that doesn't mean that these events weren't predictable.

Finally then we get to the insanely idiotic response from the UK government. That MI5 should have more powers that they don't need (they probably just need more money), and that we must deny terrorists "safe space" to communicate online. Which means banning encryption, which means it's impossible to use the internet for anyone. The home secretary, Theresa May said:

We are determined that as far as possible there should be no safe spaces for terrorists to communicate. I would have thought that that should be a principle ... that could have been held by everybody, across all parties in this House of Commons.

So of course, if terrorists can't communicate in private then no one can. Quickly, we've immediately gone from lazy labelling of events as an attack on free speech to a knee jerk response of "free speech yes, but you certainly can't have free speech in private, because you might be a terrorist". Again, it's a trade-off. I doubt that having such restrictions on communication will make the earth or this country safer for anyone and of course the impossibility of a controlled study means it cannot be proven one way or another. No security service is ever going to be flawless and from time to time very horrible things will continue to happen. I think most people are aware of this and accept this; we're all going to die after all. The loss of civil liberties though is certainly far more worrying to me.

In theory, I would think these proposals so lunatic as to never see the light of day (it would be completely impossible to enforce for one thing - terrorists along with everyone else would learn to use stenography to encode their messages in pictures of cats, thus rendering their traffic no different to that of everyone else). Sadly, Labour have stated they don't believe their position to be that far away from the Conservatives, which is deeply worrying. Labour don't exactly have a great record on this area either given their previous ID card schemes and the introduction of detention-without-charge. What is needed is some transparency. We need an informed debate, with MI5 and GCHQ providing some independently verifiable facts and figures that demonstrate how they are being thwarted in what they're trying to do. We need to understand properly what the risk to us is, and most importantly we need to understand why these threats exist and what else we can do to make them decrease.

I've never seen it said that any UK Government policy in the last 15 years has made the UK less of a target for such attacks. Maybe we should look at that before we start subjecting ourselves to Orwellian control.

posted at: 17:35 | path: / | permanent link to this entry

Sun, 19 Oct 2014

Anonymity in public life

In an article published in the Guardian yesterday, author Kathleen Hale recounts how her first book got some negative reviews by reviewers on a book review website. One reviewer in particular upset her and Kathleen ends up figuring out the reviewer is using a false identity, finds out who the reviewer really is and confronts her. The piece doesn't read to me like some sort of valedictory "I outed a fraud" type piece (though there are some passages in there which are questionable in that direction) and equally there are several passages where Kathleen expresses deep embarrassment and regret for the course of action she took. This episode, and that article in particular has caused substantial reaction: currently 600 comments on the Guardian article plus several other blog posts. There's no shortage of opinion to be found on Twitter either, as you'd expect.

The course of action that Kathleen took seems to be fairly undisputed as far as I can find. There is some dispute from some of the other blog posts as to exactly what was tweeted and said by whom, and there is dispute over Kathleen's claim that there are factual inaccuracies made in a review of her book. It is not disputed that the reviewer was using a false identity and that the reviewer had at least public Twitter, Facebook, and Instagram accounts under the false identity. The false identity was also a real name (Blythe Harris), by which I mean a name which if you introduced yourself by that name, no one would think you're using a false identity. This is distinct from claiming to be Peter Rabbit, or Buzz Lightyear.

Many people have equated Kathleen's actions with stalking. My dictionary defines the verb to stalk as:

  1. to follow or approach (game, prey, etc.) stealthily and quietly
  2. to pursue persistently and, sometimes, attack (a person with whom one is obsessed, often a celebrity)
  3. , 4,... [not relevant]

The second item there certainly fits. The British legal approach, whilst it gives no strict definition gives examples and guidance:

....following a person, watching or spying on them or forcing contact with the victim through any means, including social media.

The effect of such behaviour is to curtail a victim's freedom, leaving them feeling that they constantly have to be careful. In many cases, the conduct might appear innocent (if it were to be taken in isolation), but when carried out repeatedly so as to amount to a course of conduct, it may then cause significant alarm, harassment or distress to the victim.

I'm glad it includes "social media" there. Some comments have suggested that stalking "in real life" is worse than online. This seems bizarre to me: as if through a computer you are not interacting with other human beings but merely with shiny pixels who have no emotional capacity. "In real life" is everything we know. Whilst we're alive we have no personal experience of anything other than "in real life".

So I'm fairly sold on the whole argument that Kathleen's behaviour towards this reviewer can be considered stalking and as such is reprehensible.

To me, the far more interesting issue is the use of anonymity, false identities and any realistic expectation we have of privacy on the internet. A number of people who claim to write book reviews on such sites have suggested that the behaviour of Kathleen is exactly why they write their reviews under false names. I think there's something of a contradiction going on here.

But let's work backwards. Firstly, Kathleen, through some social engineering (she requested from the book review site the address of the reviewer so that she could post her a copy of the book) got the address of the book reviewer. She then used a telephone directory and census results to identify who really lived there (or likely owned the land). Now the use of the telephone directory seems a bit odd to me: telephony directories map names to numbers (and maybe addresses). Yes, you could use it to map an address to a name but it's very inefficient: you're essentially searching through the whole directory looking for the address whilst the directory is sorted by name, not address. So unless it was a very small telephone directory, I don't really buy that. Using census results is far more creditable: they're public documents and when they're online, they do allow you to search by address. In the UK you can only get access to the raw census details 100 years after the census has been published which, to a high probability, rules it out as a means to tie an address to a person who's still alive. You can get statistics and aggregates from more recent census results but you can't get the raw data. I'm assuming that in the US there's no such restriction on access to raw census data. If there is then I don't understand how Kathleen really managed to get a name for the owner of the property.

Instead, in the UK, if you want to find out who owns some land, you can pay the land registry £3 and they'll tell you. Presumably there are means by which you can legally hide this; I'm sure the rich have figured this out - probably some method by which some fake company in a tax haven technically "owns" the land and as they're registered abroad, they don't have to divulge any further details about that company. So yes, you could argue the Land Registry is profiting from facilitating stalkers, but equally there are a bunch of legitimate reasons to need access to such data and I can't think of any sane way to ensure the use of such a service isn't abused. So from that I conclude that unless the owner is a millionaire, the owner of any land is public knowledge.

The use of social engineering to get the address in the first place is more interesting but very obvious. This sort of thing happens a lot and sometimes to horrifying consequences (e.g. the Australian DJs who phoned up a hospital, pretending to be the Queen and Prince of Wales, enquiring as to the health of the Duchess of Cambridge. The nurse fell for the hoax and put the call through. Three days later, the nurse committed suicide). As a species we are not good at taking the time to verify who we're talking to or why. Whilst (hopefully) most of us would hang up if our bank apparently rang us and then asked for our credit card details "for security" this is largely only because it's in the bank's interest (in terms of cost of insurance) to reduce fraud, so they've trained us as such. But in all sorts of other scenarios we implicitly trust people we've no real reason to. A simple example: ticket inspectors on public transport. They may be wearing the uniform, but it could be faked. With their travel-card readers they could be seeing who has the expensive yearly travel cards, scanning the unique numbers from them and then using them to program up fraudulent cards. The crypto on those things is notoriously weak. Has anyone ever requested some means to verify the identity of a ticket inspector? And even if you could, how do you know they're not crooked regardless?

So phoning someone up, impersonating someone else, or pretending to have valid reasons to request the information you're requesting is always likely to work. It might be illegal in some cases, but it's certainly human nature to try to be helpful and if you're given a plausible justification, on what basis could you refuse the request unless it's contrary to some sort of company policy? In this case, if you're concerned about anonymity, wouldn't you be concerned about this possibility, and make use of an anonymous mail box?

Article 8 of the European Convention on Human Rights guarantees an individual's right to respect for privacy and family life, including correspondence. Is privacy the same as anonymity? No, definitely not:

In conflating anonymity and privacy, we have failed to see an important factual difference between them: under the condition of privacy, we have knowledge of a person’s identity, but not of an associated personal fact; whereas under the condition of anonymity, we have knowledge of a personal fact, but not of the associated person’s identity

The vast violations of our lives by state surveillance as revealed by Snowdon over the last year demonstrates the whole-scale collation of everything we do online and off by our governments. This is both being able to observe an action and identify the individual who caused it (thus we have no hope of performing any action anonymously), and being able to observe an individual and know the actions they take (thus no privacy). I can't work out whether the ECHR has anything to say on a right to anonymity; I get the sense that it doesn't try to protect that. So that's basically saying: "the state shouldn't record your every move (as that's an invasion of privacy), but moves that we're interested in, we can know who did them". Of course, we now know they're recording everything anyway.

We also know that computer systems can always be hacked into - there is no real security anywhere. Given a skilled and sufficiently funded adversary, any computer system connected in any way to the internet can be hacked into. Why? Because humans wrote the software that runs on those computers and humans are incapable of writing bug-free software. Look at all the large scale data breaches in recent history. Nothing is secure.

So we have laws that seem to try and protect privacy, but they're violated by our own governments, and in any case, we have countless examples of our inability to store any information securely. So is there really any hope to be able to exist with anonymity on the internet?

As ever, it depends who your adversary is. If your adversary is a government (either your own or some foreign government) then no, you have no hope. If it's a previous partner of yours who has no particular computer training, then yes, you're probably going to have a reasonable chance of being anonymous for a while. But you need to read up on this and think hard: it's not a trivial undertaking. There are some good guides as to how to do this, but:

All writers - whether writing under their own names or not - should be aware of the risks they may incur by hitting 'publish'.

What is the effect of hitting "publish"? It's to put more data points out there which may lead people to be able to identify you. The fewer data points out there, the better. So coming back to our book reviewer, if you want to review books anonymously, and if your justification for acting anonymously is to avoid being stalked by authors who don't like your reviews, then why put so many data points out there? Why have the Facebook page, the Instagram profile with the faked photos, the Twitter account? Why give your real postal address to the book review club knowing they're going to post books to it and might conceivably give your address out to other people?

The social media accounts in particular I find most odd. If you want to review books then review books. Build your following, your reputation and cachet on the quality of your reviews. If I'm looking at a book review I really don't care where you went on holiday, what your tweets are, or how many pets you have. Putting that information out there undermines your entire justification for being anonymous: if you want to be anonymous (i.e. you don't want people to find out who you are) then why are you putting so much unnecessary information out there that may allow people to figure out who you are?

Equally, use a name that clearly communicates to me you're trying to be anonymous: call yourself TheBookReviewer53, DostoyevskyLover or OrwellWasRight. Doing so doesn't lessen the validity of your opinions on your chosen subject and is more honest with people reading your reviews: it's overtly saying "I have reasons to want to exist anonymously on the internet". It reveals nothing more about your real identity either: regardless of the obvious fictitious-ness of your online persona, if you can be found, you can be found.

Researchers show that four data points about a person’s location can identify that person with 95% accuracy. FOUR. You think you can tweet anonymously from your phone? You think apps like Whisper allow you to act anonymously? As with pretty much everything related to the internet and computing, unless you've spent the last 20 years of your life working with computers, studying computers and thinking very hard about threat models and what data you're putting out there, and are utterly paranoid, you basically haven't got a chance. Do you turn off wifi on your phone when you leave the house? You should. You trust that USB pen drive you're transferring documents on? You shouldn't.

Finally and most obviously, any attempt at anonymity clearly doesn't insulate you from the law. As members of various hacking groups such as lulzsec found out, you always can be found out by law enforcement agencies. Yes, you might be able to make it difficult for a poorly funded person to come after you for libel (which is really just an indictment of the disgusting relationship between justice and money) but it's quite a risk to take. If you wouldn't put it in print with your real name attached, you're placing an awful lot of trust on your ability to maintain your anonymity against an adversary you probably don't know as well as you need to.

posted at: 16:00 | path: / | permanent link to this entry

Sun, 05 Oct 2014

Programming in the real world

One of the things that annoys me about Object Oriented Programming is how it's often suggested that it models the "real world". Frequently tutorials will start with creating an object modelling a chair, and through inheritance you'll be able to build up composable aspects of chairs: different numbers of legs, different colours, different designs. Sometimes they use tables rather than chairs. This is lovely, but it actually has everything to do with data modelling through inheritance, decomposition, abstraction and encapsulation, and almost nothing to do with Object Orientation: the key is that these chairs have no modifying methods on them. If they have any methods at all then they'll be for things like getting the number of legs or the colour, or volume or something - something that is basically fixed once the object is instantiated. At this point in such tutorials I'd probably claim this is not actually programming yet: all that's been achieved so far is that we've assigned some semantics to some numbers held in memory and we can write some numbers in memory. Programming is when we manipulate numbers: that involves reading and writing numbers.

The problem then is that Object Orientation immediately stops being about modelling the "real world" as soon as we can modify memory. If we think about how we actually would go about getting a chair made for us, it could go a bit like this:

  1. Go see your local carpenter,
  2. Have a discussion with them about the style and type of chair you'd like,
  3. They make the chair and give it to you in return for payment,
  4. You take the chair home,
  5. You decide you don't like the colour so you take it to your garage and repaint it yourself.

It should be clear that the inanimate object (the chair) is the odd one out here. Everything else is done by actors that have their own state, mainly act asynchronously, and can communicate with other actors through protocols - protocols that do not involve sharing mutable state (e.g. if I say something to you, that speech is immutable; you can't change what I've said (though you could choose to mishear me!)). At no point is any state of any actor actually exposed to another actor: I may share with you what I'm currently thinking, and you can try to influence me, but we don't exactly need a mutex around memory in my brain because YOU'RE NOT GETTING IN THERE!

If you tried modelling this sort of thing through Object Orientation without actors then you'd end up with your own thread doing all the work: it'd be you, it'd be the carpenter and it'd be the chair, maybe all at once. If your carpenter is in fact a growing business with a receptionist, a design team and a billing department your thread would be playing those roles too and would probably have to use locks to avoid unexpected interactions with other threads doing the same commissioning-receptioning-designing-constructing-delivery-repainting dance. And all the time, whilst you're doing the carpentry yourself, you'd could easily have your own thoughts, feelings, aspirations and regrets all on the same stack for your carpenter-alias to mess with.

Thus Object Orientation causes multiple personality disorder.

So in my view, the way Object Orientation gets introduced tends to be more like "useful tools for modelling data". But the OO approach to manipulating that data goes wrong as soon as you try to model the animated real world. Firstly it has nothing to say about separating out threads to self-contained actors (but try this in a language or on a platform without green-threads, or without the ability to preempt threads and you can quickly hit pain), and secondly even if you do have actors, OO encourages the sharing of mutable data rather than passing around either immutable data or copies of data. Yes, good programming discipline can result in sane designs and a successful result, but it's not a core aspect of the OOP mantra.

So, OOP has nothing good to say on manipulating data at all - it either says nothing or it encourages silly ideas like using locks. The data modelling bits are fine, but I think they're a broader concept beyond the confines of OOP. What else does OOP get you? An arbitrary restriction on the receiver of any method. That's about it. It's thanks to this restriction that writing combinators like cons on a list library in an OO language is really painful.

This week Erik Meijer wrote an article called The Curse of the Excluded Middle: "Mostly functional" programming does not work. After an introduction, we get onto The Problem, which (paraphrasing) is that languages that are mainly imperative but offer some features from pure functional languages are not as safe as pure functional languages.

The first three examples, from C# are certainly surprising to me (I barely know any C# at all though). The first two problems come from trying to compose side-effecting stuff with laziness. In the first case it's not clear that the problem is with the IO operation (printing things out) or actually with the laziness, but more the odd behaviour of the Where operator (presumably the implementation of Where doesn't know that a Cartesian product isn't necessary, but surely any normal monadic/list-comprehension implementation wouldn't have this problem?). The second case is certainly the terrifying composition of laziness with throwing exceptions and thus the exception having the potential to pop out anywhere where the lazy expression gets forced. However, if you know the Select operator is lazy, it's not really that surprising. It's arguably piss-poor language design that there's nothing there to help you, but C# doesn't have checked exceptions; apparently programmers don't like having to deal with errors so you reap what you sow. The third case is how C# has a nice using feature which binds a resource to a lexical scope. But if you construct a closure capturing the resource and then send it out of that lexical scope then using goes wrong (it will still discard the resource even though there's a reference to it within the closure which remains in-scope). This is certainly piss-poor language design: if the closure captures stuff from your lexical scope and you're not reference counting (or equivalent) your lexical scope then YOU'VE DONE IT WRONG. This is as bad as in C allocating stuff on your stack and then returning pointers to it.

Next he moves on somewhat tangentially to the point that if object creation is an observable action then you can't optimise it out. I'm not sure anyone outside a pure functional language specialist would ever want object creation to be optimised out, but the point is that if your constructor has side effects or can in any other way be observed then you can't have your language runtime do memoization of object creation. Doing side effects in object constructors has long been discouraged: I first read that back in the Effective Java book about a decade ago and I'm sure it wasn't exactly a ground-breaking piece of advice then.

So far then we have that side effects which are untracked have the potential to be bad: whether it's printing things out, or throwing exceptions, or discarding resources early, or preventing compiler optimisations. But next I feel the article goes a bit wrong. He first moves onto how channels in C⍵ can store state so they're not pure either, thus bad. And then goes onto how in Erlang you have the same problem as you're just modelling mutable state in actors:

Note how this Erlang actor basically encodes an object with dynamic method dispatch using the pattern-matching, message-sending, and recursion primitives of the language, which you may happily leverage to implement mutable references, sabotaging the fact that the Erlang language does not natively expose mutable state.

This is wrong: you cannot implement mutable references in Erlang. Data is immutable in Erlang so if you send some value out of an actor, you are sending that value. Not a reference to a value or variable. Even if you create a closure and send that out of the actor, the closure is capturing those values as they exist at that point in time. If you have received a value sent to you from an actor, you may use it to create other values, but doing so does not affect the "original", and similarly, the actor itself can continue to modify its own state, but it does not affect the values it sent to you. Yes, you can use Erlang actors to model objects. But an actor's own modifications of its state cannot be observed as side effects on values you've previously retrieved from that actor, and vice versa.

The reference you have to an actor is a process identifier (also immutable) which does not present any information itself about the state of the actor. Through that, you can send messages to an actor and test whether or not the actor is still alive, but that is all. And in any case, where has the sudden objection to mutable state come from? State is just a catamorphism on prior inputs. State is not the problem: unconstrained side effects are the problem. Certainly sharing mutable state is a problem (and you could argue that mutating shared state is a side effect and that it should be tracked statically), but Erlang does not allow for that.

He may have been better off going for an example of opening an file, sending the file handle to another process and then closing the file handle before it's been used (i.e. the same as the third C# example). Except:

  1. All file operations can return an error anyway so handling errors in such code is completely normal;
  2. In Erlang a file handle is normally an actor itself, so what you're doing is passing around a process identifier. Sending messages to a dead process (once the file is closed) is a normal activity and you can detect if the process has died in normal ways;
  3. If you bypass such normal file handling for performance reasons and open the file in "raw" mode then Erlang has a light form of object capabilities in which only the process that opened the file is allowed to use the file handle, so again the use of the file handle would error predictably;
  4. The language doesn't have the same C# feature for discarding resources once you return out of a lexical scope. Consequently closing a file is an explicit operation and given the asynchronous concurrent mindset one develops when working in Erlang, it's very likely you'll realise how odd it is to be closing a file handle whilst there's some closure out there which may not have been run yet.

Beyond this, he introduces the Haskell type system and explains that it captures side effects statically. As a result, by bowing to the demands of the type checker, it offers you a proof that if such effects occur, your program will handle them: exceptions will not go uncaught, IO operations are only permitted where the semantics lead to expected outcomes, resources are not used after they're discarded and the compiler can use all these proofs to do all manner of optimisations to your program.

These proofs can certainly be very valuable (though they are no substitute for disciplined, high quality design and careful implementation). Obviously, they don't capture everything though. Particularly relevant for concurrent and distributed programs, they don't capture sufficient side effects to allow for a proof of the absence of deadlocks. Haskell standard libraries contain channels and semaphores which can easily be used to sporadically end up with a deadlock between processes. A deadlock is definitely a side effect: the effect is the program probably stops working. The cause is an insufficient use of locks to control the scheduler (be it scheduling of OS threads or language runtime scheduling of green threads).

More broadly, the proof a type checker offers is that the specification you've provided (type signatures) is not violated by its inferences about your code. Until the type checker allows "and makes progress" as part of a specification, Haskell itself is no safer than any other language that claims to be "mostly functional".

posted at: 10:22 | path: / | permanent link to this entry

Sat, 27 Sep 2014

Reading highlights this week

I remember when I first went to University, coming across people who were both clearly extremely expert in their fields, from whom I wanted to learn, but were also religious, and how this utterly baffled me. At that point I would cheerfully describe myself as an avid atheist. My ignorance and naivety was somewhat extensive.

Over a decade later I like to think I have a more nuanced view. The most recent war in Gaza led, obviously, to vast amounts of suffering but some excellent articles on the subject (this one by Hadley Freeman in particular) helped me see perspectives more clearly and articulated how crucial it is to be precise with criticism: are you criticising a religion, a people, a government, a policy or something else? Nothing is ever black-and-white and it seems increasingly important to anticipate the consequences of an ill-thought-through comment or reaction. A good example of that is George Galloway's comments this week in the debate about this country once again getting involved in Iraq. On the face of it, and certainly without being remotely well-enough informed to evaluate the accuracy of his claims, if his claims on the size and makeup of ISIS/ISIL are true then there seems little likelihood that the bombing campaigns being discussed will be effective, and quite likely counter-productive. But all of that got lost due his description of Iraqis as quiescent. The way in which that description was seized upon by other MPs and the resultant media storm resulted in the over-shadowing not just of the rest of his contribution to the debate, but also of other important aspects of the debate, such as the resignation of Rushanara Ali (Labour's Shadow Minister for Education), citing once again the lack of a credible long-term plan for the region and our involvement.

Addressing the broader and somewhat more abstract issue is this enlightening article by Karen Armstrong. Again, I'm not claiming to be expert in the area, merely I found the article very educative. It had barely occurred to me that the western world's separation of the secular from the sacred was firstly such a recent occurrence, and secondly that it arose from a specific set of circumstances. There is no implicit reason why separation of state from church is an inevitable or even likely happenstance (to me, this reminds me of the question "if humans evolved from monkeys, then why can't we find monkeys still evolving into humans today?", to which the answer is "the circumstances are not right for that to occur"). The fact that the English word "religion" can't really be translated accurately into other languages (especially not languages that predate English such as Greek or Latin; as historically faith is all encompassing of life, not merely a private affair as we treat it today in the west) starts to show quite how odd the separation of secular from sacred in the modern west really is.

More interesting still is the observation that in the west, belonging to a Nation has in some ways subsumed the role of belonging to a Religion, only apparently with more positive overtones: we consider it almost reprehensible to die for your religion, but honourable to die for your nation. It would seem the concept of even belonging to a nation and having any sense of greater community outside your immediate surroundings only came about with the increased ability of governments to engage with (or intrude upon) their citizens. Before that point, presumably with church attendance widespread and frequent, one's interaction with "the wider world" was through the representative of the church. This would seem to explain a lot about why governments of the past sought the blessing of their nation's church for particular courses of action: maybe the church was seen as the bridge between the government (or monarchy) and the people. The whole article is worth a read.

posted at: 15:08 | path: / | permanent link to this entry