"You shouldn't write systems that violate laws of physics"

The late Joe Armstrong, one of Erlang's inventors, gave a talk in 2014 called "The Mess We're In". It's a physics-driven look at the failures of modern computing, and it's worth watching in its entirety, but I especially enjoyed this bit (23:49), approximately transcribed here:

I'm an ex-physicist, and I'll just remind you of some laws of physics. Causality. A cause must always precede its event. Something happens, and something happens later because of it. And how things happen is by propagating rays of light or sound or something. We don't know something's happened until we get a ray of light, or something else conveying information to us.

Now, a lot of systems actually break the laws of physics.

This notion that you can have consistent data in two different places; it's the Byzantine generals problem. Suppose I'm one of two computers, and I tell the other "Hello, the value of X is five"; can I assume that it knows that the value of X is five? Well no, because I don't know if that message got there. So the other computer can send me a message back, "I know that X is five". Can this computer now assume that I know that it knows the value of X is five? Well, no, because it doesn't know that that message has got there.

So it won't know that that message has arrived unless I send it a message that it has arrived, and so on. That's the Byzantine generals problem.

So, you can't actually replicate data because you have different knowledge of the system at different points. Despite this fact, we build systems with two-phase commit and forget about that. You've got two-phase commit? "Yeah, it works!" Well, two-phase commit breaks the laws of physics. So it's not very good...

...you shouldn't write systems that violate laws of physics.

Of course, I'm familiar with the Byzantine generals problem (strictly speaking, the two-party scenario Armstrong describes is usually called the Two Generals problem, a simpler relative of it). In short, agreement between separate systems is (really) hard, and it's impossible to send a message over an unreliable network and know with certainty whether it was received or lost. It's a challenge that any full-stack developer will encounter with some regularity, and there are plenty of mitigation tactics (frequent polling, timeouts, consensus mechanisms, etc.). Although it's an old problem, Armstrong's perspective was new to me:

The Byzantine generals problem isn't a logistics problem, it's an attempt to violate causality.

The key insight is that any attempt to synchronize state between two systems is, by definition, a futile challenge against the essential forces of the universe. Communicating state with absolute certainty is logistically impossible, and duplicating state with absolute certainty is physically impossible. Any distributed system, whether that be a database, a Bluetooth-controlled wearable, or a web application with a backend, will always be fighting against the same challenge: coordination and consensus that require agreement on the timing and content of communication are limited by the nature of both timing and communication. Perhaps knowing the inherent difficulty of the problem will be a comfort the next time you encounter such a frustration.
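As a toy illustration (not something from Armstrong's talk), here is a minimal Python sketch of why the sender can never be sure. It assumes a hypothetical link that independently drops each message and each acknowledgement with probability `loss_rate`; the names `round_trip` and `tally` are made up for this example. From the sender's side, "no ack" looks exactly the same whether the message was lost or only the acknowledgement was.

```python
import random


def round_trip(rng, loss_rate):
    """Simulate one message plus its acknowledgement over a lossy link.

    Returns (sender_saw_ack, receiver_got_message).
    """
    message_arrived = rng.random() >= loss_rate
    # An ack can only be sent, and so can only arrive, if the message did.
    ack_arrived = message_arrived and rng.random() >= loss_rate
    return ack_arrived, message_arrived


def tally(trials=10_000, loss_rate=0.2, seed=0):
    """Count outcomes, split by what the sender can and cannot observe."""
    rng = random.Random(seed)
    counts = {"acked": 0, "silent_but_delivered": 0, "silent_and_lost": 0}
    for _ in range(trials):
        saw_ack, delivered = round_trip(rng, loss_rate)
        if saw_ack:
            counts["acked"] += 1
        elif delivered:
            # The receiver has the value, but the sender sees only silence.
            counts["silent_but_delivered"] += 1
        else:
            # The receiver has nothing, and the sender sees the same silence.
            counts["silent_and_lost"] += 1
    return counts


if __name__ == "__main__":
    print(tally())
```

The two "silent" buckets are separate only because the simulation can see both ends at once. A real sender observes one bucket, silence, and has to pick a policy (retry, give up, ask again later) without knowing which case it is in.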

More than just a new perspective, Armstrong also provides a call to action. "You shouldn't write systems that violate laws of physics." That certainly seems like a reasonable thing to say, doesn't it? After all, the laws of nature do tend to prevail when opposed. And yet, state synchronization is a critical element of our world. Stock markets and MMOs both serve as evidence that low-latency, low-failure, high-volume distributed systems are possible, and indeed, economically valuable. Consistency across databases, despite being impossible, serves as the keystone of most low-latency intercontinental networks (like the DNS server that brought you to this page!).

Instantaneous replication of state is impossible, but there's great value in getting as close to that limit as possible. I believe the principal takeaway from this realization is that when designing distributed systems, fault tolerance should be the primary feature. Anything more may be unachievable, but anything less would be an assumption that your code can violate causality.
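One common way to treat fault tolerance as a feature is to retry on silence and make the receiver safe against duplicates. The sketch below is a hedged illustration of that idea, not a production design: `Receiver`, `lossy_call`, and `send_with_retry` are hypothetical names, and the "network" is just a random number generator that drops requests and acks.

```python
import random


class Receiver:
    """Applies each request at most once, keyed by a request id."""

    def __init__(self):
        self.total = 0
        self.seen = set()

    def handle(self, request_id, amount):
        # A retried request carries the same id, so it is acknowledged
        # again but not applied again.
        if request_id not in self.seen:
            self.seen.add(request_id)
            self.total += amount
        return "ack"


def lossy_call(rng, loss_rate, receiver, request_id, amount):
    """Send one request; return "ack", or None to model a timeout."""
    if rng.random() < loss_rate:
        return None  # request lost before reaching the receiver
    reply = receiver.handle(request_id, amount)
    if rng.random() < loss_rate:
        return None  # ack lost; the receiver has already applied it
    return reply


def send_with_retry(rng, receiver, request_id, amount,
                    loss_rate=0.3, max_attempts=20):
    """Retry until an ack arrives; return the attempt count, or None."""
    for attempt in range(1, max_attempts + 1):
        if lossy_call(rng, loss_rate, receiver, request_id, amount) == "ack":
            return attempt
    return None  # gave up; the sender still does not know the outcome


if __name__ == "__main__":
    receiver = Receiver()
    attempts = send_with_retry(random.Random(0), receiver, "req-1", 5)
    print(attempts, receiver.total)
```

The design choice here is to stop asking "did my message arrive?" and instead make it harmless to send the message again. Note that the give-up branch still returns with the outcome unknown; retries shrink the uncertainty, they don't remove it.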

Written 2022-07-12