Sobering up on Blockchain & AI

Aug. 6, 2020

Cutting through the hype surrounding AI and blockchain. In this article we take a look at what machine learning really is and at the advances in deep learning, and we unpack blockchain technology.

For a few years now, we’ve been hearing that “blockchain” is going to revolutionize the world, and that AI is either going to make our lives immeasurably better, or otherwise lead mankind beyond an event horizon where the future is indeterminable. The technological singularity. Moreover, if both of these technologies are such game changers, then surely combining them must be something even more profound! Right? No. Slow down.

Let’s back up a bit, and take a look at what these technologies really are, and what problems they aim to solve. Hopefully, by the end of it, we’ll have a sober, more realistic view of these, with all their limitations, so that we can make some real progress in adding value, and dispense with some of the hype.

Let’s start with AI.

Artificial Intelligence has been around for a very long time. Decades. There are, broadly speaking, two approaches: Expert Systems, and Machine Learning.

Expert Systems aim to capture human knowledge using large databases of hand-crafted facts, such as London is a city in England, and the Thames is a river in London, and A river is a body of water and A body of water is a geographical feature, and so on up the hierarchy until you hit something completely abstract like A geographical feature is a Concept. This database of facts — known as a knowledge base — supports an engine which can be queried to reason about those facts.

This inference engine can infer that if the Thames is a river in London, and London is a city in England, then the Thames must be a river in England. You’ve probably heard of “the semantic web” and “ontologies” from the Web 2.0 era. That’s what that was all about: capturing human knowledge in a structured way, so that an inference engine could reason about it. Did it change the World? A little. Wikipedia structures their knowledge that way. Researchers use it. Google does too, behind the scenes.

Then there’s Machine Learning. Machine learning approaches AI from another angle: take an input, like the text of an email, and a category label, like “spam” or “not-spam”, and teach the machine to tell which is which by showing it a bunch of examples. This has been around for a long time too. Forty years of research lies behind some classic staple algorithms, like support vector machines. All this is also much less magical than it sounds. There is a plethora of different approaches, and almost all make heavy use of mathematics (genetic programming algorithms being exceptions).

The trick lies mainly in figuring out how to represent your input as numbers — such as mapping the text of an email to the probability of each word occurring given that the email is either spam or not-spam — and then learning which combination of those numbers maps your input to your categories, using some function which rewards correct results and punishes incorrect ones. This process produces something called a model, which is typically just a data structure that encodes and stores internally a bunch of those numbers (probabilities, weights, and so on) which worked best to get from the input to the desired output.
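
To make that concrete, here’s a minimal sketch of the idea, assuming scikit-learn and a toy, made-up dataset (neither is from the original article): the text of each email becomes a vector of word counts, and a naive Bayes classifier learns which counts predict which label.

```python
# Toy spam classifier: represent each email as word counts,
# then learn which combinations of counts predict "spam" vs "not-spam".
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "win a free prize now",           # spam
    "meeting rescheduled to monday",  # not spam
    "claim your free money",          # spam
    "lunch with the team tomorrow",   # not spam
]
labels = ["spam", "not-spam", "spam", "not-spam"]

vectorizer = CountVectorizer()       # maps text to numeric feature vectors
X = vectorizer.fit_transform(emails)

model = MultinomialNB()              # the "model": learned per-word probabilities
model.fit(X, labels)

print(model.predict(vectorizer.transform(["free prize money"])))  # expected: ['spam']
```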

I’ve described supervised learning for classification problems here. There’s also regression, where the output is a real number, and not a category, but many of the same principles apply. And there’s unsupervised and semi-supervised learning. The differences don’t matter that much.

More recently, however, some smart people figured out that it was practical to feed the output of one model into the input of another, and to stack these on top of each other. In particular, we’ve been stacking artificial neural networks.

The benefit of doing this is that it frees the practitioner from thinking too hard about how to extract and encode useful features (like detecting edges and high-contrast areas in images) before feeding the data to the model. The networks figure out these abstractions themselves. Thus Deep Learning was born. Everyone went nuts. There has been real innovation here, but if we’re honest, it’s mainly Moore’s Law which has helped us. That, and every gamer’s need for faster graphics processors. Nvidia did some great things too. Okay, so it is all pretty cool. I’m not knocking it.

Deep learning has yielded impressive results in computer vision tasks… but… these models take ages to train. Months, in some cases. On specialized hardware, usually. Still, they’re very cool. If you’re Google. Or Facebook. For most of the industry, however, it’s still classic bread-and-butter algorithms all the way. Support vector machines are still widely used. Gradient boosted trees and random forest algorithms are popular. Why?

The answer is simple: if I can get an off-the-shelf implementation of a classical, well-understood algorithm which gives me 95% accuracy, and which I can train in minutes or hours on commodity hardware, do I really need something which will give me 98% accuracy after training it for 5 days on 8 high-end GPUs? To get those few extra percentage points, the engineering cost is disproportionate, and in most cases the classics are good enough.

Okay, so that’s AI. More specifically, it’s narrow AI. Deep Learning has become a thing. It’s cool. It’s not practical for most problems. Years after deep learning appeared on the radar, most of us are still using other algorithms.

What about strong AI? IBM’s Watson playing Jeopardy against humans? Google’s bot which can call up a hair salon and make an appointment for you? While these systems may seem impressive, they’re no more magical. Watson is essentially a hybrid of expert systems and machine learning (they bootstrapped it with a copy of Wikipedia and a search engine). Watson, Siri and Alexa are good at language, but they’re still a long way from the Skynet super-intelligence featured in The Terminator. There are even some strong arguments made for why it isn’t possible, even in principle, to build a consciousness on current computer architectures, no matter how fast they get.

Alright, let’s look at “blockchain”.

I quote “blockchain” like that because it’s really a misnomer. Not only is the data structure — which is the only part deserving of the name — not the key innovation; the whole problem space is characterized more by the problem of consensus than anything else, and the data structure solves only a part of this problem, namely: recording the outcome, once consensus has been reached, in a verifiable way.

The bigger question is, what problems does the World have, for which this technology is a solution? To answer that, though, we need to do some unpacking, because “blockchain” isn’t just one thing. There also isn’t a single thing you can point at, and say: “blockchain solves this, and nothing else can.”

We’ve had event log replication for a long time. We’ve had fault-tolerant networks, transactions, even transactional messaging, since at least the 1980s. We’ve had domain-specific languages, distributed applications (remember CORBA from 20 years ago?), and consensus algorithms. Merkle trees, which form the basis of the blockchain data structure, were first patented in 1979. We’ve been doing all of these things for decades. So what’s all the fuss about?
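
Since Merkle trees come up again later, here’s a minimal sketch (my own illustration, not from the original article) of how a Merkle root is computed: each leaf is hashed, each parent hashes its two children, and any change to any leaf changes the root, which is what makes the log tamper-evident.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute a Merkle root by repeatedly hashing pairs of nodes."""
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:      # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

transactions = [b"alice->bob:5", b"bob->carol:2", b"carol->dave:1"]
print(merkle_root(transactions).hex())
# Changing any transaction changes the root hash.
```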

You could argue that, as with most innovation, it’s the combination which adds value. More often, though, you’d hear about solving the “double spend problem”. This is the problem that information, and therefore data, doesn’t behave like physical objects: it duplicates itself when transmitted. If I give you a physical object, say a book, then I no longer have it. You do. The book doesn’t just clone itself at zero cost. Information doesn’t behave like this.

So, how do I give you digital money, which is just data, in such a way, that I cannot also give it to someone else, thereby giving me an infinite supply, and destroying any value the currency might otherwise have?

Well, it’s not hard to do if you have a centralized system, like the database at a bank. You subtract money from one account, and add it to another, and you do this in a single transaction. The jargon here is that these two operations — debiting one account and crediting another — are atomic. In other words, either both operations succeed together, or both fail together, so that you cannot have a debit without a corresponding credit, and vice versa.
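
As a sketch of what atomicity looks like in a centralized system — using Python’s sqlite3 module purely for illustration; the accounts table and the amounts are made up — a transfer might look like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Debit one account and credit another as a single atomic transaction."""
    with conn:  # commits on success, rolls back if anything inside raises
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))

transfer(conn, "alice", "bob", 25)
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# [('alice', 75), ('bob', 25)] — the debit and credit either both happen or neither does.
```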

It gets more tricky if you want to do this in a decentralized way. Moreover, you need to deal with the fact that, because it’s money, there’s a strong incentive to try to cheat the system. Enter consensus.

The problem of getting computers on a network to agree on a value has been around for as long as there have been computers and networks. There has been intensive research in this area for a long time, and all of it occurs under the shadow of a famous problem: The Fischer-Lynch-Paterson (FLP) impossibility result.

In 1985, a seminal paper was published, called Impossibility of Distributed Consensus with One Faulty Process, by Michael Fischer, Nancy Lynch and Michael Paterson. As you can tell from the title, this presented a pretty serious problem. The authors proved that in an asynchronous network, no deterministic algorithm can guarantee consensus if even a single node may crash. Game over? Well, not quite.

Consensus algorithms have been around for a while as well, the most famous being the Paxos algorithm, by Leslie Lamport. The idea is that, while it remains impossible to guarantee consensus is reached under all conditions, you can get pretty close in practice if you’re willing to make a trade-off. However, consensus algorithms require a lot of coordination. The whole problem boils down to causality tracking; in other words, deciding what happens before what. To understand why causality matters, consider that the event of money being deposited in my account must precede me spending that money. If every node keeps a ledger of all transactions, then they must all agree on exactly the same ordering of these events. Eventually.

Now, it may not be obvious why the ordering gets out of whack in the first place. Can’t computers just send messages to each other in a sane order? Well, the issue lies in the fact that asynchronous networks, while they can guarantee ordering of messages sent from one node to another, make no guarantees about what happens at the receiving side when a node is receiving messages from two peers which are operating concurrently. There are no constraints imposed by the network on how messages arriving from two concurrent peers should be interleaved; there’s no overall causal ordering. That is the definition of concurrency.

To understand this in more detail, we first need to look at what it means for a network to be asynchronous. It is important to note that the following are simplifying assumptions; that is to say, they provide a theoretical network model and framework in which to reason about the network. The general requirements for a network to be asynchronous are as follows:

  • There are no known bounds for how long it may take for a node to process a message, once it has received one.
  • There are no known bounds for how much the clocks of two nodes on the network can drift or differ.
  • There are no known bounds for how long it can take a message to be transferred from one node to another, via the network links.

All this means that a node can be arbitrarily unresponsive when you are waiting for a reply to a message, may have its internal clock set to some arbitrary time, and may send a message to a peer before another node does, only for that message to arrive after. Moreover, having any peer crash randomly at any time really throws a spanner into the proverbial works, because you cannot even know whether it processed a given message immediately before it went offline. To make matters worse, what if some of the nodes deliberately tried to subvert the system by sending arbitrary messages?
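
A toy simulation (entirely my own illustration) makes the interleaving point concrete: each peer sends its own messages in order, but the receiver may observe any merge of the two streams.

```python
import random

# Two peers each send their messages in a well-defined local order,
# but the network may deliver any interleaving of the two streams.
peer_a = [("A", 1), ("A", 2), ("A", 3)]
peer_b = [("B", 1), ("B", 2), ("B", 3)]

def deliver(a, b):
    """Merge two in-order streams in a random interleaving, preserving
    per-sender order but nothing else — as an asynchronous network may."""
    a, b, received = list(a), list(b), []
    while a or b:
        stream = random.choice([s for s in (a, b) if s])
        received.append(stream.pop(0))
    return received

print(deliver(peer_a, peer_b))
# e.g. [('B', 1), ('A', 1), ('B', 2), ('A', 2), ('B', 3), ('A', 3)]
```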

Under these conditions, and given the FLP result, it may seem hopeless for a receiving node trying to impose any sort of ordering over the messages it received from any number of concurrent peers. Fortunately, computer scientists aren’t so easily discouraged. So what’s the solution?

Firstly, the peers need to be aware of each other’s existence, so that they can coordinate. That’s the first prerequisite. We’ll call that the overlay network, because it’s a logical network built on top of lower-level networking primitives. It doesn’t include all the machines on the Internet, just the ones which are part of the distributed application. Like Bitcoin.

Then we need something like a logical clock. This may take the form of a numeric token, passed around between the nodes, which has the property that it can only increase, can be compared to recover ordering, and is independent of wall clock time. Messages are paired with this token, so each node can reason about whether or not it is lagging behind when it receives a message from another. A numeric token isn’t the only way, though. A blockchain (the data structure) has similar properties: it has ordering, and it can only grow. This is only a building block for consensus, though. We need more. We need a majority to agree on the state of the World after any messages have been sent.
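
The classic example of such a token is a Lamport clock. Here is a minimal sketch (my own, following the usual textbook rules rather than any particular blockchain implementation):

```python
class LamportClock:
    """A logical clock: a counter that only increases and travels with messages."""
    def __init__(self):
        self.time = 0

    def send(self) -> int:
        # Tick, then stamp the outgoing message with the current logical time.
        self.time += 1
        return self.time

    def receive(self, message_time: int) -> int:
        # Take the max of our clock and the sender's, then tick: everything we
        # do next is causally ordered after the received message.
        self.time = max(self.time, message_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t = a.send()      # a stamps a message with t = 1
b.receive(t)      # b's clock jumps to 2, so b's next events order after a's send
print(a.time, b.time)  # 1 2
```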

The key idea behind consensus is that any two majorities of the same set of nodes must overlap. If one majority — even one of just a single node more than half — agrees that A happened before B for one pair of events, and a different majority agrees that B happened before C for a subsequent pair, then there must be at least one node which was part of both majorities, and which has the full and true picture of what actually happened.
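
As a quick sanity check of that quorum-intersection argument (my own illustration), with five nodes any two majorities of three must share at least one member:

```python
from itertools import combinations

nodes = {"n1", "n2", "n3", "n4", "n5"}
majority = len(nodes) // 2 + 1  # 3 out of 5

# Every pair of majorities overlaps in at least one node,
# so some node witnessed both decisions.
quorums = list(combinations(nodes, majority))
assert all(set(q1) & set(q2) for q1 in quorums for q2 in quorums)
print("any two majorities of", len(nodes), "nodes intersect")
```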

Now if you’re super careful about issuing those tokens, and you allow for nodes to lose races and try again, and you’re okay with it being possible (though unlikely) for the network to live-lock and not make progress, then you can figure out which node(s) have the full picture, and the others can replicate accordingly. Doing this, though, means lots of messages between nodes. They need to talk to each other. A lot.

All of this back-and-forth to reach consensus, no matter how it is implemented, adds a significant amount of overhead. It is an inescapable fact that as the number of nodes grows, the number of edges potentially connecting them grows quadratically.

Consensus is the main bottleneck in these systems. For serious industry adoption, even ignoring resilience to adversarial nodes (so crash fault tolerance only), current blockchain solutions lack the low latency and high throughput needed to make them viable for most domains.

But do we really need consensus? Is the network truly asynchronous? As soon as you impose a time-out, you’re adding bounds on the link latency, making the network at least partially synchronous. Clocks can be synchronized (via the Network Time Protocol) to fairly high accuracy, leading to solutions requiring only approximately synchronized clocks.

There are other solutions. Stock exchanges and air traffic control systems don’t run on a single machine, and don’t have much room for network errors taking down the system, either. And we’ve had those running for a good long while. These systems aim to simulate a synchronous network, by distinguishing between a node receiving a message, and that node delivering the message to the application running on it. The idea being that either all nodes see a message, or none do. Atomic broadcast. If you can guarantee that, then you don’t need consensus.

It’s even been shown that you can implement a crypto-currency in a distributed setting without consensus, provided you’re willing to impose the restriction that only the owner of an account is authorized to modify it (as opposed to any anonymous agent).

So far we’ve seen that, although there has been some hype surrounding deep learning of late, AI is mature and on solid footing, and this is reflected in the level of industry adoption for solving real problems. The hyped deep learning approaches are gaining some traction too, although they still face significant competition from the staples. Strong AI is some way off, if we ever see it at all.

Blockchain technology is on shakier ground. Despite considerable investment appetite, and lots of surrounding hype, it’s still plagued by scalability issues, which trace back to the tacit assumption that consensus is needed in the first place, which in turn rests on the assumption that the network is asynchronous. These assumptions ignore decades of practical research into building fault-tolerant, distributed systems.

A marriage of these two technologies, by itself, doesn’t lead to any solutions. An AI solution running on a blockchain can most likely be had with any other fault-tolerant replicated log, since the algorithms are agnostic to the storage layer. Blockchain technology, on the other hand, has some serious fundamental issues which AI isn’t going to fix.

Having said that, there is at least one project out there which aims to bring federated learning to a blockchain, where the goal is to create a marketplace for training machine learning models without distributing data to third parties. They’re tackling the fundamental problem of data privacy. A laudable goal.
