tl;dr: Network coding is awesome! Let’s build a better Internet!
If somehow you’ve reached this blog post, I’m assuming you want to learn more about network performance and are interested in elegant communication protocols. This particular post is focused on network coding, a groundbreaking mechanism that enables efficient and reliable communication over data networks.
Let’s get ready then!
Since the birth of the first communication networks (ARPANET, Internet, ALOHAnet, ...), network engineers have worked relentlessly to provide reliable, efficient, and fast communication networks. We now have access to gigabit Ethernet links, and WiFi/mobile connections are blazingly fast, reaching hundreds of megabits per second. At least in theory.
Even today, it’s not uncommon for us to experience the frustration of waiting for an app to fully load its content or having video calls with frozen screens and poor audio quality. And why does this happen? The answer is fairly simple. While the advances we’ve seen in the last decades are impressive, so is the growth of network usage, in particular over wireless networks.
Have you noticed how many devices are using your home or office network connection? Well, probably a lot...
And guess what? All of them are using a portion of your home bandwidth. All of them are also using a portion of your ISP’s bandwidth. Probably most of them are also using a wireless connection, meaning they will face the challenges of latency, jitter and packet loss. Given all of these facts, it’s not that surprising that we face many situations where network performance is far from ideal (or its theoretical promises).
To lay the ground for this post, let me first introduce the basic data flow over a communication network (feel free to skip to the next section if this seems too introductory).
When two nodes wish to communicate over a network path (say your typical client-server application), the requests (in the form of data packets) from your application are sent to your wireless router, which then sends them to your ISP’s network, which in turn sends them to a core network, which then forwards them to the appropriate server that should handle your request (I do apologize for the oversimplification).
At all these points, network components (routers and switches) operate in a store-and-forward way. They have a buffer which they use to store the received packets before sending them to the next hop in the data path. If too much bandwidth is being used, these buffers fill up, adding latency and packet loss to the connection (on top of the intrinsic latency and packet loss of the communication path). If the client’s first hop is a wireless connection, the situation is even worse, due to its characteristics (added latency due to MAC contention and packet loss due to interference and signal propagation issues...).
So, by now it should be clear: if I were the only guy using the Internet, things would be great! (Apart from being the only guy on Twitter, which would probably suck.) I would send/receive my packets at a nice rate, and devices could store-and-forward them without any errors (as long as my sending/receiving rate did not exceed the underlying end-to-end bandwidth).
However, the reality is very different from this - there is a lot of competition for bandwidth, which ends up creating undesired effects on the network such as packet loss and latency. Now, I’m sure you’re wondering: can we do better at scaling network communications? Turns out we can.
Network coding was first described at the turn of the century (funny coincidence, huh?) by Ahlswede, Cai, Li and Yeung in their revolutionary work on network information flow. What they showed was essentially that traditional store-and-forward routing strategies were not necessarily optimal from a capacity point of view.
This was easily illustrated by the (now) well-known butterfly network example.
Suppose we have a network as illustrated above. We have a source node (e.g. a server) that is trying to send information (two packets) to two destination nodes (e.g. two mobile devices) - this is a typical multicast example.
In a store-and-forward network, you would send packets 1 and 2 over possibly different routes and expect the network to handle the delivery of both packets to each of the receiving nodes. We can see that packet 1 could reach destination node 1 via the route s → r1 → d1. Similarly, packet 2 could reach destination node 2 via the route s → r2 → d2. However, for destination node 1 to get packet 2, we would need to rely on node r3 to forward it. Likewise, to deliver packet 1 to destination node 2, we would also rely on r3 to forward it. This makes node r3 a bottleneck.
In a coded network, the flow of information would be very similar. However, the data flowing through the network would be substantially different. What network coding proposes is that nodes in a network can mix packets together (e.g. creating a packet that contains the XOR of multiple packets) and send these coded packets to the next node. That node can again mix the packets it receives and send them onwards. The process repeats itself until packets have reached their destination.
What changed here? Well, some very relevant things. First, nodes in the network do not store-and-forward anymore but rather operate over the packets they receive, generating coded packets (the mixed packets). Second, the destination nodes do not receive the data packets as they were originally generated (they now receive coded packets!), so destination nodes must be able to “decode” these packets. How would they do that, you may ask? Let’s go for an example.
In algebraic formulations of network codes (e.g. linear network codes), coded packets are generated as linear combinations of data packets: a coded packet has the form p = α1 p1 + α2 p2 + ... + αn pn, where the α’s are called coding coefficients, the p’s are data packets and n is the total number of packets we’re mixing. Our goal is to recover packets p1, p2, ..., pn. So what we need to do is collect enough coded packets to solve a system of linear equations. If you remember your algebra classes, this amounts to receiving n linearly independent combinations of packets (spoiler alert: this means that the way you choose your coefficients is really important for achieving high efficiency).
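To make this concrete, here is a minimal Python sketch of the idea (all names are illustrative, not taken from any real library). To keep it short it works over the smallest possible field, GF(2): the coding coefficients are single bits and “adding” two packets is just XOR. Practical implementations typically use larger fields such as GF(2^8), but the mechanics are the same: the sender keeps emitting random combinations, and the receiver decodes by Gaussian elimination once it holds n linearly independent ones.

```python
# A minimal sketch of linear network coding over GF(2): coefficients are single
# bits and "adding" two packets is a byte-wise XOR. Illustrative only.
import random

N = 3                     # number of original packets
PACKET_SIZE = 4           # bytes per packet, kept tiny for readability

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(packets):
    """Return (coefficients, coded packet) for a random non-zero combination."""
    coeffs = [random.randint(0, 1) for _ in packets]
    while not any(coeffs):                     # avoid the useless all-zero mix
        coeffs = [random.randint(0, 1) for _ in packets]
    coded = bytes(PACKET_SIZE)
    for c, p in zip(coeffs, packets):
        if c:
            coded = xor_bytes(coded, p)
    return coeffs, coded

def decode(received):
    """Gaussian elimination over GF(2); returns the original packets once we
    hold N linearly independent combinations, otherwise None."""
    rows = [(coeffs[:], payload) for coeffs, payload in received]
    for col in range(N):
        pivot = next((r for r in range(col, len(rows)) if rows[r][0][col]), None)
        if pivot is None:
            return None                        # not enough independent packets yet
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(len(rows)):
            if r != col and rows[r][0][col]:
                rows[r] = ([a ^ b for a, b in zip(rows[r][0], rows[col][0])],
                           xor_bytes(rows[r][1], rows[col][1]))
    return [rows[i][1] for i in range(N)]

originals = [bytes([i]) * PACKET_SIZE for i in range(1, N + 1)]
received = []
while (decoded := decode(received)) is None:   # keep collecting coded packets
    received.append(encode(originals))
assert decoded == originals
```

Note how the receiver never asks for a specific packet; it just keeps collecting combinations until the system becomes solvable.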
So, let’s get back to our butterfly network. How would the information flow through the network? Well, exactly in the same way. But now, we allow the bottleneck node r3 to generate coded packets.
So instead of using two time slots to send packets 1 and 2, node r3 can now combine these two packets and send a single linear combination to the next node. Node r4 then forwards this coded packet to destination nodes d1 and d2.
What happens at node d1? Well, it has received two packets: packet 1 via the route s → r1 → d1 and packet 3 (which contains packets 1 and 2 combined) via the route r3 → r4 → d1. Since d1 has received two linearly independent packets, it can now decode packets 1 and 2 by solving a simple system of equations: the first received packet gives p1 directly, and the second gives p3 = p1 + p2 (here just the XOR of the two packets), so d1 recovers p2 = p3 − p1, which in the XOR case is simply p3 XOR p1. Node d2 does the symmetric thing with packet 2 and packet 3.
So we save a time slot by allowing intermediate nodes to code these packets.
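If you prefer code to equations, here is the butterfly exchange in a few lines, assuming packets are byte strings and the combination sent by r3 is a plain XOR of the two packets:

```python
# The butterfly exchange in code: r3 sends a single XOR instead of two packets.
p1 = b"AAAA"                                  # reaches d1 directly via s -> r1 -> d1
p2 = b"BBBB"                                  # reaches d2 directly via s -> r2 -> d2
p3 = bytes(a ^ b for a, b in zip(p1, p2))     # the one packet r3 sends towards r4

# d1 knows p1 and receives p3, so it recovers p2; d2 does the symmetric thing.
assert bytes(a ^ b for a, b in zip(p3, p1)) == p2
assert bytes(a ^ b for a, b in zip(p3, p2)) == p1
```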
Hopefully, at this point, you’re convinced that network coding can effectively increase network throughput.
You may (understandably) be worried that this would only happen in multicast scenarios. No need to worry, in unicast scenarios network coding can also increase network throughput.
Consider the following scenario: source node s1 wants to send packet 1 to destination node d2, and source node s2 wants to send packet 2 to destination node d1. There is no direct route s1 → d2 nor s2 → d1. In a store-and-forward network, r3 would need to make two transmissions: packets 1 and 2 would be sent in two different time slots. However, if we allow r3 to encode packets, we can benefit from the direct links s1 → d1 and s2 → d2. If s1 sends packet 1 to d1 and s2 sends packet 2 to d2, then r3 only needs to send a single combination of these packets onwards, since each destination node can recover the packet it is missing from that coded packet and the packet it received directly from a source node.
So you can see that the key word here really is networks. Still, at this point you may be thinking that all of this sounds suspiciously like classic coding theory. Well, there’s a reason you feel this way. What network coding does is very similar to an error-correcting code. However, the properties of a network code are fairly different from those of your traditional error-correcting code.
Keep in mind that error-correcting codes are designed for point-to-point communications (check this video if the above does not ring a bell). They do not care at all about the network. Data is encoded at the source, relayed at intermediate nodes, and decoded at the destination. Parity bits are added to the original data stream to allow recovering any bit flips or erasures (see the illustration below).
In network coding, however, intermediate nodes operate over the packets, creating new coded packets from the packets they receive. And since those received packets may themselves already be coded, re-encoding of coded packets must be possible.
Consequently, the most significant difference is that network codes allow for re-encoding at intermediate nodes without the need to decode the received packets, which is not necessarily true for error-correcting codes. These constraints make code design rather distinct.
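To illustrate the point, here is a small sketch (again over GF(2), with illustrative names) of what re-encoding at an intermediate node could look like: the node mixes already-coded packets and applies the same mix to their coefficient vectors, without ever decoding anything.

```python
# A sketch of re-encoding ("recoding") at an intermediate node over GF(2).
# The node holds coded packets together with their coefficient vectors and
# emits a fresh combination without ever decoding. Names are illustrative.
import random

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def recode(coded_packets):
    """coded_packets: list of (coefficient vector, payload) pairs already in
    flight. Returns a new (coefficient vector, payload) pair."""
    n = len(coded_packets[0][0])               # number of original packets
    size = len(coded_packets[0][1])            # payload size in bytes
    picks = [random.randint(0, 1) for _ in coded_packets]
    while not any(picks):                      # mix at least one packet
        picks = [random.randint(0, 1) for _ in coded_packets]
    out_coeffs, out_payload = [0] * n, bytes(size)
    for pick, (coeffs, payload) in zip(picks, coded_packets):
        if pick:
            # the output's coefficient vector is the same mix applied to the
            # input coefficient vectors -- no decoding step anywhere
            out_coeffs = [a ^ b for a, b in zip(out_coeffs, coeffs)]
            out_payload = xor_bytes(out_payload, payload)
    return out_coeffs, out_payload
```

The output is indistinguishable from any other coded packet, so a destination can feed it straight into the same kind of Gaussian-elimination decoder shown earlier.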
Don’t get me wrong, error-correcting codes are the ones that allow you to have a reliable gigabit connection (LDPC codes are used for instance in the WiFi IEEE 802.11ac standard). They simply are not tailored to solve network problems.
And this leads me to the second relevant point. When you design an error-correcting code, you target some packet loss rate that the code should be able to overcome. This is fine when you think about point-to-point communication (everything in the middle is just a link with some packet loss probability). In complex networks (such as the Internet) this becomes an issue, since your links have very different packet loss characteristics: your wireless hop can be very lossy, one of the links in your route can be suffering from bufferbloat and dropping a few packets, and so on. This makes the parametrization of error-correcting codes hard to generalize. You can have low overhead and limited error-correcting capability, or you can have higher overhead and recover everything, but then you suffer the efficiency penalty of sending more data than needed.
When you think about the multiple links in a network, the underlying concept of error-correcting codes just does not fit. Your code can be very efficient over one link but very inefficient over another (not even considering the variability of a single wireless link when shared by multiple users).
On the other hand, the concept proposed by network coding abstracts away all of this. You simply send packets, and the receiver needs to collect enough of them to recover the original data. So you don’t necessarily need to parametrize for a specific link error rate - although you do have to design your code to be efficient at generating linearly independent combinations, i.e. you have to worry about making every transmission useful to the client, but that would take another blog post.
So, bottom line, network codes serve a different purpose than error-correcting codes and should be designed based on different criteria. But one does not replace the other!
We’ve already talked about how network coding can increase network throughput. Are there any more benefits? I don’t know about you, but I’ve always liked a free lunch! (PS: I’ll keep the next sections more high-level since, with the basics laid down, the topics we’ll discuss should be clearer.)
Well, let us list some of the other benefits of network coding:
So, if network coding is this great, why haven’t we seen it deployed everywhere? This is a fair question. My intuition is the following: in practice, network coding may be implemented at the network layer or at any layer above it. However, the origins of network coding lie precisely at the network layer. This means that routers and switches would have to implement some form of network coding (i.e. they would need to be able to operate over packets and mix them together). And besides developing such devices, you would also have to deploy them across the globe. You can see how replacing the whole Internet could be a little bit infeasible.
You’re now thinking that you’ve been reading all of this for nothing… Let me stop you right there. While it may take “some time” to see network coding widespread across the globe, the concepts of network coding can be applied at higher layers, in particular at the application layer. You’ll see it in a minute.
Right! So far we’ve mostly discussed applications of network coding (multicast/broadcast/p2p) which represent a somewhat limited use case, but network coding can have a global impact. How, you may ask? It’s simple: we can use network coding to develop new transport layer protocols, one of the core pieces of communication networks, used by every application that needs to access the Internet (we’ve written about this here, here or here in other scopes).
In the previous sections, we’ve unveiled a little bit of how network coding can help with congestion control and flow control. Let’s expand a little on that.
HTTP rings a bell, right? It is an application protocol for transmitting hypermedia documents and resources. Basically, the foundation of the web as we know it. Under the hood, HTTP uses TCP (Transmission Control Protocol) to reliably exchange data between two endpoints. (Side note: HTTP/3 will be using UDP as its transport protocol.) However, the performance of TCP is severely impaired by the presence of packet loss and latency. The reason is two-fold:
1. TCP interprets packet loss as a sign of congestion and reacts by reducing its transmission rate.
2. TCP guarantees reliable, in-order delivery, so it must wait for acknowledgments and retransmissions of specific lost packets before the data flow can make progress.
Now, the first premise is not really valid in today’s world, especially when speaking about WiFi/mobile. Packets get lost for many reasons: interference, signal propagation issues, faulty hardware and so on. So, if you’re experiencing random packet losses, why should you reduce your transmission rate? These losses do not necessarily mean that you are congesting the network…
The second point we’ve already discussed a little bit. Do we really need to wait for acknowledgments to operate a reliable transport protocol? If you are sending uncoded data, yes: you need your packet sequencing and everything in between. But if you are using coding, not really… Any coded packet can be used to recover useful data.
So instead of treating losses as signals of congestion, we can look at how much information is actually passing through the channel (goodput). This gives a much better idea of whether we are actually congesting the network. And, in a network coding mindset, goodput is simply the amount of information your receiver is able to decode.
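To give a flavour of what this could look like, here is a deliberately naive sketch of a sender loop that reacts to goodput instead of individual losses. Everything in it (send_coded_packet, poll_feedback, the window size and thresholds) is hypothetical and only meant to illustrate the reasoning, not to describe any real protocol.

```python
# A naive sketch of a sender that reasons about goodput instead of individual
# losses. send_coded_packet() and poll_feedback() are hypothetical callables:
# the first transmits one fresh coded packet, the second returns how many
# degrees of freedom the receiver has decoded so far.
import time

def coded_sender(send_coded_packet, poll_feedback, duration_s=10.0, window=50):
    rate_pps = 100.0                           # packets per second, arbitrary start
    sent = decoded_last = 0
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        send_coded_packet()
        sent += 1
        time.sleep(1.0 / rate_pps)
        if sent % window == 0:                 # every `window` packets, check goodput
            decoded_now = poll_feedback()
            goodput = decoded_now - decoded_last   # useful information delivered
            decoded_last = decoded_now
            if goodput >= 0.9 * window:        # nearly every transmission was useful:
                rate_pps *= 1.1                # the path has headroom, probe upwards
            else:                              # information is not getting through:
                rate_pps *= 0.8                # treat it as congestion and back off
```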
What network coding offers to the transport layer is the following:
So, in effect, network coding allows you to design a better transport layer protocol and consequently a better Internet for all!
(At the end you’ll find some links with performance numbers comparing ARQ and network coding based protocols, feel free to check them out!)
A picture is worth a thousand words:
You can breathe now… Let me just give you some takeaways!
What we’ve talked about:
What we did not talk about:
You already have a lot of info to process, but let me leave you some nice links if you are interested in more network coding stuff...
Network coding applications:
Benchmarks and numbers:
Myths: