This post is about availability - more specifically, about Denial-of-Service protection. In a nutshell, as a network security guy working at Codavel on faster mobile content delivery, I will demystify the idea that DoS attacks are UDP's fault, even though most people seem to think otherwise.
Availability is a top concern of any popular Internet service. As such, deploying networking systems that provide robustness to failure is crucial. While it is fairly easy to tackle problems related to hardware failure (mostly through redundancy mechanisms), ensuring the same robustness level when it comes to security exploits is a much more difficult task.
As ensuring the security of a networking system generally involves many aspects and small hidden details, it is often the case that something is overlooked, resulting in vulnerabilities that can be used to reduce availability. Think about poorly configured services, or protocols used for purposes they were never intended for in the first place. This was precisely what led to the largest DDoS attack ever publicly disclosed, which earlier this year targeted GitHub, a popular service used by millions of developers for online code management.
Recent years have presented us with new security challenges that arise from the natural evolution of communication infrastructures, where availability assumes critical importance - especially in the domain of content delivery.
As the Internet connects directly with the physical world and evolves into a larger ecosystem of people, devices, and networks, Internet users are developing a culture of high expectations. They take for granted that their data is always accessible and ready for use, no matter what device or network they use - and that they can access it fast. If that is not the case, users will simply run away and choose alternative services available in the market.
A Carnegie Mellon University study states that more than 73% of the top 100,000 popular Internet services, classified by Alexa Stats, “are vulnerable to reduction in availability due to potential attacks on third-party DNS, CDN, CA services that they exclusively rely on”.
To make it clear, the problem lies in the fact that the popular services mentioned by the researchers depend on other services to work, and therefore suffer from single points of failure/attack in their design or architecture. This means that if a small piece (or dependency) of the system fails (or is attacked), the whole system might become unavailable.
The problem can get even worse because there is a tendency to use a single provider for a given type of service (i.e., popular services don't implement redundancy at the provider level), meaning that if that provider becomes unavailable, there is no alternative to fall back on. As an example, if a service relies on a single CDN, it cannot fail traffic over to another CDN when needed.
Based on their findings, the researchers highlight that “services should understand their effective attack surface via direct and indirect dependencies and build sufficient levels of redundancy”.
While not long ago hardware failure was the number one reason for network downtime, Denial-of-Service (DoS) attacks, although not new, keep evolving into novel and more sophisticated forms, becoming ever more effective at making a particular service unavailable. Specifically, Distributed Denial-of-Service (DDoS) attacks take place when multiple sources (e.g., botnets) are programmed to perform a synchronized DoS attack against a specific victim. With the growing diversity of DDoS attacks, a myriad of challenging security concerns must be considered by organizations that are prime targets for this kind of attack.
Network and application services, especially those that rely on non-connection-oriented transports, are particularly vulnerable to Denial-of-Service (DoS) attacks. Although there is a large diversity of DoS attacks, the amplification attack has been receiving special attention lately: first, because of its effectiveness, and second, because it has recently been used to deny the availability of popular Internet services. In particular, an attacker sending requests with the spoofed IP address of the victim can be very effective when the response sent by the server is much larger than the request it received, thus flooding the victim's machine.
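To make the numbers concrete, here is a minimal sketch of how the amplification factor of a reflection attack is usually reasoned about. The request and response sizes below are hypothetical, chosen purely for illustration:

```python
# Minimal sketch: amplification factor of a reflection attack.
# The sizes below are hypothetical examples, not measurements.

def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Ratio between what the reflector sends to the victim and what the attacker sent."""
    return response_bytes / request_bytes

# A tiny spoofed request...
request_size = 15            # e.g., a short UDP query, in bytes
# ...that triggers a very large response aimed at the victim.
response_size = 750_000      # e.g., a multi-datagram reply, in bytes

print(f"Amplification factor: {amplification_factor(request_size, response_size):,.0f}x")
# With enough reflectors answering in parallel, a modest uplink on the
# attacker's side turns into terabit-scale traffic at the victim.
```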
In February 2018, GitHub servers were hit by the largest DDoS attack ever publicly disclosed, which reached record peaks of 1.35 Tbps.
In this case, attackers took advantage of misconfigured memcached instances to amplify their traffic by more than 50,000 times. Fortunately, the attack was detected about 10 minutes later by a DDoS protection service that spotted a significant increase in inbound traffic.
After moving traffic to Akamai, the system fully recovered within the next few minutes. The denial of availability of GitHub services lasted for 20 minutes, which would have cost $3.8 million if we were talking about Amazon's services.
Memcached is a distributed memory object caching system, designed to speed up Internet services by caching objects in shared RAM, and, by design, it should not be exposed to the Internet. On top of that, the memcached protocol specification reveals that it is particularly susceptible to amplification, since it delivers data without common security checks and can send a huge amount of data in response to tiny requests.
These caveats would not be a problem if the servers were running in a secure and controlled environment. The combination of the unbalanced request-response ratio, the lack of security checks and the power of UDP results in a grenade ready to detonate whenever memcached is misconfigured and reachable from the Internet. UDP is a non-connection-oriented transport, which makes it trivial to send a massive amount of data to a target device without its prior consent. At the moment the GitHub attack took place, there were nearly 100,000 exposed memcached servers on the Internet, according to Rapid7 and SANS ISC.
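To see concretely what "non-connection-oriented" means here, consider this minimal Python sketch (the target address is a placeholder from the TEST-NET-3 range, not a real server): with UDP, a datagram can be fired at any address without any handshake, whereas TCP will not carry application data until the three-way handshake with that exact peer has completed - which is why spoofed source addresses get nowhere over TCP.

```python
import socket

# Hypothetical target, for illustration only.
TARGET = ("203.0.113.10", 11211)   # TEST-NET-3 address, not a real server

# UDP: no handshake, no consent. sendto() ships the datagram immediately,
# and nothing verifies that the source address in the packet is really ours
# (which is exactly what reflection attacks rely on).
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.sendto(b"version\r\n", TARGET)
udp.close()

# TCP: connect() must complete the three-way handshake first. A spoofed
# source address never receives the SYN-ACK, so the handshake fails and
# no application data flows toward the victim.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.settimeout(2.0)
try:
    tcp.connect(TARGET)            # SYN -> SYN-ACK -> ACK, or nothing at all
    tcp.sendall(b"version\r\n")
except OSError:
    print("TCP handshake failed or timed out - no data was delivered")
finally:
    tcp.close()
```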
Memcached was not designed to be accessible from the Internet. By default, memcached servers listen on both TCP and UDP on port 11211. Unless you have a good reason to access memcached servers externally, do not expose them to the Internet. This can be accomplished either through the memcached server configuration or by blocking port 11211 on your firewall.
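If you want to quickly check whether one of your own memcached instances is reachable over UDP from outside, a minimal sketch like the one below can help. It sends a harmless `version` request using memcached's UDP frame format (an 8-byte header carrying a request ID, sequence number, datagram count and a reserved field, followed by the ASCII command). The hostname is a placeholder; only probe machines you own or administer:

```python
import socket
import struct

def memcached_udp_exposed(host: str, port: int = 11211, timeout: float = 2.0) -> bool:
    """Send a harmless 'version' request over UDP and report whether the server replies.

    Only probe hosts you own or administer.
    """
    # memcached UDP frame header: request id, sequence number,
    # total datagrams in the message, reserved (always 0).
    header = struct.pack("!HHHH", 0x1234, 0, 1, 0)
    request = header + b"version\r\n"

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(request, (host, port))
        data, _ = sock.recvfrom(2048)
        # Any reply at all means UDP port 11211 is reachable from where we stand.
        return len(data) > 0
    except OSError:
        return False
    finally:
        sock.close()

if __name__ == "__main__":
    host = "cache.example.internal"   # placeholder: one of *your* servers
    if memcached_udp_exposed(host):
        print(f"{host} answers memcached requests over UDP - lock it down!")
    else:
        print(f"{host} did not answer over UDP (good, or filtered).")
```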
Memcached servers can be configured to reject external requests by binding them to specific IP addresses, e.g., 127.0.0.1, as described by SANS. If you really need to access memcached servers remotely, consider using a VPN, and disable UDP support if you don't need it. However, does this mean UDP is the problem?
A very common suggestion to prevent reflection attacks on memcached is to disable UDP support. Although you should disable it if you are not using it (as with everything in life), UDP is not the primary source of the problem; it merely amplifies the lack of defensibility of the memcached protocol, which was not designed to run on the open Internet. Cloudflare goes even further against the use of UDP in its Memcrashed - Major amplification attacks from UDP port 11211 blog post, leaving a recommendation note to developers: “Please please please: Stop using UDP. If you must, please don't enable it by default.”
I cannot agree with the statement that developers should stop using UDP. That would be the same as telling a developer focused on performance to stop using the C programming language because it is vulnerable to buffer overflows, which make it easier for an attacker to perform a DoS or an even more dangerous attack; or telling a CDN provider not to use the 0-RTT feature of TLS 1.3 on their servers because it does not provide inherent protection against replay attacks between connections - although these can be limited, or even prevented, at the application layer.
It is a fact that there are more security concerns when dealing with UDP traffic than with TCP, but that doesn't mean you can't use UDP once these concerns are properly addressed. And since a bad user experience caused by performance issues can cost a lot of money, depending on your main goal, UDP may well be the right choice over TCP. In particular, TCP fails to deliver speed in unstable networks, especially in mobile content delivery, where latency and packet loss increase in the wireless last mile. A good example of how much speed matters is the significant investment by Google and other key players (such as Cloudflare, surprisingly) in QUIC, a UDP-based protocol intended to reduce latency, with a primary focus on handshake optimization.
Also, recall that TCP is itself subject to DoS attacks. In particular, the SYN flood attack targets TCP's three-way handshake by creating half-open connections, consuming resources on the server until they eventually exceed its capacity. Although this attack can be mitigated, as Cloudflare notes, it still occurs very often, as shown in a plot provided by the Chinese security firm Qihoo 360.
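One classic mitigation is SYN cookies: instead of allocating state for every half-open connection, the server encodes the connection parameters into the initial sequence number it sends back, and only allocates state once the final ACK proves the client is real. A rough, simplified sketch of the idea (not the exact kernel implementation, which also encodes details such as the MSS) could look like this:

```python
import hashlib
import hmac
import time

SECRET = b"server-local secret key"          # hypothetical per-server secret

def syn_cookie(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> int:
    """Derive a 32-bit initial sequence number from the connection 4-tuple.

    Simplified illustration of the SYN-cookie idea: no per-connection state
    is stored until the client echoes this value back in its final ACK.
    """
    counter = int(time.time()) // 64          # coarse timestamp, limits replay window
    msg = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{counter}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")

def ack_is_valid(src_ip: str, src_port: int, dst_ip: str, dst_port: int, ack: int) -> bool:
    """Only now, when the ACK matches the cookie, does the server allocate state."""
    return ack == syn_cookie(src_ip, src_port, dst_ip, dst_port) + 1

# Spoofed SYNs never produce a matching ACK, so they consume no server memory.
cookie = syn_cookie("198.51.100.7", 40312, "203.0.113.10", 443)
print(ack_is_valid("198.51.100.7", 40312, "203.0.113.10", 443, cookie + 1))  # True
```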
Finally, there are techniques to limit the amplification potential of UDP-based protocols (e.g., increasing the size of request messages and reducing the size of the server's responses). Having said that, when properly used, UDP has the potential to achieve better performance than TCP, especially in mobile content delivery, while maintaining the same level of security.
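QUIC, mentioned above, is a good illustration of how this works in practice: it pads the client's first packet and forbids the server from sending more than a small multiple of what it has received until the client's address is validated. Here is a toy sketch of such an anti-amplification budget - the 3x factor mirrors QUIC's pre-validation rule, but the class itself is purely illustrative:

```python
class AntiAmplificationBudget:
    """Toy model of a per-client send budget for an unvalidated UDP peer.

    Until the peer proves it really owns its source address (e.g., by echoing
    a token), the server may send at most AMPLIFICATION_LIMIT times the number
    of bytes it has received from that address - so a spoofed victim can never
    receive more than a small multiple of the attacker's own traffic.
    """

    AMPLIFICATION_LIMIT = 3   # QUIC allows 3x before address validation

    def __init__(self) -> None:
        self.bytes_received = 0
        self.bytes_sent = 0
        self.address_validated = False

    def on_datagram_received(self, size: int) -> None:
        self.bytes_received += size

    def can_send(self, size: int) -> bool:
        if self.address_validated:
            return True
        budget = self.AMPLIFICATION_LIMIT * self.bytes_received
        return self.bytes_sent + size <= budget

    def on_datagram_sent(self, size: int) -> None:
        self.bytes_sent += size

# A 1200-byte (padded) request allows at most 3600 bytes of response
# before the client's address has been validated.
budget = AntiAmplificationBudget()
budget.on_datagram_received(1200)
print(budget.can_send(3600))   # True
print(budget.can_send(3601))   # False
```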
Bottom line: instead of “Please please please: Stop using UDP. If you must, please don't enable it by default.”, I would say: “Use UDP when you're looking for performance, but make sure you know what that implies. Don't forget that 'with great power comes great responsibility'!”
You can read more about network protocols here.