Hello! The EMQX community survey is in progress. Please submit your feedback on using EMQX: https://forms.office.com/r/djjc373Bvm
MQTT is currently the most widely used communication protocol in the IoT domain. It is utilized in various industries, including the Internet of Vehicles, manufacturing, energy and utilities, and smart homes.
MQTT has a compact packet structure, making it highly suitable for network traffic-sensitive scenarios.
As a binary protocol, MQTT is capable of transmitting various types of business data, including text-based data like JSON or XML, binary data like Protobuf, or even compressed image data.
The loose coupling characteristic brought by MQTT's publish-subscribe mechanism allows us to focus more on business logic and improve development efficiency. We can easily and flexibly add or remove topics, publishers, and subscribers to meet subsequent changes in the business.
MQTT provides three QoS levels to ensure the reliable delivery of messages even in harsh network environments, so it has a high tolerance for poor network quality.
The MQTT ecosystem is highly mature. No matter what platform you are on or what hardware you use, you can easily find a matching software stack. Compared to other protocols, users can adopt MQTT at a low cost.
As MQTT spread, some challenges arose. For instance, in some scenarios where a large amount of data flowed through the MQTT broker, the processing capacity of the consumer became a bottleneck. To address this issue, EMQ implemented a shared subscription feature, allowing multiple clients to consume data in a load-balanced manner. MQTT 5.0 subsequently improved and standardized this feature.
MQTT 5.0 introduced several other features suitable for IoT scenarios, including request-response, topic alias, etc.
MQTT 5.0 is a significant improvement over the previous version of the protocol. But there are still challenges when using MQTT in certain scenarios. Some problems are caused not by the MQTT protocol itself, but by limitations of the underlying transport protocol. In this blog, we will discuss these problems in depth.
MQTT has no mandatory requirements on the underlying transport protocol, as long as it provides ordered, reliable, and bi-directional transmission of byte streams. TCP and WebSocket, the recommended default transport protocols for MQTT, are supported by nearly all brokers and most clients.
The TCP protocol, which also underlies WebSocket, relies on a four-tuple to identify the connection. This four-tuple consists of source IP, source port, destination IP, and destination port.
Once a connection is established, this four-tuple is determined and remains constant throughout the lifecycle of the connection. All packets sent by both parties contain these four elements, so the receiver can decide which connection a packet belongs to based on their values. If any one of these four elements in a TCP packet changes, the receiver will consider the packet to belong to another TCP connection.
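The four-tuple lookup can be sketched in a few lines. This is a toy model to illustrate the idea, not a real TCP stack:

```python
# Toy model: a receiver that maps packets to connections by the TCP
# four-tuple. Any change to one element makes the lookup fail, and
# the packet is dropped.

connections = {}

def open_connection(src_ip, src_port, dst_ip, dst_port):
    key = (src_ip, src_port, dst_ip, dst_port)
    connections[key] = {"state": "ESTABLISHED"}
    return key

def deliver(packet):
    key = (packet["src_ip"], packet["src_port"],
           packet["dst_ip"], packet["dst_port"])
    if key not in connections:
        return "dropped: no matching connection"
    return "delivered"

open_connection("10.0.0.5", 54321, "203.0.113.9", 1883)

# Same four-tuple: delivered.
print(deliver({"src_ip": "10.0.0.5", "src_port": 54321,
               "dst_ip": "203.0.113.9", "dst_port": 1883}))

# After switching from cellular to Wi-Fi, the source IP changes:
print(deliver({"src_ip": "192.168.1.20", "src_port": 54321,
               "dst_ip": "203.0.113.9", "dst_port": 1883}))
```

The second packet is dropped even though the destination, ports, and payload are unchanged: the receiver has no state keyed by the new source IP.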
Imagine these scenarios:
Your phone switches automatically from the cellular network to indoor Wi-Fi when you return home.
Your car drives from one base station's coverage area to another, causing it to re-enter the network.
In both cases, the terminal's IP address changes. If the terminal keeps sending packets, their source IP will differ from before. When the other side receives them, it finds no corresponding connection and simply discards the packets.
To keep the communication going, the terminal must disconnect the previous connection and create a new one. However, this leads to several issues:
Firstly, TCP's three-way handshake takes time, and after the TCP connection is established, MQTT, as an application-layer protocol, needs to perform its own two-way handshake to exchange connection information such as the Client ID, username, and password.
In theory, the protocol allows the client to send subsequent packets right after the connection packet without waiting for a response packet. However, most of the current client SDKs don't support this, so those two handshakes are still necessary.
This means we must wait for two round trips every time we connect. On a network like indoor Wi-Fi, two RTTs won't take long; we might not even notice them. But in an outdoor environment, we can't guarantee the signal quality, and those two RTTs could stretch to several hundred milliseconds.
To ensure the security of the communication process, we usually also recommend users enable TLS, which will require more RTTs.
For example, with TLS 1.2, the client and server need to perform an additional four handshakes. Adding the TCP and MQTT handshakes, we need a total of eight handshakes to establish a connection, meaning we must wait for four RTTs. If the one-way network latency is 50 ms (a 100 ms round trip), establishing a connection takes 400 milliseconds.
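The arithmetic can be captured in a tiny helper. The per-layer RTT counts are the simplified ones used in this post (1 for TCP, 2 for TLS 1.2, 1 for MQTT):

```python
def connection_setup_time(rtt_ms, tcp_rtts=1, tls12_rtts=2, mqtt_rtts=1):
    """Rough RTT cost of establishing MQTT over TCP + TLS 1.2.
    Simplified model: each layer's handshake completes before the next starts."""
    return (tcp_rtts + tls12_rtts + mqtt_rtts) * rtt_ms

# With a 100 ms round trip (50 ms one-way), the four round trips
# add up to 400 ms before the first application message can flow.
print(connection_setup_time(100))  # 400
```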
During the entire process of connection rebuilding, we are unable to transmit any application data, which can be difficult to accept in some latency-sensitive scenarios.
And if we are transmitting packets during the network switch, those packets will be lost, including packets we have sent that were not yet delivered, and packets the other end has sent that we have not yet received. For MQTT, QoS 0 messages, which are fire-and-forget, are simply lost. QoS 1 and QoS 2 messages can still be ensured to arrive by retransmitting them after reconnection, but this repeated sending wastes network bandwidth and increases message latency.
Furthermore, due to TCP's slow start, we can't transfer data at full speed immediately after reconnecting.
The original intention of this mechanism is to prevent the sender from overloading the network by sending too much data without knowing the network condition. Under the influence of this mechanism, the sender's congestion window can only increase slowly, and the size of this window determines how much data we can send at once.
So if we want to send a larger message immediately after reconnecting, it will be split across multiple sends because the initial congestion window is not large enough, and several round trips are needed before the other end receives all the fragments. Obviously, the latency of this message becomes very large.
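A rough sketch of how slow start stretches a large message across round trips. The segment size and initial window below are illustrative values, not measurements:

```python
def round_trips_to_send(message_bytes, mss=1460, initial_window=10):
    """How many round trips slow start needs to deliver a message,
    doubling the congestion window (counted in segments) each RTT.
    Simplified: no loss, no receiver-window limit."""
    segments = -(-message_bytes // mss)  # ceiling division
    cwnd, rtts = initial_window, 0
    while segments > 0:
        rtts += 1
        segments -= cwnd   # send a full window this round trip
        cwnd *= 2          # exponential growth during slow start
    return rtts

# A small message fits in the initial window and needs one round trip,
# but a 1 MB message on a fresh connection needs several:
print(round_trips_to_send(1000))       # 1
print(round_trips_to_send(1_000_000))  # 7
```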
Unlike TCP disconnections caused by network failures or misbehaving intermediate devices, the disconnections in some scenarios are entirely predictable, such as the IP change caused by base-station handover while a vehicle is driving, which is very common in the Internet of Vehicles. While there are techniques to keep the IP address stable across a base-station handover, they require hardware and software support from the base station.
These are the problems TCP has when establishing connections. TCP also faces some big challenges when transferring data, the first of which is head-of-line (HOL) blocking.
We know that TCP ensures reliable and ordered packet delivery. However, in real-life scenarios where multiple packets are sent together, these packets may take different paths in the network. Some packets may traverse longer routes, while others may take shorter ones, potentially resulting in out-of-order delivery, where the packets sent later arrive first.
To make sure packets are in order when they get to the application layer, these packets that are sent later but arrive earlier must wait for the ones that are sent first but arrive later.
Also, packet loss in the network is random. Suppose we send 10 packets and lose the 3rd and 4th. The first two packets can be placed into the receive buffer immediately, waiting for the application to read them. But packets 5 through 8, even though they have already reached the receiver, must wait for the sender to retransmit the missing packets to maintain the correct order for the upper layer. Only after the two retransmitted packets arrive can all of them be delivered together. This undoubtedly increases the delay of the packets that arrived first.
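The receive-side behavior can be modeled in a few lines. This is a toy in-order delivery check, not a real TCP receive buffer:

```python
def deliverable(received, next_expected):
    """Return the packets the receiver can hand to the application, in
    order. Anything after a gap stays buffered until retransmission
    fills the hole."""
    delivered = []
    for seq in sorted(received):
        if seq == next_expected:
            delivered.append(seq)
            next_expected += 1
        elif seq > next_expected:
            break  # gap: later packets are buffered, not delivered
    return delivered

# Packets 3 and 4 were lost; 5-8 have arrived but are stuck behind the gap.
received = [1, 2, 5, 6, 7, 8]
print(deliverable(received, next_expected=1))  # [1, 2]
```

Packets 5 through 8 sit in the buffer contributing nothing to the application until 3 and 4 are retransmitted, which is exactly the head-of-line blocking described above.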
Imagine a device that reports data: sometimes its remaining battery level, sometimes an urgent warning. These messages are unrelated to each other and don't need to be ordered. But constrained by TCP HOL blocking, if a regular status message is lost, urgent event notifications queued behind it are blocked and cannot be delivered promptly.
Although we use MQTT topics to handle different data streams, when these streams pass through TCP, they merge into a single stream. Unfortunately, this will affect TCP’s transmission performance in weak network environments.
The next problem is TCP's hard-to-change congestion control. TCP adjusts the size of the congestion window according to current network conditions, and that window determines how much data the sender can have in flight at once. TCP must avoid making things worse when the network is already congested, while making the most of its capacity when it isn't. All of this depends on the specific congestion control algorithm in use, and many such algorithms have been proposed since TCP first appeared.
But introducing a new congestion control algorithm, or modifying an existing one, is a challenge. TCP is a transport protocol implemented in the kernel, and the congestion control algorithm it uses is usually bound to the OS kernel. Moreover, congestion control needs end-to-end support to be effective, which means that changing the algorithm requires upgrading not only the server but also all the clients. Just envision the complexity of such a task!
And regardless of the congestion control algorithm used, they all rely heavily on accurate calculation of the RTT (Round-Trip Time). However, TCP was originally designed to use the same sequence number for both an original packet and its retransmission. Consequently, upon receiving a response, there is simply no way to tell whether it is for the first packet or the retransmitted one.
If we mistakenly treat the reply to the original packet as a response to the retransmission, the calculated RTT is too short. The sender then perceives the network to be better than it actually is and retransmits too eagerly, increasing congestion. Conversely, if the RTT is overestimated, the sender is slow to retransmit, leaving the network's bandwidth underutilized.
To resolve this issue, TCP introduced the timestamp option. This means that when the sender sends a packet, it includes a timestamp, and the receiver echoes this timestamp when replying. Consequently, when the sender receives the response, it can accurately calculate the round-trip time by simply subtracting the current timestamp from the one in the response packet.
Nevertheless, while the timestamp feature resolves the RTT calculation issue, it comes with its own challenge. The inclusion of a timestamp in each TCP packet adds an extra 10 bytes of overhead, which could become a concern in bandwidth-sensitive scenarios.
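The ambiguity, and the timestamp fix, can be shown with simple arithmetic. All timestamps here are made-up values for illustration:

```python
# Without timestamps: the original segment and its retransmission share
# a sequence number, so an ACK cannot be matched to either send.
send_times = {"original": 1000, "retransmit": 1200}  # ms
ack_arrival = 1250

ambiguous_rtts = {name: ack_arrival - t for name, t in send_times.items()}
print(ambiguous_rtts)  # {'original': 250, 'retransmit': 50}: which is right?

# With the timestamp option, the ACK echoes the timestamp of the segment
# that actually triggered it, so the calculation is unambiguous:
echoed_timestamp = 1200  # this ACK was for the retransmission
print(ack_arrival - echoed_timestamp)  # 50
```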
The issues we've discussed today stem from certain inherent mechanisms and features of the TCP protocol. Being implemented in the kernel layer, TCP proves to be a formidable challenge for modifications, even when we have identified its shortcomings.
To address these challenges, EMQ embraced the QUIC protocol, an initiative launched by Google back in 2013. The full name is Quick UDP Internet Connections, and as the name suggests, QUIC is built on top of UDP.
In a nutshell, think of QUIC as a souped-up version of TCP. It re-implements TCP features such as packet loss detection, message retransmission, congestion control, etc., on top of UDP. But QUIC knows what problems to avoid, and unlike TCP, it does not have to introduce compatibility designs due to various legacy issues. It also brings in some fresh features, such as connection migration, multiplexing, etc., to solve some of the long-standing problems with TCP.
In contrast to TCP, QUIC offers significantly faster network speeds and a substantial enhancement in transmission performance in situations with poor network conditions.
QUIC has already gained widespread adoption among global technology giants like Google, Microsoft, Apple, Facebook, Alibaba, Huawei, and more. Another important development that showcases the advantages of QUIC is HTTP/3, which was officially published as an RFC standard in June 2022. HTTP/3 uses QUIC as its underlying transport protocol instead of traditional TCP.
Unlike TCP, QUIC is implemented at the application layer and has TLS built in. This brings three benefits. First, the QUIC and TLS handshakes happen simultaneously. The handshake for a first connection needs only one RTT, and subsequent connection resumption can achieve a zero-RTT handshake through the Early Data feature. This means the client can send application data to the server in its very first flight, without waiting for the server's confirmation.
Second, unlike TCP, QUIC does not need to accommodate intermediate devices and therefore does not have to retain support for historical versions of TLS. QUIC only supports the latest version, TLS 1.3. The benefit is that both the client and the server are guaranteed to run on TLS 1.3.
Before the advent of QUIC, TCP and TLS were separate entities. Unfortunately, some users who lacked awareness of security concerns relied solely on TCP, inadvertently exposing their data to vulnerabilities. Even with TLS, some users used outdated versions or weak configurations, compromising the security of their data.
TLS 1.3 also simplifies cipher suites significantly by discarding those with known security vulnerabilities, leaving only five carefully selected cipher suites. In contrast to the previous plethora of choices, this makes it much easier to select a secure and efficient cipher suite.
Finally, QUIC overcomes the difficulty of updating congestion control in TCP and brings us a very important feature: pluggable congestion control. We can configure different congestion control algorithms for different connections, and even switch algorithms at runtime without stopping the service at all. In addition to adjusting the parameters of common congestion control algorithms such as Cubic and BBR, QUIC also allows us to implement a congestion control algorithm ourselves to meet more complex transmission needs.
Moreover, QUIC revolutionizes the way congestion control algorithms calculate RTT by utilizing a strictly monotonically increasing Packet Number instead of packet timestamps. As mentioned, in TCP, a retransmitted packet retains the same sequence number, potentially leading to confusion in RTT calculation. However, with QUIC, a retransmitted packet receives a new Packet Number, ensuring accurate RTT calculation without the need for additional data in the packets.
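A toy model of this scheme (not a real QUIC implementation) looks like:

```python
# Toy model: QUIC assigns every transmission, including retransmissions
# of the same data, a new monotonically increasing packet number, so an
# ACK always identifies exactly one send.

sent_at = {}            # packet number -> send time (ms)
next_packet_number = 0

def send(payload, now_ms):
    global next_packet_number
    pn = next_packet_number
    next_packet_number += 1
    sent_at[pn] = now_ms
    return pn

pn1 = send(b"hello", now_ms=1000)  # original transmission
pn2 = send(b"hello", now_ms=1200)  # retransmission of the same data

def rtt_on_ack(acked_pn, now_ms):
    """The ACK names the packet number it acknowledges, so RTT is exact."""
    return now_ms - sent_at[acked_pn]

print(rtt_on_ack(pn2, now_ms=1250))  # 50
```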
Another advantage of QUIC is its support for connection migration. TCP's four-tuple causes the connection to halt when the network switches, so QUIC instead uses a randomly generated 64-bit connection ID. Even if a network switch changes one party's IP and port, QUIC can still deliver packets to the correct connection through the connection ID.
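Routing by connection ID can be sketched as follows. Again, this is a toy model, not a real QUIC stack:

```python
# Toy model: packets are routed by connection ID rather than by the
# four-tuple, so a new source IP still maps to the same connection.

connections_by_id = {}

def open_quic_connection(connection_id):
    connections_by_id[connection_id] = {"state": "ESTABLISHED"}

def route(packet):
    conn = connections_by_id.get(packet["connection_id"])
    return "delivered" if conn else "dropped"

open_quic_connection(0x5A3F9C01D4E26B78)

# Packets carry the same connection ID even after the client's IP changes:
print(route({"src_ip": "10.0.0.5", "connection_id": 0x5A3F9C01D4E26B78}))
print(route({"src_ip": "192.168.1.20", "connection_id": 0x5A3F9C01D4E26B78}))
```

Both packets are delivered to the same connection, which is the essence of connection migration.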
Of course, just like TCP, QUIC can't assume that the new network path is the same as or close to the original one. If data is still sent at the previous rate, the new network may be overloaded. Therefore, when a connection migrates, QUIC typically resets or lowers the sending rate and then gradually recovers under the control of the congestion control algorithm.
But this avoids the loss of sent messages due to disconnection. And in the case of port changes caused by NAT rebinding, we can assume that the terminal is still in the same physical network, so QUIC will not reset the congestion window.
For specific scenarios like IP changes when switching from a cellular network to Wi-Fi, we can also customize our congestion control strategies. For example, by adopting a relatively more aggressive congestion recovery strategy when detecting that we are in a Wi-Fi network, we can recover to the optimal transmission rate more quickly.
QUIC effectively addresses the HOL blocking issue present in TCP through its multiplexing capability. By creating multiple streams within a single connection, QUIC gives each stream independent flow control and loss recovery: data loss in one stream does not hinder data transmission in the other streams. This allows different types of data to be separated into distinct streams, such as isolating control commands from regular data.
This also brings another advantage: urgent data can be sent as soon as possible. In a single TCP stream, when many ordinary messages are queued waiting to be sent, an urgent message that needs to go out immediately still joins the back of the queue. The queue length and the current network conditions therefore both affect how quickly that urgent message arrives.
However, in QUIC, if we use an independent stream to transmit such urgent messages, they can effectively jump the queue: the messages in each stream are sent alternately, so even if many messages have accumulated in other streams, our urgent messages can still be processed quickly.
We can also set different priorities for streams if we want such urgent messages to be sent faster. QUIC will send data in high-priority streams as much as possible based on the priority of the stream and the current congestion situation.
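The queue-jumping effect can be modeled with a simple priority scheduler. This is a sketch of the idea, not QUIC's actual frame scheduler:

```python
import heapq

def schedule(streams):
    """streams: {name: (priority, [frames])}, lower number = higher
    priority. Returns (stream, frame) pairs in the order they would go
    on the wire: the highest-priority non-empty stream always sends
    first, and frames within a stream keep their order."""
    heap = []
    for name, (priority, frames) in streams.items():
        for idx, frame in enumerate(frames):
            heapq.heappush(heap, (priority, idx, name, frame))
    ordered = []
    while heap:
        priority, idx, name, frame = heapq.heappop(heap)
        ordered.append((name, frame))
    return ordered

streams = {
    "telemetry": (1, ["batt 80%", "batt 79%", "batt 78%"]),
    "alerts":    (0, ["collision warning!"]),
}
for name, frame in schedule(streams):
    print(name, frame)
```

Even though three telemetry frames were queued first, the single alert frame is transmitted ahead of all of them because its stream has higher priority.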
QUIC provides invaluable solutions to some of the critical challenges faced in IoV scenarios. One significant issue is network disconnection, as vehicles in motion encounter varying network qualities while traversing city roads, country roads, underground garages, or tunnels. In the context of IoV, maintaining uninterrupted service becomes crucial, requiring swift network reconnections and minimal disconnections. Here, QUIC's zero RTT handshake and connection migration features emerge as indispensable tools, enabling seamless and reliable communication in the dynamic IoV environment.
Furthermore, QUIC proves valuable in delivering time-sensitive messages to vehicles, such as green wave speed guidance. By calculating the ideal speed based on distance and traffic light countdown, drivers can adjust their speed to avoid unnecessary stops at intersections. To ensure that these critical messages are not delayed by less important ones, QUIC leverages its multiplexing features and stream prioritization, allowing for efficient and timely data delivery.
Recognizing the challenges and needs across various industries, we introduced QUIC to EMQX 5.0, making it the first product to support MQTT over QUIC. In June 2023, we released EMQX 5.1, which makes MQTT over QUIC ready for production. Now, you can leverage the remarkable benefits of QUIC's features, such as connection migration, multiplexing, and zero-RTT handshakes, with the EMQX platform.
Let's take multiplexing as an example, one of the most valuable features of QUIC. Here's how it works: the client can establish separate streams for each topic, ensuring that messages from different topics do not hinder each other. If the importance of our message is more closely related to the QoS levels, for example, we don't want QoS 0 messages to block QoS 1 or QoS 2 messages, we can create individual streams for each QoS level. Additionally, for MQTT QoS 1 and QoS 2 messages accompanied by acknowledgments, we can split the sending and receiving messages into distinct streams, preventing those acknowledgments from being obstructed by other application messages.
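Here is a minimal sketch of that mapping. The client class and its methods (`stream_for`, `publish`) are hypothetical illustrations of the idea, not the API of any real SDK:

```python
# Hypothetical sketch: modeling which QUIC stream each MQTT publish
# would use when the client maps one stream per topic.

class MqttOverQuicClient:
    def __init__(self):
        self._streams = {}
        self._next_stream_id = 0

    def stream_for(self, key):
        """Lazily create one stream per mapping key (a topic here; the
        same idea works with a QoS level or a send/receive direction)."""
        if key not in self._streams:
            self._streams[key] = self._next_stream_id
            self._next_stream_id += 1
        return self._streams[key]

    def publish(self, topic, payload, qos=0):
        stream_id = self.stream_for(topic)  # one stream per topic
        return {"stream": stream_id, "topic": topic, "qos": qos}

client = MqttOverQuicClient()
print(client.publish("vehicle/telemetry", b"batt 80%"))  # stream 0
print(client.publish("vehicle/alerts", b"collision!"))   # stream 1
print(client.publish("vehicle/telemetry", b"batt 79%"))  # stream 0 again
```

With this mapping, a lost telemetry frame can only stall the telemetry stream; the alerts stream keeps flowing.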
In terms of performance, QUIC undoubtedly showcases clear advantages, consistently outperforming TCP and TLS in numerous comparison tests. The following is the test environment we use:
Test Platform: EMQX 5.0 with a single node
Server Specification: AWS EC2 M4.2xlarge (8 Cores 32GB)
Operating System: Ubuntu 20.04
Number of MQTT Clients: 5000
Loadgen Parallel Number: 8
Latency Measurements: P95 (Percentile)
We conducted a connection handshake test with a 30-millisecond round-trip packet delay first. Surprisingly, even with TLS authentication and encryption, QUIC's first connection RTT proved to be remarkably similar to TCP's. However, QUIC's true advantage shines through when resuming connections, as it can efficiently carry MQTT connection packets using Early Data, resulting in significantly reduced latency.
Not only is QUIC fast, but it is also remarkably efficient. On identical hardware, featuring a 2.4 GHz 8-core CPU and 32GB RAM, QUIC utilizes only 60% of the CPU capacity during the first connection, while TLS consumes 80%. Even when resuming a connection, QUIC demonstrates around 10% lower CPU usage than TLS. Moreover, QUIC significantly reduces maximum memory usage by 3GB compared to TLS.
Next, we thoroughly tested QUIC's connection migration capabilities, a pivotal feature of this protocol. For comparison, we subjected both tests to the same load, one utilizing TLS and the other employing QUIC.
In our tests, we introduced network changes through NAT rebinding. The results were strikingly distinct between the two groups. The TLS group encountered significant message delivery interruptions, while the QUIC group, in stark contrast, experienced seamless and uninterrupted message delivery without any jitter.
Lastly, we conducted tests to assess QUIC's behavior in poor network conditions. We once again compared TLS and QUIC, subjecting both to a packet rate of 20K/s. To simulate a weak network on the client side, we introduced 20% noise and 10% packet loss. Additionally, for QUIC, we alternated the network every 30 seconds. The results were telling: while TLS struggled with the adverse network conditions and could only reach half the expected rate until the network improved, QUIC remained entirely unaffected, consistently maintaining its packet rate of 20K/s.
These tests clearly demonstrate that QUIC is better than TCP for networks that change a lot or have poor quality.
While QUIC boasts numerous advantages, and EMQX has already incorporated support for MQTT over QUIC, it is crucial to note that practical implementation of QUIC necessitates client support.
However, finding a client SDK that supports MQTT over QUIC is challenging at present, and implementing it from scratch would be highly difficult and costly for users.
In order to allow our users to enjoy the various benefits of MQTT over QUIC as soon as possible, we have provided corresponding solutions on both the client side and the edge side. On the client side, we have launched the NanoSDK that supports MQTT over QUIC, which currently supports four languages: C, C++, Java, and Python, covering most hardware platforms. Of course, we are also planning to support more languages to meet the needs of developers as much as possible.
In addition to NanoSDK, we've also added QUIC support to our MQTT performance testing tool, emqtt-bench. It empowers us to comprehensively test and verify the performance of QUIC within the EMQX ecosystem.
For devices that have no suitable MQTT over QUIC SDK available, or whose firmware is difficult to modify, we provide an edge-side solution based on NanoMQ.
NanoMQ is an ultra-lightweight and high-performance edge MQTT messaging engine. We have added QUIC support to its MQTT bridging feature, and now it can convert regular TCP traffic into QUIC traffic. In this way, we don't need to make any changes to the terminal devices, and these devices can enjoy the various benefits brought by QUIC, such as avoiding the head-of-line blocking problem of traditional bridging through the multiplexing feature and allowing high-priority data to be transmitted first.
With these two solutions, we have assisted numerous users in the IoV industry in seamlessly adopting QUIC into their production environments. The results have been nothing short of amazing: in weak network conditions, QUIC has demonstrated a substantial reduction in both packet loss rate and disconnection frequency when compared to traditional TCP.
The aim of this article is to illustrate how QUIC effectively resolves various challenges inherent in traditional TCP. We want you to recognize that QUIC is a viable option when confronted with similar obstacles and can provide numerous advantages, as we've demonstrated.
Let's finally review the advantages of QUIC once again:
Faster connection speed. A high-performance, low-latency connection handshake taking one round trip, or zero for resumed connections.
Better Security. End-to-end encryption, handshake authentication via TLS 1.3.
Multiplexing. Allows a connection to carry multiple streams to transmit data in parallel and allows setting different priorities for streams.
Pluggable congestion control algorithm. Allows for flexible adjustment and customization of congestion control algorithms.
Connection migration. Network switching will not cause connection interruption.
Furthermore, EMQ is actively striving to integrate MQTT over QUIC into the MQTT protocol standard. We firmly believe that this integration will accelerate the industry's adoption of MQTT over QUIC, enabling everyone to experience its exceptional benefits.
Crafting IoT Scenarios with ChatGPT and MQTTX: In this article, we will delve into integrating ChatGPT with the MQTT client tool, MQTTX, with the goal of simulating and generating authentic IoT data streams. → Read more.
A simple Unity3d project for using M2MQTT with Unity: The project includes an example scene with a user interface for managing the connection to the broker and testing messaging. → Read more.
Connected Vehicle Streaming Data Pipelines: This article will provide a demo to illustrate how MQTT and Kafka can be integrated. We will simulate vehicle devices and their dynamic Telematics data, connect them to an MQTT Broker, and then send the data to Apache Kafka. → Read more.
EMQX Single Node Benchmark: In this post, we provide the benchmarking result of EMQX message throughput - a single node EMQX processes 2M message throughput per second. → Read more.
EMQX Single Node Message Latency & Response Time: This is a low load test and the results show that EMQX has very low message latency. These results indicate that EMQX can be a valuable tool in IoT applications that require a high level of real-time responsiveness. → Read more.
MQTT Made Easy with EMQX Enterprise 5.1! We're hosting a session on the groundbreaking features of EMQX 5.1 on September 6, 2023! Dive into MQTT over QUIC, larger file transfer capabilities, IoT data channel unification, and comprehensive integration views. This is the future of MQTT you don't want to miss! → Register Now!
EMQX Enterprise 5.1.1 is now officially released! This version adds QoS level and retain flag checks in authorization to allow more flexible client access control. It also introduces 3 new random functions in rules SQL to meet specific use case needs. → Read more.
Neuron is an industrial IoT server that can connect with diverse devices simultaneously using multiple protocols. It aims to address the challenge of accessing data from automated equipment and provide infrastructure support for smart manufacturing.
Previously, all available modules of Neuron only offered a 15-day free trial. Now you can enjoy a permanent free license with no more than 30 data tags! → Get started now!
EMQX Cloud is an MQTT middleware for the IoT from EMQ. As the world's first fully managed MQTT 5.0 cloud messaging service, EMQX Cloud provides a one-stop O&M colocation and a unique isolated environment for MQTT services. → Get started free!