Measuring Connection Tampering Around the World 

October 17, 2023

A middlebox is a networking device that sits between clients and servers and can transform, inspect, filter, and manipulate Internet traffic. When a middlebox interferes with connections carrying content deemed restricted, for reasons such as copyright infringement or Internet censorship, this is known as connection tampering.

Although middleboxes are often deployed with the intention of improving security and performance, there is a need to audit and measure their use, both to monitor opaque restrictions on Internet freedom and to help content providers understand and explain why their content is unavailable.

So far, the community’s understanding of connection tampering has been solely driven by active measurements. This involves obtaining access to certain networks through purchasing vantage points or recruiting volunteers and running measurements to Internet content from within these networks to test reachability. However, active measurements are inherently limited by the unavailability of user-driven real-world data, the lack of accessible vantage points in many networks, and the need to identify and update important content to test. 

In light of this, my colleagues and I from the University of Michigan, Cloudflare, EPFL, and the University of Maryland developed a completely passive methodology that allows us to detect connection tampering from real-world user data.

Key points:
  • Middleboxes tampering with client traffic exhibit traffic patterns that do not look like typical client traffic to a server.
  • Researchers developed a set of 19 tampering signatures, packet sequences that may indicate middlebox tampering.
  • Some signatures are predominantly observed only in certain regions (for example, PSH ⟶ RST;RST₀ is only seen in China) while others are observed globally.
 

How to Detect Connection Tampering Passively? 

As shown in Figure 1, a typical browser connection to a web server involves the establishment of a TCP handshake (also called a SYN handshake) and then the exchange of multiple data packets. The first data packet typically contains the TLS Client Hello or GET request for HTTP connections. After the exchange of data is completed, the connection is terminated using a FIN handshake.

Figure 1 — A typical TCP connection between Client and Server.

However, when middleboxes tamper with traffic, they cause traffic patterns that are different from a typical TCP connection. Middleboxes tamper with traffic by either:

  • Injecting packets that are designed to terminate connections, such as TCP Reset (RST) packets (Figure 2), or
  • Dropping packets, which forces both the Client and the Server to close the connection (Figure 3).
Figure 2 — Middlebox tampering by injecting RST packets after observing the TLS Client Hello packet.
Figure 3 — Middlebox tampering by dropping the TLS Client Hello packet.
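
The two behaviors above look different from the server's side: an injected RST ends the flow abruptly, while a dropped packet leaves the flow silent until it times out. The following is a minimal sketch of that distinction, not the method used in the study. The `Ending` enum, the `classify_ending` function, and the representation of a flow as a list of TCP flag sets are all hypothetical names chosen for illustration.

```python
from enum import Enum


class Ending(Enum):
    GRACEFUL = "graceful"  # the peer sent a FIN
    RESET = "reset"        # an RST arrived first
    SILENT = "silent"      # neither FIN nor RST was observed


def classify_ending(inbound_flags):
    """Classify how a flow ended from the packets the server received.

    inbound_flags is a list of sets of TCP flag names, one set per
    inbound packet, in arrival order (an assumed, simplified format).
    The first FIN or RST seen decides the outcome.
    """
    for flags in inbound_flags:
        if "RST" in flags:
            return Ending.RESET
        if "FIN" in flags:
            return Ending.GRACEFUL
    # No terminating packet was seen before the capture ended.
    return Ending.SILENT


# A flow like Figure 2: handshake, first data packet, then an RST.
print(classify_ending([{"SYN"}, {"ACK"}, {"PSH", "ACK"}, {"RST"}]))
```

A `RESET` or `SILENT` result on a single flow says little by itself, since ordinary clients also reset and abandon connections; it only becomes informative in aggregate.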

The presence of TCP RST packets or packet drops is a key indicator of connection tampering. While TCP RSTs and packet loss are extremely common on the Internet, the large-scale incidence of such patterns at exactly the stage where middleboxes act (such as immediately after the TLS Client Hello) is highly indicative of tampering. For example:

  • The censorship apparatus in China (also called the Great Firewall) injects multiple RST packets after observing a restricted domain name in the TLS Client Hello.
  • Censorship middleboxes in Iran drop the TLS Client Hello packet when tampering.
Table 1 — Tampering signatures.

Using insights from prior work in censorship detection and manual investigation of large-scale passive data collected by a large CDN, we developed a set of 19 tampering signatures, packet sequences that may indicate middlebox tampering. These packet sequences detect tampering at various stages of a TCP connection and primarily detect RST injection and packet drop-based tampering. 
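
To make the idea of a stage-based signature concrete, here is a rough sketch that labels a flow by the last stage the server observed and by what happened next. It is a deliberately simplified illustration and does not reproduce the 19 signatures or the matching rules from the paper. The event names, the `TIMEOUT` outcome, and the `label_flow` function are assumptions made for this example.

```python
# Simplified connection stages as seen by the server (hypothetical):
# SYN  = only the SYN has arrived
# ACK  = the handshake completed
# PSH  = the first data packet (e.g. a TLS Client Hello) arrived
# DATA = later data packets arrived
STAGES = ("SYN", "ACK", "PSH", "DATA")


def label_flow(events):
    """Return a 'stage ⟶ outcome' label, or None for a graceful close.

    events is a list of strings drawn from STAGES plus "RST" and "FIN",
    in the order the server received them.
    """
    stage = None
    for event in events:
        if event == "FIN":
            return None  # closed normally, nothing to label
        if event == "RST":
            return f"{stage} ⟶ RST" if stage else None
        if event in STAGES:
            stage = event
    # The flow went quiet without a FIN or an RST.
    return f"{stage} ⟶ TIMEOUT" if stage else None


print(label_flow(["SYN", "ACK", "PSH", "RST"]))  # RST right after first data
print(label_flow(["SYN", "ACK"]))                # handshake, then silence
```

Under this simplified model, a dropped Client Hello as in Figure 3 would show up at the server as a completed handshake followed by silence, because the data packet never arrives.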

We note that these signatures do not detect tampering exclusively: certain client behaviors (such as Happy Eyeballs, scanning, and forceful connection closures) can produce the same patterns. Even so, we hope our study reveals large-scale patterns that can be studied further through active measurements. We provide a detailed analysis and evaluation of each signature in our paper.

Global Analysis of Connection Tampering

We applied our signatures to 0.01% of all incoming connections at Cloudflare, a large CDN provider with more than 275 points of presence and global connectivity. The results shown in Figure 4 are from two weeks of passive data in January 2023.
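
This post does not describe how the 0.01% sample was drawn. One common way to sample connections at a fixed rate is to hash the connection's addresses and ports and keep roughly one in every 10,000. The sketch below illustrates that general technique only; the `is_sampled` function, the salt value, and the choice of SHA-256 are assumptions and are not taken from the study.

```python
import hashlib

SAMPLE_ONE_IN = 10_000  # 1 in 10,000 connections is 0.01%


def is_sampled(src_ip, src_port, dst_ip, dst_port, salt=b"example-salt"):
    """Decide deterministically whether a connection is in the sample.

    Hashing the 4-tuple means every packet of the same connection gets
    the same decision without keeping per-connection state.
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(salt + key).digest()
    # Interpret the first 8 bytes as an integer and keep one residue class.
    return int.from_bytes(digest[:8], "big") % SAMPLE_ONE_IN == 0


# The decision is stable for a given connection.
print(is_sampled("192.0.2.1", 50000, "198.51.100.7", 443))
```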

Figure 4 — Percentage of tampering signature matches over select regions and globally.

They show that some signatures are predominantly observed only in certain regions (for example, PSH ⟶ RST;RST₀ is only seen in China) while others are observed globally (for example, PSH;Data ⟶ RST). This indicates that our signatures capture both connection closures that are common across countries (and may not be due to tampering) and behaviors specific to well-known censorship systems such as the Great Firewall. Our results highlight regions that require deeper focus (for example, Peru and Uzbekistan) while also confirming observations about censorship systems in previous work (China and Iran).
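
A per-region comparison like the one in Figure 4 comes down to counting, for each region, what share of sampled connections matched each signature. The sketch below shows one way to compute that. The `match_percentages` function and the record format are hypothetical, and the regions "A" and "B" in the usage example are placeholders, not data from the study.

```python
from collections import Counter, defaultdict


def match_percentages(records):
    """Compute the percentage of connections matching each signature.

    records is an iterable of (region, signature) pairs, where signature
    is None for a connection that matched nothing. The result maps each
    region to {signature: percent of that region's connections}.
    """
    totals = Counter()
    matches = defaultdict(Counter)
    for region, signature in records:
        totals[region] += 1
        if signature is not None:
            matches[region][signature] += 1
    return {
        region: {sig: 100.0 * n / totals[region]
                 for sig, n in matches[region].items()}
        for region in totals
    }


# Placeholder input: one match out of four connections in region "A".
sample = [("A", "PSH ⟶ RST"), ("A", None), ("A", None), ("A", None), ("B", None)]
print(match_percentages(sample))
```

Dividing by each region's own total, rather than by the global total, is what makes regions with very different traffic volumes comparable.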

Passive measurements allowed us to also see trends in real-world tampering across time. Figure 5 shows that tampering accounts for a larger percentage of traffic from certain countries at certain times and days, particularly in Russia and Iran.

Time series graph showing the percentage of flows matching certain tampering signatures in certain regions.
Figure 5 — Tampering rates show a diurnal pattern over time in some regions.
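
A time series like Figure 5 can be built by grouping flows into fixed time buckets and computing the match rate in each bucket. The following is a minimal sketch with hourly buckets in UTC. The `hourly_match_rate` function and its input format are assumptions for illustration, and the bucket size used in the study is not stated in this post.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_match_rate(flows):
    """Return {hour_start: percent of flows that matched a signature}.

    flows is an iterable of (unix_timestamp, matched) pairs, where
    matched is True if the flow matched any tampering signature.
    """
    total = defaultdict(int)
    matched = defaultdict(int)
    for timestamp, is_match in flows:
        hour = datetime.fromtimestamp(timestamp, tz=timezone.utc).replace(
            minute=0, second=0, microsecond=0
        )
        total[hour] += 1
        matched[hour] += int(is_match)
    # Sort by time so the result reads as a series.
    return {hour: 100.0 * matched[hour] / total[hour] for hour in sorted(total)}


# Two flows in the first hour (one matched), two in the second (none matched).
for hour, rate in hourly_match_rate([(0, True), (10, False), (3600, False), (3700, False)]).items():
    print(hour.isoformat(), rate)
```

To read a daily cycle against local time of day, the UTC buckets would need to be shifted to each region's time zone.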

In Iran, we can also see how tampering increased following protests in September 2022 (Figure 6).

Figure 6 — Passive detection of censorship event in Iran around September 2022.

Increasing the Resolution of Internet Health

We view passive measurements as a powerful complement to active measurement strategies, and together these techniques can provide us with a much more comprehensive picture of connection tampering globally. We encourage service providers and ISPs to adopt our technique to contribute to the community’s knowledge about connection termination. Please reach out! 

Learn more via our SIGCOMM Research paper and SIGCOMM Talk.

Ram Sundara Raman is a PhD Candidate at the University of Michigan whose research focuses on measuring large-scale network interference and censorship. The views expressed by the authors of this blog are their own and do not necessarily reflect the views of the Internet Society.


Photo by K. Mitch Hodge on Unsplash