Authoritarian regimes often censor websites for those within their borders, threatening free and open communication on the Internet. Measuring what is blocked, how censors operate, and how censorship changes over time is essential to understanding and circumventing these censorship efforts.
Unfortunately, these measurements are difficult to collect, particularly in countries that restrict researchers’ ability to obtain local vantage points.
Through my Pulse Research Fellowship, I am building on research that my colleagues and I at the University of Maryland and the University of Chicago have been conducting to overcome several of these challenges. The goal is to improve how the measurement community can measure censorship longitudinally without requiring assistance from within the country.
Censorship is Traditionally Measured In-Country
There is currently a wide array of efforts, such as OONI and Censored Planet, that measure Internet censorship by requesting URLs and observing whether the corresponding websites are accessible. A limitation of these projects is that they rely on volunteers or on vantage points within censored countries to make these requests.
This limitation is compounded in countries with small populations, highly repressive regimes, low Internet penetration rates, and poor Internet infrastructure. In such countries, even when vantage points are available, they often yield only a handful of measurements followed by periods of little or no measurement, giving us only a glimpse of censorship at a single point in time.
Our Approach Measures Censorship From the Outside
To overcome this challenge, my colleagues and I developed a novel technique that can longitudinally measure censorship without requiring assistance from within the country. This technique takes advantage of two quirks in the way some countries censor.
Many countries deploy bidirectional censorship, which blocks censored traffic regardless of whether the request originates from inside or outside the country (Figure 1). This means our clients can measure censorship by sending requests from outside the censored country to servers inside it.
However, finding these public servers inside censored countries can be burdensome for the above reasons.
Fortunately, we don’t always need a responsive server to trigger bidirectional censorship, because some censoring devices (middleboxes) do not fully comply with TCP.
A middlebox is a network device that sits between clients and servers and can inspect, filter, transform, and manipulate Internet traffic, a practice known as connection tampering. Middleboxes act on traffic that is deemed restricted for reasons such as copyright infringement, corporate network policy, or Internet censorship.
Tricking Censors Into Censoring
A regular HTTP(S) censorship event occurs when a client connects to a live server with a TCP three-way handshake and sends a PSH+ACK packet that contains a request to a censored website. The censor sees this request and blocks it by dropping or throttling the client’s traffic, sending a block page back to the client, or sending a reset (RST) packet back to the client to terminate the connection (Figure 2).
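The three blocking actions described above can be told apart from the client's side. The sketch below is a minimal illustration, not the tooling used in this research: the `Observation` record and the block page marker strings are hypothetical, and a timeout alone cannot distinguish dropped traffic from an ordinary network failure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    RESET = "reset"          # a RST packet terminated the connection
    BLOCKPAGE = "blockpage"  # an injected block page came back
    TIMEOUT = "timeout"      # nothing came back (dropped or throttled traffic)
    OK = "ok"                # no sign of interference


@dataclass
class Observation:
    """Hypothetical record of what the client saw after sending its request."""
    got_rst: bool = False
    http_body: Optional[str] = None  # None means no response arrived in time


# Hypothetical fingerprints; real block pages differ from censor to censor.
BLOCKPAGE_MARKERS = ("access denied", "this site is blocked")


def classify(obs: Observation) -> Outcome:
    """Map a client-side observation to one of the blocking actions."""
    if obs.got_rst:
        return Outcome.RESET
    if obs.http_body is None:
        return Outcome.TIMEOUT
    body = obs.http_body.lower()
    if any(marker in body for marker in BLOCKPAGE_MARKERS):
        return Outcome.BLOCKPAGE
    return Outcome.OK
```

In practice a single timeout is weak evidence, so a measurement would typically be repeated and compared against a request for an uncensored domain before concluding that traffic was dropped.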
The TCP three-way handshake requires a response from the server—a SYN+ACK packet. However, our goal is to measure censorship in places where there isn’t a server at all.
To achieve our goal, we rely on the fact that censors must tolerate missing some packets within a connection because of asymmetric routes, load balancing, and heavy traffic. For example, a censor may miss the ACK packet sent from a client to the server in a TCP three-way handshake.
When the client sends a subsequent PSH+ACK packet with a censored domain, we would expect a TCP-compliant censor to disregard the packet because, from its perspective, there is no ongoing connection: it never saw the client and the server complete a TCP three-way handshake. Yet many censors still take blocking action, such as sending a RST packet back to the client (Figure 3). Many censors are, therefore, not fully TCP-compliant. They rely only on presumption, not confirmation, of an ongoing connection to block a censored request.
This means we can craft packet sequences that trigger the censor without requiring a live server to complete the TCP handshake.
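The difference between confirming and presuming a connection can be shown with a toy model. This is only a sketch under stated assumptions: the `Censor` class, the blocklist entry, and the rule that a lone SYN is enough for the lenient censor to presume a connection are all hypothetical simplifications, not a description of any real middlebox.

```python
CENSORED = {"blocked.example"}  # hypothetical blocklist entry


class Censor:
    """Toy middlebox. With strict=True it requires a full handshake."""

    def __init__(self, strict: bool):
        self.strict = strict
        self.state = {}  # flow id -> set of TCP flag combinations seen so far

    def observe(self, flow: str, flags: str, payload: str = "") -> str:
        seen = self.state.setdefault(flow, set())
        seen.add(flags)
        if flags != "PSH+ACK":
            return "pass"
        if self.strict:
            # Confirmation: every handshake packet must have been observed.
            established = {"SYN", "SYN+ACK", "ACK"} <= seen
        else:
            # Presumption: the SYN+ACK and ACK may have travelled a route
            # the censor cannot see, so a SYN alone is treated as enough.
            established = "SYN" in seen
        if established and any(domain in payload for domain in CENSORED):
            return "RST"
        return "pass"
```

Feeding both models a SYN followed by a PSH+ACK that names a censored domain makes the strict model pass the packet, while the lenient one answers with a RST.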
Continuing the previous example, the client can send a SYN packet followed by a PSH+ACK packet to a non-responsive IP address to trigger the censor (Figure 4).
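For illustration, the two packets in this sequence can be built as raw TCP segments with the Python standard library. The sketch below only constructs bytes and sends nothing (sending would need a raw socket and elevated privileges). The addresses, ports, sequence numbers, and the plain HTTP request are assumptions chosen for the example, and because no SYN+ACK ever arrives, the acknowledgment number is simply a guess.

```python
import socket
import struct

FLAG_SYN, FLAG_PSH, FLAG_ACK = 0x02, 0x08, 0x10


def checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071) over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_segment(src_ip, dst_ip, sport, dport, seq, ack, flags, payload=b""):
    """Return a 20-byte TCP header (no options) followed by the payload."""
    offset_flags = (5 << 12) | flags  # data offset of 5 words, then flags
    header = struct.pack("!HHIIHHHH", sport, dport, seq, ack,
                         offset_flags, 65535, 0, 0)
    # The checksum also covers a pseudo-header with the IP addresses.
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, socket.IPPROTO_TCP,
                            len(header) + len(payload)))
    csum = checksum(pseudo + header + payload)
    return header[:16] + struct.pack("!H", csum) + header[18:] + payload


def trigger_sequence(src_ip, dst_ip, domain, sport=40000, isn=1000):
    """Build a SYN, then a PSH+ACK carrying a request for `domain`."""
    request = f"GET / HTTP/1.1\r\nHost: {domain}\r\n\r\n".encode()
    syn = tcp_segment(src_ip, dst_ip, sport, 80, isn, 0, FLAG_SYN)
    # No SYN+ACK is expected, so the acknowledgment number is arbitrary.
    psh_ack = tcp_segment(src_ip, dst_ip, sport, 80, isn + 1, 1,
                          FLAG_PSH | FLAG_ACK, request)
    return [syn, psh_ack]
```

Calling `trigger_sequence("192.0.2.1", "198.51.100.7", "blocked.example")` returns the two segments in order. The addresses come from ranges reserved for documentation and stand in for a measurement client and a non-responsive IP address.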
This means we can now measure censorship in networks that have no participants. Due to bidirectional censorship, we can send these packet sequences from clients we control outside the censoring country. Moreover, we can direct our censorship measurements to non-responsive IP addresses with no users or machines behind them, mitigating potential user risks and ethical concerns regarding connections to live machines.
Automating the Process
The SYN followed by a PSH+ACK packet sequence is one of many that trigger some censors. However, no single packet sequence triggers censorship under every censoring regime. Therefore, we must discover which packet sequences trigger censoring middleboxes across different censoring regimes.
In our initial attempt to apply this technique, my colleagues and I studied censorship in Turkmenistan, a notoriously difficult country to measure from within given its low Internet penetration and extremely harsh laws about Internet use. I attempted to trigger the censoring middleboxes within the country by manually crafting packet sequences. I discovered that sending a SYN followed by a PSH+ACK packet twice, with 5 to 29 seconds between packets, successfully triggered censorship.
While encouraging, these results took considerable manual effort, which will not scale to different countries or ISPs within the same country.
As part of my Pulse Research Fellowship, I am developing techniques that automate the discovery of censorship-triggering packet sequences, allowing us to measure censorship in many countries worldwide that are out of reach for traditional measurement techniques.
To do so, I plan to use Geneva, an open-source genetic algorithm that trains against live censors to discover packet sequences that evade censorship. Rather than having Geneva evade censorship, I plan to modify it to discover packet sequences that trigger censorship. This will include adding new capabilities to Geneva. For example, Geneva could not have found the packet sequence used to trigger censorship in Turkmenistan because it does not support pauses between sending packets.
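The idea of inverting the search goal can be sketched with a toy, mutation-only evolutionary loop. This is not Geneva's code or API, and it is far simpler than Geneva. The `toy_censor` function stands in for a live censor, and its rule (a SYN, then a PSH+ACK sent 5 to 29 seconds later) is a hypothetical simplification loosely inspired by the Turkmenistan finding. Each gene pairs a packet type with a delay, which is the kind of capability described above as missing.

```python
import random

FLAGS = ("SYN", "ACK", "PSH+ACK")
DELAYS = (0, 5, 30)  # seconds to wait before sending a packet
MAX_LEN = 4


def toy_censor(seq):
    """Hypothetical censor: fires on a SYN, then a PSH+ACK 5-29 s later."""
    seen_syn = False
    for flags, delay in seq:
        if flags == "PSH+ACK" and seen_syn and 5 <= delay <= 29:
            return True
        seen_syn = seen_syn or flags == "SYN"
    return False


def fitness(seq):
    # Reward triggering the censor (the opposite of evasion) and
    # prefer shorter sequences.
    return (10 if toy_censor(seq) else 0) - len(seq)


def random_gene(rng):
    return (rng.choice(FLAGS), rng.choice(DELAYS))


def mutate(seq, rng):
    """Insert, delete, or replace one (flags, delay) gene."""
    seq = list(seq)
    choice = rng.random()
    if choice < 0.3 and len(seq) < MAX_LEN:
        seq.insert(rng.randrange(len(seq) + 1), random_gene(rng))
    elif choice < 0.6 and len(seq) > 1:
        del seq[rng.randrange(len(seq))]
    else:
        seq[rng.randrange(len(seq))] = random_gene(rng)
    return seq


def evolve(population, generations=50, seed=0):
    """Evolve non-empty packet sequences and return the fittest one."""
    rng = random.Random(seed)
    population = [list(s) for s in population]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: max(1, len(population) // 2)]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(len(population) - len(survivors))]
        population = survivors + children  # the best sequence always survives
    return max(population, key=fitness)
```

Because the best sequence is always carried over, fitness never decreases from one generation to the next. A real search would replace `toy_censor` with measurements against live censors, which is slower and noisier than this in-memory stand-in.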
New Method Will Allow Us to Study Censorship in Overlooked Countries
For this censorship measurement technique to work, we need both bidirectional censorship and a censoring device that can be tricked into censoring with specially crafted packet sequences. So far, we’ve found:
- Belarus, Brunei, China, Iran, Libya, Russia, Tajikistan, and Uzbekistan are censoring bidirectionally.
- Sending a PSH+ACK packet twice successfully triggers censorship in Tajikistan, while the SYN followed by a PSH+ACK packet sequence is sufficient for the other countries.
- Burundi, Equatorial Guinea, Kyrgyzstan, and Myanmar are not censoring bidirectionally, so we cannot study them with this technique.
We are in the process of studying more countries that have long been overlooked to understand what domains get censored, how homogeneous the censorship policies throughout a given country are, how censorship policies differ across regions of the world, and how censorship changes over time.
Stay up to date with future developments via our website.
Sadia Nourin is a Computer Science master’s student at the University of Maryland and a Pulse Research Fellow.