The Challenges of Measuring Decentralized Networks: The Case of the InterPlanetary File System

May 29, 2023

The decentralized web, also known as Web 3.0, is an innovative alternative to the traditional web we use today. It gives users more resilience, more security, and more control over where their data is stored. However, its decentralized, peer-to-peer (P2P) structure makes measuring and monitoring its performance challenging.

In this post, we talk about our experience measuring the stability, performance, and cartography of the InterPlanetary File System (IPFS), one of the largest decentralized P2P networks in operation.

Key points:
  • Measuring the performance and characterizing traffic in decentralized, permissionless networks is more challenging than for centralized networks.
  • The content delivery performance of IPFS lags behind that of the centralized platforms of traditional Web 2.0, but it still meets the requirements of a wide variety of Internet use cases, including website and general file storage and delivery, chat, and interactive document editing.
  • Innovative measurement methodologies have allowed us to uncover important details for one of the central content routing subsystems of IPFS, the public IPFS distributed hash table (DHT).
  • The public IPFS HTTP Gateways are a popular way of accessing the content in the IPFS network, serving more than 300M requests per day, with the majority of requests coming from North America and Europe.
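The public IPFS DHT mentioned above is based on Kademlia: peers and content keys share one identifier space, and the "distance" between two identifiers is their bitwise XOR. Content routing then amounts to finding the peers closest to a key. The following is a minimal sketch of that selection step, not the libp2p implementation. It assumes raw SHA-256 digests of illustrative strings in place of real peer IDs and CIDs; the helper names (`key_of`, `xor_distance`, `closest_peers`) and the default k = 20 are assumptions for illustration.

```python
import hashlib


def key_of(data: bytes) -> int:
    """Map bytes into a 256-bit keyspace via SHA-256 (simplified; not a real peer ID or CID)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: the bitwise XOR of two keys, compared as an integer."""
    return a ^ b


def closest_peers(target: int, peers: dict[str, int], k: int = 20) -> list[str]:
    """Return the names of the k peers whose keys are XOR-closest to target."""
    return sorted(peers, key=lambda name: xor_distance(peers[name], target))[:k]


if __name__ == "__main__":
    # Hypothetical peer names; real IPFS peers are identified by libp2p peer IDs.
    peers = {f"peer-{i}": key_of(f"peer-{i}".encode()) for i in range(100)}
    target = key_of(b"hello IPFS")
    for name in closest_peers(target, peers, k=3):
        print(name, f"{xor_distance(peers[name], target):064x}")
```

Because XOR distance is symmetric and every key is at distance zero only from itself, each key has a well-defined set of "closest" peers that any node can compute independently.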
 

What is the InterPlanetary File System?

In a decentralized web, no central authority or large corporation controls the data. Instead, data is stored across a network of computers, which are often connected in a P2P fashion. Because there is no single point of failure, it is harder for attackers to bring the network down. Users have more control over their data, and their personal information is not at the mercy of a single corporation. The model also encourages a more innovative and diverse technology landscape, in which users can choose among various applications and platforms, all built on the same decentralized infrastructure.

IPFS is an open-source, community-built, decentralized P2P network and is one of the most widely adopted decentralized web networks.

One of the unique characteristics of IPFS is content addressing: content is requested by what it is rather than by where it is stored, in contrast to the host-based addressing of traditional Internet protocols. Its inherent caching properties make IPFS well suited to efficient, large-scale content storage and distribution. When a peer in the network requests content, that content is cached on the peer temporarily (or permanently, if the peer chooses to keep it) and can be served from that peer when it is requested again. This brings significant benefits for popular content, which is requested and served many times and, as a result, cached at many points in the network.
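To illustrate content addressing and opportunistic caching, the sketch below derives each address from the SHA-256 hash of the bytes themselves, so any peer holding a copy can serve it and the requester can verify it by rehashing. This is a simplification under stated assumptions: real IPFS addresses are CIDs (which wrap a multihash plus codec information) and large files are split into many blocks. `ContentStore` and `fetch` are hypothetical names.

```python
import hashlib
from typing import Optional


class ContentStore:
    """Toy content-addressed store: blocks are looked up by the hash of their bytes."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself, not from where it lives.
        address = hashlib.sha256(data).hexdigest()
        self.blocks[address] = data
        return address

    def get(self, address: str) -> Optional[bytes]:
        return self.blocks.get(address)


def fetch(address: str, local: ContentStore, remote: ContentStore) -> bytes:
    """Serve from the local cache if possible; otherwise fetch, verify, and cache."""
    cached = local.get(address)
    if cached is not None:
        return cached
    data = remote.get(address)
    if data is None:
        raise KeyError(address)
    # Any peer can serve the block, because the requester verifies it by rehashing.
    if hashlib.sha256(data).hexdigest() != address:
        raise ValueError("content does not match its address")
    local.put(data)  # cache it so this peer can serve later requests
    return data


if __name__ == "__main__":
    origin, peer = ContentStore(), ContentStore()
    addr = origin.put(b"hello IPFS")
    fetch(addr, peer, origin)  # first request goes to the origin
    print(peer.get(addr))      # b'hello IPFS': the peer now holds a cached copy
```

Since the address depends only on the bytes, a request can be satisfied by whichever peer is closest or fastest, which is what makes popular content cheap to serve once it is cached widely.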

North America and Europe Benefit Most From Having More Servers

Although several engineers and community members had built performance indicators, these did not provide enough data to make informed decisions about operational processes and protocol optimizations. My colleagues and I at ProbeLab have therefore sought to increase visibility into the performance of the different components that together form the IPFS network. Our aims have been to:

  • Monitor network stability in terms of node uptime and fluctuation of network size.
  • Evaluate network performance in terms of publishing to and fetching content from the network.
  • Map a rough cartography of the network (for example, geolocation of peers) to drive design decisions.

We have achieved these goals by developing and using a purpose-built network crawler, a fleet of network probes (nodes), and infrastructure logs. Because IPFS is open and community-developed, the community and the developers who contribute to or build on IPFS need to be aware of the network's performance. We have therefore built a website that explains our methodology and the results we are seeing through our infrastructure. Below is a summary of the results we have gathered so far from monitoring IPFS's performance:

  • Peer churn is high: the majority (~80%) of nodes leave within two hours of joining the network. Despite this high churn, the stable core of nodes that remain online gives the network remarkable resilience, as we saw during an incident earlier this year.
  • Around 20k DHT Server nodes are constantly online.
  • The majority of IPFS DHT Server nodes are located in North America and Europe (Figure 1), giving an advantage to requests coming from these regions. We found that the content discovery time from those regions is around 150-300 ms.
Figure 1: Geolocation of nodes in the IPFS DHT network, based on their IP addresses.
  • The public IPFS HTTP Gateways are a popular way of accessing the content in the IPFS network, serving more than 300M requests per day, with the majority of requests coming from North America and Europe.
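The churn figure above is a statement about session lengths: of the sessions observed, what share ends within two hours of the peer joining. A minimal sketch of that calculation, assuming each completed session is available as a (join, leave) pair of timestamps in seconds; `short_session_share` is a hypothetical helper, and the sample sessions are made up for illustration, not measurement data. A real analysis must also handle sessions still open when the measurement window ends, since simply dropping them biases the result toward short sessions.

```python
from typing import Iterable, Tuple

TWO_HOURS = 2 * 60 * 60  # seconds


def short_session_share(sessions: Iterable[Tuple[float, float]],
                        threshold: float = TWO_HOURS) -> float:
    """Fraction of completed sessions whose duration (leave - join) is at most threshold."""
    durations = [leave - join for join, leave in sessions]
    if not durations:
        raise ValueError("no completed sessions")
    return sum(d <= threshold for d in durations) / len(durations)


if __name__ == "__main__":
    # Illustrative (join, leave) timestamps only, not measurement data.
    sessions = [(0, 600), (100, 3_000), (0, 9_000), (500, 4_000), (0, 86_400)]
    print(short_session_share(sessions))  # 0.6
```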

Diving into these details uncovers important findings and optimization potential that would not be visible without such tools and studies. This is a necessary step toward making the performance and adoption of Web 3.0 protocols and networks comparable to those of the protocols and networks we use today.
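The purpose-built network crawler is one such tool. Conceptually, a DHT crawler is a breadth-first traversal: start from a few bootstrap peers, ask every reachable peer for the peers in its routing table, and continue until no new peers turn up. The sketch below illustrates only that idea and is not ProbeLab's crawler; `crawl` and the `get_neighbors` callback are hypothetical, with the callback standing in for real network requests, and the routing tables in the example are simulated.

```python
from collections import deque
from typing import Callable, Iterable, Optional


def crawl(bootstrap: Iterable[str],
          get_neighbors: Callable[[str], Optional[list[str]]]) -> tuple[set[str], set[str]]:
    """Breadth-first crawl of a P2P overlay.

    get_neighbors(peer) returns the peers in that peer's routing table,
    or None if the peer could not be reached.
    Returns (reachable, discovered) sets of peer names.
    """
    discovered: set[str] = set(bootstrap)
    reachable: set[str] = set()
    queue = deque(discovered)
    while queue:
        peer = queue.popleft()
        neighbors = get_neighbors(peer)
        if neighbors is None:
            continue  # offline or unreachable: known to exist, but not crawlable now
        reachable.add(peer)
        for n in neighbors:
            if n not in discovered:
                discovered.add(n)
                queue.append(n)
    return reachable, discovered


if __name__ == "__main__":
    # Simulated routing tables; "d" is listed by "b" but has no table (offline).
    tables = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "e"], "e": ["c"]}
    reachable, discovered = crawl(["a"], tables.get)
    print(sorted(reachable), sorted(discovered))  # ['a', 'b', 'c', 'e'] ['a', 'b', 'c', 'd', 'e']
```

Separating discovered peers from reachable ones matters because peers that still appear in others' routing tables may already have left, which is one reason crawl-based estimates of network size and churn need care.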

Learn more

All of our network measurement work is public and reported in the following GitHub repository: https://github.com/protocol/network-measurements. We hold biweekly Office Hours, where we invite the community and our external collaborators to ask questions and discuss hot topics. You can register and join at: https://lu.ma/ipfs-network-measurements.

You can also reach the ProbeLab team by email at [email protected], or in the #probe-lab channel on the IPFS Discord or the Filecoin Slack.

Yiannis Psaras has a long-standing interest in Information- or Content-Centric Networks with several noteworthy contributions in the area.