Skip to content

Commit

Permalink
backfilling avail & celestia reports 2024-49
Browse files Browse the repository at this point in the history
  • Loading branch information
guillaumemichel committed Dec 11, 2024
1 parent dbd2e97 commit 2e180f8
Show file tree
Hide file tree
Showing 29 changed files with 317 additions and 3 deletions.
154 changes: 154 additions & 0 deletions content.en/avail/dht/mainnet/2024-49-report.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,154 @@
---
title: Week 2024-49
plotly: true
slug: 2024-49
weight: 1045610
---

# Avail Report Cal. Week 49 - 2024

## General Information

The following results show measurement data that were collected in calendar week 49 in 2024 from `2024-12-02` to `2024-12-09`.

- **Number of crawls: `84`**

The number of crawls executed during the week indicated above.

- **Number of visits: `643,417`**

Every attempt to dial or connect to a peer counts as a visit. Regardless of errors that may occur. We visit peers many times during a given week to extract information from them or probe for their uptime. We visit all peers that we discover in the DHT and peers that we found to be still online from the previous week.

- **Number of unique peer IDs discovered in the DHT: `4,808`**

The total number of unique peers that our crawler learned about from other peers over the period of the given week and attempted to dial or connect to (visit). This is not the total number of Peer IDs that existed simultaneously in the network at any point during the week. It is the aggregate over the entire week.

- **Number of unique IP addresses discovered: `2,644`**

The total number of unique IP addresses that the discovered peers advertised to the DHT network within the given week. IPv4 and IPv6 addresses are included in this number even if they address the same peer. This is not the total number of unique IP addresses that participated in the DHT simultaneously at any given point in time. It is the aggregate over the entire week.

- **Network Size: `4,633`**

The average number of peers that were found simultaneously (i.e., throughout the duration of one crawl) connected to the network. This number signifies the average network size from all crawls, during the given week.

Timestamps are in UTC if not mentioned otherwise.

## Agents versions

This stacked plot shows the distribution of agent versions over time, it offers insights into the uptake of newer versions.

{{< plotly json="/plots/2024/12/02/avail-versions-distribution.json" height="600px" >}}

This chart presents the version distribution of avail nodes, with each bar representing the average number of online peers over the course of a week.

{{< plotly json="/plots/2024/12/02/avail-agents-versions.json" height="600px" >}}

## Churn Analysis

This plot presents the Cumulative Distribution Function (CDF) of peer departure
times from the network over a measurement period of one-week. The plot
basically shows the percentage of nodes online at a point in time (`t=0`) that
will have disconnected after 1 hour, 2 hours, 3 hours, .. up to 24 hours after `t=0`.

- **X-Axis (Time in Hours)**: Represents the elapsed time since the reference
point (`t=0`), measured in hours.
- **Y-Axis (Percentage of Offline Peers)**: Indicates the cumulative
percentage of peers that have disconnected from the network by the
corresponding time interval. Specifically, it shows the proportion of peers
that were online at `t=0` and have since gone offline.

This visualization aids in understanding peer churn dynamics, which is crucial
for optimizing network stability, resource allocation, and improving overall
decentralized network performance.

Note that the sum of the CDF is **NOT** 100%, as it only includes peers that were
online for up to 24h.

{{< plotly json="/plots/2024/12/02/avail-network-stability-cdf.json" height="800px" >}}

## Geolocation

Geographical data is sourced from the [MaxMind database](https://www.maxmind.com), which maps IP addresses to corresponding countries.

This bar plot illustrates the distribution of observed nodes across different countries.

{{< plotly json="/plots/2024/12/02/avail-geo-agent-All-bars.json" height="600px" >}}

This plot displays the weekly geographical distribution of nodes, categorized by country.

{{< plotly json="/plots/2024/12/02/avail-geo-agent-All-lines.json" height="600px" >}}

## Cloud Providers

Cloud providers data is sourced from [Udger](https://udger.com/resources/datacenter-list), which maps IP addresses to known hosting providers.

### Cloud Hosting Rate

This line chart displays the count of nodes within the Avail network that are hosted on known commercial cloud providers, compared to those that are not. It tracks the distribution over a specified period, offering insights into the infrastructure preferences for node hosting.

Regular analysis of this chart can reveal trends in the adoption of cloud services for nodes. Such information is crucial for understanding the network's resilience and the potential reliance on cloud infrastructure.

{{< plotly json="/plots/2024/12/02/avail-cloud-rate-agent-All-lines.json" height="600px" >}}

This bar chart presents the weekly distribution of Avail nodes among various cloud providers, including nodes not hosted within data centers (grouped under _Non-Datacenter_).

{{< plotly json="/plots/2024/12/02/avail-cloud-agent-All-bars.json" height="600px" >}}

The line chart illustrates the trends in the distribution of all Avail nodes across cloud providers over the given time period. Note that nodes hosted outside of data centers are not included in this representation.

{{< plotly json="/plots/2024/12/02/avail-cloud-agent-All-lines.json" height="600px" >}}

## Crawls

### Protocols

This plot illustrates the evolving count of nodes supporting each of the listed protocols over time. It exclusively presents data gathered from nodes accessible through a libp2p connection via our crawler. The identification of supported protocols relies on the [libp2p identify protocol](https://github.com/libp2p/specs/tree/master/identify), hence necessitating a libp2p connection for discovery.

{{< plotly json="/plots/2024/12/02/avail-crawl-protocols.json" height="1200px" >}}

### Errors

{{< plotly json="/plots/2024/12/02/avail-connection-errors-single.json" height="600px" >}}

{{< plotly json="/plots/2024/12/02/avail-crawl-errors-single.json" height="600px" >}}

## Keyspace Density Monitoring

{{< hint info >}}
💡 The latest keyspace density data is available [here](../../keydensity/).
{{< /hint >}}

In Kademlia, every object indexed by the DHT requires a binary identifier. In the libp2p DHT implementation, peers are identified by the digest of `sha256(peer_id)` and CIDs are identified by the digest of `sha256(cid)`. This Kademlia identifier determines the location of an object within the Kademlia XOR keyspace.

The following plots examine the peer distribution within the keyspace, aiding in the identification of potential [Sybil](https://en.wikipedia.org/wiki/Sybil_attack) and eclipse attacks.

### Keyspace population distribution

**Description**: The plot illustrates the number of peers whose Kademlia identifier matches each prefix for all prefixes of a given size, for a given network crawl. Since the density of keyspace regions follows a [Poisson](https://en.wikipedia.org/wiki/Poisson_distribution) distribution, it is expected to observe some regions that are significantly more populated than others.

**How to read the plot:** The selected `depth` indicates the prefix size. There are `2^i` distinct prefixes at depth `i`. The slider at the bottom of the plot enables visualization of the population evolution over time across multiple crawls.

**What to look out for:** The red dashed line represents the expected density per region, corresponding to the number of peers matching a prefix. A bar exceeding the expected density by more than twice suggests that a region of the keyspace might be under an eclipse attack.

{{< plotly json="/plots/2024/12/02/avail-regions-population.json" height="600px" >}}

### Keyspace density distribution

**Description:** As previously mentioned, the keyspace population follows a [Poisson](https://en.wikipedia.org/wiki/Poisson_distribution) distribution, which can make identifying outliers challenging. The plot below counts the number of regions for each population size and facilitates the identification and analysis of outliers. While it is normal for some regions to have populations above the average, the plot enables us to quantify these deviations.

**How to read the plot:** The red dashed line represents the expected number of regions for each population size. Note that the Poisson distribution is more evident at greater depths (longer prefix size), while analyzing data at lower depths provides limited insights. It is recommended to read the plot for depths between 6 and 8.

**What to look out for:** If a bar significantly exceeds its expected value on the right side of the plot, or if an isolated bar appears on the far right, it may indicate a potential eclipse attack, warranting further investigation.

{{< plotly json="/plots/2024/12/02/avail-density-distributions.json" height="600px" >}}

## Stale Peer Records

This stacked plot depicts the count of peer records stored within each node's routing table and made accessible through the DHT.
These peer records are used to discover new remote peers within the network, enabling efficient and secure routing towards the target peer or content

Ensuring the reachability of referenced peers shared within the network holds paramount importance. The plot delineates the occurrences of reachable and non-reachable (stale) peer records. Note that nodes running behind a NAT are counted as unreachable even though they may be online.

For instance, if a peer's record is present in the routing tables of 100 other nodes and the peer is reachable, the plot will reflect an increase of 100 in the _online_ category.

{{< plotly json="/plots/2024/12/02/avail-stale-records-stacked.json" height="600px" >}}
136 changes: 136 additions & 0 deletions content.en/celestia/dht/2024-49-report.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,136 @@
---
title: Week 2024-49
plotly: true
slug: 2024-49
weight: 1045610
---

# Celestia Report Cal. Week 49 - 2024

## General Information

The following results show measurement data that were collected in calendar week 49 in 2024 from `2024-12-02` to `2024-12-09`.

- **Number of crawls: `84`**

The number of crawls executed during the week indicated above.

- **Number of visits: `79,250`**

Every attempt to dial or connect to a peer counts as a visit. Regardless of errors that may occur. We visit peers many times during a given week to extract information from them or probe for their uptime. We visit all peers that we discover in the DHT and peers that we found to be still online from the previous week.

- **Number of unique peer IDs discovered in the DHT: `298`**

The total number of unique peers that our crawler learned about from other peers over the period of the given week and attempted to dial or connect to (visit). This is not the total number of Peer IDs that existed simultaneously in the network at any point during the week. It is the aggregate over the entire week.

- **Number of unique IP addresses discovered: `250`**

The total number of unique IP addresses that the discovered peers advertised to the DHT network within the given week. IPv4 and IPv6 addresses are included in this number even if they address the same peer. This is not the total number of unique IP addresses that participated in the DHT simultaneously at any given point in time. It is the aggregate over the entire week.

- **Network Size: `185`**

The average number of peers that were found simultaneously (i.e., throughout the duration of one crawl) connected to the network. This number signifies the average network size from all crawls, during the given week.

Timestamps are in UTC if not mentioned otherwise.

## DHT Servers Agents

This plot shows the distribution of various DHT servers user agents over time.

{{< plotly json="/plots/2024/12/02/celestia-agents-overall-stacked.json" height="600px" >}}

This stacked chart illustrates how the distribution of DHT servers agent versions for all key user agents evolves over time. It offers insights into the uptake of newer versions among these agents.

{{< plotly json="/plots/2024/12/02/celestia-versions-distribution.json" height="600px" >}}

<!--## Lightnode Population-->
<!---->
<!--This chart shows the distribution of all agents interacting with the DHT (both DHT clients and DHT servers) over time, including the Lightnode population.-->
<!---->
<!--> 💡 To show the Lightnode population only, double-click on the `celestia-node/light` label.-->
<!---->
<!--{{< plotly json="/plots/2024/12/02/celestia-client-agents-overall-stacked.json" height="600px" >}}-->
<!---->
<!--This stacked plot illustrates the agent version distribution over time for all DHT clients and servers. It offers insights into the uptake of newer versions among these agents.-->
<!---->
<!--{{< plotly json="/plots/2024/12/02/celestia-client-versions-distribution.json" height="600px" >}}-->

## Churn Analysis

This plot presents the Cumulative Distribution Function (CDF) of peer departure
times from the network over a measurement period of one-week. The plot
basically shows the percentage of nodes online at a point in time (`t=0`) that
will have disconnected after 1 hour, 2 hours, 3 hours, .. up to 24 hours after `t=0`.

- **X-Axis (Time in Hours)**: Represents the elapsed time since the reference
point (`t=0`), measured in hours.
- **Y-Axis (Percentage of Offline Peers)**: Indicates the cumulative
percentage of peers that have disconnected from the network by the
corresponding time interval. Specifically, it shows the proportion of peers
that were online at `t=0` and have since gone offline.

This visualization aids in understanding peer churn dynamics, which is crucial
for optimizing network stability, resource allocation, and improving overall
decentralized network performance.

Note that the sum of the CDF is **NOT** 100%, as it only includes peers that were
online for up to 24h.

{{< plotly json="/plots/2024/12/02/celestia-network-stability-cdf.json" height="800px" >}}

## Geolocation

Geographical data is sourced from the [MaxMind database](https://www.maxmind.com), which maps IP addresses to corresponding countries.

This bar plot illustrates the distribution of observed nodes across different countries.

{{< plotly json="/plots/2024/12/02/celestia-geo-agent-All-bars.json" height="600px" >}}

This plot displays the weekly geographical distribution of nodes, categorized by country.

{{< plotly json="/plots/2024/12/02/celestia-geo-agent-All-lines.json" height="600px" >}}

### Cloud Providers

Cloud providers data is sourced from [Udger](https://udger.com/resources/datacenter-list), which maps IP addresses to known hosting providers.

#### Cloud Hosting Rate

This line chart displays the count of nodes within the Celestia network that are hosted on known commercial cloud providers, compared to those that are not. It tracks the distribution over a specified period, offering insights into the infrastructure preferences for node hosting.

Regular analysis of this chart can reveal trends in the adoption of cloud services for nodes. Such information is crucial for understanding the network's resilience and the potential reliance on cloud infrastructure.

{{< plotly json="/plots/2024/12/02/celestia-cloud-rate-agent-All-lines.json" height="600px" >}}

This bar chart presents the weekly distribution of Celestia nodes among various cloud providers, including nodes not hosted within data centers (grouped under _Non-Datacenter_).

{{< plotly json="/plots/2024/12/02/celestia-cloud-agent-All-bars.json" height="600px" >}}

The line chart illustrates the trends in the distribution of all Celestia nodes across cloud providers over the given time period. Note that nodes hosted outside of data centers are not included in this representation.

{{< plotly json="/plots/2024/12/02/celestia-cloud-agent-All-lines.json" height="600px" >}}

## Crawls

### Protocols

This plot illustrates the evolving count of nodes supporting each of the listed protocols over time. It exclusively presents data gathered from nodes accessible through a libp2p connection via our crawler. The identification of supported protocols relies on the [libp2p identify protocol](https://github.com/libp2p/specs/tree/master/identify), hence necessitating a libp2p connection for discovery.

{{< plotly json="/plots/2024/12/02/celestia-crawl-protocols.json" height="1200px" >}}

### Errors

{{< plotly json="/plots/2024/12/02/celestia-connection-errors-single.json" height="600px" >}}

{{< plotly json="/plots/2024/12/02/celestia-crawl-errors-single.json" height="600px" >}}

## Stale Peer Records

This stacked plot depicts the count of peer records stored within each node's routing table and made accessible through the DHT.
These peer records are used to discover new remote peers within the network, enabling efficient and secure routing towards the target peer or content

Ensuring the reachability of referenced peers shared within the network holds paramount importance. The plot delineates the occurrences of reachable and non-reachable (stale) peer records. Note that nodes running behind a NAT are counted as unreachable even though they may be online.

For instance, if a peer's record is present in the routing tables of 100 other nodes and the peer is reachable, the plot will reflect an increase of 100 in the _online_ category.

{{< plotly json="/plots/2024/12/02/celestia-stale-records-stacked.json" height="600px" >}}

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion content.en/plots/2024/12/02/avail-agents-overall.json
Original file line number Diff line number Diff line change
@@ -1 +1 @@
{"data": [{"hovertemplate": "Agent: %{x} <br>Count: %{y:,}<br>Percentage: %{text}<br><extra></extra>", "marker": {"color": ["#937860", "#8172b3", "#c44e52", "#55a868", "#dd8452", "#4c72b0"]}, "text": ["46.8%", "29.3%", "10.7%", "6.0%", "3.6%", "3.6%"], "textposition": "outside", "x": ["v2.2.0", "v2.2.1", "v1.10.0", "v1.11.0", "v2.1.5", "v2.1.0"], "y": [171, 107, 39, 22, 13, 13], "type": "bar"}], "layout": {"template": {"data": {"scatter": [{"type": "scatter"}]}}, "xaxis": {"title": {"text": "Agent"}, "tickangle": -45}, "yaxis": {"title": {"text": "Count"}, "tickformat": ","}, "autosize": true, "annotations": [{"font": {"size": 18}, "showarrow": false, "text": "Agent Types", "x": 0.5, "xanchor": "center", "xref": "paper", "y": 1.15, "yanchor": "bottom", "yref": "paper"}, {"font": {"size": 14}, "showarrow": false, "text": "Total Count: 365", "x": 0.5, "xanchor": "center", "xref": "paper", "y": 1.1, "yanchor": "bottom", "yref": "paper"}, {"font": {"size": 10}, "showarrow": false, "text": "Data: week ending 08 Dec 2024. Source: Nebula", "x": 0.99, "xanchor": "right", "xref": "paper", "y": -0.14, "yanchor": "top", "yref": "paper"}]}}
{"data": [{"hovertemplate": "Agent: %{x} <br>Count: %{y:,}<br>Percentage: %{text}<br><extra></extra>", "marker": {"color": ["#4c72b0"]}, "text": ["100.0%"], "textposition": "outside", "x": ["Avail Node"], "y": [419], "type": "bar"}], "layout": {"template": {"data": {"scatter": [{"type": "scatter"}]}}, "xaxis": {"title": {"text": "Agent"}, "tickangle": -20}, "yaxis": {"title": {"text": "Count"}, "tickformat": ","}, "autosize": true, "annotations": [{"font": {"size": 18}, "showarrow": false, "text": "Agent Types", "x": 0.5, "xanchor": "center", "xref": "paper", "y": 1.15, "yanchor": "bottom", "yref": "paper"}, {"font": {"size": 14}, "showarrow": false, "text": "Total Count: 419", "x": 0.5, "xanchor": "center", "xref": "paper", "y": 1.1, "yanchor": "bottom", "yref": "paper"}, {"font": {"size": 10}, "showarrow": false, "text": "Data: week ending 08 Dec 2024. Source: Nebula", "x": 0.99, "xanchor": "right", "xref": "paper", "y": -0.14, "yanchor": "top", "yref": "paper"}]}}
Loading

0 comments on commit 2e180f8

Please sign in to comment.