
Direct file transfer is slow #6599

Open
Stebalien opened this issue Aug 21, 2019 · 5 comments
Labels
kind/bug A bug in existing code (including security flaws) topic/meta Topic meta

Comments

Stebalien commented Aug 21, 2019

This is a meta-issue explaining why direct file transfer may be slower than tools like SCP, with links to the related issues.

This issue assumes the following setup:

  • Two directly connected nodes.
  • One node has a large file or directory tree.
  • The other node is trying to pin that large file or directory tree.

Explicitly, this issue is not about finding peers that have content (#6383).

There may or may not be other connected nodes that have the content.

Background

IPFS chunks files/directories into smaller pieces and links them together into a merkledag (a generalized merkle-tree). We call these pieces "blocks".
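As a rough illustration, here's a minimal sketch of chunking and linking. The fixed block size, bare SHA-256 hashing, and single-level tree are simplifications for this issue; real IPFS uses configurable chunkers, multihash-based CIDs, and arbitrarily deep dags.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunk splits data into fixed-size blocks, the way IPFS splits files
// into pieces before building the merkledag.
func chunk(data []byte, size int) [][]byte {
	var blocks [][]byte
	for len(data) > 0 {
		n := size
		if len(data) < n {
			n = len(data)
		}
		blocks = append(blocks, data[:n])
		data = data[n:]
	}
	return blocks
}

// link builds a single-level "merkledag": a root digest over the
// hashes of the children, so the root commits to the whole file.
func link(blocks [][]byte) string {
	h := sha256.New()
	for _, b := range blocks {
		child := sha256.Sum256(b)
		h.Write(child[:])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	data := make([]byte, 1000)
	blocks := chunk(data, 256) // 4 blocks: 256+256+256+232 bytes
	fmt.Println(len(blocks), link(blocks)[:8])
}
```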

To fetch a file or a directory tree, IPFS currently uses a protocol called bitswap to ask for each block in the file or directory tree. Specifically, IPFS (currently) uses bitswap to tell a subset of our connected peers when we "want" a block. The set of blocks we're looking for is called our "wantlist". If the peer has the block, they send it to us. At the moment, peers that don't have the block stay silent.

Dag Traversal

We have two issues with dag traversal that may limit throughput.

ipfs get

When calling ipfs get to fetch a file, go-ipfs will traverse the merkledag in-order (approximately). We do a bit of prefetching, but our prefetching algorithm isn't optimal: ipfs/go-ipld-format#53

ipfs pin

When calling ipfs pin add, go-ipfs traverses the graph in parallel with 32 worker goroutines, each fetching one block at a time. Unfortunately, this means we can only request 32 blocks at a time when pinning a file. Given 256KiB blocks, that's a maximum throughput of 8MiB/RTT, where RTT is the latency between the two nodes. Assuming an RTT of ~100ms, that's a maximum throughput of 80MiB/s (~670Mbps). In other words, it won't max out a gigabit connection.

Unfortunately, this is also an optimistic upper bound. If we're downloading a large directory tree of small files, the blocks will be much smaller (e.g., 1KiB). In that case, our throughput will be more like 2-3Mbps.
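The arithmetic above can be reproduced with a back-of-envelope model (one in-flight block per worker; the numbers are the ones from this issue, and this is not the actual pinner code):

```go
package main

import "fmt"

// pinThroughput returns the best-case pin throughput in bytes/second
// given a fixed worker pool, one in-flight block per worker, and a
// round-trip time in seconds.
func pinThroughput(workers int, blockSize int, rttSeconds float64) float64 {
	return float64(workers*blockSize) / rttSeconds
}

func main() {
	// 32 workers, 256KiB blocks, 100ms RTT -> 80MiB/s (~670Mbps).
	big := pinThroughput(32, 256*1024, 0.1)
	fmt.Printf("large blocks: %.0f MiB/s (%.0f Mbps)\n", big/(1<<20), big*8/1e6)

	// Same pool with 1KiB blocks -> ~320KiB/s (~2.6Mbps).
	small := pinThroughput(32, 1024, 0.1)
	fmt.Printf("small blocks: %.0f KiB/s (%.1f Mbps)\n", small/1024, small*8/1e6)
}
```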

Bitswap Issues

Bitswap works on a block-by-block basis. That is, a layer on top of bitswap will ask bitswap for a block, deserialize the block, determine if that block is some kind of intermediate block, and then ask bitswap for that block's children (if any).

Round Trips

One unfortunate side effect is that traversing a path (/ipfs/QmFoobar/a/b/c/d) requires one round trip per path component. The path resolution logic will ask bitswap for the block identified by the CID (content identifier) QmFoobar, deserialize the block into a directory, look up "a" within that directory, find the associated content identifier (let's say QmA), and then ask bitswap for QmA. We then repeat this process for b, c, and d.
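That resolution loop looks roughly like the sketch below. The directory map and CIDs are toy data standing in for real blocks fetched over bitswap; the point is that each path component costs one full round trip.

```go
package main

import (
	"fmt"
	"strings"
)

// dirs maps a directory CID to its entries (hypothetical toy data;
// real blocks are fetched over bitswap and deserialized).
var dirs = map[string]map[string]string{
	"QmFoobar": {"a": "QmA"},
	"QmA":      {"b": "QmB"},
	"QmB":      {"c": "QmC"},
	"QmC":      {"d": "QmD"},
}

// resolve walks a path one component at a time. Each iteration stands
// in for a full bitswap round trip, which is why resolving /a/b/c/d
// costs four RTTs today.
func resolve(root, path string) (string, int) {
	cid, trips := root, 0
	for _, part := range strings.Split(strings.Trim(path, "/"), "/") {
		trips++ // one round trip to fetch the current directory block
		cid = dirs[cid][part]
	}
	return cid, trips
}

func main() {
	cid, trips := resolve("QmFoobar", "/a/b/c/d")
	fmt.Println(cid, trips) // QmD 4
}
```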

A more efficient protocol would simply send the entire path to the remote peer and ask for the entire file all at once. We are currently working on such a protocol (https://github.com/ipfs/go-graphsync) but it may be a while before it's feature-complete enough to be included in go-ipfs.

Duplicate Blocks

STATUS: We have now fixed the protocol limitations here. We can now ask if a peer has a block and ask peers to tell us immediately if they don't have a block. Now, when we try to download a block, we ask:

  1. A single peer for the block, and ask them to tell us if they don't have it.
  2. Other peers if they have the block.

This means:

  1. If the peer from step 1 doesn't have the block, we can look elsewhere in 1RTT instead of waiting for some timeout.
  2. Because we ask other peers if they have the block in parallel, the second request for a block should almost always succeed.
  3. Because we can ask peers if they have a block, we can avoid downloading the block multiple times.
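A toy model of that flow is sketched below. The want-block/want-have split and the explicit DONT_HAVE response follow the description above; the Go types and function names are illustrative, not go-bitswap's actual API.

```go
package main

import "fmt"

type peer struct {
	name   string
	blocks map[string]bool
}

// wantBlock asks a peer for the block itself; the bool reports an
// explicit DONT_HAVE instead of the peer staying silent.
func (p peer) wantBlock(cid string) (data string, have bool) {
	if p.blocks[cid] {
		return "data:" + cid, true
	}
	return "", false // DONT_HAVE: caller can look elsewhere in 1 RTT
}

// wantHave only asks whether the peer has the block, so we can avoid
// downloading it multiple times.
func (p peer) wantHave(cid string) bool { return p.blocks[cid] }

// fetch asks a single peer for the block while asking the others (in
// parallel, conceptually) whether they have it.
func fetch(cid string, primary peer, others []peer) (string, int) {
	trips := 1
	if data, ok := primary.wantBlock(cid); ok {
		return data, trips
	}
	// Primary said DONT_HAVE; the parallel want-haves tell us who to
	// ask next, so the second request almost always succeeds.
	for _, p := range others {
		if p.wantHave(cid) {
			trips++
			data, _ := p.wantBlock(cid)
			return data, trips
		}
	}
	return "", trips
}

func main() {
	a := peer{"a", map[string]bool{}}
	b := peer{"b", map[string]bool{"QmX": true}}
	data, trips := fetch("QmX", a, []peer{b})
	fmt.Println(data, trips) // data:QmX 2
}
```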

Previous Text:

When fetching blocks from more than one peer, bitswap will occasionally fetch the same block from multiple peers in case one doesn't respond. We do this because:

  1. Bitswap doesn't currently have any form of acknowledgement. That is, if peer A tells peer B that it "wants" a block, peer B will only respond (by sending the block) if and when it gets the block. It won't tell peer A that it doesn't have the block. This means that we can't currently tell the difference between a peer that's taking a while to respond and a peer that doesn't have the block and will never respond.
  2. When asking a peer for a block, we have no way to know if the peer actually has it. We can guess based on whether or not the peer had a related block but that peer may only have part of the directory tree/file that we're trying to download.

We're currently considering augmenting the protocol with additional messages to solve both of these issues.

Finally, the current logic that decides how many peers we should request a block from will over-optimize for asking multiple peers (ipfs/go-bitswap#120). If multiple peers have the block we're looking for, we'll have a 50% duplicate-block overhead in the best-case scenario.

Wantlist Size Limitations

When asking a single peer for a set of blocks, we currently only ask for at most 32 at a time. We do this for several reasons:

  1. We don't want to force the peer to keep a large "wantlist" in memory.
  2. We want to be able to efficiently parallelize retrieving data from multiple peers without getting too many duplicate blocks. If we send our entire wantlist to a single peer, we'd have to cancel part of that wantlist before sending it to a new, faster peer. We can do this, but we don't have any way to know when the remote peer received the cancel and what blocks may currently be in transit.
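The per-peer cap might be sketched as follows (a hypothetical helper, not go-bitswap's code): cap what we send and hold the rest back, so the remote's in-memory wantlist stays small and little needs cancelling if a faster peer shows up.

```go
package main

import "fmt"

// sendWants caps how much of our wantlist goes to a single peer,
// matching the 32-entry limit described above. Held-back entries can
// go to other peers or follow later once responses come in.
func sendWants(wantlist []string, limit int) (sent, held []string) {
	if len(wantlist) <= limit {
		return wantlist, nil
	}
	return wantlist[:limit], wantlist[limit:]
}

func main() {
	wl := make([]string, 100)
	for i := range wl {
		wl[i] = fmt.Sprintf("Qm%03d", i)
	}
	sent, held := sendWants(wl, 32)
	fmt.Println(len(sent), len(held)) // 32 68
}
```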

We're considering adding a protocol extension (insert link here when we finish the writeup) to better determine the correct wantlist size per peer.

Worker Limitations

Bitswap currently sends out 6 blocks in parallel using 6 workers to avoid consuming too much memory.

We should probably bump this up, possibly scaling with available memory (ipfs/boxo#116).
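One possible shape for that scaling, purely as a hypothetical heuristic (not go-bitswap's actual code), keeps the current value of 6 as a floor and grows the pool with available memory:

```go
package main

import "fmt"

// taskWorkers is a hypothetical heuristic for scaling the
// block-sending worker count with available memory. The per-worker
// budget and the cap are illustrative numbers, not measured values.
func taskWorkers(availMemMiB, perWorkerMiB int) int {
	w := availMemMiB / perWorkerMiB
	if w < 6 {
		return 6 // never go below today's fixed worker count
	}
	if w > 128 {
		return 128 // avoid unbounded goroutine/memory growth
	}
	return w
}

func main() {
	fmt.Println(taskWorkers(64, 16))   // low memory: floor of 6
	fmt.Println(taskWorkers(1024, 16)) // 1GiB available: 64 workers
}
```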

Libp2p

Another issue is overhead in libp2p itself.

Small Packets

At the moment, go-libp2p doesn't do a good job of combining small writes into larger TCP/IP packets. This can significantly limit the available network throughput.

We have fixes for this issue but had to temporarily revert them as they exposed a bug in secio (libp2p/go-libp2p#644) that would have led to network interoperability issues. Once the network has upgraded, we'll turn this feature back on.

STATUS: We've tried to revive these fixes but ran into other performance issues in go's scheduler.

Slow Stream Multiplexers

The default go-libp2p stream multiplexer, yamux, is a bit slow. This comes from the fact that it has a fixed send window per stream so bandwidth is pretty limited on a per-stream basis.

However, due to how bitswap works, I'm doubtful that this is an issue:

  1. The default window size is currently 16MiB.
  2. Each message is at most 2MiB.

This means the fixed window shouldn't matter when sending blocks.

It's also unlikely that this is preventing us from sending wantlist entries fast enough. On connections with 100ms latency, we should be able to request ~1,677,720 blocks per second. At 1KiB per block (the low bound), that's ~1GiB per second (8x a gigabit link).
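The request-rate figure above can be reproduced with a back-of-envelope calculation: a 16MiB send window refilled once per 100ms RTT. The ~100-byte size per wantlist entry is an assumption made here to make the arithmetic work out, not a measured protocol constant.

```go
package main

import "fmt"

// wantRate estimates how many wantlist entries per second fit through
// a fixed yamux send window, assuming the window drains once per RTT.
func wantRate(windowBytes, entryBytes int, rttSeconds float64) float64 {
	return float64(windowBytes/entryBytes) / rttSeconds
}

func main() {
	r := wantRate(16<<20, 100, 0.1) // 16MiB window, ~100B entries, 100ms RTT
	fmt.Printf("~%.0f requests/s, ~%.0f MiB/s at 1KiB blocks\n",
		r, r*1024/(1<<20))
}
```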

UPDATE 1: Yamux is an issue, at least on a 10gbps link.

UPDATE 2:

  1. We have fixed some issues in how windows are managed that should significantly improve the issue.
  2. We have fixed some bugs that could cause yamux connections to die when under heavy load.
Stebalien added the kind/bug and topic/meta labels Aug 21, 2019
aschmahmann (Contributor) commented:

> as they exposed a bug in secio (#6599)

@Stebalien the link in there is for this issue, not the correct issue.

> All
> Due to the way bitswap works, I'm doubtful that this is slowing bitswap down at all. Specifically,

Looks like Github cut you off there 😦

Stebalien (Member Author) commented:

(thanks)


bam80 commented Feb 27, 2021

Just curious what is the situation for now?

Stebalien (Member Author) commented:

We have:

  • Improved the stream multiplexers.
  • Updated the bitswap protocol to allow asking peers whether they have a block. This means:
    • We should generally get blocks in one round-trip.
    • We should get very little duplicate data (we usually ask for blocks from exactly one peer).
    • Worst case, we should get blocks in 2 round-trips (as long as we're connected to a peer with the block).

However, there's still a lot of room for improvement.

Stebalien (Member Author) commented:

I've updated the issue.
