Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: migrate from file list #501

Merged
merged 4 commits into from
Oct 9, 2024
Merged

feat: migrate from file list #501

merged 4 commits into from
Oct 9, 2024

Conversation

smrz2001
Copy link
Collaborator

This PR adds support for migrating IPFS blocks from an input file list. It also introduces both an offset and a limit that can be used to paginate migrations, whether from a directory or from a file list.

This is needed for being able to apply incremental ZFS snapshots without having to re-migrate the entire blockstore. The list of added blocks can be extracted from the incremental snapshot and input to the migration so that only new blocks are migrated. This will dramatically reduce the amount of time needed for migrating new blocks added after the first snapshot was taken.

Note that paginating over a directory might have unpredictable results if it is receiving live updates.

@smrz2001 smrz2001 requested a review from nathanielc August 22, 2024 20:07
@smrz2001 smrz2001 requested review from dav1do, a team and stbrody as code owners August 22, 2024 20:07
@smrz2001 smrz2001 requested review from stephhuynh18 and removed request for a team August 22, 2024 20:07
@smrz2001 smrz2001 temporarily deployed to github-tests-2024 August 22, 2024 20:17 — with GitHub Actions Inactive
@smrz2001 smrz2001 force-pushed the feat/migrate-from-list branch from 8de6644 to 4671016 Compare August 22, 2024 21:04
@smrz2001 smrz2001 temporarily deployed to github-tests-2024 August 22, 2024 21:14 — with GitHub Actions Inactive
@smrz2001 smrz2001 force-pushed the feat/migrate-from-list branch from 4671016 to 9241745 Compare August 22, 2024 21:53
@smrz2001 smrz2001 temporarily deployed to github-tests-2024 August 22, 2024 22:03 — with GitHub Actions Inactive
@smrz2001 smrz2001 force-pushed the feat/migrate-from-list branch from 9241745 to e1bd60b Compare August 22, 2024 23:52
@smrz2001 smrz2001 temporarily deployed to github-tests-2024 August 23, 2024 00:02 — with GitHub Actions Inactive
Copy link
Collaborator

@nathanielc nathanielc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good addition, some UX changes proposed but nothing significant

@@ -53,6 +55,23 @@ pub struct FromIpfsOpts {

#[command(flatten)]
log_opts: LogOpts,

/// Path of list of files to migrate
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document the contents of this file? Is it newline delimited paths? Are the paths relative to the input dir? Or absolute?

let file = tokio::fs::File::open(input_file_list_path).await?;
let mut lines = tokio::io::BufReader::new(file).lines();
let mut i = 0;
while let Some(line) = lines.next_line().await? {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are skip and take methods on streams using StreamExt. Which should simplify this logic to let lines = lines.skip(offset).take(limit); and the simply have the while loop without any bounds checks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tokio::io::Lines did not have the StreamExt functions. tokio_stream::wrappers::LinesStream might let us do it that way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured it out.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tokio::io::Lines did not have the StreamExt functions. tokio_stream::wrappers::LinesStream might let us do it that way.

Oh wait, lol, that's what I ended up using.

let mut dirs = Vec::new();
dirs.push(self.input_ipfs_path.clone());

let offset = self.file_offset;
Copy link
Collaborator

@nathanielc nathanielc Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should support limit/offset for the read dir case. From the docs:

The order in which this iterator returns entries is platform and filesystem dependent.

We cannot guarantee that limit/offset will be meaningful. And in practice the order in which blocks where traversed appeared to be random. Meaning it was not in lexicographical order on my local filesystem nor in various production filesystems we used.

I propose if limit/offset are not None and there is no input file list then we error.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will likely be nearly stable on any one disk as the tree in the filesystem try's to not change much. If the user wants a guaranty that they cover the entire ipfs repo then they should use make a file list file.
I wander if we should make a CLI argument for making a sorted file list file and a CLI for subtracting one from another

or maybe in the make file

# List files in directory at migration_1 and sort them alphabetically
ls -l ipfs/blocks | sort > migration_1.txt
 
CERAMIC_ONE_INPUT_FILE_LIST_PATH=migration_1.txt \
ceramic-one migrations from-ipfs 

# List files in directory at migration_2 and sort them alphabetically
ls -l ipfs/blocks | sort > migration_2.txt

# Find files in migration_2 that are not in migration_1
comm -23  migration_2.txt migration_1.txt > delta

CERAMIC_ONE_INPUT_FILE_LIST_PATH=delta.txt \
ceramic-one migrations from-ipfs 

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the feature? I'd rather not add an unused feature that has a potential to break in mysterious ways.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took it out for now and added the check.

@smrz2001 smrz2001 force-pushed the feat/migrate-from-list branch from e1bd60b to 2a776a8 Compare August 27, 2024 22:15
@smrz2001 smrz2001 temporarily deployed to github-tests-2024 August 27, 2024 22:25 — with GitHub Actions Inactive
@smrz2001 smrz2001 force-pushed the feat/migrate-from-list branch from 2a776a8 to a488ecc Compare September 21, 2024 03:06
@smrz2001 smrz2001 requested a review from nathanielc September 21, 2024 03:06
@smrz2001 smrz2001 temporarily deployed to github-tests-2024 September 21, 2024 03:24 — with GitHub Actions Inactive
@smrz2001 smrz2001 temporarily deployed to github-tests-2024 September 21, 2024 12:28 — with GitHub Actions Inactive
@smrz2001 smrz2001 force-pushed the feat/migrate-from-list branch from ffc074f to e677112 Compare October 9, 2024 18:49
@smrz2001 smrz2001 temporarily deployed to github-tests-2024 October 9, 2024 19:12 — with GitHub Actions Inactive
@smrz2001 smrz2001 added this pull request to the merge queue Oct 9, 2024
Merged via the queue into main with commit a92a809 Oct 9, 2024
5 checks passed
@smrz2001 smrz2001 deleted the feat/migrate-from-list branch October 9, 2024 20:23
@smrz2001 smrz2001 mentioned this pull request Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants