-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: migrate from file list #501
Conversation
8de6644
to
4671016
Compare
4671016
to
9241745
Compare
9241745
to
e1bd60b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good addition, some UX changes proposed but nothing significant
one/src/migrations.rs
Outdated
@@ -53,6 +55,23 @@ pub struct FromIpfsOpts { | |||
|
|||
#[command(flatten)] | |||
log_opts: LogOpts, | |||
|
|||
/// Path of list of files to migrate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you document the contents of this file? Is it newline delimited paths? Are the paths relative to the input dir? Or absolute?
one/src/migrations.rs
Outdated
let file = tokio::fs::File::open(input_file_list_path).await?; | ||
let mut lines = tokio::io::BufReader::new(file).lines(); | ||
let mut i = 0; | ||
while let Some(line) = lines.next_line().await? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tokio::io::Lines did not have the StreamExt functions. tokio_stream::wrappers::LinesStream might let us do it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tokio::io::Lines did not have the StreamExt functions. tokio_stream::wrappers::LinesStream might let us do it that way.
Oh wait, lol, that's what I ended up using.
one/src/migrations.rs
Outdated
let mut dirs = Vec::new(); | ||
dirs.push(self.input_ipfs_path.clone()); | ||
|
||
let offset = self.file_offset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should support limit/offset for the read dir case. From the docs:
The order in which this iterator returns entries is platform and filesystem dependent.
We cannot guarantee that limit/offset will be meaningful. And in practice the order in which blocks where traversed appeared to be random. Meaning it was not in lexicographical order on my local filesystem nor in various production filesystems we used.
I propose if limit/offset are not None and there is no input file list then we error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will likely be nearly stable on any one disk as the tree in the filesystem try's to not change much. If the user wants a guaranty that they cover the entire ipfs repo then they should use make a file list file.
I wander if we should make a CLI argument for making a sorted file list file and a CLI for subtracting one from another
or maybe in the make file
# List files in directory at migration_1 and sort them alphabetically
ls -l ipfs/blocks | sort > migration_1.txt
CERAMIC_ONE_INPUT_FILE_LIST_PATH=migration_1.txt \
ceramic-one migrations from-ipfs
# List files in directory at migration_2 and sort them alphabetically
ls -l ipfs/blocks | sort > migration_2.txt
# Find files in migration_2 that are not in migration_1
comm -23 migration_2.txt migration_1.txt > delta
CERAMIC_ONE_INPUT_FILE_LIST_PATH=delta.txt \
ceramic-one migrations from-ipfs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the feature? I'd rather not add an unused feature that has a potential to break in mysterious ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took it out for now and added the check.
e1bd60b
to
2a776a8
Compare
2a776a8
to
a488ecc
Compare
ffc074f
to
e677112
Compare
This PR adds support for migrating IPFS blocks from an input file list. It also introduces both an
offset
and alimit
that can be used to paginate migrations, whether from a directory or from a file list.This is needed for being able to apply incremental ZFS snapshots without having to re-migrate the entire blockstore. The list of added blocks can be extracted from the incremental snapshot and input to the migration so that only new blocks are migrated. This will dramatically reduce the amount of time needed for migrating new blocks added after the first snapshot was taken.
Note that paginating over a directory might have unpredictable results if it is receiving live updates.