mc mirror takes long time when target bucket contains a lot of files #4873

Open
archsh opened this issue Mar 14, 2024 · 9 comments

Comments

@archsh

archsh commented Mar 14, 2024

Expected behavior

When using mc mirror to mirror a local folder to a target bucket, it should exit quickly if the local folder is empty or contains only a few files.

Actual behavior

It takes a long time to start transferring files when the target bucket contains a lot of objects.

Steps to reproduce the behavior

mc mirror --json ./localfolder target/bucket/path

mc --version

mc version RELEASE.2024-01-05T05-04-32Z (commit-id=59eca9fea8984adec1e8e7a1c95d0ea23107ceff)

System information

CentOS release 6.9 (Final) x64

@zveinn
Contributor

zveinn commented Mar 15, 2024

Yes, mc mirror does a full source-to-target comparison when it is run. This is due to the --remove flag. However, it might be possible to speed up the mirror when the --remove flag is not present. I will look into this in the coming weeks, but I can't promise a delivery date since my schedule is rather full.
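
To illustrate the comparison described above, here is a minimal Python sketch. It is not mc's actual implementation: the dict-based model (object key mapped to ETag), the function names, and the `visited` counter are all assumptions made for illustration. It only shows why detecting objects to remove requires walking the whole target, while a copy-only mirror could in principle be driven by the source listing alone.

```python
# Hypothetical model: each side is a dict mapping object key -> ETag.
# This is NOT mc's code; it only illustrates the difference in work done.

def plan_with_remove(source, target):
    """Full comparison: every target key is visited to find extras to delete."""
    to_copy = [k for k, etag in source.items() if target.get(k) != etag]
    to_remove = [k for k in target if k not in source]
    visited = len(source) + len(target)  # both listings are walked in full
    return to_copy, to_remove, visited


def plan_without_remove(source, target):
    """Source-driven comparison: only keys present on the source are looked up."""
    to_copy = [k for k, etag in source.items() if target.get(k) != etag]
    visited = len(source)  # one lookup per source key; the target is never listed
    return to_copy, [], visited


# A small source folder and a target bucket that already holds many objects.
source = {"a.txt": "e1", "b.txt": "e2"}
target = {f"old/{i}": "x" for i in range(100_000)}
target["a.txt"] = "e1"

copy_full, remove_full, visited_full = plan_with_remove(source, target)
copy_fast, remove_fast, visited_fast = plan_without_remove(source, target)
print(visited_full, visited_fast)
```

In this model the work of the source-driven plan grows with the size of the source only. Whether per-key lookups are actually cheaper than one full listing depends on the backend and the object counts involved.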

zveinn self-assigned this Mar 15, 2024
@PrisedRabbit

PrisedRabbit commented May 24, 2024

Same here. It takes a lot of time to mirror a folder with 20,000+ files (using the AWS CLI solved the problem).

@zveinn
Contributor

zveinn commented Oct 4, 2024

I will be looking into this next week. If there is any additional information you want to share, now is the time :)

@karsapeng

karsapeng commented Oct 16, 2024

Hello, I wrote a script that uses “mc mirror --remove --overwrite --limit-download $LIMIT_DOWNLOAD --limit-upload $LIMIT_UPLOAD minio-master minio-slave” to synchronize the data of the master node to the slave node. If the master node is shut down and restarted due to a fault, I change the direction of the synchronization and transfer the data to the slave node. The master then runs “mc mirror --remove --overwrite --limit-download $LIMIT_DOWNLOAD --limit-upload $LIMIT_UPLOAD minio-slave minio-master” for synchronization. However, even if the data on the two MinIO nodes is consistent, minio-master performs a full synchronization, which takes up a lot of I/O and causes the server to crash. My mc version is mc:RELEASE.2024-08-26T10-49-58Z.

@karsapeng

Hello, has this problem been solved?

@zveinn
Contributor

zveinn commented Oct 22, 2024

Hey, I have not had time to look into this yet. Just as I was about to start, something else came up. I will see if someone else can take it off my hands.

@karsapeng

This comment was marked as duplicate.

@klauspost
Contributor

There is no reason to add more clutter to the issue. This is a low priority item, and will be done when there is bandwidth for it.

zveinn removed their assignment Nov 5, 2024
@fherenius

One thing to add: I’m using mc mirror to move a cluster. The target cluster already has several million objects in it.

mc mirror will do a full synchronization of the target, even for buckets that don’t exist on the source (visible when running with --debug). It would be a lot faster if it filtered buckets by name first, instead of by content.

Or maybe --exclude-bucket should also apply to the target synchronization.
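
As a rough sketch of the name-first filtering suggested above (an assumption about how it could work, not how mc behaves today), the Python below narrows the target buckets before any object listing happens. The function name `buckets_to_scan` and its parameters are hypothetical, and the sketch deliberately ignores what --remove should do with buckets that exist only on the target.

```python
# Hypothetical helper: decide which target buckets are worth listing at all.
# Buckets are compared by name only; no object listing is performed here.

def buckets_to_scan(source_buckets, target_buckets, excluded=()):
    """Return target buckets that also exist on the source and are not excluded."""
    wanted = set(source_buckets) - set(excluded)
    return sorted(b for b in target_buckets if b in wanted)


# The source has two buckets; the target cluster already holds several others.
source_buckets = ["logs", "media"]
target_buckets = ["archive", "backups", "logs", "media"]

scan = buckets_to_scan(source_buckets, target_buckets, excluded=["logs"])
print(scan)
```

With this filter, buckets that exist only on the target ("archive" and "backups" here) would never be listed, which is the saving the comment above is asking for.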
