Concurrent --cache-dir mount failing? #73
Comments
Thank you for the detailed report! This part is the most concerning:
VirtioFS works via the built-in VirtioFS server/client in macOS. The host runs a server instance and the guests connect to it, so an OS update on either the host or the guest can affect this behaviour. Which macOS version does your guest use? How frequently does this behaviour happen? I tried to reproduce it with two VMs running 14.4.1 and the host on 14.4.1, but with no luck — mounting worked. PS: I didn't even think that mounting a single folder into two VMs would work 😅
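For illustration, the guest side is just the stock macOS VirtioFS client: a directory share either shows up automatically under "/Volumes/My Shared Files" or can be attached manually with mount_virtiofs. A rough example (the tag and mount point here are only placeholders):

```shell
# Inside a macOS guest: attach a VirtioFS share by its tag.
# "example-tag" has to match the tag the host exposes for that directory.
mkdir -p ~/shared
mount_virtiofs example-tag ~/shared

# Automounted shares appear here without any manual step:
ls "/Volumes/My Shared Files"
```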
The guests used 14.3.1, but I am not sure if it is relevant, as on my Mac with the same setup (same OS, same tooling versions) and exactly the same guest images it works just fine. I just updated our guests to 14.4.1, so I will try with that.
On the affected machine it happened almost 4 out of 5 times, as our pipelines start with 2 or 3 concurrent jobs.
I have not configured it this way intentionally, it just somehow ended up like that and it worked for months 😂 It would actually make sense if it did not, but that would be a shame, as I think that sharing it among VMs actually makes good sense. I am wondering whether the executor couldn't rsync the cachedir to a unique location (I guess in the prepare stage) and rsync it back in the cleanup stage (rough sketch below), but that would of course mean introducing some locks and sync logic, which is not that simple 🤔

EDIT: What seems weird is that we mount the shared directory for Mint using …
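Roughly what I have in mind with the rsync idea — a sketch only; the paths, the lock mechanism and the job-ID variable are illustrative, not something the executor does today:

```shell
# Idea only: give every job its own copy of the shared cache.
# CUSTOM_ENV_CI_JOB_ID is just an example of a per-job value.

SHARED_CACHE="$HOME/Library/Caches/GitlabCI/cache"
JOB_CACHE="$HOME/Library/Caches/GitlabCI/job-$CUSTOM_ENV_CI_JOB_ID"

# prepare stage (host): seed the per-job cache from the shared one
mkdir -p "$JOB_CACHE"
rsync -a "$SHARED_CACHE/" "$JOB_CACHE/"

# ... the VM then mounts $JOB_CACHE instead of $SHARED_CACHE ...

# cleanup stage (host): merge results back under a crude mkdir-based lock,
# so two jobs finishing at once don't write into the shared copy together
while ! mkdir "$SHARED_CACHE.lock" 2>/dev/null; do sleep 1; done
rsync -a "$JOB_CACHE/" "$SHARED_CACHE/"
rmdir "$SHARED_CACHE.lock"
rm -rf "$JOB_CACHE"
```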
You might also consider using an S3 cache as an alternative. Some of the GitLab Tart Executor users use this approach too.
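A minimal sketch of what that could look like in the runner's config.toml (the endpoint, bucket and credentials below are placeholders):

```toml
[runners.cache]
  Type = "s3"
  Shared = true
  [runners.cache.s3]
    ServerAddress = "s3.amazonaws.com"   # or any S3-compatible endpoint, e.g. MinIO
    BucketName = "gitlab-runner-cache"
    BucketLocation = "eu-central-1"
    AccessKey = "<access-key>"
    SecretKey = "<secret-key>"
```

The jobs then use regular cache: entries in .gitlab-ci.yml and no longer need a shared VirtioFS directory for caching.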
Hi @edigaryev @fkorotkov, okay, I just encountered the same issue. If it is not meant to be shared among concurrent executions, I think we can close this issue. But maybe it would be worth documenting? Also for …
Okay, now it becomes interesting... I am getting the Resource busy error even without a shared cache dir 😬

```
Running with gitlab-runner 16.10.0 (81ab07f6)
on <...>, system ID: <...>
Preparing the "custom" executor
Using Custom executor...
2024/04/23 00:36:24 Pulling the latest version of ghcr.io/AckeeCZ/ackee-xcode:latest...
2024/04/23 00:36:25 Cloning and configuring a new VM...
2024/04/23 00:36:25 Waiting for the VM to boot and be SSH-able...
2024/04/23 00:36:33 Was able to SSH!
2024/04/23 00:36:33 Mounting cachedir...
mount_virtiofs: failed to mount /Users/admin/cachedir: Resource busy
2024/04/23 00:36:33 Process exited with status 75
ERROR: Job failed: exit status 1
```

Our runner config:

```toml
concurrent = 2
check_interval = 0
shutdown_timeout = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "<...>"
url = "<url>"
id = 834
token = "<token>"
token_obtained_at = <date>
token_expires_at = <date>
executor = "custom"
environment = ["MINT_PATH=/Volumes/My Shared Files/mint", "MINT_LINK_PATH=/Volumes/My Shared Files/mint/bin", "TUIST_CACHE_PATH=/Volumes/My Shared Files/TuistCache"]
[runners.cache]
MaxUploadedArchiveSize = 0
[runners.feature_flags]
FF_RESOLVE_FULL_TLS_CHAIN = false
[runners.custom]
config_exec = "gitlab-tart-executor"
config_args = ["config", "--cache-dir", "/Users/user/Library/Caches/GitlabCI/cache/$CUSTOM_ENV_CI_CONCURRENT_PROJECT_ID"]
prepare_exec = "gitlab-tart-executor"
prepare_args = ["prepare", "--concurrency", "2", "--cpu", "auto", "--memory", "auto", "--dir", "mint:/Users/user/.mint", "--dir", "TuistCache:/Users/user/.tuist/Cache"]
run_exec = "gitlab-tart-executor"
run_args = ["run"]
cleanup_exec = "gitlab-tart-executor"
cleanup_args = ["cleanup"]
```

Any ideas before I reinstall our runner machine?
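In case it is useful, these are the kinds of checks I can run while jobs are failing (generic commands only, nothing executor-specific):

```shell
# On the host: see which cloned VMs are currently running
tart list

# Inside a guest (over SSH): check what is already mounted before the
# executor attempts its own mount_virtiofs call
mount | grep -i virtiofs
ls "/Volumes/My Shared Files"
```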
But you have the … As for the …
You mean in the VM?
Yes. Does it work without the …?
Well, not really.

```
~ % brew list --versions | grep tart
gitlab-tart-executor 1.13.0
tart 2.9.0
```

At first I tried without …

```toml
concurrent = 2
check_interval = 0
shutdown_timeout = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "<...>"
limit = 2
url = "<...>"
id = 840
token = "<...>"
token_obtained_at = <...>
token_expires_at = <...>
executor = "custom"
environment = ["MINT_PATH=/Users/admin/cachedir/mint", "MINT_LINK_PATH=/Users/admin/cachedir/mint/bin", "TUIST_CACHE_PATH=/Users/admin/cachedir/TuistCache"]
[runners.cache]
MaxUploadedArchiveSize = 0
[runners.feature_flags]
FF_RESOLVE_FULL_TLS_CHAIN = true
[runners.custom]
config_exec = "gitlab-tart-executor"
config_args = ["config", "--cache-dir", "/Users/<user>/Gitlab/cache/$CUSTOM_ENV_CI_CONCURRENT_PROJECT_ID"]
prepare_exec = "gitlab-tart-executor"
prepare_args = ["prepare", "--memory", "16384"]
run_exec = "gitlab-tart-executor"
run_args = ["run"]
cleanup_exec = "gitlab-tart-executor"
cleanup_args = ["cleanup"]
```

Then I tried completely without our caching stuff and the jobs seem to run, but that adds roughly an extra half hour to our pipeline.

```toml
concurrent = 2
check_interval = 0
shutdown_timeout = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "<...>"
limit = 2
url = "<...>"
id = 840
token = "<...>"
token_obtained_at = <...>
token_expires_at = <...>
executor = "custom"
# environment = ["MINT_PATH=/Users/admin/cachedir/mint", "MINT_LINK_PATH=/Users/admin/cachedir/mint/bin", "TUIST_CACHE_PATH=/Users/admin/cachedir/TuistCache"]
[runners.cache]
MaxUploadedArchiveSize = 0
[runners.feature_flags]
FF_RESOLVE_FULL_TLS_CHAIN = true
[runners.custom]
config_exec = "gitlab-tart-executor"
config_args = ["config", "--cache-dir", "/Users/<user>/Gitlab/cache/$CUSTOM_ENV_CI_CONCURRENT_PROJECT_ID"]
prepare_exec = "gitlab-tart-executor"
prepare_args = ["prepare", "--memory", "16384"]
run_exec = "gitlab-tart-executor"
run_args = ["run"]
cleanup_exec = "gitlab-tart-executor"
cleanup_args = ["cleanup"]
```
How are these commented-out environment variables used? Perhaps you could try minimizing them down to a single variable, to find out which one of them is causing this error?
Well, those variables should not have any actual effect unless Mint or Tuist are called. And the pipeline never got to executing the job, as "VM is ready." was never printed. I'm not even sure the environment is a real lead; it might have been just a coincidence that it worked with it commented out... and it is hard to keep the executor running concurrently like this, since that config causes a big performance loss.
Okay, new update: that environment doesn't matter, I just got the same Resource busy error even with it commented out.
I looked deeper into the code and I am now able to reproduce the issue locally, without using GitLab CI. Just a note: I use separate terminals so I can run blocking calls in parallel without backgrounding them.

```
tart clone ghcr.io/ackeecz/ackee-xcode:latest test1
tart clone ghcr.io/ackeecz/ackee-xcode:latest test2
```

Ran both of them with the cachedir VirtioFS tag that Tart uses:
SSH into the first one:
And tried to mount cachedir:
Can you think of anything I can do in this case to help debug/prevent this error?
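For anyone who wants to follow along, the whole sequence roughly looks like this (the host path and the mount tag are placeholders; I'm not spelling out the exact tag the executor passes):

```shell
# Terminal 1: first VM, sharing the host cache directory
tart clone ghcr.io/ackeecz/ackee-xcode:latest test1
tart run test1 --dir=cachedir:$HOME/cachedir

# Terminal 2: second VM, sharing the very same directory
tart clone ghcr.io/ackeecz/ackee-xcode:latest test2
tart run test2 --dir=cachedir:$HOME/cachedir

# Terminal 3: mounting the share in the first guest works
TAG="<virtiofs-tag>"   # placeholder for the tag the share is exposed with
ssh admin@"$(tart ip test1)" "mkdir -p ~/cachedir && mount_virtiofs $TAG ~/cachedir"

# Terminal 4: the same mount in the second guest is where
# "mount_virtiofs: failed to mount ...: Resource busy" shows up
ssh admin@"$(tart ip test2)" "mkdir -p ~/cachedir && mount_virtiofs $TAG ~/cachedir"
```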
Sorry for the spam, but I am adding info and ideas as I get them 😃 I managed to mount the cachedir into two running VMs when they were not using the same tag. What do you think about adding a job ID to the VirtioFS tags that the executor uses? I don't think this is worth it if we are the only users who encounter this issue (but we currently have 2 devices experiencing it); otherwise I think it would be. I currently have a spare Mac that I can clean-install (system, Tart, Gitlab Tart Executor) and check whether the issue is present there.
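To make that concrete, this is the shape of what worked for me on the guest side (the tag names below are made up; the only point is that each VM gets a different one, and the host of course has to expose the share under the matching tag):

```shell
# Guest of VM 1
mkdir -p ~/cachedir && mount_virtiofs cachedir-job-101 ~/cachedir

# Guest of VM 2: a different tag for the same host directory mounts fine
mkdir -p ~/cachedir && mount_virtiofs cachedir-job-102 ~/cachedir
```

So something like `<base tag>-$CUSTOM_ENV_CI_JOB_ID` generated by the executor would keep the tags unique across concurrent jobs.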
I've tried reproducing this on … The …

I'm not yet sure if this is related, because these tags are unlikely to be system-wide; otherwise, with the default …
Well, tried the …
Looks like I managed to work around it by downgrading Tart to 2.5.0 and the Gitlab Executor to 1.10.0-6f152b4, at least on my Mac 🤔 No particular reason why I chose those versions. But such workarounds are pretty difficult, as Homebrew is the only installation option besides manual installation.
Well, I did the same downgrade (Tart 2.5.0, Gitlab Executor 1.10.0, and also Softnet 0.8.2, but we don't use it so I don't think it matters) on our main CI Mac Mini and haven't experienced the issue since, while using the same host system and the same guest images 🤔
It's probably the ability to customize the VirtioFS tags that was introduced in … Could you check if the latest Tart version works with the GitLab Executor downgraded to …?
Yes, I will try this week ;-)
I just did the update, will leave it running at least for this week (maybe even the next) and then report back.
Nice Monday, guys 🙂 So far I haven't seen the issue, so I think I can say that it is not present with the mentioned versions. Do you suggest any approach other than continuously updating Gitlab Tart Executor to find which version introduced it for us?
With this setup I just started to see some other weird failures...
This weirdness might be related to cirruslabs/tart#567. As for the initial issue, I don't think we changed anything on our side to make it work again. Overall it seems like instabilities in the VirtioFS integration that we don't have control over. Please consider using an S3 cache.
@fkorotkov Well okay, let's go with VirtioFS instabilities. Still, it might be worth preventing: what do you think about adding the job ID to the VirtioFS tag, as mentioned earlier, probably with some kind of opt-in (an argument in the config stage, an environment value, ...)? Are you open to such a PR?
As reported in [this comment](#73 (comment)) it seems the host VirtioFS server is getting confused when the same tag is used in two VMs simultaneously. Let's make them unique.
Oh! I think I missed that point about unique tags. Given that VirtioFS has a single daemon on the host side, that might be the reason. IMO we can just make the tags unique; it should not break current behaviour. Implementing it in #76.
* Unique VirtioFS Tags

  As reported in [this comment](#73 (comment)) it seems the host VirtioFS server is getting confused when the same tag is used in two VMs simultaneously. Let's make them unique.

* fixed lint
👋 guys,
I wanted to consult on one thing. For a few days now we have been experiencing some weird issues on our CI; mainly it seems to have something to do with mounting the cache dir when concurrent jobs get to mount it at about the same time.
I think the only change that occurred on our host machine could be the OS update from 14.3 to 14.4.1; the guest images are the same. I updated Gitlab Runner, Gitlab Tart Executor and Tart as an attempt to fix it, but without any success.
Our Tart tooling should currently be up to date:

```
~ % brew list --versions
gitlab-runner 16.10.0
gitlab-tart-executor 1.13.0
softnet 0.10.1
tart 2.8.1
```
Our runner config:

The executor creates different VMs correctly:

In our environment there is nothing Tart-related.

The errors we are getting look like this:

I thought that maybe the SSH config might be relevant, but there is nothing interesting there:
Do you have any idea what the cause could be? We are not struggling with disk space; the host machine has over 350 GB of disk space left, so that should not be an issue. I don't think that using the (CUSTOM_ENV_)CI_CONCURRENT_ID variable in the cachedir path is a solution, as I do want concurrent jobs to share the cache.

Thanks in advance :-)
EDIT: Maybe it is good to know that when I was running CI on my personal MacBook with the same setup (just with more boilerplate stuff in the system, as it is not used only for CI, unlike our Mac Mini that has this issue), everything ran just fine. When I disable concurrent jobs, everything seems to run fine as well. A machine restart did not help.