gcp: cloudrun: data dir flag fails on existing directory #2869

Open
bschaatsbergen opened this issue Dec 23, 2022 · 24 comments
Labels
bug (Something isn't working), Stale

Comments

@bschaatsbergen
Member

bschaatsbergen commented Dec 23, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

I'm trying to set the environment variable ATLANTIS_DATA_DIR to /mnt/gcs.
When atlantis server is invoked, a cache dir and a bin dir are supposed to be created, but this fails:

Error: initializing server: unable to create dir "/mnt/gcs/bin": mkdir /mnt/gcs: file exists

Reproduction Steps

  • Create a dir: /mnt/gcs
  • Set an environment variable: ATLANTIS_DATA_DIR to /mnt/gcs
  • Initiate atlantis server

Logs

Error: initializing server: unable to create dir "/mnt/gcs/bin": mkdir /mnt/gcs: file exists

Environment details

Deploying to Cloud Run, using gcsfuse to mount a Cloud Storage Bucket.

Additional Context

Before docker-entrypoint.sh runs, I run the following:

# Create mount directory for service
mkdir -p $MNT_DIR

echo "Mounting GCS Fuse."
gcsfuse --debug_gcs --debug_fuse $BUCKET $MNT_DIR 
echo "Mounting completed."

It seems that Atlantis trips over the fact that the data dir already exists.

@bschaatsbergen bschaatsbergen added the bug Something isn't working label Dec 23, 2022
@nitrocode
Member

Does it fail with any directory? Have you tried changing the directory?

How are you deploying Atlantis?

@bschaatsbergen
Member Author

bschaatsbergen commented Dec 24, 2022

Hi @nitrocode, I haven't tried anything outside of /mnt/gcs yet. I'm deploying it on Cloud Run and using gcsfuse to mount a Cloud Storage Bucket.

Atlantis seems to work fine on Cloud Run but I'm trying to set the data dir to the gcsfuse mount directory for persistent storage.

@nitrocode
Member

Interesting deployment! Perhaps the atlantis user in the container does not have access to the directory that it's trying to create a subdirectory in?

@bschaatsbergen
Member Author

bschaatsbergen commented Dec 24, 2022

Solid point! I'll see if I can mount another directory (trying $HOME/.atlantis now) and I'll dive into the user permissions. Nothing was mentioned in the GCS Fuse documentation regarding the user permissions though.

@bschaatsbergen
Member Author

bschaatsbergen commented Dec 24, 2022

@nitrocode, I tried mounting in a directory that I have access to, using a path similar to the one atlantis sets by default but under an /app directory, and set ATLANTIS_DATA_DIR to: /app/home/atlantis/.atlantis

image

@bschaatsbergen
Member Author

bschaatsbergen commented Dec 24, 2022

The same happens with the default Atlantis path, without setting ATLANTIS_DATA_DIR.

Here I tried to pre-create the directory /home/atlantis/.atlantis and mount gcsfuse to it.

It seems to trip over the fact that the directory already exists.

image

Note: the reason that the directory already exists is because I run this:

# Create mount directory for service
mkdir -p $MNT_DIR

echo "Mounting GCS Fuse."
gcsfuse --debug_gcs --debug_fuse $BUCKET $MNT_DIR 
echo "Mounting completed."

This runs before docker-entrypoint.sh (which starts atlantis server).

@bschaatsbergen
Member Author

bschaatsbergen commented Dec 24, 2022

Interestingly enough:

Relevant Go doc for MkdirAll:

MkdirAll creates a directory named path, along with any necessary parents, and returns nil, or else returns an error.

...

If path is already a directory, MkdirAll does nothing and returns nil.

@nitrocode changed the title from "Atlantis data dir flag fails on existing directory" to "gcp: cloudrun: data dir flag fails on existing directory" Dec 24, 2022
@nitrocode
Member

Seems like you're very close with the network storage.

Here are some related links


cc @ademariag @gaahrdner

@bschaatsbergen
Member Author

Closing this as I managed to get around the exact issue reported here. I'll be continuing my journey :)

@nitrocode
Member

@bschaatsbergen please post your journey in case others hit the same issue. How did you resolve it?

@kamilkrampa

@nitrocode I think I've reached the same point as @bschaatsbergen, and now atlantis fails during git cloning. It seems to me that gcsfuse might be really painful to use (I'm not saying we can't make it work). I wonder whether it would be possible to keep pending plans in an external store as well (perhaps Redis?). An additional question: is there anything else that needs to be done to make it truly stateless?

@bschaatsbergen
Member Author

bschaatsbergen commented Dec 28, 2022

Same here @kamilkrampa,

I'm having a hard time understanding why the operation isn't permitted, though.

running git clone --branch f/gcsfuse-cloudrun --depth=1 --single-branch https://xxxxxxxx/:<redacted>@github.com/xxxxxxxx/xxxxxxxx.git /app/atlantis/repos/xxxxxxxx/xxxxxxxx/29/default: Cloning into '/app/atlantis/repos/xxxxxxxx/xxxxxxxx/29/default'...
error: chmod on /app/atlantis/repos/xxxxxxxx/xxxxxxxx/29/default/.git/config.lock failed: Operation not permitted
fatal: could not set 'core.filemode' to 'false'
: exit status 128

@nitrocode
Member

nitrocode commented Dec 28, 2022

@kamilkrampa @bschaatsbergen making atlantis stateless is probably the correct way to go. I do not think there is a way to store the plan in an external storage (database or s3 bucket or similar) but that would be a great feature request.

If the clone isn't working, I'm unsure how that can be done in a stateless way unless we used a server + agent model.

Related issue: https://stackoverflow.com/questions/74913423/error-chmod-on-config-lock-failed-operation-not-permitted

@kamilkrampa

@nitrocode Just to clarify: is there no way to store the plan in external storage because it's not currently implemented in Atlantis, or because you think it's not possible at all?

@nitrocode
Member

Anything is eventually possible but it may require golang changes to the atlantis server.

Maybe you could mount an external file system (like gcsfuse, s3 bucket), create a custom workflow to override the plan to save the planfile to the external system, then apply the planfile.

Or you could customize the container to incorporate some cli command to save the plan somewhere, then download the plan and apply it.

@bschaatsbergen
Member Author

If we can make atlantis completely stateless by storing plans in a remote storage solution, e.g. GCS Bucket or S3 Bucket, I would be happy to open a PR for this (which might take a while though).

What are your thoughts @nitrocode ?

@nitrocode
Member

That may require a lot of work, especially because:

  • terraform is downloaded to the container
  • git clone is run
  • terraform init is run, which downloads modules
  • the plan file is written to disk

There may be other instances where a persistent volume is needed. Making atlantis stateless would need to be done in pieces and probably be gated behind a flag so it doesn't disrupt existing functionality.

@bschaatsbergen
Member Author

Right, I think @kamilkrampa and I need to just figure out why gcsfuse is such a pain. And perhaps look into NFS (Google Filestore).

Anyhow, thanks for the thoughts nitro

@jamengual
Contributor

jamengual commented Dec 29, 2022 via email

@ademariag

Any chance for someone to share some code/examples/attempts done on this? I want to try the filestore way @nitrocode @bschaatsbergen

@bschaatsbergen
Member Author

bschaatsbergen commented May 11, 2023

Currently working on Cloud Run based deployments using rclone (I had stopped trying for some time). I'll update once I get a bit closer.

@nitrocode nitrocode reopened this May 11, 2023
@dosubot dosubot bot added the Stale label Oct 2, 2024
@m0ps

m0ps commented Nov 1, 2024

#879 (comment)

@nitrocode
Member

nitrocode commented Nov 5, 2024

It should be fairly simple to add a new /health route or even override the /healthz check endpoint using an argument

s.Router.HandleFunc("/healthz", s.Healthz).Methods("GET")

@m0ps

m0ps commented Nov 5, 2024

I think the ideal approach would be to introduce an env var that allows overriding the health endpoint (with a fallback to the default /healthz).

6 participants