verdi storage backup #6069
Conversation
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #6069      +/-   ##
==========================================
+ Coverage   77.81%   78.16%    +0.35%
==========================================
  Files         550      558        +8
  Lines       40720    42009     +1289
==========================================
+ Hits        31684    32833     +1149
- Misses       9036     9176      +140
```

☔ View full report in Codecov by Sentry.
Thanks a lot @eimrek. It would indeed be super useful to have an easy way to back up a profile's data storage (and restore it), so I appreciate you looking into this.
If it is necessary to have a standalone bash script, I think it would make the most sense to put it in the …
Not sure. I don't think it is actually being used at all in the current state of this PR, is it? Is your idea that this backup should be able to be run even when the profile is in use? Or should the user make sure nothing is using the profile during the backup process? I would personally definitely favor the latter option, just as with the maintenance operations. Then a few comments about the current design:
My first gut instinct would be to implement something like the following:

```python
import os
import pathlib
import subprocess
from abc import abstractmethod


# Add abstract method to the base storage class
class StorageBackend:

    @abstractmethod
    def backup(self, **kwargs):
        """Create a backup of the storage contents."""


# which is then implemented in subclasses such as the `psql_dos` backend
class PsqlDosBackend(StorageBackend):

    def backup(self, filepath_backup: pathlib.Path):
        """Create a backup of the postgres database and disk-objectstore to the provided path."""
        # Here you can now use the Python API to get access to connection details of the postgres
        # database and location of the disk-object store. Simplest is then to use `subprocess` to
        # call `pg_dump` and `rsync`.
        cfg = self._profile.storage_config
        env = os.environ.copy()
        env['PGPASSWORD'] = cfg['database_password']

        try:
            subprocess.run(
                ['pg_dump', '-h', cfg['database_hostname'], '-f', str(filepath_backup / 'db.psql'), ...],
                check=True,
                env=env,
            )
        except subprocess.CalledProcessError:
            ...  # Handle it

        filepath_container = get_filepath_container(self._profile)

        try:
            subprocess.run(['rsync', '-azh', str(filepath_container), str(filepath_backup)], check=True)
        except subprocess.CalledProcessError:
            ...  # Handle it
```

The advantage of this approach would be: …
Do you see any fundamental problems with such an approach? Is there anything critical in the bash script you wrote that could not be implemented in Python on the storage backend?
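As a usage illustration of the sketch above (the `backup` method itself is only the proposal; `load_profile` and `get_manager` are existing aiida-core entry points):

```python
# Hypothetical usage of the proposed `backup` method; the method is only the
# sketch above, and the profile name and path are placeholders.
import pathlib

from aiida import load_profile
from aiida.manage import get_manager

load_profile('my-profile')  # assumed profile name
storage = get_manager().get_profile_storage()
storage.backup(filepath_backup=pathlib.Path('/path/to/backup'))
```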
Thanks a lot @sphuber for the comments. What you say makes sense. Initially the idea was that a bash script is preferable because this allows users to see the logic and potentially modify it to their use case, but probably this is not needed very often. We discussed this, and implementing this directly in Python, as you suggest, is probably better. I'll try to adapt. Regarding the order of operations:
The idea is that if the user is adding nodes at the same time as the backup is running, this is the safest order in which to back up the DOS: (1) loose files; (2) sqlite db; (3) packed files.
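A minimal sketch of what this ordering could look like, assuming the standard disk-objectstore container layout (`loose/`, the `packs.idx` SQLite file, `packs/`) and rsync being available; this illustrates the reasoning, not the PR's actual code:

```python
import pathlib
import subprocess

def backup_container(container: pathlib.Path, dest: pathlib.Path) -> None:
    """Copy a disk-objectstore container in an order that is safe under concurrent writes."""
    # 1. Loose objects first: objects added during the backup appear here, so an
    #    object that gets packed (and later cleaned) is still captured in this copy.
    subprocess.run(['rsync', '-az', str(container / 'loose'), str(dest)], check=True)
    # 2. The SQLite index next: it may be slightly older than the packs copied below.
    subprocess.run(['rsync', '-az', str(container / 'packs.idx'), str(dest)], check=True)
    # 3. Pack files last: packs are append-only, so a newer pack copy still contains
    #    every object the (older) index copy references.
    subprocess.run(['rsync', '-az', str(container / 'packs'), str(dest)], check=True)
```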
I see. I would personally just opt to say that we don't support using the profile during backup. We do the same for the maintenance operation. Now, I can see the argument that there, at least for the full maintenance, concurrency is technically impossible, while for backing up it is possible in the way you describe, but it does add complexity and potential sources of problems. Would it not be easier to just tell users that the profile should not be used while backing up the storage? I think this is reasonable, or do you see a fundamental problem with this, where people might need/want to back up so often that stopping use of the profile is too disruptive? And if it is necessary to be able to do a live backup, maybe this logic can be implemented in the disk-objectstore itself?
@sphuber thanks for the comments - I agree with the suggestions above. At the beginning the idea was that it was complex to make this general enough, and it was better to generate a script that people could change. But in the end it's probably possible, so we should just have a command in verdi that does everything. This would indeed also allow us to avoid concurrent maintenance operations by just using the existing functionality in AiiDA. I agree with doing it on the backend. However, I think it's essential that people are able to back up while they are using AiiDA, and this works fast for incremental backups. If we ask people to stop their work to back up, they will essentially never do it. It should be something that they can put in a cron job and forget about. We could do something in disk-objectstore, and I see the points, but this would not allow us to easily reuse the logic to lock the profile etc., and anyway we want to back up both the DB and the repository. In any case, in AiiDA it will be specific to the backend. I suggest we go ahead, at least to converge to a version we are happy with. We can always move this to disk-objectstore once we see the final implementation, either before merging this PR, or even later in the future (aiidateam/disk-objectstore#20)
Fair enough. If you think we can guarantee to do this safely, then I'm fine with doing it that way.
I don't understand how this follows. If you implement the functionality to back up a container in the disk-objectstore, either through the Python API and/or its CLI, the AiiDA storage backend can just call this. So from AiiDA's perspective, it will be behind the profile locking functionality. Of course, when users go straight to the disk-objectstore API they could circumvent this, but in the disk-objectstore the concept of a profile doesn't exist anyway, so there is nothing to lock. So if we are writing the logic to create a container backup anyway, then I don't see a reason not to put it directly in the disk-objectstore itself.
I had some time to work on this. Still keeping this as a draft, as there is some reorganization left to be done. The two main methods added (both in …)
Script 2) probably shouldn't be in … Another option is to instead expose script 1) via … @sphuber, if you have a moment, any suggestions welcome.
Reorganized a bit: there are some command-line commands that I'm calling (including …). One tiny usability doubt about the current implementation is that without specifying … Pinging those who were interested (a quick test would be useful): @unkcpz @superstar54 @edan-bainglass
@eimrek I did a test in an AiiDAlab docker container. It works. I share the command here:
When I use the remote folder, it requires me to input the password more than ten times. Is it possible to reduce this? Or maybe print a message before asking for the password; otherwise, the user may think the password is wrong.
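Not part of this PR, but if the repeated prompts come from each rsync/ssh call opening a fresh connection (an assumption about the cause), SSH connection multiplexing would reduce them to a single prompt. A sketch of an `~/.ssh/config` entry:

```
# Sketch: reuse one SSH connection for subsequent rsync/ssh calls.
# `backup-server` and all paths/names are placeholders.
Host backup-server
    HostName server.example.com
    User aiida
    ControlMaster auto
    ControlPath ~/.ssh/control-%r@%h-%p
    ControlPersist 10m
```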
From an AiiDAlab container running on my Windows machine, I get the following:
Not sure at the moment what might be the issue here.
Okay, one immediate issue is that the path needs to be quote-escaped, so …
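For reference, the remote side of an ssh/rsync call is interpreted by a shell, so paths containing spaces (common on Windows) must be escaped. A minimal sketch using Python's standard `shlex.quote` (host alias and path are placeholders):

```python
# Sketch: quote a remote path before embedding it in a shell command run over ssh.
import shlex
import subprocess

remote_path = '/c/Users/My User/aiida-backup'  # placeholder path containing a space
subprocess.run(['ssh', 'backup-server', f'mkdir -p {shlex.quote(remote_path)}'], check=True)
```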
Will keep at it later :)
thanks a lot for the checks @edan-bainglass. Do I understand correctly that you want to back up AiiDA from a Linux docker container to a Windows host? I am not sure if this will work, but it would be very useful to try. If a Python subprocess command fails, you can try to run it manually and see if a modification would make it work; e.g. the last one would be …
Yep. That's exactly what I'm doing. Will have to resume tomorrow, but I think it'll work once enough quotes have been escaped 😅 Will report back soon 👍
This …
yields …
Some articles online suggest rsync is also required on the Windows end. One person got it to work by installing rsync via cygwin on the Windows side. If this works, I would need to discuss with Empa IT installing this on the Empa workstation. Another solution is to … Will test locally to see if I can get it to work. Not sure which, if any, would be applicable at Empa though 🤔
Installed …
Any thoughts on …
thanks @edan-bainglass. Sorry, my previous comment was a bit misleading. I'm dumping the PostgreSQL db to a temporary folder in Python and then rsyncing that. I didn't realize that this won't be available when you run it manually. You can perhaps run tests with other files. In the end, the Python code just uses subprocess to 1) call bash commands on the remote server, for checking input validity, making folders and such; and 2) call rsync. If both of these work manually, we can adapt the Python subprocess commands accordingly.
@eimrek "...the python code just uses subprocess to..." - regarding 1, you say it uses |
Tried this ☝️ Steps:
Results:
@eimrek not sure what the issue was in your case that raised symlink issues 🤷🏻♂️
Now that the PR on the disk-objectstore is merged, can you please update this PR @eimrek. I have already reviewed it. Besides those comments, you should be aware that all source files have moved to `src/aiida`. When you rebase, git should be smart enough to automatically move things, but new files will have to be moved manually.
Thanks @sphuber for the review. I accidentally did the rebase before implementing your changes, so I couldn't directly commit via GitHub, but I implemented everything manually. The functionality seems to work. Currently there are no tests for this in aiida-core, but at least some basic ones should be added, right?
Thanks @eimrek. I agree, we should definitely have some tests. I would add a test for the …
One thing that might be good is to check that the profile name of a new backup matches that of any existing backups in the destination.
As you say, this is merely convention, and the current convention in …
I had a look at the implementation and the documentation, all good!
Just to prevent this from being merged now: as @mbercx reported to me, there is a problem with the documentation. Namely, it currently assumes that the …
A few notes after testing for a small and a large database, followed up by a discussion with @eimrek:

A. Large Databases

Tested for a larger database with ~1 million nodes and 100 GB of loose and packed files:
So this all works very well! Great work 👍

B. Verbosity

I initially didn't like the amount of output, since the …

C. Restoring the profile

The main challenge at the moment is restoring the profile after backing it up. Already discussed the current instructions with @eimrek and he'll update a bit. It's clear that manually adapting the … Let's say I've done two backups for a certain profile, and so my backup folder looks like this:
I've done this backup with the (very nice) SSH feature, and now want to restore after I've e.g. messed up my PostgreSQL database/install or my hard drive got corrupted. I've executed all the required setup steps up to the point that I have AiiDA and the services installed, but no profile.

Manual restore

To manually restore the backed-up profile (as described here), I'd have to:
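As a purely hypothetical sketch of what such a manual restore could involve for a `core.psql_dos` profile (database name, user, paths, and the backup layout of a `db.psql` dump plus a `container` folder are all assumptions, not the documented procedure):

```python
# Hypothetical manual-restore sketch; every name and path here is a placeholder.
import pathlib
import subprocess

backup = pathlib.Path('/home/user/backups/profile/last-backup')  # assumed backup dir
repo = pathlib.Path.home() / '.aiida' / 'repository' / 'profile'  # assumed repo target

# Recreate an empty database and load the pg_dump output into it.
subprocess.run(['createdb', '-h', 'localhost', '-U', 'aiida', 'aiida_db'], check=True)
subprocess.run(
    ['psql', '-h', 'localhost', '-U', 'aiida', '-d', 'aiida_db', '-f', str(backup / 'db.psql')],
    check=True,
)

# Copy the disk-objectstore container back into place.
repo.mkdir(parents=True, exist_ok=True)
subprocess.run(['rsync', '-az', str(backup / 'container'), str(repo)], check=True)

# Finally, the profile entry in ~/.aiida/config.json must point at this database and repository.
```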
I also tried creating a new profile and somehow using the backed-up data, but unfortunately this fails regardless of whether you first create the database or first create the profile (well, it throws a bunch of errors, but it does seem - at first glance - to work afterwards, at least when you create a profile based on an already existing database/repository). So adapting the … Here, having the … Using …
Just had a discussion with @mbercx following his comment to store the full …

TLDR: We think it is better to have …

This solution is what we originally implemented. After discussion, Kristian and I decided to remove it, since it wasn't clear what additional information the storage config would provide. Storing one copy per backup iteration wouldn't make sense, because if the storage config changed at some point, even when restoring the data, you most likely wouldn't want to revert to the old config. The alternative was to store a single copy in the root directory. This would however be overwritten each time, losing the config options of previous iterations. So the only additional valuable information that would be provided by backing up the entire …

That being said, the config options are not really coupled to the state of the storage and are essentially independent. So it is not really important to keep a copy of the state of the config options with each snapshot of the storage.

Finally, when we think of the various restore scenarios, there are essentially two:
@eimrek Marnik told me that you two had already discussed this and you were also on board with this solution. If so, could you please essentially revert to our original solution of just writing the …
I did a test on restoring the DB and repository backup. Things all went fine. Option 1: repeat some information and put it in the …
https://aiida--6069.org.readthedocs.build/projects/aiida-core/en/6069/howto/installation.html#restoring-data-from-a-backup
Any opinion on this?
Here are other minor mistakes I made that took some effort to figure out. It would be great to see these mentioned somewhere to help others avoid the problem.
But anyhow, the backup works perfectly for me, and it's also a green light from my side. Cheers @eimrek
Thanks for reverting the change to the backup config file @eimrek. I had a look and it looks good. I did have another think about the … I think I would prefer this solution. Let me know what you think.
Thanks @sphuber, yep, that looks good! Although …

EDIT: actually, this could be a potential problem: what if the user is doing regular backups for a profile to a folder with a specific version of AiiDA that implements the backup mechanism of the backend, and for whatever reason has to downgrade to an AiiDA version where the specific backend does not implement the backup method? I think in this case, all of the backups would be deleted, no?
Quite the edge case, but fair enough, I think that would indeed be problematic. What if we add a check making sure the directory is empty (excluding the config.json)?
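A minimal sketch of such a check (function and file names are assumptions, not the PR's actual code):

```python
# Sketch: refuse to touch a destination that contains anything besides `config.json`.
import pathlib

def check_dest_is_fresh(dest: pathlib.Path) -> None:
    """Raise if `dest` contains entries other than `config.json`."""
    leftover = sorted(p.name for p in dest.iterdir() if p.name != 'config.json')
    if leftover:
        raise ValueError(f'backup destination {dest} is not empty: {leftover}')
```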
@sphuber yep, that should probably be fine.
@eimrek I noticed that in …
@sphuber The reasoning was that it's good to have the folder "reserved" for that single profile before the initial backup starts (which can take a long time), to prevent accidentally backing up a different profile to the same folder. Additionally, if the backup is interrupted for whatever reason (and leaves a live-backup folder), it's good to know what profile it was and potentially resume the backup.
@sphuber Although, we could generate the …
That wouldn't help unfortunately, I think, because then the …
I don't think this would help, right? Is there support for resuming an interrupted backup?
This is a fair point. Again, very much an edge case, but it could happen. Guess I will have to just perform the logic that excludes …
I think there implicitly is support for this. If a backup is interrupted, there is a partially completed …
Agreed. I was just mentioning it as that would potentially make the code a bit simpler.
The `is_backup_implemented` property was added to the storage interface such that the base class can call this before calling the actual storage backend. The reason is that the creation of the output folder has to be done before `_backup_backend` is called, but if that then raises `NotImplementedError`, the output directory would have been created for nothing and remain empty. The current solution adds an additional method to the abstract interface called `is_backup_implemented`. Storage plugins should override this to return `True` in addition to implementing `_backup_backend`. This feels a bit redundant and can easily be forgotten by developers. Here, as an alternative, `is_backup_implemented` is removed and the base class `backup` will simply create the output folder and then delete it if the plugin raises `NotImplementedError`.
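A sketch of that alternative (method names follow the commit message above; the empty-folder guard is an assumption, addressing the downgrade edge case discussed earlier):

```python
import pathlib

class StorageBackend:

    def backup(self, dest: pathlib.Path) -> None:
        """Create the output folder, then clean it up again if the plugin has no backup support."""
        dest.mkdir(parents=True, exist_ok=True)
        try:
            self._backup_backend(dest)
        except NotImplementedError:
            # Only remove the folder if it is still empty, so that pre-existing
            # backups in the destination are never deleted.
            if not any(dest.iterdir()):
                dest.rmdir()
            raise

    def _backup_backend(self, dest: pathlib.Path) -> None:
        """Storage plugins override this to implement the actual backup."""
        raise NotImplementedError
```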
Fix/backup script alternative is backend implemented
Alright, I am pulling the trigger on this one. We can still test this before the release. Thanks a lot for all the work @eimrek!
This PR adds functionality to back up an AiiDA profile.