
👌 CLI: Computer/Code export output_file optional #6486

Merged

Conversation

GeigerJ2
Contributor

@GeigerJ2 GeigerJ2 commented Jun 24, 2024

As the title states: the output_file argument for verdi computer export [setup,config] and verdi code export is made optional. If it is not specified, a default filename is generated based on the respective label.

I think this is what most people would name their files most of the time anyway.
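For illustration, the default-filename behaviour described above can be sketched roughly as follows (a hedged sketch, not the actual AiiDA implementation; the helper name and the `.yml` suffix are assumptions):

```python
import pathlib


def default_output_file(output_file, label, appendix=""):
    """Return output_file as a Path; if it is None, derive '<label><appendix>.yml'
    from the entity label (hypothetical helper, for illustration only)."""
    if output_file is None:
        return pathlib.Path(f"{label}{appendix}.yml")
    return pathlib.Path(output_file)
```

Under this sketch, exporting a computer labelled mycomputer without an explicit file would write to mycomputer.yml, while an explicitly given path is used verbatim.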

Contributor

@sphuber sphuber left a comment


Thanks @GeigerJ2, I fully agree with the feature; just some comments on the implementation.

Review threads (resolved): src/aiida/cmdline/commands/cmd_code.py, tests/cmdline/commands/test_code.py, tests/cmdline/commands/test_computer.py
@sphuber
Contributor

sphuber commented Jul 1, 2024

@GeigerJ2 I would like to release today and this is the last open PR. Would it be possible to wrap this up a.s.a.p.? Or shall we punt it?

@GeigerJ2
Contributor Author

GeigerJ2 commented Jul 1, 2024

We're on a group hike today, so unfortunately I cannot work on it right now. I could wrap it up first thing tomorrow morning. Otherwise, if it cannot wait, feel free to release without it; I don't think it's that crucial.

@GeigerJ2 GeigerJ2 force-pushed the feature/auto-computer-code-export-file branch 3 times, most recently from 4f7a280 to db62596 Compare July 2, 2024 13:05
@GeigerJ2
Contributor Author

GeigerJ2 commented Jul 2, 2024

Alright, I think this is ready for a second round of review. I should have resolved your original comments, @sphuber, though there's no rush as the new version has already been released. For all three commands (verdi code export and verdi computer export [setup,config]) I also added the overwrite option, and the commands now fail if the file already exists and overwrite is False. In addition, I defined the sort option as a general option in cmdline/params/options/main.py, as it is now used in multiple places.
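The overwrite behaviour described in this comment can be sketched like this (a minimal illustration under the semantics stated above, not the actual AiiDA code; the function name is hypothetical):

```python
import pathlib


def write_export(path, content, overwrite=False):
    """Write an export file, refusing to clobber an existing path unless overwrite=True."""
    path = pathlib.Path(path)
    if path.is_dir():
        # Never replace a directory with a file.
        raise IsADirectoryError(f"'{path}' is a directory")
    if path.exists() and not overwrite:
        raise FileExistsError(f"'{path}' already exists; pass --overwrite to replace it")
    path.write_text(content)
```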

After taking a closer look at the tests again, I felt the parametrization for the sorting was actually a bit unnecessary, so I removed it in the last commit (db6259611). It didn't really matter for the logic in the test function bodies and just doubled the number of tests run in the (already very extensive) test suite. Instead, I added an individual test for the sorting to each function. Similarly, for the overwrite option, I'm testing that the command fails when overwrite is False, and that a change is reflected in the output config file after modifying a Code/Computer instance and writing with overwrite=True (thanks to @mbercx for the discussion on testing). Also pinging @agoscinski, as the implementations of the Computer export commands were written by him.

@GeigerJ2 GeigerJ2 requested a review from agoscinski July 2, 2024 13:18

codecov bot commented Jul 2, 2024

Codecov Report

Attention: Patch coverage is 97.67442% with 1 line in your changes missing coverage. Please review.

Project coverage is 77.78%. Comparing base (ef60b66) to head (491ab3e).
Report is 116 commits behind head on main.

Files with missing lines Patch % Lines
src/aiida/cmdline/commands/cmd_computer.py 94.74% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6486      +/-   ##
==========================================
+ Coverage   77.51%   77.78%   +0.28%     
==========================================
  Files         560      562       +2     
  Lines       41444    41885     +441     
==========================================
+ Hits        32120    32578     +458     
+ Misses       9324     9307      -17     


Contributor

@sphuber sphuber left a comment


Thanks @GeigerJ2 . Fine with the implementation, just some minor implementation details. I am not sure I like the approach of the tests though where you are using a single test for all possible combinations. This tends to be very fragile as later tests in the function implicitly rely on some preconditions that were created (or not) by previous asserts and calls to the command. I think it is better to have separate test functions for the various options that are well separated. You are then forced to set up the necessary pre-conditions (for example pre-creating the output file in the case you are testing the user specifying a file that already exists) and it becomes also very clear when reading.

Also, I think using parametrization and using file_regression is a good thing. It makes for lean and easy to read tests. Now you are manually implementing the parametrization and reading/comparing outputs that is much more fragile and difficult to follow.

Review threads (resolved): src/aiida/cmdline/commands/cmd_code.py, src/aiida/cmdline/commands/cmd_computer.py, tests/cmdline/commands/test_computer.py
@GeigerJ2
Contributor Author

GeigerJ2 commented Jul 3, 2024

Thanks a lot for the review, @sphuber! Already a top-level answer, will resolve the issues soon:

I am not sure I like the approach of the tests though where you are using a single test for all possible combinations. This tends to be very fragile as later tests in the function implicitly rely on some preconditions that were created (or not) by previous asserts and calls to the command. I think it is better to have separate test functions for the various options that are well separated. You are then forced to set up the necessary pre-conditions (for example pre-creating the output file in the case you are testing the user specifying a file that already exists) and it becomes also very clear when reading.

Yeah, I was also not sure about that. As of now, the functions are very long, so I'll split them up. That's also a good point about parts of the tests relying on previous calls to the function being tested; I'll definitely keep that in mind!

Also, I think using parametrization and using file_regression is a good thing. It makes for lean and easy to read tests. Now you are manually implementing the parametrization and reading/comparing outputs that is much more fragile and difficult to follow.

Using file_regression is good, I agree. In general, I wanted to avoid all the code in the test function body being run under the parametrization, e.g. --sort/--no-sort, even when it did not influence the part actually being tested, which doubled the number of tests run. Splitting up the functions will resolve this, and I will then keep parametrization and file_regression where applicable.

Further, I also considered parametrizing all input parameters, e.g. --sort/--no-sort and no overwrite / --overwrite, but I don't think that makes sense: the things I'm testing for don't necessarily depend on the values of these two parameters, and I would need to write a bunch of custom logic inside the test function body to differentiate the cases. So: splitting up the test functions, using parametrization where applicable, and generating the states needed for testing without relying on the function being tested seems like a good way forward. Still new to this whole testing business :)

@GeigerJ2 GeigerJ2 force-pushed the feature/auto-computer-code-export-file branch 3 times, most recently from ef7ee62 to 7adcf46 Compare July 4, 2024 13:02
@GeigerJ2
Contributor Author

GeigerJ2 commented Jul 4, 2024

Still requires some clean-up. Hope I get around to this tomorrow.

@GeigerJ2 GeigerJ2 force-pushed the feature/auto-computer-code-export-file branch 6 times, most recently from 8cf3d2e to 289bab2 Compare July 8, 2024 08:31
@GeigerJ2
Contributor Author

GeigerJ2 commented Jul 8, 2024

So the tests-presto GH action is still failing and, frankly, I'm not sure how best to resolve it. In particular, the exported Computer config file created with the --no-sort option is still sorted, or rather, different from the file the file_regression expects. When --no-sort is used, the output file should just mirror the order in which the fields are added in the source. I assume this order is different and depends on the SQL backend, so the obtained no-sort file differs from what the file_regression expects (even though safe_interval is a Transport property, not a Computer property, so it should not depend on the SQL backend anyway? If I understand correctly, a Computer is by default still an entry in the PSQL table, so I'm also wondering how that changes in the SQLite case).

As discussed with @khsrali, I'm inclined to just remove the file_regression for the --no-sort case, check for general file existence and content instead, and then test the sorted file exactly, possibly via file_regression (even though there's no parametrization anymore). This is because --no-sort shouldn't mean the reverse order compared to --sort, but rather that the order is undefined / arbitrary (like a Python dict). I remember you hinting at that in the past, @sphuber, so I'm wondering what you think. Do you see a better option? Changing the backend implementation order just for these tests to pass seems overkill (and unnecessary).

@sphuber
Contributor

sphuber commented Jul 8, 2024

It is not dependent on the backend. The order for output that is based on pydantic models will be determined by the order of the fields as they are declared in the model. However, that is currently only the case for the Code. The Computer does not yet have a model and the export hardcodes the list. (This would be fixed by this PR btw which would add pydantic model for all ORM classes)

The test that is failing is for the configuration of the computer though, and those values are taken from the Transport classes, which also don't have a pydantic model yet. There the configuration is returned as a Python dictionary and there, as you correctly state, there is no real order. So my suspicion is that this test can also randomly fail for the normal test suite and has nothing to do with what storage plugin is used for the test profile.

I would simply just disable the file_regression test for the config export and just check that the command did not fail for --no-sort and that "some" output was generated. Just don't pay attention to the order as we don't care really what the order is.
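An order-insensitive check of the kind suggested here could look like the following sketch (assuming the export is simple `key: value` lines; the function name is illustrative, not from the PR):

```python
def same_content_ignoring_order(text_a, text_b):
    """Compare two exported configs line by line, ignoring line order."""
    return sorted(text_a.strip().splitlines()) == sorted(text_b.strip().splitlines())
```

This asserts that "some" output with the right content was produced without pinning down the field order.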

Contributor

@khsrali khsrali left a comment


Thanks @GeigerJ2 I added some annoying comments :)

show_default=True,
)
@arguments.OUTPUT_FILE(type=click.Path(exists=False, path_type=pathlib.Path), required=False)
@options.OVERWRITE()
Contributor


This might be a bit of a hassle and maybe unnecessary. This is what happens to the user:

  1. Hit the error FileExistsError first time,
  2. discover there is such option as overwrite
  3. run again with --overwrite

Instead one could easily just index the files: for example, if the file mycomputer.yaml exists, just produce mycomputer_1.yaml; if that exists, do mycomputer_2.yaml.
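The indexing scheme suggested here could be sketched as follows (illustrative only; this is the alternative being discussed, not what the PR implements):

```python
import pathlib


def indexed_filename(path):
    """Return path unchanged if it is free, else the first 'stem_N.suffix' that does not exist."""
    path = pathlib.Path(path)
    if not path.exists():
        return path
    counter = 1
    while True:
        candidate = path.with_name(f"{path.stem}_{counter}{path.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1
```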

Contributor


btw, this is what my browser does when I download files. They used to raise a file-exists error in the past and ask the user for a new file name :)

Contributor


I am personally not a fan of this, to be honest. If the user explicitly specified the output file, I wouldn't want the command to silently change it. Of course you could get around the "silently" a bit by printing a warning that the filename was changed, but this will not help in scripting use cases.

Contributor Author

@GeigerJ2 GeigerJ2 Jul 9, 2024


Thanks for the comments, @khsrali! There's no such thing as an annoying review :)

I see where you're coming from, but I'm not sure I like the idea of automatically indexing the files, as that leads to the file name differing from the label of the AiiDA entity, and I'd like those to be consistent. Suppose I run verdi computer list and get the output mycomputer. When I export it, I expect the file to be named mycomputer.yaml. If instead the file is called mycomputer_1.yaml, I might get confused (of course we can inform the user, but still). The user might also still not be aware of the --overwrite flag, and then has to either run the command again with it, or mv the file over the original one, or end up with the directory polluted with a bunch of files.

In the end, with all these commands that write to disk, we have to make some decisions, e.g. file names, how to handle overwriting, etc., and there's no right or wrong, and people have different preferences. I don't have a strong opinion, but slightly prefer the current behavior. Let's see what @sphuber thinks.

Contributor


I see your points; my concern was mainly the case where the user does not specify an output_file, so they don't really care about the exact name... But no strong opinion.

output_file = generate_validate_output_file(
output_file=output_file, entity_label=code.label, overwrite=overwrite, appendix=f'@{code_data["computer"]}'
)
except (FileExistsError, IsADirectoryError) as e:
Contributor


Normally you can create files with the same name as folders in the OS.
I would say just ignore it and still create the file.
The fewer unnecessary raises, the less annoyed the user, no?

Contributor


Here I definitely don't agree. If I accidentally specify the filename that happens to be a directory and the command just deletes the entire directory and overwrites it with a file, I'd be pretty pissed. I think the "annoyance" of having to change the output filename or specify --overwrite is way less than all the accidental loss of data.

Contributor Author


Personally, I would find it super confusing to have both a file and a directory with exactly the same name, even if it is theoretically possible. In any case, it should be very much an edge case; I don't expect users to have directories that end with .yml. And if they do, they should be notified of that, I think :D

Contributor


@sphuber I don't think that's ever possible;
write_text() is never going to delete your directory and replace it with a file.

I understand @GeigerJ2's point that having a folder ending in .yml is pretty rare anyway.
The point I wanted to make was why handle this rare scenario at all. But ok.

Comment on lines 769 to 775
try:
output_file = generate_validate_output_file(
output_file=output_file, entity_label=computer.label, overwrite=overwrite, appendix='-setup'
)
except (FileExistsError, IsADirectoryError) as e:
echo.echo_critical(message=e)

Contributor


Maybe I'm wrong, but generate_validate_output_file seems like a redundant function :)?
It raises two built-in errors, IsADirectoryError and FileExistsError, that are handled together right here.
I mean, write_text itself raises when the path is a directory, and in my opinion IsADirectoryError doesn't need to be raised at all.

Apart from that, it also has output_file both as input and output, which feels a bit weird :)

Contributor


The other functionality of the function is to define the default value of the output file in case one hasn't been specified by the user. This is also why the function returns the output file

Contributor Author

@GeigerJ2 GeigerJ2 Jul 9, 2024


The main reason I created the function is that the logic was duplicated three times, for verdi computer export setup, verdi computer export config, and verdi code export, including the different exception texts. So I thought it better to move it to a single location and re-use the code, although I agree it might not be strictly necessary. The same argument applies to write_text: if we waited for that call to run into the FileExistsError, the handling logic would have to be repeated, as the call happens at different places in the code, e.g. for the Computer config there is a call to computer.get_configuration(user) before. One could probably restructure the overall logic a bit, but I didn't want to modify the logic @agoscinski put in place when implementing the verdi computer export feature.

If passing output_file as both input and output is bad practice here, I'm happy to change it :)

Review thread (resolved): tests/cmdline/commands/test_computer.py
@GeigerJ2 GeigerJ2 force-pushed the feature/auto-computer-code-export-file branch from 289bab2 to de07f8f Compare July 9, 2024 08:57
@GeigerJ2
Contributor Author

GeigerJ2 commented Jul 9, 2024

It is not dependent on the backend. The order for output that is based on pydantic models will be determined by the order of the fields as they are declared in the model. However, that is currently only the case for the Code. The Computer does not yet have a model and the export hardcodes the list. (This would be fixed by this PR btw which would add pydantic model for all ORM classes)

The test that is failing is for the configuration of the computer though, and those values are taken from the Transport classes, which also don't have a pydantic model yet. There the configuration is returned as a Python dictionary and there, as you correctly state, there is no real order. So my suspicion is that this test can also randomly fail for the normal test suite and has nothing to do with what storage plugin is used for the test profile.

I would simply just disable the file_regression test for the config export and just check that the command did not fail for --no-sort and that "some" output was generated. Just don't pay attention to the order as we don't care really what the order is.

Thanks for the explanation, @sphuber, that makes a lot of sense! I removed the file_regression for the config export and now just check the content in general. In the sorted case, I check explicitly with startswith, which I think is fine, as we configure the computer explicitly beforehand, so we know what to expect.
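The startswith check mentioned here relies on the sorted export being deterministic; a minimal sketch of why that holds (the key names are illustrative, not the actual Transport config fields):

```python
def sorted_export(config):
    """Render a config dict as 'key: value' lines with alphabetically sorted keys."""
    return "\n".join(f"{key}: {config[key]}" for key in sorted(config))
```

With sorted keys the first line is fixed, so a startswith assertion is stable, whereas for the unsorted export the order is arbitrary and only the content can be checked.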

@sphuber
Contributor

sphuber commented Jul 10, 2024

@GeigerJ2 I was just about to approve and merge this until I saw the last commit. Why are you changing that? sort is not a built-in keyword, only a method on a list, i.e. [1, 2, 3].sort() and not sort([1, 2, 3]). So having a variable called sort is fine and is not redefining anything.

@GeigerJ2
Contributor Author

@GeigerJ2 I was just about to approve and merge this until I saw the last commit? Why are you changing that? sort is not a built-in keyword, only a method on an iterable, i.e. [1, 2, 3].sort() and not sort([1, 2, 3]). So having a variable called sort is fine and is not redefining anything.

Wanted to be extra sure not to cause any confusion ^^ I can revert it.

@GeigerJ2 GeigerJ2 force-pushed the feature/auto-computer-code-export-file branch from 9ceddf0 to de07f8f Compare July 10, 2024 11:24
@GeigerJ2 GeigerJ2 force-pushed the feature/auto-computer-code-export-file branch from de07f8f to 491ab3e Compare July 10, 2024 11:24
@sphuber sphuber merged commit 9355a98 into aiidateam:main Jul 10, 2024
11 checks passed
@sphuber
Contributor

sphuber commented Jul 10, 2024

Thanks a lot @GeigerJ2

@GeigerJ2
Contributor Author

Thanks for the merge, @sphuber! Sorry this took a bit longer than anticipated, but it was a good learning experience :)
