
👌 CLI: Computer/Code export output_file optional #6486

Merged

Conversation

GeigerJ2
Contributor

@GeigerJ2 GeigerJ2 commented Jun 24, 2024

As the title states: the output_file argument for verdi computer export [setup,config] and verdi code export is made optional. If it is not specified, a default filename is generated based on the respective label.

I think this is what most people would name their files most of the time anyway.
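For illustration, the default-filename behaviour described above can be sketched roughly as follows (a hedged sketch, not the actual AiiDA implementation; the helper name and the `.yml` suffix are assumptions):

```python
import pathlib


def default_output_file(output_file, label, appendix=""):
    """Return output_file as a Path; if it is None, derive '<label><appendix>.yml'
    from the entity label (hypothetical helper, for illustration only)."""
    if output_file is None:
        return pathlib.Path(f"{label}{appendix}.yml")
    return pathlib.Path(output_file)
```

Under this sketch, exporting a computer labelled mycomputer without an explicit file would write to mycomputer.yml, while an explicitly given path is used verbatim.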

Contributor

@sphuber sphuber left a comment


Thanks @GeigerJ2, I fully agree with the feature; just some comments on the implementation.

Review threads (resolved): src/aiida/cmdline/commands/cmd_code.py, tests/cmdline/commands/test_code.py, tests/cmdline/commands/test_computer.py
@sphuber
Contributor

sphuber commented Jul 1, 2024

@GeigerJ2 I would like to release today and this is the last open PR. Would it be possible to wrap this up a.s.a.p.? Or shall we punt it?

@GeigerJ2
Contributor Author

GeigerJ2 commented Jul 1, 2024

We're on a group hike today, so unfortunately I cannot work on it right now. I could wrap it up first thing tomorrow morning. Otherwise, if it cannot wait, feel free to release without it; I don't think it's that crucial.

@GeigerJ2 GeigerJ2 force-pushed the feature/auto-computer-code-export-file branch 3 times, most recently from 4f7a280 to db62596 Compare July 2, 2024 13:05
@GeigerJ2
Contributor Author

GeigerJ2 commented Jul 2, 2024

Alright, I think this is ready for a second round of review. I should have resolved your original comments, @sphuber, though there's no rush as the new version has already been released. For all three commands (verdi code export and verdi computer export [setup,config]) I also added the overwrite option, and the commands now fail if the file already exists and overwrite is False. In addition, I defined the sort option as a general option in cmdline/params/options/main.py, as it is now used in multiple places.
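The overwrite behaviour described in this comment can be sketched like this (a minimal illustration under the semantics stated above, not the actual AiiDA code; the function name is hypothetical):

```python
import pathlib


def write_export(path, content, overwrite=False):
    """Write an export file, refusing to clobber an existing path unless overwrite=True."""
    path = pathlib.Path(path)
    if path.is_dir():
        # Never replace a directory with a file.
        raise IsADirectoryError(f"'{path}' is a directory")
    if path.exists() and not overwrite:
        raise FileExistsError(f"'{path}' already exists; pass --overwrite to replace it")
    path.write_text(content)
```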

After taking a closer look at the tests again, I felt the parametrization for the sorting was actually a bit unnecessary, so I removed it in the last commit (db6259611). It didn't really matter for the logic in the test function bodies and just doubled the number of tests run in the (already very extensive) test suite. Instead, I added an individual test for the sorting to each function. Similarly, for the overwrite option, I'm testing that the command fails when overwrite is False, and that a change is reflected in the output config file after modifying a Code/Computer instance and writing with overwrite=True (thanks to @mbercx for the discussion on testing). Also pinging @agoscinski, as the implementations of the Computer export commands were written by him.

@GeigerJ2 GeigerJ2 requested a review from agoscinski July 2, 2024 13:18

codecov bot commented Jul 2, 2024

Codecov Report

Attention: Patch coverage is 97.67442% with 1 line in your changes missing coverage. Please review.

Project coverage is 77.78%. Comparing base (ef60b66) to head (491ab3e).
Report is 116 commits behind head on main.

Files with missing lines Patch % Lines
src/aiida/cmdline/commands/cmd_computer.py 94.74% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6486      +/-   ##
==========================================
+ Coverage   77.51%   77.78%   +0.28%     
==========================================
  Files         560      562       +2     
  Lines       41444    41885     +441     
==========================================
+ Hits        32120    32578     +458     
+ Misses       9324     9307      -17     


Contributor

@sphuber sphuber left a comment


Thanks @GeigerJ2 . Fine with the implementation, just some minor implementation details. I am not sure I like the approach of the tests though where you are using a single test for all possible combinations. This tends to be very fragile as later tests in the function implicitly rely on some preconditions that were created (or not) by previous asserts and calls to the command. I think it is better to have separate test functions for the various options that are well separated. You are then forced to set up the necessary pre-conditions (for example pre-creating the output file in the case you are testing the user specifying a file that already exists) and it becomes also very clear when reading.

Also, I think using parametrization and using file_regression is a good thing. It makes for lean and easy to read tests. Now you are manually implementing the parametrization and reading/comparing outputs that is much more fragile and difficult to follow.

Review threads (resolved): src/aiida/cmdline/commands/cmd_code.py, src/aiida/cmdline/commands/cmd_computer.py, tests/cmdline/commands/test_computer.py
@GeigerJ2
Contributor Author

GeigerJ2 commented Jul 3, 2024

Thanks a lot for the review, @sphuber! Already a top-level answer, will resolve the issues soon:

I am not sure I like the approach of the tests though where you are using a single test for all possible combinations. This tends to be very fragile as later tests in the function implicitly rely on some preconditions that were created (or not) by previous asserts and calls to the command. I think it is better to have separate test functions for the various options that are well separated. You are then forced to set up the necessary pre-conditions (for example pre-creating the output file in the case you are testing the user specifying a file that already exists) and it becomes also very clear when reading.

Yeah, I was also not sure about that. As of now, the functions are very long, so I'll split them up. That's also a good point about parts of the tests relying on previous calls to the function being tested; I'll definitely keep that in mind!

Also, I think using parametrization and using file_regression is a good thing. It makes for lean and easy to read tests. Now you are manually implementing the parametrization and reading/comparing outputs that is much more fragile and difficult to follow.

Using file_regression is good, I agree. In general, I wanted to avoid all the code in the test function body being run under the parametrization, e.g. --sort/--no-sort, even when it did not influence the part actually being tested, which doubled the number of tests run. Splitting up the functions will resolve this, and I will then keep parametrization and file_regression where applicable.

Further, I also considered parametrizing all input parameters, e.g. --sort/--no-sort and no overwrite / --overwrite, but I don't think that makes sense: the things I'm testing for don't necessarily depend on the values of these two parameters, and I would need to write a bunch of custom logic inside the test function body to differentiate the cases. So: splitting up the test functions, using parametrization where applicable, and generating the states needed for testing without relying on the function being tested seems like a good way forward. Still new to this whole testing business :)

@GeigerJ2 GeigerJ2 force-pushed the feature/auto-computer-code-export-file branch 3 times, most recently from ef7ee62 to 7adcf46 Compare July 4, 2024 13:02
@GeigerJ2
Contributor Author

GeigerJ2 commented Jul 4, 2024

Still requires some clean-up. Hope I get around to this tomorrow.

@GeigerJ2 GeigerJ2 force-pushed the feature/auto-computer-code-export-file branch 6 times, most recently from 8cf3d2e to 289bab2 Compare July 8, 2024 08:31
@GeigerJ2
Contributor Author

GeigerJ2 commented Jul 8, 2024

So the tests-presto GH action is still failing and, frankly, I'm not sure how best to resolve it. In particular, the exported Computer config file created with the --no-sort option is still sorted, or rather, different from the file the file_regression expects. When --no-sort is used, the output file should just mirror the order in which the fields are added in the source. I assume this order is different and depends on the SQL backend, so the obtained no-sort file differs from what the file_regression expects (even though safe_interval is a Transport property, not a Computer property, so it should not depend on the SQL backend anyway? If I understand correctly, a Computer is by default still an entry in the PSQL table, so I'm also wondering how that changes in the SQLite case).

As discussed with @khsrali, I'm inclined to just remove the file_regression for the --no-sort case, check for general file existence and content instead, and then test the sorted file exactly, possibly via file_regression (even though there's no parametrization anymore). This is because --no-sort shouldn't mean the reverse order compared to --sort, but rather that the order is undefined / arbitrary (like a Python dict). I remember you hinting at that in the past, @sphuber, so I'm wondering what you think. Do you see a better option? Changing the backend implementation order just for these tests to pass seems overkill (and unnecessary).

@sphuber
Contributor

sphuber commented Jul 8, 2024

It is not dependent on the backend. The order for output that is based on pydantic models will be determined by the order of the fields as they are declared in the model. However, that is currently only the case for the Code. The Computer does not yet have a model and the export hardcodes the list. (This would be fixed by this PR btw which would add pydantic model for all ORM classes)

The test that is failing is for the configuration of the computer though, and those values are taken from the Transport classes, which also don't have a pydantic model yet. There the configuration is returned as a Python dictionary and there, as you correctly state, there is no real order. So my suspicion is that this test can also randomly fail for the normal test suite and has nothing to do with what storage plugin is used for the test profile.

I would simply just disable the file_regression test for the config export and just check that the command did not fail for --no-sort and that "some" output was generated. Just don't pay attention to the order as we don't care really what the order is.
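An order-insensitive check of the kind suggested here could look like the following sketch (assuming the export is simple `key: value` lines; the function name is illustrative, not from the PR):

```python
def same_content_ignoring_order(text_a, text_b):
    """Compare two exported configs line by line, ignoring line order."""
    return sorted(text_a.strip().splitlines()) == sorted(text_b.strip().splitlines())
```

This asserts that "some" output with the right content was produced without pinning down the field order.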

Contributor

@khsrali khsrali left a comment


Thanks @GeigerJ2 I added some annoying comments :)

show_default=True,
)
@arguments.OUTPUT_FILE(type=click.Path(exists=False, path_type=pathlib.Path), required=False)
@options.OVERWRITE()
Contributor


This might be a bit of a hassle and maybe unnecessary. This is what happens to the user:

  1. Hit the error FileExistsError first time,
  2. discover there is such option as overwrite
  3. run again with --overwrite

Instead one could easily just index the files: for example, if the file mycomputer.yaml exists, just produce mycomputer_1.yaml; if that exists, do mycomputer_2.yaml.
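The indexing scheme suggested here could be sketched as follows (illustrative only; this is the alternative being discussed, not what the PR implements):

```python
import pathlib


def indexed_filename(path):
    """Return path unchanged if it is free, else the first 'stem_N.suffix' that does not exist."""
    path = pathlib.Path(path)
    if not path.exists():
        return path
    counter = 1
    while True:
        candidate = path.with_name(f"{path.stem}_{counter}{path.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1
```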

Contributor


btw, this is what my browser does when I download files. They used to raise a file-exists error in the past and ask the user for a new file name :)

Contributor


I am personally not a fan of this, to be honest. If the user explicitly specified the output file, I wouldn't want the command to silently change it. Of course you could get around the "silently" a bit by printing a warning that the filename was changed, but this will not help in scripting use cases.

Contributor Author

@GeigerJ2 GeigerJ2 Jul 9, 2024


Thanks for the comments, @khsrali! There's no such thing as an annoying review :)

I see where you're coming from, but I'm not sure I like the idea of automatically indexing the files, as that leads to the file name differing from the label of the AiiDA entity, and I'd like those to be consistent. Suppose I run verdi computer list and get the output mycomputer. When I export it, I expect the file to be named mycomputer.yaml. If instead the file is called mycomputer_1.yaml, I might get confused (of course we can inform the user, but still). The user might also still not be aware of the --overwrite flag, and then has to either run the command again with it, or mv the file over the original one, or end up with the directory polluted with a bunch of files.

In the end, with all these commands that write to disk, we have to make some decisions, e.g. file names, how to handle overwriting, etc., and there's no right or wrong, and people have different preferences. I don't have a strong opinion, but slightly prefer the current behavior. Let's see what @sphuber thinks.

Contributor


I see your points; my concern was mainly the case where the user does not specify an output_file, so they don't really care about the exact name... But no strong opinion.

output_file = generate_validate_output_file(
output_file=output_file, entity_label=code.label, overwrite=overwrite, appendix=f'@{code_data["computer"]}'
)
except (FileExistsError, IsADirectoryError) as e:
Contributor


Normally you can create files with the same name as folders in the OS.
I would say just ignore it and still create the file.
The fewer unnecessary raises, the less annoyed the user, no?

Contributor


Here I definitely don't agree. If I accidentally specify the filename that happens to be a directory and the command just deletes the entire directory and overwrites it with a file, I'd be pretty pissed. I think the "annoyance" of having to change the output filename or specify --overwrite is way less than all the accidental loss of data.

Contributor Author


Personally, I would find it super confusing to have both a file and a directory with exactly the same name, even if it is theoretically possible. In any case, it should be very much an edge case; I don't expect users to have directories that end with .yml. And if they do, they should be notified of that, I think :D

Contributor


@sphuber I don't think that's ever possible;
write_text() is never going to delete your directory and replace it with a file.

I understand @GeigerJ2's point that having a folder ending in .yml is pretty rare anyway.
The point I wanted to make was why handle this rare scenario at all. But ok.

Comment on lines 769 to 775
try:
output_file = generate_validate_output_file(
output_file=output_file, entity_label=computer.label, overwrite=overwrite, appendix='-setup'
)
except (FileExistsError, IsADirectoryError) as e:
echo.echo_critical(message=e)

Contributor


Maybe I'm wrong, but generate_validate_output_file seems like a redundant function :)?
It raises two built-in errors, IsADirectoryError and FileExistsError, that are handled together right here.
I mean, write_text itself raises when the path is a directory, and in my opinion IsADirectoryError doesn't need to be raised at all.

Apart from that, it also has output_file both as input and output, which feels a bit weird :)

Contributor


The other functionality of the function is to define the default value of the output file in case one hasn't been specified by the user. This is also why the function returns the output file

Contributor Author

@GeigerJ2 GeigerJ2 Jul 9, 2024


The main reason I created the function is that the logic was duplicated three times, for verdi computer export setup, verdi computer export config, and verdi code export, including the different exception texts. So I thought it better to move it to a single location and re-use the code, although I agree it might not be strictly necessary. The same argument applies to write_text: if we waited for that call to run into the FileExistsError, the handling logic would have to be repeated, as the call happens at different places in the code, e.g. for the Computer config there is a call to computer.get_configuration(user) before. One could probably restructure the overall logic a bit, but I didn't want to modify the logic @agoscinski put in place when implementing the verdi computer export feature.

If passing output_file as both input and output is bad practice here, I'm happy to change it :)

Review thread (resolved): tests/cmdline/commands/test_computer.py
@GeigerJ2 GeigerJ2 force-pushed the feature/auto-computer-code-export-file branch from 289bab2 to de07f8f Compare July 9, 2024 08:57
@GeigerJ2
Contributor Author

GeigerJ2 commented Jul 9, 2024

It is not dependent on the backend. The order for output that is based on pydantic models will be determined by the order of the fields as they are declared in the model. However, that is currently only the case for the Code. The Computer does not yet have a model and the export hardcodes the list. (This would be fixed by this PR btw which would add pydantic model for all ORM classes)

The test that is failing is for the configuration of the computer though, and those values are taken from the Transport classes, which also don't have a pydantic model yet. There the configuration is returned as a Python dictionary and there, as you correctly state, there is no real order. So my suspicion is that this test can also randomly fail for the normal test suite and has nothing to do with what storage plugin is used for the test profile.

I would simply just disable the file_regression test for the config export and just check that the command did not fail for --no-sort and that "some" output was generated. Just don't pay attention to the order as we don't care really what the order is.

Thanks for the explanation, @sphuber, that makes a lot of sense! I removed the file_regression for the config export and now just check the content in general. In the sorted case, I check explicitly with startswith, which I think is fine, as we configure the computer explicitly beforehand, so we know what to expect.
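The startswith check mentioned here relies on the sorted export being deterministic; a minimal sketch of why that holds (the key names are illustrative, not the actual Transport config fields):

```python
def sorted_export(config):
    """Render a config dict as 'key: value' lines with alphabetically sorted keys."""
    return "\n".join(f"{key}: {config[key]}" for key in sorted(config))
```

With sorted keys the first line is fixed, so a startswith assertion is stable, whereas for the unsorted export the order is arbitrary and only the content can be checked.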

@sphuber
Contributor

sphuber commented Jul 10, 2024

@GeigerJ2 I was just about to approve and merge this until I saw the last commit. Why are you changing that? sort is not a built-in keyword, only a method on a list, i.e. [1, 2, 3].sort() and not sort([1, 2, 3]). So having a variable called sort is fine and is not redefining anything.

@GeigerJ2
Contributor Author

@GeigerJ2 I was just about to approve and merge this until I saw the last commit? Why are you changing that? sort is not a built-in keyword, only a method on an iterable, i.e. [1, 2, 3].sort() and not sort([1, 2, 3]). So having a variable called sort is fine and is not redefining anything.

Wanted to be extra sure not to cause any confusion ^^ I can revert it.

@GeigerJ2 GeigerJ2 force-pushed the feature/auto-computer-code-export-file branch from 9ceddf0 to de07f8f Compare July 10, 2024 11:24
@GeigerJ2 GeigerJ2 force-pushed the feature/auto-computer-code-export-file branch from de07f8f to 491ab3e Compare July 10, 2024 11:24
@sphuber sphuber merged commit 9355a98 into aiidateam:main Jul 10, 2024
11 checks passed
@sphuber
Contributor

sphuber commented Jul 10, 2024

Thanks a lot @GeigerJ2

@GeigerJ2
Contributor Author

Thanks for the merge, @sphuber! Sorry this took a bit longer than anticipated, but it was a good learning experience :)
