Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Espresso: Dynamic phonon workflow #1404

Merged
merged 110 commits into from
Jan 11, 2024

Conversation

tomdemeyere
Copy link
Contributor

@tomdemeyere tomdemeyere commented Dec 24, 2023

Requirements

Checklist

  • I have read the Contributing Guide. Don't lie! 😉
  • My PR is on a custom branch and is not named main.
  • I have used black, isort, and ruff as described in the style guide.
  • I have added relevant, comprehensive unit tests.

Notes

  • Your PR will likely not be merged without proper and thorough tests.
  • If you are an external contributor, you will see a comment from @buildbot-princeton. This is solely for the maintainers.
  • When your code is ready for review, ping one of the active maintainers.

Summary of Changes

First complex workflow with ph.x. It aims to make phonon calculations embarrassingly parallel by allowing parallelization at the representation level, requirements:

  • Takes two arguments, a static job (pw.x) and a ph.x job (ph.x)
  • quacc must be able to run 'test_runs' with both of these binaries
  • As a result, these two jobs will have variable computational loads at times. The flow should be able to run these jobs using different executors or finding a workaround to mimic that.
  • We should be able to copy a list of directories (I have made a workaround for now but it surely can be better)

Extra to-do:

  • Allowing an extra argument "blocks" which is an integer and allows the user to regroup multiple representations into one calculation, this is useful in case of limited scratch space. if "blocks" == 0 this is the extreme case where we fall back on q-point parallelization only.
  • Write tests.
  • Write docs.

@buildbot-princeton
Copy link
Collaborator

Can one of the admins verify this patch?

@Andrew-S-Rosen Andrew-S-Rosen marked this pull request as ready for review December 24, 2023 02:04
@Andrew-S-Rosen Andrew-S-Rosen marked this pull request as draft December 24, 2023 02:05
Copy link
Member

@Andrew-S-Rosen Andrew-S-Rosen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @tomdemeyere! This is already looking great. I added some preliminary comments for you (I know you're still working on it). Overall, the structure of the Flow makes sense, although there is some subtlety to deal with regarding dynamic subworkflows I can help address later (discussed below).

Some calculations will be able to be done very quickly because of symmetry, either find a way to trick QE into doing them at the end during the "recover" run or be able to specify another executor for them to do it either on the login node or a scavenger/serial HPC queue.

You can strip the decorator off a Job to turn it into a regular function via Job.__wrapped__. However, it has limitations since it requires the server/daemon/process to have access to the QE binaries, which is not always the case (e.g. if you dispatch workflows from a laptop). I generally only use that pattern if it is for simple operations that don't call a binary. There are other options. Let's come back to this later.

src/quacc/recipes/espresso/phonons.py Outdated Show resolved Hide resolved
src/quacc/calculators/espresso/espresso.py Show resolved Hide resolved
src/quacc/calculators/espresso/espresso.py Outdated Show resolved Hide resolved
src/quacc/calculators/espresso/utils.py Outdated Show resolved Hide resolved
src/quacc/recipes/espresso/phonons.py Show resolved Hide resolved
src/quacc/recipes/espresso/phonons.py Outdated Show resolved Hide resolved
@Andrew-S-Rosen Andrew-S-Rosen changed the title Espresso: first complex phonon flow Espresso: Dynamic phonon workflow Dec 24, 2023
@tomdemeyere
Copy link
Contributor Author

Thanks for your comments, this should be the last "core" PR for the espresso calc. If everything is working as intended, I will start to actually run production calculations using quacc. Other recipes and bug fixes should then come naturally without requiring much core changes

Copy link

codecov bot commented Dec 24, 2023

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (9db9cf8) 99.03% compared to head (4612d2f) 98.76%.
Report is 8 commits behind head on main.

❗ Current head 4612d2f differs from pull request most recent head 863926e. Consider uploading reports for the commit 863926e to get more accurate results

Files Patch % Lines
src/quacc/calculators/espresso/espresso.py 57.14% 9 Missing ⚠️
src/quacc/utils/files.py 93.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1404      +/-   ##
==========================================
- Coverage   99.03%   98.76%   -0.28%     
==========================================
  Files          79       79              
  Lines        2916     2994      +78     
==========================================
+ Hits         2888     2957      +69     
- Misses         28       37       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tomdemeyere
Copy link
Contributor Author

OK, so:

  • All three problems are fixed. By default the function is set to put performance first, at the expense of (probably tons) of disk space. This balance can be fine tuned in many ways and this will be described in the doc.

  • I had to change quite a lot of things, I will also open a ASE PR and ping you if it's ok with you.

I just added a test for the copy_decompress_tree

Copy link
Member

@Andrew-S-Rosen Andrew-S-Rosen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomdemeyere: Great, this is looking good. Still one issue with your @flow pattern. Let me know if you would like assistance with it.

Will wait for the ASE MR and then hopefully this will pass and we can get to merging it when you're ready.

src/quacc/recipes/espresso/phonons.py Outdated Show resolved Hide resolved
@tomdemeyere tomdemeyere mentioned this pull request Jan 9, 2024
5 tasks
src/quacc/recipes/espresso/phonons.py Outdated Show resolved Hide resolved
src/quacc/recipes/espresso/phonons.py Outdated Show resolved Hide resolved
@tomdemeyere
Copy link
Contributor Author

Can/should the internal functions be located in another file? Users will not be interested by that and/or this might confuse them?

@Andrew-S-Rosen
Copy link
Member

Andrew-S-Rosen commented Jan 10, 2024

Can/should the internal functions be located in another file? Users will not be interested by that and/or this might confuse them?

Right now, I have all internal functions that are needed for recipes located inside their respective recipe, an example of which can be found here. It would probably be wise to do the same with _grid_phonon_subflow.

Whether it, more generally would be valuable to move them to some other module is a good question. The pros are that the recipe becomes easier to read. The con is that the relationship between the internal function and its corresponding recipe becomes less clear. Personally, I would probably suggest moving _grid_phonon_subflow inside and then we can consider alternate ideas afterwards if there is a good proposal. You can drop the docstring for brevity, if preferred.

I am open to alternate suggestions. Edit: Perhaps one could make a _phonons.py and move them there? What are your thoughts given these points?

@tomdemeyere
Copy link
Contributor Author

tomdemeyere commented Jan 10, 2024

Edit: Perhaps one could make a _phonons.py and move them there? What are your thoughts given these points?

For now, I moved them both inside as you suggested. I am not sure what is the best choice here.

This should be close to go! To my knowledge quacc is now the only package that allows running such flexible ph.x grid-parallelization. If file transfer between machines becomes routine this will allow to run very large phonon calculations

EDIT: @Andrew-S-Rosen Good to go

@Andrew-S-Rosen
Copy link
Member

Andrew-S-Rosen commented Jan 10, 2024

FYI. All looks good on several tested workflow engines! I will try to merge this sometime tomorrow. I don't anticipate that anything else will be needed from you on this. [The codecov test will fail because of the coverage, but there has been substantial enough work on this that I feel okay with merging it after any final edits.]

Sidenote: This only works on the newest (0.232.0) release of Covalent because your workflow wrecked their memory usage on the older versions. The newest version handles this better, but it does not install with pip install quacc[covalent] due to a dependency conflict. I also tested this on Parsl and it seems to work fine there. I anticipate it will work fine on Dask, Redun, and Prefect (to be added soon).

image
image

@tomdemeyere
Copy link
Contributor Author

FYI. All looks good on several tested workflow engines! I will try to merge this sometime tomorrow. I don't anticipate that anything else will be needed from you on this. [The codecov test will fail because of the coverage, but there has been substantial enough work on this that I feel okay with merging it after any final edits.]

Sidenote: This only works on the newest (0.232.0) release of Covalent because your workflow wrecked their memory usage on the older versions. The newest version handles this better, but it does not install with pip install quacc[covalent] due to a dependency conflict. I also tested this on Parsl and it seems to work fine there. I anticipate it will work fine on Dask, Redun, and Prefect (to be added soon).

image

image

Amazing! After the merge, I will run a first example using HPC. (Possibly using Covalent, except if the 'electron Dict' executor thing is not fixed. In which case I will fall back on another workflow engine with an HPC executor)

When that works, and after improving the documentation I will start to share my quacc work (Blog post and probably on the QE mailing list)

@Andrew-S-Rosen
Copy link
Member

Alright! I'm going to merge this now for the sake of both of us! 😅 Thank you again for your really nice work on this!!

Amazing! After the merge, I will run a first example using HPC. (Possibly using Covalent, except if the 'electron Dict' executor thing is not fixed. In which case I will fall back on another workflow engine with an HPC executor)

My personal suggestion is that Covalent will not be the most efficient way to run this type of workflow, but only you'll be able to judge that for sure given the many intricacies of HPC policies.

When that works, and after improving the documentation I will start to share my quacc work (Blog post and probably on the QE mailing list)

Awesome!

@Andrew-S-Rosen Andrew-S-Rosen merged commit 03c0e5e into Quantum-Accelerators:main Jan 11, 2024
16 of 17 checks passed
@Andrew-S-Rosen
Copy link
Member

Andrew-S-Rosen commented Jan 11, 2024

@tomdemeyere: Was test_run=True supposed to be included` here?

"ph_init_job": recursive_dict_merge(
{"input_data": {"inputph": {"lqdir": True, "only_init": True}}},
job_params["ph_job"],

@tomdemeyere
Copy link
Contributor Author

@tomdemeyere: Was test_run=True supposed to be included` here?

"ph_init_job": recursive_dict_merge(
{"input_data": {"inputph": {"lqdir": True, "only_init": True}}},
job_params["ph_job"],

No, it is entirely replaced by this “only_init” keyword

Andrew-S-Rosen added a commit that referenced this pull request Jan 11, 2024
## Requirements

### Checklist

- [ ] I have read the [Contributing
Guide](https://quantum-accelerators.github.io/quacc/dev/contributing.html).
Don't lie! 😉
- [ ] My PR is on a custom branch and is _not_ named `main`.
- [ ] I have used `black`, `isort`, and `ruff` as described in the
[style
guide](https://quantum-accelerators.github.io/quacc/dev/contributing.html#style).
- [ ] I have added relevant, comprehensive [unit
tests](https://quantum-accelerators.github.io/quacc/dev/contributing.html#unit-tests).

### Notes

- Your PR will likely not be merged without proper and thorough tests.
- If you are an external contributor, you will see a comment from
[@buildbot-princeton](https://github.com/buildbot-princeton). This is
solely for the maintainers.
- When your code is ready for review, ping one of the [active
maintainers](https://quantum-accelerators.github.io/quacc/about/contributors.html#active-maintainers).

## Summary of Changes

This should also improve the missing coverage from #1404

---------

Co-authored-by: Andrew S. Rosen <[email protected]>
@tomdemeyere tomdemeyere deleted the grid_recipe branch November 15, 2024 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants