Add CLI command to dump inputs/outputs of CalcJob/WorkChain #6276

Merged: 30 commits, May 27, 2024

Commits on May 27, 2024

  1. CLI: Add dumping functionality.

    GeigerJ2 authored and sphuber committed May 27, 2024
    6ebfe8d
  2. CLI: Add dumping functionality.

    GeigerJ2 authored and sphuber committed May 27, 2024
    d8bb818
  3. 19922fd
  4. Echo missing plugin for --use-presubmit

    GeigerJ2 authored and sphuber committed May 27, 2024
    863cc61
  5. Some cleanup and refactor

    - Reverted formatting changes in `cmd_process`
    - Removed unnecessary comments
    - Removed `:type` if type annotations present
    - `no_node_inputs` -> `include_inputs` to be consistent with
      `include_attributes`/`include_extras`
    - Don't set default path (rather than pwd) and check for None
    GeigerJ2 authored and sphuber committed May 27, 2024
    c2244af
  6. Big refactor to make code more concise

    - Remove `verdi calcjob dump` endpoint (not sure about this, might add
      again)
    - Only call `process_dump` -> Adapted the function for this
    - Removed conditional in `generate_node_input_label`
    GeigerJ2 authored and sphuber committed May 27, 2024
    e369359
  7. Add first version of --flat option.

    This option allows dumping of a CalcJob (or of a simple WorkChain that
    only calls a single CalcJob) in a flat directory, without creating the
    hierarchy of `raw_inputs`, `raw_outputs`, and `node_inputs`. This might
    be useful for cases where AiiDA is only used to run a calculation and
    dump the results in a specific custom path, where the custom path is
    dictated by the other code that calls AiiDA to submit calculations
    (e.g. what we are currently working on in aiida-koopmans).
    GeigerJ2 authored and sphuber committed May 27, 2024
    f5adf17
  8. Shelved --use-presubmit option

    GeigerJ2 authored and sphuber committed May 27, 2024
    1075416
  9. Updated tests for calcjob_dumps after code changes

    The different calcjob_dump io functions now run through. Also added
    tests for the flat option for these functions. Still need to update
    tests for the arithmetic_add and process_dump functions, both for io
    and for multiply_add.
    GeigerJ2 authored and sphuber committed May 27, 2024
    671538c
  10. Finalized tests apart from YAML dumping

    - Moved both `generate_..._io` functions to the end of the file
    - Extended `generate_workchain_io` fixture to allow adding multiple
      `calcjob_nodes` to test that the flat dumping breaks in that case
    - Currently, when dumping the MultiplyAddWorkChain flat, the
      `source_file` of the `multiply` step is missing -> Still need to
      figure that one out
    GeigerJ2 authored and sphuber committed May 27, 2024
    009b2aa
  11. 5b0e97f
  12. Naming: dump -> process_dump in cmd_process

    To be consistent with other commands in `cmd_process`.
    Accordingly, renamed `process_dump` and `calcjob_dump` in `processes.py`
    to `process_node_dump` and `calcjob_node_dump`.
    GeigerJ2 authored and sphuber committed May 27, 2024
    5dac992
  13. Moved logic for calcjob_io_paths to own function

    Now in function `generate_calcjob_io_dump_paths`. Takes care of handling
    the `flat` argument and the naming of the `raw_inputs`, `raw_outputs`,
    and `node_inputs` subdirectories.
    GeigerJ2 authored and sphuber committed May 27, 2024
    9b0877d
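
A minimal sketch of how such a path helper might handle the `flat` argument. Only the function name and the three subdirectory names come from the commit message; the signature, the return type, and the exact flat behaviour are assumptions for illustration, not the code from this PR.

```python
from pathlib import Path


def generate_calcjob_io_dump_paths(
    calcjob_io_dump_path: Path,
    flat: bool = False,
    subdir_names: tuple = ("raw_inputs", "raw_outputs", "node_inputs"),
) -> list:
    """Return one target directory per kind of CalcJob content.

    Hypothetical signature. With ``flat=True`` every kind of content is
    written directly into ``calcjob_io_dump_path``; otherwise each kind
    gets its own subdirectory.
    """
    if flat:
        return [calcjob_io_dump_path for _ in subdir_names]
    return [calcjob_io_dump_path / name for name in subdir_names]


# Hierarchical layout: dump/raw_inputs, dump/raw_outputs, dump/node_inputs
nested_paths = generate_calcjob_io_dump_paths(Path("dump"))

# Flat layout: all three kinds of content share the same directory
flat_paths = generate_calcjob_io_dump_paths(Path("dump"), flat=True)
```

Keeping this decision in one helper means the callers do not need their own `if flat:` branches.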
  14. test

    Changed `--flat` option to still create subdirectories for the
    individual steps of the WorkChain. Instead, just the subdirectories
    per CalcJob are removed.
    
    Generalized the dumping of outputs so that it doesn't only dump
    `retrieved`. With this, it dumps a whole range of AiiDA nodes,
    basically all the parsed outputs, which are mainly numpy arrays dumped
    as `.npy` files. Add an option to enable this, as it might not be
    necessary to dump all of those. Currently, I just defined a global
    variable in the file, but this will eventually become a class attribute
    of the ProcessDumper class.
    GeigerJ2 authored and sphuber committed May 27, 2024
    af4a03e
  15. ♻️ First working OOP-version using ProcessDumper

    To avoid having to pass all the arguments
    `include_node_inputs`, `include_attributes/extras`, `overwrite`, `flat`,
    `all_aiida_nodes` through the different functions, everything related to
    the dumping is now compiled in the `ProcessDumper` class, which defines
    the main entry-point method `dump`. For nested workflows, this is
    recursively called (as before). Once `CalculationFunction` nodes are
    reached, their content is dumped via `dump_calculation_node`. The
    helper functions to create and validate labels and paths of nested
    subdirectories are also methods of the `ProcessDumper`.
    
    Introduced the `parent_process` class attribute which is dynamically
    generated from the parent_node, and which is used to generate the main
    README, which is only created when the dumping is done via the `verdi`
    CLI. For the other functions, this concept does not make sense, due to
    the recursion, so the respective `process_node`s (which are changing
    during the recursion) are always passed as arguments.
    
    Next steps:
    - Update tests to actually test the new implementations
    - Update docstrings
    - Add section to `How to work with data` section of the docs
    - If the `OverridableOptions` are only used here, they can also just be
      defined as normal `click` options (however, we can also start thinking
      about the `verdi archive dump` functionality that we should start
      implementing soon)
    GeigerJ2 authored and sphuber committed May 27, 2024
    a1930cb
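
The design described in this commit (options stored once on a dumper object, a recursive `dump` entry point, and a separate method for calculation nodes) can be sketched roughly as follows. This is not the AiiDA implementation: `FakeNode`, `ProcessDumperSketch`, the numbered directory labels, and the file names are hypothetical stand-ins so that the example runs without an AiiDA profile.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class FakeNode:
    """Hypothetical stand-in for a process node; no children means a calculation."""

    label: str
    children: list[FakeNode] = field(default_factory=list)


class ProcessDumperSketch:
    """Stores the dumping options once instead of passing them through every call."""

    def __init__(self, include_inputs: bool = True, flat: bool = False) -> None:
        self.include_inputs = include_inputs
        self.flat = flat

    def dump(self, node: FakeNode, output_path: Path) -> None:
        """Entry point, called recursively for nested workflows."""
        output_path.mkdir(parents=True, exist_ok=True)
        if not node.children:
            self.dump_calculation_node(node, output_path)
            return
        for index, child in enumerate(node.children, start=1):
            # Assumed labelling scheme: one numbered subdirectory per step
            self.dump(child, output_path / f"{index:02d}-{child.label}")

    def dump_calculation_node(self, node: FakeNode, output_path: Path) -> None:
        """Write placeholder files for one calculation, honouring the options."""
        kinds = ["outputs"]
        if self.include_inputs:
            kinds.append("node_inputs")
        for kind in kinds:
            # With flat=True the per-calculation subdirectories are skipped
            target = output_path if self.flat else output_path / kind
            target.mkdir(parents=True, exist_ok=True)
            (target / f"{kind}.txt").write_text(node.label)


tree = FakeNode("multiply_add", [FakeNode("multiply"), FakeNode("add")])
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dump"
    ProcessDumperSketch(flat=True).dump(tree, root)
    dumped = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.txt"))
```

Because the options live on the instance, the recursive call only needs the node and the target path, which is the motivation given in the commit message.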
  16. f7a3f00
  17. fc3c181
  18. Fix check for CalculationNode in dump.

    GeigerJ2 authored and sphuber committed May 27, 2024
    b5a34a9
  19. [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci
    pre-commit-ci[bot] authored and sphuber committed May 27, 2024
    b5e1a4e
  20. Fix annotations for 3.9 test suite

    GeigerJ2 authored and sphuber committed May 27, 2024
    5464a71
  21. Fix dump_node_yaml before CalcJob dump

    GeigerJ2 authored and sphuber committed May 27, 2024
    07ac0e1
  22. a62f73c
  23. [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci
    pre-commit-ci[bot] authored and sphuber committed May 27, 2024
    8db2e05
  24. Final cleanup

    GeigerJ2 authored and sphuber committed May 27, 2024
    1cbe414
  25. Add _workflow_dump and change link node dumping

    Moved the recursive logic out of the top-level `dump` function and
    into `_workflow_dump`. In addition, moved the default path creation and
    validation into the top-level `dump` function and out of the
    `cmd_process.py` file.
    
    The following entities are now dumped for each child `CalculationNode`
    reached during the dumping:
    - `CalculationNode` repository -> `inputs`
    - `CalculationNode` retrieved output -> `outputs`
    - `CalculationNode` input nodes -> `node_inputs`
    - `CalculationNode` output nodes (apart from `retrieved`)
      -> `node_outputs`
    By default, everything apart from the `node_outputs` is dumped, so as to
    avoid writing too many nodes other than `SinglefileData` or `FolderData`
    to disk. The `--all-aiida-nodes` option is removed instead. The number
    of files might still grow large for complex workchains, e.g.
    `SelfConsistentHubbardWorkchain` or `EquationOfStateWorkChain`.
    
    In addition, set `_generate_default_dump_path`, `_generate_readme`, and
    `_generate_child_node_label` as `staticmethod`s, as they logically
    belong to the class, but don't access any of its attributes. The first
    two are further only called in the top-level `dump` method. Other
    methods like `_validate_make_dump_path` still access instance attributes
    like `overwrite` or `flat`, so they remain regular instance methods.
    GeigerJ2 authored and sphuber committed May 27, 2024
    ae9a912
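
As an illustration of the defaults described in this commit, the sketch below selects which of the four subdirectories get written and shows a label helper declared as a `staticmethod` because it reads no instance state. The class name `DumpSelection`, the flag names, and the label format are assumptions; only the directory names and the rule that `node_outputs` is off by default come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class DumpSelection:
    """Hypothetical container for the per-calculation dumping choices."""

    include_inputs: bool = True    # input nodes -> node_inputs
    include_outputs: bool = False  # output nodes apart from `retrieved` -> node_outputs

    def subdirectories(self) -> list:
        """Return the subdirectories written for one calculation node."""
        # Repository content (`inputs`) and retrieved files (`outputs`)
        # are always dumped in this sketch.
        subdirs = ["inputs", "outputs"]
        if self.include_inputs:
            subdirs.append("node_inputs")
        if self.include_outputs:
            subdirs.append("node_outputs")
        return subdirs

    @staticmethod
    def generate_child_node_label(index: int, process_label: str) -> str:
        """Build a directory label; a staticmethod since it needs no attributes."""
        return f"{index:02d}-{process_label}"


default_subdirs = DumpSelection().subdirectories()
all_subdirs = DumpSelection(include_outputs=True).subdirectories()
step_label = DumpSelection.generate_child_node_label(1, "multiply")
```

Leaving `node_outputs` disabled by default keeps the number of files on disk smaller for complex workchains, which is the concern raised in the commit message.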
  26. Updated tests

    GeigerJ2 authored and sphuber committed May 27, 2024
    ac2acb4
  27. deb8867
  28. 8cf53af
  29. Update documentation

    GeigerJ2 authored and sphuber committed May 27, 2024
    74facc2
  30. 497b14c