Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EKO Rust reader v2 #400

Merged
merged 46 commits into from
Aug 22, 2024
Merged

EKO Rust reader v2 #400

merged 46 commits into from
Aug 22, 2024

Conversation

felixhekhorn
Copy link
Contributor

@felixhekhorn felixhekhorn commented Aug 12, 2024

Second attempt of #97 and restart of #260

  • comparison to EKO Rust reader #260:
    • I started from the operator loading and not the metadata loading, so the former is now available the latter not
    • serde_yaml used there and in pineappl is dead, the maybe-successor serde_yml is still 0.0 which I didn't dare to use and so I opted of yaml-rust2
  • relation to Improve runner API #296: this is kept in mind, i.e. we allow to read an already opened eko from dir
  • relation to Lazy loading for rust #97:
    • tools:
    • implementation goals:
      • operator loading ✔️
      • metadata loading: TODO - if we want to do small PRs can even be done later ...
      • "operators headers syncer": I don't know what is meant
  • as said by @alecandido it's not wise to maintain two implementation (both in Rust and Python). The structure using inventories is a direct mirror of the Python side to keep this idea in mind. The actual Python interface will not be in this PR, but possibly in a later.
  • that being said: I intend to have dekoder as a pure Rust package and put the Python interface somewhere else (e.g. into eko). This follows another idea of @alecandido to keep a dedicated Python interface layer to not clutter the actual library (see also pineappl and pineappl_py) (or any additional interfaces)
  • a dedicated memory management (as was the target of First jets implementation #242) is not necessary here, as we can rely on the Rust features: i.e. it is the users responsibility to explicitly keep the 4D object in scope as long as he needs (or else Rust will clean it up). This idea leads to the following structure: Rust passes ownership on return (see e283553)
  • The short term goal is to expose sufficient features for PineAPPL (not reached yet here and may be done in a second PR). One consequence is that we will of course preserve our idea of EvolutionPoint and PineAPPL will have to deal with that

@felixhekhorn felixhekhorn added rust Rust extension related output Output format and management labels Aug 12, 2024
@felixhekhorn
Copy link
Contributor Author

I'm now ready for some feedback - someone please?. While @alecandido would be an ideal candidate we can not assume something on his availability - so @cschwan could you please take a look? e.g. on my usage of Rust 🙃

@felixhekhorn felixhekhorn marked this pull request as ready for review August 20, 2024 08:36
crates/dekoder/tests/data/v0.15.tar Outdated Show resolved Hide resolved
crates/dekoder/tests/test_load.rs Outdated Show resolved Hide resolved
crates/dekoder/src/lib.rs Outdated Show resolved Hide resolved
crates/dekoder/src/inventory.rs Outdated Show resolved Hide resolved
crates/dekoder/src/inventory.rs Outdated Show resolved Hide resolved
crates/dekoder/src/inventory.rs Outdated Show resolved Hide resolved
@alecandido
Copy link
Member

I will review again the Ok(another_result?) pattern, since there should also be another way to convert between results.

As the amount of characters, that's clearly one of the easiest. But it's still a bit less explicit, because conceptually they're two operations instead of a single one (though I hope they're getting optimized to the same binary).

In any case, it's really a detail. It's just for better understanding the mechanics of ?, rather than for a clear advantage.

@felixhekhorn
Copy link
Contributor Author

I didn't tested, but I trust you that that part was already covered by you :)

the " ✔️ "* does (in case I forget) 🙃 (* in the inline preview the thing is properly green, but not in the preview tab ...)

@alecandido
Copy link
Member

(* in the inline preview the thing is properly green, but not in the preview tab ...)

This just depends on the font rendering the emoji xD

@alecandido
Copy link
Member

alecandido commented Aug 21, 2024

Ah, just for some more fun, the :white_check_mark: is actually a green box, with GitHub's webfont ✅

@alecandido
Copy link
Member

alecandido commented Aug 21, 2024

Did you discover any problems in the past related to YAML - if not in EKO, then in any NNPDF project? Otherwise this sounds like bikeshedding.

@cschwan I hit myself the Norway problem in a project within NNPDF (don't remember exactly which and when).
And PyYAML caused a lot of problems with object serialization (it tries to serialize unknown objects as custom types - unluckily, NumPy scalars were also custom objects, as 0-rank arrays, so we needed a lot of float(array[idx])) - while now is causing problem for lack of reliable libraries in Rust.

Outside NNPDF, we got a huge speed-up by dropping YAML in favor of JSON (parsers are much simpler and much more optimized).

(moreover, the main benefit of YAML is being human-readable - and writable - and I've seen it used a lot for things that were supposed not to be touched by humans, like here in EKO...)

Copy link
Member

@alecandido alecandido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the last @cschwan review, but they are all tiny improvements (that I'd recommend applying).

Other than that, the PR should be fine, for EKOs containing integration errors.
As I said, most of the EKO (if not all those produced for FK tables) should not contain the errors, since we switched the default to optimize disk and memory consumption (these errors are never used in the pineline). So, in practice, I expect dekoder not to be ready for production yet.

In any case, the npz/npy problem could be faced in another PR, possibly changing at the same time the format, getting rid of the npz.
This would be a breaking data format change, so, in principle, it should be scheduled for a major release, and even a data format version bump. However, as said above, the EKOs that are going to be broken I expect to be unused, and you can check it quickly on existing EKOs with something like tar -t myeko.tar | grep npz. I'd suggest doing this, before breaking something to @scarlehoff :P

@felixhekhorn felixhekhorn merged commit 3bfc89d into master Aug 22, 2024
7 checks passed
@felixhekhorn felixhekhorn deleted the rust-reader-v2 branch August 22, 2024 07:59
@felixhekhorn felixhekhorn mentioned this pull request Aug 22, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
output Output format and management rust Rust extension related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants