Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement cross-platform trajectory output #108

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

jmineau
Copy link
Member

@jmineau jmineau commented Sep 17, 2024

resolves #107

I provided functionality to write trajectory output to HDF5 output. In my small, local test, h5 files were marginally larger than rds.

This should make it much easier to read the trajectories with languages like python, but leaves the default as rds as most folks are just interested in the footprint which is already netcdf.

I provide some example python code for reading in the h5 trajectory in the documentation.

@jmineau
Copy link
Member Author

jmineau commented Sep 18, 2024

I chose to do a check for rhdf5 in write_traj instead of including it in dependencies.r because of the need for bioconductor to install:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("rhdf5")

@jmineau jmineau changed the title trajectory output - hdf5 implement cross-platform trajectory output Sep 20, 2024
@jmineau
Copy link
Member Author

jmineau commented Sep 20, 2024

  • Implement parquet option per Ben's suggestion in issue 107
  • Also included the option of setting trajec_fmt <- '' to disable writing trajectory output

@benfasoli
Copy link
Contributor

What do you see as the value in supporting multiple trajectory output formats for users?

Choosing a single format that supports interoperability with other languages makes sense (albeit a breaking change), but supporting I/O compatibility for multiple trajectory file formats adds maintenance burden and additional dependencies to manage. If the benefit is mostly to support user preference, then IMHO it may be better to choose a single STILT-sanctioned output format and make it users' responsibility to convert between STILT's and their own preferred conventions.

@jmineau
Copy link
Member Author

jmineau commented Sep 23, 2024

I suppose I was following the framework of write_footprint.r where the output formats have been implemented for netcdf, csv, and rds, but only netcdf is documented.

I hadn't really intended this to be a breaking change. Since it is an R package, I think it is reasonable that the intermediate output is R-specific (i.e. keeping rds as the default). Though I do believe providing the option for interoperability is important.

Your point on maintenance burden makes sense. I think it is reasonable to remove the h5 implementation, but would argue that leaving rds add no additional cost to this burden. Parquet is considered "stable" as of arrow 1.0.0, so hopefully maintenance is minimal.

I like the additional option of disabling output with '' as likely most users are only interested in the footprints and the trajectories take up disk space - though this prevents being able to rerun footprints.

Ultimately, I lean towards defaulting to rds to prevent a breaking change, while having parquet and '' as documented options. I don't have a problem with only outputting parquet files, but I'm biased as I prefer Python over R. Maybe users like @John-C-Lin, @tartanrunner25, or other R folks coming from Lin Group like Kai (don't know github tag) or @wde0924 might prefer the rds output.

Happy to make changes.

particle <- arrow::arrow_table(traj$particle)

# Write receptor metadata
write_meta(particle$metadata, 'receptor', traj$receptor)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check if this works with multiple receptors. Didn't realize we could pass multiple receptors to simulation_step via stilt_cli

@jmineau
Copy link
Member Author

jmineau commented Oct 2, 2024

My feelings remain the same for the origin repo, but I think I'm going to move forward with a simplified version of the cross-platform output in my own fork. I only provide the option for trajectory output to parquet files. I am also writing the stilt input including the receptor information to a json, for easy machine parsing later. Finally, I added the receptor information to the footprint netcdf attributes.

I'm a bit confused by stilt_cli's ability to pass multiple receptors to simulation_step so I have not tested that (I also have not used the cli - it seems like even though you can pass multiple receptors, HYSPLIT still only outputs one trajectory file?).

Feel free to incorporate the changes you feel are relevant.

@benfasoli
Copy link
Contributor

I'm a bit confused by stilt_cli's ability to pass multiple receptors to simulation_step so I have not tested that (I also have not used the cli - it seems like even though you can pass multiple receptors, HYSPLIT still only outputs one trajectory file?).

Are you thinking for simultaneous release evenly distributed along a line (e.g. a column receptor)? I think that's the only multi-receptor option and a single trajectory file makes sense in that context.

Feel free to incorporate the changes you feel are relevant.

I'm not a maintainer of STILT anymore, so up to you and @John-C-Lin what direction you want to take it. Just chiming in for support and context.

jmineau and others added 9 commits October 3, 2024 14:03
* relative linking

* no need for stilt_wd as parameter to link_files
* num hours per met file

* only call find_met_files twice if met_subgrid is enabled

* implement met_hours using n_hours_per_met_file

* call unique on request and output

* add n_hours_per_met_file param to stilt_cli

* n_hours_per_met_file guidance

---------

Co-authored-by: John-C-Lin <[email protected]>
* processes_per_node

* processes per node docs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cross-platform trajectory output
2 participants