implement cross-platform trajectory output #108

jmineau · 2024-09-17T23:44:56Z

resolves #107

I provided functionality to write trajectory output to HDF5 output. In my small, local test, h5 files were marginally larger than rds.

This should make it much easier to read the trajectories with languages like python, but leaves the default as rds as most folks are just interested in the footprint which is already netcdf.

I provide some example python code for reading in the h5 trajectory in the documentation.

jmineau · 2024-09-18T00:06:24Z

I chose to do a check for rhdf5 in write_traj instead of including it in dependencies.r because of the need for bioconductor to install:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("rhdf5")

jmineau · 2024-09-20T20:15:50Z

Implement parquet option per Ben's suggestion in issue 107
Also included the option of setting trajec_fmt <- '' to disable writing trajectory output

benfasoli · 2024-09-23T19:53:59Z

What do you see as the value in supporting multiple trajectory output formats for users?

Choosing a single format that supports interoperability with other languages makes sense (albeit a breaking change), but supporting I/O compatibility for multiple trajectory file formats adds maintenance burden and additional dependencies to manage. If the benefit is mostly to support user preference, then IMHO it may be better to choose a single STILT-sanctioned output format and make it users' responsibility to convert between STILT's and their own preferred conventions.

jmineau · 2024-09-23T22:53:43Z

I suppose I was following the framework of write_footprint.r where the output formats have been implemented for netcdf, csv, and rds, but only netcdf is documented.

I hadn't really intended this to be a breaking change. Since it is an R package, I think it is reasonable that the intermediate output is R-specific (i.e. keeping rds as the default). Though I do believe providing the option for interoperability is important.

Your point on maintenance burden makes sense. I think it is reasonable to remove the h5 implementation, but would argue that leaving rds add no additional cost to this burden. Parquet is considered "stable" as of arrow 1.0.0, so hopefully maintenance is minimal.

I like the additional option of disabling output with '' as likely most users are only interested in the footprints and the trajectories take up disk space - though this prevents being able to rerun footprints.

Ultimately, I lean towards defaulting to rds to prevent a breaking change, while having parquet and '' as documented options. I don't have a problem with only outputting parquet files, but I'm biased as I prefer Python over R. Maybe users like @John-C-Lin, @tartanrunner25, or other R folks coming from Lin Group like Kai (don't know github tag) or @wde0924 might prefer the rds output.

Happy to make changes.

jmineau · 2024-09-25T07:42:58Z

r/src/write_traj.r

+    particle <- arrow::arrow_table(traj$particle)
+
+    # Write receptor metadata
+    write_meta(particle$metadata, 'receptor', traj$receptor)


Need to check if this works with multiple receptors. Didn't realize we could pass multiple receptors to simulation_step via stilt_cli

jmineau · 2024-10-02T22:38:28Z

My feelings remain the same for the origin repo, but I think I'm going to move forward with a simplified version of the cross-platform output in my own fork. I only provide the option for trajectory output to parquet files. I am also writing the stilt input including the receptor information to a json, for easy machine parsing later. Finally, I added the receptor information to the footprint netcdf attributes.

I'm a bit confused by stilt_cli's ability to pass multiple receptors to simulation_step so I have not tested that (I also have not used the cli - it seems like even though you can pass multiple receptors, HYSPLIT still only outputs one trajectory file?).

Feel free to incorporate the changes you feel are relevant.

benfasoli · 2024-10-03T00:50:49Z

I'm a bit confused by stilt_cli's ability to pass multiple receptors to simulation_step so I have not tested that (I also have not used the cli - it seems like even though you can pass multiple receptors, HYSPLIT still only outputs one trajectory file?).

Are you thinking for simultaneous release evenly distributed along a line (e.g. a column receptor)? I think that's the only multi-receptor option and a single trajectory file makes sense in that context.

Feel free to incorporate the changes you feel are relevant.

I'm not a maintainer of STILT anymore, so up to you and @John-C-Lin what direction you want to take it. Just chiming in for support and context.

* relative linking * no need for stilt_wd as parameter to link_files

* num hours per met file * only call find_met_files twice if met_subgrid is enabled * implement met_hours using n_hours_per_met_file * call unique on request and output * add n_hours_per_met_file param to stilt_cli * n_hours_per_met_file guidance --------- Co-authored-by: John-C-Lin <[email protected]>

* processes_per_node * processes per node docs

trajectory output - hdf5

42f5d65

jmineau mentioned this pull request Sep 17, 2024

cross-platform trajectory output #107

Open

parquet trajectory output

1f6509f

jmineau changed the title ~~trajectory output - hdf5~~ implement cross-platform trajectory output Sep 20, 2024

jmineau added 2 commits September 20, 2024 14:25

remove 't' from 'traject_fmt'

e996f04

add note about installing arrow

ee05e64

need to split particle_error before reassigning particle

5ab1d56

jmineau commented Sep 25, 2024

View reviewed changes

jmineau and others added 9 commits October 3, 2024 14:03

relative linking (uataq#105)

5b31c43

* relative linking * no need for stilt_wd as parameter to link_files

Slurm processes_per_node (uataq#103)

bc45316

* processes_per_node * processes per node docs

trajectory output - hdf5

4d782c7

parquet trajectory output

27b1976

remove 't' from 'traject_fmt'

b6875aa

add note about installing arrow

f9ecced

need to split particle_error before reassigning particle

9c1a61d

rebase upstream main into traj_output

8c838e0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

implement cross-platform trajectory output #108

implement cross-platform trajectory output #108

jmineau commented Sep 17, 2024

jmineau commented Sep 18, 2024

jmineau commented Sep 20, 2024 •

edited

Loading

benfasoli commented Sep 23, 2024

jmineau commented Sep 23, 2024

jmineau Sep 25, 2024

jmineau commented Oct 2, 2024

benfasoli commented Oct 3, 2024

implement cross-platform trajectory output #108

Are you sure you want to change the base?

implement cross-platform trajectory output #108

Conversation

jmineau commented Sep 17, 2024

jmineau commented Sep 18, 2024

jmineau commented Sep 20, 2024 • edited Loading

benfasoli commented Sep 23, 2024

jmineau commented Sep 23, 2024

jmineau Sep 25, 2024

Choose a reason for hiding this comment

jmineau commented Oct 2, 2024

benfasoli commented Oct 3, 2024

jmineau commented Sep 20, 2024 •

edited

Loading