Replies: 5 comments 4 replies
-
It's a good question, and I agree that it would be nice to have a clearer recommendation. I might need to think about it more, but I think of two or three situations:
So the code/script storage needs might differ between hubs. In any case, I don't think we should be restrictive about code: it's often small files, and it should not cause any issues as long as it is stored in an "expected" folder and its location is documented. But to answer your two points:
That's my first idea, but I might have missed something.
-
I am basically in agreement with @LucieContamin, but I think there's an important broader question of philosophy here. My perspective comes from some history of the development of Internet protocols, such as the Robustness Principle: "be conservative in what you do, be liberal in what you accept from others". In this view, the Hubverse should insist upon a relatively minimal set of required behaviors while still making suggestions as to what we think will work well. This can be done through requirement-level phrasing of the kind used in Internet protocol development, where terms like "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", "MAY", and others distinguish between what is required and what is not. Thus, if we believe that it's best not to include code in the hub file space, we might say that hubs "SHOULD NOT" do this, suggesting that "there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full ..."
As per Lucie's point, we could certainly say that they "MUST NOT" store code in the model_output folder.
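To make the requirement-level idea concrete, here is a minimal sketch of how a hub check could treat the two levels differently: a "MUST NOT" violation is an error, while a "SHOULD NOT" violation is only a warning. This is an illustration, not an existing hubverse tool. The function names, the list of code file extensions, and the choice to treat everything outside model_output as "SHOULD NOT" are all assumptions made for the example.

```python
from enum import Enum
from pathlib import PurePosixPath


class Level(Enum):
    MUST_NOT = "MUST NOT"      # hard requirement: violation is an error
    SHOULD_NOT = "SHOULD NOT"  # recommendation: violation is a warning
    MAY = "MAY"                # no restriction


# Assumption: which file extensions count as "code" for this example.
CODE_SUFFIXES = {".py", ".r", ".sh", ".jl"}


def classify(path: str) -> Level:
    """Return the requirement level that applies to a file in the hub repo."""
    p = PurePosixPath(path)
    if p.suffix.lower() not in CODE_SUFFIXES:
        return Level.MAY  # not code, so the code rules do not apply
    if p.parts and p.parts[0] == "model_output":
        return Level.MUST_NOT  # code inside model_output is never allowed
    return Level.SHOULD_NOT  # code elsewhere is discouraged but permitted


def check(paths):
    """Split paths into hard errors (MUST NOT) and soft warnings (SHOULD NOT)."""
    errors = [p for p in paths if classify(p) is Level.MUST_NOT]
    warnings = [p for p in paths if classify(p) is Level.SHOULD_NOT]
    return errors, warnings
```

For example, `check(["model_output/team-a/fit.R", "scripts/make_target.py"])` would report the first path as an error and the second as a warning, which is the "liberal in what you accept" behavior described above.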
-
As usual, I really like @harryhoch's perspective (am thinking here of the Robustness Principle and prior comments about extensibility). I also agree with @LucieContamin's notes. When considering a MUST NOT vs SHOULD NOT recommendation about where to store code/scripts, a factor to consider is the multiple functions of a Hub's repo:
The third item adds moving parts that could introduce confusion to the first two use cases. For example, if the test suite for the target data code suddenly begins failing on the day submissions are due, people will have a bad time. We could certainly reduce the likelihood of something like that happening (e.g., very targeted GitHub workflows), but there's a complexity penalty to mixing all of these concerns. As the person developing the package for the Variant Nowcast target data, I'd prefer not to worry about impacting hub modelers when jumping in with a bug fix or other time-sensitive update. That's obviously a single perspective about a single hub. But it's worth calling out as a "con" if we decide on a SHOULD NOT recommendation.
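As a rough illustration of what "very targeted" could mean, the sketch below selects checks based only on which top-level folders a change touches, so a failing target-data test would not be triggered by a model submission. GitHub Actions offers path filters for this purpose; the Python version here just shows the selection logic. The check names and the `src` folder for target data code are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from a top-level folder to the check it should trigger.
CHECKS_BY_TOP_LEVEL_DIR = {
    "model_output": "validate-submission",
    "src": "test-target-data-code",  # assumed location of target data code
}


def checks_for(changed_paths):
    """Return the sorted names of checks triggered by a set of changed files."""
    checks = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        if parts and parts[0] in CHECKS_BY_TOP_LEVEL_DIR:
            checks.add(CHECKS_BY_TOP_LEVEL_DIR[parts[0]])
    return sorted(checks)
```

With this mapping, a pull request that only adds a file under model_output triggers only the submission validation, regardless of the state of the target data tests. The complexity penalty mentioned above still applies: every new concern in the repo needs its own entry and its own boundaries.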
-
Emily suggested that it would be helpful to catalogue the kinds of code/functionality that hub administrators might create. Here's a go at that:
Here's my take on where discussion on the devteam call landed (others should chime in if I'm not capturing the discussion correctly!):
-
We added documentation here to reflect the discussion above: https://hubverse.io/en/latest/user-guide/hub-structure.html
-
As we were discussing setting up the forthcoming SARS-CoV-2 Variant Nowcast Hub yesterday, we ran across some guidelines in the hubverse documentation that did not fully jibe with our understanding of best practices. The question is where to put code/scripts that are used to create target data and/or scores for model output in the hub. The hubverse currently says things like:
and
However, should this really be the recommendation, even for scripts that are needed to generate target data that live in the hub? The consensus of the people on our call seemed to be that code for generating target data should live in the hub. And perhaps code for generating scores can/should also live in the hub (although we were less sure about this).
This seems at least partially related to ongoing conversations about how to manage automated calculations of score data.
But also just generally, it might be nice to be more specific about
scripts or code folder, or should the scripts/code live in the folder relevant to the data being generated.
Early on in the hubverse visioning, there were some folks who felt fairly strongly that only data should live in the repos. I was not one of them, but I just wanted to register that there were some strong voices about this viewpoint.