Jupyterhub: update recommended paths for public share folders #481
I'm not sure exactly what is causing the conflict or how to work around it (it has been a while since I tackled this complicated volume mount problem), but PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR is purposely set to public to align with Weaver's public output location. Both are configured together and, if not set, use the same public default. Whether set or not, both locations should change together and still resolve to the same paths.
The idea is that inside Jupyter, you could access the publicly shared files (i.e., the ones dumped in .../wpsoutputs/public), allowing fast access to data rather than going through the URL and loop-back. However, those are read-only, given their public share, to avoid everyone editing them.
The public location is automatically picked when running a Weaver process (that is allowed publicly) while not logged in, or when explicitly requesting to make the output public (with the X-Wps-Output-Context header). Otherwise, the WPS/job outputs are placed under the nested directory by user-id, which should be mounted as user-workspaces by Cowbird in Jupyter.
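The selection rule described above can be sketched roughly as follows. This is only an illustration of the described behaviour, not Weaver's actual code: the root path, the `users` directory name, the function name, and the error raised for anonymous access to non-public processes are all assumptions made for the example.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical values for illustration only; real deployments configure these.
WPS_OUTPUTS_ROOT = PurePosixPath("/data/wpsoutputs")
PUBLIC_SUBDIR = "public"  # stands in for the shared "public" default
USERS_SUBDIR = "users"    # assumed name of the per-user parent directory


def resolve_output_dir(
    user_id: Optional[int],
    process_is_public: bool,
    output_context: Optional[str] = None,
) -> PurePosixPath:
    """Pick the directory where job outputs would be written (sketch)."""
    # An explicit request for the public context always selects the public dir.
    if output_context == PUBLIC_SUBDIR:
        return WPS_OUTPUTS_ROOT / PUBLIC_SUBDIR
    if user_id is None:
        # Not logged in: only publicly allowed processes can run.
        if process_is_public:
            return WPS_OUTPUTS_ROOT / PUBLIC_SUBDIR
        raise PermissionError("anonymous users can only run public processes")
    # Logged in without a public request: nest the outputs by user id.
    return WPS_OUTPUTS_ROOT / USERS_SUBDIR / str(user_id)
```

With these assumed names, `resolve_output_dir(None, True)` and `resolve_output_dir(7, True, "public")` both resolve to the public directory, while `resolve_output_dir(7, True)` resolves to the per-user one.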
So, unless I misunderstand the conflict here, setting explicitly different paths is not really a fix? It avoids the feature entirely.
@fmigneault this doesn't affect the directories created by cowbird and used by weaver and the WPS birds. This is a recommended change to the mechanism that jupyterhub uses to share files between users.
@mishaschwartz The following configurations mount the volume in cowbird on purpose, so that it monitors the creation/removal/update of jupyter directories within user-workspaces/public-share and applies the corresponding Magpie permissions to the relevant services matching the cowbird mappings. That way, those same files have (from the user's point of view) the same set of permissions/access-level via the other interfaces (THREDDS, WPS-Outputs, etc.).
birdhouse-deploy/birdhouse/components/cowbird/default.env Lines 71 to 72 in 19d5401
birdhouse-deploy/birdhouse/components/jupyterhub/jupyterhub_config.py.template Lines 126 to 129 in 19d5401
birdhouse-deploy/birdhouse/components/weaver/config/magpie/weaver_hooks.py.template Lines 35 to 50 in 19d5401
I don't see what other purpose that dir could have.
We're talking about this code, not the cowbird paths. This has nothing to do with wps outputs or cowbird managed file permissions: birdhouse-deploy/birdhouse/env.local.example Lines 383 to 432 in 96bb421
Cowbird should not interact with these paths and should not change permissions. If a user with the username "X" puts a file in their
To clarify further... the feature that is enabled by the jupyterhub code snippet in It was never intended to be used with the cowbird feature. It is a workaround that allows sharing files between users, and it should probably be deprecated as soon as we have a better alternative. But for now, if you want to enable both this feature and cowbird, you need to make sure their mount points do not overlap, or you will encounter the error reported in #392.
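A quick way to check a set of planned mount points for this kind of overlap is sketched below. It is a minimal illustration, not part of birdhouse-deploy: the function name and the example paths are hypothetical, and it flags the case the PR description attributes the failure to, namely a mount nested inside a read-only parent mount.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple


def find_mount_conflicts(mounts: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (child, parent) pairs where child sits inside a read-only parent.

    `mounts` maps a container path to its mode, "ro" or "rw".
    """
    paths = {PurePosixPath(path): mode for path, mode in mounts.items()}
    conflicts = []
    for child in paths:
        for parent, mode in paths.items():
            # `parents` compares whole path components, so "/a/public-share"
            # is not treated as nested inside "/a/public".
            if mode == "ro" and parent in child.parents:
                conflicts.append((str(child), str(parent)))
    return sorted(conflicts)


# Hypothetical container paths, for illustration only.
example_mounts = {
    "/notebook_dir/public": "ro",
    "/notebook_dir/public/share": "rw",
    "/notebook_dir/public-share": "rw",
}
conflicts = find_mount_conflicts(example_mounts)
```

Comparing path components rather than raw string prefixes matters here, since sibling directories with similar names (such as `public` and `public-share`) do not actually overlap.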
What I mean is that the paths were purposely selected to match those "conveniently", so Cowbird can be enabled as a drop-in replacement of the basic sharing feature and add the advanced permission sync capabilities. I agree, they are not used together. It is one or the other. But the idea is that you don't need to completely refactor the entire data/workspace directory hierarchy when switching features. If both directories are maintained simultaneously with different paths, it just creates 2 distinct hierarchies with different capabilities (whose implications users are not fully aware of), and that just causes confusion. So what I'm saying is that changing the path is not a "proper fix" for #392. It's a workaround (which might be valid), but I think we must further resolve the switch to Cowbird capabilities to make it work as intended instead. Using the "deprecated" approach should not be recommended. The message should be clear about that (they should never be enabled simultaneously).
In a way, the fact that errors are raised by conflicting permissions is a good indication that something is misconfigured, outside the intended approach.
Can you explain this more, please? What additional changes need to be added to cowbird to "make it work as intended"?
Sure, I agree with this. I'm happy to just delete the entire section from env.local.example.
Cowbird is enabled by default, and its job is to manage cross-service access to corresponding resources. Cowbird's directory monitoring is supposed to work with Jupyter sessions. So, unless something is broken in that feature, I don't see a reason why users should care about the name of that directory duplicating the feature. No conflict if there's no duplicate.
So to clarify, the user experience for publicly sharing a file in cowbird is exactly the same:
FYI, I have not had time to look at this yet. Was sick yesterday and recovering today. Just quickly: the existing poor-man sharing is what we have been using, since Cowbird did not exist before, and it has served us very well. No problem to update the default so both can play nice together, but I'd rather keep it than remove it, since it has been 100% reliable so far and is what our current users are used to. I'd rather not have support calls from our rather large user base because of a behavior change, so that's why I'd like to keep the existing poor-man sharing. It is not activated by default anyway, so no problem there.
This is very interesting!!! Is this compatible with the existing data from the poor-man sharing? Meaning: before Cowbird, the poor-man sharing is activated. Then, with Cowbird enabled, poor-man sharing disabled, and Cowbird taking over the existing poor-man sharing file path, can Cowbird show the old data from poor-man sharing? Meaning the switch is completely seamless to the user?
Yes. That is the intent. But you're not limited to "public". You can have a "users/user-x" hierarchy that is deny-access by default (except for "user-x"), and have "user-y" granted access to the user-workspace of "user-x" (e.g., colleagues working together). So, they end up with a shared space, but it's not shared with everyone (public) on the platform. And because Cowbird works by events with Magpie, you could have very fancy webhook configurations based on any actions on the services. The user-workspace uses a watchdog to trigger on create/delete/update events of the file-system. Removing a permission on the Magpie side will take effect as a reaction that removes the link that virtually gives access to the file via Jupyter user-workspaces (and vice versa). I'm fine with keeping the current alternative so we have a fallback until we are ready to change. We just have to keep in mind that using different paths on purpose will require messing with the vars and the actual file hierarchy when doing the switch. However, that allows reverting without having to chown/chmod all the files (the "conflict" mentioned by the PR).
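The intended access model described above (public readable by all, per-user workspaces denied by default, with explicit grants between users) could be pictured with a toy sketch like the one below. It only illustrates the intent: the class and method names are hypothetical, it does not use the Magpie or Cowbird APIs, and it says nothing about what is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class WorkspaceAccess:
    """Toy model: 'public/...' is readable by all, 'users/<owner>/...' is not."""

    # Maps a workspace owner to the other users granted access to it.
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, owner: str, other: str) -> None:
        self.grants.setdefault(owner, set()).add(other)

    def revoke(self, owner: str, other: str) -> None:
        self.grants.get(owner, set()).discard(other)

    def can_read(self, user: str, path: str) -> bool:
        parts = path.strip("/").split("/")
        if parts[0] == "public":
            return True
        if parts[0] == "users" and len(parts) >= 2:
            owner = parts[1]
            # Deny by default, except for the owner and explicitly granted users.
            return user == owner or user in self.grants.get(owner, set())
        return False


acl = WorkspaceAccess()
acl.grant("user-x", "user-y")  # user-x shares their workspace with user-y
```

In this sketch, `user-y` can read `users/user-x/...` only while the grant exists, and revoking it removes access again, mirroring the "remove the permission, remove the link" reaction described in the comment.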
Ok great. But is this actually implemented in any way at the moment or does this system of webhooks need to be created? I'm asking because I'm having a really hard time finding this in any of the documentation:
and I'm not able to get this functionality working on a test instance either.
The permission-sync mappings of webhooks and the service "handlers" being monitored by Cowbird are defined here:
The FS monitor functions are (mostly) defined here:
Magpie webhooks reception / trigger of user/group/permission modifications are done here:
What each handler does and how it interacts with the others is very custom-made for the platform, so you have to dig into the code to understand what actually happens.
Just so that we're on the same page:
Please... With cowbird as it is currently set up in birdhouse-deploy can users share files?
If the answer is yes ... what actions do users need to take to share files?
[x] I don't know. It was in the process of being implemented, but our development funding ended at about the same time. So, I do not have a complete overview of the current state and the latest code/configs applied. I believe all the triggers and configs for the main user-workspace directory of each user in jupyter are configured, such that fs-monitor events will sync permissions of the files in the directories with the other services as applicable. The nested "user/user-x" sharing might not be fully completed. Sharing between specific users might not be configured yet.
Ok, thank you. Let's see if @ChaamC can help us out with what has or hasn't been implemented and we'll continue the discussion from there.
It's been a while, but I looked through the history a bit. I think most of the work on that subject is from birdhouse-deploy#360 and cowbird#40, where Cowbird automatically shares the generated wps outputs to either the related user's workspace, or the public workspace if they were generated in a path defined as 'public'. The public workspace is also mounted as 'ro' to prevent modification of files. I don't think there is currently any way to share files between users.
@ChaamC @mishaschwartz @tlvu I don't foresee having time to configure this anytime soon, so unless one of you wants to give it a shot, we can live with the poor-man sharing solution in the meantime.
No problem for Ouranos to stay with the poor-man sharing, since that feature has been working flawlessly and we currently do not have demand for specific user sharing yet.
OK, thanks everyone. I'll make an issue for this shortly so that we can make sure this gets looked at eventually. Going back to this PR... since we're going to have to stick with the poor-man sharing solution for now, is everyone happy with this PR or do you recommend any changes?
LGTM.
Overview
The recommended public share folders in the env.local.example file create a conflict with the default PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR path when both are enabled and mounted on a JupyterLab container. This change updates the recommended paths for the public share folders to avoid this conflict and adds a warning to help users avoid it.
Note: the conflict arises when PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR is mounted to a container as a read-only volume and then JupyterHub tries to mount the public share folder within that volume. Since the parent volume is read-only, the second volume mount fails.
Changes
Non-breaking changes
None, documentation only
Breaking changes
None
Related Issue / Discussion
Additional Information
Links to other issues or sources.
CI Operations
birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false