Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Traefik v2 file provider #135

Closed
wants to merge 19 commits into from

Conversation

alexleach
Copy link
Contributor

@alexleach alexleach commented Jul 20, 2021

Hello,

As discussed in #133 and #97, I have done some work getting jupyterhub-traefik-proxy working with traefik v2. This PR should allow jupyterhub to work with the Traefik v2 file provider, but comes with some caveats (as mentioned in #133 ):-

  • TLS will not be enabled on any traefik router.
  • JupyterHub services will bind to every traefik entrypoint defined in the static configuration.
  • Consul and etcd providers are not functional.

Fixes are available for these (in https://github.com/alexleach/traefik-proxy/tree/Traefik_v2), but this PR aims to minimise the size of the patches. Features can then be introduced incrementally to ease reviewing.

Hope this helps as a starting point for getting traefik v2 support functional anyway! 🙂

fixes 2i2c-org/upstream#29

@alexleach
Copy link
Contributor Author

Hi @GeorgianaElena, hope this works as a starting point.

I'm sure the commit history is not as tidy as it could have been, maybe the rebasing didn't go exactly as planned(!), as I think it includes the tidying commits you already made to the traefik-v2 branch... Not really sure tbh... Should I give it another go and try to tidy up the commit history?

Cheers,
Alex

devnull-mr and others added 5 commits July 26, 2021 11:54
  - Renamed frontends and backedns to routers and services, respectively.
  - Traefik API paths at /api/providers/{provider_name} no longer work, so
    search /api/http/{services|routers} instead
  - Traefik file provider doesn't like arbitrary data (any more?), so have put
    JupyterHub's 'data' object into the dynamic configuration file in separate
    root keys.

To Do:
  - Haven't touched the consul or etcd providers, other than to rename
    'frontends' and 'backends', as above.
  - Test, test, test.

Additionally, have renamed TraefikTomlProxy to TraefikFileProviderProxy, and
edited relevant traefik_utils functions so the proxy provider should now work
with either yaml or toml files. (need to test).

  - jupyterhub_config.py will now need to reference the proxy class
    'traefik_file', e.g.:-

      c.JupyterHub.proxy_class = 'traefik_file'
      c.TraefikFileProviderProxy = '/path/to/rules.toml'
documentation, examples and performance test suite. These are replaced by
`traefik_file` and `TraefikFileProviderProxy`, respectively.
This is a re-work of original commit faa2832
to try and make it a bit more manageable. In this commit, work specifically
on the file provider proxy.

 - Relevant documentation has been updated in README.md, docs/source/file.md
   (renamed from toml.md)

 - requirements.txt removes toml a forced dependency as ruamel.yaml could be
   used instead. This latter module is required by jupyterhub anyway, so should
   already be present on a system running JupyterHub.

 - The external file provider now watches a directory instead of a file, so
   have added a pre-baked dynamic_config directory (and file), where the
   rules.toml file will be saved and managed by the TraefikFileProviderProxy.

 - Have duplicated the toml_proxy and external_toml_proxy pytest fixtures to
   test both toml and yaml files. (Would have preferred to parametrize the
   existing fixtures to avoid duplicating them, but couldn't figure out how).

 - `tests/test_traefik_api_auth.py` - Have had to make the test give traefik
   more of a chance to read its dynamic config before running the test.
   Previously, in traefik v1, the api authentication would be configured in
   the static config. Now the api 'middleware' is configured in the dynamic
   config, the previous wait condition of just waiting for the port to come
   up didn't give traefik enough time to set up the API authentication on
   that entrypoint / router.
@GeorgianaElena
Copy link
Member

Hey @alexleach! This a great starting point, indeed 🎉

I'm sure the commit history is not as tidy as it could have been, maybe the rebasing didn't go exactly as planned(!), as I think it includes the tidying commits you already made to the traefik-v2 branch... Not really sure tbh... Should I give it another go and try to tidy up the commit history?

Rebasing is difficult, especially for a PR this size, so don't worry! I gave it a shot too and rebased it against the traefik-v2 branch and this is how it looks like:
https://github.com/jupyterhub/traefik-proxy/compare/traefik-v2...GeorgianaElena:traefik-v2-file?expand=1

I could push those changes on your branch and update this PR, but I would really appreciate it if before, you could take a look at the changes to make sure I didn't remove any useful info while solving merge conflicts. (The tests pass locally btw, so this is a good sign)

@alexleach
Copy link
Contributor Author

Hi @GeorgianaElena,

thanks for the kind words and for tidying it up to how it should be! I think this looks much better, but I would like to do a git / vim diff between your repository and mine to check how the final working copies compare. Pushing to my branch sounds like a plan, not sure how else to go about it, but maybe now is a good point to raise a couple of points / considerations:-

  • Is renaming TraefikTomlProxy to TraefikFileProviderProxy okay with you? On reflection, I think I prefer just TraefikFileProxy, as they're all 'Providers'. What do you think?
  • I made a mistake and clearly rushed commit 3f5b673. I should have renamed toml.md to file.md, but I instead renamed it yaml.md. This should be amended.
  • Commit 6d8b714 is also a mistake, and shouldn't be in the commit history, where I'd effectively commented out some of the test_ functions, by renaming them _test.
  • The tests in your latest commit fail because of my changes to requirements.txt, where I'd commented out toml and ruamel.yaml. toml probably should be required, and definitely so for the tests, and ruamel.yaml is required by one of the dependencies anyway, so doesn't need to be explicitly required. Therefore, I think my changes to requirements.txt can be removed from the commit history, and we leave toml as a strict dependency. I took a stab at making the toml and ruamel.yaml requirements optional (at least one is required), but wasn't really convinced with my implementation, so probably best just to keep toml as required?

I'll have a more thorough comparison when I can and will get back to you on that anyway, but would be glad to hear your thoughts on the above in the meantime 🙂

KR, Alex

@GeorgianaElena
Copy link
Member

@alexleach, let me know when you feel comfortable with me pushing the rebased version of these changes on your branch.

Some ideas below about some of the points in the previous comment:

Is renaming TraefikTomlProxy to TraefikFileProviderProxy okay with you? On reflection, I think I prefer just TraefikFileProxy, as they're all 'Providers'. What do you think?

I think the shorter version makes sense, as you said they are all providers TraefikFileProxy

I made a mistake and clearly rushed commit 3f5b673. I should have renamed toml.md to file.md, but I instead renamed it yaml.md. This should be amended.
Commit 6d8b714 is also a mistake, and shouldn't be in the commit history, where I'd effectively commented out some of the test_ functions, by renaming them _test.

It should be fairly easy to drop some of the commits, once we've rebased the PR against the latest version of traefik-v2 branch.

The tests in your latest commit fail because of my changes to requirements.txt, where I'd commented out toml and ruamel.yaml. toml probably should be required, and definitely so for the tests, and ruamel.yaml is required by one of the dependencies anyway, so doesn't need to be explicitly required. Therefore, I think my changes to requirements.txt can be removed from the commit history, and we leave toml as a strict dependency. I took a stab at making the toml and ruamel.yaml requirements optional (at least one is required), but wasn't really convinced with my implementation, so probably best just to keep toml as required?

I believe it would make our life much easier if both of them are listed as explicit requirements. At least until we find a reason against it, as it happened for the etcd and consul python clients.

@alexleach
Copy link
Contributor Author

alexleach commented Jul 29, 2021

Hi @GeorgianaElena ,

Thanks for the responses to my comments. I've had a chance to have a look this morning and to compare the HEAD's of our branches both locally and then on github. The default compare view (Ref: 1) shows all the commit history, which actually makes it really difficult to compare!

On github, just comparing the HEAD's by commit ID (Ref: 2), we can see that there are just some minor differences to various version numbers in install.py and then a non-functional change in conftest.py.

I think that only one difference in install.py is relevant at this stage; most of the differences are to version numbers of the consul and etcd shown in the argument parser's epilog, but the differences that have a functional effect are the default versions installed of consul and traefik. The latter change (Ref: 3) is especially important. In your HEAD, it will try and install traefik version 2.2.0 and in mine version 2.4.8. The only checksums present for traefik (in both our HEADs) are for versions 2.4.8, 2.3.7 and 2.2.11, so this needs to be updated. I'm actually surprised your github workflow succeeded when trying to install traefik, maybe it just issued a warning that the checksum wasn't present?

I haven't narrowed down which commit(s) actually introduced these differences, but I suppose that isn't relevant. Now we have found these differences, we can quite easily patch them in a new commit. The patch needs to be made locally it seems, showing the raw patch on github (Ref: 4) shows the full commit history again, which is overkill...

Okay, so a to-do list (please let me know your comments):-

  • I'm happy if you push to my local branch, I think you have a better idea of how to get this to work than I do!
  • Patch all (etc, consul?) version numbers in install.py, or just the relevant one (i.e. traefik?) - as shown in the diff above (Ref: 2)
  • New commit renaming TraefikFileProviderProxy to TraefikFileProxy, or edit one of the initial commits where I renamed TraefikTomlProxy to TraefikFileProviderProxy?
  • Drop erroneous commit mentioned above, re: commenting out test functions (6d8b714)
  • Edit erroneous commit where I renamed toml.md to yaml.md (3f5b673) instead of file.md.
  • Anything else, before moving on to traefik-v2-file-tls?

Please let me know your thoughts and how I can next help!

KR, Alex 🙂

alexleach and others added 8 commits August 16, 2021 16:21
  - fileprovider.py - Reintroduce _start_traefik, removed in commit
    05e7c07
  - install.py - Revert some changes made in commit
    05e7c07.
  - proxy.py - Revert some API changes on the method signatures, which
    broke compatibility with the TKvProxy subclasses.
  - test_install.py - Make sure download traefik v2, as the download URLs have
    changed since v1 (and is reflected in install.py).
  - test_traefik_api_auth.py - No longer test TKvProxy subclasses, they don't
    work
  - test_traefik_utils.py - Remove reference to traefik_utils.persist_routes,
    which no longer exists.
Traefik v2 is now distributed as a compressed archive, not the raw binary,
so needs to be uncompressed before it can be used.
Correct traefik-v2 documentation URLs in install.md.

Reduce line-lengths of f-strings in proxy.py and traefik_utils.py.

Co-authored-by: GeorgianaElena <[email protected]>
@GeorgianaElena
Copy link
Member

Hey @alexleach! Sorry for the long silence, but I had some days off and tried to stay away from the keyboard.

I followed your to-do list above and updated my copy of your branch (great catch of the default traefik version in the installer 🎉). When I tried to push the changes back into your branch, I got a 403 error.

Do you think you can add me as a contributor to your fork? The option should be under https://github.com/alexleach/traefik-proxy, Settings -> Manage Access -> Invite collaborator green button. I believe this should do the trick and allow me to push to your fork directly. Alternatively I think I could open a PR to your fork?

@alexleach
Copy link
Contributor Author

Hi @GeorgianaElena ,

Thanks for getting back to me, I was wondering about the long silence, but no worries! Hope you had an enjoyable and well-deserved break from the keyboard! 😄 🌞 🏖️

You've probably got the invitation to be a collaborator by now. I imagine that this is a better idea, as I think it will allow you to tidy up the commit history (i.e. to delete / edit past commits), whereas opening a PR I think would probably allow you to add commits. Not sure tbh, but either way, you should be able to edit my fork as you like now!

Thanks for going through the check-list. Let me know how I can help next!

KR, Alex

@alexleach alexleach mentioned this pull request Oct 6, 2021
Copy link
Member

@GeorgianaElena GeorgianaElena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience @alexleach and sorry again for taking so long for a follow up.

I left a few comments and would love for your feedback on some of the questions there.
In the meanwhile, I'll do some more cleanup and address some of suggestions there.
(Also, I renamed the file proxy file back to toml.py for a better diff, so the tests are failing.)

"rules.toml", config=True, help="""traefik's dynamic configuration file"""
)

dynamic_config_handler = Any()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should provide a short description about the dynamic_config_handler usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggesting this edit in my working tree:-
dynamic_config_handler = Any(help="""An instance of traefik_utils.TraefikConfigFileHandler""")

jupyterhub_traefik_proxy/toml.py Show resolved Hide resolved
Comment on lines +219 to +221
self.dynamic_config["jupyter"]["routers"][router_alias] = {
"data": data
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the jupyter entrypoint created just to have a place to store data?

Add the data node to a separate top-level node, so traefik doesn't complain

@alexleach, do you remember what kind of error did traefik threw when trying to store data under http.routers.router_alias?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've just tested this, by adding the data node instead to http/routers/${router_alias}/data, and this error is produced:-

ERRO[2021-10-26T10:54:15Z] Error occurred during watcher callback: field not found, node: data providerName=file

I don't remember testing out different options thoroughly, but have just spent loads of time trying to add the data node to http/jupyter/routers/${router_alias}/data, and this also fails, with:-

ERRO[2021-10-26T13:17:10Z] Cannot start the provider *file.Provider: field not found, node: jupyter

So yes, the data node needs a separate top-level place to go, at which point traefik does not raise an error.

Comment on lines +269 to +277
# If empty, delete the keys
if not self.dynamic_config["http"]["routers"]:
self.dynamic_config["http"].pop("routers")
if not self.dynamic_config["http"]["services"]:
self.dynamic_config["http"].pop("services")
if not self.dynamic_config["http"]:
self.dynamic_config.pop("http")
if not self.dynamic_config["jupyter"]["routers"]:
self.dynamic_config["jupyter"].pop("routers")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is traefik complaining if the keys don't have a value?


return backend_entry
def generate_service_weight_entry( proxy, service_alias, separator="/"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def generate_service_weight_entry( proxy, service_alias, separator="/"):
def generate_service_weight_entry(proxy, service_alias, separator="/"):

Comment on lines +145 to +147
class TraefikConfigFileHandler(object):
"""Handles reading and writing Traefik config files. Can operate
on both toml and yaml files"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on offering support for both file formats @alexleach!
I'm wondering if this is something that's sustainable in the long run (from the support perspective) 👀

Comment on lines +170 to +172
def dump(self, data):
with open(self.file_path, "w") as f:
self._dump(data, f)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for consistency:

Suggested change
def dump(self, data):
with open(self.file_path, "w") as f:
self._dump(data, f)
def dump(self, data):
"""Depending on self.file_path, call either yaml.dump or toml.dump"""
with open(self.file_path, "w") as fd:
self._dump(data, fd)

Comment on lines +5 to +7
# toml is now optional, as can use yaml configuration files instead now...
#toml
#ruamel.yaml
Copy link
Member

@GeorgianaElena GeorgianaElena Oct 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe choose to go with a default configuration file type and install that? Not installing either of them gives me the impression that we're providing an incomplete package... Not sure what's the standard practice for this though.

@@ -36,14 +36,40 @@ def pytest_configure(config):


@pytest.fixture
# There must be a way to parameterise this to run on both yaml and toml files?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


rc = await api_login()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this and the assert below are not needed anymore since we're already checking the return code in cmp_api_login.

@alexleach
Copy link
Contributor Author

Hi @GeorgianaElena ,

I'm having a look at this now. I had to make sure of course that I'm looking at the same commit, and will start running some tests to give accurate answers to some of these questions.

Of course the tests don't work as is, as fileprovider.py is now toml.py. What I've done is create a sym-link from file.py to toml.py, and edited __init__.py to import from file, instead of from fileprovider, which I think is tidier and in-line with earlier conversation to use file instead of fileprovider. Anyway, I'll hope to help as much as I can while I have a chance today.

Cheers,
Alex

@minrk
Copy link
Member

minrk commented Feb 7, 2023

closed by #145

@minrk minrk closed this Feb 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants