Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Traefik v2 #133

Closed
wants to merge 11 commits into from
Closed

Traefik v2 #133

wants to merge 11 commits into from

Conversation

alexleach
Copy link
Contributor

As mentioned in #97, I've done quite a bit of work to try and get https://github.com/jupyterhub/traefik-proxy working with Traefik v2.

I think this should be kept (at least for now) as a separate branch, as Traefik 1.7 is still being actively maintained. Hopefully this is a good start to getting out a release compatible with Traefik v2.

To do:-

  • Maybe add some more unit tests for the code added to traefik_utils.py.
  • Anything else? Please comment...

Desired:-

  • I think the KV storage logic could be made more efficient. e.g. To get a routespec, two sequential get requests against the KV store are needed, one to get jupyterhub/routes/{routespec}, which gives the value {target} and another to get jupyterhub/targets/{escaped_target}, which has the value {data}. This to me seems like it could be reduced to a single get request, as {target} is already known by traefik and doesn't need to be stored separately from the traefik config.
  • I've tried not to change the API at all, to keep the tests working and to maintain backwards compatibility. However I do think that the TraefikFileProviderProxy, TraefikEtcdProxy and TraefikConsulProxy classes could be somewhat optimised. Not only by changing the layout of the KV storage logic as mentioned above, but there just seems to me a lot of unnecessary functions (also in traefik_utils.py), that might be unnecessary. I wanted to get a Traefik v2 implementation passing all tests before spending time optimising any of this though.

devnull-mr and others added 8 commits April 8, 2020 22:58
  - Renamed frontends and backedns to routers and services, respectively.
  - Traefik API paths at /api/providers/{provider_name} no longer work, so
    search /api/http/{services|routers} instead
  - Traefik file provider doesn't like arbitrary data (any more?), so have put
    JupyterHub's 'data' object into the dynamic configuration file in separate
    root keys.

To Do:
  - Haven't touched the consul or etcd providers, other than to rename
    'frontends' and 'backends', as above.
  - Test, test, test.

Additionally, have renamed TraefikTomlProxy to TraefikFileProviderProxy, and
edited relevant traefik_utils functions so the proxy provider should now work
with either yaml or toml files. (need to test).

  - jupyterhub_config.py will now need to reference the proxy class
    'traefik_file', e.g.:-

      c.JupyterHub.proxy_class = 'traefik_file'
      c.TraefikFileProviderProxy = '/path/to/rules.toml'
…ng initial

work done in previous commit:-

41a5ef3

Also, see issue: jupyterhub#97

 - Relevant documentation has been updated in README.md, docs/source/file.md
   (renamed from toml.md)

 - KV providers (consul and etcd) have been updated to use new Traefik KV
   paths, as per:-
   https://doc.traefik.io/traefik/reference/dynamic-configuration/kv/

 - requirements.txt now makes toml an optional dependency as well as
   ruamel.yaml. This latter module is required by jupyterhub anyway, so should
   already be present on a system running JupyterHub.

 - The (thoroughly excellent) test system has had a bit of an overhaul:-

   - There were issues with repeated launching and shutting down of external
     `consul` and `etcd` servers, causing test failures. Added code to
     gracefully shutdown consul and wait for servers to launch (within
     `tests/conftest.py`) before continuing with the tests.

   - `traefik` v2 no longer has a 'storeconfig' command. Therefore, to load a
     pre-baked configuration file into the KV stores, have had to resort to
     loading configurations with either `etcdctl txn` or `consul kv import`
     commands.

   - The external file provider now watches a directory instead of a file, so
     have added a pre-baked dynamic_config directory (and file), where the
     rules.toml file will be saved and managed by the TraefikFileProviderProxy.

   - Removed the previous traefik_{consul_config,etcd_config}.toml files,
     (which acted as traefik static configuration files), and instead applied
     the static KV configuration using the CLI.

   - Refactored some of the text fixtures to try and re-use fixtures and make
     them (hopefully) a bit easier to follow.

   - Have duplicated the file_proxy and external_file_proxy pytest fixtures to
     test both toml and yaml files. (Would have preferred to parametrize the
     existing fixtures to avoid duplicating them, but couldn't figure out how).

   - `tests/test_traefik_api_auth.py` - Have had to make the test give traefik
     more of a chance to read its dynamic config before running the test.
     Previously, in traefik v1, the api authentication would be configured in
     the static config. Now the api 'middleware' is configured in the dynamic
     config, the previous wait condition of just waiting for the port to come
     up didn't give traefik enough time to set up the API authentication on
     that entrypoint / router.

 - All tests now pass on my dev system, woohoo!
documentation, examples and performance test suite. These are replaced by
`traefik_file` and `TraefikFileProviderProxy`, respectively.
…elation to

issue: jupyterhub#97

Although all tests passed on my test system, the github workflow tests failed with the
etcd tests, for the following reasons:-

    - Hadn't actually tested etcd-3.4.15, the default version installed by the
      install script. I'd used the default ubuntu version, 3.2.26. I'm not sure
      if this caused issues (maybe?, see next point), but some new warnings are
      emitted about the log parameters for instance.

    - The tests mainly failed due to what I thought was a nasty race-condition
      between successive TraefikEtcdProxy test fixture calls. The `proxy()`
      fixture in `tests/test_proxy.py` doesn't appear to fully destroy the
      dependent Proxy classes between test runs. In the case of
      TraefikEtcdProxy, this leaves the etcd3/grpc client open through the end
      of one test and into the next test. Then the next TraefikEtcdProxy test
      gets a connection error.  I don't know why only one grpc client is
      allowed - is this related to the etcd version? - but regardless, the less
      than ideal solution is to manually call
      `TraefikEtcdProxy.kv_client.close()` during the teardown of the
      external_etcd* test runs. This is also manually called now by
      `TraefikEtcdProxy.stop()`, when `should_start=True`.
            (This took me days to figure out!!)

Some further modifications to the code include:-

    - Changed the KV store structure for jupyterhub-specific information. All
      jupyterhub routespecs, targets and data are now stored as:-

          jupyterhub/
                   -/routes/{escaped_routespec} = {target}
                   -/targets/{escaped_target}   = {data}

      N.B. I think this can be condensed to one single request. Atm, to get the
      {data} for a routespec, two KV get requests are required: 1. to get the
      {target}, and 2. to get the {data} using that {target}. The {target} is
      also in the traefik configuration, so it seems like unnecessary
      duplication of information and redirection.

    - Added `log_level` and `traefik_log_level` config parameters to the base
      Proxy class. The first sets the log level for the logger (and handler)
      of the Proxy class. The latter sets traefik's log level, if
      `should_start=True`.

    - General tidy up, removing excessive debug statements and commented-out
      code.
077a625, and later. The change in this commit
made `jupyterhub_traefik/install.py` not download any versions of traefik, consul
or etcd where the checksums were not present.

Further, (and this was a change made in a later commit
faa2832), the default version of etcd was
updated to 3.4.15 and although this was present in the checksums, for some
reason it did not install properly in the github workflow.

Finally, updated some assert statements in tests/proxytest.py, to instead
run `assert_equal(val, cmp)`, which makes the pytest error message easier to
follow.
`proxy.kv_separator` in the wrong place, causing faulty target lookups
in the etcd database.
@welcome
Copy link

welcome bot commented Jun 23, 2021

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
welcome
You can meet the other Jovyans by joining our Discourse forum. There is also a intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@GeorgianaElena GeorgianaElena changed the base branch from master to traefik-v2 June 24, 2021 08:04
Copy link
Member

@GeorgianaElena GeorgianaElena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great GREAT! Thanks for this amazing work @alexleach 🎉

I think this should be kept (at least for now) as a separate branch, as Traefik 1.7 is still being actively maintained

Yes, you're right, that would be the best choice. I've created a new branch trafik-v2 and edited this PR to be opened against this branch instead of master.

However I do think that the TraefikFileProviderProxy, TraefikEtcdProxy and TraefikConsulProxy classes could be somewhat optimised

They definitely need some ❤️ But as you said, this PR is already big as it is, so we could do these once this is done. But, please feel free to open issues with any improvement ideas you have.

I will do my best to review these changes in the following days. Thanks once again @alexleach!

@alexleach
Copy link
Contributor Author

Thanks for the positive feedback, very encouraging!

I'm now trying to use this in my own (first) jupyterhub set up, and have a couple of issues that I need to iron out. This will affect the code in this branch, so I think there may need to be some design decisions, and further configuration settings. I'll explain my setup first, and then go into the issues.

Architecture
My external traefik instance actually has four entryPoints:-

  1. ssh, on port 22
  2. web, on port 80
  3. websecure, on port 443
  4. traefik, on port 8099

I've configured Jupyterhub to authenticate using https://github.com/jupyterhub/oauthenticator. OAuth requires a redirect URL with https, so I need the jupyterhub routers to listen on my websecure entryPoint, with TLS configured on the traefik router(s). traefik will terminate the TLS and route to an http (i.e. non-https) jupyterhub service.

Limitations

There are a couple of issues with the current traefik-v2 implementation, as I see it:-

  • Traefik v2 has done away with the defaultEntrypoint option. Now, entrypoints are configured separately on each router instance. Currently, jupyterhub_traefik_proxy doesn't add any entryPoints to the routers, so the routers listen on every entryPoint. i.e. My jupyterhub-configured routers listen on ssh, web and websecure entrypoints. (Not the traefik entrypoint for some reason.)
  • TLS has to be configured on the traefik routers as well. If we want jupyterhub to listen on both http and https, then two routers are required. I haven't configured jupyterhub with an SSL key/certificate pair, as have a default certificate store in traefik. However, I need the jupyterhub-configured traefik router's to be TLS-enabled, for OAuth to work.

Proposals

I think a couple of configurable options need to be added to the base TraefikProxy class:-

# If not configured, listen on every entrypoint
traefik_entrypoint = Unicode(
     config=True, help="""The default entrypoint name, to assign to each Traefik router"""
)

# If set to True, enable TLS on every router jupyterhub_traefik_proxy configures
traefik_tls = Bool(
    False, config=True, help="""Enable TLS on the jupyterhub-configured Traefik routers."""
)

Is a single traefik entrypoint enough? Are there instances where jupyterhub should have both http and https entrypoints? I think not, but maybe if jupyterhub-traefik-proxy starts traefik and certificates are configured, then port 80 should be redirected to port 443?

I am going to add the two options above and try and get them working on my setup. Ideally, further unit tests would be added to test these options though, and maybe the redirect feature can be added later.

Thoughts, comments and suggestions welcome!
Tx

… routers.

As mentioned in
jupyterhub#133 (comment)
configurable options have been added to the base TraefikProxy class:-

 - TraefikProxy.traefik_tls
    Setting this to `True ` will set the traefik tls option on each
    jupyterhub-configured traefik router
 - TraefikProxy.default_entrypoint
    A string that should match the traefik entryPoint, for which each
    jupyterhub-configured service will be attached.
 - TraefikProxy.traefik_entrypoints
    A list of strings, where each entrypoint should match traefik-configured
    entrypoints. If empty, then just use the default_entrypoint

I'm not sure if this is the best way to do this to be honest. It looks like
jupyterhub's `bind_url` predicates use of a single port anyway. So, is only a
default_entrypoint necessary? Maybe these configuration options aren't required
and should just be figured out from a combination of the other jupyterhub
configuration variables (e.g. is bind_url http or https), and pulled from
existing jupyterhub configuration variables anyway?
Copy link
Member

@GeorgianaElena GeorgianaElena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great @alexleach! I did an initial pass and left a few comments and questions, but I'm planning to test this by hand to better understand the new traefik architecture and API and hopefully provide more useful feedback.

Is a single traefik entrypoint enough? Are there instances where jupyterhub should have both http and https entrypoints? I think not, but maybe if jupyterhub-traefik-proxy starts traefik and certificates are configured, then port 80 should be redirected to port 443?

I don't think so either. At some point, I tried embedding in the proxy, traefik's Let's Encrypt support and I think I made the same assumption (either a http or a https entrypoint). Ref here.

@@ -69,7 +69,7 @@
## Enabling traefik-proxy in JupyterHub


[TraefikTomlProxy](https://github.com/jupyterhub/traefik-proxy/blob/master/jupyterhub_traefik_proxy/toml.py), [TraefikEtcdProxy](https://github.com/jupyterhub/traefik-proxy/blob/master/jupyterhub_traefik_proxy/etcd.py) and [TraefikConsulProxy](https://github.com/jupyterhub/traefik-proxy/blob/master/jupyterhub_traefik_proxy/consul.py) are custom proxy implementations that subclass [Proxy](https://github.com/jupyterhub/jupyterhub/blob/master/jupyterhub/proxy.py) and can register in JupyterHub config using `c.JupyterHub.proxy_class` entrypoint.
[TraefikFileProviderProxy](https://github.com/jupyterhub/traefik-proxy/blob/Traefik_v2/jupyterhub_traefik_proxy/fileprovider.py), [TraefikEtcdProxy](https://github.com/jupyterhub/traefik-proxy/blob/Traefik_v2/jupyterhub_traefik_proxy/etcd.py) and [TraefikConsulProxy](https://github.com/jupyterhub/traefik-proxy/blob/Traefik_v2/jupyterhub_traefik_proxy/consul.py) are custom proxy implementations that subclass [Proxy](https://github.com/jupyterhub/jupyterhub/blob/Traefik_v2/jupyterhub/proxy.py) and can register in JupyterHub config using `c.JupyterHub.proxy_class` entrypoint.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the new branch is called traefik-v2

Suggested change
[TraefikFileProviderProxy](https://github.com/jupyterhub/traefik-proxy/blob/Traefik_v2/jupyterhub_traefik_proxy/fileprovider.py), [TraefikEtcdProxy](https://github.com/jupyterhub/traefik-proxy/blob/Traefik_v2/jupyterhub_traefik_proxy/etcd.py) and [TraefikConsulProxy](https://github.com/jupyterhub/traefik-proxy/blob/Traefik_v2/jupyterhub_traefik_proxy/consul.py) are custom proxy implementations that subclass [Proxy](https://github.com/jupyterhub/jupyterhub/blob/Traefik_v2/jupyterhub/proxy.py) and can register in JupyterHub config using `c.JupyterHub.proxy_class` entrypoint.
[TraefikFileProviderProxy](https://github.com/jupyterhub/traefik-proxy/blob/traefik-v2/jupyterhub_traefik_proxy/fileprovider.py), [TraefikEtcdProxy](https://github.com/jupyterhub/traefik-proxy/blob/traefik-v2/jupyterhub_traefik_proxy/etcd.py) and [TraefikConsulProxy](https://github.com/jupyterhub/traefik-proxy/blob/traefik-v2/jupyterhub_traefik_proxy/consul.py) are custom proxy implementations that subclass [Proxy](https://github.com/jupyterhub/jupyterhub/blob/traefik-v2/jupyterhub/proxy.py) and can register in JupyterHub config using `c.JupyterHub.proxy_class` entrypoint.

@@ -78,13 +78,13 @@ you can load a specific config file and start JupyterHub using:
$ jupyterhub -f /path/to/jupyterhub_config.py
```

There is an example configuration file [here](https://github.com/jupyterhub/traefik-proxy/blob/master/examples/jupyterhub_config.py) that configures JupyterHub to run with *TraefikEtcdProxy* as the proxy and uses dummyauthenticator and simplespawner to enable testing without administrative privileges.
There is an example configuration file [here](https://github.com/jupyterhub/traefik-proxy/blob/Traefik_v2/examples/jupyterhub_config.py) that configures JupyterHub to run with *TraefikEtcdProxy* as the proxy and uses dummyauthenticator and simplespawner to enable testing without administrative privileges.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

jupyterhub_traefik_proxy/consul.py Outdated Show resolved Hide resolved
provider_config = {
"consul": {
"rootKey": self.kv_traefik_prefix,
#"watch": True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the watch option gone in v2 or was it not doing anything already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certain providers still have a watch option (ie. docker, file, marathon, rancher) , but not the consul provider. IIRC, traefik throws an error if you try and add watch where not expected.
Ref: https://doc.traefik.io/traefik/reference/static-configuration/file/

Comment on lines +87 to +88
# A: Although defined in the traefik docs, they appear to
# do nothing, and CONSUL_HTTP_TOKEN needs to be used instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember having issues with this :( I think username/password were documented in the traefik v1 docs too (and they're clearly in the v2 docs https://doc.traefik.io/traefik/providers/consul/#username) but I couldn't make them work in practice.

Does username/password auth works in Traefik v2 now? It looks like CONSUL_HTTP_TOKEN is still being used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I tried with user / password auth, but with no success. I referenced a github comment where it mentions traefik uses libkv for the consul provider. For some reason, this needs the environment variable, and the arguments passed to traefik don't actually appear to do anything...

self._generate_htpassword()
api_url = urlparse(self.traefik_api_url)
api_path = api_url.path if api_url.path else '/api'
api_credentials = "{0}:{1}".format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use f-strings here also to be consistent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, yes! I have been converting many of the %s style format-strings to f-strings too, so may as well here. Thanks

return "/".join(
[proxy.kv_traefik_prefix, "http", "routers", router_alias, "service"]
)
#return proxy.kv_traefik_prefix + "routers/" + router_alias + "/service"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#return proxy.kv_traefik_prefix + "routers/" + router_alias + "/service"

proxy.kv_traefik_prefix + frontend_rule_entry + separator + "rule"
router_rule_entry = separator.join(
[proxy.kv_traefik_prefix, router_rule_entry, "rule"]
#proxy.kv_traefik_prefix + router_rule_entry + separator + "rule"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#proxy.kv_traefik_prefix + router_rule_entry + separator + "rule"

Comment on lines 178 to 179
def persist_dynamic_conf(file_path, routes_dict):
# FIXME: Only used by fileprovider, remove?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This confused me a bit since there's another similarly named proxy specific method persist_dynamic_config. But this one from utils doesn't look to be used anymore, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just double checked and it isn't used any more. Was thinking about merging those methods into the TraefikProxy base class tbh. Either way, I have removed those standalone functions from traefik_utils.py in my working copy now. I did think that they might make nice static or class methods maybe, but it's been too long since I've written either...

@@ -31,14 +31,14 @@
"metadata": {},
"outputs": [],
"source": [
"toml_df_concurrent = pd.read_csv('~/results/toml_methods_concurrent.csv')\n",
"file_df_concurrent = pd.read_csv('~/results/file_methods_concurrent.csv')\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be modifying anything performance-related yet. With traefik's v2 architecture change, I expect the results to be slightly different, so we definetly need to re-run the perf tests with the new traefik version (maybe even chage/improve them).
However, I think this should be tackled in a different PR. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. It would definitely be good to compare any performance differences between the two traefik major versions, and ofc this branch of jupyterhub-traefik-proxy vs master. I haven't touched or even run the performance suite yet, other than to replace any mention of 'toml' with 'file'. I doubt there would be much of a performance difference between the toml and yaml file providers (so maybe worth running both?), but you never know...

In terms of this branch vs master, I'd suspect master to be more performant at this stage, as I feel I've added various conditional statements without removing any. Further, especially wrt. traefik startup, the separation of various traefik configuration options (e.g. the API auth settings) from the static config into the provider config, adds quite a significant delay in _wait_for_static_config.

@alexleach
Copy link
Contributor Author

Hi, I've saved your suggested changes (although I haven't touched the logger, which I'm not really too sure on atm tbh, see response to your comment) to my working tree and hope to commit that soon to my traefik-v2-tls branch, assuming the test suite passes.

What I'm currently doing is running the test suite with all the proxytest fixtures using public_url set to an https location, so traefik will use an auto-generated / self-signed certificate. I hadn't tested this previously, and ofc a lot of the tests failed. Now, should the test suite be run twice? i.e. once with http, and once with https? I'm currently running one at a time, as it already takes about an hour to run in full!

Is a single traefik entrypoint enough? Are there instances where jupyterhub should have both http and https entrypoints? I think not, but maybe if jupyterhub-traefik-proxy starts traefik and certificates are configured, then port 80 should be redirected to port 443?

I don't think so either. At some point, I tried embedding in the proxy, traefik's Let's Encrypt support and I think I made the same assumption (either a http or a https entrypoint). Ref here.

Having had a good think about this, and using jupyterhub with traefik in practice for the last week or so, I think the best course of action is to determine the desired entrypoint scheme from public_url, and when using an external traefik instance, to query traefik's API for the active entrypoints. A cross-comparison of the running traefik entrypoints to the public_url will inform the TraefikProxy instance which entrypoint to use. I've implemented this in my working tree and hope to push to my traefik-v2-tls branch soon.

I created a new branch, because I thought there was enough going on with this PR already! But TLS support is definitely required in any production setting, so I think it's important to get it right. Once I'm happy with that branch, I'll merge it into my traefik-v2 branch, and I guess it will be appended to this PR.

…with the

traefik entrypoint.

    * Removed the import of all TraefikProxy derived classes in
      jupyterhub_traefik_proxy/__init__.py. Only one needed at a time, never
      all of them.
    * When using an external traefik proxy, the TraefikProxy classes will now
      query traefik's API for the entrypoint that matches the assigned
      public_url. See `TraefikProxy._get_traefik_entrypoint()`.
    * Only a single entrypoint is allowed by this manner (reverting suggested
      change in previous commit 3501d08) to
      have multiple, configurable entrypoints.
    * Done away with configurable option `traefik_tls`, instead determining it
      from the scheme of `public_url`, i.e. http or https.
    * Fixed docstrings in kv_proxy.py, if sphinx complained.
    * Removed the `debug` Boolean trait from TraefikProxy, instead relying on
      `traefik_log_level` and `log_level`, for setting traefik's log level, and
      the TraefikProxy log level, respectively.
    * Documentation fixes: updated docs/sphinxext/autodoc_traits.py to match
      master, at https://github.com/jupyterhub/autodoc-traits/
    * Implemented various suggested changes by @GeorgianaElena in
      jupyterhub#133
    * Removed the standalone configuration loading and dumping functions from
      traefik_utils, leaving them only in TraefikConfigFileHandler.
    * Updated test suite so works when testing https URLs (without validation,
      as traefik will self-sign certificates) as well as http URLs. Currently,
      proxytest.py will do either http or https. Should it run both?
@alexleach alexleach closed this Jul 2, 2021
@alexleach alexleach deleted the Traefik_v2 branch July 2, 2021 12:35
@alexleach alexleach restored the Traefik_v2 branch July 2, 2021 12:40
@alexleach
Copy link
Contributor Author

Ah, I changed the name of my Traefik_v2 branch to traefik-v2, and it closed this PR... Not my intention, just wanted to make my branch name consistent with yours...

@alexleach alexleach reopened this Jul 2, 2021
Remove old, commented-out code, as per @GeorgianaElena's review

Co-authored-by: Georgiana Elena <[email protected]>
@alexleach
Copy link
Contributor Author

Hiya,

So, in my previous big commit 50e59d6, I tried to remove any extraneous options with respect to entrypoints and TLS configuration. I thought a smart way of doing this was to check the scheme (http or https) in Proxy.public_url and enable TLS on that basis.

In production however, my configuration file just sets c.JupyterHub.bind_url, with TraefikProxy.public_url deriving from this. Now, in https://github.com/jupyterhub/jupyterhub/blob/main/jupyterhub/app.py, JupyterHub.bind_url is validated, and the validation code changes https to http, if I haven't set c.JupyterHub.ssl_cert. This is a bit annoying, as I load the SSL certificate in traefik externally - JupyterHub doesn't need to be aware of my SSL certificate, only the proxy - but now my JupyterHub traefik routers don't have TLS enabled. 🙄

So, there's still a bit of work to do here:-

  • Add an option to enable TLS with externally managed certificates (e.g. bring back in a traefik_tls configuration option?). Alternatively, can we override that validation code in JupyterHub.app somehow?
  • Further, the unit tests don't cover the case where jupyterhub_traefik_proxy has an ssl_cert and ssl_key configured. This should be checked of course, which I haven't done...
  • It would be desirable to merge the LetsEncrypt PR Support automatic Let's Encrypt #108 you linked to above, into the traefik-v2 branch.

@dolfinus
Copy link
Contributor

dolfinus commented Jul 5, 2021

Hi.

In production however, my configuration file just sets c.JupyterHub.bind_url, with TraefikProxy.public_url deriving from this

Actually, I didn't get what do you mean. bind_url is used by Jupyterhub to start Tornado web server and listen for requests. Traefik cannot use the same config value because the same port number is already used by hub itself. Could you please provide a part of your configuration?

@@ -19,7 +19,7 @@ depending on how traefik store its routing configuration.

For **smaller**, single-node deployments:

* TraefikTomlProxy

This comment was marked as resolved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, TraefikFileProviderProxy now supports both yaml and toml configuration files. It seemed inappropriate to leave it as TraefikTomlProxy, when yaml files can also easily be supported.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see, thank you.

@@ -424,14 +438,14 @@ async def test_websockets(proxy, launch_backend):
launch_backend(default_backend_port, "ws")

await exponential_backoff(
utils.check_host_up, "Traefik not reacheable", ip="localhost", port=traefik_port
utils.check_host_up, "Traefik not reacheable", ip="127.0.0.1", port=traefik_port
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

localhost was resolving to ::1 in my test container. I'm running everything in docker containers, where IPv6 is disabled by default, so I don't think ::1 is a usable address. I was experiencing a load of 'cannot bind ::1 address' test error messages, which I hoped this would fix. Those errors might have been caused by some other problem, I admit. I haven't specifically reverted this change and re-tested after everything else works tbh.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

localhost was resolving to ::1 in my test container

It looks like something is wrong with your Docker installation. For example, you have IPv6 disabled on the kernel level (/proc/sys/net/ipv6/conf/all/disable_ipv6 is set to 1), but there is no ipv6: false in your /etc/docker/daemon.json file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought IPv6 was just disabled by default with docker: https://docs.docker.com/config/daemon/ipv6/

On my current system:-

$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
0
$ ls /etc/docker/
key.json
$

However, also (while in the container):-

root@8933dcac4db4:/srv/jupyterhub# cat /etc/hosts
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.23.0.4      8933dcac4db4
root@8933dcac4db4:/srv/jupyterhub# openssl s_client -connect ip6-localhost:8081
CONNECTED(00000003)
^C
root@8933dcac4db4:/srv/jupyterhub#

So yes, changing localhost to 127.0.0.1 was probably not what fixed whatever issue I was facing at the time!

@alexleach
Copy link
Contributor Author

Hi.

In production however, my configuration file just sets c.JupyterHub.bind_url, with TraefikProxy.public_url deriving from this

Actually, I didn't get what do you mean. bind_url is used by Jupyterhub to start Tornado web server and listen for requests. Traefik cannot use the same config value because the same port number is already used by hub itself. Could you please provide a part of your configuration?

Sure. I've removed the authenticator and database parts of the config, and obfuscated the traefik api password. But otherwise this is it:-

c.JupyterHub.bind_url = 'https://:443/jupyter'
c.JupyterHub.cleanup_proxy = True
c.JupyterHub.hub_ip = ''
c.JupyterHub.proxy_class = 'traefik_file'
c.TraefikProxy.should_start = False
c.TraefikFileProviderProxy.dynamic_config_file = "/var/run/traefik/rules.toml"
c.TraefikProxy.traefik_api_url = "http://traefik:8080" # N.B. traefik hostname resolved by docker
c.TraefikProxy.traefik_api_username = "jupyter"
c.TraefikProxy.traefik_api_password = "###"
c.JupyterHub.reset_db = True
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
c.DockerSpawner.network_name = 'traefik_routable'
c.DockerSpawner.image = 'myjupyternotebook'
c.DockerSpawner.notebook_dir = os.environ.get('DOCKER_NOTEBOOK_DIR') or '/home/jovyan/work'
c.DockerSpawner.volumes = { 'jupyterhub-user-{username}': c.DockerSpawner.notebook_dir }
c.DockerSpawner.remove_containers = True
c.Spawner.default_url = '/lab'

With this configuration, if I access http(s)://hostname/jupyter/, traefik will proxy that through to the jupyterhub container. Traefik dashboard screenshots below.

The router configuration (currently port 80 because of the mentioned bind_url validate problem):-
Screenshot 2021-07-05 at 10 56 44

The service configuration (where the endpoint is an image derived from jupyterhub/jupyterhub:latest):-
Screenshot 2021-07-05 at 10 56 52

@dolfinus
Copy link
Contributor

dolfinus commented Jul 5, 2021

I mean why do you need:

c.JupyterHub.bind_url = 'https://:443/jupyter'

at all? Why not http://:80/jupyter aloing with terminating SSL by Traefik?

@alexleach
Copy link
Contributor Author

I mean why do you need:

c.JupyterHub.bind_url = 'https://:443/jupyter'

at all? Why not http://:80/jupyter aloing with terminating SSL by Traefik?

That's the exact plan! 😄 The jupyterhub server is actually listening on http://:8081, with an http connection. It's traefik that's listening at https://:443/jupyter, not jupyterhub itself.

e.g. When I launch my jupyterhub container, I get these startup messages for instance:-

jupyter-test  | [I 2021-07-05 10:11:17.082 JupyterHub app:2530] Initialized 0 spawners in 0.005 seconds
jupyter-test  | [I 2021-07-05 10:11:17.088 JupyterHub app:2742] Not starting proxy
jupyter-test  | [I 2021-07-05 10:11:17.090 JupyterHub app:2778] Hub API listening on http://*:8081/jupyter/hub/
jupyter-test  | [I 2021-07-05 10:11:17.090 JupyterHub app:2780] Private Hub API connect url http://8933dcac4db4:8081/jupyter/hub/
jupyter-test  | [D 2021-07-05 10:11:17.090 JupyterHub proxy:342] Fetching routes to check
jupyter-test  | [I 2021-07-05 10:11:17.090 JupyterHub proxy:347] Checking routes
jupyter-test  | [I 2021-07-05 10:11:17.090 JupyterHub app:2853] JupyterHub is now running at http://:443/jupyter
jupyter-test  | [D 2021-07-05 10:11:17.091 JupyterHub app:2456] It took 0.416 seconds for the Hub to start

So here, what was configured as the bind_url has been changed to http, which is also what TraefikProxy.public_url becomes

@dolfinus
Copy link
Contributor

dolfinus commented Jul 5, 2021

So here, what was configured as the bind_url has been changed to http

But why not just set c.TraefikProxy.public_url to a value you need? Why are you using this weird way of setting up Traefik URL?

@GeorgianaElena
Copy link
Member

Hey @alexleach!

Thanks a lot for thinking everything through and trying to make this as better as possible. I really appreciate it!
Btw, I got myself a dev setup with this traefik-proxy version and I was really happy to see that everything "just" worked 🎉

I was thinking that before moving further with this, we could split this PR into smaller ones. The reasoning behind this would be:

To make it easier to review and merge

I must confess, that every time I tried to review this PR, I felt overwhelmed about all the changes here and not being very familiar with everything that changed in traefik v2. It's also a bit hard to remember everything when context-switching from other work.
I would be more comfortable reviewing and merging smaller bits of code. For example, I was thinking about having one PR per proxy, then multiple smaller one that could fix things like https, refactor code, improve tests, etc.

To have the discussions and decisions around a certain topic (file-provider, consul, etcd, tls) in their own context

This would make the discussions around different design decision easier to follow.

I believe you're using this branch in your personal setup, right? If so, maybe creating different branches for the new PRs would be the way to go. We could also create issues to help with dividing and organizing the work in smaller tasks.

What do think @alexleach? I would really love to see this work being added to traefik-proxy!

@alexleach
Copy link
Contributor Author

But why not just set c.TraefikProxy.public_url to a value you need? Why are you using this weird way of setting up Traefik URL?

The same thing just occurred to me 👍
However, after putting exactly that in the config, I get this message at startup:-

jupyter-test | [W 2021-07-05 10:29:24.906 JupyterHub configurable:190] Config option `public_url` not recognized by `TraefikFileProviderProxy`.

I guess we can make it configurable, but then bind_url becomes pointless. My config file was originally produced by jupyterhub --generate-config-file, which kind of indicates that bind_url is the place to put this setting...

@alexleach
Copy link
Contributor Author

Hi @GeorgianaElena, thanks for the comment 😄

Thanks a lot for thinking everything through and trying to make this as better as possible. I really appreciate it!
Btw, I got myself a dev setup with this traefik-proxy version and I was really happy to see that everything "just" worked 🎉

Excellent, glad to hear it! 🥳

I was thinking that before moving further with this, we could split this PR into smaller ones. The reasoning behind this would be:

To make it easier to review and merge

I must confess, that every time I tried to review this PR, I felt overwhelmed about all the changes here and not being very familiar with everything that changed in traefik v2. It's also a bit hard to remember everything when context-switching from other work.

Oh dear.. Yes it has become a big PR (!), many commits and there seems to be more and more of them required! I agree that browsing through the changes here on github is overwhelming, it takes ages to scroll through the diffs to find a code comment for instance!

I would be more comfortable reviewing and merging smaller bits of code. For example, I was thinking about having one PR per proxy, then multiple smaller one that could fix things like https, refactor code, improve tests, etc.

Tbh, I don't have confidence that my git-foo is good enough to backtrack all my changes into smaller, bite size chunks. I wanted the test-suite to fully pass ofc, so moving to traefik v2 required changes to all of the proxy providers at once. I agree that renaming TraefikTomlProxy to TraefikFileProviderProxy could have been a separate feature request / PR, in supporting yaml as well as toml files, but that was one of the first things I did and the code changes required for that were fairly minimal compared to everything else.

Also agreed that some of the test-suite changes were perhaps excessive. Again though, I tried to make sure all the tests passed. Traefik v2 seems to take a lot longer to start the API than v1, as it is configured through the 'dynamic config', and not the static config file now. Making sure everything was fully started (e.g. consul, then traefik, then traefik's API) before the test functions began was a big challenge.

To have the discussions and decisions around a certain topic (file-provider, consul, etcd, tls) in their own context

This would make the discussions around different design decision easier to follow.

I believe you're using this branch in your personal setup, right? If so, maybe creating different branches for the new PRs would be the way to go. We could also create issues to help with dividing and organizing the work in smaller tasks.

I have been switching, I was using my https://github.com/alexleach/traefik-proxy/tree/traefik-v2-tls branch, then when I thought that was working, merged that back into https://github.com/alexleach/traefik-proxy/tree/Traefik_v2 (thereby adding to this PR!) and started using that again... For my working jupyterhub container, I actually use pip to install from my github repo. It was working perfectly for me back on commit alexleach@3501d08, but the TLS implementation was very broken for anything other than an external file provider proxy. The reason I've been adding more changes recently, especially wrt. TLS is that it seems that having a broken TLS implementation would be a regression. Something best avoided?

What do think @alexleach? I would really love to see this work being added to traefik-proxy!

I understand why you'd want to reduce the size of this PR and break it down into smaller chunks, but I don't really know where to start or how, I've gone too far down the rabbit hole! Help! 😬

@dolfinus
Copy link
Contributor

dolfinus commented Jul 5, 2021

Config option `public_url` not recognized by `TraefikFileProviderProxy`.

This is just a wrong config option. It should be c.JupyterHub.hub_connect_url (public URL which is used by Spawner, Proxy and Services) or c.Spawner.hub_connect_url (used just by Spawner) instead.

@alexleach
Copy link
Contributor Author

This is just a wrong config option. It should be c.JupyterHub.hub_connect_url (public URL which is used by Spawner, Proxy and Services) or c.Spawner.hub_connect_url (used just by Spawner) instead.

I have tried playing around with c.JupyterHub.hub_connect_url. This seems to change the URL of the Hub API / traefik service (or backend in traefik v1 nomenclature) on my internal docker network. What I want to tell JupyterHub, is traefik's entrypoint URL and port on my external network, and that it is https / secured by TLS.

e.g. with c.JupyterHub.hub_connect_url = 'https://:443'

jupyter-test  | [I 2021-07-05 11:58:32.606 JupyterHub app:2778] Hub API listening on https://:443/jupyter/hub/
jupyter-test  | [D 2021-07-05 11:58:32.606 JupyterHub proxy:342] Fetching routes to check
jupyter-test  | [I 2021-07-05 11:58:32.606 JupyterHub proxy:347] Checking routes
jupyter-test  | [I 2021-07-05 11:58:32.607 JupyterHub app:2853] JupyterHub is now running at http://:443/jupyter

This doesn't actually route, as the traefik service needs to be configured with the container name.

So, with c.JupyterHub.hub_connect_url = 'https://jupyter-test:443':-

jupyter-test  | [I 2021-07-05 12:04:14.688 JupyterHub app:2778] Hub API listening on https://jupyter-test:443/jupyter/hub/
jupyter-test  | [D 2021-07-05 12:04:14.688 JupyterHub proxy:342] Fetching routes to check
jupyter-test  | [I 2021-07-05 12:04:14.688 JupyterHub proxy:347] Checking routes
jupyter-test  | [W 2021-07-05 12:04:14.688 JupyterHub proxy:360] Updating Hub route https://:443 → https://jupyter-test:443
jupyter-test  | [I 2021-07-05 12:04:14.688 JupyterHub proxy:432] Adding route for Hub: /jupyter/ => https://jupyter-test:443
jupyter-test  | [D 2021-07-05 12:04:14.691 JupyterHub proxy:243] 200 GET http://traefik:8080/api/entrypoints
...  <various traefik stuff>
jupyter-test  | [I 2021-07-05 12:04:14.696 JupyterHub app:2853] JupyterHub is now running at http://:443/jupyter

This routes, but I still have the same issue. i.e. My traefik service is not TLS enabled, and I've just changed the internal urls on my docker network for no major benefit. TLS is not required on the internal docker network at this stage, as for me at least, I'm running everything on the same host.

@alexleach
Copy link
Contributor Author

I would be more comfortable reviewing and merging smaller bits of code. For example, I was thinking about having one PR per proxy, then multiple smaller one that could fix things like https, refactor code, improve tests, etc.

Hiya, I've had a go at rebasing and cherry-picking some of my commits and created a traefik-v2-file branch, dedicated to getting the traefik v2 file provider working with both toml and yaml configuration files.

https://github.com/jupyterhub/traefik-proxy/compare/master...alexleach:traefik-v2-file?expand=1

It's still quite a large body of work, and the consul and etcd providers definitely(!) don't work in this branch. However, is this an easier starting point? If so, should I try and create a new branch for the KV providers from this point?

@GeorgianaElena
Copy link
Member

Hiya, I've had a go at rebasing and cherry-picking some of my commits and created a traefik-v2-file branch, dedicated to getting the traefik v2 file provider working with both toml and yaml configuration files.

It's looking great @alexleach!

It's still quite a large body of work, and the consul and etcd providers definitely(!) don't work in this branch. However, is this an easier starting point? If so, should I try and create a new branch for the KV providers from this point?

Yeah, makes sense not to work. We could potentially "deactivate" running the tests in the CI for these other proxies, by removing them from this list

@pytest.fixture(
params=[
"no_auth_consul_proxy",
"auth_consul_proxy",
"no_auth_etcd_proxy",
"auth_etcd_proxy",
"toml_proxy",
"external_consul_proxy",
"auth_external_consul_proxy",
"external_etcd_proxy",
"auth_external_etcd_proxy",
"external_toml_proxy",
]
)

New idea

However I'm thinking we could maybe simplify things even further. What I have in mind is:

  • Before opening the PR transitioning the file provider proxy to traefik v2, remove from jupyterhub-traefik-proxy:traefik-v2 branch:

    • any reference to the key-value store proxies
    • the performance tests
  • This way, when we are happy with the v2 of the TomlProxy and merge the changes, the traefik-v2 branch will have only a working version of the TomlProxy with traefik v2 and nothing else. (I think the traefik folks did something simmilar when releasing the first traefik v2 versions, i.e. they didn't offer support for all the providers right from the start)

  • Add the two other proxies (consul and etcd) later on

I'm really not sure what's the best solution here @alexleach, but I'm happy we can work things out together! What do you think about the idea above?

(also pinging @minrk, @yuvipanda and @consideRatio ☀️ any advice would be really helpful 👀 )

@alexleach
Copy link
Contributor Author

Hi @GeorgianaElena ,

I've just had a chance to get back to this now, mainly to get my traefik-v2-file branch passing all relevant tests. As per my comment on your recent PR (#134 (comment)), I think your suggestion above makes sense, i.e. Remove anything related to TKvProxy, consul and etcd and focus on the file provider for the initial traefik-v2 PR.

However, I'm not sure where to take it from here tbh. Maybe the best course of action for me is to stop(!) until a path forward is agreed, as I don't want to mess up the commit history any more than it already is, and I'm already nervous about how I'm going to merge against your recent PR!

Happy to do that though, but some direction would be really appreciated, e.g. should I rebase against your clean-for-traefikv2 fork, or to the main repo's traefik-v2 branch after you commit your PR?

Will await instruction and work on something else before I do anything more to this, either way!

Kind regards,
Alex

alexleach added a commit to alexleach/traefik-proxy that referenced this pull request Jul 18, 2021
… the

traefik entrypoint.

    * When using an external traefik proxy, the TraefikProxy classes will now
      query traefik's API for the entrypoint that matches the assigned
      public_url. See `TraefikProxy._get_traefik_entrypoint()`.
    * Only a single entrypoint is allowed by this manner (reverting suggested
      change in previous commit 3501d08) to
      have multiple, configurable entrypoints.
    * Done away with configurable option `traefik_tls`, instead determining it
      from the scheme of `public_url`, i.e. http or https.
    * Fixed docstrings in kv_proxy.py, if sphinx complained.
    * Implemented various suggested changes by @GeorgianaElena in
      jupyterhub#133
    * Removed the standalone configuration loading and dumping functions from
      traefik_utils, leaving them only in TraefikConfigFileHandler.
    * Updated test suite so works when testing https URLs (without validation,
      as traefik will self-sign certificates) as well as http URLs. Currently,
      proxytest.py will do either http or https. Should it run both?
@alexleach
Copy link
Contributor Author

Hi @GeorgianaElena , I hope you're well.

I've had a chance this morning to rebase my traefik-v2-file branch against the main traefik-v2 branch. This is I hope about as minimal an initial set of changes as I could make it, but does have some issues wrt. TLS and traefik entrypoints:-

  • TLS will not be enabled on routers, regardless of whether an ssl key and certificate is configured
  • Jupyterhub Routers will bind to every traefik entrypoint.

You can check out the diff here. Let me know if you'd like me to make a PR from this branch. The documentation could potentially do with a prune as well, considering that consul and etcd proxies are not yet supported...
https://github.com/jupyterhub/traefik-proxy/compare/traefik-v2...alexleach:traefik-v2-file?expand=1

I have made another branch as well, traefik-v2-file-tls. For this, I cherry-picked some code from my Traefik_V2 branch, where TLS is enabled and entrypoints dynamically determined from the traefik api.

Not sure if better to make a PR of everything in traefik-v2-file-tls, or easier to start with traefik-v2-file, and then make a follow-up PR with traefik-v2-file-tls?

Let me know what works better for you 🙂

@GeorgianaElena
Copy link
Member

Hey @alexleach!

Thanks a lot for getting back to this! 🌻

I think tackling one thing at a time is the way to go. Opening a first PR, with the file provider (even though TLS won't work) is fine IMO. We can deal with anything that's missing in the next iteration. Regarding the docs, yeah, I agree. There is extra info there that needs to go. But we can update those once we have something stable. What do you think?

@minrk
Copy link
Member

minrk commented Feb 7, 2023

finally managed to merge this as #145, thanks @alexleach and @dolfinus!

@minrk minrk closed this Feb 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants