Rock for TWA #30

Open: @dimara wants to merge 1 commit into base: main

@dimara commented Jul 24, 2023:

Add rockcraft.yaml for tensorboards-web-app.

@dimara requested a review from a team as a code owner July 24, 2023 19:50

summary: "tensorboards service"
command: gunicorn -w 3 --bind 0.0.0.0:5000 --access-logfile - entrypoint:app
startup: enabled
user: ubuntu

Contributor commented:

Please remove user: ubuntu in favour of the new non-root user setup.
The non-root user should be declared outside of the services section, as in:
https://github.com/canonical/seldonio-rocks/blob/main/seldon-core-operator/rockcraft.yaml#L17
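
As a rough illustration of the layout this comment asks for (not the actual file from this PR), the run user would move from the service definition to the top level of rockcraft.yaml. The sketch assumes Rockcraft's top-level run-user key with the value _daemon_, and assumes the service block carries override: replace. Only the service name serve (taken from the Pebble log later in this thread) and its summary, command and startup fields come from the PR itself.

```yaml
# Sketch only, not the file from this PR.
# Assumption: the non-root setup is Rockcraft's top-level `run-user` key,
# used here with the value `_daemon_`.
run-user: _daemon_

services:
  serve:
    override: replace   # assumed; not shown in the reviewed lines
    summary: "tensorboards service"
    command: gunicorn -w 3 --bind 0.0.0.0:5000 --access-logfile - entrypoint:app
    startup: enabled
    # no `user: ubuntu` here: the rock as a whole runs as the top-level run-user
```

With the user declared once at the top level, individual services no longer need their own user field.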

Author commented:

OK. My reference was the in-tree jupyter-web-app rock, which still uses user: ubuntu.

stage-packages:
- python3-venv

non-root-user:

@i-chvets (Contributor) commented Jul 24, 2023:

This part can be removed when run-user is used.

Author commented:

ACK.

@i-chvets (Contributor) commented:

Was any manual testing performed?
For example, using skopeo to copy the image to the Docker daemon and then executing the command defined in services?

@kimwnasptd Can we provide access to the best-practices spec that has all these details?

@i-chvets (Contributor) commented:

All commits should be signed. Use the -S option when committing changes.

@i-chvets (Contributor) commented:

Executing the service command fails. It looks like the ROCK has missing dependencies and/or incorrect paths:

$ docker run tensorboards-web-app:v1.7.0_1 exec pebble exec gunicorn -w 3 --bind 0.0.0.0:5000 --access-logfile - entrypoint:app
2023-07-25T13:27:07.495Z [pebble] Started daemon.
2023-07-25T13:27:07.504Z [pebble] POST /v1/exec 9.024693ms 202
2023-07-25T13:27:07.523Z [pebble] GET /v1/tasks/1/websocket/control 18.093997ms 200
2023-07-25T13:27:07.523Z [pebble] GET /v1/tasks/1/websocket/stdio 136.499µs 200
2023-07-25T13:27:07.524Z [pebble] GET /v1/tasks/1/websocket/stderr 87.434µs 200
2023-07-25T13:27:07.544Z [pebble] POST /v1/exec 15.78413ms 202
2023-07-25T13:27:07.554Z [pebble] GET /v1/tasks/2/websocket/control 9.100712ms 200
2023-07-25T13:27:07.555Z [pebble] GET /v1/tasks/2/websocket/stdio 46.601µs 200
2023-07-25T13:27:07.555Z [pebble] GET /v1/tasks/2/websocket/stderr 35.017µs 200
[2023-07-25 13:27:07 +0000] [19] [INFO] Starting gunicorn 21.2.0
[2023-07-25 13:27:07 +0000] [19] [INFO] Listening at: http://0.0.0.0:5000 (19)
[2023-07-25 13:27:07 +0000] [19] [INFO] Using worker: sync
[2023-07-25 13:27:07 +0000] [24] [INFO] Booting worker with pid: 24
[2023-07-25 13:27:07 +0000] [24] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/lib/python3.8/site-packages/gunicorn/arbiter.py", line 609, in spawn_worker
    worker.init_process()
  File "/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/lib/python3.8/site-packages/gunicorn/util.py", line 371, in import_app
    mod = importlib.import_module(module)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'entrypoint'
[2023-07-25 13:27:07 +0000] [24] [INFO] Worker exiting (pid: 24)
[2023-07-25 13:27:07 +0000] [25] [INFO] Booting worker with pid: 25
[2023-07-25 13:27:07 +0000] [19] [ERROR] Worker (pid:24) exited with code 3
[2023-07-25 13:27:07 +0000] [25] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/lib/python3.8/site-packages/gunicorn/arbiter.py", line 609, in spawn_worker
    worker.init_process()
  File "/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/lib/python3.8/site-packages/gunicorn/util.py", line 371, in import_app
    mod = importlib.import_module(module)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'entrypoint'
[2023-07-25 13:27:07 +0000] [25] [INFO] Worker exiting (pid: 25)
[2023-07-25 13:27:07 +0000] [19] [ERROR] Worker (pid:25) exited with code 3
Traceback (most recent call last):
  File "/lib/python3.8/site-packages/gunicorn/arbiter.py", line 202, in run
    self.manage_workers()
  File "/lib/python3.8/site-packages/gunicorn/arbiter.py", line 571, in manage_workers
    self.spawn_workers()
  File "/lib/python3.8/site-packages/gunicorn/arbiter.py", line 643, in spawn_workers
    time.sleep(0.1 * random.random())
  File "/lib/python3.8/site-packages/gunicorn/arbiter.py", line 242, in handle_chld
    self.reap_workers()
  File "/lib/python3.8/site-packages/gunicorn/arbiter.py", line 530, in reap_workers
    raise HaltServer(reason, self.WORKER_BOOT_ERROR)
gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/gunicorn", line 8, in <module>
    sys.exit(run())
  File "/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 67, in run
    WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
  File "/lib/python3.8/site-packages/gunicorn/app/base.py", line 236, in run
    super().run()
  File "/lib/python3.8/site-packages/gunicorn/app/base.py", line 72, in run
    Arbiter(self).run()
  File "/lib/python3.8/site-packages/gunicorn/arbiter.py", line 229, in run
    self.halt(reason=inst.reason, exit_status=inst.exit_status)
  File "/lib/python3.8/site-packages/gunicorn/arbiter.py", line 342, in halt
    self.stop()
  File "/lib/python3.8/site-packages/gunicorn/arbiter.py", line 396, in stop
    time.sleep(0.1)
  File "/lib/python3.8/site-packages/gunicorn/arbiter.py", line 242, in handle_chld
    self.reap_workers()
  File "/lib/python3.8/site-packages/gunicorn/arbiter.py", line 530, in reap_workers
    raise HaltServer(reason, self.WORKER_BOOT_ERROR)
gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>
2023-07-25T13:27:07.670Z [pebble] GET /v1/changes/2/wait 115.034705ms 200
2023-07-25T13:27:07.686Z [pebble] GET /v1/changes/1/wait 161.996344ms 200

This web app is responsible for allowing the user to manipulate Tensorboard
instances in their Kubeflow cluster. To achieve this it provides a user
friendly way to handle the lifecycle of Tensorboard CRs.
version: v1.7.0_1

Contributor commented:

Version should be v1.7.0_20.04_1

Author commented:

ACK

Add rockcraft.yaml for tensorboards-web-app.

@dimara (Author) commented Jul 30, 2023:

> Was any manual testing performed?

Yes. I extracted the image with skopeo and loaded it into a kind cluster that runs Kubeflow. The pod was running.

> All commits should be signed. Use the -S option when committing changes.

ACK.

> Executing the service command fails. It looks like the ROCK has missing dependencies and/or incorrect paths:
>
> $ docker run tensorboards-web-app:v1.7.0_1 exec pebble exec gunicorn -w 3 --bind 0.0.0.0:5000 --access-logfile - entrypoint:app

I see that the Kubernetes deployment does not set a command, i.e., it uses the image entrypoint, which is:

            "Entrypoint": [
                "/bin/pebble",
                "enter",
                "--verbose"
            ],

and it also sets this environment variable:

  - env:
    - name: APP_PREFIX
      value: /tensorboards

So if I run:

docker run --rm -e APP_PREFIX=/tensorboards tensorboards-web-app:v1.7.0_1 

The service starts, but it then fails because it cannot find a kubeconfig:

2023-07-30T06:59:25.553Z [pebble] Started daemon.
2023-07-30T06:59:25.558Z [pebble] POST /v1/services 4.067703ms 202
2023-07-30T06:59:25.558Z [pebble] Started default services with change 1.
2023-07-30T06:59:25.562Z [pebble] Service "serve" starting: gunicorn -w 3 --bind 0.0.0.0:5000 --access-logfile - entrypoint:app
2023-07-30T06:59:25.648Z [serve] [2023-07-30 06:59:25 +0000] [15] [INFO] Starting gunicorn 21.2.0
2023-07-30T06:59:25.649Z [serve] [2023-07-30 06:59:25 +0000] [15] [INFO] Listening at: http://0.0.0.0:5000 (15)
2023-07-30T06:59:25.649Z [serve] [2023-07-30 06:59:25 +0000] [15] [INFO] Using worker: sync
2023-07-30T06:59:25.651Z [serve] [2023-07-30 06:59:25 +0000] [17] [INFO] Booting worker with pid: 17
2023-07-30T06:59:25.667Z [serve] [2023-07-30 06:59:25 +0000] [18] [INFO] Booting worker with pid: 18
2023-07-30T06:59:25.701Z [serve] [2023-07-30 06:59:25 +0000] [19] [INFO] Booting worker with pid: 19
....

Let me try your suggested changes, and force-push the updated branch.
