
Installation timing out #141

Closed
kubeworkz opened this issue Jan 10, 2023 · 70 comments

Comments

@kubeworkz

kubeworkz commented Jan 10, 2023

Hi:

I'm doing a fresh install and I get the following error after about 15 minutes:

osboxes@osboxes-VirtualBox:~/k3s$ hetzner-k3s create --config config.yaml 
Validating configuration......configuration seems valid.

=== Creating infrastructure resources ===
Creating network...done.
Creating firewall...done.
Creating SSH key...done.
Creating placement group cloudrock-masters...done.
Creating placement group cloudrock-small...done.
Creating server cloudrock-cpx11-master1...
Creating server cloudrock-cx41-pool-small-worker1...
Creating server cloudrock-cx41-pool-small-worker3...
Creating server cloudrock-cx41-pool-small-worker2...
...server cloudrock-cx41-pool-small-worker3 created.
...server cloudrock-cx41-pool-small-worker1 created.
...server cloudrock-cx41-pool-small-worker2 created.
...server cloudrock-cpx11-master1 created.
Waiting for server cloudrock-cx41-pool-small-worker3...
Waiting for server cloudrock-cx41-pool-small-worker1...
Waiting for server cloudrock-cx41-pool-small-worker2...
Waiting for server cloudrock-cpx11-master1...
Unhandled exception in spawn: timeout after 00:00:05 (Tasker::Timeout)
  from /usr/share/crystal/src/array.cr:114:31 in 'execute!'
  from /workspace/lib/tasker/src/tasker.cr:29:7 in '->'
  from /usr/share/crystal/src/fiber.cr:146:11 in 'run'
  from ???
Unhandled exception in spawn: timeout after 00:00:05 (Tasker::Timeout)
  from /usr/share/crystal/src/array.cr:114:31 in 'execute!'
  from /workspace/lib/tasker/src/tasker.cr:29:7 in '->'
  from /usr/share/crystal/src/fiber.cr:146:11 in 'run'
  from ???
Unhandled exception in spawn: timeout after 00:00:05 (Tasker::Timeout)
  from /usr/share/crystal/src/array.cr:114:31 in 'execute!'
  from /workspace/lib/tasker/src/tasker.cr:29:7 in '->'
  from /usr/share/crystal/src/fiber.cr:146:11 in 'run'
  from ???
Unhandled exception in spawn: timeout after 00:00:05 (Tasker::Timeout)
  from /usr/share/crystal/src/array.cr:114:31 in 'execute!'
  from /workspace/lib/tasker/src/tasker.cr:29:7 in '->'
  from /usr/share/crystal/src/fiber.cr:146:11 in 'run'
  from ???

My config file is:

---
hetzner_token: <my token>
cluster_name: cloudrock
kubeconfig_path: "/cluster/kubeconfig"
k3s_version: v1.25.5+k3s1
public_ssh_key_path: "~/.ssh/kubeworkz.pub"
private_ssh_key_path: "~/.ssh/kubeworkz"
ssh_allowed_networks:
  - 0.0.0.0/0
api_allowed_networks:
  - 0.0.0.0/0
verify_host_key: false
schedule_workloads_on_masters: false
masters_pool:
  instance_type: cpx11
  instance_count: 1
  location: hel1
worker_node_pools:
- name: small
  location: hel1
  instance_type: cx41
  instance_count: 3

I also tried different k3s versions and Hetzner locations.

@Freyrecorp1

I am also having the same error with this version 1.0.5, using the same configuration file.

@vitobotta
Owner

Are you using an SSH key with a passphrase, by chance? If yes, please set use_ssh_agent to true and use an SSH agent so that the tool can use the passphrase.

@kubeworkz
Author

Hi: the SSH key has no passphrase.

@vitobotta
Owner

Hi, does it happen again if you re-run the create command? Just trying to understand if there was a temporary issue for example with the Hetzner API or something

@Freyrecorp1

Hi, does it happen again if you re-run the create command? Just trying to understand if there was a temporary issue for example with the Hetzner API or something

I don't know if it helps, but the create command only works up to version 1.0.2; from 1.0.3 onwards it gives me this same error.

@vitobotta
Owner

Hi, does it happen again if you re-run the create command? Just trying to understand if there was a temporary issue for example with the Hetzner API or something

I don't know if it helps, but the create command only works up to version 1.0.2; from 1.0.3 onwards it gives me this same error.

Do you have the exact same problem with v1.0.5? It's easier for me to investigate with the latest release. Please try with 1.0.5 and let me know what happens exactly

@Freyrecorp1

It gives me exactly the same error, with version v1.0.5.

@vitobotta
Owner

It gives me exactly the same error, with version v1.0.5.

Timeout? After how long?

@kubeworkz
Author

About 15 minutes for me

@vitobotta
Owner

About 15 minutes for me

I think I know what the problem might be. Can you try creating a new cluster with 1.0.5 (in a new Hetzner Cloud project) and confirming that it works? If it does I'll give you instructions to fix the other cluster if you still need it. Otherwise if a totally new cluster created with 1.0.5 works as expected you can keep that

@kubeworkz
Author

kubeworkz commented Jan 11, 2023

Same error on a new project (using v1.0.5):

osboxes@osboxes-VirtualBox:~/k3s$ hetzner-k3s create --config config.yaml 
Validating configuration......configuration seems valid.

=== Creating infrastructure resources ===
Network already exists, skipping.
Updating firewall...done.
SSH key already exists, skipping.
Placement group test-masters already exists, skipping.
Server test-cpx21-master1 already exists, skipping.
Waiting for server test-cpx21-master1...
Unhandled exception in spawn: timeout after 00:00:05 (Tasker::Timeout)
  from /usr/share/crystal/src/array.cr:114:31 in 'execute!'
  from /workspace/lib/tasker/src/tasker.cr:29:7 in '->'
  from /usr/share/crystal/src/fiber.cr:146:11 in 'run'
  from ???

Can you re-create it at your end?

@Privatecoder

Privatecoder commented Jan 11, 2023

@vitobotta we were facing the same issue today on a fresh macOS install (new Hetzner project)

MicrosoftTeams-image

@Freyrecorp1

It gives me exactly the same error, with version v1.0.5.

Timeout? After how long?

Approximately 15 minutes.

@vitobotta
Owner

Thanks for the info everyone. I am finishing some work stuff and it's getting late but will look into it tomorrow.

@vitobotta
Owner

I am gonna take a quick look now.... despite the timeout, were the servers created in the Hetzner Console for you?

@vitobotta
Owner

I just created a cluster successfully with your same configuration, without any timeouts or other issues. The only difference is that I used an ed25519 SSH key with a passphrase, so I used the SSH agent on my Mac.

I am gonna try with a passwordless key without agent now.

Can you please tell me what kind of SSH key you are using?

@Freyrecorp1

I am gonna take a quick look now.... despite the timeout, were the servers created in the Hetzner Console for you?

Yes, the servers are created.

@vitobotta
Owner

Same error on a new project (using v1.0.5):

osboxes@osboxes-VirtualBox:~/k3s$ hetzner-k3s create --config config.yaml 
Validating configuration......configuration seems valid.

=== Creating infrastructure resources ===
Network already exists, skipping.
Updating firewall...done.
SSH key already exists, skipping.
Placement group test-masters already exists, skipping.
Server test-cpx21-master1 already exists, skipping.
Waiting for server test-cpx21-master1...
Unhandled exception in spawn: timeout after 00:00:05 (Tasker::Timeout)
  from /usr/share/crystal/src/array.cr:114:31 in 'execute!'
  from /workspace/lib/tasker/src/tasker.cr:29:7 in '->'
  from /usr/share/crystal/src/fiber.cr:146:11 in 'run'
  from ???

Can you re-create it at your end?

Can you please try creating a new cluster in a separate project using 1.0.5? From the output of your command it seems that you were running it in a project where you had already run a previous version. I want to see if the problem you are having is due to a change I made a few versions ago, so just re-running the create command on an existing cluster created with 1.0.2, for example, might not be helpful. Please try creating a new cluster in a new project and let me know if you still see the same issue.

@vitobotta
Owner

@Freyrecorp1 and @Privatecoder can you please also try creating a new cluster in a new project using 1.0.5 for the reason I just explained to @kubeworkz ? Thanks!

@vitobotta
Owner

I think I found the problem. Can you please re-run the create command even with an existing cluster, but using the SSH agent? Thanks

@Freyrecorp1

@Freyrecorp1 and @Privatecoder can you please also try creating a new cluster in a new project using 1.0.5 for the reason I just explained to @kubeworkz ? Thanks!

I have been using the id_rsa and id_rsa.pub SSH keys; I will try with an ed25519 key and let you know.

@vitobotta
Owner

@Freyrecorp1 and @Privatecoder can you please also try creating a new cluster in a new project using 1.0.5 for the reason I just explained to @kubeworkz ? Thanks!

I have been using the id_rsa and id_rsa.pub SSH keys; I will try with an ed25519 key and let you know.

Can you please try with use_ssh_agent set to true and with your key added to the SSH agent?
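For reference, a minimal sketch of what that looks like on the client side (the key path matches the config posted earlier in this thread; substitute your own):

```shell
# Start an agent for this shell session and load the key into it;
# with the key in the agent, set use_ssh_agent: true in the config.
eval "$(ssh-agent -s)"       # exports SSH_AUTH_SOCK for this shell
ssh-add ~/.ssh/kubeworkz     # prompts for the passphrase, if any
ssh-add -l                   # lists loaded keys to confirm it worked
```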

@vitobotta
Owner

I think I found the problem :)

I changed the default OS image from ubuntu-20.04 to the newer 22.04, and it seems that this version has a problem with older RSA keys. I don't want to revert the change because I think it's better to move to a newer version of the OS, which means we have two options:

  1. manually SSH into each server and add PubkeyAcceptedAlgorithms=+ssh-rsa to /etc/ssh/sshd_config, which is just a bad workaround

or

  2. use a more modern key type like ed25519

Just to confirm that this is the problem you are having, can you please try a new cluster (in a new Hetzner project) using an ed25519 key instead of an old RSA one?
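Generating an ed25519 key for this is a one-liner (the file name here is just an example); then point public_ssh_key_path and private_ssh_key_path in the config at the new files:

```shell
# Create a new ed25519 key pair; -N "" means no passphrase
# (if you do set a passphrase, also enable use_ssh_agent).
ssh-keygen -t ed25519 -f ~/.ssh/hetzner_ed25519 -N "" -C "hetzner-k3s"
```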

@Privatecoder

@Freyrecorp1 and @Privatecoder can you please also try creating a new cluster in a new project using 1.0.5 for the reason I just explained to @kubeworkz ? Thanks!

sure, I'll go ahead and check this tomorrow (with use_ssh_agent set to true which is not set currently).

@Privatecoder

Just to confirm that this is the problem you are having, can you please try a new cluster (in a new hetzner project) using an ed25519 key instead of an old RSA?

Will also do that.

@vitobotta
Owner

Just to confirm that this is the problem you are having, can you please try a new cluster (in a new hetzner project) using an ed25519 key instead of an old RSA?

Will also do that.

Thanks. I tried also doing the manual change to /etc/ssh/sshd_config mentioned above, and after doing that and restarting ssh the cluster could be created correctly. So it's a problem with old SSH keys :)

@Privatecoder

Thank you Vito! Great support! 👍

@kubeworkz
Author

kubeworkz commented Jan 11, 2023

Awesome - thanks for chasing this down Vito! I'll check it out. This is the best tool I've found so far for this exercise...

@vitobotta
Owner

Awesome - thanks for chasing this down Vito! I'll check it out. This is the best tool I've found so far for this exercise...

Glad to hear that :)

I am also looking to see if I can update the Crystal library that I use for SSH connections, to solve the problem at the source.

@vitobotta
Owner

I am using the ssh-agent, the id_ed25519 key and use_ssh_agent: true, and it still gives me the same error.

Can you share your config? Also, to confirm, you are using 1.0.5, right?

@kubeworkz
Author

The important thing is that any additional software is optional and easy to enable/disable

Perfect, yeah I just added cert-manager. Testing it out now. Ah, didn't know you installed an LB, cool

@vitobotta
Owner

The important thing is that any additional software is optional and easy to enable/disable

Perfect, yeah I just added cert-manager. Testing it out now. Ah, didn't know you installed an LB, cool

Only for the K8s API server, if you create an HA cluster with multiple masters. So it's not an LB that you use with applications.

@Freyrecorp1

I am using the ssh-agent, the id_ed25519 key and use_ssh_agent: true, and it still gives me the same error.

Can you share your config? Also, to confirm, you are using 1.0.5, right?

Yes, I'm using v1.0.5.
config_test.txt

@vitobotta
Owner

I am using the ssh-agent, the id_ed25519 key and use_ssh_agent: true, and it still gives me the same error.

Can you share your config? Also, to confirm, you are using 1.0.5, right?

Yes, I'm using v1.0.5. config_test.txt

Did you try with a new cluster or over an existing one?

@Freyrecorp1

I am using the ssh-agent, the id_ed25519 key and use_ssh_agent: true, and it still gives me the same error.

Can you share your config? Also, to confirm, you are using 1.0.5, right?

Yes, I'm using v1.0.5. config_test.txt

Did you try with a new cluster or over an existing one?

I tried both ways, and with a new project, and still nothing. The servers are created in the Hetzner console, but it gets stuck there and keeps giving me the same error!

@vitobotta
Owner

I am using the ssh-agent, the id_ed25519 key and use_ssh_agent: true, and it still gives me the same error.

Can you share your config? Also, to confirm, you are using 1.0.5, right?

Yes, I'm using v1.0.5. config_test.txt

Did you try with a new cluster or over an existing one?

I tried both ways, and with a new project, and still nothing. The servers are created in the Hetzner console, but it gets stuck there and keeps giving me the same error!

Can you please paste here the output of the tool?

@Freyrecorp1

I am using Linux and I get the same error after changing the configuration as indicated. Do I need to do any additional configuration? Captura desde 2023-01-11 16-21-08 Captura desde 2023-01-11 16-48-02

I keep getting this same result.

@vitobotta
Owner

I am using Linux and I get the same error after changing the configuration as indicated. Do I need to do any additional configuration? Captura desde 2023-01-11 16-21-08 Captura desde 2023-01-11 16-48-02

I keep getting this same result.

Can you please try SSH'ing into the servers manually, with the regular SSH client and using the same key? Does it work? If yes, from one of the servers run tail -f /var/log/auth.log and then run the create command again. What do you see in the log?

@Freyrecorp1

It doesn't work; I'm attaching a screenshot!
Captura desde 2023-01-12 13-59-11

@vitobotta
Owner

It doesn't work, I attach the image! Captura desde 2023-01-12 13-59-11

I was referring to the logs on the servers. Can you SSH manually (with the ssh command, not my tool) into the servers? If yes please run tail -f /var/log/auth.log on one of the servers and then try the create command again with my tool. I am interested to know if you see any errors in the servers' logs

@Freyrecorp1

The ssh connection to the servers does not work.

@Privatecoder

Privatecoder commented Jan 12, 2023

I am using the ssh-agent, the id_ed25519 key and use_ssh_agent: true, and it still gives me the same error.

set use_ssh_agent to false when using an ed25519 key

@Freyrecorp1

Changing use_ssh_agent to false and keeping everything else the same in the configuration, I managed to get it running. I tried the autoscaler, and a server is created, but not with the autoscaler prefix; it uses the "big" prefix instead, as I had indicated in https://github.com //issues/140

@Privatecoder

Privatecoder commented Jan 12, 2023

@Freyrecorp1 at least you are able to create the cluster this way.

@vitobotta

  • I tried to work around changing the key by using use_ssh_agent set to true with the old RSA key which didn't work
  • I then created a new ed25519 key which also didn't work
  • It only worked using the newly created ed25519 key along with use_ssh_agent set to false

@vitobotta
Owner

Changing use_ssh_agent to false and keeping everything else the same in the configuration, I managed to get it running. I tried the autoscaler, and a server is created, but not with the autoscaler prefix; it uses the "big" prefix instead, as I had indicated in https://github.com //issues/140

Glad that you are making progress. Perhaps the new version of the autoscaler has changed the naming convention for the autoscaled nodes.

@vitobotta
Owner

I am using the ssh-agent, the id_ed25519 key and use_ssh_agent: true, and it still gives me the same error.

set use_ssh_agent to false when using an ed25519 key

If it works with the agent disabled, it means that the agent was not being used properly.

@vitobotta
Owner

@Freyrecorp1 at least you are able to create the cluster this way.

@vitobotta

  • I tried to work around changing the key by using use_ssh_agent set to true with the old RSA key which didn't work
  • I then created a new ed25519 key which also didn't work
  • It only worked using the newly created ed25519 key along with use_ssh_agent set to false

Did you get the timeouts again with those keys? If yes, can you SSH into the servers manually using the same keys?

@vitobotta
Owner

@Privatecoder @Freyrecorp1 @kubeworkz I have released 1.0.6 which configures new servers with Ubuntu 22.04 (and others that deprecate SHA1) to allow using keys with old crypto for now. This is a temporary workaround until libssh2 is updated upstream. Can you please try with this version? Bear in mind that this only affects new servers, so you need to test with a new cluster. If for some reason you want to "fix" an existing cluster, then you need to manually add the line PubkeyAcceptedKeyTypes=+ssh-rsa to /etc/ssh/sshd_config. But since I guess you haven't gone far with the existing attempts due to the timeouts, it's easier if you try with a new cluster.
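For anyone who does want to patch an existing cluster by hand, the change described above amounts to the following on each node (run as root; it loosens sshd's accepted key algorithms, so treat it as a stopgap):

```shell
# Re-allow SHA1-signed RSA public-key auth, which Ubuntu 22.04's
# OpenSSH disables by default, then restart sshd to pick it up.
echo 'PubkeyAcceptedKeyTypes=+ssh-rsa' >> /etc/ssh/sshd_config
systemctl restart ssh
```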

@kubeworkz
Author

Will do

@vitobotta
Owner

Will do

Thanks! 👍

@kubeworkz
Author

All good at my end.

@vitobotta
Owner

All good at my end.

Thanks!

@vitobotta
Owner

@Privatecoder and @Freyrecorp1 are you still having issues with this, or can we close? Thanks

@Freyrecorp1

Everything OK so far; from my side you can close it.

@Privatecoder

all good :)

@vitobotta
Owner

Thanks :)

@Freyrecorp1

Thanks to you Vito.

@kubeworkz
Author

Yeah, great tool Vito!
