Update kine for multiple fixes #11269

Closed
brandond opened this issue Nov 8, 2024 · 1 comment

@brandond
Member

brandond commented Nov 8, 2024

Update kine for:

I was able to reproduce "database is locked" errors after 10 minutes of load testing v1.31.2+k3s1 on a small cluster with 1 kine+sqlite server and 2 agents, using https://github.com/brandond/jobloader:

NAME           STATUS   ROLES                  AGE   VERSION
k3s-agent-1    Ready    <none>                 13m   v1.31.2+k3s1
k3s-agent-2    Ready    <none>                 13m   v1.31.2+k3s1
k3s-server-1   Ready    control-plane,master   13m   v1.31.2+k3s1
INFO[0300] COMPACT compactRev=0 targetCompactRev=2 currentRev=17811
INFO[0300] COMPACT deleted 0 rows from 2 revisions in 1.482438ms - compacted to 2/17811

INFO[0600] COMPACT compactRev=2 targetCompactRev=1002 currentRev=38398
INFO[0600] COMPACT deleted 336 rows from 1000 revisions in 137.39489ms - compacted to 1002/38398
INFO[0600] COMPACT compactRev=1002 targetCompactRev=2002 currentRev=38403
ERRO[0600] Compact failed: failed to compact to revision 2002: database is locked
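The COMPACT lines above show compaction advancing in steps of 1000 revisions (2, then 1002, then 2002) while currentRev is already more than 36,000 revisions ahead, so one failed step leaves a large backlog. When reviewing longer logs it can help to pull these numbers out mechanically. The following is a minimal Go sketch, assuming the log format exactly as shown in the lines above (it is not a stable kine interface); ParseCompactStep and CompactStep are hypothetical names for illustration:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// compactLine matches the "COMPACT compactRev=... targetCompactRev=... currentRev=..."
// lines shown above. The format is assumed from this log output, not a documented API.
var compactLine = regexp.MustCompile(`COMPACT compactRev=(\d+) targetCompactRev=(\d+) currentRev=(\d+)`)

// CompactStep holds the revisions reported at the start of one compaction step.
type CompactStep struct {
	From, Target, Current int64
}

// ParseCompactStep extracts the revisions from a single log line.
// ok is false if the line is not a COMPACT start line (e.g. a "deleted N rows" line).
func ParseCompactStep(line string) (step CompactStep, ok bool) {
	m := compactLine.FindStringSubmatch(line)
	if m == nil {
		return CompactStep{}, false
	}
	vals := make([]int64, 3)
	for i := range vals {
		v, err := strconv.ParseInt(m[i+1], 10, 64)
		if err != nil {
			return CompactStep{}, false
		}
		vals[i] = v
	}
	return CompactStep{From: vals[0], Target: vals[1], Current: vals[2]}, true
}

func main() {
	lines := []string{
		"INFO[0600] COMPACT compactRev=2 targetCompactRev=1002 currentRev=38398",
		"INFO[0600] COMPACT compactRev=1002 targetCompactRev=2002 currentRev=38403",
	}
	for _, l := range lines {
		if s, ok := ParseCompactStep(l); ok {
			// Step size and how far the target still lags behind the current revision.
			fmt.Printf("step %d -> %d (%d revisions), target is %d revisions behind current\n",
				s.From, s.Target, s.Target-s.From, s.Current-s.Target)
		}
	}
}
```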

Another side effect is that the SQLite database files grow quite large over time, since compaction never succeeds and the WAL is never checkpointed:

root@k3s-server-1:~# ls -lah /var/lib/rancher/k3s/server/db/
total 6.2G
drwx------ 3 root root 4.0K Nov 11 22:50 .
drwx------ 8 root root 4.0K Nov 11 22:50 ..
drwx------ 2 root root 4.0K Nov 11 22:50 etcd
-rw-r--r-- 1 root root 1.4G Nov 12 00:42 state.db
-rw-r--r-- 1 root root 9.5M Nov 12 00:42 state.db-shm
-rw-r--r-- 1 root root 4.8G Nov 12 00:42 state.db-wal
@brandond brandond changed the title Update kine to fix failed compaction retry Update kine to pull in fix for failed compaction causing the next iteration to be skipped Nov 8, 2024
@brandond brandond self-assigned this Nov 8, 2024
@brandond brandond moved this from New to Working in K3s Development Nov 8, 2024
@brandond brandond added this to the 2024-11 Release Cycle milestone Nov 8, 2024
@brandond brandond mentioned this issue Nov 8, 2024
@brandond brandond moved this from Working to To Test in K3s Development Nov 12, 2024
@ShylajaDevadiga ShylajaDevadiga self-assigned this Nov 12, 2024
@brandond brandond changed the title Update kine to pull in fix for failed compaction causing the next iteration to be skipped Update kine for multiple fixes Nov 12, 2024
@ShylajaDevadiga
Contributor

Validated the issue on k3s version v1.31.2+k3s-62caa4a8

On a cluster with 1 kine+sqlite server and 2 agents, using https://github.com/brandond/jobloader:

$ kubectl get nodes
NAME               STATUS   ROLES                  AGE     VERSION
ip-172-31-11-90    Ready    control-plane,master   3h39m   v1.30.2+k3s1
ip-172-31-14-168   Ready    <none>                 3h38m   v1.30.2+k3s1
ip-172-31-15-212   Ready    <none>                 3h37m   v1.30.2+k3s1

Results from reproducing the issue

ubuntu@ip-172-31-11-90:~$ k3s -v
k3s version v1.30.2+k3s1 (aa4794b3)
go version go1.22.4
ubuntu@ip-172-31-11-90:~$ journalctl -u k3s |grep locked
Nov 12 19:58:25 ip-172-31-11-90 k3s[1486]: time="2024-11-12T19:58:25Z" level=info msg="Bootstrap key locked for initial create"
Nov 12 20:47:59 ip-172-31-11-90 k3s[1486]: Trace[1872650200]: ---"watchCache locked acquired" 594ms (20:47:59.622)
Nov 12 20:48:16 ip-172-31-11-90 k3s[1486]: Trace[1490692442]: ---"watchCache locked acquired" 503ms (20:48:16.534)
Nov 12 20:53:30 ip-172-31-11-90 k3s[1486]: time="2024-11-12T20:53:30Z" level=error msg="Compact failed: failed to compact to revision 15102: database is locked"
Nov 12 21:03:26 ip-172-31-11-90 k3s[1486]: time="2024-11-12T21:03:26Z" level=error msg="Compact failed: failed to compact to revision 16102: database is locked"
Nov 12 21:13:26 ip-172-31-11-90 k3s[1486]: time="2024-11-12T21:13:26Z" level=error msg="Compact failed: failed to compact to revision 16102: database is locked"
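The `grep locked` filter above also matches unrelated lines such as "Bootstrap key locked for initial create" and the watchCache traces, and the last two failures share the same target revision (16102). A stricter check that keeps only the compaction failures and their target revisions could look like the Go sketch below. It assumes the message format seen in these journal lines (not a stable interface); LockedCompactions is a hypothetical name, and the usage assumes the code is saved as main.go.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// compactFailed matches the kine error message seen in the journal output above.
var compactFailed = regexp.MustCompile(`Compact failed: failed to compact to revision (\d+): database is locked`)

// LockedCompactions returns the target revisions of compactions that failed with
// "database is locked", in order. Lines that merely contain "locked"
// (e.g. "Bootstrap key locked", "watchCache locked acquired") are ignored.
func LockedCompactions(lines []string) []string {
	var revs []string
	for _, l := range lines {
		if m := compactFailed.FindStringSubmatch(l); m != nil {
			revs = append(revs, m[1])
		}
	}
	return revs
}

func main() {
	// Usage (assumed): journalctl -u k3s | go run main.go
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long journal lines
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	revs := LockedCompactions(lines)
	fmt.Printf("%d compaction(s) failed with \"database is locked\": %v\n", len(revs), revs)
}
```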

Results from validation using the latest commit on master

$ k3s -v
k3s version v1.31.2+k3s-62caa4a8 (62caa4a8)
go version go1.22.8
$ journalctl -u k3s |grep locked
Nov 12 23:10:27 ip-172-31-11-212 k3s[337358]: time="2024-11-12T23:10:27Z" level=info msg="Bootstrap key locked for initial create"
ubuntu@ip-172-31-11-212:~$

@github-project-automation github-project-automation bot moved this from To Test to Done Issue in K3s Development Dec 2, 2024
Labels
None yet
Projects
Status: Done Issue
Development

No branches or pull requests

2 participants