off2 is sometimes very slow (processes in IOWAIT) #263

stephanegigandet · 2023-10-10T14:11:39Z

Running some scripts on off2 sometimes takes much more time than usual, I have been testing some import scripts and everything seems sluggish compared to off1.

e.g. to start some Perl processes, it sometimes takes 15 seconds, other times 2 minutes:

off@off-pro:/srv/off-pro/lib$ time ./startup_apache2.pl 
Can't open /mnt/off-pro/logs/off-pro/log4perl.log (Permission denied) at /usr/share/perl5/Log/Log4perl/Appender/File.pm line 151.

real	2m11.803s
user	0m13.394s
sys	0m1.442s

[5 minutes later]

off@off-pro:/srv/off-pro/lib$ time ./startup_apache2.pl 
Can't open /mnt/off-pro/logs/off-pro/log4perl.log (Permission denied) at /usr/share/perl5/Log/Log4perl/Appender/File.pm line 151.

real	0m14.031s
user	0m12.912s
sys	0m1.117s

I'm guessing in the first run, we were just waiting for the disks for 2 minutes.

Same thing happened with systemctl start apache2@off which timed out once then was able to start the next time.

The text was updated successfully, but these errors were encountered:

stephanegigandet · 2023-10-10T14:14:26Z

Lots of processes in iowait

https://www.computel.fr/munin/openfoodfacts/off2.openfoodfacts/cpu.html

alexgarel · 2023-10-10T16:20:42Z

@cquest Are drive the same ? Or maybe one has more constraints ?

john-gom · 2023-10-13T06:54:32Z

Could be irrelevant but is it possible that some slow or inaccessible devices are in the PATH?

alexgarel · 2023-10-16T17:49:59Z

Could be irrelevant but is it possible that some slow or inaccessible devices are in the PATH?

I don't see anything special. As root on off2: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
On a VM (off): /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

alexgarel · 2023-10-17T12:59:32Z

I did a small test with dd to compare with my laptop (following this article):

On off2:

dd if=/dev/zero of=/home/alex/test.img bs=1G count=1 oflag=dsync
1073741824 bytes (1,1 GB, 1,0 GiB) copied, 0,995176 s, 1,1 GB/s

dd if=/dev/zero of=/home/alex/test.img bs=512 count=1000 oflag=dsync
512000 bytes (512 kB, 500 KiB) copied, 17,5354 s, 29,2 kB/s

On off1

dd if=/dev/zero of=/home/alex/test.img bs=1G count=1 oflag=dsync
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.60762 s, 668 MB/s

dd if=/dev/zero of=/home/alex/test.img bs=512 count=1000 oflag=dsync
512000 bytes (512 kB, 500 KiB) copied, 8.94754 s, 57.2 kB/s

On my laptop:

dd if=/dev/zero of=/home/alex/test.img bs=1G count=1 oflag=dsync
1073741824 octets (1,1 GB, 1,0 GiB) copiés, 1,44455 s, 743 MB/s

dd if=/dev/zero of=/home/alex/test.img bs=512 count=1000 oflag=dsync
512000 octets (512 kB, 500 KiB) copiés, 4,1427 s, 124 kB/s

So we have a multiplication by two on write latency om off2 compared to off1, but not more.

This is just a write test… might not be very interesting, as perf problems seems more on read.

alexgarel · 2023-10-17T13:14:39Z

read test (but I can't be sure if there is no cache, and don't want to invalidate all cache ! I try with the nocache option)

off1:

dd if=/srv2/off/html/data/openfoodfacts-mongodbdump.gz of=/dev/null bs=8k
...
7989754834 bytes (8.0 GB, 7.4 GiB) copied, 51.0625 s, 156 MB/s

dd if=/srv2/off/html/data/openfoodfacts-mongodbdump.gz iflag=nocache of=/dev/null bs=8k 
7989754834 bytes (8.0 GB, 7.4 GiB) copied, 39.3774 s, 203 MB/s

off2:

dd if=/zfs-hdd/off/html_data/openfoodfacts-mongodbdump.gz of=/dev/null bs=8k
...
7431006564 bytes (7,4 GB, 6,9 GiB) copied, 90,3344 s, 82,3 MB/s

dd if=/zfs-hdd/off/html_data/openfoodfacts-mongodbdump.gz iflag=nocache of=/dev/null bs=8k
...
7431006564 bytes (7,4 GB, 6,9 GiB) copied, 41,3374 s, 180 MB/s

stephanegigandet · 2023-10-18T09:42:05Z

Another example: 6 seconds to do a grep on 2 small files:

root@opf:/var/log# ls -lrt syslog messages
-rw-r----- 1 root adm    1702 Oct 18 09:39 messages
-rw-r--r-- 1 root root 267895 Oct 18 09:40 syslog
root@opf:/var/log# time grep oom syslog messages

real	0m6.708s
user	0m0.000s
sys	0m0.002s
root@opf:/var/log# time grep oom syslog messages

real	0m8.337s
user	0m0.002s
sys	0m0.000s

alexgarel · 2023-10-18T09:45:26Z

sudo zpool status


  pool: zfs-hdd
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 9 days 04:45:09 with 0 errors on Tue Oct 17 05:10:14 2023
config:

	NAME         STATE     READ WRITE CKSUM
	zfs-hdd      ONLINE       0     0     0
	  raidz1-0   ONLINE       0     0     0
	    sda4     ONLINE       3     0     0
	    sdb4     ONLINE       0     0     0
	    sdc4     ONLINE       0     0     0
	    sdd4     ONLINE       0     0     0
	cache
	  nvme3n1p2  ONLINE       0     0     0

stephanegigandet · 2023-10-18T10:34:21Z

fail2ban nginx-bot and nginx-http were doing a lot of reads and writes and @alexgarel disabled them, but we still have very high usage of the disks:

root@off2:/var/log# iostat -x 5

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.18    0.00    2.02   61.44    0.00   31.37

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme0n1          0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme2n1          0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme3n1         79.80   1526.40     0.00   0.00    0.12    19.13    1.60     31.20     0.00   0.00    0.00    19.50    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.01  17.20
sda             38.20    706.40     0.00   0.00   24.24    18.49  233.80   2914.40     0.00   0.00    2.56    12.47    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.53  59.60
sdb             43.00    803.20     0.00   0.00   25.98    18.68  232.80   2846.40     0.00   0.00    2.38    12.23    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.67  58.72
sdc             44.60    772.80     0.00   0.00   88.86    17.33  108.20   3018.40     0.00   0.00   40.15    27.90    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    8.31  93.12
sdd             42.60    750.40     0.00   0.00  115.08    17.62  146.00   2950.40     0.20   0.14    7.45    20.21    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    5.99  77.44

after removing fail2ban nginx bots,
iotop -a -d 5 shows a high current disk write, but it doesn't seem to show the processes that do those writes.

Total DISK READ:         8.74 M/s | Total DISK WRITE:        42.24 K/s
Current DISK READ:       4.13 M/s | Current DISK WRITE:      11.17 M/s
    TID  PRIO  USER     DISK READ DISK WRITE>  SWAPIN      IO    COMMAND                                                                    
 441670 be/4 root       1152.00 K      2.76 M  ?unavailable?  [kworker/u32:4-flush-zfs-107]
3535889 be/4 root          0.00 B   1904.00 K  ?unavailable?  systemd-journald
2358804 be/4 100000        0.00 B   1680.00 K  ?unavailable?  systemd-journald
2618831 be/4 100033        6.50 M   1021.00 K  ?unavailable?  nginx: worker process
3219470 be/4 100033       12.00 K    808.00 K  ?unavailable?  nginx: worker process
3219471 be/4 100033        0.00 B    710.00 K  ?unavailable?  nginx: worker process
 434969 be/4 root        540.00 K    660.00 K  ?unavailable?  [kworker/u32:3-flush-zfs-107]
 640367 be/4 root        272.00 K    628.00 K  ?unavailable?  [kworker/u32:1-flush-zfs-12]
 380477 be/4 www-data    193.78 M    515.00 K  ?unavailable?  nginx: worker process
   4499 be/4 root          5.00 M    445.00 K  ?unavailable?  python3 /usr/bin/fail2ban-server -xf start [f2b/f.sshd]
 380480 be/4 www-data     76.83 M    417.00 K  ?unavailable?  nginx: worker process
 380476 be/4 www-data    156.90 M    398.00 K  ?unavailable?  nginx: worker process
 380478 be/4 www-data    167.61 M    360.00 K  ?unavailable?  nginx: worker process
 380479 be/4 www-data    130.88 M    288.00 K  ?unavailable?  nginx: worker process
2604899 be/4 100033       28.78 M    253.00 K  ?unavailable?  nginx: worker process
 500393 be/4 100000        2.38 M    225.00 K  ?unavailable?  python3 /usr/bin/fail2ban-server -xf start [f2b/f.sshd]
 380481 be/4 www-data     58.06 M    215.00 K  ?unavailable?  nginx: worker process

stephanegigandet · 2023-10-18T15:09:14Z

set atime=off on all volumes on hdd

things look much better:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.93    0.00    0.48    7.86    0.00   88.73

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme0n1          0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme2n1          0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme3n1         48.67    616.00     0.00   0.00    0.10    12.66    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00  12.53
sda             45.00    873.33     0.00   0.00   10.05    19.41   25.33    332.00     0.00   0.00    1.21    13.11    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.48  43.60
sdb             41.00    770.67     0.00   0.00   11.27    18.80   22.00    321.33     0.00   0.00    2.11    14.61    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.51  42.93
sdc             32.67    597.33     0.00   0.00   14.56    18.29   19.33    306.67     0.00   0.00    9.60    15.86    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.66  43.07
sdd             36.33    698.67     0.00   0.00   15.61    19.23   22.00    330.67     0.00   0.00    1.92    15.03    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.61  41.60

Total DISK READ:         9.13 M/s | Total DISK WRITE:       903.05 K/s
Current DISK READ:       4.24 M/s | Current DISK WRITE:       2.96 M/s
    TID  PRIO  USER     DISK READ DISK WRITE>  SWAPIN      IO    COMMAND                                                                                                                     
1643228 be/4 root        204.12 M      4.04 M  ?unavailable?  [nfsd]
1643225 be/4 root          5.63 M   1948.00 K  ?unavailable?  [nfsd]
3756909 be/4 root        352.00 K   1920.00 K  ?unavailable?  [kworker/u32:4-events_unbound]
1643227 be/4 root         56.49 M   1513.00 K  ?unavailable?  [nfsd]
2618831 be/4 100033       12.11 M   1421.00 K  ?unavailable?  nginx: worker process
3219470 be/4 100033        0.00 B   1403.00 K  ?unavailable?  nginx: worker process
2358804 be/4 100000        0.00 B   1192.00 K  ?unavailable?  systemd-journald
 380476 be/4 www-data    525.33 M   1173.00 K  ?unavailable?  nginx: worker process
1643226 be/4 root         14.17 M   1024.00 K  ?unavailable?  [nfsd]

teolemon · 2024-09-23T17:54:12Z

cat /proc/pressure/io
some avg10=84.50 avg60=84.73 avg300=81.75 total=269336601043
full avg10=81.85 avg60=79.88 avg300=74.52 total=227694438466
whereas I don't see any high read/write:

I don't see really annoying bots (appart maybe a "Amazon CloudFront" on images, using many ips, but it's about 10% of the traffic)

iostat --pretty --human

Linux 5.15.131-2-pve (off2) 23/09/2024 x86_64 (16 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
22,5% 0,4% 3,9% 36,6% 0,0% 36,6%

  tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
24,32       130,2k       172,6k         0,0k      64,9G      86,1G       0,0k nvme0n1
24,22       129,4k       172,6k         0,0k      64,5G      86,1G       0,0k nvme1n1
 5,75         0,4k       694,3k         0,0k     185,5M     346,3G       0,0k nvme2n1

116,68 836,8k 3,4M 0,0k 417,3G 1,7T 0,0k nvme3n1
234,14 6,2M 3,1M 0,0k 3,1T 1,5T 0,0k sda
232,14 6,0M 3,0M 0,0k 3,0T 1,5T 0,0k sdb
169,68 6,2M 3,1M 0,0k 3,1T 1,5T 0,0k sdc
160,76 6,1M 3,1M 0,0k 3,0T 1,5T 0,0k sdd

avg-cpu: %user %nice %system %iowait %steal %idle
7,8% 0,2% 1,5% 79,5% 0,0% 11,0%

  tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device

126,00 0,0k 1,8M 0,0k 0,0k 1,8M 0,0k nvme0n1
127,00 0,0k 1,8M 0,0k 0,0k 1,8M 0,0k nvme1n1
0,00 0,0k 0,0k 0,0k 0,0k 0,0k 0,0k nvme2n1
74,00 641,5k 19,0k 0,0k 641,5k 19,0k 0,0k nvme3n1
258,00 848,0k 8,0M 0,0k 848,0k 8,0M 0,0k sda
251,00 580,0k 7,6M 0,0k 580,0k 7,6M 0,0k sdb
168,00 904,0k 7,8M 0,0k 904,0k 7,8M 0,0k sdc
150,00 740,0k 10,3M 0,0k 740,0k 10,3M 0,0k sdd

cquest · 2024-10-26T10:25:03Z

Causes of I/O issues on off2 have been identified:

Bad firmware on 2 HDDs (sdc and sdd), that need to be updated
Too much load caused by serving images from off2 (solved by serving images from another server, currently ks1)
hourly backups to HDDs

I propose to close this issue and open more specific new ones if needed.

alexgarel · 2024-11-04T14:39:26Z

closing as proposed by @cquest

stephanegigandet added 🐛 bug Something isn't working performance labels Oct 10, 2023

teolemon added this to Infrastructure & Monitoring Oct 10, 2023

teolemon added the off2 label Nov 17, 2023

teolemon assigned alexgarel, cquest and teleboas Sep 23, 2024

teolemon removed the 🐛 bug Something isn't working label Oct 19, 2024

cquest changed the title ~~off2 is sometimes very slow~~ off2 is sometimes very slow (processes in IOWAIT) Oct 26, 2024

alexgarel closed this as completed Nov 4, 2024

github-project-automation bot moved this to Done in Infrastructure & Monitoring Nov 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

off2 is sometimes very slow (processes in IOWAIT) #263

off2 is sometimes very slow (processes in IOWAIT) #263

stephanegigandet commented Oct 10, 2023

stephanegigandet commented Oct 10, 2023

alexgarel commented Oct 10, 2023

john-gom commented Oct 13, 2023

alexgarel commented Oct 16, 2023

alexgarel commented Oct 17, 2023

alexgarel commented Oct 17, 2023

stephanegigandet commented Oct 18, 2023

alexgarel commented Oct 18, 2023 •

edited

Loading

stephanegigandet commented Oct 18, 2023

stephanegigandet commented Oct 18, 2023

teolemon commented Sep 23, 2024

cquest commented Oct 26, 2024

alexgarel commented Nov 4, 2024

off2 is sometimes very slow (processes in IOWAIT) #263

off2 is sometimes very slow (processes in IOWAIT) #263

Comments

stephanegigandet commented Oct 10, 2023

stephanegigandet commented Oct 10, 2023

alexgarel commented Oct 10, 2023

john-gom commented Oct 13, 2023

alexgarel commented Oct 16, 2023

alexgarel commented Oct 17, 2023

alexgarel commented Oct 17, 2023

stephanegigandet commented Oct 18, 2023

alexgarel commented Oct 18, 2023 • edited Loading

stephanegigandet commented Oct 18, 2023

stephanegigandet commented Oct 18, 2023

teolemon commented Sep 23, 2024

iostat --pretty --human

cquest commented Oct 26, 2024

alexgarel commented Nov 4, 2024

alexgarel commented Oct 18, 2023 •

edited

Loading