docs: ks1 install #411

Merged: 9 commits, Oct 17, 2024
4 changes: 4 additions & 0 deletions .github/labeler.yml
@@ -25,6 +25,10 @@ off1:
- changed-files:
- any-glob-to-any-file: '**/*off1*'

ks1:
- changed-files:
- any-glob-to-any-file: '**/*ks1*'

ovh1:
- changed-files:
- any-glob-to-any-file: '**/*ovh1*'
5 changes: 3 additions & 2 deletions docs/mail.md
@@ -137,7 +137,8 @@ We normally keep a standard `/etc/aliases`.
We have specific groups to receive emails: `[email protected]` and `[email protected]`

You may add some redirections for non standard users to one of those groups.
Do not forget to run `newaliases`, and [`etckeeper`](./linux-server.md#etckeeper).
Do not forget to run `newaliases`, and [`etckeeper`](./linux-server.md#etckeeper)
and restart the postfix service (`postfix.service` and/or `[email protected]`).

### Postfix configuration

@@ -159,7 +160,7 @@ Run `dpkg-reconfigure postfix`:
**IMPORTANT:**
On some systems, the real daemon is not `postfix.service` but `[email protected]`.

(so eg., if you touch `/etc/alias` (with after `sudo newaliases`) you need to `systemctl reload [email protected]`
So, for example, if you modify `/etc/aliases` (followed by `sudo newaliases`), you need to run `systemctl reload [email protected]`.

### Exim4 configuration

2 changes: 1 addition & 1 deletion docs/reports/2024-06-05-off1-reverse-proxy-install.md
@@ -26,7 +26,7 @@ Network: name=eth0,bridge=vmbr1,ip=10.1.0.100/24,gw=10.0.0.1

I then simply install `nginx` using apt.

I also [configure postfix](../mail#postfix-configuration) and tested it.
I also [configured postfix](../mail.md#postfix-configuration) and tested it.

### Adding the IP

240 changes: 236 additions & 4 deletions docs/reports/2024-09-24-kimsufi-stor-ks1-installation.md
@@ -1,28 +1,260 @@
# Kimsufi STOR - ks1 installation

## Rationale for new server

We have performance issues on off1 and off2 that are becoming unbearable; in particular, disk usage on off2 is so high that 60% of processes are in iowait state.

We just moved image serving from off2 to off1 today (24/09/2024), but that just moves the problem to off1.

We are thus installing a new cheap Kimsufi server to see if we can move the serving of images to it.

## Server specs

KS-STOR - Intel Xeon-D 1521 - 4 c / 8 t - 16 GB RAM - 4x 6 TB HDD + 500 GB SSD

## Install

We create an A record ks1.openfoodfacts.org pointing to the IP of the server: 217.182.132.133.
In OVH's console, we rename the server to ks1.openfoodfacts.org.

In the OVH console, we install Debian 12 Bookworm on the SSD.

**IMPORTANT:** this was not an optimal choice; we should have reserved part of the SSD to use as a cache drive for the ZFS pool.

Once the install is complete, OVH sends the credentials by email.

We add users for the admin(s) and give sudo access:

```bash
sudo usermod -aG sudo [username]
```
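
For completeness, creating an admin user typically looks like this (a sketch; `alice` is a placeholder username):

```bash
# create the user and their home directory (interactive prompts for password, etc.)
sudo adduser alice
# add them to the sudo group, as above
sudo usermod -aG sudo alice
# check group membership
id alice
```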

Set the hostname: `hostnamectl hostname ks1`

I also manually ran the usual commands found in ct_postinstall.

I also followed [How to have server config in git](../how-to-have-server-config-in-git.md)

I also added the email-on-failure systemd unit.
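
As a sketch of what such an override looks like, assuming a template unit (here called `email-failure@.service`) that actually sends the mail; the real unit name in our repo may differ:

```bash
# hypothetical example for a unit named example.service
sudo mkdir -p /etc/systemd/system/example.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/example.service.d/email-on-failure.conf
[Unit]
OnFailure=email-failure@%n.service
EOF
sudo systemctl daemon-reload
```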

I edited `/etc/netplan/50-cloud-init.yaml` to add the default DNS search domain:
```yaml
network:
  version: 2
  ethernets:
    eno3:
      # (...)
      nameservers:
        search: [openfoodfacts.org]
```
and ran `netplan try`.

## Email

Email is important to send alerts on service failures.

I also configured email by removing exim4 and installing postfix.
```bash
sudo apt purge exim4-base exim4-config && \
sudo apt install postfix bsd-mailx
```
and following [Server, postfix configuration](../mail.md#postfix-configuration).

I also had to add the ks1 IP address to the [forwarding rules on ovh1 to the mail gateway](../mail.md#redirects):
```bash
iptables -t nat -A PREROUTING -s 217.182.132.133 -d pmg.openfoodfacts.org -p tcp --dport 25 -j DNAT --to 10.1.0.102:25
iptables-save > /etc/iptables/rules.v4.new
# control
diff /etc/iptables/rules.v4{,.new}
mv /etc/iptables/rules.v4{.new,}
etckeeper commit "added rule for ks1 email"
```

Test from ks1:
```bash
echo "test message from ks1" |mailx -s "test root ks1" -r [email protected] root
```

## Install and setup ZFS

### Install ZFS
```bash
sudo apt install zfsutils-linux
sudo /sbin/modprobe zfs
```

I added the `zfs.conf` file to `/etc/modprobe.d`, then ran `update-initramfs -u -k all`.
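
As an illustration, such a `zfs.conf` typically just caps the ARC size; the value below is an example, not necessarily the one used here:

```bash
# illustrative only: limit the ZFS ARC to 4 GiB
echo "options zfs zfs_arc_max=4294967296" > /etc/modprobe.d/zfs.conf
```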

### Create ZFS pool

`lsblk` shows the existing disks. The 4 HDDs are available; the system is installed on the NVMe SSD.

So I created the pool with them (see [How to create a zpool](../zfs-overview.md#how-to-create-a-zpool)):

```bash
zpool create zfs-hdd /dev/sd{a,b,c,d}
```

### Setup compression

We want to enable compression on the pool.

```bash
zfs set compression=on zfs-hdd
```
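
To check that compression is active, and later how effective it is:

```bash
zfs get compression,compressratio zfs-hdd
```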

Note: in reality compression was not enabled from the start; I enabled it after the first snapshot sync, as I saw the data was taking much more space than on the original server.

### Fine tune zfs

Set `atime=off` and `relatime=off` on the ZFS dataset `zfs-hdd/off/images` to avoid unnecessary writes.
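
The corresponding commands would be along these lines (note that ZFS expresses these properties as `on`/`off`):

```bash
zfs set atime=off zfs-hdd/off/images
zfs set relatime=off zfs-hdd/off/images
# verify
zfs get atime,relatime zfs-hdd/off/images
```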

## Install sanoid / syncoid

I installed the sanoid.deb that I got from the off1 server.

```bash
apt install libcapture-tiny-perl libconfig-inifiles-perl
apt install lzop mbuffer pv
dpkg -i /home/alex/sanoid_2.2.0_all.deb
```

## Sync data

After installing sanoid, I am ready to sync data.

I first create an `off` dataset to have the same structure as on the other servers:
```bash
zfs create zfs-hdd/off
```

I'll sync the data from ovh3 since it's in the same data center.

I created a ks1operator user on ovh3, following [creating operator on PROD_SERVER](../sanoid.md#creating-operator-on-prod_server).

I also had to create a symlink (`ln -s /usr/sbin/zfs /usr/bin/zfs`) on ovh3.

Then I used:

```bash
time syncoid --no-sync-snap --no-privilege-elevation [email protected]:rpool/off/images zfs-hdd/off/images
```

It took 3594 minutes, that is about 60 hours, or 2.5 days.

I removed the old snapshots (old style) from ks1, as they are not needed here:
```bash
for f in $(zfs list -t snap -o name zfs-hdd/off/images | grep "images@202"); do zfs destroy $f; done
```
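
Note: to double-check what such a loop will destroy, one can first list the matching snapshots:

```bash
zfs list -t snap -o name zfs-hdd/off/images | grep "images@202"
```
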
The other snapshots will normally be pruned by sanoid.

## Configure sanoid

I created the sanoid and syncoid configuration.
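
For illustration, the sanoid part could look something like this (a purely hypothetical sketch: the dataset name follows this report, but the template name and retention values are placeholders; the real configuration lives in the repo):

```bash
# sketch only: since zfs-hdd/off/images is received from ovh3, we only prune
# snapshots locally (autosnap=no), as recommended in docs/sanoid.md
cat <<'EOF' >> /etc/sanoid/sanoid.conf
[zfs-hdd/off/images]
  use_template = synced

[template_synced]
  autosnap = no
  autoprune = yes
  daily = 14
  monthly = 2
EOF
```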

I added ks1operator on off2.

Finally, I also installed the standard sanoid / syncoid systemd units and the sanoid_check unit, and enabled them:

```bash
systemctl enable --now sanoid.timer
systemctl enable syncoid.service
systemctl enable --now sanoid_check.timer
```

## Firewall

As the setup will be simple (no masquerading / forwarding), we will use ufw.

```bash
apt install ufw

ufw allow OpenSSH
ufw allow http
ufw allow https
ufw default deny incoming
ufw default allow outgoing

# verify
ufw show added
# go
ufw enable
```

fail2ban is already installed, but failing with:
```
Failed during configuration: Have not found any log file for sshd jail
```
This is because the sshd daemon logs to systemd-journald, not to a log file.
To fix that, I modified `/etc/fail2ban/jail.d/defaults-debian.conf` to be:
```ini
[sshd]
enabled = true
backend = systemd
```
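
Then restart fail2ban and check that the sshd jail is active:

```bash
systemctl restart fail2ban
fail2ban-client status sshd
```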

Addendum: after Christian installed the Munin node, I also opened port 4949.
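
The corresponding rule would be something along the lines of:

```bash
# allow Munin node traffic (optionally restrict it to the Munin master's IP)
ufw allow 4949
```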

## NGINX

### Install

I installed nginx and certbot:
```bash
apt install nginx
apt install python3-certbot python3-certbot-nginx
```

I also added the `nginx.service.d` override to send email on failure.

### Configure

Created `confs/ks1/nginx/sites-available/images-off`, akin to the off1 configuration.

`ln -s /opt/openfoodfacts-infrastructure/confs/ks1/nginx/sites-available/images-off /etc/nginx/sites-enabled/images-off`
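
Once the certificates referenced by the site configuration are in place (next section), check and reload nginx:

```bash
nginx -t
systemctl reload nginx
```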

### Certificates

As I can't use certbot until the DNS points to this server, I copied the certificates from off1.

```bash
ssh -A off1
sudo -E bash
# see active certificates
ls -l /etc/letsencrypt/live/images.openfoodfacts.org/
# here it's 19, copy them
scp /etc/letsencrypt/archive/images.openfoodfacts.org/*19* [email protected]:

exit
exit
```

On ks1:
```bash
mkdir -p /etc/letsencrypt/{live,archive}/images.openfoodfacts.org
mv /home/alex/*19* /etc/letsencrypt/archive/images.openfoodfacts.org/
ln -s /etc/letsencrypt/archive/images.openfoodfacts.org/cert19.pem /etc/letsencrypt/live/images.openfoodfacts.org/cert.pem
ln -s /etc/letsencrypt/archive/images.openfoodfacts.org/chain19.pem /etc/letsencrypt/live/images.openfoodfacts.org/chain.pem
ln -s /etc/letsencrypt/archive/images.openfoodfacts.org/fullchain19.pem /etc/letsencrypt/live/images.openfoodfacts.org/fullchain.pem
ln -s /etc/letsencrypt/archive/images.openfoodfacts.org/privkey19.pem /etc/letsencrypt/live/images.openfoodfacts.org/privkey.pem
chown -R root:root /etc/letsencrypt/
chmod go-rwx /etc/letsencrypt/{live,archive}
```
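
To double-check that the copied certificate is the expected one and not expired:

```bash
openssl x509 -in /etc/letsencrypt/live/images.openfoodfacts.org/cert.pem -noout -subject -dates
```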

## Testing

On my host, I modified `/etc/hosts` to have:
```hosts
217.182.132.133 images.openfoodfacts.org
```
and visited the website with my browser, with developer tools open.

I can also use curl:
```bash
curl --resolve images.openfoodfacts.org:443:217.182.132.133 https://images.openfoodfacts.org/images/products/087/366/800/2989/front_fr.3.400.jpg --output /tmp/front_fr.jpg -v
xdg-open /tmp/front_fr.jpg
```
10 changes: 7 additions & 3 deletions docs/sanoid.md
@@ -152,12 +152,13 @@ mkdir /home/$OPERATOR/.ssh
vim /home/$OPERATOR/.ssh/authorized_keys
# copy BACKUP_SERVER root public key

chown -R /home/$OPERATOR
chown $OPERATOR:$OPERATOR -R /home/$OPERATOR
chmod go-rwx -R /home/$OPERATOR/.ssh
```

Adding the needed permissions to pull ZFS syncs:
```bash
# choose the right dataset according to your needs
zfs allow $OPERATOR hold,send zfs-hdd
zfs allow $OPERATOR hold,send zfs-nvme
zfs allow $OPERATOR hold,send rpool
@@ -169,7 +170,7 @@ On BACKUP_SERVER, test ssh connection:

```bash
OPERATOR=${BACKUP_SERVER}operator
ssh $OPERATOR@<ip for server>
ssh $OPERATOR@<ip or host>
```

#### config syncoid
@@ -187,4 +188,7 @@ Use `--recursive` to also back up subdatasets.

Don't forget to create a sane retention policy (with `autosnap=no`) in sanoid on $BACKUP_SERVER to remove old data.

**Note:** because of the 6h timeout, if you have big datasets, you may want to do the first synchronization before enabling the service.

**Important:** try to have a good hierarchy of datasets, and separate what belongs to the server itself from what comes from other servers.
Normally we put other servers' backups in an `off-backups` dataset. It's important not to mix it with the `backups` dataset, which is for the server itself.
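
For example, on the backup server, a dataset received from another server would live under a dedicated parent (names below are illustrative):

```bash
# on BACKUP_SERVER: keep other servers' data under a dedicated parent dataset
zfs create zfs-hdd/off-backups
syncoid --no-sync-snap --no-privilege-elevation \
  <operator>@<prod-server>:rpool/off/images zfs-hdd/off-backups/images
```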