docs: ks1 install #411

Merged: 9 commits, Oct 17, 2024
4 changes: 4 additions & 0 deletions .github/labeler.yml
@@ -25,6 +25,10 @@ off1:
- changed-files:
- any-glob-to-any-file: '**/*off1*'

ks1:
- changed-files:
- any-glob-to-any-file: '**/*ks1*'

ovh1:
- changed-files:
- any-glob-to-any-file: '**/*ovh1*'
5 changes: 3 additions & 2 deletions docs/mail.md
@@ -137,7 +137,8 @@ We normally keep a standard `/etc/aliases`.
We have specific groups to receive emails: `[email protected]` and `[email protected]`

You may add some redirections for non standard users to one of those groups.
Do not forget to run `newaliases`, and [`etckeeper`](./linux-server.md#etckeeper).
Do not forget to run `newaliases`, and [`etckeeper`](./linux-server.md#etckeeper)
and restart the postfix service (`postfix.service` and/or `[email protected]`).

### Postfix configuration

@@ -159,7 +160,7 @@ Run `dpkg-reconfigure postfix`:
**IMPORTANT:**
On some systems, the real daemon is not `postfix.service` but `[email protected]`.

(so eg., if you touch `/etc/alias` (with after `sudo newaliases`) you need to `systemctl reload [email protected]`
So, for example, if you modify `/etc/aliases` (followed by `sudo newaliases`), you need to run `systemctl reload [email protected]`.

### Exim4 configuration

2 changes: 1 addition & 1 deletion docs/reports/2024-06-05-off1-reverse-proxy-install.md
@@ -26,7 +26,7 @@ Network: name=eth0,bridge=vmbr1,ip=10.1.0.100/24,gw=10.0.0.1

I then simply install `nginx` using apt.

I also [configure postfix](../mail#postfix-configuration) and tested it.
I also [configured postfix](../mail.md#postfix-configuration) and tested it.

### Adding the IP

240 changes: 236 additions & 4 deletions docs/reports/2024-09-24-kimsufi-stor-ks1-installation.md
@@ -1,28 +1,260 @@
# Kimsufi STOR - ks1 installation

## Rationale for new server

We have performance issues on off1 and off2 that are becoming unbearable; in particular, disk usage on off2 is so high that 60% of processes are in iowait state.

We just moved image serving from off2 to off1 today (24/09/2024), but that just moves the problem to off1.

We are thus installing a new cheap Kimsufi server to see if we can move the serving of images to it.

## Server specs

KS-STOR - Intel Xeon-D 1521 - 4 c / 8 t - 16 GB RAM - 4x 6 TB HDD + 500 GB SSD

## Install

We create an A record ks1.openfoodfacts.org pointing to the IP of the server: 217.182.132.133.
In OVH's console, we rename the server to ks1.openfoodfacts.org.

In the OVH console, we install Debian 12 Bookworm on the SSD.

**IMPORTANT:** this was not an optimal choice; we should have reserved part of the SSD to use as a cache drive for the ZFS pool.

Once the install is complete, OVH sends the credentials by email.

We add users for the admin(s) and give sudo access:

```bash
sudo usermod -aG sudo [username]
```
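
For completeness, creating an admin user typically looks like this (a sketch; `alice` is a placeholder username):

```bash
# create the user and their home directory (interactive prompts for password, etc.)
sudo adduser alice
# add them to the sudo group, as above
sudo usermod -aG sudo alice
# check group membership
id alice
```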

Set the hostname: `hostnamectl hostname ks1`

I also manually ran the usual commands found in ct_postinstall.

I also followed [How to have server config in git](../how-to-have-server-config-in-git.md)

I also added the email-on-failure systemd unit.
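
As a sketch of what such an override looks like, assuming a template unit (here called `email-failure@.service`) that actually sends the mail; the real unit name in our repo may differ:

```bash
# hypothetical example for a unit named example.service
sudo mkdir -p /etc/systemd/system/example.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/example.service.d/email-on-failure.conf
[Unit]
OnFailure=email-failure@%n.service
EOF
sudo systemctl daemon-reload
```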

I edited `/etc/netplan/50-cloud-init.yaml` to add the default DNS search domain:
```yaml
network:
  version: 2
  ethernets:
    eno3:
      # (...)
      nameservers:
        search: [openfoodfacts.org]
```
and ran `netplan try`.

## Email

Email is important to send alerts on service failures.

I also configured email by removing exim4 and installing postfix.
```bash
sudo apt purge exim4-base exim4-config && \
sudo apt install postfix bsd-mailx
```
and following [Server, postfix configuration](../mail.md#postfix-configuration).

I also had to add the ks1 IP address to the [forwarding rules on ovh1 to the mail gateway](../mail.md#redirects):
```bash
iptables -t nat -A PREROUTING -s 217.182.132.133 -d pmg.openfoodfacts.org -p tcp --dport 25 -j DNAT --to 10.1.0.102:25
iptables-save > /etc/iptables/rules.v4.new
# control
diff /etc/iptables/rules.v4{,.new}
mv /etc/iptables/rules.v4{.new,}
etckeeper commit "added rule for ks1 email"
```

Test from ks1:
```bash
echo "test message from ks1" |mailx -s "test root ks1" -r [email protected] root
```

## Install and setup ZFS

### Install ZFS
```bash
sudo apt install zfsutils-linux
sudo /sbin/modprobe zfs
```

I added the `zfs.conf` file to `/etc/modprobe.d`, then ran `update-initramfs -u -k all`.
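
As an illustration, such a `zfs.conf` typically just caps the ARC size; the value below is an example, not necessarily the one used here:

```bash
# illustrative only: limit the ZFS ARC to 4 GiB
echo "options zfs zfs_arc_max=4294967296" > /etc/modprobe.d/zfs.conf
```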

### Create ZFS pool

`lsblk` shows the existing disks. The 4 HDDs are available; the system is installed on the NVMe SSD.

So I created the pool with them (see [How to create a zpool](../zfs-overview.md#how-to-create-a-zpool)):

```bash
zpool create zfs-hdd /dev/sd{a,b,c,d}
```

### Setup compression

We want to enable compression on the pool.

```bash
zfs set compression=on zfs-hdd
```
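
To check that compression is active, and later how effective it is:

```bash
zfs get compression,compressratio zfs-hdd
```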

Note: in reality compression was not enabled from the start; I enabled it after the first snapshot sync, as I saw the data was taking much more space than on the original server.

### Fine tune zfs

Set `atime=off` and `relatime=off` on the ZFS dataset `zfs-hdd/off/images` to avoid unnecessary writes.
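
The corresponding commands would be along these lines (note that ZFS expresses these properties as `on`/`off`):

```bash
zfs set atime=off zfs-hdd/off/images
zfs set relatime=off zfs-hdd/off/images
# verify
zfs get atime,relatime zfs-hdd/off/images
```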

## Install sanoid / syncoid

I installed the sanoid.deb that I got from the off1 server.

```bash
apt install libcapture-tiny-perl libconfig-inifiles-perl
apt install lzop mbuffer pv
dpkg -i /home/alex/sanoid_2.2.0_all.deb
```

## Sync data

After installing sanoid, I am ready to sync data.

I first create an `off` dataset to have the same structure as on the other servers:
```bash
zfs create zfs-hdd/off
```

I'll sync the data from ovh3 since it's in the same data center.

I created a ks1operator user on ovh3, following [creating operator on PROD_SERVER](../sanoid.md#creating-operator-on-prod_server).

I also had to create a symlink (`ln -s /usr/sbin/zfs /usr/bin/zfs`) on ovh3.

Then I used:

```bash
time syncoid --no-sync-snap --no-privilege-elevation [email protected]:rpool/off/images zfs-hdd/off/images
```

It took 3594 minutes, that is about 60 hours, or 2.5 days.

I removed the old snapshots (old style) from ks1, as they are not needed here:
```bash
for f in $(zfs list -t snap -o name zfs-hdd/off/images | grep "images@202"); do zfs destroy $f; done
```
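
Note: to double-check what such a loop will destroy, one can first list the matching snapshots:

```bash
zfs list -t snap -o name zfs-hdd/off/images | grep "images@202"
```
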
The other snapshots will normally be pruned by sanoid.

## Configure sanoid

I created the sanoid and syncoid configuration.
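
For illustration, the sanoid part could look something like this (a purely hypothetical sketch: the dataset name follows this report, but the template name and retention values are placeholders; the real configuration lives in the repo):

```bash
# sketch only: since zfs-hdd/off/images is received from ovh3, we only prune
# snapshots locally (autosnap=no), as recommended in docs/sanoid.md
cat <<'EOF' >> /etc/sanoid/sanoid.conf
[zfs-hdd/off/images]
  use_template = synced

[template_synced]
  autosnap = no
  autoprune = yes
  daily = 14
  monthly = 2
EOF
```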

I added ks1operator on off2.

Finally, I also installed the standard sanoid / syncoid systemd units and the sanoid_check unit, and enabled them:

```bash
systemctl enable --now sanoid.timer
systemctl enable syncoid.service
systemctl enable --now sanoid_check.timer
```

## Firewall

As the setup will be simple (no masquerading / forwarding), we will use ufw.

```bash
apt install ufw

ufw allow OpenSSH
ufw allow http
ufw allow https
ufw default deny incoming
ufw default allow outgoing

# verify
ufw show added
# go
ufw enable
```

fail2ban is already installed, but failing with:
```
Failed during configuration: Have not found any log file for sshd jail
```
This is because the sshd daemon logs to systemd-journald, not to a log file.
To fix that, I modified `/etc/fail2ban/jail.d/defaults-debian.conf` to be:
```ini
[sshd]
enabled = true
backend = systemd
```
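
Then restart fail2ban and check that the sshd jail is active:

```bash
systemctl restart fail2ban
fail2ban-client status sshd
```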

Addendum: after Christian installed the Munin node, I also opened port 4949.
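
The corresponding rule would be something along the lines of:

```bash
# allow Munin node traffic (optionally restrict it to the Munin master's IP)
ufw allow 4949
```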

## NGINX

### Install

I installed nginx and certbot:
```bash
apt install nginx
apt install python3-certbot python3-certbot-nginx
```

I also added the `nginx.service.d` override to send email on failure.

### Configure

Created `confs/ks1/nginx/sites-available/images-off`, akin to the off1 configuration.

`ln -s /opt/openfoodfacts-infrastructure/confs/ks1/nginx/sites-available/images-off /etc/nginx/sites-enabled/images-off`
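
Once the certificates referenced by the site configuration are in place (next section), check and reload nginx:

```bash
nginx -t
systemctl reload nginx
```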

### Certificates

As I can't use certbot until the DNS points to this server, I copied the certificates from off1.

```bash
ssh -A off1
sudo -E bash
# see active certificates
ls -l /etc/letsencrypt/live/images.openfoodfacts.org/
# here it's 19, copy them
scp /etc/letsencrypt/archive/images.openfoodfacts.org/*19* [email protected]:

exit
exit
```

On ks1:
```bash
mkdir -p /etc/letsencrypt/{live,archive}/images.openfoodfacts.org
mv /home/alex/*19* /etc/letsencrypt/archive/images.openfoodfacts.org/
ln -s /etc/letsencrypt/archive/images.openfoodfacts.org/cert19.pem /etc/letsencrypt/live/images.openfoodfacts.org/cert.pem
ln -s /etc/letsencrypt/archive/images.openfoodfacts.org/chain19.pem /etc/letsencrypt/live/images.openfoodfacts.org/chain.pem
ln -s /etc/letsencrypt/archive/images.openfoodfacts.org/fullchain19.pem /etc/letsencrypt/live/images.openfoodfacts.org/fullchain.pem
ln -s /etc/letsencrypt/archive/images.openfoodfacts.org/privkey19.pem /etc/letsencrypt/live/images.openfoodfacts.org/privkey.pem
chown -R root:root /etc/letsencrypt/
chmod go-rwx /etc/letsencrypt/{live,archive}
```
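
To double-check that the copied certificate is the expected one and not expired:

```bash
openssl x509 -in /etc/letsencrypt/live/images.openfoodfacts.org/cert.pem -noout -subject -dates
```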

## Testing

On my host, I modified `/etc/hosts` to have:
```hosts
217.182.132.133 images.openfoodfacts.org
```
and visited the website with my browser, with developer tools open.

I can also use curl:
```bash
curl --resolve images.openfoodfacts.org:443:217.182.132.133 https://images.openfoodfacts.org/images/products/087/366/800/2989/front_fr.3.400.jpg --output /tmp/front_fr.jpg -v
xdg-open /tmp/front_fr.jpg
```
10 changes: 7 additions & 3 deletions docs/sanoid.md
@@ -152,12 +152,13 @@ mkdir /home/$OPERATOR/.ssh
vim /home/$OPERATOR/.ssh/authorized_keys
# copy BACKUP_SERVER root public key

chown -R /home/$OPERATOR
chown $OPERATOR:$OPERATOR -R /home/$OPERATOR
chmod go-rwx -R /home/$OPERATOR/.ssh
```

Adding the needed permissions to pull ZFS syncs:
```bash
# choose the right dataset according to your needs
zfs allow $OPERATOR hold,send zfs-hdd
zfs allow $OPERATOR hold,send zfs-nvme
zfs allow $OPERATOR hold,send rpool
@@ -169,7 +170,7 @@ On BACKUP_SERVER, test ssh connection:

```bash
OPERATOR=${BACKUP_SERVER}operator
ssh $OPERATOR@<ip for server>
ssh $OPERATOR@<ip or host>
```

#### config syncoid
@@ -187,4 +188,7 @@ Use `--recursive` to also back up subdatasets.

Don't forget to create a sane retention policy (with `autosnap=no`) in sanoid on $BACKUP_SERVER to remove old data.

**Note:** because of the 6h timeout, if you have big datasets, you may want to do the first synchronization before enabling the service.

**Important:** try to have a good hierarchy of datasets, and separate what belongs to the server itself from what comes from other servers.
Normally we put other servers' backups in an `off-backups` dataset. It's important not to mix it with the `backups` dataset, which is for the server itself.
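
For example, on the backup server, a dataset received from another server would live under a dedicated parent (names below are illustrative):

```bash
# on BACKUP_SERVER: keep other servers' data under a dedicated parent dataset
zfs create zfs-hdd/off-backups
syncoid --no-sync-snap --no-privilege-elevation \
  <operator>@<prod-server>:rpool/off/images zfs-hdd/off-backups/images
```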