docs: off1 hardware upgrade #278

Merged · 20 commits · Feb 12, 2024

Changes from 8 commits
3 changes: 2 additions & 1 deletion docs/promox.md
@@ -299,7 +299,7 @@ Then connect to the proxmox host:
* Install useful packages and do some other configurations:
`sudo /root/cluster-scripts/ct_postinstall` (or `/opt/openfoodfacts-infrastructure/scripts/proxmox-management/ct_postinstall`) choose the container ID when asked.

-See [scripts/proxmox-management/ct_postinstall](https://github.com/openfoodfacts/openfoodfacts-infrastructure/blob/637405791d49abe03f667dae22bd89399ec3c53e/scripts/proxmox-management/ct_postinstall)
+See [scripts/proxmox-management/ct_postinstall](https://github.com/openfoodfacts/openfoodfacts-infrastructure/blob/develop/scripts/proxmox-management/ct_postinstall)

* [create a user](#how-to-create-a-user-in-a-container-or-vm)

@@ -316,6 +316,7 @@ Using the web interface:
* Target: ovh3
* Schedule: */5 if you want every 5 minutes (takes less than 10 seconds, thanks to ZFS)

+Also think about [configuring email](./mail.md#postfix-configuration) in the container

## Logging in to a container or VM

4 changes: 2 additions & 2 deletions docs/reports/2023-02-17-off2-upgrade.md
@@ -281,7 +281,7 @@ We set some properties and rename it from off-zfs to zfs-nvme and create the zfs
**EDIT:** on 2023-06-13, I re-created the zpool (it was lost in between, until we changed nvme disks).
```bash
$ zpool destroy testnvme
-$ zpool create -o ashift=12 testnvme mirror nvme1n1 nvme0n1
+$ zpool create -o ashift=12 zfs-nvme mirror nvme1n1 nvme0n1
$ zpool add zfs-nvme log nvme2n1
zpool status zfs-nvme
pool: zfs-nvme
@@ -308,7 +308,7 @@ we also receive the data from rpool2 back here:

### zfs-hdd pool

-First we create partitions for those rpool.
+First we create partitions for this new pool.
For each sda/sdb/sdc/sdd:
```bash
parted /dev/sdX mkpart zfs-hdd zfs 70g 100%
308 changes: 308 additions & 0 deletions docs/reports/2023-12-08-off1-upgrade.md
@@ -0,0 +1,308 @@
# 2023-12-18 off1 upgrade

This is the same operation as [2023-02-17 Off2 Upgrade](./2023-02-17-off2-upgrade.md)
but for off1.

We will:
* add four 14T disks
* add an adapter card for SSD
* add two 2T nvme disks and one 14G Optane, while keeping the existing nvme
* completely reinstall the system with Proxmox 7.4.1
* rpool in mirror-0 for the system
  * using a 70GB partition on all hdd disks
* zfs-hdd in raidz1-0 for the data
  * using a 14T-70G partition on all hdd disks
* zfs-nvme in raidz1-0 for data that needs to be fast
  * using the two 2T nvme
  * using an 8G partition on the Optane for logs

## 2023-12-18 server physical upgrade at the Free datacenter

## 2023-12-21 continuing server install

### First ssh connection

Using root: `ssh root@off1 -o PubkeyAuthentication=no`

### Update and add some base packages

Replace `/etc/apt/sources.list.d/pve-enterprise.list` with `/etc/apt/sources.list.d/pve-install-repo.list`, pointing at the no-subscription repository.
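The report doesn't reproduce the content of that file; for Proxmox 7.x on Debian Bullseye it would presumably be the standard no-subscription line (an assumption, not taken from the report):

```conf
# /etc/apt/sources.list.d/pve-install-repo.list -- assumed content
deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription
```

Then update and install base packages: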

```bash
apt update
apt upgrade
apt install munin-node sudo vim parted tree etckeeper rsync screen fail2ban git curl htop lsb-release bsd-mailx
```

### Configure locale

```bash
vim /etc/locale.gen
# uncomment fr_FR.UTF-8, and exit
locale-gen
```
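A quick check that the locale was actually generated (not in the original report):

```bash
# should list fr_FR.utf8 among the available locales
locale -a | grep -i fr_FR
```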

### Network configuration

In `/etc/network/interfaces` we added the `vmbr1` bridge interface (on eno2):
```conf
auto vmbr1
iface vmbr1 inet static
address 10.0.0.1/8
bridge-ports eno2
bridge-stp off
bridge-fd 0
post-up echo 1 > /proc/sys/net/ipv4/ip_forward
post-up iptables -t nat -A POSTROUTING -s '10.1.0.0/16' -o vmbr0 -j MASQUERADE
post-down iptables -t nat -D POSTROUTING -s '10.1.0.0/16' -o vmbr0 -j MASQUERADE
```

Then `systemctl restart networking`

And verify with `ip address list` and `ip route list`:
```
default via 213.36.253.222 dev vmbr0 proto kernel onlink
10.0.0.0/8 dev vmbr1 proto kernel scope link src 10.0.0.1
213.36.253.192/27 dev vmbr0 proto kernel scope link src 213.36.253.206
```
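To confirm that forwarding and the MASQUERADE rule are actually in place, something like this can be used (a verification sketch, not in the original report):

```bash
sysctl net.ipv4.ip_forward                              # expect: net.ipv4.ip_forward = 1
iptables -t nat -L POSTROUTING -n -v | grep MASQUERADE  # the post-up NAT rule should show up here
```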

### Creating users


First we created the off user (to ensure it gets UID 1000):

```bash
adduser --shell /usr/sbin/nologin off
```

Then add other sudo users, like:
```bash
adduser alex
...
adduser alex sudo
```
and copy ssh keys.

I also copied the password hashes from off2 to off1 (users can then decide to change their password afterwards).
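The report doesn't show the exact commands for that; a minimal sketch, assuming a hypothetical user `alex` (the hash has to be carried between the two hosts manually or over ssh):

```bash
# on off2: extract the existing password hash (second field of /etc/shadow)
HASH=$(sudo grep '^alex:' /etc/shadow | cut -d: -f2)

# on off1: set that hash on the freshly created account
sudo usermod -p "$HASH" alex
```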

### Creating ZFS pools

We already have rpool where the distribution is installed.


#### Partition disks


First we need to create partitions on our four HDDs to be part of `zfs-hdd`.
They already have a partition for the system (participating in `rpool`).

Running:
```bash
for name in a b c d; do parted /dev/sd$name print; done
```
shows us the same pattern on all disks:
```
Model: ATA TOSHIBA MG07ACA1 (scsi)
Disk /dev/sdd: 14,0TB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 17,4kB 1049kB 1031kB bios_grub
2 1049kB 1075MB 1074MB fat32 boot, esp
3 1075MB 69,8GB 68,7GB zfs
```

We can print sectors to know more precisely where to start the next partition:

```bash
for name in a b c d; do parted /dev/sd$name 'unit s print'; done

...
Number Start End Size File system Name Flags
...
3 2099200s 136314880s 134215681s zfs

```

We add 2048 to 136314880 to keep the next partition aligned: 136314880 + 2048 = 136316928, which is a multiple of 2048 sectors (1 MiB).

So now we create the partitions on the remaining space:
```bash
for name in a b c d; do parted /dev/sd$name mkpart zfs-hdd zfs 136316928s 100%; done
```
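Optionally, alignment of the new partition (number 4 on each disk) can be double-checked with parted (not part of the original report):

```bash
# prints "4 aligned" for each disk if the partition sits on an optimal boundary
for name in a b c d; do parted /dev/sd$name align-check optimal 4; done
```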

We also want to partition the SSDs and the Optane.

Listing them to know which is which:
```bash
for device in /dev/nvme?;do echo $device "--------";smartctl -a $device|grep -P '(Model|Size/Capacity)';done

/dev/nvme0 --------
Model Number: WD_BLACK SN770 2TB
Namespace 1 Size/Capacity: 2 000 398 934 016 [2,00 TB]
/dev/nvme1 --------
Model Number: INTEL MEMPEK1J016GA
Namespace 1 Size/Capacity: 14 403 239 936 [14,4 GB]
/dev/nvme2 --------
Model Number: Samsung SSD 970 EVO Plus 1TB
Namespace 1 Size/Capacity: 1 000 204 886 016 [1,00 TB]
```
(at the time of install, one 2TB SSD was missing because it was impossible to boot with it).

So:
* /dev/nvme0 is the 2TB SSD
* /dev/nvme1 is the Optane (14,4 GB)
* /dev/nvme2 is the (old) 1TB SSD

I follow the partitioning of off2:

```bash
# 2TB SSD is devoted entirely to zfs-nvme
parted /dev/nvme0n1 mklabel gpt
parted /dev/nvme0n1 mkpart zfs-nvme zfs 2048s 100%

# Optane is split into log partitions for zfs-nvme and zfs-hdd
parted /dev/nvme1n1 mklabel gpt
parted /dev/nvme1n1 \
mkpart log-zfs-nvme zfs 2048s 50% \
mkpart log-zfs-hdd zfs 50% 100%

# 1TB SSD will be cache for zfs-hdd
parted /dev/nvme2n1 mklabel gpt
# we need to have an xfs partition, I don't know exactly why!
# but without it, the zfs partition is changed to an xfs one…
parted /dev/nvme2n1 \
mkpart xfs 2048s 64G \
mkpart zfs-hdd-cache zfs 64G 100%
```

We can see all our partitions on the disks:
```bash
ls /dev/sd?? /dev/nvme?n1p?

lsblk
```

#### Creating zfs pools

We create a zfs-hdd pool in raidz1 over the HDD partitions (sda4, sdb4, sdc4 and sdd4), set some properties, then add a partition on the Optane disk as log and one on the 1TB SSD as cache.

```bash
zpool create zfs-hdd -o ashift=12 raidz1 sda4 sdb4 sdc4 sdd4
zfs set compress=on xattr=sa atime=off zfs-hdd
zpool add zfs-hdd log nvme1n1p2
zpool add zfs-hdd cache nvme2n1p2
```

Note:
Doing the latter I got: `/dev/nvme1n1p2 is part of potentially active pool 'zfs-hdd'`.
This is because the Optane was used in a previous install on off2.
I just did: `zpool labelclear -f nvme1n1`, `zpool labelclear -f nvme1n1p2` and `zpool labelclear -f nvme2n1p2`
(in fact, I tried to clear the label on every not-yet-used partition).

We create a zfs-nvme pool, but with only nvme0n1p1 (as it's the only 2TB SSD for now), set some properties, and add a partition on the Optane disk as log. It can't be a mirror yet because there is only one device… we will have to back up, destroy and re-create it with the new nvme to be able to have a mirror.

```bash
zpool create zfs-nvme -o ashift=12 nvme0n1p1
zfs set compress=on xattr=sa atime=off zfs-nvme
zpool add zfs-nvme log nvme1n1p1
```
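To double-check both pools and the properties set above (a quick verification, not in the original report):

```bash
# layout of both pools, including log and cache devices
zpool status zfs-hdd zfs-nvme
# confirm compression, xattr and atime settings on each pool
zfs get compression,xattr,atime zfs-hdd zfs-nvme
```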

## Joining PVE cluster

It's time to join the cluster. We will join it using the internal IP!


### preparing /etc/hosts

I edited /etc/hosts on off1 to have:

```conf
127.0.0.1 localhost.localdomain localhost
# 213.36.253.206 off1.openfoodfacts.org off1
10.0.0.1 off1.openfoodfacts.org off1 pve-localhost
10.0.0.2 off2.openfoodfacts.org off2
...
```

And on off2:
```conf
127.0.0.1 localhost.localdomain localhost
10.0.0.2 off2.openfoodfacts.org off2 pve-localhost
#213.36.253.208 off2.openfoodfacts.org off2
10.0.0.1 off1.openfoodfacts.org off1
...
```

### creating the cluster on off2

See [official docs](https://pve.proxmox.com/pve-docs-6/chapter-pvecm.html)

On off2:
```bash
pvecm create off-free
pvecm status
```

### joining the cluster from off1

On off1
```bash
pvecm add 10.0.0.2 --fingerprint "43:B6:2A:DC:BF:17:C8:70:8F:3C:A4:A8:2D:D5:F8:24:18:6B:78:6D:24:8A:65:DA:71:04:A3:FE:E0:45:DE:B6"
```

Note: the first time I did it without the `--fingerprint` option.
I verified the fingerprint by looking at the certificate of the Proxmox manager in Firefox.
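An alternative way to obtain the fingerprint directly on off2 (a sketch, assuming the default pveproxy certificate path):

```bash
# on off2: print the SHA-256 fingerprint of the node certificate
openssl x509 -in /etc/pve/local/pve-ssl.pem -noout -fingerprint -sha256
```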

### Using systemd-timesyncd

As [proposed by the Proxmox guide](https://pve.proxmox.com/pve-docs-6/pve-admin-guide.html#_time_synchronization),
I installed `systemd-timesyncd` on off1 and off2.
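The corresponding commands would look roughly like this (a sketch; the report only mentions the install):

```bash
apt install systemd-timesyncd
systemctl enable --now systemd-timesyncd
timedatectl status   # check that the NTP service is active and the clock is synchronized
```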


### adding the storages

We create `pve` datasets on each pool:

```bash
zfs create zfs-hdd/pve
zfs create zfs-nvme/pve
```
They are immediately available in Proxmox!

```
pvesm status
Name Type Status Total Used Available %
backups dir active 39396965504 256 39396965248 0.00%
local dir active 64475008 11079168 53395840 17.18%
zfs-hdd zfspool active 39396965446 139 39396965307 0.00%
zfs-nvme zfspool active 1885863288 96 1885863192 0.00%
```

Also, the backups dir automatically ends up on zfs-hdd, but I don't really know why!
`cat /etc/pve/storage.cfg` helps to see that.
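For reference, a `zfspool` entry in that file has roughly this shape (illustrative values only, not the actual configuration):

```conf
zfspool: zfs-hdd
        pool zfs-hdd
        content images,rootdir
```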


## Getting container templates

See [proxmox docs on container images](https://pve.proxmox.com/wiki/Linux_Container#pct_container_images)

```bash
pveam update
pveam available|grep 'debian-.*-standard'
pveam download local debian-11-standard_11.7-1_amd64.tar.zst
pveam download local debian-12-standard_12.2-1_amd64.tar.zst
```
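The downloaded templates can then be listed (a quick check, not in the report):

```bash
pveam list local
```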

## Adding openfoodfacts-infrastructure repository

Added the root ssh public key (`cat /root/.ssh/id_rsa.pub`) as a [deploy key to the GitHub infrastructure repository](https://github.com/openfoodfacts/openfoodfacts-infrastructure/settings/keys)



```bash
cd /opt
git clone git@github.com:openfoodfacts/openfoodfacts-infrastructure.git
```
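To check that the deploy key is accepted by GitHub (optional; GitHub just prints a greeting and closes the connection):

```bash
ssh -T git@github.com
```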