Update page about setting up a private Stratum 1 #157

Merged (26 commits) on Jun 5, 2024
Changes from 7 commits
0ecd2bb
update page for setting up a private stratum 1
bedroge Feb 15, 2024
d2cbf84
fix typo in bandwidth
bedroge Feb 15, 2024
12900ff
add sentence about SSH keys and sudo
bedroge Feb 15, 2024
6bbc5a0
make site-specific vars file optional
bedroge Feb 16, 2024
0ff336a
add warning about IPS
bedroge Feb 16, 2024
f830adb
added sentence about downside of https
bedroge Feb 16, 2024
b018fb4
change headers of subsections
bedroge Feb 16, 2024
a4851b0
add recommendation for having squid proxies
bedroge Apr 12, 2024
d555b4b
fix typo in mechanisms
bedroge Apr 12, 2024
2b32fa9
reword sentence about replicating from stratum 0 a bit
bedroge Apr 12, 2024
ba51497
discourage https
bedroge Apr 12, 2024
40668b1
extend paragraph about geo api, instructions for disabling it on the …
bedroge Apr 12, 2024
7c6742c
remove note about Squid proxy on Stratum 1, as it's now disabled by d…
bedroge Apr 12, 2024
0fac17c
remove cache hit example
bedroge Apr 12, 2024
93b1c34
use eessi.io instead of eessi-hpc.org
bedroge Apr 12, 2024
46a2f2f
fix typo in however
bedroge Apr 12, 2024
59e2f8d
remove -p ./roles in ansible-galaxy command
bedroge Jun 4, 2024
9825341
add link to stratum 1 page
bedroge Jun 4, 2024
c6adda4
add section about proxy configuration
bedroge Jun 4, 2024
5ac0a48
add section about configuring an additional stratum 1
bedroge Jun 4, 2024
52bbfd7
move client config part to native installation page
bedroge Jun 4, 2024
d8bb91e
fix link
bedroge Jun 4, 2024
47e3824
fix link
bedroge Jun 4, 2024
8c8ce18
correct paragraph about /srv
bedroge Jun 4, 2024
020139c
remove instructions for mounting an additional file system
bedroge Jun 4, 2024
9e97d4f
remove note, rearrange the sections, add section for larger systems
bedroge Jun 4, 2024
156 changes: 73 additions & 83 deletions docs/filesystem_layer/stratum1.md
@@ -1,44 +1,69 @@
# Setting up a Stratum 1

Setting up a Stratum 1 involves the following steps:

- set up the Stratum 1, preferably by running the Ansible playbook that we provide;
- request a Stratum 0 firewall exception for your Stratum 1 server;
- request a `<your site>.stratum1.cvmfs.eessi-infra.org` DNS entry;
- open a pull request to include the URL to your Stratum 1 in the EESSI configuration.

The last two steps can be skipped if you want to host a "private" Stratum 1 for your site.

The EESSI project provides a number of geographically distributed public Stratum 1 servers that you can use to make EESSI available on your machine(s).
If you want to be better protected against network outages and increase the bandwidth between your cluster nodes and the Stratum 1 servers,
you could consider setting up a local (private) Stratum 1 server that replicates the EESSI CVMFS repository.
Review discussion:

- Collaborator: Should this say one or multiple (private) Stratum 1(s)? I don't remember what our recommendation was in the best-practice training; I think it was about 1 per 500 clients or so. Or was the recommendation one Stratum 1 + one proxy per 500 clients?
- Collaborator: Actually, maybe add that after this sentence, or even at the end of the paragraph: "For large systems, consider setting up multiple Stratum 1 servers. Approximately one stratum 1 per 500 clients is recommended."
- Collaborator (author): That was about proxies, indeed. I could still say multiple here, but I think the advantage of having multiple is really minimal.
- Member: You could have a call-out at the end to mention these kinds of points; it doesn't need to be here already.
- Member: You may even want to know how to "upgrade" your own S1 to an S0 so you can sync within your network.
- Reviewer: I don't think there is much benefit to having multiple private Stratum 1 servers, but multiple proxies is a good idea.
- Collaborator (author): a4851b0 adds a sentence with a recommendation to at least have local proxies (a bit outside the scope of this page, but probably good to mention here as well). I left the recommendation for a Stratum 1 unchanged, as I don't see much value in having more than one either.
- Collaborator: Agreed, good point by @rptaylor that multiple proxies are sufficient - the only reason to have a Stratum 1 is to be resilient against a network outage. For that purpose, one is enough.

This guarantees that you always have a full and up-to-date copy of the entire stack available in your local network.

## Requirements for a Stratum 1

The main requirements for a Stratum 1 server are a good network connection to the clients it is going to serve,
and sufficient disk space. For the EESSI repository, a few hundred gigabytes should suffice, but for production
environments at least 1 TB would be recommended.
and sufficient disk space. As the EESSI repository is constantly growing, make sure that the disk space can easily be extended if necessary.
Currently, we recommend having at least 1 TB available.

In terms of cores and memory, a machine with just a few (~4) cores and 4-8 GB of memory should suffice.

Various Linux distributions are supported, but we recommend one based on RHEL 7 or 8.
Various Linux distributions are supported, but we recommend one based on RHEL 8 or 9.

Finally, make sure that ports 80 and 8000 are open to clients.


## Configure the Stratum 1

Stratum 1 servers usually replicate from the Stratum 0 server.
In order to ensure the stability and security of the EESSI Stratum 0 server, it has a strict firewall, and only the EESSI-maintained public Stratum 1 servers are allowed to replicate from it.
However, EESSI provides a synchronisation server that can be used for setting up private Stratum 1 replica servers, and this is available at `http://aws-eu-west-s1-sync.eessi.science`.

!!! warn Potential issues with intrusion prevention systems
In the past we have seen a few occurrences of data transfer issues when files were being pulled in by or from a Stratum 1 server.
In such cases the `cvmfs_server snapshot` command, used for synchronizing the Stratum 1, may break with errors like `failed to download <URL to file>`.
Trying to manually download the mentioned file with `curl` will also not work, and result in errors like:
```
curl: (56) Recv failure: Connection reset by peer
```
In all cases this was due to an intrusion prevention system scanning the associated network, and hence scanning all files going in or out of the Stratum 1.
Though it was a false-positive in all cases, this breaks the synchronization procedure of your Stratum 1.
If this is the case, you can try switching to HTTPS by using `https://aws-eu-west-s1-sync.eessi.science` for synchronizing your Stratum 1.
Even though there is no advantage for CVMFS itself in using HTTPS (it has built-in mechasnims for ensuring the integrity of the data),
Review discussion:

- Collaborator: Why aren't we always recommending running with HTTPS? Is there a downside? I'd say there might be a small speed penalty due to the encryption/decryption. Maybe mention that this is why plain HTTP is the default.
- Collaborator (author): That's indeed the main downside, as far as I know. Added a sentence about it in f830adb.
- @rptaylor (Apr 9, 2024): Not only is there no advantage - it should be noted that this is a disadvantage, because it makes caching in forward proxies impossible (unless, hypothetically, you distribute the private TLS keys of the stratum servers to all the squids so they can do the TLS termination). I would not recommend it.
- Reviewer: There is also a typo: mechasnims.
- Collaborator (author): Thanks, that's a good point. Changed it in ba51497, and it strongly discourages HTTPS now. Also fixed the typo in d555b4b.

this will prevent the described issues, as the intrusion prevention system will not be able to inspect the encrypted data.
As HTTPS does introduce some overhead due to the encryption/decryption, it is still recommended to use HTTP as default.

Finally, make sure that ports 80 (for the Apache web server) and 8000 are open.
### Manual configuration

In order to set up a Stratum 1 manually, you can make use of the instructions in the [Private Stratum 1 replica server](https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices/access/stratum1/)
section of the MultiXscale tutorial ["Best Practices for CernVM-FS in HPC"](https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices/).
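In essence, a manual setup comes down to adding a replica of the repository and pulling in a first snapshot. A rough sketch, assuming the `cvmfs-server` package is installed and the EESSI public key is available under `/etc/cvmfs/keys/eessi.io/` (paths may differ for your site; the tutorial linked above is the authoritative reference):

```bash
# Rough sketch only; see the linked tutorial for the full procedure.
# Assumes the cvmfs-server package is installed and the EESSI public key
# is available in /etc/cvmfs/keys/eessi.io/.
sudo cvmfs_server add-replica -o $(whoami) \
    http://aws-eu-west-s1-sync.eessi.science/cvmfs/software.eessi.io \
    /etc/cvmfs/keys/eessi.io/

# Pull in an initial snapshot of the repository (this transfers a lot of data).
sudo cvmfs_server snapshot software.eessi.io
```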

## Step 1: set up the Stratum 1
### Configuration using Ansible

The recommended way for setting up an EESSI Stratum 1 is by running the Ansible playbook `stratum1.yml`
from the [filesystem-layer repository on GitHub](https://github.com/EESSI/filesystem-layer).
For the commands in this section, we are assuming that you cloned this repository, and your working directory is `filesystem-layer`.

Installing a Stratum 1 requires a GEO API license key, which will be used to find the (geographically) closest Stratum 1 server for your client and proxies.
More information on how to (freely) obtain this key is available in the CVMFS documentation: https://cvmfs.readthedocs.io/en/stable/cpt-replica.html#geo-api-setup.
!!! note GEO API
Installing a Stratum 1 usually requires a GEO API license key, which will be used to find the (geographically) closest Stratum 1 server for your client and proxies.
However, for a private Stratum 1 this can be skipped, as clients should just connect to your local Stratum 1 by default.

If you do want to set up the GEO API, you can find more information on how to (freely) obtain this key in the CVMFS documentation: https://cvmfs.readthedocs.io/en/stable/cpt-replica.html#geo-api-setup.

You can put your license key in the local configuration file `inventory/local_site_specific_vars.yml`.

You can put your license key in the local configuration file `inventory/local_site_specific_vars.yml`.
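As an illustration, a minimal site-specific variables file could look like the sketch below; note that the exact variable name for the license key is an assumption here, so verify it against the example site-specific vars file shipped in the repository:

```yaml
# inventory/local_site_specific_vars.yml -- illustrative sketch only.
# The variable name below is an assumption; verify it against the
# example site-specific vars file in the filesystem-layer repository.
cvmfs_geo_license_key: "<your MaxMind license key>"
```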
!!! note Squid reverse proxy
The Stratum 1 playbooks also installs and configures a Squid reverse proxy on the server. The template configuration file for Squid can be found at `templates/eessi_stratum1_squid.conf.j2`.
Review discussion:

- Collaborator: Wait, is that needed? What's the point of running a Squid next to a Stratum 1 on the same machine? The clients might as well directly connect to the Stratum 1 then, no? (Probably my limited knowledge, but it might be something to explain here as well.)
- Collaborator (author): Good point. Caching of data is actually not needed, since the data is already on the same disk. I think the main use case of this Squid is then to cache GEO API lookups, but since we recommend disabling that on private Stratum 1s, this doesn't make sense. Let me check if I can easily introduce an option for this in the playbook, so that it won't set up Squid by default unless specifically requested (which we can then do in our playbooks for the public servers).
- Collaborator (author): Actually, it does do some in-memory caching as well:

      cache_mem 128 MB
      # CERN config examples use 128 KB for both local proxies and stratum 1, but
      # data objects are larger than this
      maximum_object_size_in_memory 4 MB

  That could perhaps still be somewhat beneficial...
- Reviewer: I agree with @casparvl. You could just leave the memory to the OS to help cache data for httpd. IIRC Dave recommends using a reverse proxy only for some monitoring capability that OSG uses. A priori I wouldn't expect much performance benefit - if anything it could introduce a small latency.
- Collaborator (author): With the new version of our playbooks the Squid installation has been made optional, and it's disabled by default. So, to avoid any confusion, I've removed this part in 7c6742c, as I don't think it should be part of this documentation.

If you want to customize it, for instance for limiting the access to the Stratum 1, you can make your own version of this template file
and point to it by setting `local_stratum1_cvmfs_squid_conf_src` in `inventory/local_site_specific_vars.yml`.
See the comments in the example file for more details.

Furthermore, the Stratum 1 runs a Squid server. The template configuration file can be found at `templates/eessi_stratum1_squid.conf.j2`.
If you want to customize it, for instance for limiting the access to the Stratum 1, you can make your own version of this template file
and point to it by setting `local_stratum1_cvmfs_squid_conf_src` in `inventory/local_site_specific_vars.yml`.
See the comments in the example file for more details.

Start by installing Ansible:
Start by installing Ansible, e.g.:

```bash
sudo yum install -y ansible
```
@@ -47,58 +72,34 @@ sudo yum install -y ansible
Then install Ansible roles for EESSI:

```bash
ansible-galaxy role install -r requirements.yml -p ./roles --force
ansible-galaxy role install -r ./requirements.yml -p ./roles --force
```

Make sure you have enough space in `/srv` (on the Stratum 1) since the snapshot of the Stratum 0
will end up there by default. To alter the directory where the snapshot gets copied to you can add
this variable in `inventory/host_vars/<url-or-ip-to-your-stratum1>`:
Make sure you have enough space in `/srv` on the Stratum 1, since the snapshot of the repositories
will end up there by default. To alter the directory where the snapshots get stored you can add
the following variable in `inventory/host_vars/<url-or-ip-to-your-stratum1>`:

```bash
cvmfs_srv_mount: /srv
cvmfs_srv_mount: /lots/of/space
```
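Before running the playbook, you may want to check that the chosen location actually has enough room. A small sketch (the path and the 1 TB threshold are illustrative, not part of the playbook):

```bash
# Warn if the snapshot location has less than ~1 TB available.
# SNAP_DIR and THRESHOLD_KB are illustrative; adjust for your site.
SNAP_DIR="${SNAP_DIR:-/srv}"
[ -d "$SNAP_DIR" ] || SNAP_DIR=/   # fall back so the check still runs
THRESHOLD_KB=$((1000 * 1000 * 1000))  # ~1 TB, expressed in KiB

# Fourth column of `df -Pk` is the available space in KiB.
avail_kb=$(df -Pk "$SNAP_DIR" | awk 'NR==2 {print $4}')
if [ "$avail_kb" -lt "$THRESHOLD_KB" ]; then
    echo "WARNING: only ${avail_kb} KiB available on ${SNAP_DIR}"
else
    echo "OK: ${avail_kb} KiB available on ${SNAP_DIR}"
fi
```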

Make sure that you have added the hostname or IP address of your server to the
`inventory/hosts` file. Finally, install the Stratum 1 using one of the two following options.
Also make sure that you have added the hostname or IP address of your server to the
`inventory/hosts` file, that you are able to log in to the server from the machine that is going to run the playbook
(preferably using an SSH key), and that you can use `sudo`.

Option 1:
Finally, install the Stratum 1 using:

``` bash
# -b to run as root, optionally use -K if a sudo password is required
ansible-playbook -b [-K] -e @inventory/local_site_specific_vars.yml stratum1.yml
```

Option2:

Create a ssh key pair and make sure the `ansible-host-keys.pub` is in the
`$HOME/.ssh/authorized_keys` file on your Stratum 1 server.

```bash
ssh-keygen -b 2048 -t rsa -f ~/.ssh/ansible-host-keys -q -N ""
# -b to run as root, optionally use -K if a sudo password is required, and optionally include your site-specific variables
ansible-playbook -b [-K] [-e @inventory/local_site_specific_vars.yml] stratum1.yml
```

Then run the playbook:

```bash
ansible-playbook -b --private-key ~/.ssh/ansible-host-keys -e @inventory/local_site_specific_vars.yml stratum1.yml
```

Running the playbook will automatically make replicas of all the repositories defined in `group_vars/all.yml`.


## Step 2: request a firewall exception

(This step is not implemented yet and can be skipped)
### Verification of the Stratum 1 using `curl`

You can request a firewall exception rule to be added for your Stratum 1 server by
[opening an issue on the GitHub page of the filesystem layer repository](https://github.com/EESSI/filesystem-layer/issues/new).

Make sure to include the IP address of your server.

## Step 3: Verification of the Stratum 1

When the playbook has finished your Stratum 1 should be ready. In order to test your Stratum 1, even
without a client installed, you can use `curl`.
When the playbook has finished, your Stratum 1 should be ready. In order to test your Stratum 1,
even without a client installed, you can use `curl`:

```bash
curl --head http://<url-or-ip-to-your-stratum1>/cvmfs/software.eessi.io/.cvmfspublished
```
@@ -115,25 +116,30 @@ The second time you run it, you should get a cache hit:

```bash
X-Cache: HIT from <url-or-ip-to-your-stratum1>

```

Example with the Norwegian Stratum 1:
Example with the EESSI Stratum 1 running in AWS:

```bash
curl --head http://bgo-no.stratum1.cvmfs.eessi-infra.org/cvmfs/software.eessi.io/.cvmfspublished
curl --head http://aws-eu-central-s1.eessi.science/cvmfs/software.eessi.io/.cvmfspublished
```

### Verification of the Stratum 1 using a CVMFS client

You can also test access to your Stratum 1 from a client, for which you will have to install the CVMFS
[client](https://github.com/EESSI/filesystem-layer#clients).

Then run the following command to add your newly created Stratum 1 to the existing list of EESSI Stratum 1 servers by creating a local CVMFS configuration file:
Then run the following command to prepend your newly created Stratum 1 to the existing list of EESSI Stratum 1 servers by creating a local CVMFS configuration file:

```bash
echo 'CVMFS_SERVER_URL="http://<url-or-ip-to-your-stratum1>/cvmfs/@fqrn@;$CVMFS_SERVER_URL"' | sudo tee -a /etc/cvmfs/domain.d/eessi-hpc.org.local
```

If this is the first time you set up the client you now run:
!!! note
By prepending your new Stratum 1 to the list of existing Stratum 1 servers, your clients should by default use the private Stratum 1.
In case of downtime of your private Stratum 1, they will also still be able to make use of the public EESSI Stratum 1 servers.
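For illustration, the `@fqrn@` placeholder in `CVMFS_SERVER_URL` is substituted by CVMFS itself with the fully qualified repository name; the snippet below (with a hypothetical hostname) merely mimics that substitution to show the resulting URL:

```bash
# Mimic the @fqrn@ substitution that CVMFS performs internally
# (my-stratum1.example.org is a hypothetical hostname).
SERVER_URL='http://my-stratum1.example.org/cvmfs/@fqrn@'
FQRN='software.eessi.io'
result="${SERVER_URL//@fqrn@/${FQRN}}"
echo "$result"  # http://my-stratum1.example.org/cvmfs/software.eessi.io
```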

If this is the first time you set up the client, you now run:

```bash
sudo cvmfs_config setup
```
@@ -151,24 +157,8 @@ Finally, verify that the client connects to your new Stratum 1 by running:

```bash
cvmfs_config stat -v software.eessi.io
```

Assuming that your new Stratum 1 is the geographically closest one to your client, this should return:
Assuming that your new Stratum 1 is working properly, this should return something like:

```bash
Connection: http://<url-or-ip-to-your-stratum1>/cvmfs/software.eessi.io through proxy DIRECT (online)
```


## Step 4: request an EESSI DNS name

In order to keep the configuration clean and easy, all the EESSI Stratum 1 servers have a DNS name
`<your site>.stratum1.cvmfs.eessi-infra.org`, where `<your site>` is often a short name or
abbreviation followed by the country code (e.g. `rug-nl` or `bgo-no`). You can request this for
your Stratum 1 by mentioning this in the issue that you created in Step 2, or by opening another
issue.

## Step 5: include your Stratum 1 in the EESSI configuration

If you want to include your Stratum 1 in the EESSI configuration, i.e. allow any (nearby) client to be able to use it,
you can open a pull request with updated configuration files. You will only have to add the URL to your Stratum 1 to the
`urls` list of the `eessi_cvmfs_server_urls` variable in the
[`all.yml` file](https://github.com/EESSI/filesystem-layer/blob/main/inventory/group_vars/all.yml).
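Sketched below is what such a change could look like; the exact structure of `eessi_cvmfs_server_urls` should be taken from the `all.yml` file itself, so treat the surrounding keys here as an assumption:

```yaml
# Sketch of inventory/group_vars/all.yml -- the surrounding structure is
# an assumption; only add your URL to the existing `urls` list.
eessi_cvmfs_server_urls:
  - domain: eessi.io
    urls:
      - "http://aws-eu-central-s1.eessi.science/cvmfs/@fqrn@"
      - "http://<url-or-ip-to-your-stratum1>/cvmfs/@fqrn@"
```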