Commit

Policies: document deterministic and non-deterministic PFN algorithms (
…#332)

* Policies: document deterministic and non-deterministic PFN algorithms #5129

* Clarify difference between deterministic and non-deterministic PFN algorithms

* Don't imply that file can only be in a single dataset
jamesp-epcc authored Oct 11, 2024
1 parent 7ed9670 commit 74742bf
Showing 1 changed file with 34 additions and 6 deletions.
40 changes: 34 additions & 6 deletions docs/operator/policy_packages.md
@@ -75,16 +75,19 @@ returns a dictionary of custom algorithms implemented within the package.
In fact, this structure should be a "dictionary of dictionaries" where
the outer dictionary contains algorithm types, and each inner
dictionary contains all the algorithms provided by the package for that
type. Currently supported types are `surl` for SURL algorithms,
`lfn2pfn` for LFN2PFN algorithms, and `scope` for scope extraction
algorithms.
type. Currently supported types are `lfn2pfn` for generating PFNs for
deterministic storage, `non_deterministic_pfn` for generating PFNs for
non-deterministic storage, and `scope` for scope extraction algorithms.
(For backwards compatibility, `surl` can be used in place of
`non_deterministic_pfn`, but this is not recommended for new policy
packages.)

Example:

```python
def get_algorithms():
return { 'surl':
{ 'voname_surl': construct_surl_voname },
return { 'non_deterministic_pfn':
{ 'voname_non_deterministic_pfn': construct_non_deterministic_pfn_voname },
'lfn2pfn':
{ 'voname_lfn2pfn': lfn2pfn_voname },
'scope':
@@ -95,6 +98,31 @@ In all cases the names used to register the functions must be prefixed
with the name of the virtual organisation that owns the policy package,
to avoid naming conflicts on multi-VO Rucio installations.
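
As a rough illustration of the prefixing rule, the following sketch checks that every name in a dictionary shaped like the one returned by `get_algorithms()` starts with the VO name. The helper `find_unprefixed_names` and the sample VO name `voname` are hypothetical and not part of Rucio; the `<vo>_` prefix format is assumed from the example above.

```python
def find_unprefixed_names(algorithms, vo):
    """Return (type, name) pairs whose algorithm name lacks the VO prefix.

    Hypothetical helper for illustration only; not part of Rucio.
    """
    prefix = vo + "_"
    return [
        (algorithm_type, name)
        for algorithm_type, by_name in algorithms.items()
        for name in by_name
        if not name.startswith(prefix)
    ]


# Placeholder callables stand in for real algorithm implementations.
algorithms = {
    "lfn2pfn": {"voname_lfn2pfn": lambda *args: None},
    "scope": {"extract_scope": lambda *args: None},  # missing the VO prefix
}

unprefixed = find_unprefixed_names(algorithms, "voname")
```

A check like this could be run in a policy package's own tests to catch names that might clash on a multi-VO installation.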

### lfn2pfn vs. non_deterministic_pfn algorithms

`lfn2pfn` algorithms and `non_deterministic_pfn` algorithms are
conceptually similar, but there are important differences between
them. Both produce a physical filename for use on an RSE. However,
`lfn2pfn` algorithms can only be used on deterministic RSEs: for
example, disk systems where the appropriate physical filename can be
derived from the file's scope and name alone (together with
protocol-specific information for the RSE in question).
`non_deterministic_pfn` algorithms are used on non-deterministic
RSEs (most often tape systems), and may use additional information
about the file (such as its metadata or any datasets that it is a
part of) to construct the physical filename. Because files cannot
be uploaded directly to non-deterministic storage,
`non_deterministic_pfn` algorithms are only ever called for
replications, whereas `lfn2pfn` algorithms can also be called for
initial uploads.
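
The contrast can be sketched as follows. Both functions and their signatures are hypothetical simplifications and do not reproduce the interfaces Rucio actually expects; the path layouts are invented for illustration. The point is only which inputs each kind of algorithm relies on.

```python
def example_lfn2pfn(scope, name):
    """Deterministic: the path depends only on the file's scope and name.

    Hypothetical signature and layout, for illustration only.
    """
    return f"{scope}/{name[:2]}/{name}"


def example_non_deterministic_pfn(scope, name, dataset=None):
    """Non-deterministic: the path may also use extra information,
    here the name of a dataset the file belongs to (an assumed input).
    """
    if dataset is not None:
        return f"/tape/{scope}/{dataset}/{name}"
    return f"/tape/{scope}/{name}"


# The same scope and name always give the same deterministic path.
deterministic_path = example_lfn2pfn("user.alice", "data_001.root")

# The non-deterministic path changes with the extra information supplied.
tape_path = example_non_deterministic_pfn(
    "user.alice", "data_001.root", dataset="run2024"
)
```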

The `lfn2pfn` algorithm to be used is determined by the
`lfn2pfn_algorithm` attribute of the relevant RSE. If this is not set,
the `lfn2pfn_algorithm_default` value from the `[policy]` section of
the config file is used instead. The `non_deterministic_pfn` algorithm
to be used is determined by the `naming_convention` attribute of the
relevant RSE.
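
The selection rules above can be sketched as follows. This is a minimal sketch, not Rucio's implementation: the functions `select_lfn2pfn_algorithm` and `select_non_deterministic_pfn_algorithm` are hypothetical, RSE attributes are modelled as a plain dictionary, and the config file is read with the standard-library `configparser`. Returning `None` when nothing is configured is an assumption; the text does not say what happens in that case.

```python
import configparser


def select_lfn2pfn_algorithm(rse_attributes, config):
    """Prefer the RSE's `lfn2pfn_algorithm` attribute, then fall back to
    `lfn2pfn_algorithm_default` in the `[policy]` config section.
    """
    name = rse_attributes.get("lfn2pfn_algorithm")
    if name:
        return name
    # Assumption: None is returned if the default is not configured either.
    return config.get("policy", "lfn2pfn_algorithm_default", fallback=None)


def select_non_deterministic_pfn_algorithm(rse_attributes):
    """Use the RSE's `naming_convention` attribute (None if unset)."""
    return rse_attributes.get("naming_convention")


config = configparser.ConfigParser()
config.read_string("[policy]\nlfn2pfn_algorithm_default = voname_lfn2pfn\n")

# The RSE attribute wins when it is set.
from_rse = select_lfn2pfn_algorithm({"lfn2pfn_algorithm": "voname_custom"}, config)

# Otherwise the config default is used.
from_config = select_lfn2pfn_algorithm({}, config)

tape_algorithm = select_non_deterministic_pfn_algorithm(
    {"naming_convention": "voname_non_deterministic_pfn"}
)
```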

## Adding a new algorithm class

The system for registering algorithms within policy packages is
@@ -109,7 +137,7 @@ relatively easily. The basic workflow is as follows:
will differ depending on what the new class actually does and how it
integrates with the Rucio code, but typically the algorithm name to
be used will be selected by a value in the config file, as for the
current `lfn2pfn` and `surl` algorithm types.
current `lfn2pfn` and `non_deterministic_pfn` algorithm types.
- Before the algorithm is called for the first time, the core Rucio
code should call `rucio.common.utils.register_policy_package_algorithms`
to import the algorithms for this class from the policy package and
