v3_lhsm_tuto

Thomas Leibovici edited this page Apr 28, 2020 · 39 revisions

Quick start: Lustre/HSM with robinhood v3

Installation

This section briefly lists the steps for installing and configuring robinhood to run Lustre/HSM policies.

For more details about installation, software and hardware requirements, tunings, etc. refer to: Admin guide: Installation.

Lustre/HSM

First, set up the Lustre/HSM components:

  • Install Lustre >= 2.5.
  • Enable Lustre MDT changelogs (remember the returned client id):
    lctl --device <fsname>-MDT0000 changelog_register
  • Set the changelog mask to "ALL-ATIME", i.e. all record types except ATIME (logging atime strongly impacts MDS performance, so it should not be enabled in production):
    > lctl set_param mdd.<fsname>-MDT*.changelog_mask=ALL-ATIME

Then check it:

    > lctl get_param mdd.<fsname>-MDT*.changelog_mask
    MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO OPEN CLOSE LYOUT TRUNC SATTR XATTR HSM MTIME CTIME
  • Enable the HSM coordinator (on the MDS):
    echo enabled > /proc/fs/lustre/mdt/<fsname>-MDT0000/hsm_control
  • Set up and run copytool daemons for your specific HSM backend. They run on dedicated Lustre clients.
Refer to Lustre documentation for more details about these steps: Lustre manual
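
The mask check above can also be scripted. The following is a minimal sketch, not part of robinhood or Lustre: the function name mask_has_atime is hypothetical, and the sample mask is the one shown in the check output above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a changelog mask string contains
# the ATIME record type.
mask_has_atime() {
    # Pad with spaces so that only a whole word "ATIME" matches.
    case " $1 " in
        *" ATIME "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample mask copied from the check output above.
sample_mask="MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO OPEN CLOSE LYOUT TRUNC SATTR XATTR HSM MTIME CTIME"

if mask_has_atime "$sample_mask"; then
    echo "WARNING: ATIME records are enabled"
else
    echo "OK: ATIME records are not logged"
fi
```

On a real MDS, the mask string could be obtained with 'lctl get_param -n mdd.<fsname>-MDT0000.changelog_mask' and passed to the function instead of the sample value.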

Robinhood

Requirements

  • Robinhood must run on a Lustre client. For maximum compatibility, it is recommended to run the same major version of Lustre as the servers.
  • Robinhood uses a MySQL or MariaDB database as its storage backend. It is recommended to install the DB server on the same host as robinhood to minimize latency for DB operations.
Install and start MySQL on RHEL 6:
 yum install mysql-server
 service mysqld start

Install and start MariaDB on RHEL 7:

 yum install mariadb-server
 systemctl start mariadb.service

/!\ The default database configuration is not suitable for production and will result in very poor performance. See Admin guide: database tunings for the recommended database configuration.

Installation

  • Download 'robinhood-lustre' and 'robinhood-adm' packages from http://sourceforge.net/projects/robinhood/files/robinhood and install them on the robinhood host.
    • Make sure to get the 'robinhood-lustre' package for the version of Lustre you run, e.g. robinhood-lustre-3.0-1.lustre2.5.el6.x86_64.rpm for Lustre 2.5.

Configuration

  • Create the robinhood database, using the rbh-config helper:
 rbh-config create_db <db_name> 'localhost' 'rbh_password'
    • A common name for the robinhood database is 'rbh_fsname'.
    • Write the selected password to a file only readable by 'root' (600), for example in /etc/robinhood.d/.dbpassword.
  • Create a robinhood configuration file, starting with a simple robinhood template:
 cp /etc/robinhood.d/templates/basic.conf /etc/robinhood.d/<fsname>.conf
  • Or start with the specific lustre/HSM example config:
 cp /etc/robinhood.d/templates/example_lhsm.conf /etc/robinhood.d/<fsname>.conf
  • Edit the configuration file:
    • In 'General' block, set Lustre filesystem root path, and 'lustre' filesystem type:
 fs_path = "/fs/root";
 fs_type = lustre;
    • In 'ListManager' block, set database connection parameters:
 # database name passed to 'rbh-config create_db'
 db = <db_name>;
 password_file = "/etc/robinhood.d/.dbpassword" ;
    • In 'ChangeLog' block, check that the specified 'reader_id' matches the id returned by 'lctl changelog_register':
 reader_id = "cl1";
    • Include policy definitions for Lustre/HSM:
 %include "includes/lhsm.inc"
      • This include defines Lustre/HSM policies: 'lhsm_archive', 'lhsm_release' and 'lhsm_remove'.
      • This sets up specific database fields to manage Lustre/HSM status and other entry specific properties.
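
The password-file step listed above (a file only readable by 'root', mode 600) can be sketched as follows. The function name write_password_file is hypothetical, and the demo writes to a temporary file rather than /etc/robinhood.d/.dbpassword so that it can be tried without root privileges.

```shell
#!/bin/sh
# Hypothetical helper: write a DB password to a file readable by its
# owner only (mode 600).
write_password_file() {
    pw_file=$1
    pw=$2
    # Restrict permissions before the file is created; the subshell keeps
    # the caller's umask untouched.
    (
        umask 077
        printf '%s\n' "$pw" > "$pw_file"
    )
    # Also fix the mode in case the file already existed.
    chmod 600 "$pw_file"
}

# Demo on a temporary file; on the robinhood host the target would be
# /etc/robinhood.d/.dbpassword (run as root).
demo_file=$(mktemp)
write_password_file "$demo_file" 'rbh_password'
ls -l "$demo_file"
rm -f "$demo_file"
```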
It is also recommended to define your fileclasses before running the initial filesystem scan:
  • This way, you will get relevant information in the 'rbh-report --class-info' report after the initial scan is completed.
  • This will allow optimizations when running policies (e.g. skip processing of 'ignored' classes).
To define fileclasses, think about the different policies you want to apply to the different entries, depending on their attributes:
  • Which entries do you not want to archive or release from the filesystem?
  • Do you need to execute different actions for some entries, or use specific action parameters (e.g. 'archive_id', copytool-specific archive parameters...)?
  • Do you want to keep some entries in the filesystem longer than others?
  • Do you want to archive some entries sooner than others?
Examples:
 fileclass empty_file {
    definition { type == file and size == 0 }
 }
 fileclass small_file {
    definition { type == file
             and size > 0
             and size <= 32MB }
 }
 fileclass noarchive {
    definition { name == "*.log" or name == "*.o"
              or name == ".*.swp" or  name == "#*#" }
 }
 fileclass archive1 {
    definition { tree == "/fs/dir1" }
    # specify target archive_id for 'lhsm_archive' policy
    lhsm_archive_action_params {
        archive_id = 1;
    }
 }
 fileclass archive2 {
    definition { tree == "/fs/dir2" }
    # specify target archive_id for 'lhsm_archive' policy
    lhsm_archive_action_params {
        archive_id = 2;
    }
 }

Feeding robinhood

To populate robinhood DB, follow these steps:

  • 1) Enable changelogs (this should have been done in installation steps above).
  • 2) Run the initial scan
  • 3) Run robinhood daemon to continuously read changelogs

Initial scan

  • If you want to run the initial scan in a terminal and see the log messages in this terminal, run:
 robinhood --scan --once -L stderr
  • If you prefer running it in the background (with messages written to the robinhood log):
 robinhood --scan --once -d

Running changelog reader

  • You can test the changelog reader by having it read pending changelog records, then exit:
 robinhood --readlog --once -L stderr
  • To start a robinhood daemon to read changelog continuously:
    • Edit /etc/sysconfig/robinhood to indicate that we just want robinhood daemon to read changelogs, not yet run policies:
 RBH_OPT="--readlog"
    • Start robinhood service:
 # on RHEL 6:
 service robinhood start
 # on RHEL 7:
 systemctl start robinhood.service

Monitoring scan/changelog progress

You can monitor scan progress, or changelog reader activity by looking at robinhood statistics (dumped every 15min by default):

 grep STATS /var/log/robinhood.log

Filesystem reports

Once you have run the initial scan and started a changelog reader, the robinhood database reflects the filesystem state and is updated in near real time. Robinhood comes with several reporting and querying commands:

  • rbh-report provides overall reports about filesystem contents (HSM status, users and groups usage, file size profile, fileclasses...)
  • rbh-find implements the classic 'find' command, except that it queries the robinhood database instead of the filesystem, which makes it faster. Moreover, it provides specific options to query entries by HSM status and other Lustre-specific attributes.
  • rbh-du is an enhanced version of the classic 'du' command. It queries the robinhood database instead of the filesystem, which makes it faster. It can also report details about entry types, count, etc.
rbh-report examples:
 # filesystem entries:
 # rbh-report --fs-info
 type    ,    count,   volume, avg_size
      dir,  1780074,  8.02 GB,  4.72 KB
     file, 21366275, 91.15 TB,  4.47 MB
  symlink,   496142, 24.92 MB,       53
 # user info, split by group
 # rbh-report -u bar -S
 user , group,  type,  count,  spc_used,   avg_size
 bar  , proj1,  file,      4,  40.00 MB,   10.00 MB
 bar  , proj2,  file,   3296, 947.80 MB,  273.30 KB
 bar  , proj3,  file, 259781, 781.21 GB,    3.08 MB
 # file size profile for a given user
 # rbh-report -u foo --szprof
 user, type,  count,    volume,  avg_size,   0,  1~31,  32~1K-, 1K~31K, 32K~1M-, 1M~31M, 32M~1G-, 1G~31G, 32G~1T-, +1T
 foo ,  dir,     48,   1.48 MB,  31.67 KB,   0,     0,       0,     26,      22,      0,       0,      0,       0,   0
 foo , file,  11055, 308.16 GB,  28.54 MB,   2,     0,      14,     23,    5276,   5712,       9,     17,       2,   0
 # top disk space consumers
 # rbh-report --top-users
 rank, user    , spc_used,  count, avg_size
   1, usr0021 , 11.14 TB, 116396, 100.34 MB
   2, usr3562 ,  5.54 TB,    575,   9.86 GB
   3, usr2189 ,  5.52 TB,   9888, 585.50 MB
   4, usr2672 ,  3.21 TB, 238016,  14.49 MB
   5, usr7267 ,  2.09 TB,   8230, 266.17 MB
 ...
 # HSM status
 # rbh-report --status-info lhsm
 lhsm.status   ,     type,      count,   spc_used,     volume,   avg_size
            new,     file,       3296,  947.80 MB,  879.68 MB,  273.30 KB
      archiving,     file,          5,    5.00 MB,    5.00 MB,    1.00 MB
 

But also:

  • --top-size Report largest files in the filesystem.
  • --entry-info Report all information about a given entry.
  • Run rbh-report --help to get the full list of available reports.
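
Because these reports are comma-separated, they are easy to post-process in scripts. The sketch below extracts one column from '--fs-info' style output with awk. The helper names count_for_type and sample_report are hypothetical, the sample data is copied from the example above, and the layout of real output is assumed to match it.

```shell
#!/bin/sh
# Hypothetical helper: print the 'count' column for a given entry type
# from 'rbh-report --fs-info' style output read on stdin.
count_for_type() {
    awk -F',' -v want="$1" '
        NR > 1 {                       # skip the header line
            type = $1; count = $2
            gsub(/[ \t]/, "", type)    # strip padding around the type name
            gsub(/[ \t]/, "", count)   # strip padding around the count
            if (type == want) print count
        }'
}

# Sample output copied from the --fs-info example above.
sample_report() {
    cat <<'EOF'
type    ,    count,   volume, avg_size
     dir,  1780074,  8.02 GB,  4.72 KB
    file, 21366275, 91.15 TB,  4.47 MB
 symlink,   496142, 24.92 MB,       53
EOF
}

# Prints the number of regular files in the sample report.
sample_report | count_for_type file
```

On a robinhood host, 'rbh-report --fs-info | count_for_type file' would be the equivalent call, assuming the output has the layout shown above.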
rbh-find example:
 # rbh-find /mnt/lustre/dir -u root -size +32M -mtime +1h -ost 2 -status lhsm:new -ls

rbh-du examples:

 # rbh-du -H -u foo /mnt/lustre/dir.3
 45.0G /mnt/lustre/dir.3

Configuring policies for Lustre/HSM

Overview

Lustre/HSM actions

Lustre/HSM manages the following actions to implement hierarchical storage:

  • archive: copy of a file from Lustre to a storage backend.
  • release: release file data from Lustre (must be archived and non-dirty).
  • restore: copy-back a file from the backend to Lustre (must be released).
  • remove: remove a copy from the backend (allows cleaning up a file's archived copy after the file has been deleted from Lustre).

Robinhood policies for Lustre/HSM

Policy template /etc/robinhood.d/includes/lhsm.inc (installed with robinhood) defines the following policies for scheduling Lustre/HSM actions:

  • lhsm_archive: schedules archive actions on filesystem entries.
  • lhsm_release: schedules release actions on filesystem entries.
  • lhsm_remove: schedules remove actions for deleted filesystem entries.

Robinhood lhsm states

The following diagram shows the states managed by robinhood for Lustre/HSM entries, and the transitions between states:

 NEW (no flag)  ---- archive ----->    SYNCHRO    ----- release --->    RELEASED
                                        |   ^     <----- restore ----
                                        |   |
                                     write archive
                                        |   |
                                        v   |
                                      MODIFIED
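
The transitions in this diagram can be written down as a small lookup function, which makes it easy to check which action is valid in which state. This is only an illustration of the diagram: the function name next_state and the lowercase state names are assumptions, not robinhood code.

```shell
#!/bin/sh
# Hypothetical helper: print the state reached from <state> after <event>,
# following the diagram above. Returns non-zero for transitions that the
# diagram does not show.
next_state() {
    case "$1:$2" in
        new:archive)      echo synchro ;;
        synchro:release)  echo released ;;
        released:restore) echo synchro ;;
        synchro:write)    echo modified ;;
        modified:archive) echo synchro ;;
        *) return 1 ;;
    esac
}

# Walk a file through a typical life cycle.
state=new
for event in archive release restore write archive; do
    state=$(next_state "$state" "$event") || { echo "invalid event: $event" >&2; break; }
    echo "$event -> $state"
done
```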

'lhsm_archive' policy

This policy allows scheduling archive actions for Lustre/HSM.

Robinhood archives data incrementally. In other words, it only copies new or modified files and does not copy unchanged files multiple times.

Simple lhsm_archive policy

Robinhood makes it possible to define different rules for several file classes.
In this first example, we only define a single policy rule for all files. This is done by specifying a 'default' rule.
Archive policy rules are defined in a lhsm_archive_rules block:

 lhsm_archive_rules {
    rule default {
        condition {last_mod > 1h}
    }
 }
  • The 'default' rule is a special rule that applies to files that don't match any other rule.
  • In a policy, you must specify a condition for allowing entries to be archived. In this example, we don't want to copy recently modified entries (modified within the last hour).
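
To make the meaning of a condition such as 'last_mod > 1h' concrete, the sketch below applies the same test to a single file from the shell. It is only an illustration: the function name older_than_secs is hypothetical, it relies on GNU stat and touch, and it reads the live filesystem, whereas robinhood works from the entry attributes it has collected.

```shell
#!/bin/sh
# Hypothetical helper: succeed if <file> was last modified more than <secs>
# seconds ago (uses GNU stat to read the modification time).
older_than_secs() {
    mtime=$(stat -c %Y "$1") || return 2
    now=$(date +%s)
    [ $(( now - mtime )) -gt "$2" ]
}

# Demo: a file modified 2 hours ago satisfies 'last_mod > 1h' (3600 s).
demo=$(mktemp)
touch -d '2 hours ago' "$demo"
if older_than_secs "$demo" 3600; then
    echo "eligible for archive"
else
    echo "modified too recently"
fi
rm -f "$demo"
```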

Detailed example of lhsm_archive policy

This example is more complete. It assumes that the fileclasses it uses have already been defined in the configuration file.

 lhsm_archive_rules {
     # don't archive empty files
     ignore { size == 0 }
 
     # don't archive files matching fileclasses 'tmp_files' or 'trash'
     ignore_fileclass = tmp_files;
     ignore_fileclass = trash;
 
     # quickly archive some kind of data.
     rule fast_archive {
         target_fileclass = result_data;
         target_fileclass = important_data;
         # archive 1h after last modification or 6h after last archive
         condition { last_mod > 1h or last_archive > 6h }
     }
 
     # archive some data to archive_id=2
     rule foo_archive {
         target_fileclass = fooproject;
         action_params { archive_id = 2; }
         condition { last_mod > 6h or last_archive > 1d }
     }
 
     rule bar_archive {
         target_fileclass = barproject;
         # pass custom arguments to Lustre/HSM copytool
         action_params { cos = 23; stripe_count = 2; }
         condition { last_mod > 6h or last_archive > 1d }
     }
 }

Manual policy run

To run the policy manually on all eligible files (new or modified), execute:

 robinhood --run=lhsm_archive --target=all

If you want to restrict your testing to a subset of files, you can specify one of the following arguments for --target option:

  • all : run on all filesystem entries (matching the policy scope).
  • user:<username> : run on entries of the given user.
  • group:<groupname> : run on entries of the given group.
  • file:<path> : run on the given entry.
  • class:<fileclass> : run on entries in the given fileclass.
  • ost:<ost_idx> : run on entries stored on the given OST index.
  • pool:<poolname> : run on entries in the given OST pool.
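
When policy runs are launched from wrapper scripts, it can help to validate the target string before calling robinhood. The following sketch checks a string against the forms listed above. The function name valid_target is hypothetical, and the check is purely syntactic: it does not verify that the user, fileclass, OST or pool actually exists.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the argument looks like one of the
# --target forms listed above.
valid_target() {
    case "$1" in
        all) return 0 ;;
        # these forms need a non-empty value after the colon
        user:?*|group:?*|file:?*|class:?*|pool:?*) return 0 ;;
        ost:*)
            idx=${1#ost:}
            # an OST index must be a non-empty string of digits
            case "$idx" in
                ''|*[!0-9]*) return 1 ;;
                *) return 0 ;;
            esac ;;
        *) return 1 ;;
    esac
}

for t in all user:foo ost:22 ost:abc project:x; do
    if valid_target "$t"; then echo "ok      $t"; else echo "invalid $t"; fi
done
```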

Automatic policy run

The most common configuration is to run robinhood as a daemon that automatically runs policies. In this case, policy runs are started by triggers.

This defines a simple trigger for 'lhsm_archive' policy, to run it every 15 min:

 lhsm_archive_trigger {
     trigger_on = scheduled;
     check_interval = 15min;
 }

There are several types of triggers. See Admin guide: Triggers for more details about policy triggers.

If started in one of the following ways, the robinhood daemon automatically runs the policy at the scheduled interval:

  • service robinhood start: runs the robinhood daemon with the options specified in /etc/sysconfig/robinhood (no options by default, which is equivalent to the next item).
  • robinhood -d: runs a daemon that continuously reads changelogs and regularly checks the triggers of all policies.
  • robinhood --run=all -d: runs a daemon that regularly checks the triggers of all policies.
  • robinhood --run=lhsm_archive -d: runs a daemon that regularly checks the triggers of the 'lhsm_archive' policy only.

'lhsm_release' policy

This policy allows scheduling release actions for Lustre/HSM.

Robinhood only releases entries whose state is 'synchro'. In other words, it only releases data that has been archived to the storage backend and not modified since it was archived.

lhsm_release example

The 'lhsm_release' policy is defined the same way as the 'lhsm_archive' policy, with rules that specify target fileclasses, conditions, etc. The only difference is the name of the block: lhsm_release_rules.

Example:

 lhsm_release_rules {
     # never release very small files
     ignore { size < 16K }
 
     # don't release data that is in the following fileclasses
     ignore_fileclass = always_online;
     ignore_fileclass = boot_images;
 
     # don't release small files too quickly, to reduce tape library mount operations
     rule keep_longer {
         target_fileclass = small_files;
         condition { last_access > 90d }
     }
 
     # default rule to release other data
     rule default {
         condition { last_access > 6h }
     }
 }

lhsm_release triggers

As with any robinhood policy, you need to specify triggers to run the policy automatically.
The most common configuration for an HSM system is to release data when space runs low in the top tier, which is Lustre. So, for Lustre/HSM, the recommended trigger is on OST usage:

 lhsm_release_trigger {
     trigger_on = ost_usage;
     high_threshold_pct = 95%;
     low_threshold_pct = 93%;
     check_interval = 5min;
 }

With this trigger definition: if an OST gets full (usage > 95%), robinhood will release eligible files from this OST by applying lhsm_release policy rules until the usage of this OST is back to 93%.
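
As a worked example of these thresholds, the sketch below computes how much data must be released from an OST to bring it back to the low threshold. It is simplified integer arithmetic for illustration only, not robinhood's internal computation, and the function name volume_to_release is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: given an OST's used and total space (same unit) and
# the trigger thresholds in percent, print how much must be released to get
# back to the low threshold, or 0 if usage does not exceed the high threshold.
volume_to_release() {
    used=$1 total=$2 high=$3 low=$4
    # The trigger fires only when used/total > high/100.
    if [ $(( used * 100 )) -le $(( total * high )) ]; then
        echo 0
        return
    fi
    # Space that must go so that used/total drops to low/100.
    echo $(( used - total * low / 100 ))
}

# A 100000 GB OST with 96000 GB used (96%): release 3000 GB to reach 93%.
volume_to_release 96000 100000 95 93
# The same OST at 94%: below the high threshold, nothing to release.
volume_to_release 94000 100000 95 93
```

The gap between the two thresholds (here 2% of the OST size) sets the minimum amount released each time the trigger fires, so a wider gap means fewer but larger release runs.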

Running lhsm_release policy

In a daemon

As for the 'lhsm_archive' policy, you can run the 'lhsm_release' policy in a robinhood daemon. This is the case when you run one of the following commands:

  • service robinhood start: runs the robinhood daemon with the options specified in /etc/sysconfig/robinhood (no options by default, which is equivalent to the next item).
  • robinhood -d: runs a daemon that continuously reads changelogs and regularly checks the triggers of all policies.
  • robinhood --run=all -d: runs a daemon that regularly checks the triggers of all policies.
  • robinhood --run=lhsm_release -d: runs a daemon that regularly checks the triggers of the 'lhsm_release' policy only.

In command line (one shot run)

For one-shot runs, it is highly recommended to limit the 'lhsm_release' policy run (by specifying the '--target-usage', '--max-count' or '--max-volume' arguments); otherwise you would release all archived files of your Lustre filesystem!

 # this releases entries until the overall filesystem space usage decreases to 90%
 # (the argument is quoted so that the shell does not interpret the parentheses)
 robinhood --run='lhsm_release(target-usage=90%)'
 # this releases entries from OST 22 until its usage is back to 85%
 robinhood --run='lhsm_release(target=ost:22,target-usage=85%)'
 # release up to 1000 entries of fileclass 'foo'
 robinhood --run='lhsm_release(target=class:foo,max-count=1000)'

For more details about possible policy run options, see: Admin guide: Command lines to run policies
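
The recommendation to always give a limit for one-shot release runs can be enforced with a small wrapper. The sketch below is a dry run: it only prints the robinhood command it would execute. The function names has_release_limit and safe_release are hypothetical, and the limit names it looks for are the ones mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a --run specification for lhsm_release
# carries at least one of the limits named above.
has_release_limit() {
    case "$1" in
        *target-usage=*|*max-count=*|*max-volume=*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: refuse to start an unlimited one-shot release.
# Dry run only: the command is printed, not executed.
safe_release() {
    spec=$1
    if ! has_release_limit "$spec"; then
        echo "refusing to run '$spec': no limit given" >&2
        return 1
    fi
    # The argument stays quoted: unquoted parentheses are a shell syntax error.
    echo robinhood "--run=$spec"
}

safe_release 'lhsm_release(target=ost:22,target-usage=85%)'
safe_release 'lhsm_release' || true
```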

'lhsm_remove' policy

The purpose of this policy is to clean up entries in the archive backend after they have been deleted from Lustre. It makes it possible to control the delay before an entry's archived copy is cleaned up, which leaves time to 'undelete' the entry if it was deleted by mistake. Undeleting is done with the rbh-undelete command (see Admin guide: rbh-undelete).
This policy is unusual because it applies to entries that no longer exist.
However, it is specified much like other policies, using policy rules, triggers, etc., so you can set different removal delays for different fileclasses.

A common criterion in policy rule conditions is the special attribute rm_time, which is the time elapsed since the entry was deleted from the filesystem.

lhsm_remove example

 lhsm_remove_rules {
    # Never remove archived version of entries matching 'important' fileclass,
    # even if they are deleted from Lustre.
    ignore_fileclass = important;
 
    # keep archived copies of documents and code files for 6 months after their removal
    rule keep_6months {
        target_fileclass = documents;
        target_fileclass = code;
        condition { rm_time > 180d }
    }
 
    # keep other entries for 1 month
    rule default {
        condition { rm_time > 30d }
    }
 }
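
The way these rules combine (ignored fileclasses first, then specific rules, then the default rule) can be mimicked in a few lines of shell. This is only an illustration of the example above: the function names retention_days and removable are hypothetical, and the delays are the ones from the example rules.

```shell
#!/bin/sh
# Hypothetical helper mirroring the example rules above: print the number of
# days an archived copy is kept after deletion for a given fileclass, or
# "never" if the fileclass is ignored by the policy.
retention_days() {
    case "$1" in
        important)      echo never ;;   # ignore_fileclass
        documents|code) echo 180 ;;     # rm_time > 180d
        *)              echo 30 ;;      # default rule: rm_time > 30d
    esac
}

# Succeed if an entry of fileclass $1, deleted $2 days ago, matches its rule.
removable() {
    keep=$(retention_days "$1")
    [ "$keep" != never ] && [ "$2" -gt "$keep" ]
}

for sample in "important 400" "code 90" "code 200" "scratch 31"; do
    set -- $sample
    if removable "$1" "$2"; then verdict=remove; else verdict=keep; fi
    echo "class=$1 deleted=${2}d -> $verdict"
done
```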

This kind of policy is commonly triggered at a scheduled interval (e.g. every hour):

 lhsm_remove_trigger {
     trigger_on = scheduled;
     check_interval = 1h;
 }

This policy can be run like any other (as a daemon, or command line): refer to examples for 'lhsm_release' policy.