v3_lhsm_tuto
This section briefly lists the steps for installing and configuring robinhood to run Lustre/HSM policies.
For more details about installation, software and hardware requirements, tunings, etc. refer to: Admin guide: Installation.
First, set up the Lustre/HSM components:
- Install Lustre >= 2.5.
- Enable Lustre MDT changelogs (remember the returned client id):
lctl --device <fsname>-MDT0000 changelog_register
- Set the changelog mask to "ALL-ATIME" (logging atime strongly impacts MDS performance, so it should not be used in production):
> lctl set_param mdd.<fsname>-MDT*.changelog_mask=ALL-ATIME
Then check it:
> lctl get_param mdd.<fsname>-MDT*.changelog_mask
MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO OPEN CLOSE LYOUT TRUNC SATTR XATTR HSM MTIME CTIME
- Enable the HSM coordinator (on the MDS):
echo enabled > /proc/fs/lustre/mdt/<fsname>-MDT0000/hsm_control
- Set up and run copytool daemons for your specific HSM backend. They run on dedicated Lustre clients.
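As a sanity check on the changelog mask step above, the following sketch tests whether a mask string still contains the ATIME record type. The helper name `mask_has_atime` is hypothetical, and passing the mask as a plain argument is an assumption made to keep the example self-contained; on a real MDS you would feed it the record list printed by the `lctl get_param` command shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the given changelog mask contains ATIME.
# $1: the mask, as one string of space-separated record types.
mask_has_atime() {
    for rec in $1; do
        [ "$rec" = "ATIME" ] && return 0
    done
    return 1
}

# Sample mask copied from the check above (ALL-ATIME).
mask="MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO OPEN CLOSE LYOUT TRUNC SATTR XATTR HSM MTIME CTIME"
if mask_has_atime "$mask"; then
    echo "ATIME is logged: not recommended in production"
else
    echo "ATIME is not logged"
fi
```

The comparison is done word by word, so MTIME and CTIME are not mistaken for ATIME.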
- Robinhood must run on a Lustre client. For maximum compatibility, it is recommended to run the same major version of Lustre as the servers.
- Robinhood uses a MySQL or MariaDB database storage backend. It is recommended to install the DB server on the same host as robinhood to ensure a minimum latency for DB operations.
yum install mysql-server
service mysqld start
Install and start MariaDB on RHEL 7:
yum install mariadb-server
systemctl start mariadb.service
/!\ The default database configuration is not suitable for production and will result in very poor performance. See Admin guide: database tunings for the recommended database configuration.
- Download 'robinhood-lustre' and 'robinhood-adm' packages from http://sourceforge.net/projects/robinhood/files/robinhood and install them on the robinhood host.
- Make sure to get the 'robinhood-lustre' package for the version of Lustre you run, e.g. robinhood-lustre-3.0-1.lustre2.5.el6.x86_64.rpm for Lustre 2.5.
- Create robinhood database, using rbh-config helper.
rbh-config create_db <db_name> 'localhost' 'rbh_password'
- A common name for the robinhood database is 'rbh_<fsname>'.
- Write the selected password to a file only readable by 'root' (600), for example in /etc/robinhood.d/.dbpassword.
- Create a robinhood configuration file, starting with a simple robinhood template:
cp /etc/robinhood.d/templates/basic.conf /etc/robinhood.d/<fsname>.conf
- Or start with the specific lustre/HSM example config:
cp /etc/robinhood.d/templates/example_lhsm.conf /etc/robinhood.d/<fsname>.conf
- Edit the configuration file:
- In 'General' block, set Lustre filesystem root path, and 'lustre' filesystem type:
fs_path = "/fs/root";
fs_type = lustre;
- In 'ListManager' block, set database connection parameters:
# database name passed to 'rbh-config create_db'
db = <db_name>;
password_file = "/etc/robinhood.d/.dbpassword" ;
- In 'ChangeLog' block, check that the specified 'reader_id' matches the id returned by 'lctl changelog_register':
reader_id = "cl1";
- Include policy definitions for Lustre/HSM:
%include "includes/lhsm.inc"
- This include defines Lustre/HSM policies: 'lhsm_archive', 'lhsm_release' and 'lhsm_remove'.
- This sets up specific database fields to manage Lustre/HSM status and other entry specific properties.
- This way, you will get relevant information in the 'rbh-report --class-info' report after the initial scan is completed.
- This will allow optimizations when running policies (e.g. skip processing of 'ignored' classes).
- Which entries do you not want to archive or release from the filesystem?
- Do you need to execute different actions for some entries, or use specific action parameters (e.g. 'archive_id', copytool-specific archive parameters...)?
- Do you want to keep some entries in the filesystem longer than others?
- Do you want to archive some entries sooner than others?
fileclass empty_file {
    definition { type == file and size == 0 }
}
fileclass small_file {
    definition { type == file and size > 0 and size <= 32MB }
}
fileclass noarchive {
    definition { name == "*.log" or name == "*.o" or name == ".*.swp" or name == "#*#" }
}
fileclass archive1 {
    definition { tree == "/fs/dir1" }
    # specify target archive_id for 'lhsm_archive' policy
    lhsm_archive_action_params { archive_id = 1; }
}
fileclass archive2 {
    definition { tree == "/fs/dir2" }
    # specify target archive_id for 'lhsm_archive' policy
    lhsm_archive_action_params { archive_id = 2; }
}
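To get a feel for which file names the 'noarchive' patterns above would catch, here is a minimal sketch that applies the same four patterns with shell `case` globbing. It is only an approximation: it assumes that robinhood's name matching behaves like shell globbing for these simple patterns, and the function name `is_noarchive` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a file name matches one of the
# 'noarchive' name patterns (*.log, *.o, .*.swp, #*#).
is_noarchive() {
    case "$1" in
        *.log|*.o|.*.swp|'#'*'#') return 0 ;;
        *) return 1 ;;
    esac
}

# Try a few sample names (all invented for this example).
for name in build.log main.o .notes.txt.swp '#draft#' report.pdf; do
    if is_noarchive "$name"; then
        echo "$name: noarchive"
    else
        echo "$name: candidate for archiving"
    fi
done
```

Testing patterns this way before the initial scan can help spot a class definition that is broader or narrower than intended.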
To populate robinhood DB, follow these steps:
- 1) Enable changelogs (this should have been done in installation steps above).
- 2) Run the initial scan
- 3) Run robinhood daemon to continuously read changelogs
- If you want to run the initial scan in a terminal and see the log messages in this terminal, run:
robinhood --scan --once -L stderr
- If you prefer running it in the background (and writing messages to the robinhood log):
robinhood --scan --once -d
- You can run a changelog reader test by reading pending changelog records, then exit:
robinhood --readlog --once -L stderr
- To start a robinhood daemon to read changelog continuously:
- Edit /etc/sysconfig/robinhood to indicate that we just want robinhood daemon to read changelogs, not yet run policies:
RBH_OPT="--readlog"
- Start robinhood service:
# on RHEL 6:
service robinhood start
# on RHEL 7:
systemctl start robinhood.service
You can monitor scan progress, or changelog reader activity by looking at robinhood statistics (dumped every 15min by default):
grep STATS /var/log/robinhood.log
Once you have run the initial scan and started a changelog reader, the robinhood database reflects the filesystem state and is updated in near real time. Robinhood comes with several reporting and querying commands:
- rbh-report provides overall reports about filesystem contents (HSM status, users and groups usage, file size profile, fileclasses...)
- rbh-find implements classic 'find' command, except that it queries robinhood database instead of the filesystem, which makes it faster. Moreover, it provides specific options to query entries per HSM status and other Lustre-specific attributes.
- rbh-du is an enhanced version of the classic 'du' command. It queries the robinhood database instead of the filesystem, which makes it faster. It can also report details about entry types, count, etc.
# filesystem entries:
# rbh-report --fs-info
type   ,    count,   volume, avg_size
dir    ,  1780074,  8.02 GB,  4.72 KB
file   , 21366275, 91.15 TB,  4.47 MB
symlink,   496142, 24.92 MB,       53

# user info, split by group
# rbh-report -u bar -S
user , group, type,  count,  spc_used,  avg_size
bar  , proj1, file,      4,  40.00 MB,  10.00 MB
bar  , proj2, file,   3296, 947.80 MB, 273.30 KB
bar  , proj3, file, 259781, 781.21 GB,   3.08 MB

# file size profile for a given user
# rbh-report -u foo --szprof
user, type, count, volume, avg_size, 0, 1~31, 32~1K-, 1K~31K, 32K~1M-, 1M~31M, 32M~1G-, 1G~31G, 32G~1T-, +1T
foo , dir, 48, 1.48 MB, 31.67 KB, 0, 0, 0, 26, 22, 0, 0, 0, 0, 0
foo , file, 11055, 308.16 GB, 28.54 MB, 2, 0, 14, 23, 5276, 5712, 9, 17, 2, 0

# top disk space consumers
# rbh-report --top-users
rank, user    , spc_used,  count,  avg_size
   1, usr0021 , 11.14 TB, 116396, 100.34 MB
   2, usr3562 ,  5.54 TB,    575,   9.86 GB
   3, usr2189 ,  5.52 TB,   9888, 585.50 MB
   4, usr2672 ,  3.21 TB, 238016,  14.49 MB
   5, usr7267 ,  2.09 TB,   8230, 266.17 MB
...

# HSM status
# rbh-report --status-info lhsm
lhsm.status, type, count,  spc_used,    volume,  avg_size
new        , file,  3296, 947.80 MB, 879.68 MB, 273.30 KB
archiving  , file,     5,   5.00 MB,   5.00 MB,   1.00 MB
But also:
- --top-size Report largest files in the filesystem.
- --entry-info Report all information about a given entry.
- Run rbh-report --help to get the full list of available reports.
rbh-find example:
# rbh-find /mnt/lustre/dir -u root -size +32M -mtime +1h -ost 2 -status lhsm:new -ls
rbh-du examples:
# rbh-du -H -u foo /mnt/lustre/dir.3
45.0G /mnt/lustre/dir.3
Lustre/HSM manages the following actions to implement hierarchical storage:
- archive: copy of a file from Lustre to a storage backend.
- release: release file data from Lustre (must be archived and non-dirty).
- restore: copy a file back from the backend to Lustre (must be released).
- remove: remove a copy from the backend (allows cleaning a file from the backend after it has been deleted from Lustre).
Policy template /etc/robinhood.d/includes/lhsm.inc (installed with robinhood) defines the following policies for scheduling Lustre/HSM actions:
- lhsm_archive: schedules archive actions on filesystem entries.
- lhsm_release: schedules release actions on filesystem entries.
- lhsm_remove: schedules remove actions for deleted filesystem entries.
The following diagram shows the states managed by robinhood for Lustre/HSM entries, and the transitions between states:
NEW (no flag) ---- archive -----> SYNCHRO ----- release ---> RELEASED
                                   |   ^  <----- restore ----
                                   |   |
                                 write archive
                                   |   |
                                   v   |
                                 MODIFIED
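The transitions in the diagram above can be written down as a small lookup function. This is only an illustrative sketch: the function name `next_state` is hypothetical, the state names are lowercase versions of those in the diagram, and only the transitions drawn there are modelled (transient states such as 'archiving' are left out).

```shell
#!/bin/sh
# Hypothetical helper: print the state reached from state $1 by action $2,
# or return 1 if the diagram has no such transition.
next_state() {
    case "$1:$2" in
        new:archive)      echo synchro ;;
        synchro:release)  echo released ;;
        released:restore) echo synchro ;;
        synchro:write)    echo modified ;;
        modified:archive) echo synchro ;;
        *) return 1 ;;
    esac
}

# Walk one file through a typical life cycle.
state=new
for action in archive release restore write archive; do
    state=$(next_state "$state" "$action") || { echo "invalid action: $action"; break; }
    echo "$action -> $state"
done
```

Note that a 'new' file cannot be released directly: it has to be archived first, which matches the requirement that released data must be archived and non-dirty.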
This policy allows scheduling archive actions for Lustre/HSM.
Robinhood archives data incrementally. In other words, it only copies new or modified files and does not copy unchanged files multiple times.
Robinhood makes it possible to define different rules for several file classes.
In this first example, we only define a single policy rule for all files. This is done by specifying a 'default' rule.
Archive policy rules are defined in a lhsm_archive_rules block:
lhsm_archive_rules {
    rule default {
        condition {last_mod > 1h}
    }
}
- The 'default' rule is a special rule that applies to files that don't match any other rule.
- In a policy, you must specify a condition for allowing entries to be archived. In this example, we don't want to copy recently modified entries (modified within the last hour).
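To make the `last_mod > 1h` condition concrete, the sketch below checks whether a modification time is more than a given number of seconds in the past. It assumes the condition boils down to comparing "now minus last modification time" with the threshold; the helper name `older_than` and its argument order are invented for this example and are not part of robinhood.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the entry was modified more than $3 seconds ago.
# $1: last modification time (epoch seconds)
# $2: current time (epoch seconds)
# $3: minimum age in seconds
older_than() {
    [ $(( $2 - $1 )) -gt "$3" ]
}

now=$(date +%s)
one_hour=3600

if older_than $(( now - 7200 )) "$now" "$one_hour"; then
    echo "modified 2h ago: eligible for archiving"
fi
if ! older_than $(( now - 600 )) "$now" "$one_hour"; then
    echo "modified 10min ago: skipped for now"
fi
```

Waiting for files to "cool down" this way avoids copying a file that is still being written.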
This example is more fully featured. It assumes that the fileclasses it uses have already been defined in the configuration file.
lhsm_archive_rules {
    # don't archive empty files
    ignore { size == 0 }

    # don't archive files matching fileclasses 'tmp_files' or 'trash'
    ignore_fileclass = tmp_files;
    ignore_fileclass = trash;

    # quickly archive some kind of data.
    rule fast_archive {
        target_fileclass = result_data;
        target_fileclass = important_data;
        # archive 1h after last modification or 6h after last archive
        condition { last_mod > 1h or last_archive > 6h }
    }

    # archive some data to archive_id=2
    rule foo_archive {
        target_fileclass = fooproject;
        action_params { archive_id = 2; }
        condition { last_mod > 6h or last_archive > 1d }
    }

    rule bar_archive {
        target_fileclass = barproject;
        # pass custom arguments to Lustre/HSM copytool
        action_params { cos = 23; stripe_count = 2; }
        condition { last_mod > 6h or last_archive > 1d }
    }
}
To run the policy manually on all eligible files (new or modified), execute:
robinhood --run=lhsm_archive --target=all
If you want to restrict your testing to a subset of files, you can specify one of the following arguments for --target option:
- all : run on all filesystem entries (matching the policy scope).
- user:<username> : run on entries of the given user.
- group:<groupname> : run on entries of the given group.
- file:<path> : run on the given entry.
- class:<fileclass> : run on entries in the given fileclass.
- ost:<ost_idx> : run on entries stored on the given OST index.
- pool:<poolname> : run on entries in the given OST pool.
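When scripting test runs, it can help to validate the target before launching anything. The following dry-run sketch only prints the command line; it never calls robinhood. The helper name `archive_cmd` is hypothetical, and the validation simply mirrors the list of target forms above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the command for a manual lhsm_archive run
# on target $1, or fail if $1 is not one of the target forms listed above.
archive_cmd() {
    case "$1" in
        all|user:?*|group:?*|file:?*|class:?*|ost:?*|pool:?*)
            echo "robinhood --run=lhsm_archive --target=$1" ;;
        *)
            echo "unsupported target: $1" >&2
            return 1 ;;
    esac
}

# Print (but do not execute) two example commands.
archive_cmd user:foo
archive_cmd ost:2
```

Starting with a narrow target such as a single user or fileclass limits the impact of a misconfigured rule during testing.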
The most common configuration is to run robinhood as a daemon that automatically runs policies. In this case, policy runs are started by triggers.
This defines a simple trigger for 'lhsm_archive' policy, to run it every 15 min:
lhsm_archive_trigger {
    trigger_on = scheduled;
    check_interval = 15min;
}
There are several types of triggers. See Admin guide: Triggers for more details about policy triggers.
If started in one of the following ways, the robinhood daemon automatically runs the policy at the scheduled interval:
- service robinhood start: runs the robinhood daemon with the options specified in /etc/sysconfig/robinhood (no option by default, which behaves like the next item).
- robinhood -d: runs the robinhood daemon to continuously read changelogs and regularly check the triggers of all policies.
- robinhood --run=all -d: runs a daemon that regularly checks the triggers of all policies.
- robinhood --run=lhsm_archive -d: runs a daemon that regularly checks the triggers of the 'lhsm_archive' policy only.
This policy allows scheduling release actions for Lustre/HSM.
Robinhood only releases entries whose state is 'synchro'. In other words, it only releases data that has been archived to the storage backend and has not been modified since it was archived.
Defining 'lhsm_release' policy is done the same way as 'lhsm_archive' policies, by defining rules with target fileclasses, conditions... The only difference is the name of the block: lhsm_release_rules.
Example:
lhsm_release_rules {
    # never release very small files
    ignore { size < 16K }

    # don't release data in the following fileclasses
    ignore_fileclass = always_online;
    ignore_fileclass = boot_images;

    # don't release small files too quickly, to reduce tape library mount operations
    rule keep_longer {
        target_fileclass = small_files;
        condition { last_access > 90d }
    }

    # default rule to release other data
    rule default {
        condition { last_access > 6h }
    }
}
As with any robinhood policy, you need to specify triggers to run the policy automatically.
The most common configuration for an HSM system is to release data when space runs short in the top tier, which is Lustre. So, for Lustre/HSM, the recommended trigger is on OST usage:
lhsm_release_trigger {
    trigger_on = ost_usage;
    high_threshold_pct = 95%;
    low_threshold_pct = 93%;
    check_interval = 5min;
}
With this trigger definition: if an OST gets full (usage > 95%), robinhood will release eligible files from this OST by applying lhsm_release policy rules until the usage of this OST is back to 93%.
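The threshold logic described above comes down to simple arithmetic, sketched below. This is an illustration under stated assumptions, not robinhood's actual computation: the helper name `amount_to_release` is hypothetical, sizes are plain integers in any single unit, and integer division rounds the target down.

```shell
#!/bin/sh
# Hypothetical helper: print how much space must be released from one OST.
# $1: OST size, $2: used space (same unit as $1)
# $3: high threshold in percent, $4: low threshold in percent
# Prints 0 when usage does not exceed the high threshold.
amount_to_release() {
    if [ $(( $2 * 100 )) -gt $(( $1 * $3 )) ]; then
        # Release down to the low threshold.
        echo $(( $2 - $1 * $4 / 100 ))
    else
        echo 0
    fi
}

amount_to_release 1000 960 95 93   # usage is 96%: release 30 to get back to 93%
amount_to_release 1000 940 95 93   # usage is 94%: below the high threshold, nothing to do
```

The gap between the two thresholds gives some headroom, so the trigger does not fire again immediately after a run.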
In a daemon
As for the 'lhsm_archive' policy, you can run the 'lhsm_release' policy in a robinhood daemon. This is the case when you run one of the following commands:
- service robinhood start: runs the robinhood daemon with the options specified in /etc/sysconfig/robinhood (no option by default, which behaves like the next item).
- robinhood -d: runs the robinhood daemon to continuously read changelogs and regularly check the triggers of all policies.
- robinhood --run=all -d: runs a daemon that regularly checks the triggers of all policies.
- robinhood --run=lhsm_release -d: runs a daemon that regularly checks the triggers of the 'lhsm_release' policy only.
For one-shot runs, it is highly recommended to set a limit when applying the 'lhsm_release' policy (by specifying the '--target-usage', '--max-count' or '--max-volume' arguments); otherwise you would release all archived files of your Lustre filesystem!
# this releases entries until the overall filesystem space usage decreases to 90%
# (the quotes keep the shell from interpreting the parentheses)
robinhood --run='lhsm_release(target-usage=90%)'

# this releases entries from OST 22 until its usage is back to 85%
robinhood --run='lhsm_release(target=ost:22,target-usage=85%)'

# release up to 1000 entries of fileclass 'foo'
robinhood --run='lhsm_release(target=class:foo,max-count=1000)'
For more details about possible policy run options, see: Admin guide: Command lines to run policies
The purpose of this policy is to clean entries in the archive backend after they have been deleted from Lustre. It makes it possible to control the delay before cleaning the archived copy of an entry, so that the entry can be 'undeleted' if it was deleted by mistake. This undelete can be done using the rbh-undelete command (see Admin guide: rbh-undelete).
This policy is special because it applies to entries that no longer exist.
However, specifying this policy is very similar to other policies using policy rules, triggers... So you can specify different removal delays for different fileclasses.
A common criterion in rule conditions is the special attribute rm_time, which is the time elapsed since the entry was deleted from the filesystem.
lhsm_remove_rules {
    # Never remove archived version of entries matching 'important' fileclass,
    # even if they are deleted from Lustre.
    ignore_fileclass = important;

    # keep archives of documents and code files for 6 months after their removal
    rule keep_6months {
        target_fileclass = documents;
        target_fileclass = code;
        condition { rm_time > 180d }
    }

    # keep other entries for 1 month
    rule default {
        condition { rm_time > 30d }
    }
}
This kind of policy is commonly triggered at a scheduled interval (e.g. every hour):
lhsm_remove_trigger {
    trigger_on = scheduled;
    check_interval = 1h;
}
This policy can be run like any other (as a daemon, or command line): refer to examples for 'lhsm_release' policy.