Lustre

The Lustre file system, shared with Nautilus (but not Kraken), is available as scratch space at /lustre/medusa/<user-name>. Lustre is a highly scalable cluster file system. Storage of a given file is distributed (or striped) across several hardware locations. This lets a file be larger than any single location could hold, and allows much faster transfers when access to the file is parallelized.

Lustre Purge Policy

While there is no quota on Lustre, we do monitor usage, and files not accessed in 30 days are subject to purge. The scratch file system should not be used for long-term storage; files on scratch are not backed up or guaranteed by NICS. In the event of a file system crash or purge, files in scratch directories cannot be recovered. It is each user's responsibility to back up all important data to HPSS or other storage.

The purge policy for the scratch file system is as follows: files are exempt from purge if they have been written to or read within the last 30 days. To list the files that are candidates for purge, sorted by access time, you can use:

lfs find /lustre/medusa/$USER -type f -atime +30 | xargs -r ls -l --time=atime --sort=time
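For a quick look at what is at risk, the same check can be wrapped in a small shell function. This is a sketch, not a NICS tool: the name purge_candidates is hypothetical, and it uses GNU find (with its -printf option) in place of lfs find so it also works on non-Lustre directories. On Lustre itself the lfs find command above remains the better choice. It prints each file's last access date followed by its path, oldest first.

```bash
# purge_candidates: hypothetical helper (not a NICS tool).
# Lists regular files under DIR not accessed for more than DAYS days
# (default 30), as "YYYY-MM-DD path", oldest access first.
# Requires GNU find for -printf.
purge_candidates() {
  local dir=${1:?usage: purge_candidates DIR [DAYS]}
  local days=${2:-30}
  # -atime +N matches files whose access age, in whole days, exceeds N
  find "$dir" -type f -atime +"$days" -printf '%AY-%Am-%Ad %p\n' | sort
}
```

For example, purge_candidates /lustre/medusa/$USER 25 would show files that will become purge candidates within the next few days.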

Modifying file access times (using "touch" or any other method) to circumvent the purge policy may result in the loss of access to the scratch file systems. Under special circumstances, users may request a purge exemption by emailing help@xsede.org well in advance. Please include the full path of the files, the PI of the project, the user requesting the exemption, the TG account, the length of time requested (two weeks, etc.), and a detailed justification.

Lustre Architecture

Files are written to Object Storage Targets, or OSTs (disks or RAID arrays, each with a traditional file system). Each file has a stripe count, which defines the number of OSTs it is written to, and a stripe size: the file is written to its OSTs in chunks of <stripe size> in round-robin fashion. For example, a 3MB file with a stripe count of 2 and a stripe size of 1MB would be broken into 3 chunks; the first and third chunks would go on one OST and the second on another.
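The round-robin placement can be illustrated with a short shell function. It is a simplified model, not Lustre code: stripe_layout is a hypothetical name, and it reports slot numbers (positions in the file's list of OSTs) rather than real OST indices, which Lustre chooses when the file is created.

```bash
# stripe_layout: hypothetical illustration of round-robin striping (not Lustre code).
# Usage: stripe_layout FILE_SIZE STRIPE_SIZE STRIPE_COUNT   (sizes in bytes)
# Prints the stripe slot (0 .. STRIPE_COUNT-1) that each chunk is written to.
stripe_layout() {
  local size=$1 stripe_size=$2 count=$3
  # Round up so that a partial final chunk is still counted.
  local chunks=$(( (size + stripe_size - 1) / stripe_size ))
  local i=0
  while [ "$i" -lt "$chunks" ]; do
    # Chunk i goes to slot i mod count, cycling through the file's OSTs.
    echo "chunk $i -> slot $(( i % count ))"
    i=$(( i + 1 ))
  done
}
```

Running stripe_layout $((3 * 1048576)) 1048576 2 prints chunk 0 -> slot 0, chunk 1 -> slot 1 and chunk 2 -> slot 0, the same split as the 3MB example.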

Recommended Usage

There are more detailed tips on the NICS website, though some of that information is less critical at the scale of Keeneland. In short: large files, and files accessed with parallel I/O (such as MPI-IO or HDF5), should have larger stripe counts; small files and files with a single reader or writer should have small stripe counts.

The lfs getstripe and lfs setstripe commands report and set striping. New files inherit the default striping of their parent directory; to create a file with different settings, use lfs setstripe. You can change a directory's default striping, but this only affects files created afterwards, not existing ones. You cannot change the striping of an existing file.
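Following the advice above, one way to organize scratch space is to keep small files and large files in separate directories with different default stripe counts. The helper below is a hypothetical sketch; it assumes the lfs setstripe -c option used in this section, and the directory names and counts in the usage comments are only examples.

```bash
# make_striped_dir: hypothetical helper (not a NICS tool).
# Creates DIR if needed and sets its default stripe count to COUNT.
# Only files created in DIR afterwards inherit this default;
# files already there keep their current striping.
make_striped_dir() {
  local dir=${1:?usage: make_striped_dir DIR COUNT}
  local count=${2:?usage: make_striped_dir DIR COUNT}
  mkdir -p "$dir" || return 1
  lfs setstripe -c "$count" "$dir"
}

# Example usage (paths and counts are illustrative):
# make_striped_dir /lustre/medusa/$USER/small 1
# make_striped_dir /lustre/medusa/$USER/large 4
```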

Here (assuming /lustre/medusa/jdoe is empty to begin with), we create two files with non-default stripe counts (a stripe size of 0 means the file system default), and copy a file into the directory, where it takes on the directory's default striping. Then we use lfs getstripe to check the striping of the files.

$ lfs setstripe /lustre/medusa/jdoe/1wide -s 0 -c 1
$ lfs setstripe /lustre/medusa/jdoe/2wide -s 0 -c 2
$ cp dump.out /lustre/medusa/jdoe
$ lfs getstripe /lustre/medusa/jdoe
jdoe
stripe_count:   2 stripe_size:    1048576 stripe_offset:  -1
jdoe/1wide
stripe_count:   1 stripe_size:    0 stripe_offset:  -1
jdoe/2wide
stripe_count:   2 stripe_size:    0 stripe_offset:  -1
jdoe/dump.out
lmm_stripe_count:   2
lmm_stripe_size:    1048576
lmm_stripe_offset:  25
        obdidx           objid          objid            group
            25           56825         0xddf9                0
            19           56834         0xde02                0
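When a directory holds many files, it can help to reduce lfs getstripe output to one line per path. The filter below is a sketch that assumes the output format shown above (a path line followed by a stripe_count: or lmm_stripe_count: line); other Lustre versions format this output differently, so treat it as a starting point. stripe_counts is a hypothetical name.

```bash
# stripe_counts: hypothetical filter for lfs getstripe output (format as shown above).
# Prints "path stripe_count" for each path in the listing.
stripe_counts() {
  awk '
    # A line starting with a non-blank character and containing no colon is a path.
    /^[^ \t]/ && !/:/ { name = $0; next }
    # Take the value that follows the first "stripe_count:" or "lmm_stripe_count:" field.
    /stripe_count:/ {
      for (i = 1; i < NF; i++)
        if ($i ~ /stripe_count:$/) { print name, $(i + 1); break }
    }
  '
}
```

For example, lfs getstripe /lustre/medusa/jdoe | stripe_counts would print jdoe 2, jdoe/1wide 1, jdoe/2wide 2 and jdoe/dump.out 2 for the listing above.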

To change the striping of dump.out (if it is a large file, it ought to have a greater stripe count), create a new file with the desired striping, copy the data into it, and replace the original. Note that the commands below do not verify the new copy before the old one is deleted. Since scratch file systems are not backed up, it is a good idea to verify that the copy succeeded first (with diff or md5sum, for example).

$ cd /lustre/medusa/jdoe
$ lfs setstripe newdump.out -c 4
$ cp dump.out newdump.out
$ rm dump.out
$ mv newdump.out dump.out
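A version of these steps that checks the copy before replacing the original could look like the function below. It is a sketch: restripe is a hypothetical name, it compares the files with cmp rather than diff or md5sum, and the temporary file name it uses is arbitrary. Because mv renames the new file over the old one, the original is only replaced if cp succeeded and the contents match. As with the manual steps, the replacement file may not keep the original file's permissions.

```bash
# restripe: hypothetical helper (not a NICS tool).
# Rewrites FILE with a new stripe count, replacing the original
# only after the copy has been verified byte for byte.
restripe() {
  local file=${1:?usage: restripe FILE COUNT}
  local count=${2:?usage: restripe FILE COUNT}
  local tmp="$file.restripe.$$"   # arbitrary temporary name in the same directory

  # Create an empty file with the new layout; cp fills it and the layout is kept.
  lfs setstripe -c "$count" "$tmp" || return 1
  if ! cp "$file" "$tmp"; then
    rm -f "$tmp"
    return 1
  fi

  # Replace the original only if the copy is identical.
  if cmp -s "$file" "$tmp"; then
    mv "$tmp" "$file"
  else
    echo "restripe: copy of $file does not match, original kept" >&2
    rm -f "$tmp"
    return 1
  fi
}

# Example: restripe /lustre/medusa/jdoe/dump.out 4
```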