Quick Start Guide
- System Overview
- Getting a NICS Account
- Getting Help
- Logging In
- Configuring your Environment
- File Systems and Storage
- Software Development
- Running Jobs
- Known Problems
The Keeneland Initial Delivery (KID) system was delivered in October 2010. It is composed of an HP SL-390 (Ariston) cluster with Intel Westmere hex-core CPUs, NVIDIA 6GB Fermi GPUs, and a Qlogic QDR InfiniBand interconnect. Each of the 120 nodes has two hex-core CPUs and three GPUs, for a total of 240 CPUs and 360 GPUs.
Jobs are charged as follows:
1 node-hr = 16 (KFS) CPU-hrs = 3 GPU-hrs = 3 SUs.
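As a rough illustration of the charging rule above, the following sketch estimates the SUs for a job from its node count and walltime. It reads the rule as 3 SUs per node-hour; the function name su_estimate is hypothetical and is not a Keeneland tool, and the accounting system remains the authority on actual charges.

```shell
# Hypothetical helper (not a Keeneland tool): estimate the SUs for a job,
# assuming the rule above means 3 SUs per node-hour.
su_estimate() {
    # $1 = number of nodes, $2 = walltime as hh:mm:ss
    awk -v n="$1" -v w="$2" 'BEGIN {
        split(w, t, ":")
        hours = t[1] + t[2] / 60 + t[3] / 3600
        printf "%.2f\n", n * hours * 3
    }'
}

# Two nodes for 30 minutes: 2 * 0.5 * 3
su_estimate 2 00:30:00   # prints 3.00
```

Under this reading, a one-hour job on all 120 nodes would come to 360 SUs.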
Please see Getting Access to KIDS for details on getting an account.
Once you have an account, you will be added to the Keeneland Users mailing list. System-wide announcements will be broadcast to this list.
Please direct any questions to email@example.com. To ensure your question gets routed correctly, please include "Keeneland" in the subject line.
To log in to the KID system, SSH to kids.gatech.xsede.org using your NICS account as your username and your 'PIN+token code' as your PASSCODE.
On Keeneland, modules are used to manage the environment, for example by changing LD_LIBRARY_PATH to use different applications or libraries. Of particular note are the PE- modules, which select the compiler vendor. Modules for libraries often check this module to determine which version of the library to use.
For more information, including a list of commands, see Modules.
Each user is provided with a home directory to store frequently used items such as source code, binaries, and scripts. Home directories are shared among all NICS resources; for more information, see NICS Home Directories.
Groups may also request NFS project directories. These are intended for sharing files among a group; see NICS Project Directories.
Scratch directories are on a parallel file system, intended to provide high-performance access to temporary input and output files. There is no quota; however, files that have not been accessed in 30 days may be purged. Scratch directories on Keeneland are provided at:
For more information, see Lustre.
As with most other software on the Keeneland system, the software development toolchain packages are managed using modules. See the modules section for more information.
There are several compilers available on the Keeneland ID system: Intel, GNU, and PGI.
The GNU compilers are installed in system default locations, and thus are always in the user's PATH, though the PE-gnu module is required in order for mpicc to use them.
Note that only certain versions of the PGI compilers support PGI accelerator directives and CUDA Fortran.
CUDA and the NVIDIA GPU Computing SDK are installed on the system.
There are a few MPI implementations available on the Keeneland system: OpenMPI, MVAPICH2, and MPICH2.
Select one of these MPI implementations using a command like 'module load openmpi/1.5.1-intel'.
Note that there are also MPI implementations installed as part of the Open Fabrics Enterprise Distribution (OFED) software stack on the Keeneland ID login nodes in directories under /usr/mpi. These installations will not work correctly on the Keeneland ID system, because they have not been built to be integrated with the resource management software used on the Keeneland system (i.e., Torque).
See Running Jobs for information about launching MPI-based programs from batch jobs.
Subversion and Git are available on the system.
Keeneland uses Torque (an open source PBS derivative) as its batch queue software, with the Moab scheduler, similar to other systems at NICS. There are some important differences, as described below. Here's an example batch queue script (see the notes afterward for some explanation). This assumes that you have set up the modules in your .bash_profile as described in the Modules section of this document.
    #!/bin/sh
    #PBS -N kiat-imb
    #PBS -j oe
    #PBS -A UT-TENN0000

    ### Unused PBS options ###
    ## If left commented, must be specified when the job is submitted:
    ## 'qsub -l walltime=hh:mm:ss,nodes=2:ppn=12:gpus=3:shared '
    ##
    ##PBS -l walltime=00:30:00
    ##PBS -l nodes=2:ppn=12:gpus=3:shared
    ### End of PBS options ###

    date
    cd $PBS_O_WORKDIR

    echo "nodefile="
    cat $PBS_NODEFILE
    echo "=end nodefile"

    # run the program
    which mpirun
    mpirun --mca mpi_paffinity_alone 1 /bin/hostname

    date

    # eof
- The scheduler is set up to give exclusive access to nodes, so there should be no need to add a flag (like "-l naccesspolicy=singletask") to ensure each job gets its node to itself.
- The -S parameter to PBS is required if you want to use a shell other than bash. Adding something like #!/bin/ksh in the first line is not enough to choose a different shell.
- If you write batch scripts for a shell other than bash, you must be sure that the module setup has been done as described in Modules.
- If you are sharing your script with anyone else, you must be sure that everyone who uses it has done this setup. Since this is a burden and error-prone, you might want to do the module setup explicitly in the batch script if you are using a non-bash shell for your batch scripts.
- The account number is required. It is the same as the number of the project(s) to which your NICS account is tied.
If you have your environment set up correctly, and are using the OpenMPI from /sw/keeneland/openmpi/1.5.1-intel (check the output of the 'which mpirun' from running this script), you should not need to pass either '-np 2' or '-hostfile $PBS_NODEFILE' to the mpirun command. If your mpirun commands don't work, it may be that your environment is picking up the wrong mpirun, one that was not built with Torque integration.
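The 'which mpirun' check can also be done programmatically. The following is a minimal sketch, assuming the OpenMPI install prefix named above; the function name mpirun_under is hypothetical and is used only for illustration.

```shell
# Hypothetical helper: succeed if the path in $1 lies under the prefix in $2.
mpirun_under() {
    case "$1" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Locate mpirun in the current environment (empty if none is found).
found=$(command -v mpirun || true)

if mpirun_under "$found" /sw/keeneland/openmpi/1.5.1-intel; then
    echo "mpirun comes from the expected OpenMPI install: $found"
else
    echo "mpirun is not under the expected prefix: ${found:-not found}"
fi
```

Placing a check like this near the top of a batch script makes a wrong mpirun visible in the job output before the program is launched.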
This job script does not hard-code the number of nodes or processes per node, so these need to be specified on the command line. If you wish to specify this in the batch script, add:
#PBS -l walltime=hh:mm:ss,nodes=2:ppn=12:gpus=3:shared
See NUMA for information on the
OpenMPI has optional NUMA support. It has to be built into OpenMPI at compile time, and isn't configured to do so by default. If it isn't built in, we have been using mpirun to start shell scripts that attempt to use numactl to control process and memory placement. If it is built in, the mechanism is much simpler: pass '--mca mpi_paffinity_alone 1' to mpirun when you start your program, and don't use a separate script.
Check if the OpenMPI you are using has NUMA built in with:
ompi_info | grep affinity
If it is built into the OpenMPI you are using, there will be lines like:
    MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4.3)
    MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.3)
    MCA maffinity: libnuma (MCA v2.0, API v2.0, Component v1.4.3)
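The check above can be wrapped in a small script. The sketch below reads ompi_info output on standard input and reports whether any paffinity or maffinity components are listed. The function name has_affinity is hypothetical, and the sample input is taken from the output shown above rather than from a live query.

```shell
# Hypothetical helper: succeed if the ompi_info output on stdin lists
# any paffinity or maffinity components.
has_affinity() {
    grep -Eq 'MCA (paffinity|maffinity):'
}

# On a real system one would run: ompi_info | has_affinity
# Here, sample lines from the output above stand in for a live query.
sample='MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4.3)
MCA maffinity: libnuma (MCA v2.0, API v2.0, Component v1.4.3)'

if printf '%s\n' "$sample" | has_affinity; then
    echo "affinity support found"
else
    echo "no affinity support found"
fi
```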
Submit jobs with the qsub command. This example submits a job which uses 12 processes per node and 3 GPUs per node on 2 nodes for 30 minutes:
qsub -l walltime=00:30:00,nodes=2:ppn=12:gpus=3:shared kiat-imb.ksh
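To avoid retyping the resource list, a submission line like the one above can be assembled from its parts. The sketch below only prints the command instead of running it, so it can be inspected first. The function name qsub_cmd and its argument order are assumptions made for illustration.

```shell
# Hypothetical helper: print a qsub command line for the given resources.
# Arguments: walltime (hh:mm:ss), nodes, processes per node,
#            GPUs per node, batch script name.
qsub_cmd() {
    printf 'qsub -l walltime=%s,nodes=%s:ppn=%s:gpus=%s:shared %s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Prints the same command line as the example above.
qsub_cmd 00:30:00 2 12 3 kiat-imb.ksh
```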
There are several queues defined for the ID system. The output of 'qstat -q' shows the queues, their restrictions, and their state (e.g., 'enabled', 'running'):
    kidlogin1.nics.utk.edu$ qstat -q

    server: kidserv1.nics.utk.edu

    Queue            Memory CPU Time Walltime Node  Run Que Lm  State
    ---------------- ------ -------- -------- ----  --- --- --  -----
    serial             --      --    48:00:00     1   0   0 --   E R
    hpss               --      --    48:00:00   --    0   0 --   E R
    capability         --      --    48:00:00   110   0   0 --   E S
    parallel           --      --    48:00:00    60   0   0 --   E R
    dmover             --      --    48:00:00   --    0   0 --   E R
    batch              --      --       --      --    0   0 --   E R
                                                   ----- -----
                                                       0     1
Use 'qstat -a' (or simply 'qstat' if you prefer the default output) to see jobs in the batch queues.
Use 'qstat -f' to see full information about all jobs, or 'qstat -f id' to see full information about the job with id 'id'.
The Moab command 'showq' shows the scheduler's view of the queues. 'qstat -f' and 'showq' complement each other in telling you why your job isn't running.
The Moab command 'checkjob id' can also help troubleshoot problems with job 'id'.
In the current configuration, there is no need to specify a queue. Jobs are placed in a specific queue based on their size.
- Specifically, the 'batch' queue is a gateway queue. Submit to the batch queue, and Torque figures out which other queue the job belongs in. Empirically, a job that requests 1 node seems to end up in the 'serial' queue, a job that requests more than 1 and up to 72 nodes in the 'parallel' queue, and a job that requests more than 72 nodes in the 'capability' queue. The hpss and dmover queues are special-purpose, and you should never choose these.
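The routing described above can be summarized in a few lines of shell. This sketch only mirrors the empirical observation in the text and is not taken from the actual Torque configuration, so the thresholds may not match the live system. The function name guess_queue is hypothetical.

```shell
# Hypothetical helper: guess the destination queue from the node count,
# following the empirical description above (not the real Torque config).
guess_queue() {
    if [ "$1" -le 1 ]; then
        echo serial
    elif [ "$1" -le 72 ]; then
        echo parallel
    else
        echo capability
    fi
}

for n in 1 2 72 73; do
    echo "$n nodes -> $(guess_queue "$n")"
done
```

The commands 'qstat -a' and 'showq' remain the authoritative way to see where a submitted job actually landed.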
With the PBS options in the example batch script above (specifically -j oe), the output of the job with id <id> will go into a single file named kiat-imb.o<id> after the run completes.
As the queue software is currently configured, the temporary output of a job is available in a file named something like <id>.kidserv1.nics.utk.edu.OU in the directory from which the qsub was done. If you want to keep an eye on a job as it runs, you can use 'tail -f' on this file.
- If you are running from an NFS file system, when tail complains about a stale NFS handle, the job is done and the same output will be available in the .o<id> file described above.
- If you are running from a GPFS file system (recommended), the system produces no such warning message when the job completes.
When you compile, you may see the following warning from the Intel compiler:
    /opt/intel/Compiler/11.1/073/lib/intel64/libimf.so: warning: warning: feupdateenv is not implemented and will always fail.
This warning seems to be benign unless you are using fenv functions from C99. See the Intel forums for more discussion about this issue. (Note: adding "-shared-intel" avoids this warning, but causes your executable to use the shared object versions of the Intel libraries.)