Several useful tools are available on the Keeneland Initial Delivery System (KIDS), including debuggers and performance analysis tools that work with programs that use GPUs. This page highlights a few of them. It focuses on third-party tools; see http://keeneland.gatech.edu/software/keeneland for information about software being developed by the Keeneland team. This page does not describe every tool available on KIDS, so please check the /sw/keeneland directory to see what other libraries and tools are installed.
Compilers and translators provided on KIDS are described at http://keeneland.gatech.edu/software/compilers.
KIDS users can use the module command to manage which packages are available in their software environment. For example, a simple command like
<kidlogin1>$ module load cuda # note that "<kidlogin1>$" is the shell prompt
will update your PATH and LD_LIBRARY_PATH environment variables so that NVIDIA's CUDA program development tools like nvcc can be run without specifying the full path to the executable, and so that CUDA- and OpenCL-based programs can be run on the system. Issuing another simple command
<kidlogin1>$ module unload cuda
removes the CUDA directories from the PATH and LD_LIBRARY_PATH.
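One way to see what module load and module unload change is to ask the shell where it finds nvcc before and after each step. The following is a minimal sketch, assuming a bash shell; the helper name where_is is hypothetical, and the module calls are skipped when the module command is not defined (for example, on a machine other than KIDS).

```shell
#!/bin/bash
# Hypothetical helper: print where the shell finds a command,
# or a short note if the command is not on PATH.
where_is() {
    command -v "$1" || echo "$1: not found on PATH"
}

# Result depends on whether a cuda module is already loaded.
where_is nvcc

# Only call module where it is defined.
if type module >/dev/null 2>&1; then
    module load cuda
    where_is nvcc      # expected to print a path inside the CUDA Toolkit
    module unload cuda
    where_is nvcc      # expected to report that nvcc is no longer found
fi
```

The same check works for any other tool a module is supposed to provide.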
When you log into a Keeneland login node, the module command should be available in your environment. That is, you should be able to issue the command
<kidlogin1>$ module list
Currently Loaded Modulefiles:
  1) modules              5) PE-gnu               9) intel/2011.5.220
  2) torque/2.5.7         6) openmpi/1.5.1-gnu   10) subversion/1.6.15
  3) moab/6.0.4           7) cuda/4.0
  4) gold                 8) mkl/2011.5.220
to see which modules are already loaded into your environment.
Another useful command is
<kidlogin1>$ module avail
...lots of output omitted here...
that shows which modules are available to be loaded into your environment. Notice that several versions of a module may be available on the system at the same time. If you are only interested in seeing which versions are available for a particular package (e.g., CUDA), try
<kidlogin1>$ module avail cuda
-------------------------- /sw/keeneland/modulefiles ---------------------------
cuda/3.1          cuda/3.2RC        cuda/4.0RC2
cuda/3.2(default) cuda/4.0
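Once you know which versions exist, you can request one explicitly instead of taking the default. The sketch below is illustrative only: the helper name list_versions is hypothetical, and it assumes the classic Environment Modules behavior of printing module avail output on standard error, which is why the output is redirected before filtering.

```shell
#!/bin/bash
# Hypothetical helper: list the available versions of one package, one per line.
# Assumption: "module avail" writes to stderr, so merge it into stdout first.
list_versions() {
    module avail "$1" 2>&1 | tr -s ' \t' '\n' | grep "^$1/"
}

if type module >/dev/null 2>&1; then
    list_versions cuda

    # Switch to a specific version: drop whichever cuda module is loaded,
    # then load the one you want by its full name.
    module unload cuda
    module load cuda/4.0
fi
```

Naming the version in job scripts keeps a build reproducible if the default module changes later.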
For more information about the module command, see http://www.nics.tennessee.edu/user-support/general-support/modules and http://modules.sourceforge.net/.
DDT is a debugger for serial and parallel programs (e.g., multi-node MPI programs). The version of DDT installed on KIDS also supports debugging of CUDA programs. It allows single-stepping through CUDA kernels as they run on a GPU and examining data in GPU memory. Using the DDT graphical user interface (GUI), you can submit a debugging job to the KIDS batch queue. When the job starts to run, it connects back to the GUI so you can debug interactively.
Be sure to build your CUDA programs with both host and device debug information: pass the -g flag (debug information for host code) and the -G flag (debug information for device code) to nvcc when compiling your program.
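A debug build might look like the sketch below. The source file name vecadd.cu is hypothetical and used only for illustration; the script compiles only if nvcc is on the PATH and the file exists, and otherwise just prints the command it would have run.

```shell
#!/bin/bash
# -g adds debug information for host code;
# -G adds debug information for device (GPU kernel) code.
NVCC_DEBUG_FLAGS="-g -G"

# Hypothetical source file name, for illustration only.
SRC=vecadd.cu
EXE="${SRC%.cu}"    # strip the .cu suffix to name the executable

if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # NVCC_DEBUG_FLAGS is left unquoted on purpose so it splits into two flags.
    nvcc $NVCC_DEBUG_FLAGS -o "$EXE" "$SRC"
else
    echo "would run: nvcc $NVCC_DEBUG_FLAGS -o $EXE $SRC"
fi
```

Device debug builds generally run slower than optimized builds, so keep a separate build without -G for performance measurements.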
There are some caveats with DDT on KIDS. First, the current default version supports CUDA 3.2 only. Second, DDT cannot debug more than one CUDA process per node (a limitation imposed by the NVIDIA driver), although it does support debugging a single process that uses multiple GPUs.
For more information about DDT, see http://www.allinea.com/products/ddt-with-cuda/.
NVIDIA provides support for debugging CUDA programs using the well-known GNU debugger gdb. The cuda-gdb debugger is most appropriate for debugging programs with a single process (though perhaps many threads). It is also useful for attaching to a single process that is part of a larger MPI-based program running on multiple nodes. For debugging an entire MPI-based parallel program, using DDT is a better choice.
cuda-gdb is installed as part of the CUDA Toolkit. On KIDS, the CUDA Toolkit is installed under /sw/keeneland/cuda/version/linux_binary, where version is something like 4.0.
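Given that layout, the debugger for a particular toolkit version can be located as in the sketch below. The helper name cuda_gdb_path is hypothetical, and the bin subdirectory under the toolkit root is an assumption; in practice, loading the matching cuda module should put cuda-gdb on your PATH without any of this.

```shell
#!/bin/bash
# Hypothetical helper: build the expected cuda-gdb location for a toolkit version.
# Assumption: the debugger lives in the bin/ subdirectory of the toolkit root.
cuda_gdb_path() {
    echo "/sw/keeneland/cuda/$1/linux_binary/bin/cuda-gdb"
}

GDB=$(cuda_gdb_path 4.0)

if [ -x "$GDB" ]; then
    # Confirm which debugger this is before starting a session.
    "$GDB" --version
    # An interactive session would then be started with: "$GDB" ./your_program
else
    echo "cuda-gdb not found at $GDB"
fi
```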
For more information about cuda-gdb, see http://developer.nvidia.com/cuda-gdb.
The Tuning and Analysis Utilities (TAU) provide support for collecting and analyzing performance data from serial and parallel programs, including support for programs that use CUDA.
TAU is installed under /sw/keeneland/tau. Rather than setting paths by hand, issue the command
<kidlogin1>$ module load tau
to configure your environment to use the default version of TAU.
For more information about using TAU, see http://tau.uoregon.edu.
NVIDIA Visual Profiler
As part of the CUDA Toolkit, NVIDIA provides a profiling tool called the Visual Profiler (also referred to as the Compute Visual Profiler). It collects performance data while a CUDA or OpenCL program runs, then analyzes the data and suggests ways to improve performance. The Visual Profiler uses the performance counters available on the GPUs in KIDS.
The Visual Profiler is designed to work with single process programs. If you are doing performance diagnosis on a parallel program running on multiple nodes of KIDS, TAU is a better choice.
The Visual Profiler is installed as part of the CUDA Toolkit, but the executable is not in the Toolkit's bin directory. Rather, the profiler executable, documentation, and support files are in the computeprof directory at the root of the CUDA Toolkit directory tree. For instance, on KIDS the Visual Profiler for CUDA 4.0 is located at /sw/keeneland/cuda/4.0/linux_binary/computeprof/bin/computeprof.
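Because the profiler is outside the Toolkit's bin directory, one option is to add its directory to your PATH yourself. The following is a minimal sketch using the CUDA 4.0 location given above; the variable names are hypothetical, and starting the GUI assumes you have a working X display (for example, through X forwarding in your SSH session).

```shell
#!/bin/bash
CUDA_VERSION=4.0
PROF_DIR="/sw/keeneland/cuda/$CUDA_VERSION/linux_binary/computeprof/bin"

# Append the profiler directory to PATH only if it is not already present.
case ":$PATH:" in
    *":$PROF_DIR:"*) ;;
    *) PATH="$PATH:$PROF_DIR" ;;
esac
export PATH

if [ -x "$PROF_DIR/computeprof" ]; then
    # Start the profiler GUI in the background (assumes an X display).
    computeprof &
else
    echo "computeprof not found in $PROF_DIR"
fi
```

Running the block twice leaves PATH unchanged the second time, so it is safe to place in a shell startup file.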
For more information on the NVIDIA Visual Profiler, see http://developer.nvidia.com/nvidia-visual-profiler.