Several useful tools are available on Keeneland, in particular debugging and performance analysis tools that work with programs that use GPUs.  This page highlights a few of them.  It focuses on third-party tools; software being developed by the Keeneland team is documented separately.  This page does not describe all tools available on Keeneland, so please check the /sw/keeneland directory to see what other libraries and tools are available.

Compilers and translators provided on Keeneland are described at


This page is under construction.


Modules

Keeneland users can use the module command to manage which packages are available in their software environment. For example, a simple command like

    <kidlogin1>$ module load cuda    # note that "<kidlogin1>$" is the shell prompt

will update your PATH and LD_LIBRARY_PATH environment variables so that NVIDIA's CUDA program development tools like nvcc can be run without specifying the full path to the executable, and so that CUDA- and OpenCL-based programs can be run on the system.  Issuing another simple command

    <kidlogin1>$ module unload cuda

removes the CUDA directories from the PATH and LD_LIBRARY_PATH.
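One way to confirm the effect of loading or unloading the module is to check whether nvcc currently resolves from your PATH.  The following is a minimal sketch; the helper name show_tool is hypothetical and is not part of the module package.

```shell
# show_tool is a hypothetical helper (not part of the module package).
# It reports where a command resolves from on the current PATH.
show_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found on PATH"
    fi
}

# After "module load cuda" this should report a path inside the CUDA
# installation; after "module unload cuda" it should report "not found"
# (unless another nvcc is on your PATH).
show_tool nvcc
```

Running the same check before and after a module command makes it easy to see what changed in your environment.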

When you log into a Keeneland login node, the module command should be available in your environment.  That is, you should be able to issue the command

    <kidlogin1>$ module list
    Currently Loaded Modulefiles:
      1) modules             5) PE-gnu              9) intel/2011.5.220
      2) torque/2.5.7        6) openmpi/1.5.1-gnu  10) subversion/1.6.15
      3) moab/6.0.4          7) cuda/4.0
      4) gold                8) mkl/2011.5.220

to see which modules are already loaded into your environment.

Another useful command is

    <kidlogin1>$ module avail
    ...lots of output omitted here...

that shows which modules are available to be loaded into your environment.  Notice that there may be several versions of a module available on the system at the same time.  If you are only interested in seeing which versions are available for a particular package (e.g., CUDA), try

    <kidlogin1>$ module avail cuda

    -------------------------- /sw/keeneland/modulefiles ---------------------------
    cuda/3.1          cuda/3.2RC        cuda/4.0RC2
    cuda/3.2(default) cuda/4.0
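
To use a version other than the one marked (default), name the version explicitly when loading the module.  The sketch below assumes the module command is available in your shell, as it should be on a Keeneland login node; the version string is taken from the listing above and may differ on the system today.

```shell
# Version string taken from the "module avail cuda" listing; adjust as needed.
CUDA_MODULE="cuda/4.0"

if command -v module >/dev/null 2>&1; then
    module unload cuda            # drop whichever CUDA version is currently loaded
    module load "$CUDA_MODULE"    # request an explicit version instead of the default
    module list                   # confirm which version is now loaded
else
    echo "module command not available in this shell"
fi
```

Unloading first avoids having two versions of the same package in PATH and LD_LIBRARY_PATH at once.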

For more information about the module command, see

Allinea DDT

DDT is a debugger for serial and parallel programs (e.g., multi-node MPI programs).  The version of DDT installed on Keeneland also supports debugging of CUDA programs.  It allows single-stepping through CUDA kernels as they run on a GPU and examining data in GPU memory.  Using the DDT graphical user interface (GUI), you can submit a debugging job to the Keeneland batch queue.  When the job starts to run, it connects back to the GUI so you can debug interactively.

Be sure you build your CUDA programs with debug and CUDA debug information.  Pass both the -g and -G flags to nvcc when compiling your program.
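
As an illustration, a debug build might look like the sketch below.  The source file name vecadd.cu is hypothetical; substitute your own.  The snippet only invokes nvcc if it is on your PATH and the source file exists, and otherwise prints the command it would have run.

```shell
# vecadd.cu is a hypothetical source file name; substitute your own.
SRC="vecadd.cu"
EXE="${SRC%.cu}"            # strip the .cu suffix to name the executable
NVCC_DEBUG_FLAGS="-g -G"    # -g: host debug info, -G: device (GPU) debug info

if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # The flags variable is intentionally unquoted so it splits into two options.
    nvcc $NVCC_DEBUG_FLAGS -o "$EXE" "$SRC"
else
    echo "would run: nvcc $NVCC_DEBUG_FLAGS -o $EXE $SRC"
fi
```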

There are some caveats with DDT on Keeneland.  First, the current default version supports CUDA 3.2 only.  Second, DDT cannot debug more than one CUDA process per node (a limitation imposed by the NVIDIA driver).  However, DDT does support debugging a single process that uses multiple GPUs.


cuda-gdb

NVIDIA provides cuda-gdb, an extension of the well-known GNU debugger gdb, for debugging CUDA programs.  The cuda-gdb debugger is most appropriate for debugging programs with a single process (though perhaps many threads).  It is also useful for attaching to a single process that is part of a larger MPI-based program running on multiple nodes.  For debugging an entire MPI-based parallel program, DDT is a better choice.

cuda-gdb is installed as part of the CUDA Toolkit.  On Keeneland, the CUDA Toolkit is installed under /sw/keeneland/cuda/version/linux_binary, where version is something like 4.0.
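
The installation path can be assembled from the version number, as in the sketch below.  The version value is only an example, and the location of cuda-gdb in the Toolkit's bin directory is an assumption based on the usual CUDA Toolkit layout; loading the cuda module and running cuda-gdb from your PATH is normally simpler.

```shell
# CUDA_VERSION is an example value; use a version installed on the system.
CUDA_VERSION="4.0"
CUDA_ROOT="/sw/keeneland/cuda/${CUDA_VERSION}/linux_binary"
# Assumption: cuda-gdb lives in the Toolkit's bin directory.
CUDA_GDB="${CUDA_ROOT}/bin/cuda-gdb"

if [ -x "$CUDA_GDB" ]; then
    "$CUDA_GDB" --version       # print the debugger version and exit
else
    echo "cuda-gdb not found at $CUDA_GDB"
fi
```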

For more information about cuda-gdb, see


TAU

The Tuning and Analysis Utilities (TAU) provide support for collecting and analyzing performance data from serial and parallel programs, including support for programs that use CUDA.

TAU is installed under /sw/keeneland/tau, but you do not need to use that path directly.  The command

    <kidlogin1>$ module load tau

will configure your environment to use the default version of TAU.

For more information about using TAU, see

NVIDIA Visual Profiler

As part of the CUDA Toolkit, NVIDIA provides a profiling tool called Visual Profiler that collects performance data while a CUDA or OpenCL program runs and then analyzes the data to provide suggestions about how to improve performance.  The Visual Profiler uses performance counters available on the GPUs in Keeneland.

The Visual Profiler is designed to work with single-process programs.  If you are doing performance diagnosis on a parallel program running on multiple nodes of Keeneland, TAU is a better choice.

The Visual Profiler is installed as part of the CUDA Toolkit, but the executable is not in the Toolkit's bin directory.  Rather, the profiler executable, documentation, and support files are in the computeprof directory at the root of the CUDA Toolkit directory tree.  For instance, on Keeneland the Visual Profiler for CUDA 4.0 is located at /sw/keeneland/cuda/4.0/linux_binary/computeprof/bin/computeprof.
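
Because the executable is not in the Toolkit's bin directory, it is typically started by its full path.  The sketch below builds that path from a version number (an example value) and launches the profiler only if the executable exists.  The Visual Profiler is a graphical application, so this assumes you have a working X display (for example, via X forwarding in your SSH session).

```shell
# CUDA_VERSION is an example value; use a version installed on the system.
CUDA_VERSION="4.0"
COMPUTEPROF="/sw/keeneland/cuda/${CUDA_VERSION}/linux_binary/computeprof/bin/computeprof"

if [ -x "$COMPUTEPROF" ]; then
    "$COMPUTEPROF" &            # GUI application; run in the background
else
    echo "Visual Profiler not found at $COMPUTEPROF"
fi
```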

For more information on the NVIDIA Visual Profiler, see