Many useful libraries are available on the Keeneland Initial Delivery System (KIDS).  In addition to the libraries typically used for development on a traditional Linux system (commonly found in /lib64 or /usr/lib64, with headers under /usr/include), KIDS provides libraries that implement numerical and data management operations using both GPUs and CPUs, as well as libraries that support parallel I/O.  This page highlights several of these libraries.  It focuses on third-party libraries; software being developed by the Keeneland team is documented separately.  This page does not describe every library available on KIDS, so check the /sw/keeneland directory to see what other libraries and tools are installed.

Thrust

Thrust is a C++ template library that provides implementations of several data structures and algorithms that run on CUDA-capable GPUs.  In particular, it provides data containers such as vectors, and algorithms such as sort and reduction that operate on the data in those containers.  In that respect, Thrust is similar to the C++ Standard Template Library (STL), which provides containers and algorithms implemented for CPUs.  In fact, Thrust containers are compatible with STL containers.  Thrust containers also allocate and deallocate their memory automatically.

For example, the following code fragment allocates a vector of integers in host memory, initializes it to random values, and then transfers it to a vector in the GPU's memory.

thrust::host_vector<int> hvec(16384);
thrust::generate(hvec.begin(), hvec.end(), rand);
thrust::device_vector<int> dvec = hvec;

Once there, the data can be manipulated by applying concise Thrust algorithms (a sort, in this case).

thrust::sort(dvec.begin(), dvec.end());

Finally, the transformed data can be brought back to host memory with another concise Thrust operation.

thrust::copy(dvec.begin(), dvec.end(), hvec.begin());
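For comparison, the same generate, sort, and copy sequence can be written against the STL on the CPU. The following is only a minimal sketch of that analogy, not Thrust code: it uses std::vector in place of both Thrust vector types, so no data ever moves to a GPU, and the function name make_sorted_random is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// CPU-only analogue of the Thrust fragments above, built from STL pieces.
// make_sorted_random is a hypothetical helper name used only in this sketch.
std::vector<int> make_sorted_random(std::size_t n) {
    // Counterpart of thrust::host_vector<int> hvec(n).
    std::vector<int> hvec(n);

    // Counterpart of thrust::generate(hvec.begin(), hvec.end(), rand).
    std::generate(hvec.begin(), hvec.end(), std::rand);

    // Counterpart of thrust::device_vector<int> dvec = hvec.
    // Here this is an ordinary host-side copy; with Thrust, the same
    // assignment would transfer the data into GPU memory.
    std::vector<int> dvec = hvec;

    // Counterpart of thrust::sort(dvec.begin(), dvec.end()).
    std::sort(dvec.begin(), dvec.end());

    // Counterpart of thrust::copy(dvec.begin(), dvec.end(), hvec.begin()).
    std::copy(dvec.begin(), dvec.end(), hvec.begin());

    // Sanity check; compiled out when NDEBUG is defined.
    assert(std::is_sorted(hvec.begin(), hvec.end()));
    return hvec;
}
```

Calling make_sorted_random(16384) returns a sorted vector of the same size as the one in the Thrust fragments. The near one-to-one correspondence between the calls is what makes porting STL-style code to Thrust comparatively straightforward.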

Thrust is open source software.  Beginning with CUDA 4.0, Thrust is included with the CUDA Toolkit.  On KIDS, the CUDA Toolkit for each installed version is found under /sw/keeneland/cuda/version/linux_binary, where version is a CUDA release number such as 4.0.

For more information on Thrust, see the Thrust project documentation.

cuBLAS, cuFFT, cuSPARSE, cuRAND, and NPP

cuBLAS is a library that provides CUDA-based implementations of many of the Basic Linear Algebra Subprograms (BLAS).  Because the routines in cuBLAS closely follow the well-defined BLAS interface, programs that currently use a CPU-based BLAS library can be adapted to execute on CUDA GPUs with relatively few changes.  Note that cuBLAS routines have their own function names and operate on data held in GPU memory.
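To make the BLAS interface more concrete, the following is a minimal CPU reference sketch of one BLAS operation, the general matrix multiply C = alpha*A*B + beta*C, using the column-major storage and leading-dimension arguments that BLAS and cuBLAS both use. It is an illustration of the operation's semantics only: ref_sgemm is a hypothetical name rather than a BLAS or cuBLAS routine, and the sketch handles only the case with no transposes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference (CPU) sketch of GEMM: C = alpha*A*B + beta*C, with no transposes.
// Matrices are stored column-major, so element (i, j) of a matrix with
// leading dimension ld lives at index i + j*ld.
// ref_sgemm is a hypothetical name used only for this illustration.
void ref_sgemm(int m, int n, int k, float alpha,
               const std::vector<float>& A, int lda,  // A is m x k
               const std::vector<float>& B, int ldb,  // B is k x n
               float beta,
               std::vector<float>& C, int ldc)        // C is m x n
{
    // Leading dimensions must be at least the number of rows.
    assert(lda >= m && ldb >= k && ldc >= m);

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            // Dot product of row i of A with column j of B.
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += A[static_cast<std::size_t>(i) + static_cast<std::size_t>(p) * lda] *
                       B[static_cast<std::size_t>(p) + static_cast<std::size_t>(j) * ldb];
            }
            const std::size_t idx =
                static_cast<std::size_t>(i) + static_cast<std::size_t>(j) * ldc;
            C[idx] = alpha * acc + beta * C[idx];
        }
    }
}
```

cuBLAS provides a GPU implementation of this same operation (cublasSgemm), taking the matrix dimensions, scalars, and leading dimensions in much the same way, with the matrices residing in GPU memory.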

cuFFT is a GPU-accelerated library for computing fast Fourier transforms (FFTs).

cuSPARSE is a GPU-accelerated library that provides CUDA-based implementations of common sparse matrix operations.

cuRAND is a GPU-accelerated library for producing streams of pseudo-random numbers.

NPP is a library that provides CUDA-based implementations of useful video, image, and signal processing operations.

These libraries are produced by NVIDIA and included with the CUDA Toolkit.  On KIDS, the CUDA Toolkit for each installed version is found under /sw/keeneland/cuda/version/linux_binary, where version is a CUDA release number such as 4.0.

HDF5, netCDF, and MPI-IO

HDF5 and netCDF are I/O libraries tailored for reading and writing structured data (e.g., multi-dimensional arrays) to a file system.  Both are available on KIDS, in /sw/keeneland/hdf5 and /sw/keeneland/netcdf, respectively.  Currently, the netCDF installation supports serial I/O only (i.e., pnetCDF is not yet installed).  HDF5, whose parallel mode is implemented on top of MPI-IO, supports parallel I/O as well as serial I/O.  Finally, the MPI implementations available on KIDS support programs that use the MPI-IO interface directly.