Compiling
Programming environments
The Keeneland System uses modules to control the programming environment. For instance, there are compilers from several different vendors, each of which may have several releases available. Modules generally interact with your environment variables (e.g., `$PATH`) to choose which compiler is being used and which libraries are being linked in.
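To see which versions of a particular package are installed, `module avail` can be given a package name; the names and versions it reports depend on the current software stack:

```
$ module avail intel     # list the available Intel compiler modules
$ module avail cuda      # list the available CUDA toolkit modules
```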
By default, several modules are already loaded:
```
$ module list
Currently Loaded Modulefiles:
  1) modules               2) torque/2.5.11         3) moab/6.1.5
  4) gold                  5) mkl/2011_sp1.8.273    6) intel/2011_sp1.8.273
  7) openmpi/1.5.1-intel   8) PE-intel              9) cuda/4.1
```
The `PE-intel` module sets up the environment to use the Intel compilers (`icc` and `ifort`). The `module show` command lists all the actions that loading this module will take: in this case, it checks whether another programming environment is already loaded and quits if that is the case. Otherwise, it loads the default version of the Intel compilers and the version of OpenMPI intended to work with Intel, and sets some generic environment variables to be used in Makefiles (i.e., `$CC` should always refer to the current C compiler):
```
$ module show PE-intel
-------------------------------------------------------------------
/opt/modulefiles/PE-intel:

conflict     PE-pgi PE-gnu PE-intel
module load  intel
module load  openmpi/1.5.1-intel
setenv       CC icc
setenv       CPP icc -E
setenv       CXX icpc
setenv       FC ifort
setenv       F77 ifort
setenv       F90 ifort
-------------------------------------------------------------------
```
In order to change compilers, use `module swap`. Third-party libraries may need to be reloaded after this, as they often check which programming environment is loaded and set their paths accordingly. For example, this changes from the Intel compilers to PGI:
```
$ module swap PE-intel PE-pgi
```
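A quick sanity check after the swap (assuming, by analogy with the `module show PE-intel` output above, that `PE-pgi` points the generic variables at the PGI compilers):

```
$ module list     # confirm PE-pgi and its dependencies are now loaded
$ echo $CC        # presumably pgcc now, by analogy with the PE-intel settings above
$ which $CC       # verify that compiler is found on the current $PATH
```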
Notes:
- Some third-party software (notably DDT) does not yet work with the new version of CUDA, but it may still be advisable to give that version a try.
- The default version of gcc without modules is 4.1.2. Users may load a newer version of gcc via modules (see the sketch below).
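A minimal sketch of doing so; the gcc version number below is purely illustrative, so check `module avail gcc` for the versions actually installed:

```
$ module avail gcc          # see which gcc modules are installed
$ module load gcc/4.6.2     # hypothetical version number, for illustration only
$ gcc --version             # the newer gcc should now be first in $PATH
```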
Compiling for CPU
In order to compile non-MPI, non-CUDA code, the compilers may be called directly. Additional documentation is also provided in the man pages for each compiler, for example `man gcc`.
 | C | C++ | Fortran |
---|---|---|---|
Generic | $CC | $CXX | $FC, $F77, $F90 |
GNU | gcc, gcc44 | g++, g++44 | gfortran, gfortran44 |
Intel | icc | icpc | ifort |
PGI | pgcc | pgCC | pgfortran, pgf90, pgf77 |
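For example, a serial source file can be built directly with whichever compilers the loaded PE module selected; `hello.c` and `hello.f90` below are hypothetical file names:

```
# Using the generic variables, the same commands work under any PE module:
$ $CC -O2 -o hello_c hello.c
$ $FC -O2 -o hello_f hello.f90

# Or call a specific compiler by name:
$ icc -O2 -o hello_c hello.c
```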
Notes:
- Warning: Optimizing with `ifort -fast` may currently fail because this requires a static compilation, and there is currently no static version of `libnuma` available. It is possible to compile with all the options contained in `-fast` besides `-static`:

  ```
  $ ifort -xHOST -O3 -ipo -no-prec-div
  ```

- In addition, the modules should define `$CPP` to be the generic C pre-processor.
- `icc` will compile C or C++ depending on the suffix of the source code; `icpc` is the same compiler, but it forces C++.
- Similarly, the Fortran compilers may have several different names, corresponding to the Fortran 77 or Fortran 90 standards. Refer to the man pages for more specifics.
Compiling MPI
There are two major MPI implementations available via modules. OpenMPI is loaded by default, and MVAPICH2 may be used if preferred. In either case, the MPI compiler wrappers should be used (given in the table below). If the environment is set up properly, these will call the chosen compiler and link against all the necessary MPI libraries. To check what a wrapper is actually doing, pass it the `--showme` option (OpenMPI) or `-show` (MVAPICH2), in which case it will print the commands it would normally run.
C | C++ | Fortran |
---|---|---|
mpicc | mpicxx | mpif90, mpif77 |
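A sketch of building and inspecting an MPI program; `hello_mpi.c` is a hypothetical source file, and production runs would normally be launched through the batch system rather than interactively:

```
# Build with the wrapper provided by the currently loaded MPI module:
$ mpicc -O2 -o hello_mpi hello_mpi.c

# Show the underlying compiler and link line the wrapper would use:
$ mpicc --showme hello_mpi.c     # OpenMPI
$ mpicc -show hello_mpi.c        # MVAPICH2

# Small interactive test (OpenMPI-style launcher):
$ mpirun -np 4 ./hello_mpi
```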
CUDA
CUDA programs and objects should be compiled with `nvcc`. This will generate code for the GPU, and it will also make calls to a C compiler to generate code for the CPU. To see the calls to the C compiler, you can use the `-dryrun` flag:
```
nvcc deviceQuery.cu -dryrun
```
By default, `gcc` is used to compile the C code. If another compiler is preferred, specify it with the `-ccbin` flag. For example, to use Intel's compiler, enter the following:

```
nvcc -ccbin icc deviceQuery.cu
```
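For larger projects it is common to compile host and device code into separate objects and let `nvcc` drive the final link so that the CUDA runtime libraries are picked up automatically; `kernel.cu` and `main.c` below are hypothetical file names:

```
# Compile the device code and the host code separately:
$ nvcc -ccbin icc -c kernel.cu -o kernel.o
$ icc -c main.c -o main.o

# Link with nvcc, which adds the CUDA runtime libraries to the link line:
$ nvcc -ccbin icc kernel.o main.o -o app
```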
For some examples of working Makefiles, check out the NVIDIA CUDA SDK or try some of the MAGMA test cases.
For a trivial example of how to compile and run CUDA, consider helloworld.cu:
```
#include <stdio.h>
#include <cuda.h>

__global__ void helloworld()
{
    printf("Hello World!\n");
}

int main(void)
{
    helloworld<<< 1, 32 >>>();
    cudaDeviceSynchronize();
    return 0;
}
```
Compile like so:
```
nvcc -arch=sm_21 helloworld.cu
```
You would not even need `-arch=sm_21` were it not for the `printf` statement inside device code (device-side `printf` requires compute capability 2.0 or higher)!
Run like so:
```
./a.out
```
You will see `Hello World!` printed 32 times.