How to Compile CP2K
Prerequisites
You need the following before you can compile CP2K
- MPI (version 2) and SCALAPACK (optional, required for parallel version)
- libxc (optional, for exchange-correlation functionals used in QUICKSTEP)
- FFTW (optional)
- libint (optional, for Hartree-Fock exchange)
- libsmm (optional)
- ELPA library (optional, replaces ScaLAPACK SYEVD if used)
- CUDA (optional, for utilising GPUs)
- Machine architecture abstraction support (optional)
- Process mapping support (optional)
Obtaining a copy of CP2K source
To obtain a copy of CP2K source, please follow the instructions given here.
GNU make
GNU make is used for the build and should be available on your system (as gmake or make on Linux). If you do not have it, go to
http://www.gnu.org/software/make/make.html
and download it from there.
Fortran 95 Compiler
A Fortran 95 compiler should be installed on your system. We have good experience with gfortran 4.4.X and above. Be aware that some compilers have bugs that might cause them to fail (internal compiler errors, segfaults) or, worse, yield a miscompiled CP2K.
Please report bugs to compiler vendors; they (and we) have an interest in fixing them.
yacc
yacc is needed to compile the dependency generator. It is available as part of the GNU bison package.
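As a quick sanity check before going further, the three build tools described above can be looked up on the PATH. The following is only a minimal sketch: the function name missing_tools is hypothetical, and the tool names passed to it (make, gfortran, yacc) are assumptions that may differ on your system (e.g. gmake, another Fortran compiler, or bison).

```shell
#!/bin/sh
# Print the name of every given command that is not found on PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return $rc
}

# Assumed tool names; adjust to your system before relying on the result.
missing_tools make gfortran yacc || echo "install the tools listed above before building"
```

This only confirms that the commands exist; it does not check compiler versions or whether the compiler is affected by the bugs mentioned above.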
BLAS and LAPACK
BLAS and LAPACK linear algebra libraries should be installed. Using vendor-provided libraries can make a very significant difference (up to 100%, e.g., ACML, MKL, ESSL).
Note that the BLAS/LAPACK libraries must match the Fortran compiler used.
Use the latest versions available and download all patches! The canonical BLAS and LAPACK can be obtained from the Netlib repository.
- http://www.netlib.org/lapack/ and see also
A faster alternative is to use the ATLAS project. It provides BLAS and enough of LAPACK to run CP2K, both optimized for the local machine upon installation: http://math-atlas.sourceforge.net/
GotoBLAS is an even faster BLAS alternative: http://www.tacc.utexas.edu/resources/software/
If compiling with OpenMP support then it is recommended to use a non-threaded version of BLAS.
Note for AMD Bulldozer owners:
- Intel compilers DO NOT support FMA4 as Intel CPUs implement FMA3
- For this reason, compiling with Intel compilers (or linking against MKL) on AMD Bulldozer machines could result in performance loss.
MPI and SCALAPACK
MPI (version 2) and ScaLAPACK are needed for parallel code (popt and psmp versions). Use the latest versions available and download all patches!
If your computing platform does not provide MPI, there are several freely available alternatives:
- MPICH2 MPI: http://www-unix.mcs.anl.gov/mpi/mpich/
- OpenMPI MPI: http://www.open-mpi.org/
ScaLAPACK can be part of ACML or cluster MKL. These libraries are recommended if available.
Canonical ScaLAPACK can be obtained from http://www.netlib.org/scalapack/ and see also http://www.netlib.org/lapack-dev/
Recently a ScaLAPACK installer has been added that makes installing ScaLAPACK easier: http://www.netlib.org/scalapack/scalapack_installer.tgz
Exchange-Correlation Functionals Library
Version 2.0.1 of libxc (ONLY this one) needs to be downloaded from
http://www.tddft.org/programs/octopus/wiki/index.php/Libxc
and installed to a directory of your choice, referred to here as $LIBXC_DIR.
During the installation, the directory $LIBXC_DIR/lib is created. Add the preprocessor flag
-D__LIBXC2
to DFLAGS, and
-L$(LIBXC_DIR)/lib -lxc
to LIBS in your arch file.
Fast Fourier Transform Library
FFTW can be used to improve FFT speed on a wide range of architectures.
It is strongly recommended to install and use FFTW3. The current version of CP2K works with FFTW 3.X:
Note that FFTW must know the Fortran compiler you will use in order to install properly (e.g., export F77=gfortran before configure if you intend to use gfortran). On machines and compilers which support SSE you can configure FFTW3 with --enable-sse2.
Compilers/systems that do not align memory (NAG f95, Intel IA32/gfortran) should either not use --enable-sse2 or otherwise add
-D__FFTW3_UNALIGNED
to DFLAGS in the arch file.
When building an OpenMP parallel version of CP2K (ssmp or psmp), the FFTW3 threading library libfftw3_threads (or libfftw3_omp) is required. These can be generated using the --enable-threads and --enable-openmp flags during configuration of FFTW3.
When using the FFTW3 library, add
-D__FFTW3
to DFLAGS, and link to the appropriate libraries in your arch file.
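The FFTW3 configure options discussed above can be collected with a small helper. This is a sketch under assumptions: the function name fftw_configure_args and its two arguments are hypothetical, and for compilers that do not align memory it takes the first of the two routes described above (leaving out --enable-sse2) rather than adding -D__FFTW3_UNALIGNED.

```shell
#!/bin/sh
# Print FFTW3 configure options for a given CP2K version.
#   $1: CP2K version (sopt, popt, ssmp or psmp)
#   $2: "aligned" (default) or "unaligned" memory alignment of the compiler
fftw_configure_args() {
    version=$1
    alignment=${2:-aligned}
    args=""
    # SSE2 is only enabled when the compiler aligns memory.
    if [ "$alignment" = "aligned" ]; then
        args="--enable-sse2"
    fi
    # OpenMP versions of CP2K need the FFTW3 threading libraries.
    case "$version" in
        ssmp|psmp) args="${args:+$args }--enable-threads --enable-openmp" ;;
    esac
    printf '%s\n' "$args"
}

fftw_configure_args psmp aligned
```

Inside the FFTW3 source directory, the output could then be passed on as in F77=gfortran ./configure $(fftw_configure_args psmp aligned), followed by the usual make and make install.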
Hartree-Fock Exchange
Hartree-Fock exchange (optional) requires the libint package to be installed.
It is easiest to install with a Fortran compiler that supports ISO_C_BINDING and Fortran procedure pointers (recent gfortran, xlf90, ifort).
Additional information can be found in cp2k/tools/hfx_tools/libint_tools/README_LIBINT
Tested against libint-1.1.4 and currently hardcoded to the default angular momentum LIBINT_MAX_AM 5 (check your include/libint/libint.h to see if it matches).
http://www.chem.vt.edu/chem-dept/valeev/software/libint/libint.html
Note: do NOT use libint-1.1.3.
When using the libint library, add
-D__LIBINT
to DFLAGS, and link to the appropriate libraries in your arch file.
Small Matrix Multiplication Library
A library for small matrix multiplications comes with the CP2K package. This library, if built and used with CP2K, should allow significant speedups (depending on the problem and your machine) to your calculations.
The library can be built from the included source:
cp2k/tools/build_libsmm
See the README file inside the build_libsmm directory.
Usually only the double precision real library, and perhaps the complex one, is needed. Add the following to DFLAGS in your arch file:
-D__HAS_smm_dnn | to make the code use the double precision real library |
-D__HAS_smm_snn | to make the code use the single precision real library |
-D__HAS_smm_znn | to make the code use the double precision complex library |
-D__HAS_smm_cnn | to make the code use the single precision complex library |
-D__HAS_smm_vec | to enable the new vectorized interface of libsmm |
Library ELPA
This is an alternative library to ScaLAPACK for the solution of eigenvalue problems. A version of ELPA can be downloaded from
http://elpa.rzg.mpg.de/software
ELPA replaces the ScaLAPACK SYEVD routine to improve the performance of the diagonalization. For specific architectures it may be better to install specifically optimized kernels and/or employ a higher optimization level to compile it.
During the installation, the library libelpa.a (or libelpa_mt.a if multi-thread support is enabled) is created. We tested the version of November 2013, with the generic kernel and with/without OpenMP.
To use ELPA, add
-D__ELPA
to DFLAGS and
-L$(ELPA_DIR)/lib -lelpa
(or -lelpa_mt for the multi-threaded library) to LIBS in your arch file.
CUDA Support
This is still experimental.
Add
-D__DBCSR_CUDA
to DFLAGS in your arch file to compile with CUDA support for matrix multiplication. For linking, add
-lcudart -lrt
to LIBS in your arch file. The compiler must support ISO_C_BINDING.
Use
-D__PW_CUDA
in DFLAGS for CUDA support for PW (gather/scatter/fft) calculations. The Fortran compiler must use an appended underscore for linking C subroutines.
Use
-D__CUDA_PROFILING
in DFLAGS to turn on NVIDIA Tools Extensions.
Consult cp2k/cuda_tools/README in the CP2K source for more information.
Machine Architecture Abstraction Support
Still under development
Add
-D__HWLOC
or
-D__LIBNUMA
to DFLAGS to compile with hwloc or libnuma support for machine architecture and process/thread/memory placement and visualization. It is necessary to link with
-lhwloc
or
-lnuma
The compiler must support ISO_C_BINDING.
Machine architecture visualization is supported only with hwloc. Process/thread/memory placement and visualization is supported by both hwloc and libnuma.
Note that hwloc and libnuma cannot be used at the same time.
Consult cp2k/machine/README in the CP2K source for more information.
Process Mapping Support
Still under development
Use the target machine to compile with topology support.
You can also define the strategy to be used on the command line, with
-mpi-mapping [1,2,3,4,5,6,7]
- 1 = SMP-style rank ordering
- 2 = file based rank ordering
- 3 = Hilbert space-filling curve
- 4 = Peano space-filling curve
- 5 = Round-Robin rank ordering
- 6 = Hilbert-Peano space-filling curve
- 7 = Cannon pattern mapping
The compiler must support ISO_C_BINDING.
Consult cp2k/machine/README in the CP2K source for more information.
Compiling the Code
I am Feeling Lucky
The “I'm feeling lucky” version of building will try to guess what architecture you are on. Just type
make sopt
in cp2k/makefiles, and the script cp2k/tools/get_arch_code will try to guess your architecture. You can set the FORT_C_NAME environment variable to indicate the compiler part of the architecture string:
export FORT_C_NAME=gfortran
If you are not feeling lucky, or you want to know exactly what you are doing when compiling CP2K and what options are available, please read on.
The arch File
The locations of the compilers and libraries need to be specified, together with compilation options, in an “arch” file in the cp2k/arch directory of the CP2K source. Examples for a number of common architectures are already available in that directory (e.g., Linux-x86-64-gfortran.sopt).
Conventionally, there are four versions:
- sopt = serial version
- popt = parallel, MPI only version (recommended for general usage)
- ssmp = parallel, OpenMP only version
- psmp = parallel, MPI + OpenMP
You will need to modify one of these files to match your system's settings.
Compilation Commands
After you have finished creating or editing your own arch file in cp2k/arch, you can build CP2K in the directory cp2k/makefiles using the following command:
make -j N ARCH=architecture VERSION=version
where -j N allows for a parallel build using N processes; architecture corresponds to the root name of your arch file, and version is one of sopt, popt, ssmp or psmp.
For example, if you have created foo.sopt in cp2k/arch, then in cp2k/makefiles you type the command:
make -j 4 ARCH=foo VERSION=sopt
to compile (with 4 processes in parallel) the serial version of CP2K, with the compilers, libraries and options specified in the file cp2k/arch/foo.sopt.
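To avoid typos in the VERSION argument, the make command described above can be assembled by a small wrapper. This is only an illustrative sketch: the function name cp2k_make_cmd is hypothetical, and it merely prints the command rather than running it.

```shell
#!/bin/sh
# Print the make command for a given arch root name and version.
#   $1: root name of the arch file (e.g. foo for cp2k/arch/foo.sopt)
#   $2: version, one of sopt, popt, ssmp, psmp
#   $3: number of parallel build processes (defaults to 1)
cp2k_make_cmd() {
    arch=$1
    version=$2
    nproc_build=${3:-1}
    case "$version" in
        sopt|popt|ssmp|psmp) ;;
        *) echo "unknown version: $version" >&2; return 1 ;;
    esac
    echo "make -j $nproc_build ARCH=$arch VERSION=$version"
}

cp2k_make_cmd foo sopt 4
```

The printed command is meant to be run from cp2k/makefiles, as described above.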
As a short-cut, you can build several versions of the code at once:
make -j N ARCH=architecture sopt popt ssmp psmp
provided you have the corresponding arch files already in place.
After a successful compilation, an executable should appear in cp2k/exe/*
All compiled files, libraries and executables of all architectures and versions can be removed with
make distclean
in cp2k/makefiles.
To remove only objects and mod files (i.e. keep exe) for a given ARCH/VERSION, use
make ARCH=architecture VERSION=version clean
To remove everything for a given ARCH/VERSION use:
make ARCH=architecture VERSION=version realclean
DFLAGS Options
The following flags should be present (or not) in the arch file:
For parallel versions
-D__parallel |
-D__BLACS |
-D__SCALAPACK |
If using libint (needed for HF exchange)
-D__LIBINT |
For libxc (needed by QUICKSTEP DFT calculations)
-D__LIBXC2 |
If using ELPA in place of SYEVD to solve eigenvalue problems
-D__ELPA |
Various FFTs
-D__FFTSG | Stefan Goedecker FFT (should always be there) |
-D__FFTW3 | FFTW version 3 |
-D__PW_CUDA | CUDA FFT and associated gather/scatter on the GPU |
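The library-related flags listed so far can be gathered into a single DFLAGS line. The sketch below is an illustration only: the function name dflags_for and the feature names it accepts are hypothetical, the libxc flag is the one given in the libxc section, and the output assumes the make-style variable assignment used in arch files.

```shell
#!/bin/sh
# Print a DFLAGS line for the given (hypothetical) feature names.
dflags_for() {
    # -D__FFTSG is listed above as a flag that should always be there.
    flags="-D__FFTSG"
    for feature in "$@"; do
        case "$feature" in
            parallel) flags="$flags -D__parallel -D__BLACS -D__SCALAPACK" ;;
            libint)   flags="$flags -D__LIBINT" ;;
            libxc)    flags="$flags -D__LIBXC2" ;;
            elpa)     flags="$flags -D__ELPA" ;;
            fftw3)    flags="$flags -D__FFTW3" ;;
            *) echo "unknown feature: $feature" >&2; return 1 ;;
        esac
    done
    echo "DFLAGS = $flags"
}

dflags_for parallel fftw3
```

A real arch file also needs the compiler flag for your compiler (see the next table) and matching entries in LIBS.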
Various compilers/architectures needing their own machine_* file
-D__NAG | if using NAG F95 compiler |
-D__AIX | if using AIX compiler |
-D__ABSOFT | if using ABSoft compiler |
-D__PGI | if using PGI compiler |
-D__INTEL | if using Intel compiler |
-D__GFORTRAN | if using GNU gfortran compiler |
-D__G95 | if using g95 compiler |
-D__SX | |
-D__DEC | |
-D__XT3 | if compiling on a XT3 machine |
-D__XT5 | if compiling on a XT5 machine |
Various network interconnections
-D__GEMINI | if Gemini interconnect is used in the cluster |
-D__SEASTAR | if SeaStar interconnect is used in the cluster |
-D__BLUEGENE | if BlueGene interconnect is used in the cluster |
-D__NET |
Specific optimized core routines can be selected with
-D__GRID_CORE=X
with X=1..6. Reasonable defaults are provided (see cp2k/src/lib/collocate_fast.F), but trial and error might yield a small (~10%) speedup.
Tuned versions of the integrate and collocate routines can be generated using
-D__HAS_LIBGRID
and -L/path/to/libgrid.a in LIBS. See cp2k/tools/autotune_grid/README for details.
-D__PILAENV_BLOCKSIZE=1024
or similar is a hack to override (if the linker allows this) the PILAENV function provided by ScaLAPACK. This can lead to much improved PDGEMM performance. The optimal value depends on the hardware (GPU?) and the precise problem.
Options controlling MPI behavior and capabilities
-D__NO_MPI_THREAD_SUPPORT_CHECK | Workaround for MPI libraries that do not declare themselves thread safe but that you want to use with OpenMP anyway. |
-D__NO_MPI_MEMORY | Do not use MPI memory allocation/deallocation routines |
Options on language features
CP2K currently assumes full Fortran 95 compliance and expects the ISO_C_BINDING module of Fortran 2003 to be present, which is commonly available in current compilers. For OpenMP, version 3.0 is assumed.
If you get compilation errors about unsupported language features, then some flags may be used to reduce the language features required.
In addition, some flags are used to declare compiler support for additional language features.
Subparts of Fortran 2003 or later that help various aspects of the code:
-D__PTR_RANK_REMAP | compiler supports pointer rank remapping |
-D__HAS_NO_ISO_C_BINDING | compiler does not support all needed ISO_C_BINDING features. (At least g95 0.91 silently fails with segfaults since it does not support C_F_POINTER.) |
Other language capabilities and support:
-D__HAS_NO_OMP_3 | CP2K assumes that compilers support OpenMP version 3. If this is not the case, specify this flag to compile. Runtime performance will be poorer on a low number of processors. |
-D__CRAY_POINTERS | Compiler supports CRAY pointers |
-D__HAS_NO_CUDA_STREAM_PRIORITIES | Needed for CUDA sdk version < 5.5 |
Additional esoteric, development and debugging options
This section can safely be skipped. These flags are listed only for completeness, in addition to those described elsewhere in this document.
-D__NO_STATM_ACCESS | Do not try to read from /proc/self/statm to get memory usage information. This is otherwise attempted on several Linux-based architectures or when using the NAG or gfortran compilers. |
-D__mp_timeset__ | Timing of MPI routines. |
-D__USE_LEGACY_WEIGHTS | Use legacy atomic weights |
-D__NO_ASSUMED_SIZE_NOCOPY_ASSUMPTION | Do not assume that assumed-size dummy arguments will always be passed in by reference. Unless the ISO_C_BINDING is present, CP2K will not compile with this option. |
-D__cray_pointers | CRAY pointers will be used in preference to the ISO_C_BINDING call to MPI_ALLOC_MEM |
-D__PLASMA | PLASMA support for DBCSR (neglected, may not work) |
-D__USE_PAT | Use with CRAY-PAT profiling |
-D__HMD | |
-D__HPM | |
-D_USE_GA | Use Global Arrays Toolkit |
Compiling Together With PLUMED v1.3
- Get version 1.3 of plumed from their svn repository
- Unpack the plumed-1.3 archive somewhere
- Set the environment variable $plumedir to the root directory of the plumed distribution: export plumedir=/path/to/plumed-1.3
- Symbolically link plumed-1.3/patches/plumedpatch_cp2k.sh into the CP2K src directory: ln -s $plumedir/patches/plumedpatch_cp2k.sh cp2k/src/
- Run the plumedpatch_cp2k script with parameter -patch: ./plumedpatch_cp2k.sh -patch. It should create a subdirectory src-plumed containing a number of cpp files and a plumed.inc
- Compile cp2k and plumed together (it is safer to run a distclean before compiling):
make plumed -j ARCH=… VERSION=popt PLUMED=yes
Tests
If CP2K compiled okay, you can run one of the test cases to try out the executable (most inputs in any of the cp2k/tests/*regtest*/ directories are tested on a daily basis).
cd /path/to/cp2k/cp2k/tests/QS/
/path/to/cp2k/cp2k/exe/YOURMACHINE/cp2k.sopt C.inp
Systematic testing can be done following the description on regression testing.
Troubleshooting
- If things fail, take a break… have a look at section Options on language features and go back to section The arch File.
- If your compiler/machine is really special, it should not be too difficult to support it. Only cp2k/src/machine*.F (and possibly cp2k/src/dbcsr_lib/machine.F) should be affected.